Enable Conditional Numerical Reproducibility for tests #4569

sharwell · 2019-12-12T20:31:04Z

No description provided.

frank-dong-ms-zz

codecov · 2019-12-12T23:08:52Z

Codecov Report

❗ No coverage uploaded for pull request base (master@9c9a5c4). Click here to learn what that means.
The diff coverage is 91.66%.

@@            Coverage Diff            @@
##             master    #4569   +/-   ##
=========================================
  Coverage          ?   75.84%           
=========================================
  Files             ?      947           
  Lines             ?   172190           
  Branches          ?    18579           
=========================================
  Hits              ?   130593           
  Misses            ?    36422           
  Partials          ?     5175

Flag	Coverage Δ
#Debug	`75.84% <91.66%> (?)`
#production	`71.42% <ø> (?)`
#test	`90.71% <91.66%> (?)`

Impacted Files	Coverage Δ
test/Microsoft.ML.TimeSeries.Tests/TimeSeries.cs	`87.85% <ø> (ø)`
...crosoft.ML.TimeSeries.Tests/TimeSeriesDirectApi.cs	`99.28% <100%> (ø)`
test/Microsoft.ML.TestFramework/GlobalBase.cs	`29.03% <100%> (ø)`
...est/Microsoft.ML.Predictor.Tests/TestPredictors.cs	`67.9% <50%> (ø)`

...iclassLogisticRegression/MulticlassLogisticRegression-TrainTest-iris-tree-featurized-out.txt

sharwell · 2020-01-31T19:42:48Z

I'm going to submit some parts of this as separate pull requests, and then rebase to focus it and fix conflicts.

antoniovs1029 · 2020-01-31T21:29:16Z

test/Microsoft.ML.Predictor.Tests/TestPredictors.cs

@@ -283,6 +283,11 @@ public void BinaryClassifierLogisticRegressionTest()
        [Trait("Category", "SkipInCI")]


❓ I guess I am missing something... but shouldn't this line be removed in order to check that the updated baselines are actually correct?

These baselines haven't changed in the master branch since I first implemented this pull request. I'm assuming the new values (which were correct then) are still correct now. They would be validated as part of #4722.

sharwell · 2020-01-31T23:48:12Z

Based on the results of #4722 as a validation, the current plan is:

Merge this PR, since the baselines here are a clear statistical improvement
As a separate PR, add baselines for Windows x86, Centos (and Ubuntu separately if necessary), and MacOS
Update [Test Stability] PR to Test BinaryClassifierSymSgdTest #4722 to determine the new status

antoniovs1029 · 2020-02-01T00:12:18Z

For the record, the validation @sharwell has just mentioned, consisted in running 100 times the BinaryClassifierSymSgdTest test on every build configuration. This was to validate the changes he made to the baseline in here, which is used [exclusively?] on that test which is (at the moment of writing) disabled, and that's why it was needed to be validated on PR #4722 where I reenabled the test.

It passed on every Windows x64 build, except for one, where it failed only 1 out of the 100 iterations.

It failed on every other build configuration, but it is thought to be because CNR works differently on each OS and configuration (MacOS, CentOS, Ubuntu and Windows x86), and then it would be necessary to create a new baseline for each one of these. Notice that that test was disabled recently because it was very flaky on most of our CI builds, and that it was previously disabled on MacOS and x86 since #624 and #1008 respectively. So when doing the baselines for the other build configurations, it would be ok to keep this in mind, and see if that now that CNR is enabled, these pending test scenarios are now resolved.

For reference, the same experiment (running it 100 times on all the build configurations) had 3 Windows x64 builds that failed, each one with 100 failures (one per iteration), when ran without the enabled CNR.

antoniovs1029 · 2020-02-01T00:21:49Z

After briefly discussing it with @harishsk , I will run all the tests (including all the disabled ones) with the changes in this PR, and running the test 100 times along with them, just to double check. 😄 For convenience, I am doing it on #4722

It again passed in all Windows x64 builds, except for one, where it failed only 1 out of the 100 iterations. As expected, it didn't pass in the other builds.

antoniovs1029 · 2020-02-03T18:51:55Z

@harishsk has expressed that he doesn't know if it's ok to have different baselines for each build configuration, and would like @eerhardt , @justinormont , and @tannergooding opinion on this regard, before accepting enabling the CNR.

antoniovs1029 · 2020-02-03T19:15:34Z

So I have just ran this again on #4722 , and actually now even the Windows x64 are failing, it fails in all 100 iterations of the BinaryClassifierSymSgdTest test. It doesn't seem to me that any update on master during the weekend could have changed the baseline here again, so I am not sure why this is happening.

@sharwell can you please remove the [Trait("Category", "SkipInCI")] attribute from the test in here, just temporarily, to rerun the CI in here, to be sure that your changes to the baseline are actually correct? thanks!

sharwell · 2020-02-03T19:57:07Z

@antoniovs1029 the baselines were altered in #4710

antoniovs1029 · 2020-02-04T02:42:41Z

@antoniovs1029 the baselines were altered in #4710

You're right. On that PR it's added that the BinaryClassifierSymSGD should also print its bias and weights into the output. So it will only be necessary to add a couple of lines to the baseline.

After running it on the CI, I believe the updated baseline (for Windows x64) should be

On SymSGD-CV-breast-cancer-out.txt

Bias: -467.9297, Weights: [5.415065,76.39395,22.351555,-11.988394,-28.264456,44.58415,22.720116,11.132539,2.851256]
Not training a calibrator because it is not needed.
Not adding a normalizer.
Data fully loaded into memory.
Initial learning rate is tuned to 100.000000
Bias: -484.28625, Weights: [-12.787148,140.42905,121.938194,37.527378,-129.81407,70.90587,-89.37068,81.64309,-32.32781]

On SymSGD-TrainTest-breast-cancer-out.txt

Bias: -473.82773, Weights: [1.8576517,52.924484,9.490602,14.593927,-17.657146,10.126332,-8.042431,1.9634984,26.753149]

By the way, I've gotten baselines for the other platforms and made a couple of experiments with them. So far it seems that enabling CNR does yield more consistent results from the test, although different baselines would indeed be necessary for each platform. I still need to do further experiments on this, though.

sharwell · 2020-02-04T21:00:22Z

@antoniovs1029 baselines have been updated

frank-dong-ms-zz approved these changes Dec 12, 2019

View reviewed changes

sharwell force-pushed the enable-cnr branch from accf186 to 11e9690 Compare December 12, 2019 22:05

sharwell force-pushed the enable-cnr branch from 59294e7 to ea553bb Compare December 13, 2019 00:49

sharwell commented Dec 14, 2019

View reviewed changes

...iclassLogisticRegression/MulticlassLogisticRegression-TrainTest-iris-tree-featurized-out.txt Outdated Show resolved Hide resolved

sharwell force-pushed the enable-cnr branch 5 times, most recently from 320e765 to 8b5d1e2 Compare January 16, 2020 14:45

sharwell marked this pull request as ready for review January 17, 2020 21:15

sharwell requested a review from a team as a code owner January 17, 2020 21:15

sharwell mentioned this pull request Jan 31, 2020

[Test Stability] PR to Test BinaryClassifierSymSgdTest #4722

Closed

sharwell force-pushed the enable-cnr branch from 77f95c3 to 383b062 Compare January 31, 2020 19:49

antoniovs1029 reviewed Jan 31, 2020

View reviewed changes

sharwell added 3 commits February 4, 2020 12:50

Enable Conditional Numerical Reproducibility for tests

cfc9abd

Update baselines for failing tests

d0fe25f

Skip BinaryClassifierSymSgdTest on Linux due to numerical differences

21bd1ff

sharwell force-pushed the enable-cnr branch from 383b062 to 21bd1ff Compare February 4, 2020 20:57

sharwell mentioned this pull request Feb 4, 2020

Enable cnr test #4780

Closed

sharwell merged commit cdd309e into dotnet:master Feb 4, 2020

sharwell deleted the enable-cnr branch February 4, 2020 23:13

frank-dong-ms-zz mentioned this pull request Feb 20, 2020

enable BinaryClassifierSymSgdTest #4863

Merged

ghost locked as resolved and limited conversation to collaborators Mar 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable Conditional Numerical Reproducibility for tests #4569

Enable Conditional Numerical Reproducibility for tests #4569

sharwell commented Dec 12, 2019

frank-dong-ms-zz left a comment

codecov bot commented Dec 12, 2019 •

edited

Loading

sharwell commented Jan 31, 2020

antoniovs1029 Jan 31, 2020

sharwell Jan 31, 2020

sharwell commented Jan 31, 2020

antoniovs1029 commented Feb 1, 2020 •

edited

Loading

antoniovs1029 commented Feb 1, 2020 •

edited

Loading

antoniovs1029 commented Feb 3, 2020 •

edited

Loading

antoniovs1029 commented Feb 3, 2020 •

edited

Loading

sharwell commented Feb 3, 2020

antoniovs1029 commented Feb 4, 2020 •

edited

Loading

sharwell commented Feb 4, 2020

		@@ -283,6 +283,11 @@ public void BinaryClassifierLogisticRegressionTest()
		[Trait("Category", "SkipInCI")]

Enable Conditional Numerical Reproducibility for tests #4569

Enable Conditional Numerical Reproducibility for tests #4569

Conversation

sharwell commented Dec 12, 2019

frank-dong-ms-zz left a comment

Choose a reason for hiding this comment

codecov bot commented Dec 12, 2019 • edited Loading

Codecov Report

sharwell commented Jan 31, 2020

antoniovs1029 Jan 31, 2020

Choose a reason for hiding this comment

sharwell Jan 31, 2020

Choose a reason for hiding this comment

sharwell commented Jan 31, 2020

antoniovs1029 commented Feb 1, 2020 • edited Loading

antoniovs1029 commented Feb 1, 2020 • edited Loading

antoniovs1029 commented Feb 3, 2020 • edited Loading

antoniovs1029 commented Feb 3, 2020 • edited Loading

sharwell commented Feb 3, 2020

antoniovs1029 commented Feb 4, 2020 • edited Loading

sharwell commented Feb 4, 2020

codecov bot commented Dec 12, 2019 •

edited

Loading

antoniovs1029 commented Feb 1, 2020 •

edited

Loading

antoniovs1029 commented Feb 1, 2020 •

edited

Loading

antoniovs1029 commented Feb 3, 2020 •

edited

Loading

antoniovs1029 commented Feb 3, 2020 •

edited

Loading

antoniovs1029 commented Feb 4, 2020 •

edited

Loading