
Test multiple seeds #174

Merged: 22 commits merged into JuliaAI:dev on Jun 21, 2022

Conversation

rikhuijzer
Member

The tests were failing on Julia 1.8-rc1. The cause appears to be that results from the same seed are not always the same; for most seeds, the results match about 5 out of 6 times. This wasn't tested in earlier Julia versions, and my theory is that the tests broke on Julia 1.8 because:

  1. Julia's default RNG changed between 1.7 and 1.8
  2. This caused slightly different numbers to be generated
  3. These numbers happened to cause the output to differ between run 1 and run 2 (whereas it used to match about 5 out of 6 times)

I've confirmed this by adding a test over multiple seeds, which makes Julia 1.7 fail too. In other words, older Julia versions could just as well have shipped an RNG which caused a difference between run 1 and run 2; by chance, the Julia 1.7 RNG did not.

More specifically, the value of tree.featval is not the same between runs. I don't know where that value is set, so I haven't been able to find the root cause yet.
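
A minimal sketch of what such a multi-seed reproducibility test can look like (run_model is a hypothetical stand-in for the actual train-and-predict call):

using Test, Random

# Hypothetical stand-in for the real train-and-predict call.
run_model(rng) = rand(rng, 5)

@testset "same seed, same result" begin
    for seed in 1:20
        a = run_model(MersenneTwister(seed))
        b = run_model(MersenneTwister(seed))
        @test a == b
    end
end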

@rikhuijzer
Member Author

Test failures are expected. Note that things were already broken; this PR adds a test that points it out.

@codecov-commenter

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #174 (1ef40ae) into dev (a3398bf) will increase coverage by 0.21%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #174      +/-   ##
==========================================
+ Coverage   89.18%   89.40%   +0.21%     
==========================================
  Files          10       10              
  Lines         999     1000       +1     
==========================================
+ Hits          891      894       +3     
+ Misses        108      106       -2     
Impacted Files               Coverage Δ
src/classification/main.jl   98.17% <100.00%> (+0.60%) ⬆️
src/measures.jl              97.64% <100.00%> (+0.01%) ⬆️
src/regression/main.jl       94.73% <100.00%> (+2.63%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ablaom
Member

ablaom commented Jun 15, 2022

Thanks for looking into this @rikhuijzer. Yes, I have also been noticing these inconsistent runs. I would think at least part of the solution is to replace all RNGs in testing with StableRNGs (adding StableRNGs to the test dependencies).

However, I also agree that we do not seem to be getting repeatability when specifying the same seed each time (in the same Julia version), and I'm not sure why. My first thought was that there was some kind of parallelism when comparing features to split at a node: if there is a draw when two features compete, the one chosen could depend on the unpredictable order in which the impurity improvements were calculated. But I don't see any such parallelism. (It would be a great idea to introduce, however!) There are some SIMD loops for minor things, but I would have thought those could not produce very large differences between runs.
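
For reference, swapping in a stable RNG is a one-line change per test (a minimal sketch; StableRNG(seed) is the package's constructor):

using StableRNGs

rng = StableRNG(123)   # same stream on every Julia version
x = rand(rng, 10)      # reproducible across 1.7, 1.8, ...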

Sorry, this is probably not much help.

@ablaom
Member

ablaom commented Jun 20, 2022

Okay, I think @yufongpeng has isolated the bug, corrected in this PR: #167. In nfoldCV, the rng is not being passed to the calls that train AdaStumpClassifier (but it is being passed to the other classifiers).
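
This bug class is easy to demonstrate in isolation (train_good and train_bad are hypothetical helpers, not the package's code):

using Random

train_good(rng) = rand(rng, 3)   # rng threaded through: reproducible
train_bad(rng)  = rand(3)        # rng silently ignored: global stream

train_good(MersenneTwister(1)) == train_good(MersenneTwister(1))  # true
train_bad(MersenneTwister(1))  == train_bad(MersenneTwister(1))   # false (with overwhelming probability)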

However, that PR is breaking, so I think we should release a patch to fix the bug before #167, and should probably include StableRNGs at the same time. @rikhuijzer Would you be willing and able to do that this week?

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

However, that PR is breaking, so I think we should release a patch to fix the bug before #167,

Great plan 👍

Would you be willing and able to do that this week?

Done.

should probably include StableRNGs at the same time

Can we do that in another PR? At the moment, there is no need for StableRNGs. The tests pass on all Julia versions, including 1.8-rc1 (which I am running locally).
EDIT: I'll add it.

And thanks @yufongpeng for spotting the source of the problem!

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

In 2f70ff5, the tests pass. It would indeed be better to add StableRNGs as you mentioned, @ablaom, but I am running into new problems there. For some reason

DecisionTree.nfoldCV_forest(labels, features, 3, 2, 10; rng=StableRNG(10))

produces different results on repeated runs, while rng=10 doesn't.

Comment on lines 225 to 227
# The Mersenne Twister (Julia's default) is not thread-safe.
_rng = copy(rng)
inds = rand(_rng, 1:t_samples, n_samples)
Member Author


This fixes another bug. This code path wasn't tested; the integer-seed path in the same function does work.
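
In isolation, why copying helps (illustrative only): a copy has independent state, so the parent RNG is left untouched and replays deterministically.

using Random

rng  = MersenneTwister(42)
_rng = copy(rng)            # independent state; the original rng is untouched
rand(_rng, 3)               # advances only the copy
rand(copy(rng), 3) == rand(copy(rng), 3)   # true: copies replay the same stream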

@rikhuijzer
Member Author

For some reason

DecisionTree.nfoldCV_forest(labels, features, 3, 2, 10; rng=StableRNG(10))

produces different results on repeated runs, while rng=10 doesn't.

This was caused by incorrect usage of the RNG in combination with @threads. I have now fixed it in src/classification/main.jl and src/regression/main.jl. I did not fix the code duplication because I fear it would add even more merge conflicts to the other open PRs.
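
A sketch of the fix pattern under @threads (illustrative, not the package's exact code): derive a deterministic per-iteration RNG instead of mutating one shared RNG from several threads.

using Random, Base.Threads

n = 8
results = Vector{Float64}(undef, n)
@threads for i in 1:n
    rng_i = MersenneTwister(1000 + i)   # deterministic per-iteration seed
    results[i] = rand(rng_i)            # same output regardless of schedule
end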

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

It seems the tests now fail because many of the hardcoded comparison values depend on the seed.

@rikhuijzer
Member Author

Things are still not stable. Tests such as the ones in regression/random pass for me locally but fail in CI.

@ablaom
Member

ablaom commented Jun 20, 2022

@rikhuijzer Thanks for the extra work, and good catch on the further bug. For now I'm going to hope you get a chance to finish this off, as it seems like a work in progress.

Side note: Strange, I didn't realize forest training was multithreaded. I thought Threads only became available in 1.3, and yet the 1.0 tests are passing, which I don't understand.

@rikhuijzer
Member Author

Things are still not stable. Tests such as the ones in regression/random pass for me locally but fail in CI.

Fixed in ed4602b. I forgot to also pass StableRNG to the data generation steps.
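
Illustrative only: the point is that the synthetic data has to draw from the stable stream too, not just the training calls.

using StableRNGs

rng = StableRNG(1)
features = rand(rng, 100, 5)      # data generation uses the stable rng
labels   = rand(rng, 100) .< 0.5  # ... and so does everything downstream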

@rikhuijzer
Member Author

rikhuijzer commented Jun 21, 2022

In 5a10555, I've put the old numbers back. In earlier commits, I had relaxed the bounds of the tests because the unstable tests sometimes had bad luck and fell outside the bounds. That is now fixed by the stable RNGs.

@ablaom I think this PR is good to go. If you approve, I would prefer a squash merge on this PR for the reasons mentioned in https://discourse.julialang.org/t/downsides-of-using-squash-merging/68317. But you can decide 👍

@ablaom (Member) left a comment

A labour of love. Great PR, thanks.

@ablaom merged commit 969c637 into JuliaAI:dev on Jun 21, 2022
@rikhuijzer
Member Author

A labour of love. Great PR, thanks.

👍👍👍 Thanks
