
Test multiple seeds #174

Merged: 22 commits merged into JuliaAI:dev on Jun 21, 2022

Conversation

rikhuijzer
Member

The tests were failing on Julia 1.8-rc1. The cause appears to be that results from the same seed are not always the same; for most seeds, the results match about 5 out of 6 times. This wasn't tested in earlier Julia versions, and my theory is that the tests broke on Julia 1.8 because:

  1. Julia's default RNG changed between 1.7 and 1.8
  2. This caused slightly different numbers to be generated
  3. These numbers happened to cause the output to differ between run 1 and run 2 (whereas it used to match about 5 out of 6 times)

I've confirmed this by adding a test over multiple seeds, which makes Julia 1.7 fail too. In other words, older Julia versions could just as well have shipped an RNG which caused a difference between run 1 and run 2; by chance, the Julia 1.7 RNG did not.

More specifically, the value of tree.featval is not the same between runs. I don't know where that value is set, so I haven't been able to find the root cause yet.
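
A minimal sketch of what such a multi-seed reproducibility test can look like (run_model is a hypothetical stand-in for the actual train-and-predict call):

using Test, Random

# Hypothetical stand-in for the real train-and-predict call.
run_model(rng) = rand(rng, 5)

@testset "same seed, same result" begin
    for seed in 1:20
        a = run_model(MersenneTwister(seed))
        b = run_model(MersenneTwister(seed))
        @test a == b
    end
end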

@rikhuijzer
Member Author

Test failures are expected. Note that things were already broken; this PR adds a test that points it out.

@codecov-commenter

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #174 (1ef40ae) into dev (a3398bf) will increase coverage by 0.21%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #174      +/-   ##
==========================================
+ Coverage   89.18%   89.40%   +0.21%     
==========================================
  Files          10       10              
  Lines         999     1000       +1     
==========================================
+ Hits          891      894       +3     
+ Misses        108      106       -2     
Impacted Files               Coverage Δ
src/classification/main.jl   98.17% <100.00%> (+0.60%) ⬆️
src/measures.jl              97.64% <100.00%> (+0.01%) ⬆️
src/regression/main.jl       94.73% <100.00%> (+2.63%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ablaom
Member

ablaom commented Jun 15, 2022

Thanks for looking into this @rikhuijzer. Yes, I have also been noticing these inconsistent runs. I would think at least part of the solution is to replace all RNGs in testing with StableRNGs (adding StableRNGs to the test dependencies).

However, I also agree that we do not seem to be getting repeatability when specifying the same seed each time (in the same Julia version), and I'm not sure why. My first thought was that there was some kind of parallelism when comparing features to split at a node: if there is a draw when two features compete, the one chosen could depend on the unpredictable order in which the impurity improvements were calculated. But I don't see any such parallelism. (It would be a great idea to introduce, however!) There are some SIMD loops for minor things, but I would have thought those could not produce very large differences between runs.
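
For reference, swapping in a stable RNG is a one-line change per test (a minimal sketch; StableRNG(seed) is the package's constructor):

using StableRNGs

rng = StableRNG(123)   # same stream on every Julia version
x = rand(rng, 10)      # reproducible across 1.7, 1.8, ...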

Sorry, this is probably not much help.

@ablaom
Member

ablaom commented Jun 20, 2022

Okay, I think @yufongpeng has isolated the bug, corrected in this PR: #167. In nfoldCV, the rng is not being passed to the calls that train AdaStumpClassifier (but it is being passed to the other classifiers).
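
This bug class is easy to demonstrate in isolation (train_good and train_bad are hypothetical helpers, not the package's code):

using Random

train_good(rng) = rand(rng, 3)   # rng threaded through: reproducible
train_bad(rng)  = rand(3)        # rng silently ignored: global stream

train_good(MersenneTwister(1)) == train_good(MersenneTwister(1))  # true
train_bad(MersenneTwister(1))  == train_bad(MersenneTwister(1))   # false (with overwhelming probability)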

However, that PR is breaking, so I think we should release a patch to fix the bug before #167, and should probably include StableRNGs at the same time. @rikhuijzer Would you be willing and able to do that this week?

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

However, that PR is breaking, so I think we should release a patch to fix the bug before #167,

Great plan 👍

Would you be willing and able to do that this week?

Done.

should probably include StableRNGs at the same time

Can we do that in another PR? At the moment, there is no need for StableRNGs. The tests pass on all Julia versions, including 1.8-rc1 (which I am running locally).
EDIT: I'll add it.

And thanks @yufongpeng for spotting the source of the problem!

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

In 2f70ff5, the tests pass. It would indeed be better to add StableRNGs as you mentioned, @ablaom, but I am running into new problems there. For some reason

DecisionTree.nfoldCV_forest(labels, features, 3, 2, 10; rng=StableRNG(10))

produces different results on repeated runs, while rng=10 doesn't.

Comment on lines 225 to 227
# The Mersenne Twister (Julia's default) is not thread-safe.
_rng = copy(rng)
inds = rand(_rng, 1:t_samples, n_samples)
Member Author


This fixes another bug. This code path wasn't tested; the integer-seed path in the same function does work.
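
In isolation, why copying helps (illustrative only): a copy has independent state, so the parent RNG is left untouched and replays deterministically.

using Random

rng  = MersenneTwister(42)
_rng = copy(rng)            # independent state; the original rng is untouched
rand(_rng, 3)               # advances only the copy
rand(copy(rng), 3) == rand(copy(rng), 3)   # true: copies replay the same stream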

@rikhuijzer
Member Author

For some reason

DecisionTree.nfoldCV_forest(labels, features, 3, 2, 10; rng=StableRNG(10))

produces different results on repeated runs, while rng=10 doesn't.

This was caused by incorrect usage of the RNG in combination with @threads. I have now fixed it in src/classification/main.jl and src/regression/main.jl. I did not fix the code duplication because I fear it would add even more merge conflicts to the other open PRs.
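
A sketch of the fix pattern under @threads (illustrative, not the package's exact code): derive a deterministic per-iteration RNG instead of mutating one shared RNG from several threads.

using Random, Base.Threads

n = 8
results = Vector{Float64}(undef, n)
@threads for i in 1:n
    rng_i = MersenneTwister(1000 + i)   # deterministic per-iteration seed
    results[i] = rand(rng_i)            # same output regardless of schedule
end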

@rikhuijzer
Member Author

rikhuijzer commented Jun 20, 2022

It seems the tests now fail because many of the hardcoded comparison values depend on the seed.

@rikhuijzer
Member Author

Things are still not stable. Tests such as the ones in regression/random pass for me locally but fail in CI.

@ablaom
Member

ablaom commented Jun 20, 2022

@rikhuijzer Thanks for the extra work, and good catch on the further bug. For now I'm going to hope you get a chance to finish this off, as it seems like a work in progress.

Side note: Strange, I didn't realize forest training was multithreaded. I thought Threads only became available in 1.3, and yet the 1.0 tests are passing, which I don't understand.

@rikhuijzer
Member Author

Things are still not stable. Tests such as the ones in regression/random pass for me locally but fail in CI.

Fixed in ed4602b. I forgot to also pass StableRNG to the data generation steps.
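
Illustrative only: the point is that the synthetic data has to draw from the stable stream too, not just the training calls.

using StableRNGs

rng = StableRNG(1)
features = rand(rng, 100, 5)      # data generation uses the stable rng
labels   = rand(rng, 100) .< 0.5  # ... and so does everything downstream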

@rikhuijzer
Member Author

rikhuijzer commented Jun 21, 2022

In 5a10555, I've put the old numbers back. In earlier commits, I had relaxed the bounds of the tests because the unstable tests sometimes had bad luck and fell outside the bounds. That is now fixed by the stable RNGs.

@ablaom I think this PR is good to go. If you approve, I would prefer a squash merge on this PR for the reasons mentioned in https://discourse.julialang.org/t/downsides-of-using-squash-merging/68317. But you can decide 👍

@ablaom (Member) left a comment

A labour of love. Great PR, thanks.

@ablaom merged commit 969c637 into JuliaAI:dev on Jun 21, 2022
@rikhuijzer
Member Author

A labour of love. Great PR, thanks.

👍👍👍 Thanks
