Test multiple seeds #174
Conversation
Test failures are expected. Note that things were already broken before this PR; now there is a test to point it out.
Codecov Report

```diff
@@            Coverage Diff             @@
##              dev     #174      +/-   ##
==========================================
+ Coverage   89.18%   89.40%   +0.21%
==========================================
  Files          10       10
  Lines         999     1000       +1
==========================================
+ Hits          891      894       +3
+ Misses        108      106       -2
```
Thanks for looking into this @rikhuijzer. Yes, I have also been noticing these inconsistent runs. I would think at least part of the solution is to replace all RNGs in testing with StableRNGs (adding StableRNGs to the test dependencies). However, I also agree that we are not getting repeatability when specifying the same seed each time (in the same Julia version), and I'm not sure why this would be. My first thought was that there was some kind of parallelism when comparing features to split at a node: if there is a draw when two features compete, the one chosen could depend on the unpredictable order in which the impurity improvements were calculated. But I don't see any such parallelism. (It would be a great idea to introduce, however!) There are some SIMD loops for minor things, but I would have thought they would not give very large differences between runs. Sorry, this is probably not much help.
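For concreteness, here is a minimal sketch of the StableRNGs idea; `train` is a hypothetical stand-in for the package's seeded training calls, not its actual API. The point is that a `StableRNG` produces the same stream on every Julia version, so seeded tests stop depending on the Julia release.

```julia
using StableRNGs, Test

# Hypothetical stand-in for any computation that consumes an RNG;
# in the real tests the rng would be passed to the model-building call.
train(rng) = rand(rng, 5)

# Two runs from identically seeded StableRNGs must agree, and the
# stream itself is guaranteed stable across Julia versions.
@test train(StableRNG(3)) == train(StableRNG(3))
```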
Okay, I think @yufongpeng has isolated the bug, corrected in PR #167. However, that PR is breaking, so I think we should release a patch to fix the bug before #167, and should probably include StableRNGs at the same time. @rikhuijzer Would you be willing and able to do that this week?
Great plan 👍
Done.
And thanks @yufongpeng for spotting the source of the problem!
src/classification/main.jl
Outdated
```julia
# The Mersenne Twister (Julia's default) is not thread-safe.
_rng = copy(rng)  # give this task a private copy of the RNG state
inds = rand(_rng, 1:t_samples, n_samples)  # bootstrap-sample row indices
```
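Expanded into a self-contained sketch of the pattern (the enclosing function and loop are assumed for illustration; only the lines above come from the diff):

```julia
using Random

function bootstrap_indices(rng::Random.AbstractRNG, n_trees, t_samples, n_samples)
    inds = Vector{Vector{Int}}(undef, n_trees)
    Threads.@threads for i in 1:n_trees
        # Each task mutates only its own copy; sharing one
        # MersenneTwister across threads is a data race.
        # Reseeding with the tree index makes each tree's draws
        # independent of how the threads happen to be scheduled.
        _rng = Random.seed!(copy(rng), i)
        inds[i] = rand(_rng, 1:t_samples, n_samples)
    end
    return inds
end
```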
This fixes another bug. This path wasn't tested; the path in that function that operates on integers does work. The bug was caused by incorrect usage of the RNG in combination with …
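The sentence above is cut off in the thread; given the thread-safety comment in the diff, it presumably refers to the multithreaded loop. Under that assumption, a minimal demonstration of the failure mode, not the package's actual code:

```julia
using Random

rng = MersenneTwister(42)
vals = zeros(8)
Threads.@threads for i in 1:8
    # All tasks draw from one shared RNG: the mutation is a data
    # race, and which value each iteration sees depends on the
    # unpredictable interleaving of the threads.
    vals[i] = rand(rng)
end
# `vals` can differ from run to run despite the fixed seed.
```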
The tests now fail because many comparisons depend on the seed?
Things are still not stable. The tests such as the ones in …
@rikhuijzer Thanks for the extra work, and good catch with the further bug discovery. For now I'm going to hope you get a chance to finish this off, as it seems like a work-in-progress. Side note: strange, I didn't realize forest training was multithreaded. I thought …
Fixed in ed4602b. I forgot to also pass …
In 5a10555, I've put the old numbers back. In earlier commits, I had relaxed the bounds of the tests because the unstable tests sometimes had bad luck and fell outside the bounds. That is fixed now by the stable RNGs. @ablaom I think this PR is good to go. If you approve, I would prefer a squash merge on this PR, for the reasons mentioned in https://discourse.julialang.org/t/downsides-of-using-squash-merging/68317. But you can decide 👍
A labour of love. Great PR, thanks.
👍👍👍 Thanks
The tests were failing on Julia 1.8-rc1. The cause appears to be that results from the same seed are not always the same: for most seeds, the results agree only 5 out of 6 times or so. This wasn't tested in earlier Julia versions, and my theory is that Julia 1.8 was broken because …
I've confirmed this by adding a test over multiple seeds, which makes Julia 1.7 fail too. In other words, it was down to luck whether a Julia version's default RNG produced a difference between run 1 and run 2: by chance, the Julia 1.7 RNG did not, while the Julia 1.8 RNG did.
More specifically, the value of `tree.featval` is not the same between runs. I have no idea where that field is set, so I wasn't able to find the root cause yet.
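A sketch of the shape the multiple-seed test mentioned above can take; `fit` is a placeholder for the real seeded training call, not the package's API:

```julia
using Random, Test

# Placeholder for a seeded training run; the real test would build a
# model twice and compare, e.g., predictions or tree.featval values.
fit(seed) = rand(MersenneTwister(seed), 10)

@testset "same seed gives same result, for many seeds" begin
    for seed in 1:20
        @test fit(seed) == fit(seed)
    end
end
```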