fix random seed for releases #76
Comments
Based on my observations, a small subset of mostly the same nodes' tests keeps failing. This suggests that some individual algorithms are notably less stable than others. I might be wrong, but if not, I would conclude that ideally the test of a node should come with a fixed, rather than random, set of test data, at least unless it is absolutely necessary to have it randomized. I know such a solution will be hard to come up with, so a simpler version would be to identify the critical nodes by sampling test runs and then fix the test data for those. I would be happy to support this effort.
I think it would be valuable to collect some data to identify nodes that are specifically fragile. Such nodes should be revisited to make sure the failures are not caused by a subtle bug or design flaw, which might allow for a stability improvement. I doubt we have the capacity for that, but collecting the data would be a good step in any case.
I wouldn't permanently fix seeds (for single nodes) in regular development. Restarting Travis is no big deal, and randomized data stresses the tests much better. If the failures really need to be fixed, a better approach would be to sanitize the random data, e.g. eliminate values below or above a certain threshold. For this, a list of breaking seeds would be helpful, so it would be good to collect these as well.
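To illustrate the sanitizing idea, here is a minimal sketch, not existing project code; the helper name, thresholds, and retry limit are made up:

```python
import numpy as np

def sane_random_data(shape, rng=None, low=-1e3, high=1e3, min_std=1e-3):
    """Hypothetical helper: draw random test data, clip extreme values and
    redraw near-degenerate samples instead of fixing the seed."""
    rng = np.random.RandomState() if rng is None else rng
    for _ in range(100):                    # give up after a bounded number of tries
        data = rng.standard_normal(shape)
        data = np.clip(data, low, high)     # eliminate values beyond the thresholds
        if data.std() >= min_std:           # reject nearly constant draws
            return data
    raise RuntimeError("could not generate sane random data")
```

A node test could then call something like `sane_random_data((100, 5))` instead of drawing raw data from `numpy.random.randn`.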
I agree with @Stewori here. Every single time someone (or Travis) comes across a failing test, we should collect that seed and inspect the situation. We can then see whether the generated random data is indeed problematic, or whether the problem is a more subtle algorithmic issue. Anyway, @NiMlr: could you add a file BROKEN_SEEDS where we collect these seeds?
It occurred to me that in BROKEN_SEEDS we should also collect all relevant versions, i.e. of Python, NumPy, SciPy, scikits (what else?) of the failed run, along with the seed. There should be a header as a pseudo-comment that specifies the format.
If it's not CPython, the Python version should include the interpreter name, e.g. PyPy 3.5; a plain version number, e.g. 3.9, means CPython 3.9.
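As a rough sketch of what collecting a seed together with the environment could look like (the helper name, the header layout, and the exact file format are only assumptions; nothing has been decided yet):

```python
import platform
import sys

import numpy
import scipy

HEADER = "# interpreter python numpy scipy seed\n"

def record_broken_seed(seed, path="BROKEN_SEEDS"):
    """Append the failing seed together with the relevant versions."""
    interpreter = platform.python_implementation()        # e.g. CPython, PyPy
    py_version = "%d.%d.%d" % sys.version_info[:3]
    line = " ".join([interpreter, py_version, numpy.__version__,
                     scipy.__version__, str(seed)]) + "\n"
    with open(path, "a+") as f:
        f.seek(0)
        if not f.read():
            f.write(HEADER)                               # pseudo-comment header line
        f.write(line)
```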
In the PR, the first failure readily occurred: |
Done in 4d15670 |
I am following up on a private email communication where we discussed the small but notable probability of randomized tests failing due to pathological random data.
We concluded it is best to fix a seed for releases, so the tests of a release run reliably.
An alternative would have been to lower the testing precision, but that would only feign reliability: it would not eliminate the risk of failure, just make it less likely, which is not what we want. The algorithms cannot be improved to handle every kind of ill-conditioned data, because in the end they rely on numerical operations, and for every numerical operation there is some nasty data that provokes instability. Checking or sanitizing data would in general be unnecessarily expensive, given the low probability of failure. For frequent and annoying cases this may be an option though (cf. symeig_semidefinite).
A remaining question is how to proceed after the release. I propose we unfix the seed again, but track the results to learn more about the situation. I'd suggest we collect a list of sane seeds (around 1000 or so) and then randomly choose from these, while still fixing one sane seed for every release.
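A minimal sketch of that policy (the seed values, the `SANE_SEEDS` list, and the `release` flag are purely hypothetical placeholders):

```python
import numpy as np

# Hypothetical list of seeds known to produce well-behaved test data;
# it would grow over time as sane seeds are identified.
SANE_SEEDS = [20090713, 42, 1234567]           # placeholder values only

def choose_test_seed(release=False):
    """Fixed sane seed for a release, otherwise a random pick from the sane list."""
    if release:
        return SANE_SEEDS[0]                   # pin one known-good seed for the release
    return int(np.random.choice(SANE_SEEDS))   # still varied, but restricted to sane seeds

np.random.seed(choose_test_seed(release=False))
```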
For the upcoming release, let's just fix some good seed and unfix it afterwards.
Suggestions, opinions?