
fix random seed for releases #76

Open · Stewori opened this issue Apr 19, 2020 · 7 comments

Stewori (Member) commented Apr 19, 2020

I am following up on private email communication where we discussed the small but notable probability of random tests failing due to pathological random data.
We concluded it is best to fix a seed for releases, so that the tests of a release run reliably.

An alternative would have been to lower the testing precision, but that would only mask the unreliability rather than eliminate the risk of failure; it would just make failures less likely, which is not what we want. The algorithms cannot be improved to handle every pathological kind of data, because in the end this relies on numerical operations, and for every numerical operation there is some nasty data that provokes instability. Checking or sanitizing the data would in general be unnecessarily expensive, given the low probability of failure. For frequent and annoying cases this may be an option, though (cf. symeig_semidefinite).

A remaining question is how to proceed after the release. I propose we unfix the seed again, but track results to learn more about the situation. I'd suggest we collect a list of sane seeds (around 1000 or so) and then randomly choose from these, while still fixing a sane seed for every release.
For the upcoming release, let's just fix some good seed and unfix it afterwards.
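A minimal sketch of what that seed-selection policy could look like (the seed values and names here are hypothetical placeholders, not MDP's actual test configuration):

```python
import random

# Hypothetical list of vetted "sane" seeds; to be grown to ~1000 entries.
SANE_SEEDS = [12345, 67890, 424242]

# Hypothetical seed pinned for a given release so its tests run reproducibly.
RELEASE_SEED = 12345


def choose_test_seed(is_release_build):
    """Pick the fixed seed for release builds, a random sane seed otherwise."""
    if is_release_build:
        return RELEASE_SEED
    return random.choice(SANE_SEEDS)
```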

Suggestions, opinions?

NiMlr (Member) commented Apr 19, 2020

Based on my observations, we have a small subset of mostly the same nodes' tests failing. This suggests that single algorithms do have higher instability. I might be wrong, but if not, I would conclude that ideally the test of a node should come with a fixed, as opposed to random, set of test data, at least unless it is absolutely necessary to have it randomized. I know this solution will be hard to come by, so a simpler version would be to identify the critical nodes by sampling tests and then fix those. I would be happy to support this effort.
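One way this could look (a sketch only; the node names, seeds, and shapes are illustrative, not a proposal for MDP's actual test layout):

```python
import numpy as np

# Illustrative per-node seeds, chosen once and committed with the test suite.
NODE_TEST_SEEDS = {"PCANode": 1, "SFANode": 2}


def fixed_test_data(node_name, n_samples=500, n_features=10):
    """Return deterministic Gaussian test data for the given node's tests."""
    rng = np.random.RandomState(NODE_TEST_SEEDS[node_name])
    return rng.standard_normal((n_samples, n_features))
```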

Stewori (Member Author) commented Apr 19, 2020

I think it is valuable to collect some data to identify nodes that are specifically fragile. Actually, such nodes should be revisited to ensure it's not a subtle bug or design flaw that would allow for a stability improvement. I doubt we have the capacity for that, but collecting the data would be a good step in any case.
Still, I would go for fixed seeds for releases. I am convinced that every node that is based on numerical computation (so, in fact, every node) has some nonzero failure probability on random data. E.g., as soon as very large and very small values occur in the same data, numerical results can become arbitrarily inaccurate. This affects matrix decompositions (including eigenvalue computations) and matrix inversion, but also basics like addition (multiplication behaves better). And this is not the only form of numerical instability.
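A toy illustration of the mixed-magnitude problem with plain addition (not taken from any failing test):

```python
import numpy as np

x = np.float64(1e16)
small = np.float64(1.0)

# The small value is absorbed completely: the spacing of float64 values
# around 1e16 is about 2, so adding 1.0 has no effect at all.
print((x + small) - x)   # prints 0.0, not 1.0
print(np.spacing(x))     # ~2.0, the local rounding granularity
```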

Stewori (Member Author) commented Apr 19, 2020

I wouldn't permanently fix seeds (for single nodes) in regular development. Restarting Travis is no big deal, and randomized data stresses the tests much better. If the failures really need to be fixed, a better approach would be to sanitize the random data, e.g. by eliminating values below or above a certain threshold. For this, a list of breaking seeds would be helpful, so it would be good to collect these as well.
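A sketch of such a sanitization step (the thresholds are arbitrary placeholders; clipping is just one option, redrawing the offending values would be another):

```python
import numpy as np


def sanitize_test_data(data, low=-1e3, high=1e3):
    """Clip extreme values back into [low, high] to avoid mixing magnitudes."""
    return np.clip(data, low, high)
```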

otizonaizit (Member) commented Apr 20, 2020

I agree with @Stewori here. Every single time someone (or Travis) comes up with a failing test, we should collect that seed and inspect the situation. We can then see whether the generated random data is indeed problematic, or whether the problem is a more subtle algorithmic issue.

Anyway, given that with pytest --seed=N mdp/test/test_XXX.py -k test_YYY we can easily and very quickly replicate the failures, I think it's really crucial to collect the failing random seeds.

@NiMlr : could you add a file BROKEN_SEEDS where we start adding the broken seeds once they appear? In the same PR you could also remove the fixed seed from .travis.yml.

Stewori (Member Author) commented May 17, 2020

It occurred to me that in BROKEN_SEEDS we should collect all relevant versions of the failed run along with the seed, i.e. of Python, NumPy, SciPy, scikits (what else?). There should be a header as a pseudo-comment that specifies the format, e.g.

# Format:
# seed, Python version, NumPy version, SciPy version, scikits version, other info

If it's not CPython, the Python version should include the interpreter name, e.g. PyPy 3.5. A plain version number, e.g. 3.9, means CPython 3.9.
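Purely for illustration, an entry in that format might look like the following (the seed and version numbers are made up, not an observed failure):

1234567890, PyPy 3.5, 1.18, 1.4, 0.22, failed on Travis CI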

Stewori (Member Author) commented May 17, 2020

In the PR, the first failure occurred right away:
Random Seed: 1766368184

NiMlr (Member) commented May 17, 2020

Done in 4d15670
