fix random seed for releases #76
Comments
Based on my observations, a small subset of mostly the same nodes' tests keeps failing. This suggests that some individual algorithms are notably less stable than others. I might be wrong, but if not, I would conclude that ideally the test of a node should come with a fixed, rather than random, set of test data, at least unless it is absolutely necessary to have it randomized. I know such a solution will be hard to come up with, so a simpler version would be to identify the critical nodes by sampling test runs and then fix the test data for those. I would be happy to support this effort.
I think it would be valuable to collect some data to identify nodes that are specifically fragile. Such nodes should be revisited to make sure the failures are not caused by a subtle bug or design flaw, which might allow for a stability improvement. I doubt we have the capacity for that, but collecting the data would be a good step in any case.
I wouldn't permanently fix seeds (for single nodes) in regular development. Restarting Travis is no big deal, and randomized data stresses the tests much better. If the failures really need to be fixed, a better approach would be to sanitize the random data, e.g. eliminate values below or above a certain threshold. For this, a list of breaking seeds would be helpful, so it would be good to collect these as well.
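To illustrate the sanitizing idea, here is a minimal sketch, not existing project code; the helper name, thresholds, and retry limit are made up:

```python
import numpy as np

def sane_random_data(shape, rng=None, low=-1e3, high=1e3, min_std=1e-3):
    """Hypothetical helper: draw random test data, clip extreme values and
    redraw near-degenerate samples instead of fixing the seed."""
    rng = np.random.RandomState() if rng is None else rng
    for _ in range(100):                    # give up after a bounded number of tries
        data = rng.standard_normal(shape)
        data = np.clip(data, low, high)     # eliminate values beyond the thresholds
        if data.std() >= min_std:           # reject nearly constant draws
            return data
    raise RuntimeError("could not generate sane random data")
```

A node test could then call something like `sane_random_data((100, 5))` instead of drawing raw data from `numpy.random.randn`.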
I agree with @Stewori here. Every single time someone (or Travis) comes across a failing test, we should collect that seed and inspect the situation. We can then see whether the generated random data is indeed problematic, or whether the problem is a more subtle algorithmic issue. Anyway, @NiMlr: could you add a file BROKEN_SEEDS where we collect these seeds?
It occurred to me that in BROKEN_SEEDS we should also collect all relevant versions, i.e. of Python, NumPy, SciPy, scikits (what else?) of the failed run, along with the seed. There should be a header as a pseudo-comment that specifies the format.
If it's not CPython, the Python version should include the interpreter name, e.g. PyPy 3.5; a plain version number, e.g. 3.9, means CPython 3.9.
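As a rough sketch of what collecting a seed together with the environment could look like (the helper name, the header layout, and the exact file format are only assumptions; nothing has been decided yet):

```python
import platform
import sys

import numpy
import scipy

HEADER = "# interpreter python numpy scipy seed\n"

def record_broken_seed(seed, path="BROKEN_SEEDS"):
    """Append the failing seed together with the relevant versions."""
    interpreter = platform.python_implementation()        # e.g. CPython, PyPy
    py_version = "%d.%d.%d" % sys.version_info[:3]
    line = " ".join([interpreter, py_version, numpy.__version__,
                     scipy.__version__, str(seed)]) + "\n"
    with open(path, "a+") as f:
        f.seek(0)
        if not f.read():
            f.write(HEADER)                               # pseudo-comment header line
        f.write(line)
```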
In the PR, the first failure readily occurred: |
Done in 4d15670 |
I am following up on a private email communication where we discussed the small but notable probability of randomized tests failing due to pathological random data.
We concluded it is best to fix a seed for releases, so the tests of a release run reliably.
An alternative would have been to lower the testing precision, but that would only feign reliability: it would not eliminate the risk of failure, just make it less likely, which is not what we want. The algorithms cannot be improved to handle every kind of ill-conditioned data, because in the end they rely on numerical operations, and for every numerical operation there is some nasty data that provokes instability. Checking or sanitizing data would in general be unnecessarily expensive, given the low probability of failure. For frequent and annoying cases this may be an option though (cf. symeig_semidefinite).
A remaining question is how to proceed after the release. I propose we unfix the seed again, but track the results to learn more about the situation. I'd suggest we collect a list of sane seeds (around 1000 or so) and then randomly choose from these, while still fixing one sane seed for every release.
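A minimal sketch of that policy (the seed values, the `SANE_SEEDS` list, and the `release` flag are purely hypothetical placeholders):

```python
import numpy as np

# Hypothetical list of seeds known to produce well-behaved test data;
# it would grow over time as sane seeds are identified.
SANE_SEEDS = [20090713, 42, 1234567]           # placeholder values only

def choose_test_seed(release=False):
    """Fixed sane seed for a release, otherwise a random pick from the sane list."""
    if release:
        return SANE_SEEDS[0]                   # pin one known-good seed for the release
    return int(np.random.choice(SANE_SEEDS))   # still varied, but restricted to sane seeds

np.random.seed(choose_test_seed(release=False))
```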
For the upcoming release, let's just fix some good seed and unfix it afterwards.
Suggestions, opinions?