
Random seed utilities #496

Conversation

@ascillitoe (Contributor) commented May 3, 2022

This PR adds utility functions to get and set seeds for the Python, Numpy, TensorFlow and PyTorch random number generators.

Motivation

Some detectors such as MMDDrift and LSDDDrift involve operations using NumPy, TensorFlow and/or PyTorch random number generators. The resulting non-deterministic behavior is a problem for testing, as it prevents us performing checks such as:

random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

detector1 = MMDDrift(X_ref, ...)
preds1 = detector1.predict(x_h0)

detector2 = MMDDrift(X_ref, ...)
preds2 = detector2.predict(x_h0)

assert preds1['data']['p_val'] == preds2['data']['p_val']

Setting all random seeds (with np.random.seed(), tf.random.set_seed() etc.) at the start of the test is insufficient to provide determinism here, since the operations of the first detector still advance the RNG states, so the second detector sees a different RNG state. To ensure determinism we would instead have to set the seeds again before the init and prediction of the second detector, as sketched below.
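
For illustration, a minimal sketch of what the test above would need without the new utilities (reusing the hypothetical X_ref and x_h0 from the snippet above):

import random

import numpy as np
import tensorflow as tf


def reseed_all(seed):
    # Illustrative helper: re-seed every global RNG the detector might touch.
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)


reseed_all(42)
detector1 = MMDDrift(X_ref)
preds1 = detector1.predict(x_h0)

reseed_all(42)  # must re-seed: detector1 has advanced the global RNG states
detector2 = MMDDrift(X_ref)
preds2 = detector2.predict(x_h0)

assert preds1['data']['p_val'] == preds2['data']['p_val']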

Setting global random seeds in individual pytest functions is also undesirable in general, and leads to a difficult-to-maintain and unpredictable test suite.

Solution

This PR addresses the above through two changes:

  1. Introduces the use of pytest-randomly for testing. This package manages the random seeds used for each test. The seed is randomly changed for each test, which avoids us being tricked by tests passing due to a "friendly" hardcoded random state. Importantly, the random seed is reported for each test e.g.:

    Using --randomly-seed=1553614239

    and a test can then be repeated with a given random seed for debugging purposes:

    pytest --randomly-seed=1234 alibi_detect -k <name_of_failed_test>

  2. A new submodule alibi_detect.utils.random is introduced. This contains functionality to set and get RNG seeds (the relevant seeds for NumPy, TensorFlow, PyTorch etc. are all set at once). A context manager fixed_seed is also introduced to enable select lines of code to be run with a given seed in an isolated fashion, e.g.:

    with fixed_seed(42):
       detector1 = MMDDrift(X_ref, ...)
       preds1 = detector1.predict(x_h0)
    
    with fixed_seed(42):
       detector2 = MMDDrift(X_ref, ...)
       preds2 = detector2.predict(x_h0)
    
    assert preds1['data']['p_val'] == preds2['data']['p_val']

    The fixed_seed calls above set all the random seeds to 42 within their with blocks, but the seeds are reset to their original values upon exit of each block.
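
    For reference, a minimal sketch of roughly how such a context manager can be built on top of set_seed and get_seed (the actual implementation in this PR may differ in its details):

    from contextlib import contextmanager

    @contextmanager
    def fixed_seed(seed):
        # Sketch only: the PR's actual implementation may differ.
        orig_seed = get_seed()  # seed recorded by the most recent set_seed call
        set_seed(seed)
        try:
            yield
        finally:
            set_seed(orig_seed)  # restore the original seed on exiting the block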

The alibi_detect.utils.random submodule has been made public as it might be quite useful in example notebooks, experiments etc.

Note: If we decided we didn't want the new utilities public, we could hide them within the tests instead, however we would not be able to register set_seed as an entry_point for pytest, meaning pytest wouldn't use our custom seed setter internally (our custom setter also sets the TF and PyTorch random seeds, whereas the pytest-randomly reseed function only sets the Python and NumPy seeds). A sketch of the entry-point registration is given below.
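
For reference, a minimal sketch of how a seed setter can be registered with pytest-randomly via its pytest_randomly.random_seeder entry point group (shown here in setup.py form; the exact module path and packaging file used in this PR are assumptions):

from setuptools import setup

setup(
    name='alibi-detect',  # other metadata omitted
    entry_points={
        # pytest-randomly calls the registered function with the seed used for each test.
        'pytest_randomly.random_seeder': [
            'alibi-detect = alibi_detect.utils._random:set_seed',
        ],
    },
)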

What this PR does not address

Going forward we might want to avoid setting seeds altogether, by using generators such as tf.random.Generator and np.random.Generator. This would likely involve passing a seed kwarg to detectors (similar to the scikit-learn random_state). This PR doesn't help with this, and is instead a stop-gap to be used in the interim.
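
As a rough illustration of that future direction (names below are hypothetical, not the actual detector API), each detector could own an isolated generator seeded via a kwarg, so that global RNG state is never touched:

import numpy as np


class SomeDetector:
    """Hypothetical detector sketch, not the real MMDDrift implementation."""

    def __init__(self, x_ref, seed=None):
        self.x_ref = x_ref
        # An isolated new-style Generator; seed=None gives non-deterministic behaviour.
        self._rng = np.random.default_rng(seed)

    def _permutation(self, n):
        # All randomness is drawn from the detector's own Generator.
        return self._rng.permutation(n)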

Related issues

Related to #250.

@jklaise self-requested a review May 3, 2022 12:28
@ascillitoe requested a review from mauicv May 3, 2022 12:36
import torch


def set_seed(seed):
jklaise (Member) commented:

seed: int
Is None valid/desirable here (e.g. you can do np.random.seed(None) to seed with an unknown state)?

ascillitoe (Contributor, Author) replied:

Yes I wanted that, but annoyingly torch.random.manual_seed doesn't support None. I guess we could take None, and then generate a random seed ourselves using system time etc, but I didn't want to complicate matters too much for now, especially since random, np.random and tf.random all randomly generate seeds in different ways if None is passed.

ascillitoe (Contributor, Author) replied:

On second thoughts, in the event of None, we could call np.random.seed(None) first, and then get that seed to use to set the others?

jklaise (Member) replied:

I don't think it's worth complicating at this point; the utility of a "random random seed" is also not clear in a function specifically for setting a known seed.

"""
os.environ['PYTHONHASHSEED'] = str(seed)
random.seed(seed)
np.random.seed(seed)
jklaise (Member) commented:

Not sure if relevant, but what was the deal with using new-style numpy random number generators?

@ascillitoe (Contributor, Author) replied May 3, 2022:

AFAIK this shouldn't matter for the new-style numpy generators, since they don't appear to be affected by np.random.seed() at all. A very crude example:

import numpy as np

from alibi_detect.utils.random import set_seed

# set_seed does not re-seed an independent new-style Generator...
rng = np.random.default_rng()
set_seed(0)
a = rng.random()
set_seed(0)
b = rng.random()
assert a != b  # the two draws differ: set_seed(0) had no effect on rng

# ...whereas seeding the Generator directly is deterministic.
rng = np.random.default_rng(0)
a = rng.random()
rng = np.random.default_rng(0)
b = rng.random()
assert a == b  # identical draws

I've changed utils.random to private for now, since going forward we might want to consider more deeply how we can move from the legacy numpy RNG to generators (I assume this would involve passing a seed kwarg to detectors, à la scikit-learn).

    try:
        yield
    finally:
        set_seed(orig_seed)
jklaise (Member) commented:

I'm not sure resetting the seed (or the RNG state, if that was the intention) is that easy (see also above). E.g. what happens in the following scenario, where the different libraries' RNGs are out of sync?

set_seed(42) -> some numpy calls -> set_seed(other)

ascillitoe (Contributor, Author) replied:

I think the new global variable approach fixes this? I've extended the tests in tests_random.py to check that the seed returned by get_seed is unchanged after RNG calls.
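
(For reference, a minimal sketch of the global-variable idea; the module in this PR may differ in its details:)

import os
import random

import numpy as np
import tensorflow as tf
import torch

_seed = None  # module-level record of the seed passed to the last set_seed call


def set_seed(seed):
    # Sketch: set all supported RNG seeds at once and record the value.
    global _seed
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    torch.manual_seed(seed)
    _seed = seed


def get_seed():
    # Returning the recorded value means later RNG calls cannot change it.
    if _seed is None:
        raise RuntimeError('`set_seed` must be called before `get_seed` can be called.')
    return _seed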

@ascillitoe (Contributor, Author) commented May 3, 2022

Since a number of existing tests still use np/tf.random.seed(0) etc. in numerous places, I've hardcoded --randomly-seed=0 in setup.cfg for now, so that existing tests don't randomly fail (reworking these tests to avoid setting global seed values will need to be part of a wider project). This does mean that pytest-randomly is not actually doing anything! However, I've left it in the PR since the related code to set up set_seed as an entry point is already there, and it should pave the way for the future work. I can remove pytest-randomly for now though, if people think that would be better...

Comment on lines +51 to +52
    else:
        raise RuntimeError('`set_seed` must be called before `get_seed` can be called.')
jklaise (Member) commented:

What's the reason for not accepting None as valid? Think I'm a bit lost in the motivation of these utilities again...

ascillitoe (Contributor, Author) replied:

set_seed doesn't work with None since torch.manual_seed() doesn't accept None (#496 (comment)).

ascillitoe (Contributor, Author) replied:

WRT the motivation: the primary aim of this PR now is to provide the fixed_seed context manager for use in tests. We want to be able to run detectors deterministically, i.e.

with fixed_seed(42):
   detector1 = MMDDrift(X_ref, ...)
   preds1 = detector1.predict(x_h0)

But we want to avoid setting global seeds, since these carry over into the next pytest test (I've checked this with a simple demo). This is quite an issue, since in some tests we don't want the seeds fixed (see #250). If we move to using generators (e.g. np.random.Generator) in the future, and have MMDDrift(x_ref, ..., random_seed=...), we can probably get rid of this entire _random module.

The set_seed and get_seed methods might be helpful to a user in some way, but they're primarily for use with fixed_seed so I've kept the submodule private for now.
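
(As a concrete sketch of the test-isolation point, with hypothetical test names:)

def test_mmd_deterministic():
    # Seeds are only fixed inside the with blocks...
    with fixed_seed(42):
        detector1 = MMDDrift(X_ref)
        preds1 = detector1.predict(x_h0)
    with fixed_seed(42):
        detector2 = MMDDrift(X_ref)
        preds2 = detector2.predict(x_h0)
    assert preds1['data']['p_val'] == preds2['data']['p_val']
    # ...and the original seeds are restored on exit, so the global RNG state
    # does not leak into whichever test pytest collects next.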

Does the above help or am I barking up the wrong tree? 🙂

@jklaise (Member) left a comment:

See individual comments.

@ascillitoe changed the base branch from master to feature/config_driven_detectors May 10, 2022 10:21
@ascillitoe (Contributor, Author) commented May 10, 2022

Rebased onto feature/config_driven_detectors since this functionality is initially intended for use in tests within #497.

Comment on lines 5 to 6
types-requests~=2.25
types-toml~=0.10
jklaise (Member) commented:

Conscious that we would not keep up with new minor releases if going with this?

ascillitoe (Contributor, Author) replied:

To clarify, types-requests~=2.25 means >= 2.25, == 2.*, so I think we are OK, but have changed to types-requests>=2.25, <3.0 for consistency with elsewhere.

@jklaise (Member) left a comment:

LGTM

@ascillitoe merged commit a87153b into SeldonIO:feature/config_driven_detectors May 16, 2022
ascillitoe added a commit that referenced this pull request May 25, 2022
This PR implements a number of final fixes and refactorings for the config driven detectors:

1. The `preprocess_at_init` and `preprocess_at_pred` logic implemented in #381 and #458 has been reworked. This turned out to have a problem in how it dealt with `update_x_ref`, since regardless of `x_ref_preprocessed`, we still need to update reference data within `.predict()` when `update_x_ref` is set. All offline drift detectors have been reworked to use the old logic (but with `preprocess_x_ref` renamed `preprocess_at_init`), with the addition that `self.x_ref_preprocessed` is also checked internally. 
2. The previous `get_config` methods involved a lot of boilerplate to try to recover the original args/kwargs from detector attributes. The new approach calls a generic `_set_config()` within `__init__`, and then `self.config` is returned by `get_config`. This should significantly reduce the workload to add save/load to new detectors. To avoid memory overheads, large artefacts such as `x_ref` are not set at `__init__`, and instead are added within `get_config`.
3. Owing to the ease of implementation with the new `get_config` approach, save/load has been added for the model uncertainty and online detectors!
4. Kernels and `preprocess_fn`'s were previously resolved in `_load_detector_config`, which wasn't consistent with how other artefacts were resolved (it also caused added extra challenges). These are now resolved in `resolve_config` instead. Following this the `KernelConfigResolved` and `PreprocessConfigResolved` pydantic models have been removed (they could be added back but it would complicate `resolve_config`).
5. Fixing determinism in #496 has allowed us to compare original and loaded detector predictions in `test_saving.py`. This uncovered bugs with how kernels were saved and loaded. These have been fixed.
6. The readthedocs.yml has been fully updated to the V2 schema so that we can use Python 3.9 for building the docs. This is required as the `class NDArray(Generic[T], np.ndarray[Any, T])` in `utils._typing` causes an error with `autodoc` on older Python versions.
@ascillitoe mentioned this pull request Nov 17, 2022