Features/non parametric drift #359
Conversation
WIP WIP WIP
Looking good - we're almost there! As well as the small comments I've made, I think it could be worth ensuring the tests cover both the `n_features = 1` and `n_features > 1` cases, and that in the former we can pass reference data and test data as vectors and floats respectively. It seems like this is something that could easily be accidentally broken in the future.
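A sketch of the kind of shape handling being requested, so that `n_features = 1` inputs can arrive as vectors or floats (an illustrative helper only, not the code under review):

```python
import numpy as np

def as_2d(x):
    """Coerce input to shape [n_samples, n_features] so that the
    n_features == 1 case accepts plain vectors, and a single float is
    treated as one univariate instance. Illustrative sketch only."""
    x = np.asarray(x)
    if x.ndim == 0:   # single float -> one instance, one feature
        return x.reshape(1, 1)
    if x.ndim == 1:   # vector -> n instances of one feature
        return x.reshape(-1, 1)
    return x          # already [n_samples, n_features]
```

A test covering both cases then only needs to assert that the downstream detector sees a consistent 2-D shape.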
@ojcobb thanks for the suggestion re: tests, that uncovered a slight issue with the existing implementation. I've included the same capability for offline, but only test it there.
@ojcobb, I added a citation to the online detector docs pages. Let me know if there's any opposition! @jklaise @arnaudvl slightly bad form, but I have snuck a sphinx extension into this PR.
Sounds good to me. We might need a follow-up PR to go over existing manual citations and consolidate into this new format.
@jklaise good idea! I've added a note to the "Additional issues" section in the opening comment, which I'll consolidate down into new issues post-merge. |
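For reference, if the extension in question is something like sphinxcontrib-bibtex (an assumption; the thread doesn't name the extension here), the consolidation could look like this in `conf.py`:

```python
# conf.py (sketch): enabling a bibtex-based citation extension.
# NOTE: sphinxcontrib-bibtex and refs.bib are assumptions -- the PR
# thread does not name the extension or the .bib file.
extensions = [
    "sphinx.ext.mathjax",
    "sphinxcontrib.bibtex",  # renders citations from a .bib database
]
bibtex_bibfiles = ["refs.bib"]  # consolidated citation database
```

Docs pages would then reference entries in the `.bib` file via the `:cite:` role and a `bibliography` directive instead of hand-written citation text.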
Two tiny comments, otherwise LGTM!
What is this PR
This draft PR implements four non-parametric drift detectors, intended for use as supervised drift detectors (suitable for detecting concept drift and/or other types of malicious drift). `CVMDrift` and `CVMDriftOnline` apply the Cramer-von Mises statistical test to continuous data, whilst `FETDrift` and `FETDriftOnline` apply Fisher's exact test to binary data. Both detectors use what are essentially univariate tests, but utilize a multivariate correction to allow for multivariate data.

Experiments/Examples
Simple example notebooks available here:
Benchmarking script testing ART and ADD:
https://gist.github.com/ascillitoe/0b051e3905bbf7ada9d8a1c421d7420b.
Results are below, averaged over five initializations (+- is 1 std), with reference data of length N=500, `n_bootstraps=50000`, and varying number of dimensions d. [Result figures elided: `CVMDriftOnline` (d=1), `CVMDriftOnline` (d=5), `FETDriftOnline` (d=1), `FETDriftOnline` (d=5).]

@ojcobb could do with your thoughts on whether the above look OK. - Update: after numerous fixes (starting at ab4e4f8), ART is now consistently ~170 (ERT=150) for all d and both alternative hypotheses. We expect ART to be slightly over ERT for the FET detector, so the performance now looks acceptable.
Outstanding decisions
- All test stats are stored in `self.test_stats` (until `reset()` is called). For the `FETDriftOnline` detector this means storing an array of size `(t-t0, len(self.window_sizes), self.n_features)`, i.e. `n_features` times larger than for the other online detectors. Is this acceptable? If not, we could store only the test stats from the previous time step (needed for exponential smoothing).
- `FETDriftOnline` is rather slow in the multivariate case, as the test needs to be done for each window and feature. We could have an option for a truly "non-parametric" approach, where we just simulate `n_window` times with the Bernoulli parameter fixed at 0.5. Docs would need to warn that this is faster but less accurate. @ojcobb what do you think?

TODO
The code is annotated with a number of outstanding TODO blocks. In addition, the following more general TODOs remain:
- Use `cramervonmises_2samp` and `fisher_exact` from `scipy.stats`. `cramervonmises_2samp` was only added in `scipy==1.7.0`, but `scipy==1.6.0` dropped support for Python 3.6. Is this another reason to drop 3.6 support? - CVM test case skipped for Python 3.6 tests. A `UserWarning` is added if `CVMDrift` is instantiated with an old scipy version.
- Check wall time and memory usage for large reference data length `N`, number of streams `n_bootstraps`, and window sizes `window_size`. At present, wall times are acceptable, but memory requirements are quite high. Decide on `device`. @ojcobb note: re high memory usage, need to check correct usage of binary vs float32 internally. - binary vs float32 was not an issue; the high memory usage was with the binary array. This is now mitigated by processing streams in batches.
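The batching mitigation can be illustrated in a few lines (the function name and the per-stream statistic are placeholders; the real code simulates Bernoulli streams during threshold configuration):

```python
import numpy as np

def simulate_stats_in_batches(n_bootstraps, stream_len, batch_size=1000, seed=0):
    """Simulate Bernoulli(0.5) streams in batches so peak memory is
    O(batch_size * stream_len) rather than O(n_bootstraps * stream_len).
    The per-stream statistic here (the stream mean) is a placeholder
    for the real test statistic."""
    rng = np.random.default_rng(seed)
    stats = []
    for start in range(0, n_bootstraps, batch_size):
        n = min(batch_size, n_bootstraps - start)
        # uint8 keeps the binary streams compact in memory
        streams = rng.integers(0, 2, size=(n, stream_len), dtype=np.uint8)
        stats.append(streams.mean(axis=1))
    return np.concatenate(stats)
```

Only one batch of streams is ever held in memory, at the cost of a Python-level loop over batches.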
- The `FETDriftOnline` detector needs further checks. The ADD values get very high when `p_h1` < `p_h0` in `examples/cd_online_supervised.ipynb`, but not when `p_h1` > `p_h0`. To start, it is worth checking the thresholds vs `t` etc. @ojcobb note: this might just be because of the one-sided nature of the custom online detector implementation. Could do with implementing an option for a two-sided alternative hypothesis, for consistency with the offline detector. - `'greater'` and `'less'` single-sided tests now implemented. The two-sided test has been removed from the offline detector.
- Ensure the changes to `BaseDriftOnline` do not break `MMDDriftOnline` or `LSDDDriftOnline` in any potential use case (all tests pass, and the relevant example notebooks have been tested, but further checks would be prudent).
- Add a progress bar to `CVMDriftOnline`. This is challenging since `tqdm` cannot be used inside `numba`-decorated methods running in `nopython` mode (i.e. the `_ids_to_stats` method). We could use a basic `print` statement called from only the first thread, but this would be ugly since `flush=True` etc. can't be used in the `nopython` version of `print`. Using the `numba` `objmode` context (https://numba.pydata.org/numba-doc/latest/user/withobjmode.html) might be an option, but will likely add a performance penalty. There is also numba-progress, but this is very new and currently doesn't have support for `guvectorize` functions. - Moved to "issues" in the below comment.
- `np.quantile` is used in both online detectors. Does this handle tails OK or do we need something else? - A simple utility function has been added in `utils/misc.py` to replicate the behaviour of the tensorflow/pytorch-specific `quantile` functions.
- The `numba` decorators seem to break the `sphinx` docs builds. I haven't found a fix yet, but moving to a newer version of `sphinx`, i.e. `>=4.0`, seems to fix this. Not ideal, but the easiest option would be simply to wait until we move to the new sphinx version next month... - Moving to `sphinx v4.2.0` now, which fixes this.

Additional issues to consider after initial implementation
(These will be added as issues once the PR is merged)
- For `FETDriftOnline`, a mechanism to determine when the thresholds have stopped decreasing would be useful.
- For `utils.misc.quantile` (and the tensorflow/pytorch specific versions), we could consider whether we want to stick with `type=7` as the default. Hyndman and Fan recommend `type=8` (see https://robjhyndman.com/hyndsight/sample-quantiles-20-years-later/).
- `CVMDriftOnline`.
- Add an `alternative='two-sided'` option for the `FET` detectors. We use `hypergeom.cdf` directly to perform the Fisher exact tests, since this can vectorise over multiple data streams (important for `FETDriftOnline`). The two-sided test is more challenging to do in a vectorised fashion, due to the need to perform a binary search to find where to begin the halves (see the scipy implementation). - This was implemented as a vectorized numba implementation, but performance targeting ERT was poor. Removed in b3a993b.
- Refactor `get_thresholds` in all online detectors so that `self.t+1` is given as an argument in the conditional case, and `self.t` in the unconditional case (see Features/non parametric drift #359 (comment)).
- Consolidate existing manual citations into `.bib` format.
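The hypergeometric formulation of the one-sided Fisher exact test mentioned above can be sketched as follows (a minimal illustration of the vectorisation idea, not the PR's implementation; the function name is made up):

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

def fet_pvals_greater(a, b, c, d):
    """One-sided (alternative='greater') Fisher exact test p-values
    for 2x2 tables [[a, b], [c, d]], vectorised over arrays of tables
    via the hypergeometric survival function."""
    a, b, c, d = (np.asarray(v) for v in (a, b, c, d))
    # P(X >= a) for X ~ Hypergeom(M=a+b+c+d, n=a+b, N=a+c)
    return hypergeom.sf(a - 1, a + b + c + d, a + b, a + c)
```

For a single table this agrees with `scipy.stats.fisher_exact(..., alternative='greater')`, but unlike `fisher_exact` it broadcasts over many tables (i.e. many streams and features) at once, which is what makes it attractive for `FETDriftOnline`.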