Features/non parametric drift #359

Merged · 106 commits · Dec 1, 2021

Conversation

@ascillitoe (Contributor) commented Oct 7, 2021

What is this PR

This draft PR implements four non-parametric drift detectors, intended for use as supervised drift detectors (suitable for detecting concept drift and/or other types of malicious drift). CVMDrift and CVMDriftOnline apply the Cramer-von Mises statistical test to continuous data, whilst FETDrift and FETDriftOnline apply Fisher's exact test to binary data. All four detectors are essentially univariate tests, but a multivariate correction is applied so that they can handle multivariate data.
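For a quick feel of the intended usage, a minimal sketch of the offline detectors, assuming they follow the existing alibi_detect.cd detector API (the argument names and the returned dictionary structure here are assumptions, not the final interface):

```python
import numpy as np
from alibi_detect.cd import CVMDrift, FETDrift

# Continuous data: Cramer-von Mises test per feature.
x_ref = np.random.normal(size=(500, 3)).astype(np.float32)  # reference data
cd_cvm = CVMDrift(x_ref, p_val=0.05)
preds = cd_cvm.predict(np.random.normal(loc=0.5, size=(500, 3)).astype(np.float32))
print(preds['data']['is_drift'])  # 1 if drift is flagged after the multivariate correction

# Binary data: Fisher's exact test per feature.
x_ref_bin = np.random.binomial(1, 0.5, size=(500, 2))
cd_fet = FETDrift(x_ref_bin, p_val=0.05, alternative='greater')
preds = cd_fet.predict(np.random.binomial(1, 0.7, size=(500, 2)))
print(preds['data']['is_drift'])
```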

Experiments/Examples

Simple example notebooks available here:

Benchmarking script testing ART and ADD:
https://gist.github.com/ascillitoe/0b051e3905bbf7ada9d8a1c421d7420b.

Results are below, averaged over five initializations (± indicates 1 std), with reference data of length N=500, n_bootstraps=50000, and a varying number of dimensions d.

| Detector | ERT | ART | ADD |
|---|---|---|---|
| CVMDriftOnline, d=1 | 150 | 160.7 ± 19.3 | 11.7 ± 0.4 |
| CVMDriftOnline, d=5 | 150 | 147.3 ± 13.9 | 3.3 ± 0.3 |
| FETDriftOnline, d=1 | 150 | 208.0 ± 30.3 | 19.4 ± 1.2 |
| FETDriftOnline, d=5 | 150 | 160.4 ± 47.7 | 8.8 ± 2.0 |

@ojcobb could do with your thoughts on whether the above looks OK. - Update: after numerous fixes (starting at ab4e4f8), ART is now consistently ~170 (ERT=150) for all d and both alternative hypotheses. We expect ART to sit slightly above ERT for the FET detector, so the performance now looks acceptable.
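For context, a rough sketch of how ART and ADD figures like those above can be estimated. This is not the linked gist, just an illustration of the idea: ART is the average time to (false) detection on no-drift streams, while ADD is the average time to detection on streams where drift is present from t=0. The predict/reset API and argument names are assumptions based on the existing online detectors:

```python
import numpy as np
from alibi_detect.cd import CVMDriftOnline

def time_to_detection(cd, stream, max_t=2000):
    """Feed a stream to the detector one instance at a time and return the detection time."""
    cd.reset()
    for t, x_t in enumerate(stream[:max_t]):
        if cd.predict(x_t)['data']['is_drift']:
            return t + 1
    return max_t  # no detection within max_t

x_ref = np.random.normal(size=(500, 1)).astype(np.float32)
cd = CVMDriftOnline(x_ref, ert=150, window_sizes=[20, 40])

# ART: average run-time to false detection on streams drawn from the reference distribution.
art = np.mean([time_to_detection(cd, np.random.normal(size=(2000, 1)).astype(np.float32))
               for _ in range(20)])

# ADD: average detection delay on streams drawn from a drifted distribution.
add = np.mean([time_to_detection(cd, np.random.normal(loc=1.0, size=(2000, 1)).astype(np.float32))
               for _ in range(20)])
print(f'ART: {art:.1f}, ADD: {add:.1f}')
```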

Outstanding decisions

  1. As with the other online detectors, the test stats are currently accumulated in self.test_stats (until reset() is called). For the FETDriftOnline detector this means storing an array of size (t-t0, len(self.window_sizes), self.n_features), i.e. n_features times larger than for the other online detectors. Is this acceptable? If not, we could store only the test stats from the previous time step (needed for exponential smoothing).
  2. In the docs "Algorithm Overview", "Images" is currently ticked for the FET and CVM detectors. This is technically accurate, but using them for images is perhaps a stretch from their intended use case. Shall we leave it as is, or remove the tick to avoid confusion?
  3. In "Drift->Detection Background->Online drift detection", the new online detectors are hinted at as following the same strategy as the existing detectors (ref to Cobb et al.), since they also involve threshold configuration at instantiation. However, there are differences. Do we need to clarify this?
  4. Configuring thresholds for FETDriftOnline is rather slow in the multivariate case, as it needs to be done for each window and feature. We could offer a truly "non-parametric" option, where we just simulate n_window times with the Bernoulli parameter fixed at 0.5 (a rough sketch of this idea follows the list). The docs would need to warn that this is faster but less accurate. @ojcobb what do you think?
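A rough sketch of the simulation idea in point 4, assuming thresholds are set as per-timestep quantiles over simulated no-drift Bernoulli(0.5) streams. The window mean is used here as a stand-in for the actual FET-based statistic, and the real implementation would additionally discard streams that have already exceeded earlier thresholds:

```python
import numpy as np

def simulate_thresholds(ert, window_size, n_bootstraps=10000, stream_length=500, seed=0):
    """Crude sketch: simulate Bernoulli(0.5) streams and take per-timestep quantiles
    of a windowed statistic as detection thresholds."""
    rng = np.random.default_rng(seed)
    streams = rng.binomial(1, 0.5, size=(n_bootstraps, stream_length))
    # Rolling window mean for each bootstrap stream (valid from t >= window_size).
    csum = np.cumsum(streams, axis=1)
    stats = (csum[:, window_size:] - csum[:, :-window_size]) / window_size
    # Thresholds chosen so that, per time step, a fraction 1/ert of no-drift streams exceed them.
    return np.quantile(stats, 1 - 1 / ert, axis=0)

thresholds = simulate_thresholds(ert=150, window_size=40)
```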

TODO

The code is annotated with a number of outstanding TODO blocks. In addition, the following more general TODOs remain:

  • The offline detectors are built on cramervonmises_2samp and fisher_exact from scipy.stats. cramervonmises_2samp was only added in scipy==1.7.0, and scipy==1.6.0 dropped support for python 3.6. Is this another reason to drop 3.6 support? - The CVM test case is skipped for python 3.6 tests. A UserWarning is added if CVMDrift is instantiated with an old scipy version.
  • Benchmarking. In particular, wall time and memory usage for different reference data sizes N, numbers of streams n_bootstraps, and window sizes window_size. At present, wall times are acceptable, but memory requirements are quite high. Decide on device.
    @ojcobb note: Re high memory usage, need to check correct usage of binary vs float32 internally. - binary vs float32 was not an issue; the high memory usage occurred with the binary array too. This is now mitigated by processing streams in batches.
  • All the detectors still need checking in more detail. In particular, the FETDriftOnline detector needs further checks. The ADD values get very high when p_h1 < p_h0 in examples/cd_online_supervised.ipynb, but not when p_h1 > p_h0. To start, it is worth checking the thresholds vs t etc.
    @ojcobb note: this might just be because of the one-sided nature of the custom online detector implementation. Could do with implementing an option for a two-sided alternative hypothesis for consistency with the offline detector. - 'greater' and 'less' single-sided tests are now implemented; the two-sided test has been removed from the offline detector.
  • Further checks to ensure the changes made to BaseDriftOnline do not break MMDDriftOnline or LSDDDriftOnline in any potential use case (all tests pass, and relevant example notebooks have been tested, but further checks would be prudent).
  • Methods docs pages.
  • Tests for all four detectors. - Very basic tests added.
  • Progress bar for CVMDriftOnline. This is challenging since tqdm cannot be used inside numba-decorated methods running in nopython mode (i.e. the _ids_to_stats method). We could use a basic print statement called from only the first thread, but this would be ugly since flush=True etc. can't be used in the nopython version of print. Using the numba objmode context (https://numba.pydata.org/numba-doc/latest/user/withobjmode.html) might be an option, but would likely add a performance penalty. There is also numba-progress, but it is very new and currently doesn't support guvectorize functions. - Moved to "issues" in the comment below.
  • Check the behaviour of np.quantile used in both online detectors. Does this handle tails OK or do we need something else? - A simple utility function has been added in utils/misc.py to replicate the behaviour of the tensorflow/pytorch specific quantile functions (see the sketch after this list).
  • The numba decorators seem to break the sphinx docs builds. The only fix I've found is moving to a newer version of sphinx, i.e. >=4.0. Not ideal, but the easiest option would be simply to wait until we move to the new sphinx version next month... - Moving to sphinx v4.2.0 now, which fixes this.
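A sketch of what such a quantile helper could look like, supporting the Hyndman & Fan type 7 (the numpy/torch default) and type 8 interpolation schemes mentioned in the "Additional issues" list below. This is illustrative only and not necessarily the function added to utils/misc.py:

```python
import numpy as np

def quantile(sample: np.ndarray, p: float, interpolation_type: int = 7) -> float:
    """Empirical quantile of a 1D sample using Hyndman & Fan type 7
    (h = (n-1)p + 1) or type 8 (h = (n + 1/3)p + 1/3) interpolation."""
    n = len(sample)
    sorted_sample = np.sort(sample)
    if interpolation_type == 7:
        h = (n - 1) * p + 1
    elif interpolation_type == 8:
        h = (n + 1 / 3) * p + 1 / 3
    else:
        raise ValueError('Only types 7 and 8 are supported in this sketch.')
    h = min(max(h, 1.0), float(n))   # clamp to the sample range
    lo = int(np.floor(h)) - 1        # convert to 0-based index
    hi = min(lo + 1, n - 1)
    return float(sorted_sample[lo] + (h - np.floor(h)) * (sorted_sample[hi] - sorted_sample[lo]))

# Type 7 matches np.quantile's default behaviour.
x = np.random.normal(size=1000)
assert np.isclose(quantile(x, 0.95, interpolation_type=7), np.quantile(x, 0.95))
```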

Additional issues to consider after initial implementation

(These will be added as issues once the PR is merged)

  • For FETDriftOnline a mechanism to determine when the thresholds have stopped decreasing would be useful.
  • In utils.misc.quantile (and the tensorflow/pytorch specific versions) we could consider whether we want to stick with type=7 as the default. Hyndman and Fan recommend type=8 (see https://robjhyndman.com/hyndsight/sample-quantiles-20-years-later/).
  • Addition of progress bar for CVMDriftOnline.
  • All online detectors could do with more detailed CI tests, i.e. checking ART and ADD on synthetic data. Need to think more about this, as it might introduce non-deterministic tests.
  • Implement an alternative='two-sided' option for the FET detectors. We use hypergeom.cdf directly to perform the Fisher exact tests, since this can be vectorised over multiple data streams (important for FETDriftOnline; see the sketch after this list). The two-sided test is more challenging to do in a vectorised fashion due to the need to perform a binary search to find where to begin the two halves (see the scipy implementation). - This was implemented as a vectorised numba implementation, but performance targeting ERT was poor. Removed in b3a993b.
  • Examples for supervised drift detection, and simple example in drift overview page.
  • Refactor get_thresholds in all online detectors so that self.t+1 is given as an argument in the conditional case, and self.t in the unconditional case (see Features/non parametric drift #359 (comment)).
  • Open an issue to update all citations to the new .bib format.
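For reference, a sketch of the vectorised one-sided Fisher exact computation via the hypergeometric distribution described in the list above. The table orientation and helper name are illustrative rather than the in-tree implementation:

```python
import numpy as np
from scipy.stats import hypergeom, fisher_exact

def fisher_exact_1sided(a, b, c, d, alternative='greater'):
    """One-sided Fisher exact p-values for 2x2 tables [[a, b], [c, d]],
    computed via the hypergeometric distribution so that a, b, c, d can be
    arrays (one table per data stream) and the computation is vectorised."""
    a, b, c, d = (np.asarray(x) for x in (a, b, c, d))
    M = a + b + c + d   # total count
    n = a + b           # first row total
    N = a + c           # first column total
    if alternative == 'greater':
        return hypergeom.sf(a - 1, M, n, N)
    elif alternative == 'less':
        return hypergeom.cdf(a, M, n, N)
    raise ValueError("alternative must be 'greater' or 'less'")

# Matches scipy.stats.fisher_exact for a single table:
table = np.array([[8, 2], [1, 5]])
p_vec = fisher_exact_1sided(*table.ravel(), alternative='greater')
_, p_ref = fisher_exact(table, alternative='greater')
assert np.isclose(p_vec, p_ref)
```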

@ojcobb (Contributor) commented Nov 19, 2021

Looking good - we're almost there! As well as the small comments I've made, I think it would be worth ensuring the tests cover both the n_features = 1 and n_features > 1 cases, and that in the former we can pass reference data and test data as vectors and floats respectively. This seems like something that could easily be broken accidentally in the future.
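Something along these lines, for example (a hypothetical sketch rather than the tests actually added; detector arguments and accepted input types are assumptions):

```python
import numpy as np
import pytest
from alibi_detect.cd import CVMDriftOnline

@pytest.mark.parametrize('n_features', [1, 3])
def test_cvm_online_n_features(n_features):
    x_ref = np.random.normal(size=(200, n_features)).astype(np.float32)
    cd = CVMDriftOnline(x_ref, ert=50, window_sizes=[10], n_bootstraps=1000)
    # Univariate: a single instance may arrive as a bare float;
    # multivariate: as a 1D vector of length n_features.
    x_t = 0.0 if n_features == 1 else np.zeros(n_features, dtype=np.float32)
    pred = cd.predict(x_t)
    assert pred['data']['is_drift'] in (0, 1)
```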

@ascillitoe (Contributor, Author) commented Nov 19, 2021

@ojcobb thanks for the suggestion re: tests; it uncovered a slight issue with the existing _check_x method (now fixed). I've added tests for n_features=1 and 3 for the online cases.

I've included the same capability for the offline detectors, but only test n_features=2 for now, since the offline detectors can currently only accept 1D data of the form (N,1) and (1,1). I think we should leave this until a later PR, as it will involve modifying BaseUnivariateDrift and all the detectors that inherit from it.

@ascillitoe (Contributor, Author) commented:

@ojcobb, I added a citation to the online detector docs pages. Let me know if there's any opposition!

@jklaise @arnaudvl slightly bad form, but I have snuck a sphinx extension, sphinxcontrib-bibtex, into this PR. It allows us to cite references from a central bib file, and a new Bibliography page is then populated with nicely formatted references. I've checked the html and latex docs builds and it looks nice to me, but if you think this needs more thought I can factor it out into a later PR.
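For reference, enabling the extension roughly amounts to the following in conf.py (the bib file name here is illustrative):

```python
# docs/source/conf.py (sketch; the bib file name is illustrative)
extensions = [
    # ... existing extensions ...
    'sphinxcontrib.bibtex',
]
bibtex_bibfiles = ['refs.bib']  # central bibliography file
# References are then cited in the docs with the :cite: role and rendered on a
# page containing a `.. bibliography::` directive.
```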

@jklaise (Contributor) commented Nov 22, 2021

Sounds good to me. We might need a follow-up PR to go over existing manual citations and consolidate into this new format.

@ascillitoe (Contributor, Author) commented Nov 22, 2021

@jklaise good idea! I've added a note to the "Additional issues" section in the opening comment, which I'll consolidate into new issues post-merge.

@ojcobb (Contributor) left a review comment:

Two tiny comments, otherwise LGTM!

@ascillitoe ascillitoe merged commit e910b40 into SeldonIO:master Dec 1, 2021