Features/non parametric drift #359
Conversation
WIP WIP WIP
Looking good - we're almost there! As well as the small comments I've made, I think it could be worth ensuring the tests cover both the `n_features = 1` and `n_features > 1` cases, and that in the former we can pass reference data and test data as vectors and floats respectively. It seems like this is something that could easily be accidentally broken in the future.
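A sketch of the kind of shape handling being requested, so that `n_features = 1` inputs can arrive as vectors or floats (an illustrative helper only, not the code under review):

```python
import numpy as np

def as_2d(x):
    """Coerce input to shape [n_samples, n_features] so that the
    n_features == 1 case accepts plain vectors, and a single float is
    treated as one univariate instance. Illustrative sketch only."""
    x = np.asarray(x)
    if x.ndim == 0:   # single float -> one instance, one feature
        return x.reshape(1, 1)
    if x.ndim == 1:   # vector -> n instances of one feature
        return x.reshape(-1, 1)
    return x          # already [n_samples, n_features]
```

A test covering both cases then only needs to assert that the downstream detector sees a consistent 2-D shape.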
@ojcobb thanks for the suggestion re: tests, that uncovered a slight issue with the existing implementation. I've included the same capability for offline, but only test it there.
@ojcobb, I added a citation to the online detector docs pages. Let me know if there's any opposition! @jklaise @arnaudvl slightly bad form, but I have snuck a sphinx extension into this PR.
Sounds good to me. We might need a follow-up PR to go over existing manual citations and consolidate into this new format.
@jklaise good idea! I've added a note to the "Additional issues" section in the opening comment, which I'll consolidate down into new issues post-merge. |
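For reference, if the extension in question is something like sphinxcontrib-bibtex (an assumption; the thread doesn't name the extension here), the consolidation could look like this in `conf.py`:

```python
# conf.py (sketch): enabling a bibtex-based citation extension.
# NOTE: sphinxcontrib-bibtex and refs.bib are assumptions -- the PR
# thread does not name the extension or the .bib file.
extensions = [
    "sphinx.ext.mathjax",
    "sphinxcontrib.bibtex",  # renders citations from a .bib database
]
bibtex_bibfiles = ["refs.bib"]  # consolidated citation database
```

Docs pages would then reference entries in the `.bib` file via the `:cite:` role and a `bibliography` directive instead of hand-written citation text.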
Two tiny comments, otherwise LGTM!
What is this PR
This draft PR implements four non-parametric drift detectors, intended for use as supervised drift detectors (suitable for detecting concept drift and/or other types of malicious drift). `CVMDrift` and `CVMDriftOnline` apply the Cramer-von Mises statistical test to continuous data, whilst `FETDrift` and `FETDriftOnline` apply Fisher's exact test to binary data. Both detectors use what are essentially univariate tests, but utilize a multivariate correction to allow for multivariate data.

Experiments/Examples
Simple example notebooks available here:
Benchmarking script testing ART and ADD:
https://gist.github.com/ascillitoe/0b051e3905bbf7ada9d8a1c421d7420b.
Results are below, averaged over five initializations (+- is 1 std), with reference data of length N=500, `n_bootstraps=50000`, and varying number of dimensions d. [Result figures elided: `CVMDriftOnline` (d=1), `CVMDriftOnline` (d=5), `FETDriftOnline` (d=1), `FETDriftOnline` (d=5).]

@ojcobb could do with your thoughts on whether the above look OK. - Update: after numerous fixes (starting at ab4e4f8), ART is now consistently ~170 (ERT=150) for all d and both alternative hypotheses. We expect ART to be slightly over ERT for the FET detector, so the performance now looks acceptable.
Outstanding decisions
- All test stats are stored in `self.test_stats` (until `reset()` is called). For the `FETDriftOnline` detector this means storing an array of size `(t-t0, len(self.window_sizes), self.n_features)`, i.e. `n_features` times larger than for the other online detectors. Is this acceptable? If not, we could store only the test stats from the previous time step (needed for exponential smoothing).
- `FETDriftOnline` is rather slow in the multivariate case, as the test needs to be done for each window and feature. We could have an option for a truly "non-parametric" approach, where we just simulate `n_window` times with the Bernoulli parameter fixed at 0.5. Docs would need to warn that this is faster but less accurate. @ojcobb what do you think?

TODO
The code is annotated with a number of outstanding TODO blocks. In addition, the following more general TODOs remain:
- Use `cramervonmises_2samp` and `fisher_exact` from `scipy.stats`. `cramervonmises_2samp` was only added in `scipy==1.7.0`, but `scipy==1.6.0` dropped support for Python 3.6. Is this another reason to drop 3.6 support? - CVM test case skipped for Python 3.6 tests. A `UserWarning` is added if `CVMDrift` is instantiated with an old scipy version.
- Check wall time and memory usage for large reference data length `N`, number of streams `n_bootstraps`, and window sizes `window_size`. At present, wall times are acceptable, but memory requirements are quite high. Decide on `device`. @ojcobb note: re high memory usage, need to check correct usage of binary vs float32 internally. - binary vs float32 was not an issue; the high memory usage was with the binary array. This is now mitigated by processing streams in batches.
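The batching mitigation can be illustrated in a few lines (the function name and the per-stream statistic are placeholders; the real code simulates Bernoulli streams during threshold configuration):

```python
import numpy as np

def simulate_stats_in_batches(n_bootstraps, stream_len, batch_size=1000, seed=0):
    """Simulate Bernoulli(0.5) streams in batches so peak memory is
    O(batch_size * stream_len) rather than O(n_bootstraps * stream_len).
    The per-stream statistic here (the stream mean) is a placeholder
    for the real test statistic."""
    rng = np.random.default_rng(seed)
    stats = []
    for start in range(0, n_bootstraps, batch_size):
        n = min(batch_size, n_bootstraps - start)
        # uint8 keeps the binary streams compact in memory
        streams = rng.integers(0, 2, size=(n, stream_len), dtype=np.uint8)
        stats.append(streams.mean(axis=1))
    return np.concatenate(stats)
```

Only one batch of streams is ever held in memory, at the cost of a Python-level loop over batches.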
- The `FETDriftOnline` detector needs further checks. The ADD values get very high when `p_h1` < `p_h0` in `examples/cd_online_supervised.ipynb`, but not when `p_h1` > `p_h0`. To start, it is worth checking the thresholds vs `t` etc. @ojcobb note: this might just be because of the one-sided nature of the custom online detector implementation. Could do with implementing an option for a two-sided alternative hypothesis, for consistency with the offline detector. - `'greater'` and `'less'` single-sided tests now implemented. The two-sided test has been removed from the offline detector.
- Ensure the changes to `BaseDriftOnline` do not break `MMDDriftOnline` or `LSDDDriftOnline` in any potential use case (all tests pass, and the relevant example notebooks have been tested, but further checks would be prudent).
- Add a progress bar to `CVMDriftOnline`. This is challenging since `tqdm` cannot be used inside `numba`-decorated methods running in `nopython` mode (i.e. the `_ids_to_stats` method). We could use a basic `print` statement called from only the first thread, but this would be ugly since `flush=True` etc. can't be used in the `nopython` version of `print`. Using the `numba` `objmode` context (https://numba.pydata.org/numba-doc/latest/user/withobjmode.html) might be an option, but will likely add a performance penalty. There is also numba-progress, but this is very new and currently doesn't have support for `guvectorize` functions. - Moved to "issues" in the below comment.
- `np.quantile` is used in both online detectors. Does this handle tails OK or do we need something else? - A simple utility function has been added in `utils/misc.py` to replicate the behaviour of the tensorflow/pytorch-specific `quantile` functions.
- The `numba` decorators seem to break the `sphinx` docs builds. I haven't found a fix yet, but moving to a newer version of `sphinx`, i.e. `>=4.0`, seems to fix this. Not ideal, but the easiest option would be simply to wait until we move to the new sphinx version next month... - Moving to `sphinx v4.2.0` now, which fixes this.

Additional issues to consider after initial implementation
(These will be added as issues once the PR is merged)
- For `FETDriftOnline`, a mechanism to determine when the thresholds have stopped decreasing would be useful.
- For `utils.misc.quantile` (and the tensorflow/pytorch specific versions), we could consider whether we want to stick with `type=7` as the default. Hyndman and Fan recommend `type=8` (see https://robjhyndman.com/hyndsight/sample-quantiles-20-years-later/).
- `CVMDriftOnline`.
- Add an `alternative='two-sided'` option for the `FET` detectors. We use `hypergeom.cdf` directly to perform the Fisher exact tests, since this can vectorise over multiple data streams (important for `FETDriftOnline`). The two-sided test is more challenging to do in a vectorised fashion, due to the need to perform a binary search to find where to begin the halves (see the scipy implementation). - This was implemented as a vectorized numba implementation, but performance targeting ERT was poor. Removed in b3a993b.
- Refactor `get_thresholds` in all online detectors so that `self.t+1` is given as an argument in the conditional case, and `self.t` in the unconditional case (see Features/non parametric drift #359 (comment)).
- Consolidate existing manual citations into `.bib` format.
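The hypergeometric formulation of the one-sided Fisher exact test mentioned above can be sketched as follows (a minimal illustration of the vectorisation idea, not the PR's implementation; the function name is made up):

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

def fet_pvals_greater(a, b, c, d):
    """One-sided (alternative='greater') Fisher exact test p-values
    for 2x2 tables [[a, b], [c, d]], vectorised over arrays of tables
    via the hypergeometric survival function."""
    a, b, c, d = (np.asarray(v) for v in (a, b, c, d))
    # P(X >= a) for X ~ Hypergeom(M=a+b+c+d, n=a+b, N=a+c)
    return hypergeom.sf(a - 1, a + b + c + d, a + b, a + c)
```

For a single table this agrees with `scipy.stats.fisher_exact(..., alternative='greater')`, but unlike `fisher_exact` it broadcasts over many tables (i.e. many streams and features) at once, which is what makes it attractive for `FETDriftOnline`.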