[DOC] ClassifierDrift #692

cmougan · 2022-12-10T18:23:57Z

On which data is the ClassifierDrift detector trained?
The documentation does not state this very clearly:

Classifier-based drift detector. The classifier is trained on a fraction of the combined
reference and test data and drift is detected on the remaining data. To use all the data
to detect drift, a stratified cross-validation scheme can be chosen.

arnaudvl · 2022-12-12T13:48:59Z

The ClassifierDrift detector is trained on a portion of the combined reference set x_ref and test set x_test. If the train_size argument is a float between 0 and 1, then a random sample of size int(train_size * (len(x_ref) + len(x_test))) from the combined data [x_ref, x_test] is used for training. The held-out fraction 1 - train_size is then used to test for drift. If we instead specify n_folds as an int, we apply cross-validation so that all the data is used for both training and out-of-sample testing. The n_folds argument takes priority over train_size. This is clarified in the docs under the detector's usage section: https://docs.seldon.io/projects/alibi-detect/en/stable/cd/methods/classifierdrift.html#Usage
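
To make the two splitting modes concrete, here is a minimal pure-Python sketch of the index bookkeeping described above. It is not the library's implementation: the helper names holdout_split and kfold_splits are hypothetical, and the way the folds are built (dealing shuffled reference and test indices evenly across folds) is an assumed stand-in for the stratified cross-validation mentioned in the docstring.

```python
import random


def holdout_split(n_ref, n_test, train_size=0.75, seed=0):
    """Hypothetical helper: split combined [x_ref, x_test] indices once.

    Indices 0..n_ref-1 stand for reference instances (label 0) and
    n_ref..n_ref+n_test-1 for test instances (label 1).
    """
    n = n_ref + n_test
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(train_size * n)  # size of the random training sample
    return idx[:n_train], idx[n_train:]  # (train, held out for the drift test)


def kfold_splits(n_ref, n_test, n_folds=5, seed=0):
    """Hypothetical helper: stratified folds so every instance is held out once."""
    rng = random.Random(seed)
    ref_idx = list(range(n_ref))
    test_idx = list(range(n_ref, n_ref + n_test))
    rng.shuffle(ref_idx)
    rng.shuffle(test_idx)
    # Deal both groups across the folds to keep the ref/test ratio similar.
    folds = [ref_idx[k::n_folds] + test_idx[k::n_folds] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        held_out = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, held_out))
    return splits


train, held_out = holdout_split(n_ref=100, n_test=100, train_size=0.75)
print(len(train), len(held_out))

splits = kfold_splits(n_ref=100, n_test=100, n_folds=5)
print(sum(len(h) for _, h in splits))
```

With a single split, only the held-out quarter of the combined data contributes out-of-sample predictions to the drift test; with folds, every instance is predicted out-of-sample exactly once, which is why n_folds uses all the data.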

cmougan · 2022-12-12T14:20:51Z

Thanks for the clarification and link!

I was thinking that perhaps we could improve the documentation, either by extending the docstring or by adding a link to the usage section.
What do you think?
