In [51]:
import numpy as np
from sklearn.linear_model import SGDClassifier
import warnings
warnings.filterwarnings(action="ignore")

from frouros.datasets.real import Elec2
from frouros.supervised.ddm_based import DDM, DDMConfig

# Supervised - Simple

This example shows how to use the supervised method DDM {cite:p}`gama2004learning` directly, without any helper class that interacts with the detector.

To demonstrate a simple use case, we use some features of the normalized version of Elec2 {cite:p}`harries1999splice`. Unlike synthetic datasets, in real datasets it is not possible to know for sure if and when drift occurs.

In [52]:
# Get Elec2 dataset and preprocess it
elec2 = Elec2()
elec2.download()
data = elec2.load()
X = np.array(data[["nswprice", "vicprice", "transfer"]].tolist())
y = np.array(data[["class"]].tolist()).astype('str')
# First 20000 samples are used as reference to fit the model
split_idx = 20000
X_ref, y_ref, X_test, y_test = X[:split_idx], y[:split_idx].ravel(), X[split_idx:], y[split_idx:]

INFO:frouros:Trying to download data from https://nextcloud.ifca.es/index.php/s/2coqgBEpa82boLS/download to /tmp/tmp8jbe7scz


The following cell defines a scikit-learn estimator and a detector that wraps it. Calling `fit` on the detector then fits the wrapped estimator on the reference data.

In [53]:
# scikit-learn estimator
estimator = SGDClassifier(loss="log_loss",
                          random_state=31,)
# Detector configuration class
config = DDMConfig(warning_level=2.0,
                   drift_level=3.0,
                   min_num_instances=2000,)
# Detector with the scikit-learn estimator
model_detector = DDM(estimator=estimator,
                     config=config,)
model_detector.fit(X=X_ref,
                   y=y_ref,)

<frouros.supervised.ddm_based.ddm.DDM at 0x7f20cec97070>

A stream of samples is simulated using the test dataset until drift is detected. In each iteration the model makes a prediction that is compared with the ground truth, resulting in an error value (1 for a misclassification, 0 otherwise). This error value is used to update the detector. To check whether drift is occurring, the detector's `status` attribute can be accessed.

In [54]:
# Simulate stream of samples
for i, (X_sample, y_sample) in enumerate(zip(X_test, y_test)):
    # Get predicted value
    y_pred = model_detector.predict(X=X_sample.reshape(1, -1))
    # Compute error to be used by the detector
    error = int(y_sample != y_pred)
    # Update detector with the error
    model_detector.update(value=error)
    # Obtain detector status
    status = model_detector.status
    if status["drift"]:
        print(f"Drift detected at index {i}")
        break

Drift detected at index 2800
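
To give some intuition for what `warning_level=2.0` and `drift_level=3.0` control, the following is a minimal sketch of the DDM rule described in {cite:p}`gama2004learning`: the detector tracks the running error rate $p_i$ and its standard deviation $s_i = \sqrt{p_i (1 - p_i) / i}$, remembers the point where $p_i + s_i$ was lowest, and signals warning or drift once $p_i + s_i$ rises above $p_{min} + 2 s_{min}$ or $p_{min} + 3 s_{min}$ respectively. The class `SimpleDDM`, the helper `first_drift_index`, and the synthetic error streams are hypothetical names introduced only for illustration. This is not the frouros implementation, and details such as how `min_num_instances` is applied may differ.

```python
import math


class SimpleDDM:
    """Hypothetical, minimal DDM-style detector (not the frouros class)."""

    def __init__(self, warning_level=2.0, drift_level=3.0, min_num_instances=30):
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.min_num_instances = min_num_instances
        self.n = 0
        self.error_rate = 0.0
        self.min_rate = math.inf
        self.min_std = math.inf
        self.warning = False
        self.drift = False

    def update(self, error):
        """Update the detector with one error value (1 = mistake, 0 = correct)."""
        self.n += 1
        # Incremental mean of the 0/1 errors seen so far
        self.error_rate += (error - self.error_rate) / self.n
        std = math.sqrt(self.error_rate * (1.0 - self.error_rate) / self.n)
        # Assumption: no checks are made until enough samples have been seen
        if self.n < self.min_num_instances:
            return
        level = self.error_rate + std
        # Remember the lowest p + s observed so far
        if level < self.min_rate + self.min_std:
            self.min_rate, self.min_std = self.error_rate, std
        # Strict comparisons avoid a spurious trigger when the deviation is zero
        self.warning = level > self.min_rate + self.warning_level * self.min_std
        self.drift = level > self.min_rate + self.drift_level * self.min_std


def first_drift_index(errors, **kwargs):
    """Return the index of the first sample that triggers drift, or None."""
    detector = SimpleDDM(**kwargs)
    for i, error in enumerate(errors):
        detector.update(error)
        if detector.drift:
            return i
    return None


# Synthetic error streams: 10% error rate, then a jump to 50% after 500 samples
stable = [1 if i % 10 == 0 else 0 for i in range(500)]
drifted = [1 if i % 2 == 0 else 0 for i in range(500)]

drift_index = first_drift_index(stable + drifted)
print(f"Drift detected at index {drift_index}")
```

With a larger `drift_level`, the error rate has to rise further above its recorded minimum before drift is signaled, which trades detection delay against false alarms.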


```{bibliography}
:filter: docname in docnames
```