# Detectors

Skchange detectors inherit from and extend Sktime's `BaseDetector` class.
This enables a unified interface for both change and anomaly detection, making it easy to switch between different detectors and reuse surrounding code such as preprocessing, evaluation, tuning, and visualisation.
It also facilitates the development of new detectors by providing a clear structure and set of guidelines.

## Conceptual model
All detectors in Skchange and Sktime are built around the conceptual model below.

1. Input: A time series.
2. Output: Locations of events in the time series, such as:
    * Changepoints.
    * Segments.
    * Point anomalies.
    * Segment anomalies.
    * ...

The length of the output equals the number of detected events.

## Change detectors

### The task

Change detection is the task of identifying abrupt changes in the distribution of a time series. The goal is to estimate the time points at which the distribution changes. These points are called change points (or change-points or changepoints).

<!-- Here is an example of two changes in the mean of a Gaussian time series with unit variance.

![](../_static/images/changepoint_illustration.png) -->


Here is a 3-dimensional Gaussian toy time series with unit variance and three changes in the mean. This data will be used in the examples throughout this section.

In [None]:
from skchange.datasets import generate_piecewise_normal_data

x = generate_piecewise_normal_data(
    means=[0, [8.0, 0.0, 0.0], 0.0, [2.0, 3.0, 5.0]],
    lengths=[100, 40, 80, 80],
    seed=8,
)
x.columns = ["var0", "var1", "var2"]
x.index.name = "time"
x

In [None]:
import plotly.express as px
import plotly.io as pio

pio.renderers.default = "notebook"

px.line(x)

Changes may occur in much more complex ways. For example, changes can affect:

- Variance.
- Shape of the distribution.
- Auto-correlation.
- The slope of a linear trend.
- Relationships between variables in multivariate time series.
- An unknown, small portion of variables in a high-dimensional time series.

Skchange supports detecting changes in all of these scenarios, amongst others.

### Composable change detectors
Let us estimate the change points in the toy data using a change detector.

In [None]:
from skchange.change_detectors import MovingWindow
from skchange.change_scores import CUSUM

detector = MovingWindow(
    change_score=CUSUM(),
    penalty=10,
)
detector


Let us look at each part of the detector in more detail:

1. `change_score`: Represents the choice of feature to detect changes in. `CUSUM` is a popular choice for detecting changes in the mean of a time series.
2. `penalty`: Used to control the complexity of the change point model. The higher the penalty, the fewer change points will be detected.
3. `detector`: The search algorithm (here `MovingWindow`) for detecting change points. It governs which slices of data the change score is evaluated on and how the results are compiled into a final set of detected change points.

In Skchange, all detectors follow the same pattern. They are composed of a score to be evaluated on data intervals, and a penalty. See the section on [Interval scorers](./interval_scores.ipynb) for more information.
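
To make the roles of score, penalty and search algorithm concrete, here is a minimal NumPy sketch of a moving-window search with a CUSUM-type score for a change in mean. It is an illustration under simplifying assumptions (univariate data, a single reported candidate), not Skchange's implementation; the function name `moving_window_cusum` and the `bandwidth` argument are hypothetical.

```python
import numpy as np


def moving_window_cusum(x, bandwidth):
    """CUSUM-type score for a mean change at each candidate split t.

    Compares the mean of x[t - bandwidth:t] with the mean of x[t:t + bandwidth].
    Splits too close to the edges keep a score of NaN.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    scores = np.full(n, np.nan)
    for t in range(bandwidth, n - bandwidth + 1):
        left = x[t - bandwidth : t]
        right = x[t : t + bandwidth]
        # For two windows of equal size w, the CUSUM scaling is sqrt(w / 2).
        scores[t] = np.sqrt(bandwidth / 2) * abs(left.mean() - right.mean())
    return scores


# Toy data: unit-variance Gaussian noise with a mean shift from 0 to 5 at index 100.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(5.0, 1.0, 100)])

scores = moving_window_cusum(series, bandwidth=20)
penalty = 10.0
candidate = int(np.nanargmax(scores))  # split with the highest score
detected = scores[candidate] > penalty  # report it only if it beats the penalty
```

The score decides what kind of change is measured, the window loop decides where the score is evaluated, and the penalty decides whether a peak is large enough to be reported.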

To detect changes and segment anomalies, Skchange follows a familiar scikit-learn-style API.
All detectors inherit from Sktime's `BaseDetector` class, which makes them interoperable with the Sktime ecosystem of tools such as pipelines, preprocessing, transformations, and performance evaluation.
This also means that the same API is used to detect both changes and segment anomalies, regardless of which detector you choose.

### `fit`
After initialising your detector of choice, you need to fit it to training data before you can use it to detect change points on test data. `fit` always returns the fitted detector itself. Some detectors have no parameters to fit, in which case `fit` does nothing. This is the case for our example `MovingWindow` detector.

In [None]:
detector.fit(x)

In [None]:
detector.is_fitted

In [None]:
detector.get_fitted_params()

### `predict`
After fitting the detector, you can use it to detect change points in test data `x`. The `predict` method returns a `pd.DataFrame` with the `"ilocs"` column holding the integer locations of the detected changepoints.

In [None]:
detections = detector.predict(x)
detections

In [None]:
from skchange.utils.plotting import plot_detections

plot_detections(x, detections, data_repr="line")

In Skchange, the change points indicate the *inclusive start* of a new segment. That is, the segmentation according to the detected changepoints in this example is `0:100`, `100:140`, `140:220` and `220:300`.
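
The inclusive-start convention can be spelled out with a small helper. This is a plain-Python sketch; the function name is hypothetical and the change points are hard-coded to match the segmentation stated above rather than read from the detector output.

```python
def segments_from_changepoints(changepoints, n):
    """Return (start, end) pairs with inclusive start and exclusive end."""
    bounds = [0, *sorted(changepoints), n]
    return list(zip(bounds[:-1], bounds[1:]))


segments = segments_from_changepoints([100, 140, 220], n=300)
print(segments)  # [(0, 100), (100, 140), (140, 220), (220, 300)]
```

Each pair can be used directly as a slice, for example `x.iloc[start:end]`.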

### `transform`
You can use the `transform` method to label the data according to the change point segmentation. The output is a `pd.DataFrame` with the same index as the input `x` and an integer column `"labels"` indicating which segment the index belongs to.

In [None]:
labels = detector.transform(x)
labels

In [None]:
px.line(labels)

This is useful for group-by operations per segment, for example.

In [None]:
x.join(labels).groupby("labels").agg(["mean", "std"])
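
For intuition, the labelling step can be imitated with `np.searchsorted`. The sketch below assumes change points at 100, 140 and 220 and assumes that segment labels start at 0; it does not call Skchange.

```python
import numpy as np
import pandas as pd

changepoints = np.array([100, 140, 220])  # assumed, not computed here
n = 300

# side="right" puts a change point index itself into the new segment,
# matching the inclusive-start convention.
segment_ids = np.searchsorted(changepoints, np.arange(n), side="right")
sketch_labels = pd.DataFrame({"labels": segment_ids})

# Number of observations per segment.
segment_sizes = sketch_labels["labels"].value_counts().sort_index()
```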

### `transform_scores`
Some detectors also support the `transform_scores` method, which returns the *penalised* change scores for each data point. This is the case for `MovingWindow`.

In [None]:
detection_scores = detector.transform_scores(x)
detection_scores

In [None]:
px.line(detection_scores)

For the `MovingWindow` detector, the peaks in the penalised scores correspond to the detected change points.
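
A rough way to picture this relationship: subtract the penalty from the raw scores and take the highest point within each run of positive values. The sketch below uses made-up scores and a hypothetical helper name; the selection rule actually used by `MovingWindow` may differ in its details.

```python
import numpy as np


def peaks_above_penalty(scores, penalty):
    """Return the index of the maximum within each run of positive penalised scores."""
    penalised = np.asarray(scores, dtype=float) - penalty
    above = penalised > 0
    # Pad with False so that runs touching either end are closed properly.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]  # inclusive starts, exclusive ends
    return [int(s + np.argmax(penalised[s:e])) for s, e in zip(starts, ends)]


raw_scores = [0, 2, 12, 15, 11, 3, 1, 13, 18, 9, 0]  # made-up values
peaks = peaks_above_penalty(raw_scores, penalty=10)
print(peaks)  # [3, 8]
```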

## Segment anomaly detectors


### The task
Segment anomaly detection is the task of identifying segments of a time series where the data behaves differently than expected.
The goal is to estimate starts and ends of such segments.
It is an important special case of change detection where certain segments are deemed "normal" and others are "anomalous". In most settings, a vast majority of the data is "normal".

We use the same data as before, but now we consider the segments `100:140` and `220:300` as segment anomalies, and the remaining data as "normal" or "baseline" data.

In [None]:
px.line(x)

As with change detection, segment anomalies may affect the data in many other ways than sudden jumps in the mean.

### Composable segment anomaly detectors
Let us use the `CAPA` detector to detect segment anomalies in the toy data. It consists of the same components as the change detector we used before: A detector (`CAPA`), an interval score (`segment_saving`) and a penalty (`segment_penalty`). "Savings" is one of two types of anomaly scores supported in Skchange. You can read more about them in the [Concepts](./concepts/interval_scores.ipynb) section.

In [None]:
from skchange.anomaly_detectors import CAPA
from skchange.anomaly_scores import L2Saving

detector = CAPA(
    segment_saving=L2Saving(),
    segment_penalty=20,
)
detector
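
To give a feel for what a saving measures, here is a minimal sketch of an L2-type saving: the reduction in squared-error cost obtained by re-estimating the mean on a segment instead of using a fixed baseline mean. The baseline of zero and the helper name `l2_saving` are assumptions made for illustration; this is not the `L2Saving` implementation.

```python
import numpy as np


def l2_saving(segment, baseline_mean=0.0):
    """Squared-error cost at the baseline mean minus the cost at the segment mean.

    For each variable this equals n * (segment mean - baseline mean) ** 2.
    The result is summed over variables.
    """
    segment = np.asarray(segment, dtype=float)
    if segment.ndim == 1:
        segment = segment[:, None]
    n = segment.shape[0]
    return float(np.sum(n * (segment.mean(axis=0) - baseline_mean) ** 2))


normal_saving = l2_saving(np.zeros(10))  # segment sits at the baseline
shifted_saving = l2_saving(np.full(10, 2.0))  # segment mean shifted to 2
print(normal_saving, shifted_saving)  # 0.0 40.0
```

A segment is a plausible anomaly when its saving exceeds the penalty: with a penalty of 20, only the shifted segment would qualify here.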

### `fit`
We fit the detector to obtain a fitted instance.

In [None]:
detector.fit(x)

### `predict`
As with change detection, `predict` is used to detect segment anomalies in test data `x`. The output is a `pd.DataFrame` with the `"ilocs"` column holding the integer locations of segment anomalies as `pd.Interval`s, and the `"labels"` column holding a unique label for each segment. The labels run from 1 to K, where K is the number of detected segment anomalies.

In [None]:
detections = detector.predict(x)
detections

In [None]:
plot_detections(x, detections, data_repr="line")

### `transform`
The `transform` method labels the data according to the segment anomaly segmentation. The output is a `pd.DataFrame` with the same index as the input `x` and an integer column `"labels"` indicating which segment the index belongs to. The label `0` denotes the normal segments, and the labels `>0` denote the segment anomalies.

In [None]:
labels = detector.transform(x)
labels

In [None]:
px.line(labels)
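
The relationship between the sparse `predict` output and the dense `transform` output can be sketched by hand. The frame below is constructed manually to mimic the format described above, using the two segments treated as anomalous in this section and assuming left-closed intervals; it is not produced by `CAPA`.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a sparse detection result (assumed left-closed intervals).
sparse = pd.DataFrame(
    {
        "ilocs": [
            pd.Interval(100, 140, closed="left"),
            pd.Interval(220, 300, closed="left"),
        ],
        "labels": [1, 2],
    }
)

n = 300
dense = np.zeros(n, dtype=int)  # 0 marks normal data
for interval, label in zip(sparse["ilocs"], sparse["labels"]):
    dense[interval.left : interval.right] = label

dense_labels = pd.DataFrame({"labels": dense})
```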

### `transform_scores`
`CAPA` also supports the `transform_scores` method. It returns the cumulative optimal penalised saving at each index.

In [None]:
capa_scores = detector.transform_scores(x)
px.line(capa_scores)
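
One way to read "cumulative optimal penalised saving" is as the value of a dynamic program that, at each index, either treats the latest observation as normal or closes an anomalous segment there. The following is a simplified univariate sketch under strong assumptions (baseline mean of zero, L2-type saving, no point anomalies, no maximum segment length). The function name and `min_len` argument are hypothetical, and this is not CAPA's implementation.

```python
import numpy as np


def cumulative_penalised_savings(x, penalty, min_len=2):
    """Best total penalised saving achievable using only the data up to each index."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))
    opt = np.zeros(n + 1)  # opt[t] covers x[:t]
    for t in range(1, n + 1):
        best = opt[t - 1]  # option 1: x[t - 1] is normal
        for s in range(0, t - min_len + 1):
            # option 2: x[s:t] is an anomalous segment
            seg_sum = prefix[t] - prefix[s]
            saving = seg_sum**2 / (t - s)  # n * mean ** 2 with baseline mean 0
            best = max(best, opt[s] + saving - penalty)
        opt[t] = best
    return opt[1:]


# Noise-free toy example: baseline 0 with a shift to 3 on indices 10..19.
toy = np.zeros(30)
toy[10:20] = 3.0
toy_scores = cumulative_penalised_savings(toy, penalty=20.0)
print(toy_scores[-1])  # 70.0, i.e. a saving of 10 * 3**2 = 90 minus the penalty of 20
```

The resulting curve is non-decreasing and rises where closing an anomalous segment becomes worthwhile, which is the general shape to look for when plotting such scores.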