# Getting started

This section helps you get started with Skchange. It briefly covers the fundamental concepts of the library.

## Installation
```bash
pip install skchange
```

To make full use of the library, you can install the optional Numba dependency. This speeds up the algorithms in Skchange, often by a factor of 10 to 100.

```bash
pip install "skchange[numba]"
```

## Change detection

### The task

Change detection is the task of identifying abrupt changes in the distribution of a time series. The goal is to estimate the time points at which the distribution changes. These points are called change points (or change-points or changepoints).

<!-- Here is an example of two changes in the mean of a Gaussian time series with unit variance.

![](../_static/images/changepoint_illustration.png) -->


Here is a 3-dimensional Gaussian toy time series with unit variance and three changes in the mean. This data will be used in the examples throughout this section.

In [46]:
import numpy as np

from skchange.datasets import generate_changing_data

n = 300
changepoints = [100, 140, 220]
means = [
    np.array([0.0, 0.0, 0.0]),
    np.array([8.0, 0.0, 0.0]),
    np.array([0.0, 0.0, 0.0]),
    np.array([2.0, 3.0, 5.0]),
]
x = generate_changing_data(n, changepoints=changepoints, means=means, random_state=8)
x.columns = ["var0", "var1", "var2"]
x.index.name = "time"
x

| time | var0 | var1 | var2 |
|---|---|---|---|
| 0 | 0.091205 | 1.091283 | -1.946970 |
| 1 | -1.386350 | -2.296492 | 2.409834 |
| 2 | 1.727836 | 2.204556 | 0.794828 |
| 3 | 0.976421 | -1.183427 | 1.916364 |
| 4 | -1.123327 | -0.664035 | -0.378359 |
| ... | ... | ... | ... |
| 295 | 0.325434 | 2.015049 | 4.939516 |
| 296 | 3.485036 | 3.118221 | 6.393023 |
| 297 | 2.517864 | 3.445919 | 3.264219 |
| 298 | 2.290727 | 2.758822 | 4.492490 |


In [47]:
import plotly.express as px

data_fig = px.line(x)
data_fig

Changes may occur in much more complex ways. For example, changes can affect:

- Variance.
- Shape of the distribution.
- Auto-correlation.
- The slope of a linear trend.
- Relationships between variables in multivariate time series.
- An unknown, small portion of variables in a high-dimensional time series.

Skchange supports detecting changes in all of these scenarios, amongst others.

### Composable change detectors
Let us estimate the change points in the toy data using a change detector.

In [48]:
from skchange.change_detectors import MovingWindow
from skchange.change_scores import CUSUM

detector = MovingWindow(
    change_score=CUSUM(),
    penalty=10,
)
detector


Let us look at each part of the detector in more detail:

1. `change_score`: Represents the choice of feature to detect changes in. `CUSUM` is a popular choice for detecting changes in the mean of a time series.
2. `penalty`: Used to control the complexity of the change point model. The higher the penalty, the fewer change points will be detected.
3. `detector`: The detector class itself, here `MovingWindow`, is the search algorithm for detecting change points. It governs which slices of data the change score is evaluated on and how the results are compiled into a final set of detected change points.
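
To make the three parts concrete, here is a minimal pure-Python sketch of a moving-window search with a textbook CUSUM score on a noise-free, univariate series. It is an illustration under simplifying assumptions, not Skchange's implementation: the helper names `cusum_score`, `moving_window_scores` and `detect` are hypothetical, the bandwidth of 20 is chosen for the example, and the penalty is treated as a plain threshold on the score.

```python
import math


def cusum_score(left, right):
    """Textbook CUSUM statistic for a difference in mean between two samples."""
    n_left, n_right = len(left), len(right)
    mean_left = sum(left) / n_left
    mean_right = sum(right) / n_right
    weight = math.sqrt(n_left * n_right / (n_left + n_right))
    return weight * abs(mean_left - mean_right)


def moving_window_scores(x, bandwidth):
    """Score each candidate split t using `bandwidth` points on either side."""
    scores = [float("nan")] * len(x)  # splits too close to the edges stay NaN
    for t in range(bandwidth, len(x) - bandwidth + 1):
        scores[t] = cusum_score(x[t - bandwidth : t], x[t : t + bandwidth])
    return scores


def detect(scores, threshold):
    """Return the argmax of every contiguous run of scores above the threshold."""
    changepoints, run = [], []
    for t, score in enumerate(scores):
        if score > threshold:  # NaN compares False, so unscored points end a run
            run.append(t)
        elif run:
            changepoints.append(max(run, key=lambda i: scores[i]))
            run = []
    if run:
        changepoints.append(max(run, key=lambda i: scores[i]))
    return changepoints


# Noise-free stand-in for a single variable: means 0, 8, 0, 2 (assumed for illustration).
series = [0.0] * 100 + [8.0] * 40 + [0.0] * 80 + [2.0] * 80
scores = moving_window_scores(series, bandwidth=20)

low_penalty = detect(scores, threshold=1.0)    # [100, 140, 220]
high_penalty = detect(scores, threshold=10.0)  # [100, 140]
```

With the lower threshold all three changes are found. Raising it drops the small change at index 220, which mirrors how a higher penalty yields fewer detected change points.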

In Skchange, all detectors follow the same pattern. They are composed of some kind of score to be evaluated on data intervals, and a penalty. You can read more about the core components of Skchange in the [Concepts](./concepts/index.rst) section.

To detect changes and segment anomalies, Skchange follows a familiar scikit-learn-type API.
All detectors inherit from the `BaseDetector` class of Sktime, which makes them interoperable with the Sktime ecosystem of tools such as pipelines, preprocessing, transformations and performance evaluation.
This also means that you can use the same API to detect both changes and segment anomalies, regardless of which detector you choose.

### `fit`
After initialising your detector of choice, you need to fit it to training data before you can use it to detect change points on test data. `fit` always returns the fitted detector itself. Some detectors have no parameters to fit, in which case `fit` does nothing. This is the case for our example `MovingWindow` detector.

In [49]:
detector.fit(x)

In [50]:
detector.is_fitted

True

In [51]:
detector.get_fitted_params()

{}

### `predict`
After fitting the detector, you can use it to detect change points in test data `x`. The `predict` method returns a `pd.DataFrame` with the `"ilocs"` column holding the integer locations of the detected changepoints.

In [52]:
detections = detector.predict(x)
detections

| | ilocs |
|---|---|
| 0 | 100 |
| 1 | 140 |
| 2 | 220 |


In [62]:
import copy

changepoint_fig = copy.deepcopy(data_fig)
for i in detections["ilocs"].values:
    changepoint_fig.add_vline(i)
changepoint_fig

In Skchange, a change point marks the *inclusive start* of a new segment. That is, the segmentation implied by the detected change points in this example is `0:100`, `100:140`, `140:220` and `220:300`.
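
The inclusive-start convention can be sketched in a few lines of plain Python. The helper `segments_from_changepoints` is a hypothetical name used only for this illustration. It returns half-open `(start, end)` pairs, matching Python's slice semantics.

```python
def segments_from_changepoints(changepoints, n):
    """Turn inclusive segment starts into half-open (start, end) index pairs."""
    boundaries = [0, *sorted(changepoints), n]
    return list(zip(boundaries[:-1], boundaries[1:]))


segments = segments_from_changepoints([100, 140, 220], n=300)
# [(0, 100), (100, 140), (140, 220), (220, 300)]
```

Each pair can be used directly as a slice, for example `x.iloc[start:end]`, without overlapping or skipping any observation.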

### `transform`
You can use the `transform` method to label the data according to the change point segmentation. The output is a `pd.DataFrame` with the same index as the input `x` and an integer column `"labels"` indicating which segment the index belongs to.

In [53]:
labels = detector.transform(x)
labels

| time | labels |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| ... | ... |
| 295 | 3 |
| 296 | 3 |
| 297 | 3 |
| 298 | 3 |


In [54]:
px.line(labels)

This is useful for group-by operations per segment, for example.

In [58]:
x.join(labels).groupby("labels").agg(["mean", "std"])

| labels | var0 mean | var0 std | var1 mean | var1 std | var2 mean | var2 std |
|---|---|---|---|---|---|---|
| 0 | -0.145056 | 1.0384 | 0.078223 | 1.10758 | 0.016803 | 1.013129 |
| 1 | 8.085414 | 0.938503 | -0.181219 | 1.152032 | 0.205081 | 0.881243 |
| 2 | 0.143322 | 1.136743 | 0.126735 | 0.975529 | 0.066954 | 1.0857 |
| 3 | 2.248388 | 0.919702 | 2.959066 | 1.029075 | 4.851858 | 1.018683 |
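
The segment labels returned by `transform` can also be reproduced by hand from the change points. The sketch below uses only the standard library. `segment_labels` is a hypothetical helper, and the only assumption is the inclusive-start convention: the label of an index is the number of change points at or before it.

```python
from bisect import bisect_right


def segment_labels(changepoints, n):
    """Label each index 0..n-1 with the number of change points at or before it."""
    starts = sorted(changepoints)
    return [bisect_right(starts, i) for i in range(n)]


manual_labels = segment_labels([100, 140, 220], n=300)
# Index 99 is still in segment 0, while index 100 opens segment 1.
```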


### `transform_scores`
Some detectors also support the `transform_scores` method, which returns the *penalised* change scores for each data point. This is the case for `MovingWindow`.

In [None]:
detection_scores = detector.transform_scores(x)
detection_scores

| time | bandwidth = 20 |
|---|---|
| 0 | |
| 1 | -6.943667 |
| 2 | -7.688373 |
| 3 | -8.703367 |
| 4 | -6.636503 |
| ... | ... |
| 295 | -8.910835 |
| 296 | -9.046409 |
| 297 | -8.271702 |
| 298 | -8.353627 |


In [None]:
px.line(detection_scores)

For the `MovingWindow` detector, the peaks in the penalised scores correspond to the detected change points.

## Segment anomaly detection


### The task
Segment anomaly detection is the task of identifying segments of a time series where the data behaves differently than expected.
The goal is to estimate starts and ends of such segments.
It is an important special case of change detection where certain segments are deemed "normal" and others are "anomalous". In most settings, a vast majority of the data is "normal".

We use the same data as before, but now we consider the segments `100:140` and `220:300` as segment anomalies, and the remaining data as "normal" or "baseline" data.

In [63]:
data_fig

As with change detection, segment anomalies may affect the data in many other ways than sudden jumps in the mean.

### Composable segment anomaly detectors

In [68]:
from skchange.anomaly_detectors import CAPA
from skchange.anomaly_scores import L2Saving

detector = CAPA(
    segment_saving=L2Saving(),
    segment_penalty=20,
)
detector
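
As a rough intuition for the two arguments, a saving measures how much better a segment is explained by its own parameter than by the baseline, and a segment is only worth flagging when the saving exceeds the penalty. The sketch below is a hedged, univariate illustration assuming a squared-error cost and a baseline mean of zero. The names `l2_saving` and `is_anomalous` are hypothetical, and the sketch leaves out the search over all candidate segments that a detector like `CAPA` performs.

```python
def l2_saving(segment, baseline_mean=0.0):
    """Squared-error reduction from using the segment's own mean, not the baseline."""
    mean = sum(segment) / len(segment)
    baseline_cost = sum((value - baseline_mean) ** 2 for value in segment)
    optimised_cost = sum((value - mean) ** 2 for value in segment)
    # Algebraically this equals len(segment) * (mean - baseline_mean) ** 2.
    return baseline_cost - optimised_cost


def is_anomalous(segment, penalty):
    """Flag a segment only if its saving outweighs the penalty."""
    return l2_saving(segment) > penalty


near_baseline = [0.1, -0.2, 0.05, 0.0, -0.1]  # made-up values close to zero
shifted = [8.0] * 40                          # noise-free shift in the mean

flag_near_baseline = is_anomalous(near_baseline, penalty=20)  # False
flag_shifted = is_anomalous(shifted, penalty=20)              # True
```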

### `fit`

In [69]:
detector.fit(x)

### `predict`

In [70]:
detections = detector.predict(x)
detections

| | ilocs | labels |
|---|---|---|
| 0 | [100, 140) | 1 |
| 1 | [220, 300) | 2 |


In [76]:
anomaly_fig = copy.deepcopy(data_fig)
for segment in detections["ilocs"]:
    anomaly_fig.add_vrect(segment.left, segment.right)
anomaly_fig


### `transform`

In [77]:
labels = detector.transform(x)
labels

| time | labels |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| ... | ... |
| 295 | 2 |
| 296 | 2 |
| 297 | 2 |
| 298 | 2 |


In [78]:
px.line(labels)
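
The relationship between the detected intervals and these labels can be sketched in plain Python. Assuming the convention visible in the outputs above (label `0` for baseline data and `1, 2, ...` for the anomalous segments, each a half-open interval), a hypothetical helper `anomaly_labels` rebuilds the labelling from the `predict` output:

```python
def anomaly_labels(intervals, n):
    """Label baseline points 0 and the k-th half-open anomaly interval k (1-based)."""
    labels = [0] * n
    for k, (start, end) in enumerate(intervals, start=1):
        for i in range(start, end):  # `end` itself is excluded
            labels[i] = k
    return labels


manual_anomaly_labels = anomaly_labels([(100, 140), (220, 300)], n=300)
# Index 139 belongs to anomaly 1, while index 140 is back to the baseline.
```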

### `transform_scores`

In [None]:
capa_scores = detector.transform_scores(x)
capa_scores

time
0         0.000000
1         0.000000
2         0.000000
3         0.000000
4         0.000000
          ...     
295    5422.336311
296    5481.196712
297    5507.211340
298    5540.074019
299    5566.099846
Name: score, Length: 300, dtype: float64

In [None]:
px.line(capa_scores)