# Combining several cost functions

<!-- {{ add_binder_block(page) }} -->

## Introduction

In `ruptures`, each change point detection procedure relies on a single cost function.
The choice of this cost function is critical, as it determines the type of change that can be detected.
For instance, [CostL2](../user-guide/costs/costl2.md) can detect shifts in the mean, [CostNormal](../user-guide/costs/costnormal.md) can detect shifts in the mean and the covariance structure, [CostAR](../user-guide/costs/costautoregressive.md) can detect shifts in the auto-regressive structure, etc.

However, in many settings, several types of changes co-exist in the same signal and a single cost function is not able to spot all changes simultaneously.
To cope with this issue, a procedure to merge several cost functions has been introduced [[Katser2021]](#Katser2021).
In a nutshell, a number of costs can be combined to yield an aggregated cost function which is sensitive to several types of changes.
The aggregated cost can then be used with any search method (such as the [window search method](../user-guide/detection/window.md)) to create a change point detection algorithm.

This example illustrates the aggregation procedure, also referred to as an ensemble model.
Here, the single-cost baselines use dynamic programming and the ensemble uses the [window search method](../user-guide/detection/window.md), but other search methods (e.g. [PELT](../user-guide/detection/pelt.md), [binary segmentation](../user-guide/detection/binseg.md)) could be used as well.
In addition, the number of changes is assumed to be known by the user.
More details can be found in the original paper introducing the cost aggregation procedure [[Katser2021]](#Katser2021).

## Setup

First, we make the necessary imports and generate a multivariate toy signal which contains mean shifts and linear changes (i.e. changes in the linear relationship between the dimensions of the signal). 

In [None]:
import matplotlib.pyplot as plt  # for display purposes
import numpy as np

import ruptures as rpt  # our package
from ruptures.metrics import hamming

# generate a signal
n_samples, n_dims, sigma = 500, 3, 4
n_bkps = 10  # number of breakpoints

# to make it more complex, we concatenate two different signals together
signal_constant, bkps_constant = rpt.pw_constant(
    n_samples=n_samples // 2, n_features=n_dims, n_bkps=n_bkps // 2, noise_std=sigma
)

signal_linear, bkps_linear = rpt.pw_linear(
    n_samples=n_samples // 2,
    n_features=n_dims - 1,
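    # the junction of the two signals (at n_samples // 2) is itself a change point,
    # hence the "- 1" so that bkps_true contains exactly n_bkps changes
    n_bkps=n_bkps - n_bkps // 2 - 1,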
    noise_std=sigma,
)

signal = np.r_[signal_constant, signal_linear]
bkps_true = sorted(bkps_constant + list(np.array(bkps_linear) + n_samples // 2))

# z-normalization
signal = (signal - signal.mean(axis=0)) / signal.std(axis=0)

The following plot shows the signal and the true change points (changes occur when the background colour shifts from blue to pink and vice versa).

In [None]:
# show the signal
fig, ax_array = rpt.display(signal, bkps_true)

## Using one cost function at a time

Recall that two types of changes are present in the previous signal: mean shifts and linear changes.
The most suitable cost functions for these types of changes are:

- [CostL2](../user-guide/costs/costl2.md) (for mean shifts),
- [CostLinear](../user-guide/costs/costlinear.md) (for linear changes).
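
To see why each cost only reacts to its own type of change, here is a minimal NumPy sketch (hand-written functions, not the `ruptures` classes). The L2 cost is the sum of squared deviations from the segment mean; for the linear cost we assume, as we believe `CostLinear` does, that the first column is regressed on the remaining columns without an intercept. The helpers `cost_l2`, `cost_linear` and `split_gain` are hypothetical; the gain is the cost decrease obtained by splitting the signal at the true change.

```python
import numpy as np


def cost_l2(signal, start, end):
    """Sum of squared deviations from the segment mean (piecewise-constant model)."""
    segment = signal[start:end]
    return float(((segment - segment.mean(axis=0)) ** 2).sum())


def cost_linear(signal, start, end):
    """Residual sum of squares when column 0 is regressed (no intercept) on the others."""
    y, X = signal[start:end, 0], signal[start:end, 1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ coef) ** 2).sum())


def split_gain(cost, signal, t):
    """Decrease of the cost obtained by splitting the whole signal at index t."""
    n = len(signal)
    return cost(signal, 0, n) - cost(signal, 0, t) - cost(signal, t, n)


rng = np.random.default_rng(0)

# toy 1: both columns jump by +5 at t = 50 (mean shift)
mean_shift = 0.1 * rng.normal(size=(100, 2))
mean_shift[50:] += 5.0

# toy 2: column 0 is +covariate before t = 50 and -covariate after (linear change)
covariate = rng.normal(size=100)
response = np.r_[covariate[:50], -covariate[50:]] + 0.1 * rng.normal(size=100)
linear_change = np.column_stack([response, covariate])

gain_l2_on_mean_shift = split_gain(cost_l2, mean_shift, 50)
gain_linear_on_mean_shift = split_gain(cost_linear, mean_shift, 50)
gain_l2_on_linear_change = split_gain(cost_l2, linear_change, 50)
gain_linear_on_linear_change = split_gain(cost_linear, linear_change, 50)
```

On the mean shift, the L2 gain is large while the linear gain stays small; on the linear change, the situation is reversed.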

The following cell shows that using a single cost function is not enough to detect all changes.

In [None]:
list_of_cost_functions = ["l2", "linear"]

for cost_str in list_of_cost_functions:
    # Compute the changes
    algo = rpt.Dynp(model=cost_str).fit(signal)
    predicted_bkps = algo.predict(n_bkps=n_bkps)
    # Display the prediction
    fig, (ax,) = rpt.display(signal[:, 0], bkps_true, predicted_bkps)
    ax.margins(x=0)
    ax.set_title(
        f"Cost {cost_str} (Hamming error: {hamming(bkps_true, predicted_bkps):.2f})"
    )

A few comments on these results:

- Only the first dimension of the signal is shown.
- The Hamming error lies between 0 and 1; lower is better.
- CostL2 detects the mean shifts, but not the linear changes.
- CostLinear detects neither type of change well.
- Overall, neither cost function used alone performs well.
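
As far as we know, the Hamming error compares two segmentations through the pairs of samples that they group together. As an illustration only, here is a NumPy sketch of a metric in that spirit: the fraction of sample pairs on which two segmentations disagree about whether both samples fall in the same segment. The helpers `segment_labels` and `hamming_like` are hypothetical, and the exact normalisation used by `ruptures.metrics.hamming` may differ slightly.

```python
import numpy as np


def segment_labels(bkps):
    """Segment index of every sample; `bkps` is sorted and ends with n_samples."""
    labels = np.zeros(bkps[-1], dtype=int)
    for k, b in enumerate(bkps[:-1]):
        labels[b:] = k + 1
    return labels


def hamming_like(bkps_a, bkps_b):
    """Fraction of sample pairs (i < j) on which the two segmentations disagree
    about i and j belonging to the same segment (0 means identical)."""
    la, lb = segment_labels(bkps_a), segment_labels(bkps_b)
    same_a = la[:, None] == la[None, :]
    same_b = lb[:, None] == lb[None, :]
    upper = np.triu_indices(len(la), k=1)  # each pair counted once
    return float(np.mean(same_a[upper] != same_b[upper]))


err_identical = hamming_like([50, 100], [50, 100])  # 0.0
err_missed = hamming_like([100], [50, 100])  # 2500 / 4950, about 0.505
```

Since the value is a proportion of pairs, it always lies between 0 and 1.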

## Using several cost functions: the ensemble method

Roughly, the ensemble method computes one score per cost function (here, the score curve of the window search method), scales each score, and aggregates the scaled scores into a new score, which should yield a better prediction of the change points.
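
For the window search method, the score is a discrepancy curve. Below is a minimal NumPy sketch for the L2 cost, under our assumption (based on our reading of `rpt.Window`) that the score at index $t$ is the cost of the window $[t - w/2, t + w/2)$ minus the costs of its two halves, with $w$ the window width. With `jump=1`, we believe this mirrors how `rpt.Window` fills `algo.score`, indexed by `algo.inds`, which is why that score is shorter than the signal by about `window_size` samples. `_l2_cost` and `window_score` are hypothetical helpers.

```python
import numpy as np


def _l2_cost(segment):
    """Sum of squared deviations from the segment mean."""
    return float(((segment - segment.mean(axis=0)) ** 2).sum())


def window_score(signal, width):
    """Discrepancy score for every index t at which two adjacent windows of
    size width // 2 fit inside the signal; returns (indexes, scores)."""
    half = width // 2
    inds = np.arange(half, len(signal) - half)
    scores = np.array(
        [
            _l2_cost(signal[t - half : t + half])
            - _l2_cost(signal[t - half : t])
            - _l2_cost(signal[t : t + half])
            for t in inds
        ]
    )
    return inds, scores


# noise-free step from 0 to 3 at t = 60
step = np.r_[np.zeros(60), np.full(40, 3.0)].reshape(-1, 1)
toy_inds, toy_scores = window_score(step, width=20)
best = int(toy_inds[np.argmax(toy_scores)])  # 60: the discrepancy peaks at the change
```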

In [[Katser2021]](#Katser2021), the authors compare several scaling and aggregation functions on two datasets; the *MinAbs* scaling function combined with the *WeightedSum* aggregation function performed best.

For a time series $s$, the *MinAbs* scaling function is defined as

$$
\textit{MinAbs}(s)_i = \frac{s_i}{|\min_{j}{s_j}|}
$$
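
A small worked example of *MinAbs* on two hand-picked series. This is a sketch: `min_abs` is a hypothetical helper that applies the formula above literally, without any safeguard against a zero minimum.

```python
import numpy as np


def min_abs(s):
    """MinAbs scaling: divide a series by the absolute value of its minimum."""
    return s / np.abs(s.min())


with_negative = min_abs(np.array([-2.0, 1.0, 4.0]))  # [-1.0, 0.5, 2.0]
all_positive = min_abs(np.array([0.5, 2.0, 5.0]))  # [1.0, 4.0, 10.0]
```

For non-negative scores, such as the window scores used here, MinAbs maps the minimum to 1 and is undefined when the minimum is exactly 0, which is why the `min_abs_scaling` function of this notebook adds a small constant to the denominator.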

For $N$ time series $s^1, \ldots, s^N$, let $\overline{s}^n = \textit{MinAbs}(s^n)$ denote the scaled version of $s^n$; the weights below are computed on the original (unscaled) series $s^n$. The *WeightedSum* aggregation function is defined as

$$
\textit{WeightedSum}\left(s^1, \ldots, s^N\right)_i = \sum_{n=1}^{N} \lambda_n \overline{s}_i^n
$$

where $\lambda_n = \frac{\max_{j}{s_j^n} - \min_{j}{s_j^n}}{\mu(s^n) - \min_{j}{s_j^n}}$, with $\mu(s^n)$ the mean of $s^n$.
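
A worked example of the two formulas on two hand-picked score series, stored as the columns of an array (an arbitrary illustration, not data from the paper):

```python
import numpy as np

# two score series s^1 and s^2, one per column
toy_scores = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 4.0], [6.0, 10.0]])

mins = toy_scores.min(axis=0)
maxs = toy_scores.max(axis=0)
means = toy_scores.mean(axis=0)

weights = (maxs - mins) / (means - mins)  # lambda_n, computed on the unscaled series
scaled = toy_scores / np.abs(mins)  # MinAbs scaling of each column
aggregated = scaled @ weights  # WeightedSum: sum_n lambda_n * scaled_n
```

Here $\lambda_1 = (6-1)/(3-1) = 2.5$ and $\lambda_2 = (10-2)/(5-2) = 8/3$. The weight $\lambda_n$ grows when the mean of $s^n$ is close to its minimum, i.e. when the series is mostly low with a few pronounced peaks, so sharper scores contribute more to the aggregate.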

In [None]:
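# width of the sliding window (illustrative choice, shorter than most segments)
window_size = 20

# window score of each individual cost function, one column per cost;
# with jump=1, every score is aligned with the same window centres (algo.inds)
scores = np.column_stack(
    [
        rpt.Window(width=window_size, model=cost_str, jump=1).fit(signal).score
        for cost_str in list_of_cost_functions
    ]
)


def min_abs_scaling(array):
    """MinAbs scaling of each column (epsilon avoids a division by zero)."""
    return array / (np.abs(np.min(array, axis=0)) + 1e-8)


def weighted_sum_aggregation(array):
    """WeightedSum aggregation: weights are computed on the unscaled columns."""
    min_array = array.min(axis=0)
    weights = (array.max(axis=0) - min_array) / (array.mean(axis=0) - min_array)

    return min_abs_scaling(array) @ weights


aggregated_scores = weighted_sum_aggregation(scores)

# Display scaled and aggregated scores (NaN where the window does not fit)
half_width = window_size // 2
padded_aggregated_scores = np.concatenate(
    [np.full(half_width, np.nan), aggregated_scores, np.full(half_width, np.nan)]
)
rpt.display(padded_aggregated_scores, bkps_true)
_ = plt.title("Scaled and aggregated score")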

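Let us now detect the change points from the aggregated score. To reuse the peak search of the window method in `ruptures`, we define a *DummyCost*: a placeholder cost function that only satisfies the interface expected by `rpt.Window` (its error is always $+\infty$), since the score itself is supplied directly.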

In [None]:
from ruptures.base import BaseCost


class DummyCost(BaseCost):

    r"""
    Dummy cost standing in for a real cost function: `error` always returns +inf.
    """

    model = "Dummy"

    def __init__(self) -> None:
        """Initialize the object."""
        self.signal = None

    def fit(self, signal) -> "DummyCost":
        """Set parameters of the instance.
        Args:
            signal (array): signal. Shape (n_samples,) or (n_samples, n_features)
        Returns:
            self
        """
        if signal.ndim == 1:
            self.signal = signal.reshape(-1, 1)
        else:
            self.signal = signal

        return self

    def error(self, start, end) -> float:
        """Return the approximation cost on the segment [start:end].
        Args:
            start (int): start of the segment
            end (int): end of the segment
        Returns:
            + infinity
        """
        return float("inf")

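We then use the window search method to predict the change points. The scores that `rpt.Window` computes with the dummy cost are meaningless, so they are replaced by the aggregated score before calling `predict`.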

In [None]:
# create the ensemble change point detector
dummy_cost = DummyCost().fit(signal)
# same width and jump as for the individual scores, so that the aggregated
# score is aligned with the window centres (algo.inds)
algo = rpt.Window(width=window_size, custom_cost=dummy_cost, jump=1)
algo.fit(signal)
algo.score = aggregated_scores  # overwrite the dummy scores

ensemble_predicted_bkps = algo.predict(n_bkps=n_bkps)
print(
    f"Ensemble (Hamming error: {hamming(bkps_true, ensemble_predicted_bkps):.2f})"
)

In [None]:
_ = rpt.display(signal, bkps_true, ensemble_predicted_bkps)
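
With `n_bkps` given, the window method turns a score curve into change points by keeping its highest peaks. The following NumPy sketch illustrates this step under a simplifying assumption of ours: a peak is an index whose score is strictly larger than that of its `order` neighbours on each side, and the `n_bkps` highest peaks are kept. `top_peaks` is a hypothetical helper, not part of `ruptures`, and its boundary handling may differ from the library's.

```python
import numpy as np


def top_peaks(score, inds, n_bkps, order, n_samples):
    """Return the positions (taken from `inds`) of the n_bkps highest local
    maxima of `score`, followed by n_samples as in ruptures' output format."""
    peaks = [
        k
        for k in range(len(score))
        if all(
            score[k] > score[j]
            for j in range(max(0, k - order), min(len(score), k + order + 1))
            if j != k
        )
    ]
    best = sorted(peaks, key=lambda k: score[k], reverse=True)[:n_bkps]
    return sorted(int(inds[k]) for k in best) + [n_samples]


toy_score = np.array([0, 1, 5, 1, 0, 0, 2, 8, 2, 0, 3, 1], dtype=float)
toy_bkps = top_peaks(toy_score, np.arange(10, 22), n_bkps=2, order=2, n_samples=40)
# toy_bkps == [12, 17, 40]: the peaks with scores 8 and 5 are kept, the one with 3 is not
```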

## Conclusion

Through this example, we have seen how to build an ensemble model to detect changepoints in a few lines of code.

The ensemble in this example uses the window search method; algorithms for ensemble models based on other search methods, such as binary segmentation or dynamic programming, are given in [[Katser2021]](#Katser2021).

## Authors

This example notebook has been authored by [Théo VINCENT](https://github.com/theovincent).


## References

<a id="Katser2021">[Katser2021]</a>
Katser, I., Kozitsin, V., Lobachev, V., & Maksimov, I. (2021). Unsupervised Offline Changepoint Detection Ensembles. Applied Sciences, 11(9), 4280.