# Predefined Cluster Sequences
Example demonstrating how to use predefined cluster assignments and centers.

This is useful when you want to transfer clustering results from one time series to another.

Author: Maximilian Hoffmann

Import pandas, Plotly and the tsam aggregation module

In [None]:
%load_ext autoreload
%autoreload 2

import os

import pandas as pd

# Configure Plotly for sphinx/nbsphinx output
import plotly.io as pio

import tsam
from tsam import ClusterConfig

pio.renderers.default = "notebook"

### Input data 

Read in time series from testdata.csv with pandas

In [None]:
raw = pd.read_csv("testdata.csv", index_col=0)

Show a slice of the dataset

In [None]:
raw.head()

Show the shape of the raw input data: 4 types of time series (GHI, Temperature, Wind and Load) for every hour in a year

In [None]:
raw.shape

Plot an example series - in this case the wind speed

In [None]:
# Original wind heatmap
tsam.plot.heatmap(raw, column="Wind", period_hours=24, title="Original Wind")

### Hierarchical aggregation

Aggregate the data into eight typical days using hierarchical clustering, with each cluster represented by its medoid (an actual day from the data) and without any integration of extreme periods. Alternative clustering methods, such as 'averaging' and 'k_medoids', can be selected via the `method` argument of `ClusterConfig`.

In [None]:
result = tsam.aggregate(
    raw,
    n_periods=8,
    period_hours=24,
    cluster=ClusterConfig(method="hierarchical", representation="medoid"),
)
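The aggregation above splits the hourly data into daily periods (`period_hours=24`) and groups similar days. As a rough illustration of the idea, not of tsam's actual implementation, the following sketch reshapes an hourly array into one feature vector per day and then runs a naive agglomerative clustering on those vectors, assuming Ward linkage (a common choice for hierarchical clustering). The helper names `to_period_profiles` and `ward_agglomerative` are hypothetical, the data are made up, and the naive loop is only practical for small inputs.

```python
import numpy as np


def to_period_profiles(values: np.ndarray, period_hours: int = 24) -> np.ndarray:
    """Reshape (n_hours, n_attributes) into (n_periods, n_attributes * period_hours).

    Each row holds one period; attributes are laid out side by side
    (all hours of attribute 0, then all hours of attribute 1, ...).
    """
    n_hours, n_attr = values.shape
    n_periods = n_hours // period_hours  # an incomplete trailing period is dropped
    cube = values[: n_periods * period_hours].reshape(n_periods, period_hours, n_attr)
    return cube.transpose(0, 2, 1).reshape(n_periods, n_attr * period_hours)


def ward_agglomerative(profiles: np.ndarray, n_clusters: int) -> np.ndarray:
    """Naive Ward-linkage agglomerative clustering; returns one label per period."""
    clusters = [[i] for i in range(len(profiles))]  # every period starts alone
    while len(clusters) > n_clusters:
        best_cost, best_pair = np.inf, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = profiles[clusters[a]].mean(axis=0) - profiles[clusters[b]].mean(axis=0)
                # Ward cost: increase of the within-cluster sum of squares when merging a and b
                cost = na * nb / (na + nb) * float(diff @ diff)
                if cost < best_cost:
                    best_cost, best_pair = cost, (a, b)
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
    labels = np.empty(len(profiles), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Hypothetical data: 6 days of 2 attributes; days 0-2 are "low", days 3-5 are "high"
levels = np.repeat([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], 24)
values = np.column_stack([levels, 2 * levels])  # shape (144, 2)
profiles = to_period_profiles(values, period_hours=24)
labels = ward_agglomerative(profiles, n_clusters=2)
print(profiles.shape, labels)
```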

Access the typical periods

In [None]:
typical_periods = result.typical_periods

In [None]:
typical_periods

Show the shape of the typical periods: 4 time series for 8 × 24 = 192 hours

In [None]:
typical_periods.shape

Reconstruct the original time series from the typical periods

In [None]:
reconstructed = result.reconstruct()
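Conceptually, the reconstruction replaces every original period with the typical period of the cluster it was assigned to. A minimal sketch for a single attribute, using a hypothetical `reconstruct` helper and made-up numbers; tsam's `result.reconstruct()` may additionally handle multiple attributes and internal scaling, so treat this only as an illustration of the indexing idea:

```python
import numpy as np


def reconstruct(typical: np.ndarray, assignments: np.ndarray) -> np.ndarray:
    """Rebuild a flat series by replacing each period with its cluster's typical period.

    typical: (n_clusters, period_hours), one row per typical period.
    assignments: (n_periods,), the cluster index of each original period.
    """
    per_period = typical[assignments]  # fancy indexing: (n_periods, period_hours)
    return per_period.reshape(-1)      # concatenate the periods back into one series


typical = np.array([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])  # 2 hypothetical 3-hour typical periods
assignments = np.array([1, 0, 0, 1])                     # 4 original periods
series = reconstruct(typical, assignments)
print(series)
```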

Plot the reconstructed data

In [None]:
# Predicted wind heatmap (all attributes)
tsam.plot.heatmap(
    reconstructed,
    column="Wind",
    period_hours=24,
    title="Predicted Wind (All Attributes)",
)

### Now cluster the wind time series only

Clustering the wind time series alone with 8 typical days and hierarchical clustering leads to different typical days in a different sequence.

Isolate the wind time series and show the first rows of data

In [None]:
raw_wind = raw.loc[:, "Wind"].to_frame()
raw_wind.head()

Apply the same clustering procedure as above to the isolated wind time series

In [None]:
result_wind = tsam.aggregate(
    raw_wind,
    n_periods=8,
    period_hours=24,
    cluster=ClusterConfig(method="hierarchical", representation="medoid"),
)

In [None]:
typical_periods_wind = result_wind.typical_periods

Export the preprocessed (normalized, period-wise) wind profiles for testing

In [None]:
# Export preprocessed time series for testing (uses tsam's internal aggregation object)
os.makedirs("results", exist_ok=True)  # make sure the output folder exists
result_wind._aggregation.normalizedPeriodlyProfiles.to_csv(
    os.path.join("results", "preprocessed_wind.csv")
)
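The exported file contains the profiles after tsam's preprocessing. Assuming that this preprocessing is a per-column min-max scaling followed by a reshape into one row per period (an assumption, not something this notebook confirms), it can be sketched with pandas as follows; `min_max_normalize` is a hypothetical helper and the data are made up:

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    span = (df.max() - df.min()).replace(0, 1)  # avoid division by zero
    return (df - df.min()) / span


# Hypothetical wind data: 2 "days" of 3 hours each
df = pd.DataFrame({"Wind": [2.0, 4.0, 6.0, 10.0, 0.0, 8.0]})
normalized = min_max_normalize(df)

# One row per period, one column per hour of the period
profiles = normalized["Wind"].to_numpy().reshape(-1, 3)
print(profiles)
```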

In [None]:
typical_periods_wind.shape

In [None]:
reconstructed_wind = result_wind.reconstruct()

In [None]:
# Predicted wind heatmap (wind only)
tsam.plot.heatmap(
    reconstructed_wind,
    column="Wind",
    period_hours=24,
    title="Predicted Wind (Wind Only)",
)

Comparing both plots shows that eight typical periods for wind alone can better account for extreme periods, but the cluster order generally changes.

In [None]:
result.cluster_assignments

In [None]:
result_wind.cluster_assignments
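Cluster indices are arbitrary labels, so comparing two assignment sequences element by element can be misleading: two runs may group the days identically but number the clusters differently. The hypothetical helper below, built on `pandas.crosstab`, checks whether two label sequences define the same grouping regardless of numbering. It assumes the assignments are one-dimensional sequences with one integer label per period, so it could be applied to `result.cluster_assignments` and `result_wind.cluster_assignments`.

```python
import pandas as pd


def same_partition(a, b) -> bool:
    """True if labelings a and b group the periods identically, ignoring label names."""
    table = pd.crosstab(pd.Series(list(a), name="a"), pd.Series(list(b), name="b"))
    nonzero = table.to_numpy() > 0
    # Identical groupings: every row and every column has exactly one non-zero cell
    return bool((nonzero.sum(axis=0) == 1).all() and (nonzero.sum(axis=1) == 1).all())


print(same_partition([0, 0, 1, 2], [5, 5, 3, 4]))  # same grouping, different numbering
print(same_partition([0, 0, 1, 1], [0, 1, 1, 1]))  # different grouping
```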

### Predefining cluster sequence

tsam offers the option to aggregate input time series with a predefined cluster order. This means that we can take the cluster order from the wind-only clustering and pass it as input to the aggregation of all attributes.

In [None]:
# Use predefined cluster order from the wind-only aggregation
result_predef = tsam.aggregate(
    raw,
    n_periods=8,
    period_hours=24,
    cluster=ClusterConfig(
        method="hierarchical",
        representation="medoid",
        predef_cluster_order=tuple(result_wind.cluster_assignments),
    ),
)

In [None]:
typical_periods_predef = result_predef.typical_periods

In [None]:
typical_periods_predef.shape

Save the typical periods to a .csv file

In [None]:
typical_periods_predef.to_csv(
    os.path.join("results", "testperiods_predef_cluster_order.csv")
)

In [None]:
reconstructed_predef = result_predef.reconstruct()

Now we compare the cluster orders

In [None]:
result_wind.cluster_assignments

In [None]:
result_predef.cluster_assignments

As can be seen, the cluster order for the four attributes (i.e. the sequence of typical days) is now identical to the cluster order of the wind-only clustering. Now the heatmaps can be compared:

In [None]:
# Predicted wind heatmap (predefined cluster order)
tsam.plot.heatmap(
    reconstructed_predef,
    column="Wind",
    period_hours=24,
    title="Predicted Wind (Predefined Cluster Order)",
)

As can be seen, the plot for the wind-only aggregation and the one for all four attributes with the predefined cluster order from the wind time series still differ from each other. This is because only the cluster order is predefined, not the cluster centers. The centers are determined in one case from the wind time series alone and in the other case from all four attributes jointly, so the chosen cluster centers (i.e. the chosen typical days) differ.
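This effect can be reproduced in a small sketch, assuming the medoid is the member period with the smallest summed Euclidean distance to the other members of its cluster (tsam's exact definition may differ). With identical, fixed cluster labels, the medoid of a cluster can change depending on whether only the wind feature or all attributes are used. The helper `medoid_indices` and the two-attribute data are hypothetical:

```python
import numpy as np


def medoid_indices(profiles: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """For each cluster label (in sorted order), return the index of the member period
    with the smallest summed Euclidean distance to all members of that cluster."""
    centers = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        sub = profiles[members]
        dist = np.sqrt(((sub[:, None, :] - sub[None, :, :]) ** 2).sum(axis=2))
        centers.append(members[dist.sum(axis=1).argmin()])
    return np.array(centers)


# Hypothetical data: 5 periods of 1 hour, attributes (wind, load)
wind = np.array([0.0, 1.0, 3.0, 10.0, 11.0])
load = np.array([0.0, 10.0, 1.0, 5.0, 5.0])
labels = np.array([0, 0, 0, 1, 1])  # fixed (predefined) cluster order

print(medoid_indices(wind.reshape(-1, 1), labels))           # medoids from wind only
print(medoid_indices(np.column_stack([wind, load]), labels))  # medoids from all attributes
```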

### Predefining cluster order and cluster centers

If both the cluster order and the cluster centers should be taken from the wind time series clustering, we additionally pass the indices of the days that were chosen as typical days for the wind time series to the aggregation of all four attributes.

In [None]:
# Use predefined cluster order AND cluster centers from wind-only aggregation
result_predef_with_centers = tsam.aggregate(
    raw,
    n_periods=8,
    period_hours=24,
    cluster=ClusterConfig(
        method="hierarchical",
        representation="medoid",
        predef_cluster_order=tuple(result_wind.cluster_assignments),
        predef_cluster_centers=tuple(result_wind.cluster_center_indices),
    ),
)
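With both the order and the centers predefined, no representative has to be searched for: the typical period of cluster k is simply the original period at the k-th center index, now taken with all attributes. A sketch with made-up numbers, assuming `cluster_center_indices[k]` holds the index of the original period that represents cluster k:

```python
import numpy as np

# Hypothetical data: 5 periods of 1 hour, attributes (wind, load)
wind = np.array([0.0, 1.0, 3.0, 10.0, 11.0])
load = np.array([0.0, 10.0, 1.0, 5.0, 5.0])
profiles_all = np.column_stack([wind, load])

labels = np.array([0, 0, 0, 1, 1])  # predefined cluster order (e.g. from a wind-only run)
wind_centers = np.array([1, 3])     # predefined centers (e.g. wind-only medoids)

# Representative of cluster k = original period wind_centers[k], with all attributes
typical_all = profiles_all[wind_centers]
reconstructed_all = typical_all[labels]  # one representative row per original period

# A wind-only reconstruction with the same order and centers
reconstructed_wind_only = wind[wind_centers][labels]
print(np.array_equal(reconstructed_all[:, 0], reconstructed_wind_only))
```

Because the wind column of every representative comes from the same original days as in the wind-only run, the two reconstructed wind series coincide.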

In [None]:
typical_periods_predef_with_centers = result_predef_with_centers.typical_periods

In [None]:
typical_periods_predef_with_centers.shape

Save the typical periods to a .csv file

In [None]:
typical_periods_predef_with_centers.to_csv(
    os.path.join("results", "testperiods_predef_order_and_centers.csv")
)

In [None]:
reconstructed_predef_with_centers = result_predef_with_centers.reconstruct()

In [None]:
# Predicted wind heatmap (predefined cluster order and centers)
tsam.plot.heatmap(
    reconstructed_predef_with_centers,
    column="Wind",
    period_hours=24,
    title="Predicted Wind (Predefined Order & Centers)",
)

Now even the chosen typical days for the four attributes are the same as those of the wind-only aggregation.