# Aggregation testing

Let's see how TSAM works.
The first challenge is simply getting NE-model data into TSAM.
It is probably easiest to read it from the `.gdx` files directly, as the raw NE-data
comes in annoyingly varied shapes.

In [None]:
## Import necessary packages

import gams.transfer as gt # Read .gdx data
import tsam.timeseriesaggregation as tsam # Timeseries aggregation.


In [None]:
## Define stuff for testing
# NOTE! You will likely have to tweak these to get things working
# depending on how you've installed the NE-model.

scenario_name = "National_Trends_2040_nucTypical"
year = 1982
input_folder_path = f"./north_european_model/input_{scenario_name}/"


In [None]:
## Read timeseries and extract DataFrame for testing

gdx = gt.Container(input_folder_path + f"ts_cf_PV_{year}.gdx")
df = gdx["ts_cf"].records
df

In [None]:
## Pivot data for TSAM

df = df.pivot(
    index=["flow", "f", "t"],
    columns="node",
    values="value"
)
df
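
To make the reshaping step concrete, here is a minimal sketch of the same pivot on a tiny hand-made table. The node names (`FI00`, `SE01`) and the values are made up for illustration; only the column names are taken from the cell above. The point is that each node ends up as its own column with one row per time step, which is the wide layout passed to TSAM further down.

```python
import pandas as pd

# Hypothetical long-format records: one row per (flow, node, f, t) combination.
records = pd.DataFrame(
    {
        "flow": ["PV"] * 4,
        "node": ["FI00", "FI00", "SE01", "SE01"],
        "f": ["f00"] * 4,
        "t": ["t000001", "t000002", "t000001", "t000002"],
        "value": [0.0, 0.1, 0.2, 0.3],
    }
)

# Same pivot as above: time-like dimensions become the index, nodes become columns.
wide = records.pivot(index=["flow", "f", "t"], columns="node", values="value")
print(wide)
```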

In [None]:
## Aggregate using TSAM?

aggregation = tsam.TimeSeriesAggregation(
    df,
    noTypicalPeriods=4,
    hoursPerPeriod=24,
    clusterMethod="hierarchical",
    resolution=1,
    rescaleClusterPeriods=False, # This disables automatic rescaling of cluster data.
    segmentation=True, # This is required for hypertuning, but messes up the comparison.
)
typical_periods = aggregation.createTypicalPeriods()
aggregation.clusterCenterIndices

In [None]:
## Check cluster order.

aggregation.clusterOrder
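
As far as I understand it, the cluster order assigns one cluster label to each original period, so counting the labels tells how many original days each typical period stands for. A minimal sketch with a made-up assignment (the `cluster_order` list below is hypothetical, not output from the NE-model data):

```python
from collections import Counter

# Hypothetical assignment of 8 original days to 3 typical periods.
cluster_order = [0, 0, 1, 2, 1, 0, 2, 0]

# Number of original periods represented by each typical period.
occurrences = Counter(cluster_order)

# Share of the original data covered by each typical period.
weights = {cluster: n / len(cluster_order) for cluster, n in occurrences.items()}
print(occurrences, weights)
```

TSAM appears to expose similar counts itself (I believe as `aggregation.clusterPeriodNoOccur`), so this is only meant to show what the numbers mean.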

In [None]:
## Plot comparison?

typical_periods.plot()

In [None]:
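## Plot raw data for the corresponding indices.

# Each cluster centre is a period index; expand it to the row positions of all its hours.
cluster_inds = aggregation.clusterCenterIndices
hour_inds = [
    val * aggregation.hoursPerPeriod + i
    for val in cluster_inds
    for i in range(aggregation.hoursPerPeriod)  # Start from 0 to include the first hour.
]
df2 = df.iloc[hour_inds]
df2.plot()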
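
The index arithmetic for picking out the raw hours of a period is easy to get off by one, so here is a minimal standalone sketch. It assumes periods are consecutive, non-overlapping blocks of rows starting at row 0; `period_hour_positions` is a hypothetical helper, not part of TSAM.

```python
def period_hour_positions(period_indices, hours_per_period):
    """Return the row positions of every hour in the given periods."""
    return [
        period * hours_per_period + hour
        for period in period_indices
        for hour in range(hours_per_period)  # starts at 0, so the first hour is kept
    ]


# Periods 0 and 2 of a 24-hour-per-period series cover rows 0..23 and 48..71.
positions = period_hour_positions([0, 2], hours_per_period=24)
print(positions[0], positions[23], positions[24], positions[-1])
```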

## Seems to work surprisingly easily?

Overall, using TSAM is surprisingly easy.
It should be doable to run this clustering for the entire NE-model, although there is no guarantee that it will be computationally feasible.
The `k_medoids` clustering is much more computationally intensive than e.g. the `hierarchical` clustering.

## What about hypertuning?

This supposedly finds better segment numbers and durations for the data.
However, it seems to require `segmentation=True`, and I'm not yet sure what that means.
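
As far as I can tell from the TSAM documentation, segmentation merges adjacent time steps inside each typical period into a smaller number of segments of variable length. The sketch below only illustrates that idea with a simple greedy merge of neighbouring hours; it is a stand-in of my own, not TSAM's actual algorithm, and `segment_profile` and the `pv_day` profile are made up.

```python
def segment_profile(values, n_segments):
    """Greedily merge adjacent time steps until n_segments remain.

    Each segment is stored as a (duration, mean) tuple.
    """
    segments = [(1, float(v)) for v in values]
    while len(segments) > n_segments:
        # Pick the adjacent pair of segments whose means are closest.
        i = min(
            range(len(segments) - 1),
            key=lambda k: abs(segments[k][1] - segments[k + 1][1]),
        )
        (d1, m1), (d2, m2) = segments[i], segments[i + 1]
        # Replace the pair with one segment holding the duration-weighted mean.
        segments[i:i + 2] = [(d1 + d2, (d1 * m1 + d2 * m2) / (d1 + d2))]
    return segments


# Hypothetical hourly PV capacity factors for one day.
pv_day = [0, 0, 0, 0, 0, 0, 0.1, 0.3, 0.5, 0.7, 0.8, 0.9,
          0.9, 0.8, 0.7, 0.5, 0.3, 0.1, 0, 0, 0, 0, 0, 0]
segments = segment_profile(pv_day, n_segments=6)
print(segments)
```

The durations still add up to 24 hours and the daily total is preserved; only the resolution within the day changes.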

WIP

In [None]:
## Test aggregation hypertuning
# The expensive search itself is in the next cell and disabled for now, as it takes forever.

import tsam.hyperparametertuning as hype
hyper = hype.HyperTunedAggregations(aggregation)

In [None]:
# THIS TAKES A LONG TIME! ~1 HOUR!
# Don't enable unless you really mean it!

#hyper.identifyParetoOptimalAggregation()

# Seems to run over all possible aggregations or something,
# no wonder it takes a while.

In [None]:
# The optimal segment period combination seems more promising,
# as it doesn't take ages to run.

reduction_factor = 4 * 168 / 8760 # Determine how much we want to reduce the data.
hyper.identifyOptimalSegmentPeriodCombination(reduction_factor)

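# So apparently 56 typical periods with 12 segments each is optimal, not 4 consecutive weeks?
# (hoursPerPeriod stays at 24 here, so the 12 is presumably segments per period, not hours.)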
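
The 56 and 12 reported above multiply to exactly the 4 * 168 = 672 time steps that the reduction factor asks for. The sketch below lists the other (periods, segments) pairs that hit the same budget. It assumes 24-hour periods and an 8760-hour year, as in the cells above. How TSAM actually searches the combinations is not shown here, and it may also consider pairs that fall below the budget.

```python
HOURS_PER_YEAR = 8760
HOURS_PER_PERIOD = 24

target_steps = 4 * 168  # four weeks of hourly time steps
reduction_factor = target_steps / HOURS_PER_YEAR

# There cannot be more typical periods than there are days in the year.
max_periods = HOURS_PER_YEAR // HOURS_PER_PERIOD

# All (number of periods, segments per period) pairs using exactly target_steps.
combos = [
    (target_steps // n_segments, n_segments)
    for n_segments in range(1, HOURS_PER_PERIOD + 1)
    if target_steps % n_segments == 0
    and target_steps // n_segments <= max_periods
]
print(round(reduction_factor, 4), combos)
```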