<a href="https://colab.research.google.com/github/ap0phasi/cerberusPy/blob/dev_headselection/tests/example_cerberus.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modular CerberusTS Experimentation

First we clone the repository so that we can work from its base directory (this is probably not necessary if using the installed package):

In [1]:
!git clone --branch dev_headselection https://github.com/ap0phasi/cerberusPy

Cloning into 'cerberusPy'...
remote: Enumerating objects: 284, done.
remote: Counting objects: 100% (284/284), done.
remote: Compressing objects: 100% (193/193), done.
remote: Total 284 (delta 132), reused 233 (delta 84), pack-reused 0
Receiving objects: 100% (284/284), 19.44 MiB | 16.34 MiB/s, done.
Resolving deltas: 100% (132/132), done.


Google Colab notebooks come with most of the packages we need, but we need to upgrade pandas and install accelerate. We also move into the cloned repository and install the package from source:

In [3]:
import os
os.chdir("cerberusPy")

In [None]:
!python setup.py install

Then we import the necessary CerberusTS modules.

In [4]:
from cerberus_ts import Cerberus, train_cerberus
from cerberus_ts import TimeseriesDataPreparer, ResponseGenerator
from cerberus_ts import Foresight, train_foresight

import pandas as pd

CerberusTS exposes some global settings through `CerberusConfig`. Here we enable `set_masked_norm_zero`:

In [5]:
from cerberus_ts import CerberusConfig
CerberusConfig.set_masked_norm_zero = True

## Dataset Loading

Next we need a pandas DataFrame of wide-format multivariate timeseries data. Here we will use the Jena Climate dataset, available at https://www.kaggle.com/datasets/mnassrib/jena-climate

In [6]:
df = pd.read_csv(r"data/jena_climate_2009_2016.csv",
                parse_dates=['Date Time'],
                index_col=['Date Time'])
df.index = pd.to_datetime(df.index, format='%d.%m.%Y %H:%M:%S')
df = df.iloc[:5000,:]

## User Argument Specification
Cerberus takes several arguments, each given as a dictionary keyed by head (call, response, and contexts), shown below:

In [7]:
# What fraction of the data in each call, response, and context window must be available for a timestamp to be used?
thresholds = {
    'call': 1,
    'response': 1,
    'context_0': 1,
    'context_1': 1,
    'context_2': 1
}

# How big should the call, context(s), and response windows be?
sizes = {
    'call': 24,
    'response': 8,
    'context_0': 24,
    'context_1': 12,
    'context_2': 6
}

# What timestep should each head use?
window_timesteps = {
    'call': '10T',
    'response': '10T',
    'context_0': '1H',
    'context_1': '2H',
    'context_2': '6H'
}

# Which columns should be used for each head?
feature_indexes = {
    'call': range(0,14),
    'response': [0, 1, 4],
    'context_0': range(0,14),
    'context_1': range(0,14),
    'context_2': range(0,14)
}
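
To make these settings concrete, the sketch below multiplies each window's size by its timestep to get the time span it covers, and shows one plausible reading of the thresholds as a minimum share of non-missing values. Both calculations are assumptions about how the settings combine, not code taken from CerberusTS. `window_is_usable` is a hypothetical helper, and the timesteps are written as `timedelta` objects rather than pandas offset strings.

```python
from datetime import timedelta

# Values copied from the `sizes` and `window_timesteps` dictionaries above;
# the timesteps are expressed as timedeltas instead of offset strings.
sizes = {'call': 24, 'response': 8, 'context_0': 24, 'context_1': 12, 'context_2': 6}
timesteps = {
    'call': timedelta(minutes=10),
    'response': timedelta(minutes=10),
    'context_0': timedelta(hours=1),
    'context_1': timedelta(hours=2),
    'context_2': timedelta(hours=6),
}

# Assumed relationship: span of a window = number of steps * step length.
spans = {head: sizes[head] * timesteps[head] for head in sizes}

for head, span in spans.items():
    print(f"{head}: {sizes[head]} steps -> {span}")


def window_is_usable(values, threshold):
    """Hypothetical check: is the share of non-missing values >= threshold?"""
    available = sum(v is not None for v in values)
    return available / len(values) >= threshold


# With a threshold of 1, a single missing value rules the window out.
print(window_is_usable([1.0, None, 2.0, 3.0], threshold=1))
print(window_is_usable([1.0, None, 2.0, 3.0], threshold=0.75))
```

Under this reading the call window spans 4 hours while `context_2` spans 36 hours, so the coarser contexts cover longer periods with fewer points.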

## Prepare Dataset

We will use CerberusTS's `TimeseriesDataPreparer` class to create PyTorch dataloaders for each of our heads. The data are scaled, downsampled, and coil-normalized so they can be passed to Cerberus directly.

In [8]:
# Initialize the preparer
preparer = TimeseriesDataPreparer(df, sizes, thresholds, feature_indexes, window_timesteps,
                                  train_len = 20_000,
                                  feature_range = (0, 1),
                                  batch_size = 100)

# Prepare the data
preparer.prepare_data()
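
For intuition about the scaling and downsampling steps, here is a minimal sketch in plain pandas on synthetic data. It is not the `TimeseriesDataPreparer` implementation: the column names and values are made up, averaging is only one possible way to downsample, and coil normalization is not shown.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a wide-format frame: one row per 10-minute
# timestamp, one column per variable (names and values are made up).
index = pd.date_range("2009-01-01", periods=144, freq="10min")
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "temp": rng.normal(5.0, 2.0, len(index)),
        "pressure": rng.normal(990.0, 5.0, len(index)),
    },
    index=index,
)

# Min-max scale each column into the (0, 1) feature range.
col_min, col_max = demo.min(), demo.max()
scaled = (demo - col_min) / (col_max - col_min)

# Downsample to an hourly timestep by averaging the six readings per hour
# (an assumption; the library may aggregate differently).
hourly = scaled.resample("60min").mean()

print(scaled.agg(["min", "max"]))
print(hourly.shape)
```

Each head would then see the same scaled data at its own timestep, which is why the coarser context windows need far fewer points to cover a day.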

## Foresight Training

First, we can optionally train Foresight to aid CerberusTS:

In [9]:
foresight = Foresight(sizes = sizes,
                      feature_indexes = feature_indexes,
                      csize = 128,
                      hsize = [128, 128],
                      pool_size = [1,1],
                      head_layers=["conv", "conv"])

In [10]:
foresight = train_foresight(foresight, preparer.dataloaders, num_epochs = 30)

Epoch [1/30], Loss: 0.023020256295179328
Epoch [2/30], Loss: 0.007176380101591349
Epoch [3/30], Loss: 0.005819759631219009
Epoch [4/30], Loss: 0.005187379340641201
Epoch [5/30], Loss: 0.00469901837874204
Epoch [6/30], Loss: 0.0042302799038589
Epoch [7/30], Loss: 0.003772135068041583
Epoch [8/30], Loss: 0.0033528091665357353
Epoch [9/30], Loss: 0.002933180200246473
Epoch [10/30], Loss: 0.0025599312083795667
Epoch [11/30], Loss: 0.0021246106573380528
Epoch [12/30], Loss: 0.0017459359850424032
Epoch [13/30], Loss: 0.001411839168285951
Epoch [14/30], Loss: 0.0010982749149358521
Epoch [15/30], Loss: 0.000823784073581919
Epoch [16/30], Loss: 0.0006202362871651228
Epoch [17/30], Loss: 0.0004509404070752983
Epoch [18/30], Loss: 0.0003140269465317639
Epoch [19/30], Loss: 0.00022466196853201836
Epoch [20/30], Loss: 0.00016240851332743962
Epoch [21/30], Loss: 0.00012098297437963386
Epoch [22/30], Loss: 9.539387579328226e-05
Epoch [23/30], Loss: 8.01778999933352e-05
Epoch [24/30], Loss: 6.82446275

## CerberusTS Training

With the Foresight model trained (and its weights frozen), we can pass it into a Cerberus model and train the remaining weights to generate predictions.

In [11]:
model = Cerberus(sizes=sizes,
                 feature_indexes=feature_indexes,
                 foresight=foresight,
                 pool_size = [1,1],
                 head_layers=["conv", "conv"])
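
The idea of training around a frozen sub-model can be sketched in plain PyTorch. This is a generic illustration under stated assumptions, not how CerberusTS freezes Foresight internally: `pretrained` and `head` are hypothetical stand-in modules, and the data are random.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: an already-trained auxiliary module and a new head.
pretrained = nn.Linear(4, 4)
head = nn.Linear(4, 1)

# Freeze the pretrained module so no gradients are computed for it.
for param in pretrained.parameters():
    param.requires_grad = False

model = nn.Sequential(pretrained, nn.ReLU(), head)

# Hand only the still-trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

frozen_before = pretrained.weight.detach().clone()

# One training step on random data.
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen weights are unchanged after the update.
print(torch.equal(pretrained.weight, frozen_before))
```

Only the head's weight and bias reach the optimizer here, so the frozen module acts as a fixed feature extractor during training.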

In [None]:
model = train_cerberus(model, preparer.dataloaders, num_epochs = 60)

Epoch [1/60], Loss: 0.0011595764306063453
Epoch [2/60], Loss: 0.0008236499105502541
Epoch [3/60], Loss: 0.0006850207651344438
Epoch [4/60], Loss: 0.0005727205229535078
Epoch [5/60], Loss: 0.0005031821720573741
Epoch [6/60], Loss: 0.00040569585961444925
Epoch [7/60], Loss: 0.00034079241731281703


## Results Review

CerberusTS has built-in functionality for generating responses and visualizing the results.

### Normalized Response Review

In [None]:
# Initialize the response generator
generator = ResponseGenerator(model, preparer.sliced_data, feature_indexes, preparer.max_change_dfs)

# Generate a response for a specific index
sel_index = 1234
generator.generate_response(sel_index)

generator.plot_normalized_responses()

### Unscaled Response Review

In [None]:
generator.plot_unscaled_responses(preparer.min_max_df, feature_indexes)
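
Unscaling presumably reverses the min-max scaling applied during data preparation. The sketch below shows that inverse for a single column, assuming a (0, 1) feature range. `unscale` is a hypothetical helper, and the actual `plot_unscaled_responses` method may involve further steps, such as undoing coil normalization.

```python
def unscale(scaled_values, col_min, col_max):
    """Map values from the (0, 1) range back to the column's original units."""
    return [v * (col_max - col_min) + col_min for v in scaled_values]


# Made-up example: a temperature column that ranged from -10 to 30 degrees.
restored = unscale([0.0, 0.5, 1.0], col_min=-10.0, col_max=30.0)
print(restored)
```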