In [3]:
!pip install einops

Collecting einops
  Downloading einops-0.8.1-py3-none-any.whl.metadata (13 kB)
Downloading einops-0.8.1-py3-none-any.whl (64 kB)
Installing collected packages: einops
Successfully installed einops-0.8.1


In [4]:
%load_ext autoreload
%autoreload 2

# Standard library
import gc
from dataclasses import dataclass

# Third-party
import matplotlib.pyplot as plt
import seaborn as sns
import torch

# Local modules from this repository
import datautils
from utils import init_dl_program, find_closest_train_segment
from hdst import HDST



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload




# **T-Rep tutorial**

The goal of this tutorial is to show you in depth:

1. How to instantiate T-Rep with the most important parameters.
2. How to train T-Rep.
3. How to use a trained model to encode test data, at different granularities.

This tutorial is more in-depth than the 'quick tutorial', as it aims to explain the parameters of the various functions used.

#### **1. Instantiate ``Args`` Configuration Class**

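The ``Args`` class is normally imported from the `train_eval.py` file, but it is redefined here for clarity. The first (commented-out) cell below documents the parameters with comments; the second cell defines the variant actually used in this notebook, with separate static and dynamic representation sizes.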

In [22]:
# @dataclass
# class Args:
#     # MODEL PARAMETERS
#     task_weights: dict # Weights to attribute to each pretext task
#     repr_dims: int = 128 # Latent representation dimensionality
#     time_embedding: str = None # Time embedding to use ('t2v_sin', 'fully_learnable_big', 'gaussian', 'hybrid' etc.), 'None' for no time embedding. All implemented time-embeddings are defined in models.time_embeddings.py

#     # TRAINING PARAMETERS
#     epochs: int = 80 # Maximum number of training epochs
#     iters: int = None # Maximum number of training iterations. Can be set to 'None' if epochs is set.
#     batch_size: int = 16 # Training batch size
#     lr: float = 0.001 # Learning rate
#     seed: int = 1234 # Random seed for reproducibility
#     max_train_length: int = 800 # Maximum sequence length (depends on your GPU memory).
#                            # Longer sequences will be cut into smaller sequences.

#     # CONFIGURATION
#     dataset: str = "" # Set to "" if using your own dataset. If you use UCR/UEA datasets, set this to the dataset name.
#     loader: str = "" # Set to "" if using your own dataset, otherwise "UEA" or "UCR"
#     gpu: int = 0 # The gpu no. used for training and inference (defaults to 0)
#     run_name: str = "" # Run name to save model
#     save_every = None # Save the model checkpoint every <save_every> iterations/epochs
#     max_threads = None # The maximum allowed number of threads used by this process. Set to None if unsure.
#     eval: bool = True # Evaluate model after training if True (doesn't work for custom datasets, only UCR/UEA/ETT/Yahoo)
#     irregular: float = 0.0 # Ratio of missing data (defaults to 0). Used for testing model resilience under missing data regime
#     label_ratio: float = 1.0 # Ratio of available training labels (defaults to 1.0). Used for testing model resilience under missing labels regime


In [5]:
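from dataclasses import dataclass
from typing import Optional

@dataclass
class Args:
    task_weights: dict                 # Weight of each pretext task in the total loss
    dataset: str = ""                  # "" for a custom dataset
    loader: str = ""                   # "" for a custom dataset
    gpu: int = 0                       # GPU index used for training and inference
    static_repr_dims: int = 128        # Static representation size (output_dims1)
    dynamic_repr_dims: int = 128       # Dynamic representation size (output_dims2)
    epochs: int = 80                   # Maximum number of training epochs

    run_name: str = ""
    batch_size: int = 16
    lr: float = 0.001
    max_train_length: int = 800        # Longer sequences are split into shorter ones
    iters: Optional[int] = None        # Maximum number of iterations (None: use epochs)
    save_every: Optional[int] = None
    seed: int = 1234
    max_threads: Optional[int] = None
    eval: bool = True
    irregular: float = 0.0             # Ratio of missing data

    sample_size: int = 50              # Passed to model.fit as k
    window_size: int = 100             # Passed to model.fit as w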

- Create an instance of `Args`, specifying the required argument (`task_weights`) and any others relevant to your use case.
- Initialise the device and random seeds with `init_dl_program`.

In [6]:
args = Args(
    static_repr_dims=128,
    dynamic_repr_dims=128,
    task_weights={
        'local_static_contrast': 1,
        'global_vatiant_contrast': 0,
        'dynamic_trend_pred': 0,
    },
    eval=False,
    batch_size=16,
)

device = init_dl_program(args.gpu, seed=args.seed, max_threads=args.max_threads)

#### **2. Load your data**

You can use any data, as long as it is an `np.ndarray` of shape $(N, T, C)$ where $N$ is the number of time-series instances, $T$ the number of timesteps per instance, and $C$ the number of channels.
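If your data is a collection of separate series, possibly of different lengths, one way to build the $(N, T, C)$ array is to pad shorter series at the end with NaN. The sketch below assumes the model treats NaN as missing observations (as TS2Vec-style encoders do); check this for your version of the code. The helper `to_ntc` is hypothetical and not part of this repository.

```python
import numpy as np

def to_ntc(series_list):
    """Hypothetical helper: stack series of shape (T_i, C) into one (N, T_max, C) array.

    Shorter series are padded at the end with NaN.
    """
    n_channels = series_list[0].shape[1]
    t_max = max(s.shape[0] for s in series_list)
    out = np.full((len(series_list), t_max, n_channels), np.nan, dtype=np.float64)
    for i, s in enumerate(series_list):
        if s.shape[1] != n_channels:
            raise ValueError("all series must have the same number of channels")
        out[i, : s.shape[0]] = s
    return out

# A single multivariate series of shape (T, C) only needs a leading instance axis
single = np.zeros((100, 7))
print(single[np.newaxis].shape)  # (1, 100, 7)

batch = to_ntc([np.ones((5, 2)), np.ones((3, 2))])
print(batch.shape)  # (2, 5, 2)
```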

Here, we use the ETTh1 forecasting dataset as an example, loaded with `datautils.load_forecast_csv`, but you can use any dataset of your own.

**N.B:** For the following cell to work, you need to have downloaded the ETT datasets and placed the CSV files (e.g. `ETTh1.csv`) in the `datasets/` folder as instructed in the `README.md`.

In [7]:
data, train_slice, valid_slice, test_slice, scaler, pred_lens = datautils.load_forecast_csv("ETTh1")
train_data = data[:, train_slice]
test_data = data[:, test_slice]
print(f"Shapes - train data: {train_data.shape}, test data: {test_data.shape}")

Shapes - train data: (1, 8640, 7), test data: (1, 2880, 7)


#### **3. Create and train T-Rep**

In [8]:
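sns.set_theme()           # seaborn styling for any plots below
torch.cuda.empty_cache()  # release cached GPU memory before training
gc.collect()              # run garbage collection; returns the number of unreachable objects found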

2767

In [9]:
model = HDST(
    input_dims=train_data.shape[-1],
    device=device,
    task_weights=args.task_weights,
    batch_size=args.batch_size,
    lr=args.lr,
    output_dims1=args.static_repr_dims,
    output_dims2=args.dynamic_repr_dims,
    max_train_length=args.max_train_length
)

loss_log = model.fit(
    train_data,
    n_epochs=args.epochs,
    n_iters=args.iters,
    k=args.sample_size,
    w=args.window_size
)

Training data shape: (10, 864, 7)
Epoch #0: loss=2.6888856887817383
Epoch #1: loss=1.8564746379852295
Epoch #2: loss=1.670044183731079
Epoch #3: loss=1.6827473640441895
Epoch #4: loss=1.0348690748214722
Epoch #5: loss=1.0289524793624878
Epoch #6: loss=1.0608772039413452
Epoch #7: loss=0.8572996854782104
Epoch #8: loss=0.8317501544952393
Epoch #9: loss=0.8209478259086609
Epoch #10: loss=0.7570945024490356
Epoch #11: loss=0.5682134628295898
Epoch #12: loss=0.41528385877609253
Epoch #13: loss=0.45433613657951355
Epoch #14: loss=0.48441940546035767
Epoch #15: loss=0.4396260976791382
Epoch #16: loss=0.3055073618888855
Epoch #17: loss=0.28080248832702637
Epoch #18: loss=0.1545635163784027
Epoch #19: loss=0.16639582812786102
Epoch #20: loss=0.1722675859928131
Epoch #21: loss=0.1796943098306656
Epoch #22: loss=0.25509920716285706
Epoch #23: loss=0.10068278759717941
Epoch #24: loss=0.2009529173374176
Epoch #25: loss=0.12905964255332947
Epoch #26: loss=0.12954238057136536
Epoch #27: loss=0.14839

In [10]:
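import torch

# Pickle the entire trained HDST object with torch.save.
# Reloading it requires the same class definitions to be importable, and from
# PyTorch 2.6 onwards torch.load must be called with weights_only=False for full objects.
torch.save(model, 'mymodel.pth')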


In [11]:
print(loss_log)

[2.6888856887817383, 1.8564746379852295, 1.670044183731079, 1.6827473640441895, 1.0348690748214722, 1.0289524793624878, 1.0608772039413452, 0.8572996854782104, 0.8317501544952393, 0.8209478259086609, 0.7570945024490356, 0.5682134628295898, 0.41528385877609253, 0.45433613657951355, 0.48441940546035767, 0.4396260976791382, 0.3055073618888855, 0.28080248832702637, 0.1545635163784027, 0.16639582812786102, 0.1722675859928131, 0.1796943098306656, 0.25509920716285706, 0.10068278759717941, 0.2009529173374176, 0.12905964255332947, 0.12954238057136536, 0.14839030802249908, 0.13741767406463623, 0.15255874395370483, 0.09856972843408585, 0.1176132783293724, 0.0992179661989212, 0.10618412494659424, 0.08062867075204849, 0.08928942680358887, 0.0496666356921196, 0.05954549461603165, 0.06291511654853821, 0.05954553931951523, 0.07728126645088196, 0.08167474716901779, 0.10672558099031448, 0.08187730610370636, 0.10869069397449493, 0.051905471831560135, 0.05297733098268509, 0.08961204439401627, 0.0935868024

The next cell tests a local definition of the `max_cross_corr` function from `utils.py` on two windows saved to disk (`window1.pth` and `window2.pth`).

In [74]:
import torch

def max_cross_corr(window1, window2):
    """
    Compute the maximum cross-correlation between window1 and window2,
    channel by channel. Both windows have shape (L, C).
    """
    print("window1:", window1.shape)
    print("window2:", window2.shape)

    L, C = window1.shape  # L: window length; C: number of feature channels
    window1 = window1.permute(1, 0).contiguous()  # (C, L)
    window2 = window2.permute(1, 0).contiguous()
    # Remove the per-channel mean
    window1 = window1 - window1.mean(dim=-1, keepdim=True)
    window2 = window2 - window2.mean(dim=-1, keepdim=True)

    # Correlation theorem: irfft(X * conj(Y)) gives the circular cross-correlation
    window1_fft = torch.fft.rfft(window1, dim=-1)
    window2_fft = torch.fft.rfft(window2, dim=-1)
    X = window1_fft * torch.conj(window2_fft)

    # Normalise by the product of standard deviations (constant channels use 1).
    # Note: the result is not also divided by L, so it is not bounded by 1.
    power_norm = (window1.std(dim=-1, keepdim=True) * window2.std(dim=-1, keepdim=True)).to(X.dtype)
    power_norm = torch.where(power_norm == 0, torch.ones_like(power_norm), power_norm)
    X = X / power_norm

    cc = torch.fft.irfft(X, n=L, dim=-1)  # (C, L): one value per circular lag
    print("cc:", cc.shape)
    max_cc = cc.max(dim=-1).values  # (C,): best lag per channel

    return max_cc

# Load two windows previously saved as tensors of shape (L, C)
window1 = torch.load('window1.pth')
window2 = torch.load('window2.pth')

maxcc = max_cross_corr(window1, window2)
corr = torch.mean(maxcc)  # average over channels
print(corr)


window1: torch.Size([50, 7])
window2: torch.Size([50, 7])
cc: torch.Size([7, 50])
tensor(23.5859, device='cuda:0')
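A correlation coefficient lies in $[-1, 1]$, so a mean of about 23.6 indicates a scaling issue: the `irfft` output at each lag is a sum of $L$ products (here $L = 50$), and the function above divides it only by the product of the standard deviations. Below is a minimal NumPy sketch of a bounded version, assuming circular (wrap-around) lags are intended, as in the original FFT code without zero-padding. The function name is hypothetical. In the PyTorch function above, the equivalent change would be to also divide by `L` and compute `std` with `correction=0` (population standard deviation).

```python
import numpy as np

def max_cross_corr_normalized(window1, window2):
    """Per-channel maximum circular cross-correlation, bounded in [-1, 1].

    window1, window2: arrays of shape (L, C). Returns an array of shape (C,).
    """
    L = window1.shape[0]
    x = window1.T - window1.T.mean(axis=-1, keepdims=True)  # (C, L), zero mean
    y = window2.T - window2.T.mean(axis=-1, keepdims=True)
    # Correlation theorem: cc[:, k] = sum_t x[:, (t + k) % L] * y[:, t]
    cc = np.fft.irfft(np.fft.rfft(x, axis=-1) * np.conj(np.fft.rfft(y, axis=-1)), n=L, axis=-1)
    # Dividing by L * std(x) * std(y) (population stds) turns each lag into a
    # Pearson-style coefficient; Cauchy-Schwarz bounds its magnitude by 1.
    denom = L * x.std(axis=-1) * y.std(axis=-1)
    denom = np.where(denom == 0, 1.0, denom)  # constant channel: avoid division by zero
    return (cc / denom[:, None]).max(axis=-1)

rng = np.random.default_rng(0)
a = rng.standard_normal((50, 7))
b = np.roll(a, 5, axis=0)  # b is a circularly shifted copy of a
print(max_cross_corr_normalized(a, b))  # approximately 1.0 for every channel
```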


#### **4. Encode representations with the trained model**

When creating the train and test instances of a dataset, there are two methods:
1. The most common for classification and clustering is to separate your train and test datasets by choosing different time-series instances. This means that for a dataset $X \in \mathbb{R}^{N \times T \times C}$, you define $X_{train}$ by slicing $X$ along the instance (batch) axis: `X_train = X[:n_train, :, :]`.
2. For forecasting and anomaly detection, another way to build your train and test set is to use all instances up to timestep $T_{train}$ for training, and further timesteps for testing: `X_train = X[:, :T_train, :]`.
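Both splitting strategies can be sketched on a toy NumPy array of shape $(N, T, C)$; the sizes `n_train` and `T_train` are arbitrary illustrative values.

```python
import numpy as np

# Toy dataset: N=4 instances, T=100 timesteps, C=3 channels
X = np.random.default_rng(0).standard_normal((4, 100, 3))

# Method 1 (classification/clustering): split along the instance axis
n_train = 3
X_train_inst, X_test_inst = X[:n_train], X[n_train:]

# Method 2 (forecasting/anomaly detection): split along the time axis
T_train = 80
X_train_time, X_test_time = X[:, :T_train], X[:, T_train:]

print(X_train_inst.shape, X_test_inst.shape)  # (3, 100, 3) (1, 100, 3)
print(X_train_time.shape, X_test_time.shape)  # (4, 80, 3) (4, 20, 3)
```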

If using the first method, please skip to section `4.1`. If using the second method, please read through section `4.0`.

##### **4.0. Using correct test-set time indices**

When splitting your train and test sets along the time axis, it is important to correctly label the timesteps corresponding to the test set. T-Rep uses timesteps to compute time-embeddings, so one shouldn't naively use timesteps $[T_{train}, \dots, T_{end}]$, which the time-embedding module never saw during training, or reindex the test set from timestep 0, which arbitrarily assigns it the timesteps of the beginning of the training period.

As the model was trained on an earlier section of the dataset, $X_{train} = [x_{t_0}, \dots, x_{T_{train}}]$ with timesteps $[t_0, \dots, T_{train}]$, we look for the subsequence of $X_{train}$ that most closely resembles the test set. Call that subsequence $[x_{t_a}, \dots, x_{t_b}]$, spanning timesteps $t_a$ to $t_b$. When encoding the test set, we then feed T-Rep $X_{test}$ alongside that subsequence's timesteps $[t_a, \dots, t_b]$. This ensures we don't feed out-of-distribution inputs (timesteps) to the time-embedding module.

Finding the segment of the train data closest to $X_{test}$ is straightforward with the `find_closest_train_segment` function, which slides a window over the train data and compares it to the test set using the Euclidean distance.
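To make the idea concrete, here is a minimal NumPy sketch of such a sliding-window search. It is a hypothetical stand-in, not the implementation of `find_closest_train_segment`: it assumes the test series is no longer than the train series, compares the whole test series against every train window of the same length, and does not handle NaN. Taking the square root of the distances would not change which window is closest.

```python
import numpy as np

def closest_segment_indices(train, test):
    """Hypothetical sketch of a closest-segment search.

    train: (N, T_train, C), test: (N, T_test, C) with T_test <= T_train.
    For each instance, find the start t_a minimising the squared Euclidean
    distance to the test series and return timesteps t_a .. t_a + T_test - 1
    as an integer array of shape (N, T_test, 1).
    """
    n, t_train, _ = train.shape
    t_test = test.shape[1]
    if t_test > t_train:
        raise ValueError("the test series must not be longer than the train series")
    out = np.empty((n, t_test, 1), dtype=np.int64)
    for i in range(n):
        dists = [
            np.sum((train[i, s : s + t_test] - test[i]) ** 2)
            for s in range(t_train - t_test + 1)
        ]
        t_a = int(np.argmin(dists))
        out[i, :, 0] = np.arange(t_a, t_a + t_test)
    return out

rng = np.random.default_rng(0)
train = rng.standard_normal((1, 300, 7))
test = train[:, 120:170] + 0.01 * rng.standard_normal((1, 50, 7))  # noisy copy of a train segment
idx = closest_segment_indices(train, test)
print(idx.shape, idx[0, 0, 0], idx[0, -1, 0])  # (1, 50, 1) 120 169
```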

In [94]:
closest_time_indices = find_closest_train_segment(
    train_data,
    test_data,
    squared_dist=True
)
closest_time_indices.shape

(1, 2880, 1)

##### **4.1. Encoding representations for forecasting and anomaly detection**

Encode representations at a timestep granularity (one representation vector per timestep), preserving the original data's temporality. This is typically what you might use for **forecasting** or **anomaly detection**.

In [None]:
test_repr_fine = model.encode(
    data=test_data,
    time_indices=closest_time_indices,
    mask=None, # Used for the Anomaly Detection protocol, can be ignored
    encoding_window=None, # Used to control the temporal granularity of the representation
    causal=True, # Whether to use causal convolutions (for forecasting you might want this) or not.
    sliding_length=1, # The length of sliding window. When this param is specified, a sliding inference would be applied on the time series.
    sliding_padding=100, # Contextual data length used for inference every sliding windows. The timestamp t's representation vector is computed using the observations located in [t - sliding_padding, t].
    batch_size=16,
    return_time_embeddings=False
)
print(f"Fine-grained (timestep-wise) representation shape: {test_repr_fine.shape}")

Fine-grained (timestep-wise) representation shape: (1, 2880, 128)
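The comments in the cell above state that, with sliding inference, the representation at timestep $t$ is computed from the observations in `[t - sliding_padding, t]`. The sketch below enumerates those context windows, assuming TS2Vec-style sliding inference (the window also extends `sliding_padding` steps to the right when `causal=False`). `sliding_context_windows` is a hypothetical helper, not part of the HDST API, and it clips ranges at the series boundaries for simplicity (an implementation may pad with NaN instead).

```python
def sliding_context_windows(ts_len, sliding_length, sliding_padding, causal=True):
    """Hypothetical sketch of the context each sliding-inference step reads.

    Returns (ctx_start, ctx_end, out_start, out_end) half-open ranges: the encoder
    reads observations [ctx_start, ctx_end) and keeps the representations for
    timesteps [out_start, out_end). Ranges are clipped to [0, ts_len).
    """
    windows = []
    for i in range(0, ts_len, sliding_length):
        ctx_start = max(i - sliding_padding, 0)
        right_context = 0 if causal else sliding_padding
        ctx_end = min(i + sliding_length + right_context, ts_len)
        windows.append((ctx_start, ctx_end, i, min(i + sliding_length, ts_len)))
    return windows

# Settings of the cell above: sliding_length=1, sliding_padding=100, causal=True
w = sliding_context_windows(2880, sliding_length=1, sliding_padding=100, causal=True)
print(len(w))   # 2880: one step per timestep
print(w[500])   # (400, 501, 500, 501): timestep 500 sees observations 400..500
```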


##### **4.2. Encoding representations for classification and clustering**

Encode representations at an instance granularity (one representation vector per time-series instance), eliminating the temporal dimension of the data. This is more typically used for **classification** or **clustering**. For these tasks, we often discard the temporal dimension as we care more about **inter-instance** differences than **intra-instance** differences. In most cases, reducing each instance to one representation vector is enough, and helps reduce the intrinsic dimensionality of our problem.

To encode representations at an instance granularity, simply set `encoding_window='full_series'`. This applies a maxpool operation to the temporal dimension of the representation with a kernel size equal to the length of the time series, resulting in a temporal dimension of 1, which is dropped from the output shape. You thus obtain one representation vector per time-series instance.
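With a kernel as long as the series, this max-pooling reduces to taking the maximum over the time axis for each representation channel. A NumPy sketch on a hypothetical timestep-level representation array of shape $(N, T, D)$:

```python
import numpy as np

def full_series_pool(repr_ntd):
    """Max over the time axis: (N, T, D) -> (N, D)."""
    return repr_ntd.max(axis=1)

repr_ntd = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)  # toy representations
pooled = full_series_pool(repr_ntd)
print(pooled.shape)  # (2, 3)
print(pooled[0])     # [ 9. 10. 11.]
```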

In [None]:
test_repr_coarse = model.encode(
    data=test_data,
    time_indices=closest_time_indices,
    mask=None, # Used for the Anomaly Detection protocol, can be ignored
    encoding_window='full_series', # Used to control the temporal granularity of the representation
    causal=False, # Whether to use causal convolutions (for forecasting for instance) or not.
    sliding_length=None, # The length of sliding window. When this param is specified, a sliding inference would be applied on the time series.
    sliding_padding=0, # Contextual data length used for inference every sliding windows.
    batch_size=16,
    return_time_embeddings=False
)
print(f"Instance-wide representation shape: {test_repr_coarse.shape}")

Instance-wide representation shape: (1, 128)


##### **4.3. Encoding representations with custom temporal granularity**

Encode representations at a custom temporal granularity, by setting the ``encoding_window`` parameter to an integer. This integer specifies the kernel size that will be used to apply a **maxpool** operation to the timestep-level representation. This may be desirable for more advanced use cases in classification, clustering, anomaly detection, forecasting, or any other downstream tasks.

This works by first computing the representation at full temporal granularity, i.e. with the same number of timesteps as the original data. A maxpool operation is then applied to the temporal dimension of the representation, with a kernel size controlled by the `encoding_window` parameter of the encoding function. The `stride` and `padding` are both set to `encoding_window // 2`. The representation's temporal length can therefore be predetermined using the usual maxpool output-size formula.
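As a worked check, PyTorch's `MaxPool1d` output length (dilation 1, `ceil_mode=False`) is $\lfloor (T + 2p - k)/s \rfloor + 1$. For $T = 2880$, $k = 50$ and $s = p = 25$, this gives $\lfloor 2880/25 \rfloor + 1 = 116$. The cell below reports 115, one fewer. That would be consistent with the implementation dropping the last pooled step when `encoding_window` is even, as TS2Vec's encoder does, but this is an assumption about the HDST code rather than something shown here.

```python
def maxpool_out_len(length, kernel, stride, padding, dilation=1):
    """Output length of torch.nn.MaxPool1d with ceil_mode=False."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

w = 50  # encoding_window used in the cell below
print(maxpool_out_len(2880, kernel=w, stride=w // 2, padding=w // 2))  # 116
```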

In [None]:
test_repr_custom = model.encode(
    data=test_data,
    time_indices=closest_time_indices,
    mask=None, # Used for the Anomaly Detection protocol, can be ignored
    encoding_window=50, # Used to control the temporal granularity of the representation
    causal=False, # Whether to use causal convolutions (for forecasting for instance) or not.
    sliding_length=None, # The length of sliding window. When this param is specified, a sliding inference would be applied on the time series.
    sliding_padding=0, # Contextual data length used for inference every sliding windows.
    batch_size=16,
    return_time_embeddings=False
)
print(f"Custom temporal resolution representation shape: {test_repr_custom.shape}")

Custom temporal resolution representation shape: (1, 115, 128)


The ``encoding_window`` parameter of the encoding function also accepts the special value `'multiscale'`. This concatenates representations at multiple temporal granularities, so that the representation at every timestep incorporates both global and local information. The representation dimensionality therefore grows, while the temporal resolution stays unchanged.
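A minimal NumPy sketch of a multiscale encoding, loosely following the TS2Vec scheme (an assumption about how HDST does it): max-pool the timestep-level representation with stride 1 and odd kernel sizes $2^{p+1}+1$ (3, 5, 9, 17, ...) shorter than the sequence, then concatenate the results along the feature axis. The helper names are hypothetical, and the sketch does not claim to reproduce the number of scales HDST uses (the cell below reports 1024 = 8 × 128 features).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def maxpool_same(x, kernel):
    """Max-pool (N, T, D) along time with stride 1 and padding kernel // 2.

    For odd kernels the output keeps the input length T.
    """
    pad = kernel // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (0, 0)), constant_values=-np.inf)
    windows = sliding_window_view(padded, kernel, axis=1)  # (N, T, D, kernel)
    return windows.max(axis=-1)

def multiscale_sketch(x):
    """Concatenate max-pooled copies at kernel sizes 3, 5, 9, 17, ... (< T)."""
    t = x.shape[1]
    scales = []
    p = 0
    while (1 << (p + 1)) + 1 < t:
        scales.append(maxpool_same(x, (1 << (p + 1)) + 1))
        p += 1
    return np.concatenate(scales, axis=-1)

x = np.random.default_rng(0).standard_normal((1, 20, 4))
print(multiscale_sketch(x).shape)  # (1, 20, 16): kernels 3, 5, 9, 17 give 4 scales
```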

In [None]:
test_repr_multiscale = model.encode(
    data=test_data,
    time_indices=closest_time_indices,
    mask=None, # Used for the Anomaly Detection protocol, can be ignored
    encoding_window='multiscale', # Used to control the temporal granularity of the representation
    causal=False, # Whether to use causal convolutions (for forecasting for instance) or not.
    sliding_length=1, # The length of sliding window. When this param is specified, a sliding inference would be applied on the time series.
    sliding_padding=100, # Contextual data length used for inference every sliding windows.
    batch_size=16,
    return_time_embeddings=False
)
print(f"Multiscale representation shape: {test_repr_multiscale.shape}")

Multiscale representation shape: (1, 2880, 1024)


#### **That's it!**

This is all you need to know to use T-Rep. The resulting `np.ndarray` of representations can then be used as input to any downstream task, such as classification, clustering, forecasting, or anomaly detection.