# Time-Series Generation using Contrastive Learning

Consider learning a generative model for time-series data.

The sequential setting poses a unique challenge: Not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories.

On the one hand, autoregressive models trained by maximum likelihood (MLE) allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts.
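The compounding-error problem can be made concrete with a toy example. This is a minimal sketch under stated assumptions: the "true" process adds a drift of 1.0 per step, and the learned one-step model overestimates it as 1.1. Both numbers are hypothetical. Under teacher forcing, which is how MLE trains, each prediction starts from the true previous state, so the error stays at 0.1. In an open-loop rollout the model feeds on its own predictions, so the same 0.1 bias accumulates.

```python
import numpy as np

# Toy assumption: true dynamics add 1.0 per step; the learned model adds 1.1.
TRUE_DRIFT, MODEL_DRIFT = 1.0, 1.1

def true_trajectory(x0: float, horizon: int) -> np.ndarray:
    """Ground-truth states x_0, ..., x_horizon."""
    return x0 + TRUE_DRIFT * np.arange(horizon + 1)

def teacher_forced_errors(x0: float, horizon: int) -> np.ndarray:
    """Each prediction starts from the TRUE previous state (the MLE training regime)."""
    true = true_trajectory(x0, horizon)
    preds = true[:-1] + MODEL_DRIFT
    return np.abs(preds - true[1:])

def open_loop_errors(x0: float, horizon: int) -> np.ndarray:
    """Each prediction starts from the model's OWN previous prediction (a rollout)."""
    true = true_trajectory(x0, horizon)
    rollout = x0 + MODEL_DRIFT * np.arange(horizon + 1)
    return np.abs(rollout[1:] - true[1:])

tf = teacher_forced_errors(0.0, 20)
ol = open_loop_errors(0.0, 20)
print(tf[-1], ol[-1])  # stepwise error stays small, rollout error grows with the horizon
```

A per-step (MLE) loss never sees the second error curve. This is the exposure bias that trajectory-level objectives try to address.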

On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess.

In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) *transition policy*, where the reinforcement signal is provided by a global (but stepwise-decomposable) *energy model* trained by contrastive estimation.
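The idea of a "global but stepwise-decomposable" energy trained by contrastive estimation can be illustrated with a small sketch. It is not the authors' implementation. Assumptions:

- The energy is linear in two hand-picked per-transition features, φ(x_{t-1}, x_t) = (x_{t-1}·x_t, x_t²).
- It is averaged over time steps, which makes it decomposable.
- It is fit with a logistic (NCE-style) loss that assigns low energy to real trajectories and high energy to samples from a *fixed* stand-in policy.

AR(1) processes stand in for both the dataset and the policy. The reinforcement step that would update the policy against this energy is omitted.

```python
import numpy as np

T = 50  # trajectory length (assumed)

def ar1(n: int, a: float, rng: np.random.Generator) -> np.ndarray:
    """n toy trajectories of x_t = a * x_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    x = np.zeros((n, T))
    for t in range(1, T):
        x[:, t] = a * x[:, t - 1] + rng.standard_normal(n)
    return x

def step_features(x: np.ndarray) -> np.ndarray:
    """Hand-picked per-transition features phi(x_{t-1}, x_t), shape (n, T-1, 2)."""
    prev, curr = x[:, :-1], x[:, 1:]
    return np.stack([prev * curr, curr ** 2], axis=-1)

def energy(x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Stepwise-decomposable energy: mean over t of theta . phi(x_{t-1}, x_t)."""
    return (step_features(x) @ theta).mean(axis=1)

def train_energy(real, fake, lr=0.05, steps=500):
    """Logistic (NCE-style) contrastive estimation: low energy <=> 'real'."""
    phi = np.concatenate([step_features(real).mean(axis=1),
                          step_features(fake).mean(axis=1)])  # (n_real + n_fake, 2)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(fake))])
    theta, b = np.zeros(phi.shape[1]), 0.0
    for _ in range(steps):
        logits = b - phi @ theta            # equals b - energy(x, theta)
        p = 1.0 / (1.0 + np.exp(-logits))   # model probability of "real"
        g = (p - y) / len(y)                # d(mean BCE) / d(logits)
        theta -= lr * (-(phi.T @ g))        # chain rule: d(logits)/d(theta) = -phi
        b -= lr * g.sum()
    return theta, b

rng = np.random.default_rng(0)
real = ar1(256, 0.9, rng)  # stand-in for dataset trajectories
fake = ar1(256, 0.0, rng)  # stand-in for rollouts of the current policy
theta, b = train_energy(real, fake)
print(energy(real, theta).mean(), energy(fake, theta).mean())
```

At inference, the same `energy` function can rank generated trajectories, since lower energy means closer to the data. This mirrors the "trajectory-level measure" role described in the abstract.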

During **training**, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives.

At **inference**, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.

By expressly training a policy to imitate sequential behavior of time-series features in a dataset, this approach embodies *“generation by imitation”*. Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm.

Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets and show that it performs on par with existing benchmarks.

## 1 Setup

### 1.1 Install libraries

Run the cell below to **install** the necessary libraries.

In [None]:
%pip install wandb
%pip install pytorch-lightning
%pip install matplotlib
%pip install numpy
%pip install pandas
%pip install scikit-learn

Replace or remove the commands below so that they match your machine: the index URL installs PyTorch wheels built for CUDA 11.8 (`cu118`).

In [None]:
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
%pip install cuda-python

In [None]:
import torch
print(torch.__version__)

### 1.2 Import Libraries

Run the cell below to **import** the necessary libraries.

In [None]:
import torch
import numpy as np

In [None]:
from pytorch_lightning.callbacks import EarlyStopping
from pytorch_lightning import Trainer

In [None]:
import wandb
from pytorch_lightning.loggers.wandb import WandbLogger

In [None]:
from hyperparameters import Config
import utilities as ut
import dataset_handling as dh


In [None]:
import warnings
warnings.filterwarnings("ignore")

### 1.3 Hyper-parameters

The cell below loads *all* the hyper-parameters needed by this notebook from the `Config` class in `hyperparameters.py`, so they can be tweaked in one place.

In [None]:
hparams = Config()
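The contents of `hyperparameters.py` are not shown in this notebook. The sketch below is a hypothetical stand-in (`ConfigSketch`) that lists the attributes the notebook reads from `hparams`. Every default value is a placeholder, not the project's setting. The split semantics in the comments (fraction of rows kept in the first output file) are an assumption.

```python
from dataclasses import dataclass

SUPPORTED_DATASETS = ("sine", "wien", "iid", "cov", "real")

@dataclass
class ConfigSketch:
    """Hypothetical stand-in for hyperparameters.Config; all values are placeholders."""
    # Which dataset to use and where the CSV files live
    dataset_name: str = "sine"
    dataset_folder: str = "../datasets/"
    # File names used only when dataset_name == 'real' (placeholders)
    train_file_name: str = "train.csv"
    test_file_name: str = "test.csv"
    val_file_name: str = "val.csv"
    # Size of the generated synthetic stream
    data_dim: int = 5
    num_samples: int = 10_000
    # Assumed: fraction of rows assigned to the first output file of each split
    train_test_split: float = 0.7
    train_val_split: float = 0.9
    # Window length and testing knobs (semantics defined in testing_loop)
    seq_len: int = 24
    limit: int = 100
    pic_frequency: int = 10

    def __post_init__(self):
        # Fail early on settings the data cells below would reject anyway
        if self.dataset_name not in SUPPORTED_DATASETS:
            raise ValueError(f"Dataset '{self.dataset_name}' not supported.")
        for name in ("train_test_split", "train_val_split"):
            if not 0.0 < getattr(self, name) < 1.0:
                raise ValueError(f"{name} must be in (0, 1).")
```

Validating in `__post_init__` surfaces a bad `dataset_name` when the config is built, rather than several cells later.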

Uncomment the cell below to log in to Weights & Biases; leave it commented if you don't want to use W&B to log the process.

In [None]:
#!wandb login

### 1.5 Initialization

Initialize the modules needed by running the cells in this section.

#### 1.5.1 Reproducibility

In [None]:
ut.set_seed(seed=1337)
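The body of `ut.set_seed` is not shown. A typical implementation seeds every random number generator the notebook might touch. The sketch below (`set_seed_sketch`, a hypothetical name) shows one common version. PyTorch Lightning also ships `seed_everything` for the same purpose. The PyTorch part is skipped if `torch` is not installed.

```python
import os
import random

import numpy as np

def set_seed_sketch(seed: int) -> None:
    """Hypothetical equivalent of ut.set_seed: seed Python, NumPy and (if present) PyTorch."""
    os.environ["PYTHONHASHSEED"] = str(seed)  # only affects subprocesses started later
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:  # allow the sketch to run without PyTorch
        return
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    # Trade some GPU speed for deterministic cuDNN kernels
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Usage: calling `set_seed_sketch(1337)` twice before drawing from `np.random` yields identical draws.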

#### 1.5.2 Device

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device {device}.")

#### 1.5.3 Data

Path to the folder containing the datasets.

In [None]:
datasets_folder = hparams.dataset_folder

Generate the requested synthetic dataset (skipped when `hparams.dataset_name` is `'real'`).

In [None]:
from data_generation import iid_sequence_generator, sine_process, wiener_process

In [None]:
if hparams.dataset_name in ['sine', 'wien', 'iid', 'cov']:
    # Generate the synthetic stream and store it in the datasets folder
    dataset_path = f"{datasets_folder}{hparams.dataset_name}_generated_stream.csv"
    if hparams.dataset_name == 'sine':
        sine_process.save_sine_process(p=hparams.data_dim, N=hparams.num_samples, file_path=dataset_path)
    elif hparams.dataset_name == 'wien':
        wiener_process.save_wiener_process(p=hparams.data_dim, N=hparams.num_samples, file_path=dataset_path)
    elif hparams.dataset_name == 'iid':
        iid_sequence_generator.save_iid_sequence(p=hparams.data_dim, N=hparams.num_samples, file_path=dataset_path)
    else:  # 'cov'
        iid_sequence_generator.save_cov_sequence(p=hparams.data_dim, N=hparams.num_samples, file_path=dataset_path)
    print(f"The {hparams.dataset_name} dataset has been successfully created and stored in:\n\t- {dataset_path}")
elif hparams.dataset_name == 'real':
    # Real datasets are read from disk as-is; nothing to generate
    pass
else:
    raise ValueError(f"Dataset '{hparams.dataset_name}' not supported.")
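The `data_generation` modules are not shown. The sketch below illustrates what a sine and a Wiener-process generator with this call shape (`p` channels, `N` time steps, CSV output) could look like. The function names are hypothetical, as are the frequency and phase ranges and the layout of one row per time step and one column per channel.

```python
import numpy as np

def save_sine_sketch(p: int, N: int, file_path: str, seed: int = 0) -> np.ndarray:
    """Hypothetical sine stream: p channels, N steps, random frequency/phase per channel."""
    rng = np.random.default_rng(seed)
    t = np.arange(N)[:, None]                    # (N, 1) time index
    freq = rng.uniform(0.01, 0.1, size=p)        # (p,) cycles per step (assumed range)
    phase = rng.uniform(0.0, 2 * np.pi, size=p)  # (p,)
    X = np.sin(2 * np.pi * freq * t + phase)     # (N, p) by broadcasting
    np.savetxt(file_path, X, delimiter=",")
    return X

def save_wiener_sketch(p: int, N: int, file_path: str, dt: float = 1.0, seed: int = 0) -> np.ndarray:
    """Hypothetical Wiener stream: cumulative sum of independent N(0, dt) increments."""
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, np.sqrt(dt), size=(N, p))
    X = np.cumsum(increments, axis=0)            # (N, p) Brownian paths
    np.savetxt(file_path, X, delimiter=",")
    return X
```

Because the files are plain comma-separated rows, they can be read back with `np.loadtxt(path, delimiter=",")`, as the split cell below does.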

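Train / validation / test split. Synthetic streams are split here; real datasets are expected to be already split into the files named in `hparams`.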

In [None]:
if hparams.dataset_name in ['sine', 'wien', 'iid', 'cov']:
    train_dataset_path = f"{datasets_folder}{hparams.dataset_name}_training.csv"
    test_dataset_path  = f"{datasets_folder}{hparams.dataset_name}_testing.csv"
    val_dataset_path   = f"{datasets_folder}{hparams.dataset_name}_validating.csv"

    # Train & Test: split the full generated stream
    dh.train_test_split(X=np.loadtxt(dataset_path, delimiter=",", dtype=np.float32),
                        split=hparams.train_test_split,
                        train_file_name=train_dataset_path,
                        test_file_name=test_dataset_path)

    # Train & Validation: split the training set again (overwrites the training file)
    dh.train_test_split(X=np.loadtxt(train_dataset_path, delimiter=",", dtype=np.float32),
                        split=hparams.train_val_split,
                        train_file_name=train_dataset_path,
                        test_file_name=val_dataset_path)

    print(f"The {hparams.dataset_name} dataset has been split successfully into:\n\t- {train_dataset_path}\n\t- {val_dataset_path}\n\t- {test_dataset_path}")
elif hparams.dataset_name == 'real':
    # Real datasets are already split on disk
    train_dataset_path = datasets_folder + hparams.train_file_name
    test_dataset_path  = datasets_folder + hparams.test_file_name
    val_dataset_path   = datasets_folder + hparams.val_file_name
else:
    raise ValueError("Dataset not supported.")
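`dh.train_test_split` is not shown either. For time series, a common choice is a *chronological* split with no shuffling. The test rows then lie strictly after the training rows, so no future values leak into training. The sketch below (`train_test_split_sketch`, hypothetical) assumes `split` is the fraction of rows written to the first file. It reads `X` fully before writing, so reusing the input path as an output path, as the validation split above does, is safe.

```python
import numpy as np

def train_test_split_sketch(X: np.ndarray, split: float,
                            train_file_name: str, test_file_name: str) -> int:
    """Hypothetical chronological split: first `split` fraction of rows -> train, rest -> test."""
    if not 0.0 < split < 1.0:
        raise ValueError("split must be in (0, 1)")
    cut = int(len(X) * split)  # no shuffling: preserve temporal order
    np.savetxt(train_file_name, X[:cut], delimiter=",")
    np.savetxt(test_file_name, X[cut:], delimiter=",")
    return cut
```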

### 1.6 Model

This cell loads the TimeGAN model class.

In [None]:
from timegan_model import TimeGAN

## 2 Train

This section trains the model with the hyper-parameters defined in section [Hyper-parameters](#13-hyper-parameters).

In [None]:
from training_loop import train

In [None]:
train(datasets_folder=datasets_folder)

## 3 Testing

In [None]:
import testing_loop as test

Load the model

In [None]:
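timegan = TimeGAN(hparams=hparams,
                  train_file_path=train_dataset_path,
                  val_file_path=val_dataset_path)
# map_location="cpu" lets a checkpoint saved on GPU be loaded on a CPU-only machine
timegan.load_state_dict(torch.load(f"./timegan-{hparams.dataset_name}.pth", map_location="cpu"))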

timegan.eval()
print(f"TimeGAN {hparams.dataset_name} model loaded and ready for testing.")

Load the test dataset

In [None]:
test_dataset = dh.RealDataset(
                file_path=test_dataset_path,
                seq_len=hparams.seq_len
                )
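`dh.RealDataset` receives the raw stream and a `seq_len`. A common way such datasets turn an `(N, p)` stream into training examples is to cut overlapping windows with stride 1. The sketch below (`make_windows`, hypothetical) does this with NumPy's `sliding_window_view`. Whether `RealDataset` uses stride 1 is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(X: np.ndarray, seq_len: int) -> np.ndarray:
    """Hypothetical windowing: (N, p) stream -> (N - seq_len + 1, seq_len, p), stride 1."""
    if len(X) < seq_len:
        raise ValueError("stream is shorter than seq_len")
    # Windowing along axis 0 appends the window axis last: (N - seq_len + 1, p, seq_len)
    w = sliding_window_view(X, window_shape=seq_len, axis=0)
    # Reorder to (num_windows, seq_len, p) and copy out of the strided view
    return np.ascontiguousarray(w.transpose(0, 2, 1))
```

The copy at the end matters because `sliding_window_view` returns a read-only view that shares memory with `X`.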

## 3.1 Sequence Recovery

In [None]:
avg_rec_loss = test.recovery_seq_test(model=timegan,
                                      test_dataset=test_dataset,
                                      limit=hparams.limit,
                                      frequency=hparams.pic_frequency
                                      )

## 3.2 Sequence Generation

In [None]:
avg_gen_loss = test.generate_seq_test(model=timegan,
                                      test_dataset=test_dataset,
                                      limit=hparams.limit,
                                      frequency=hparams.pic_frequency
                                      )

## 3.3 Complete Generation

In [None]:
test.generate_stream_test(model=timegan,
                          test_dataset=test_dataset,
                          limit=hparams.limit,
                          folder_path="./test_results/",
                          save_pic=True,
                          compare=False
                          )

## 3.4 Distribution Visualization

In [None]:
test.distribution_visualization(model=timegan,
                                test_dataset=test_dataset,
                                limit=hparams.limit,
                                folder_path="./test_results/",
                                save_pic=True
                                )
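How `test.distribution_visualization` projects the data is not shown. The time-series generation literature commonly overlays real and synthetic windows in a 2-D PCA (or t-SNE) projection. The sketch below (`pca_project`, hypothetical) fits PCA on the flattened real windows with plain NumPy SVD. It then projects both sets onto the same axes, ready to be drawn with `matplotlib` scatter plots.

```python
import numpy as np

def pca_project(real: np.ndarray, synth: np.ndarray, n_components: int = 2):
    """Fit PCA on flattened real windows; project real and synthetic onto the same axes."""
    R = real.reshape(len(real), -1)    # (n_real, seq_len * p)
    S = synth.reshape(len(synth), -1)  # (n_synth, seq_len * p)
    mean = R.mean(axis=0)
    # Rows of Vt are principal directions of the centered real data, by decreasing variance
    _, _, Vt = np.linalg.svd(R - mean, full_matrices=False)
    W = Vt[:n_components].T            # (seq_len * p, n_components)
    return (R - mean) @ W, (S - mean) @ W
```

Fitting the axes on real data only keeps the picture honest: synthetic samples that miss the real modes show up as points outside the real cloud.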

## 3.5 Prediction

In [None]:
test.predictive_test(model=timegan,
                     test_dataset=test_dataset,
                     test_dataset_path=test_dataset_path,
                     folder_path="./test_results/",
                     save_pic=True,
                     limit=hparams.limit
                     )
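One standard way to measure whether samples are "predictively useful" is *train on synthetic, test on real* (TSTR). A predictor is fit only on generated sequences and then scored on real ones. Whether `test.predictive_test` follows this protocol is not shown. The sketch below swaps in a plain least-squares one-step linear predictor where RNN predictors are also commonly used. All function names are hypothetical.

```python
import numpy as np

def _pairs(windows: np.ndarray):
    """Flatten (n, seq_len, p) windows into (x_{t-1}, x_t) pairs."""
    p = windows.shape[-1]
    return windows[:, :-1, :].reshape(-1, p), windows[:, 1:, :].reshape(-1, p)

def fit_one_step_predictor(windows: np.ndarray) -> np.ndarray:
    """Least-squares linear map x_{t-1} -> x_t (with bias), pooled over windows and steps."""
    prev, curr = _pairs(windows)
    prev1 = np.hstack([prev, np.ones((len(prev), 1))])  # append bias column
    coef, *_ = np.linalg.lstsq(prev1, curr, rcond=None)
    return coef                                         # shape (p + 1, p)

def one_step_mae(coef: np.ndarray, windows: np.ndarray) -> float:
    """Mean absolute one-step prediction error on the given windows."""
    prev, curr = _pairs(windows)
    pred = np.hstack([prev, np.ones((len(prev), 1))]) @ coef
    return float(np.abs(pred - curr).mean())

def tstr_score(synthetic: np.ndarray, real_test: np.ndarray) -> float:
    """Train on Synthetic, Test on Real: lower MAE suggests more predictively useful samples."""
    return one_step_mae(fit_one_step_predictor(synthetic), real_test)
```

Comparing `tstr_score(synthetic, real_test)` with the score of a predictor trained on real training data gives a reference point: the closer the two, the better the synthetic data substitutes for the real data.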

## 3.6 Linear Deterministic Anomaly Detector

In these tests the model generates sequences, which a deterministic, PCA-based anomaly detector then scans for irregularities with respect to the real sequences.

In [None]:
FAR, TAR = test.AD_tests(model=timegan, test_dataset=test_dataset)
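The detector inside `test.AD_tests` is not shown. A standard linear, deterministic design fits PCA on real sequences and scores each new sequence by its squared reconstruction error outside the principal subspace (the SPE). It flags sequences whose error exceeds a high quantile of the errors on real data. The class below (`PCAAnomalyDetector`, hypothetical) sketches that design. The number of components and the 0.99 quantile are assumptions. The resulting flag rate is the kind of quantity such acceptance or alarm rates are built from.

```python
import numpy as np

class PCAAnomalyDetector:
    """Hypothetical linear detector: flag samples whose PCA reconstruction error (SPE)
    exceeds a quantile of the errors measured on real data."""

    def __init__(self, n_components: int = 2, quantile: float = 0.99):
        self.k = n_components
        self.q = quantile

    def fit(self, real: np.ndarray) -> "PCAAnomalyDetector":
        X = real.reshape(len(real), -1)  # flatten windows if given as (n, seq_len, p)
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.W_ = Vt[: self.k].T         # orthonormal basis of the principal subspace
        self.threshold_ = np.quantile(self._spe(X), self.q)
        return self

    def _spe(self, X: np.ndarray) -> np.ndarray:
        """Squared prediction error: energy of the residual outside the subspace."""
        Z = X - self.mean_
        residual = Z - (Z @ self.W_) @ self.W_.T
        return (residual ** 2).sum(axis=1)

    def flag_rate(self, samples: np.ndarray) -> float:
        """Fraction of samples flagged as anomalous."""
        X = samples.reshape(len(samples), -1)
        return float((self._spe(X) > self.threshold_).mean())
```

Being deterministic and linear, this detector has no trainable randomness. Its verdicts depend only on the real data used in `fit`, which makes it a stable yardstick for comparing generators.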