# Time-Series Generation using Contrastive Learning

Consider learning a generative model for time-series data.

The sequential setting poses a unique challenge: Not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories.

On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts.

On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess.

In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) *transition policy*, where the reinforcement signal is provided by a global (but stepwise-decomposable) *energy model* trained by contrastive estimation.

At **training**, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives. 

At **inference**, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.

By expressly training a policy to imitate sequential behavior of time-series features in a dataset, this approach embodies *“generation by imitation”*. Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm.

Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets, verifying that it performs at the standard of existing benchmarks.
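
As a toy illustration of the compounding error mentioned above, the sketch below compares teacher-forced (one-step) and open-loop errors for a scalar linear system. The function name `rollout_errors`, the dynamics `x[t+1] = a * x[t]`, and the coefficient values are assumptions chosen for illustration; they are not part of the framework described here.

```python
def rollout_errors(a_true=1.0, a_model=1.02, x0=1.0, steps=24):
    """Compare one-step and open-loop errors for x[t+1] = a * x[t].

    a_model is a slightly misestimated transition coefficient (hypothetical).
    """
    one_step, open_loop = [], []
    x_true, x_free = x0, x0
    for _ in range(steps):
        x_next = a_true * x_true
        # Teacher forcing: the model predicts from the true previous value.
        one_step.append(abs(a_model * x_true - x_next))
        # Open loop: the model predicts from its own previous output.
        x_free = a_model * x_free
        open_loop.append(abs(x_free - x_next))
        x_true = x_next
    return one_step, open_loop


one_step, open_loop = rollout_errors()
print(f"one-step error at the last step:  {one_step[-1]:.3f}")
print(f"open-loop error at the last step: {open_loop[-1]:.3f}")
```

The one-step error stays constant at every step, while the open-loop error grows with the rollout length because each prediction is fed back as the next input.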

## 1 Setup

### 1.1 Install libraries

Run the cell below to **install** the necessary libraries.

In [21]:
# !pip install wandb
# !pip install pytorch-lightning
# !pip install pyyaml
# !pip install torchvision
# !pip install plotly
!pip install tensorflow

### 1.2 Import Libraries

Run the cell below to **import** the necessary libraries.

In [12]:
# from typing import Sequence, List, Dict, Tuple, Optional, Any, Set, Union, Callable, Mapping
# import itertools

# import dataclasses
# from dataclasses import dataclass
# from dataclasses import asdict
# from pathlib import Path
# from pprint import pprint
# from urllib.request import urlopen
# import random

# from PIL import Image
# import PIL

# import torchvision.utils
# import matplotlib.pyplot as plt
# import plotly.graph_objects as go
# import plotly.express as px

# import numpy as np
# import torch
# import torchvision.transforms as transforms
# from torch.utils.data import DataLoader
# from torchvision.datasets import MNIST
# from torch.utils.data import DataLoader, Dataset
# from torch import nn, optim
# import torch.nn.functional as F

# import wandb
# import pytorch_lightning as pl
# from pytorch_lightning.loggers.wandb import WandbLogger
# from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

# import torchvision
# from torchvision import transforms
# from tqdm.notebook import tqdm

## Necessary packages
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import warnings
warnings.filterwarnings("ignore")

# 1. TimeGAN model
from timegan import timegan
# 2. Data loading
from data_loading import real_data_loading, sine_data_generation
# 3. Metrics
from metrics.discriminative_metrics import discriminative_score_metrics
from metrics.predictive_metrics import predictive_score_metrics
from metrics.visualization_metrics import visualization


import random

# Required by the Utils and Initialization cells below
import torch
import pytorch_lightning as pl

### 1.3 Hyper-parameters

The cell below contains *all* the hyper-parameters needed by this script, for easy tweaking.

In [None]:
c = 1.0 # . . . Domain bounds for the loss functions
M = 32 #. . . . Mini-batch size
lr = 0.0007 # . Learning Rate
k = 1.0 # . . . Regularization coefficient 

## Data loading
data_name = 'stock' # . . which dataset to use
seq_len = 24 #. . . . . . max length of the input sequence

## Network parameters
module = 'gru' #. . . . . Can be 'gru', 'lstm' or 'lstmLN'
hidden_dim = 24 # . . . . Hidden dimensions
num_layer = 3 # . . . . . Number of layers
iterations = 10000 #. . . Number of training iterations
batch_size = 128 #. . . . Number of samples in each batch

metric_iteration = 5 #. . Number of iterations for each metric

Settings that affect the notebook workflow rather than the model.

In [18]:
use_wandb = False # . . will require login for Weights & Biases

### 1.4 Utils

In [None]:
# Just a function to count the number of parameters
def count_parameters(model: torch.nn.Module) -> int:
  """ Counts the number of trainable parameters of a module

  :param model: model that contains the parameters to count
  :returns: the number of parameters in the model
  """
  return sum(p.numel() for p in model.parameters() if p.requires_grad)
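
As a sanity check for `count_parameters`, the expected count for a recurrent layer can be worked out by hand. The sketch below assumes a unidirectional `torch.nn.GRU` with its default bias terms; the helper `gru_parameter_count` and the feature count of 6 are illustrative assumptions and not part of this notebook.

```python
def gru_parameter_count(input_size: int, hidden_size: int, num_layers: int) -> int:
    """Closed-form trainable-parameter count for a unidirectional GRU with biases."""
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        # Three gates, each with input weights, recurrent weights and two bias vectors.
        total += 3 * hidden_size * (layer_input + hidden_size + 2)
        # Deeper layers read the previous layer's hidden state.
        layer_input = hidden_size
    return total


# input_size=6 is an assumed feature count, used only for illustration.
print(gru_parameter_count(input_size=6, hidden_size=24, num_layers=3))  # 9504
```

With PyTorch installed, `count_parameters(torch.nn.GRU(6, 24, num_layers=3))` should agree with this figure.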

### 1.5 Initialization

Initialize the modules needed by running the cells in this section.

For reproducibility.

In [13]:
np.random.seed(0)
random.seed(0)

torch.cuda.manual_seed(0)
torch.manual_seed(0)
torch.backends.cudnn.deterministic = True  # Note that this Deterministic mode can have a performance impact
torch.backends.cudnn.benchmark = False

_ = pl.seed_everything(0)

Global seed set to 0
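
A quick way to confirm that seeding has the intended effect is to draw the same numbers twice from the same seed. The sketch below uses only the standard-library `random` module; the helper `draw` is a hypothetical name, and the same check could be repeated for NumPy and PyTorch generators.

```python
import random


def draw(seed: int, n: int = 3) -> list:
    """Return n pseudo-random floats from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


# Same seed, same sequence; different seed, different sequence.
print(draw(0) == draw(0))
print(draw(0) == draw(1))
```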


Weights & Biases

In [19]:
if use_wandb:
    !wandb login

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Data Loading

In [None]:
if data_name in ['stock', 'energy']:
  ori_data = real_data_loading(data_name, seq_len)
elif data_name == 'sine':
  # Set number of samples and its dimensions
  no, dim = 10000, 5
  ori_data = sine_data_generation(no, seq_len, dim)
else:
  raise ValueError(f"Unknown data_name: {data_name!r}")
    
print(data_name + ' dataset has been loaded.')
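
For intuition about what a loader parameterised by `seq_len` might produce, the sketch below scales a multivariate series to [0, 1] and cuts it into overlapping windows. This is an assumption about the general technique, not a description of what `real_data_loading` actually does; `sliding_windows` and the toy array are hypothetical.

```python
import numpy as np


def sliding_windows(series, seq_len):
    """Min-max scale each feature to [0, 1], then cut overlapping windows.

    Returns an array of shape (n_windows, seq_len, n_features).
    """
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(axis=0), series.max(axis=0)
    scaled = (series - lo) / (hi - lo + 1e-7)  # small constant avoids division by zero
    n_windows = len(scaled) - seq_len + 1
    return np.stack([scaled[i:i + seq_len] for i in range(n_windows)])


toy = np.arange(60, dtype=float).reshape(30, 2)  # 30 time steps, 2 features
windows = sliding_windows(toy, seq_len=24)
print(windows.shape)  # (7, 24, 2)
```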

Network Parameters

In [None]:
## Network parameters
parameters = dict()

parameters['module'] = module 
parameters['hidden_dim'] = hidden_dim
parameters['num_layer'] = num_layer
parameters['iterations'] = iterations
parameters['batch_size'] = batch_size
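
The same dictionary could also be built from a small validated container, so that a mistyped value fails early. This is only a sketch: `NetworkConfig` is a hypothetical name, and the allowed `module` values are taken from the comment in the hyper-parameters cell.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class NetworkConfig:
    """Hypothetical container mirroring the network hyper-parameters above."""
    module: str = "gru"
    hidden_dim: int = 24
    num_layer: int = 3
    iterations: int = 10000
    batch_size: int = 128

    def __post_init__(self):
        if self.module not in ("gru", "lstm", "lstmLN"):
            raise ValueError(f"Unsupported module: {self.module!r}")
        for name in ("hidden_dim", "num_layer", "iterations", "batch_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


# Produces the same keys and values as the cell above.
parameters = asdict(NetworkConfig())
print(parameters)
```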

## 2 Train

This section trains the model using the hyper-parameters defined in [Hyper-parameters](#13-hyper-parameters).

## 3 Validation

## 4 Visualize Results