# TimeGAN Tutorial

## Time-series Generative Adversarial Networks

- Paper: Jinsung Yoon, Daniel Jarrett, Mihaela van der Schaar, "Time-series Generative Adversarial Networks," Neural Information Processing Systems (NeurIPS), 2019.

- Paper link: https://papers.nips.cc/paper/8789-time-series-generative-adversarial-networks

- Last updated: April 24th, 2020

- Code author: Jinsung Yoon (jsyoon0823@gmail.com)

This notebook is a user guide for generating synthetic time-series data with the TimeGAN framework. The Stock, Energy, and Sine datasets serve as examples.

### Prerequisite
Clone https://github.com/jsyoon0823/timeGAN.git to the current directory.

## Necessary packages and function calls

- timegan: Synthetic time-series data generation module
- data_loading: loading and preprocessing for 2 real datasets and 1 synthetic dataset
- metrics:
    - discriminative_metrics: classify real versus synthetic data
    - predictive_metrics: train on synthetic, test on real
    - visualization: PCA and t-SNE analyses

In [1]:
## Necessary packages
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import warnings
warnings.filterwarnings("ignore")

# 1. TimeGAN model
from timegan import timegan
# 2. Data loading
from data_loading import real_data_loading, sine_data_generation
# 3. Metrics
from metrics.discriminative_metrics import discriminative_score_metrics
from metrics.predictive_metrics import predictive_score_metrics
from metrics.visualization_metrics import visualization

## Data Loading

Load original dataset and preprocess the loaded data.

- data_name: stock, energy, or sine
- seq_len: sequence length of the time-series data

In [2]:
## Data loading
data_name = 'stock'
seq_len = 24

if data_name in ['stock', 'energy']:
  ori_data = real_data_loading(data_name, seq_len)
elif data_name == 'sine':
  # Set number of samples and its dimensions
  no, dim = 10000, 5
  ori_data = sine_data_generation(no, seq_len, dim)
    
print(data_name + ' dataset is ready.')

ori data shape: (3685, 6)
length of temp data: 3661
len(ori_data) // seq_len: 153
first element of temp_data:
[[0.93861141 0.94147215 0.95343567 0.94167311 0.94167311 0.01036247]
 [0.94834356 0.94670361 0.95907934 0.94708071 0.94708071 0.01029843]
 [0.954155   0.95285887 0.96338287 0.94979688 0.94979688 0.01086633]
 [0.9467474  0.95240934 0.96263369 0.95623843 0.95623843 0.01138349]
 [0.94800794 0.95292433 0.95961209 0.94878758 0.94878758 0.01220273]
 [0.93805478 0.94070371 0.94730091 0.94433178 0.94433178 0.00990814]
 [0.92887107 0.93687002 0.94421275 0.93935911 0.93935911 0.01503863]
 [0.92134078 0.92242602 0.92829732 0.92202844 0.92202844 0.01524888]
 [0.91859058 0.91635658 0.92542641 0.91807318 0.91807318 0.01213748]
 [0.93001701 0.92943054 0.9253756  0.92179044 0.92179044 0.0168233 ]
 [0.94068228 0.94191349 0.93981767 0.93130916 0.93130916 0.02287694]
 [0.93937268 0.94483007 0.94840806 0.93818564 0.93818564 0.01799053]
 [0.96342873 0.96412306 0.96154737 0.9484429  0.9484429  0.020

In [3]:
len(ori_data)

3661
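
The printed sizes are consistent with overlapping windows: 3685 rows and `seq_len = 24` give 3685 - 24 = 3661 sequences. The sketch below shows one way such preprocessing could look, assuming per-feature min-max scaling and stride-1 windows. `make_windows` is a hypothetical helper, not the repository's `real_data_loading`, which may differ in details such as row ordering and shuffling.

```python
import numpy as np

def make_windows(raw, seq_len):
    """Hypothetical helper: min-max scale each column, then cut stride-1 windows."""
    raw = np.asarray(raw, dtype=float)
    col_min = raw.min(axis=0)
    col_range = raw.max(axis=0) - col_min
    scaled = (raw - col_min) / (col_range + 1e-7)  # small constant avoids division by zero
    # One window per start index 0 .. len(raw) - seq_len - 1.
    windows = [scaled[i:i + seq_len] for i in range(len(scaled) - seq_len)]
    return np.stack(windows)

rng = np.random.default_rng(0)
toy = rng.random((3685, 6))          # random stand-in for the stock table, not the real data
windows = make_windows(toy, seq_len=24)
print(windows.shape)                 # (3661, 24, 6)
```

Each resulting sequence has shape `(seq_len, number of features)`, which is the layout the later cells pass to `timegan` and the metrics.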

## Set network parameters

TimeGAN network parameters should be optimized for different datasets.

- module: gru, lstm, or lstmLN
- hidden_dim: hidden dimensions
- num_layer: number of layers
- iterations: number of training iterations
- batch_size: the number of samples in each batch

In [4]:
## Network parameters
parameters = dict()

parameters['module'] = 'gru' 
parameters['hidden_dim'] = 24
parameters['num_layer'] = 3
parameters['iterations'] = 10000
parameters['batch_size'] = 128
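
Because the settings travel as a plain dict, a misspelled key (for example `iteration` instead of `iterations`) would likely surface only after training has been launched. A small pre-flight check can catch this earlier. The sketch assumes the five keys set above are the ones `timegan` reads; `check_parameters` is a hypothetical helper, not part of the repository.

```python
REQUIRED_KEYS = {'module', 'hidden_dim', 'num_layer', 'iterations', 'batch_size'}
ALLOWED_MODULES = {'gru', 'lstm', 'lstmLN'}

def check_parameters(parameters):
    """Hypothetical pre-flight check for the parameter dict."""
    missing = REQUIRED_KEYS - set(parameters)
    if missing:
        raise KeyError('missing parameter(s): ' + ', '.join(sorted(missing)))
    if parameters['module'] not in ALLOWED_MODULES:
        raise ValueError('unknown module: ' + str(parameters['module']))
    for key in ('hidden_dim', 'num_layer', 'iterations', 'batch_size'):
        if not isinstance(parameters[key], int) or parameters[key] <= 0:
            raise ValueError(key + ' must be a positive integer')
    return parameters

example = {'module': 'gru', 'hidden_dim': 24, 'num_layer': 3,
           'iterations': 10000, 'batch_size': 128}
check_parameters(example)  # passes silently when the dict is well formed
```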

## Run TimeGAN for synthetic time-series data generation

TimeGAN uses the original data and network parameters to return the generated synthetic data.

In [5]:
# Run TimeGAN
generated_data = timegan(ori_data, parameters)   
print('Finish Synthetic Data Generation')





Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
This class is equivalent as tf.keras.layers.StackedRNNCells, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
Instructions for updating:
Please use `layer.add_weight` method instead.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons


2024-06-23 13:00:33.134428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2024-06-23 13:00:33.334597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: NVIDIA TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:63:00.0
2024-06-23 13:00:33.338000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: NVIDIA TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:66:00.0
2024-06-23 13:00:33.341354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: NVIDIA GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:3e:00.0
2024-06-23 13:00:33.349177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties: 
name: NVIDIA GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:40:00.0
2




2024-06-23 13:00:34.763139: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55902b51aa70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-06-23 13:00:34.763220: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA TITAN RTX, Compute Capability 7.5
2024-06-23 13:00:34.763238: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): NVIDIA TITAN RTX, Compute Capability 7.5
2024-06-23 13:00:34.763250: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2024-06-23 13:00:34.763261: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2024-06-23 13:00:34.763273: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (4): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2024-06-23 13:00:34.772306: I tensorf

Start Embedding Network Training


2024-06-23 13:00:38.597015: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0


step: 0/10000, e_loss: 0.3189
step: 1000/10000, e_loss: 0.0222
step: 2000/10000, e_loss: 0.0205
step: 3000/10000, e_loss: 0.0097
step: 4000/10000, e_loss: 0.0084
step: 5000/10000, e_loss: 0.0053
step: 6000/10000, e_loss: 0.0062
step: 7000/10000, e_loss: 0.0047
step: 8000/10000, e_loss: 0.0041
step: 9000/10000, e_loss: 0.006
Finish Embedding Network Training
Start Training with Supervised Loss Only
step: 0/10000, s_loss: 0.2502
step: 1000/10000, s_loss: 0.0195
step: 2000/10000, s_loss: 0.0136
step: 3000/10000, s_loss: 0.0139
step: 4000/10000, s_loss: 0.0135
step: 5000/10000, s_loss: 0.0131
step: 6000/10000, s_loss: 0.0143
step: 7000/10000, s_loss: 0.0129
step: 8000/10000, s_loss: 0.0116
step: 9000/10000, s_loss: 0.0124
Finish Training with Supervised Loss Only
Start Joint Training
step: 0/10000, d_loss: 2.0115, g_loss_u: 0.7004, g_loss_s: 0.0188, g_loss_v: 0.2995, e_loss_t0: 0.0663
step: 1000/10000, d_loss: 1.4308, g_loss_u: 1.8306, g_loss_s: 0.019, g_loss_v: 0.0292, e_loss_t0: 0.003
st

## Evaluate the generated data

### 1. Discriminative score

This metric measures how well a post-hoc RNN classifier can distinguish original from synthetic data. The output is |classification accuracy - 0.5|, so lower is better.

- metric_iteration: the number of iterations for metric computation.
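
The score is a simple transform of test accuracy: 0 means the classifier is at chance level, and 0.5 means it separates real from synthetic perfectly. The sketch below covers only that final computation, assuming labels of 1 for real and 0 for synthetic and a 0.5 decision threshold. `discriminative_score_from_predictions` is a hypothetical helper, and the post-hoc RNN itself is omitted.

```python
import numpy as np

def discriminative_score_from_predictions(y_true, y_prob, threshold=0.5):
    """Hypothetical helper: |accuracy - 0.5| from labels and predicted probabilities."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    accuracy = np.mean(y_pred == y_true)
    return float(abs(accuracy - 0.5))

labels = [1, 1, 1, 1, 0, 0, 0, 0]                    # 1 = real, 0 = synthetic (assumed convention)
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]     # made-up classifier outputs
score = discriminative_score_from_predictions(labels, probs)
print(score)  # 0.25 (6 of 8 correct, accuracy 0.75)
```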

In [None]:
metric_iteration = 5

discriminative_score = list()
for _ in range(metric_iteration):
  temp_disc = discriminative_score_metrics(ori_data, generated_data)
  discriminative_score.append(temp_disc)

print('Discriminative score: ' + str(np.round(np.mean(discriminative_score), 4)))

Instructions for updating:
Please use tf.global_variables instead.
Discriminative score: 0.1759


### 2. Predictive score

This metric measures prediction performance in the train-on-synthetic, test-on-real (TSTR) setting. Specifically, a post-hoc RNN is trained on the synthetic data to predict one step ahead, and its mean absolute error (MAE) on the original data is reported.
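
In this setting the predictor never sees real data during training, so a low error on real sequences suggests the synthetic data preserved the temporal dynamics. The sketch below illustrates the data layout and the MAE computation, assuming the task is to predict the last feature one step ahead from the remaining features. `one_step_pairs` and `mean_absolute_error` are hypothetical helpers, and a trivial constant predictor stands in for the post-hoc RNN.

```python
import numpy as np

def one_step_pairs(data):
    """Split (n, seq_len, dim) sequences into inputs and one-step-ahead targets."""
    data = np.asarray(data)
    inputs = data[:, :-1, :-1]   # all but the last step, all but the last feature
    targets = data[:, 1:, -1]    # last feature, shifted one step ahead
    return inputs, targets

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

rng = np.random.default_rng(0)
synthetic = rng.random((100, 24, 6))   # placeholder arrays, not TimeGAN output
real = rng.random((50, 24, 6))

# "Train" on synthetic data: the stand-in model just learns the mean target value.
_, train_targets = one_step_pairs(synthetic)
constant_prediction = train_targets.mean()

# Test on real data and report MAE.
test_inputs, test_targets = one_step_pairs(real)
predictions = np.full_like(test_targets, constant_prediction)
print(test_inputs.shape, test_targets.shape)  # (50, 23, 5) (50, 23)
mae = mean_absolute_error(test_targets, predictions)
```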

In [None]:
predictive_score = list()
for tt in range(metric_iteration):
  temp_pred = predictive_score_metrics(ori_data, generated_data)
  predictive_score.append(temp_pred)   
    
print('Predictive score: ' + str(np.round(np.mean(predictive_score), 4)))


Predictive score: 0.0412


### 3. Visualization

We visualize the original and synthetic data distributions using PCA and t-SNE analyses.
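
Both plots reduce each sequence to a 2-D point so that the overlap between the real and synthetic point clouds can be judged by eye. The sketch below covers the PCA half using only NumPy, assuming each sequence is first flattened into a single vector. `pca_2d` is a hypothetical helper, the repository's `visualization` function may preprocess the data differently, and plotting is omitted.

```python
import numpy as np

def pca_2d(real, synthetic):
    """Hypothetical helper: fit PCA on real data, project both sets to 2-D."""
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    real_flat = real.reshape(len(real), -1)
    synth_flat = synthetic.reshape(len(synthetic), -1)
    mean = real_flat.mean(axis=0)
    # Rows of vt are the principal directions of the centred real data.
    _, _, vt = np.linalg.svd(real_flat - mean, full_matrices=False)
    components = vt[:2].T
    return (real_flat - mean) @ components, (synth_flat - mean) @ components

rng = np.random.default_rng(0)
real_2d, synth_2d = pca_2d(rng.random((200, 24, 6)), rng.random((200, 24, 6)))
print(real_2d.shape, synth_2d.shape)  # (200, 2) (200, 2)
```

Fitting the projection on the real data only and then applying it to the synthetic data keeps both point clouds in the same coordinate system.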

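In [1]:
# Re-import in case the kernel was restarted: the original run of this cell
# failed with "NameError: name 'visualization' is not defined".
# ori_data and generated_data must also still be in memory.
from metrics.visualization_metrics import visualization

visualization(ori_data, generated_data, 'pca')
visualization(ori_data, generated_data, 'tsne')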