# Time-series synthetic data 
### A generation and evaluation example with **Clearbox Synthetic Kit**

This notebook walks you through the time-series synthetic data generation and evaluation process with **Clearbox Synthetic Kit**.

You can run this notebook on Google Colab or on your local machine.<br>
If you run it locally, we highly recommend creating a dedicated virtual environment.

<div class="alert alert-secondary">
To run this notebook on Google Colab, make sure you change the runtime to <strong>GPU</strong><br>
<hr>
<strong>Runtime</strong> --> <strong>Change Runtime Type</strong> <br>
and set <strong>Hardware Accelerator</strong> to "<strong>GPU</strong>"
</div>

In [None]:
# Install the library and its dependencies

%pip install clearbox-synthetic-kit

In [2]:
# Import necessary dependencies
import numpy as np
import pandas as pd
from tqdm import tqdm

from clearbox_synthetic.utils import Dataset
from clearbox_synthetic.generation import TimeSeriesEngine

## 0. Data import and preparation

In [3]:
df = pd.read_csv('data/daily_delhi_climate/DailyDelhiClimateTrain.csv')
# Add a year-month key column ("yyyymm", e.g. "201301") that groups the rows
# into one time series per month; it is passed as time_id to the engine below
df['id'] = df['date'].apply(lambda x: ''.join(x.split('-')[0:2]))
train_dataset = Dataset.from_dataframe(df)
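
The key above is built by string splitting, which assumes every `date` value is formatted as `yyyy-mm-dd`. As a minimal sketch, the same key can be derived by parsing the dates with pandas, which also fails loudly on malformed values. The toy frame below is an illustrative stand-in for the CSV, not the real data.

```python
import pandas as pd

# Toy frame standing in for the climate CSV (dates are illustrative only).
toy = pd.DataFrame({"date": ["2013-01-01", "2013-01-02", "2013-02-01"]})

# String-based key, as in the cell above: "2013-01-01" -> "201301".
key_from_split = toy["date"].apply(lambda x: "".join(x.split("-")[0:2]))

# Equivalent key via datetime parsing; raises if a date cannot be parsed.
key_from_datetime = pd.to_datetime(toy["date"]).dt.strftime("%Y%m")

# Both approaches should agree on well-formed input.
print(key_from_split.tolist())
print(key_from_datetime.tolist())
```

Both variants yield string keys, so either one can be assigned to `df['id']` without changing the rest of the notebook.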

In [4]:
train_dataset.data

Unnamed: 0,date,meantemp,humidity,wind_speed,meanpressure,id
0,2013-01-01,10.000000,84.500000,0.000000,1015.666667,201301
1,2013-01-02,7.400000,92.000000,2.980000,1017.800000,201301
2,2013-01-03,7.166667,87.000000,4.633333,1018.666667,201301
3,2013-01-04,8.666667,71.333333,1.233333,1017.166667,201301
4,2013-01-05,6.000000,86.833333,3.700000,1016.500000,201301
...,...,...,...,...,...,...
1457,2016-12-28,17.217391,68.043478,3.547826,1015.565217,201612
1458,2016-12-29,15.238095,87.857143,6.000000,1016.904762,201612
1459,2016-12-30,14.095238,89.666667,6.266667,1017.904762,201612
1460,2016-12-31,15.052632,87.000000,7.325000,1016.100000,201612


## 1. Synthetic Data Generation

In [6]:
# Initializing the time series generator

engine = TimeSeriesEngine(train_dataset, time_id="id")

In [7]:
# Train the time-series synthetic data generator, then generate 100 synthetic series
engine.fit(train_dataset, epochs=20)
synth_df = engine.generate(train_dataset, n_samples=100)

2025-05-09 16:09:18.727 | INFO     | __main__:<module>:2 - Preprocessing training time series
49it [00:00, 1047.89it/s]
Engine fitting in progress: 100%|██████████| 20/20 [00:02<00:00,  7.55epoch/s, Train loss=13889.474]
100%|██████████| 100/100 [00:00<00:00, 833.15it/s]


In [8]:
synth_df.describe()

Unnamed: 0,meantemp,humidity,wind_speed,meanpressure,id
count,3200.0,3200.0,3200.0,3200.0,3200.0
mean,24.985123,62.043026,6.62646,997.979675,49.5
std,5.54454,13.496647,3.840055,143.538406,28.870581
min,0.723747,-5.179913,-11.470397,214.965271,0.0
25%,21.515738,53.644192,4.367959,907.168655,24.75
50%,24.97183,62.167347,6.613979,994.704346,49.5
75%,28.323034,70.400633,8.888786,1086.286194,74.25
max,57.569599,128.677094,30.789803,1630.916016,99.0
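
The summary above reports 3200 rows with `id` values from 0 to 99, which would correspond to 32 rows per series if all 100 generated series have the same length. That is an inference from the counts, not documented behaviour. A minimal sketch for checking it on a frame shaped like `synth_df` follows; `toy_synth` is a hypothetical stand-in, so swap in `synth_df` when running the notebook.

```python
import pandas as pd

# Toy stand-in for the generated frame: 3 series of 4 steps each.
toy_synth = pd.DataFrame({
    "id": [0] * 4 + [1] * 4 + [2] * 4,
    "meantemp": [20.0 + i for i in range(12)],
})

# Number of rows (time steps) belonging to each series id.
steps_per_series = toy_synth.groupby("id").size()

# How many distinct series there are, and whether they share one length.
n_series = steps_per_series.shape[0]
uniform_length = steps_per_series.nunique() == 1

print(n_series, uniform_length)
```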


In [9]:
train_dataset.data.describe()

Unnamed: 0,meantemp,humidity,wind_speed,meanpressure
count,1462.0,1462.0,1462.0,1462.0
mean,25.495521,60.771702,6.802209,1011.104548
std,7.348103,16.769652,4.561602,180.231668
min,6.0,13.428571,0.0,-3.041667
25%,18.857143,50.375,3.475,1001.580357
50%,27.714286,62.625,6.221667,1008.563492
75%,31.305804,72.21875,9.238235,1014.944901
max,38.714286,100.0,42.22,7679.333333
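
Comparing the two summaries, the synthetic humidity ranges from about -5.18 to 128.68 and the synthetic wind speed drops to about -11.47, while the training data stays within 0 to 100 and at or above 0 respectively. The training `meanpressure` column also contains extreme values (minimum -3.04, maximum 7679.33) that look like outliers. A minimal sketch for counting such values is shown below; the bounds and the helper `count_out_of_range` are hypothetical choices for illustration and not part of Clearbox Synthetic Kit.

```python
import pandas as pd

# Hypothetical plausibility bounds (low, high); None means unbounded.
BOUNDS = {"humidity": (0.0, 100.0), "wind_speed": (0.0, None)}


def count_out_of_range(frame, bounds):
    """Return {column: number of values outside [low, high]}."""
    counts = {}
    for column, (low, high) in bounds.items():
        values = frame[column]
        mask = pd.Series(False, index=values.index)
        if low is not None:
            mask |= (values < low)
        if high is not None:
            mask |= (values > high)
        counts[column] = int(mask.sum())
    return counts


# Toy rows for illustration; pass synth_df instead when running the notebook.
toy = pd.DataFrame({
    "humidity": [62.0, -5.2, 128.7, 70.4],
    "wind_speed": [6.6, -11.5, 30.8, 4.4],
})
violations = count_out_of_range(toy, BOUNDS)
print(violations)
```

Running the same check on `train_dataset.data` as well gives a baseline, so that violations introduced by the generator can be told apart from those already present in the source data.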
