# Time-series synthetic data 
### A generation and evaluation example with **Clearbox Engine**

This notebook walks you through the time-series synthetic data generation and evaluation process with **Clearbox Engine**.

You can run this notebook on Google Colab or on your local machine.<br> 
In the latter case, we highly recommend creating a dedicated virtual environment.

<div class="alert alert-secondary">
To run this notebook, make sure you change the runtime to <strong>GPU</strong><br>
<hr>
<strong>Runtime</strong> --> <strong>Change Runtime Type</strong> <br>
and set <strong>Hardware Accelerator</strong> to "<strong>GPU</strong>"
</div>

In [None]:
# Install the library and its dependencies

%pip install clearbox-synthetic-kit

In [73]:
# Import necessary dependencies
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm import tqdm
from clearbox_synthetic.utils import Dataset

from clearbox_synthetic.generation import TimeSeriesEngine
from clearbox_preprocessor import Preprocessor

## 0. Data import and preparation

In [74]:

train_dataset = Dataset.from_csv('./data/daily_delhi_climate/DailyDelhiClimateTrain.csv')

### Data pre-processing
Datasets are pre-processed with the **Preprocessor** class, which prepares data for the subsequent steps.

In [75]:
# Adding a time index column with year and month, as "yyyymm"
train_dataset.data['id'] = train_dataset.data['date'].apply(lambda x: ''.join(x.split('-')[0:2]))
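
As a quick sanity check of the `id` column built above, the following minimal sketch applies the same split-and-join logic to a few made-up date strings and counts how many rows fall into each monthly group. The sample dates and the helper name `to_year_month` are illustrative assumptions, not rows or functions taken from the dataset or the library.

```python
from collections import Counter

# Hypothetical sample dates in the same 'yyyy-mm-dd' format as the 'date' column
sample_dates = ["2013-01-01", "2013-01-02", "2013-02-01", "2014-12-31"]

def to_year_month(date_str):
    """Keep the year and month parts of 'yyyy-mm-dd' and join them as 'yyyymm'."""
    return "".join(date_str.split("-")[0:2])

ids = [to_year_month(d) for d in sample_dates]
print(ids)

# Rows that share the same id end up in the same monthly group
rows_per_id = Counter(ids)
print(rows_per_id)
```

Each distinct `id` value then identifies one monthly sequence, which is the column passed to the engine as `time_id`.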


## 1. Synthetic Data Generation

In [77]:
# Initializing the time series generator

engine = TimeSeriesEngine(
    dataset = train_dataset,
    layers_size=[40],
    time_id='id',
)

import jax.numpy as jnp

def count_parameters(params_dict):
    total_count = 0
    for key, value in params_dict.items():
        if isinstance(value, dict):  # If the value is another dictionary, recurse
            total_count += count_parameters(value)
        elif isinstance(value, jnp.ndarray):  # If the value is an array, count parameters
            total_count += value.size
    return total_count

# Count the total number of parameters

total_params = count_parameters(engine.params['encoder'])
print("Number of parameters (encoder):", total_params)
total_params = count_parameters(engine.params['decoder'])
print("Number of parameters (decoder):", total_params)

Number of parameters (encoder): 26996
Number of parameters (decoder): 25200
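
To make the recursion in `count_parameters` easier to follow, here is a minimal stand-alone sketch that performs the same traversal on a toy nested dictionary. It is an illustration under stated assumptions: the layer names and shapes are invented and do not reflect the engine's actual architecture, and the leaves are plain shape tuples rather than JAX arrays, so the sketch runs without JAX installed.

```python
import math

# Hypothetical toy parameter tree: names and shapes are invented for illustration
toy_params = {
    "dense_0": {"kernel": (4, 40), "bias": (40,)},
    "dense_1": {"kernel": (40, 2), "bias": (2,)},
}

def count_from_shapes(tree):
    """Recursively sum the number of elements described by each shape tuple."""
    total = 0
    for value in tree.values():
        if isinstance(value, dict):      # nested block: recurse into it
            total += count_from_shapes(value)
        elif isinstance(value, tuple):   # leaf: product of its dimensions
            total += math.prod(value)
    return total

print(count_from_shapes(toy_params))  # 4*40 + 40 + 40*2 + 2 = 282
```

With real JAX arrays, `value.size` plays the role of `math.prod(shape)` here.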


In [78]:
engine.fit(train_dataset, epochs=1000, learning_rate=0.00001)

2025-03-30 18:08:28.940 | INFO     | __main__:fit:570 - Preprocessing training time series
49it [00:00, 1387.73it/s]
Engine fitting in progress: 100%|█████████████████████████| 1000/1000 [00:08<00:00, 115.33epoch/s, Train loss=12521.055]


In [79]:
engine.generate(train_dataset, 11)