# Model Validation Notebook

This notebook validates every implemented time series generative model, both parametric and non-parametric. It walks through the full pipeline, from data preprocessing through model training to synthetic data generation, and checks that each model runs end to end and produces output of shape `(R, l, N)`: `R` generated samples, each of length `l`, with `N` channels.

## Table of Contents:
1.  [Setup and Imports](#Setup-and-Imports)
2.  [Data Preprocessing](#Data-Preprocessing)
3.  [Parametric Model Validation](#Parametric-Model-Validation)
    *   [Geometric Brownian Motion](#Geometric-Brownian-Motion)
    *   [Ornstein-Uhlenbeck Process](#Ornstein-Uhlenbeck-Process)
4.  [Non-Parametric Model Validation](#Non-Parametric-Model-Validation)
    *   [Vanilla GAN](#Vanilla-GAN)
    *   [Wasserstein GAN](#Wasserstein-GAN)



In [1]:
import sys
import os
import numpy as np
import torch
from pathlib import Path

# Add the project root (the parent of the current working directory) to sys.path
project_root = Path().resolve().parent
sys.path.append(str(project_root))

print(f"Project root added to sys.path: {project_root}")

# Import preprocessing utilities
from data.preprocess import (
    preprocess_data,
    load_preprocessed_data,
    create_dataset_from_preprocessed,
)
from utils.preprocess_utils import (
    TimeSeriesDataset,
    create_dataloaders
)

# Import the base model classes and all implemented models
from models.base_model import BaseGenerativeModel, ParametricModel, DeepLearningModel
from models.parametric.gbm import GeometricBrownianMotion
from models.parametric.ou_process import OrnsteinUhlenbeckProcess

from models.non_parametric.vanilla_gan import VanillaGAN
from models.non_parametric.wasserstein_gan import WassersteinGAN

print("All necessary modules imported successfully!")



Project root added to sys.path: C:\Users\14165\Downloads\Unified-benchmark-for-SDGFTS-main
All necessary modules imported successfully!


## Data Preprocessing

This section demonstrates how to preprocess a sample dataset (`GOOG.csv`) using the provided utilities and create PyTorch `DataLoader` objects. This data will be used to train and validate our generative models.
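The preprocessing utilities themselves are not shown in this notebook, so the following is only a minimal sketch of the kind of pipeline that could produce arrays of shape `(R, l, N)`. It assumes overlapping sliding windows, global per-channel min-max scaling, and a seeded shuffle before the train/validation split. The names `sliding_windows`, `min_max_normalize`, and `split_train_valid` are hypothetical and are not part of `data.preprocess`.

```python
import numpy as np


def sliding_windows(series: np.ndarray, length: int) -> np.ndarray:
    """Cut a (T, N) series into overlapping windows of shape (R, length, N)."""
    if series.ndim != 2:
        raise ValueError("expected a 2D array of shape (T, N)")
    num_windows = series.shape[0] - length + 1
    if num_windows < 1:
        raise ValueError("series is shorter than the window length")
    return np.stack([series[i:i + length] for i in range(num_windows)])


def min_max_normalize(windows: np.ndarray) -> np.ndarray:
    """Scale each channel to [0, 1] using its global minimum and maximum."""
    lo = windows.min(axis=(0, 1), keepdims=True)
    hi = windows.max(axis=(0, 1), keepdims=True)
    return (windows - lo) / (hi - lo + 1e-8)  # epsilon guards constant channels


def split_train_valid(windows: np.ndarray, valid_ratio: float, seed: int):
    """Shuffle the windows with a fixed seed, then split off a validation set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(windows))
    num_train = int(len(windows) * (1 - valid_ratio))  # assumed rounding rule
    return windows[order[:num_train]], windows[order[num_train:]]


# Toy input (not GOOG data): a random walk with 200 time steps and 5 channels.
toy_series = np.cumsum(np.random.default_rng(0).standard_normal((200, 5)), axis=0)
toy_windows = min_max_normalize(sliding_windows(toy_series, length=125))
toy_train, toy_valid = split_train_valid(toy_windows, valid_ratio=0.1, seed=42)
print(toy_train.shape, toy_valid.shape)
```

Truncating `R * (1 - valid_ratio)` happens to reproduce a 1018/114 split for 1132 windows, but that agreement alone does not confirm how `preprocess_data` performs the split.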



In [2]:
# Configuration for data preprocessing
config_goog = {
    'original_data_path': str(project_root / 'data' / 'GOOG' / 'GOOG.csv'),
    'output_ori_path': str(project_root / 'testing_results' / 'preprocessed'),
    'dataset_name': 'goog_stock_validation',
    'valid_ratio': 0.1,
    'do_normalization': True,
    'seed': 42  # Reproducible shuffling
}

print(f"Preprocessing data with config: {config_goog}")

# Preprocess the data
train_data_np, valid_data_np = preprocess_data(config_goog)

# Create PyTorch DataLoaders
batch_size = 32
train_loader, valid_loader = create_dataloaders(
    train_data_np, valid_data_np,
    batch_size=batch_size,
    train_seed=42,
    valid_seed=123,
    num_workers=0,
    pin_memory=False
)

# Display data shapes and DataLoader info
print(f"\nTrain data shape: {train_data_np.shape}")
print(f"Valid data shape: {valid_data_np.shape}")
print(f"Number of training batches: {len(train_loader)}")
print(f"Number of validation batches: {len(valid_loader)}")

# Get output dimensions for models
num_samples_real, length, num_channels = train_data_np.shape
print(f"\nInferred model output dimensions: length={length}, num_channels={num_channels}")



Preprocessing data with config: {'original_data_path': 'C:\\Users\\14165\\Downloads\\Unified-benchmark-for-SDGFTS-main\\data\\GOOG\\GOOG.csv', 'output_ori_path': 'C:\\Users\\14165\\Downloads\\Unified-benchmark-for-SDGFTS-main\\testing_results\\preprocessed', 'dataset_name': 'goog_stock_validation', 'valid_ratio': 0.1, 'do_normalization': True, 'seed': 42}
Data preprocessing with settings:{'original_data_path': 'C:\\Users\\14165\\Downloads\\Unified-benchmark-for-SDGFTS-main\\data\\GOOG\\GOOG.csv', 'output_ori_path': 'C:\\Users\\14165\\Downloads\\Unified-benchmark-for-SDGFTS-main\\testing_results\\preprocessed', 'dataset_name': 'goog_stock_validation', 'valid_ratio': 0.1, 'do_normalization': True, 'seed': 42}
Data shape: (1132, 125, 5)
Preprocessing done. Preprocessed files saved to C:\Users\14165\Downloads\Unified-benchmark-for-SDGFTS-main\testing_results\preprocessed\goog_stock_validation.


Train data shape: (1018, 125, 5)
Valid data shape: (114, 125, 5)
Number of training batches: 32

## Parametric Model Validation

This section validates the functionality of each parametric time series generative model. For each model, we will:
1.  Instantiate the model with appropriate parameters.
2.  Train the model using the preprocessed training data.
3.  Generate new synthetic time series samples.
4.  Verify the shape and basic statistics of the generated data.
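The verification in step 4 can be factored into one reusable check. The helper below is a sketch, and `summarize_samples` is a hypothetical name rather than a utility from this repository. It expects a NumPy array, so a tensor would first need `.detach().cpu().numpy()`.

```python
import numpy as np


def summarize_samples(samples, num_samples, length, num_channels):
    """Check the (R, l, N) shape and return basic statistics as floats."""
    arr = np.asarray(samples)
    expected = (num_samples, length, num_channels)
    if arr.shape != expected:
        raise ValueError(f"shape mismatch: expected {expected}, got {arr.shape}")
    return {
        "min": float(arr.min()),
        "max": float(arr.max()),
        "mean": float(arr.mean()),
    }


# Placeholder array standing in for generated data.
stats = summarize_samples(np.full((100, 125, 5), 0.5), 100, 125, 5)
print(stats)
```

Raising on a shape mismatch, rather than returning a flag, keeps the behavior close to the `assert` statements used in the validation cells.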



### Geometric Brownian Motion



In [3]:
print("\n" + "=" * 50)
print("Validating Geometric Brownian Motion (GBM)")
print("=" * 50)

# Instantiate GBM model
gbm_model = GeometricBrownianMotion(length=length, num_channels=num_channels)
print(f"GBM Model instantiated: {gbm_model}")

# Fit the model
print("Fitting GBM model...")
gbm_model.fit(train_loader)
print(f"GBM model parameters after fitting: mu={gbm_model.mu.data}, sigma={gbm_model.sigma.data}")

# Generate samples
num_generated_samples = 100
gbm_generated_data = gbm_model.generate(num_generated_samples)
print(f"Generated GBM data shape: {gbm_generated_data.shape}")

# Validation checks
assert gbm_generated_data.shape == (num_generated_samples, length, num_channels), \
    f"GBM: Generated data shape mismatch. Expected ({num_generated_samples}, {length}, {num_channels}), got {gbm_generated_data.shape}"
print("GBM: Generated data shape is correct.")

print(f"GBM: Generated data min: {gbm_generated_data.min():.4f}, max: {gbm_generated_data.max():.4f}, mean: {gbm_generated_data.mean():.4f}")




Validating Geometric Brownian Motion (GBM)
GBM Model instantiated: <models.parametric.gbm.GeometricBrownianMotion object at 0x00000247525E7470>
Fitting GBM model...
GBM model parameters after fitting: mu=tensor([-0.0026, -0.0026, -0.0025, -0.0028, -0.0002]), sigma=tensor([0.6190, 0.5936, 0.5898, 0.6390, 0.8405])
Generated GBM data shape: torch.Size([100, 125, 5])
GBM: Generated data shape is correct.
GBM: Generated data min: 0.0894, max: 6.7962, mean: 0.9926
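For reference, GBM paths are commonly simulated with the exact log-normal update `S[t+1] = S[t] * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z)`, where `Z` is standard normal. The sketch below assumes that form with a unit start value and a time step of `1 / length`. Both choices are illustrative assumptions, and `simulate_gbm` is a hypothetical function, not the `generate` method of `GeometricBrownianMotion`.

```python
import numpy as np


def simulate_gbm(num_samples, length, mu, sigma, s0=1.0, dt=None, seed=0):
    """Simulate GBM paths of shape (num_samples, length, num_channels)."""
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    dt = 1.0 / length if dt is None else dt  # assumed time step
    rng = np.random.default_rng(seed)
    num_channels = mu.shape[0]

    # Independent Gaussian shocks for every step after the initial value.
    z = rng.standard_normal((num_samples, length - 1, num_channels))
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z

    # Cumulative log-returns, with a leading zero so every path starts at s0.
    start = np.zeros((num_samples, 1, num_channels))
    log_paths = np.concatenate([start, np.cumsum(log_increments, axis=1)], axis=1)
    return s0 * np.exp(log_paths)


# Per-channel parameters copied from the fitted values printed above.
fitted_mu = [-0.0026, -0.0026, -0.0025, -0.0028, -0.0002]
fitted_sigma = [0.6190, 0.5936, 0.5898, 0.6390, 0.8405]
gbm_paths = simulate_gbm(100, 125, fitted_mu, fitted_sigma)
print(gbm_paths.shape, gbm_paths.min() > 0)
```

Working in log space guarantees strictly positive paths, which is consistent with the positive minimum in the output above.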


### Ornstein-Uhlenbeck Process



In [4]:
print("\n" + "=" * 50)
print("Validating Ornstein-Uhlenbeck (O-U) Process")
print("=" * 50)

# Instantiate O-U model
ou_model = OrnsteinUhlenbeckProcess(length=length, num_channels=num_channels)
print(f"O-U Model instantiated: {ou_model}")

# Fit the model
print("Fitting O-U model...")
ou_model.fit(train_loader)
print(f"O-U model parameters after fitting: theta={ou_model.theta.data}, mu={ou_model.mu.data}, sigma={ou_model.sigma.data}")

# Generate samples
num_generated_samples = 100
ou_generated_data = ou_model.generate(num_generated_samples)
print(f"Generated O-U data shape: {ou_generated_data.shape}")

# Validation checks
assert ou_generated_data.shape == (num_generated_samples, length, num_channels), \
    f"O-U: Generated data shape mismatch. Expected ({num_generated_samples}, {length}, {num_channels}), got {ou_generated_data.shape}"
print("O-U: Generated data shape is correct.")

print(f"O-U: Generated data min: {ou_generated_data.min():.4f}, max: {ou_generated_data.max():.4f}, mean: {ou_generated_data.mean():.4f}")




Validating Ornstein-Uhlenbeck (O-U) Process
O-U Model instantiated: <models.parametric.ou_process.OrnsteinUhlenbeckProcess object at 0x00000247189D6E70>
Fitting O-U model...
O-U model parameters after fitting: theta=tensor([ 1.1829,  1.1025,  1.1259,  1.2380, 10.0000]), mu=tensor([0.4126, 0.4066, 0.4108, 0.4106, 0.1972]), sigma=tensor([0.3606, 0.3425, 0.3451, 0.3566, 1.0000])
Generated O-U data shape: torch.Size([100, 125, 5])
O-U: Generated data shape is correct.
O-U: Generated data min: -0.6173, max: 1.1716, mean: 0.3638
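An O-U process with mean-reversion speed `theta`, long-run mean `mu`, and volatility `sigma` has an exact one-step transition: `X[t+1] = mu + (X[t] - mu) * exp(-theta * dt) + sigma * sqrt((1 - exp(-2 * theta * dt)) / (2 * theta)) * Z`. The sketch below uses that update. The start value (the long-run mean), the time step `dt`, and the function name `simulate_ou` are assumptions for illustration and may differ from what `OrnsteinUhlenbeckProcess.generate` does.

```python
import numpy as np


def simulate_ou(num_samples, length, theta, mu, sigma, x0=None, dt=1.0, seed=0):
    """Simulate O-U paths of shape (num_samples, length, num_channels)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))  # must be positive
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    sigma = np.atleast_1d(np.asarray(sigma, dtype=float))
    rng = np.random.default_rng(seed)

    decay = np.exp(-theta * dt)
    # Standard deviation of the exact transition density over one step.
    noise_scale = sigma * np.sqrt((1.0 - decay**2) / (2.0 * theta))

    paths = np.empty((num_samples, length, mu.shape[0]))
    paths[:, 0, :] = mu if x0 is None else x0
    for t in range(1, length):
        z = rng.standard_normal((num_samples, mu.shape[0]))
        paths[:, t, :] = mu + (paths[:, t - 1, :] - mu) * decay + noise_scale * z
    return paths


# Per-channel parameters copied from the fitted values printed above.
fitted_theta = [1.1829, 1.1025, 1.1259, 1.2380, 10.0000]
fitted_mu = [0.4126, 0.4066, 0.4108, 0.4106, 0.1972]
fitted_sigma = [0.3606, 0.3425, 0.3451, 0.3566, 1.0000]
ou_paths = simulate_ou(100, 125, fitted_theta, fitted_mu, fitted_sigma, dt=1.0 / 125)
print(ou_paths.shape)
```

Unlike an Euler-Maruyama step, the exact transition stays stable for large `theta * dt`, which matters when a fitted `theta` is as large as the fifth channel's value here.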


## Non-Parametric Model Validation

This section validates the functionality of each non-parametric (GAN-based) time series generative model. For each model, we will:
1.  Instantiate the model with appropriate parameters.
2.  Train the model using the preprocessed training data.
3.  Generate new synthetic time series samples.
4.  Verify the shape and basic statistics of the generated data.

Note: GAN training can be unstable, and the few epochs used here are not enough to expect convergence. These cells primarily check that the code executes and that the output has the expected format.
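Because a short GAN run can yield non-finite values or nearly identical samples, a shape assertion alone may pass on degenerate output. The following sketch adds two inexpensive checks. The name `gan_output_flags` and the `min_std` threshold are illustrative assumptions, not part of the benchmark.

```python
import numpy as np


def gan_output_flags(samples, min_std=1e-4):
    """Flag non-finite values and near-identical samples in generated data."""
    arr = np.asarray(samples, dtype=float)
    all_finite = bool(np.isfinite(arr).all())
    # Standard deviation across the sample axis: a value near zero means the
    # generator returns almost the same series for every noise vector.
    across_sample_std = float(arr.std(axis=0).mean()) if all_finite else float("nan")
    return {
        "all_finite": all_finite,
        "across_sample_std": across_sample_std,
        "low_diversity": all_finite and across_sample_std < min_std,
    }


# Placeholder data: one series repeated 100 times versus independent noise.
one_series = np.linspace(0.0, 1.0, 125)[None, :, None]
collapsed = np.tile(one_series, (100, 1, 5))
varied = np.random.default_rng(0).uniform(size=(100, 125, 5))
print(gan_output_flags(collapsed)["low_diversity"])
print(gan_output_flags(varied)["low_diversity"])
```

A low across-sample spread is only a rough indicator of mode collapse; it does not measure how closely the samples match the real data.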



### Vanilla GAN



In [5]:
print("\n" + "=" * 50)
print("Validating Vanilla GAN")
print("=" * 50)

# Instantiate Vanilla GAN model
# num_epochs is passed to fit() below; a small value keeps validation quick, adjust as needed
vanilla_gan_model = VanillaGAN(length=length, num_channels=num_channels, latent_dim=64, hidden_dim=128, lr=0.0002)
print(f"Vanilla GAN Model instantiated: {vanilla_gan_model}")

# Fit the model
print("Fitting Vanilla GAN model (this may take a while)...")
vanilla_gan_model.fit(train_loader, num_epochs=5) # Reduced epochs for testing
print("Vanilla GAN model fitting complete.")

# Generate samples
num_generated_samples = 100
vanilla_gan_generated_data = vanilla_gan_model.generate(num_generated_samples)
print(f"Generated Vanilla GAN data shape: {vanilla_gan_generated_data.shape}")

# Validation checks
assert vanilla_gan_generated_data.shape == (num_generated_samples, length, num_channels), \
    f"Vanilla GAN: Generated data shape mismatch. Expected ({num_generated_samples}, {length}, {num_channels}), got {vanilla_gan_generated_data.shape}"
print("Vanilla GAN: Generated data shape is correct.")

print(f"Vanilla GAN: Generated data min: {vanilla_gan_generated_data.min():.4f}, max: {vanilla_gan_generated_data.max():.4f}, mean: {vanilla_gan_generated_data.mean():.4f}")




Validating Vanilla GAN


Vanilla GAN Model instantiated: VanillaGAN(
  (generator): Generator(
    (model): Sequential(
      (0): Linear(in_features=64, out_features=128, bias=True)
      (1): LeakyReLU(negative_slope=0.2)
      (2): Linear(in_features=128, out_features=256, bias=True)
      (3): LeakyReLU(negative_slope=0.2)
      (4): Linear(in_features=256, out_features=625, bias=True)
    )
  )
  (discriminator): Discriminator(
    (model): Sequential(
      (0): Linear(in_features=625, out_features=256, bias=True)
      (1): LeakyReLU(negative_slope=0.2)
      (2): Linear(in_features=256, out_features=128, bias=True)
      (3): LeakyReLU(negative_slope=0.2)
      (4): Linear(in_features=128, out_features=1, bias=True)
      (5): Sigmoid()
    )
  )
  (bce_loss): BCELoss()
)
Fitting Vanilla GAN model (this may take a while)...
Vanilla GAN model fitting complete.
Generated Vanilla GAN data shape: torch.Size([100, 125, 5])
Vanilla GAN: Generated data shape is correct.
Vanilla GAN: Generated data min: -0.314

### Wasserstein GAN



In [6]:
print("\n" + "=" * 50)
print("Validating Wasserstein GAN")
print("=" * 50)

# Instantiate Wasserstein GAN model
# num_epochs is passed to fit() below; a small value keeps validation quick, adjust as needed
wasserstein_gan_model = WassersteinGAN(length=length, num_channels=num_channels, latent_dim=64, hidden_dim=128, lr=0.00005, n_critic=5, clip_value=0.01)
print(f"Wasserstein GAN Model instantiated: {wasserstein_gan_model}")

# Fit the model
print("Fitting Wasserstein GAN model (this may take a while)...")
wasserstein_gan_model.fit(train_loader, num_epochs=5) # Reduced epochs for testing
print("Wasserstein GAN model fitting complete.")

# Generate samples
num_generated_samples = 100
wasserstein_gan_generated_data = wasserstein_gan_model.generate(num_generated_samples)
print(f"Generated Wasserstein GAN data shape: {wasserstein_gan_generated_data.shape}")

# Validation checks
assert wasserstein_gan_generated_data.shape == (num_generated_samples, length, num_channels), \
    f"Wasserstein GAN: Generated data shape mismatch. Expected ({num_generated_samples}, {length}, {num_channels}), got {wasserstein_gan_generated_data.shape}"
print("Wasserstein GAN: Generated data shape is correct.")

print(f"Wasserstein GAN: Generated data min: {wasserstein_gan_generated_data.min():.4f}, max: {wasserstein_gan_generated_data.max():.4f}, mean: {wasserstein_gan_generated_data.mean():.4f}")




Validating Wasserstein GAN
Wasserstein GAN Model instantiated: <models.non_parametric.wasserstein_gan.WassersteinGAN object at 0x0000024754EE3140>
Fitting Wasserstein GAN model (this may take a while)...
Wasserstein GAN model fitting complete.
Generated Wasserstein GAN data shape: torch.Size([100, 125, 5])
Wasserstein GAN: Generated data shape is correct.
Wasserstein GAN: Generated data min: -0.1976, max: 1.0327, mean: 0.3171
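The `n_critic=5` and `clip_value=0.01` arguments suggest the weight-clipping formulation of WGAN, in which the critic is updated several times per generator update and its weights are clamped after each step. The sketch below shows that loop on small stand-in networks. It illustrates the general technique under that assumption and is not the code inside `WassersteinGAN.fit`; `critic_step`, `generator_step`, and the layer sizes are hypothetical.

```python
import torch
from torch import nn


def critic_step(critic, generator, real_batch, optimizer, latent_dim, clip_value):
    """One critic update followed by weight clipping."""
    optimizer.zero_grad()
    noise = torch.randn(real_batch.size(0), latent_dim)
    fake_batch = generator(noise).detach()  # no gradient into the generator
    # The critic maximizes D(real) - D(fake), so minimize the negative.
    loss = -(critic(real_batch).mean() - critic(fake_batch).mean())
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for param in critic.parameters():
            param.clamp_(-clip_value, clip_value)
    return loss.item()


def generator_step(critic, generator, batch_size, optimizer, latent_dim):
    """One generator update: raise the critic's score on generated samples."""
    optimizer.zero_grad()
    noise = torch.randn(batch_size, latent_dim)
    loss = -critic(generator(noise)).mean()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
latent_dim, feature_dim, n_critic, clip_value = 4, 10, 5, 0.01
critic = nn.Sequential(nn.Linear(feature_dim, 8), nn.LeakyReLU(0.2), nn.Linear(8, 1))
generator = nn.Sequential(nn.Linear(latent_dim, 8), nn.LeakyReLU(0.2), nn.Linear(8, feature_dim))
critic_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
generator_opt = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

real_batch = torch.rand(16, feature_dim)  # placeholder for a flattened real batch
for _ in range(n_critic):
    critic_step(critic, generator, real_batch, critic_opt, latent_dim, clip_value)
generator_step(critic, generator, 16, generator_opt, latent_dim)

max_weight = max(p.abs().max().item() for p in critic.parameters())
print(f"largest critic weight magnitude: {max_weight:.4f}")
```

The critic has no sigmoid output here, since a WGAN critic produces an unbounded score rather than a probability.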
