# CSDI for Time Series Forecasting

This notebook shows how to use CSDI (Conditional Score-based Diffusion Models for Imputation) for time series forecasting, using electricity consumption data as the example. It covers downloading the data, loading the configuration, training, and evaluation.

In [2]:
!pip install gdown

Collecting gdown
  Obtaining dependency information for gdown from https://files.pythonhosted.org/packages/54/70/e07c381e6488a77094f04c85c9caf1c8008cdc30778f7019bc52e5285ef0/gdown-5.2.0-py3-none-any.whl.metadata
  Downloading gdown-5.2.0-py3-none-any.whl.metadata (5.8 kB)
Downloading gdown-5.2.0-py3-none-any.whl (18 kB)
Installing collected packages: gdown
Successfully installed gdown-5.2.0


## Library Imports and Model Setup

In this section, we import the libraries and modules needed to run the CSDI model: standard-library utilities, `numpy`, `torch`, `yaml` for the configuration file, `gdown` for the data download, and the local modules that define the model, the data loaders, and the training and evaluation routines.

In [3]:
import os
import sys
import argparse
import torch
import yaml
import numpy as np
import datetime
import json
import gdown
import zipfile

# Add the current working directory to sys.path so the local modules below can be imported
sys.path.append(os.path.abspath('.'))

from main_model import CSDI_Forecasting
from dataset_forecasting import get_dataloader
from utils import train, evaluate

## Data Download and Module Import

The import cell above and the download cell below together set up the environment for data processing and model training:

1. **Add the Working Directory to `sys.path`**:
   - The import cell appends the current working directory (`.`) to `sys.path`, so the local modules can be imported when the notebook runs from the CSDI folder.

2. **Import Required Modules**:
   - It imports `CSDI_Forecasting` from `main_model`, `get_dataloader` from `dataset_forecasting`, and `train` and `evaluate` from `utils`.

3. **Download Data Function**:
   - `download_data` downloads all files from a Google Drive folder into a local directory (`data/electricity_nips` by default).
   - The function:
     - Creates the output directory if it doesn't exist.
     - Extracts the folder ID from the last path segment of the Google Drive URL.
     - Downloads the folder contents with `gdown.download_folder`.
     - Prints whether the download succeeded or failed.

4. **Calling the `download_data` Function**:
   - The function is called with the Google Drive folder URL that holds the electricity dataset.


In [4]:
def download_data(folder_url, output_path='data/electricity_nips'):
    """
    Download all files from a Google Drive folder and save them to the specified output path.
    """
    # Create the output directory if it doesn't exist
    os.makedirs(output_path, exist_ok=True)
    
    # Extract the folder ID: the last path segment, ignoring a trailing slash or query string
    folder_id = folder_url.rstrip('/').split('/')[-1].split('?')[0]
    
    # Download every file in the folder (gdown returns None on failure)
    url = f"https://drive.google.com/drive/folders/{folder_id}"
    output = gdown.download_folder(url, output=output_path, quiet=False, use_cookies=False)
    
    if output is None:
        print("Failed to download files. Make sure the folder is publicly accessible.")
    else:
        print(f"Downloaded files to {output_path}")

folder_url = "https://drive.google.com/drive/folders/1krZQofLdeQrzunuKkLXy8L_kMzQrVFI_"
download_data(folder_url)

Retrieving folder contents


Retrieving folder 156s2fZlxFjkE1C6ndckCZ9Yoc6ZFK-h- electricity_nips
Processing file 198PkjDgXjFKg4J8Q73ZUuH0gXF5_fFNi data.pkl
Processing file 1F2XK0Z_4IczbU1iwE5qySCEhwIZmuBMv meanstd.pkl


Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=198PkjDgXjFKg4J8Q73ZUuH0gXF5_fFNi
To: /fs01/home/gsharma/diffusion_model_bootcamp/reference_implementations/time_series_reference_impelementation/CSDI/data/electricity_nips/electricity_nips/data.pkl
100%|███████████████████████████████████████████████████████████████████████████████████████| 35.5M/35.5M [00:00<00:00, 72.4MB/s]
Downloading...
From: https://drive.google.com/uc?id=1F2XK0Z_4IczbU1iwE5qySCEhwIZmuBMv
To: /fs01/home/gsharma/diffusion_model_bootcamp/reference_implementations/time_series_reference_impelementation/CSDI/data/electricity_nips/electricity_nips/meanstd.pkl
100%|███████████████████████████████████████████████████████████████████████████████████████| 6.11k/6.11k [00:00<00:00, 11.4MB/s]

Downloaded files to data/electricity_nips



Download completed
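The log above shows the files landing one level deeper than `output_path` (in `data/electricity_nips/electricity_nips/`), because the Drive folder name is recreated inside the output directory. A minimal sketch for confirming where `data.pkl` and `meanstd.pkl` ended up follows. The helper `find_downloaded_file` is hypothetical and not part of the notebook's modules, and the demo runs on a temporary directory that only mimics the layout from the log.

```python
import os
import tempfile

def find_downloaded_file(root, filename):
    """Return the first path under `root` whose basename is `filename`, or None."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None

# Demo on a temporary directory that mimics the nested layout seen in the log
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "data", "electricity_nips", "electricity_nips")
    os.makedirs(nested)
    for name in ("data.pkl", "meanstd.pkl"):
        open(os.path.join(nested, name), "wb").close()  # empty placeholder files

    root = os.path.join(tmp, "data", "electricity_nips")
    found = {name: find_downloaded_file(root, name) for name in ("data.pkl", "meanstd.pkl")}
    missing = [name for name, path in found.items() if path is None]
    print("missing files:", missing)
```

In the notebook itself, the same helper could be pointed at `data/electricity_nips` after `download_data` returns.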


## Load Configuration File

The next cell loads the YAML configuration file into a Python dictionary and prints it:

1. **Load Configuration Function**:
   - `load_config` reads and parses a YAML configuration file.
   - The function:
     - Opens the file at `config_path`, which defaults to `config/base_forecasting.yaml`.
     - Parses the content into a Python dictionary with `yaml.safe_load`.
     - Returns that dictionary.

2. **Load and Print Configuration**:
   - `load_config` is called with its default path.
   - The resulting dictionary is printed as indented JSON using `json.dumps`, which shows the `train`, `diffusion`, and `model` settings used in the rest of the notebook.

In [5]:
def load_config(config_path='config/base_forecasting.yaml'):
    with open(config_path, 'r') as f:
        config = yaml.safe_load(f)
    return config

config = load_config()
print(json.dumps(config, indent=4))

{
    "train": {
        "epochs": 100,
        "batch_size": 8,
        "lr": 0.001,
        "itr_per_epoch": 100000000.0
    },
    "diffusion": {
        "layers": 4,
        "channels": 64,
        "nheads": 8,
        "diffusion_embedding_dim": 128,
        "beta_start": 0.0001,
        "beta_end": 0.5,
        "num_steps": 50,
        "schedule": "quad",
        "is_linear": true
    },
    "model": {
        "is_unconditional": 0,
        "timeemb": 128,
        "featureemb": 16,
        "target_strategy": "test",
        "num_sample_features": 64
    }
}
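To make the `diffusion` block printed above more concrete, here is a small sketch of how `beta_start`, `beta_end`, `num_steps`, and `schedule: "quad"` could translate into a noise schedule. It assumes that `quad` means interpolating linearly in the square root of beta and then squaring, which is a common convention for CSDI-style models; the actual computation lives in `main_model` and is not shown in this notebook. The function names are illustrative only.

```python
import math

def quad_beta_schedule(beta_start, beta_end, num_steps):
    """Interpolate linearly in sqrt(beta), then square (assumed meaning of 'quad')."""
    if num_steps == 1:
        return [beta_start]
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]

def cumulative_alpha(betas):
    """Running product of (1 - beta_t): the share of signal variance left after step t."""
    alphas, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas.append(running)
    return alphas

# Values copied from the "diffusion" block of the printed configuration
betas = quad_beta_schedule(beta_start=0.0001, beta_end=0.5, num_steps=50)
alphas = cumulative_alpha(betas)
print(f"first beta={betas[0]:.4f}, last beta={betas[-1]:.4f}, final alpha={alphas[-1]:.2e}")
```

Under this assumption the noise level grows slowly at first and quickly near the final step, so almost no signal remains after the last of the 50 steps.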


## Model Setup and Initialization

The next cell sets up the device, the data loaders, the model, and the output folder for the forecasting task:

1. **Set Device**:
   - Uses the GPU (`cuda`) if one is available and falls back to the CPU otherwise.

2. **Set Up Data Loaders**:
   - Sets `datatype` to `'electricity'` and `target_dim` to 370, the number of series in the electricity dataset.
   - Calls `get_dataloader`, which returns the training, validation, and test data loaders together with `scaler` and `mean_scaler`. The loaders batch the data; the two scalers carry the normalization statistics and are passed to `evaluate` later.

3. **Set Up Model**:
   - Initializes the `CSDI_Forecasting` model with the loaded configuration, the device, and the target dimension.
   - Moves the model to the selected device (GPU or CPU).

4. **Set Up Output Folder**:
   - Creates a folder under `./save/` whose name contains the data type and the current date and time, so each run writes to its own directory.

5. **Save Configuration**:
   - Writes the loaded configuration to `config.json` in the output folder, so the exact settings of the run are recorded.

In [6]:
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Set up dataloaders
datatype = 'electricity'
target_dim = 370  # for electricity dataset

train_loader, valid_loader, test_loader, scaler, mean_scaler = get_dataloader(
    datatype=datatype,
    device=device,
    batch_size=config['train']['batch_size']
)

# Set up model
model = CSDI_Forecasting(config, device, target_dim).to(device)

# Set up output folder
current_time = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
foldername = f"./save/forecasting_{datatype}_{current_time}/"
os.makedirs(foldername, exist_ok=True)

# Save config
with open(os.path.join(foldername, "config.json"), "w") as f:
    json.dump(config, f, indent=4)

## Model Training

The next cell starts training:

1. **Train the Model**:
   - Calls the `train` function from `utils`.
   - Passes the following arguments:
     - `model`: The initialized `CSDI_Forecasting` model.
     - `config['train']`: The training settings (epochs, batch size, learning rate).
     - `train_loader`: The data loader for the training dataset.
     - `valid_loader`: The data loader for the validation dataset (optional).
     - `foldername`: The directory where training outputs and logs are saved.

In [7]:
# Train the model
train(
    model,
    config['train'],
    train_loader,
    valid_loader=valid_loader,
    foldername=foldername
)

  3%|█▌                                                          | 18/691 [00:55<34:45,  3.10s/it, avg_epoch_loss=0.658, epoch=0]


KeyboardInterrupt: 
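The run above was interrupted after 18 of 691 iterations. At the logged rate of about 3.1 s per iteration, one epoch takes roughly 36 minutes, so the configured 100 epochs would take on the order of 60 hours on that hardware. For a quick trial run, one option is to pass a scaled-down copy of the training settings. The sketch below assumes that `train` reads `epochs` and `itr_per_epoch` from the dictionary it receives; whether it actually stops at `itr_per_epoch` depends on `utils.train`, which is not shown here. The helper name `make_smoke_test_config` is hypothetical.

```python
import copy

def make_smoke_test_config(train_config, epochs=1, itr_per_epoch=10):
    """Return a copy of the training settings scaled down for a quick trial run."""
    quick = copy.deepcopy(train_config)  # leave the original dictionary untouched
    quick["epochs"] = epochs
    quick["itr_per_epoch"] = itr_per_epoch  # assumed to cap the iterations per epoch
    return quick

# Stand-in for config['train'] as printed earlier in the notebook
train_config = {"epochs": 100, "batch_size": 8, "lr": 0.001, "itr_per_epoch": 100000000.0}
quick_config = make_smoke_test_config(train_config)
print(quick_config)
```

In the notebook, `quick_config` would take the place of `config['train']` in the call to `train`, while the saved `config.json` still records the full settings.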

## Model Evaluation

The next cell evaluates the trained model on the test set:

1. **Set Evaluation Parameters**:
   - Sets `nsample`, the number of forecast samples generated for each test input, to 100.

2. **Evaluate the Model**:
   - Calls the `evaluate` function from `utils`.
   - Passes the following arguments:
     - `model`: The trained `CSDI_Forecasting` model.
     - `test_loader`: The data loader for the test dataset.
     - `nsample`: The number of samples to generate per test input.
     - `scaler`: The scale statistic returned by `get_dataloader`.
     - `mean_scaler`: The mean statistic returned by `get_dataloader`.
     - `foldername`: The directory where evaluation results and logs are saved.

In [None]:
# Set evaluation parameters
nsample = 100  # number of samples for evaluation

# Evaluate the model
evaluate(
    model,
    test_loader,
    nsample=nsample,
    scaler=scaler,
    mean_scaler=mean_scaler,
    foldername=foldername
)
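The body of `evaluate` is not shown in this notebook. Sample-based forecasts like these are often scored with CRPS, approximated by averaging quantile losses over a grid of quantile levels. The sketch below illustrates that idea in plain Python under stated assumptions: the quantile grid (0.05 to 0.95), the normalization by the sum of absolute targets, and the rescaling `x * scale + shift` are choices made here for illustration, and `scale` and `shift` are hypothetical scalar stand-ins for `scaler` and `mean_scaler`.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an ascending list, for 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (pos - lower) * (sorted_values[upper] - sorted_values[lower])

def quantile_crps(targets, samples, scale=1.0, shift=0.0):
    """Normalized CRPS approximation averaged over quantile levels 0.05, 0.10, ..., 0.95.

    targets: true values in normalized units
    samples: samples[i] is the list of forecast samples for targets[i]
    scale, shift: undo the normalization via x * scale + shift (assumed form)
    """
    y = [t * scale + shift for t in targets]
    draws = [sorted(s * scale + shift for s in row) for row in samples]
    denom = sum(abs(v) for v in y)  # assumes the targets are not all zero
    levels = [k / 20 for k in range(1, 20)]
    total = 0.0
    for q in levels:
        loss = 0.0
        for target, row in zip(y, draws):
            pred = quantile(row, q)
            indicator = 1.0 if target <= pred else 0.0
            loss += 2 * abs((pred - target) * (indicator - q))  # pinball loss, doubled
        total += loss / denom
    return total / len(levels)

# Samples that all equal the target give a score of zero; spread-out samples score higher
perfect = quantile_crps([1.0, 2.0], [[1.0] * 5, [2.0] * 5])
spread = quantile_crps([1.0, 2.0], [[0.5, 1.0, 1.5], [1.5, 2.0, 2.5]])
print(perfect, spread)
```

Lower values are better. Drawing more samples (a larger `nsample`) gives more stable quantile estimates at the cost of longer evaluation time.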

## Load Pre-trained Model and Evaluate

In [None]:
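def load_pretrained_model(model, modelfolder='pretrained', device=device):
    """Load weights from ./save/<modelfolder>/model.pth into `model` in place and return it."""
    model_path = f"./save/{modelfolder}/model.pth"
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine
    model.load_state_dict(torch.load(model_path, map_location=device))
    return model

# Load pre-trained model (this overwrites the weights of `model`; both names refer to the same object)
pretrained_model = load_pretrained_model(model)
pretrained_model.target_dim = target_dim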

# Evaluate the pre-trained model
evaluate(
    pretrained_model,
    test_loader,
    nsample=nsample,
    scaler=scaler,
    mean_scaler=mean_scaler,
    foldername=foldername
)
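`load_pretrained_model` expects a checkpoint at `./save/<modelfolder>/model.pth`. To evaluate a model trained in this notebook instead of the `pretrained` folder, the most recent run folder that contains a checkpoint could be located as sketched below. This assumes that `train` writes `model.pth` into `foldername`, which is not shown in this notebook, and the helper `latest_run_with_checkpoint` is hypothetical. Because the folder names embed a `%Y%m%d_%H%M%S` timestamp, sorting them alphabetically also sorts them chronologically.

```python
import os
import tempfile

def latest_run_with_checkpoint(save_root, prefix="forecasting_electricity_", checkpoint="model.pth"):
    """Return the newest run folder name under `save_root` that contains `checkpoint`, or None."""
    if not os.path.isdir(save_root):
        return None
    runs = sorted(
        name for name in os.listdir(save_root)
        if name.startswith(prefix) and os.path.isfile(os.path.join(save_root, name, checkpoint))
    )
    return runs[-1] if runs else None

# Demo on a temporary directory standing in for ./save
with tempfile.TemporaryDirectory() as save_root:
    demo_runs = [("20240101_090000", True), ("20240102_090000", True), ("20240103_090000", False)]
    for stamp, has_checkpoint in demo_runs:
        run_dir = os.path.join(save_root, f"forecasting_electricity_{stamp}")
        os.makedirs(run_dir)
        if has_checkpoint:
            open(os.path.join(run_dir, "model.pth"), "wb").close()  # empty placeholder
    newest = latest_run_with_checkpoint(save_root)
    print(newest)
```

The returned name could then be passed as `modelfolder` to `load_pretrained_model`. The third demo folder is skipped because it holds no checkpoint, as would be the case for an interrupted run.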