## 14. Wind Power Prediction Using the Tiny Time Mixers (TTM) Model

**Task Description**:

Use the TTM model to forecast wind power generation, incorporating wind speed and power data. At each time point, forecast the hourly output for the next day. Analyze the forecast errors for the different forecast horizons using the RMSE and MAE. Use the last available year of data as your test set.
Predict the sum of Offshore_Wind and Onshore_Wind (you can choose whether to forecast both and then sum, or forecast the sum directly) from the file: Realised_Supply_Germany.csv

Bonus: Test different feature selection approaches for the weather input.
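
Written out per forecast step $h = 1, \dots, 24$, the two metrics look like this. Here $N$ is the number of forecast origins $t$ in the test year, $y_{t+h}$ is the realised generation (offshore plus onshore), and $\hat{y}_{t+h \mid t}$ is the forecast issued at $t$. This notation is introduced here for illustration and does not come from the task sheet.

$$
\mathrm{MAE}_h = \frac{1}{N}\sum_{t=1}^{N}\left|y_{t+h}-\hat{y}_{t+h\mid t}\right|,
\qquad
\mathrm{RMSE}_h = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_{t+h}-\hat{y}_{t+h\mid t}\right)^{2}}
$$

Because RMSE squares the errors, $\mathrm{RMSE}_h \ge \mathrm{MAE}_h$ always holds. A widening gap between the two as $h$ grows suggests that the extra error comes from occasional large misses rather than from uniformly larger errors.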


In [None]:
# ensure that Jupyter automatically updates imports when the imported files change
%load_ext autoreload
%autoreload 2

In [9]:
# # add google drive
# from google.colab import drive
# drive.mount('/content/drive')

In [10]:
# !pip install -q "granite-tsfm[notebooks] @ git+https://github.com/ibm-granite/granite-tsfm.git"

In [11]:
# # sometimes needed to be run twice in colab
# %cd /content/drive/MyDrive/ttm

In [12]:
import math
import os
import tempfile

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments, set_seed
from transformers.integrations import INTEGRATION_TO_CALLBACK

import warnings

# Suppress all warnings
warnings.filterwarnings("ignore")

In [13]:
SEED = 42
set_seed(SEED)

# TTM Model path. The default model path is Granite-R2. Below, you can choose other TTM releases.
TTM_MODEL_PATH = "ibm-granite/granite-timeseries-ttm-r2"
# TTM_MODEL_PATH = "ibm-granite/granite-timeseries-ttm-r1"
# TTM_MODEL_PATH = "ibm-research/ttm-research-r2"

# Context length, i.e., the length of the history window (in time steps, here hours).
# Currently supported values are: 512/1024/1536 for Granite-TTM-R2 and Research-Use-TTM-R2, and 512/1024 for Granite-TTM-R1
CONTEXT_LENGTH = 512

# Granite-TTM-R2 supports forecast lengths up to 720 and Granite-TTM-R1 supports forecast lengths up to 96
PREDICTION_LENGTH = 24 # "hourly output for the next day"

dataset_path = "data/Realised_Supply_Germany.csv"
weather_path = "data/Weather_Data_Germany.csv"
weather_path_2 = "data/Weather_Data_Germany_2022.csv"
# Results dir
OUT_DIR = "results/"

In [14]:
df = pd.read_csv(dataset_path, on_bad_lines="skip")

In [15]:
weather = pd.read_csv(weather_path)
weather2022 = pd.read_csv(weather_path_2)
weather = pd.concat([weather, weather2022], axis=0).reset_index(drop=True)

### Exploratory Data Analysis

We first look at the wind power generation data.

In [None]:
from plot_utils import wind_data_summary, create_wind_eda_plots, explain_data_completeness, wind_scatter_analysis

df = pd.read_csv(dataset_path, on_bad_lines="skip")  # same options as the first load, so both cells see the same rows
df_processed = wind_data_summary(df)
create_wind_eda_plots(df_processed)
completeness = explain_data_completeness(df_processed)
wind_correlation = wind_scatter_analysis(df_processed)

Offshore wind: relatively consistent (coefficient of variation = 0.23), stable baseline of ~414 MW, no missing values.

Onshore wind: extremely variable (coefficient of variation = 1.96), low baseline of ~99 MW, highly irregular.

Consider forecasting the two components separately and then summing, rather than forecasting the total directly: offshore wind is the dominant, predictable component, while onshore is the volatile one.

Use robust scaling, and consider a log transformation for onshore wind to handle its extreme right skew.
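
Here is a minimal NumPy-only sketch of that idea. It assumes non-negative power values, and the function names are hypothetical. Note that the notebook's preprocessor below uses `scaler_type="standard"`, not this scheme. The approach applies `log1p` to compress the right tail, then centres on the median and divides by the interquartile range. Fit the parameters on the training split only.

```python
import numpy as np


def fit_robust_log_scaler(x):
    """Fit median/IQR scaling on log1p(x); assumes x >= 0 (e.g. power in MW)."""
    z = np.log1p(np.asarray(x, dtype=float))  # compress the long right tail
    center = float(np.median(z))
    q25, q75 = np.percentile(z, [25, 75])
    scale = float(q75 - q25) if q75 > q25 else 1.0  # guard against a zero IQR
    return center, scale


def robust_log_transform(x, center, scale):
    """Map raw values into the scaled log space."""
    return (np.log1p(np.asarray(x, dtype=float)) - center) / scale


def robust_log_inverse(z, center, scale):
    """Undo the scaling, then the log1p; clip because power cannot be negative."""
    return np.clip(np.expm1(np.asarray(z, dtype=float) * scale + center), 0.0, None)


# Fit on the training split only (train_df is a hypothetical training-split frame):
# center, scale = fit_robust_log_scaler(train_df["wind_power_onshore"])
```

Forecasts made in the transformed space are mapped back with `robust_log_inverse`. Because `expm1` is convex, back-transformed point forecasts tend to sit below the conditional mean. This matters when the final score is RMSE.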

Possible handling of DST irregularities: (1) fill the gaps, or (2) convert the timestamps to UTC.
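
Option (2) could be implemented with pandas as sketched below. This assumes the raw timestamps are naive local times in `Europe/Berlin`, which should be checked against the CSV. After conversion to UTC every hour exists exactly once, so the series can be put on a regular hourly grid (which also averages 15-minute values into hours) and short gaps interpolated, covering option (1). `to_utc_hourly` and `max_gap_hours` are hypothetical names.

```python
import pandas as pd


def to_utc_hourly(local_times, values, tz="Europe/Berlin", max_gap_hours=3):
    """Localize naive local timestamps, convert to UTC and regularise to an hourly grid."""
    idx = pd.DatetimeIndex(pd.to_datetime(local_times))
    # ambiguous="infer": the repeated 02:00 hour in late October is resolved by row order;
    # nonexistent="shift_forward": a 02:xx stamp on the spring-forward day moves to 03:00
    idx = idx.tz_localize(tz, ambiguous="infer", nonexistent="shift_forward")
    series = pd.Series(values, index=idx.tz_convert("UTC"), dtype=float).sort_index()
    # UTC has no DST jumps, so an hourly resample yields a regular grid;
    # interior NaNs are filled linearly, at most max_gap_hours consecutive values per gap
    return series.resample("h").mean().interpolate(limit=max_gap_hours, limit_area="inside")
```

On the spring-forward day (27 March 2022), local 01:00 and 03:00 are consecutive UTC hours, so no artificial gap appears.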

Seasonalities: Yearly (summer slightly higher than winter)

### 🌤️ Weather Data Description


| **Column**         | **Description**                                                                 | **Usage for TTM**                                           |
|--------------------|----------------------------------------------------------------------------------|--------------------------------------------------------------|
| `longitude`, `latitude` | Geographical coordinates of the forecast grid point                        | Use for **spatial alignment** or **interpolation** to plant |
| `forecast_origin`   | Time when the forecast was issued                                               | Use to determine **forecast lead time**                     |
| `time`              | Valid time of the forecast (i.e., when the forecast applies)                    | Align with **target wind power timestamp**                  |
| `cdir`              | Clear-sky direct solar radiation at the surface (ECMWF short name)              | Optional; mostly for solar models                           |
| `z`                 | Geopotential (m²/s²); at the surface, divided by g ≈ 9.81 m/s², it gives terrain height | Rarely needed for power; may relate to elevation effects    |
| `msl`               | Mean sea level pressure (hPa)                                                   | Could help model pressure-driven wind patterns              |
| `blh`               | Boundary layer height                                                           | Indicates turbulence layer; optional                        |
| `tcc`               | Total cloud cover (0–1)                                                         | Proxy for sunlight; mostly relevant for solar               |
| `u10`, `v10`        | 10 m wind components (m/s): u (east-west), v (north-south)                      | **Primary feature** for wind power                          |
| `t2m`               | Temperature at 2 meters (K)                                                     | Optional; affects air density                               |
| `ssr`, `tsr`, `sund`| Surface / top-of-atmosphere net solar radiation and sunshine duration           | Optional; mostly for solar models                           |
| `tp`                | Total precipitation                                                             | Might relate to wind instability                            |
| `fsr`               | Forecast surface roughness (ECMWF short name)                                   | Optional; roughness shapes the vertical wind profile        |
| `u100`, `v100`      | 100 m wind components (m/s): u (east-west), v (north-south)                     | **Highly relevant**; turbines operate around this height    |
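
The wind columns in the table are vector components, not speeds. Below is a sketch of how features such as `wind_speed_100m`, `wind_dir_sin` and `wind_dir_cos` could be derived from `u100`/`v100`. The notebook's actual logic lives in `preprocess.create_weather_features_simple`, which is not shown here, so treat this as an assumption. `wind_features` is a hypothetical helper.

```python
import numpy as np


def wind_features(u, v):
    """Wind speed and direction (sin/cos encoded) from u (eastward) and v (northward) components."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    speed = np.hypot(u, v)  # sqrt(u**2 + v**2), same unit as u and v (m/s)
    # Meteorological convention: the direction the wind blows FROM, clockwise from north
    direction_deg = np.degrees(np.arctan2(-u, -v)) % 360.0
    direction_rad = np.radians(direction_deg)
    # sin/cos avoid the artificial jump between 359° and 0°
    return speed, np.sin(direction_rad), np.cos(direction_rad)
```

If grid points are averaged over Germany, it is better to compute the speed per grid point before averaging than to average `u` and `v` first. By the triangle inequality, the mean of the speeds is never smaller than the speed of the mean vector. Averaging components first can therefore understate the wind resource when directions differ across the country.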


In [None]:
from plot_utils import plot_weather_on_real_map, create_interactive_wind_analysis
coord_analysis = plot_weather_on_real_map(weather)
interactive_map = create_interactive_wind_analysis(coord_analysis)

-> Clear north-south wind resource gradient  
-> Lower pressure = higher winds (weather systems)  
-> 10 m and 100 m wind speeds are strongly correlated, so using both may be redundant
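
The 10 m / 100 m redundancy suggests one candidate for the bonus feature-selection task: a simple greedy correlation filter. The sketch below is illustrative only. The 0.95 threshold and the helper name are assumptions, and the input is expected to contain numeric columns only. Compute the correlations on the training split only, so that the test year does not influence the feature choice.

```python
import numpy as np
import pandas as pd


def drop_redundant_features(df, threshold=0.95, priority=None):
    """Greedy filter: keep a column only if its |Pearson r| with every already-kept
    column is below `threshold`. Columns listed in `priority` are considered first."""
    priority = list(priority or [])
    cols = priority + [c for c in df.columns if c not in priority]
    # NaN correlations (e.g. constant columns) are treated as uncorrelated
    corr = df[cols].corr().abs().fillna(0.0)
    kept = []
    for col in cols:
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return kept
```

Passing `priority=["wind_speed_100m"]` keeps hub-height wind and drops the 10 m series when the two are nearly collinear.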

### Data Processing - ETL

In [17]:
from tsfm_public import TimeSeriesPreprocessor, TrackingCallback, count_parameters, get_datasets
from tsfm_public.toolkit.get_model import get_model
from tsfm_public.toolkit.lr_finder import optimal_lr_finder
from tsfm_public.toolkit.visualization import plot_predictions

from preprocess import clean_wind_data, create_weather_features_simple, merge_wind_weather_data, create_temporal_splits

In [None]:
# TODO: create additional features (time of day, forecast lead time, lags) - look at medium article linked by her
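
Here is a sketch for the TODO above, using hypothetical names. It adds cyclical hour-of-day and day-of-year encodings, plus lagged targets. For a 24-hour-ahead forecast, target lags shorter than 24 h are not yet observed for the later horizons, so the lags here start at 24. TTM already sees the previous `CONTEXT_LENGTH` hours of each target through its context window, so explicit target lags may add little. Calendar features, by contrast, are known in advance and could go in the preprocessor's `observable_columns`.

```python
import numpy as np
import pandas as pd


def add_calendar_and_lag_features(df, target_cols, timestamp_col="timestamp", lags=(24, 48, 168)):
    """Add cyclical calendar encodings and lagged targets to a gap-free hourly frame."""
    out = df.sort_values(timestamp_col).reset_index(drop=True)
    ts = pd.to_datetime(out[timestamp_col])
    hour = ts.dt.hour
    day_of_year = ts.dt.dayofyear
    # sin/cos pairs keep 23:00 close to 00:00 and 31 Dec close to 1 Jan
    out["hour_sin"] = np.sin(2 * np.pi * hour / 24)
    out["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    out["doy_sin"] = np.sin(2 * np.pi * day_of_year / 365.25)
    out["doy_cos"] = np.cos(2 * np.pi * day_of_year / 365.25)
    for col in target_cols:
        for lag in lags:
            # shift counts rows, so this equals `lag` hours only on a gap-free hourly grid;
            # lags >= 24 keep every value known at the forecast origin for all 24 horizons
            out[f"{col}_lag{lag}"] = out[col].shift(lag)
    return out
```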

In [18]:
from preprocess import clean_wind_data, create_weather_features_simple, merge_wind_weather_data, create_temporal_splits
def setup_ttm_preprocessor(df_combined, context_length=512, prediction_length=24, timestamp_column="timestamp"):
    """
    Set up TimeSeriesPreprocessor for TTM model
    """
    print("=== Setting up TTM preprocessor ===")

    # context_length: e.g. 512 h ≈ 21 days of history; prediction_length: 24 h = next day

    # Column specifications
    id_columns = []  # Single aggregated time series

    # Target: What we want to forecast
    target_columns = ["wind_power_offshore", "wind_power_onshore"]

    # Observable: Weather variables (known in future via forecasts)
    # Since we have weather forecasts for all time periods, all weather features are observable
    observable_columns = [
        "wind_speed_100m",
        "wind_speed_10m",
        "pressure_hpa",
        "temperature_c",
        "wind_dir_sin",  # Wind direction is observable (from forecasts)
        "wind_dir_cos",  # Wind direction is observable (from forecasts)
        "u100", "v100",
        "u10", "v10",
        "blh",
        "tcc",
        "tp",  # precipitation is also observable from forecasts
        "forecast_lead_hours"  # forecast lead time as feature
    ]

    # Conditional: Variables known in past but not future
    # Since we have weather forecasts, most variables are observable rather than conditional
    conditional_columns = []

    # Control: None for this use case
    control_columns = []

    column_specifiers = {
        "timestamp_column": timestamp_column,
        "id_columns": id_columns,
        "target_columns": target_columns,
        "observable_columns": observable_columns,
        "conditional_columns": conditional_columns,
        "control_columns": control_columns,
    }

    print("TTM Column Configuration:")
    print(f"Target columns ({len(target_columns)}): {target_columns}")
    print(f"Observable columns ({len(observable_columns)}): {observable_columns}")
    print(f"Conditional columns ({len(conditional_columns)}): {conditional_columns}")
    print(f"Context length: {context_length} hours (~{context_length/24:.1f} days)")
    print(f"Prediction length: {prediction_length} hours")

    # Create preprocessor
    tsp = TimeSeriesPreprocessor(
        **column_specifiers,
        context_length=context_length,
        prediction_length=prediction_length,
        scaling=True,  # Important for different units
        encode_categorical=False,  # No categorical data
        scaler_type="standard",  # Standardize all features
    )

    return tsp, column_specifiers, context_length, prediction_length

# Main processing pipeline
def prepare_wind_power_dataset(df_wind, df_weather):
    """
    Complete pipeline to prepare wind power dataset for TTM
    """
    print("=== WIND POWER TTM DATA PREPARATION PIPELINE ===\n")

    print("Step 1: Cleaning and resampling wind power data (15min -> hourly)")
    df_wind_clean = clean_wind_data(df_wind)

    print("\nStep 2: Creating spatially averaged weather features")
    df_weather_agg = create_weather_features_simple(df_weather)

    print("\nStep 3: Merging datasets")
    df_combined = merge_wind_weather_data(df_wind_clean, df_weather_agg)

    # Sort before splitting: split_config stores positional indices (used with .iloc below),
    # so re-sorting after the split would misalign train/valid/test.
    if not df_combined['timestamp'].is_monotonic_increasing:
        print("\nSorting check: re-sorting data by timestamp")
        df_combined = df_combined.sort_values('timestamp').reset_index(drop=True)
    else:
        print("\nSorting check: data already sorted")

    print("\nStep 4: Creating temporal splits")
    df_combined, split_config = create_temporal_splits(df_combined, plot=False)
    # The following is model specific:
    print("\nStep 5: Setting up TTM preprocessor")
    tsp, column_specifiers, context_length, prediction_length = setup_ttm_preprocessor(
        df_combined, context_length=CONTEXT_LENGTH, prediction_length=PREDICTION_LENGTH
    )

    print(f"\n✅ PIPELINE COMPLETE")
    print(f"Final dataset: {df_combined.shape}")
    print(f"Temporal resolution: Hourly")
    print(f"Spatial coverage: Germany (averaged)")

    return {
        'data': df_combined,
        'preprocessor': tsp,
        'split_config': split_config,
        'column_specifiers': column_specifiers,
        'context_length': context_length,
        'prediction_length': prediction_length
    }
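
`create_temporal_splits` lives in `preprocess.py` and is not shown here. For reference, a hypothetical helper might implement the task's requirement (the last available year as the test set) as positional `[start, end)` index pairs. This is the same form that `df_combined.iloc[split_config['test'][0]:split_config['test'][1]]` uses further down. The 60-day validation window is an arbitrary assumption. Depending on the tsfm version, `get_datasets` may or may not prepend `context_length` rows of history to the valid/test slices, so check how the first test window lines up with `test_start`.

```python
import pandas as pd


def last_year_split(timestamps, valid_days=60):
    """Positional [start, end) splits: the final 365 days -> test,
    the `valid_days` before that -> valid, everything earlier -> train."""
    ts = pd.DatetimeIndex(pd.to_datetime(timestamps))
    if not ts.is_monotonic_increasing:
        raise ValueError("sort by timestamp before computing positional splits")
    n = len(ts)
    # first row strictly after (last timestamp - 365 days)
    test_start = int(ts.searchsorted(ts[-1] - pd.Timedelta(days=365), side="right"))
    # first row at or after (test start - valid_days)
    valid_start = int(ts.searchsorted(ts[test_start] - pd.Timedelta(days=valid_days), side="left"))
    return {"train": [0, valid_start], "valid": [valid_start, test_start], "test": [test_start, n]}
```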


In [None]:
df_wind = df.copy()
df_weather = weather.copy()
result = prepare_wind_power_dataset(df_wind, df_weather)

In [None]:
print(result.keys())
df_combined = result['data']
tsp = result['preprocessor']
split_config = result['split_config']

TARGET_DATASET = "german_wind_power"
dataset_name=TARGET_DATASET
context_length=CONTEXT_LENGTH
forecast_length=PREDICTION_LENGTH
batch_size=64

zeroshot_model = get_model(
    TTM_MODEL_PATH,
    context_length=context_length,
    prediction_length=forecast_length,
    freq_prefix_tuning=False,
    freq=None,
    prefer_l1_loss=False,
    prefer_longer_context=True,
)

# Creates the preprocessed pytorch datasets needed for training and evaluation using the HuggingFace trainer
dset_train, dset_valid, dset_test = get_datasets(
    tsp, df_combined, split_config, use_frequency_token=zeroshot_model.config.resolution_prefix_tuning
)
print(type(dset_train))

In [None]:
# print(f"\nPreprocessor target columns: {tsp.target_columns}")
# print(f"Preprocessor observable columns: {getattr(tsp, 'observable_columns', 'Not set')}")
# print(f"All feature columns: {getattr(tsp, 'feature_columns', 'Not set')}")
# print("\n2️⃣ TEST DATASET STRUCTURE:")
# print(f"Dataset type: {type(dset_test)}")
# print(f"Dataset length: {len(dset_test)}")
# sample = dset_test[0]
# print(f"\nSample structure: {type(sample)}")
# if isinstance(sample, dict):
#     for key, value in sample.items():
#         if hasattr(value, 'shape'):
#             print(f"  {key}: {value.shape} (dtype: {value.dtype})")
#         else:
#             print(f"  {key}: {type(value)}")
# # vars(zeroshot_trainer)
# dset_test[0]['past_values'].shape, dset_test[0]['future_values'].shape, len(dset_test)

In [None]:
# def compute_metrics(eval_pred):
#     predictions, labels = eval_pred
#     print(f"labels length: {len(labels)}")
#     for item in predictions:
#         print(type(item))
#         print(item.shape)
#     print(f"len(predictions) {len(predictions)}")
#     print(f"type(predictions) {type(predictions)}")
#     # print(f"predictions {predictions}")
#     print(f"len(labels) {len(labels)}")
#     mse = ((predictions - labels) ** 2).mean()
#     return {"custom_mse": mse}


In [None]:
temp_dir = tempfile.mkdtemp()

zeroshot_trainer = Trainer(
    model=zeroshot_model,
    args=TrainingArguments(
        output_dir=temp_dir,
        per_device_eval_batch_size=batch_size,
        seed=SEED,
        report_to="none",
    ),
    # compute_metrics=compute_metrics,
)

In [None]:
# # TODO: t+h step ahead metrics RMSE + MAE - maybe change loss metric from mse to mae?
# # evaluate = zero-shot performance
# print("+" * 20, "Test MSE zero-shot", "+" * 20)
# zeroshot_output = zeroshot_trainer.evaluate(dset_test)
# print(zeroshot_output)

In [None]:
# get predictions
predictions_dict = zeroshot_trainer.predict(dset_test)
predictions_np = predictions_dict.predictions[0]
print(predictions_np.shape)
print(f"    Dimension 0 ({predictions_np.shape[0]}): Number of sliding windows")
print(f"    Dimension 1 ({predictions_np.shape[1]}): Prediction horizon (hours ahead)")
print(f"    Dimension 2 ({predictions_np.shape[2]}): Number of channels/features being predicted")
df_test = df_combined.iloc[split_config['test'][0]:split_config['test'][1]].copy()
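
Instead of re-deriving which rows of `df_test` each window covers, the ground truth can be read from the test dataset itself. This pairs predictions and targets window by window. The sketch assumes each item of `dset_test` exposes `future_values` of shape `(horizon, n_channels)` with the target channels first. The key appears in the commented-out inspection cell above, and the channel order is the same assumption that `denormalize_ttm_predictions_simple` makes. `stack_future_targets` is a hypothetical helper.

```python
import numpy as np


def stack_future_targets(dataset, n_targets):
    """Collect each window's ground truth into an array of shape (n_windows, horizon, n_targets).

    Assumes dataset[i]["future_values"] is a NumPy array or CPU tensor of shape
    (horizon, n_channels) with the target channels first, in the same (scaled)
    space as the model predictions.
    """
    return np.stack([
        np.asarray(dataset[i]["future_values"])[:, :n_targets]
        for i in range(len(dataset))
    ])
```

`stack_future_targets(dset_test, len(tsp.target_columns))` is in the same scaled space as `predictions_np`. It could be passed through `denormalize_ttm_predictions_simple` (defined in the next cell) to obtain values in MW.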

In [None]:
def denormalize_ttm_predictions_simple(predictions_np, tsp):
    """
    Simple denormalization for TTM predictions

    Parameters:
    -----------
    predictions_np : numpy array, shape (n_windows, horizon, n_channels)
        Raw model predictions in normalized space
    tsp : TimeSeriesPreprocessor
        Fitted preprocessor with target_scaler_dict

    Returns:
    --------
    dict : Denormalized predictions for each target
    """

    # Get target scaler
    target_scaler = tsp.target_scaler_dict['0']

    # Extract target predictions (first 2 channels)
    n_targets = len(tsp.target_columns)
    target_preds = predictions_np[:, :, :n_targets]  # (n_windows, 24, 2)

    # Reshape and denormalize
    n_windows, horizon, _ = target_preds.shape
    reshaped = target_preds.reshape(-1, n_targets)
    denormalized = target_scaler.inverse_transform(reshaped)
    final_shape = denormalized.reshape(n_windows, horizon, n_targets)

    # Return as dictionary
    result = {}
    for i, col in enumerate(tsp.target_columns):
        result[col] = final_shape[:, :, i]

    return result

def plot_clean_comparison(denorm_preds, df_combined, start_idx=0, length=500):
    """
    Clean plot with multiple forecast horizons - 5 prediction lines per target

    Parameters:
    -----------
    denorm_preds : dict
        Denormalized predictions from denormalize_ttm_predictions_simple()
    df_combined : pandas DataFrame
        Ground truth data
    start_idx : int
        Starting time step
    length : int
        Number of time steps to plot
    """

    print(f"📊 CREATING MULTI-HORIZON COMPARISON PLOT")
    print("="*50)

    # Extract predictions and ground truth
    offshore_pred = denorm_preds['wind_power_offshore']  # (n_windows, 24)
    onshore_pred = denorm_preds['wind_power_onshore']    # (n_windows, 24)

    offshore_truth = df_combined['wind_power_offshore'].values
    onshore_truth = df_combined['wind_power_onshore'].values

    # Define forecast horizons (1h, 6h, 12h, 18h, 24h ahead)
    horizons = [0, 5, 11, 17, 23]  # 0-indexed: 1h, 6h, 12h, 18h, 24h ahead
    horizon_labels = ['1h', '6h', '12h', '18h', '24h']
    colors = ['red', 'orange', 'green', 'blue', 'purple']

    # Create continuous prediction lines for each horizon
    continuous_preds = {}

    for i, (h_idx, h_label) in enumerate(zip(horizons, horizon_labels)):
        continuous_preds[f'offshore_{h_label}'] = offshore_pred[:, h_idx]
        continuous_preds[f'onshore_{h_label}'] = onshore_pred[:, h_idx]

    print(f"Ground truth length: {len(offshore_truth)}")
    print(f"Prediction windows: {offshore_pred.shape[0]}")
    print(f"Forecast horizons: {horizon_labels}")

    # Create the plot
    plt.figure(figsize=(18, 12))

    # Plot 1: Offshore Wind Power
    plt.subplot(2, 1, 1)

    # Ground truth
    plot_end = min(start_idx + length, len(offshore_truth))
    time_range = range(start_idx, plot_end)

    plt.plot(time_range, offshore_truth[start_idx:plot_end],
             'k-', linewidth=3, label='Ground Truth Offshore', alpha=0.9, zorder=10)

    # Prediction lines for different horizons
    for i, (h_idx, h_label, color) in enumerate(zip(horizons, horizon_labels, colors)):
        # Align h-hour-ahead predictions with the ground truth: this assumes window i's forecast
        # at index h_idx targets time step i + h_idx + 1, so each line starts at start_idx + h_idx + 1
        horizon_hours = h_idx + 1  # Convert 0-indexed to 1-indexed hours

        pred_start = start_idx + horizon_hours
        pred_end = min(pred_start + len(continuous_preds[f'offshore_{h_label}']), plot_end)

        if pred_start < plot_end and pred_start < len(continuous_preds[f'offshore_{h_label}']):
            pred_range = range(pred_start, pred_end)
            pred_values = continuous_preds[f'offshore_{h_label}'][start_idx:start_idx + len(pred_range)]

            plt.plot(pred_range, pred_values,
                    color=color, linewidth=2, linestyle='--', alpha=0.8,
                    label=f'Prediction {h_label} ahead', zorder=5-i)

    plt.title('Offshore Wind Power: Multi-Horizon Predictions vs Ground Truth', fontsize=14, fontweight='bold')
    plt.xlabel('Time Step (Hours)')
    plt.ylabel('Power (MW)')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)

    # Plot 2: Onshore Wind Power
    plt.subplot(2, 1, 2)

    # Ground truth
    plt.plot(time_range, onshore_truth[start_idx:plot_end],
             'k-', linewidth=3, label='Ground Truth Onshore', alpha=0.9, zorder=10)

    # Prediction lines for different horizons
    for i, (h_idx, h_label, color) in enumerate(zip(horizons, horizon_labels, colors)):
        horizon_hours = h_idx + 1

        pred_start = start_idx + horizon_hours
        pred_end = min(pred_start + len(continuous_preds[f'onshore_{h_label}']), plot_end)

        if pred_start < plot_end and pred_start < len(continuous_preds[f'onshore_{h_label}']):
            pred_range = range(pred_start, pred_end)
            pred_values = continuous_preds[f'onshore_{h_label}'][start_idx:start_idx + len(pred_range)]

            plt.plot(pred_range, pred_values,
                    color=color, linewidth=2, linestyle='--', alpha=0.8,
                    label=f'Prediction {h_label} ahead', zorder=5-i)

    plt.title('Onshore Wind Power: Multi-Horizon Predictions vs Ground Truth', fontsize=14, fontweight='bold')
    plt.xlabel('Time Step (Hours)')
    plt.ylabel('Power (MW)')
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

    # Calculate metrics for each horizon
    print(f"\n📈 MULTI-HORIZON PREDICTION METRICS")
    print("="*60)

    metrics_summary = {}

    for target in ['offshore', 'onshore']:
        print(f"\n🎯 {target.upper()} WIND POWER:")
        print("-" * 30)

        if target == 'offshore':
            truth = offshore_truth
        else:
            truth = onshore_truth

        target_metrics = {}

        for h_idx, h_label in zip(horizons, horizon_labels):
            pred_key = f'{target}_{h_label}'
            pred_values = continuous_preds[pred_key]

            # Calculate metrics with proper alignment
            horizon_hours = h_idx + 1

            # Align predictions with ground truth
            if len(truth) > horizon_hours and len(pred_values) > 0:
                max_len = min(len(truth) - horizon_hours, len(pred_values))

                aligned_truth = truth[horizon_hours:horizon_hours + max_len]
                aligned_pred = pred_values[:max_len]

                mae = np.mean(np.abs(aligned_truth - aligned_pred))
                rmse = np.sqrt(np.mean((aligned_truth - aligned_pred)**2))
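                # MAPE is undefined where the truth is 0 (e.g. hours with zero output); skip those hours
                nonzero = np.abs(aligned_truth) > 1e-6
                mape = (np.mean(np.abs((aligned_truth[nonzero] - aligned_pred[nonzero]) / aligned_truth[nonzero])) * 100
                        if nonzero.any() else np.nan)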

                target_metrics[h_label] = {
                    'mae': mae,
                    'rmse': rmse,
                    'mape': mape,
                    'pred_mean': np.mean(aligned_pred),
                    'truth_mean': np.mean(aligned_truth)
                }

                print(f"  {h_label:>3} ahead: MAE={mae:6.1f} MW, RMSE={rmse:6.1f} MW, MAPE={mape:5.1f}%")

        metrics_summary[target] = target_metrics

    # Show horizon degradation
    print(f"\n📉 PREDICTION QUALITY DEGRADATION:")
    print("-" * 40)

    for target in ['offshore', 'onshore']:
        if target in metrics_summary:
            mae_1h = metrics_summary[target]['1h']['mae']
            mae_24h = metrics_summary[target]['24h']['mae']
            degradation = ((mae_24h - mae_1h) / mae_1h) * 100

            print(f"{target.capitalize():>8}: {mae_1h:.1f}→{mae_24h:.1f} MW ({degradation:+.1f}% change in MAE)")

    return {
        'metrics': metrics_summary,
        'continuous_predictions': continuous_preds
    }

def quick_evaluation(denorm_preds, df_combined, target_col="wind_power_offshore"):
    """
    Quick performance evaluation
    """
    pred = denorm_preds[target_col]
    truth = df_combined[target_col].values

    # Calculate metrics for first 100 windows
    maes = []
    for i in range(min(100, pred.shape[0])):
        if i + 24 <= len(truth):
            gt_window = truth[i:i+24]
            pred_window = pred[i]
            maes.append(np.mean(np.abs(gt_window - pred_window)))

    avg_mae = np.mean(maes)

    print(f"🎯 {target_col} Performance:")
    print(f"  Average MAE: {avg_mae:.1f} MW")
    print(f"  Prediction mean: {pred.mean():.1f} MW")
    print(f"  Ground truth mean: {np.mean(truth):.1f} MW")
    print(f"  Error as % of mean: {avg_mae/np.mean(truth)*100:.1f}%")

    return avg_mae

# Usage:
# denorm_preds = denormalize_ttm_predictions_simple(predictions_np, tsp)
# results = plot_clean_comparison(denorm_preds, df_test, start_idx=0, length=500)
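
The task asks for errors on the *sum* of offshore and onshore generation, which the helpers above do not report. The sketch below uses hypothetical function names. It sums the two component forecasts and computes MAE and RMSE per horizon. `truth_by_target` must be a dict of ground-truth arrays aligned window by window with the predictions, in the same `(n_windows, 24)` layout that `denormalize_ttm_predictions_simple` returns. The sketch also reports the RMSE pooled over all horizons, which differs from the mean of the per-horizon RMSEs.

```python
import numpy as np


def per_horizon_errors(pred, truth):
    """MAE and RMSE per forecast step for aligned arrays of shape (n_windows, horizon)."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    if pred.shape != truth.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {truth.shape}")
    err = pred - truth
    return {
        "mae": np.mean(np.abs(err), axis=0),               # one value per horizon step
        "rmse": np.sqrt(np.mean(err ** 2, axis=0)),        # one value per horizon step
        "rmse_pooled": float(np.sqrt(np.mean(err ** 2))),  # all steps together
    }


def total_wind_errors(pred_by_target, truth_by_target,
                      targets=("wind_power_offshore", "wind_power_onshore")):
    """Sum the component forecasts (the task's target) and score the sum per horizon."""
    pred_total = sum(np.asarray(pred_by_target[t], dtype=float) for t in targets)
    truth_total = sum(np.asarray(truth_by_target[t], dtype=float) for t in targets)
    return per_horizon_errors(pred_total, truth_total)
```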

In [None]:
import json
import pandas as pd
import numpy as np
from datetime import datetime
from pathlib import Path

def extract_experiment_metadata(result, ttm_model, dset_test, predictions_np,
                               batch_size=64, model_type="zero-shot", model_path=None,
                               dataset_name="german_wind_power", spatial_coverage="germany_averaged"):
    """
    Extract experiment-specific metadata for evaluation pipeline tracking

    Parameters:
    -----------
    result : dict
        Output from prepare_wind_power_dataset()
    ttm_model : model
        The TTM model instance
    dset_test : dataset
        Test dataset
    predictions_np : numpy array
        Raw model predictions
    batch_size : int
        Batch size used for inference
    model_type : str
        "zero-shot", "fine-tuned", etc.
    model_path : str
        Path to model weights
    dataset_name : str
        Name of dataset (configurable)
    spatial_coverage : str
        Spatial coverage description (configurable)

    Returns:
    --------
    dict : Experiment-specific metadata only
    """

    print("🔍 EXTRACTING EXPERIMENT METADATA")
    print("="*45)

    tsp = result['preprocessor']

    # 1. Feature configuration (varies between experiments)
    feature_config = {
        'target_columns': tsp.target_columns.copy(),
        'observable_columns': getattr(tsp, 'observable_columns', []).copy(),
        'conditional_columns': getattr(tsp, 'conditional_columns', []).copy(),
        'num_targets': len(tsp.target_columns),
        'num_observables': len(getattr(tsp, 'observable_columns', [])),
        'num_total_features': len(tsp.target_columns) + len(getattr(tsp, 'observable_columns', [])) + len(getattr(tsp, 'conditional_columns', [])),
        'scaling_enabled': getattr(tsp, 'scaling', True),
        'scaler_type': getattr(tsp, 'scaler_type', 'standard')
    }

    # 2. Model and training hyperparameters (varies between experiments)
    model_config = {
        'model_type': model_type,
        'model_path': model_path,
        'context_length': result['context_length'],
        'prediction_length': result['prediction_length'],
        'batch_size': batch_size,
        'freq_prefix_tuning': getattr(ttm_model.config, 'freq_prefix_tuning', False),
        'resolution_prefix_tuning': getattr(ttm_model.config, 'resolution_prefix_tuning', False)
    }

    # 3. Prediction metadata (varies between experiments)
    prediction_info = {
        'prediction_shape': list(predictions_np.shape),
        'num_windows': predictions_np.shape[0],
        'horizon_length': predictions_np.shape[1],
        'num_channels': predictions_np.shape[2],
        'test_windows': len(dset_test)
    }

    # 4. Experiment tracking info
    experiment_info = {
        'timestamp': datetime.now().isoformat(),
        'experiment_id': generate_experiment_id(model_config, feature_config),
        'run_name': generate_run_name(model_type, model_config, feature_config),
        'dataset_name': dataset_name,
        'spatial_coverage': spatial_coverage
    }

    # Combine all metadata (experiment-specific only)
    metadata = {
        'feature_config': feature_config,
        'model_config': model_config,
        'prediction_info': prediction_info,
        'experiment_info': experiment_info
    }

    # Print summary
    print_metadata_summary(metadata)

    return metadata

def generate_experiment_id(model_config, feature_config):
    """Generate timestamp-based experiment ID"""
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    return timestamp

def generate_run_name(model_type, model_config, feature_config):
    """Generate human-readable run name"""
    return f"{model_type}_cxt{model_config['context_length']}_pred{model_config['prediction_length']}_f{feature_config['num_total_features']}"

def print_metadata_summary(metadata):
    """Print a concise summary of the extracted metadata"""

    print(f"\n📋 EXPERIMENT METADATA SUMMARY")
    print("-" * 40)

    exp = metadata['experiment_info']
    features = metadata['feature_config']
    model = metadata['model_config']
    pred = metadata['prediction_info']

    print(f"🆔 Experiment ID: {exp['experiment_id']}")
    print(f"🏷️  Run Name: {exp['run_name']}")
    print(f"📅 Timestamp: {exp['timestamp'][:19]}")
    print(f"🗺️  Coverage: {exp['spatial_coverage']}")

    print(f"\n🎯 Features:")
    print(f"   Targets: {features['num_targets']} {features['target_columns']}")
    print(f"   Observables: {features['num_observables']}")
    print(f"   Total features: {features['num_total_features']}")
    print(f"   Scaling: {features['scaler_type'] if features['scaling_enabled'] else 'none'}")

    print(f"\n🤖 Model: {model['model_type']}")
    print(f"   Context: {model['context_length']}h ({model['context_length']/24:.1f} days)")
    print(f"   Prediction: {model['prediction_length']}h")
    print(f"   Batch size: {model['batch_size']}")

    print(f"\n🔮 Predictions: {pred['prediction_shape']}")
    print(f"   ({pred['num_windows']:,} windows, {pred['horizon_length']}h horizon, {pred['num_channels']} channels)")

def compute_multihorizon_metrics(predictions_denormalized, ground_truth, target_columns, horizons=None):
    """
    Compute comprehensive metrics for all forecast horizons

    Parameters:
    -----------
    predictions_denormalized : dict
        Output from denormalize_ttm_predictions_simple()
    ground_truth : dict
        Ground truth arrays for each target
    target_columns : list
        List of target column names
    horizons : list, optional
        Forecast horizons to evaluate (in hours, 1-indexed). If None, uses all 24 hours.

    Returns:
    --------
    dict : Metrics organized by target and horizon
    """

    print(f"\n📊 COMPUTING MULTI-HORIZON METRICS")
    print("-" * 40)

    # Default to all 24 forecast horizons
    if horizons is None:
        horizons = list(range(1, 25))  # 1h, 2h, 3h, ..., 24h ahead

    metrics = {}

    for target in target_columns:
        if target not in predictions_denormalized or target not in ground_truth:
            continue

        pred_array = predictions_denormalized[target]  # (n_windows, 24)
        truth_array = ground_truth[target]

        target_metrics = {}

        print(f"\n  {target}:")

        for h in horizons:
            h_idx = h - 1  # Convert to 0-indexed

            if h_idx >= pred_array.shape[1]:
                continue

            # Extract h-hour ahead predictions
            pred_h = pred_array[:, h_idx]

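            # Align with ground truth (account for forecast horizon offset).
            # Assumes window i's h-step-ahead forecast targets df_test row i + h; note that
            # quick_evaluation() pairs window i with rows i..i+23 (row i + h - 1) instead.
            # Verify which offset matches the test windows built by get_datasets().
            max_len = min(len(truth_array) - h, len(pred_h))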

            if max_len <= 0:
                continue

            aligned_truth = truth_array[h:h + max_len]
            aligned_pred = pred_h[:max_len]

            # Calculate metrics (only MAE and RMSE)
            mae = np.mean(np.abs(aligned_truth - aligned_pred))
            rmse = np.sqrt(np.mean((aligned_truth - aligned_pred)**2))

            target_metrics[f'{h}h'] = {
                'mae': float(mae),
                'rmse': float(rmse)
            }

            # Print every 4th hour to avoid clutter
            if h % 4 == 1 or h == 24:
                print(f"    {h:>2}h ahead: MAE={mae:6.1f} MW, RMSE={rmse:6.1f} MW")

        # Overall metrics: mean of the per-horizon MAE and RMSE values
        # (the mean of per-horizon RMSEs is not the RMSE pooled over all errors)
        if target_metrics:
            overall_mae = np.mean([m['mae'] for m in target_metrics.values()])
            overall_rmse = np.mean([m['rmse'] for m in target_metrics.values()])

            target_metrics['overall'] = {
                'mae': float(overall_mae),
                'rmse': float(overall_rmse)
            }

            print(f"    Overall: MAE={overall_mae:6.1f} MW, RMSE={overall_rmse:6.1f} MW")

        metrics[target] = target_metrics

    return metrics

def prepare_evaluation_data(metadata, predictions_np, df_combined, tsp, split_config,
                          out_dir="results/", save_results=True):
    """
    Prepare data, compute metrics, and save evaluation results

    Parameters:
    -----------
    metadata : dict
        Experiment metadata
    predictions_np : numpy array
        Raw model predictions
    df_combined : pandas.DataFrame
        Full combined dataset
    tsp : TimeSeriesPreprocessor
        Fitted preprocessor
    split_config : dict
        Train/val/test split configuration
    out_dir : str
        Base output directory
    save_results : bool
        Whether to save results to disk

    Returns:
    --------
    dict : Complete evaluation results
    """

    print(f"\n🔧 PREPARING EVALUATION DATA")
    print("-" * 30)

    # Extract test data
    test_start, test_end = split_config['test']
    df_test = df_combined.iloc[test_start:test_end].copy()

    # Prepare ground truth arrays
    ground_truth = {}
    for target in metadata['feature_config']['target_columns']:
        if target in df_test.columns:
            ground_truth[target] = df_test[target].values

    # Denormalize predictions (function should be defined in jupyter cell above)
    denormalized_predictions = denormalize_ttm_predictions_simple(predictions_np, tsp)

    # Compute metrics
    metrics = compute_multihorizon_metrics(
        denormalized_predictions,
        ground_truth,
        metadata['feature_config']['target_columns']
    )

    evaluation_results = {
        'metadata': metadata,
        'metrics': metrics,
        'predictions_raw': predictions_np,
        'predictions_denormalized': denormalized_predictions,
        'ground_truth': ground_truth,
        'test_timestamps': df_test['timestamp'].values
    }

    if save_results:
        save_evaluation_results(evaluation_results, out_dir)

    print(f"\n✅ Evaluation complete:")
    print(f"   Targets evaluated: {list(metrics.keys())}")
    first_target_metrics = next(iter(metrics.values()), {})
    print(f"   Horizons per target: {len([k for k in first_target_metrics if k != 'overall'])}")

    return evaluation_results

from pathlib import Path

def save_evaluation_results(evaluation_results, out_dir="results/"):
    """
    Save evaluation results to structured directory

    Parameters:
    -----------
    evaluation_results : dict
        Complete evaluation results from prepare_evaluation_data()
    out_dir : str
        Base output directory
    """

    print(f"\n💾 SAVING EVALUATION RESULTS")
    print("-" * 30)

    metadata = evaluation_results['metadata']
    metrics = evaluation_results['metrics']

    # Create base output directory
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    # Create experiment-specific directory
    exp_id = metadata['experiment_info']['experiment_id']
    run_name = metadata['experiment_info']['run_name']
    exp_dir = out_path / f"{exp_id}_{run_name}"
    exp_dir.mkdir(exist_ok=True)

    print(f"📁 Experiment directory: {exp_dir}")

    # 1. Save metadata as structured CSV
    metadata_df = flatten_metadata_to_dataframe(metadata)
    metadata_path = exp_dir / "metadata.csv"
    metadata_df.to_csv(metadata_path, index=False)
    print(f"   ✅ Metadata saved: {metadata_path.name}")

    # 2. Save metrics as structured CSV
    metrics_df = flatten_metrics_to_dataframe(metrics)
    metrics_path = exp_dir / "metrics.csv"
    metrics_df.to_csv(metrics_path, index=False)
    print(f"   ✅ Metrics saved: {metrics_path.name}")

    # 3. Save predictions as npz
    predictions_path = exp_dir / "predictions.npz"
    np.savez_compressed(
        predictions_path,
        raw_predictions=evaluation_results['predictions_raw'],
        **evaluation_results['predictions_denormalized'],  # Save each target separately
        timestamps=evaluation_results['test_timestamps']
    )
    print(f"   ✅ Predictions saved: {predictions_path.name}")

    # 4. Save ground truth once (check if exists)
    ground_truth_path = out_path / "ground_truth.npz"
    if not ground_truth_path.exists():
        np.savez_compressed(
            ground_truth_path,
            **evaluation_results['ground_truth'],
            timestamps=evaluation_results['test_timestamps']
        )
        print(f"   ✅ Ground truth saved: {ground_truth_path.name}")
    else:
        print(f"   ⏭️  Ground truth exists: {ground_truth_path.name}")

    print(f"\n📁 All results saved in: {exp_dir}")
    return exp_dir

def flatten_metadata_to_dataframe(metadata):
    """
    Convert nested metadata dict to flat DataFrame for easy CSV reading
    """

    rows = []

    # Flatten each section
    for section_name, section_data in metadata.items():
        if isinstance(section_data, dict):
            for key, value in section_data.items():
                if isinstance(value, (list, tuple)):
                    # Convert lists to string representation
                    value = str(value)

                rows.append({
                    'section': section_name,
                    'parameter': key,
                    'value': value
                })

    return pd.DataFrame(rows)

def flatten_metrics_to_dataframe(metrics):
    """
    Convert nested metrics dict to flat DataFrame for easy analysis
    """

    rows = []

    for target, target_metrics in metrics.items():
        for horizon, horizon_metrics in target_metrics.items():
            if isinstance(horizon_metrics, dict):
                for metric_name, metric_value in horizon_metrics.items():
                    rows.append({
                        'target': target,
                        'horizon': horizon,
                        'metric': metric_name,
                        'value': metric_value
                    })

    df = pd.DataFrame(rows)

    # Pivot for easier reading: columns = metrics, rows = target_horizon combinations
    if not df.empty:
        df_pivot = df.pivot_table(
            index=['target', 'horizon'],
            columns='metric',
            values='value',
            fill_value=None
        ).reset_index()

        # Flatten column names
        df_pivot.columns.name = None

        return df_pivot

    return df

metadata = extract_experiment_metadata(
    result=result,
    ttm_model=zeroshot_model,
    dset_test=dset_test,
    predictions_np=predictions_np,
    batch_size=batch_size,
    model_type="zero-shot",
    model_path=TTM_MODEL_PATH,
    dataset_name="german_wind_power",
    spatial_coverage="germany_averaged"  # Pass as parameter
)

eval_results = prepare_evaluation_data(
    metadata, predictions_np, df_combined, tsp, split_config,
    out_dir="results/", save_results=True
)
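Whether the h-step-ahead forecast of window i should be compared with test row i + h - 1 or with row i + h depends on how the test windows were built. A quick sanity check, on synthetic data and assuming the first forecast step of window 0 lands on the first test row, is to push a perfect forecaster through the same per-horizon MAE/RMSE logic: the correct offset gives zero error at every horizon, the wrong one does not. The helper `per_horizon_errors` is hypothetical and only mirrors the slicing in `compute_multihorizon_metrics`.

```python
import numpy as np

def per_horizon_errors(pred, truth, offset):
    """MAE/RMSE for each forecast step h = 1..H.

    pred:   (n_windows, H) array, pred[i, h-1] = forecast of window i for step h
    truth:  1-D array of observed values
    offset: function h -> index shift applied to truth for horizon h
    """
    results = {}
    for h in range(1, pred.shape[1] + 1):
        shift = offset(h)
        n = min(len(truth) - shift, pred.shape[0])
        err = truth[shift:shift + n] - pred[:n, h - 1]
        results[h] = (np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2)))
    return results

H = 24
truth = np.sin(np.arange(200) / 5.0) * 1000.0  # synthetic hourly signal in MW
n_windows = len(truth) - H + 1
# A "perfect" forecaster: window i predicts truth[i : i + H] exactly
pred = np.stack([truth[i:i + H] for i in range(n_windows)])

correct = per_horizon_errors(pred, truth, offset=lambda h: h - 1)
shifted = per_horizon_errors(pred, truth, offset=lambda h: h)

print(max(mae for mae, _ in correct.values()))  # 0.0
print(min(mae for mae, _ in shifted.values()) > 0)  # True
```

If the real pipeline shows a clear error floor at short horizons that vanishes when the offset is changed by one step, the alignment is the likely culprit rather than the model.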

### Finetuning

In [None]:
# Fine-tune the model with improved configuration
print("🚀 FINE-TUNING TTM MODEL")
print("="*40)

# Training configuration
learning_rate = 0.001  # Higher learning rate as per TTM recommendations
num_epochs = 5
batch_size = 64

print(f"Using learning rate = {learning_rate}")

# Create temporary directory for training artifacts
temp_dir = tempfile.mkdtemp(prefix="ttm_finetune_")
print(f"📁 Training in: {temp_dir}")

os.environ["WANDB_DISABLED"] = "true"  # disable Weights & Biases logging

model_type = "finetuned"
model_path = TTM_MODEL_PATH

model_config = {
    'model_type': model_type,
    'model_path': model_path,
    'context_length': result['context_length'],
    'prediction_length': result['prediction_length'],
    # 'batch_size': batch_size,
    # 'freq_prefix_tuning': getattr(ttm_model.config, 'freq_prefix_tuning', False),
    # 'resolution_prefix_tuning': getattr(ttm_model.config, 'resolution_prefix_tuning', False)
}
feature_config = {
    'target_columns': tsp.target_columns.copy(),
    # 'observable_columns': getattr(tsp, 'observable_columns', []).copy(),
    # 'conditional_columns': getattr(tsp, 'conditional_columns', []).copy(),
    'num_targets': len(tsp.target_columns),
    'num_observables': len(getattr(tsp, 'observable_columns', [])),
    'num_total_features': len(tsp.target_columns) + len(getattr(tsp, 'observable_columns', [])) + len(getattr(tsp, 'conditional_columns', [])),
    # 'scaling_enabled': getattr(tsp, 'scaling', True),
    # 'scaler_type': getattr(tsp, 'scaler_type', 'standard')
}


# Training arguments (adapted from TTM example)
training_args = TrainingArguments(
    run_name=generate_run_name(model_type, model_config, feature_config),
    output_dir=temp_dir,
    overwrite_output_dir=True,
    learning_rate=learning_rate,
    num_train_epochs=num_epochs,
    do_eval=True,
    eval_strategy="epoch",  # current name of the deprecated `evaluation_strategy` argument
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    dataloader_num_workers=4,  # Reduced for stability
    report_to="none",  # with None, transformers 4.x reports to every installed integration
    save_strategy="epoch",
    logging_strategy="epoch",
    save_total_limit=1,
    logging_dir=os.path.join(temp_dir, "logs"),
    load_best_model_at_end=True,  # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    seed=SEED,
)

# Create callbacks
early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=5,  # with num_epochs = 5 this can never trigger; lower it or train longer
    early_stopping_threshold=0.0,
)

# Use tsfm's TrackingCallback (from tsfm_public.toolkit.callbacks) if it has been imported
try:
    tracking_callback = TrackingCallback()
    callbacks = [early_stopping_callback, tracking_callback]
except NameError:
    callbacks = [early_stopping_callback]
    print("TrackingCallback not available, using only early stopping")

# Load a fresh copy of the pre-trained model to fine-tune
finetuned_model = get_model(
    TTM_MODEL_PATH,
    context_length=context_length,
    prediction_length=forecast_length,
    freq_prefix_tuning=False,
    freq=None,
    prefer_l1_loss=False,
    prefer_longer_context=True,
)

# Custom optimizer and scheduler (as per TTM recommendations).
# The optimizer must wrap the parameters of the model that the Trainer trains.
optimizer = AdamW(finetuned_model.parameters(), lr=learning_rate)
scheduler = OneCycleLR(
    optimizer,
    learning_rate,
    epochs=num_epochs,
    steps_per_epoch=math.ceil(len(dset_train) / batch_size),
)

# Create trainer
trainer = Trainer(
    model=finetuned_model,  # This will be fine-tuned in place
    args=training_args,
    train_dataset=dset_train,
    eval_dataset=dset_valid,
    callbacks=callbacks,
    optimizers=(optimizer, scheduler),
)

# Train the model
print(f"🏃 Training for max {num_epochs} epochs...")
print(f"   Train samples: {len(dset_train):,}")
print(f"   Val samples: {len(dset_valid):,}")
print(f"   Steps per epoch: {math.ceil(len(dset_train) / batch_size)}")

train_result = trainer.train()

print(f"✅ Training completed!")
print(f"   Average training loss: {train_result.training_loss:.4f}")
print(f"   Steps: {train_result.global_step}")
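Because `optimizers=(optimizer, scheduler)` overrides the Trainer's defaults, two things have to line up: the optimizer must wrap the parameters of the model passed to `Trainer`, and `OneCycleLR` needs `epochs * steps_per_epoch` to cover every optimizer step. The minimal sketch below checks both with a stand-in `torch.nn.Linear` model and random batches (assumptions, not TTM or the wind data), assuming a single device, no gradient accumulation, and one optimizer step per batch. The learning rate rises to `max_lr` and then anneals far below its starting value.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

# Stand-in for the model handed to Trainer (an assumption: a tiny linear layer, not TTM)
model = torch.nn.Linear(8, 1)

n_train, batch_size, num_epochs, max_lr = 1000, 64, 5, 1e-3
steps_per_epoch = math.ceil(n_train / batch_size)  # one optimizer step per batch

# The optimizer must wrap the parameters of the model that is actually trained
optimizer = AdamW(model.parameters(), lr=max_lr)
scheduler = OneCycleLR(optimizer, max_lr, epochs=num_epochs, steps_per_epoch=steps_per_epoch)

lrs = []
for _ in range(num_epochs * steps_per_epoch):
    # Dummy loss and backward pass so that AdamW has gradients to apply
    loss = model(torch.randn(batch_size, 8)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

print(steps_per_epoch, len(lrs))  # 16 80
```

Stepping the scheduler more than `epochs * steps_per_epoch` times makes `OneCycleLR` raise an error, which is why `steps_per_epoch` is derived from the training set size and batch size.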

In [None]:
# Get predictions
print(f"🔮 Getting test predictions...")
predictions_dict = trainer.predict(dset_test)
predictions_np = predictions_dict.predictions[0]
print(f"   Shape: {predictions_np.shape}")

print(f"\n📊 Model fine-tuned! Ready for evaluation.")
print(f"Use: predictions_np for the fine-tuned predictions")

In [None]:
metadata = extract_experiment_metadata(
    result=result,
    ttm_model=finetuned_model,
    dset_test=dset_test,
    predictions_np=predictions_np,
    batch_size=batch_size,
    model_type=model_type,
    model_path=TTM_MODEL_PATH,
    dataset_name="german_wind_power",
    spatial_coverage="germany_averaged"
)

eval_results = prepare_evaluation_data(
    metadata, predictions_np, df_combined, tsp, split_config,
    out_dir="results/", save_results=True
)
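`prepare_evaluation_data` relies on `denormalize_ttm_predictions_simple`, which is defined in an earlier cell that is not shown here. Assuming the preprocessor applies per-target standard scaling fitted on the training split (`scaler_type="standard"`), undoing it is a broadcast multiply-and-add over windows and forecast steps. The sketch below uses synthetic numbers and plain NumPy statistics in place of the preprocessor's fitted scaler; it illustrates the arithmetic, not the helper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training-split target values in MW, shape (n_rows, n_targets)
train_targets = rng.normal(loc=5000.0, scale=1500.0, size=(1000, 1))

# Per-target standard-scaling statistics, computed on the training split only
mean = train_targets.mean(axis=0)
std = train_targets.std(axis=0)  # population std (ddof=0), as sklearn's StandardScaler uses

# Synthetic scaled model output, shape (n_windows, forecast_length, n_targets)
preds_scaled = rng.normal(size=(10, 24, 1))

# Undo the scaling; mean and std broadcast over the window and step axes
preds_mw = preds_scaled * std + mean

print(preds_mw.shape)  # (10, 24, 1)
```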

In [None]:
# Suppose your application only needs to forecast 24 hours ahead.
# You can still use the 512-96 TTM model by setting the `prediction_filter_length=24` argument when loading the model.
# Try it on ETTh1 and note the evaluation error (on all channels).

In [None]:
# In your notebook, add `prediction_channel_indices=[0, 2]` when loading the model
# to forecast only the 0th and 2nd channels.
# Then execute the following code and note the output shape.
# from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction
# zeroshot_model = TinyTimeMixerForPrediction.from_pretrained(TTM_MODEL_PATH, prediction_channel_indices=[0, 2])
# output = zeroshot_model.forward(past_values=dset_test[0]['past_values'].unsqueeze(0), return_loss=False)
# output.prediction_outputs.shape
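Assuming `prediction_filter_length` keeps the first N forecast steps and `prediction_channel_indices` keeps only the listed channels, the expected output shapes of these two exercises can be previewed by slicing a full `(batch, forecast_length, channels)` array. In this sketch, random numbers stand in for a 512-96 model's output on ETTh1's 7 channels; the real model also restricts its loss to the kept steps and channels, which slicing does not show.

```python
import numpy as np

# Stand-in for a 512-96 model's output on ETTh1: (batch, forecast_length, channels)
batch, forecast_length, n_channels = 4, 96, 7
full_forecast = np.random.default_rng(0).normal(size=(batch, forecast_length, n_channels))

filtered = full_forecast[:, :24, :]     # analogous to prediction_filter_length=24
selected = full_forecast[:, :, [0, 2]]  # analogous to prediction_channel_indices=[0, 2]

print(filtered.shape)  # (4, 24, 7)
print(selected.shape)  # (4, 96, 2)
```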

### Streamlit App

In [None]:
!pip install -q streamlit

In [None]:
!streamlit run app.py &>/content/drive/MyDrive/ttm/logs.txt &

In [None]:
# !npm install localtunnel
!npx localtunnel --port 8501

In [None]:
# !pip install pyngrok
# !ngrok authtoken <YOUR_AUTHTOKEN>
# !ngrok http 8501


### tsfm Code

In [None]:
# get backbone embeddings (if needed for further analysis)
# backbone_embedding = predictions_dict.predictions[1]
# print(backbone_embedding.shape)

In [None]:
# TODO: Show complete context - should be 512 + 24?
# plot
plot_predictions(
    model=zeroshot_trainer.model,
    dset=dset_test,
    plot_dir=os.path.join(OUT_DIR, dataset_name),
    plot_prefix="test_zeroshot",
    # indices=[685, 118, 902, 1984, 894, 967, 304, 57, 265, 1015],
    channel=0,
)

In [None]:
example_data

In [None]:
TARGET_DATASET = "etth1"
example_dataset_path = "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh1.csv"

timestamp_column = "date"
id_columns = []  # mention the ids that uniquely identify a time-series.

target_columns = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL", "OT"]
split_config = {
    "train": [0, 8640],
    "valid": [8640, 11520],
    "test": [
        11520,
        14400,
    ],
}
# Standard ETTh1 split: 12 / 4 / 4 months of hourly data with 30-day months (see slides)

example_data = pd.read_csv(
    example_dataset_path,
    parse_dates=[timestamp_column],
)

column_specifiers = {
    "timestamp_column": timestamp_column,
    "id_columns": id_columns,
    "target_columns": target_columns,
    "control_columns": [],
}

# def zeroshot_eval(dataset_name, batch_size, context_length=512, forecast_length=96):
    # Get data
# zeroshot_eval(
#     dataset_name=TARGET_DATASET, context_length=CONTEXT_LENGTH, forecast_length=PREDICTION_LENGTH, batch_size=64
# )
dataset_name = TARGET_DATASET
context_length = CONTEXT_LENGTH
forecast_length = PREDICTION_LENGTH
batch_size = 64

tsp = TimeSeriesPreprocessor(
    **column_specifiers,
    context_length=context_length,
    prediction_length=forecast_length,
    scaling=True,
    encode_categorical=False,
    scaler_type="standard",
)

# Load model
zeroshot_model = get_model(
    TTM_MODEL_PATH,
    context_length=context_length,
    prediction_length=forecast_length,
    freq_prefix_tuning=False,
    freq=None,
    prefer_l1_loss=False,
    prefer_longer_context=True,
)

dset_train, dset_valid, dset_test = get_datasets(
    tsp, example_data, split_config, use_frequency_token=zeroshot_model.config.resolution_prefix_tuning
)

temp_dir = tempfile.mkdtemp()
# zeroshot_trainer
zeroshot_trainer = Trainer(
    model=zeroshot_model,
    args=TrainingArguments(
        output_dir=temp_dir,
        per_device_eval_batch_size=batch_size,
        seed=SEED,
        report_to="none",
    ),
)
# Evaluate zero-shot performance on the test set
print("+" * 20, "Test MSE zero-shot", "+" * 20)
zeroshot_output = zeroshot_trainer.evaluate(dset_test)
print(zeroshot_output)

# get predictions

predictions_dict = zeroshot_trainer.predict(dset_test)

predictions_np = predictions_dict.predictions[0]

print(predictions_np.shape)

# get backbone embeddings (if needed for further analysis)

backbone_embedding = predictions_dict.predictions[1]

print(backbone_embedding.shape)

# plot
plot_predictions(
    model=zeroshot_trainer.model,
    dset=dset_test,
    plot_dir=os.path.join(OUT_DIR, dataset_name),
    plot_prefix="test_zeroshot",
    indices=[685, 118, 902, 1984, 894, 967, 304, 57, 265, 1015],
    channel=0,
)
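The ETTh1 `split_config` above follows the conventional 12 / 4 / 4-month split of hourly data with 30-day months. The sketch below derives those boundaries and counts test windows. Both helpers are hypothetical, and the window count assumes stride 1 with each window's context taken from rows before the split start, so that every forecast lies inside the split. That is an assumption about how the test dataset is built.

```python
HOURS_PER_MONTH = 30 * 24  # ETT convention: 30-day months of hourly data

def month_split(train_months, valid_months, test_months, hours_per_month=HOURS_PER_MONTH):
    """Return index-based split_config boundaries for consecutive month blocks."""
    b1 = train_months * hours_per_month
    b2 = b1 + valid_months * hours_per_month
    b3 = b2 + test_months * hours_per_month
    return {"train": [0, b1], "valid": [b1, b2], "test": [b2, b3]}

def n_forecast_windows(split_rows, forecast_length, stride=1):
    """Number of windows whose full forecast fits inside the split
    (assumes context rows are borrowed from before the split start)."""
    return (split_rows - forecast_length) // stride + 1

cfg = month_split(12, 4, 4)
print(cfg)  # {'train': [0, 8640], 'valid': [8640, 11520], 'test': [11520, 14400]}
test_rows = cfg["test"][1] - cfg["test"][0]
print(n_forecast_windows(test_rows, forecast_length=96))  # 2785
```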

In [None]:
# Input:      (8737, 512, 16)     # Raw time series
# Patchify:   (8737, 16, 8, 64)   # 64-timestep patches
# Embed:      (8737, 16, 8, 192)  # Project to 192 dims
# Block 0:    (8737, 16, 36, 48)  # Expand patches, reduce dims
# Block 1:    (8737, 16, 18, 96)  # Reduce patches, expand dims
# Block 2:    (8737, 16, 9, 192)  # 🎯 FINAL: 9 patches × 192 dims
# Decoder:    (8737, 16, 9, 128)  # Reduce to 128 dims
# Head:       (8737, 48, 16)      # Flatten: 9×128=1152 → predict 48 hours
# Filter:     (8737, 24, 16)      # Keep first 24 hours
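The shape trace above can be checked with plain arithmetic. It goes from 8 patches after patchify/embedding to 9 in the final block, which patchify alone does not explain. The sketch therefore assumes that one extra prefix patch (for example a resolution/frequency token) is added before the mixer blocks, and that mixer level i multiplies the patch count by 2^(L-1-i) while dividing the width by the same factor. The function `ttm_shape_trace` and its parameters are hypothetical; under those assumptions it reproduces every shape listed.

```python
def ttm_shape_trace(batch, context_length, n_channels, patch_length, d_model,
                    n_levels, extra_patches, decoder_d_model, forecast_length,
                    filter_length):
    """Recompute the tensor shapes listed in the trace (illustrative arithmetic only)."""
    n_patches = context_length // patch_length  # non-overlapping patches
    trace = {
        "input": (batch, context_length, n_channels),
        "patchify": (batch, n_channels, n_patches, patch_length),
        "embed": (batch, n_channels, n_patches, d_model),
    }
    n_patches += extra_patches  # assumed prefix patch(es) added before the mixer blocks
    for level in range(n_levels):
        factor = 2 ** (n_levels - 1 - level)  # assumed adaptive-patching factor per level
        trace[f"block{level}"] = (batch, n_channels, n_patches * factor, d_model // factor)
    trace["decoder"] = (batch, n_channels, n_patches, decoder_d_model)
    trace["head_in_features"] = n_patches * decoder_d_model  # flattened per channel
    trace["head"] = (batch, forecast_length, n_channels)
    trace["filter"] = (batch, filter_length, n_channels)
    return trace


trace = ttm_shape_trace(batch=8737, context_length=512, n_channels=16, patch_length=64,
                        d_model=192, n_levels=3, extra_patches=1, decoder_d_model=128,
                        forecast_length=48, filter_length=24)
for step, shape in trace.items():
    print(f"{step:>16}: {shape}")
```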