# Transformer Oil Temperature Prediction using RNN-ResNet Hybrid Model

This notebook implements a hybrid deep learning architecture combining **Recurrent Neural Networks (RNN)** and **Residual Networks (ResNet)** for predicting transformer oil temperature.

## Model Architecture

**RNN → ResNet Pipeline**:
1. **RNN Layer** (LSTM/GRU): Captures temporal dependencies in the time series
2. **ResNet Blocks**: Fully connected residual blocks with skip connections for deeper feature learning
3. **Output Layer**: Temperature prediction
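
To make the pipeline concrete, the sketch below traces the tensor shape after each stage. It is illustrative only: `trace_shapes` is a hypothetical helper, and the sizes assume the default hyperparameters of the model classes defined later in this notebook (bidirectional RNN with 64 hidden units, residual width 128).

```python
def trace_shapes(batch=32, seq_len=10, n_features=6,
                 rnn_hidden=64, bidirectional=True, resnet_dim=128):
    """Return the tensor shape after each stage of the RNN -> ResNet pipeline."""
    directions = 2 if bidirectional else 1
    return [
        ("input", (batch, seq_len, n_features)),
        ("rnn output", (batch, seq_len, rnn_hidden * directions)),
        ("last time step", (batch, rnn_hidden * directions)),
        ("projection", (batch, resnet_dim)),
        ("residual blocks", (batch, resnet_dim)),  # skip connections keep the width
        ("output", (batch, 1)),
    ]


shapes = trace_shapes()
for stage, shape in shapes:
    print(f"{stage:>16}: {shape}")
```

The residual blocks keep the feature width unchanged, which is what allows the block input to be added back to its output.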

**Two Variants**:
- **LSTM-ResNet**: Gated cell state, generally better suited to long-term dependencies
- **GRU-ResNet**: Fewer gates than the LSTM, so fewer parameters and usually faster training

## Prediction Horizons
- **1 hour ahead**: Short-term prediction
- **1 day ahead**: Medium-term prediction
- **1 week ahead**: Long-term prediction
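
Each horizon corresponds to a number of time steps. A minimal sketch of the conversion, assuming the 15-minute sampling interval that the preprocessing code in this notebook uses (`horizon_steps` is a hypothetical helper, not part of the notebook):

```python
def horizon_steps(hours_ahead, minutes_per_sample=15):
    """Convert a forecast horizon in hours to a number of time steps."""
    total_minutes = hours_ahead * 60
    if total_minutes % minutes_per_sample != 0:
        raise ValueError("horizon is not a whole number of samples")
    return total_minutes // minutes_per_sample


HORIZONS = {
    "1h": horizon_steps(1),
    "1d": horizon_steps(24),
    "1w": horizon_steps(24 * 7),
}
print(HORIZONS)  # {'1h': 4, '1d': 96, '1w': 672}
```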

## Comparison with Baselines
- Random Forest (R² = 0.60 for 1h)
- Pure ResNet
- **New**: LSTM-ResNet
- **New**: GRU-ResNet

## 1. Environment Setup and Imports

In [None]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
from datetime import datetime
warnings.filterwarnings('ignore')

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, Dataset

# Sklearn for metrics and preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
import joblib

# Set random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    
# Configure plotting (the 'seaborn-v0_8-*' style names require Matplotlib >= 3.6)
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 6)

print(f"PyTorch Version: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA Device: {torch.cuda.get_device_name(0)}")
    print(f"CUDA Version: {torch.version.cuda}")
print(f"NumPy Version: {np.__version__}")
print(f"Pandas Version: {pd.__version__}")

## 2. Device Configuration

In [None]:
# Automatically select GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

if device.type == 'cuda':
    print(f"\nGPU Information:")
    print(f"  Name: {torch.cuda.get_device_name(0)}")
    print(f"  Memory Allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GB")
    print(f"  Memory Reserved: {torch.cuda.memory_reserved(0) / 1024**3:.2f} GB")
    print(f"  Max Memory Allocated: {torch.cuda.max_memory_allocated(0) / 1024**3:.2f} GB")
else:
    print("\nRunning on CPU (GPU not available or not configured)")
    print("Training will be slower. Consider using CUDA if available.")

## 3. Data Loading and Exploration

Load the two transformer CSV files (`trans_1.csv` and `trans_2.csv`) from the `dataset/` folder, one level above the notebook.

In [None]:
# Define data paths
data_dir = Path('../dataset')
trans1_path = data_dir / 'trans_1.csv'
trans2_path = data_dir / 'trans_2.csv'

# Check if files exist
if not trans1_path.exists():
    raise FileNotFoundError(f"Data file not found: {trans1_path}")
if not trans2_path.exists():
    raise FileNotFoundError(f"Data file not found: {trans2_path}")

print("Loading transformer data...")
df1 = pd.read_csv(trans1_path)
df2 = pd.read_csv(trans2_path)

print(f"\nTransformer 1 data shape: {df1.shape}")
print(f"Transformer 2 data shape: {df2.shape}")

# Display first few rows
print("\nTransformer 1 - First 5 rows:")
display(df1.head())

print("\nData columns:")
print(df1.columns.tolist())

# Basic statistics
print("\nBasic Statistics:")
display(df1.describe())

In [None]:
# Check for missing values
print("Missing values in Transformer 1:")
print(df1.isnull().sum())

print("\nMissing values in Transformer 2:")
print(df2.isnull().sum())

# Data types
print("\nData types:")
print(df1.dtypes)

## 4. Time Series Data Preprocessing

Create sliding window sequences for RNN input.
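
The index arithmetic behind the windows is easy to get off by one, so here is a small sketch on a toy series. `window_indices` is a hypothetical helper that follows the same rule as the function below: the target lies `forecast_horizon` steps after the last input row.

```python
def window_indices(n_rows, seq_length, forecast_horizon):
    """Yield (input_start, input_end_exclusive, target_index) for each sample."""
    n_samples = n_rows - seq_length - forecast_horizon + 1
    for i in range(max(n_samples, 0)):
        yield i, i + seq_length, i + seq_length + forecast_horizon - 1


series = list(range(10))  # toy series: each value equals its row index
samples = list(window_indices(len(series), seq_length=3, forecast_horizon=2))

for start, end, target in samples[:2]:
    print(series[start:end], "->", series[target])
# [0, 1, 2] -> 4
# [1, 2, 3] -> 5
print(len(samples))  # 6
```

With 10 rows, a window of 3 and a horizon of 2, the last usable window starts at row 5 and predicts row 9, giving 6 samples in total.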

In [None]:
def create_sequences(data, seq_length, target_col='OT', feature_cols=None, forecast_horizon=1):
    """
    Create sequences for time series prediction with RNN
    
    Args:
        data: DataFrame with time series data
        seq_length: Length of input sequence (number of time steps to look back)
        target_col: Name of target column (oil temperature)
        feature_cols: List of feature column names. If None, use all except target
        forecast_horizon: Steps ahead to predict (4 for 1h, 96 for 1d, 672 for 1w at 15-min intervals)
    
    Returns:
        X: Input sequences (samples, seq_length, features)
        y: Target values (samples,)
    """
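    if feature_cols is None:
        # Default to every numeric column except the target, which skips
        # non-numeric columns such as a timestamp if one is present
        numeric_cols = data.select_dtypes(include='number').columns
        feature_cols = [col for col in numeric_cols if col != target_col]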
    
    # Extract features and target
    features = data[feature_cols].values
    target = data[target_col].values
    
    X, y = [], []
    
    # Create sequences
    for i in range(len(data) - seq_length - forecast_horizon + 1):
        # Input sequence: seq_length time steps
        X.append(features[i:i+seq_length])
        # Target: temperature at forecast_horizon steps ahead
        y.append(target[i+seq_length+forecast_horizon-1])
    
    return np.array(X), np.array(y)


def prepare_data_for_config(df, config='1h', seq_length=10, test_size=0.2, val_size=0.2):
    """
    Prepare data for a specific forecasting configuration
    
    Args:
        df: Input DataFrame
        config: '1h', '1d', or '1w'
        seq_length: Length of input sequences
        test_size: Fraction for test set
        val_size: Fraction of training data for validation
    
    Returns:
        X_train, X_val, X_test, y_train, y_val, y_test, scaler
    """
    # Define forecast horizons (based on 15-min intervals)
    # 1h = 4 steps, 1d = 96 steps, 1w = 672 steps
    horizons = {
        '1h': 4,
        '1d': 96,
        '1w': 672
    }
    
    horizon = horizons[config]
    
    # Feature columns (exclude target)
    feature_cols = ['HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL']
    
    # Create sequences
    print(f"\nCreating sequences for {config} prediction...")
    print(f"  Sequence length: {seq_length}")
    print(f"  Forecast horizon: {horizon} steps ({config})")
    
    X, y = create_sequences(
        df, 
        seq_length=seq_length,
        target_col='OT',
        feature_cols=feature_cols,
        forecast_horizon=horizon
    )
    
    print(f"  Total sequences created: {len(X)}")
    print(f"  X shape: {X.shape}  # (samples, seq_length, features)")
    print(f"  y shape: {y.shape}  # (samples,)")
    
    # Chronological train/test split (no shuffling): the test set is the most recent data
    split_idx = int(len(X) * (1 - test_size))
    X_train_full, X_test = X[:split_idx], X[split_idx:]
    y_train_full, y_test = y[:split_idx], y[split_idx:]
    
    # Further split train into train and validation
    val_idx = int(len(X_train_full) * (1 - val_size))
    X_train, X_val = X_train_full[:val_idx], X_train_full[val_idx:]
    y_train, y_val = y_train_full[:val_idx], y_train_full[val_idx:]
    
    # Normalize features (fit on training data only)
    # Reshape for scaling: (samples * seq_length, features)
    n_samples_train, seq_len, n_features = X_train.shape
    
    scaler = StandardScaler()
    X_train_2d = X_train.reshape(-1, n_features)
    X_train_scaled = scaler.fit_transform(X_train_2d).reshape(n_samples_train, seq_len, n_features)
    
    # Transform validation and test
    n_samples_val = X_val.shape[0]
    n_samples_test = X_test.shape[0]
    
    X_val_2d = X_val.reshape(-1, n_features)
    X_val_scaled = scaler.transform(X_val_2d).reshape(n_samples_val, seq_len, n_features)
    
    X_test_2d = X_test.reshape(-1, n_features)
    X_test_scaled = scaler.transform(X_test_2d).reshape(n_samples_test, seq_len, n_features)
    
    print(f"\nData splits:")
    print(f"  Train: {X_train_scaled.shape[0]} samples")
    print(f"  Validation: {X_val_scaled.shape[0]} samples")
    print(f"  Test: {X_test_scaled.shape[0]} samples")
    
    return X_train_scaled, X_val_scaled, X_test_scaled, y_train, y_val, y_test, scaler


# Test the preprocessing function
print("Testing data preprocessing...")
SEQ_LENGTH = 10  # Look back 10 time steps (2.5 hours with 15-min intervals)

# Prepare data for all three configurations
data_1h = prepare_data_for_config(df1, config='1h', seq_length=SEQ_LENGTH)
data_1d = prepare_data_for_config(df1, config='1d', seq_length=SEQ_LENGTH)
data_1w = prepare_data_for_config(df1, config='1w', seq_length=SEQ_LENGTH)

print("\n✓ Data preprocessing completed successfully!")
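
The split used above is chronological, so the default fractions work out to roughly 64% train, 16% validation and 20% test. A minimal sketch of the index arithmetic (`chronological_split` is a hypothetical helper that mirrors the two-stage split in `prepare_data_for_config`):

```python
def chronological_split(n_samples, test_size=0.2, val_size=0.2):
    """Return (train, val, test) index ranges in time order, without shuffling."""
    split_idx = int(n_samples * (1 - test_size))  # end of train + validation
    val_idx = int(split_idx * (1 - val_size))     # end of train
    return (range(0, val_idx),
            range(val_idx, split_idx),
            range(split_idx, n_samples))


train, val, test = chronological_split(1000)
print(len(train), len(val), len(test))  # 640 160 200
```

Because the windows are built before splitting, the last training windows and the first validation windows share some input rows. If that overlap matters, one option is to drop about `seq_length + forecast_horizon - 1` windows at each boundary.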

## 5. LSTM-ResNet Model Architecture

In [None]:
class ResidualBlock(nn.Module):
    """
    Residual block with skip connection
    """
    def __init__(self, input_dim, hidden_dim, dropout=0.3):
        super(ResidualBlock, self).__init__()
        
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.bn1 = nn.BatchNorm1d(hidden_dim)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        
        self.fc2 = nn.Linear(hidden_dim, input_dim)
        self.bn2 = nn.BatchNorm1d(input_dim)
        
    def forward(self, x):
        identity = x
        
        out = self.fc1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.dropout(out)
        
        out = self.fc2(out)
        out = self.bn2(out)
        
        out += identity  # Skip connection
        out = self.relu(out)
        
        return out


class LSTM_ResNet(nn.Module):
    """
    LSTM-ResNet hybrid model for time series prediction
    
    Architecture:
    Input (batch, seq_len, features)
        ↓
    Bidirectional LSTM
        ↓
    Last time step output (both directions concatenated)
        ↓
    Dense Layer (projection)
        ↓
    ResNet Blocks (with skip connections)
        ↓
    Output Layer
    """
    def __init__(self, 
                 input_dim=6, 
                 seq_length=10,
                 rnn_hidden_dim=64, 
                 rnn_num_layers=2,
                 bidirectional=True,
                 resnet_hidden_dim=128,
                 resnet_num_blocks=2,
                 dropout=0.3):
        super(LSTM_ResNet, self).__init__()
        
        self.input_dim = input_dim
        self.seq_length = seq_length
        self.rnn_hidden_dim = rnn_hidden_dim
        self.rnn_num_layers = rnn_num_layers
        self.bidirectional = bidirectional
        self.resnet_hidden_dim = resnet_hidden_dim
        
        # LSTM layer
        self.lstm = nn.LSTM(
            input_size=input_dim,
            hidden_size=rnn_hidden_dim,
            num_layers=rnn_num_layers,
            batch_first=True,
            dropout=dropout if rnn_num_layers > 1 else 0,
            bidirectional=bidirectional
        )
        
        # Calculate LSTM output dimension
        lstm_output_dim = rnn_hidden_dim * 2 if bidirectional else rnn_hidden_dim
        
        # Projection layer (from LSTM output to ResNet input)
        self.projection = nn.Sequential(
            nn.Linear(lstm_output_dim, resnet_hidden_dim),
            nn.BatchNorm1d(resnet_hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout)
        )
        
        # ResNet blocks
        self.resnet_blocks = nn.ModuleList([
            ResidualBlock(resnet_hidden_dim, resnet_hidden_dim * 2, dropout)
            for _ in range(resnet_num_blocks)
        ])
        
        # Output layer
        self.output_layer = nn.Linear(resnet_hidden_dim, 1)
        
    def forward(self, x):
        # x shape: (batch, seq_length, features)
        
        # LSTM processing
        lstm_out, (h_n, c_n) = self.lstm(x)
        # lstm_out shape: (batch, seq_length, hidden_dim * num_directions)
        
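        # Use the last time step output. With bidirectional=True, the backward
        # half of this vector has only processed the final time step.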
        last_output = lstm_out[:, -1, :]
        # last_output shape: (batch, hidden_dim * num_directions)
        
        # Project to ResNet dimension
        x = self.projection(last_output)
        # x shape: (batch, resnet_hidden_dim)
        
        # Pass through ResNet blocks
        for resnet_block in self.resnet_blocks:
            x = resnet_block(x)
        
        # Output
        x = self.output_layer(x)
        
        return x
    
    def count_parameters(self):
        """Count trainable parameters"""
        return sum(p.numel() for p in self.parameters() if p.requires_grad)


# Test LSTM-ResNet model
print("Testing LSTM-ResNet model...")
test_model = LSTM_ResNet(
    input_dim=6,
    seq_length=10,
    rnn_hidden_dim=64,
    rnn_num_layers=2,
    bidirectional=True,
    resnet_hidden_dim=128,
    resnet_num_blocks=2,
    dropout=0.3
)

print(f"\nModel architecture:")
print(test_model)
print(f"\nTotal trainable parameters: {test_model.count_parameters():,}")

# Test forward pass
test_input = torch.randn(32, 10, 6)  # (batch, seq_length, features)
test_output = test_model(test_input)
print(f"\nTest input shape: {test_input.shape}")
print(f"Test output shape: {test_output.shape}")
print("✓ LSTM-ResNet model test passed!")
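
One subtlety of taking `lstm_out[:, -1, :]` from a bidirectional LSTM: the forward half of that vector summarizes the whole sequence, while the backward half has only processed the final time step. The sketch below checks this against `h_n` on a small standalone `nn.LSTM`. The sizes are arbitrary, and the concatenated `h_n` summary at the end is a common alternative, not what the model above does.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden = 8
lstm = nn.LSTM(input_size=6, hidden_size=hidden, num_layers=2,
               batch_first=True, bidirectional=True)

x = torch.randn(4, 10, 6)        # (batch, seq_length, features)
with torch.no_grad():
    out, (h_n, _) = lstm(x)      # out: (4, 10, 2 * hidden)

# h_n is ordered by (layer, direction); the last two entries are the top layer.
forward_final, backward_final = h_n[-2], h_n[-1]

# Forward direction: its final state is the output at the last time step.
fwd_matches_last = torch.allclose(out[:, -1, :hidden], forward_final)
# Backward direction: its final state is the output at the FIRST time step.
bwd_matches_first = torch.allclose(out[:, 0, hidden:], backward_final)

# Alternative summary that uses the full-sequence state of both directions.
summary = torch.cat([forward_final, backward_final], dim=1)  # (4, 2 * hidden)
print(fwd_matches_last, bwd_matches_first, summary.shape)
```

Whether the alternative helps on this dataset would need to be checked empirically.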

## 6. GRU-ResNet Model Architecture

In [None]:
class GRU_ResNet(nn.Module):
    """
    GRU-ResNet hybrid model for time series prediction
    
    Similar to LSTM-ResNet but uses GRU (faster, fewer parameters)
    """
    def __init__(self, 
                 input_dim=6, 
                 seq_length=10,
                 rnn_hidden_dim=64, 
                 rnn_num_layers=2,
                 bidirectional=True,
                 resnet_hidden_dim=128,
                 resnet_num_blocks=2,
                 dropout=0.3):
        super(GRU_ResNet, self).__init__()
        
        self.input_dim = input_dim
        self.seq_length = seq_length
        self.rnn_hidden_dim = rnn_hidden_dim
        self.rnn_num_layers = rnn_num_layers
        self.bidirectional = bidirectional
        self.resnet_hidden_dim = resnet_hidden_dim
        
        # GRU layer
        self.gru = nn.GRU(
            input_size=input_dim,
            hidden_size=rnn_hidden_dim,
            num_layers=rnn_num_layers,
            batch_first=True,
            dropout=dropout if rnn_num_layers > 1 else 0,
            bidirectional=bidirectional
        )
        
        # Calculate GRU output dimension
        gru_output_dim = rnn_hidden_dim * 2 if bidirectional else rnn_hidden_dim
        
        # Projection layer
        self.projection = nn.Sequential(
            nn.Linear(gru_output_dim, resnet_hidden_dim),
            nn.BatchNorm1d(resnet_hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout)
        )
        
        # ResNet blocks
        self.resnet_blocks = nn.ModuleList([
            ResidualBlock(resnet_hidden_dim, resnet_hidden_dim * 2, dropout)
            for _ in range(resnet_num_blocks)
        ])
        
        # Output layer
        self.output_layer = nn.Linear(resnet_hidden_dim, 1)
        
    def forward(self, x):
        # x shape: (batch, seq_length, features)
        
        # GRU processing
        gru_out, h_n = self.gru(x)
        # gru_out shape: (batch, seq_length, hidden_dim * num_directions)
        
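        # Use the last time step output. With bidirectional=True, the backward
        # half of this vector has only processed the final time step.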
        last_output = gru_out[:, -1, :]
        
        # Project to ResNet dimension
        x = self.projection(last_output)
        
        # Pass through ResNet blocks
        for resnet_block in self.resnet_blocks:
            x = resnet_block(x)
        
        # Output
        x = self.output_layer(x)
        
        return x
    
    def count_parameters(self):
        """Count trainable parameters"""
        return sum(p.numel() for p in self.parameters() if p.requires_grad)


# Test GRU-ResNet model
print("Testing GRU-ResNet model...")
test_model_gru = GRU_ResNet(
    input_dim=6,
    seq_length=10,
    rnn_hidden_dim=64,
    rnn_num_layers=2,
    bidirectional=True,
    resnet_hidden_dim=128,
    resnet_num_blocks=2,
    dropout=0.3
)

print(f"\nModel architecture:")
print(test_model_gru)
print(f"\nTotal trainable parameters: {test_model_gru.count_parameters():,}")

# Test forward pass
test_output_gru = test_model_gru(test_input)
print(f"\nTest output shape: {test_output_gru.shape}")
print("✓ GRU-ResNet model test passed!")

# Compare parameter counts
print(f"\n📊 Parameter Comparison:")
print(f"  LSTM-ResNet: {test_model.count_parameters():,} parameters")
print(f"  GRU-ResNet: {test_model_gru.count_parameters():,} parameters")
print(f"  Difference: {test_model.count_parameters() - test_model_gru.count_parameters():,} parameters")
print(f"  GRU is {(1 - test_model_gru.count_parameters()/test_model.count_parameters())*100:.1f}% smaller")
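
The difference comes entirely from the recurrent layers: an LSTM stacks four gate blocks per layer and direction, a GRU three. The sketch below reproduces the recurrent-layer counts by hand, assuming PyTorch's parameter layout (`weight_ih`, `weight_hh`, `bias_ih`, `bias_hh` per layer and direction); `rnn_param_count` is a hypothetical helper.

```python
def rnn_param_count(n_gates, input_dim, hidden_dim, num_layers, bidirectional):
    """Parameter count of a stacked nn.LSTM (4 gates) or nn.GRU (3 gates)."""
    directions = 2 if bidirectional else 1
    total = 0
    layer_input = input_dim
    for _ in range(num_layers):
        # Per direction: W_ih, W_hh and two bias vectors, each stacked over the gates.
        per_direction = n_gates * (layer_input * hidden_dim
                                   + hidden_dim * hidden_dim
                                   + 2 * hidden_dim)
        total += directions * per_direction
        layer_input = hidden_dim * directions  # the next layer sees both directions
    return total


lstm_params = rnn_param_count(4, input_dim=6, hidden_dim=64,
                              num_layers=2, bidirectional=True)
gru_params = rnn_param_count(3, input_dim=6, hidden_dim=64,
                             num_layers=2, bidirectional=True)
print(lstm_params, gru_params)                # 136192 102144
print(f"{1 - gru_params / lstm_params:.0%}")  # 25%
```

The recurrent part of the GRU is 25% smaller. The whole-model percentage printed by the cell above is lower, because the projection, residual blocks and output layer are identical in both models.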

## 7. Training Utilities and Functions

Complete training pipeline with early stopping and progress tracking.