<h1 align="center">LSTM Time Series Forecasting Training Notebook</h1>

This notebook demonstrates the process of training various LSTM models for time series forecasting using the components from the LSTM_Dockerized project. It covers data preparation, model creation, training, and evaluation.

# 1. Introduction

Long Short-Term Memory (LSTM) networks are a recurrent neural network (RNN) architecture whose gated memory cell lets them capture long-range dependencies in sequences more reliably than conventional RNNs. This notebook walks through the complete workflow for training LSTM models for time series forecasting.
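
To make the idea of an LSTM cell concrete, here is a minimal NumPy sketch of a single LSTM time step. It is an illustration only, not the implementation used by TensorFlow or by the LSTM_Dockerized project: the function name `lstm_step`, the gate ordering, and the random weights are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Apply one LSTM time step (hypothetical helper for illustration).

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    Gate order in this sketch: input, forget, candidate, output.
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[..., :units])                # input gate
    f = sigmoid(z[..., units:2 * units])       # forget gate
    g = np.tanh(z[..., 2 * units:3 * units])   # candidate cell state
    o = sigmoid(z[..., 3 * units:])            # output gate
    c_t = f * c_prev + i * g                   # keep part of the old memory, add new
    h_t = o * np.tanh(c_t)                     # expose part of the memory as output
    return h_t, c_t

# Run the cell over a short toy sequence with random (untrained) weights
rng = np.random.default_rng(0)
input_dim, units = 1, 4
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h = np.zeros(units)
c = np.zeros(units)
toy_sequence = np.sin(np.linspace(0, 3, 30)).reshape(-1, 1)
for x_t in toy_sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)  # (4,) (4,)
```

The cell state `c` is updated additively (`f * c_prev + i * g`), which is what allows information to persist across many time steps.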

We'll cover:
- Data loading and preprocessing
- Creating sequences for LSTM input
- Building and configuring different LSTM architectures
- Training and evaluating the models
- Visualizing results and model performance

# 2. Environment Setup

In [None]:
# Make TensorFlow less verbose. Set this before importing TensorFlow;
# setting it afterwards may not take effect.
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

# Import standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
import json
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure plot style
plt.style.use('ggplot')
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (15, 6)

# Check TensorFlow version
print(f"TensorFlow version: {tf.__version__}")

# Check if GPU is available
print(f"GPU Available: {len(tf.config.list_physical_devices('GPU')) > 0}")
print(f"GPU Devices: {tf.config.list_physical_devices('GPU')}")

# 3. Data Loading and Exploration

In [None]:
# Import custom data preprocessing utilities
from utils.preprocessing import normalize_data, create_sequences, train_val_test_split, detect_stationarity

# Load sample time series data (e.g., stock prices)
# For this example, let's use a synthetic dataset. In practice, you would load your actual data.
def generate_sample_data(n_samples=1000):
    """Generate a synthetic time series with trend, seasonality, and noise."""
    time = np.arange(n_samples)
    # Trend component
    trend = 0.001 * time
    # Seasonal component (multiple seasonal patterns)
    season1 = 0.5 * np.sin(2 * np.pi * time / 50)  # 50-day cycle
    season2 = 0.2 * np.sin(2 * np.pi * time / 7)   # Weekly cycle 
    # Random noise
    noise = 0.1 * np.random.randn(n_samples)
    # Combine components
    data = trend + season1 + season2 + noise
    return pd.DataFrame({'value': data})

# Generate and plot sample data
data = generate_sample_data(1000)
data.index = pd.date_range(start='2020-01-01', periods=len(data), freq='D')
print(f"Data shape: {data.shape}")
print(data.head())

# Plot the time series
plt.figure(figsize=(15, 6))
plt.plot(data.index, data['value'])
plt.title('Sample Time Series Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True)
plt.tight_layout()
plt.show()

# Check for stationarity
is_stationary, pvalue, test_statistic = detect_stationarity(data['value'], test='adfuller')
print(f"Is stationary: {is_stationary}")
print(f"p-value: {pvalue:.4f}")
print(f"Test statistic: {test_statistic:.4f}")

# 4. Data Preprocessing

In [None]:
# Normalize the data
normalized_data, scaler = normalize_data(data, method='minmax')
print("Normalized data head:")
print(normalized_data.head())

# Create sequences for LSTM input
seq_length = 30  # Look back 30 time steps
horizon = 1      # Predict 1 step ahead

X, y = create_sequences(normalized_data, seq_length=seq_length, horizon=horizon)
print(f"X shape: {X.shape}, y shape: {y.shape}")

# Split data into training, validation, and test sets
X_train, X_val, X_test, y_train, y_val, y_test = train_val_test_split(
    X, y, val_size=0.2, test_size=0.1, shuffle=False
)

print(f"Training set: {X_train.shape}, {y_train.shape}")
print(f"Validation set: {X_val.shape}, {y_val.shape}")
print(f"Test set: {X_test.shape}, {y_test.shape}")
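
The project's `create_sequences` and `train_val_test_split` helpers are not shown in this notebook. As a rough sketch of what such helpers typically do, the NumPy code below builds sliding windows and splits them chronologically. The names `make_windows` and `chronological_split` are hypothetical stand-ins, and the real utilities may differ in how they handle shapes and rounding.

```python
import numpy as np

def make_windows(series, seq_length, horizon=1):
    """Hypothetical stand-in for create_sequences (univariate input)."""
    series = np.asarray(series, dtype=float).reshape(-1, 1)
    n_windows = len(series) - seq_length - horizon + 1
    # Each input is seq_length consecutive values ...
    X = np.stack([series[i:i + seq_length] for i in range(n_windows)])
    # ... and each target is the value `horizon` steps after the window ends
    y = np.array([series[i + seq_length + horizon - 1, 0] for i in range(n_windows)])
    return X, y

def chronological_split(X, y, val_size=0.2, test_size=0.1):
    """Hypothetical stand-in for train_val_test_split with shuffle=False."""
    n = len(X)
    n_test = int(n * test_size)
    n_val = int(n * val_size)
    n_train = n - n_val - n_test
    val_end = n_train + n_val
    return (X[:n_train], X[n_train:val_end], X[val_end:],
            y[:n_train], y[n_train:val_end], y[val_end:])

toy_series = np.arange(1000)  # 0, 1, ..., 999 so windows are easy to inspect
X_demo, y_demo = make_windows(toy_series, seq_length=30, horizon=1)
print(X_demo.shape, y_demo.shape)  # (970, 30, 1) (970,)

splits = chronological_split(X_demo, y_demo, val_size=0.2, test_size=0.1)
print([len(part) for part in splits[:3]])  # [679, 194, 97]
```

Keeping the split chronological (no shuffling) means the validation and test windows always come after the training windows, which mirrors how a forecasting model is used in practice.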

# 5. Model Creation

In [None]:
# Import model creation utilities
from models.model_factory import create_model, create_model_from_config
from models.training.trainer import train_model
from models.evaluation.evaluator import evaluate_model
from models.training.callbacks import create_callbacks

# Define a model configuration
model_config = {
    "architecture": "vanilla_lstm",
    "input_shape": (seq_length, 1),
    "lstm_units": 50,
    "dropout_rate": 0.2,
    "output_units": 1,
    "activation": "linear",
    "compile": {
        "optimizer": "adam",
        "learning_rate": 0.001,
        "loss": "mse",
        "metrics": ["mae"]
    }
}

# Create the model
model = create_model_from_config(model_config)

# Print model summary
model.summary()
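
When reading the summary, it can help to check the parameter counts by hand. The sketch below assumes the `vanilla_lstm` architecture is a single LSTM layer followed by a one-unit dense output layer (dropout adds no parameters). That layout is an assumption about `model_factory`, so the totals in your summary may differ. The helper names are hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters in a standard LSTM layer.

    Four gates, each with input weights, recurrent weights, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    """Trainable parameters in a dense layer (weights plus biases)."""
    return input_dim * units + units

# Values taken from model_config above: 1 input feature, 50 LSTM units, 1 output
lstm_params = lstm_param_count(input_dim=1, units=50)
dense_params = dense_param_count(input_dim=50, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 10400 51 10451
```

Note that the sequence length does not appear in the formula: the same weights are reused at every time step.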

# 6. Model Training

In [None]:
# Training parameters
epochs = 100
batch_size = 32
patience = 15

# Create directory for model checkpoints
checkpoint_dir = os.path.join(os.getcwd(), 'models', 'checkpoints')
os.makedirs(checkpoint_dir, exist_ok=True)
checkpoint_path = os.path.join(checkpoint_dir, 'best_model.h5')

# Train the model
history = train_model(
    model=model,
    X_train=X_train,
    y_train=y_train,
    X_val=X_val, 
    y_val=y_val,
    epochs=epochs,
    batch_size=batch_size,
    early_stopping=True,
    patience=patience,
    checkpoint_path=checkpoint_path,
    tensorboard=True,
    log_dir='logs',
    save_best_model=True
)

# Plot training history
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.title('Training and Validation Loss')

plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error (MAE)')
plt.legend()
plt.title('Training and Validation MAE')

plt.tight_layout()
plt.show()
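
The `patience` setting controls how long training continues without improvement in validation loss. The sketch below replays that logic on a list of validation losses, assuming `train_model` wraps a patience-based early stopping rule on `val_loss` (as Keras' `EarlyStopping` does). The function name and the loss values are made up for illustration and are not results from this notebook.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return (epoch at which training stops, epoch with the best loss).

    Epochs are zero-based. Hypothetical helper for illustration.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, purely to exercise the logic
demo_losses = [0.9, 0.5, 0.4, 0.41, 0.42, 0.39, 0.40, 0.41, 0.43]
print(early_stopping_epoch(demo_losses, patience=3))  # (8, 5)
print(early_stopping_epoch(demo_losses, patience=2))  # (4, 2)
```

With the smaller patience, training stops before the later improvement at epoch 5 is ever seen, which is the usual trade-off between training time and the risk of stopping too early.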

# 7. Model Evaluation

In [None]:
# Evaluate the model on test data
from utils.metrics import calculate_all_metrics

# Generate predictions on the test set
y_pred = model.predict(X_test)

# Calculate metrics
metrics = evaluate_model(model, X_test, y_test, scaler)

# Print evaluation results
print("Evaluation Metrics on Test Data:")
for metric, value in metrics.items():
    print(f"{metric}: {value:.4f}")

# Plot predictions vs actual values (using inverse transformation for actual scale)
# First, reshape the data for inverse transformation
y_test_reshaped = y_test.reshape(-1, 1)
y_pred_reshaped = y_pred.reshape(-1, 1)

# Inverse transform to get back to original scale
y_test_inv = scaler.inverse_transform(y_test_reshaped)
y_pred_inv = scaler.inverse_transform(y_pred_reshaped)

# Plot the predictions
plt.figure(figsize=(15, 6))
plt.plot(y_test_inv, label='Actual Values', color='blue')
plt.plot(y_pred_inv, label='Predictions', color='red', linestyle='--')
plt.xlabel('Time Step')
plt.ylabel('Value')
plt.title('Model Predictions vs Actual Values on Test Data')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
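
The exact metrics returned by `evaluate_model` depend on the project's implementation. As a reference point, here is a minimal NumPy sketch of RMSE, MAE, and MAPE computed after undoing min-max scaling. The helper names and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def inverse_minmax(scaled, data_min, data_max):
    """Undo min-max scaling to the [0, 1] range (hypothetical helper)."""
    return np.asarray(scaled, dtype=float) * (data_max - data_min) + data_min

def regression_metrics(y_true, y_pred, eps=1e-8):
    """Compute RMSE, MAE, and MAPE (in percent) on flattened arrays."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        # eps guards against division by zero when an actual value is 0
        "mape": float(np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps)) * 100),
    }

# Toy values: scaled targets and predictions mapped back to a 10..20 range
actual = inverse_minmax([0.0, 0.5, 1.0], data_min=10.0, data_max=20.0)     # 10, 15, 20
predicted = inverse_minmax([0.1, 0.5, 0.8], data_min=10.0, data_max=20.0)  # 11, 15, 18
demo_metrics = regression_metrics(actual, predicted)
print(demo_metrics)  # rmse ~ 1.291, mae = 1.0, mape ~ 6.667
```

MAPE divides by the actual values, so it becomes unstable when the series passes close to zero, as the synthetic series in this notebook does. In that situation RMSE and MAE are usually the more reliable numbers to compare.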

# 8. Trying Different LSTM Architectures

In [None]:
# Function to train and evaluate a model with a specific configuration
def train_and_evaluate_model(architecture_name, hyperparams=None):
    """Train and evaluate a model with a specific architecture."""
    # Default hyperparameters if not provided
    if hyperparams is None:
        hyperparams = {}
    
    # Base configuration
    config = {
        "architecture": architecture_name,
        "input_shape": (seq_length, 1),
        "output_units": 1,
        "activation": "linear",
        "compile": {
            "optimizer": "adam",
            "learning_rate": 0.001,
            "loss": "mse",
            "metrics": ["mae"]
        }
    }
    
    # Update with provided hyperparameters
    for key, value in hyperparams.items():
        if key not in ["architecture", "compile"]:
            config[key] = value
        elif key == "compile":
            for compile_key, compile_value in value.items():
                config["compile"][compile_key] = compile_value
    
    # Create and compile the model
    print(f"Training {architecture_name} model...")
    model = create_model_from_config(config)
    
    # Train the model
    history = train_model(
        model=model,
        X_train=X_train,
        y_train=y_train,
        X_val=X_val, 
        y_val=y_val,
        epochs=50,  # Reduced epochs for demonstration
        batch_size=batch_size,
        early_stopping=True,
        patience=10,
        save_best_model=False,
        verbose=0  # Less verbose output
    )
    
    # Evaluate the model
    metrics = evaluate_model(model, X_test, y_test, scaler)
    
    return model, history, metrics

# Define architectures to try
architectures = [
    {
        "name": "vanilla_lstm",
        "hyperparams": {"lstm_units": 50, "dropout_rate": 0.2}
    },
    {
        "name": "stacked_lstm",
        "hyperparams": {"lstm_units": [50, 25], "dropout_rate": 0.2}
    },
    {
        "name": "bidirectional_lstm",
        "hyperparams": {"lstm_units": 50, "dropout_rate": 0.2}
    }
]

# Train and evaluate each architecture
results = {}

for arch in architectures:
    print(f"\n{'-'*50}")
    print(f"Training {arch['name']} architecture")
    print(f"{'-'*50}")
    
    model, history, metrics = train_and_evaluate_model(
        architecture_name=arch["name"],
        hyperparams=arch["hyperparams"]
    )
    
    results[arch["name"]] = {
        "model": model,
        "history": history,
        "metrics": metrics
    }
    
    print(f"Test metrics for {arch['name']}:")
    for metric, value in metrics.items():
        print(f"  {metric}: {value:.4f}")

# 9. Model Comparison

In [None]:
# Compare the performance of different architectures
from models.evaluation.model_comparison import compare_models

# Extract the metrics from the results
metrics_comparison = {}
for name, result in results.items():
    metrics_comparison[name] = result["metrics"]

# Compare RMSE, MAE, and MAPE across models
metrics_to_compare = ['rmse', 'mae', 'mape']
model_names = list(metrics_comparison.keys())

plt.figure(figsize=(15, 6))
for i, metric in enumerate(metrics_to_compare):
    plt.subplot(1, 3, i+1)
    values = [metrics_comparison[name][metric] for name in model_names]
    plt.bar(model_names, values)
    plt.title(f'{metric.upper()} Comparison')
    plt.xticks(rotation=45)
    plt.ylabel(metric.upper())

plt.tight_layout()
plt.show()

# Create a comparison table
comparison_df = pd.DataFrame(
    {name: {metric: metrics_comparison[name][metric] for metric in metrics_to_compare} 
     for name in model_names}
).T

print("Model Performance Comparison:")
print(comparison_df)

# 10. Hyperparameter Tuning

In [None]:
# Hyperparameter tuning with grid search
from sklearn.model_selection import ParameterGrid

def tune_hyperparameters(architecture, param_grid):
    """Tune hyperparameters using grid search."""
    best_val_loss = float('inf')
    best_params = None
    all_results = []
    
    # Iterate through parameter combinations
    for params in ParameterGrid(param_grid):
        print(f"Trying parameters: {params}")
        
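        # Create model config. The learning rate belongs in the compile
        # settings, so keep it out of the architecture hyperparameters.
        # `params` itself is left unmodified so that the full combination
        # (including learning_rate) is recorded in the results below.
        model_params = {k: v for k, v in params.items() if k != "learning_rate"}
        config = {
            "architecture": architecture,
            "input_shape": (seq_length, 1),
            "output_units": 1,
            "activation": "linear",
            **model_params,
            "compile": {
                "optimizer": "adam",
                "learning_rate": params.get("learning_rate", 0.001),
                "loss": "mse",
                "metrics": ["mae"]
            }
        }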
        
        # Create model
        model = create_model_from_config(config)
        
        # Train model
        history = train_model(
            model=model,
            X_train=X_train,
            y_train=y_train,
            X_val=X_val, 
            y_val=y_val,
            epochs=30,  # Reduced for faster tuning
            batch_size=batch_size,
            early_stopping=True,
            patience=5,
            save_best_model=False,
            verbose=0
        )
        
        # Get the best validation loss
        val_loss = min(history.history['val_loss'])
        
        # Evaluate on test data
        test_metrics = evaluate_model(model, X_test, y_test, scaler)
        
        # Save results
        result = {
            "params": params,
            "val_loss": val_loss,
            "test_metrics": test_metrics
        }
        all_results.append(result)
        
        # Check if this is the best model
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_params = params
            print(f"New best model found! Validation loss: {val_loss:.4f}")
    
    return best_params, all_results

# Define parameter grid for vanilla LSTM
param_grid = {
    "lstm_units": [32, 64, 128],
    "dropout_rate": [0.1, 0.2, 0.3],
    "learning_rate": [0.01, 0.001]
}

# Run hyperparameter tuning
best_params, all_results = tune_hyperparameters("vanilla_lstm", param_grid)

print("\nHyperparameter Tuning Results:")
print(f"Best parameters: {best_params}")
print(f"Best validation loss: {min(result['val_loss'] for result in all_results):.4f}")

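# Collect the validation loss for each parameter combination
results_df = pd.DataFrame([
    {**r["params"], "val_loss": r["val_loss"]} 
    for r in all_results
])

# Sort by validation loss
results_df = results_df.sort_values("val_loss").reset_index(drop=True)
print("\nTop 5 parameter combinations:")
print(results_df.head())

# Plot the validation loss for each combination, best first
plt.figure(figsize=(15, 6))
plt.bar(range(len(results_df)), results_df["val_loss"])
plt.xlabel('Parameter combination (ranked by validation loss)')
plt.ylabel('Validation Loss (MSE)')
plt.title('Validation Loss by Parameter Combination')
plt.tight_layout()
plt.show()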
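
A grid search trains one model per combination, so the cost grows multiplicatively with each added parameter. The sketch below expands the same grid using only the standard library to show how many runs it implies. `expand_grid` is a hypothetical helper written for this example; scikit-learn's `ParameterGrid` is what the notebook actually uses.

```python
from itertools import product

# Same values as param_grid above
demo_grid = {
    "lstm_units": [32, 64, 128],
    "dropout_rate": [0.1, 0.2, 0.3],
    "learning_rate": [0.01, 0.001],
}

def expand_grid(grid):
    """Return one dict per combination of the grid's values."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

combos = expand_grid(demo_grid)
print(len(combos))  # 18 (3 * 3 * 2)
print(combos[0])    # {'dropout_rate': 0.1, 'learning_rate': 0.01, 'lstm_units': 32}
```

With 18 combinations and up to 30 epochs each, the search can take a while on CPU. Shrinking the grid or lowering `patience` are the simplest ways to cut the run time.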

# 11. Training the Best Model

In [None]:
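# Train the best model with the tuned hyperparameters.
# The learning rate goes in the compile settings, so it is filtered out of
# the top-level architecture hyperparameters.
best_config = {
    "architecture": "vanilla_lstm",
    "input_shape": (seq_length, 1),
    "output_units": 1,
    "activation": "linear",
    **{k: v for k, v in best_params.items() if k != "learning_rate"},
    "compile": {
        "optimizer": "adam",
        "learning_rate": best_params.get("learning_rate", 0.001),
        "loss": "mse",
        "metrics": ["mae"]
    }
}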

# Create the best model
best_model = create_model_from_config(best_config)

# Create directory for the best model
best_model_dir = os.path.join(os.getcwd(), 'models', 'best_model')
os.makedirs(best_model_dir, exist_ok=True)
best_model_path = os.path.join(best_model_dir, 'tuned_model.h5')

# Train with more epochs for the final model
history = train_model(
    model=best_model,
    X_train=X_train,
    y_train=y_train,
    X_val=X_val, 
    y_val=y_val,
    epochs=100,
    batch_size=batch_size,
    early_stopping=True,
    patience=15,
    checkpoint_path=best_model_path,
    save_best_model=True
)

# Plot training history
plt.figure(figsize=(15, 6))

plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.legend()
plt.title('Best Model: Training and Validation Loss')

plt.subplot(1, 2, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.xlabel('Epoch')
plt.ylabel('Mean Absolute Error (MAE)')
plt.legend()
plt.title('Best Model: Training and Validation MAE')

plt.tight_layout()
plt.show()

# Evaluate the best model
best_metrics = evaluate_model(best_model, X_test, y_test, scaler)

print("Best Model Test Metrics:")
for metric, value in best_metrics.items():
    print(f"{metric}: {value:.4f}")

# 12. Making Forecasts with the Best Model

In [None]:
# Generate forecasts with the best model
def generate_forecast(model, initial_sequence, steps=30, scaler=None):
    """Generate a multi-step forecast using the trained model."""
    forecast = []
    current_sequence = initial_sequence.copy()
    
    # Make predictions step by step
    for _ in range(steps):
        # Predict the next value
        pred = model.predict(current_sequence.reshape(1, *current_sequence.shape), verbose=0)
        forecast.append(pred[0, 0])
        
        # Update the sequence for the next prediction (remove oldest, add newest)
        current_sequence = np.append(current_sequence[1:], pred[0, 0])
        current_sequence = current_sequence.reshape(initial_sequence.shape)
    
    # Convert predictions back to the original scale if a scaler is provided
    if scaler is not None:
        forecast = scaler.inverse_transform(np.array(forecast).reshape(-1, 1)).flatten()
    
    return forecast

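# Start from the most recent seq_length observations so that the first forecast
# step falls on the day after the data ends. (With horizon=1, X_test[-1] stops one
# step earlier, because its target is the final observation; this assumes
# create_sequences uses standard sliding windows.)
initial_sequence = np.asarray(normalized_data)[-seq_length:].reshape(seq_length, 1)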

# Generate a 30-day forecast
forecast_steps = 30
forecast = generate_forecast(best_model, initial_sequence, steps=forecast_steps, scaler=scaler)

# Create dates for the forecast
last_date = data.index[-1]
forecast_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=forecast_steps, freq='D')

# Plot the historical data and the forecast
plt.figure(figsize=(15, 6))

# Historical data
plt.plot(data.index[-100:], data['value'].values[-100:], label='Historical Data')

# Forecast
plt.plot(forecast_dates, forecast, label='Forecast', color='red', linestyle='--')

# Add a vertical line at the forecast start
plt.axvline(x=last_date, color='black', linestyle='-', alpha=0.2)
plt.fill_between(forecast_dates, forecast, alpha=0.2, color='red')

plt.title('Time Series Forecast')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
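
The forecast above is recursive: each prediction is appended to the input window and used to predict the next step, so errors can compound as the horizon grows. The sketch below isolates that window update using a trivial stand-in predictor instead of a trained model. Both `recursive_forecast` and `linear_extrapolation` are hypothetical names introduced for this example.

```python
import numpy as np

def recursive_forecast(predict_fn, window, steps):
    """Roll a one-step predictor forward `steps` times.

    predict_fn takes a (seq_length, 1) window and returns the next value.
    """
    window = np.asarray(window, dtype=float).copy()
    forecast = []
    for _ in range(steps):
        next_value = float(predict_fn(window))
        forecast.append(next_value)
        # Drop the oldest value and append the new prediction
        window = np.roll(window, -1, axis=0)
        window[-1] = next_value
    return np.array(forecast)

def linear_extrapolation(window):
    """Stand-in 'model': continue the line through the last two points."""
    return 2 * window[-1, 0] - window[-2, 0]

start_window = np.arange(5, dtype=float).reshape(-1, 1)  # 0, 1, 2, 3, 4
print(recursive_forecast(linear_extrapolation, start_window, steps=3))  # [5. 6. 7.]
```

Because the stand-in predictor is exact for a straight line, the forecast continues the sequence perfectly. A trained LSTM will not be exact, and every forecast step after the first is conditioned on earlier predictions rather than on observed data.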

# 13. Saving and Loading Models

In [None]:
# Save the best model
model_save_path = os.path.join(os.getcwd(), 'models', 'final_model.h5')
best_model.save(model_save_path)
print(f"Model saved to {model_save_path}")

# Save the scaler for future use
import pickle
scaler_path = os.path.join(os.getcwd(), 'models', 'scaler.pkl')
with open(scaler_path, 'wb') as f:
    pickle.dump(scaler, f)
print(f"Scaler saved to {scaler_path}")

# Save the model configuration
config_path = os.path.join(os.getcwd(), 'models', 'model_config.json')
with open(config_path, 'w') as f:
    json.dump(best_config, f, indent=4)
print(f"Model configuration saved to {config_path}")

# Example of how to load the model and scaler
def load_model_and_scaler(model_path, scaler_path):
    """Load a saved model and scaler."""
    # Load the model
    from tensorflow.keras.models import load_model
    model = load_model(model_path)
    
    # Load the scaler
    with open(scaler_path, 'rb') as f:
        scaler = pickle.load(f)
    
    return model, scaler

# The loaded model and scaler could be used for new predictions
# loaded_model, loaded_scaler = load_model_and_scaler(model_save_path, scaler_path)
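
One detail to keep in mind when reloading the saved configuration: JSON has no tuple type, so `input_shape` comes back as a list. The sketch below shows the round trip with an illustrative config. The values are examples, not tuning results, and whether `create_model_from_config` accepts a list for `input_shape` is not confirmed here, so converting back to a tuple is the cautious choice.

```python
import json

# Illustrative config; the values are examples rather than tuned results
demo_config = {
    "architecture": "vanilla_lstm",
    "input_shape": (30, 1),
    "lstm_units": 64,
}

# Serialize and parse again, as json.dump / json.load would via a file
restored = json.loads(json.dumps(demo_config))
print(restored["input_shape"])  # [30, 1] (a list, not a tuple)

# Convert back before passing the config to model-building code
restored["input_shape"] = tuple(restored["input_shape"])
print(restored == demo_config)  # True
```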

# 14. Conclusion

In this notebook, we've demonstrated the complete workflow for training LSTM models for time series forecasting:

1. We loaded and preprocessed time series data, creating sequences suitable for LSTM input.
2. We built and trained several LSTM architectures, from a simple vanilla LSTM to more complex stacked and bidirectional LSTMs.
3. We evaluated model performance using multiple metrics and compared different architectures.
4. We performed hyperparameter tuning with a grid search to find the best configuration among those tried.
5. We trained a final model with the best parameters and used it to generate forecasts.
6. We saved the model, scaler, and configuration for future use.

This workflow can be adapted for different time series forecasting tasks by adjusting the data preprocessing, model architecture, and training parameters to suit the specific requirements of your application.

Next steps could include:
- Trying different feature engineering approaches
- Incorporating exogenous variables into the model
- Experimenting with more advanced architectures like Seq2Seq models or attention mechanisms
- Deploying the model for real-time forecasting