# Runoff Forecasting using LSTM and Transformer Models

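This notebook implements and evaluates two deep-learning models for runoff forecasting: an LSTM for station 21609641 and a Transformer for station 20380357. Both use National Water Model (NWM) forecasts as inputs and USGS streamflow observations as targets.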

## Import Required Libraries
Import NumPy, Pandas, Matplotlib, Seaborn and TensorFlow/Keras for data handling, visualization and model development. PyTorch imports are included but commented out as an optional alternative.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# import torch # Uncomment if using PyTorch

# Evaluation metrics (if not using standard libraries)
# from sklearn.metrics import mean_squared_error, r2_score # Example
# Define custom metrics if needed (e.g., NSE, PBIAS, CC)

print("TensorFlow Version:", tf.__version__)
# print("PyTorch Version:", torch.__version__) # Uncomment if using PyTorch

## Load and Explore Data
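Load the processed data for both stations (loading the raw files is optional and left commented out). If a file is missing, random dummy data is generated so the rest of the notebook still runs. Then perform a brief exploratory analysis to spot patterns and anomalies.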

In [None]:
# Define file paths (adjust as necessary)
raw_data_path_s1 = 'path/to/raw_data_station_21609641.csv'
processed_data_path_s1 = 'path/to/processed_data_station_21609641.csv'
raw_data_path_s2 = 'path/to/raw_data_station_20380357.csv'
processed_data_path_s2 = 'path/to/processed_data_station_20380357.csv'

# Load data (assuming CSV files with a 'datetime' column)
def load_station(path, station_id, scale, offset, seed):
    """Load processed data for one station, falling back to random dummy data if the file is missing."""
    try:
        df = pd.read_csv(path, index_col='datetime', parse_dates=True)
        print(f"\nLoaded processed data for Station {station_id}:")
    except FileNotFoundError:
        print(f"\nError: {path} not found. Using dummy data for Station {station_id}.")
        dates = pd.date_range(start='2010-01-01', end='2015-12-31', freq='D')
        rng = np.random.default_rng(seed)  # Seeded so dummy runs are reproducible
        df = pd.DataFrame({
            'nwm_forecast_lead_1': rng.random(len(dates)) * scale,
            'nwm_forecast_lead_2': rng.random(len(dates)) * scale,
            'usgs_observation': rng.random(len(dates)) * scale + offset
        }, index=dates)
    print(df.head())
    df.info()  # info() prints its report directly (and returns None)
    print(df.describe())
    return df

# Station 1 (LSTM); each station falls back to dummy data independently
# df_raw_s1 = pd.read_csv(raw_data_path_s1, index_col='datetime', parse_dates=True)  # Optional raw data
df_processed_s1 = load_station(processed_data_path_s1, '21609641', scale=100, offset=5, seed=0)

# Station 2 (Transformer)
# df_raw_s2 = pd.read_csv(raw_data_path_s2, index_col='datetime', parse_dates=True)  # Optional raw data
df_processed_s2 = load_station(processed_data_path_s2, '20380357', scale=50, offset=3, seed=1)


# --- Exploratory Data Analysis ---

# Example: Plot time series for one station
if 'df_processed_s1' in locals():
    plt.figure(figsize=(12, 6))
    plt.plot(df_processed_s1.index, df_processed_s1['usgs_observation'], label='USGS Observation (S1)')
    plt.plot(df_processed_s1.index, df_processed_s1['nwm_forecast_lead_1'], label='NWM Forecast Lead 1 (S1)', alpha=0.7)
    plt.title('Station 21609641 - Observed vs. NWM Forecast')
    plt.xlabel('Date')
    plt.ylabel('Runoff')
    plt.legend()
    plt.show()

if 'df_processed_s2' in locals():
    plt.figure(figsize=(12, 6))
    plt.plot(df_processed_s2.index, df_processed_s2['usgs_observation'], label='USGS Observation (S2)')
    plt.plot(df_processed_s2.index, df_processed_s2['nwm_forecast_lead_1'], label='NWM Forecast Lead 1 (S2)', alpha=0.7)
    plt.title('Station 20380357 - Observed vs. NWM Forecast')
    plt.xlabel('Date')
    plt.ylabel('Runoff')
    plt.legend()
    plt.show()

# Add more EDA as needed (histograms, correlation matrices, etc.)
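One way to flesh out the EDA placeholder above is to check data completeness and how strongly each NWM lead tracks the observations before any modeling. The sketch below assumes the column names used in this notebook (`nwm_forecast_lead_1`, `usgs_observation`, ...) and a daily `DatetimeIndex`; `summarize_station` is a hypothetical helper, demonstrated on a tiny synthetic frame.

```python
import numpy as np
import pandas as pd

def summarize_station(df, target_col='usgs_observation'):
    """Per-column missing-value counts and Pearson correlation with the target column."""
    summary = pd.DataFrame({
        'n_missing': df.isna().sum(),
        # corrwith correlates each column with the target Series (NaNs dropped pairwise)
        'corr_with_obs': df.corrwith(df[target_col]),
    })
    # Count calendar days absent from the index entirely (gaps in a daily series)
    full_range = pd.date_range(df.index.min(), df.index.max(), freq='D')
    n_missing_dates = len(full_range.difference(df.index))
    return summary, n_missing_dates

# Synthetic example: 6 days with the 3rd day removed, and one NaN forecast value
dates = pd.date_range('2010-01-01', periods=6, freq='D').delete(2)
demo = pd.DataFrame({
    'nwm_forecast_lead_1': [1.0, 2.0, 3.0, 4.0, np.nan],
    'usgs_observation': [2.0, 4.0, 6.0, 8.0, 10.0],
}, index=dates)
summary, n_missing_dates = summarize_station(demo)
print(summary)
print("Missing dates in index:", n_missing_dates)
```

On the real data, `summarize_station(df_processed_s1)` would flag gaps that `create_sequences` silently bridges, since it assumes consecutive rows are consecutive days.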

## Preprocess Data
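Preprocess the data. The processed files are assumed to be already cleaned, with NWM forecasts and USGS observations aligned on a common daily index. This step selects features and target, splits the data chronologically into training, validation and test sets, scales with statistics from the training portion only, and builds sliding-window input-output sequences.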

In [None]:
from sklearn.preprocessing import MinMaxScaler

# --- Configuration ---
SEQUENCE_LENGTH = 10 # Example: Use past 10 days to predict next day
TARGET_LEAD_TIME = 1 # Forecast horizon in days; create_sequences below always targets the step right after each window (lead 1)
TEST_SIZE = 0.2
VALIDATION_SIZE = 0.1 # Fraction of training data to use for validation

# --- Helper Function for Sequence Creation ---
def create_sequences(input_data, target_data, sequence_length):
    """Creates input sequences and corresponding targets."""
    X, y = [], []
    for i in range(len(input_data) - sequence_length):
        X.append(input_data[i:(i + sequence_length)])
        y.append(target_data[i + sequence_length])
    return np.array(X), np.array(y)

# --- Preprocessing for Station 1 (LSTM) ---
if 'df_processed_s1' in locals():
    print("\nPreprocessing Station 21609641...")
    # Select features and target
    # Assuming NWM forecasts are features and USGS observation is target
    features_s1 = df_processed_s1[['nwm_forecast_lead_1', 'nwm_forecast_lead_2']].values # Add more features if available
    target_s1 = df_processed_s1['usgs_observation'].values.reshape(-1, 1)

    # Chronological split (no shuffling for time series); indices count sequences, not rows
    n_sequences_s1 = len(features_s1) - SEQUENCE_LENGTH
    split_idx_test_s1 = int(n_sequences_s1 * (1 - TEST_SIZE))
    split_idx_val_s1 = int(split_idx_test_s1 * (1 - VALIDATION_SIZE))
    # Training sequences (inputs and targets) only use rows [0, split_idx_val_s1 + SEQUENCE_LENGTH)
    n_train_rows_s1 = split_idx_val_s1 + SEQUENCE_LENGTH

    # Scale features and target (important for NNs); fit on training rows only to avoid leaking test statistics
    scaler_features_s1 = MinMaxScaler().fit(features_s1[:n_train_rows_s1])
    features_scaled_s1 = scaler_features_s1.transform(features_s1)

    scaler_target_s1 = MinMaxScaler().fit(target_s1[:n_train_rows_s1])
    target_scaled_s1 = scaler_target_s1.transform(target_s1)

    # Create sequences (len(X_s1) == n_sequences_s1)
    X_s1, y_s1 = create_sequences(features_scaled_s1, target_scaled_s1.flatten(), SEQUENCE_LENGTH)
    print(f"Station 1 - X shape: {X_s1.shape}, y shape: {y_s1.shape}")

    X_train_s1, y_train_s1 = X_s1[:split_idx_val_s1], y_s1[:split_idx_val_s1]
    X_val_s1, y_val_s1 = X_s1[split_idx_val_s1:split_idx_test_s1], y_s1[split_idx_val_s1:split_idx_test_s1]
    X_test_s1, y_test_s1 = X_s1[split_idx_test_s1:], y_s1[split_idx_test_s1:]

    print(f"Train shapes (S1): {X_train_s1.shape}, {y_train_s1.shape}")
    print(f"Validation shapes (S1): {X_val_s1.shape}, {y_val_s1.shape}")
    print(f"Test shapes (S1): {X_test_s1.shape}, {y_test_s1.shape}")


# --- Preprocessing for Station 2 (Transformer) ---
if 'df_processed_s2' in locals():
    print("\nPreprocessing Station 20380357...")
    # Select features and target
    features_s2 = df_processed_s2[['nwm_forecast_lead_1', 'nwm_forecast_lead_2']].values # Add more features if available
    target_s2 = df_processed_s2['usgs_observation'].values.reshape(-1, 1)

    # Chronological split; indices count sequences, not rows
    n_sequences_s2 = len(features_s2) - SEQUENCE_LENGTH
    split_idx_test_s2 = int(n_sequences_s2 * (1 - TEST_SIZE))
    split_idx_val_s2 = int(split_idx_test_s2 * (1 - VALIDATION_SIZE))
    # Training sequences (inputs and targets) only use rows [0, split_idx_val_s2 + SEQUENCE_LENGTH)
    n_train_rows_s2 = split_idx_val_s2 + SEQUENCE_LENGTH

    # Scale features and target; fit on training rows only to avoid leaking test statistics
    scaler_features_s2 = MinMaxScaler().fit(features_s2[:n_train_rows_s2])
    features_scaled_s2 = scaler_features_s2.transform(features_s2)

    scaler_target_s2 = MinMaxScaler().fit(target_s2[:n_train_rows_s2])
    target_scaled_s2 = scaler_target_s2.transform(target_s2)

    # Create sequences (adjust if Transformer needs different input format)
    X_s2, y_s2 = create_sequences(features_scaled_s2, target_scaled_s2.flatten(), SEQUENCE_LENGTH)
    print(f"Station 2 - X shape: {X_s2.shape}, y shape: {y_s2.shape}")

    X_train_s2, y_train_s2 = X_s2[:split_idx_val_s2], y_s2[:split_idx_val_s2]
    X_val_s2, y_val_s2 = X_s2[split_idx_val_s2:split_idx_test_s2], y_s2[split_idx_val_s2:split_idx_test_s2]
    X_test_s2, y_test_s2 = X_s2[split_idx_test_s2:], y_s2[split_idx_test_s2:]

    print(f"Train shapes (S2): {X_train_s2.shape}, {y_train_s2.shape}")
    print(f"Validation shapes (S2): {X_val_s2.shape}, {y_val_s2.shape}")
    print(f"Test shapes (S2): {X_test_s2.shape}, {y_test_s2.shape}")
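The split indices above, and the `test_start_index = split_idx_test + SEQUENCE_LENGTH` offset used later during evaluation, follow from how `create_sequences` pairs windows with targets. Running the same helper on a toy integer series, where each value equals its row number, makes the alignment explicit. The window length of 3 and the 20-row series are illustrative only.

```python
import numpy as np

def create_sequences(input_data, target_data, sequence_length):
    """Same logic as the notebook helper: window i covers rows i..i+L-1, its target is row i+L."""
    X, y = [], []
    for i in range(len(input_data) - sequence_length):
        X.append(input_data[i:(i + sequence_length)])
        y.append(target_data[i + sequence_length])
    return np.array(X), np.array(y)

seq_len = 3
rows = np.arange(20)                         # each value equals its original row index
X, y = create_sequences(rows.reshape(-1, 1), rows, seq_len)

split_idx_test = int(len(X) * (1 - 0.2))     # chronological 80/20 split on sequences
test_start_row = split_idx_test + seq_len    # first original row used as a test target

print(X.shape, y.shape)                      # (17, 3, 1) (17,)
print(X[0].ravel(), y[0])                    # [0 1 2] 3
print(split_idx_test, test_start_row, y[split_idx_test])  # 13 16 16
```

Because every target `y[j]` sits at row `j + seq_len`, mapping a sequence index back to a date in the original DataFrame only needs that fixed offset.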


## Define LSTM Model for Station 21609641
Define a stacked LSTM for Station 21609641: two LSTM layers (64 and 32 units), each followed by 20% dropout, and a single-unit dense output layer that predicts the next day's (scaled) runoff.

In [None]:
# Ensure data for Station 1 exists before defining the model
if 'X_train_s1' in locals():
    input_shape_s1 = (X_train_s1.shape[1], X_train_s1.shape[2]) # (SEQUENCE_LENGTH, num_features)
    print(f"LSTM Input Shape (S1): {input_shape_s1}")

    lstm_model_s1 = keras.Sequential(
        [
            keras.Input(shape=input_shape_s1),
            layers.LSTM(64, return_sequences=True), # Hidden layer 1
            layers.Dropout(0.2),
            layers.LSTM(32), # Hidden layer 2
            layers.Dropout(0.2),
            layers.Dense(1) # Output layer (predicting one value)
        ]
    )

    lstm_model_s1.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='mse') # Mean Squared Error loss
    lstm_model_s1.summary()
else:
    print("Skipping LSTM model definition: Station 1 data not preprocessed.")
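When reading `lstm_model_s1.summary()`, it can help to check the parameter counts by hand. A Keras `LSTM` layer holds an input kernel, a recurrent kernel and a bias for each of its four gates, i.e. 4 × units × (input_dim + units + 1) weights. The arithmetic below assumes the two input features used in this notebook; the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    """Weights in a Keras LSTM layer: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim, units):
    """Weights in a Dense layer: kernel plus bias."""
    return units * (input_dim + 1)

n_features = 2  # nwm_forecast_lead_1, nwm_forecast_lead_2
layer_counts = {
    'LSTM(64)': lstm_params(n_features, 64),  # 4 * 64 * (2 + 64 + 1) = 17,152
    'LSTM(32)': lstm_params(64, 32),          # 4 * 32 * (64 + 32 + 1) = 12,416
    'Dense(1)': dense_params(32, 1),          # 1 * (32 + 1) = 33
}
# Dropout layers have no trainable weights, so they add nothing
total = sum(layer_counts.values())
print(layer_counts, total)  # total = 29,601
```

Most of the capacity sits in the first LSTM layer's recurrent kernel (64 × 256 weights), not in the two-feature input, which is worth keeping in mind when tuning layer widths.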

## Define Transformer Model for Station 20380357
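Define an encoder-only Transformer for Station 20380357: a stack of self-attention encoder blocks (multi-head attention plus a position-wise feed-forward network, each with a residual connection and layer normalization), followed by global average pooling and a small MLP head that outputs one runoff value.
*Note: This is a simplified example. Because the target is a single next-step value, no decoder is used, and no positional encoding is added to the inputs.*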

In [None]:
# Ensure data for Station 2 exists before defining the model
if 'X_train_s2' in locals():
    input_shape_s2 = (X_train_s2.shape[1], X_train_s2.shape[2]) # (SEQUENCE_LENGTH, num_features)
    print(f"Transformer Input Shape (S2): {input_shape_s2}")

    # --- Transformer Block Components (Simplified Example) ---
    def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
        # Attention and Normalization
        x = layers.MultiHeadAttention(
            key_dim=head_size, num_heads=num_heads, dropout=dropout
        )(inputs, inputs)
        x = layers.Dropout(dropout)(x)
        x = layers.LayerNormalization(epsilon=1e-6)(inputs + x) # Add & Norm

        # Feed Forward Part
        ff_out = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
        ff_out = layers.Dropout(dropout)(ff_out)
        ff_out = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(ff_out)
        x = layers.LayerNormalization(epsilon=1e-6)(x + ff_out) # Add & Norm
        return x

    # --- Build the Transformer Model ---
    def build_transformer_model(
        input_shape,
        head_size,
        num_heads,
        ff_dim,
        num_transformer_blocks,
        mlp_units,
        dropout=0,
        mlp_dropout=0,
    ):
        inputs = keras.Input(shape=input_shape)
        x = inputs
        for _ in range(num_transformer_blocks):
            x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)

        # Pooling or Flattening before final MLP
        x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
        # x = layers.Flatten()(x) # Alternative

        # MLP Head
        for dim in mlp_units:
            x = layers.Dense(dim, activation="relu")(x)
            x = layers.Dropout(mlp_dropout)(x)
        outputs = layers.Dense(1)(x) # Output layer

        return keras.Model(inputs, outputs)

    # --- Instantiate the Model ---
    transformer_model_s2 = build_transformer_model(
        input_shape_s2,
        head_size=128,
        num_heads=4,
        ff_dim=128,
        num_transformer_blocks=3,
        mlp_units=[64],
        dropout=0.1,
        mlp_dropout=0.1,
    )

    transformer_model_s2.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='mse')
    transformer_model_s2.summary()

else:
    print("Skipping Transformer model definition: Station 2 data not preprocessed.")
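As written, `build_transformer_model` adds no positional information. Self-attention, the kernel-size-1 `Conv1D` feed-forward layers and `GlobalAveragePooling1D` all treat time steps symmetrically, so the model's output does not change if the days within a window are shuffled. A common remedy is the fixed sinusoidal encoding from the original Transformer paper. The NumPy sketch below only builds the `(sequence_length, d_model)` table; `d_model` is a hypothetical model width that the current code does not have, because it keeps the two raw features as the residual width.

```python
import numpy as np

def sinusoidal_positional_encoding(sequence_length, d_model):
    """Fixed sine/cosine position table of shape (sequence_length, d_model)."""
    positions = np.arange(sequence_length)[:, np.newaxis]        # (L, 1)
    dims = np.arange(d_model)[np.newaxis, :]                     # (1, d_model)
    # Dimensions 2i and 2i+1 share the angular frequency 1 / 10000**(2i / d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                             # (L, d_model)
    encoding = np.zeros((sequence_length, d_model), dtype=np.float32)
    encoding[:, 0::2] = np.sin(angles[:, 0::2])                  # even dims: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])                  # odd dims: cosine
    return encoding

pe = sinusoidal_positional_encoding(sequence_length=10, d_model=16)  # e.g. SEQUENCE_LENGTH = 10
print(pe.shape)  # (10, 16)
```

One option is to project `inputs` to `d_model` features with `layers.Dense(d_model)` and add this table before the first encoder block (broadcast over the batch dimension). Wrapping the addition in a small custom layer is the safest route across Keras versions. Because `transformer_encoder` maps each block's output back to `inputs.shape[-1]`, `d_model` would then also set the residual width of every block.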


## Train Models
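Train both models on their training sets while monitoring validation loss. Early stopping halts training after 10 epochs without improvement and restores the best weights; `ReduceLROnPlateau` multiplies the learning rate by 0.2 after 5 stagnant epochs, down to a minimum of 1e-6.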

In [None]:
# --- Training Configuration ---
EPOCHS = 50 # Adjust as needed
BATCH_SIZE = 32 # Adjust based on memory

# Callbacks
early_stopping = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=1e-6)

# --- Train LSTM Model (Station 1) ---
history_lstm_s1 = None
if 'lstm_model_s1' in locals() and 'X_train_s1' in locals():
    print("\nTraining LSTM model for Station 21609641...")
    history_lstm_s1 = lstm_model_s1.fit(
        X_train_s1, y_train_s1,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(X_val_s1, y_val_s1),
        callbacks=[early_stopping, reduce_lr],
        verbose=1 # Set to 0 for less output, 1 for progress bar, 2 for one line per epoch
    )
    print("LSTM training finished.")

    # Plot training history
    plt.figure(figsize=(10, 4))
    plt.plot(history_lstm_s1.history['loss'], label='Training Loss')
    plt.plot(history_lstm_s1.history['val_loss'], label='Validation Loss')
    plt.title('LSTM Model Training History (Station 1)')
    plt.xlabel('Epoch')
    plt.ylabel('Loss (MSE)')
    plt.legend()
    plt.show()
else:
    print("Skipping LSTM training: Model or data not available.")


# --- Train Transformer Model (Station 2) ---
history_transformer_s2 = None
if 'transformer_model_s2' in locals() and 'X_train_s2' in locals():
    print("\nTraining Transformer model for Station 20380357...")
    history_transformer_s2 = transformer_model_s2.fit(
        X_train_s2, y_train_s2,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_data=(X_val_s2, y_val_s2),
        callbacks=[early_stopping, reduce_lr],
        verbose=1
    )
    print("Transformer training finished.")

    # Plot training history
    plt.figure(figsize=(10, 4))
    plt.plot(history_transformer_s2.history['loss'], label='Training Loss')
    plt.plot(history_transformer_s2.history['val_loss'], label='Validation Loss')
    plt.title('Transformer Model Training History (Station 2)')
    plt.xlabel('Epoch')
    plt.ylabel('Loss (MSE)')
    plt.legend()
    plt.show()
else:
    print("Skipping Transformer training: Model or data not available.")
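To make the interaction between the two callbacks concrete, the loop below replays a made-up validation-loss curve through a simplified version of their logic: the learning rate is multiplied by `factor` after `lr_patience` epochs without improvement (never going below `min_lr`), and training stops once `stop_patience` epochs pass without improvement. This is an approximation for intuition only; the real Keras callbacks also support `min_delta` and `cooldown` (and `restore_best_weights` for early stopping), which this sketch ignores. `simulate_callbacks` is a hypothetical helper.

```python
import math

def simulate_callbacks(val_losses, lr=1e-3, factor=0.2, lr_patience=5,
                       min_lr=1e-6, stop_patience=10):
    """Simplified replay of ReduceLROnPlateau + EarlyStopping (no min_delta, no cooldown)."""
    best = math.inf
    wait_lr = wait_stop = 0
    lr_per_epoch = []                       # learning rate used during each epoch
    for epoch, loss in enumerate(val_losses):
        lr_per_epoch.append(lr)
        if loss < best:                     # improvement resets both patience counters
            best, wait_lr, wait_stop = loss, 0, 0
            continue
        wait_lr += 1
        wait_stop += 1
        if wait_lr >= lr_patience:          # plateau: shrink LR for the following epochs
            lr = max(lr * factor, min_lr)
            wait_lr = 0
        if wait_stop >= stop_patience:      # patience exhausted: stop after this epoch
            return lr_per_epoch, epoch
    return lr_per_epoch, None               # ran all epochs without early stopping

# Made-up curve: improves for 3 epochs, then plateaus
val_losses = [1.0, 0.8, 0.7] + [0.75] * 12
lr_per_epoch, stop_epoch = simulate_callbacks(val_losses)
print("stopped at epoch:", stop_epoch)
print("learning rates:", [f"{lr:.0e}" for lr in lr_per_epoch])
```

With these settings the learning rate drops once (after epoch 7) before early stopping ends training at epoch 12, so `EPOCHS = 50` acts only as an upper bound.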


## Evaluate Models
Evaluate the trained models on the test set using CC (correlation coefficient), RMSE (root mean squared error), PBIAS (percent bias) and NSE (Nash-Sutcliffe efficiency), after inverse-transforming predictions back to runoff units. Compare the results against the raw NWM lead-1 forecast over the same test dates.
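For reference, these are the four metrics exactly as implemented in the next cell, with observations $o_t$, predictions $p_t$, their means $\bar{o}$ and $\bar{p}$, and $n$ test time steps. Because the code sums $p_t - o_t$, a positive PBIAS means over-prediction; a common hydrology convention uses $o_t - p_t$ instead, which flips the sign. NSE equals 1 for a perfect fit and 0 when the predictions are no better than the observed mean.

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{t=1}^{n}(o_t - p_t)^2} \\
\mathrm{CC} &= \frac{\sum_{t=1}^{n}(o_t - \bar{o})(p_t - \bar{p})}{\sqrt{\sum_{t=1}^{n}(o_t - \bar{o})^2}\,\sqrt{\sum_{t=1}^{n}(p_t - \bar{p})^2}} \\
\mathrm{PBIAS} &= 100\,\frac{\sum_{t=1}^{n}(p_t - o_t)}{\sum_{t=1}^{n} o_t} \\
\mathrm{NSE} &= 1 - \frac{\sum_{t=1}^{n}(o_t - p_t)^2}{\sum_{t=1}^{n}(o_t - \bar{o})^2}
\end{aligned}
$$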

In [None]:
# --- Evaluation Metrics Functions ---
def calculate_rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred)**2))

def calculate_cc(y_true, y_pred):
    return np.corrcoef(y_true, y_pred)[0, 1]

def calculate_pbias(y_true, y_pred):
    # Sums (pred - true): positive values mean the model over-predicts on average
    return 100 * np.sum(y_pred - y_true) / np.sum(y_true)

def calculate_nse(y_true, y_pred):
    return 1 - (np.sum((y_true - y_pred)**2) / np.sum((y_true - np.mean(y_true))**2))

# --- Evaluate LSTM Model (Station 1) ---
results_s1 = {}
if 'lstm_model_s1' in locals() and 'X_test_s1' in locals():
    print("\nEvaluating LSTM model on Test Set (Station 1)...")
    # Predict on test set (scaled)
    y_pred_scaled_s1 = lstm_model_s1.predict(X_test_s1)

    # Inverse transform predictions and true values
    y_pred_s1 = scaler_target_s1.inverse_transform(y_pred_scaled_s1)
    y_true_s1 = scaler_target_s1.inverse_transform(y_test_s1.reshape(-1, 1)) # Reshape y_test back

    # Calculate metrics
    rmse_s1 = calculate_rmse(y_true_s1.flatten(), y_pred_s1.flatten())
    cc_s1 = calculate_cc(y_true_s1.flatten(), y_pred_s1.flatten())
    pbias_s1 = calculate_pbias(y_true_s1.flatten(), y_pred_s1.flatten())
    nse_s1 = calculate_nse(y_true_s1.flatten(), y_pred_s1.flatten())

    results_s1['LSTM'] = {'RMSE': rmse_s1, 'CC': cc_s1, 'PBIAS': pbias_s1, 'NSE': nse_s1}
    print(f"LSTM (S1) - Test Metrics:")
    print(f"  RMSE: {rmse_s1:.4f}")
    print(f"  CC:   {cc_s1:.4f}")
    print(f"  PBIAS:{pbias_s1:.4f}%")
    print(f"  NSE:  {nse_s1:.4f}")

    # Compare with raw NWM (Example: Lead time 1)
    # Need to align NWM forecasts with the test set period
    test_start_index_s1 = split_idx_test_s1 + SEQUENCE_LENGTH # y_s1[j] is the observation at row j + SEQUENCE_LENGTH, so this is the first test row
    nwm_test_s1 = df_processed_s1['nwm_forecast_lead_1'].iloc[test_start_index_s1 : test_start_index_s1 + len(y_true_s1)].values

    if len(nwm_test_s1) == len(y_true_s1):
        rmse_nwm_s1 = calculate_rmse(y_true_s1.flatten(), nwm_test_s1)
        cc_nwm_s1 = calculate_cc(y_true_s1.flatten(), nwm_test_s1)
        pbias_nwm_s1 = calculate_pbias(y_true_s1.flatten(), nwm_test_s1)
        nse_nwm_s1 = calculate_nse(y_true_s1.flatten(), nwm_test_s1)
        results_s1['NWM_Lead1'] = {'RMSE': rmse_nwm_s1, 'CC': cc_nwm_s1, 'PBIAS': pbias_nwm_s1, 'NSE': nse_nwm_s1}
        print(f"\nNWM Lead 1 (S1) - Test Metrics:")
        print(f"  RMSE: {rmse_nwm_s1:.4f}")
        print(f"  CC:   {cc_nwm_s1:.4f}")
        print(f"  PBIAS:{pbias_nwm_s1:.4f}%")
        print(f"  NSE:  {nse_nwm_s1:.4f}")
    else:
         print("\nWarning: Could not align NWM forecast for comparison (S1). Length mismatch.")

else:
    print("Skipping LSTM evaluation: Model or test data not available.")


# --- Evaluate Transformer Model (Station 2) ---
results_s2 = {}
if 'transformer_model_s2' in locals() and 'X_test_s2' in locals():
    print("\nEvaluating Transformer model on Test Set (Station 2)...")
    # Predict on test set (scaled)
    y_pred_scaled_s2 = transformer_model_s2.predict(X_test_s2)

    # Inverse transform predictions and true values
    y_pred_s2 = scaler_target_s2.inverse_transform(y_pred_scaled_s2)
    y_true_s2 = scaler_target_s2.inverse_transform(y_test_s2.reshape(-1, 1)) # Reshape y_test back

    # Calculate metrics
    rmse_s2 = calculate_rmse(y_true_s2.flatten(), y_pred_s2.flatten())
    cc_s2 = calculate_cc(y_true_s2.flatten(), y_pred_s2.flatten())
    pbias_s2 = calculate_pbias(y_true_s2.flatten(), y_pred_s2.flatten())
    nse_s2 = calculate_nse(y_true_s2.flatten(), y_pred_s2.flatten())

    results_s2['Transformer'] = {'RMSE': rmse_s2, 'CC': cc_s2, 'PBIAS': pbias_s2, 'NSE': nse_s2}
    print(f"Transformer (S2) - Test Metrics:")
    print(f"  RMSE: {rmse_s2:.4f}")
    print(f"  CC:   {cc_s2:.4f}")
    print(f"  PBIAS:{pbias_s2:.4f}%")
    print(f"  NSE:  {nse_s2:.4f}")

    # Compare with raw NWM (Example: Lead time 1)
    test_start_index_s2 = split_idx_test_s2 + SEQUENCE_LENGTH # y_s2[j] is the observation at row j + SEQUENCE_LENGTH, so this is the first test row
    nwm_test_s2 = df_processed_s2['nwm_forecast_lead_1'].iloc[test_start_index_s2 : test_start_index_s2 + len(y_true_s2)].values

    if len(nwm_test_s2) == len(y_true_s2):
        rmse_nwm_s2 = calculate_rmse(y_true_s2.flatten(), nwm_test_s2)
        cc_nwm_s2 = calculate_cc(y_true_s2.flatten(), nwm_test_s2)
        pbias_nwm_s2 = calculate_pbias(y_true_s2.flatten(), nwm_test_s2)
        nse_nwm_s2 = calculate_nse(y_true_s2.flatten(), nwm_test_s2)
        results_s2['NWM_Lead1'] = {'RMSE': rmse_nwm_s2, 'CC': cc_nwm_s2, 'PBIAS': pbias_nwm_s2, 'NSE': nse_nwm_s2}
        print(f"\nNWM Lead 1 (S2) - Test Metrics:")
        print(f"  RMSE: {rmse_nwm_s2:.4f}")
        print(f"  CC:   {cc_nwm_s2:.4f}")
        print(f"  PBIAS:{pbias_nwm_s2:.4f}%")
        print(f"  NSE:  {nse_nwm_s2:.4f}")
    else:
         print("\nWarning: Could not align NWM forecast for comparison (S2). Length mismatch.")

else:
    print("Skipping Transformer evaluation: Model or test data not available.")

# Store results in a DataFrame for easier comparison
results_df_s1 = pd.DataFrame(results_s1)
results_df_s2 = pd.DataFrame(results_s2)

print("\n--- Evaluation Summary ---")
if not results_df_s1.empty:
    print("\nStation 1 (LSTM vs NWM):")
    print(results_df_s1)
if not results_df_s2.empty:
    print("\nStation 2 (Transformer vs NWM):")
    print(results_df_s2)

## Generate Visualizations
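Plot test-set predictions against observations and the NWM lead-1 forecast, show box plots of prediction errors, and compare evaluation metrics in bar charts. Box plots of observed, NWM and corrected runoff for each lead time, and of metrics across lead times, would require predictions for several lead times; the models here predict lead 1 only.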

In [None]:
# --- Visualization 1: Time Series Plot of Predictions vs Actuals ---

# Station 1 (LSTM)
if 'y_true_s1' in locals() and 'y_pred_s1' in locals():
    plt.figure(figsize=(15, 6))
    # Get corresponding dates from the original dataframe index
    test_dates_s1 = df_processed_s1.index[test_start_index_s1 : test_start_index_s1 + len(y_true_s1)]
    plt.plot(test_dates_s1, y_true_s1.flatten(), label='True Observations (S1)', color='blue')
    plt.plot(test_dates_s1, y_pred_s1.flatten(), label='LSTM Predictions (S1)', color='red', alpha=0.8)
    if 'nwm_test_s1' in locals() and len(nwm_test_s1) == len(y_true_s1):
         plt.plot(test_dates_s1, nwm_test_s1, label='NWM Forecast Lead 1 (S1)', color='green', alpha=0.6, linestyle='--')
    plt.title('Station 21609641: LSTM Forecast vs Actual Runoff (Test Set)')
    plt.xlabel('Date')
    plt.ylabel('Runoff')
    plt.legend()
    plt.show()

# Station 2 (Transformer)
if 'y_true_s2' in locals() and 'y_pred_s2' in locals():
    plt.figure(figsize=(15, 6))
    test_dates_s2 = df_processed_s2.index[test_start_index_s2 : test_start_index_s2 + len(y_true_s2)]
    plt.plot(test_dates_s2, y_true_s2.flatten(), label='True Observations (S2)', color='blue')
    plt.plot(test_dates_s2, y_pred_s2.flatten(), label='Transformer Predictions (S2)', color='red', alpha=0.8)
    if 'nwm_test_s2' in locals() and len(nwm_test_s2) == len(y_true_s2):
         plt.plot(test_dates_s2, nwm_test_s2, label='NWM Forecast Lead 1 (S2)', color='green', alpha=0.6, linestyle='--')
    plt.title('Station 20380357: Transformer Forecast vs Actual Runoff (Test Set)')
    plt.xlabel('Date')
    plt.ylabel('Runoff')
    plt.legend()
    plt.show()


# --- Visualization 2: Box Plots (Example for Station 1) ---
# This requires predictions for multiple lead times or grouping data somehow (e.g., by season)
# For simplicity, let's just plot the distribution of errors

if 'y_true_s1' in locals() and 'y_pred_s1' in locals():
    errors_lstm_s1 = y_true_s1.flatten() - y_pred_s1.flatten()
    plot_data = {'LSTM Errors (S1)': errors_lstm_s1}
    if 'nwm_test_s1' in locals() and len(nwm_test_s1) == len(y_true_s1):
        errors_nwm_s1 = y_true_s1.flatten() - nwm_test_s1
        plot_data['NWM Errors (S1)'] = errors_nwm_s1

    plt.figure(figsize=(8, 5))
    sns.boxplot(data=pd.DataFrame(plot_data))
    plt.title('Distribution of Prediction Errors (Station 1)')
    plt.ylabel('Error (True - Predicted)')
    plt.grid(axis='y', linestyle='--')
    plt.show()

# Add similar box plot for Station 2 if data is available
if 'y_true_s2' in locals() and 'y_pred_s2' in locals():
    errors_transformer_s2 = y_true_s2.flatten() - y_pred_s2.flatten()
    plot_data_s2 = {'Transformer Errors (S2)': errors_transformer_s2}
    if 'nwm_test_s2' in locals() and len(nwm_test_s2) == len(y_true_s2):
        errors_nwm_s2 = y_true_s2.flatten() - nwm_test_s2
        plot_data_s2['NWM Errors (S2)'] = errors_nwm_s2

    plt.figure(figsize=(8, 5))
    sns.boxplot(data=pd.DataFrame(plot_data_s2))
    plt.title('Distribution of Prediction Errors (Station 2)')
    plt.ylabel('Error (True - Predicted)')
    plt.grid(axis='y', linestyle='--')
    plt.show()


# --- Visualization 3: Metrics Comparison (Bar Charts) ---
# Metrics have different units and scales (RMSE in runoff units, PBIAS in %), so draw one panel per metric
if not results_df_s1.empty:
    results_df_s1.T.plot(kind='bar', subplots=True, layout=(1, 4), figsize=(14, 4),
                         title=list(results_df_s1.index), legend=False, rot=0)
    plt.suptitle('Evaluation Metrics Comparison (Station 1)')
    plt.tight_layout()
    plt.show()

if not results_df_s2.empty:
    results_df_s2.T.plot(kind='bar', subplots=True, layout=(1, 4), figsize=(14, 4),
                         title=list(results_df_s2.index), legend=False, rot=0)
    plt.suptitle('Evaluation Metrics Comparison (Station 2)')
    plt.tight_layout()
    plt.show()

# Add more visualizations as needed (e.g., scatter plots of true vs predicted)
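The box-plot comment earlier suggests grouping errors, for example by season, when only one lead time is available. A minimal sketch, assuming a daily `DatetimeIndex` for the test period (such as `test_dates_s1`) and an error array like `errors_lstm_s1`; `errors_by_season` is a hypothetical helper, and the meteorological seasons (DJF, MAM, JJA, SON) are just one possible grouping.

```python
import numpy as np
import pandas as pd

SEASONS = {12: 'DJF', 1: 'DJF', 2: 'DJF', 3: 'MAM', 4: 'MAM', 5: 'MAM',
           6: 'JJA', 7: 'JJA', 8: 'JJA', 9: 'SON', 10: 'SON', 11: 'SON'}

def errors_by_season(dates, errors, model_name):
    """Long-format frame (season, model, error) for sns.boxplot(x='season', y='error', hue='model')."""
    return pd.DataFrame({
        'season': pd.DatetimeIndex(dates).month.map(SEASONS),
        'model': model_name,
        'error': np.asarray(errors).ravel(),
    })

# Toy example: one error value per day for a non-leap year
dates = pd.date_range('2014-01-01', '2014-12-31', freq='D')
errors = np.ones(len(dates))
long_df = errors_by_season(dates, errors, 'LSTM (S1)')
print(long_df.groupby('season').size())
```

To compare models, build one frame per error array (LSTM and NWM), combine them with `pd.concat`, and pass `hue='model'` to `sns.boxplot` so each season shows both distributions side by side.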

## Compare Model Performance
Analyze and compare the performance of LSTM and Transformer models for both stations. Highlight differences in behavior and accuracy based on the evaluation metrics and visualizations.

**Station 21609641 (LSTM):**
*   Analyze the LSTM model's performance based on RMSE, CC, PBIAS, NSE compared to the raw NWM forecast.
*   Discuss the shape of the error distribution (from box plot). Is there bias? How wide is the spread?
*   Examine the time series plot: Does the LSTM capture peaks and troughs better than NWM? Are there lags?

**Station 20380357 (Transformer):**
*   Analyze the Transformer model's performance similarly.
*   Compare its metrics to the raw NWM forecast for this station.
*   Discuss its error distribution and time series behavior.

**Overall Comparison:**
*   Which model type performed better overall, considering the metrics? Because each model is trained and tested on a different station, metric differences partly reflect the stations themselves; a like-for-like comparison would train both architectures on the same station.
*   Did one model type show specific strengths (e.g., capturing extremes, lower bias)?
*   Relate performance differences to potential characteristics of the data for each station or the inherent differences between LSTM (sequential processing) and Transformer (attention mechanism) architectures.
*   Discuss potential reasons for observed performance (e.g., data quality, sequence length choice, model complexity).
*   Suggest future improvements or experiments (e.g., hyperparameter tuning, different features, longer sequences, different model variants).