# 🚀 Advanced Deep Learning Models - Extended Edition

**Complete suite of modern DL architectures with GPU acceleration**

## 📋 Models in this notebook:

### Basic DL models:
1. **LSTM** - Standard Long Short-Term Memory
2. **Bi-LSTM** - Bidirectional LSTM
3. **GRU** - Gated Recurrent Unit (fewer gates than LSTM, often faster to train)

### Generative models:
4. **Autoencoder** - Dimensionality reduction + forecasting
5. **VAE** - Variational Autoencoder (with uncertainty estimation)
6. **TimeGAN** - Generative Adversarial Network for time series

### Advanced/Transformer:
7. **N-BEATS** - Neural Basis Expansion
8. **N-HiTS** - Hierarchical Interpolation
9. **DeepAR** - Amazon's probabilistic model
10. **TFT** - Temporal Fusion Transformer (state of the art)

### Notes on compute time (GPU T4):
- ✅ **Fast** (<5 min): LSTM, GRU, Bi-LSTM, Autoencoder, VAE
- ⚠️ **Medium** (5-15 min): N-BEATS, N-HiTS, DeepAR
- 🔥 **Slow** (15-45 min): TFT, TimeGAN

**Setup:** Runtime → Change runtime type → GPU (T4 recommended, A100 for TFT)

In [1]:
# Check GPU
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
print("GPU Available:", tf.config.list_physical_devices('GPU'))
print("\n🚀 GPU should show above!")

TensorFlow version: 2.19.0
GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

🚀 GPU should show above!


In [2]:
# Clone Repository
!git clone https://github.com/chradden/AdvancedTimeSeriesPrediction.git
%cd AdvancedTimeSeriesPrediction/energy-timeseries-project

Cloning into 'AdvancedTimeSeriesPrediction'...
remote: Enumerating objects: 788, done.
remote: Counting objects: 100% (106/106), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 788 (delta 38), reused 86 (delta 26), pack-reused 682 (from 1)
Receiving objects: 100% (788/788), 73.09 MiB | 15.21 MiB/s, done.
Resolving deltas: 100% (283/283), done.
/content/AdvancedTimeSeriesPrediction/energy-timeseries-project


In [3]:
# Install ALL Dependencies
print("📦 Installing packages (this may take 2-3 minutes)...")
!pip install -q pandas numpy matplotlib seaborn scikit-learn
!pip install -q tensorflow keras pytorch-lightning
!pip install -q 'darts[torch]'  # N-BEATS, N-HiTS, TFT, DeepAR
!pip install -q gluonts  # DeepAR alternative
print("✅ Installation complete!")

📦 Installing packages (this may take 2-3 minutes)...

In [4]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
import time
import warnings
warnings.filterwarnings('ignore')

# Set seeds
np.random.seed(42)
tf.random.set_seed(42)

# GPU Config
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print(f"✅ GPU configured: {len(gpus)} device(s)")
    except RuntimeError as e:
        print(e)
else:
    print("⚠️ No GPU found - training will be slow!")

print("\n📊 All imports successful!")

✅ GPU configured: 1 device(s)

📊 All imports successful!


## ⚙️ Configuration

In [5]:
# ============================================================================
# CONFIGURATION
# ============================================================================

SERIES_NAME = 'wind_onshore'  # Change to one of: 'solar', 'wind_offshore', 'wind_onshore', 'price', 'consumption'

# Model selection (set to False to skip a model group)
RUN_BASIC = True          # LSTM, GRU, Bi-LSTM (~5 min)
RUN_GENERATIVE = True     # Autoencoder, VAE (~5 min)
RUN_GAN = False           # TimeGAN (~30 min, experimental)
RUN_ADVANCED = True       # N-BEATS, N-HiTS (~10 min)
RUN_PROBABILISTIC = True  # DeepAR (~10 min)
RUN_TFT = False           # Temporal Fusion Transformer (~30-45 min)

print(f"📊 Time series: {SERIES_NAME.upper()}")
print(f"\n🎯 Enabled models:")
if RUN_BASIC: print("   ✅ Basic DL (LSTM, GRU, Bi-LSTM)")
if RUN_GENERATIVE: print("   ✅ Generative (Autoencoder, VAE)")
if RUN_GAN: print("   ✅ TimeGAN (experimental, ~30 min)")
if RUN_ADVANCED: print("   ✅ Advanced (N-BEATS, N-HiTS)")
if RUN_PROBABILISTIC: print("   ✅ DeepAR (probabilistic)")
if RUN_TFT: print("   ✅ TFT (state of the art, ~30-45 min)")

print(f"\n✅ Configuration complete!")

📊 Time series: WIND_ONSHORE

🎯 Enabled models:
   ✅ Basic DL (LSTM, GRU, Bi-LSTM)
   ✅ Generative (Autoencoder, VAE)
   ✅ Advanced (N-BEATS, N-HiTS)
   ✅ DeepAR (probabilistic)

✅ Configuration complete!


## 📂 Load Data

In [8]:
# Load data
train_df = pd.read_csv(f'data/processed/{SERIES_NAME}_train.csv')
val_df = pd.read_csv(f'data/processed/{SERIES_NAME}_val.csv')
test_df = pd.read_csv(f'data/processed/{SERIES_NAME}_test.csv')

# Determine value column
# Wind series may be stored under a shared 'wind_power' column name
wind_alias = SERIES_NAME.replace("onshore", "power").replace("offshore", "power")
if SERIES_NAME in train_df.columns:  # SERIES_NAME is directly a column
    value_col = SERIES_NAME
elif 'value' in train_df.columns:  # Fallback to a generic 'value' column
    value_col = 'value'
elif wind_alias in train_df.columns:  # e.g. 'wind_onshore' -> 'wind_power'
    value_col = wind_alias
else:
    # If neither SERIES_NAME nor 'value' is found, try the original list of options
    potential_cols = [c for c in train_df.columns if c in ['solar', 'price', 'wind_offshore', 'wind_onshore', 'consumption', 'wind_power']]
    if potential_cols:
        value_col = potential_cols[0]
    else:
        # If no suitable column is found after all attempts, raise an error
        print(f"ERROR: Could not find a suitable value column for SERIES_NAME: '{SERIES_NAME}'.")
        print(f"       Expected columns (or generic 'value'): ['{SERIES_NAME}', 'value', 'solar', 'price', 'wind_offshore', 'wind_onshore', 'consumption', 'wind_power'].")
        print(f"       Actual columns in train_df: {train_df.columns.tolist()}")
        raise ValueError("No valid value column found in the DataFrame.")

feature_cols = [c for c in train_df.columns if c not in ['timestamp', value_col]]

print(f"📂 Data loaded for: {SERIES_NAME.upper()}")
print(f"   Train: {len(train_df)} | Val: {len(val_df)} | Test: {len(test_df)}")
print(f"   Value column: {value_col}")
print(f"   Features: {len(feature_cols)}")

📂 Data loaded for: WIND_ONSHORE
   Train: 21697 | Val: 2232 | Test: 2208
   Value column: wind_power
   Features: 27


## 🔧 Prepare Data

In [9]:
def create_sequences(data, target, seq_length):
    """Create sequences for RNN models"""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(target[i + seq_length])
    return np.array(X), np.array(y)

# Scale data
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train = scaler_X.fit_transform(train_df[feature_cols])
y_train = scaler_y.fit_transform(train_df[[value_col]])

X_val = scaler_X.transform(val_df[feature_cols])
y_val = scaler_y.transform(val_df[[value_col]])

X_test = scaler_X.transform(test_df[feature_cols])
y_test_orig = test_df[value_col].values

# Create sequences
seq_length = 24
X_train_seq, y_train_seq = create_sequences(X_train, y_train.flatten(), seq_length)
X_val_seq, y_val_seq = create_sequences(X_val, y_val.flatten(), seq_length)
X_test_seq, _ = create_sequences(X_test, np.zeros(len(X_test)), seq_length)
y_test_seq = y_test_orig[seq_length:]

print(f"✅ Data prepared:")
print(f"   X_train_seq: {X_train_seq.shape}")
print(f"   y_test_seq: {y_test_seq.shape}")

# Storage for results
all_results = []

✅ Data prepared:
   X_train_seq: (21673, 24, 27)
   y_test_seq: (2184,)
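
The shapes above follow from the windowing rule: a series of `n` rows yields `n - seq_length` windows, so 21697 training rows give 21673 sequences and 2208 test rows give 2184 targets. As a minimal sketch, the same windows can be built without a Python loop using NumPy's `sliding_window_view`. The helper name `make_windows` and the toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(features, target, seq_length):
    """Vectorized counterpart of the loop-based create_sequences above."""
    # Result shape: (n - seq_length + 1, n_features, seq_length)
    windows = sliding_window_view(features, window_shape=seq_length, axis=0)
    # Drop the last window (no target follows it) and move time to axis 1
    X = windows[:-1].transpose(0, 2, 1)
    # The target for window i is the value right after the window ends
    y = target[seq_length:]
    return X, y


# Toy data: 100 time steps, 3 features (hypothetical sizes)
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
target = rng.normal(size=100)

X, y = make_windows(features, target, seq_length=24)
print(X.shape, y.shape)  # (76, 24, 3) (76,)
```

Window `i` covers rows `i` to `i + seq_length - 1`, and its target is row `i + seq_length`, which is the same pairing the loop produces.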


---
# 🔵 BASIC MODELS
---

## 🧪 Model 1: LSTM

In [10]:
if RUN_BASIC:
    print("="*80)
    print("🧪 MODEL 1: LSTM")
    print("="*80)

    # Build
    model_lstm = keras.Sequential([
        layers.LSTM(64, activation='relu', return_sequences=False,
                   input_shape=(seq_length, len(feature_cols))),
        layers.Dropout(0.2),
        layers.Dense(32, activation='relu'),
        layers.Dense(1)
    ])

    model_lstm.compile(optimizer=keras.optimizers.Adam(0.001), loss='mse')

    # Train
    start = time.time()
    history = model_lstm.fit(
        X_train_seq, y_train_seq,
        validation_data=(X_val_seq, y_val_seq),
        epochs=100, batch_size=64,
        callbacks=[
            EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
            ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
        ],
        verbose=0
    )
    train_time = time.time() - start

    # Evaluate
    y_pred_scaled = model_lstm.predict(X_test_seq, verbose=0)
    y_pred = scaler_y.inverse_transform(y_pred_scaled).flatten()

    r2 = r2_score(y_test_seq, y_pred)
    rmse = np.sqrt(mean_squared_error(y_test_seq, y_pred))
    mae = mean_absolute_error(y_test_seq, y_pred)

    print(f"\n📊 LSTM RESULTS:")
    print(f"   R² = {r2:.4f}")
    print(f"   RMSE = {rmse:.2f}")
    print(f"   MAE = {mae:.2f}")
    print(f"   Time = {train_time:.1f}s")

    all_results.append({
        'Model': 'LSTM',
        'R²': r2,
        'RMSE': rmse,
        'MAE': mae,
        'Time (s)': train_time
    })

🧪 MODEL 1: LSTM

📊 LSTM RESULTS:
   R² = 0.9548
   RMSE = 397.74
   MAE = 290.85
   Time = 22.7s
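
For reference, the three metrics reported for every model can be written out in plain NumPy. This is a sketch of the standard definitions; `regression_metrics` is a hypothetical helper, and the notebook itself uses the scikit-learn functions. It also shows that R² turns negative whenever a forecast is worse than simply predicting the mean of the test targets.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R², RMSE and MAE using their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # variation around the mean
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(residual ** 2))),
        "mae": float(np.mean(np.abs(residual))),
    }


y_true = np.array([1.0, 2.0, 3.0, 4.0])
good = regression_metrics(y_true, y_true + 0.1)       # small constant error
bad = regression_metrics(y_true, np.full(4, 10.0))    # far-off constant forecast
print(good)
print(bad)
```

In the toy example the near-perfect forecast scores an R² close to 1, while the far-off constant forecast scores -45 because its squared error is 46 times the variation of the targets around their mean.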


## 🧪 Model 2: GRU (⭐ NEW!)

In [11]:
if RUN_BASIC:
    print("="*80)
    print("🧪 MODEL 2: GRU (Gated Recurrent Unit)")
    print("="*80)

    # Build GRU (fewer gates than LSTM, often faster to train)
    model_gru = keras.Sequential([
        layers.GRU(64, activation='relu', return_sequences=False,
                  input_shape=(seq_length, len(feature_cols))),
        layers.Dropout(0.2),
        layers.Dense(32, activation='relu'),
        layers.Dense(1)
    ])

    model_gru.compile(optimizer=keras.optimizers.Adam(0.001), loss='mse')

    # Train
    start = time.time()
    history = model_gru.fit(
        X_train_seq, y_train_seq,
        validation_data=(X_val_seq, y_val_seq),
        epochs=100, batch_size=64,
        callbacks=[
            EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
            ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
        ],
        verbose=0
    )
    train_time_gru = time.time() - start

    # Evaluate
    y_pred_gru_scaled = model_gru.predict(X_test_seq, verbose=0)
    y_pred_gru = scaler_y.inverse_transform(y_pred_gru_scaled).flatten()

    r2_gru = r2_score(y_test_seq, y_pred_gru)
    rmse_gru = np.sqrt(mean_squared_error(y_test_seq, y_pred_gru))
    mae_gru = mean_absolute_error(y_test_seq, y_pred_gru)

    print(f"\n📊 GRU RESULTS:")
    print(f"   R² = {r2_gru:.4f}")
    print(f"   RMSE = {rmse_gru:.2f}")
    print(f"   MAE = {mae_gru:.2f}")
    print(f"   Time = {train_time_gru:.1f}s")
    # Positive value: GRU trained faster than LSTM; negative value: GRU was slower
    print(f"\n💡 GRU vs LSTM: {((train_time - train_time_gru) / train_time * 100):.1f}% faster!")

    all_results.append({
        'Model': 'GRU',
        'R²': r2_gru,
        'RMSE': rmse_gru,
        'MAE': mae_gru,
        'Time (s)': train_time_gru
    })

🧪 MODEL 2: GRU (Gated Recurrent Unit)

📊 GRU RESULTS:
   R² = 0.9532
   RMSE = 405.06
   MAE = 312.30
   Time = 23.1s

💡 GRU vs LSTM: -1.9% faster!
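
The negative percentage above means that GRU trained slightly slower than LSTM in this run. A small helper that states the direction explicitly avoids messages such as "-1.9% faster". `describe_speedup` is a hypothetical name, and the timings below are the rounded values printed above, so the result differs slightly from the notebook's figure, which was computed from unrounded times.

```python
def describe_speedup(baseline_seconds, candidate_seconds):
    """Describe the candidate's training time relative to the baseline."""
    change = (baseline_seconds - candidate_seconds) / baseline_seconds * 100.0
    direction = "faster" if change >= 0 else "slower"
    return f"{abs(change):.1f}% {direction}"


# LSTM took 22.7 s and GRU took 23.1 s in the run shown above
message = describe_speedup(22.7, 23.1)
print(f"GRU vs LSTM: {message}")  # GRU vs LSTM: 1.8% slower
```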


## 🧪 Model 3: Bi-LSTM

In [12]:
if RUN_BASIC:
    print("="*80)
    print("🧪 MODEL 3: Bi-LSTM (Bidirectional)")
    print("="*80)

    # Build
    model_bilstm = keras.Sequential([
        layers.Bidirectional(layers.LSTM(64, activation='relu', return_sequences=True),
                           input_shape=(seq_length, len(feature_cols))),
        layers.Dropout(0.2),
        layers.Bidirectional(layers.LSTM(32, activation='relu')),
        layers.Dropout(0.2),
        layers.Dense(16, activation='relu'),
        layers.Dense(1)
    ])

    model_bilstm.compile(optimizer=keras.optimizers.Adam(0.001), loss='mse')

    # Train
    start = time.time()
    history = model_bilstm.fit(
        X_train_seq, y_train_seq,
        validation_data=(X_val_seq, y_val_seq),
        epochs=100, batch_size=64,
        callbacks=[
            EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
            ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)
        ],
        verbose=0
    )
    train_time_bilstm = time.time() - start

    # Evaluate
    y_pred_bilstm_scaled = model_bilstm.predict(X_test_seq, verbose=0)
    y_pred_bilstm = scaler_y.inverse_transform(y_pred_bilstm_scaled).flatten()

    r2_bilstm = r2_score(y_test_seq, y_pred_bilstm)
    rmse_bilstm = np.sqrt(mean_squared_error(y_test_seq, y_pred_bilstm))
    mae_bilstm = mean_absolute_error(y_test_seq, y_pred_bilstm)

    print(f"\n📊 BI-LSTM RESULTS:")
    print(f"   R² = {r2_bilstm:.4f}")
    print(f"   RMSE = {rmse_bilstm:.2f}")
    print(f"   MAE = {mae_bilstm:.2f}")
    print(f"   Time = {train_time_bilstm:.1f}s")

    all_results.append({
        'Model': 'Bi-LSTM',
        'R²': r2_bilstm,
        'RMSE': rmse_bilstm,
        'MAE': mae_bilstm,
        'Time (s)': train_time_bilstm
    })

🧪 MODEL 3: Bi-LSTM (Bidirectional)

📊 BI-LSTM RESULTS:
   R² = 0.9522
   RMSE = 409.37
   MAE = 311.78
   Time = 60.8s


---
# 🟢 GENERATIVE MODELS
---

## 🧪 Model 4: Autoencoder

In [13]:
if RUN_GENERATIVE:
    print("="*80)
    print("🧪 MODEL 4: Autoencoder")
    print("="*80)

    # Build Autoencoder
    encoding_dim = 32
    input_ae = layers.Input(shape=(seq_length, len(feature_cols)))
    encoded = layers.LSTM(64, activation='relu', return_sequences=True)(input_ae)
    encoded = layers.LSTM(encoding_dim, activation='relu')(encoded)

    decoded = layers.RepeatVector(seq_length)(encoded)
    decoded = layers.LSTM(64, activation='relu', return_sequences=True)(decoded)
    decoded = layers.TimeDistributed(layers.Dense(len(feature_cols)))(decoded)

    autoencoder = keras.Model(input_ae, decoded)
    autoencoder.compile(optimizer='adam', loss='mse')
    encoder = keras.Model(input_ae, encoded)

    # Train Autoencoder
    start = time.time()
    autoencoder.fit(
        X_train_seq, X_train_seq,
        validation_data=(X_val_seq, X_val_seq),
        epochs=50, batch_size=64,
        callbacks=[EarlyStopping(patience=10, restore_best_weights=True)],
        verbose=0
    )

    # Train Forecast Head
    encoded_train = encoder.predict(X_train_seq, verbose=0)
    encoded_val = encoder.predict(X_val_seq, verbose=0)
    encoded_test = encoder.predict(X_test_seq, verbose=0)

    forecast_head = keras.Sequential([
        layers.Dense(16, activation='relu', input_shape=(encoding_dim,)),
        layers.Dense(1)
    ])
    forecast_head.compile(optimizer='adam', loss='mse')
    forecast_head.fit(
        encoded_train, y_train_seq,
        validation_data=(encoded_val, y_val_seq),
        epochs=50, batch_size=64,
        callbacks=[EarlyStopping(patience=10, restore_best_weights=True)],
        verbose=0
    )
    train_time_ae = time.time() - start

    # Evaluate
    y_pred_ae_scaled = forecast_head.predict(encoded_test, verbose=0)
    y_pred_ae = scaler_y.inverse_transform(y_pred_ae_scaled).flatten()

    r2_ae = r2_score(y_test_seq, y_pred_ae)
    rmse_ae = np.sqrt(mean_squared_error(y_test_seq, y_pred_ae))
    mae_ae = mean_absolute_error(y_test_seq, y_pred_ae)

    print(f"\n📊 AUTOENCODER RESULTS:")
    print(f"   R² = {r2_ae:.4f}")
    print(f"   RMSE = {rmse_ae:.2f}")
    print(f"   MAE = {mae_ae:.2f}")
    print(f"   Time = {train_time_ae:.1f}s")

    all_results.append({
        'Model': 'Autoencoder',
        'R²': r2_ae,
        'RMSE': rmse_ae,
        'MAE': mae_ae,
        'Time (s)': train_time_ae
    })

🧪 MODEL 4: Autoencoder

📊 AUTOENCODER RESULTS:
   R² = 0.8782
   RMSE = 653.26
   MAE = 501.30
   Time = 187.2s


## 🧪 Model 5: VAE

In [14]:
if RUN_GENERATIVE:
    print("="*80)
    print("🧪 MODEL 5: VAE (Variational Autoencoder)")
    print("="*80)

    # Build VAE parts as separate functional models
    latent_dim = 32

    # Encoder Model
    input_vae_enc = layers.Input(shape=(seq_length, len(feature_cols)))
    x_enc = layers.LSTM(64, activation='relu', return_sequences=True)(input_vae_enc)
    x_enc = layers.LSTM(64, activation='relu')(x_enc)

    z_mean = layers.Dense(latent_dim)(x_enc)
    z_log_var = layers.Dense(latent_dim)(x_enc)

    def sampling(args):
        z_mean, z_log_var = args
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

    z = layers.Lambda(sampling)([z_mean, z_log_var])

    encoder_model = keras.Model(input_vae_enc, [z_mean, z_log_var, z], name="encoder")

    # Decoder Model
    decoder_input = layers.Input(shape=(latent_dim,))
    x_decoded = layers.RepeatVector(seq_length)(decoder_input)
    x_decoded = layers.LSTM(64, activation='relu', return_sequences=True)(x_decoded)
    x_decoded = layers.TimeDistributed(layers.Dense(len(feature_cols)))(x_decoded)
    decoder_model = keras.Model(decoder_input, x_decoded, name="decoder")

    # Custom VAE Model subclass
    class CustomVAE(keras.Model):
        def __init__(self, encoder, decoder, **kwargs):
            super().__init__(**kwargs)
            self.encoder = encoder
            self.decoder = decoder
            self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
            self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
            self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

        def call(self, inputs):
            _, _, z = self.encoder(inputs)
            reconstruction = self.decoder(z)
            return reconstruction

        def train_step(self, data):
            # Unpack the data. We use X_train_seq as both input and target for reconstruction.
            if isinstance(data, tuple):
                inputs, _ = data
            else:
                inputs = data

            with tf.GradientTape() as tape:
                z_mean, z_log_var, z = self.encoder(inputs)
                reconstruction = self.decoder(z)

                # Calculate reconstruction loss
                reconstruction_loss = keras.ops.mean(keras.ops.sum(keras.losses.mse(inputs, reconstruction), axis=-1))

                # Calculate KL divergence loss
                kl_loss = -0.5 * keras.ops.mean(1 + z_log_var - keras.ops.square(z_mean) - keras.ops.exp(z_log_var))

                total_loss = reconstruction_loss + kl_loss

            grads = tape.gradient(total_loss, self.trainable_weights)
            self.optimizer.apply_gradients(zip(grads, self.trainable_weights))

            self.total_loss_tracker.update_state(total_loss)
            self.reconstruction_loss_tracker.update_state(reconstruction_loss)
            self.kl_loss_tracker.update_state(kl_loss)

            return {
                "loss": self.total_loss_tracker.result(),
                "reconstruction_loss": self.reconstruction_loss_tracker.result(),
                "kl_loss": self.kl_loss_tracker.result(),
            }

        @property
        def metrics(self):
            return [
                self.total_loss_tracker,
                self.reconstruction_loss_tracker,
                self.kl_loss_tracker,
            ]

    # Instantiate the custom VAE model
    vae = CustomVAE(encoder_model, decoder_model)
    # Provide a dummy loss function to satisfy the compile() check.
    # The actual loss computation is handled within the custom train_step.
    vae.compile(optimizer='adam', loss=keras.losses.MeanSquaredError())

    # The encoder used for generating latent space representations for the forecast head
    prediction_encoder_vae = keras.Model(input_vae_enc, z_mean, name="prediction_encoder_vae")

    # Train VAE
    start = time.time()
    vae.fit(
        X_train_seq, X_train_seq, # VAE trains on its inputs
        validation_data=(X_val_seq, X_val_seq),
        epochs=50, batch_size=64,
        callbacks=[EarlyStopping(patience=10, restore_best_weights=True)],
        verbose=0
    )

    # Train Forecast Head
    encoded_vae_train = prediction_encoder_vae.predict(X_train_seq, verbose=0)
    encoded_vae_val = prediction_encoder_vae.predict(X_val_seq, verbose=0)
    encoded_vae_test = prediction_encoder_vae.predict(X_test_seq, verbose=0)

    forecast_head_vae = keras.Sequential([
        layers.Dense(16, activation='relu', input_shape=(latent_dim,)),
        layers.Dense(1)
    ])
    forecast_head_vae.compile(optimizer='adam', loss='mse')
    forecast_head_vae.fit(
        encoded_vae_train, y_train_seq,
        validation_data=(encoded_vae_val, y_val_seq),
        epochs=50, batch_size=64,
        callbacks=[EarlyStopping(patience=10, restore_best_weights=True)],
        verbose=0
    )
    train_time_vae = time.time() - start

    # Evaluate
    y_pred_vae_scaled = forecast_head_vae.predict(encoded_vae_test, verbose=0)
    y_pred_vae = scaler_y.inverse_transform(y_pred_vae_scaled).flatten()

    r2_vae = r2_score(y_test_seq, y_pred_vae)
    rmse_vae = np.sqrt(mean_squared_error(y_test_seq, y_pred_vae))
    mae_vae = mean_absolute_error(y_test_seq, y_pred_vae)

    print(f"\n📊 VAE RESULTS:")
    print(f"   R² = {r2_vae:.4f}")
    print(f"   RMSE = {rmse_vae:.2f}")
    print(f"   MAE = {mae_vae:.2f}")
    print(f"   Time = {train_time_vae:.1f}s")

    all_results.append({
        'Model': 'VAE',
        'R²': r2_vae,
        'RMSE': rmse_vae,
        'MAE': mae_vae,
        'Time (s)': train_time_vae
    })

🧪 MODEL 5: VAE (Variational Autoencoder)

📊 VAE RESULTS:
   R² = 0.8578
   RMSE = 705.88
   MAE = 550.90
   Time = 195.8s
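
The two VAE-specific pieces of the cell above, the `sampling` function and the KL term, can be checked in isolation with NumPy. The sketch below uses hypothetical helper names and sums the KL term over the latent dimensions, which is the common textbook form; the notebook's `train_step` averages over all elements instead, which scales the term by `1 / latent_dim`.

```python
import numpy as np


def sample_latent(z_mean, z_log_var, rng):
    """Reparameterization trick: z = mean + std * eps with eps ~ N(0, 1)."""
    eps = rng.standard_normal(z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * eps


def kl_to_standard_normal(z_mean, z_log_var):
    """KL(N(mean, exp(log_var)) || N(0, 1)), summed over latent dims, averaged over the batch."""
    per_dim = -0.5 * (1.0 + z_log_var - z_mean ** 2 - np.exp(z_log_var))
    return float(per_dim.sum(axis=-1).mean())


rng = np.random.default_rng(0)
z_mean = np.ones((4, 2))       # batch of 4, latent dimension 2 (toy sizes)
z_log_var = np.zeros((4, 2))   # unit variance

z = sample_latent(z_mean, z_log_var, rng)
kl = kl_to_standard_normal(z_mean, z_log_var)
print(z.shape, kl)  # (4, 2) 1.0
```

With unit variance and a mean of 1 in each of the two latent dimensions, every dimension contributes 0.5, so the KL term is 1.0; it is zero only when the encoder outputs exactly the standard normal prior.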


---
# 🔴 ADVANCED MODELS (Darts Framework)
---

In [15]:
# Prepare Darts TimeSeries (needed for N-BEATS, N-HiTS, DeepAR, TFT)
if RUN_ADVANCED or RUN_PROBABILISTIC or RUN_TFT:
    from darts import TimeSeries
    from darts.dataprocessing.transformers import Scaler as DartsScaler

    ts_train = TimeSeries.from_values(train_df[value_col].values)
    ts_val = TimeSeries.from_values(val_df[value_col].values)
    ts_test = TimeSeries.from_values(test_df[value_col].values)

    scaler_darts = DartsScaler()
    ts_train_scaled = scaler_darts.fit_transform(ts_train)
    ts_val_scaled = scaler_darts.transform(ts_val)

    print("✅ Darts TimeSeries prepared")

✅ Darts TimeSeries prepared
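
The Darts `Scaler` used above wraps a scikit-learn min-max scaler by default, and the cell fits it on the training series only before applying it to the validation series. The sketch below reproduces that idea in plain NumPy with a hypothetical class `TrainFittedMinMax` and toy numbers; it is an illustration of the principle, not the Darts implementation.

```python
import numpy as np


class TrainFittedMinMax:
    """Min-max scaler whose range is learned from the training split only."""

    def fit(self, values):
        self.low = float(np.min(values))
        self.high = float(np.max(values))
        return self

    def transform(self, values):
        return (np.asarray(values, dtype=float) - self.low) / (self.high - self.low)

    def inverse_transform(self, scaled):
        return np.asarray(scaled, dtype=float) * (self.high - self.low) + self.low


train_values = np.array([0.0, 2.5, 10.0])  # toy numbers, not the notebook's data
val_values = np.array([5.0, 20.0])         # 20.0 lies above the training maximum

scaler = TrainFittedMinMax().fit(train_values)
val_scaled = scaler.transform(val_values)
print(val_scaled)  # 0.5 and 2.0: the second value falls outside [0, 1]
```

Values outside the training range map outside [0, 1]. That is expected when the scaler is fitted on training data only, and it keeps information about later splits out of the preprocessing step.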


## 🧪 Model 6: N-BEATS (⭐ NEW!)

In [16]:
if RUN_ADVANCED:
    from darts.models import NBEATSModel

    print("="*80)
    print("🧪 MODEL 6: N-BEATS (Neural Basis Expansion)")
    print("="*80)

    model_nbeats = NBEATSModel(
        input_chunk_length=24,
        output_chunk_length=1,
        n_epochs=100,
        batch_size=64,
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": 1,
            "enable_progress_bar": False
        },
        force_reset=True,
        save_checkpoints=False
    )

    start = time.time()
    model_nbeats.fit(series=ts_train_scaled, val_series=ts_val_scaled, verbose=False)
    train_time_nbeats = time.time() - start

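    # Predict
    # NOTE: this forecasts all n_pred steps recursively from the end of the
    # training series (output_chunk_length=1), without seeing any validation
    # or test values; the result is compared against the test set below.
    n_pred = len(ts_test)
    pred_nbeats_scaled = model_nbeats.predict(n=n_pred, series=ts_train_scaled)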
    pred_nbeats = scaler_darts.inverse_transform(pred_nbeats_scaled)

    y_pred_nbeats = pred_nbeats.values().flatten()
    y_test_nbeats = ts_test.values().flatten()
    min_len = min(len(y_pred_nbeats), len(y_test_nbeats))
    y_pred_nbeats = y_pred_nbeats[:min_len]
    y_test_nbeats = y_test_nbeats[:min_len]

    r2_nbeats = r2_score(y_test_nbeats, y_pred_nbeats)
    rmse_nbeats = np.sqrt(mean_squared_error(y_test_nbeats, y_pred_nbeats))
    mae_nbeats = mean_absolute_error(y_test_nbeats, y_pred_nbeats)

    print(f"\n📊 N-BEATS RESULTS:")
    print(f"   R² = {r2_nbeats:.4f}")
    print(f"   RMSE = {rmse_nbeats:.2f}")
    print(f"   MAE = {mae_nbeats:.2f}")
    print(f"   Time = {train_time_nbeats:.1f}s")

    all_results.append({
        'Model': 'N-BEATS',
        'R²': r2_nbeats,
        'RMSE': rmse_nbeats,
        'MAE': mae_nbeats,
        'Time (s)': train_time_nbeats
    })

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.utilities.rank_zero:You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


🧪 MODEL 6: N-BEATS (Neural Basis Expansion)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=100` reached.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



📊 N-BEATS RESULTS:
   R² = -4.6288
   RMSE = 4449.91
   MAE = 4025.21
   Time = 1960.6s
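
A negative R² here is not directly comparable with the Keras results above. The Keras models predict one step ahead from 24 observed feature rows, whereas the Darts call forecasts `len(ts_test)` steps recursively from the end of the training series. Assuming the three CSV files are consecutive in time, that forecast also starts before the validation period rather than at the test period. The sketch below uses a synthetic AR(1) series and a least-squares one-step model (both assumptions, unrelated to the notebook's data) to illustrate how much a recursive multi-step forecast can differ from a rolling one-step evaluation of the same model.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic AR(1) series standing in for a scaled target (not the notebook's data)
n_total, n_test, phi_true = 1500, 500, 0.95
series = np.zeros(n_total)
for t in range(1, n_total):
    series[t] = phi_true * series[t - 1] + rng.normal()

train, test = series[:-n_test], series[-n_test:]

# Fit a one-step model x[t] ~ phi * x[t-1] by least squares on the training part
phi = np.dot(train[:-1], train[1:]) / np.dot(train[:-1], train[:-1])

# (a) Recursive forecast: every prediction is fed back in as the next input
recursive = np.empty(n_test)
last = train[-1]
for h in range(n_test):
    last = phi * last
    recursive[h] = last

# (b) Rolling one-step forecast: always condition on the true previous value
previous_actuals = np.concatenate(([train[-1]], test[:-1]))
rolling = phi * previous_actuals


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rmse_recursive = rmse(test, recursive)
rmse_rolling = rmse(test, rolling)
print(f"recursive RMSE: {rmse_recursive:.2f} | rolling one-step RMSE: {rmse_rolling:.2f}")
```

The recursive forecast decays toward the series mean and carries no information about the test period, while the rolling forecast stays close to the noise level. For a like-for-like comparison with the Keras models, a rolling evaluation (Darts offers `historical_forecasts` for this purpose) would be the closer match.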


## 🧪 Model 7: N-HiTS (⭐ NEW!)

In [17]:
if RUN_ADVANCED:
    from darts.models import NHiTSModel

    print("="*80)
    print("🧪 MODEL 7: N-HiTS (Hierarchical Interpolation)")
    print("="*80)

    model_nhits = NHiTSModel(
        input_chunk_length=24,
        output_chunk_length=1,
        n_epochs=100,
        batch_size=64,
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": 1,
            "enable_progress_bar": False
        },
        force_reset=True,
        save_checkpoints=False
    )

    start = time.time()
    model_nhits.fit(series=ts_train_scaled, val_series=ts_val_scaled, verbose=False)
    train_time_nhits = time.time() - start

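    # Predict
    # NOTE: as for N-BEATS, this is a recursive n_pred-step forecast from the
    # end of the training series; one-step errors can compound over the horizon.
    pred_nhits_scaled = model_nhits.predict(n=n_pred, series=ts_train_scaled)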
    pred_nhits = scaler_darts.inverse_transform(pred_nhits_scaled)

    y_pred_nhits = pred_nhits.values().flatten()
    y_test_nhits = ts_test.values().flatten()
    min_len = min(len(y_pred_nhits), len(y_test_nhits))
    y_pred_nhits = y_pred_nhits[:min_len]
    y_test_nhits = y_test_nhits[:min_len]

    r2_nhits = r2_score(y_test_nhits, y_pred_nhits)
    rmse_nhits = np.sqrt(mean_squared_error(y_test_nhits, y_pred_nhits))
    mae_nhits = mean_absolute_error(y_test_nhits, y_pred_nhits)

    print(f"\n📊 N-HiTS RESULTS:")
    print(f"   R² = {r2_nhits:.4f}")
    print(f"   RMSE = {rmse_nhits:.2f}")
    print(f"   MAE = {mae_nhits:.2f}")
    print(f"   Time = {train_time_nhits:.1f}s")

    all_results.append({
        'Model': 'N-HiTS',
        'R²': r2_nhits,
        'RMSE': rmse_nhits,
        'MAE': mae_nhits,
        'Time (s)': train_time_nhits
    })

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


🧪 MODEL 7: N-HiTS (Hierarchical Interpolation)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=100` reached.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



📊 N-HiTS RESULTS:
   R² = -1020362908632544335979385413523074054721873770507200578521543108568222094789179351122159208850598667152323357247732863162432485555072804285410359907129289700071309282501492132709475073834017183592611840.0000
   RMSE = 59912894252966492245340209921589062559646374018397585260939069088464186146647402062028191637713190912000.00
   MAE = 5512239490579539786223312351368323695143320159634953432993759504505395117323920727764990542383260106752.00
   Time = 259.7s
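
Metrics of this magnitude indicate that the forecast diverged rather than that the model is merely inaccurate. One plausible reading, not confirmed by the notebook, is that a one-step model with an effective gain above 1 was iterated over the whole horizon. A simple plausibility check before computing metrics can catch this case; `forecast_is_plausible`, the `slack` parameter, and the toy numbers below are assumptions for illustration.

```python
import numpy as np


def forecast_is_plausible(y_pred, train_min, train_max, slack=1.0):
    """True if all predictions are finite and lie within the training range,
    widened on both sides by `slack` times the width of that range."""
    y_pred = np.asarray(y_pred, dtype=float)
    if not np.all(np.isfinite(y_pred)):
        return False
    width = train_max - train_min
    lower = train_min - slack * width
    upper = train_max + slack * width
    return bool(np.all((y_pred >= lower) & (y_pred <= upper)))


# A one-step map with gain slightly above 1, iterated over 2208 steps, runs away
gain, steps = 1.12, 2208
runaway = 100.0 * gain ** np.arange(1, steps + 1)

print(forecast_is_plausible([500.0, 9000.0], train_min=0.0, train_max=10000.0))  # True
print(forecast_is_plausible(runaway, train_min=0.0, train_max=10000.0))          # False
```

A check like this could gate the `all_results.append(...)` call so that a diverged run is flagged instead of being ranked alongside the other models.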


## 🧪 Model 8: DeepAR (⭐ NEW! - Probabilistic)

In [18]:
if RUN_PROBABILISTIC:
    from darts.models import RNNModel
    from darts.utils.likelihood_models import GaussianLikelihood # Import GaussianLikelihood

    print("="*80)
    print("🧪 MODEL 8: DeepAR (Probabilistic Forecasting)")
    print("="*80)

    # DeepAR via Darts' RNNModel with probabilistic output
    model_deepar = RNNModel(
        model='LSTM',
        input_chunk_length=24,
        training_length=48,
        n_epochs=100,
        batch_size=64,
        hidden_dim=64,
        n_rnn_layers=2,
        dropout=0.2,
        likelihood=GaussianLikelihood(),  # Pass an instance of GaussianLikelihood
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": 1,
            "enable_progress_bar": False
        },
        force_reset=True,
        save_checkpoints=False
    )

    start = time.time()
    model_deepar.fit(series=ts_train_scaled, val_series=ts_val_scaled, verbose=False)
    train_time_deepar = time.time() - start

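    # Predict: draw 100 sample paths per time step
    # NOTE: .values() below returns only the first sample path, not the median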
    pred_deepar_scaled = model_deepar.predict(n=n_pred, series=ts_train_scaled, num_samples=100)
    pred_deepar = scaler_darts.inverse_transform(pred_deepar_scaled)

    y_pred_deepar = pred_deepar.values().flatten()
    y_test_deepar = ts_test.values().flatten()
    min_len = min(len(y_pred_deepar), len(y_test_deepar))
    y_pred_deepar = y_pred_deepar[:min_len]
    y_test_deepar = y_test_deepar[:min_len]

    r2_deepar = r2_score(y_test_deepar, y_pred_deepar)
    rmse_deepar = np.sqrt(mean_squared_error(y_test_deepar, y_pred_deepar))
    mae_deepar = mean_absolute_error(y_test_deepar, y_pred_deepar)

    print(f"\n📊 DEEPAR RESULTS:")
    print(f"   R² = {r2_deepar:.4f}")
    print(f"   RMSE = {rmse_deepar:.2f}")
    print(f"   MAE = {mae_deepar:.2f}")
    print(f"   Time = {train_time_deepar:.1f}s")
    print(f"\n💡 DeepAR produces probabilistic forecasts (confidence intervals possible!)")

    all_results.append({
        'Model': 'DeepAR',
        'R²': r2_deepar,
        'RMSE': rmse_deepar,
        'MAE': mae_deepar,
        'Time (s)': train_time_deepar
    })

INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


🧪 MODEL 8: DeepAR (Probabilistic Forecasting)


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=100` reached.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



📊 DEEPAR RESULTS:
   R² = -1.0304
   RMSE = 2672.60
   MAE = 2167.69
   Time = 284.8s

💡 DeepAR provides probabilistic forecasts (confidence intervals possible!)
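
The note about intervals can be made concrete with a small sketch. It assumes the forecast samples are available as an array of shape `(n_timesteps, n_components, n_samples)`, which is the layout darts documents for `TimeSeries.all_values()`. The array below is synthetic (random numbers standing in for `pred_deepar.all_values()`), and the 5%/95% band is an arbitrary illustrative choice, not something the notebook computes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for pred_deepar.all_values() (assumption, not real model output):
# 24 forecast steps, 1 component, 100 Monte Carlo samples
n_timesteps, n_samples = 24, 100
samples = rng.normal(loc=1000.0, scale=50.0, size=(n_timesteps, 1, n_samples))

# Point forecast: median across the sample axis
median = np.median(samples, axis=2).flatten()

# 90% interval: 5th and 95th percentile across the sample axis
lower, upper = np.quantile(samples, [0.05, 0.95], axis=2)
lower, upper = lower.flatten(), upper.flatten()

for step in range(3):
    print(f"t+{step + 1}: {median[step]:.1f}  [{lower[step]:.1f}, {upper[step]:.1f}]")
```

Scoring the median rather than a single sample path usually gives a steadier point forecast, while the band width shows how uncertain the model is at each step.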


## 🧪 Model 9: TFT (⭐ NEW! - State-of-the-Art)

In [19]:
if RUN_TFT:
    from darts.models import TFTModel

    print("="*80)
    print("üß™ MODEL 9: TFT (Temporal Fusion Transformer)")
    print("‚ö†Ô∏è WARNING: This can take 30-45 minutes!")
    print("="*80)

    model_tft = TFTModel(
        input_chunk_length=24,
        output_chunk_length=1,
        hidden_size=64,
        lstm_layers=2,
        num_attention_heads=4,
        dropout=0.1,
        batch_size=64,
        n_epochs=100,
        # TFT needs future inputs; since fit() gets no future covariates,
        # let darts generate a relative time index instead
        add_relative_index=True,
        pl_trainer_kwargs={
            "accelerator": "gpu",
            "devices": 1,
            "enable_progress_bar": True
        },
        force_reset=True,
        save_checkpoints=False
    )

    start = time.time()
    model_tft.fit(series=ts_train_scaled, val_series=ts_val_scaled, verbose=False)
    train_time_tft = time.time() - start

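    # Predict: TFTModel defaults to a quantile-regression likelihood, so draw
    # several samples and use their median as the point forecast
    pred_tft_scaled = model_tft.predict(n=n_pred, series=ts_train_scaled, num_samples=100)
    pred_tft = scaler_darts.inverse_transform(pred_tft_scaled)

    y_pred_tft = np.median(pred_tft.all_values(), axis=2).flatten()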
    y_test_tft = ts_test.values().flatten()
    min_len = min(len(y_pred_tft), len(y_test_tft))
    y_pred_tft = y_pred_tft[:min_len]
    y_test_tft = y_test_tft[:min_len]

    r2_tft = r2_score(y_test_tft, y_pred_tft)
    rmse_tft = np.sqrt(mean_squared_error(y_test_tft, y_pred_tft))
    mae_tft = mean_absolute_error(y_test_tft, y_pred_tft)

    print(f"\nüìä TFT RESULTS:")
    print(f"   R¬≤ = {r2_tft:.4f}")
    print(f"   RMSE = {rmse_tft:.2f}")
    print(f"   MAE = {mae_tft:.2f}")
    print(f"   Time = {train_time_tft:.1f}s ({train_time_tft/60:.1f} min)")
    print(f"\nüèÜ TFT: State-of-the-Art Transformer-basiert!")

    all_results.append({
        'Model': 'TFT',
        'R¬≤': r2_tft,
        'RMSE': rmse_tft,
        'MAE': mae_tft,
        'Time (s)': train_time_tft
    })
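
Every model cell scores its forecast with `r2_score`, and several models in this notebook end up with negative values. As a reminder of what that means, here is a minimal pure-Python sketch of the definition R² = 1 - SS_res / SS_tot. The helper `r2` and the toy numbers are illustrative assumptions, not data from the notebook.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]

good = r2(y_true, [1.1, 1.9, 3.2, 3.8])      # close to the truth, about 0.98
baseline = r2(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicts the mean, 0.0
bad = r2(y_true, [4.0, 3.0, 2.0, 1.0])       # reversed trend, -3.0

print(f"good={good:.2f}  baseline={baseline:.2f}  bad={bad:.2f}")
```

A negative R² therefore says the forecast is worse than a constant prediction at the mean of the test data. Typical causes are a diverged model or a forecast that is misaligned with the test window.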

---
# 📊 FINAL SUMMARY
---

In [20]:
# Create summary DataFrame
if all_results:
    results_df = pd.DataFrame(all_results)
    results_df = results_df.sort_values('R²', ascending=False).reset_index(drop=True)

    print("\n" + "="*100)
    print(f"🏆 FINAL RESULTS: {SERIES_NAME.upper()}")
    print("="*100)
    print(results_df.to_string(index=False))

    print("\n" + "="*100)
    print("🥇 BEST MODEL:")
    print("="*100)
    best = results_df.iloc[0]
    print(f"   Model: {best['Model']}")
    print(f"   R² = {best['R²']:.4f}")
    print(f"   RMSE = {best['RMSE']:.2f}")
    print(f"   MAE = {best['MAE']:.2f}")
    print(f"   Training Time = {best['Time (s)']:.1f}s ({best['Time (s)']/60:.1f} min)")

    # Save results
    output_file = f'results/metrics/deep_learning_extended_{SERIES_NAME}.csv'
    results_df.to_csv(output_file, index=False)
    print(f"\n💾 Results saved: {output_file}")

    # Performance insights
    print("\n" + "="*100)
    print("💡 KEY INSIGHTS:")
    print("="*100)

    if 'GRU' in results_df['Model'].values and 'LSTM' in results_df['Model'].values:
        gru_r2 = results_df[results_df['Model'] == 'GRU']['R²'].values[0]
        lstm_r2 = results_df[results_df['Model'] == 'LSTM']['R²'].values[0]
        print(f"   📌 GRU vs LSTM: R² {gru_r2:.4f} vs {lstm_r2:.4f}")
        if gru_r2 > lstm_r2:
            print(f"      → GRU is {((gru_r2 - lstm_r2) / lstm_r2 * 100):.2f}% better!")

    print(f"\n   📌 Mean R²: {results_df['R²'].mean():.4f}")
    print(f"   📌 Fastest model: {results_df.loc[results_df['Time (s)'].idxmin(), 'Model']} ({results_df['Time (s)'].min():.1f}s)")
    print(f"   📌 Slowest model: {results_df.loc[results_df['Time (s)'].idxmax(), 'Model']} ({results_df['Time (s)'].max():.1f}s)")

    negative_r2 = results_df[results_df['R²'] < 0]
    if len(negative_r2) > 0:
        print(f"\n   ⚠️ Models with negative R² (worse than predicting the mean; likely misconfigured or diverged):")
        for _, row in negative_r2.iterrows():
            print(f"      - {row['Model']}: R² = {row['R²']:.4f}")

    print("\n" + "="*100)
    print("✅ EXPERIMENT COMPLETE!")
    print("="*100)
else:
    print("\n⚠️ No models were run. Please enable at least one model category!")


🏆 FINAL RESULTS: WIND_ONSHORE
      Model             R²          RMSE           MAE    Time (s)
       LSTM   9.548398e-01  3.977396e+02  2.908488e+02   22.701661
        GRU   9.531621e-01  4.050603e+02  3.123022e+02   23.136558
    Bi-LSTM   9.521592e-01  4.093737e+02  3.117848e+02   60.816888
Autoencoder   8.781784e-01  6.532553e+02  5.012983e+02  187.202903
        VAE   8.577591e-01  7.058834e+02  5.508991e+02  195.807557
     DeepAR  -1.030408e+00  2.672604e+03  2.167689e+03  284.756013
    N-BEATS  -4.628800e+00  4.449907e+03  4.025213e+03 1960.581286
     N-HiTS -1.020363e+201 5.991289e+103 5.512239e+102  259.688278

🥇 BEST MODEL:
   Model: LSTM
   R² = 0.9548
   RMSE = 397.74
   MAE = 290.85
   Training Time = 22.7s (0.4 min)

💾 Results saved: results/metrics/deep_learning_extended_wind_onshore.csv

💡 KEY INSIGHTS:
   📌 GRU vs LSTM: R² 0.9532 vs 0.9548

   📌 Mean R²: -1275453635790680419974231766903842568402342213134000723151928885
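
The mean R² above is dominated by the single diverged N-HiTS run, so it says little about the other seven models. A minimal sketch of a more robust summary follows, using the R² values from the table. The cutoff `DIVERGED_BELOW` is a hypothetical threshold chosen only for illustration.

```python
import math
import statistics

# R² values copied from the results table of this run
r2_by_model = {
    "LSTM": 9.548398e-01,
    "GRU": 9.531621e-01,
    "Bi-LSTM": 9.521592e-01,
    "Autoencoder": 8.781784e-01,
    "VAE": 8.577591e-01,
    "DeepAR": -1.030408e+00,
    "N-BEATS": -4.628800e+00,
    "N-HiTS": -1.020363e+201,
}

values = list(r2_by_model.values())
mean_r2 = statistics.fmean(values)     # pulled to about -1.3e200 by one outlier
median_r2 = statistics.median(values)  # unaffected by the size of the outlier

# Hypothetical cutoff: treat anything below it (or non-finite) as a diverged run
DIVERGED_BELOW = -10.0
diverged = [
    name for name, value in r2_by_model.items()
    if not math.isfinite(value) or value < DIVERGED_BELOW
]
usable = [v for name, v in r2_by_model.items() if name not in diverged]

print(f"Mean R²:   {mean_r2:.3e}")
print(f"Median R²: {median_r2:.4f}")
print(f"Diverged:  {diverged}")
print(f"Mean R² without diverged runs: {statistics.fmean(usable):.4f}")
```

Reporting the median, or listing diverged runs separately, keeps one numerically unstable model from hiding how the rest performed.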

## 📝 Recommendations

### For production:
1. **Highest accuracy**: Choose the model with the best R² (LSTM in this run, R² = 0.9548)
2. **Balance of speed and accuracy**: GRU or Bi-LSTM
3. **Uncertainty estimation**: DeepAR or VAE
4. **State-of-the-Art**: TFT (if compute time is not an issue)

### For further experiments:
- **TimeGAN**: Set `RUN_GAN = True` (highly experimental)
- **Hyperparameter tuning**: Further optimize the best 2-3 models
- **Ensemble**: Combine several top models
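
One simple way to try the ensemble idea is an inverse-RMSE weighted average of the three best models. This is only a sketch: the RMSE values are taken from the results table, while the forecast arrays in `preds` are made-up placeholders standing in for each model's predictions on the same time steps.

```python
import numpy as np

# RMSE per model, rounded from the results table of this run
rmse = {"LSTM": 397.74, "GRU": 405.06, "Bi-LSTM": 409.37}

# Placeholder forecasts (assumption): one array per model, same time steps
preds = {
    "LSTM": np.array([1000.0, 1100.0, 1200.0]),
    "GRU": np.array([1010.0, 1090.0, 1180.0]),
    "Bi-LSTM": np.array([990.0, 1120.0, 1210.0]),
}

# Lower RMSE gets a higher weight; weights are normalized to sum to 1
inverse = {name: 1.0 / value for name, value in rmse.items()}
total = sum(inverse.values())
weights = {name: w / total for name, w in inverse.items()}

# Weighted average of the individual forecasts
ensemble = sum(weights[name] * preds[name] for name in preds)

print({name: round(w, 4) for name, w in weights.items()})
print(ensemble)
```

In practice the weights should come from validation-set errors rather than test-set errors, otherwise the ensemble's test score is optimistically biased.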