# 03 - Deep Learning Models

This notebook demonstrates how to build and train deep learning models for network anomaly detection.

## Overview
- Convert data to sequences for time-series models
- Build and train Keras models (LSTM, CNN, Autoencoder)
- Visualize training history and model performance
- Compare different deep learning architectures

## Models
We'll explore LSTM, 1D-CNN, and Autoencoder architectures for anomaly detection.


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import sys
import warnings
warnings.filterwarnings('ignore')

# scikit-learn is imported outside the try block: the data split below needs it
# even when TensorFlow is missing
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

# Deep Learning imports
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers, models, callbacks
    TENSORFLOW_AVAILABLE = True
    print("✅ TensorFlow imported successfully!")
except ImportError as e:
    TENSORFLOW_AVAILABLE = False
    print(f"❌ TensorFlow not available: {e}")
    print("Install with: pip install tensorflow")

# Set plotting style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Set random seeds for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
if TENSORFLOW_AVAILABLE:
    tf.random.set_seed(RANDOM_SEED)


## 1. Load and Prepare Data

Let's load the processed data and prepare it for deep learning models.


In [None]:
# Load processed data
data_path = Path("../data/processed/processed.csv")

# Fail early: every later cell depends on df
if not data_path.exists():
    raise FileNotFoundError(
        "Processed data not found. Run preprocessing first:\n"
        "python ../src/preprocess.py --input ../data/raw/sample.csv --output ../data/processed/processed.csv"
    )

df = pd.read_csv(data_path)
print("✅ Data loaded successfully!")
print(f"Dataset shape: {df.shape}")


In [None]:
# Separate features and target
X = df.drop(columns=['label']).values
y = df['label'].values

# Convert to binary classification (0: BENIGN, 1: anomaly).
# This assumes preprocessing encoded BENIGN as the integer 0.
y_binary = (y != 0).astype(int)

print(f"Features shape: {X.shape}")
print(f"Labels shape: {y_binary.shape}")
print(f"Binary label distribution: {np.bincount(y_binary)}")
print(f"Label distribution (percentages): {np.bincount(y_binary) / len(y_binary) * 100}")


## 2. Sequence Preparation

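For sequence models such as the LSTM and 1D-CNN, we slide a fixed-length window over consecutive rows and label each window with the label of its last row. This assumes the rows of the processed data are in chronological order.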
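
The windowing step can also be written without a Python loop. The sketch below is one possible way to do it with NumPy's `sliding_window_view`; the helper name `make_windows` and the toy arrays are illustrative and not part of this notebook's pipeline.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def make_windows(X, y, sequence_length):
    """Hypothetical helper: build overlapping windows and last-row labels."""
    # sliding_window_view appends the window axis last:
    # (n_windows, n_features, sequence_length)
    windows = sliding_window_view(X, window_shape=sequence_length, axis=0)
    # Reorder to the (samples, timesteps, features) layout used by sequence layers
    windows = windows.transpose(0, 2, 1)
    # Each window takes the label of its last row
    labels = y[sequence_length - 1:]
    return windows, labels

# Toy data: 6 rows, 2 features
X_toy = np.arange(12).reshape(6, 2)
y_toy = np.array([0, 0, 1, 0, 1, 1])
X_win, y_win = make_windows(X_toy, y_toy, sequence_length=3)
print(X_win.shape, y_win.shape)
```

The result is a read-only view on the original array, so it avoids copying every row `sequence_length` times; wrap it in `np.array(...)` if a writable copy is needed.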


In [None]:
def create_sequences(X, y, sequence_length=10):
    """
    Convert features into sequences for time-series models.
    
    Args:
        X: Input features
        y: Target labels
        sequence_length: Length of each sequence
        
    Returns:
        sequences: Array of sequences
        labels: Array of corresponding labels
    """
    sequences = []
    labels = []
    
    for i in range(len(X) - sequence_length + 1):
        # Take a window of features
        seq = X[i:i + sequence_length]
        # Use the label of the last element in the sequence
        label = y[i + sequence_length - 1]
        
        sequences.append(seq)
        labels.append(label)
    
    return np.array(sequences), np.array(labels)

# Create sequences
sequence_length = 10
X_seq, y_seq = create_sequences(X, y_binary, sequence_length)

print(f"Original data shape: {X.shape}")
print(f"Sequence data shape: {X_seq.shape}")
print(f"Sequence labels shape: {y_seq.shape}")
print(f"Sequence length: {sequence_length}")


In [None]:
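# Split data into train/test.
# Note: consecutive windows overlap, so a random split places near-identical
# windows in different subsets and can make the scores below look optimistic.
X_train, X_test, y_train, y_test = train_test_split(
    X_seq, y_seq, test_size=0.2, random_state=RANDOM_SEED, stratify=y_seq
)

# Further split training data for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=RANDOM_SEED, stratify=y_train
)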

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")
print(f"Test set: {X_test.shape}")
print(f"Training labels distribution: {np.bincount(y_train)}")
print(f"Test labels distribution: {np.bincount(y_test)}")


## 3. Build LSTM Model

Let's build and train an LSTM model for sequence classification.
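
To sanity-check the parameter counts that `model.summary()` reports for this architecture, the standard per-layer formulas can be evaluated by hand. This is a small sketch: the feature count of 20 is a made-up value, since the real number depends on the processed dataset, and the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # Weight matrix plus bias vector
    return input_dim * units + units

n_features = 20  # hypothetical; in the notebook this is X_train.shape[2]

counts = {
    "lstm_64": lstm_params(n_features, 64),
    "lstm_32": lstm_params(64, 32),
    "dense_16": dense_params(32, 16),
    "output": dense_params(16, 1),
}
total = sum(counts.values())
print(counts, total)
```

Dropout layers add no trainable parameters, so they do not appear in the total.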



In [None]:
if TENSORFLOW_AVAILABLE:
    # Build LSTM model
    def build_lstm_model(input_shape, num_classes=1):
        model = models.Sequential([
            # First LSTM layer with return sequences for stacked architecture
            layers.LSTM(64, return_sequences=True, input_shape=input_shape),
            layers.Dropout(0.2),
            
            # Second LSTM layer
            layers.LSTM(32, return_sequences=False),
            layers.Dropout(0.2),
            
            # Dense layers for classification
            layers.Dense(16, activation='relu'),
            layers.Dropout(0.1),
            
            # Output layer for binary classification
            layers.Dense(num_classes, activation='sigmoid')
        ])
        
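        # Compile model. Metric objects are used because the string names
        # 'precision' and 'recall' are not accepted by every Keras version.
        model.compile(
            optimizer='adam',
            loss='binary_crossentropy',
            metrics=[
                'accuracy',
                keras.metrics.Precision(name='precision'),
                keras.metrics.Recall(name='recall'),
            ]
        )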
        
        return model
    
    # Build LSTM model
    input_shape = (X_train.shape[1], X_train.shape[2])
    lstm_model = build_lstm_model(input_shape)
    
    print("LSTM Model Architecture:")
    lstm_model.summary()
else:
    print("TensorFlow not available - cannot build LSTM model")


In [None]:
if TENSORFLOW_AVAILABLE:
    # Train LSTM model
    print("Training LSTM model...")
    
    # Define callbacks
    callbacks_list = [
        callbacks.EarlyStopping(
            monitor='val_loss',
            patience=5,
            restore_best_weights=True,
            verbose=1
        ),
        callbacks.ReduceLROnPlateau(
            monitor='val_loss',
            factor=0.5,
            patience=3,
            min_lr=1e-7,
            verbose=1
        )
    ]
    
    # Train model
    history_lstm = lstm_model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=10,  # Small number for demo
        batch_size=32,
        callbacks=callbacks_list,
        verbose=1
    )
    
    print("✅ LSTM training completed!")
else:
    print("TensorFlow not available - cannot train LSTM model")


## 4. Build CNN Model

Let's build and train a 1D CNN model for sequence classification.
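
Each convolution and pooling layer in the CNN defined in this section shortens the sequence, which limits how short the input windows can be. The sketch below works through that arithmetic, assuming the Keras defaults that the model relies on (`'valid'` padding, stride 1 for `Conv1D`, stride equal to `pool_size` for `MaxPooling1D`); the helper names are illustrative.

```python
def conv1d_out(length, kernel_size):
    # 'valid' padding, stride 1
    return max(0, length - kernel_size + 1)

def maxpool1d_out(length, pool_size):
    # 'valid' padding, stride equal to pool_size
    return max(0, (length - pool_size) // pool_size + 1)

def cnn_timesteps(sequence_length):
    """Timesteps left just before GlobalMaxPooling1D."""
    n = conv1d_out(sequence_length, 3)   # Conv1D(64, kernel_size=3)
    n = maxpool1d_out(n, 2)              # MaxPooling1D(pool_size=2)
    n = conv1d_out(n, 3)                 # Conv1D(32, kernel_size=3)
    n = maxpool1d_out(n, 2)              # MaxPooling1D(pool_size=2)
    return n

for length in (9, 10, 20):
    print(length, cnn_timesteps(length))
```

With `sequence_length = 10` exactly one timestep reaches the global pooling layer, and a window of 9 leaves none. Shorter windows would need `padding='same'`, smaller kernels, or fewer pooling layers.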


In [None]:
if TENSORFLOW_AVAILABLE:
    # Build CNN model
    def build_cnn_model(input_shape, num_classes=1):
        model = models.Sequential([
            # First Conv1D layer
            layers.Conv1D(64, kernel_size=3, activation='relu', input_shape=input_shape),
            layers.BatchNormalization(),
            layers.MaxPooling1D(pool_size=2),
            layers.Dropout(0.2),
            
            # Second Conv1D layer
            layers.Conv1D(32, kernel_size=3, activation='relu'),
            layers.BatchNormalization(),
            layers.MaxPooling1D(pool_size=2),
            layers.Dropout(0.2),
            
            # Global max pooling to reduce dimensions
            layers.GlobalMaxPooling1D(),
            
            # Dense layers for classification
            layers.Dense(32, activation='relu'),
            layers.Dropout(0.3),
            layers.Dense(16, activation='relu'),
            
            # Output layer for binary classification
            layers.Dense(num_classes, activation='sigmoid')
        ])
        
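        # Compile model. Metric objects are used because the string names
        # 'precision' and 'recall' are not accepted by every Keras version.
        model.compile(
            optimizer='adam',
            loss='binary_crossentropy',
            metrics=[
                'accuracy',
                keras.metrics.Precision(name='precision'),
                keras.metrics.Recall(name='recall'),
            ]
        )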
        
        return model
    
    # Build CNN model
    cnn_model = build_cnn_model(input_shape)
    
    print("CNN Model Architecture:")
    cnn_model.summary()
else:
    print("TensorFlow not available - cannot build CNN model")


In [None]:
if TENSORFLOW_AVAILABLE:
    # Train CNN model
    print("Training CNN model...")
    
    # Train model
    history_cnn = cnn_model.fit(
        X_train, y_train,
        validation_data=(X_val, y_val),
        epochs=10,  # Small number for demo
        batch_size=32,
        callbacks=callbacks_list,
        verbose=1
    )
    
    print("✅ CNN training completed!")
else:
    print("TensorFlow not available - cannot train CNN model")


## 5. Build Autoencoder Model

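Let's build an autoencoder for anomaly detection. It is trained to reconstruct normal traffic only, so samples it reconstructs poorly (high reconstruction error) are flagged as anomalies.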
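
The scoring idea can be illustrated without a trained model. In the sketch below, hand-written arrays stand in for the inputs and the autoencoder's reconstructions, and the helper names `reconstruction_error` and `fit_threshold` are illustrative. It assumes the threshold is chosen as a percentile of the error on known-normal data, which is one common choice rather than the only one.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # Mean squared error per sample, averaged over the feature axis
    return np.mean(np.square(x - x_hat), axis=1)

def fit_threshold(errors_normal, percentile=95.0):
    # Threshold taken from the error distribution of normal samples
    return float(np.percentile(errors_normal, percentile))

# Toy inputs and stand-in reconstructions (not real model output)
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x_hat = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]])
errors = reconstruction_error(x, x_hat)

# Toy errors from normal samples; the median is used here only to keep the numbers simple
threshold = fit_threshold(np.array([0.1, 0.2, 0.3, 0.4]), percentile=50.0)
is_anomaly = errors > threshold
print(errors, threshold, is_anomaly)
```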


In [None]:
if TENSORFLOW_AVAILABLE:
    # Build Autoencoder model
    def build_autoencoder_model(input_dim):
        # Encoder
        encoder_input = layers.Input(shape=(input_dim,), name='encoder_input')
        encoder = layers.Dense(64, activation='relu')(encoder_input)
        encoder = layers.Dropout(0.2)(encoder)
        encoder = layers.Dense(32, activation='relu')(encoder)
        encoder = layers.Dropout(0.2)(encoder)
        encoder = layers.Dense(16, activation='relu', name='encoder_output')(encoder)
        
        # Decoder
        decoder = layers.Dense(32, activation='relu')(encoder)
        decoder = layers.Dropout(0.2)(decoder)
        decoder = layers.Dense(64, activation='relu')(decoder)
        decoder = layers.Dropout(0.2)(decoder)
        decoder = layers.Dense(input_dim, activation='linear', name='decoder_output')(decoder)
        
        # Create autoencoder model
        autoencoder = models.Model(encoder_input, decoder, name='autoencoder')
        
        # Compile model
        autoencoder.compile(
            optimizer='adam',
            loss='mse',
            metrics=['mae']
        )
        
        return autoencoder
    
    # Build Autoencoder model
    input_dim = X_train.shape[2]  # Number of features
    autoencoder_model = build_autoencoder_model(input_dim)
    
    print("Autoencoder Model Architecture:")
    autoencoder_model.summary()
else:
    print("TensorFlow not available - cannot build Autoencoder model")


In [None]:
if TENSORFLOW_AVAILABLE:
    # Train Autoencoder model
    print("Training Autoencoder model...")
    
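    # For autoencoder, use only normal data for training.
    # The dense autoencoder expects 2D input (samples, features), so each
    # window is reduced to its last timestep, the row its label refers to.
    X_train_normal = X_train[y_train == 0][:, -1, :]
    X_val_normal = X_val[y_val == 0][:, -1, :]
    
    print(f"Training autoencoder on {len(X_train_normal)} normal samples")
    
    # Train model; validate on normal samples only so that val_loss
    # measures reconstruction quality on the same kind of data
    history_autoencoder = autoencoder_model.fit(
        X_train_normal, X_train_normal,  # Autoencoder: input = target
        validation_data=(X_val_normal, X_val_normal),
        epochs=10,  # Small number for demo
        batch_size=32,
        callbacks=callbacks_list,
        verbose=1
    )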
    
    print("✅ Autoencoder training completed!")
else:
    print("TensorFlow not available - cannot train Autoencoder model")


## 6. Plot Training History

Let's visualize the training progress for all models.


In [None]:
if TENSORFLOW_AVAILABLE:
    # Plot training history for LSTM and CNN
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # LSTM Loss
    axes[0, 0].plot(history_lstm.history['loss'], label='Training Loss')
    axes[0, 0].plot(history_lstm.history['val_loss'], label='Validation Loss')
    axes[0, 0].set_title('LSTM - Loss')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].set_ylabel('Loss')
    axes[0, 0].legend()
    axes[0, 0].grid(True, alpha=0.3)
    
    # LSTM Accuracy
    axes[0, 1].plot(history_lstm.history['accuracy'], label='Training Accuracy')
    axes[0, 1].plot(history_lstm.history['val_accuracy'], label='Validation Accuracy')
    axes[0, 1].set_title('LSTM - Accuracy')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].set_ylabel('Accuracy')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # CNN Loss
    axes[1, 0].plot(history_cnn.history['loss'], label='Training Loss')
    axes[1, 0].plot(history_cnn.history['val_loss'], label='Validation Loss')
    axes[1, 0].set_title('CNN - Loss')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylabel('Loss')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    # CNN Accuracy
    axes[1, 1].plot(history_cnn.history['accuracy'], label='Training Accuracy')
    axes[1, 1].plot(history_cnn.history['val_accuracy'], label='Validation Accuracy')
    axes[1, 1].set_title('CNN - Accuracy')
    axes[1, 1].set_xlabel('Epoch')
    axes[1, 1].set_ylabel('Accuracy')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("TensorFlow not available - cannot plot training history")


In [None]:
if TENSORFLOW_AVAILABLE:
    # Plot Autoencoder training history
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Autoencoder Loss
    axes[0].plot(history_autoencoder.history['loss'], label='Training Loss')
    axes[0].plot(history_autoencoder.history['val_loss'], label='Validation Loss')
    axes[0].set_title('Autoencoder - Loss (MSE)')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Autoencoder MAE
    axes[1].plot(history_autoencoder.history['mae'], label='Training MAE')
    axes[1].plot(history_autoencoder.history['val_mae'], label='Validation MAE')
    axes[1].set_title('Autoencoder - Mean Absolute Error')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('MAE')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("TensorFlow not available - cannot plot autoencoder training history")


## 7. Model Evaluation

Let's evaluate the performance of our trained models on the test set.
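
The classification reports in this section come from thresholding each model's score. As a minimal sketch of what that involves, the snippet below computes the confusion counts, precision, and recall with NumPy alone; the helper `binary_metrics` and the toy scores are illustrative, not output from the models in this notebook.

```python
import numpy as np

def binary_metrics(y_true, scores, threshold=0.5):
    """Confusion counts, precision and recall for the positive (anomaly) class."""
    y_true = np.asarray(y_true).astype(bool)
    # ravel() handles the (n, 1) shape that a sigmoid output layer returns
    y_pred = np.asarray(scores).ravel() > threshold
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

y_toy = [0, 0, 1, 1, 1]
scores_toy = [[0.1], [0.7], [0.8], [0.4], [0.9]]
result = binary_metrics(y_toy, scores_toy)
print(result)
```

Raising the threshold usually trades recall for precision, which is why the ROC curves are a useful complement to a single report at 0.5.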


In [None]:
if TENSORFLOW_AVAILABLE:
    # Evaluate LSTM model
    print("Evaluating LSTM model...")
    lstm_pred = lstm_model.predict(X_test, verbose=0)
    lstm_pred_binary = (lstm_pred > 0.5).astype(int).flatten()
    
    print("LSTM Classification Report:")
    print(classification_report(y_test, lstm_pred_binary))
    
    # Evaluate CNN model
    print("\nEvaluating CNN model...")
    cnn_pred = cnn_model.predict(X_test, verbose=0)
    cnn_pred_binary = (cnn_pred > 0.5).astype(int).flatten()
    
    print("CNN Classification Report:")
    print(classification_report(y_test, cnn_pred_binary))
    
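    # Evaluate Autoencoder model
    print("\nEvaluating Autoencoder model...")
    # The autoencoder takes 2D input, so score the last timestep of each window
    X_test_last = X_test[:, -1, :]
    autoencoder_pred = autoencoder_model.predict(X_test_last, verbose=0)
    
    # Calculate reconstruction error (one value per sample)
    mse = np.mean(np.square(X_test_last - autoencoder_pred), axis=1)
    
    # Use reconstruction error as anomaly score. The threshold is the 95th
    # percentile of the error on normal validation samples, so it is not
    # tuned on the test set.
    X_val_normal = X_val[y_val == 0][:, -1, :]
    val_pred = autoencoder_model.predict(X_val_normal, verbose=0)
    val_mse = np.mean(np.square(X_val_normal - val_pred), axis=1)
    threshold = np.percentile(val_mse, 95)
    autoencoder_pred_binary = (mse > threshold).astype(int)
    
    print("Autoencoder Classification Report:")
    print(classification_report(y_test, autoencoder_pred_binary))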
else:
    print("TensorFlow not available - cannot evaluate models")


In [None]:
if TENSORFLOW_AVAILABLE:
    # Plot ROC curves for comparison
    fig, ax = plt.subplots(1, 1, figsize=(8, 6))
    
    # LSTM ROC (predictions have shape (n, 1); flatten them to 1D scores)
    lstm_scores = lstm_pred.ravel()
    fpr_lstm, tpr_lstm, _ = roc_curve(y_test, lstm_scores)
    roc_auc_lstm = roc_auc_score(y_test, lstm_scores)
    ax.plot(fpr_lstm, tpr_lstm, label=f'LSTM (AUC = {roc_auc_lstm:.4f})')
    
    # CNN ROC
    cnn_scores = cnn_pred.ravel()
    fpr_cnn, tpr_cnn, _ = roc_curve(y_test, cnn_scores)
    roc_auc_cnn = roc_auc_score(y_test, cnn_scores)
    ax.plot(fpr_cnn, tpr_cnn, label=f'CNN (AUC = {roc_auc_cnn:.4f})')
    
    # Autoencoder ROC
    fpr_ae, tpr_ae, _ = roc_curve(y_test, mse)
    roc_auc_ae = roc_auc_score(y_test, mse)
    ax.plot(fpr_ae, tpr_ae, label=f'Autoencoder (AUC = {roc_auc_ae:.4f})')
    
    # Random classifier
    ax.plot([0, 1], [0, 1], 'k--', label='Random')
    
    ax.set_xlim([0.0, 1.0])
    ax.set_ylim([0.0, 1.05])
    ax.set_xlabel('False Positive Rate')
    ax.set_ylabel('True Positive Rate')
    ax.set_title('ROC Curves Comparison')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
else:
    print("TensorFlow not available - cannot plot ROC curves")


## 8. Save Models

Let's save the trained models for future use.
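
The metadata stored next to the models is a small dictionary, so a plain-text format is also an option. The sketch below is an alternative to the `joblib` dump used in this section, assuming JSON is acceptable; the input shape of 10 timesteps by 20 features is a made-up value, and a temporary directory is used so the example does not touch `../models`.

```python
import json
import tempfile
from pathlib import Path

metadata = {
    "sequence_length": 10,
    "input_shape": [10, 20],  # hypothetical (timesteps, features); JSON has no tuple type
    "models_trained": ["lstm", "cnn", "autoencoder"],
    "random_seed": 42,
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dl_models_metadata.json"
    # Write the metadata as human-readable JSON
    path.write_text(json.dumps(metadata, indent=2))
    # Read it back to confirm the round trip
    restored = json.loads(path.read_text())

print(restored == metadata)
```

A JSON file can be inspected without Python, at the cost of converting tuples such as `input_shape` to lists before saving.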


In [None]:
if TENSORFLOW_AVAILABLE:
    # Create models directory if it doesn't exist
    models_dir = Path("../models")
    models_dir.mkdir(parents=True, exist_ok=True)
    
    # Save models (paths are built from models_dir so they stay consistent)
    lstm_model.save(str(models_dir / "dl_lstm.h5"))
    print("✅ LSTM model saved to models/dl_lstm.h5")
    
    cnn_model.save(str(models_dir / "dl_cnn.h5"))
    print("✅ CNN model saved to models/dl_cnn.h5")
    
    autoencoder_model.save(str(models_dir / "dl_autoencoder.h5"))
    print("✅ Autoencoder model saved to models/dl_autoencoder.h5")
    
    # Save model metadata
    metadata = {
        'sequence_length': sequence_length,
        'input_shape': input_shape,
        'models_trained': ['lstm', 'cnn', 'autoencoder'],
        'training_epochs': 10,  # epoch limit passed to fit(); early stopping may end training sooner
        'random_seed': RANDOM_SEED
    }
    
    import joblib
    joblib.dump(metadata, "../models/dl_models_metadata.joblib")
    print("✅ Model metadata saved to models/dl_models_metadata.joblib")
else:
    print("TensorFlow not available - cannot save models")


## 9. Summary and Next Steps

### Key Findings:
1. **Model Performance**: [To be filled based on actual results]
2. **Best Architecture**: [To be filled based on actual results]
3. **Training Convergence**: [To be filled based on actual results]

### Model Comparison:
- **LSTM**: Good for capturing temporal dependencies in sequences
- **CNN**: Effective for local pattern recognition in sequences
- **Autoencoder**: Unsupervised approach for anomaly detection

### Next Steps:
1. **Hyperparameter Tuning**: Optimize model architectures and training parameters
2. **Ensemble Methods**: Combine multiple models for improved performance
3. **Advanced Architectures**: Try Transformer, GRU, or hybrid models
4. **Data Augmentation**: Generate more training data for better generalization
5. **Transfer Learning**: Use pre-trained models for feature extraction
6. **Model Compression**: Optimize models for deployment
7. **Real-time Inference**: Implement streaming prediction capabilities
8. **Production Deployment**: Deploy models using the FastAPI service
