# 📄 Detailed Example Notebook

Below is the complete example notebook, with all cells of `detailed_hwm_vs_lstm.ipynb` combined into a single listing for ease of use.


# -*- coding: utf-8 -*-
#!pip install tensorflow

"""
Adaptive Hammerstein-Wiener and LSTM Modeling on KDD Cup Dataset
===============================================================

This example demonstrates the use of the HWM toolkit for adaptive dynamic system 
modeling by applying both the Hammerstein-Wiener classifier and an LSTM neural 
network to the KDD Cup 1999 dataset. The goal is to classify network intrusions 
and evaluate the performance of intelligent models in handling complex, nonlinear 
relationships within the data.

The workflow includes:
1. **Data Loading and Resampling**: Loading the KDD Cup dataset and resampling 
   to a manageable size for efficient processing.
2. **Data Preprocessing**: Scaling numerical features and encoding categorical 
   variables to prepare the data for modeling.
3. **Model Training with Hammerstein-Wiener Classifier**: Utilizing the 
   `HammersteinWienerClassifier` for classification tasks.
4. **Hyperparameter Tuning**: Applying `RandomizedSearchCV` to optimize model 
   parameters.
5. **Evaluation and Visualization**: Assessing model performance using accuracy, 
   prediction stability score (PSS), and time-weighted accuracy (TWA), along with 
   plotting confusion matrices and ROC curves.
6. **LSTM Model Training**: Implementing an LSTM neural network to handle sequence-based 
   data and comparing its performance with the Hammerstein-Wiener classifier.

This example provides practical insights into building and evaluating intelligent 
network models using HWM and TensorFlow's Keras API.

Author: Daniel
Created on: Fri Nov  1 17:36:16 2024
"""

import os
import random
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import randint, uniform
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (
    accuracy_score, auc, confusion_matrix, roc_curve, 
    ConfusionMatrixDisplay
)
from sklearn.model_selection import (
    RandomizedSearchCV, train_test_split
)
from sklearn.preprocessing import OneHotEncoder, StandardScaler

from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential

from hwm.estimators import HammersteinWienerClassifier
from hwm.metrics import prediction_stability_score, twa_score
from hwm.utils import resample_data

# Set the data path (adjust to the folder that holds the dataset on your machine)
data_path = r'F:\repositories'

# Define column names as per KDD Cup 1999 dataset
column_names = [
    'duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes',
    'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins',
    'logged_in', 'num_compromised', 'root_shell', 'su_attempted',
    'num_root', 'num_file_creations', 'num_shells', 'num_access_files',
    'num_outbound_cmds', 'is_host_login', 'is_guest_login', 'count',
    'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate',
    'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
    'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
    'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate',
    'dst_host_srv_diff_host_rate', 'dst_host_serror_rate',
    'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
    'dst_host_srv_rerror_rate', 'label'
]

# Define continuous and categorical features
continuous_features = [
    'duration', 'src_bytes', 'dst_bytes', 'wrong_fragment', 'urgent', 'hot',
    'num_failed_logins', 'num_compromised', 'num_root', 'num_file_creations',
    'num_shells', 'num_access_files', 'count', 'srv_count', 'serror_rate',
    'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate',
    'diff_srv_rate', 'srv_diff_host_rate', 'dst_host_count',
    'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate',
    'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate',
    'dst_host_serror_rate', 'dst_host_srv_serror_rate',
    'dst_host_rerror_rate', 'dst_host_srv_rerror_rate'
]

categorical_features = [
    'protocol_type', 'service', 'flag', 'land', 'logged_in',
    'is_host_login', 'is_guest_login', 'root_shell', 'su_attempted',
    'num_outbound_cmds'
]

# Load the dataset
data = pd.read_csv(
    os.path.join(data_path, 'kddcup.data_10_percent_corrected'),
    names=column_names,
    header=None
)

# Resample the dataset to 100,000 samples for efficiency
data = resample_data(data, samples=100000, random_state=42)

# Encode the target variable: 0 for 'normal.', 1 for any attack
data['label'] = data['label'].apply(lambda x: 0 if x == 'normal.' else 1)

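# Define the preprocessing pipeline
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), continuous_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ],
    # Always return a dense array so that row slicing in create_sequences
    # and the Keras model receive a regular NumPy array
    sparse_threshold=0
)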

# Separate features and target
X = data.drop('label', axis=1)
y = data['label']

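# Apply preprocessing.
# Note: fitting on the full dataset before the split lets test-set statistics
# influence the scaler; fit on the training split only if a strict hold-out
# evaluation is required.
X_processed = preprocessor.fit_transform(X)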

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_processed, y.values, test_size=0.2, random_state=42, stratify=y
)

# Define a custom ReLU transformer for the Hammerstein-Wiener model
class ReLUTransformer(BaseEstimator, TransformerMixin):
    """Custom transformer that applies the ReLU activation function."""
    
    def fit(self, X, y=None):
        """Fit method. Returns self."""
        return self

    def transform(self, X):
        """Apply ReLU activation function."""
        return np.maximum(0, X)

# Initialize the Hammerstein-Wiener Classifier
hw_model = HammersteinWienerClassifier(
    nonlinear_input_estimator=ReLUTransformer(),
    nonlinear_output_estimator=ReLUTransformer(),
    p=9,
    loss="cross_entropy",
    time_weighting="linear",
    batch_size="auto",
    optimizer='sgd',
    learning_rate=0.001,
    max_iter=173, 
    early_stopping=True,
    verbose=1, 
)

# Train the Hammerstein-Wiener Classifier
hw_model.fit(X_train, y_train)

# Define the parameter distributions for RandomizedSearchCV
param_distributions = {
    'p': randint(1, 10),  # Dependency order from 1 to 9 (upper bound is exclusive)
    'batch_size': randint(32, 128),  # Batch size from 32 to 127
    'optimizer': ['sgd', 'adam', 'adagrad'],  # Optimizers to choose from
    'learning_rate': uniform(0.0001, 0.0099),  # uniform(loc, scale): 0.0001 to 0.01
    'max_iter': randint(50, 200)  # Max iterations from 50 to 199
}

# Initialize the Hammerstein-Wiener Classifier with fixed components
fixed_hw_model = HammersteinWienerClassifier(
    nonlinear_input_estimator=ReLUTransformer(),
    nonlinear_output_estimator=ReLUTransformer(),
    loss="cross_entropy",
    time_weighting="linear",
    verbose=0, 
    batch_size=200, 
    early_stopping=True, 
)

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(
    estimator=fixed_hw_model,
    param_distributions=param_distributions,
    n_iter=20,  # Number of parameter settings sampled
    scoring='accuracy',  # Evaluation metric
    cv=3,  # 3-fold cross-validation
    verbose=0,
    random_state=42,
    n_jobs=-1  # Use all available cores
)

# Fit RandomizedSearchCV to find the best parameters
random_search.fit(X_train, y_train)

# Display the best parameters and the corresponding score
print("Best Parameters:", random_search.best_params_)
print("Best Score:", random_search.best_score_)

# Use the best estimator to make predictions
best_hw_model = random_search.best_estimator_
y_pred_hw = best_hw_model.predict(X_test)

# Evaluate the Hammerstein-Wiener Classifier
accuracy_hw = accuracy_score(y_test, y_pred_hw)
y_pred_proba_hw = best_hw_model.predict_proba(X_test)[:, 1]
pss_hw = prediction_stability_score(y_pred_proba_hw)
twa_hw = twa_score(y_test, y_pred_hw, alpha=0.9)

# Print evaluation metrics
print(f"Hammerstein-Wiener Classifier Accuracy: {accuracy_hw:.4f}")
print(f"Hammerstein-Wiener Classifier PSS: {pss_hw:.4f}")
print(f"Hammerstein-Wiener Classifier TWA: {twa_hw:.4f}")

def plot_results(y_true, y_pred, y_pred_proba, title):
    """
    Plots the Confusion Matrix and ROC Curve for the given predictions.
    
    Parameters
    ----------
    y_true : array-like
        True target values.
    y_pred : array-like
        Predicted target values.
    y_pred_proba : array-like
        Predicted probabilities for the positive class.
    title : str
        Title for the plots.
    """
    # Confusion Matrix
    ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
    plt.title(f'Confusion Matrix - {title}')
    plt.show()

    # ROC Curve
    fpr, tpr, _ = roc_curve(y_true, y_pred_proba)
    # Named roc_auc to avoid shadowing sklearn.metrics.roc_auc_score
    roc_auc = auc(fpr, tpr)
    plt.figure()
    plt.plot(fpr, tpr, label=f'{title} (AUC = {roc_auc:.4f})')
    plt.plot([0, 1], [0, 1], 'k--')
    plt.title('ROC Curve')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend(loc='lower right')
    plt.show()

# Plot results for Hammerstein-Wiener Classifier
plot_results(y_test, y_pred_hw, y_pred_proba_hw, 'Hammerstein-Wiener Classifier')

# Determine the number of features
n_features = X_processed.shape[1]

# Define the number of timesteps
timesteps = 9  # Should match the 'p' parameter used in Hammerstein-Wiener model

def create_sequences(X, y, timesteps):
    """
    Creates input sequences and corresponding targets for LSTM.

    Parameters
    ----------
    X : ndarray
        Feature matrix.
    y : ndarray
        Target vector.
    timesteps : int
        Number of timesteps for each input sequence.

    Returns
    -------
    X_seq : ndarray
        Array of input sequences.
    y_seq : ndarray
        Array of target values corresponding to each sequence.
    """
    X_seq, y_seq = [], []
    for i in range(len(X) - timesteps):
        X_seq.append(X[i:i + timesteps])
        y_seq.append(y[i + timesteps])
    return np.array(X_seq), np.array(y_seq)

# Create sequences for LSTM
X_train_seq, y_train_seq = create_sequences(X_train, y_train, timesteps)
X_test_seq, y_test_seq = create_sequences(X_test, y_test, timesteps)

# Verify the shapes
print(f'X_train_seq shape: {X_train_seq.shape}')
print(f'y_train_seq shape: {y_train_seq.shape}')
print(f'X_test_seq shape: {X_test_seq.shape}')
print(f'y_test_seq shape: {y_test_seq.shape}')

# Build the LSTM model
lstm_model = Sequential([
    LSTM(64, input_shape=(timesteps, n_features)),
    Dense(1, activation='sigmoid')
])

# Compile the model
lstm_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Define early stopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

# Display the model architecture
lstm_model.summary()

# Train the LSTM model
lstm_history = lstm_model.fit(
    X_train_seq, y_train_seq,
    epochs=10,
    batch_size=64,
    validation_split=0.1,
    callbacks=[early_stopping],
    verbose=1
)

# Evaluate the LSTM Model
lstm_loss, lstm_accuracy = lstm_model.evaluate(X_test_seq, y_test_seq, verbose=0)
y_pred_proba_lstm = lstm_model.predict(X_test_seq).flatten()
y_pred_lstm = (y_pred_proba_lstm >= 0.5).astype(int)
pss_lstm = prediction_stability_score(y_pred_proba_lstm)
twa_lstm = twa_score(y_test_seq, y_pred_lstm, alpha=0.9)

# Print evaluation metrics
print(f"LSTM Accuracy: {lstm_accuracy:.4f}")
print(f"LSTM PSS: {pss_lstm:.4f}")
print(f"LSTM TWA: {twa_lstm:.4f}")

# Plot results for LSTM Model
plot_results(y_test_seq, y_pred_lstm, y_pred_proba_lstm, 'LSTM Model')

# Compute ROC curve for Hammerstein-Wiener Classifier
fpr_hw, tpr_hw, _ = roc_curve(y_test, y_pred_proba_hw)
roc_auc_hw = auc(fpr_hw, tpr_hw)

# Compute ROC curve for LSTM Model
fpr_lstm, tpr_lstm, _ = roc_curve(y_test_seq, y_pred_proba_lstm)
roc_auc_lstm = auc(fpr_lstm, tpr_lstm)

# Plot both ROC curves
plt.figure()
plt.plot(fpr_hw, tpr_hw, label=f'Hammerstein-Wiener (AUC = {roc_auc_hw:.4f})')
plt.plot(fpr_lstm, tpr_lstm, label=f'LSTM (AUC = {roc_auc_lstm:.4f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.title('ROC Curve Comparison')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()

# Print Summary of Results
print("Summary of Results:")
print(f"Hammerstein-Wiener Classifier Accuracy: {accuracy_hw:.4f}")
print(f"Hammerstein-Wiener Classifier PSS: {pss_hw:.4f}")
print(f"Hammerstein-Wiener Classifier TWA: {twa_hw:.4f}")
print(f"LSTM Accuracy: {lstm_accuracy:.4f}")
print(f"LSTM PSS: {pss_lstm:.4f}")
print(f"LSTM TWA: {twa_lstm:.4f}")




# 📄 Example Notebook Breakdown

Below is a detailed breakdown of each section in the notebook, explaining the purpose and functionality.

## 📦 Importing Necessary Libraries

We begin by importing all the required libraries and modules. This includes standard libraries for data manipulation and visualization, as well as specific modules from Scikit-learn, TensorFlow's Keras, and the HWM toolkit.

## 🗄️ Setting the Data Path

Specify the directory where the KDD Cup dataset is stored. The notebook uses a Windows path (`F:\repositories`); change `data_path` to match your environment, otherwise `pd.read_csv` raises a `FileNotFoundError`.
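Checking that the file exists before loading can make a wrong path fail early with a clear message. The sketch below is one possible way to do this; the helper name `resolve_dataset` is hypothetical and not part of HWM, and the demo uses a temporary directory rather than the notebook's actual path.

```python
import tempfile
from pathlib import Path


def resolve_dataset(data_path, filename):
    """Return the full path to the dataset, or raise if it is missing.

    Hypothetical helper for illustration; not part of HWM.
    """
    full_path = Path(data_path) / filename
    if not full_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {full_path}")
    return full_path


# Demo: a temporary directory stands in for the real data folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "kddcup.data_10_percent_corrected").write_text("0,tcp,http\n")
    found_name = resolve_dataset(tmp_dir, "kddcup.data_10_percent_corrected").name
    try:
        resolve_dataset(tmp_dir, "missing.csv")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(found_name, missing_raises)
```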

## 📁 Loading the KDD Cup 1999 Dataset

Load the dataset with pandas. The file has no header row, so the 42 column names (41 features plus `label`) are passed explicitly. The features are then grouped into 31 continuous and 10 categorical columns for preprocessing.
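As a small illustration of loading a headerless file, the sketch below reads three made-up rows from an in-memory string instead of the real dataset. Only four of the columns are used, and the row values are illustrative rather than taken from the KDD Cup file.

```python
import io

import pandas as pd

# Three made-up rows in the headerless, comma-separated layout the
# notebook expects; only four columns are used to keep the demo short.
raw_csv = io.StringIO(
    "0,tcp,http,normal.\n"
    "0,icmp,ecr_i,smurf.\n"
    "2,tcp,smtp,normal.\n"
)
demo_columns = ["duration", "protocol_type", "service", "label"]

# header=None tells pandas that the first line is data, not column names.
demo = pd.read_csv(raw_csv, names=demo_columns, header=None)

# Same binary encoding as the notebook: 0 for 'normal.', 1 for any attack.
demo["label"] = (demo["label"] != "normal.").astype(int)
print(demo)
```

Without `header=None`, pandas would still apply `names`, but being explicit guards against the first record being treated as a header if `names` is later removed.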

## 🔄 Resampling the Dataset

To keep computation manageable, we resample the dataset to 100,000 rows with the `resample_data` utility from HWM. Passing `random_state=42` makes the subset reproducible across runs, and a random subset of this size is expected to keep the feature and class distributions close to those of the full file.
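The idea of reproducible subsampling can be sketched with pandas alone. The example assumes plain random sampling without replacement via `DataFrame.sample`; this is a stand-in for illustration and is not necessarily how `resample_data` works internally. The data is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data; the notebook itself uses the KDD Cup records.
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "src_bytes": rng.integers(0, 1000, size=1000),
    "label": rng.integers(0, 2, size=1000),
})

# Assumption: plain random sampling without replacement. This is NOT
# necessarily what hwm.utils.resample_data does internally.
subset = full.sample(n=200, random_state=42)
subset_again = full.sample(n=200, random_state=42)

# The same seed selects the same rows, so results are reproducible.
print(len(subset), subset.index.equals(subset_again.index))

# The class balance of the subset can be compared with the full data.
print(full["label"].mean(), subset["label"].mean())
```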

## 🛠️ Data Preprocessing

Preprocess the data by scaling numerical features using `StandardScaler` and encoding categorical variables using `OneHotEncoder`. This prepares the data for efficient modeling.
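The sketch below applies the same kind of `ColumnTransformer` to a four-row toy frame so the output shape is easy to check by hand. The toy column values are illustrative, and `sparse_threshold=0` is set here only to force a dense array for printing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

toy = pd.DataFrame({
    "duration": [0.0, 2.0, 4.0, 6.0],
    "src_bytes": [100.0, 200.0, 300.0, 400.0],
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
})

toy_preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), ["duration", "src_bytes"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["protocol_type"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

toy_processed = toy_preprocessor.fit_transform(toy)

# 2 scaled numeric columns + 3 one-hot columns (icmp, tcp, udp) = 5 columns.
print(toy_processed.shape)

# handle_unknown='ignore' encodes a category unseen during fit as all zeros.
unseen = pd.DataFrame(
    {"duration": [3.0], "src_bytes": [250.0], "protocol_type": ["sctp"]}
)
unseen_processed = toy_preprocessor.transform(unseen)
print(unseen_processed)
```

One-hot encoding is why the number of model inputs (`n_features` in the notebook) is larger than the 41 raw feature columns.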

## 🔀 Splitting the Data into Training and Testing Sets

Split the preprocessed data into training and testing sets, ensuring that the split is stratified based on the target variable to maintain class distribution.
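To see what stratification does, the following sketch splits a synthetic, imbalanced label vector and compares the class ratio in each part. The 80/20 class mix is made up for the demonstration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 80% class 1, 20% class 0.
y_demo = np.array([1] * 80 + [0] * 20)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratify, both parts keep the 80/20 class ratio.
print(len(y_tr), y_tr.mean())
print(len(y_te), y_te.mean())
```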

## 🔧 Defining a Custom ReLU Transformer

Define a custom transformer `ReLUTransformer` that applies the ReLU activation function. This transformer will be used in the Hammerstein-Wiener classifier to introduce nonlinearity.
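A quick check on a small array shows the effect of the transformer: negative entries become zero and non-negative entries pass through unchanged. The class is repeated here only so the snippet runs on its own.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ReLUTransformer(BaseEstimator, TransformerMixin):
    """Stateless transformer that clips negative values to zero."""

    def fit(self, X, y=None):
        # Nothing to learn; ReLU has no parameters.
        return self

    def transform(self, X):
        return np.maximum(0, X)


sample = np.array([[-1.5, 0.0, 2.0], [3.0, -0.1, -4.0]])
relu_out = ReLUTransformer().fit_transform(sample)
print(relu_out)
```

Because the class inherits from `BaseEstimator`, it also supports `get_params` and cloning, which scikit-learn's search utilities rely on.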

## 🏋️‍♂️ Initializing the Hammerstein-Wiener Classifier

Initialize the `HammersteinWienerClassifier` with the custom ReLU transformers and specified parameters such as dependency order (`p`), loss function, optimizer, learning rate, and early stopping.
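For orientation, a generic Hammerstein-Wiener structure chains a static input nonlinearity, a linear dynamic block over the last `p` time steps, and a static output nonlinearity. The sketch below illustrates that textbook structure in NumPy. It is not HWM's internal implementation; the function `hw_forward`, the random weights, and the choice of a sigmoid output (used here to obtain values between 0 and 1) are all assumptions made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(0, x)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hw_forward(X, weights, bias=0.0):
    """Generic Hammerstein-Wiener forward pass (illustrative only).

    X       : (n_samples, n_features) inputs ordered in time
    weights : (p, n_features) linear coefficients, one row per lag
    Returns one value in (0, 1) per time step t >= p - 1.
    """
    p = weights.shape[0]
    V = relu(X)                                   # static input nonlinearity
    outputs = []
    for t in range(p - 1, len(V)):
        window = V[t - p + 1:t + 1][::-1]         # lags 0..p-1, newest first
        linear = np.sum(window * weights) + bias  # linear dynamic block
        outputs.append(sigmoid(linear))           # static output nonlinearity
    return np.array(outputs)


# Random inputs and weights, purely for demonstration.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 2))
weights_demo = rng.normal(size=(3, 2))            # p = 3 lags
probs = hw_forward(X_demo, weights_demo)
print(probs.shape)  # (4,)
```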

## 🏋️‍♀️ Training the Hammerstein-Wiener Classifier

Train the initialized Hammerstein-Wiener classifier using the training data.

## 🎛️ Hyperparameter Tuning with RandomizedSearchCV

Perform hyperparameter tuning using `RandomizedSearchCV` to identify the combination of parameters that maximizes the classifier's cross-validated accuracy. The search samples 20 parameter settings, scores each with 3-fold cross-validation, and covers the dependency order, batch size, optimizer type, learning rate, and maximum number of iterations.
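The SciPy distributions passed to the search have conventions that are easy to misread: `randint(low, high)` excludes `high`, and `uniform(loc, scale)` covers `[loc, loc + scale]` rather than `[loc, scale]`. The sketch below draws samples to confirm both; the ranges mirror the intent of a dependency order up to 9 and a learning rate between 0.0001 and 0.01.

```python
from scipy.stats import randint, uniform

# randint(low, high) draws integers from low to high - 1 (high is exclusive).
p_samples = randint(1, 10).rvs(size=5000, random_state=0)

# uniform(loc, scale) draws floats from [loc, loc + scale], so a range of
# 0.0001 to 0.01 needs scale = 0.01 - 0.0001 = 0.0099.
lr_samples = uniform(loc=0.0001, scale=0.0099).rvs(size=5000, random_state=0)

print(p_samples.min(), p_samples.max())    # values fall within 1..9
print(lr_samples.min(), lr_samples.max())  # values fall within 0.0001..0.01
```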

## 📊 Evaluating the Hammerstein-Wiener Classifier

Use the best estimator from the hyperparameter tuning to make predictions on the test set. Evaluate the model's performance using metrics such as accuracy, prediction stability score (PSS), and time-weighted accuracy (TWA).
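To give a feel for the two HWM-specific metrics, the sketch below assumes that PSS is the mean absolute change between consecutive predictions and that TWA weights the correctness of step `t` by `alpha ** (T - t)`, so later steps count more. These definitions are assumptions made for illustration and should be checked against the HWM documentation; `pss_sketch` and `twa_sketch` are hypothetical names, not HWM functions, and the input values are made up.

```python
import numpy as np


def pss_sketch(y_pred):
    """Mean absolute change between consecutive predictions (assumed definition)."""
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(np.diff(y_pred))))


def twa_sketch(y_true, y_pred, alpha=0.9):
    """Accuracy weighted by alpha ** (T - t); recent steps count more (assumed)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    T = len(y_true)
    weights = alpha ** (T - np.arange(1, T + 1))  # last step has weight 1
    correct = (y_true == y_pred).astype(float)
    return float(np.sum(weights * correct) / np.sum(weights))


# Made-up values for illustration only.
pss = pss_sketch([0.1, 0.3, 0.2, 0.9])       # (0.2 + 0.1 + 0.7) / 3
twa = twa_sketch([0, 1, 1, 1], [0, 1, 0, 1], alpha=0.5)
plain_accuracy = 3 / 4

print(round(pss, 4), round(twa, 4), plain_accuracy)
```

Under these assumed definitions, the single error at the third step lowers TWA slightly more than plain accuracy because that step sits near the end of the series and carries a relatively large weight.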

## 🖼️ Plotting Results

Define a function `plot_results` to visualize the Confusion Matrix and ROC Curve for the models. This aids in understanding the classifier's performance in more detail.
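The quantities behind both plots can be computed without drawing anything. The sketch below uses four made-up scores to build a confusion matrix at a 0.5 threshold and to compute the area under the ROC curve.

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, roc_curve

# Made-up labels and scores for illustration.
y_true_demo = np.array([0, 0, 1, 1])
y_score_demo = np.array([0.1, 0.4, 0.35, 0.8])

# Threshold the scores at 0.5 to get hard predictions.
y_pred_demo = (y_score_demo >= 0.5).astype(int)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo)

# The ROC curve sweeps the threshold instead of fixing it at 0.5.
fpr, tpr, thresholds = roc_curve(y_true_demo, y_score_demo)
roc_auc = auc(fpr, tpr)

print(cm)                         # [[2 0], [1 1]]
print(f"AUC = {roc_auc:.2f}")     # AUC = 0.75
```

The confusion matrix depends on the chosen threshold, while the AUC summarizes ranking quality over all thresholds, which is why the notebook reports both.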

## 📈 Plotting Hammerstein-Wiener Classifier Results

Visualize the performance of the Hammerstein-Wiener classifier using the `plot_results` function.

## 🧠 Defining and Training the LSTM Model

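Implement an LSTM neural network to handle sequence-based data. This involves creating input sequences, building and compiling the LSTM model, and training it on the training sequences. Each input sequence holds `timesteps = 9` consecutive rows and is paired with the label of the row that follows it, so the sequence arrays contain `timesteps` fewer samples than the arrays they are built from.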
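The windowing step is easiest to follow on a tiny array. The sketch below runs the notebook's `create_sequences` logic on 12 synthetic samples with 2 features and a window of 4, then prints the resulting shapes.

```python
import numpy as np


def create_sequences(X, y, timesteps):
    """Slide a window of length `timesteps` over X; the target is the next label."""
    X_seq, y_seq = [], []
    for i in range(len(X) - timesteps):
        X_seq.append(X[i:i + timesteps])
        y_seq.append(y[i + timesteps])
    return np.array(X_seq), np.array(y_seq)


X_toy = np.arange(24, dtype=float).reshape(12, 2)   # 12 samples, 2 features
y_toy = np.arange(12) % 2                            # alternating 0/1 labels

X_seq, y_seq = create_sequences(X_toy, y_toy, timesteps=4)

# 12 - 4 = 8 windows, each of shape (timesteps, n_features).
print(X_seq.shape, y_seq.shape)   # (8, 4, 2) (8,)

# The first window covers rows 0..3 and is paired with the label of row 4.
print(X_seq[0], y_seq[0])
```

Because the first `timesteps` labels have no preceding window, the LSTM metrics are computed against `y_test_seq` rather than the full `y_test`.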

## 🏗️ Building the LSTM Model

Build and compile the LSTM model using TensorFlow's Keras API, specifying the architecture and compilation parameters.
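The parameter counts reported by `lstm_model.summary()` can be reproduced by hand. The sketch below uses the standard LSTM layout of four gates, each with an input kernel, a recurrent kernel, and a bias; `n_features=10` is an illustrative value, since the real number depends on the one-hot encoding.

```python
def lstm_param_count(units, n_features):
    """Trainable parameters of a standard LSTM layer with biases.

    Four gates, each with an input kernel (n_features x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (n_features + units) + units)


def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# n_features=10 is illustrative; in the notebook it follows from the encoding.
lstm_params = lstm_param_count(units=64, n_features=10)
dense_params = dense_param_count(n_inputs=64, n_outputs=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 19200 65 19265
```

Note that the count does not depend on `timesteps`: the same weights are reused at every step of the sequence.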

## 🏋️‍♂️ Training the LSTM Model

Train the LSTM model using the training sequences, incorporating early stopping to prevent overfitting.
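The patience rule behind early stopping can be illustrated in plain Python. The sketch below is a simplified model of the behavior configured in the notebook (`patience=3` on the validation loss), not the Keras implementation; the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Simplified illustration of patience-based stopping; it is not the
    Keras implementation.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, not results from the notebook.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

In this example training halts at epoch 5 and never reaches the lower loss at epoch 6. With `restore_best_weights=True`, the weights kept are those from the best epoch seen so far (epoch 2 here) rather than from the final epoch.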

## 📊 Evaluating the LSTM Model

Evaluate the trained LSTM model on the test sequences and compute relevant metrics such as accuracy, prediction stability score (PSS), and time-weighted accuracy (TWA).

## 📈 Plotting LSTM Model Results

Visualize the performance of the LSTM model using the `plot_results` function.

## 🆚 Comparing ROC Curves Between Models

Compare the ROC curves of both the Hammerstein-Wiener classifier and the LSTM model to assess their performance relative to each other.

## 📝 Summary of Results

Print a summary of the performance metrics of both models for easy comparison.

---

# 📚 Additional Resources

For more examples and detailed explanations, refer to the [HWM Documentation](https://hwm.readthedocs.io/en/latest/).

# 📄 Conclusion

This notebook shows how to use the HWM toolkit alongside a deep learning baseline (an LSTM) for adaptive dynamic system modeling. Following the workflow above, users can preprocess the data, train both models, tune hyperparameters, and compare performance using accuracy, PSS, TWA, confusion matrices, and ROC curves.

Feel free to explore and extend this example to suit your specific machine learning and data analysis needs.