# Model Training and Evaluation

In this notebook, we train and evaluate three classifiers on the preprocessed Bot-IoT and TON-IoT Modbus datasets.

We will train:
- Random Forest
- XGBoost
- LightGBM

We will evaluate models using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix

Finally, we will compare all models in a summary table.
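
To make the metrics listed above concrete, here is a minimal sketch that derives them from the four cells of a binary confusion matrix. The helper name `binary_metrics` and the counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not results from either dataset.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was truly positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts (not taken from Bot-IoT or TON-IoT)
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

With these counts every metric comes out at 0.8, up to floating-point rounding. On imbalanced intrusion-detection data the four values usually diverge, which is why we report all of them rather than accuracy alone.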


## Import Required Libraries

We need libraries for:
- Data manipulation
- Model training
- Metrics evaluation
- Plotting
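
Since `xgboost` and `lightgbm` are not part of a default Python installation, it can help to check what is available before running the imports. The sketch below uses only the standard library; the helper name `missing_packages` is hypothetical, and the list of names is our reading of the imports used in this notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names, not pip names (scikit-learn is imported as "sklearn")
required = ["numpy", "pandas", "sklearn", "xgboost", "lightgbm",
            "matplotlib", "seaborn", "joblib"]

missing = missing_packages(required)
if missing:
    print("Install before importing:", ", ".join(missing))
else:
    print("All required packages are available.")
```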


In [8]:
# Install the gradient-boosting libraries before importing them;
# %pip installs into the environment of the running kernel.
%pip install xgboost lightgbm

import os

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report, confusion_matrix)
from xgboost import XGBClassifier


## Load Preprocessed Dataset Splits

We will load the preprocessed `train` and `test` splits for both datasets (`Bot-IoT` and `TON-IoT-Modbus`) saved as `.npy` files.
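
As a minimal sketch of the loading step, the helper below reads the four arrays for one dataset and checks that features and labels have the same number of rows. The name `load_split` is hypothetical, the file naming follows the `X_train_<dataset>.npy` pattern used in this notebook, and the demo runs on throwaway random arrays in a temporary directory rather than on the real splits.

```python
import os
import tempfile

import numpy as np


def load_split(splits_dir, dataset_name):
    """Load X/y train/test arrays for one dataset and check that row counts agree."""
    arrays = {}
    for part in ("X_train", "X_test", "y_train", "y_test"):
        path = os.path.join(splits_dir, f"{part}_{dataset_name}.npy")
        arrays[part] = np.load(path)
    for split in ("train", "test"):
        n_x = arrays[f"X_{split}"].shape[0]
        n_y = arrays[f"y_{split}"].shape[0]
        if n_x != n_y:
            raise ValueError(
                f"{dataset_name}: X_{split} has {n_x} rows but y_{split} has {n_y}"
            )
    return arrays


# Demo on random arrays written to a temporary directory (not the real datasets)
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    np.save(os.path.join(tmp, "X_train_demo.npy"), rng.random((6, 3)))
    np.save(os.path.join(tmp, "y_train_demo.npy"), rng.integers(0, 2, size=6))
    np.save(os.path.join(tmp, "X_test_demo.npy"), rng.random((4, 3)))
    np.save(os.path.join(tmp, "y_test_demo.npy"), rng.integers(0, 2, size=4))
    demo = load_split(tmp, "demo")

print({name: arr.shape for name, arr in demo.items()})
```

Failing early on a row-count mismatch is cheaper than discovering it as an error inside `fit` after a long load.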


In [None]:
splits_dir = r"C:\Users\User\IIoT_IDS_Project\data\splits"

datasets = {
    "bot-iot": {
        "X_train": os.path.join(splits_dir, "X_train_bot-iot.npy"),
        "X_test": os.path.join(splits_dir, "X_test_bot-iot.npy"),
        "y_train": os.path.join(splits_dir, "y_train_bot-iot.npy"),
        "y_test": os.path.join(splits_dir, "y_test_bot-iot.npy")
    },
    "ton-iot-modbus": {
        "X_train": os.path.join(splits_dir, "X_train_ton-iot-modbus.npy"),
        "X_test": os.path.join(splits_dir, "X_test_ton-iot-modbus.npy"),
        "y_train": os.path.join(splits_dir, "y_train_ton-iot-modbus.npy"),
        "y_test": os.path.join(splits_dir, "y_test_ton-iot-modbus.npy")
    }
}

# Sanity check: load one dataset and print its shapes
X_train_test = np.load(datasets["bot-iot"]["X_train"])
y_train_test = np.load(datasets["bot-iot"]["y_train"])
print(f"Bot-IoT X_train shape: {X_train_test.shape}, y_train shape: {y_train_test.shape}")


## Define Models

We define three classifiers to train:
- Random Forest
- XGBoost
- LightGBM
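
The same model instances are later fitted on both datasets. Calling `fit` again is expected to retrain these estimators from scratch, but one way to make that independence explicit is to give each dataset its own unfitted copy with scikit-learn's `clone`. This is a sketch of that option, not what the notebook does; the names `template` and `per_dataset_models` are hypothetical.

```python
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

# A configured but unfitted estimator that serves as the template
template = RandomForestClassifier(n_estimators=100, random_state=42)

# clone() returns a new unfitted estimator with the same hyperparameters
per_dataset_models = {
    name: clone(template) for name in ("bot-iot", "ton-iot-modbus")
}

for name, model in per_dataset_models.items():
    print(name, model.get_params()["n_estimators"], model is template)
```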


In [None]:
models = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=42),
    # use_label_encoder is deprecated in recent XGBoost releases, so it is omitted here
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "LightGBM": LGBMClassifier(random_state=42)
}


## Train and Evaluate Each Model

For each dataset and each model:
1. Train the model on training data
2. Predict on test data
3. Calculate metrics: Accuracy, Precision, Recall, F1-score
4. Display Classification Report
5. Plot Confusion Matrix
6. Save trained model
7. Store metrics for summary table
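
Steps 1 to 3 and 7 above can be sketched as one small helper. The function name `fit_and_score` and its `average` parameter are hypothetical and not part of this notebook; the data is synthetic, so the printed scores say nothing about Bot-IoT or TON-IoT.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def fit_and_score(model, X_train, y_train, X_test, y_test, average="binary"):
    """Fit the model, predict on the test split, and return the four metrics."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "Accuracy": accuracy_score(y_test, y_pred),
        "Precision": precision_score(y_test, y_pred, average=average, zero_division=0),
        "Recall": recall_score(y_test, y_pred, average=average, zero_division=0),
        "F1-score": f1_score(y_test, y_pred, average=average, zero_division=0),
    }


# Synthetic binary toy data: the label depends only on the sign of the first feature
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

scores = fit_and_score(
    RandomForestClassifier(n_estimators=20, random_state=42),
    X[:150], y[:150], X[150:], y[150:],
)
print(scores)
```

Exposing `average` as a parameter keeps the helper usable if the labels turn out to be multiclass, where `average="binary"` raises an error and a setting such as `"weighted"` or `"macro"` is needed instead.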


In [None]:
summary_results = []

for dataset_name, paths in datasets.items():
    print(f"\n\n=== Dataset: {dataset_name} ===")

    # Load preprocessed splits
    X_train = np.load(paths["X_train"])
    X_test = np.load(paths["X_test"])
    y_train = np.load(paths["y_train"])
    y_test = np.load(paths["y_test"])

    for model_name, model in models.items():
        print(f"\n--- Model: {model_name} ---")

        # Train model
        model.fit(X_train, y_train)

        # Predict
        y_pred = model.predict(X_test)

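        # Metrics (the default average="binary" assumes binary labels;
        # for multiclass labels pass e.g. average="weighted")
        acc = accuracy_score(y_test, y_pred)
        prec = precision_score(y_test, y_pred, zero_division=0)
        rec = recall_score(y_test, y_pred, zero_division=0)
        f1 = f1_score(y_test, y_pred, zero_division=0)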

        print(f"Accuracy:  {acc:.4f}")
        print(f"Precision: {prec:.4f}")
        print(f"Recall:    {rec:.4f}")
        print(f"F1-score:  {f1:.4f}")

        # Classification report
        print("\nClassification Report:")
        print(classification_report(y_test, y_pred, zero_division=0))

        # Confusion matrix
        cm = confusion_matrix(y_test, y_pred)
        plt.figure(figsize=(5, 4))
        sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
        plt.title(f"{dataset_name} - {model_name} Confusion Matrix")
        plt.xlabel("Predicted")
        plt.ylabel("Actual")
        plt.show()

        # Save the fitted model (the directory is created on first use)
        model_dir = r"C:\Users\User\IIoT_IDS_Project\models"
        os.makedirs(model_dir, exist_ok=True)
        joblib.dump(model, os.path.join(model_dir, f"{dataset_name}_{model_name}.joblib"))

        # Append to summary
        summary_results.append({
            "Dataset": dataset_name,
            "Model": model_name,
            "Accuracy": acc,
            "Precision": prec,
            "Recall": rec,
            "F1-score": f1
        })


## Summary Table

We display a summary table of the metrics for every dataset and model combination so they can be compared side by side.
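
For comparison across models, a long-format table like this one can be reshaped so that each dataset is a row and each model a column. The sketch below shows one way to do that with `DataFrame.pivot`. The dataset names, model names, and numbers are placeholders invented for illustration only; they are not results from this notebook.

```python
import pandas as pd

# Placeholder rows with made-up values, only to demonstrate the reshaping
placeholder_results = [
    {"Dataset": "dataset-a", "Model": "model-1", "F1-score": 0.10},
    {"Dataset": "dataset-a", "Model": "model-2", "F1-score": 0.20},
    {"Dataset": "dataset-b", "Model": "model-1", "F1-score": 0.40},
    {"Dataset": "dataset-b", "Model": "model-2", "F1-score": 0.30},
]
df = pd.DataFrame(placeholder_results)

# One row per dataset, one column per model
f1_table = df.pivot(index="Dataset", columns="Model", values="F1-score")

# Column label of the highest F1-score in each row
best_per_dataset = f1_table.idxmax(axis=1)

print(f1_table)
print(best_per_dataset)
```

The same pattern applies to `summary_df` once the training loop has filled `summary_results`, with any of the metric columns passed as `values`.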


In [None]:
summary_df = pd.DataFrame(summary_results)
display(summary_df)
