# Notebook 02: Model Development (Supervised & Unsupervised)

**Scope.** Build, tune, and evaluate supervised and unsupervised baselines for the FL-IDS (IIoT surveillance). Results here feed later federated experiments and the thesis comparison tables. This notebook reads the processed datasets created earlier and saves reproducible artifacts (metrics, models). It does not upload any data to the repository.


## Objectives and Structure

**Objectives**
- Train and evaluate supervised baselines (Logistic Regression, SGD Classifier, Random Forest).
- Train and evaluate unsupervised baselines (Isolation Forest, Autoencoder).
- Record accuracy, precision, recall, F1, FP/FN (rates and counts), model sizes, and timing.
- Save artifacts for later use (metrics CSVs, model binaries, thresholds).

**Structure**
1) Supervised model development (with and without SMOTE)  
2) Unsupervised model development (Isolation Forest tuning, Autoencoder + threshold tuning)  
3) Final summary and export of a combined comparison table


## Reproducibility and Output Folders

- All experiments use a fixed `SEED` for `random`, `numpy`, and model initializers when supported.
- Output folders (created automatically) keep models and metrics separate for clarity:
  - `results/models/supervised/{no_smote|with_smote}/`
  - `results/models/unsupervised/`
  - `results/*.csv` (experiment metrics and summaries)
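
The seeding rule above could be gathered into one helper, as in the minimal sketch below. The name `set_global_seed` is hypothetical and not part of the notebook. The TensorFlow call is an assumption about how the Autoencoder could be seeded, and it is skipped when TensorFlow is not installed.

```python
import random

import numpy as np

SEED = 42  # same value the notebook uses


def set_global_seed(seed: int) -> None:
    """Seed Python's and NumPy's global RNGs, plus TensorFlow's if it is importable."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf  # optional: only relevant for the Autoencoder
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow missing; scikit-learn models still take random_state=seed


# Re-seeding reproduces the same random draws
set_global_seed(SEED)
first = np.random.rand(3)
set_global_seed(SEED)
second = np.random.rand(3)
print(np.allclose(first, second))
```

scikit-learn estimators do not read these global seeds reliably, which is why the notebook also passes `random_state=SEED` to each model.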


## 1. Supervised Model Development & Evaluation

We evaluate three classifiers on two data variants:
- **No-SMOTE**: original 80/20 stratified split  
- **With-SMOTE**: same split, then SMOTE applied **only on the training set**

Each variant gets its own `StandardScaler`, fit on that variant's training split and then applied to its test split, so test-set statistics never influence the scaling.
Metrics are computed on the untouched test set. Models are saved for size measurement and reuse.
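
The split-then-scale order can be sketched on synthetic data as follows. The arrays are random stand-ins, not the thesis dataset. SMOTE is left out to keep the sketch limited to scikit-learn; where it is used, it would act on the training part only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for the feature matrix: 1,000 rows, 4 features, roughly 20% positives
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))
y = (rng.random(1000) < 0.2).astype(int)

# 80/20 stratified split keeps the class ratio almost identical in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training split only, then reuse it for the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # zero (up to rounding) by construction
print(X_test_scaled.mean(axis=0).round(3))   # near zero, but not exactly zero
```

The test means are not exactly zero because the scaler never saw the test rows, which is the intended behaviour.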


In [4]:
# Import the required libraries and set a global random seed so the work is reproducible
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import time
import os 
import joblib
import random
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score, confusion_matrix,classification_report)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

# Set the random seed for reproducibility
SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Create the directory that will hold the supervised models
os.makedirs('results/models/supervised', exist_ok = True)
print("Libraries imported, Random Seed set, all good")


Libraries imported, Random Seed set, all good


### 1.1 Data Loading & Preprocessing (No-SMOTE and With-SMOTE)

- Read `no_smote/train.csv` and `test.csv`, then standardise features with a scaler **fit on the training data only**.  
- Read `with_smote/train.csv` and `test.csv`, then standardise features again (separate scaler).  
- Target column is `Attack_label`. Non-numeric artifacts from earlier steps have already been removed or encoded.
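
The loading cells below use absolute Windows paths, which tie the notebook to one machine. A portable alternative might look like this sketch. The environment variable `FLIDS_DATA_ROOT` and the helper `split_path` are hypothetical names. The folder layout follows the paths used in the notebook.

```python
import os
from pathlib import Path

# Hypothetical environment variable; falls back to a relative "data/processed" folder
DATA_ROOT = Path(os.environ.get("FLIDS_DATA_ROOT", "data/processed"))


def split_path(variant: str, split: str) -> Path:
    """Build the CSV path for a variant ("no_smote" or "with_smote") and a split ("train" or "test")."""
    return DATA_ROOT / "surv_supervised" / "80_20" / variant / f"{split}.csv"


print(split_path("no_smote", "train"))
```

The resulting `Path` objects can be passed directly to `pd.read_csv`.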


In [6]:
# Load the preprocessed data (No-SMOTE variant)
no_smote_path = r"D:\August-Thesis\FL-IDS-Surveillance\data\processed\surv_supervised\80_20\no_smote"
train_no_smote = pd.read_csv(f"{no_smote_path}\\train.csv", low_memory= False)
test_no_smote = pd.read_csv(f"{no_smote_path}\\test.csv", low_memory = False)

X_train_ns = train_no_smote.drop(columns = ['Attack_label'])
y_train_ns = train_no_smote['Attack_label']

X_test_ns = test_no_smote.drop(columns = ['Attack_label'])
y_test_ns = test_no_smote['Attack_label']

#Checking
print(f"No SMote - Train: {X_train_ns.shape}, and for Testing: {X_test_ns.shape}")


No SMote - Train: (1775067, 42), and for Testing: (443767, 42)


In [7]:
# Load the preprocessed data (With-SMOTE variant)
with_smote_path = r"D:\August-Thesis\FL-IDS-Surveillance\data\processed\surv_supervised\80_20\with_smote"
train_with_smote = pd.read_csv(f"{with_smote_path}\\train.csv", low_memory = False)
test_with_smote = pd.read_csv(f"{with_smote_path}\\test.csv", low_memory = False)

X_train_ws = train_with_smote.drop(columns=['Attack_label'])
y_train_ws = train_with_smote["Attack_label"]

X_test_ws = test_with_smote.drop(columns=["Attack_label"])
y_test_ws = test_with_smote["Attack_label"]

#Checking
print(f"With SMOTE- Train: {X_train_ws.shape}, Test: {X_test_ws.shape}")

With SMOTE- Train: (2585028, 42), Test: (443767, 42)


In [9]:
#Using the StandardScaler for both datasets
scaler_ns = StandardScaler()
X_train_ns_scaled = scaler_ns.fit_transform(X_train_ns)
X_test_ns_scaled = scaler_ns.transform(X_test_ns)

scaler_ws = StandardScaler()
X_train_ws_scaled = scaler_ws.fit_transform(X_train_ws)
X_test_ws_scaled = scaler_ws.transform(X_test_ws)

print('Feature Scaling : Applied successfully')

Feature Scaling : Applied successfully


### 1.2 Models and Training Setup

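We train the following baselines with scikit-learn's default hyperparameters, except for the settings noted and `random_state=SEED`:
- **Logistic Regression** (`max_iter=1000`)
- **SGD Classifier** (linear baseline; the default hinge loss makes it a linear SVM trained with SGD)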
- **Random Forest** (parallel, `n_jobs=-1`)

For each model and data variant:
- Fit on the training split, predict on the test split.
- Record metrics, FP/FN counts, FP/FN rates, model size (MB), train time (s), and inference time (ms/sample).
- Save the fitted model under `results/models/supervised/{no_smote|with_smote}/`.
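
The recorded metrics all follow from the four confusion-matrix counts. The sketch below works through the arithmetic on toy counts that are not taken from the experiments. The helper name `rates_from_counts` is hypothetical.

```python
def rates_from_counts(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Derive the reported metrics from the four confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Accuracy": (tp + tn) / (tn + fp + fn + tp),
        "Precision": precision,
        "Recall": recall,
        "F1 Score": 2 * precision * recall / (precision + recall),
        "FP Rate (%)": 100 * fp / (fp + tn),  # share of normal samples flagged as attacks
        "FN Rate (%)": 100 * fn / (fn + tp),  # share of attacks that were missed
    }


# Toy counts for illustration only
metrics = rates_from_counts(tn=90, fp=10, fn=5, tp=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The FN rate is always `100 * (1 - Recall)`, so the two columns carry the same information in different units.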


In [12]:
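def train_and_evaluate(model, model_name, X_train, y_train, X_test, y_test, save_dir):
    """Fit the model, score it on the test split, save it to save_dir, and return its metrics as a dict."""
    results = {}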

    #Training Phase
    start_train = time.time()
    model.fit(X_train, y_train)
    train_time = time.time() - start_train

    #Predicting Phase
    start_test = time.time()
    y_pred = model.predict(X_test)
    test_time = time.time() - start_test
    inference_time_per_sample = test_time / len(X_test)

    #Defining the metrics
    acc = accuracy_score(y_test, y_pred)
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    fp_rate = 100 * fp / (fp + tn)
    fn_rate = 100 * fn / (fn + tp)

    # Save the model to measure its on-disk size and to reuse it later
    os.makedirs(save_dir, exist_ok = True)
    save_path = os.path.join(save_dir, f"{model_name}.pkl")
    joblib.dump(model, save_path)
    model_size_mb = os.path.getsize(save_path) / (1024 * 1024)

    #Collecting results for the thesis
    results.update({
        'Model': model_name,
        'Accuracy': acc,
        'Precision': prec,
        'Recall': rec,
        'F1 Score': f1,
        'False Positives': fp,
        'False Negatives': fn,
        'FP Rate (%)': fp_rate,
        'FN Rate (%)': fn_rate,
        'Model Size (MB)': model_size_mb,
        'Train Time (s)': train_time,
        'Inference Time (ms/sample)': inference_time_per_sample * 1000
     })
    return results

In [13]:
def run_expirenments(models_dict, X_train, y_train, X_test, y_test, save_dir):
    # Train and evaluate every model in models_dict; returns one results row per model
    all_results = []
    for name, model in models_dict.items():
        print(f"Training {name} in progress . . .")
        result = train_and_evaluate(model, name, X_train, y_train, X_test, y_test, save_dir)
        all_results.append(result)
    return pd.DataFrame(all_results)

In [14]:
models_to_train = {
    'Logistic_Regressin' : LogisticRegression(max_iter = 1000, random_state = SEED),
    'SGD Classifier' : SGDClassifier(random_state = SEED),
    'Random_Forest' : RandomForestClassifier(n_jobs = -1, random_state = SEED)
}

results_no_smote = run_expirenments(
    models_to_train,
    X_train_ns_scaled, y_train_ns,
    X_test_ns_scaled, y_test_ns,
    "results/models/supervised/no_smote"
)

results_no_smote.to_csv("results/supervised_results_no_smote.csv", index = False)
results_no_smote.style.highlight_max(subset=["Accuracy", "Precision", "Recall", "F1 Score"], color="lightgreen", axis=0)

Training Logistic_Regressin in progress . . .
Training SGD Classifier in progress . . .
Training Random_Forest in progress . . .


|   | Model | Accuracy | Precision | Recall | F1 Score | False Positives | False Negatives | FP Rate (%) | FN Rate (%) | Model Size (MB) | Train Time (s) | Inference Time (ms/sample) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic_Regressin | 0.982667 | 0.996414 | 0.939621 | 0.967184 | 408 | 7284 | 0.126265 | 6.037899 | 0.001143 | 9.686551 | 9.8e-05 |
| 1 | SGD Classifier | 0.975415 | 0.985634 | 0.923018 | 0.953299 | 1623 | 9287 | 0.502276 | 7.698238 | 0.001386 | 5.191004 | 7.8e-05 |
| 2 | Random_Forest | 0.997821 | 0.999449 | 0.992531 | 0.995978 | 66 | 901 | 0.020425 | 0.746863 | 1.474755 | 50.454203 | 0.001066 |


In [15]:
#With SMOTE Applied now
models_to_train_smote = {
    'Logistic_Regression_SMOTE' : LogisticRegression(max_iter = 1000, random_state = SEED),
    'SGD Classifier_SMOTE' : SGDClassifier(random_state = SEED),
    'Random_Forest_SMOTE' : RandomForestClassifier(n_jobs = -1, random_state = SEED)
}

results_ws_smote = run_expirenments(
    models_to_train_smote,
    X_train_ws_scaled, y_train_ws,
    X_test_ws_scaled, y_test_ws,
    "results/models/supervised/with_smote"
)

results_ws_smote.to_csv("results/supervised_results_with_smote.csv", index = False)
results_ws_smote.style.highlight_max(
    subset=["Accuracy", "Precision", "Recall", "F1 Score"],
    color="lightgreen",
    axis=0
).highlight_min(
    subset=["FP Rate (%)", "FN Rate (%)", "Train Time (s)", "Model Size (MB)", "Inference Time (ms/sample)"],
    color="lightblue",
    axis=0
)

Training Logistic_Regression_SMOTE in progress . . .
Training SGD Classifier_SMOTE in progress . . .
Training Random_Forest_SMOTE in progress . . .


|   | Model | Accuracy | Precision | Recall | F1 Score | False Positives | False Negatives | FP Rate (%) | FN Rate (%) | Model Size (MB) | Train Time (s) | Inference Time (ms/sample) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic_Regression_SMOTE | 0.970248 | 0.9361 | 0.955802 | 0.945848 | 7871 | 5332 | 2.435869 | 4.419835 | 0.001143 | 10.738022 | 0.000158 |
| 1 | SGD Classifier_SMOTE | 0.966852 | 0.934644 | 0.944081 | 0.939339 | 7964 | 6746 | 2.46465 | 5.591936 | 0.001386 | 5.864219 | 7.6e-05 |
| 2 | Random_Forest_SMOTE | 0.995906 | 0.985162 | 1.0 | 0.992525 | 1817 | 0 | 0.562314 | 0.0 | 2.156518 | 69.797611 | 0.001177 |


## 2. Unsupervised Baselines

This section builds unsupervised anomaly detectors that learn from **normal-only** data and are evaluated on a **mixed** test set.

- **Isolation Forest (IF):** tree-based outlier detector with a small hyperparameter sweep.
- **Autoencoder (AE):** reconstructs normal traffic; anomalies have higher reconstruction error.
- **Threshold tuning:** converts AE reconstruction errors into binary predictions.
- **Outputs:** metrics CSVs, trained models, scalers, and the chosen AE threshold.
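
The thresholding step can be sketched on synthetic reconstruction errors. All numbers below are made up. The helper `flag_anomalies` is a hypothetical name. Taking the percentile from normal-only errors is an assumption of this sketch; the notebook takes percentiles of the test errors themselves.

```python
import numpy as np


def flag_anomalies(errors, reference_errors, percentile):
    """Flag samples whose error exceeds the given percentile of reference_errors."""
    threshold = np.percentile(reference_errors, percentile)
    return (errors > threshold).astype(int), threshold


rng = np.random.default_rng(42)
normal_errors = rng.exponential(scale=0.01, size=1000)       # synthetic "normal" errors
attack_errors = 0.1 + rng.exponential(scale=0.01, size=200)  # synthetic, clearly larger
test_errors = np.concatenate([normal_errors, attack_errors])
y_true = np.concatenate([np.zeros(1000, dtype=int), np.ones(200, dtype=int)])

# Reference = normal errors (sketch assumption); the notebook uses the test errors
y_pred, threshold = flag_anomalies(test_errors, normal_errors, percentile=99)
recall = (y_pred[y_true == 1] == 1).mean()
fp_rate = 100 * (y_pred[y_true == 0] == 1).mean()
print(f"threshold={threshold:.4f}, recall={recall:.3f}, FP rate={fp_rate:.2f}%")
```

Raising the percentile lowers the FP rate and usually raises the FN rate, which is the trade-off the threshold sweep explores.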


### 2.1 Data for Unsupervised Models

- **Train (normal-only):** `train_normal_only.csv`
- **Test (mixed):** `test_mixed.csv` (contains both normal and attack)
- The label column (`Attack_label`) is used **only** for evaluation on the mixed test set.


In [16]:
#Loading the Data for Unsupervised Models
# Normal-only training data
train_unsup = pd.read_csv(r"D:\August-Thesis\FL-IDS-Surveillance\data\processed\surv_unsupervised\train_normal_only.csv", low_memory = False)

X_train_unsup = train_unsup.drop(columns=["Attack_label", "http.request.method"])
y_train_unsup = train_unsup["Attack_label"]

#Mixed test data
test_unsup = pd.read_csv(r"D:\August-Thesis\FL-IDS-Surveillance\data\processed\surv_unsupervised\test_mixed.csv", low_memory = False)
X_test_unsup = test_unsup.drop(columns=["Attack_label", "http.request.method"])
y_test_unsup = test_unsup["Attack_label"]

print(f"Train (Normal Only): {X_train_unsup.shape}, Test (Mixed): {X_test_unsup.shape}")

Train (Normal Only): (1615643, 41), Test (Mixed): (2218834, 41)


In [17]:
from sklearn.preprocessing import StandardScaler, MinMaxScaler
#Standard Scaler for the Isolation Forest
scaler_iforest = StandardScaler()
X_train_if_scaled = scaler_iforest.fit_transform(X_train_unsup)
X_test_if_scaled = scaler_iforest.transform(X_test_unsup)


#MinMax Scaler for the Autoencoder
scaler_autoencoder = MinMaxScaler()
X_train_ae_scaled = scaler_autoencoder.fit_transform(X_train_unsup)
X_test_ae_scaled = scaler_autoencoder.transform(X_test_unsup)


print("Scaling is complete for the Isolation Forest and Autoencoder.")

Scaling is complete for the Isolation Forest and Autoencoder.


### 2.2 Isolation Forest — Tuning and Evaluation

**Search space (lightweight):**
- `contamination ∈ {0.01, 0.02, …, 0.10}` (10 evenly spaced values)
- `max_samples ∈ {256, 1000, 10000, 0.05, 0.1, 'auto', len(train)}`
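
As a quick check of the sweep size, the search space can be enumerated like this. The value of `n_train` is the row count of the normal-only training set printed earlier in the notebook.

```python
from itertools import product

import numpy as np

n_train = 1_615_643  # rows in the normal-only training set, as printed earlier
contamination_values = np.linspace(0.01, 0.10, 10)  # 0.01, 0.02, ..., 0.10
max_samples_values = [256, 1000, 10000, 0.05, 0.1, "auto", n_train]

grid = list(product(contamination_values, max_samples_values))
print(len(grid))  # 70 combinations, i.e. 70 Isolation Forest fits
```

In scikit-learn, an integer `max_samples` is an absolute row count, a float is a fraction of the training set, and `'auto'` means `min(256, n_samples)`. On a training set this large, `256` and `'auto'` should therefore describe the same configuration.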

**Protocol**
1. Fit IF on **normal-only** train.
2. Predict on the **mixed** test set with `predict`; its cut-off is set by `contamination`, and the output is mapped from `1` (inlier) / `-1` (outlier) to `0` (normal) / `1` (attack).
3. Record: Accuracy, Precision, Recall, F1, FP/FN rates, model size (MB), fit time (s), inference time (ms/sample).
4. Save the **best** IF model and full results table.
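
The protocol can be tried end to end on a small synthetic problem, as in the sketch below. The data are random stand-ins, and the single `contamination` value is picked only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_train_normal = rng.normal(0.0, 1.0, size=(500, 2))  # synthetic normal-only training data
X_test = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 2)),              # unseen normal points
    rng.normal(8.0, 1.0, size=(50, 2)),               # synthetic "attacks", far from normal
])
y_test = np.array([0] * 100 + [1] * 50)

model = IsolationForest(n_estimators=100, contamination=0.04, random_state=42)
model.fit(X_train_normal)

# predict() returns 1 for inliers and -1 for outliers; remap to 0 = normal, 1 = attack
y_pred = np.where(model.predict(X_test) == 1, 0, 1)

# About `contamination` of the training rows fall below the cut-off by construction
train_flagged = (model.predict(X_train_normal) == -1).mean()
print(f"flagged in training data: {train_flagged:.3f}")
print(f"recall on synthetic attacks: {(y_pred[y_test == 1] == 1).mean():.3f}")
```

Because the cut-off is derived from the training scores, `contamination` acts as a target false-alarm rate on data that resembles the training set.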

**Artifacts**
- Model: `results/models/unsupervised/isolation_forest_best.pkl` (best configuration only)
- Metrics: `results/unsupervised_iforest_tuning_results.csv` (all configurations)
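
The protocol lists model size (MB) among the recorded values. One way to measure it, together with per-sample inference time, is sketched below. The helpers `size_on_disk_mb` and `ms_per_sample` are hypothetical names. The notebook saves models with `joblib`; `pickle` is used here only to keep the sketch within the standard library.

```python
import os
import pickle
import tempfile
import time


def size_on_disk_mb(obj) -> float:
    """Serialize obj with pickle into a temporary file and return the file size in MB."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "model.pkl")
        with open(path, "wb") as fh:
            pickle.dump(obj, fh)
        return os.path.getsize(path) / (1024 * 1024)


def ms_per_sample(predict_fn, samples) -> float:
    """Time one call of predict_fn(samples) and return milliseconds per sample."""
    start = time.perf_counter()
    predict_fn(samples)
    elapsed = time.perf_counter() - start
    return 1000 * elapsed / len(samples)


# Stand-in "model": any picklable object is measured the same way
dummy_model = {"weights": list(range(10_000))}
print(f"{size_on_disk_mb(dummy_model):.4f} MB")
print(f"{ms_per_sample(lambda xs: [x * 2 for x in xs], list(range(100_000))):.6f} ms/sample")
```

`time.perf_counter` is preferred over `time.time` for short intervals because it is monotonic and has higher resolution.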


In [19]:
from sklearn.ensemble import IsolationForest
import joblib
import os
import time
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Isolation Forest: Training + Tuning & Evaluation
contamination_values = np.linspace(0.01, 0.10, 10)
max_samples_values = [256, 1000, 10000, 0.05, 0.1, 'auto', len(X_train_if_scaled)]

tuned_results = []
best_f1 = 0
best_model = None
best_params = {}

for contamination in contamination_values:
    for max_samples in max_samples_values:
        try:
            # The Model Training
            model = IsolationForest(
                n_estimators=100,
                contamination=contamination,
                max_samples=max_samples,
                random_state=SEED
            )
            start_train = time.time()
            model.fit(X_train_if_scaled)
            train_time = time.time() - start_train

            #The Prediction
            start_test = time.time()
            y_pred_raw = model.predict(X_test_if_scaled)
            # Convert: 1 = normal → 0, -1 = anomaly → 1
            y_pred = np.where(y_pred_raw == 1, 0, 1)
            test_time = time.time() - start_test
            inference_time_per_sample = test_time / len(X_test_if_scaled)

            # Metrics for Evaluation
            acc = accuracy_score(y_test_unsup, y_pred)
            prec = precision_score(y_test_unsup, y_pred, zero_division=0)
            rec = recall_score(y_test_unsup, y_pred, zero_division=0)
            f1 = f1_score(y_test_unsup, y_pred, zero_division=0)
            tn, fp, fn, tp = confusion_matrix(y_test_unsup, y_pred).ravel()
            fp_rate = 100 * fp / (fp + tn)
            fn_rate = 100 * fn / (fn + tp)

            # Saving the Best Model
            if f1 > best_f1:
                best_f1 = f1
                best_model = model
                best_params = {
                    'contamination': contamination,
                    'max_samples': max_samples,
                    'metrics': {
                        'Accuracy': acc,
                        'Precision': prec,
                        'Recall': rec,
                        'F1 Score': f1,
                        'False Positives': fp,
                        'False Negatives': fn,
                        'FP Rate (%)': fp_rate,
                        'FN Rate (%)': fn_rate,
                        'Train Time (s)': train_time,
                        'Inference Time (ms/sample)': inference_time_per_sample * 1000
                    }
                }

            # Logging this run
            tuned_results.append({
                'Contamination': contamination,
                'Max Samples': max_samples,
                'Accuracy': acc,
                'Precision': prec,
                'Recall': rec,
                'F1 Score': f1,
                'False Positives': fp,
                'False Negatives': fn,
                'FP Rate (%)': fp_rate,
                'FN Rate (%)': fn_rate,
                'Train Time (s)': train_time,
                'Inference Time (ms/sample)': inference_time_per_sample * 1000
            })

        except Exception as e:
            print(f"Skipped params contamination={contamination}, max_samples={max_samples} due to error: {e}")

# Converting the results to DataFrame
df_iforest_tuning = pd.DataFrame(tuned_results)
df_iforest_tuning.to_csv("results/unsupervised_iforest_tuning_results.csv", index=False)

# Saving the Best Model
os.makedirs("results/models/unsupervised", exist_ok=True)
joblib.dump(best_model, "results/models/unsupervised/isolation_forest_best.pkl")

# Reporting the Best Model Metrics
pd.DataFrame([{
    **best_params['metrics'],
    'Contamination': best_params['contamination'],
    'Max Samples': best_params['max_samples']
}])


|   | Accuracy | Precision | Recall | F1 Score | False Positives | False Negatives | FP Rate (%) | FN Rate (%) | Train Time (s) | Inference Time (ms/sample) | Contamination | Max Samples |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.905499 | 0.876375 | 0.759517 | 0.813773 | 64626 | 145057 | 4.000017 | 24.04827 | 111.180085 | 0.012598 | 0.04 | 1615643 |


In [21]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
import joblib
import os
import time
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Autoencoder: Baseline + Threshold Tuning + Evaluation + Model Saving
def build_autoencoder(input_dim):
    input_layer = Input(shape=(input_dim,))
    encoded = Dense(32, activation='relu')(input_layer)
    encoded = Dense(16, activation='relu')(encoded)
    encoded = Dense(8, activation='relu')(encoded)
    decoded = Dense(16, activation='relu')(encoded)
    decoded = Dense(32, activation='relu')(decoded)
    output_layer = Dense(input_dim, activation='sigmoid')(decoded)

    autoencoder = Model(inputs=input_layer, outputs=output_layer)
    autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
    return autoencoder

# Build and Train the Baseline Autoencoder
input_dim = X_train_ae_scaled.shape[1]
autoencoder = build_autoencoder(input_dim)

early_stop = EarlyStopping(
    monitor='loss',
    patience=5,
    restore_best_weights=True,
    verbose=0
)

start_train = time.time()
history = autoencoder.fit(
    X_train_ae_scaled, X_train_ae_scaled,
    epochs=50,
    batch_size=256,
    shuffle=True,
    callbacks=[early_stop],
    verbose=0
)
train_time = time.time() - start_train

# Reconstruct Test Set & Compute Errors
start_test = time.time()
X_test_reconstructed = autoencoder.predict(X_test_ae_scaled, verbose=0)
reconstruction_errors = np.mean(
    np.square(X_test_ae_scaled - X_test_reconstructed),
    axis=1
)
test_time = time.time() - start_test
inference_time_per_sample = test_time / len(X_test_ae_scaled)

# Threshold Tuning
baseline_threshold = np.percentile(reconstruction_errors, 95)
y_pred_baseline = (reconstruction_errors > baseline_threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test_unsup, y_pred_baseline).ravel()
fp_rate = 100 * fp / (fp + tn)
fn_rate = 100 * fn / (fn + tp)

# Save Model FIRST
os.makedirs("results/models/unsupervised", exist_ok=True)
autoencoder.save("results/models/unsupervised/autoencoder_baseline.h5")

# Now check Model Size
model_size_mb = round(
    os.path.getsize("results/models/unsupervised/autoencoder_baseline.h5") / (1024 * 1024),
    4
)

# Collect Baseline Metrics AFTER saving
baseline_metrics = {
    'Threshold': baseline_threshold,
    'Accuracy': accuracy_score(y_test_unsup, y_pred_baseline),
    'Precision': precision_score(y_test_unsup, y_pred_baseline),
    'Recall': recall_score(y_test_unsup, y_pred_baseline),
    'F1 Score': f1_score(y_test_unsup, y_pred_baseline),
    'False Positives': fp,
    'False Negatives': fn,
    'FP Rate (%)': fp_rate,
    'FN Rate (%)': fn_rate,
    'Model Size (MB)': model_size_mb,
    'Train Time (s)': train_time,
    'Inference Time (ms/sample)': inference_time_per_sample * 1000
}

# Threshold Sweep Tuning
best_f1 = 0
best_threshold = None
best_metrics = {}

for p in np.arange(80, 99.9, 0.1):
    threshold = np.percentile(reconstruction_errors, p)
    y_pred = (reconstruction_errors > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test_unsup, y_pred).ravel()
    fp_rate = 100 * fp / (fp + tn)
    fn_rate = 100 * fn / (fn + tp)
    f1 = f1_score(y_test_unsup, y_pred)

    if f1 > best_f1:
        best_f1 = f1
        best_threshold = threshold
        best_metrics = {
            'Threshold': threshold,
            'Accuracy': accuracy_score(y_test_unsup, y_pred),
            'Precision': precision_score(y_test_unsup, y_pred),
            'Recall': recall_score(y_test_unsup, y_pred),
            'F1 Score': f1,
            'False Positives': fp,
            'False Negatives': fn,
            'FP Rate (%)': fp_rate,
            'FN Rate (%)': fn_rate,
            'Model Size (MB)': model_size_mb,
            'Train Time (s)': train_time,
            'Inference Time (ms/sample)': inference_time_per_sample * 1000
        }

# Saving results
pd.DataFrame([baseline_metrics]).to_csv(
    "results/unsupervised_autoencoder_baseline_results.csv", index=False
)
pd.DataFrame([best_metrics]).to_csv(
    "results/unsupervised_autoencoder_tuned_results.csv", index=False
)

# Compare the results
pd.concat([
    pd.DataFrame([baseline_metrics]).assign(Version="Baseline"),
    pd.DataFrame([best_metrics]).assign(Version="Tuned")
]).set_index("Version")




| Version | Threshold | Accuracy | Precision | Recall | F1 Score | False Positives | False Negatives | FP Rate (%) | FN Rate (%) | Model Size (MB) | Train Time (s) | Inference Time (ms/sample) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 2004250.0 | 0.778149 | 1.0 | 0.183924 | 0.310702 | 0 | 492250 | 0.0 | 81.60765 | 0.0874 | 154.514615 | 0.032508 |
| Tuned | 0.001186191 | 0.908162 | 0.981317 | 0.675027 | 0.799852 | 7752 | 196021 | 0.479809 | 32.497335 | 0.0874 | 154.514615 | 0.032508 |


In [22]:
import os
import pandas as pd

# Ensure the results directory exists (in case it is missing)
os.makedirs("results", exist_ok=True)

# Load the previously generated results CSVs
supervised_no_smote = pd.read_csv(os.path.join("results", "supervised_results_no_smote.csv"))
supervised_with_smote = pd.read_csv(os.path.join("results", "supervised_results_with_smote.csv"))
unsupervised_iforest = pd.read_csv(os.path.join("results", "unsupervised_iforest_tuning_results.csv"))
unsupervised_ae_baseline = pd.read_csv(os.path.join("results", "unsupervised_autoencoder_baseline_results.csv"))
unsupervised_ae_tuned = pd.read_csv(os.path.join("results", "unsupervised_autoencoder_tuned_results.csv"))

# Adding the category column in order to identify models
supervised_no_smote["Category"] = "Supervised - No SMOTE"
supervised_with_smote["Category"] = "Supervised - With SMOTE"
unsupervised_iforest["Category"] = "Unsupervised - Isolation Forest Tuning"
unsupervised_ae_baseline["Category"] = "Unsupervised - Autoencoder Baseline"
unsupervised_ae_tuned["Category"] = "Unsupervised - Autoencoder Tuned"

# Label every Isolation Forest row (its column names already match the other tables)
unsupervised_iforest = unsupervised_iforest.assign(Model="IsolationForest (Tuned)")

# Add model column for autoencoder versions
unsupervised_ae_baseline["Model"] = "Autoencoder (Baseline)"
unsupervised_ae_tuned["Model"] = "Autoencoder (Tuned)"

# Filter Isolation Forest to keep only the best tuned model (highest F1)
best_iforest = unsupervised_iforest.sort_values(by="F1 Score", ascending=False).head(1)

# Combine all the datasets
final_summary_df = pd.concat([
    supervised_no_smote,
    supervised_with_smote,
    best_iforest,
    unsupervised_ae_baseline,
    unsupervised_ae_tuned
], ignore_index=True)

# Save the combined summary
final_summary_path = os.path.join("results", "final_combined_model_results_summary.csv")
final_summary_df.to_csv(final_summary_path, index=False)


In [23]:
#Reading the results so far

final_results_path = os.path.join("results", "final_combined_model_results_summary.csv")
final_results = pd.read_csv(final_results_path)

# Drop unnecessary columns
columns_to_drop = [
    "False Positives", "False Negatives", "Category",
    "Contamination", "Max Samples", "Threshold"
]
final_results_display = final_results.drop(columns=columns_to_drop, errors="ignore")

# Display with highlighting
styled_results = (
    final_results_display.style
    .highlight_max(
        subset=["Accuracy", "Precision", "Recall", "F1 Score"],
        color="lightgreen",
        axis=0
    )
    .highlight_min(
        subset=[
            "FP Rate (%)", "FN Rate (%)", "Train Time (s)",
            "Model Size (MB)", "Inference Time (ms/sample)"
        ],
        color="lightblue",
        axis=0
    )
)

styled_results


|   | Model | Accuracy | Precision | Recall | F1 Score | FP Rate (%) | FN Rate (%) | Model Size (MB) | Train Time (s) | Inference Time (ms/sample) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic_Regressin | 0.982667 | 0.996414 | 0.939621 | 0.967184 | 0.126265 | 6.037899 | 0.001143 | 9.686551 | 9.8e-05 |
| 1 | SGD Classifier | 0.975415 | 0.985634 | 0.923018 | 0.953299 | 0.502276 | 7.698238 | 0.001386 | 5.191004 | 7.8e-05 |
| 2 | Random_Forest | 0.997821 | 0.999449 | 0.992531 | 0.995978 | 0.020425 | 0.746863 | 1.474755 | 50.454203 | 0.001066 |
| 3 | Logistic_Regression_SMOTE | 0.970248 | 0.9361 | 0.955802 | 0.945848 | 2.435869 | 4.419835 | 0.001143 | 10.738022 | 0.000158 |
| 4 | SGD Classifier_SMOTE | 0.966852 | 0.934644 | 0.944081 | 0.939339 | 2.46465 | 5.591936 | 0.001386 | 5.864219 | 7.6e-05 |
| 5 | Random_Forest_SMOTE | 0.995906 | 0.985162 | 1.0 | 0.992525 | 0.562314 | 0.0 | 2.156518 | 69.797611 | 0.001177 |
| 6 | IsolationForest (Tuned) | 0.905499 | 0.876375 | 0.759517 | 0.813773 | 4.000017 | 24.04827 |  | 111.180085 | 0.012598 |
| 7 | Autoencoder (Baseline) | 0.778149 | 1.0 | 0.183924 | 0.310702 | 0.0 | 81.60765 | 0.0874 | 154.514615 | 0.032508 |
| 8 | Autoencoder (Tuned) | 0.908162 | 0.981317 | 0.675027 | 0.799852 | 0.479809 | 32.497335 | 0.0874 | 154.514615 | 0.032508 |


# Final Model Comparison Summary

This final comparison combines the performance of both **Supervised** and **Unsupervised** models trained on the *Edge-IIoT Surveillance Dataset*.

## Key Findings from the Experiments

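- **Random Forest (Supervised + SMOTE)** achieved the highest **Recall** (1.000, no false negatives) with an **FP rate** of 0.56%, making it the best candidate for centralized deployment when missed attacks matter most. The highest **F1 Score** (0.996) and **Accuracy** (0.998) belong to Random Forest without SMOTE, which had a 0.02% **FP rate** and a 0.75% **FN rate**.
- **Autoencoder (Tuned)** reached high **Precision** (0.981) with an **FP rate** of 0.48%, but its **Recall** was 0.675 (FN rate 32.5%). It remains the candidate for the **Federated Unsupervised IDS**.
- **Isolation Forest (Tuned)** had the best unsupervised **Recall** (0.760) and **F1 Score** (0.814) but also the highest **FP rate** of all models (4.0%), confirming earlier observations.
- **Model Sizes**: the Autoencoder is lightweight (0.087 MB), supporting its use on edge devices. The Isolation Forest size was not recorded in this notebook, so its cell in the table is empty.
- **Training and Inference Times** remain acceptable across all models. Supervised models trained in 5 to 70 s, against 111 s for the best Isolation Forest configuration and 155 s for the Autoencoder. Inference stayed below 0.04 ms per sample for every model.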