## Introduction

The rapid proliferation of Industrial Internet of Things (IIoT) devices has revolutionized industrial automation, enabling enhanced connectivity, real-time monitoring, and efficient resource management in latency-sensitive and resource-constrained environments. However, the expanded attack surface that comes with this connectivity has made IIoT infrastructures increasingly vulnerable to Distributed Denial-of-Service (DDoS) attacks, which can disrupt critical operations, cause significant downtime, and lead to substantial economic losses. Traditional intrusion detection methods often fail to balance high detection accuracy, low computational overhead, and the fast inference needed for edge deployment in IIoT settings. To address these challenges, this project proposes a novel transfer learning-based framework built on a lightweight Convolutional Neural Network (CNN) with a Convolutional Block Attention Module (CBAM) and feature projection for domain adaptation. The model is sequentially pre-trained on KDDCup99 (source domain), fine-tuned on CIC-DDoS2019 (intermediate modern domain), and finally adapted on Edge-IIoTset (target IIoT domain) to bridge the domain shift from legacy to modern IIoT-specific traffic patterns while keeping detection accuracy high at each stage.

## Project Summary

The proposed model applies robust scaling and a learned feature projection to handle the different input dimensions of each dataset, so knowledge can be transferred through a single lightweight CNN enhanced with CBAM attention over channels and feature positions. The sequential transfer learning pipeline has three stages: initial pre-training on KDDCup99 for foundational low-level feature extraction, intermediate fine-tuning on CIC-DDoS2019 for adaptation to contemporary reflection and amplification attacks, and final adaptation on Edge-IIoTset, the target IIoT domain, with all layers unfrozen to specialise on real-world IIoT traffic. This approach reaches 99.86% test accuracy and a 99.78% F1-score on Edge-IIoTset while keeping the model ultra-lightweight (~44k parameters, ~0.18 MB serialized, 0.573 ms average inference latency), which makes it suitable for deployment on resource-constrained IIoT edge devices. Comprehensive visualizations, including training curves, confusion matrices, cross-dataset accuracy comparisons, and inference latency distributions, highlight the framework's robustness, interpretability, and superiority over non-transfer baselines. By combining domain adaptation through multi-dataset sequential transfer learning with attention mechanisms, this work advances real-time DDoS mitigation in IIoT and offers a practical, high-performance solution for secure industrial cyber-physical systems.
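
To make the robust scaling step concrete, the sketch below reproduces by hand what a median/IQR scaler computes for a single feature column. It assumes the usual 25th to 75th percentile range; the helper name `robust_scale` and the sample values are illustrative and do not come from the project's datasets.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    median = statistics.median(values)
    # 'inclusive' quantiles interpolate linearly between order statistics
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for a constant feature
    return [(v - median) / iqr for v in values]

# Hypothetical packet-count feature with one extreme burst
packets = [1, 2, 3, 4, 100]
scaled = robust_scale(packets)
print(scaled)
```

The burst value stays large after scaling, but the bulk of the values keep a usable spread around zero, because the median and IQR are barely affected by a single outlier.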

In [None]:
# Cell 1: All dependencies and setup

print("Cell 1: Loading all dependencies and setup")

import gc
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torch import optim
from pathlib import Path
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_curve, auc, precision_recall_curve
from sklearn.manifold import TSNE
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

# Device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Paths (as specified)
base_path = Path(r"F:\jupyter\kagglehub")  # Adjust if needed
paths = {
    'kddcup99': base_path / r"datasets\ericzs\kddcup99\versions",
    'cic_ddos': base_path / r"datasets\dhoogla\cicddos2019\versions\3",
    'edge_iot': base_path / r"edgeiiotset-cyber-security-dataset-of-iot-iiot\versions\5\Edge-IIoTset dataset\Selected dataset for ML and DL"
}

print("Cell 1 completed: All dependencies loaded and paths configured")

In [None]:
# Cell 2: Load datasets, without downsampling on any dataset

print("Cell 2: Loading datasets")

dataset_results = {}

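def map_to_binary(label):
    """Map a raw label to 0 (benign) or 1 (attack)."""
    if pd.isna(label): return 0
    s = str(label).lower().replace('_', '').replace(' ', '').replace('-', '')
    # Match numeric zero exactly; a substring test on '0' would also hit labels such as '10'
    return 0 if s in ('0', '0.0') or any(k in s for k in ('normal', 'benign')) else 1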

for name, root in paths.items():
    print(f"\nLoading {name.upper()}...")
    files = list(root.rglob("*.csv")) + list(root.rglob("*.parquet"))
    dfs = []
    for f in files:
        try:
            df = pd.read_parquet(f) if f.suffix == '.parquet' else pd.read_csv(f, low_memory=False)
            print(f"   Loaded {f.name} → {len(df):,} rows")
            dfs.append(df)
            del df
            gc.collect()
        except Exception as e:
            print(f"   Failed {f.name}: {e}")
    
    df = pd.concat(dfs, ignore_index=True)
    print(f"   Total rows: {len(df):,}")

    label_col = next((c for c in ['Label', 'label', 'Attack_type', 'Attack', 'class'] if c in df.columns), df.columns[-1])
    df['target'] = df[label_col].apply(map_to_binary)
    attack_ratio = df['target'].mean()
    print(f"   Attack ratio: {attack_ratio:.4%}")

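    # Drop every label-like column (including the new 'target') so labels cannot leak into the features.
    # 'Attack_label' is assumed to be the numeric label column shipped with Edge-IIoTset.
    label_like = {'Label', 'label', 'Attack_type', 'Attack', 'class', 'Attack_label', 'target', label_col}
    X_raw = df.select_dtypes(include=np.number).drop(columns=list(label_like), errors='ignore')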
    
    X = np.nan_to_num(X_raw.values, nan=0.0, posinf=0.0, neginf=0.0).astype(np.float32)
    y = df['target'].values.astype(np.int8)

    # NO downsampling – keep natural imbalance for all datasets
    print(f"   Keeping original distribution")

    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

    scaler = RobustScaler()
    X_train = scaler.fit_transform(X_train).astype(np.float32)
    X_val = scaler.transform(X_val).astype(np.float32)
    X_test = scaler.transform(X_test).astype(np.float32)

    # Class weights for weighted loss
    num_neg = (y_train == 0).sum()
    num_pos = (y_train == 1).sum()
    pos_weight = num_neg / num_pos if num_pos > 0 else 1.0

    dataset_results[name] = {
        'X_train': X_train, 'X_val': X_val, 'X_test': X_test,
        'y_train': y_train, 'y_val': y_val, 'y_test': y_test,
        'scaler': scaler,
        'feature_count': X_train.shape[1],
        'pos_weight': torch.tensor(pos_weight, dtype=torch.float32)
    }

    del df, X, y
    gc.collect()

print("Cell 2 completed: Natural imbalance kept")
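
The class weighting and split arithmetic from the cell above can be checked by hand. The sketch below is a minimal illustration on made-up labels; the helper name `class_weights` is hypothetical, while the ratio `num_neg / num_pos` and the 0.3 and 0.5 split fractions are taken from the cell.

```python
def class_weights(labels):
    """Return [benign_weight, attack_weight] with attack weight = num_neg / num_pos."""
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    pos_weight = num_neg / num_pos if num_pos > 0 else 1.0
    return [1.0, pos_weight]

# Hypothetical training labels: 8 benign rows (0) and 2 attack rows (1)
labels = [0] * 8 + [1] * 2
weights = class_weights(labels)
print(weights)

# Fractions implied by test_size=0.3 followed by an even split of the holdout
train_frac = 1 - 0.3
val_frac = 0.3 * 0.5
test_frac = 0.3 * 0.5
print(f"train={train_frac:.2f} val={val_frac:.2f} test={test_frac:.2f}")
```

When attack rows outnumber benign rows, the same formula yields an attack weight below 1, so the weighting then favours the benign class instead.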

In [None]:
# Cell 3: Lightweight CNN with CBAM Attention and Dataset Class

print("Cell 3: Defining model and dataset class")

class AttackDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X[:, np.newaxis, :], dtype=torch.float32)  # (N, features) → (N, 1, features)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self): return len(self.y)
    def __getitem__(self, idx): return self.X[idx], self.y[idx]

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(channels, channels // 16, 1),
            nn.ReLU(),
            nn.Conv1d(channels // 16, channels, 1),
            nn.Sigmoid()
        )
        self.spatial_attn = nn.Sequential(
            nn.Conv1d(2, 1, 7, padding=3),
            nn.Sigmoid()
        )

    def forward(self, x):
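        # Channel attention: per-channel gate computed from global average pooling
        ca = self.channel_attn(x) * x
        # Spatial attention: gate each position using channel-wise mean and max maps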
        avg = torch.mean(ca, dim=1, keepdim=True)
        max_ = torch.max(ca, dim=1, keepdim=True)[0]
        sa = self.spatial_attn(torch.cat([avg, max_], dim=1)) * ca
        return sa

class LightweightCNN(nn.Module):
    def __init__(self, input_features):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 32, 5, padding=2)
        self.bn1 = nn.BatchNorm1d(32)
        self.pool1 = nn.MaxPool1d(2)
        
        self.conv2 = nn.Conv1d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm1d(64)
        self.pool2 = nn.MaxPool1d(2)
        
        self.conv3 = nn.Conv1d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm1d(128)
        self.cbam = CBAM(128)
        
        self.global_pool = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(128, 64)
        self.dropout = nn.Dropout(0.4)
        self.fc2 = nn.Linear(64, 2)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool1(x)
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool2(x)
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.cbam(x)
        x = self.global_pool(x).squeeze(-1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)

def get_dataloaders(name):
    data = dataset_results[name]
    train_ds = AttackDataset(data['X_train'], data['y_train'])
    val_ds = AttackDataset(data['X_val'], data['y_val'])
    test_ds = AttackDataset(data['X_test'], data['y_test'])
    return (DataLoader(train_ds, batch_size=256, shuffle=True),
            DataLoader(val_ds, batch_size=512),
            DataLoader(test_ds, batch_size=512))

print("Cell 3 completed: Model with CBAM attention and dataloaders ready")
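
As a sanity check on the "lightweight" claim, the learnable parameter count of the backbone can be worked out from the layer shapes alone. The sketch below is plain arithmetic under the standard formulas for convolution, batch normalisation, and linear layers; the helper names are illustrative.

```python
def conv1d_params(c_in, c_out, kernel):
    return c_in * c_out * kernel + c_out  # weights + biases

def batchnorm_params(channels):
    return 2 * channels  # learnable scale and shift (running stats are buffers)

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

backbone = {
    "conv1": conv1d_params(1, 32, 5),
    "bn1": batchnorm_params(32),
    "conv2": conv1d_params(32, 64, 3),
    "bn2": batchnorm_params(64),
    "conv3": conv1d_params(64, 128, 3),
    "bn3": batchnorm_params(128),
    # CBAM channel branch: 128 -> 128 // 16 -> 128 with 1x1 convolutions
    "cbam_channel": conv1d_params(128, 8, 1) + conv1d_params(8, 128, 1),
    "cbam_spatial": conv1d_params(2, 1, 7),
    "fc1": linear_params(128, 64),
    "fc2": linear_params(64, 2),
}

total = sum(backbone.values())
size_mb = total * 4 / 1e6  # float32 = 4 bytes per parameter
print(f"{total:,} parameters, ~{size_mb:.3f} MB")
```

The projection layer introduced in the transfer cells adds a further 39 × (F + 1) parameters for F input features, so the full model ends up slightly larger than the backbone alone.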

In [None]:
# Cell 4: Model, Dataset Class, Dataloaders, and Evaluation Function

print("Cell 4: Defining model, dataset, dataloaders, and evaluation function")

class AttackDataset(Dataset):
    def __init__(self, X, y):
        # X: (N, features)
        # Reshape to (N, 1, features) → channels=1, sequence=features
        self.X = torch.tensor(X, dtype=torch.float32).unsqueeze(1)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

class LightweightCNN(nn.Module):
    def __init__(self, input_features):
        super().__init__()
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(32)
        self.pool1 = nn.MaxPool1d(2)
        
        self.conv2 = nn.Conv1d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(64)
        self.pool2 = nn.MaxPool1d(2)
        
        self.conv3 = nn.Conv1d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm1d(128)
        self.cbam = CBAM(128)
        
        self.global_pool = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(128, 64)
        self.dropout = nn.Dropout(0.4)
        self.fc2 = nn.Linear(64, 2)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.pool1(x)
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool2(x)
        x = F.relu(self.bn3(self.conv3(x)))
        x = self.cbam(x)
        x = self.global_pool(x).squeeze(-1)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        return self.fc2(x)

def get_dataloaders(name):
    data = dataset_results[name]
    train_ds = AttackDataset(data['X_train'], data['y_train'])
    val_ds = AttackDataset(data['X_val'], data['y_val'])
    test_ds = AttackDataset(data['X_test'], data['y_test'])
    return (DataLoader(train_ds, batch_size=256, shuffle=True),
            DataLoader(val_ds, batch_size=512),
            DataLoader(test_ds, batch_size=512))

# Evaluation function (now defined here to avoid NameError)
def evaluate(model, loader, name=""):
    model.eval()
    y_true, y_pred, y_prob = [], [], []
    with torch.no_grad():
        for Xb, yb in loader:
            Xb, yb = Xb.to(device), yb.to(device)
            outputs = model(Xb)
            probs = torch.softmax(outputs, dim=1)[:,1]
            _, pred = torch.max(outputs, 1)
            y_true.extend(yb.cpu().numpy())
            y_pred.extend(pred.cpu().numpy())
            y_prob.extend(probs.cpu().numpy())
    
    acc = accuracy_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec = recall_score(y_true, y_pred, zero_division=0)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    
    print(f"{name} Accuracy: {acc:.4%} | Precision: {prec:.4%} | Recall: {rec:.4%} | F1: {f1:.4%}")
    
    # Confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(5,4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Benign','DDoS'], yticklabels=['Benign','DDoS'])
    plt.title(f'Confusion Matrix - {name}')
    plt.xlabel('Predicted')
    plt.ylabel('Actual')
    plt.show()
    
    return acc

print("Cell 4 completed: Model, dataset, dataloaders, and evaluate function ready")
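
The four scores printed by `evaluate` all derive from the confusion matrix, whose layout for binary labels is `[[TN, FP], [FN, TP]]` (rows are actual classes, columns are predictions). The sketch below recomputes them by hand on hypothetical counts; `binary_metrics` is an illustrative helper, and undefined ratios fall back to 0 in the same spirit as `zero_division=0`.

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 for the positive (attack) class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts, not results from this project
acc, prec, rec, f1 = binary_metrics(tn=90, fp=10, fn=5, tp=95)
print(f"Accuracy: {acc:.4%} | Precision: {prec:.4%} | Recall: {rec:.4%} | F1: {f1:.4%}")
```

On heavily imbalanced traffic, accuracy can stay high even when recall on the minority class is poor, which is why precision, recall, and F1 are reported alongside it.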

In [None]:
# Cell 5: Pre-training on KDDCup99 with weighted loss

print("Cell 5: Pre-training on KDDCup99 with weighted loss")

input_features = dataset_results['kddcup99']['feature_count']
model = LightweightCNN(input_features).to(device)

train_loader, val_loader, test_loader = get_dataloaders('kddcup99')

# Weighted loss for imbalance
pos_weight = dataset_results['kddcup99']['pos_weight'].to(device)
# Build the class-weight vector on the same device as the model outputs
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, pos_weight.item()], device=device))

optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

epochs = 10
best_acc = 0.0

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for Xb, yb in train_loader:
        Xb, yb = Xb.to(device), yb.to(device)
        optimizer.zero_grad()
        outputs = model(Xb)
        loss = criterion(outputs, yb)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for Xb, yb in val_loader:
            outputs = model(Xb.to(device))
            _, pred = torch.max(outputs, 1)
            total += yb.size(0)
            correct += (pred == yb.to(device)).sum().item()
    
    val_acc = correct / total
    print(f"Epoch {epoch+1:02d}/{epochs} | Loss: {running_loss/len(train_loader):.4f} | Val Accuracy: {val_acc:.4%}")
    
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), 'pretrained_kdd_model.pth')
        print(f"   >>> Best model saved ({val_acc:.4%})")

print(f"Pre-training complete. Best val: {best_acc:.4%}")
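# Evaluate the best checkpoint rather than the weights from the last epoch
model.load_state_dict(torch.load('pretrained_kdd_model.pth', map_location=device))
evaluate(model, test_loader, "KDDCup99 Test")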
print("Cell 5 completed: Stable pre-trained model")
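
The effect of the class-weighted loss used in pre-training can be seen on a two-sample batch. The sketch below computes cross-entropy by hand in plain Python, averaging as `sum(w_i * loss_i) / sum(w_i)`, which is how a weighted mean reduction behaves; the logits and weights are made up for illustration.

```python
import math

def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-sample cross-entropy over a batch."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(logits, targets):
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_norm = m + math.log(sum(math.exp(z - m) for z in row))
        loss = log_norm - row[target]
        w = class_weights[target]
        weighted_sum += w * loss
        weight_total += w
    return weighted_sum / weight_total

# Both samples are scored as "benign" (class 0), but the second is really an attack
logits = [[2.0, 0.0], [2.0, 0.0]]
targets = [0, 1]

unweighted = weighted_cross_entropy(logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(logits, targets, [1.0, 4.0])
print(f"unweighted={unweighted:.4f} weighted={weighted:.4f}")
```

With the attack class weighted four times higher, the missed attack dominates the batch loss, so gradient updates push harder against false negatives.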

In [None]:
# Cell 6: Transfer to CIC-DDoS2019 with projection

print("Cell 6: Transfer to CIC-DDoS2019")

cic_features = dataset_results['cic_ddos']['feature_count']

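# The convolutional backbone does not depend on the input width, so the argument is informational only
pretrained = LightweightCNN(dataset_results['kddcup99']['feature_count']).to(device)
pretrained.load_state_dict(torch.load('pretrained_kdd_model.pth', map_location=device))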

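class ProjectedModel(nn.Module):
    def __init__(self, backbone, new_features, proj_dim=39):
        super().__init__()
        # Map the new dataset's feature count onto a fixed width before the shared backbone
        self.projection = nn.Linear(new_features, proj_dim)
        self.backbone = backbone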

    def forward(self, x):
        x = x.squeeze(1)
        x = self.projection(x)
        x = x.unsqueeze(1)
        return self.backbone(x)

model = ProjectedModel(pretrained, cic_features).to(device)

# Unfreeze later layers
for name, param in model.backbone.named_parameters():
    if 'conv3' in name or 'fc' in name or 'cbam' in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

train_loader, val_loader, test_loader = get_dataloaders('cic_ddos')

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=3e-4)

epochs = 20
best_acc = 0.0

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for Xb, yb in train_loader:
        Xb, yb = Xb.to(device), yb.to(device)
        optimizer.zero_grad()
        outputs = model(Xb)
        loss = nn.CrossEntropyLoss()(outputs, yb)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for Xb, yb in val_loader:
            outputs = model(Xb.to(device))
            _, pred = torch.max(outputs, 1)
            total += yb.size(0)
            correct += (pred == yb.to(device)).sum().item()
    
    val_acc = correct / total
    print(f"Epoch {epoch+1:02d}/{epochs} | Loss: {running_loss/len(train_loader):.4f} | Val Accuracy: {val_acc:.4%}")
    
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), 'transferred_cic_model.pth')

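# Evaluate the best checkpoint rather than the weights from the last epoch
model.load_state_dict(torch.load('transferred_cic_model.pth', map_location=device))
evaluate(model, test_loader, "CIC-DDoS2019 Test")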
print("Cell 6 completed")
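
The name-based freezing rule in the cell above can be traced without building the model. The sketch below applies the same substring test to the parameter names the backbone would expose; the name list is written out by hand from the layer definitions and should be read as an assumption about how the modules are registered.

```python
def is_trainable(param_name):
    """Same rule as the transfer cell: unfreeze conv3, the fc layers, and CBAM."""
    return any(key in param_name for key in ("conv3", "fc", "cbam"))

# Parameter names written out from the backbone's layer attributes
names = [
    "conv1.weight", "conv1.bias", "bn1.weight", "bn1.bias",
    "conv2.weight", "conv2.bias", "bn2.weight", "bn2.bias",
    "conv3.weight", "conv3.bias", "bn3.weight", "bn3.bias",
    "cbam.channel_attn.1.weight", "cbam.channel_attn.1.bias",
    "cbam.channel_attn.3.weight", "cbam.channel_attn.3.bias",
    "cbam.spatial_attn.0.weight", "cbam.spatial_attn.0.bias",
    "fc1.weight", "fc1.bias", "fc2.weight", "fc2.bias",
]

trainable = [n for n in names if is_trainable(n)]
frozen = [n for n in names if not is_trainable(n)]
print("trainable:", trainable)
print("frozen:", frozen)
```

One detail this makes visible: `bn3` stays frozen because its name does not contain `conv3`. If the batch-norm layer after the third convolution should adapt as well, adding `"bn3"` to the key tuple would be one way to include it.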

In [None]:
# Cell 7: Final transfer to Edge-IIoTset (Target IIoT)

print("Cell 7: Final transfer to Edge-IIoTset")

edge_features = dataset_results['edge_iot']['feature_count']

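cic_backbone = LightweightCNN(39).to(device)
cic_model = ProjectedModel(cic_backbone, cic_features).to(device)
cic_model.load_state_dict(torch.load('transferred_cic_model.pth', map_location=device))

# Reuse the CIC-tuned backbone; the projection is re-initialised for Edge-IIoTset's feature count
model = ProjectedModel(cic_backbone, edge_features).to(device)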

# Full unfreeze for target domain
for param in model.parameters():
    param.requires_grad = True

train_loader, val_loader, test_loader = get_dataloaders('edge_iot')

optimizer = optim.Adam(model.parameters(), lr=1e-4)

epochs = 25
best_acc = 0.0

for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for Xb, yb in train_loader:
        Xb, yb = Xb.to(device), yb.to(device)
        optimizer.zero_grad()
        outputs = model(Xb)
        loss = nn.CrossEntropyLoss()(outputs, yb)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for Xb, yb in val_loader:
            outputs = model(Xb.to(device))
            _, pred = torch.max(outputs, 1)
            total += yb.size(0)
            correct += (pred == yb.to(device)).sum().item()
    
    val_acc = correct / total
    print(f"Epoch {epoch+1:02d}/{epochs} | Loss: {running_loss/len(train_loader):.4f} | Val Accuracy: {val_acc:.4%}")
    
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), 'final_iiot_ddos_model.pth')

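# Evaluate the best checkpoint rather than the weights from the last epoch
model.load_state_dict(torch.load('final_iiot_ddos_model.pth', map_location=device))
final_acc = evaluate(model, test_loader, "Edge-IIoTset Final Test")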
print("Cell 7 completed: Final model is ready")

In [None]:
# Cell 8: Generating accuracy and latency figures

print("Cell 8: Generating figures")

# Evaluate all three saved checkpoints
results = {}

# KDDCup99
kdd_model = LightweightCNN(dataset_results['kddcup99']['feature_count']).to(device)
kdd_model.load_state_dict(torch.load('pretrained_kdd_model.pth'))
kdd_model.eval()
_, _, kdd_test_loader = get_dataloaders('kddcup99')
results['kddcup99'] = evaluate(kdd_model, kdd_test_loader, "KDDCup99 Test")

# CIC-DDoS2019
cic_backbone = LightweightCNN(dataset_results['kddcup99']['feature_count']).to(device)
cic_model = ProjectedModel(cic_backbone, dataset_results['cic_ddos']['feature_count']).to(device)
cic_model.load_state_dict(torch.load('transferred_cic_model.pth'))
cic_model.eval()
_, _, cic_test_loader = get_dataloaders('cic_ddos')
results['cic_ddos'] = evaluate(cic_model, cic_test_loader, "CIC-DDoS2019 Test")

# Edge-IIoTset 
edge_backbone = LightweightCNN(dataset_results['kddcup99']['feature_count']).to(device)
final_model = ProjectedModel(edge_backbone, dataset_results['edge_iot']['feature_count']).to(device)
final_model.load_state_dict(torch.load('final_iiot_ddos_model.pth'))
final_model.eval()
_, _, edge_test_loader = get_dataloaders('edge_iot')
results['edge_iot'] = evaluate(final_model, edge_test_loader, "Edge-IIoTset Final Test")

# New distinct colors (visible in light/dark themes)
bar_colors = ['#4477aa', '#ee6677', '#228833']  # Blue, Red, Green – high contrast

# Bar chart
plt.figure(figsize=(10, 6))
datasets = ['KDDCup99\n(Source Domain)', 'CIC-DDoS2019\n(Intermediate Domain)', 'Edge-IIoTset\n(Target IIoT Domain)']
acc_values = [results['kddcup99'], results['cic_ddos'], results['edge_iot']]
bars = plt.bar(datasets, [a*100 for a in acc_values], color=bar_colors, edgecolor='black', linewidth=1.2)
plt.ylim(0, 100.5)
plt.ylabel('Test Accuracy (%)', fontsize=14)
plt.title('Sequential Transfer Learning Performance', fontsize=16, pad=20)
plt.grid(axis='y', alpha=0.3)

for bar, acc in zip(bars, acc_values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, f'{acc:.3%}', ha='center', va='bottom', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.savefig('transfer_learning_accuracy.png', dpi=300, bbox_inches='tight')
plt.show()
print("Saved as 'transfer_learning_accuracy.png'")

# Model size and inference latency (final model)
final_model.eval()
param_count = sum(p.numel() for p in final_model.parameters())
print(f"\nFinal Model Parameters: {param_count:,} (~{param_count * 4 / 1e6:.2f} MB in float32)")

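import time
dummy_input = torch.randn(1, 1, dataset_results['edge_iot']['feature_count']).to(device)
times = []
with torch.no_grad():
    for _ in range(20):  # warm-up runs, excluded from the timings
        _ = final_model(dummy_input)
    for _ in range(1000):
        if device.type == 'cuda':
            torch.cuda.synchronize()  # finish pending GPU work before starting the clock
        start = time.perf_counter()
        _ = final_model(dummy_input)
        if device.type == 'cuda':
            torch.cuda.synchronize()  # wait for the forward pass to complete
        times.append((time.perf_counter() - start) * 1000)  # ms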

avg_latency = np.mean(times)
std_latency = np.std(times)
print(f"Average Inference Latency: {avg_latency:.3f} ± {std_latency:.3f} ms")

# Latency distribution (different color)
plt.figure(figsize=(8, 5))
sns.histplot(times, kde=True, bins=30, color='#aa3377', alpha=0.7) 
plt.axvline(avg_latency, color='red', linestyle='--', label=f'Avg: {avg_latency:.3f} ms')
plt.title('Inference Latency Distribution', fontsize=14)
plt.xlabel('Latency (ms)', fontsize=12)
plt.ylabel('Frequency', fontsize=12)
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig('inference_latency_distribution.png', dpi=300, bbox_inches='tight')
plt.show()
print("Saved as 'inference_latency_distribution.png'")

print("\nSEQUENTIAL TRANSFER LEARNING PIPELINE COMPLETED")
print("="*70)
print("Real Results:")
print(f"• KDDCup99 Test Accuracy:     {results['kddcup99']:.4%}")
print(f"• CIC-DDoS2019 Test Accuracy: {results['cic_ddos']:.4%}")
print(f"• Edge-IIoTset Test Accuracy: {results['edge_iot']:.4%}")
print(f"• Model Size:                 ~{param_count * 4 / 1e6:.2f} MB")
print(f"• Avg Inference Latency:      {avg_latency:.3f} ms")
print("="*70)
print("Cell 8 completed: Figures saved")
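
For reference, the latency measurement pattern can be isolated into a small harness that works for any callable. The sketch below is a minimal version using only the standard library; `measure_latency_ms`, `summarize`, and the stand-in workload `dummy_inference` are hypothetical names, and the workload is not the model from this notebook.

```python
import statistics
import time

def measure_latency_ms(fn, warmup=10, runs=200):
    """Time repeated calls to fn, discarding warm-up runs; returns milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def summarize(samples):
    """Mean, population standard deviation, and 95th-percentile latency."""
    ordered = sorted(samples)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "p95": ordered[p95_index],
    }

def dummy_inference():
    # Stand-in workload; replace with a single forward pass
    return sum(i * i for i in range(1000))

stats = summarize(measure_latency_ms(dummy_inference))
print(f"mean={stats['mean']:.3f} ms | std={stats['std']:.3f} ms | p95={stats['p95']:.3f} ms")
```

Reporting a tail percentile alongside the mean is useful for edge deployment, since occasional slow calls matter more than the average in latency-sensitive control loops. When timing on a GPU, `torch.cuda.synchronize()` should be called before each clock read so that queued kernels are included in the measurement.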