### DAGMM on KDDCupRev

This notebook trains **DAGMM** (Deep Autoencoding Gaussian Mixture Model) with **Gaussian / Laplace / Student-t** mixture components on the **KDDCupRev** dataset.
Switch the mixture distribution with the `dist_type` parameter in the configuration cell (`'gaussian'`, `'laplace'`, or `'student_t'`).
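
To give a feel for what the choice of `dist_type` changes, here is a minimal sketch of the one-dimensional log-density for each family. It is not the repo's implementation: `log_density` is a hypothetical helper, and the location/scale parameterization and the default `nu=4.0` are assumptions chosen to mirror the configuration names used in this notebook.

```python
import math

def log_density(x, loc=0.0, scale=1.0, dist_type='gaussian', nu=4.0):
    """Log-density of a scalar x under one location-scale family (illustrative helper)."""
    z = (x - loc) / scale  # standardized residual
    if dist_type == 'gaussian':
        return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2 * math.pi)
    if dist_type == 'laplace':
        return -abs(z) - math.log(2 * scale)
    if dist_type == 'student_t':
        return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(scale)
                - (nu + 1) / 2 * math.log1p(z * z / nu))
    raise ValueError(f'unknown dist_type: {dist_type}')

# Compare how each family scores a typical point (x=0) and a far-out point (x=6).
for name in ('gaussian', 'laplace', 'student_t'):
    print(name, log_density(0.0, dist_type=name), log_density(6.0, dist_type=name))
```

The Laplace and Student-t densities have heavier tails, so they penalize a far-out point much less than the Gaussian does, which in turn changes the energy assigned to outlying samples.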

In [1]:
# If you haven't installed the repo dependencies in this environment, uncomment and run:
# !pip install -r requirements.txt
# !pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

In [2]:
import os
import sys
import json
from pathlib import Path

import torch

# Assuming this notebook lives inside the repo root; otherwise adjust:
sys.path.append(str(Path().resolve()))
from kddcup_rev import KDDCupRevLoader
from model import DaGMM
from solver import Solver

In [3]:
# ==== Configuration ====
data_path = 'kddcup.data_10_percent'   # change if your dataset lives elsewhere
dist_type = 'gaussian'   # 'gaussian' | 'laplace' | 'student_t'
student_nu = 4.0         # only used if dist_type == 'student_t'
mode = 'train'           # 'train' or 'test'

# Training params
batch_size = 1024  # adjust per dataset size
num_epochs = 100
lr = 1e-4
gmm_k = 4
lambda_energy = 0.1
lambda_cov_diag = 0.005
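
For orientation, the two `lambda_*` values above are the weights of the DAGMM objective as originally formulated: reconstruction error, plus the mean sample energy scaled by `lambda_energy`, plus a penalty on small covariance diagonals scaled by `lambda_cov_diag`. The sketch below only shows how the terms combine; `dagmm_objective` and its argument names are hypothetical, and it assumes the repo's `Solver` follows the standard formulation.

```python
def dagmm_objective(recon_error, sample_energy, cov_diag_penalty,
                    lambda_energy=0.1, lambda_cov_diag=0.005):
    """Weighted sum of the three DAGMM loss terms (illustrative, scalar inputs)."""
    return (recon_error
            + lambda_energy * sample_energy
            + lambda_cov_diag * cov_diag_penalty)

# Made-up term values, purely to show the relative weighting.
total = dagmm_objective(recon_error=0.5, sample_energy=2.0, cov_diag_penalty=10.0)
print(total)
```

With these weights the reconstruction term dominates unless the energy or the covariance penalty grows large, which is why both lambdas are usually kept small.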

In [4]:
# ==== Data loader ====
from torch.utils.data import DataLoader

dataset = KDDCupRevLoader(data_path, mode=mode)
# Shuffle only while training; keep the sample order fixed when testing
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=(mode == 'train'))
print(f'Train set size: {len(dataset.train) if mode == "train" else "N/A"}')
print(f'Test set size : {len(dataset.test)}')

Train set size: 48639
Test set size : 72958
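
As a quick sanity check on these sizes, the snippet below works out how many batches one training epoch contains. It assumes the `DataLoader` default `drop_last=False`, so the final, smaller batch is kept; the result matches the `48/48` shown in the training progress bars.

```python
import math

train_size = 48639   # from the output above
batch_size = 1024    # from the configuration cell

# With drop_last=False the last partial batch still counts as a step.
batches_per_epoch = math.ceil(train_size / batch_size)
last_batch_size = train_size - (batches_per_epoch - 1) * batch_size
print(batches_per_epoch, last_batch_size)  # 48 511
```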


In [5]:
# ==== Initialize model & solver ====
# Infer the feature count from the data (118 features after one-hot encoding)
input_dim = dataset.train.shape[1] if mode == 'train' else dataset.test.shape[1]
config = {
    'lr': lr,
    'num_epochs': num_epochs,
    'batch_size': batch_size,
    'gmm_k': gmm_k,
    'lambda_energy': lambda_energy,
    'lambda_cov_diag': lambda_cov_diag,
    'dist_type': dist_type,
    'student_nu': student_nu,
    'model_save_path': './models',
    'input_dim': input_dim
}
solver = Solver(data_loader, config)
# Fallback: resize the first/last layers only if the model was built with a different width.
# Caveat: layers swapped in here may not be tracked by an optimizer already created inside Solver.
first, last = solver.dagmm.encoder[0], solver.dagmm.decoder[-1]
if first.in_features != input_dim:
    solver.dagmm.encoder[0] = torch.nn.Linear(input_dim, first.out_features).to(first.weight.device)
    solver.dagmm.decoder[-1] = torch.nn.Linear(last.in_features, input_dim).to(last.weight.device)

In [6]:
# ==== Train or Test ====
if mode == 'train':
    solver.train()
else:
    solver.test()

100%|██████████| 48/48 [00:01<00:00, 46.53it/s]
[... one progress bar per epoch, 48 batches each; remaining output truncated ...]

In [7]:
print(f"Results for {dist_type} distribution:")
solver.test()

Results for gaussian distribution:
Threshold : -1.4645147323608398
Accuracy : 0.6655, Precision : 0.4972, Recall : 0.3112, F-score : 0.3828


(0.6655061816387511,
 0.4972078050062414,
 0.31119700645585757,
 0.3828022255943349)
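
The reported F-score can be cross-checked from the precision and recall in the tuple above, since it is their harmonic mean. This is only an arithmetic check on the printed numbers; it makes no assumption about how `solver.test()` chooses the threshold.

```python
precision = 0.4972078050062414
recall = 0.31119700645585757

# F-score (F1) is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 4))  # 0.3828
```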