### DAGMM on Thyroid

This notebook trains **DAGMM** (Deep Autoencoding Gaussian Mixture Model) on the **Thyroid** dataset, using **Gaussian**, **Laplace**, or **Student‑t** mixture components.
Select the component distribution with the `dist_type` parameter in the configuration cell (`'gaussian'`, `'laplace'`, or `'student_t'`).
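
As a rough illustration of what the three `dist_type` options change, the sketch below compares univariate log-densities of the three families in plain Python. It is not the repo's implementation: the function names are hypothetical, the multivariate covariance structure used by a DAGMM estimation network is ignored, and `nu` only mirrors the role of the `student_nu` setting.

```python
import math

def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z

def laplace_logpdf(x, mu=0.0, b=1.0):
    """Log-density of a univariate Laplace distribution with scale b."""
    return -math.log(2.0 * b) - abs(x - mu) / b

def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=4.0):
    """Log-density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))

# Compare the three families near the mean and in the tail.
for x in (0.0, 2.0, 5.0):
    print(f"x={x:4.1f}  gaussian={gaussian_logpdf(x):8.3f}  "
          f"laplace={laplace_logpdf(x):8.3f}  student_t={student_t_logpdf(x):8.3f}")
```

Far from the mean (for example at `x = 5`), the Laplace and Student‑t components assign a much higher log-density than the Gaussian, so outlying latent points produce less extreme energies under the heavy-tailed choices.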

In [1]:
## If you haven't installed the repo dependencies in this environment, uncomment and run:
# !pip install -r requirements.txt
# !pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

In [2]:
import sys
from pathlib import Path

import torch

# Assumes this notebook lives in the repo root; otherwise point this at the repo path.
sys.path.append(str(Path().resolve()))
from thyroid import ThyroidLoader
from model import DaGMM
from solver import Solver

In [None]:
# ==== Configuration ====
data_path = 'ann-train.data'   # change if your dataset lives elsewhere
dist_type = 'gaussian'   # 'gaussian' | 'laplace' | 'student_t' (the saved results at the end of this notebook come from a 'student_t' run)
student_nu = 4.0         # only used if dist_type == 'student_t'
mode = 'train'           # 'train' or 'test'

# Training params
batch_size = 1024  # adjust per dataset size
num_epochs = 100
lr = 1e-4
gmm_k = 4
lambda_energy = 0.1
lambda_cov_diag = 0.005
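
For orientation, `lambda_energy` and `lambda_cov_diag` appear to correspond to the weights $\lambda_1$ and $\lambda_2$ in the objective of the original DAGMM paper (Zong et al., 2018), which uses the same values 0.1 and 0.005. Assuming the repo follows that formulation (the mapping from config keys to symbols is an assumption, not something this notebook confirms), the training loss over a batch of $N$ samples is:

$$
J(\theta_e, \theta_d, \theta_m) = \frac{1}{N}\sum_{i=1}^{N} L(x_i, x_i') + \frac{\lambda_1}{N}\sum_{i=1}^{N} E(z_i) + \lambda_2\, P(\hat{\Sigma}),
\qquad
P(\hat{\Sigma}) = \sum_{k=1}^{K}\sum_{j=1}^{d} \frac{1}{\hat{\Sigma}_{kjj}}
$$

Here $L$ is the reconstruction error, $E(z_i)$ is the sample energy (negative log-likelihood under the mixture), $K$ plays the role of `gmm_k`, and $P$ penalizes small diagonal covariance entries to keep the covariance matrices away from singularity.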

In [4]:
# ==== Data loader ====
from torch.utils.data import DataLoader

dataset = ThyroidLoader(data_path, mode=mode)
# Shuffle only while training so that evaluation order stays deterministic.
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=(mode == 'train'))
print(f'Train set size: {len(dataset.train) if mode == "train" else "N/A"}')
print(f'Test set size : {len(dataset.test)}')

Train set size: 1839
Test set size : 1933
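
With these sizes, the number of optimizer steps per epoch follows directly from the batch size, which is why the training progress bars count `2/2`. A small sketch of the arithmetic, using the values printed above and those in the configuration cell (variable names are only for illustration):

```python
import math

train_size = 1839   # "Train set size" printed above
batch_size = 1024   # value from the configuration cell
num_epochs = 100    # value from the configuration cell

# DataLoader keeps the final partial batch by default (drop_last=False).
steps_per_epoch = math.ceil(train_size / batch_size)
last_batch_size = train_size - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, last_batch_size, total_steps)  # prints: 2 815 200
```

Two steps per epoch means only 200 gradient updates over the full run, so a smaller `batch_size` may be worth trying on a dataset of this size.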


In [5]:
# ==== Initialize model & solver ====
config = {
    'lr': lr,
    'num_epochs': num_epochs,
    'batch_size': batch_size,
    'gmm_k': gmm_k,
    'lambda_energy': lambda_energy,
    'lambda_cov_diag': lambda_cov_diag,
    'dist_type': dist_type,
    'student_nu': student_nu,
    'model_save_path': './models',
    'input_dim': 5
}
solver = Solver(data_loader, config)
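# Rebuild the first encoder layer and the last decoder layer to match the data width:
# 21 raw features (15 binary + 6 continuous).
# Caution: these layers are swapped in after Solver is constructed. If Solver creates its
# optimizer in __init__, the new layers' parameters are not registered with that optimizer.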
input_dim = dataset.train.shape[1] if mode=='train' else dataset.test.shape[1]
solver.dagmm.encoder[0] = torch.nn.Linear(input_dim, solver.dagmm.encoder[0].out_features)
solver.dagmm.decoder[-1] = torch.nn.Linear(solver.dagmm.decoder[-1].in_features, input_dim)

In [6]:
# ==== Train or Test ====
if mode == 'train':
    solver.train()
else:
    solver.test()

100%|██████████| 2/2 [00:00<00:00,  8.58it/s]

In [7]:
print(f"Results for {dist_type} distribution:")
solver.test()

Results for student_t distribution:
Threshold : -11.323864364624024
Accuracy : 0.7941, Precision : 0.1429, Recall : 0.6559, F-score : 0.2346


(0.794102431453699,
 0.14285714285714285,
 0.6559139784946236,
 0.2346153846153846)
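
The four returned values are tied together by the usual confusion-matrix definitions. The sketch below recomputes them from counts that are inferred from the reported ratios and the test set size of 1933, assuming anomalies are the positive class; the counts themselves are not printed by the notebook, and `binary_metrics` is a hypothetical helper, not part of the repo.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_score

# Inferred counts (assumption): 93 anomalies among 1933 test samples, 427 samples flagged.
tp, fp, fn, tn = 61, 366, 32, 1474

accuracy, precision, recall, f_score = binary_metrics(tp, fp, fn, tn)
print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f_score, 4))
# prints: 0.7941 0.1429 0.6559 0.2346
```

Under these counts, about six of every seven flagged samples are normal, which is what drives the low precision and F-score despite a recall of roughly 66%.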