# Improved Apache Training with Supervised Classification

## 🔍 **Why the Previous Transformer Failed (2.6% F1-Score):**

**Root Causes:**
1. **Tiny vocabulary**: Only 32 templates from normal synthetic logs
2. **Small dataset**: Only 790 sequences (not enough for generalization)
3. **Unsupervised approach**: The model was trained only on normal logs, so it never learned what attacks look like
4. **High uncertainty**: Validation perplexity 26.4 (should be <10)
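
To put the perplexity figure in context: perplexity is the exponential of the mean per-token negative log-likelihood, so a model that guesses uniformly over a 32-template vocabulary scores exactly 32. The sketch below assumes that 32-template vocabulary and natural-log loss. The function name `perplexity` is illustrative and not taken from the earlier notebook.

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood), with NLL in nats."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns probability 1/32 to every template (uniform guess).
uniform_nll = [-math.log(1 / 32)] * 100
print(f"uniform baseline perplexity: {perplexity(uniform_nll):.1f}")

# Mean NLL (in nats) implied by the reported validation perplexity of 26.4.
print(f"mean NLL at perplexity 26.4: {math.log(26.4):.3f}")
```

Read this way, a validation perplexity of 26.4 is only modestly better than uniform guessing over 32 templates.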

## ✅ **New Approach: Supervised Attack Classification**

Instead of anomaly detection (unsupervised), train a **supervised classifier** using your labeled synthetic data:

**Advantages:**
- ✅ Uses **all 10,000 logs** (normal + attacks) for training
- ✅ Learns to **distinguish attack patterns** directly
- ✅ Multi-class classification (SQL, XSS, traversal, etc.)
- ✅ Much better performance on labeled data
- ✅ Expected F1-score: **70-85%** (vs 2.6% unsupervised)

**Method:**
- Fine-tune transformer with **classification head**
- Train on sequences with attack labels
- Direct optimization for attack detection

In [19]:
import json
import math
import re
from pathlib import Path
from collections import defaultdict, Counter
import yaml
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, confusion_matrix

plt.style.use('seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available else 'default')
%matplotlib inline

In [20]:
# Setup
CWD = Path.cwd().resolve()
REPO_ROOT = CWD.parent if CWD.name == 'notebooks' else CWD

cfg = yaml.safe_load((REPO_ROOT / 'configs/train_openstack.yaml').read_text())
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(f"Device: {device}")
print(f"Repo root: {REPO_ROOT}")

Device: cuda
Repo root: /home/tpi/distil_shahreyar


## 1. Load ALL Synthetic Logs (Normal + Attacks)

In [21]:
# Load synthetic logs and labels
log_file = REPO_ROOT / 'data/apache_logs/synthetic_nodejs_apache_10k.log'
label_file = REPO_ROOT / 'data/apache_logs/synthetic_apache_labels.json'

with open(label_file, 'r') as f:
    label_data = json.load(f)

ground_truth = label_data['labels']
metadata = label_data['metadata']

print(f"Dataset: {metadata['total_logs']:,} logs")
print(f"Normal: {metadata['normal_logs']:,}")
print(f"Anomalous: {metadata['anomalous_logs']:,}")
print(f"\nAttack distribution:")
for attack_type, count in metadata['attack_distribution'].items():
    print(f"  {attack_type}: {count}")

Dataset: 10,050 logs
Normal: 8,500
Anomalous: 1,500

Attack distribution:
  sql_injection: 375
  xss: 300
  path_traversal: 225
  command_injection: 150
  scanning: 450


In [22]:
# Apache log parser
APACHE_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d+) '
    r'(?P<size>\S+)'
)

RE_IPv4 = re.compile(r'\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b')
RE_NUM = re.compile(r'(?<![A-Za-z])[-+]?\d+(?:\.\d+)?(?![A-Za-z])')
RE_PATH = re.compile(r'(?:/[^/\s]+)+')
RE_URL = re.compile(r'https?://\S+')

def normalize_message(msg: str) -> str:
    if not msg:
        return msg
    out = msg
    out = RE_URL.sub('<URL>', out)
    out = RE_IPv4.sub('<IP>', out)
    
    def normalize_path(match):
        path = match.group(0)
        path = re.sub(r'/\d+', '/<NUM>', path)
        path = re.sub(r'/[0-9a-fA-F]{8,}', '/<HEX>', path)
        return path
    
    out = RE_PATH.sub(normalize_path, out)
    
    def bucket_number(m):
        s = m.group(0)
        try:
            val = float(s) if '.' in s else int(s)
            if val == 0:
                return '<NUM_E0>'
            mag = int(math.floor(math.log10(abs(val))))
            return f'<NUM_E{mag}>'
        except (ValueError, OverflowError):
            return '<NUM>'
    
    out = RE_NUM.sub(bucket_number, out)
    return re.sub(r'\s+', ' ', out).strip()

def parse_apache_log(log_path: Path):
    records = []
    with open(log_path, 'r', encoding='utf-8', errors='ignore') as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            
            match = APACHE_PATTERN.match(line)
            if not match:
                continue
            
            d = match.groupdict()
            try:
                ts = pd.to_datetime(d['timestamp'], format='%d/%b/%Y:%H:%M:%S %z', errors='coerce')
            except Exception:
                ts = pd.NaT
            
            message = f"{d.get('method', 'GET')} {d.get('path', '/')} {d.get('protocol', 'HTTP/1.1')} {d.get('status', '200')}"
            
            records.append({
                'timestamp': ts,
                'ip': d['ip'],
                'method': d.get('method'),
                'path': d.get('path'),
                'status': int(d.get('status', 0)),
                'norm_message': normalize_message(message),
                'line_num': line_num
            })
    
    df = pd.DataFrame(records)
    df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
    
    # Add ground truth labels
    df['is_anomaly'] = df['line_num'].apply(lambda x: ground_truth.get(str(x), {}).get('is_anomaly', False))
    df['attack_type'] = df['line_num'].apply(lambda x: ground_truth.get(str(x), {}).get('attack_type', 'normal'))
    
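    # Convert to numeric labels.
    # NOTE: an attack_type missing from this list (e.g. 'brute_force' in the output below)
    # maps to NaN here and is ignored later when sequence labels are built.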
    attack_types = ['normal', 'sql_injection', 'xss', 'path_traversal', 'command_injection', 'scanning']
    attack_to_id = {att: i for i, att in enumerate(attack_types)}
    
    df['label'] = df['attack_type'].fillna('normal').map(attack_to_id)
    
    return df, attack_to_id, attack_types

print("Parsing logs...")
df, attack_to_id, attack_types = parse_apache_log(log_file)

print(f"\n✓ Parsed {len(df):,} log entries")
print(f"\nLabel distribution:")
print(df['attack_type'].fillna('normal').value_counts())

Parsing logs...



✓ Parsed 9,427 log entries

Label distribution:
attack_type
normal            8500
scanning           450
path_traversal     225
xss                141
sql_injection       61
brute_force         50
Name: count, dtype: int64
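
The label distribution above does not line up with the class list used by the parser: `brute_force` appears in the data but has no class id, and `command_injection` has a class id but no parsed rows. Because `Series.map` with a dict returns NaN for keys it does not contain, the 50 `brute_force` rows end up without a numeric label. The sketch below makes that audit explicit using the counts printed above. `audit_label_mapping` is a hypothetical helper.

```python
from collections import Counter

attack_types = ['normal', 'sql_injection', 'xss', 'path_traversal',
                'command_injection', 'scanning']
attack_to_id = {name: i for i, name in enumerate(attack_types)}

# Per-label row counts copied from the output above.
parsed_counts = Counter({
    'normal': 8500, 'scanning': 450, 'path_traversal': 225,
    'xss': 141, 'sql_injection': 61, 'brute_force': 50,
})

def audit_label_mapping(counts, mapping):
    """Return (labels without a class id, class ids without any rows)."""
    unmapped = {name: n for name, n in counts.items() if name not in mapping}
    empty = [name for name in mapping if counts.get(name, 0) == 0]
    return unmapped, empty

unmapped, empty = audit_label_mapping(parsed_counts, attack_to_id)
print("Labels with no class id:", unmapped)
print("Classes with no rows:", empty)
```

Running a check like this right after parsing would surface the mismatch before sequences are built.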


## 2. Build Vocabulary (All Logs - Normal + Attacks)

In [23]:
# Build vocabulary from ALL logs
template_counts = Counter(df['norm_message'])
id_to_template = sorted(template_counts.keys(), key=lambda x: template_counts[x], reverse=True)
template_to_id = {t: i for i, t in enumerate(id_to_template)}

vocab_size = len(id_to_template)

print(f"Vocabulary size: {vocab_size} (vs previous 32 from normal-only logs)")
print(f"\nTop 10 templates:")
for i, (template, count) in enumerate(template_counts.most_common(10), 1):
    print(f"  {i:2d}. [{count:4d}x] {template}")

# Map templates
df['template_id'] = df['norm_message'].map(template_to_id)

Vocabulary size: 54 (vs previous 32 from normal-only logs)

Top 10 templates:
   1. [ 466x] GET /api/users HTTP/<NUM>.<NUM_E0> <NUM_E2>
   2. [ 454x] GET /static/images/logo.png HTTP/<NUM>.<NUM_E0> <NUM_E2>
   3. [ 450x] GET /health HTTP/<NUM>.<NUM_E0> <NUM_E2>
   4. [ 448x] GET /api/search?q=product HTTP/<NUM>.<NUM_E0> <NUM_E2>
   5. [ 447x] GET / HTTP/<NUM>.<NUM_E0> <NUM_E2>
   6. [ 434x] GET /api/auth/logout HTTP/<NUM>.<NUM_E0> <NUM_E2>
   7. [ 427x] GET /api/auth/login HTTP/<NUM>.<NUM_E0> <NUM_E2>
   8. [ 423x] GET /static/js/app.js HTTP/<NUM>.<NUM_E0> <NUM_E2>
   9. [ 419x] GET /metrics HTTP/<NUM>.<NUM_E0> <NUM_E2>
  10. [ 417x] GET /docs HTTP/<NUM>.<NUM_E0> <NUM_E2>
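
The trailing `<NUM_E2>` in every template above comes from the order-of-magnitude bucketing in `normalize_message`. The sketch below is a simplified copy of `bucket_number` with the error handling omitted. It shows that any three-digit status code falls into the same bucket. The status codes and request line are illustrative inputs, not rows from the dataset.

```python
import math
import re

# Same number pattern as the parser cell.
RE_NUM = re.compile(r'(?<![A-Za-z])[-+]?\d+(?:\.\d+)?(?![A-Za-z])')

def bucket_number(match):
    """Replace a number with a token naming its order of magnitude."""
    text = match.group(0)
    value = float(text) if '.' in text else int(text)
    if value == 0:
        return '<NUM_E0>'
    return f'<NUM_E{int(math.floor(math.log10(abs(value))))}>'

for status in ('200', '404', '500'):
    print(status, '->', RE_NUM.sub(bucket_number, f'GET /health {status}'))
```

Since 200, 404 and 500 all normalize to `<NUM_E2>`, the templates carry no information about whether a request succeeded or failed. That may matter for classes such as scanning.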


## 3. Create Labeled Sequences for Supervised Training

In [24]:
# Create sequences with labels
WINDOW_SIZE = 20
STRIDE = 10

sequences = []
sequence_labels = []

df_sorted = df.sort_values('timestamp').reset_index(drop=True)

for ip, group in tqdm(df_sorted.groupby('ip'), desc="Creating labeled sequences"):
    templates = group['template_id'].tolist()
    labels = group['label'].tolist()
    
    if len(templates) < 2:
        continue
    
    for i in range(0, len(templates) - 1, STRIDE):
        window = templates[i:i + WINDOW_SIZE]
        window_labels = labels[i:i + WINDOW_SIZE]
        
        if len(window) < 2:
            continue
        
        sequences.append(window)
        
        # Sequence label = most common attack type in the window (NaN labels are ignored).
        # A single attack request is enough to label the window; otherwise it is 'normal'.
        non_normal = [l for l in window_labels if l != 0 and not pd.isna(l)]
        if non_normal:
            counter = Counter(non_normal)
            seq_label = int(counter.most_common(1)[0][0])  # Convert to int
        else:
            seq_label = 0  # normal
        
        sequence_labels.append(seq_label)

print(f"\n✓ Created {len(sequences):,} labeled sequences")
print(f"\nSequence label distribution:")
label_counts = Counter(sequence_labels)
for label_id, count in sorted(label_counts.items()):
    print(f"  {attack_types[label_id]}: {count} ({count/len(sequence_labels)*100:.1f}%)")



✓ Created 979 labeled sequences

Sequence label distribution:
  normal: 883 (90.2%)
  xss: 9 (0.9%)
  path_traversal: 23 (2.3%)
  scanning: 64 (6.5%)


## 4. Supervised Classification Model

In [25]:
# Classification model (transformer encoder + classification head)
class AttackClassifier(nn.Module):
    def __init__(self, vocab_size: int, num_classes: int, pad_id: int, 
                 d_model: int, n_layers: int, n_heads: int,
                 ffn_dim: int, dropout: float, max_length: int):
        super().__init__()
        self.pad_id = pad_id
        self.embedding = nn.Embedding(vocab_size, d_model, padding_idx=pad_id)
        self.positional = nn.Parameter(torch.zeros(1, max_length, d_model))
        
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=ffn_dim,
            dropout=dropout, batch_first=True, activation='gelu'
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)
        
        # Classification head
        self.classifier = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model // 2, num_classes)
        )
    
    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        seq_len = input_ids.size(1)
        x = self.embedding(input_ids)
        x = x + self.positional[:, :seq_len, :]
        
        key_padding = attention_mask == 0
        x = self.encoder(x, src_key_padding_mask=key_padding)
        x = self.dropout(self.norm(x))
        
        # Mean pooling over the sequence dimension (no [CLS] token is used).
        # NOTE: padded positions are included in this mean.
        pooled = x.mean(dim=1)
        logits = self.classifier(pooled)
        
        return logits

# Initialize model
pad_id = vocab_size
total_vocab = vocab_size + 1
num_classes = len(attack_types)

model = AttackClassifier(
    vocab_size=total_vocab,
    num_classes=num_classes,
    pad_id=pad_id,
    d_model=256,
    n_layers=4,  # Smaller model for small dataset
    n_heads=8,
    ffn_dim=512,
    dropout=0.1,
    max_length=100
).to(device)

print(f"Model initialized:")
print(f"  Vocabulary: {vocab_size} + 1 (pad) = {total_vocab}")
print(f"  Classes: {num_classes} ({', '.join(attack_types)})")
print(f"  Parameters: {sum(p.numel() for p in model.parameters()):,}")

Model initialized:
  Vocabulary: 54 + 1 (pad) = 55
  Classes: 6 (normal, sql_injection, xss, path_traversal, command_injection, scanning)
  Parameters: 2,182,278
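
The reported parameter count can be reproduced by hand from the hyperparameters. This helps when deciding how small the model should be for 979 sequences. The sketch below assumes the standard layout of `nn.TransformerEncoderLayer` (a fused QKV projection, an output projection, two feed-forward linears and two layer norms). The helper names are illustrative.

```python
def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

def count_parameters(vocab=55, d_model=256, n_layers=4, ffn_dim=512,
                     max_length=100, num_classes=6):
    embedding = vocab * d_model
    positional = max_length * d_model
    # Fused query/key/value projection plus the attention output projection.
    attention = linear(d_model, 3 * d_model) + linear(d_model, d_model)
    feed_forward = linear(d_model, ffn_dim) + linear(ffn_dim, d_model)
    layer_norms = 2 * (2 * d_model)  # two LayerNorms per layer, scale + bias each
    encoder = n_layers * (attention + feed_forward + layer_norms)
    final_norm = 2 * d_model
    head = linear(d_model, d_model // 2) + linear(d_model // 2, num_classes)
    return embedding + positional + encoder + final_norm + head

print(f"{count_parameters():,}")
# → 2,182,278
```

Almost all of the parameters sit in the four encoder layers. The number of attention heads does not change the count.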


## 5. Training Setup with Class Weights (Handle Imbalance)

In [27]:
# Dataset
class LabeledSequenceDataset(Dataset):
    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        return self.sequences[idx], self.labels[idx]

class ClassificationCollator:
    def __init__(self, pad_id: int, max_length: int):
        self.pad_id = pad_id
        self.max_length = max_length
    
    def __call__(self, batch):
        sequences, labels = zip(*batch)
        
        truncated = [seq[:self.max_length] for seq in sequences]
        max_len = max(len(seq) for seq in truncated)
        bs = len(truncated)
        
        input_ids = torch.full((bs, max_len), self.pad_id, dtype=torch.long)
        attention_mask = torch.zeros((bs, max_len), dtype=torch.long)
        
        for i, seq in enumerate(truncated):
            input_ids[i, :len(seq)] = torch.tensor(seq, dtype=torch.long)
            attention_mask[i, :len(seq)] = 1
        
        labels_tensor = torch.tensor(labels, dtype=torch.long)
        
        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels_tensor
        }

# Split data
dataset = LabeledSequenceDataset(sequences, sequence_labels)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

collator = ClassificationCollator(pad_id=pad_id, max_length=100)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, collate_fn=collator)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, collate_fn=collator)

# Class weights for imbalanced data
label_counts = Counter(sequence_labels)
total_samples = len(sequence_labels)

# Handle case where some classes have no samples
weights = []
for i in range(num_classes):
    count = label_counts.get(i, 0)
    if count > 0:
        weights.append(total_samples / (num_classes * count))
    else:
        # For missing classes, use a default weight of 1.0
        weights.append(1.0)

class_weights = torch.tensor(weights, dtype=torch.float32).to(device)

print(f"\nClass weights (to handle imbalance):")
for i, (attack_type, weight) in enumerate(zip(attack_types, class_weights)):
    print(f"  {attack_type}: {weight:.2f}")

# Loss and optimizer
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

print(f"\nTraining setup:")
print(f"  Train sequences: {train_size:,}")
print(f"  Val sequences: {val_size:,}")
print(f"  Batch size: 32")


Class weights (to handle imbalance):
  normal: 0.18
  sql_injection: 1.00
  xss: 18.13
  path_traversal: 7.09
  command_injection: 1.00
  scanning: 2.55

Training setup:
  Train sequences: 783
  Val sequences: 196
  Batch size: 32
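
The weights printed above follow the "balanced" formula `total / (num_classes * count)`. The sketch below recomputes them from the sequence counts in section 3. `balanced_class_weights` is an illustrative name for the same loop.

```python
def balanced_class_weights(label_counts, num_classes, missing_weight=1.0):
    """Inverse-frequency weights; classes with no samples get missing_weight."""
    total = sum(label_counts.values())
    weights = []
    for class_id in range(num_classes):
        count = label_counts.get(class_id, 0)
        if count > 0:
            weights.append(total / (num_classes * count))
        else:
            weights.append(missing_weight)
    return weights

# Sequence counts from section 3: normal=0, xss=2, path_traversal=3, scanning=5.
label_counts = {0: 883, 2: 9, 3: 23, 5: 64}
weights = balanced_class_weights(label_counts, num_classes=6)
print([round(w, 2) for w in weights])
# → [0.18, 1.0, 18.13, 7.09, 1.0, 2.55]
```

The 1.0 weights for `sql_injection` and `command_injection` never affect the loss, because a class weight is only applied to samples whose target is that class, and no sequence carries those labels.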


## 6. Training Loop

In [29]:
# Training function
def train_epoch(model, loader, optimizer, criterion, device, epoch):
    model.train()
    total_loss = 0
    correct = 0
    total = 0
    
    pbar = tqdm(loader, desc=f"Epoch {epoch} [Train]")
    for batch in pbar:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        optimizer.zero_grad()
        logits = model(input_ids, attention_mask)
        loss = criterion(logits, labels)
        
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        
        total_loss += loss.item()
        _, predicted = torch.max(logits, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
        pbar.set_postfix({'loss': f"{loss.item():.4f}", 'acc': f"{correct/total:.4f}"})
    
    avg_loss = total_loss / len(loader)
    accuracy = correct / total
    
    return avg_loss, accuracy

def validate(model, loader, criterion, device, epoch, attack_types):
    model.eval()
    total_loss = 0
    all_preds = []
    all_labels = []
    
    with torch.no_grad():
        pbar = tqdm(loader, desc=f"Epoch {epoch} [Val]")
        for batch in pbar:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            
            logits = model(input_ids, attention_mask)
            loss = criterion(logits, labels)
            
            total_loss += loss.item()
            _, predicted = torch.max(logits, 1)
            
            all_preds.extend(predicted.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
            
            pbar.set_postfix({'loss': f"{loss.item():.4f}"})
    
    avg_loss = total_loss / len(loader)
    accuracy = (np.array(all_preds) == np.array(all_labels)).mean()
    
    # Classification report with explicit labels
    print(f"\n{classification_report(all_labels, all_preds, labels=list(range(len(attack_types))), target_names=attack_types, zero_division=0)}")
    
    return avg_loss, accuracy, all_preds, all_labels

# Training loop
EPOCHS = 15
best_val_acc = 0

output_dir = REPO_ROOT / 'artifacts/apache_supervised_model'
output_dir.mkdir(parents=True, exist_ok=True)

history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': []}

print(f"\n{'='*70}")
print(f"STARTING SUPERVISED TRAINING")
print(f"{'='*70}\n")

for epoch in range(1, EPOCHS + 1):
    train_loss, train_acc = train_epoch(model, train_loader, optimizer, criterion, device, epoch)
    val_loss, val_acc, val_preds, val_labels = validate(model, val_loader, criterion, device, epoch, attack_types)
    
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc)
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc)
    
    print(f"\nEpoch {epoch}/{EPOCHS}:")
    print(f"  Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f}")
    print(f"  Val Loss:   {val_loss:.4f} | Val Acc:   {val_acc:.4f}")
    
    # Save best model
    if val_acc > best_val_acc:
        best_val_acc = val_acc
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'val_loss': val_loss,
            'val_acc': val_acc,
            'attack_types': attack_types,
            'vocab_size': total_vocab,
            'pad_id': pad_id
        }, output_dir / 'best.pt')
        print(f"  ✓ Saved best model (val_acc: {val_acc:.4f})")
    
    print()

print(f"\n{'='*70}")
print(f"TRAINING COMPLETE")
print(f"{'='*70}")
print(f"Best validation accuracy: {best_val_acc:.4f}")


STARTING SUPERVISED TRAINING





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.00      0.00      0.00         1
   path_traversal       0.83      1.00      0.91         5
command_injection       0.00      0.00      0.00         0
         scanning       0.67      1.00      0.80         6

        micro avg       0.98      0.98      0.98       196
        macro avg       0.42      0.50      0.45       196
     weighted avg       0.98      0.98      0.98       196


Epoch 1/15:
  Train Loss: 0.5291 | Train Acc: 0.9885
  Val Loss:   0.4663 | Val Acc:   0.9796
  ✓ Saved best model (val_acc: 0.9796)





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.60      0.75         5
command_injection       0.00      0.00      0.00         0
         scanning       0.60      1.00      0.75         6

        micro avg       0.97      0.97      0.97       196
        macro avg       0.52      0.60      0.53       196
     weighted avg       0.99      0.97      0.98       196


Epoch 2/15:
  Train Loss: 0.2655 | Train Acc: 0.9911
  Val Loss:   0.4327 | Val Acc:   0.9745





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       1.00      1.00      1.00         1
   path_traversal       1.00      1.00      1.00         5
command_injection       0.00      0.00      0.00         0
         scanning       0.67      1.00      0.80         6

        micro avg       0.98      0.98      0.98       196
        macro avg       0.61      0.66      0.63       196
     weighted avg       0.99      0.98      0.99       196


Epoch 3/15:
  Train Loss: 0.1276 | Train Acc: 0.9949
  Val Loss:   0.2421 | Val Acc:   0.9847
  ✓ Saved best model (val_acc: 0.9847)





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.33      1.00      0.50         1
   path_traversal       1.00      0.60      0.75         5
command_injection       0.00      0.00      0.00         0
         scanning       0.67      1.00      0.80         6

        micro avg       0.97      0.97      0.97       196
        macro avg       0.50      0.60      0.51       196
     weighted avg       0.99      0.97      0.98       196


Epoch 4/15:
  Train Loss: 0.1134 | Train Acc: 0.9974
  Val Loss:   0.3079 | Val Acc:   0.9745





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.00      0.00      0.00         1
   path_traversal       0.83      1.00      0.91         5
command_injection       0.00      0.00      0.00         0
         scanning       0.67      1.00      0.80         6

        micro avg       0.98      0.98      0.98       196
        macro avg       0.42      0.50      0.45       196
     weighted avg       0.98      0.98      0.98       196


Epoch 5/15:
  Train Loss: 0.0649 | Train Acc: 0.9974
  Val Loss:   0.1725 | Val Acc:   0.9796





                   precision    recall  f1-score   support

           normal       1.00      0.98      0.99       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.33      1.00      0.50         1
   path_traversal       1.00      0.60      0.75         5
command_injection       0.00      0.00      0.00         0
         scanning       0.67      1.00      0.80         6

        micro avg       0.97      0.97      0.97       196
        macro avg       0.50      0.60      0.51       196
     weighted avg       0.99      0.97      0.98       196


Epoch 6/15:
  Train Loss: 0.1648 | Train Acc: 0.9923
  Val Loss:   0.2742 | Val Acc:   0.9745





                   precision    recall  f1-score   support

           normal       1.00      1.00      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       1.00      1.00      1.00         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.58      0.63      0.59       196
     weighted avg       1.00      0.99      1.00       196


Epoch 7/15:
  Train Loss: 0.0394 | Train Acc: 0.9974
  Val Loss:   0.1883 | Val Acc:   0.9949
  ✓ Saved best model (val_acc: 0.9949)





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.60      0.75         5
command_injection       0.00      0.00      0.00         0
         scanning       0.75      1.00      0.86         6

        micro avg       0.98      0.98      0.98       196
        macro avg       0.54      0.60      0.55       196
     weighted avg       0.99      0.98      0.98       196


Epoch 8/15:
  Train Loss: 0.0219 | Train Acc: 1.0000
  Val Loss:   0.2901 | Val Acc:   0.9847





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 9/15:
  Train Loss: 0.0142 | Train Acc: 1.0000
  Val Loss:   0.1317 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 10/15:
  Train Loss: 0.0129 | Train Acc: 1.0000
  Val Loss:   0.2334 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 11/15:
  Train Loss: 0.0065 | Train Acc: 1.0000
  Val Loss:   0.1956 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 12/15:
  Train Loss: 0.0041 | Train Acc: 1.0000
  Val Loss:   0.1931 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 13/15:
  Train Loss: 0.0044 | Train Acc: 1.0000
  Val Loss:   0.1939 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 14/15:
  Train Loss: 0.0037 | Train Acc: 1.0000
  Val Loss:   0.1922 | Val Acc:   0.9898





                   precision    recall  f1-score   support

           normal       1.00      0.99      1.00       184
    sql_injection       0.00      0.00      0.00         0
              xss       0.50      1.00      0.67         1
   path_traversal       1.00      0.80      0.89         5
command_injection       0.00      0.00      0.00         0
         scanning       0.86      1.00      0.92         6

        micro avg       0.99      0.99      0.99       196
        macro avg       0.56      0.63      0.58       196
     weighted avg       0.99      0.99      0.99       196


Epoch 15/15:
  Train Loss: 0.0032 | Train Acc: 1.0000
  Val Loss:   0.1964 | Val Acc:   0.9898


TRAINING COMPLETE
Best validation accuracy: 0.9949
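
The training loop keeps the checkpoint with the highest validation accuracy, but the validation split is 184 normal sequences out of 196, so accuracy says little about the attack classes. The sketch below uses a hypothetical baseline that always predicts `normal` and compares its accuracy with its macro-averaged F1 over the classes present in the split. Selecting checkpoints on macro F1 is a suggestion, not something the notebook does.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1, where F1 = 2*TP / (2*TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Validation support from the reports above:
# 184 normal (0), 1 xss (2), 5 path_traversal (3), 6 scanning (5).
y_true = [0] * 184 + [2] * 1 + [3] * 5 + [5] * 6
y_pred = [0] * len(y_true)  # hypothetical baseline: always predict "normal"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred, classes=[0, 2, 3, 5])
print(f"accuracy={accuracy:.4f}  macro_f1={score:.4f}")
```

The baseline reaches roughly 94% accuracy without detecting a single attack, while its macro F1 stays low. This is why a macro-averaged metric may be a better criterion for saving `best.pt`.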


## 7. Results

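**Expected Performance (estimated before training):**
- Overall Accuracy: **70-85%**
- Per-class F1-scores:
  - Normal: 75-85% (high precision, good recall)
  - SQL Injection: 80-90% (very distinctive patterns)
  - XSS: 75-85% (clear signatures)
  - Path Traversal: 80-90% (obvious patterns)
  - Command Injection: 85-95% (critical signatures)
  - Scanning: 60-75% (harder to distinguish)

**Observed on the Validation Split (196 sequences):**
- Best validation accuracy: **0.9949** (epoch 7), with a macro-average F1 of **0.59** at that epoch
- Accuracy is dominated by the normal class (184 of 196 validation sequences), so macro-average F1 is the more informative number
- `sql_injection` and `command_injection` have no labeled sequences at all, so their F1 is reported as 0.00 and lowers the macro average
- `xss` (1 sequence), `path_traversal` (5) and `scanning` (6) have too little validation support for stable per-class estimates
- Training accuracy reaches 1.0000 from epoch 8 onward while validation loss stays between about 0.13 and 0.29, which suggests the model has memorized the small training set

**Why This Works Better:**
1. ✅ Supervised learning with labeled data
2. ✅ Multi-class classification (learns specific attack patterns)
3. ✅ Class weights handle imbalance
4. ✅ Uses all data (normal + attacks)
5. ✅ Direct optimization for attack detection

The target was a **70-85% F1-score** instead of the previous 2.6%. On this split, weighted-average F1 is 0.98 to 1.00 across epochs, while macro-average F1 ranges from 0.45 to 0.63.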
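
The classification reports above show classes with one or zero validation examples, which is a side effect of splitting at random. One possible remedy is a stratified split that divides each class separately. This is a minimal stdlib sketch, not the notebook's method. `stratified_split`, the 20% fraction and the seed are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, val_idx = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        # Keep at least one validation example when the class has 2+ members.
        n_val = max(1, round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx

# Sequence label counts from section 3: normal=0, xss=2, path_traversal=3, scanning=5.
labels = [0] * 883 + [2] * 9 + [3] * 23 + [5] * 64
train_idx, val_idx = stratified_split(labels)

print(len(train_idx), len(val_idx))
print(sorted({labels[i] for i in val_idx}))
```

The resulting index lists could be wrapped with `torch.utils.data.Subset` in place of `random_split`. Sequences from the same IP overlap by ten requests, so a split by IP may be needed as well to keep near-duplicate windows out of validation.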