# 🧪 CTI - MITRE ATT&CK TTP MAPPING PROJECT

## 🤖 Model: CTI-BERT (IBM Research)
- **ibm-research/CTI-BERT**: Domain-specific BERT pre-trained on Cyber Threat Intelligence data
- **Advantage**: Better at understanding security and CTI text than general-purpose BERT
- **Reference**: https://huggingface.co/ibm-research/CTI-BERT

## 📊 Dataset: Single Source
- **tumeteor/Security-TTP-Mapping** (14.9k train + 2.6k test)
- **Feature**: MITRE ATT&CK technique IDs (T-codes)
- **Advantage**: Consistent label format, high quality
- **Reference**: https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping 

### 🔧 SETUP 

In [1]:
# Update repository to latest version (get optimized tree classifiers)
!cd /content/Mitre_Attack_TTP_Mapping && git pull origin main

print("\n⚡ OPTIMIZATION APPLIED:")
print("   - ExtraTreesClassifier: 50 trees (was 100)")
print("   - RandomForestClassifier: 50 trees (was 100)")
print("   - max_features='sqrt' (~28 features per split instead of 768)")
print("   - min_samples_split=20 (faster splits)")
print("   - Expected speedup: ~4x faster training!")
print("\n   If training was stuck, Runtime > Interrupt execution, then re-run from Strategy cell\n")

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("✅ Google Colab environment detected")
    
    import torch
    if torch.cuda.is_available():
        print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
        print(f"   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    else:
        print("⚠️  GPU not found! Select Runtime > Change runtime type > GPU")
    
    print("\n📥 Downloading project...")
    !rm -rf Mitre_Attack_TTP_Mapping
    !git clone https://github.com/Aliekinozcetin/Mitre_Attack_TTP_Mapping.git
    os.chdir('Mitre_Attack_TTP_Mapping')
    print(f"✅ Working directory: {os.getcwd()}")
    
    print("\n📦 Installing packages...")
    # hf_transfer is installed because HF_HUB_ENABLE_HF_TRANSFER is set to '1' below
    !pip install -q torch transformers datasets scikit-learn pandas tqdm matplotlib seaborn hf_transfer
    print("✅ All packages installed")
    
    # HuggingFace connection tuning
    print("\n🔧 HuggingFace cache settings...")
    
    # Create cache directory
    cache_dir = '/content/hf_cache'
    os.makedirs(cache_dir, exist_ok=True)
    
    # Set environment variables
    os.environ['HF_HOME'] = cache_dir
    os.environ['TRANSFORMERS_CACHE'] = cache_dir
    os.environ['HF_DATASETS_CACHE'] = cache_dir
    os.environ['HF_HUB_DOWNLOAD_TIMEOUT'] = '600'  # 10 minutes
    os.environ['CURL_CA_BUNDLE'] = ''
    os.environ['HF_ENDPOINT'] = 'https://huggingface.co'
    os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'  # Faster downloads
    
    print(f"✅ Cache directory created: {cache_dir}")
    print(f"   Timeout: 10 minutes")
    
    # Test HuggingFace connection
    try:
        from huggingface_hub import HfApi
        api = HfApi()
        print("\n📡 Testing HuggingFace connection...")
        info = api.model_info("ibm-research/CTI-BERT", timeout=30)
        print(f"✅ Model reachable: {info.modelId}")
    except Exception as e:
        print(f"⚠️  Connection warning: {str(e)[:100]}")
        print("   Model download will still be attempted...")
else:
    print("ℹ️  Running in a local environment")

zsh:cd:1: no such file or directory: /content/Mitre_Attack_TTP_Mapping

⚡ OPTIMIZATION APPLIED:
   - ExtraTreesClassifier: 50 trees (was 100)
   - RandomForestClassifier: 50 trees (was 100)
   - max_features='sqrt' (~28 features per split instead of 768)
   - min_samples_split=20 (faster splits)
   - Expected speedup: ~4x faster training!

   If training was stuck, Runtime > Interrupt execution, then re-run from Strategy cell

ℹ️  Running in a local environment


### 📦 Import Modules & Dependencies

In [2]:
# Clear cached project modules so that updated code from the repo is re-imported
import sys
for module_name in ('src.data_loader', 'src.model', 'src.train', 'src.evaluate', 'src.strategies'):
    sys.modules.pop(module_name, None)

import torch
import numpy as np
import json
from datetime import datetime
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

# Import prepare_data first
from src.data_loader import prepare_data

# Fallback: define the function in the notebook if the import fails
try:
    from src.data_loader import load_datasets_and_prepare_dataloaders
    print("✅ Function imported from the GitHub repo")
except ImportError:
    print("⚠️  Import from the repo failed; defining the function in the notebook...")
    
    def load_datasets_and_prepare_dataloaders(
        model_name: str = "ibm-research/CTI-BERT",
        batch_size: int = 16,
        max_length: int = 512,
        use_hybrid: bool = True,
        dataset_name: str = "tumeteor/Security-TTP-Mapping"
    ):
        """Wrapper for prepare_data - notebook fallback version."""
        data = prepare_data(
            model_name=model_name,
            max_length=max_length,
            use_hybrid=use_hybrid,
            dataset_name=dataset_name
        )
        
        train_loader = DataLoader(
            data['train_dataset'],
            batch_size=batch_size,
            shuffle=True
        )
        
        return (
            train_loader,
            None,
            data['test_dataset'],
            data['label_list']
        )
    
    print("✅ Fallback function defined (CTI-BERT)")

from src.model import load_model
from src.train import train_model
from src.evaluate import evaluate_model
from src.strategies import get_strategy_config

print("✅ Modules loaded")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")


ModuleNotFoundError: No module named 'torch'

### 🔧 CONFIGURATION

In [None]:
# Base training configuration
BASE_CONFIG = {
    'model_name': 'ibm-research/CTI-BERT',  # CTI domain-specific BERT
    'batch_size': 16,
    'learning_rate': 2e-5,
    'num_epochs': 3,
    'max_length': 128,
    'device': 'cuda' if torch.cuda.is_available() else 'cpu'
}

# Output directory
OUTPUT_DIR = Path('outputs')
OUTPUT_DIR.mkdir(exist_ok=True)

# Store results from all strategies
all_test_results = {}

print("✅ Configuration set")
print(f"Model: {BASE_CONFIG['model_name']}")
print(f"Device: {BASE_CONFIG['device']}")
print(f"Output: {OUTPUT_DIR.absolute()}")


### 📊 DATA LOADING

In [None]:
print("📥 Loading data...")
print("📦 Dataset: tumeteor/Security-TTP-Mapping (Single Source)")
print(f"🤖 Model: {BASE_CONFIG['model_name']}")
print("")

# Use single dataset: tumeteor only
train_dataloader, val_dataloader, test_dataset, label_names = load_datasets_and_prepare_dataloaders(
    model_name=BASE_CONFIG['model_name'],
    batch_size=BASE_CONFIG['batch_size'],
    max_length=BASE_CONFIG['max_length'],
    use_hybrid=False,  # Single dataset: tumeteor only
    dataset_name="tumeteor/Security-TTP-Mapping"
)

# Get train_dataset from dataloader for strategies
train_dataset = train_dataloader.dataset

# Create test dataloader
test_dataloader = DataLoader(
    test_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=False
)

num_labels = len(label_names)

# Create data dict for backward compatibility
data = {
    'train_dataset': train_dataset,
    'test_dataset': test_dataset,
    'label_list': label_names,
    'num_labels': num_labels
}

print(f"✅ Data loaded")
print(f"   Train batches: {len(train_dataloader)}")
print(f"   Test batches: {len(test_dataloader)}")
print(f"   Total number of labels: {num_labels}")
print(f"   First 5 labels: {label_names[:5]}")


---

## 📊 EXPERIMENT STRUCTURE

**Execution Order (recommended):**
1. **PART A**: Data Augmentation (AUG-1 → AUG-5) - improve data quality
2. **PART B**: Loss Function Strategies (STR-1 → STR-4) - test with the best augmented data
3. **PART C**: Multi-label Classification Techniques - find the best combination

---

## 🔄 PART A: DATA AUGMENTATION EXPERIMENTS

**Priority:** Run this section BEFORE PART B and C!

This section tests **tail TTP augmentation** strategies.

### 3 Augmentation Methods:
1. **IoC Replacement** - swaps IPs, domains, hashes, and file paths (reduces overfitting to specific indicator values)
2. **Back-translation** - EN→DE→EN paraphrasing (semantic variation)
3. **Tail Oversampling** - replicates rare TTPs 3x-10x

### 5 Test Strategies:
- **AUG-1:** Baseline (No Augmentation)
- **AUG-2:** IoC Replacement Only
- **AUG-3:** Back-translation Only
- **AUG-4:** Oversampling Only
- **AUG-5:** Combined (All 3 methods)

### Expected Improvement:
- **Tail TTP Recall:** +40-60%
- **Overall mAP:** +20-30%
- **Micro F1:** +30-50%

**Note:** Each augmentation strategy is tested independently, then the results are compared.
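
Once several strategies have finished, the entries in `all_test_results` can be ranked against the baseline. The following is a minimal sketch, assuming each entry keeps its metrics under `'results'` and that a `micro_f1` key exists (the key name is an assumption, since the output of `evaluate_model` is not shown here). The numbers are placeholders that only illustrate the output shape, not measured results.

```python
def compare_to_baseline(all_results, baseline="aug_baseline", metric="micro_f1"):
    """Return (strategy, score, absolute delta vs. baseline) tuples, best first."""
    base = all_results[baseline]["results"][metric]
    rows = []
    for name, entry in all_results.items():
        score = entry["results"][metric]
        rows.append((name, score, score - base))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder values purely to show the shape of the output.
demo = {
    "aug_baseline": {"results": {"micro_f1": 0.50}},
    "aug_oversampling": {"results": {"micro_f1": 0.56}},
}
for name, score, delta in compare_to_baseline(demo):
    print(f"{name.ljust(20)} {score:.4f} ({delta:+.4f})")
```

In the notebook itself the call would be `compare_to_baseline(all_test_results)` after the AUG cells have run.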

### Augmentation Setup

Import and test the augmentation module.

In [None]:
# Import augmentation module
from src.augmentation import replace_iocs, back_translate, augment_tail_samples

# Test IoC replacement
test_text = """
The attacker used PowerShell to connect to 192.168.1.10 and downloaded malware from
http://malicious.com/payload.exe. The file was saved to C:\\Users\\Alice\\AppData\\mal.dll
with MD5 hash 5d41402abc4b2a76b9719d911017c592. Registry key HKLM\\SOFTWARE\\Test was modified.
"""

print("="*70)
print("🧪 TEST 1: IoC REPLACEMENT")
print("="*70)
print("\n📄 Original Text:")
print(test_text)
print("\n🔄 IoC Replaced:")
print(replace_iocs(test_text))

# Test back-translation (small example for speed)
simple_text = "The attacker used PowerShell to execute malicious commands and escalate privileges."

print("\n" + "="*70)
print("🧪 TEST 2: BACK-TRANSLATION")
print("="*70)
print("\n📄 Original Text:")
print(simple_text)
print("\n🔄 Back-translated (EN→DE→EN):")
print(back_translate(simple_text, device=BASE_CONFIG['device']))

print("\n✅ Augmentation module loaded successfully!\n")

### Strategy AUG-1: Baseline (No Augmentation)

**Description:** Weighted BCE without augmentation, used as the reference performance.

**Duration:** ~30-40 minutes
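
Weighted BCE scales the loss of positive examples per label so that rare techniques are not drowned out by frequent ones. How `get_strategy_config('weighted')` derives its `pos_weight` is not shown in this notebook, so the sketch below is only one common choice (negatives divided by positives, with a cap). The function name `compute_pos_weight` and the `max_weight` cap are illustrative assumptions.

```python
def compute_pos_weight(label_matrix, max_weight=50.0):
    """Per-label weight = negatives / positives, clipped to max_weight."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        if positives == 0:
            weights.append(max_weight)  # label never seen: fall back to the cap
        else:
            negatives = n_samples - positives
            weights.append(min(max_weight, negatives / positives))
    return weights


# Toy multi-hot matrix: 4 samples, 3 labels.
toy = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(compute_pos_weight(toy))
```

A list like this can be converted to a tensor and passed as `pos_weight` to `torch.nn.BCEWithLogitsLoss`, which is the usual PyTorch entry point for weighted BCE.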

In [None]:
import time

strategy_name = "aug_baseline"
print(f"\n{'='*60}")
print(f"🧪 AUG-1: Baseline (No Augmentation)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

# Use weighted BCE (best performing strategy)
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
strategy_train_dataloader = DataLoader(
    strategy_config['dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print("📋 Configuration:")
print(f"   Strategy: Weighted BCE (Baseline for comparison)")
print(f"   Augmentation: NONE")
print(f"   Num labels: {strategy_config['num_labels']}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

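# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time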
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE (No Augmentation)',
    'description': 'Baseline for augmentation comparison',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-1 COMPLETE: {strategy_name}")
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("="*60)

print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### Strategy AUG-2: IoC Replacement Only

**Description:** IoC replacement only (swaps IPs, domains, hashes, and paths).

**Advantage:** Very fast; reduces overfitting to specific indicator values.

**Duration:** ~30-40 minutes
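
To make the idea concrete, here is a minimal sketch of seeded IoC replacement. It is not the implementation of `src.augmentation.replace_iocs`, which is not shown here; the name `replace_iocs_sketch` and the two regular expressions are illustrative assumptions, and only IPv4 addresses and MD5 hashes are covered.

```python
import random
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def replace_iocs_sketch(text, seed=None):
    """Swap IPv4 addresses and MD5 hashes for random values of the same shape."""
    rng = random.Random(seed)  # local RNG: the same seed gives the same output

    def random_ip(_match):
        return ".".join(str(rng.randint(1, 254)) for _ in range(4))

    def random_md5(_match):
        return "".join(rng.choice("0123456789abcdef") for _ in range(32))

    text = IPV4_RE.sub(random_ip, text)
    return MD5_RE.sub(random_md5, text)


sample = "Beacon to 192.168.1.10, payload MD5 5d41402abc4b2a76b9719d911017c592."
print(replace_iocs_sketch(sample, seed=42))
```

Using a different seed per copy, as the cell below does with `seed=42` and `seed=123`, yields distinct variants of the same sentence while the surrounding wording and the labels stay unchanged.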

In [None]:
strategy_name = "aug_ioc_replacement"
print(f"\n{'='*60}")
print(f"🧪 AUG-2: IoC Replacement")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

# Load raw dataset for text augmentation
print("🔄 Loading raw dataset for IoC replacement...")
from datasets import load_dataset
import ast

raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
print(f"Available columns: {train_df.columns.tolist()}")
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Apply IoC replacement to texts
print("Replacing IoCs in training texts...")
augmented_train_texts = []
for text in train_df[text_column].fillna('').tolist():
    # Original + 2 IoC-replaced versions
    augmented_train_texts.append(text)  # Original
    augmented_train_texts.append(replace_iocs(text, seed=42))  # Aug 1
    augmented_train_texts.append(replace_iocs(text, seed=123))  # Aug 2

# Replicate labels accordingly (ast is already imported above)

# Parse labels
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Expand labels to match augmented texts
augmented_train_labels = []
for labels in train_labels_raw:
    augmented_train_labels.append(labels)  # Original
    augmented_train_labels.append(labels)  # Aug 1
    augmented_train_labels.append(labels)  # Aug 2

print(f"✅ Original samples: {len(train_df)}")
print(f"✅ Augmented samples: {len(augmented_train_texts)} (3x augmentation)")

# Prepare augmented data
from torch.utils.data import Dataset, DataLoader

# Create custom dataset
class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Get unique labels
        all_labels = set()
        for label_list in labels:
            all_labels.update(label_list)
        self.label_list = sorted(list(all_labels))
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_train_texts,
    augmented_train_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE + IoC Replacement (3x)',
    'description': 'Training data augmented with IoC replacement only',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-2 COMPLETE: {strategy_name}")
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("="*60)

print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### Strategy AUG-3: Back-translation Only

**Augmentation Method:** Back-translation (EN→DE→EN)
- Apply back-translation to 15% of tail TTP samples (frequency < 10)
- Expected improvement: +15-30% tail recall, +10-20% mAP
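
"Tail recall" is not defined in this notebook. One reasonable reading is micro-averaged recall restricted to the tail label columns, sketched below. The function name `tail_recall` and the multi-hot list inputs are illustrative assumptions, and the metric reported by `evaluate_model` may be computed differently.

```python
def tail_recall(y_true, y_pred, tail_indices):
    """Micro-averaged recall over the given label columns (multi-hot rows)."""
    true_pos = 0
    false_neg = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for j in tail_indices:
            if true_row[j] == 1:
                if pred_row[j] == 1:
                    true_pos += 1
                else:
                    false_neg += 1
    total = true_pos + false_neg
    return true_pos / total if total else 0.0


# Toy example: label column 2 is treated as the only tail label.
y_true = [[1, 0, 1], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(tail_recall(y_true, y_pred, tail_indices=[2]))
```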

In [None]:
strategy_name = "aug_back_translation"
print(f"\n{'='*60}")
print(f"🧪 AUG-3: Back-translation")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Identify samples with tail TTPs
tail_sample_indices = []
for idx, labels in enumerate(train_labels_raw):
    if any(label in tail_ttps for label in labels):
        tail_sample_indices.append(idx)

print(f"📊 Samples with tail TTPs: {len(tail_sample_indices)} / {len(train_df)}")

# Apply back-translation to 15% of tail samples (faster)
import random
random.seed(42)
num_to_augment = int(len(tail_sample_indices) * 0.15)
samples_to_augment = random.sample(tail_sample_indices, num_to_augment)

print(f"🔄 Applying back-translation to {num_to_augment} samples...")

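# Find text column (same detection as the other augmentation cells)
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = next((col for col in possible_text_cols if col in train_df.columns), None)
if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

# Create augmented dataset (originals first; back-translated copies are appended below)
augmented_train_texts = train_df[text_column].fillna('').tolist()
augmented_train_labels = train_labels_raw.copy()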

for idx in samples_to_augment:
    original_text = train_df.iloc[idx][text_column]
    if pd.isna(original_text) or len(original_text.strip()) < 10:
        continue
    
    # Back-translate
    try:
        bt_text = back_translate(original_text, pivot_lang='de')
        if bt_text and bt_text != original_text:
            # Add augmented sample
            augmented_train_texts.append(bt_text)
            augmented_train_labels.append(train_labels_raw[idx])
    except Exception as e:
        print(f"⚠️ Back-translation failed for sample {idx}: {e}")
        continue

print(f"Original samples: {len(train_df)}")
print(f"Augmented samples: {len(augmented_train_texts)} (+{len(augmented_train_texts) - len(train_df)} from back-translation)")

# Create custom dataset
from src.data_loader import prepare_data
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Get unique labels
        all_labels = set()
        for label_list in labels:
            all_labels.update(label_list)
        self.label_list = sorted(list(all_labels))
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_train_texts,
    augmented_train_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Back-translation (15% tail)',
    'description': 'Training data augmented with back-translation for tail TTPs',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-3 COMPLETE: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### Strategy AUG-4: Oversampling Only

**Description:** Oversampling only: tail TTPs are replicated 3x-10x.

**Advantage:** The fastest augmentation; reduces the frequency imbalance.

**Duration:** ~30-40 minutes
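
The oversampling factor used in the cell below is `max(3, min(10, 100 // min_freq))`. The sketch wraps that expression in a helper (the name `oversample_factor` and its keyword arguments are illustrative) to show how it behaves across frequencies. With `tail_threshold = 10`, every tail frequency from 1 to 9 is clamped to the upper bound, so the 3x end of the range is only reached by labels that are not classified as tail.

```python
def oversample_factor(min_freq, low=3, high=10, target=100):
    """Copies per sample: target // min_freq, clamped to the range [low, high]."""
    return max(low, min(high, target // min_freq))


# With tail_threshold = 10, a tail label has a frequency between 1 and 9.
for freq in (1, 5, 9):
    print(freq, oversample_factor(freq))

# The factor only drops below 10 from a frequency of 11 upwards.
print(oversample_factor(11), oversample_factor(20), oversample_factor(50))
```

If a graded 3x-10x schedule is the goal, either the tail threshold or the `target` constant would need to change; which one fits is a design decision for the experiment.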

In [None]:
strategy_name = "aug_oversampling"
print(f"\n{'='*60}")
print(f"🧪 AUG-4: Oversampling Only")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Get training texts
train_texts = train_df[text_column].fillna('').tolist()

# Apply oversampling only
augmented_texts = train_texts.copy()
augmented_labels = train_labels_raw.copy()

print(f"🔄 Applying oversampling to tail TTPs...")

for idx, labels in enumerate(train_labels_raw):
    # Check if sample has tail TTPs
    if any(label in tail_ttps for label in labels):
        # Calculate oversample factor based on min frequency
        min_freq = min([label_counter[label] for label in labels if label in tail_ttps])
        oversample_factor = max(3, min(10, 100 // min_freq))  # clamped to 3x-10x; tail frequencies 1-9 always give 10x
        
        # Oversample
        for _ in range(oversample_factor - 1):  # -1 because original is already in list
            augmented_texts.append(train_texts[idx])
            augmented_labels.append(labels)

print(f"Original samples: {len(train_texts)}")
print(f"Augmented samples: {len(augmented_texts)}")
print(f"Augmentation ratio: {len(augmented_texts) / len(train_texts):.2f}x")

# Create custom dataset
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Get unique labels
        all_labels = set()
        for label_list in labels:
            all_labels.update(label_list)
        self.label_list = sorted(list(all_labels))
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_texts,
    augmented_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Oversampling Only',
    'description': 'Training data augmented with tail TTP oversampling',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-4 COMPLETE: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### Strategy AUG-5: Combined (IoC + Back-translation + Oversampling)

**Configuration:**
- **Augmentation**: Combined
  - IoC Replacement (100% probability)
  - Back-translation (15% probability)
  - Tail TTP Oversampling (3x-10x)
- **Loss Function**: Weighted BCE
- **Classification**: CTI-BERT

**Expected Improvements:**
- +40-60% Tail TTP recall
- +20-30% mAP (ranking quality)
- Overall F1 and Hamming Loss improvement

**Duration:** ~50-60 minutes
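
The augmentation cells rebuild the label list from the augmented training labels, while `pos_weight` and the test set still use the label list from the data-loading step. If the two lists differ in size or order, weights and predictions end up attached to the wrong techniques. A small check like the one below can catch this before training. The helper name `check_label_alignment` is an illustrative assumption, not part of the project's `src` package.

```python
def check_label_alignment(reference_labels, candidate_labels):
    """Raise ValueError if two label lists differ in content or order."""
    if list(reference_labels) == list(candidate_labels):
        return
    missing = sorted(set(reference_labels) - set(candidate_labels))
    extra = sorted(set(candidate_labels) - set(reference_labels))
    raise ValueError(
        f"Label spaces differ: missing={missing[:5]}, extra={extra[:5]}, "
        f"sizes={len(reference_labels)} vs {len(candidate_labels)}"
    )


# Toy example with two technique IDs; identical lists pass silently.
check_label_alignment(["T1059", "T1566"], ["T1059", "T1566"])
```

In this notebook the call would be `check_label_alignment(label_names, aug_train_dataset.label_list)`, placed right after the augmented dataset is created.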

In [None]:
strategy_name = "aug_combined"
print(f"\n{'='*60}")
print(f"🧪 AUG-5: Combined Augmentation (All Methods)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Get training texts
train_texts = train_df[text_column].fillna('').tolist()

print(f"🔄 Applying COMBINED augmentation...")
print(f"   - IoC Replacement")
print(f"   - Back-translation (15% probability)")
print(f"   - Tail Oversampling (3x-10x)")

# Apply combined augmentation
augmented_texts = train_texts.copy()
augmented_labels = train_labels_raw.copy()

import random

for idx, labels in enumerate(train_labels_raw):
    # Check if sample has tail TTPs
    if any(label in tail_ttps for label in labels):
        # Calculate oversample factor from the rarest tail label in this sample.
        # Tail labels have frequency < tail_threshold (10), so 100 // min_freq >= 11
        # and the clamp below currently always yields 10.
        min_freq = min(label_counter[label] for label in labels if label in tail_ttps)
        oversample_factor = max(3, min(10, 100 // min_freq))
        
        # Oversample with augmentation
        for _ in range(oversample_factor - 1):  # -1 because original is already in list
            augmented_text = train_texts[idx]
            
            # Apply IoC replacement
            augmented_text = replace_iocs(augmented_text)
            
            # Apply back-translation with 15% probability
            if random.random() < 0.15:
                augmented_text = back_translate(augmented_text, device=BASE_CONFIG['device'])
            
            augmented_texts.append(augmented_text)
            augmented_labels.append(labels)

print(f"\nOriginal samples: {len(train_texts)}")
print(f"Augmented samples: {len(augmented_texts)}")
print(f"Augmentation ratio: {len(augmented_texts) / len(train_texts):.2f}x")

# Create custom dataset
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Get unique labels
        all_labels = set()
        for label_list in labels:
            all_labels.update(label_list)
        self.label_list = sorted(list(all_labels))
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_texts,
    augmented_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

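# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# pos_weight and the test set use the original label space (label_names), while the
# augmented dataset rebuilds its own sorted label list. Warn if the two differ,
# since the loss weights and the evaluation would then be misaligned.
if list(aug_train_dataset.label_list) != list(label_names):
    print("⚠️  Label space of the augmented dataset differs from label_names; "
          "pos_weight and test metrics may be misaligned.")

# Load model
print("\n🔧 Loading model...")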
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Combined Augmentation',
    'description': 'Training data augmented with IoC replacement, back-translation, and tail oversampling',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-5 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

## 📊 Augmentation Results Comparison

Compare all augmentation strategies and identify the best performer
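
The comparison reports each metric as a relative change over `aug_baseline`, that is `(value - baseline) / baseline * 100`. A minimal sketch of that arithmetic follows. The scores are hypothetical placeholders rather than measured results, and `relative_delta` is an illustrative helper name.

```python
import math

def relative_delta(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline, rounded to 2 decimals."""
    if baseline == 0:
        return math.nan  # relative change is undefined for a zero baseline
    return round((value - baseline) / baseline * 100, 2)

# Hypothetical mAP scores for illustration only; these are not measured results.
example_map = {
    "aug_baseline": 0.40,
    "aug_oversampling": 0.46,
    "aug_back_translation": 0.38,
}
baseline = example_map["aug_baseline"]
deltas = {name: relative_delta(score, baseline) for name, score in example_map.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}%")
```

A positive delta is an improvement only for higher-is-better metrics such as mAP or F1. For Hamming Loss, a negative delta is the better outcome.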

In [None]:
# Extract augmentation results
aug_strategies = ['aug_baseline', 'aug_ioc_replacement', 'aug_back_translation', 'aug_oversampling', 'aug_combined']

# Create comparison dataframe
aug_comparison = []
for strategy_name in aug_strategies:
    if strategy_name in all_test_results:
        result_dict = all_test_results[strategy_name]
        row = {
            'Strategy': strategy_name,
            'Config': result_dict['config'],
            'Description': result_dict['description']
        }
        # Add metrics
        for metric, value in result_dict['results'].items():
            if isinstance(value, (int, float)):
                row[metric] = value
        aug_comparison.append(row)

aug_df = pd.DataFrame(aug_comparison)

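# Calculate improvements over baseline (relative change in percent)
if len(aug_df) > 0 and 'aug_baseline' in aug_df['Strategy'].values:
    baseline_idx = aug_df[aug_df['Strategy'] == 'aug_baseline'].index[0]
    
    # Snapshot the metric columns first so the new *_delta columns are never revisited
    metric_cols = [
        col for col in aug_df.columns
        if col not in ['Strategy', 'Config', 'Description']
        and pd.api.types.is_numeric_dtype(aug_df[col])
    ]
    
    # Calculate deltas
    for col in metric_cols:
        baseline_val = aug_df.loc[baseline_idx, col]
        if baseline_val == 0:
            # Relative change is undefined for a zero baseline
            aug_df[f'{col}_delta'] = float('nan')
        else:
            aug_df[f'{col}_delta'] = ((aug_df[col] - baseline_val) / baseline_val * 100).round(2)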

# Sort by mAP (descending)
if 'mean_average_precision' in aug_df.columns:
    aug_df = aug_df.sort_values('mean_average_precision', ascending=False)

# Display comparison
print("\n" + "="*80)
print("📊 AUGMENTATION STRATEGIES COMPARISON")
print("="*80 + "\n")

# Key metrics to display
key_metrics = ['micro_f1', 'mean_average_precision', 'recall_at_5', 'recall_at_10']

for idx, row in aug_df.iterrows():
    print(f"\n{'='*80}")
    print(f"🎯 {row['Strategy'].upper()}")
    print(f"{'='*80}")
    print(f"Config: {row['Config']}")
    print(f"Description: {row['Description']}\n")
    
    print("Metrics:")
    for metric in key_metrics:
        if metric in row:
            value = row[metric]
            delta_col = f'{metric}_delta'
            if delta_col in row and pd.notna(row[delta_col]):
                delta = row[delta_col]
                delta_str = f" ({'+' if delta > 0 else ''}{delta:.2f}%)"
                print(f"  {metric}: {value:.4f}{delta_str}")
            else:
                print(f"  {metric}: {value:.4f}")

# Create visualization
import matplotlib.pyplot as plt

if len(aug_df) > 0:
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Plot 1: Micro F1
    ax = axes[0, 0]
    strategies = aug_df['Strategy'].tolist()
    micro_f1 = aug_df['micro_f1'].tolist()
    bars = ax.bar(range(len(strategies)), micro_f1, color=['gray' if 'baseline' in s else 'skyblue' for s in strategies])
    ax.set_xticks(range(len(strategies)))
    ax.set_xticklabels(strategies, rotation=45, ha='right')
    ax.set_ylabel('Micro F1')
    ax.set_title('Micro F1 Comparison')
    ax.grid(axis='y', alpha=0.3)
    
    # Annotate bars
    for i, (bar, val) in enumerate(zip(bars, micro_f1)):
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005, 
                f'{val:.4f}', ha='center', va='bottom', fontsize=9)
    
    # Plot 2: mAP
    ax = axes[0, 1]
    if 'mean_average_precision' in aug_df.columns:
        map_values = aug_df['mean_average_precision'].tolist()
        bars = ax.bar(range(len(strategies)), map_values, color=['gray' if 'baseline' in s else 'lightcoral' for s in strategies])
        ax.set_xticks(range(len(strategies)))
        ax.set_xticklabels(strategies, rotation=45, ha='right')
        ax.set_ylabel('mAP')
        ax.set_title('Mean Average Precision (mAP) Comparison')
        ax.grid(axis='y', alpha=0.3)
        
        for i, (bar, val) in enumerate(zip(bars, map_values)):
            ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005, 
                    f'{val:.4f}', ha='center', va='bottom', fontsize=9)
    
    # Plot 3: Recall@5 and Recall@10
    ax = axes[1, 0]
    x = np.arange(len(strategies))
    width = 0.35
    if 'recall_at_5' in aug_df.columns and 'recall_at_10' in aug_df.columns:
        r5 = aug_df['recall_at_5'].tolist()
        r10 = aug_df['recall_at_10'].tolist()
        ax.bar(x - width/2, r5, width, label='Recall@5', color='lightgreen')
        ax.bar(x + width/2, r10, width, label='Recall@10', color='darkgreen')
        ax.set_xticks(x)
        ax.set_xticklabels(strategies, rotation=45, ha='right')
        ax.set_ylabel('Recall')
        ax.set_title('Recall@K Comparison')
        ax.legend()
        ax.grid(axis='y', alpha=0.3)
    
    # Plot 4: Improvement over baseline (%)
    ax = axes[1, 1]
    if 'micro_f1_delta' in aug_df.columns:
        # Filter out baseline
        aug_df_filtered = aug_df[aug_df['Strategy'] != 'aug_baseline']
        if len(aug_df_filtered) > 0:
            strategies_filtered = aug_df_filtered['Strategy'].tolist()
            f1_delta = aug_df_filtered['micro_f1_delta'].tolist()
            map_delta = aug_df_filtered['mean_average_precision_delta'].tolist() if 'mean_average_precision_delta' in aug_df_filtered.columns else [0] * len(strategies_filtered)
            
            x = np.arange(len(strategies_filtered))
            width = 0.35
            ax.bar(x - width/2, f1_delta, width, label='Micro F1 Δ%', color='skyblue')
            ax.bar(x + width/2, map_delta, width, label='mAP Δ%', color='lightcoral')
            ax.set_xticks(x)
            ax.set_xticklabels(strategies_filtered, rotation=45, ha='right')
            ax.set_ylabel('Improvement over Baseline (%)')
            ax.set_title('Relative Improvement over Baseline')
            ax.axhline(y=0, color='black', linestyle='--', linewidth=0.8)
            ax.legend()
            ax.grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Summary recommendation
print("\n" + "="*80)
print("🏆 RECOMMENDATION")
print("="*80)

if len(aug_df) > 1:
    # Exclude baseline
    aug_df_filtered = aug_df[aug_df['Strategy'] != 'aug_baseline']
    if len(aug_df_filtered) > 0:
        # Rank explicitly so the pick does not depend on the earlier sort
        rank_col = 'mean_average_precision' if 'mean_average_precision' in aug_df_filtered.columns else 'micro_f1'
        best_row = aug_df_filtered.sort_values(rank_col, ascending=False).iloc[0]
        best_strategy = best_row['Strategy']
        best_map = best_row['mean_average_precision'] if 'mean_average_precision' in aug_df_filtered.columns else 0
        best_f1 = best_row['micro_f1']
        
        print(f"\n✅ Best performing strategy: {best_strategy.upper()}")
        print(f"   - mAP: {best_map:.4f}")
        print(f"   - Micro F1: {best_f1:.4f}")
        
        if 'mean_average_precision_delta' in aug_df_filtered.columns:
            map_delta = best_row['mean_average_precision_delta']
            f1_delta = best_row['micro_f1_delta']
            print(f"\n   Change relative to baseline:")
            # The :+ format prints the correct sign for both gains and regressions
            print(f"   - mAP: {map_delta:+.2f}%")
            print(f"   - Micro F1: {f1_delta:+.2f}%")

print("\n")

---

## 🔄 PART B: LOSS FUNCTION STRATEGIES

**Note:** Run these strategies after PART A, using the best augmentation method.

Each strategy can be tested independently; run whichever cell you need.

### 🔹 Strategy B-1: Baseline (Standard BCE Loss)

**Description:** Uses the standard Binary Cross-Entropy loss. Serves as the baseline for reference performance.

**Part B Baseline:** Run this to compare against the best augmentation result from PART A (30-45 minutes).
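
For reference, the sketch below writes out the standard binary cross-entropy on logits for a multi-label sample, using only the standard library. It illustrates the textbook formula and is not the loss code inside the project's `load_model`; the function names and the toy logits are assumptions made for this example.

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for a single logit/target pair."""
    # Equivalent to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def multilabel_bce(logits, targets):
    """Mean BCE over all label positions of one sample (multi-label setting)."""
    per_label = [bce_with_logits(x, y) for x, y in zip(logits, targets)]
    return sum(per_label) / len(per_label)

# Toy example: 4 labels, only the first one is active
logits = [2.0, -1.0, -3.0, 0.5]
targets = [1.0, 0.0, 0.0, 0.0]
print(f"BCE: {multilabel_bce(logits, targets):.4f}")
```

Every label position contributes equally here, which is why rare labels tend to be dominated by the many negative positions when no weighting is applied.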

In [None]:
strategy_name = "baseline"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 1: Baseline BCE")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['pos_weight'] is not None:
    print(f"   Pos weight: min={strategy_config['pos_weight'].min():.2f}, max={strategy_config['pos_weight'].max():.2f}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 1 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### 🔹 Strategy B-2: Weighted BCE Loss

**Description:** Computes a frequency-based weight for each label (pos_weight=458 for rare labels). Generally the most effective of the listed methods for class imbalance.

**Recommended Usage:** Test this strategy after the baseline. If F1 > 0.15, there is no need to test the other strategies.
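
A common convention for `pos_weight` is the ratio of negative to positive examples per label, so that rare labels receive large weights. The sketch below shows that convention on a toy multi-hot matrix. Whether `get_strategy_config` uses exactly this formula is an assumption, and `compute_pos_weight` is a hypothetical helper.

```python
def compute_pos_weight(label_matrix):
    """pos_weight[j] = (#negatives of label j) / (#positives of label j)."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        negatives = n_samples - positives
        # Labels with no positive example get a neutral weight of 1.0
        weights.append(negatives / positives if positives > 0 else 1.0)
    return weights

# Toy multi-hot matrix: 6 samples x 3 labels (hypothetical data)
toy_labels = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 0],
]
print(compute_pos_weight(toy_labels))
```

Under this convention, a weight of 458 corresponds to a label whose negatives outnumber its positives 458 to 1.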

In [None]:
strategy_name = "weighted"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 2: Weighted BCE")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['pos_weight'] is not None:
    print(f"   Pos weight: min={strategy_config['pos_weight'].min():.2f}, max={strategy_config['pos_weight'].max():.2f}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 2 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### 🔹 Strategy B-3: Focal Loss (γ=2, α=0.25)

**Description:** Focal Loss with moderate focusing (γ=2). Focuses training on hard examples.

**Recommended Usage:** Try this if Weighted BCE fails.
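
The binary focal loss scales cross-entropy by `(1 - p_t)^γ`, so confidently correct predictions contribute little. Below is a minimal standard-library sketch of the usual formulation with α=0.25 and γ=2. It is not the project's own implementation, and the function names are assumptions for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def focal_loss(logit: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one logit: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)                          # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (high logit) versus a hard positive (low logit)
print(f"easy positive: {focal_loss(4.0, 1):.6f}")
print(f"hard positive: {focal_loss(-4.0, 1):.6f}")
```

With γ=0 and α=0.5 the expression reduces to half of the plain binary cross-entropy, which is a quick sanity check for any implementation.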

In [None]:
strategy_name = "focal_weak"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 3: Focal Loss (γ=2)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['use_focal_loss']:
    print(f"   Focal alpha: {strategy_config.get('focal_alpha')}")
    print(f"   Focal gamma: {strategy_config.get('focal_gamma')}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 3 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

### 🔹 Strategy B-4: Focal Loss (γ=5, α=0.25)

**Description:** Focal Loss with strong focusing (γ=5). Focuses training on very hard examples.

**Recommended Usage:** If Focal Loss with γ=2 works well, try this stronger version.
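
To see what "stronger focusing" means in numbers, the short sketch below compares the focal modulating factor `(1 - p_t)^γ` for γ=2 and γ=5 at a few values of `p_t` (the probability the model assigns to the true class). The helper name is an assumption made for this example.

```python
def modulating_factor(p_t: float, gamma: float) -> float:
    """Focal loss down-weighting term (1 - p_t)^gamma."""
    return (1.0 - p_t) ** gamma

# p_t close to 1 means an easy example, p_t close to 0 a hard one
for p_t in (0.1, 0.5, 0.9):
    weak = modulating_factor(p_t, 2.0)
    strong = modulating_factor(p_t, 5.0)
    print(f"p_t={p_t:.1f}  gamma=2: {weak:.5f}  gamma=5: {strong:.5f}")
```

The larger exponent suppresses moderately easy examples far more aggressively, so the loss is dominated by the small set of examples the model still gets wrong.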

In [None]:
strategy_name = "focal_strong"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 4: Focal Loss (γ=5)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['use_focal_loss']:
    print(f"   Focal alpha: {strategy_config.get('focal_alpha')}")
    print(f"   Focal gamma: {strategy_config.get('focal_gamma')}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Store results
# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 4 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print(f"\n📈 Results:")
for metric, value in test_results.items():
    if isinstance(value, (int, float)):
        print(f"   {metric}: {value:.4f}")
print("\n")

---

## 📊 PART B Section 1: Loss Function Comparison

Compare the 4 loss function strategies (B-1 to B-4) to identify the best performers.
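
The comparison relies on ranking metrics such as Recall@5 and Precision@5. The sketch below shows one common per-sample definition of precision@k and recall@k. The project's `evaluate_model` may aggregate or define them differently, and the label scores here are invented for illustration.

```python
def precision_recall_at_k(scores, true_labels, k):
    """Per-sample precision@k and recall@k.

    scores: dict mapping label -> predicted score
    true_labels: set of labels that are actually present
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for label in ranked if label in true_labels)
    precision = hits / k
    recall = hits / len(true_labels) if true_labels else 0.0
    return precision, recall

# Hypothetical scores for one report sentence
scores = {"T1059": 0.9, "T1003": 0.2, "T1566": 0.7, "T1027": 0.1}
true_labels = {"T1059", "T1027"}

p_at_2, r_at_2 = precision_recall_at_k(scores, true_labels, k=2)
print(f"precision@2={p_at_2:.2f}, recall@2={r_at_2:.2f}")
```

Since most samples carry only a few techniques, precision@k is bounded well below 1 for larger k, which is worth keeping in mind when reading the table.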

In [None]:
# Extract loss function results for comparison
print(f"\n{'='*80}")
print(f"📊 LOSS FUNCTION STRATEGIES COMPARISON")
print(f"{'='*80}\n")

loss_strategies = ['baseline', 'weighted', 'focal_weak', 'focal_strong']
loss_comparison_data = []

for strategy_name in loss_strategies:
    if strategy_name in all_test_results:
        # Use a distinct name: `data` holds the train/test datasets used by other cells
        entry = all_test_results[strategy_name]
        results = entry['results']
        loss_comparison_data.append({
            'Strategy': entry['config'],
            'mAP': results.get('mean_average_precision', 0),
            'Micro_F1': results.get('micro_f1', 0),
            'Recall@5': results.get('recall_at_5', 0),
            'Precision@5': results.get('precision_at_5', 0),
            'Recall@10': results.get('recall_at_10', 0),
            'Hamming_Loss': results.get('hamming_loss', 0),
            'Micro_Precision': results.get('micro_precision', 0),
            'Micro_Recall': results.get('micro_recall', 0)
        })

if len(loss_comparison_data) > 0:
    df_loss_comparison = pd.DataFrame(loss_comparison_data)
    
    # Display comparison table
    print("\n📋 Loss Function Performance Comparison:")
    print(df_loss_comparison.to_string(index=False))
    
    # Find best strategies
    print("\n🏆 Best Performers:")
    print(f"   Best mAP: {df_loss_comparison.loc[df_loss_comparison['mAP'].idxmax(), 'Strategy']} ({df_loss_comparison['mAP'].max():.4f})")
    print(f"   Best Micro F1: {df_loss_comparison.loc[df_loss_comparison['Micro_F1'].idxmax(), 'Strategy']} ({df_loss_comparison['Micro_F1'].max():.4f})")
    print(f"   Best Recall@5: {df_loss_comparison.loc[df_loss_comparison['Recall@5'].idxmax(), 'Strategy']} ({df_loss_comparison['Recall@5'].max():.4f})")
    print(f"   Lowest Hamming Loss: {df_loss_comparison.loc[df_loss_comparison['Hamming_Loss'].idxmin(), 'Strategy']} ({df_loss_comparison['Hamming_Loss'].min():.4f})")
    
    # Create visualization
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Plot 1: mAP comparison
    ax = axes[0, 0]
    ax.bar(range(len(df_loss_comparison)), df_loss_comparison['mAP'], color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'], alpha=0.8)
    ax.set_xticks(range(len(df_loss_comparison)))
    ax.set_xticklabels(df_loss_comparison['Strategy'], rotation=45, ha='right')
    ax.set_ylabel('Mean Average Precision (mAP)', fontsize=12)
    ax.set_title('Loss Functions: mAP Comparison', fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
    for i, v in enumerate(df_loss_comparison['mAP']):
        ax.text(i, v + 0.01, f'{v:.3f}', ha='center', fontsize=10)
    
    # Plot 2: Micro F1 comparison
    ax = axes[0, 1]
    ax.bar(range(len(df_loss_comparison)), df_loss_comparison['Micro_F1'], color=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728'], alpha=0.8)
    ax.set_xticks(range(len(df_loss_comparison)))
    ax.set_xticklabels(df_loss_comparison['Strategy'], rotation=45, ha='right')
    ax.set_ylabel('Micro F1 Score', fontsize=12)
    ax.set_title('Loss Functions: Micro F1 Comparison', fontsize=14, fontweight='bold')
    ax.grid(axis='y', alpha=0.3)
    for i, v in enumerate(df_loss_comparison['Micro_F1']):
        ax.text(i, v + 0.01, f'{v:.3f}', ha='center', fontsize=10)
    
    # Plot 3: Recall@5 and Recall@10
    ax = axes[1, 0]
    x = range(len(df_loss_comparison))
    width = 0.35
    ax.bar([i - width/2 for i in x], df_loss_comparison['Recall@5'], width, label='Recall@5', alpha=0.8)
    ax.bar([i + width/2 for i in x], df_loss_comparison['Recall@10'], width, label='Recall@10', alpha=0.8)
    ax.set_xticks(x)
    ax.set_xticklabels(df_loss_comparison['Strategy'], rotation=45, ha='right')
    ax.set_ylabel('Recall Score', fontsize=12)
    ax.set_title('Loss Functions: Recall@5 and Recall@10', fontsize=14, fontweight='bold')
    ax.legend()
    ax.grid(axis='y', alpha=0.3)
    
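    # Plot 4: Precision vs Recall
    ax = axes[1, 1]
    # Trim the palette to the number of strategies actually run; scatter rejects
    # a color list whose length differs from the number of points
    palette = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
    ax.scatter(df_loss_comparison['Micro_Recall'], df_loss_comparison['Micro_Precision'], 
               s=200, c=palette[:len(df_loss_comparison)], alpha=0.6)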
    for i, strategy in enumerate(df_loss_comparison['Strategy']):
        ax.annotate(strategy, 
                   (df_loss_comparison.loc[i, 'Micro_Recall'], df_loss_comparison.loc[i, 'Micro_Precision']),
                   xytext=(5, 5), textcoords='offset points', fontsize=10)
    ax.set_xlabel('Micro Recall', fontsize=12)
    ax.set_ylabel('Micro Precision', fontsize=12)
    ax.set_title('Loss Functions: Precision vs Recall', fontsize=14, fontweight='bold')
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    
    # Save plots
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    plot_file = OUTPUT_DIR / f'loss_function_comparison_{timestamp}.png'
    plt.savefig(plot_file, dpi=300, bbox_inches='tight')
    plt.show()
    
    # Save CSV
    csv_file = OUTPUT_DIR / f'loss_function_comparison_{timestamp}.csv'
    df_loss_comparison.to_csv(csv_file, index=False)
    
    print(f"\n✅ Comparison plot saved: {plot_file}")
    print(f"✅ Comparison CSV saved: {csv_file}")
    
    # Download files if on Colab
    try:
        from google.colab import files
        files.download(str(plot_file))
        files.download(str(csv_file))
        print("✅ Files downloaded")
    except ImportError:
        pass
    
else:
    print("⚠️  No loss function results found. Please run strategies B-1 to B-4 first.")

print(f"\n{'='*80}\n")

---

## 🔬 PART B Section 2: Capacity Testing (Top-K Analysis)

Test model capacity with different label subset sizes to understand learning behavior.

### 🔹 Strategy B-5: Top-K Label Analysis (Capacity Test)

**Description:** Tests model capacity with 5 different values of K. Uses baseline BCE for each K.

**K Values Tested:**
- Top-100: large label set
- Top-50: medium label set
- Top-20: small label set
- Top-10: minimal label set
- Top-5: smallest subset

**Outputs:**
- CSV file: comparison of all metrics
- Figure with four plots: F1 vs K, Hamming Loss vs K, Recall@5, All Metrics

**Recommended Usage:** Run this to see how the model performs with different numbers of labels (~2-2.5 hours).
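
As a rough picture of what restricting to the top-K labels involves, the sketch below keeps the K most frequent labels, strips all others, and drops samples left without any label. It is a hypothetical stand-in written for illustration; the real `filter_top_k_labels` in `src.strategies` may behave differently (for example in tie-breaking or in how it treats empty samples).

```python
from collections import Counter

def top_k_label_filter(label_lists, k):
    """Keep only the k most frequent labels; drop samples with no remaining label.

    Returns (top_labels, kept_indices, filtered_label_lists).
    """
    counts = Counter(label for labels in label_lists for label in labels)
    # Sort by descending frequency, ties broken alphabetically for determinism
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    top_labels = [label for label, _ in ranked[:k]]
    keep = set(top_labels)

    filtered = [[label for label in labels if label in keep] for labels in label_lists]
    kept_indices = [i for i, labels in enumerate(filtered) if labels]
    return top_labels, kept_indices, [filtered[i] for i in kept_indices]

# Toy label lists (placeholder label names)
toy = [["A", "B"], ["A"], ["C"], ["B", "A"]]
top_labels, kept_indices, filtered = top_k_label_filter(toy, k=2)
print(top_labels, kept_indices, filtered)
```

For a fair capacity test, the label set chosen on the training split should be reused unchanged for the test split, rather than recomputed from test-set frequencies.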

In [None]:
# Top-K Label Analysis: Test model capacity with different label subset sizes
print(f"\n{'='*60}")
print(f"🧪 TOP-K LABEL ANALYSIS")
print(f"{'='*60}\n")

from src.strategies import filter_top_k_labels

# Test different K values
k_values = [100, 50, 20, 10, 5]
topk_results = {}

for k in k_values:
    print(f"\n{'='*60}")
    print(f"🔬 Testing Top-{k} Labels")
    print(f"{'='*60}\n")
    
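    # Filter dataset to top-k labels
    filtered_train_ds, filtered_label_list, label_mapping = filter_top_k_labels(
        data['train_dataset'], 
        label_names, 
        k=k
    )
    # NOTE: this relies on filter_top_k_labels choosing the same k labels for both
    # splits; if it ranks labels per split, the test label space may not match training.
    filtered_test_ds, _, _ = filter_top_k_labels(
        data['test_dataset'], 
        label_names, 
        k=k
    )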
    
    print(f"📊 Dataset Statistics:")
    print(f"   Top-{k} labels selected")
    print(f"   Train samples: {len(filtered_train_ds)}")
    print(f"   Test samples: {len(filtered_test_ds)}")
    print(f"   Labels: {filtered_label_list[:5]}...")
    
    # Create dataloaders
    topk_train_loader = DataLoader(
        filtered_train_ds,
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )
    topk_test_loader = DataLoader(
        filtered_test_ds,
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=False
    )
    
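    # Load model for this K (size the head from the labels actually kept,
    # which can be fewer than k if the dataset has fewer distinct labels)
    print(f"\n🔧 Loading model for {len(filtered_label_list)} labels...")
    topk_model = load_model(
        model_name=BASE_CONFIG['model_name'],
        num_labels=len(filtered_label_list),
        device=BASE_CONFIG['device'],
        use_focal_loss=False,
        pos_weight=None
    )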
    
    # Train model
    print(f"\n🚀 Training on Top-{k}...")
    topk_history = train_model(
        model=topk_model,
        train_dataloader=topk_train_loader,
        num_epochs=BASE_CONFIG['num_epochs'],
        learning_rate=BASE_CONFIG['learning_rate'],
        device=BASE_CONFIG['device']
    )
    
    # Evaluate model (same call pattern as the other strategy cells)
    print(f"\n📊 Evaluating Top-{k}...")
    topk_test_results = evaluate_model(
        model=topk_model,
        test_dataset=filtered_test_ds,
        batch_size=BASE_CONFIG['batch_size'],
        device=BASE_CONFIG['device']
    )
    
    # Store results
    topk_results[f'top_{k}'] = {
        'k': k,
        'num_train': len(filtered_train_ds),
        'num_test': len(filtered_test_ds),
        'metrics': topk_test_results,
        'labels': filtered_label_list
    }
    
    # Display results
    print(f"\n✅ Top-{k} Results:")
    for metric, value in topk_test_results.items():
        if isinstance(value, (int, float)):
            print(f"   {metric}: {value:.4f}")

# Create comparison DataFrame
print(f"\n{'='*60}")
print(f"📊 TOP-K COMPARISON TABLE")
print(f"{'='*60}\n")

topk_comparison = []
# Use `entry` rather than `data` as the loop variable so the notebook-level
# `data` dict (train/test datasets) is not overwritten for later cells
for key, entry in topk_results.items():
    metrics = entry['metrics']
    topk_comparison.append({
        'K': entry['k'],
        'Train Samples': entry['num_train'],
        'Test Samples': entry['num_test'],
        'Micro F1': metrics.get('micro_f1', 0),
        'Hamming Loss': metrics.get('hamming_loss', 0),
        'Micro Precision': metrics.get('micro_precision', 0),
        'Micro Recall': metrics.get('micro_recall', 0),
        'Recall@5': metrics.get('recall_at_5', 0),
        'Precision@5': metrics.get('precision_at_5', 0)
    })

df_topk = pd.DataFrame(topk_comparison)
df_topk = df_topk.sort_values('K', ascending=False)
print(df_topk.to_string(index=False))

# Save CSV
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
topk_csv = OUTPUT_DIR / f'topk_analysis_{timestamp}.csv'
df_topk.to_csv(topk_csv, index=False)
print(f"\n✅ CSV saved: {topk_csv}")

# Create visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Plot 1: F1 Score vs K
ax = axes[0, 0]
ax.plot(df_topk['K'], df_topk['Micro F1'], marker='o', linewidth=2, markersize=8, label='Micro F1', color='blue')
ax.set_xlabel('Number of Labels (K)', fontsize=12)
ax.set_ylabel('Micro F1 Score', fontsize=12)
ax.set_title('Model Performance vs Label Count', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend()
ax.invert_xaxis()  # Higher K on left

# Plot 2: Hamming Loss vs K
ax = axes[0, 1]
ax.plot(df_topk['K'], df_topk['Hamming Loss'], marker='s', linewidth=2, markersize=8, label='Hamming Loss', color='red')
ax.set_xlabel('Number of Labels (K)', fontsize=12)
ax.set_ylabel('Hamming Loss (lower is better)', fontsize=12)
ax.set_title('Hamming Loss vs Label Count', fontsize=14, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.legend()
ax.invert_xaxis()  # Higher K on left

# Plot 3: Recall@5 vs K
ax = axes[1, 0]
ax.bar(range(len(df_topk)), df_topk['Recall@5'], alpha=0.7, color='coral')
ax.set_xticks(range(len(df_topk)))
ax.set_xticklabels([f'Top-{k}' for k in df_topk['K']])
ax.set_ylabel('Recall@5', fontsize=12)
ax.set_title('Top-5 Recall by Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3)
for i, v in enumerate(df_topk['Recall@5']):
    ax.text(i, v + 0.02, f'{v:.3f}', ha='center', fontsize=10)

# Plot 4: All metrics comparison
ax = axes[1, 1]
x = range(len(df_topk))
width = 0.2
ax.bar([i - width*1.5 for i in x], df_topk['Micro F1'], width, label='F1', alpha=0.8)
ax.bar([i - width*0.5 for i in x], df_topk['Micro Precision'], width, label='Precision', alpha=0.8)
ax.bar([i + width*0.5 for i in x], df_topk['Micro Recall'], width, label='Recall', alpha=0.8)
ax.bar([i + width*1.5 for i in x], df_topk['Recall@5'], width, label='Recall@5', alpha=0.8)
ax.set_xticks(x)
ax.set_xticklabels([f'Top-{k}' for k in df_topk['K']])
ax.set_ylabel('Score', fontsize=12)
ax.set_title('All Metrics Comparison', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.savefig(OUTPUT_DIR / f'topk_analysis_{timestamp}.png', dpi=300, bbox_inches='tight')
plt.show()

print(f"\n✅ Figure saved: {OUTPUT_DIR / f'topk_analysis_{timestamp}.png'}")

# Store in all_test_results for comparison
# (`entry`, not `data`, so the notebook-level `data` dict stays intact)
for key, entry in topk_results.items():
    all_test_results[key] = {
        'config': f"Top-{entry['k']} Labels",
        'description': f"Baseline BCE with {entry['k']} most frequent labels",
        'results': entry['metrics']
    }

print(f"\n{'='*60}")
print(f"✅ TOP-K ANALYSIS COMPLETE")
print(f"{'='*60}\n")

---

## 🔗 PART C: HYBRID STRATEGIES (Loss × Classification)

**Note:** Run this section AFTER PART A and PART B!

**Comprehensive Testing:** 2 Loss × 5 Methods = **10 Strategies**

This section tests **every loss function with every classification method**:

### Loss Functions (selected from Part B):
1. **Weighted BCE** - Frequency-based weights (pos_weight) - The most successful baseline
2. **Focal Loss γ=5** - Strong hard example focusing - The strongest focal loss

### Classification Methods (5 methods):
1. **ClassifierChain** - Label dependencies (sequential) - For label relationships
2. **ExtraTreesClassifier** - Extremely randomized trees - For speed and diversity
3. **RandomForestClassifier** - Ensemble of decision trees - For high accuracy
4. **AttentionXML** (NeurIPS 2019) - Multi-label attention mechanism
5. **LightXML** (AAAI 2021) - Dynamic negative sampling + label embeddings

### Strategy Matrix:
```
                    Chain  ExtraTrees  RandomForest  AttentionXML  LightXML
Weighted BCE        C-1    C-2         C-3           C-4           C-5
Focal Loss γ=5      C-6    C-7         C-8           C-9           C-10
```
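The IDs in the matrix follow the nested loop order of the hybrid cell (loss functions in the outer loop, classification methods in the inner loop). A minimal sketch of that mapping; the `strategy_ids` name and the short labels are illustrative only and do not appear in the project code:

```python
from itertools import product

losses = ["Weighted BCE", "Focal Loss γ=5"]
methods = ["Chain", "ExtraTrees", "RandomForest", "AttentionXML", "LightXML"]

# Loss is the outer loop and method the inner loop, so IDs run row by row
strategy_ids = {
    (loss, method): f"C-{i}"
    for i, (loss, method) in enumerate(product(losses, methods), start=1)
}

for (loss, method), sid in strategy_ids.items():
    print(f"{sid:>4}: {loss} + {method}")
```

Keeping the mapping explicit makes it easy to look up which row of the results table corresponds to which cell of the matrix.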

**Method Properties:**
- **ClassifierChain**: Sequential label modeling, captures dependencies
- **ExtraTrees**: Faster, more diverse, less overfitting (random splits)
- **RandomForest**: Optimal splits, slightly better accuracy
- **AttentionXML**: Separate attention weights per label → label-specific features
- **LightXML**: Two-stage ranking + dynamic negative sampling → efficiency

**Effect of the Loss Function:**
- **Weighted BCE**: Standard approach, with weights to counter class imbalance
- **Focal Loss γ=5**: Focuses on hard examples, a potential improvement for tail TTPs
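To make the difference between the two losses concrete, the sketch below evaluates both on a single label in plain Python. It uses the standard textbook formulas with `alpha=0.25` and `gamma=5.0` as in the configuration cell; the exact implementation inside `src` is not shown in this notebook, so the function names and the `pos_weight=10` value are assumptions for illustration.

```python
import math

def weighted_bce(p, y, pos_weight=1.0, eps=1e-7):
    """Binary cross-entropy for one label; pos_weight scales the positive term."""
    p = min(max(p, eps), 1.0 - eps)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))

def focal_loss(p, y, alpha=0.25, gamma=5.0, eps=1e-7):
    """Focal loss for one label: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) versus a hard positive (p = 0.1)
for p in (0.9, 0.1):
    print(f"p={p}: BCE={weighted_bce(p, 1):.4f}  "
          f"weighted BCE (w=10)={weighted_bce(p, 1, pos_weight=10.0):.4f}  "
          f"focal={focal_loss(p, 1):.6f}")
```

Weighted BCE scales every positive example of a rare label by the same factor, whereas focal loss with γ=5 almost removes the contribution of confidently correct predictions and leaves hard examples nearly untouched.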

**Total Time:** ~7.5-10 hours (10 strategies, ~45-60 minutes each)

**Usage:** Run the cell below to test all combinations automatically.

In [None]:
import time

print(f"\n{'='*80}")
print(f"🔄 HYBRID STRATEGIES: LOSS FUNCTIONS × CLASSIFICATION METHODS")
print(f"{'='*80}\n")

from src.classifier_chain import train_classifier_chain, evaluate_classifier_chain
from src.classifier_chain import train_multi_output_classifier, evaluate_multi_output_classifier

# Define loss configurations (2 most promising from PART B experiments)
# Each entry is a dict with the keys:
#   name, display_name, use_focal_loss, pos_weight, focal_alpha, focal_gamma
loss_configs = []

# 1. Weighted BCE (best baseline)
strategy_config_weighted = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)
loss_configs.append({
    'name': 'weighted',
    'display_name': 'Weighted BCE',
    'use_focal_loss': False,
    'pos_weight': strategy_config_weighted['pos_weight'],
    'focal_alpha': None,
    'focal_gamma': None
})

# 2. Focal Loss γ=5 (strongest focal)
loss_configs.append({
    'name': 'focal_gamma5',
    'display_name': 'Focal Loss (γ=5)',
    'use_focal_loss': True,
    'pos_weight': None,
    'focal_alpha': 0.25,
    'focal_gamma': 5.0
})

# Define classification methods (5 methods: 3 traditional + 2 XMC)
classification_methods = [
    {
        'name': 'classifier_chain',
        'display_name': 'ClassifierChain',
        'base_estimator': 'logistic',
        'train_func': train_classifier_chain,
        'eval_func': evaluate_classifier_chain
    },
    {
        'name': 'extra_trees',
        'display_name': 'ExtraTreesClassifier',
        'base_estimator': 'extra_trees',
        'train_func': train_multi_output_classifier,  # Uses MultiOutputClassifier with ExtraTrees
        'eval_func': evaluate_multi_output_classifier
    },
    {
        'name': 'random_forest',
        'display_name': 'RandomForestClassifier',
        'base_estimator': 'random_forest',
        'train_func': train_multi_output_classifier,  # Uses MultiOutputClassifier with RandomForest
        'eval_func': evaluate_multi_output_classifier
    },
    {
        'name': 'attentionxml',
        'display_name': 'AttentionXML',
        'base_estimator': 'attention_xml',
        'train_func': 'attention_xml',  # Special marker
        'eval_func': 'attention_xml'
    },
    {
        'name': 'lightxml',
        'display_name': 'LightXML',
        'base_estimator': 'light_xml',
        'train_func': 'light_xml',  # Special marker
        'eval_func': 'light_xml'
    }
]

# Counter for strategy numbering (starting from 1 for Part C)
strategy_counter = 1

# Test all combinations
print(f"🧪 Testing {len(loss_configs)} loss functions × {len(classification_methods)} classification methods")
print(f"   Total combinations: {len(loss_configs) * len(classification_methods)}")
print(f"   Estimated time: ~{len(loss_configs) * len(classification_methods) * 50} minutes")
print(f"\n{'='*80}\n")

for loss_config in loss_configs:
    for clf_method in classification_methods:
        
        strategy_name = f"hybrid_{loss_config['name']}_{clf_method['name']}"
        
        print(f"\n{'='*80}")
        print(f"🧪 PART C STRATEGY {strategy_counter}: {loss_config['display_name']} + {clf_method['display_name']}")
        print(f"{'='*80}\n")
        
        # Start timing
        strategy_start_time = time.time()
        
        # STEP 1: Configuration
        print(f"📋 Configuration:")
        print(f"   Loss Function: {loss_config['display_name']}")
        print(f"   Classification Method: {clf_method['display_name']}")
        print(f"   Use Focal Loss: {loss_config['use_focal_loss']}")
        if loss_config['pos_weight'] is not None:
            print(f"   Pos Weight: min={loss_config['pos_weight'].min():.2f}, max={loss_config['pos_weight'].max():.2f}")
        if loss_config['use_focal_loss']:
            print(f"   Focal Alpha: {loss_config['focal_alpha']}")
            print(f"   Focal Gamma: {loss_config['focal_gamma']}")
        
        # Check if this is AttentionXML or LightXML (end-to-end training)
        if clf_method['name'] in ['attentionxml', 'lightxml']:
            # Import XML utilities
            from src.xml_utils import train_attention_xml, evaluate_attention_xml
            from src.xml_utils import train_light_xml, evaluate_light_xml
            
            if clf_method['name'] == 'attentionxml':
                from src.attention_xml import load_attention_xml_model
                
                print(f"\n🔧 Training {clf_method['display_name']} (end-to-end with {loss_config['display_name']})...")
                model = load_attention_xml_model(
                    model_name=BASE_CONFIG['model_name'],
                    num_labels=num_labels,
                    device=BASE_CONFIG['device'],
                    dropout=0.1
                )
                
                training_history = train_attention_xml(
                    model=model,
                    train_dataloader=train_dataloader,
                    num_epochs=BASE_CONFIG['num_epochs'],
                    learning_rate=BASE_CONFIG['learning_rate'],
                    device=BASE_CONFIG['device'],
                    use_focal_loss=loss_config['use_focal_loss'],
                    pos_weight=loss_config['pos_weight'],
                    focal_alpha=loss_config['focal_alpha'],
                    focal_gamma=loss_config['focal_gamma']
                )
                
                print(f"\n📊 Evaluating...")
                test_results = evaluate_attention_xml(
                    model=model,
                    test_dataloader=test_dataloader,
                    device=BASE_CONFIG['device'],
                    label_names=label_names
                )
                
            else:  # lightxml
                from src.light_xml import load_light_xml_model
                
                print(f"\n🔧 Training {clf_method['display_name']} (end-to-end with {loss_config['display_name']})...")
                model = load_light_xml_model(
                    model_name=BASE_CONFIG['model_name'],
                    num_labels=num_labels,
                    device=BASE_CONFIG['device'],
                    num_label_groups=50,
                    label_emb_dim=128,
                    dropout=0.1
                )
                
                training_history = train_light_xml(
                    model=model,
                    train_dataloader=train_dataloader,
                    num_epochs=BASE_CONFIG['num_epochs'],
                    learning_rate=BASE_CONFIG['learning_rate'],
                    device=BASE_CONFIG['device'],
                    use_focal_loss=loss_config['use_focal_loss'],
                    pos_weight=loss_config['pos_weight'],
                    focal_alpha=loss_config['focal_alpha'],
                    focal_gamma=loss_config['focal_gamma']
                )
                
                print(f"\n📊 Evaluating...")
                test_results = evaluate_light_xml(
                    model=model,
                    test_dataloader=test_dataloader,
                    device=BASE_CONFIG['device'],
                    label_names=label_names
                )
        
        else:
            # Traditional two-stage approach: BERT + Classification
            # STEP 1: Train BERT with specific loss function
            print(f"\n🔧 Step 1/3: Training BERT with {loss_config['display_name']}...")
            bert_model = load_model(
                model_name=BASE_CONFIG['model_name'],
                num_labels=num_labels,
                device=BASE_CONFIG['device'],
                use_focal_loss=loss_config['use_focal_loss'],
                focal_alpha=loss_config['focal_alpha'],
                focal_gamma=loss_config['focal_gamma'],
                pos_weight=loss_config['pos_weight']
            )
            
            # Train BERT
            training_history = train_model(
                model=bert_model,
                train_dataloader=train_dataloader,
                num_epochs=BASE_CONFIG['num_epochs'],
                learning_rate=BASE_CONFIG['learning_rate'],
                device=BASE_CONFIG['device']
            )
            
            # STEP 2: Train classification method on BERT embeddings
            print(f"\n🔧 Step 2/3: Training {clf_method['display_name']} on BERT embeddings...")
            print(f"   Base Estimator: {clf_method['base_estimator']}")
            
            if clf_method['name'] == 'classifier_chain':
                # Train ClassifierChain
                clf_model = train_classifier_chain(
                    bert_model=bert_model,
                    train_dataloader=train_dataloader,
                    device=BASE_CONFIG['device'],
                    base_estimator=clf_method['base_estimator'],
                    order='random',
                    cv=None,
                    random_state=42
                )
            else:
                # Train MultiOutputClassifier (ExtraTrees or RandomForest base estimator)
                clf_model = train_multi_output_classifier(
                    bert_model=bert_model,
                    train_dataloader=train_dataloader,
                    device=BASE_CONFIG['device'],
                    base_estimator=clf_method['base_estimator'],
                    n_jobs=-1,
                    random_state=42
                )
            
            # STEP 3: Evaluate
            print(f"\n📊 Step 3/3: Evaluating...")
            test_results = clf_method['eval_func'](
                model=clf_model,
                test_dataloader=test_dataloader,
                label_names=label_names
            )
        
        # Calculate training time
        training_time_min = (time.time() - strategy_start_time) / 60
        
        # Store results
        all_test_results[strategy_name] = {
            'config': f"{loss_config['display_name']} + {clf_method['display_name']}",
            'description': f"{'End-to-end training' if clf_method['name'] in ['attentionxml', 'lightxml'] else 'BERT embeddings'} with {loss_config['display_name']}, classified with {clf_method['display_name']}",
            'training_time_min': training_time_min,
            'results': test_results
        }
        
        # Display results
        print(f"\n{'='*80}")
        print(f"✅ PART C STRATEGY {strategy_counter} COMPLETE: {strategy_name}")
        print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
        print(f"{'='*80}")
        print(f"\n📈 Results:")
        for metric, value in test_results.items():
            if isinstance(value, (int, float)):
                print(f"   {metric}: {value:.4f}")
        print(f"\n{'='*80}\n")
        
        strategy_counter += 1

print(f"\n{'='*80}")
print(f"✅ ALL HYBRID STRATEGIES COMPLETE!")
print(f"{'='*80}")
print(f"\n📊 Total strategies tested: {strategy_counter - 1}")
print(f"💡 Results stored in 'all_test_results' dictionary")
print(f"💡 Run 'Results Comparison' cell to see detailed comparison\n")

---

## 📊 RESULTS COMPARISON

Run this cell to compare the results of the strategies you have run.

In [None]:
if len(all_test_results) == 0:
    print("⚠️ No strategy has been tested yet!")
else:
    print("\n" + "="*80)
    print("📊 STRATEGY COMPARISON RESULTS")
    print("="*80 + "\n")
    
    # Create comparison table
    # (`entry`, not `data`, so the notebook-level `data` dict stays intact)
    comparison_data = []
    for strategy_name, entry in all_test_results.items():
        results = entry['results']
        comparison_data.append({
            'Strategy': strategy_name,
            'Training Time (min)': entry.get('training_time_min', 0),
            'Micro F1': results.get('micro_f1', 0),
            'mAP': results.get('mean_average_precision', 0),
            'Recall@5': results.get('recall_at_5', 0),
            'Precision@5': results.get('precision_at_5', 0),
            'Hamming Loss': results.get('hamming_loss', 0),
            'Subset Accuracy': results.get('example_based_accuracy', 0)
        })
    
    df_comparison = pd.DataFrame(comparison_data)
    df_comparison = df_comparison.sort_values('mAP', ascending=False)  # Sort by mAP
    
    print(df_comparison.to_string(index=False))
    print("\n" + "="*80)
    
    # Find best strategy
    best_strategy = df_comparison.iloc[0]['Strategy']
    best_map = df_comparison.iloc[0]['mAP']
    best_f1 = df_comparison.iloc[0]['Micro F1']
    
    print(f"\n🏆 BEST STRATEGY: {best_strategy}")
    print(f"   mAP (Ranking Quality): {best_map:.4f}")
    print(f"   Micro F1 Score: {best_f1:.4f}")
    
    # Compare to baseline if exists
    if 'baseline' in all_test_results:
        baseline_map = all_test_results['baseline']['results'].get('mean_average_precision', 0)
        improvement = ((best_map - baseline_map) / baseline_map * 100) if baseline_map > 0 else 0
        print(f"   mAP improvement over baseline: {improvement:+.2f}%")
    
    # Plot comparison
    if len(all_test_results) > 1:
        fig, axes = plt.subplots(1, 2, figsize=(14, 5))
        
        # mAP and F1 scores comparison
        strategies = df_comparison['Strategy'].tolist()
        mean_ap = df_comparison['mAP'].tolist()
        micro_f1 = df_comparison['Micro F1'].tolist()
        
        x = np.arange(len(strategies))
        width = 0.35
        
        axes[0].bar(x - width/2, mean_ap, width, label='mAP (Ranking)', alpha=0.8, color='green')
        axes[0].bar(x + width/2, micro_f1, width, label='Micro F1', alpha=0.8, color='blue')
        axes[0].set_xlabel('Strategy')
        axes[0].set_ylabel('Score')
        axes[0].set_title('mAP & Micro F1 Comparison')
        axes[0].set_xticks(x)
        axes[0].set_xticklabels(strategies, rotation=45, ha='right')
        axes[0].legend()
        axes[0].grid(axis='y', alpha=0.3)
        
        # Recall@5 comparison
        recall_at_5 = df_comparison['Recall@5'].tolist()
        
        axes[1].bar(x, recall_at_5, alpha=0.8, color='coral')
        axes[1].set_xlabel('Strategy')
        axes[1].set_ylabel('Recall@5 (Top-5 Coverage)')
        axes[1].set_title('Recall@5 Comparison')
        axes[1].set_xticks(x)
        axes[1].set_xticklabels(strategies, rotation=45, ha='right')
        axes[1].grid(axis='y', alpha=0.3)
        
        plt.tight_layout()
        plt.show()
    
    print("\n")

---

## 💾 SAVE RESULTS

Run this cell to save the results.

In [None]:
if len(all_test_results) == 0:
    print("⚠️ No results to save!")
else:
    # Helper function to convert NumPy values to plain Python types
    def convert_to_serializable(obj):
        """Recursively convert NumPy arrays and scalars for JSON serialization."""
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, dict):
            return {key: convert_to_serializable(value) for key, value in obj.items()}
        elif isinstance(obj, (list, tuple)):
            return [convert_to_serializable(item) for item in obj]
        elif isinstance(obj, np.generic):
            # .item() keeps NumPy ints as int, floats as float, bools as bool
            return obj.item()
        else:
            return obj
    
    # Save results to JSON
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    results_file = OUTPUT_DIR / f'strategy_comparison_{timestamp}.json'
    
    # Convert results to JSON-serializable format
    serializable_results = convert_to_serializable(all_test_results)
    
    with open(results_file, 'w') as f:
        json.dump(serializable_results, f, indent=2)
    
    print(f"✅ Results saved: {results_file}")
    
    # Save comparison table to CSV
    # (`entry`, not `data`, so the notebook-level `data` dict stays intact)
    comparison_data = []
    for strategy_name, entry in all_test_results.items():
        results = entry['results']
        comparison_data.append({
            'Strategy': strategy_name,
            'Training_Time_min': entry.get('training_time_min', 0),
            'mAP': results.get('mean_average_precision', 0),
            'Micro_F1': results.get('micro_f1', 0),
            'Recall_at_5': results.get('recall_at_5', 0),
            'Precision_at_5': results.get('precision_at_5', 0),
            'Recall_at_10': results.get('recall_at_10', 0),
            'Precision_at_10': results.get('precision_at_10', 0),
            'Hamming_Loss': results.get('hamming_loss', 0),
            'Subset_Accuracy': results.get('example_based_accuracy', 0)
        })
    
    df_comparison = pd.DataFrame(comparison_data)
    csv_file = OUTPUT_DIR / f'strategy_comparison_{timestamp}.csv'
    df_comparison.to_csv(csv_file, index=False)
    
    print(f"✅ CSV table saved: {csv_file}")
    
    # Download files if on Colab
    try:
        from google.colab import files
        files.download(str(results_file))
        files.download(str(csv_file))
        print("✅ Files downloaded")
    except ImportError:
        print("💡 Not running on Colab; files were only saved locally")

---

## 📝 NOTES AND RECOMMENDATIONS

**Model:**
- **CTI-BERT** (ibm-research/CTI-BERT): BERT specialized for Cyber Threat Intelligence
- Outperforms general-purpose BERT on security text
- Pre-trained on a security-specific corpus
- 768-dimensional embeddings

**Dataset:**
- Single source: tumeteor/Security-TTP-Mapping (14.9k train + 2.6k test)
- 499 unique MITRE ATT&CK technique labels
- Consistent label format, high quality
- Imbalanced distribution (tail TTPs appear rarely)

---

## 📊 EXPERIMENT STRUCTURE OVERVIEW

### **PART A: Data Augmentation (5 strategies)**
Augmentation methods for strengthening tail TTPs:
- **A-1:** Baseline (No Augmentation)
- **A-2:** IoC Replacement Only
- **A-3:** Back-translation Only
- **A-4:** Oversampling Only (3x-10x)
- **A-5:** Combined (All 3 methods)

**Expected Improvement:** Tail TTP Recall +40-60%, Overall mAP +20-30%
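As an illustration of what IoC replacement (A-2) can look like, the sketch below swaps IPv4 addresses and MD5 hashes for random look-alikes while leaving the rest of the sentence, and therefore its TTP labels, unchanged. This is a hypothetical stand-in, not the project's augmentation code: the `replace_iocs` function, the two regular expressions, and the choice of the 203.0.113.0/24 documentation range are all assumptions.

```python
import random
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5 = re.compile(r"\b[a-fA-F0-9]{32}\b")

def replace_iocs(text, seed=0):
    """Swap IPv4 addresses and MD5 hashes for random look-alikes.

    Replacement IPs come from 203.0.113.0/24, a range reserved for documentation.
    """
    rng = random.Random(seed)
    text = IPV4.sub(lambda m: f"203.0.113.{rng.randint(1, 254)}", text)
    text = MD5.sub(
        lambda m: "".join(rng.choice("0123456789abcdef") for _ in range(32)),
        text,
    )
    return text

sample = ("The dropper contacts 198.51.100.23 and writes a payload with hash "
          "d41d8cd98f00b204e9800998ecf8427e to disk.")
print(replace_iocs(sample))
```

Varying `seed` yields several surface variants of the same report sentence, which is the property an augmentation step for rare labels needs.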

### **PART B Section 1: Loss Functions (4 strategies)**
Different loss functions with end-to-end BERT training:
- **STR-1:** Baseline BCE (no class imbalance handling)
- **STR-2:** Weighted BCE (pos_weight shifts focus to rare labels)
- **STR-3:** Focal Loss γ=2 (moderate hard example focusing)
- **STR-4:** Focal Loss γ=5 (strong hard example focusing)

**Most Successful:** Weighted BCE (frequency-based pos_weight)
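A frequency-based `pos_weight` can be derived from the multi-hot label matrix. The sketch below uses the common negatives-to-positives ratio with an upper cap; the formula actually used by `get_strategy_config` is not shown in this notebook, so both the ratio and the `max_weight` cap are assumptions.

```python
def frequency_pos_weight(label_matrix, max_weight=100.0):
    """Per-label weight = (#negatives / #positives), capped at max_weight."""
    n_samples = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        if positives == 0:
            weights.append(max_weight)        # label never seen: use the cap
        else:
            weights.append(min((n_samples - positives) / positives, max_weight))
    return weights

# Toy multi-hot matrix: 6 samples, 3 labels (frequent, medium, rare)
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
print(frequency_pos_weight(Y))
```

The rare third label receives the largest weight, so its few positive examples contribute more to the loss than those of the frequent first label.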

### **PART B Section 2: Capacity Testing (1 strategy, 5 variants)**
Tests model capacity:
- **STR-5:** Top-K Label Analysis (K=5, 10, 20, 50, 100)
- Uses baseline BCE for every K
- Shows how the model performs with different label counts

**Duration:** ~2-2.5 hours (5 models)
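The idea behind the top-K restriction can be sketched in plain Python: pick the K most frequent labels from the training split, then apply that same label set to both splits. The helpers below are hypothetical stand-ins for `filter_top_k_labels`, whose real signature and behavior live in `src.strategies`; the sample data is made up.

```python
from collections import Counter

def top_k_labels(samples, k):
    """Return the k most frequent labels in `samples` (ties broken by label name)."""
    counts = Counter(label for _, labels in samples for label in labels)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]

def restrict_to(samples, keep):
    """Drop labels outside `keep`, then drop samples left with no label."""
    keep = set(keep)
    filtered = [(text, [lab for lab in labels if lab in keep]) for text, labels in samples]
    return [(text, labels) for text, labels in filtered if labels]

train = [
    ("a", ["T1059", "T1027"]),
    ("b", ["T1059"]),
    ("c", ["T1566"]),
    ("d", ["T1027", "T1105"]),
    ("e", ["T1059", "T1566"]),
]
test = [("x", ["T1105"]), ("y", ["T1059", "T1105"])]

top2 = top_k_labels(train, k=2)        # chosen from the training split only
train_k = restrict_to(train, top2)
test_k = restrict_to(test, top2)       # same label set reused for the test split
print(top2, len(train_k), len(test_k))
```

Selecting the label set once, from the training split, keeps the label indices of both splits aligned, which the evaluation metrics rely on.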

### **PART C: Hybrid Strategies (10 strategies = 2 loss × 5 methods)**
Tests every classification method with the best loss functions:

**Loss Functions (selected from Part B):**
1. **Weighted BCE** - Frequency-based weights
2. **Focal Loss γ=5** - Strong hard example focusing

**Classification Methods:**
1. **ClassifierChain** - Label dependencies (sequential)
2. **ExtraTreesClassifier** - Extremely randomized trees
3. **RandomForestClassifier** - Ensemble of decision trees
4. **AttentionXML** - Multi-label attention mechanism (NeurIPS 2019)
5. **LightXML** - Dynamic negative sampling + label embeddings (AAAI 2021)

**Strategy Matrix:**
```
                    Chain  ExtraTrees  RandomForest  AttentionXML  LightXML
Weighted BCE        C-1    C-2         C-3           C-4           C-5
Focal Loss γ=5      C-6    C-7         C-8           C-9           C-10
```

**Total Time:** ~7.5-10 hours (10 strategies × 45-60 minutes)

---

## 🎯 Method Details

### **ClassifierChain:**
- **How it works:** BERT embeddings + 499 chained binary classifiers
- **Advantage:** Can model label dependencies
- **Disadvantage:** Long training time, sensitive to chain order
- **Progress:** Single-line progress bar (instead of 499 lines of log spam)
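The chaining idea can be tried on toy data with scikit-learn's `ClassifierChain`, using a logistic regression base estimator, `order="random"` and `random_state=42` as in the hybrid cell. This is only a sketch: the random 8-dimensional features stand in for BERT embeddings, and the three synthetic labels are assumptions made so that the third label depends on the first two.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

rng = np.random.RandomState(42)
X = rng.randn(200, 8)                     # stand-in for 768-dim BERT embeddings

# Three toy labels; the third depends on the first two
y0 = (X[:, 0] > 0).astype(int)
y1 = (X[:, 1] > 0).astype(int)
y2 = y0 & y1
Y = np.column_stack([y0, y1, y2])

# Each link sees the input features plus the labels earlier in the chain
chain = ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=42)
chain.fit(X, Y)

pred = chain.predict(X)
print(pred.shape, chain.order_)
```

`chain.order_` shows the permutation in which the labels were chained; with 499 labels that order determines which dependencies the chain is able to exploit.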

### **ExtraTrees vs RandomForest:**
- **ExtraTrees:** Faster, random splits, less overfitting
- **RandomForest:** Optimal splits, slightly higher accuracy
- **Both:** Ensemble methods, class_weight='balanced'
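A small side-by-side run, assuming scikit-learn's `MultiOutputClassifier` wraps each forest as the comments in the hybrid cell indicate. The hyperparameters (50 trees, `max_features="sqrt"`, `min_samples_split=20`, `class_weight="balanced"`) are the ones named in this notebook; the random features and the two synthetic labels are placeholders for BERT embeddings and TTP labels.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.RandomState(0)
X = rng.randn(300, 16)                    # stand-in for BERT embeddings
Y = np.column_stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0]).astype(int)

shared = dict(n_estimators=50, max_features="sqrt", min_samples_split=20,
              class_weight="balanced", random_state=42)

models = {
    "extra_trees": MultiOutputClassifier(ExtraTreesClassifier(**shared)),
    "random_forest": MultiOutputClassifier(RandomForestClassifier(**shared)),
}
for name, model in models.items():
    model.fit(X[:200], Y[:200])
    acc = (model.predict(X[200:]) == Y[200:]).mean()
    print(f"{name}: held-out label accuracy = {acc:.3f}")
```

Both wrappers fit one independent forest per label, so unlike ClassifierChain they do not model dependencies between labels.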

### **AttentionXML:**
- Separate attention weights for each label
- Focuses on label-specific text regions
- End-to-end BERT fine-tuning
- Simplified implementation (for 499 labels)
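Label-wise attention can be sketched with NumPy: every label owns a query vector, scores each token with it, and builds its own context vector from the softmax-weighted token states. The shapes follow the notebook (768-dim hidden states, 499 labels), but the random matrices and the `label_wise_attention` function are illustrative assumptions, not the code in `src.attention_xml`.

```python
import numpy as np

def label_wise_attention(H, W):
    """H: (tokens, hidden) token states; W: (labels, hidden), one query per label.

    Returns attention weights (labels, tokens) and label-specific
    context vectors (labels, hidden).
    """
    scores = W @ H.T                                     # (labels, tokens)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over tokens
    context = weights @ H                                # (labels, hidden)
    return weights, context

rng = np.random.default_rng(0)
H = rng.normal(size=(12, 768))            # 12 tokens with 768-dim states
W = rng.normal(size=(499, 768)) * 0.01    # 499 label queries (learned in practice)

weights, context = label_wise_attention(H, W)
print(weights.shape, context.shape)
```

Each row of `weights` is a distribution over the input tokens, which is what allows a label to attend to its own region of the text.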

### **LightXML:**
- Two-stage: Label grouping (50 groups) → Candidate ranking
- Dynamic negative sampling
- Label embeddings (128-dim semantic space)
- Efficient for large label spaces
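The two-stage idea (score label groups first, then rank only the labels inside the best groups) can be sketched without any model. Everything below is hypothetical: the round-robin grouping, the deterministic toy scores, and the `two_stage_rank` helper stand in for what a trained model would produce, and dynamic negative sampling is not covered.

```python
def two_stage_rank(group_scores, label_scores, label_to_group, top_groups=3, top_labels=5):
    """Stage 1: keep the best-scoring label groups.
    Stage 2: rank only the labels that belong to those groups."""
    best_groups = sorted(range(len(group_scores)),
                         key=lambda g: group_scores[g], reverse=True)[:top_groups]
    keep = set(best_groups)
    candidates = [lab for lab, g in enumerate(label_to_group) if g in keep]
    ranked = sorted(candidates, key=lambda lab: label_scores[lab], reverse=True)
    return best_groups, ranked[:top_labels]

NUM_LABELS, NUM_GROUPS = 499, 50
# Hypothetical grouping: round-robin assignment; a real model would cluster labels
label_to_group = [lab % NUM_GROUPS for lab in range(NUM_LABELS)]

# Deterministic toy scores standing in for model outputs
group_scores = [((g * 37) % NUM_GROUPS) / NUM_GROUPS for g in range(NUM_GROUPS)]
label_scores = [((lab * 101) % NUM_LABELS) / NUM_LABELS for lab in range(NUM_LABELS)]

groups, labels = two_stage_rank(group_scores, label_scores, label_to_group)
print("groups:", groups)
print("labels:", labels)
```

With 3 of 50 groups kept, the second stage scores about 30 candidate labels instead of all 499, which is where the efficiency for large label spaces comes from.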

---

## 📈 Success Criteria

**SOC Analyst Perspective:**
- **mAP > 0.20** → Good ranking quality (MOST IMPORTANT!)
- **Recall@5 > 0.30** → Finds the correct TTPs within the top-5 predictions
- **Micro F1 > 0.15** → Overall performance (threshold-based)
- **Hamming Loss < 0.10** → Few wrong predictions

**Metrics:**
- **mAP:** Ranking quality: how well the correct TTPs are placed at the top of the list
- **Recall@5:** Share of the true TTPs that appear in the top-5 predictions
- **Precision@5:** Share of the top-5 predictions that are correct
- **Micro F1:** Overall performance across all labels
- **Hamming Loss:** Fraction of labels predicted incorrectly (lower is better)
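The ranking metrics above can be computed for a single sample in a few lines. This is a sketch with made-up scores and the usual definitions; the evaluation functions in `src` may differ in detail (for example, mAP is shown here as average precision for one sample, to be averaged over samples, and the 0.5 threshold is an assumption).

```python
def precision_recall_at_k(scores, true_labels, k=5):
    """Precision@k and recall@k for one sample given per-label scores."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(1 for i in top_k if i in true_labels)
    return hits / k, (hits / len(true_labels) if true_labels else 0.0)

def average_precision(scores, true_labels):
    """Mean of precision values at the rank of each true label."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if i in true_labels:
            hits += 1
            total += hits / rank
    return total / len(true_labels) if true_labels else 0.0

def hamming_loss(scores, true_labels, threshold=0.5):
    """Fraction of labels whose thresholded prediction is wrong."""
    wrong = sum((s >= threshold) != (i in true_labels) for i, s in enumerate(scores))
    return wrong / len(scores)

scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.05]   # one score per label
true_labels = {0, 3, 4}                               # indices of the true TTPs

p5, r5 = precision_recall_at_k(scores, true_labels, k=5)
ap = average_precision(scores, true_labels)
hl = hamming_loss(scores, true_labels)
print(f"P@5={p5:.3f}  R@5={r5:.3f}  AP={ap:.3f}  Hamming={hl:.3f}")
```

Precision@k, recall@k and average precision depend only on the ordering of the scores, while Hamming loss also depends on the decision threshold, which is why the ranking metrics are the more natural fit for an analyst reading a top-5 list.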

---

## 🚀 Testing Recommendations

**Order (Optimal):**
1. **PART A** → Find the best augmentation (critical for tail TTPs)
2. **PART B** → Find the best loss function (weighted BCE usually wins)
3. **PART C** → Find the best combination (loss × classification)

**Independent Testing (Flexible):**
- Each part can be run independently of the others
- Only requirement: run the import cell at the start of each part
- You can test in whatever order you like

**Quick Test:**
- A-1 (Baseline) + STR-2 (Weighted BCE) → ~45 minutes
- C-1 (Weighted BCE + Chain) → ~50 minutes
- Total: ~1.5 hours (quick baseline)

**Full Test:**
- Part A (5 strategies) → ~4-5 hours
- Part B (4+5 strategies) → ~5-6 hours
- Part C (10 strategies) → ~7.5-10 hours
- Total: ~16.5-21 hours

---

## ⚠️ Reminders

- Each strategy cell is **independent**; you can run them in any order
- Results are stored in the `all_test_results` dictionary
- You can run the Comparison cell at any time to check intermediate results
- CTI-BERT is cached on first download (~500MB); later runs are fast
- **ClassifierChain verbose output fixed** - a single progress bar instead of 499 lines
- **AttentionXML and LightXML added** - for XMC (extreme multi-label classification)
- **Progress bars** are active in every training loop - real-time progress tracking