# 🛡️ CTI - MITRE ATT&CK TTP MAPPING PROJECT

> **Goal:** A deep-learning-based multi-label classification system that automatically maps cyber threat intelligence (CTI) texts to MITRE ATT&CK techniques.

---

## 🤖 Model: CTI-BERT (IBM Research)

| Property | Value |
|---------|-------|
| **Model** | `ibm-research/CTI-BERT` |
| **Type** | Domain-specific BERT |
| **Advantage** | Better performance than general-purpose BERT on security and CTI text |
| **Embedding** | 768-dimensional |

📎 [Hugging Face Model Card](https://huggingface.co/ibm-research/CTI-BERT)

---

## 📊 Dataset: Security-TTP-Mapping

| Property | Value |
|---------|-------|
| **Source** | `tumeteor/Security-TTP-Mapping` |
| **Train** | 14.9k samples |
| **Test** | 2.6k samples |
| **Labels** | 499 MITRE ATT&CK techniques |

📎 [Hugging Face Dataset](https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping)
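
Because a single CTI sentence can map to several techniques, each sample's labels are typically encoded as a multi-hot vector over the fixed label list. The sketch below illustrates that encoding on a toy label space; the helper name `encode_multi_hot` and the three technique IDs are illustrative assumptions, not part of the project code.

```python
from typing import List, Sequence


def encode_multi_hot(sample_labels: Sequence[str], label_list: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the index of every label present in the sample."""
    label_to_idx = {label: idx for idx, label in enumerate(label_list)}
    vector = [0.0] * len(label_list)
    for label in sample_labels:
        if label in label_to_idx:  # labels outside the fixed list are ignored
            vector[label_to_idx[label]] = 1.0
    return vector


# Hypothetical three-technique label space (the real dataset has 499 labels)
label_list = ["T1059", "T1566", "T1105"]
encoded = encode_multi_hot(["T1105", "T1059"], label_list)
print(encoded)  # [1.0, 0.0, 1.0]
```

The order of `label_list` fixes the meaning of every output position, so training and test data must share the same list.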

---
## 🔧 SETUP
> Set up the project and install dependencies in the Colab environment.

In [1]:
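# Update the repository to the latest version (optimized tree classifiers),
# but only if it has already been cloned; a fresh runtime clones it below
![ -d /content/Mitre_Attack_TTP_Mapping ] && git -C /content/Mitre_Attack_TTP_Mapping pull origin main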

print("\n⚡ OPTIMIZATION APPLIED:")
print("   - ExtraTreesClassifier: 50 trees (was 100)")
print("   - RandomForestClassifier: 50 trees (was 100)")
print("   - max_features='sqrt' (~28 features per split instead of 768)")
print("   - min_samples_split=20 (faster splits)")
print("   - Expected speedup: ~4x faster training!")
print("\n   If training was stuck, Runtime > Interrupt execution, then re-run from Strategy cell\n")

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("✅ Google Colab environment detected")

    import torch
    if torch.cuda.is_available():
        print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
        print(f"   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    else:
        print("⚠️  No GPU found! Select Runtime > Change runtime type > GPU")

    print("\n📥 Downloading project...")
    !rm -rf Mitre_Attack_TTP_Mapping
    !git clone https://github.com/Aliekinozcetin/Mitre_Attack_TTP_Mapping.git
    os.chdir('Mitre_Attack_TTP_Mapping')
    print(f"✅ Working directory: {os.getcwd()}")

    print("\n📦 Installing packages...")
    # hf_transfer is needed because HF_HUB_ENABLE_HF_TRANSFER is set below
    !pip install -q torch transformers datasets scikit-learn pandas tqdm matplotlib seaborn hf_transfer
    print("✅ All packages installed")

    # HuggingFace connection tuning
    print("\n🔧 Configuring HuggingFace cache...")

    # Create cache directory
    cache_dir = '/content/hf_cache'
    os.makedirs(cache_dir, exist_ok=True)

    # Set environment variables
    os.environ['HF_HOME'] = cache_dir
    os.environ['TRANSFORMERS_CACHE'] = cache_dir
    os.environ['HF_DATASETS_CACHE'] = cache_dir
    os.environ['HF_HUB_DOWNLOAD_TIMEOUT'] = '600'  # 10 minutes
    # Workaround for SSL errors; it can weaken certificate verification in some
    # requests versions, so remove it if downloads work without it
    os.environ['CURL_CA_BUNDLE'] = ''
    os.environ['HF_ENDPOINT'] = 'https://huggingface.co'
    os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = '1'  # Faster downloads (requires hf_transfer)

    print(f"✅ Cache directory created: {cache_dir}")
    print(f"   Timeout: 10 minutes")

    # Test HuggingFace connection
    try:
        from huggingface_hub import HfApi
        api = HfApi()
        print("\n📡 Testing HuggingFace connection...")
        info = api.model_info("ibm-research/CTI-BERT", timeout=30)
        print(f"✅ Model reachable: {info.modelId}")
    except Exception as e:
        print(f"⚠️  Connection warning: {str(e)[:100]}")
        print("   The model download will still be attempted...")
else:
    print("ℹ️  Running in a local environment")

zsh:cd:1: no such file or directory: /content/Mitre_Attack_TTP_Mapping

⚡ OPTIMIZATION APPLIED:
   - ExtraTreesClassifier: 50 trees (was 100)
   - RandomForestClassifier: 50 trees (was 100)
   - max_features='sqrt' (~28 features per split instead of 768)
   - min_samples_split=20 (faster splits)
   - Expected speedup: ~4x faster training!

   If training was stuck, Runtime > Interrupt execution, then re-run from Strategy cell

ℹ️  Running in a local environment


---
## 📦 Import Modules
> Load the required libraries and project modules.

In [2]:
import torch
import numpy as np
import json
import time
from datetime import datetime
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

# Import project modules
from src.data_loader import prepare_data, load_datasets_and_prepare_dataloaders
from src.model import load_model
from src.train import train_model
from src.evaluate import evaluate_model
from src.strategies import get_strategy_config
from src.augmentation import replace_iocs, back_translate, augment_tail_samples

print("✅ Modules loaded")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")

ModuleNotFoundError: No module named 'torch'

---
## ⚙️ Configuration
> Set the training parameters and the output directory.

In [None]:
# Base training configuration
BASE_CONFIG = {
    'model_name': 'ibm-research/CTI-BERT',  # CTI domain-specific BERT
    'batch_size': 16,
    'learning_rate': 2e-5,
    'num_epochs': 5,
    'max_length': 128,
    'device': 'cuda' if torch.cuda.is_available() else 'cpu'
}

# Output directory
OUTPUT_DIR = Path('outputs')
OUTPUT_DIR.mkdir(exist_ok=True)

# Store results from all strategies
all_test_results = {}

print("✅ Configuration set")
print(f"Model: {BASE_CONFIG['model_name']}")
print(f"Device: {BASE_CONFIG['device']}")
print(f"Output: {OUTPUT_DIR.absolute()}")

---
## 📥 Data Loading
> Load the dataset and create the DataLoaders.

In [None]:
print("📥 Loading data...")
print("📦 Dataset: tumeteor/Security-TTP-Mapping (Single Source)")
print(f"🤖 Model: {BASE_CONFIG['model_name']}")
print("")

# Use single dataset: tumeteor only
train_dataloader, val_dataloader, test_dataset, label_names = load_datasets_and_prepare_dataloaders(
    model_name=BASE_CONFIG['model_name'],
    batch_size=BASE_CONFIG['batch_size'],
    max_length=BASE_CONFIG['max_length'],
    dataset_name="tumeteor/Security-TTP-Mapping"
)

# Get train_dataset from dataloader for strategies
train_dataset = train_dataloader.dataset

# Create test dataloader
test_dataloader = DataLoader(
    test_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=False
)

num_labels = len(label_names)

# Create data dict for backward compatibility
data = {
    'train_dataset': train_dataset,
    'test_dataset': test_dataset,
    'label_list': label_names,
    'num_labels': num_labels
}

print(f"✅ Data loaded")
print(f"   Train batches: {len(train_dataloader)}")
print(f"   Test batches: {len(test_dataloader)}")
print(f"   Total number of labels: {num_labels}")
print(f"   First 5 labels: {label_names[:5]}")


---

# 📊 EXPERIMENT STRUCTURE

> A three-stage experimental study to find the optimal TTP mapping strategy.

| Part | Description | Strategies | Estimated Time |
|-------|----------|-----------------|--------------|
| **PART A** | Data Augmentation | 5 strategies | ~4-5 hours |
| **PART B** | Loss Functions + Capacity | 4 + 5 strategies | ~5-6 hours |
| **PART C** | Hybrid (Loss × Classification) | 10 strategies | ~7.5-10 hours |

### 🎯 Recommended Run Order:
1. **PART A** → Find the best augmentation method
2. **PART B** → Identify the best loss function
3. **PART C** → Explore the optimal combination

---

# 🔄 PART A: DATA AUGMENTATION EXPERIMENTS

> **Goal:** Test data augmentation strategies to improve performance on tail TTPs (rarely seen techniques).

⚠️ **Priority:** Run this part BEFORE PART B and PART C!

---

### 📋 Augmentation Methods

| Method | Description | Effect |
|--------|----------|------|
| **IoC Replacement** | Replaces IPs, domains, hashes, and file paths | Helps prevent overfitting |
| **Back-translation** | EN → DE → EN paraphrasing | Semantic diversity |
| **Tail Oversampling** | Replicates rare TTPs 3x-10x | Class balance |
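
To make the IoC replacement idea concrete, here is a minimal sketch that swaps IPv4 addresses and MD5 hashes for random values of the same shape. It is not the project's `src.augmentation.replace_iocs`, whose implementation is not shown here; the function name `replace_iocs_sketch`, the two regular expressions, and the sample sentence are assumptions chosen for illustration.

```python
import random
import re

# Illustrative patterns only; real IoC extraction needs broader coverage
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def replace_iocs_sketch(text: str, seed: int = 42) -> str:
    """Swap IPv4 addresses and MD5 hashes for random values of the same shape."""
    rng = random.Random(seed)  # local generator keeps the result reproducible per seed

    def random_ip(_match: re.Match) -> str:
        return ".".join(str(rng.randint(1, 254)) for _ in range(4))

    def random_md5(_match: re.Match) -> str:
        return "".join(rng.choice("0123456789abcdef") for _ in range(32))

    text = IPV4_RE.sub(random_ip, text)
    return MD5_RE.sub(random_md5, text)


original = "The dropper beacons to 192.168.10.5 and writes d41d8cd98f00b204e9800998ecf8427e."
augmented = replace_iocs_sketch(original, seed=7)
print(augmented)
```

The surrounding sentence stays intact while the concrete indicators change, which is what pushes the classifier toward the behavioral wording rather than memorized addresses or hashes.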

---

### 🧪 Test Strategies

| Strategy | Description | Duration |
|----------|----------|------|
| **AUG-1** | Baseline (No Augmentation) | ~30-40 min |
| **AUG-2** | IoC Replacement Only | ~30-40 min |
| **AUG-3** | Back-translation Only | ~40-50 min |
| **AUG-4** | Oversampling Only | ~30-40 min |
| **AUG-5** | Combined (All 3 methods) | ~50-60 min |

### 🔹 AUG-1: Baseline (No Augmentation)
> Weighted BCE without augmentation, used as the reference performance.  
> ⏱️ ~30-40 minutes
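
Weighted BCE relies on a per-label positive weight. A common convention, also the one suggested in the PyTorch documentation for the `pos_weight` argument of `BCEWithLogitsLoss`, is the ratio of negative to positive samples for each label. The sketch below shows that calculation in plain Python; whether `get_strategy_config` computes `pos_weight` exactly this way is not shown in this notebook, and the function name and the `max_weight` cap are assumptions.

```python
from typing import List


def compute_pos_weight(multi_hot_labels: List[List[int]], max_weight: float = 100.0) -> List[float]:
    """Per-label weight = (#negative samples / #positive samples), capped at max_weight."""
    num_samples = len(multi_hot_labels)
    num_labels = len(multi_hot_labels[0])
    weights = []
    for j in range(num_labels):
        positives = sum(row[j] for row in multi_hot_labels)
        negatives = num_samples - positives
        # Labels with no positive example fall back to the cap to avoid division by zero
        weight = negatives / positives if positives > 0 else max_weight
        weights.append(min(weight, max_weight))
    return weights


# Four samples, three labels: label 0 is common, label 2 is rare
labels = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(compute_pos_weight(labels))  # [0.3333333333333333, 1.0, 3.0]
```

Rare labels receive weights above 1, so a missed positive for a tail technique costs more than a missed positive for a frequent one.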

In [None]:
import time

strategy_name = "aug_baseline"
print(f"\n{'='*60}")
print(f"🧪 AUG-1: Baseline (No Augmentation)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

# Use weighted BCE (best performing strategy)
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
strategy_train_dataloader = DataLoader(
    strategy_config['dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print("📋 Configuration:")
print(f"   Strategy: Weighted BCE (Baseline for comparison)")
print(f"   Augmentation: NONE")
print(f"   Num labels: {strategy_config['num_labels']}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device'],
    label_names=label_names
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE (No Augmentation)',
    'description': 'Baseline for augmentation comparison',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-1 COMPLETED: {strategy_name}")
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("="*60)



### 🔹 AUG-2: IoC Replacement Only
> Reduce overfitting by replacing IPs, domains, hashes, and file paths.  
> ⏱️ ~30-40 minutes

In [None]:
strategy_name = "aug_ioc_replacement"
print(f"\n{'='*60}")
print(f"🧪 AUG-2: IoC Replacement")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

# Load raw dataset for text augmentation
print("🔄 Loading raw dataset for IoC replacement...")
from datasets import load_dataset
import ast

raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
print(f"Available columns: {train_df.columns.tolist()}")
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Apply IoC replacement to texts
print("Replacing IoCs in training texts...")
augmented_train_texts = []
for text in train_df[text_column].fillna('').tolist():
    # Original + 2 IoC-replaced versions
    augmented_train_texts.append(text)  # Original
    augmented_train_texts.append(replace_iocs(text, seed=42))  # Aug 1
    augmented_train_texts.append(replace_iocs(text, seed=123))  # Aug 2

# Replicate labels accordingly (ast is already imported above)

# Parse labels
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Expand labels to match augmented texts
augmented_train_labels = []
for labels in train_labels_raw:
    augmented_train_labels.append(labels)  # Original
    augmented_train_labels.append(labels)  # Aug 1
    augmented_train_labels.append(labels)  # Aug 2

print(f"✅ Original samples: {len(train_df)}")
print(f"✅ Augmented samples: {len(augmented_train_texts)} (3x augmentation)")

# Prepare augmented data
from torch.utils.data import Dataset, DataLoader

# Create custom dataset
class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Reuse the label order from the data-loading cell (label_names) so that label
        # indices match the test set and pos_weight; this assumes label_names holds the
        # same technique IDs as the raw 'labels' column
        self.label_list = list(label_names)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_train_texts,
    augmented_train_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device'],
    label_names=label_names
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE + IoC Replacement (3x)',
    'description': 'Training data augmented with IoC replacement only',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-2 COMPLETED: {strategy_name}")
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("="*60)



### 🔹 AUG-3: Back-translation Only
> Semantic diversity through EN → DE → EN paraphrasing.  
> Applied to 15% of the tail TTP samples.  
> ⏱️ ~40-50 minutes
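
The selection step for this strategy (find tail labels by frequency, then back-translate a random fraction of the samples that carry them) can be sketched on toy data as follows. The helper names, the toy labels, and the threshold and fraction used in the example are illustrative assumptions; the notebook itself uses a threshold of 10 and a fraction of 0.15.

```python
import random
from collections import Counter
from typing import List, Set


def find_tail_labels(label_sets: List[List[str]], threshold: int) -> Set[str]:
    """Labels whose total occurrence count is below `threshold`."""
    counts = Counter(label for labels in label_sets for label in labels)
    return {label for label, count in counts.items() if count < threshold}


def pick_samples_to_augment(label_sets, tail_labels, fraction, seed=42):
    """Randomly choose a fraction of the samples that carry at least one tail label."""
    tail_indices = [i for i, labels in enumerate(label_sets)
                    if any(label in tail_labels for label in labels)]
    rng = random.Random(seed)  # seeded so the same samples are picked on every run
    return sorted(rng.sample(tail_indices, int(len(tail_indices) * fraction)))


# Toy data: T1059 is frequent, T1105 and T1566 are rare
label_sets = [["T1059"]] * 8 + [["T1059", "T1105"]] * 2 + [["T1566"]] * 2
tail = find_tail_labels(label_sets, threshold=3)
chosen = pick_samples_to_augment(label_sets, tail, fraction=0.5)
print(sorted(tail), len(chosen))  # ['T1105', 'T1566'] 2
```

Restricting back-translation to a sampled subset keeps the slow translation step affordable while still adding paraphrases for the rare techniques.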

In [None]:
strategy_name = "aug_back_translation"
print(f"\n{'='*60}")
print(f"🧪 AUG-3: Back-translation")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()

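# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find the text column (same lookup as AUG-2, so this cell can run on its own)
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = next((col for col in possible_text_cols if col in train_df.columns), None)
if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")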

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Identify samples with tail TTPs
tail_sample_indices = []
for idx, labels in enumerate(train_labels_raw):
    if any(label in tail_ttps for label in labels):
        tail_sample_indices.append(idx)

print(f"📊 Samples with tail TTPs: {len(tail_sample_indices)} / {len(train_df)}")

# Apply back-translation to 15% of tail samples (faster)
import random
random.seed(42)
num_to_augment = int(len(tail_sample_indices) * 0.15)
samples_to_augment = random.sample(tail_sample_indices, num_to_augment)

print(f"🔄 Applying back-translation to {num_to_augment} samples...")

# Create augmented dataset
augmented_train_texts = train_df[text_column].fillna('').tolist()
augmented_train_labels = train_labels_raw.copy()

# Back-translate each selected sample; failures are logged and skipped

for idx in samples_to_augment:
    original_text = train_df.iloc[idx][text_column]
    if pd.isna(original_text) or len(original_text.strip()) < 10:
        continue
    
    # Back-translate
    try:
        bt_text = back_translate(original_text, pivot_lang='de')
        if bt_text and bt_text != original_text:
            # Add augmented sample
            augmented_train_texts.append(bt_text)
            augmented_train_labels.append(train_labels_raw[idx])
    except Exception as e:
        print(f"⚠️ Back-translation failed for sample {idx}: {e}")
        continue

print(f"Original samples: {len(train_df)}")
print(f"Augmented samples: {len(augmented_train_texts)} (+{len(augmented_train_texts) - len(train_df)} from back-translation)")

# Create custom dataset
from src.data_loader import prepare_data
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Reuse the label order from the data-loading cell (label_names) so that label
        # indices match the test set and pos_weight; this assumes label_names holds the
        # same technique IDs as the raw 'labels' column
        self.label_list = list(label_names)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_train_texts,
    augmented_train_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device'],
    label_names=label_names
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Back-translation (15% of tail samples)',
    'description': 'Training data augmented with back-translation for tail TTPs',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-3 COMPLETED: {strategy_name}")
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("="*60)



### 🔹 AUG-4: Oversampling Only
> Balance classes by replicating tail TTPs 3x-10x.  
> The fastest augmentation method.  
> ⏱️ ~30-40 minutes
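
The 3x-10x range comes from clamping `100 // min_freq`, where `min_freq` is the count of the rarest tail label in the sample. The short sketch below evaluates that same expression for a few frequencies; the function name and default arguments are illustrative. Note that for any frequency below the tail threshold of 10 the quotient is at least 11, so the clamp always returns 10; the lower end of the range is only reached for more frequent labels.

```python
def oversample_factor(min_freq: int, low: int = 3, high: int = 10, budget: int = 100) -> int:
    """Clamp budget // min_freq into the range [low, high]."""
    return max(low, min(high, budget // min_freq))


# Frequencies below 10 are tail labels in the notebook; the rest are shown for contrast
for freq in (1, 5, 9, 10, 20, 50):
    print(freq, oversample_factor(freq))
```

If a graded factor across tail labels is wanted, either the threshold or the numerator would need to change; that is a design decision for the experiment rather than something this sketch settles.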

In [None]:
strategy_name = "aug_oversampling"
print(f"\n{'='*60}")
print(f"🧪 AUG-4: Oversampling Only")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Get training texts
train_texts = train_df[text_column].fillna('').tolist()

# Apply oversampling only
augmented_texts = train_texts.copy()
augmented_labels = train_labels_raw.copy()

print(f"🔄 Applying oversampling to tail TTPs...")

for idx, labels in enumerate(train_labels_raw):
    # Check if sample has tail TTPs
    if any(label in tail_ttps for label in labels):
        # Oversample factor from the rarest tail label, clamped to 3x-10x
        # (with tail_threshold = 10, 100 // min_freq >= 11, so this is always 10)
        min_freq = min(label_counter[label] for label in labels if label in tail_ttps)
        oversample_factor = max(3, min(10, 100 // min_freq))
        
        # Oversample
        for _ in range(oversample_factor - 1):  # -1 because original is already in list
            augmented_texts.append(train_texts[idx])
            augmented_labels.append(labels)

print(f"Original samples: {len(train_texts)}")
print(f"Augmented samples: {len(augmented_texts)}")
print(f"Augmentation ratio: {len(augmented_texts) / len(train_texts):.2f}x")

# Create custom dataset
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Reuse the label order from the data-loading cell (label_names) so that label
        # indices match the test set and pos_weight; this assumes label_names holds the
        # same technique IDs as the raw 'labels' column
        self.label_list = list(label_names)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_texts,
    augmented_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device'],
    label_names=label_names  # passed as in AUG-1 to AUG-3
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Oversampling Only',
    'description': 'Training data augmented with tail TTP oversampling',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-4 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

### 🔹 AUG-5: Combined (IoC + Back-translation + Oversampling)
> Strategy that applies all augmentation methods together.  
> The most comprehensive data enrichment.  
> ⏱️ ~50-60 minutes

In [None]:
strategy_name = "aug_combined"
print(f"\n{'='*60}")
print(f"🧪 AUG-5: Combined Augmentation (All Methods)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Load raw data
from datasets import load_dataset
raw_dataset = load_dataset("tumeteor/Security-TTP-Mapping")
train_df = raw_dataset['train'].to_pandas()

# Find text column
possible_text_cols = ['text1', 'description', 'text', 'content', 'sentence']
text_column = None
for col in possible_text_cols:
    if col in train_df.columns:
        text_column = col
        break

if text_column is None:
    raise ValueError(f"No text column found! Available: {train_df.columns.tolist()}")

print(f"Using text column: {text_column}")

# Parse labels
import ast
train_labels_raw = train_df['labels'].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x).tolist()

# Calculate label frequencies
from collections import Counter
label_counter = Counter()
for labels in train_labels_raw:
    label_counter.update(labels)

# Identify tail TTPs (frequency < 10)
tail_threshold = 10
tail_ttps = {label for label, count in label_counter.items() if count < tail_threshold}
print(f"📊 Tail TTPs detected: {len(tail_ttps)} (frequency < {tail_threshold})")

# Get training texts
train_texts = train_df[text_column].fillna('').tolist()

print(f"🔄 Applying COMBINED augmentation...")
print(f"   - IoC Replacement")
print(f"   - Back-translation (15% probability)")
print(f"   - Tail Oversampling (3x-10x)")

# Apply combined augmentation
import random
random.seed(42)  # make the 15% back-translation draw reproducible

augmented_texts = train_texts.copy()
augmented_labels = train_labels_raw.copy()

for idx, labels in enumerate(train_labels_raw):
    # Check if sample has tail TTPs
    if any(label in tail_ttps for label in labels):
        # Oversample factor from the rarest tail label, clamped to 3x-10x
        # (with tail_threshold = 10, 100 // min_freq >= 11, so this is always 10)
        min_freq = min(label_counter[label] for label in labels if label in tail_ttps)
        oversample_factor = max(3, min(10, 100 // min_freq))

        # Oversample with augmentation
        for _ in range(oversample_factor - 1):  # -1 because original is already in list
            augmented_text = train_texts[idx]

            # Apply IoC replacement
            augmented_text = replace_iocs(augmented_text)

            # Apply back-translation with 15% probability
            if random.random() < 0.15:
                augmented_text = back_translate(augmented_text, device=BASE_CONFIG['device'])

            augmented_texts.append(augmented_text)
            augmented_labels.append(labels)

print(f"\nOriginal samples: {len(train_texts)}")
print(f"Augmented samples: {len(augmented_texts)}")
print(f"Augmentation ratio: {len(augmented_texts) / len(train_texts):.2f}x")

# Create custom dataset
from torch.utils.data import Dataset, DataLoader

class AugmentedCTIDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_length):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_length = max_length
        
        # Reuse the label order from the data-loading cell (label_names) so that label
        # indices match the test set and pos_weight; this assumes label_names holds the
        # same technique IDs as the raw 'labels' column
        self.label_list = list(label_names)
        self.label_to_idx = {label: idx for idx, label in enumerate(self.label_list)}
        
        # Tokenize all texts
        self.encodings = tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors=None
        )
        
        # Encode labels
        self.encoded_labels = []
        for label_list in labels:
            encoded = [0] * len(self.label_list)
            for label in label_list:
                if label in self.label_to_idx:
                    encoded[self.label_to_idx[label]] = 1
            self.encoded_labels.append(encoded)
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.encoded_labels[idx], dtype=torch.float)
        return item
    
    def __len__(self):
        return len(self.texts)

# Create augmented dataset
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_CONFIG['model_name'])
aug_train_dataset = AugmentedCTIDataset(
    augmented_texts,
    augmented_labels,
    tokenizer,
    BASE_CONFIG['max_length']
)

# Create dataloader
aug_train_dataloader = DataLoader(
    aug_train_dataset,
    batch_size=BASE_CONFIG['batch_size'],
    shuffle=True
)

print(f"✅ Augmented dataset created!")
print(f"   Num labels: {len(aug_train_dataset.label_list)}")

# Get strategy config for weighted BCE
strategy_config = get_strategy_config(
    strategy_name='weighted',
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=len(aug_train_dataset.label_list),
    device=BASE_CONFIG['device'],
    use_focal_loss=False,
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=aug_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': 'Weighted BCE + Combined Augmentation',
    'description': 'Training data augmented with IoC replacement, back-translation, and tail oversampling',
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ AUG-5 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

---
## 📊 Part A Results: Augmentation Comparison
> Compare all augmentation strategies and identify the best performer.
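
The comparison relies on ranking metrics such as Recall@5 and Precision@5. As a rough illustration of what these measure, here is a minimal pure-Python sketch for a single sample. The input format (a dict of label scores) and the function name are assumptions for illustration; the project's `evaluate_model` may compute and average these differently.

```python
def precision_recall_at_k(scores, true_labels, k=5):
    """Precision@k and Recall@k for one sample.

    scores: dict mapping label -> predicted score (hypothetical input format).
    true_labels: set of gold labels for the sample.
    """
    # Rank labels by descending score and keep the k best.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for label in top_k if label in true_labels)
    precision = hits / k
    recall = hits / len(true_labels) if true_labels else 0.0
    return precision, recall


# Toy example with made-up scores for six technique IDs.
scores = {"T1059": 0.91, "T1566": 0.80, "T1027": 0.40,
          "T1105": 0.35, "T1003": 0.20, "T1071": 0.10}
gold = {"T1059", "T1105", "T1071"}
p5, r5 = precision_recall_at_k(scores, gold, k=5)
print(f"Precision@5={p5:.2f}, Recall@5={r5:.2f}")
```

Because most CTI sentences carry only a few techniques, Precision@5 is bounded well below 1.0 for such samples, so the two metrics are best read together.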

In [None]:
# Extract augmentation results for comparison
print(f"\n{'='*80}")
print(f"📊 AUGMENTATION STRATEGIES COMPARISON")
print(f"{'='*80}\n")

# Define the desired order of strategies for the x-axis (must match strategy_name in each cell)
ordered_strategies = ['aug_baseline', 'aug_ioc_replacement', 'aug_back_translation', 'aug_oversampling', 'aug_combined']
aug_comparison_data = []

for name in ordered_strategies:
    if name in all_test_results:
        # Use a separate variable so the global `data` dict (train/test datasets) is not overwritten
        entry = all_test_results[name]
        results = entry['results']
        aug_comparison_data.append({
            'Strategy': entry['config'],
            'Training_Time_min': entry.get('training_time_min', 0),
            'mAP': results.get('mean_average_precision', 0),
            'Micro_F1': results.get('micro_f1', 0),
            'Recall@5': results.get('recall_at_5', 0),
            'Precision@5': results.get('precision_at_5', 0),
            'Recall@10': results.get('recall_at_10', 0),
            'Precision@10': results.get('precision_at_10', 0),
            'Hamming_Loss': results.get('hamming_loss', 0),
            'Micro_Precision': results.get('micro_precision', 0),
            'Micro_Recall': results.get('micro_recall', 0)
        })

if len(aug_comparison_data) > 0:
    df_aug_comparison = pd.DataFrame(aug_comparison_data)
    
    # Ensure the DataFrame is in the desired order
    df_aug_comparison['Strategy'] = pd.Categorical(df_aug_comparison['Strategy'], categories=[all_test_results[s]['config'] for s in ordered_strategies if s in all_test_results], ordered=True)
    df_aug_comparison = df_aug_comparison.sort_values('Strategy')
    
    # Export CSV (includes all metrics including @10)
    import os
    aug_csv_path = 'outputs/augmentation_comparison.csv'
    os.makedirs('outputs', exist_ok=True)
    df_aug_comparison.to_csv(aug_csv_path, index=False)
    print(f"✅ CSV exported to: {aug_csv_path}\n")
    
    # Display comparison table
    print("\n📋 Augmentation Performance Comparison (Ordered by Implementation):")
    print(df_aug_comparison.to_string(index=False))
    
    # Find best strategies
    print("\n🏆 Best Performers:")
    print(f"   Best mAP: {df_aug_comparison.loc[df_aug_comparison['mAP'].idxmax(), 'Strategy']} ({df_aug_comparison['mAP'].max():.4f})")
    print(f"   Best Micro F1: {df_aug_comparison.loc[df_aug_comparison['Micro_F1'].idxmax(), 'Strategy']} ({df_aug_comparison['Micro_F1'].max():.4f})")
    print(f"   Best Recall@5: {df_aug_comparison.loc[df_aug_comparison['Recall@5'].idxmax(), 'Strategy']} ({df_aug_comparison['Recall@5'].max():.4f})")
    print(f"   Best Precision@5: {df_aug_comparison.loc[df_aug_comparison['Precision@5'].idxmax(), 'Strategy']} ({df_aug_comparison['Precision@5'].max():.4f})")
    print(f"   Lowest Hamming Loss: {df_aug_comparison.loc[df_aug_comparison['Hamming_Loss'].idxmin(), 'Strategy']} ({df_aug_comparison['Hamming_Loss'].min():.4f})")
    
    # Create output directory for plots
    aug_plots_dir = 'outputs/augmentation_plots'
    os.makedirs(aug_plots_dir, exist_ok=True)
    
    strategies = df_aug_comparison['Strategy'].tolist()
    x_pos = np.arange(len(strategies))
    
    print("\n📊 Creating LINE CHART visualizations...")

    # ========== 1. mAP LINE CHART ==========
    fig1, ax1 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['mAP'].tolist()
    ax1.plot(x_pos, values, marker='o', linewidth=2.5, markersize=10, color='#06A77D', label='mAP')
    ax1.fill_between(x_pos, values, alpha=0.2, color='#06A77D')
    ax1.set_xticks(x_pos)
    ax1.set_xticklabels(strategies, rotation=45, ha='right')
    ax1.set_ylabel('Mean Average Precision (mAP)', fontsize=12)
    ax1.set_title('Augmentation Strategies: mAP Comparison', fontsize=14, fontweight='bold')
    ax1.grid(axis='y', alpha=0.3, linestyle='--')
    ax1.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax1.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/map_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ mAP line chart saved")
    
    # ========== 2. RECALL@5 LINE CHART ==========
    fig2, ax2 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['Recall@5'].tolist()
    ax2.plot(x_pos, values, marker='s', linewidth=2.5, markersize=10, color='#F39C12', label='Recall@5')
    ax2.fill_between(x_pos, values, alpha=0.2, color='#F39C12')
    ax2.set_xticks(x_pos)
    ax2.set_xticklabels(strategies, rotation=45, ha='right')
    ax2.set_ylabel('Recall@5', fontsize=12)
    ax2.set_title('Augmentation Strategies: Recall@5 Comparison', fontsize=14, fontweight='bold')
    ax2.grid(axis='y', alpha=0.3, linestyle='--')
    ax2.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax2.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/recall_5_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Recall@5 line chart saved")
    
    # ========== 3. PRECISION@5 LINE CHART ==========
    fig3, ax3 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['Precision@5'].tolist()
    ax3.plot(x_pos, values, marker='D', linewidth=2.5, markersize=10, color='#17A589', label='Precision@5')
    ax3.fill_between(x_pos, values, alpha=0.2, color='#17A589')
    ax3.set_xticks(x_pos)
    ax3.set_xticklabels(strategies, rotation=45, ha='right')
    ax3.set_ylabel('Precision@5', fontsize=12)
    ax3.set_title('Augmentation Strategies: Precision@5 Comparison', fontsize=14, fontweight='bold')
    ax3.grid(axis='y', alpha=0.3, linestyle='--')
    ax3.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax3.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/precision_5_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Precision@5 line chart saved")
    
    # ========== 4. MICRO F1 LINE CHART ==========
    fig4, ax4 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['Micro_F1'].tolist()
    ax4.plot(x_pos, values, marker='p', linewidth=2.5, markersize=10, color='#8E44AD', label='Micro F1')
    ax4.fill_between(x_pos, values, alpha=0.2, color='#8E44AD')
    ax4.set_xticks(x_pos)
    ax4.set_xticklabels(strategies, rotation=45, ha='right')
    ax4.set_ylabel('Micro F1 Score', fontsize=12)
    ax4.set_title('Augmentation Strategies: Micro F1 Comparison', fontsize=14, fontweight='bold')
    ax4.grid(axis='y', alpha=0.3, linestyle='--')
    ax4.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax4.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/micro_f1_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Micro F1 line chart saved")
    
    # ========== 5. HAMMING LOSS LINE CHART ==========
    fig5, ax5 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['Hamming_Loss'].tolist()
    ax5.plot(x_pos, values, marker='^', linewidth=2.5, markersize=10, color='#E74C3C', label='Hamming Loss')
    ax5.fill_between(x_pos, values, alpha=0.2, color='#E74C3C')
    ax5.set_xticks(x_pos)
    ax5.set_xticklabels(strategies, rotation=45, ha='right')
    ax5.set_ylabel('Hamming Loss', fontsize=12)
    ax5.set_title('Augmentation Strategies: Hamming Loss Comparison (Lower is Better)', fontsize=14, fontweight='bold')
    ax5.grid(axis='y', alpha=0.3, linestyle='--')
    ax5.set_ylim([min(values)*0.9, max(values)*1.1])
    for i, val in enumerate(values):
        ax5.text(i, val + (max(values) - min(values))*0.02, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/hamming_loss_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Hamming Loss line chart saved")
    
    # ========== 6. TRAINING TIME LINE CHART ==========
    fig6, ax6 = plt.subplots(figsize=(10, 6))
    values = df_aug_comparison['Training_Time_min'].tolist()
    ax6.plot(x_pos, values, marker='h', linewidth=2.5, markersize=10, color='#34495E', label='Training Time (min)')
    ax6.fill_between(x_pos, values, alpha=0.2, color='#34495E')
    ax6.set_xticks(x_pos)
    ax6.set_xticklabels(strategies, rotation=45, ha='right')
    ax6.set_ylabel('Training Time (minutes)', fontsize=12)
    ax6.set_title('Augmentation Strategies: Training Time Comparison', fontsize=14, fontweight='bold')
    ax6.grid(axis='y', alpha=0.3, linestyle='--')
    ax6.set_ylim([0, max(values)*1.15])
    for i, val in enumerate(values):
        ax6.text(i, val + max(values)*0.03, f'{val:.1f}', ha='center', fontsize=9,
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{aug_plots_dir}/training_time_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Training Time line chart saved")
    
    print(f"\n✅ All line charts saved to: {aug_plots_dir}/")
    
    print("\n" + "="*80)
    print("📁 EXPORTED FILES:")
    print("="*80)
    print(f"  • CSV: {aug_csv_path} (includes @10 metrics)")
    print(f"  • Line Charts: {aug_plots_dir}/")
    print("    - map_comparison.png")
    print("    - recall_5_comparison.png")
    print("    - precision_5_comparison.png")
    print("    - micro_f1_comparison.png")
    print("    - hamming_loss_comparison.png")
    print("    - training_time_comparison.png")
    print("="*80)
    
else:
    print("⚠️  No augmentation results found. Please run the Part A augmentation strategies first.")

print(f"\n{'='*80}\n")

---

# 🔄 PART B: LOSS FUNCTION STRATEGIES

> **Goal:** Test how different loss functions affect multi-label classification performance.

💡 **Note:** Run these strategies after PART A, using the best augmentation method.

---

### 🧪 Test Strategies

| Strategy | Loss Function | Characteristic | Duration |
|----------|---------------|----------------|----------|
| **B-1** | Baseline BCE | Standard loss, reference | ~30-45 min |
| **B-2** | Weighted BCE | Frequency-based weights | ~30-45 min |
| **B-3** | Focal Loss (γ=2) | Moderate hard example focusing | ~30-45 min |
| **B-4** | Focal Loss (γ=5) | Strong hard example focusing | ~30-45 min |

### 🔹 B-1: Baseline (Standard BCE Loss)
> Standard Binary Cross-Entropy loss. Serves as the reference baseline.  
> ⏱️ ~30-45 minutes

In [None]:
strategy_name = "baseline"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 1: Baseline BCE")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['pos_weight'] is not None:
    print(f"   Pos weight: min={strategy_config['pos_weight'].min():.2f}, max={strategy_config['pos_weight'].max():.2f}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 1 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

### 🔹 B-2: Weighted BCE Loss
> Address class imbalance with frequency-based weights (pos_weight).  
> The most effective baseline method.  
> ⏱️ ~30-45 minutes
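
To make the idea of frequency-based weights concrete, the sketch below derives a per-label `pos_weight` as the ratio of negative to positive samples, a common heuristic. This is an assumption for illustration only: the actual weights come from `get_strategy_config`, whose formula is not shown here, and the function name below is hypothetical.

```python
def compute_pos_weight(label_matrix):
    """Per-label weight = (#negatives / #positives).

    label_matrix: list of multi-hot rows, one row per sample.
    Labels with no positives fall back to 1.0 to avoid division by zero.
    """
    num_samples = len(label_matrix)
    num_labels = len(label_matrix[0])
    weights = []
    for j in range(num_labels):
        positives = sum(row[j] for row in label_matrix)
        negatives = num_samples - positives
        weights.append(negatives / positives if positives > 0 else 1.0)
    return weights


# Toy label matrix: label 0 is frequent, label 1 is rare, label 2 never occurs.
rows = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
weights = compute_pos_weight(rows)
print(weights)
```

Rare labels receive weights above 1, so a missed positive for a tail technique costs more than one for a frequent technique. In PyTorch, a tensor of such weights can be passed as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.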

In [None]:
strategy_name = "weighted"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 2: Weighted BCE")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['pos_weight'] is not None:
    print(f"   Pos weight: min={strategy_config['pos_weight'].min():.2f}, max={strategy_config['pos_weight'].max():.2f}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 2 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

### 🔹 B-3: Focal Loss (γ=2)
> Moderate hard example focusing. Concentrates training on misclassified examples.  
> ⏱️ ~30-45 minutes
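
The effect of γ can be seen on a single label with plain arithmetic. The sketch below uses the standard focal loss form, FL = -α_t (1 - p_t)^γ log(p_t), with the same defaults the cells pass to `load_model` (α=0.25, γ=2.0). It is an illustrative scalar version, not the project's implementation, which presumably operates on logit tensors.

```python
import math


def binary_cross_entropy(p, y):
    """BCE for one label, given predicted probability p and target y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one label: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Positive label predicted with high, medium, and low confidence.
for p in (0.9, 0.6, 0.1):
    bce = binary_cross_entropy(p, 1)
    fl2 = binary_focal_loss(p, 1, gamma=2.0)
    fl5 = binary_focal_loss(p, 1, gamma=5.0)
    print(f"p={p:.1f}  BCE={bce:.4f}  FL(gamma=2)={fl2:.6f}  FL(gamma=5)={fl5:.6f}")
```

Well-classified examples (p close to 1 for a positive label) are down-weighted by the factor (1 - p_t)^γ, and a larger γ suppresses them more aggressively, which is the difference between strategies B-3 and B-4.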

In [None]:
strategy_name = "focal_weak"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 3: Focal Loss (γ=2)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['use_focal_loss']:
    print(f"   Focal alpha: {strategy_config.get('focal_alpha')}")
    print(f"   Focal gamma: {strategy_config.get('focal_gamma')}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 3 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

### 🔹 B-4: Focal Loss (γ=5)
> Strong hard example focusing. A potential improvement for tail TTPs.  
> ⏱️ ~30-45 minutes

In [None]:
strategy_name = "focal_strong"
print(f"\n{'='*60}")
print(f"🧪 STRATEGY 4: Focal Loss (γ=5)")
print(f"{'='*60}\n")

# Start timing
strategy_start_time = time.time()


# Get strategy configuration
strategy_config = get_strategy_config(
    strategy_name=strategy_name,
    train_dataset=data['train_dataset'],
    num_labels=num_labels,
    label_list=label_names
)

# Create DataLoader
if strategy_config['custom_dataloader'] is not None:
    strategy_train_dataloader = strategy_config['custom_dataloader'](BASE_CONFIG['batch_size'])
else:
    strategy_train_dataloader = DataLoader(
        strategy_config['dataset'],
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )

print("📋 Configuration:")
print(f"   Strategy: {strategy_config['name']}")
print(f"   Description: {strategy_config['description']}")
print(f"   Num labels: {strategy_config['num_labels']}")
print(f"   Focal loss: {strategy_config['use_focal_loss']}")
if strategy_config['use_focal_loss']:
    print(f"   Focal alpha: {strategy_config.get('focal_alpha')}")
    print(f"   Focal gamma: {strategy_config.get('focal_gamma')}")

# Load model
print("\n🔧 Loading model...")
model = load_model(
    model_name=BASE_CONFIG['model_name'],
    num_labels=strategy_config['num_labels'],
    device=BASE_CONFIG['device'],
    use_focal_loss=strategy_config['use_focal_loss'],
    focal_alpha=strategy_config.get('focal_alpha', 0.25),
    focal_gamma=strategy_config.get('focal_gamma', 2.0),
    pos_weight=strategy_config['pos_weight']
)

# Train model
print("\n🚀 Starting training...")
training_history = train_model(
    model=model,
    train_dataloader=strategy_train_dataloader,
    num_epochs=BASE_CONFIG['num_epochs'],
    learning_rate=BASE_CONFIG['learning_rate'],
    device=BASE_CONFIG['device']
)

# Evaluate model
print("\n📊 Evaluating on the test set...")
test_results = evaluate_model(
    model=model,
    test_dataset=data['test_dataset'],
    batch_size=BASE_CONFIG['batch_size'],
    device=BASE_CONFIG['device']
)

# Calculate training time
training_time_min = (time.time() - strategy_start_time) / 60

# Store results
all_test_results[strategy_name] = {
    'config': strategy_config['name'],
    'description': strategy_config['description'],
    'training_time_min': training_time_min,
    'results': test_results
}

# Display results
print("\n" + "="*60)
print(f"✅ STRATEGY 4 COMPLETED: {strategy_name}")
print("="*60)
print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
print("\n")

---
## 📊 Part B-1 Results: Loss Function Comparison
> Compare the four loss function strategies (B-1 to B-4) and identify the best performer.
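
Two of the threshold-based metrics in this comparison, Hamming loss and micro F1, can be sketched on small multi-hot matrices as follows. The helper names are hypothetical and the project's `evaluate_model` may differ in details such as the decision threshold applied before these are computed.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels and samples."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: two samples, four labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0]]
print(f"Hamming loss={hamming_loss(y_true, y_pred):.4f}")
print(f"Micro F1={micro_f1(y_true, y_pred):.4f}")
```

With hundreds of labels and only a few positives per sample, Hamming loss stays small even for weak models because most decisions are trivially correct negatives, so it is most informative alongside micro F1 and the ranking metrics.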

In [None]:
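# Extract loss function results for comparison
print(f"\n{'='*80}")
print(f"📊 LOSS FUNCTION STRATEGIES COMPARISON")
print(f"{'='*80}\n")

import os

# Strategy keys must match strategy_name in cells B-1 to B-4
ordered_loss_strategies = ['baseline', 'weighted', 'focal_weak', 'focal_strong']
loss_comparison_data = []

for name in ordered_loss_strategies:
    if name in all_test_results:
        entry = all_test_results[name]
        results = entry['results']
        loss_comparison_data.append({
            'Strategy': entry['config'],
            'mAP': results.get('mean_average_precision', 0),
            'Recall@5': results.get('recall_at_5', 0),
            'Precision@5': results.get('precision_at_5', 0),
            'Recall@10': results.get('recall_at_10', 0),
            'Precision@10': results.get('precision_at_10', 0),
            'Hamming_Loss': results.get('hamming_loss', 0)
        })

if len(loss_comparison_data) > 0:
    df_loss_comparison = pd.DataFrame(loss_comparison_data)

    # Export CSV (the file name is an assumption, mirroring the augmentation comparison cell)
    loss_csv_path = 'outputs/loss_function_comparison.csv'
    os.makedirs('outputs', exist_ok=True)
    df_loss_comparison.to_csv(loss_csv_path, index=False)
    print(f"✅ CSV exported to: {loss_csv_path}\n")

    # Create output directory for plots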
    loss_plots_dir = 'outputs/loss_function_plots'
    os.makedirs(loss_plots_dir, exist_ok=True)
    
    strategies = df_loss_comparison['Strategy'].tolist()
    x_pos = np.arange(len(strategies))
    
    # ========== 1. mAP LINE CHART ==========
    fig1, ax1 = plt.subplots(figsize=(10, 6))
    values = df_loss_comparison['mAP'].tolist()
    ax1.plot(x_pos, values, marker='o', linewidth=2.5, markersize=10, color='#06A77D', label='mAP')
    ax1.fill_between(x_pos, values, alpha=0.2, color='#06A77D')
    ax1.set_xticks(x_pos)
    ax1.set_xticklabels(strategies, rotation=45, ha='right')
    ax1.set_ylabel('Mean Average Precision (mAP)', fontsize=12)
    ax1.set_title('Loss Functions: mAP Comparison', fontsize=14, fontweight='bold')
    ax1.grid(axis='y', alpha=0.3, linestyle='--')
    ax1.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax1.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{loss_plots_dir}/map_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ mAP line chart saved")
    
    # ========== 2. RECALL@5 LINE CHART ==========
    fig2, ax2 = plt.subplots(figsize=(10, 6))
    values = df_loss_comparison['Recall@5'].tolist()
    ax2.plot(x_pos, values, marker='s', linewidth=2.5, markersize=10, color='#F39C12', label='Recall@5')
    ax2.fill_between(x_pos, values, alpha=0.2, color='#F39C12')
    ax2.set_xticks(x_pos)
    ax2.set_xticklabels(strategies, rotation=45, ha='right')
    ax2.set_ylabel('Recall@5', fontsize=12)
    ax2.set_title('Loss Functions: Recall@5 Comparison', fontsize=14, fontweight='bold')
    ax2.grid(axis='y', alpha=0.3, linestyle='--')
    ax2.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax2.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{loss_plots_dir}/recall_5_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Recall@5 line chart saved")
    
    # ========== 3. PRECISION@5 LINE CHART ==========
    fig3, ax3 = plt.subplots(figsize=(10, 6))
    values = df_loss_comparison['Precision@5'].tolist()
    ax3.plot(x_pos, values, marker='D', linewidth=2.5, markersize=10, color='#17A589', label='Precision@5')
    ax3.fill_between(x_pos, values, alpha=0.2, color='#17A589')
    ax3.set_xticks(x_pos)
    ax3.set_xticklabels(strategies, rotation=45, ha='right')
    ax3.set_ylabel('Precision@5', fontsize=12)
    ax3.set_title('Loss Functions: Precision@5 Comparison', fontsize=14, fontweight='bold')
    ax3.grid(axis='y', alpha=0.3, linestyle='--')
    ax3.set_ylim([min(values)*0.95, max(values)*1.08])
    for i, val in enumerate(values):
        ax3.text(i, val - (max(values) - min(values))*0.04, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{loss_plots_dir}/precision_5_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Precision@5 line chart saved")
    
    # ========== 4. HAMMING LOSS LINE CHART ==========
    fig4, ax4 = plt.subplots(figsize=(10, 6))
    values = df_loss_comparison['Hamming_Loss'].tolist()
    ax4.plot(x_pos, values, marker='^', linewidth=2.5, markersize=10, color='#E74C3C', label='Hamming Loss')
    ax4.fill_between(x_pos, values, alpha=0.2, color='#E74C3C')
    ax4.set_xticks(x_pos)
    ax4.set_xticklabels(strategies, rotation=45, ha='right')
    ax4.set_ylabel('Hamming Loss', fontsize=12)
    ax4.set_title('Loss Functions: Hamming Loss Comparison (Lower is Better)', fontsize=14, fontweight='bold')
    ax4.grid(axis='y', alpha=0.3, linestyle='--')
    ax4.set_ylim([min(values)*0.9, max(values)*1.1])
    for i, val in enumerate(values):
        ax4.text(i, val + (max(values) - min(values))*0.02, f'{val:.4f}', 
                ha='center', fontsize=9, 
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
    plt.tight_layout()
    plt.savefig(f'{loss_plots_dir}/hamming_loss_comparison.png', dpi=300, bbox_inches='tight')
    plt.close()
    print(f"  ✓ Hamming Loss line chart saved")
    
    print(f"\n✅ All line charts saved to: {loss_plots_dir}/")
    
    print("\n" + "="*80)
    print("📁 EXPORTED FILES:")
    print("="*80)
    print(f"  • CSV: {loss_csv_path} (includes @10 metrics)")
    print(f"  • Line Charts: {loss_plots_dir}/")
    print("    - map_comparison.png")
    print("    - recall_5_comparison.png")
    print("    - precision_5_comparison.png")
    print("    - hamming_loss_comparison.png")
    print("="*80)
    
else:
    print("⚠️  No loss function results found. Please run strategies B-1 to B-4 first.")

print(f"\n{'='*80}\n")

---

## 🔬 PART B Section 2: Capacity Testing (Top-K Analysis)

> **Goal:** Test model capacity with label subsets of different sizes and understand its learning behavior.

### 🔹 Strategy B-5: Top-K Label Analysis

> Test the model's performance at different label counts.

| K Value | Description |
|---------|-------------|
| **Top-5** | Baseline with the fewest labels |
| **Top-10** | Minimal label set |
| **Top-20** | Small label set |
| **Top-50** | Medium label set |
| **Top-100** | Large label set |

⏱️ ~2-2.5 hours (5 models)
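
Restricting a multi-label dataset to its K most frequent labels can be sketched as follows. This is a hypothetical stand-in for `filter_top_k_labels` from `src.strategies`, whose real signature and behavior are not shown here: the `(text, labels)` sample format and the choice to drop samples left with no labels are assumptions.

```python
from collections import Counter


def filter_top_k(samples, k):
    """Keep only the k most frequent labels.

    samples: list of (text, labels) pairs (assumed format).
    Returns the filtered samples and the selected labels, most frequent first.
    Samples that lose all their labels are dropped.
    """
    counts = Counter(label for _, labels in samples for label in labels)
    top_labels = [label for label, _ in counts.most_common(k)]
    keep = set(top_labels)
    filtered = []
    for text, labels in samples:
        kept = [label for label in labels if label in keep]
        if kept:
            filtered.append((text, kept))
    return filtered, top_labels


# Toy dataset with made-up texts and technique IDs.
samples = [
    ("a", ["T1059", "T1566"]),
    ("b", ["T1059"]),
    ("c", ["T1027"]),
    ("d", ["T1566", "T1059"]),
    ("e", ["T1059", "T1105"]),
]
filtered, top_labels = filter_top_k(samples, k=2)
print(top_labels)
print(len(filtered), "of", len(samples), "samples kept")
```

One design point worth checking in any such filter: the top-K label set is usually derived from the training split and then reused for the test split, so that both splits share the same label space and index order.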

In [None]:
print(f"\n{'='*60}")
print(f"🧪 TOP-K LABEL ANALYSIS")
print(f"{'='*60}\n")

from src.strategies import filter_top_k_labels

# Test different K values
k_values = [100, 50, 20, 10, 5]
topk_results = {}

for k in k_values:
    print(f"\n{'='*60}")
    print(f"🔬 Testing Top-{k} Labels")
    print(f"{'='*60}\n")
    
    # Filter dataset to top-k labels
    filtered_train_ds, filtered_label_list, label_mapping = filter_top_k_labels(
        train_dataset, 
        label_names, 
        k=k
    )
    filtered_test_ds, _, _ = filter_top_k_labels(
        test_dataset, 
        label_names, 
        k=k
    )
    
    print(f"📊 Dataset Statistics:")
    print(f"   Top-{k} labels selected")
    print(f"   Train samples: {len(filtered_train_ds)}")
    print(f"   Test samples: {len(filtered_test_ds)}")
    print(f"   Labels: {filtered_label_list[:5]}...")
    
    # Create dataloaders
    topk_train_loader = DataLoader(
        filtered_train_ds,
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=True
    )
    topk_test_loader = DataLoader(
        filtered_test_ds,
        batch_size=BASE_CONFIG['batch_size'],
        shuffle=False
    )
    
    # Load model for this K
    print(f"\n🔧 Loading model for {k} labels...")
    topk_model = load_model(
        model_name=BASE_CONFIG['model_name'],
        num_labels=k,
        device=BASE_CONFIG['device'],
        use_focal_loss=False,
        pos_weight=None
    )
    
    # Train model
    print(f"\n🚀 Training on Top-{k}...")
    topk_history = train_model(
        model=topk_model,
        train_dataloader=topk_train_loader,
        num_epochs=BASE_CONFIG['num_epochs'],
        learning_rate=BASE_CONFIG['learning_rate'],
        device=BASE_CONFIG['device']
    )
    
    # Evaluate model
    print(f"\n📊 Evaluating Top-{k}...")
    topk_test_results = evaluate_model(
        model=topk_model,
        test_dataloader=topk_test_loader,
        label_names=filtered_label_list,
        device=BASE_CONFIG['device']
    )
    
    # Store results
    topk_results[f'top_{k}'] = {
        'k': k,
        'num_train': len(filtered_train_ds),
        'num_test': len(filtered_test_ds),
        'metrics': topk_test_results,
        'labels': filtered_label_list
    }
    
    # Display results
    print(f"\n✅ Top-{k} Results:")
    for metric, value in topk_test_results.items():
        if isinstance(value, (int, float)):
            print(f"   {metric}: {value:.4f}")

# Create comparison DataFrame
print(f"\n{'='*60}")
print(f"📊 TOP-K COMPARISON TABLE")
print(f"{'='*60}\n")

topk_comparison = []
for key, data in topk_results.items():
    metrics = data['metrics']
    topk_comparison.append({
        'K': data['k'],
        'Train Samples': data['num_train'],
        'Test Samples': data['num_test'],
        'Micro F1': metrics.get('micro_f1', 0),
        'Hamming Loss': metrics.get('hamming_loss', 0),
        'Micro Precision': metrics.get('micro_precision', 0),
        'Micro Recall': metrics.get('micro_recall', 0),
        'Recall@5': metrics.get('recall_at_5', 0),
        'Precision@5': metrics.get('precision_at_5', 0)
    })

df_topk = pd.DataFrame(topk_comparison)
df_topk = df_topk.sort_values('K', ascending=False)
print(df_topk.to_string(index=False))

# Save CSV
import os
topk_csv_path = 'outputs/topk_analysis.csv'
os.makedirs('outputs', exist_ok=True)
df_topk.to_csv(topk_csv_path, index=False)
print(f"\n✅ CSV saved: {topk_csv_path}")

# Create output directory for plots
topk_plots_dir = 'outputs/topk_analysis_plots'
os.makedirs(topk_plots_dir, exist_ok=True)

# Setup for line charts
k_labels = [f'Top-{k}' for k in df_topk['K']]
x_pos = np.arange(len(k_labels))

print("\n📊 Creating LINE CHART visualizations...")

# ========== MICRO F1 LINE CHART ==========
micro_f1 = df_topk['Micro F1'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, micro_f1, marker='o', linewidth=2.5, markersize=10, 
        color='#2E86AB', label='Micro F1')
ax.fill_between(x_pos, micro_f1, alpha=0.2, color='#2E86AB')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Micro F1 Score', fontsize=12)
ax.set_title('Top-K Analysis: Micro F1 vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(micro_f1)*0.95, max(micro_f1)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(micro_f1):
    ax.text(i, val - (max(micro_f1) - min(micro_f1))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/micro_f1_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Micro F1 line chart saved")

# ========== HAMMING LOSS LINE CHART ==========
# Named hamming_vals so it cannot shadow a metric function called hamming_loss
hamming_vals = df_topk['Hamming Loss'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, hamming_vals, marker='v', linewidth=2.5, markersize=10, 
        color='#E63946', label='Hamming Loss')
ax.fill_between(x_pos, hamming_vals, alpha=0.2, color='#E63946')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Hamming Loss (lower is better)', fontsize=12)
ax.set_title('Top-K Analysis: Hamming Loss vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(hamming_vals)*0.92, max(hamming_vals)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(hamming_vals):
    ax.text(i, val + (max(hamming_vals) - min(hamming_vals))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/hamming_loss_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Hamming Loss line chart saved")

# ========== PRECISION@5 LINE CHART ==========
precision_5 = df_topk['Precision@5'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, precision_5, marker='p', linewidth=2.5, markersize=10, 
        color='#F77F00', label='Precision@5')
ax.fill_between(x_pos, precision_5, alpha=0.2, color='#F77F00')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Precision@5', fontsize=12)
ax.set_title('Top-K Analysis: Precision@5 vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(precision_5)*0.95, max(precision_5)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(precision_5):
    ax.text(i, val - (max(precision_5) - min(precision_5))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/precision_5_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Precision@5 line chart saved")

# ========== RECALL@5 LINE CHART ==========
recall_5 = df_topk['Recall@5'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, recall_5, marker='D', linewidth=2.5, markersize=10, 
        color='#06A77D', label='Recall@5')
ax.fill_between(x_pos, recall_5, alpha=0.2, color='#06A77D')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Recall@5', fontsize=12)
ax.set_title('Top-K Analysis: Recall@5 vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(recall_5)*0.95, max(recall_5)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(recall_5):
    ax.text(i, val - (max(recall_5) - min(recall_5))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/recall_5_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Recall@5 line chart saved")

# ========== MICRO PRECISION LINE CHART ==========
micro_prec = df_topk['Micro Precision'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, micro_prec, marker='o', linewidth=2.5, markersize=10, 
        color='#F18F01', label='Micro Precision')
ax.fill_between(x_pos, micro_prec, alpha=0.2, color='#F18F01')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Micro Precision', fontsize=12)
ax.set_title('Top-K Analysis: Micro Precision vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(micro_prec)*0.95, max(micro_prec)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(micro_prec):
    ax.text(i, val - (max(micro_prec) - min(micro_prec))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/micro_precision_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Micro Precision line chart saved")

# ========== MICRO RECALL LINE CHART ==========
micro_rec = df_topk['Micro Recall'].tolist()

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_pos, micro_rec, marker='s', linewidth=2.5, markersize=10, 
        color='#C1121F', label='Micro Recall')
ax.fill_between(x_pos, micro_rec, alpha=0.2, color='#C1121F')
ax.set_xticks(x_pos)
ax.set_xticklabels(k_labels, rotation=45, ha='right')
ax.set_ylabel('Micro Recall', fontsize=12)
ax.set_title('Top-K Analysis: Micro Recall vs Label Count', fontsize=14, fontweight='bold')
ax.grid(axis='y', alpha=0.3, linestyle='--')
ax.set_ylim([min(micro_rec)*0.95, max(micro_rec)*1.08])
ax.legend(loc='upper right')
for i, val in enumerate(micro_rec):
    ax.text(i, val - (max(micro_rec) - min(micro_rec))*0.04, f'{val:.4f}', 
            ha='center', fontsize=9, 
            bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8, edgecolor='gray'))
plt.tight_layout()
plt.savefig(f'{topk_plots_dir}/micro_recall_comparison.png', dpi=300, bbox_inches='tight')
plt.close()
print(f"  ✓ Micro Recall line chart saved")

print(f"\n✅ All line charts saved to: {topk_plots_dir}/")

# Store in all_test_results for comparison
for key, data in topk_results.items():
    all_test_results[key] = {
        'config': f"Top-{data['k']} Labels",
        'description': f"Baseline BCE with {data['k']} most frequent labels",
        'results': data['metrics']
    }

print("\n" + "="*80)
print("📁 EXPORTED FILES:")
print("="*80)
print(f"  • CSV: {topk_csv_path}")
print(f"  • Line Charts: {topk_plots_dir}/")
print("    - micro_f1_comparison.png")
print("    - hamming_loss_comparison.png")
print("    - precision_5_comparison.png")
print("    - recall_5_comparison.png")
print("    - micro_precision_comparison.png")
print("    - micro_recall_comparison.png")
print("="*80)

print(f"\n{'='*60}")
print(f"✅ TOP-K ANALYSIS COMPLETE")
print(f"{'='*60}\n")
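
The Top-K loop above filters the train and test splits with separate calls. One way to guarantee that both splits share the same label columns is to choose the top-k labels on the training split only and then apply that list to both. The sketch below illustrates the idea on toy data; `filter_top_k_labels` is not shown in this notebook, so the helper names and the sample layout (a `labels` list per sample) are assumptions, not the project's API.

```python
from collections import Counter


def top_k_labels(samples, k):
    """Return the k most frequent labels across samples (each sample has a 'labels' list)."""
    counts = Counter(label for sample in samples for label in sample["labels"])
    return [label for label, _ in counts.most_common(k)]


def restrict_to_labels(samples, keep):
    """Drop labels outside `keep`, then drop samples left with no labels."""
    keep = set(keep)
    restricted = []
    for sample in samples:
        labels = [label for label in sample["labels"] if label in keep]
        if labels:
            restricted.append({**sample, "labels": labels})
    return restricted


# Toy splits (hypothetical data).
train = [{"text": "a", "labels": ["T1059", "T1566"]},
         {"text": "b", "labels": ["T1059"]},
         {"text": "c", "labels": ["T1566", "T1027"]},
         {"text": "d", "labels": ["T1105"]}]
test = [{"text": "e", "labels": ["T1105", "T1059"]},
        {"text": "f", "labels": ["T1003"]}]

kept = top_k_labels(train, k=2)          # chosen from the training split only
train_k = restrict_to_labels(train, kept)
test_k = restrict_to_labels(test, kept)  # same label list applied to test
print(kept, len(train_k), len(test_k))
```

Because the test split reuses the list chosen on the training split, a model with `k` outputs sees the same label order at training and evaluation time.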

---

# 🔗 PART C: HYBRID STRATEGIES (Loss × Classification)

> **Goal:** Combine the best-performing loss functions with different classification methods to find the optimal strategy.

**Formula:** 2 Loss × 5 Methods = **10 Strateji**
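
The 2 × 5 grid can be enumerated directly. The sketch below is only illustrative: it builds the strategy names with the `hybrid_<loss>_<method>` pattern and the short names that this part of the notebook uses as result keys, without training anything.

```python
from itertools import product

# Short names as used for the result keys in this part of the notebook.
loss_names = ["weighted", "focal_gamma5"]
method_names = [
    "classifier_chain",
    "extra_trees",
    "random_forest",
    "attentionxml",
    "lightxml",
]

# Cartesian product: every loss paired with every classification method.
strategy_names = [
    f"hybrid_{loss}_{method}" for loss, method in product(loss_names, method_names)
]

for number, name in enumerate(strategy_names, start=1):
    print(f"{number:2d}. {name}")
```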

---

### ⚡ Loss Functions (selected from Part B)

| Loss | Description |
|------|-------------|
| **Weighted BCE** | Frequency-based weights; the best-performing baseline |
| **Focal Loss γ=5** | Strong focus on hard examples |
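
To make the difference between the two losses concrete, here is a minimal per-label sketch in plain Python. It follows the standard definitions (BCE with a positive-class weight, and focal loss as `alpha_t * (1 - p_t)^gamma * -log(p_t)`), with `alpha=0.25` and `gamma=5.0` taken from the configuration used in this part. The `negatives / positives` weighting rule is an assumption for illustration; how the project derives `pos_weight` is not shown here.

```python
import math


def weighted_bce(p, y, pos_weight=1.0):
    """Binary cross-entropy for one label; positives are scaled by pos_weight."""
    p = min(max(p, 1e-7), 1 - 1e-7)  # clamp to avoid log(0)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1 - p))


def focal_loss(p, y, alpha=0.25, gamma=5.0):
    """Binary focal loss for one label: alpha_t * (1 - p_t)^gamma * -log(p_t)."""
    p = min(max(p, 1e-7), 1 - 1e-7)
    p_t = p if y == 1 else 1 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing factor
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


# Assumed weighting rule (not taken from the project): negatives / positives.
n_samples, n_positive = 1000, 20
pos_weight = (n_samples - n_positive) / n_positive

for p in (0.9, 0.5, 0.1):  # predicted probability for a label that is truly positive
    print(f"p={p:.1f}  weighted_bce={weighted_bce(p, 1, pos_weight):.4f}  "
          f"focal={focal_loss(p, 1):.6f}")
```

With a large `gamma`, well-classified examples (high `p_t`) contribute almost nothing, so the gradient is dominated by hard examples. Weighted BCE instead scales every positive by the same factor regardless of how confident the prediction is.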

---

### 🎯 Classification Methods

| Method | Description | Source |
|--------|-------------|--------|
| **ClassifierChain** | Sequential label dependencies | scikit-learn |
| **ExtraTreesClassifier** | Extremely randomized trees; fast | scikit-learn |
| **RandomForestClassifier** | Ensemble of decision trees | scikit-learn |
| **AttentionXML** | Multi-label attention mechanism | NeurIPS 2019 |
| **LightXML** | Dynamic negative sampling + label embeddings | AAAI 2021 |
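
The scikit-learn methods in the table operate on fixed-size feature vectors. The sketch below contrasts a chained classifier with independent per-label classifiers on random stand-in data. The data, the sizes, and the hyperparameters are all hypothetical, and the project's own wrappers in `src.classifier_chain` are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

rng = np.random.default_rng(42)

# Stand-in data: 200 "documents", 32-dim features, 4 labels (all hypothetical).
X = rng.normal(size=(200, 32))
Y = (X[:, :4] + 0.5 * rng.normal(size=(200, 4)) > 0).astype(int)

# ClassifierChain: each label is predicted from the features plus the
# predictions for the labels that come earlier in the chain.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order="random", random_state=42)
chain.fit(X, Y)

# MultiOutputClassifier: one independent binary classifier per label.
trees = MultiOutputClassifier(
    ExtraTreesClassifier(n_estimators=10, max_features="sqrt", random_state=42)
)
trees.fit(X, Y)

chain_pred = chain.predict(X)
trees_pred = trees.predict(X)
print(chain_pred.shape, trees_pred.shape)
```

The chain can exploit co-occurrence between labels but must be fitted sequentially, while the independent classifiers ignore label dependencies and can be trained in parallel.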

---

⏱️ **Total time:** ~7.5-10 hours
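
As a rough cross-check, the Part C training cell prints an estimate of 50 minutes per combination. The arithmetic below shows how that figure maps onto the range quoted above; the per-strategy number is only the notebook's own estimate, not a measurement.

```python
n_strategies = 2 * 5         # loss functions x classification methods
minutes_per_strategy = 50    # per-combination figure used in the notebook's printed estimate

total_minutes = n_strategies * minutes_per_strategy
total_hours = total_minutes / 60
print(f"{total_minutes} min = {total_hours:.1f} h")
```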

In [None]:
import time

print(f"\n{'='*80}")
print(f"🔄 HYBRID STRATEGIES: LOSS FUNCTIONS × CLASSIFICATION METHODS")
print(f"{'='*80}\n")

from src.classifier_chain import train_classifier_chain, evaluate_classifier_chain
from src.classifier_chain import train_multi_output_classifier, evaluate_multi_output_classifier

# Define loss configurations (the 2 most promising from the PART B experiments)
# Each entry is a dict with keys: name, display_name, use_focal_loss, pos_weight, focal_alpha, focal_gamma
loss_configs = []

# 1. Weighted BCE (best baseline)
strategy_config_weighted = get_strategy_config(
    strategy_name='weighted',
    train_dataset=train_dataset,
    num_labels=num_labels,
    label_list=label_names
)
loss_configs.append({
    'name': 'weighted',
    'display_name': 'Weighted BCE',
    'use_focal_loss': False,
    'pos_weight': strategy_config_weighted['pos_weight'],
    'focal_alpha': None,
    'focal_gamma': None
})

# 2. Focal Loss γ=5 (strongest focal)
loss_configs.append({
    'name': 'focal_gamma5',
    'display_name': 'Focal Loss (γ=5)',
    'use_focal_loss': True,
    'pos_weight': None,
    'focal_alpha': 0.25,
    'focal_gamma': 5.0
})

# Define classification methods (5 methods: 3 traditional + 2 XMC)
# Note: the loop below dispatches on 'name'; 'train_func'/'eval_func' are informational only
classification_methods = [
    {
        'name': 'classifier_chain',
        'display_name': 'ClassifierChain',
        'base_estimator': 'logistic',
        'train_func': train_classifier_chain,
        'eval_func': evaluate_classifier_chain
    },
    {
        'name': 'extra_trees',
        'display_name': 'ExtraTreesClassifier',
        'base_estimator': 'extra_trees',
        'train_func': train_multi_output_classifier,
        'eval_func': evaluate_multi_output_classifier
    },
    {
        'name': 'random_forest',
        'display_name': 'RandomForestClassifier',
        'base_estimator': 'random_forest',
        'train_func': train_multi_output_classifier,
        'eval_func': evaluate_multi_output_classifier
    },
    {
        'name': 'attentionxml',
        'display_name': 'AttentionXML',
        'train_func': 'attention_xml',
        'eval_func': 'attention_xml'
    },
    {
        'name': 'lightxml',
        'display_name': 'LightXML',
        'train_func': 'light_xml',
        'eval_func': 'light_xml'
    }
]

# Counter for strategy numbering (starting from 1 for Part C)
strategy_counter = 1

# Test all combinations
print(f"🧪 Testing {len(loss_configs)} loss functions × {len(classification_methods)} classification methods")
print(f"   Total combinations: {len(loss_configs) * len(classification_methods)}")
print(f"   Estimated time: ~{len(loss_configs) * len(classification_methods) * 50} minutes")
print(f"\n{'='*80}\n")

for loss_config in loss_configs:
    for clf_method in classification_methods:
        
        strategy_name = f"hybrid_{loss_config['name']}_{clf_method['name']}"
        
        print(f"\n{'='*80}")
        print(f"🧪 PART C STRATEGY {strategy_counter}: {loss_config['display_name']} + {clf_method['display_name']}")
        print(f"{'='*80}\n")
        
        # Start timing
        strategy_start_time = time.time()
        
        # Print the configuration for this strategy
        print(f"📋 Configuration:")
        print(f"   Loss Function: {loss_config['display_name']}")
        print(f"   Classification Method: {clf_method['display_name']}")
        print(f"   Use Focal Loss: {loss_config['use_focal_loss']}")
        if loss_config['pos_weight'] is not None:
            print(f"   Pos Weight: min={loss_config['pos_weight'].min():.2f}, max={loss_config['pos_weight'].max():.2f}")
        if loss_config['use_focal_loss']:
            print(f"   Focal Alpha: {loss_config['focal_alpha']}")
            print(f"   Focal Gamma: {loss_config['focal_gamma']}")
        
        # Check if this is AttentionXML or LightXML (end-to-end training)
        if clf_method['name'] in ['attentionxml', 'lightxml']:
            # Import XML utilities
            from src.xml_utils import train_attention_xml, evaluate_attention_xml
            from src.xml_utils import train_light_xml, evaluate_light_xml
            
            if clf_method['name'] == 'attentionxml':
                from src.attention_xml import load_attention_xml_model
                
                print(f"\n🔧 Training {clf_method['display_name']} (end-to-end with {loss_config['display_name']})...")
                model = load_attention_xml_model(
                    model_name=BASE_CONFIG['model_name'],
                    num_labels=num_labels,
                    device=BASE_CONFIG['device'],
                    dropout=0.1
                )
                
                training_history = train_attention_xml(
                    model=model,
                    train_dataloader=train_dataloader,
                    num_epochs=BASE_CONFIG['num_epochs'],
                    learning_rate=BASE_CONFIG['learning_rate'],
                    device=BASE_CONFIG['device'],
                    use_focal_loss=loss_config['use_focal_loss'],
                    pos_weight=loss_config['pos_weight'],
                    focal_alpha=loss_config['focal_alpha'],
                    focal_gamma=loss_config['focal_gamma']
                )
                
                print(f"\n📊 Evaluating...")
                test_results = evaluate_attention_xml(
                    model=model,
                    test_dataloader=test_dataloader,
                    device=BASE_CONFIG['device'],
                    label_names=label_names
                )
                
            else:  # lightxml
                from src.light_xml import load_light_xml_model
                
                print(f"\n🔧 Training {clf_method['display_name']} (end-to-end with {loss_config['display_name']})...")
                model = load_light_xml_model(
                    model_name=BASE_CONFIG['model_name'],
                    num_labels=num_labels,
                    device=BASE_CONFIG['device'],
                    num_label_groups=50,
                    label_emb_dim=128,
                    dropout=0.1
                )
                
                training_history = train_light_xml(
                    model=model,
                    train_dataloader=train_dataloader,
                    num_epochs=BASE_CONFIG['num_epochs'],
                    learning_rate=BASE_CONFIG['learning_rate'],
                    device=BASE_CONFIG['device'],
                    use_focal_loss=loss_config['use_focal_loss'],
                    pos_weight=loss_config['pos_weight'],
                    focal_alpha=loss_config['focal_alpha'],
                    focal_gamma=loss_config['focal_gamma']
                )
                
                print(f"\n📊 Evaluating...")
                test_results = evaluate_light_xml(
                    model=model,
                    test_dataloader=test_dataloader,
                    device=BASE_CONFIG['device'],
                    label_names=label_names
                )
        
        else:
            # Traditional two-stage approach: BERT + Classification
            # STEP 1: Train BERT with specific loss function
            print(f"\n🔧 Step 1/3: Training BERT with {loss_config['display_name']}...")
            bert_model = load_model(
                model_name=BASE_CONFIG['model_name'],
                num_labels=num_labels,
                device=BASE_CONFIG['device'],
                use_focal_loss=loss_config['use_focal_loss'],
                focal_alpha=loss_config['focal_alpha'],
                focal_gamma=loss_config['focal_gamma'],
                pos_weight=loss_config['pos_weight']
            )
            
            training_history = train_model(
                model=bert_model,
                train_dataloader=train_dataloader,
                num_epochs=BASE_CONFIG['num_epochs'],
                learning_rate=BASE_CONFIG['learning_rate'],
                device=BASE_CONFIG['device']
            )
            
            # STEP 2: Train classifier on top of BERT embeddings
            print(f"\n🔧 Step 2/3: Training {clf_method['display_name']}...")
            
            if clf_method['name'] == 'classifier_chain':
                # ClassifierChain: Sequential label modeling
                chain_model = train_classifier_chain(
                    bert_model=bert_model,
                    train_dataloader=train_dataloader,
                    device=BASE_CONFIG['device'],
                    base_estimator=clf_method['base_estimator'],
                    order='random',
                    random_state=42
                )
                
                print(f"\n📊 Step 3/3: Evaluating...")
                test_results = evaluate_classifier_chain(
                    model=chain_model,
                    test_dataloader=test_dataloader,
                    label_names=label_names
                )
            
            else:  # MultiOutputClassifier (RandomForest or ExtraTrees)
                # Multi-output: Independent binary classifiers
                multi_output_model = train_multi_output_classifier(
                    bert_model=bert_model,
                    train_dataloader=train_dataloader,
                    device=BASE_CONFIG['device'],
                    base_estimator=clf_method['base_estimator'],
                    n_jobs=-1,
                    random_state=42
                )
                
                print(f"\n📊 Step 3/3: Evaluating...")
                test_results = evaluate_multi_output_classifier(
                    model=multi_output_model,
                    test_dataloader=test_dataloader,
                    label_names=label_names
                )
        
        # Elapsed wall-clock time for this strategy in minutes (training plus evaluation)
        training_time_min = (time.time() - strategy_start_time) / 60
        
        # Store results
        all_test_results[strategy_name] = {
            'config': f"{loss_config['display_name']} + {clf_method['display_name']}",
            'description': f"Hybrid strategy: {loss_config['display_name']} loss with {clf_method['display_name']}",
            'training_time_min': training_time_min,
            'results': test_results
        }
        
        # Display results
        print(f"\n{'='*80}")
        print(f"✅ PART C STRATEGY {strategy_counter} COMPLETE: {strategy_name}")
        print(f"⏱️  Training Time: {training_time_min:.2f} minutes")
        print(f"{'='*80}")
        
        print(f"\n📈 Results:")
        for metric, value in test_results.items():
            if isinstance(value, (int, float)):
                print(f"   {metric}: {value:.4f}")
        
        print(f"\n{'='*80}\n")
        
        # Increment counter
        strategy_counter += 1

print(f"\n{'='*80}")
print(f"✅ ALL HYBRID STRATEGIES COMPLETE")
print(f"   Total strategies tested: {len(loss_configs) * len(classification_methods)}")
print(f"{'='*80}\n")

---
## 📊 Part C Results: Hybrid Strategies Comparison
> Compare all hybrid strategies and identify the optimal combination.
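
The comparison reports ranking metrics such as Precision@5 and Recall@5. As a reminder of what these measure, here is a minimal single-sample sketch using the common definitions. The function name, the scores, and the label set are hypothetical, and the project's own metric code in `src` may differ in details such as averaging.

```python
def precision_recall_at_k(scores, true_labels, k=5):
    """Precision@k and Recall@k for a single sample.

    scores: dict mapping label -> predicted score
    true_labels: set of labels that are actually relevant
    """
    # Rank labels by score (highest first) and keep the top k.
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    hits = sum(1 for label in top_k if label in true_labels)
    precision = hits / k
    recall = hits / len(true_labels) if true_labels else 0.0
    return precision, recall


# Hypothetical scores for one CTI sentence over six ATT&CK technique IDs.
scores = {"T1059": 0.91, "T1566": 0.80, "T1027": 0.42,
          "T1105": 0.33, "T1003": 0.12, "T1071": 0.05}
true_labels = {"T1059", "T1105", "T1071"}

p5, r5 = precision_recall_at_k(scores, true_labels, k=5)
print(f"Precision@5={p5:.2f}  Recall@5={r5:.2f}")
```

Two of the five top-ranked labels are correct here, so Precision@5 is 2/5, and two of the three true labels are recovered, so Recall@5 is 2/3.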

In [None]:
# Compare all strategies
if len(all_test_results) > 0:
    print("\n" + "="*80)
    print(" 📊 HYBRID STRATEGY COMPARISON (Ordered by Implementation Order)")
    print("="*80 + "\n")
    
    # Define the desired order of hybrid strategies (implementation order)
    ordered_hybrid_strategies = [
        'hybrid_weighted_classifier_chain',
        'hybrid_weighted_extra_trees',
        'hybrid_weighted_random_forest',
        'hybrid_weighted_attentionxml',
        'hybrid_weighted_lightxml',
        'hybrid_focal_gamma5_classifier_chain',
        'hybrid_focal_gamma5_extra_trees',
        'hybrid_focal_gamma5_random_forest',
        'hybrid_focal_gamma5_attentionxml',
        'hybrid_focal_gamma5_lightxml'
    ]
    
    comparison_data = []
    
    # First add hybrid strategies in order
    for strategy_name in ordered_hybrid_strategies:
        if strategy_name in all_test_results:
            result_dict = all_test_results[strategy_name]
            results = result_dict['results']
            comparison_data.append({
                'Strategy': strategy_name,
                'Training_Time_min': result_dict.get('training_time_min', 0),
                'Micro F1': results.get('micro_f1', 0),
                'mAP': results.get('mean_average_precision', 0),
                'Hamming Loss': results.get('hamming_loss', 0),
                'Example-based Accuracy': results.get('example_based_accuracy', 0),
                'Micro Precision': results.get('micro_precision', 0),
                'Micro Recall': results.get('micro_recall', 0),
                'Precision@5': results.get('precision_at_5', 0),
                'Recall@5': results.get('recall_at_5', 0),
                'Precision@10': results.get('precision_at_10', 0),
                'Recall@10': results.get('recall_at_10', 0)
            })
    
    # Then add any remaining strategies not in the ordered list
    for strategy_name, result_dict in all_test_results.items():
        if strategy_name not in ordered_hybrid_strategies:
            results = result_dict['results']
            comparison_data.append({
                'Strategy': strategy_name,
                'Training_Time_min': result_dict.get('training_time_min', 0),
                'Micro F1': results.get('micro_f1', 0),
                'mAP': results.get('mean_average_precision', 0),
                'Hamming Loss': results.get('hamming_loss', 0),
                'Example-based Accuracy': results.get('example_based_accuracy', 0),
                'Micro Precision': results.get('micro_precision', 0),
                'Micro Recall': results.get('micro_recall', 0),
                'Precision@5': results.get('precision_at_5', 0),
                'Recall@5': results.get('recall_at_5', 0),
                'Precision@10': results.get('precision_at_10', 0),
                'Recall@10': results.get('recall_at_10', 0)
            })
    
    df_comparison = pd.DataFrame(comparison_data)
    
    print(df_comparison.to_string(index=False))
    print("\n" + "="*80)
    
    # Find the best strategy by mAP (idxmax returns an index label, so use .loc)
    best_idx = df_comparison['mAP'].idxmax()
    best_row = df_comparison.loc[best_idx]
    best_strategy = best_row['Strategy']
    best_map = best_row['mAP']
    best_f1 = best_row['Micro F1']
    
    print(f"\n🏆 BEST STRATEGY: {best_strategy}")
    print(f"   mAP (Ranking Quality): {best_map:.4f}")
    print(f"   Micro F1 Score: {best_f1:.4f}")
    print(f"   Micro Precision: {best_row['Micro Precision']:.4f}")
    print(f"   Micro Recall: {best_row['Micro Recall']:.4f}")
    
    # Compare to baseline if exists
    if 'baseline' in all_test_results:
        baseline_map = all_test_results['baseline']['results'].get('mean_average_precision', 0)
        improvement = ((best_map - baseline_map) / baseline_map * 100) if baseline_map > 0 else 0
        print(f"   mAP improvement over baseline: {improvement:+.2f}%")
    
    # Save comparison to CSV
    csv_path = 'outputs/hybrid_strategies_comparison.csv'
    os.makedirs('outputs', exist_ok=True)
    df_comparison.to_csv(csv_path, index=False)
    print(f"\n💾 CSV saved: {csv_path}")
    
    # Plot comparison - individual line charts
    if len(all_test_results) > 1:
        plots_dir = 'outputs/hybrid_strategies_plots'
        os.makedirs(plots_dir, exist_ok=True)
        
        strategies = df_comparison['Strategy'].tolist()
        x = np.arange(len(strategies))
        
        # 1. Micro F1 Score
        fig1, ax1 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Micro F1'].tolist()
        ax1.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#2E86AB', label='Micro F1')
        ax1.fill_between(x, values, alpha=0.3, color='#2E86AB')
        ax1.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax1.set_ylabel('Micro F1 Score', fontsize=13, fontweight='bold')
        ax1.set_title('Hybrid Strategies: Micro F1 Score Comparison', fontsize=15, fontweight='bold', pad=20)
        ax1.set_xticks(x)
        ax1.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax1.grid(True, alpha=0.3, linestyle='--')
        ax1.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax1.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/micro_f1.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 2. mAP (Mean Average Precision)
        fig2, ax2 = plt.subplots(figsize=(12, 7))
        values = df_comparison['mAP'].tolist()
        ax2.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#06A77D', label='mAP')
        ax2.fill_between(x, values, alpha=0.3, color='#06A77D')
        ax2.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax2.set_ylabel('Mean Average Precision (mAP)', fontsize=13, fontweight='bold')
        ax2.set_title('Hybrid Strategies: mAP Comparison (Ranking Quality)', fontsize=15, fontweight='bold', pad=20)
        ax2.set_xticks(x)
        ax2.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax2.grid(True, alpha=0.3, linestyle='--')
        ax2.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax2.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/map.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 3. Micro Precision
        fig3, ax3 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Micro Precision'].tolist()
        ax3.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#5DADE2', label='Micro Precision')
        ax3.fill_between(x, values, alpha=0.3, color='#5DADE2')
        ax3.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax3.set_ylabel('Micro Precision', fontsize=13, fontweight='bold')
        ax3.set_title('Hybrid Strategies: Micro Precision Comparison', fontsize=15, fontweight='bold', pad=20)
        ax3.set_xticks(x)
        ax3.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax3.grid(True, alpha=0.3, linestyle='--')
        ax3.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax3.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/micro_precision.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 4. Micro Recall
        fig4, ax4 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Micro Recall'].tolist()
        ax4.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#EC7063', label='Micro Recall')
        ax4.fill_between(x, values, alpha=0.3, color='#EC7063')
        ax4.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax4.set_ylabel('Micro Recall', fontsize=13, fontweight='bold')
        ax4.set_title('Hybrid Strategies: Micro Recall Comparison', fontsize=15, fontweight='bold', pad=20)
        ax4.set_xticks(x)
        ax4.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax4.grid(True, alpha=0.3, linestyle='--')
        ax4.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax4.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/micro_recall.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 5. Recall@5
        fig5, ax5 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Recall@5'].tolist()
        ax5.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#F39C12', label='Recall@5')
        ax5.fill_between(x, values, alpha=0.3, color='#F39C12')
        ax5.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax5.set_ylabel('Recall@5', fontsize=13, fontweight='bold')
        ax5.set_title('Hybrid Strategies: Recall@5 Comparison', fontsize=15, fontweight='bold', pad=20)
        ax5.set_xticks(x)
        ax5.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax5.grid(True, alpha=0.3, linestyle='--')
        ax5.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax5.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/recall_at_5.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 6. Precision@5
        fig6, ax6 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Precision@5'].tolist()
        ax6.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#17A589', label='Precision@5')
        ax6.fill_between(x, values, alpha=0.3, color='#17A589')
        ax6.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax6.set_ylabel('Precision@5', fontsize=13, fontweight='bold')
        ax6.set_title('Hybrid Strategies: Precision@5 Comparison', fontsize=15, fontweight='bold', pad=20)
        ax6.set_xticks(x)
        ax6.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax6.grid(True, alpha=0.3, linestyle='--')
        ax6.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax6.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/precision_at_5.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 7. Hamming Loss (Lower is Better)
        fig7, ax7 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Hamming Loss'].tolist()
        ax7.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#FF6F61', label='Hamming Loss')
        ax7.fill_between(x, values, alpha=0.3, color='#FF6F61')
        ax7.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax7.set_ylabel('Hamming Loss', fontsize=13, fontweight='bold')
        ax7.set_title('Hybrid Strategies: Hamming Loss Comparison (Lower is Better)', fontsize=15, fontweight='bold', pad=20)
        ax7.set_xticks(x)
        ax7.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax7.grid(True, alpha=0.3, linestyle='--')
        ax7.set_ylim([min(values) * 0.85, max(values) * 1.1])
        for i, v in enumerate(values):
            ax7.text(i, v, f'{v:.4f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/hamming_loss.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # 8. Example-based Accuracy
        fig8, ax8 = plt.subplots(figsize=(12, 7))
        values = df_comparison['Example-based Accuracy'].tolist()
        ax8.plot(x, values, marker='o', linewidth=2.5, markersize=8, color='#9B59B6', label='Example-based Accuracy')
        ax8.fill_between(x, values, alpha=0.3, color='#9B59B6')
        ax8.set_xlabel('Strategy', fontsize=13, fontweight='bold')
        ax8.set_ylabel('Example-based Accuracy', fontsize=13, fontweight='bold')
        ax8.set_title('Hybrid Strategies: Example-based Accuracy Comparison', fontsize=15, fontweight='bold', pad=20)
        ax8.set_xticks(x)
        ax8.set_xticklabels(strategies, rotation=45, ha='right', fontsize=10)
        ax8.grid(True, alpha=0.3, linestyle='--')
        ax8.set_ylim([min(values) * 0.95, max(values) * 1.1])
        for i, v in enumerate(values):
            ax8.text(i, v, f'{v:.3f}', ha='center', va='bottom', fontsize=9,
                    bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))
        plt.tight_layout()
        plt.savefig(f'{plots_dir}/example_based_accuracy.png', dpi=300, bbox_inches='tight')
        plt.close()
        
        # Combined plot (4x2 grid for 8 metrics)
        fig, axes = plt.subplots(4, 2, figsize=(18, 24))
        
        # One entry per panel, in row-major order:
        # (df_comparison column, which is also the y-axis label; panel title; color)
        panels = [
            ('Micro F1', 'Micro F1 Score', '#2E86AB'),
            ('mAP', 'Mean Average Precision', '#06A77D'),
            ('Micro Precision', 'Micro Precision', '#5DADE2'),
            ('Micro Recall', 'Micro Recall', '#EC7063'),
            ('Recall@5', 'Recall@5', '#F39C12'),
            ('Precision@5', 'Precision@5', '#17A589'),
            ('Hamming Loss', 'Hamming Loss (Lower is Better)', '#FF6F61'),
            ('Example-based Accuracy', 'Example-based Accuracy', '#9B59B6'),
        ]
        
        # Draw every panel with the same styling; only the data, title, and color vary
        for panel_ax, (metric, title, color) in zip(axes.flat, panels):
            values = df_comparison[metric].tolist()
            panel_ax.plot(x, values, marker='o', linewidth=2, markersize=6, color=color)
            panel_ax.fill_between(x, values, alpha=0.3, color=color)
            panel_ax.set_xlabel('Strategy', fontsize=10)
            panel_ax.set_ylabel(metric, fontsize=10)
            panel_ax.set_title(title, fontsize=11, fontweight='bold')
            panel_ax.set_xticks(x)
            panel_ax.set_xticklabels(strategies, rotation=45, ha='right', fontsize=7)
            panel_ax.grid(True, alpha=0.3)
        
        plt.tight_layout()
        combined_plot_path = f'{plots_dir}/hybrid_strategies_comparison_all.png'
        plt.savefig(combined_plot_path, dpi=300, bbox_inches='tight')
        plt.show()
        
        print(f"\n📊 Plots saved:")
        print(f"    - {plots_dir}/hybrid_strategies_comparison_all.png (combined 8-panel)")
        print(f"    - {plots_dir}/micro_f1.png")
        print(f"    - {plots_dir}/map.png")
        print(f"    - {plots_dir}/micro_precision.png")
        print(f"    - {plots_dir}/micro_recall.png")
        print(f"    - {plots_dir}/recall_at_5.png")
        print(f"    - {plots_dir}/precision_at_5.png")
        print(f"    - {plots_dir}/hamming_loss.png")
        print(f"    - {plots_dir}/example_based_accuracy.png")
    
    print("\n")

---

# 📝 NOTES AND REFERENCES

---

## 📈 Success Criteria (SOC Analyst Perspective)

| Metric | Target | Description | Priority |
|--------|--------|-------------|----------|
| **mAP** | > 0.20 | Ranking quality: how well the correct TTPs are placed at the top of the list | ⭐⭐⭐ |
| **Recall@5** | > 0.30 | Share of the correct TTPs that appear among the top-5 predictions | ⭐⭐⭐ |
| **Micro F1** | > 0.15 | Overall performance (threshold-based) | ⭐⭐ |
| **Hamming Loss** | < 0.10 | Fraction of incorrect label predictions (lower is better) | ⭐⭐ |

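The metrics in the table can be made concrete with a small worked example. The sketch below is pure Python and uses common textbook definitions of Recall@k, Hamming loss, and Micro F1. The notebook's own metric code is not shown in this section, so these definitions, the function names, the toy data, and the 0.5 threshold are all assumptions for illustration. The toy uses `k=2` over 6 labels, whereas the project evaluates `k=5` over 499 labels.

```python
def recall_at_k(y_true, scores, k=5):
    """Mean over samples of (true labels found in the top-k scores) / (true labels)."""
    per_sample = []
    for truth, sample_scores in zip(y_true, scores):
        positives = {j for j, t in enumerate(truth) if t}
        if not positives:
            continue  # assumption: samples without any true label are skipped
        ranked = sorted(range(len(sample_scores)), key=lambda j: sample_scores[j], reverse=True)
        per_sample.append(len(positives & set(ranked[:k])) / len(positives))
    return sum(per_sample) / len(per_sample)


def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) cells where the prediction differs from the truth."""
    wrong = sum(t != p for truth, pred in zip(y_true, y_pred) for t, p in zip(truth, pred))
    total = sum(len(truth) for truth in y_true)
    return wrong / total


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives and false negatives pooled over all labels."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        for t, p in zip(truth, pred):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): 2 samples, 6 labels
y_true = [[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0]]
scores = [[0.9, 0.1, 0.2, 0.3, 0.8, 0.05], [0.2, 0.6, 0.7, 0.1, 0.1, 0.1]]
# Assumed fixed threshold of 0.5 to turn scores into binary predictions
y_pred = [[int(s >= 0.5) for s in row] for row in scores]

r_at_2 = recall_at_k(y_true, scores, k=2)
h_loss = hamming_loss(y_true, y_pred)
f1 = micro_f1(y_true, y_pred)
print(f"Recall@2={r_at_2:.3f}  Hamming loss={h_loss:.3f}  Micro F1={f1:.3f}")
```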
---

## ⚠️ Important Reminders

- ✅ Each strategy cell can be run **independently**
- ✅ Results are stored in the `all_test_results` dictionary
- ✅ You can run the comparison cell at any time to inspect intermediate results
- ⏱️ CTI-BERT is cached on first download (~500 MB)
- 📊 All plots are saved to the `outputs/` folder

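Because each strategy cell writes its own entry into `all_test_results`, a comparison step can run over whatever entries exist at that moment. The sketch below shows that pattern and checks each entry against the targets from the success-criteria table. The dictionary layout (strategy name mapped to a metric dictionary), the strategy names, the helper `check_targets`, and every number are hypothetical, not the notebook's actual structure or results.

```python
# Targets taken from the success-criteria table: metric -> (threshold, higher_is_better)
TARGETS = {
    "mAP": (0.20, True),
    "Recall@5": (0.30, True),
    "Micro F1": (0.15, True),
    "Hamming Loss": (0.10, False),
}

# Assumed layout: strategy name -> {metric name: value}. Names and values are made up.
all_test_results = {}
all_test_results["strategy_a"] = {"mAP": 0.25, "Recall@5": 0.35, "Micro F1": 0.12, "Hamming Loss": 0.004}
all_test_results["strategy_b"] = {"mAP": 0.18, "Recall@5": 0.31, "Micro F1": 0.20, "Hamming Loss": 0.003}


def check_targets(metrics, targets=TARGETS):
    """Return {metric: bool} for every target metric that is present in `metrics`."""
    status = {}
    for name, (threshold, higher_is_better) in targets.items():
        if name not in metrics:
            continue  # tolerate partial results from strategies that are still running
        value = metrics[name]
        status[name] = value > threshold if higher_is_better else value < threshold
    return status


# Works with any subset of strategies that has been run so far
report = {strategy: check_targets(metrics) for strategy, metrics in all_test_results.items()}
for strategy, status in report.items():
    met = [metric for metric, ok in status.items() if ok]
    print(f"{strategy}: {len(met)}/{len(status)} targets met -> {met}")
```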
---

## 📚 References

| Source | Description |
|--------|-------------|
| [CTI-BERT](https://huggingface.co/ibm-research/CTI-BERT) | IBM Research - Cyber Threat Intelligence BERT |
| [Security-TTP-Mapping](https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping) | CTI dataset labeled with MITRE ATT&CK techniques |
| [MITRE ATT&CK](https://attack.mitre.org/) | Adversarial Tactics, Techniques & Common Knowledge |