<a href="https://colab.research.google.com/github/RayAKaan/Personal-Research/blob/main/HTAS-V3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# ======================================================================================
# COLAB ENVIRONMENT SETUP (Fixed for Oct 2025)
# ======================================================================================

print("⏳ Installing all required Python packages... This may take a few minutes.")

!pip install transformers==4.38.2 datasets==2.18.0 sentence-transformers==2.7.0 \
rouge-score==0.1.2 bert-score==0.3.13 networkx==3.2.1 hdbscan==0.8.33 \
scikit-learn==1.3.2 nltk==3.8.1 torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
matplotlib seaborn --extra-index-url https://download.pytorch.org/whl/cu118

print("✅ Python packages installed successfully.")

import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # Some newer NLTK versions also need this
print("✅ Punkt tokenizer data reinstalled successfully!")

print("\n" + "="*80)
print("  SETUP COMPLETE. THE ENVIRONMENT IS NOW PREPARED.")
print("  >>> PLEASE RESTART THE RUNTIME MANUALLY TO PROCEED. <<<")
print("  Go to the menu: 'Runtime' -> 'Restart Session' (or 'Factory reset runtime')")
print("  After restarting, you can run the main experiment script in the next cell.")
print("="*80)


⏳ Installing all required Python packages... This may take a few minutes.
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
✅ Python packages installed successfully.


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


✅ Punkt tokenizer data reinstalled successfully!

  SETUP COMPLETE. THE ENVIRONMENT IS NOW PREPARED.
  >>> PLEASE RESTART THE RUNTIME MANUALLY TO PROCEED. <<<
  Go to the menu: 'Runtime' -> 'Restart Session' (or 'Factory reset runtime')
  After restarting, you can run the main experiment script in the next cell.


[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
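
Since the setup cell asks for a runtime restart, it can help to confirm which package versions are actually active before running the experiment cell. The snippet below is a minimal sketch and not part of the original notebook: the helper name `check_versions` and the `EXPECTED` subset are illustrative assumptions, and it relies only on the standard-library `importlib.metadata` API.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative subset of the pins from the install command (assumption: extend as needed).
EXPECTED = {
    "transformers": "4.38.2",
    "datasets": "2.18.0",
    "sentence-transformers": "2.7.0",
    "torch": "2.5.1",
}

def check_versions(expected):
    """Map each package name to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Local build tags such as '+cu118' are ignored in the comparison.
        report[name] = "ok" if found.split("+")[0] == wanted else f"found {found}"
    return report

if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name:<24} {status}")
```

Any entry that is not `ok` suggests the install step or the restart did not take effect as intended.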


In [None]:
# ======================================================================================
# HTAS-V3 Kaggle POC: Fast Multi-Document Summarization (1-1.5 hours)
# This script is now ready to run in the correctly prepared environment.
# ======================================================================================

import warnings
warnings.filterwarnings("ignore")
import os
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

import gc
import json
import random
from pathlib import Path
from collections import defaultdict
from typing import List, Dict, Tuple

import torch
import torch.nn.functional as F
import numpy as np
from tqdm.auto import tqdm
from torch.utils.data import DataLoader, Dataset as TorchDataset
import matplotlib.pyplot as plt
import seaborn as sns

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    get_linear_schedule_with_warmup,
    DataCollatorForSeq2Seq,
    logging as hf_logging
)
from rouge_score import rouge_scorer
from bert_score import score as bert_score_calculator
from nltk.tokenize import sent_tokenize
import nltk

import hdbscan
import networkx as nx

hf_logging.set_verbosity_error()

# ======================================================================================
# KAGGLE-OPTIMIZED CONFIGURATION (Fast POC)
# ======================================================================================

class KaggleConfig:
    """Fast configuration for 1-1.5 hour Kaggle runtime."""
    # Data - Reduced for speed
    num_train_samples: int = 300
    num_test_samples: int = 100

    # Model - Use smaller BART
    model_name: str = 'facebook/bart-base'  # Faster than bart-large
    max_input_length: int = 1024
    max_target_length: int = 150

    # Training - Optimized for speed
    batch_size: int = 4
    gradient_accumulation_steps: int = 2
    learning_rate: float = 3e-5
    max_steps: int = 300
    epochs: int = 2
    warmup_ratio: float = 0.1
    fp16: bool = True

    # HTAS - Simplified
    token_budget: int = 400
    min_cluster_size: int = 2
    per_cluster_topk: int = 2
    fallback_topk: int = 8
    sim_threshold: float = 0.15

    # PageRank
    pagerank_alpha: float = 0.85
    pagerank_max_iter: int = 100

    # Evaluation
    num_beams: int = 4

    # System
    seed: int = 42
    device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
    output_dir: str = '/kaggle/working/htas_output'

# ======================================================================================
# MULTI-DOC NEWS DEMO ARTICLES (INTEGRATED TEST CASES)
# ======================================================================================

DEMO_ARTICLES = [
    {
        "topic": "Climate Change Summit 2025",
        "articles": [
            """World leaders gathered in Geneva for the 2025 Climate Summit, marking a crucial moment in global climate policy. The United Nations Secretary-General emphasized that this decade is decisive for preventing catastrophic warming. Over 150 countries pledged new commitments to reduce carbon emissions by 50% before 2030. The summit highlighted renewable energy breakthroughs, with solar costs dropping 80% since 2020. However, developing nations demanded $500 billion in climate finance from wealthy countries. Scientists warned that current pledges still fall short of limiting warming to 1.5°C.""",
            """Major economies announced historic climate investments at the Geneva summit. The European Union committed €300 billion for green technology development. China revealed plans to phase out coal power by 2035, a surprising acceleration of previous targets. The United States pledged to achieve 100% clean electricity by 2032. Tech giants including Microsoft, Apple, and Google announced carbon-negative operations by 2027. Climate activists staged massive protests, demanding faster action and fossil fuel phase-outs.""",
            """The summit's final agreement includes binding emissions targets and transparency mechanisms. Countries must submit detailed reduction plans by March 2026. A new global carbon pricing mechanism will launch in January 2027. Deforestation will be penalized through international trade restrictions. Climate finance for vulnerable nations increased to $200 billion annually. Youth delegates secured representation in all future climate negotiations, a historic first."""
        ],
        "key_points": [
            "150+ countries pledged 50% emissions cuts by 2030",
            "EU commits €300B, China phases out coal by 2035, US targets clean electricity by 2032",
            "New binding agreement with carbon pricing and $200B annual climate finance"
        ]
    },
    {
        "topic": "AI Regulation Breakthrough",
        "articles": [
            """The United States, European Union, and China signed the first global AI safety treaty. The treaty establishes international standards for AI development and deployment. Companies must disclose AI training data sources and conduct safety audits. High-risk AI systems in healthcare, finance, and law enforcement face strict oversight. A new UN agency will monitor compliance and investigate AI-related incidents. The agreement took three years of negotiations and balances innovation with safety.""",
            """Major tech companies responded positively to the new AI regulations. OpenAI, Google DeepMind, and Anthropic committed to transparency measures. The treaty requires red-teaming of powerful AI models before release. Researchers must report dangerous capabilities to the UN AI Safety Board. Penalties for violations include fines up to 10% of global revenue. Small AI startups received exemptions for systems below specified capability thresholds.""",
            """Critics debate whether the regulations strike the right balance. Some scientists argue the rules could slow beneficial AI progress in medicine. Privacy advocates praise strong data protection requirements. The treaty includes provisions for biometric AI and facial recognition limits. Developing nations gained technology transfer agreements for AI infrastructure. Implementation begins in six months with a two-year compliance grace period."""
        ],
        "key_points": [
            "US, EU, China sign first global AI safety treaty after 3 years of negotiations",
            "Requires safety audits, data disclosure, red-teaming for high-risk AI systems",
            "Penalties up to 10% revenue, UN agency to monitor, implementation in 6 months"
        ]
    }
]

# ======================================================================================
# DETERMINISTIC SEEDING
# ======================================================================================

def set_seed(seed: int):
    """Seed the Python, NumPy, and PyTorch RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# ======================================================================================
# GPU-ACCELERATED ENCODER
# ======================================================================================

class SentenceEncoderGPU:
    def __init__(self, device: str = 'cuda'):
        self.device = device
        self.model = SentenceTransformer('all-MiniLM-L6-v2', device=device)
        self.model.eval()

    def encode(self, sentences: List[str]) -> torch.Tensor:
        if not sentences:
            return torch.empty(0, 384, device=self.device)
        return self.model.encode(sentences, convert_to_tensor=True, device=self.device, show_progress_bar=False)

    def cosine_similarity(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        a_norm = F.normalize(a, p=2, dim=-1)
        b_norm = F.normalize(b, p=2, dim=-1)
        return torch.mm(a_norm, b_norm.t())

# ======================================================================================
# CLUSTERING & SELECTION
# ======================================================================================

class HTASClusterer:
    def __init__(self, config):
        self.config = config

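    def cluster_sentences(self, embeddings: torch.Tensor) -> np.ndarray:
        """Cluster sentence embeddings with HDBSCAN.

        Noise points (label -1) are reassigned to the cluster of their most
        similar non-noise sentence when at least one cluster exists.
        """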
        if len(embeddings) < self.config.min_cluster_size:
            return np.zeros(len(embeddings), dtype=int)

        emb_cpu = embeddings.cpu().numpy()
        clusterer = hdbscan.HDBSCAN(
            min_cluster_size=self.config.min_cluster_size,
            min_samples=1,
            metric='euclidean'
        )
        labels = clusterer.fit_predict(emb_cpu)

        # Handle noise
        noise_mask = labels == -1
        if noise_mask.any() and (~noise_mask).any():
            for idx in np.where(noise_mask)[0]:
                sims = F.cosine_similarity(embeddings[idx].unsqueeze(0), embeddings[~noise_mask])
                labels[idx] = labels[~noise_mask][sims.argmax().item()]

        return labels

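    def select_from_cluster(self, cluster_indices: List[int], sim_matrix: torch.Tensor) -> List[int]:
        """Pick up to `per_cluster_topk` sentences from one cluster.

        Sentences are ranked by an equal-weight blend of PageRank and degree
        centrality on the thresholded similarity graph.
        """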
        if len(cluster_indices) <= self.config.per_cluster_topk:
            return cluster_indices

        n = len(cluster_indices)
        sim_sub = sim_matrix[cluster_indices][:, cluster_indices].cpu().numpy()
        sim_sub[sim_sub < self.config.sim_threshold] = 0
        np.fill_diagonal(sim_sub, 0)

        try:
            G = nx.from_numpy_array(sim_sub)
            scores = nx.pagerank(G, alpha=self.config.pagerank_alpha, max_iter=self.config.pagerank_max_iter)
            degree = nx.degree_centrality(G)
            combined = {i: 0.5 * scores[i] + 0.5 * degree[i] for i in range(n)}
        except Exception:
            # If graph scoring fails (e.g. PageRank does not converge), fall back to uniform scores.
            combined = {i: 1.0/n for i in range(n)}

        ranked = sorted(combined.items(), key=lambda x: x[1], reverse=True)
        return [cluster_indices[i] for i, _ in ranked[:self.config.per_cluster_topk]]

# ======================================================================================
# TOKEN-BUDGETED SELECTOR
# ======================================================================================

class TokenBudgetedSelector:
    def __init__(self, tokenizer, config):
        self.tokenizer = tokenizer
        self.config = config

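    def select_with_budget(self, sentences: List[str], candidate_indices: List[int]) -> List[int]:
        """Greedily keep candidates in the given order while they fit in `token_budget`.

        A candidate that would overflow the budget is skipped, not truncated.
        """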
        selected = []
        total_tokens = 0

        for idx in candidate_indices:
            tokens = len(self.tokenizer.encode(sentences[idx], add_special_tokens=False))
            if total_tokens + tokens <= self.config.token_budget:
                selected.append(idx)
                total_tokens += tokens
            if total_tokens >= self.config.token_budget:
                break

        return selected

# ======================================================================================
# MAIN HTAS PROCESSOR
# ======================================================================================

class HTASProcessor:
    def __init__(self, config, tokenizer):
        self.config = config
        self.tokenizer = tokenizer
        self.encoder = SentenceEncoderGPU(device=config.device)
        self.clusterer = HTASClusterer(config)
        self.selector = TokenBudgetedSelector(tokenizer, config)

    def preprocess_documents(self, documents: List[str]) -> List[str]:
        """Split documents into sentences and keep those with 11 to 149 whitespace-separated words."""
        sentences = []
        for doc in documents:
            doc_sents = sent_tokenize(doc)
            sentences.extend([s for s in doc_sents if 10 < len(s.split()) < 150])
        return sentences

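    def create_guided_input(self, documents: List[str]) -> str:
        """Build the model input as '<guidance></s><s><full text>'.

        Guidance is the budgeted set of selected sentences in their original
        order; if no sentence survives filtering, the documents are simply joined.
        """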
        sentences = self.preprocess_documents(documents)

        if not sentences:
            return " ".join(documents)

        embeddings = self.encoder.encode(sentences)
        if len(embeddings) == 0:
            return " ".join(documents)

        sim_matrix = self.encoder.cosine_similarity(embeddings, embeddings)
        labels = self.clusterer.cluster_sentences(embeddings)

        selected_indices = []
        for label in np.unique(labels):
            cluster_indices = np.where(labels == label)[0].tolist()
            selected = self.clusterer.select_from_cluster(cluster_indices, sim_matrix)
            selected_indices.extend(selected)

        # Fallback: global PageRank
        if len(selected_indices) < self.config.fallback_topk:
            sim_np = sim_matrix.cpu().numpy()
            sim_np[sim_np < self.config.sim_threshold] = 0
            np.fill_diagonal(sim_np, 0)

            G = nx.from_numpy_array(sim_np)
            pr = nx.pagerank(G, alpha=self.config.pagerank_alpha, max_iter=self.config.pagerank_max_iter)
            ranked = sorted(pr.items(), key=lambda x: x[1], reverse=True)

            for idx, _ in ranked:
                if idx not in selected_indices:
                    selected_indices.append(idx)
                if len(selected_indices) >= self.config.fallback_topk:
                    break

        final_indices = self.selector.select_with_budget(sentences, selected_indices)
        final_indices = sorted(final_indices)

        guidance = " ".join([sentences[i] for i in final_indices])
        full_text = " ".join(documents)

        return f"{guidance}</s><s>{full_text}"

# ======================================================================================
# DATASET
# ======================================================================================

class SummarizationDataset(TorchDataset):
    def __init__(self, texts: List[str], summaries: List[str], tokenizer, config):
        self.texts = texts
        self.summaries = summaries
        self.tokenizer = tokenizer
        self.config = config

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        model_inputs = self.tokenizer(
            str(self.texts[idx]),
            max_length=self.config.max_input_length,
            truncation=True,
            padding=False
        )

        # `text_target=` replaces the deprecated `as_target_tokenizer()` context manager.
        labels = self.tokenizer(
            text_target=str(self.summaries[idx]),
            max_length=self.config.max_target_length,
            truncation=True,
            padding=False
        )

        model_inputs['labels'] = labels['input_ids']
        return model_inputs

# ======================================================================================
# TRAINING
# ======================================================================================

def train_model(model, tokenizer, train_dataset, config):
    model.to(config.device)
    model.train()

    collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, padding=True)
    dataloader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True, collate_fn=collator)

    optimizer = torch.optim.AdamW(model.parameters(), lr=config.learning_rate, eps=1e-8)

    num_steps = min(config.max_steps, len(dataloader) // config.gradient_accumulation_steps * config.epochs)
    num_warmup = int(config.warmup_ratio * num_steps)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup, num_steps)
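    # torch.cuda.amp.GradScaler is deprecated in the pinned torch 2.5.1; use the device-generic API.
    scaler = torch.amp.GradScaler('cuda', enabled=config.fp16)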

    global_step = 0
    losses = []

    print(f"\n🚂 Training for {num_steps} steps...")

    for epoch in range(config.epochs):
        if global_step >= num_steps:
            break

        pbar = tqdm(dataloader, desc=f"Epoch {epoch+1}/{config.epochs}")

        for step, batch in enumerate(pbar):
            if global_step >= num_steps:
                break

            batch = {k: v.to(config.device) for k, v in batch.items()}

            # torch.cuda.amp.autocast is deprecated in the pinned torch 2.5.1; use the device-generic API.
            with torch.amp.autocast('cuda', enabled=config.fp16):
                outputs = model(**batch)
                loss = outputs.loss / config.gradient_accumulation_steps

            scaler.scale(loss).backward()

            if (step + 1) % config.gradient_accumulation_steps == 0:
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                scaler.step(optimizer)
                scaler.update()
                scheduler.step()
                optimizer.zero_grad()

                global_step += 1
                if loss.item() != 0:
                    losses.append(loss.item() * config.gradient_accumulation_steps)
                pbar.set_postfix({'loss': f'{losses[-1]:.4f}' if losses else 0})

    return model, losses

# ======================================================================================
# EVALUATION
# ======================================================================================

def evaluate_model(model_name: str, model, tokenizer, test_dataset, config, processor=None):
    model.to(config.device)
    model.eval()

    candidates = []
    references = []

    print(f"\n📊 Evaluating {model_name}...")

    for item in tqdm(test_dataset, desc="Generating", leave=False):
        input_text = processor.create_guided_input(item['texts']) if processor else " ".join(item['texts'])

        inputs = tokenizer(input_text, max_length=config.max_input_length, truncation=True, return_tensors='pt').to(config.device)

        with torch.no_grad():
            summary_ids = model.generate(
                inputs.input_ids,
                num_beams=config.num_beams,
                max_length=config.max_target_length,
                min_length=40,
                length_penalty=2.0,
                no_repeat_ngram_size=3,
                early_stopping=True
            )

        generated = tokenizer.decode(summary_ids[0], skip_special_tokens=True).strip()

        if generated and item['summary']:
            candidates.append(generated)
            references.append(item['summary'])

    # ROUGE
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    rouge_scores = defaultdict(list)

    for cand, ref in zip(candidates, references):
        scores = scorer.score(ref, cand)
        for key in scores:
            rouge_scores[key].append(scores[key].fmeasure)

    # BERTScore
    P, R, F1 = bert_score_calculator(candidates, references, lang='en', model_type='distilbert-base-uncased', device=config.device, verbose=False)

    return {
        'rouge1': rouge_scores['rouge1'],
        'rouge2': rouge_scores['rouge2'],
        'rougeL': rouge_scores['rougeL'],
        'bertscore_f1': F1.tolist()
    }

# ======================================================================================
# VISUALIZATION
# ======================================================================================

def plot_comparison(results: Dict, save_path: Path):
    fig, axes = plt.subplots(1, 3, figsize=(18, 6))
    metrics = ['rouge1', 'rouge2', 'rougeL']
    titles = ['ROUGE-1', 'ROUGE-2', 'ROUGE-L']
    colors = ['#3b528b', '#21918c', '#5ec962']

    for ax, metric, title, color in zip(axes, metrics, titles, colors):
        names = list(results.keys())
        means = [results[n][f'{metric}_mean'] for n in names]
        stds = [results[n][f'{metric}_std'] for n in names]

        bars = ax.bar(names, means, yerr=stds, color=color, alpha=0.85, edgecolor='black', capsize=5)
        ax.set_title(title, fontsize=14, fontweight='bold')
        ax.set_ylabel('Score (%)', fontsize=12)
        ax.grid(axis='y', alpha=0.3)
        plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

        for bar in bars:
            h = bar.get_height()
            ax.text(bar.get_x() + bar.get_width()/2, h + 1, f'{h:.1f}', ha='center', fontsize=11, weight='bold')

    plt.suptitle('HTAS-V3 vs Baseline: ROUGE Comparison', fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.savefig(save_path, dpi=150, bbox_inches='tight')
    plt.close()

# ======================================================================================
# MULTI-DOC DEMO
# ======================================================================================

def run_multidoc_demo(model, tokenizer, processor, config):
    print("\n" + "="*80)
    print("🎯 MULTI-DOCUMENT SUMMARIZATION DEMO")
    print("="*80)

    model.eval()

    for demo in DEMO_ARTICLES:
        print(f"\n📰 Topic: {demo['topic']}")
        print("-" * 80)

        print(f"\n📄 Input: {len(demo['articles'])} articles with {sum(len(a.split()) for a in demo['articles'])} total words")

        # Generate with HTAS
        guided_input = processor.create_guided_input(demo['articles'])
        inputs = tokenizer(guided_input, max_length=config.max_input_length, truncation=True, return_tensors='pt').to(config.device)

        with torch.no_grad():
            summary_ids = model.generate(
                inputs.input_ids,
                num_beams=6,
                max_length=200,
                min_length=80,
                length_penalty=2.0,
                no_repeat_ngram_size=3,
                early_stopping=True
            )

        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

        print(f"\n🤖 HTAS-V3 Generated Summary:")
        print(f"   {summary}")

        print(f"\n✅ Key Points (Reference):")
        for i, point in enumerate(demo['key_points'], 1):
            print(f"   {i}. {point}")

        print()

# ======================================================================================
# MAIN
# ======================================================================================

def main():
    config = KaggleConfig()
    set_seed(config.seed)

    output_dir = Path(config.output_dir)
    output_dir.mkdir(exist_ok=True, parents=True)

    print("="*80)
    print("🚀 HTAS")
    print("="*80)
    print(f"\nDevice: {config.device}")
    print(f"Training samples: {config.num_train_samples}")
    print(f"Test samples: {config.num_test_samples}")
    print(f"Model: {config.model_name}")
    print(f"FP16: {config.fp16}")

    # Load data
    print("\n📚 Loading CNN/DailyMail dataset...")

    def normalize(ex):
        if 'article' in ex and ex.get('article'):
            return {'texts': [ex['article']], 'summary': ex['highlights']}
        return {'texts': [], 'summary': ''}

    total = config.num_train_samples + config.num_test_samples
    dataset = load_dataset('cnn_dailymail', '3.0.0', split=f'train[:{total}]', trust_remote_code=True)
    dataset = dataset.map(normalize, remove_columns=dataset.column_names)
    dataset = dataset.filter(lambda x: x['texts'] and x['summary'])

    split = dataset.train_test_split(test_size=config.num_test_samples, seed=config.seed)
    train_data = split['train']
    test_data = split['test']

    print(f"✓ Loaded {len(train_data)} train, {len(test_data)} test samples")

    # Initialize
    tokenizer = AutoTokenizer.from_pretrained(config.model_name)
    processor = HTASProcessor(config, tokenizer)

    # Prepare training data
    print("\n🔧 Preparing HTAS-guided training data...")
    guided_texts = [processor.create_guided_input(item['texts']) for item in tqdm(train_data, desc="Processing")]
    train_summaries = [item['summary'] for item in train_data]
    train_dataset = SummarizationDataset(guided_texts, train_summaries, tokenizer, config)

    # Train
    print("\n🚂 Fine-tuning model...")
    model = AutoModelForSeq2SeqLM.from_pretrained(config.model_name)
    trained_model, losses = train_model(model, tokenizer, train_dataset, config)

    # Save model
    model_dir = output_dir / 'htas_model'
    model_dir.mkdir(exist_ok=True)
    trained_model.save_pretrained(model_dir)
    tokenizer.save_pretrained(model_dir)
    print(f"✓ Model saved to {model_dir}")

    # Multi-doc demo
    run_multidoc_demo(trained_model, tokenizer, processor, config)

    # Evaluate
    print("\n📊 Evaluation on test set...")
    htas_scores = evaluate_model("HTAS-V3 (Fine-Tuned)", trained_model, tokenizer, test_data, config, processor)

    print("\n📊 Baseline evaluation...")
    baseline_model = AutoModelForSeq2SeqLM.from_pretrained(config.model_name)
    baseline_scores = evaluate_model("BART-Base (Zero-Shot)", baseline_model, tokenizer, test_data, config, None)

    # Results
    results = {}
    for name, scores in [('HTAS-V3', htas_scores), ('Baseline', baseline_scores)]:
        results[name] = {
            'rouge1_mean': np.mean(scores['rouge1']) * 100,
            'rouge1_std': np.std(scores['rouge1']) * 100,
            'rouge2_mean': np.mean(scores['rouge2']) * 100,
            'rouge2_std': np.std(scores['rouge2']) * 100,
            'rougeL_mean': np.mean(scores['rougeL']) * 100,
            'rougeL_std': np.std(scores['rougeL']) * 100,
            'bertscore_mean': np.mean(scores['bertscore_f1']) * 100,
            'bertscore_std': np.std(scores['bertscore_f1']) * 100,
        }

    # Print table
    print("\n" + "="*80)
    print("📊 FINAL RESULTS")
    print("="*80)
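    # Pad the header with the same column widths as the data rows so the table lines up.
    print(f"\n| {'Model':<13} | {'ROUGE-1':<15} | {'ROUGE-2':<15} | {'ROUGE-L':<15} | {'BERTScore':<17} |")
    print(f"|{'-' * 15}|{'-' * 17}|{'-' * 17}|{'-' * 17}|{'-' * 19}|")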

    for name, data in results.items():
        r1 = f"{data['rouge1_mean']:.2f}±{data['rouge1_std']:.2f}"
        r2 = f"{data['rouge2_mean']:.2f}±{data['rouge2_std']:.2f}"
        rl = f"{data['rougeL_mean']:.2f}±{data['rougeL_std']:.2f}"
        bs = f"{data['bertscore_mean']:.2f}±{data['bertscore_std']:.2f}"
        print(f"| {name:<13} | {r1:<15} | {r2:<15} | {rl:<15} | {bs:<17} |")

    # Improvement
    improvement = {
        'rouge1': results['HTAS-V3']['rouge1_mean'] - results['Baseline']['rouge1_mean'],
        'rouge2': results['HTAS-V3']['rouge2_mean'] - results['Baseline']['rouge2_mean'],
        'rougeL': results['HTAS-V3']['rougeL_mean'] - results['Baseline']['rougeL_mean'],
        'bertscore': results['HTAS-V3']['bertscore_mean'] - results['Baseline']['bertscore_mean']
    }

    print(f"\n✨ IMPROVEMENT:")
    # The '+' format flag prints the correct sign even if a metric regresses.
    print(f"   ROUGE-1: {improvement['rouge1']:+.2f}%")
    print(f"   ROUGE-2: {improvement['rouge2']:+.2f}%")
    print(f"   ROUGE-L: {improvement['rougeL']:+.2f}%")
    print(f"   BERTScore: {improvement['bertscore']:+.2f}%")

    # Plot
    plot_comparison(results, output_dir / 'comparison.png')
    print(f"\n✓ Saved comparison plot to {output_dir / 'comparison.png'}")

    # Save results
    with open(output_dir / 'results.json', 'w') as f:
        json.dump({'results': results, 'improvement': improvement}, f, indent=2)

    print("\n" + "="*80)
    print("🎉 POC COMPLETE!")
    print("="*80)
    print(f"✓ Results saved to {output_dir}")
    print(f"✓ Model demonstrates {improvement['rougeL']:.2f}% ROUGE-L improvement over baseline")

    # Cleanup
    del trained_model, baseline_model
    gc.collect()
    torch.cuda.empty_cache()

if __name__ == "__main__":
    main()

🚀 HTAS

Device: cuda
Training samples: 300
Test samples: 100
Model: facebook/bart-base
FP16: True

📚 Loading CNN/DailyMail dataset...
✓ Loaded 300 train, 100 test samples

🔧 Preparing HTAS-guided training data...


🚂 Fine-tuning model...


🚂 Training for 74 steps...


✓ Model saved to /kaggle/working/htas_output/htas_model

🎯 MULTI-DOCUMENT SUMMARIZATION DEMO

📰 Topic: Climate Change Summit 2025
--------------------------------------------------------------------------------

📄 Input: 3 articles with 216 total words

🤖 HTAS-V3 Generated Summary:
   World leaders gathered in Geneva for the 2025 Climate Summit, marking a crucial moment in global climate policy.
World leaders pledged to reduce carbon emissions by 50% before 2030, a surprising acceleration of previous targets.
The summit highlighted renewable energy breakthroughs, with solar costs dropping 80% since 2020. The United States pledged to achieve 100% clean electricity by 2032. The summit's final agreement includes binding emissions targets and transparency mechanisms.

✅ Key Points (Reference):
   1. 150+ countries pledged 50% emissions cuts by 2030
   2. EU commits €300B, China phases out coal by 2035, US targets clean electricity by 2032
   3. New binding agreement with carbon pricing and


📊 Baseline evaluation...

📊 Evaluating BART-Base (Zero-Shot)...



📊 FINAL RESULTS

| Model         | ROUGE-1         | ROUGE-2         | ROUGE-L         | BERTScore         |
|---------------|-----------------|-----------------|-----------------|-------------------|
| HTAS-V3       | 31.56±11.63     | 11.26±8.96      | 21.17±8.68      | 79.18±3.55        |
| Baseline      | 28.73±8.07      | 11.24±6.82      | 17.94±6.68      | 78.66±3.02        |

✨ IMPROVEMENT:
   ROUGE-1: +2.82%
   ROUGE-2: +0.02%
   ROUGE-L: +3.23%
   BERTScore: +0.51%
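
The figures above are differences between mean scores that were already scaled to 0-100, so they are percentage points, not relative gains. As a rough sketch (the helper `improvement_summary` is hypothetical, and the inputs are the rounded means from the results table, so the last digit can differ from the log), both views can be computed like this:

```python
def improvement_summary(model_mean, baseline_mean):
    """Return (absolute difference in points, relative change in percent)."""
    absolute = model_mean - baseline_mean
    relative = 100.0 * absolute / baseline_mean
    return absolute, relative

# Rounded means copied from the results table: (HTAS-V3, Baseline).
table_means = {
    "ROUGE-1": (31.56, 28.73),
    "ROUGE-2": (11.26, 11.24),
    "ROUGE-L": (21.17, 17.94),
    "BERTScore": (79.18, 78.66),
}

for metric, (htas, base) in table_means.items():
    absolute, relative = improvement_summary(htas, base)
    # The '+' format flag prints an explicit sign for gains and losses alike.
    print(f"{metric:<10} {absolute:+.2f} points ({relative:+.1f}% relative)")
```

Reporting both numbers makes it harder to misread a gain of a few points on a small ROUGE-L base as a gain of a few percent.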

✓ Saved comparison plot to /kaggle/working/htas_output/comparison.png

🎉 POC COMPLETE!
✓ Results saved to /kaggle/working/htas_output
✓ Model demonstrates 3.23% ROUGE-L improvement over baseline
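
The 74 optimizer steps reported in the training log follow from the configuration, not from the `max_steps` cap. Below is a small sketch of that arithmetic, assuming the 300 training samples shown in the log and mirroring the formula used in `train_model` (the function name `planned_steps` is made up for illustration):

```python
import math

def planned_steps(num_samples, batch_size, grad_accum, epochs, max_steps):
    """Optimizer steps scheduled by min(max_steps, batches // grad_accum * epochs)."""
    # A DataLoader with default settings keeps the last partial batch.
    batches_per_epoch = math.ceil(num_samples / batch_size)
    # Integer division: a trailing odd batch triggers no optimizer step of its own.
    steps_per_epoch = batches_per_epoch // grad_accum
    return min(max_steps, steps_per_epoch * epochs)

# Values taken from KaggleConfig and the training log.
steps = planned_steps(num_samples=300, batch_size=4, grad_accum=2, epochs=2, max_steps=300)
warmup = int(0.1 * steps)  # warmup_ratio = 0.1
print(steps, warmup)
```

With these settings the `max_steps = 300` cap never binds; more epochs or more training samples would be needed to reach it.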
