# NLP Final Project: Dataset Cartography for Artifact Mitigation
## Fast GPU Training in Google Colab

This notebook runs the complete training pipeline on a Colab GPU: it trains a baseline model and a cartography-mitigated model on SQuAD, then compares their results.

## 1. Setup Environment

In [20]:
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

CUDA available: True
GPU: Tesla T4
Memory: 15.8 GB


In [21]:
# Install required packages
!pip install datasets transformers torch evaluate matplotlib seaborn scipy



In [22]:
# Clone repository
!git clone https://github.com/agsilver108/nlp-fa25-final-project.git
%cd nlp-fa25-final-project

Cloning into 'nlp-fa25-final-project'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 48 (delta 7), reused 47 (delta 6), pack-reused 0 (from 0)
Receiving objects: 100% (48/48), 78.74 KiB | 3.94 MiB/s, done.
Resolving deltas: 100% (7/7), done.
/content/nlp-fa25-final-project/nlp-fa25-final-project
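
The final path in the output above, `/content/nlp-fa25-final-project/nlp-fa25-final-project`, shows the repository nested inside itself, which is what happens when the clone cell is re-run from inside an existing checkout. The following is a minimal sketch of an idempotent alternative. The function `ensure_repo` and its `base_dir` and `run` parameters are hypothetical names introduced here for illustration; they are not part of the project.

```python
import os
import subprocess

REPO_URL = "https://github.com/agsilver108/nlp-fa25-final-project.git"


def ensure_repo(url, base_dir="/content", run=subprocess.run):
    """Clone `url` under `base_dir` unless a checkout already exists there.

    Hypothetical helper: returns the checkout path either way, so the
    caller can change into it without creating a nested clone.
    """
    # Derive the directory name from the URL, e.g. ".../foo.git" -> "foo".
    name = url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    dest = os.path.join(base_dir, name)
    # A ".git" directory is treated as evidence of an existing checkout.
    if not os.path.isdir(os.path.join(dest, ".git")):
        run(["git", "clone", url, dest], check=True)
    return dest


# Usage in a notebook cell (not executed here):
#   os.chdir(ensure_repo(REPO_URL))
```

Because the destination is always an absolute path under `base_dir`, re-running the cell leaves the working directory at the same checkout each time.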


## 2. Run Fast Training

In [None]:
# Pull latest changes and run the FIXED training pipeline
print("🔧 Pulling latest fixes from repository...")
!git pull origin main

print("🚀 Starting training with FIXED evaluation configuration...")
print("⏱️  Expected training time: 15-25 minutes (3 epochs with evaluation)")
print("📊 Fixed metric_for_best_model error - should now show proper EM and F1 scores!")
print("🔧 Using external colab_training.py (not the backup script)")

# Execute the external training script with all fixes
# (the context manager closes the file once its source has been read)
with open('colab_training.py') as f:
    exec(f.read())

🚀 Starting Colab GPU Training...
Device: cuda
GPU: Tesla T4
Memory: 15.8 GB
📦 Loading model and tokenizer...
📊 Loading SQuAD dataset...
Training samples: 10000
Evaluation samples: 1000
🔄 Preprocessing datasets...
Preprocessing completed in 0.0s


TypeError: TrainingArguments.__init__() got an unexpected keyword argument 'evaluation_strategy'
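
The `TypeError` above is what recent `transformers` releases raise when `TrainingArguments` is given `evaluation_strategy`: the argument was renamed to `eval_strategy`, and the old name was later removed. The contents of `colab_training.py` are not shown here, so the following is only a sketch of one way to stay compatible with both spellings. The helper name `eval_strategy_kwargs` is hypothetical.

```python
import inspect


def eval_strategy_kwargs(args_cls, strategy="epoch"):
    """Return {keyword: strategy} using whichever name `args_cls` accepts.

    Hypothetical helper: prefers the newer `eval_strategy` spelling and
    falls back to the older `evaluation_strategy` one.
    """
    params = inspect.signature(args_cls.__init__).parameters
    for name in ("eval_strategy", "evaluation_strategy"):
        if name in params:
            return {name: strategy}
    raise TypeError(f"{args_cls.__name__} accepts no evaluation-strategy argument")


# Usage with the real class (not executed here; requires transformers):
#   from transformers import TrainingArguments
#   args = TrainingArguments(
#       output_dir="baseline_model",
#       **eval_strategy_kwargs(TrainingArguments, "epoch"),
#   )
```

If the installed version is known, the simpler fix is to rename the keyword to `eval_strategy` directly in the script.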

### Training Setup

When the training cell above runs to completion, it:
1. Pulls the latest fixes from GitHub
2. Executes the optimized `colab_training.py` script
3. Trains both the baseline and the cartography-mitigated models
4. Generates comparison results with Exact Match (EM) and F1 metrics

**Expected Results:**
- Baseline F1: 60-80%
- Training time: 15-25 minutes on a Colab GPU (3 epochs with evaluation)
- A baseline-versus-cartography comparison showing how effective the artifact mitigation is
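
Dataset cartography (Swayamdipta et al., 2020) describes each training example by its training dynamics: *confidence* is the mean probability the model assigns to the gold label across epochs, *variability* is the standard deviation of that probability, and *correctness* is the fraction of epochs in which the prediction was right. The sketch below computes these three statistics for a single example. How `colab_training.py` derives the per-epoch probabilities for question answering, and how it uses the statistics for mitigation, is not shown in this notebook, so the function name and input format are assumptions.

```python
from statistics import mean, pstdev


def cartography_stats(gold_probs, correct_flags):
    """Training-dynamics statistics for one training example.

    gold_probs:    probability of the gold label at each epoch (assumed input)
    correct_flags: whether the prediction was correct at each epoch
    """
    return {
        # Mean gold-label probability across epochs.
        "confidence": mean(gold_probs),
        # Population standard deviation of that probability across epochs.
        "variability": pstdev(gold_probs),
        # Fraction of epochs with a correct prediction.
        "correctness": sum(correct_flags) / len(correct_flags),
    }


stats = cartography_stats([0.2, 0.4, 0.6], [False, False, True])
print(stats)
```

High-confidence, low-variability examples are the "easy-to-learn" region of the map, which is where dataset artifacts tend to concentrate.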

## 3. View Results

In [None]:
# Load and display results with better error handling
import json
import os

results_file = '/content/colab_training_results.json'

if os.path.exists(results_file):
    with open(results_file, 'r') as f:
        results = json.load(f)

    print("🎯 Training Results Summary:")
    print("\nBaseline Model:")
    print(f"  Exact Match: {results['baseline']['exact_match']:.3f}")
    print(f"  F1 Score: {results['baseline']['f1']:.3f}")
    print(f"  Training Time: {results['baseline']['training_time']:.1f}s")

    print("\nCartography Model:")
    print(f"  Exact Match: {results['cartography']['exact_match']:.3f}")
    print(f"  F1 Score: {results['cartography']['f1']:.3f}")
    print(f"  Training Time: {results['cartography']['training_time']:.1f}s")

    print("\nImprovement:")
    print(f"  EM Diff: {results['improvement']['em_diff']:+.3f}")
    print(f"  F1 Diff: {results['improvement']['f1_diff']:+.3f}")

    # Quality assessment (thresholds assume F1 is stored as a fraction in [0, 1])
    baseline_f1 = results['baseline']['f1']
    if baseline_f1 > 0.7:
        print("\n✅ EXCELLENT: Baseline F1 > 70% indicates good training!")
    elif baseline_f1 > 0.5:
        print("\n✔️  GOOD: Baseline F1 > 50% indicates decent training.")
    elif baseline_f1 > 0.2:
        print("\n⚠️  OKAY: Baseline F1 > 20% indicates some learning occurred.")
    elif baseline_f1 > 0:
        print("\n❌ POOR: Baseline F1 very low, model barely learning.")
    else:
        print("\n💀 BROKEN: F1 = 0 indicates evaluation is not working!")

else:
    print("❌ Results file not found!")
    print("This could mean:")
    print("1. Training hasn't completed yet")
    print("2. Training failed with an error")
    print("3. The training script couldn't save results")
    print("\nCheck the training output above for error messages.")
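
For reference, the Exact Match and F1 figures printed above are the standard SQuAD answer metrics. The sketch below follows the usual SQuAD recipe (lowercase, strip punctuation and articles, then compare tokens) for a single prediction and reference. It is an illustration, not the project's evaluation code, and it returns fractions in [0, 1]. The `squad` metric in the `evaluate` library reports percentages on a 0-100 scale, so the thresholds in the cell above only apply if the results file stores fractions.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower!"))
print(f1_score("in Paris France", "Paris"))
```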

## 4. Download Results

In [None]:
# Download trained models and results
from google.colab import files

# Zip results for download
!zip -r colab_results.zip /content/baseline_model /content/cartography_model /content/colab_training_results.json
files.download('colab_results.zip')
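
If training stopped early, some of the paths passed to `zip` above may not exist. As a sketch of a more defensive variant, the hypothetical helper below (`zip_existing` is not part of the project) archives only the paths that are present and reports the ones it skipped, using the standard-library `zipfile` module.

```python
import os
import zipfile


def zip_existing(paths, archive_path):
    """Write the existing files and directories in `paths` to a zip archive.

    Hypothetical helper: returns the list of paths that were missing.
    """
    missing = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            if os.path.isfile(path):
                zf.write(path, arcname=os.path.basename(path))
            elif os.path.isdir(path):
                # Store entries relative to the directory's parent so the
                # top-level folder name is preserved inside the archive.
                parent = os.path.dirname(os.path.abspath(path))
                for root, _dirs, filenames in os.walk(path):
                    for filename in filenames:
                        full = os.path.join(root, filename)
                        zf.write(full, arcname=os.path.relpath(full, parent))
            else:
                missing.append(path)
    return missing


# Usage in Colab (not executed here):
#   skipped = zip_existing(
#       ["/content/baseline_model", "/content/cartography_model",
#        "/content/colab_training_results.json"],
#       "colab_results.zip",
#   )
#   print("Skipped missing paths:", skipped)
```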