## ⚠️ IMPORTANT: Kaggle Session Management

**Critical:** `/kaggle/working/` is temporary and **gets wiped** when:
- Notebook session expires (idle timeout)
- You close/restart the notebook
- Kernel crashes or restarts

**To avoid losing work:**
1. ✅ Download files immediately after each major step
2. ✅ Keep notebook active (don't close browser tab)
3. ✅ If training overnight, check every few hours
4. ✅ After training completes, **download model FIRST** before doing anything else

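**Recovery strategy:** Each step writes its output to `/kaggle/working/`. If the session dies, you can resume from the last completed step, provided that step's output survived (you downloaded it, or it was saved through a committed version).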
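
To decide where to resume, one option is to check which step outputs are present in the working directory. The sketch below is not part of the repository; the helper name `first_incomplete_step` is hypothetical, and the paths are the ones this notebook uses in later steps. After a wiped session you would first need to restore the downloaded files to the same locations.

```python
from pathlib import Path

# Outputs named elsewhere in this notebook, in pipeline order.
# The step labels are descriptive only.
STEP_OUTPUTS = [
    ("Step 4: improve labels", "train_improved.jsonl"),
    ("Step 5: train model", "outputs/final_model"),
    ("Step 6: evaluate", "outputs/evaluation_results.json"),
]


def first_incomplete_step(working_dir="/kaggle/working"):
    """Return the first step whose output is missing, or None if all exist."""
    root = Path(working_dir)
    for step, relative_path in STEP_OUTPUTS:
        if not (root / relative_path).exists():
            return step
    return None


print("Resume from:", first_incomplete_step() or "nothing, all outputs present")
```

The check only tests for existence, so a partially written file would still count as complete.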

## ⚠️ BEFORE YOU START - Critical Setup

**To prevent losing your trained model:**

1. **Use "Save & Run All (Commit)"** in Kaggle
   - This saves outputs even if the interactive session expires
   - Go to: File → Save Version → Save & Run All

2. **Alternative: Download immediately after training**
   - Don't wait until morning
   - Run the "Save Model Immediately" cell right after training completes

**Kaggle sessions expire**, and the working directory `/kaggle/working/` is wiped when they do. Plan accordingly.

# Game Bug Classifier - Step by Step Pipeline

**Setup Required:**
- ✅ Kaggle GPU enabled (2x T4)
- ✅ Internet enabled
- ✅ Input dataset: `data-splits` (train.jsonl, val.jsonl, test.jsonl)
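
Before starting a multi-hour run, it may be worth confirming that the GPUs are actually visible to the session. The following is a minimal sketch that shells out to `nvidia-smi`; the helper name `list_gpus` is hypothetical, and it returns an empty list when the tool is unavailable rather than raising.

```python
import shutil
import subprocess


def list_gpus():
    """Return GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(f"Found {len(gpus)} GPU(s): {gpus}")
```

With the 2x T4 accelerator selected, the list should contain two entries; an empty list suggests the accelerator setting was not applied.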

**Timeline:**
- Setup: 2 min
- Label improvement: ~3 hours
- Training: ~2 hours
- Evaluation: ~5 min
- **Total: ~5 hours**

## Step 1: Clone Repository from GitHub

In [None]:
# Clone the repository
!git clone https://github.com/FarazENEU/game-bug-classifier.git
!ls -la game-bug-classifier/

In [None]:
# Navigate to repo and check structure
%cd game-bug-classifier
!pwd
!ls scripts/

## Step 2: Install Dependencies

Install all required packages from `requirements-kaggle.txt`.

In [None]:
# Install dependencies from requirements-kaggle.txt
!pip install -q -r requirements-kaggle.txt

## Step 3: Verify Data Access

In [None]:
# Check input data is accessible
!ls -lh /kaggle/input/data-splits/

# Check working directory is writable
!ls -lh /kaggle/working/
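
The directory listing shows that the files exist but not how many examples each contains. As a rough complement, the sketch below counts non-empty lines per split. The helper name `count_split_lines` is hypothetical; the file names and input path are the ones listed under Setup Required.

```python
from pathlib import Path

EXPECTED_SPLITS = ("train.jsonl", "val.jsonl", "test.jsonl")


def count_split_lines(data_dir="/kaggle/input/data-splits"):
    """Map each expected split file to its non-empty line count (None if missing)."""
    counts = {}
    for name in EXPECTED_SPLITS:
        path = Path(data_dir) / name
        if not path.is_file():
            counts[name] = None
            continue
        with path.open(encoding="utf-8") as f:
            counts[name] = sum(1 for line in f if line.strip())
    return counts


for name, count in count_split_lines().items():
    status = "MISSING" if count is None else f"{count} examples"
    print(f"{name}: {status}")
```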

## Step 4: Improve Labels with Mistral Base Model

**Runtime:** ~3 hours for 1399 examples

This uses the instruction-tuned Mistral-7B model, before any fine-tuning, to generate better labels than our keyword-based heuristics.

In [None]:
# Run label improvement script
!python scripts/improve_labels.py

In [None]:
# Verify improved labels were created
!ls -lh /kaggle/working/train_improved.jsonl
!wc -l /kaggle/working/train_improved.jsonl
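
A line count alone does not show whether every line is well-formed. As a sketch, assuming each line of `train_improved.jsonl` is meant to be a single JSON object, the hypothetical helper `validate_jsonl` below reports how many records parse and which line numbers do not.

```python
import json
import os


def validate_jsonl(path):
    """Return (valid_count, bad_line_numbers) for a JSON Lines file."""
    valid, bad = 0, []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                bad.append(line_number)
                continue
            if isinstance(record, dict):
                valid += 1
            else:
                bad.append(line_number)  # parsed, but not a JSON object
    return valid, bad


improved_path = "/kaggle/working/train_improved.jsonl"
if os.path.exists(improved_path):
    valid, bad = validate_jsonl(improved_path)
    print(f"{valid} valid records, {len(bad)} bad lines: {bad[:10]}")
```

If the valid count is well below the 1399 examples mentioned above, the label improvement run may have been interrupted.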

### ✅ Checkpoint: Labels Improved

**Safe to proceed!** If the session dies now, you can skip Step 4 and start from Step 5 (training), as long as you have saved a copy of `train_improved.jsonl` outside the session.

## Step 5: Train Model with Improved Labels

**Runtime:** ~2 hours (264 steps, 3 epochs)

Fine-tune Mistral-7B with QLoRA (4-bit quantization + LoRA adapters).
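
The step count can be sanity-checked with a little arithmetic. The sketch below assumes the training set is the 1399 examples from Step 4 and that the effective batch size is 16; the batch size is a guess that happens to reproduce 264 steps, not a value confirmed by `scripts/train.py`, and the exact rounding depends on the trainer.

```python
import math

num_examples = 1399        # from Step 4 (assumed to be the training set size)
num_epochs = 3             # from the runtime note above
effective_batch_size = 16  # ASSUMPTION: per-device batch x devices x grad accumulation

steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 88 264
```

At roughly 2 hours for 264 steps, that works out to about 27 seconds per optimizer step.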

In [None]:
# Run training script (uses improved labels by default)
!python scripts/train.py

In [None]:
# Verify model was saved
!ls -lh /kaggle/working/outputs/final_model/
!du -sh /kaggle/working/outputs/final_model/

In [None]:
# IMPORTANT: Zip the model immediately after training
!cd /kaggle/working && zip -r bug_classifier_v2.zip outputs/final_model/

# Also keep a copy of the improved training data next to the model outputs
!cp /kaggle/working/train_improved.jsonl /kaggle/working/outputs/

print("\n✅ Model and data saved!")
print("\n⚡ DOWNLOAD THESE NOW before the session expires:")
print("- /kaggle/working/bug_classifier_v2.zip")
print("- /kaggle/working/train_improved.jsonl")
print("\nKaggle Output panel → Click download icon")

## ⚠️ CRITICAL: Save Model Immediately

**Do this RIGHT AFTER training completes** to avoid losing the model if session expires!

## 🚨 CRITICAL: Download Model NOW!

**Training complete!** Your model is ready but **WILL BE LOST** if session dies.

**DO THIS IMMEDIATELY:**
1. Run the next cell to zip the model (it refreshes the archive if one already exists)
2. Right-click the zip file in Kaggle's file browser → Download
3. **Don't proceed** until download completes
4. Verify download finished before continuing

**If you skip this:** You'll have to retrain (2+ hours lost)!

In [None]:
# Zip model immediately after training
!cd /kaggle/working && zip -r bug_classifier_v2.zip outputs/final_model/

print("\n" + "="*70)
print("🔥 MODEL ZIPPED - DOWNLOAD NOW!")
print("="*70)
print("\n📥 In Kaggle's file browser (right sidebar):")
print("   1. Navigate to /kaggle/working/")
print("   2. Find bug_classifier_v2.zip")
print("   3. Right-click → Download")
print("   4. ⚠️  WAIT for download to complete before proceeding!")
print("\n" + "="*70)

!ls -lh /kaggle/working/bug_classifier_v2.zip
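
To check that the archive is intact and that the downloaded copy matches it, one possible approach is to test the zip's internal checksums and record a SHA-256 digest. The helper name `check_archive` is hypothetical; it uses only the standard library.

```python
import hashlib
import os
import zipfile


def check_archive(zip_path):
    """Return (entry_count, sha256_hex) for a zip, raising if it is corrupt."""
    with zipfile.ZipFile(zip_path) as archive:
        corrupt_member = archive.testzip()  # None means every CRC matched
        if corrupt_member is not None:
            raise ValueError(f"Corrupt member in archive: {corrupt_member}")
        entry_count = len(archive.namelist())  # includes directory entries

    digest = hashlib.sha256()
    with open(zip_path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return entry_count, digest.hexdigest()


zip_path = "/kaggle/working/bug_classifier_v2.zip"
if os.path.exists(zip_path):
    entries, sha256 = check_archive(zip_path)
    print(f"{entries} entries, sha256={sha256}")
```

Running the same function on the downloaded file locally and comparing the two digests confirms the download completed without truncation.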

## Step 6: Evaluate Model on Test Set

**Runtime:** ~5 minutes for 50 examples

Test the fine-tuned model and calculate accuracy metrics.
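
As an illustration of what per-field accuracy means here, the sketch below compares predicted and reference labels for the three fields reported later in this step. It is not the logic of `scripts/evaluate.py`, which is not shown in this notebook; the function name, the case-insensitive exact-match rule, and the example label values are all assumptions.

```python
FIELDS = ("severity", "component", "reproducibility")


def field_accuracies(predictions, references):
    """Percentage of exact (case-insensitive) matches per field."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    accuracies = {}
    for field in FIELDS:
        correct = sum(
            str(pred.get(field, "")).strip().lower() == str(ref[field]).strip().lower()
            for pred, ref in zip(predictions, references)
        )
        accuracies[f"{field}_accuracy"] = 100.0 * correct / len(references)
    return accuracies


# Hypothetical labels, for illustration only.
example_predictions = [
    {"severity": "High", "component": "UI", "reproducibility": "always"},
    {"severity": "low", "component": "audio", "reproducibility": "sometimes"},
]
example_references = [
    {"severity": "high", "component": "ui", "reproducibility": "always"},
    {"severity": "medium", "component": "audio", "reproducibility": "rarely"},
]
print(field_accuracies(example_predictions, example_references))
```

A prediction that omits a field counts as wrong for that field, which keeps malformed model output from inflating the score.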

In [None]:
# Run evaluation script
!python scripts/evaluate.py \
    --model_path /kaggle/working/outputs/final_model \
    --base_model mistralai/Mistral-7B-Instruct-v0.2 \
    --test_path /kaggle/input/data-splits/test.jsonl \
    --num_samples 50

In [None]:
# Display results
import json

with open('/kaggle/working/outputs/evaluation_results.json', 'r') as f:
    results = json.load(f)

print("="*70)
print("📊 FINAL EVALUATION RESULTS")
print("="*70)
print(f"\nSeverity Accuracy:       {results['metrics']['severity_accuracy']:.2f}%")
print(f"Component Accuracy:      {results['metrics']['component_accuracy']:.2f}%")
print(f"Reproducibility Accuracy: {results['metrics']['reproducibility_accuracy']:.2f}%")
print(f"\n{'='*70}")
print(f"Overall Average:         {results['metrics']['overall_average']:.2f}%")
print(f"{'='*70}")
print(f"\nTested on {results['num_samples']} examples")

## Step 7: Download Trained Model

Confirm that the zipped model, evaluation results, and improved labels exist, then download them for local use.

In [None]:
# Verify all files are ready
import os

files_to_download = [
    "/kaggle/working/bug_classifier_v2.zip",
    "/kaggle/working/outputs/evaluation_results.json",
    "/kaggle/working/train_improved.jsonl"
]

print("="*70)
print("📦 FILES READY FOR DOWNLOAD")
print("="*70)

for filepath in files_to_download:
    if os.path.exists(filepath):
        size = os.path.getsize(filepath) / (1024*1024)  # MB
        print(f"✅ {filepath} ({size:.1f} MB)")
    else:
        print(f"❌ {filepath} - MISSING!")

print("\n" + "="*70)
print("📥 Download from Kaggle's file browser (right sidebar)")
print("="*70)

## Summary

**What we accomplished:**
1. ✅ Improved labels using Mistral base model (bootstrapping)
2. ✅ Trained Mistral-7B with QLoRA on improved labels
3. ✅ Evaluated model on held-out test set
4. ✅ Model ready for download and deployment

**Expected improvements over v1:**
- Severity: 41% → 60-70%+
- Component: 62% → 70-75%+
- Reproducibility: 88% → 90%+
- **Overall: 64% → 75%+**
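
The overall figures can be cross-checked against the per-field numbers. The sketch below assumes "overall" is the unweighted mean of the three accuracies, which matches the v1 figure of 64% but is not confirmed by the evaluation script. Since the reproducibility target has no stated upper bound, 90 is used for both ends of the range.

```python
from statistics import mean

# Per-field accuracies in percent, copied from the list above.
v1 = {"severity": 41, "component": 62, "reproducibility": 88}
v2_lower = {"severity": 60, "component": 70, "reproducibility": 90}
v2_upper = {"severity": 70, "component": 75, "reproducibility": 90}

# ASSUMPTION: "overall" is the unweighted mean of the three fields.
v1_overall = mean(list(v1.values()))
v2_overall_low = mean(list(v2_lower.values()))
v2_overall_high = mean(list(v2_upper.values()))

print(f"v1 overall: {v1_overall:.1f}%")  # v1 overall: 63.7%
print(f"v2 overall range: {v2_overall_low:.1f}% to {v2_overall_high:.1f}%")  # 73.3% to 78.3%
```

Under that assumption, the lower bounds alone average 73.3%, so reaching 75% overall requires at least one field to beat its lower bound.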

**Next steps:**
1. Download trained model
2. Build Flask API demo
3. Write business case analysis
4. Record video walkthrough
5. Write technical report
6. Submit assignment