# OrcaSword v3 - Kaggle Training & Submission

A complete pipeline for training OrcaSword v3 and generating `submission.json` for ARC Prize 2025.

**Quick Start**: Just run all cells!

In [None]:
# Verify GPU availability
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

In [None]:
# Verify data is available
import os
from pathlib import Path

data_dir = Path("/kaggle/input/arc-prize-2025")
files = [
    "arc-agi_training_challenges.json",
    "arc-agi_training_solutions.json",
    "arc-agi_evaluation_challenges.json",
    "arc-agi_evaluation_solutions.json",
    "arc-agi_test_challenges.json"
]

print("Checking ARC Prize 2025 data...")
for f in files:
    path = data_dir / f
    if path.exists():
        size = path.stat().st_size / 1024
        print(f"  ✓ {f} ({size:.1f} KB)")
    else:
        print(f"  ✗ {f} MISSING")
        print("\n⚠️ Make sure to add 'ARC Prize 2025' dataset in Kaggle!")

In [None]:
# Upload train_and_submit.py to Kaggle or copy-paste the code here
# For now, we'll use the simplified inline version

# Option 1: If you uploaded the file
# !python train_and_submit.py --train --epochs 10 --submit

# Option 2: Run training inline (see next cells)

## Training Configuration

In [None]:
# Configuration
CONFIG = {
    'data_dir': '/kaggle/input/arc-prize-2025',
    'epochs': 10,
    'learning_rate': 1e-4,
    'embed_dim': 128,
    'num_layers': 4,
    'checkpoint_path': 'orcasword_v3.pt',
    'output_path': 'submission.json'
}

print("Training Configuration:")
for key, value in CONFIG.items():
    print(f"  {key}: {value}")

## Method 1: Use train_and_submit.py (Recommended)

In [None]:
# If you've uploaded train_and_submit.py to Kaggle:
!python train_and_submit.py \
    --data_dir {CONFIG['data_dir']} \
    --train \
    --epochs {CONFIG['epochs']} \
    --lr {CONFIG['learning_rate']} \
    --embed_dim {CONFIG['embed_dim']} \
    --num_layers {CONFIG['num_layers']} \
    --checkpoint {CONFIG['checkpoint_path']} \
    --submit \
    --output {CONFIG['output_path']}

## Method 2: Inline Training Code (Alternative)

If you don't want to upload the script, copy the code from `train_and_submit.py` into the cells below.
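
Before pasting the training code inline, it can help to look at what a task actually contains. The sketch below assumes the usual ARC challenge layout: a dict mapping each task id to a task with `train` pairs (`input` and `output` grids) and `test` entries (`input` only). `load_tasks` and `summarize_task` are hypothetical helper names, not functions from `train_and_submit.py`.

```python
import json


def load_tasks(path):
    """Load an ARC challenges file (assumed: dict of task id -> task)."""
    with open(path, "r") as f:
        return json.load(f)


def summarize_task(task):
    """Return (number of train pairs, number of test inputs, train input shapes)."""
    shapes = [(len(pair["input"]), len(pair["input"][0])) for pair in task["train"]]
    return len(task["train"]), len(task["test"]), shapes


# Tiny in-memory task with the assumed structure, so this cell runs anywhere.
demo_task = {
    "train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[0, 0], [1, 1]]}],
}
print(summarize_task(demo_task))

# On Kaggle, using the path from the data check earlier in this notebook:
# tasks = load_tasks("/kaggle/input/arc-prize-2025/arc-agi_training_challenges.json")
# task_id, task = next(iter(tasks.items()))
# print(task_id, summarize_task(task))
```

Grids are plain nested lists, so no extra dependencies are needed just to inspect them.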

## Validate Submission

In [None]:
import json

# Load and validate submission
with open(CONFIG['output_path'], 'r') as f:
    submission = json.load(f)

print(f"Submission Statistics:")
print(f"  Total tasks: {len(submission)}")
print(f"  First task ID: {submission[0]['task_id']}")
print(f"  Sample output shape: {len(submission[0]['attempt_1'])}x{len(submission[0]['attempt_1'][0])}")

# Validate format
required_keys = ['task_id', 'attempt_1', 'attempt_2']
all_valid = all(all(k in task for k in required_keys) for task in submission)

if all_valid:
    print("\n✓ Submission format is VALID!")
    print(f"✓ Ready to submit: {CONFIG['output_path']}")
else:
    print("\n✗ Submission format is INVALID - check required keys")

## Download Submission

The submission.json file is now in `/kaggle/working/` and ready to download!
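
Before downloading, it may be worth checking the file against the competition's own format. The validation cell in this notebook assumes a list of entries with a `task_id` key. The Kaggle ARC Prize competitions, however, describe `submission.json` as a dict keyed by task id, where each value is a list with one `{"attempt_1": grid, "attempt_2": grid}` entry per test input. Re-check this on the competition page before relying on it. The sketch below validates that dict layout; `is_grid` and `validate_submission` are hypothetical helpers, and the 0-9 cell range is an assumption based on the ARC colour set.

```python
def is_grid(value):
    """True if value is a non-empty rectangular list of lists of ints in 0-9."""
    if not isinstance(value, list) or not value:
        return False
    if not all(isinstance(row, list) and row for row in value):
        return False
    width = len(value[0])
    return all(
        len(row) == width and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in value
    )


def validate_submission(submission, challenges=None):
    """Return a list of problems; an empty list means the structure looks right."""
    if not isinstance(submission, dict):
        return ["top level should be a dict keyed by task id"]
    problems = []
    for task_id, outputs in submission.items():
        if not isinstance(outputs, list) or not outputs:
            problems.append(f"{task_id}: expected a non-empty list of predictions")
            continue
        for i, entry in enumerate(outputs):
            for key in ("attempt_1", "attempt_2"):
                if not isinstance(entry, dict) or key not in entry:
                    problems.append(f"{task_id}[{i}]: missing {key}")
                elif not is_grid(entry[key]):
                    problems.append(f"{task_id}[{i}].{key}: not a rectangular grid of 0-9")
        # Optional cross-check against the challenges file, if it was loaded.
        if challenges is not None and task_id in challenges:
            expected = len(challenges[task_id]["test"])
            if len(outputs) != expected:
                problems.append(f"{task_id}: {len(outputs)} predictions, {expected} test inputs")
    if challenges is not None:
        problems += [f"{t}: missing from submission" for t in challenges if t not in submission]
    return problems


good = {"abc123": [{"attempt_1": [[0, 1]], "attempt_2": [[1, 0]]}]}
bad = {"abc123": [{"attempt_1": [[0, 1], [2]]}]}  # ragged grid, no attempt_2
print(validate_submission(good))
print(validate_submission(bad))
```

If `train_and_submit.py` writes the list layout instead, this check will flag the top level immediately, which is a useful signal that the file needs converting before upload.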

In [None]:
# Check file size
import os
size = os.path.getsize(CONFIG['output_path']) / 1024
print(f"Submission file size: {size:.2f} KB")

# In Kaggle, this file will appear in the output section
print(f"\n✓ Download from: /kaggle/working/{CONFIG['output_path']}")

## Expected Performance

With default settings (10 epochs):
- **Training time**: ~30-45 minutes on GPU
- **Submission time**: ~2-5 minutes
- **Total**: <1 hour
- **Expected accuracy**: 20-30% pixel-level (the leaderboard scores exact grid matches, so expect a lower competition score)
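
The notebook does not define "pixel-level" accuracy, so the sketch below shows one plausible reading: the fraction of cells that match the target, counted as zero when the predicted grid has the wrong shape. It is contrasted with exact match, which is how ARC scores a prediction. Both function names are hypothetical.

```python
def pixel_accuracy(pred, target):
    """Fraction of matching cells; 0.0 when the grid shapes differ."""
    if len(pred) != len(target):
        return 0.0
    if any(len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)):
        return 0.0
    total = sum(len(row) for row in target)
    if total == 0:
        return 0.0
    matches = sum(
        p == t
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    return matches / total


def exact_match(pred, target):
    """True only if every cell and the overall shape agree."""
    return pred == target


target = [[1, 0], [0, 1]]
pred = [[1, 0], [0, 0]]  # one wrong cell out of four
print(pixel_accuracy(pred, target))
print(exact_match(pred, target))
```

A grid can score well per pixel and still count as wrong under exact match, which is why the two numbers can diverge sharply.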

To improve accuracy:
- Increase epochs (e.g., 20-50)
- Increase model size (embed_dim=256, num_layers=8)
- Use the full OrcaSword v3 model (`orcaswordv3.py`)
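
As a rough sketch of the first two suggestions, the snippet below copies the default settings, overrides the epoch count and model size, and builds the matching `train_and_submit.py` command. It uses only the flags from the Method 1 cell. `BASE_CONFIG` mirrors this notebook's `CONFIG`; `build_command`, the value of 30 epochs, and the `orcasword_v3_large.pt` checkpoint name are illustrative assumptions.

```python
import shlex

# Copy of the notebook's default CONFIG values.
BASE_CONFIG = {
    "data_dir": "/kaggle/input/arc-prize-2025",
    "epochs": 10,
    "learning_rate": 1e-4,
    "embed_dim": 128,
    "num_layers": 4,
    "checkpoint_path": "orcasword_v3.pt",
    "output_path": "submission.json",
}


def build_command(config):
    """Build the argument list for train_and_submit.py from a config dict."""
    return [
        "python", "train_and_submit.py",
        "--data_dir", config["data_dir"],
        "--train",
        "--epochs", str(config["epochs"]),
        "--lr", str(config["learning_rate"]),
        "--embed_dim", str(config["embed_dim"]),
        "--num_layers", str(config["num_layers"]),
        "--checkpoint", config["checkpoint_path"],
        "--submit",
        "--output", config["output_path"],
    ]


# Larger run: more epochs and a bigger model, saved to a separate
# (hypothetical) checkpoint so the default one is not overwritten.
larger = {
    **BASE_CONFIG,
    "epochs": 30,
    "embed_dim": 256,
    "num_layers": 8,
    "checkpoint_path": "orcasword_v3_large.pt",
}
print(shlex.join(build_command(larger)))

# To launch it from the notebook:
# import subprocess
# subprocess.run(build_command(larger), check=True)
```

A larger model and more epochs will lengthen training beyond the estimate above, so check the run against the Kaggle session time limit.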

## Next Steps

1. Download `submission.json`
2. Submit to ARC Prize 2025 competition
3. Check leaderboard score
4. Iterate and improve!

Good luck! 🚀