## KAGGLE SETUP - Clone Repository

**IMPORTANT:** Before running other cells, clone the GitHub repository.

### For Kaggle Users:
Uncomment and run the git clone command in the cell below.

### For Local Users:
Skip the cell below and proceed to the next section.


In [None]:
# ============================================================================
# KAGGLE: Clone GitHub Repository (includes dataset)
# ============================================================================
# Uncomment the line below when running on Kaggle:

# !git clone https://github.com/Unknown1502/NeurIPS.git /kaggle/working/NeurIPS

# ============================================================================
# Verify clone success (optional):
# ============================================================================

# !ls -la /kaggle/working/NeurIPS
# !ls -la /kaggle/working/NeurIPS/data | head -20

print("Repository clone instructions ready.")
print("For KAGGLE: Uncomment the git clone command above and run this cell.")
print("For LOCAL: Skip this cell and continue.")


## Path Configuration Reference

Configuration for local and Kaggle environments:

| Component | Local Path | Kaggle Path |
|-----------|------------|-------------|
| Repository | `c:\Users\prajw\OneDrive\Desktop\google golf` | `/kaggle/working/NeurIPS` |
| Data | `./data` | `/kaggle/working/NeurIPS/data` |
| Solutions | `./solutions` | `/kaggle/working/solutions` |

**Note:** The dataset is included in the GitHub repository, so no separate dataset upload is required.
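You can also pick the paths in the table automatically instead of editing them by hand. The sketch below assumes that `/kaggle/working` exists only inside a Kaggle kernel. `resolve_paths` is a hypothetical helper that mirrors the table; it is not part of the repository.

```python
from pathlib import Path

def resolve_paths(on_kaggle: bool, local_root: str = ".") -> dict:
    """Return repo/data/solutions paths for the current environment.

    Hypothetical helper: it mirrors the path table, not any repository API.
    """
    if on_kaggle:
        repo = Path("/kaggle/working/NeurIPS")
        return {"repo": repo, "data": repo / "data",
                "solutions": Path("/kaggle/working/solutions")}
    root = Path(local_root)
    return {"repo": root, "data": root / "data", "solutions": root / "solutions"}

# Assumption: /kaggle/working exists only inside a Kaggle kernel.
ON_KAGGLE = Path("/kaggle/working").is_dir()
paths = resolve_paths(ON_KAGGLE)
print(paths)
```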


# Google Code Golf 2025 - ARC-AGI Competition Notebook

## Competition Overview

**Objective:** Create 400 Python programs that implement grid transformations for ARC-AGI tasks.

**Scoring:**
- Correct solution: `max(1, 2500 - byte_count)` points
- Incorrect solution: 0.001 points

**Key Requirements:**
- Solutions must work on ALL examples (train + test + arc-gen)
- Minimize byte count for maximum score
- Each solution is a standalone Python file

This notebook provides a complete workflow for analyzing tasks, developing solutions, and optimizing code.
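The scoring rule translates directly into a small helper. In this sketch the byte count comes from the file as stored on disk, so a non-ASCII character costs more than one byte. The names `task_score` and `file_score` are illustrative only.

```python
from pathlib import Path

def task_score(byte_count: int, correct: bool) -> float:
    """Per-task score: max(1, 2500 - bytes) if correct, otherwise 0.001."""
    return max(1, 2500 - byte_count) if correct else 0.001

def file_score(path: str, correct: bool) -> float:
    # read_bytes() measures the encoded size on disk, not the character count.
    return task_score(len(Path(path).read_bytes()), correct)

print(task_score(33, True))    # 2467
print(task_score(3000, True))  # 1
print(task_score(33, False))   # 0.001
```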

## 1. Setup and Import Required Libraries

Import the ARC solver framework and utility modules.

In [None]:
import json
import sys
from pathlib import Path

# ============================================================================
# PATH CONFIGURATION
# ============================================================================

# Repository path
# LOCAL:  repo_path = r'c:\Users\prajw\OneDrive\Desktop\google golf'
# KAGGLE: repo_path = '/kaggle/working/NeurIPS'

repo_path = '/kaggle/working/NeurIPS'  # KAGGLE PATH
sys.path.insert(0, repo_path)

# Data directory path (dataset is in the GitHub repo)
# LOCAL:  DATA_DIR = "./data"
# KAGGLE: DATA_DIR = "/kaggle/working/NeurIPS/data"

DATA_DIR = "/kaggle/working/NeurIPS/data"  # KAGGLE PATH

# Solutions output directory
# LOCAL:  SOLUTIONS_DIR = "./solutions"
# KAGGLE: SOLUTIONS_DIR = "/kaggle/working/solutions"

SOLUTIONS_DIR = "/kaggle/working/solutions"  # KAGGLE PATH

# ============================================================================
# IMPORTS
# ============================================================================

from arc_solver import ARCTaskSolver
from utils import grid_operations as go
from utils import pattern_detection as pd
from collections import Counter
from typing import List, Tuple, Dict

# Create solutions directory
Path(SOLUTIONS_DIR).mkdir(exist_ok=True, parents=True)

# ============================================================================
# VERIFY SETUP
# ============================================================================

print("="*70)
print("SETUP COMPLETE")
print("="*70)
print(f"Repository path: {repo_path}")
print(f"Data directory: {DATA_DIR}")
print(f"Solutions directory: {Path(SOLUTIONS_DIR).absolute()}")
print("="*70)


## 2. Verify Dataset

Verify the dataset is accessible and check its structure.
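The cell below counts files and keys. A stricter check can also confirm that every grid is rectangular and uses only colors 0-9. The sketch below assumes the layout that cell prints: `train`, `test` and, when present, `arc-gen`, each a list of `{"input": grid, "output": grid}` pairs. `validate_task` is a hypothetical helper.

```python
def validate_task(task: dict) -> list:
    """Return a list of problems found in one task dict (empty list = looks fine).

    Assumed layout: 'train', 'test' and optionally 'arc-gen', each a list of
    {'input': grid, 'output': grid} pairs whose cells are integers 0-9.
    """
    problems = []
    for split in ("train", "test", "arc-gen"):
        for i, pair in enumerate(task.get(split, [])):
            for key in ("input", "output"):
                grid = pair.get(key)
                if not grid or not all(isinstance(r, list) for r in grid):
                    problems.append(f"{split}[{i}].{key}: not a non-empty list of rows")
                    continue
                if len({len(r) for r in grid}) != 1:
                    problems.append(f"{split}[{i}].{key}: ragged rows")
                if any(not isinstance(c, int) or not 0 <= c <= 9 for r in grid for c in r):
                    problems.append(f"{split}[{i}].{key}: cell outside 0-9")
    if not task.get("train"):
        problems.append("no train examples")
    return problems

good = {"train": [{"input": [[1, 2]], "output": [[2, 1]]}], "test": []}
bad = {"train": [{"input": [[1, 2], [3]], "output": [[10]]}]}
print(validate_task(good))  # []
print(validate_task(bad))
```

In the notebook, `validate_task(sample)` would apply the same checks to the sample task loaded below.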


In [None]:
# Verify dataset exists and check structure
print("DATASET VERIFICATION")
print("="*60)

data_path = Path(DATA_DIR)
if data_path.exists():
    print(f"Data directory found: {data_path}")
    
    # Count task files (sorted so task001.json comes first)
    task_files = sorted(data_path.glob("task*.json"))
    print(f"Total task files: {len(task_files)}")
    
    if task_files:
        # Show first few task files
        print("\nFirst 5 task files:")
        for i, task_file in enumerate(task_files[:5], 1):
            print(f"  {i}. {task_file.name}")
        
        # Check a sample task structure
        with open(task_files[0], 'r') as f:
            sample = json.load(f)
        
        print(f"\nSample task structure ({task_files[0].name}):")
        print(f"  Keys: {list(sample.keys())}")
        if 'train' in sample:
            print(f"  Train examples: {len(sample['train'])}")
        if 'test' in sample:
            print(f"  Test examples: {len(sample['test'])}")
        if 'arc-gen' in sample:
            print(f"  Arc-gen examples: {len(sample.get('arc-gen', []))}")
        
        print("\nDataset verification successful!")
    else:
        print("ERROR: No task files found in data directory")
else:
    print(f"ERROR: Data directory not found: {data_path}")
    print("Make sure you have cloned the repository in the previous cell")

print("="*60)


### Initialize the Solver Framework

After verifying the dataset above, initialize the solver.

In [None]:
# Initialize the solver framework
solver = ARCTaskSolver(data_dir=DATA_DIR, solutions_dir=SOLUTIONS_DIR)

# Verify initialization
task_files = list(Path(DATA_DIR).glob("task*.json"))
print(f"Solver initialized with {len(task_files)} tasks")

if task_files:
    # Check first task
    with open(task_files[0], 'r') as f:
        sample = json.load(f)
    
    print(f"\nTask structure verified:")
    print(f"  Train examples: {len(sample.get('train', []))}")
    print(f"  Test examples: {len(sample.get('test', []))}")
    print(f"  Arc-gen examples: {len(sample.get('arc-gen', []))}")
    
    print("\nSolver initialized successfully!")
else:
    print("\nERROR: No task files found")
    print("Verify DATA_DIR path is correct")


## 3. Load and Explore a Task

Let's analyze Task 1 to understand the pattern.
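If the framework is unavailable, you can read a task straight from disk. This sketch assumes files are named `taskNNN.json` with zero-padded numbers, which matches the `task*.json` glob used earlier. `load_task_json` and `summarize` are hypothetical stand-ins, and `solver.load_task` may behave differently.

```python
import json
from pathlib import Path

def load_task_json(data_dir: str, task_id: int) -> dict:
    """Load taskNNN.json from data_dir (the naming scheme is an assumption)."""
    path = Path(data_dir) / f"task{task_id:03d}.json"
    with path.open() as f:
        return json.load(f)

def summarize(task: dict) -> dict:
    # Count pairs per split; 'arc-gen' may be missing from some copies of the data.
    return {split: len(task.get(split, [])) for split in ("train", "test", "arc-gen")}
```

Usage in the notebook would be `summarize(load_task_json(DATA_DIR, 1))`.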

In [None]:
# Select task to analyze
TASK_ID = 1

# Load task data
task_data = solver.load_task(TASK_ID)

# Print detailed analysis
solver.print_task_analysis(TASK_ID)

### Visualize Training Examples

Let's examine the first few training examples to understand the transformation pattern.
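Printing rows as Python lists takes a lot of horizontal space. ARC colors are 0-9, so each cell fits in a single character. That makes it possible to show input and output side by side. `render_pair` is a hypothetical helper for this.

```python
def render_pair(inp, out, gap="   "):
    """Render input and output grids side by side, one digit per cell."""
    left = ["".join(map(str, row)) for row in inp]
    right = ["".join(map(str, row)) for row in out]
    width = max((len(s) for s in left), default=0)
    rows = max(len(left), len(right))
    # Pad the shorter grid with empty lines so both columns line up.
    left += [""] * (rows - len(left))
    right += [""] * (rows - len(right))
    return "\n".join((l.ljust(width) + gap + r).rstrip() for l, r in zip(left, right))

print(render_pair([[1, 0], [0, 1]], [[1]]))
```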

In [None]:
def visualize_example(input_grid, output_grid, example_num=1):
    """Display an input-output pair."""
    print(f"\n{'='*60}")
    print(f"Example {example_num}")
    print(f"{'='*60}")
    
    print(f"\nInput ({len(input_grid)}x{len(input_grid[0]) if input_grid else 0}):")
    for row in input_grid:
        print("  ", row)
    
    print(f"\nOutput ({len(output_grid)}x{len(output_grid[0]) if output_grid else 0}):")
    for row in output_grid:
        print("  ", row)
    
    print()

# Visualize first 2 training examples
print("TRAINING EXAMPLES:")
for i, pair in enumerate(task_data['train'][:2], 1):
    visualize_example(pair['input'], pair['output'], i)

## 4. Pattern Detection and Analysis

Use the pattern detection utilities to understand the transformation type.
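A useful first signal you can compute by hand is how the grid size changes. A single consistent ratio such as (3, 3) points to scaling or tiling, and (1, 1) means the size is preserved. This sketch does not reproduce how `pd.detect_transformation_type` works. `size_ratios` is a hypothetical helper.

```python
from fractions import Fraction

def shape(grid):
    return len(grid), (len(grid[0]) if grid else 0)

def size_ratios(pairs):
    """Set of (height ratio, width ratio) of output vs input across all pairs."""
    ratios = set()
    for p in pairs:
        (hi, wi), (ho, wo) = shape(p["input"]), shape(p["output"])
        # Fractions keep ratios like 1/3 exact (e.g. for cropping tasks).
        ratios.add((Fraction(ho, hi), Fraction(wo, wi)))
    return ratios

toy = [{"input": [[1, 2, 3]] * 3, "output": [[0] * 9] * 9}]
print(size_ratios(toy))
```

In the notebook, `size_ratios(task_data['train'])` applies it to the loaded task.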

In [None]:
# Gather input and output grids
input_grids = [pair['input'] for pair in task_data['train']]
output_grids = [pair['output'] for pair in task_data['train']]

# Detect transformation type
print("TRANSFORMATION CHARACTERISTICS:")
trans_type = pd.detect_transformation_type(input_grids, output_grids)
for key, value in trans_type.items():
    if value:
        print(f"  [X] {key.replace('_', ' ').title()}")

print("\n" + "="*60)
print("COMPLEXITY ANALYSIS:")
complexity = pd.analyze_task_complexity(input_grids, output_grids)
for key, value in complexity.items():
    print(f"  {key}: {value}")

print("\n" + "="*60)
print("SUGGESTED APPROACHES:")
suggestions = pd.suggest_approach(input_grids, output_grids)
for i, suggestion in enumerate(suggestions, 1):
    print(f"  {i}. {suggestion}")


## 5. Develop Solution - Initial Implementation

Based on the analysis, implement a clear solution first (optimize later).
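Plain tiling is only one candidate for a 3x3 to 9x9 task. Another common ARC pattern places a copy of the input in each 3x3 block whose matching input cell is non-zero and leaves the other blocks empty. This is an assumption here, not something confirmed for Task 1. Writing both hypotheses as functions makes it cheap to check which one the training pairs support.

```python
def tile(grid):
    # Hypothesis A: repeat the whole grid 3 times across and 3 times down.
    return [row * 3 for row in grid] * 3

def fractal_tile(grid):
    # Hypothesis B: for an h x w grid, output row i*h+k, column j*w+l holds
    # grid[k][l] when grid[i][j] is non-zero, else 0 ("a and b" returns 0 when a == 0).
    return [[a and b for a in r for b in s] for r in grid for s in grid]

g = [[0, 7, 7], [7, 7, 7], [0, 7, 7]]
print(tile(g) == fractal_tile(g))  # False: the hypotheses disagree when a cell is 0
```

With the task loaded, `all(f(p['input']) == p['output'] for p in task_data['train'])` for `f` in `(tile, fractal_tile)` shows which hypothesis fits.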

In [None]:
# Candidate solution for Task 1 (tiling hypothesis)
# Hypothesis: the 3x3 input is repeated 3 times across and 3 times down (9x9 total).
# The "Match" line printed below confirms or rejects this on the first training pair.

def solve_v1(grid):
    """
    Clear implementation - tile the input 3x3.
    This is the initial version focused on correctness.
    """
    # Tile horizontally 3 times
    tiled_horizontal = [row * 3 for row in grid]
    
    # Tile vertically 3 times
    result = tiled_horizontal * 3
    
    return result

# Test on first training example
test_input = task_data['train'][0]['input']
test_output = task_data['train'][0]['output']
result = solve_v1(test_input)

print("Testing initial solution on first training example:")
print(f"Input dimensions: {len(test_input)}x{len(test_input[0])}")
print(f"Expected output dimensions: {len(test_output)}x{len(test_output[0])}")
print(f"Actual output dimensions: {len(result)}x{len(result[0])}")
print(f"Match: {result == test_output}")

## 6. Test Solution on All Examples

Test the solution against ALL example sets: train, test, and arc-gen.
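For a sense of what such a check involves, here is a framework-free sketch. It returns a `stats` dict shaped like the one the next cell reads. Exceptions count as failures. Each input is deep-copied so a solution that mutates its argument cannot corrupt later comparisons. `check_all` is hypothetical and may differ from `solver.test_solution` in details such as how strictly outputs are compared.

```python
import copy

def check_all(fn, task, splits=("train", "test", "arc-gen")):
    """Run fn on every pair and return {split: {'passed': n, 'total': m}}."""
    stats = {}
    for split in splits:
        pairs = task.get(split, [])
        passed = 0
        for pair in pairs:
            try:
                out = fn(copy.deepcopy(pair["input"]))
            except Exception:
                continue  # a crash is a failed example
            if out == pair["output"]:  # strict list-of-lists equality
                passed += 1
        stats[split] = {"passed": passed, "total": len(pairs)}
    return stats

toy = {"train": [{"input": [[1, 2]], "output": [[2, 1]]}]}
print(check_all(lambda g: [r[::-1] for r in g], toy))
```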

In [None]:
# Test solution comprehensively
is_correct, message, stats = solver.test_solution(solve_v1, task_data, verbose=False)

print("="*60)
print("COMPREHENSIVE TEST RESULTS")
print("="*60)
print(f"\nTrain examples: {stats['train']['passed']}/{stats['train']['total']}")
print(f"Test examples: {stats['test']['passed']}/{stats['test']['total']}")
print(f"Arc-gen examples: {stats['arc-gen']['passed']}/{stats['arc-gen']['total']}")

total_passed = sum(s['passed'] for s in stats.values())
total_tests = sum(s['total'] for s in stats.values())

print(f"\nTOTAL: {total_passed}/{total_tests} passed")
print(f"Success rate: {100 * total_passed / total_tests:.2f}%")

if is_correct:
    print("\nSUCCESS: SOLUTION IS CORRECT")
else:
    print(f"\nFAILURE: SOLUTION FAILED")
    print(f"Details: {message}")

## 7. Code Golf Optimization

After verifying correctness, optimize the solution to minimize byte count.
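Golfing often introduces subtle changes, so it helps to compare the golfed source against the readable version on many random grids. This is differential testing. It complements the real examples but does not replace them, because random grids may violate the task's assumptions. The sketch below compiles the exact string that would be saved with `exec`, which runs arbitrary code, so use it only on your own solutions. `reference` and `agree_on_random_grids` are hypothetical names.

```python
import random

def reference(grid):
    # Readable version, same logic as solve_v1.
    tiled_horizontal = [row * 3 for row in grid]
    return tiled_horizontal * 3

golfed_src = "solve=lambda g:[r*3 for r in g]*3"
namespace = {}
exec(golfed_src, namespace)  # compile exactly the bytes that would be saved
golfed = namespace["solve"]

def agree_on_random_grids(f, g, trials=200, seed=0):
    """Return True if f and g give equal outputs on random grids with colors 0-9."""
    rng = random.Random(seed)
    for _ in range(trials):
        h, w = rng.randint(1, 10), rng.randint(1, 10)
        grid = [[rng.randint(0, 9) for _ in range(w)] for _ in range(h)]
        # Pass separate copies so neither function can affect the other's input.
        if f([r[:] for r in grid]) != g([r[:] for r in grid]):
            return False
    return True

print(agree_on_random_grids(reference, golfed))  # True
```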

In [None]:
# Optimized version - minimize byte count
solve = lambda g: [r*3 for r in g]*3

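# Compare byte counts. v1_code approximates solve_v1 without its docstring and comments;
# v2_code is the minified form that is actually saved in Section 8.
v1_code = "def solve_v1(grid):\n    tiled_horizontal = [row * 3 for row in grid]\n    result = tiled_horizontal * 3\n    return result"
v2_code = "solve=lambda g:[r*3 for r in g]*3"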

v1_bytes = len(v1_code.encode('utf-8'))
v2_bytes = len(v2_code.encode('utf-8'))

print("CODE GOLF OPTIMIZATION RESULTS:")
print("="*60)
print(f"\nInitial version (clear):")
print(f"  Byte count: {v1_bytes}")
print(f"  Score (if correct): {max(1, 2500 - v1_bytes)}")

print(f"\nOptimized version (golf):")
print(f"  Byte count: {v2_bytes}")
print(f"  Score (if correct): {max(1, 2500 - v2_bytes)}")

print(f"\nImprovement:")
print(f"  Bytes saved: {v1_bytes - v2_bytes}")
print(f"  Points gained: {max(1, 2500 - v2_bytes) - max(1, 2500 - v1_bytes)}")

# Verify optimized version still works
is_correct_opt, _, _ = solver.test_solution(solve, task_data, verbose=False)
print(f"\nOptimized version correctness: {'PASS' if is_correct_opt else 'FAIL'}")

## 8. Save and Score Solution

Process the solution through the framework to save it and calculate the final score.
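To save a solution without the framework, write the exact bytes that were counted. `write_bytes` avoids the newline translation that text mode applies on Windows (`\n` to `\r\n`), which would silently add bytes. The sketch assumes a `taskNNN.py` layout inside the solutions directory. Whether `solver.process_solution` uses exactly this layout is not confirmed. `save_solution` is a hypothetical helper.

```python
from pathlib import Path

def save_solution(solutions_dir, task_id: int, code: str) -> Path:
    """Write code to <solutions_dir>/taskNNN.py and return the path."""
    path = Path(solutions_dir) / f"task{task_id:03d}.py"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Binary write: the file size equals len(code.encode("utf-8")) exactly.
    path.write_bytes(code.encode("utf-8"))
    return path
```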

In [None]:
# Final solution code (as it will be saved)
final_solution = """solve=lambda g:[r*3 for r in g]*3"""

# Process solution and save if correct
result = solver.process_solution(TASK_ID, final_solution, verbose=True)

print("\n" + "="*60)
print("FINAL RESULTS")
print("="*60)
print(f"Task ID: {result['task_id']}")
print(f"Correctness: {'PASS' if result['success'] else 'FAIL'}")
print(f"Byte count: {result['byte_count']}")
print(f"Score: {result['score']:.3f} points")

if result['success']:
    print(f"\nSolution saved to: {Path(SOLUTIONS_DIR) / f'task{TASK_ID:03d}.py'}")
else:
    print("\nSolution not saved (failed validation)")

## 9. Batch Processing Multiple Tasks

Workflow for systematically solving multiple tasks.
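When working through many tasks, one quick screen is to try a small library of generic transforms on each task. Any transform that reproduces every train and test pair is worth a closer look. The transform names below are hypothetical. A hit is only a candidate until it also passes the arc-gen examples via `solver.test_solution`.

```python
CANDIDATES = {
    "identity": lambda g: [r[:] for r in g],
    "flip_lr": lambda g: [r[::-1] for r in g],
    "flip_ud": lambda g: g[::-1],
    "transpose": lambda g: [list(r) for r in zip(*g)],
    "rot180": lambda g: [r[::-1] for r in g[::-1]],
    "tile3x3": lambda g: [r * 3 for r in g] * 3,
}

def matching_candidates(task):
    """Names of candidates that reproduce every train and test pair."""
    pairs = task.get("train", []) + task.get("test", [])
    hits = []
    for name, fn in CANDIDATES.items():
        try:
            if all(fn(p["input"]) == p["output"] for p in pairs):
                hits.append(name)
        except Exception:
            pass  # e.g. transpose on a ragged grid
    return hits

print(matching_candidates({"train": [{"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]}]}))
```

Inside the batch loop, `matching_candidates(solver.load_task(task_id))` could be called alongside `solver.analyze_task(task_id)`.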

In [None]:
# Batch analyze multiple tasks
START_TASK = 1
END_TASK = 10

print(f"Analyzing tasks {START_TASK} to {END_TASK}...")
print("="*60)

for task_id in range(START_TASK, END_TASK + 1):
    try:
        analysis = solver.analyze_task(task_id)
        print(f"Task {task_id:03d}: "
              f"train={analysis['train_count']}, "
              f"test={analysis['test_count']}, "
              f"arc-gen={analysis['arc_gen_count']}, "
              f"total={analysis['total_pairs']}")
    except Exception as e:
        print(f"Task {task_id:03d}: ERROR - {e}")

print("="*60)

## 10. Track Progress

Monitor your overall progress across all 400 tasks.
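You can also derive a rough progress summary directly from the saved files. The score in this sketch is an upper bound: it assumes every saved file is correct and counts each unsolved task at 0.001 points. `progress_summary` is a hypothetical helper and is not the output format of `solver.report_progress()`.

```python
from pathlib import Path

def progress_summary(solutions_dir, total_tasks=400):
    """Summarize saved taskNNN.py files: count, total bytes, optimistic score."""
    files = sorted(Path(solutions_dir).glob("task*.py"))
    sizes = [len(f.read_bytes()) for f in files]
    solved = len(files)
    score = sum(max(1, 2500 - s) for s in sizes) + 0.001 * (total_tasks - solved)
    return {
        "solved": solved,
        "remaining": total_tasks - solved,
        "total_bytes": sum(sizes),
        "max_score": round(score, 3),
    }
```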

In [None]:
# Display progress report
solver.report_progress()

## 11. Generate Submission

Create the submission ZIP file with all your solutions.
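For reference, the standard-library `zipfile` module can build such an archive directly. This sketch assumes the submission expects `taskNNN.py` files at the root of the archive, which should be checked against the competition rules. `make_submission` is a hypothetical helper.

```python
import zipfile
from pathlib import Path

def make_submission(solutions_dir, zip_path="submission.zip"):
    """Zip every taskNNN.py at the archive root; return the number of files."""
    files = sorted(Path(solutions_dir).glob("task*.py"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # arcname drops the directory prefix
    return len(files)
```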

In [None]:
# Create submission ZIP file
solver.create_submission_zip("submission.zip")

# Verify submission file
submission_path = Path("submission.zip")
if submission_path.exists():
    print(f"\nSubmission file created: {submission_path.absolute()}")
    print(f"File size: {submission_path.stat().st_size:,} bytes")
else:
    print("\nSubmission file not found")

## Appendix: Useful Code Golf Techniques

Common optimization techniques for minimizing byte count.
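Golfed idioms are easy to get subtly wrong, so it helps to assert each one against its readable form on a sample grid. The clockwise rotation idiom below is an extra example and does not appear in the technique list in the next cell.

```python
g = [[1, 2, 3], [4, 5, 6]]

# Transpose: zip(*g) yields columns as tuples; map(list, ...) turns them into lists.
readable_transpose = [[g[i][j] for i in range(len(g))] for j in range(len(g[0]))]
golfed_transpose = [*map(list, zip(*g))]
assert readable_transpose == golfed_transpose == [[1, 4], [2, 5], [3, 6]]

# Clockwise rotation: reverse the row order, then transpose.
readable_rot = [[g[len(g) - 1 - i][j] for i in range(len(g))] for j in range(len(g[0]))]
golfed_rot = [*map(list, zip(*g[::-1]))]
assert readable_rot == golfed_rot == [[4, 1], [5, 2], [6, 3]]

# Counting with booleans: True counts as 1 inside sum().
assert sum(1 for r in g for c in r if c > 3) == sum(c > 3 for r in g for c in r) == 3
print("all golfed idioms match their readable versions")
```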

In [None]:
print("CODE GOLF OPTIMIZATION TECHNIQUES:")
print("="*60)

# Each entry: (before, after). Savings are measured rather than estimated.
techniques = {
    "Lambda functions": ("def solve(g): return transform(g)", "solve=lambda g:transform(g)"),
    "Remove whitespace": ("x = y + z", "x=y+z"),
    "List comprehensions": ("result=[]\nfor r in g: result.append(r*2)", "[r*2 for r in g]"),
    # Renaming saves bytes again at every occurrence of the variable.
    "Single-letter variables": ("grid, row, cell", "g, r, c"),
    "Zip for transpose": ("[[g[i][j] for i in range(len(g))] for j in range(len(g[0]))]",
                          "[list(r) for r in zip(*g)]"),
    "Boolean arithmetic": ("sum(1 for c in row if c==5)", "sum(c==5 for c in row)"),
}

for name, (before, after) in techniques.items():
    # UTF-8 encoded length, i.e. the byte count used for scoring
    b, a = len(before.encode('utf-8')), len(after.encode('utf-8'))
    print(f"\n{name}:")
    print(f"  Before ({b} bytes): {before}")
    print(f"  After  ({a} bytes): {after}")
    print(f"  Savings: {b - a} bytes")

## Summary and Next Steps

### Workflow Recap

1. **Analyze Task** - Use solver.print_task_analysis(task_id)
2. **Detect Patterns** - Use pattern detection utilities
3. **Implement Solution** - Start with clear, correct version
4. **Test Thoroughly** - Verify on ALL examples (train + test + arc-gen)
5. **Optimize** - Apply code golf techniques to minimize bytes
6. **Save and Score** - Use solver.process_solution()
7. **Iterate** - Repeat for all 400 tasks

### Key Reminders

- **Correctness First**: Wrong answer = 0.001 points (essentially zero)
- **Test Comprehensively**: Solutions must pass ALL examples (~250+ per task)
- **Optimize Second**: Only after verifying correctness
- **Track Progress**: Use solver.report_progress() regularly
- **Create Submission**: Use solver.create_submission_zip() when ready

### Scoring Reference

- 50-byte solution: 2450 points
- 100-byte solution: 2400 points
- 500-byte solution: 2000 points
- 1000-byte solution: 1500 points
- 2500+ byte solution: 1 point

### Target Goals

- Solve: 380+ tasks (95% completion)
- Average: 2200-2400 points per task
- Total: roughly 836,000-912,000 points (380 tasks × 2200-2400 points)

### Resources

- Framework documentation: README.md
- Competition guide: todo.md
- Example solutions: examples/example_solutions.py
- Utility functions: utils/grid_operations.py, utils/pattern_detection.py

Good luck with the competition!