<a href="https://colab.research.google.com/github/CalebTalley2024/ARC-AGI-2/blob/vedant/notebooks/task_viz.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# ARC-AGI Task Visualization

This notebook visualizes ARC-AGI puzzles loaded from the training and evaluation dataset files. It displays each input-output pair side by side to make the dataset structure and the transformation patterns easier to understand.

## Features:
- Load any ARC task file (JSON content) by path
- Visualize training and test pairs in a 2-column format (Input | Output)
- Handle a variable number of tasks per file and pairs per task
- Compatible with Google Colab


In [None]:
# Setup for Google Colab
!git clone https://github.com/CalebTalley2024/ARC-AGI-2.git
%cd ARC-AGI-2

# Install required packages
!pip install matplotlib numpy

In [None]:
import json
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import os

# Color map for visualization (0-9 colors for ARC grids)
COLOR_MAP = [
    '#000000',  # 0: black
    '#0074D9',  # 1: blue
    '#FF4136',  # 2: red
    '#2ECC40',  # 3: green
    '#FFDC00',  # 4: yellow
    '#AAAAAA',  # 5: grey
    '#F012BE',  # 6: magenta
    '#FF851B',  # 7: orange
    '#7FDBFF',  # 8: sky blue
    '#870C25'   # 9: brown
]

print("Libraries loaded successfully!")

In [None]:
def load_arc_data(data_path):
    """Load ARC training or evaluation data"""
    try:
        with open(data_path, 'r') as f:
            data = json.load(f)
        print(f"Loaded {len(data)} tasks from {data_path}")
        return data
    except FileNotFoundError:
        print(f"Data file not found: {data_path}")
        return None
    except json.JSONDecodeError as e:
        print(f"Error parsing JSON: {e}")
        return None

def plot_grid(grid, title="", ax=None):
    """Plot a single ARC grid with proper colors"""
    if ax is None:
        fig, ax = plt.subplots(1, 1, figsize=(6, 6))

    grid_array = np.array(grid)
    height, width = grid_array.shape

    # Build an RGB lookup table with one row per color index, then index it
    # with the grid so every cell is colored in a single step
    palette = np.array([
        [int(color_hex[k:k + 2], 16) / 255.0 for k in (1, 3, 5)]
        for color_hex in COLOR_MAP
    ])
    colored_grid = palette[grid_array]

    ax.imshow(colored_grid, interpolation='nearest')
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.set_xticks([])
    ax.set_yticks([])

    # Add grid lines
    for i in range(height + 1):
        ax.axhline(i - 0.5, color='white', linewidth=1)
    for j in range(width + 1):
        ax.axvline(j - 0.5, color='white', linewidth=1)

    return ax

print("Grid plotting functions defined!")

In [None]:
def visualize_task_pairs(task_data, task_id, max_pairs=3):
    """Visualize input/output pairs for a specific task in 2-column format"""
    if task_id not in task_data:
        print(f"Task {task_id} not found in data")
        return

    task = task_data[task_id]
    train_pairs = task['train']
    test_pairs = task['test']

    num_train = min(len(train_pairs), max_pairs)
    num_test = min(len(test_pairs), max_pairs)
    total_pairs = num_train + num_test

    if total_pairs == 0:
        print("No pairs to visualize")
        return

    # Create subplot grid: 2 columns (Input | Output), one row per pair.
    # squeeze=False keeps axes 2-D even when there is only a single row.
    fig, axes = plt.subplots(total_pairs, 2, figsize=(12, 4*total_pairs), squeeze=False)

    # Plot training pairs
    for i in range(num_train):
        pair = train_pairs[i]
        input_grid = pair['input']
        output_grid = pair['output']

        plot_grid(input_grid, f"Train {i+1} - Input", axes[i, 0])
        plot_grid(output_grid, f"Train {i+1} - Output", axes[i, 1])

    # Plot test pairs
    for i in range(num_test):
        pair = test_pairs[i]
        input_grid = pair['input']
        output_grid = pair['output']

        row_idx = num_train + i
        plot_grid(input_grid, f"Test {i+1} - Input", axes[row_idx, 0])
        plot_grid(output_grid, f"Test {i+1} - Output", axes[row_idx, 1])

    plt.suptitle(f"ARC Task: {task_id}", fontsize=16, fontweight='bold')
    plt.tight_layout()
    plt.show()

def browse_tasks(data, start_idx=0, num_tasks=5):
    """Browse through multiple tasks with their IDs"""
    task_ids = list(data.keys())
    end_idx = min(start_idx + num_tasks, len(task_ids))

    print(f"Showing tasks {start_idx+1} to {end_idx} of {len(task_ids)} total tasks:")
    print("-" * 50)

    for i in range(start_idx, end_idx):
        task_id = task_ids[i]
        task = data[task_id]
        num_train = len(task['train'])
        num_test = len(task['test'])
        print(f"{i+1:3d}. Task ID: {task_id} | Train: {num_train} | Test: {num_test}")

    print("-" * 50)
    print("To visualize a task, use: visualize_task_pairs(data, 'task_id')")

print("Task visualization functions defined!")

In [None]:
import random

def visualize_random_tasks(data, num_tasks=3, show_all_pairs=True):
    """
    Randomly select tasks and display all their input/output pairs

    Args:
        data: Dictionary of ARC tasks
        num_tasks: Number of random tasks to select (default: 3)
        show_all_pairs: If True, shows all pairs; if False, shows at most
            3 training pairs and 3 test pairs per task
    """
    if not data:
        print("No data provided")
        return

    task_ids = list(data.keys())
    if len(task_ids) < num_tasks:
        print(f"Only {len(task_ids)} tasks available, showing all")
        selected_tasks = task_ids
    else:
        selected_tasks = random.sample(task_ids, num_tasks)

    print(f"Randomly selected {len(selected_tasks)} tasks:")
    print("=" * 60)

    for idx, task_id in enumerate(selected_tasks, 1):
        task = data[task_id]
        train_pairs = task['train']
        test_pairs = task['test']

        print(f"\n{idx}. Task ID: {task_id}")
        print(f"   Training pairs: {len(train_pairs)}")
        print(f"   Test pairs: {len(test_pairs)}")
        print("-" * 40)

        # Calculate total pairs to show
        if show_all_pairs:
            num_train = len(train_pairs)
            num_test = len(test_pairs)
        else:
            # Use reasonable limits to avoid overwhelming display
            num_train = min(len(train_pairs), 3)
            num_test = min(len(test_pairs), 3)

        total_pairs = num_train + num_test

        if total_pairs == 0:
            print("No pairs to visualize")
            continue

        # Create subplot grid: 2 columns (Input | Output), one row per pair.
        # squeeze=False keeps axes 2-D even when there is only a single row.
        fig, axes = plt.subplots(total_pairs, 2, figsize=(12, 4*total_pairs), squeeze=False)

        row_idx = 0

        # Plot training pairs
        for i in range(num_train):
            pair = train_pairs[i]
            input_grid = pair['input']
            output_grid = pair['output']

            plot_grid(input_grid, f"Train {i+1} - Input", axes[row_idx, 0])
            plot_grid(output_grid, f"Train {i+1} - Output", axes[row_idx, 1])
            row_idx += 1

        # Plot test pairs
        for i in range(num_test):
            pair = test_pairs[i]
            input_grid = pair['input']
            output_grid = pair['output']

            plot_grid(input_grid, f"Test {i+1} - Input", axes[row_idx, 0])
            plot_grid(output_grid, f"Test {i+1} - Output", axes[row_idx, 1])
            row_idx += 1

        plt.suptitle(f"Task {idx}: {task_id}", fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()

        if idx < len(selected_tasks):
            print("\n" + "="*60)

def visualize_random_sample():
    """Visualize 3 random tasks from the loaded training data, falling back to evaluation data"""
    if 'training_data' in globals() and training_data:
        print("Visualizing 3 random tasks from TRAINING data:")
        visualize_random_tasks(training_data, num_tasks=3, show_all_pairs=True)
    elif 'eval_data' in globals() and eval_data:
        print("Visualizing 3 random tasks from EVALUATION data:")
        visualize_random_tasks(eval_data, num_tasks=3, show_all_pairs=True)
    else:
        print("No data loaded. Please run the data loading cells first.")

print("Random task visualization functions defined!")

In [None]:
# Load ARC training data
training_data = load_arc_data('data/raw/arc/training.txt')

if training_data:
    print(f"Successfully loaded {len(training_data)} training tasks")

    # Browse first few tasks
    browse_tasks(training_data, start_idx=0, num_tasks=10)

In [None]:
# Example: Visualize specific tasks or random selection

# Option 1: Visualize a specific task
# Uncomment and modify the task ID to visualize:
# visualize_task_pairs(training_data, '00d62c1b', max_pairs=2)

# Option 2: Visualize 3 random tasks with ALL their pairs
# Uncomment to run:
# visualize_random_sample()

# Option 3: Visualize random tasks from a specific dataset
# Uncomment to run (eval_data is loaded in a later cell, so run that cell first):
# visualize_random_tasks(training_data, num_tasks=3, show_all_pairs=True)
# visualize_random_tasks(eval_data, num_tasks=2, show_all_pairs=False)

print("Visualization options:")
print("1. visualize_task_pairs(data, 'task_id', max_pairs=N) - specific task")
print("2. visualize_random_sample() - 3 random tasks from loaded data")
print("3. visualize_random_tasks(data, num_tasks=N, show_all_pairs=True) - custom random selection")

In [None]:
# Load evaluation data for comparison
eval_data = load_arc_data('data/raw/arc/evaluation.txt')

if eval_data:
    print(f"Successfully loaded {len(eval_data)} evaluation tasks")
    print("Evaluation tasks (first 5):")
    browse_tasks(eval_data, start_idx=0, num_tasks=5)

## Quick Start Guide

1. **Setup**: Run the first two cells to install dependencies and import libraries
2. **Load Data**: Execute the data loading cells to see available tasks
3. **Browse**: Use `browse_tasks(data, start_idx=0, num_tasks=10)` to see task IDs
4. **Visualize**: Choose from multiple visualization options below
5. **Explore**: Try different parameters to understand various puzzle patterns

## Visualization Options

### Random Task Selection (NEW!)
- `visualize_random_sample()`: Quick way to see 3 random tasks from loaded data
- `visualize_random_tasks(data, num_tasks=3, show_all_pairs=True)`: Custom random selection

### Specific Task Selection
- `visualize_task_pairs(data, 'task_id', max_pairs=3)`: Show a specific task, with at most `max_pairs` training pairs and `max_pairs` test pairs
- `browse_tasks(data, start_idx=0, num_tasks=10)`: List available task IDs

## Key Functions

- `load_arc_data(path)`: Load training or evaluation data
- `browse_tasks(data, start_idx, num_tasks)`: List available tasks with IDs
- `visualize_task_pairs(data, task_id, max_pairs)`: Show input/output pairs for a specific task
- `visualize_random_tasks(data, num_tasks, show_all_pairs)`: Show randomly chosen tasks with all or a limited number of pairs
- `visualize_random_sample()`: Quick random selection from the loaded training data (or evaluation data if no training data is loaded)
- `plot_grid(grid, title, ax)`: Plot an individual grid with the ARC color palette, optionally on an existing axis
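
The color handling inside `plot_grid` can be illustrated without any plotting. The sketch below is a minimal stand-alone version of the hex-to-RGB step: `hex_to_rgb` and `grid_to_rgb` are hypothetical helper names that do not exist in this notebook, and only the first three palette entries are repeated so the snippet runs on its own.

```python
# First three entries of the notebook's COLOR_MAP, repeated here so the
# sketch is self-contained.
COLOR_MAP = ['#000000', '#0074D9', '#FF4136']

def hex_to_rgb(color_hex):
    """Convert '#RRGGBB' into an (r, g, b) tuple of floats in [0, 1]."""
    return tuple(int(color_hex[k:k + 2], 16) / 255.0 for k in (1, 3, 5))

def grid_to_rgb(grid):
    """Replace every integer cell with the RGB tuple of its palette color."""
    return [[hex_to_rgb(COLOR_MAP[cell]) for cell in row] for row in grid]

# A made-up 2x2 grid using color indices 0, 1 and 2.
rgb = grid_to_rgb([[0, 1], [2, 0]])
print(rgb[0][1])  # RGB tuple for color index 1 (blue)
```

A nested list of RGB tuples like `rgb` has the height x width x 3 layout that `ax.imshow` expects once it is converted to an array.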

## Parameters Explained

- `max_pairs`: Caps the number of pairs shown per task in `visualize_task_pairs`; the cap applies to training pairs and test pairs separately
- `num_tasks`: Number of tasks to display (for browsing or random selection)
- `show_all_pairs`: If True, shows ALL pairs in the selected tasks; if False, shows at most 3 training pairs and 3 test pairs per task
- `start_idx`: Starting position for browsing tasks
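
To make the effect of the pair limit concrete, the sketch below computes how many subplot rows a task would need, one row per input/output pair. `count_rows` is a hypothetical helper written for illustration, and `toy_task` is made-up data with empty placeholders instead of real grids.

```python
def count_rows(task, max_pairs=None):
    """Return the number of subplot rows needed (one row per pair)."""
    train, test = task["train"], task["test"]
    if max_pairs is None:
        # No cap: every pair gets a row.
        return len(train) + len(test)
    # The cap is applied to training and test pairs separately.
    return min(len(train), max_pairs) + min(len(test), max_pairs)

# Made-up task with 4 training pairs and 1 test pair (grids omitted).
toy_task = {"train": [{}] * 4, "test": [{}] * 1}

rows_capped = count_rows(toy_task, max_pairs=3)  # 3 train + 1 test
rows_all = count_rows(toy_task)                  # 4 train + 1 test
print(rows_capped, rows_all)
```

Since the figure height in this notebook is `4 * total_pairs`, a cap of 3 on this toy task would give a figure 16 units tall instead of 20.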

## Tips

- Each task typically has a few training pairs and at least one test pair
- Grid cells hold integer values 0-9, and each value is drawn in its own distinct color
- Input/Output pairs show the transformation pattern to learn
- Random selection helps discover diverse puzzle types
- Use `show_all_pairs=False` for large tasks to avoid overwhelming displays
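
The structure these functions rely on (a dictionary keyed by task ID, where each task holds `train` and `test` lists of `input`/`output` grids) can be inspected without plotting. The sketch below uses a made-up task ID and tiny grids; `grid_shape` and `summarize_task` are hypothetical helpers, not part of this notebook.

```python
import json

# Hypothetical file content: one toy task with a made-up ID and tiny grids.
raw = """
{
  "toy_task": {
    "train": [
      {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
      {"input": [[2, 2, 0]], "output": [[0, 2, 2]]}
    ],
    "test": [
      {"input": [[3, 0], [0, 3]], "output": [[0, 3], [3, 0]]}
    ]
  }
}
"""
data = json.loads(raw)

def grid_shape(grid):
    """Return (height, width) of a grid stored as a list of rows."""
    return len(grid), len(grid[0])

def summarize_task(task):
    """List (split, pair number, input shape, output shape) for every pair."""
    rows = []
    for split in ("train", "test"):
        for number, pair in enumerate(task[split], start=1):
            rows.append((split, number,
                         grid_shape(pair["input"]),
                         grid_shape(pair["output"])))
    return rows

for row in summarize_task(data["toy_task"]):
    print(row)
```

A summary like this is a cheap way to spot tasks with many pairs or large grids before deciding whether to pass `show_all_pairs=False`.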