# 🖥️ Local Benchmark for ARC-AGI

This notebook **evaluates your ARC-AGI solution locally**, so you can test your approach before moving to the Kaggle environment. To run it on Kaggle, you will need to adjust a few paths (see `dev_path` and `prod_path` below).

## Imports

In [6]:
import json
from tqdm import tqdm
# -------------------
from abstract_and_reason import solver_v1
from abstract_and_reason.assets import load_json, sort_challenges_by_size

## Let's define our Solver

In [7]:
abstract_and_reason = solver_v1.Solver(prod=False)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

## JSON files loading

🚨🚨🚨 Set `base_path` to `prod_path` instead of `dev_path` for Kaggle testing 🚨🚨🚨

In [8]:
dev_path = '../data/challenges/' # your own challenge directory
prod_path = '/kaggle/input/arc-prize-2024/' # path may change in 2025 arc-prize

base_path = dev_path # /!\ change dev_path to prod_path for Kaggle testing


submission_file_path = './submission.json'
sample_submission_file_path = base_path + 'sample_submission.json'

training_challenges = abstract_and_reason.training_challenges
training_solutions = abstract_and_reason.training_solutions
evaluation_challenges = abstract_and_reason.evaluation_challenges
evaluation_solutions = abstract_and_reason.evaluation_solutions
test_challenges = abstract_and_reason.test_challenges
sample_submission = abstract_and_reason.sample_submission
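
Rather than editing `base_path` by hand, the switch could be automated by checking whether the Kaggle input directory exists. This is a minimal sketch, not part of the original notebook: `pick_base_path` is a hypothetical helper, and it assumes the competition data is mounted at `prod_path` only when running on Kaggle.

```python
import os


def pick_base_path(dev_path, prod_path):
    """Hypothetical helper: prefer the Kaggle path when it exists on disk."""
    # Assumption: the prod_path directory only exists inside a Kaggle session,
    # so locally this falls back to dev_path.
    return prod_path if os.path.isdir(prod_path) else dev_path


dev_path = '../data/challenges/'
prod_path = '/kaggle/input/arc-prize-2024/'
base_path = pick_base_path(dev_path, prod_path)
```

With this in place the same cell should run unchanged in both environments, as long as the two path constants stay accurate.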

## Submission file creation and filling

The submission file is first filled with two wrong placeholder attempts for each challenge, which makes it ready to be evaluated as-is.

The cells below then overwrite these placeholders with the answers produced by your own solution.

In [9]:
with open(submission_file_path, "w") as file:
    json.dump(sample_submission, file, indent=4)
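
For reference, the placeholder file follows the submission layout used throughout this notebook: one entry per test input, each holding an `attempt_1` and an `attempt_2` grid. The sketch below rebuilds such a placeholder from a challenges dictionary. `make_placeholder_submission` is a hypothetical helper, and both the `[[0, 0], [0, 0]]` dummy grid and the toy challenge layout are assumptions for illustration.

```python
import json


def make_placeholder_submission(challenges):
    """Hypothetical helper: two dummy attempts for every test input."""
    submission = {}
    for challenge_id, challenge in challenges.items():
        submission[challenge_id] = [
            # Fresh lists per entry so later edits do not alias each other.
            {"attempt_1": [[0, 0], [0, 0]], "attempt_2": [[0, 0], [0, 0]]}
            for _ in challenge["test"]
        ]
    return submission


# Toy challenge with two test inputs (assumed ARC-style layout).
toy_challenges = {
    "abc123": {"train": [], "test": [{"input": [[1]]}, {"input": [[2]]}]}
}
placeholder = make_placeholder_submission(toy_challenges)
print(json.dumps(placeholder, indent=4))
```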

# Submission

In [10]:
with open(submission_file_path, "r+") as outfile:
    submission_data = json.load(outfile)
    
    ids_test = sort_challenges_by_size(test_challenges)

    for i, challenge_id in enumerate(tqdm(ids_test)):
        # puzzle_inps_train, puzzle_outs_train, puzzle_inps_test, puzzle_outs_test = abstract_and_reason.process_challenge(challenge_id, test_challenges)
        
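        attempt1 = abstract_and_reason.predict(challenge_id)
        # attempt_2 is required by the submission format: reuse attempt 1 for now,
        # or call predict() again if you want an independent second attempt
        attempt2 = attempt1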
        
        print(attempt1)
        # print("-------------------------------------------------------------")
        # print(attempt2)
        break # TODO: remove this line to predict all challenges
        
        result = []
        for j in range(len(attempt1)):
            result.append({
                'attempt_1': attempt1[j].tolist(),
                'attempt_2': attempt2[j].tolist()
            })
        
        submission_data[challenge_id] = result
        
        outfile.seek(0)
        json.dump(submission_data, outfile, indent=4)
        outfile.truncate()

  0%|          | 0/100 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
  0%|          | 0/100 [08:47<?, ?it/s]


KeyboardInterrupt: 
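
The loop above needs both `attempt_1` and `attempt_2` for every test output, and the grids must be plain lists before `json.dump` can serialize them. Here is a minimal sketch of that conversion, assuming `predict` returns one grid per test input (NumPy arrays or nested sequences). `to_list` and `build_entries` are hypothetical helpers, not part of `abstract_and_reason`.

```python
def to_list(grid):
    """Convert a grid to nested Python lists (NumPy arrays expose .tolist())."""
    if hasattr(grid, "tolist"):
        return grid.tolist()
    return [list(row) for row in grid]


def build_entries(attempt1, attempt2=None):
    """Pair up two lists of predicted grids into submission entries."""
    if attempt2 is None:
        attempt2 = attempt1  # fall back to duplicating the first attempt
    if len(attempt1) != len(attempt2):
        raise ValueError("both attempts must cover the same test inputs")
    return [
        {"attempt_1": to_list(a1), "attempt_2": to_list(a2)}
        for a1, a2 in zip(attempt1, attempt2)
    ]


# One predicted 2x2 grid, given as tuples to show the conversion to lists.
entries = build_entries([[(1, 2), (3, 4)]])
print(entries)
```

Duplicating the first attempt is only a stopgap: it keeps the file valid but wastes the second guess the format allows.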

# Validation

In [None]:
import numpy as np

def get_score(model_answers, real_answers):
    """
    Returns 1 if every compared pair of answers matches exactly, otherwise 0.
    Only the first min(len(model_answers), len(real_answers)) pairs are compared,
    and a pair whose shapes differ counts as a mismatch.

    Args:
        model_answers (list of lists): Model-generated answers as matrices (list of lists).
        real_answers (list of lists): Real answers as matrices (list of lists).

    Returns:
        int: 1 if all compared pairs are identical, else 0.
    """
    total_score = 0
    valid_comparisons = 0

    for i in range(min(len(model_answers), len(real_answers))):
        model_answer = np.array(model_answers[i])
        real_answer = np.array(real_answers[i])

        valid_comparisons += 1
        # np.array_equal is False when shapes differ, so those pairs count as wrong
        if np.array_equal(model_answer, real_answer):
            total_score += 1

    # int() truncates, so anything short of a perfect match yields 0
    return int(total_score / valid_comparisons) if valid_comparisons > 0 else 0


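def get_anwser(challenge_id, answers_file_path):
    """Load a submission-style JSON file and return the entry for the first test output."""
    answers = load_json(answers_file_path)
    challenge = answers[challenge_id]
    # Only the first test output is returned; later ones are not scored here
    return challenge[0]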

In [None]:
total_score = 0

ids_test = sort_challenges_by_size(test_challenges)

for challenge_id in ids_test:
    # NOTE: the reference answers come from sample_submission.json, which holds
    # placeholder attempts; point this at real solutions for a meaningful score
    ground_truth = get_anwser(challenge_id, sample_submission_file_path)
    model_answer = get_anwser(challenge_id, submission_file_path)

    challenge_score = 0
    for attempt in ground_truth.keys():
        challenge_score += get_score(model_answer[attempt], ground_truth[attempt])

    # Average over the two attempts
    total_score += challenge_score / 2

print(f"\n\nFinal score: {total_score}")



Final score: 100.0
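
As far as I understand the competition rules, a test output counts as solved when either attempt matches the expected grid exactly. Below is a sketch of that rule for checking against real solutions such as `evaluation_solutions`. `score_task` is a hypothetical helper, and it assumes the solutions are stored as a list of expected grids per challenge, with all grids as plain nested lists.

```python
def score_task(entries, solutions):
    """Fraction of test outputs where attempt_1 or attempt_2 is an exact match."""
    if not solutions:
        return 0.0
    solved = 0
    # zip stops at the shorter list, so missing predictions count as unsolved.
    for entry, truth in zip(entries, solutions):
        if entry["attempt_1"] == truth or entry["attempt_2"] == truth:
            solved += 1
    return solved / len(solutions)


# Toy data: two test outputs, only the first one is solved.
toy_entries = [
    {"attempt_1": [[1, 0]], "attempt_2": [[0, 0]]},  # first attempt is correct
    {"attempt_1": [[2]], "attempt_2": [[3]]},        # both attempts are wrong
]
toy_solutions = [[[1, 0]], [[4]]]
print(score_task(toy_entries, toy_solutions))  # 0.5
```

Summing `score_task` over all challenges in a split with known solutions gives a number that reflects actual solve rate, unlike a comparison against the placeholder file.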


### 💡 Save Yourself the Headache! 💡

I've made the mistakes so you don't have to! Now, you'll save tons of time working on ARC-AGI. 

If this notebook helped you avoid common pitfalls or sped up your progress, I'd love your support!

- **Follow me on Kaggle:** [Malo Le Mestre](https://www.kaggle.com/malolem)
- **Leave a ⭐ on the GitHub repo** [here](https://github.com/MaloLM/arc-agi-genesis) to show your appreciation and keep the project growing!
- **Upvote this notebook** on Kaggle if it saved you from banging your head against the wall!

Your feedback keeps me motivated and helps others avoid the same challenges. 

# Thank you! 🚀✨