### Abstraction and Reasoning Corpus (ARC)

#### ARCathon

From the [ARCathon website](https://lab42.global/arcathon/):

> "ARCathon is a worldwide AI competition hosted in Davos, Switzerland. The challenge invites individual participants and representatives from companies and institutions to solve ARC, a task deemed impossible for state-of-the-art AI models. The intelligence test for algorithms was created by [François Chollet](https://fchollet.com/), a Google Senior Staff Engineer and Co-Host of ARCathon.

> "ARC consists of 1,000 tasks, including 100 secret ones that make up the private test set. The competition aims to develop AI capable of solving these private tasks without prior knowledge, requiring advanced abstraction abilities. While humans typically solve 80% of ARC tasks, current algorithms only reach 30.5% - a world record achieved by aggregating existing algorithms designed to solve ARC. Learn more about ARC on our [ARC Page](https://lab42.global/arc/) and Chollet's 2019 paper [On the Measure of Intelligence](https://arxiv.org/abs/1911.01547v2)."

> "Arcathon allows the use of the code and data for Open Source Code under [Apache 2.0 license](https://opensource.org/licenses/Apache-2.0), and any purpose, licensed under [Apache 2.0](https://opensource.org/licenses/Apache-2.0)."

#### ARC Task
- An ARC task consists of several example pairs that demonstrate the transformation and, usually, one test pair that you must solve.
- Each pair consists of one input (the grid before the transformation) and one output (the grid after it).
- Each grid has a certain height and width, and each of its cells has one of ten colors.
- Your task is to work out, from the examples, how to transform the input into the output.
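
To make the task layout concrete, here is a minimal sketch of a toy task in the same JSON layout the processing code in this notebook reads (`train`, `test`, `input`, `output`). The grids and the mirroring rule are made up for illustration and do not come from the corpus; `mirror_left_right` and `fits_all_examples` are hypothetical helper names.

```python
# Toy task in the ARC JSON layout (hypothetical data, not from the corpus).
# Each grid is a list of rows; each cell is an integer 0-9 naming a color.
toy_task = {
    "train": [
        {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
        {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]},
    ],
    "test": [
        {"input": [[5, 0, 6]], "output": [[6, 0, 5]]},
    ],
}


def mirror_left_right(grid):
    """Candidate rule: reverse every row of the grid."""
    return [list(reversed(row)) for row in grid]


def fits_all_examples(rule, task):
    """Return True if the rule reproduces every training output."""
    return all(rule(pair["input"]) == pair["output"] for pair in task["train"])


# Apply the rule to the test input only if it explains all training pairs.
if fits_all_examples(mirror_left_right, toy_task):
    prediction = mirror_left_right(toy_task["test"][0]["input"])
    print(prediction == toy_task["test"][0]["output"])
```

Real ARC tasks rarely yield to a single hand-written rule like this; the point is only the shape of the data and the "infer from examples, then apply to the test input" workflow.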

#### ARC Data
Each task is a `.json` file. The zip archive contains:
  - 400 training tasks 
  - 400 evaluation tasks
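
As a rough sketch of how the task files could be inspected, the helper below counts the train and test pairs in every `.json` file of a directory. It assumes each file has top-level `train` and `test` lists, as the processing code in this notebook does; `summarize_tasks` is a hypothetical name, and the demo runs on a temporary directory with one made-up task rather than on the real archive.

```python
import glob
import json
import os
import tempfile


def summarize_tasks(directory):
    """Map each task file name to (number of train pairs, number of test pairs)."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path, "r") as f:
            task = json.load(f)
        summary[os.path.basename(path)] = (len(task["train"]), len(task["test"]))
    return summary


# Demo on a throwaway directory holding one hypothetical task file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_task = {
        "train": [{"input": [[0, 1]], "output": [[1, 0]]}],
        "test": [{"input": [[2, 3]], "output": [[3, 2]]}],
    }
    with open(os.path.join(demo_dir, "demo.json"), "w") as f:
        json.dump(demo_task, f)
    demo_summary = summarize_tasks(demo_dir)

print(demo_summary)
```

After the archive is extracted, the same function could be pointed at `./training/` or `./evaluation/`.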

#### Evals Approach

The goal of this notebook is to evaluate a model's abstract reasoning abilities using the ARC task data.

In [None]:
# Install Evals if you haven't already
# %pip install -e ../.

In [None]:
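!curl -O https://lab42.global/wp-content/uploads/2022/08/ARC-800-tasks.zip
# bsdtar (the default tar on macOS and Windows) can extract zip archives;
# GNU tar cannot, so on most Linux systems use `unzip ARC-800-tasks.zip` instead
!tar -xf ARC-800-tasks.zip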

In [None]:
import os
import json
import glob

In [None]:
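def convert_to_chatml(data, cutoff_max_tokens_sample):
    """
    Converts one ARC task (the dict loaded from a task's JSON file) into a
    list holding at most one ChatML-style sample. Only the first test pair
    is used. The list is empty if the sample exceeds the cutoff.
    """
    chatml_data = []
    # Strip spaces and newlines from the training pairs, the test input, and the test output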
    reference = str(data["train"]).replace(" ", "").replace("\n", "")
    inputmatrix = str(data["test"][0]["input"]).replace(" ", "").replace("\n", "")
    outputmatrix = str(data["test"][0]["output"]).replace(" ", "").replace("\n", "")
    user_query = f"Your task: generate a new output grid for the following unseen input grid, applying the same pattern shared among the training pairs. Input: { inputmatrix } Generate only the 2D output grid in a cleansed format of \"[[],[],...,[]]\" (no spaces like \" \" and no \"\\n\"). Output:"

    chatml_example = {
        "input": [
            {"role": "system", "content": f"You're AbstractReasonGPT, an expert at analyzing pairs of 2D grids that are filled with integers, each representing a different color. You're the best in the world at finding the patterns and generating outputs for new inputs that apply the same pattern. An abstract, conceptual pattern exists between all of the following pairs. {reference}"},
            {"role": "user", "content": user_query}
        ],
        "ideal": str(outputmatrix)
    }
    # Skip the sample if it is too long. Note that this compares the
    # character length of the sample, not its token count.
    if len(str(chatml_example)) > cutoff_max_tokens_sample:
        return chatml_data
    chatml_data.append(chatml_example)
    return chatml_data

def save_chatml_to_file(chatml_data, filename):
    """
    Saves the given chatml data to the given file.
    """
    with open(filename, 'w') as f:
        for example in chatml_data:
            f.write(json.dumps(example) + '\n')

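def read_json_files(directory, cutoff_max_tokens_sample):
    """
    Reads all JSON task files in the given directory, converts each one with
    convert_to_chatml, and returns the resulting samples as a list.
    """
    chatml_data = []
    # Sort the paths so the output order is reproducible across runs
    for file in sorted(glob.glob(os.path.join(directory, '*.json'))):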
        with open(file, 'r') as f:
            data = json.load(f)
            chatml_data.extend(convert_to_chatml(data, cutoff_max_tokens_sample))
    return chatml_data
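
The grid serialization used in `convert_to_chatml` can be looked at in isolation. This small sketch (with a made-up 2x2 grid) shows the compact string it produces and one way a reply in that format could be parsed back into a grid; parsing with `json.loads` is an assumption about how you might post-process model output, not something the notebook itself does.

```python
import json

# Hypothetical 2x2 grid of color indices.
grid = [[0, 1], [2, 3]]

# Same compaction as in convert_to_chatml: Python's list repr without spaces.
compact = str(grid).replace(" ", "").replace("\n", "")

# Because the cells are plain integers, the compact string is also valid JSON,
# so a reply in this format can be parsed back into a nested list.
restored = json.loads(compact)

# len() counts characters, which is what the cutoff check compares against.
print(compact, len(compact), restored == grid)
```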

In [None]:
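# Skip samples that are too long to avoid context-length errors; this shrinks the data set.
# Despite its name, the cutoff is compared against the sample's character length, not its token count.
# Models with larger context windows, such as GPT-4, allow a much higher cutoff.
cutoff_max_tokens_sample = 2048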

# Run the data processing pipelines
# Training data set
training_output_file = "../evals/registry/data/abstract_reasoning_arc/training.jsonl"
training_directory = './training/'
training_chatml_data = read_json_files(training_directory, cutoff_max_tokens_sample)
save_chatml_to_file(training_chatml_data, training_output_file)

# Evaluation data set
evaluation_output_file = "../evals/registry/data/abstract_reasoning_arc/evaluation.jsonl"
evaluation_directory = './evaluation/'
evaluation_chatml_data = read_json_files(evaluation_directory, cutoff_max_tokens_sample)
save_chatml_to_file(evaluation_chatml_data, evaluation_output_file)
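
Before running the eval, it may help to sanity-check the generated files. The sketch below assumes the sample layout produced above (an `input` list of messages and an `ideal` string per line); `check_jsonl` is a hypothetical helper, and the demo uses a temporary file with one made-up sample instead of the real output paths.

```python
import json
import os
import tempfile


def check_jsonl(path):
    """Return the number of samples; raise ValueError on a malformed record."""
    count = 0
    with open(path, "r") as f:
        for line_number, line in enumerate(f, start=1):
            sample = json.loads(line)
            has_input = isinstance(sample.get("input"), list)
            has_ideal = isinstance(sample.get("ideal"), str)
            if not (has_input and has_ideal):
                raise ValueError(
                    f"line {line_number}: expected an 'input' list and an 'ideal' string"
                )
            count += 1
    return count


# Demo on a temporary file with one hypothetical sample.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "demo.jsonl")
    demo_sample = {
        "input": [{"role": "user", "content": "Input: [[1,0]] Output:"}],
        "ideal": "[[0,1]]",
    }
    with open(demo_path, "w") as f:
        f.write(json.dumps(demo_sample) + "\n")
    demo_count = check_jsonl(demo_path)

print(demo_count)
```

The same function could then be called on `training_output_file` and `evaluation_output_file` to see how many samples survived the length cutoff.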

In [None]:
# Run the evaluation with gpt-3.5-turbo on the generated data
# make sure you've set os.environ["OPENAI_API_KEY"] to your API key
!oaieval gpt-3.5-turbo abstract-reasoning-arc --max_samples 10