Code that calls the OpenAI API to solve visual puzzles (ConceptARC problems).

https://github.com/victorvikram/ConceptARC/tree/main/MinimalTasks


In [1]:
from openai import OpenAI
import dotenv
import os
from rich import print as rprint # for making fancy outputs

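dotenv.load_dotenv()  # load variables such as OPENAI_API_KEY from a local .env file, if present

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Kept as a plain string for the raw HTTP request to the vision endpoint further down
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")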

Call the ConceptARC solver. Start by writing a system prompt and a user prompt that embeds the puzzle JSON.

In [2]:
system_prompt = "You are an intelligent solver of puzzles."
user_prompt = 'your job is to solve a puzzle. These are puzzles in json format. I have some training examples and a test example in ConceptARC. Please solve it. Here is the puzzle:  {"train":[{"input":[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,4,0,0,0,0,0],[0,0,0,4,4,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,3,3,3,0,0,0,0],[0,0,0,3,3,3,0,0,0,0],[0,0,0,3,3,3,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],"output":[[0,4,0],[4,4,4]]},{"input":[[0,0,3,3,0,0,0,0],[0,0,3,3,0,0,0,0],[0,0,0,0,0,0,0,0],[0,4,0,4,0,0,0,0],[0,4,4,4,0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0]],"output":[[3,3],[3,3]]},{"input":[[0,0,0,0,0,0,0,0,0],[0,0,0,3,3,3,3,3,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,4,4,4,4,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0]],"output":[[3,3,3,3,3]]},{"input":[[0,0,0,0,0,0,0,0],[0,0,0,4,0,0,0,0],[0,0,4,0,4,0,0,0],[0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0],[0,0,3,3,3,3,0,0],[0,0,3,0,0,3,0,0],[0,0,0,3,3,0,0,0]],"output":[[0,4,0],[4,0,4],[0,4,0]]}],"test":[{"input":[[4,4,4,0,0,0],[4,4,4,0,0,0],[0,0,0,0,0,0],[0,0,3,0,0,0],[3,3,3,3,0,0],[0,0,0,0,0,0]],"output":[[4,4,4],[4,4,4]]},{"input":[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[4,4,4,4,4,4,4,4,4,4],[3,3,3,3,3,3,3,3,3,3],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],"output":[[4,4,4,4,4,4,4,4,4,4]]},{"input":[[0,0,0,0,0],[0,3,3,3,0],[0,3,0,3,0],[0,3,3,3,0],[0,4,0,4,0],[4,0,4,0,4]],"output":[[3,3,3],[3,0,3],[3,3,3]]}]}'
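Note that the puzzle JSON embedded in the prompt above includes the `"output"` grid for every `"test"` item, so the model can read the expected answers directly. The following is a minimal sketch, not the notebook's own method, of building the prompt from a task dictionary with the test outputs removed. The helper name `build_user_prompt` and the tiny `example_task` are hypothetical and exist only for illustration.

```python
import copy
import json


def build_user_prompt(task: dict) -> str:
    """Return a prompt embedding the task JSON with test outputs removed."""
    masked = copy.deepcopy(task)  # leave the caller's task untouched
    for pair in masked.get("test", []):
        pair.pop("output", None)  # hide the expected answer from the model
    return (
        "Your job is to solve a puzzle given in JSON format. "
        "The 'train' entries are input/output examples; produce the output "
        "grid for each 'test' input. Here is the puzzle: "
        + json.dumps(masked, separators=(",", ":"))
    )


# Tiny made-up task (hypothetical), only to show the shape of the data
example_task = {
    "train": [{"input": [[0, 4], [3, 0]], "output": [[4]]}],
    "test": [{"input": [[4, 0], [0, 3]], "output": [[4]]}],
}
example_prompt = build_user_prompt(example_task)
```

Keeping the original task dictionary intact means the hidden test outputs are still available later for scoring the model's answer.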


Solve the problem now.

In [5]:
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role":"system", "content": system_prompt},
        {"role": "user",  "content": user_prompt}
    ],
    temperature=1.0,
    max_tokens=1000,
)

print(response.choices[0].message.content) # print the response from the model


To solve the provided test puzzles based on the training data, we need to analyze the patterns and outputs from the training examples and apply that understanding to the test cases.

### Analysis of the Training Data

1. **Training Example 1:**
   - Input:
     ```
     [
       [0,0,0,0,0,0,0,0,0,0],
       [0,0,0,0,4,0,0,0,0,0],
       [0,0,0,4,4,4,0,0,0,0],
       ...
     ]
     ```
   - Output:
     ```
     [
       [0,4,0],
       [4,4,4]
     ]
     ```
   - The output extracted regions with values of `4` and follows their arrangement.

2. **Training Example 2:**
   - Input:
     ```
     [
       [0,0,3,3,0,0,0,0],
       ...
     ]
     ```
   - Output:
     ```
     [
       [3,3],
       [3,3]
     ]
     ```
   - The value `3` is centralized in a rectangular arrangement.

3. **Training Example 3:**
   - Input:
     ```
     [
       [0,0,0,0,0,0,...]
       ...
     ]
     ```
   - Output:
     ```
     [
       [3,3,3,3,3]
     ]
     ```
   - Shows a horizontal line of `
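
The output above stops mid-sentence, possibly because the reply reached the `max_tokens=1000` limit; in the Chat Completions API, `response.choices[0].finish_reason` is `"length"` when that happens. Since the reply is free-form text, a scoring step also needs to pull grids out of it. Below is a rough sketch of one way to do that, assuming the model writes grids as JSON lists of lists. The helpers `_is_grid` and `extract_grids` and the `sample_reply` string are hypothetical.

```python
import json


def _is_grid(value) -> bool:
    """True if value is a non-empty list of non-empty lists of ints."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(row, list)
            and len(row) > 0
            and all(type(cell) is int for cell in row)
            for row in value
        )
    )


def extract_grids(text: str) -> list:
    """Return every complete JSON grid found in text, in order of appearance."""
    decoder = json.JSONDecoder()
    grids, i = [], 0
    while True:
        i = text.find("[", i)
        if i == -1:
            break
        try:
            value, end = decoder.raw_decode(text, i)
        except json.JSONDecodeError:
            i += 1  # not valid JSON from here (e.g. contains "..."), move on
            continue
        if _is_grid(value):
            grids.append(value)
            i = end  # skip past the grid just consumed
        else:
            i += 1
    return grids


# Made-up reply: one complete grid and one abbreviated with "..."
sample_reply = "Answer:\n[[4,4,4],[4,4,4]]\nInput was: [\n [0,0],\n ...\n]"
grids = extract_grids(sample_reply)
```

Abbreviated grids containing `...` are skipped rather than guessed at, so only grids the model wrote out in full are returned for comparison against the expected output.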

Now use a vision model to solve a ConceptARC puzzle.


In [4]:
# Adjacent string literals are concatenated into a single string.
# Separating them with commas would create a tuple, but the request's "text" field needs one string.
prompt = (
    "You are an intelligent solver of puzzles. "
    "Your job is to solve a puzzle. The puzzle is given to you as an image, "
    "and you need to provide the solution in JSON format. "
    "Please provide the solution in a clear and concise manner. "
    "Make sure to include all necessary details in the solution. "
    "Feel free to make any inferences you need to."
)





Now call the OpenAI vision model API.

In [None]:
import base64
import requests
import json

Function to encode the image as a base64 string.

In [5]:
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return encoded_string
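
The base64 string is later wrapped in a `data:` URL whose MIME type should match the file. As a small sketch, the standard-library `mimetypes` module can infer the type from the file extension so it does not have to be hard-coded. The helper name `to_data_url` is hypothetical and not part of the notebook.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png"
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Not a recognised image file: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

For example, `to_data_url("arcplots/puzzle.png")` would produce a URL beginning with `data:image/png;base64,`, assuming that file exists.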



In [None]:
# Path to image file
image_path = "arcplots/puzzle.png"

def get_image_caption(image_path, prompt):
  # Getting the base64 string
  base64_image = encode_image(image_path)

  headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
  }

  payload = {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": prompt
          },
          {
            "type": "image_url",
            "image_url": {
              # puzzle.png is a PNG, so declare image/png rather than image/jpeg
              "url": f"data:image/png;base64,{base64_image}"
            }
          }
        ]
      }
    ],
    "max_tokens": 512
  }

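  response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload, timeout=60)
  response.raise_for_status()  # surface HTTP errors (e.g. 401, 429) instead of a KeyError below

  return response.json()['choices'][0]['message']['content']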
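
The function above is defined but not yet called in this notebook. As an alternative to the raw HTTP request, the same message structure can be passed to the `client.chat.completions.create` call already used for the text-only puzzle. The sketch below assumes that approach; `build_vision_messages` and `solve_from_image` are hypothetical helper names, and the second one is only defined here, not executed.

```python
def build_vision_messages(prompt: str, data_url: str) -> list:
    """Build a single user message holding the text prompt and the image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


def solve_from_image(client, prompt: str, data_url: str, model: str = "gpt-4o-mini") -> str:
    """Send the prompt and image through an OpenAI client and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=build_vision_messages(prompt, data_url),
        max_tokens=512,  # same limit as the requests-based payload
    )
    return response.choices[0].message.content
```

A call would look like `solve_from_image(client, prompt, data_url)`, where `data_url` is a `data:image/png;base64,...` string built from the encoded puzzle image and `prompt` is a single string.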


