The winner notebook from the previous competition:

https://www.kaggle.com/code/lewtun/numina-1st-place-solution/notebook

Changes:

* removed the code working with the previous competition specific env
* downloaded the *Numina* model in a separate notebook (aimo-2-numina-model) to run the code offline
* removed validation set for now
* decreased the value of **num_samples** from 48 to **19** to make the code run faster
* the temperature was increased from 0.8 to **0.9**


In [1]:
import os

import pandas as pd
import polars as pl

import kaggle_evaluation.aimo_2_inference_server

In [2]:
# If using pip
# !pip install vllm==0.4.2
# !pip install grpcio==1.62.2
# !pip install antlr4-python3-runtime==4.11.0
# !pip install networkx shapely sage matplotlib gmpy2 scipy numpy sympy mpmath

# If on Kaggle
!pip uninstall -y torch
!pip install -U --no-index --find-links=/kaggle/input/vllm-whl -U vllm
!pip install -U --upgrade /kaggle/input/vllm-t4-fix/grpcio-1.62.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
!pip install -U --upgrade /kaggle/input/vllm-t4-fix/ray-2.11.0-cp310-cp310-manylinux2014_x86_64.whl
!pip install -U --upgrade /kaggle/input/antlr4-python3-runtime-package-4-11/antlr4_python3_runtime-4.11.0-py3-none-any.whl

Found existing installation: torch 2.4.0
Uninstalling torch-2.4.0:
  Successfully uninstalled torch-2.4.0
Looking in links: /kaggle/input/vllm-whl
Processing /kaggle/input/vllm-whl/vllm-0.4.0.post1-cp310-cp310-manylinux1_x86_64.whl
Processing /kaggle/input/vllm-whl/cmake-3.29.0.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (from vllm)
Processing /kaggle/input/vllm-whl/torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl (from vllm)
Processing /kaggle/input/vllm-whl/xformers-0.0.23.post1-cp310-cp310-manylinux2014_x86_64.whl (from vllm)
Processing /kaggle/input/vllm-whl/pynvml-11.5.0-py3-none-any.whl (from vllm)
Processing /kaggle/input/vllm-whl/triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (from vllm)
Processing /kaggle/input/vllm-whl/outlines-0.0.34-py3-none-any.whl (from vllm)
Processing /kaggle/input/vllm-whl/tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (from vllm)
Processing /kaggle/input/vllm-whl/interegular-0.3.3-py37-n

In [3]:
import os
import re
import signal
import subprocess
import tempfile
from collections import Counter
from contextlib import contextmanager
from dataclasses import dataclass

import pandas as pd
import polars as pl
from datasets import load_dataset, Dataset, concatenate_datasets
import torch
from transformers import set_seed
from tqdm import tqdm
from vllm import LLM, SamplingParams

2024-10-25 08:37:59,135	INFO util.py:124 -- Outdated packages:
  ipywidgets==7.7.1 found, needs ipywidgets>=8
Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.


### Configuration

We found it useful to define a single Config class that gathers all the setting used for a single submission:

In [13]:
@dataclass
class Config:
    model_id: str

    # Decoding Parameters
    num_samples: int        # Number of candidates to generate (width)
    num_generations: int    # Number of steps to generate per candidate (depth)
    restart_on_fail: bool   # Regenerate a step if it fails to generate Python codeblocks

    # Sampling Parameters
    temperature: float
    max_new_tokens: int

    # Runtime Parameters
    # validation_set: str  # One of AI-MO/aimo-validation-amc, AI-MO/aimo-validation-aime, AI-MO/aimo-validation-math-level-4, AI-MO/aimo-validation-math-level-5
    is_submission: bool  # bool(os.getenv("KAGGLE_IS_COMPETITION_RERUN"))
    dtype : str

### vLLM and model generation utilities

In [5]:
def build_vllm(config):
    num_gpus = torch.cuda.device_count()
    if "awq" in config.model_id.lower():
        quantization = "AWQ"
    elif "gptq" in config.model_id.lower():
        quantization = "gptq"
    else:
        quantization = None
    vllm = LLM(
        model=config.model_id,
        tensor_parallel_size=num_gpus,
        quantization=quantization,
        swap_space=0,
    )
    return vllm


def apply_template(sample, tokenizer, prompt):
    messages = [{"role": "user", "content": prompt.format(sample["prompt"], "{}")}]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    sample["text"] = text
    return sample


def generate_batched(samples, vllm, sampling_params):
    outputs = vllm.generate(samples["gen_texts"], sampling_params, use_tqdm=True)
    samples["gen_texts"] = [o.prompt + o.outputs[0].text for o in outputs]
    return samples

### Python REPL and code execution utilities

In [6]:
class PythonREPL:
    def __init__(self, timeout=5):
        self.timeout = timeout

    @contextmanager
    def time_limit(self, seconds):
        def signal_handler(*_):
            raise TimeoutError(f"Timed out after {seconds} seconds.")

        signal.signal(signal.SIGALRM, signal_handler)
        signal.alarm(seconds)
        try:
            yield
        finally:
            signal.alarm(0)

    def __call__(self, query):
        query = "import math\nimport numpy as np\nimport sympy as sp\n" + query
        query = query.strip().split("\n")
        if "print(" not in query[-1]:
            if "#" in query[-1]:
                query[-1] = query[-1].split("#")[0]
            query[-1] = "print(" + query[-1] + ")"
        query = "\n".join(query)
        with tempfile.TemporaryDirectory() as temp_dir:
            temp_file_path = os.path.join(temp_dir, "tmp.py")
            with open(temp_file_path, "w", encoding="utf-8") as f:
                f.write(query)
            with self.time_limit(self.timeout):
                result = subprocess.run(
                    ["python3", temp_file_path],
                    capture_output=True,
                    check=False,
                    text=True,
                    timeout=self.timeout,
                )
                if result.returncode == 0:
                    output = result.stdout
                    return True, output.strip()
                error_msg = result.stderr.strip()
                msgs = error_msg.split("\n")
                new_msgs = []
                want_next = False
                for m in msgs:
                    if "Traceback" in m:
                        new_msgs.append(m)
                    elif m == msgs[-1]:
                        new_msgs.append(m)
                    elif temp_file_path in m:
                        st = m.index('"/') + 1 if '"/' in m else 0
                        ed = m.index(temp_file_path) + 1 if temp_file_path in m else None
                        clr = m[st:ed] if not ed else m[st:]
                        m = m.replace(clr, "")
                        new_msgs.append(m)
                        want_next = True
                    elif want_next:
                        new_msgs.append(m)
                        want_next = False
                error_msg = "\n".join(new_msgs)
                return False, error_msg.strip()
            

def execute_completion(executor, completion, return_status, last_code_block):
    executions = re.findall(r"```python(.*?)```", completion, re.DOTALL)
    if len(executions) == 0:
        return completion, False if return_status else completion
    if last_code_block:
        executions = [executions[-1]]
    outputs = []
    successes = []
    for code in executions:
        success = False
        for lib in ("subprocess", "venv"):
            if lib in code:
                output = f"{lib} is not allowed"
                outputs.append(output)
                successes.append(success)
                continue
        try:
            success, output = executor(code)
        except TimeoutError as e:
            print("Code timed out")
            output = e
        if not success and not return_status:
            output = ""
        outputs.append(output)
        successes.append(success)
    output = str(outputs[-1]).strip()
    success = successes[-1]
    if return_status:
        return output, success
    return output


def postprocess_completion(text, return_status, last_code_block):
    executor = PythonREPL()
    result = execute_completion(executor, text, return_status=return_status, last_code_block=last_code_block)
    del executor
    return result

### Post-processing and solution extraction utilities

In [7]:
def extract_boxed_answer(text):
    def last_boxed_only_string(text):
        idx = text.rfind("\\boxed")
        if idx < 0:
            idx = text.rfind("\\fbox")
            if idx < 0:
                return None
        i = idx
        right_brace_idx = None
        num_left_braces_open = 0
        while i < len(text):
            if text[i] == "{":
                num_left_braces_open += 1
            if text[i] == "}":
                num_left_braces_open -= 1
                if num_left_braces_open == 0:
                    right_brace_idx = i
                    break
            i += 1
        if right_brace_idx is None:
            return None
        return text[idx : right_brace_idx + 1]

    def remove_boxed(boxed):
        left = "\\boxed{"
        try:
            assert boxed[: len(left)] == left
            assert boxed[-1] == "}"
            length = len(left)
            return boxed[length:-1]
        except Exception:
            return None

    boxed = last_boxed_only_string(text)
    if boxed is None:
        return None
    answer = remove_boxed(boxed)
    return answer


def normalize_answer(answer):
    match = re.search(r"(.*?)Problem:", answer, flags=re.S)
    if match:
        answer = match.group(1)
    subs = [("an ", ""), ("a ", ""), (".$", "$"), ("\\$", ""), (r"\ ", ""), (" ", ""), ("mbox", "text"), (",\\text{and}", ","), ("\\text{and}", ","), ("\\text{m}", "\\text{}"), ("\\le", "<")]
    remove = ["square", "ways", "integers", "dollars", "mph", "inches", "ft", "hours", "km", "units", "\\ldots", "sue", "points", "feet", "minutes", "digits", "cents", "degrees", "cm", "gm", "pounds", "meters", "meals", "edges", "students", "childrentickets", "multiples", "\\text{s}", "\\text{.}", "\\text{\ns}", "\\text{}^2", "\\text{}^3", "\\text{\n}", "\\text{}", r"\mathrm{th}", r"^\circ", r"^{\circ}", r"\;", r",\!", "{,}", '"', "\\dots", "\n", "\r", "\f", "\%"]
    sub_patterns = [r"(\\text\{)(.*?)(\})", r"(\\textbf\{)(.*?)(\})", r"(\\overline\{)(.*?)(\})", r"(\\boxed\{)(.*)(\})"]
    split_patterns = [r"finalansweris(.*)", r"answer?is:?(.*)", r"oxed\{(.*?)\}", r"\$(.*?)\$"]
    for before, after in subs:
        answer = answer.replace(before, after)
    for expr in remove:
        answer = answer.replace(expr, "")
    for pattern in sub_patterns:
        answer = re.sub(pattern, "\\2", answer)
    for pattern in split_patterns:
        if len(re.findall(pattern, answer)) > 0:
            answer = re.findall(pattern, answer)[-1]
    answer = answer.strip()
    if "rac" in answer and "\\frac" not in answer:
        answer = answer.replace("rac", "\\frac")
    answer = re.sub(r"(frac)([^{])(.)", "frac{\\2}{\\3}", answer)
    answer = re.sub(r"(sqrt)([^{])", "sqrt{\\2}", answer)
    answer = answer.replace("$", "")
    if answer.replace(",", "").isdigit():
        answer = answer.replace(",", "")
    return answer

### SC-TIR control flow

In [8]:
def process_code(sample, restart_on_fail, last_step, check_last_n_chars=100):
    gen_text = sample["gen_texts"]
    num_python_blocks = len(re.findall(r"```python(.*?)```", gen_text, re.DOTALL))
    region_to_check = gen_text[-check_last_n_chars:]
    if num_python_blocks == 0:
        if restart_on_fail:
            print("no code has ever been generated, RESTARTING")
            sample["gen_texts"] = sample["text"]
        else:
            print("no code has ever been generated, STOP")
            sample["should_prune"] = True
            sample["has_code"] = False
        return sample
    if not gen_text.endswith("```output\n") and ("answer is" in region_to_check or "\\boxed" in region_to_check):
        num_output_blocks = len(re.findall(r"```output(.*?)```", gen_text, re.DOTALL))
        if num_output_blocks == 0:
            print("The model hallucinated the code answer")
            sample["should_prune"] = True
            return sample
        if "boxed" in region_to_check:
            try:
                answer = normalize_answer(extract_boxed_answer(region_to_check))
            except Exception:
                answer = "-1"
        else:
            answer = normalize_answer(region_to_check)
        sample["model_answers"] = answer
        return sample
    if last_step:
        return sample
    if not gen_text.endswith("```output\n"):
        print("warning: output block not found: ", gen_text[-40:])
        if restart_on_fail:
            sample["gen_texts"] = sample["text"]
        else:
            sample["should_prune"] = True
        return sample
    code_result, _ = postprocess_completion(gen_text, return_status=True, last_code_block=True)
    truncation_limit = 200
    if len(code_result) > truncation_limit:
        code_result = code_result[:truncation_limit] + " ... (output truncated)"
    sample["gen_texts"] = gen_text + f"{code_result}\n```"
    return sample

### Sample filtering and majority voting

In [9]:
def filter_answers(answers):
    def validate_answer_is_numeric(x, tolerance=0.2):
        try:
            x = round(float(x))
            f = float(x)
            if abs(x - f) > tolerance:
                x = -1
        except Exception:
            x = -1
        return x

    formatted = [validate_answer_is_numeric(a) for a in answers]
    filtered = [a for a in formatted if a >= 0]
    return filtered


def get_majority_vote(answers):
    if not len(answers):
        return 0
    c = Counter(answers)
    value, _ = c.most_common()[0]
    return value

### Specify config

In [14]:
config = Config(
    # model_id="AI-MO/NuminaMath-7B-TIR-GPTQ",
    model_id="Qwen/Qwen2.5-Math-7B-Instruct",
    num_samples=19,  # 48,
    num_generations=4,
    restart_on_fail=True,
    temperature=0.9,
    max_new_tokens=2048,
    # validation_set="AI-MO/aimo-validation-amc",
    is_submission=False,
    dtype="half"
)

In [15]:
print(f"=== Running submission with config ===\n\n{config}")

=== Running submission with config ===

Config(model_id='Qwen/Qwen2.5-Math-7B-Instruct', num_samples=19, num_generations=4, restart_on_fail=True, temperature=0.9, max_new_tokens=2048, is_submission=False, dtype='half')


### Run computations

In [12]:
set_seed(42)
num_procs = os.cpu_count()
vllm = build_vllm(config)
sampling_params = SamplingParams(
    temperature=config.temperature,
    max_tokens=config.max_new_tokens,
    stop=["```output\n"],
    include_stop_str_in_output=True,
)

config.json:   0%|          | 0.00/658 [00:00<?, ?B/s]

  self.pid = _posixsubprocess.fork_exec(
  self.pid = _posixsubprocess.fork_exec(
2024-10-25 08:38:51,890	INFO worker.py:1749 -- Started a local Ray instance.


INFO 10-25 08:38:53 llm_engine.py:74] Initializing an LLM engine (v0.4.0.post1) with config: model='Qwen/Qwen2.5-Math-7B-Instruct', tokenizer='Qwen/Qwen2.5-Math-7B-Instruct', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)


tokenizer_config.json:   0%|          | 0.00/7.32k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.03M [00:00<?, ?B/s]

INFO 10-25 08:39:03 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
INFO 10-25 08:39:03 selector.py:25] Using XFormers backend.
[36m(RayWorkerVllm pid=409)[0m INFO 10-25 08:39:04 selector.py:40] Cannot use FlashAttention backend for Volta and Turing GPUs.
[36m(RayWorkerVllm pid=409)[0m INFO 10-25 08:39:04 selector.py:25] Using XFormers backend.
[36m(RayWorkerVllm pid=409)[0m ERROR 10-25 08:39:04 ray_utils.py:44] Error executing method init_device. This might cause deadlock in distributed execution.
[36m(RayWorkerVllm pid=409)[0m ERROR 10-25 08:39:04 ray_utils.py:44] Traceback (most recent call last):
[36m(RayWorkerVllm pid=409)[0m ERROR 10-25 08:39:04 ray_utils.py:44]   File "/opt/conda/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
[36m(RayWorkerVllm pid=409)[0m ERROR 10-25 08:39:04 ray_utils.py:44]     return executor(*args, **kwargs)
[36m(RayWorkerVllm pid=409)[0m ERROR 10-25 08:39:04 ray_utils.py:44]   F

ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead by explicitly setting the`dtype` flag in CLI, for example: --dtype=half.

2024-10-25 08:39:10,357	ERROR worker.py:406 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): [36mray::RayWorkerVllm.execute_method()[39m (pid=409, ip=172.19.2.2, actor_id=a9cc01cbbff0a1a1b970bf0e01000000, repr=<vllm.engine.ray_utils.RayWorkerVllm object at 0x7f4799044910>)
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 45, in execute_method
    raise e
  File "/opt/conda/lib/python3.10/site-packages/vllm/engine/ray_utils.py", line 37, in execute_method
    return executor(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 93, in init_device
    _check_if_gpu_supports_dtype(self.model_config.dtype)
  File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 312, in _check_if_gpu_supports_dtype
    raise ValueError(
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla T4 GPU has compute capability 7.5. You can use float16 instead 

In [13]:
def solve_problem(question: str):
    assert type(question) is str
    problem = apply_template(
        {"prompt": question},
        tokenizer=vllm.get_tokenizer(),
        prompt="{}"
    )
    samples = Dataset.from_list([
        {
            "text": problem["text"],
            "gen_texts": problem["text"],
            "should_prune": False,
            "model_answers": "-1",
            "has_code": True,
        }
        for _ in range(config.num_samples)
    ])
    completed = []
    for step in range(config.num_generations):
        samples = samples.map(
            generate_batched,
            batch_size=128,
            batched=True,
            fn_kwargs={
                "vllm": vllm,
                "sampling_params": sampling_params
            },
            load_from_cache_file=False,
        )
        samples = samples.map(
            process_code,
            num_proc=num_procs,
            load_from_cache_file=False,
            fn_kwargs={
                "restart_on_fail": config.restart_on_fail,
                "last_step": step == (config.num_generations - 1)
            },
        )
        done = samples.filter(
            lambda x: x["should_prune"] is True,
            load_from_cache_file=False
        )
        if len(done):
            completed.append(done)
        samples = samples.filter(
            lambda x: x["should_prune"] is False,
            load_from_cache_file=False
        )
    completed.append(samples)
    samples = concatenate_datasets(completed)
    candidates = samples["model_answers"]
    print(f"=== CANDIDATE ANSWERS ({len(candidates)}) ===\n{candidates}\n")
    filtered = filter_answers(candidates)
    print(f"=== FILTERED ANSWERS ({len(filtered)}) ===\n{filtered}\n")
    majority = get_majority_vote(filtered)
    print(f"=== MAJORITY ANSWER (mod 1000) ===\n{majority}\n")
    return majority

The evaluation API requires that you set up a server which will respond to inference requests. We have already defined the server; you just need write the `predict` function. When we evaluate your submission on the hidden test set the client defined in `aimo_2_gateway` will run in a different container with direct access to the hidden test set and hand off each question one at a time, in random order.

Your code will always have access to the published copies of the files.

In [None]:
# Replace this function with your inference code.
# The function should return a single integer between 0 and 999, inclusive.
# Each prediction (except the very first) must be returned within 30 minutes
# of the question being provided.

def predict(id_: pl.DataFrame, question: pl.DataFrame) -> pl.DataFrame | pd.DataFrame:
    """Make a prediction."""
    print('Types:', type(id_), type(question))
    id_str = id_.item(0)
    question_str = question.item(0)

    assert type(id_str) is str
    assert type(question_str) is str

    print('====================================================')
    print('QUESTION:', question_str)

    prediction = solve_problem(question_str)
    print('PREDICTION:', prediction)
    print('====================================================')

    return pl.DataFrame({'id': id_str, 'answer': prediction})

When your notebook is run on the hidden test set, `inference_server.serve()` must be called within 15 minutes of the notebook starting or the gateway will throw an error. If you need more than 15 minutes to load your model you can do so during the very first predict call, which does not have the usual 10 minute response deadline.

In [None]:
inference_server = kaggle_evaluation.aimo_2_inference_server.AIMO2InferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(
        (
            '/kaggle/input/ai-mathematical-olympiad-progress-prize-2/test.csv',
        )
    )

In [15]:
%%time

from sklearn.metrics import accuracy_score


if not config.is_submission:
    reference = pd.read_csv(
        '/kaggle/input/translated-test-df/translated_test_df.csv'
    )
    true_answers = []
    pred_answers = []

    for id_, row in reference.iterrows():
#         true_answers.append(row['answer'])
        id_no = row['ID']
        print('Solving for id : ', id_no)
        pred_answers.append(solve_problem(row['Problem']))
    
    #print('accuracy:', accuracy_score(true_answers, pred_answers))



Solving for id :  0


Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:15<04:37, 15.41s/it][A
Processed prompts:  11%|█         | 2/19 [00:16<01:54,  6.75s/it][A
Processed prompts:  21%|██        | 4/19 [00:16<00:41,  2.74s/it][A
Processed prompts:  26%|██▋       | 5/19 [00:17<00:29,  2.11s/it][A
Processed prompts:  32%|███▏      | 6/19 [00:17<00:20,  1.58s/it][A
Processed prompts:  37%|███▋      | 7/19 [00:18<00:15,  1.29s/it][A
Processed prompts:  42%|████▏     | 8/19 [00:18<00:11,  1.01s/it][A
Processed prompts:  47%|████▋     | 9/19 [00:19<00:07,  1.25it/s][A
Processed prompts:  53%|█████▎    | 10/19 [00:19<00:07,  1.28it/s][A
Processed prompts:  58%|█████▊    | 11/19 [00:20<00:05,  1.49it/s][A
Processed prompts:  63%|██████▎   | 12/19 [00:21<00:04,  1.46it/s][A
Processed prompts:  68%|██████▊   | 13/19 [00:21<00:03,  1.77it/s][A
Processed prompts:  74%|███████▎  | 14/19 [00:21<00:02,  2.20it/s][A
Processed prompts:  84%|████████▍ | 

Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]

  self.pid = os.fork()


Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:03<01:08,  3.82s/it][A
Processed prompts:  26%|██▋       | 5/19 [00:03<00:08,  1.67it/s][A
Processed prompts:  37%|███▋      | 7/19 [00:04<00:04,  2.44it/s][A
Processed prompts:  58%|█████▊    | 11/19 [00:04<00:01,  4.08it/s][A
Processed prompts:  68%|██████▊   | 13/19 [00:04<00:01,  4.51it/s][A
Processed prompts:  79%|███████▉  | 15/19 [00:05<00:00,  4.14it/s][A
Processed prompts:  95%|█████████▍| 18/19 [00:06<00:00,  2.92it/s][A
Processed prompts: 100%|██████████| 19/19 [00:07<00:00,  2.43it/s][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]



ber of pieces with each additional cut.

 the maximum number of separate regions.


Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:01<00:23,  1.29s/it][A
Processed prompts:  53%|█████▎    | 10/19 [00:01<00:01,  7.02it/s][A
Processed prompts:  68%|██████▊   | 13/19 [00:02<00:01,  4.94it/s][A
Processed prompts:  74%|███████▎  | 14/19 [00:03<00:01,  3.09it/s][A
Processed prompts:  84%|████████▍ | 16/19 [00:08<00:02,  1.23it/s][A
Processed prompts:  89%|████████▉ | 17/19 [00:10<00:02,  1.02s/it][A
Processed prompts:  95%|█████████▍| 18/19 [00:10<00:00,  1.08it/s][A
Processed prompts: 100%|██████████| 19/19 [00:13<00:00,  1.46it/s][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]


\[ P = 2^2 - 2 + 1 = 4 - 2 + 1 = 3 \]

 
ally, can create a maximum of 3 pieces.



Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:01<00:23,  1.29s/it][A
Processed prompts:  47%|████▋     | 9/19 [00:01<00:01,  5.78it/s][A
Processed prompts:  68%|██████▊   | 13/19 [00:02<00:01,  4.96it/s][A
Processed prompts:  89%|████████▉ | 17/19 [00:07<00:01,  1.89it/s][A
Processed prompts:  95%|█████████▍| 18/19 [00:08<00:00,  1.70it/s][A
Processed prompts: 100%|██████████| 19/19 [00:08<00:00,  2.11it/s][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

=== CANDIDATE ANSWERS (19) ===
['3', '4', '3', '3', '3', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '3', '4']

=== FILTERED ANSWERS (19) ===
[3, 4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4]

=== MAJORITY ANSWER (mod 1000) ===
4

Solving for id :  1


Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:29<08:42, 29.00s/it][A
Processed prompts:  11%|█         | 2/19 [00:29<03:24, 12.04s/it][A
Processed prompts:  16%|█▌        | 3/19 [00:29<01:47,  6.74s/it][A
Processed prompts:  21%|██        | 4/19 [00:30<01:07,  4.52s/it][A
Processed prompts:  26%|██▋       | 5/19 [00:33<00:52,  3.77s/it][A
Processed prompts:  32%|███▏      | 6/19 [00:33<00:35,  2.73s/it][A
Processed prompts:  42%|████▏     | 8/19 [00:36<00:21,  1.91s/it][A
Processed prompts:  53%|█████▎    | 10/19 [00:36<00:11,  1.23s/it][A
Processed prompts:  58%|█████▊    | 11/19 [00:37<00:09,  1.21s/it][A
Processed prompts:  68%|██████▊   | 13/19 [00:37<00:04,  1.27it/s][A
Processed prompts:  74%|███████▎  | 14/19 [00:38<00:03,  1.26it/s][A
Processed prompts:  84%|████████▍ | 16/19 [00:39<00:01,  1.85it/s][A
Processed prompts:  89%|████████▉ | 17/19 [00:39<00:00,  2.12it/s][A
Processed prompts:  95%|█████████▍|

Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]

*** SIGTERM received at time=1729845070 on cpu 3 ***
*** SIGTERM received at time=1729845070 on cpu 1 ***
PC: @     0x7cfcd7bdfd0f  (unknown)  sem_trywait
    @     0x7cfcd7b85520  (unknown)  (unknown)
    @ ... and at least 1 more frames
[2024-10-25 08:31:10,939 E 813 30] logging.cc:365: *** SIGTERM received at time=1729845070 on cpu 1 ***
[2024-10-25 08:31:10,939 E 813 30] logging.cc:365: PC: @     0x7cfcd7bdfd0f  (unknown)  sem_trywait
[2024-10-25 08:31:10,940 E 813 30] logging.cc:365:     @     0x7cfcd7b85520  (unknown)  (unknown)
[2024-10-25 08:31:10,940 E 813 30] logging.cc:365:     @ ... and at least 1 more frames
PC: @     0x57c4cc7080ab  (unknown)  _PyObject_GetMethod
    @     0x7cfcd7b85520  (unknown)  (unknown)
    @     0x57c4ce3f3ac0  (unknown)  (unknown)
[2024-10-25 08:31:10,954 E 812 30] logging.cc:365: *** SIGTERM received at time=1729845070 on cpu 3 ***
[2024-10-25 08:31:10,954 E 812 30] logging.cc:365: PC: @     0x57c4cc7080ab  (unknown)  _PyObject_GetMethod
[2024-10

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:05<01:38,  5.46s/it][A
Processed prompts:  16%|█▌        | 3/19 [00:05<00:23,  1.48s/it][A
Processed prompts:  37%|███▋      | 7/19 [00:05<00:06,  2.00it/s][A
Processed prompts:  47%|████▋     | 9/19 [00:06<00:03,  2.54it/s][A
Processed prompts:  58%|█████▊    | 11/19 [00:06<00:02,  3.01it/s][A
Processed prompts:  68%|██████▊   | 13/19 [00:07<00:02,  2.85it/s][A
Processed prompts:  74%|███████▎  | 14/19 [00:07<00:01,  2.62it/s][A
Processed prompts:  79%|███████▉  | 15/19 [00:10<00:03,  1.33it/s][A
Processed prompts:  84%|████████▍ | 16/19 [00:12<00:03,  1.07s/it][A
Processed prompts:  89%|████████▉ | 17/19 [00:12<00:01,  1.02it/s][A
Processed prompts:  95%|█████████▍| 18/19 [00:14<00:01,  1.23s/it][A
Processed prompts: 100%|██████████| 19/19 [00:15<00:00,  1.21it/s][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]





Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:01<00:23,  1.29s/it][A
Processed prompts:  26%|██▋       | 5/19 [00:02<00:06,  2.06it/s][A
Processed prompts:  58%|█████▊    | 11/19 [00:04<00:02,  3.14it/s][A
Processed prompts:  79%|███████▉  | 15/19 [00:04<00:01,  3.47it/s][A
Processed prompts:  84%|████████▍ | 16/19 [00:05<00:00,  3.66it/s][A
Processed prompts:  89%|████████▉ | 17/19 [00:06<00:00,  2.61it/s][A
Processed prompts:  95%|█████████▍| 18/19 [00:16<00:02,  2.06s/it][A
Processed prompts: 100%|██████████| 19/19 [00:18<00:00,  1.01it/s][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]





Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A
Processed prompts:   5%|▌         | 1/19 [00:01<00:24,  1.34s/it][A
Processed prompts:  32%|███▏      | 6/19 [00:02<00:05,  2.55it/s][A
Processed prompts:  58%|█████▊    | 11/19 [00:03<00:02,  3.22it/s][A
Processed prompts:  84%|████████▍ | 16/19 [00:04<00:00,  4.24it/s][A
Processed prompts:  95%|█████████▍| 18/19 [00:16<00:01,  1.37s/it][A
Processed prompts: 100%|██████████| 19/19 [00:20<00:00,  1.06s/it][A


Map (num_proc=4):   0%|          | 0/19 [00:00<?, ? examples/s]

*** SIGTERM received at time=1729845132 on cpu 2 ***
PC: @     0x7cfcd7bdf4a5  (unknown)  sem_post
    @     0x7cfcd7b85520  (unknown)  (unknown)
[2024-10-25 08:32:12,807 E 1013 30] logging.cc:365: *** SIGTERM received at time=1729845132 on cpu 2 ***
[2024-10-25 08:32:12,807 E 1013 30] logging.cc:365: PC: @     0x7cfcd7bdf4a5  (unknown)  sem_post
[2024-10-25 08:32:12,808 E 1013 30] logging.cc:365:     @     0x7cfcd7b85520  (unknown)  (unknown)


Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

Filter:   0%|          | 0/19 [00:00<?, ? examples/s]

=== CANDIDATE ANSWERS (19) ===
['49', '100', '49', '50', '49', '49', '0', '10', '7', '10', '275', '275', '150', '50', '10', '10', '100', '84', '17']

=== FILTERED ANSWERS (19) ===
[49, 100, 49, 50, 49, 49, 0, 10, 7, 10, 275, 275, 150, 50, 10, 10, 100, 84, 17]

=== MAJORITY ANSWER (mod 1000) ===
49

Solving for id :  2


Map:   0%|          | 0/19 [00:00<?, ? examples/s]


Processed prompts:   0%|          | 0/19 [00:00<?, ?it/s][A

KeyboardInterrupt: 