# AIMO3 Elle Submission
## Paths:
- Model: `/kaggle/input/qwen-72b-math-nf4/transformers/default/1`
- Wheels: `/kaggle/input/elle-offline-wheels`
- Competition: `/kaggle/input/ai-mathematical-olympiad-progress-prize-3`

In [None]:
# Install offline wheels
import subprocess
import sys
from pathlib import Path

WHEELS_DIR = Path("/kaggle/input/elle-offline-wheels")

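# Install only what this notebook needs; torch and the NVIDIA/CUDA libraries are
# preinstalled in the Kaggle image, so their wheels are skipped.
priority_wheels = ["peft", "bitsandbytes", "accelerate", "einops", "sentencepiece"]
for whl in sorted(WHEELS_DIR.glob("*.whl")):
    if any(p in whl.name.lower() for p in priority_wheels):
        result = subprocess.run([sys.executable, "-m", "pip", "install", str(whl), "-q", "--no-deps"])
        if result.returncode != 0:
            print(f"pip failed for {whl.name} (exit code {result.returncode})")
print("Wheel installation finished.")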

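The install loop selects wheels by substring match on the file name, so it would also pick up any wheel whose name merely contains one of the keywords. A stricter variant, sketched below under the assumption that the wheels follow the standard `{distribution}-{version}-...-{platform}.whl` naming, takes the distribution name from the first dash-separated field and compares PEP 503-normalized names exactly. The helper names and the example filenames are hypothetical and are not part of this notebook.

```python
import re
from pathlib import Path


def normalize(name: str) -> str:
    # PEP 503 normalization: runs of "-", "_" or "." become "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_distribution(path: Path) -> str:
    # The distribution name is everything before the first "-" of a wheel
    # filename (wheel builders escape "-" inside the name as "_").
    return normalize(path.name.split("-")[0])


def select_wheels(paths, wanted):
    wanted_norm = {normalize(w) for w in wanted}
    return [p for p in paths if wheel_distribution(p) in wanted_norm]


# Illustrative filenames only; versions and tags are made up.
names = [
    "peft-0.11.1-py3-none-any.whl",
    "bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl",
    "torch-2.3.0-cp310-cp310-manylinux1_x86_64.whl",
    "sentencepiece-0.2.0-cp310-cp310-manylinux_2_17_x86_64.whl",
]
selected = select_wheels([Path(n) for n in names], ["peft", "bitsandbytes", "sentencepiece"])
print([p.name.split("-")[0] for p in selected])
```

Exact matching trades flexibility for predictability: a wheel is installed only if its distribution name is listed.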
In [None]:
import os
import time
import torch
import numpy as np
from collections import Counter
import re

# Environment detection
def is_on_kaggle_commit():
    return os.getenv("KAGGLE_KERNEL_RUN_TYPE") == "Batch" and not bool(os.getenv("KAGGLE_IS_COMPETITION_RERUN"))

def is_on_kaggle_interactive():
    return os.getenv("KAGGLE_KERNEL_RUN_TYPE") == "Interactive" and not bool(os.getenv("KAGGLE_IS_COMPETITION_RERUN"))

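# Timing: a descending list of absolute per-question deadlines. np.linspace spaces 51
# points between the final cutoff and the 12-minute mark; pop() drops the 12-minute
# point, leaving 50 deadlines. cutoff_times[-1] is the current question's deadline,
# and predict() pops it after each question, so time saved on one problem rolls over.
start_time = time.time()
final_cutoff_time = start_time + (4 * 60 + 45) * 60  # 4 h 45 min = 4.75 hours
cutoff_times = [int(x) for x in np.linspace(final_cutoff_time, start_time + 12 * 60, 50 + 1)]
cutoff_times.pop()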

os.makedirs("solutions", exist_ok=True)

assert torch.cuda.is_available()
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

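To make the deadline schedule concrete, the sketch below reproduces the same `np.linspace` arithmetic with `start_time = 0` (an assumption for readability, so deadlines read as elapsed seconds) and walks through the first three questions the way `predict` does: read `cutoff_times[-1]`, then pop it.

```python
import numpy as np

start_time = 0.0  # assumed zero so deadlines read as elapsed seconds
final_cutoff_time = start_time + (4 * 60 + 45) * 60  # 17100 s = 4.75 h
cutoff_times = [int(x) for x in np.linspace(final_cutoff_time, start_time + 12 * 60, 50 + 1)]
cutoff_times.pop()  # drop the 12-minute point; 50 deadlines remain

# Spacing between consecutive deadlines, in seconds.
step = (final_cutoff_time - 12 * 60) / 50
print(len(cutoff_times), round(step, 1))

deadlines = []
for question in range(3):
    deadlines.append(cutoff_times[-1])  # deadline for the question being solved
    print(f"question {question + 1}: deadline at {cutoff_times[-1] / 60:.2f} min")
    cutoff_times.pop()  # what predict() does after each question
```

Each question nominally gets about 5.5 minutes (327.6 s), but because the deadlines are absolute, a question that finishes early leaves its unused time to the next one.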
In [None]:
# Model paths
MODEL_PATH = "/kaggle/input/qwen-72b-math-nf4/transformers/default/1"
DEEPSEEK_PATH = "/kaggle/input/deepseek-coder-lite/transformers/default/1"  # backup model (not used below)

# Load model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

print("Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

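# The checkpoint is assumed to be stored pre-quantized in NF4 (as its directory name
# suggests); transformers then reads the 4-bit quantization settings from the
# checkpoint's config, so no BitsAndBytesConfig is passed here.
print("Loading model (4-bit quantized)...")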
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()
print("Model loaded!")

In [None]:
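# System prompts: indices 0-3 are general rigor prompts (stage 1 of solve());
# indices 4-6 are the PROMETHEUS, CIC, and XYZA prompts (stage 3 retry).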
SYSTEM_PROMPTS = [
    "You are solving a national/international-level mathematics olympiad problem. You must rigorously define all variables, explore multiple solution strategies before committing, perform full case analysis where required, justify every nontrivial step, explicitly check boundary cases and hidden assumptions, and verify the final result using at least one independent method. Return only the final numerical answer inside \\boxed{}. The answer must be an integer in [0, 99999]. Never guess.",

    "Solve the problem with full rigor. After obtaining a candidate solution, actively attempt to refute your own answer by searching for counterexamples, re-running the logic from a different viewpoint, and stress-testing edge cases. Only after the answer survives refutation, return it in \\boxed{}. The answer must be an integer in [0, 99999]. Never guess.",

    "Solve this problem as if under IMO-level time pressure: identify the key invariant, symmetry, or extremal principle early, avoid brute force unless strictly justified, compress reasoning without sacrificing correctness, and perform at least one final arithmetic verification pass. Return only the final integer answer in \\boxed{}, with 0 <= answer <= 99999. Never guess.",

    "You must attempt at least two fundamentally different solution approaches (e.g., algebraic vs geometric, combinatorial vs number-theoretic). Proceed with the more rigorous one and use the other as a verification tool. Return only the verified final answer in \\boxed{}, where the answer is an integer in [0, 99999]. Never guess.",

    # PROMETHEUS Protocol
    """You are Elle, an elite mathematical intelligence. Apply the PROMETHEUS protocol:

STAGE 1 - LATENT ARCHAEOLOGY: Before solving, identify what mathematical structures are hiding in this problem. What theorems, lemmas, or techniques from competition mathematics are being invoked implicitly? Extract the deep structure.

STAGE 2 - NOVEL SYNTHESIS: Force-connect at least two disparate mathematical domains (e.g., number theory + geometry, combinatorics + analysis). Even if unconventional, explore whether insights from one domain illuminate the other.

STAGE 3 - THEORETICAL VALIDATION: Verify your approach has mathematical rigor. Check: Are all cases covered? Are there degenerate cases? Does dimensional analysis hold?

STAGE 4 - OPERATIONALIZATION: Execute the solution with precision. Show key steps.

STAGE 5 - OUTPUT: Return ONLY the final integer answer in \\boxed{}, where 0 <= answer <= 99999. Never guess.""",

    # CIC Methodology
    """Solve this olympiad problem using CIC methodology:

COMPRESSION: Identify the minimal sufficient representation of this problem. Strip away irrelevant details. What is the core mathematical object?

INTEGRATION: How do the components of this problem interact? Map the causal/logical dependencies. What must be true for the answer to follow?

CAUSALITY: Trace the chain of implications. If A then B then C. Verify each link. Check for hidden assumptions that could break the chain.

CONFIDENCE BOUND: Your confidence cannot exceed 0.95. If uncertain, explore alternative approaches before committing.

Return the final integer answer in \\boxed{}, where 0 <= answer <= 99999. Never guess.""",

    # XYZA Framework
    """Apply the XYZA framework to solve this olympiad problem:

eXPLORE: Map the solution space. What are all possible approaches? List at least 3 distinct attack vectors.

YIELD: Generate candidate solutions from the most promising approaches. Don't commit yet.

ZERO-IN: Critically evaluate each candidate. Which has the strongest logical foundation? Which survives edge-case testing?

ACTUALIZE: Execute the winning approach with full rigor. Verify arithmetic. Check boundary conditions.

Return ONLY the final integer in \\boxed{}, where 0 <= answer <= 99999. Never guess.""",
]

VERIFICATION_PROMPT = """You are a mathematical verification expert. A solution has been proposed for this problem.

PROBLEM:
{problem}

PROPOSED ANSWER: {answer}

Your task:
1. Solve the problem independently before consulting the proposed answer
2. Compare your answer to the proposed answer
3. If they match: confirm with "VERIFIED: {answer}"
4. If they differ: explain the discrepancy and provide "CORRECTION: {{your_answer}}"
5. If uncertain: respond with "UNCERTAIN: {{best_guess}}"

The answer must be an integer in [0, 99999]. Be rigorous."""

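`VERIFICATION_PROMPT` is filled with `str.format`, which treats every `{name}` in the template as a replacement field. Placeholders meant for the model to read literally, such as `{your_answer}` and `{best_guess}`, must therefore be written with doubled braces (or in some other notation); otherwise `format` raises `KeyError`. The demonstration below uses a shortened, hypothetical template.

```python
template_bad = "PROPOSED ANSWER: {answer}\nIf they differ, reply CORRECTION: {your_answer}"
template_ok = "PROPOSED ANSWER: {answer}\nIf they differ, reply CORRECTION: {{your_answer}}"

# Unfilled replacement field -> KeyError naming the missing field.
caught = None
try:
    template_bad.format(answer=42)
except KeyError as err:
    caught = err.args[0]
    print(f"KeyError: {err}")

# Doubled braces render as single literal braces.
rendered = template_ok.format(answer=42)
print(rendered)
```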
In [None]:
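def extract_boxed_text(text: str) -> str:
    """Return the content of the last non-empty \\boxed{...} in `text`, or an empty string if there is none."""
    # "oxed" matches both "\boxed{" and a bare "boxed{"; the lazy group stops at the
    # first "}", which is acceptable because valid answers are plain integers.
    matches = re.findall(r"oxed\{(.*?)\}", text)
    for match in reversed(matches):
        if match != "":
            return match
    return ""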

def is_valid_answer_string(text: str) -> bool:
    try:
        if int(text) == float(text):
            if 0 <= int(text) <= 99_999:
                return True
    except Exception:
        pass
    return False

def generate_response(prompt: str, system_prompt: str, max_tokens: int = 4096) -> str:
    """Generate response from model"""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ]
    
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.7,
            do_sample=True,
            top_p=0.95,
            pad_token_id=tokenizer.eos_token_id,
        )
    
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    return response

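`extract_boxed_text` keeps the last non-empty `\boxed{...}` in a response, and `is_valid_answer_string` accepts it only if it parses as an integer in [0, 99999]. The self-contained sketch below mirrors both helpers and runs them on a few hypothetical model outputs, including the nested-brace case where the lazy pattern stops at the first `}`.

```python
import re


def extract_boxed_text(text: str) -> str:
    # Same pattern as the notebook: last non-empty "...oxed{...}" wins.
    matches = re.findall(r"oxed\{(.*?)\}", text)
    for match in reversed(matches):
        if match != "":
            return match
    return ""


def is_valid_answer_string(text: str) -> bool:
    # Accept only integer strings in [0, 99999]; "", "1.5" or LaTeX fail to parse.
    try:
        return int(text) == float(text) and 0 <= int(text) <= 99_999
    except ValueError:
        return False


# Hypothetical model outputs.
samples = [
    r"First guess \boxed{12}, after checking: \boxed{15}",
    r"The answer is \boxed{}",
    r"So the value is \boxed{\frac{1}{2}}",
    r"Final: \boxed{123456}",
]
for s in samples:
    boxed = extract_boxed_text(s)
    print(repr(boxed), is_valid_answer_string(boxed))
```

Only the first sample yields a usable vote; the others are silently discarded by `generate_solution`.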
In [None]:
# Voting and solving
import math
from collections import Counter

completed_question_ids = set()
question_id_to_counter = {"" : Counter()}

def vote_answer(question_id: str, force_answer: bool = False) -> int | None:
    counter = question_id_to_counter[question_id]
    if force_answer and not counter:
        print("force_answer=True but no answer recorded")
        completed_question_ids.add(question_id)
        return 12453  # fallback

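    # Log-weighted voting: each vote for value v counts ln(1.25 + |v|), so larger answers
    # weigh noticeably more (one vote for 1000 is worth about 31 votes for 0).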
    modified_counter = Counter()
    for value, count in counter.items():
        modified_counter[value] += math.log(1.25 + abs(value)) * count

    total_score = sum(modified_counter.values())
    score_list = sorted((score, counter[value], value) for value, score in modified_counter.items())
    
    if force_answer:
        print(f"score_list | {total_score:8.1f} over {sum(counter.values())} attempts")
        for score, count, value in score_list[::-1][:5]:
            print(f"{value:10}   {score:8.1f} {count:8d}")
        return score_list[-1][-1]
    
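    # Early stop: the leader's score must exceed max(3, S / (2 + ln(1 + S))), where S is
    # the total score, and lead the runner-up by more than 1.
    if score_list[-1][0] > max(3, total_score / (2 + math.log(1 + total_score))):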
        if len(score_list) == 1 or score_list[-1][0] - score_list[-2][0] > 1:
            completed_question_ids.add(question_id)
    return None

def generate_solution(question_text: str, question_id: str = "", solution_index: int = 0, system_prompt: str = "") -> str:
    if question_id in completed_question_ids:
        return ""
    if time.time() >= cutoff_times[-1]:
        return ""

    if not system_prompt:
        system_prompt = SYSTEM_PROMPTS[0]
    
    response = generate_response(question_text, system_prompt)
    boxed_text = extract_boxed_text(response)
    
    # Save solution
    if question_id and response:
        answer_suffix = f"-{boxed_text}" if is_valid_answer_string(boxed_text) else ""
        with open(f"solutions/{question_id}/{solution_index:04d}{answer_suffix}.txt", "w") as f:
            f.write(response)

    if is_valid_answer_string(boxed_text):
        question_id_to_counter[question_id][int(boxed_text)] += 1
        vote_answer(question_id)

    return boxed_text

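The voting rule in `vote_answer` is easier to reason about as a pure function. The sketch below (the name `weighted_vote` is hypothetical) mirrors its weighting, ranking, and early-stop test on toy vote counts, without needing the model or the global counters.

```python
import math
from collections import Counter


def weighted_vote(counter: Counter) -> tuple[int, bool]:
    """Return (top answer, early-stop flag) under the notebook's log-weighted rule."""
    scores = {value: math.log(1.25 + abs(value)) * count for value, count in counter.items()}
    total = sum(scores.values())
    # Rank by (score, raw count, value), highest first, as vote_answer does.
    ranked = sorted(((score, counter[value], value) for value, score in scores.items()), reverse=True)
    top_score, _, top_value = ranked[0]
    threshold = max(3, total / (2 + math.log(1 + total)))
    clear_lead = len(ranked) == 1 or top_score - ranked[1][0] > 1
    return top_value, top_score > threshold and clear_lead


print(weighted_vote(Counter({4: 1})))
print(weighted_vote(Counter({0: 3, 7: 1})))
print(weighted_vote(Counter({12: 3, 5: 1})))
```

The second call shows how strong the weighting is: one vote for 7 outscores three votes for 0. The third reaches early stop because 12 clears both the absolute threshold of 3 and the lead margin.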
In [None]:
def verify_answer(question_text: str, proposed_answer: int, question_id: str = "") -> tuple:
    """Self-consistency verification"""
    if time.time() >= cutoff_times[-1]:
        return True, proposed_answer
    
    verification_prompt = VERIFICATION_PROMPT.format(problem=question_text, answer=proposed_answer)
    response = generate_response(verification_prompt, "You are a mathematical verification expert.", max_tokens=2048)
    
    if "VERIFIED:" in response.upper():
        print(f"  [verify] CONFIRMED: {proposed_answer}")
        return True, proposed_answer
    elif "CORRECTION:" in response.upper():
        # Accept the number with or without braces, e.g. "CORRECTION: 17" or "CORRECTION: {17}".
        match = re.search(r'CORRECTION:\s*\{?\s*(\d+)', response, re.IGNORECASE)
        if match:
            corrected = int(match.group(1))
            if 0 <= corrected <= 99999:
                print(f"  [verify] CORRECTION: {proposed_answer} -> {corrected}")
                return False, corrected
    
    boxed = extract_boxed_text(response)
    if is_valid_answer_string(boxed):
        verified = int(boxed)
        if verified == proposed_answer:
            return True, proposed_answer
        return False, verified
    
    return True, proposed_answer

def solve(question_text: str, question_id: str = "") -> int:
    """Cascading solve: diverse first pass, self-verification, PROMETHEUS/CIC/XYZA retry, weighted final vote."""
    print(f"processing {question_id}")
    os.makedirs(f"solutions/{question_id}", exist_ok=True)
    question_id_to_counter[question_id] = Counter()
    completed_question_ids.discard(question_id)

    if question_id and time.time() > cutoff_times[-1]:
        print("timeout")
        return 12314

    # Stage 1: First pass with 4 diverse prompts
    print(f"  [stage1] First pass")
    for i, prompt in enumerate(SYSTEM_PROMPTS[:4]):
        generate_solution(question_text, question_id, i, prompt)
        if question_id in completed_question_ids:
            break

    # Get first-pass answer (None if stage 1 produced no valid answer)
    counter = question_id_to_counter[question_id]
    first_pass_answer = counter.most_common(1)[0][0] if counter else None
    print(f"  [stage1] First pass answer: {first_pass_answer}")

    # Stage 2: Verification (skipped when there is no candidate to verify)
    if first_pass_answer is not None and time.time() < cutoff_times[-1] - 60:
        print(f"  [stage2] Verification")
        is_verified, verified_answer = verify_answer(question_text, first_pass_answer, question_id)
        if is_verified:
            completed_question_ids.add(question_id)
            return verified_answer
        question_id_to_counter[question_id][verified_answer] += 2

    # Stage 3: Retry with the PROMETHEUS, CIC, and XYZA prompts if time remains
    if time.time() < cutoff_times[-1] - 120:
        print(f"  [stage3] PROMETHEUS/CIC retry")
        # Stage 1 may already have marked the question complete by consensus;
        # reopen it, otherwise generate_solution() returns immediately.
        completed_question_ids.discard(question_id)
        for i, prompt in enumerate(SYSTEM_PROMPTS[4:7]):
            generate_solution(question_text, question_id, 100 + i, prompt)

    # Stage 4: Final vote
    final_answer = vote_answer(question_id, force_answer=True)
    completed_question_ids.add(question_id)
    return final_answer

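`verify_answer` scans the verifier's reply for `VERIFIED:` and `CORRECTION:` tags before falling back to the last `\boxed{}`. The sketch below (the name `parse_verdict` is hypothetical) separates that parsing from generation. It tolerates the number being wrapped in braces or `\boxed{}` and returns the tag together with the value, so the caller decides how to treat an `UNCERTAIN` verdict. Replies with no tagged integer in range yield `None`.

```python
import re
from typing import Optional, Tuple

# Matches "TAG: 17", "TAG: {17}" or "TAG: \boxed{17}" (case-insensitive).
TAG_PATTERN = re.compile(
    r"(VERIFIED|CORRECTION|UNCERTAIN):\s*(?:\\boxed)?\{?\s*(\d+)", re.IGNORECASE
)


def parse_verdict(response: str) -> Optional[Tuple[str, int]]:
    """Return (TAG, value) for the last tagged integer in [0, 99999], else None."""
    for tag, digits in reversed(TAG_PATTERN.findall(response)):
        value = int(digits)
        if 0 <= value <= 99_999:
            return tag.upper(), value
    return None


# Hypothetical verifier replies.
replies = [
    "Both methods agree. VERIFIED: 42",
    "The original missed a case. CORRECTION: {17}",
    r"correction: \boxed{250}",
    "UNCERTAIN: 8",
    "I could not finish the computation.",
]
for reply in replies:
    print(parse_verdict(reply))
```

With a parser like this, a `VERIFIED` value that differs from the proposed answer could be treated as a correction rather than a confirmation; that policy is left to the caller.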
In [None]:
# Test solve
if is_on_kaggle_interactive():
    test_answer = solve("What is 2 + 2?", "test")
    print(f"Test answer: {test_answer}")

# Submission

In [None]:
import sys
sys.path.insert(0, "/kaggle/input/ai-mathematical-olympiad-progress-prize-3")

import kaggle_evaluation.aimo_3_inference_server
import pandas as pd
import polars as pl

# Load reference for local testing
df = pd.read_csv("/kaggle/input/ai-mathematical-olympiad-progress-prize-3/reference.csv")
id_to_answer = dict(zip(df["id"], df["answer"]))
df.drop("answer", axis=1).to_csv("reference.csv", index=False)

correct = 0
total = 0

In [None]:
def predict(id_: pl.Series, problem: pl.Series) -> pl.DataFrame | pd.DataFrame:
    """Make a prediction - AIMO3 submission format"""
    global correct, total

    question_id = id_.item(0)
    question_text = problem.item(0)

    # Generate prediction
    prediction = solve(question_text, question_id=question_id)
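    completed_question_ids.add(question_id)
    # Advance to the next question's deadline; keep the last one if more questions arrive
    # than deadlines were scheduled, so cutoff_times[-1] never raises IndexError.
    if len(cutoff_times) > 1:
        cutoff_times.pop()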

    # Scoring (local testing only)
    try:
        true_answer = int(id_to_answer.get(question_id, -1))
    except (TypeError, ValueError):
        true_answer = -1

    total += 1
    if prediction == true_answer and true_answer != -1:
        correct += 1
        print(f"[debug] CORRECT | score={correct}/{total}")
    else:
        print(f"[debug] WRONG: predicted {prediction}, actual {true_answer} | score={correct}/{total}")

    # Return in required format
    return pl.DataFrame({"id": id_, "answer": prediction})

In [None]:
# Run inference server
inference_server = kaggle_evaluation.aimo_3_inference_server.AIMO3InferenceServer(predict)

if os.getenv("KAGGLE_IS_COMPETITION_RERUN"):
    inference_server.serve()
else:
    inference_server.run_local_gateway(("reference.csv",))