# Overview

I used a simple **Zero-Shot Self-Consistency Chain-of-Thought** approach with the **Qwen2.5-Math-7B-Instruct** model. I used this amazing [notebook](https://www.kaggle.com/code/takaito/aimo2-vllm-deepseek-math-7b-instruct-inference) by @[takaito](https://www.kaggle.com/takaito) as a reference.

Takaito's notebook used greedy decoding with the deepseek-math model and scored 2. This notebook uses:
* **Qwen2.5-Math-7B-Instruct** model
* **256** samples per question
* **L4x4** GPU
* temperature **0.7**, max_tokens **2048**

Feel free to play with the params. 


# What else can be done : 

1. Add **TIR prompt** along with CoT
2. Use multiple models, e.g. **Qwen2.5, Numina, Deepseek-math**. L4x4 has around **90+GB** of VRAM, which is sufficient to run inference with multiple models
3. Use TIR in a feedback loop (so far the best way to solve math problems, but surprisingly it's not doing great in this competition)
4. And much more! Sky is the limit

**Caution**


This competition is much harder than the previous one. The questions are obviously **AI Hard**, but just as important, the notebook run-time limit is **5 hours** here, whereas it was **9 hours** in the previous competition. This might be because **L4x4** GPUs are now available as an option alongside T4x2/P100. That makes us quite dependent on L4 in this competition, because 5 hours on T4x2 is very little.

For each problem we have around **5.5-5.8 minutes (330-350 seconds)**
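
The per-problem figure comes from dividing the run-time limit by the number of problems after setting aside some time for setup. Here is a rough sketch of that arithmetic, assuming 50 test problems and 15 to 25 minutes of overhead for installs and model loading (both numbers are assumptions, not measured values):

```python
# Rough per-problem time budget (a sketch; N_PROBLEMS and the overheads are assumed).
TIME_LIMIT_S = 5 * 3600  # 5-hour notebook limit
N_PROBLEMS = 50          # assumption: number of problems in the test set


def per_problem_budget(overhead_s: float) -> float:
    """Seconds available per problem after reserving overhead_s for setup."""
    return (TIME_LIMIT_S - overhead_s) / N_PROBLEMS


for overhead_min in (15, 25):
    budget = per_problem_budget(overhead_min * 60)
    print(f"{overhead_min} min overhead -> {budget:.0f} s per problem")
```

With less setup time reserved, the budget moves toward the upper end of the range; with more, toward the lower end.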

# Imports

In [None]:
# notebook start time for timing purposes
import time

NB_START = time.time()

In [None]:
# import libraries and modules
import os
import polars as pl
import kaggle_evaluation.aimo_2_inference_server

In [None]:
# Since we're using 4 GPUs we need to set the CUDA_VISIBLE_DEVICES environment variable to "0,1,2,3" to make sure that the GPUs are visible to the code running in the notebook
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

In [None]:
%%time

%pip uninstall -y torch  # Remove existing PyTorch installation, -y flag auto-confirms removal

%pip install -U --no-index --find-links=/kaggle/input/vllm-whl vllm
# Install vLLM from local wheels directory
# -U: Upgrade if already installed
# --no-index: Don't search PyPI
# --find-links: Look for wheels in specified directory

%pip install -U /kaggle/input/vllm-t4-fix/grpcio-1.62.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
# Install specific version of grpcio package from local wheel file
# Required for vLLM's RPC communication
# Compatible with Python 3.10 and Linux x86_64

%pip install -U /kaggle/input/vllm-t4-fix/ray-2.11.0-cp310-cp310-manylinux2014_x86_64.whl
# Install specific version of Ray framework from local wheel file
# Used by vLLM for distributed computing
# Compatible with Python 3.10 and Linux x86_64

In [None]:
# Import required libraries and modules
import gc  # Garbage collection module for managing memory allocation and deallocation
import warnings  # Warning control module to suppress warnings in the notebook output

warnings.filterwarnings("ignore")  # Suppress warnings
import random  # Random number generation module for setting random seeds for reproducibility purposes
import scipy as sp  # Scientific computing module for mathematical functions and operations on arrays and matrices
import numpy as np  # Numerical computing module for working with arrays and matrices
import pandas as pd  # Data manipulation and analysis module for working with data structures
import math  # Mathematical functions module for mathematical operations
from glob import glob  # File path pattern matching module for finding files in directories
from pathlib import Path  # File path manipulation module for working with file paths
import joblib  # Joblib module for parallel processing and caching
import pickle  # Pickle module for serializing and deserializing Python objects
import itertools  # Iteration module for efficient looping
from tqdm import tqdm  # Progress bar module for tracking the progress of loops and tasks
import re  # Regular expression module for pattern matching and string manipulation
import vllm  # vLLM inference engine used to load the model and generate text

# Load Model

In [None]:
# Load the model with vLLM using the specified configuration settings and model path
llm = vllm.LLM(
    "/kaggle/input/qwen7bmath",  # alternative: "deepseek-ai/deepseek-math-7b-instruct"
    # The number of GPUs to use for distributed execution with tensor parallelism (4 GPUs)
    tensor_parallel_size=4,
    # The fraction (between 0 and 1) of each GPU's memory that vLLM may use
    # for weights, activations, and the KV cache (0.95 = 95%)
    gpu_memory_utilization=0.95,
    # Allow custom model/tokenizer code shipped with the checkpoint to run (True)
    trust_remote_code=True,
    # Data type for model weights and activations. Left at the default because
    # L4 supports bfloat16; uncomment to force half precision (float16).
    # dtype="half",
    # Always run in eager mode, i.e. skip CUDA graph capture (True)
    enforce_eager=True,
    # CPU swap space per GPU in GiB (2 GiB)
    swap_space=2,
)

# Get the tokenizer that belongs to the loaded model
tokenizer = llm.get_tokenizer()

# Utils

In [None]:
# Function to generate text using the vLLM language model with the specified requests, tokenizer, and model parameters
# Temperature sampling used. You can try experimenting with different values of temperature to control the randomness in text generation.
# Lower values of temperature (e.g., 0.1) will generate more deterministic and conservative text, while higher values of temperature (e.g., 1.0) will generate more diverse and creative text.
# The max_tokens parameter controls the maximum number of tokens to generate in the output text. You can adjust this parameter based on the desired length of the generated text.
def generate_text_vllm(requests, tokenizer, model):

    # Sampling parameters for controlling text generation behavior and output length
    sampling_params = vllm.SamplingParams(
        # Sampling temperature for controlling randomness in text generation (0.7)
        temperature=0.7,
        # Maximum number of tokens to generate in the output text (2048)
        max_tokens=2048,
    )
    # Generate text using the vLLM language model with the specified requests and sampling parameters
    responses = model.generate(requests,
                               sampling_params=sampling_params,
                               use_tqdm=False)
    # Keep the first (and only) completion returned for each request
    return [response.outputs[0].text for response in responses]
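
To make the effect of temperature concrete, here is a small self-contained sketch of temperature-scaled softmax on toy logits. It is the textbook formula, not vLLM's internal code, and the logit values are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 0.7, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass lands on the top-scoring token, while at higher temperature the mass is spread more evenly, which is what gives self-consistency its diverse samples.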

In [None]:
# Function to extract the numerical answer from the text output generated by the vLLM language model.
def naive_parse(answer):
    out = []
    start = False
    end = False
    # Reverse the text and iterate over the characters
    for l in reversed(list(answer)):
        # Check if the character is a digit and not at the end of the answer text
        if (l in "0123456789" and not end):
            start = True
            out.append(l)
        else:
            if start:
                end = True

    out = reversed(out)
    return "".join(out)
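
`naive_parse` walks the string from the end and keeps the last contiguous run of digits. The sketch below expresses the same idea with a regular expression; `last_digit_run` is a hypothetical helper for illustration, not something the notebook uses.

```python
import re


def last_digit_run(text: str) -> str:
    """Return the last contiguous run of ASCII digits in text, or '' if none."""
    runs = re.findall(r"[0-9]+", text)
    return runs[-1] if runs else ""


print(last_digit_run("the answer is 12 or 345."))  # 345
print(repr(last_digit_run("no digits here")))      # ''
```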

In [None]:
# Tool instruction for the CoT prompt (LaTeX format)
tool_instruction = (
    "\nPlease solve the problem above, and put your final answer within \\boxed{}."
)

In [None]:
# Counter module for counting occurrences of elements in a list or dictionary
from collections import Counter


# Function to pick the most consistent answer among those sampled for a given prompt; that answer is our final prediction.
# It uses Counter from the collections module to count each answer and returns the most common one.
# If there is a tie, the answer that was seen first wins.
# If the list is empty, the function returns 0.
# The function also prints the ten most common answers and their counts for reference.
def get_majority_vote(answers):

    if not len(answers):
        # Return 0 if the list is empty
        return 0
    # Count the occurrences of each answer in the list
    c = Counter(answers)
    # Get the most common answer and its count
    value, _ = c.most_common()[0]
    # Print the most common answers
    print("Most Common answers : ", c.most_common()[:10])
    # Print a separator line
    print("=" * 50)
    # Take the absolute value so the returned answer is non-negative
    try:
        z = abs(value)
    # If the value is not numeric, return it unchanged
    except Exception:
        z = value
    return z
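
Here is a minimal sketch of the voting step on toy data, including a tie and an empty list. `majority` is a hypothetical stand-in that mirrors the behaviour described above without the printing.

```python
from collections import Counter


def majority(answers, default=0):
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        return default
    return Counter(answers).most_common(1)[0][0]


print(majority([42, 7, 42, 13, 7, 42]))  # 42
print(majority([7, 42, 42, 7]))          # 7 (tie, 7 appeared first)
print(majority([]))                      # 0
```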

In [None]:
# Regular expression module for pattern matching and string manipulation
import re


# Function to extract the \boxed{} answer from the generated text using a regular expression and return it as an integer modulo 1000. If extraction fails, the function returns -1.
def find_answer(generate_text):

    answer = -1

    try:
        # Extract the \boxed{} answer using regular expressions
        result_output = re.findall(r"\\boxed\{(\d+)\}", generate_text)

        # Check if the answer is found
        if len(result_output) > 0:
            # Parse the answer using the naive_parse function
            no = naive_parse(result_output[0])
            # Check if the parsed answer is not empty
            if len(no) > 0:
                # Convert the answer to an integer and take the modulo 1000
                answer = int(no) % 1000

    except Exception:
        # Any parsing failure leaves the sentinel value in place
        answer = -1

    return answer
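
Note that the pattern only matches a `\boxed{}` whose content is made up entirely of digits, so boxed fractions or negative numbers are skipped. The sketch below shows this on a made-up completion; `boxed_ints` is a hypothetical helper that returns every match rather than only the first.

```python
import re

BOXED_INT = re.compile(r"\\boxed\{(\d+)\}")


def boxed_ints(text: str) -> list[int]:
    """Return every all-digit boxed value found in text, reduced modulo 1000."""
    return [int(m) % 1000 for m in BOXED_INT.findall(text)]


sample = r"So x = \boxed{-3} is rejected, hence \boxed{1204}. Check: \boxed{\frac{1}{2}}"
print(boxed_ints(sample))  # [204]
```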

In [None]:
# Extract the answers from the generated texts. If the answer extraction fails, the function returns -1. The extracted answers are stored in a list.
def extract_answer(texts):
    sols = []
    for text in texts:
        try:
            ans = find_answer(text)
            if ans >= 0:
                sols.append(ans)
        except:
            ans = -1
    return sols

In [None]:
# Function to get the final prediction from the list of solutions using majority voting.
# If the list is empty, the function returns 0. Otherwise, it returns the majority vote.
def fin_pred(sols):
    if len(sols):
        return get_majority_vote(sols)
    else:
        return 0

In [None]:
# Progress bar module for tracking the progress of loops and tasks
from tqdm import tqdm

num_generations = 2  # 2 rounds x 128 samples = 256 samples per question; each round takes around 2 minutes


# Function to solve the given question using the vLLM language model and return the final answer.
# The function generates text for the question prompt with the tool instruction and extracts the answers from the generated text.
# The answers are then aggregated using majority voting to determine the final answer.
# The function returns the final answer as an integer.
def solve(question):
    ans = []
    for i in range(num_generations):
        prompt = question + tool_instruction
        generate_text = generate_text_vllm([prompt] * 128, tokenizer, llm)
        ans.extend(extract_answer(generate_text))
    answer = fin_pred(ans)
    return answer
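
The sample-then-vote loop can be tried without a GPU by swapping the model for a stub. The sketch below does that; `fake_sampler` is a hypothetical stand-in for the LLM that returns canned completions, so the numbers say nothing about real model accuracy.

```python
import random
import re
from collections import Counter


def self_consistency(question, sample_fn, n_samples):
    """Sample n_samples completions, parse boxed integers, and majority-vote."""
    votes = []
    for text in sample_fn(question, n_samples):
        found = re.findall(r"\\boxed\{(\d+)\}", text)
        if found:
            votes.append(int(found[0]) % 1000)
    return Counter(votes).most_common(1)[0][0] if votes else 0


def fake_sampler(question, n, seed=0):
    """Hypothetical stand-in for the LLM: answers 42 about 60% of the time."""
    rng = random.Random(seed)
    outs = []
    for _ in range(n):
        r = rng.random()
        if r < 0.6:
            outs.append(r"... so the result is \boxed{42}")
        elif r < 0.9:
            outs.append(r"... so the result is \boxed{" + str(rng.randrange(100)) + "}")
        else:
            outs.append("ran out of tokens")  # no boxed answer, so it is ignored
    return outs


print(self_consistency("toy question", fake_sampler, 256))
```

Even though a large share of the fake completions are wrong or unparseable, the wrong answers are scattered while the right one repeats, so the vote settles on it.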

# Prediction

In [None]:
# Function to make a prediction for the given question using the solve function. The function takes the question ID and the question text as input and returns the prediction as a DataFrame.
def predict(id_: str, question: pl.Series) -> pl.DataFrame | pd.DataFrame:

    # Extract the question text from the input Series
    question = question.to_pandas().values[0]

    # 4 hours 55 minutes limit for the notebook runtime (17700 seconds)
    if time.time() - NB_START <= 17700:
        try:
            # Solve the question using the solve function
            ans = solve(question)
        except:
            ans = 0
    else:
        ans = 0

    # Return the prediction as a DataFrame
    return pl.DataFrame({"id": id_, "answer": ans})
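
The cutoff logic can be isolated in a small helper with an injectable clock, which makes it easy to check without waiting five hours. This is only a sketch; `guarded_answer` and its parameters are hypothetical names, not part of the notebook.

```python
import time


def guarded_answer(solve_fn, question, start, limit_s=17700, now=time.time, fallback=0):
    """Call solve_fn(question) only while within limit_s of start; else return fallback."""
    if now() - start > limit_s:
        return fallback  # out of time: skip generation entirely
    try:
        return solve_fn(question)
    except Exception:
        return fallback  # any solver failure also falls back


# Toy usage with a fake clock instead of the real wall time.
print(guarded_answer(lambda q: 7, "q", start=0, now=lambda: 100))    # 7
print(guarded_answer(lambda q: 7, "q", start=0, now=lambda: 18000))  # 0
```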

In [None]:
# Initialize the AIMO2InferenceServer with the predict function
inference_server = kaggle_evaluation.aimo_2_inference_server.AIMO2InferenceServer(
    predict)

# Check if the code is running in the Kaggle competition environment
if os.getenv("KAGGLE_IS_COMPETITION_RERUN"):
    # Start the inference server
    inference_server.serve()

# Otherwise the code is running in a local or interactive session
else:
    # Run the local gateway with the test.csv file as input
    inference_server.run_local_gateway(
        ("/kaggle/input/ai-mathematical-olympiad-progress-prize-2/test.csv", ))

# Reference Set

> If you want to get the score on the reference set, set the value of **validate** to **True**

In [None]:
# Set the validation flag to True to score the model on the reference set
validate = False

# Run only outside the competition rerun and when validation is enabled
if not os.getenv('KAGGLE_IS_COMPETITION_RERUN') and validate:
    df = pd.read_csv(
        "/kaggle/input/ai-mathematical-olympiad-progress-prize-2/reference.csv"
    )
    ans = []
    # For each problem in the dataset (cnt is the 1-based problem number)
    for cnt, problem in enumerate(tqdm(df.problem.tolist()), start=1):
        # Start time for each problem
        tmp_time = time.time()
        try:
            # Solve the problem using the model and append the answer
            ans.append(solve(problem))
            # Print the problem number and time taken
            print(f"Problem {cnt} solved. Time taken {time.time()-tmp_time}")
        except Exception:
            # If the problem cannot be solved, append 0
            ans.append(0)
    # Add the model's answers to the dataframe
    df["model_answer"] = ans
    # Check if the model's answers match the reference answers
    df['match'] = df.answer == df.model_answer
    # Print the number of matches and total examples
    print(f'{df.match.sum()} matches in {len(df)} examples')
    # Display the dataframe with the model's answers and matches
    display(df)