# test

In [1]:
import pandas as pd

# load reference data from a file
ref_path = '../temp/reference.csv'
ref_data = pd.read_csv(ref_path)
ref_data


       id                                            problem  answer
0  057f8a  Three airline companies operate flights from D...      79
1  192e23  Fred and George take part in a tennis tourname...     250
2  1acac0  Triangle $ABC$ has side length $AB = 120$ and ...     180
3  1fce4b  Find the three-digit number $n$ such that writ...     143
4  349493  We call a sequence $a_1, a_2, \ldots$ of non-n...       3
5  480182  Let $ABC$ be a triangle with $BC=108$, $CA=126...     751
6  71beb6  For a positive integer $n$, let $S(n)$ denote ...     891
7  88c219  For positive integers $x_1,\ldots, x_n$ define...     810
8  a1d40b  The Fibonacci numbers are defined as follows: ...     201
9  bbd91e  Alice writes all positive integers from $1$ to...     902


In [2]:
# show a problem
import textwrap

# select the column by name so the lookup does not depend on column order
wrapped_text = textwrap.fill(ref_data["problem"].iloc[0], width=50)
print(wrapped_text)

Three airline companies operate flights from
Dodola island. Each company has a different
schedule of departures. The first company departs
every 100 days, the second every 120 days and the
third every 150 days. What is the greatest
positive integer $d$ for which it is true that
there will be $d$ consecutive days without a
flight from Dodola island, regardless of the
departure times of the various airlines?


# Qwen/Qwen2.5-Math-7B-Instruct

<https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct>

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer


class TransformerSolver:
    def __init__(self, model_name, device="cuda"):
        self.model_name = model_name
        self.device = device
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype="auto",
            device_map="auto"
        )
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)

    def solve(self, prompt, reasoning_type="CoT"):
        if reasoning_type == "CoT":
            messages = [
                {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
                {"role": "user", "content": prompt}
            ]
        elif reasoning_type == "TIR":
            messages = [
                {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
                {"role": "user", "content": prompt}
            ]
        else:
            raise ValueError("Unsupported reasoning type")

        text = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True
        )
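        # With device_map="auto" the weights may be spread over several devices,
        # so send the inputs to wherever the model expects them.
        model_inputs = self.tokenizer([text], return_tensors="pt").to(self.model.device)

        generated_ids = self.model.generate(
            **model_inputs,
            max_new_tokens=10000
        )
        # Drop the prompt tokens so only the newly generated text is decoded.
        generated_ids = [
            output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
        ]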

        response = self.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return response
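
Both system prompts ask the model to put its final answer within `\boxed{}`, so a response can be scored by pulling out the last boxed expression. The sketch below is one possible way to do that; `extract_boxed_answer` and `is_correct` are hypothetical helper names that are not part of the notebook, and the integer comparison assumes the answers are integers, as in the `answer` column shown above.

```python
from typing import Optional


def extract_boxed_answer(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in response, or None."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 0  # nesting depth of braces inside the box
    for i in range(content_start, len(response)):
        ch = response[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                return response[content_start:i].strip()
            depth -= 1
    return None  # unbalanced braces, e.g. a truncated response


def is_correct(response: str, expected) -> bool:
    """Compare the boxed answer with an integer reference answer."""
    answer = extract_boxed_answer(response)
    if answer is None:
        return False
    try:
        return int(answer) == int(expected)
    except ValueError:
        return False  # the boxed content is not a plain integer
```

For a single row this could be used as `is_correct(response, ref_data["answer"].iloc[0])`.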

In [4]:
import time

model_name = "Qwen/Qwen2.5-Math-7B-Instruct"
device = "cuda"  # passed to TransformerSolver; the weights themselves are placed by device_map="auto"
prompt = ref_data["problem"].iloc[0]

# The measured time covers both model loading and generation.
time1 = time.perf_counter()
solver_Qwen_7B = TransformerSolver(model_name, device)
response = solver_Qwen_7B.solve(prompt, reasoning_type="TIR")
time2 = time.perf_counter()

print(response)
print(f"Time taken: {time2 - time1:.1f} seconds")

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the cpu.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


To solve this problem, we need to find the greatest positive integer \( d \) such that there will be \( d \) consecutive days without a flight from Dodola island, regardless of the departure times of the three airlines. This means we need to find the longest possible gap between the departures of the three airlines.

The key is to find the least common multiple (LCM) of the three intervals (100, 120, and 150). The LCM of these intervals will give us the length of the cycle after which the departures repeat. Once we have the LCM, we can determine the maximum gap within this cycle.

Let's calculate the LCM of 100, 120, and 150:

1. The prime factorization of 100 is \(2^2 \cdot 5^2\).
2. The prime factorization of 120 is \(2^3 \cdot 3 \cdot 5\).
3. The prime factorization of 150 is \(2 \cdot 3 \cdot 5^2\).

The LCM is obtained by taking the highest power of each prime that appears in the factorizations:

\[
\text{LCM}(100, 120, 150) = 2^3 \cdot 3 \cdot 5^2 = 600
\]

So, the departures rep
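
The cell above times model construction and generation together, so the checkpoint loading shown in the log is part of the reported figure. A minimal sketch of how the two phases could be timed separately is given below; `timed` and `timings` are hypothetical names introduced only for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

# Stand-in workload; in the notebook the two blocks would wrap
# TransformerSolver(model_name, device) and solver.solve(prompt, ...).
with timed("load", timings):
    data = list(range(100_000))
with timed("generate", timings):
    total = sum(data)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} seconds")
```

Keeping the measurements in a dictionary makes it easy to compare load time and generation time across models.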

In [5]:
# show answer
import textwrap

wrapped_text = textwrap.fill(response, width=100)
print(wrapped_text)

To solve this problem, we need to find the greatest positive integer \( d \) such that there will be
\( d \) consecutive days without a flight from Dodola island, regardless of the departure times of
the three airlines. This means we need to find the longest possible gap between the departures of
the three airlines.  The key is to find the least common multiple (LCM) of the three intervals (100,
120, and 150). The LCM of these intervals will give us the length of the cycle after which the
departures repeat. Once we have the LCM, we can determine the maximum gap within this cycle.  Let's
calculate the LCM of 100, 120, and 150:  1. The prime factorization of 100 is \(2^2 \cdot 5^2\). 2.
The prime factorization of 120 is \(2^3 \cdot 3 \cdot 5\). 3. The prime factorization of 150 is \(2
\cdot 3 \cdot 5^2\).  The LCM is obtained by taking the highest power of each prime that appears in
the factorizations:  \[ \text{LCM}(100, 120, 150) = 2^3 \cdot 3 \cdot 5^2 = 600 \]  So, the
departures rep
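
The response is cut off, but the LCM step it reaches can be checked directly, and the problem is small enough to explore by brute force. The sketch below is not the model's method: it assumes each airline departs on whole-numbered days, fixes the first airline's offset at 0 (shifting all schedules together does not change the gaps), and tries every offset for the other two. `longest_flight_free_run` and `guaranteed_run` are hypothetical helper names, and `math.lcm` with several arguments needs Python 3.9 or later.

```python
import math
from itertools import product

PERIODS = (100, 120, 150)
CYCLE = math.lcm(*PERIODS)  # the combined schedule repeats every 600 days


def longest_flight_free_run(offsets, periods=PERIODS):
    """Longest run of consecutive days without a flight for given start offsets."""
    cycle = math.lcm(*periods)
    days = sorted({
        (offset + k * period) % cycle
        for offset, period in zip(offsets, periods)
        for k in range(cycle // period)
    })
    # Gaps between consecutive departure days, including the wrap-around gap.
    gaps = [b - a for a, b in zip(days, days[1:] + [days[0] + cycle])]
    return max(gaps) - 1  # days strictly between two departures


def guaranteed_run(periods=PERIODS):
    """Smallest value of the longest flight-free run over all offset choices."""
    first, *rest = periods
    return min(
        longest_flight_free_run((0, *offsets), periods)
        for offsets in product(*(range(p) for p in rest))
    )


print(CYCLE)
print(longest_flight_free_run((0, 0, 0)))
print(guaranteed_run())
```

The value printed by `guaranteed_run()` can be compared against the `answer` column of `ref_data` for this problem.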

# Qwen/Qwen2.5-Math-72B-Instruct

<https://huggingface.co/Qwen/Qwen2.5-Math-72B-Instruct>

In [None]:
model_name = "Qwen/Qwen2.5-Math-72B-Instruct"
device = "cuda"  # the device to load the model onto
prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."
solver_Qwen_72B = TransformerSolver(model_name, device)
response = solver_Qwen_72B.solve(prompt, reasoning_type="CoT")
response