# Math Question Answer Verification Competition

## Starter Code

Borrowed from the [official Unsloth implementation](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing#scrollTo=MKX_XKs_BNZR).

In [2]:
# %%capture
# This cell will take time
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Found existing installation: unsloth 2024.11.7
Uninstalling unsloth-2024.11.7:
  Successfully uninstalled unsloth-2024.11.7
Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-x0xjfrko/unsloth_7a2028707b0e4c33951c98824b0d3d27
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-x0xjfrko/unsloth_7a2028707b0e4c33951c98824b0d3d27
  Resolved https://github.com/unslothai/unsloth.git to commit f26d4e739ed507de7a9088da53d10fd02f58d160
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2024.11.7-py3-none-a

In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Load model and wrap with LoRA adapters

In [5]:
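model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any value, but 0 is optimized
    bias = "none",    # Supports any value, but "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,   # Rank-stabilized LoRA
    loftq_config = None, # LoftQ is not used
)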

Unsloth 2024.11.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
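
With `use_rslora = True`, the low-rank update is scaled by `lora_alpha / sqrt(r)` rather than the classic `lora_alpha / r`. The sketch below only illustrates that arithmetic for the values used above. The helper name `lora_scaling` is hypothetical and is not part of Unsloth or PEFT.

```python
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Return the factor that multiplies the low-rank update B @ A."""
    if use_rslora:
        # Rank-stabilized LoRA divides by sqrt(r) instead of r
        return lora_alpha / math.sqrt(r)
    # Classic LoRA scaling
    return lora_alpha / r

standard = lora_scaling(32, 32, use_rslora=False)
rank_stabilized = lora_scaling(32, 32, use_rslora=True)
print(f"standard LoRA scaling: {standard:.3f}")
print(f"rsLoRA scaling:        {rank_stabilized:.3f}")
```

With `r = 32` and `lora_alpha = 32`, the rank-stabilized factor is noticeably larger than the classic one, which is worth keeping in mind when choosing the learning rate.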


## Competition dataset

In [6]:
# download and load competition dataset

from datasets import load_dataset
dataset = load_dataset("ad6398/nyu-dl-teach-maths-comp")
# print and see dataset
dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'is_correct', 'answer', 'solution'],
        num_rows: 1000000
    })
    test: Dataset({
        features: ['question', 'is_correct', 'answer', 'solution'],
        num_rows: 10000
    })
})

In [7]:
prompt = """As a skilled mathematician, your job is to check the accuracy of a given answer to a math problem. Analyze the question and solution carefully. If the answer is correct, respond with 'True'; if it is incorrect, respond with 'False.' Your response should only be 'True' or 'False.' Here is the math question and the provided answer.



### Question:
{}

### Answer:
{}

### Explainaition
{}
### Output:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN, otherwise generation may never stop
def formatting_prompts_func(examples):
    questions = examples["question"]
    answers   = examples["answer"]
    solutions = examples["solution"]
    labels    = examples["is_correct"]
    texts = []
    for question, answer, solution, label in zip(questions, answers, solutions, labels):
        # Fill the template and append EOS so the model learns where the answer ends
        text = prompt.format(question, answer, solution, label) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
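
The same template is used in two ways: at training time the last slot holds the label followed by EOS, and at inference time it is left empty so the model completes it. The sketch below shows the difference on a toy example. `TEMPLATE` is a shortened stand-in for the notebook's prompt and `EOS` is a placeholder string; both are assumptions for illustration, as are the two helper names.

```python
# Shortened stand-in for the notebook's template (hypothetical, for illustration only)
TEMPLATE = "### Question:\n{}\n\n### Answer:\n{}\n\n### Explanation:\n{}\n### Output:\n{}"
EOS = "</s>"  # placeholder; the real value comes from tokenizer.eos_token

def build_training_text(question, answer, solution, label):
    # Training text: the label fills the last slot and EOS marks the end
    return TEMPLATE.format(question, answer, solution, label) + EOS

def build_inference_prompt(question, answer, solution):
    # Inference prompt: the last slot stays empty for the model to fill in
    return TEMPLATE.format(question, answer, solution, "")

train_text = build_training_text("What is 2 + 2?", "4", "2 + 2 = 4.", True)
infer_prompt = build_inference_prompt("What is 2 + 2?", "4", "2 + 2 = 4.")
print(train_text)
print("---")
print(infer_prompt)
```

Because the inference prompt is an exact prefix of the training text, the first generated tokens should be the `True` or `False` label the model saw during fine-tuning.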




In [8]:
# Process the training dataset and generate prompt for each datapoint

train_dataset = dataset['train'].map(formatting_prompts_func, batched = True,)

In [9]:
# Randomly sample 30,000 examples from the training dataset
train_sample = train_dataset.shuffle(seed=42).select(range(30000))

# The "text" column was already added in the previous cell,
# so the sample does not need a second formatting pass.

In [10]:
# Print a sample formatted training example
print(train_sample['text'][0])

As a skilled mathematician, your job is to check the accuracy of a given answer to a math problem. Analyze the question and solution carefully. If the answer is correct, respond with 'True'; if it is incorrect, respond with 'False.' Your response should only be 'True' or 'False.' Here is the math question and the provided answer.



### Question:
A line is parameterized by
\[\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} + t \begin{pmatrix} -1 \\ 5 \end{pmatrix}.\]A second line is parameterized by
\[\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 7 \end{pmatrix} + u \begin{pmatrix} -1 \\ 4 \end{pmatrix}.\]Find the point where the lines intersect.

### Answer:
(2/3,4/3)

### Explainaition
First, we need to solve the system of equations
\[
\begin{aligned}
2 - t &= s\\
3 + 5t &= 7 + 4s
\end{aligned}
\]
by eliminating s.
We'll use sympy.
<llm-code>
from sympy import symbols, solve

# define the variables
t, s = symbols('t s')

# define the equations

## SFT

In [11]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    per_device_train_batch_size = 4,
    gradient_accumulation_steps = 4, # Effective batch size = 4 * 4 = 16
    warmup_steps = 5,
    # num_train_epochs = 1, # Set this instead of max_steps for one full pass over the data
    max_steps = 1300,
    learning_rate = 2e-4,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none", # Disables logging integrations; set to e.g. "wandb" to enable one
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_sample,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 4,
    packing = False, # Can make training 5x faster for short sequences.
    args = training_args
)

max_steps is given, it will override any value given in num_train_epochs
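
To see what `lr_scheduler_type = "cosine"` with `warmup_steps = 5` and `max_steps = 1300` implies, the sketch below re-implements a linear-warmup, half-cosine-decay rule in plain Python. It is meant to mirror the usual cosine-with-warmup schedule, not to reproduce the trainer's internals exactly. The function name `lr_at_step` is hypothetical.

```python
import math

def lr_at_step(step, base_lr=2e-4, warmup_steps=5, total_steps=1300):
    """Approximate learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 1, 5, 650, 1300):
    print(step, f"{lr_at_step(step):.2e}")
```

Under this rule the learning rate peaks at step 5 and decays smoothly to zero at step 1,300.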


In [12]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 30,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 1,300
 "-____-"     Number of trainable parameters = 83,886,080


Step,Training Loss
1,1.5467
2,1.5467
3,1.509
4,1.0893
5,0.901
6,0.8642
7,0.6315
8,0.7041
9,0.638
10,0.7077
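
The trainable-parameter count in the training banner can be checked by hand: a LoRA adapter on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` parameters. The sketch below does this for the seven target modules. The layer dimensions are assumptions taken from the published Llama 3.1 8B configuration and are not stated in this notebook.

```python
# Assumed Llama 3.1 8B dimensions (not stated in the notebook)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024        # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
R = 32               # LoRA rank used above

# (in_features, out_features) of each adapted projection
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r):
    # A is (r x in_features), B is (out_features x r)
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"trainable LoRA parameters: {total:,}")

# Share of the 30,000-example sample seen in 1,300 steps at batch size 16
examples_seen = 1300 * 4 * 4
print(f"examples seen: {examples_seen:,} ({examples_seen / 30000:.2f} of the sample)")
```

Under these assumptions the total matches the 83,886,080 reported in the banner. The second figure shows that 1,300 steps cover fewer than 30,000 examples, so the run ends before completing a full pass over the sample.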



## Inference

In [13]:
# Sample inference data point
test_dataset = dataset['test']

sample_ques = test_dataset['question'][90]
sample_ans = test_dataset['answer'][90]
sample_exp = test_dataset['solution'][90]


In [14]:
# Run inference on a single test example
FastLanguageModel.for_inference(model)
input_prompt = prompt.format(
        sample_ques, # question
        sample_ans, # given answer
        sample_exp, # explanation (solution)
        "", # output left blank for the model to fill in
    )

print("Input Prompt:\n", input_prompt)
inputs = tokenizer(
[
    input_prompt
], return_tensors = "pt").to("cuda")

input_shape = inputs['input_ids'].shape
input_token_len = input_shape[1]
outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.batch_decode([outputs[0][input_token_len:]], skip_special_tokens=True)
response

Input Prompt:
 As a skilled mathematician, your job is to check the accuracy of a given answer to a math problem. Analyze the question and solution carefully. If the answer is correct, respond with 'True'; if it is incorrect, respond with 'False.' Your response should only be 'True' or 'False.' Here is the math question and the provided answer.



### Question:
Find the number of functions $f : \mathbb{R} \to \mathbb{R}$ such that
\[f(x + f(y)) = x + y\]for all real numbers $x$ and $y.$

### Answer:
1

### Explainaition
If $x = -f(y)$, we get $f(-f(y)) = f(-f(y)) + y$, so $y = f(0)$.
So $f(x) = x - f(0)$.
Plugging this into the original equation gives us $(x - f(0) + y - f(0)) = x + y$, so $f(0) = 0$ and $f(x) = x$.
The given equation is satisfied and we get $\boxed{1}$ solution.
### Output:



['True']

In [15]:

test_dataset = dataset['test']
preds = []

# Enable Unsloth's faster inference mode once, outside the loop
FastLanguageModel.for_inference(model)

# Iterate over rows directly instead of re-reading whole columns on every step
for i, sample in enumerate(test_dataset):
  input_prompt = prompt.format(
          sample['question'], # question
          sample['answer'],   # given answer
          sample['solution'], # explanation (solution)
          "",                 # output left blank for the model to fill in
      )

  inputs = tokenizer([input_prompt], return_tensors = "pt").to("cuda")
  input_token_len = inputs['input_ids'].shape[1]
  outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)

  # Decode only the newly generated tokens, skipping the prompt
  response = tokenizer.batch_decode([outputs[0][input_token_len:]], skip_special_tokens=True)
  preds.append(response[0])
  print(i)

Streaming output truncated to the last 5000 lines.
5000
5001
...
5186


In [None]:
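import pandas as pd
vals = pd.Series(preds)
def process_data(text):
  # Only the first line of the generation is expected to hold the verdict
  first_line = text.split('\n')[0]
  if "True" in first_line:
    return "True"
  elif "False" in first_line:
    return "False"
  # Neither label found: return None explicitly so the gap is visible below
  return None

nvals = vals.apply(process_data)
# Include missing values in the counts to spot unparsed generations
print(nvals.value_counts(dropna=False))
nvals_arr = nvals.to_list()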
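
The parsing step can be checked without a GPU by running it on a few hand-written generations. The sketch below is a standalone variant of the logic above that returns booleans and adds a simple accuracy helper for use against a labelled hold-out split. `parse_label` and `accuracy` are hypothetical names, and the sample strings are made up for illustration.

```python
def parse_label(text, default=None):
    """Map a raw generation to True/False using its first non-empty line."""
    lines = text.strip().splitlines()
    first_line = lines[0] if lines else ""
    if "True" in first_line:
        return True
    if "False" in first_line:
        return False
    # No recognizable label on the first line
    return default

def accuracy(pred_texts, gold_labels):
    """Fraction of generations whose parsed label equals the gold label."""
    parsed = [parse_label(t) for t in pred_texts]
    correct = sum(p == g for p, g in zip(parsed, gold_labels))
    return correct / len(gold_labels)

samples = ["True", "False\n### Question:", "\nTrue", "I am not sure"]
print([parse_label(s) for s in samples])
print(accuracy(samples, [True, False, False, True]))
```

Unparsed generations count as wrong in `accuracy`, which keeps the metric conservative. Passing a `default` to `parse_label` is one way to force a label when a submission cannot contain missing values.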

In [None]:
# Build the submission CSV
import pandas as pd

df = pd.DataFrame({
    'ID': range(len(preds)),    # Row index of each test example
    'is_correct': nvals_arr     # Parsed 'True'/'False' prediction
})

# Save to csv file
df.to_csv("/content/submission_1.csv", index=False)

## Saving model

In [21]:
model.save_pretrained("/content/lora_model")
tokenizer.save_pretrained("/content/lora_model")

('/content/lora_model/tokenizer_config.json',
 '/content/lora_model/special_tokens_map.json',
 '/content/lora_model/tokenizer.json')

In [22]:
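# Reload the saved LoRA adapters for inference (set to False to skip)
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/content/lora_model", # Directory written by save_pretrained above
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable faster inference mode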


==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
