<a href="https://colab.research.google.com/github/TonyHanzhiSU/DL_Kaggle_Competition/blob/main/NYU_Kaggle_Competition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Kaggle Competition

This is the code for a Kaggle competition hosted by Professor Gustavo Sandoval for the NYU Deep Learning course. The goal is to fine-tune a pre-trained Llama 3.1 8B model to predict whether the answer given for a math problem is correct. The dataset is hosted on Hugging Face: [ad6398/nyu-dl-teach-maths-comp](https://huggingface.co/datasets/ad6398/nyu-dl-teach-maths-comp).

The code is adapted from the [official Unsloth implementation](https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing#scrollTo=MKX_XKs_BNZR).

Install Unsloth.

In [None]:
# %%capture
# This cell will take time
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

In [None]:
!pip install -U "huggingface_hub[cli]"  # provides the huggingface-cli command
!huggingface-cli login

In [None]:
import torch
print(torch.cuda.is_available())

True


In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048  # maximum sequence length in tokens
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
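The `dtype = None` comment above boils down to a GPU capability check: Ampere (compute capability 8.x) and newer cards support bfloat16, while the Tesla T4 (7.5) and V100 (7.0) fall back to float16. The sketch below is a simplified, hypothetical version of that rule (Unsloth's own auto-detection may differ in detail), plus a rough weight-memory estimate that motivates `load_in_4bit = True`. The 8.03B parameter count for Llama 3.1 8B is an assumption, and the estimate ignores activations, optimizer state and quantization constants.

```python
def pick_dtype(compute_capability: tuple) -> str:
    """Hypothetical helper: choose a training dtype from a GPU's (major, minor) compute capability."""
    # bfloat16 needs Ampere (8.x) or newer; T4 (7.5) and V100 (7.0) fall back to float16.
    return "bfloat16" if compute_capability >= (8, 0) else "float16"


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Weight-only memory estimate in GB; ignores activations, optimizer state and quantization constants."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8.03e9  # assumed parameter count of Llama 3.1 8B

# Compute capabilities of the GPUs mentioned in this notebook and in the dtype comment
gpus = {"A100": (8, 0), "L4": (8, 9), "T4": (7, 5), "V100": (7, 0)}
for name, capability in gpus.items():
    print(f"{name}: {pick_dtype(capability)}")

print(f"16-bit weights: ~{weight_memory_gb(N_PARAMS, 16):.1f} GB")
print(f"4-bit weights:  ~{weight_memory_gb(N_PARAMS, 4):.1f} GB")
```

The roughly 4x saving from 4-bit weights is what lets the 8B model fit comfortably on the L4 (22 GB) used later in this notebook.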


In [5]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit", # use the pre quantized model
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.0. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


KeyboardInterrupt: 

## Wrap the model with LoRA adapters

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3000,
    use_rslora = True,  # We support rank stabilized LoRA, can improve training stability and performance for specific tasks
    loftq_config = None, # And LoftQ
)

Unsloth 2024.11.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
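The training log further down reports 83,886,080 trainable parameters, and that number follows from the LoRA configuration above. Each adapted linear layer of shape `in_features -> out_features` gains two low-rank matrices, A (`r x in_features`) and B (`out_features x r`), i.e. `r * (in_features + out_features)` parameters. The sketch below assumes the standard Llama 3.1 8B dimensions (hidden size 4096, 8 key/value heads of size 128 so the k/v projections output 1024 features, MLP intermediate size 14336, 32 layers). It also shows the effect of `use_rslora = True`, which scales the adapter update by `lora_alpha / sqrt(r)` instead of `lora_alpha / r`.

```python
import math

HIDDEN = 4096          # assumed Llama 3.1 8B hidden size
KV_DIM = 8 * 128       # assumed 8 key/value heads x head dim 128
INTERMEDIATE = 14336   # assumed MLP intermediate size
N_LAYERS = 32
R = 32                 # LoRA rank used above
ALPHA = 32             # lora_alpha used above

# (in_features, out_features) for each module in target_modules
modules = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_f: int, out_f: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is r x in_f, B is out_f x r."""
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o, R) for i, o in modules.values())
total = per_layer * N_LAYERS
print(f"trainable LoRA parameters: {total:,}")

standard_scale = ALPHA / R            # classic LoRA scaling
rslora_scale = ALPHA / math.sqrt(R)   # rank-stabilized LoRA scaling
print(f"scaling: standard={standard_scale}, rsLoRA={rslora_scale:.3f}")
```

The computed total matches the logged count, which is a quick way to confirm that all seven projection types were adapted in all 32 layers. With `r = lora_alpha = 32`, rsLoRA makes the adapter update about 5.7 times larger than plain LoRA would.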


## Load the fine-tuned model from my Hugging Face account
Continue training from this checkpoint to improve the score.

In [3]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "tonysu/llama_var", # my fine-tuned LoRA checkpoint on Hugging Face
    #model_name = "Kaaay/DL_kaggle",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2024.11.7: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/55.4k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/340 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/336M [00:00<?, ?B/s]

Unsloth 2024.11.7 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


## Competition dataset

In [4]:
# download and load competition dataset

from datasets import load_dataset
dataset = load_dataset("ad6398/nyu-dl-teach-maths-comp")
# display the dataset structure
dataset

README.md:   0%|          | 0.00/2.09k [00:00<?, ?B/s]

train-00000-of-00002.parquet:   0%|          | 0.00/195M [00:00<?, ?B/s]

train-00001-of-00002.parquet:   0%|          | 0.00/195M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/3.65M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/10000 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['question', 'is_correct', 'answer', 'solution'],
        num_rows: 1000000
    })
    test: Dataset({
        features: ['question', 'is_correct', 'answer', 'solution'],
        num_rows: 10000
    })
})

In [5]:
prompt = """You are a highly skilled math verification assistant. Your task is to evaluate the correctness of the provided answer based on the given explanation.
Use the Explanation to analyze the logic step by step and determine if the answer is valid.
Respond with 'True' if the answer is logically and mathematically correct, otherwise respond with 'False'

### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""


prompt_2 = """You are a math assistant. Analyze the provided solution step by step to determine if the answer is correct.Respond with 'True' if the answer is logically and mathematically correct, otherwise respond with 'False'

### Question:
{}

### Answer:
{}

### Explanation:
{}

### Reasoning:
[Step 1] ...
[Step 2] ...
[Final Step] ...
Output: {}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    questions    = examples["question"]
    answers      = examples["answer"]
    labels       = examples["is_correct"]
    explanations = examples["solution"]
    texts = []
    for question, answer, explanation, label in zip(questions, answers, explanations, labels):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt_2.format(question, answer, explanation, label) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}
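`dataset.map(formatting_prompts_func, batched=True)` calls the function with a dict of column lists (one list per column) and expects a dict of equal-length lists back. The dependency-free sketch below reproduces that contract on a toy batch. The shortened template and the `<eos>` string are placeholders standing in for `prompt_2` and `tokenizer.eos_token`, and `format_batch` is a hypothetical name.

```python
# Shortened stand-in for prompt_2: four positional slots filled in order.
TEMPLATE = "### Question:\n{}\n\n### Answer:\n{}\n\n### Explanation:\n{}\n\nOutput: {}"
EOS = "<eos>"  # placeholder; the notebook appends tokenizer.eos_token


def format_batch(examples: dict) -> dict:
    """Turn a batch of columns into one training string per row."""
    texts = [
        TEMPLATE.format(question, answer, solution, label) + EOS
        for question, answer, solution, label in zip(
            examples["question"], examples["answer"], examples["solution"], examples["is_correct"]
        )
    ]
    return {"text": texts}


toy_batch = {
    "question": ["What is 1 + 1?", "What is 2 * 3?"],
    "answer": ["2", "5"],
    "solution": ["1 + 1 = 2", "2 * 3 = 6"],
    "is_correct": [True, False],  # str.format renders booleans as "True"/"False"
}
result = format_batch(toy_batch)
print(result["text"][1])
```

Because the boolean label is rendered with `str.format`, the target the model learns to emit is literally the word `True` or `False`, followed by the EOS token.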

In [8]:
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset  # only needed by the commented-out MathDataset below
import torch


# Load dataset from Hugging Face
dataset_name = "ad6398/nyu-dl-teach-maths-comp"
raw_dataset = load_dataset(dataset_name)
subset_dataset = raw_dataset["train"].select(range(420000, 470000))
train_dataset = subset_dataset.map(formatting_prompts_func, batched=True)



# Split the dataset into training and validation sets
train_indices, val_indices = train_test_split(
    list(range(len(train_dataset))),
    test_size=0.1,  # 10% for validation
    random_state=42  # Ensure reproducibility
)

# Select the training and validation subsets
train_subset = train_dataset.select(train_indices)
val_subset = train_dataset.select(val_indices)


# Custom Dataset class
# class MathDataset(Dataset):
#     def __init__(self, data, tokenizer, prompt_template, eos_token):
#         self.data = data
#         self.tokenizer = tokenizer
#         self.prompt_template = prompt_template
#         self.eos_token = eos_token

#     def __len__(self):
#         return len(self.data)

#     def __getitem__(self, idx):
#         row = self.data[idx]
#         question = row["question"]
#         answer = row["answer"]
#         solution = row["solution"]
#         is_correct = row["is_correct"]

#         formatted_text = self.prompt_template.format(
#             question, answer, solution, str(is_correct)
#         ) + self.eos_token

#         tokenized_text = self.tokenizer(
#             formatted_text,
#             truncation=True,
#             padding="max_length",
#             max_length=512,
#             return_tensors="pt"
#         )
#         label = -100 if answer == "" else is_correct
#         label_tensor = torch.tensor(label, dtype=torch.float)
#         return {
#             "input_ids": tokenized_text["input_ids"].squeeze(0),
#             "attention_mask": tokenized_text["attention_mask"].squeeze(0),
#             "labels": label_tensor
#         }

# Define the prompt template and EOS token
# prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not. Your response should be 'True' if correct, otherwise 'False'.
# Below is Provided Question and Answer. The Explanation is a detailed reasoning or solution that explains the answer, please use it to enhance the model’s understanding of the answer's correctness.

# ### Question:
# {}

# ### Answer:
# {}

# ### Explanation:
# {}

# ### Output:
# {}"""

# eos_token = tokenizer.eos_token


# Training arguments for SFTTrainer
training_args = TrainingArguments(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    num_train_epochs=1,
    #max_steps=800,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=4242,
    output_dir="outputs",
    report_to="none"
)
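The sizes in the training log below (45,000 examples, total batch size 32, 1,406 steps) can be checked by hand. `train_test_split` with `test_size=0.1` sends ceil(0.1 × 50,000) = 5,000 rows to validation, and a per-device batch of 8 with 4 gradient-accumulation steps gives an effective batch of 32. The sketch assumes the trainer performs one optimizer step per `gradient_accumulation_steps` micro-batches and rounds down, which reproduces the logged step count.

```python
import math

n_rows = 470_000 - 420_000          # rows taken with .select(range(420000, 470000))
n_val = math.ceil(0.1 * n_rows)     # sklearn rounds the test fraction up
n_train = n_rows - n_val

per_device_batch = 8
grad_accum = 4
effective_batch = per_device_batch * grad_accum

micro_batches = math.ceil(n_train / per_device_batch)  # the last partial batch is kept
optimizer_steps = micro_batches // grad_accum          # one optimizer step per 4 micro-batches

print(n_train, n_val, effective_batch, optimizer_steps)
```

With one epoch over 45,000 rows, the run therefore takes 1,406 optimizer updates; `val_subset` (5,000 rows) is held out but not used while `trainer.evaluate()` stays commented out.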



Map:   0%|          | 0/50000 [00:00<?, ? examples/s]

Check the available CPU RAM.

In [None]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 56.9 gigabytes of available RAM

You are using a high-RAM runtime!


## Supervised Fine-Tuning

In [9]:
for iteration in range(1):
  print(f"Starting Iteration {iteration + 1}")
  trainer = SFTTrainer(
      model = model,
      tokenizer = tokenizer,
      train_dataset = train_subset,
      dataset_text_field = "text",
      max_seq_length = max_seq_length,
      dataset_num_proc = 16, # for A100 settings
      packing = False, # Setting True can make training up to 5x faster for short sequences.
      args = training_args
  )
  trainer.train()
  trainer.save_model(f"outputs/iteration_{iteration + 1}")
  #eval_results = trainer.evaluate()
  #print(f"Evaluation Results for Iteration {iteration + 1}: {eval_results}")

Starting Iteration 1


Map (num_proc=16):   0%|          | 0/45000 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 45,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 8 | Gradient Accumulation steps = 4
\        /    Total batch size = 32 | Total steps = 1,406
 "-____-"     Number of trainable parameters = 83,886,080


| Step | Training Loss |
|-----:|--------------:|
| 1 | 0.3546 |
| 2 | 0.3596 |
| 3 | 0.4394 |
| 4 | 0.3726 |
| 5 | 0.3941 |
| 6 | 0.3981 |
| 7 | 0.3617 |
| 8 | 0.3932 |
| 9 | 0.3659 |
| 10 | 0.3613 |


In [None]:
iteration = 0
model_path = f"outputs/iteration_{iteration + 1}"

# Load the model using FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_path,  # Path to saved model
    max_seq_length=512,     # Set max sequence length
    dtype=None,           # Automatically detect fp16, bf16, or fp32 based on GPU support
    load_in_4bit=False      # False loads 16-bit weights; set True to load in 4-bit and reduce memory use
)


==((====))==  Unsloth 2024.11.6: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.9. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!




Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]



NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

## Test on training set

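Before running on the full 10k-row test set, evaluate the model on a random sample of the training set to gauge its accuracy. The sample may include rows the model was fine-tuned on, so this accuracy can be optimistic.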

Reload my fine-tuned model from Hugging Face.

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "tonysu/llama_var", # my fine-tuned LoRA checkpoint on Hugging Face
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = True,
)

==((====))==  Unsloth 2024.11.6: Fast Llama patching. Transformers = 4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.0. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/55.4k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/340 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/336M [00:00<?, ?B/s]

Unsloth 2024.11.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [12]:
import torch
from sklearn.metrics import accuracy_score
from tqdm import tqdm
from torch.utils.data import DataLoader, Dataset
import random
# Select a random subset of the training data for evaluation
# (no seed is set, so the subset differs between runs)
num_samples = 1000
train_dataset = dataset['train']
random_indices = random.sample(range(len(train_dataset)), num_samples)
eval_subset = train_dataset.select(random_indices)  # Adjust size as needed

# Prepare DataLoader for evaluation
class EvaluationDataset(Dataset):
    def __init__(self, data, tokenizer, prompt_template, eos_token):
        self.data = data
        self.tokenizer = tokenizer
        self.prompt_template = prompt_template
        self.eos_token = eos_token

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data[idx]
        question = row["question"]
        answer = row["answer"]
        explanation = row["solution"]
        label = row["is_correct"]  # True/False label

        # Format the input prompt
        input_prompt = self.prompt_template.format(
            question,
            answer,
            explanation,
            ""
        )

        # Tokenize the input
        tokenized_input = self.tokenizer(
            input_prompt,
            truncation=True,
            padding="max_length",
            max_length=512,
            return_tensors="pt"
        )

        return {
            "input_ids": tokenized_input["input_ids"].squeeze(0),
            "attention_mask": tokenized_input["attention_mask"].squeeze(0),
            "label": label  # True/False as target
        }

# Define the prompt template
prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not. Your response should be 'True' if correct, otherwise 'False'.
Below is Provided Question and Answer. The Explanation is a detailed reasoning or solution that explains the answer, generate "True" or "False" after Output without anything else.

### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""

eos_token = tokenizer.eos_token
eval_dataset = EvaluationDataset(eval_subset, tokenizer, prompt, eos_token)
eval_loader = DataLoader(eval_dataset, batch_size=16, shuffle=False)

# Model evaluation
FastLanguageModel.for_inference(model)
predictions = []
true_labels = []

def find_word_after_output(text):
    words = text.split()  # Split the text into a list of words
    for i, word in enumerate(words):
        if word.lower() == "output:":  # Check for "Output:"
            if i + 1 < len(words):  # Ensure there's a word after "Output:"
                return words[i + 1]
    return None  # Return None if "Output:" is not found or no word follows

with torch.no_grad():
    for batch in tqdm(eval_loader, desc="Evaluating"):
        inputs = {
            "input_ids": batch["input_ids"].to("cuda"),
            "attention_mask": batch["attention_mask"].to("cuda")
        }
        labels = batch["label"]

        # Generate predictions
        outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)

        # Decode and extract the first word
        for i, output in enumerate(outputs):
            response = tokenizer.decode(output, skip_special_tokens=True)
            #print(response)
            the_word = find_word_after_output(response)  # word right after "Output:" (expected True/False)
            #print(the_word)
            predictions.append(the_word in ("true", "True", "TRUE"))  # Convert to boolean
            true_labels.append(bool(labels[i]))  # plain bool instead of a 0-dim tensor
            #true_labels.append(labels[i].lower() == "true")  # Convert true label to boolean

# Calculate accuracy
print(predictions)
print(true_labels)
accuracy = accuracy_score(true_labels, predictions)
print(f"Accuracy on evaluation subset: {accuracy:.4f}")

Evaluating: 100%|██████████| 63/63 [04:38<00:00,  4.41s/it]

[False, True, False, True, False, True, False, True, False, False, False, True, False, True, True, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, True, True, True, True, False, True, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, True, False, False, False, True, False, True, False, False, False, False, False, True, False, False, False, False, False, False, True, True, False, True, True, True, True, True, False, False, False, True, False, False, False, False, False, False, False, True, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, True, True, False, True, False, True, False, True, False, True, False, False, False, False, False, False, True, F
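`find_word_after_output` scans the whole decoded sequence, prompt included, for the first `Output:` token. A solution text that itself contained `Output:` would therefore be read as the label, and a generation such as `True.` (with a trailing period) counts as False. A hypothetical, more defensive parser that looks only at the newly generated text is sketched below, together with a plain-Python accuracy function equivalent to `accuracy_score` for boolean labels. Returning `None` for unparseable generations lets you count them separately instead of silently scoring them as False.

```python
import re
from typing import Optional


def parse_label(generated: str) -> Optional[bool]:
    """Hypothetical helper: map the model's continuation to True/False, or None if neither word appears."""
    match = re.search(r"\b(true|false)\b", generated, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "true"


def accuracy(labels, preds) -> float:
    """Fraction of positions where the prediction equals the label."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


generations = [" True", "False.", "  true\n### Explanation: ...", "194\n\n### Explanation:"]
parsed = [parse_label(g) for g in generations]
print(parsed)

print(accuracy([True, False, False], [True, False, True]))
```

To apply it in the loop above, decode only `output[input_len:]` (the tokens after the prompt) before calling `parse_label`.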




## Batch Inference

Run batch inference on the 10k-row test set and save the predictions to a CSV file for submission.
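With `batch_size = 64`, the 10,000 test rows are split into ceil(10,000 / 64) = 157 batches, and the last batch holds only 16 rows because PyTorch's `DataLoader` keeps the final partial batch by default (`drop_last=False`). The stdlib-only sketch below mirrors that chunking with a hypothetical `batched` helper.

```python
import math


def batched(items, size):
    """Hypothetical helper: yield consecutive chunks of `size`, keeping the last partial chunk."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


test_ids = list(range(10_000))  # the test split has 10,000 rows
batch_size = 64
batches = list(batched(test_ids, batch_size))

print(len(batches), math.ceil(len(test_ids) / batch_size), len(batches[-1]))
```

Since every prompt is padded to 512 tokens, each of the 157 batches costs roughly the same, so the per-batch time in the progress bar is a fair guide to the total runtime.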

In [7]:
import pandas as pd
from torch.utils.data import DataLoader, Dataset
from tqdm import tqdm

# Load test dataset
test_dataset = dataset['test']

# Prepare the prompt
prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not. Your response should be 'True' if correct, otherwise 'False'.
Below is Provided Question and Answer. The Explanation is a detailed reasoning or solution that explains the answer, generate "True" or "False" after Output without anything else.

### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""
# Custom Dataset Class for Batch Inference
class MathInferenceDataset(Dataset):
    def __init__(self, data, tokenizer, prompt_template, eos_token):
        self.data = data
        self.tokenizer = tokenizer
        self.prompt_template = prompt_template
        self.eos_token = eos_token

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):

        row = self.data[idx]
        question = row["question"]
        answer = row["answer"]
        explanation = row["solution"]

        # Format the input prompt
        input_prompt = self.prompt_template.format(
            question,
            answer,
            explanation,
            ""  # Placeholder for model to generate
        )

        # Tokenize the input
        tokenized_input = self.tokenizer(
            input_prompt,
            truncation=True,
            padding="max_length",
            max_length=512,
            return_tensors="pt"
        )

        return {
            "input_ids": tokenized_input["input_ids"].squeeze(0),
            "attention_mask": tokenized_input["attention_mask"].squeeze(0),
            "id": int(idx)
        }

def find_word_after_output(text):
    words = text.split()  # Split the text into a list of words
    for i, word in enumerate(words):
        if word.lower() == "output:":  # Check for "Output:"
            if i + 1 < len(words):  # Ensure there's a word after "Output:"
                return words[i + 1]
    return None  # Return None if "Output:" is not found or no word follows

# Create DataLoader for batch inference
batch_size = 64  # adjust this based on your GPU memory
eos_token = tokenizer.eos_token
inference_dataset = MathInferenceDataset(test_dataset, tokenizer, prompt, eos_token)
inference_loader = DataLoader(inference_dataset, batch_size=batch_size, shuffle=False)

# Enable 2x faster inference
FastLanguageModel.for_inference(model)

# List to store predictions and IDs
predictions = []
ids = []

# Perform batch inference
# model.eval()
with torch.no_grad():
    for batch in tqdm(inference_loader, desc="Processing Batches", total=len(inference_loader)):
        inputs = {
            "input_ids": batch["input_ids"].to("cuda"),
            "attention_mask": batch["attention_mask"].to("cuda")
        }

        # Generate predictions
        outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
        #input_lengths = (inputs["input_ids"] != tokenizer.pad_token_id).sum(dim=1).tolist()



        for i, output in enumerate(outputs):
            #trimmed_output = output[input_lengths[i]:]  # Trim to get only the generated tokens
            response = tokenizer.decode(output, skip_special_tokens=True)
            #print(response)
            the_word = find_word_after_output(response)  # word right after "Output:" (expected True/False)
            #print(the_word)
            predictions.append(the_word in ("true", "True", "TRUE"))  # Convert to boolean
            ids.append(int(batch["id"][i]))  # Ensure id is Python int
# Prepare submission file
submission = pd.DataFrame({
    "ID": ids,
    "is_correct": predictions
})

submission.to_csv("submission.csv", index=False)
print("Submission file saved as submission.csv")

Processing Batches: 100%|██████████| 157/157 [1:27:35<00:00, 33.47s/it]

Submission file saved as submission.csv
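Before uploading, it can be worth checking that the CSV has exactly one row per test ID and only boolean labels. The sketch below uses the standard-library `csv` module on an in-memory file. The `ID`/`is_correct` column names come from the notebook; the checks themselves (and the hypothetical `check_submission` helper) are assumptions about what the competition expects.

```python
import csv
import io


def check_submission(f, expected_rows: int) -> None:
    """Hypothetical sanity check for a submission file with ID and is_correct columns."""
    reader = csv.DictReader(f)
    if reader.fieldnames != ["ID", "is_correct"]:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = list(reader)
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    ids = [int(row["ID"]) for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs")
    # pandas writes Python booleans as the strings "True"/"False"
    if any(row["is_correct"] not in ("True", "False") for row in rows):
        raise ValueError("non-boolean label")


example = io.StringIO("ID,is_correct\n0,True\n1,False\n2,True\n")
check_submission(example, expected_rows=3)
print("submission looks well-formed")
```

To check the real file, pass `open("submission.csv", newline="")` with `expected_rows=10_000` instead of the in-memory buffer.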





## Sample inference

Used to verify the model's output on a single example.

In [None]:
# Sample inference data point (row 123 of the test split)
test_dataset = dataset['test']

sample_ques = test_dataset['question'][123]
sample_ans = test_dataset['answer'][123]


In [None]:
sample_ques

'An apple tree produces 40 apples in its first year.  The second year the apple tree produces 8 more than double the amount of apples that it produced the first year, and the third year production went down by a fourth due to an insect infestation.  How many apples did the tree produce in total in the first three years?'
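Working the sample question by hand gives the label the model should ideally produce. The second year yields 8 more than double the first (2 × 40 + 8 = 88), and, assuming the decrease is measured against the second year, the third year is a quarter lower (88 - 88/4 = 66). The three-year total is 40 + 88 + 66 = 194, which matches the given answer, so a correct verifier should output `True` here.

```python
first_year = 40
second_year = 2 * first_year + 8              # 8 more than double the first year
third_year = second_year - second_year // 4   # down by a fourth (88 is divisible by 4)
total = first_year + second_year + third_year

given_answer = 194
print(total, total == given_answer)
```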

In [None]:
prompt = """You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not. Your response should be 'True' if correct, otherwise 'False'.
Below is Provided Question and Answer. The Explanation is a detailed reasoning or solution that explains the answer, please use it to enhance the model’s understanding of the answer's correctness.
Please generate your answer after Output and only input 'True' or 'False' without anything else.

### Question:
{}

### Answer:
{}

### Explanation:
{}

### Output:
{}"""

input_prompt = prompt.format(
    sample_ques,                    # question
    sample_ans,                     # given answer
    test_dataset['solution'][123],  # explanation
    "",                             # output: leave blank so the LLM generates True or False
)

print("Input Prompt:\n", input_prompt)

Input Prompt:
 You are a great mathematician and you are tasked with finding if an answer to a given maths question is correct or not. Your response should be 'True' if correct, otherwise 'False'.
Below is Provided Question and Answer. The Explanation is a detailed reasoning or solution that explains the answer, please use it to enhance the model’s understanding of the answer's correctness.
Please generate your answer after Output and only input 'True' or 'False' without anything else.

### Question:
An apple tree produces 40 apples in its first year.  The second year the apple tree produces 8 more than double the amount of apples that it produced the first year, and the third year production went down by a fourth due to an insect infestation.  How many apples did the tree produce in total in the first three years?

### Answer:
194

### Explanation:
Let's solve this problem using Python code.
<llm-code>
num_of_apples_first_year = 40
num_of_apples_second_year = 40 * 2 + 8
num_of_apples_t

In [None]:
# Running inference on single test
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
# input_prompt = prompt.format(
#         sample_ques, # ques
#         sample_ans, # given answer
#         "", # output - leave this blank for generation! LLM willl generate is it is True or False
#     )

inputs = tokenizer([input_prompt], return_tensors="pt").to("cuda")

# input_ids has shape (batch, seq_len); index 1 is the prompt length in tokens
input_token_len = inputs["input_ids"].shape[1]
outputs = model.generate(**inputs, max_new_tokens=64, use_cache=True)
# to get the whole text (prompt + generation), uncomment the line below
# text_generated = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# decode only the newly generated tokens
response = tokenizer.batch_decode([outputs[0][input_token_len:]], skip_special_tokens=True)
response

['194\n\n### Explanation:\n["The circle is inscribed in a triangle, and we know the sides of the triangle.\\nTo use the inradius formula, we need to know the area of the triangle.\\nWe can use Heron\'s formula to calculate the area.\\n<llm-code>\\nimport math\\n']

## Saving the model

Save the model locally and push it to Hugging Face.
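For a PEFT/LoRA model, `save_pretrained` writes the adapter weights (`adapter_model.safetensors`) rather than the full 4-bit base model. A back-of-the-envelope check is consistent with this: 83,886,080 trainable parameters stored as 32-bit floats take about 335.5 MB, in line with the 336M `adapter_model.safetensors` download shown when the checkpoint was loaded. The float32 storage dtype is an inference from that file size, not something the notebook states.

```python
trainable_params = 83_886_080   # from the training log
bytes_per_param = 4             # assumed float32 storage of the LoRA weights
size_bytes = trainable_params * bytes_per_param

print(f"{size_bytes:,} bytes ~ {size_bytes / 1e6:.1f} MB")
```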

In [None]:
model.save_pretrained("model_1114_v1")  # local save; writes only the LoRA adapter weights
tokenizer.save_pretrained("model_1114_v1")

('model_1114_v1/tokenizer_config.json',
 'model_1114_v1/special_tokens_map.json',
 'model_1114_v1/tokenizer.json')

In [None]:
model.save_pretrained("model_1114_v3")  # local save; writes only the LoRA adapter weights
tokenizer.save_pretrained("model_1114_v3")

('model_1114_v3/tokenizer_config.json',
 'model_1114_v3/special_tokens_map.json',
 'model_1114_v3/tokenizer.json')

In [13]:
# Define your repository on Hugging Face
repo_name = "tonysu/llama_var"


# Push the model to Hugging Face
model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

No files have been modified since last commit. Skipping to prevent empty commit.


Saved model to https://huggingface.co/tonysu/llama_var
