<a href="https://colab.research.google.com/github/Bryan-Az/Mathematics-LLM/blob/main/%5BEvaluation%2C_GGUF%2C_Quantization%5D_Mathematics_LLM_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Evaluating the Math Fine-Tuned SmolLM2-1.7B-Instruct Model (HuggingFaceTB, 'Education & Math' Pre-Trained) Against the Fine-Tuned Base Model

This notebook runs in a GPU environment on Google Colab. The pre-trained foundation model is pulled from a repository on HuggingFace. The model backbone was originally hosted by HuggingFaceTB; we fine-tuned it on our own dataset of math problems and uploaded the result to our own public model repo. Both fine-tuned LLaMA models, base and pre-trained, have been quantized for use with llama.cpp so that inference needs less memory.

## Imports and Installs

In [2]:
# log in to Hugging Face to use gated LLaMA foundation models
# assumes `hf_token` is already defined (it is read from Colab secrets in the imports cell below)
!huggingface-cli login --token $hf_token

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
The token `generaluse` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `generaluse`


In [5]:
%%capture
!pip install llama-cpp-python
!pip install datasets

In [12]:
from google.colab import userdata
from huggingface_hub import hf_hub_download
from llama_cpp import Llama
from torch.utils.data import Dataset as TorchDataset
from datasets import load_dataset
import pandas as pd
import re
import os
hf_token = userdata.get('HF_TOKEN')

## Loading the Quantized Models

In [32]:
finetuned_pretrained_model_repo = 'Alexis-Az/Math-Problem-LlaMA-3.2-1.7B-GGUF'
finetuned_base_model_repo = 'Alexis-Az/Math-Problem-LlaMA-3.2-1B-GGUF'
filename = 'unsloth.Q4_K_M.gguf'
filename_ftpt = 'unsloth_ftpt.Q4_K_M.gguf'
filename_ftb = 'unsloth_ftb.Q4_K_M.gguf'
max_seq_length = 4096

In [29]:
# Both repos ship the same filename, so each file is renamed right after
# download; otherwise the second download would overwrite the first.

# Fine-tuned base model
temp_path = hf_hub_download(repo_id=finetuned_base_model_repo, filename=filename, local_dir='.')
os.rename(temp_path, filename_ftb)

# Fine-tuned pre-trained model
temp_path = hf_hub_download(repo_id=finetuned_pretrained_model_repo, filename=filename, local_dir='.')
os.rename(temp_path, filename_ftpt)

unsloth.Q4_K_M.gguf:   0%|          | 0.00/955M [00:00<?, ?B/s]

unsloth.Q4_K_M.gguf:   0%|          | 0.00/1.11G [00:00<?, ?B/s]
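
The download sizes above give a rough sense of what the Q4_K_M quantization buys. The sketch below converts a GGUF file size into effective bits per weight. The parameter counts passed in are assumptions read off the model names (1B and 1.7B nominal), not measured values, and the progress-bar sizes are assumed to be in decimal units. A GGUF file also holds metadata and some higher-precision tensors, so the result is only approximate.

```python
def bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Effective bits stored per parameter for a model file of the given size."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return file_size_bytes * 8 / n_params


# File sizes as reported by the download progress bars (decimal units assumed).
# Parameter counts are nominal values taken from the model names, not measured.
ftb_bpw = bits_per_weight(955e6, 1.0e9)     # fine-tuned base model
ftpt_bpw = bits_per_weight(1.11e9, 1.7e9)   # fine-tuned pre-trained model

print(f"fine-tuned base:        {ftb_bpw:.2f} bits/weight")
print(f"fine-tuned pre-trained: {ftpt_bpw:.2f} bits/weight")
```

Under these assumptions both files come out well below the 16 bits per weight of an fp16 checkpoint, which is the memory saving the quantized files are meant to provide.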

In [34]:
%%capture
# n_ctx is llama-cpp-python's context-length parameter
finetuned_base_model = Llama(model_path=filename_ftb, n_ctx=max_seq_length)

In [35]:
%%capture
# n_ctx is llama-cpp-python's context-length parameter
finetuned_pretrained_model = Llama(model_path=filename_ftpt, n_ctx=max_seq_length)
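
The `max_seq_length` value defined above (4096 tokens) is meant as the context window, which has to hold both the prompt and the generated answer. A minimal sketch of that budget check follows. `max_new_tokens` is a hypothetical helper, and the token counts in the usage lines are made-up values for illustration, not measurements from these models.

```python
def max_new_tokens(context_length: int, prompt_tokens: int, requested: int) -> int:
    """Return how many tokens can be generated without exceeding the context window."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt alone does not fit in the context window")
    # Generation is capped by whatever room the prompt leaves in the window.
    return min(requested, context_length - prompt_tokens)


max_seq_length = 4096

# A short prompt leaves room for the full requested generation length.
short_budget = max_new_tokens(max_seq_length, prompt_tokens=300, requested=512)

# A long prompt leaves only the remainder of the window.
long_budget = max_new_tokens(max_seq_length, prompt_tokens=4000, requested=512)

print(short_budget, long_budget)
```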

## Loading the Math Related Evaluation Datasets

In [37]:
integration_dataset = "Alexis-Az/math_datasets"

### Addition Data

In [None]:
eval_df = pd.read_csv("training_data/addition_operations_eval.csv")

### Roots Data

In [38]:
# use the last 2,000 rows of the train split as the evaluation set, in random order
val_roots = load_dataset(integration_dataset, 'roots', split="train[-2000:]").shuffle()

README.md:   0%|          | 0.00/2.99k [00:00<?, ?B/s]

roots/Roots.csv:   0%|          | 0.00/5.31M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/10000 [00:00<?, ? examples/s]

### Derivatives Data

In [39]:
# use the last 2,000 rows of the train split as the evaluation set, in random order
val_derivs = load_dataset(integration_dataset, 'derivatives', split="train[-2000:]").shuffle()

derivatives/Derivatives.csv:   0%|          | 0.00/1.56M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/10000 [00:00<?, ? examples/s]
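
The `train[-2000:]` split string used for both subsets follows Python's negative-slice convention: on the 10,000-row train split reported in the output, it keeps rows 8000 through 9999. The sketch below mimics that selection on plain row indices and adds a seeded shuffle. The seed value is an arbitrary assumption, and the standard-library `random` module stands in for the `datasets` shuffle, so the resulting order will not match what `datasets` produces.

```python
import random


def tail_indices(n_rows: int, n_tail: int) -> list[int]:
    """Row indices selected by a `[-n_tail:]` slice over n_rows rows."""
    return list(range(n_rows))[-n_tail:]


def seeded_shuffle(indices: list[int], seed: int) -> list[int]:
    """Return a shuffled copy; the same seed always gives the same order."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return shuffled


val_idx = tail_indices(10_000, 2_000)
print(val_idx[0], val_idx[-1], len(val_idx))

# Shuffling twice with the same (arbitrary) seed gives the same order.
first = seeded_shuffle(val_idx, seed=42)
second = seeded_shuffle(val_idx, seed=42)
print(first == second)
```

`Dataset.shuffle` in `datasets` accepts a `seed` argument as well, which would make the evaluation order repeatable across runs.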