To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

**Read our [Gemma 3 blog](https://unsloth.ai/blog/gemma3) for what's new in Unsloth and our [Reasoning blog](https://unsloth.ai/blog/r1-reasoning) on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    # [NOTE] Colab only! Outside Colab, use `pip install unsloth vllm` instead.
    !pip install --no-deps unsloth vllm

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#@title Colab Extra Install { display-mode: "form" }
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth vllm
else:
    !pip install --no-deps unsloth vllm
    # [NOTE] The steps below are for Colab only; elsewhere, `pip install unsloth vllm` is enough.
    # Skip restarting message in Colab
    import sys, re, requests; modules = list(sys.modules.keys())
    for x in modules: sys.modules.pop(x) if "PIL" in x or "google" in x else None
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft "trl==0.15.2" triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer

    # vLLM requirements - vLLM breaks Colab due to reinstalling numpy
    f = requests.get("https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt").content
    with open("vllm_requirements.txt", "wb") as file:
        file.write(re.sub(rb"(transformers|numpy|xformers)[^\n]{1,}\n", b"", f))
    !pip install -r vllm_requirements.txt

### Unsloth

Load up `Qwen 2.5 3B Instruct` and set the parameters.

In [None]:
from unsloth import FastLanguageModel, is_bfloat16_supported
import torch
max_seq_length = 1024 # Can increase for longer reasoning traces
lora_rank = 64 # Larger rank = smarter, but slower

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-3B-Instruct",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    fast_inference = True, # Enable vLLM fast inference
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.5, # Reduce if out of memory
)

model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ], # Remove QKVO if out of memory
    lora_alpha = lora_rank,
    use_gradient_checkpointing = "unsloth", # Enable long context finetuning
    random_state = 3407,
)
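For a rough sense of what `lora_rank` and `target_modules` cost: LoRA attaches two low-rank matrices, A (r × d_in) and B (d_out × r), to each adapted linear layer, so every such layer adds r·(d_in + d_out) trainable parameters. The sketch below tallies this. The Qwen2.5-3B shapes it uses (hidden size 2048, MLP size 11008, K/V projection width 256, 36 layers) are assumptions based on the model's published config; check them against `model.config` before relying on the numbers.

```python
# Rough count of trainable LoRA parameters for a given rank and set of target modules.
# ASSUMPTION: the shapes below mirror Qwen2.5-3B's config; verify against model.config.
HIDDEN = 2048         # assumed hidden_size
INTERMEDIATE = 11008  # assumed intermediate_size
KV_WIDTH = 256        # assumed num_key_value_heads * head_dim (2 * 128)
NUM_LAYERS = 36       # assumed num_hidden_layers

# (d_in, d_out) for each projection listed in target_modules.
QWEN25_3B_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_WIDTH),
    "v_proj": (HIDDEN, KV_WIDTH),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank: int, modules: list, shapes=QWEN25_3B_SHAPES, layers=NUM_LAYERS) -> int:
    """Each adapted linear layer gains A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * layers

all_modules = list(QWEN25_3B_SHAPES)
mlp_only = ["gate_proj", "up_proj", "down_proj"]
for r in (16, 64):
    print(f"r={r}: all modules {lora_param_count(r, all_modules):,} | MLP only {lora_param_count(r, mlp_only):,}")
```

Under these assumed shapes, dropping the attention projections (the `# Remove QKVO if out of memory` hint) removes only about a quarter of the adapter parameters, because the MLP projections dominate.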

### Data Prep
<a name="Data"></a>

We directly reuse [@willccbb](https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb)'s code for data prep and all reward functions. You are free to create your own!

In [None]:
import re
from datasets import load_dataset, Dataset

# Load and prep dataset
SYSTEM_PROMPT = """
Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
...
</answer>
"""

XML_COT_FORMAT = """\
<reasoning>
{reasoning}
</reasoning>
<answer>
{answer}
</answer>
"""

def extract_xml_answer(text: str) -> str:
    answer = text.split("<answer>")[-1]
    answer = answer.split("</answer>")[0]
    return answer.strip()

def extract_hash_answer(text: str) -> str | None:
    if "####" not in text:
        return None
    return text.split("####")[1].strip()

# For 1-shot prompting, add an example user/assistant exchange (answer formatted with XML_COT_FORMAT) between the system and user messages
def get_gsm8k_questions(split = "train") -> Dataset:
    data = load_dataset('openai/gsm8k', 'main')[split] # type: ignore
    data = data.map(lambda x: { # type: ignore
        'prompt': [
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': x['question']}
        ],
        'answer': extract_hash_answer(x['answer'])
    }) # type: ignore
    return data # type: ignore

dataset = get_gsm8k_questions()

# Reward functions
def correctness_reward_func(prompts, completions, answer, **kwargs) -> list[float]:
    responses = [completion[0]['content'] for completion in completions]
    q = prompts[0][-1]['content']
    extracted_responses = [extract_xml_answer(r) for r in responses]
    print('-'*20, f"Question:\n{q}", f"\nAnswer:\n{answer[0]}", f"\nResponse:\n{responses[0]}", f"\nExtracted:\n{extracted_responses[0]}")
    return [2.0 if r == a else 0.0 for r, a in zip(extracted_responses, answer)]

def int_reward_func(completions, **kwargs) -> list[float]:
    responses = [completion[0]['content'] for completion in completions]
    extracted_responses = [extract_xml_answer(r) for r in responses]
    return [0.5 if r.isdigit() else 0.0 for r in extracted_responses]

def strict_format_reward_func(completions, **kwargs) -> list[float]:
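    """Reward 0.5 if the completion matches the exact <reasoning>/<answer> layout, newlines included."""
    # Without re.DOTALL, `.` does not match "\n", so each tag body must fit on a single line.
    pattern = r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\n$"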
    responses = [completion[0]["content"] for completion in completions]
    matches = [re.match(pattern, r) for r in responses]
    return [0.5 if match else 0.0 for match in matches]

def soft_format_reward_func(completions, **kwargs) -> list[float]:
    """Reward 0.5 if the completion starts with a <reasoning> block followed by an <answer> block (line breaks allowed)."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    responses = [completion[0]["content"] for completion in completions]
    matches = [re.match(pattern, r, flags=re.DOTALL) for r in responses]
    return [0.5 if match else 0.0 for match in matches]

def count_xml(text) -> float:
    count = 0.0
    if text.count("<reasoning>\n") == 1:
        count += 0.125
    if text.count("\n</reasoning>\n") == 1:
        count += 0.125
    if text.count("\n<answer>\n") == 1:
        count += 0.125
        count -= len(text.split("\n</answer>\n")[-1])*0.001
    if text.count("\n</answer>") == 1:
        count += 0.125
        count -= (len(text.split("\n</answer>")[-1]) - 1)*0.001
    return count

def xmlcount_reward_func(completions, **kwargs) -> list[float]:
    contents = [completion[0]["content"] for completion in completions]
    return [count_xml(c) for c in contents]
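To see how the five rewards add up, the sketch below restates their scoring rules for a single completion and scores two examples. The sample completions and the gold answer `"7"` are made up; the total assumes the trainer sums the per-function rewards with equal weight.

```python
import re

def extract_xml_answer(text: str) -> str:
    # Same rule as above: text between the last <answer> and the following </answer>.
    return text.split("<answer>")[-1].split("</answer>")[0].strip()

def count_xml(text: str) -> float:
    # Same partial credit as above: 0.125 per well-placed tag, minus a tiny trailing-text penalty.
    count = 0.0
    if text.count("<reasoning>\n") == 1:
        count += 0.125
    if text.count("\n</reasoning>\n") == 1:
        count += 0.125
    if text.count("\n<answer>\n") == 1:
        count += 0.125
        count -= len(text.split("\n</answer>\n")[-1]) * 0.001
    if text.count("\n</answer>") == 1:
        count += 0.125
        count -= (len(text.split("\n</answer>")[-1]) - 1) * 0.001
    return count

STRICT = r"^<reasoning>\n.*?\n</reasoning>\n<answer>\n.*?\n</answer>\n$"
SOFT = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"

def score_completion(text: str, gold: str) -> dict:
    """Per-function rewards for one completion, mirroring the five reward functions above."""
    extracted = extract_xml_answer(text)
    return {
        "xmlcount": count_xml(text),
        "soft_format": 0.5 if re.match(SOFT, text, flags=re.DOTALL) else 0.0,
        "strict_format": 0.5 if re.match(STRICT, text) else 0.0,
        "int": 0.5 if extracted.isdigit() else 0.0,
        "correctness": 2.0 if extracted == gold else 0.0,
    }

good = "<reasoning>\n3 + 4 = 7\n</reasoning>\n<answer>\n7\n</answer>\n"
bad = "The answer is 7"
for name, text in [("well-formed", good), ("no tags", bad)]:
    scores = score_completion(text, gold="7")
    print(name, scores, "total =", sum(scores.values()))
```

A correct, well-formatted completion earns 4.0, and the 2.0 correctness term dominates; the format terms mainly give partial credit early in training, before the answers are right.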

<a name="Train"></a>
### Train the model

Now set up GRPO Trainer and all configurations!

In [None]:
from trl import GRPOConfig, GRPOTrainer
training_args = GRPOConfig(
    use_vllm = True, # use vLLM for fast inference!
    learning_rate = 5e-6,
    adam_beta1 = 0.9,
    adam_beta2 = 0.99,
    weight_decay = 0.1,
    warmup_ratio = 0.1,
    lr_scheduler_type = "cosine",
    optim = "adamw_8bit",
    logging_steps = 1,
    bf16 = is_bfloat16_supported(),
    fp16 = not is_bfloat16_supported(),
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 1, # Increase to 4 for smoother training
    num_generations = 8, # Decrease if out of memory
    max_prompt_length = 256,
    max_completion_length = 200,
    # num_train_epochs = 1, # Set to 1 for a full training run
    max_steps = 250,
    save_steps = 250,
    max_grad_norm = 0.1,
    report_to = "none", # Can use Weights & Biases
    output_dir = "outputs",
)

And let's run the trainer! Once training starts, the cell output shows a table of rewards. The goal is to see the `reward` column increase!

You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient!

| Step | Training Loss | reward    | reward_std | completion_length | kl       |
|------|---------------|-----------|------------|-------------------|----------|
| 1    | 0.000000      | 0.125000  | 0.000000   | 200.000000        | 0.000000 |
| 2    | 0.000000      | 0.072375  | 0.248112   | 200.000000        | 0.000000 |
| 3    | 0.000000      | -0.079000 | 0.163776   | 182.500000        | 0.000005 |
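The `reward_std` column is worth watching alongside `reward`. GRPO scores each group of `num_generations` completions for the same prompt and normalizes every reward against its own group. A `reward_std` of 0, as in step 1 above, means the completions in each group all received the same reward, so their advantages are zero and only the KL term is left in the update. The sketch below shows this normalization in plain Python; the `eps` value and the use of the sample standard deviation reflect our reading of TRL's `GRPOTrainer` and should be treated as assumptions.

```python
from statistics import mean, stdev

def group_advantages(rewards: list, eps: float = 1e-4) -> list:
    """Normalize one prompt's rewards against its group: (r - mean) / (std + eps).

    ASSUMPTION: sample (n - 1) standard deviation and eps = 1e-4, as we understand
    TRL's GRPOTrainer to use; other implementations may differ.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Eight completions for one prompt (num_generations = 8); the reward values are made up.
mixed = [0.0, 0.5, 0.0, 4.0, 0.5, 0.0, 0.0, 2.5]
uniform = [0.125] * 8  # every completion scored the same, like step 1 in the table

print([round(a, 2) for a in group_advantages(mixed)])
print(group_advantages(uniform))  # all zeros: no reward signal from this prompt
```

Completions that beat their group average get positive advantages and are reinforced; this is why a few lucky correct answers within a group are what eventually moves the `reward` column.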


In [None]:
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [
        xmlcount_reward_func,
        soft_format_reward_func,
        strict_format_reward_func,
        int_reward_func,
        correctness_reward_func,
    ],
    args = training_args,
    train_dataset = dataset,
)
trainer.train()

In [None]:
import pandas as pd

log_df = pd.DataFrame(trainer.state.log_history)
log_df.to_csv("grpo_training_rewards.csv", index=False)
log_df.tail()

In [None]:
import matplotlib.pyplot as plt

reward_cols = [c for c in log_df.columns if "reward" in c and "std" not in c]
plt.figure(figsize=(12,6))
for col in reward_cols:
    plt.plot(log_df["step"], log_df[col], label=col.replace("rewards/", ""))
plt.xlabel("Training Step")
plt.ylabel("Reward")
plt.title("GRPO Training Reward Trace")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

<a name="Inference"></a>
### Inference
Now let's try the model we just trained! First, let's try the model without the GRPO-trained LoRA:

In [None]:
text = tokenizer.apply_chat_template([
    {"role" : "user", "content" : "How many r's are in strawberry?"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    [text],
    sampling_params = sampling_params,
    lora_request = None,
)[0].outputs[0].text

output
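`SamplingParams(temperature = 0.8, top_p = 0.95)` controls how each next token is drawn. Conceptually, temperature divides the logits before the softmax (values below 1 sharpen the distribution), and top-p (nucleus) sampling keeps only the smallest set of most likely tokens whose cumulative probability reaches `top_p`, renormalizes, and samples from that set. The sketch below applies both steps to a made-up logit vector; it illustrates the idea and is not vLLM's actual implementation.

```python
import math
import random

def temperature_top_p(logits: dict, temperature: float, top_p: float) -> dict:
    """Return the renormalized distribution that nucleus sampling draws from."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Softmax, shifted by the max logit for numerical stability.
    m = max(scaled.values())
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((tok, e / z) for tok, e in exps.items()), key=lambda kv: kv[1], reverse=True)
    # Keep the smallest prefix of tokens whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

toy_logits = {"3": 4.0, "2": 2.5, "three": 1.0, "4": -1.0}  # made-up logits
nucleus = temperature_top_p(toy_logits, temperature=0.8, top_p=0.95)
print(nucleus)
rng = random.Random(0)
print(rng.choices(list(nucleus), weights=list(nucleus.values()), k=5))
```

The evaluation cells further down use `temperature=0.0`, which vLLM treats as greedy decoding (always the most likely token); the division above does not handle that case.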

Now let's try the LoRA we just trained with GRPO. First, we save the LoRA:

In [None]:
model.save_lora("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_saved_lora")

Now we load the LoRA and test:

In [None]:
text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : "How many r's are in strawberry?"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    lora_request = model.load_lora("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_saved_lora"),
)[0].outputs[0].text

output

Our reasoning model is much better. It's not always correct, since we only trained it for an hour or so; extending the sequence length and training for longer should improve it further!

### MMLU

In [None]:
import os, re, time, string
import pandas as pd
from vllm import SamplingParams
from google.colab import drive

drive.mount('/content/drive')
DATA_DIR = "/content/drive/MyDrive/ECE 661 FINAL PROJECT/mmlu_val"

In [None]:
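def extract_xml_answer(output: str) -> str:
    """Return the choice letter (A-D) from the last <answer>...</answer> block, or "" if none is found."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", output, re.DOTALL)
    if not matches:
        return ""
    tokens = matches[-1].strip().split()
    if not tokens:
        return ""
    # Accept answers such as "B", "B.", "(B)" or "B) 42". Only an uppercase choice letter
    # counts, so a lowercase article such as "a ..." is not mistaken for option A, while a
    # genuine "A" answer is kept.
    letter = tokens[0].strip(string.punctuation)
    return letter if letter in {"A", "B", "C", "D"} else ""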

In [None]:
def evaluate_model_on_mmlu(model, tokenizer, data, lora_request=None, model_name="model", reasoning_format=True):
    import time
    import re
    import string
    from tqdm import tqdm
    from vllm import SamplingParams

    sampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=1024)

    system_prompt_reasoning = """Respond in the following format:
<reasoning>
Your reasoning here...
</reasoning>
<answer>
The final answer (A letter only, no explanation)
</answer>"""
    system_prompt_plain = """Respond in the following format:
<answer>
The final answer (A letter only, no explanation)
</answer>"""

    system_prompt = system_prompt_reasoning if reasoning_format else system_prompt_plain
    extract_answer = extract_xml_answer

    results = []
    correct = 0

    print(f"\n🔍 Running Evaluation: {model_name} | Reasoning = {reasoning_format}")
    for idx, (_, row) in enumerate(tqdm(data.iterrows(), total=len(data), desc=f"[{model_name}] Evaluating")):
        question = row["question"]
        choices = [row["A"], row["B"], row["C"], row["D"]]
        gold = row["answer"]
        subject = row.get("subject", "unknown")

        choice_text = "\n".join([f"{chr(65 + i)}. {c}" for i, c in enumerate(choices)])
        prompt_user = f"Question: {question}\n{choice_text}\nWhich one is correct?"

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt_user}
        ]

        try:
            prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            prompt_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids[0]
            if len(prompt_ids) > 512:
                print(f"⚠️ Skipped Q#{idx} due to token overflow ({len(prompt_ids)} tokens)")
                continue

            output = model.fast_generate([prompt], sampling_params, lora_request=lora_request)[0].outputs[0].text
            time.sleep(0.05)
        except Exception as e:
            print(f"Generation failed for Q#{idx}: {e}")
            continue

        pred = extract_xml_answer(output)
        is_correct = pred == gold
        if is_correct:
            correct += 1

        reasoning = ""
        if reasoning_format:
            matches = re.findall(r"<reasoning>\s*(.*?)\s*</reasoning>", output, re.DOTALL)
            reasoning = matches[-1].strip() if matches else ""

        results.append({
            "subject": subject,
            "question": question,
            "A": row["A"],
            "B": row["B"],
            "C": row["C"],
            "D": row["D"],
            "answer": gold,
            "prediction": pred,
            "correct": is_correct,
            "model": model_name,
            "reasoning": reasoning,
            "raw_output": output.strip()
        })

        print(f"\n--- Q#{idx} ---")
        print(f"Question: {question}")
        for i, c in enumerate(choices):
            print(f"  {chr(65 + i)}. {c}")
        print(output.strip())
        print(f"Ground Truth: {gold} | Prediction: {pred} {'T' if is_correct else 'F'}")

    acc = correct / len(results) if results else 0.0
    print(f"\n [{model_name}] Accuracy: {correct} / {len(results)} = {acc:.2%}")
    return results, acc
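`evaluate_model_on_mmlu` returns one dict per answered question, each with `subject` and `correct` keys, so a per-subject breakdown is a short group-by. A plain-Python sketch (the sample rows are made up; with real results you would pass `base_reason` or `grpo_result`):

```python
from collections import defaultdict

def accuracy_by_subject(results: list) -> dict:
    """Map subject -> (correct, total, accuracy) from evaluate_model_on_mmlu-style rows."""
    tally = defaultdict(lambda: [0, 0])
    for row in results:
        tally[row["subject"]][0] += int(row["correct"])
        tally[row["subject"]][1] += 1
    return {s: (c, n, c / n) for s, (c, n) in sorted(tally.items())}

# Made-up rows in the same shape as the function's output.
sample_results = [
    {"subject": "anatomy", "correct": True},
    {"subject": "anatomy", "correct": False},
    {"subject": "virology", "correct": True},
]
for subject, (c, n, acc) in accuracy_by_subject(sample_results).items():
    print(f"{subject:<12} {c}/{n} = {acc:.1%}")
```

Questions skipped for token overflow or failed generation never reach `results`, so they are excluded from both the per-subject and the overall denominators.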



In [None]:
import os
print(os.listdir("/content/drive/MyDrive/ECE 661 FINAL PROJECT/mmlu_val"))
DATA_DIR = "/content/drive/MyDrive/ECE 661 FINAL PROJECT/mmlu_val"

def load_all_mmlu_csv(data_dir):
    all_data = []
    for fname in os.listdir(data_dir):
        if fname.endswith(".csv"):
            subject = fname.replace(".csv", "")
            df = pd.read_csv(os.path.join(data_dir, fname), header=None)
            df.columns = ["question", "A", "B", "C", "D", "answer"]
            df["subject"] = subject
            all_data.append(df)
    return pd.concat(all_data, ignore_index=True)

df_all = load_all_mmlu_csv(DATA_DIR)
df_all["choices"] = df_all[["A", "B", "C", "D"]].values.tolist()
# Keep the A-D columns: evaluate_model_on_mmlu reads row["A"] ... row["D"].
df_all = df_all[["question", "A", "B", "C", "D", "choices", "answer", "subject"]]
print(df_all.head())
print(df_all.shape)

In [None]:
print(f"Total questions: {len(df_all)}")

In [None]:
# ===========================
# 🚀 Model inference comparison (model and tokenizer must already be defined)
# ===========================
from tqdm import tqdm
import pandas as pd
base_plain, bplain = evaluate_model_on_mmlu(model, tokenizer, df_all, lora_request=None, model_name="Base_NoReason", reasoning_format=False)
base_reason, breason = evaluate_model_on_mmlu(model, tokenizer, df_all, lora_request=None, model_name="Base_Reason", reasoning_format=True)


In [None]:
print(breason)
print(bplain)

In [None]:
grpo_lora = model.load_lora("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_saved_lora")
grpo_result, gres = evaluate_model_on_mmlu(model, tokenizer, df_all, lora_request=grpo_lora, model_name="GRPO", reasoning_format=True)

In [None]:
print(gres)

In [None]:
pd.DataFrame(base_plain).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/base_no_reasoning_original.csv", index=False)
pd.DataFrame(base_reason).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/base_with_reasoning_original.csv", index=False)

In [None]:
pd.DataFrame(grpo_result).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_reasoning_original.csv", index=False)

print("✅ All outputs saved.")

[Base] Accuracy: 318 / 1339 = 23.75%
[GRPO] Accuracy: 620 / 1339 = 46.30%
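Both runs above were scored on 1,339 questions. For a rough sense of sampling noise, a 95% Wilson score interval can be computed from each count. This treats each accuracy as an independent binomial proportion and ignores the fact that both models answered the same questions, so it is a quick sanity check rather than a formal significance test.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Counts reported above for the MMLU validation run.
for name, correct in [("Base", 318), ("GRPO", 620)]:
    lo, hi = wilson_interval(correct, 1339)
    print(f"{name}: {correct / 1339:.2%}  95% CI [{lo:.2%}, {hi:.2%}]")
```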

### MMLU-Redux

In [None]:
from datasets import load_dataset
# Load all subsets by default
configs = [
    'abstract_algebra', 'anatomy', 'astronomy', 'business_ethics',
    'clinical_knowledge', 'college_biology', 'college_chemistry', 'college_computer_science',
    'college_mathematics', 'college_medicine', 'college_physics', 'computer_security',
    'conceptual_physics', 'econometrics', 'electrical_engineering', 'elementary_mathematics',
    'formal_logic', 'global_facts', 'high_school_biology', 'high_school_chemistry',
    'high_school_computer_science', 'high_school_european_history', 'high_school_geography',
    'high_school_government_and_politics', 'high_school_macroeconomics', 'high_school_mathematics',
    'high_school_microeconomics', 'high_school_physics', 'high_school_psychology',
    'high_school_statistics', 'high_school_us_history', 'high_school_world_history',
    'human_aging', 'human_sexuality', 'international_law', 'jurisprudence', 'logical_fallacies',
    'machine_learning', 'management', 'marketing', 'medical_genetics', 'miscellaneous',
    'moral_disputes', 'moral_scenarios', 'nutrition', 'philosophy', 'prehistory',
    'professional_accounting', 'professional_law', 'professional_medicine', 'professional_psychology',
    'public_relations', 'security_studies', 'sociology', 'us_foreign_policy', 'virology', 'world_religions'
]

def load_and_format_mmlu_redux(subjects):
    all_data = []
    for subject in subjects:
        ds = load_dataset("edinburgh-dawg/mmlu-redux-2.0", subject, split="test")
        rows = []
        for item in ds:
            question = item["question"]
            choices = item["choices"]
            answer = item["answer"]
            correct_answer = item.get("correct_answer", None)
            error_type = item.get("error_type", None)

            # Reset per item so a label from a previous row is never reused.
            final_answer = None
            if error_type == "ok" and correct_answer is not None:
                if isinstance(correct_answer, int):
                    final_answer = chr(65 + correct_answer)
            elif isinstance(answer, int):
                final_answer = chr(65 + answer)

            # Keep only four-option items with a usable integer label.
            if final_answer is None or len(choices) != 4:
                continue
            rows.append({
                "question": question,
                "A": choices[0],
                "B": choices[1],
                "C": choices[2],
                "D": choices[3],
                "answer": final_answer,
                "subject": subject
            })
        df = pd.DataFrame(rows)
        all_data.append(df)
    return pd.concat(all_data, ignore_index=True)

# Example usage
df1 = load_and_format_mmlu_redux(configs)
print(df1.head())

In [None]:
print(df1.head())
print(df1.shape)

In [None]:
from tqdm import tqdm
import pandas as pd
base_plain, bplain = evaluate_model_on_mmlu(model, tokenizer, df1, lora_request=None, model_name="Base_NoReason", reasoning_format=False)
base_reason, breason = evaluate_model_on_mmlu(model, tokenizer, df1, lora_request=None, model_name="Base_Reason", reasoning_format=True)


In [None]:
grpo_lora = model.load_lora("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_saved_lora")
grpo_result, gres = evaluate_model_on_mmlu(model, tokenizer, df1, lora_request=grpo_lora, model_name="GRPO", reasoning_format=True)

In [None]:
pd.DataFrame(base_plain).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/base_no_reasoning_redux.csv", index=False)
pd.DataFrame(base_reason).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/base_with_reasoning_redux.csv", index=False)
pd.DataFrame(grpo_result).to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_reasoning_redux.csv", index=False)
print("All outputs saved.")

In [None]:
print(breason)
print(bplain)
print(gres)

### SVAMP

In [None]:
import pandas as pd
import re
import time
from tqdm import tqdm
from vllm import SamplingParams

# def extract_answer_plain(output: str) -> str:
#     """For the plain model: extract only the last number"""
#     numbers = re.findall(r"[-+]?\d*\.\d+|\d+", output)
#     return numbers[-1] if numbers else ""

def extract_answer_xml(output: str) -> str:
    """Extract the last number from the last <answer> tag."""
    matches = re.findall(r"<answer>\s*(.*?)\s*</answer>", output, re.DOTALL)
    if not matches:
        return ""
    # Drop thousands separators so "1,200" parses as one number; keep the sign on integers too.
    raw = matches[-1].strip().replace(",", "")
    numbers = re.findall(r"[-+]?\d*\.?\d+", raw)
    return numbers[-1] if numbers else ""

def evaluate_svamp(model, tokenizer, data, model_name="Base", lora_request=None, reasoning=True):
    results = []
    correct = 0
    sampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=512)

    if reasoning:
        system_prompt = """You are a helpful assistant that solves math word problems and answers using XML format.
<reasoning>
Your reasoning here step-by-step...
</reasoning>
<answer>
Final numeric answer only (no explanation).
</answer>"""
    else:
        system_prompt = """You are a helpful assistant. Answer the questions using XML format.
<answer>
Final numeric answer only (no explanation).
</answer>"""
    extract_fn = extract_answer_xml

    for idx, item in enumerate(tqdm(data, desc=f"[SVAMP:{model_name}]")):
        body = item.get("Body", "")
        question_part = item.get("Question", "")
        question = f"{body.strip()} {question_part.strip()}"

        gold = str(item["Answer"]).strip()

        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question}
        ]

        try:
            prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
            output = model.fast_generate([prompt], sampling_params, lora_request=lora_request)[0].outputs[0].text
            time.sleep(0.05)
        except Exception as e:
            print(f" Q#{idx} Error: {e}")
            continue

        pred = extract_fn(output)
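        # Compare numerically so "27" matches a gold label stored as "27.0".
        try:
            is_correct = abs(float(pred) - float(gold)) < 1e-6
        except ValueError:  # empty or non-numeric prediction
            is_correct = False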
        correct += int(is_correct)

        results.append({
            "id": idx,
            "question": question,
            "gold": gold,
            "prediction": pred,
            "correct": is_correct,
            "model": model_name,
            "raw_output": output.strip()
        })

        print(f"\n--- Q#{idx} ---")
        print(f"Q: {question}")
        print(output.strip())
        print(f"Ground Truth: {gold} | Prediction: {pred} {'Correct' if is_correct else 'False'}")

    acc = correct / len(results) if results else 0.0
    acc_str = f"{model_name} Accuracy: {correct} / {len(results)} = {acc:.2%}"
    print("\n" + acc_str)

    df = pd.DataFrame(results)
    return df, acc_str
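Both evaluation loops call `model.fast_generate` once per question and then sleep for 50 ms. vLLM's `generate` accepts a list of prompts and schedules them together, so passing prompts in chunks is usually much faster. The sketch below shows the chunking pattern with a hypothetical stand-in, `fake_generate`, so it runs without a GPU; with the real model, the callable would be something like `lambda batch: [o.outputs[0].text for o in model.fast_generate(batch, sampling_params, lora_request=lora)]`.

```python
from typing import Callable, Iterator

def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def generate_in_batches(prompts: list, generate: Callable, batch_size: int = 64) -> list:
    """Run `generate` on chunks of prompts and return outputs in the original order."""
    outputs = []
    for batch in chunked(prompts, batch_size):
        outputs.extend(generate(batch))
    return outputs

# Hypothetical stand-in for the model call; it just echoes each prompt's length.
def fake_generate(batch: list) -> list:
    return [f"<answer>\n{len(p)}\n</answer>" for p in batch]

prompts = [f"Question {i}" for i in range(150)]
texts = generate_in_batches(prompts, fake_generate, batch_size=64)
print(len(texts), texts[0])
```

The answer extraction, per-question printing, and accuracy bookkeeping can then run over `zip(prompts, texts)` after generation finishes.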


In [None]:
from datasets import load_dataset
svamp = load_dataset("ChilleD/SVAMP")["test"]


In [None]:
df_plain, acc_plain = evaluate_svamp(model, tokenizer, svamp, model_name="Base_NoReason", reasoning=False)
df_reason, acc_reason = evaluate_svamp(model, tokenizer, svamp, model_name="Base_Reason", reasoning=True)



In [None]:
grpo_lora = model.load_lora("/content/drive/MyDrive/ECE 661 FINAL PROJECT/grpo_saved_lora")
df_grpo, acc_grpo = evaluate_svamp(model, tokenizer, svamp, model_name="GRPO", lora_request=grpo_lora, reasoning=True)

In [None]:
print(acc_plain)
print(acc_reason)
print(acc_grpo)

In [None]:
df_plain.to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/svamp_base_noreason.csv", index=False)
df_reason.to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/svamp_base_reason.csv", index=False)
df_grpo.to_csv("/content/drive/MyDrive/ECE 661 FINAL PROJECT/svamp_grpo.csv", index=False)

<a name="Save"></a>
### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and save to `q8_0` by default. All methods like `q4_k_m` are supported. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
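As a back-of-the-envelope check on disk size, a GGUF file is roughly parameter count × bits per weight ÷ 8. The llama.cpp block layouts work out to 8.5 bits per weight for `q8_0` (32 int8 values plus one fp16 scale), 4.5 for plain `Q4_K`, 5.5 for `Q5_K` and 6.5625 for `Q6_K`; `q4_k_m` and `q5_k_m` land a little above their base formats because some tensors use Q6_K. The sketch ignores metadata and mixed-precision tensors, and the 3.09B parameter count for Qwen2.5-3B is an assumption.

```python
# Approximate bits per weight for a few llama.cpp formats, derived from their block layouts.
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,     # 32 x int8 + one fp16 scale per block of 32 weights
    "q6_k": 6.5625,  # 210 bytes per super-block of 256 weights
    "q5_k": 5.5,     # 176 bytes per super-block of 256 weights
    "q4_k": 4.5,     # 144 bytes per super-block of 256 weights
}

def approx_gguf_gb(num_params: float, method: str) -> float:
    """Weights-only size estimate in GB (1e9 bytes); ignores metadata and mixed-precision tensors."""
    return num_params * BITS_PER_WEIGHT[method] / 8 / 1e9

PARAMS = 3.09e9  # ASSUMPTION: approximate parameter count of Qwen2.5-3B
for method in BITS_PER_WEIGHT:
    print(f"{method:>5}: ~{approx_gguf_gb(PARAMS, method):.1f} GB")
```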

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "",
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI-based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui).

And we're done! If you have any questions about Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! Whether you find bugs, want to keep up with the latest LLM news, need help, or want to join projects, feel free to join our Discord!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

