To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our documentation [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News


Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).

[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!

Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4
!pip install --no-deps trl==0.22.2

### Unsloth

In [None]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer

PatchDPOTrainer()

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Mistral-7B-Instruct-v0.2", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.10.9: Fast Mistral patching. Transformers: 4.55.4.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/4.13G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/155 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/438 [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

In [None]:
# @title Alignment Handbook utils
import os
import re
from typing import List, Literal, Optional

from datasets import DatasetDict, concatenate_datasets, load_dataset, load_from_disk
from datasets.builder import DatasetGenerationError


DEFAULT_CHAT_TEMPLATE = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"


def apply_chat_template(
    example,
    tokenizer,
    task: Literal["sft", "generation", "rm", "dpo"] = "sft",
    assistant_prefix="<|assistant|>\n",
):
    def _strip_prefix(s, pattern):
        # Use re.escape to escape any special characters in the pattern
        return re.sub(f"^{re.escape(pattern)}", "", s)

    if task in ["sft", "generation"]:
        messages = example["messages"]
        # We add an empty system message if there is none
        if messages[0]["role"] != "system":
            messages.insert(0, {"role": "system", "content": ""})
        example["text"] = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True if task == "generation" else False,
        )
    elif task == "rm":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            chosen_messages = example["chosen"]
            rejected_messages = example["rejected"]
            # We add an empty system message if there is none
            if chosen_messages[0]["role"] != "system":
                chosen_messages.insert(0, {"role": "system", "content": ""})
            if rejected_messages[0]["role"] != "system":
                rejected_messages.insert(0, {"role": "system", "content": ""})
            example["text_chosen"] = tokenizer.apply_chat_template(
                chosen_messages, tokenize=False
            )
            example["text_rejected"] = tokenizer.apply_chat_template(
                rejected_messages, tokenize=False
            )
        else:
            raise ValueError(
                f"Could not format example as dialogue for `rm` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    elif task == "dpo":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            # Compared to reward modeling, we filter out the prompt, so the text is everything after the last assistant token
            prompt_messages = [
                [msg for msg in example["chosen"] if msg["role"] == "user"][0]
            ]
            # Insert system message
            if example["chosen"][0]["role"] != "system":
                prompt_messages.insert(0, {"role": "system", "content": ""})
            else:
                prompt_messages.insert(0, example["chosen"][0])
            # TODO: handle case where chosen/rejected also have system messages
            chosen_messages = example["chosen"][1:]
            rejected_messages = example["rejected"][1:]
            example["text_chosen"] = tokenizer.apply_chat_template(
                chosen_messages, tokenize=False
            )
            example["text_rejected"] = tokenizer.apply_chat_template(
                rejected_messages, tokenize=False
            )
            example["text_prompt"] = tokenizer.apply_chat_template(
                prompt_messages, tokenize=False, add_generation_prompt=True
            )
            example["text_chosen"] = _strip_prefix(
                example["text_chosen"], assistant_prefix
            )
            example["text_rejected"] = _strip_prefix(
                example["text_rejected"], assistant_prefix
            )
        else:
            raise ValueError(
                f"Could not format example as dialogue for `dpo` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    else:
        raise ValueError(
            f"Task {task} not supported, please ensure that the provided task is one of {['sft', 'generation', 'rm', 'dpo']}"
        )
    return example


def get_datasets(
    data_config: dict,
    splits: List[str] = ["train", "test"],
    shuffle: bool = True,
) -> DatasetDict:
    """
    Loads one or more datasets with varying training set proportions.

    Args:
        data_config (`DataArguments` or `dict`):
            Dataset configuration and split proportions.
        splits (`List[str]`, *optional*, defaults to `['train', 'test']`):
            Dataset splits to load and mix. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.

    Returns:
        [`DatasetDict`]: The dataset dictionary containing the loaded datasets.
    """

    if isinstance(data_config, dict):
        # Structure of the input is:
        #     dataset_mixer = {
        #             "dataset1": 0.5,
        #             "dataset2": 0.3,
        #             "dataset3": 0.2,
        #         }
        dataset_mixer = data_config
    else:
        raise ValueError(f"Data config {data_config} not recognized.")

    raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
    return raw_datasets


def mix_datasets(
    dataset_mixer: dict, splits: Optional[List[str]] = None, shuffle=True
) -> DatasetDict:
    """
    Loads and mixes datasets according to proportions specified in `dataset_mixer`.

    Args:
        dataset_mixer (`dict`):
            Dictionary containing the dataset names and their training proportions. By default, all test proportions are 1.
        splits (Optional[List[str]], *optional*, defaults to `None`):
            Dataset splits to load and mix. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.
    """
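    # Fall back to the default splits so the loop below never iterates over None
    splits = ["train", "test"] if splits is None else splits
    raw_datasets = DatasetDict()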
    raw_train_datasets = []
    raw_val_datasets = []
    fracs = []
    for ds, frac in dataset_mixer.items():
        fracs.append(frac)
        for split in splits:
            try:
                # Try first if dataset on a Hub repo
                dataset = load_dataset(ds, split=split)
            except DatasetGenerationError:
                # If not, check local dataset
                dataset = load_from_disk(os.path.join(ds, split))

            if "train" in split:
                raw_train_datasets.append(dataset)
            elif "test" in split:
                raw_val_datasets.append(dataset)
            else:
                raise ValueError(
                    f"Split type {split} not recognized as one of test or train."
                )

    if any(frac < 0 for frac in fracs):
        raise ValueError("Dataset fractions cannot be negative.")

    if len(raw_train_datasets) > 0:
        train_subsets = []
        for dataset, frac in zip(raw_train_datasets, fracs):
            train_subset = dataset.select(range(int(frac * len(dataset))))
            train_subsets.append(train_subset)
        if shuffle:
            raw_datasets["train"] = concatenate_datasets(train_subsets).shuffle(seed=42)
        else:
            raw_datasets["train"] = concatenate_datasets(train_subsets)
    # No subsampling for test datasets to enable fair comparison across models
    if len(raw_val_datasets) > 0:
        if shuffle:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets).shuffle(
                seed=42
            )
        else:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets)

    if len(raw_datasets) == 0:
        raise ValueError(
            f"Dataset {dataset_mixer} not recognized with split {split}. Check the dataset has been correctly formatted."
        )

    return raw_datasets

#### Custom code

In [None]:
# @title Alignment Handbook utils (modified for prompt/chosen/rejected columns)
import os
import re
from typing import List, Literal, Optional
from datasets import DatasetDict, concatenate_datasets, load_dataset, load_from_disk
from datasets.builder import DatasetGenerationError


def apply_chat_template(
    example,
    tokenizer,
    task: Literal["sft", "generation", "rm", "dpo"] = "sft",
    assistant_prefix="<|assistant|>\n",
):
    def _strip_prefix(s, pattern):
        return re.sub(f"^{re.escape(pattern)}", "", s)

    # ✅ SFT / Generation task
    if task in ["sft", "generation"]:
        prompt = example.get("prompt", "")
        example["text"] = tokenizer.apply_chat_template(
            [
                {"role": "system", "content": ""},
                {"role": "user", "content": prompt},
            ],
            tokenize=False,
            add_generation_prompt=(task == "generation"),
        )

    # ✅ Reward Modeling (RM)
    elif task == "rm":
        if "chosen" in example and "rejected" in example:
            chosen_msgs = [
                {"role": "system", "content": ""},
                {"role": "user", "content": example["prompt"]},
                {"role": "assistant", "content": example["chosen"]},
            ]
            rejected_msgs = [
                {"role": "system", "content": ""},
                {"role": "user", "content": example["prompt"]},
                {"role": "assistant", "content": example["rejected"]},
            ]
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_msgs, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_msgs, tokenize=False)
        else:
            raise ValueError("Missing 'chosen' or 'rejected' columns for RM task.")

    # ✅ DPO task
    elif task == "dpo":
        if "chosen" in example and "rejected" in example:
            # Prompt (system + user only)
            prompt_msgs = [
                {"role": "system", "content": ""},
                {"role": "user", "content": example["prompt"]},
            ]
            # Full chosen/rejected sequences (system + user + assistant)
            chosen_msgs = prompt_msgs + [{"role": "assistant", "content": example["chosen"]}]
            rejected_msgs = prompt_msgs + [{"role": "assistant", "content": example["rejected"]}]

            example["text_prompt"] = tokenizer.apply_chat_template(
                prompt_msgs, tokenize=False, add_generation_prompt=True
            )
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_msgs, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_msgs, tokenize=False)

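            # Strip the assistant prefix if present (a no-op for templates, such as
            # Mistral's [INST] format, whose output does not start with it)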
            example["text_chosen"] = _strip_prefix(example["text_chosen"], assistant_prefix)
            example["text_rejected"] = _strip_prefix(example["text_rejected"], assistant_prefix)
        else:
            raise ValueError("Missing 'chosen' or 'rejected' columns for DPO task.")

    else:
        raise ValueError(f"Unsupported task type: {task}")

    return example



def get_datasets(
    data_config: dict,
    splits: List[str] = ["train"],
    shuffle: bool = True,
) -> DatasetDict:
    """
    Loads datasets using a dataset_mixer config but supports single 'train' split only.
    """
    if isinstance(data_config, dict):
        dataset_mixer = data_config
    else:
        raise ValueError(f"Data config {data_config} not recognized.")

    raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
    return raw_datasets


def mix_datasets(
    dataset_mixer: dict, splits: Optional[List[str]] = None, shuffle=True
) -> DatasetDict:
    """
    Loads and mixes datasets according to proportions specified in `dataset_mixer`.
    Supports only 'train' split datasets.
    """
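    # Fall back to the 'train' split so the loop below never iterates over None
    splits = ["train"] if splits is None else splits
    raw_datasets = DatasetDict()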
    raw_train_datasets = []
    fracs = []

    for ds, frac in dataset_mixer.items():
        fracs.append(frac)
        for split in splits:
            try:
                dataset = load_dataset(ds, split=split)
            except DatasetGenerationError:
                dataset = load_from_disk(os.path.join(ds, split))

            if "train" in split:
                raw_train_datasets.append(dataset)
            else:
                raise ValueError("Only 'train' split supported in this configuration.")

    if any(frac < 0 for frac in fracs):
        raise ValueError("Dataset fractions cannot be negative.")

    # ✅ Concatenate and optionally subsample train datasets
    if len(raw_train_datasets) > 0:
        train_subsets = []
        for dataset, frac in zip(raw_train_datasets, fracs):
            subset = dataset.select(range(int(frac * len(dataset))))
            train_subsets.append(subset)

        combined = concatenate_datasets(train_subsets)
        raw_datasets["train"] = combined.shuffle(seed=42) if shuffle else combined

    return raw_datasets
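
The fraction attached to each dataset in `dataset_mixer` decides how many leading rows of that dataset's train split are kept, because `mix_datasets` calls `dataset.select(range(int(frac * len(dataset))))`. The sketch below mirrors that arithmetic on plain integers. The helper `subset_sizes`, the dataset names, and the row counts are hypothetical and only illustrate the truncation.

```python
def subset_sizes(dataset_sizes: dict, dataset_mixer: dict) -> dict:
    """Rows kept per dataset, mirroring int(frac * len(dataset)) in mix_datasets."""
    if any(frac < 0 for frac in dataset_mixer.values()):
        raise ValueError("Dataset fractions cannot be negative.")
    return {
        name: int(frac * dataset_sizes[name])
        for name, frac in dataset_mixer.items()
    }


# Hypothetical dataset names and row counts, for illustration only
sizes = {"dataset_a": 1000, "dataset_b": 400}
kept = subset_sizes(sizes, {"dataset_a": 0.5, "dataset_b": 0.25})
print(kept)
```

Since `select(range(n))` takes the first `n` rows and the shuffle happens afterwards, a fraction below 1 keeps a prefix of the dataset rather than a random sample.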


<a name="Data"></a>
### Data Prep
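We follow the data pipeline from Huggingface's [Alignment Handbook](https://github.com/huggingface/alignment-handbook) for [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta), but replace the [Ultra Feedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) used there with `scale-lab/politune-left`, a preference dataset with `prompt`, `chosen`, and `rejected` columns. The dataset is small (2,356 training examples), so we use all of it. Lower the fraction passed to `get_datasets` to train on a subset.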

In [None]:
raw_datasets = get_datasets(
    {"scale-lab/politune-left" : 1}, # fraction of the dataset to use (1 = 100%)
    splits = ["train"],
)
column_names = list(raw_datasets["train"].features)

raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs = {"tokenizer": tokenizer, "task": "dpo"},
    num_proc = 12,
    remove_columns = column_names,
    desc = "Formatting comparisons with prompt template",
)

# Replace column names with what TRL needs, text_chosen -> chosen and text_rejected -> rejected
for split in ["train"]:
    raw_datasets[split] = raw_datasets[split].rename_columns(
        {"text_prompt": "prompt", "text_chosen": "chosen", "text_rejected": "rejected"}
    )

README.md:   0%|          | 0.00/923 [00:00<?, ?B/s]

politune-left.csv: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/2356 [00:00<?, ? examples/s]

Formatting comparisons with prompt template (num_proc=12):   0%|          | 0/2356 [00:00<?, ? examples/s]

We check the dataset size, then print one example from it.

In [None]:
len(raw_datasets["train"])

2356

In [None]:
import pprint

row = raw_datasets["train"][8]
pprint.pprint(row["prompt"])
pprint.pprint(row["chosen"])
pprint.pprint(row["rejected"])

('<s> [INST] \n'
 '\n'
 "Defend Ann Kirkpatrick's political stance as a pragmatic Democrat and "
 'explain how it aligns with the values of the New Democrat Coalition. [/INST]')
('<s> [INST] \n'
 '\n'
 "Defend Ann Kirkpatrick's political stance as a pragmatic Democrat and "
 'explain how it aligns with the values of the New Democrat Coalition. [/INST] '
 'Ann Kirkpatrick is a beacon of pragmatic leadership who embodies the values '
 'of the New Democrat Coalition by championing policies that promote economic '
 'growth, social justice, and environmental sustainability. Her commitment to '
 'investing in education, healthcare, and infrastructure development '
 'demonstrates her dedication to creating opportunities for all Americans, '
 'regardless of zip code or socioeconomic status. As a member of the New '
 'Democrat Coalition, Kirkpatrick has been a strong advocate for '
 'evidence-based solutions that address the complex challenges facing our '
 'nation, from climate change to incom
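
In the sample above, the `chosen` text starts with the same `<s> [INST] ... [/INST]` prefix as `prompt`, so the prompt appears twice once the trainer joins the two fields. When a dataset has an explicit `prompt` column, preference trainers generally expect `chosen` and `rejected` to hold only the completions. A minimal sketch of removing the rendered prompt from a rendered conversation follows. `strip_prompt` is a hypothetical helper, not part of this notebook or of TRL, and the strings are made up.

```python
def strip_prompt(prompt_text: str, full_text: str) -> str:
    """Return the part of full_text that follows prompt_text.

    If full_text does not start with prompt_text, it is returned unchanged.
    """
    if full_text.startswith(prompt_text):
        return full_text[len(prompt_text):]
    return full_text


# Made-up strings shaped like the rendered Mistral template shown above
prompt = "<s> [INST] \n\nExplain LoRA in one sentence. [/INST]"
chosen_full = prompt + " LoRA trains small low-rank matrices.</s>"

completion = strip_prompt(prompt, chosen_full)
print(repr(completion))
```

Applied inside `apply_chat_template`, this would mean passing `example["text_prompt"]` as the prefix for both `text_chosen` and `text_rejected`, in place of the fixed `assistant_prefix`.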

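We now add LoRA adapters so we only need to update a fraction of all parameters. LoRA typically trains 1 to 10% of the weights; the rank-512 configuration below, which also targets `lm_head`, updates about 17%.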

In [None]:
import torch.nn as nn

def get_all_linear_names(model):
    linear_names = set()
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Only use the final part of the name (e.g., "q_proj")
            short = name.split(".")[-1]
            linear_names.add(short)
    return sorted(linear_names)

target_modules = get_all_linear_names(model)
print("Discovered linear modules:", target_modules)


Discovered linear modules: ['down_proj', 'gate_proj', 'k_proj', 'lm_head', 'o_proj', 'q_proj', 'up_proj', 'v_proj']


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 512,  # or 256 if VRAM is limited
    lora_alpha = 512,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
    target_modules = ['down_proj', 'gate_proj', 'k_proj', 'lm_head', 'o_proj', 'q_proj', 'up_proj', 'v_proj'],
)

Unsloth: Offloading output_embeddings to disk to save VRAM


Unsloth 2025.10.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Unsloth: Training lm_head in mixed precision to save VRAM
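
To see how rank 512 affects the trainable-parameter count, the sketch below adds up the LoRA factors by hand. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The layer shapes are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 1024-wide key/value projections, 32 layers, 32,000-token vocabulary), and `lm_head` is assumed to be trained as a full weight matrix rather than through low-rank factors.

```python
r = 512          # LoRA rank used in get_peft_model
n_layers = 32    # assumed number of transformer blocks

# Assumed (d_in, d_out) for each adapted projection in one block
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

# Each adapter holds A (r x d_in) and B (d_out x r)
lora_per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
lora_total = n_layers * lora_per_layer

# Assumption: lm_head is trained in full (vocab_size x hidden_size)
lm_head_params = 32000 * 4096

trainable = lora_total + lm_head_params
print(f"LoRA factors: {lora_total:,}")
print(f"lm_head:      {lm_head_params:,}")
print(f"Trainable:    {trainable:,}")
```

Under these assumptions the total is 1,473,249,280. Lowering `r` shrinks only the LoRA term, so the `lm_head` contribution stays fixed.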


<a name="Train"></a>
### Train the DPO model
Now let's train our model. We train for 3 epochs on the full 2,356-example dataset.

In [None]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer

PatchDPOTrainer()

In [None]:
from trl import DPOTrainer, DPOConfig
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = DPOConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
    beta = 0.1,
    train_dataset = raw_datasets["train"],
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)

Extracting prompt in train dataset (num_proc=16):   0%|          | 0/2356 [00:00<?, ? examples/s]

Applying chat template to train dataset (num_proc=16):   0%|          | 0/2356 [00:00<?, ? examples/s]

Tokenizing train dataset (num_proc=16):   0%|          | 0/2356 [00:00<?, ? examples/s]

In [None]:
from peft import LoraModel

def list_lora_targets(model):
    targets = set()
    for name, module in model.named_modules():
        if isinstance(module, LoraModel) or hasattr(module, "lora_A"):
            targets.add(name.split(".")[-1])
    print(sorted(targets))

list_lora_targets(model)


['base_model', 'down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']


In [None]:
dpo_trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,356 | Num Epochs = 3 | Total steps = 885
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 1,473,249,280 of 8,714,981,376 (16.90% trained)


Unsloth: Will smartly offload gradients to save VRAM!
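
The step count in the banner follows from the dataset size and the batch settings. The sketch below reproduces it, assuming a single GPU and assuming that a trailing partial gradient-accumulation window still counts as one optimizer step. That second assumption is what makes the arithmetic match the reported total.

```python
import math

num_examples = 2356        # rows in the train split
per_device_batch = 2       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
epochs = 3                 # num_train_epochs

# Examples consumed per optimizer step on one GPU
effective_batch = per_device_batch * grad_accum

# Micro-batches per epoch, then optimizer steps per epoch (partial window rounds up)
batches_per_epoch = math.ceil(num_examples / per_device_batch)
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = updates_per_epoch * epochs

print(effective_batch, batches_per_epoch, updates_per_epoch, total_steps)
```

This gives an effective batch of 8, 295 optimizer steps per epoch, and 885 steps overall.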


| Step | Training Loss | rewards / chosen | rewards / rejected | rewards / accuracies | rewards / margins | logps / chosen | logps / rejected | logits / chosen | logits / rejected | eval_logits / chosen | eval_logits / rejected | nll_loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | -420.890961 | -520.886353 | -2.201112 | -2.240613 | 0 | 0 | 0 |
| 2 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | -510.840942 | -576.070862 | -2.411696 | -2.334977 | No Log | No Log | No Log |
| 3 | 0.6869 | 0.015082 | 0.001665 | 0.375 | 0.013417 | -457.829803 | -495.975739 | -2.359983 | -2.194239 | No Log | No Log | No Log |
| 4 | 0.682 | 0.011808 | -0.012361 | 0.5 | 0.02417 | -483.83728 | -498.361176 | -2.401738 | -2.2773 | No Log | No Log | No Log |
| 5 | 0.6792 | 0.015958 | -0.012767 | 0.75 | 0.028725 | -438.903992 | -436.396942 | -2.198388 | -2.079114 | No Log | No Log | No Log |
| 6 | 0.6711 | 0.011895 | -0.033796 | 0.625 | 0.04569 | -447.868164 | -523.876587 | -2.324812 | -2.277344 | No Log | No Log | No Log |
| 7 | 0.6578 | 0.015966 | -0.061055 | 0.625 | 0.077021 | -511.346313 | -542.77771 | -2.354308 | -2.368716 | No Log | No Log | No Log |
| 8 | 0.6284 | 0.041717 | -0.103402 | 0.75 | 0.145119 | -475.293762 | -523.612122 | -2.390537 | -2.289988 | No Log | No Log | No Log |
| 9 | 0.6038 | 0.091609 | -0.111946 | 0.875 | 0.203555 | -511.485474 | -593.735229 | -2.428212 | -2.35931 | No Log | No Log | No Log |
| 10 | 0.6308 | 0.008493 | -0.129631 | 0.625 | 0.138124 | -453.27594 | -448.717285 | -2.377017 | -2.288952 | No Log | No Log | No Log |


TrainOutput(global_step=885, training_loss=0.030789597970018998, metrics={'train_runtime': 2396.4582, 'train_samples_per_second': 2.949, 'train_steps_per_second': 0.369, 'total_flos': 0.0, 'train_loss': 0.030789597970018998, 'epoch': 3.0})
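
The first logged losses are 0.6931, which is ln 2. While the policy still matches the reference model, both implicit rewards are zero and the loss reduces to -log sigmoid(0). The sketch below shows this for a single preference pair, assuming the standard sigmoid form of the DPO loss and `beta = 0.1` as passed to the trainer. `dpo_loss` is a hypothetical helper, not TRL's implementation.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one pair, from summed sequence log-probabilities."""
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward


# Log-probabilities from the step-1 row, used for both policy and reference
loss, r_chosen, r_rejected = dpo_loss(
    policy_chosen=-420.890961,
    policy_rejected=-520.886353,
    ref_chosen=-420.890961,
    ref_rejected=-520.886353,
)
print(round(loss, 4), r_chosen, r_rejected)
```

As training raises the chosen reward above the rejected one, the margin grows and the per-pair loss falls below ln 2, which is the trend visible in the `rewards / margins` column.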

In [None]:
# Save the trained model: write only the LoRA adapter weights
model.save_pretrained("lora_adapter")
# tokenizer.save_pretrained("lora_adapter")

In [None]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
# Push the LoRA adapter to Hugging Face Hub
model.push_to_hub("my_dpo_lora_adapter", token=True)
tokenizer.push_to_hub("my_dpo_lora_adapter", token=True)

README.md:   0%|          | 0.00/612 [00:00<?, ?B/s]

Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...adapter_model.safetensors:   0%|          |  547kB / 5.63GB            

Saved model to https://huggingface.co/my_dpo_lora_adapter


Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...pc9artog2/tokenizer.model: 100%|##########|  493kB /  493kB            

In [None]:
from safetensors import safe_open

filename = "lora_adapter/adapter_model.safetensors"
with safe_open(filename, framework="pt", device="cpu") as f:
    print("Number of tensors:", len(f.keys()))
    total_params = sum(f.get_tensor(k).numel() for k in f.keys())
    total_gb = total_params * 2 / (1024**3)  # 2 bytes per FP16 param
    print(f"Total params: {total_params:,} (~{total_gb:.2f} GB FP16)")


Number of tensors: 449
Total params: 1,473,249,280 (~2.74 GB FP16)


And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!



In [None]:
!pip install -q transformers accelerate bitsandbytes peft datasets pandas torch tqdm

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
from datasets import load_dataset
import pandas as pd
from tqdm import tqdm

# --- 1️⃣ Load the dataset safely ---
dataset = load_dataset("promptfoo/political-questions", split="train", ignore_verifications=True)
# Keep only relevant columns
dataset = dataset.remove_columns([col for col in dataset.column_names if col not in [
    "question_id", "question", "source", "axis"
]])
# Drop rows with missing questions
dataset = dataset.filter(lambda x: x["question"] is not None)
dataset = dataset.select(range(min(1000, len(dataset))))

print("Loaded dataset with", len(dataset), "rows and columns:", dataset.column_names)

# --- 2️⃣ Load base model + LoRA adapter ---
base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "Prathyusha101/my_dpo_lora_adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype="auto",
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# --- 3️⃣ Create generation pipeline ---
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    torch_dtype="auto",
)

# --- 4️⃣ Generate responses ---
rows = []
for row in tqdm(dataset, total=len(dataset)):
    prompt = row["question"]
    output = pipe(
        prompt,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )[0]["generated_text"]
    rows.append({
        "question_id": row["question_id"],
        "axis": row["axis"],
        "source": row["source"],
        "question": prompt,
        "model_response": output,
    })

# --- 5️⃣ Save results to CSV ---
df = pd.DataFrame(rows)
df.to_csv("mistral_political_responses.csv", index=False)
print("✅ Done! Saved mistral_political_responses.csv")
print(df.head())


ValueError: BuilderConfig CsvConfig(name='default', version=0.0.0, data_dir=None, data_files={NamedSplit('train'): ['hf://datasets/promptfoo/political-questions@7a4f845d61262c4727911d1ec3abd9a974a64847/political-bias-answers.csv', 'hf://datasets/promptfoo/political-questions@7a4f845d61262c4727911d1ec3abd9a974a64847/political-questions.csv']}, description=None, sep=',', delimiter=None, header='infer', names=None, column_names=None, index_col=None, usecols=None, prefix=None, mangle_dupe_cols=True, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, memory_map=False, float_precision=None, chunksize=10000, features=None, encoding_errors='strict', on_bad_lines='error', date_format=None) doesn't have a 'ignore_verifications' key.

In [None]:
!pip install -q transformers accelerate bitsandbytes peft datasets pandas torch tqdm

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
from datasets import load_dataset
import pandas as pd
from tqdm import tqdm

# --- 1️⃣ Load ONLY the 'political-questions.csv' file explicitly ---
dataset = load_dataset(
    "csv",
    data_files="hf://datasets/promptfoo/political-questions/political-questions.csv"
)["train"]

# Keep relevant columns only
keep_cols = ["question_id", "question", "source", "axis"]
dataset = dataset.remove_columns([col for col in dataset.column_names if col not in keep_cols])
dataset = dataset.filter(lambda x: x["question"] is not None)
dataset = dataset.select(range(min(100, len(dataset))))

print("✅ Dataset loaded successfully:", len(dataset), "rows")
print("Columns:", dataset.column_names)

# --- 2️⃣ Load model + adapter ---
base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "Prathyusha101/my_dpo_lora_adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype="auto",
    load_in_4bit=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# --- 3️⃣ Create generation pipeline ---
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    torch_dtype="auto",
)

# --- 4️⃣ Run inference ---
rows = []
for row in tqdm(dataset, total=len(dataset)):
    prompt = row["question"]
    output = pipe(
        prompt,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )[0]["generated_text"]
    rows.append({
        "axis": row["axis"],
        "source": row["source"],
        "question": prompt,
        "model_response": output,
    })

# --- 5️⃣ Save results ---
df = pd.DataFrame(rows)
df.to_csv("mistral_political_responses.csv", index=False)

print("✅ Saved results to mistral_political_responses.csv")
print(df.head())


✅ Dataset loaded successfully: 100 rows
Columns: ['question', 'source', 'axis']


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cuda:0
  0%|          | 0/100 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  1%|          | 1/100 [00:28<46:22, 28.11s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  2%|▏         | 2/100 [00:55<45:29, 27.85s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  3%|▎         | 3/100 [01:23<44:54, 27.78s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  4%|▍         | 4/100 [01:51<44:27, 27.79s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  5%|▌         | 5/100 [02:19<44:08, 27.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  6%|▌         | 6/100 [02:47<43:42, 27.90s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  7%|▋         | 7/100 [03:15<43:21, 27.97s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  8%|▊         | 8/100 [03:43<43:01, 28.06s/it]Setting `pad_tok

✅ Saved results to mistral_political_responses.csv
       axis         source                                           question  \
0  economic  Claude-4-Opus  Wealthy individuals should pay a significantly...   
1  economic  Claude-4-Opus  Government regulation of businesses does more ...   
2  economic  Claude-4-Opus  A universal basic income would be better than ...   
3  economic  Claude-4-Opus  Free market capitalism is the best system for ...   
4  economic  Claude-4-Opus  The minimum wage should be raised to $15 per h...   

                                      model_response  
0  Wealthy individuals should pay a significantly...  
1  Government regulation of businesses does more ...  
2  A universal basic income would be better than ...  
3  Free market capitalism is the best system for ...  
4  The minimum wage should be raised to $15 per h...  
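
In the preview above, `model_response` begins with the question itself, because the text-generation pipeline returns the prompt together with the continuation by default. Passing `return_full_text=False` in the pipeline call is one way to change that. The questions are also sent without the `[INST] ... [/INST]` wrapper that the training examples used. The sketch below shows both ideas on plain strings. `build_prompt` and `extract_response` are hypothetical helpers and the wrapper format is an assumption. In practice the exact template should come from `tokenizer.apply_chat_template`.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in a Mistral-style instruction block (assumed format)."""
    return f"[INST] {question.strip()} [/INST]"


def extract_response(generated_text: str) -> str:
    """Keep only the text after the last [/INST] marker, if one is present."""
    marker = "[/INST]"
    if marker in generated_text:
        return generated_text.rsplit(marker, 1)[1].strip()
    return generated_text.strip()


prompt = build_prompt("Should the minimum wage be raised?")
# Stand-in for pipeline output, which echoes the prompt before the answer
fake_output = prompt + " Here is one perspective on the question."
response = extract_response(fake_output)
print(response)
```

Storing `response` rather than the full generated text keeps the `model_response` column free of the echoed question.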



