To run this, press "Runtime" and press "Run all" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join our Discord if you need help!
</div>

To install Unsloth on your own computer, follow the installation instructions on our GitHub page [here](https://github.com/unslothai/unsloth#installation-instructions---conda).

You will learn how to do [DPO data prep](#Data), and how to [train via `DPOTrainer`](#Train).
To learn more about DPO, read TRL's [blog post](https://huggingface.co/blog/dpo-trl). We follow [Huggingface's Alignment Handbook](https://github.com/huggingface/alignment-handbook) to replicate [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).

In [1]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes, etc.
* And Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, and all Llama- and Mistral-derived architectures.
* We support 16bit LoRA or 4bit QLoRA. Both are 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* DPO requires a model that has already been trained with SFT on a dataset similar to the one used for DPO. The Zephyr recipe uses `HuggingFaceH4/mistral-7b-sft-beta` as the SFT model; the cell below loads `mistralai/Mistral-7B-Instruct-v0.2` instead. Use this [notebook](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) first to train an SFT model.
* [**NEW**] We make Gemma 6 trillion tokens **2.5x faster**! See our [Gemma notebook](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)
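
The RoPE scaling bullet above can be illustrated with a small sketch of linear position interpolation, the idea behind kaiokendev's method: positions are multiplied by a factor below 1 so that a longer sequence maps into the position range the model saw during training. This is a simplified illustration, not Unsloth's internal code; `rope_angles`, `trained_len`, and `target_len` are hypothetical names and values chosen for the example.

```python
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Return the rotary angles for one token position.

    A `scale` below 1 compresses positions (linear interpolation).
    """
    scaled_position = position * scale
    return [scaled_position / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

trained_len = 2048   # hypothetical context length seen in training
target_len = 4096    # hypothetical longer context we want to use
scale = trained_len / target_len

# With scaling, position 4 of the long sequence gets the angles that
# position 2 had during training.
same = rope_angles(4, head_dim=8, scale=scale) == rope_angles(2, head_dim=8)
print(same)

# The last position of the long sequence stays inside the trained range.
print(rope_angles(target_len - 1, head_dim=8, scale=scale)[0] < trained_len)
```

Both checks print `True`: the scaled positions never exceed the range covered in training.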

In [2]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Mistral-7B-Instruct-v0.2", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.11.10: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [4]:
#@title Alignment Handbook utils
import os
import re
from typing import List, Literal, Optional

from datasets import DatasetDict, concatenate_datasets, load_dataset, load_from_disk
from datasets.builder import DatasetGenerationError


DEFAULT_CHAT_TEMPLATE = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"


def apply_chat_template(
    example, tokenizer, task: Literal["sft", "generation", "rm", "dpo"] = "sft", assistant_prefix="<|assistant|>\n"
):
    def _strip_prefix(s, pattern):
        # Use re.escape to escape any special characters in the pattern
        return re.sub(f"^{re.escape(pattern)}", "", s)

    if task in ["sft", "generation"]:
        messages = example["messages"]
        # We add an empty system message if there is none
        if messages[0]["role"] != "system":
            messages.insert(0, {"role": "system", "content": ""})
        example["text"] = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True if task == "generation" else False
        )
    elif task == "rm":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            chosen_messages = example["chosen"]
            rejected_messages = example["rejected"]
            # We add an empty system message if there is none
            if chosen_messages[0]["role"] != "system":
                chosen_messages.insert(0, {"role": "system", "content": ""})
            if rejected_messages[0]["role"] != "system":
                rejected_messages.insert(0, {"role": "system", "content": ""})
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `rm` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    elif task == "dpo":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            # Compared to reward modeling, we filter out the prompt, so the text is everything after the last assistant token
            prompt_messages = [[msg for msg in example["chosen"] if msg["role"] == "user"][0]]
            # Insert system message
            if example["chosen"][0]["role"] != "system":
                prompt_messages.insert(0, {"role": "system", "content": ""})
            else:
                prompt_messages.insert(0, example["chosen"][0])
            # TODO: handle case where chosen/rejected also have system messages
            chosen_messages = example["chosen"][1:]
            rejected_messages = example["rejected"][1:]
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
            example["text_prompt"] = tokenizer.apply_chat_template(
                prompt_messages, tokenize=False, add_generation_prompt=True
            )
            example["text_chosen"] = _strip_prefix(example["text_chosen"], assistant_prefix)
            example["text_rejected"] = _strip_prefix(example["text_rejected"], assistant_prefix)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `dpo` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    else:
        raise ValueError(
            f"Task {task} not supported, please ensure that the provided task is one of {['sft', 'generation', 'rm', 'dpo']}"
        )
    return example


def get_datasets(
    data_config: dict,
    splits: List[str] = ["train", "test"],
    shuffle: bool = True,
) -> DatasetDict:
    """
    Loads one or more datasets with varying training set proportions.

    Args:
        data_config (`DataArguments` or `dict`):
            Dataset configuration and split proportions.
        splits (`List[str]`, *optional*, defaults to `['train', 'test']`):
            Dataset splits to load and mix. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.

    Returns:
        [`DatasetDict`]: The dataset dictionary containing the loaded datasets.
    """

    if type(data_config) is dict:
        # Structure of the input is:
        #     dataset_mixer = {
        #             "dataset1": 0.5,
        #             "dataset2": 0.3,
        #             "dataset3": 0.2,
        #         }
        dataset_mixer = data_config
    else:
        raise ValueError(f"Data config {data_config} not recognized.")

    raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
    return raw_datasets


def mix_datasets(dataset_mixer: dict, splits: Optional[List[str]] = None, shuffle=True) -> DatasetDict:
    """
    Loads and mixes datasets according to proportions specified in `dataset_mixer`.

    Args:
        dataset_mixer (`dict`):
            Dictionary containing the dataset names and their training proportions. By default, all test proportions are 1.
        splits (Optional[List[str]], *optional*, defaults to `None`):
            Dataset splits to load and mix. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.
    """
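    # Guard: the default of None is not iterable, so fall back to both splits
    splits = ["train", "test"] if splits is None else splits
    raw_datasets = DatasetDict()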
    raw_train_datasets = []
    raw_val_datasets = []
    fracs = []
    for ds, frac in dataset_mixer.items():
        fracs.append(frac)
        for split in splits:
            try:
                # Try first if dataset on a Hub repo
                dataset = load_dataset(ds, split=split)
            except DatasetGenerationError:
                # If not, check local dataset
                dataset = load_from_disk(os.path.join(ds, split))

            if "train" in split:
                raw_train_datasets.append(dataset)
            elif "test" in split:
                raw_val_datasets.append(dataset)
            else:
                raise ValueError(f"Split type {split} not recognized as one of test or train.")

    if any(frac < 0 for frac in fracs):
        raise ValueError("Dataset fractions cannot be negative.")

    if len(raw_train_datasets) > 0:
        train_subsets = []
        for dataset, frac in zip(raw_train_datasets, fracs):
            train_subset = dataset.select(range(int(frac * len(dataset))))
            train_subsets.append(train_subset)
        if shuffle:
            raw_datasets["train"] = concatenate_datasets(train_subsets).shuffle(seed=42)
        else:
            raw_datasets["train"] = concatenate_datasets(train_subsets)
    # No subsampling for test datasets to enable fair comparison across models
    if len(raw_val_datasets) > 0:
        if shuffle:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets).shuffle(seed=42)
        else:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets)

    if len(raw_datasets) == 0:
        raise ValueError(
            f"Dataset {dataset_mixer} not recognized with split {split}. Check the dataset has been correctly formatted."
        )

    return raw_datasets
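
To see why `_strip_prefix` escapes its pattern, here is a small standalone sketch. The `|` characters in `<|assistant|>` are regex alternation operators, so an unescaped pattern would not match the literal prefix. The `rendered` string is a made-up example of a templated assistant turn, and `strip_prefix` is a standalone copy of the helper for illustration.

```python
import re

def strip_prefix(text, prefix):
    """Remove `prefix` from the start of `text`, treating it as a literal string."""
    return re.sub(f"^{re.escape(prefix)}", "", text)

assistant_prefix = "<|assistant|>\n"
rendered = "<|assistant|>\nNatural resources are not made in factories.</s>\n"

completion = strip_prefix(rendered, assistant_prefix)
print(repr(completion))

# Without re.escape the pattern is read as three alternatives, so the
# result is not the clean completion.
naive = re.sub(f"^{assistant_prefix}", "", rendered)
print(naive == completion)
```

The escaped version leaves only the completion text and the end-of-sequence token, which is the form the `dpo` branch stores in `text_chosen` and `text_rejected`.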

<a name="Data"></a>
### Data Prep
We follow Hugging Face's [Alignment Handbook](https://github.com/huggingface/alignment-handbook) for [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) and use the [UltraFeedback dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), sampling 0.5% of it to speed things up. Set the fraction to 1.0 to use the full dataset for a full run.
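
As a quick check of the sampling arithmetic, `mix_datasets` keeps the first `int(frac * len(dataset))` rows of each training split. The split size below is the `train_prefs` size reported when the dataset is generated; the helper name `subset_size` is only for illustration.

```python
def subset_size(frac, num_rows):
    """Rows kept by dataset.select(range(int(frac * len(dataset))))."""
    return int(frac * num_rows)

train_prefs_rows = 61966  # size of the train_prefs split

small_run = subset_size(0.005, train_prefs_rows)  # 0.5% of the split
full_run = subset_size(1.0, train_prefs_rows)     # the whole split
print(small_run, full_run)
```

Note that the fraction is a proportion, not a percentage: `0.005` means 0.5%, while `0.5` would keep half of the split.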

In [None]:
raw_datasets = get_datasets(
    {"HuggingFaceH4/ultrafeedback_binarized" : 0.005}, # 0.5% sampled
    splits = ["train_prefs", "test_prefs"],
)
column_names = list(raw_datasets["train"].features)

raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs = {"tokenizer": tokenizer, "task": "dpo"},
    num_proc = 12,
    remove_columns = column_names,
    desc = "Formatting comparisons with prompt template",
)

# Replace column names with what TRL needs, text_chosen -> chosen and text_rejected -> rejected
for split in ["train", "test"]:
    raw_datasets[split] = raw_datasets[split].rename_columns(
        {"text_prompt": "prompt", "text_chosen": "chosen", "text_rejected": "rejected"}
    )

Generating train_sft split:   0%|          | 0/61966 [00:00<?, ? examples/s]

Generating test_sft split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Generating train_gen split:   0%|          | 0/61966 [00:00<?, ? examples/s]

Generating test_gen split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Generating train_prefs split:   0%|          | 0/61966 [00:00<?, ? examples/s]

Generating test_prefs split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Formatting comparisons with prompt template (num_proc=12):   0%|          | 0/309 [00:00<?, ? examples/s]

Formatting comparisons with prompt template (num_proc=12):   0%|          | 0/2000 [00:00<?, ? examples/s]

Print one item from the dataset (row 8 of the training split):

In [None]:
import pprint
row = raw_datasets["train"][8]
pprint.pprint(row["prompt"])
pprint.pprint(row["chosen"])
pprint.pprint(row["rejected"])

('<|system|>\n'
 '</s>\n'
 '<|user|>\n'
 'List two natural resources which was made in the factory.</s>\n'
 '<|assistant|>\n')
('Natural resources are not made in factories. Natural resources are materials '
 'and substances that occur naturally on Earth, such as water, minerals, '
 'forests, and fossil fuels. Factories typically produce man-made materials or '
 'process natural resources into finished products.</s>\n')
("I'm sorry, but it seems there might be some confusion in your question as "
 'natural resources are typically sourced from the earth or sea, and not made '
 'in a factory. However, factories often use natural resources to create '
 'various products. Two examples of natural resources that factories may use '
 'are crude oil and iron ore. Crude oil is refined to produce various '
 'petroleum products, such as gasoline and plastics, while iron ore is refined '
 'to create steel, which is used in the construction industry, vehicle '
 'manufacturing, and more. Does this h

### MY DATA

In [5]:
raw_datasets = get_datasets(
    {'sssssssshhhhhu/movielens_dpo_dataset_3' : 0.5}, # 50% sampled (the value is a fraction, not a percentage)
    splits = ["train"])


README.md:   0%|          | 0.00/354 [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/85.6M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/36100 [00:00<?, ? examples/s]

In [6]:
# Print one row of the raw dataset to inspect all column names and contents
print("Column names:", raw_datasets["train"].column_names)
print("\nSample row:")
print(raw_datasets["train"][0])

Column names: ['prompt', 'chosen', 'rejected']

Sample row:
{'prompt': "You are a recommender system. Based on a user's historical likes and dislikes, rank the given candidate movies by their likelihood of being the user's next favorite, according to their watching history. Please think step by step.\n\nThis user's historical interactions: Movie 357:Four Weddings and a Funeral (Genres: Comedy, Drama, Romance; Language: en; Overview: British comedy follows Charles and Carrie navigating love through weddings.) Movie 593:The Silence of the Lambs (Genres: Crime, Drama, Thriller; Language: en; Overview: FBI trainee seeks clues from cannibal psychiatrist; chaos ensues.) Movie 597:Pretty Woman (Genres: Romance, Comedy; Language: en; Overview: Millionaire hires hooker; romance blossoms, transforming both their lives.) Movie 648:Mission: Impossible (Genres: Adventure, Action, Thriller; Language: en; Overview: Ethan Hunt must find mole to clear his name.) Movie 1794:Love and Death on Long Island (Genres: Drama,

In [7]:
raw_datasets = raw_datasets.map(
    lambda x: {
        "prompt": f"<|system|>\n</s>\n<|user|>\n{x['prompt']}</s>\n<|assistant|>\n",
        "chosen": ' '.join(x['chosen'].split('\n\n')).strip() + '</s>\n',  # collapse blank-line breaks into spaces
        "rejected": ' '.join(x['rejected'].split('\n\n')).strip() + '</s>\n'  # collapse blank-line breaks into spaces
    },
    remove_columns=raw_datasets["train"].column_names,
    desc="Adding system prompt and format markers"
)

Adding system prompt and format markers:   0%|          | 0/18050 [00:00<?, ? examples/s]

In [8]:
import pprint
row = raw_datasets["train"][0]
pprint.pprint(row["prompt"])
pprint.pprint(row["chosen"])
pprint.pprint(row["rejected"])

('<|system|>\n'
 '</s>\n'
 '<|user|>\n'
 "You are a recommender system. Based on a user's historical likes and "
 'dislikes, rank the given candidate movies by their likelihood of being the '
 "user's next favorite, according to their watching history. Please think step "
 'by step.\n'
 '\n'
 "This user's historical interactions: Movie 357:Four Weddings and a Funeral "
 '(Genres: Comedy, Drama, Romance; Language: en; Overview: British comedy '
 'follows Charles and Carrie navigating love through weddings.) Movie 593:The '
 'Silence of the Lambs (Genres: Crime, Drama, Thriller; Language: en; '
 'Overview: FBI trainee seeks clues from cannibal psychiatrist; chaos ensues.) '
 'Movie 597:Pretty Woman (Genres: Romance, Comedy; Language: en; Overview: '
 'Millionaire hires hooker; romance blossoms, transforming both their lives.) '
 'Movie 648:Mission: Impossible (Genres: Adventure, Action, Thriller; '
 'Language: en; Overview: Ethan Hunt must find mole to clear his name.) Movie '
 '1794:Lov
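
The formatting step above can be checked on a toy row without loading the dataset. This sketch repeats the same string operations in a plain function; `format_dpo_row` and the `toy` row are hypothetical and only serve to show the shape of the three fields that `DPOTrainer` receives.

```python
def format_dpo_row(row):
    """Wrap the prompt in Zephyr-style markers and close each answer with </s>."""
    def collapse(text):
        # Replace blank-line breaks with single spaces, then trim the ends
        return " ".join(text.split("\n\n")).strip()

    return {
        "prompt": f"<|system|>\n</s>\n<|user|>\n{row['prompt']}</s>\n<|assistant|>\n",
        "chosen": collapse(row["chosen"]) + "</s>\n",
        "rejected": collapse(row["rejected"]) + "</s>\n",
    }

toy = {
    "prompt": "Rank these movies.",
    "chosen": "Rank1: Movie 32\n\nRank2: Movie 50",
    "rejected": "Rank1: Movie 50\n\n",
}
formatted = format_dpo_row(toy)
for key, value in formatted.items():
    print(key, repr(value))
```

The prompt ends with the assistant marker, so the chosen and rejected strings are pure completions that end with the end-of-sequence token.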

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [9]:
from unsloth import FastLanguageModel
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Currently only supports dropout = 0
    bias = "none",    # Currently only supports bias = "none"
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.11.10 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
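
As a rough cross-check of how many parameters the adapters add, the sketch below counts LoRA weights for `r = 64` over the seven target modules. The layer shapes are the `in_features` and `out_features` values shown in the model printout in the inference section of this notebook; the helper `lora_params` is illustrative and assumes each adapted linear layer gets one `r x in` matrix and one `out x r` matrix.

```python
# (in_features, out_features) of each adapted projection in one decoder layer
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
rank = 64
num_layers = 32

def lora_params(in_features, out_features, r):
    """Parameters in the two low-rank matrices A (r x in) and B (out x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, rank) for i, o in shapes.values())
total = per_layer * num_layers
print(f"{per_layer:,} per layer, {total:,} in total")
```

The total agrees with the "Number of trainable parameters" line that the trainer prints when training starts.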


<a name="Train"></a>
### Train the DPO model
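Now let's use Hugging Face TRL's `DPOTrainer`! More docs here: [TRL DPO docs](https://huggingface.co/docs/trl/dpo_trainer). The config below sets `num_train_epochs = 3`, but `max_steps = 1000` takes precedence, so this run stops after 1,000 optimizer steps on the 50% sample of the custom dataset.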

In [10]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

In [14]:
from transformers import TrainingArguments
from trl import DPOTrainer, DPOConfig
from unsloth import is_bfloat16_supported

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = DPOConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        save_steps=300,  # AUTO SAVE
        hub_model_id="qiqiquq/dporanker-checkpoint",
        hub_token="",
        push_to_hub=True,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.05,
        max_steps=1000,
        lr_scheduler_type = "cosine",
        seed = 42,
        output_dir = "dpo-reranker-1201",
        report_to = "wandb", # Use this for WandB etc
    ),
    beta = 0.1,
    train_dataset = raw_datasets["train"],
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 800,
)

max_steps is given, it will override any value given in num_train_epochs


In [15]:
dpo_trainer.train()


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 18,050 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 1,000
 "-____-"     Number of trainable parameters = 167,772,160


Step,Training Loss,rewards / chosen,rewards / rejected,rewards / accuracies,rewards / margins,logps / rejected,logps / chosen,logits / rejected,logits / chosen
1,0.6931,0.0,0.0,0.0,0.0,-279.040192,-257.115601,-2.676256,-2.64942
2,0.6931,0.0,0.0,0.0,0.0,-259.841583,-264.235168,-2.659121,-2.674021
3,0.6841,0.024981,0.005901,0.5,0.01908,-267.152649,-254.029755,-2.671708,-2.65311
4,0.7076,-0.008858,0.016953,0.5,-0.02581,-256.648224,-260.68277,-2.643027,-2.623013
5,0.7004,0.000463,0.014456,0.625,-0.013993,-257.970947,-269.926147,-2.664024,-2.667228
6,0.697,0.003275,0.010663,0.25,-0.007388,-261.527039,-253.84613,-2.681756,-2.660463
7,0.6846,0.016545,-0.00132,0.75,0.017865,-263.404541,-255.004333,-2.676235,-2.66683
8,0.696,0.006765,0.01107,0.375,-0.004305,-255.730301,-251.685974,-2.64505,-2.661473
9,0.695,0.013345,0.016101,0.75,-0.002757,-255.057129,-261.023895,-2.680239,-2.639226
10,0.6974,-0.018356,-0.009926,0.25,-0.00843,-254.176666,-258.791992,-2.666369,-2.664013
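
The first two logged losses of 0.6931 are what the standard sigmoid DPO objective gives when the policy and the reference model still agree, because the loss then reduces to log 2. The sketch below is a minimal single-example version of that objective, assuming TRL's default sigmoid loss; the function name `dpo_loss` and the log-probability values are illustrative, not taken from the trainer.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss for one preference pair, given sequence log-probabilities."""
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Before any update the policy equals the reference, so both rewards are 0.
initial = dpo_loss(-257.1, -279.0, -257.1, -279.0)
print(round(initial, 4))

# If the policy raises the chosen log-probability, the loss falls below log 2.
improved = dpo_loss(-250.0, -279.0, -257.1, -279.0)
print(improved < initial)
```

This also explains the reward columns: `rewards / chosen` and `rewards / rejected` are the two scaled log-ratio terms, and `rewards / margins` is their difference.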


TrainOutput(global_step=1000, training_loss=0.4134975216500461, metrics={'train_runtime': 6873.1959, 'train_samples_per_second': 1.164, 'train_steps_per_second': 0.145, 'total_flos': 0.0, 'train_loss': 0.4134975216500461, 'epoch': 0.44321329639889195})
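
The `epoch` value in the output above follows directly from the batch settings. This short calculation uses only numbers reported in the training banner and the config; the variable names are illustrative.

```python
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 1000
num_examples = 18050

effective_batch = per_device_batch * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
epochs = examples_seen / num_examples
print(effective_batch, examples_seen, round(epochs, 4))
```

So the 1,000 steps cover a little under half of one pass over the sampled data, far short of the 3 epochs requested in `num_train_epochs`.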

# Merge and save

In [None]:
from huggingface_hub import login


model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit_forced",)

In [16]:
# model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged("qiqiquq/dporanker-halfdata-12020204-merged-16bit", tokenizer, save_method = "merged_16bit", token = "")

Unsloth: You are pushing to hub, but you passed your HF username = qiqiquq.
We shall truncate qiqiquq/dporanker-halfdata-12020204-merged-16bit to dporanker-halfdata-12020204-merged-16bit
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 4.1G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 52.05 out of 83.48 RAM for saving.


100%|██████████| 32/32 [00:00<00:00, 52.52it/s]


Unsloth: Saving tokenizer...

  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

 Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...


README.md:   0%|          | 0.00/606 [00:00<?, ?B/s]

  0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.94G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/qiqiquq/dporanker-halfdata-12020204-merged-16bit


# Inference from checkpoints

In [17]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "/content/dporanker-halfdata-12020204-merged-16bit", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.11.10: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Unsloth: Will load /content/dporanker-halfdata-12020204-merged-16bit as a legacy tokenizer.


In [18]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")



outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    use_cache=True,
    return_dict_in_generate=True,
    temperature = 0
)
new_tokens = outputs[0][:, inputs['input_ids'].shape[1]:]
decoded_output = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
decoded_output[0]

'The Fibonacci sequence continues as follows: 13, 21, 34, 55, 89, 144, ...\n\nSo the next number in the sequence is 144.'

In [20]:
import pandas as pd
test_data = pd.read_csv('/content/natural_language_top10_sample.csv')


In [33]:
test_data.iloc[0:1]

Unnamed: 0,user_id,history,candidates,ground_truth
0,5,"Movie 50: The Usual Suspects (Genres: Drama, C...",Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...


In [21]:
import re
def generate_prompt(history, candidates, len_candidates):
    return f"""You are a recommender system. Based on a user's historical likes and dislikes, rank the given candidate movies by their likelihood of being the user's next favorite, according to their watching history. Please think step by step.
You MUST ONLY output the Rank of Movie id, do not include other information like genres and overview.
This user's historical interactions: {history}
There are {len_candidates} Candidates for recommendation: {candidates}

Strictly follow the output format:
Rank1: Movie id - Reason: shortly explain why the user would most likely enjoy this movie
Rank2: Movie id - Reason: shortly explain why the user would likely enjoy this movie second
...
Rank{len_candidates}: Movie id - Reason: explain why this movie would be the least one the user would enjoy

For example,
Rank1: Movie 32 - Reason: because user like this topic
...

Please provide a ranked list of the recommended movies. You MUST rank only the given candidates and cannot include any movies not listed in the candidate list.
Now, begin with 'Rank1:', Output:"""

def parse_movie_list(movie_string):
    """
    Parses a string of movies into a list of movie descriptions and a parallel list of movie ID strings.
    """
    # Split by "Movie" to separate each movie entry
    movies = re.split(r'Movie (\d+):', movie_string)
    parsed_movies = []
    parsed_movies_id = []

    # Process the split results to extract movie details
    for i in range(1, len(movies), 2):  # Skip the first split as it's before "Movie"
        movie_id = movies[i].strip()  # Extract movie ID
        movie_details = movies[i + 1].strip()  # Extract details
        if movie_details.endswith(','):
            movie_details = movie_details[:-1]
        parsed_movies.append(f"Movie {movie_id}:{movie_details}")
        parsed_movies_id.append(movie_id)
    return parsed_movies, parsed_movies_id
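
The parsing above relies on a detail of `re.split`: when the pattern contains a capturing group, the captured text is kept in the result, so IDs and descriptions alternate in the list. The sketch below shows this on a made-up two-movie string; `sample` and `split_movies` are hypothetical names.

```python
import re

sample = (
    "Movie 50: The Usual Suspects (Genres: Drama, Crime), "
    "Movie 32: Twelve Monkeys (Genres: Science Fiction)"
)

# Index 0 is the text before the first match; after that, odd indexes hold
# the captured IDs and even indexes hold the descriptions.
pieces = re.split(r'Movie (\d+):', sample)
print(pieces)

def split_movies(movie_string):
    """Return (descriptions, ids) using the same split-and-step-by-two approach."""
    parts = re.split(r'Movie (\d+):', movie_string)
    descriptions, ids = [], []
    for i in range(1, len(parts), 2):
        details = parts[i + 1].strip()
        if details.endswith(','):
            details = details[:-1]  # drop the separator between entries
        descriptions.append(f"Movie {parts[i].strip()}:{details}")
        ids.append(parts[i].strip())
    return descriptions, ids

descriptions, ids = split_movies(sample)
print(ids)
```

Commas inside a description (for example in the genre list) are left alone; only a trailing comma at the end of an entry is removed.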

In [22]:
FastLanguageModel.for_inference(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): Mis

In [25]:
import random
def process_dataframe_rows(df, model, tokenizer):
    if 'result' not in df.columns:
        df['result'] = None

    for idx, row in df.iterrows():
        history, history_id = parse_movie_list(row['history'])
        candidates, candidates_id = parse_movie_list(row['candidates'])
        ground_truth, ground_truth_id = parse_movie_list(row['ground_truth'])
        prompt_candidates = candidates[:]
        random.shuffle(prompt_candidates)

        user_history_desc = " ".join(history)
        cand = ", ".join(prompt_candidates)
        len_cand = len(prompt_candidates)

        inputs = tokenizer(
            [generate_prompt(user_history_desc, cand, len_cand)],
            return_tensors="pt"
        ).to("cuda")

        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            use_cache=True,
            return_dict_in_generate=True,
            temperature = 0
        )
        new_tokens = outputs[0][:, inputs['input_ids'].shape[1]:]
        decoded_output = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)

        df.at[idx, 'result'] = decoded_output[0]

        if idx % 10 == 0:
            print(f"Processed {idx} rows.")
            df.to_csv('output.csv', index=False)
    df.to_csv('output.csv', index=False)
    return df

In [26]:
res = process_dataframe_rows(test_data, model, tokenizer)

Processed 0 rows.


## Evaluate the inference results

In [27]:
result_df = pd.read_csv('output.csv')
result_df

Unnamed: 0,user_id,history,candidates,ground_truth,result
0,5,"Movie 50: The Usual Suspects (Genres: Drama, C...",Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1921 - Reason: The user has s...
1,5,Movie 162: Crumb (Genres: Documentary; Languag...,"Movie 1: Toy Story (Genres: Animation, Comedy,...",Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1175:Delicatessen - Reason: T...
2,5,"Movie 176: Living in Oblivion (Genres: Drama, ...","Movie 31: Dangerous Minds (Genres: Drama, Crim...",Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1639:Chasing Amy - Reason: Th...
3,5,Movie 162: Crumb (Genres: Documentary; Languag...,Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1250 - Reason: The user has s...
4,5,Movie 162: Crumb (Genres: Documentary; Languag...,Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1535 - Reason: The user has s...
5,5,Movie 348: Bullets Over Broadway (Genres: Acti...,"Movie 714: Dead Man (Genres: Drama, Fantasy, W...",Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1535:Love! Valour! Compassion...
6,5,"Movie 50: The Usual Suspects (Genres: Drama, C...","Movie 379: Timecop (Genres: Thriller, Science ...",Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1921:Pi - Reason: The user ha...
7,5,Movie 29: The City of Lost Children (Genres: F...,Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1250 - Reason: The Bridge on ...
8,5,Movie 29: The City of Lost Children (Genres: F...,Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 1250: The Bridge on the River...
9,5,Movie 29: The City of Lost Children (Genres: F...,Movie 32: Twelve Monkeys (Genres: Science Fict...,Movie 32: Twelve Monkeys (Genres: Science Fict...,\n\nRank1: Movie 12 - Reason: The user has sho...


In [28]:
import re

def parse_movie_gt(movie_string):
    """
    Parses a ground-truth string of "Movie <id>: ..." entries into a list of integer movie IDs.
    """
    # Split by "Movie" to separate each movie entry
    movies = re.split(r'Movie (\d+):', movie_string)
    parsed_movies = []

    # Process the split results to extract movie details
    for i in range(1, len(movies), 2):  # Skip the first split as it's before "Movie"
        movie_id = movies[i].strip()  # Extract movie ID
        parsed_movies.append(int(movie_id))
    return parsed_movies

def parse_result(res_str):
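    """
    Extracts the movie IDs from the model's "RankN: Movie <id>" lines, in rank order.
    """
    # An empty generation is read back from the CSV as NaN (a float), so guard against non-strings.
    if not isinstance(res_str, str):
        return []
    movie_ids = re.findall(r'Rank\d+:\s*Movie\s*(\d+)', res_str)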

    return [int(movie_id) for movie_id in movie_ids]
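
To make the two regular expressions above concrete, here is a minimal, self-contained sketch that applies the same patterns to short strings. The sample strings and the helper names `extract_ranked_ids` and `extract_listed_ids` are hypothetical and only imitate the format visible in the truncated table; the real rows are longer and the separator between ground-truth entries is an assumption.

```python
import re
from typing import List

def extract_ranked_ids(text: str) -> List[int]:
    """Return the IDs that follow 'RankN: Movie', in the order they appear."""
    return [int(m) for m in re.findall(r'Rank\d+:\s*Movie\s*(\d+)', text)]

def extract_listed_ids(text: str) -> List[int]:
    """Return the IDs of every 'Movie <id>:' entry in a candidate or ground-truth string."""
    return [int(m) for m in re.findall(r'Movie (\d+):', text)]

# Hypothetical model output: the third line has no "Movie <id>", so it is skipped.
sample_output = (
    "\n\nRank1: Movie 1921:Pi - Reason: ...\n"
    "Rank2: Movie 509 - Reason: ...\n"
    "Rank3: The Piano - Reason: ..."
)
# Hypothetical ground-truth string (the "; " separator is assumed).
sample_gt = "Movie 32: Twelve Monkeys (Genres: Science Fiction); Movie 509: The Piano (Genres: Drama)"

print(extract_ranked_ids(sample_output))  # [1921, 509]
print(extract_listed_ids(sample_gt))      # [32, 509]
```

A rank line that omits the `Movie <id>` prefix is dropped silently, so a prediction list can end up shorter than the number of ranks the model wrote.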


In [29]:
result_df['gt_parsed'] = result_df['ground_truth'].apply(parse_movie_gt)
result_df['result_parsed'] = result_df['result'].apply(parse_result)

In [30]:
result_df[['gt_parsed','result_parsed']]

Unnamed: 0,gt_parsed,result_parsed
0,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1921, 509, 32, 2311, 1715, 3852, 1439, 714, 3..."
1,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1175, 2142, 792, 1715, 3865, 2461, 1, 2272, 4..."
2,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1639, 509, 1921, 1535, 31, 1752, 1715, 1635, ..."
3,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1250, 1921, 1535, 1715, 255, 1058, 3601, 908,..."
4,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1535, 1722, 1381, 1175, 352, 1011, 3816, 1885..."
5,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1535, 1715, 1297, 908, 3682, 1591, 1732, 2285..."
6,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1921, 1535, 1513, 1715, 1520, 3826, 3142, 379..."
7,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1250, 32, 1635, 2456, 1130, 908, 991, 509, 32..."
8,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[1250, 1921, 1535, 2345, 522, 3219, 2771, 509]"
9,"[32, 509, 714, 908, 1175, 1250, 1535, 1635, 17...","[12, 321, 2390, 1081, 2411, 1535, 3718, 714, 1..."


In [31]:
import numpy as np
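from typing import List, Tuple

def calculate_metrics(gt_list: List[int], pred_list: List[int], k: int = 10) -> Tuple[int, float, float, float]:
    """Return (hit_ratio, precision@k, recall@k, ndcg@k) for one ranked prediction list."""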
    pred_list = pred_list[:k]
    gt_set = set(gt_list)

    hits = sum(1 for item in pred_list if item in gt_set)
    hit_ratio = 1 if hits > 0 else 0

    precision = hits / k if k > 0 else 0
    recall = hits / len(gt_set) if gt_set else 0

    dcg = 0
    idcg = 0
    for i, item in enumerate(pred_list):
        if item in gt_set:
            dcg += 1 / np.log2(i + 2)

    for i in range(min(len(gt_set), k)):
        idcg += 1 / np.log2(i + 2)

    ndcg = dcg / idcg if idcg > 0 else 0

    return hit_ratio, precision, recall, ndcg

def add_metrics_to_df(df):
    df['hit_ratio'] = 0.0
    df['precision'] = 0.0
    df['recall'] = 0.0
    df['ndcg'] = 0.0

    for idx, row in df.iterrows():
        hit_ratio, precision, recall, ndcg = calculate_metrics(
            row['gt_parsed'],
            row['result_parsed']
        )

        df.at[idx, 'hit_ratio'] = hit_ratio
        df.at[idx, 'precision'] = precision
        df.at[idx, 'recall'] = recall
        df.at[idx, 'ndcg'] = ndcg

    return df

result_df = add_metrics_to_df(result_df)

print(f"Average Hit Ratio: {result_df['hit_ratio'].mean():.4f}")
print(f"Average Precision: {result_df['precision'].mean():.4f}")
print(f"Average Recall: {result_df['recall'].mean():.4f}")
print(f"Average NDCG: {result_df['ndcg'].mean():.4f}")

Average Hit Ratio: 1.0000
Average Precision: 0.3700
Average Recall: 0.3700
Average NDCG: 0.4647
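
Precision and recall only count hits, while NDCG also rewards placing them near the top. The sketch below is a standalone illustration using the same binary-relevance formula as `calculate_metrics`. The helper name `ndcg_at_k` and the toy ID lists are hypothetical, not data from this run.

```python
import math
from typing import Sequence

def ndcg_at_k(relevant: Sequence[int], ranked: Sequence[int], k: int = 10) -> float:
    """Binary-relevance NDCG@k: each hit at 0-based position p contributes 1 / log2(p + 2)."""
    relevant_set = set(relevant)
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant_set
    )
    # Ideal ranking: all relevant items (at most k of them) placed at the top.
    ideal_hits = min(len(relevant_set), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

relevant = [32, 509, 714]            # toy ground truth
hits_first = [32, 509, 1, 2, 714]    # three hits, two of them at the top
hits_last = [1, 2, 32, 509, 714]     # the same three hits, pushed down

# Both rankings contain the same number of hits, so precision and recall match,
# but the ranking with earlier hits gets the higher NDCG.
print(ndcg_at_k(relevant, hits_first), ndcg_at_k(relevant, hits_last))
```

This is why the average NDCG reported above can differ from the average precision even though both are computed from the same hits.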


And we're done! If you have questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/u54VK8m8tk)!

Some other links:
1. Mistral 7b 2x faster [free Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. Gemma (trained on 6 trillion tokens) 2.5x faster [free Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>