To run this, press "Runtime" and press "Run all" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Join our Discord if you need help!
</div>

To install Unsloth on your own computer, follow the [installation instructions](https://github.com/unslothai/unsloth#installation-instructions---conda) on our GitHub page.

You will learn how to do [DPO data prep](#Data), and how to [train via `DPOTrainer`](#Train).
To learn more about DPO, read TRL's [blog post](https://huggingface.co/blog/dpo-trl). We follow [Hugging Face's Alignment Handbook](https://github.com/huggingface/alignment-handbook) to replicate [Zephyr](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta).
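
For intuition, the DPO objective for a single preference pair can be sketched in a few lines of plain Python. This is a minimal illustration, not TRL's `DPOTrainer` implementation: the function name `dpo_loss`, its arguments, and the example log-probabilities are assumptions made for this sketch, and `beta = 0.1` is only an illustrative value.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities.

    Hypothetical helper for illustration only.
    """
    # How much more the policy prefers each response than the reference does
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# Policy identical to the reference: the margin is 0, so the loss is log(2)
loss_at_init = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy favours the chosen response more than the reference does: lower loss
loss_improved = dpo_loss(-10.0, -17.0, -12.0, -15.0)
print(round(loss_at_init, 4), loss_improved < loss_at_init)
```

The loss falls as the policy widens the gap between chosen and rejected responses relative to the reference (SFT) model, which is why DPO needs an SFT model as its starting point.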

In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

* We support Llama, Mistral, CodeLlama, TinyLlama, Vicuna, Open Hermes, etc.
* We also support Yi, Qwen ([llamafied](https://huggingface.co/models?sort=trending&search=qwen+llama)), Deepseek, and all Llama- and Mistral-derived architectures.
* We support 16bit LoRA and 4bit QLoRA. Both are 2x faster.
* `max_seq_length` can be set to anything, since we do automatic RoPE Scaling via [kaiokendev's](https://kaiokendev.github.io/til) method.
* With [PR 26037](https://github.com/huggingface/transformers/pull/26037), we support downloading 4bit models **4x faster**! [Our repo](https://huggingface.co/unsloth) has Llama, Mistral 4bit models.
* DPO requires a model that has already been trained with SFT on a dataset similar to the one used for DPO. We use `HuggingFaceH4/mistral-7b-sft-beta` as the SFT model. Use this [notebook](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing) first to train an SFT model.
* [**NEW**] We make Gemma (trained on 6 trillion tokens) **2.5x faster**! See our [Gemma notebook](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)
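
The RoPE scaling mentioned above can be pictured as linear position interpolation: positions in the longer context are squeezed back into the range the model saw during training. The sketch below illustrates that idea in plain Python. It is not Unsloth's internal code, and the names `rope_angles` and `trained_len`, the toy head dimension, and the assumed training length of 2048 are all illustrative assumptions.

```python
def rope_angles(position, head_dim=8, base=10000.0, scale=1.0):
    """Rotary angles for one position; scale < 1 interpolates positions.

    Hypothetical helper: returns one angle per pair of dimensions.
    """
    scaled_pos = position * scale
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

trained_len = 2048                  # assumed context length seen in pre-training
target_len = 4096                   # matches max_seq_length in this notebook
scale = trained_len / target_len    # 0.5

# The last position of the longer context maps onto an in-range position
angles_scaled = rope_angles(target_len - 1, scale=scale)
angles_in_range = rope_angles((target_len - 1) * scale)
print(angles_scaled == angles_in_range)
```

With this scaling, position 4095 in the extended context produces the same rotary angles as position 2047.5 would without scaling, so the model never sees angles outside the range it was trained on.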

In [None]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


# Inference from checkpoints

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "qiqiquq/dpo-rpo-ranker-halfdata-1202-merged-16bit", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.11.10: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth: Will load qiqiquq/dpo-rpo-ranker-halfdata-1202-merged-16bit as a legacy tokenizer.


For zero-shot inference, comment out the cell above and run the following cell instead.

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

o_model, o_tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Mistral-7B-Instruct-v0.2", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.11.10: Fast Mistral patching. Transformers:4.46.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.0. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Test the loaded model with a quick prompt.

In [None]:
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")



outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    use_cache=True,
    return_dict_in_generate=True,
    do_sample=False,  # greedy decoding
)
# Keep only the newly generated tokens by slicing off the prompt
new_tokens = outputs.sequences[:, inputs['input_ids'].shape[1]:]
decoded_output = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
decoded_output[0]

'The Fibonacci sequence continues as follows: 13, 21, 34, 55, 89, 144, ...\n\nSo, the next number in the sequence is 144.'

In [None]:
import pandas as pd
test_data = pd.read_csv('/content/natural_language_top10.csv')  # Replace with the path to your own test CSV


In [None]:
test_data.iloc[0:1]

Unnamed: 0,user_id,history,candidates,ground_truth
0,5,"Movie 50: The Usual Suspects (Genres: Drama, C...",Movie 32: Twelve Monkeys (Genres: Science Fict...,"Movie 32: Twelve Monkeys,Movie 509: The Piano,..."


In [None]:
import re
def generate_prompt(history, candidates, len_candidates):
    return f"""You are a recommender system. Based on a user's historical likes and dislikes, rank the given candidate movies by their likelihood of being the user's next favorite, according to their watching history. Please think step by step.
You MUST ONLY output the Rank of Movie id, do not include other information like genres and overview.
This user's historical interactions: {history}
There are {len_candidates} Candidates for recommendation: {candidates}

Strictly follow the output format:
Rank1: Movie id - Reason: shortly explain why the user would most likely enjoy this movie
Rank2: Movie id - Reason: shortly explain why the user would likely enjoy this movie second
...
Rank{len_candidates}: Movie id - Reason: (shorter than 10 words) explain why this movie would be the least one the user would enjoy

For example,
Rank1: Movie 32 - Reason: because user like this topic (shorter than 10 words)
...

Please provide a ranked list of the recommended movies. You MUST rank only the given candidates and cannot include any movies not listed in the candidate list.
Now, begin with 'Rank1:', Output:"""

def parse_movie_list(movie_string):
    """
    Parses a string of movies into two parallel lists:
    the "Movie <id>:<details>" descriptions and the movie IDs (as strings).
    """
    # Split by "Movie" to separate each movie entry
    movies = re.split(r'Movie (\d+):', movie_string)
    parsed_movies = []
    parsed_movies_id = []

    # Process the split results to extract movie details
    for i in range(1, len(movies), 2):  # Skip the first split as it's before "Movie"
        movie_id = movies[i].strip()  # Extract movie ID
        movie_details = movies[i + 1].strip()  # Extract details
        if movie_details.endswith(','):
            movie_details = movie_details[:-1]
        parsed_movies.append(f"Movie {movie_id}:{movie_details}")
        parsed_movies_id.append(movie_id)
    return parsed_movies, parsed_movies_id
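
`parse_movie_list` relies on a detail of `re.split`: when the pattern contains a capture group, the captured text is kept in the result, so IDs and details alternate. A minimal illustration follows, using a made-up input string in the same shape as the `history` column:

```python
import re

# Made-up sample in the same "Movie <id>: <details>," shape as the dataset
sample = ("Movie 50: The Usual Suspects (Genres: Drama), "
          "Movie 32: Twelve Monkeys (Genres: Science Fiction)")

parts = re.split(r'Movie (\d+):', sample)
# parts[0] is the text before the first match (empty here);
# after that, IDs sit at odd positions and details at even positions.
ids = [p.strip() for p in parts[1::2]]
details = [p.strip().rstrip(',') for p in parts[2::2]]

print(ids)
print(details)
```

This is why the loop in `parse_movie_list` starts at index 1 and advances in steps of 2.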

In [None]:
FastLanguageModel.for_inference(model)
FastLanguageModel.for_inference(o_model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): Mis

In [None]:
import random
import re

def process_dataframe_rows_with_batch(df, model, tokenizer, output_file = 'output.csv', batch_size=8):
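    # Work on a copy so that passing a sliced DataFrame does not trigger SettingWithCopyWarning
    df = df.copy()
    if 'result' not in df.columns:
        df['result'] = None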

    batch_history = []
    batch_candidates = []
    batch_len_candidates = []
    batch_idx = []

    # Track the row position with enumerate(): idx is the index label, which
    # does not start at 0 when df is a slice such as test_data[300:400]
    for pos, (idx, row) in enumerate(df.iterrows()):
        history, history_id = parse_movie_list(row['history'])
        candidates, candidates_id = parse_movie_list(row['candidates'])
        ground_truth, ground_truth_id = parse_movie_list(row['ground_truth'])
        prompt_candidates = candidates[:]
        random.shuffle(prompt_candidates)

        user_history_desc = " ".join(history)
        cand = ", ".join(prompt_candidates)
        len_cand = len(prompt_candidates)

        batch_history.append(user_history_desc)
        batch_candidates.append(cand)
        batch_len_candidates.append(len_cand)
        batch_idx.append(idx)

        # If the batch is full, or this is the last row, process it
        if len(batch_history) == batch_size or pos == len(df) - 1:
            prompts = [
                generate_prompt(hist, cand, len_cand)
                for hist, cand, len_cand in zip(batch_history, batch_candidates, batch_len_candidates)
            ]

            # Tokenize the batch
            inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True).to("cuda")

            # Generate predictions for the batch
            outputs = model.generate(
                **inputs,
                max_new_tokens=512,
                use_cache=True,
                return_dict_in_generate=True,
                do_sample=False,  # greedy decoding
            )

            # Decode the output
            new_tokens = outputs.sequences[:, inputs['input_ids'].shape[1]:]
            decoded_outputs = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)

            # Write results back to the dataframe
            for idx_in_batch, result in zip(batch_idx, decoded_outputs):
                df.at[idx_in_batch, 'result'] = result

            # Clear the batch
            batch_history = []
            batch_candidates = []
            batch_len_candidates = []
            batch_idx = []

        if idx % 10 == 0:
            print(f"Processed {idx} rows.")
            df.to_csv(output_file, index=False)

    df.to_csv(output_file, index=False)
    return df
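
The batching logic above boils down to chunking the rows by position. A small sketch of that idea, independent of the model, is shown here; `iter_batches` is a hypothetical helper, not part of the notebook or of Unsloth:

```python
def iter_batches(items, batch_size):
    """Yield consecutive chunks of items; the final chunk may be shorter."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Index labels of a 100-row slice starting at row 300, as in the cell below
row_labels = list(range(300, 400))
batch_sizes = [len(batch) for batch in iter_batches(row_labels, 28)]
print(batch_sizes)
```

Chunking by position rather than by index label ensures that the final partial batch (16 rows here) is still processed even though the labels start at 300.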


In [None]:
OUTPUT_trained = 'output_t.csv'
OUTPUT_original = 'output_o.csv'
begin = 300
inference_sample = 100
# res_1 = process_dataframe_rows_with_batch(test_data[begin:begin+inference_sample], model, tokenizer,
#                                         output_file = OUTPUT_trained,
#                                         batch_size=28)
res_2 = process_dataframe_rows_with_batch(test_data[begin:begin+inference_sample], o_model, o_tokenizer,
                                        output_file = OUTPUT_original,
                                        batch_size=28)
# Adjust the batch size according to GPU utilization; adjust the number of inference samples according to the expected run time

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['result'] = None


Processed 300 rows.
Processed 310 rows.
Processed 320 rows.
Processed 330 rows.
Processed 340 rows.
Processed 350 rows.
Processed 360 rows.
Processed 370 rows.
Processed 380 rows.
Processed 390 rows.


## Evaluate the inference results

In [None]:
result_df_t = pd.read_csv(OUTPUT_trained).dropna()
result_df_o = pd.read_csv(OUTPUT_original).dropna()

In [None]:
def parse_movie_gt(movie_string):
    """
    Parses a string of movies into a list of integer movie IDs.
    """
    # Split by "Movie" to separate each movie entry
    movies = re.split(r'Movie (\d+):', movie_string)
    parsed_movies = []

    # Process the split results to extract movie details
    for i in range(1, len(movies), 2):  # Skip the first split as it's before "Movie"
        movie_id = movies[i].strip()  # Extract movie ID
        parsed_movies.append(int(movie_id))
    return parsed_movies

def parse_result(res_str):
    """
    Extracts the ranked movie IDs from a model response,
    e.g. "Rank1: Movie 32 - Reason: ..." -> [32, ...].
    """
    movie_ids = re.findall(r'Rank\d+:\s*Movie\s*(\d+)', res_str)
    # If you are using the model after SFT, use this line instead (note the space after "Rank"):
    # movie_ids = re.findall(r'Rank \d+:\s*Movie\s*(\d+)', res_str)

    return [int(movie_id) for movie_id in movie_ids]
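
To see what `parse_result` extracts, here is the same pattern applied to a made-up model response. The second pattern, which allows optional whitespace after `Rank`, is an assumption of this sketch rather than something the notebook uses: it accepts both `Rank1:` and `Rank 1:`, which might avoid switching regexes between models.

```python
import re

# Made-up response in the requested output format
response = (
    "Rank1: Movie 32 - Reason: likes science fiction\n"
    "Rank 2: Movie 509 - Reason: enjoys period dramas\n"
    "Rank3: Movie 50 - Reason: fewer matching genres"
)

strict = [int(m) for m in re.findall(r'Rank\d+:\s*Movie\s*(\d+)', response)]
tolerant = [int(m) for m in re.findall(r'Rank\s*\d+:\s*Movie\s*(\d+)', response)]

print(strict)    # the "Rank 2" line is skipped
print(tolerant)  # all three lines are matched
```

Lines that do not match the pattern are silently dropped, so a response in a slightly different format yields a shorter ranking rather than an error.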


In [None]:
def process_df(result):
  result['candidate_parsed'] = result['candidates'].apply(parse_movie_gt)
  result['gt_parsed'] = result['ground_truth'].apply(parse_movie_gt)
  result['result_parsed'] = result['result'].apply(parse_result)
  return result

In [None]:
result_df_t = process_df(result_df_t)
result_df_o = process_df(result_df_o)

In [None]:
result_df_t[['candidate_parsed', 'gt_parsed','result_parsed']].head()

Unnamed: 0,candidate_parsed,gt_parsed,result_parsed
0,"[1, 272, 1196, 1354, 1662, 1693, 1951, 2144, 2...","[1, 1028, 1196, 2000, 2144, 2424, 2581, 2791, ...","[1693, 1, 2424, 1196, 1662, 1354, 1951, 3836, ..."
1,"[1, 349, 1196, 1880, 2144, 2388, 2791, 2792, 3...","[1, 1028, 1196, 2000, 2144, 2424, 2581, 2791, ...","[1196, 1, 2144, 1880, 2388, 3868, 1, 3136, 279..."
2,"[1, 107, 1123, 1196, 1762, 1926, 2243, 2579, 2...","[1, 1028, 1196, 2000, 2144, 2424, 2581, 2791, ...","[1, 2243, 1148, 597, 1270, 1307, 2804, 107, 17..."
3,"[500, 556, 1004, 1028, 1055, 2000, 2144, 2424,...","[1, 1028, 1196, 2000, 2144, 2424, 2581, 2791, ...","[2424, 500, 1285, 11, 2145, 1278, 1835, 2396, ..."
4,"[1, 176, 193, 1028, 1196, 1707, 2000, 3129, 31...","[1, 1028, 1196, 2000, 2144, 2424, 2581, 2791, ...","[1028, 1, 1707, 176, 1079, 2406, 1, 3147, 1196..."


In [None]:
import numpy as np
from typing import List

def calculate_metrics(gt_list: List[int], pred_list: List[int], k: int = 10) -> tuple:
    pred_list = pred_list[:k]
    gt_set = set(gt_list)

    hits = sum(1 for item in pred_list if item in gt_set)
    hit_ratio = 1 if hits > 0 else 0

    precision = hits / k if k > 0 else 0
    recall = hits / len(gt_set) if gt_set else 0

    dcg = 0
    idcg = 0
    for i, item in enumerate(pred_list):
        if item in gt_set:
            dcg += 1 / np.log2(i + 2)

    for i in range(min(len(gt_set), k)):
        idcg += 1 / np.log2(i + 2)

    ndcg = dcg / idcg if idcg > 0 else 0

    return hit_ratio, precision, recall, ndcg

def add_metrics_to_df(df, k=10, comp_col = 'result_parsed'):
    df['hit_ratio'] = 0.0
    df['precision'] = 0.0
    df['recall'] = 0.0
    df['ndcg'] = 0.0

    for idx, row in df.iterrows():
        hit_ratio, precision, recall, ndcg = calculate_metrics(
            row['gt_parsed'],
            row[comp_col],
            k = k
        )

        df.at[idx, 'hit_ratio'] = hit_ratio
        df.at[idx, 'precision'] = precision
        df.at[idx, 'recall'] = recall
        df.at[idx, 'ndcg'] = ndcg

    return df
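
A hand-checkable example can make these metrics concrete. The sketch below re-implements the same formulas with only the standard library; the names `metrics_at_k` and `dedupe_keep_order` are assumptions of this sketch. It also shows one property worth knowing: because hits are counted per predicted position, a response that repeats a movie ID is credited more than once. De-duplicating predictions first is one possible remedy, not something the notebook does.

```python
import math

def metrics_at_k(gt_list, pred_list, k):
    """Pure-Python mirror of calculate_metrics, for a hand-checkable example."""
    pred_list = pred_list[:k]
    gt_set = set(gt_list)
    hits = sum(1 for item in pred_list if item in gt_set)
    dcg = sum(1 / math.log2(i + 2)
              for i, item in enumerate(pred_list) if item in gt_set)
    idcg = sum(1 / math.log2(i + 2) for i in range(min(len(gt_set), k)))
    return (1 if hits > 0 else 0,
            hits / k if k > 0 else 0,
            hits / len(gt_set) if gt_set else 0,
            dcg / idcg if idcg > 0 else 0)

def dedupe_keep_order(ids):
    """Hypothetical helper: drop repeated IDs, keeping the first occurrence."""
    return list(dict.fromkeys(ids))

gt = [1, 2, 3]
pred = [2, 9, 1, 7]
hit_ratio, precision, recall, ndcg = metrics_at_k(gt, pred, k=3)
print(hit_ratio, round(precision, 4), round(recall, 4), round(ndcg, 4))

# A repeated ID counts as a hit each time it appears
precision_repeated = metrics_at_k(gt, [1, 1, 1], k=3)[1]
precision_deduped = metrics_at_k(gt, dedupe_keep_order([1, 1, 1]), k=3)[1]
print(precision_repeated, round(precision_deduped, 4))
```

In the first example the top 3 predictions are `[2, 9, 1]`: two hits, at ranks 1 and 3, so DCG is 1 + 0.5 = 1.5, and the ideal DCG for three relevant items is 1 + 1/log2(3) + 0.5.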


In [None]:
print('Non-ranker')
for k in [3, 5, 10]:
  result_df_t = add_metrics_to_df(result_df_t,k=k, comp_col = 'candidate_parsed')
  print('-'*20)
  print(f"For k = {k}")

  print(f"Average Hit Ratio: {result_df_t['hit_ratio'].mean():.4f}")
  print(f"Average Precision: {result_df_t['precision'].mean():.4f}")
  print(f"Average Recall: {result_df_t['recall'].mean():.4f}")
  print(f"Average NDCG: {result_df_t['ndcg'].mean():.4f}")

Non-ranker
--------------------
For k = 3
Average Hit Ratio: 0.7381
Average Precision: 0.3373
Average Recall: 0.1012
Average NDCG: 0.3679
--------------------
For k = 5
Average Hit Ratio: 0.9048
Average Precision: 0.3429
Average Recall: 0.1714
Average NDCG: 0.3637
--------------------
For k = 10
Average Hit Ratio: 1.0000
Average Precision: 0.3298
Average Recall: 0.3298
Average NDCG: 0.3468


In [None]:
print('Trained Ranker')
for k in [3, 5, 10]:
  result_df_t = add_metrics_to_df(result_df_t,k=k)
  print('-'*20)
  print(f"For k = {k}")

  print(f"Average Hit Ratio: {result_df_t['hit_ratio'].mean():.4f}")
  print(f"Average Precision: {result_df_t['precision'].mean():.4f}")
  print(f"Average Recall: {result_df_t['recall'].mean():.4f}")
  print(f"Average NDCG: {result_df_t['ndcg'].mean():.4f}")

Trained Ranker
--------------------
For k = 3
Average Hit Ratio: 0.9167
Average Precision: 0.4802
Average Recall: 0.1440
Average NDCG: 0.5287
--------------------
For k = 5
Average Hit Ratio: 0.9524
Average Precision: 0.3786
Average Recall: 0.1893
Average NDCG: 0.4457
--------------------
For k = 10
Average Hit Ratio: 0.9762
Average Precision: 0.2905
Average Recall: 0.2905
Average NDCG: 0.3606


In [None]:
print('Zero-shot Ranker')
for k in [3, 5, 10]:
  result_df_o = add_metrics_to_df(result_df_o,k=k)
  print('-'*20)
  print(f"For k = {k}")

  print(f"Average Hit Ratio: {result_df_o['hit_ratio'].mean():.4f}")
  print(f"Average Precision: {result_df_o['precision'].mean():.4f}")
  print(f"Average Recall: {result_df_o['recall'].mean():.4f}")
  print(f"Average NDCG: {result_df_o['ndcg'].mean():.4f}")

Zero-shot Ranker
--------------------
For k = 3
Average Hit Ratio: 0.8571
Average Precision: 0.4563
Average Recall: 0.1369
Average NDCG: 0.4832
--------------------
For k = 5
Average Hit Ratio: 0.9286
Average Precision: 0.3929
Average Recall: 0.1964
Average NDCG: 0.4314
--------------------
For k = 10
Average Hit Ratio: 0.9762
Average Precision: 0.2964
Average Recall: 0.2964
Average NDCG: 0.3503


And we're done! If you have any questions about Unsloth, find a bug, need help, or want to join projects and keep up with the latest LLM news, feel free to join our [Discord](https://discord.gg/u54VK8m8tk)!

Some other links:
1. Mistral 7b 2x faster [free Colab](https://colab.research.google.com/drive/1Dyauq4kTZoLewQ1cApceUQVNcnnNTzg_?usp=sharing)
2. Llama 7b 2x faster [free Colab](https://colab.research.google.com/drive/1lBzz5KeZJKXjvivbYvmGarix9Ao6Wxe5?usp=sharing)
3. TinyLlama 4x faster full Alpaca 52K in 1 hour [free Colab](https://colab.research.google.com/drive/1AZghoNBQaMDgWJpi4RbffGM1h6raLUj9?usp=sharing)
4. CodeLlama 34b 2x faster [A100 on Colab](https://colab.research.google.com/drive/1y7A0AxE3y8gdj4AVkl2aZX47Xu3P1wJT?usp=sharing)
5. Mistral 7b [free Kaggle version](https://www.kaggle.com/code/danielhanchen/kaggle-mistral-7b-unsloth-notebook)
6. We also did a [blog](https://huggingface.co/blog/unsloth-trl) with 🤗 HuggingFace, and we're in the TRL [docs](https://huggingface.co/docs/trl/main/en/sft_trainer#accelerate-fine-tuning-2x-using-unsloth)!
7. `ChatML` for ShareGPT datasets, [conversational notebook](https://colab.research.google.com/drive/1Aau3lgPzeZKQ-98h69CCu1UJcvIBLmy2?usp=sharing)
8. Text completions like novel writing [notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing)
9. Gemma (trained on 6 trillion tokens) is 2.5x faster! [free Colab](https://colab.research.google.com/drive/10NbwlsRChbma1v55m8LAPYG15uQv6HLo?usp=sharing)

<div class="align-center">
  <a href="https://github.com/unslothai/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/u54VK8m8tk"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://ko-fi.com/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Kofi button.png" width="145"></a> Support our work if you can! Thanks!
</div>