To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

**Read our [Gemma 3 blog](https://unsloth.ai/blog/gemma3) for what's new in Unsloth and our [Reasoning blog](https://unsloth.ai/blog/r1-reasoning) on how to train reasoning models.**

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth

### Unsloth

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    # Can select any from the below:
    # "unsloth/Qwen2.5-0.5B", "unsloth/Qwen2.5-1.5B", "unsloth/Qwen2.5-3B"
    # "unsloth/Qwen2.5-14B",  "unsloth/Qwen2.5-32B",  "unsloth/Qwen2.5-72B",
    # And also all Instruct versions, plus the Math and Coding versions!
    model_name = "unsloth/Qwen2.5-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.4.3: Fast Qwen2 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/106k [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model-00001-of-00002.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.54G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/4.72k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/2.78M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/1.67M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/605 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/617 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

### Tokenizer

In [3]:
import functools
import logging
import re
from typing import List, Optional, Union

from transformers import PreTrainedTokenizerBase, BatchEncoding

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


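class ReversibleTokenizer:
    """Wrapper class that adds text reversal to a tokenizer, with support for streaming output."""

    # User-facing instruction tags
    USER_TAG_START = "<|do_r2l_start|>"
    USER_TAG_END = "<|do_r2l_end|>"

    # Internal markers used for encoding/decoding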
    MARKER_START = "<|r2l_marker_start|>"
    MARKER_END = "<|r2l_marker_end|>"

    def __init__(self, tokenizer: PreTrainedTokenizerBase):
        """Initialize the reversible tokenizer wrapper."""
        self.tokenizer = tokenizer
        self.original_encode = tokenizer.encode
        self.original_decode = tokenizer.decode
        self.original_call_one = tokenizer._call_one

        # State tracking, used for streaming
        self.incomplete_markers = {}

        # Compile the regular expressions
        self.encode_pattern = re.compile(
            re.escape(self.USER_TAG_START) + r'(.*?)' + re.escape(self.USER_TAG_END),
            re.DOTALL
        )
        self.decode_pattern = re.compile(
            re.escape(self.MARKER_START) + r'(.*?)' + re.escape(self.MARKER_END),
            re.DOTALL
        )

        # Apply the patch
        self._apply_patch()

    def _apply_patch(self) -> None:
        """Apply the encode/decode patch to the tokenizer."""
        # Register the internal markers as special tokens
        special_tokens_dict = {'additional_special_tokens': [self.MARKER_START, self.MARKER_END]}
        num_added = self.tokenizer.add_special_tokens(special_tokens_dict)
        logger.info(f"Added {num_added} internal markers: {special_tokens_dict['additional_special_tokens']}")

        # Replace the original methods
        self.tokenizer.encode = functools.partial(self.reversible_encode, self.tokenizer)
        self.tokenizer.decode = functools.partial(self.reversible_decode, self.tokenizer)
        self.tokenizer._call_one = functools.partial(self.reversible_call_one, self.tokenizer)

        # Add the streaming decode method
        self.tokenizer.stream_decode = functools.partial(self.stream_decode, self.tokenizer)

        logger.info(f"Applied the reversal patch to tokenizer: {self.tokenizer.__class__.__name__}")

    def reversible_call_one(self, tokenizer: PreTrainedTokenizerBase,
                            text: Union[str, List[str], List[List[str]]],
                            text_pair: Optional[Union[str, List[str], List[List[str]]]] = None,
                            **kwargs) -> BatchEncoding:
        """
        Override _call_one: apply text reversal before calling the original method.
        """
        # Process text and text_pair independently so that both are handled
        processed_text = self._process_nested(text)
        processed_text_pair = self._process_nested(text_pair)
        return self.original_call_one(processed_text, processed_text_pair, **kwargs)

    def _process_nested(self, value):
        """Reverse tagged spans in a string, a list of strings, or a list of lists of strings."""
        if isinstance(value, str):
            return self._process_text_for_reversing(value)
        if isinstance(value, list):
            return [self._process_nested(item) for item in value]
        # Anything else (None, token IDs, ...) is passed through unchanged
        return value

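    def _process_text_for_reversing(self, text: str) -> str:
        """Reverse the text inside the user tags, keeping whitespace handling consistent."""

        def reverse_and_mark(match):
            original_text = match.group(1)
            # Keep the original whitespace; only reverse the characters
            reversed_text = original_text[::-1]
            # Do not add extra spaces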
            return f"{self.MARKER_START}{reversed_text}{self.MARKER_END}"

        return self.encode_pattern.sub(reverse_and_mark, text)

    def reversible_encode(self, tokenizer: PreTrainedTokenizerBase, text: str, **kwargs) -> List[int]:
        """
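        Encode: reverse the text inside the <|do_r2l_start|> tags and replace the tags
        with the internal markers before calling the original encode.
        """
        processed_text = self._process_text_for_reversing(text)

        # Call the original tokenizer's encode on the processed text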
        return self.original_encode(processed_text, **kwargs)

    def reversible_decode(self, tokenizer: PreTrainedTokenizerBase,
                          token_ids: List[int], **kwargs) -> str:
        """
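        Decode: decode with the original method, then find the internal markers and
        reverse their contents back to the original text.
        """
        # Remember whether the user asked to skip special tokens
        user_skip_special_tokens = kwargs.pop('skip_special_tokens', False)
        # Remember the clean_up_tokenization_spaces preference; defaults to False to preserve spaces
        clean_up_tokenization_spaces = kwargs.pop('clean_up_tokenization_spaces', False)

        # Step 1: decode while keeping the markers
        decode_kwargs = kwargs.copy()
        decode_kwargs['skip_special_tokens'] = False
        decode_kwargs['clean_up_tokenization_spaces'] = clean_up_tokenization_spaces
        decoded_text_with_markers = self.original_decode(token_ids, **decode_kwargs)

        # Step 2: find the markers, reverse their contents, and remove the markers
        # The regex handling has to be precise here so that no extra spaces are introduced
        def restore_original_and_remove_markers(match):
            # Take the text inside the markers, strip leading/trailing whitespace, and reverse it back
            reversed_text = match.group(1).strip()
            return reversed_text[::-1]

        restored_text = self.decode_pattern.sub(restore_original_and_remove_markers, decoded_text_with_markers)

        # Step 3: drop special tokens if the user asked for it
        if user_skip_special_tokens:
            # Remove all special tokens while keeping whitespace handling consistent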
            for token in tokenizer.all_special_tokens:
                restored_text = restored_text.replace(token, "")

        return restored_text

    def stream_decode(self, tokenizer: PreTrainedTokenizerBase,
                      token_ids: List[int], stream_id: str = "default", **kwargs) -> str:
        """
        Streaming decode: designed for streamed output; handles markers that span batches.
        """
        # Read the clean_up_tokenization_spaces setting
        clean_up_tokenization_spaces = kwargs.pop('clean_up_tokenization_spaces', False)

        # Initialize the stream state
        if stream_id not in self.incomplete_markers:
            self.incomplete_markers[stream_id] = {
                "buffer": [],
                "processed_text": "",
                "in_marker": False
            }

        state = self.incomplete_markers[stream_id]

        # Append the new tokens to the buffer
        state["buffer"].extend(token_ids)

        # Decode the full buffer
        decode_kwargs = kwargs.copy()
        decode_kwargs['skip_special_tokens'] = False
        decode_kwargs['clean_up_tokenization_spaces'] = clean_up_tokenization_spaces
        current_full_text = self.original_decode(state["buffer"], **decode_kwargs)

        # Handle the reversal markers
        processed_text = self._process_stream_text(current_full_text, state)

        # Work out the newly added text
        new_text = processed_text[len(state["processed_text"]):]
        state["processed_text"] = processed_text

        return new_text

    def _process_stream_text(self, text: str, state: dict) -> str:
        """Handle reversal markers in streamed text; state tracking keeps marker handling correct."""
        # Complete marker pairs
        if self.MARKER_START in text and self.MARKER_END in text:
            # Restore every complete marker pair
            processed = self.decode_pattern.sub(
                lambda m: m.group(1).strip()[::-1],
                text
            )
            state["in_marker"] = False
            return processed

        # A marker has started but not yet ended
        elif self.MARKER_START in text and self.MARKER_END not in text:
            # Record that we are inside a marker
            state["in_marker"] = True
            # Return only the text before the marker start
            before_marker = text.split(self.MARKER_START, 1)[0]
            return before_marker

        # A marker was left open earlier and this batch has no end marker
        elif state["in_marker"] and self.MARKER_END not in text:
            # Still inside the marker; return nothing new
            return state["processed_text"]

        # No markers, or all markers already handled: return the text as-is
        return text

    def reset_stream(self, stream_id: str = "default") -> None:
        """Reset the state of a specific stream."""
        if stream_id in self.incomplete_markers:
            del self.incomplete_markers[stream_id]


# Convenience function
def patch_tokenizer(tokenizer: PreTrainedTokenizerBase) -> PreTrainedTokenizerBase:
    """Apply the text-reversal patch to the tokenizer and return the modified tokenizer."""
    # Set clean_up_tokenization_spaces to False
    tokenizer.clean_up_tokenization_spaces = False
    reversible = ReversibleTokenizer(tokenizer)
    # Attach the reset method
    tokenizer.reset_stream = reversible.reset_stream
    return tokenizer


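# Record the embedding size before resizing
original_embedding_size = model.get_input_embeddings().weight.shape[0]
print(f"Original embedding size: {original_embedding_size}")

# Apply the patch and resize.
# The model's embedding matrix is larger than the tokenizer's vocabulary,
# so resizing to len(tokenizer) shrinks it and the difference below is negative.
tokenizer = patch_tokenizer(tokenizer)
model.resize_token_embeddings(len(tokenizer))

# Check the embedding size after resizing
new_embedding_size = model.get_input_embeddings().weight.shape[0]
print(f"Embedding size after resizing: {new_embedding_size}")
print(f"Number of added tokens: {new_embedding_size - original_embedding_size}")

Original embedding size: 152064
Embedding size after resizing: 151667
Number of added tokens: -397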
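
To make the patched behavior concrete, here is a minimal, tokenizer-free sketch of the string round trip that the class above performs: text inside the user tags is reversed and wrapped in the internal markers before encoding, and decoding reverses it back. The helper names `to_markers` and `from_markers` are hypothetical; only the tag strings and the regular expressions come from the cell above.

```python
import re

USER_TAG_START, USER_TAG_END = "<|do_r2l_start|>", "<|do_r2l_end|>"
MARKER_START, MARKER_END = "<|r2l_marker_start|>", "<|r2l_marker_end|>"

encode_pattern = re.compile(
    re.escape(USER_TAG_START) + r"(.*?)" + re.escape(USER_TAG_END), re.DOTALL
)
decode_pattern = re.compile(
    re.escape(MARKER_START) + r"(.*?)" + re.escape(MARKER_END), re.DOTALL
)


def to_markers(text: str) -> str:
    """Hypothetical helper: reverse tagged spans and swap user tags for internal markers."""
    return encode_pattern.sub(
        lambda m: f"{MARKER_START}{m.group(1)[::-1]}{MARKER_END}", text
    )


def from_markers(text: str) -> str:
    """Hypothetical helper: reverse marked spans back and drop the markers."""
    return decode_pattern.sub(lambda m: m.group(1).strip()[::-1], text)


marked = to_markers("<|do_r2l_start|>odus<|do_r2l_end|> odus")
restored = from_markers(marked)
print(marked)
print(restored)
```

Because the decode step strips whitespace inside the markers, a leading or trailing space within a tagged span does not survive the round trip.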


### Load model

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.4.3 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
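
As a rough check on how small the trainable fraction is, the LoRA parameter count can be worked out by hand: each adapted linear layer with `in` inputs and `out` outputs gains `r * (in + out)` parameters. The sketch below assumes the published Qwen2.5-7B dimensions (hidden size 3584, MLP size 18944, 4 key/value heads of dimension 128, 28 layers). These numbers are not stated in this notebook, so treat them as assumptions.

```python
# Assumed Qwen2.5-7B dimensions (not taken from this notebook)
hidden, intermediate, kv_dim, num_layers = 3584, 18944, 4 * 128, 28
r = 16  # LoRA rank used in the cell above

# (in_features, out_features) of each targeted projection
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# LoRA adds A (r x in) and B (out x r) per projection: r * (in + out) parameters
per_layer = sum(r * (n_in + n_out) for n_in, n_out in shapes.values())
total = per_layer * num_layers
print(f"{total:,} trainable LoRA parameters")
```

Under these assumptions the total comes to 40,370,176, which agrees with the trainable-parameter count that Unsloth prints in its training banner later in this notebook.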


<a name="Data"></a>
### Data Prep
This notebook's template uses the Alpaca dataset from [yahma](https://huggingface.co/datasets/yahma/alpaca-cleaned), a filtered version of the original 52K-example [Alpaca dataset](https://crfm.stanford.edu/2023/03/13/alpaca.html). The cell below instead generates Alpaca-style (instruction, input, output) examples from the tokenizer's vocabulary. You can replace this code section with your own data prep.

**[NOTE]** To train only on completions (ignoring the user's input) read TRL's docs [here](https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only).

**[NOTE]** Remember to add the **EOS_TOKEN** to the tokenized output!! Otherwise you'll get infinite generations!

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Alpaca.ipynb)

For text completions like novel writing, try this [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb).
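
To illustrate the EOS note above, here is a small sketch of how one Alpaca-style record becomes a training string. The template is the standard Alpaca prompt; the example record and the literal EOS string `<|endoftext|>` are assumptions for illustration (in the notebook the value comes from `tokenizer.eos_token`).

```python
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|endoftext|>"  # assumed value; use tokenizer.eos_token in practice

# Hypothetical record in the (instruction, input, output) layout
record = {
    "instruction": "Restore reversed text to its original character order",
    "input": "<|do_r2l_start|>odus<|do_r2l_end|>",
    "output": "odus",
}

# The EOS token goes directly after the response so the model learns where to stop
text = alpaca_prompt.format(
    record["instruction"], record["input"], record["output"]
) + EOS_TOKEN
print(text)
```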

In [10]:

import argparse
import json
import os
import random
# ReversibleTokenizer comes from the Tokenizer cell above
import sys
from typing import List, Dict, Tuple

from tqdm import tqdm
from transformers import AutoTokenizer, PreTrainedTokenizerBase

def generate_token_examples(
        tokenizer: PreTrainedTokenizerBase,
        max_examples: int = 100000,
        min_token_freq: int = 5,
        exclude_special_tokens: bool = True,
        seed: int = 42
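) -> Tuple[List[Dict], Dict]:
    """
    Generate training examples from the tokenizer's vocabulary.

    Args:
        tokenizer: the tokenizer to generate data for
        max_examples: maximum number of examples to generate
        min_token_freq: minimum number of examples to generate per token
        exclude_special_tokens: whether to exclude special tokens
        seed: random seed

    Returns:
        Tuple[List[Dict], Dict]: the generated examples and the statistics
    """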
    random.seed(seed)

    # Get the vocabulary
    vocab = list(tokenizer.get_vocab().items())
    vocab_size = len(vocab)

    # Filter out special tokens
    special_token_ids = set(tokenizer.all_special_ids) if exclude_special_tokens else set()
    filtered_vocab = [(token, idx) for token, idx in vocab
                      if idx not in special_token_ids
                      and len(token) > 1]  # Skip single-character tokens to get more meaningful examples

    print(f"Vocabulary size: {vocab_size}, after filtering: {len(filtered_vocab)}")

    examples = []
    stats = {
        "total_tokens_processed": len(filtered_vocab),
        "examples_generated": 0,
        "tokens_covered": 0,
        "token_frequency": {}
    }

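    # Create at least min_token_freq examples per token
    token_examples_count = {}

    # Number of passes over the vocabulary; each pass adds one example per token.
    # Note: this formula yields a single pass whenever min_token_freq <= 3.
    rounds_needed = (min_token_freq + 2) // 3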

    raw_to_r2l_instruction_list = [
        "Convert right to left way text to normal way",
        "Transform reversed text back to normal reading order",
        "Change text from right-to-left to standard left-to-right format",
        "Restore reversed text to its original character order",
        "Return this backwards text to normal reading direction",
        "Revert right-to-left text to conventional reading order",
        "Convert reversed character sequence back to normal",
        "Fix the direction of this text to read from left to right",
        "Normalize the character order in this reversed text",
        "Correct the reading direction of this text",
        "Restore the natural reading order of this reversed text",
        "Make this right-to-left text readable in standard format",
        "Repair the character sequence of this reversed text",
        "Adjust this backwards text to read normally",
        "Reorganize this reversed text to conventional reading order",
        "Decode this right-to-left text to normal format",
        "Return this text to proper reading direction",
        "Process reversed text to display in regular order",
        "Translate this backwards text to standard character ordering",
        "Fix this reversed text so it reads naturally"
    ]

    r2l_to_raw_instruction_list = [
        "Convert right to left way text to normal way",
        "Transform reversed text back to normal reading order",
        "Change text from right-to-left to standard left-to-right format",
        "Restore reversed text to its original character order",
        "Return this backwards text to normal reading direction",
        "Revert right-to-left text to conventional reading order",
        "Convert reversed character sequence back to normal",
        "Fix the direction of this text to read from left to right",
        "Normalize the character order in this reversed text",
        "Correct the reading direction of this text",
        "Restore the natural reading order of this reversed text",
        "Make this right-to-left text readable in standard format",
        "Repair the character sequence of this reversed text",
        "Adjust this backwards text to read normally",
        "Reorganize this reversed text to conventional reading order",
        "Decode this right-to-left text to normal format",
        "Return this text to proper reading direction",
        "Process reversed text to display in regular order",
        "Translate this backwards text to standard character ordering",
        "Fix this reversed text so it reads naturally"
    ]

    with tqdm(total=min(max_examples, len(filtered_vocab) * rounds_needed)) as pbar:
        for round_idx in range(rounds_needed):
            if stats["examples_generated"] >= max_examples:
                break

            # Shuffle the vocabulary each round
            random.shuffle(filtered_vocab)

            for token, token_id in filtered_vocab:
                if stats["examples_generated"] >= max_examples:
                    break

                # Skip this token if it already has enough examples
                if token in token_examples_count and token_examples_count[token] >= min_token_freq:
                    continue

                # Get the token's raw text
                text = tokenizer.decode([token_id])

                # Register the token the first time it is seen
                if token not in token_examples_count:
                    token_examples_count[token] = 0
                    stats["tokens_covered"] += 1

                # Randomly pick a generation mode
                mode = random.choice([1, 2])

                # alpaca instruction
                if mode == 1:
                    # Mode 1: text -> <|do_r2l_start|>text<|do_r2l_end|>
                    s_instruction = random.choice(raw_to_r2l_instruction_list)
                    s_input = text
                    s_output = f"{ReversibleTokenizer.USER_TAG_START}{text}{ReversibleTokenizer.USER_TAG_END}"
                    example = dict(instruction=s_instruction, input=s_input, output=s_output, )
                    examples.append(example)
                elif mode == 2:
                    # Mode 2: <|do_r2l_start|>text<|do_r2l_end|> -> text
                    s_instruction = random.choice(r2l_to_raw_instruction_list)
                    s_input = f"{ReversibleTokenizer.USER_TAG_START}{text}{ReversibleTokenizer.USER_TAG_END}"
                    s_output = text
                    example = dict(instruction=s_instruction, input=s_input, output=s_output, )
                    examples.append(example)

                # Update the statistics
                token_examples_count[token] = token_examples_count.get(token, 0) + 1
                if token not in stats["token_frequency"]:
                    stats["token_frequency"][token] = 0
                stats["token_frequency"][token] += 1

                stats["examples_generated"] += 1
                pbar.update(1)

    return examples, stats

tokenizer_examples, stats = generate_token_examples(
    tokenizer=tokenizer,
    max_examples=len(tokenizer),
    min_token_freq=2,
    seed=42,
)

# Alpaca prompt template used to format each example
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

from datasets import Dataset
# generate_token_examples returns a list of dicts, which Dataset.from_list accepts directly
tokenizer_examples_dataset = Dataset.from_list(tokenizer_examples)
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = tokenizer_examples_dataset.map(formatting_prompts_func, batched = True,)


Vocabulary size: 151667, after filtering: 151407


100%|██████████| 151407/151407 [00:01<00:00, 98637.96it/s] 


Map:   0%|          | 0/151407 [00:00<?, ? examples/s]

In [24]:
for i in range(10):
  exp = tokenizer_examples[i]
  print(exp)
  a = tokenizer.encode(exp)
  print(a)
  c = tokenizer.decode(a)
  print(c)
  b = tokenizer.batch_decode(a)
  print(b)

<|do_r2l_start|>ри<|do_r2l_end|> ри
[151665, 77646, 151666, 18108, 1802]
ри ри
['<|r2l_marker_start|>', 'ир', '<|r2l_marker_end|>', ' р', 'и']
<|do_r2l_start|>odus<|do_r2l_end|> odus
[151665, 18881, 151666, 10785, 355]
odus odus
['<|r2l_marker_start|>', 'sudo', '<|r2l_marker_end|>', ' od', 'us']
<|do_r2l_start|> backButton<|do_r2l_end|>  backButton
[151665, 77, 1716, 84, 33, 31378, 370, 220, 151666, 220, 89726]
backButton  backButton
['<|r2l_marker_start|>', 'n', 'ott', 'u', 'B', 'kc', 'ab', ' ', '<|r2l_marker_end|>', ' ', ' backButton']
 sudden <|do_r2l_start|> sudden<|do_r2l_end|>
[10968, 220, 151665, 18694, 67, 355, 220, 151666]
 sudden sudden
[' sudden', ' ', '<|r2l_marker_start|>', 'ned', 'd', 'us', ' ', '<|r2l_marker_end|>']
_HC <|do_r2l_start|>_HC<|do_r2l_end|>
[98991, 220, 151665, 2149, 62, 151666]
_HC _HC
['_HC', ' ', '<|r2l_marker_start|>', 'CH', '_', '<|r2l_marker_end|>']
的画面 <|do_r2l_start|>的画面<|do_r2l_end|>
[111097, 220, 151665, 27091, 54623, 9370, 151666]
的画面 的画面
['的画面', 

In [None]:
# alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# ### Instruction:
# {}

# ### Input:
# {}

# ### Response:
# {}"""

# EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
# def formatting_prompts_func(examples):
#     instructions = examples["instruction"]
#     inputs       = examples["input"]
#     outputs      = examples["output"]
#     texts = []
#     for instruction, input, output in zip(instructions, inputs, outputs):
#         # Must add EOS_TOKEN, otherwise your generation will go on forever!
#         text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
#         texts.append(text)
#     return { "text" : texts, }
# pass

# from datasets import load_dataset
# dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
# dataset = dataset.map(formatting_prompts_func, batched = True,)

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but for a full run you can set `num_train_epochs=1` and `max_steps=None`. We also support TRL's `DPOTrainer`!

In [11]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/151407 [00:00<?, ? examples/s]

In [12]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
8.16 GB of memory reserved.


In [13]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 151,407 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 40,370,176/7,000,000,000 (0.58% trained)


| Step | Training Loss |
|------|---------------|
| 1 | 4.2899 |
| 2 | 4.0377 |
| 3 | 4.0572 |
| 4 | 4.0691 |
| 5 | 3.5054 |
| 6 | 3.2405 |
| 7 | 2.4972 |
| 8 | 2.1633 |
| 9 | 1.6984 |
| 10 | 1.2497 |
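
The 60-step run only touches a small slice of the data. A quick sketch of the arithmetic, using the batch settings and example count shown in the cells above (the step count per epoch is an approximation of what the trainer computes):

```python
import math

per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 60
num_examples = 151_407

# One optimizer step consumes this many examples
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {max_steps} steps: {examples_seen} "
      f"({examples_seen / num_examples:.2%} of the data)")
print(f"Approximate steps for one full epoch: {steps_per_epoch}")
```

With these settings the short run sees 480 examples, well under 1% of the dataset, so a full epoch would take roughly 18,926 optimizer steps.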




In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

462.3942 seconds used for training.
7.71 minutes used for training.
Peak reserved memory = 7.893 GB.
Peak reserved memory for training = 2.129 GB.
Peak reserved memory % of max memory = 53.519 %.
Peak reserved memory for training % of max memory = 14.436 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input - leave the output blank!

**[NEW] Try 2x faster inference in a free Colab for Llama-3.1 8b Instruct [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Unsloth_Studio.ipynb)**

In [18]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    # alpaca_prompt.format(
    #     "Continue the Fibonacci sequence.", # instruction
    #     "1, 1, 2, 3, 5, 8", # input
    #     "", # output - leave this blank for generation!
    # )
    "Hello"
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
print(outputs)
tokenizer.batch_decode(outputs)

tensor([[  9707,  82639,   2982,   1710,     17,     75,   4906,     91,     29,
          21927,     27,     91,   2982,   1710,     17,     75,   6213,     91,
             29, 151643]], device='cuda:0')


['Hello <|do_r2l_start|> Hello<|do_r2l_end|><|endoftext|>']

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the Fibonacci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811, 
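
The streaming idea can be sketched without a model: a streamer is an object that receives pieces of output as they are produced and prints them immediately instead of collecting them first. The class below is a toy stand-in written for illustration, not the `transformers.TextStreamer` implementation. `ToyStreamer`, `fake_generate`, and the hard-coded chunks are all hypothetical.

```python
import io
import sys
from typing import Iterable, TextIO


class ToyStreamer:
    """Writes each chunk as soon as it arrives (illustrative only)."""

    def __init__(self, out: TextIO = sys.stdout) -> None:
        self.out = out

    def put(self, chunk: str) -> None:
        self.out.write(chunk)
        self.out.flush()  # show the text now rather than when a buffer fills

    def end(self) -> None:
        self.out.write("\n")
        self.out.flush()


def fake_generate(chunks: Iterable[str], streamer: ToyStreamer) -> str:
    """Stand-in for a generation loop: hand over chunks one at a time."""
    pieces = []
    for chunk in chunks:
        streamer.put(chunk)
        pieces.append(chunk)
    streamer.end()
    return "".join(pieces)


buffer = io.StringIO()
full_text = fake_generate(["13, ", "21, ", "34"], ToyStreamer(buffer))
print(full_text)
```

With the real `TextStreamer`, passing `skip_prompt = True` hides the echoed prompt so only the new text is printed.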


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!
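
To get a feel for why an adapter-only save is small, here is a rough parameter count. It assumes the standard LoRA setup, where a frozen `d_out x d_in` weight gets two trainable factors of shape `d_out x r` and `r x d_in`. The 4096 dimensions and rank 16 are illustrative values, not read from this notebook's model.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight the adapter sits on."""
    return d_out * d_in


# Illustrative sizes only (assumptions, not this notebook's model).
adapter = lora_param_count(4096, 4096, rank=16)
full = full_param_count(4096, 4096)
print(f"adapter: {adapter:,} params, full layer: {full:,} params")
print(f"adapter share of the layer: {adapter / full:.2%}")
```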

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/vocab.json',
 'lora_model/merges.txt',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')
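
The tuple above lists only tokenizer files because it is the return value of `tokenizer.save_pretrained`. The adapter weights are written by `model.save_pretrained`. Before loading, it can help to confirm the folder really contains an adapter. The sketch below assumes the file names PEFT normally writes (`adapter_config.json` plus `adapter_model.safetensors`, or `adapter_model.bin` in older versions). The helper name `looks_like_lora_dir` is hypothetical.

```python
import tempfile
from pathlib import Path

ADAPTER_CONFIG = "adapter_config.json"
# Assumed weight file names: safetensors by default, .bin in older PEFT versions.
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def looks_like_lora_dir(path) -> bool:
    """Return True if the folder has an adapter config and an adapter weight file."""
    folder = Path(path)
    has_config = (folder / ADAPTER_CONFIG).is_file()
    has_weights = any((folder / name).is_file() for name in ADAPTER_WEIGHTS)
    return has_config and has_weights


# Demo on a temporary folder so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = looks_like_lora_dir(tmp)
    (Path(tmp) / ADAPTER_CONFIG).write_text("{}")
    (Path(tmp) / ADAPTER_WEIGHTS[0]).write_bytes(b"")
    filled_result = looks_like_lora_dir(tmp)

print(empty_result, filled_result)
```

In the notebook you would call `looks_like_lora_dir("lora_model")` after the save cell.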

Now if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# alpaca_prompt = You MUST copy from above!

inputs = tokenizer(
[
    alpaca_prompt.format(
        "What is a famous tall tower in Paris?", # instruction
        "", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?

### Input:


### Response:
One of the most famous tall towers in Paris is the Eiffel Tower. It was built by Gustave Eiffel in 1889 and stands at a height of 324 meters (1,063 feet). It is a symbol of Paris and one of the most recognizable structures in the world.<|endoftext|>
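
The streamed text includes the full prompt and the end-of-sequence marker. If you only want the answer, one option is to cut the decoded string at the response header. This is a minimal sketch: the marker strings are taken from the output above, the sample string is shortened for the demo, and the helper name `extract_response` is hypothetical.

```python
def extract_response(
    decoded: str,
    marker: str = "### Response:\n",
    eos: str = "<|endoftext|>",
) -> str:
    """Return the text after the response header, without the end-of-sequence token."""
    _, found, tail = decoded.partition(marker)
    if not found:
        raise ValueError("response marker not found in decoded text")
    return tail.split(eos, 1)[0].strip()


sample = (
    "### Instruction:\nWhat is a famous tall tower in Paris?\n\n"
    "### Input:\n\n\n"
    "### Response:\nOne of the most famous tall towers in Paris is the Eiffel Tower.<|endoftext|>"
)
answer = extract_response(sample)
print(answer)
```

When decoding yourself with `tokenizer.batch_decode`, passing `skip_special_tokens = True` removes the end-of-sequence token for you.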


You can also use Hugging Face's `AutoPeftModelForCausalLM`. Only use this if you do not have `unsloth` installed. It can be hopelessly slow, since `4bit` model downloading is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Not recommended - use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit = load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.
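
As background, merging folds the low-rank update into the base weight so the result is an ordinary dense matrix. The sketch below shows the standard LoRA arithmetic, `W_merged = W + (alpha / r) * B @ A`, on tiny hand-written matrices. It illustrates the idea only and is not Unsloth's implementation. A real merge from a 4-bit base involves more steps. All values are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a tiny demo."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_b: Matrix, lora_a: Matrix, alpha: float, rank: int) -> Matrix:
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
B = [[1.0], [2.0]]            # LoRA factor B (2 x 1)
A = [[0.5, -0.5]]             # LoRA factor A (1 x 2)
merged = merge_lora(W, B, A, alpha=2, rank=1)
print(merged)
```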

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
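
Rather than pasting a token into the notebook, one option is to read it from an environment variable. This sketch assumes the token was exported as `HF_TOKEN`, the variable name `huggingface_hub` itself looks for. The helper `get_hf_token` is hypothetical.

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read a Hugging Face token from the environment, failing loudly if it is missing."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to a Hugging Face token first."
        )
    return token
```

You could then pass `token = get_hf_token()` to the `push_to_hub_merged` calls above instead of `token = ""`.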

### GGUF / llama.cpp Conversion
We now support saving to `GGUF` / `llama.cpp` natively! We clone `llama.cpp` and save to `q8_0` by default. All other methods, like `q4_k_m`, are allowed too. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
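
For a rough sense of output size, file size scales with bits per weight. The sketch uses two figures that follow from the formats themselves: `f16` stores 16 bits per weight, and `q8_0` stores blocks of 32 int8 values plus one 16-bit scale, which works out to 8.5 bits per weight. K-quants such as `q4_k_m` mix several tensor types, so they are left out. The 8-billion parameter count is an illustrative assumption, and metadata overhead is ignored.

```python
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": (32 * 8 + 16) / 32,  # 32 int8 values + one fp16 scale per block
}


def estimate_gguf_gb(n_params: int, method: str) -> float:
    """Rough file size in GB (10^9 bytes), ignoring metadata and mixed-precision tensors."""
    bits = BITS_PER_WEIGHT[method]
    return n_params * bits / 8 / 1e9


# Illustrative parameter count (assumption, not this notebook's model).
for method in BITS_PER_WEIGHT:
    print(method, round(estimate_gguf_gb(8_000_000_000, method), 1), "GB")
```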

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [None]:
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI-based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui).
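
Before pointing another tool at the exported file, a quick sanity check is to read its header. GGUF files begin with the four magic bytes `GGUF` followed by a 32-bit version number. The sketch below assumes the usual little-endian layout, and the helper name `read_gguf_version` is hypothetical. The demo writes a tiny fake file rather than a real model.

```python
import struct
import tempfile
from pathlib import Path


def read_gguf_version(path) -> int:
    """Return the GGUF version, or raise if the magic bytes are missing."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not start with the GGUF magic bytes")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32 (assumed)
    return version


# Demo on a fake 8-byte file; a real export would be checked the same way.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "fake.gguf"
    fake.write_bytes(b"GGUF" + struct.pack("<I", 3))
    demo_version = read_gguf_version(fake)

print(demo_version)
```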

And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

