### Installation

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth
!pip install openpyxl

### Unsloth

In [None]:
from unsloth import FastLanguageModel  # Use FastVisionModel for vision models
import torch
max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
load_in_4bit = False  # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",  # Llama-3.1 2x faster
    "unsloth/Mistral-Small-Instruct-2409",  # Mistral 22b 2x faster!
    "unsloth/Phi-4",  # Phi-4 2x faster!
    "unsloth/Phi-4-unsloth-bnb-4bit",  # Phi-4 Unsloth Dynamic 4-bit Quant
    "unsloth/gemma-2-9b-bnb-4bit",  # Gemma 2x faster!
    "unsloth/Qwen2.5-7B-Instruct-bnb-4bit",  # Qwen 2.5 2x faster!
    "unsloth/Llama-3.2-1B-bnb-4bit",  # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
]  # More models at https://docs.unsloth.ai/get-started/all-our-models

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-4",
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.8: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



We now add LoRA adapters for parameter-efficient finetuning, so that only a small fraction of the model's parameters is trained (under 1% in this run).

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2025.6.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
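
As a sanity check, the trainable-parameter count that the trainer reports later (`Trainable parameters = 65,536,000`) can be reproduced by hand. The sketch below assumes Phi-4's projection shapes (hidden size 5120, 10 key/value heads of dimension 128, MLP width 17920); these dimensions are not stated anywhere in this notebook and are assumptions. Each adapted weight of shape `(d_out, d_in)` gets a LoRA pair with `r * (d_in + d_out)` parameters.

```python
# Assumed Phi-4 dimensions (NOT taken from this notebook).
HIDDEN = 5120
KV_DIM = 10 * 128   # 10 key/value heads x head dimension 128 (assumed)
MLP = 17920
NUM_LAYERS = 40     # matches "patched 40 layers" in the log above
R = 16              # LoRA rank passed to get_peft_model

# (d_in, d_out) for every module listed in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")  # prints "65,536,000 trainable LoRA parameters"
```

Under these assumed shapes the total agrees with the figure in the training banner.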


<a name="Data"></a>
### Data Prep
We now use the `Phi-4` format for conversation-style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style, but convert it to Hugging Face's normal multi-turn format `("role", "content")` instead of `("from", "value")`. Phi-4 renders multi-turn conversations like below:

```
<|im_start|>user<|im_sep|>Hello!<|im_end|>
<|im_start|>assistant<|im_sep|>Hi! How can I help?<|im_end|>
<|im_start|>user<|im_sep|>What is 2+2?<|im_end|>
```
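
To make the layout concrete, here is a minimal hand-written sketch that renders `role`/`content` messages into the format shown above. It is a stand-in for illustration, not the tokenizer's actual chat template; `render_phi4` is a hypothetical helper. It joins turns without a separator, which is how the decoded generation output appears later in this notebook (the line breaks in the example above are for readability).

```python
def render_phi4(messages, add_generation_prompt=False):
    """Render role/content messages in the Phi-4 layout (illustrative only)."""
    text = "".join(
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant<|im_sep|>"
    return text

messages = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "What is 2+2?"},
]
rendered = render_phi4(messages, add_generation_prompt=True)
print(rendered)
```

In practice, `tokenizer.apply_chat_template` should be preferred, since it uses the template shipped with the tokenizer.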

We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, phi4, llama3` and more.

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "phi-4",
)

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize = False, add_generation_prompt = False
        )
        for convo in convos
    ]
    return { "text" : texts, }
pass

We now use `standardize_sharegpt` to convert ShareGPT style datasets into HuggingFace's generic format. This changes the dataset from looking like:
```
{"from": "system", "value": "You are an assistant"}
{"from": "human", "value": "What is 2+2?"}
{"from": "gpt", "value": "It's 4."}
```
to
```
{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}
```
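
The mapping itself is simple enough to sketch in plain Python. The snippet below only illustrates the renaming shown above; it is not the implementation of `standardize_sharegpt`, and `ROLE_MAP` and `to_role_content` are hypothetical names covering just the three speaker labels in the example.

```python
# Speaker labels from the ShareGPT example above mapped to generic role names.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def to_role_content(conversation):
    """Convert a list of {"from", "value"} turns into {"role", "content"} turns."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]

sharegpt = [
    {"from": "system", "value": "You are an assistant"},
    {"from": "human", "value": "What is 2+2?"},
    {"from": "gpt", "value": "It's 4."},
]
converted = to_role_content(sharegpt)
for turn in converted:
    print(turn)
```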

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
DATA_DIR = "/content/drive/MyDrive/Couplet-Data/final/20250625/"

import pandas as pd
import json
from datasets import Dataset
from tqdm import tqdm
from unsloth.chat_templates import standardize_sharegpt

def load_prompts_from_excel(file_path):
    """Helper function to load and parse prompts from Excel file with progress indicator"""
    print(f"Loading Excel file: {file_path}")
    df = pd.read_excel(file_path, engine='openpyxl')
    print(f"Found {len(df)} rows in Excel file")

    prompts = []
    failed_count = 0

    # Use tqdm for progress bar
    for i, prompt_str in enumerate(tqdm(df['prompt'].tolist(), desc="Parsing prompts")):
        try:
            # Try standard JSON parsing first
            prompt = json.loads(prompt_str)
        except json.JSONDecodeError:
            try:
                # Fallback to ast.literal_eval for single-quoted strings
                import ast
                prompt = ast.literal_eval(prompt_str)
            except Exception as e:
                print(f"Failed to parse prompt at row {i+1}: {e}")
                print(f"Original string: {prompt_str}")
                print("=" * 40)
                failed_count += 1
                continue
        prompts.append(prompt)

    print(f"Successfully loaded {len(prompts)} prompts")
    if failed_count > 0:
        print(f"Failed to parse {failed_count} prompts")

    return prompts

def dataset_from_excel(file_path):
    """Load prompts from an Excel file and convert to Hugging Face Dataset"""
    prompts = load_prompts_from_excel(file_path)
    dataset = Dataset.from_list(prompts)
    dataset = standardize_sharegpt(dataset)
    dataset = dataset.map(
        formatting_prompts_func,
        batched=True,
    )
    return dataset

test_dataset = dataset_from_excel(f"{DATA_DIR}test.xlsx")
val_dataset = dataset_from_excel(f"{DATA_DIR}valid.xlsx")
train_dataset = dataset_from_excel(f"{DATA_DIR}train.xlsx")

Loading Excel file: /content/drive/MyDrive/Couplet-Data/final/20250625/test.xlsx
Found 10402 rows in Excel file


Parsing prompts:  24%|██▍       | 2535/10402 [00:00<00:00, 25345.55it/s]

Failed to parse prompt at row 599: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving 

Parsing prompts: 100%|██████████| 10402/10402 [00:00<00:00, 26642.96it/s]


Failed to parse prompt at row 10032: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservin

Unsloth: Standardizing formats (num_proc=12):   0%|          | 0/10399 [00:00<?, ? examples/s]

Map:   0%|          | 0/10399 [00:00<?, ? examples/s]

Loading Excel file: /content/drive/MyDrive/Couplet-Data/final/20250625/valid.xlsx
Found 9621 rows in Excel file


Parsing prompts: 100%|██████████| 9621/9621 [00:00<00:00, 27156.18it/s]


Successfully loaded 9621 prompts


Unsloth: Standardizing formats (num_proc=12):   0%|          | 0/9621 [00:00<?, ? examples/s]

Map:   0%|          | 0/9621 [00:00<?, ? examples/s]

Loading Excel file: /content/drive/MyDrive/Couplet-Data/final/20250625/train.xlsx
Found 274731 rows in Excel file


Parsing prompts:   7%|▋         | 18368/274731 [00:00<00:09, 27981.73it/s]

Failed to parse prompt at row 13887: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservin

Parsing prompts:  57%|█████▋    | 156179/274731 [00:05<00:04, 28184.64it/s]

Failed to parse prompt at row 151813: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts:  69%|██████▉   | 190040/274731 [00:06<00:03, 26221.34it/s]

Failed to parse prompt at row 186460: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts:  74%|███████▍  | 203979/274731 [00:07<00:02, 27702.98it/s]

Failed to parse prompt at row 199798: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts:  82%|████████▏ | 226496/274731 [00:08<00:01, 24391.32it/s]

Failed to parse prompt at row 222149: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts:  87%|████████▋ | 237912/274731 [00:09<00:01, 27078.02it/s]

Failed to parse prompt at row 233157: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts:  90%|████████▉ | 246606/274731 [00:09<00:00, 28358.72it/s]

Failed to parse prompt at row 243548: unterminated string literal (detected at line 1) (<unknown>, line 1)
Original string: {"conversations":[{"role":"system","content":"You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preservi

Parsing prompts: 100%|██████████| 274731/274731 [00:10<00:00, 25936.67it/s]


Successfully loaded 274722 prompts
Failed to parse 9 prompts


Unsloth: Standardizing formats (num_proc=12):   0%|          | 0/274722 [00:00<?, ? examples/s]

Map:   0%|          | 0/274722 [00:00<?, ? examples/s]
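
The `unterminated string literal` failures above suggest that some cells end mid-string. One possible cause, which is an assumption and not confirmed by this notebook, is Excel's limit of 32,767 characters per cell. A minimal diagnostic sketch follows; `diagnose_cell` is a hypothetical helper and the sample strings are made up for illustration.

```python
import json

EXCEL_CELL_CHAR_LIMIT = 32_767  # maximum number of characters Excel keeps in one cell

def diagnose_cell(cell_text):
    """Return 'ok', 'possibly truncated', or 'malformed' for one cell's text."""
    try:
        json.loads(cell_text)
        return "ok"
    except json.JSONDecodeError:
        # A cell that fails to parse AND sits at the size limit was likely cut off.
        if len(cell_text) >= EXCEL_CELL_CHAR_LIMIT:
            return "possibly truncated"
        return "malformed"

good = json.dumps({"conversations": [{"role": "user", "content": "hi"}]})
long_cell = json.dumps({"conversations": [{"role": "user", "content": "x" * 40_000}]})
cut = long_cell[:EXCEL_CELL_CHAR_LIMIT]  # simulate a cell cut at the limit

print(diagnose_cell(good), "|", diagnose_cell(cut), "|", diagnose_cell(good[:-3]))
```

Running such a check on the failing rows would show whether they need to be regenerated in a format without a per-cell limit, or simply contain malformed JSON.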

In [None]:
# prompt: Get the size of dataset

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")
print(f"Test dataset size: {len(test_dataset)}")

Training dataset size: 274722
Validation dataset size: 9621
Test dataset size: 10399


In [None]:
# prompt: Cut the training dataset to 200K only

# Cut the training dataset to 200K only if it's larger
if len(train_dataset) > 200000:
    print(f"Cutting training dataset from {len(train_dataset)} to 200000.")
    train_dataset = train_dataset.select(range(200000))
    print(f"New training dataset size: {len(train_dataset)}")
else:
    print(f"Training dataset size ({len(train_dataset)}) is already 200000 or less. No cutting needed.")

Cutting training dataset from 293097 to 200000.
New training dataset size: 200000


In [None]:
# prompt: Get some row from train dataset

train_dataset[0]

{'conversations': [{'content': 'You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving rhetorical style.\n • [translation/general]: Translation for general / all types of text. Aim for semantic clarity, correct syntax, and

We look at how the conversations are structured for item 5:

In [None]:
train_dataset[5]["conversations"]

[{'content': 'You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving rhetorical style.\n • [translation/general]: Translation for general / all types of text. Aim for semantic clarity, correct syntax, and faithful represen

And we see how the chat template transformed these conversations.

In [None]:
train_dataset[5]["text"]

'<|im_start|>system<|im_sep|>You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving rhetorical style.\n • [translation/general]: Translation for general / all types of text. Aim for semantic clarity, correct syntax, and fa

<a name="Train"></a>
### Train the model
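Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We train for up to 4 epochs (`num_train_epochs = 4`), evaluate on the validation set every 500 steps, save a checkpoint every 1,000 steps, and stop early if the validation loss fails to improve for 3 consecutive evaluations. For a quick test run, set `max_steps` to a small value such as 60 instead. We also support TRL's `DPOTrainer`!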

In [None]:
from trl import SFTConfig, SFTTrainer
from transformers import EarlyStoppingCallback, DataCollatorForSeq2Seq

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,

    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 2,
        warmup_steps = 5,
        num_train_epochs = 4, # Up to 4 epochs; early stopping may end training sooner.
        learning_rate = 2e-4,
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
        neftune_noise_alpha=5,

        eval_strategy="steps",
        eval_steps=500,

        save_strategy="steps",
        save_steps=1000,
        save_total_limit=2,

        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    ),
    callbacks = [
        EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.0),
    ],
)

Unsloth: Tokenizing ["text"] (num_proc=12):   0%|          | 0/274722 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=12):   0%|          | 0/9621 [00:00<?, ? examples/s]
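
The early-stopping settings in the trainer above can be pictured with a small pure-Python model. This is a simplified sketch of patience-based stopping, not the `EarlyStoppingCallback` source; `first_stop_index` is a hypothetical helper and the loss values are invented for illustration.

```python
def first_stop_index(eval_losses, patience=3, threshold=0.0):
    """Return the index of the evaluation at which training would stop, or None.

    An evaluation counts as an improvement only if the loss is lower than the
    best loss so far by more than `threshold`.
    """
    best = None
    evals_without_improvement = 0
    for i, loss in enumerate(eval_losses):
        if best is None or (loss < best and abs(loss - best) > threshold):
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return i
    return None

# Made-up validation losses, one per evaluation (every 500 steps in this setup).
losses = [1.03, 0.98, 0.99, 1.00, 0.985, 0.97]
print(first_stop_index(losses))            # 4: three evaluations in a row without a new best
print(first_stop_index([1.0, 0.9, 0.8]))   # None: the loss keeps improving
```

With `eval_steps=500` and a patience of 3, this simplified model stops after roughly 1,500 steps without a new best validation loss.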

We also use Unsloth's `train_on_responses_only` function to train only on the assistant outputs and ignore the loss on the system and user inputs.

In [None]:
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user<|im_sep|>",
    response_part="<|im_start|>assistant<|im_sep|>",
)

Map (num_proc=12):   0%|          | 0/274722 [00:00<?, ? examples/s]

Map (num_proc=12):   0%|          | 0/9621 [00:00<?, ? examples/s]

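We first check that every assistant answer in the first 1,000 training rows contains a `<result>` tag, then verify that masking is actually done: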

In [None]:
# prompt: Iterate through the training dataset, count how many <result> appear in the assistant answer first 1000 rows?

count = 0
# Iterate through the first 1000 rows
for i in range(min(1000, len(train_dataset))):
  conversations = train_dataset[i]["conversations"]
  # Find the assistant's response in the conversation
  assistant_response = ""
  for turn in conversations:
    if turn["role"] == "assistant":
      assistant_response = turn["content"]

      # Check if "<result>" is present in the assistant's response
      if "<result>" in assistant_response:
        count += 1
        break

print(f"The string '<result>' appears in the assistant answer of the first {min(1000, len(train_dataset))} training rows: {count} times.")


The string '<result>' appears in the assistant answer of the first 1000 training rows: 1000 times.


In [None]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|im_start|>system<|im_sep|>You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving rhetorical style.\n • [translation/general]: Translation for general / all types of text. Aim for semantic clarity, correct syntax, and fa

In [None]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

'                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 <result>Với nợ sinh ra vốn nặng duyên.</result><|im_end|>'

We can see the System and Instruction prompts are successfully masked!
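
The idea behind this masking can be sketched without any libraries: every label is set to `-100` (the index the loss ignores) except the tokens that follow a response marker. This is an illustrative sketch, not Unsloth's implementation; `mask_non_responses` is a hypothetical helper and the token ids are invented.

```python
IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss

def mask_non_responses(input_ids, instruction_part, response_part):
    """Keep labels only for tokens that come after a response marker."""
    labels = [IGNORE_INDEX] * len(input_ids)
    in_response = False
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + len(response_part)] == response_part:
            in_response = True            # assistant turn starts after the marker
            i += len(response_part)
            continue
        if input_ids[i:i + len(instruction_part)] == instruction_part:
            in_response = False           # user turn: stop training on tokens
            i += len(instruction_part)
            continue
        if in_response:
            labels[i] = input_ids[i]
        i += 1
    return labels

# Invented ids: 1=<|im_start|>, 2=<|im_sep|>, 3=<|im_end|>, 10=system, 11=user, 12=assistant
input_ids = [1, 10, 2, 101, 3,  1, 11, 2, 102, 3,  1, 12, 2, 103, 104, 3]
labels = mask_non_responses(input_ids, instruction_part=[1, 11, 2], response_part=[1, 12, 2])
print(labels)
```

Only the assistant's tokens and the closing `<|im_end|>` keep their labels, which mirrors the decoded labels shown above.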

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
27.605 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 274,722 | Num Epochs = 4 | Total steps = 137,364
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 65,536,000/14,725,043,200 (0.45% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss,Validation Loss
500,1.131,1.035111


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
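
The `Total steps = 137,364` figure in the banner above follows from the dataset size and the batch settings. The arithmetic below reproduces it for this configuration; the exact rounding inside the trainer may differ between library versions, so treat this as a sketch.

```python
import math

num_examples = 274_722          # "Num examples" in the banner
per_device_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 1
num_epochs = 4

# Examples consumed per optimizer step.
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# A final partial batch still counts as one step.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * num_epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 34341 137364
```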


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

<a name="Inference"></a>
### Inference
Let's run the model! You can change the user message in `messages` below.



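We pass `min_p = 0.1` and `temperature = 1.5`. Note that these only take effect when sampling is enabled (`do_sample = True`); otherwise `generate` decodes greedily and warns that the flags may be ignored.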
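
As a rough illustration of what these two settings do when sampling is active: temperature rescales the logits, and min-p discards every token whose probability is below `min_p` times the probability of the most likely token. The sketch below is a pure-Python approximation, not the library's code; `min_p_distribution` is a hypothetical helper, the logits are made up, and applying temperature before the min-p filter is an assumption.

```python
import math

def min_p_distribution(logits, temperature=1.5, min_p=0.1):
    """Return sampling probabilities after temperature scaling and min-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]   # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    cutoff = min_p * max(probs)                   # threshold relative to the top token
    kept = [p if p >= cutoff else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]

probs = min_p_distribution([3.0, 2.0, -3.0])
print([round(p, 3) for p in probs])
```

Here the third token falls below the cutoff and is removed, while the remaining two are renormalized to sum to 1.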

In [None]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "phi-4",
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role":"user","content":"[translation/couplet] Please translate this Classical Chinese text into Modern Vietnamese: 寶鴨凝寒换宿香, \n 別裁新措理霓裳"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

outputs = model.generate(
    input_ids = inputs, max_new_tokens = 8192, use_cache = True, temperature = 1.5, min_p = 0.1
)
tokenizer.batch_decode(outputs)

The following generation flags are not valid and may be ignored: ['temperature', 'min_p']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


['<|im_start|>user<|im_sep|>[translation/couplet] Please translate this Classical Chinese text into Modern Vietnamese: 寶鴨凝寒换宿香, \n 別裁新措理霓裳<|im_end|><|im_start|>assistant<|im_sep|>Con nhạn báu đóng băng , thay thế hương thơm . \nLàm áo mới , cắt lụa nhiêu thương .<|im_end|>']

You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs, streamer = text_streamer, max_new_tokens = 128,
    use_cache = True, temperature = 1.5, min_p = 0.1
)

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, use either Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!
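To see why an adapter-only save is so much smaller than the full model, recall that LoRA replaces the update to a `d_out x d_in` weight matrix with two low-rank factors of shapes `d_out x r` and `r x d_in`. The sketch below counts parameters for a single square projection; the hidden size and rank are assumed values chosen for illustration, not settings read from this notebook.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

d_model = 5120   # assumed hidden size, for illustration only
rank = 16        # hypothetical LoRA rank

full = d_model * d_model                       # one dense projection matrix
adapter = lora_param_count(d_model, d_model, rank)
ratio = adapter / full

print(f"dense: {full:,} params, adapter: {adapter:,} params ({ratio:.3%})")
```

Under these assumptions the adapter holds well under 1% of the parameters of the layer it modifies, which is why the base model must still be available when the adapters are loaded.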

In [None]:
model.save_pretrained("/content/drive/MyDrive/phi-4-new/lora_model")  # Local saving
tokenizer.save_pretrained("/content/drive/MyDrive/phi-4-new/lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('/content/drive/MyDrive/phi-4-new/lora_model/tokenizer_config.json',
 '/content/drive/MyDrive/phi-4-new/lora_model/special_tokens_map.json',
 '/content/drive/MyDrive/phi-4-new/lora_model/chat_template.jinja',
 '/content/drive/MyDrive/phi-4-new/lora_model/vocab.json',
 '/content/drive/MyDrive/phi-4-new/lora_model/merges.txt',
 '/content/drive/MyDrive/phi-4-new/lora_model/added_tokens.json',
 '/content/drive/MyDrive/phi-4-new/lora_model/tokenizer.json')

To reload saved LoRA adapters for inference, run the cell below with the condition set to `True` and point `model_name` at the directory that holds the adapters (here, a training checkpoint):

In [None]:
if True:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "/content/drive/MyDrive/Couplet-Data/loras/phi-4-new/checkpoint-4000", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 2048,
        load_in_4bit = False,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.8: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

model-00001-of-00006.safetensors:   0%|          | 0.00/4.93G [00:00<?, ?B/s]

model-00002-of-00006.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00003-of-00006.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

model-00004-of-00006.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00005-of-00006.safetensors:   0%|          | 0.00/4.95G [00:00<?, ?B/s]

model-00006-of-00006.safetensors:   0%|          | 0.00/4.62G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/6 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

Unsloth 2025.6.8 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.


In [None]:
# The same system prompt is shared by every example below, so define it once.
SYSTEM_PROMPT = "You are an AI model trained to act as an expert in Hán Việt (Classical Chinese) and its translation to Modern Vietnamese. Your primary role is to perform accurate, clear, and faithful translations, while also being capable of the following additional tasks upon user request:\n • Explaining terms or concepts from Classical Chinese\n • Question answering related to Chinese/Vietnamese historical figures, texts, events, or linguistic choices\n • Disambiguating meanings of words or phrases\n • Supporting custom tasks as defined by the user\n\n⸻\n\n🏷️ Input Mode Tags\n\nThe user prompt will always start with a mode tag in square brackets. Handle the task accordingly:\n • [translation/couplet]: Classical couplets (對聯/聯句) require deep analysis of theme, structure, poetic devices, and symmetry. Prioritize clarity while preserving rhetorical style.\n • [translation/general]: Translation for general / all types of text. Aim for semantic clarity, correct syntax, and faithful representation of tone and nuance.\n • [translation-ner]: Named Entity Recognition (NER) for Classical Chinese. Identify and translate proper nouns, historical figures, places, and other entities.\n • [question-answer]: Answer user questions with concise, factual, and historically informed responses.\n • [disambiguate]: Explain multiple interpretations of ambiguous terms, phrases, or lines.\n • [fill-mask]: Predict missing words or phrases in a Modern Vietnamese translation, based on the Classical Chinese source.\n • [custom]: The user may give unique instructions. Always prioritize these, even if they override this system prompt.\n\n⸻\n\n🔍 Translation Principles\n\nFor [translation/*] modes, follow these key translation principles unless otherwise instructed:\n 1. Clarity First:\n • Ensure the result is easy to read, grammatically correct, and structurally sound in Vietnamese.\n • The translation must be unambiguous in sentence-level meaning.\n 2. Faithful to Source:\n • Do not add information that is not implied or necessary.\n • If added for clarity, such insertions must be marked clearly (e.g., via parentheses or footnotes, if allowed).\n 3. Couplet Sensitivity ([translation/couplet] only):\n • Respect poetic devices: tone pattern (平仄), parallelism, topic mirroring.\n • You may provide more than one version if needed (e.g., literal vs interpretative).\n 4. Follow User Instruction Strictly:\n • Always follow any specific instruction in the user’s prompt, even if it contradicts system rules.\n • This supports stylistic exploration and self-assessment (e.g., multiple translation variants).\n\n⸻\n\n🧰 Techniques & Utilities\n • You can use Hán-Việt transliteration and literal glosses when needed to support explanation.\n • For better clarity, break complex sentences into parts before reordering into fluent Vietnamese.\n • For [custom], allow experimental behavior (e.g., step-by-step, commentary, stylistic mimicry).\n\n⸻\n\n🧾 Output Format\n • All translated content must be wrapped inside <result>...</result>. If there are multiple versions, use numbered results:\n\n<result>\n1. [First version...]\n2. [Second version...]\n</result>\n\n • If additional explanation is provided, place it outside the <result> tag.\n\n⸻\n\n🛑 Restrictions\n • Do not hallucinate or over-interpret historical content unless explicitly told to do so.\n • Do not repeat the system prompt or summarize it unless requested.\n • Strictly follow the user’s instructions, even if they contradict this system prompt.\n\n⸻\n\n✅ You are now ready to translate, explain, disambiguate, and collaborate with the user on Classical Chinese materials in a clear and structured manner."

# Alternative conversations: uncomment one block to try a different mode tag.
# messages = [
#     {"role": "system", "content": SYSTEM_PROMPT},
#     {"role": "user", "content": "[disambiguate] I need you to translate this Classical Chinese text into Modern Vietnamese, but first I want you to identify any ambiguities in the text that might affect the translation. 帝王纪云：“禹受封为夏伯，在豫州外方之南，今河南阳翟是也。"},
#     # Optional follow-up turns:
#     # {"role": "assistant", "content": "['禹受封为夏伯']"},
#     # {"role": "user", "content": "[translation/general] Here are some possible translations of the ambiguous parts of the text. Some of them may have various possible translations, please choose the most appropriate one for each part. Some of them may have no appropriate translation, please try your best to translate them. Please note that these translations are taken from the other translations of the text, so they may not be the most appropriate ones in your context. So please try to paraphrase them to make them more appropriate. \n 禹受封为夏伯 means Vua Vũ được phong làm Hạ Bá"},
# ]

# messages = [
#     {"role": "system", "content": SYSTEM_PROMPT},
#     {"role": "user", "content": "[translation/couplet] Help me translate the following Classical Chinese text into Modern Vietnamese: 自古留傳靈跡寺\n於今建造佛疊門"},  # Optionally append: ". Before translating, identify any named entities and explain their historical, cultural, or religious context."
# ]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "[translation-ner] Help me translate the following Classical Chinese text into Modern Vietnamese: 帝王纪云：“禹受封为夏伯，在豫州外方之南，今河南阳翟是也。Before translating, identify any named entities and explain their historical, cultural, or religious context.",
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize = True,
    add_generation_prompt = True, # Must add for generation
    return_tensors = "pt",
).to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer, skip_prompt = True)
_ = model.generate(
    input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048,
    use_cache = False, temperature = 1.5, min_p = 0.1
)

The following generation flags are not valid and may be ignored: ['temperature', 'min_p', 'cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


<think>**Phân tích cấu trúc:** 
1. Câu văn có cấu trúc giới thiệu lịch sử ("帝王纪" - "sử ký vua chúa"), kết hợp với thông tin địa lý ("在豫州外方之南" - "ở phía nam ngoại phương của Dự Châu"). 
2. Cụm "夏伯" (Hạ bá) đứng sau động từ "受封" (thọ phong - được phong), thể hiện vai trò danh hiệu chức vụ. 
3. "豫州" (Dự Châu) và "河南阳翟" (Hà Nam Dương Địch) đều là cụm danh từ chỉ địa danh, được xác định qua từ chỉ vị trí ("之南" - phía nam) và cấu trúc "今...是也" (nay là...). 

**Dấu hiệu hình thái:** 
1. "夏伯" là danh hiệu cổ (bá - chức quan thời Hạ), thường xuất hiện trong văn bản lịch sử. 
2. "豫州" và "河南阳翟" đều có cấu trúc danh từ riêng: 
  - "豫州" (Dự Châu) là tên chư hầu thời Hạ, thường đi kèm với "州" (châu). 
  - "河南阳翟" (Hà Nam Dương Địch) có cấu trúc "河南" (tỉnh Hà Nam) + "阳翟" (địa danh cổ), phù hợp với quy tắc đặt tên địa danh Trung Hoa. 

**Kiến thức ngữ cảnh và văn hóa:** 
1. "夏伯" là danh hiệu quan trọng thời Hạ, thường gắn với vua Hạ Vũ (Vũ Đế). 
2. "豫州" là một trong "Chư hầu Hạ", được ghi chép trong "H
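Since the system prompt asks for translations inside `<result>...</result>` and the model opens its reasoning with `<think>`, a small parser can separate the two parts of a finished response. This is a sketch under assumptions: the closing `</think>` tag is assumed rather than shown in the truncated output above, and `split_response` is a hypothetical helper name.

```python
import re

def split_response(text: str):
    """Split a response into (thinking, result); result is None if no <result> block exists."""
    # Reasoning runs from <think> to </think>, or to the end if generation was cut off.
    think_match = re.search(r"<think>(.*?)(?:</think>|$)", text, flags=re.DOTALL)
    thinking = think_match.group(1).strip() if think_match else ""
    # The translation is whatever sits inside the first <result> block.
    result_match = re.search(r"<result>(.*?)</result>", text, flags=re.DOTALL)
    result = result_match.group(1).strip() if result_match else None
    return thinking, result

complete = "<think>reasoning</think>\n<result>\n1. A\n2. B\n</result>\nnote"
truncated = "<think>partial reasoning"

print(split_response(complete))
print(split_response(truncated))
```

A response that stops before reaching `<result>`, as the streamed output above does, yields `None` for the result, which is a convenient signal to raise `max_new_tokens` and regenerate.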

You can also use PEFT's `AutoPeftModelForCausalLM` from Hugging Face. Only use this if you do not have `unsloth` installed. It can be very slow, since downloading `4bit` models is not supported, and Unsloth's **inference is 2x faster**.

In [None]:
if False:
    # Strongly discouraged: use Unsloth if possible
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(
        "lora_model",  # YOUR MODEL YOU USED FOR TRAINING
        load_in_4bit=load_in_4bit,
    )
    tokenizer = AutoTokenizer.from_pretrained("lora_model")

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
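from google.colab import userdata  # Colab only; e.g. token = userdata.get("HF_TOKEN") if a secret is stored under that (assumed) name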
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False:
    model.save_pretrained("model")
    tokenizer.save_pretrained("model")
if False:
    model.push_to_hub("hf/model", token = "")
    tokenizer.push_to_hub("hf/model", token = "")
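As a rough guide to choosing between `merged_16bit` and `merged_4bit`, the disk footprint can be estimated from bytes per parameter. The sketch below starts from the six 16-bit shard sizes listed in the download log earlier in this section; the 4-bit figure is an approximation that assumes 0.5 bytes per parameter and ignores quantization metadata and any layers kept in higher precision.

```python
# Shard sizes in GB, as listed in the download log for the 16-bit checkpoint.
shard_sizes_gb = [4.93, 4.95, 4.90, 4.95, 4.95, 4.62]

BYTES_PER_PARAM_16BIT = 2.0   # float16 / bfloat16
BYTES_PER_PARAM_4BIT = 0.5    # assumption: ignores quantization overhead

total_16bit_gb = sum(shard_sizes_gb)
approx_params_billion = total_16bit_gb / BYTES_PER_PARAM_16BIT
approx_4bit_gb = approx_params_billion * BYTES_PER_PARAM_4BIT

print(f"16-bit checkpoint: {total_16bit_gb:.2f} GB")
print(f"approx. parameters: {approx_params_billion:.2f} B")
print(f"approx. 4-bit size: {approx_4bit_gb:.2f} GB")
```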
