# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (or SFT for short) is a crucial step in RLHF. In TRL we provide an easy-to-use API to create your SFT models and train them with few lines of code on your dataset.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4

In [1]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Basic SFT

In [2]:
# Step 1: Load the dataset
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")
dataset

  from .autonotebook import tqdm as notebook_tqdm
2023-12-28 03:46:46.117999: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-28 03:46:46.142200: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-28 03:46:46.142257: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-28 03:46:46.143242: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-28 03:46:46.1

Dataset({
    features: ['text', 'label'],
    num_rows: 25000
})

In [3]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("distilgpt2", device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")

# Make sure to pass a correct value for max_seq_length as the default value will be set to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

In [4]:
# Step 3: Define the Trainer
trainer = SFTTrainer(
    model = model,
    train_dataset = dataset.select(range(1000)),
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
)

trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Currently logged in as: [33mguntsvzz[0m. Use [1m`wandb login --relogin`[0m to force relogin


You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=375, training_loss=3.8194332682291665, metrics={'train_runtime': 350.8208, 'train_samples_per_second': 8.551, 'train_steps_per_second': 1.069, 'total_flos': 499060462583808.0, 'train_loss': 3.8194332682291665, 'epoch': 3.0})

## Advanced usage
Train on completions only
- You can use the DataCollatorForCompletionOnlyLM to train your model on the generated prompts only.
- Note that this works only in the case when packing=False. To instantiate that collator for instruction data, pass a response template and the tokenizer. 
- Here is an example of how it would work to fine-tune distilgpt2 on completions only on the CodeAlpaca dataset:

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")
dataset

  from .autonotebook import tqdm as notebook_tqdm
2023-12-28 03:53:31.696814: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-28 03:53:31.717465: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-28 03:53:31.717486: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-28 03:53:31.718260: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-28 03:53:31.7

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 20022
})

In [6]:
model = AutoModelForCausalLM.from_pretrained("distilgpt2", device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

In [7]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

#check instruction-prompt
formatting_prompts_func(dataset[:2])

['### Question: Create a function that takes a specific input and produces a specific output using any mathematical operators. Write corresponding code in Python.\n ### Answer: def f(x):\n    """\n    Takes a specific input and produces a specific output using any mathematical operators\n    """\n    return x**2 + 3*x',
 "### Question: Generate a unique 8 character string that contains a lowercase letter, an uppercase letter, a numerical digit, and a special character. Write corresponding code in Python.\n ### Answer: import string\nimport random\n\ndef random_password_string():\n    characters = string.ascii_letters + string.digits + string.punctuation\n    password = ''.join(random.sample(characters, 8))\n    return password\n\nif __name__ == '__main__':\n    print(random_password_string())"]

In [8]:
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(1000)),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train() 

Loading cached processed dataset at /home/todsavadt/.cache/huggingface/datasets/lucasmccabe-lmi___parquet/lucasmccabe-lmi--CodeAlpaca-20k-b92d1194a2c963a0/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-40d1763a83a210c6.arrow
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=375, training_loss=1.802888671875, metrics={'train_runtime': 165.0694, 'train_samples_per_second': 18.174, 'train_steps_per_second': 2.272, 'total_flos': 211711609405440.0, 'train_loss': 1.802888671875, 'epoch': 3.0})

## Instruction-Tuning

### Format your input prompts
For instruction fine-tuning, it is quite common to have two columns inside the dataset: one for the prompt & the other for the response.

This allows people to format examples like Stanford-Alpaca did as follows:

In [13]:
test = '''
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
'''

In [14]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("HuggingFaceH4/instruction-dataset")
dataset = dataset.remove_columns("meta")
dataset

Found cached dataset json (/home/todsavadt/.cache/huggingface/datasets/HuggingFaceH4___json/HuggingFaceH4--instruction-dataset-4371c4e593217484/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|██████████| 1/1 [00:00<00:00, 1224.61it/s]


DatasetDict({
    test: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 327
    })
})

In [20]:
def format_instruction(sample):
	return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['prompt']}

### Response:
{sample['completion']}
""".strip()

format_instruction(dataset['test'][0])

'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nArianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?\n\n### Response:\nDenote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24'

In [22]:
model = AutoModelForCausalLM.from_pretrained("distilgpt2", device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

In [23]:
trainer = SFTTrainer(
    model,
    train_dataset=dataset['test'],
    tokenizer=tokenizer,
    max_seq_length=1024,
    formatting_func=formatting_prompts_func,
)

trainer.train() 

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=123, training_loss=3.2161634491711126, metrics={'train_runtime': 61.2649, 'train_samples_per_second': 16.012, 'train_steps_per_second': 2.008, 'total_flos': 87600245981184.0, 'train_loss': 3.2161634491711126, 'epoch': 3.0})