# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (or SFT for short) is a crucial step in RLHF. In TRL we provide an easy-to-use API to create your SFT models and train them with few lines of code on your dataset.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [4]:
!pip3 install peft==0.7.1
!pip3 install trl==0.7.4
!pip3 install transformer==4.36.2

[31mERROR: Could not find a version that satisfies the requirement transformer==4.36.2 (from versions: none)[0m[31m
[0m[31mERROR: No matching distribution found for transformer==4.36.2[0m[31m
[0m

In [5]:
import transformers
transformers.__version__

'4.39.2'

In [3]:
import trl
trl.__version__

ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/__init__.py)

In [None]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

#os.environ['http_proxy']  = 'http://192.41.170.23:3128'
#os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## Basic SFT

In [None]:
# Step 1: Load the dataset
from datasets import Dataset

dataset_path = 'alpaca_data.json'
# Create a Dataset object
dataset = Dataset.from_json(dataset_path)
dataset

Generating train split: 0 examples [00:00, ? examples/s]

Dataset({
    features: ['instruction', 'output', 'input'],
    num_rows: 52002
})

In [None]:
dataset[0]

{'instruction': 'Give three tips for staying healthy.',
 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'input': ''}

In [None]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, 
    device_map = 'auto'
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

# Make sure to pass a correct value for max_seq_length as the default value will be set to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

config.json:   0%|          | 0.00/762 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/353M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

1024

In [None]:
# Step 3: Define the Trainer
from transformers import TrainingArguments
from trl import SFTTrainer
training_args = TrainingArguments(
    output_dir = 'tmp_trainer', #default = 'tmp_trainer'
    num_train_epochs=5, #default = 3
)

trainer = SFTTrainer(
    model = model,
    args = training_args,
    train_dataset = dataset.select(range(1000)),
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
)

ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/transformers/__init__.py)

In [None]:
trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
500,3.5807


TrainOutput(global_step=625, training_loss=3.558915576171875, metrics={'train_runtime': 163.9405, 'train_samples_per_second': 30.499, 'train_steps_per_second': 3.812, 'total_flos': 837709216874496.0, 'train_loss': 3.558915576171875, 'epoch': 5.0})

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to train your model on the generated prompts only.
- Note that this works only in the case when packing=False.
- To instantiate that collator for instruction data, pass a response template and the tokenizer. 

In [None]:
# Step 1: Load the dataset
from datasets import load_dataset
dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")
dataset

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 20022
})

In [None]:
dataset[20000]

{'instruction': 'Design an algorithm for finding the nth node from the tail of a linked list.',
 'input': '',
 'output': '"""\ndef nthFromLast(head, n): \n    # Initialize slow and fast pointers \n    slow  = head \n    fast = head \n  \n    # Move fast pointer n-1 times \n    while (n > 0): \n        if (fast == None): \n            return None\n  \n        fast = fast.next\n        n = n - 1\n  \n    # Move both slow and fast pointer together \n    while (fast.next != None): \n        slow = slow.next\n        fast = fast.next\n  \n    return slow\n"""'}

In [None]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(
  _torch_pytree._register_pytree_node(


In [None]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

#check instruction-prompt
formatting_prompts_func(dataset[:2])

['### Question: Create a function that takes a specific input and produces a specific output using any mathematical operators. Write corresponding code in Python.\n ### Answer: def f(x):\n    """\n    Takes a specific input and produces a specific output using any mathematical operators\n    """\n    return x**2 + 3*x',
 "### Question: Generate a unique 8 character string that contains a lowercase letter, an uppercase letter, a numerical digit, and a special character. Write corresponding code in Python.\n ### Answer: import string\nimport random\n\ndef random_password_string():\n    characters = string.ascii_letters + string.digits + string.punctuation\n    password = ''.join(random.sample(characters, 8))\n    return password\n\nif __name__ == '__main__':\n    print(random_password_string())"]

In [None]:
# use the DataCollatorForCompletionOnlyLM to train your model on the generated prompts only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator



DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [None]:
# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(1000)),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train() 

Loading cached processed dataset at /home/todsavadt/.cache/huggingface/datasets/lucasmccabe-lmi___parquet/lucasmccabe-lmi--CodeAlpaca-20k-b92d1194a2c963a0/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec/cache-40d1763a83a210c6.arrow
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=375, training_loss=1.802888671875, metrics={'train_runtime': 165.0694, 'train_samples_per_second': 18.174, 'train_steps_per_second': 2.272, 'total_flos': 211711609405440.0, 'train_loss': 1.802888671875, 'epoch': 3.0})

### Standard-Alpaca : Format your input prompts
For instruction fine-tuning, it is quite common to have two columns inside the dataset: one for the prompt & the other for the response.

This allows people to format examples like Stanford-Alpaca did as follows:

In [None]:
test = '''
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
'''

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("HuggingFaceH4/instruction-dataset")
dataset = dataset.remove_columns("meta")
dataset

Downloading readme:   0%|          | 0.00/199 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/461k [00:00<?, ?B/s]

Generating test split: 0 examples [00:00, ? examples/s]

DatasetDict({
    test: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 327
    })
})

In [None]:
def format_instruction(sample):
	return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['prompt']}

### Response:
{sample['completion']}
""".strip()

format_instruction(dataset['test'][0])

'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nArianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?\n\n### Response:\nDenote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24'

In [None]:
model = AutoModelForCausalLM.from_pretrained("distilgpt2", device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

In [None]:
trainer = SFTTrainer(
    model,
    train_dataset=dataset['test'],
    tokenizer=tokenizer,
    max_seq_length=1024,
    formatting_func=format_instruction,
)

trainer.train() 

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss


TrainOutput(global_step=123, training_loss=3.2161634491711126, metrics={'train_runtime': 61.2649, 'train_samples_per_second': 16.012, 'train_steps_per_second': 2.008, 'total_flos': 87600245981184.0, 'train_loss': 3.2161634491711126, 'epoch': 3.0})