# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2

In [2]:
import transformers
transformers.__version__

'4.36.2'

In [3]:
import trl
trl.__version__

'0.7.4'

In [4]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## Task 1

In [5]:
# Load the Alpaca dataset
import json

# The file holds a JSON array, so json.load returns a list of dicts;
# the with-block closes the file automatically
with open('alpaca_data.json') as f:
    data = json.load(f)

In [6]:
# data

In [7]:
len(data)

52002

In [8]:
# Convert the JSON records into a datasets.Dataset
from datasets import Dataset

# Extract instructions, input, and outputs
instructions = [entry['instruction'] for entry in data]
inputs = [entry['input'] for entry in data]
outputs = [entry['output'] for entry in data]

# Create a Dataset
dataset = Dataset.from_dict({'instruction': instructions, 'input': inputs, 'output': outputs})

# Print dataset info
print(dataset)

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 52002
})


In [9]:
dataset[0]

{'instruction': 'Give three tips for staying healthy.',
 'input': '',
 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.'}

In [10]:
# Define a function that builds the instruction prompt for each example
import textwrap

def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        # If the example has no input, omit the input section from the prompt
        if len(example['input'][i])==0:
            text = f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {example['instruction'][i]}
            
            ### Response: 
            {example['output'][i]}
        
            """
        # If the example has an input, add it to the prompt
        else:
            text = f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {example['instruction'][i]}
            
            ### Input: 
            {example['input'][i]}
            
            ### Response: 
            {example['output'][i]}
         
            """
        text = ' \n '.join(line.strip() for line in text.split(' \n'))
        output_texts.append(textwrap.dedent(text).strip())
    return output_texts

# Check the formatted prompts for the first 6 examples
formatting_prompts_func(dataset[:6])


['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. \n ### Instruction: \n Give three tips for staying healthy. \n ### Response: \n 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n 2. Exercise regularly to keep your body active and strong. \n 3. Get enough sleep and maintain a consistent sleep schedule.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. \n ### Instruction: \n What are the three primary colors? \n ### Response: \n The three primary colors are red, blue, and yellow.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. \n ### Instruction: \n Describe the structure of an atom. \n ### Response: \n An atom is made up of
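
The whitespace handling above can be hard to follow. As a minimal sketch (not the notebook's code), the same Alpaca-style prompt can be assembled from a list of lines, which avoids the indentation clean-up. The helper names `build_prompt` and `formatting_func` are hypothetical, and this version joins fields with a plain `\n` rather than the ` \n ` separators shown in the output above.

```python
def build_prompt(instruction, input_text, output):
    """Assemble one Alpaca-style training prompt (hypothetical helper)."""
    header = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. Write a response that appropriately "
        "completes the request."
    )
    parts = [header, "### Instruction:", instruction]
    # Only add the input section when the example actually has an input
    if input_text:
        parts += ["### Input:", input_text]
    parts += ["### Response:", output]
    return "\n".join(parts)


def formatting_func(batch):
    """Batched variant that takes the same column-wise input as formatting_prompts_func."""
    return [
        build_prompt(i, x, o)
        for i, x, o in zip(batch["instruction"], batch["input"], batch["output"])
    ]


example = {
    "instruction": ["What are the three primary colors?"],
    "input": [""],
    "output": ["The three primary colors are red, blue, and yellow."],
}
print(formatting_func(example)[0])
```

Building the prompt from a list keeps the optional input section in one place, so the two branches cannot drift apart.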

## Task 2

### Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the loss only on the completion (response) tokens, not on the prompt.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
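
To illustrate what training on completions only means, here is a simplified pure-Python sketch of the masking idea: every label up to and including the response template is set to `-100`, the index that PyTorch's cross-entropy loss ignores. This is not TRL's implementation; the function name `mask_prompt_labels` and the toy token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy loss


def mask_prompt_labels(input_ids, template_ids):
    """Return labels where everything up to the end of the response template is ignored.

    Hypothetical helper: a simplified illustration, not TRL's actual code.
    """
    n = len(template_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == template_ids:
            response_start = start + n
            return [IGNORE_INDEX] * response_start + input_ids[response_start:]
    # Template not found: ignore the whole example
    return [IGNORE_INDEX] * len(input_ids)


# Toy ids: 7, 8 stand for the tokens of the response template
input_ids = [1, 2, 3, 7, 8, 40, 41, 42]
labels = mask_prompt_labels(input_ids, template_ids=[7, 8])
print(labels)  # [-100, -100, -100, -100, -100, 40, 41, 42]
```

Only the last three positions contribute to the loss, so the model is never trained to reproduce the instruction text itself.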

In [12]:
# Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token


In [13]:
# Create a DataCollatorForCompletionOnlyLM so that the loss is computed on the response tokens only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = " ### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [14]:
# Define the Trainer
from transformers import TrainingArguments
training_args = TrainingArguments(
    num_train_epochs=5,  # default is 3
    output_dir='tmp_trainer'
)

trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(10000)),  # train on the first 10,000 samples to save time
    formatting_func=formatting_prompts_func,
    # NOTE: the collator defined above is never passed to the trainer, so this run
    # computes the loss on the full sequence. Add data_collator=collator to train
    # on completions only.
    max_seq_length=256,
    args=training_args
)

trainer.train()


Map: 100%|██████████| 10000/10000 [00:01<00:00, 8041.12 examples/s]
  0%|          | 0/6250 [00:00<?, ?it/s]You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  8%|▊         | 500/6250 [21:51<3:24:34,  2.13s/it]Checkpoint destination directory tmp_trainer\checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.


{'loss': 1.8165, 'learning_rate': 4.600000000000001e-05, 'epoch': 0.4}


 16%|█▌        | 1000/6250 [42:15<3:13:54,  2.22s/it]

{'loss': 1.6838, 'learning_rate': 4.2e-05, 'epoch': 0.8}


 24%|██▍       | 1500/6250 [1:03:38<2:45:35,  2.09s/it]

{'loss': 1.5929, 'learning_rate': 3.8e-05, 'epoch': 1.2}


 32%|███▏      | 2000/6250 [1:23:53<2:48:38,  2.38s/it]

{'loss': 1.5346, 'learning_rate': 3.4000000000000007e-05, 'epoch': 1.6}


 40%|████      | 2500/6250 [1:44:21<2:13:27,  2.14s/it]

{'loss': 1.5295, 'learning_rate': 3e-05, 'epoch': 2.0}


 48%|████▊     | 3000/6250 [2:04:42<2:11:16,  2.42s/it]

{'loss': 1.4388, 'learning_rate': 2.6000000000000002e-05, 'epoch': 2.4}


 56%|█████▌    | 3500/6250 [2:24:50<1:43:45,  2.26s/it]

{'loss': 1.4476, 'learning_rate': 2.2000000000000003e-05, 'epoch': 2.8}


 64%|██████▍   | 4000/6250 [2:44:43<1:41:31,  2.71s/it]

{'loss': 1.4069, 'learning_rate': 1.8e-05, 'epoch': 3.2}


 72%|███████▏  | 4500/6250 [3:05:04<1:01:50,  2.12s/it]

{'loss': 1.3898, 'learning_rate': 1.4000000000000001e-05, 'epoch': 3.6}


 80%|████████  | 5000/6250 [3:25:19<51:41,  2.48s/it]  

{'loss': 1.3907, 'learning_rate': 1e-05, 'epoch': 4.0}


 88%|████████▊ | 5500/6250 [3:45:30<32:38,  2.61s/it]

{'loss': 1.3506, 'learning_rate': 6e-06, 'epoch': 4.4}


 96%|█████████▌| 6000/6250 [4:05:38<09:07,  2.19s/it]

{'loss': 1.3481, 'learning_rate': 2.0000000000000003e-06, 'epoch': 4.8}


100%|██████████| 6250/6250 [4:15:39<00:00,  2.45s/it]

{'train_runtime': 15339.347, 'train_samples_per_second': 3.26, 'train_steps_per_second': 0.407, 'train_loss': 1.48797197265625, 'epoch': 5.0}





TrainOutput(global_step=6250, training_loss=1.48797197265625, metrics={'train_runtime': 15339.347, 'train_samples_per_second': 3.26, 'train_steps_per_second': 0.407, 'train_loss': 1.48797197265625, 'epoch': 5.0})
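
The numbers in the log can be cross-checked with some arithmetic. The sketch below assumes the `TrainingArguments` defaults that this run appears to use: a per-device batch size of 8 on a single device, a learning rate of 5e-5, and a linear decay schedule with no warmup.

```python
import math

num_samples = 10_000
batch_size = 8          # assumed: TrainingArguments default per_device_train_batch_size
num_epochs = 5
base_lr = 5e-5          # assumed: TrainingArguments default learning_rate

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 1250 6250


def linear_lr(step):
    """Learning rate under linear decay to zero with no warmup (assumed schedule)."""
    return base_lr * (total_steps - step) / total_steps


# Logging happens every 500 steps; compare with the first logged line above
print(500 / steps_per_epoch)   # 0.4 (epoch)
print(linear_lr(500))          # about 4.6e-05

# Throughput: samples processed divided by the reported runtime in seconds
train_runtime = 15339.347
print(round(num_samples * num_epochs / train_runtime, 2))  # 3.26
```

Under these assumptions the step count, the epoch fractions, the logged learning rates, and `train_samples_per_second` all agree with the output above.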

In [15]:
# Save the trained model
trainer.save_model("./saved_model")

## Task 3

In [2]:
from datasets import load_dataset

# Load alpaca_eval dataset 
eval_dataset = load_dataset("tatsu-lab/alpaca_eval")
eval_dataset


DatasetDict({
    eval: Dataset({
        features: ['instruction', 'output', 'generator', 'dataset'],
        num_rows: 805
    })
})

In [3]:
# eval_dataset['eval'][10]['instruction']

In [94]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Generate a completion for a prompt with the fine-tuned model
def generate_output(text):
    model_name_or_path = "./saved_model"
    device = "cpu"

    # Note: the model and tokenizer are reloaded from disk on every call
    saved_model = AutoModelForCausalLM.from_pretrained(model_name_or_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    input_ids = tokenizer.encode(text, return_tensors="pt")

    # Generate text with the model; max_length counts the prompt tokens
    # as well as the newly generated ones
    output_ids = saved_model.generate(input_ids, max_length=150)

    # Decode the generated output
    generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    print()
    print(generated_text)
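
One possible variation on the function above, offered as a sketch rather than a tested replacement: build the prompt in the layout that the training output shows (fields joined by ` \n `), end it with the `### Response:` marker, pass the attention mask explicitly, and bound the output with `max_new_tokens` so the prompt length does not eat into the budget. The names `build_inference_prompt` and `generate_response` are hypothetical, and the model is assumed to live in `./saved_model` as saved earlier.

```python
HEADER = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request."
)


def build_inference_prompt(instruction):
    """Prompt in the training layout, ending at the response marker (hypothetical helper)."""
    return " \n ".join([HEADER, "### Instruction:", instruction, "### Response:"])


def generate_response(instruction, model_dir="./saved_model", max_new_tokens=100):
    """Generate only the response text for one instruction (sketch, assumes a saved model)."""
    # Imported here so the prompt helper above works without transformers installed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    # Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(build_inference_prompt(instruction), return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,        # budget for new tokens only
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    # Drop the prompt tokens and decode just the continuation
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


print(build_inference_prompt("How do I detail a car?"))
```

Whether this changes the quality of the generations is not something the notebook measures; the sketch only removes the mismatch between training and inference prompts.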

Example 1

In [95]:
generate_output(f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {eval_dataset['eval'][50]['instruction']}""")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            What year was the Yamato Battleship built? 
                    ### Response: 
 The Yamato Battleship was built in the year 1789. It was built in the year 1789. It was built in the year 1789. It was built in the


In [96]:
eval_dataset['eval'][50]['output']

'The Yamato Battleship was built in 1941.'

Example 2

In [97]:
generate_output(f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {eval_dataset['eval'][380]['instruction']}""")





            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            How does metabolism work? 
               ### Response: 
 Metabolism is the process by which metabolism works. It involves the process by which the body releases energy and stores energy. It is the process by which the body releases energy and stores energy. It is the process by


In [101]:
eval_dataset['eval'][380]['output']

'Metabolism is the process by which the body converts food into energy. It involves a series of chemical reactions that break down carbohydrates, fats, and proteins in the food and convert them into energy that the cells can use. The energy is then used for activities such as growth, repair, and movement.'

Example 3

In [107]:
generate_output(f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {eval_dataset['eval'][250]['instruction']}""")





            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            Write a poem about Mike and Joe becoming millionaires by leveraging the power of AI to become the greatest Agile coaches in history. Include content from the agile manifesto. 
                  ### Response: 
              


In [108]:
eval_dataset['eval'][250]['output']

'Mike and Joe, two ambitious men\nSet out to become millionaires in the end\nBut they knew they couldn’t do it alone\nSo they used the power of AI to reach their goal\n\nThey decided to become the greatest Agile coaches\nAnd help people find success in their approaches\nTheir aim was to foster collaboration\nAnd value individuals and interactions\n\nWith the help of AI, their dreams came to life\nAnd soon they were millionaires, with no strife\nThey embraced customer collaboration\nAnd delivered working software with iteration\n\nTheir focus was on responding to change\nBy continuously improving their range\nThey kept their systems simple and easy to use\nSo everyone was able to benefit from the news\n\nMike and Joe had become quite renowned\nFor their success as Agile coaches and their wealth had abounded\nTheir story is still told, even today\nOf how two men leveraged AI to make their way'

Example 4

In [109]:
generate_output(f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {eval_dataset['eval'][10]['instruction']}""")





            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            do you think retinoid is effective on removing the acne? because I have a lot of it. 
                                               


In [110]:
eval_dataset['eval'][10]['output']

'Yes, retinoids are effective in treating acne. They work by increasing cell turnover, which helps to reduce the appearance of existing acne and prevent new outbreaks. Retinoids also help to unclog pores, which in turn reduces the amount of bacteria that can cause infections. In general, retinoids help to reduce inflammation and oil production, making them a great option for those with acne.'

Example 5

In [111]:
generate_output(f"""
            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            {eval_dataset['eval'][30]['instruction']}""")





            Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
            
            ### Instruction: 
            How do I detail a car? 
                   ### Response: 
                                      


In [112]:
eval_dataset['eval'][30]['output']

'1. Gather the necessary materials for detailing a car, such as a vacuum cleaner, microfiber towels, car wash soap, soft brush, glass cleaner, upholstery cleaner, interior wipes, wax and tire dressing.\n\n2. Start by vacuuming the interior of the car to remove dirt, debris, and other particles.\n\n3. Use a soft brush to dust the surfaces and clean the vents.\n\n4. Use the car wash soap and a microfiber towel to clean the exterior of the car.\n\n5. Apply glass cleaner to the windows and use a microfiber towel to clean them.\n\n6. Apply upholstery cleaner to the seats and use a microfiber towel to clean them.\n\n7. Use interior wipes to clean the dashboard, center console, and other interior surfaces.\n\n8. Apply wax to the exterior of the car to protect it from the elements.\n\n9. Apply tire dressing to the tires to give them a glossy finish.\n\n10. Clean the rims and tires to finish the detailing process.'

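Comparing the 5 examples with the gold labels (the `output` field of the dataset) shows that the model does not perform well. In the first 2 examples, the model generates a response that relates to the question, but the information is incorrect and the text repeats itself. In the last 3 examples, the model does not generate a response at all. This may be because the model was trained on a small sample of the dataset for a small number of epochs. Two other factors visible in the code probably contribute: the evaluation prompts keep their indentation and omit the `### Response:` marker, so they do not match the training format, and `max_length=150` counts the prompt tokens, which leaves little or no room for the response when the prompt is long.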
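
The comparison above is done by eye. As a rough quantitative complement, the sketch below scores a generated response against the gold output with a token-overlap F1. This metric is not used in the notebook; the function name `token_f1` and the whitespace tokenization are assumptions chosen for simplicity, and the score measures word overlap, not factual correctness.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (hypothetical helper, whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


gold = "The Yamato Battleship was built in 1941."
generated = "The Yamato Battleship was built in the year 1789."
print(round(token_f1(generated, gold), 2))  # 0.75
```

The first example scores 0.75 even though the year is wrong, which shows why an overlap score like this can only supplement a manual check of the content.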