# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2

In [1]:
import transformers
transformers.__version__


'4.36.2'

In [2]:
import trl
trl.__version__






'0.7.4'

In [3]:
import os
import torch
# Set GPU device
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# os.environ['http_proxy']  = 'http://192.41.170.23:3128'
# os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## Instruction-Tuning
Train on completions only
- Use `DataCollatorForCompletionOnlyLM` to compute the training loss only on the completion (the text after the response template), not on the prompt.
- This works only when `packing=False`.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
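
To make the idea of training on completions only concrete, here is a minimal pure-Python sketch of the label masking involved. It is an illustration under assumptions, not TRL's implementation: the helper names and the toy token ids are hypothetical. The one fixed fact it relies on is that `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels, masking everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: this sketch masks the whole example so it adds no loss
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Toy token ids (hypothetical): prompt = [11, 12, 13], template = [7, 8], answer = [21, 22]
input_ids = [11, 12, 13, 7, 8, 21, 22]
labels = mask_prompt_labels(input_ids, [7, 8])
print(labels)  # [-100, -100, -100, -100, -100, 21, 22]
```

Only the answer tokens keep their labels, so only they contribute to the loss.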

In [19]:
# Step 1: Load the dataset
from datasets import load_dataset
# dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

file_path = 'data/alpaca_data.json'
dataset = load_dataset('json', data_files=file_path)['train']
dataset

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 52002
})

In [20]:
dataset[20000]

{'instruction': 'Name the given musical note.',
 'input': '(A musical note)',
 'output': 'The musical note is an F sharp.'}

In [21]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

In [22]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

#check instruction-prompt
formatting_prompts_func(dataset[:2])

['### Question: Give three tips for staying healthy.\n ### Answer: 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 '### Question: What are the three primary colors?\n ### Answer: The three primary colors are red, blue, and yellow.']
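
The formatting function above uses only the `instruction` and `output` columns and drops `input`. If you want the extra context in the prompt, one possible variant is sketched below. The function name and the way the context is appended to the question are assumptions for illustration; the response template stays the same so the collator can still find it.

```python
def formatting_prompts_with_input(example):
    """Batched formatter that appends the optional 'input' column to the question."""
    output_texts = []
    for instruction, context, output in zip(
        example["instruction"], example["input"], example["output"]
    ):
        # Only add the context line when the 'input' field is non-empty
        question = f"{instruction}\n{context}" if context else instruction
        output_texts.append(f"### Question: {question}\n ### Answer: {output}")
    return output_texts


# Toy batch in the same column layout as the dataset
batch = {
    "instruction": ["Name the given musical note.", "What are the three primary colors?"],
    "input": ["(A musical note)", ""],
    "output": [
        "The musical note is an F sharp.",
        "The three primary colors are red, blue, and yellow.",
    ],
}
texts = formatting_prompts_with_input(batch)
print(texts[0])
```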

In [23]:
# Use DataCollatorForCompletionOnlyLM so the loss is computed on the completions only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [25]:
from datasets import load_dataset

dataset_eval = load_dataset("tatsu-lab/alpaca_eval")['eval']
dataset_eval



Dataset({
    features: ['instruction', 'output', 'generator', 'dataset'],
    num_rows: 805
})

In [26]:
# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(1000)),
    eval_dataset=dataset_eval,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train() 

Map: 100%|██████████| 1000/1000 [00:00<00:00, 6449.13 examples/s]
Map: 100%|██████████| 805/805 [00:00<00:00, 5633.55 examples/s]
100%|██████████| 375/375 [43:38<00:00,  6.98s/it]

{'train_runtime': 2618.3556, 'train_samples_per_second': 1.146, 'train_steps_per_second': 0.143, 'train_loss': 2.6033253580729165, 'epoch': 3.0}





TrainOutput(global_step=375, training_loss=2.6033253580729165, metrics={'train_runtime': 2618.3556, 'train_samples_per_second': 1.146, 'train_steps_per_second': 0.143, 'train_loss': 2.6033253580729165, 'epoch': 3.0})
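
The 375 steps in the progress bar follow from the `TrainingArguments` defaults, since no arguments were passed to the trainer. The sketch below reproduces the arithmetic, assuming a single device, the default `per_device_train_batch_size=8`, `num_train_epochs=3`, and no gradient accumulation. The helper name is hypothetical.

```python
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs, grad_accum=1):
    """Estimate the number of optimizer steps for a single-device run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * num_epochs


# 1000 training examples, batch size 8, 3 epochs
steps = total_optimizer_steps(num_examples=1000, batch_size=8, num_epochs=3)
print(steps)  # 375
```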

In [27]:
trainer.evaluate()

100%|██████████| 101/101 [03:48<00:00,  2.27s/it]


{'eval_loss': 2.3399505615234375,
 'eval_runtime': 229.7263,
 'eval_samples_per_second': 3.504,
 'eval_steps_per_second': 0.44,
 'epoch': 3.0}
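
The evaluation loss is easier to interpret as a perplexity. The sketch below assumes `eval_loss` is the mean per-token cross-entropy in nats, so perplexity is its exponential. It also checks the 101 evaluation steps against the default `per_device_eval_batch_size=8` on a single device.

```python
import math

eval_loss = 2.3399505615234375  # value reported by trainer.evaluate() above
num_eval_examples = 805
eval_batch_size = 8  # assumed default per_device_eval_batch_size

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)

# The last batch is partial, hence the ceiling
eval_steps = math.ceil(num_eval_examples / eval_batch_size)

print(f"perplexity: {perplexity:.2f}")
print(f"eval steps: {eval_steps}")
```

Because the completion-only collator is also used for evaluation, this perplexity is measured on the answer tokens only.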

In [28]:
dataset_eval[0]

{'instruction': 'What are the names of some famous actors that started their careers on Broadway?',
 'output': 'Some famous actors that started their careers on Broadway include: \n1. Hugh Jackman \n2. Meryl Streep \n3. Denzel Washington \n4. Julia Roberts \n5. Christopher Walken \n6. Anthony Rapp \n7. Audra McDonald \n8. Nathan Lane \n9. Sarah Jessica Parker \n10. Lin-Manuel Miranda',
 'generator': 'text_davinci_003',
 'dataset': 'helpful_base'}

In [29]:
# Encode the input text; calling the tokenizer returns input_ids and attention_mask
input_text = dataset_eval[0]['instruction']
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate with beam search. top_k, top_p, and temperature are omitted because
# they only take effect when do_sample=True.
output = model.generate(
    **inputs,
    max_length=256,
    num_beams=5,
    no_repeat_ngram_size=2,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated text:\n", generated_text)

Generated text:
 What are the names of some famous actors that started their careers on Broadway? Some of them have been known for their roles in popular culture such as "The Sopranos" and "Downton Abbey," and some have also been associated with the entertainment industry. Some are also considered to be the most successful actors of all time, while others are often overlooked due to the fact that they are still considered one of the best actors in the world. Additionally, some of these actors have had success in other genres, including acting, directing, and directing. These are a few examples of actors who have made a name for themselves on the Broadway stage, as well as their ability to create a lasting impact on society. Finally, there are some notable names that have helped to make the industry a more diverse and diverse place.

1. John C. Reilly
2. J.J. Abrams
3. George Clooney
4. Mark Wahlberg
5. Robert De Niro
6. Paul Bettany
7. Tom Hanks
8. Leonardo DiCaprio
9. David Fincher
10
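
The generation above feeds the raw instruction to the model, while training used the `### Question: ... ### Answer:` layout. A small pair of helpers, sketched below, keeps inference prompts in the training format and strips the prompt from the decoded text. The helper names are hypothetical, and whether this improves the output for this model is untested here.

```python
RESPONSE_TEMPLATE = " ### Answer:"  # same template passed to the collator


def build_prompt(instruction):
    """Format an instruction the same way the training examples were formatted."""
    return f"### Question: {instruction}\n{RESPONSE_TEMPLATE}"


def extract_answer(generated_text):
    """Return the text after the response template, or the whole text if it is absent."""
    _, found, answer = generated_text.partition(RESPONSE_TEMPLATE)
    return answer.strip() if found else generated_text.strip()


prompt = build_prompt("How did US states get their names?")
print(repr(prompt))

# Simulated decoded output: the prompt followed by a made-up completion
decoded = prompt + " From many sources."
print(extract_answer(decoded))  # From many sources.
```

In the notebook, `build_prompt(...)` would replace `input_text`, and `extract_answer(...)` would be applied to `generated_text`.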

In [30]:
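import os
# Make sure the target directory exists, then save only the model weights
os.makedirs('app/model', exist_ok=True)
torch.save(model.state_dict(), 'app/model/sft_alpaca.pt')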
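
Since only the `state_dict` is saved, reloading requires rebuilding the base architecture first. The function below is a sketch of one way to do that, assuming the same `distilgpt2` base model and the path used above. The function name is hypothetical, and the imports sit inside it so the heavy libraries load only when it is called.

```python
def load_finetuned_model(weights_path="app/model/sft_alpaca.pt", base_model="distilgpt2"):
    """Rebuild the base model and load the fine-tuned weights saved with torch.save."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Recreate the architecture, then overwrite its weights with the saved ones
    model = AutoModelForCausalLM.from_pretrained(base_model)
    state_dict = torch.load(weights_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # disable dropout for inference

    # The tokenizer was not modified beyond the pad token, so reload it from the base model
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token
    return model, tokenizer
```

An alternative is `trainer.save_model(...)`, which writes the weights together with the configuration so the result can be loaded directly with `from_pretrained`.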

In [31]:
dataset_eval[1]

{'instruction': 'How did US states get their names?',
 'output': 'US states get their names from a variety of sources, including Native American tribes, Spanish explorers, British colonists, and even presidents. For example, the state of Alabama was named after the Native American tribe that lived in the area, while the state of Florida gets its name from the Spanish explorer, Ponce de Leon, who explored the area in the 1500s. Other states are named after English kings (like Virginia, named after England\'s "Virgin Queen," Queen Elizabeth I) or presidents (like Washington, named after George Washington).',
 'generator': 'text_davinci_003',
 'dataset': 'helpful_base'}