# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your own dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.8.1
# !pip3 install transformers==4.36.2

In [2]:
import transformers
transformers.__version__

'4.36.2'

In [3]:
import trl
trl.__version__

'0.8.1'

In [4]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the loss on the completion (response) tokens only; the prompt tokens are ignored.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
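
To make the idea of completion-only training concrete, here is a minimal sketch of the label masking involved. It is a simplified illustration in plain Python, not TRL's implementation: the helper names `find_subsequence` and `mask_prompt_tokens` and the toy token ids are hypothetical. The sketch assumes that every token up to and including the response template gets the label -100, which PyTorch's cross-entropy loss ignores by default.

```python
from typing import List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def find_subsequence(sequence: List[int], pattern: List[int]) -> int:
    """Return the start index of the first occurrence of `pattern`, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_tokens(input_ids: List[int], response_template_ids: List[int]) -> List[int]:
    """Copy input_ids into labels, ignoring everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Assumption of this sketch: if the template is missing, ignore the whole example.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Toy token ids (hypothetical): three prompt tokens, template (7, 8), two response tokens
toy_input_ids = [11, 12, 13, 7, 8, 21, 22]
toy_labels = mask_prompt_tokens(toy_input_ids, [7, 8])
print(toy_labels)  # [-100, -100, -100, -100, -100, 21, 22]
```

Only the two response tokens keep their labels, so only they contribute to the loss. This also hints at why packing has to be disabled: the mask is computed per example, relative to where the template occurs in that example.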

In [5]:
# Step 1: Load the dataset
from datasets import load_dataset
dataset_train = load_dataset('json', data_files='data/alpaca_data.json', split='train')
dataset_train

Dataset({
    features: ['input', 'instruction', 'output'],
    num_rows: 52002
})

In [6]:
dataset_train[20000]

{'input': '(A musical note)',
 'instruction': 'Name the given musical note.',
 'output': 'The musical note is an F sharp.'}

In [7]:
dataset_eval = load_dataset("tatsu-lab/alpaca_eval", split='eval', trust_remote_code=True)
dataset_eval = dataset_eval.remove_columns(["generator", "dataset"])
dataset_eval

Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [8]:
dataset_eval[200]

{'instruction': 'what are five important topics for game design',
 'output': '1. Storytelling\n2. Player Mechanics\n3. Art Direction\n4. Level Design\n5. User Interface Design'}

In [9]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "distilgpt2"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

tokenizer.pad_token = tokenizer.eos_token

# Set max_seq_length explicitly; if it is omitted, the default is min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

In [10]:
dataset_eval[0].keys()

dict_keys(['instruction', 'output'])

In [11]:
dataset_train[:2]

{'input': ['', ''],
 'instruction': ['Give three tips for staying healthy.',
  'What are the three primary colors?'],
 'output': ['1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
  'The three primary colors are red, blue, and yellow.']}

### Stanford-Alpaca: Format your input prompts
For instruction fine-tuning, it is quite common to have two columns inside the dataset: one for the prompt and the other for the response.

Each example can then be formatted into a single prompt string, as Stanford-Alpaca did:

In [12]:
def formatting_prompts_func(examples):
	output_texts = []

	for i in range(len(examples['instruction'])):
		instruction = examples["instruction"][i]
		input_text = examples["input"][i] if 'input' in examples.keys() else ""
		response = examples["output"][i]
	
		if input_text.strip():  # use the "with input" template whenever an input is present
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
{response}
""".strip()
			
		else:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
""".strip()

		output_texts.append(text)

	return output_texts

# Check the formatted prompts for the first two examples
formatting_prompts_func(dataset_train[:2])

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Response:\nThe three primary colors are red, blue, and yellow.']
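
Because completion-only training depends on the `### Response:` marker appearing in every formatted prompt, a quick sanity check before training can be useful. The sketch below is a hypothetical helper (`check_response_template` is not part of TRL) that works on plain strings. It is only a string-level check; a token-level match can still fail if the tokenizer splits the marker differently in context.

```python
response_template = "### Response:"  # the marker that appears in the prompts above


def check_response_template(texts, template=response_template):
    """Return the indices of texts where the template is missing or appears more than once."""
    return [i for i, text in enumerate(texts) if text.count(template) != 1]


# Small hand-written sample; in the notebook, the formatted prompts would be passed instead
sample_texts = [
    "### Instruction:\nWhat are the three primary colors?\n\n### Response:\nThe three primary colors are red, blue, and yellow.",
    "### Instruction:\nName the given musical note.",  # deliberately missing the marker
]
bad_indices = check_response_template(sample_texts)
print(bad_indices)  # [1]
```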

In [13]:
dataset_eval[0]

{'instruction': 'What are the names of some famous actors that started their careers on Broadway?',
 'output': 'Some famous actors that started their careers on Broadway include: \n1. Hugh Jackman \n2. Meryl Streep \n3. Denzel Washington \n4. Julia Roberts \n5. Christopher Walken \n6. Anthony Rapp \n7. Audra McDonald \n8. Nathan Lane \n9. Sarah Jessica Parker \n10. Lin-Manuel Miranda'}

In [14]:
# Use the DataCollatorForCompletionOnlyLM to compute the loss on the completions (responses) only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

In [20]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = './results', #default = 'tmp_trainer'
    save_strategy = 'epoch',
    evaluation_strategy = 'epoch',
    gradient_checkpointing = True,
    per_device_train_batch_size = 2,
    per_device_eval_batch_size = 2,
    num_train_epochs = 3, #default = 3
)

# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    args = training_args,
    train_dataset = dataset_train.select(range(10000)),
    eval_dataset = dataset_eval,
    formatting_func = formatting_prompts_func,
    data_collator = collator,
    max_seq_length = max_seq_length,
)

trainer.train()

{'loss': 2.5646, 'learning_rate': 4.8333333333333334e-05, 'epoch': 0.1}
{'loss': 2.5457, 'learning_rate': 4.666666666666667e-05, 'epoch': 0.2}
{'loss': 2.4884, 'learning_rate': 4.5e-05, 'epoch': 0.3}
{'loss': 2.46, 'learning_rate': 4.3333333333333334e-05, 'epoch': 0.4}
{'loss': 2.4181, 'learning_rate': 4.166666666666667e-05, 'epoch': 0.5}
{'loss': 2.4617, 'learning_rate': 4e-05, 'epoch': 0.6}
{'loss': 2.4227, 'learning_rate': 3.8333333333333334e-05, 'epoch': 0.7}
{'loss': 2.3986, 'learning_rate': 3.6666666666666666e-05, 'epoch': 0.8}
{'loss': 2.3853, 'learning_rate': 3.5e-05, 'epoch': 0.9}
{'loss': 2.4144, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.0}
{'eval_loss': 2.275275945663452, 'eval_runtime': 10.3966, 'eval_samples_per_second': 77.429, 'eval_steps_per_second': 38.763, 'epoch': 1.0}
{'loss': 2.0803, 'learning_rate': 3.1666666666666666e-05, 'epoch': 1.1}
{'loss': 2.1074, 'learning_rate': 3e-05, 'epoch': 1.2}
{'loss': 2.0902, 'learning_rate': 2.8333333333333335e-05, 'epoch': 1.3}
{'loss': 2.0769, 'learning_rate': 2.6666666666666667e-05, 'epoch': 1.4}
{'loss': 2.125, 'learning_rate': 2.5e-05, 'epoch': 1.5}
{'loss': 2.0503, 'learning_rate': 2.3333333333333336e-05, 'epoch': 1.6}
{'loss': 2.0957, 'learning_rate': 2.1666666666666667e-05, 'epoch': 1.7}
{'loss': 2.0874, 'learning_rate': 2e-05, 'epoch': 1.8}
{'loss': 2.0937, 'learning_rate': 1.8333333333333333e-05, 'epoch': 1.9}
{'loss': 2.0955, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
{'eval_loss': 2.2738211154937744, 'eval_runtime': 10.3595, 'eval_samples_per_second': 77.707, 'eval_steps_per_second': 38.902, 'epoch': 2.0}
{'loss': 1.9015, 'learning_rate': 1.5e-05, 'epoch': 2.1}
{'loss': 1.8978, 'learning_rate': 1.3333333333333333e-05, 'epoch': 2.2}
{'loss': 1.883, 'learning_rate': 1.1666666666666668e-05, 'epoch': 2.3}
{'loss': 1.8697, 'learning_rate': 1e-05, 'epoch': 2.4}
{'loss': 1.9237, 'learning_rate': 8.333333333333334e-06, 'epoch': 2.5}
{'loss': 1.9256, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.6}
{'loss': 1.8845, 'learning_rate': 5e-06, 'epoch': 2.7}
{'loss': 1.8336, 'learning_rate': 3.3333333333333333e-06, 'epoch': 2.8}
{'loss': 1.8837, 'learning_rate': 1.6666666666666667e-06, 'epoch': 2.9}
{'loss': 1.8783, 'learning_rate': 0.0, 'epoch': 3.0}
{'eval_loss': 2.2890400886535645, 'eval_runtime': 11.6468, 'eval_samples_per_second': 69.118, 'eval_steps_per_second': 34.602, 'epoch': 3.0}
{'train_runtime': 1463.4588, 'train_samples_per_second': 20.499, 'train_steps_per_second': 10.25, 'train_loss': 2.1447730875651043, 'epoch': 3.0}

TrainOutput(global_step=15000, training_loss=2.1447730875651043, metrics={'train_runtime': 1463.4588, 'train_samples_per_second': 20.499, 'train_steps_per_second': 10.25, 'train_loss': 2.1447730875651043, 'epoch': 3.0})
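
The numbers in the training log can be reproduced with some simple arithmetic. The sketch below makes several assumptions: a single device, no gradient accumulation, the `TrainingArguments` default learning rate of 5e-5, and the default linear schedule without warmup. The function name `linear_lr` is hypothetical. Under these assumptions, the step count and the logged learning rates line up with the output above.

```python
import math

num_examples = 10_000       # dataset_train.select(range(10000))
per_device_batch_size = 2   # per_device_train_batch_size
num_epochs = 3              # num_train_epochs
base_lr = 5e-5              # assumption: TrainingArguments default learning_rate

# Assumption: one device and no gradient accumulation, so one batch is one optimizer step
steps_per_epoch = math.ceil(num_examples / per_device_batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step, total=total_steps, lr=base_lr):
    """Linear decay from lr to 0 with no warmup (assumed default schedule)."""
    return lr * max(0.0, 1 - step / total)


print(total_steps)     # 15000
print(linear_lr(500))  # compare with the 'learning_rate' logged at epoch 0.1

# Perplexity is exp(cross-entropy loss); here for the eval loss after epoch 1
eval_perplexity = math.exp(2.275275945663452)
print(eval_perplexity)
```

In this run the training loss keeps falling (from about 2.41 at the end of epoch 1 to about 1.88 at the end of epoch 3), while the evaluation loss stays close to 2.28. That pattern may indicate that additional epochs mostly fit the training subset, although the evaluation set also comes from a different source than the training data.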

In [26]:
trainer.save_model('model/instruction_tuning')

In [27]:
model_name_or_path = "model/instruction_tuning"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')

In [28]:
from transformers import pipeline

text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=500
)
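
Since the model was fine-tuned on Alpaca-style prompts, it may respond better when the query is wrapped in the same template at inference time. The helper below is a sketch, and the name `build_inference_prompt` is hypothetical. It mirrors the training format used earlier and ends right after the `### Response:` marker, so that the model generates the answer.

```python
def build_inference_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the training format, leaving the response for the model to generate."""
    header = (
        "Below is an instruction that describes a task, paired with an input that provides "
        "further context. Write a response that appropriately completes the request."
    )
    parts = [header, f"### Instruction:\n{instruction}"]
    if input_text.strip():
        # Add the optional input section only when an input is given
        parts.append(f"### Input:\n{input_text}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


prompt = build_inference_prompt("What should I eat today?")
print(prompt)
```

The resulting string can be passed to the pipeline in place of the raw question, for example `text_generator(prompt)`.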

In [32]:
text_generator("What should I eat today?")

[{'generated_text': 'What should I eat today? Eat any fruits and vegetables I find popular and healthy? Yes, I eat plenty of fruits and vegetables. No matter how much I eat, I usually take a tablespoon of sugar that’s a part of the fiber they used to achieve their goal. I usually eat a variety of fruits and vegetables, vitamins and minerals, and other essential nutrients like potassium and calcium, phosphorus, and potassium. Plus, I usually eat plenty of fruits and vegetables every day. However, it’s important to keep in mind that getting regular exercise is important, so eating healthy meals throughout your day is something you should never miss! Exercise also helps to build confidence and self-esteem after weight loss. Lastly, it’s important to keep a balanced diet throughout the day. Have a cup of coffee or light snacks when needed! Enjoy! #Baking #Sleep #Vegan #Fruits #Vegan #Fruits #Vegan #Bananas #Bananas #Plump #Wash #MakesItFrog #Vegan #Vegan #Organic #Vegan #Vegan #ReducingGre