# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [1]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2

In [2]:
import transformers
transformers.__version__


'4.36.2'

In [3]:
import trl
trl.__version__



'0.7.4'

In [4]:
import os
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cuda')

## Instruction-Tuning
Train on completions only
- Use the DataCollatorForCompletionOnlyLM to compute the loss on the response (completion) tokens only, not on the prompt tokens.
- Note that this works only when packing=False.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
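
The idea behind completion-only training can be sketched in plain Python. This is a simplified illustration, not TRL's actual implementation: the helper names `find_subsequence` and `mask_prompt_labels` and the toy token ids are hypothetical. It assumes that labels set to -100 are skipped by the loss (the default `ignore_index` of PyTorch's cross-entropy loss) and that everything up to and including the response template is masked.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of pattern in sequence, or -1."""
    n, m = len(sequence), len(pattern)
    for start in range(n - m + 1):
        if sequence[start:start + m] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels and mask everything up to and including the template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: exclude the whole example from the loss.
        return [IGNORE_INDEX] * len(labels)
    response_start = start + len(response_template_ids)
    for i in range(response_start):
        labels[i] = IGNORE_INDEX
    return labels


# Toy ids (hypothetical, not real GPT-2 ids):
# prompt = [11, 12], response template = [7, 8], response = [21, 22, 23]
input_ids = [11, 12, 7, 8, 21, 22, 23]
labels = mask_prompt_labels(input_ids, [7, 8])
print(labels)  # [-100, -100, -100, -100, 21, 22, 23]
```

Only the three response tokens keep their labels, so only they contribute to the training loss.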

In [5]:
# Step 1: Load the dataset
from datasets import load_dataset
train_ds = load_dataset('json', data_files='dataset/alpaca_data.json', split='train')
train_ds

Dataset({
    features: ['input', 'instruction', 'output'],
    num_rows: 52002
})

In [6]:
train_ds[20000]

{'input': '(A musical note)',
 'instruction': 'Name the given musical note.',
 'output': 'The musical note is an F sharp.'}

In [7]:
eval_ds = load_dataset("tatsu-lab/alpaca_eval", split='eval', trust_remote_code=True)
eval_ds = eval_ds.remove_columns(["generator", "dataset"])
eval_ds

Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [8]:
eval_ds[200]

{'instruction': 'what are five important topics for game design',
 'output': '1. Storytelling\n2. Player Mechanics\n3. Art Direction\n4. Level Design\n5. User Interface Design'}

In [9]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "distilgpt2"

model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map = 'auto')

tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

tokenizer.pad_token = tokenizer.eos_token

# Set max_seq_length explicitly; if it is omitted, SFTTrainer falls back to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

### Stanford-Alpaca: Format your input prompts
For instruction fine-tuning, it is common for the dataset to have two columns: one for the prompt and one for the response.

Each example can then be formatted into a single prompt string, as Stanford-Alpaca did:

In [10]:
def formatting_prompts_func(examples):
	output_texts = []

	for i in range(len(examples['instruction'])):
		instruction = examples["instruction"][i]
		input_text = examples["input"][i] if 'input' in examples.keys() else ""
		response = examples["output"][i]
	
		if len(input_text) > 1:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_text}

### Response:
{response}
""".strip()
			
		else:
			text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
""".strip()

		output_texts.append(text)

	return output_texts

# Check the formatted prompts for the first two training examples
formatting_prompts_func(train_ds[:2])

['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for staying healthy.\n\n### Response:\n1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat are the three primary colors?\n\n### Response:\nThe three primary colors are red, blue, and yellow.']

In [11]:
# Use the DataCollatorForCompletionOnlyLM to compute the loss on the response (completion) tokens only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')
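
Because the collator locates the response by searching each example for the response template, an example that lacks the template would likely contribute nothing to the loss. The check below is a rough string-level sketch. The helper name `indices_missing_template` and the sample strings are hypothetical, and the real collator searches token ids rather than raw text, so treat this only as a quick proxy.

```python
RESPONSE_TEMPLATE = "### Response:"


def indices_missing_template(texts, template=RESPONSE_TEMPLATE):
    """Return the indices of texts that do not contain the template exactly once."""
    return [i for i, text in enumerate(texts) if text.count(template) != 1]


samples = [
    "### Instruction:\nName a color.\n\n### Response:\nBlue.",
    "### Instruction:\nName a color.\n\nBlue.",  # template missing
]
print(indices_missing_template(samples))  # [1]
```

In this notebook, the same helper could be applied to the output of `formatting_prompts_func(train_ds[:100])` before training.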

### Model Training

In [12]:
from transformers import TrainingArguments

save_path = './results/final'

training_args = TrainingArguments(
    output_dir = './results', #default = 'tmp_trainer'
    save_strategy = 'epoch',
    logging_strategy = 'epoch',
    evaluation_strategy = 'epoch',
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 8,
    num_train_epochs = 3, #default = 3
)

# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    args = training_args,
    train_dataset = train_ds.select(range(10000)),
    eval_dataset = eval_ds,
    formatting_func = formatting_prompts_func,
    data_collator = collator,
    max_seq_length = max_seq_length,
)

trainer.train()
trainer.save_model(save_path)

{'loss': 2.5394, 'learning_rate': 3.3333333333333335e-05, 'epoch': 1.0}
{'eval_loss': 2.1940672397613525, 'eval_runtime': 19.9782, 'eval_samples_per_second': 40.294, 'eval_steps_per_second': 5.056, 'epoch': 1.0}
{'loss': 2.2647, 'learning_rate': 1.6666666666666667e-05, 'epoch': 2.0}
{'eval_loss': 2.1724660396575928, 'eval_runtime': 19.9961, 'eval_samples_per_second': 40.258, 'eval_steps_per_second': 5.051, 'epoch': 2.0}
{'loss': 2.1411, 'learning_rate': 0.0, 'epoch': 3.0}
{'eval_loss': 2.168952703475952, 'eval_runtime': 19.9561, 'eval_samples_per_second': 40.338, 'eval_steps_per_second': 5.061, 'epoch': 3.0}
100%|██████████| 3750/3750 [08:57<00:00,  6.98it/s]
{'train_runtime': 537.7753, 'train_samples_per_second': 55.785, 'train_steps_per_second': 6.973, 'train_loss': 2.3150337890625, 'epoch': 3.0}
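
The logged numbers can be sanity-checked with a little arithmetic. The sketch below assumes a single device, no gradient accumulation, and the TrainingArguments defaults of a 5e-5 learning rate with linear decay and no warmup. None of these are set explicitly in the notebook, so they are assumptions, although the logged values are consistent with them. The perplexity is simply the exponential of the final eval loss, which here is averaged over response tokens only.

```python
import math

num_examples = 10_000   # train_ds.select(range(10000))
batch_size = 8          # per_device_train_batch_size
num_epochs = 3
base_lr = 5e-5          # assumed: TrainingArguments default learning rate

# Optimizer steps per epoch and in total (single device, no gradient accumulation assumed)
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step):
    """Learning rate under linear decay to zero with no warmup (assumed schedule)."""
    return base_lr * (1 - step / total_steps)


final_eval_loss = 2.168952703475952  # eval_loss logged after epoch 3
perplexity = math.exp(final_eval_loss)

print(total_steps)                 # 3750, matching the progress bar
print(linear_lr(steps_per_epoch))  # about 3.33e-05, matching the epoch 1 log
print(round(perplexity, 2))        # roughly 8.75
```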


### Inference

In [13]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

save_path = './results/final'

model = AutoModelForCausalLM.from_pretrained(
    save_path,
    device_map = 'auto')

tokenizer = AutoTokenizer.from_pretrained(
    save_path)

text_generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map = 'auto',
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens = 100,
    temperature = 1.0
)

In [14]:
def instruction_prompt(sample):
	
	# Mirror the training formatter: add the Input block only when the input is non-empty
	if len(sample.get('input', '')) > 1:
		return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Input:
{sample['input']}

### Response:
""".strip()
			
	else:
		return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Response:
""".strip()

In [15]:
import warnings
warnings.filterwarnings("ignore")

def compare_responses(generator, sample):
    print(f"Instruction:\n{sample['instruction']}\n")

    # Show the optional input only when it is present and non-empty
    if len(sample.get('input', '')) > 1:
        print(f"Input:\n{sample['input']}\n")

    print(f"Gold Response:\n{sample['output']}\n")

    output = generator(instruction_prompt(sample))
    # generated_text echoes the prompt, so keep only what follows the response template
    response = output[0]['generated_text'].split("### Response:")[-1].strip()

    print(f"Generated Response:\n{response}\n")

In [17]:
compare_responses(text_generator, eval_ds[10])

Instruction:
do you think retinoid is effective on removing the acne? because I have a lot of it

Gold Response:
Yes, retinoids are effective in treating acne. They work by increasing cell turnover, which helps to reduce the appearance of existing acne and prevent new outbreaks. Retinoids also help to unclog pores, which in turn reduces the amount of bacteria that can cause infections. In general, retinoids help to reduce inflammation and oil production, making them a great option for those with acne.

Generated Response:
Yes, retinoid can help reduce acne by eliminating symptoms like bloating. Just use products like soap and makeup to help with aging. Additionally, it can boost your risk of getting acne, leading to acne. Finally, it has great antioxidant properties and anti-aging, so it's absolutely worthwhile to try retinoid before you make your next acne!
I also recommend the antihydrogen gel Retinoid in the form of moisturizers. I love it and love how it keeps

