# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [17]:
!pip3 install peft==0.7.1
!pip3 install trl==0.7.4
# The PyPI package is `transformers` (plural); `transformer==4.36.2` cannot be resolved.
# The outputs recorded below were produced with transformers 4.38.2.
!pip3 install transformers==4.36.2

In [18]:
import transformers
transformers.__version__

'4.38.2'

In [19]:
import trl
trl.__version__

'0.7.4'
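
A failed `pip install` does not stop a notebook from running, so it can help to compare the installed versions against the intended pins in one place. The following is a minimal sketch using only the standard library; the helper name `check_pins` is hypothetical, and the pins are the ones from the install cell.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (wanted, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# Pins taken from the install cell; an empty dict means everything matches.
print(check_pins({"peft": "0.7.1", "trl": "0.7.4", "transformers": "4.36.2"}))
```

Any entry in the returned dictionary points at a package whose installed version differs from the pin, or that is missing entirely.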

In [20]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

## Basic SFT

In [21]:
# Step 1: Load the dataset
from datasets import Dataset

dataset_path = '/content/alpaca_data.json'
# Create a Dataset object
dataset = Dataset.from_json(dataset_path)
dataset

Dataset({
    features: ['input', 'instruction', 'output'],
    num_rows: 52002
})

In [22]:
dataset[0]

{'input': '',
 'instruction': 'Give three tips for staying healthy.',
 'output': '1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.'}

In [23]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    device_map = 'auto'
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

# Pass an explicit max_seq_length: if it is omitted, SFTTrainer falls back to
# min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024
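
The `min(...)` guard matters because some tokenizers do not define a maximum length and report a very large sentinel value for `model_max_length` instead. The sketch below expresses the same rule as a function; `resolve_max_seq_length` is a hypothetical helper name, and the cap of 1024 is the value mentioned in the cell above.

```python
def resolve_max_seq_length(model_max_length, cap=1024):
    """Pick the sequence length to train with, never exceeding `cap`."""
    return min(model_max_length, cap)


print(resolve_max_seq_length(1024))       # distilgpt2 reports 1024
print(resolve_max_seq_length(512))        # a shorter-context model keeps its own limit
print(resolve_max_seq_length(int(1e30)))  # an "unset" sentinel is capped
```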

In [24]:
# Step 3: Define the Trainer
from transformers import TrainingArguments
from trl import SFTTrainer
training_args = TrainingArguments(
    output_dir = 'tmp_trainer', # same directory Trainer falls back to when no args are given
    num_train_epochs=5, #default = 3
)

trainer = SFTTrainer(
    model = model,
    args = training_args,
    train_dataset = dataset.select(range(1000)),  # first 1000 rows only
    dataset_text_field = "instruction",  # only this column is used as training text
    max_seq_length = max_seq_length,
)
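
To get a feel for how long this run is, the number of optimizer steps can be worked out from the settings above. The sketch assumes a single device and the `TrainingArguments` defaults of `per_device_train_batch_size=8` and `gradient_accumulation_steps=1`, neither of which is set explicitly in the cell. It also assumes no packing, so each dataset row is one training example. The function name `total_optimizer_steps` is hypothetical.

```python
import math


def total_optimizer_steps(num_examples, num_epochs, per_device_batch_size=8,
                          num_devices=1, grad_accum_steps=1):
    """Estimate optimizer steps for a run (the last partial batch is kept)."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    steps_per_epoch = max(1, batches_per_epoch // grad_accum_steps)
    return steps_per_epoch * num_epochs


# 1000 selected rows, 5 epochs, batch size 8: 125 steps per epoch * 5 epochs = 625
print(total_optimizer_steps(1000, 5))
```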

## Instruction-Tuning
### Train on completions only
- Use `DataCollatorForCompletionOnlyLM` to compute the loss only on the completion (the answer), not on the prompt.
- This works only when `packing=False`.
- To instantiate the collator for instruction data, pass a response template and the tokenizer.
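
Conceptually, completion-only training leaves the input tokens unchanged and sets the label of every prompt token to an ignore index, so that only the answer tokens contribute to the loss. The sketch below illustrates that idea on plain lists of token ids. It is not TRL's implementation: `find_subsequence` and `mask_prompt_labels` are hypothetical helpers, the token ids are made up, and `-100` is the value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def find_subsequence(sequence, pattern):
    """Return the start index of the first occurrence of `pattern`, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1


def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids into labels, ignoring everything up to the end of the template."""
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: there is no answer to train on, so ignore the whole example.
        return [IGNORE_INDEX] * len(input_ids)
    answer_start = start + len(response_template_ids)
    return [IGNORE_INDEX] * answer_start + input_ids[answer_start:]


# Made-up ids: [question tokens] + [template tokens 7, 8] + [answer tokens]
input_ids = [11, 12, 13, 7, 8, 21, 22]
print(mask_prompt_labels(input_ids, [7, 8]))
```

The "template not found" branch is the reason the response template has to tokenize exactly as it appears inside the formatted examples.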

In [25]:
# Step 1: Load the dataset
from datasets import Dataset

dataset_path = '/content/alpaca_data.json'
# Create a Dataset object
dataset = Dataset.from_json(dataset_path)
dataset

Dataset({
    features: ['input', 'instruction', 'output'],
    num_rows: 52002
})

In [26]:
dataset[20000]

{'input': '(A musical note)',
 'instruction': 'Name the given musical note.',
 'output': 'The musical note is an F sharp.'}

In [27]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

In [28]:
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Question: {example['instruction'][i]}\n ### Answer: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

#check instruction-prompt
formatting_prompts_func(dataset[:2])

['### Question: Give three tips for staying healthy.\n ### Answer: 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule.',
 '### Question: What are the three primary colors?\n ### Answer: The three primary colors are red, blue, and yellow.']
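
The function above ignores the `input` column, which carries extra context for some rows (row 20000, shown earlier, is one of them). One possible variant, sketched under the assumption that an extra `### Input:` section is acceptable in the prompt, adds that section only when `input` is non-empty. The name `formatting_prompts_with_input` and the `### Input:` label are illustrative choices, not part of the original notebook, and the first answer in the sample batch is a placeholder.

```python
def formatting_prompts_with_input(example):
    """Batched formatter: `example` maps column names to lists of values."""
    output_texts = []
    for instruction, context, answer in zip(
        example["instruction"], example["input"], example["output"]
    ):
        text = f"### Question: {instruction}\n"
        if context:  # skip the section for rows with an empty input
            text += f" ### Input: {context}\n"
        text += f" ### Answer: {answer}"
        output_texts.append(text)
    return output_texts


batch = {
    "instruction": ["Give three tips for staying healthy.", "Name the given musical note."],
    "input": ["", "(A musical note)"],
    "output": ["(answer text)", "The musical note is an F sharp."],
}
for text in formatting_prompts_with_input(batch):
    print(repr(text))
```

The ` ### Answer:` marker is left unchanged, so the same response template can still be used to locate the start of the completion.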

In [29]:
# Use DataCollatorForCompletionOnlyLM so that the loss is computed on the completions only.
# The response template must match the answer marker produced by formatting_prompts_func.
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')

### Evaluation

In [30]:
# Step 1: Load the dataset
from datasets import load_dataset
# This dataset ships a loading script, so `datasets` asks for explicit consent to run it
eval_dataset = load_dataset("tatsu-lab/alpaca_eval", split='eval', trust_remote_code=True)
eval_dataset

Dataset({
    features: ['instruction', 'output', 'generator', 'dataset'],
    num_rows: 805
})

In [None]:
# Step 3: Define the Trainer
trainer = SFTTrainer(
    model,
    train_dataset=dataset.select(range(1000)),
    eval_dataset=eval_dataset.select(range(50)),
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)

trainer.train()

In [None]:
trainer.evaluate()
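
`trainer.evaluate()` returns a dictionary of metrics that includes the mean cross-entropy loss under the key `eval_loss`. A common way to read that number is to convert it to perplexity, `exp(eval_loss)`. The following is a minimal sketch; `perplexity_from_metrics` is a hypothetical helper, and the metrics dictionary is a made-up stand-in, not a result from this notebook.

```python
import math


def perplexity_from_metrics(metrics):
    """Convert the `eval_loss` entry of an evaluation result into perplexity."""
    return math.exp(metrics["eval_loss"])


# Hypothetical value, only to show the call; in practice pass the dict
# returned by trainer.evaluate().
example_metrics = {"eval_loss": 2.0}
print(round(perplexity_from_metrics(example_metrics), 2))
```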

### Testing

In [None]:
# Define input text
input_text = "How can I improve my coding skills?"

# Tokenize and move the tensors to the device the model was loaded on
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate output directly with the input text
output = model.generate(
    **inputs,
    max_length=300,          # upper bound on prompt + generated tokens
    num_beams=8,             # beam search with 8 beams
    no_repeat_ngram_size=3,  # never repeat any 3-gram
    do_sample=True,          # required for top_k, top_p and temperature to take effect
    top_k=50,
    top_p=0.95,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)

# Decode and print the generated text
print("Generated text:\n", tokenizer.decode(output[0], skip_special_tokens=True))
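
Because the model was fine-tuned on text of the form `### Question: ...\n ### Answer: ...`, it may respond better when the prompt at inference time follows the same template and ends right after the answer marker. The sketch below shows the two string helpers this needs; `build_prompt` and `extract_answer` are hypothetical names, and the generated string in the usage lines is invented for illustration.

```python
ANSWER_MARKER = "### Answer:"


def build_prompt(question):
    """Format a question the same way as the training examples."""
    return f"### Question: {question}\n {ANSWER_MARKER}"


def extract_answer(generated_text):
    """Return the text after the first answer marker, or the whole text if absent."""
    _, marker, answer = generated_text.partition(ANSWER_MARKER)
    return answer.strip() if marker else generated_text.strip()


prompt = build_prompt("How can I improve my coding skills?")
print(repr(prompt))

# Invented continuation, standing in for tokenizer.decode(output[0], ...)
fake_generation = prompt + " Practice regularly."
print(extract_answer(fake_generation))
```

The resulting `prompt` would be passed to the tokenizer in place of the bare `input_text`.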


In [None]:
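import pickle

# Pickling the whole model object works, but the file is tied to the installed
# library versions; model.save_pretrained(...) is the more portable option.
save_path = 'model.pkl'
with open(save_path, 'wb') as f:
    pickle.dump(model, f)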