# [Supervised Fine-tuning Trainer](https://huggingface.co/docs/trl/sft_trainer)

Supervised fine-tuning (SFT) is a crucial step in RLHF. TRL provides an easy-to-use API for creating SFT models and training them on your dataset with a few lines of code.

[Python Script](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py)

In [None]:
# !pip3 install peft==0.7.1
# !pip3 install trl==0.7.4
# !pip3 install transformers==4.36.2

In [1]:
import transformers
transformers.__version__

'4.36.2'

In [2]:
import trl
trl.__version__

'0.7.4'

In [3]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

#os.environ['http_proxy']  = 'http://192.41.170.23:3128'
#os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

device(type='cpu')

In [4]:
# Step 1: Load the dataset
from datasets import load_dataset
dataset = load_dataset('json', data_files="C:\\Users\\Haneesha\\OneDrive\\Desktop\\NLP\\A8\\alpaca_data.json", split="train")
dataset

Dataset({
    features: ['output', 'instruction', 'input'],
    num_rows: 52002
})

In [5]:
dataset[1]

{'output': 'The three primary colors are red, blue, and yellow.',
 'instruction': 'What are the three primary colors?',
 'input': ''}

In [6]:
# Load the AlpacaEval set and keep only the instruction/output columns
alpaca_eval = load_dataset("tatsu-lab/alpaca_eval", split="eval")
alpaca_eval = alpaca_eval.remove_columns(["dataset", "generator"])
alpaca_eval

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


Dataset({
    features: ['instruction', 'output'],
    num_rows: 805
})

In [7]:
alpaca_eval[1]

{'instruction': 'How did US states get their names?',
 'output': 'US states get their names from a variety of sources, including Native American tribes, Spanish explorers, British colonists, and even presidents. For example, the state of Alabama was named after the Native American tribe that lived in the area, while the state of Florida gets its name from the Spanish explorer, Ponce de Leon, who explored the area in the 1500s. Other states are named after English kings (like Virginia, named after England\'s "Virgin Queen," Queen Elizabeth I) or presidents (like Washington, named after George Washington).'}

In [8]:
# Step 2: Load the model & Tokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name_or_path = "distilgpt2"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path, 
    device_map = 'auto'
)
tokenizer = AutoTokenizer.from_pretrained(
    model_name_or_path)

# Make sure to pass a correct value for max_seq_length as the default value will be set to min(tokenizer.model_max_length, 1024).
max_seq_length = min(tokenizer.model_max_length, 1024)
max_seq_length

1024

In [9]:
import json

def formatting_prompts_func(examples):
    """Format Alpaca-style examples into instruction prompts."""
    output_texts = []

    # Accept a JSON string by decoding it first
    if isinstance(examples, str):
        examples = json.loads(examples)

    # Wrap anything that is not a list as a single example.
    # Note: a batched slice such as dataset[:2] is a dict of column lists,
    # so it is also wrapped here and formatted as one prompt.
    if not isinstance(examples, list):
        examples = [examples]

    for example in examples:
        instruction = example.get("instruction", "")
        input_text = example.get("input", "")
        response = example.get("output", "")

        text = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

""".format(instruction)

        # Add the Input section only when the example has a non-empty input
        if input_text:
            text += "### Input:\n{}\n\n".format(input_text)

        text += "### Response:\n{}".format(response)

        output_texts.append(text.strip())

    return output_texts

# Check instruction-prompt
formatting_prompts_func(dataset[:2])


["Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n['Give three tips for staying healthy.', 'What are the three primary colors?']\n\n### Input:\n['', '']\n\n### Response:\n['1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. \\n2. Exercise regularly to keep your body active and strong. \\n3. Get enough sleep and maintain a consistent sleep schedule.', 'The three primary colors are red, blue, and yellow.']"]
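The output above shows both examples folded into a single prompt, because a slice such as `dataset[:2]` is a dict of columns rather than a list of rows. Below is a minimal sketch of a batch-aware variant, assuming the batch maps each column name to an equal-length list. The names `PROMPT_HEADER` and `formatting_prompts_batched` are hypothetical and not part of TRL, and the toy batch is illustrative data only.

```python
PROMPT_HEADER = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)

def formatting_prompts_batched(batch):
    """Build one prompt per row from a dict of column lists (hypothetical helper)."""
    instructions = batch["instruction"]
    # Fall back to empty inputs when the column is missing
    inputs = batch.get("input", [""] * len(instructions))
    outputs = batch["output"]

    texts = []
    for instruction, input_text, response in zip(instructions, inputs, outputs):
        text = f"{PROMPT_HEADER}\n\n### Instruction:\n{instruction}\n\n"
        if input_text:  # skip the Input section for empty inputs
            text += f"### Input:\n{input_text}\n\n"
        text += f"### Response:\n{response}"
        texts.append(text)
    return texts

# Toy batch in the same column-oriented layout as a dataset slice
toy_batch = {
    "instruction": ["Name a primary color.", "Translate to French."],
    "input": ["", "Good morning"],
    "output": ["Red.", "Bonjour"],
}
prompts = formatting_prompts_batched(toy_batch)
print(len(prompts))
print(prompts[1])
```

With this layout each row yields its own string, so a trainer that calls the function on batches would receive one training text per example.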

In [10]:
# Use DataCollatorForCompletionOnlyLM so the loss is computed on the response tokens only
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM
# The template must match the response marker used in formatting_prompts_func
response_template = "### Response:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)
collator

DataCollatorForCompletionOnlyLM(tokenizer=GPT2TokenizerFast(name_or_path='distilgpt2', vocab_size=50257, model_max_length=1024, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|endoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '<|endoftext|>'}, clean_up_tokenization_spaces=True),  added_tokens_decoder={
	50256: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}, mlm=False, mlm_probability=0.15, pad_to_multiple_of=None, tf_experimental_compile=False, return_tensors='pt')
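To make the idea of completion-only training concrete, here is a small sketch of label masking on plain lists of token ids. It is not TRL's implementation: the helper names and the toy ids are hypothetical, and masking the whole example when the template is missing is an assumption about how such a collator could handle that case. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def find_subsequence(sequence, pattern):
    """Return the first index where pattern occurs in sequence, or -1."""
    for start in range(len(sequence) - len(pattern) + 1):
        if sequence[start:start + len(pattern)] == pattern:
            return start
    return -1

def mask_prompt_labels(input_ids, response_template_ids):
    """Copy input_ids to labels, ignoring everything up to and including the template."""
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # Template not found: nothing can be identified as the response (assumption)
        return [IGNORE_INDEX] * len(input_ids)
    end = start + len(response_template_ids)
    return [IGNORE_INDEX] * end + list(input_ids[end:])

# Toy ids: 11-13 stand for the prompt, 90-91 for the template, 21-22 for the response
toy_input_ids = [11, 12, 13, 90, 91, 21, 22]
labels = mask_prompt_labels(toy_input_ids, [90, 91])
missing = mask_prompt_labels(toy_input_ids, [70, 71])
print(labels)
print(missing)
```

The second call illustrates why the template string has to match the prompt format exactly: if its token ids never appear in the sequence, no token contributes to the loss.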

**Model Training**


In [11]:
# Step 3: Define the Trainer
from transformers import TrainingArguments
from trl import SFTTrainer
save_path='./trainer/trainer_model'
training_args = TrainingArguments(
    output_dir = './trainer', #default = 'tmp_trainer'
    save_strategy='epoch',
    evaluation_strategy='epoch',
    gradient_checkpointing=True,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    
    num_train_epochs=5, #default = 3
)


In [12]:
from transformers import TrainingArguments

# Check if the tokenizer has a padding token defined
if tokenizer.pad_token_id is None:
    # If not, add one, then resize the embeddings so the new token id has a row
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = tokenizer.pad_token_id

# Define training arguments
training_args = TrainingArguments(
    output_dir='./trainer',  
    save_strategy='epoch',
    evaluation_strategy='epoch',
    gradient_checkpointing=True,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    num_train_epochs=2,  
)

# Define the Trainer
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset.select(range(1000)),
    eval_dataset=alpaca_eval,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
    max_seq_length=max_seq_length,
)

# Train the model
trainer.train()

# Save the trained model
trainer.save_model(save_path)



{'eval_loss': nan, 'eval_runtime': 1.1213, 'eval_samples_per_second': 0.892, 'eval_steps_per_second': 0.892, 'epoch': 1.0}




{'eval_loss': nan, 'eval_runtime': 1.3994, 'eval_samples_per_second': 0.715, 'eval_steps_per_second': 0.715, 'epoch': 2.0}
{'train_runtime': 14.5434, 'train_samples_per_second': 0.138, 'train_steps_per_second': 0.138, 'train_loss': 0.0, 'epoch': 2.0}
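The logged throughput (about 0.138 steps per second over roughly 14.5 seconds) implies only around two optimizer steps in total. As a sanity check, the sketch below estimates how many steps 1000 separate examples would give at batch size 3 over 2 epochs. The helper name is hypothetical, and a single device with no gradient accumulation is assumed. A gap this large is consistent with the examples having been merged into a single training text, as in the formatting output earlier.

```python
import math

def expected_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                             num_devices=1, gradient_accumulation_steps=1):
    """Estimate total optimizer steps, keeping the last partial batch (hypothetical helper)."""
    effective_batch = per_device_batch_size * num_devices * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs

# Settings from the training cell: 1000 examples, batch size 3, 2 epochs
expected = expected_optimizer_steps(1000, 3, 2)

# Steps implied by the logged metrics: steps per second times runtime
observed = round(0.138 * 14.5434)

print(expected, observed)
```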


In [14]:
from transformers import AutoModelForCausalLM, AutoTokenizer,TextGenerationPipeline
save_path='./trainer/trainer_model'
model = AutoModelForCausalLM.from_pretrained(save_path)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(save_path)

# Create the text generation pipeline
text_generator = TextGenerationPipeline(
    model=model,
    tokenizer=tokenizer,
    device=model.device,
    pad_token_id=tokenizer.eos_token_id,
    max_length=100,  
    temperature=1.0  
)


In [15]:
def format_instruction(sample):
    if sample.get('input'):  # include the Input section only when it is non-empty
        return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Input:
{sample['input']}

### Response:
""".strip()
    else:
        return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Response:
""".strip()

In [16]:
model = AutoModelForCausalLM.from_pretrained("distilgpt2", device_map = 'auto')
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

In [20]:
import warnings
from transformers import pipeline

warnings.filterwarnings("ignore")

def compare_responses(generator, sample):
    """Print the instruction, the gold response, and the generated response."""
    print(f"Instruction:\n{sample['instruction']}\n")

    input_text = sample.get('input', '')
    if input_text:
        print(f"Input:\n{input_text}\n")

    print(f"Gold Response:\n{sample['output']}\n")

    # The raw instruction is sent as the prompt. The generated text echoes it,
    # and the split leaves the text unchanged when the response marker is absent.
    response = generator(sample['instruction'])[0]['generated_text'].split("### Response:\n")[-1]

    print(f"Generated Response:\n{response}\n")

# Create a text generation pipeline; with no model argument it loads the default gpt2 checkpoint
text_generator = pipeline("text-generation")

# Usage example:
compare_responses(text_generator, alpaca_eval[1])


No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Instruction:
How did US states get their names?

Gold Response:
US states get their names from a variety of sources, including Native American tribes, Spanish explorers, British colonists, and even presidents. For example, the state of Alabama was named after the Native American tribe that lived in the area, while the state of Florida gets its name from the Spanish explorer, Ponce de Leon, who explored the area in the 1500s. Other states are named after English kings (like Virginia, named after England's "Virgin Queen," Queen Elizabeth I) or presidents (like Washington, named after George Washington).

Generated Response:
How did US states get their names?

Numerous states have passed statutes designed to protect private information — including information that has been passed on by state. As a rule, the "public" information has to only be provided to officials of the
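
The response above comes from the default gpt2 checkpoint prompted with the bare instruction. The sketch below shows one way to send the same prompt layout used for training and keep only the text after the response marker. The helper names are hypothetical, and `fake_generator` is a stand-in that mimics the list-of-dicts shape returned by a text-generation pipeline so the example runs without downloading a model.

```python
RESPONSE_MARKER = "### Response:"

PROMPT_HEADER = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)

def build_prompt(sample):
    """Format a sample like the training prompts, ending at the response marker."""
    prompt = f"{PROMPT_HEADER}\n\n### Instruction:\n{sample['instruction']}\n\n"
    if sample.get("input"):  # only add the Input section when it is non-empty
        prompt += f"### Input:\n{sample['input']}\n\n"
    return prompt + RESPONSE_MARKER + "\n"

def generate_response(generator, sample):
    """Call the generator on the formatted prompt and return the text after the marker."""
    generated = generator(build_prompt(sample))[0]["generated_text"]
    return generated.split(RESPONSE_MARKER)[-1].strip()

def fake_generator(prompt):
    """Stand-in for a pipeline: echoes the prompt followed by a canned continuation."""
    return [{"generated_text": prompt + "A stand-in answer."}]

sample = {"instruction": "How did US states get their names?", "output": "..."}
answer = generate_response(fake_generator, sample)
print(answer)
```

To try this with the fine-tuned model, `fake_generator` could be replaced with `pipeline("text-generation", model="./trainer/trainer_model")`, assuming that directory holds the model and tokenizer files saved after training.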

