# HuggingFace Supervised Fine-tuning Trainer (SFT)
## Full Fine-tuning

https://huggingface.co/docs/trl/en/sft_trainer

## TinyLlama
https://arxiv.org/pdf/2401.02385
https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v0.1
https://huggingface.co/facebook/opt-350m
https://huggingface.co/facebook/MobileLLM-125M

## Example scripts
https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py

## Inspired by
https://colab.research.google.com/github/huggingface/smol-course/blob/main/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb


In [1]:
# ! pip install wandb

In [2]:
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format
import torch

In [3]:
import os

# Select the base model: keep exactly one assignment uncommented
# model_name = "HuggingFaceTB/SmolLM2-135M"
# model_name = "TinyLlama/TinyLlama-1.1B-Chat-v0.1"
model_name = "facebook/opt-350m"

# Loading this model requires executing custom code from the model repository
# (trust_remote_code=True)
# model_name = "facebook/MobileLLM-125M"

os.environ["WANDB_PROJECT"] = "fb-opt-350-ft"
os.environ["WANDB_DIR"] = "./temp"
os.environ["WANDB_JOB_NAME"] = "some-job-name"

## 1. Prepare the dataset

**Dataset format support**

https://huggingface.co/docs/trl/en/sft_trainer#dataset-format-support
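
As a rough illustration of the conversational format that the trainer accepts, the sketch below builds one made-up record and checks its shape. The record content, the `ALLOWED_ROLES` set and the `is_conversational` helper are assumptions for illustration only; they are not part of TRL.

```python
# Hypothetical record in the conversational format (the content is made up).
example = {
    "messages": [
        {"role": "user", "content": "Hi there"},
        {"role": "assistant", "content": "Hello! How can I help you today?"},
    ]
}

# Assumption: the usual chat roles; a dataset may use others.
ALLOWED_ROLES = {"system", "user", "assistant"}


def is_conversational(record: dict) -> bool:
    """Return True if `record` looks like a {"messages": [...]} conversation."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in ALLOWED_ROLES
        and isinstance(m.get("content"), str)
        for m in messages
    )


print(is_conversational(example))                    # True
print(is_conversational({"text": "plain string"}))   # False
```

A check like this can be run on a few rows of a dataset before training to catch a mismatched column layout early.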

In [5]:
# Load a sample dataset
from datasets import load_dataset

dataset_name = "HuggingFaceTB/smoltalk"
# "everyday-conversations" is a dataset configuration (subset), not a split
dataset_config = "everyday-conversations"

ds = load_dataset(path=dataset_name, name=dataset_config)



## 2. Load the model onto the available device (CPU/GPU)

In [4]:
# Check the machine in use and set the device to use for training
# cuda = GPU, mps = Metal Performance Shaders on macOS or Apple GPU, cpu otherwise
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Print device info
print("Model loaded to: ", device)



# Load the pretrained model & move it to the specified device
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Setup for the model specific chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

Model loaded to:  cpu
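
To make the effect of `setup_chat_format` more concrete: by default it configures a ChatML-style template (an assumption based on the TRL documentation). The sketch below imitates that layout in plain Python. The `to_chatml` helper is hypothetical; the real template lives inside the tokenizer and is applied with `tokenizer.apply_chat_template`.

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as a ChatML-style string."""
    text = ""
    for m in messages:
        # Each turn is wrapped in <|im_start|> ... <|im_end|> markers
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        text += "<|im_start|>assistant\n"
    return text


# Made-up conversation for illustration
conversation = [
    {"role": "user", "content": "Hi there"},
    {"role": "assistant", "content": "Hello!"},
]
print(to_chatml(conversation))
```

The two marker strings are the special tokens that get added to the tokenizer, which is why the model's embedding layer has to be resized along with it.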


## 3. Set up the training configuration

**SFTConfig**

https://huggingface.co/docs/trl/v0.12.2/en/sft_trainer#trl.SFTConfig

This object specifies hyperparameters and settings for the fine-tuning process. It’s tailored to supervised fine-tuning tasks, often used for adapting language models to specific tasks or datasets.

In [6]:
from datetime import datetime

# Get the current timestamp
current_time = datetime.now()

# Create a readable timestamp
formatted_time = current_time.strftime("%b-%d-%Y-%H-%M-%S")

# Create a name for the run
wandb_run_name = f"FT_run_{formatted_time}"

# Name for the fine-tuned model
fine_tuned_model_name = "fine-tuned-chat-model"

# Model assets output folder
model_output_folder = "c:/temp/sft_output"

# SFTTrainer configuration
sft_config = SFTConfig(

    # Output directory for model assets
    output_dir = model_output_folder,  

    # Hyperparameter: maximum number of gradient update steps during training
    max_steps=100,

    # Common starting point for fine-tuning
    # The initial learning rate for the optimizer.
    learning_rate=5e-5,  

    # Set according to your GPU memory capacity
    # Number of training samples per device in each batch. Smaller values help fit large models into memory-constrained GPUs.
    per_device_train_batch_size=4,  

    # Frequency of logging training metrics
    # Logs metrics (e.g., loss) every 10 steps during training.
    logging_steps=10,  

    # Frequency of saving model checkpoints
    # Saves a checkpoint every 100 steps, so a failure costs at most 100 steps of work
    save_steps=100,

    # Evaluate the model at regular intervals
    eval_strategy="steps",  

    # Frequency of evaluation
    # Run the model evaluation after every 50 steps
    eval_steps=50,  

    # Use the Apple MPS device when it is available
    # (deprecated in recent transformers releases, which select MPS automatically)
    use_mps_device=(device == "mps"),

    # Set a unique name for your model - used for HuggingFace hub
    hub_model_id=fine_tuned_model_name,  

    # Reporting
    report_to = "wandb",
    run_name = wandb_run_name,
)
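
To see how `logging_steps`, `eval_steps` and `save_steps` interact with `max_steps`, the sketch below lists the steps at which each event would fire. It assumes an event fires whenever the global step is a multiple of its interval; `event_steps` is a hypothetical helper, not a Trainer API.

```python
def event_steps(max_steps: int, every: int) -> list[int]:
    """Steps in 1..max_steps at which an every-N-steps event fires."""
    return [s for s in range(1, max_steps + 1) if s % every == 0]


max_steps = 100  # same value as in the configuration above

print("log :", event_steps(max_steps, 10))
print("eval:", event_steps(max_steps, 50))    # eval: [50, 100]
print("save:", event_steps(max_steps, 100))   # save: [100]
```

With these settings the run is evaluated twice and checkpointed once, at the final step.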



## 3. Set up the Supervised Fine-tuning trainer

**SFTTrainer**

https://huggingface.co/docs/trl/v0.12.2/en/sft_trainer#trl.SFTTrainer

**SFTTrainer extends the transformers.Trainer class**

https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.Trainer

In [7]:
# Initialize the SFTTrainer
trainer = SFTTrainer(

    # The language model being fine-tuned.
    model=model,

    # Passes the fine-tuning configuration defined above 
    args=sft_config,

    # Training dataset
    train_dataset=ds["train"],

    # Evaluation dataset
    eval_dataset=ds["test"],

    # Tokenizer used
    tokenizer=tokenizer,
    
)



Map:   0%|          | 0/2260 [00:00<?, ? examples/s]

Map:   0%|          | 0/119 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
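
Because `max_steps` overrides `num_train_epochs`, it helps to estimate how much of the training set the run actually covers. The sketch below does the arithmetic for this notebook's values (100 steps, batch size 4, 2260 training examples). It assumes a single device, no gradient accumulation and no sequence packing; `epochs_covered` is a hypothetical helper.

```python
def epochs_covered(max_steps, per_device_batch_size, dataset_size,
                   num_devices=1, gradient_accumulation_steps=1):
    """Fraction of the dataset seen after `max_steps` optimizer steps."""
    samples_seen = (
        max_steps * per_device_batch_size * num_devices * gradient_accumulation_steps
    )
    return samples_seen / dataset_size


frac = epochs_covered(max_steps=100, per_device_batch_size=4, dataset_size=2260)
print(f"{frac:.2f} epochs")  # 0.18 epochs
```

Under these assumptions the model sees about 400 of the 2260 examples, so this run is a quick smoke test rather than a full pass over the data.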


## 4. Train the model

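**Weights & Biases (wandb) configuration**

https://docs.wandb.ai/guides/track/environment-variables/

To disable wandb logging, set the following environment variable before training:

import os
os.environ["WANDB_DISABLED"] = "True"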

In [8]:
# Train the model
trainer.train()

# Save the model
trainer.save_model(f"./{fine_tuned_model_name}")

wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: acloudfan (raj-acloudfan). Use `wandb login --relogin` to force relogin


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50   | 1.3067        | 1.389864        |
| 100  | 1.4179        | 1.327439        |
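
One way to read the validation loss is to convert it to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language models; `perplexity` is a small helper defined here for illustration.

```python
import math


def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(cross_entropy_loss)


# Validation losses reported above, keyed by step
val_loss = {50: 1.389864, 100: 1.327439}

for step, loss in val_loss.items():
    print(f"step {step}: perplexity ~ {perplexity(loss):.2f}")
```

Lower is better: the drop in validation loss between step 50 and step 100 corresponds to a lower perplexity on the held-out conversations.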


## 5. Upload to HF hub


In [10]:
import getpass

print("Provide the HUGGINGFACEHUB_API_TOKEN:")
HUGGINGFACEHUB_API_TOKEN=getpass.getpass()

trainer.push_to_hub(token=HUGGINGFACEHUB_API_TOKEN)


Provide the HUGGINGFACEHUB_API_TOKEN:


 ········


model.safetensors:   0%|          | 0.00/1.32G [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.56k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/acloudfan/fine-tuned-chat-model/commit/e0a163df1518126d0a5d8833a50ef84f45f541fd', commit_message='End of training', commit_description='', oid='e0a163df1518126d0a5d8833a50ef84f45f541fd', pr_url=None, repo_url=RepoUrl('https://huggingface.co/acloudfan/fine-tuned-chat-model', endpoint='https://huggingface.co', repo_type='model', repo_id='acloudfan/fine-tuned-chat-model'), pr_revision=None, pr_num=None)
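
For non-interactive runs, the token can be read from an environment variable and only prompted for as a fallback. This is a minimal sketch: `get_hf_token` is a hypothetical helper, and `HF_TOKEN` is assumed to be the variable name in use (it is the one `huggingface_hub` reads by default).

```python
import getpass
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(env_var)
    if token:
        return token
    return getpass.getpass(f"Provide the Hugging Face token ({env_var} is not set): ")


# Usage with the trainer from the cells above (not executed here):
# trainer.push_to_hub(token=get_hf_token())
```

This keeps the token out of the notebook source and out of the saved cell output.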

## 6. Try out the model

In [16]:
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="acloudfan/fine-tuned-chat-model") #, device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

I'd choose the future. I'd like to see the future, but I don't want to be stuck in the past. What if I had a time machine and could travel back in time? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person? Would I still be the same person?
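
The repeated sentence in the output above is typical of greedy decoding with a small, briefly trained model. One option to try is passing repetition controls to the generation call. The values below are illustrative assumptions, not tuned settings; `repetition_penalty` and `no_repeat_ngram_size` are standard `generate` parameters in transformers.

```python
# Keyword arguments for the text-generation pipeline call
generation_kwargs = {
    "max_new_tokens": 128,
    "return_full_text": False,
    "repetition_penalty": 1.2,   # values > 1.0 penalize tokens already generated
    "no_repeat_ngram_size": 3,   # forbid repeating any 3-token sequence
}

# Usage with `generator` and `question` from the cell above (not executed here):
# output = generator([{"role": "user", "content": question}], **generation_kwargs)[0]
# print(output["generated_text"])

print(generation_kwargs)
```

These settings only change decoding; training for more than 100 steps is likely to matter more for output quality.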


In [17]:
!python --version

Python 3.12.4
