# Instruction Tuning
This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks.

In this module, we will explore two topics: 1) chat templates and 2) supervised fine-tuning (SFT).

## Pre-install

In [1]:
# !pip install transformers
# !pip install trl
# !pip install accelerate
# !pip install datasets

In [2]:
import os
import torch
# Set GPU device
os.environ["CUDA_VISIBLE_DEVICES"] = "3"
# Comment out the next two lines if you are not using our department server (puffer)
os.environ['http_proxy']  = 'http://192.41.170.23:3128'
os.environ['https_proxy'] = 'http://192.41.170.23:3128'

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, setup_chat_format

# Fix the random seed so results are comparable after a kernel restart
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

cuda


In [4]:
torch.cuda.get_device_name(0)

'NVIDIA GeForce RTX 2080 Ti'

In [5]:
torch.__version__

'2.5.1+cu124'

## 1. Chat Templates
Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the Chat Templates section.

**Base Models vs Instruct Models**

A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template.

**Understanding Chat Templates**
At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template:

```text
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
```
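
To make the mapping from messages to text concrete, here is a minimal hand-written sketch of the ChatML-style layout shown above. The helper name `render_chatml` and its `add_generation_prompt` flag are illustrative assumptions; real tokenizers render their template with Jinja, and details such as default system prompts vary by model.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content dicts in a ChatML-style layout (illustrative only)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
rendered = render_chatml(example_messages, add_generation_prompt=True)
print(rendered)
```

With `add_generation_prompt=False` the string would end after the last `<|im_end|>`, which is the form usually wanted for complete training examples rather than for inference.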

The `transformers` library will take care of chat templates for you in relation to the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest. Here's a basic example of a conversation:

In [6]:
messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."}
]

Let's break down the above example, and see how it maps to the chat template format.

### 1.1 System Messages

System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example:

In [7]:
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}

### 1.2 Conversations

Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations:

In [8]:
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]
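
Putting the two pieces together, the sketch below prepends a system message to a conversation history and checks the role ordering. The `validate_roles` helper is hypothetical, and the alternation rule it enforces is an assumption that matches many chat templates, not a requirement of every model.

```python
def validate_roles(messages):
    """Check for an optional leading system message, then alternating user/assistant turns."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    turns = messages[1:] if has_system else messages
    expected = "user"
    for turn in turns:
        if turn["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True


system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful.",
}
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]

# The system message goes first; the history follows in order
full_conversation = [system_message] + conversation
print(validate_roles(full_conversation))
print(full_conversation[-1]["role"])
```

Because the last turn comes from the user, this conversation is ready for the model to produce the next assistant message.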

In [9]:
## Implementation with Transformers
# The transformers library provides built-in support for chat templates. Here's how to use them:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to sort a list"},
]

In [10]:
## Apply chat template without tokenization
# The tokenizer represents the conversation as a string with special tokens to describe the role of the user and the assistant.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print("Conversation with template:\n", input_text)

Conversation with template:
 <|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to sort a list<|im_end|>
<|im_start|>assistant



In [11]:
## Tokenize the conversation
# The tokenizer can also convert the conversation, including its special tokens, to ids from the model's vocabulary.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True
)

print("Conversation with template:\n", input_text)

Conversation with template:
 [1, 9690, 198, 2683, 359, 253, 5356, 8598, 11173, 30, 2, 198, 1, 4093, 198, 19161, 253, 5905, 1517, 288, 4440, 253, 1398, 2, 198, 1, 520, 9531, 198]


In [12]:
## Decode the conversation
# Decoding the ids recovers the templated string shown earlier, ending with the prompt for the next assistant message.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True
)

print("Conversation decoded:\n", tokenizer.decode(token_ids=input_text))

Conversation decoded:
 <|im_start|>system
You are a helpful coding assistant.<|im_end|>
<|im_start|>user
Write a Python function to sort a list<|im_end|>
<|im_start|>assistant



In [13]:
## Custom Formatting
# You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles:
template = """
<|system|>{system_message}
<|user|>{user_message}
<|assistant|>{assistant_message}
""".lstrip()

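As a sketch of how such a template could be filled, Python's `str.format` substitutes the three placeholders. The template is repeated so the example is self-contained, and the message texts are made up for illustration.

```python
template = """
<|system|>{system_message}
<|user|>{user_message}
<|assistant|>{assistant_message}
""".lstrip()

formatted = template.format(
    system_message="You are a helpful assistant.",
    user_message="What is a chat template?",
    assistant_message="",  # left empty so the model completes the assistant turn
)
print(formatted)
```

This simple form handles a single exchange; multi-turn conversations need a loop or a Jinja template like the ones tokenizers ship with.
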
In [14]:
## Multi-Turn Support
# Templates can handle complex multi-turn conversations while maintaining context:

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What is calculus?"},
    {"role": "assistant", "content": "Calculus is a branch of mathematics..."},
    {"role": "user", "content": "Can you give me an example?"},
]

In [15]:
## Apply chat template without tokenization
# The tokenizer represents the conversation as a string with special tokens to describe the role of the user and the assistant.
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

print("Conversation with template:", input_text)

Conversation with template: <|im_start|>system
You are a math tutor.<|im_end|>
<|im_start|>user
What is calculus?<|im_end|>
<|im_start|>assistant
Calculus is a branch of mathematics...<|im_end|>
<|im_start|>user
Can you give me an example?<|im_end|>
<|im_start|>assistant



## 2. Supervised Fine-Tuning
Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the Supervised Fine-Tuning page.

**Understanding Supervised Fine-Tuning**

At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case.

SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs.
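
To make "labeled tokens" concrete: SFT typically minimizes the average negative log-likelihood of the target tokens. The sketch below computes that quantity in plain Python from made-up per-token probabilities. The mask that restricts the loss to response tokens is one common variant and an assumption here; whether prompt tokens are masked depends on the training configuration.

```python
import math


def sft_loss(token_probs, loss_mask):
    """Mean negative log-likelihood over the tokens where loss_mask is 1."""
    selected = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m]
    return sum(selected) / len(selected)


# Hypothetical probabilities the model assigns to each correct next token
token_probs = [0.9, 0.8, 0.5, 0.25, 0.5]
# 0 = prompt token (ignored), 1 = response token (contributes to the loss)
loss_mask = [0, 0, 1, 1, 1]

print(f"loss over response tokens: {sft_loss(token_probs, loss_mask):.4f}")
print(f"loss over all tokens:      {sft_loss(token_probs, [1] * len(token_probs)):.4f}")
```

Training lowers this number by raising the probability the model gives to the tokens in the reference responses.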

**When to Use Supervised Fine-Tuning**

The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains.

For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. In these cases, SFT can help align the model's responses with professional standards and domain expertise.

**The Fine-Tuning Process**

The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. 

First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important - each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. 

**The Role of SFT in Preference Alignment**
SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model’s responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model’s ability to generate responses that align more closely with human expectations.

**Supervised Fine-Tuning With Transformer Reinforcement Learning**
A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models using reinforcement learning (RL).

Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library covers the major post-training stages for language models, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo.

## 3. Supervised Fine-Tuning with SFTTrainer

### 3.1 Load Model and Tokenization 

In [16]:
# Step 1 : Load the model and tokenizer
model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_name)
model = model.to(device)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

# Set up the chat format
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)

# Set the name under which the fine-tuned model will be saved and/or uploaded
finetune_name = "SmolLM2-FT-MyDataset"
finetune_tags = ["smol-course", "module_1"]

### 3.2 Dataset Preparation
We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.

TRL will format input messages based on the model's chat template. They need to be represented as a list of dictionaries with the keys `role` and `content`.
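
If a dataset stores plain prompt/response columns instead, it can be converted to this structure first. The following is a minimal sketch that assumes hypothetical column names `prompt` and `response`; with the `datasets` library, the same function could be passed to `Dataset.map`.

```python
def to_messages(example):
    """Convert one prompt/response record into the role/content message format."""
    return {
        "messages": [
            {"role": "user", "content": example["prompt"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }


# Hypothetical raw records; real column names depend on your dataset
raw_records = [
    {"prompt": "What is 2 + 2?", "response": "2 + 2 equals 4."},
    {"prompt": "Name a primary color.", "response": "Red is a primary color."},
]
converted = [to_messages(record) for record in raw_records]
print(converted[0]["messages"])
```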

In [17]:
# Step 2 : Load a sample dataset
from datasets import load_dataset

# TODO: define your dataset and config using the path and name parameters
dataset = load_dataset(path="HuggingFaceTB/smoltalk", name="everyday-conversations")
dataset

# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. 

DatasetDict({
    train: Dataset({
        features: ['full_topic', 'messages'],
        num_rows: 2260
    })
    test: Dataset({
        features: ['full_topic', 'messages'],
        num_rows: 119
    })
})

In [19]:
dataset['train']['messages'][0]

[{'content': 'Hi there', 'role': 'user'},
 {'content': 'Hello! How can I help you today?', 'role': 'assistant'},
 {'content': "I'm looking for a beach resort for my next vacation. Can you recommend some popular ones?",
  'role': 'user'},
 {'content': "Some popular beach resorts include Maui in Hawaii, the Maldives, and the Bahamas. They're known for their beautiful beaches and crystal-clear waters.",
  'role': 'assistant'},
 {'content': 'That sounds great. Are there any resorts in the Caribbean that are good for families?',
  'role': 'user'},
 {'content': 'Yes, the Turks and Caicos Islands and Barbados are excellent choices for family-friendly resorts in the Caribbean. They offer a range of activities and amenities suitable for all ages.',
  'role': 'assistant'},
 {'content': "Okay, I'll look into those. Thanks for the recommendations!",
  'role': 'user'},
 {'content': "You're welcome. I hope you find the perfect resort for your vacation.",
  'role': 'assistant'}]

### 3.3 Configuring the SFTTrainer
The SFTTrainer is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources.
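
As a rough guide for choosing `max_steps`, the number of passes over the training set follows from the step count, the batch size, and the dataset size. The sketch below uses this notebook's values (2,260 training rows, batch size 4, 1,000 steps) and assumes a single device, no gradient accumulation, and one row per training example (no packing). The function name is illustrative.

```python
def epochs_covered(max_steps, per_device_batch_size, num_rows,
                   num_devices=1, grad_accum_steps=1):
    """Estimate how many times training passes over the dataset."""
    examples_seen = max_steps * per_device_batch_size * num_devices * grad_accum_steps
    return examples_seen / num_rows


epochs = epochs_covered(max_steps=1000, per_device_batch_size=4, num_rows=2260)
print(f"{epochs:.2f} epochs")  # 4000 examples / 2260 rows
```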


In [22]:
# Step 3.1 : Configure the SFTTrainer
sft_config = SFTConfig(
    output_dir="./sft_output",
    max_steps=1000,  # Adjust based on dataset size and desired training duration
    per_device_train_batch_size=4,  # Set according to your GPU memory capacity
    learning_rate=5e-5,  # Common starting point for fine-tuning
    logging_steps=10,  # Frequency of logging training metrics
    save_steps=200,  # Frequency of saving model checkpoints
    evaluation_strategy="steps",  # Evaluate the model at regular intervals
    eval_steps=50,  # Frequency of evaluation
    use_mps_device=(device.type == "mps"),  # Use Apple's MPS backend when the selected device is "mps"
    hub_model_id=finetune_name,  # Set a unique name for your model
)

# Step 3.2 : Initialize the SFTTrainer
trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
    eval_dataset=dataset["test"],
)



### 3.4 Training the Model
With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss.

In [23]:
# Step 4 : Train the model
trainer.train()

# Save the model
trainer.save_model(f"./{finetune_name}")

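| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50   | 1.0657        | 1.158981        |
| 100  | 1.1116        | 1.124065        |
| 150  | 1.0624        | 1.095486        |
| 200  | 1.0482        | 1.0797          |
| 250  | 1.0412        | 1.070457        |
| 300  | 1.0292        | 1.061473        |
| 350  | 1.0034        | 1.054751        |
| 400  | 1.0065        | 1.050794        |
| 450  | 1.0211        | 1.042636        |
| 500  | 1.0763        | 1.033726        |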
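
A validation loss is often easier to interpret as perplexity. Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language models), perplexity is its exponential. The sketch uses the first and last validation losses from the log above.

```python
import math

# Validation losses copied from the training log (step -> loss)
validation_loss = {50: 1.158981, 500: 1.033726}

perplexity = {step: math.exp(loss) for step, loss in validation_loss.items()}
for step, ppl in perplexity.items():
    print(f"step {step}: loss={validation_loss[step]:.4f}, perplexity={ppl:.2f}")
```

A falling perplexity means the model is, on average, less surprised by the held-out conversations.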


### 3.5 Testing the Fine-Tuned Model

In [59]:
# Load the model and tokenizer
# model_name = "HuggingFaceTB/SmolLM2-135M"
model_name = "./sft_output/checkpoint-1000"

model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_name)
model = model.to(device)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)

In [67]:
# Test the fine-tuned model on a simple prompt
prompt = "How are you?"

# Format with template
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)
# Generate response
inputs = tokenizer(formatted_prompt, return_tensors="pt", max_length=256, truncation=True).to(device)
# TODO: use the fine-tuned model to generate a response, just like with the base example.

In [68]:
outputs = model.generate(inputs['input_ids'])
print(tokenizer.decode(outputs[0]))

<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
Hello! How can I help you today? I'm a teacher and I'm
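
The decoded output contains the prompt as well as the reply. Below is a small sketch of pulling out only the assistant's text by splitting on the ChatML markers. `extract_assistant_reply` is a hypothetical helper, and the sample string reproduces the output above.

```python
def extract_assistant_reply(decoded):
    """Return the text of the last assistant turn in a ChatML-formatted string."""
    marker = "<|im_start|>assistant\n"
    _, found, reply = decoded.rpartition(marker)
    if not found:
        return ""  # no assistant turn present
    # Drop the end-of-turn token and anything after it, if the model emitted one
    return reply.split("<|im_end|>")[0].strip()


decoded = (
    "<|im_start|>user\nHow are you?<|im_end|>\n"
    "<|im_start|>assistant\nHello! How can I help you today? I'm a teacher and I'm"
)
print(extract_assistant_reply(decoded))
```

The reply stops mid-sentence, most likely because `generate` was called without a length argument such as `max_new_tokens`, so a short default limit applied.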


In [33]:
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")

Memory footprint: 538.06 MB
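
The reported footprint is consistent with 32-bit weights: dividing by 4 bytes per parameter gives roughly 134.5 million parameters, in line with the "135M" in the model name. The sketch below does that arithmetic and estimates the footprint at 2 bytes per parameter (for example float16). It assumes the model was loaded in float32 and ignores activations and optimizer state.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

reported_footprint_mb = 538.06  # value printed above
approx_params = reported_footprint_mb * 1e6 / BYTES_PER_PARAM["float32"]
print(f"approx. parameters: {approx_params / 1e6:.1f}M")

# Estimated weight memory for each dtype
for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype}: {approx_params * nbytes / 1e6:.2f} MB")
```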
