# Unit 1: 3 - Chat Templates SmolLM2

**Collaborators**:
* Roberto Rodriguez ([@Cyb3rWard0g](https://x.com/Cyb3rWard0g))

## SmolLM2 Chat Templates

[SmolLM2](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B-Instruct) uses a structured chat template to format user-assistant interactions. Unlike single-turn prompts, chat-based models need the whole conversation history formatted consistently on every turn. Each model family defines its own special tokens, so the template stored with the tokenizer ensures the input matches the format the model was trained on.

To explore chat templates, we first need to load the SmolLM2 model and tokenizer.

### Install Required Libraries

In [None]:
# !pip install transformers torch

### Loading SmolLM2 Efficiently

To avoid downloading the model every time (**~3.42 GB**), we first check if it exists locally before loading:

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os

MODEL_NAME = "HuggingFaceTB/SmolLM2-1.7B-Instruct"
MODEL_DIR = "data/smollm2"

def load_model():
    """Load SmolLM2 from MODEL_DIR if it exists; otherwise download it and save a local copy."""
    if os.path.exists(MODEL_DIR):
        print("Loading model from local directory.")
        model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    else:
        print("Downloading model...")
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
        model.save_pretrained(MODEL_DIR)  # Reused on the next run
    return model

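# Pick the best available device: CUDA GPU, then Apple Silicon (MPS), then CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")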
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = load_model().to(device)

Loading model from local directory.


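The `os.path.exists` check above treats any existing directory as a complete model, so an interrupted save could be mistaken for a usable copy. A slightly stricter check might look for the files that `save_pretrained` typically writes. This is only a sketch: the helper name `looks_like_saved_model` and the exact list of weight file names are assumptions for illustration and may need adjusting for other formats or library versions.

```python
import os

# File names that save_pretrained() typically produces. This list is an
# assumption for illustration, not an exhaustive or guaranteed set.
WEIGHT_FILES = (
    "model.safetensors",
    "model.safetensors.index.json",  # written when weights are sharded
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
)

def looks_like_saved_model(model_dir: str) -> bool:
    """Return True if model_dir holds a config file and at least one weights file."""
    if not os.path.isdir(model_dir):
        return False
    has_config = os.path.isfile(os.path.join(model_dir, "config.json"))
    has_weights = any(
        os.path.isfile(os.path.join(model_dir, name)) for name in WEIGHT_FILES
    )
    return has_config and has_weights
```

In `load_model()`, this helper could stand in for the `os.path.exists(MODEL_DIR)` condition.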

## Understanding Chat Templates
Chat templates are responsible for formatting conversational exchanges before they are passed to the model. This ensures the model can differentiate between system instructions, user messages, and assistant responses.
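Before a template can format anything, the conversation has to arrive as a list of role-tagged messages. The sketch below checks that structure up front. The helper name `validate_messages` is hypothetical, and the allowed role set is an assumption limited to the three roles named above; other templates may accept additional roles.

```python
# Roles named in this section. Treating these as the only valid roles is an
# assumption for illustration.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_messages(messages):
    """Raise ValueError if a message has an unknown role or non-string content."""
    for index, message in enumerate(messages):
        role = message.get("role")
        content = message.get("content")
        if role not in ALLOWED_ROLES:
            raise ValueError(f"message {index}: unknown role {role!r}")
        if not isinstance(content, str):
            raise ValueError(f"message {index}: content must be a string")
    return messages

conversation = validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
])
```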

### Inspecting SmolLM2 Chat Template

In [2]:
# View the chat template format used by SmolLM2
template = tokenizer.chat_template
print("Chat Template Format:")
print(template)

Chat Template Format:
{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
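The Jinja template printed above can be hard to read on one line. As a reading aid, here is a plain-Python transliteration of the same logic; the function name `render_chatml` is hypothetical, and in practice `tokenizer.apply_chat_template` should remain the source of truth.

```python
# Default system message, copied from the template output above.
DEFAULT_SYSTEM = "You are a helpful AI assistant named SmolLM, trained by Hugging Face"

def render_chatml(messages, add_generation_prompt=False):
    """Plain-Python sketch of the SmolLM2 chat template shown above."""
    parts = []
    # Mirrors the `loop.first` branch: inject the default system message
    # when the conversation does not start with a system message.
    if messages and messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    # Every message becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Mirrors the final `add_generation_prompt` branch: open an assistant turn.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml([{"role": "user", "content": "Hi"}], add_generation_prompt=True))
```

Note that a conversation without a system message still gets one: the template falls back to the built-in default.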


### Using the Chat Template in a Conversation

In [3]:
messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it?"},
]

In [4]:
# Convert messages into a properly formatted prompt
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print("Formatted Prompt:")
print(formatted_prompt)

Formatted Prompt:
<|im_start|>system
You are a helpful assistant focused on technical topics.<|im_end|>
<|im_start|>user
Can you explain what a chat template is?<|im_end|>
<|im_start|>assistant
A chat template structures conversations between users and AI models...<|im_end|>
<|im_start|>user
How do I use it?<|im_end|>
<|im_start|>assistant
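In the formatted prompt above, every completed turn is wrapped in `<|im_start|>` and `<|im_end|>`, while the final `<|im_start|>assistant` line is left open. That open turn is the generation prompt added by `add_generation_prompt=True`. A small sketch that splits such a prompt back into turns makes the structure explicit; the helper name `split_turns` and the regular expression are illustrative assumptions, not part of the library.

```python
import re

# Matches one completed turn: <|im_start|>{role}\n{content}<|im_end|>
TURN_PATTERN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)

def split_turns(prompt: str):
    """Return (role, content) pairs for every completed turn in the prompt."""
    return TURN_PATTERN.findall(prompt)

prompt = (
    "<|im_start|>system\nBe brief.<|im_end|>\n"
    "<|im_start|>user\nHow do I use it?<|im_end|>\n"
    "<|im_start|>assistant\n"
)

turns = split_turns(prompt)
# The open assistant turn has no <|im_end|>, so it is not among the matches.
ends_with_open_assistant_turn = prompt.endswith("<|im_start|>assistant\n")
print(turns, ends_with_open_assistant_turn)
```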



### Sending the Formatted Prompt to the Model


In [5]:
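# Tokenize the formatted prompt; the tokenizer returns input IDs and an attention mask.
# The template already inserted the special tokens, so the tokenizer should not add more.
encoded_input = tokenizer(formatted_prompt, return_tensors="pt", add_special_tokens=False).to(device)
input_ids = encoded_input["input_ids"]
attention_mask = encoded_input["attention_mask"]
count_prompt_tokens = input_ids.shape[1]  # Prompt length, used later to slice off the prompt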

In [6]:
# Generate response
outputs = model.generate(
    input_ids,
    attention_mask=attention_mask,  # Marks which positions are real tokens rather than padding
    max_new_tokens=50,  # Hard cap; longer answers are cut off
    eos_token_id=tokenizer.eos_token_id,  # Stop early once the EOS token is generated
)

# Extract only assistant-generated tokens
generated_tokens = outputs[0, count_prompt_tokens:]

# Decode assistant response
output = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print("Assistant Response:", output)

Assistant Response: To use a chat template, you would typically input a message into the chat interface, and the template would generate a response based on the predefined structure. The template would usually include placeholders for user input, such as "What's your name?" or
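The response above ends mid-sentence, which suggests generation hit the `max_new_tokens=50` cap before the model produced an EOS token. The sketch below shows one way to tell the two cases apart. It works on plain lists of token IDs so that it runs without a model; the helper name `split_generation` and the toy IDs are made up for illustration.

```python
def split_generation(output_ids, prompt_length, eos_token_id):
    """Return (new_tokens, stopped_at_eos) for one generated sequence.

    output_ids holds the prompt followed by the generated tokens, in the
    same layout as a row of the `outputs` tensor above.
    """
    new_tokens = list(output_ids[prompt_length:])
    stopped_at_eos = eos_token_id in new_tokens
    if stopped_at_eos:
        # Drop the EOS token and anything after it (for example, padding).
        new_tokens = new_tokens[: new_tokens.index(eos_token_id)]
    return new_tokens, stopped_at_eos

# Toy IDs: three prompt tokens, three generated tokens, then EOS (id 2).
toy_output = [101, 102, 103, 7, 8, 9, 2]
new_tokens, stopped_at_eos = split_generation(toy_output, prompt_length=3, eos_token_id=2)
print(new_tokens, stopped_at_eos)
```

With the real tensors, something like `split_generation(outputs[0].tolist(), count_prompt_tokens, tokenizer.eos_token_id)` should apply. If `stopped_at_eos` is `False`, raising `max_new_tokens` is the usual remedy.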


## Base Models vs. Instruct Models
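- **Base Models**: Trained on raw text to predict the next token; they have no built-in notion of conversational roles.
- **Instruct Models**: Base models further fine-tuned on conversations formatted with a chat template, so they expect prompts in that same format.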
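Because a base model was not fine-tuned on chat markers, a prompt for it is usually laid out by hand. One common approach is a plain-text transcript that the model can continue. The sketch below is only one such convention: the helper name `format_plain_transcript` and the `Role: text` layout are assumptions for illustration, not a format any particular model requires.

```python
def format_plain_transcript(messages):
    """Lay out a conversation as 'Role: text' lines, ending with an open Assistant line."""
    lines = [f"{m['role'].capitalize()}: {m['content']}" for m in messages]
    lines.append("Assistant:")  # Cue the model to continue as the assistant
    return "\n".join(lines)

transcript = format_plain_transcript([
    {"role": "user", "content": "Can you explain what a chat template is?"},
])
print(transcript)
```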

## Conclusion
- Chat templates are essential for structuring conversations.
- They give SmolLM2 multi-turn conversations in the format it was trained on.
- Using `apply_chat_template()` simplifies formatting user-assistant exchanges.
- Base models need prompts structured by hand, whereas instruct models are trained on template-formatted conversations and work best when given that same format.