# Exploring Chat Templates with SmolLM2

This notebook demonstrates how to use chat templates with the `SmolLM2` model. A chat template formats a list of messages into the single string (and token sequence) the model expects, marking where each turn starts and ends and which role is speaking.

In [3]:
# Install the requirements in Google Colab
!pip install transformers datasets trl huggingface_hub

# Authenticate to Hugging Face
from huggingface_hub import login

login()

# For convenience, you can store your Hub token in an environment variable named HF_TOKEN


In [4]:
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format
import torch

## SmolLM2 Chat Template

Let's explore how to use a chat template with the `SmolLM2` model. We'll define a simple conversation and apply the chat template.

In [5]:
# Dynamically set the device
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

model_name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_name
).to(device)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=model_name)
model, tokenizer = setup_chat_format(model=model, tokenizer=tokenizer)


In [None]:
# Define messages for SmolLM2
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I assist you today?",
    },
]

# Apply chat template without tokenization

The tokenizer represents the conversation as a string with special tokens to describe the role of the user and the assistant.


In [None]:
input_text = tokenizer.apply_chat_template(messages, tokenize=False)

print("Conversation with template: ", input_text)

Conversation with template:  <|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>
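
To make the layout concrete, here is a minimal plain-Python sketch that reproduces the string shown above. `render_chatml` is a hypothetical helper written for illustration; it is not part of `transformers` or `trl`, and it only assumes the `<|im_start|>role ... <|im_end|>` layout visible in the output.

```python
def render_chatml(messages):
    """Render a list of {"role", "content"} dicts in the ChatML layout shown above.

    Hypothetical helper for illustration only; the tokenizer's own template
    remains the source of truth.
    """
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )


messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {
        "role": "assistant",
        "content": "I'm doing well, thank you! How can I assist you today?",
    },
]

rendered = render_chatml(messages)
print(rendered)
```

Each message becomes one block: a start marker, the role on its own line, the content, and an end marker followed by a newline.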



# Decode the conversation

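With `tokenize=True`, the template returns token ids, which we decode back into text here. The conversation is rendered as above. With `add_generation_prompt=True`, the template also appends the start of a new assistant turn (`<|im_start|>assistant`), which cues the model to write the next reply.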


In [None]:
token_ids = tokenizer.apply_chat_template(messages, tokenize=True)

print("Conversation decoded:", tokenizer.decode(token_ids=token_ids), sep="\n")

token_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True
)

print(
    "Conversation decoded with assistant message:",
    tokenizer.decode(token_ids=token_ids),
    sep="\n",
)

Conversation decoded:
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>

Conversation decoded with assistant message:
<|im_start|>user
Hello, how are you?<|im_end|>
<|im_start|>assistant
I'm doing well, thank you! How can I assist you today?<|im_end|>
<|im_start|>assistant
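
A generation prompt is most useful when the conversation ends with a user turn, because the open assistant header tells the model whose turn comes next. The sketch below illustrates this under the assumption that the template follows the ChatML layout shown in the output above; `build_prompt` is a hypothetical helper, not a library function.

```python
# Assumption: this header matches the trailing line in the decoded output above.
GENERATION_PROMPT = "<|im_start|>assistant\n"


def build_prompt(messages, add_generation_prompt=True):
    """Render messages in ChatML and optionally open a new assistant turn.

    Hypothetical helper for illustration; it mimics, but does not call,
    the tokenizer's chat template.
    """
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        text += GENERATION_PROMPT
    return text


user_only = [{"role": "user", "content": "Hello, how are you?"}]

prompt = build_prompt(user_only)
closed = build_prompt(user_only, add_generation_prompt=False)
print(prompt)
```

The only difference between `prompt` and `closed` is the trailing, unclosed assistant header.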



# Tokenize the conversation

The tokenizer also converts the conversation, including the special tokens, into ids from the model's vocabulary.



In [None]:
input_text = tokenizer.apply_chat_template(messages)

print("Conversation tokenized:", input_text)

input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

print("Conversation tokenized with generation prompt:", input_text)

Conversation tokenized: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198]
Conversation tokenized with generation prompt: [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198, 1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17, 1073, 416, 339, 4237, 346, 1834, 47, 2, 198, 1, 520, 9531, 198]
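
The id lists printed above can be read without the tokenizer. The sketch below groups ids into turns, assuming that id `1` is `<|im_start|>` and id `2` is `<|im_end|>`. That mapping is inferred from the printed output, not looked up in the vocabulary, and `split_turns` is a hypothetical helper.

```python
# Token ids copied from the first tokenized output above.
ids = [1, 4093, 198, 19556, 28, 638, 359, 346, 47, 2, 198,
       1, 520, 9531, 198, 57, 5248, 2567, 876, 28, 9984, 346, 17,
       1073, 416, 339, 4237, 346, 1834, 47, 2, 198]

# The second output is the same list plus these four trailing ids.
ids_with_prompt = ids + [1, 520, 9531, 198]

# Assumption: inferred from the output above, not verified against the vocabulary.
IM_START_ID = 1
IM_END_ID = 2


def split_turns(token_ids, start_id=IM_START_ID, end_id=IM_END_ID):
    """Return (closed_turns, open_turn) where each turn is a list of ids.

    open_turn is None unless the sequence ends inside an unclosed turn.
    """
    closed, current = [], None
    for tid in token_ids:
        if tid == start_id:
            current = []
        elif tid == end_id:
            if current is not None:
                closed.append(current)
            current = None
        elif current is not None:
            current.append(tid)
    return closed, current


closed_turns, open_turn = split_turns(ids)
closed_turns_p, open_turn_p = split_turns(ids_with_prompt)
print(len(closed_turns), open_turn)
print(len(closed_turns_p), open_turn_p)
```

Under these assumptions, both sequences contain two closed turns, and only the second ends with an open turn. Its ids match the first three ids of the closed assistant turn, which is consistent with them encoding the assistant header.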


<div style='background-color: lightblue; padding: 10px; border-radius: 5px; margin-bottom: 20px; color:black'>
    <h2 style='margin: 0;color:blue'>Exercise: Process a dataset for SFT</h2>
    <p>Take a dataset from the Hugging Face Hub and process it for SFT.</p>
    <p><b>Difficulty Levels</b></p>
    <p>🐢 Convert the <code>HuggingFaceTB/smoltalk</code> dataset into ChatML format.</p>
    <p>🐕 Convert the <code>openai/gsm8k</code> dataset into ChatML format.</p>
</div>

In [None]:
from IPython.display import display, HTML

display(
    HTML(
        """<iframe
  src="https://huggingface.co/datasets/HuggingFaceTB/smoltalk/embed/viewer/all/train?row=0"
  frameborder="0"
  width="100%"
  height="360px"
></iframe>
"""
    )
)

In [6]:
from datasets import load_dataset

ds = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations")


def process_dataset(sample):
    # 🐢 Convert the sample into a chat format.
    # Each sample has the keys "full_topic" and "messages". The messages are
    # already dicts such as {'content': 'Hey!', 'role': 'user'}, so they can be
    # passed to the tokenizer's chat template directly.
    # The default tokenize=True returns token ids; pass tokenize=False to get
    # the ChatML string instead.
    result = tokenizer.apply_chat_template(sample["messages"])
    return {"result": result}


ds = ds.map(process_dataset)

In [None]:
display(
    HTML(
        """<iframe
  src="https://huggingface.co/datasets/openai/gsm8k/embed/viewer/main/train"
  frameborder="0"
  width="100%"
  height="360px"
></iframe>
"""
    )
)

In [11]:
ds = load_dataset("openai/gsm8k", "main")


def process_dataset(sample):
    # 🐕 Convert the sample into a chat format

    # 1. Create a message list with a role and content for each turn.
    #    The question is the user turn and the answer is the assistant turn.
    message_list = [
        {"role": "user", "content": sample["question"]},
        {"role": "assistant", "content": sample["answer"]},
    ]

    # 2. Apply the chat template using the tokenizer's method.
    #    The default tokenize=True returns token ids; pass tokenize=False to
    #    get the ChatML string instead.
    result = tokenizer.apply_chat_template(message_list)
    return {"result": result}


ds = ds.map(process_dataset)
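
The ChatML template writes whatever role string it receives, so a nonstandard role such as `question` ends up in the training text without any error. A small check before templating can catch this. The sketch below is illustrative only: `find_invalid_roles` and `gsm8k_to_messages` are hypothetical helpers, the allowed role set is an assumption, and the sample is made up in the GSM8K style rather than taken from the dataset.

```python
# Assumption: the roles conventionally used with ChatML-style templates.
ALLOWED_ROLES = {"system", "user", "assistant"}


def find_invalid_roles(messages, allowed=ALLOWED_ROLES):
    """Return every role string in `messages` that is not in `allowed`."""
    return [m["role"] for m in messages if m["role"] not in allowed]


def gsm8k_to_messages(sample):
    """Map a question/answer sample to a user turn and an assistant turn."""
    return [
        {"role": "user", "content": sample["question"]},
        {"role": "assistant", "content": sample["answer"]},
    ]


# Made-up sample in the GSM8K style, for illustration only.
sample = {"question": "What is 2 + 3?", "answer": "2 + 3 = <<2+3=5>>5\n#### 5"}

good = gsm8k_to_messages(sample)
bad = [
    {"role": "question", "content": sample["question"]},
    {"role": "answer", "content": sample["answer"]},
]

print(find_invalid_roles(good))
print(find_invalid_roles(bad))
```

An empty list means every role is recognized; anything else names the roles to fix before calling `apply_chat_template`.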

## Conclusion

This notebook demonstrated how to apply a chat template with the `SmolLM2` model. Structuring conversations with a chat template gives the model its input in a consistent format, with each turn clearly marked by role.

In the exercise, you converted a dataset into ChatML format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood.