# Chat templates

A chat template specifies how a conversation (a list of messages) is converted into a single tokenizable string in the format the model expects. Different models expect very different input formats for chat. The chat template is part of the tokenizer and is used to encode the conversation into the model-specific format.

https://huggingface.co/docs/transformers/main/en/chat_templating#what-are-generation-prompts
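
To make the idea concrete, the sketch below renders one conversation in two hand-written layouts. Both functions are simplified approximations written for this illustration (loosely modeled on the Zephyr and Mistral conventions). They are not the templates shipped with those tokenizers, and the helper names and the `</s>` end marker are assumptions.

```python
# Illustrative only: two hand-written formatters that approximate how
# different model families lay out the same conversation.
messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great."},
]

def render_tagged(messages, eos="</s>"):
    """Zephyr-like layout: every turn starts with a role tag on its own line."""
    return "".join(f"<|{m['role']}|>\n{m['content']}{eos}\n" for m in messages)

def render_inst(messages, eos="</s>"):
    """Mistral-like layout: only user turns are wrapped in [INST] ... [/INST]."""
    parts = []
    for m in messages:
        if m["role"] == "user":
            parts.append(f"[INST] {m['content']} [/INST]")
        elif m["role"] == "assistant":
            parts.append(f" {m['content']}{eos}")
        else:
            raise ValueError(f"unsupported role: {m['role']}")
    return "".join(parts)

tagged = render_tagged(messages)
inst = render_inst(messages)
print(tagged)
print(inst)
```

The same two messages produce two different strings, which is why the conversation should always be formatted with the template that belongs to the model being used.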

**Pre-requisites**
1. You must have an understanding of **Tokenizers**: covered in the lessons under the section **Hugging Face Models : Advanced**
2. You must understand the use of **Datasets**: covered in the lessons under the section **Datasets for Training, and Testing**

**Google Colab**
You MUST set the Hugging Face token; otherwise you will get an authorization error.

In [1]:
from transformers import AutoTokenizer
from IPython.display import JSON
from dotenv import load_dotenv
import os

# Load the file that contains the HuggingFace token
# CHANGE THE location
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

# Needed as we are using Gated models in this notebook
# Make sure to get access to the gated models on HF
os.environ['HF_TOKEN']=os.environ['HUGGINGFACEHUB_API_TOKEN']

# Models whose chat templates we will try out
model_ids = [
                "meta-llama/Llama-3.3-70B-Instruct",
                "HuggingFaceH4/zephyr-7b-beta",
                "mistralai/Mistral-7B-Instruct-v0.2",
                "google/gemma-2-2b-it",
                "google-bert/bert-base-uncased"
]

# Sample chat messages
sample_chat = [
  {"role": "system", "content": "You are a polite and helpful customer service agent"},
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
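
The token lookup in the cell above raises a bare `KeyError` when `HUGGINGFACEHUB_API_TOKEN` is not set. A minimal sketch of a more forgiving lookup follows. The helper name `resolve_hf_token` is hypothetical, and the order in which the two variable names are checked is an assumption.

```python
import os

def resolve_hf_token(env=None):
    """Return the first non-empty token found, or raise a readable error."""
    env = os.environ if env is None else env
    # Check the variable the notebook sets first, then the one it reads from
    for name in ("HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    raise RuntimeError(
        "No Hugging Face token found. Set HF_TOKEN (or HUGGINGFACEHUB_API_TOKEN) "
        "before loading gated models."
    )

# Example with an explicit mapping instead of the real environment;
# "hf_example" is a placeholder, not a real token.
print(resolve_hf_token({"HUGGINGFACEHUB_API_TOKEN": "hf_example"}))
```

In the notebook this could be used as `os.environ['HF_TOKEN'] = resolve_hf_token()` after `load_dotenv(...)` has run.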

## 1. Check out the special tokens

https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

**Note:**

Some of the models are **gated** and require you to request access on Hugging Face.

In [2]:
# change this to try out the model of your choice
test_model_index = 4  # bert-base-uncased, matching the output shown below

tokenizer = AutoTokenizer.from_pretrained(model_ids[test_model_index])

print("Model : ", model_ids[test_model_index])
print("Special tokens : \n", tokenizer.special_tokens_map)
# print("Additional special tokens : ", tokenizer.additional_special_tokens)
print("Padding side : ", tokenizer.padding_side)
print("Truncation side : ", tokenizer.truncation_side)
# tokenizer

Model :  google-bert/bert-base-uncased
Special tokens : 
 {'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'}
Padding side :  right
Truncation side :  right
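
The padding side and truncation side printed above decide which end of the token sequence is filled or cut. The toy functions below illustrate the idea on plain lists. They are a simplified sketch with hypothetical names, and they ignore details a real tokenizer handles, such as preserving special tokens during truncation.

```python
def pad(ids, length, pad_id=0, side="right"):
    """Pad a list of token ids to `length` on the chosen side."""
    fill = [pad_id] * max(0, length - len(ids))
    return ids + fill if side == "right" else fill + ids

def truncate(ids, length, side="right"):
    """Drop tokens from the chosen side until at most `length` remain."""
    if len(ids) <= length:
        return ids
    # "right" removes tokens from the end, "left" removes them from the start
    return ids[:length] if side == "right" else ids[len(ids) - length:]

ids = [101, 7592, 2088, 102]  # arbitrary example ids
print(pad(ids, 6, side="right"))
print(pad(ids, 6, side="left"))
print(truncate(ids, 2, side="right"))
print(truncate(ids, 2, side="left"))
```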


## 2. Check out the chat template

https://huggingface.co/docs/transformers/v4.47.1/en/main_classes/tokenizer#transformers.PreTrainedTokenizerFast.apply_chat_template


**Note**

* Some models do not accept every role in the messages list
* Padding, truncation, and max_length have no effect when *tokenize=False*

**Mistral:** The tokenizer adds the control tokens [INST] and [/INST] to mark the start and end of user messages (but not assistant messages!), and the entire chat is condensed into a single string.

**Gemma:** Does not support messages with *role=system*

**BERT:** Does not provide a chat template, as it is NOT a chat model (but a base model)
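
For a template that rejects the system role, one possible workaround is to fold the system text into the first user message before applying the template. The sketch below assumes that this merge is acceptable for your use case. The helper name `fold_system_into_user` is hypothetical, and this is not something the model documentation prescribes.

```python
def fold_system_into_user(messages):
    """Return a copy of `messages` with system text prepended to the first user turn."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [dict(m) for m in messages if m["role"] != "system"]  # copies, input untouched
    if not system_parts:
        return others
    for m in others:
        if m["role"] == "user":
            m["content"] = "\n\n".join(system_parts + [m["content"]])
            return others
    raise ValueError("no user message to attach the system prompt to")

chat = [
    {"role": "system", "content": "You are a polite and helpful customer service agent"},
    {"role": "user", "content": "Hello, how are you?"},
]
folded = fold_system_into_user(chat)
print(folded)
```

The result contains only user and assistant turns, so it can be passed to `apply_chat_template` in place of the original list.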

In [3]:
# change this to try out the model of your choice
test_model_index = 0

# Whether to tokenize or not
tokenize_or_not = False

# With tokenize=True the result is a list of token ids by default
# Setting return_dict=True returns a dictionary instead
return_dict_or_not = False

# Whether to pad or not - no effect if tokenize=False
pad_or_not = True

truncation_or_not = True

# max_length controls the length - if not specified model max_length is used
max_length = 1024

# Whether to add assistant marker at end or not
# Generation Prompt
add_generation_prompt_or_not = False

# If true, doesn't add the end marker - prompts the model to continue with generation of last message
# Cannot be true if add_generation_prompt=True
continue_final_message_or_not = False

tokenizer = AutoTokenizer.from_pretrained(model_ids[test_model_index])

print("Model : ", model_ids[test_model_index])
print("======================================")

# Some templates reject certain inputs - for example, Gemma does not accept the system role
try:
    formatted_chat = tokenizer.apply_chat_template(sample_chat, 
                                                   tokenize = tokenize_or_not, 
                                                   max_length = max_length,
                                                   padding = pad_or_not,
                                                   truncation = truncation_or_not,
                                                   add_generation_prompt=add_generation_prompt_or_not,
                                                   continue_final_message=continue_final_message_or_not,
                                                   return_dict = return_dict_or_not)
    print(formatted_chat)
except Exception as e:
    print(e)



Model :  meta-llama/Llama-3.3-70B-Instruct
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a polite and helpful customer service agent<|eot_id|><|start_header_id|>user<|end_header_id|>

Hello, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm doing great. How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>

I'd like to show off how chat templating works!<|eot_id|>
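
The output above ends with `<|eot_id|>` because `add_generation_prompt` was False. To show what the two flags change, here is a hand-written approximation of the layout seen in that output. The function `render_llama3_style` is hypothetical: it omits the date lines and is not the real Llama template, but the marker strings are copied from the output above.

```python
def render_llama3_style(messages, add_generation_prompt=False, continue_final_message=False):
    """Approximate the Llama 3 layout and show the effect of the two flags."""
    if add_generation_prompt and continue_final_message:
        raise ValueError("the two options are mutually exclusive")
    out = "<|begin_of_text|>"
    for i, m in enumerate(messages):
        out += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}"
        is_last = i == len(messages) - 1
        # continue_final_message leaves the last turn open (no end marker)
        if not (is_last and continue_final_message):
            out += "<|eot_id|>"
    if add_generation_prompt:
        # An empty assistant header tells the model that it speaks next
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

chat = [{"role": "user", "content": "Hi"}]
prefill = chat + [{"role": "assistant", "content": "The answer is"}]

print(render_llama3_style(chat))
print(render_llama3_style(chat, add_generation_prompt=True))
print(render_llama3_style(prefill, continue_final_message=True))
```

Use the generation prompt when the model should start a fresh reply, and `continue_final_message` when it should continue a partially written assistant message.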


## 3. Prepare the training dataset for Chat

* Create a test dataset from a dictionary
* Add a new column to dataset for the *model specific formatted chat text*
* Fine-tuning is carried out using the chats in *formatted chat column* in the dataset

In [None]:
from datasets import Dataset

chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]

chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]

test_model_index = 1

tokenizer = AutoTokenizer.from_pretrained(model_ids[test_model_index])

# Create a dataset from dictionary
chat_dataset = Dataset.from_dict({"chat": [chat1, chat2]})

JSON(chat_dataset[0])

In [None]:
# Use map to create a formatted chat column in the dataset
chat_dataset_formatted = chat_dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})

# Print the row
JSON(chat_dataset_formatted[0])
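
To make explicit what the `map` call does, the pure-Python sketch below mimics it on a list of dictionaries: the function runs once per row, and the keys it returns are merged into that row. Both `format_chat` (a stand-in for `tokenizer.apply_chat_template`) and `add_column` are hypothetical helpers written for this illustration.

```python
def format_chat(chat):
    """Stand-in formatter; the notebook uses tokenizer.apply_chat_template here."""
    return "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in chat)

def add_column(rows, fn):
    """Mimic the row-wise merge: each row gains the keys returned by `fn`."""
    return [{**row, **fn(row)} for row in rows]

rows = [
    {"chat": [
        {"role": "user", "content": "Which is bigger, the moon or the sun?"},
        {"role": "assistant", "content": "The sun."},
    ]},
]

formatted_rows = add_column(rows, lambda x: {"formatted_chat": format_chat(x["chat"])})
print(formatted_rows[0]["formatted_chat"])
```

The original `chat` column is kept next to the new `formatted_chat` column, which matches the description of the dataset at the start of this section.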

## 4. Custom templates

Chat templates are written in **Jinja**, a template engine for Python that allows developers to create dynamic content.

https://jinja.palletsprojects.com/en/stable/templates/



In [None]:
custom_chat_template = """
{%- for message in messages %}
    {{- '<|bert_start|>' + message['role'] + '\n' + message['content'] + '<|bert_end|>' + '\n' }}
{%- endfor %}
"""

In [None]:
# Bert Base Uncased
test_model_index = 4

tokenizer = AutoTokenizer.from_pretrained(model_ids[test_model_index])

tokenizer.chat_template = custom_chat_template

In [None]:
formatted_chat = tokenizer.apply_chat_template(sample_chat, tokenize=False)
formatted_chat
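
The custom template wraps every message in `<|bert_start|>` and `<|bert_end|>` markers. Assuming the whitespace-control dashes (`{%-` and `{{-`) remove the layout whitespace around each tag, the result should be equivalent to the plain-Python function below. This equivalence is a sketch and has not been checked against the notebook output. `render_custom` and `demo_chat` are hypothetical names.

```python
def render_custom(messages):
    """Plain-Python counterpart of the custom template (layout whitespace assumed stripped)."""
    return "".join(
        f"<|bert_start|>{m['role']}\n{m['content']}<|bert_end|>\n" for m in messages
    )

demo_chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great."},
]
print(render_custom(demo_chat))
```

Comparing this string with the value of `formatted_chat` is a quick way to confirm that the Jinja whitespace control behaves as intended.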