# Day 6: Tokenization and Chat Templates

This notebook covers tokenizers and chat templates, the two steps that turn a conversation into model-ready input for LLM inference. You'll learn how to:

- Load pre-trained tokenizers from the Hugging Face Hub
- Apply chat templates to format messages for different model architectures
- Compare tokenization across different models (DeepSeek, Phi-4, Qwen2.5-Coder)
- Understand how tokenizers convert natural language into token sequences

The emphasis is on formatting prompts correctly for each model family: every model expects its own special tokens, and a chat template renders the same list of messages in the format that model expects.


<a href="https://colab.research.google.com/github/Istiaq-Fuad/learning-llm-engineering/blob/main/day6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


In [1]:
from google.colab import userdata
from huggingface_hub import login
from transformers import AutoTokenizer

In [3]:
# Log in to Hugging Face using the token stored in Colab secrets

hf_token = userdata.get('HF_TOKEN')
if hf_token and hf_token.startswith("hf_"):
  print("HF key looks good so far")
  # Only attempt the login when a plausible token is present
  login(hf_token, add_to_git_credential=True)
else:
  print("HF key is not set - please click the key in the left sidebar")

# Check Google Colab GPU

gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if 'failed' in gpu_info:
  print('Not connected to a GPU')
else:
  print(gpu_info)
  if 'Tesla T4' in gpu_info:
    print("Success - Connected to a T4")
  else:
    print("NOT CONNECTED TO A T4")

HF key looks good so far
Mon Dec 22 11:35:09 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   47C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                       

## Load tokenizers and define chat messages


In [9]:
DEEPSEEK = "deepseek-ai/DeepSeek-V3.1"
PHI4 = "microsoft/Phi-4-mini-instruct"

In [10]:
deepseek_tokenizer = AutoTokenizer.from_pretrained(DEEPSEEK)
phi4_tokenizer = AutoTokenizer.from_pretrained(PHI4)

In [None]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {
        "role": "user",
        "content": "Tell a light-hearted joke for a room of Data Scientists",
    },
]

prompt = deepseek_tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)

<｜begin▁of▁sentence｜>You are a helpful assistant<｜User｜>Tell a light-hearted joke for a room of Data Scientists<｜Assistant｜></think>


## Tokenize plain text with each model


In [12]:
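text = "I am curiously excited to show Hugging Face Tokenizers in action to my LLM engineers"

# encode() returns the token IDs, including any special tokens the tokenizer adds.
# batch_decode() on a flat list of IDs decodes each ID on its own,
# which gives one string per token.
print("DeepSeek:")
tokens = deepseek_tokenizer.encode(text)
print(tokens)
print(deepseek_tokenizer.batch_decode(tokens))

print("\nPhi 4:")
tokens = phi4_tokenizer.encode(text)
print(tokens)
print(phi4_tokenizer.batch_decode(tokens))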

DeepSeek:
[0, 43, 1030, 108771, 15046, 304, 1801, 24133, 5426, 11906, 47948, 24524, 295, 4271, 304, 1026, 33792, 47, 26170]
['<｜begin▁of▁sentence｜>', 'I', ' am', ' curiously', ' excited', ' to', ' show', ' Hug', 'ging', ' Face', ' Token', 'izers', ' in', ' action', ' to', ' my', ' LL', 'M', ' engineers']

Phi 4:
[40, 939, 4396, 23138, 15209, 316, 2356, 59116, 4512, 29049, 17951, 24223, 306, 3736, 316, 922, 451, 19641, 32437]
['I', ' am', ' cur', 'iously', ' excited', ' to', ' show', ' Hug', 'ging', ' Face', ' Token', 'izers', ' in', ' action', ' to', ' my', ' L', 'LM', ' engineers']
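
The two outputs above can be compared programmatically. The sketch below hard-codes the token pieces exactly as printed in the cell output, so it runs without downloading either tokenizer; the `summarize` helper is a hypothetical name introduced only for this illustration.

```python
# Token pieces copied from the cell output above; nothing is downloaded here.
text = "I am curiously excited to show Hugging Face Tokenizers in action to my LLM engineers"

deepseek_pieces = ['<｜begin▁of▁sentence｜>', 'I', ' am', ' curiously', ' excited', ' to',
                   ' show', ' Hug', 'ging', ' Face', ' Token', 'izers', ' in', ' action',
                   ' to', ' my', ' LL', 'M', ' engineers']
phi4_pieces = ['I', ' am', ' cur', 'iously', ' excited', ' to', ' show', ' Hug', 'ging',
               ' Face', ' Token', 'izers', ' in', ' action', ' to', ' my', ' L', 'LM',
               ' engineers']


def summarize(name, pieces, special_tokens=()):
    """Hypothetical helper: count tokens and check that the pieces rebuild the input."""
    content = [p for p in pieces if p not in special_tokens]
    rebuilt = "".join(content)
    print(f"{name}: {len(pieces)} tokens, {len(content)} without special tokens, "
          f"round-trip ok: {rebuilt == text}")
    return content


deepseek_content = summarize("DeepSeek", deepseek_pieces,
                             special_tokens={'<｜begin▁of▁sentence｜>'})
phi4_content = summarize("Phi 4", phi4_pieces)

# Pieces that appear in one tokenization but not the other
only_deepseek = [p for p in deepseek_content if p not in phi4_content]
only_phi4 = [p for p in phi4_content if p not in deepseek_content]
print("Only DeepSeek:", only_deepseek)
print("Only Phi 4:", only_phi4)
```

In this run, DeepSeek prepends a beginning-of-sentence token while Phi 4 adds none, and the two vocabularies disagree only on how to split "curiously" and "LLM".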


## Compare chat templates across models


In [None]:
print("DeepSeek chat template: ")
print(
    deepseek_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
)

print("\nPhi 4 chat template: ")
print(
    phi4_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
)

DeepSeek chat template: 
<｜begin▁of▁sentence｜>You are a helpful assistant<｜User｜>Tell a light-hearted joke for a room of Data Scientists<｜Assistant｜></think>

Phi 4 chat template: 
<|system|>You are a helpful assistant<|end|><|user|>Tell a light-hearted joke for a room of Data Scientists<|end|><|assistant|>
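
To make clear what a chat template does, the sketch below rebuilds the Phi 4 output above with plain string formatting. `render_phi_style` is a hypothetical helper written for this illustration; it is not the Jinja template shipped with the tokenizer, which may handle cases (such as tool calls) that this sketch ignores. In practice, keep using `apply_chat_template`.

```python
def render_phi_style(messages, add_generation_prompt=True):
    """Hypothetical helper that mimics the Phi 4 output shown above."""
    # Each message becomes <|role|>content<|end|>
    parts = [f"<|{m['role']}|>{m['content']}<|end|>" for m in messages]
    if add_generation_prompt:
        # An open assistant tag tells the model that its reply comes next
        parts.append("<|assistant|>")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {
        "role": "user",
        "content": "Tell a light-hearted joke for a room of Data Scientists",
    },
]

prompt = render_phi_style(messages)
print(prompt)
```

The same `messages` list produced a very different string for DeepSeek, which is why the template travels with each tokenizer instead of being written by hand.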


## Tokenize code with Qwen


In [14]:
QWEN_CODER = "Qwen/Qwen2.5-Coder-7B-Instruct"


In [None]:
qwen_tokenizer = AutoTokenizer.from_pretrained(QWEN_CODER)
code = """
def hello_world(person):
  print("Hello", person)
"""
tokens = qwen_tokenizer.encode(code)
for token in tokens:
    print(f"{token}={qwen_tokenizer.decode(token)}")

198=

750=def
23811= hello
31792=_world
29766=(person
982=):

220= 
1173= print
445=("
9707=Hello
497=",
1697= person
340=)
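
Whitespace tokens are hard to read in the output above because newlines and spaces print invisibly. The sketch below hard-codes the ID and piece pairs from that output and prints each piece with `repr` so the whitespace becomes visible. The exact whitespace inside each piece is an assumption inferred from the printed output and the original `code` string, not read back from the tokenizer.

```python
# The same string as the triple-quoted `code` in the cell above
code = '\ndef hello_world(person):\n  print("Hello", person)\n'

# (token ID, decoded piece) pairs copied from the output above.
# Whitespace inside each piece is inferred, not read from the tokenizer.
qwen_pieces = [
    (198, "\n"), (750, "def"), (23811, " hello"), (31792, "_world"),
    (29766, "(person"), (982, "):\n"), (220, " "), (1173, " print"),
    (445, '("'), (9707, "Hello"), (497, '",'), (1697, " person"), (340, ")\n"),
]

for token_id, piece in qwen_pieces:
    # repr() shows newlines as \n and keeps leading spaces visible
    print(f"{token_id}={piece!r}")

rebuilt = "".join(piece for _, piece in qwen_pieces)
print("Round-trip ok:", rebuilt == code)
```

Under that assumption, the two-space indent splits into a lone space token (220) followed by ` print` with its own leading space, and `):` absorbs the newline that follows it.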

