# How does ChatML affect the quality of LLMs?

ChatML is short for Chat Markup Language. More details can be [found here](https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/ai-services/openai/includes/chat-markup-language.md).

ChatML makes the source of each piece of text explicit to the model and, in particular, marks the boundary between human and AI text. For ChatGPT, a conversation is managed in a format like the following:

```python
[
    {"role": "system", "content": "Provide some context and/or instructions to the model."},
    {"role": "user", "content": "The user’s message goes here"},
]
```

The conversation messages are then converted into the following form as input to the model. To adhere to the instruction format, the assistant token is appended at the end:

```text

<|im_start|>system 
Provide some context and/or instructions to the model.
<|im_end|> 
<|im_start|>user 
The user’s message goes here
<|im_end|> 
<|im_start|>assistant 

```
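
The conversion above can be sketched in a few lines of Python. This is a minimal illustration, not an official implementation: the function name `to_chatml` and its `add_generation_prompt` parameter are assumptions introduced here, and the exact whitespace around the markers may differ between models.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render a list of {"role", "content"} messages as a ChatML prompt string.

    Hypothetical helper for illustration only.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped as: <|im_start|>{role}\n{content}\n<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}\n{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "Provide some context and/or instructions to the model."},
    {"role": "user", "content": "The user's message goes here"},
]
print(to_chatml(messages))
```

Leaving the assistant turn open at the end is what tells the model that it is its turn to speak, rather than to continue the user's text.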

Different models are fine-tuned with different ChatML formats. Make sure you generate the right prompt for the model you use; otherwise it may not reach its expected performance.
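
For models whose tokenizer bundles a chat template, recent versions of `transformers` can render the prompt for you through `apply_chat_template`. The sketch below assumes such a tokenizer; `render_prompt` is a hypothetical wrapper, and raising an error when no template is set is a design choice of this sketch, not library behavior. Models that do not bundle a template need their format taken from the model card or repository code instead.

```python
from typing import Dict, List


def render_prompt(tokenizer, messages: List[Dict[str, str]]) -> str:
    """Render messages with the chat template bundled with the tokenizer.

    Hypothetical helper: refuses to guess a format when no template is set.
    """
    if getattr(tokenizer, "chat_template", None) is None:
        # No bundled template: the format has to come from the model card or repo code.
        raise ValueError("tokenizer has no chat template; check the model card for the prompt format")
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the prompt as text rather than token ids
        add_generation_prompt=True,  # append the marker that opens the assistant turn
    )


# Usage, assuming a tokenizer loaded with AutoTokenizer.from_pretrained(...):
# prompt = render_prompt(tokenizer, [{"role": "user", "content": "Hello"}])
```

Printing the rendered prompt once before running a full evaluation is a cheap way to confirm that the role markers match what the model card describes.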



## Compare Hugging Face (w/ ChatML) and vLLM (w/o ChatML)

```python
# Hugging Face version with the right ChatML.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_id = "/data/hf/Baichuan2-7B-Chat/"

tokenizer = AutoTokenizer.from_pretrained(model_id,
    revision="v2.0",
    use_fast=False,
    trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id,
    revision="v2.0",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained(model_id, revision="v2.0")
messages = []
messages.append({"role": "user", "content": "Compare the advantages and disadvantages of Beijing and New York"})
response = model.chat(tokenizer, messages)
print(response)
```

Output (truncated):

```text
Beijing, as the capital of China, is a modern city with a long history. It has many advantages and disadvantages compared to New York.

Advantages:
1. Political importance: As the capital of China, Beijing is an important political, economic, and cultural center in the country.
2. History and culture: With a long history, Beijing is home to many historical sites and cultural relics, such as the Forbidden City, the Great Wall, and the Temple of Heaven.
3. Transportation: Beijing has a well-developed transportation system, including high-speed trains, subways, and buses.
4. Education: Beijing is a major educational center in China, with many top universities and research institutions.
5. International influence: As the political and cultural center of China, Beijing has a growing international influence.

Disadvantages:
1. Air pollution: Due to the large number of vehicles and industrial emissions, Beijing's air quality can be poor at times.
2. Traffic congestion: The dense traffic in Be
```

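```python
# vLLM version without ChatML
from vllm import LLM, SamplingParams

model_id = "/data/hf/Baichuan2-7B-Chat/"

sampling_params = SamplingParams(repetition_penalty=1.05, top_k=5, top_p=0.85, max_tokens=2048, temperature=0.3)
llm = LLM(model=model_id, trust_remote_code=True, tensor_parallel_size=2)

# The raw prompt is passed as-is, with no role markers around it.
# Note: sampling_params is not passed in this call, so vLLM's default sampling
# parameters (including its default max_tokens) apply to the output below.
response = llm.generate("Compare the advantages and disadvantages of Beijing and New York")
print(response[0].outputs[0].text)
```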

Output:

```text
 in terms of culture and lifestyle
Beijing and New York are two of the world
```

The HF version of the model produces a high-quality response, while the vLLM version produces a very poor one.

Model response of HF version:
```text

Beijing, as the capital of China, is a modern city with a long history. It has many advantages and disadvantages compared to New York.

Advantages:
1. Political importance: As the capital of China, Beijing is an important political, economic, and cultural center in the country.
2. History and culture: With a long history, Beijing is home to many historical sites and cultural relics, such as the Forbidden City, the Great Wall, and the Temple of Heaven.
3. Transportation: Beijing has a well-developed transportation system, including high-speed trains, subways, and buses.
4. Education: Beijing is a major educational center in China, with many top universities and research institutions.
5. International influence: As the political and cultural center of China, Beijing has a growing international influence.

Disadvantages:
1. Air pollution: Due to the large number of vehicles and industrial emissions, Beijing's air quality can be poor at times.
2. Traffic congestion: The dense traffic in Beijing can cause long delays and congestion during peak hours.
3. High cost of living: The cost of living in Beijing is generally higher than in other cities in China, due to the high price of real estate and other living expenses.
4. Language barrier: For non-Chinese speakers, learning Chinese and adapting to the local culture can be challenging.
5. Food and drink: While there are numerous Chinese cuisine options in Beijing, Western food may not be as readily available or affordable as in New York.

New York, located in the United States, is a global city known for its diversity, economy, and culture. It also has its own set of advantages and disadvantages compared to Beijing.

Advantages:
1. Diversity: New York is famous for its diverse population and multicultural environment, which offers a wide range of food, entertainment, and social opportunities.
2. Economy: New York is one of the world's leading financial centers, with a strong stock market, banking, and investment sector.
3. Culture: New York is a major center of art, fashion, and entertainment, with numerous museums, theaters, and music venues.
4. Transportation: New York has an efficient and extensive public transportation system, including subways, buses, and taxis.
5. International influence: As the largest city in the United States, New York has a significant global influence in various fields, such as politics, business, and media.

Disadvantages:
1. Crime rate: New York has a higher crime rate than many other cities, particularly in lower-income areas.
2. Cost of living: The cost of living in New York is generally higher than in other cities, due to the high price of real estate, rent, and other living expenses.
3. Weather: New York's weather can be variable, with cold winters and humid summers.
4. Language barrier: While English is the official language of the city, learning American English and adapting to the local culture can still be challenging for non-native speakers.
5. Urban sprawl: The vast size of New York can make it difficult to navigate for those not familiar with the city, especially outside of Manhattan.


```

Model response of vLLM version:

```text
 in terms of culture and lifestyle
Beijing and New York are two of the world
```

This is because the Baichuan chat model was fine-tuned with its own ChatML format during supervised fine-tuning. Following that format at inference time is essential to achieve the expected model performance. The ChatML format definition can usually be found on the model's Hugging Face page or in its code.

Let's try to incorporate [Baichuan ChatML](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/generation_utils.py) into vLLM and observe the outcome.

## Incorporate ChatML into vLLM

```python
from typing import List

# Special token ids and length limits for Baichuan2-7B-Chat.
chatml_cfg = {"user_token_id": 195, "assistant_token_id": 196, "max_new_tokens": 2048, "model_max_length": 4096}


def build_chat_input(config, tokenizer, messages: List[dict]):
    """Convert chat messages into prompt token ids in the Baichuan chat format."""

    def _parse_messages(messages, split_role="user"):
        # Split the conversation into an optional system prompt and a list of
        # rounds, where each round starts with a `split_role` message.
        system, rounds = "", []
        current_round = []
        for i, message in enumerate(messages):
            if message["role"] == "system":
                assert i == 0  # a system message is only allowed first
                system = message["content"]
                continue
            if message["role"] == split_role and current_round:
                rounds.append(current_round)
                current_round = []
            current_round.append(message)
        if current_round:
            rounds.append(current_round)
        return system, rounds

    max_new_tokens = config.get("max_new_tokens")
    max_input_tokens = config.get("model_max_length") - max_new_tokens
    system, rounds = _parse_messages(messages, split_role="user")
    system_tokens = tokenizer.encode(system)
    max_history_tokens = max_input_tokens - len(system_tokens)

    # Walk the rounds from newest to oldest and keep as many as fit.
    history_tokens = []
    for round_messages in reversed(rounds):
        round_tokens = []
        for message in round_messages:
            # Each message is prefixed with the special token for its role.
            if message["role"] == "user":
                round_tokens.append(config.get("user_token_id"))
            else:
                round_tokens.append(config.get("assistant_token_id"))
            round_tokens.extend(tokenizer.encode(message["content"]))
        if len(history_tokens) == 0 or len(history_tokens) + len(round_tokens) <= max_history_tokens:
            history_tokens = round_tokens + history_tokens  # concat left
            if len(history_tokens) < max_history_tokens:
                continue
        break

    input_tokens = system_tokens + history_tokens
    if messages[-1]["role"] != "assistant":
        # Open an assistant turn so the model generates the reply.
        input_tokens.append(config.get("assistant_token_id"))
    input_tokens = input_tokens[-max_input_tokens:]  # truncate left
    return input_tokens
```
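
The history selection inside `build_chat_input` is the least obvious part, so here it is isolated as a small sketch that works on round lengths instead of real tokens. The helper name `select_recent_rounds` and the example numbers are made up for illustration; the loop mirrors the rule used above: the newest round is always kept, and older rounds are added only while the total stays within the budget.

```python
from typing import List


def select_recent_rounds(round_lengths: List[int], budget: int) -> List[int]:
    """Return the indices of the rounds that are kept, oldest first.

    Hypothetical helper: round_lengths[i] is the token count of round i.
    """
    kept: List[int] = []
    used = 0
    # Walk from the newest round back to the oldest.
    for index in range(len(round_lengths) - 1, -1, -1):
        length = round_lengths[index]
        if kept and used + length > budget:
            break  # this older round no longer fits; drop it and everything before it
        kept.insert(0, index)  # prepend so the result stays in chronological order
        used += length
        if used >= budget:
            break  # budget exhausted; no point in looking further back
    return kept


print(select_recent_rounds([50, 30, 40], budget=80))  # [1, 2]
print(select_recent_rounds([100], budget=10))         # [0]
```

In the second call the single round exceeds the budget but is still kept, which is why `build_chat_input` ends with a left truncation of `input_tokens`.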

```python
# vLLM ChatML version
from vllm import LLM, SamplingParams

model_id = "/data/hf/Baichuan2-7B-Chat/"

sampling_params = SamplingParams(repetition_penalty=1.05, top_k=5, top_p=0.85, max_tokens=2048, temperature=0.3)
llm = LLM(model=model_id, trust_remote_code=True, tensor_parallel_size=2)

prompt = "Compare the advantages and disadvantages of Beijing and New York"
prompt_token_ids = build_chat_input(chatml_cfg, llm.get_tokenizer(), [{"role": "user", "content": prompt}])

response = llm.generate(prompt_token_ids=[prompt_token_ids], sampling_params=sampling_params)
print(response[0].outputs[0].text)
```


Output (truncated):

```text
As two famous cities in the world, Beijing and New York have their own advantages and disadvantages. The following is a comparison between them:

Advantages of Beijing:
1. History and culture: Beijing is the capital of China and has a long history. It is full of historical sites and cultural relics, such as the Forbidden City, the Great Wall, and the Temple of Heaven.
2. Politics and economy: Beijing is an important political and economic center in China. Many government departments and multinational companies have headquarters here.
3. Education: Beijing has many well-known universities and research institutions, providing high-quality education for students.
4. Transportation: Beijing has a complete transportation system, including high-speed railways, subways, buses, and taxis.
5. Natural environment: Although Beijing is a large city, it still has some green spaces, such as the Summer Palace and the Forest Park.

Disadvantages of Beijing:
1. Air pollution: Due to the large number of
```

The model's response quality has improved significantly!

Be careful when you use any LLM inference library, such as vLLM or fastGPT: make sure the prompt follows the chat format the model was fine-tuned with.
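
One cheap safeguard is to check the prompt token ids before sending them to the inference engine. The sketch below is an assumption-laden example for the Baichuan format used in this notebook: `looks_chat_formatted` is a hypothetical helper, the default ids 195 and 196 come from `chatml_cfg`, and the other token ids in the usage lines are made up.

```python
from typing import Sequence


def looks_chat_formatted(
    prompt_token_ids: Sequence[int],
    user_token_id: int = 195,
    assistant_token_id: int = 196,
) -> bool:
    """Return True if the prompt has a user marker and ends by opening an assistant turn.

    Hypothetical sanity check; it does not validate the full conversation structure.
    """
    if not prompt_token_ids:
        return False
    has_user_marker = user_token_id in prompt_token_ids
    ends_with_assistant = prompt_token_ids[-1] == assistant_token_id
    return has_user_marker and ends_with_assistant


print(looks_chat_formatted([195, 1001, 1002, 196]))  # True: wrapped in role markers
print(looks_chat_formatted([1001, 1002]))            # False: raw text without markers
```

A check like this would have flagged the first vLLM call in this notebook, where the raw prompt string was sent without any role markers.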