**What is a Large Language Model?**

- An LLM is a type of AI model that excels at understanding and generating human language. LLMs are trained on vast amounts of text data, allowing them to learn patterns, structure, and even nuance in language. These models typically consist of anywhere from many millions to billions of parameters.
- Most LLMs today are built on the Transformer architecture, a deep learning architecture based on the attention mechanism, which has attracted significant interest since Google released BERT in 2018.

There are three types of Transformers:

1. **Encoders**
   An encoder-based Transformer takes text (or other data) as input and outputs a dense representation (or embedding) of that text.

   - **Example:** BERT from Google
   - **Use Cases:** Text classification, semantic search, Named Entity Recognition
   - **Typical Size:** Millions of parameters

2. **Decoders**
   A decoder-based Transformer focuses on generating new tokens to complete a sequence, one token at a time.

   - **Example:** Llama from Meta
   - **Use Cases:** Text generation, chatbots, code generation
   - **Typical Size:** Billions (in the US sense, i.e., 10^9) of parameters

3. **Seq2Seq (Encoder–Decoder)**
   A sequence-to-sequence Transformer combines an encoder and a decoder. The encoder first processes the input sequence into a context representation, then the decoder generates an output sequence.

   - **Example:** T5, BART
   - **Use Cases:** Translation, Summarization, Paraphrasing
   - **Typical Size:** Millions of parameters
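
The "one token at a time" behaviour of a decoder can be illustrated with a toy loop. This is only a sketch: `TOY_NEXT_TOKEN` and `generate` are hypothetical names, and the lookup table stands in for a trained model. A real decoder conditions on the whole sequence so far, not just the last token.

```python
from typing import Dict, List

# Hypothetical stand-in for a trained decoder: maps the last token to the next one.
# A real model would score every vocabulary entry given the full sequence.
TOY_NEXT_TOKEN: Dict[str, str] = {
    "<bos>": "The",
    "The": "capital",
    "capital": "of",
    "of": "France",
    "France": "is",
    "is": "Paris",
    "Paris": "<eos>",
}


def generate(prompt_tokens: List[str], max_new_tokens: int = 10) -> List[str]:
    """Extend the prompt one token per step until <eos> or the token budget is hit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = TOY_NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break
        tokens.append(next_token)  # the new token becomes part of the next step's input
    return tokens


completed = generate(["<bos>", "The", "capital"])
print(" ".join(completed[1:]))
```

Each generated token is appended to the sequence and fed back in, which is why generation cost grows with output length.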

**System Messages**

- System messages (also called System Prompts) define how the model should behave. 
- They serve as persistent instructions, guiding every subsequent interaction.

```python
system_message = {
    "role": "system",
    "content": "You are a professional customer service agent. Always be polite, clear, and helpful."
}
```

![hi](/Users/manikantaamara/Desktop/Git/AI/AI_Concepts/polite-alfred.jpg)

**Conversations: User and Assistant Messages**

A conversation consists of alternating messages between a Human (user) and an LLM (assistant).

Chat templates help maintain context by preserving conversation history, storing previous exchanges between the user and the assistant. This leads to more coherent multi-turn conversations.

```python
conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "I'd be happy to help. Could you provide your order number?"},
    {"role": "user", "content": "It's ORDER-123"},
]
```

In this example, the user first wrote that they needed help with their order. The LLM asked for the order number, and the user then provided it in a new message. All the messages in the conversation are always concatenated and passed to the LLM as a single stand-alone sequence. The chat template converts the messages in this Python list into a prompt, which is just a string containing all of them.
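
The idea that the full history is re-sent on every turn can be sketched as follows. The helper names `add_turn` and `naive_prompt` are hypothetical, and the `role: content` layout is a deliberately simple stand-in for a real chat template.

```python
from typing import Dict, List

Message = Dict[str, str]


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one more message appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unknown role: {role}")
    return history + [{"role": role, "content": content}]


def naive_prompt(history: List[Message]) -> str:
    """Join every message into one string (a stand-in for a real chat template)."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in history)


history: List[Message] = []
history = add_turn(history, "system", "You are a professional customer service agent.")
history = add_turn(history, "user", "I need help with my order")
history = add_turn(history, "assistant", "I'd be happy to help. Could you provide your order number?")
history = add_turn(history, "user", "It's ORDER-123")

# The whole history, not just the latest message, becomes the model input.
prompt = naive_prompt(history)
print(prompt)
```

Because the model only ever sees this single string, dropping earlier messages from `history` would make it lose that context.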

### Chat Templates

- Chat templates are essential for structuring conversations between language models and users. They guide how message exchanges are formatted into a single prompt.


### Base Models vs. Instruct Models



Another point we need to understand is the difference between a Base Model and an Instruct Model:

A Base Model is trained on raw text data to predict the next token.

An Instruct Model is fine-tuned specifically to follow instructions and engage in conversations. For example, SmolLM2-135M is a base model, while SmolLM2-135M-Instruct is its instruction-tuned variant.

To make a Base Model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in.

ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant).

### Understanding Chat Templates

In the transformers library, chat templates include Jinja2 code that describes how to transform the ChatML list of JSON messages into a textual representation of the system-level instructions, user messages, and assistant responses that the model can understand.

This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs.

Below is a simplified version of the SmolLM2-135M-Instruct chat template:
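
A minimal sketch of what such a template can look like, assuming the ChatML layout with `<|im_start|>` and `<|im_end|>` markers. The Jinja2 source in `CHAT_TEMPLATE` is illustrative rather than the exact template shipped with the model, and `render_chatml` is a hypothetical plain-Python function that mirrors its logic so the example runs without extra dependencies.

```python
from typing import Dict, List

# Illustrative Jinja2 source in the ChatML style (an assumption, not copied from a model).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Plain-Python equivalent of the template above."""
    parts = []
    for message in messages:
        # Each turn is wrapped in start/end markers, with the role on its own line.
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Opens an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


demo = [{"role": "user", "content": "Hello"}]
print(render_chatml(demo, add_generation_prompt=True))
```

The `add_generation_prompt` flag matters at inference time: without the trailing assistant marker, the model has no signal that it is its turn to speak.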

A chat_template describes how the list of messages will be formatted. Given these messages:





```python
messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]
```

The previous chat template will produce the following string:
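
Assuming a ChatML-style template with `<|im_start|>` and `<|im_end|>` markers, no injected default system prompt, and no generation prompt appended, the rendered text for these messages would look like the literal below. The string is written out by hand, and the hypothetical `split_turns` helper parses it back into turns only to show that the role markers line up with the original list.

```python
import re
from typing import List, Tuple

# Hand-written illustration of the rendered prompt (assumed ChatML layout).
rendered = (
    "<|im_start|>system\n"
    "You are a helpful assistant focused on technical topics.<|im_end|>\n"
    "<|im_start|>user\n"
    "Can you explain what a chat template is?<|im_end|>\n"
    "<|im_start|>assistant\n"
    "A chat template structures conversations between users and AI models...<|im_end|>\n"
    "<|im_start|>user\n"
    "How do I use it ?<|im_end|>\n"
)

# One turn: start marker, role, newline, content, end marker, newline.
TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>\n", re.DOTALL)


def split_turns(prompt: str) -> List[Tuple[str, str]]:
    """Recover (role, content) pairs from a ChatML-style prompt string."""
    return TURN.findall(prompt)


for role, content in split_turns(rendered):
    print(f"{role:>9}: {content}")
```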

The transformers library will take care of chat templates for you as part of the tokenization process.

To convert the conversation into a prompt, we load the tokenizer and call apply_chat_template.
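
A sketch of that step, assuming the `transformers` package is installed and that the tokenizer is published on the Hugging Face Hub under the id `HuggingFaceTB/SmolLM2-135M-Instruct` (the hub id is an assumption here; substitute the model you actually use). `build_prompt` is a hypothetical wrapper name.

```python
from typing import Dict, List


def build_prompt(
    messages: List[Dict[str, str]],
    model_id: str = "HuggingFaceTB/SmolLM2-135M-Instruct",  # assumed hub id
) -> str:
    """Load the tokenizer for model_id and render messages with its chat template."""
    # Imported lazily so that defining this function does not require the library.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # tokenize=False returns the prompt as a string instead of token ids;
    # add_generation_prompt=True appends the marker that opens the assistant turn.
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


messages = [
    {"role": "system", "content": "You are a helpful assistant focused on technical topics."},
    {"role": "user", "content": "Can you explain what a chat template is?"},
    {"role": "assistant", "content": "A chat template structures conversations between users and AI models..."},
    {"role": "user", "content": "How do I use it ?"},
]
```

Calling `rendered_prompt = build_prompt(messages)` downloads the tokenizer files on first use and returns the formatted string.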

The rendered_prompt returned by this function is now ready to use as the input for the model you chose!

This **apply_chat_template()** function is used in the backend of your API whenever you interact with messages in the ChatML format.