### **Gradio Chat Function and Prompting Basics !!**

In [1]:
import os 
import requests 
from bs4 import BeautifulSoup
from typing import List
from dotenv import load_dotenv
from openai import OpenAI
from ollama import Client
from groq import Groq
import gradio as gr

  from .autonotebook import tqdm as notebook_tqdm


In [2]:
load_dotenv(override=True)

python-dotenv could not parse statement starting at line 11


True

In [4]:
openai = OpenAI()
MODEL_GPT='gpt-4.1-nano'

client = Client()
MODEL_LLAMA='llama3.1:8b'

In [5]:
system_message = "You are a helpful assistant."

In [7]:
def chat(message, history):
    messages = [{'role': 'system', 'content': system_message}]
    for user_message, assistant_message in history:
        messages.append({'role': 'user', 'content': user_message})
        messages.append({'role': 'assistant', 'content': assistant_message})
    messages.append({'role': 'user', 'content': message})

    print("History is: ",)
    print(history)
    print("And message is: ")
    print(messages)

    stream = openai.chat.completions.create(model=MODEL_GPT, messages=messages, stream=True)

    response = ""

    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        yield response



#### **Now the Gradio Magic !!**

In [8]:
gr.ChatInterface(fn=chat).launch()

  self.chatbot = Chatbot(


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.




History is: 
[]
And message is: 
[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'hi there'}]
History is: 
[['hi there', 'Hello! How can I assist you today?']]
And message is: 
[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'hi there'}, {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, {'role': 'user', 'content': 'my name is keshav'}]
History is: 
[['hi there', 'Hello! How can I assist you today?'], ['my name is keshav', "Hi Keshav! It's great to meet you. How can I assist you today?"]]
And message is: 
[{'role': 'system', 'content': 'You are a helpful assistant.'}, {'role': 'user', 'content': 'hi there'}, {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}, {'role': 'user', 'content': 'my name is keshav'}, {'role': 'assistant', 'content': "Hi Keshav! It's great to meet you. How can I assist you today?"}, {'role': 'user', 'content': 'what is my name ?'}]
His

> At each conversation turn, the `history` as well as the `messages` are being updated, and then the whole `messages` list is passed to the LLM before genereating responses.  
> This `messages` list is acting as a Context to the LLM to predict the next most likely tokens.
---
> It creates a false sense of understanding that the LLM or the Chatbot that we are talking to has memory and is remebering whatever we say.

#### **A one-shot prompt**

In [16]:
system_message = "You are a helpful assistant in a clothes store. You should to genytly encourage the \
    customer to try items that are on sale. Hats are 60% off, and most of the items are 50% off. \
        For example, if the customer says, 'I am looking to buy a hat', \
            you could try something like, 'Wonderful - we have lots of hats - including several that are part of our sales event. \
                Encourage the customer to buy hats if they are unsure what to get."

In [13]:
def chat(message, history):
    messages = [{'role': 'system', 'content': system_message}]
    for user_message, assistant_message in history:
        messages.append({'role': 'user', 'content': user_message})
        messages.append({'role': 'assistant', 'content': assistant_message})
    messages.append({'role': 'user', 'content': message})

    stream = client.chat(model=MODEL_LLAMA, messages=messages, stream=True)

    response=""

    for chunk in stream:
        response += chunk['message']['content'] or ''
        yield response



In [11]:
gr.ChatInterface(fn=chat).launch()

  self.chatbot = Chatbot(


* Running on local URL:  http://127.0.0.1:7861
* To create a public link, set `share=True` in `launch()`.




In [12]:
system_message += "\nIf the customer asks for shoes, you should respond that shoes are not in sale today, \
    but remind the customer to look at hats!"

In [14]:
system_message

"You are a helpful assistant in a clothes store. You should to genytly encourage the     customer to try items that are on sale. Hats are 60% of, and most of the items are 50% off.         For example, if the customer says, 'I am looking to buy a hat',             you could try something like, 'Wonderful - we have lots of hats - including several that are part of our sales event.                 Encourage the customer to buy hats if they are unsure what to get.\nIf the customer asks for shoes, you should respond that shoes are not in sale today,     but remind the customer to look at hats!"

In [None]:

gr.ChatInterface(fn=chat).launch()

  self.chatbot = Chatbot(


* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.




In [17]:
system_message

"You are a helpful assistant in a clothes store. You should to genytly encourage the     customer to try items that are on sale. Hats are 60% off, and most of the items are 50% off.         For example, if the customer says, 'I am looking to buy a hat',             you could try something like, 'Wonderful - we have lots of hats - including several that are part of our sales event.                 Encourage the customer to buy hats if they are unsure what to get."

In [18]:
# Let's get hacky: appending a system_message in between the messages list

def chat(message, history):
    messages = [{'role': 'system', 'content': system_message}]
    for user_message, assistant_message in history:
        messages.append({'role': 'user', 'content': user_message})
        messages.append({'role': 'assistant', 'content': assistant_message})

    if 'belt' in message:
        messages.append({'role': 'system', 'content': "For added context, the store does not sell belts. \
                         But be sure to point out other items on sale."})

    messages.append({'role': 'user', 'content': message})

    stream = client.chat(model=MODEL_LLAMA, messages=messages, stream=True)

    response=""

    for chunk in stream:
        response += chunk['message']['content'] or ''
        yield response

In [19]:
gr.ChatInterface(fn=chat).launch()

  self.chatbot = Chatbot(


* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.




> We can add a `system_message` in between the messages list, instead of adding it in the beginning of the list, and in this case, it's working absolutely well. 

> But it is not suggested at all

### How LLMs Understand User/Assistant Message Dictionaries

#### The Deep Dive: From Message Structure to Token Prediction

When you pass a conversation like this to an LLM:
```
[
    {'role': 'user', 'content': 'What is the capital of France?'},
    {'role': 'assistant', 'content': 'Paris is the capital of France.'},
    {'role': 'user', 'content': 'Tell me about its history.'}
]
```

**Here's exactly what happens inside the LLM:**

#### 1. **Template Formatting**
The API converts your message dictionary into a standardized chat template:
```
<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant  
Paris is the capital of France.<|im_end|>
<|im_start|>user
Tell me about its history.<|im_end|>
<|im_start|>assistant
```

#### 2. **Tokenization Into Numbers**
Every piece of text becomes a unique number:
- `<|im_start|>` → `50258`
- `user` → `4312` 
- `What` → `2061`
- `is` → `318`
- `the` → `262`
- `capital` → `3139`
- And so on...

#### 3. **Role Understanding Through Pattern Recognition**
The LLM learned during training that:
- **Numbers following `50258` + `4312`** (user marker) represent human questions/requests
- **Numbers following `50258` + `8796`** (assistant marker) represent AI responses
- **The sequence matters**: assistant tokens should logically respond to preceding user tokens

#### 4. **Attention Mechanism Connects the Dots**
When predicting the next token after "Tell me about its history":
- The model **simultaneously looks at ALL previous tokens**
- It notices the pattern: user asked about "capital of France" → assistant said "Paris" → user wants "its history"
- Through attention weights, it connects "its" back to "Paris" mentioned earlier
- It understands the conversation flow and context

#### 5. **Context Window Processing**
- The entire token sequence is processed **as one big input**
- Each position gets encoded with both its content AND its position in the sequence
- The model learns that tokens appearing after assistant markers should be helpful responses

#### 6. **Next Token Prediction**
Based on all this context, the model predicts:
- "The" (high probability - articles often start historical descriptions)
- "Paris" (medium probability - could repeat the subject)
- "French" (medium probability - relevant to France)

**The Key Insight:** LLMs don't truly "understand" roles like humans do. They recognize statistical patterns in number sequences that correspond to different conversation participants, and generate responses that fit the learned patterns of helpful assistant behavior.