<div align="center">
<img src="https://poorit.in/image.png" alt="Poorit" width="40" style="vertical-align: middle;"> <b>AI SYSTEMS ENGINEERING 1</b>

## Unit 1: Tokenization and Conversation Memory

**CV Raman Global University, Bhubaneswar**  
*AI Center of Excellence*

</div>

---

### What You'll Learn

In this notebook, you will:

1. **Understand tokenization** and how text is converted to tokens
2. **Use tiktoken** to encode and decode text
3. **Learn about context windows** and their implications for API costs
4. **Understand conversation memory** and the "illusion" of memory in LLMs

**Duration:** ~1 hour

---

## 1. Environment Setup

```python
# Install required packages
!pip install -q openai tiktoken
```

```python
import os
from getpass import getpass
import tiktoken
from openai import OpenAI
```

```python
# Configure OpenAI
api_key = getpass("Enter your OpenAI API Key: ")
os.environ['OPENAI_API_KEY'] = api_key
client = OpenAI(api_key=api_key)
MODEL = "gpt-4o-mini"
```

---

## 2. What is Tokenization?

**Tokens** are the fundamental units that LLMs work with. They're not exactly words or characters, but somewhere in between.

- In English text, one token averages roughly 4 characters (about three-quarters of a word)
- Common words are often single tokens
- Rare words may be split into multiple tokens
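
tiktoken is a byte-pair encoding (BPE) tokenizer. The toy sketch below shows why frequent words become single tokens while rare words get split. It is a simplified character-level BPE trained on a tiny made-up corpus, so it is not OpenAI's training procedure or vocabulary. The function names and the corpus are hypothetical. The algorithm repeatedly merges the most frequent adjacent pair of symbols and records each merge as a rule.

```python
from collections import Counter

def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Start with every word split into single characters, counted by frequency
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = pairs.most_common(1)[0][0]
        merges.append(best)
        # Re-split every word with the new merge applied
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[apply_merge(symbols, best)] += freq
        words = new_words
    return merges

def tokenize(word, merges):
    """Split a word into tokens by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)

# Hypothetical training corpus: a few short words, repeated
corpus = "the cat sat on the mat the cat ate the rat " * 10
merges = train_bpe(corpus, num_merges=10)

print(tokenize("the", merges))   # ['the']  (frequent word -> one token)
print(tokenize("cat", merges))   # ['cat']
print(tokenize("chat", merges))  # ['c', 'h', 'at']  (unseen word -> several pieces)
```

Real tokenizers apply the same idea to UTF-8 bytes over a huge corpus, which is how they end up with vocabularies of hundreds of thousands of tokens.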

**Why does this matter?**
- API pricing is per token, and input and output tokens are billed at different rates
- Context windows are measured in tokens
- Understanding tokens helps optimize costs

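```python
# Get the tokenizer for GPT-4o-mini (it uses the "o200k_base" encoding).
# Older tiktoken releases may not recognize this model name; upgrade tiktoken if you get a KeyError.
encoding = tiktoken.encoding_for_model(MODEL)
```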

```python
# Encode a sentence into tokens
text = "Hello, my name is Ravi and I study at CV Raman University"
tokens = encoding.encode(text)

print(f"Text: {text}")
print(f"Token count: {len(tokens)}")
print(f"Token IDs: {tokens}")
```

```python
# See what each token represents
for token_id in tokens:
    token_text = encoding.decode([token_id])
    print(f"{token_id:6d} → '{token_text}'")
```

```python
# Decode tokens back to text
decoded_text = encoding.decode(tokens)
print(f"Decoded: {decoded_text}")
```

---

## 3. Token Counting and Cost Estimation

Let's create a utility to count tokens and estimate API costs.

```python
# GPT-4o-mini pricing (as of 2024; prices change, so check the OpenAI pricing page)
INPUT_PRICE_PER_1M = 0.15  # $0.15 per 1M input tokens
OUTPUT_PRICE_PER_1M = 0.60  # $0.60 per 1M output tokens

def count_tokens(text):
    """Count the number of tokens in a text."""
    return len(encoding.encode(text))

def estimate_cost(input_tokens, output_tokens):
    """Estimate cost in USD."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_1M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_1M
    return input_cost + output_cost
```
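
Here is the `estimate_cost` formula worked by hand. The example is a hypothetical request with 1,000 input tokens and 200 output tokens at the prices above (USD):

$$
\text{cost} = \frac{1{,}000}{10^6} \times 0.15 + \frac{200}{10^6} \times 0.60 = 0.00015 + 0.00012 = 0.00027
$$

At this rate, one dollar covers roughly 3,700 requests of that size.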

```python
# Example: Estimate cost for a single request
system_prompt = "You are a helpful assistant that explains concepts clearly."
user_message = "Explain machine learning in simple terms."

# Approximate: the chat format adds a few extra tokens per message that this does not count
input_tokens = count_tokens(system_prompt) + count_tokens(user_message)
estimated_output = 200  # Assume ~200 output tokens

print(f"Input tokens: {input_tokens}")
print(f"Estimated output: {estimated_output}")
print(f"Estimated cost: ${estimate_cost(input_tokens, estimated_output):.6f}")
```

---

## 4. Context Windows

Every model has a **context window** - the maximum number of tokens it can process in a single request.

| Model | Context Window |
|-------|---------------|
| GPT-4o-mini | 128,000 tokens |
| GPT-4o | 128,000 tokens |
| Claude 3.5 Sonnet | 200,000 tokens |
| Llama 3.2 | 128,000 tokens |

**Important**: Context window includes both input AND output tokens!
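
The model's reply has to fit in whatever room the prompt leaves, so it can help to check the budget before sending a request. The sketch below assumes a 128,000-token window. `fits_in_context` and `max_output_budget` are hypothetical helpers, not part of any library. Models also have a separate maximum output length, which you should check in the model's documentation.

```python
def fits_in_context(prompt_tokens, reserved_output, context_window=128_000):
    """Return True if the prompt plus the tokens reserved for the reply fit in the window."""
    return prompt_tokens + reserved_output <= context_window

def max_output_budget(prompt_tokens, context_window=128_000):
    """Tokens left for the reply after the prompt (0 if the prompt alone overflows)."""
    return max(0, context_window - prompt_tokens)

print(fits_in_context(120_000, reserved_output=4_000))  # True  (124,000 <= 128,000)
print(fits_in_context(126_000, reserved_output=4_000))  # False (130,000 > 128,000)
print(max_output_budget(126_000))                       # 2000
```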

```python
# Check how much of context window a large text would use
large_text = "This is a test sentence. " * 1000
token_count = count_tokens(large_text)

context_window = 128_000
usage_percent = (token_count / context_window) * 100

print(f"Text length: {len(large_text)} characters")
print(f"Token count: {token_count}")
print(f"Context window usage: {usage_percent:.2f}%")
```

---

## 5. The "Illusion" of Memory

Here's an important insight: **LLMs have no memory between calls**. Every API request is stateless, so the model sees only the messages included in that request.

Let's demonstrate this:

```python
# First message - introduce ourselves
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Priya!"}
]

response = client.chat.completions.create(model=MODEL, messages=messages)
print("Response:", response.choices[0].message.content)
```

```python
# New message - ask for our name (without context)
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(model=MODEL, messages=messages)
print("Response:", response.choices[0].message.content)
```

### The LLM doesn't remember!

Every call is stateless. To create the "illusion" of memory, we must include the full conversation history:

```python
# Include the full conversation history
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Priya!"},
    {"role": "assistant", "content": "Hello Priya! It's nice to meet you. How can I help you today?"},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(model=MODEL, messages=messages)
print("Response:", response.choices[0].message.content)
```

### Key Insights

1. **Every call is stateless** - the model doesn't "remember" previous calls
2. **We pass the full conversation** - this creates the illusion of memory
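3. **Cost implications** - every call resends the whole history, so input tokens (and cost) grow with every turn
4. **Chat apps work the same way** - in an app like ChatGPT, the application (not the model) stores the conversation and sends it back with each request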
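
Insight 3 compounds. Each turn resends all earlier messages, so the input tokens billed over a whole conversation grow roughly quadratically with the number of turns. The back-of-the-envelope sketch below uses hypothetical sizes: a 50-token system prompt and 100 tokens for every user message and every reply. It ignores per-message formatting overhead. `cumulative_input_tokens` is an illustrative helper, not an API.

```python
def cumulative_input_tokens(turns, system_tokens=50, user_tokens=100, reply_tokens=100):
    """Total input tokens billed over `turns` turns when the full history is resent each time."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens   # the new user message joins the history
        total += history         # the whole history is sent as input on this turn
        history += reply_tokens  # the reply joins the history for the next turn
    return total

for n in (1, 5, 10, 20):
    print(f"{n:2d} turns -> {cumulative_input_tokens(n):,} input tokens")
```

With these numbers the total works out to 100n² + 50n. Doubling the conversation from 10 to 20 turns raises the billed input from 10,500 to 41,000 tokens, roughly four times as much.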

---

## 6. Building a Conversation Manager

```python
class Conversation:
    """Manage a conversation by resending the full message history on every call."""

    def __init__(self, system_prompt="You are a helpful assistant"):
        # The history always starts with the system prompt
        self.messages = [{"role": "system", "content": system_prompt}]

    def chat(self, user_message):
        """Add the user's message, send the whole history, and store the reply."""
        self.messages.append({"role": "user", "content": user_message})

        response = client.chat.completions.create(
            model=MODEL,
            messages=self.messages
        )

        assistant_message = response.choices[0].message.content
        # Keep the reply so the next call includes it as context
        self.messages.append({"role": "assistant", "content": assistant_message})

        return assistant_message

    def get_token_count(self):
        """Approximate tokens in the current history (message text only, no per-message overhead)."""
        return sum(count_tokens(m["content"]) for m in self.messages)
```

```python
# Test the conversation manager
conv = Conversation()

print("User: Hi, I'm Amit!")
print("Assistant:", conv.chat("Hi, I'm Amit!"))

print("\nUser: What's my name?")
print("Assistant:", conv.chat("What's my name?"))

# Size of the history resent on the next call, not the total billed so far
print(f"\nTokens in conversation history (approx.): {conv.get_token_count()}")
```
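
One common way to keep a long conversation within a token budget is a sliding window. Always keep the system prompt, and drop the oldest messages until the history fits. The sketch below introduces `trim_history`, a hypothetical helper that assumes `messages[0]` is the system prompt. In this notebook you would pass the `count_tokens` function defined earlier as `token_counter`. So that the example runs without tiktoken, it uses a crude word count as a stand-in.

```python
def trim_history(messages, max_tokens, token_counter):
    """Keep the system message plus the most recent messages that fit within max_tokens.

    Assumes messages[0] is the system prompt. Counts message text only,
    ignoring per-message formatting overhead.
    """
    system, rest = messages[0], messages[1:]
    budget = max_tokens - token_counter(system["content"])
    kept = []
    # Walk backwards from the newest message and stop at the first one that doesn't fit
    for message in reversed(rest):
        cost = token_counter(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system] + list(reversed(kept))

def word_count(text):
    """Stand-in token counter for this example: one 'token' per whitespace-separated word."""
    return len(text.split())

history = [
    {"role": "system", "content": "You are a helpful assistant"},    # 5 words
    {"role": "user", "content": "Hi, I'm Amit!"},                     # 3 words
    {"role": "assistant", "content": "Hello Amit! How can I help?"},  # 6 words
    {"role": "user", "content": "What's my name?"},                   # 3 words
]

trimmed = trim_history(history, max_tokens=15, token_counter=word_count)
print([m["role"] for m in trimmed])  # ['system', 'assistant', 'user']
```

Stopping at the first message that doesn't fit keeps the retained history contiguous. The trade-off is that trimming silently drops facts; here the user's own introduction is gone. This is why some applications summarize old turns instead of discarding them.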

---

## 7. Exercise: Token Analysis

Analyze how different languages and text types tokenize differently.

```python
# Exercise: Compare token counts for different texts
texts = [
    "Hello world",
    "Artificial Intelligence",
    "नमस्ते",  # Hindi
    "🤖💡🚀",  # Emojis
]

for text in texts:
    tokens = encoding.encode(text)
    print(f"'{text}' → {len(tokens)} tokens")
```
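
Part of the reason the Hindi and emoji strings above tend to need more tokens is that tiktoken's BPE works on UTF-8 bytes. Characters outside ASCII take 2 to 4 bytes each, and byte sequences that were less common in the tokenizer's training data are likely to stay split into more pieces. The standard-library sketch below compares character and byte counts for the same strings. It does not reproduce the token counts themselves. `utf8_stats` is a hypothetical helper.

```python
def utf8_stats(text):
    """Return (number of Unicode code points, number of UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

texts = [
    "Hello world",
    "Artificial Intelligence",
    "नमस्ते",  # Hindi: 6 code points, 3 bytes each
    "🤖💡🚀",   # Emojis: 3 code points, 4 bytes each
]

for text in texts:
    n_chars, n_bytes = utf8_stats(text)
    print(f"{text!r}: {n_chars} code points, {n_bytes} UTF-8 bytes")
```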

---

## Key Takeaways

1. **Tokens** are the units LLMs work with - roughly 4 characters of English text on average

2. **tiktoken** is OpenAI's tokenizer library - use it to count tokens and estimate costs

3. **Context windows** limit how much text you can process - includes input AND output

4. **LLMs are stateless** - memory is created by passing the full conversation each time

5. **Cost optimization** - manage conversation length to control API costs

### What's Next?

In the next notebook, we'll explore:
- JSON structured outputs
- Chaining multiple LLM calls
- Streaming responses for better UX

---

## Additional Resources

- [OpenAI Tokenizer Tool](https://platform.openai.com/tokenizer)
- [tiktoken Documentation](https://github.com/openai/tiktoken)
- [OpenAI Pricing](https://openai.com/pricing)

---

**Course Information:**
- **Institution:** CV Raman Global University, Bhubaneswar
- **Program:** AI Center of Excellence
- **Course:** AI Systems Engineering 1
- **Developed by:** [Poorit Technologies](https://poorit.in) - *Transform Graduates into Industry-Ready Professionals*

---