# Tokenizing with code

In [1]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4.1-mini")

tokens = encoding.encode("Hi my name is Muhammad and I like banoffee pie")

In [2]:
tokens

[12194, 922, 1308, 382, 67641, 326, 357, 1299, 9171, 26458, 5148]

In [3]:
for token_id in tokens:
    token_text = encoding.decode([token_id])
    print(f"{token_id} = {token_text}")

12194 = Hi
922 =  my
1308 =  name
382 =  is
67641 =  Muhammad
326 =  and
357 =  I
1299 =  like
9171 =  ban
26458 = offee
5148 =  pie


In [4]:
encoding.decode([326])

' and'

# The Illusion of "memory"

In [5]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"


In [6]:
from openai import OpenAI

# openai = OpenAI()
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

In [7]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Muhammad!"}
    ]

In [8]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

"As-salamu alaykum, Muhammad! (peace be upon you) It's great to meet you. How can I assist you today? Do you have any questions or topics you'd like to discuss? I'm all ears and here to help!"

In [9]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"}
    ]

In [10]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

"I don't have any information about you, so I'm not sure what your name is. We just started chatting, and I don't have any context or personal data about you. Would you like to share your name with me?"

### wha??
Every call to an LLM is completely STATELESS. It's a totally new call, every single time: the model only sees the messages included in that one request. As AI engineers, it's OUR JOB to devise techniques to give the impression that the LLM has a "memory".

In [11]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Muhammad!"},
    {"role": "assistant", "content": "As-salamu alaykum, Muhammad! (peace be upon you) It's a pleasure to meet you. How can I assist you today? Is there anything on your mind that you'd like to talk about or ask about?"},
    {"role": "user", "content": "What's my name?"}
    ]

In [12]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

'Your name is Muhammad.'

1. Every call to an LLM is stateless
2. We pass in the entire conversation so far in the input prompt, every time
3. This gives the illusion that the LLM has memory - it apparently keeps the context of the conversation
4. But this is a trick; it's a by-product of providing the entire conversation, every time
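
The four points above can be wrapped in a small helper. What follows is only a minimal sketch, not a definitive implementation: the `Conversation` class, its `send` parameter, and the `fake_send` stand-in are all hypothetical names introduced for illustration. `fake_send` replaces the real model call so the cell runs without a server; it simply reports how many messages it was handed, which makes the growing history visible.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Hypothetical helper: keeps the message list and resends all of it on every call."""

    def __init__(self, send: Callable[[List[Message]], str], system_prompt: str):
        self.send = send
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> str:
        # Add the new user turn, then send the ENTIRE history, not just this turn
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)
        # Store the reply so the next call includes it as well
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: List[Message]) -> str:
    # Stand-in backend (an assumption for this sketch) so it runs offline.
    # With the client from earlier, the equivalent call would be:
    #   ollama.chat.completions.create(model="llama3.2", messages=messages).choices[0].message.content
    return f"I can see {len(messages)} messages"


chat = Conversation(fake_send, "You are a helpful assistant")
first = chat.ask("Hi! I'm Muhammad!")
second = chat.ask("What's my name?")
print(first)
print(second)
```

The second call receives four messages (system, user, assistant, user) even though only one new question was asked. All of the "memory" lives in the list held by our own code; nothing is remembered on the model side.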