# Day 4

## Tokenizing with code

In [27]:
!pip install tiktoken
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4.1-mini")
tokens = encoding.encode("I looooove youuuuuu")



In [28]:
tokens

[40, 1445, 61341, 1048, 481, 9084, 176972]

In [29]:
for token_id in tokens:
    token_text = encoding.decode([token_id])
    print(f"{token_id} = {token_text}")

40 = I
1445 =  lo
61341 = ooo
1048 = ove
481 =  you
9084 = uu
176972 = uuu



# And another topic!

### The Illusion of "memory"

Many of you will know this already. But for those who don't -- this might be an "AHA" moment!

In [9]:
import os
from dotenv import load_dotenv

from openai import OpenAI

OLLAMA_BASE_URL = "http://localhost:11434/v1"
ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key="anything")

### You should be very comfortable with what the next cell is doing!

_I'm creating a new instance of the OpenAI Python client, a lightweight wrapper around HTTP calls to an endpoint for GPT models or other LLM providers_

In [10]:
from openai import OpenAI

openai = OpenAI()

OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
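If you hit the error above, the client simply couldn't find a key. Here is a minimal sketch for checking before you create the client, using only the standard library. `api_key_status` is a hypothetical helper made up for this illustration, not part of the OpenAI library. The Ollama client earlier sidesteps the problem by passing `api_key="anything"`, since the local endpoint doesn't check it.

```python
import os


def api_key_status(env=os.environ):
    """Return a short message saying whether OPENAI_API_KEY is available.

    Hypothetical helper for illustration; `env` is any dict-like mapping.
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        return "No API key found: set OPENAI_API_KEY or pass api_key= to OpenAI()"
    return "API key found"


print(api_key_status())
```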

### The messages sent to OpenAI are a list of dicts

In [12]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Natalia"}
]

In [14]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

"Hello Natalia! It's lovely to meet you. How can I assist you today? Do you have any questions or topics you'd like to discuss? I'm all ears!"

### OK let's now ask a follow-up question

In [15]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"}
]

In [16]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

"I don't have any information about your identity, but I can try to help you figure it out. Can you please tell me something about yourself, like where you're from or what you're interested in? That might give me some clues about who you are!"

### Wait, wha??

We just told you!

What's going on??

Here's the thing: every call to an LLM is completely STATELESS. It's a totally new call, every single time. As AI engineers, it's OUR JOB to devise techniques to give the impression that the LLM has a "memory".

In [20]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Natalia!"},
    {"role": "assistant", "content": "Hi Natalia! How can I assist you today?"},
    {"role": "user", "content": "What's my name?"}
]

In [21]:
response = ollama.chat.completions.create(model="llama3.2", messages=messages)
response.choices[0].message.content

'Your name is Natalia. (I think it might not be the case, though - we just started our conversation!)'
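The pattern above can be wrapped in a small helper that appends every turn to a running list. This is a minimal sketch, not code from the original notebook: `chat_turn` and `fake_send` are hypothetical names, and `fake_send` stands in for a real model call so the example runs without a model server.

```python
def chat_turn(history, user_message, send):
    """Append the user message, call the model with the FULL history, record the reply."""
    history.append({"role": "user", "content": user_message})
    reply = send(history)  # the model sees every earlier message on every call
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages):
    """Hypothetical stand-in for an LLM call: reports how many messages it received."""
    return f"I can see {len(messages)} messages"


history = [{"role": "system", "content": "You are a helpful assistant"}]
first = chat_turn(history, "Hi! I'm Natalia", fake_send)
second = chat_turn(history, "What's my name?", fake_send)
print(first)
print(second)
```

To talk to a real model, you could pass a function that wraps the call used earlier, for example `lambda msgs: ollama.chat.completions.create(model="llama3.2", messages=msgs).choices[0].message.content`. The "memory" lives entirely in `history`, which is our list, not the model's.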

## To recap

With apologies if this is obvious to you - but it's still good to reinforce:

1. Every call to an LLM is stateless
2. We pass in the entire conversation so far in the input prompt, every time
3. This gives the illusion that the LLM has memory - it apparently keeps the context of the conversation
4. But this is a trick; it's a by-product of providing the entire conversation, every time
5. An LLM just predicts the most likely next tokens in the sequence; if that sequence contains "I'm Natalia" and later "What's my name?" then it will predict... Natalia!

The ChatGPT product uses exactly this trick - every time you send a message, it's the entire conversation that gets passed in.

"Does that mean we have to pay extra each time for all the conversation so far?"

For sure it does. And that's what we WANT. We want the LLM to predict the next tokens in the sequence, looking back on the entire conversation. We want that compute to happen, so we need to pay the electricity bill for it!
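To make the cost point concrete, here is a rough sketch of how the input grows as a conversation gets longer. It's an illustration only: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words rather than real tokens, the conversation is made up, and the system prompt is left out to keep the numbers small.

```python
def count_tokens(text):
    """Crude stand-in for a tokenizer: one 'token' per whitespace-separated word."""
    return len(text.split())


def input_tokens_per_call(turns):
    """For each user turn, total the 'tokens' of everything sent so far."""
    totals = []
    running = 0
    for user_text, assistant_text in turns:
        running += count_tokens(user_text)
        totals.append(running)  # input size for this call: history plus new message
        running += count_tokens(assistant_text)  # the reply joins the history
    return totals


turns = [
    ("Hi! I'm Natalia", "Hi Natalia! How can I assist you today?"),
    ("What's my name?", "Your name is Natalia."),
    ("Thanks!", "You're welcome!"),
]
per_call = input_tokens_per_call(turns)
print(per_call)       # input size of each successive call
print(sum(per_call))  # total input across the whole conversation
```

Each call's input includes everything before it, so the total input across a conversation grows faster than the conversation itself.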

