### generative pre-trained transformer
### more parameters = more context, more memory, more time
### early days: neural networks trained at character level
### middle ground: tokens, i.e. chunks of words drawn from a limited vocabulary; a token can be a whole word or a fragment of one
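
To make the trade-off between the two extremes concrete, here is a small sketch (not part of the original notebook) that splits the example sentence character by character and word by word. It uses only plain Python string operations; no real tokenizer is involved, and the variable names are illustrative.

```python
# Illustrative only: compare a character-level and a word-level split of one sentence.
sentence = "Hi my name is Agnes and I like reading romance novels"

char_pieces = list(sentence)    # one piece per character, spaces included
word_pieces = sentence.split()  # one piece per whitespace-separated word

print(f"characters: {len(char_pieces)} pieces, {len(set(char_pieces))} distinct symbols")
print(f"words:      {len(word_pieces)} pieces")
```

Character-level splitting needs only a small symbol set but produces long sequences; word-level splitting gives short sequences but needs a vocabulary entry for every distinct word. Tokens sit between the two.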


In [3]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4.1-mini")

tokens = encoding.encode("Hi my name is Agnes and I like reading romance novels")

In [4]:
tokens

[12194, 922, 1308, 382, 157967, 326, 357, 1299, 6085, 30327, 43813]

In [5]:
for token_id in tokens:
    token_text = encoding.decode([token_id])
    print(f"{token_id} = {token_text}")

12194 = Hi
922 =  my
1308 =  name
382 =  is
157967 =  Agnes
326 =  and
357 =  I
1299 =  like
6085 =  reading
30327 =  romance
43813 =  novels


In [6]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

API key found and looks good so far!


In [7]:
from openai import OpenAI

openai = OpenAI()

In [8]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Agnes!"}
    ]

In [9]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

'Hello, Agnes! How can I assist you today?'

In [10]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"}
    ]

In [11]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

"I don't have access to that information. How can I assist you today?"

### every call to an LLM is completely STATELESS. It's a totally new call, every single time. As AI engineers, it's OUR JOB to devise techniques to give the impression that the LLM has a "memory"

In [12]:
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Agnes!"},
    {"role": "assistant", "content": "Hi Agnes! How can I assist you today?"},
    {"role": "user", "content": "What's my name?"}
    ]

In [13]:
response = openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)
response.choices[0].message.content

'Your name is Agnes! How can I help you today?'

1. Every call to an LLM is stateless
2. We pass in the entire conversation so far in the input prompt, every time
3. This gives the illusion that the LLM has memory - it apparently keeps the context of the conversation
4. But this is a trick; it's a by-product of providing the entire conversation, every time
5. An LLM just predicts the most likely next tokens in the sequence; if that sequence contains "I'm Agnes" and later "What's my name?", then it will predict... Agnes!
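
The points above can be sketched as a small helper that keeps the running message list and resends all of it on every call. This is a minimal sketch, not the notebook's own code: `Conversation` and `fake_model` are hypothetical names, and `fake_model` is a stand-in that can only "know" what appears in the messages it receives, so the example runs without an API key.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def fake_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for an LLM call; it only sees the messages passed in."""
    last = messages[-1]["content"]
    if "my name" in last.lower():
        # Look for an earlier introduction anywhere in the supplied history.
        for message in messages:
            if message["role"] == "user" and "I'm " in message["content"]:
                name = message["content"].split("I'm ", 1)[1].rstrip("!. ")
                return f"Your name is {name}!"
        return "I don't have access to that information."
    return "Hello! How can I assist you today?"


class Conversation:
    """Keeps the full history and sends all of it on every call."""

    def __init__(self, system_prompt: str, send: Callable[[List[Message]], str]):
        self.messages: List[Message] = [{"role": "system", "content": system_prompt}]
        self.send = send

    def ask(self, user_text: str) -> str:
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(self.messages)  # the whole conversation goes in, every time
        self.messages.append({"role": "assistant", "content": reply})
        return reply


chat = Conversation("You are a helpful assistant", fake_model)
chat.ask("Hi! I'm Agnes!")
with_history = chat.ask("What's my name?")

# The same question sent on its own, with no history attached.
without_history = fake_model([
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What's my name?"},
])

print(with_history)
print(without_history)
```

With the real client, `send` could be a small function that calls `openai.chat.completions.create(model="gpt-4.1-mini", messages=messages)` and returns `response.choices[0].message.content`, as in the cells above.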


### context window: the max number of tokens the model can consider when generating the next token; it includes the original input prompt, the subsequent conversation, the latest input prompt, and almost all of the output generated so far
### governs how well the model can remember references, context, and content
- multi-shot prompting
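
Because the whole conversation is resent on every call, a long chat can eventually exceed the context window. One common workaround, sketched here under stated assumptions, is to drop the oldest turns until the history fits a token budget. `rough_token_count` and `trim_to_budget` are hypothetical helpers, and the "about 4 characters per token" rule is only a crude approximation; a real tokenizer such as `tiktoken` would give exact counts.

```python
from typing import Dict, List

Message = Dict[str, str]


def rough_token_count(text: str) -> int:
    # Assumption: roughly 1 token per 4 characters. Not an exact count.
    return max(1, len(text) // 4)


def trim_to_budget(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the most recent turns that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(rough_token_count(m["content"]) for m in system)
    kept: List[Message] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = rough_token_count(message["content"])
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(message)
        used += cost

    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Hi! I'm Agnes!"},
    {"role": "assistant", "content": "Hi Agnes! How can I assist you today?"},
    {"role": "user", "content": "What's my name?"},
]

trimmed = trim_to_budget(history, max_tokens=20)
for message in trimmed:
    print(message["role"], "->", message["content"])
```

In practice the budget should sit below the model's actual context window, since the generated output counts toward the window as well. Anything trimmed away is simply gone from the model's point of view.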