# The Illusion of "memory"

# Connect to a local Ollama model

In [18]:
import os
from dotenv import load_dotenv
from openai import OpenAI

Load the environment variables from '.env'

In [19]:
load_dotenv(override=True)

api_key = os.getenv('OPENAI_API_KEY')

# Ollama ignores the key, but the OpenAI client still requires one to be set.
if not api_key:
    print('No API key found. Please set the OPENAI_API_KEY environment variable.')
elif not api_key.startswith('sk-proj'):
    # 'elif' avoids calling .startswith() on None when the key is missing
    print("Invalid API key format: it doesn't start with 'sk-proj'. Please check OPENAI_API_KEY in '.env'.")
else:
    print('API key loaded successfully.')

    # Print only the first 15 characters of the API key (for security)
    print(f'API Key: {api_key[:15]}...')

API key loaded successfully.
API Key: sk-proj-dummy-k...


Create client to connect with LLM

In [20]:
OLLAMA_BASE_URL = os.getenv('OLLAMA_BASE_URL')

MODEL = 'gemma3:1b'

openai = OpenAI(base_url=OLLAMA_BASE_URL)
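
If `OLLAMA_BASE_URL` is missing from '.env', `os.getenv` returns `None`, and the client would then use its own default endpoint instead of the local server. A minimal sketch of a fallback is shown here. The helper name `resolve_base_url` is hypothetical, and the default URL assumes a standard local Ollama install that exposes its OpenAI-compatible API on port 11434.

```python
import os
from typing import Mapping

# Assumption: default address of a local Ollama server's OpenAI-compatible API.
DEFAULT_OLLAMA_BASE_URL = 'http://localhost:11434/v1'


def resolve_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Return OLLAMA_BASE_URL if it is set and non-empty, else the local default."""
    value = env.get('OLLAMA_BASE_URL', '').strip()
    return value or DEFAULT_OLLAMA_BASE_URL


print(resolve_base_url({}))
```

The result could be passed as `base_url` when creating the client, so a missing setting fails towards the local model rather than a remote service.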

Connect to the model and ask about yourself as the first prompt

In [21]:
payload = [{'role': 'system', 'content': 'You are a helpful assistant.'},
           {'role': 'user', 'content': 'Who am I?'}]

response = openai.chat.completions.create(model=MODEL, messages=payload)

print(response.choices[0].message.content)

That’s a fun question! 😊

As a large language model, I don’t have a single, definitive answer. I was created by Google.

**You are a user who has interacted with me.** 

Do you want to play a little game? We could try a question-based round!


Hmm, so it failed to retrieve any information about the requester.

Now, before asking this question again, let's have an introductory chat first.

In [22]:
payload = [{'role': 'system', 'content': 'You are a helpful assistant.'},
           {'role': 'user', 'content': 'Hello, I\'m Vivek?'}]

response = openai.chat.completions.create(model=MODEL, messages=payload)

print(response.choices[0].message.content)

Hello Vivek! It’s nice to meet you. How can I help you today? 😊


Oh! Even though we shared our name in the previous call, the model still cannot retrieve it when asked in a separate call (next cell).
* This is because all calls to LLMs are stateless: they have no information about previous conversations.
* It is the programmer's responsibility to capture the details of previous conversations and send them along with the current prompt.

In [23]:
payload = [{'role': 'system', 'content': 'You are a helpful assistant.'},
           {'role': 'user', 'content': 'what is my name?'}]

response = openai.chat.completions.create(model=MODEL, messages=payload)

print(response.choices[0].message.content)

As a large language model, I do not have access to your personally identifiable information. I cannot know your name. 

It’s a fun question though! 😊

If you’d like to play a guessing game, I can try!


Create a payload that mimics the previous conversation

* Note:
> This does not work with LLMs that have 1B parameters.

> It worked as expected with a 3B-parameter LLM.

In [25]:
MODEL_3B_PARAM = 'llama3.2:3b'
payload = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Hello, I\'m Vivek?'},
    {'role': 'assistant', 'content': 'Hello Vivek! How can I assist you today?'},
    {'role': 'user', 'content': 'what is my name?'}
]


response = openai.chat.completions.create(model=MODEL_3B_PARAM, messages=payload)
print(response.choices[0].message.content)

Vivek is the answer to that! You told me your name earlier, actually. Is everything okay? Was there something specific you wanted to chat about or ask for help with?
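
Building the payload by hand works for one question, but in a real chat the history has to grow with every turn. The following is a minimal sketch of that bookkeeping, not a definitive implementation. The names `chat_turn` and `fake_send` are hypothetical, and `fake_send` is only a stand-in for the model so the example runs without a server.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def chat_turn(history: List[Message], user_text: str,
              send: Callable[[List[Message]], str]) -> str:
    """Append the user's message, send the WHOLE history, and record the reply."""
    history.append({'role': 'user', 'content': user_text})
    reply = send(history)
    history.append({'role': 'assistant', 'content': reply})
    return reply


def fake_send(messages: List[Message]) -> str:
    """Stand-in for the model: reports how many messages it was given."""
    return f'I received {len(messages)} messages.'


history = [{'role': 'system', 'content': 'You are a helpful assistant.'}]
first = chat_turn(history, "Hello, I'm Vivek", fake_send)
second = chat_turn(history, 'What is my name?', fake_send)
print(first)   # I received 2 messages.
print(second)  # I received 4 messages.
```

To talk to the real model, `send` could be replaced with a function that calls `openai.chat.completions.create(model=MODEL_3B_PARAM, messages=messages)` and returns `response.choices[0].message.content`. The "memory" lives entirely in the `history` list on our side.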


## Recap

1. Every call to an LLM is stateless, whether it is a local Ollama model or a third-party hosted frontier model.
2. The entire conversation so far is passed in the input prompt every time we send a new prompt (input/query) to the LLM.
3. This gives us the illusion that the LLM has memory, because it appears to keep the context of the conversation.
4. In reality it's a trick: a by-product of providing the entire conversation every time.
5. An LLM just predicts the most likely next tokens in the sequence; if that sequence contains "My name is Vivek" and later "What's my name?" then it will predict... Vivek!

The ChatGPT product uses exactly this trick - every time you send a message, it's the entire conversation that gets passed in.

### Costing
* "Does that mean we have to pay extra each time for the whole conversation so far?"
* For sure it does. And that's what we WANT. We want the LLM to predict the next tokens in the sequence, looking back on the entire conversation. We want that compute to happen, so we need to pay the electricity bill for it!
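
To make the cost growth concrete, here is a rough sketch of how the input size of each call grows as the conversation gets longer. The token counts are made-up illustrative numbers, not measurements, and the helper name `input_tokens_per_call` is hypothetical. It also ignores the system prompt for simplicity.

```python
from typing import List


def input_tokens_per_call(user_tokens: List[int], reply_tokens: List[int]) -> List[int]:
    """Return the number of input tokens sent on each call of a conversation."""
    sizes = []
    context = 0
    for user, reply in zip(user_tokens, reply_tokens):
        context += user        # the new user message joins the history
        sizes.append(context)  # the whole history is sent as input
        context += reply       # the reply joins the history for the next call
    return sizes


# Assumed sizes: every user message is 10 tokens, every reply is 40 tokens.
per_call = input_tokens_per_call([10, 10, 10, 10], [40, 40, 40, 40])
print(per_call)       # [10, 60, 110, 160]
print(sum(per_call))  # 340
```

Only 40 tokens of user text were typed in total, yet 340 input tokens were sent across the four calls, because each call re-sends everything that came before.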

