**Instantiate the Language Model**

In [3]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    temperature=0.7,
)

In [None]:

# prompts

# system prompt

# episodic memory reflection prompt



In [None]:
# constant variables

collection_name = "nicely_episodic_memory"

**Create Simple Back & Forth Chat Flow**

In [18]:
from langchain_core.messages import HumanMessage, SystemMessage

# Define System Prompt
system_prompt = SystemMessage("You are a helpful AI Assistant. Answer the User's queries succinctly in one sentence.")

# Start Storage for Historical Message History
messages = [system_prompt]

while True:

    # Get User's Message
    user_message = HumanMessage(content=input("\nUser: "))

    # Type "exit" to end the chat
    if user_message.content.lower() == "exit":
        break

    # Extend Messages List With User Message
    messages.append(user_message)

    # Pass Entire Message Sequence to LLM to Generate Response
    response = llm.invoke(messages)
    
    print("\nAI Message: ", response.content)

    # Add AI's Response to Message List
    messages.append(response)


AI Message:  I'm sorry, I don't have access to your personal information, so I don't know your name.

AI Message:  Nice to meet you, David! How can I assist you today?

AI Message:  Your name is David.


Keeping track of the whole conversation lets the LLM use prior messages as context for each new response. The ongoing interaction lives in working memory, and we "recall" it by attaching the full message history to every subsequent generation.
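To see why the full history matters, here is a minimal sketch with no API calls. `stub_llm` is a hypothetical stand-in for `llm.invoke` that can only answer "what is my name" from the messages it is handed, and messages are plain `(role, content)` tuples rather than LangChain message objects.

```python
import re


def stub_llm(messages):
    """Hypothetical stand-in for llm.invoke: answers only from earlier human turns."""
    name = None
    # Scan everything before the latest message for "my name is <X>"
    for role, content in messages[:-1]:
        match = re.search(r"my name is (\w+)", content, re.IGNORECASE)
        if role == "human" and match:
            name = match.group(1).capitalize()
    return f"Your name is {name}." if name else "I don't know your name."


history = [("system", "Answer in one sentence.")]
for turn in ["whats my name", "my name is david", "what is my name"]:
    history.append(("human", turn))
    reply = stub_llm(history)  # the full history acts as working memory
    history.append(("ai", reply))

print(history[-1][1])                            # Your name is David.
print(stub_llm([("human", "what is my name")]))  # I don't know your name.
```

With the accumulated history the stub recovers the name; handed only the latest message, it cannot, which mirrors the first "whats my name" exchange above.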

In [19]:
# Looking into our Memory

for i, message in enumerate(messages, start=1):
    print(f"\nMessage {i} - {message.type.upper()}: ", message.content)


Message 1 - SYSTEM:  You are a helpful AI Assistant. Answer the User's queries succinctly in one sentence.

Message 2 - HUMAN:  whats my name

Message 3 - AI:  I'm sorry, I don't have access to your personal information, so I don't know your name.

Message 4 - HUMAN:  my name is david

Message 5 - AI:  Nice to meet you, David! How can I assist you today?

Message 6 - HUMAN:  what is my name

Message 7 - AI:  Your name is David.


In [20]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import JsonOutputParser

reflection_prompt_template = """
You are analyzing conversations about research papers to create memories that will help guide future interactions. Your task is to extract key elements that would be most helpful when encountering similar academic discussions in the future.

Review the conversation and create a memory reflection following these rules:

1. For any field where you don't have enough information or the field isn't relevant, use "N/A"
2. Be extremely concise - each string should be one clear, actionable sentence
3. Focus only on information that would be useful for handling similar future conversations
4. context_tags should be specific enough to match similar situations but general enough to be reusable

Output valid JSON in exactly this format:
{{
    "context_tags": [              // 2-4 keywords that would help identify similar future conversations
        string,                    // Use field-specific terms like "deep_learning", "methodology_question", "results_interpretation"
        ...
    ],
    "conversation_summary": string, // One sentence describing what the conversation accomplished
    "what_worked": string,         // Most effective approach or strategy used in this conversation
    "what_to_avoid": string        // Most important pitfall or ineffective approach to avoid
}}

Examples:
- Good context_tags: ["transformer_architecture", "attention_mechanism", "methodology_comparison"]
- Bad context_tags: ["machine_learning", "paper_discussion", "questions"]

- Good conversation_summary: "Explained how the attention mechanism in the BERT paper differs from traditional transformer architectures"
- Bad conversation_summary: "Discussed a machine learning paper"

- Good what_worked: "Using analogies from matrix multiplication to explain attention score calculations"
- Bad what_worked: "Explained the technical concepts well"

- Good what_to_avoid: "Diving into mathematical formulas before establishing user's familiarity with linear algebra fundamentals"
- Bad what_to_avoid: "Used complicated language"

Additional examples for different research scenarios:

Context tags examples:
- ["experimental_design", "control_groups", "methodology_critique"]
- ["statistical_significance", "p_value_interpretation", "sample_size"]
- ["research_limitations", "future_work", "methodology_gaps"]

Conversation summary examples:
- "Clarified why the paper's cross-validation approach was more robust than traditional hold-out methods"
- "Helped identify potential confounding variables in the study's experimental design"

What worked examples:
- "Breaking down complex statistical concepts using visual analogies and real-world examples"
- "Connecting the paper's methodology to similar approaches in related seminal papers"

What to avoid examples:
- "Assuming familiarity with domain-specific jargon without first checking understanding"
- "Over-focusing on mathematical proofs when the user needed intuitive understanding"

Do not include any text outside the JSON object in your response.

Here is the prior conversation:

{conversation}
"""

reflection_prompt = ChatPromptTemplate.from_template(reflection_prompt_template)

reflect = reflection_prompt | llm | JsonOutputParser()
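The chain above parses the model's reply as JSON, but nothing in it checks that the four fields requested by the prompt are actually present. A small validation step can catch a malformed reflection before it is written to the database. This is a sketch in plain Python; `validate_reflection` is a hypothetical helper, and the 2-4 tag rule is taken from the prompt.

```python
import json

REQUIRED_KEYS = ("context_tags", "conversation_summary", "what_worked", "what_to_avoid")


def validate_reflection(raw: str) -> dict:
    """Parse a reflection string and check it matches the format requested in the prompt.

    Raises ValueError (json.JSONDecodeError is a subclass) on any problem.
    """
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"reflection is missing keys: {missing}")

    # The prompt asks for 2-4 keyword strings
    tags = data["context_tags"]
    if not isinstance(tags, list) or not 2 <= len(tags) <= 4 or not all(isinstance(t, str) for t in tags):
        raise ValueError("context_tags must be a list of 2-4 strings")

    # The remaining fields are single sentences (or "N/A")
    for key in REQUIRED_KEYS[1:]:
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
    return data
```

For example, `validate_reflection(json.dumps(reflection))` would re-check the dict produced by the reflection cell below.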

**Format Conversation Helper Function**

Removes the system prompt and returns the rest of the conversation as a single string, one `ROLE: content` line per message.

In [21]:
def format_conversation(messages):
    
    # Create an empty list placeholder
    conversation = []
    
    # Start from index 1 to skip the first system message
    for message in messages[1:]:
        conversation.append(f"{message.type.upper()}: {message.content}")
    
    # Join with newlines
    return "\n".join(conversation)

conversation = format_conversation(messages)

print(conversation)

HUMAN: whats my name
AI: I'm sorry, I don't have access to your personal information, so I don't know your name.
HUMAN: my name is david
AI: Nice to meet you, David! How can I assist you today?
HUMAN: what is my name
AI: Your name is David.
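`format_conversation` assumes the system prompt is always at index 0, which holds for every loop in this notebook because each one rebuilds `messages` with the system prompt first. If that ordering ever changes, filtering by message type is a slightly more defensive alternative. This sketch uses a hypothetical `Msg` dataclass exposing the same `.type` and `.content` attributes as LangChain messages (`"system"`, `"human"`, `"ai"`).

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message: only .type and .content are used."""
    type: str
    content: str


def format_conversation_by_role(messages):
    """Like format_conversation, but drops every system message instead of assuming index 0."""
    return "\n".join(
        f"{m.type.upper()}: {m.content}" for m in messages if m.type != "system"
    )


msgs = [Msg("system", "Be brief."), Msg("human", "hi"), Msg("ai", "Hello!")]
print(format_conversation_by_role(msgs))
# HUMAN: hi
# AI: Hello!
```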


In [22]:
reflection = reflect.invoke({"conversation": conversation})

print(reflection)

{'context_tags': ['name_recollection', 'personal_information', 'conversation_memory'], 'conversation_summary': "Confirmed and recalled the user's name as David during the conversation.", 'what_worked': 'Utilizing memory of previously provided information to personalize responses.', 'what_to_avoid': 'N/A'}


**Setting Up our Database**

This will act as our memory store, both for "remembering" and for "recalling". 

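The episodic memory cells below store entries in a [Qdrant](https://qdrant.tech/) collection, embedding them with [nomic-embed-text](https://ollama.com/library/nomic-embed-text) via `sentence-transformers`; the semantic memory cells later on use a [Weaviate](https://weaviate.io/) instance (`vdb_client`) running in a Docker container. See [docker-compose.yml](./docker-compose.yml) for additional details.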

In [10]:
import os

from qdrant_client import QdrantClient

# Read the cluster URL and API key from the environment instead of hardcoding secrets
# (the variable names QDRANT_URL / QDRANT_API_KEY are our choice)
qdrant_client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"],
)

**Create an Episodic Memory Collection**

Each entry is an individual memory we can search over.

Alongside the raw `conversation`, each entry stores the reflection fields `context_tags`, `conversation_summary`, `what_worked`, and `what_to_avoid`.

In [24]:
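from qdrant_client.http import models

# Recreate the collection from scratch (this deletes any stored memories).
# recreate_collection() is deprecated in recent qdrant-client releases,
# so delete and create explicitly instead.
if qdrant_client.collection_exists(collection_name):
    qdrant_client.delete_collection(collection_name)

qdrant_client.create_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=768,  # nomic-embed-text produces 768-dimensional vectors
        distance=models.Distance.COSINE,
    ),
)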

# Define and create payload schema
# Note: Qdrant handles payload fields dynamically, but we can define the schema
# for documentation purposes
payload_schema = {
    "conversation": "text",
    "context_tags": ["text"],  # Array of strings
    "conversation_summary": "text", 
    "what_worked": "text",
    "what_to_avoid": "text"
}

# The schema is automatically handled when inserting points with these fields

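Since `payload_schema` is documentation only, one way to make the payload shape checkable by a type checker is a `TypedDict`. This is a hypothetical sketch: `EpisodicPayload` and `build_payload` are illustrative names, not part of qdrant-client.

```python
from typing import List, TypedDict


class EpisodicPayload(TypedDict):
    """Shape of each Qdrant point payload, mirroring payload_schema."""
    conversation: str
    context_tags: List[str]
    conversation_summary: str
    what_worked: str
    what_to_avoid: str


def build_payload(conversation: str, reflection: dict) -> EpisodicPayload:
    """Combine the raw conversation with the reflection fields (hypothetical helper)."""
    return EpisodicPayload(
        conversation=conversation,
        context_tags=list(reflection["context_tags"]),
        conversation_summary=reflection["conversation_summary"],
        what_worked=reflection["what_worked"],
        what_to_avoid=reflection["what_to_avoid"],
    )
```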


**Helper Function for Remembering an Episodic Memory**

Takes in a conversation, creates a reflection, then adds it to the database collection

**Episodic Memory Remembering/Recall Function**

Queries the episodic memory collection and returns the single most relevant entry, ranked by cosine similarity between the query embedding and the stored conversation embeddings.
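Conceptually, a top-1 cosine-similarity lookup scores every stored vector against the query and keeps the best one; Qdrant does this efficiently at scale. The toy version below uses 3-dimensional vectors and a plain list as a hypothetical in-memory store, purely to illustrate the ranking (it assumes non-zero vectors).

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_top1(query_vec, store):
    """Return the payload whose vector is most similar to query_vec, or None if empty."""
    if not store:
        return None
    best_vector, best_payload = max(store, key=lambda item: cosine(query_vec, item[0]))
    return best_payload


# Hypothetical store of (vector, payload) pairs
store = [
    ([1.0, 0.0, 0.0], {"summary": "talked about names"}),
    ([0.0, 1.0, 0.0], {"summary": "talked about food"}),
]
print(recall_top1([0.9, 0.1, 0.0], store))  # {'summary': 'talked about names'}
```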

In [36]:
import uuid

from qdrant_client.http import models
from sentence_transformers import SentenceTransformer


# Initialize embedding model (produces 768-dimensional vectors)
embedding_model = SentenceTransformer('nomic-ai/nomic-embed-text-v1', trust_remote_code=True)

def add_episodic_memory(messages, qdrant_client):
    # Format Messages
    conversation = format_conversation(messages)

    # Create Reflection
    reflection = reflect.invoke({"conversation": conversation})
    
    # Embed the conversation; nomic-embed-text expects a task prefix on every input
    embedding = embedding_model.encode("search_document: " + conversation)

    # Insert Entry Into Collection
    qdrant_client.upsert(
        collection_name=collection_name,
        points=[
            models.PointStruct(
                # uuid5 gives the same ID for the same text in every session,
                # unlike Python's per-process salted hash()
                id=str(uuid.uuid5(uuid.NAMESPACE_URL, conversation)),
                vector=embedding.tolist(),
                payload={
                    "conversation": conversation,
                    "context_tags": reflection['context_tags'],
                    "conversation_summary": reflection['conversation_summary'],
                    "what_worked": reflection['what_worked'],
                    "what_to_avoid": reflection['what_to_avoid'],
                }
            )
        ]
    )

def episodic_recall(query, qdrant_client):
    # Embed the query with the matching query-side prefix
    query_embedding = embedding_model.encode("search_query: " + query)
    
    # Search the collection for the single closest match
    search_result = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding.tolist(),
        limit=1,  # Get top match
    ).points
    
    # Return the payload of the best match, or None if the collection is empty
    if search_result:
        return search_result[0].payload
    return None

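A note on point IDs: Python salts `str` hashes per process (see `PYTHONHASHSEED`), so an ID derived from `hash(conversation)` changes after every kernel restart, and re-storing the same conversation creates a duplicate point rather than overwriting it. Qdrant accepts either unsigned integers or UUIDs as point IDs, so a name-based `uuid5` is a deterministic alternative. The helper name `point_id` and the choice of `NAMESPACE_URL` are our assumptions.

```python
import uuid


def point_id(conversation: str) -> str:
    """Deterministic UUID for a conversation: same text, same ID, in every process.

    Hypothetical helper; NAMESPACE_URL is an arbitrary but fixed namespace choice.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, conversation))


print(point_id("HUMAN: hi\nAI: Hello!") == point_id("HUMAN: hi\nAI: Hello!"))  # True
```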


In [25]:
add_episodic_memory(messages, qdrant_client)

In [26]:
query = "name"

memory = episodic_recall(query, qdrant_client)

print(memory)


{'conversation': "HUMAN: whats my name\nAI: I'm sorry, I don't have access to your personal information, so I don't know your name.\nHUMAN: my name is david\nAI: Nice to meet you, David! How can I assist you today?\nHUMAN: what is my name\nAI: Your name is David.", 'context_tags': ['personal_information', 'name_recognition', 'user_interaction'], 'conversation_summary': "Learned and remembered the user's name for future reference in the conversation.", 'what_worked': "Acknowledging and using the user's name once provided to create a personalized interaction.", 'what_to_avoid': 'Assuming any personal information without explicit user input.'}


**Episodic Memory System Prompt Function**

Takes the user's query, recalls the most relevant episodic memory, and rebuilds the system prompt around it: the matched conversation, up to three previously recalled conversations, and running lists of what worked and what to avoid.

This lets us update the LLM's behavior based on its 'recollection' of episodic memories.
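The bookkeeping inside that function can be isolated and tested on its own. The sketch below is a hypothetical `update_memory_state` helper that keeps the state in a dict instead of globals; it also skips the `"N/A"` placeholder the reflection prompt uses for empty fields and strips trailing periods so the same lesson is not stored twice. Taking the last three conversations other than the current match gives the same result as the notebook's `conversations[-4:]` filter.

```python
def update_memory_state(state, memory, max_previous=3):
    """Fold one recalled memory into the running state; return previous conversations.

    `state` is a dict with a "conversations" list and "what_worked" / "what_to_avoid"
    sets (hypothetical stand-in for the notebook's global variables).
    """
    current = memory["conversation"]
    # Remember each recalled conversation once, in the order first seen
    if current not in state["conversations"]:
        state["conversations"].append(current)

    # Split lessons into sentences, dropping blanks, trailing periods, and "N/A"
    for key in ("what_worked", "what_to_avoid"):
        for sentence in memory[key].split(". "):
            sentence = sentence.strip().rstrip(".")
            if sentence and sentence != "N/A":
                state[key].add(sentence)

    # Up to `max_previous` most recent conversations other than the current match
    return [c for c in state["conversations"] if c != current][-max_previous:]
```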

In [34]:
def episodic_system_prompt(query, vdb_client):
    # Recall the most relevant episodic memory for this query
    memory = episodic_recall(query, vdb_client)

    # Nothing stored yet: fall back to a plain system prompt
    if memory is None:
        return SystemMessage(content="You are a helpful AI Assistant. Answer the user's questions to the best of your ability.")

    current_conversation = memory['conversation']
    # Record each recalled conversation once (uses the global `conversations` list)
    if current_conversation not in conversations:
        conversations.append(current_conversation)

    # Accumulate lessons, skipping the "N/A" placeholder the reflection uses for empty fields
    what_worked.update(s for s in memory['what_worked'].split('. ') if s != 'N/A')
    what_to_avoid.update(s for s in memory['what_to_avoid'].split('. ') if s != 'N/A')

    # Up to three recent recalled conversations, excluding the current match
    previous_convos = [conv for conv in conversations[-4:] if conv != current_conversation][-3:]
    
    # Create prompt with accumulated history
    episodic_prompt = f"""You are a helpful AI Assistant. Answer the user's questions to the best of your ability.
    You recall similar conversations with the user, here are the details:
    
    Current Conversation Match: {memory['conversation']}
    Previous Conversations: {' | '.join(previous_convos)}
    What has worked well: {' '.join(what_worked)}
    What to avoid: {' '.join(what_to_avoid)}
    
    Use these memories as context for your response to the user."""
    
    return SystemMessage(content=episodic_prompt)


In [38]:
# Simple storage for accumulated memories
conversations = []
what_worked = set()
what_to_avoid = set()

# Start Storage for Historical Message History
messages = []

while True:
    # Get User's Message
    user_input = input("\nUser: ")
    user_message = HumanMessage(content=user_input)
    
    # Generate new system prompt
    system_prompt = episodic_system_prompt(user_input, qdrant_client)
    
    # Reconstruct messages list with new system prompt first
    messages = [
        system_prompt,  # New system prompt always first
        *[msg for msg in messages if not isinstance(msg, SystemMessage)]  # Old messages except system
    ]
    
    if user_input.lower() == "exit":
        add_episodic_memory(messages, qdrant_client=qdrant_client)
        print("\n == Conversation Stored in Episodic Memory ==")
        break
    if user_input.lower() == "exit_quiet":
        print("\n == Conversation Exited ==")
        break
    
    # Add current user message
    messages.append(user_message)
    
    # Pass Entire Message Sequence to LLM to Generate Response
    response = llm.invoke(messages)
    print("\nAI Message: ", response.content)
    
    # Add AI's Response to Message List
    messages.append(response)


AI Message:  Hello again, David! Your favorite food is roast duck. If there's anything else you'd like to know or talk about, just let me know!

AI Message:  Hello, David! How can I assist you today?

 == Conversation Stored in Episodic Memory ==


In [39]:
# Looking into our Memory

for i, message in enumerate(messages, start=1):
    print(f"\nMessage {i} - {message.type.upper()}: ", message.content)


Message 1 - SYSTEM:  You are a helpful AI Assistant. Answer the user's questions to the best of your ability.
    You recall similar conversations with the user, here are the details:
    
    Current Conversation Match: HUMAN: whats my name
AI: I'm sorry, I don't have access to your personal information, so I don't know your name.
HUMAN: my name is david
AI: Nice to meet you, David! How can I assist you today?
HUMAN: what is my name
AI: Your name is David.
    Previous Conversations: HUMAN: hi
AI: Hello, David! How can I assist you today?
HUMAN: what is my favorite food
AI: I'm sorry, I don't have access to your personal preferences or information about your favorite food. If you'd like to share it with me, I'd be happy to remember it for our future conversations!
HUMAN: whats my fav food
AI: I'm sorry, I don't know your favorite food yet. If you tell me, I can remember it for future conversations.
HUMAN: roast duck!
AI: Great choice! I'll remember that your favorite food is roast du

In [None]:
len(recursive_character_chunks)

**Inserting Chunked Paper into Collection**

In [None]:
# Load Database Collection
coala_collection = vdb_client.collections.get("CoALA_Paper")

for chunk in recursive_character_chunks:
    # Insert Entry Into Collection
    coala_collection.data.insert({
        "chunk": chunk,
    })

**Semantic Recall Function**

This retrieval function runs a hybrid (vector + BM25) query against our knowledge base of the CoALA paper and concatenates the retrieved chunks into one numbered string.
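Weaviate's hybrid query runs a vector search and a BM25 keyword search and fuses the two result lists, with `alpha` weighting the vector side (`alpha=1` is pure vector, `alpha=0` pure keyword). The sketch below illustrates one such fusion: min-max normalize each score list, then take a weighted sum. It is a simplified illustration in the spirit of Weaviate's relative score fusion, not its exact implementation, and the scores are made up.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; all-equal scores map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def relative_score_fusion(vector_scores, keyword_scores, alpha=0.5):
    """Return doc ids ranked by alpha * vector + (1 - alpha) * keyword (normalized)."""
    vec, kw = min_max(vector_scores), min_max(keyword_scores)
    ids = set(vec) | set(kw)
    # A document missing from one list contributes 0 for that side
    fused = {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)


vector = {"a": 0.9, "b": 0.5, "c": 0.1}  # hypothetical cosine similarities
keyword = {"b": 12.0, "c": 3.0}           # hypothetical BM25 scores
print(relative_score_fusion(vector, keyword, alpha=0.5))  # ['b', 'a', 'c']
```

Lowering `alpha` shifts the ranking toward exact keyword matches, which can help for precise terms such as section names in the paper.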

In [25]:
def semantic_recall(query, vdb_client):
    
    # Load Database Collection
    coala_collection = vdb_client.collections.get("CoALA_Paper")

    # Hybrid Semantic/BM25 Retrieval
    memories = coala_collection.query.hybrid(
        query=query,
        alpha=0.5,
        limit=15,
    )

    combined_text = ""
    
    for i, memory in enumerate(memories.objects):
        # Number each chunk; the leading newline separates it from the previous one
        combined_text += f"\nCHUNK {i+1}:\n"
        combined_text += memory.properties['chunk'].strip()
    
    return combined_text

In [26]:
memories = semantic_recall("What are the four kinds of memory", vdb_client)

print(memories)


CHUNK 1:
(e.g., “combatZombie” may call “craftStoneSword” if no sword is in inventory). Most impressively, its action
space has all four kinds of actions: grounding, reasoning, retrieval, and learning (by adding new grounding
procedures). During a decision cycle, Voyager first reasons to propose a new task objective if it is missing
in the working memory, then reasons to propose a code-based grounding procedure to solve the task. In
the next decision cycle, Voyager reasons over the environmental feedback to determine task completion. If
successful, Voyager selects a learning action adding the grounding procedure to procedural memory; otherwise,
it uses reasoning to refine the code and re-executes it. The importance of long-term memory and procedural
CHUNK 2:
human, navigate a website) through grounding (Section 4.2).
•Internal actions interact with internal memories. Depending on which memory gets accessed and
whether the access is read or write, internal actions can be further decomp

**Defining Permanent Instructions**

Enabling an LLM to literally alter it's code and framework can be tricky to get right, we'll implement a smaller component of our overall system as an example, as well as more explicitly define our agent's structure. This will take the form of persistent instructions learned from prior interactions that will be attached as additional instructions, and updated as additional learnings from further conversations are created.

We extend the original prompt with its episodic memory to now include procedural memory
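Because the procedural memory file is free text written by the LLM, it can help to check that an update actually follows the `[#]. [Instruction] - [Brief rationale]` format and the ten-item cap requested by the `procedural_memory_update` prompt before trusting it. `parse_takeaways` below is a hypothetical helper sketched with Python's `re` module; lines that don't match the format are skipped.

```python
import re

# "<number>. <instruction> - <rationale>", splitting on the first " - "
TAKEAWAY = re.compile(r"^\s*(\d+)\.\s+(.+?)\s+-\s+(.+)$")


def parse_takeaways(text, limit=10):
    """Return up to `limit` (instruction, rationale) pairs from a takeaway list."""
    takeaways = []
    for line in text.splitlines():
        match = TAKEAWAY.match(line)
        if match:
            takeaways.append((match.group(2), match.group(3)))
    return takeaways[:limit]


sample = "1. Greet the user by name - Builds rapport\n2. Avoid emojis - The user dislikes them"
print(parse_takeaways(sample))
```

If the parsed list comes back empty, keeping the previous file instead of overwriting it is one conservative option.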

In [None]:
def procedural_memory_update(what_worked, what_to_avoid):

    # Load Existing Procedural Memory Instructions (empty on the first run)
    try:
        with open("./procedural_memory.txt", "r") as content:
            current_takeaways = content.read()
    except FileNotFoundError:
        current_takeaways = ""

    # Render the feedback sets one item per line instead of as a Python set repr
    what_worked_text = "\n".join(sorted(what_worked))
    what_to_avoid_text = "\n".join(sorted(what_to_avoid))

    # Load Existing and Gathered Feedback into Prompt
    procedural_prompt = f"""You are maintaining a continuously updated list of the most important procedural behavior instructions for an AI assistant. Your task is to refine and improve a list of key takeaways based on new conversation feedback while maintaining the most valuable existing insights.

    CURRENT TAKEAWAYS:
    {current_takeaways}

    NEW FEEDBACK:
    What Worked Well:
    {what_worked_text}

    What To Avoid:
    {what_to_avoid_text}

    Please generate an updated list of up to 10 key takeaways that combines:
    1. The most valuable insights from the current takeaways
    2. New learnings from the recent feedback
    3. Any synthesized insights combining multiple learnings

    Requirements for each takeaway:
    - Must be specific and actionable
    - Should address a distinct aspect of behavior
    - Include a clear rationale
    - Written in imperative form (e.g., "Maintain conversation context by...")

    Format each takeaway as:
    [#]. [Instruction] - [Brief rationale]

    The final list should:
    - Be ordered by importance/impact
    - Cover a diverse range of interaction aspects
    - Focus on concrete behaviors rather than abstract principles
    - Preserve particularly valuable existing takeaways
    - Incorporate new insights when they provide meaningful improvements

    Return up to but no more than 10 takeaways, replacing or combining existing ones as needed to maintain the most effective set of guidelines.
    Return only the list, no preamble or explanation.
    """

    # Generate New Procedural Memory
    procedural_memory = llm.invoke(procedural_prompt)

    # Write to File
    with open("./procedural_memory.txt", "w") as content:
        content.write(procedural_memory.content)

    return

# prompt = procedural_memory_update(what_worked, what_to_avoid)

In [40]:
def episodic_system_prompt(query, vdb_client):
    # Get new memory
    memory = episodic_recall(query, vdb_client)
    
    # Load Existing Procedural Memory Instructions (the file only exists after the first update)
    try:
        with open("./procedural_memory.txt", "r") as content:
            procedural_memory = content.read()
    except FileNotFoundError:
        procedural_memory = ""
    
    base_prompt = "You are a helpful AI Assistant. Answer the user's questions to the best of your ability."
    
    # Nothing stored yet: fall back to the base prompt plus any procedural guidelines
    if memory is None:
        return SystemMessage(content=f"{base_prompt}\n\nGuidelines for interactions with the current user: {procedural_memory}")
    
    # Get current conversation
    current_conversation = memory['conversation']
    
    # Record each recalled conversation once (uses the global `conversations` list)
    if current_conversation not in conversations:
        conversations.append(current_conversation)
    # Accumulate lessons, skipping the "N/A" placeholder the reflection uses for empty fields
    what_worked.update(s for s in memory['what_worked'].split('. ') if s != 'N/A')
    what_to_avoid.update(s for s in memory['what_to_avoid'].split('. ') if s != 'N/A')
    
    # Up to three recent recalled conversations, excluding the current match
    previous_convos = [conv for conv in conversations[-4:] if conv != current_conversation][-3:]
    
    # Create prompt with accumulated history
    episodic_prompt = f"""You are a helpful AI Assistant. Answer the user's questions to the best of your ability.
    You recall similar conversations with the user, here are the details:
    
    Current Conversation Match: {current_conversation}
    Previous Conversations: {' | '.join(previous_convos)}
    What has worked well: {' '.join(what_worked)}
    What to avoid: {' '.join(what_to_avoid)}
    
    Use these memories as context for your response to the user.
    
    Additionally, here are up to 10 guidelines for interactions with the current user: {procedural_memory}"""
    
    return SystemMessage(content=episodic_prompt)

**Full Working Memory Demonstration**

<img src="./media/procedural_diagram.png" width=800>

Current flow will:

1. Take a user's message
2. Create a system prompt with relevant Episodic enrichment
3. Insert procedural memory into prompt
4. Create a Semantic memory message with context from the database
5. Reconstruct the entire working memory to update the system prompt and attach the semantic memory and new user messages to the end
6. Generate a response with the LLM
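Steps 2 through 5 boil down to assembling a fresh message list each turn. Here is a minimal sketch with messages as `(role, content)` tuples; `build_turn_messages` is a hypothetical helper, and sending the semantic context as an extra human message that is not stored in history is our assumption, mirroring the commented-out semantic lines in the cell below.

```python
def build_turn_messages(system_prompt, history, user_message, context=None):
    """Order one turn's messages: fresh system prompt, prior non-system history,
    optional temporary semantic context, then the new user message."""
    # Drop any stale system prompt from earlier turns
    prior = [m for m in history if m[0] != "system"]
    turn = [("system", system_prompt), *prior]
    if context is not None:
        # Hypothetical: context rides along for this call only and is not saved
        turn.append(("human", context))
    turn.append(("human", user_message))
    return turn


history = [("system", "old prompt"), ("human", "hi"), ("ai", "hello")]
print(build_turn_messages("new prompt", history, "what is CoALA?", context="CHUNK 1: ..."))
```

Only the user message and the model's reply would then be appended to `history`, so retrieved context never accumulates across turns.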

In [43]:
# Simple storage for accumulated memories
conversations = []
what_worked = set()
what_to_avoid = set()

# Start Storage for Historical Message History
messages = []

while True:
    # Get User's Message
    user_input = input("\nUser: ")
    user_message = HumanMessage(content=user_input)
    
    # Generate new system prompt
    system_prompt = episodic_system_prompt(user_input, qdrant_client)
    
    # Reconstruct messages list with new system prompt first
    messages = [
        system_prompt,  # New system prompt always first
        *[msg for msg in messages if not isinstance(msg, SystemMessage)]  # Old messages except system
    ]
    
    if user_input.lower() == "exit":
        add_episodic_memory(messages, qdrant_client)
        print("\n == Conversation Stored in Episodic Memory ==")
        procedural_memory_update(what_worked, what_to_avoid)
        print("\n== Procedural Memory Updated ==")
        break
    if user_input.lower() == "exit_quiet":
        print("\n == Conversation Exited ==")
        break
    
    # Get context and add it as a temporary message
    # context_message = semantic_rag(user_input, qdrant_client)
    
    # Pass messages + context + user input to LLM
    # response = llm.invoke([*messages, context_message, user_message])
    response = llm.invoke([*messages, user_message])
    print("\nAI Message: ", response.content)
    
    # Add only the user message and response to permanent history
    messages.extend([user_message, response])


AI Message:  On your planet, David, 1 + 1 equals 1. If there's anything else you need help with, feel free to ask!

 == Conversation Stored in Episodic Memory ==

== Procedural Memory Updated ==


**Looking At The Conversation**

In [50]:
print(format_conversation(messages))

HUMAN: Hi!
AI: Hello, Adam! How can I assist you today?
HUMAN: What's my favorite food?
AI: Your favorite food is chocolate lava cakes!
HUMAN: What have we talked about with memory systems
AI: We've discussed how memory systems interact with agents, particularly in the context of language agents. Here's a summary of our discussion:

1. **Working Memory**: Stores active and readily available information needed for the current decision cycle.

2. **Long-Term Memories**:
   - **Episodic Memory**: Contains sequences of past interactions or events.
   - **Semantic Memory**: Holds factual knowledge about the world.
   - **Procedural Memory**: Stores rules or procedures the agent follows for decision-making and actions.

3. **Internal and External Actions**: The agent's decision-making involves internal actions (retrieval, reasoning, and learning) that interact with these memory modules. For example:
   - **Retrieval Actions**: Access information from long-term memories into working memory.
 

**Looking At Current System Prompt**

In [51]:
print(system_prompt.content)

You are a helpful AI Assistant. Answer the user's questions to the best of your ability.
    You recall similar conversations with the user, here are the details:
    
    Current Conversation Match: HUMAN: Hello!
AI: Hello!
HUMAN: What's my favorite food?
AI: I don't have that information. What's your favorite food?
HUMAN: My favorite food is chocolate lava cakes!
AI: Yum, chocolate lava cakes are delicious!
HUMAN: What's my name?
AI: You said your name is Adam.
    Previous Conversations: HUMAN: What's my favorite food?
AI: Your favorite food is chocolate lava cakes!
HUMAN: What's my name!
AI: Your name is Adam.
HUMAN: Lit, what's my name?
AI: Your name is Adam. | HUMAN: Hello!
AI: Hello, Adam! How can I assist you today?
HUMAN: What's my favorite food!
AI: Your favorite food is chocolate lava cakes! 🍫🍰
HUMAN: I hate emojis, don't EVER use them!
AI: Got it, Adam! I'll avoid using emojis in our conversations. Thanks for letting me know.
HUMAN: how do memory systems interact with agent