# Memory Types in Agents and Humans

## **Procedural Memory**
- **Definition:** Long-term memory for how to perform tasks; akin to a brain’s core instruction set.
- **In Humans:** Remembering how to ride a bike.
- **In Agents:**
  - Defined in the CoALA paper as a combination of **LLM weights** and **agent code** that determine how the agent operates.
  - **Current Practice:**
    - Few, if any, agentic systems update their LLM weights or rewrite their code automatically.
    - Some agents update their **system prompt** dynamically, though this is rare.
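
As a minimal sketch of the last point, the snippet below keeps a base system prompt plus a list of learned instructions and renders them into one prompt. The `PromptMemory` class and its method names are hypothetical, chosen only for illustration; they do not come from the CoALA paper or from any library.

```python
from dataclasses import dataclass, field


@dataclass
class PromptMemory:
    """Hypothetical holder for a system prompt that can grow over time."""

    base_prompt: str
    learned_rules: list[str] = field(default_factory=list)

    def add_rule(self, rule: str) -> None:
        # Skip duplicates so repeated feedback does not bloat the prompt.
        if rule not in self.learned_rules:
            self.learned_rules.append(rule)

    def render(self) -> str:
        # The rendered text is what would be sent as the system message.
        if not self.learned_rules:
            return self.base_prompt
        rules = "\n".join(f"- {rule}" for rule in self.learned_rules)
        return f"{self.base_prompt}\n\nLearned instructions:\n{rules}"


prompt_memory = PromptMemory("You are a helpful AI assistant.")
prompt_memory.add_rule("Answer in one sentence.")
prompt_memory.add_rule("Answer in one sentence.")  # duplicate, ignored
print(prompt_memory.render())
```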

---

## **Semantic Memory**
- **Definition:** Long-term store of knowledge.
- **In Humans:** Facts learned in school, meanings of concepts, and their relationships.
- **In Agents:**
  - Defined in the CoALA paper as a **repository of facts about the world**.
  - **Usage:**
    - Personalizes an application by extracting facts from a user's interactions.
    - The facts are kept in application-specific storage and retrieved in later conversations to influence responses.
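
The store-then-retrieve cycle described above might look roughly like this. The sketch assumes a plain in-memory dictionary and skips the extraction step, which in a real agent would typically be done by an LLM; `fact_store`, `remember_fact`, and `facts_as_context` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical in-memory store: user id -> {fact key: fact value}.
fact_store: dict[str, dict[str, str]] = defaultdict(dict)


def remember_fact(user_id: str, key: str, value: str) -> None:
    """Save or overwrite one fact about a user."""
    fact_store[user_id][key] = value


def facts_as_context(user_id: str) -> str:
    """Format the stored facts as text to prepend to a later prompt."""
    facts = fact_store.get(user_id, {})
    return "\n".join(f"{key}: {value}" for key, value in sorted(facts.items()))


remember_fact("user-1", "name", "Ada")
remember_fact("user-1", "preferred_language", "Python")
print(facts_as_context("user-1"))
```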

---

## **Episodic Memory**
- **Definition:** Recalling specific past events.
- **In Humans:** Remembering particular events or episodes experienced in the past.
- **In Agents:**
  - Defined in the CoALA paper as **storing sequences of past actions**.
  - **Usage:**
    - Implemented in practice as **few-shot example prompting**, or as **dynamic few-shot prompting** once enough sequences have been collected to select examples per input.
    - Useful for tasks with a "correct" way of performing actions.
    - Differs from semantic memory, which fits better when there is no single "correct" way to act or when the agent regularly faces new tasks.
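
A rough sketch of dynamic few-shot prompting follows. It assumes word-overlap (Jaccard) similarity as a stand-in for the embedding-based selection a production system would more likely use; the episode format and all function names are hypothetical.

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_examples(query: str, episodes: list[dict], k: int = 2) -> list[dict]:
    """Return the k stored episodes whose input is most similar to the query."""
    ranked = sorted(episodes, key=lambda ep: word_overlap(query, ep["input"]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, episodes: list[dict], k: int = 2) -> str:
    """Place the selected episodes before the new query as few-shot examples."""
    shots = [
        f"Input: {ep['input']}\nOutput: {ep['output']}"
        for ep in select_examples(query, episodes, k)
    ]
    return "\n\n".join(shots + [f"Input: {query}\nOutput:"])


episodes = [
    {"input": "convert 5 km to miles", "output": "3.11 miles"},
    {"input": "translate hello to French", "output": "bonjour"},
    {"input": "convert 10 kg to pounds", "output": "22.05 pounds"},
]
print(build_prompt("convert 3 km to feet", episodes, k=1))
```

With `k=1`, only the kilometre conversion episode is selected, since it shares the most words with the query.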



# Updating Agent Memory

## **1. How to Update Memory**
Developers consider two main ways to update agent memory:

### **In the Hot Path**
- **Definition:** The agent explicitly updates memory before responding, usually through tool-calling.
- **Example:** ChatGPT uses this approach.
- **Pros:**
  - Immediate memory updates.
- **Cons:**
  - Adds latency before delivering responses.
  - Combines memory logic with agent logic.
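
The hot-path pattern can be sketched as follows. `fake_model` is a hypothetical stand-in for an LLM that decides whether to call a memory tool, and `save_memory` is a hypothetical tool; neither reflects how ChatGPT is actually implemented.

```python
hot_path_memory: dict[str, str] = {}


def save_memory(key: str, value: str) -> str:
    """Hypothetical memory tool the agent can call."""
    hot_path_memory[key] = value
    return f"saved {key}"


def fake_model(user_message: str) -> dict:
    """Stand-in for an LLM that decides whether to call the memory tool."""
    prefix = "my name is "
    if user_message.lower().startswith(prefix):
        name = user_message[len(prefix):].strip()
        return {"tool_call": ("name", name), "reply": f"Nice to meet you, {name}."}
    return {"tool_call": None, "reply": "How can I help?"}


def respond_hot_path(user_message: str) -> str:
    decision = fake_model(user_message)
    # Hot path: the memory write completes before the reply is returned,
    # so its cost is part of the response latency.
    if decision["tool_call"] is not None:
        save_memory(*decision["tool_call"])
    return decision["reply"]


hot_reply = respond_hot_path("My name is Ada")
print(hot_reply, hot_path_memory)
```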

### **In the Background**
- **Definition:** A separate process updates memory during or after the conversation.
- **Pros:**
  - No added latency.
  - Keeps memory logic separate from agent logic.
- **Cons:**
  - Memory updates are not immediate.
  - Requires extra logic to determine when to trigger background updates.
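
One way to sketch the background pattern is a worker thread fed by a queue, so the reply never waits on the memory write. The names below are hypothetical, and a real deployment might use a separate service or a scheduled job instead of a thread.

```python
import queue
import threading

background_memory: list[str] = []
pending: queue.Queue = queue.Queue()


def memory_worker() -> None:
    """Consume finished turns and write them to memory off the response path."""
    while True:
        turn = pending.get()
        if turn is None:  # sentinel: stop the worker
            pending.task_done()
            break
        # A real system might summarize the turn or extract facts here.
        background_memory.append(turn)
        pending.task_done()


def respond_in_background(user_message: str) -> str:
    reply = f"Echo: {user_message}"
    # Only enqueue the turn; the reply does not wait for the memory write.
    pending.put(f"user={user_message!r} reply={reply!r}")
    return reply


worker = threading.Thread(target=memory_worker, daemon=True)
worker.start()
background_reply = respond_in_background("hello")
pending.join()  # wait for the background write (needed only for this demo)
pending.put(None)
worker.join()
print(background_reply, background_memory)
```

The trade-off listed above is visible here: between `respond_in_background` returning and the worker processing the queue, the new turn is not yet in memory.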

### **User Feedback for Memory Updates**
- **Relevance:** Particularly useful for episodic memory.
- **Example:** If a user marks an interaction as positive, save that interaction and its feedback so the agent can recall it in the future.
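
A minimal sketch of feedback-driven episodic memory, assuming a simple thumbs-up flag. The `Interaction` class and `record_feedback` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    user_input: str
    agent_output: str


# Hypothetical episodic store of interactions the user approved of.
positive_examples: list[Interaction] = []


def record_feedback(interaction: Interaction, thumbs_up: bool) -> None:
    """Keep only interactions the user marked as positive."""
    if thumbs_up:
        positive_examples.append(interaction)


record_feedback(Interaction("Summarize this email", "Three bullet points ..."), thumbs_up=True)
record_feedback(Interaction("Summarize this report", "A ten-page essay ..."), thumbs_up=False)
print(len(positive_examples))
```

The saved interactions could later serve as the few-shot examples described under episodic memory.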

---

## **2. Why Memory Matters for Agents**
Memory significantly enhances the usefulness of agentic systems by enabling:
- Personalization
- Improved task performance through episodic and semantic recall

### **LangChain’s Contributions to Agent Memory**
LangChain simplifies leveraging memory in applications by providing:
1. **Low-level abstractions** for a memory store in **LangGraph**.
2. **Templates** for running memory updates both:
   - In the hot path
   - In the background
3. **Dynamic Few-Shot Example Selection** in **LangSmith** for rapid iteration.

---

## **3. Visual Comparison**

The image below demonstrates the two approaches:
- **In the Hot Path** (left side): Immediate memory updates before responding.
- **In the Background** (right side): Delayed memory updates via background processes.

![image.png](attachment:image.png)

In [1]:
import torch

In [2]:
from google.colab import auth
auth.authenticate_user()
from huggingface_hub import login
login()  # prompts for the token; never hardcode an access token in a notebook


In [1]:
import torch
from transformers import pipeline
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# Load GPT-J (EleutherAI/gpt-j-6B); the first run downloads the model weights
print("Loading the GPT model... (This might take a while on the first run)")
llm = pipeline(
    "text-generation",
    model="EleutherAI/gpt-j-6B",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Define System Prompt
system_prompt = SystemMessage("You are a helpful AI Assistant. Answer the User's queries succinctly in one sentence.")

# Start Storage for Historical Message History
messages = [system_prompt]


def render(msg):
    """Render one message as a line of plain text with a role prefix."""
    if isinstance(msg, SystemMessage):
        return msg.content  # the system prompt has no speaker prefix
    role = "User" if isinstance(msg, HumanMessage) else "AI"
    return f"{role}: {msg.content}"


while True:
    # Get User's Message
    user_message = input("\nUser: ")
    if user_message.lower() == "exit":
        break

    # Create a chat-like input for the pipeline
    conversation = "\n".join(render(msg) for msg in messages)
    conversation += f"\nUser: {user_message}\nAI:"

    # Generate Response: sampling options belong on the generation call, and
    # max_new_tokens limits only the reply (max_length would count the prompt too)
    output = llm(
        conversation,
        max_new_tokens=60,
        do_sample=True,
        temperature=0.7,
        return_full_text=False,
    )
    # GPT-J is a base model and may keep writing extra turns; keep the first reply only
    response = output[0]["generated_text"].split("\nUser:")[0].strip()

    print("\nAI: ", response)

    # Append to message history (the model's reply is an AIMessage, not a SystemMessage)
    messages.append(HumanMessage(user_message))
    messages.append(AIMessage(response))


Loading the GPT model... (This might take a while on the first run)

