# Context Engineering: Sessions & Memory

**Course Day:** Day 2

**Description:** Learn how to build stateful, intelligent LLM agents through Context Engineering - the dynamic assembly and management of information within an LLM's context window. Master Sessions for short-term conversation management and Memory for long-term persistence.

## Learning Objectives

- Understand Context Engineering and its role in building stateful AI agents
- Implement Sessions to manage conversation history and working memory
- Build Memory systems for long-term persistence across multiple sessions
- Apply memory generation techniques (extraction and consolidation)
- Implement memory retrieval strategies for relevant context
- Handle multi-agent systems with shared and separate session histories
- Optimize long-context conversations through compaction strategies
- Deploy production-ready session and memory management

## Key Concepts

### Context Engineering

**Definition:** The process of dynamically assembling and managing information within an LLM's context window to enable stateful, intelligent agents

**Evolution:** Evolved from Prompt Engineering (static system instructions) to dynamic, state-aware prompt construction

**Analogy:** Like mise en place for a chef - gathering and preparing all ingredients before cooking to ensure excellent results

#### Components

**Context to Guide Reasoning:**
- System Instructions: High-level directives defining agent's persona and capabilities
- Tool Definitions: Schemas for APIs or functions the agent can use
- Few-Shot Examples: Curated examples for in-context learning

**Evidential & Factual Data:**
- Long-Term Memory: Persisted knowledge across multiple sessions
- External Knowledge: Information from RAG databases or documents
- Tool Outputs: Data returned by tool calls
- Sub-Agent Outputs: Results from specialized agents
- Artifacts: Non-textual data (files, images)

**Immediate Conversational Information:**
- Conversation History: Turn-by-turn record of current interaction
- State/Scratchpad: Temporary in-progress information
- User's Prompt: The immediate query to address

#### Lifecycle
1. Fetch Context: Retrieve user memories, RAG documents, recent events
2. Prepare Context: Dynamically construct the full prompt (blocking process)
3. Invoke LLM and Tools: Iteratively call LLM and tools until response generated
4. Upload Context: Persist new information to storage (background process)
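
The four lifecycle steps can be sketched as a single turn handler. This is a minimal illustration rather than an ADK API: `fetch_context`, `prepare_context`, `upload_context`, and `handle_turn` are hypothetical names, the stores are plain dictionaries, the LLM is passed in as a callable, and a thread stands in for the background persistence step.

```python
import threading
from typing import Callable

# Hypothetical in-memory stores standing in for real memory and session storage.
MEMORIES: dict[str, list[str]] = {"u1": ["Prefers vegetarian food"]}
HISTORY: dict[str, list[str]] = {}


def fetch_context(user_id: str) -> dict:
    """Step 1: retrieve memories and recent events for this user."""
    return {
        "memories": MEMORIES.get(user_id, []),
        "history": list(HISTORY.get(user_id, [])),
    }


def prepare_context(system: str, ctx: dict, user_prompt: str) -> str:
    """Step 2 (blocking): assemble the full prompt from its parts."""
    memories = "\n".join(f"* {m}" for m in ctx["memories"])
    history = "\n".join(ctx["history"])
    return (
        f"{system}\n<MEMORIES>\n{memories}\n</MEMORIES>\n"
        f"{history}\nuser: {user_prompt}"
    )


def upload_context(user_id: str, user_prompt: str, reply: str) -> None:
    """Step 4: persist the new turn."""
    HISTORY.setdefault(user_id, []).extend(
        [f"user: {user_prompt}", f"agent: {reply}"]
    )


def handle_turn(
    user_id: str, user_prompt: str, call_llm: Callable[[str], str]
) -> tuple[str, threading.Thread]:
    ctx = fetch_context(user_id)
    prompt = prepare_context("You are a helpful assistant.", ctx, user_prompt)
    # Step 3: a real agent would loop over LLM and tool calls here.
    reply = call_llm(prompt)
    # Persist in the background so the reply is not delayed.
    worker = threading.Thread(
        target=upload_context, args=(user_id, user_prompt, reply)
    )
    worker.start()
    return reply, worker
```

Returning the worker thread lets a caller (or a test) wait for persistence to finish; a production system would more likely hand this step to a queue or a managed service.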

### Sessions

**Definition:** A container for an entire conversation with an agent, holding chronological history and working memory

**Characteristics:**
- Self-contained record tied to a specific user
- Functions as temporary 'workbench' for single conversation
- Contains Events (building blocks) and State (working memory)
- Requires persistent storage for production (e.g., Agent Engine Sessions)

**Event Types:**
- User Input: Messages from user (text, audio, image, etc.)
- Agent Response: Agent's reply to user
- Tool Call: Agent's decision to use external tool/API
- Tool Output: Data returned from tool call

**State Definition:** Structured 'working memory' or scratchpad holding temporary data like shopping cart items

**Storage Requirements:**
- **Development:** In-memory storage acceptable
- **Production:** Robust databases (Agent Engine Sessions, Spanner, Redis)
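
To make the split between events and state concrete, here is a minimal sketch of a session container. The `Session` and `Event` classes below are hypothetical illustrations, not the ADK types, and everything lives in memory, which suits development only.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

EventType = Literal["user_input", "agent_response", "tool_call", "tool_output"]


@dataclass
class Event:
    event_type: EventType
    content: Any


@dataclass
class Session:
    session_id: str
    user_id: str
    events: list[Event] = field(default_factory=list)    # chronological history
    state: dict[str, Any] = field(default_factory=dict)  # working memory

    def append(self, event_type: EventType, content: Any) -> None:
        """Events are only ever appended, which keeps the history ordered."""
        self.events.append(Event(event_type, content))


session = Session(session_id="s1", user_id="u1")
session.append("user_input", "Add two apples to my cart")
session.append("tool_call", {"name": "add_to_cart", "args": {"product": "apple", "quantity": 2}})
session.state["cart"] = [{"product": "apple", "quantity": 2}]
session.append("tool_output", {"status": "success"})
session.append("agent_response", "Added 2 apples.")
```

The history records what happened, while `state` holds only the current working data (here, the cart).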

### Memory

**Definition:** The mechanism for long-term persistence, capturing and consolidating key information across multiple sessions

**Analogy:** Like an organized filing cabinet - reviewing desk materials, discarding drafts, filing critical documents

**Key Capabilities:**
- Personalization: Remember user preferences and past interactions
- Context Window Management: Compact history via summaries and facts
- Data Mining: Extract insights from conversations
- Agent Self-Improvement: Learn from previous runs via procedural memories

#### Memory vs RAG

| Aspect              | Memory                                                                 | RAG                                                                 |
|---------------------|------------------------------------------------------------------------|---------------------------------------------------------------------|
| **Goal**            | Create personalized, stateful experience                               | Inject external, factual knowledge                                  |
| **Data Source**     | Dialogue between user and agent                                        | Static, pre-indexed knowledge base                                  |
| **Isolation**       | Highly isolated per-user                                               | Generally shared across users                                       |
| **Information Type**| Dynamic, user-specific, uncertain                                      | Static, factual, authoritative                                      |
| **Write Pattern**   | Event-based processing (every turn or session end)                     | Batch processing (offline)                                          |
| **Read Pattern**    | Memory-as-a-tool or static retrieval                                   | Retrieved as-a-tool when needed                                     |
| **Data Format**     | Natural language snippet or structured profile                         | Natural-language chunk                                              |
| **Preparation**     | Extraction and consolidation                                           | Chunking and indexing                                               |

**Analogy:** RAG is research librarian (expert on facts), Memory is personal assistant (expert on user)

### Memory Types

#### By Information Type

**Declarative:**
- Definition: Knowledge of facts, figures, and events (knowing what)
- Examples: User preferences, Past interactions, Entity facts

**Procedural:**
- Definition: Knowledge of skills and workflows (knowing how)
- Examples: Tool call sequences, Successful strategies, Best practices playbooks

#### By Organization
- **Collections:** Multiple self-contained natural language memories per user
- **Structured Profile:** Set of core facts like contact card (quick lookups)
- **Rolling Summary:** Single evolving summary of entire user-agent relationship

#### By Storage
- **Vector Databases:** Semantic similarity search for conceptually similar memories
- **Knowledge Graphs:** Network of entities and relationships for complex connections
- **Hybrid:** Combines both for structured reasoning and semantic search

#### By Creation
- **Explicit:** User directly commands agent to remember (e.g., 'Remember my anniversary')
- **Implicit:** Agent infers and extracts from conversation without direct command

#### By Scope
- **User Level:** Tied to specific user ID, persists across all sessions
- **Session Level:** Compaction of single conversation, isolated to that session
- **Application Level:** Accessible by all users (e.g., procedural memories)

### Memory Generation

**Definition:** LLM-driven ETL pipeline that transforms raw conversational data into structured insights

#### Stages
1. Ingestion: Client provides raw data (conversation history)
2. Extraction & Filtering: LLM extracts meaningful content matching topic definitions
3. Consolidation: Self-editing process comparing new info with existing memories
4. Storage: Persist to durable storage (vector database or knowledge graph)

#### Extraction Methods
- **Schema Based:** LLM follows predefined JSON schema using structured output
- **Natural Language:** LLM guided by natural language topic descriptions
- **Few-Shot:** LLM learns from examples showing ideal memory extraction

#### Consolidation Operations
- UPDATE: Modify existing memory with new/corrected information
- CREATE: Add entirely novel memory unrelated to existing ones
- DELETE/INVALIDATE: Remove incorrect or irrelevant old memories
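
The three operations can be sketched as edits applied to a simple memory store. This assumes the consolidation LLM has already decided which operations to emit; the `Operation` class and `apply_operations` function are hypothetical names, and the store is a plain dictionary of memory ID to fact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operation:
    action: str                 # "CREATE", "UPDATE", or "DELETE"
    memory_id: str
    fact: Optional[str] = None  # unused for DELETE


def apply_operations(store: dict[str, str], ops: list[Operation]) -> dict[str, str]:
    """Return a new store with the consolidation operations applied in order."""
    result = dict(store)  # leave the caller's store untouched
    for op in ops:
        if op.action == "CREATE":
            if op.memory_id in result:
                raise ValueError(f"memory {op.memory_id} already exists")
            result[op.memory_id] = op.fact
        elif op.action == "UPDATE":
            if op.memory_id not in result:
                raise KeyError(op.memory_id)
            result[op.memory_id] = op.fact
        elif op.action == "DELETE":
            del result[op.memory_id]  # raises KeyError if the memory is missing
        else:
            raise ValueError(f"unknown action: {op.action}")
    return result


store = {"m1": "Favorite color is blue", "m2": "Lives in Paris"}
updated = apply_operations(store, [
    Operation("UPDATE", "m1", "Favorite color is green"),
    Operation("CREATE", "m3", "Sister Sarah lives in Boston"),
    Operation("DELETE", "m2"),
])
```

Working on a copy means a failed batch never leaves the store half-updated, which matters once several consolidation jobs run concurrently.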

#### Consolidation Challenges
- Information Duplication: Same fact mentioned multiple ways
- Conflicting Information: User's state changes over time
- Information Evolution: Simple facts become more nuanced
- Memory Relevance Decay: Memories lose relevance over time, so stale ones need proactive pruning

### Memory Provenance

**Definition:** Detailed record of memory's origin and history to assess trustworthiness

**Importance:** Critical for evaluating memory quality during consolidation and inference

**Source Types:**
- Bootstrapped Data: Pre-loaded from internal systems (high trust)
- User Input: Explicit (high trust) or implicit from conversation (lower trust)
- Tool Output: Data returned by external tools (discouraged as a memory source because it tends to be brittle and goes stale quickly)

**Conflict Resolution Strategies:**
- Prioritize most trusted source
- Favor most recent information
- Look for corroboration across multiple data points

**Confidence Factors:**
- **Increases:** Multiple trusted sources provide consistent information
- **Decreases:** Time-based decay, contradictory information introduced
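
One way to combine source trust, time-based decay, and corroboration into a single confidence value is sketched below. The trust values, the 180-day half-life, and the function names are all assumptions chosen for illustration; they are not part of any library.

```python
# Hypothetical base trust per source type; tune for your application.
SOURCE_TRUST = {
    "bootstrapped": 0.9,
    "user_explicit": 0.9,
    "user_implicit": 0.6,
    "tool_output": 0.3,
}


def confidence(source: str, age_days: float, corroborations: int = 0,
               half_life_days: float = 180.0) -> float:
    """Trust decays with age and rises with independent corroboration."""
    decay = 0.5 ** (age_days / half_life_days)
    base = SOURCE_TRUST[source] * decay
    # Each corroborating source closes half of the remaining gap to 1.0.
    return 1.0 - (1.0 - base) * (0.5 ** corroborations)


def resolve_conflict(candidates: list[dict]) -> dict:
    """Pick the conflicting memory with the highest confidence."""
    return max(
        candidates,
        key=lambda m: confidence(m["source"], m["age_days"], m.get("corroborations", 0)),
    )


winner = resolve_conflict([
    {"fact": "Favorite color is blue", "source": "user_explicit", "age_days": 400},
    {"fact": "Favorite color is green", "source": "user_implicit", "age_days": 1},
])
```

With these assumed weights, a recent implicit statement outranks an explicit one that is over a year old, which reflects the "favor most recent information" strategy without ignoring source trust.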

### Memory Retrieval

**Scoring Dimensions:**
- Relevance (Semantic Similarity): Conceptual relation to current conversation
- Recency (Time-based): How recently the memory was created
- Importance (Significance): Overall criticality of memory

#### Advanced Techniques
- **Query Rewriting:** LLM rewrites ambiguous input or expands to multiple queries
- **Reranking:** Initial broad search, then LLM re-evaluates top results
- **Specialized Retriever:** Train custom retriever via fine-tuning

#### Timing Strategies
- **Proactive Retrieval:** Auto-load at start of every turn (may add unnecessary latency)
- **Reactive Retrieval:** Memory-as-a-tool where agent decides when to retrieve (more efficient)

### Inference with Memories

#### Placement Options

**System Instructions:**
- Method: Append memories to system prompt with preamble
- Advantages: High authority, Clean separation, Ideal for stable global info
- Disadvantages: Risk of over-influence, Requires dynamic prompt construction, Incompatible with memory-as-a-tool

**Conversation History:**
- Method: Inject directly into turn-by-turn dialogue
- Advantages: Works with memory-as-a-tool, Supports multimodal content
- Disadvantages: Noisy/increases tokens, Risk of dialogue injection, Needs careful perspective management

**Best Practice:** Hybrid strategy - system prompt for stable user profile, dialogue injection for transient episodic memories

## Multi-Agent Sessions

### Shared Unified History
- **Description:** All agents read/write to same single conversation history
- **Use Case:** Tightly coupled collaborative tasks requiring single source of truth
- **Example:** Multi-step problem-solving where one agent's output is direct input for next
- **ADK Implementation:** LLM-driven delegation writes sub-agent events to same session

### Separate Individual Histories
- **Description:** Each agent maintains private conversation history
- **Use Case:** Loosely coupled systems where agents function as black boxes
- **Communication:** Via Agent-as-a-tool or Agent-to-Agent (A2A) Protocol
- **Challenge:** Framework-specific schemas create isolation between different frameworks

### Interoperability Solution
- **Problem:** Session stores couple database schema to framework's internal objects
- **Solution:** Abstract shared knowledge into framework-agnostic Memory layer
- **Benefit:** Memory stores processed, canonical information as strings/dictionaries

## Session Compaction

### Motivation
- Context Window Limits: LLMs have maximum text they can process
- API Costs: Charges based on token count sent/received
- Latency: More text takes longer to process
- Quality: Performance degrades with increased noise and autoregressive errors

### Strategies

| Name                  | Description                                      | Pros                  | Cons                              |
|-----------------------|--------------------------------------------------|-----------------------|-----------------------------------|
| **Keep Last N Turns** | Sliding window - keep most recent N turns, discard older | Simple implementation | Loses potentially important context |
| **Token-Based Truncation** | Count tokens backward from latest, include up to limit | Precise token management | Arbitrary cutoff may lose context |
| **Recursive Summarization** | Replace older messages with AI-generated summary | Preserves key information while reducing tokens | Expensive (additional LLM calls) |
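
The first two strategies in the table are simple enough to sketch directly. The function names are hypothetical, and the default token counter just splits on whitespace, which is a crude stand-in for a real tokenizer.

```python
from typing import Callable


def keep_last_n(messages: list[str], n: int) -> list[str]:
    """Sliding window: keep only the most recent n messages."""
    return messages[-n:] if n > 0 else []


def truncate_by_tokens(
    messages: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda m: len(m.split()),
) -> list[str]:
    """Walk backward from the latest message, keeping messages until the budget is hit."""
    kept: list[str] = []
    total = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break  # everything older than this point is dropped
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


messages = ["a b c", "d e", "f"]
recent = keep_last_n(messages, 2)
within_budget = truncate_by_tokens(messages, max_tokens=3)
```

Both functions return a new list and leave the stored history untouched, so the full record remains available for later summarization or memory generation.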

### Trigger Mechanisms
- Count-Based: Token size or turn count threshold
- Time-Based: Lack of activity for set period
- Event-Based: Semantic/task completion detected
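
The three trigger types can be checked in one small function. This is a sketch with assumed thresholds; `compaction_trigger` and its parameters are hypothetical names, and detecting task completion is left to the caller.

```python
from typing import Optional


def compaction_trigger(
    turn_count: int,
    token_count: int,
    idle_seconds: float,
    task_completed: bool,
    *,
    max_turns: int = 20,
    max_tokens: int = 8000,
    max_idle_seconds: float = 900.0,
) -> Optional[str]:
    """Return which trigger fired ('count', 'time', 'event'), or None."""
    if turn_count >= max_turns or token_count >= max_tokens:
        return "count"
    if idle_seconds >= max_idle_seconds:
        return "time"
    if task_completed:
        return "event"
    return None
```

Returning the trigger name rather than a boolean makes it easy to log why a compaction ran.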

### Best Practices
- Perform expensive operations asynchronously in background
- Persist compaction results to avoid repeated computation
- Track which events a summary already covers so they are not sent to the LLM again
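
A minimal sketch of recursive summarization that follows these practices: the summary is stored alongside a count of the events it covers, so only new events are ever sent to the summarizer. The `Compaction` class and both functions are hypothetical, and `summarize` is an injected callable standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Compaction:
    summary: str = ""
    covered: int = 0  # number of leading events already folded into the summary


def compact(
    events: list[str],
    state: Compaction,
    summarize: Callable[[str, list[str]], str],
    keep_recent: int = 2,
) -> Compaction:
    """Fold events older than the last keep_recent into the running summary."""
    cutoff = max(len(events) - keep_recent, 0)
    if cutoff <= state.covered:
        return state  # nothing new to summarize; reuse the persisted result
    new_events = events[state.covered:cutoff]
    return Compaction(summary=summarize(state.summary, new_events), covered=cutoff)


def build_context(events: list[str], state: Compaction) -> list[str]:
    """Summary (if any) followed by the events it does not cover."""
    prefix = [f"[summary] {state.summary}"] if state.summary else []
    return prefix + events[state.covered:]
```

In production the `compact` call would run in a background job and the resulting `Compaction` would be persisted with the session.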

## Production Considerations

### Sessions

**Security & Privacy:**
- Strict Isolation: One user can never access another's session (ACLs)
- PII Redaction: Redact before writing to storage (use Model Armor)
- Authentication: Every request must be authenticated/authorized

**Data Integrity:**
- Time-to-Live (TTL): Auto-delete inactive sessions
- Data Retention Policy: Define how long sessions kept before archival
- Deterministic Order: Guarantee operations appended chronologically

**Performance:**
- Fast Read/Write: Session data on hot path of every interaction
- Minimize Transfer Size: Filter/compact history before sending
- Network Latency: Stateless runtimes retrieve from central database

### Memory

**Architecture:**
- Decoupled Service: Memory processing separate from main application
- Non-blocking API Calls: Agent pushes data, service acknowledges immediately
- Internal Queue: Service manages background processing
- Persistent Storage: Dedicated durable database for memories

**Scalability:**
- Race Condition Prevention: Transactional operations or optimistic locking
- Message Queue: Buffer high-volume events
- Failure Handling: Retry with exponential backoff, dead-letter queue
- Multi-region: Built-in replication for global applications
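
The failure-handling bullet can be sketched as a retry loop with exponential backoff that parks events in a dead-letter list once attempts are exhausted. `process_with_retry` is a hypothetical helper; the `sleep` parameter is injectable so the delays can be observed in tests, and a real service would use a durable queue rather than a Python list.

```python
import time
from typing import Any, Callable


def process_with_retry(
    event: Any,
    handler: Callable[[Any], None],
    dead_letters: list,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try handler(event); back off exponentially; dead-letter on final failure."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt < max_attempts - 1:
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * 2 ** attempt)
    dead_letters.append(event)  # keep the event for inspection or replay
    return False
```

Adding random jitter to each delay is a common refinement that avoids many workers retrying in lockstep.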

**Security:**
- Data Isolation: Strict per-user/tenant memory separation with ACLs
- User Control: Opt-out of memory generation, request data deletion
- PII Redaction: Sanitize before committing to memory
- Memory Poisoning Prevention: Validate and sanitize via Model Armor
- Exfiltration Risk: Anonymize shared memories (e.g., procedural)

## Evaluation Metrics

### Memory Quality
- **Precision:** Of all memories created, percentage that are accurate/relevant
- **Recall:** Of all relevant facts, percentage captured
- **F1 Score:** Harmonic mean of precision and recall
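
These three metrics can be computed from two sets: the memories the system generated and the facts a reviewer marked as relevant. The sketch below assumes facts have been normalized so that exact string matching is meaningful; in practice an LLM judge or human reviewer usually decides whether two facts are equivalent. `memory_quality` is a hypothetical helper name.

```python
def memory_quality(generated: set[str], relevant: set[str]) -> dict[str, float]:
    """Precision, recall, and F1 for generated memories against a gold set."""
    true_positives = len(generated & relevant)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


metrics = memory_quality(
    generated={"likes pasta", "is vegetarian", "owns a cat", "lives in Oslo"},
    relevant={"likes pasta", "is vegetarian", "prefers morning meetings"},
)
```

Here two of the four generated memories are relevant (precision 0.5) and two of the three relevant facts were captured (recall 2/3).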

### Retrieval Performance
- **Recall@K:** Is correct memory found within top K retrieved results?
- **Latency:** Retrieval must execute within strict budget (e.g., <200ms)
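
Recall@K over a labeled evaluation set can be computed as below. This sketch assumes each query has exactly one expected memory ID and that the retriever returns IDs in ranked order; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(results: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of queries whose expected memory appears in the top k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for retrieved, target in zip(results, expected) if target in retrieved[:k]
    )
    return hits / len(expected)


ranked_results = [["m1", "m7", "m3"], ["m4", "m2", "m9"], ["m5", "m6", "m8"]]
expected_ids = ["m1", "m2", "m0"]
```

For this data, `recall_at_k(ranked_results, expected_ids, k=1)` counts only the first query as a hit, while `k=2` also counts the second.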

### Task Success
- **Method:** LLM judge compares agent's output to golden answer
- **Measures:** How well memory contributed to final outcome

### Continuous Improvement
1. Establish baseline metrics
2. Analyze failures systematically
3. Tune system (prompts, algorithms)
4. Re-evaluate to measure impact

## Code Examples

### Session Truncation (ADK)

**Description:** Keep last N turns without modifying stored session

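```python
from google.adk.apps.app import App
from google.adk.plugins.context_filter_plugin import ContextFilterPlugin

# `agent` is the root agent, defined elsewhere.
app = App(
    name='hello_world_app',
    root_agent=agent,
    plugins=[
        # Only the 10 most recent invocations are sent to the model;
        # the stored session itself is not modified.
        ContextFilterPlugin(num_invocations_to_keep=10)
    ]
)
```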

### Session Compaction (ADK)

**Description:** LLM-based summarization after N turns

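```python
from google.adk.apps.app import App, EventsCompactionConfig

# `agent` is the root agent, defined elsewhere.
app = App(
    name='hello_world_app',
    root_agent=agent,
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=5,  # summarize after every 5 invocations
        overlap_size=1          # carry 1 invocation over into the next summary
    )
)
```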

### Memory Generation (Multimodal)

**Description:** Generate memories from text and multimodal input

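```python
from google.genai import types

# `client`, `agent_engine_name`, `user_id`, BYTES, and MIME are defined elsewhere.
response = client.agent_engines.memories.generate(
    name=agent_engine_name,
    direct_contents_source={
        'events': [{
            'content': types.Content(
                role='user',
                parts=[
                    # Part.from_text takes its argument by keyword.
                    types.Part.from_text(text='Context about input'),
                    types.Part.from_bytes(data=BYTES, mime_type=MIME),
                    types.Part.from_uri(file_uri='path', mime_type=MIME)
                ]
            )
        }]
    },
    # Memories are stored under this scope and retrieved with the same one.
    scope={'user_id': user_id}
)
```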

### Memory Retrieval (Proactive)

**Description:** Automatically retrieve memories at start of every turn

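```python
from google.adk.agents import LlmAgent
from google.adk.tools.preload_memory_tool import PreloadMemoryTool

agent = LlmAgent(
    name='memory_agent',  # placeholder name; model and instruction omitted for brevity
    tools=[PreloadMemoryTool()]
)
```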

### Memory Retrieval (Reactive)

**Description:** Agent decides when to retrieve memories

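```python
from google.adk.agents import LlmAgent
from google.adk.tools import ToolContext


async def load_memory(query: str, tool_context: ToolContext):
    '''Retrieves memories. Available info: user preferences, favorites...'''
    # search_memory is a coroutine in ADK, so it must be awaited.
    response = await tool_context.search_memory(query)
    return response.memories


agent = LlmAgent(
    name='memory_agent',  # placeholder name; model and instruction omitted for brevity
    tools=[load_memory]
)
```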

### Memory in System Instructions

**Description:** Dynamically add memories to system prompt

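```python
from jinja2 import Template

# The "-" in "-%}" strips the following newline so that each memory
# renders as its own bullet line without blank lines in between.
template = Template('''
{{ system_instructions }}
<MEMORIES>
{% for memory in data -%}
* {{ memory.memory.fact }}
{% endfor -%}
</MEMORIES>
''')

# `instructions` and `retrieved_memories` are defined elsewhere.
prompt = template.render(
    system_instructions=instructions,
    data=retrieved_memories
)
```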

## Task A: Build a Session-Based Shopping Assistant with Compaction

**Objective:** Create an agent that maintains shopping cart state within a session and implements conversation compaction

**Difficulty:** Intermediate

**Estimated Time:** 45-60 minutes

### Requirements
- Build an agent that helps users shop for products
- Implement session state to track shopping cart items
- Add products to cart, remove products, view cart
- Implement token-based or turn-based session compaction
- Test with a long conversation (10+ turns)
- Verify compaction preserves cart state while reducing history

### Implementation Steps

#### Step 1: Define Cart Management Tools
- Create add_to_cart(product_name, quantity) function
- Create remove_from_cart(product_name) function
- Create view_cart() function
- All functions should accept tool_context parameter
- Access session state via tool_context to read/write cart data

```python
from google.adk.tools import ToolContext


def add_to_cart(product_name: str, quantity: int, tool_context: ToolContext) -> dict:
    '''Adds product to shopping cart'''
    # Read the cart from session state; copy it so the update is an explicit write
    cart = list(tool_context.state.get('cart', []))

    # Add product
    cart.append({'product': product_name, 'quantity': quantity})

    # Write the cart back so the change is recorded in session state
    tool_context.state['cart'] = cart

    return {'status': 'success', 'cart': cart}
```
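
The other two tools from Step 1 could follow the same pattern. This is a sketch that assumes session state is exposed as a dict-like `tool_context.state`; the type hint is omitted so the functions can be exercised locally with a stand-in object (`fake_context` below is a test double, not part of ADK).

```python
from types import SimpleNamespace


def remove_from_cart(product_name: str, tool_context) -> dict:
    '''Removes every entry for the product from the shopping cart'''
    cart = tool_context.state.get('cart', [])
    remaining = [item for item in cart if item['product'] != product_name]
    if len(remaining) == len(cart):
        # Nothing matched, so leave state untouched and tell the agent
        return {'status': 'not_found', 'cart': cart}
    tool_context.state['cart'] = remaining
    return {'status': 'success', 'cart': remaining}


def view_cart(tool_context) -> dict:
    '''Returns the current cart contents'''
    return {'status': 'success', 'cart': tool_context.state.get('cart', [])}


# Local stand-in for a tool context, useful for unit tests.
fake_context = SimpleNamespace(state={'cart': [{'product': 'apple', 'quantity': 2}]})
```

Returning a `not_found` status instead of raising gives the model something it can relay to the user, such as asking which product they meant.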

#### Step 2: Create Shopping Agent
- Initialize LlmAgent with Gemini model
- Add cart management functions to tools list
- Write clear instructions for handling shopping requests
- Include guidance on when to use each tool

#### Step 3: Configure Session with Compaction
- Wrap agent in App with resumability enabled
- Add ContextFilterPlugin to keep last N turns (e.g., 5)
- OR configure EventsCompactionConfig for summarization
- Create Runner with session_service

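```python
from google.adk.apps.app import App, ResumabilityConfig
from google.adk.plugins.context_filter_plugin import ContextFilterPlugin

# `shopping_agent` is the LlmAgent created in Step 2.
app = App(
    name='shopping_assistant',
    root_agent=shopping_agent,
    resumability_config=ResumabilityConfig(is_resumable=True),
    plugins=[
        # Send only the 5 most recent invocations to the model
        ContextFilterPlugin(num_invocations_to_keep=5)
    ]
)
```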

#### Step 4: Test Long Conversation
- Generate session_id for conversation
- Add 5+ products to cart in separate turns
- Remove some products
- View cart multiple times
- Ask questions about products
- Verify cart state persists across all turns

#### Step 5: Verify Compaction
- Inspect session history after conversation
- Confirm older events are compacted/summarized
- Verify cart state remains accurate
- Test that agent can still access cart correctly

### Validation Criteria
- Cart state persists across entire conversation
- Session history shows compaction after threshold
- Agent correctly references cart contents after compaction
- No loss of critical cart data
- Token count reduced compared to full history

### Bonus Challenges
- Implement cart total calculation tool
- Add product search with mock inventory
- Implement checkout process with confirmation
- Add session timeout handling
- Persist cart to external storage for recovery

## Task B: Build a Memory-Enabled Personal Assistant with Custom Topics

**Objective:** Create an agent that generates, stores, and retrieves personalized memories using custom memory topics

**Difficulty:** Advanced

**Estimated Time:** 60-90 minutes

### Requirements
- Set up Agent Engine Memory Bank or equivalent memory service
- Define custom memory topics relevant to personal assistance
- Implement memory generation from conversations
- Build memory retrieval (proactive or reactive)
- Test personalization across multiple sessions
- Implement memory consolidation to handle updates

### Implementation Steps

#### Step 1: Setup Memory Bank
- Create Agent Engine in Google Cloud Vertex AI
- Configure memory_bank_config with custom topics
- Define custom topics: preferences, schedule, relationships, projects
- Provide few-shot examples for each topic type
- Initialize VertexAiMemoryBankService in ADK

```python
memory_bank_config = {
    'customization_configs': [{
        'memory_topics': [
            {'custom_memory_topic': {
                'label': 'user_preferences',
                'description': 'User likes, dislikes, favorites'
            }},
            {'custom_memory_topic': {
                'label': 'user_schedule',
                'description': 'Meetings, appointments, availability'
            }}
        ],
        'generate_memories_examples': {
            'conversationSource': {'events': [...]},
            'generatedMemories': [{'fact': '...'}]
        }
    }]
}
```

#### Step 2: Create Memory Generation Tool
- Define generate_memories(tool_context) function
- Extract session from tool_context
- Call memory_service.add_session_to_memory(session)
- Keep generation in the background; when calling the Memory Bank SDK directly, set wait_for_completion=False in the request config
- Return success status to agent

```python
from google.adk.tools import ToolContext


async def generate_memories(tool_context: ToolContext):
    '''Saves important info about the user for future conversations'''
    # add_session_to_memory is a coroutine in ADK, so it must be awaited.
    await tool_context._invocation_context.memory_service.add_session_to_memory(
        tool_context._invocation_context.session
    )
    return {'status': 'memory_generation_triggered'}
```

#### Step 3: Create Memory Retrieval Strategy
- OPTION A: Use PreloadMemoryTool for automatic retrieval
- OPTION B: Create custom load_memory(query) tool
- Tool should search memories by semantic similarity
- Return formatted memory facts to agent
- Document available memory types in tool description

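```python
from google.adk.tools import ToolContext


async def load_memory(query: str, tool_context: ToolContext):
    '''Retrieves what you know about the user. Topics: preferences, schedule, relationships, projects'''
    # search_memory is a coroutine in ADK, so it must be awaited.
    response = await tool_context.search_memory(query)
    # ADK returns MemoryEntry objects whose text lives in content.parts.
    return [
        {'fact': part.text}
        for m in response.memories
        for part in m.content.parts
        if part.text
    ]
```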

#### Step 4: Build Personal Assistant Agent
- Create LlmAgent with both memory tools
- Write instructions to use memories for personalization
- Include guidance on when to generate new memories
- Add instructions for when to retrieve memories
- Configure Runner with memory_service

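```python
from google.adk.memory import VertexAiMemoryBankService
from google.adk.runners import Runner

# `app`, `session_service`, AGENT_ENGINE_ID, PROJECT, and LOCATION are defined elsewhere.
runner = Runner(
    app=app,
    session_service=session_service,
    memory_service=VertexAiMemoryBankService(
        agent_engine_id=AGENT_ENGINE_ID,
        project=PROJECT,
        location=LOCATION
    )
)
```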

#### Step 5: Test Memory Lifecycle
- SESSION 1: User shares preferences (favorite food, meeting time)
- Trigger memory generation at end of session
- Wait for background generation to complete
- SESSION 2: Ask agent about user preferences
- Verify agent retrieves and uses correct memories
- SESSION 3: User updates preference (changes favorite food)
- Generate memories again
- Verify memory consolidation updated preference

#### Step 6: Test Cross-Session Personalization
- Create new session with same user_id
- Ask agent to make recommendation
- Verify agent uses memories from previous sessions
- Test that agent doesn't hallucinate - only uses stored memories

### Custom Memory Topics

**Example Topics:**
- **user_preferences:** User's likes, dislikes, favorites, and personal preferences
- **user_schedule:** Recurring meetings, appointments, availability patterns, time zones
- **user_relationships:** Important people, family members, colleagues, and their details
- **ongoing_projects:** Current work projects, goals, deadlines, and progress

**Few-Shot Example:**
- Conversation: [{'role': 'model', 'text': "What's your favorite cuisine?"}, {'role': 'user', 'text': 'I love Italian food, especially pasta carbonara'}]
- Expected Memories: [{'fact': "The user's favorite cuisine is Italian."}, {'fact': 'The user especially loves pasta carbonara.'}]

### Validation Criteria
- Memories generated from first conversation
- Memories successfully retrieved in subsequent sessions
- Agent personalizes responses based on memories
- Memory updates correctly handled via consolidation
- No memory leakage between different users
- Background memory generation doesn't block user experience

### Bonus Challenges
- Implement procedural memories for agent's successful strategies
- Add memory confidence scoring based on source type
- Build memory provenance tracking
- Implement memory pruning for stale information
- Create memory visualization dashboard
- Add support for multimodal memory sources (images, audio)
- Implement memory-based proactive suggestions

### Testing Scenarios

| Scenario          | Session 1 Description                                      | Session 2 Description                                      | Session 3 Description                                      |
|-------------------|------------------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------|
| **Preference Learning** | User mentions they're vegetarian and prefer morning meetings | Agent suggests restaurant - should recommend vegetarian options | Agent schedules meeting - should prefer morning times |
| **Memory Update** | User says favorite color is blue                           | User says they changed their mind, favorite is green now   | Agent should remember green, not blue                      |
| **Relationship Context** | User mentions sister Sarah lives in Boston, birthday Nov 15 | Mid-November - agent should proactively mention Sarah's birthday | Planning Boston trip - agent should suggest visiting Sarah |

## Resources

### Documentation
- [Agent Engine Sessions](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/sessions/overview)
- [Agent Engine Memory Bank](https://cloud.google.com/agent-builder/agent-engine/memory-bank/set-up)
- [ADK Memory Guide](https://google.github.io/adk-docs/memory/)
- [ADK Callbacks](https://google.github.io/adk-docs/callbacks/)
- [Model Context Protocol](https://spec.modelcontextprotocol.io/)
- [Agent-to-Agent Protocol](https://agent2agent.info/docs/concepts/message/)

### Papers
- [In-Context Learning](https://arxiv.org/abs/2301.00234)
- [Long Context Limitations](https://ai.google.dev/gemini-api/docs/long-context)
- [Memory Systems](https://huggingface.co/blog/Kseniase/memory)
- [Atomic Facts](https://arxiv.org/pdf/2412.15266)
- [Memory Poisoning](https://arxiv.org/pdf/2503.03704)
- [RLHF on Google Cloud](https://cloud.google.com/blog/products/ai-machine-learning/rlhf-on-google-cloud)

### Videos
- [ADK Runtime Deep Dive](https://www.youtube.com/watch?v=44C8u0CDtSo)

## Best Practices

### Session Management
- Always use persistent storage in production (never in-memory for prod)
- Implement strict ACLs for session isolation between users
- Redact PII before persisting session data
- Set appropriate TTL policies for inactive sessions
- Filter or compact history before sending to agent
- Use deterministic ordering for event appending
- Monitor session read/write latency closely
- Implement proper error handling for session failures

### Memory Generation
- Run memory generation asynchronously in background
- Never block user response waiting for memory generation
- Define clear, specific custom memory topics
- Provide few-shot examples for complex topics
- Trigger generation at appropriate cadence (not every turn)
- Implement proper error handling and retry logic
- Track memory provenance for trustworthiness
- Regularly prune stale or low-confidence memories

### Memory Retrieval
- Combine relevance, recency, and importance in scoring
- Don't rely solely on semantic similarity
- Cache expensive retrieval operations when possible
- Set strict latency budgets for hot-path retrieval
- Consider memory-as-a-tool for efficiency
- Document available memory types in tool descriptions
- Test retrieval with diverse query types
- Monitor retrieval performance metrics continuously

### Memory Security
- Enforce strict per-user memory isolation with ACLs
- Sanitize and validate before persisting memories
- Use Model Armor or similar for PII redaction
- Provide user controls for opt-out and deletion
- Anonymize shared memories (procedural, application-level)
- Audit memory access patterns regularly
- Implement memory poisoning defenses
- Handle data deletion requests properly, including derived data such as generated memories and summaries

### Context Engineering
- Treat context construction as a mise en place process
- Dynamically select only relevant information for context
- Balance information completeness vs. token efficiency
- Use in-context learning with relevant few-shot examples
- Implement context compaction strategies proactively
- Monitor context window usage and costs
- Test with varying context sizes and compositions
- Profile hot-path vs. background operations

## Common Pitfalls

| Pitfall                                      | Solution                                                                 |
|---------------------------------------------|--------------------------------------------------------------------------|
| **Blocking user experience with memory generation** | Always use wait_for_completion=False and run generation in background    |
| **Processing same events multiple times**   | Track which events already contributed to memories, avoid redundant processing |
| **Relying only on semantic similarity for retrieval** | Blend relevance, recency, and importance scores                         |
| **Not handling memory conflicts**           | Implement consolidation with clear conflict resolution strategy          |
| **Storing too much in session state**       | Keep only temporary working memory in state, move facts to long-term memory |
| **Not compacting long conversations**       | Implement token-based or turn-based compaction early                    |
| **Sharing memories across users accidentally** | Always scope memories by user_id, test isolation rigorously              |
| **Treating memories as ground truth**       | Track provenance, confidence scores, validate against authoritative sources |
| **Not pruning stale memories**              | Implement time-based decay and periodic pruning                          |
| **Injecting memories without context**      | Add preambles explaining what the memories represent                     |

## Troubleshooting

### Session Issues
- **Symptom:** Session data not persisting across calls
- **Checks:**
  - Verify persistent storage is configured (not InMemorySessionService)
  - Confirm same session_id used across calls
  - Check database connectivity and permissions
  - Verify session isn't expired via TTL policy

### Memory Generation Failures
- **Symptom:** Memories not being created
- **Checks:**
  - Verify Agent Engine is properly configured
  - Check custom topic definitions are clear
  - Ensure conversation contains relevant information
  - Review memory generation API response for errors
  - Allow enough time for background processing to finish
  - Check IAM permissions for memory service

### Memory Retrieval Empty
- **Symptom:** No memories returned when expected
- **Checks:**
  - Verify memories were actually generated (check database)
  - Confirm user_id matches between generation and retrieval
  - Test with broader/different queries
  - Check if memories were pruned or expired
  - Verify memory scope (user vs session vs application)
  - Confirm the retrieval query is semantically related to the stored memories

### High Latency
- **Symptom:** Agent responses are slow
- **Checks:**
  - Profile where time is spent (context fetch, LLM, memory)
  - Implement session history compaction
  - Use proactive memory retrieval sparingly
  - Cache frequently retrieved memories
  - Optimize memory retrieval query complexity
  - Check database query performance
  - Reduce number of tools/memories in context
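
The first check, profiling where time is spent, can be done with a small timing helper. The sketch below is one possible approach using only the standard library; the stage names and the `timed` and `slowest_stage` helpers are hypothetical and not part of any agent framework.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed time of the wrapped block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def slowest_stage():
    """Return the stage with the largest accumulated time, or None if empty."""
    return max(timings, key=timings.get) if timings else None


# Intended usage (the wrapped calls are placeholders for your own code):
#   with timed("context_fetch"):
#       ...load session history...
#   with timed("memory_retrieval"):
#       ...query the memory store...
#   with timed("llm_call"):
#       ...call the model...
```

Measuring each stage separately shows whether compaction, caching, or query tuning is the fix worth pursuing first.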

### Context Window Overflow
- **Symptom:** Context window exceeded errors
- **Checks:**
  - Configure `ContextFilterPlugin` or `EventsCompactionConfig`
  - Reduce number of turns kept in history
  - Summarize older conversation sections
  - Limit number of memories retrieved
  - Use recursive summarization for long sessions
  - Monitor token counts proactively
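
As a framework-independent illustration of reducing the turns kept in history, the sketch below keeps only the most recent events that fit a token budget. It is an assumption-laden example: the event dictionaries, the `compact_history` name, and the token estimate (derived from the rule of thumb that a token is roughly 3/4 of a word) are all illustrative, and a real system would use the model's tokenizer.

```python
import math


def estimate_tokens(text):
    """Rough token estimate: about 4/3 tokens per whitespace-separated word."""
    return math.ceil(len(text.split()) * 4 / 3)


def compact_history(events, max_tokens):
    """Keep the newest events whose combined estimated tokens fit max_tokens.

    `events` is a chronological list of dicts with a "text" field
    (a hypothetical shape chosen for this example).
    """
    kept = []
    used = 0
    # Walk backwards from the newest event and stop at the first one
    # that would exceed the budget.
    for event in reversed(events):
        cost = estimate_tokens(event["text"])
        if used + cost > max_tokens:
            break
        kept.append(event)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

This is plain truncation. Dropped events are lost unless they are summarized or turned into memories first, which is why summarization is usually combined with it for long sessions.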

## Advanced Topics

### Procedural Memory
**Definition:** Memories that capture how to perform a task, not just what is true

**Lifecycle Differences:**
- Extraction: Requires specialized prompts to distill reusable strategies
- Consolidation: Curates workflows, patches flawed steps, integrates best practices
- Retrieval: Finds plans that guide task execution, not just answer questions

**Comparison to Fine-Tuning:**
- **Fine-Tuning:** Slow, offline training process altering model weights
- **Procedural Memory:** Fast, online adaptation via in-context learning

**Use Cases:**
- Agent builds playbook of successful problem-solving strategies
- Learning optimal tool call sequences
- Capturing domain-specific workflows
- Self-improvement through experience

### Multimodal Memory

**From Multimodal Source:**
- Description: Process images/audio/video to create textual memories
- Example: Transcribe voice memo, extract memory: 'User frustrated about shipping delay'
- Current Standard: Most memory managers use this approach

**With Multimodal Content:**
- Description: Store non-textual media directly in memory
- Example: User uploads logo image, agent stores actual image file
- Challenges: Requires specialized models and infrastructure
- Current Status: Advanced implementation, less common

### Memory Consolidation Strategies
- **Deduplication:** Merge redundant memories that state the same fact in different words
- **Conflict Resolution:** Handle contradictory information with trust hierarchy
- **Evolution:** Update simple facts to more nuanced versions
- **Forgetting:** Prune stale, low-confidence, or irrelevant memories
- **Confidence Scoring:** Dynamic confidence based on corroboration and age
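
The strategies above can be sketched as a small rule-based consolidation step. This is an illustration only: production systems typically let an LLM decide between creating, updating, and deleting, whereas this example uses exact key matching. The `Memory` fields, the `SOURCE_TRUST` ranking, and the confidence increments are assumptions.

```python
from dataclasses import dataclass, replace

# Hypothetical trust hierarchy: a higher number wins a conflict.
SOURCE_TRUST = {"inferred": 0, "user_stated": 1, "system_of_record": 2}


@dataclass(frozen=True)
class Memory:
    key: str          # normalized fact identifier, e.g. "user.diet"
    value: str
    source: str       # must be a key of SOURCE_TRUST
    confidence: float


def consolidate(existing, incoming):
    """Merge incoming memories into existing ones and return the new list."""
    store = {m.key: m for m in existing}
    for new in incoming:
        old = store.get(new.key)
        if old is None:
            store[new.key] = new  # new fact: create
        elif old.value == new.value:
            # Duplicate fact: keep one copy and raise confidence (corroboration).
            store[new.key] = replace(
                old, confidence=min(1.0, old.confidence + 0.1))
        elif SOURCE_TRUST[new.source] >= SOURCE_TRUST[old.source]:
            store[new.key] = new  # conflict: the more trusted source wins
        # Otherwise the existing, more trusted memory is kept unchanged.
    return list(store.values())


def forget(memories, min_confidence=0.3):
    """Drop memories whose confidence has fallen below the threshold."""
    return [m for m in memories if m.confidence >= min_confidence]
```

Keeping the source on each memory is what makes the trust hierarchy workable, which ties consolidation back to provenance tracking.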

### Hybrid Architectures

**RAG + Memory:**
- Pattern: RAG for world facts, Memory for user knowledge
- Analogy: RAG is research librarian, Memory is personal assistant
- Implementation: Both retrieval systems work in parallel, context merges results
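
A minimal sketch of the RAG + Memory pattern, assuming both retrievers are asynchronous: the two lookups run concurrently and their results are merged into labeled context sections. The retriever functions here are hypothetical stand-ins that return canned data, and the preamble wording is only an example.

```python
import asyncio


async def retrieve_documents(query):
    """Stand-in for a RAG lookup over shared, factual knowledge."""
    await asyncio.sleep(0)  # placeholder for network or database I/O
    return ["Return policy: 30 days with receipt."]


async def retrieve_memories(user_id, query):
    """Stand-in for a user-scoped memory lookup."""
    await asyncio.sleep(0)  # placeholder for network or database I/O
    return ["User prefers email over phone."]


async def build_context(user_id, query):
    """Run both retrievals concurrently and merge them into one context string."""
    docs, memories = await asyncio.gather(
        retrieve_documents(query),
        retrieve_memories(user_id, query),
    )
    sections = []
    if memories:
        # A preamble tells the model what these lines are and how far to trust them.
        sections.append(
            "Facts remembered about this user (may be outdated):\n"
            + "\n".join(f"- {m}" for m in memories))
    if docs:
        sections.append(
            "Reference documents:\n" + "\n".join(f"- {d}" for d in docs))
    return "\n\n".join(sections)
```

Calling `asyncio.run(build_context("user-123", "How do I return an item?"))` would yield a two-section context block. Keeping the sections separate lets the model distinguish personal context from general facts.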

**Vector + Graph:**
- Pattern: Vector DB for semantic search, Knowledge Graph for relationships
- Advantage: Structured reasoning AND conceptual similarity
- Use Case: Complex queries requiring both entity relationships and semantic matching

### Multi-Agent Memory Sharing
- **Challenge:** Different frameworks have incompatible session schemas
- **Solution:** Memory as universal translation layer
- **Benefit:** Heterogeneous agents share common cognitive resource
- **Implementation:** All agents connect to same memory service
- **A2A Protocol:** Agent-to-Agent communication for message passing
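
To illustrate the "all agents connect to same memory service" idea, the sketch below shows a framework-neutral store that any agent could write to and read from, regardless of how it keeps its own sessions. The `SharedMemoryService` class, its methods, and the keyword search are assumptions made for the example. A real service would use semantic retrieval and enforce access control.

```python
from collections import defaultdict


class SharedMemoryService:
    """In-process stand-in for a memory service shared by several agents."""

    def __init__(self):
        # Memories are always stored under a user_id to keep users isolated.
        self._by_user = defaultdict(list)

    def add(self, user_id, fact, source_agent):
        """Store a plain-text fact along with the agent that produced it."""
        self._by_user[user_id].append(
            {"fact": fact, "source_agent": source_agent})

    def search(self, user_id, keyword):
        """Return this user's memories containing the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [m for m in self._by_user.get(user_id, [])
                if needle in m["fact"].lower()]
```

Because memories are stored as plain facts rather than framework-specific session events, an agent built on one framework can read what an agent built on another has written.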

## Real-World Examples

### Customer Support
- **Scenario:** Multi-tier support system with escalation
- **Session Strategy:** Shared history across tier agents
- **Memory Strategy:** User-level memories for past issues and preferences
- **Key Features:**
  - Track customer history across all interactions
  - Remember previous issues and resolutions
  - Personalize support based on customer preferences
  - Generate procedural memories for successful resolution patterns

### Personal Assistant
- **Scenario:** Scheduling, reminders, and personalization
- **Session Strategy:** Separate sessions per conversation
- **Memory Strategy:** Rich user profile with preferences, schedule, relationships
- **Key Features:**
  - Remember user preferences and habits
  - Track ongoing projects and goals
  - Maintain relationship context
  - Proactive suggestions based on patterns

### Educational Tutor
- **Scenario:** Personalized learning with progress tracking
- **Session Strategy:** Session per learning session
- **Memory Strategy:** Student profile, knowledge gaps, learning style
- **Key Features:**
  - Track student's knowledge level per topic
  - Remember learning style preferences
  - Adapt difficulty based on past performance
  - Generate procedural memories for effective teaching strategies

### Sales Assistant
- **Scenario:** Lead qualification and relationship management
- **Session Strategy:** Session per sales interaction
- **Memory Strategy:** Client preferences, business needs, interaction history
- **Key Features:**
  - Remember client's business context and pain points
  - Track previous conversations and proposals
  - Personalize pitch based on known preferences
  - Consolidate information across multiple touchpoints

## Quiz Questions

The **Correct** value for each question is the zero-based index of the right option.

1. **Question:** What is the key difference between Context Engineering and Prompt Engineering?
   - Options:
     - Context Engineering is only for production systems
     - Context Engineering dynamically constructs state-aware prompts, while Prompt Engineering focuses on static system instructions
     - Prompt Engineering is more advanced
     - They are the same thing
   - **Correct:** 1
   - **Explanation:** Context Engineering evolved from Prompt Engineering to address the entire payload, dynamically constructing prompts based on user, history, and external data

2. **Question:** What is the primary purpose of a Session in agent systems?
   - Options:
     - Long-term storage of user preferences
     - Container for a single conversation with chronological history and working memory
     - External API integration
     - Model fine-tuning data
   - **Correct:** 1
   - **Explanation:** A Session encapsulates the immediate dialogue history and working memory for a single, continuous conversation tied to a specific user

3. **Question:** How does Memory differ from RAG?
   - Options:
     - Memory is for user-specific dynamic context, RAG is for shared factual knowledge
     - RAG is faster than Memory
     - Memory can't handle multimodal data
     - RAG is always more accurate
   - **Correct:** 0
   - **Explanation:** Memory creates personalized, stateful experiences from user dialogue, while RAG injects external, factual knowledge from static knowledge bases

4. **Question:** What are the two main stages of memory generation?
   - Options:
     - Reading and Writing
     - Extraction and Consolidation
     - Indexing and Querying
     - Training and Inference
   - **Correct:** 1
   - **Explanation:** Memory generation involves Extraction (filtering meaningful content) and Consolidation (merging, updating, or deleting memories to maintain coherence)

5. **Question:** Why should memory generation run asynchronously?
   - Options:
     - It's cheaper
     - It's more accurate
     - To avoid blocking user response with expensive LLM calls and database writes
     - It's a requirement of all memory systems
   - **Correct:** 2
   - **Explanation:** Memory generation is expensive and should run in the background after the agent responds to keep the user experience fast and responsive

6. **Question:** What is memory consolidation?
   - Options:
     - Compressing memories to save storage space
     - Self-editing process that merges, updates, or deletes memories to maintain coherent knowledge
     - Moving memories to cold storage
     - Converting text memories to embeddings
   - **Correct:** 1
   - **Explanation:** Consolidation is the sophisticated stage where new memories are integrated with existing ones through UPDATE, CREATE, or DELETE operations

7. **Question:** What is the recommended approach for handling long conversations?
   - Options:
     - Always send full history to maintain perfect context
     - Implement compaction strategies like summarization or truncation
     - Restart the conversation when it gets too long
     - Increase the context window indefinitely
   - **Correct:** 1
   - **Explanation:** Compaction strategies (token-based truncation, recursive summarization) reduce token count while preserving vital information

8. **Question:** What is memory provenance and why does it matter?
   - Options:
     - The location where memory is stored
     - Detailed record of memory's origin and history used to assess trustworthiness
     - The time when memory was created
     - The format of the memory data
   - **Correct:** 1
   - **Explanation:** Provenance tracks memory sources and history, enabling trust assessment, conflict resolution, and proper handling of derived data

9. **Question:** What's the difference between proactive and reactive memory retrieval?
   - Options:
     - Proactive is faster
     - Proactive auto-loads at every turn, reactive lets agent decide when to retrieve
     - Reactive is more accurate
     - They are the same
   - **Correct:** 1
   - **Explanation:** Proactive retrieval automatically loads memories every turn; reactive (memory-as-a-tool) lets the agent decide when retrieval is needed

10. **Question:** Why can't different agent frameworks directly share sessions?
    - Options:
      - Security restrictions
      - Different frameworks use incompatible internal schemas for storing session events
      - Sessions can only be stored locally
      - It's a technical limitation of LLMs
    - **Correct:** 1
    - **Explanation:** Each framework's session storage couples the database schema to its internal objects, making sessions framework-specific and non-portable

## Glossary

| Term                  | Definition                                                                 |
|-----------------------|----------------------------------------------------------------------------|
| **context_engineering** | The process of dynamically assembling and managing information within an LLM's context window |
| **session**           | Container for single conversation with chronological history and working memory |
| **memory**            | Long-term persistence mechanism capturing key information across multiple sessions |
| **event**             | Building block of conversation (user input, agent response, tool call, tool output) |
| **state**             | Structured working memory or scratchpad with temporary data |
| **extraction**        | Process of distilling meaningful information from source data into memories |
| **consolidation**     | Self-editing process comparing new information with existing memories to maintain coherence |
| **provenance**        | Detailed record of memory's origin and history for trustworthiness assessment |
| **compaction**        | Strategy to reduce conversation history size while preserving key information |
| **declarative_memory**| Knowledge of facts, figures, and events (knowing what) |
| **procedural_memory** | Knowledge of skills and workflows (knowing how) |
| **memory_as_a_tool**  | Pattern where agent decides when to generate or retrieve memories |
| **context_window**    | Maximum amount of text, measured in tokens, that an LLM can process in a single API call |
| **token**             | Basic unit of text processing for LLMs, roughly 3/4 of a word in English text |
| **embedding**         | Vector representation of text enabling semantic similarity search |
| **vector_database**   | Database storing embeddings for fast similarity search |
| **knowledge_graph**   | Network of entities and relationships for structured reasoning |
| **rag**               | Retrieval-Augmented Generation - injecting external knowledge into context |
| **llm**               | Large Language Model - AI model trained on vast text data |
| **a2a**               | Agent-to-Agent protocol for message passing between agents |
| **mcp**               | Model Context Protocol - standard for agent tool integration |
| **acl**               | Access Control List - defines who can access what resources |
| **pii**               | Personally Identifiable Information requiring protection |
| **ttl**               | Time-To-Live - how long data persists before automatic deletion |
| **hot_path**          | Critical code path that must execute quickly (user-facing) |
| **cold_storage**      | Low-cost storage for infrequently accessed data |

## Summary

### Key Takeaways
- Context Engineering is the foundation of stateful AI - dynamically assembling information for each LLM call
- Sessions manage immediate conversation state; Memory provides long-term persistence
- Memory generation is an LLM-driven ETL pipeline: Ingestion → Extraction → Consolidation → Storage
- Always run memory generation asynchronously to avoid blocking user experience
- Memory differs from RAG: Memory is personalized user context, RAG is shared factual knowledge
- Implement compaction strategies early to manage long conversations
- Track memory provenance for trustworthiness and conflict resolution
- Use hybrid retrieval: blend relevance, recency, and importance scores
- Enforce strict isolation and security: ACLs, PII redaction, user controls
- Memory enables true personalization and agent self-improvement

### Production Checklist
- ✓ Use persistent storage for sessions (not in-memory)
- ✓ Implement strict per-user isolation with ACLs
- ✓ Redact PII before persisting data
- ✓ Configure appropriate TTL policies
- ✓ Implement session compaction strategy
- ✓ Run memory generation asynchronously
- ✓ Define clear custom memory topics
- ✓ Track memory provenance and confidence
- ✓ Implement memory pruning strategy
- ✓ Monitor performance metrics (latency, token usage)
- ✓ Test memory isolation rigorously
- ✓ Provide user controls for data deletion
- ✓ Set up error handling and retry logic
- ✓ Profile hot-path operations
- ✓ Implement comprehensive evaluation metrics

## Next Steps

### After Completing Tasks
- Experiment with different compaction strategies and measure impact
- Build multi-user system and test memory isolation
- Implement procedural memories for agent self-improvement
- Create memory visualization and debugging tools
- Test with multimodal memory sources (images, audio)
- Build hybrid RAG + Memory architecture
- Implement advanced retrieval with query rewriting and reranking
- Deploy to production with monitoring and evaluation
- Optimize for cost and latency at scale
- Explore Agent-to-Agent communication patterns

### Continue Learning
- Day 4: Multi-agent orchestration patterns
- Day 5: Production deployment and monitoring
- Advanced: Fine-tuning vs. in-context learning
- Advanced: Building custom memory managers
- Advanced: Memory-augmented reasoning