# Memento Framework Tutorial

Learn how to build agents that learn from experience using case-based reasoning.

## Overview

This tutorial covers:
1. **Case-Based Reasoning**: Store and retrieve past experiences
2. **Semantic Retrieval**: Match cases by embedding similarity
3. **Memory-Augmented Agents**: Solve tasks with retrieved cases
4. **Parametric Memory**: Train neural retrievers
5. **Framework Options**: Pure Memento, LangGraph, or custom
6. **ARE Integration**: Combine memory with an evaluation environment

**Prerequisites**: Understanding of LLM agents and Python

## Part 1: Understanding Case-Based Reasoning

### What is CBR?

Case-Based Reasoning solves new problems by:
1. **Retrieve**: Find similar past cases
2. **Reuse**: Adapt solutions from those cases
3. **Revise**: Test and refine the solution
4. **Retain**: Store the new experience
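
The four steps above can be sketched as a single function. This is a minimal illustration, not Memento's implementation: `Case`, `word_overlap`, `cbr_cycle`, and the `adapt` and `evaluate` callbacks are hypothetical names, and similarity is a toy word-overlap count.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    problem: str
    solution: str
    success: bool


def word_overlap(a: str, b: str) -> int:
    """Toy similarity: number of shared lowercase words."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def cbr_cycle(problem: str, memory: List[Case],
              adapt: Callable[[str, Case], str],
              evaluate: Callable[[str, str], bool]) -> Case:
    """Run one Retrieve -> Reuse -> Revise -> Retain pass (illustrative only)."""
    # Retrieve: pick the most similar stored case, if any.
    nearest = max(memory, key=lambda c: word_overlap(problem, c.problem), default=None)
    # Reuse: adapt the retrieved solution to the new problem.
    candidate = adapt(problem, nearest) if nearest else ""
    # Revise: test the candidate; a caller-supplied check decides success.
    ok = evaluate(problem, candidate)
    # Retain: store the outcome, successful or not, for future retrievals.
    new_case = Case(problem, candidate, ok)
    memory.append(new_case)
    return new_case


memory = [Case("Find all PDF files in Documents", "find ~/Documents -name '*.pdf'", True)]
result = cbr_cycle(
    "Find all text files in Documents",
    memory,
    adapt=lambda p, c: c.solution.replace("pdf", "txt"),
    evaluate=lambda p, s: s.endswith("'*.txt'"),
)
print(result.solution)
```

In a real agent, `adapt` would typically be an LLM call and `evaluate` an execution or validation step.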

### Simple Example

In [None]:
# Simple in-memory case store

class SimpleCase:
    def __init__(self, problem, solution, success):
        self.problem = problem
        self.solution = solution
        self.success = success

class SimpleCBR:
    def __init__(self):
        self.cases = []
    
    def retrieve(self, problem, k=3):
        """Retrieve most similar cases (simple keyword matching)."""
        # Tokenize the query once, outside the loop
        problem_words = set(problem.lower().split())
        scored = []
        for case in self.cases:
            # Simple similarity: count common words
            case_words = set(case.problem.lower().split())
            similarity = len(problem_words & case_words)
            scored.append((similarity, case))
        
        # Return top-k
        scored.sort(reverse=True, key=lambda x: x[0])
        return [case for _, case in scored[:k]]
    
    def store(self, problem, solution, success):
        """Store a new case."""
        case = SimpleCase(problem, solution, success)
        self.cases.append(case)

# Example usage
cbr = SimpleCBR()

# Store some experiences
cbr.store(
    problem="Find all PDF files in Documents",
    solution="find ~/Documents -name '*.pdf'",
    success=True
)

cbr.store(
    problem="Find all image files modified yesterday",
    solution="find . -name '*.jpg' -mtime -1",
    success=True
)

cbr.store(
    problem="Count lines in all Python files",
    solution="find . -name '*.py' | xargs wc -l",
    success=True
)

# Retrieve similar cases for a new problem
new_problem = "Find all text files in Downloads"
similar_cases = cbr.retrieve(new_problem, k=2)

print(f"New Problem: {new_problem}\n")
print("Similar Past Cases:")
for i, case in enumerate(similar_cases, 1):
    print(f"\n{i}. {case.problem}")
    print(f"   Solution: {case.solution}")
    print(f"   Success: {case.success}")

### Key Insight

The agent can now learn from experience:
- It sees that similar file-finding problems use the `find` command
- It adapts the pattern to the new file type (`txt` instead of `pdf`)
- No model fine-tuning is needed

## Part 2: Semantic Retrieval with Embeddings

Keyword matching misses paraphrases: "Compute mean" and "Calculate average" describe the same task but share no words. Let's use embeddings for semantic similarity.
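
To make the limitation concrete, here is a small sketch that reuses the word-overlap similarity from Part 1. The helper name `word_overlap` and the example strings are illustrative assumptions. The paraphrase shares only the filler words "of" and "in" with the query, while a different task that happens to reuse the query's nouns scores higher.

```python
def word_overlap(a: str, b: str) -> int:
    """Count lowercase words shared by two strings (the Part 1 similarity)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


query = "Compute mean of values in dataset"
paraphrase = "Calculate average of numbers in file"  # same task, different wording
unrelated = "List values of keys in dataset"         # different task, shared nouns

paraphrase_score = word_overlap(query, paraphrase)
unrelated_score = word_overlap(query, unrelated)

print(f"paraphrase: {paraphrase_score}, unrelated: {unrelated_score}")
```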

In [None]:
# Install required packages
# !pip install sentence-transformers numpy

In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Tuple

class SemanticCase:
    def __init__(self, problem, solution, success, embedding=None):
        self.problem = problem
        self.solution = solution
        self.success = success
        self.embedding = embedding

class SemanticCBR:
    def __init__(self, model_name='all-MiniLM-L6-v2'):
        self.encoder = SentenceTransformer(model_name)
        self.cases = []
    
    def store(self, problem, solution, success):
        """Store case with embedding."""
        embedding = self.encoder.encode(problem)
        case = SemanticCase(problem, solution, success, embedding)
        self.cases.append(case)
    
    def retrieve(self, problem: str, k: int = 3, 
                 success_only: bool = False) -> List[Tuple[float, SemanticCase]]:
        """Retrieve most semantically similar cases."""
        # Encode query
        query_embedding = self.encoder.encode(problem)
        
        # Compute cosine similarity with all cases
        scored = []
        for case in self.cases:
            if success_only and not case.success:
                continue
            
            # Cosine similarity
            similarity = np.dot(query_embedding, case.embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(case.embedding)
            )
            scored.append((similarity, case))
        
        # Sort by similarity
        scored.sort(reverse=True, key=lambda x: x[0])
        return scored[:k]

print("Semantic CBR system ready!")

In [None]:
# Create semantic CBR system
semantic_cbr = SemanticCBR()

# Store diverse experiences
experiences = [
    ("Search for emails from john@company.com", "email_search sender:john@company.com", True),
    ("Find messages sent yesterday", "email_search date:yesterday", True),
    ("Calculate average of numbers in file", "awk '{sum+=$1} END {print sum/NR}' data.txt", True),
    ("Get weather forecast for Tokyo", "curl 'wttr.in/Tokyo'", True),
    ("Download file from URL", "wget https://example.com/file.pdf", False),  # Failed
    ("Extract URLs from HTML", "grep -oP 'href=\"\\K[^\"]+' page.html", True),  # \\K keeps a literal backslash
    ("Convert image format", "convert input.png output.jpg", True),
]

for problem, solution, success in experiences:
    semantic_cbr.store(problem, solution, success)

print(f"Stored {len(semantic_cbr.cases)} cases in memory\n")

# Test semantic retrieval
test_queries = [
    "Find emails from alice@example.com",
    "Compute mean of values in dataset",
    "Check weather in London"
]

for query in test_queries:
    print(f"Query: {query}")
    print("="*80)
    
    similar = semantic_cbr.retrieve(query, k=2, success_only=True)
    
    for i, (score, case) in enumerate(similar, 1):
        print(f"\n{i}. Similarity: {score:.3f}")
        print(f"   Problem: {case.problem}")
        print(f"   Solution: {case.solution}")
    
    print("\n" + "="*80 + "\n")

### Observation

Semantic retrieval finds **conceptually similar** cases, not just keyword matches:
- "Find emails from alice" → "Search for emails from john"
- "Compute mean" → "Calculate average"
- "Check weather in London" → "Get weather forecast for Tokyo"

This is the foundation of Memento's memory system!
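
`SemanticCBR.retrieve` computes one cosine similarity per case in a Python loop. With many cases, the same ranking can be computed with a single matrix product. The following is a minimal sketch, assuming the case embeddings are stacked into a 2-D NumPy array; `top_k_cosine` and the toy 3-dimensional vectors are illustrative and not part of Memento.

```python
import numpy as np


def top_k_cosine(query_vec: np.ndarray, case_matrix: np.ndarray, k: int = 3):
    """Return (row_index, similarity) pairs for the k rows most similar to query_vec."""
    # Normalise once so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = case_matrix / np.linalg.norm(case_matrix, axis=1, keepdims=True)
    sims = m @ q
    # argsort is ascending, so negate to get the highest similarities first.
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy vectors standing in for encoder output.
cases = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0]])
query = np.array([1.0, 0.05, 0.0])

ranked = top_k_cosine(query, cases, k=2)
print(ranked)
```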

## Part 3: Building a Simple Memory-Augmented Agent

Let's create an agent that uses CBR to solve tasks.

In [None]:
class MemoryAugmentedAgent:
    """Agent that learns from experience."""
    
    def __init__(self, cbr_system: SemanticCBR):
        self.cbr = cbr_system
    
    def solve(self, task: str) -> str:
        """Solve a task using memory."""
        print(f"\nTask: {task}")
        print("-" * 80)
        
        # 1. Retrieve relevant memories
        similar_cases = self.cbr.retrieve(task, k=3, success_only=True)
        
        if not similar_cases:
            print("No relevant memories found. Solving from scratch...")
            return self._solve_from_scratch(task)
        
        print(f"\nRetrieved {len(similar_cases)} relevant memories:\n")
        
        # 2. Show retrieved cases
        for i, (score, case) in enumerate(similar_cases, 1):
            print(f"{i}. [{score:.3f}] {case.problem}")
            print(f"   → {case.solution}\n")
        
        # 3. Adapt solution from most similar case
        best_score, best_case = similar_cases[0]
        
        print(f"Adapting solution from most similar case (score={best_score:.3f})...")
        
        # Simple adaptation (in practice, use LLM)
        adapted_solution = self._adapt_solution(task, best_case)
        
        print(f"\nProposed Solution: {adapted_solution}\n")
        
        return adapted_solution
    
    def learn(self, task: str, solution: str, success: bool):
        """Store experience in memory."""
        self.cbr.store(task, solution, success)
        print(f"✓ Stored experience: {task} [{('Success' if success else 'Failure')}]")
    
    def _solve_from_scratch(self, task: str) -> str:
        """Fallback when no memories available."""
        # In practice, use LLM without memory context
        return "[No memory guidance - would use base LLM]"
    
    def _adapt_solution(self, task: str, case: SemanticCase) -> str:
        """Adapt solution from similar case."""
        # In practice, prompt LLM to adapt
        # For demo, just show the retrieved solution
        return f"[Adapted from: {case.solution}]"

print("Memory-augmented agent created!")

In [None]:
# Create agent with the CBR system we built earlier
agent = MemoryAugmentedAgent(semantic_cbr)

# Test on new tasks
new_tasks = [
    "Find all emails from support@company.com in the last week",
    "What's the weather in Paris?",
    "Calculate sum of numbers in data.csv"
]

for task in new_tasks:
    solution = agent.solve(task)
    print("=" * 80 + "\n")

### Key Benefits

1. **Faster**: Reusing a known solution can skip reasoning from scratch
2. **More Accurate**: Retrieval favors strategies that already succeeded
3. **Continual**: Improves over time as more experiences accumulate
4. **Efficient**: No model fine-tuning required
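
These benefits assume the retrieved case is actually relevant. The agent in this part adapts the top-ranked case whatever its score, so one possible safeguard is a minimum-similarity cutoff that falls back to solving from scratch. The sketch below is an illustration only: `pick_case` and the 0.5 threshold are assumptions, and a suitable threshold depends on the encoder and the domain.

```python
from typing import List, Optional, Tuple


def pick_case(scored: List[Tuple[float, str]], min_score: float = 0.5) -> Optional[str]:
    """Return the best-scoring solution, or None if nothing clears min_score."""
    if not scored:
        return None
    best_score, best_solution = max(scored, key=lambda pair: pair[0])
    return best_solution if best_score >= min_score else None


# A confident match is reused; a weak match triggers the fallback (None).
print(pick_case([(0.82, "curl 'wttr.in/Tokyo'"), (0.31, "convert input.png output.jpg")]))
print(pick_case([(0.12, "convert input.png output.jpg")]))
```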

## Part 4: Parametric Memory (Neural Retriever)

So far we used fixed embeddings. Memento trains a neural network to **learn** which cases are most relevant.

In [None]:
# Install PyTorch if needed
# !pip install torch transformers

In [None]:
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MemoryRetrieverClassifier(nn.Module):
    """
    Neural network that learns to score query-case relevance.
    
    Architecture:
    1. Encode [QUERY] query [CASE] case as one sequence
       (the [PLAN] plan segment is omitted in this simplified version)
    2. Extract [CLS] embedding
    3. Binary classifier: relevant or not
    """
    
    def __init__(self, model_name='sentence-transformers/all-MiniLM-L6-v2'):
        super().__init__()
        
        # Pre-trained encoder
        self.encoder = AutoModel.from_pretrained(model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        # Classification head
        hidden_size = self.encoder.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(256, 2)  # [irrelevant, relevant]
        )
    
    def forward(self, query: str, case: str):
        """Score query-case pair."""
        # Format input
        text = f"[QUERY] {query} [CASE] {case}"
        
        # Tokenize
        inputs = self.tokenizer(
            text,
            return_tensors='pt',
            truncation=True,
            max_length=512
        )
        
        # Encode
        outputs = self.encoder(**inputs)
        
        # [CLS] embedding
        cls_embedding = outputs.last_hidden_state[:, 0, :]
        
        # Classify
        logits = self.classifier(cls_embedding)
        
        return logits
    
    def predict_relevance(self, query: str, case: str) -> float:
        """Predict relevance score (0-1)."""
        self.eval()  # disable dropout so repeated calls give the same score
        with torch.no_grad():
            logits = self.forward(query, case)
            probs = torch.softmax(logits, dim=1)
            relevance_score = probs[0, 1].item()  # P(relevant)
        return relevance_score

print("Neural retriever model defined!")
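
A pairwise scorer like this is typically applied to a short candidate list rather than the whole memory, since it needs one forward pass per query-case pair. The sketch below shows that reranking step under stated assumptions: `rerank` and `toy_score` are hypothetical helpers, and `toy_score` is only a stand-in so the example runs without model weights.

```python
from typing import Callable, List, Tuple


def rerank(query: str,
           candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Score each candidate with score_fn and keep the top_k highest."""
    scored = [(score_fn(query, case), case) for case in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def toy_score(query: str, case: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the case."""
    q = set(query.lower().split())
    return len(q & set(case.lower().split())) / len(q)


candidates = [
    "Convert image format",
    "Get weather forecast for Tokyo",
    "Extract URLs from HTML",
]
best = rerank("Weather in London", candidates, toy_score, top_k=1)
print(best[0][1])
```

With a trained model, `score_fn` could be `retriever.predict_relevance`, which takes the same `(query, case)` arguments.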

### Training the Retriever

The key insight: **Learn from task outcomes**

- If a retrieved case led to success → Positive training example
- If a retrieved case led to failure → Negative training example
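
One way to turn this rule into data is to log each episode and label every retrieved case with the episode's outcome. The sketch below assumes a hypothetical log format (dicts with `query`, `retrieved`, and `success` keys); `build_training_examples` is an illustrative helper, not a Memento API.

```python
from typing import Dict, List, Tuple


def build_training_examples(episodes: List[Dict]) -> List[Tuple[str, str, int]]:
    """Turn logged episodes into (query, case, label) triples.

    Each episode is assumed to record the query, the problems of the cases
    retrieved for it, and whether the task ultimately succeeded.
    """
    examples = []
    for ep in episodes:
        label = 1 if ep["success"] else 0
        # Every retrieved case inherits the outcome of the whole episode.
        for case_problem in ep["retrieved"]:
            examples.append((ep["query"], case_problem, label))
    return examples


episodes = [
    {"query": "Weather in London",
     "retrieved": ["Get weather forecast for Tokyo"], "success": True},
    {"query": "Weather in London",
     "retrieved": ["Extract URLs from HTML"], "success": False},
]
examples = build_training_examples(episodes)
print(examples)
```

Labeling every retrieved case with the episode outcome is a coarse form of credit assignment: a helpful case retrieved alongside an unhelpful one receives the same label, so the labels are noisy.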

In [None]:
# Simulate training data

training_examples = [
    # (query, case, label)
    # Label: 1 = relevant (led to success), 0 = irrelevant (led to failure)
    
    # Positive examples (relevant retrievals)
    ("Find emails from bob@company.com", "Search for emails from john@company.com", 1),
    ("Calculate mean of values", "Calculate average of numbers in file", 1),
    ("Weather in London", "Get weather forecast for Tokyo", 1),
    
    # Negative examples (irrelevant retrievals)
    ("Find emails from bob@company.com", "Convert image format", 0),
    ("Calculate mean of values", "Download file from URL", 0),
    ("Weather in London", "Extract URLs from HTML", 0),
]

print(f"Training dataset: {len(training_examples)} examples")
print(f"  Positive: {sum(1 for _, _, label in training_examples if label == 1)}")
print(f"  Negative: {sum(1 for _, _, label in training_examples if label == 0)}")

In [None]:
# Training loop (simplified)

def train_retriever(model, training_data, epochs=3, lr=2e-5):
    """Train the retriever model."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    
    model.train()
    
    for epoch in range(epochs):
        total_loss = 0
        correct = 0
        
        for query, case, label in training_data:
            # Forward pass
            logits = model(query, case)
            
            # Compute loss
            labels = torch.tensor([label])
            loss = criterion(logits, labels)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Track metrics
            total_loss += loss.item()
            pred = torch.argmax(logits, dim=1).item()
            if pred == label:
                correct += 1
        
        # Print epoch stats
        avg_loss = total_loss / len(training_data)
        accuracy = correct / len(training_data)
        print(f"Epoch {epoch+1}/{epochs}: Loss={avg_loss:.4f}, Accuracy={accuracy:.2%}")
    
    model.eval()
    return model

# Initialize and train (commented out to avoid long execution)
# retriever = MemoryRetrieverClassifier()
# retriever = train_retriever(retriever, training_examples)

print("\nTraining procedure defined!")
print("In practice, train on thousands of logged examples rather than six.")

### Why Train a Retriever?

**Fixed Embeddings** (Part 2):
- Use off-the-shelf sentence encoders
- No task-specific tuning
- May retrieve irrelevant cases

**Learned Retriever** (Part 4):
- ✅ Learns task-specific relevance
- ✅ Improves from experience
- ✅ Adapts to user's domain
- ✅ Higher quality retrievals → better performance
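
Whether a learned retriever actually beats fixed embeddings is an empirical question, so it helps to measure both on the same held-out queries. A minimal sketch of one such metric, hit rate at k, follows. The names `hit_rate_at_k` and `keyword_retrieve` and the tiny evaluation set are assumptions for illustration; any retriever with a `(query, k) -> list of cases` interface could be plugged in.

```python
from typing import Callable, List, Tuple


def hit_rate_at_k(eval_set: List[Tuple[str, str]],
                  retrieve: Callable[[str, int], List[str]],
                  k: int = 3) -> float:
    """Fraction of queries whose known-good case appears in the top-k results."""
    hits = sum(1 for query, expected in eval_set if expected in retrieve(query, k))
    return hits / len(eval_set)


CASES = [
    "Search for emails from john@company.com",
    "Calculate average of numbers in file",
    "List values of keys in dataset",
]


def keyword_retrieve(query: str, k: int) -> List[str]:
    """Baseline retriever: rank cases by shared lowercase words."""
    q = set(query.lower().split())
    ranked = sorted(CASES, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


eval_set = [
    ("Find emails from alice@example.com", "Search for emails from john@company.com"),
    ("Compute mean of values in dataset", "Calculate average of numbers in file"),
]
rate_at_1 = hit_rate_at_k(eval_set, keyword_retrieve, k=1)
rate_at_2 = hit_rate_at_k(eval_set, keyword_retrieve, k=2)
print(rate_at_1, rate_at_2)
```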

## Part 5: Framework Comparison

### Option 1: Pure Memento

Use Memento's hierarchical agent out-of-the-box.

In [None]:
# Pseudocode for pure Memento approach

"""
from Memento.client.parametric_memory_cbr import HierarchicalClientWithMemory
from Memento.memory.parametric_memory import CaseRetriever

# 1. Load trained retriever
retriever = CaseRetriever.load('models/retriever.pt')

# 2. Initialize agent
agent = HierarchicalClientWithMemory(
    planner_model='gpt-4-turbo',
    executor_model='o3-mini',
    memory_retriever=retriever,
    memory_top_k=8
)

# 3. Run task
result = agent.run('Find the population of Tokyo')
print(result.answer)
"""

print("Pure Memento Approach:")
print("+  Fixed architecture (planner + executor)")
print("+  Built-in memory system")
print("+  Automatic training pipeline")
print("-  Less flexibility in agent design")
print("-  Research codebase (not production-hardened)")

### Option 2: LangGraph + Memory Concepts

Use LangGraph for agent framework, add Memento-style memory.

In [None]:
# Pseudocode for LangGraph + Memory

"""
from langgraph.prebuilt import create_react_agent
from langchain_core.tools import Tool

class MemoryAugmentedLangGraphAgent:
    def __init__(self, llm, tools, memory_retriever):
        self.memory = memory_retriever
        self.agent = create_react_agent(llm, tools)
    
    def run(self, query):
        # 1. Retrieve relevant cases
        cases = self.memory.retrieve(query, k=5)
        
        # 2. Augment query with memory
        memory_context = format_cases(cases)
        augmented_query = f"{memory_context}\n\nTask: {query}"
        
        # 3. Run LangGraph agent
        result = self.agent.invoke({'messages': [('user', augmented_query)]})
        
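        # 4. Store experience (store the final answer text; success is
        #    hard-coded here, but should come from a real outcome check)
        answer = result['messages'][-1].content
        self.memory.store(query, answer, success=True)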
        
        return result
"""

print("LangGraph + Memory Approach:")
print("+  Production-ready framework (LangGraph)")
print("+  Flexible agent architecture")
print("+  Native tool integration")
print("+  Human-in-the-loop support")
print("-  Need to implement training pipeline")
print("-  More setup required")
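
The pseudocode above calls `format_cases` without defining it. One possible shape for that helper is sketched below, assuming the retriever returns `(score, case)` pairs as in Part 2. The `Case` tuple and the prompt wording are illustrative choices, not a LangGraph or Memento API.

```python
from typing import List, NamedTuple, Tuple


class Case(NamedTuple):
    problem: str
    solution: str
    success: bool


def format_cases(scored_cases: List[Tuple[float, Case]]) -> str:
    """Render retrieved (score, case) pairs as a text preamble for the prompt."""
    if not scored_cases:
        return "No relevant past cases."
    lines = ["Relevant past cases:"]
    for i, (score, case) in enumerate(scored_cases, 1):
        outcome = "succeeded" if case.success else "failed"
        lines.append(f"{i}. Problem: {case.problem}")
        lines.append(f"   Solution ({outcome}, similarity {score:.2f}): {case.solution}")
    return "\n".join(lines)


context = format_cases([
    (0.83, Case("Get weather forecast for Tokyo", "curl 'wttr.in/Tokyo'", True)),
])
print(context)
```

Including the outcome in the prompt lets the model treat failed cases as examples to avoid rather than to copy.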

### Option 3: Custom from Scratch

Build everything yourself for maximum control.

In [None]:
# Custom approach (what we built in Parts 1-4!)

print("Custom Approach:")
print("+  Complete control over architecture")
print("+  Can implement novel algorithms")
print("+  No framework constraints")
print("-  Most implementation work")
print("-  Need to handle edge cases")
print("-  Slower development")

### Recommendation

**For Production Systems**: LangGraph + Memory
- Use battle-tested framework
- Add memory components from Memento
- Best of both worlds

**For Research**: Pure Memento or Custom
- Memento: Fast prototyping with memory
- Custom: Maximum flexibility for novel ideas

**For Learning**: Start with custom (Parts 1-4)
- Understand core concepts
- Then use frameworks

## Part 6: Integration with ARE

This part combines Memento's learning loop with ARE (Meta Agents Research Environments), which supplies the evaluation environment.

In [None]:
# High-level integration architecture

print("""
ARE + Memento Integration:

1. ARE provides dynamic evaluation environment
   - Complex scenarios
   - Rich tool ecosystem
   - Success/failure validation

2. Agent executes tasks
   - Retrieves relevant memories
   - Plans and executes
   - Uses ARE tools

3. Derive rewards from outcomes
   - Binary: Success = 1, Failure = 0
   - Shaped: Include efficiency, speed

4. Store experiences in Memento
   - Full execution trace
   - Tools used
   - Reward signal

5. Train retriever periodically
   - Batch training every N scenarios
   - Improve retrieval policy
   - Deploy updated model

Result: Agent improves over time!
""")

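The loop outlined above can be sketched in a few lines. Everything here is a hedged illustration: `shaped_reward`, `run_scenarios`, the reward weights, and the stub agent are assumptions, and no real ARE or Memento API is called. The retraining step is represented only by recording when it would run.

```python
from typing import Callable, Dict, List, Tuple


def shaped_reward(success: bool, steps: int,
                  max_steps: int = 20, efficiency_weight: float = 0.2) -> float:
    """Binary success reward, with a small share tied to finishing in fewer steps."""
    if not success:
        return 0.0
    efficiency = max(0.0, 1.0 - steps / max_steps)
    return (1.0 - efficiency_weight) + efficiency_weight * efficiency


def run_scenarios(scenarios: List[str],
                  run_agent: Callable[[str], Dict],
                  train_every: int = 2) -> Tuple[List[Dict], List[int]]:
    """Run each scenario, store the rewarded experience, and mark retraining points."""
    memory: List[Dict] = []
    retrain_points: List[int] = []
    for i, scenario in enumerate(scenarios, 1):
        outcome = run_agent(scenario)  # expected keys: 'success', 'steps'
        reward = shaped_reward(outcome["success"], outcome["steps"])
        memory.append({"scenario": scenario, "reward": reward, **outcome})
        if i % train_every == 0:
            retrain_points.append(i)  # placeholder for a retriever training run
    return memory, retrain_points


# Stub agent (hypothetical): succeeds on scenarios with an even-length name.
stub_agent = lambda s: {"success": len(s) % 2 == 0, "steps": 5}
memory, retrain_points = run_scenarios(["a", "bb", "ccc", "dddd"], stub_agent)
print([round(m["reward"], 2) for m in memory], retrain_points)
```
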
## Summary

### What We Learned

1. **Case-Based Reasoning**: Solve problems using past experiences
2. **Semantic Retrieval**: Find relevant cases using embeddings
3. **Memory-Augmented Agents**: Agents that learn from experience
4. **Parametric Memory**: Train neural networks for better retrieval
5. **Framework Options**: LangGraph, Memento, or custom
6. **ARE Integration**: Continual learning in evaluation environments

### Key Insights

- ✅ **No Fine-Tuning**: Learn without updating LLM weights
- ✅ **Continual Learning**: Improve automatically from experience
- ✅ **Task-Agnostic**: Works across diverse problem types
- ✅ **Scalable**: Add more memories without retraining base model

### Next Steps

1. Read [MEMENTO_GUIDE.md](../MEMENTO_GUIDE.md) for a deeper dive
2. Read [INTEGRATION_ARE_MEMENTO.md](../INTEGRATION_ARE_MEMENTO.md) for the full architecture
3. Try ARE tutorials: [01_understanding_are_framework.ipynb](01_understanding_are_framework.ipynb)
4. Explore [03_agent_with_memory.ipynb](03_agent_with_memory.ipynb) for advanced memory patterns
5. Implement your own memory-augmented agent!

### References

- **Memento Repository**: https://github.com/Agent-on-the-Fly/Memento
- **ARE Repository**: https://github.com/facebookresearch/meta-agents-research-environments
- **GAIA Benchmark**: https://huggingface.co/gaia-benchmark
- **LangGraph**: https://langchain-ai.github.io/langgraph/