# procedural_memory

> ReasoningBank-style procedural memory for RLM
>
> This module implements a closed-loop memory system that learns reusable strategies from RLM trajectories.
> After each run, the system judges success/failure, extracts procedural memories, and retrieves relevant
> memories for future tasks using BM25.

In [None]:
#| default_exp procedural_memory

## Overview

This module implements **Stage 2.5: Procedural Memory Loop** inspired by the ReasoningBank paper. The goal is to enable an RLM agent to improve over time by accumulating procedural knowledge (strategies, templates, debugging moves) without replacing evidence-based retrieval.

### Closed-Loop Cycle

```
┌──────────┐    ┌──────────┐    ┌──────────┐
│ RETRIEVE │───▶│ INTERACT │───▶│ EXTRACT  │
│ (BM25)   │    │ (rlm_run)│    │ (Judge + │
└────▲─────┘    └──────────┘    │ Extractor)│
     │                          └─────┬─────┘
     │                                │
     │          ┌──────────┐          │
     └──────────│  STORE   │◀─────────┘
                │ (JSON)   │
                └──────────┘
```

### Design Principles

1. **Procedural, not episodic**: Memories are strategies/checklists, not retellings
2. **Bounded injection**: Only title + description + 3 key bullets in prompts
3. **Evidence-sensitive judgment**: Success requires grounding in retrieved evidence
4. **Keyword retrieval**: BM25 over title/description/tags (deterministic, offline)
5. **Append-only storage**: Simple JSON file for experimentation

### Reference

- [ReasoningBank Paper](https://arxiv.org/html/2509.25140v1)

## Imports

In [None]:
#| export
from dataclasses import dataclass, field, asdict
from typing import Optional
from pathlib import Path
from datetime import datetime
import json
import uuid
from rank_bm25 import BM25Okapi

In [None]:
#| export
from rlm.core import llm_query, rlm_run
from rlm._rlmpaper_compat import RLMIteration

## Memory Schema

A `MemoryItem` represents a reusable procedural insight extracted from an RLM trajectory.

**Constraints**:
- Items must be small enough to inject into prompts
- `content` should be procedural (steps/checklist), not a retelling
- Up to 3 items extracted per trajectory
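
The constraints above can be checked mechanically before an item is stored. The following is a minimal sketch and not part of the exported module: `check_item_constraints` and the `max_content_chars` budget are hypothetical, and the 10-word title limit is borrowed from the `MemoryItem` docstring.

```python
def check_item_constraints(title: str, description: str, content: str,
                           max_title_words: int = 10,
                           max_content_chars: int = 1200) -> list[str]:
    """Return a list of constraint violations (empty if the item looks fine).

    Hypothetical helper: the thresholds are illustrative assumptions.
    """
    problems = []
    if len(title.split()) > max_title_words:
        problems.append(f"title has more than {max_title_words} words")
    if not description.strip():
        problems.append("description is empty")
    if len(content) > max_content_chars:
        problems.append(f"content exceeds {max_content_chars} characters")
    # Procedural content should contain at least one bullet or numbered step.
    has_step = any(
        line.lstrip().startswith(("-", "*")) or line.lstrip()[:1].isdigit()
        for line in content.splitlines()
    )
    if not has_step:
        problems.append("content has no bullet or numbered step")
    return problems
```

An item that reads like a retelling ("We ran a query and it worked.") fails the last check, which is a cheap proxy for "procedural, not episodic".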

In [None]:
#| export
@dataclass
class MemoryItem:
    """A reusable procedural memory extracted from an RLM trajectory.
    
    Attributes:
        id: Unique identifier (UUID)
        title: Concise identifier (≤10 words)
        description: One-sentence summary
        content: Procedural steps/checklist/template (Markdown)
        source_type: 'success' or 'failure'
        task_query: Original task that produced this memory
        created_at: ISO timestamp
        access_count: Number of times retrieved (for future consolidation)
        tags: Keywords for BM25 retrieval
        session_id: Optional session ID from DatasetMeta (links to dataset session)
    """
    id: str
    title: str
    description: str
    content: str
    source_type: str  # 'success' or 'failure'
    task_query: str
    created_at: str
    access_count: int = 0
    tags: Optional[list[str]] = None
    session_id: Optional[str] = None  # Links to DatasetMeta.session_id
    
    def to_dict(self) -> dict:
        """Convert to dictionary for JSON serialization."""
        return asdict(self)
    
    @classmethod
    def from_dict(cls, data: dict) -> 'MemoryItem':
        """Create MemoryItem from dictionary."""
        return cls(**data)

In [None]:
# Test MemoryItem creation and serialization
test_item = MemoryItem(
    id='test-uuid',
    title='SPARQL Query Pattern',
    description='Template for searching entities by label.',
    content='- Use `rdfs:label` for human-readable names\n- Add FILTER for case-insensitive search',
    source_type='success',
    task_query='Find entities named "Activity"',
    created_at=datetime.utcnow().isoformat(),
    tags=['sparql', 'search', 'rdfs']
)

# Test roundtrip
data = test_item.to_dict()
restored = MemoryItem.from_dict(data)
assert restored.title == test_item.title
assert restored.tags == test_item.tags
print("✓ MemoryItem serialization works")

✓ MemoryItem serialization works



## Memory Store

Persistent storage for procedural memories using a simple JSON file format.

In [None]:
#| export
@dataclass
class MemoryStore:
    """Persistent storage for procedural memories.
    
    Attributes:
        memories: List of MemoryItem objects
        path: Path to JSON file
    """
    memories: list[MemoryItem] = field(default_factory=list)
    path: Optional[Path] = None
    
    def add(self, item: MemoryItem) -> str:
        """Add a memory item to the store.
        
        Returns:
            Status message
        """
        self.memories.append(item)
        return f"Added memory '{item.title}' (id={item.id})"
    
    def save(self) -> str:
        """Persist memories to JSON file.
        
        Returns:
            Status message with path and count
        """
        if self.path is None:
            return "No path configured - not saving"
        
        self.path.parent.mkdir(parents=True, exist_ok=True)
        data = [m.to_dict() for m in self.memories]
        
        with open(self.path, 'w') as f:
            json.dump(data, f, indent=2)
        
        return f"Saved {len(self.memories)} memories to {self.path}"
    
    @classmethod
    def load(cls, path: Path) -> 'MemoryStore':
        """Load memories from JSON file.
        
        Returns:
            MemoryStore instance with loaded memories
        """
        if not path.exists():
            return cls(memories=[], path=path)
        
        with open(path, 'r') as f:
            data = json.load(f)
        
        memories = [MemoryItem.from_dict(item) for item in data]
        return cls(memories=memories, path=path)
    
    def get_corpus_for_bm25(self) -> list[list[str]]:
        """Build corpus for BM25 indexing.
        
        Each document is title + description + tags, tokenized.
        
        Returns:
            List of tokenized documents
        """
        corpus = []
        for m in self.memories:
            text = f"{m.title} {m.description}"
            if m.tags:
                text += " " + " ".join(m.tags)
            corpus.append(text.lower().split())
        return corpus

In [None]:
# Test MemoryStore save/load roundtrip
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    test_path = Path(tmpdir) / 'test_memories.json'
    
    # Create store and add items
    store = MemoryStore(path=test_path)
    item1 = MemoryItem(
        id=str(uuid.uuid4()),
        title='Test Memory 1',
        description='First test memory',
        content='- Step 1\n- Step 2',
        source_type='success',
        task_query='test task 1',
        created_at=datetime.utcnow().isoformat(),
        tags=['test', 'example']
    )
    item2 = MemoryItem(
        id=str(uuid.uuid4()),
        title='Test Memory 2',
        description='Second test memory',
        content='- Action A\n- Action B',
        source_type='failure',
        task_query='test task 2',
        created_at=datetime.utcnow().isoformat(),
        tags=['test']
    )
    
    store.add(item1)
    store.add(item2)
    store.save()
    
    # Load and verify
    loaded = MemoryStore.load(test_path)
    assert len(loaded.memories) == 2
    assert loaded.memories[0].title == 'Test Memory 1'
    assert loaded.memories[1].source_type == 'failure'
    assert loaded.memories[0].tags == ['test', 'example']
    
    # Test corpus generation
    corpus = loaded.get_corpus_for_bm25()
    assert len(corpus) == 2
    assert 'test' in corpus[0]  # From title and tags
    
    print("✓ MemoryStore save/load/corpus works")

✓ MemoryStore save/load/corpus works
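
`MemoryStore.save()` rewrites the whole JSON file in place, so an interrupted write could leave a truncated file. One common mitigation is to write to a temporary file and rename it over the target. The sketch below is an assumption about how that could look; `atomic_write_json` is a hypothetical helper and is not used by the module.

```python
import json
import os
import tempfile
from pathlib import Path

def atomic_write_json(path: Path, data: list[dict]) -> None:
    """Write JSON to a temp file in the target directory, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        # os.replace overwrites the destination; the rename is atomic on POSIX.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Under this assumption, `save()` would call `atomic_write_json(self.path, [m.to_dict() for m in self.memories])` instead of opening the file directly.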




## Trajectory Artifact

Extract a bounded representation of an RLM run for the judge and extractor.

**Purpose**: Summarize iterations into at most 10 key steps, each with an action and its outcome.

In [None]:
#| export
def extract_trajectory_artifact(
    task: str,
    answer: str,
    iterations: list[RLMIteration],
    ns: dict
) -> dict:
    """Create bounded trajectory artifact for judge/extractor.
    
    Summarizes each iteration's code blocks into 1-2 line "action + outcome",
    limiting to ~10 most informative key steps.
    
    Args:
        task: Original task query
        answer: Final answer from rlm_run
        iterations: List of RLMIteration objects
        ns: Final namespace dict
    
    Returns:
        Dictionary with keys:
        - task: str
        - final_answer: str
        - iteration_count: int
        - converged: bool (whether final_answer was set)
        - key_steps: List of {iteration, action, outcome}
        - variables_created: List of variable names in ns
        - errors_encountered: List of error messages from stderr
    """
    key_steps = []
    errors = []
    
    for i, iteration in enumerate(iterations, 1):
        # Summarize code blocks in this iteration
        for block in iteration.code_blocks:
            # Extract action from code (first line or summary)
            code_lines = block.code.strip().split('\n')
            # str.split always returns at least one element, so test the text itself
            action = code_lines[0][:80] or "[empty code]"
            
            # Extract outcome from result
            if block.result and block.result.stderr:
                outcome = f"ERROR: {block.result.stderr[:100]}"
                errors.append(block.result.stderr)
            elif block.result and block.result.stdout:
                outcome = block.result.stdout[:100]
            else:
                outcome = "(no output)"
            
            key_steps.append({
                'iteration': i,
                'action': action,
                'outcome': outcome
            })
    
    # Limit to 10 most informative steps (prioritize errors and final steps)
    if len(key_steps) > 10:
        # Keep first 3, last 4, and up to 3 with errors, in chronological order.
        # Selecting by index avoids duplicates without dropping repeated steps.
        n = len(key_steps)
        error_idx = [i for i, s in enumerate(key_steps)
                     if s['outcome'].startswith('ERROR:')][:3]
        keep = sorted(set(range(3)) | set(error_idx) | set(range(n - 4, n)))
        key_steps = [key_steps[i] for i in keep]
    
    return {
        'task': task,
        'final_answer': answer,
        'iteration_count': len(iterations),
        'converged': bool(answer and answer != "No answer provided"),
        'key_steps': key_steps,
        'variables_created': list(ns.keys()) if ns else [],
        'errors_encountered': errors
    }

In [None]:
# Test with mock iterations
from rlm._rlmpaper_compat import CodeBlock, REPLResult

mock_block1 = CodeBlock(
    code="search('Activity')",
    result=REPLResult(stdout="Found 3 entities", stderr=None, locals={})
)
mock_block2 = CodeBlock(
    code="describe_entity('prov:Activity')",
    result=REPLResult(stdout="prov:Activity is a class", stderr=None, locals={})
)
mock_iteration = RLMIteration(
    prompt="test prompt",
    response="test response",
    code_blocks=[mock_block1, mock_block2],
    final_answer=None,
    iteration_time=0.5
)

artifact = extract_trajectory_artifact(
    task="What is prov:Activity?",
    answer="prov:Activity is a class",
    iterations=[mock_iteration],
    ns={'result': 'prov:Activity is a class'}
)

assert artifact['task'] == "What is prov:Activity?"
assert artifact['iteration_count'] == 1
assert artifact['converged'] == True
assert len(artifact['key_steps']) == 2
assert 'search' in artifact['key_steps'][0]['action'].lower()
assert len(artifact['variables_created']) == 1
print("✓ Trajectory artifact extraction works")

✓ Trajectory artifact extraction works
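
The two-step test above never triggers the truncation rule in `extract_trajectory_artifact` (first 3 steps, up to 3 error steps, last 4 steps). The sketch below isolates that rule so it can be tried on a longer trajectory. `select_key_steps` is a hypothetical helper; selecting by index so that kept steps stay in chronological order is an assumption about the desired behavior.

```python
def select_key_steps(steps: list[dict], head: int = 3, tail: int = 4,
                     max_errors: int = 3) -> list[dict]:
    """Keep the first `head` steps, the last `tail` steps, and up to
    `max_errors` error steps, preserving chronological order."""
    if len(steps) <= head + tail + max_errors:
        return list(steps)
    n = len(steps)
    # Error steps are recognized by the "ERROR:" prefix on their outcome.
    error_idx = [i for i, s in enumerate(steps)
                 if s["outcome"].startswith("ERROR:")][:max_errors]
    keep = sorted(set(range(head)) | set(error_idx) | set(range(n - tail, n)))
    return [steps[i] for i in keep]


# A 15-step trajectory with errors at steps 5 and 7.
steps = [
    {"iteration": i, "action": f"step{i}",
     "outcome": "ERROR: boom" if i in (5, 7) else "ok"}
    for i in range(15)
]
picked = select_key_steps(steps)
print([s["iteration"] for s in picked])  # [0, 1, 2, 5, 7, 11, 12, 13, 14]
```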


## Judge

Classify trajectory as success or failure with evidence-sensitivity.

**Success criteria**:
1. Answer directly addresses the task
2. Answer is grounded in retrieved evidence (not hallucinated)
3. Reasoning shows systematic exploration

**Failure indicators**:
1. No answer produced (didn't converge)
2. Answer doesn't address the task
3. Answer makes claims without supporting evidence

In [None]:
#| export
def judge_trajectory(artifact: dict, ns: dict = None) -> dict:
    """Judge trajectory success using llm_query.
    
    Evidence-sensitive: success requires grounding in retrieved evidence.
    
    Args:
        artifact: Trajectory artifact from extract_trajectory_artifact()
        ns: Optional namespace for additional context
    
    Returns:
        Dictionary with keys:
        - is_success: bool
        - reason: str
        - confidence: str ('high', 'medium', 'low')
        - missing: list[str] (what evidence was lacking if failure)
    """
    # Format key steps for prompt
    steps_text = "\n".join([
        f"  {s['iteration']}. {s['action']} → {s['outcome']}"
        for s in artifact['key_steps']
    ])
    
    prompt = f"""Evaluate this RLM trajectory for task completion quality.

Task: {artifact['task']}
Final Answer: {artifact['final_answer']}
Converged: {artifact['converged']}
Key Steps:
{steps_text}

A trajectory is SUCCESSFUL if:
1. The answer directly addresses the task
2. The answer is grounded in retrieved evidence (not hallucinated)
3. The reasoning steps show systematic exploration

A trajectory FAILED if:
1. No answer was produced (didn't converge)
2. Answer doesn't address the task
3. Answer makes claims without supporting evidence

Return ONLY valid JSON:
{{"is_success": true/false, "reason": "...", "confidence": "high/medium/low", "missing": ["..."]}}"""
    
    # Use llm_query to get judgment (reuse the caller's namespace, or a throwaway one)
    temp_ns = ns if ns is not None else {}
    response = llm_query(prompt, temp_ns, name='judgment_response')
    
    # Parse JSON response
    try:
        # Try to extract JSON from response
        response_text = response.strip()
        if '```json' in response_text:
            response_text = response_text.split('```json')[1].split('```')[0].strip()
        elif '```' in response_text:
            response_text = response_text.split('```')[1].split('```')[0].strip()
        
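        judgment = json.loads(response_text)
        if not isinstance(judgment, dict):
            raise ValueError("judgment is not a JSON object")
        
        # Ensure required fields so callers can index them safely
        judgment.setdefault('is_success', artifact['converged'])
        judgment.setdefault('reason', '')
        judgment.setdefault('confidence', 'low')
        judgment.setdefault('missing', [])
        
        return judgment
    except (ValueError, IndexError) as e:  # json.JSONDecodeError subclasses ValueError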
        # Fallback for parsing errors
        return {
            'is_success': artifact['converged'],
            'reason': f"Parse error: {e}. Raw response: {response[:200]}",
            'confidence': 'low',
            'missing': ['Unable to parse judgment']
        }

In [None]:
#| eval: false
# Test judge with real LLM (requires API key)
test_artifact = {
    'task': 'What is prov:Activity?',
    'final_answer': 'prov:Activity is a class representing activities in PROV ontology',
    'iteration_count': 2,
    'converged': True,
    'key_steps': [
        {'iteration': 1, 'action': "search('Activity')", 'outcome': 'Found 3 entities'},
        {'iteration': 2, 'action': "describe_entity('prov:Activity')", 'outcome': 'A class in PROV'}
    ],
    'variables_created': ['result'],
    'errors_encountered': []
}

judgment = judge_trajectory(test_artifact)
print(f"Success: {judgment['is_success']}")
print(f"Reason: {judgment['reason']}")
print(f"Confidence: {judgment['confidence']}")

Success: True
Reason: The trajectory successfully answers the task by identifying prov:Activity as a class in the PROV ontology that represents activities. The answer directly addresses the question 'What is prov:Activity?', is grounded in retrieved evidence from the describe_entity step, and shows systematic exploration through searching and then describing the specific entity. The trajectory converged with a clear, accurate answer.
Confidence: high
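
`judge_trajectory` expects the model to reply with JSON, possibly wrapped in a Markdown code fence. The sketch below shows one way to make that parsing step tolerant of either form. `parse_json_reply` is a hypothetical helper, not part of the module; it tries each fenced block first and then the raw reply.

```python
import json
import re

FENCE = "`" * 3  # built programmatically to keep this example easy to embed
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)

def parse_json_reply(text: str):
    """Parse JSON from an LLM reply that may wrap it in a Markdown fence.

    Returns the decoded object, or None if no candidate parses.
    """
    candidates = [m.group(1) for m in _FENCED.finditer(text)]
    candidates.append(text)  # fall back to the raw reply
    for candidate in candidates:
        try:
            return json.loads(candidate.strip())
        except json.JSONDecodeError:
            continue
    return None
```

Returning `None` instead of raising lets the caller decide on a fallback, such as the low-confidence judgment that `judge_trajectory` builds from `artifact['converged']`.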


## Extractor

Extract up to 3 reusable memory items from a trajectory (possibly none).

**For successes**: Emphasize why the approach worked

**For failures**: Emphasize what to avoid and recovery strategies

**Output format**: Procedural (steps/checklist/template), NOT a retelling

In [None]:
#| export
def extract_memories(
    artifact: dict,
    judgment: dict,
    ns: dict = None
) -> list[MemoryItem]:
    """Extract up to 3 reusable memory items from trajectory.
    
    Args:
        artifact: Trajectory artifact from extract_trajectory_artifact()
        judgment: Judgment dict from judge_trajectory()
        ns: Optional namespace for additional context
    
    Returns:
        List of MemoryItem objects (0-3 items)
    """
    source_type = 'success' if judgment['is_success'] else 'failure'
    
    # Capture session_id if available in namespace
    session_id = None
    if ns is not None:
        # Try to get session_id from DatasetMeta
        if 'ds_meta' in ns and hasattr(ns['ds_meta'], 'session_id'):
            session_id = ns['ds_meta'].session_id
    
    # Format key steps for prompt
    steps_text = "\n".join([
        f"  {s['iteration']}. {s['action']} → {s['outcome']}"
        for s in artifact['key_steps']
    ])
    
    prompt = f"""Extract reusable procedural memories from this {source_type} trajectory.

Task: {artifact['task']}
Outcome: {judgment['reason']}
Key Steps:
{steps_text}

Extract UP TO 3 distinct, reusable insights. Each should be:
- Procedural (steps/checklist/template), NOT a retelling of this run
- Applicable to similar future tasks
- Concise but actionable

For ontology/SPARQL work, prefer:
- Query templates with placeholders
- Debugging strategies ("if X fails, try Y")
- Exploration patterns ("start with search, then describe, then probe")

Return ONLY valid JSON array (may have 1-3 items, or empty if no lessons):
[{{
  "title": "≤10 word identifier",
  "description": "One sentence summary",
  "content": "Markdown with steps/checklist/template",
  "tags": ["keyword1", "keyword2"]
}}]"""
    
    # Use llm_query to extract memories (create temp namespace)
    temp_ns = ns if ns is not None else {}
    response = llm_query(prompt, temp_ns, name='extractor_response')
    
    # Parse JSON response
    try:
        response_text = response.strip()
        if '```json' in response_text:
            response_text = response_text.split('```json')[1].split('```')[0].strip()
        elif '```' in response_text:
            response_text = response_text.split('```')[1].split('```')[0].strip()
        
        extracted = json.loads(response_text)
        
        # Convert to MemoryItem objects
        memories = []
        for item in extracted[:3]:  # Limit to 3
            memory = MemoryItem(
                id=str(uuid.uuid4()),
                title=item['title'],
                description=item['description'],
                content=item['content'],
                source_type=source_type,
                task_query=artifact['task'],
                created_at=datetime.utcnow().isoformat(),
                tags=item.get('tags', []),
                session_id=session_id  # From ns['ds_meta'] when present, else None
            )
            memories.append(memory)
        
        return memories
    except (json.JSONDecodeError, KeyError, IndexError, TypeError) as e:
        # Return empty list on parsing errors
        print(f"Warning: Failed to extract memories: {e}")
        return []

In [None]:
#| eval: false
# Test extractor with real LLM
test_artifact = {
    'task': 'Find properties of prov:Activity',
    'final_answer': 'prov:Activity has properties: prov:startedAtTime, prov:endedAtTime',
    'iteration_count': 3,
    'converged': True,
    'key_steps': [
        {'iteration': 1, 'action': "search('Activity')", 'outcome': 'Found prov:Activity'},
        {'iteration': 2, 'action': "describe_entity('prov:Activity')", 'outcome': 'A class'},
        {'iteration': 3, 'action': "get_properties('prov:Activity')", 'outcome': 'Listed properties'}
    ],
    'variables_created': ['activity_props'],
    'errors_encountered': []
}

test_judgment = {
    'is_success': True,
    'reason': 'Answer grounded in ontology data',
    'confidence': 'high',
    'missing': []
}

memories = extract_memories(test_artifact, test_judgment)
print(f"Extracted {len(memories)} memories:")
for m in memories:
    print(f"  - {m.title}")
    print(f"    Tags: {m.tags}")

Extracted 0 memories:
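
`extract_memories` trusts the decoded reply to be a list of objects with `title`, `description`, and `content`. The sketch below shows one way to filter a decoded reply down to well-formed items before building `MemoryItem` objects. `coerce_extracted` and `REQUIRED_KEYS` are hypothetical names introduced for illustration.

```python
REQUIRED_KEYS = ("title", "description", "content")

def coerce_extracted(raw, limit: int = 3) -> list[dict]:
    """Keep at most `limit` well-formed items from a decoded extractor reply."""
    if isinstance(raw, dict):  # a single object instead of an array
        raw = [raw]
    if not isinstance(raw, list):
        return []
    items = []
    for entry in raw:
        if not isinstance(entry, dict):
            continue
        # Every required field must be a non-empty string.
        if not all(isinstance(entry.get(k), str) and entry[k].strip()
                   for k in REQUIRED_KEYS):
            continue
        tags = entry.get("tags")
        if not isinstance(tags, list):
            tags = []
        items.append({**{k: entry[k] for k in REQUIRED_KEYS},
                      "tags": [str(t) for t in tags]})
        if len(items) == limit:
            break
    return items
```

Malformed entries are skipped individually, so one bad item does not discard the others.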


## BM25 Retrieval

Find relevant memories for new tasks using keyword-based BM25 retrieval.

**Searches over**: title + description + tags

In [None]:
#| export
def retrieve_memories(
    store: MemoryStore,
    task: str,
    k: int = 3
) -> list[MemoryItem]:
    """Retrieve top-k relevant memories using BM25.
    
    Tokenizes task and searches over title + description + tags.
    
    Args:
        store: MemoryStore instance
        task: Task query string
        k: Number of memories to retrieve
    
    Returns:
        List of top-k MemoryItem objects (fewer only if the store holds fewer than k)
    """
    if not store.memories:
        return []
    
    # Build BM25 index
    corpus = store.get_corpus_for_bm25()
    bm25 = BM25Okapi(corpus)
    
    # Query
    query_tokens = task.lower().split()
    scores = bm25.get_scores(query_tokens)
    
    # Rank all memories by score. No score threshold is applied, because
    # BM25 can return zero or negative scores for small corpora.
    scored = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    
    # Increment access_count for retrieved memories
    results = []
    for i, _ in scored[:k]:
        store.memories[i].access_count += 1
        results.append(store.memories[i])
    
    return results

In [None]:
# Test BM25 retrieval
test_store = MemoryStore()

# Add diverse memories
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='SPARQL query pattern for entity search',
    description='Use rdfs:label with FILTER for case-insensitive search.',
    content='- Step 1\n- Step 2',
    source_type='success',
    task_query='Find entities by name',
    created_at=datetime.utcnow().isoformat(),
    tags=['sparql', 'search', 'entity']
))

test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Property exploration strategy',
    description='Systematically explore properties using describe then probe.',
    content='- Action A\n- Action B',
    source_type='success',
    task_query='What properties does X have?',
    created_at=datetime.utcnow().isoformat(),
    tags=['properties', 'exploration']
))

test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Debugging failed SPARQL queries',
    description='Check syntax, namespaces, and endpoint first.',
    content='- Check 1\n- Check 2',
    source_type='failure',
    task_query='Query failed with error',
    created_at=datetime.utcnow().isoformat(),
    tags=['sparql', 'debugging', 'error']
))

# Test retrieval for different queries
results1 = retrieve_memories(test_store, 'How do I search for entities?', k=2)
assert len(results1) <= 2
assert any('search' in r.title.lower() or 'search' in r.tags for r in results1)
print(f"✓ Retrieved {len(results1)} memories for 'search for entities'")

results2 = retrieve_memories(test_store, 'My SPARQL query is broken', k=2)
assert len(results2) <= 2
assert any('sparql' in r.tags for r in results2)
print(f"✓ Retrieved {len(results2)} memories for 'SPARQL query broken'")

results3 = retrieve_memories(test_store, 'What properties does prov:Activity have?', k=2)
print(f"✓ Retrieved {len(results3)} memories for 'properties question'")

# Test access count increment
assert results1[0].access_count > 0
print("✓ Access count tracking works")

✓ Retrieved 2 memories for 'search for entities'
✓ Retrieved 2 memories for 'SPARQL query broken'
✓ Retrieved 2 memories for 'properties question'
✓ Access count tracking works
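
Both `get_corpus_for_bm25` and `retrieve_memories` tokenize with `text.lower().split()`, so punctuation stays attached to words: the query token `entities?` cannot match an indexed `entities`. The sketch below shows a regex tokenizer that could be swapped in on both sides. `tokenize` is a hypothetical helper, and keeping `:` and `_` inside tokens (so that `prov:Activity` stays one token) is an assumption about what suits ontology queries.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase, then keep runs of ASCII letters, digits, colons, and underscores."""
    return re.findall(r"[a-z0-9_:]+", text.lower())


query = "How do I search for entities?"
print(query.lower().split())  # ['how', 'do', 'i', 'search', 'for', 'entities?']
print(tokenize(query))        # ['how', 'do', 'i', 'search', 'for', 'entities']
```

The same function would need to be applied to both the corpus and the query; mixing tokenizers would make matches less likely rather than more.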




## Injection Formatting

Format retrieved memories for bounded prompt injection.

**Output includes**:
- Assessment instruction
- Title + description + up to 3 key bullets from content

**Never injects full content** to maintain bounded prompt size.

In [None]:
#| export
def format_memories_for_injection(
    memories: list[MemoryItem],
    max_bullets: int = 3
) -> str:
    """Format memories for bounded prompt injection.
    
    Returns string with:
    - Assessment instruction
    - Title + description + key bullets from content (up to max_bullets)
    
    Args:
        memories: List of MemoryItem objects to format
        max_bullets: Maximum bullets to extract from content
    
    Returns:
        Formatted string for prompt injection
    """
    if not memories:
        return ""
    
    lines = [
        "## Relevant Prior Experience",
        "",
        "Before taking action, briefly assess which of these strategies apply to your current task and which do not.",
        ""
    ]
    
    for i, mem in enumerate(memories, 1):
        lines.append(f"### {i}. {mem.title}")
        lines.append(mem.description)
        lines.append("Key points:")
        
        # Extract bullets from content (look for lines starting with - or numbers)
        content_lines = mem.content.split('\n')
        bullets = []
        for line in content_lines:
            stripped = line.strip()
            if stripped.startswith('-') or stripped.startswith('*'):
                bullets.append(stripped)
            elif len(stripped) > 0 and stripped[0].isdigit() and '.' in stripped:
                bullets.append(stripped)
        
        # Use first max_bullets bullets, or first max_bullets lines if no bullets found
        if bullets:
            for bullet in bullets[:max_bullets]:
                lines.append(f"- {bullet.lstrip('- *')}")
        else:
            # Fall back to first few lines
            for line in content_lines[:max_bullets]:
                if line.strip():
                    lines.append(f"- {line.strip()}")
        
        lines.append("")  # Blank line between memories
    
    return "\n".join(lines)

In [None]:
# Test injection formatting
test_memories = [
    MemoryItem(
        id='test-1',
        title='SPARQL Search Pattern',
        description='Template for searching entities by label.',
        content="""- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive matching
- Include LIMIT to avoid timeout
- Check for alternative label properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.utcnow().isoformat(),
        tags=['sparql']
    ),
    MemoryItem(
        id='test-2',
        title='Property Discovery',
        description='Systematic approach to finding properties.',
        content="""1. Start with describe_entity() for overview
2. Use get_properties() for full list
3. Check both domain and range
4. Look for inverse properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.utcnow().isoformat(),
        tags=['properties']
    )
]

formatted = format_memories_for_injection(test_memories, max_bullets=3)

# Verify format
assert '## Relevant Prior Experience' in formatted
assert 'assess which of these strategies' in formatted
assert '### 1. SPARQL Search Pattern' in formatted
assert '### 2. Property Discovery' in formatted
assert 'Use rdfs:label' in formatted
assert 'Start with describe_entity' in formatted

# Verify bullet limiting (should have max 3 bullets per memory)
lines = formatted.split('\n')
bullet_count_mem1 = sum(1 for l in lines[lines.index('### 1. SPARQL Search Pattern'):lines.index('### 2. Property Discovery')] if l.strip().startswith('-'))
assert bullet_count_mem1 <= 3

print("✓ Injection formatting works")
print("\nFormatted output:")
print(formatted[:300] + "...")

✓ Injection formatting works

Formatted output:
## Relevant Prior Experience

Before taking action, briefly assess which of these strategies apply to your current task and which do not.

### 1. SPARQL Search Pattern
Template for searching entities by label.
Key points:
- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive ma...
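
`format_memories_for_injection` strips bullet markers with `lstrip('- *')`, which also eats leading emphasis markers such as the `**` in `**Bold** step`. The sketch below is an alternative bullet extractor based on a regular expression. `first_bullets` is a hypothetical helper; unlike the module function it drops the numbering from numbered items.

```python
import re

# A list item: "-", "*", or "+" followed by whitespace, or digits followed by "." or ")".
BULLET = re.compile(r"^\s*(?:[-*+]\s+|\d+[.)]\s+)(.*\S)\s*$")

def first_bullets(content: str, max_bullets: int = 3) -> list[str]:
    """Return the text of the first `max_bullets` Markdown list items in `content`."""
    found = []
    for line in content.splitlines():
        match = BULLET.match(line)
        if match:
            found.append(match.group(1))
            if len(found) == max_bullets:
                break
    return found


content = "Intro line\n- **Bold** first\n* second\n3. third\n- fourth"
print(first_bullets(content))  # ['**Bold** first', 'second', 'third']
```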




## Integration

Complete closed-loop: RETRIEVE → INJECT → INTERACT → EXTRACT → STORE

In [None]:
#| export
def rlm_run_with_memory(
    query: str,
    context: str,
    memory_store: MemoryStore,
    ns: dict = None,
    enable_memory_extraction: bool = True,
    # Dataset persistence
    persist_dataset: bool = False,
    dataset_path: Optional[Path] = None,
    **kwargs
) -> tuple[str, list, dict, list[MemoryItem]]:
    """RLM run with procedural memory loop.
    
    Closed-loop cycle:
    1. RETRIEVE: Get relevant memories via BM25
    2. INJECT: Add to context/prompt
    3. INTERACT: Run rlm_run()
    4. EXTRACT: Judge + extract new memories
    5. STORE: Persist new memories
    
    Dataset persistence:
    - If persist_dataset=True and dataset_path exists, loads the snapshot before the run
    - After the run, saves a snapshot if `ds_meta` is present in the namespace
    - Extracted MemoryItems record `ds_meta.session_id` (when available) for lineage
    
    Args:
        query: Task query string
        context: Context string (e.g., ontology summary)
        memory_store: MemoryStore instance for retrieval/storage
        ns: Optional namespace dict
        enable_memory_extraction: Whether to extract and store new memories (default True)
        persist_dataset: Whether to persist dataset snapshots (default False)
        dataset_path: Optional path for dataset snapshot
        **kwargs: Additional arguments for rlm_run()
    
    Returns:
        Tuple of (answer, iterations, ns, new_memories)
    """
    # Load dataset snapshot if it exists
    if persist_dataset and dataset_path is not None and dataset_path.exists():
        try:
            from rlm.dataset import load_snapshot
            load_snapshot(str(dataset_path), ns)
        except Exception as e:
            print(f"Warning: Failed to load dataset snapshot: {e}")
    
    # 1. RETRIEVE relevant memories
    relevant = retrieve_memories(memory_store, query, k=3)
    
    # 2. INJECT into context
    if relevant:
        memory_text = format_memories_for_injection(relevant)
        enhanced_context = f"{memory_text}\n\n---\n\n{context}"
    else:
        enhanced_context = context
    
    # 3. INTERACT - run RLM
    answer, iterations, ns = rlm_run(query, enhanced_context, ns=ns, **kwargs)
    
    # Save dataset snapshot when ds_meta is present in the namespace
    if persist_dataset and dataset_path is not None:
        try:
            from rlm.dataset import snapshot_dataset
            if 'ds_meta' in ns:
                result = snapshot_dataset(ns['ds_meta'], path=str(dataset_path))
                print(f"Dataset snapshot: {result}")
        except Exception as e:
            print(f"Warning: Failed to save dataset snapshot: {e}")
    
    new_memories = []
    if enable_memory_extraction:
        # 4. EXTRACT - judge and extract memories
        artifact = extract_trajectory_artifact(query, answer, iterations, ns)
        judgment = judge_trajectory(artifact, ns)
        new_memories = extract_memories(artifact, judgment, ns)
        
        # 5. STORE - persist new memories
        for mem in new_memories:
            memory_store.add(mem)
        if memory_store.path:
            memory_store.save()
    
    return answer, iterations, ns, new_memories

In [None]:
#| eval: false
# Integration test (requires full RLM setup)
from rlm.ontology import setup_ontology_context
import tempfile

def test_memory_improves_convergence():
    """Second attempt should benefit from first attempt's memory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        store = MemoryStore(path=Path(tmpdir) / 'test_integration.json')
        
        # First run - no memories
        ns = {}
        setup_ontology_context('ontology/prov.ttl', ns, name='prov')
        
        answer1, iters1, ns1, mems1 = rlm_run_with_memory(
            "What is prov:Activity and what properties does it have?",
            ns['prov_meta'].summary(),
            store,
            ns=ns
        )
        print(f"\nFirst run: {len(iters1)} iterations, {len(mems1)} memories extracted")
        for mem in mems1:
            print(f"  - {mem.title}")
        
        # Second run - similar task, should retrieve memories
        ns2 = {}
        setup_ontology_context('ontology/prov.ttl', ns2, name='prov')
        
        answer2, iters2, ns2, mems2 = rlm_run_with_memory(
            "What is prov:Entity and what properties does it have?",
            ns2['prov_meta'].summary(),
            store,
            ns=ns2
        )
        print(f"\nSecond run: {len(iters2)} iterations")
        print(f"Total memories in store: {len(store.memories)}")
        
        # Show what BM25 retrieves for the second query (this call also bumps access_count)
        retrieved_for_second = retrieve_memories(
            store,
            "What is prov:Entity and what properties does it have?",
            k=3
        )
        print(f"Memories that would be retrieved for second run: {len(retrieved_for_second)}")
        for mem in retrieved_for_second:
            print(f"  - {mem.title} (accessed {mem.access_count} times)")

# Run test
# test_memory_improves_convergence()

## Usage Examples

End-to-end examples with PROV ontology.

In [None]:
#| eval: false
# Full example: Build up procedural memory over multiple queries
from rlm.ontology import setup_ontology_context
from pathlib import Path

# Initialize memory store
store = MemoryStore(path=Path('memories/prov_memories.json'))

# If store exists, load it
if store.path.exists():
    store = MemoryStore.load(store.path)
    print(f"Loaded {len(store.memories)} existing memories")

# Setup ontology context
ns = {}
setup_ontology_context('ontology/prov.ttl', ns, name='prov')

# Series of queries
queries = [
    "What is prov:Activity?",
    "What properties does prov:Activity have?",
    "How are prov:Activity and prov:Entity related?",
]

for i, query in enumerate(queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}: {query}")
    print('='*60)
    
    answer, iterations, ns, new_memories = rlm_run_with_memory(
        query,
        ns['prov_meta'].summary(),
        store,
        ns=ns
    )
    
    print(f"\nAnswer: {answer}")
    print(f"Iterations: {len(iterations)}")
    print(f"New memories extracted: {len(new_memories)}")
    for mem in new_memories:
        print(f"  - {mem.title}")

print(f"\n{'='*60}")
print(f"Final memory store: {len(store.memories)} memories")
print('='*60)

# Show all memories with access counts
for mem in store.memories:
    print(f"\n{mem.title}")
    print(f"  Source: {mem.source_type}")
    print(f"  Accessed: {mem.access_count} times")
    print(f"  Tags: {mem.tags}")