# RLM Tutorial: Progressive Disclosure Over RDF Graphs

This notebook demonstrates the RLM (Recursive Language Model) architecture
with working examples.

**Note:** Sections 1, 3, and 8 require network access and `ANTHROPIC_API_KEY` set in
your environment (used by `claudette`). Without these, only the non-LLM sections
(ontology loading, dataset memory, SPARQL handles, procedural memory, SHACL indexing)
will execute successfully.

Example:

```bash
export ANTHROPIC_API_KEY="..."
```


In [None]:
# Shared namespace for the entire notebook - demonstrating REPL persistence
import os

ns = {}  # This single namespace persists throughout the tutorial

def require_anthropic_api_key():
    """Fail fast if the Claude API key is not configured."""
    if not os.getenv('ANTHROPIC_API_KEY'):
        raise RuntimeError(
            "Missing ANTHROPIC_API_KEY. Set it in your environment to run llm_query()/rlm_run() cells."
        )

## 1. Core RLM Loop

The `llm_query()` function sends a question to Claude and stores the reply in the
namespace under the given `name`.

In [None]:
#| eval: false
from rlm.core import llm_query

require_anthropic_api_key()

# Use shared ns - result will persist
result = llm_query("What is 2+2? Answer with just the number.", ns, name='math')
print(f"Result: {result}")
print(f"Stored as: ns['math'] = {ns.get('math', 'not found')}")

The `rlm_run()` function runs the full RLM loop: the model emits code, the loop
executes it in a REPL, and the cycle repeats until the model produces an answer
or `max_iters` is reached.
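The loop described above can be sketched without an API call. This is a minimal illustration, not the `rlm.core` implementation: `fake_model`, `run_loop`, and the `final_answer` stop convention are hypothetical stand-ins chosen for this example. It only shows how code emitted on one turn leaves state in a shared namespace that the next turn can use.

```python
from typing import Callable

def fake_model(question: str, history: list[str]) -> str:
    """Hypothetical stand-in for an LLM: returns Python source to execute."""
    if not history:
        # First turn: compute an intermediate value in the shared namespace.
        return "squares = [n * n for n in (1, 2, 3)]"
    # Second turn: reuse the earlier result and signal completion.
    return "final_answer = sum(squares)"

def run_loop(question: str, model: Callable[[str, list[str]], str],
             ns: dict, max_iters: int = 3):
    """Ask the model for code, execute it in ns, stop once final_answer is set."""
    history: list[str] = []
    for _ in range(max_iters):
        code = model(question, history)
        exec(code, ns)  # state persists in ns between iterations
        history.append(code)
        if "final_answer" in ns:
            return ns["final_answer"], history
    return None, history

demo_ns: dict = {}
demo_answer, demo_history = run_loop("Sum of squares of 1, 2, 3?", fake_model, demo_ns)
print(demo_answer, len(demo_history))
```

A real loop would also feed execution output and errors back to the model; the sketch omits that to keep the control flow visible.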

In [None]:
#| eval: false
from rlm.core import rlm_run

require_anthropic_api_key()

# Continue using shared ns
answer, iterations, ns = rlm_run(
    "Calculate the sum of squares of 1, 2, and 3.",
    "You can use Python to calculate.",
    ns=ns,
    max_iters=3
)
print(f"Answer: {answer}")
print(f"Iterations: {len(iterations)}")
print(f"ns still has 'math': {ns.get('math', 'not found')}")

## 2. Ontology Loading

Load RDF ontologies and explore them with bounded view functions.
The key insight: we never dump the full graph into context.
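To make the idea of a bounded view concrete, here is a minimal sketch over a toy list of triples. It assumes nothing about how `rlm.ontology` is implemented: `search_by_label_sketch` and the `TRIPLES` data are hypothetical. The point is that the function returns at most `limit` results however large the graph is.

```python
from itertools import islice

# Toy triple store: (subject, predicate, object) tuples. Hypothetical data.
TRIPLES = [(f"ex:Thing{i}", "rdfs:label", f"Thing {i}") for i in range(1000)]
TRIPLES.append(("ex:Activity", "rdfs:label", "Activity"))

def search_by_label_sketch(triples, text, limit=5):
    """Return at most `limit` (subject, label) pairs whose label contains text."""
    needle = text.lower()
    matches = (
        (s, o) for s, p, o in triples
        if p == "rdfs:label" and needle in o.lower()
    )
    # islice stops consuming the generator once `limit` matches are found.
    return list(islice(matches, limit))

hits = search_by_label_sketch(TRIPLES, "thing 1", limit=5)
print(len(hits))  # never more than the limit, even with many matching labels
```

Because the result size is capped by the caller, the cost of putting the output into model context stays predictable.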

In [None]:
from rlm.ontology import setup_ontology_context

# Add PROV ontology to shared ns
setup_ontology_context('ontology/prov.ttl', ns, name='prov')
print(ns['prov_meta'].summary())
print(f"\nns now contains: {[k for k in ns.keys() if not k.startswith('_')]}")

Graph 'prov': 1,664 triples
Classes: 59
Properties: 89
Individuals: 1
Namespaces: brick, csvw, dc, dcat, dcmitype, dcterms, dcam, doap, foaf, geo, odrl, org, prof, qb, schema, sh, skos, sosa, ssn, time, vann, void, wgs, owl, rdf, rdfs, xsd, xml, prov

ns now contains: ['math', 'context', 'llm_query', 'llm_query_batched', 'FINAL_VAR', 'llm_res', 'analysis', 'sum_of_squares', 'prov', 'prov_meta', 'prov_graph_stats', 'prov_search_by_label', 'prov_describe_entity', 'prov_search_entity', 'prov_probe_relationships', 'prov_find_path', 'prov_predicate_frequency', 'graph_stats', 'search_by_label', 'describe_entity', 'search_entity', 'probe_relationships', 'find_path', 'predicate_frequency']


In [None]:
# Search for classes related to "Activity"
results = ns['prov_search_by_label']('Activity', limit=5)
for uri, label in results:
    print(f"{label}: {uri}")

Activity: http://www.w3.org/ns/prov#Activity
ActivityInfluence: http://www.w3.org/ns/prov#ActivityInfluence
activity: http://www.w3.org/ns/prov#activity
hadActivity: http://www.w3.org/ns/prov#hadActivity
activityOfInfluence: http://www.w3.org/ns/prov#activityOfInfluence


In [None]:
# Get bounded description of Activity class
desc = ns['prov_describe_entity']('http://www.w3.org/ns/prov#Activity', limit=10)
print(f"Label: {desc['label']}")
print(f"Types: {desc['types']}")
print(f"Comment: {desc['comment'][:100] if desc['comment'] else 'None'}...")
print(f"Outgoing triples (sample): {len(desc['outgoing_sample'])}")

Label: Activity
Types: ['http://www.w3.org/2002/07/owl#Class']
Comment: None...
Outgoing triples (sample): 10


## 3. RLM with Ontology Exploration

Combine the RLM loop with ontology tools for intelligent exploration.
The model uses bounded views to progressively discover information.

In [None]:
#| eval: false
from rlm.core import rlm_run

require_anthropic_api_key()

# PROV is already loaded in ns from previous section
query = "What is prov:Activity? Use search_by_label and describe_entity."
context = ns['prov_meta'].summary()

answer, iterations, ns = rlm_run(
    query,
    context,
    ns=ns,
    max_iters=3,
    verbose=False
)

print(f"Answer: {answer[:500] if answer else 'No answer'}...")
print(f"Iterations: {len(iterations)}")

In [None]:
#| eval: false
# Show what code the LLM executed (depends on previous API cell)
for i, it in enumerate(iterations):
    if it.code_blocks:
        print(f"Iteration {i}:")
        for cb in it.code_blocks:
            print(f"  Code: {cb.code[:100]}...")

## 4. Dataset Memory

Store discovered facts in an RDF Dataset with provenance tracking.
Facts persist within the same namespace/session. To persist across sessions, use
the `snapshot_dataset()` and `load_snapshot()` APIs.

In [None]:
from rlm.dataset import setup_dataset_context

# Add dataset to shared ns (alongside previously loaded ontology)
setup_dataset_context(ns)
print(ns['dataset_stats']())
print(f"\nPROV ontology still accessible: 'prov_meta' in ns = {'prov_meta' in ns}")

Dataset 'ds' (session: f7322b86)
mem: 0 triples
prov: 0 events
work graphs: 0
onto graphs: 0

PROV ontology still accessible: 'prov_meta' in ns = True


In [None]:
# Add a fact we discovered
ns['mem_add'](
    'http://example.org/myAnalysis',
    'http://www.w3.org/ns/prov#wasGeneratedBy',
    'http://example.org/rlmSession1'
)

# Check stats
print(ns['dataset_stats']())

Dataset 'ds' (session: f7322b86)
mem: 1 triples
prov: 7 events
work graphs: 0
onto graphs: 0


In [None]:
# Query the memory graph
results = ns['mem_query']("""
    SELECT ?s ?p ?o WHERE { ?s ?p ?o }
""")
for r in results:
    print(r)

{'s': 'http://example.org/myAnalysis', 'p': 'http://www.w3.org/ns/prov#wasGeneratedBy', 'o': 'http://example.org/rlmSession1'}


## 5. SPARQL Result Handles

Query results return handles with metadata, not raw data dumps.
Handles support bounded sampling (e.g., `rows[:n]`) and summary statistics.

**Note:** Results are still fetched into memory; handles provide metadata-first
access patterns rather than true server-side pagination.
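The metadata-first pattern can be illustrated with a small wrapper class. This is a sketch only: `HandleSketch` and its `sample()` method are hypothetical and are not the `SPARQLResultHandle` API. All rows live in memory, as the note above says, but the summary exposes only counts and column names.

```python
from dataclasses import dataclass

@dataclass
class HandleSketch:
    """Hypothetical result wrapper: metadata is cheap to show, rows are opt-in."""
    rows: list
    columns: list
    query: str = ""

    def summary(self) -> str:
        # Only the row count and column names, never row contents.
        return f"{len(self.rows)} rows, columns={self.columns}"

    def sample(self, n: int = 3) -> list:
        # Bounded slice; the caller decides how much to reveal.
        return self.rows[:n]

h = HandleSketch(
    rows=[{"name": f"Item{i}", "value": i} for i in range(100)],
    columns=["name", "value"],
)
print(h.summary())
print(h.sample(2))
```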

In [None]:
from rlm.sparql_handles import SPARQLResultHandle

# Simulating a large result set
handle = SPARQLResultHandle(
    rows=[{'name': f'Item{i}', 'value': i} for i in range(100)],
    result_type='select',
    query='SELECT ?name ?value WHERE { ... }',
    endpoint='local',
    columns=['name', 'value'],
    total_rows=100
)

print(handle.summary())
print(f"First 3 rows: {handle.rows[:3]}")

SELECT: 100 rows, columns=['name', 'value']
First 3 rows: [{'name': 'Item0', 'value': 0}, {'name': 'Item1', 'value': 1}, {'name': 'Item2', 'value': 2}]


## 6. Procedural Memory

Store procedures learned from past trajectories and retrieve them for new tasks.
Retrieval ranks stored items by BM25 similarity to the task description.
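For readers unfamiliar with BM25, the following sketch scores a few documents against a query using the Okapi BM25 formula with the non-negative IDF variant. It is an illustration under stated assumptions: whitespace tokenization, the defaults `k1=1.5` and `b=0.75`, and the function name `bm25_scores` are choices made here, and `rlm.procedural_memory` may tokenize and weight differently.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query (assumes docs is non-empty)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization dampens long documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "find activity classes in prov",
    "list dataset shapes in dcat",
    "count triples in a graph",
]
scores = bm25_scores("explore prov activity", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the first document shares terms with the query, so it is the only one with a non-zero score.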

In [None]:
from rlm.procedural_memory import MemoryStore, MemoryItem, retrieve_memories
from datetime import datetime, timezone
import uuid

store = MemoryStore()

# Add a learned procedure
item = MemoryItem(
    id=str(uuid.uuid4()),
    title='Find Activity classes in PROV',
    description='How to discover Activity-related classes',
    content='1. Use search_by_label("Activity")\n2. Use describe_entity() on results',
    source_type='success',
    task_query='find activities in PROV',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['prov', 'ontology', 'exploration']
)
store.add(item)

print(f"Store has {len(store.memories)} memories")

Store has 1 memories


In [None]:
# Retrieve relevant memories for a new task
retrieved = retrieve_memories(store, 'how to explore PROV ontology activities', k=1)
for mem in retrieved:
    print(f"Title: {mem.title}")
    print(f"Content:\n{mem.content}")

Title: Find Activity classes in PROV
Content:
1. Use search_by_label("Activity")
2. Use describe_entity() on results


## 7. SHACL Shape Indexing

Detect and index SHACL shapes for schema discovery and constraint inspection.

**Note:** This provides shape detection and constraint inspection (targets, properties,
cardinalities), not runtime validation. Use a SHACL validator for actual data validation.
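Shape detection and keyword search can be sketched over plain triples. The example below is a simplified illustration, not the `rlm.shacl_examples` implementation: `index_node_shapes`, `search_shapes_sketch`, and the `ex:` shape names are hypothetical, and only `sh:targetClass` is indexed.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SH = "http://www.w3.org/ns/shacl#"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical shape graph as plain (subject, predicate, object) tuples.
shape_triples = [
    ("ex:DatasetShape", RDF_TYPE, SH + "NodeShape"),
    ("ex:DatasetShape", SH + "targetClass", DCAT + "Dataset"),
    ("ex:CatalogShape", RDF_TYPE, SH + "NodeShape"),
    ("ex:CatalogShape", SH + "targetClass", DCAT + "Catalog"),
    ("ex:Unrelated", RDF_TYPE, "http://www.w3.org/2002/07/owl#Class"),
]

def index_node_shapes(triples):
    """Map each sh:NodeShape subject to the classes it targets."""
    shapes = {s: [] for s, p, o in triples
              if p == RDF_TYPE and o == SH + "NodeShape"}
    for s, p, o in triples:
        if s in shapes and p == SH + "targetClass":
            shapes[s].append(o)
    return shapes

def search_shapes_sketch(index, keyword, limit=3):
    """Case-insensitive keyword match over shape names and target IRIs."""
    kw = keyword.lower()
    hits = [(name, targets) for name, targets in index.items()
            if kw in name.lower() or any(kw in t.lower() for t in targets)]
    return hits[:limit]

shape_index = index_node_shapes(shape_triples)
print(len(shape_index))
print(search_shapes_sketch(shape_index, "dataset"))
```

Nothing here checks data against the shapes; the index only records which shapes exist and what they target.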

In [None]:
from rlm.shacl_examples import detect_shacl, build_shacl_index, search_shapes
from rdflib import Graph

# Load DCAT-AP shapes
g = Graph()
g.parse('ontology/dcat-ap/dcat-ap-SHACL.ttl')

# Detect SHACL content
detection = detect_shacl(g)
print(f"Node shapes: {detection['node_shapes']}")
print(f"Property shapes: {detection['property_shapes']}")

Node shapes: 42
Property shapes: 0


In [None]:
# Build index and search
index = build_shacl_index(g)
results = search_shapes(index, 'dataset', limit=3)

for r in results:
    print(f"{r['uri'].split('#')[-1]}: targets {r['targets']}")

dcat:CatalogShape: targets ['http://www.w3.org/ns/dcat#Catalog']
dcat:DatasetShape: targets ['http://www.w3.org/ns/dcat#Dataset']
dcat:DataServiceShape: targets ['http://www.w3.org/ns/dcat#DataService']


## 8. Full Integration: Multi-Ontology Comparison

Putting it all together: load multiple ontologies, build sense documents,
and use RLM to answer complex questions.

In [None]:
#| eval: false
from rlm.ontology import build_sense

require_anthropic_api_key()

# Build PROV sense document in shared ns
build_sense('ontology/prov.ttl', name='prov_sense', ns=ns)
print("PROV sense document built")
print(f"Summary length: {len(ns['prov_sense'].summary)} chars")

In [None]:
#| eval: false
from rlm.core import rlm_run
from rlm.ontology import build_sense  # repeated so this cell runs on its own

require_anthropic_api_key()

# Build sense for SIO in shared ns
build_sense('ontology/sio/sio-release.owl', name='sio_sense', ns=ns)

# Context as dict - model can inspect context['prov'] / context['sio'] directly
context = {
    'prov': ns['prov_sense'].summary[:2000],  # Truncate for demo
    'sio': ns['sio_sense'].summary[:2000]
}

query = "What are the key differences between PROV and SIO ontologies?"

# Pass dict context directly (not str(context)) for progressive disclosure
answer, iterations, ns = rlm_run(
    query,
    context,  # Keep dict structure for model inspection
    ns=ns,
    max_iters=3,
    verbose=False
)

print(f"Answer:\n{answer[:800] if answer else 'No answer'}...")
print(f"\nIterations: {len(iterations)}")

## Summary

This tutorial demonstrated:

1. **Core RLM loop**: `llm_query()` and `rlm_run()` for LLM-driven exploration
2. **Ontology loading**: Bounded views prevent context overflow
3. **RLM with ontology exploration**: Progressive disclosure; start small, explore as needed
4. **Dataset memory**: Persist discovered facts with provenance (within a session/namespace)
5. **SPARQL handles**: Metadata-first result handling with bounded sampling
6. **Procedural memory**: Learn and reuse exploration strategies
7. **SHACL indexing**: Schema discovery and constraint inspection through shape search
8. **Full integration**: Sense documents and `rlm_run()` for multi-ontology comparison

**Environment requirements:** Sections 1, 3, and 8, which call `llm_query()`,
`rlm_run()`, or `build_sense()`, require network access and `ANTHROPIC_API_KEY` set in
your environment. Non-LLM sections work offline.