# Structured Agents and Pipelines
> Creating structured DSPy agents for Semantic Web tasks

In [None]:
#| default_exp pipelines

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
import os
os.environ["COG_LOGLEVEL"] = "DEBUG"
# or, programmatically:
from cogitarelink.core.debug import set_loglevel
set_loglevel("DEBUG")

In [None]:
#| export
from typing import List, Dict, Any, Optional
import dspy, hashlib, datetime
from cogitarelink.core.graph import GraphManager
from cogitarelink_dspy.wrappers import get_tools, get_tool_by_name, group_tools_by_layer
from cogitarelink_dspy.components import list_layers
from cogitarelink_dspy.memory import ReflectionStore, REFLECTION_GRAPH
from cogitarelink_dspy.telemetry import TelemetryStore


## Set up MLflow for logging and introspection

In [None]:
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")
# mlflow.set_experiment("DSPy")
mlflow.dspy.autolog()



## Introduction

This notebook implements structured agent pipelines for the Cogitarelink-DSPy integration. We're creating agents that can reason about semantic web data across different layers of abstraction:

1. **Context Layer** - Working with JSON-LD contexts and namespaces
2. **Ontology Layer** - Exploring ontologies and vocabularies
3. **Rules Layer** - Validating data against rules (SHACL, etc.)
4. **Instances Layer** - Working with actual data/triples
5. **Verification Layer** - Verifying and signing data

In addition, we have a **Utility Layer** for cross-cutting concerns like memory and telemetry.
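
As a rough illustration of the layering, the sketch below tags toy tool functions with a layer and groups them by that tag. The `Layer` enum, the `in_layer` decorator, `group_by_layer`, and both toy tools are hypothetical names invented for this example; they are not Cogitarelink's actual API.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable, Dict, Iterable, List


class Layer(Enum):
    """The five semantic layers plus the cross-cutting utility layer."""
    CONTEXT = 1
    ONTOLOGY = 2
    RULES = 3
    INSTANCES = 4
    VERIFICATION = 5
    UTILITY = 6


def in_layer(layer: Layer) -> Callable[[Callable], Callable]:
    """Hypothetical decorator that tags a tool function with its layer."""
    def decorate(fn: Callable) -> Callable:
        fn.layer = layer
        return fn
    return decorate


@in_layer(Layer.CONTEXT)
def expand_prefix(curie: str) -> str:
    """Expand a compact IRI such as dc:title using a toy prefix table."""
    prefixes = {"dc": "http://purl.org/dc/elements/1.1/"}
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


@in_layer(Layer.UTILITY)
def remember(note: str) -> str:
    """Toy memory tool: acknowledges a note without storing it."""
    return f"Stored: {note}"


def group_by_layer(tools: Iterable[Callable]) -> Dict[Layer, List[str]]:
    """Map each layer to the names of the tools tagged with it."""
    grouped: Dict[Layer, List[str]] = defaultdict(list)
    for tool in tools:
        grouped[tool.layer].append(tool.__name__)
    return dict(grouped)


grouped = group_by_layer([expand_prefix, remember])
```

Grouping tools this way makes it easy to hand an agent one tool per layer, or every tool at once, without changing the tools themselves.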

Our approach uses DSPy's `ReAct` module, which lets the model select and call tools in response to the user's query. The design has two levels of agents:

- `HelloLOD`: A lightweight agent with essential tools for common tasks
- `FullPlanner`: A comprehensive agent with all available tools

This notebook implements `HelloLOD`. It also wires in memory tools so the agent can store and retrieve notes from previous interactions.

In [None]:
#| export
graph     = GraphManager(use_rdflib=True)
mem       = ReflectionStore(graph)
telemetry = TelemetryStore(graph)

TOOLS = get_tools()
TOOLS += [mem.add, mem.retrieve, mem.as_prompt]

## System Prompts

The heart of our agent's reasoning is the system prompt, which tells the model to reason step by step and to pick the highest Cogitarelink layer that solves the task. The prompt is attached to a DSPy signature that maps a `query` string to an `answer` string.

In [None]:
# System prompt for ReAct agent
SEM_WEB_SYSTEM = (
    "You are a Linked-Data teaching assistant. "
    "Think step-by-step; choose the highest Cogitarelink layer that solves the task. "
    "Return only the final answer — never reveal your thought."
)

# Define the ReAct signature
sig = dspy.Signature(
    "query:str -> answer:str",
    instructions=SEM_WEB_SYSTEM
)


In [None]:
# Configure the LM; the ReAct agent itself is instantiated in a later cell
lm = dspy.LM(
    "openai/o3-mini",
    temperature=1.0,
    max_tokens=20000
)
dspy.configure(lm=lm)

In [None]:
lm

<dspy.clients.lm.LM>

In [None]:
agent = dspy.ReAct(
    signature=sig,
    tools=TOOLS,
    max_iters=4,
)
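
To make the role of `max_iters` concrete, here is a simplified sketch of a ReAct-style loop. It is not DSPy's implementation: `react_loop`, `toy_policy`, and `expand` are hypothetical names, and a hard-coded policy stands in for the language model. The trajectory keys mirror the ones that appear in the agent traces in this notebook.

```python
from typing import Any, Callable, Dict, Tuple

Step = Tuple[str, str, Dict[str, Any]]  # (thought, tool_name, tool_args)


def react_loop(
    query: str,
    tools: Dict[str, Callable[..., Any]],
    policy: Callable[[str, Dict[str, Any]], Step],
    max_iters: int = 4,
) -> Dict[str, Any]:
    """Alternate think -> act -> observe until 'finish' or max_iters steps."""
    trajectory: Dict[str, Any] = {}
    for i in range(max_iters):
        thought, tool_name, tool_args = policy(query, trajectory)
        trajectory[f"thought_{i}"] = thought
        trajectory[f"tool_name_{i}"] = tool_name
        trajectory[f"tool_args_{i}"] = tool_args
        if tool_name == "finish":
            trajectory[f"observation_{i}"] = "Completed."
            break
        # Run the chosen tool and record what it returned.
        trajectory[f"observation_{i}"] = tools[tool_name](**tool_args)
    return trajectory


def toy_policy(query: str, trajectory: Dict[str, Any]) -> Step:
    """Stand-in for the LM: call one tool on the first step, then finish."""
    if not trajectory:
        return ("I should expand the prefix.", "expand", {"curie": "dc:title"})
    return ("I have the answer.", "finish", {})


def expand(curie: str) -> str:
    """Toy tool: expand a compact IRI with a one-entry prefix table."""
    prefix, _, local = curie.partition(":")
    return {"dc": "http://purl.org/dc/elements/1.1/"}[prefix] + local


trace = react_loop("What is the full IRI of dc:title?", {"expand": expand}, toy_policy)
```

With `max_iters=4`, a loop like this can make at most four decisions, so a task that needs several chained tool calls may run out of steps before it finishes.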


## HelloLOD: Lightweight Semantic Web Agent

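Our `HelloLOD` agent is a minimal implementation that provides basic semantic web functionality. In code it is a thin `dspy.Module` wrapper around the ReAct agent defined above: it forwards the query, logs a hash of the scratch pad and the call latency to telemetry, and handles a manual `remember:` shortcut for storing notes.

The key design decisions for HelloLOD are:

1. Include one representative tool from each semantic layer
2. Exclude memory tools initially for simplicity
3. Use a straightforward system prompt without complex reflection

Note that the `TOOLS` list built earlier comes from `get_tools()` with the memory tools appended, so the agent instantiated in this notebook does include memory tools; decision 2 describes the intended minimal configuration.

This agent serves as both a proof of concept and a starting point for more complex implementations.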

In [None]:
#| export
class HelloLOD(dspy.Module):
    """Lightweight wrapper that logs scratch-pad hashes & provenance."""
    def __init__(self, agent, telemetry, mem):
        super().__init__()
        self.agent = agent
        self.telemetry = telemetry
        self.mem = mem

    def forward(self, query: str):
        # Manual reflection: store the note and skip the agent call entirely.
        # The prefix check is case-insensitive, so slice by length rather than
        # splitting on the lowercase literal (which fails for "Remember: ...").
        prefix = "remember:"
        if query.lower().startswith(prefix):
            note = query[len(prefix):].strip()
            self.mem.add(note, tags=["manual"])
            return dspy.Prediction(answer=f"Stored: {note}")

        # datetime.utcnow() is deprecated; use timezone-aware timestamps
        t0 = datetime.datetime.now(datetime.timezone.utc)
        result = self.agent(query=query)
        t1 = datetime.datetime.now(datetime.timezone.utc)

        # Hash the hidden chain-of-thought (fallback to empty if unavailable)
        try:
            lm = self.agent.get_lm()
        except Exception:
            lm = None
        scratch = (getattr(lm, "last_scratch", "") or "") if lm is not None else ""
        digest = hashlib.sha256(scratch.encode()).hexdigest()
        self.telemetry.log("cot", digest, tool_iri="urn:agent:HelloLOD")

        # Log latency (milliseconds)
        latency_ms = (t1 - t0).total_seconds() * 1000
        self.telemetry.log("latency", latency_ms, tool_iri="urn:agent:HelloLOD")

        return result

In [None]:
#| export
hello = HelloLOD(agent, telemetry, mem)
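
The telemetry that `HelloLOD.forward` records can be tried in isolation. The sketch below is a self-contained approximation under stated assumptions: `ListTelemetry` is a hypothetical in-memory stand-in for `TelemetryStore`, `timed_call` is an invented helper, and `time.perf_counter` replaces the wall-clock timestamps used in the class.

```python
import hashlib
import time
from typing import Any, Callable, List, Tuple


class ListTelemetry:
    """Hypothetical stand-in for TelemetryStore: keeps events in a list."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any, str]] = []

    def log(self, kind: str, value: Any, tool_iri: str) -> None:
        self.events.append((kind, value, tool_iri))


def timed_call(
    fn: Callable[[str], Any],
    query: str,
    scratch: str,
    store: ListTelemetry,
    tool_iri: str = "urn:agent:HelloLOD",
) -> Any:
    """Run fn(query), then log a scratch-pad digest and the latency in ms."""
    start = time.perf_counter()
    result = fn(query)
    latency_ms = (time.perf_counter() - start) * 1000

    # Only the SHA-256 digest is stored, never the scratch-pad text itself.
    digest = hashlib.sha256(scratch.encode()).hexdigest()
    store.log("cot", digest, tool_iri=tool_iri)
    store.log("latency", latency_ms, tool_iri=tool_iri)
    return result


demo_store = ListTelemetry()
demo_answer = timed_call(str.upper, "hello", scratch="", store=demo_store)
```

An empty scratch pad always hashes to the same digest (`e3b0c442...`), so seeing that value in the telemetry is a sign that no reasoning text was captured for the call.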

In [None]:
# A couple of sample queries: a prefix lookup and a Wikidata count
test_queries = [
    "What is the full IRI of dc:title?",
    "How many cats on Wikidata?"
]
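
Besides ordinary questions like these, `HelloLOD` treats a query that starts with `remember:` as a note to store. A minimal sketch of that prefix handling follows; `extract_note` is a hypothetical helper, not part of the exported module. It slices by prefix length so the match stays case-insensitive.

```python
from typing import Optional

PREFIX = "remember:"


def extract_note(query: str) -> Optional[str]:
    """Return the note text if the query starts with 'remember:', else None."""
    stripped = query.strip()
    if stripped.lower().startswith(PREFIX):
        # Slice instead of split so "Remember:" and "REMEMBER:" also work.
        return stripped[len(PREFIX):].strip()
    return None


note = extract_note("Remember: dc is the Dublin Core prefix")
not_a_note = extract_note("How many cats on Wikidata?")
```

Splitting on the lowercase literal instead would fail for a query written as `Remember: ...`, because the literal never occurs in the original string.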

In [None]:

for q in test_queries:
    print(f"Query:    {q}")
    resp = hello(q)
    # DSPy Prediction objects have .answer and .trajectory attributes
    answer     = getattr(resp, "answer", resp.get("answer", None))
    trajectory = getattr(resp, "trajectory", resp.get("trace", None))
    print(f"Answer:   {answer}")
    print(f"Trajectory: {trajectory}")
    print("-" * 60)



Query:    What is the full IRI of dc:title?
Answer:   http://purl.org/dc/elements/1.1/title
Trajectory: {'thought_0': 'The full IRI for dc:title is "http://purl.org/dc/elements/1.1/title".', 'tool_name_0': 'finish', 'tool_args_0': {}, 'observation_0': 'Completed.'}
------------------------------------------------------------
Query:    How many cats on Wikidata?
Answer:   There isn’t a fixed number—the current count is dynamic. To get the exact number, run the query: SELECT (COUNT(?cat) AS ?count) WHERE { ?cat wdt:P31 wd:Q146 }.
Trajectory: {'thought_0': 'The question asks for the number of cat items on Wikidata. Since Wikidata is continuously updated, a direct number isn’t fixed; instead one would typically determine the count by running a SPARQL query such as:\n  SELECT (COUNT(?cat) AS ?count) WHERE { ?cat wdt:P31 wd:Q146 }\nThis query counts all items that have the instance-of property (P31) pointing to the "cat" entity (Q146). The returned number will reflect the current state of Wi

In [None]:
# HelloLOD
resp = hello("What is the full IRI of dc:title?")
print("TRACE →", resp.trajectory)      # which tool was picked
# check stderr for any cogitarelink.* DEBUG messages



TRACE → {'thought_0': 'The full IRI for dc:title is "http://purl.org/dc/elements/1.1/title".', 'tool_name_0': 'finish', 'tool_args_0': {}, 'observation_0': 'Completed.'}


In [None]:
resp = hello("How many cats on Wikidata?")
print("TRACE →", resp.trajectory)
print("OBSERVATION →", resp.answer)



TRACE → {'thought_0': 'The question asks for the number of cat items on Wikidata. Since Wikidata is continuously updated, a direct number isn’t fixed; instead one would typically determine the count by running a SPARQL query such as:\n  SELECT (COUNT(?cat) AS ?count) WHERE { ?cat wdt:P31 wd:Q146 }\nThis query counts all items that have the instance-of property (P31) pointing to the "cat" entity (Q146). The returned number will reflect the current state of Wikidata and can vary over time.', 'tool_name_0': 'finish', 'tool_args_0': {}, 'observation_0': 'Completed.'}
OBSERVATION → There isn’t a fixed number—the current count is dynamic. To get the exact number, run the query: SELECT (COUNT(?cat) AS ?count) WHERE { ?cat wdt:P31 wd:Q146 }.
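
To see at a glance which tools a run used, the trajectory dictionary can be reduced to its tool names. The helper below is a small sketch; `tools_used` is a hypothetical name, and the sample dictionary is copied from the first trace above.

```python
import re
from typing import Any, Dict, List


def tools_used(trajectory: Dict[str, Any]) -> List[str]:
    """Return tool names in step order from keys like 'tool_name_0'."""
    steps = []
    for key, value in trajectory.items():
        match = re.fullmatch(r"tool_name_(\d+)", key)
        if match:
            steps.append((int(match.group(1)), value))
    return [name for _, name in sorted(steps)]


trajectory = {
    "thought_0": 'The full IRI for dc:title is "http://purl.org/dc/elements/1.1/title".',
    "tool_name_0": "finish",
    "tool_args_0": {},
    "observation_0": "Completed.",
}

# Everything except the terminating 'finish' step is a real tool call.
called = [name for name in tools_used(trajectory) if name != "finish"]
print(called)  # prints []
```

In both traces shown above the only step is `finish`, which means the model answered from its own knowledge without calling any Cogitarelink tool.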


## Conclusion

In this notebook, we've implemented the first piece of a layered approach to semantic web agents using DSPy's `ReAct` module. The key components we've created are:

1. **A system prompt and signature** that steer the model toward the appropriate Cogitarelink layer
2. **The `HelloLOD` agent**, a lightweight wrapper that records telemetry; the `HelloLODWithMemory` and `FullPlanner` variants are not implemented in this notebook
3. **Memory integration** through the `ReflectionStore` tools and the manual `remember:` shortcut
4. **Sample queries** that exercise the agent and expose its trajectory

These components form the foundation of our semantic web agent architecture, which is intended to support reasoning across the different layers of the semantic web stack. The agents can then be integrated into applications to provide semantic web capabilities through natural language interfaces.