# Structured Agents and Pipelines
> Creating DSPy StructuredAgents for Semantic Web tasks

In [None]:
#| default_exp pipelines

In [None]:
#| hide
from nbdev.showdoc import *

In [None]:
#| export
import dspy
from typing import List, Dict, Any, Optional
from cogitarelink_dspy.wrappers import get_tools, get_tool_by_name, group_tools_by_layer
from cogitarelink_dspy.components import list_layers


## Introduction

This notebook implements structured agent pipelines for the Cogitarelink-DSPy integration. We're creating agents that can reason about semantic web data across different layers of abstraction:

1. **Context Layer** - Working with JSON-LD contexts and namespaces
2. **Ontology Layer** - Exploring ontologies and vocabularies
3. **Rules Layer** - Validating data against rules (SHACL, etc.)
4. **Instances Layer** - Working with actual data/triples
5. **Verification Layer** - Verifying and signing data

In addition, we have a **Utility Layer** for cross-cutting concerns like memory and telemetry.

Our approach uses DSPy's `StructuredAgent`, which provides a framework for tool selection and execution based on the user's query. We'll implement three levels of agents:

- `HelloLOD`: A lightweight agent with one essential tool per layer for common tasks
- `HelloLODWithMemory`: `HelloLOD` plus tools for storing and recalling reflections
- `FullPlanner`: A comprehensive agent with all available tools

We'll also integrate memory capabilities to enable the agent to learn from previous experiences.

## System Prompts

The heart of our agent's reasoning is the system prompt, which explains the semantic web layers and how to select the appropriate tool based on the user's query. Let's define the system prompts for our agents.

In [None]:
#| export

# Basic system prompt explaining the semantic web layers
SEMANTIC_WEB_SYSTEM = '''
You are a Semantic Web agent that reasons across these layers:

1. Context - JSON-LD context operations (namespaces, compaction)
2. Ontology - Vocabulary operations (term resolution, class hierarchies)
3. Rules - Validation and inference rules (SHACL shapes)
4. Instances - Data instance operations (SPARQL queries, graph updates)
5. Verification - Cryptographic operations (signing, verifying)

For each user query, identify the HIGHEST appropriate layer to address it.
Example 1: "What does schema:name mean?" → Ontology layer (vocabulary term)
Example 2: "Is this JSON-LD valid?" → Context layer (JSON-LD structure)
Example 3: "Does this person match the requirements?" → Rules layer (validation)
'''

# Enhanced system prompt with memory integration
def get_memory_enhanced_system(reflection_prompt=""):
    """
    Generate a system prompt enhanced with reflections from memory.
    
    Args:
        reflection_prompt (str): Formatted reflections to include in the prompt
        
    Returns:
        str: The enhanced system prompt
    """
    memory_section = ""
    if reflection_prompt and reflection_prompt.strip():
        memory_section = f"""
## Reflections from previous interactions
Consider these lessons from past interactions:
{reflection_prompt}
"""
    
    return SEMANTIC_WEB_SYSTEM + memory_section
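
To make the composition concrete, here is a minimal, self-contained sketch of how the memory section is appended. `BASE_PROMPT` and `compose_prompt` are hypothetical stand-ins for `SEMANTIC_WEB_SYSTEM` and `get_memory_enhanced_system`, and the reflection text is invented for illustration.

```python
# Hypothetical stand-in for SEMANTIC_WEB_SYSTEM so the sketch runs on its own.
BASE_PROMPT = "You are a Semantic Web agent.\n"


def compose_prompt(base, reflection_prompt=""):
    """Append a reflections section only when there is non-blank text."""
    if reflection_prompt and reflection_prompt.strip():
        return (
            base
            + "\n## Reflections from previous interactions\n"
            + "Consider these lessons from past interactions:\n"
            + reflection_prompt
            + "\n"
        )
    return base


# Blank or whitespace-only reflections leave the base prompt unchanged.
unchanged = compose_prompt(BASE_PROMPT, "   ")

# Illustrative reflection text (not taken from a real reflection store).
enhanced = compose_prompt(BASE_PROMPT, "- Prefer FetchOntology for term lookups")
print(enhanced)
```

Because the memory section is only ever appended, an agent built with an empty reflection store receives exactly the base prompt.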

## Available Tool Information

Let's examine what tools are available from our component registry. This helps us understand the capabilities we can provide to our agents.

In [None]:
# Examine available tools and layers (not exported)
tools = get_tools()
print(f"Available tools: {len(tools)}")

layers = list_layers()
print(f"\nAvailable layers: {', '.join(layers)}")

tools_by_layer = group_tools_by_layer(tools)
for layer, layer_tools in tools_by_layer.items():
    print(f"\n{layer} Layer: {len(layer_tools)} tools")
    for tool in layer_tools:
        # Guard against tools without a docstring before splitting it
        print(f"  - {tool.__name__}: {(tool.__doc__ or '').split('[Layer')[0].strip()}")

Available tools: 9

Available layers: Context, Instances, Ontology, Rules, Utility, Verification

Utility Layer: 4 tools
  - EchoMessage: Simply echoes the input message back.
  - AddReflection: Persist a reflection into semantic memory
  - RecallReflection: Retrieve recent reflection notes
  - ReflectionPrompt: Format recent notes for prompt injection

Context Layer: 1 tools
  - LoadContext: Loads and processes JSON-LD contexts.

Ontology Layer: 1 tools
  - FetchOntology: Accesses the vocabulary registry.

Rules Layer: 1 tools
  - ValidateEntity: Validates an Entity against SHACL shapes.

Instances Layer: 1 tools
  - GraphManager: Manages RDF graphs and triples.

Verification Layer: 1 tools
  - VerifySignature: Verifies a digital signature on a named graph.
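
The grouping shown in this output can be pictured with a small sketch. It assumes only that each tool class exposes a `layer` attribute, which the testing helper later in this notebook also relies on. The classes and the `group_by_layer` function below are hypothetical stand-ins, not the actual `group_tools_by_layer` implementation.

```python
from collections import defaultdict


# Hypothetical stand-ins for registry tools; the real classes come from
# cogitarelink_dspy.wrappers and are assumed to carry a `layer` attribute.
class LoadContext:
    """Loads and processes JSON-LD contexts."""
    layer = "Context"


class FetchOntology:
    """Accesses the vocabulary registry."""
    layer = "Ontology"


class EchoMessage:
    """Simply echoes the input message back."""
    layer = "Utility"


class AddReflection:
    """Persist a reflection into semantic memory"""
    layer = "Utility"


def group_by_layer(tools):
    """Bucket tool classes by their `layer` attribute, preserving input order."""
    grouped = defaultdict(list)
    for tool in tools:
        grouped[tool.layer].append(tool)
    return dict(grouped)


grouped = group_by_layer([EchoMessage, AddReflection, LoadContext, FetchOntology])
for layer, layer_tools in grouped.items():
    print(f"{layer}: {[t.__name__ for t in layer_tools]}")
```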


## HelloLOD: Lightweight Semantic Web Agent

Our `HelloLOD` agent is a minimal implementation that provides basic semantic web functionality. It exposes six tools (one per semantic layer plus `EchoMessage`), so the model chooses among fewer options than it does in the full agent.

The key design decisions for HelloLOD are:

1. Include one representative tool from each semantic layer
2. Exclude memory tools initially for simplicity
3. Use a straightforward system prompt without complex reflection

This agent serves as both a proof of concept and a starting point for more complex implementations.

In [None]:
#| export
def build_hellolod(lm=None):
    """
    Create a minimal Linked Open Data agent with basic capabilities.
    
    This agent includes one representative tool from each semantic layer
    but excludes memory tools for simplicity.
    
    Args:
        lm (dspy.LM, optional): Language model to use. If None, must be configured later.
        
    Returns:
        dspy.StructuredAgent: A configured agent ready for use
    """
    # Select one tool from each semantic layer
    selected_tools = [
        get_tool_by_name("LoadContext"),      # Context layer
        get_tool_by_name("FetchOntology"),    # Ontology layer
        get_tool_by_name("ValidateEntity"),   # Rules layer
        get_tool_by_name("GraphManager"),     # Instances layer
        get_tool_by_name("VerifySignature"),  # Verification layer
    ]
    
    # Add EchoMessage as a basic utility
    echo_tool = get_tool_by_name("EchoMessage")
    if echo_tool:
        selected_tools.append(echo_tool)
    
    # Filter out any None values (in case a tool wasn't found)
    selected_tools = [t for t in selected_tools if t is not None]
    
    # Create the structured agent
    agent = dspy.StructuredAgent(
        tools=selected_tools,
        lm=lm,
        system=SEMANTIC_WEB_SYSTEM
    )
    
    return agent

## HelloLODWithMemory: Adding Reflection Capabilities

The next evolution of our agent adds memory capabilities through the ReflectionStore. This allows the agent to:

1. Store reflections about its experiences
2. Recall previous reflections when making decisions
3. Learn from past interactions

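By incorporating memory, the agent can improve over time and avoid repeating mistakes. Note that stored reflections are read once, when the agent is built, and injected into the system prompt; reflections saved afterwards only reach the prompt when the agent is rebuilt.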

In [None]:
#| export
def build_hellolod_with_memory(lm=None, reflection_limit=5):
    """
    Create a Linked Open Data agent with memory capabilities.
    
    This agent extends HelloLOD by adding memory tools for storing and
    retrieving reflections about past experiences.
    
    Args:
        lm (dspy.LM, optional): Language model to use. If None, must be configured later.
        reflection_limit (int): Number of reflections to include in the system prompt.
        
    Returns:
        dspy.StructuredAgent: A configured agent with memory capabilities
    """
    # Start with the basic HelloLOD tools
    basic_tools = [
        get_tool_by_name("LoadContext"),      # Context layer
        get_tool_by_name("FetchOntology"),    # Ontology layer
        get_tool_by_name("ValidateEntity"),   # Rules layer
        get_tool_by_name("GraphManager"),     # Instances layer
        get_tool_by_name("VerifySignature"),  # Verification layer
        get_tool_by_name("EchoMessage"),      # Utility layer
    ]
    
    # Add memory tools
    memory_tools = [
        get_tool_by_name("AddReflection"),     # Store reflections
        get_tool_by_name("RecallReflection"),  # Retrieve reflections
        get_tool_by_name("ReflectionPrompt"),  # Format reflections for prompts
    ]
    
    # Combine and filter tools
    all_tools = basic_tools + memory_tools
    all_tools = [t for t in all_tools if t is not None]
    
    # Fetch stored reflections to seed the system prompt
    reflection_prompt = ""
    try:
        # Try to get reflections from the store
        prompt_tool = get_tool_by_name("ReflectionPrompt")
        if prompt_tool:
            prompt_instance = prompt_tool()
            reflection_prompt = prompt_instance(limit=reflection_limit)
    except Exception as e:
        print(f"Warning: Could not retrieve reflections: {e}")
    
    # Create the agent with memory-enhanced system prompt
    system_prompt = get_memory_enhanced_system(reflection_prompt)
    agent = dspy.StructuredAgent(
        tools=all_tools,
        lm=lm,
        system=system_prompt
    )
    
    return agent
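
The reflection lookup above is deliberately forgiving: a missing tool or a failing store yields an empty prompt section instead of an error. A minimal sketch of that fallback, using hypothetical `WorkingPrompt` and `BrokenPrompt` classes in place of the registry's `ReflectionPrompt` tool, might look like this:

```python
# Hypothetical tool classes; the real ReflectionPrompt comes from the registry.
class WorkingPrompt:
    def __call__(self, limit=5):
        notes = ["Use FetchOntology for term lookups", "Validate before signing"]
        return "\n".join(f"- {note}" for note in notes[:limit])


class BrokenPrompt:
    def __call__(self, limit=5):
        raise RuntimeError("reflection store unavailable")


def safe_reflections(tool_class, limit=5):
    """Return formatted reflections, or an empty string if retrieval fails."""
    if tool_class is None:  # tool not registered
        return ""
    try:
        return tool_class()(limit=limit)
    except Exception as exc:
        print(f"Warning: Could not retrieve reflections: {exc}")
        return ""


print(repr(safe_reflections(WorkingPrompt, limit=1)))
print(repr(safe_reflections(BrokenPrompt)))
```

Pulling the lookup into one helper like this would also remove the duplication between the memory-enabled builders, though that refactoring is a suggestion rather than part of the exported module.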

## FullPlanner: Comprehensive Semantic Web Agent

Our most capable agent, `FullPlanner`, includes all available tools and advanced memory integration. This agent is designed for complex semantic web tasks that require multiple tools and sophisticated reasoning.

Key features of the FullPlanner:

1. Includes all tools from all semantic layers
2. Full memory integration with reflection capabilities
3. Enhanced system prompt that lists the available tools for each layer
4. Any telemetry tools registered in the Utility layer are included automatically

This agent represents the full power of our semantic web framework.

In [None]:
#| export
def build_full_planner(lm=None, reflection_limit=5):
    """
    Create a comprehensive semantic web agent with all available tools.
    
    This agent includes all tools from all layers, along with memory
    integration and enhanced reasoning capabilities.
    
    Args:
        lm (dspy.LM, optional): Language model to use. If None, must be configured later.
        reflection_limit (int): Number of reflections to include in the system prompt.
        
    Returns:
        dspy.StructuredAgent: A fully configured comprehensive agent
    """
    # Get all available tools
    all_tools = get_tools()
    
    # Try to get reflections for the system prompt
    reflection_prompt = ""
    try:
        prompt_tool = get_tool_by_name("ReflectionPrompt")
        if prompt_tool:
            prompt_instance = prompt_tool()
            reflection_prompt = prompt_instance(limit=reflection_limit)
    except Exception as e:
        print(f"Warning: Could not retrieve reflections: {e}")
    
    # Create enhanced system prompt with tool-specific guidance
    layers = list_layers()
    tools_by_layer = group_tools_by_layer(all_tools)
    
    # Build layer-specific guidance
    layer_guidance = ""
    for layer in layers:
        if layer in tools_by_layer and tools_by_layer[layer]:
            layer_tools = tools_by_layer[layer]
            tool_names = ", ".join([t.__name__ for t in layer_tools])
            layer_guidance += f"\n- {layer} Layer: {tool_names}"
    
    # Enhanced system prompt with layer/tool mappings
    extended_system = f"{SEMANTIC_WEB_SYSTEM}\n\nAvailable tools by layer:{layer_guidance}"
    system_prompt = get_memory_enhanced_system(reflection_prompt).replace(SEMANTIC_WEB_SYSTEM, extended_system)
    
    # Create the full agent
    agent = dspy.StructuredAgent(
        tools=all_tools,
        lm=lm,
        system=system_prompt
    )
    
    return agent
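
The layer guidance appended to the system prompt is plain string assembly. The sketch below reproduces that step with hypothetical data (tool names instead of tool classes) to show that layers without tools are skipped and that the layer list determines the order.

```python
# Hypothetical inputs: a fixed layer order and the tool names found per layer.
layers = ["Context", "Instances", "Ontology", "Rules", "Utility", "Verification"]
tools_by_layer = {
    "Context": ["LoadContext"],
    "Ontology": ["FetchOntology"],
    "Utility": ["EchoMessage", "AddReflection"],
    "Rules": [],  # a layer with no tools is skipped
}

# One bullet per layer that actually has tools, in the order given by `layers`.
lines = [
    f"- {layer} Layer: {', '.join(tools_by_layer[layer])}"
    for layer in layers
    if tools_by_layer.get(layer)
]
layer_guidance = "\n".join(lines)
print("Available tools by layer:\n" + layer_guidance)
```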

## Testing Helper Functions

To streamline testing and demonstration, we'll create helper functions that can run queries against our agents and display the results in a structured format.

In [None]:
#| export
def run_test_query(agent, query, save_reflection=False, reflection_tags=None):
    """
    Run a test query through an agent and format the results.
    
    This function runs a query through the specified agent and returns
    the response along with metadata about tool usage and layer selection.
    Optionally saves the interaction as a reflection for future reference.
    
    Args:
        agent (dspy.StructuredAgent): The agent to query
        query (str): The user query to process
        save_reflection (bool): Whether to save the interaction as a reflection
        reflection_tags (list): Tags to apply to the reflection
        
    Returns:
        dict: Response and metadata about the interaction
    """
    # Default tags if none provided
    if reflection_tags is None:
        reflection_tags = ["test"]
    
    # Process the query
    try:
        response = agent(query)
    except Exception as e:
        return {"error": str(e), "query": query}
    
    # Extract metadata
    result = {
        "query": query,
        "response": response.get("response", "No response"),
        "tool_used": response.get("tool", "None"),
        "layer_detected": None,  # Will fill this in
    }
    
    # Determine the layer
    if result["tool_used"] != "None":
        tool_class = get_tool_by_name(result["tool_used"])
        if tool_class:
            result["layer_detected"] = tool_class.layer
    
    # Save reflection if requested
    if save_reflection and result["tool_used"] != "None":
        try:
            add_reflection = get_tool_by_name("AddReflection")
            if add_reflection:
                reflection_tool = add_reflection()
                reflection_text = f"For query '{query}', used {result['tool_used']} from {result['layer_detected']} layer."
                reflection_id = reflection_tool(text=reflection_text, tags=reflection_tags)
                result["reflection_id"] = reflection_id
        except Exception as e:
            result["reflection_error"] = str(e)
    
    return result
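
`run_test_query` reads the agent's reply through `.get("response")` and `.get("tool")`, so any callable that returns a dict-like object can be used to exercise it without a language model. The sketch below mirrors only the metadata-extraction step; `StubAgent`, `TOOL_LAYERS`, and `summarize` are hypothetical names introduced for illustration.

```python
# Hypothetical stub standing in for an agent: it is callable and returns a
# dict with "response" and "tool" keys, the shape that run_test_query reads.
class StubAgent:
    def __call__(self, query):
        if "schema:" in query:
            return {"response": "A vocabulary term lookup.", "tool": "FetchOntology"}
        return {"response": "I am not sure which tool applies."}


# Hypothetical tool-to-layer table replacing the get_tool_by_name lookup.
TOOL_LAYERS = {"FetchOntology": "Ontology", "LoadContext": "Context"}


def summarize(agent, query):
    """Mirror the metadata extraction step using plain dictionaries."""
    response = agent(query)
    tool = response.get("tool", "None")
    return {
        "query": query,
        "response": response.get("response", "No response"),
        "tool_used": tool,
        "layer_detected": TOOL_LAYERS.get(tool),
    }


hit = summarize(StubAgent(), "What does schema:name mean?")
miss = summarize(StubAgent(), "Tell me a joke")
print(hit["tool_used"], hit["layer_detected"])
print(miss["tool_used"], miss["layer_detected"])
```

When no tool is reported, `tool_used` falls back to the string `"None"` and `layer_detected` stays `None`, which is the same convention the exported helper uses.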

## Complete Pipeline Factory

Finally, we'll create a factory function that can produce any of our agent types based on a configuration. This provides a clean interface for applications to obtain the right agent for their needs.

In [None]:
#| export
def create_agent(agent_type="hello", lm=None, reflection_limit=5):
    """
    Factory function to create different types of semantic web agents.
    
    This function creates and returns the specified type of agent,
    handling the configuration details for each variant.
    
    Args:
        agent_type (str): Type of agent to create ('hello', 'memory', or 'full')
        lm (dspy.LM, optional): Language model to use. If None, must be configured later.
        reflection_limit (int): Number of reflections to include for memory-enabled agents.
        
    Returns:
        dspy.StructuredAgent: The configured agent of the requested type
        
    Raises:
        ValueError: If an invalid agent_type is specified
    """
    # Normalize once so every alias comparison is case-insensitive
    kind = agent_type.lower()
    if kind in ("hello", "basic", "hellolod"):
        return build_hellolod(lm)
    elif kind in ("memory", "hellolodwithmemory"):
        return build_hellolod_with_memory(lm, reflection_limit)
    elif kind in ("full", "fullplanner"):
        return build_full_planner(lm, reflection_limit)
    else:
        raise ValueError(f"Unknown agent type: {agent_type}. Use 'hello', 'memory', or 'full'.")

# Legacy name for backward compatibility
make_hello_agent = create_agent
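
The alias handling in the factory can also be expressed as a lookup table. The following is a sketch of that alternative, not the exported implementation: the `build_*` functions here are hypothetical stubs that return labels so the dispatch can be observed without a language model.

```python
# Hypothetical builders standing in for the real build_* functions.
def build_basic(lm=None, reflection_limit=5):
    return "HelloLOD"


def build_memory(lm=None, reflection_limit=5):
    return f"HelloLODWithMemory(limit={reflection_limit})"


def build_full(lm=None, reflection_limit=5):
    return f"FullPlanner(limit={reflection_limit})"


# Every accepted alias maps to exactly one builder.
BUILDERS = {
    "hello": build_basic, "basic": build_basic, "hellolod": build_basic,
    "memory": build_memory, "hellolodwithmemory": build_memory,
    "full": build_full, "fullplanner": build_full,
}


def create(agent_type="hello", lm=None, reflection_limit=5):
    """Case-insensitive lookup that raises ValueError for unknown names."""
    try:
        builder = BUILDERS[agent_type.lower()]
    except KeyError:
        raise ValueError(f"Unknown agent type: {agent_type}") from None
    return builder(lm, reflection_limit)


print(create("FullPlanner", reflection_limit=3))
```

A table keeps the alias list in one place, at the cost of requiring every builder to accept the same arguments.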

## Example Usage

Let's demonstrate how to use these agents with some example queries. Note that you'll need to configure a language model before running these examples.

In [None]:
#| hide
# This cell would set up a language model for testing
# It's hidden from export but useful for notebook testing
try:
    # Try to set up a basic LM for testing
    # This would typically use anthropic.Anthropic or similar
    # Placeholder for actual implementation
    # lm = dspy.OpenAI(model="gpt-3.5-turbo")
    pass
except Exception:
    pass

In [None]:
#| hide
# Example to run when an LM is configured
'''
# Create a basic agent
agent = create_agent("hello", lm)

# Run some test queries
test_queries = [
    "What does the JSON-LD context do?",
    "How do I look up the definition of schema:Person?",
    "Can you validate this SHACL shape?",
    "How do I query for all triples about a person?",
    "How can I verify this signed document?"
]

for query in test_queries:
    result = run_test_query(agent, query)
    print(f"Query: {query}")
    print(f"Layer: {result['layer_detected']}")
    print(f"Tool: {result['tool_used']}")
    print(f"Response: {result['response'][:100]}...")
    print("-" * 80)
'''

## Conclusion

In this notebook, we've implemented a layered approach to semantic web agents using DSPy's structured agent framework. The key components we've created are:

1. **System prompts** that explain the semantic web layers and guide tool selection
2. **Agent implementations** at different capability levels (HelloLOD, HelloLODWithMemory, FullPlanner)
3. **Memory integration** to learn from past interactions
4. **Testing utilities** to validate agent behavior

These components form the foundation of our semantic web agent architecture: each agent maps a natural-language query to a semantic layer and to a tool in that layer. Applications can obtain the variant they need through `create_agent` and expose semantic web capabilities through a natural language interface.