# cogitarelink-dspy

> A DSPy-based agent for Semantic Web knowledge navigation and reasoning

This repository implements a DSPy-based agent that integrates with the Cogitarelink framework for Semantic Web data and focuses on intelligent navigation of Linked Data resources. The agent is designed to understand and operate within a layered Semantic Web architecture (described under Research Overview), selecting the appropriate tools for each user query.

## Developer Guide

If you are new to `nbdev`, here are some useful pointers to get you started.

### Install cogitarelink_dspy in Development mode

```sh
# make sure cogitarelink_dspy package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to cogitarelink_dspy
$ nbdev_prepare
```

## Usage

### Installation

Install the latest version from the GitHub [repository][repo]:

```sh
$ pip install git+https://github.com/LA3D/cogitarelink-dspy.git
```

or from [conda][conda]

```sh
$ conda install -c LA3D cogitarelink_dspy
```

or from [pypi][pypi]


```sh
$ pip install cogitarelink_dspy
```


[repo]: https://github.com/LA3D/cogitarelink-dspy
[docs]: https://LA3D.github.io/cogitarelink-dspy/
[pypi]: https://pypi.org/project/cogitarelink-dspy/
[conda]: https://anaconda.org/LA3D/cogitarelink-dspy

### Documentation

Documentation is hosted on this GitHub [repository][repo]'s [pages][docs]. Package-manager-specific guidelines are available on [conda][conda] and [pypi][pypi].

## Research Overview

This project explores how Large Language Models (LLMs) can be combined with Semantic Web technologies to create agents that effectively navigate and reason over Linked Data. Our approach focuses on several key research areas:

### 1. Semantic Layer Architecture

We implement a layered architecture for Semantic Web navigation, comprising five layers:

1. **Context Layer** - Working with JSON-LD contexts and namespaces
2. **Ontology Layer** - Accessing and interpreting vocabularies and ontology terms
3. **Rules Layer** - Applying validation rules and shapes (SHACL)
4. **Instances Layer** - Managing actual data instances and entity graphs
5. **Verification Layer** - Verifying and signing graphs for provenance

The agent is designed to select the highest appropriate layer for each user request, enabling efficient access to knowledge at various levels of abstraction.
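
To make the idea of "selecting the highest appropriate layer" concrete, here is a minimal sketch. It assumes that the first layer in the list above counts as the highest, and it replaces the agent's LLM-based choice with a plain keyword lookup. The names `Layer`, `LAYER_HINTS`, and `select_layer` are hypothetical and are not part of the package.

```python
import re
from enum import IntEnum
from typing import Optional


class Layer(IntEnum):
    """Layers in the order listed above; 1 is treated as highest (an assumption)."""
    CONTEXT = 1
    ONTOLOGY = 2
    RULES = 3
    INSTANCES = 4
    VERIFICATION = 5


# Hypothetical keyword hints per layer; the real agent relies on an LLM instead.
LAYER_HINTS = {
    Layer.CONTEXT: {"context", "namespace", "prefix"},
    Layer.ONTOLOGY: {"ontology", "vocabulary", "class", "property"},
    Layer.RULES: {"shacl", "shape", "validate"},
    Layer.INSTANCES: {"entity", "instance", "triple"},
    Layer.VERIFICATION: {"sign", "verify", "provenance"},
}


def select_layer(query: str) -> Optional[Layer]:
    """Return the highest layer whose hints appear in the query, or None."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    matches = [layer for layer, hints in LAYER_HINTS.items() if words & hints]
    # The smallest enum value is the highest layer under the assumption above.
    return min(matches) if matches else None


choice = select_layer("Validate this entity against a SHACL shape")
print(choice.name)  # RULES
```

The query mentions both a shape (Rules) and an entity (Instances); the sketch resolves the tie in favour of the higher layer.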

### 2. Semantic Memory and Reflection

The agent maintains a semantic memory store implemented as a knowledge graph with full provenance. This allows the agent to:

- Record "lessons learned" as reflection notes
- Store those reflections as properly typed JSON-LD entities
- Retrieve and use past experiences to guide future reasoning
- Track errors and validation failures to avoid repetition
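
As an illustration of what a "properly typed JSON-LD entity" for a reflection note might look like, the sketch below builds one as a plain dictionary. The choice of `schema:Comment` as the type, the property names, and the helper `make_reflection_note` are assumptions made for this example; the vocabulary actually used by the memory module may differ.

```python
import json
import uuid
from datetime import datetime, timezone


def make_reflection_note(text, tags=()):
    """Build a reflection note as a JSON-LD document (hypothetical shape)."""
    return {
        "@context": {
            "schema": "https://schema.org/",
            "prov": "http://www.w3.org/ns/prov#",
        },
        "@id": f"urn:uuid:{uuid.uuid4()}",  # fresh identifier for each note
        "@type": "schema:Comment",  # assumed type, chosen for illustration
        "schema:text": text,
        "schema:keywords": list(tags),
        # Timestamp recorded as simple provenance metadata.
        "prov:generatedAtTime": datetime.now(timezone.utc).isoformat(),
    }


note = make_reflection_note(
    "When querying Wikidata, use wdt: prefix for direct properties",
    tags=["wikidata", "sparql"],
)
print(json.dumps(note, indent=2))
```

Because the note is ordinary JSON-LD, it can be stored in the same graph as other entities and queried alongside them.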

### 3. Tool Generation Architecture

A key innovation is our component registry pattern:

- All available tools are defined in a central catalog (`COMPONENTS` in `components.py`)
- DSPy tool wrappers are automatically generated from this registry
- Each tool is tagged with its appropriate semantic layer
- This enables a clean separation between tool definitions and their implementation
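
The registry pattern described above can be sketched in a few lines. This is not the package's actual code: the catalog `DEMO_COMPONENTS`, its field names (`layer`, `doc`, `fn`), and the helpers `make_wrapper`, `build_tools`, and `tools_in_layer` are hypothetical, and plain Python functions stand in for the generated DSPy wrappers.

```python
# Hypothetical catalog: each entry describes one tool and the layer it belongs to.
DEMO_COMPONENTS = {
    "EchoContext": {
        "layer": "Context",
        "doc": "Return the given context string unchanged.",
        "fn": lambda text: text,
    },
    "UpperCaseTerm": {
        "layer": "Ontology",
        "doc": "Return an ontology term in upper case.",
        "fn": lambda term: term.upper(),
    },
}


def make_wrapper(name, spec):
    """Create a callable for one catalog entry, tagged with its layer."""
    def tool(*args, **kwargs):
        return spec["fn"](*args, **kwargs)

    tool.__name__ = name
    tool.__doc__ = f"{spec['doc']} [Layer: {spec['layer']}]"
    tool.layer = spec["layer"]
    return tool


def build_tools(catalog):
    """Generate one wrapper per catalog entry."""
    return {name: make_wrapper(name, spec) for name, spec in catalog.items()}


def tools_in_layer(tools, layer):
    """Filter generated wrappers by their layer tag."""
    return {name: tool for name, tool in tools.items() if tool.layer == layer}


tools = build_tools(DEMO_COMPONENTS)
print(tools["UpperCaseTerm"]("person"))       # PERSON
print(list(tools_in_layer(tools, "Context")))  # ['EchoContext']
```

Adding a tool in this scheme means adding one catalog entry; the wrapper and its layer tag follow from the entry itself.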

### 4. Real-World Linked Data Integration

The agent can interact with:

- Wikidata via SPARQL
- Schema.org JSON-LD collections
- Public SHACL validation shapes
- Custom knowledge graphs with full provenance tracking
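
For the Wikidata case, the following sketch shows how a SPARQL request to the public Wikidata Query Service could be assembled with the standard library alone. It does not use the agent's own tools; the helper names `build_query_url` and `run_query` and the User-Agent string are made up for this example. Only the URL is built here; `run_query` needs network access and is defined but not called.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Three items that are an instance of (P31) human (Q5).
HUMANS_QUERY = """
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q5 .
}
LIMIT 3
"""


def build_query_url(query, endpoint=WIKIDATA_SPARQL_ENDPOINT):
    """Encode a SPARQL query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query.strip(), "format": "json"})
    return f"{endpoint}?{params}"


def run_query(query, user_agent="cogitarelink-dspy-readme-example/0.1"):
    """Send the query and return the result bindings (requires network access)."""
    request = urllib.request.Request(
        build_query_url(query),
        headers={
            "User-Agent": user_agent,  # placeholder value for this sketch
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["results"]["bindings"]


url = build_query_url(HUMANS_QUERY)
print(url)
```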

## Code Examples

```python
# Initialize a basic Cogitarelink DSPy agent
from cogitarelink_dspy.core import make_hello_agent

# Create an agent with default LLM configuration
agent = make_hello_agent()

# Process a query that will be routed to the appropriate semantic layer
result = agent("What ontology terms are available for describing a Person?")
print(f"Layer used: {result['layer_used']}")
print(f"Response: {result['llm_response']}")
```

## Project Structure

The project follows Jeremy Howard's literate programming approach with nbdev:

- **Notebooks First**: All code is developed in notebooks under `nbs/` 
- **Auto-Export**: Python modules are auto-generated using `nbdev_export`
- **Component Registry**: Central tool definitions in `components.py`
- **Memory**: JSON-LD based semantic memory system in `memory.py`
- **Telemetry**: Knowledge graph-based telemetry in `telemetry.py`

## Research Goals

Our ultimate goals with this project are to:

1. Create an agent that can effectively operate over the entire Semantic Web stack
2. Demonstrate how LLMs can be guided by semantic layer understanding
3. Enable mixed-initiative interactions between users and Linked Data resources
4. Build a foundation for verifiable, provenance-tracked knowledge systems

## Getting Started

To start working with the Cogitarelink-DSPy agent, you'll need to set up your environment:

```python
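# Install dependencies (the leading "!" is a notebook shell escape;
# from a terminal, run `pip install dspy cogitarelink` instead)
!pip install dspy cogitarelink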

# Import the core modules
import dspy
from cogitarelink_dspy.core import make_hello_agent
from cogitarelink_dspy.components import COMPONENTS, get_tools_by_layer
from cogitarelink_dspy.wrappers import get_tools

# Configure DSPy with your preferred LLM
# For OpenAI models:
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Create an agent
agent = make_hello_agent()

# Run a simple test query
result = agent("What are the core components of Cogitarelink?")
print(result["llm_response"])
```

### Working with Semantic Layers

To list the tools tagged with a specific semantic layer, use `get_tools_by_layer` (imported above):

```python
# Get all tools for the Context layer
context_tools = get_tools_by_layer("Context")
print(f"Available Context tools: {list(context_tools.keys())}")

# Get all tools for the Ontology layer
ontology_tools = get_tools_by_layer("Ontology")
print(f"Available Ontology tools: {list(ontology_tools.keys())}")
```

### Using the Memory System

The agent can record reflections and learn from experience:

```python
from cogitarelink.core.graph import GraphManager
from cogitarelink_dspy.memory import ReflectionStore

# Initialize a graph manager and reflection store
graph = GraphManager(use_rdflib=True)
memory = ReflectionStore(graph)

# Add a reflection note
memory.add("When querying Wikidata, use wdt: prefix for direct properties",
           tags=["wikidata", "sparql"])

# Retrieve recent reflections
notes = memory.retrieve(limit=5)
for note in notes:
    print(f"- {note.content['text']}")

# Use reflections in system prompt
reflection_prompt = memory.as_prompt()
print(reflection_prompt)
```