# Understanding how to build Agents using the Solve It approach

In [None]:
# !pip install cosette

In [None]:
from cosette import *
from fastcore.utils import *
import fastcore.all as fc, re, math, itertools, functools, numpy as np, types, typing, dataclasses, matplotlib.pyplot as plt, collections, regex
from regex import search
from collections import Counter
from collections.abc import Iterable
from pydantic import BaseModel, Field, field_validator
import uuid
from typing import Literal, Union, List, Dict, Any
from toolslm.funccall import get_schema, mk_ns, call_func
from rdflib.plugins.parsers.notation3 import BadSyntax
from rdflib import Graph, URIRef
from pyshacl import validate
import json
import httpx

**AI Prompt**

Hi solveit! I am Chuck, and I am working with you today! I know some Python, I'm experienced with semantic web technologies like OWL, RDF, and SPARQL, and I enjoy learning new things. When writing code for me, please follow fast.ai coding style: avoid comments; place single line if/loop/etc constructs on the same line; use concise variable names where appropriate. Follow the style you will see in my messages, as we work on this dialogue together. Are you ready?

**AI Response**

Hi Chuck! Great to meet you! I understand you're experienced with Python and semantic web technologies, and I'll follow the fast.ai coding style you described. 

Before we dive in, could you tell me what specific aspect of building agents you'd like to explore first? Given your background with semantic technologies, we could approach this from several angles - perhaps looking at how to structure agent knowledge, handle tool interactions, or implement reasoning loops.

What interests you most about agents?

## Context Resources
The following resources have been converted to markdown using [web2md](https://web2md.answer.ai/):
- [Anthropic's "Building Effective Agents"](https://www.anthropic.com/research/building-effective-agents)
- [Topics from Anthropic's Prompt Engineering Guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview)
- [Cosette is Claudette’s sister, a helper for OpenAI GPT Documentation](https://answerdotai.github.io/cosette/)

## Agentic Patterns
Very broadly, Andrew Ng [defines agentic patterns](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/) as:

- Reflection: The LLM examines its own work to come up with ways to improve it. 
- Tool Use: The LLM is given tools such as web search, code execution, or any other function to help it gather information, take action, or process data.
- Planning: The LLM comes up with, and executes, a multistep plan to achieve a goal (for example, writing an outline for an essay, then doing online research, then writing a draft, and so on).
- Multi-agent collaboration: More than one AI agent work together, splitting up tasks and discussing and debating ideas, to come up with better solutions than a single agent would. This builds on the idea of Roles in Anthropic's Prompting Guide.

Anthropic's ["Building Effective Agents"](https://www.anthropic.com/research/building-effective-agents) article provides more concrete advice on building agents.
Our goal is to discover how to build agents using the Solve It approach, leveraging Cosette's capabilities along the way.

Let's take an exploratory approach similar to Jeremy Howard's style in building Cosette: start simple and build understanding through experimentation. We want to use exploratory and literate programming to build agent-like systems, building on Cosette's conversation handling, tool use, and tool loops to provide additional agentic behaviors.

For the Solve It approach to agents, we should begin by creating a minimal working example that we can iterate on. Would you like to:

1. Start by creating a basic agent class that we can extend with capabilities, examining each component as we add it
2. Begin with a specific use case (like a research or coding assistant) and evolve our understanding through practical implementation
3. Focus on the interaction patterns first (like the tool loop pattern from Cosette) and build our agent architecture around that

Cosette already provides the basic conversation and model interaction capabilities.

Instead of reinventing these components, we should focus on what makes an agent different from a basic chat interface. According to Anthropic's "Building Effective Agents" article, the key differences are:

1. Agents dynamically direct their own processes
2. They maintain control over how they accomplish tasks
3. They can plan and execute multi-step operations
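
The three properties above can be illustrated with a minimal sketch of a tool loop. This is not Cosette's implementation: `run_agent`, `scripted_policy`, and the two toy tools are hypothetical names, and the scripted policy stands in for the decisions an LLM would make at each step.

```python
from typing import Callable, Dict, List, Optional, Tuple

def run_agent(policy: Callable[[List[dict]], Optional[Tuple[str, dict]]],
              tools: Dict[str, Callable], max_steps: int = 5) -> List[dict]:
    "Call the tools chosen by `policy` until it returns None or `max_steps` is reached"
    history = []
    for _ in range(max_steps):
        step = policy(history)
        if step is None: break
        name, args = step
        history.append({'tool': name, 'args': args, 'result': tools[name](**args)})
    return history

# Hypothetical stand-ins: a fixed field table and a scripted policy in place of an LLM
fields = {'Sex': 'sc:Text', 'Age': 'sc:Float'}
tools = {'list_fields': lambda: list(fields),
         'check_field': lambda field_name: fields.get(field_name)}

def scripted_policy(history):
    "List the fields first, then check each field that has not been checked yet"
    if not history: return ('list_fields', {})
    seen = {h['args'].get('field_name') for h in history}
    todo = [f for f in history[0]['result'] if f not in seen]
    return ('check_field', {'field_name': todo[0]}) if todo else None

trace = run_agent(scripted_policy, tools)
for h in trace: print(h['tool'], h['args'], '->', h['result'])
```

The point of the sketch is that the caller fixes only the tools and a step budget; the order and number of tool calls come from the policy, which is where an LLM would sit.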

We're trying to build an agentic system that can:

1. Autonomously explore and understand dataset semantics
2. Leverage semantic web standards (RDF, JSON-LD, Wikidata concepts)
3. Suggest meaningful semantic enhancements to make datasets more FAIR (Findable, Accessible, Interoperable, Reusable)

Using the Titanic dataset as our example, we're following Jeremy Howard's iterative approach:
1. Start simple
2. Test and observe
3. Identify what's missing
4. Improve incrementally

So far we've learned that:
- Basic metadata exploration works
- But true semantic understanding requires more sophisticated reasoning
- The agent needs better tools for semantic concept discovery

## Solveit discovery
The solveit process revealed several interesting insights about building agentic systems:

1. **Iterative Development with LLM Assistance**:
   - The LLM was able to suggest code improvements
   - It identified gaps in semantic understanding
   - It adapted its suggestions based on observed results
   - This suggests LLMs can be effective "pair programmers" for agent development

2. **Emergent Exploration Pattern**:
   - The LLM naturally moved from simple listing to deeper semantic analysis
   - It suggested improvements to its own code
   - It identified when approaches weren't working
   - This hints at meta-learning capabilities we could leverage

3. **Areas for Improvement**:
   - The prompting could be more focused on reasoning about semantic relationships
   - We could better leverage the LLM's ability to understand context
   - Tool definitions could be more flexible and self-describing
   - The agent could build up knowledge more systematically

Would you like to explore any of these aspects further, particularly how we might improve the prompting to create more sophisticated semantic exploration?

## Example Data from the Titanic Dataset

The Titanic dataset is available from https://www.kaggle.com/datasets/yasserh/titanic-dataset. It is a classic dataset used for machine learning tutorials. It contains information about the passengers on the Titanic, including their age, sex, class, and whether they survived the sinking.

The Croissant spec details are available at:
- [Spec Markdown](https://raw.githubusercontent.com/mlcommons/croissant/refs/heads/main/docs/croissant-spec.md)
- [Formal Definitions in RDF Turtle](https://raw.githubusercontent.com/mlcommons/croissant/refs/heads/main/docs/croissant.ttl)

In [None]:
tcurl="https://storage.googleapis.com/kaggle-data-sets/1818188/2965537/croissant/metadata.json?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1738860546&Signature=sv%2FWTla20YEMWRFThwdkuxxbtk4WKwobqg3eysAizumIHdM6KsxzaDPX2pFp0WA3Fnz3KTQdgxVozOytxl0UCS2hPjDQqbWq2CaWZJ5dWGH%2F1vDoivVvjLefmcPHqKt3%2F1gG5BviIGuT5ZKGBGu2bq9s%2FSB%2Ble4RBKWkO%2BXuocxXql4yqC%2Bl9boYOB8FZFR%2FtPY85kfr7DYP4D5mFo8cg7qe7oFpsqkGKuoKddEWQzq6l%2FQMVzp18s6oVhCItbG4m8FrO65niOrPWrHbrSWQNLQ8wzGL93pcEjK1WAOchxWS%2F4%2FfdlBq4Isf1x57ghz8k9wzDzTvz3sb6KCRIWdTcw%3D%3D&response-content-disposition=attachment%3B+filename%3Dtitanic-dataset-metadata.json"

In [None]:
def get_croissant(url, fname='titanic_croissant.json'):
    "Download Croissant metadata and save to file"
    try:
        r = httpx.get(url)
        r.raise_for_status()
        with open(fname, 'wb') as f: f.write(r.content)
        return fname
    except Exception as e: print(f'Download failed: {e}')

fname = get_croissant(tcurl)
if fname:
    with open(fname) as f: data = json.load(f)
    print(f'Successfully loaded {len(data)} metadata fields')

Successfully loaded 19 metadata fields
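
To see how fields are laid out inside the loaded metadata, here is a small sketch that indexes them by name. The `sample` dict is a hand-written, heavily trimmed stand-in for `data` (the record set name `passengers` is an assumption), and `field_index` is a hypothetical helper.

```python
# Hypothetical, heavily trimmed stand-in for the downloaded Croissant metadata
sample = {
    '@context': {'sc': 'https://schema.org/', 'wd': 'https://www.wikidata.org/wiki/'},
    'recordSet': [{'name': 'passengers', 'field': [
        {'name': 'Sex', 'description': 'Gender', 'dataType': ['sc:Text']},
        {'name': 'Age', 'description': 'Age in Years', 'dataType': ['sc:Float']}]}]}

def field_index(meta):
    "Map each field name to its definition across all record sets"
    return {f['name']: f for rs in meta.get('recordSet', []) for f in rs.get('field', [])}

idx = field_index(sample)
print({k: v['dataType'] for k, v in idx.items()})
```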


In [None]:
models
models[2]

'gpt-4o'

#### Using data types from other vocabularies

Croissant datasets can use data types from other vocabularies, such as Wikidata. These may be supported by the tools consuming the data, but don’t need to. For example:

<table>
  <thead>
    <th>dataType</th>
    <th>Usage</th>
  </thead>
  <tr>
    <td>
      <a href="https://www.wikidata.org/wiki/Q48277">wd:Q48277</a><br>
      (gender)
    </td>
    <td>Describes a Field or a RecordSet whose values are indicative of someone’s gender. This could be used for instance by RAI frameworks and tools to flag possible biases in the data. Values for this RecordSet can be associated with specific gender URLs (eg: <a href="https://www.wikidata.org/wiki/Q6581097">wd:Q6581097</a>, <a href="https://www.wikidata.org/wiki/Q6581072">wd:Q6581072</a>, etc.). Refer to the "Typed RecordSets > Enumerations" section for an example.</td>
  </tr>
</table>

In the following example, `color_sample` is a field containing an image, but with no associated machine-readable semantics.

```json
{
  "@id": "images/color_sample",
  "@type": "cr:Field",
  "dataType": "sc:ImageObject"
}
```

In the following example, the `url` field is expected to be a URL, whose semantic type is [City](https://www.wikidata.org/wiki/Q515), so one will expect values of this field to be URLs referring to cities (e.g.: "<https://www.wikidata.org/wiki/Q90>").

```json
{
  "@id": "cities/url",
  "@type": "cr:Field",
  "dataType": ["https://schema.org/URL", "https://www.wikidata.org/wiki/Q515"]
}
```
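
Applied to the Titanic data, the same idea would mean adding a Wikidata type next to the existing schema.org type of a field. The sketch below assumes a field dict shaped like the examples above; `add_semantic_type` is a hypothetical helper, not part of any Croissant library.

```python
import copy

def add_semantic_type(field, sem_type):
    "Return a copy of `field` with `sem_type` appended to its dataType list"
    out = copy.deepcopy(field)
    dt = out.get('dataType', [])
    if isinstance(dt, str): dt = [dt]  # a single type may be given as a bare string
    if sem_type not in dt: dt.append(sem_type)
    out['dataType'] = dt
    return out

sex = {'@type': 'cr:Field', 'name': 'Sex', 'dataType': 'sc:Text'}
enhanced = add_semantic_type(sex, 'wd:Q48277')
print(enhanced['dataType'])
```

Returning a copy keeps the original metadata untouched, so suggested enhancements can be reviewed before they are written back.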

In [None]:
class SemanticAgent:
    "Agent that autonomously explores and enhances dataset semantics"
    def __init__(self, data, model=models[2]):
        self.data = data
        self.fields = {f['name']:f for rs in data.get('recordSet',[]) 
                      for f in rs.get('field',[])}
        self.findings = []
        self.semantic_context = self.data.get('@context',{})
        self.chat = Chat(model, tools=[
            self.list_fields,
            self.check_field,
            self.suggest_semantics,
            self.fetch_wikidata,
            self.get_namespace            
        ])
    
    def list_fields(self):
        "List all fields and their basic types"
        return {k: f.get('dataType',[]) for k,f in self.fields.items()}
    
    def check_field(self, field_name:str):
        "Get detailed information about a field"
        return self.fields.get(field_name, {})
    
    def suggest_semantics(self, field_name:str):
        "Suggest semantic enhancements for a field"
        field = self.fields.get(field_name,{})
        return {
            'description': field.get('description',''),
            'types': field.get('dataType',[]),
            'context': self.semantic_context
        }

    def get_namespace(self, prefix:str) -> str:
        "Get full URI for a namespace prefix"
        return self.semantic_context.get(prefix,'')
    
    def fetch_wikidata(self, concept:str) -> Dict:
        "Fetch information about a Wikidata concept"
        # Accepts a wd: CURIE or a bare QID
        if concept.startswith('wd:'): 
            concept = concept.split(':')[1]
        elif not concept.startswith('Q'): 
            return {'error':'Not a Wikidata ID or CURIE'}
        # Entity JSON is served from Special:EntityData, not from the wiki page URL
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{concept}.json"
        try: return httpx.get(url, follow_redirects=True).json()
        except Exception: return {'error':f'Failed to fetch {url}'}
        
    def explore(self):
        "Start semantic exploration"
        prompt = """You are a semantic data explorer. Your goal is to:
        1. Examine the available fields
        2. For each field, suggest semantic enhancements using:
           - schema.org types (sc:)
           - wikidata concepts (wd:)
           - MLCommons Croissant vocab (cr:)
        
        Start by listing the fields and then examine each one.
        """
        return self.chat.toolloop(prompt, trace_func=print)

agent = SemanticAgent(data)
agent.explore()

ChatCompletion(id='chatcmpl-Ax0L0CYgDsH0pOUwGCjtaYKdtDyLr', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_CzJZCUwhyvQc7F5g5WwZfVI9', function=Function(arguments='{}', name='list_fields'), type='function')]))], created=1738626014, model='gpt-4o-2024-08-06', object='chat.completion', service_tier='default', system_fingerprint='fp_50cad350e4', usage=In: 219; Out: 11; Total: 230)
ChatCompletion(id='chatcmpl-Ax0L0mnDqTJ2HNBTlFLEz7wuZHeZI', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_6qepuaDVvLMY8oRWcx2JTS5u', function=Function(arguments='{"field_name": "PassengerId"}', name='suggest_semantics'), type='function'), ChatCompletionMessag

Here are the fields with suggested semantic enhancements using schema.org types, wikidata concepts, and MLCommons Croissant vocabulary:

1. **PassengerId**:
   - Description: Passenger ID
   - Suggested Type: `sc:Text`

2. **Survived**:
   - Description: Whether survived or not (0 = No, 1 = Yes)
   - Suggested Type: `sc:Integer`

3. **Pclass**:
   - Description: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
   - Suggested Type: `sc:Integer`

4. **Name**:
   - Description: Name of the Passenger
   - Suggested Type: `sc:Text`

5. **Sex**:
   - Description: Gender
   - Suggested Type: `sc:Text`

6. **Age**:
   - Description: Age in Years
   - Suggested Type: `sc:Float`

7. **SibSp**:
   - Description: Number of siblings or spouses aboard the Titanic
   - Suggested Type: `sc:Integer`

8. **Parch**:
   - Description: Number of parents or children aboard the Titanic
   - Suggested Type: `sc:Integer`

9. **Ticket**:
   - Description: Ticket number
   - Suggested Type: `sc:Text`

10. **Fare**:
    - Description: Passenger fare
    - Suggested Type: `sc:Float`

11. **Cabin**:
    - Description: Cabin number
    - Suggested Type: `sc:Text`

12. **Embarked**:
    - Description: Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
    - Suggested Type: `sc:Text`

Each field also includes context from MLCommons Croissant vocabulary, which aids in providing metadata for better dataset usage and management.


**AI Prompt**

Ok. Is this working as we expected? I'm not sure that semantic functions like get_namespace and fetch_wikidata actually work.

**AI Response**

You make a good observation. Let's test these functions individually to see if they're working as expected. Should we:

1. Test `get_namespace` to see if it's correctly retrieving namespace URIs from the context
2. Try `fetch_wikidata` with a known Wikidata concept
3. Check if `suggest_semantics` is providing meaningful semantic enhancements beyond just listing the basic types

Which would you like to try first? We can then improve the functions based on what we find.

**AI Prompt**

Yes. Let's test these functions.

**AI Response**

Let's start with testing `get_namespace` first since it's the simplest. We can check what namespaces are available in our context:

```python
print("Available context:")
for k,v in agent.semantic_context.items(): print(f"{k}: {v}")

print("\nTesting get_namespace:")
print(agent.get_namespace('sc'))  # Should get schema.org
print(agent.get_namespace('wd'))  # Should get wikidata
```

After we see the results of this, we can test the next function. Would you like to see what happens?

In [None]:
print("Available context:")
for k,v in agent.semantic_context.items(): print(f"{k}: {v}")

print("\nTesting get_namespace:")
print(agent.get_namespace('sc'))  # Should get schema.org
print(agent.get_namespace('wd'))  # Should get wikidata

Available context:
@language: en
@vocab: https://schema.org/
citeAs: cr:citeAs
column: cr:column
conformsTo: dct:conformsTo
cr: http://mlcommons.org/croissant/
data: {'@id': 'cr:data', '@type': '@json'}
dataBiases: cr:dataBiases
dataCollection: cr:dataCollection
dataType: {'@id': 'cr:dataType', '@type': '@vocab'}
dct: http://purl.org/dc/terms/
extract: cr:extract
field: cr:field
fileProperty: cr:fileProperty
fileObject: cr:fileObject
fileSet: cr:fileSet
format: cr:format
includes: cr:includes
isEnumeration: cr:isEnumeration
jsonPath: cr:jsonPath
key: cr:key
md5: cr:md5
parentField: cr:parentField
path: cr:path
personalSensitiveInformation: cr:personalSensitiveInformation
recordSet: cr:recordSet
references: cr:references
regex: cr:regex
repeated: cr:repeated
replace: cr:replace
sc: https://schema.org/
separator: cr:separator
source: cr:source
subField: cr:subField
transform: cr:transform
wd: https://www.wikidata.org/wiki/

Testing get_namespace:
https://schema.org/
https://www.wikidata.
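
Since the context maps prefixes to base URIs, a prefix lookup can be turned into full CURIE expansion. This is a simplified sketch, not full JSON-LD expansion: `expand_curie` is a hypothetical helper, and the `ctx` dict copies only three entries from the context printed above.

```python
def expand_curie(curie, ctx):
    "Expand `prefix:local` to a full URI when `prefix` maps to a string in `ctx`"
    prefix, sep, local = curie.partition(':')
    base = ctx.get(prefix)
    return base + local if sep and isinstance(base, str) else curie

ctx = {'sc': 'https://schema.org/', 'wd': 'https://www.wikidata.org/wiki/',
       'dataType': {'@id': 'cr:dataType', '@type': '@vocab'}}
print(expand_curie('wd:Q48277', ctx))
print(expand_curie('sc:Text', ctx))
```

Values that are not known prefixes, including full URLs, are returned unchanged.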

In [None]:
def test_wikidata(concept:str) -> Dict:
    "Test function for fetching Wikidata concept"
    api_url = "https://www.wikidata.org/w/api.php"
    params = {
        'action': 'wbgetentities',
        'ids': concept.replace('wd:',''),
        'format': 'json',
        'languages': 'en'
    }
    try:
        r = httpx.get(api_url, params=params)
        r.raise_for_status()
        return r.json()
    except Exception as e:
        return {'error':f'API request failed: {str(e)}'}

# Let's try it with our gender concept
print(test_wikidata('Q48277'))

{'entities': {'Q48277': {'pageid': 50269, 'ns': 0, 'title': 'Q48277', 'lastrevid': 2305528428, 'modified': '2025-02-02T01:58:13Z', 'type': 'item', 'id': 'Q48277', 'labels': {'en': {'language': 'en', 'value': 'gender'}}, 'descriptions': {'en': {'language': 'en', 'value': 'social concept which distinguish the different gender categories'}}, 'aliases': {}, 'claims': {'P910': [{'mainsnak': {'snaktype': 'value', 'property': 'P910', 'hash': 'cbeff98df33266793a95bd4e48dfc5106c318be1', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 6916101, 'id': 'Q6916101'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'Q48277$0C13078D-4136-4780-8696-307D9C7FBA58', 'rank': 'normal', 'references': [{'hash': '04d794b59797725524c5ae5f59e6f600dc32c7a5', 'snaks': {'P3452': [{'snaktype': 'value', 'property': 'P3452', 'hash': '2c9f9d858ae9c4ab12fc3ded577eed03ef3379b6', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 6916101, 'id': 'Q6916101'}, 'typ
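
The raw response is large, and most of it is claims. A sketch of pulling out just the label and description could look like this; `summarize_entity` is a hypothetical helper, and `sample` is trimmed from the output above.

```python
def summarize_entity(resp, qid, lang='en'):
    "Pull the label and description for `qid` out of a wbgetentities response"
    ent = resp.get('entities', {}).get(qid, {})
    return {'id': qid,
            'label': ent.get('labels', {}).get(lang, {}).get('value', ''),
            'description': ent.get('descriptions', {}).get(lang, {}).get('value', '')}

# Trimmed copy of the response shown above
sample = {'entities': {'Q48277': {'type': 'item', 'id': 'Q48277',
    'labels': {'en': {'language': 'en', 'value': 'gender'}},
    'descriptions': {'en': {'language': 'en',
        'value': 'social concept which distinguish the different gender categories'}}}}}
print(summarize_entity(sample, 'Q48277'))
```

A compact summary like this is also cheaper to hand back to an LLM as a tool result than the full claims structure.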

**AI Prompt**

So, there is a second API, query.wikidata.org? Can we use the two APIs together?

**AI Response**

Yes! The query.wikidata.org endpoint lets us use SPARQL queries, which could be very powerful in combination with the entity API. For example, we could:

1. Use SPARQL to find related concepts or properties
2. Then use the entity API to get detailed information about those concepts

Would you like to see an example of how we might combine these? We could start with a simple SPARQL query to find related gender concepts, and then fetch their details using our simplified entity function.
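
As a rough sketch of the first step, the code below builds a SPARQL query for direct subclasses of a concept and parses the standard SPARQL JSON results layout. The helper names are hypothetical, the User-Agent string is a placeholder, `run_sparql` is defined but not executed here, and the `sample` response is hand-written to show the layout rather than real query output.

```python
import re

def subclass_query(qid, limit=10):
    "SPARQL for direct subclasses (P279) of `qid`, with English labels"
    if not re.fullmatch(r'Q[1-9]\d*', qid): raise ValueError(f'Not a QID: {qid}')
    return f"""SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P279 wd:{qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}} LIMIT {int(limit)}"""

def run_sparql(query):
    "Send `query` to the public endpoint (network call, not executed in this sketch)"
    import httpx
    r = httpx.get('https://query.wikidata.org/sparql',
                  params={'query': query, 'format': 'json'},
                  headers={'User-Agent': 'solveit-agent-demo/0.1'})  # placeholder UA
    r.raise_for_status()
    return r.json()

def parse_bindings(resp):
    "Turn SPARQL JSON results into (QID, label) pairs"
    rows = resp.get('results', {}).get('bindings', [])
    return [(b['item']['value'].rsplit('/', 1)[-1], b.get('itemLabel', {}).get('value', ''))
            for b in rows]

# Hand-written sample in the SPARQL JSON results layout, not real query output
sample = {'head': {'vars': ['item', 'itemLabel']}, 'results': {'bindings': [
    {'item': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q290'},
     'itemLabel': {'type': 'literal', 'xml:lang': 'en', 'value': 'biological sex'}}]}}
print(subclass_query('Q48277'))
print(parse_bindings(sample))
```

The QIDs returned by `parse_bindings` can then be passed to the entity API for details, which is the second step described above.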

**AI Prompt**

Ok. The problem here is that LLMs hallucinate Property IDs and QIDs.

**AI Response**

Ah yes, that's a crucial point! LLMs might make up QIDs or Property IDs that don't exist, which could break our queries. We need a way to:

1. Validate that QIDs/PIDs actually exist before querying
2. Maybe provide a predefined list of common/verified properties we know are useful
3. Use the SPARQL endpoint to discover actual relationships rather than letting the LLM guess

Would you like to explore one of these approaches? We could start by creating a validation function that checks if a QID/PID exists before we try to use it.
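
A validation step along these lines might combine a cheap format check with an existence check. The sketch assumes that `wbgetentities` marks unknown IDs with a `missing` key in their entry; both helpers are hypothetical, and the `sample` response is hand-written to illustrate that assumption.

```python
import re

ID_RE = re.compile(r'^[QP][1-9]\d*$')

def looks_like_id(s):
    "Cheap format check before spending an API call"
    return bool(ID_RE.match(s))

def existing_ids(resp):
    "IDs in a wbgetentities response that are not flagged as missing"
    return {k for k, v in resp.get('entities', {}).items() if 'missing' not in v}

# Hand-written sample: one real entry and one ID assumed to be reported as missing
sample = {'entities': {'Q48277': {'id': 'Q48277', 'type': 'item'},
                       'Q999999999999': {'id': 'Q999999999999', 'missing': ''}}}
print(looks_like_id('Q48277'), looks_like_id('gender'))
print(existing_ids(sample))
```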

**AI Prompt**

What would be the agentic approach to this?

**AI Response**

An agentic approach would be to let the system discover and learn relationships rather than relying on guessing. Here's a potential strategy:

1. Start with what we know is valid (like the field name "Sex" from our Titanic dataset)
2. Use the Wikidata search API to find relevant concepts
3. Let the agent explore and map actual relationships between found concepts
4. Build up a verified knowledge base of QIDs and properties through exploration

This way, the agent would work with real data rather than hallucinated IDs, and could build its understanding incrementally.

Would you like to explore how we might implement this discovery-based approach?

**AI Prompt**

Yes. I like this discovery based approach.

**AI Response**

Let's start by creating a function to search Wikidata using their API. This will be our first step in the discovery process:

```python
def search_wikidata(term:str) -> Dict:
    "Search Wikidata for a term and return potential matches"
    api_url = "https://www.wikidata.org/w/api.php"
    params = {
        'action': 'wbsearchentities',
        'search': term,
        'language': 'en',
        'format': 'json'
    }
    try:
        r = httpx.get(api_url, params=params)
        r.raise_for_status()
        return r.json()
    except Exception as e:
        return {'error': f'Search failed: {str(e)}'}

# Let's test it with 'sex' from our Titanic dataset
print(search_wikidata('sex'))
```

This will give us valid QIDs to start with. Would you like to try this first step and then we can build on it?

**AI Prompt**

Ok. I think we may need a few different Wikidata lookup tools for the agents. We probably need a QID -> entity lookup to make sure it is the correct semantic entity (thing, not string). We probably need a text term (label) -> QID lookup. We probably need something that deals with properties in a similar way.

**AI Response**

Good thinking! Let's organize these lookup tools systematically. We could create a WikidataTools class with these core functions:

```python
class WikidataTools:
    "Collection of Wikidata lookup and validation tools"
    def __init__(self):
        self.api_url = "https://www.wikidata.org/w/api.php"
        self.verified_cache = {}
    
    def label_to_qid(self, label:str) -> Dict:
        "Convert text label to potential QIDs with descriptions"
        params = {'action':'wbsearchentities', 'search':label, 
                 'language':'en', 'format':'json'}
        return self._api_call(params)
    
    def verify_qid(self, qid:str) -> Dict:
        "Verify QID exists and get its details"
        if not qid.startswith('Q'): return {'error':'Not a QID'}
        params = {'action':'wbgetentities', 'ids':qid,
                 'languages':'en', 'format':'json'}
        return self._api_call(params)
    
    def _api_call(self, params:Dict) -> Dict:
        "Make API call with error handling"
        try:
            r = httpx.get(self.api_url, params=params)
            r.raise_for_status()
            return r.json()
        except Exception as e:
            return {'error':f'API call failed: {str(e)}'}
```

Would you like to test this initial version? We can then add property handling and any other tools needed.
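
For the property side, one option is to reuse the same search action with its `type` parameter set to `property`. The sketch below only builds the request parameters; `search_params` is a hypothetical helper, and the search term is just an illustration.

```python
def search_params(term, kind='item', limit=5):
    "Build wbsearchentities parameters; `kind` is 'item' for QIDs or 'property' for PIDs"
    if kind not in ('item', 'property'): raise ValueError(f'Unsupported kind: {kind}')
    return {'action': 'wbsearchentities', 'search': term, 'type': kind,
            'language': 'en', 'limit': limit, 'format': 'json'}

print(search_params('sex or gender', kind='property'))
```

The resulting dict can be passed to `_api_call` in the class above, so a `label_to_pid` method would differ from `label_to_qid` only in the `type` value.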

**AI Prompt**

So, we want to use an LLM to figure out which is the correct QID?

**AI Response**

Yes, that's a great approach! The LLM could help disambiguate and choose the most relevant QID by:

1. Understanding the context of our dataset (e.g., "sex" in the Titanic dataset refers to biological sex)
2. Analyzing the descriptions of each QID returned
3. Making an informed choice based on semantic relevance

Would you like to explore how we might modify our WikidataTools class to incorporate LLM-based disambiguation? We could add a method that uses the Chat interface we already have to make these decisions.

In [None]:
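class WikidataTools:
    "Wikidata lookup tools with LLM-based disambiguation"
    def __init__(self, chat:Chat):
        self.api_url = "https://www.wikidata.org/w/api.php"
        self.verified_cache = {}
        self.chat = chat

    def _api_call(self, params:Dict) -> Dict:
        "Make API call with error handling"
        try:
            r = httpx.get(self.api_url, params=params)
            r.raise_for_status()
            return r.json()
        except Exception as e: return {'error':f'API call failed: {e}'}

    def label_to_qid(self, label:str) -> Dict:
        "Convert text label to potential QIDs with descriptions"
        params = {'action':'wbsearchentities', 'search':label, 'language':'en', 'format':'json'}
        return self._api_call(params)
        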
    def get_relationships(self, qid:str) -> Dict:
        "Get instance-of and subclass-of relationships for a QID"
        rels = {}
        # wbgetclaims filters on one property per request, so query P31 and P279 separately
        for name,pid in (('instance_of','P31'), ('subclass_of','P279')):
            params = {'action':'wbgetclaims', 'entity':qid, 'property':pid, 'format':'json'}
            claims = self._api_call(params).get('claims',{}).get(pid,[])
            rels[name] = [c['mainsnak']['datavalue']['value']['id'] 
                          for c in claims if 'datavalue' in c['mainsnak']]
        return rels

    def disambiguate_concept(self, term:str, context:str) -> Dict:
        "Use LLM to choose most relevant QID including semantic relationships"
        results = self.label_to_qid(term)
        if 'error' in results: return results
        
        candidates = []
        for item in results.get('search',[]):
            qid = item['id']
            rels = self.get_relationships(qid)
            candidates.append({
                'id': qid,
                'label': item.get('label',''),
                'description': item.get('description',''),
                'relationships': rels
            })
        
        prompt = f"""Given this context: '{context}'
        Choose the most relevant Wikidata concept from these candidates:
        {candidates}
        
        Consider both the direct descriptions and the semantic relationships (instance_of and subclass_of).
        Return only the QID of the most relevant concept."""
        
        return self.chat(prompt)
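
Because the model's answer is free text and may contain a hallucinated ID, it helps to accept only a QID that was actually among the candidates. The sketch below assumes the reply has already been extracted as a plain string; `pick_qid` is a hypothetical helper, and the candidate list reuses search results for 'sex'.

```python
import re

def pick_qid(reply, candidates):
    "Return the first QID in `reply` that is also one of `candidates`, else None"
    allowed = {c['id'] for c in candidates}
    for qid in re.findall(r'\bQ[1-9]\d*\b', reply):
        if qid in allowed: return qid
    return None

candidates = [{'id': 'Q10525', 'label': 'Sextans'},
              {'id': 'Q44148', 'label': 'male organism'},
              {'id': 'Q290', 'label': 'biological sex'}]
print(pick_qid('The best match is Q290 (biological sex).', candidates))
print(pick_qid('I would choose Q12345.', candidates))
```

Returning `None` for an unknown ID gives the calling code a clear signal to retry or fall back, instead of passing an unverified QID downstream.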

**AI Prompt**

Ok. How do we test this approach?

**AI Response**

Let's create a test case with the Titanic dataset's "Sex" field. Here's how we can test it:

```python
# Create instance with our existing chat
wdt = WikidataTools(chat)

# Test with context from Titanic dataset
context = """This is from the Titanic dataset's 'Sex' field, which indicates 
            the biological sex of passengers (male/female) for demographic analysis"""

# Try disambiguation
print("Testing disambiguation for 'sex' with Titanic context:")
result = wdt.disambiguate_concept('sex', context)
print(result)
```

Would you like to run this test and see how the LLM handles the disambiguation?

In [None]:
class WikidataTools:
    "Tools for working with Wikidata concepts"
    def __init__(self, chat=None):
        self.api_url = "https://www.wikidata.org/w/api.php"
        self.chat = chat
    
    def search(self, term:str) -> Dict:
        "Basic Wikidata search"
        params = {
            'action': 'wbsearchentities',
            'search': term,
            'language': 'en',
            'format': 'json'
        }
        try:
            r = httpx.get(self.api_url, params=params)
            r.raise_for_status()
            return r.json()
        except Exception as e: 
            return {'error': str(e)}

def test_search():
    "Test basic search functionality"
    wdt = WikidataTools()
    results = wdt.search('sex')
    print('\nFirst 3 matches:')
    for r in results.get('search',[])[:3]:
        print(f"- {r['id']}: {r['label']} ({r['description']})")
    return results

results = test_search()


First 3 matches:
- Q10525: Sextans (constellation on the celestial equator)
- Q44148: male organism (male animal or plant)
- Q290: biological sex (biological system that determines an individual's sexually reproductive function)


**AI Prompt**

Ok. This output looks good to me. What do you think?

**AI Response**

Yes, this is much better! We can see:

1. The search is working
2. We're getting clean, readable output
3. We found Q290 (biological sex) which looks like the concept we want for the Titanic dataset

Should we add the next piece - getting the relationships (P31 instance-of and P279 subclass-of) for a specific QID? This would help us verify if Q290 is really the concept we want by seeing its semantic relationships.

In [None]:
def test_relationships_debug():
    "Debug relationship lookup with separate property requests"
    wdt = WikidataTools()
    
    # Test P31 (instance of)
    params1 = {
        'action': 'wbgetclaims',
        'entity': 'Q290',
        'property': 'P31',
        'format': 'json'
    }
    print("\nInstance of (P31):")
    r1 = httpx.get(wdt.api_url, params=params1)
    print(r1.json())
    
    # Test P279 (subclass of)
    params2 = {
        'action': 'wbgetclaims',
        'entity': 'Q290',
        'property': 'P279',
        'format': 'json'
    }
    print("\nSubclass of (P279):")
    r2 = httpx.get(wdt.api_url, params=params2)
    print(r2.json())
    
    return r1.json(), r2.json()

debug_result = test_relationships_debug()


Instance of (P31):
{'claims': {'P31': [{'mainsnak': {'snaktype': 'value', 'property': 'P31', 'hash': '172b3a2690668ccb827bf24396c5df41912bd69b', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 44476427, 'id': 'Q44476427'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'id': 'Q290$77690adf-49ba-31da-36cd-b13d4ea4beea', 'rank': 'normal'}]}}

Subclass of (P279):
{'claims': {'P279': [{'mainsnak': {'snaktype': 'value', 'property': 'P279', 'hash': '8db0ff08fa9fd58fae46a472e32d2d737c19fc92', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 4897819, 'id': 'Q4897819'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}, 'type': 'statement', 'qualifiers': {'P13044': [{'snaktype': 'value', 'property': 'P13044', 'hash': 'e1ca117c8f96bf2e48b4f572a9e12564ce9395e3', 'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 11990, 'id': 'Q11990'}, 'type': 'wikibase-entityid'}, 'datatype': 'wikibase-item'}]}, 'qualifiers-order': ['
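
The useful part of each claim is the target QID nested under `mainsnak`. A small sketch for extracting it, using a trimmed copy of the P31 response above (`claim_ids` is a hypothetical helper):

```python
def claim_ids(resp, pid):
    "Target QIDs for property `pid` in a wbgetclaims response"
    return [c['mainsnak']['datavalue']['value']['id']
            for c in resp.get('claims', {}).get(pid, [])
            if c.get('mainsnak', {}).get('snaktype') == 'value']

# Trimmed copy of the P31 response shown above
p31 = {'claims': {'P31': [{'mainsnak': {'snaktype': 'value', 'property': 'P31',
    'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 44476427, 'id': 'Q44476427'},
                  'type': 'wikibase-entityid'},
    'datatype': 'wikibase-item'}, 'type': 'statement', 'rank': 'normal'}]}}
print(claim_ids(p31, 'P31'))
print(claim_ids(p31, 'P279'))
```

Filtering on `snaktype == 'value'` skips claims that carry no `datavalue`, which would otherwise raise a `KeyError`.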

**AI Prompt**

OK. This is returning some results.

**AI Response**

Yes! Now we're getting actual relationships:

1. Instance of (P31): Q44476427
2. Subclass of (P279): Q4897819

To make this more useful, we should probably:
1. Get the labels for these QIDs so we can understand what they mean
2. Update our WikidataTools class with this working version
3. Add a method to resolve QIDs to their labels

Would you like to try adding a method to get the labels for these related concepts? This would help us understand the semantic relationships better.
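
Before resolving labels, it may help to pull just the target QIDs out of a claims response. The following is a minimal sketch, assuming the `wbgetclaims` response shape printed above; `claim_qids` is a hypothetical helper name, not part of any library.

```python
from typing import Dict, List

def claim_qids(resp: Dict, pid: str) -> List[str]:
    "Return the item IDs that `pid` points to in a wbgetclaims-style response"
    qids = []
    for claim in resp.get('claims', {}).get(pid, []):
        snak = claim.get('mainsnak', {})
        # 'somevalue'/'novalue' snaks carry no datavalue, so skip them
        if snak.get('snaktype') != 'value': continue
        val = snak.get('datavalue', {}).get('value', {})
        if isinstance(val, dict) and 'id' in val: qids.append(val['id'])
    return qids

# Trimmed copy of the P31 response printed above
sample = {'claims': {'P31': [{'mainsnak': {'snaktype': 'value', 'property': 'P31',
    'datavalue': {'value': {'entity-type': 'item', 'numeric-id': 44476427, 'id': 'Q44476427'},
                  'type': 'wikibase-entityid'}}, 'type': 'statement', 'rank': 'normal'}]}}
print(claim_qids(sample, 'P31'))  # ['Q44476427']
```

Using `.get` throughout means an error dict or a claim without a value yields an empty list rather than a `KeyError`.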

**AI Prompt**

Yes. Let's be able to build these semantic relationships. Remember we may want to think about this in terms of general tools that can be recursively called through a toolchain. I suspect that getting entities, properties, and special properties like subclass-of and instance-of is key to building up the semantic signatures needed to go from strings to things.

**AI Response**

Excellent point! We should create modular tools that can be chained together to build semantic understanding. Let's start with some core tools:

```python
class WikidataTools:
    "Tools for working with Wikidata concepts"
    def __init__(self):
        self.api_url = "https://www.wikidata.org/w/api.php"
        
    def get_entity(self, qid:str) -> Dict:
        "Get basic entity information by QID"
        params = {
            'action': 'wbgetentities',
            'ids': qid,
            'languages': 'en',
            'format': 'json'
        }
        try:
            r = httpx.get(self.api_url, params=params)
            return r.json()
        except Exception as e: return {'error': str(e)}
    
    def get_property(self, pid:str) -> Dict:
        "Get property information"
        return self.get_entity(pid)
    
    def get_claims(self, qid:str, pid:str) -> Dict:
        "Get claims for a specific property on an entity"
        params = {
            'action': 'wbgetclaims',
            'entity': qid,
            'property': pid,
            'format': 'json'
        }
        try:
            r = httpx.get(self.api_url, params=params)
            return r.json()
        except Exception as e: return {'error': str(e)}

def test_semantic_chain():
    "Test following semantic relationships"
    wdt = WikidataTools()
    print("\nTesting with biological sex (Q290):")
    
    # Get instance-of relationships
    instances = wdt.get_claims('Q290', 'P31')
    print("\nInstance of (P31):")
    if 'claims' in instances:
        for claim in instances['claims'].get('P31', []):
            qid = claim['mainsnak']['datavalue']['value']['id']
            entity = wdt.get_entity(qid)
            print(f"- {qid}:", entity['entities'][qid]['labels']['en']['value'])
    
    return instances

test_semantic_chain()
```

This gives us building blocks we can use to:
1. Follow chains of relationships
2. Build semantic graphs
3. Resolve strings to concepts and their relationships

Would you like to try this approach and then build on it?
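
To make "following chains of relationships" concrete, here is a rough sketch of a bounded walk along one property. It assumes the claims lookup is passed in as a callable with the same `(qid, pid)` signature as `get_claims`; `walk_up`, `mk_claim` and `fake_claims` are hypothetical names, and the Q1/Q2/Q3 graph is toy data so the example runs offline.

```python
from typing import Callable, Dict, List

def walk_up(qid: str, get_claims: Callable[[str, str], Dict], pid: str = 'P279', max_depth: int = 3) -> List[str]:
    "Breadth-first walk along `pid` from `qid`, returning visited QIDs in order"
    seen, frontier = [qid], [qid]
    for _ in range(max_depth):
        nxt = []
        for q in frontier:
            for claim in get_claims(q, pid).get('claims', {}).get(pid, []):
                val = claim.get('mainsnak', {}).get('datavalue', {}).get('value', {})
                tgt = val.get('id') if isinstance(val, dict) else None
                # Track visited QIDs so cycles in the graph cannot loop forever
                if tgt and tgt not in seen:
                    seen.append(tgt); nxt.append(tgt)
        if not nxt: break
        frontier = nxt
    return seen

# Toy stand-in for WikidataTools.get_claims: Q1 -> Q2 -> Q3 -> Q1 (a cycle)
def mk_claim(t): return {'mainsnak': {'snaktype': 'value', 'datavalue': {'value': {'id': t}}}}
fake = {'Q1': ['Q2'], 'Q2': ['Q3'], 'Q3': ['Q1']}
def fake_claims(qid, pid): return {'claims': {pid: [mk_claim(t) for t in fake.get(qid, [])]}}

print(walk_up('Q1', fake_claims))  # ['Q1', 'Q2', 'Q3']
```

With the real class, the call would be `walk_up('Q290', WikidataTools().get_claims)`; the `max_depth` cap keeps the number of API requests bounded.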

In [None]:
class WikidataTools:
    "Tools for working with Wikidata concepts"
    def __init__(self):
        self.api_url = "https://www.wikidata.org/w/api.php"
        
    def get_entity(self, qid:str) -> Dict:
        "Get basic entity information by QID"
        params = {
            'action': 'wbgetentities',
            'ids': qid,
            'languages': 'en',
            'format': 'json'
        }
        try:
            r = httpx.get(self.api_url, params=params)
            return r.json()
        except Exception as e: return {'error': str(e)}
    
    def get_property(self, pid:str) -> Dict:
        "Get property information"
        return self.get_entity(pid)
    
    def get_claims(self, qid:str, pid:str) -> Dict:
        "Get claims for a specific property on an entity"
        params = {
            'action': 'wbgetclaims',
            'entity': qid,
            'property': pid,
            'format': 'json'
        }
        try:
            r = httpx.get(self.api_url, params=params)
            return r.json()
        except Exception as e: return {'error': str(e)}

def test_semantic_chain():
    "Test following semantic relationships"
    wdt = WikidataTools()
    print("\nTesting with biological sex (Q290):")
    
    # Get instance-of relationships
    instances = wdt.get_claims('Q290', 'P31')
    print("\nInstance of (P31):")
    if 'claims' in instances:
        for claim in instances['claims'].get('P31', []):
            qid = claim['mainsnak']['datavalue']['value']['id']
            entity = wdt.get_entity(qid)
            print(f"- {qid}:", entity['entities'][qid]['labels']['en']['value'])
    
    return instances

test_semantic_chain()


Testing with biological sex (Q290):

Instance of (P31):
- Q44476427: medical attribute


{'claims': {'P31': [{'mainsnak': {'snaktype': 'value',
     'property': 'P31',
     'hash': '172b3a2690668ccb827bf24396c5df41912bd69b',
     'datavalue': {'value': {'entity-type': 'item',
       'numeric-id': 44476427,
       'id': 'Q44476427'},
      'type': 'wikibase-entityid'},
     'datatype': 'wikibase-item'},
    'type': 'statement',
    'id': 'Q290$77690adf-49ba-31da-36cd-b13d4ea4beea',
    'rank': 'normal'}]}}

**AI Prompt**

Ok. Did this do what we expected?

**AI Response**

We got some results, but it's only showing part of the picture. We found that biological sex (Q290) is an instance of "medical attribute" (Q44476427), but we should also:

1. Show the subclass-of relationships (P279) we saw earlier
2. Get descriptions along with labels for better context
3. Make the output more readable/structured

Should we enhance the test function to show a more complete semantic picture? We could add subclass relationships and more detailed entity information to help us better understand if this concept matches what we need for the Titanic dataset.

Would you like me to show how we might modify the test to include these enhancements?
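
As a starting point for the "descriptions along with labels" item, one option is a small flattening helper. This sketch assumes the standard `wbgetentities` layout (`entities` → QID → `labels`/`descriptions` → language → `value`); `summarize` is a hypothetical name, and the sample description is a placeholder rather than real Wikidata text.

```python
from typing import Dict

def summarize(resp: Dict, qid: str, lang: str = 'en') -> Dict:
    "Flatten a wbgetentities-style response into id/label/description"
    ent = resp.get('entities', {}).get(qid, {})
    return {
        'id': qid,
        # A missing label or description in `lang` yields None instead of a KeyError
        'label': ent.get('labels', {}).get(lang, {}).get('value'),
        'description': ent.get('descriptions', {}).get(lang, {}).get('value'),
    }

# Hand-written sample; the description text is a placeholder
sample = {'entities': {'Q44476427': {
    'labels': {'en': {'language': 'en', 'value': 'medical attribute'}},
    'descriptions': {'en': {'language': 'en', 'value': 'placeholder description'}}}}}
print(summarize(sample, 'Q44476427'))
print(summarize({'error': 'timeout'}, 'Q290'))
```

The second call shows that an error dict from `get_entity` degrades to `None` fields instead of raising.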

**AI Prompt**

I like this. At some point we probably want to implement the evaluator-optimizer agent pattern?

**AI Response**

Yes! The evaluator-optimizer pattern would be perfect here. We could have:

1. An optimizer agent that:
   - Follows semantic relationships
   - Builds potential concept mappings
   - Suggests semantic enhancements

2. An evaluator agent that:
   - Validates the semantic accuracy
   - Checks if concepts match the dataset context
   - Provides feedback for refinement

Should we first complete our core WikidataTools with the enhanced relationship mapping, and then design the agent pattern around it? This would give us solid building blocks for the agents to work with.
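
The control flow of that pattern can be sketched without any LLM at all. In the sketch below, `refine`, `propose` and `evaluate` are hypothetical names; in a real system both callables would presumably wrap model calls, and the toy candidates here only illustrate the loop.

```python
from typing import Callable, Tuple

def refine(task: str, propose: Callable[[str, str], str],
           evaluate: Callable[[str, str], Tuple[bool, str]], max_rounds: int = 3) -> Tuple[str, int]:
    "Alternate propose/evaluate until the evaluator accepts or rounds run out"
    feedback, cand = '', ''
    for i in range(1, max_rounds + 1):
        cand = propose(task, feedback)          # optimizer sees the last feedback
        ok, feedback = evaluate(task, cand)     # evaluator accepts or explains why not
        if ok: return cand, i
    return cand, max_rounds

# Toy stand-ins: the optimizer walks a fixed candidate list, the evaluator accepts only Q290
cands = iter(['Q1', 'Q2', 'Q290'])
def propose(task, feedback): return next(cands)
def evaluate(task, cand): return (cand == 'Q290', '' if cand == 'Q290' else f'{cand} does not match {task}')

print(refine('sex', propose, evaluate))  # ('Q290', 3)
```

Capping the rounds matters because an evaluator that never accepts would otherwise loop indefinitely.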

**AI Prompt**

I think we will have different agents that make up the dataset agent system. I want to prototype a Wikidata agent that other agents can interact with. The Wikidata agent should only be an expert on Wikidata.

**AI Response**

Ah yes, that's a much better approach! Let's create a WikidataAgent that:

1. Is solely focused on being the Wikidata expert
2. Has our WikidataTools as its capabilities
3. Can respond to specific queries about concepts, properties, and relationships
4. Doesn't try to understand datasets or make semantic decisions itself

This way, other agents (like a DatasetAgent or SemanticMappingAgent) can consult with the WikidataAgent when they need Wikidata expertise.

Would you like to sketch out what this focused WikidataAgent might look like? We could start with its core interface and responsibilities.

**AI Prompt**

Yes. Let's sketch out this WikidataAgent including what we have learned thus far.

**AI Response**

Let's sketch out the WikidataAgent with its core responsibilities:

```python
class WikidataAgent:
    "Agent specialized in Wikidata knowledge and operations"
    def __init__(self, model=models[2]):
        self.tools = WikidataTools()
        self.chat = Chat(model, tools=[
            self.search_concept,
            self.get_entity_info,
            self.get_relationships,
            self.validate_qid
        ])
        
    def search_concept(self, term:str) -> Dict:
        "Search for a concept by term"
        return self.tools.search(term)
    
    def get_entity_info(self, qid:str) -> Dict:
        "Get detailed entity information"
        return self.tools.get_entity(qid)
    
    def get_relationships(self, qid:str) -> Dict:
        "Get instance-of and subclass-of relationships"
        p31 = self.tools.get_claims(qid, 'P31')
        p279 = self.tools.get_claims(qid, 'P279')
        return {'instance_of': p31, 'subclass_of': p279}
    
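    def validate_qid(self, qid:str) -> bool:
        "Verify if a QID exists and is valid"
        result = self.tools.get_entity(qid)
        # wbgetentities echoes unknown ids back with a 'missing' key, so check for that too
        ent = result.get('entities', {}).get(qid)
        return ent is not None and 'missing' not in ent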
    
    def ask(self, question:str) -> str:
        "Interface for other agents to ask Wikidata-related questions"
        return self.chat(question)
```

This gives us a focused agent that:
1. Uses our existing tools
2. Provides clear interfaces for other agents
3. Stays within its Wikidata expertise domain

Would you like to test this initial version? We can then refine its capabilities based on what we learn.
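
One thing to check first: `search_concept` calls `self.tools.search`, which the `WikidataTools` class as written above does not define. If no earlier version supplies it, a possible sketch uses Wikidata's `wbsearchentities` action. `search_params` and `parse_search` are hypothetical helper names, and the sample hit is hand-written with placeholder values.

```python
from typing import Dict, List

def search_params(term: str, lang: str = 'en', limit: int = 5) -> Dict:
    "Build query parameters for Wikidata's wbsearchentities action"
    return {'action': 'wbsearchentities', 'search': term, 'language': lang,
            'limit': limit, 'format': 'json'}

def parse_search(resp: Dict) -> List[Dict]:
    "Keep only id, label and description from each search hit"
    return [{k: hit.get(k) for k in ('id', 'label', 'description')}
            for hit in resp.get('search', [])]

# Hand-written sample in the wbsearchentities shape; values are placeholders
sample = {'search': [{'id': 'Q290', 'label': 'sex', 'description': 'placeholder', 'match': {'type': 'label'}}]}
print(parse_search(sample))
# A live call would look like: httpx.get(api_url, params=search_params('sex')).json()
```

Splitting parameter building from response parsing keeps both halves testable offline; a `search` method on `WikidataTools` could then be a thin wrapper around the two.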

**AI Prompt**

OK. Using Jeremy Howard's Solve It approach, how should we proceed?

**AI Response**

Following Jeremy's iterative approach, we should:

1. Start Small and Test:
```python
# Create a simple test case
agent = WikidataAgent()
test_q = "What is the Wikidata concept for biological sex and what are its key relationships?"
print("Testing basic agent interaction:")
result = agent.ask(test_q)
print(result)
```

This will help us:
- See if the basic agent-tool interaction works
- Identify any immediate issues
- Understand what we need to improve

Would you like to run this initial test? Then we can iteratively improve based on what we observe.