Example usage of the OntologyRecommender service.

This notebook demonstrates:
1. Analyzing text to automatically recommend an ontology
2. Using the recommended ontology to extract triples
3. Automatic ontology recommendation when using SpindleExtractor without an ontology
4. Conservative ontology extension for new domains
5. Serialization and inspection of recommendations and extensions



In [None]:
import os
import json
from dotenv import load_dotenv
from spindle import (
    OntologyRecommender,
    SpindleExtractor,
    create_ontology,
    recommendation_to_dict,
    triples_to_dict,
    extension_to_dict,
    ontology_to_dict
)

# Load environment variables (API keys)
load_dotenv()

# Check if API key is set
if not os.getenv("ANTHROPIC_API_KEY"):
    print("Error: ANTHROPIC_API_KEY environment variable not set.")
    print("Please set it in a .env file or as an environment variable.")

In [2]:
print("=" * 70)
print("Ontology Recommender Example")
print("=" * 70)
print()

# Example text: Medical research abstract
medical_text = """
A recent clinical trial evaluated the efficacy of Medication A in treating 
patients with chronic migraines. The study, conducted at Massachusetts General 
Hospital, enrolled 250 patients aged 18-65 who experienced at least 8 migraine 
days per month. Dr. Sarah Chen, the principal investigator, led a team of 
neurologists who administered the drug over a 12-week period.

Results showed that Medication A reduced migraine frequency by an average of 
50% compared to the placebo group. Common side effects included nausea and 
dizziness, which affected approximately 15% of participants. The medication 
works by inhibiting CGRP receptors, which are known to play a role in migraine 
pathophysiology.

Dr. Chen reported the findings at the American Academy of Neurology conference 
in Seattle, where the research was well-received by the medical community. The 
FDA is expected to review the data for potential approval in the coming year. 
Massachusetts General Hospital has been a leading research institution in 
neurology for decades and continues to conduct groundbreaking studies in 
headache disorders.
"""

print("Step 1: Recommend ontology from medical text")
print("-" * 70)
print(f"Input text:\n{medical_text}\n")

Ontology Recommender Example

Step 1: Recommend ontology from medical text
----------------------------------------------------------------------
Input text:

A recent clinical trial evaluated the efficacy of Medication A in treating 
patients with chronic migraines. The study, conducted at Massachusetts General 
Hospital, enrolled 250 patients aged 18-65 who experienced at least 8 migraine 
days per month. Dr. Sarah Chen, the principal investigator, led a team of 
neurologists who administered the drug over a 12-week period.

Results showed that Medication A reduced migraine frequency by an average of 
50% compared to the placebo group. Common side effects included nausea and 
dizziness, which affected approximately 15% of participants. The medication 
works by inhibiting CGRP receptors, which are known to play a role in migraine 
pathophysiology.

Dr. Chen reported the findings at the American Academy of Neurology conference 
in Seattle, where the research was well-received by the me

In [3]:
# Create the recommender
recommender = OntologyRecommender()

# Get ontology recommendation
recommendation = recommender.recommend(
    text=medical_text,
    scope="balanced"  # "minimal", "balanced", or "comprehensive"
)

print("Text Purpose:")
print(f"  {recommendation.text_purpose}\n")

print("Recommended Entity Types:")
for i, entity_type in enumerate(recommendation.ontology.entity_types, 1):
    print(f"  {i}. {entity_type.name}: {entity_type.description}")
print()

print("Recommended Relation Types:")
for i, relation_type in enumerate(recommendation.ontology.relation_types, 1):
    print(f"  {i}. {relation_type.name}: {relation_type.description}")
    print(f"     ({relation_type.domain} → {relation_type.range})")
print()

print("Reasoning:")
print(f"  {recommendation.reasoning}\n")

2025-11-03T16:59:08.544 [BAML INFO] Function RecommendOntology:
    Client: CustomGPT5Mini (gpt-5-mini-2025-08-07) - 74827ms. StopReason: completed. Tokens(in/out): 2469/5483
    ---PROMPT---
    system: You are a knowledge graph ontology design expert. Your task is to analyze the provided text, understand its overarching purpose and domain, and recommend an appropriate ontology (entity types and relation types) that would be suitable for extracting knowledge from this and similar texts.
    user: TEXT TO ANALYZE:
    
    A recent clinical trial evaluated the efficacy of Medication A in treating 
    patients with chronic migraines. The study, conducted at Massachusetts General 
    Hospital, enrolled 250 patients aged 18-65 who experienced at least 8 migraine 
    days per month. Dr. Sarah Chen, the principal investigator, led a team of 
    neurologists who administered the drug over a 12-week period.
    
    Results

In [5]:
# Step 2: Use the recommended ontology to extract triples
print("Step 2: Extract triples using recommended ontology")
print("-" * 70)

extractor = SpindleExtractor(recommendation.ontology)
extraction_result = extractor.extract(
    text=medical_text,
    source_name="Medical Research Abstract 2024",
    source_url="https://example.com/research/abstract-001"
)

print(f"Extracted {len(extraction_result.triples)} triples:\n")


Step 2: Extract triples using recommended ontology
----------------------------------------------------------------------
2025-11-03T16:59:32.623 [BAML INFO] Function ExtractTriples:
    Client: CustomHaiku (claude-3-5-haiku-20241022) - 19790ms. StopReason: end_turn. Tokens(in/out): 3491/1501
    ---PROMPT---
    user: You are a knowledge graph extraction expert. Your task is to extract structured triples (subject-predicate-object) from the provided text, with rich entity metadata, custom attributes, and supporting evidence.ONTOLOGY:
    You must extract triples that conform to the following ontology:
    
    Valid Entity Types:
    - Person: An individual mentioned in the text (e.g., investigators, clinicians, presenters, patients when named).
      Custom Attributes:
        * name (string): Full name of the person.
        * role (string): Role or function in the context (e.g., principal investigator, neurologist, presenter).
   

In [6]:

for i, triple in enumerate(extraction_result.triples, 1):
    print(f"  {i}. ({triple.subject}) --[{triple.predicate}]--> ({triple.object})")
    print(f"     Evidence: {len(triple.supporting_spans)} span(s)")
    print(f"     Extraction datetime: {triple.extraction_datetime}")
    for j, span in enumerate(triple.supporting_spans, 1):
        print(f"       Span {j}: start={span.start}, end={span.end}, text=\"{span.text[:50]}...\"")
print()
    


  1. (name='Sarah Chen' type='Person' description='Principal investigator of a clinical trial studying medication for chronic migraines' custom_atts={'name': AttributeValue(value='Sarah Chen', type='string'), 'role': AttributeValue(value='Principal Investigator', type='string'), 'specialty': AttributeValue(value='Neurologist', type='string'), 'affiliation_name': AttributeValue(value='Massachusetts General Hospital', type='string')}) --[works_at]--> (name='Massachusetts General Hospital' type='Organization' description='A leading research institution in neurology that conducted a clinical trial on migraine medication' custom_atts={'name': AttributeValue(value='Massachusetts General Hospital', type='string'), 'type': AttributeValue(value='hospital', type='string'), 'location_name': AttributeValue(value=None, type='string'), 'research_focus': AttributeValue(value='Neurology and headache disorders', type='string')})
     Evidence: 1 span(s)
     Extraction datetime: 2025-11-03T22:59:32Z
  

In [8]:
# Step 3: Display extraction result with detailed span information
print("Step 3: Detailed extraction result with span indices")
print("-" * 70)

triples_dict = triples_to_dict(extraction_result.triples)
print(json.dumps(triples_dict, indent=2))

Step 3: Detailed extraction result with span indices
----------------------------------------------------------------------
[
  {
    "subject": {
      "name": "Sarah Chen",
      "type": "Person",
      "description": "Principal investigator of a clinical trial studying medication for chronic migraines",
      "custom_atts": {
        "name": {
          "value": "Sarah Chen",
          "type": "string"
        },
        "role": {
          "value": "Principal Investigator",
          "type": "string"
        },
        "specialty": {
          "value": "Neurologist",
          "type": "string"
        },
        "affiliation_name": {
          "value": "Massachusetts General Hospital",
          "type": "string"
        }
      }
    },
    "predicate": "works_at",
    "object": {
      "name": "Massachusetts General Hospital",
      "type": "Organization",
      "description": "A leading research institution in neurology that conducted a clinical trial on migraine medication",
   

## Part 1: Basic Ontology Recommendation

This section demonstrates the core functionality of recommending an ontology from text and using it for extraction.
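
To make "using an ontology for extraction" more concrete, the sketch below checks whether a triple respects a relation type's domain and range, the same `(domain → range)` pairs printed for each recommended relation. It is only an illustration built on plain dataclasses: `RelationType`, `Entity`, and `conforms` are hypothetical stand-ins defined here and are not part of the spindle API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """Hypothetical stand-in for an ontology relation type."""
    name: str
    domain: str  # entity type allowed as the subject
    range: str   # entity type allowed as the object


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an extracted entity."""
    name: str
    type: str


def conforms(subject, predicate, obj, relation_types):
    """Return True if the triple matches a known relation's domain and range."""
    for rt in relation_types:
        if rt.name == predicate:
            return subject.type == rt.domain and obj.type == rt.range
    return False  # the predicate is not defined in the ontology


relations = [RelationType("works_at", "Person", "Organization")]
chen = Entity("Sarah Chen", "Person")
mgh = Entity("Massachusetts General Hospital", "Organization")

print(conforms(chen, "works_at", mgh, relations))   # subject/object in the expected order
print(conforms(mgh, "works_at", chen, relations))   # reversed, so domain and range do not match
```

A check like this could be run over extracted triples as a sanity pass, assuming the triple objects expose subject and object types in a comparable way.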


In [9]:
# Step 4: Demonstrate serialization
print("Step 4: Serialize recommendation to JSON")
print("-" * 70)

recommendation_dict = recommendation_to_dict(recommendation)
print(json.dumps(recommendation_dict, indent=2))


Step 4: Serialize recommendation to JSON
----------------------------------------------------------------------
{
  "ontology": {
    "entity_types": [
      {
        "name": "Person",
        "description": "An individual mentioned in the text (e.g., investigators, clinicians, presenters, patients when named).",
        "attributes": [
          {
            "name": "name",
            "type": "string",
            "description": "Full name of the person."
          },
          {
            "name": "role",
            "type": "string",
            "description": "Role or function in the context (e.g., principal investigator, neurologist, presenter)."
          },
          {
            "name": "specialty",
            "type": "string",
            "description": "Professional specialty or occupation (e.g., neurologist)."
          },
          {
            "name": "affiliation_name",
            "type": "string",
            "description": "Name of the primary organization with 

## Part 2: Automatic Ontology Recommendation with SpindleExtractor

This section demonstrates how SpindleExtractor can automatically recommend an ontology when initialized without one.
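
The behavior described here (recommend once on the first `extract()` call, then reuse the result) can be sketched as a small lazy-initialization pattern. The class below is a hypothetical illustration, not SpindleExtractor's actual implementation: `LazyOntologyExtractor` and `fake_recommend` are invented names, and the ontology is represented as a plain dict.

```python
from typing import Callable, Optional


class LazyOntologyExtractor:
    """Hypothetical sketch: resolve an ontology on the first extract() and reuse it."""

    def __init__(self, recommend: Callable[[str, str], dict],
                 ontology: Optional[dict] = None):
        self.ontology = ontology
        self._recommend = recommend
        self.recommend_calls = 0  # counts how often the recommender ran

    def extract(self, text: str, ontology_scope: str = "balanced") -> dict:
        if self.ontology is None:
            # No ontology yet: ask the recommender once and cache the result.
            self.ontology = self._recommend(text, ontology_scope)
            self.recommend_calls += 1
        # A real extractor would call a model here; this sketch only reports
        # which ontology it would have used.
        return {"ontology": self.ontology, "text_length": len(text)}


def fake_recommend(text: str, scope: str) -> dict:
    """Stand-in recommender that returns a fixed ontology."""
    return {"scope": scope, "entity_types": ["Artwork", "Museum", "Person"]}


extractor = LazyOntologyExtractor(fake_recommend)
first = extractor.extract("The Met acquired a painting.")
second = extractor.extract("The Louvre will loan three paintings.")

print(extractor.recommend_calls)                 # the recommender ran only once
print(first["ontology"] is second["ontology"])   # both calls share one ontology
```

Caching on the instance keeps later extractions consistent with the first one, which is the property the following steps rely on.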


In [None]:
print("=" * 70)
print("SpindleExtractor with Automatic Ontology Recommendation")
print("=" * 70)
print()

# Example text about a different domain (art/museum)
art_text = """
The Metropolitan Museum of Art acquired a rare 17th-century painting by 
Johannes Vermeer at Sotheby's auction in New York. The artwork, titled 
"Girl with a Pearl Earring Redux," was purchased for $45 million by the 
museum's curator Dr. Elizabeth Sterling. The painting will be displayed 
in the Dutch Masters gallery alongside other works from the Golden Age 
period. Art historian Professor Michael Chen from Yale University 
authenticated the piece and confirmed its provenance. The acquisition was 
funded by the Patterson Foundation's endowment for European art.
"""

print("Step 1: Create SpindleExtractor WITHOUT an ontology")
print("-" * 70)
print("When extract() is called, it will automatically recommend an ontology")
print("based on the text content.")
print()

# Create extractor WITHOUT providing an ontology
extractor_auto = SpindleExtractor()
print("✓ SpindleExtractor created without ontology")
print()


In [None]:
print("Step 2: Extract triples (ontology will be auto-recommended)")
print("-" * 70)
print(f"Input text:\n{art_text}\n")

# Call extract - this will automatically recommend an ontology first
result_auto = extractor_auto.extract(
    text=art_text,
    source_name="Art Museum News",
    source_url="https://example.com/museum/news/acquisition",
    ontology_scope="balanced"  # Optional: "minimal", "balanced", or "comprehensive"
)

print("✓ Ontology automatically recommended and applied!")
print()


In [None]:
print("Step 3: Inspect the auto-recommended ontology")
print("-" * 70)

print(f"Entity Types ({len(extractor_auto.ontology.entity_types)}):")
for i, et in enumerate(extractor_auto.ontology.entity_types, 1):
    print(f"  {i}. {et.name}: {et.description}")
print()

print(f"Relation Types ({len(extractor_auto.ontology.relation_types)}):")
for i, rt in enumerate(extractor_auto.ontology.relation_types, 1):
    print(f"  {i}. {rt.name}: {rt.description}")
    print(f"     ({rt.domain} → {rt.range})")
print()


In [None]:
print("Step 4: View extracted triples")
print("-" * 70)

print(f"Extracted {len(result_auto.triples)} triples:\n")
for i, triple in enumerate(result_auto.triples, 1):
    print(f"  {i}. ({triple.subject}) --[{triple.predicate}]--> ({triple.object})")
    print(f"     Source: {triple.source.source_name}")
    print(f"     Evidence snippets: {len(triple.supporting_spans)}")
    if triple.supporting_spans:
        snippet = triple.supporting_spans[0].text[:60] + "..."
        print(f"     \"{snippet}\"")
print()

print(f"Extraction Reasoning:\n  {result_auto.reasoning}\n")


In [None]:
print("Step 5: Subsequent extractions use the same ontology")
print("-" * 70)

art_text2 = """
The Louvre Museum in Paris announced that it will loan three Monet 
paintings to the National Gallery in London for a special exhibition. 
The curator Marie Dubois coordinated the arrangement with her British 
counterpart. These impressionist works will be featured alongside pieces 
from the Tate Modern's collection.
"""

print(f"Second text:\n{art_text2}\n")

# Subsequent calls will use the already-recommended ontology
result_auto2 = extractor_auto.extract(
    text=art_text2,
    source_name="International Art News",
    existing_triples=result_auto.triples  # Maintain entity consistency
)

print(f"✓ Extracted {len(result_auto2.triples)} more triples using same ontology")
print()

for i, triple in enumerate(result_auto2.triples, 1):
    print(f"  {i}. ({triple.subject}) --[{triple.predicate}]--> ({triple.object})")
print()


## Part 3: Conservative Ontology Extension

This section demonstrates how to extend an existing ontology conservatively when encountering new domains.
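
One way to picture a conservative, backward-compatible extension is as a merge that only appends types and never mutates the original ontology. The sketch below illustrates that idea with frozen dataclasses. `TypeDef`, `Ontology`, and `extend` are hypothetical names for this example, and the real `extend_ontology` method may work differently.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TypeDef:
    """Hypothetical entity or relation type, identified by name."""
    name: str
    description: str = ""


@dataclass(frozen=True)
class Ontology:
    """Hypothetical immutable ontology."""
    entity_types: Tuple[TypeDef, ...] = ()
    relation_types: Tuple[TypeDef, ...] = ()


def _merge(existing: Tuple[TypeDef, ...], additions: Iterable[TypeDef]) -> Tuple[TypeDef, ...]:
    """Append additions whose names are not already present, keeping order."""
    seen = {t.name for t in existing}
    merged = list(existing)
    for t in additions:
        if t.name not in seen:
            merged.append(t)
            seen.add(t.name)
    return tuple(merged)


def extend(ontology: Ontology, new_entity_types, new_relation_types) -> Ontology:
    """Return a new Ontology; the input ontology is left untouched."""
    return Ontology(
        entity_types=_merge(ontology.entity_types, new_entity_types),
        relation_types=_merge(ontology.relation_types, new_relation_types),
    )


base = Ontology(
    entity_types=(TypeDef("Person"), TypeDef("Organization")),
    relation_types=(TypeDef("works_at"),),
)
extended = extend(base, [TypeDef("Person"), TypeDef("Medication")], [TypeDef("treats")])

print([t.name for t in base.entity_types])      # original is unchanged
print([t.name for t in extended.entity_types])  # only the genuinely new type is added
```

Because existing types are kept in place and duplicates are skipped, anything extracted under the original ontology would still be valid under the extended one.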


In [None]:
print("=" * 80)
print("Conservative Ontology Extension Example")
print("=" * 80)
print()

# Step 1: Start with a business-focused ontology
print("Step 1: Create Initial Business Ontology")
print("-" * 80)

entity_types = [
    {"name": "Person", "description": "An individual person"},
    {"name": "Organization", "description": "A company, institution, or business entity"},
    {"name": "Location", "description": "A geographic place or address"},
    {"name": "Product", "description": "A product or service offered by an organization"}
]

relation_types = [
    {
        "name": "works_at",
        "description": "Employment relationship between a person and organization",
        "domain": "Person",
        "range": "Organization"
    },
    {
        "name": "located_in",
        "description": "Physical location of an organization",
        "domain": "Organization",
        "range": "Location"
    },
    {
        "name": "develops",
        "description": "Relationship where an organization creates a product",
        "domain": "Organization",
        "range": "Product"
    },
    {
        "name": "founded",
        "description": "Founding relationship",
        "domain": "Person",
        "range": "Organization"
    }
]

initial_ontology = create_ontology(entity_types, relation_types)

print("Initial Ontology:")
print(f"  Entity Types: {[et.name for et in initial_ontology.entity_types]}")
print(f"  Relation Types: {[rt.name for rt in initial_ontology.relation_types]}")
print()


In [None]:
recommender_ext = OntologyRecommender()

# Step 2: Test with similar domain text (should NOT need extension)
print("Step 2: Analyze Text from Similar Domain")
print("-" * 80)

similar_text = """
TechVentures Inc., a venture capital firm based in San Jose, recently
invested in CloudScale, a cloud infrastructure startup. The investment
was led by Jennifer Martinez, managing partner at TechVentures. CloudScale
develops cloud optimization software and was founded by Tom Wilson in 2020.
"""

print(f"Text:\n{similar_text}\n")

extension1 = recommender_ext.analyze_extension(
    text=similar_text,
    current_ontology=initial_ontology,
    scope="balanced"
)

print(f"Extension Needed: {extension1.needs_extension}")
if extension1.needs_extension:
    print(f"New Types:")
    print(f"  Entities: {[et.name for et in extension1.new_entity_types]}")
    print(f"  Relations: {[rt.name for rt in extension1.new_relation_types]}")
    print(f"\nCritical Information at Risk:")
    print(f"  {extension1.critical_information_at_risk}")
print(f"\nReasoning:")
print(f"  {extension1.reasoning}")
print()


In [None]:
# Step 3: Test with text from different domain (MIGHT need extension)
print("Step 3: Analyze Text from Medical Domain")
print("-" * 80)

medical_text_ext = """
Dr. Sarah Chen, a cardiologist at Stanford Medical Center, recently published
research on the efficacy of Medication Beta in treating hypertension. The
clinical trial involved 500 patients and showed significant improvement in
blood pressure control. The study was funded by the National Institutes of
Health and published in the Journal of Cardiology.
"""

print(f"Text:\n{medical_text_ext}\n")

extension2 = recommender_ext.analyze_extension(
    text=medical_text_ext,
    current_ontology=initial_ontology,
    scope="balanced"
)

print(f"Extension Needed: {extension2.needs_extension}")
if extension2.needs_extension:
    print(f"\nNew Entity Types:")
    for et in extension2.new_entity_types:
        print(f"  - {et.name}: {et.description}")
    print(f"\nNew Relation Types:")
    for rt in extension2.new_relation_types:
        print(f"  - {rt.name}: {rt.description}")
        print(f"    ({rt.domain} → {rt.range})")
    print(f"\nCritical Information at Risk:")
    print(f"  {extension2.critical_information_at_risk}")
print(f"\nReasoning:")
print(f"  {extension2.reasoning}")
print()


In [None]:
# Step 4: Apply extension if needed
if extension2.needs_extension:
    print("Step 4: Apply Extension to Ontology")
    print("-" * 80)
    
    extended_ontology = recommender_ext.extend_ontology(initial_ontology, extension2)
    
    print(f"Original Ontology:")
    print(f"  {len(initial_ontology.entity_types)} entity types, "
          f"{len(initial_ontology.relation_types)} relation types")
    print(f"\nExtended Ontology:")
    print(f"  {len(extended_ontology.entity_types)} entity types, "
          f"{len(extended_ontology.relation_types)} relation types")
    print(f"\nAll Entity Types: {[et.name for et in extended_ontology.entity_types]}")
    print(f"All Relation Types: {[rt.name for rt in extended_ontology.relation_types]}")
    print()
    
    # Now extract with the extended ontology
    print("Step 5: Extract Triples with Extended Ontology")
    print("-" * 80)
    
    extractor_ext = SpindleExtractor(extended_ontology)
    result_ext = extractor_ext.extract(medical_text_ext, source_name="Medical Research")
    
    print(f"Extracted {len(result_ext.triples)} triples:\n")
    for i, triple in enumerate(result_ext.triples, 1):
        print(f"  {i}. ({triple.subject}) --[{triple.predicate}]--> ({triple.object})")
    print()


In [None]:
# Step 6: Demonstrate the analyze_and_extend convenience method
print("Step 6: Using Convenience Method (analyze_and_extend)")
print("-" * 80)

tech_policy_text = """
The Federal Trade Commission, led by Commissioner Lisa Park, announced new
regulations regarding data privacy for social media platforms. The regulation
requires companies to obtain explicit consent before collecting user data.
Meta and Google have six months to comply with the new requirements.
"""

print(f"Text:\n{tech_policy_text}\n")

# Use convenience method with auto_apply
extension3, maybe_new_ontology = recommender_ext.analyze_and_extend(
    text=tech_policy_text,
    current_ontology=initial_ontology,
    scope="balanced",
    auto_apply=True
)

print(f"Extension Needed: {extension3.needs_extension}")

if maybe_new_ontology is not None:
    print(f"\nAutomatically Extended Ontology:")
    print(f"  {len(maybe_new_ontology.entity_types)} entity types, "
          f"{len(maybe_new_ontology.relation_types)} relation types")
    print(f"\nNew Types Added:")
    print(f"  Entities: {[et.name for et in extension3.new_entity_types]}")
    print(f"  Relations: {[rt.name for rt in extension3.new_relation_types]}")
    
    # Extract with new ontology
    extractor_ext2 = SpindleExtractor(maybe_new_ontology)
    result_ext2 = extractor_ext2.extract(tech_policy_text, source_name="Policy News")
    print(f"\nExtracted {len(result_ext2.triples)} triples with extended ontology")
else:
    print(f"\nNo extension needed - using original ontology")
    print(f"Reasoning: {extension3.reasoning}")

print()


In [None]:
# Step 7: Serialize extension analysis
print("Step 7: Serialize Extension Analysis")
print("-" * 80)

if extension2.needs_extension:
    extension_dict = extension_to_dict(extension2)
    print("Extension Analysis as JSON:")
    print(json.dumps(extension_dict, indent=2))


## Summary

This notebook demonstrated:

**Part 1: Basic Ontology Recommendation**
- ✓ Analyzing text to automatically recommend an ontology
- ✓ Using the recommended ontology to extract triples
- ✓ Serializing recommendations to JSON

**Part 2: Automatic Ontology Recommendation**
- ✓ SpindleExtractor initialized without ontology
- ✓ Ontology automatically recommended from text on first extract()
- ✓ Auto-recommended ontology used for all subsequent extractions
- ✓ Principled ontology design with scope levels (minimal/balanced/comprehensive)
- ✓ Maintains entity consistency across multiple texts

**Part 3: Conservative Ontology Extension**
- ✓ Conservative extension - only when critical information is at risk
- ✓ Analysis explains WHY extension is/isn't needed
- ✓ Extensions are backward-compatible additions
- ✓ Original ontology unchanged - new ontology is created
- ✓ Existing types preferred over creating new ones
- ✓ Ontology evolves deliberately, not with every new text
