# MOPs Synthesis Q&A System
## Interactive Knowledge Base Query Interface

This notebook answers questions about the synthesis of Metal-Organic Polyhedra (MOPs) by running SPARQL queries against a knowledge graph and rendering the results in human-readable form.

**Examples of questions you can answer:**
- What is the recipe for synthesis of MOP-14?
- What chemical building units are used in MOP-18?
- What temperatures are used in the synthesis procedures?
- What solvents are commonly used?
- What are the synthesis yields?
- And many more...


## Setup and Imports


In [None]:
import json
import sys
from pathlib import Path
from typing import Dict, List, Any, Optional
from rdflib import Graph, Namespace, URIRef, RDF, RDFS, Literal
from IPython.display import display, HTML, Markdown
import pandas as pd

# Add project root to path
repo_root = Path.cwd().parent.parent
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

print("✓ Imports successful")


## Load Knowledge Graph


In [None]:
# Select a hash to work with (you can change this)
HASH = "0c57bac8"  # Change this to explore different papers

# Load the merged TTL file
ttl_path = repo_root / "evaluation" / "data" / "merged_tll" / HASH / f"{HASH}.ttl"

if not ttl_path.exists():
    print(f"⚠ TTL file not found at {ttl_path}")
    print("Available hashes:")
    merged_dir = repo_root / "evaluation" / "data" / "merged_tll"
    if merged_dir.exists():
        for d in sorted(merged_dir.iterdir()):
            if d.is_dir():
                print(f"  - {d.name}")
else:
    print(f"Loading knowledge graph from: {ttl_path}")
    graph = Graph()
    graph.parse(str(ttl_path), format="turtle")
    print(f"✓ Loaded {len(graph)} triples")
    
    # Define namespaces
    ONTOMOPS = Namespace("https://www.theworldavatar.com/kg/ontomops/")
    ONTOSYN = Namespace("https://www.theworldavatar.com/kg/OntoSyn/")
    ONTOSPECIES = Namespace("http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#")
    OM2 = Namespace("http://www.ontology-of-units-of-measure.org/resource/om-2/")
    
    print("✓ Namespaces configured")
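
The fallback branch above lists every subdirectory, including ones that may not contain a TTL file. The following is a minimal sketch of a stricter listing, assuming the `<hash>/<hash>.ttl` layout used by `ttl_path`. The helper name `list_available_hashes` and the hash names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def list_available_hashes(merged_dir: Path) -> List[str]:
    """Return names of subdirectories that contain '<name>/<name>.ttl'."""
    if not merged_dir.exists():
        return []
    return sorted(
        d.name
        for d in merged_dir.iterdir()
        if d.is_dir() and (d / f"{d.name}.ttl").exists()
    )


# Demo on a throwaway directory tree (the hash names are invented)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "aaaa1111").mkdir()
    (root / "aaaa1111" / "aaaa1111.ttl").write_text("")
    (root / "bbbb2222").mkdir()  # directory without a TTL file
    found = list_available_hashes(root)

print(found)
```

Only directories that actually hold a loadable file are reported, so any listed hash can be assigned to `HASH` directly.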


## Helper Functions for Human-Readable Output


In [None]:
def display_answer(question: str, answer: Any, details: Optional[str] = None):
    """Display question and answer in a nice format"""
    html = f"""
    <div style="border: 2px solid #2196F3; border-radius: 8px; padding: 15px; margin: 10px 0; background-color: #f5f5f5;">
        <h3 style="color: #2196F3; margin-top: 0;">❓ {question}</h3>
        <div style="background-color: white; padding: 10px; border-radius: 5px; margin-top: 10px;">
            <strong>Answer:</strong> {answer}
        </div>
    """
    if details:
        html += f"""
        <div style="margin-top: 10px; padding: 10px; background-color: #e3f2fd; border-radius: 5px;">
            <strong>Details:</strong><br>{details}
        </div>
        """
    html += "</div>"
    display(HTML(html))

def format_list(items: List[str], max_items: int = 10) -> str:
    """Format a list for display"""
    if not items:
        return "None found"
    if len(items) <= max_items:
        return "<br>".join([f"• {item}" for item in items])
    else:
        shown = items[:max_items]
        return "<br>".join([f"• {item}" for item in shown]) + f"<br>... and {len(items) - max_items} more"

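def format_table(data: List[Dict], headers: List[str]) -> str:
    """Format a list of row dicts as an HTML table with columns in `headers` order"""
    if not data:
        return "No data found"
    # Pass headers so that column selection and order are explicit
    df = pd.DataFrame(data, columns=headers)
    return df.to_html(index=False, escape=False)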

print("✓ Helper functions loaded")
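
The helpers above interpolate strings straight into HTML, and `format_table` renders with `escape=False`, so a label containing `<` or `&` would be treated as markup. A small sketch of escaping graph values before they are wrapped in tags follows. The helper name `safe_text` and the example label are hypothetical.

```python
from html import escape


def safe_text(value) -> str:
    """Escape a value from the graph so it renders as literal text in HTML."""
    return escape(str(value), quote=True)


label = "Cu2(CO2)4 <paddlewheel> & linker"  # invented example label
cell = f"<strong>{safe_text(label)}</strong>"
print(cell)
```

Escaping only the data and not the surrounding tags keeps the intentional markup (such as `<strong>`) intact.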


# Question-Answer System

The cells below answer **21 questions** with SPARQL queries against the loaded knowledge graph. Each cell demonstrates a different kind of query about MOP synthesis.


## 1. General MOPs Information


In [None]:
# Q1: How many MOPs are described in this paper?
query = """
PREFIX ontomops: <https://www.theworldavatar.com/kg/ontomops/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT (COUNT(DISTINCT ?mop) as ?count)
WHERE {
    ?mop a ontomops:MetalOrganicPolyhedron .
}
"""
results = list(graph.query(query))
count = results[0][0] if results else 0
display_answer(
    "How many MOPs are described in this paper?",
    f"<strong style='font-size: 24px; color: #4CAF50;'>{count} MOPs</strong>"
)


In [None]:
# Q2: What are the names and CCDC numbers of all MOPs?
query = """
PREFIX ontomops: <https://www.theworldavatar.com/kg/ontomops/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?mopLabel ?ccdcNumber ?mopFormula
WHERE {
    ?mop a ontomops:MetalOrganicPolyhedron .
    ?mop rdfs:label ?mopLabel .
    OPTIONAL { ?mop ontomops:hasCCDCNumber ?ccdcNumber }
    OPTIONAL { ?mop ontomops:hasMOPFormula ?mopFormula }
}
ORDER BY ?mopLabel
"""
results = list(graph.query(query))
mops_data = []
for row in results:
    mops_data.append({
        "MOP Name": str(row.mopLabel) if row.mopLabel else "N/A",
        "CCDC Number": str(row.ccdcNumber) if row.ccdcNumber else "N/A",
        "Formula": str(row.mopFormula) if row.mopFormula else "N/A"
    })

if mops_data:
    display_answer(
        "What are all the MOPs in this paper?",
        format_table(mops_data, ["MOP Name", "CCDC Number", "Formula"])
    )
else:
    display_answer("What are all the MOPs in this paper?", "No MOPs found")


## 2. Chemical Building Units (CBUs)


In [None]:
# Q3: What are the chemical building units for each MOP?
query = """
PREFIX ontomops: <https://www.theworldavatar.com/kg/ontomops/>
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?mopLabel ?ccdcNumber ?cbuFormula ?cbuName ?altName
WHERE {
    ?mop a ontomops:MetalOrganicPolyhedron .
    ?mop rdfs:label ?mopLabel .
    OPTIONAL { ?mop ontomops:hasCCDCNumber ?ccdcNumber }
    ?mop ontomops:hasChemicalBuildingUnit ?cbu .
    OPTIONAL { ?cbu ontomops:hasCBUFormula ?cbuFormula }
    OPTIONAL { ?cbu rdfs:label ?cbuName }
    OPTIONAL { ?cbu ontosyn:hasAlternativeNames ?altName }
}
ORDER BY ?mopLabel ?cbuFormula
"""
results = list(graph.query(query))

# Group by MOP
mop_cbus = {}
for row in results:
    mop = str(row.mopLabel) if row.mopLabel else "Unknown"
    ccdc = str(row.ccdcNumber) if row.ccdcNumber else "N/A"
    formula = str(row.cbuFormula) if row.cbuFormula else "N/A"
    name = str(row.cbuName) if row.cbuName else "N/A"
    alt = str(row.altName) if row.altName else None
    
    key = f"{mop} (CCDC: {ccdc})"
    if key not in mop_cbus:
        mop_cbus[key] = []
    
    cbu_desc = f"<strong>{formula}</strong>"
    if name != "N/A" and name != formula:
        cbu_desc += f" - {name}"
    if alt:
        cbu_desc += f" (also known as: {alt})"
    
    if cbu_desc not in mop_cbus[key]:
        mop_cbus[key].append(cbu_desc)

answer_html = ""
for mop, cbus in mop_cbus.items():
    answer_html += f"<div style='margin-bottom: 15px;'><strong style='color: #FF5722;'>{mop}:</strong><br>{format_list(cbus, max_items=20)}</div>"

if answer_html:
    display_answer(
        "What are the chemical building units (CBUs) for each MOP?",
        answer_html
    )
else:
    display_answer("What are the chemical building units for each MOP?", "No CBU information found")
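
The grouping loop above checks `if cbu_desc not in ...` on a list for every row. A hedged alternative sketch uses a dict of dicts, which drops duplicates while keeping first-seen order. The row values here are invented placeholders.

```python
from collections import defaultdict

# (MOP label, CBU description) pairs as they might come back from a query
rows = [
    ("MOP-A", "CBU-1"),
    ("MOP-A", "CBU-2"),
    ("MOP-A", "CBU-1"),  # duplicate row, e.g. from several alternative names
    ("MOP-B", "CBU-3"),
]

grouped = defaultdict(dict)
for mop, cbu in rows:
    grouped[mop][cbu] = None  # dict keys keep insertion order and drop duplicates

result = {mop: list(cbus) for mop, cbus in grouped.items()}
print(result)
```

For the handful of CBUs per MOP in a single paper, either approach is fast enough; the dict version mainly makes the deduplication intent explicit.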


## 3. Synthesis Procedures and Recipes


In [None]:
# Q4: How many synthesis procedures are described?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>

SELECT (COUNT(DISTINCT ?synthesis) as ?count)
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
}
"""
results = list(graph.query(query))
count = results[0][0] if results else 0
display_answer(
    "How many synthesis procedures are described?",
    f"<strong style='font-size: 24px; color: #4CAF50;'>{count} synthesis procedures</strong>"
)


In [None]:
# Q5: What is the complete recipe for each synthesis?
# This is like asking "What is the recipe for synthesis of MOP-14?" for all MOPs
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?synthesis ?synthesisLabel ?chemical ?chemicalLabel ?amount ?formula
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasChemicalInput ?chemical .
    OPTIONAL { ?chemical rdfs:label ?chemicalLabel }
    OPTIONAL { ?chemical ontosyn:hasAmount ?amount }
    OPTIONAL { ?chemical ontosyn:hasChemicalFormula ?formula }
}
ORDER BY ?synthesisLabel ?chemicalLabel
"""
results = list(graph.query(query))

# Group by synthesis
synthesis_recipes = {}
for row in results:
    synth_label = str(row.synthesisLabel) if row.synthesisLabel else "Unknown"
    chem_label = str(row.chemicalLabel) if row.chemicalLabel else "Unknown chemical"
    amount = str(row.amount) if row.amount else "amount not specified"
    formula = str(row.formula) if row.formula else None
    
    if synth_label not in synthesis_recipes:
        synthesis_recipes[synth_label] = []
    
    ingredient = f"<strong>{chem_label}</strong>: {amount}"
    if formula:
        ingredient += f" (formula: {formula})"
    
    if ingredient not in synthesis_recipes[synth_label]:
        synthesis_recipes[synth_label].append(ingredient)

answer_html = ""
for synth, ingredients in synthesis_recipes.items():
    answer_html += f"<div style='border: 1px solid #ddd; padding: 12px; margin-bottom: 15px; border-radius: 5px; background-color: #fafafa;'>"
    answer_html += f"<h4 style='color: #FF5722; margin-top: 0;'>📋 {synth}</h4>"
    answer_html += f"<div style='margin-left: 15px;'>{format_list(ingredients, max_items=50)}</div>"
    answer_html += f"</div>"

if answer_html:
    display_answer(
        "What is the complete recipe for each MOP synthesis?",
        answer_html
    )
else:
    display_answer("What is the complete recipe for each MOP synthesis?", "No recipe information found")


## 4. Chemical Inputs and Suppliers


In [None]:
# Q6: What chemicals are used across all syntheses?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?chemicalLabel (COUNT(DISTINCT ?synthesis) AS ?usageCount)
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis ontosyn:hasChemicalInput ?chemical .
    ?chemical rdfs:label ?chemicalLabel .
}
GROUP BY ?chemicalLabel
ORDER BY DESC(?usageCount) ?chemicalLabel
"""
results = list(graph.query(query))
chemicals_data = []
for row in results:
    chemicals_data.append({
        "Chemical": str(row.chemicalLabel),
        "Used in # Syntheses": int(row.usageCount)
    })

if chemicals_data:
    display_answer(
        "What chemicals are used across all syntheses?",
        format_table(chemicals_data, ["Chemical", "Used in # Syntheses"])
    )
else:
    display_answer("What chemicals are used across all syntheses?", "No chemical information found")


In [None]:
# Q7: What are the chemical suppliers mentioned?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?supplierName (COUNT(?chemical) as ?chemicalCount)
WHERE {
    ?chemical ontosyn:isSuppliedBy ?supplier .
    ?supplier rdfs:label ?supplierName .
}
GROUP BY ?supplierName
ORDER BY DESC(?chemicalCount)
"""
results = list(graph.query(query))
suppliers_data = []
for row in results:
    if row.supplierName:
        suppliers_data.append({
            "Supplier": str(row.supplierName),
            "# Chemicals Supplied": int(row.chemicalCount)
        })

if suppliers_data:
    display_answer(
        "What chemical suppliers are mentioned?",
        format_table(suppliers_data, ["Supplier", "# Chemicals Supplied"])
    )
else:
    display_answer("What chemical suppliers are mentioned?", "No supplier information found")


## 5. Synthesis Steps and Procedures


In [None]:
# Q8: How many synthesis steps are there in total?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>

SELECT (COUNT(DISTINCT ?step) as ?count)
WHERE {
    ?synthesis ontosyn:hasSynthesisStep ?step .
}
"""
results = list(graph.query(query))
count = results[0][0] if results else 0
display_answer(
    "How many synthesis steps are there in total?",
    f"<strong style='font-size: 24px; color: #4CAF50;'>{count} steps</strong>"
)


In [None]:
# Q9: What types of synthesis steps are used?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?stepType (COUNT(?step) as ?count)
WHERE {
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step rdf:type ?stepType .
    FILTER(STRSTARTS(STR(?stepType), "https://www.theworldavatar.com/kg/OntoSyn/"))
}
GROUP BY ?stepType
ORDER BY DESC(?count)
"""
results = list(graph.query(query))
step_types_data = []
for row in results:
    step_type_full = str(row.stepType)
    step_type = step_type_full.split('/')[-1]  # Get last part of URI
    if step_type != "SynthesisStep":  # Skip generic type
        step_types_data.append({
            "Step Type": step_type,
            # ResultRow is a tuple subclass, so row.count is the tuple method; look up by name instead
            "Count": int(row["count"])
        })

if step_types_data:
    display_answer(
        "What types of synthesis steps are used?",
        format_table(step_types_data, ["Step Type", "Count"])
    )
else:
    display_answer("What types of synthesis steps are used?", "No step type information found")


## 6. Temperature and Duration Conditions


In [None]:
# Q10: What are the typical temperatures used in synthesis?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX om-2: <http://www.ontology-of-units-of-measure.org/resource/om-2/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?tempValue ?tempUnit ?synthesisLabel ?stepLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step rdfs:label ?stepLabel .
    {
        ?step ontosyn:hasTargetTemperature ?temp .
    } UNION {
        ?step ontosyn:hasCrystallizationTargetTemperature ?temp .
    }
    ?temp om-2:hasNumericalValue ?tempValue .
    OPTIONAL { ?temp om-2:hasUnit ?tempUnit }
}
ORDER BY ?tempValue
"""
results = list(graph.query(query))
temps_data = []
for row in results:
    value = float(row.tempValue) if row.tempValue else 0
    unit = str(row.tempUnit).split('/')[-1] if row.tempUnit else "unknown"
    synth = str(row.synthesisLabel) if row.synthesisLabel else "Unknown"
    step = str(row.stepLabel) if row.stepLabel else "Unknown step"
    temps_data.append({
        "Synthesis": synth,
        "Step": step,
        "Temperature": f"{value} {unit}"
    })

if temps_data:
    display_answer(
        "What temperatures are used in synthesis steps?",
        format_table(temps_data, ["Synthesis", "Step", "Temperature"])
    )
else:
    display_answer("What temperatures are used in synthesis steps?", "No temperature information found")
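
The cells above shorten unit and type URIs with `split('/')[-1]`, which works for slash-terminated namespaces but leaves the file name attached for hash-style ones such as the OntoSpecies namespace. A small sketch that handles both forms follows. The helper name `local_name` is hypothetical.

```python
def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


unit = local_name("http://www.ontology-of-units-of-measure.org/resource/om-2/degreeCelsius")
cls = local_name("http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#Species")
print(unit, cls)
```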


In [None]:
# Q11: What are typical reaction durations?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX om-2: <http://www.ontology-of-units-of-measure.org/resource/om-2/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?durationValue ?durationUnit ?stepLabel ?synthesisLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step ontosyn:hasStepDuration ?duration .
    ?step rdfs:label ?stepLabel .
    ?duration om-2:hasNumericalValue ?durationValue .
    OPTIONAL { ?duration om-2:hasUnit ?durationUnit }
}
ORDER BY ?durationValue
"""
results = list(graph.query(query))
durations_data = []
for row in results:
    value = float(row.durationValue) if row.durationValue else 0
    unit = str(row.durationUnit).split('/')[-1] if row.durationUnit else "unknown"
    step = str(row.stepLabel) if row.stepLabel else "Unknown step"
    synth = str(row.synthesisLabel) if row.synthesisLabel else "Unknown"
    durations_data.append({
        "Synthesis": synth,
        "Step": step,
        "Duration": f"{value} {unit}"
    })

if durations_data:
    display_answer(
        "What are typical reaction durations?",
        format_table(durations_data, ["Synthesis", "Step", "Duration"])
    )
else:
    display_answer("What are typical reaction durations?", "No duration information found")
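
The `float(x) if x else 0` pattern used above cannot tell a missing value from a genuine zero, because a numeric zero is falsy in Python. A hedged sketch of a formatter that tests for `None` explicitly is shown here. The function name `format_quantity` and the "not reported" wording are assumptions.

```python
from typing import Optional


def format_quantity(value: Optional[float], unit: Optional[str]) -> str:
    """Render a numeric value with its unit, keeping zero distinct from missing."""
    if value is None:
        return "not reported"
    return f"{float(value):g} {unit or 'unknown'}"


print(format_quantity(0, "degreeCelsius"))
print(format_quantity(None, "hour"))
print(format_quantity(72.0, None))
```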


## 7. Equipment and Vessels


In [None]:
# Q12: What vessels/containers are used in synthesis?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?vesselName ?vesselTypeLabel (COUNT(?step) as ?usageCount)
WHERE {
    ?step ontosyn:hasVessel ?vessel .
    OPTIONAL { ?vessel rdfs:label ?vesselName }
    OPTIONAL { 
        ?vessel ontosyn:hasVesselType ?vesselType .
        ?vesselType rdfs:label ?vesselTypeLabel
    }
}
GROUP BY ?vesselName ?vesselTypeLabel
ORDER BY DESC(?usageCount)
"""
results = list(graph.query(query))
vessels_data = []
for row in results:
    name = str(row.vesselName) if row.vesselName else "Unnamed vessel"
    vtype = str(row.vesselTypeLabel) if row.vesselTypeLabel else "Unknown type"
    count = int(row.usageCount)
    vessels_data.append({
        "Vessel": name,
        "Type": vtype,
        "Used in Steps": count
    })

if vessels_data:
    display_answer(
        "What vessels/containers are used in synthesis?",
        format_table(vessels_data, ["Vessel", "Type", "Used in Steps"])
    )
else:
    display_answer("What vessels/containers are used in synthesis?", "No vessel information found")


In [None]:
# Q13: What heating/cooling devices are mentioned?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?deviceLabel (COUNT(?step) as ?usageCount)
WHERE {
    ?step ontosyn:hasHeatChillDevice ?device .
    ?device rdfs:label ?deviceLabel .
}
GROUP BY ?deviceLabel
ORDER BY DESC(?usageCount)
"""
results = list(graph.query(query))
devices_data = []
for row in results:
    if row.deviceLabel:
        devices_data.append({
            "Device": str(row.deviceLabel),
            "Times Used": int(row.usageCount)
        })

if devices_data:
    display_answer(
        "What heating/cooling devices are mentioned?",
        format_table(devices_data, ["Device", "Times Used"])
    )
else:
    display_answer("What heating/cooling devices are mentioned?", "No device information found")


## 8. Yields and Products


In [None]:
# Q14: What are the synthesis yields?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX om-2: <http://www.ontology-of-units-of-measure.org/resource/om-2/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?synthesisLabel ?yieldValue ?outputLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasChemicalOutput ?output .
    OPTIONAL { ?output rdfs:label ?outputLabel }
    ?output ontosyn:hasYield ?yield .
    ?yield om-2:hasNumericalValue ?yieldValue .
}
ORDER BY DESC(?yieldValue)
"""
results = list(graph.query(query))
yields_data = []
for row in results:
    yields_data.append({
        "Synthesis": str(row.synthesisLabel) if row.synthesisLabel else "Unknown",
        "Product": str(row.outputLabel) if row.outputLabel else "Unknown",
        "Yield": f"{float(row.yieldValue):.1f}%" if row.yieldValue is not None else "N/A"
    })

if yields_data:
    display_answer(
        "What are the synthesis yields?",
        format_table(yields_data, ["Synthesis", "Product", "Yield"])
    )
else:
    display_answer("What are the synthesis yields?", "No yield information found")


## 9. Solvents and Atmosphere


In [None]:
# Q15: What solvents are used in synthesis?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?solventLabel (COUNT(?step) as ?usageCount)
WHERE {
    ?step ontosyn:hasSolventDissolve ?solvent .
    ?solvent rdfs:label ?solventLabel .
}
GROUP BY ?solventLabel
ORDER BY DESC(?usageCount)
"""
results = list(graph.query(query))
solvents_data = []
for row in results:
    if row.solventLabel:
        solvents_data.append({
            "Solvent": str(row.solventLabel),
            "Times Used": int(row.usageCount)
        })

if solvents_data:
    display_answer(
        "What solvents are used in synthesis?",
        format_table(solvents_data, ["Solvent", "Times Used"])
    )
else:
    display_answer("What solvents are used in synthesis?", "No solvent information found")


In [None]:
# Q16: What atmospheres/environments are used?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?environmentLabel (COUNT(?step) as ?usageCount)
WHERE {
    ?step ontosyn:hasVesselEnvironment ?environment .
    ?environment rdfs:label ?environmentLabel .
}
GROUP BY ?environmentLabel
ORDER BY DESC(?usageCount)
"""
results = list(graph.query(query))
environments_data = []
for row in results:
    if row.environmentLabel:
        environments_data.append({
            "Environment/Atmosphere": str(row.environmentLabel),
            "Number of Steps": int(row.usageCount)
        })

if environments_data:
    display_answer(
        "What atmospheres/environments are used in reactions?",
        format_table(environments_data, ["Environment/Atmosphere", "Number of Steps"])
    )
else:
    display_answer("What atmospheres/environments are used in reactions?", "No environment information found")


## 10. Advanced Synthesis Details


In [None]:
# Q17: Which steps involve stirring?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?stepLabel ?synthesisLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step ontosyn:isStirred ?isStirred .
    ?step rdfs:label ?stepLabel .
    FILTER(?isStirred = true)
}
"""
results = list(graph.query(query))
stirred_data = []
for row in results:
    if row.stepLabel:
        stirred_data.append({
            "Synthesis": str(row.synthesisLabel) if row.synthesisLabel else "Unknown",
            "Step": str(row.stepLabel)
        })

if stirred_data:
    display_answer(
        f"Which steps involve stirring? ({len(stirred_data)} steps)",
        format_table(stirred_data, ["Synthesis", "Step"])
    )
else:
    display_answer("Which steps involve stirring?", "No stirring information found")


In [None]:
# Q18: Which steps use sealed vessels?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?stepLabel ?synthesisLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step ontosyn:isSealed ?isSealed .
    ?step rdfs:label ?stepLabel .
    FILTER(?isSealed = true)
}
"""
results = list(graph.query(query))
sealed_data = []
for row in results:
    if row.stepLabel:
        sealed_data.append({
            "Synthesis": str(row.synthesisLabel) if row.synthesisLabel else "Unknown",
            "Step": str(row.stepLabel)
        })

if sealed_data:
    display_answer(
        f"Which steps use sealed vessels? ({len(sealed_data)} steps)",
        format_table(sealed_data, ["Synthesis", "Step"])
    )
else:
    display_answer("Which steps use sealed vessels?", "No sealed vessel information found")


In [None]:
# Q19: Which steps involve vacuum?
query = """
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?stepLabel ?stepType ?synthesisLabel
WHERE {
    ?synthesis a ontosyn:ChemicalSynthesis .
    ?synthesis rdfs:label ?synthesisLabel .
    ?synthesis ontosyn:hasSynthesisStep ?step .
    ?step rdfs:label ?stepLabel .
    ?step rdf:type ?stepType .
    {
        ?step ontosyn:hasVacuum ?hasVacuum .
        FILTER(?hasVacuum = true)
    }
    UNION
    {
        ?step ontosyn:isVacuumFiltration ?isVacuumFilt .
        FILTER(?isVacuumFilt = true)
    }
}
"""
results = list(graph.query(query))
vacuum_data = []
for row in results:
    step_label = str(row.stepLabel) if row.stepLabel else "Unknown"
    step_type = str(row.stepType).split('/')[-1] if row.stepType else "Unknown"
    synth = str(row.synthesisLabel) if row.synthesisLabel else "Unknown"
    vacuum_data.append({
        "Synthesis": synth,
        "Step": step_label,
        "Type": step_type
    })

if vacuum_data:
    display_answer(
        f"Which steps involve vacuum? ({len(vacuum_data)} steps)",
        format_table(vacuum_data, ["Synthesis", "Step", "Type"])
    )
else:
    display_answer("Which steps involve vacuum?", "No vacuum information found")


## 11. Characterization Methods


In [None]:
# Q20: What characterization methods are used?
query = """
PREFIX ontospecies: <http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?method
WHERE {
    ?species ontospecies:hasCharacterizationSession ?session .
    {
        ?session ontospecies:hasHNMRDevice ?device .
        BIND("1H NMR Spectroscopy" as ?method)
    }
    UNION
    {
        ?session ontospecies:hasElementalAnalysisDevice ?device .
        BIND("Elemental Analysis" as ?method)
    }
    UNION
    {
        ?session ontospecies:hasInfraredSpectroscopyDevice ?device .
        BIND("Infrared Spectroscopy" as ?method)
    }
}
"""
results = list(graph.query(query))
methods = list(set([str(row.method) for row in results if row.method]))

if methods:
    methods_html = "<br>".join([f"✓ <strong>{m}</strong>" for m in sorted(methods)])
    display_answer(
        "What characterization methods are used?",
        methods_html
    )
else:
    display_answer("What characterization methods are used?", "No characterization method information found")


In [None]:
# Q21: What are the molecular formulas of synthesized products?
query = """
PREFIX ontospecies: <http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?speciesLabel ?formulaValue ?ccdcValue
WHERE {
    ?species a ontospecies:Species .
    ?species rdfs:label ?speciesLabel .
    OPTIONAL {
        ?species ontospecies:hasMolecularFormula ?formula .
        {
            ?formula ontospecies:hasMolecularFormulaValue ?formulaValue
        } UNION {
            ?formula rdfs:label ?formulaValue
        }
    }
    OPTIONAL {
        ?species ontospecies:hasCCDCNumber ?ccdc .
        ?ccdc ontospecies:hasCCDCNumberValue ?ccdcValue
    }
}
ORDER BY ?speciesLabel
"""
results = list(graph.query(query))
products_data = []
for row in results:
    products_data.append({
        "Product": str(row.speciesLabel) if row.speciesLabel else "Unknown",
        "Molecular Formula": str(row.formulaValue) if row.formulaValue else "N/A",
        "CCDC Number": str(row.ccdcValue) if row.ccdcValue else "N/A"
    })

if products_data:
    display_answer(
        "What are the molecular formulas of synthesized products?",
        format_table(products_data, ["Product", "Molecular Formula", "CCDC Number"])
    )
else:
    display_answer("What are the molecular formulas of synthesized products?", "No product formula information found")


## 12. Custom Queries

Use the cell below to write your own SPARQL queries to answer custom questions!


In [None]:
# Custom query example - modify this to answer your own questions!
# Example: Find all MOPs with their associated synthesis names

custom_query = """
PREFIX ontomops: <https://www.theworldavatar.com/kg/ontomops/>
PREFIX ontosyn: <https://www.theworldavatar.com/kg/OntoSyn/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?mopLabel ?ccdcNumber ?synthesisLabel
WHERE {
    ?mop a ontomops:MetalOrganicPolyhedron .
    ?mop rdfs:label ?mopLabel .
    OPTIONAL { ?mop ontomops:hasCCDCNumber ?ccdcNumber }
    OPTIONAL {
        ?synthesis ontosyn:hasChemicalOutput ?output .
        ?output ontosyn:isRepresentedBy ?mop .
        ?synthesis rdfs:label ?synthesisLabel .
    }
}
ORDER BY ?mopLabel
"""

print("🔍 Running custom query...")
results = list(graph.query(custom_query))
print(f"✓ Found {len(results)} results\n")

# Display results in a table
custom_data = []
for row in results:
    custom_data.append({
        "MOP": str(row.mopLabel) if row.mopLabel else 'N/A',
        "CCDC": str(row.ccdcNumber) if row.ccdcNumber else 'N/A',
        "Synthesis": str(row.synthesisLabel) if row.synthesisLabel else 'N/A'
    })

if custom_data:
    display_answer(
        "Custom Query: MOPs and their Synthesis Procedures",
        format_table(custom_data, ["MOP", "CCDC", "Synthesis"])
    )
else:
    print("No results found")


## 📊 Summary Statistics

Let's generate a comprehensive summary of all the information in this knowledge graph:


In [None]:
# Generate a comprehensive summary of the knowledge graph
summary_queries = {
    "MOPs": """
        SELECT (COUNT(DISTINCT ?mop) as ?count) 
        WHERE { ?mop a <https://www.theworldavatar.com/kg/ontomops/MetalOrganicPolyhedron> . }
    """,
    "Syntheses": """
        SELECT (COUNT(DISTINCT ?s) as ?count) 
        WHERE { ?s a <https://www.theworldavatar.com/kg/OntoSyn/ChemicalSynthesis> . }
    """,
    "Synthesis Steps": """
        SELECT (COUNT(DISTINCT ?step) as ?count) 
        WHERE { ?s <https://www.theworldavatar.com/kg/OntoSyn/hasSynthesisStep> ?step . }
    """,
    "Chemical Inputs": """
        SELECT (COUNT(DISTINCT ?chem) as ?count) 
        WHERE { ?s <https://www.theworldavatar.com/kg/OntoSyn/hasChemicalInput> ?chem . }
    """,
    "Chemical Outputs": """
        SELECT (COUNT(DISTINCT ?out) as ?count) 
        WHERE { ?s <https://www.theworldavatar.com/kg/OntoSyn/hasChemicalOutput> ?out . }
    """,
    "Species Characterized": """
        SELECT (COUNT(DISTINCT ?sp) as ?count) 
        WHERE { ?sp a <http://www.theworldavatar.com/ontology/ontospecies/OntoSpecies.owl#Species> . }
    """,
    "Chemical Building Units": """
        SELECT (COUNT(DISTINCT ?cbu) as ?count)
        WHERE { ?mop <https://www.theworldavatar.com/kg/ontomops/hasChemicalBuildingUnit> ?cbu . }
    """,
    "Vessels Used": """
        SELECT (COUNT(DISTINCT ?vessel) as ?count)
        WHERE { ?step <https://www.theworldavatar.com/kg/OntoSyn/hasVessel> ?vessel . }
    """
}

summary_data = []
for label, query in summary_queries.items():
    results = list(graph.query(query))
    count = int(results[0][0]) if results and results[0][0] is not None else 0
    summary_data.append({
        "Category": label,
        "Count": count
    })

# Create a beautiful summary display
summary_html = f"""
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 25px; border-radius: 15px; color: white; box-shadow: 0 4px 6px rgba(0,0,0,0.1);">
    <h2 style="margin-top: 0; text-align: center;">📊 Knowledge Graph Summary for {HASH}</h2>
    <div style="background-color: white; color: black; border-radius: 10px; padding: 15px; margin-top: 15px;">
        {format_table(summary_data, ["Category", "Count"])}
    </div>
    <div style="margin-top: 20px; text-align: center; font-size: 14px; opacity: 0.9;">
        <p>This knowledge graph contains rich information about MOPs synthesis procedures, including chemicals, steps, conditions, and characterization data.</p>
    </div>
</div>
"""

display(HTML(summary_html))


## 🎉 Notebook Complete!

You now have **21+ questions** answered about MOPs synthesis from the knowledge graph!

### What you can do next:

1. **Change the hash** in the "Load Knowledge Graph" cell to explore different papers
2. **Write custom SPARQL queries** in the custom query cell
3. **Modify existing queries** to ask slightly different questions
4. **Export the data** using pandas DataFrames for further analysis
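
For the export step, the list-of-dicts structures built in the cells (such as `chemicals_data`) convert directly to a DataFrame. The following is a minimal sketch with invented rows; in the notebook, the real list would take their place.

```python
import pandas as pd

# Invented rows in the same list-of-dicts shape the cells above build
chemicals_rows = [
    {"Chemical": "placeholder solvent", "Used in # Syntheses": 3},
    {"Chemical": "placeholder metal salt", "Used in # Syntheses": 1},
]

df = pd.DataFrame(chemicals_rows)
csv_text = df.to_csv(index=False)  # pass a file path instead to write to disk
print(csv_text)
```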

### Example questions covered:
- ✓ How many MOPs are described?
- ✓ What are the chemical building units?
- ✓ What is the recipe for each synthesis? (like "What is the recipe for synthesis of MOP-14?")
- ✓ What chemicals and suppliers are used?
- ✓ What temperatures and durations are used?
- ✓ What equipment and vessels are used?
- ✓ What are the synthesis yields?
- ✓ What solvents and atmospheres are used?
- ✓ Which steps involve stirring, sealing, or vacuum?
- ✓ What characterization methods are used?
- ✓ And many more!

**Happy exploring! 🔬**
