# Example Data / Information Retrieval using SPARQL

In this Jupyter Notebook, a local triple store is created with the Owlready2 Python package so that SPARQL queries can be run on example datasets.

## Import of relevant packages | Definition of helper function(s)

In [1]:
# Import relevant and useful packages
import re
import numpy as np
import pandas as pd
import owlready2 as or2
from rdflib import Graph
from IPython.display import display

# Definition of helper functions

def parse_query(query: str) -> list[str]:
    """
    Extracts the variable names in order from a SPARQL SELECT query,
    handling both plain variables and expressions with AS.
    """
    # Extract the SELECT clause
    select_match = re.search(r"SELECT\s+(.*?)\s+WHERE", query, re.IGNORECASE | re.DOTALL)
    if not select_match:
        raise ValueError("Could not parse SELECT clause from the query.")
    
    select_clause = select_match.group(1).strip()
    
    variables = []

    # Tokenize the SELECT clause: split by spaces, but keep parentheses together
    # This regex finds either a parenthesized expression or a plain token
    tokens = re.findall(r"\([^\)]+\)|\S+", select_clause)
    
    for token in tokens:
        token = token.strip()
        if token.startswith("(") and token.endswith(")"):
            # Parenthesized expression: check for AS
            match = re.search(r"AS\s+(\?\w+)", token, re.IGNORECASE)
            if match:
                variables.append(match.group(1).lstrip("?").rstrip(")"))
        elif token.startswith("?"):
            # Plain variable
            variables.append(token.lstrip("?").rstrip(")"))
        # else: ignore (e.g., DISTINCT)
    
    return variables

def to_iri(inp):
    """
    Function to transform inputs to IRIs.
    """
    try:
        return inp.iri
    except AttributeError:
        # Plain values (numbers, strings) have no .iri attribute and are returned unchanged
        return inp

def sparql_result_to_df(res):
    """Function to write the result of a SPARQL query into a (pandas) data frame."""
    l = []
    for row in res:
        r = [to_iri(item) for item in row]
        l.append(r)
    return pd.DataFrame(l)
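
To see what the tokenizer inside `parse_query` produces, the following minimal sketch replays its regular expression on a SELECT clause that mixes a plain variable with an aggregate. The clause itself is made up for illustration; only the pattern is taken from the helper above.

```python
import re

# Illustrative SELECT clause (an assumption, not one of the notebook's queries)
select_clause = "DISTINCT ?process_step (COUNT(?s) AS ?number_of_instances)"

# Same tokenizer pattern as in parse_query
tokens = re.findall(r"\([^\)]+\)|\S+", select_clause)

# The nested parentheses split the aggregate into three tokens, so the alias
# is picked up by the plain-variable branch, which strips "?" and the trailing ")"
names = [t.lstrip("?").rstrip(")") for t in tokens if t.startswith("?")]

print(tokens)
print(names)
```

The alias is recovered only because the trailing parenthesis is stripped afterwards. A more deeply nested expression such as `(IF(BOUND(?a), ?a, ?b) AS ?c)` would yield spurious names, so the helper is best kept to simple queries like the ones in this notebook.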

## Example 0: Use SPARQL to count all the entities in all the knowledge graphs

### SPARQL

SPARQL (SPARQL Protocol And RDF Query Language) is used to query the knowledge graphs. To extract information from the graphs, a corresponding SPARQL query has to be built first. A very simple example query that counts all the entities (strictly speaking, one count per triple matching `?s ?p ?o`) looks like this:

```SPARQL
SELECT (COUNT(?s) AS ?number_of_instances)
WHERE {
    ?s ?p ?o
}
```
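
As a rough illustration of what this count means, the sketch below mimics the query on a handful of toy triples in plain Python. The triples and their `ex:` names are hypothetical; the point is only that `COUNT(?s)` yields one count per matching triple, whereas `COUNT(DISTINCT ?s)` would count each subject once.

```python
# Toy triples (subject, predicate, object); all names are hypothetical
triples = [
    ("ex:step_1", "rdf:type", "ex:Centrifuging"),
    ("ex:step_1", "ex:has_input", "ex:water_1"),
    ("ex:water_1", "rdf:type", "ex:ChemicalEntity"),
]

# COUNT(?s) over the pattern ?s ?p ?o: one solution per triple
number_of_triples = len(triples)

# COUNT(DISTINCT ?s): each subject is counted only once
number_of_distinct_subjects = len({s for s, _, _ in triples})

print(number_of_triples, number_of_distinct_subjects)  # 3 2
```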

In [2]:
query = """
SELECT (COUNT(?s) AS ?number_of_instances)
WHERE {
    ?s ?p ?o
}
"""

### Definition of Sources

Load the sources of ontologies to be read in (parsed) as well as the A-Box of the Knowledge Graph to be queried.

In [3]:
# Knowledge Graph File:
kgf = './Knowledge_Graph_All.ttl'

# Convert the file from .ttl to rdf
if kgf.endswith('.ttl'):
    g = Graph()
    g.parse(kgf, format='ttl')
    kgf = f'{kgf.rsplit(".ttl", 1)[0]}.rdf'
    g.serialize(kgf, format='xml')

# Links to ontologies, files, etc. to be loaded in the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf" # PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_ontoNano = "file://../ontonano-full.owl" # Nanoparticle Synthesis Ontology (ontoNano) hosted on corresponding GitHub repository
link_data = f"file://{kgf}" # Example Dataset hosted on corresponding GitHub repository

triple_store = or2.World()
triple_store.get_ontology(link_PMDco).load()
triple_store.get_ontology(link_ontoNano).load()
triple_store.get_ontology(link_data).load()

  http://purl.obolibrary.org/obo/BFO_0000051



get_ontology("file://./Knowledge_Graph_All.rdf#")
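
The cell above derives the `.rdf` file name with string operations and builds a relative `file://` IRI. As a possible alternative, the sketch below uses `pathlib` to derive the same target file name and an absolute `file://` IRI, which may be more robust when the notebook is started from a different working directory. `rdf_target` is a hypothetical helper name, not part of the notebook.

```python
from pathlib import Path

def rdf_target(ttl_file: str) -> tuple[Path, str]:
    """Return the .rdf path next to a .ttl file and an absolute file:// IRI for it."""
    target = Path(ttl_file).with_suffix(".rdf")
    # as_uri() requires an absolute path, hence resolve()
    return target, target.resolve().as_uri()

rdf_path, rdf_iri = rdf_target("./Knowledge_Graph_All.ttl")
print(rdf_path)  # Knowledge_Graph_All.rdf
print(rdf_iri)   # absolute IRI ending in /Knowledge_Graph_All.rdf
```

The returned path could then be passed to `g.serialize(...)` and the IRI to `get_ontology(...)` in place of the hand-built strings.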

### Execute the Query

In [4]:
# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display Results:
print("Number of all instances in the triple store:")
data.columns = parse_query(query)
display(data)

Number of all instances in the triple store:


|   | number_of_instances |
|---|---|
| 0 | 39681 |


## Example 1: Centrifugation times for the Au@MSN-NP Synthesis

### Build the SPARQL Query

In order to build a more complex query, you need to be familiar with how the parameters you want to extract from the knowledge graphs are modeled. We follow the common practices of the PMDco and BFO here. An example for a heating process step is given in the paper, but the same concepts also apply to other process steps, such as a centrifugation step:

![image](./Images/Figure_Heating_Process.jpg)

Following the arrows in the A-Box from left to right, replacing "Temperature" with "Time", and using natural language, we would formulate our query as follows:

<code>
Select all the distinct process_steps
    which are of type "centrifuging" AND
    this "process_step" has a process attribute called "process_attribute" AND
    this "process_attribute" is of type "process_attribute" AND
    this "process_attribute" refers to a quality called "time" AND
    this "time" is of type "temporal_interval" AND
    this "time" quality is specified as a "scalar_value_specification" AND
    this "scalar_value_specification" is of type "scalar_value_specification" AND
    this "scalar_value_specification" has a value called "centrifugation_time" AND
    this "scalar_value_specification" has a measurement unit label called "unit"
</code>
<br />
The SPARQL query in human-readable form (using words instead of IRIs) would be:
<br />

```SPARQL
SELECT DISTINCT ?process_step ?centrifugation_time ?unit
WHERE {
  ?process_step rdf:type wcso:centrifuging .
  ?process_step pmd:has_process_characteristic ?process_attribute .
  ?process_attribute rdf:type pmd:process_attribute .
  ?process_attribute pmd:refers_to ?time .
  ?time rdf:type bfo:temporal_interval .
  ?time iao:quality_is_specified_as ?svs .
  ?svs rdf:type obi:scalar_value_specification .
  ?svs pmd:has_value ?centrifugation_time .
  ?svs pmd:has_measurement_unit_label ?unit .
}
```
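
Replacing the readable names with IRIs by hand is error-prone, so the substitution can also be scripted. The following is a minimal sketch, assuming a lookup table that pairs each label of the readable query with the IRI used at the same position in the final query of this example; `LABEL_TO_IRI` and `expand_labels` are hypothetical names introduced here.

```python
import re

# Label-to-IRI pairs read off the readable and the IRI version of the Example 1 query
LABEL_TO_IRI = {
    "rdf:type": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "wcso:centrifuging": "https://w3id.org/ontoNano/nano_0050005",
    "pmd:has_process_characteristic": "https://w3id.org/pmd/co/PMD_0000009",
    "pmd:process_attribute": "https://w3id.org/pmd/co/PMD_0000008",
    "pmd:refers_to": "https://w3id.org/pmd/co/PMD_0020127",
    "bfo:temporal_interval": "http://purl.obolibrary.org/obo/BFO_0000202",
    "iao:quality_is_specified_as": "http://purl.obolibrary.org/obo/IAO_0000419",
    "obi:scalar_value_specification": "http://purl.obolibrary.org/obo/OBI_0001931",
    "pmd:has_value": "https://w3id.org/pmd/co/PMD_0000006",
    "pmd:has_measurement_unit_label": "https://w3id.org/pmd/co/PMD_0000020",
}

def expand_labels(query: str, mapping: dict[str, str]) -> str:
    """Replace every known label in the query by its IRI in angle brackets."""
    # Longest labels first; the lookahead rejects matches that end inside a longer name
    alternatives = "|".join(
        re.escape(label) for label in sorted(mapping, key=len, reverse=True)
    )
    pattern = re.compile(rf"(?:{alternatives})(?!\w)")
    return pattern.sub(lambda m: f"<{mapping[m.group(0)]}>", query)

readable = "?svs pmd:has_value ?centrifugation_time ."
print(expand_labels(readable, LABEL_TO_IRI))
# ?svs <https://w3id.org/pmd/co/PMD_0000006> ?centrifugation_time .
```

Variables such as `?process_attribute` are left untouched because every label in the table carries its prefix.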

Finally, replacing the names with their IRIs, we can build the following SPARQL query: 

In [5]:
query = """
SELECT DISTINCT ?process_step ?centrifugation_time ?unit
WHERE {
  ?process_step <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/ontoNano/nano_0050005> .
  ?process_step <https://w3id.org/pmd/co/PMD_0000009> ?process_attribute .
  ?process_attribute <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/pmd/co/PMD_0000008> .
  ?process_attribute <https://w3id.org/pmd/co/PMD_0020127> ?time .
  ?time <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/BFO_0000202> .
  ?time <http://purl.obolibrary.org/obo/IAO_0000419> ?svs .
  ?svs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OBI_0001931> .
  ?svs <https://w3id.org/pmd/co/PMD_0000006> ?centrifugation_time .
  ?svs <https://w3id.org/pmd/co/PMD_0000020> ?unit .
}
"""

### Definition of Sources

Load the sources of ontologies to be read in (parsed) as well as the A-Box of the Knowledge Graph to be queried.

In [6]:
# Knowledge Graph File:
kgf = './Knowledge_Graph_Au-MSN.ttl'

# Convert the file from .ttl to rdf
if kgf.endswith('.ttl'):
    g = Graph()
    g.parse(kgf, format='ttl')
    kgf = f'{kgf.rsplit(".ttl", 1)[0]}.rdf'
    g.serialize(kgf, format='xml')

# Links to ontologies, files, etc. to be loaded in the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf" # PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_ontoNano = "file://../ontonano-full.owl" # Nanoparticle Synthesis Ontology (ontoNano) hosted on corresponding GitHub repository
link_data = f"file://{kgf}" # Example Dataset hosted on corresponding GitHub repository

triple_store = or2.World()
triple_store.get_ontology(link_PMDco).load()
triple_store.get_ontology(link_ontoNano).load()
triple_store.get_ontology(link_data).load()

  http://purl.obolibrary.org/obo/BFO_0000051



get_ontology("file://./Knowledge_Graph_Au-MSN.rdf#")

### Execute the Query

In [7]:
# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display Results:
print("Centrifugation times for the Au@MSN-NP Synthesis:")
data.columns = parse_query(query)
display(data)

Centrifugation times for the Au@MSN-NP Synthesis:


|   | process_step | centrifugation_time | unit |
|---|---|---|---|
| 0 | https://w3id.org/wcso/wcso_e/centrifuge_5622_U... | 90.0 | https://qudt.org/vocab/unit/MIN |
| 1 | https://w3id.org/wcso/wcso_e/centrifuge_3200_U... | 15.0 | https://qudt.org/vocab/unit/MIN |
| 2 | https://w3id.org/wcso/wcso_e/centrifuge_5226_U... | 15.0 | https://qudt.org/vocab/unit/MIN |
| 3 | https://w3id.org/wcso/wcso_e/centrifuge_4264_U... | 15.0 | https://qudt.org/vocab/unit/MIN |
| 4 | https://w3id.org/wcso/wcso_e/centrifuge_4960_U... | 15.0 | https://qudt.org/vocab/unit/MIN |
| 5 | https://w3id.org/wcso/wcso_e/centrifuge_2934_U... | 15.0 | https://qudt.org/vocab/unit/MIN |
| 6 | https://w3id.org/wcso/wcso_e/centrifuge_2560_U... | 15.0 | https://qudt.org/vocab/unit/MIN |


## Example 2: Total volume of water required for Au-NP Synthesis

### Build the SPARQL Query

Querying information about chemicals is conceptually similar to querying information about process attributes (see Example 1 above). Again, it helps to look at the corresponding figure from the paper to see how this is modeled:

![image](./Images/Figure_Process_Inputs_Outputs.jpg)

Here, we are interested in an identifier of this chemical, "7732-18-5" (i.e., the CAS number of water), and in a volume. Using natural language, we would formulate our query as follows:

<code>
Select all the distinct process steps
    which have a "chemical" as input AND
    this "chemical" is of type "chemical_entity" AND
    this "chemical" has an "identifier" AND
    this "identifier" has the value "7732-18-5" (i.e., the CAS number of water) AND
    the "chemical" has a quality called "quality" AND
    this "quality" is of type "volume" AND
    this "quality" quality is specified as a "value_specification" AND
    this "value_specification" is of type "scalar_value_specification" AND
    this "scalar_value_specification" has a value called "volume" AND
    this "scalar_value_specification" has a measurement unit label called "unit"
</code>
<br />
The SPARQL query in human-readable form (using words instead of IRIs) would be:
<br />

```SPARQL
SELECT DISTINCT ?process_step ?chemical ?volume ?unit
WHERE {
  ?process_step ro:has_input ?chemical .
  ?chemical rdf:type chebi:chemical_entity .
  ?chemical iao:denotes ?identifier .
  ?identifier pmd:has_value "7732-18-5" .
  ?chemical ro:has_quality ?quality .
  ?quality rdf:type pmd:volume .
  ?quality iao:quality_is_specified_as ?svs .
  ?svs rdf:type obi:scalar_value_specification .
  ?svs pmd:has_value ?volume .
  ?svs pmd:has_measurement_unit_label ?unit .
}
```

Finally, replacing the names with their IRIs, we can build the following SPARQL query: 

In [8]:
query = """
SELECT DISTINCT ?process_step ?chemical ?volume ?unit
WHERE {
  ?process_step <http://purl.obolibrary.org/obo/RO_0002233> ?chemical .
  ?chemical <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/CHEBI_24431> .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier .
  ?identifier <https://w3id.org/pmd/co/PMD_0000006> "7732-18-5" .
  ?chemical <http://purl.obolibrary.org/obo/RO_0000086> ?quality .
  ?quality <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/pmd/co/PMD_0020150> .
  ?quality <http://purl.obolibrary.org/obo/IAO_0000419> ?svs .
  ?svs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OBI_0001931> .
  ?svs <https://w3id.org/pmd/co/PMD_0000006> ?volume .
  ?svs <https://w3id.org/pmd/co/PMD_0000020> ?unit .
}
"""

### Definition of Sources

Load the sources of ontologies to be read in (parsed) as well as the A-Box of the Knowledge Graph to be queried.

In [9]:
# Knowledge Graph File:
kgf = './Knowledge_Graph_Au.ttl'

# Convert the file from .ttl to rdf
if kgf.endswith('.ttl'):
    g = Graph()
    g.parse(kgf, format='ttl')
    kgf = f'{kgf.rsplit(".ttl", 1)[0]}.rdf'
    g.serialize(kgf, format='xml')

# Links to ontologies, files, etc. to be loaded in the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf" # PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_ontoNano = "file://../ontonano-full.owl" # Nanoparticle Synthesis Ontology (ontoNano) hosted on corresponding GitHub repository
link_data = f"file://{kgf}" # Example Dataset hosted on corresponding GitHub repository

triple_store = or2.World()
triple_store.get_ontology(link_PMDco).load()
triple_store.get_ontology(link_ontoNano).load()
triple_store.get_ontology(link_data).load()

  http://purl.obolibrary.org/obo/BFO_0000051



get_ontology("file://./Knowledge_Graph_Au.rdf#")

### Execute the Query

In [10]:
# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display Results:
print("Amount of water needed for the Au-NP Synthesis:")
data.columns = parse_query(query)
display(data)

Amount of water needed for the Au-NP Synthesis:


|   | process_step | chemical | volume | unit |
|---|---|---|---|---|
| 0 | https://w3id.org/wcso/wcso_e/add_chemical_499_... | https://w3id.org/wcso/wcso_e/chemical_water_60... | 9.0 | https://qudt.org/vocab/unit/MilliL |
| 1 | https://w3id.org/wcso/wcso_e/measure_dls_1347_... | https://w3id.org/wcso/wcso_e/chemical_water_14... | 15.0 | https://qudt.org/vocab/unit/MilliL |
| 2 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_water_17... | 10.0 | https://qudt.org/vocab/unit/MilliL |
| 3 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_water_17... | 10.0 | https://qudt.org/vocab/unit/MilliL |
| 4 | https://w3id.org/wcso/wcso_e/infuse_while_heat... | https://w3id.org/wcso/wcso_e/chemical_water_20... | 2.0 | https://qudt.org/vocab/unit/MilliL |
| 5 | https://w3id.org/wcso/wcso_e/add_chemical_1285... | https://w3id.org/wcso/wcso_e/chemical_water_19... | 4.0 | https://qudt.org/vocab/unit/MilliL |


As the corresponding node graph shows, this query correctly identifies all 6 uses of the chemical "Water". If we had simply selected the distinct chemicals in the knowledge graph, without going through the process steps first, we would have missed one 10.0 mL volume, because the **same** chemical instance is connected to two "Remove Supernatant and Redisperse" steps (in the middle of the node graph):

![image](./Images/Reaction_Au_NP.jpg)
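
To make the difference concrete, the sketch below sums the six volumes from the result table in two ways: once per use (one row per process step) and once per distinct chemical instance. The step and chemical names are short hypothetical stand-ins for the truncated IRIs in the table; only the volumes and the fact that two steps share one chemical instance are taken from the results.

```python
# (process step, chemical instance, volume in mL)
# Names are hypothetical stand-ins; volumes are those of the result table
rows = [
    ("step_1", "water_a", 9.0),
    ("step_2", "water_b", 15.0),
    ("step_3", "water_c", 10.0),
    ("step_4", "water_c", 10.0),  # same chemical instance used by a second step
    ("step_5", "water_d", 2.0),
    ("step_6", "water_e", 4.0),
]

# One contribution per (process step, chemical) row, as returned by the query
total_per_use = sum(volume for _, _, volume in rows)

# One contribution per distinct chemical instance: the shared instance counts once
volume_per_instance = {chemical: volume for _, chemical, volume in rows}
total_per_instance = sum(volume_per_instance.values())

print(total_per_use, total_per_instance)  # 50.0 40.0
```

Under these assumptions the per-use total is 50.0 mL, while counting each chemical instance once gives only 40.0 mL.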

## Example 3: Chemicals required for CuO-NP Synthesis

This is similar to the previous example. Using natural language, we would formulate our query as follows:

<code>
Select all the distinct entities
    which are of type "chemical" AND
    this "chemical" has an "identifier" AND
    this "identifier" has a "value"
</code>
<br />
The SPARQL query in human-readable form (using words instead of IRIs) would be:
<br />

```SPARQL
SELECT DISTINCT ?chemical ?identifier ?value
WHERE {
  ?chemical rdf:type chebi:chemical_entity .
  ?chemical iao:denotes ?identifier .
  ?identifier pmd:has_value ?value .
}
```

While this lists all the chemical entities together with their CAS number, name, and SMILES, it spreads them over several rows and repeats them for different instances of the same chemical. To make the output a bit cleaner, we can use the fact that the identifier for the name always has "\_name\_" in its IRI before the UUID. Likewise, the identifier for CAS always contains "_cas_UUID", and the identifier for SMILES always has "_smiles_UUID" in its IRI. Filtering by these patterns and then extracting the respective values gives a cleaner output:

```SPARQL
SELECT DISTINCT ?name ?cas ?smiles
WHERE {
  ?chemical rdf:type chebi:chemical_entity .
  ?chemical iao:denotes ?identifier_name .
  ?chemical iao:denotes ?identifier_cas .
  ?chemical iao:denotes ?identifier_smiles .

  FILTER(CONTAINS(STR(?identifier_name), "_name_UUID"))
  FILTER(CONTAINS(STR(?identifier_cas), "_cas_UUID"))
  FILTER(CONTAINS(STR(?identifier_smiles), "_smiles_UUID"))
    
  ?identifier_name pmd:has_value ?name .
  ?identifier_cas pmd:has_value ?cas .
  ?identifier_smiles pmd:has_value ?smiles .
}
```
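
What the three FILTER clauses achieve can be mimicked in plain Python: each identifier is assigned to a column by a substring test on its IRI, and identical rows from different chemical instances collapse into one. This is only a sketch with hypothetical `ex:` IRIs shaped like the pattern described above; the values are those of water.

```python
# (chemical, identifier IRI, value); all IRIs are hypothetical
rows = [
    ("ex:water_1", "ex:water_1_name_UUID", "Water"),
    ("ex:water_1", "ex:water_1_cas_UUID", "7732-18-5"),
    ("ex:water_1", "ex:water_1_smiles_UUID", "O"),
    ("ex:water_2", "ex:water_2_name_UUID", "Water"),
    ("ex:water_2", "ex:water_2_cas_UUID", "7732-18-5"),
    ("ex:water_2", "ex:water_2_smiles_UUID", "O"),
]
KINDS = ("name", "cas", "smiles")

per_chemical = {}
for chemical, identifier, value in rows:
    for kind in KINDS:
        # Python analogue of FILTER(CONTAINS(STR(?identifier), "_<kind>_UUID"))
        if f"_{kind}_UUID" in identifier:
            per_chemical.setdefault(chemical, {})[kind] = value

# SELECT DISTINCT ?name ?cas ?smiles: duplicates across instances collapse
distinct_rows = {tuple(ids[kind] for kind in KINDS) for ids in per_chemical.values()}
print(distinct_rows)  # {('Water', '7732-18-5', 'O')}
```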

Finally, replacing the names with their IRIs, we can build the following SPARQL query: 

In [11]:
query = """
SELECT DISTINCT ?name ?cas ?smiles
WHERE {
  ?chemical <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/CHEBI_24431> .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_name .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_cas .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_smiles .

  FILTER(CONTAINS(STR(?identifier_name), "_name_UUID"))
  FILTER(CONTAINS(STR(?identifier_cas), "_cas_UUID"))
  FILTER(CONTAINS(STR(?identifier_smiles), "_smiles_UUID"))

  ?identifier_name <https://w3id.org/pmd/co/PMD_0000006> ?name .
  ?identifier_cas <https://w3id.org/pmd/co/PMD_0000006> ?cas .
  ?identifier_smiles <https://w3id.org/pmd/co/PMD_0000006> ?smiles .
}
"""

### Definition of Sources

Load the sources of ontologies to be read in (parsed) as well as the A-Box of the Knowledge Graph to be queried.

In [12]:
# Knowledge Graph File:
kgf = './Knowledge_Graph_CuO.ttl'

# Convert the file from .ttl to rdf
if kgf.endswith('.ttl'):
    g = Graph()
    g.parse(kgf, format='ttl')
    kgf = f'{kgf.rsplit(".ttl", 1)[0]}.rdf'
    g.serialize(kgf, format='xml')

# Links to ontologies, files, etc. to be loaded in the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf" # PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_ontoNano = "file://../ontonano-full.owl" # Nanoparticle Synthesis Ontology (ontoNano) hosted on corresponding GitHub repository
link_data = f"file://{kgf}" # Example Dataset hosted on corresponding GitHub repository

triple_store = or2.World()
triple_store.get_ontology(link_PMDco).load()
triple_store.get_ontology(link_ontoNano).load()
triple_store.get_ontology(link_data).load()

  http://purl.obolibrary.org/obo/BFO_0000051



get_ontology("file://./Knowledge_Graph_CuO.rdf#")

### Execute the Query

In [13]:
# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display Results:
print("Chemicals needed for the CuO-NP Synthesis:")
data.columns = parse_query(query)
display(data)

Chemicals needed for the CuO-NP Synthesis:


|   | name | cas | smiles |
|---|---|---|---|
| 0 | Water | 7732-18-5 | `O` |
| 1 | Acetic Acid | 64-19-7 | `CC(O)=O` |
| 2 | NaOH | 1310-73-2 | `[OH-].[Na+]` |
| 3 | Copper(II) Acetate Monohydrate | 6046-93-1 | `[O+]1C(C)O[Cu-3]23([OH2+])[O+]C(C)O[Cu-3]1([OH...` |


## Example 4: List all Chemicals and their amounts that are required for all the syntheses in the lab

This is essentially a combination of Example 2 and Example 3. The respective SPARQL query looks as follows:

```SPARQL
SELECT DISTINCT ?process_step ?chemical ?name ?cas ?smiles ?volume ?unit
WHERE {
  ?process_step ro:has_input ?chemical .
  ?chemical rdf:type chebi:chemical_entity .
  ?chemical iao:denotes ?identifier_name .
  ?chemical iao:denotes ?identifier_cas .
  ?chemical iao:denotes ?identifier_smiles .

  FILTER(CONTAINS(STR(?identifier_name), "_name_UUID"))
  FILTER(CONTAINS(STR(?identifier_cas), "_cas_UUID"))
  FILTER(CONTAINS(STR(?identifier_smiles), "_smiles_UUID"))

  ?identifier_name pmd:has_value ?name .
  ?identifier_cas pmd:has_value ?cas .
  ?identifier_smiles pmd:has_value ?smiles .

  ?chemical ro:has_quality ?quality .
  ?quality rdf:type pmd:volume .
  ?quality iao:quality_is_specified_as ?svs .
  ?svs rdf:type obi:scalar_value_specification .
  ?svs pmd:has_value ?volume .
  ?svs pmd:has_measurement_unit_label ?unit .
}
```

Finally, replacing the names with their IRIs, we can build the following SPARQL query: 

In [14]:
query = """
SELECT DISTINCT ?process_step ?chemical ?name ?cas ?smiles ?volume ?unit
WHERE {
  ?process_step <http://purl.obolibrary.org/obo/RO_0002233> ?chemical .
  ?chemical <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/CHEBI_24431> .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_name .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_cas .
  ?chemical <http://purl.obolibrary.org/obo/IAO_0000219> ?identifier_smiles .

  FILTER(CONTAINS(STR(?identifier_name), "_name_UUID"))
  FILTER(CONTAINS(STR(?identifier_cas), "_cas_UUID"))
  FILTER(CONTAINS(STR(?identifier_smiles), "_smiles_UUID"))

  ?identifier_name <https://w3id.org/pmd/co/PMD_0000006> ?name .
  ?identifier_cas <https://w3id.org/pmd/co/PMD_0000006> ?cas .
  ?identifier_smiles <https://w3id.org/pmd/co/PMD_0000006> ?smiles .
  
  ?chemical <http://purl.obolibrary.org/obo/RO_0000086> ?quality .
  ?quality <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <https://w3id.org/pmd/co/PMD_0020150> .
  ?quality <http://purl.obolibrary.org/obo/IAO_0000419> ?svs .
  ?svs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.obolibrary.org/obo/OBI_0001931> .
  ?svs <https://w3id.org/pmd/co/PMD_0000006> ?volume .
  ?svs <https://w3id.org/pmd/co/PMD_0000020> ?unit .
}
"""

### Definition of Sources

Load the sources of ontologies to be read in (parsed) as well as the A-Box of the Knowledge Graph to be queried.

In [15]:
# Knowledge Graph File:
kgf = './Knowledge_Graph_All.ttl'

# Convert the file from .ttl to rdf
if kgf.endswith('.ttl'):
    g = Graph()
    g.parse(kgf, format='ttl')
    kgf = f'{kgf.rsplit(".ttl", 1)[0]}.rdf'
    g.serialize(kgf, format='xml')

# Links to ontologies, files, etc. to be loaded in the local triple store
link_PMDco = "https://materialdigital.github.io/core-ontology/ontology.rdf" # PMD Core Ontology (PMDco) hosted on corresponding GitHub repository
link_ontoNano = "file://../ontonano-full.owl" # Nanoparticle Synthesis Ontology (ontoNano) hosted on corresponding GitHub repository
link_data = f"file://{kgf}" # Example Dataset hosted on corresponding GitHub repository

triple_store = or2.World()
triple_store.get_ontology(link_PMDco).load()
triple_store.get_ontology(link_ontoNano).load()
triple_store.get_ontology(link_data).load()

  http://purl.obolibrary.org/obo/BFO_0000051



get_ontology("file://./Knowledge_Graph_All.rdf#")

### Execute the Query

In [16]:
# Execute the SPARQL query
res = triple_store.sparql(query)

# Convert the result to a DataFrame
data = sparql_result_to_df(res)

# Display Results:
print("All Chemicals needed for all the Syntheses in the Lab:")
data.columns = parse_query(query)
display(data)

All Chemicals needed for all the Syntheses in the Lab:


|   | process_step | chemical | name | cas | smiles | volume | unit |
|---|---|---|---|---|---|---|---|
| 0 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_ethanol_... | Ethanol | 64-17-5 | `CCO` | 15.0 | https://qudt.org/vocab/unit/MilliL |
| 1 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_ethanol_... | Ethanol | 64-17-5 | `CCO` | 15.0 | https://qudt.org/vocab/unit/MilliL |
| 2 | https://w3id.org/wcso/wcso_e/add_chemical_1514... | https://w3id.org/wcso/wcso_e/chemical_ctab_179... | CTAB | 57-09-0 | `CCCCCCCCCCCCCCCC[N+](C)(C)C.[Br-]` | 6.0 | https://qudt.org/vocab/unit/MilliL |
| 3 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_ethanol_... | Ethanol | 64-17-5 | `CCO` | 5.0 | https://qudt.org/vocab/unit/MilliL |
| 4 | https://w3id.org/wcso/wcso_e/remove_supernatan... | https://w3id.org/wcso/wcso_e/chemical_ethanol_... | Ethanol | 64-17-5 | `CCO` | 5.0 | https://qudt.org/vocab/unit/MilliL |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 98 | https://w3id.org/wcso/wcso_e/add_chemical_2163... | https://w3id.org/wcso/wcso_e/chemical_ethanol_... | Ethanol | 64-17-5 | `CCO` | 8.0 | https://qudt.org/vocab/unit/MilliL |
| 99 | https://w3id.org/wcso/wcso_e/infuse_while_heat... | https://w3id.org/wcso/wcso_e/chemical_teos_in_... | TEOS in EtOH | 78-10-4 | `CCO[Si](OCC)(OCC)OCC` | 1.0 | https://qudt.org/vocab/unit/MilliL |
| 100 | https://w3id.org/wcso/wcso_e/add_chemical_6116... | https://w3id.org/wcso/wcso_e/chemical_copperii... | Copper(II) Acetate Monohydrate | 6046-93-1 | `[O+]1C(C)O[Cu-3]23([OH2+])[O+]C(C)O[Cu-3]1([OH...` | 3.2 | https://qudt.org/vocab/unit/MilliL |
| 101 | https://w3id.org/wcso/wcso_e/add_chemical_1514... | https://w3id.org/wcso/wcso_e/chemical_water_17... | Water | 7732-18-5 | `O` | 54.0 | https://qudt.org/vocab/unit/MilliL |
