# Querying materials science ontologies

This notebook demonstrates how to build SPARQL queries tailored to materials science applications using tools4RDF. We work with ontology networks built from application-level ontologies for atomistic simulations and from mid-level ontologies, and show how to extract structured information from them.

ToDo: examples with OCDO and PMDco (show example from issue #36)

In [1]:
from tools4rdf.network.ontology import read_ontology
from tools4rdf.network.network import OntologyNetwork
from rdflib import Graph

In [2]:
onto = read_ontology()

## Atomistic Simulations

In [7]:
query = onto.create_query(
    onto.terms.cmso.AtomicScaleSample,
    onto.terms.cmso.hasAltName @ onto.terms.cmso.CrystalStructure,
)
print(query)

PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX asmo: <http://purls.helmholtz-metadaten.de/asmo/>
PREFIX pldo: <http://purls.helmholtz-metadaten.de/pldo/>
PREFIX calculation: <https://w3id.org/mdo/calculation/>
PREFIX podo: <http://purls.helmholtz-metadaten.de/podo/>
PREFIX prov: <http://www.w3.org/ns/prov>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX ldo: <http://purls.helmholtz-metadaten.de/cdos/ldo/>
PREFIX owl: <http://www.w3.org/2002/07/owl>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?AtomicScaleSample ?CrystalStructure_hasAltNamevalue
WHERE {
    ?AtomicScaleSample cmso:hasMaterial ?cmso_Material .
    ?cmso_Material cmso:hasStructure ?CrystalStructure .
    ?CrystalStructure cmso:hasAltName ?CrystalStructure_hasAltNamevalue .
    ?AtomicScaleSample rdf:type cmso:AtomicScaleSample .
}


In [10]:
"CrystalStructure_hasAltNamevalue" in query

True

In [3]:
#kg = KnowledgeGraph.unarchive('dataset.tar.gz')
kg = Graph()
kg.parse("dataset/triples", format="turtle")

<Graph identifier=Ne02658a3ba7f4b39a6c9b5d8fe91d961 (<class 'rdflib.graph.Graph'>)>

SPARQL queries can also be written by hand and run directly against the graph, for example:

In [4]:
query = """
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?sample ?symbol ?number
WHERE {
    ?sample cmso:hasMaterial ?material .
    ?material cmso:hasStructure ?structure .
    ?structure cmso:hasSpaceGroupSymbol ?symbol .
    ?sample cmso:hasNumberOfAtoms ?number .
    FILTER (?number="4"^^xsd:integer)
}"""

The above query finds the space group symbol of every sample that has four atoms.
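A hand-written query like this is easy to reuse for other atom counts by wrapping it in a small helper. The sketch below is only an illustration: `build_space_group_query` is a hypothetical name, not part of tools4RDF or rdflib. It declares the `xsd` prefix explicitly rather than relying on the query engine having it pre-bound, and it accepts only integers so that arbitrary text cannot end up in the query.

```python
def build_space_group_query(number_of_atoms):
    """Return SPARQL text selecting samples with the given number of atoms.

    Hypothetical helper for illustration; it only formats a string.
    """
    # Reject anything that is not a plain integer (bool is a subclass of int).
    if isinstance(number_of_atoms, bool) or not isinstance(number_of_atoms, int):
        raise TypeError("number_of_atoms must be an integer")
    # Doubled braces are literal braces inside an f-string.
    return f"""
PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?sample ?symbol ?number
WHERE {{
    ?sample cmso:hasMaterial ?material .
    ?material cmso:hasStructure ?structure .
    ?structure cmso:hasSpaceGroupSymbol ?symbol .
    ?sample cmso:hasNumberOfAtoms ?number .
    FILTER (?number = "{number_of_atoms}"^^xsd:integer)
}}"""


two_atom_query = build_space_group_query(2)
print(two_atom_query)
```

The returned string can be passed to `kg.query` in the same way as the hand-written query.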

In [5]:
results = kg.query(query)

In [6]:
for row in results:
    print(row)

(rdflib.term.URIRef('sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f'), rdflib.term.Literal('Pm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:1f6b1b0f-446a-4ad8-877e-d2e6176797df'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:286c3974-962b-4333-a2bb-d164ae645454'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:67be61c7-f9c7-4d46-a61d-5350fd0ee246'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4'

The same query can also be built programmatically, as shown in the next cell.

`onto.terms` supports tab completion, which makes it easier to find ontology terms.

In [7]:
query = onto.create_query(
    onto.terms.cmso.AtomicScaleSample,
    [onto.terms.cmso.hasSpaceGroupSymbol, onto.terms.cmso.hasNumberOfAtoms == 4],
)

In [8]:
print(query)

PREFIX cmso: <http://purls.helmholtz-metadaten.de/cmso/>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX pldo: <http://purls.helmholtz-metadaten.de/pldo/>
PREFIX podo: <http://purls.helmholtz-metadaten.de/podo/>
PREFIX asmo: <http://purls.helmholtz-metadaten.de/asmo/>
PREFIX prov: <http://www.w3.org/ns/prov>
PREFIX calculation: <https://w3id.org/mdo/calculation/>
PREFIX ldo: <http://purls.helmholtz-metadaten.de/cdos/ldo/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?AtomicScaleSample ?hasSpaceGroupSymbolvalue ?hasNumberOfAtomsvalue
WHERE {
    ?AtomicScaleSample cmso:hasMaterial ?cmso_Material .
    ?cmso_Material cmso:hasStructure ?cmso_CrystalStructure .
    ?cmso_CrystalStructure cmso:hasSpaceGroupSymbol ?hasSpaceGroupSymbolvalue .
    ?AtomicScaleSample cmso:hasNumberOfAtoms ?hasNumberOfAtomsvalue .
    ?AtomicScaleSample rdf:type cmso:AtomicScaleSample .
FILTER (?hasNumberOfAtomsvalue="4"^^xsd:in

The generated query can now be executed:

In [9]:
results = kg.query(query)
for row in results:
    print(row)

(rdflib.term.URIRef('sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f'), rdflib.term.Literal('Pm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:286c3974-962b-4333-a2bb-d164ae645454'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:8fc8e47b-acee-40f8-bcbf-fc298cc31f05'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#integer')))
(rdflib.term.URIRef('sample:9f0f48d1-5ebf-4f7a-b241-5e7aa273f5a0'), rdflib.term.Literal('Fm-3m', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#string')), rdflib.term.Literal('4'

`OntologyNetwork` also provides a `query` method, which runs the query and returns a pandas DataFrame for convenience:

In [11]:
onto.query(
    kg,
    onto.terms.cmso.AtomicScaleSample,
    [onto.terms.cmso.hasSpaceGroupSymbol, onto.terms.cmso.hasNumberOfAtoms == 4],
)

| | AtomicScaleSample | hasSpaceGroupSymbolvalue | hasNumberOfAtomsvalue |
|---|---|---|---|
| 0 | sample:10ffd2cc-9e92-4f04-896d-d6c0fdb9e55f | Pm-3m | 4 |
| 1 | sample:286c3974-962b-4333-a2bb-d164ae645454 | Fm-3m | 4 |
| 2 | sample:8fc8e47b-acee-40f8-bcbf-fc298cc31f05 | Fm-3m | 4 |
| 3 | sample:9f0f48d1-5ebf-4f7a-b241-5e7aa273f5a0 | Fm-3m | 4 |
| 4 | sample:e54c0e91-52ec-4c47-8ba3-43979a1ebe2e | Fm-3m | 4 |
| 5 | sample:1f6b1b0f-446a-4ad8-877e-d2e6176797df | Fm-3m | 4 |
| 6 | sample:67be61c7-f9c7-4d46-a61d-5350fd0ee246 | Fm-3m | 4 |
| 7 | sample:721b7447-8363-4e65-9515-9da2581d7124 | Fm-3m | 4 |
| 8 | sample:a3cf6d97-c922-4c4d-8517-e784df83b71e | Fm-3m | 4 |
| 9 | sample:ab2bea57-39ea-49ea-ad3f-c1c40b013154 | Fm-3m | 4 |


The rest of this section explains how such a query is built programmatically. `create_query` needs a source and one or more destinations. A destination can carry a condition, for example on the number of atoms. The first step is to find the right terms, and tab completion on `onto.terms` helps with that.
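To make the idea of a condition attached to a destination more concrete, here is a minimal sketch of how an expression such as `hasNumberOfAtoms == 4` could be turned into a SPARQL `FILTER` through operator overloading. The `Term` class below is a hypothetical stand-in written for this illustration; it is not the tools4RDF implementation, and it only handles integer comparisons.

```python
class Term:
    """Hypothetical stand-in for a data property such as cmso:hasNumberOfAtoms."""

    def __init__(self, prefix, name, condition=None):
        self.prefix = prefix
        self.name = name
        self.condition = condition  # SPARQL comparison fragment, or None

    @property
    def variable(self):
        # Follows the variable naming seen in the generated queries: ?<name>value
        return f"?{self.name}value"

    def _with_condition(self, operator, value):
        # Only plain integers are supported, to keep the sketch short.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("this sketch only supports integer comparisons")
        literal = f'"{value}"^^xsd:integer'
        # Return a new Term so the unconstrained term stays reusable.
        return Term(self.prefix, self.name, f"{operator}{literal}")

    # Comparison operators return a constrained Term instead of a bool.
    # Overriding __eq__ this way also makes instances unhashable.
    def __eq__(self, value):
        return self._with_condition("=", value)

    def __lt__(self, value):
        return self._with_condition("<", value)

    def __gt__(self, value):
        return self._with_condition(">", value)

    def filter_clause(self):
        """Return a SPARQL FILTER line, or an empty string if unconstrained."""
        if self.condition is None:
            return ""
        return f"FILTER ({self.variable}{self.condition})"


has_number_of_atoms = Term("cmso", "hasNumberOfAtoms")
constrained = has_number_of_atoms == 4
print(constrained.filter_clause())
# prints: FILTER (?hasNumberOfAtomsvalue="4"^^xsd:integer)
```

Returning a new object from the comparison, rather than a boolean, is what allows a condition to be passed along inside the list of destinations.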

In [12]:
onto.terms

cmso, qudt, pldo, podo, asmo, prov, calculation, ldo, rdf, rdfs

These are the ontologies in the network, together with the terms we use. Each of them can be explored further:

In [13]:
onto.terms.cmso

AmorphousMaterial, Angle, Atom, AtomAttribute, AtomicForce, AtomicPosition, AtomicScaleSample, AtomicVelocity, Basis, CalculatedProperty, ChemicalComposition, ChemicalElement, ChemicalSpecies, ComputationalSample, CoordinationNumber, CrystalDefect, CrystalStructure, CrystallineMaterial, LatticeAngle, LatticeParameter, LatticePlane, LatticeVector, Length, MacroscaleSample, Material, MesoscaleSample, MicroscaleSample, Microstructure, Molecule, NanoscaleSample, NormalVector, Occupancy, Plane, SimulationCell, SimulationCellAngle, SimulationCellLength, SimulationCellVector, SpaceGroup, Structure, UnitCell, Vector, hasAngle, hasAttribute, hasBasis, hasCalculatedProperty, hasDefect, hasElement, hasLatticeParameter, hasLength, hasMaterial, hasNormalVector, hasSimulationCell, hasSpaceGroup, hasSpecies, hasStructure, hasUnit, hasUnitCell, hasVector, isCalculatedPropertyOf, isDefectOf, isMaterialOf, hasAltName, hasAngle_alpha, hasAngle_beta, hasAngle_gamma, hasAtomicPercent, hasBravaisLattice, ha

Individual terms can then be selected from there:

In [14]:
onto.terms.cmso.AtomicScaleSample

cmso:AtomicScaleSample
Atomic scale sample is a computational sample in the atomic length scale.

Domains and ranges can also be checked:

In [15]:
onto.terms.cmso.hasSpaceGroupSymbol.domain, onto.terms.cmso.hasSpaceGroupSymbol.range

(['cmso:CrystalStructure'], ['string'])
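Domain and range information is what allows the intermediate steps of a query to be filled in: the generated query earlier in this notebook goes from `AtomicScaleSample` through `Material` to `CrystalStructure` before reaching `hasSpaceGroupSymbol`. The sketch below shows one way such a path could be found with a breadth-first search. It is an illustration under stated assumptions, not a description of how tools4RDF resolves paths: `find_path` is a hypothetical function, and the two object-property entries are read off the generated query rather than taken from the ontology file.

```python
from collections import deque

# Object properties as (domain, property, range). These entries are inferred
# from the generated query above, not read from the ontology file.
OBJECT_PROPERTIES = [
    ("cmso:AtomicScaleSample", "cmso:hasMaterial", "cmso:Material"),
    ("cmso:Material", "cmso:hasStructure", "cmso:CrystalStructure"),
]
# Data property -> domain, as reported by `.domain` above.
DATA_PROPERTY_DOMAINS = {"cmso:hasSpaceGroupSymbol": "cmso:CrystalStructure"}


def find_path(source, data_property):
    """Breadth-first search from a source class to a data property's domain.

    Returns a list of (subject, predicate, object) steps, or None if the
    domain cannot be reached from the source.
    """
    target = DATA_PROPERTY_DOMAINS[data_property]
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        current, steps = queue.popleft()
        if current == target:
            # Final step: the data property itself, pointing at a literal value.
            return steps + [(current, data_property, "value")]
        for domain, prop, range_ in OBJECT_PROPERTIES:
            if domain == current and range_ not in visited:
                visited.add(range_)
                queue.append((range_, steps + [(domain, prop, range_)]))
    return None


path = find_path("cmso:AtomicScaleSample", "cmso:hasSpaceGroupSymbol")
for subject, predicate, obj in path:
    print(subject, predicate, obj)
```

Each step in the returned list corresponds to one triple pattern in the `WHERE` clause of the generated query.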

## PMDcore ontology

In [2]:
pmdco = OntologyNetwork('pmd_2_ontology.ttl', format='ttl')

In [5]:
print(pmdco.create_query(pmdco.terms.co.TestPiece, pmdco.terms.co.value)[0])

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX co: <https://w3id.org/pmd/co/>
SELECT DISTINCT ?TestPiece ?valuevalue
WHERE {
    ?TestPiece co:characteristic ?co_ValueObject .
    ?co_ValueObject co:value ?valuevalue .
    ?TestPiece rdf:type co:TestPiece .
}


In [6]:
print(pmdco.create_query([pmdco.terms.co.input, pmdco.terms.co.characteristic], pmdco.terms.co.value)[0])

ValueError: No common classes found in the domains of the object properties.

In [7]:
pmdco.terms.co.input.domain

[]
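The empty domain of `co:input` offers one plausible reading of the error above: if a common source class is sought by intersecting the domains of the given properties, a property without a declared domain leaves nothing to intersect. The following sketch illustrates that reasoning only. `common_domain` is a hypothetical function, not the tools4RDF code, and the domain listed for `co:characteristic` is an assumption based on the `?TestPiece co:characteristic` pattern in the earlier query.

```python
def common_domain(domains_by_property):
    """Return the classes that appear in the domain of every property.

    Raises ValueError when the intersection is empty, which is one plausible
    way to arrive at the error message shown above.
    """
    domain_sets = [set(domains) for domains in domains_by_property.values()]
    common = set.intersection(*domain_sets) if domain_sets else set()
    if not common:
        raise ValueError(
            "No common classes found in the domains of the object properties."
        )
    return sorted(common)


# `co:input` has an empty domain in the output above. The domain given for
# `co:characteristic` is an assumption made for this illustration.
domains = {
    "co:input": [],
    "co:characteristic": ["co:TestPiece"],
}

try:
    common_domain(domains)
except ValueError as error:
    message = str(error)
    print(message)
```

Under this reading, checking `.domain` on each property before combining them, as done for `co:input` above, is a quick way to spot which term causes the failure.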