# Writing SPARQL queries programmatically

This notebook demonstrates how to compose SPARQL queries programmatically, with little to no prior knowledge of SPARQL syntax.

As an example, we'll work with the well-known Pizza ontology and a set of pre-loaded RDF data.

In [30]:
from tools4rdf.network.network import OntologyNetwork
from rdflib import Graph

In [31]:
onto = OntologyNetwork("pizza.owl")

<img src="examples/pizza_schematic.jpg" alt="Pizza schematic" width="400"/>


In [32]:
g = Graph()
g.parse("pizza_kg.ttl", format="ttl")

<Graph identifier=Nfe61821628bb4b0994bf2268c5ef025a (<class 'rdflib.graph.Graph'>)>

Naturally, you can write and run SPARQL queries directly. Here's a simple example:

In [33]:
query = """
PREFIX pizza: <https://example.org/pizza#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?Food ?hasPricevalue
WHERE {
    ?Food pizza:hasPrice ?hasPricevalue .
}"""

The query can be run directly with RDFLib:

In [34]:
results = g.query(query)

In [35]:
for row in results:
    print(row)

(rdflib.term.URIRef('American:3775bb76'), rdflib.term.Literal('10.0', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('AmericanHot:b3f8f95b'), rdflib.term.Literal('11.5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('VegetarianPizza:48687fff'), rdflib.term.Literal('11.5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('IceCream:58e21cdd'), rdflib.term.Literal('7.0', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('Margherita:52ceb1ed'), rdflib.term.Literal('8.99', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('Margherita:fed8acfc'), rdflib.term.Literal('9.5', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#float')))
(rdflib.term.URIRef('Mushroom:1e428ac1'), rdflib.term.Literal('11.0', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#floa

You can also construct this query programmatically. The ontology terms are available under `onto.terms`, and tab completion helps you explore them.

In [36]:
q = onto.query(g, onto.terms.pizza.Food.any, onto.terms.pizza.hasPrice)
q

Unnamed: 0,Food,hasPricevalue
0,American:3775bb76,10.0
1,AmericanHot:b3f8f95b,11.5
2,VegetarianPizza:48687fff,11.5
3,IceCream:58e21cdd,7.0
4,Margherita:52ceb1ed,8.99
5,Margherita:fed8acfc,9.5
6,Mushroom:1e428ac1,11.0
7,SpicyPizza:2a9eb8e8,11.0
8,SpicyPizza:8864a1fc,12.99


The `query` method of `tools4RDF` returns its results as a pandas DataFrame, as in the output above, which makes further analysis easier.
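
Because the result is a DataFrame, the usual pandas operations apply. The sketch below rebuilds the table from the rows printed above so that it runs on its own, and it assumes the price column holds numeric values; in the notebook, `q` would take the place of `df`.

```python
import pandas as pd

# Rows copied from the output above; in the notebook, `q` already holds this table.
df = pd.DataFrame(
    {
        "Food": [
            "American:3775bb76",
            "AmericanHot:b3f8f95b",
            "VegetarianPizza:48687fff",
            "IceCream:58e21cdd",
            "Margherita:52ceb1ed",
            "Margherita:fed8acfc",
            "Mushroom:1e428ac1",
            "SpicyPizza:2a9eb8e8",
            "SpicyPizza:8864a1fc",
        ],
        "hasPricevalue": [10.0, 11.5, 11.5, 7.0, 8.99, 9.5, 11.0, 11.0, 12.99],
    }
)

# Three cheapest items, sorted by ascending price.
cheapest = df.sort_values("hasPricevalue").head(3)

# Everything priced below 10.
under_ten = df[df["hasPricevalue"] < 10.0]

print(cheapest)
print(len(under_ten), "items cost less than 10")
```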

Now let's look at how this type of query is built.
`onto.query` takes the graph, a source class, and one or more destinations (classes, or a property such as `hasPrice` above).
In this case, we want to find all pizzas that include a `PeperoniSausageTopping`.

In [37]:
q = onto.query(g, onto.terms.pizza.Pizza, onto.terms.pizza.PeperoniSausageTopping)
q

Unnamed: 0,Pizza,PeperoniSausageTopping


What happened here? The result is empty. The function looks for the shortest path between the `Pizza` and `PeperoniSausageTopping` classes. Since both are subclasses of `Food`, it first chose the `hasIngredient` property, and that query returned no rows for this data.

However, by increasing the `num_paths` parameter, we can get alternative paths. In this case, the second option uses `hasTopping`, which is the more appropriate query.

In [38]:
q = onto.create_query(onto.terms.pizza.Pizza, onto.terms.pizza.PeperoniSausageTopping, num_paths=2)
print(q[0], q[1])

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pizza: <https://example.org/pizza#>
SELECT DISTINCT ?Pizza ?PeperoniSausageTopping
WHERE {
    ?Pizza pizza:hasIngredient ?PeperoniSausageTopping .
    ?Pizza rdf:type pizza:Pizza .
    ?PeperoniSausageTopping rdf:type pizza:PeperoniSausageTopping .
} PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX pizza: <https://example.org/pizza#>
SELECT DISTINCT ?Pizza ?PeperoniSausageTopping
WHERE {
    ?Pizza pizza:hasTopping ?PeperoniSausageTopping .
    ?Pizza rdf:type pizza:Pizza .
    ?PeperoniSausageTopping rdf:type pizza:PeperoniSausageTopping .
}
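
To make the idea of alternative paths concrete, here is a toy sketch in plain Python. It is not how `tools4RDF` finds or ranks paths: the class hierarchy in `PARENTS`, the domains and ranges in `PROPERTIES`, and the helper names are all assumptions made for illustration. It lists every property whose domain and range cover the two classes, then composes a query of the same shape as the ones printed above.

```python
# Hypothetical toy ontology: these parents, domains, and ranges are assumptions
# for illustration and are not read from pizza.owl.
PARENTS = {
    "Pizza": "Food",
    "PizzaTopping": "Food",
    "PeperoniSausageTopping": "PizzaTopping",
}
PROPERTIES = {  # property name -> (domain class, range class)
    "hasIngredient": ("Food", "Food"),
    "hasTopping": ("Pizza", "PizzaTopping"),
    "hasBase": ("Pizza", "PizzaBase"),
}


def ancestors(cls):
    """Return the class itself followed by its chain of superclasses."""
    chain = [cls]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def candidate_properties(source, target):
    """Properties whose domain covers `source` and whose range covers `target`."""
    src, tgt = ancestors(source), ancestors(target)
    return [p for p, (dom, rng) in PROPERTIES.items() if dom in src and rng in tgt]


def build_select(source, prop, target, prefix="pizza"):
    """Compose a one-hop SELECT query shaped like the generated queries above."""
    return "\n".join(
        [
            "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
            f"PREFIX {prefix}: <https://example.org/pizza#>",
            f"SELECT DISTINCT ?{source} ?{target}",
            "WHERE {",
            f"    ?{source} {prefix}:{prop} ?{target} .",
            f"    ?{source} rdf:type {prefix}:{source} .",
            f"    ?{target} rdf:type {prefix}:{target} .",
            "}",
        ]
    )


props = candidate_properties("Pizza", "PeperoniSausageTopping")
queries = [build_select("Pizza", p, "PeperoniSausageTopping") for p in props]
print(props)
print(queries[-1])
```

In the notebook itself, either generated string (`q[0]` or `q[1]`) can be passed to `g.query(...)`, just like the handwritten query earlier.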


If you know part of the path in advance, you can specify intermediate classes or properties.
For example, we can query for `Pizza` and all of its subtypes that have a `PeperoniSausageTopping` through the `hasTopping` object property.

In [39]:
q = onto.query(g, onto.terms.pizza.Pizza.all_subtypes, [[onto.terms.pizza.hasTopping, onto.terms.pizza.PeperoniSausageTopping]])
q

Unnamed: 0,Pizza,hasTopping_PeperoniSausageTopping
0,American:3775bb76,PeperoniSausageTopping:f3a39bb8
1,AmericanHot:b3f8f95b,PeperoniSausageTopping:f3a39bb8
2,Margherita:52ceb1ed,PeperoniSausageTopping:f3a39bb8
3,Margherita:fed8acfc,PeperoniSausageTopping:f3a39bb8
4,Mushroom:1e428ac1,PeperoniSausageTopping:f3a39bb8
5,SpicyPizza:2a9eb8e8,PeperoniSausageTopping:f3a39bb8
6,SpicyPizza:8864a1fc,PeperoniSausageTopping:f3a39bb8
7,VegetarianPizza:48687fff,PeperoniSausageTopping:f3a39bb8


Alternatively, you can start from a specific predicate and build the query outward from there.

In [40]:
q = onto.query(g, onto.terms.pizza.hasTopping, onto.terms.pizza.PeperoniSausageTopping)
q

Unnamed: 0,Siciliana,PeperoniSausageTopping,hasTopping
