<a href="https://colab.research.google.com/github/ACTH-DKES/ACTH2025/blob/main/week8/week8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Querying RDF Data with Python

In this notebook, we’ll learn two ways to query RDF data:

1. Using **SPARQLWrapper** to query online SPARQL endpoints (like Wikidata).
2. Using `rdflib`'s `.query()` method to run SPARQL queries on local RDF graphs.

---

## 1. SPARQLWrapper: Querying Online SPARQL Endpoints

[SPARQLWrapper](https://rdflib.github.io/sparqlwrapper/) is a Python wrapper around a SPARQL service. It allows us to send SPARQL queries and receive results in formats like JSON or XML.

| Method                        | Description                                                                                            |
| ----------------------------- | ------------------------------------------------------------------------------------------------------ |
| `SPARQLWrapper(endpoint_url)` | Initializes the object with the URL of the SPARQL endpoint.                                            |
| `.setQuery(query_string)`     | Sets the SPARQL query string to be executed.                                                           |
| `.setReturnFormat(format)`    | Specifies the format of the returned result. Common formats include `JSON`, `XML`, `TURTLE`, and `N3`. |
| `.addParameter(key, value)`   | Adds a custom URL parameter to the request.                                                            |
| `.setMethod(HTTPMethod)`      | Sets the HTTP method to use (`GET` or `POST`, default is `GET`).                                       |
| `.query()`                    | Executes the query. Returns a `QueryResult` object.                                                    |
| `.query().convert()`          | Executes the query and converts the response into a Python object matching the return format (a `dict` for `JSON`). |


In [None]:
!pip install SPARQLWrapper

In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON

# Define the endpoint and your query
endpoint_url = "https://query.wikidata.org/sparql"
sparql = SPARQLWrapper(endpoint_url)
sparql.setReturnFormat(JSON)

# A simple query: up to 5 painters with a notable Renaissance work, plus their birthplaces.
# DISTINCT avoids repeating a painter once per matching notable work.
query = """
SELECT DISTINCT ?person ?personLabel ?birthplaceLabel WHERE {
  ?person wdt:P106 wd:Q1028181;   # occupation: painter
          wdt:P800 ?work;         # notable work
          wdt:P19 ?birthplace.    # place of birth
  ?work wdt:P135 wd:Q4692.        # movement: Renaissance
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 5
"""

sparql.setQuery(query)
results = sparql.query().convert()

# Parse and print the results
for result in results["results"]["bindings"]:  # with JSON, convert() returns a nested dict
    print(f"{result['personLabel']['value']} was born in {result['birthplaceLabel']['value']}")

In [None]:
results["results"]["bindings"]

---
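Each binding printed above maps a variable name to a small dictionary with a `value` key, so reading results means indexing twice. The sketch below shows one way to flatten that structure into plain rows. `flatten_bindings` is a hypothetical helper, not part of SPARQLWrapper, and `sample` is hand-written data in the standard SPARQL JSON results shape, not real Wikidata output.

```python
from typing import Any


def flatten_bindings(results: dict[str, Any]) -> list[dict[str, str]]:
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # A variable is absent from a binding when it is unbound in that row.
        rows.append({var: binding[var]["value"] for var in variables if var in binding})
    return rows


# Hand-written sample (assumption: same shape as the endpoint's JSON response).
sample = {
    "head": {"vars": ["person", "personLabel", "birthplaceLabel"]},
    "results": {
        "bindings": [
            {
                "person": {"type": "uri", "value": "http://example.org/Michelangelo"},
                "personLabel": {"type": "literal", "xml:lang": "en", "value": "Michelangelo"},
                "birthplaceLabel": {"type": "literal", "xml:lang": "en", "value": "Caprese"},
            },
            {
                # birthplaceLabel is deliberately missing to show an unbound variable.
                "person": {"type": "uri", "value": "http://example.org/Raphael"},
                "personLabel": {"type": "literal", "xml:lang": "en", "value": "Raphael"},
            },
        ]
    },
}

flat = flatten_bindings(sample)
for row in flat:
    print(row["personLabel"], "-", row.get("birthplaceLabel", "unknown"))
```

With a live response, `flatten_bindings(results)` should work the same way, and `row.get(...)` avoids a `KeyError` when a variable is unbound.

---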
## Exercise 1

Modify the query to retrieve famous **Baroque** painters instead. Baroque is `wd:Q37853`.

<details> <summary> Solution (just the query) </summary>
<pre>
query = """
SELECT DISTINCT ?person ?personLabel ?birthplaceLabel WHERE {
  ?person wdt:P106 wd:Q1028181;
          wdt:P800 ?work;
          wdt:P19 ?birthplace.
  ?work wdt:P135 wd:Q37853.  # Baroque
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 5
"""
</pre>
</details>

---
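Since the Renaissance and Baroque queries differ only in the movement ID, the query text can be generated instead of copied. This is a minimal sketch: `QUERY_TEMPLATE` and `build_painter_query` are hypothetical names introduced here, and the template reuses the triple patterns from the queries above.

```python
import re

# Doubled braces are literal braces for str.format; {movement} and {limit} are filled in.
QUERY_TEMPLATE = """
SELECT ?person ?personLabel ?birthplaceLabel WHERE {{
  ?person wdt:P106 wd:Q1028181;   # occupation: painter
          wdt:P800 ?work;         # notable work
          wdt:P19 ?birthplace.    # place of birth
  ?work wdt:P135 wd:{movement}.   # movement of the work
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}
LIMIT {limit}
"""


def build_painter_query(movement_qid: str, limit: int = 5) -> str:
    """Return the painter query for one art movement (hypothetical helper)."""
    # Accept only plain item IDs such as "Q4692" so no arbitrary SPARQL is injected.
    if not re.fullmatch(r"Q[1-9]\d*", movement_qid):
        raise ValueError(f"not a Wikidata item ID: {movement_qid!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(movement=movement_qid, limit=int(limit))


renaissance_query = build_painter_query("Q4692")
baroque_query = build_painter_query("Q37853")
print(baroque_query)
```

Either string can then be passed to `sparql.setQuery(...)` exactly like the hand-written query.

---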
## 2. Querying Local RDF with `rdflib`

You can use `rdflib`'s `.query()` method to execute SPARQL on RDF graphs you've loaded or built in memory.

In [None]:
!pip install rdflib

In [None]:
from rdflib import Graph, Namespace, Literal, RDF, URIRef, FOAF

# Create a small graph manually
ex = Namespace("http://example.org/")
g = Graph()

# Add a few triples
g.add((ex.Michelangelo, RDF.type, ex.Painter))
g.add((ex.Michelangelo, FOAF.name, Literal("Michelangelo")))
g.add((ex.Michelangelo, ex.birthPlace, Literal("Caprese")))
g.add((ex.Raphael, RDF.type, ex.Painter))
g.add((ex.Raphael, ex.birthPlace, Literal("Urbino")))
g.add((ex.Raphael, FOAF.name, Literal("Raphael")))

# Query the graph
query = """
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?place WHERE {
  ?painter a ex:Painter ;
           foaf:name ?name ;
           ex:birthPlace ?place .
}
"""

results = g.query(query)

for row in results:
    print(f"{row.name} was born in {row.place}")

---
## Exercise 2

Add Leonardo da Vinci to the graph with birthplace Vinci, and re-run the query.

<details><summary>Solution (just adding the new triples)</summary>
<pre>
g.add((ex.Leonardo, RDF.type, ex.Painter))
g.add((ex.Leonardo, FOAF.name, Literal("Leonardo")))
g.add((ex.Leonardo, ex.birthPlace, Literal("Vinci")))
</pre>
</details>

---
## 3. Querying Wikidata and Displaying the Results

Let’s return to Wikidata, filter painters by birth year, and display the results as a table using `pandas`.

In [None]:
import pandas as pd

query = """
SELECT ?artist ?artistLabel ?birthLabel WHERE {
  ?artist wdt:P106 wd:Q1028181;    # painter
          wdt:P569 ?birth.         # birth date
  FILTER(YEAR(?birth) > 1800)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10
"""

sparql.setQuery(query)
results = sparql.query().convert()

# Build a dataframe
data = []
for r in results["results"]["bindings"]:
    data.append({
        "Artist": r["artistLabel"]["value"],
        "Birth Date": r["birthLabel"]["value"],
    })

df = pd.DataFrame(data)
df

---
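The `FILTER(YEAR(?birth) > 1800)` clause does its work on the endpoint, but it can also be useful to get the year on the Python side, for example to sort or double-check the table. The sketch below assumes birth dates arrive as ISO 8601 strings such as `1853-03-30T00:00:00Z`. `birth_year` is a hypothetical helper, and the sample strings are hand-written rather than real query output.

```python
def birth_year(value: str) -> int:
    """Extract the year from an ISO 8601 date string, including negative (BCE) years."""
    sign = -1 if value.startswith("-") else 1
    # Drop the optional sign, then take everything before the first date separator.
    year_digits = value.lstrip("+-").split("-", 1)[0]
    return sign * int(year_digits)


# Hand-written example values (assumption: same format as the "Birth Date" column).
samples = ["1853-03-30T00:00:00Z", "1452-04-15T00:00:00Z", "-0427-01-01T00:00:00Z"]
years = [birth_year(s) for s in samples]
before_1600 = [year for year in years if year < 1600]
print(years)
print(before_1600)
```

If the column holds strings in this format, `df["Birth Date"].map(birth_year)` would give a numeric column that can be sorted or compared.

---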
## Exercise 3

Change the filter to show only painters born **before 1600**.

<details><summary>Solution (just the query) </summary> <pre>
newquery = """SELECT ?artist ?artistLabel ?birthLabel WHERE {
  ?artist wdt:P106 wd:Q1028181;    # painter
          wdt:P569 ?birth.         # birth date
  FILTER(YEAR(?birth) < 1600)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 10""" </pre>
</details>