# Exploring Baroque Ceiling Painting Data in the NFDI4Culture Knowledge Graph

This notebook is a starting point for a data story about baroque art and ceiling paintings using the NFDI4Culture Knowledge Graph.

Focus:
- Work with **data portals** (especially CbDD and the Color Slide Archive of Wall and Ceiling Painting)
- Use **SPARQL** to query the KG
- Prepare results for visualisation (maps, timelines, comparisons)

You can adapt the queries step by step as you learn more about the concrete RDF schema of the datasets.

In [None]:
# Install dependencies (run once per environment)
!pip install SPARQLWrapper pandas matplotlib --quiet

In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("display.max_rows", 50)
pd.set_option("display.max_columns", 20)
pd.set_option("display.width", 120)

# NFDI4Culture SPARQL endpoint
ENDPOINT_URL = "https://nfdi4culture.de/sparql"

# Prefixes used in queries
# Note: schema.org terms circulate with both http:// and https:// namespaces;
# if queries return nothing, check which form the KG actually uses.
PREFIXES = """\
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:  <https://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX n4c:     <https://nfdi4culture.de/id/>
"""

def run_sparql(query: str) -> pd.DataFrame:
    """Run a SPARQL query against the NFDI4Culture endpoint and return a pandas DataFrame.

    The query body should *not* include prefixes; they are prepended automatically.
    """
    sparql = SPARQLWrapper(ENDPOINT_URL)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(PREFIXES + "\n" + query)
    results = sparql.query().convert()

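    # Use the SELECT variables as columns so an empty result still has the expected columns
    columns = results["head"]["vars"]
    rows = [
        {var: val.get("value") for var, val in binding.items()}
        for binding in results["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)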

## 1. Inspect the CbDD portal (Corpus of Baroque Ceiling Painting in Germany)

- Portal ID from the registry: `n4c:E4264`
- Goal: See which properties connect the portal to data feeds, homepages, subjects, etc.

Run this once and scan the property list. It tells you which predicates to use in later queries.

In [None]:
query_inspect_cbdd = """\
SELECT ?p ?o
WHERE {
  n4c:E4264 ?p ?o .
}
ORDER BY ?p
LIMIT 200
"""

df_cbdd_props = run_sparql(query_inspect_cbdd)
df_cbdd_props
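
When the property list is long, counting how often each predicate occurs can make it easier to scan. The following is a minimal sketch using only the standard library. `count_predicates` is a hypothetical helper and the sample rows are made up for illustration; it assumes rows shaped like `df_cbdd_props.to_dict("records")`, that is, dicts with a `p` key.

```python
from collections import Counter

def count_predicates(rows):
    """Count how often each predicate (key "p") occurs in a list of row dicts."""
    return Counter(row["p"] for row in rows if row.get("p"))

# Made-up rows for illustration; real rows could come from df_cbdd_props.to_dict("records")
sample_rows = [
    {"p": "https://schema.org/name", "o": "Example portal"},
    {"p": "https://schema.org/hasPart", "o": "https://example.org/feed/1"},
    {"p": "https://schema.org/hasPart", "o": "https://example.org/feed/2"},
]

predicate_counts = count_predicates(sample_rows)
for predicate, n in predicate_counts.most_common():
    print(n, predicate)
```

Predicates that occur many times (such as a part-of relation) are usually the ones worth following in the next query.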

## 2. Find the data feeds / datasets that belong to CbDD

From the inspection above, identify the property that links the portal to its parts.
Typical options are:
- `schema:hasPart`
- `dcterms:hasPart`

The query below assumes one of these. If the portal uses a different property, adapt the `FILTER` or replace `?hasPart` with the exact predicate.
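
If you expect to try several candidate predicates, the query text can be generated instead of edited by hand. This is a sketch under assumptions: `build_parts_query` is a hypothetical helper, and the predicates are passed as prefixed names that the notebook's prefix block already covers.

```python
def build_parts_query(portal, predicates):
    """Build a SELECT query listing the parts of a portal via the given predicates."""
    if not predicates:
        raise ValueError("At least one predicate is required")
    predicate_list = ", ".join(predicates)
    return (
        "SELECT ?part ?partLabel ?partType\n"
        "WHERE {\n"
        f"  {portal} ?hasPart ?part .\n"
        f"  FILTER(?hasPart IN ({predicate_list}))\n"
        "  ?part schema:name ?partLabel ;\n"
        "        rdf:type ?partType .\n"
        "}\n"
        "ORDER BY ?partLabel\n"
    )

# Same portal and predicates as in the hand-written query
query = build_parts_query("n4c:E4264", ["schema:hasPart", "dcterms:hasPart"])
print(query)
```

The returned string has no prefixes, so it can be passed to `run_sparql` in the same way as the hand-written queries.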

In [None]:
query_cbdd_parts = """\
SELECT ?part ?partLabel ?partType
WHERE {
  n4c:E4264 ?hasPart ?part .
  FILTER(?hasPart IN (schema:hasPart, dcterms:hasPart))

  ?part schema:name ?partLabel ;
        rdf:type ?partType .
}
ORDER BY ?partLabel
"""

df_cbdd_parts = run_sparql(query_cbdd_parts)
df_cbdd_parts

Look at `df_cbdd_parts` and identify the URI of the **metadata feed** for CbDD (for example `n4c:E6077`).

Copy that feed URI into the variable below. This will be your **main entry point** into individual ceiling painting records.
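
Result tables show full IRIs, while queries can use either a prefixed name (`n4c:E6077`) or an IRI in angle brackets. The sketch below normalises whichever form you copied into something that can be pasted into a query. `to_sparql_term` is a hypothetical helper, and the rule "anything that is not an http(s) IRI is a prefixed name" is a simplifying assumption.

```python
def to_sparql_term(identifier):
    """Return an identifier in a form that can be pasted into a SPARQL query."""
    identifier = identifier.strip()
    if identifier.startswith("<") and identifier.endswith(">"):
        return identifier  # already a bracketed IRI
    if identifier.startswith(("http://", "https://")):
        return f"<{identifier}>"  # full IRI: wrap in angle brackets
    return identifier  # assumed to be a prefixed name such as n4c:E6077

print(to_sparql_term("n4c:E6077"))
print(to_sparql_term("https://nfdi4culture.de/id/E6077"))
```

A full IRI pasted into a query without angle brackets is a syntax error, so this check is worth doing before interpolating any copied value.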

In [None]:
# TODO: Set this to the actual CbDD feed URI you found in df_cbdd_parts
# Example placeholder: "n4c:E6077"
CBDD_FEED_URI = "n4c:E6077"  # <--- change this to the real feed ID from the previous cell

## 3. Sample ceiling-painting records from the CbDD feed

Pattern used here (adjust if inspection shows different properties):
- Records belong to a data feed: `?item schema:isPartOf CBDD_FEED_URI`
- Each record has a title / name: `schema:name`
- Optional location (`schema:spatial` → place → `schema:name`)
- Optional temporal coverage (`schema:temporalCoverage`)

You can extend this with more properties after you inspect one of the `?item` URIs.

In [None]:
query_cbdd_items_sample = f"""\
SELECT ?item ?title ?placeLabel ?time
WHERE {{
  {CBDD_FEED_URI} schema:name ?feedLabel .  # just to anchor the feed in the graph

  ?item schema:isPartOf {CBDD_FEED_URI} ;
        schema:name ?title .

  OPTIONAL {{
    ?item schema:spatial ?place .
    ?place schema:name ?placeLabel .
  }}

  OPTIONAL {{
    ?item schema:temporalCoverage ?time .
  }}
}}
LIMIT 50
"""

print(query_cbdd_items_sample)
df_cbdd_items_sample = run_sparql(query_cbdd_items_sample)
df_cbdd_items_sample.head()
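
The sample above is capped at `LIMIT 50`. To fetch more records, one common approach is to page through results with `ORDER BY`, `LIMIT` and `OFFSET`. The following sketch only builds the query strings; `paged_queries` is a hypothetical helper, the feed ID `n4c:E6077` is the placeholder used earlier, and the query body is assumed to contain no `LIMIT` of its own.

```python
def paged_queries(select_body, order_by, page_size, max_pages):
    """Yield copies of a query body with ORDER BY, LIMIT and OFFSET appended."""
    for page in range(max_pages):
        yield (
            f"{select_body.rstrip()}\n"
            f"ORDER BY {order_by}\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {page * page_size}\n"
        )

# Query body without LIMIT; n4c:E6077 is a placeholder feed ID
body = """\
SELECT ?item ?title
WHERE {
  ?item schema:isPartOf n4c:E6077 ;
        schema:name ?title .
}"""

pages = list(paged_queries(body, "?item", page_size=50, max_pages=3))
print(pages[1])
```

Each string can be passed to `run_sparql`; in practice you would stop as soon as a page returns fewer than `page_size` rows. The `ORDER BY` matters because without a stable order, pages may overlap or skip records.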

### 3.1 Inspect one sample record in detail

Pick one `?item` URI from the sample and use it to see which properties exist on a single ceiling-painting record.
This helps you discover things like:
- external identifiers (GND, ICONCLASS)
- links to images
- more detailed location or building information

Adapt the `SAMPLE_ITEM_URI` below accordingly.
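
One way to spot external identifiers in an inspection result is to look at the host name of each object URI. The sketch below assumes the usual hosts for GND (`d-nb.info`), ICONCLASS (`iconclass.org`) and Wikidata (`www.wikidata.org`). `classify_authority` and `AUTHORITY_HOSTS` are hypothetical names, and the sample rows are made up.

```python
from urllib.parse import urlparse

# Host-to-label mapping; an assumption to extend after inspecting real records
AUTHORITY_HOSTS = {
    "d-nb.info": "GND",
    "iconclass.org": "ICONCLASS",
    "www.wikidata.org": "Wikidata",
}

def classify_authority(value):
    """Return an authority label for a URI, or None if the host is not recognised."""
    host = urlparse(value).netloc.lower()
    return AUTHORITY_HOSTS.get(host)

# Made-up property/object pairs for illustration (identifiers are not real)
sample_rows = [
    {"p": "https://schema.org/about", "o": "https://iconclass.org/EXAMPLE"},
    {"p": "https://schema.org/creator", "o": "https://d-nb.info/gnd/EXAMPLE"},
    {"p": "https://schema.org/name", "o": "Example ceiling painting"},
]

for row in sample_rows:
    print(classify_authority(row["o"]), row["o"])
```

Literal values such as titles have no host and are reported as `None`, so only links to known authorities stand out.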

In [None]:
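# TODO: Replace this with one item URI copied from df_cbdd_items_sample["item"]
SAMPLE_ITEM_URI = "<PASTE_ONE_ITEM_URI_HERE>"  # for example: "https://nfdi4culture.de/id/E7xxx"

if SAMPLE_ITEM_URI != "<PASTE_ONE_ITEM_URI_HERE>":
    # Full IRIs must be wrapped in angle brackets in SPARQL; prefixed names (n4c:...) are used as-is
    item_term = f"<{SAMPLE_ITEM_URI}>" if SAMPLE_ITEM_URI.startswith("http") else SAMPLE_ITEM_URI

    query_inspect_item = f"""\
    SELECT ?p ?o
    WHERE {{
      {item_term} ?p ?o .
    }}
    ORDER BY ?p
    LIMIT 200
    """

    df_item_props = run_sparql(query_inspect_item)
    display(df_item_props)  # a bare expression inside an if-block is not rendered by Jupyter
else:
    print("Set SAMPLE_ITEM_URI to a real item URI first.")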

## 4. Compare CbDD and Color Slide Archive of Wall and Ceiling Painting

Portal IDs from the registry:
- CbDD: `n4c:E4264`
- Color Slide Archive: `n4c:E4267`

Goal: Count how many records in the KG come from each of these portals.

We assume a pattern similar to:
- `?item schema:isPartOf ?feed`
- `?feed schema:isPartOf ?portal` or `?feed dcterms:isPartOf ?portal`

You may have to adjust the property in the middle depending on what you see in the inspection of the feed nodes.

In [None]:
query_ceiling_portal_counts = """\
SELECT ?portal ?portalLabel (COUNT(DISTINCT ?item) AS ?records)
WHERE {
  VALUES ?portal { n4c:E4264  n4c:E4267 }

  # feed belongs to one of the two portals
  ?feed ?isPartOfPortal ?portal .
  FILTER(?isPartOfPortal IN (schema:isPartOf, dcterms:isPartOf))

  # items belong to that feed
  ?item schema:isPartOf ?feed .

  ?portal schema:name ?portalLabel .
}
GROUP BY ?portal ?portalLabel
ORDER BY DESC(?records)
"""

df_ceiling_portal_counts = run_sparql(query_ceiling_portal_counts)
df_ceiling_portal_counts

In [None]:
# Simple bar chart of records per portal (CbDD vs Color Slide Archive)
if not df_ceiling_portal_counts.empty:
    plt.figure(figsize=(6, 4))
    plt.bar(df_ceiling_portal_counts["portalLabel"], df_ceiling_portal_counts["records"].astype(int))
    plt.xticks(rotation=20, ha="right")
    plt.ylabel("Number of records in KG")
    plt.title("Records from baroque wall & ceiling painting portals")
    plt.tight_layout()
    plt.show()
else:
    print("No results yet. Check if the intermediate predicate (?isPartOfPortal) is correct.")

## 5. Next steps for your data story

Ideas for how you can extend this notebook:

1. **Map of painting locations**  
   - From `df_cbdd_items_sample` or a larger query, extract `placeLabel` and, if available, coordinates.  
   - Use a mapping library (e.g. `folium`) to display points on a map.

2. **Timeline of creation dates**  
   - Inspect which property holds precise dates or centuries (e.g. `schema:temporalCoverage`, other date fields).  
   - Parse years to integers, bucket by decade or century, plot as a bar chart.

3. **Motif / subject comparison between portals**  
   - If items use `schema:about` or `dcterms:subject` with concepts (ICONCLASS, GND), count their frequency per portal.  
   - Visualise top motifs for CbDD vs Color Slide Archive in a grouped bar chart.

4. **Linked Data demonstration**  
   - Use the item inspection to find external identifiers (e.g. Wikidata, GND).  
   - Show a small RDF snippet or perform a federated query as part of your story.
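
As a starting point for the timeline idea, the sketch below extracts a year from free-form date strings and counts records per decade. It assumes the values contain a four-digit year between 1000 and 1999; the actual format of the date property in the KG still needs to be checked. `first_year` and `count_by_decade` are hypothetical helpers, and the sample values are made up.

```python
import re
from collections import Counter

# Matches a standalone four-digit year from 1000 to 1999
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3})\b")

def first_year(text):
    """Return the first four-digit year (1000-1999) found in a string, or None."""
    match = YEAR_PATTERN.search(text or "")
    return int(match.group(1)) if match else None

def count_by_decade(values):
    """Count values per decade, skipping entries without a recognisable year."""
    years = (first_year(value) for value in values)
    return Counter((year // 10) * 10 for year in years if year is not None)

# Made-up values covering a few plausible formats
sample_values = ["1720", "1725/1730", "ca. 1751", "unknown", None, "1758-01-01"]
decades = count_by_decade(sample_values)
for decade in sorted(decades):
    print(decade, decades[decade])
```

For ranges such as `1725/1730`, this sketch uses the first year only; whether the start, end or midpoint is the right choice depends on how the source records dates.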

Keep your experiments in this notebook; later you can turn the most interesting figures and tables into your final data story.