# Exploring Baroque Ceiling Painting Data in the NFDI4Culture Knowledge Graph

This notebook is a starting point for a data story about baroque art and ceiling paintings using the NFDI4Culture Knowledge Graph.

Focus:
- Work with **data portals** (especially CbDD and the Color Slide Archive of Wall and Ceiling Painting)
- Use **SPARQL** to query the KG
- Prepare results for visualisation (maps, timelines, comparisons)

You can adapt the queries step by step as you learn more about the concrete RDF schema of the datasets.

In [93]:
# Install dependencies (run once per environment)
!pip install SPARQLWrapper pandas matplotlib --quiet


[notice] A new release of pip is available: 25.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [94]:
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("display.max_rows", 50)
pd.set_option("display.max_columns", 20)
pd.set_option("display.width", 120)

# NFDI4Culture SPARQL endpoint
ENDPOINT_URL = "https://nfdi4culture.de/sparql"

# Prefixes used in queries
# NOTE: The KG uses http://schema.org/ (not https://)
PREFIXES = """\
PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX schema:  <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX n4c:     <https://nfdi4culture.de/id/>
"""

def run_sparql(query: str) -> pd.DataFrame:
    """Run a SPARQL query against the NFDI4Culture endpoint and return a pandas DataFrame.

    The query body should *not* include prefixes, they are automatically prepended.
    This version accesses the JSON result safely to avoid indexing errors in static type checkers.
    """
    sparql = SPARQLWrapper(ENDPOINT_URL)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(PREFIXES + "\n" + query)
    results = sparql.query().convert()

    # Be defensive: ensure results is a dict and extract bindings safely
    if not isinstance(results, dict):
        return pd.DataFrame()

    bindings = results.get("results", {}).get("bindings", [])
    rows = []
    for binding in bindings:
        # each binding is a dict of variable -> { "type": ..., "value": ... }
        row = {var: val.get("value") for var, val in binding.items()}
        rows.append(row)
    return pd.DataFrame(rows)

## 1. Inspect the CbDD portal (Corpus of Baroque Ceiling Painting in Germany)

- Portal ID from the registry: `n4c:E4264`
- Goal: See which properties connect the portal to data feeds, homepages, subjects, etc.

Run this once and scan the property list. It tells you which predicates to use in later queries.

In [95]:
query_inspect_cbdd = """\
SELECT ?p ?o
WHERE {
  n4c:E4264 ?p ?o .
}
ORDER BY ?p
LIMIT 200
"""

df_cbdd_props = run_sparql(query_inspect_cbdd)
df_cbdd_props

Unnamed: 0,p,o
0,http://schema.org/contributor,nodeID://b695522
1,http://schema.org/contributor,nodeID://b697392
2,http://schema.org/contributor,nodeID://b698091
3,http://schema.org/contributor,nodeID://b698856
4,http://schema.org/description,\n The Corpus of Baroque Ceiling Painting i...
5,http://schema.org/hasPart,https://nfdi4culture.de/id/E6077
6,http://schema.org/image,https://nfdi4culture.de//fileadmin/user_upload...
7,http://schema.org/keywords,https://nfdi4culture.de/id/E3953
8,http://schema.org/keywords,https://nfdi4culture.de/id/E3959
9,http://schema.org/keywords,https://nfdi4culture.de/id/E3968


## 2. Find the data feeds / datasets that belong to CbDD

From the inspection above, identify the property that links the portal to its parts.
Typical options are:
- `schema:hasPart`
- `dcterms:hasPart`

The query below assumes one of these. If the portal uses a different property, adapt the `FILTER` or replace `?hasPart` by the exact predicate.

In [96]:
# Try multiple approaches to find parts/feeds of the CbDD portal

# Approach 1: Portal has parts (portal -> part)
query_cbdd_parts_v1 = """
SELECT ?part ?partLabel ?partType ?predicate
WHERE {
  n4c:E4264 ?predicate ?part .
  FILTER(?predicate IN (schema:hasPart, dcterms:hasPart, dcat:dataset, dcat:distribution))

  OPTIONAL { ?part schema:name ?partLabel . }
  OPTIONAL { ?part rdf:type ?partType . }
}
ORDER BY ?partLabel
LIMIT 50
"""

df_cbdd_parts_v1 = run_sparql(query_cbdd_parts_v1)
print("Approach 1 - Portal hasPart/dataset:")
print(df_cbdd_parts_v1)
print("\n" + "="*60 + "\n")

# Approach 2: Parts point to portal (part -> portal via isPartOf)
query_cbdd_parts_v2 = """
SELECT ?part ?partLabel ?partType ?predicate
WHERE {
  ?part ?predicate n4c:E4264 .
  FILTER(?predicate IN (schema:isPartOf, dcterms:isPartOf, dcat:inCatalog))

  OPTIONAL { ?part schema:name ?partLabel . }
  OPTIONAL { ?part rdf:type ?partType . }
}
ORDER BY ?partLabel
LIMIT 50
"""

df_cbdd_parts_v2 = run_sparql(query_cbdd_parts_v2)
print("Approach 2 - Part isPartOf portal:")
print(df_cbdd_parts_v2)
print("\n" + "="*60 + "\n")

# Approach 3: Check all outgoing predicates from the portal to find the right one
query_cbdd_all_out = """
SELECT ?predicate (COUNT(?object) AS ?count) (SAMPLE(?object) AS ?sampleObject)
WHERE {
  n4c:E4264 ?predicate ?object .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
LIMIT 30
"""

df_cbdd_predicates = run_sparql(query_cbdd_all_out)
print("All outgoing predicates from CbDD portal:")
print(df_cbdd_predicates)

# Use whichever approach returned results
df_cbdd_parts = df_cbdd_parts_v1 if not df_cbdd_parts_v1.empty else df_cbdd_parts_v2

Approach 1 - Portal hasPart/dataset:
                               part                                          partLabel  \
0  https://nfdi4culture.de/id/E6077  Metadata from the Corpus of Baroque Ceiling Pa...   
1  https://nfdi4culture.de/id/E6077  Metadata from the Corpus of Baroque Ceiling Pa...   
2  https://nfdi4culture.de/id/E6077  Metadata from the Corpus of Baroque Ceiling Pa...   
3  https://nfdi4culture.de/id/E6077  Metadata from the Corpus of Baroque Ceiling Pa...   

                                         partType                  predicate  
0              http://purl.org/spar/fabio/Dataset  http://schema.org/hasPart  
1  https://nfdi.fiz-karlsruhe.de/ontology/Dataset  http://schema.org/hasPart  
2                      http://schema.org/DataFeed  http://schema.org/hasPart  
3      http://www.w3.org/ns/hydra/core#Collection  http://schema.org/hasPart  


Approach 2 - Part isPartOf portal:
Empty DataFrame
Columns: []
Index: []


All outgoing predicates from CbDD portal

In [97]:
# Approach 4: Check all incoming predicates to the portal (things that reference n4c:E4264)
query_cbdd_all_in = """
SELECT ?predicate (COUNT(?subject) AS ?count) (SAMPLE(?subject) AS ?sampleSubject)
WHERE {
  ?subject ?predicate n4c:E4264 .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
LIMIT 30
"""

df_cbdd_incoming = run_sparql(query_cbdd_all_in)
print("All incoming predicates TO CbDD portal (things pointing to it):")
print(df_cbdd_incoming)

# If we found incoming predicates, let's explore the subjects
if not df_cbdd_incoming.empty and 'sampleSubject' in df_cbdd_incoming.columns:
    sample_subj = df_cbdd_incoming.iloc[0]['sampleSubject']
    print(f"\nSample subject pointing to portal: {sample_subj}")
    
    # Inspect that sample subject
    query_sample_subj = f"""
    SELECT ?p ?o
    WHERE {{
      <{sample_subj}> ?p ?o .
    }}
    LIMIT 50
    """
    df_sample_subj = run_sparql(query_sample_subj)
    print("\nProperties of that sample subject:")
    print(df_sample_subj)

All incoming predicates TO CbDD portal (things pointing to it):
                                 predicate count                     sampleSubject
0              http://schema.org/subjectOf    13  https://nfdi4culture.de/id/E2312
1  http://schema.org/includedInDataCatalog     1  https://nfdi4culture.de/id/E6077

Sample subject pointing to portal: https://nfdi4culture.de/id/E2312

Properties of that sample subject:
                                                  p                                                  o
0   http://www.w3.org/1999/02/22-rdf-syntax-ns#type                      http://schema.org/DefinedTerm
1   http://www.w3.org/1999/02/22-rdf-syntax-ns#type  https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...
2        http://www.w3.org/2000/01/rdf-schema#label                                       Architecture
3      http://www.w3.org/2000/01/rdf-schema#seeAlso        https://nfdi4culture.de/resource/E2312.json
4      http://www.w3.org/2000/01/rdf-schema#seeAlso          https:

In [98]:
CBDD_FEED_URI = "n4c:E6077"  

In [99]:
# First, let's inspect the E6077 feed to understand its structure
query_inspect_feed = """
SELECT ?p ?o
WHERE {
  n4c:E6077 ?p ?o .
}
ORDER BY ?p
LIMIT 100
"""

df_feed_props = run_sparql(query_inspect_feed)
print("Properties of the CbDD feed (E6077):")
print(df_feed_props)
print("\n" + "="*60 + "\n")

# Also check what points TO the feed (incoming relations)
query_feed_incoming = """
SELECT ?predicate (COUNT(?s) AS ?count) (SAMPLE(?s) AS ?sampleSubject)
WHERE {
  ?s ?predicate n4c:E6077 .
}
GROUP BY ?predicate
ORDER BY DESC(?count)
LIMIT 20
"""

df_feed_incoming = run_sparql(query_feed_incoming)
print("Incoming predicates to the feed (what points to E6077):")
print(df_feed_incoming)

Properties of the CbDD feed (E6077):
                                    p                                                  o
0       http://schema.org/contributor                                   nodeID://b695439
1       http://schema.org/contributor                                   nodeID://b697107
2       http://schema.org/contributor                                   nodeID://b697383
3       http://schema.org/contributor                                   nodeID://b698304
4   http://schema.org/dataFeedElement  https://nfdi4culture.de/id/ark:/60538/E6077_00...
..                                ...                                                ...
95  http://schema.org/dataFeedElement  https://nfdi4culture.de/id/ark:/60538/E6077_0a...
96  http://schema.org/dataFeedElement  https://nfdi4culture.de/id/ark:/60538/E6077_0a...
97  http://schema.org/dataFeedElement  https://nfdi4culture.de/id/ark:/60538/E6077_0b...
98  http://schema.org/dataFeedElement  https://nfdi4culture.de/id/ark:/60

## 3. Sample ceiling-painting records from the CbDD feed

Pattern used here (adjust if inspection shows different properties):
- Records belong to a data feed: `?item schema:isPartOf CBDD_FEED_URI`
- Each record has a title / name: `schema:name`
- Optional location (`schema:spatial` → place → `schema:name`)
- Optional temporal coverage (`schema:temporalCoverage`)

You can extend this with more properties after you inspect one of the `?item` URIs.

In [100]:
# The feed uses schema:dataFeedElement -> DataFeedItem -> schema:item -> actual painting
# Let's first understand the structure of the DataFeedItems

query_item_predicates = """
SELECT ?p (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {
  n4c:E6077 schema:dataFeedElement ?item .
  ?item ?p ?o .
}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 30
"""

df_item_preds = run_sparql(query_item_predicates)
print("Predicates on DataFeedItem objects:")
print(df_item_preds)
print("\n" + "="*60 + "\n")

# Now let's explore the actual paintings (via schema:item)
query_painting_predicates = f"""
SELECT ?p (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting ?p ?o .
}}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 50
"""

df_painting_preds = run_sparql(query_painting_predicates)
print("All predicates used by actual paintings (via schema:item):")
df_painting_preds

Predicates on DataFeedItem objects:
                                                 p count                                        sampleValue
0                   http://schema.org/dateModified  6228                                         2025-09-08
1                           http://schema.org/item  6228  https://www.deckenmalerei.eu/f128e020-2dc2-4cf...
2  http://www.w3.org/1999/02/22-rdf-syntax-ns#type  6228                     http://schema.org/DataFeedItem
3                    http://schema.org/dateCreated  6228                                         2024-11-16


All predicates used by actual paintings (via schema:item):
All predicates used by actual paintings (via schema:item):


Unnamed: 0,p,count,sampleValue
0,https://nfdi4culture.de/ontology/CTO_0001026,23359,http://vocab.getty.edu/aat/300411453
1,https://nfdi4culture.de/ontology/CTO_0001009,6672,nodeID://b2644062
2,https://nfdi4culture.de/ontology/CTO_0001025,6230,nodeID://b2652613
3,http://www.w3.org/2000/01/rdf-schema#label,6228,Fassadenmalerei: allegorische Darstellung
4,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://nfdi4culture.de/id/E6404
5,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://nfdi4culture.de/id/E2430
6,https://nfdi4culture.de/ontology/CTO_0001049,6228,https://nfdi4culture.de/ontology/CTO_0001047
7,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,6228,https://nfdi4culture.de/ontology/CTO_0001005
8,https://nfdi4culture.de/ontology/CTO_0001006,6228,https://nfdi4culture.de/id/E6077
9,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://www.deckenmalerei.eu/11db6ad0-c4c3-11e...


In [101]:
# Let's get sample actual painting records with their key properties
# Based on the predicates discovered above:
# - rdfs:label = title/name
# - CTO_0001073 = year (5527 records have this)
# - schema:latitude/longitude = coordinates (1244 have geo)
# - CTO_0001026 = ICONCLASS subjects (23359 - multiple per painting)
# - schema:associatedMedia = images (4596)

query_sample_paintings = f"""
SELECT ?painting ?label ?year ?lat ?lon (SAMPLE(?iconclass) AS ?subject) (SAMPLE(?image) AS ?imageNode)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  
  # Title/label
  ?painting rdfs:label ?label .
  
  # Optional: Year
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year . }}
  
  # Optional: Coordinates
  OPTIONAL {{
    ?painting schema:latitude ?lat .
    ?painting schema:longitude ?lon .
  }}
  
  # Optional: ICONCLASS subject
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass . }}
  
  # Optional: Associated media
  OPTIONAL {{ ?painting schema:associatedMedia ?image . }}
}}
GROUP BY ?painting ?label ?year ?lat ?lon
LIMIT 20
"""

df_sample_paintings = run_sparql(query_sample_paintings)
print(f"Sample of {len(df_sample_paintings)} ceiling paintings:")
df_sample_paintings

Sample of 20 ceiling paintings:


Unnamed: 0,painting,label,year,subject,imageNode,lat,lon
0,https://www.deckenmalerei.eu/ecfe811e-746b-4dc...,Rudolf II.,1727-31,http://vocab.getty.edu/aat/300411453,nodeID://b2641558,,
1,https://www.deckenmalerei.eu/5343ea25-c82e-4ec...,"Eichenzell, Pavillon im Schlosspark",zwischen 1739 und 1754,http://vocab.getty.edu/aat/300004792,nodeID://b2652847,50.50727228547439,9.71327615765255
2,https://www.deckenmalerei.eu/5d28b720-c8c6-11e...,Pax und Flussgott Neckar,1727,https://iconclass.org/25H213,nodeID://b2643447,,
3,https://www.deckenmalerei.eu/188eef43-6dc9-491...,"Rottenbuch, ehem. Klosterkirche des Augustiner...","1455, 1735-1745",http://vocab.getty.edu/aat/300004792,nodeID://b2640755,47.73581280000001,10.9659338
4,https://www.deckenmalerei.eu/80b8b181-6c05-458...,Adam,1587,https://iconclass.org/11I62%28ADAM%29,nodeID://b2648169,,
5,https://www.deckenmalerei.eu/3b07b010-f7e7-4c9...,"Dresden, Palais Hoym-Riesch","1739–42, nach 1760, nach 1766, 1777",http://vocab.getty.edu/aat/300004792,nodeID://b2642009,51.05065839629036,13.742313956795062
6,https://www.deckenmalerei.eu/b972d7b7-d022-4ce...,"Nordstrand, Hof Jacobsen, Neuer Koog",1705,http://vocab.getty.edu/aat/300004792,nodeID://b2648351,54.473943,8.87891
7,https://www.deckenmalerei.eu/056eccdc-183a-474...,"Bayrischzell, Pfarrkirche St. Margareta","1476, 1733",http://vocab.getty.edu/aat/300004792,nodeID://b2639378,47.67294822472043,12.014917231650085
8,https://www.deckenmalerei.eu/dc156e0a-aba2-4aa...,Puttengruppe mit Blütenkränzen,,http://vocab.getty.edu/aat/300411453,,,
9,https://www.deckenmalerei.eu/f8a03231-5c2c-4a5...,„A.MALO.MALOUM.EX.PALO.SALVS“,1682,http://vocab.getty.edu/aat/300411453,nodeID://b2653501,,


### Data Summary

Based on the inspection above, here's what the CbDD dataset contains:

**Total: 6,228 ceiling paintings**

| Property | Count | Description | Example |
|----------|-------|-------------|----------|
| `rdfs:label` | 6,228 | Title/name of the painting | "Minerva, Apoll und die Musen" |
| `CTO_0001073` | 5,527 | Year/date of creation | "1720", "um 1730", "1720-1730" |
| `CTO_0001026` | 23,359 | ICONCLASS/AAT subject codes (avg. 3.7 per painting) | `iconclass.org/92D1916` |
| `schema:associatedMedia` | 4,570 | Links to images (as ImageObject) | — |
| `schema:latitude/longitude` | 1,244 | Geographic coordinates | 48.57, 13.46 |
| `CTO_0001009` | 6,672 | Related buildings/locations (links to GND) | `gnd/118636960` |
| `CTO_0001019` | 5,363 | Part-of relationships (painting hierarchies) | — |
| `NFDI_000...` | 439 | GND identifiers for the painting itself | `gnd/7678538-5` |

**Subject Classification:**
- Uses both **ICONCLASS** (iconographic classification) and **Getty AAT** (Art & Architecture Thesaurus)
- 4,831 unique subject codes across all paintings
- Most common: architectural elements (AAT), mythological scenes (ICONCLASS)

**Image Data:**
- Images are `schema:ImageObject` with:
  - `CTO_0001021`: Image URL (hosted at `deckenmalerei-bilder.badw.de`)
  - `CTO_0001007`: License (mostly CC BY 4.0)

**Linked Data Connections:**
- Paintings → GND (German National Library authority files)
- Paintings → ICONCLASS (iconographic subjects)
- Paintings → Getty AAT (art vocabulary)
- Paintings → Buildings via location relationships

In [102]:
# Get overall statistics about the CbDD dataset

# Count total paintings
query_total_count = f"""
SELECT (COUNT(DISTINCT ?painting) AS ?totalPaintings)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
}}
"""
df_total = run_sparql(query_total_count)
print(f"Total paintings in CbDD: {df_total['totalPaintings'].iloc[0]}")

# Count paintings with coordinates
query_geo_count = f"""
SELECT (COUNT(DISTINCT ?painting) AS ?withGeo)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting schema:latitude ?lat .
  ?painting schema:longitude ?lon .
}}
"""
df_geo = run_sparql(query_geo_count)
print(f"Paintings with coordinates: {df_geo['withGeo'].iloc[0]}")

# Count paintings with images
query_image_count = f"""
SELECT (COUNT(DISTINCT ?painting) AS ?withImages)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting schema:associatedMedia ?img .
}}
"""
df_img = run_sparql(query_image_count)
print(f"Paintings with images: {df_img['withImages'].iloc[0]}")

# Count unique ICONCLASS subjects
query_iconclass_count = f"""
SELECT (COUNT(DISTINCT ?iconclass) AS ?uniqueSubjects)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass .
}}
"""
df_iconclass = run_sparql(query_iconclass_count)
print(f"Unique ICONCLASS subjects: {df_iconclass['uniqueSubjects'].iloc[0]}")

Total paintings in CbDD: 6228
Paintings with coordinates: 1244
Paintings with images: 4570Paintings with images: 4570
Unique ICONCLASS subjects: 4831

Unique ICONCLASS subjects: 4831


In [103]:
# Explore the most common ICONCLASS subjects
query_top_subjects = f"""
SELECT ?iconclass (COUNT(?painting) AS ?count)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass .
}}
GROUP BY ?iconclass
ORDER BY DESC(?count)
LIMIT 20
"""

df_top_subjects = run_sparql(query_top_subjects)
print("Top 20 ICONCLASS subjects (motifs/themes):")
df_top_subjects

Top 20 ICONCLASS subjects (motifs/themes):


Unnamed: 0,iconclass,count
0,http://vocab.getty.edu/aat/300411453,4984
1,http://vocab.getty.edu/aat/300004792,1244
2,https://iconclass.org/92D1916,463
3,https://iconclass.org/26A,219
4,https://iconclass.org/25G4111,104
5,https://iconclass.org/25G3,99
6,https://iconclass.org/25G41,94
7,https://iconclass.org/48A9872,74
8,https://iconclass.org/45C22,71
9,https://iconclass.org/48C161,70


In [104]:
# Explore the image data structure (schema:associatedMedia)
# The images are blank nodes, let's see what properties they have

query_image_props = f"""
SELECT ?p (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting schema:associatedMedia ?image .
  ?image ?p ?o .
}}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 20
"""

df_image_props = run_sparql(query_image_props)
print("Properties of image objects (schema:associatedMedia):")
df_image_props

Properties of image objects (schema:associatedMedia):


Unnamed: 0,p,count,sampleValue
0,https://nfdi4culture.de/ontology/CTO_0001021,4596,https://deckenmalerei-bilder.badw.de/eas/parti...
1,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,4596,http://schema.org/ImageObject
2,https://nfdi4culture.de/ontology/CTO_0001007,4511,CC BY 4.0


In [105]:
# Get sample paintings with their image URLs
# Images use CTO_0001021 for the URL (not schema:contentUrl)
query_paintings_with_images = f"""
SELECT ?painting ?label ?imageUrl ?license
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting rdfs:label ?label .
  ?painting schema:associatedMedia ?image .
  ?image <https://nfdi4culture.de/ontology/CTO_0001021> ?imageUrl .
  OPTIONAL {{ ?image <https://nfdi4culture.de/ontology/CTO_0001007> ?license . }}
}}
LIMIT 15
"""

df_paintings_images = run_sparql(query_paintings_with_images)
print(f"Sample paintings with image URLs ({len(df_paintings_images)} records):")
df_paintings_images

Sample paintings with image URLs (15 records):


Unnamed: 0,painting,label,imageUrl,license
0,https://www.deckenmalerei.eu/11db6ad0-c4c3-11e...,Fassadenmalerei: allegorische Darstellung,https://deckenmalerei-bilder.badw.de/eas/parti...,© Bildarchiv Foto Marburg / CbDD / Angelika Dr...
1,https://www.deckenmalerei.eu/1e4fad40-ce49-4d0...,Die Decke des Marmosaals,https://deckenmalerei-bilder.badw.de/eas/parti...,Rechte vorbehalten
2,https://www.deckenmalerei.eu/24712313-0bed-4dd...,Landschaft mit Hasenjagd,https://deckenmalerei-bilder.badw.de/eas/parti...,© CbDD / Bayrische Schlösserverwaltung / Jan-E...
3,https://www.deckenmalerei.eu/271d168c-74ed-4ee...,Die vier Nebenbilder an der Decke des Roten Saals,https://previous.bildindex.de/bilder/fmd100297...,"Rechte vorbehalten | Originator: Scheidt, Thom..."
4,https://www.deckenmalerei.eu/284af00a-294b-4cc...,"Passau, Große Messergasse 6",https://previous.bildindex.de/bilder/fmd100448...,"CC BY-NC-ND 4.0 | Originator: Dietel, Theresa ..."
5,https://www.deckenmalerei.eu/2fff5a82-d436-48a...,Eckbilder: antikisierende Henkelvasen in Rahmu...,https://previous.bildindex.de/bilder/zi0300_00...,Rechte vorbehalten | Rights holder: Deutsches ...
6,https://www.deckenmalerei.eu/3d6fbf7c-df73-411...,"Römhild, Schloss Glücksburg",https://previous.bildindex.de/bilder/fmc445016...,"Rechte vorbehalten | Originator: Hildebrand, G..."
7,https://www.deckenmalerei.eu/4aadb214-c123-49b...,Der Kamin im Raum westlich des Saals,https://previous.bildindex.de/bilder/fmd100254...,"CC BY-NC-ND 4.0 | Originator: Lechtape, Andrea..."
8,https://www.deckenmalerei.eu/4f31c621-2435-4bf...,Decke aus Süddithmarschen aus der Nähe von Dingen,https://previous.bildindex.de/bilder/fmd100307...,"Rechte vorbehalten | Originator: Lechtape, And..."
9,https://www.deckenmalerei.eu/5014ae03-9edf-42f...,Akanthusornament,https://deckenmalerei-bilder.badw.de/eas/parti...,"Rechte vorbehalten | Rights holder: Dreyer, An..."


In [106]:
# Explore temporal distribution - get year values
query_years = f"""
SELECT ?year (COUNT(?painting) AS ?count)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year .
}}
GROUP BY ?year
ORDER BY DESC(?count)
LIMIT 30
"""

df_years = run_sparql(query_years)
print("Most common date values (note: various formats like '1720', 'um 1730', '1720-1730'):")
df_years

Most common date values (note: various formats like '1720', 'um 1730', '1720-1730'):


Unnamed: 0,year,count
0,1542,169
1,1543,105
2,um 1542,86
3,1751,65
4,1656/57,65
5,um 1678,59
6,1703–1705,56
7,um 1732–1742,56
8,um 1750,55
9,1682,54


In [107]:
# Explore location/building relationships (CTO_0001009)
# This appears to link paintings to buildings

query_building_props = f"""
SELECT ?p (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting <https://nfdi4culture.de/ontology/CTO_0001009> ?building .
  ?building ?p ?o .
}}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 20
"""

df_building_props = run_sparql(query_building_props)
print("Properties of building/location objects (CTO_0001009):")
df_building_props

Properties of building/location objects (CTO_0001009):


Unnamed: 0,p,count,sampleValue
0,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6672,https://d-nb.info/gnd/118636960


## 3.2 Detailed Analysis of Individual Paintings

Let's fetch a list of paintings with complete metadata and display their images.

In [108]:
# Fetch detailed painting records with all key properties
query_detailed_paintings = f"""
SELECT DISTINCT ?painting ?label ?year ?lat ?lon ?imageUrl ?license
       (GROUP_CONCAT(DISTINCT ?iconclass; separator=", ") AS ?subjects)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  
  # Required: Title and image
  ?painting rdfs:label ?label .
  ?painting schema:associatedMedia ?image .
  ?image <https://nfdi4culture.de/ontology/CTO_0001021> ?imageUrl .
  
  # Optional properties
  OPTIONAL {{ ?image <https://nfdi4culture.de/ontology/CTO_0001007> ?license . }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year . }}
  OPTIONAL {{
    ?painting schema:latitude ?lat .
    ?painting schema:longitude ?lon .
  }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass . }}
}}
GROUP BY ?painting ?label ?year ?lat ?lon ?imageUrl ?license
LIMIT 10
"""

df_detailed = run_sparql(query_detailed_paintings)
print(f"Fetched {len(df_detailed)} paintings with images:")
df_detailed

Fetched 10 paintings with images:


Unnamed: 0,painting,label,year,imageUrl,license,subjects,lat,lon
0,https://www.deckenmalerei.eu/fdb4a025-5a0c-43e...,Divina Sapienza,1729,https://previous.bildindex.de/bilder/fmd100243...,"CC BY-NC-ND 4.0 | Originator: Bunz, Achim | Ri...","http://vocab.getty.edu/aat/300411453, https://...",,
1,https://www.deckenmalerei.eu/8eb7d66e-f2da-47b...,Landschaft mit einer Frau in morgenländischer ...,"1750-1751, 1755-1756",https://previous.bildindex.de/bilder/fmc429630...,"Rechte vorbehalten | Originator: Fenchel, Hors...","http://vocab.getty.edu/aat/300411453, https://...",,
2,https://www.deckenmalerei.eu/4e286dd0-c4c3-11e...,"Winsen, Schloss","1299, Umbau 1592/1617",https://previous.bildindex.de/bilder/fmd495456...,"CC BY-NC-ND 4.0 | Originator: Lechtape, Andrea...",http://vocab.getty.edu/aat/300004792,53.363133773308064,10.204448085904405
3,https://www.deckenmalerei.eu/28abc1d7-5753-45e...,Krösus auf dem Scheiterhaufen,1542,https://deckenmalerei-bilder.badw.de/eas/parti...,© CbDD / Bayrische Schlösserverwaltung / Jan-E...,"http://vocab.getty.edu/aat/300411453, https://...",,
4,https://www.deckenmalerei.eu/5537c66f-6824-4d9...,Drei Putten mit Hopfen und Weizen,1726,https://previous.bildindex.de/bilder/fmd100263...,"CC BY-NC-ND 4.0 | Originator: Bunz, Achim | Ri...","http://vocab.getty.edu/aat/300411453, https://...",,
5,https://www.deckenmalerei.eu/786ca013-aced-45d...,Hl. Petrus,,https://previous.bildindex.de/bilder/fmd100013...,"CC BY-NC-ND 4.0 | Originator: Gaasch, Uwe | Ri...",http://vocab.getty.edu/aat/300411453,,
6,https://www.deckenmalerei.eu/81ca1600-d70b-4a3...,Das Deckengemälde des Audienz- und Speisezimmers,1747,https://previous.bildindex.de/bilder/fmd100049...,"Rechte vorbehalten | Originator: Lechtape, And...","http://vocab.getty.edu/aat/300411453, https://...",,
7,https://www.deckenmalerei.eu/16b45b16-f26c-486...,Die Ehrenstücke des Hochmeisterwappens in sepi...,1758-1759,https://previous.bildindex.de/bilder/fmd100264...,"CC BY-NC-ND 4.0 | Originator: Bunz, Achim | Ri...","http://vocab.getty.edu/aat/300411453, https://...",,
8,https://www.deckenmalerei.eu/44e82c93-5127-44a...,Pomona,1760,https://previous.bildindex.de/bilder/fmlac8851...,"CC BY-NC-ND 4.0 | Originator: Lechtape, Andrea...","http://vocab.getty.edu/aat/300411453, https://...",,
9,https://www.deckenmalerei.eu/0457756e-5b10-468...,Drache 2 (Körper),1543,https://deckenmalerei-bilder.badw.de/eas/parti...,© CbDD / Bayrische Schlösserverwaltung / Jan-E...,http://vocab.getty.edu/aat/300411453,,


## 3.2 GND Resolution for People and Places

The CbDD dataset links paintings to people (painters, commissioners) and places via GND (Gemeinsame Normdatei) identifiers. We can resolve these to human-readable names using the lobid.org API.

In [109]:
# GND Resolution using lobid.org API
import requests
from functools import lru_cache

@lru_cache(maxsize=1000)
def resolve_gnd_uri(gnd_uri: str) -> dict:
    """
    Resolve a GND URI to its preferred name using lobid.org API.
    
    Args:
        gnd_uri: A GND URI like 'https://d-nb.info/gnd/118636960'
        
    Returns:
        dict with 'name', 'type', 'uri', 'resolved' keys
    """
    result = {'uri': gnd_uri, 'name': None, 'type': None, 'resolved': False}
    
    if not gnd_uri or not isinstance(gnd_uri, str):
        return result
    
    try:
        # Extract GND ID from various URI formats
        gnd_id = gnd_uri.split('/')[-1].strip()
        
        # GND IDs can start with digits or X, and may contain hyphens
        if not gnd_id or len(gnd_id) < 3:
            return result
        
        # Query lobid.org API
        response = requests.get(
            f'https://lobid.org/gnd/{gnd_id}.json',
            headers={'Accept': 'application/json'},
            timeout=10
        )
        
        if response.ok:
            data = response.json()
            result['name'] = data.get('preferredName')
            type_val = data.get('type', [])
            if isinstance(type_val, list) and type_val:
                result['type'] = type_val[0]
            elif isinstance(type_val, str):
                result['type'] = type_val
            else:
                result['type'] = 'Unknown'
            result['resolved'] = result['name'] is not None
            
    except Exception as e:
        pass
    
    return result


def resolve_gnd_batch(gnd_uris: list) -> dict:
    """
    Resolve multiple GND URIs to names.
    
    Args:
        gnd_uris: List of GND URIs
        
    Returns:
        dict mapping URI -> resolved name (or '[GND ID]' if not resolved)
    """
    results = {}
    for uri in gnd_uris:
        if uri:
            resolved = resolve_gnd_uri(uri)
            results[uri] = resolved['name'] if resolved['resolved'] else f"[{uri.split('/')[-1]}]"
    return results


# Test GND resolution
print("Testing GND resolution via lobid.org...")
print("="*60)

test_gnds = [
    "https://d-nb.info/gnd/118636960",  # Johann Baptist Zimmermann (painter)
    "https://d-nb.info/gnd/118579371",  # Max Emanuel (commissioner)
]

for gnd_uri in test_gnds:
    result = resolve_gnd_uri(gnd_uri)
    status = "✓" if result['resolved'] else "✗"
    print(f"{status} {result['name'] or 'Not found'}")
    print(f"   Type: {result['type']}")
    print(f"   URI: {gnd_uri}")
    print()

print("\n✅ GND resolution functions defined:")
print("   - resolve_gnd_uri(gnd_uri)")
print("   - resolve_gnd_batch(gnd_uris)")

Testing GND resolution via lobid.org...
✓ Zimmermann, Johann Baptist
   Type: Person
   URI: https://d-nb.info/gnd/118636960

✓ Maximilian I., Heiliges Römisches Reich, Kaiser
   Type: AuthorityResource
   URI: https://d-nb.info/gnd/118579371


✅ GND resolution functions defined:
   - resolve_gnd_uri(gnd_uri)
   - resolve_gnd_batch(gnd_uris)


In [None]:
# Enhanced painting query with CORRECTED CTO field interpretation:
#
# ⚠️ IMPORTANT SCHEMA CORRECTION:
# After detailed analysis, the CTO fields have different meanings than initially assumed:
#
# - CTO_0001011 = LOCATION (building/place) NOT painter! GND resolves to PlaceOrGeographicName
# - CTO_0001009 = PERSONS (painters AND commissioners mixed) - need GND profession to distinguish
# - CTO_0001010 = Auftraggeber (commissioner) - rarely used, most commissioners in CTO_0001009
# - CTO_0001019 = ist Teil von (is part of) -> parent painting/ensemble
#
# GND types found:
# - CTO_0001011 GNDs -> "BuildingOrMemorial", "PlaceOrGeographicName" 
# - CTO_0001009 GNDs -> "Person", "DifferentiatedPerson" with professions like "Maler", "Beamter"

query_enhanced_paintings = f"""
SELECT DISTINCT ?painting ?label ?year ?lat ?lon ?imageUrl ?license
       (GROUP_CONCAT(DISTINCT ?iconclass; separator="|") AS ?subjects)
       (GROUP_CONCAT(DISTINCT ?locationGND; separator="|") AS ?locationGNDs)
       (GROUP_CONCAT(DISTINCT ?personGND; separator="|") AS ?personGNDs)
       ?parentUri ?parentLabel
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  
  # Required: Title and image
  ?painting rdfs:label ?label .
  ?painting schema:associatedMedia ?image .
  ?image <https://nfdi4culture.de/ontology/CTO_0001021> ?imageUrl .
  
  # Optional properties
  OPTIONAL {{ ?image <https://nfdi4culture.de/ontology/CTO_0001007> ?license . }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year . }}
  OPTIONAL {{
    ?painting schema:latitude ?lat .
    ?painting schema:longitude ?lon .
  }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass . }}
  
  # CTO_0001011 = LOCATION (building/place) - NOT painter!
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001011> ?locationNode .
    ?locationNode <https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001006> ?locationGND .
  }}
  
  # CTO_0001009 = PERSONS (painters, commissioners, related people)
  # These need to be filtered by GND profession after resolution
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001009> ?personNode .
    ?personNode <https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001006> ?personGND .
  }}
  
  # ist Teil von (is part of) - CTO_0001019 -> parent entity
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001019> ?parentUri .
    FILTER(?parentUri != ?painting)  # Exclude self-references
    ?parentUri rdfs:label ?parentLabel .
  }}
}}
GROUP BY ?painting ?label ?year ?lat ?lon ?imageUrl ?license ?parentUri ?parentLabel
LIMIT 15
"""

df_enhanced = run_sparql(query_enhanced_paintings)

# Ensure optional columns exist (they may be missing if no data matches)
for col in ['parentLabel', 'parentUri', 'locationGNDs', 'personGNDs']:
    if col not in df_enhanced.columns:
        df_enhanced[col] = None

print(f"Fetched {len(df_enhanced)} paintings with enhanced metadata:")
print(f"  - With location data: {len(df_enhanced[df_enhanced['locationGNDs'].notna() & (df_enhanced['locationGNDs'] != '')])}")
print(f"  - With person data: {len(df_enhanced[df_enhanced['personGNDs'].notna() & (df_enhanced['personGNDs'] != '')])}")
print(f"  - With parent entity: {len(df_enhanced[df_enhanced['parentLabel'].notna()])}")

# Show available columns
display_cols = [c for c in ['label', 'year', 'locationGNDs', 'personGNDs', 'parentLabel'] if c in df_enhanced.columns]
df_enhanced[display_cols].head(10)

Fetched 15 paintings with enhanced metadata:
  - With painter data: 6
  - With commissioner data: 0
  - With parent entity: 0


Unnamed: 0,label,year,painterGNDs,commissionerGNDs,parentLabel
0,Divina Sapienza,1729,,,
1,Landschaft mit einer Frau in morgenländischer ...,"1750-1751, 1755-1756",https://d-nb.info/gnd/4105067-8,,
2,"Winsen, Schloss","1299, Umbau 1592/1617",,,
3,Krösus auf dem Scheiterhaufen,1542,https://d-nb.info/gnd/4034368-6,,
4,Drei Putten mit Hopfen und Weizen,1726,,,
5,Hl. Petrus,,,,
6,Das Deckengemälde des Audienz- und Speisezimmers,1747,https://d-nb.info/gnd/4132604-0,,
7,Die Ehrenstücke des Hochmeisterwappens in sepi...,1758-1759,,,
8,Pomona,1760,https://d-nb.info/gnd/4766902-0,,
9,Drache 2 (Körper),1543,https://d-nb.info/gnd/4034368-6,,


In [None]:
# Resolve GND URIs and classify persons by profession (painter vs commissioner)
print("Resolving GND URIs and classifying persons by profession...")
print("="*70)

# Enhanced GND resolution with profession info
@lru_cache(maxsize=1000)
def resolve_gnd_with_profession(gnd_uri: str) -> dict:
    """
    Resolve a GND URI to name AND profession using lobid.org API.
    Returns dict with 'name', 'type', 'professions', 'is_painter', 'resolved'
    """
    result = {'uri': gnd_uri, 'name': None, 'type': None, 'professions': [], 'is_painter': False, 'resolved': False}
    
    if not gnd_uri or not isinstance(gnd_uri, str):
        return result
    
    try:
        gnd_id = gnd_uri.split('/')[-1].strip()
        if not gnd_id or len(gnd_id) < 3:
            return result
        
        response = requests.get(
            f'https://lobid.org/gnd/{gnd_id}.json',
            headers={'Accept': 'application/json'},
            timeout=10
        )
        
        if response.ok:
            data = response.json()
            result['name'] = data.get('preferredName')
            result['type'] = data.get('type', ['Unknown'])[0] if data.get('type') else 'Unknown'
            
            # Extract professions
            for occ in data.get('professionOrOccupation', []):
                if isinstance(occ, dict):
                    result['professions'].append(occ.get('label', ''))
                else:
                    result['professions'].append(str(occ))
            
            # Check if painter (Maler, Malerin, Kirchenmaler, Freskenmaler, etc.)
            painter_keywords = ['maler', 'malerin', 'freskant', 'freskomaler', 'künstler']
            for prof in result['professions']:
                if any(kw in prof.lower() for kw in painter_keywords):
                    result['is_painter'] = True
                    break
            
            result['resolved'] = result['name'] is not None
            
    except Exception as e:
        pass
    
    return result

# Collect all unique GND URIs from locations and persons
all_gnds = set()

# Location GNDs
for val in df_enhanced['locationGNDs'].dropna():
    if val:
        for gnd in val.split('|'):
            gnd = gnd.strip()
            if gnd:
                all_gnds.add(gnd)

# Person GNDs
for val in df_enhanced['personGNDs'].dropna():
    if val:
        for gnd in val.split('|'):
            gnd = gnd.strip()
            if gnd:
                all_gnds.add(gnd)

print(f"Found {len(all_gnds)} unique GND URIs to resolve...\n")

# Resolve all GNDs with profession info
gnd_info = {}
for gnd in all_gnds:
    gnd_info[gnd] = resolve_gnd_with_profession(gnd)
    if gnd_info[gnd]['resolved']:
        type_str = gnd_info[gnd]['type']
        profs = ', '.join(gnd_info[gnd]['professions'][:2]) if gnd_info[gnd]['professions'] else 'N/A'
        painter_flag = "🎨" if gnd_info[gnd]['is_painter'] else ""
        print(f"  ✓ {gnd_info[gnd]['name'][:35]:35} | {type_str[:20]:20} | {profs} {painter_flag}")

# Add resolved data to dataframe
def resolve_locations(gnd_string):
    """Resolve location GNDs to names."""
    if not gnd_string or pd.isna(gnd_string):
        return None
    names = []
    for gnd in gnd_string.split('|'):
        gnd = gnd.strip()
        if gnd and gnd in gnd_info:
            info = gnd_info[gnd]
            if info['resolved'] and 'Place' in info['type'] or 'Building' in info['type']:
                names.append(info['name'])
    return ', '.join(names) if names else None

def classify_persons(gnd_string):
    """Classify person GNDs into painters and non-painters."""
    if not gnd_string or pd.isna(gnd_string):
        return {'painters': None, 'others': None}
    
    painters = []
    others = []
    
    for gnd in gnd_string.split('|'):
        gnd = gnd.strip()
        if gnd and gnd in gnd_info:
            info = gnd_info[gnd]
            if info['resolved'] and 'Person' in info['type']:
                if info['is_painter']:
                    painters.append(info['name'])
                else:
                    others.append(info['name'])
    
    return {
        'painters': ', '.join(painters) if painters else None,
        'others': ', '.join(others) if others else None
    }

# Apply to dataframe
df_enhanced['location'] = df_enhanced['locationGNDs'].apply(resolve_locations)

person_classes = df_enhanced['personGNDs'].apply(classify_persons)
df_enhanced['painters'] = person_classes.apply(lambda x: x['painters'])
df_enhanced['other_persons'] = person_classes.apply(lambda x: x['others'])

# Show results
print("\n" + "="*70)
print("RESOLVED AND CLASSIFIED DATA:")
print("="*70)

for idx, row in df_enhanced.head(8).iterrows():
    print(f"\n{row['label'][:50]}...")
    if row.get('location'):
        print(f"   🏛️ Location: {row['location']}")
    if row.get('painters'):
        print(f"   🎨 Painter(s): {row['painters']}")
    if row.get('other_persons'):
        print(f"   👤 Other persons: {row['other_persons']}")
    if row.get('parentLabel'):
        print(f"   📦 Part of: {row['parentLabel']}")

# Summary
print("\n" + "="*70)
print("📊 Classification Summary:")
painters_count = len(df_enhanced[df_enhanced['painters'].notna()])
locations_count = len(df_enhanced[df_enhanced['location'].notna()])
others_count = len(df_enhanced[df_enhanced['other_persons'].notna()])
print(f"   With painter info: {painters_count}/{len(df_enhanced)}")
print(f"   With location info: {locations_count}/{len(df_enhanced)}")
print(f"   With other persons: {others_count}/{len(df_enhanced)}")

Resolving painter and commissioner names from GND...
Found 5 unique GND URIs to resolve...


Sample resolved names:

Landschaft mit einer Frau in morgenländischer Trac...
   🎨 Painter: Heidecksburg (Rudolstadt)

Krösus auf dem Scheiterhaufen...
   🎨 Painter: Stadtresidenz Landshut (Landshut)

Das Deckengemälde des Audienz- und Speisezimmers...
   🎨 Painter: Schloss Sanssouci (Potsdam)

Pomona...
   🎨 Painter: Schloss Engers (Neuwied)

Drache 2 (Körper)...
   🎨 Painter: Stadtresidenz Landshut (Landshut)

Die Decke des westlichen Gartensaals...
   🎨 Painter: Schloss Benrath (Düsseldorf)


✅ Resolved 5 / 5 GND URIs

Sample resolved names:

Landschaft mit einer Frau in morgenländischer Trac...
   🎨 Painter: Heidecksburg (Rudolstadt)

Krösus auf dem Scheiterhaufen...
   🎨 Painter: Stadtresidenz Landshut (Landshut)

Das Deckengemälde des Audienz- und Speisezimmers...
   🎨 Painter: Schloss Sanssouci (Potsdam)

Pomona...
   🎨 Painter: Schloss Engers (Neuwied)

Drache 2 (Körper)...
   🎨 Painter

In [None]:
# Enhanced display function with CORRECTED data fields
from IPython.display import HTML, display

def display_painting_full(row, max_width=500, resolve_subjects=True):
    """
    Display a painting with complete metadata including:
    - Basic info (title, year, image)
    - Resolved subject labels (ICONCLASS/AAT)
    - Location (building/place from CTO_0001011)
    - Painter names (from CTO_0001009 persons with painter profession)
    - Other persons (commissioners, etc. from CTO_0001009)
    - Hierarchy info (part of)
    - Coordinates (original or enriched from Wikidata)
    """
    label = row.get('label', 'Unknown')
    year = row.get('year', 'Unknown date')
    image_url = row.get('imageUrl', '')
    subjects = row.get('subjects', '')
    lat = row.get('lat')
    lon = row.get('lon')
    painting_uri = row.get('painting', '')
    painters = row.get('painters', '')
    location = row.get('location', '')  # NEW: building/place
    other_persons = row.get('other_persons', '')  # NEW: non-painters (commissioners, etc.)
    parent_label = row.get('parentLabel', '')
    geo_source = row.get('geo_source', 'original')
    matched_place = row.get('matched_place', '')
    wikidata_place = row.get('wikidata_place', '')
    
    # Coordinates section
    if lat is not None and str(lat) != 'nan' and lat != '':
        if geo_source == 'wikidata':
            coord_html = f'''<p style="color: #000;">
                📍 <span style="background: #9C27B0; color: white; padding: 2px 6px; border-radius: 4px; font-size: 11px;">Wikidata</span>
                {float(lat):.4f}, {float(lon):.4f}
                <br><small style="color: #666;">Matched: <a href="{wikidata_place}" target="_blank">{matched_place}</a></small>
            </p>'''
        else:
            coord_html = f'<p style="color: #000;">📍 {lat}, {lon}</p>'
    else:
        coord_html = ''
    
    # Location (building/place) section
    if location and pd.notna(location):
        location_html = f'''<p style="color: #000;">
            <strong>🏛️ Location:</strong> {location}
        </p>'''
    else:
        location_html = ''
    
    # Painter section
    if painters and pd.notna(painters):
        painter_html = f'''<p style="color: #000;">
            <strong>🎨 Painter:</strong> {painters}
        </p>'''
    else:
        painter_html = ''
    
    # Other persons section (commissioners, patrons, etc.)
    if other_persons and pd.notna(other_persons):
        other_html = f'''<p style="color: #000;">
            <strong>👤 Related persons:</strong> {other_persons}
        </p>'''
    else:
        other_html = ''
    
    # Part-of section
    if parent_label and pd.notna(parent_label):
        parent_html = f'''<p style="color: #000;">
            <strong>📦 Part of:</strong> {parent_label}
        </p>'''
    else:
        parent_html = ''
    
    # Resolve subject labels
    subject_html_items = []
    if subjects and resolve_subjects:
        # Handle both comma and pipe separators
        separator = '|' if '|' in subjects else ','
        subject_list = [s.strip() for s in subjects.split(separator) if s.strip()]
        for uri in subject_list[:5]:  # Limit to 5 subjects
            resolved = resolve_subject_from_sparql(uri)
            badge_color = '#4CAF50' if 'iconclass' in uri.lower() else '#2196F3'
            subject_html_items.append(
                f'<span style="background: {badge_color}; color: white; padding: 2px 8px; '
                f'border-radius: 12px; font-size: 12px; margin: 2px; display: inline-block;" '
                f'title="{resolved["source"]}: {resolved["code"]}">{resolved["label"]}</span>'
            )
    subject_html = ''.join(subject_html_items) if subject_html_items else '<em>No subjects</em>'
    
    html = f"""
    <div style="border: 1px solid #ddd; padding: 15px; margin: 10px 0; border-radius: 8px; background: #fafafa;">
        <h3 style="margin-top: 0; color: #333;">{label}</h3>
        <p style="color: #000;"><strong>Date:</strong> {year}</p>
        {location_html}
        {painter_html}
        {other_html}
        {parent_html}
        <div style="margin: 10px 0;">
            <strong style="color: #000;">Subjects:</strong><br>
            <div style="margin-top: 5px;">{subject_html}</div>
        </div>
        {coord_html}
        <p><a href="{painting_uri}" target="_blank" style="color: #0066cc;">🔗 View in CbDD</a></p>
        <img src="{image_url}" style="max-width: {max_width}px; max-height: 500px; border-radius: 4px;" 
             onerror="this.onerror=null; this.src=''; this.alt='Image could not be loaded';">
    </div>
    """
    display(HTML(html))

print("✅ Full display function defined: display_painting_full(row)")
print("   Shows: title, date, location, painter, related persons, hierarchy, subjects, coordinates, image")

✅ Full display function defined: display_painting_full(row)
   Shows: title, date, painter, commissioner, hierarchy, subjects, coordinates, image


In [None]:
# Display paintings with full metadata
print("Displaying paintings with full metadata:")
print("  🏛️ Location (building/place) from CTO_0001011")
print("  🎨 Painter names classified by GND profession")
print("  👤 Other related persons (commissioners, patrons, etc.)")
print("  📦 Hierarchy info (part-of relations)")
print("  🔵 Getty AAT | 🟢 ICONCLASS subjects")
print("="*70 + "\n")

# Display top paintings that have painter info
for idx, row in df_enhanced[df_enhanced['painters'].notna()].head(5).iterrows():
    display_painting_full(row)
    time.sleep(0.2)

Displaying paintings with full metadata:
  🎨 Painter names resolved from GND
  👑 Commissioner names resolved from GND
  📦 Hierarchy info (part-of relations)
  🔵 Getty AAT | 🟢 ICONCLASS subjects



### Combining GND-resolved Metadata with Wikidata Coordinates

Now let's enrich the enhanced paintings with Wikidata coordinates where missing.

In [114]:
# Enrich enhanced paintings with Wikidata coordinates
print("Enriching enhanced paintings with Wikidata coordinates...")
print("="*70)

enriched_enhanced = []
for idx, row in df_enhanced.iterrows():
    enriched = enrich_painting_with_coordinates(row.to_dict())
    enriched_enhanced.append(enriched)
    
    # Progress indicator
    lat_val = row.get('lat')
    has_original = lat_val is not None and str(lat_val) != 'nan' and lat_val != ''
    if has_original:
        status = "✓ original"
    elif enriched.get('geo_source') == 'wikidata':
        status = f"✓ wikidata: {enriched.get('matched_place', '')[:30]}"
    else:
        status = "✗ not found"
    print(f"  {row.get('label', '')[:40]}... {status}")
    time.sleep(0.2)

df_enhanced_geo = pd.DataFrame(enriched_enhanced)

# Summary
print("\n" + "="*70)
print("\n📊 Enrichment Summary:")
geo_counts = df_enhanced_geo['geo_source'].value_counts()
print(f"   Original coordinates: {geo_counts.get('original', 0)}")
print(f"   Enriched from Wikidata: {geo_counts.get('wikidata', 0)}")
print(f"   Not found: {geo_counts.get('not_found', 0) + geo_counts.get('no_label', 0)}")
print(f"\n   Total with coordinates: {len(df_enhanced_geo[df_enhanced_geo['lat'].notna() & (df_enhanced_geo['lat'] != '')])}/{len(df_enhanced_geo)}")

Enriching enhanced paintings with Wikidata coordinates...


NameError: name 'enrich_painting_with_coordinates' is not defined

In [None]:
# Display fully enriched paintings with all metadata
print("🖼️ Baroque Ceiling Paintings - Full Metadata View")
print("="*70)
print("Data sources combined:")
print("  • NFDI4Culture KG: Basic painting info, subjects, relationships")
print("  • GND (via lobid.org): Painter and commissioner names")
print("  • ICONCLASS/Getty AAT (via SPARQL): Subject labels")
print("  • Wikidata: Missing coordinates")
print("="*70 + "\n")

# Display paintings with the most complete data
for idx, row in df_enhanced_geo.head(5).iterrows():
    display_painting_full(row)
    time.sleep(0.3)

### Data Pipeline Summary

The notebook now implements a complete data pipeline for enriching Baroque ceiling painting data:

| Step | Source | Data Retrieved |
|------|--------|----------------|
| 1. Basic Query | NFDI4Culture KG | Title, year, image, coordinates, subjects, hierarchy |
| 2. GND Resolution | lobid.org API | Location names, person names (with profession classification) |
| 3. Subject Resolution | ICONCLASS/Getty SPARQL | Human-readable subject labels |
| 4. Geo Enrichment | Wikidata SPARQL | Missing coordinates from place names |

**⚠️ CORRECTED Schema Reference:**
- `CTO_0001011` = **LOCATION** (building/place) → `NFDI_0001006` = GND URI (resolves to PlaceOrGeographicName, BuildingOrMemorial)
- `CTO_0001009` = **PERSONS** (painters, commissioners, related people) → `NFDI_0001006` = GND URI (resolves to Person with professionOrOccupation)
- `CTO_0001010` = Auftraggeber (commissioner) - rarely used in dataset
- `CTO_0001019` = ist Teil von (is part of) → parent entity
- `CTO_0001026` = ICONCLASS/AAT subject
- `CTO_0001073` = Year/date

**Painter Classification:**
Persons from `CTO_0001009` are classified as painters if their GND `professionOrOccupation` contains keywords like: "Maler", "Malerin", "Freskenmaler", "Kirchenmaler"

In [None]:
# Analyze the first painting in detail - get ALL its properties
if not df_detailed.empty:
    first_painting_uri = df_detailed.iloc[0]['painting']
    print(f"Detailed analysis of: {df_detailed.iloc[0]['label']}")
    print(f"URI: {first_painting_uri}\n")
    
    query_all_props = f"""
    SELECT ?property ?value
    WHERE {{
      <{first_painting_uri}> ?property ?value .
    }}
    ORDER BY ?property
    """
    
    df_all_props = run_sparql(query_all_props)
    
    # Group properties for better display
    print(f"This painting has {len(df_all_props)} property values:\n")
    df_all_props['property_short'] = df_all_props['property'].apply(lambda x: x.split('/')[-1] if '/' in x else x)
    
    # Show grouped summary
    prop_counts = df_all_props['property_short'].value_counts()
    print("Property summary:")
    for prop, count in prop_counts.items():
        sample_val = df_all_props[df_all_props['property_short'] == prop]['value'].iloc[0]
        # Truncate long values
        sample_val = str(sample_val)[:60] + '...' if len(str(sample_val)) > 60 else sample_val
        print(f"  • {prop}: {count} value(s) - e.g., {sample_val}")

Detailed analysis of: Divina Sapienza
URI: https://www.deckenmalerei.eu/fdb4a025-5a0c-43e3-b9c3-4ba6057f7017

This painting has 24 property values:

Property summary:
  • CTO_0001026: 10 value(s) - e.g., https://iconclass.org/5%281BENEVOLENCE%29
  • CTO_0001009: 2 value(s) - e.g., nodeID://b2646889
  • BFO_0000050: 1 value(s) - e.g., https://www.deckenmalerei.eu/fdb4a025-5a0c-43e3-b9c3-4ba6057...
  • associatedMedia: 1 value(s) - e.g., nodeID://b2653351
  • 22-rdf-syntax-ns#type: 1 value(s) - e.g., https://nfdi4culture.de/ontology/CTO_0001005
  • rdf-schema#label: 1 value(s) - e.g., Divina Sapienza
  • NFDI_0000142: 1 value(s) - e.g., https://nfdi4culture.de/id/E6404
  • NFDI_0000191: 1 value(s) - e.g., https://nfdi4culture.de/id/E2430
  • NFDI_0001008: 1 value(s) - e.g., https://www.deckenmalerei.eu/fdb4a025-5a0c-43e3-b9c3-4ba6057...
  • CTO_0001006: 1 value(s) - e.g., https://nfdi4culture.de/id/E6077
  • CTO_0001019: 1 value(s) - e.g., https://www.deckenmalerei.eu/fdb4a025-5a0c-43e3-

In [None]:
# Explore the ICONCLASS subjects (iconographic themes) of the first painting
if not df_detailed.empty:
    first_painting_uri = df_detailed.iloc[0]['painting']
    
    query_subjects = f"""
    SELECT ?iconclass ?iconclassLabel
    WHERE {{
      <{first_painting_uri}> <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass .
      OPTIONAL {{ ?iconclass rdfs:label ?iconclassLabel . }}
    }}
    """
    
    df_subjects = run_sparql(query_subjects)
    
    print(f"ICONCLASS subjects for '{df_detailed.iloc[0]['label']}':")
    print("="*60)
    
    for idx, row in df_subjects.iterrows():
        iconclass_uri = row['iconclass']
        # Extract the code from the URI
        code = iconclass_uri.split('/')[-1] if '/' in iconclass_uri else iconclass_uri
        
        # Determine if it's ICONCLASS or Getty AAT
        if 'iconclass.org' in iconclass_uri:
            source = 'ICONCLASS'
            link = iconclass_uri
        elif 'vocab.getty.edu' in iconclass_uri:
            source = 'Getty AAT'
            link = iconclass_uri
        else:
            source = 'Other'
            link = iconclass_uri
            
        print(f"  • [{source}] {code}")
        print(f"    🔗 {link}")
        print()

ICONCLASS subjects for 'Divina Sapienza':
  • [ICONCLASS] 5%281BENEVOLENCE%29
    🔗 https://iconclass.org/5%281BENEVOLENCE%29

  • [ICONCLASS] 5%281GENEROSITY%29
    🔗 https://iconclass.org/5%281GENEROSITY%29

  • [ICONCLASS] 5%28CARITAS%29
    🔗 https://iconclass.org/5%28CARITAS%29

  • [ICONCLASS] 52A2
    🔗 https://iconclass.org/52A2

  • [Getty AAT] 300411453
    🔗 http://vocab.getty.edu/aat/300411453

  • [ICONCLASS] 11H%28VITUS%29
    🔗 https://iconclass.org/11H%28VITUS%29

  • [ICONCLASS] 34B232%28COCK%29
    🔗 https://iconclass.org/34B232%28COCK%29

  • [ICONCLASS] 11E1
    🔗 https://iconclass.org/11E1

  • [ICONCLASS] 11E541
    🔗 https://iconclass.org/11E541

  • [ICONCLASS] 73G2
    🔗 https://iconclass.org/73G2



In [None]:
# Check if the ICONCLASS/AAT subjects have labels stored in the NFDI4Culture KG
# Let's see what properties these subject URIs have in the KG

query_subject_properties = f"""
SELECT ?p (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?subject .
  ?subject ?p ?o .
}}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 20
"""

df_subject_props = run_sparql(query_subject_properties)
print("Properties of ICONCLASS/AAT subject URIs in the NFDI4Culture KG:")
df_subject_props

Properties of ICONCLASS/AAT subject URIs in the NFDI4Culture KG:


Unnamed: 0,p,count,sampleValue
0,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,23359,https://nfdi4culture.de/ontology/CTO_0001030


In [None]:
# Query external SPARQL endpoints for subject labels
# Clear any cached results from previous runs
import requests
import time
from functools import lru_cache
import urllib.parse

def query_iconclass_sparql(notation):
    """Query ICONCLASS SPARQL endpoint for a label."""
    try:
        # URL-decode the notation (e.g., "48C14%28SCHEINARCHITEKTUR%29" -> "48C14(SCHEINARCHITEKTUR)")
        notation_decoded = urllib.parse.unquote(notation)
        
        endpoint = "https://iconclass.org/sparql"
        query = f"""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        
        SELECT ?label
        WHERE {{
          <https://iconclass.org/{notation_decoded}> skos:prefLabel ?label .
          FILTER(LANG(?label) = "en")
        }}
        LIMIT 1
        """.strip()  # IMPORTANT: strip whitespace!
        
        resp = requests.get(
            endpoint,
            params={'query': query, 'format': 'json'},
            headers={'Accept': 'application/sparql-results+json'},
            timeout=10
        )
        if resp.ok:
            data = resp.json()
            bindings = data.get("results", {}).get("bindings", [])
            if bindings:
                return bindings[0].get("label", {}).get("value")
    except Exception as e:
        pass
    return None

@lru_cache(maxsize=500)
def query_getty_sparql(aat_id):
    """Query Getty AAT SPARQL endpoint for a label using gvp:prefLabelGVP."""
    try:
        endpoint = "http://vocab.getty.edu/sparql"
        # Getty uses gvp:prefLabelGVP/xl:literalForm for preferred labels
        # IMPORTANT: Must strip whitespace - Getty returns empty response if query has leading whitespace!
        query = f"""
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX xl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX aat: <http://vocab.getty.edu/aat/>

SELECT ?label
WHERE {{
  aat:{aat_id} gvp:prefLabelGVP/xl:literalForm ?label .
}}
LIMIT 1
""".strip()
        
        resp = requests.get(
            endpoint,
            params={'query': query, 'format': 'json'},
            headers={'Accept': 'application/sparql-results+json'},
            timeout=10
        )
        if resp.ok and resp.text:  # Also check response is not empty
            data = resp.json()
            bindings = data.get("results", {}).get("bindings", [])
            if bindings:
                return bindings[0].get("label", {}).get("value")
    except Exception as e:
        pass
    return None

def resolve_subject_from_sparql(uri):
    """Resolve a subject URI to its label using external SPARQL endpoints."""
    code = uri.split('/')[-1]
    
    if 'iconclass.org' in uri:
        label = query_iconclass_sparql(code)
        source = 'ICONCLASS'
    elif 'vocab.getty.edu' in uri:
        label = query_getty_sparql(code)
        source = 'Getty AAT'
    else:
        label = None
        source = 'Unknown'
    
    return {
        'uri': uri,
        'code': code,
        'label': label or f'[{code}]',
        'source': source,
        'resolved': label is not None
    }

# Test with sample codes
print("Testing external SPARQL endpoints...")
print("="*60)

print("\n1. ICONCLASS tests:")
for code in ["92D1521", "25HH", "5"]:
    label = query_iconclass_sparql(code)
    print(f"   {code}: {label}")

print("\n2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):")
for code in ["300004792", "300411453"]:
    label = query_getty_sparql(code)
    print(f"   {code}: {label}")

print("\n" + "="*60)
print("✅ Functions defined: resolve_subject_from_sparql(uri)")

Testing external SPARQL endpoints...

1. ICONCLASS tests:
   92D1521: Cupid shooting a dart
   25HH: landscapes - HH - ideal landscapes
   5: Abstract Ideas and Concepts

2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):
   25HH: landscapes - HH - ideal landscapes
   5: Abstract Ideas and Concepts

2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):
   300004792: buildings (structures)
   300004792: buildings (structures)
   300411453: ceiling paintings

✅ Functions defined: resolve_subject_from_sparql(uri)
   300411453: ceiling paintings

✅ Functions defined: resolve_subject_from_sparql(uri)


In [None]:
# Test the SPARQL-based label resolver on subjects from our sample paintings
print("Resolving labels for subjects using external SPARQL endpoints...")
print("="*70)

# Collect unique subjects from df_detailed
all_subjects = set()
for subjects_str in df_detailed['subjects'].dropna():
    for s in subjects_str.split(', '):
        s = s.strip()
        if s:
            all_subjects.add(s)

print(f"\nFound {len(all_subjects)} unique subject codes in sample paintings\n")

# Resolve each subject (limit to first 15 to avoid too many API calls)
resolved = []
for uri in list(all_subjects)[:15]:
    code = uri.split('/')[-1]
    print(f"Resolving: {code[:30]}...", end=" ")
    result = resolve_subject_from_sparql(uri)  # Use new SPARQL function
    resolved.append(result)
    status = "✓" if result['resolved'] else "✗"
    label_display = result['label'][:50] + "..." if len(result['label']) > 50 else result['label']
    print(f"{status} [{result['source']}] {label_display}")
    time.sleep(0.2)  # Be nice to external endpoints

# Create a dataframe of resolved subjects
df_resolved = pd.DataFrame(resolved)
print("\n" + "="*70)
print(f"\nResolved {sum(df_resolved['resolved'])}/{len(df_resolved)} subjects successfully")
df_resolved

Resolving labels for subjects using external SPARQL endpoints...

Found 30 unique subject codes in sample paintings

Resolving: 11E541... ✓ [ICONCLASS] 'Sapientia', Wisdom; 'Sapienza divina' (Ripa) ~ on...
Resolving: 25G4%28WHEAT%29... ✓ [ICONCLASS] plants and herbs: wheat
Resolving: 25G4%28WHEAT%29... ✓ [ICONCLASS] plants and herbs: wheat
Resolving: 25G41... Resolving: 25G41... ✓ [ICONCLASS] flowers
Resolving: 92D1916... ✓ [ICONCLASS] flowers
Resolving: 92D1916... ✓ [ICONCLASS] cupids: 'amores', 'amoretti', 'putti'
Resolving: 11E1... ✓ [ICONCLASS] cupids: 'amores', 'amoretti', 'putti'
Resolving: 11E1... ✓ [ICONCLASS] Holy Ghost represented as a dove (in flames)
✓ [ICONCLASS] Holy Ghost represented as a dove (in flames)
Resolving: 41D%28%2B44%29... ✓ [ICONCLASS] fashion, clothing (+ feathers used for clothes)
Resolving: 41D%28%2B44%29... ✓ [ICONCLASS] fashion, clothing (+ feathers used for clothes)
Resolving: 300004792... ✓ [Getty AAT] buildings (structures)
Resolving: 98B%28CROESUS%29

Unnamed: 0,uri,code,label,source,resolved
0,https://iconclass.org/11E541,11E541,"'Sapientia', Wisdom; 'Sapienza divina' (Ripa) ...",ICONCLASS,True
1,https://iconclass.org/25G4%28WHEAT%29,25G4%28WHEAT%29,plants and herbs: wheat,ICONCLASS,True
2,https://iconclass.org/25G41,25G41,flowers,ICONCLASS,True
3,https://iconclass.org/92D1916,92D1916,"cupids: 'amores', 'amoretti', 'putti'",ICONCLASS,True
4,https://iconclass.org/11E1,11E1,Holy Ghost represented as a dove (in flames),ICONCLASS,True
5,https://iconclass.org/41D%28%2B44%29,41D%28%2B44%29,"fashion, clothing (+ feathers used for clothes)",ICONCLASS,True
6,http://vocab.getty.edu/aat/300004792,300004792,buildings (structures),Getty AAT,True
7,https://iconclass.org/98B%28CROESUS%2961,98B%28CROESUS%2961,on his pyre Croesus is pardoned by Cyrus,ICONCLASS,True
8,https://iconclass.org/92E5421,92E5421,"Zephyrus abducting Flora (or Chloris), who usu...",ICONCLASS,True
9,https://iconclass.org/44A1,44A1,"coat of arms (as symbol of the state, etc.)",ICONCLASS,True


In [None]:
# Enhanced display function that shows resolved subject labels (using SPARQL endpoints)
from IPython.display import HTML, display

def display_painting_with_labels(row, max_width=500, resolve_subjects=True):
    """Display a painting with its metadata and resolved subject labels."""
    label = row.get('label', 'Unknown')
    year = row.get('year', 'Unknown date')
    image_url = row.get('imageUrl', '')
    subjects = row.get('subjects', '')
    lat = row.get('lat', '')
    lon = row.get('lon', '')
    painting_uri = row.get('painting', '')
    
    # Create location string if coordinates exist
    location = f"📍 {lat}, {lon}" if lat and lon else ""
    
    # Resolve subject labels using SPARQL endpoints
    subject_html_items = []
    if subjects and resolve_subjects:
        subject_list = [s.strip() for s in subjects.split(',') if s.strip()]
        for uri in subject_list[:5]:  # Limit to 5 subjects
            resolved = resolve_subject_from_sparql(uri)  # Use SPARQL-based resolver
            badge_color = '#4CAF50' if 'iconclass' in uri.lower() else '#2196F3'
            subject_html_items.append(
                f'<span style="background: {badge_color}; color: white; padding: 2px 8px; '
                f'border-radius: 12px; font-size: 12px; margin: 2px; display: inline-block;" '
                f'title="{resolved["source"]}: {resolved["code"]}">{resolved["label"]}</span>'
            )
    subject_html = ''.join(subject_html_items) if subject_html_items else '<em>No subjects</em>'
    
    html = f"""
    <div style="border: 1px solid #ddd; padding: 15px; margin: 10px 0; border-radius: 8px; background: #fafafa;">
        <h3 style="margin-top: 0; color: #333;">{label}</h3>
        <p style="color: #000;"><strong>Date:</strong> {year}</p>
        <div style="margin: 10px 0;">
            <strong style="color: #000;">Subjects:</strong><br>
            <div style="margin-top: 5px;">{subject_html}</div>
        </div>
        {f'<p style="color: #000;">{location}</p>' if location else ''}
        <p><a href="{painting_uri}" target="_blank" style="color: #0066cc;">🔗 View in CbDD</a></p>
        <img src="{image_url}" style="max-width: {max_width}px; max-height: 500px; border-radius: 4px;" 
             onerror="this.onerror=null; this.src=''; this.alt='Image could not be loaded';">
    </div>
    """
    display(HTML(html))

print("✅ Enhanced display function defined: display_painting_with_labels()")
print("\nThis function resolves subject codes via SPARQL to external vocabularies.")
print("Subject sources are color-coded: 🟢 ICONCLASS | 🔵 Getty AAT")

✅ Enhanced display function defined: display_painting_with_labels()

This function resolves subject codes via SPARQL to external vocabularies.
Subject sources are color-coded: 🟢 ICONCLASS | 🔵 Getty AAT


In [None]:
# Display paintings with resolved subject labels (using SPARQL endpoints)
print("Displaying paintings with resolved subject labels:\n")
print("Fetching labels from ICONCLASS and Getty AAT SPARQL endpoints...")
print("(🟢 = ICONCLASS, 🔵 = Getty AAT)\n")

for idx, row in df_detailed.head(3).iterrows():
    print(f"\n--- Painting {idx+1}: {row.get('label', 'Unknown')[:50]} ---")
    display_painting_with_labels(row)
    time.sleep(0.3)  # Small delay between paintings

Displaying paintings with resolved subject labels:

Fetching labels from ICONCLASS and Getty AAT SPARQL endpoints...
(🟢 = ICONCLASS, 🔵 = Getty AAT)


--- Painting 1: Divina Sapienza ---



--- Painting 2: Landschaft mit einer Frau in morgenländischer Trac ---



--- Painting 3: Winsen, Schloss ---


## 3.3 Enriching Missing Geospatial Data with Wikidata

Many paintings in the CbDD dataset are missing GPS coordinates. We can enrich this data by:
1. Extracting the place/building name from the painting label
2. Querying Wikidata to find matching German geographic regions
3. Retrieving the coordinates from Wikidata

This uses the Wikidata SPARQL endpoint with the `EntitySearch` API to find places.

In [None]:
# Wikidata Geocoding for Missing Coordinates
import httpx
from functools import lru_cache

# SPARQL query template to find a place in Wikidata
# Searches for geographic regions (Q82794) in Germany
FIND_PLACE_SPARQL = '''SELECT ?item ?itemLabel ?coord WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "{place_name}";
                    mwapi:language "de".
    ?item wikibase:apiOutputItem mwapi:item.    
  }}
  # Filter for geographic regions or their subclasses
  ?item wdt:P31/wdt:P279* wd:Q82794.
  # Get coordinates
  OPTIONAL {{ ?item wdt:P625 ?coord. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "de,en". }}
}}
LIMIT 5
'''

# Alternative query for buildings/structures (castles, churches, palaces)
FIND_BUILDING_SPARQL = '''SELECT ?item ?itemLabel ?coord WHERE {{
  SERVICE wikibase:mwapi {{
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "{place_name}";
                    mwapi:language "de".
    ?item wikibase:apiOutputItem mwapi:item.    
  }}
  # Filter for buildings/structures
  {{ ?item wdt:P31/wdt:P279* wd:Q41176. }}  # building
  UNION
  {{ ?item wdt:P31/wdt:P279* wd:Q16970. }}  # church
  UNION  
  {{ ?item wdt:P31/wdt:P279* wd:Q23413. }}  # castle
  UNION
  {{ ?item wdt:P31/wdt:P279* wd:Q16560. }}  # palace
  # Get coordinates
  OPTIONAL {{ ?item wdt:P625 ?coord. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "de,en". }}
}}
LIMIT 5
'''

@lru_cache(maxsize=500)
def query_wikidata_coordinates(place_name: str) -> dict:
    """
    Query Wikidata to find coordinates for a place name.
    Returns dict with 'lat', 'lon', 'wikidata_uri', 'matched_label' or None values.
    """
    result = {'lat': None, 'lon': None, 'wikidata_uri': None, 'matched_label': None}
    
    # Clean the place name - extract city/location from labels like "Winsen, Schloss"
    # Try the full name first, then individual parts
    search_terms = [place_name]
    if ',' in place_name:
        parts = [p.strip() for p in place_name.split(',')]
        search_terms.extend(parts)
    
    for search_term in search_terms:
        if not search_term or len(search_term) < 3:
            continue
            
        # Try building search first (more specific), then place search
        for query_template in [FIND_BUILDING_SPARQL, FIND_PLACE_SPARQL]:
            query = query_template.format(place_name=search_term.replace('"', ''))
            
            try:
                response = httpx.post(
                    "https://query.wikidata.org/sparql",
                    data={"query": query, "format": "json"},
                    headers={"Accept": "application/sparql-results+json",
                             "User-Agent": "BaroqueCeilingPaintingResearch/1.0"},
                    timeout=30,
                )
                
                if response.status_code == 200:
                    data = response.json()
                    bindings = data.get('results', {}).get('bindings', [])
                    
                    for binding in bindings:
                        coord = binding.get('coord', {}).get('value', '')
                        if coord and coord.startswith('Point('):
                            # Parse "Point(lon lat)" format
                            coords = coord.replace('Point(', '').replace(')', '').split()
                            if len(coords) == 2:
                                result['lon'] = float(coords[0])
                                result['lat'] = float(coords[1])
                                result['wikidata_uri'] = binding.get('item', {}).get('value')
                                result['matched_label'] = binding.get('itemLabel', {}).get('value')
                                return result
                                
            except (httpx.ReadTimeout, httpx.ConnectError, Exception) as e:
                continue
        
        time.sleep(0.5)  # Be nice to Wikidata
    
    return result


def extract_place_from_label(label: str) -> str:
    """
    Extract a searchable place name from a painting label.
    Labels often follow patterns like:
    - "Winsen, Schloss" -> search for "Winsen" and "Schloss Winsen"
    - "München, Residenz, Antiquarium" -> search for "München" and "Residenz München"
    """
    if not label:
        return ""
    
    # Common patterns in CbDD labels
    parts = [p.strip() for p in label.split(',')]
    
    if len(parts) >= 2:
        # Return "Building City" format for better Wikidata matching
        return f"{parts[1]} {parts[0]}"
    
    return label


def enrich_painting_with_coordinates(row: dict) -> dict:
    """
    Enrich a painting record with coordinates from Wikidata if missing.
    Returns the row with potentially added 'lat', 'lon', 'wikidata_place' fields.
    """
    enriched = dict(row)
    
    # Check if coordinates are already present and valid
    lat = row.get('lat')
    lon = row.get('lon')
    
    # pandas NaN check
    has_coords = (lat is not None and lon is not None and 
                  str(lat) != 'nan' and str(lon) != 'nan' and
                  lat != '' and lon != '')
    
    if has_coords:
        enriched['geo_source'] = 'original'
        return enriched
    
    # Try to get coordinates from Wikidata
    label = row.get('label', '')
    place_search = extract_place_from_label(label)
    
    if place_search:
        wikidata_result = query_wikidata_coordinates(place_search)
        
        if wikidata_result['lat'] is not None:
            enriched['lat'] = wikidata_result['lat']
            enriched['lon'] = wikidata_result['lon']
            enriched['wikidata_place'] = wikidata_result['wikidata_uri']
            enriched['matched_place'] = wikidata_result['matched_label']
            enriched['geo_source'] = 'wikidata'
        else:
            enriched['geo_source'] = 'not_found'
    else:
        enriched['geo_source'] = 'no_label'
    
    return enriched

print("✅ Wikidata geocoding functions defined:")
print("   - query_wikidata_coordinates(place_name)")
print("   - extract_place_from_label(label)")
print("   - enrich_painting_with_coordinates(row)")

✅ Wikidata geocoding functions defined:
   - query_wikidata_coordinates(place_name)
   - extract_place_from_label(label)
   - enrich_painting_with_coordinates(row)


## 4. Compare CbDD and Color Slide Archive of Wall and Ceiling Painting

Portal IDs from the registry:
- CbDD: `n4c:E4264`
- Color Slide Archive: `n4c:E4267`

Goal: Count how many records in the KG come from each of these portals.

We assume a pattern similar to:
- `?item schema:isPartOf ?feed`
- `?feed schema:isPartOf ?portal` or `?feed dcterms:isPartOf ?portal`

You may have to adjust the property in the middle depending on what you see in the inspection of the feed nodes.

In [None]:
query_ceiling_portal_counts = """\
SELECT ?portal ?portalLabel (COUNT(DISTINCT ?item) AS ?records)
WHERE {
  VALUES ?portal { n4c:E4264  n4c:E4267 }

  # feed belongs to one of the two portals
  ?feed ?isPartOfPortal ?portal .
  FILTER(?isPartOfPortal IN (schema:isPartOf, dcterms:isPartOf))

  # items belong to that feed
  ?item schema:isPartOf ?feed .

  ?portal schema:name ?portalLabel .
}
GROUP BY ?portal ?portalLabel
ORDER BY DESC(?records)
"""

df_ceiling_portal_counts = run_sparql(query_ceiling_portal_counts)
df_ceiling_portal_counts

In [None]:
# Simple bar chart of records per portal (CbDD vs Color Slide Archive)
if not df_ceiling_portal_counts.empty:
    plt.figure(figsize=(6, 4))
    plt.bar(df_ceiling_portal_counts["portalLabel"], df_ceiling_portal_counts["records"].astype(int))
    plt.xticks(rotation=20, ha="right")
    plt.ylabel("Number of records in KG")
    plt.title("Records from baroque wall & ceiling painting portals")
    plt.tight_layout()
    plt.show()
else:
    print("No results yet. Check if the intermediate predicate (?isPartOfPortal) is correct.")

No results yet. Check if the intermediate predicate (?isPartOfPortal) is correct.


## 5. Next steps for your data story

Ideas for how you can extend this notebook:

1. **Map of painting locations**  
   - From `df_cbdd_items_sample` or a larger query, extract `placeLabel` and, if available, coordinates.  
   - Use a mapping library (e.g. `folium`) to display points on a map.

2. **Timeline of creation dates**  
   - Inspect which property holds precise dates or centuries (e.g. `schema:temporalCoverage`, other date fields).  
   - Parse years to integers, bucket by decade or century, plot as a bar chart.

3. **Motif / subject comparison between portals**  
   - If items use `schema:about` or `dcterms:subject` with concepts (ICONCLASS, GND), count their frequency per portal.  
   - Visualise top motifs for CbDD vs Color Slide Archive in a grouped bar chart.

4. **Linked Data demonstration**  
   - Use the item inspection to find external identifiers (e.g. Wikidata, GND).  
   - Show a small RDF snippet or perform a federated query as part of your story.

You can keep all experiments you do here and later turn the most interesting figures and tables into your final data story.