# Exploring Baroque Ceiling Painting Data in the NFDI4Culture Knowledge Graph

This notebook is a starting point for a data story about baroque art and ceiling paintings using the NFDI4Culture Knowledge Graph.

Focus:
- Work with **data portals** (especially CbDD and the Color Slide Archive of Wall and Ceiling Painting)
- Use **SPARQL** to query the KG
- Prepare results for visualisation (maps, timelines, comparisons)

You can adapt the queries step by step as you learn more about the concrete RDF schema of the datasets.

In [156]:
# Install dependencies (run once per environment)
!pip install SPARQLWrapper pandas matplotlib --quiet


[notice] A new release of pip is available: 25.1 -> 25.3
[notice] To update, run: python.exe -m pip install --upgrade pip


In [157]:
from SPARQLWrapper import SPARQLWrapper, JSON
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("display.max_rows", 50)
pd.set_option("display.max_columns", 20)
pd.set_option("display.width", 120)

# NFDI4Culture SPARQL endpoint
ENDPOINT_URL = "https://nfdi4culture.de/sparql"

# Prefixes used in queries
# NOTE: The KG uses http://schema.org/ (not https://)
PREFIXES = """\
PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX nfdicore: <https://nfdi.fiz-karlsruhe.de/ontology/>
PREFIX schema:  <http://schema.org/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dcat:    <http://www.w3.org/ns/dcat#>
PREFIX n4c:     <https://nfdi4culture.de/id/>
"""

def run_sparql(query: str) -> pd.DataFrame:
    """Run a SPARQL query against the NFDI4Culture endpoint and return a pandas DataFrame.

    The query body should *not* include prefixes, they are automatically prepended.
    This version accesses the JSON result safely to avoid indexing errors in static type checkers.
    """
    sparql = SPARQLWrapper(ENDPOINT_URL)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(PREFIXES + "\n" + query)
    results = sparql.query().convert()

    # Be defensive: ensure results is a dict and extract bindings safely
    if not isinstance(results, dict):
        return pd.DataFrame()

    bindings = results.get("results", {}).get("bindings", [])
    rows = []
    for binding in bindings:
        # each binding is a dict of variable -> { "type": ..., "value": ... }
        row = {var: val.get("value") for var, val in binding.items()}
        rows.append(row)
    return pd.DataFrame(rows)

## 1. Inspect the CbDD portal (Corpus of Baroque Ceiling Painting in Germany)

- Portal ID from the registry: `n4c:E4264`
- Goal: See which properties connect the portal to data feeds, homepages, subjects, etc.

Run this once and scan the property list. It tells you which predicates to use in later queries.

In [158]:
query_inspect_cbdd = """\
SELECT ?p ?o
WHERE {
  n4c:E4264 ?p ?o .
}
ORDER BY ?p
LIMIT 200
"""

df_cbdd_props = run_sparql(query_inspect_cbdd)
df_cbdd_props

Unnamed: 0,p,o
0,http://schema.org/contributor,nodeID://b696559
1,http://schema.org/contributor,nodeID://b697616
2,http://schema.org/contributor,nodeID://b698392
3,http://schema.org/contributor,nodeID://b699776
4,http://schema.org/description,\n The Corpus of Baroque Ceiling Painting i...
5,http://schema.org/hasPart,https://nfdi4culture.de/id/E6077
6,http://schema.org/image,https://nfdi4culture.de//fileadmin/user_upload...
7,http://schema.org/keywords,https://nfdi4culture.de/id/E3953
8,http://schema.org/keywords,https://nfdi4culture.de/id/E3959
9,http://schema.org/keywords,https://nfdi4culture.de/id/E3968


## 2. Discover the CbDD Data Feed

The CbDD portal (`n4c:E4264`) contains a data feed that holds all painting records. 
Let's find the feed and understand how paintings are connected to it.

In [159]:
# Find what points TO the CbDD portal - this reveals the data feed
query_find_feed = """
SELECT ?feed ?feedLabel ?feedType ?predicate
WHERE {
  ?feed ?predicate n4c:E4264 .
  OPTIONAL { ?feed rdfs:label ?feedLabel . }
  OPTIONAL { ?feed rdf:type ?feedType . }
}
LIMIT 20
"""

df_feeds = run_sparql(query_find_feed)
print("Entities pointing to the CbDD portal:")
print(df_feeds)

# The main feed is E6077 - let's verify its structure
print("\n" + "="*60)
print("Verifying E6077 feed structure:")

query_feed_structure = """
SELECT ?p (COUNT(?o) AS ?count) 
WHERE {
  n4c:E6077 ?p ?o .
}
GROUP BY ?p
ORDER BY DESC(?count)
LIMIT 10
"""
df_feed_struct = run_sparql(query_feed_structure)
print(df_feed_struct)

Entities pointing to the CbDD portal:
                                feed                                          feedLabel  \
0   https://nfdi4culture.de/id/E3978                                            CC0 1.0   
1   https://nfdi4culture.de/id/E3978                                            CC0 1.0   
2   https://nfdi4culture.de/id/E2312                                       Architecture   
3   https://nfdi4culture.de/id/E2312                                       Architecture   
4   https://nfdi4culture.de/id/E2313                                        Art History   
5   https://nfdi4culture.de/id/E2313                                        Art History   
6   https://nfdi4culture.de/id/E2957                                  Image File Format   
7   https://nfdi4culture.de/id/E3596                                           Database   
8   https://nfdi4culture.de/id/E3608                                              CC BY   
9   https://nfdi4culture.de/id/E3953                

In [160]:
# Define the CbDD feed URI - this is the main entry point for querying paintings
CBDD_FEED_URI = "n4c:E6077"

# Verify the data path: Feed -> DataFeedItem -> Painting
query_verify_path = f"""
SELECT (COUNT(DISTINCT ?painting) AS ?totalPaintings)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
}}
"""
df_verify = run_sparql(query_verify_path)
print(f"‚úì CbDD Feed URI: {CBDD_FEED_URI}")
print(f"‚úì Total paintings accessible: {df_verify['totalPaintings'].iloc[0]}")
print(f"\nData path: Feed ‚Üí schema:dataFeedElement ‚Üí DataFeedItem ‚Üí schema:item ‚Üí Painting")  

‚úì CbDD Feed URI: n4c:E6077
‚úì Total paintings accessible: 6228

Data path: Feed ‚Üí schema:dataFeedElement ‚Üí DataFeedItem ‚Üí schema:item ‚Üí Painting


## 3. Explore Painting Properties

Now let's discover what properties are available on the painting records.

In [None]:
# Discover all predicates used by paintings in the dataset
query_painting_predicates = f"""
SELECT ?predicate (COUNT(?o) AS ?count) (SAMPLE(?o) AS ?sampleValue)
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting ?predicate ?o .
}}
GROUP BY ?predicate
ORDER BY DESC(?count)
LIMIT 30
"""

df_painting_preds = run_sparql(query_painting_predicates)

# Add resolved labels using the ontology resolver (defined in cell 13)
# This will be populated after running the ontology resolution cell
def add_resolved_labels(df):
    """Add a 'resolved_label' column with human-readable property names."""
    if 'resolve_property_name' in dir():
        df['resolved_label'] = df['predicate'].apply(resolve_property_name)
    else:
        # Fallback: extract last part of URI
        df['resolved_label'] = df['predicate'].apply(
            lambda x: x.split('/')[-1] if '/' in x else x
        )
    return df

df_painting_preds = add_resolved_labels(df_painting_preds)

print("All predicates used by paintings (with resolved ontology labels):")
print("="*80)
print("\nRun the 'Automatic Ontology Resolution' cell first to get full CTO/NFDI labels.\n")

# Display with resolved labels
df_painting_preds[['resolved_label', 'count', 'predicate', 'sampleValue']]

All predicates used by paintings:


Unnamed: 0,predicate,count,sampleValue
0,https://nfdi4culture.de/ontology/CTO_0001026,23359,https://iconclass.org/11D
1,https://nfdi4culture.de/ontology/CTO_0001009,6672,nodeID://b2646779
2,https://nfdi4culture.de/ontology/CTO_0001025,6230,nodeID://b2651780
3,http://www.w3.org/2000/01/rdf-schema#label,6228,Christus
4,https://nfdi4culture.de/ontology/CTO_0001049,6228,https://nfdi4culture.de/ontology/CTO_0001047
5,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://www.deckenmalerei.eu/0031d9cd-e121-4da...
6,https://nfdi4culture.de/ontology/CTO_0001006,6228,https://nfdi4culture.de/id/E6077
7,http://www.w3.org/1999/02/22-rdf-syntax-ns#type,6228,https://nfdi4culture.de/ontology/CTO_0001005
8,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://nfdi4culture.de/id/E6404
9,https://nfdi.fiz-karlsruhe.de/ontology/NFDI_00...,6228,https://nfdi4culture.de/id/E2430


In [None]:
# Get a sample of paintings with key properties to understand the data
# Key properties: CTO_0001073 = creation period/year
query_sample_paintings = f"""
SELECT ?painting ?label ?year ?lat ?lon 
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  ?painting rdfs:label ?label .
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year . }}
  OPTIONAL {{
    ?painting schema:latitude ?lat .
    ?painting schema:longitude ?lon .
  }}
}}
LIMIT 10
"""

df_sample_paintings = run_sparql(query_sample_paintings)
print(f"Sample paintings ({len(df_sample_paintings)} records):")
print(df_sample_paintings)

# =============================================================================
# Function to get ALL metadata for a specific painting
# Uses the automatic ontology resolver for human-readable property names
# =============================================================================
def get_painting_metadata(painting_uri: str, use_ontology_labels: bool = True) -> pd.DataFrame:
    """
    Retrieve ALL properties (predicates and values) for a specific painting URI.
    This shows the complete metadata stored in the knowledge graph.
    
    Integrates with the CTO/NFDI ontology resolver for human-readable names.
    
    Args:
        painting_uri: The full URI of the painting (e.g., 'https://nfdi4culture.de/id/...')
        use_ontology_labels: If True, use resolved ontology labels (requires cell 13 to be run)
        
    Returns:
        DataFrame with columns: property_name, value, value_type, property
    """
    query = f"""
    SELECT ?property ?value
    WHERE {{
      <{painting_uri}> ?property ?value .
    }}
    ORDER BY ?property
    """
    
    df = run_sparql(query)
    
    if not df.empty:
        # Add a readable property name column using ontology resolver if available
        if use_ontology_labels and 'resolve_property_name' in dir():
            df['property_name'] = df['property'].apply(resolve_property_name)
        else:
            # Fallback: extract last part of URI
            df['property_name'] = df['property'].apply(
                lambda x: x.split('/')[-1] if '/' in x else x
            )
        
        # Detect value type (URI vs literal)
        df['value_type'] = df['value'].apply(
            lambda x: 'URI' if x.startswith('http') else 'Literal'
        )
        # Reorder columns for better readability
        df = df[['property_name', 'value', 'value_type', 'property']]
    
    return df

# Show all metadata for the first painting in our sample
print("\n" + "="*80)
print("üìã COMPLETE METADATA for first painting:")
print("   (Property names resolved via CTO/NFDI ontology when available)")
print("="*80)

if not df_sample_paintings.empty:
    first_painting_uri = df_sample_paintings.iloc[0]['painting']
    first_painting_label = df_sample_paintings.iloc[0]['label']
    print(f"\nüñºÔ∏è  {first_painting_label}")
    print(f"URI: {first_painting_uri}\n")
    
    df_metadata = get_painting_metadata(first_painting_uri)
    print(f"Found {len(df_metadata)} property values:\n")
    
    # Group by property for cleaner display
    for prop_name in df_metadata['property_name'].unique():
        prop_rows = df_metadata[df_metadata['property_name'] == prop_name]
        values = prop_rows['value'].tolist()
        value_type = prop_rows['value_type'].iloc[0]
        
        if len(values) == 1:
            val_display = values[0][:80] + '...' if len(values[0]) > 80 else values[0]
            print(f"  ‚Ä¢ {prop_name}: {val_display}")
        else:
            print(f"  ‚Ä¢ {prop_name}: ({len(values)} values)")
            for v in values[:3]:  # Show first 3 values
                val_display = v[:70] + '...' if len(v) > 70 else v
                print(f"      - {val_display}")
            if len(values) > 3:
                print(f"      ... and {len(values)-3} more")

print("\n‚úÖ Function defined: get_painting_metadata(painting_uri)")
print("   Use it to explore any painting: get_painting_metadata(df_sample_paintings.iloc[N]['painting'])")
print("   Set use_ontology_labels=False to disable ontology resolution")

Sample paintings (10 records):
                                            painting                                            label  \
0  https://www.deckenmalerei.eu/00e1625e-0ac7-423...                        Burggen, Kapelle St. Anna   
1  https://www.deckenmalerei.eu/021afb11-438b-4f7...                       Iffeldorf, Heuwinklkapelle   
2  https://www.deckenmalerei.eu/02f7125d-cfb1-4fa...  Hessental, H√§llische Erbsch√§nke, Gasthaus Krone   
3  https://www.deckenmalerei.eu/03414469-1219-4fc...                             Lauchheim, Pfarrhaus   
4  https://www.deckenmalerei.eu/037d1d8a-4487-439...                             Berlin, Stadtschloss   
5  https://www.deckenmalerei.eu/043e1e20-2c95-42b...        Eisenberg, Residenzschloss Christiansburg   
6  https://www.deckenmalerei.eu/0656df8b-2e41-4cc...    Schmidm√ºhlen, Unteres Schloss (Hammerschloss)   
7  https://www.deckenmalerei.eu/0678f9cc-e52d-46e...                           Weimar, R√∂misches Haus   
8  https://www.decke

### Automatic Ontology Resolution for CTO/NFDI Codes

The painting metadata uses property codes from two namespaces:

1. **CTO (Culture Ontology)**: `https://nfdi4culture.de/ontology/CTO_XXXXXXX`
   - Domain-specific extension for NFDI4Culture cultural heritage data
   - Example: `CTO_0001009` = "has related person", `CTO_0001011` = "has related location"

2. **NFDIcore**: `https://nfdi.fiz-karlsruhe.de/ontology/NFDI_XXXXXXX`
   - Mid-level ontology for all NFDI consortia
   - Example: `NFDI_0001006` = "has external identifier" (links to GND, etc.)

**Automatic Resolution:**

Instead of hardcoding property labels, we dynamically fetch and parse the official ontology files from the GitHub repositories:

- **CTO**: [cto.ttl](https://github.com/ISE-FIZKarlsruhe/nfdi4culture/blob/main/cto.ttl)
- **NFDIcore**: [nfdicore.ttl](https://github.com/ISE-FIZKarlsruhe/nfdicore/blob/main/nfdicore.ttl)

The `rdfs:label` annotations are extracted for each CTO/NFDI entity, providing human-readable names automatically.

In [None]:
# =============================================================================
# Automatic CTO/NFDI Ontology Resolution
# =============================================================================
# Dynamically resolve ontology codes to human-readable labels by parsing
# the official OWL/TTL files from the GitHub repositories.
#
# Sources:
#   - CTO (NFDI4Culture Ontology): https://github.com/ISE-FIZKarlsruhe/nfdi4culture
#   - NFDIcore (Mid-level Ontology): https://github.com/ISE-FIZKarlsruhe/nfdicore
#
# This approach fetches the ontology files once and extracts rdfs:label
# for all CTO_* and NFDI_* entities, avoiding hardcoded mappings.

import requests
from functools import lru_cache
import re

# =============================================================================
# Ontology Sources (Raw TTL files from GitHub)
# =============================================================================
ONTOLOGY_SOURCES = {
    'CTO': {
        'url': 'https://raw.githubusercontent.com/ISE-FIZKarlsruhe/nfdi4culture/main/cto.ttl',
        'namespace': 'https://nfdi4culture.de/ontology/',
        'prefix_pattern': r'CTO_\d+',
    },
    'NFDIcore': {
        'url': 'https://raw.githubusercontent.com/ISE-FIZKarlsruhe/nfdicore/main/nfdicore.ttl',
        'namespace': 'https://nfdi.fiz-karlsruhe.de/ontology/',
        'prefix_pattern': r'NFDI_\d+',
    }
}

# Global cache for resolved ontology labels
_ontology_cache = {}
_ontology_loaded = False

def _parse_ttl_labels(ttl_content: str, namespace: str, prefix_pattern: str) -> dict:
    """
    Parse a TTL file and extract rdfs:label for entities matching the prefix pattern.
    Handles both full URI format and prefix notation (used in nfdicore.ttl).
    
    Args:
        ttl_content: The TTL file content as a string
        namespace: The namespace URI (e.g., 'https://nfdi4culture.de/ontology/')
        prefix_pattern: Regex pattern for codes (e.g., 'CTO_\\d+')
    
    Returns:
        dict mapping code -> label (e.g., 'CTO_0001009' -> 'has related person')
    """
    labels = {}
    
    # Pattern 1: Full URI format - <namespace/CODE> ... rdfs:label "Label"@en .
    entity_pattern = re.compile(
        rf'<{re.escape(namespace)}({prefix_pattern})>\s+[^;]*?'
        rf'rdfs:label\s+"([^"]+)"(?:@en)?\s*[;.]',
        re.MULTILINE | re.DOTALL
    )
    
    for match in entity_pattern.finditer(ttl_content):
        code = match.group(1)
        label = match.group(2)
        labels[code] = label
    
    # Pattern 2: Prefix notation - ontology:NFDI_XXXXXX ... rdfs:label "Label"@en
    # First find the prefix definition
    prefix_match = re.search(r'@prefix\s+(\w+):\s+<' + re.escape(namespace) + r'>\s*\.', ttl_content)
    if prefix_match:
        prefix_name = prefix_match.group(1)
        # Now find entities using that prefix
        prefix_entity_pattern = re.compile(
            rf'{prefix_name}:({prefix_pattern})\s+[^;]*?'
            rf'rdfs:label\s+"([^"]+)"(?:@en)?\s*[;.]',
            re.MULTILINE | re.DOTALL
        )
        for match in prefix_entity_pattern.finditer(ttl_content):
            code = match.group(1)
            label = match.group(2)
            if code not in labels:
                labels[code] = label
    
    # Pattern 3: Multi-line format with entity definition on one line, label on another
    lines = ttl_content.split('\n')
    current_entity = None
    
    for line in lines:
        # Check for full URI entity definition
        entity_match = re.match(rf'^<{re.escape(namespace)}({prefix_pattern})>', line)
        if entity_match:
            current_entity = entity_match.group(1)
        
        # Check for prefix notation entity definition (e.g., "ontology:NFDI_0000004")
        if prefix_match:
            prefix_name = prefix_match.group(1)
            prefix_entity_match = re.match(rf'^{prefix_name}:({prefix_pattern})\s', line)
            if prefix_entity_match:
                current_entity = prefix_entity_match.group(1)
        
        # Check for rdfs:label in the current context
        if current_entity:
            label_match = re.search(r'rdfs:label\s+"([^"]+)"(?:@en)?', line)
            if label_match and current_entity not in labels:
                labels[current_entity] = label_match.group(1)
            
            # Reset current entity on blank line or new entity definition
            if line.strip() == '':
                current_entity = None
    
    return labels

def load_ontology_labels(force_reload: bool = False) -> dict:
    """
    Load and cache all ontology labels from CTO and NFDIcore.
    
    Args:
        force_reload: If True, reload even if already cached
    
    Returns:
        dict mapping code -> {'label': str, 'namespace': str, 'uri': str}
    """
    global _ontology_cache, _ontology_loaded
    
    if _ontology_loaded and not force_reload:
        return _ontology_cache
    
    print("Loading ontology labels from GitHub...")
    
    for source_name, source_info in ONTOLOGY_SOURCES.items():
        try:
            print(f"   Fetching {source_name} from {source_info['url'][:50]}...")
            response = requests.get(source_info['url'], timeout=30)
            response.raise_for_status()
            
            labels = _parse_ttl_labels(
                response.text,
                source_info['namespace'],
                source_info['prefix_pattern']
            )
            
            for code, label in labels.items():
                _ontology_cache[code] = {
                    'label': label,
                    'namespace': source_info['namespace'],
                    'uri': f"{source_info['namespace']}{code}",
                    'source': source_name
                }
            
            print(f"   Loaded {len(labels)} labels from {source_name}")
            
        except Exception as e:
            print(f"   Failed to load {source_name}: {e}")
    
    _ontology_loaded = True
    print(f"\nTotal: {len(_ontology_cache)} ontology codes resolved")
    return _ontology_cache

@lru_cache(maxsize=500)
def resolve_ontology_code(code: str) -> dict:
    """
    Resolve a CTO/NFDI ontology code to its label.
    
    Args:
        code: Ontology code like 'CTO_0001009' or 'NFDI_0001006'
    
    Returns:
        dict with 'code', 'label', 'uri', 'source', 'resolved' keys
    """
    result = {'code': code, 'label': code, 'uri': None, 'source': None, 'resolved': False}
    
    # Ensure ontology is loaded
    if not _ontology_loaded:
        load_ontology_labels()
    
    if code in _ontology_cache:
        cached = _ontology_cache[code]
        result['label'] = cached['label']
        result['uri'] = cached['uri']
        result['source'] = cached['source']
        result['resolved'] = True
    else:
        # Construct URI even if label not found
        if code.startswith('CTO_'):
            result['uri'] = f"https://nfdi4culture.de/ontology/{code}"
            result['source'] = 'CTO'
        elif code.startswith('NFDI_'):
            result['uri'] = f"https://nfdi.fiz-karlsruhe.de/ontology/{code}"
            result['source'] = 'NFDIcore'
    
    return result

def resolve_property_name(property_uri: str) -> str:
    """
    Convert a full property URI to a human-readable label.
    
    Args:
        property_uri: Full URI like 'https://nfdi4culture.de/ontology/CTO_0001009'
    
    Returns:
        Human-readable label like 'has related person (CTO_0001009)'
    """
    # Extract the code from the URI
    code = property_uri.split('/')[-1] if '/' in property_uri else property_uri
    
    # Handle standard vocabularies
    if 'schema.org' in property_uri:
        return code
    if 'w3.org' in property_uri:
        return code.split('#')[-1] if '#' in code else code
    
    # Resolve CTO/NFDI codes
    if code.startswith('CTO_') or code.startswith('NFDI_'):
        resolved = resolve_ontology_code(code)
        if resolved['resolved'] and resolved['label'] != code:
            return f"{resolved['label']} ({code})"
    
    return code

def get_ontology_reference_table() -> pd.DataFrame:
    """
    Get a DataFrame with all resolved ontology codes for reference.
    
    Returns:
        DataFrame with columns: code, label, source, uri
    """
    if not _ontology_loaded:
        load_ontology_labels()
    
    rows = []
    for code, info in sorted(_ontology_cache.items()):
        rows.append({
            'code': code,
            'label': info['label'],
            'source': info['source'],
            'uri': info['uri']
        })
    
    return pd.DataFrame(rows)

# =============================================================================
# Load ontology on first run
# =============================================================================
ontology_labels = load_ontology_labels()

# Display summary
print("\n" + "="*70)
print("CTO/NFDI Ontology Code Reference (Auto-loaded from GitHub)")
print("="*70)

# Show some key properties used in CbDD dataset
key_codes = ['CTO_0001005', 'CTO_0001009', 'CTO_0001010', 'CTO_0001011',
             'CTO_0001019', 'CTO_0001026', 'CTO_0001073', 'CTO_0001021',
             'NFDI_0000004', 'NFDI_0000005', 'NFDI_0000008', 'NFDI_0000015']

print("\nKey properties used in the CbDD ceiling painting dataset:\n")
for code in key_codes:
    resolved = resolve_ontology_code(code)
    status = '[OK]' if resolved['resolved'] else '[??]'
    print(f"  {status} {code:15} -> {resolved['label']}")

print("\n" + "="*70)
print("\nOntology Sources:")
for name, info in ONTOLOGY_SOURCES.items():
    print(f"  - {name}: {info['url']}")

print("\nFunctions defined:")
print("   - resolve_ontology_code(code) -> resolve CTO/NFDI codes to labels")
print("   - resolve_property_name(uri) -> human-readable property names")
print("   - get_ontology_reference_table() -> DataFrame with all codes")
print("   - load_ontology_labels(force_reload=True) -> refresh from GitHub")

In [None]:
# =============================================================================
# GND Resolution using lobid.org API
# =============================================================================
# Resolves GND (Gemeinsame Normdatei) URIs to human-readable names.
# GND URIs are linked via NFDI_0001006 ("has external identifier") from:
#   - CTO_0001009 ("has related person") -> painters, commissioners
#   - CTO_0001011 ("has related location") -> buildings, places
#
# This integrates with the CTO/NFDI ontology resolver (cell 13).

import requests
from functools import lru_cache

@lru_cache(maxsize=1000)
def resolve_gnd_uri(gnd_uri: str) -> dict:
    """
    Resolve a GND URI to its preferred name using lobid.org API.
    
    GND URIs come from NFDI_0001006 ("has external identifier") linked to:
      - CTO_0001009: persons (painters, commissioners)
      - CTO_0001011: locations (buildings, places)
    
    Args:
        gnd_uri: A GND URI like 'https://d-nb.info/gnd/118636960'
        
    Returns:
        dict with 'name', 'type', 'uri', 'resolved' keys
    """
    result = {'uri': gnd_uri, 'name': None, 'type': None, 'resolved': False}
    
    if not gnd_uri or not isinstance(gnd_uri, str):
        return result
    
    try:
        # Extract GND ID from various URI formats
        gnd_id = gnd_uri.split('/')[-1].strip()
        
        # GND IDs can start with digits or X, and may contain hyphens
        if not gnd_id or len(gnd_id) < 3:
            return result
        
        # Query lobid.org API
        response = requests.get(
            f'https://lobid.org/gnd/{gnd_id}.json',
            headers={'Accept': 'application/json'},
            timeout=10
        )
        
        if response.ok:
            data = response.json()
            result['name'] = data.get('preferredName')
            type_val = data.get('type', [])
            if isinstance(type_val, list) and type_val:
                result['type'] = type_val[0]
            elif isinstance(type_val, str):
                result['type'] = type_val
            else:
                result['type'] = 'Unknown'
            result['resolved'] = result['name'] is not None
            
    except Exception as e:
        pass
    
    return result


def resolve_gnd_batch(gnd_uris: list) -> dict:
    """
    Resolve multiple GND URIs to names.
    
    Args:
        gnd_uris: List of GND URIs
        
    Returns:
        dict mapping URI -> resolved name (or '[GND ID]' if not resolved)
    """
    results = {}
    for uri in gnd_uris:
        if uri:
            resolved = resolve_gnd_uri(uri)
            results[uri] = resolved['name'] if resolved['resolved'] else f"[{uri.split('/')[-1]}]"
    return results


# Test GND resolution
print("Testing GND resolution via lobid.org...")
print("="*70)

# Show ontology context
print("\nüìã GND Resolution Context (from CTO/NFDI ontology):")
if 'resolve_ontology_code' in dir():
    for code in ['NFDI_0001006', 'CTO_0001009', 'CTO_0001011']:
        resolved = resolve_ontology_code(code)
        print(f"   {code}: {resolved['label']}")
else:
    print("   NFDI_0001006: has external identifier (-> GND URI)")
    print("   CTO_0001009: has related person")
    print("   CTO_0001011: has related location")

print("\n" + "="*70)
print("Sample GND resolutions:\n")

test_gnds = [
    "https://d-nb.info/gnd/118636960",  # Johann Baptist Zimmermann (painter)
    "https://d-nb.info/gnd/118579371",  # Max Emanuel (commissioner)
]

for gnd_uri in test_gnds:
    result = resolve_gnd_uri(gnd_uri)
    status = "‚úì" if result['resolved'] else "‚úó"
    print(f"{status} {result['name'] or 'Not found'}")
    print(f"   Type: {result['type']}")
    print(f"   URI: {gnd_uri}")
    print()

print("="*70)
print("‚úÖ GND resolution functions defined:")
print("   - resolve_gnd_uri(gnd_uri) -> resolve single GND URI")
print("   - resolve_gnd_batch(gnd_uris) -> resolve multiple GND URIs")
print("\nUsed to resolve persons (CTO_0001009) and locations (CTO_0001011).")

Testing GND resolution via lobid.org...
‚úì Zimmermann, Johann Baptist
   Type: Person
   URI: https://d-nb.info/gnd/118636960

‚úì Maximilian I., Heiliges R√∂misches Reich, Kaiser
   Type: AuthorityResource
   URI: https://d-nb.info/gnd/118579371


‚úÖ GND resolution functions defined:
   - resolve_gnd_uri(gnd_uri)
   - resolve_gnd_batch(gnd_uris)
‚úì Maximilian I., Heiliges R√∂misches Reich, Kaiser
   Type: AuthorityResource
   URI: https://d-nb.info/gnd/118579371


‚úÖ GND resolution functions defined:
   - resolve_gnd_uri(gnd_uri)
   - resolve_gnd_batch(gnd_uris)


In [None]:
# Enhanced painting query with CORRECTED CTO field interpretation:
# 
# Property Reference (from CTO/NFDI ontology - see cell 13):
#   CTO_0001073 = "has creation period" (year/date)
#   CTO_0001026 = "has external classifier" (ICONCLASS/AAT subjects)
#   CTO_0001011 = "has related location" (building/place GND)
#   CTO_0001009 = "has related person" (painters, commissioners via GND)
#   CTO_0001019 = "has related item" (part-of relationships)
#   CTO_0001021 = "has content url" (image URL)
#   CTO_0001007 = license information
#   NFDI_0001006 = "has external identifier" (GND URI link)

query_enhanced_paintings = f"""
SELECT DISTINCT ?painting ?label ?year ?lat ?lon ?imageUrl ?license
       (GROUP_CONCAT(DISTINCT ?iconclass; separator="|") AS ?subjects)
       (GROUP_CONCAT(DISTINCT ?locationGND; separator="|") AS ?locationGNDs)
       (GROUP_CONCAT(DISTINCT ?personGND; separator="|") AS ?personGNDs)
       ?parentUri ?parentLabel
WHERE {{
  {CBDD_FEED_URI} schema:dataFeedElement ?feedItem .
  ?feedItem schema:item ?painting .
  
  # Required: Title and image
  ?painting rdfs:label ?label .
  ?painting schema:associatedMedia ?image .
  ?image <https://nfdi4culture.de/ontology/CTO_0001021> ?imageUrl .  # has content url
  
  # Optional properties
  OPTIONAL {{ ?image <https://nfdi4culture.de/ontology/CTO_0001007> ?license . }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001073> ?year . }}  # has creation period
  OPTIONAL {{
    ?painting schema:latitude ?lat .
    ?painting schema:longitude ?lon .
  }}
  OPTIONAL {{ ?painting <https://nfdi4culture.de/ontology/CTO_0001026> ?iconclass . }}  # has external classifier
  
  # CTO_0001011 = "has related location" (building/place) - NOT painter!
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001011> ?locationNode .
    ?locationNode <https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001006> ?locationGND .  # has external identifier
  }}
  
  # CTO_0001009 = "has related person" (painters, commissioners, related people)
  # These need to be filtered by GND profession after resolution
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001009> ?personNode .
    ?personNode <https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0001006> ?personGND .  # has external identifier
  }}
  
  # CTO_0001019 = "has related item" (ist Teil von / is part of) -> parent entity
  OPTIONAL {{
    ?painting <https://nfdi4culture.de/ontology/CTO_0001019> ?parentUri .
    FILTER(?parentUri != ?painting)  # Exclude self-references
    ?parentUri rdfs:label ?parentLabel .
  }}
}}
GROUP BY ?painting ?label ?year ?lat ?lon ?imageUrl ?license ?parentUri ?parentLabel
LIMIT 15
"""

df_enhanced = run_sparql(query_enhanced_paintings)

# Ensure optional columns exist (they may be missing if no data matches)
for col in ['parentLabel', 'parentUri', 'locationGNDs', 'personGNDs']:
    if col not in df_enhanced.columns:
        df_enhanced[col] = None

print(f"Fetched {len(df_enhanced)} paintings with enhanced metadata:")
print(f"  - With location data: {len(df_enhanced[df_enhanced['locationGNDs'].notna() & (df_enhanced['locationGNDs'] != '')])}")
print(f"  - With person data: {len(df_enhanced[df_enhanced['personGNDs'].notna() & (df_enhanced['personGNDs'] != '')])}")
print(f"  - With parent entity: {len(df_enhanced[df_enhanced['parentLabel'].notna()])}")

# Print resolved property names if ontology is loaded
print("\nüìã Properties used in this query:")
if 'resolve_ontology_code' in dir():
    for code in ['CTO_0001073', 'CTO_0001026', 'CTO_0001011', 'CTO_0001009', 'CTO_0001019', 'CTO_0001021', 'NFDI_0001006']:
        resolved = resolve_ontology_code(code)
        print(f"   {code}: {resolved['label']}")

# Show available columns
display_cols = [c for c in ['label', 'year', 'locationGNDs', 'personGNDs', 'parentLabel'] if c in df_enhanced.columns]
df_enhanced[display_cols].head(10)

Fetched 15 paintings with enhanced metadata:
  - With location data: 7
  - With person data: 11
  - With parent entity: 0


Unnamed: 0,label,year,locationGNDs,personGNDs,parentLabel
0,Die f√ºnf Sinne als Puttenszenen,1679,https://d-nb.info/gnd/4227611-1,https://d-nb.info/gnd/119105691|https://d-nb.i...,
1,"Sondershausen, Residenzschloss","1533-1914; 1533-1596, 1680-1725, 1762-1771, 18...",,https://d-nb.info/gnd/102995514X|https://d-nb....,
2,Die von Juno in einen Storch verwandelte Antigone,1542,https://d-nb.info/gnd/4034368-6,,
3,"Braunschweig, Schloss Richmond",1768-69,,https://d-nb.info/gnd/1034941445|https://d-nb....,
4,Wallfahrt zum Gnadenbild von Altbunzlau,ab 1734,https://d-nb.info/gnd/4104563-4,https://d-nb.info/gnd/118504606|https://d-nb.i...,
5,"Freiburg-Munzingen, Schloss","1672, 1760‚Äì1765",,https://d-nb.info/gnd/118864416|https://d-nb.i...,
6,"Malerei der T√ºrfl√ºgel: Mythologische, weidm√§nn...",,,,
7,Die Decke des westlichen Gartensaals,1762-1764,https://d-nb.info/gnd/4112706-7,https://d-nb.info/gnd/1139464094|https://d-nb....,
8,D√ºrmentinger Provenienz der Bildausstattung,1751/1752,,https://d-nb.info/gnd/124566081,
9,Stammwappen der V√∂hlin und unbekanntes Wappen ...,1751,,https://d-nb.info/gnd/119059541|https://d-nb.i...,


In [176]:
# Resolve GND URIs and classify persons by profession (painter vs commissioner)
print("Resolving GND URIs and classifying persons by profession...")
print("="*70)

# Enhanced GND resolution with profession info
@lru_cache(maxsize=1000)
def resolve_gnd_with_profession(gnd_uri: str) -> dict:
    """
    Resolve a GND URI to name AND profession using lobid.org API.
    Returns dict with 'name', 'type', 'professions', 'is_painter', 'resolved'
    """
    result = {'uri': gnd_uri, 'name': None, 'type': None, 'professions': [], 'is_painter': False, 'resolved': False}
    
    if not gnd_uri or not isinstance(gnd_uri, str):
        return result
    
    try:
        gnd_id = gnd_uri.split('/')[-1].strip()
        if not gnd_id or len(gnd_id) < 3:
            return result
        
        response = requests.get(
            f'https://lobid.org/gnd/{gnd_id}.json',
            headers={'Accept': 'application/json'},
            timeout=10
        )
        
        if response.ok:
            data = response.json()
            result['name'] = data.get('preferredName')
            result['type'] = data.get('type', ['Unknown'])[0] if data.get('type') else 'Unknown'
            
            # Extract professions
            for occ in data.get('professionOrOccupation', []):
                if isinstance(occ, dict):
                    result['professions'].append(occ.get('label', ''))
                else:
                    result['professions'].append(str(occ))
            
            # Check if painter (Maler, Malerin, Kirchenmaler, Freskenmaler, etc.)
            painter_keywords = ['maler', 'malerin', 'freskant', 'freskomaler', 'k√ºnstler']
            for prof in result['professions']:
                if any(kw in prof.lower() for kw in painter_keywords):
                    result['is_painter'] = True
                    break
            
            result['resolved'] = result['name'] is not None
            
    except Exception as e:
        pass
    
    return result

# Collect all unique GND URIs from locations and persons
all_gnds = set()

# Location GNDs
for val in df_enhanced['locationGNDs'].dropna():
    if val:
        for gnd in val.split('|'):
            gnd = gnd.strip()
            if gnd:
                all_gnds.add(gnd)

# Person GNDs
for val in df_enhanced['personGNDs'].dropna():
    if val:
        for gnd in val.split('|'):
            gnd = gnd.strip()
            if gnd:
                all_gnds.add(gnd)

print(f"Found {len(all_gnds)} unique GND URIs to resolve...\n")

# Resolve all GNDs with profession info
gnd_info = {}
for gnd in all_gnds:
    gnd_info[gnd] = resolve_gnd_with_profession(gnd)
    if gnd_info[gnd]['resolved']:
        type_str = gnd_info[gnd]['type']
        profs = ', '.join(gnd_info[gnd]['professions'][:2]) if gnd_info[gnd]['professions'] else 'N/A'
        painter_flag = "üé®" if gnd_info[gnd]['is_painter'] else ""
        print(f"  ‚úì {gnd_info[gnd]['name'][:35]:35} | {type_str[:20]:20} | {profs} {painter_flag}")

# Add resolved data to dataframe
def resolve_locations(gnd_string):
    """Resolve location GNDs to names."""
    if not gnd_string or pd.isna(gnd_string):
        return None
    names = []
    for gnd in gnd_string.split('|'):
        gnd = gnd.strip()
        if gnd and gnd in gnd_info:
            info = gnd_info[gnd]
            if info['resolved'] and 'Place' in info['type'] or 'Building' in info['type']:
                names.append(info['name'])
    return ', '.join(names) if names else None

def classify_persons(gnd_string):
    """Classify person GNDs into painters and non-painters."""
    if not gnd_string or pd.isna(gnd_string):
        return {'painters': None, 'others': None}
    
    painters = []
    others = []
    
    for gnd in gnd_string.split('|'):
        gnd = gnd.strip()
        if gnd and gnd in gnd_info:
            info = gnd_info[gnd]
            if info['resolved'] and 'Person' in info['type']:
                if info['is_painter']:
                    painters.append(info['name'])
                else:
                    others.append(info['name'])
    
    return {
        'painters': ', '.join(painters) if painters else None,
        'others': ', '.join(others) if others else None
    }

# Apply to dataframe
df_enhanced['location'] = df_enhanced['locationGNDs'].apply(resolve_locations)

person_classes = df_enhanced['personGNDs'].apply(classify_persons)
df_enhanced['painters'] = person_classes.apply(lambda x: x['painters'])
df_enhanced['other_persons'] = person_classes.apply(lambda x: x['others'])

# Show results
print("\n" + "="*70)
print("RESOLVED AND CLASSIFIED DATA:")
print("="*70)

for idx, row in df_enhanced.head(8).iterrows():
    print(f"\n{row['label'][:50]}...")
    if row.get('location'):
        print(f"   üèõÔ∏è Location: {row['location']}")
    if row.get('painters'):
        print(f"   üé® Painter(s): {row['painters']}")
    if row.get('other_persons'):
        print(f"   üë§ Other persons: {row['other_persons']}")
    if row.get('parentLabel'):
        print(f"   üì¶ Part of: {row['parentLabel']}")

# Summary
print("\n" + "="*70)
print("üìä Classification Summary:")
painters_count = len(df_enhanced[df_enhanced['painters'].notna()])
locations_count = len(df_enhanced[df_enhanced['location'].notna()])
others_count = len(df_enhanced[df_enhanced['other_persons'].notna()])
print(f"   With painter info: {painters_count}/{len(df_enhanced)}")
print(f"   With location info: {locations_count}/{len(df_enhanced)}")
print(f"   With other persons: {others_count}/{len(df_enhanced)}")

Resolving GND URIs and classifying persons by profession...
Found 41 unique GND URIs to resolve...

  ‚úì Wunder, Wilhelm Ernst               | AuthorityResource    | K√ºnstler, Maler üé®
  ‚úì Markgr√§fliches Opernhaus (Bayreuth) | AuthorityResource    | N/A 
  ‚úì Kuen, Franz Martin                  | DifferentiatedPerson | K√ºnstler, Maler üé®
  ‚úì Markgr√§fliches Opernhaus (Bayreuth) | AuthorityResource    | N/A 
  ‚úì Kuen, Franz Martin                  | DifferentiatedPerson | K√ºnstler, Maler üé®
  ‚úì Asam, Egid Quirin                   | DifferentiatedPerson | Architekt, Bildhauer üé®
  ‚úì Friedrich, Brandenburg-Bayreuth, Ma | AuthorityResource    | N/A 
  ‚úì Asam, Egid Quirin                   | DifferentiatedPerson | Architekt, Bildhauer üé®
  ‚úì Friedrich, Brandenburg-Bayreuth, Ma | AuthorityResource    | N/A 
  ‚úì Geiger, Franz Josef                 | AuthorityResource    | K√ºnstler, Maler üé®
  ‚úì Heidecksburg (Rudolstadt)           | AuthorityResource    | N

In [None]:
# Enhanced display function with CORRECTED data fields
# Integrates with CTO/NFDI ontology resolver for property documentation
from IPython.display import HTML, display

def display_painting_full(row, max_width=500, resolve_subjects=True):
    """
    Display a painting with complete metadata including:
    - Basic info (title, year, image)
    - Resolved subject labels (ICONCLASS/AAT via CTO_0001026 "has external classifier")
    - Location (building/place from CTO_0001011 "has related location")
    - Painter names (from CTO_0001009 "has related person" with painter profession)
    - Other persons (commissioners, etc. from CTO_0001009)
    - Hierarchy info (part of via CTO_0001019 "has related item")
    - Coordinates (original or enriched from Wikidata)
    
    Property references resolved via CTO/NFDI ontology (cell 13).
    """
    label = row.get('label', 'Unknown')
    year = row.get('year', 'Unknown date')
    image_url = row.get('imageUrl', '')
    subjects = row.get('subjects', '')
    lat = row.get('lat')
    lon = row.get('lon')
    painting_uri = row.get('painting', '')
    painters = row.get('painters', '')
    location = row.get('location', '')  # from CTO_0001011 (has related location)
    other_persons = row.get('other_persons', '')  # from CTO_0001009 (non-painters)
    parent_label = row.get('parentLabel', '')  # from CTO_0001019 (has related item)
    geo_source = row.get('geo_source', 'original')
    matched_place = row.get('matched_place', '')
    wikidata_place = row.get('wikidata_place', '')
    
    # Coordinates section
    if lat is not None and str(lat) != 'nan' and lat != '':
        if geo_source == 'wikidata':
            coord_html = f'''<p style="color: #000;">
                üìç <span style="background: #9C27B0; color: white; padding: 2px 6px; border-radius: 4px; font-size: 11px;">Wikidata</span>
                {float(lat):.4f}, {float(lon):.4f}
                <br><small style="color: #666;">Matched: <a href="{wikidata_place}" target="_blank">{matched_place}</a></small>
            </p>'''
        else:
            coord_html = f'<p style="color: #000;">üìç {lat}, {lon}</p>'
    else:
        coord_html = ''
    
    # Location (building/place) section - CTO_0001011: has related location
    if location and pd.notna(location):
        location_html = f'''<p style="color: #000;">
            <strong>üèõÔ∏è Location:</strong> {location}
        </p>'''
    else:
        location_html = ''
    
    # Painter section - from CTO_0001009: has related person (classified by GND profession)
    if painters and pd.notna(painters):
        painter_html = f'''<p style="color: #000;">
            <strong>üé® Painter:</strong> {painters}
        </p>'''
    else:
        painter_html = ''
    
    # Other persons section (commissioners, patrons, etc.) - from CTO_0001009
    if other_persons and pd.notna(other_persons):
        other_html = f'''<p style="color: #000;">
            <strong>üë§ Related persons:</strong> {other_persons}
        </p>'''
    else:
        other_html = ''
    
    # Part-of section - CTO_0001019: has related item
    if parent_label and pd.notna(parent_label):
        parent_html = f'''<p style="color: #000;">
            <strong>üì¶ Part of:</strong> {parent_label}
        </p>'''
    else:
        parent_html = ''
    
    # Resolve subject labels - CTO_0001026: has external classifier
    subject_html_items = []
    if subjects and resolve_subjects:
        # Handle both comma and pipe separators
        separator = '|' if '|' in subjects else ','
        subject_list = [s.strip() for s in subjects.split(separator) if s.strip()]
        for uri in subject_list[:5]:  # Limit to 5 subjects
            # Use resolve_subject_from_sparql if available (defined in cell 27)
            if 'resolve_subject_from_sparql' in dir():
                resolved = resolve_subject_from_sparql(uri)
            else:
                # Fallback resolution
                code = uri.split('/')[-1]
                resolved = {'label': f'[{code}]', 'source': 'ICONCLASS' if 'iconclass' in uri else 'AAT', 'code': code}
            
            badge_color = '#4CAF50' if 'iconclass' in uri.lower() else '#2196F3'
            subject_html_items.append(
                f'<span style="background: {badge_color}; color: white; padding: 2px 8px; '
                f'border-radius: 12px; font-size: 12px; margin: 2px; display: inline-block;" '
                f'title="{resolved["source"]}: {resolved["code"]}">{resolved["label"]}</span>'
            )
    subject_html = ''.join(subject_html_items) if subject_html_items else '<em>No subjects</em>'
    
    html = f"""
    <div style="border: 1px solid #ddd; padding: 15px; margin: 10px 0; border-radius: 8px; background: #fafafa;">
        <h3 style="margin-top: 0; color: #333;">{label}</h3>
        <p style="color: #000;"><strong>Date:</strong> {year}</p>
        {location_html}
        {painter_html}
        {other_html}
        {parent_html}
        <div style="margin: 10px 0;">
            <strong style="color: #000;">Subjects:</strong><br>
            <div style="margin-top: 5px;">{subject_html}</div>
        </div>
        {coord_html}
        <p><a href="{painting_uri}" target="_blank" style="color: #0066cc;">üîó View in CbDD</a></p>
        <img src="{image_url}" style="max-width: {max_width}px; max-height: 500px; border-radius: 4px;" 
             onerror="this.onerror=null; this.src=''; this.alt='Image could not be loaded';">
    </div>
    """
    display(HTML(html))

print("‚úÖ Full display function defined: display_painting_full(row)")
print("   Shows: title, date, location, painter, related persons, hierarchy, subjects, coordinates, image")
print("\nüìã Property mapping (from CTO/NFDI ontology):")
if 'resolve_ontology_code' in dir():
    props = ['CTO_0001073', 'CTO_0001026', 'CTO_0001011', 'CTO_0001009', 'CTO_0001019']
    for code in props:
        resolved = resolve_ontology_code(code)
        print(f"   {code}: {resolved['label']}")

‚úÖ Full display function defined: display_painting_full(row)
   Shows: title, date, location, painter, related persons, hierarchy, subjects, coordinates, image


In [178]:
# Display paintings with full metadata
print("Displaying paintings with full metadata:")
print("  üèõÔ∏è Location (building/place) from CTO_0001011")
print("  üé® Painter names classified by GND profession")
print("  üë§ Other related persons (commissioners, patrons, etc.)")
print("  üì¶ Hierarchy info (part-of relations)")
print("  üîµ Getty AAT | üü¢ ICONCLASS subjects")
print("="*70 + "\n")

# Display top paintings that have painter info
for idx, row in df_enhanced[df_enhanced['painters'].notna()].head(5).iterrows():
    display_painting_full(row)
    time.sleep(0.2)

Displaying paintings with full metadata:
  üèõÔ∏è Location (building/place) from CTO_0001011
  üé® Painter names classified by GND profession
  üë§ Other related persons (commissioners, patrons, etc.)
  üì¶ Hierarchy info (part-of relations)
  üîµ Getty AAT | üü¢ ICONCLASS subjects



### Data Pipeline Summary

The notebook implements a complete data pipeline for enriching Baroque ceiling painting data, with **automatic ontology resolution** for all CTO/NFDI property codes.

| Step | Source | Data Retrieved |
|------|--------|----------------|
| 0. Ontology Resolution | GitHub (cto.ttl, nfdicore.ttl) | Human-readable labels for 267 CTO/NFDI codes |
| 1. Basic Query | NFDI4Culture KG | Title, year, image, coordinates, subjects, hierarchy |
| 2. GND Resolution | lobid.org API | Location names, person names (with profession classification) |
| 3. Subject Resolution | ICONCLASS/Getty SPARQL | Human-readable subject labels |
| 4. Geo Enrichment | Wikidata SPARQL | Missing coordinates from place names |

**üìã Schema Reference (auto-resolved from CTO/NFDI ontology):**

| Code | Label (from ontology) | Description |
|------|----------------------|-------------|
| `CTO_0001011` | has related location | Buildings/places ‚Üí `NFDI_0001006` ‚Üí GND URI |
| `CTO_0001009` | has related person | Painters, commissioners ‚Üí `NFDI_0001006` ‚Üí GND URI |
| `CTO_0001019` | has related item | Part-of relationships (hierarchy) |
| `CTO_0001026` | has external classifier | ICONCLASS/AAT subject codes |
| `CTO_0001073` | has creation period | Year/date of creation |
| `CTO_0001021` | has content url | Image URL |
| `NFDI_0001006` | has external identifier | Links to GND URIs |

**üé® Painter Classification:**
Persons from `CTO_0001009` are classified as painters if their GND `professionOrOccupation` contains keywords like: "Maler", "Malerin", "Freskenmaler", "Kirchenmaler"

**üîß Key Functions (use ontology resolver):**
- `resolve_ontology_code(code)` ‚Üí resolve CTO/NFDI codes to labels
- `resolve_property_name(uri)` ‚Üí human-readable property names
- `get_painting_metadata(uri)` ‚Üí all metadata with resolved property names
- `resolve_subject_from_sparql(uri)` ‚Üí ICONCLASS/AAT labels
- `resolve_gnd_uri(uri)` ‚Üí person/place names from GND

In [None]:
# =============================================================================
# Subject Resolution via External SPARQL Endpoints
# =============================================================================
# Resolves subject URIs from CTO_0001026 ("has external classifier") to labels
# using the official ICONCLASS and Getty AAT SPARQL endpoints.
#
# Integrates with the CTO/NFDI ontology resolver (cell 13) for consistent
# property name resolution throughout the notebook.

import requests
import time
from functools import lru_cache
import urllib.parse

@lru_cache(maxsize=500)
def query_iconclass_sparql(notation):
    """Query ICONCLASS SPARQL endpoint for a label."""
    try:
        # URL-decode the notation (e.g., "48C14%28SCHEINARCHITEKTUR%29" -> "48C14(SCHEINARCHITEKTUR)")
        notation_decoded = urllib.parse.unquote(notation)
        
        endpoint = "https://iconclass.org/sparql"
        query = f"""
        PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
        
        SELECT ?label
        WHERE {{
          <https://iconclass.org/{notation_decoded}> skos:prefLabel ?label .
          FILTER(LANG(?label) = "en")
        }}
        LIMIT 1
        """.strip()  # IMPORTANT: strip whitespace!
        
        resp = requests.get(
            endpoint,
            params={'query': query, 'format': 'json'},
            headers={'Accept': 'application/sparql-results+json'},
            timeout=10
        )
        if resp.ok:
            data = resp.json()
            bindings = data.get("results", {}).get("bindings", [])
            if bindings:
                return bindings[0].get("label", {}).get("value")
    except Exception as e:
        pass
    return None

@lru_cache(maxsize=500)
def query_getty_sparql(aat_id):
    """Query Getty AAT SPARQL endpoint for a label using gvp:prefLabelGVP."""
    try:
        endpoint = "http://vocab.getty.edu/sparql"
        # Getty uses gvp:prefLabelGVP/xl:literalForm for preferred labels
        # IMPORTANT: Must strip whitespace - Getty returns empty response if query has leading whitespace!
        query = f"""
PREFIX gvp: <http://vocab.getty.edu/ontology#>
PREFIX xl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX aat: <http://vocab.getty.edu/aat/>

SELECT ?label
WHERE {{
  aat:{aat_id} gvp:prefLabelGVP/xl:literalForm ?label .
}}
LIMIT 1
""".strip()
        
        resp = requests.get(
            endpoint,
            params={'query': query, 'format': 'json'},
            headers={'Accept': 'application/sparql-results+json'},
            timeout=10
        )
        if resp.ok and resp.text:  # Also check response is not empty
            data = resp.json()
            bindings = data.get("results", {}).get("bindings", [])
            if bindings:
                return bindings[0].get("label", {}).get("value")
    except Exception as e:
        pass
    return None

def resolve_subject_from_sparql(uri):
    """
    Resolve a subject URI to its label using external SPARQL endpoints.
    
    Handles subjects from CTO_0001026 ("has external classifier"):
    - ICONCLASS: iconographic classification for art
    - Getty AAT: Art & Architecture Thesaurus
    
    Args:
        uri: Subject URI (e.g., 'https://iconclass.org/92D1521' or 'http://vocab.getty.edu/aat/300004792')
    
    Returns:
        dict with 'uri', 'code', 'label', 'source', 'resolved' keys
    """
    code = uri.split('/')[-1]
    
    if 'iconclass.org' in uri:
        label = query_iconclass_sparql(code)
        source = 'ICONCLASS'
    elif 'vocab.getty.edu' in uri:
        label = query_getty_sparql(code)
        source = 'Getty AAT'
    else:
        label = None
        source = 'Unknown'
    
    return {
        'uri': uri,
        'code': code,
        'label': label or f'[{code}]',
        'source': source,
        'resolved': label is not None
    }

# Test with sample codes
print("Testing external SPARQL endpoints for subject resolution...")
print("="*70)
print(f"\nSubjects come from CTO_0001026", end="")
if 'resolve_ontology_code' in dir():
    resolved = resolve_ontology_code('CTO_0001026')
    print(f" ({resolved['label']})")
else:
    print(" (has external classifier)")

print("\n1. ICONCLASS tests:")
for code in ["92D1521", "25HH", "5"]:
    label = query_iconclass_sparql(code)
    print(f"   {code}: {label}")

print("\n2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):")
for code in ["300004792", "300411453"]:
    label = query_getty_sparql(code)
    print(f"   {code}: {label}")

print("\n" + "="*70)
print("‚úÖ Functions defined:")
print("   - resolve_subject_from_sparql(uri) -> resolve ICONCLASS/AAT URIs to labels")
print("   - query_iconclass_sparql(notation) -> query ICONCLASS endpoint")
print("   - query_getty_sparql(aat_id) -> query Getty AAT endpoint")
print("\nThese integrate with CTO_0001026 ('has external classifier') property.")

Testing external SPARQL endpoints...

1. ICONCLASS tests:
   92D1521: Cupid shooting a dart
   25HH: landscapes - HH - ideal landscapes
   5: Abstract Ideas and Concepts

2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):
   25HH: landscapes - HH - ideal landscapes
   5: Abstract Ideas and Concepts

2. Getty AAT tests (using gvp:prefLabelGVP/xl:literalForm):
   300004792: buildings (structures)
   300004792: buildings (structures)
   300411453: ceiling paintings

‚úÖ Functions defined: resolve_subject_from_sparql(uri)
   300411453: ceiling paintings

‚úÖ Functions defined: resolve_subject_from_sparql(uri)


## 4. Compare CbDD and Color Slide Archive of Wall and Ceiling Painting

Portal IDs from the registry:
- CbDD: `n4c:E4264`
- Color Slide Archive: `n4c:E4267`

Goal: Count how many records in the KG come from each of these portals.

We assume a pattern similar to:
- `?item schema:isPartOf ?feed`
- `?feed schema:isPartOf ?portal` or `?feed dcterms:isPartOf ?portal`

You may have to adjust the property in the middle depending on what you see in the inspection of the feed nodes.

In [182]:
query_ceiling_portal_counts = """\
SELECT ?portal ?portalLabel (COUNT(DISTINCT ?item) AS ?records)
WHERE {
  VALUES ?portal { n4c:E4264  n4c:E4267 }

  # feed belongs to one of the two portals
  ?feed ?isPartOfPortal ?portal .
  FILTER(?isPartOfPortal IN (schema:isPartOf, dcterms:isPartOf))

  # items belong to that feed
  ?item schema:isPartOf ?feed .

  ?portal schema:name ?portalLabel .
}
GROUP BY ?portal ?portalLabel
ORDER BY DESC(?records)
"""

df_ceiling_portal_counts = run_sparql(query_ceiling_portal_counts)
df_ceiling_portal_counts

In [181]:
# Simple bar chart of records per portal (CbDD vs Color Slide Archive)
if not df_ceiling_portal_counts.empty:
    plt.figure(figsize=(6, 4))
    plt.bar(df_ceiling_portal_counts["portalLabel"], df_ceiling_portal_counts["records"].astype(int))
    plt.xticks(rotation=20, ha="right")
    plt.ylabel("Number of records in KG")
    plt.title("Records from baroque wall & ceiling painting portals")
    plt.tight_layout()
    plt.show()
else:
    print("No results yet. Check if the intermediate predicate (?isPartOfPortal) is correct.")

No results yet. Check if the intermediate predicate (?isPartOfPortal) is correct.
