## Data Normalization and Metadata Preparation
In this section, I normalize event titles using string operations. The raw JSON data, located in the folder `Ibsenstage_raw`, is flattened with `pandas.json_normalize()` for easier manipulation.
I remove the `venuecountry` column, since the dataset covers only performances in Norway. To standardize event names, I build a canonical list of titles from the `worktitle` field, then construct regular expressions that match common title variants. Some well-known plays (e.g. `Et dukkehjem`) have additional hardcoded variant patterns (e.g. "Nora", "Casa di bambola").
The `eventname` field is then normalized by matching against these compiled regex patterns.
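
The matching idea can be sketched in isolation as follows. This is a minimal illustration, assuming a hypothetical two-title canonical list; the helper name `build_patterns` and the two-argument `normalize_title` are illustrative and do not reproduce the full configuration used in the cell below.

```python
import re

def build_patterns(titles, extra_variants):
    """Return a list of (compiled_regex, canonical_title) pairs."""
    patterns = []
    for title in titles:
        # re.escape() renders a space as "\ "; replace it so that spaces,
        # underscores, hyphens (or nothing) are all accepted between words.
        safe = re.escape(title).replace(r"\ ", r"[\s_-]*")
        patterns.append((re.compile(f"^{safe}$", re.IGNORECASE), title))
    for canon, variants in extra_variants.items():
        for pat in variants:
            patterns.append((re.compile(pat, re.IGNORECASE), canon))
    return patterns

def normalize_title(text, patterns):
    """Return the canonical title of the first matching pattern, else the input."""
    candidate = text.strip()
    for regex, canon in patterns:
        if regex.match(candidate):
            return canon
    return text

# Hypothetical sample data, for illustration only
patterns = build_patterns(
    ["Et dukkehjem", "Hedda Gabler"],
    {"Et dukkehjem": [r"^nora$", r"^casa[\s_-]*di[\s_-]*bambola$"]},
)
print(normalize_title("et_dukkehjem", patterns))
print(normalize_title("Casa di Bambola", patterns))
print(normalize_title("Rosmersholm", patterns))
```

Titles that match no pattern are returned unchanged, so unknown event names are never overwritten.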

In [1]:
from pathlib import Path

# Create and set the default output directory
STAGED_DIR = Path.cwd() / "Ibsenstage_staged"
STAGED_DIR.mkdir(parents=True, exist_ok=True)

# Helper that builds output paths inside STAGED_DIR
def stage_path(filename):
    return STAGED_DIR / filename

# Normalization of event titles with compiled regex patterns
import json, re
import pandas as pd

# 1) Locate the input file (within Ibsenstage_raw folder)
raw_dir = Path.cwd() / 'Ibsenstage_raw'
matches = list(raw_dir.rglob('IbsenStage_scrape.json'))
if not matches:
    raise FileNotFoundError('IbsenStage_scrape.json not found in Ibsenstage_raw folder.')
src = matches[0]
print('Using source file:', src)

# 2) Ensure output folder exists
staged_dir = Path.cwd() / 'Ibsenstage_staged'
staged_dir.mkdir(exist_ok=True)
json_out = staged_dir / 'IbsenStage_normalized.json'

# 3) Load & flatten JSON
with open(src, 'r', encoding='utf-8') as f:
    root = json.load(f)
records = root.get('hits', root)
ibsen_df = pd.json_normalize(records, sep='_')

# removing 'venuecountry'
if 'venuecountry' in ibsen_df.columns:
    ibsen_df = ibsen_df.drop(columns=['venuecountry'])

# 4) Canonical regex patterns for work titles
unique_titles = (
    ibsen_df['worktitle']
    .dropna()
    .astype(str)
    .str.strip()
    .sort_values()
    .unique()
)
canonical = {}
for title in unique_titles:
    # re.escape() renders a space as "\ "; let it match spaces, underscores or hyphens
    safe = re.escape(title).replace(r'\ ', r'[\s_-]*')
    canonical[title] = ['^' + safe + '$']
# Raw strings keep \s as the regex whitespace class
extra_variants = {
    'Et dukkehjem'   : [r'^a doll.*house$', r'^ett[\s_-]*dockhem$', r'^casa[\s_-]*di[\s_-]*bambola$', r'^nora$'],
    'Gjengangere'    : [r'^ghosts$', r'^spettri$'],
    'En folkefiende' : [r'^an enemy.*people$'],
    'Vildanden'      : [r'^the[\s_-]*wild[\s_-]*duck$'],
}
for canon, pats in extra_variants.items():
    canonical.setdefault(canon, []).extend(pats)

# 5) Compile all patterns and build matcher
pattern_map = [
    (re.compile(pat, re.IGNORECASE), canon)
    for canon, pats in canonical.items()
    for pat in pats
]

def normalize_title(txt):
    # Leave missing values untouched
    if pd.isna(txt):
        return txt
    low = str(txt).strip().lower()
    # Return the canonical title of the first pattern that matches
    for pat, canon in pattern_map:
        if pat.match(low):
            return canon
    return txt

# 6) Normalize eventname IN PLACE
ibsen_df['eventname'] = ibsen_df['eventname'].apply(normalize_title)

# 7) Save updated JSON to staged folder
ibsen_df.to_json(json_out, orient='records', force_ascii=False, indent=2)
print('Saved normalized JSON →', json_out)

# 8) Preview
preview = json.loads(ibsen_df.head(3).to_json(orient='records', force_ascii=False))
print('Preview:')
print(json.dumps(preview, ensure_ascii=False, indent=2))

Using source file: c:\Users\Cristiano (CC)\Desktop\Cristiano-June25\OsloMet\Masterstudium i bibliotek- og informasjonsvitenskap - deltid\MBIB4140 - Metadata og interoperabilitet\2ndre sjansen\Ibsenstage_raw\IbsenStage_scrape.json
Saved normalized JSON → c:\Users\Cristiano (CC)\Desktop\Cristiano-June25\OsloMet\Masterstudium i bibliotek- og informasjonsvitenskap - deltid\MBIB4140 - Metadata og interoperabilitet\2ndre sjansen\Ibsenstage_staged\IbsenStage_normalized.json
Preview:
[
  {
    "eventname": "Hedda Gabler",
    "eventid": 85542,
    "first_date": "1983-11-12",
    "workid": 8547.0,
    "worktitle": "Hedda Gabler",
    "venueid": 14985,
    "venuename": "Honningsvåg kino"
  },
  {
    "eventname": "Hedda Gabler",
    "eventid": 85543,
    "first_date": "1983-11-14",
    "workid": 8547.0,
    "worktitle": "Hedda Gabler",
    "venueid": 14981,
    "venuename": "Vadsø kino"
  },
  {
    "eventname": "Hedda Gabler",
    "eventid": 85544,
    "first_date": "1983-11-15",
    "workid": 

Here I install GeoPy, which the following cells use for geocoding.

In [2]:
import sys
!{sys.executable} -m pip install geopy==2.4.1


Defaulting to user installation because normal site-packages is not writeable


### Loading and Preparing the Dataset
The dataset is loaded and prepared with the necessary libraries, so that transformations are applied in a controlled and repeatable environment. I follow the FAIR principles, with particular attention to reusability and interoperability. This cell adds the cities connected to the venues, populating over 60% of the `venuecity` keys with actual city names thanks to GeoPy and GeoNames.
To increase accuracy, if the name of a city, or a variation of it, appears in the key `venuename`, it is mapped to the new key `venuecity` (e.g. "Teater i Trondheim" gives "Trondheim").

Here I also try to resolve some inconsistencies, especially those related to Oslo, so that `venuecity` points back to that city for variants such as Kristiania or Nationaltheatret. I also include some further common overrides to improve coverage. Having `venuecity` populated matters as a fallback when I later map `venueid` to URIs.
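
As a rough sketch of these two name-based steps (overrides first, then a token match against known cities), assuming a small hypothetical override table and city set; the function name `guess_city` and the sample values are illustrative only.

```python
import re
import unicodedata

def normalize(text):
    """Strip accents, lowercase, and keep only a-z and 0-9."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]", "", no_marks.lower())

# Hypothetical subsets, for illustration only
OVERRIDES = {
    "nationaltheatret": "Oslo",
    "kristiania": "Oslo",
    "den nationale scene": "Bergen",
}
NORMALIZED_OVERRIDES = {normalize(k): v for k, v in OVERRIDES.items()}
KNOWN_CITIES = {"Oslo", "Bergen", "Trondheim"}

def guess_city(venue_name):
    """Return a city from the overrides or from a city token in the name."""
    norm = normalize(venue_name)
    # 1. Override keys are matched as substrings of the normalized name
    for key, city in NORMALIZED_OVERRIDES.items():
        if key in norm:
            return city
    # 2. Otherwise look for a known city among the name's tokens
    for token in re.split(r"[,/()\-\s]+", venue_name.lower()):
        if token.capitalize() in KNOWN_CITIES:
            return token.capitalize()
    return None

print(guess_city("Kristiania Theater"))
print(guess_city("Teater i Trondheim"))
print(guess_city("Ukjent scene"))
```

Substring matching on normalized names is deliberately permissive, so short override keys can produce false positives and are worth reviewing by hand.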

I initialize a persistent cache (`venue_geocode_cache.pkl`) to store previously resolved venue-to-city mappings. A separate list of Norwegian place names from the GeoNames API is also cached to avoid repeated calls. Unicode normalization is applied to venue names to reduce inconsistencies due to accents or formatting, and the resolved cities are added to the dataset.
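
The caching pattern can be illustrated on its own. This is a minimal sketch, assuming hypothetical helper names (`load_cache`, `save_cache`, `cached_lookup`) and a stand-in resolver instead of a real geocoder.

```python
import pickle
from pathlib import Path

def load_cache(path):
    """Load a pickled dict from disk, or start empty if the file is missing."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    return {}

def save_cache(cache, path):
    """Write the cache dict back to disk."""
    with Path(path).open("wb") as f:
        pickle.dump(cache, f)

def cached_lookup(name, cache, resolver):
    """Call resolver(name) only for names not seen before.

    Failed lookups (None) are stored too, so they are not retried.
    """
    if name not in cache:
        cache[name] = resolver(name)
    return cache[name]

# Stand-in resolver that records how often it is called (illustration only)
calls = []
def fake_resolver(name):
    calls.append(name)
    return "Oslo" if "Oslo" in name else None

cache = {}
cached_lookup("Oslo Nye Teater", cache, fake_resolver)
cached_lookup("Oslo Nye Teater", cache, fake_resolver)  # served from the cache
cached_lookup("Ukjent scene", cache, fake_resolver)
cached_lookup("Ukjent scene", cache, fake_resolver)     # None is cached as well
print(len(calls))
```

Only load pickle files that you created yourself, since unpickling untrusted data can execute arbitrary code.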

In [3]:
import os
import re
import json
import pickle
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import urllib.request, urllib.parse
from urllib.error import HTTPError, URLError
import unicodedata
from rapidfuzz import process, fuzz
import time

# Configuration
DATA_FOLDER     = "Ibsenstage_staged"
CACHE_PATH      = os.path.join(DATA_FOLDER, "venue_geocode_cache.pkl")
INPUT_FILE      = os.path.join(DATA_FOLDER, "IbsenStage_normalized.json")
OUTPUT_FILE     = os.path.join(DATA_FOLDER, "IbsenStage_with_city.json")
GEONAMES_USER   = "MBIB4140_ibsen_user"  # your GeoNames username
GEONAMES_COUNTRY = "NO"

# Nominatim Setup
geolocator = Nominatim(user_agent="ibsen_city_extractor", timeout=5)  # Reduced timeout
geocode    = RateLimiter(geolocator.geocode, min_delay_seconds=0.5, max_retries=1)  # Faster rate limiting

# Cache
if os.path.exists(CACHE_PATH):
    with open(CACHE_PATH, "rb") as f:
        cache = pickle.load(f)
else:
    cache = {}

# Norway cities list (cached)
def load_norway_cities():
    cache_file = os.path.join(DATA_FOLDER, "norway_cities_cache.pkl")
    if os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    
    qs = urllib.parse.urlencode({
        "country": GEONAMES_COUNTRY,
        "featureClass": "P",
        "maxRows": 2000,
        "username": GEONAMES_USER
    })
    url = f"http://api.geonames.org/searchJSON?{qs}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
            cities = {item["name"] for item in data.get("geonames", [])}
            # Cache the cities list
            with open(cache_file, "wb") as f:
                pickle.dump(cities, f)
            return cities
    except Exception:
        return set()

norway_cities = load_norway_cities()

# Hard-coded venue → city overrides
OVERRIDES = {
    "nationaltheatret": "Oslo", "kristiania": "Oslo", "christiania": "Oslo",
    "det norske teatret": "Oslo", "black box teater": "Oslo", "oslo nye": "Oslo",
    "trøndelag teater": "Trondheim", "rosendal teater": "Trondheim",
    "den nationale scene": "Bergen", "dns": "Bergen", "hordaland teater": "Bergen",
    "kilden": "Kristiansand", "agder teater": "Kristiansand",
    "hålogaland teater": "Tromsø", "rogaland teater": "Stavanger",
    "teater innsikt": "Stavanger", "teater i drammen": "Drammen",
    "teater i fredrikstad": "Fredrikstad", "teater i moss": "Moss",
    "teater i ålesund": "Ålesund", "teater i bodø": "Bodø",
    "teater i tromsø": "Tromsø", "teater i sarpsborg": "Sarpsborg",
    "teater i skien": "Skien", "teater i hamar": "Hamar",
    "teater i sandnes": "Sandnes"
}

# Normalize text: strip accents, lowercase, keep only a-z and 0-9
def normalize(txt):
    txt = unicodedata.normalize('NFKD', txt)
    txt = "".join(c for c in txt if not unicodedata.combining(c))
    return re.sub(r'[^a-z0-9]', '', txt.lower())

# Create normalized override mapping
NORMALIZED_OVERRIDES = {normalize(k): v for k, v in OVERRIDES.items()}
NORMALIZED_OVERRIDE_KEYS = list(NORMALIZED_OVERRIDES.keys())

# Constants
CITY_KEYS = [
    "city", "town", "village", "municipality", "hamlet",
    "locality", "county", "state_district", "state",
    "region", "district", "suburb"
]

# Optimized city extraction logic
def get_city_for(venue_name: str) -> str | None:
    if not venue_name or pd.isna(venue_name):
        return None
    name = venue_name.strip()
    
    # 1. Check cache first (done for optimization)
    if name in cache:
        return cache[name]
    
    key = name.lower()
    norm = normalize(key)

    # 2. Hard overrides (using precomputed normalized keys)
    for norm_key, city in NORMALIZED_OVERRIDES.items():
        if norm_key in norm:
            cache[name] = city
            return city

    # 3. Fuzzy match (only if no direct match found)
    if len(norm) > 3:  # Skip very short names for fuzzy matching
        match, score, _ = process.extractOne(norm, NORMALIZED_OVERRIDE_KEYS, scorer=fuzz.partial_ratio)
        if score > 85:  # Slightly lower threshold for better performance
            matched_city = NORMALIZED_OVERRIDES[match]
            cache[name] = matched_city
            return matched_city

    # 4. Token match against known cities (before expensive API calls)
    for token in re.split(r'[,/()\-\s]+', key):
        token_cap = token.capitalize()
        if token_cap in norway_cities:
            cache[name] = token_cap
            return token_cap

    # 5. Geopy lookup (only for promising candidates)
    if len(name) > 2 and any(char.isalpha() for char in name):
        try:
            loc = geocode(f"{name}, Norway", addressdetails=True, country_codes="no")
            if loc and (addr := loc.raw.get("address")):
                for k in CITY_KEYS:
                    if k in addr:
                        cache[name] = addr[k]
                        return addr[k]
        except Exception:
            pass

    # 6. GeoNames fallback (only for very specific cases)
    if len(name) > 3 and name.count(' ') <= 2:  # Skip complex names
        try:
            qs = urllib.parse.urlencode({
                "q": name,
                "country": GEONAMES_COUNTRY,
                "maxRows": 1,
                "username": GEONAMES_USER
            })
            url = f"http://api.geonames.org/searchJSON?{qs}"
            with urllib.request.urlopen(url, timeout=3) as resp:  # Reduced timeout
                data = json.load(resp)
                if data.get("geonames"):
                    city = data["geonames"][0]["name"]
                    cache[name] = city
                    return city
        except Exception:
            pass

    # 7. Record failure
    cache[name] = None
    return None

# Load and process input
print("Loading input data...")
with open(INPUT_FILE, "r", encoding="utf-8") as f:
    records = json.load(f)

# Get unique venue names and filter out already cached ones
unique_names = sorted({rec.get("venuename") for rec in records if rec.get("venuename")})
uncached_names = [name for name in unique_names if name not in cache]

print(f"Total unique venues: {len(unique_names)}")
print(f"Already cached: {len(unique_names) - len(uncached_names)}")
print(f"Need to process: {len(uncached_names)}")

# Process only uncached names
start_time = time.time()
for idx, vn in enumerate(uncached_names, start=1):
    city = get_city_for(vn)
    elapsed = time.time() - start_time
    rate = idx / elapsed if elapsed > 0 else 0
    print(f"\r[{idx:4d}/{len(uncached_names):4d}] {vn[:30]:30s} → {city} ({rate:.1f}/s)", end="")
    
    # Save cache more frequently for long-running processes
    if idx % 50 == 0:
        with open(CACHE_PATH, "wb") as cf:
            pickle.dump(cache, cf)

print(f"\n✔ Processing complete in {time.time() - start_time:.1f} seconds")

# Annotate and save
print("Annotating records and saving...")
for rec in records:
    vn = rec.get("venuename") or ""
    rec["venuecity"] = get_city_for(vn) or ""

with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)
with open(CACHE_PATH, "wb") as cf:
    pickle.dump(cache, cf)

# Stats
filled = sum(1 for r in records if r.get("venuecity") and r["venuecity"].strip())
total_records = len(records)
print(f"venuecity filled for {filled} of {total_records} records ({(filled / total_records) * 100:.2f}%)")

Loading input data...
Total unique venues: 1302
Already cached: 1297
Need to process: 5
[   5/   5] Vigelandsstua ved Lindesnes un → None (0.0/s)
✔ Processing complete in 0.0 seconds
Annotating records and saving...
venuecity filled for 3245 of 4924 records (65.90%)


The script above geocodes each unique venue name only once; all 4,900+ records are then annotated using the cached lookups.

### Mapping Works and Venues to authoritative IDs

The next step is to link every `workid` and `venueid` to an authoritative Wikidata URI. To do so, I keep the original `workid` and `venueid` keys from IbsenStage and add the keys `workURI` and `venueURI` to the JSON. First I map the works, then the venues.

A SPARQL query is sent to the Wikidata endpoint to retrieve all known works (`wdt:P800`) attributed to Henrik Ibsen (`wd:Q36661`). Labels are filtered to include multiple languages (en, no, nb, nn) and normalized to lowercase for matching. The script then attempts to match each worktitle from the dataset to a Wikidata label in two passes:

1. Exact match based on normalized title.

2. Partial string match, where one normalized string contains the other (e.g. “Ghosts” matches “The Ghosts”).

If a match is found, a new field `workURI` is added with the corresponding Wikidata identifier (the QID taken from the end of the URI).
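
The two passes can be sketched without any network access. The example below assumes a hypothetical label table with made-up placeholder identifiers (they are not real Wikidata QIDs), and passes the table as an argument purely for illustration.

```python
def map_to_qid(title, label_to_qid):
    """Return the identifier for a title: exact match first, then substring."""
    if not title or not title.strip():
        return None
    key = title.lower().strip()

    # Pass 1: exact match on the normalized title
    if key in label_to_qid:
        return label_to_qid[key]

    # Pass 2: substring containment in either direction
    for label, qid in label_to_qid.items():
        if key in label or label in key:
            return qid
    return None

# Placeholder identifiers, NOT real Wikidata QIDs
labels = {
    "ghosts": "Q-EXAMPLE-1",
    "a doll's house": "Q-EXAMPLE-2",
}

print(map_to_qid("Ghosts", labels))      # exact match
print(map_to_qid("The Ghosts", labels))  # substring match
print(map_to_qid("Hamlet", labels))      # no match
```

The substring pass only bridges titles that share text; it cannot link a title to its translation, and very short labels may match unrelated titles, so the second-pass results are worth spot-checking.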

In [6]:
import json
import pandas as pd
import requests

# Load the dataset
with open('Ibsenstage_staged/IbsenStage_with_city.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

df = pd.DataFrame(data)

# Query Wikidata for Henrik Ibsen's notable works
def get_ibsen_works():
    sparql_query = """
    SELECT ?work ?label WHERE {
      wd:Q36661 wdt:P800 ?work .
      ?work rdfs:label ?label .
      FILTER(LANG(?label) IN ("en", "no", "nb", "nn"))
    }
    """
    url = "https://query.wikidata.org/sparql"
    headers = {'User-Agent': 'IbsenStage-Pipeline/1.0'}
    params = {'query': sparql_query, 'format': 'json'}
    response = requests.get(url, headers=headers, params=params)
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception(f"SPARQL query failed with status code {response.status_code}")

print("Fetching Ibsen works from Wikidata...")
wikidata_result = get_ibsen_works()
#print (wikidata_result.head(5))

# Build title → QID mapping from SPARQL result
wikidata_mapping = {}
for item in wikidata_result['results']['bindings']:
    uri = item['work']['value']
    qid = uri.split('/')[-1]  # Extract QID from URI (e.g., Q1432009)
    label = item['label']['value'].lower().strip()
    wikidata_mapping[label] = qid

print(f"Found {len(wikidata_mapping)} labels from Wikidata")
#wikidata_mapping.head(5)
# Function to map title to QID
def map_to_qid(title):
    if pd.isna(title) or not title.strip():
        return None
    title_norm = title.lower().strip()
    
    # Direct match
    if title_norm in wikidata_mapping:
        return wikidata_mapping[title_norm]
    
    # Partial match (substring containment in either direction)
    for label, qid in wikidata_mapping.items():
        if title_norm in label or label in title_norm:
            return qid
    
    return None

# Apply mapping to dataset
print("Mapping 'worktitle' to Wikidata QIDs...")
df['workURI'] = df['worktitle'].apply(map_to_qid)

# Save to file
output_path = 'Ibsenstage_staged/IbsenStage_with_wikidata_works.json'
with open(output_path, 'w', encoding='utf-8') as f:
    json.dump(df.to_dict(orient='records'), f, ensure_ascii=False, indent=2)

print(f"Saved updated file to: {output_path}")

# Statistics
mapped_count = df['workURI'].notna().sum()
total_count = len(df)
print(f"Successfully mapped {mapped_count} out of {total_count} work titles ({mapped_count/total_count*100:.1f}%)")

Fetching Ibsen works from Wikidata...
Found 54 labels from Wikidata
Mapping 'worktitle' to Wikidata QIDs...
Saved updated file to: Ibsenstage_staged/IbsenStage_with_wikidata_works.json
Successfully mapped 4905 out of 4924 work titles (99.6%)


Now I can proceed by connecting `venueURI` with an authoritative ID for the venues when available in Wikidata.

As expected, fetching and mapping authoritative URIs for all the venues is difficult, since many venues are either too small to have their own URI or are part of a larger building or venue. Some may no longer exist, as the data scraped from IbsenStage includes venues from the 19th century.

Since not all venues can be found in Wikidata, I fall back to mapping the city's URI for venues that have no match. The city URI is stored in a new key called `cityURI`. The code therefore follows a two-step strategy:

1. It first attempts to resolve the venuename to a venueURI using the Wikidata Search API (`wbsearchentities`).

2. If no match is found, it tries to resolve the associated venuecity instead, storing the result in a separate field, cityURI.

The process uses a thread pool (`ThreadPoolExecutor`) to perform multiple lookups concurrently (up to 5 at a time), while throttling requests using a brief delay to avoid overloading Wikidata’s servers. Any failed or timed-out queries are caught and logged for debugging.
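
The venue-then-city fallback and the thread pool can be sketched offline as follows. This is an illustration under stated assumptions: the lookup is a stand-in dictionary with placeholder identifiers rather than the Wikidata API, and the names `resolve`, `resolve_all` and `FAKE_INDEX` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.01      # assumed pause before each lookup, in seconds
MAX_WORKERS = 5   # number of concurrent lookups

def resolve(entry, lookup):
    """Try the venue name first, then fall back to the city."""
    venue = (entry.get("venuename") or "").strip()
    city = (entry.get("venuecity") or "").strip()
    result = dict(entry)  # copy so the input record is left untouched
    venue_id = lookup(venue) if venue else None
    if venue_id:
        result["venueURI"] = venue_id
    elif city:
        city_id = lookup(city)
        if city_id:
            result["cityURI"] = city_id
    return result

def resolve_all(entries, lookup):
    """Resolve all entries concurrently, keeping the input order."""
    def throttled(term):
        time.sleep(DELAY)  # each worker pauses before its own request
        return lookup(term)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(lambda e: resolve(e, throttled), entries))

# Stand-in for the search API; placeholder identifiers, not real QIDs
FAKE_INDEX = {"Nationaltheatret": "Q-EXAMPLE-VENUE", "Bergen": "Q-EXAMPLE-CITY"}

entries = [
    {"venuename": "Nationaltheatret", "venuecity": "Oslo"},
    {"venuename": "Ukjent scene", "venuecity": "Bergen"},
    {"venuename": "Ukjent scene", "venuecity": ""},
]
resolved = resolve_all(entries, FAKE_INDEX.get)
for row in resolved:
    print(row)
```

Note that a per-worker delay only slows each thread; with five workers the overall request rate can still be up to five times higher than the delay alone suggests.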

(This cell may take up to 7 minutes to run.)

In [8]:
import json, urllib.request, urllib.parse, time, logging, asyncio
from concurrent.futures import ThreadPoolExecutor, as_completed

# Config
INPUT = stage_path("IbsenStage_with_wikidata_works.json")  # written to the staged folder by the previous cell
OUTPUT = "IbsenStage_with_uris.json"
USER_AGENT = "VenueCityWikidataLinker/1.0"
MAX_WORKERS = 5
DELAY = 0.1

# Logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logger = logging.getLogger()

def query_wikidata(search_term):
    # Brief pause before each request to avoid overloading Wikidata
    time.sleep(DELAY)
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "format": "json",
        "language": "en",
        "search": search_term,
        "limit": 1,
        "type": "item"
    })
    url = f"https://www.wikidata.org/w/api.php?{params}"
    try:
        req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req, timeout=10) as res:
            result = json.loads(res.read())
            if result.get("search"):
                return result["search"][0]["id"]
    except Exception as e:
        logger.debug(f"Wikidata query failed for '{search_term}': {e}")
    return None

def resolve_uris(entry):
    venue = (entry.get("venuename") or "").strip()
    city = (entry.get("venuecity") or "").strip()
    # Step 1: try the venue itself
    venue_uri = query_wikidata(venue) if venue else None
    if venue_uri:
        entry["venueURI"] = venue_uri
    else:
        # Step 2: fall back to the venue's city
        city_uri = query_wikidata(city) if city else None
        if city_uri:
            entry["cityURI"] = city_uri
    return entry

# Load and process
with open(INPUT, encoding="utf-8") as f:
    data = json.load(f)

# The lookups are blocking calls, so plain functions in a thread pool suffice;
# executor.map also keeps the results in the original record order
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    results = list(executor.map(resolve_uris, data))

with open(stage_path(OUTPUT), "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

print("Mapping complete. Output saved to:", OUTPUT)


Mapping complete. Output saved to: IbsenStage_with_uris.json


Now that both the work and venue IDs and their URIs are present, I can bridge them in a single file.

In [None]:
# Load the enriched data
with open('Ibsenstage_staged/IbsenStage_with_uris.json', encoding='utf-8') as f:
    data = json.load(f)
df = pd.DataFrame(data)

# Create bridge mappings
work_bridge = (
    df[['workid', 'workURI']]
    .dropna()
    .drop_duplicates()
    .set_index('workid')['workURI']
    .to_dict()
)

venue_bridge = (
    df[['venueid', 'venueURI']]
    .dropna()
    .drop_duplicates()
    .set_index('venueid')['venueURI']
    .to_dict()
)

# Combine into one object
bridge = {
    'work_bridge': work_bridge,
    'venue_bridge': venue_bridge
}

# Ensure staged directory exists
staged_dir = Path.cwd() / 'Ibsenstage_staged'
staged_dir.mkdir(exist_ok=True)

# Save the mapping
output_path = staged_dir / 'id_to_uri_bridge.json'
with open(output_path, 'w', encoding='utf-8') as f:
    json.dump(bridge, f, ensure_ascii=False, indent=2)

print(f"ID-to-URI bridge saved to {output_path}")

✅ ID-to-URI bridge saved to c:\Users\Cristiano (CC)\Desktop\Cristiano-June25\OsloMet\Masterstudium i bibliotek- og informasjonsvitenskap - deltid\MBIB4140 - Metadata og interoperabilitet\2ndre sjansen\Ibsenstage_staged\id_to_uri_bridge.json


### Prerequisites Setup
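This cell sets up the geocoding prerequisites in one place: the Nominatim geocoder with rate limiting, the on-disk cache, and a revised `get_city_for` that adds a reverse-geocoding fallback. Keeping the environment setup explicit supports reproducibility in a production-style pipeline.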

In [None]:
# ─── Prereqs ──────────────────────────────────────────────────────────────────
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter
import pandas as pd
import pickle, os

# 1) Geocoder + rate-limiting
geolocator = Nominatim(user_agent="ibsen_city_extractor", timeout=10)
geocode    = RateLimiter(geolocator.geocode,   min_delay_seconds=1, max_retries=2)
reverse    = RateLimiter(geolocator.reverse,   min_delay_seconds=1, max_retries=2)

# 2) On-disk cache
CACHE = "venue_geocode_cache.pkl"
if os.path.exists(CACHE):
    with open(CACHE, "rb") as f:
        cache = pickle.load(f)
else:
    cache = {}

# 3) City-level keys, in priority order
CITY_KEYS = [
    "city","town","village","municipality",
    "hamlet","locality","county","state_district",
    "state","region","district","suburb"
]

# 4) Optional: load an exhaustive list of Norwegian municipalities
#    (download from Kartverket or any public CSV).
#    e.g. muni_df = pd.read_csv("norway_municipalities.csv")["municipality"].tolist()
municipalities = set()  # fill this if you have a CSV

# ─── New get_city_for ─────────────────────────────────────────────────────────
def get_city_for(venue_name: str) -> str | None:
    if pd.isna(venue_name) or not venue_name.strip():
        return None
    name = venue_name.strip()

    # a) cache hit?
    if name in cache:
        return cache[name]

    city = None

    # b) forward geocode
    try:
        q = f"{name}, Norway"
        loc = geocode(q, addressdetails=True, country_codes="no")
        if loc and "address" in loc.raw:
            addr = loc.raw["address"]
            for key in CITY_KEYS:
                if key in addr:
                    city = addr[key]
                    break

        # c) reverse geocode fallback (if forward gave us coords but no city)
        if city is None and loc:
            # Nominatim.reverse() takes no country_codes argument
            rev = reverse((loc.latitude, loc.longitude), addressdetails=True)
            if rev and "address" in rev.raw:
                for key in CITY_KEYS:
                    if key in rev.raw["address"]:
                        city = rev.raw["address"][key]
                        break

        # d) display_name parsing: split on commas, look for known muni
        if city is None and loc and loc.raw.get("display_name"):
            parts = [p.strip() for p in loc.raw["display_name"].split(",")]
            # check last few parts
            for part in parts[-4:]:
                if part in municipalities:
                    city = part
                    break

    except Exception:
        city = None

    # e) record result (even if None) so I don’t retry
    cache[name] = city
    return city
