# Query Iceberg Tables with DuckDB

This notebook uses **PyIceberg** (pointed at the Nessie REST catalog) to resolve each table's current metadata path,
then passes that path to **DuckDB**'s native `iceberg_scan` function to run SQL directly against the data files in MinIO.

| Layer | Tool |
|-------|------|
| Catalog | Nessie REST (`localhost:19120`) |
| Storage | MinIO (`minio-api.andylile.com`) |
| Query engine | DuckDB + `iceberg` + `httpfs` extensions |

In [27]:
%pip install duckdb pyiceberg s3fs pandas --quiet

Note: you may need to restart the kernel to use updated packages.


## 1 — Configuration

Edit the values below to match your environment. `MINIO_ACCESS_KEY` and `MINIO_SECRET_KEY`
are the same credentials used in `.env`.

In [18]:
# ── Nessie ────────────────────────────────────────────────────────────────────
NESSIE_URI      = "http://localhost:19120/iceberg"   # exposed port in docker-compose.override.yml

# ── MinIO ─────────────────────────────────────────────────────────────────────
MINIO_ENDPOINT  = "https://minio-api.andylile.com"
MINIO_ACCESS_KEY = "LwL2a0Lic5JYaXvxl06X"
MINIO_SECRET_KEY = "v60a4nh1nebmrbFbwoS6AFcaCQYd3G5Gu11qH5Kf"

# ── Iceberg warehouse / namespace ─────────────────────────────────────────────
WAREHOUSE       = "themeparks"    # matches nessie.catalog.default-warehouse
NAMESPACE       = "silver"        # actual namespace used by the DAGs
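
Since the MinIO keys already live in `.env`, one option is to read them from environment variables rather than pasting them into the notebook. This is a minimal sketch: the helper name `require_env` is hypothetical, and it assumes the variables are exported under the same names used in this cell (for example by the shell or a dotenv loader).

```python
import os
from typing import Optional

def require_env(name: str, default: Optional[str] = None) -> str:
    """Return an environment variable, failing loudly if it is unset."""
    value = os.environ.get(name, default)
    if value is None:
        raise RuntimeError(f"Set {name} before running this notebook")
    return value

# Usage in the configuration cell (the variable names are assumptions):
# MINIO_ACCESS_KEY = require_env("MINIO_ACCESS_KEY")
# MINIO_SECRET_KEY = require_env("MINIO_SECRET_KEY")
```

Failing at configuration time gives a clearer error than an S3 authentication failure several cells later.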


## 2 — Connect to Nessie catalog via PyIceberg

In [19]:
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "nessie",
    **{
        "type": "rest",
        "uri": NESSIE_URI,
        "s3.endpoint": MINIO_ENDPOINT,
        "s3.access-key-id": MINIO_ACCESS_KEY,
        "s3.secret-access-key": MINIO_SECRET_KEY,
        "s3.path-style-access": "true",
    },
)

print("Namespaces:", catalog.list_namespaces())

Namespaces: [('silver',)]


In [20]:
# List tables in the namespace
tables = catalog.list_tables(NAMESPACE)
print(f"Tables in '{NAMESPACE}':")
for t in tables:
    print(" ", t)

Tables in 'silver':
  ('silver', 'destinations')


In [21]:
# Diagnostics — list everything Nessie knows about
print("=== Namespaces ===")
namespaces = catalog.list_namespaces()
if not namespaces:
    print("  (none — Nessie catalog is empty)")
for ns in namespaces:
    print(f"  {ns}")
    tables = catalog.list_tables(ns)
    if tables:
        for t in tables:
            print(f"    └─ {t[-1]}")  # last element is the table name, even for nested namespaces
    else:
        print("      (no tables)")


=== Namespaces ===
  ('silver',)
    └─ destinations


## 3 — Set up DuckDB with S3 / Iceberg support

In [22]:
import duckdb

con = duckdb.connect()

# Install and load required extensions
con.execute("INSTALL httpfs;   LOAD httpfs;")
con.execute("INSTALL iceberg;  LOAD iceberg;")

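# Point DuckDB at MinIO. DuckDB expects host[:port] without a scheme,
# so strip the scheme and derive the SSL flag from the configured URL.
s3_host = MINIO_ENDPOINT.replace("https://", "").replace("http://", "")
s3_use_ssl = str(MINIO_ENDPOINT.startswith("https://")).lower()
con.execute(f"""
    SET s3_endpoint          = '{s3_host}';
    SET s3_access_key_id     = '{MINIO_ACCESS_KEY}';
    SET s3_secret_access_key = '{MINIO_SECRET_KEY}';
    SET s3_use_ssl           = {s3_use_ssl};
    SET s3_url_style         = 'path';
    SET s3_region            = 'us-east-1';
""")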

print("DuckDB ready")

DuckDB ready
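
As an alternative to the individual `SET s3_*` statements, recent DuckDB releases also accept S3 credentials through `CREATE SECRET`. The sketch below only builds the statement text; it assumes DuckDB 0.10 or newer and an endpoint URL that includes its scheme. The function name `s3_secret_sql` and the secret name `minio` are hypothetical.

```python
from urllib.parse import urlsplit

def s3_secret_sql(endpoint_url: str, key_id: str, secret: str,
                  region: str = "us-east-1", name: str = "minio") -> str:
    """Build a CREATE SECRET statement for an S3-compatible endpoint."""
    parts = urlsplit(endpoint_url)            # expects a URL with http:// or https://
    host = parts.netloc                       # host[:port], scheme removed
    use_ssl = "true" if parts.scheme == "https" else "false"

    def esc(text: str) -> str:
        # Double single quotes so the value is a valid SQL string literal.
        return text.replace("'", "''")

    return (
        f"CREATE OR REPLACE SECRET {name} (\n"
        f"    TYPE s3,\n"
        f"    KEY_ID '{esc(key_id)}',\n"
        f"    SECRET '{esc(secret)}',\n"
        f"    ENDPOINT '{esc(host)}',\n"
        f"    URL_STYLE 'path',\n"
        f"    USE_SSL {use_ssl},\n"
        f"    REGION '{esc(region)}'\n"
        f");"
    )

# Usage with the connection created above:
# con.execute(s3_secret_sql(MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY))
```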


## 4 — Helper: resolve Iceberg metadata path from Nessie

In [23]:
import pandas as pd

def metadata_location(namespace: str, table_name: str) -> str:
    """Return the s3:// path to the table's current Iceberg metadata JSON file."""
    tbl = catalog.load_table(f"{namespace}.{table_name}")
    return tbl.metadata_location

def query_table(namespace: str, table_name: str, sql_where: str = "") -> pd.DataFrame:
    """Query an Iceberg table with DuckDB's iceberg_scan and return a pandas DataFrame.

    `sql_where` is interpolated verbatim, so pass only trusted filter expressions.
    """
    path = metadata_location(namespace, table_name)
    where = f"WHERE {sql_where}" if sql_where else ""
    return con.execute(f"SELECT * FROM iceberg_scan('{path}') {where}").df()

print("Helper functions defined")

Helper functions defined
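
`query_table` always selects every column and every row. When only a few columns or a preview are needed, a small query builder can add a projection and a `LIMIT` before the result is converted to pandas. This is a sketch, not part of the original helpers: `scan_sql` is a hypothetical name, and the `s3://...` path in the usage note is a placeholder.

```python
from typing import Optional, Sequence

def scan_sql(path: str, columns: Optional[Sequence[str]] = None,
             where: str = "", limit: Optional[int] = None) -> str:
    """Build a SELECT over iceberg_scan with optional projection, filter, and limit."""
    if columns:
        # Double-quote each column name and escape embedded double quotes.
        cols = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    else:
        cols = "*"
    # Escape single quotes so the path is a valid SQL string literal.
    sql = f"SELECT {cols} FROM iceberg_scan('" + path.replace("'", "''") + "')"
    if where:
        sql += f" WHERE {where}"      # still raw SQL: pass only trusted filter text
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql

# Usage with the helpers above:
# path = metadata_location(NAMESPACE, "live_data")
# con.execute(scan_sql(path, ["name", "status"], "status = 'DOWN'", 20)).df()
```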


## 5 — Query the tables

### Destinations

In [28]:
df_destinations = query_table(NAMESPACE, "destinations")
print(f"{len(df_destinations)} rows")
df_destinations.head(10)

96 rows


|   | id | name | slug | parks | ingest_timestamp |
|---|----|------|------|-------|------------------|
| 0 | 259cf011-6195-42dd-bfdb-640969e0bfb9 | Guangzhou Chimelong Tourist Resort | chimelongguangzhou | [{'id': '73436fe5-1f14-400f-bfbf-ab6766269e70'... | 2026-02-20T04:15:59.508205+00:00 |
| 1 | 9c6a0987-e519-4d6e-b011-e6c47a60641b | Walibi Holland | walibiholland | [{'id': '18635b3e-fa23-4284-89dd-9fcd0aaa9c9c'... | 2026-02-20T04:15:59.508205+00:00 |
| 2 | 6cc48df2-f126-4f28-905d-b4c2c15765f2 | Parc Asterix | parcasterix | [{'id': '9e938687-fd99-46f3-986a-1878210378f8'... | 2026-02-20T04:15:59.508205+00:00 |
| 3 | c0eddd5b-da82-4161-9a5f-2eb4ab5f82e7 | Plopsaland Belgium | plopsaland-de-panne | [{'id': 'f0ea9b9c-1ccb-4860-bfe6-b5aea7e4db2b'... | 2026-02-20T04:15:59.508205+00:00 |
| 4 | 8d8a8cc7-4523-4437-8bb6-5c87a26ba5ce | Walibi Belgium | walibibelgium | [{'id': '21897ba2-cc63-460e-99b0-fcd4ca7c18ef'... | 2026-02-20T04:15:59.508205+00:00 |
| 5 | 8fba5a14-8d04-455c-acf8-eccaaa0f58d9 | Silver Dollar City | silverdollarcity | [{'id': 'd21fac4f-1099-4461-849c-0f8e0d6e85a6'... | 2026-02-20T04:15:59.508205+00:00 |
| 6 | f9497403-adf3-4409-bd79-bb5b54000e45 | Walibi Rhône-Alpes | walibirhonealpes | [{'id': '28aee1df-1d05-4f53-bbf5-08f7aabff3a1'... | 2026-02-20T04:15:59.508205+00:00 |
| 7 | 6c3cd0cc-57b5-431b-926c-2658e8104057 | Dollywood | dollywood | [{'id': '7502308a-de08-41a3-b997-961f8275ab3c'... | 2026-02-20T04:15:59.508205+00:00 |
| 8 | 0257ff9f-c73c-4855-b5b4-774755c4d146 | Phantasialand | phantasialand | [{'id': 'abb67808-61e3-49ef-996c-1b97ed64fac6'... | 2026-02-20T04:15:59.508205+00:00 |
| 9 | ae0cd07c-87f3-41d9-a825-0fded65d626c | LEGOLAND Japan | legolandjapanresort | [{'id': '0c14c187-13f5-41ef-91b1-7ecb504239a7'... | 2026-02-20T04:15:59.508205+00:00 |


### Entities

In [30]:
df_entities = query_table(NAMESPACE, "entities")
print(f"{len(df_entities)} rows")
df_entities.head(10)

14438 rows


|   | park_id | id | name | entityType | ingest_timestamp |
|---|---------|----|------|------------|------------------|
| 0 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 9aa0bc92-6b9c-4bf9-9319-810c9f9a2e0f | 火箭过山车 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 1 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 7fb74359-4657-45cf-8b83-6f63b5090227 | 龙卷风暴 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 2 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 24a21c03-0004-4871-9e31-7cfce66e1fde | 桑巴气球 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 3 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 0a06581f-6862-4e1b-bf33-467b09da168e | 极速跳跃 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 4 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 15573d5d-8434-4586-a5ec-fbe59c7ac730 | 飞马家庭过山车 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 5 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 7a4b771b-a567-4067-9864-cc3b868db4b2 | 急流勇进 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 6 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | bbcd36d0-e47c-4499-8995-d4238c17f067 | 梦回兰若 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 7 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | e20f7a19-8a77-445f-8222-16e29c5b87bf | 超级大摆锤 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 8 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | d10b1004-9aa7-44ec-9f12-6a522d8b9311 | 摇摆屋 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |
| 9 | 73436fe5-1f14-400f-bfbf-ab6766269e70 | 84de15bc-b573-4825-8385-e590d82c7339 | 滑翔飞翼 | ATTRACTION | 2026-02-20T05:18:43.563571+00:00 |


### Live data

In [31]:
df_live = query_table(NAMESPACE, "live_data")
print(f"{len(df_live)} rows")
df_live.head(10)

4411 rows


|   | park_id | id | name | entityType | status | queue | lastUpdated | ingest_timestamp |
|---|---------|----|------|------------|--------|-------|-------------|------------------|
| 0 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 96458868-f815-4b8f-b7cb-2b04abda284c | Tornado | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-09-07 16:01:36 | 2026-02-20T05:20:00.626540+00:00 |
| 1 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 50b91bb1-3339-417b-bca8-b3a0c88378e8 | Dare Devil Dive | ATTRACTION | OPERATING | {'STANDBY': {'waitTime': 0}, 'SINGLE_RIDER': N... | 2025-10-26 20:29:56 | 2026-02-20T05:20:00.626540+00:00 |
| 2 | 000c724a-cd0f-41a1-b355-f764902c2b55 | e7245ad4-8c51-4d72-98f8-56fb761a6e36 | Shipwreck Cove | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-08-24 22:49:24 | 2026-02-20T05:20:00.626540+00:00 |
| 3 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 43ab7687-0092-4b64-b465-1f607e331791 | Bucket Blasters | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-08-24 22:49:23 | 2026-02-20T05:20:00.626540+00:00 |
| 4 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 55cbb5ac-cfe7-47b0-9dee-68c6a76dfe08 | Typhoon Twister | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-08-31 00:16:23 | 2026-02-20T05:20:00.626540+00:00 |
| 5 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 79610610-c5ce-4f3c-a7e8-fe5bb82eb604 | Comet | ATTRACTION | OPERATING | {'STANDBY': {'waitTime': 0}, 'SINGLE_RIDER': N... | 2025-10-12 23:17:34 | 2026-02-20T05:20:00.626540+00:00 |
| 6 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 094776e3-4ba3-4883-811c-48aa0ecb8319 | Adventure River | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-09-07 16:01:35 | 2026-02-20T05:20:00.626540+00:00 |
| 7 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 4fdd1128-b655-40b2-b23c-8e43d1e582b5 | Adirondack Outlaw | ATTRACTION | OPERATING | {'STANDBY': {'waitTime': 0}, 'SINGLE_RIDER': N... | 2025-10-27 00:46:24 | 2026-02-20T05:20:00.626540+00:00 |
| 8 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 12c49ef0-8846-46fd-860b-1fbee99eacb0 | Rocky’s Ranger Planes | ATTRACTION | CLOSED | {'STANDBY': {'waitTime': None}, 'SINGLE_RIDER'... | 2025-06-05 14:03:08 | 2026-02-20T05:20:00.626540+00:00 |
| 9 | 000c724a-cd0f-41a1-b355-f764902c2b55 | 88416604-0318-4201-a39d-8d4adb8df692 | Sasquatch Launch | ATTRACTION | OPERATING | {'STANDBY': {'waitTime': 0}, 'SINGLE_RIDER': N... | 2025-09-01 14:34:15 | 2026-02-20T05:20:00.626540+00:00 |
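
The `queue` column arrives in pandas as a nested dict, which is awkward to filter or sort on. The sketch below pulls the standby wait time into a plain value. It assumes the shape visible in the output above (a `STANDBY` key holding a `waitTime`, with entries that may be `None`); the function name `standby_wait` is hypothetical.

```python
from typing import Any, Optional

def standby_wait(queue: Any) -> Optional[int]:
    """Return queue['STANDBY']['waitTime'], or None when any level is missing."""
    if not isinstance(queue, dict):
        return None                      # queue itself can be null
    standby = queue.get("STANDBY")
    if not isinstance(standby, dict):
        return None                      # no standby queue for this entity
    return standby.get("waitTime")       # may still be None, e.g. for closed rides

# Usage with the DataFrame from the cell above:
# df_live["standby_wait"] = df_live["queue"].apply(standby_wait)
```

Guarding each level with `isinstance` keeps the function safe to use with `apply` on rows where `queue` is null.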


## 6 — Ad-hoc SQL

Run arbitrary SQL against any Iceberg table.

In [32]:
meta_path = metadata_location(NAMESPACE, "live_data")

con.execute(f"""
    SELECT status, COUNT(*) AS cnt
    FROM iceberg_scan('{meta_path}')
    GROUP BY status
    ORDER BY cnt DESC
""").df()

|   | status | cnt |
|---|--------|-----|
| 0 | CLOSED | 2882 |
| 1 | OPERATING | 1444 |
| 2 | DOWN | 69 |
| 3 | REFURBISHMENT | 16 |


In [40]:
live_data = metadata_location(NAMESPACE, "live_data")
destinations = metadata_location(NAMESPACE, "destinations")
entities = metadata_location(NAMESPACE, "entities")

In [None]:
con.execute(f"""
    SELECT *
    FROM iceberg_scan('{destinations}') d
    WHERE slug = 'waltdisneyworldresort'
""").df()

In [42]:
con.execute(f"""
    SELECT DISTINCT entityType
    FROM iceberg_scan('{entities}') e
""").df()

|   | entityType |
|---|------------|
| 0 | ATTRACTION |
| 1 | SHOW |
| 2 | RESTAURANT |
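
Repeating `iceberg_scan('{...}')` in every query gets verbose. One option is to wrap each metadata path in a DuckDB view so that ad-hoc SQL can refer to tables by name. This is a sketch: `view_sql` is a hypothetical helper, and each view pins the metadata path resolved at creation time, so it would need to be recreated after new commits to see fresh data.

```python
def view_sql(view_name: str, metadata_path: str) -> str:
    """Build a CREATE VIEW statement that wraps iceberg_scan for one table."""
    if not view_name.isidentifier():
        raise ValueError(f"not a simple identifier: {view_name!r}")
    escaped = metadata_path.replace("'", "''")   # keep the path a valid SQL literal
    return (f"CREATE OR REPLACE VIEW {view_name} AS "
            f"SELECT * FROM iceberg_scan('{escaped}')")

# Usage, reusing the paths resolved above:
# for name, path in {"destinations": destinations,
#                    "entities": entities,
#                    "live_data": live_data}.items():
#     con.execute(view_sql(name, path))
# con.execute("SELECT entityType, COUNT(*) AS cnt FROM entities GROUP BY entityType").df()
```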
