In [1]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

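# Register the connection with JupySQL under the alias "duckdb"
%sql conn --alias duckdb

# Load the NPS database previously exported to ../../data/nps
%sql IMPORT DATABASE '../../data/nps';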



Config,value
feedback,True
autopandas,True
displaylimit,10
displaycon,False


,Count
0,224


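This dataset contains both structured data (flat columns such as `name`, `latitude`, and `longitude`) and semi-structured data (nested columns such as `addresses`, `operatingHours`, and `activities`).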

In [2]:
%%sql
SELECT * FROM nps_public_data.parks LIMIT 3

,relevanceScore,designation,weatherInfo,addresses,operatingHours,entrancePasses,name,description,directionsUrl,fees,...,activities,url,longitude,id,images,directionsInfo,fullName,parkCode,latLong,latitude
0,1,National Memorial,http://forecast.weather.gov/MapClick.php?CityN...,"[{'type': 'Physical', 'line2': '', 'line1': '1...","[{'name': 'Hours of Operation', 'standardHours...",[],Federal Hall,"Here on Wall Street, George Washington took th...",http://www.nps.gov/feha/planyourvisit/directio...,[],...,"[{'name': 'Arts and Culture', 'id': '09DF0950-...",https://www.nps.gov/feha/index.htm,-74.010256,2337D255-2D32-4997-957A-D461EEA03AF8,[{'url': 'https://www.nps.gov/common/uploads/s...,The main entrance of Federal Hall is located a...,Federal Hall National Memorial,feha,"lat:40.70731192, long:-74.01025636",40.707312
1,1,National Historic Trail,"In winter, watch for ice on trails and sidewal...","[{'type': 'Physical', 'line2': '', 'line1': '6...","[{'name': 'Visitor Center Hours', 'standardHou...",[],Lewis & Clark,The Lewis and Clark National Historic Trail wi...,https://www.nps.gov/lecl/,[],...,"[{'name': 'Auto and ATV', 'id': '5F723BAD-7359...",https://www.nps.gov/lecl/index.htm,-95.924515,5D443C5F-19A0-4A06-9CE4-30534A3DD81A,[{'url': 'https://www.nps.gov/common/uploads/s...,Lewis & Clark National Historic Trail Headquar...,Lewis & Clark National Historic Trail,lecl,"lat:41.2646141052, long:-95.9245147705",41.264614
2,1,,"Summers are generally hot and humid, with dayt...","[{'type': 'Physical', 'line2': '', 'line1': '1...",[{'name': 'National Capital Parks-East Headqua...,[],National Capital Parks-East,Welcome to National Capital Parks-East. We inv...,http://www.nps.gov/nace/planyourvisit/directio...,[],...,"[{'name': 'Biking', 'id': '7CE6E935-F839-4FEC-...",https://www.nps.gov/nace/index.htm,-76.994,BA3C1A1D-AA6A-49EB-9237-0222CEEE2670,[{'url': 'https://www.nps.gov/common/uploads/s...,DC295 South to the Exit for I-694/I-395/Capito...,National Capital Parks-East,nace,"lat:38.8659, long:-76.994",38.8659


A STRUCT in SQL (supported by databases such as DuckDB and BigQuery) is a nested data type that stores multiple related, named values inside a single column. It's similar to a JSON object, but it has a fixed schema (predefined fields and data types). The `operatingHours` column holds a list of STRUCTs, which is why each value starts with `[`.
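As a rough Python analogy (not how DuckDB stores data), a dataclass behaves like a STRUCT's fixed schema: the field names are declared up front and a value missing one is rejected, whereas a JSON-style dict accepts any keys. The class names below are hypothetical, `StandardHours` is trimmed to three days for brevity, and unlike a database, Python does not check field types at runtime.

```python
from dataclasses import dataclass, fields


@dataclass
class StandardHours:
    # Hypothetical schema, trimmed to three of the seven days
    monday: str
    saturday: str
    sunday: str


@dataclass
class OperatingHours:
    # Field names taken from this notebook's operatingHours output
    name: str
    description: str
    standardHours: StandardHours


# One column value: a list of structs, like the [{...}] shown for Federal Hall
federal_hall_hours = [
    OperatingHours(
        name="Hours of Operation",
        description="Federal Hall is Open.",
        standardHours=StandardHours(
            monday="10:00AM - 5:00PM", saturday="Closed", sunday="Closed"
        ),
    )
]

# A JSON-style dict accepts any shape; nothing complains about missing keys
json_like = {"name": "Hours of Operation"}

# The fixed schema rejects a value that is missing a declared field
schema_error = None
try:
    OperatingHours(name="Hours of Operation", description="Federal Hall is Open.")
except TypeError as err:
    schema_error = err

print([f.name for f in fields(OperatingHours)])
print(schema_error)
```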

In [4]:
%%sql
-- Callout: query structuring, LIMIT statements
SELECT 
    name, 
    operatingHours as operating_hours
FROM nps_public_data.parks 
LIMIT 1

,name,operating_hours
0,Federal Hall,"[{'name': 'Hours of Operation', 'standardHours..."


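What if we want a separate table of operating hours? We can unpack the nested `operatingHours` column with `UNNEST`. Notice that two operations are happening here, and we're splitting them up! The `WITH park_hours AS (...)` block unnests the data, and the outer `SELECT` then drops and renames columns. A named subquery declared with `WITH` is called a CTE (common table expression): it breaks a query into named steps that later parts of the query can read from as if they were tables.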
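To see the two-step shape on its own, here is a minimal sketch using Python's built-in `sqlite3` as a stand-in for DuckDB (both support `WITH`). The `hours` table and its rows are hypothetical toy data loosely based on outputs in this notebook: the first step filters inside the CTE, and the second step aggregates over it as if it were a table.

```python
import sqlite3

# In-memory SQLite database standing in for DuckDB
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (park_name TEXT, category TEXT, thursday TEXT)")
conn.executemany(
    "INSERT INTO hours VALUES (?, ?, ?)",
    [
        ("Federal Hall", "Hours of Operation", "10:00AM - 5:00PM"),
        ("Saratoga", "Saratoga Monument", "Closed"),
        ("Grand Portage", "Historic Depot", "Closed"),
    ],
)

query = """
WITH closed_thursday AS (            -- step 1: a named intermediate result
    SELECT park_name, category
    FROM hours
    WHERE thursday = 'Closed'
)
SELECT park_name, COUNT(*) AS closed_schedules   -- step 2: read the CTE like a table
FROM closed_thursday
GROUP BY park_name
ORDER BY park_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Grand Portage', 1), ('Saratoga', 1)]
```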

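Inside the CTE, `UNNEST` explodes the nested `operatingHours` data, a list of `STRUCT`s: each list element becomes its own row. DuckDB lets us add `recursive := true` to burrow down through _every_ level, so the struct's fields, including the day-of-week hours nested inside `standardHours`, come out as their own columns. Pretty neat!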
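To make the flattening concrete, here is a plain-Python sketch of what a recursive unnest does to one list-of-structs value. It is an approximation, not DuckDB's implementation: the helper names are hypothetical, the record is simplified, and lists nested inside a struct are kept as values rather than expanded into rows (consistent with `exceptions` surviving as a column that the next query has to `EXCLUDE`).

```python
from typing import Any


def flatten_struct(struct: dict[str, Any]) -> dict[str, Any]:
    """Flatten nested dicts (structs) into one level, keeping leaf key names.

    Lists nested inside a struct are kept as values, not expanded into rows.
    """
    flat: dict[str, Any] = {}
    for key, value in struct.items():
        if isinstance(value, dict):
            flat.update(flatten_struct(value))  # e.g. standardHours -> monday, sunday
        else:
            flat[key] = value  # a later duplicate leaf name would overwrite this one
    return flat


def unnest_list_of_structs(row: dict[str, Any], column: str) -> list[dict[str, Any]]:
    """Emit one output row per list element, repeating the other columns."""
    others = {k: v for k, v in row.items() if k != column}
    return [{**others, **flatten_struct(element)} for element in row[column]]


# Simplified, hypothetical record shaped like one park's operatingHours value
park = {
    "park_name": "Federal Hall",
    "park_id": "2337D255-2D32-4997-957A-D461EEA03AF8",
    "operatingHours": [
        {
            "name": "Hours of Operation",
            "description": "Federal Hall is Open.",
            "standardHours": {"monday": "10:00AM - 5:00PM", "sunday": "Closed"},
            "exceptions": [],
        }
    ],
}

rows = unnest_list_of_structs(park, "operatingHours")
for r in rows:
    print(r)
```

A clashing leaf name simply overwrites the earlier key in this sketch; renaming the park's own `name` to `park_name` up front, as the query does, avoids ending up with two columns called `name`.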

In [5]:
%%sql
-- Callout: CTEs, UNNEST
WITH park_hours AS (
    SELECT 
        name as park_name, 
        id as park_id, 
        UNNEST(operatingHours, recursive := true)
    FROM nps_public_data.parks
)
SELECT 
    * EXCLUDE (exceptions, name),
    name as category
FROM park_hours
LIMIT 2

,park_name,park_id,friday,sunday,thursday,tuesday,saturday,monday,wednesday,description,category
0,Federal Hall,2337D255-2D32-4997-957A-D461EEA03AF8,10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM,Closed,10:00AM - 5:00PM,10:00AM - 5:00PM,Federal Hall is Open.,Hours of Operation
1,Lewis & Clark,5D443C5F-19A0-4A06-9CE4-30534A3DD81A,8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM,Closed,8:30AM - 4:30PM,8:30AM - 4:30PM,Lewis and Clark National Historic Trail Visito...,Visitor Center Hours


This query creates (or replaces) the `nps_public_data.park_hours` table from `nps_public_data.parks`. Inside the CTE, it renames the park's `name` to `park_name` and `id` to `park_id`, and flattens `operatingHours` with `UNNEST(..., recursive := true)`: each element of the list becomes its own row, and the nested struct fields (`name`, `description`, `exceptions`, and the day-of-week fields under `standardHours`) become columns. The outer `SELECT` keeps every column except `exceptions` and the struct's `name`, then re-adds that `name` under the clearer alias `category`. Storing the result as a table makes the hours easy to query directly.
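SQLite has neither `CREATE OR REPLACE TABLE` nor `* EXCLUDE`, so this sketch uses Python's built-in `sqlite3` with `DROP TABLE IF EXISTS` plus `CREATE TABLE ... AS` and an explicit column list as stand-ins. It shows why a "create or replace" cell can be re-run safely. The `flat_hours` table and its single row are hypothetical, roughly what the CTE would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, already-flattened rows: roughly what the CTE produces
conn.execute(
    "CREATE TABLE flat_hours (park_name TEXT, park_id TEXT, name TEXT, "
    "description TEXT, exceptions TEXT, monday TEXT)"
)
conn.execute(
    "INSERT INTO flat_hours VALUES (?, ?, ?, ?, ?, ?)",
    ("Federal Hall", "2337D255-2D32-4997-957A-D461EEA03AF8", "Hours of Operation",
     "Federal Hall is Open.", "[]", "10:00AM - 5:00PM"),
)


def build_park_hours(conn: sqlite3.Connection) -> None:
    # Stand-in for CREATE OR REPLACE TABLE: drop any previous copy, then rebuild
    conn.execute("DROP TABLE IF EXISTS park_hours")
    # Explicit column list stands in for * EXCLUDE (exceptions, name);
    # the struct's name is then re-added under the alias category
    conn.execute(
        "CREATE TABLE park_hours AS "
        "SELECT park_name, park_id, description, monday, name AS category "
        "FROM flat_hours"
    )


build_park_hours(conn)
build_park_hours(conn)  # re-running replaces the table instead of raising an error

columns = [info[1] for info in conn.execute("PRAGMA table_info(park_hours)")]
row_count = conn.execute("SELECT COUNT(*) FROM park_hours").fetchone()[0]
print(columns, row_count)
```

Without the `DROP TABLE IF EXISTS`, the second call would fail because `park_hours` already exists; DuckDB's `CREATE OR REPLACE` folds both steps into one statement.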

In [None]:
%%sql
-- Callout: column renaming, EXCLUDE
CREATE OR REPLACE TABLE nps_public_data.park_hours AS (
    WITH park_hours AS (
        SELECT 
            name as park_name, 
            id as park_id, 
            -- https://duckdb.org/docs/sql/query_syntax/unnest.html
            UNNEST(operatingHours, recursive := true) -- recursive := true flattens every nesting level, not just the outer list
        FROM nps_public_data.parks
    )
    SELECT 
        * EXCLUDE (exceptions, name),
        name as category
    FROM park_hours 
)

,Count
0,667


In [7]:
%%sql
-- Callout: WHERE clause
SELECT
    p.name,
    h.thursday
FROM nps_public_data.park_hours h
LEFT JOIN nps_public_data.parks p
    ON h.park_id = p.id
WHERE h.category = 'Hours of Operation'
LIMIT 5

,name,thursday
0,Federal Hall,10:00AM - 5:00PM
1,Theodore Roosevelt Birthplace,10:00AM - 4:00PM
2,Tumacácori,9:00AM - 5:00PM
3,Wright Brothers,9:00AM - 5:00PM


In [8]:
%%sql
-- Callout: DISTINCT, ORDER BY, LIMIT
SELECT DISTINCT
    thursday
FROM nps_public_data.park_hours
ORDER BY 1 DESC  -- sort by the first selected column
LIMIT 10;

,thursday
0,unknown
1,Sunrise to Sunset
2,Opens at 6:00AM
3,Opens at 5:00AM
4,Closes at 12:00PM
5,Closed
6,All Day
7,9:30AM - 5:00PM
8,9:30AM - 4:30PM
9,9:30AM - 4:00PM
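Why is `unknown` at the top of a descending sort? Assuming DuckDB's default binary collation, strings compare character by character by code, so lowercase letters (`u`) come after uppercase ones (`S`, `O`, `C`) and after digits. Python's built-in string ordering behaves the same way for these ASCII values, so this plain-Python sketch of `DISTINCT`, `ORDER BY ... DESC`, and `LIMIT`, run on a handful of values taken from the output, reproduces their relative order:

```python
# A few raw thursday values (with a duplicate), taken from the output above
thursday_values = [
    "Closed",
    "9:30AM - 5:00PM",
    "unknown",
    "Closed",
    "Opens at 6:00AM",
    "Sunrise to Sunset",
    "All Day",
    "Closes at 12:00PM",
]

distinct_values = set(thursday_values)           # DISTINCT: drop duplicates
ordered = sorted(distinct_values, reverse=True)  # ORDER BY 1 DESC (code-point order)
top = ordered[:10]                               # LIMIT 10
print(top)
```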


In [9]:
%%sql
CREATE OR REPLACE TABLE nps_public_data.park_hours AS (
    WITH park_hours AS (
        SELECT 
            name as park_name, 
            id as park_id, 
            -- https://duckdb.org/docs/sql/query_syntax/unnest.html
            UNNEST(operatingHours, recursive := true)
        FROM nps_public_data.parks
    )
    SELECT 
        park_name,
        park_id,
        description,
        name as category,
        CASE monday WHEN 'unknown' THEN 'Closed' ELSE monday END as monday_hours,
        CASE tuesday WHEN 'unknown' THEN 'Closed' ELSE tuesday END as tuesday_hours,
        CASE wednesday WHEN 'unknown' THEN 'Closed' ELSE wednesday END as wednesday_hours,
        CASE thursday WHEN 'unknown' THEN 'Closed' ELSE thursday END as thursday_hours,
        CASE friday WHEN 'unknown' THEN 'Closed' ELSE friday END as friday_hours,
        CASE saturday WHEN 'unknown' THEN 'Closed' ELSE saturday END as saturday_hours,
        CASE sunday WHEN 'unknown' THEN 'Closed' ELSE sunday END as sunday_hours,
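        -- Treat 'unknown' as closed here too, so the flag agrees with the *_hours columns
        CASE WHEN 
            monday NOT IN ('Closed', 'unknown') AND
            tuesday NOT IN ('Closed', 'unknown') AND
            wednesday NOT IN ('Closed', 'unknown') AND
            thursday NOT IN ('Closed', 'unknown') AND
            friday NOT IN ('Closed', 'unknown') AND
            saturday NOT IN ('Closed', 'unknown') AND
            sunday NOT IN ('Closed', 'unknown')
        THEN TRUE ELSE FALSE END as open_seven_days_a_week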
    FROM park_hours 
)

,Count
0,667
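The `CASE` expressions in this cell do two different jobs: a simple `CASE col WHEN ... THEN ... ELSE ... END` swaps one value for another, and a searched `CASE WHEN <condition>` turns a condition into a flag. A minimal sketch using Python's built-in `sqlite3` as a stand-in for DuckDB, with hypothetical parks and only two days instead of seven, shows both, including the NULL case. It counts both 'Closed' and 'unknown' as not open, so the flag agrees with the cleaned `*_hours` columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical parks with two days of hours instead of seven
conn.execute("CREATE TABLE raw_hours (park TEXT, monday TEXT, tuesday TEXT)")
conn.executemany(
    "INSERT INTO raw_hours VALUES (?, ?, ?)",
    [
        ("A", "9:00AM - 5:00PM", "9:00AM - 5:00PM"),  # open both days
        ("B", "unknown", "9:00AM - 5:00PM"),          # Monday hours unknown
        ("C", None, "9:00AM - 5:00PM"),               # Monday hours missing (NULL)
    ],
)

query = """
SELECT
    park,
    -- Simple CASE: swap one value, pass everything else through unchanged
    CASE monday WHEN 'unknown' THEN 'Closed' ELSE monday END AS monday_hours,
    -- Searched CASE: a NULL comparison is not TRUE, so it falls through to ELSE
    CASE WHEN monday NOT IN ('Closed', 'unknown')
          AND tuesday NOT IN ('Closed', 'unknown')
         THEN 1 ELSE 0 END AS open_both_days  -- SQLite has no real BOOLEAN type
FROM raw_hours
ORDER BY park
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Park C shows the NULL path: `monday_hours` stays NULL because the simple `CASE` passes it through, and the flag is 0 because `NULL NOT IN (...)` is NULL rather than TRUE.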


In [None]:
%%sql
SELECT * FROM nps_public_data.park_hours WHERE open_seven_days_a_week LIMIT 1

In [10]:
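%%sql
SELECT
    p.name,
    closed_thurs.category,
    closed_thurs.thursday_hours,
    -- The INNER JOIN below keeps only rows where thursday_hours = 'Closed', so
    -- COALESCE never falls back to 'Open' and is_closed is always true; both
    -- expressions only become informative with a LEFT JOIN
    COALESCE(closed_thurs.thursday_hours, 'Open') as closed_open,
    closed_thurs.thursday_hours IS NOT NULL as is_closed
FROM nps_public_data.parks p
INNER JOIN nps_public_data.park_hours closed_thurs
    ON closed_thurs.park_id = p.id
    AND closed_thurs.thursday_hours = 'Closed'
WHERE 1 = 1  -- always true; a placeholder that makes extra AND filters easy to add
ORDER BY RANDOM()  -- shuffle to return a random sample
LIMIT 5;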

,name,category,thursday_hours,closed_open,is_closed
0,Saratoga,Saratoga Monument,Closed,Closed,True
1,Belmont-Paul Women's Equality,Museum,Closed,Closed,True
2,Gateway Arch,Old Courthouse,Closed,Closed,True
3,Grand Portage,Historic Depot,Closed,Closed,True
4,Charles Young Buffalo Soldiers,Park Grounds,Closed,Closed,True
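In the query above, the `thursday_hours = 'Closed'` condition sits in the `ON` clause of an `INNER JOIN`, so every returned row is 'Closed' and `COALESCE(..., 'Open')` never fires. The sketch below reproduces that with Python's built-in `sqlite3` (a stand-in for DuckDB) and two hypothetical parks, then swaps in a `LEFT JOIN` to show when `COALESCE` and `IS NOT NULL` actually matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (id TEXT, name TEXT)")
conn.execute("CREATE TABLE park_hours (park_id TEXT, thursday_hours TEXT)")
# Hypothetical parks: A is closed on Thursday, B is open
conn.executemany("INSERT INTO parks VALUES (?, ?)", [("1", "Park A"), ("2", "Park B")])
conn.executemany(
    "INSERT INTO park_hours VALUES (?, ?)",
    [("1", "Closed"), ("2", "9:00AM - 5:00PM")],
)

template = """
SELECT
    p.name,
    COALESCE(h.thursday_hours, 'Open') AS closed_open,
    h.thursday_hours IS NOT NULL AS is_closed
FROM parks p
{join} park_hours h
    ON h.park_id = p.id
    AND h.thursday_hours = 'Closed'   -- filter applied while matching rows
ORDER BY p.name
"""
inner_rows = conn.execute(template.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()
print(inner_rows)  # only Park A survives
print(left_rows)   # Park B is kept, and its NULL hours become 'Open'
```

With the `LEFT JOIN`, the unmatched park keeps a row whose joined columns are NULL, which is exactly the case the `COALESCE` and `IS NOT NULL` expressions are written for.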


In [11]:
%%sql
EXPORT DATABASE '../../data/nps' (FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000);

,Success
