In [None]:
import duckdb

# Load the SQL magic extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Now we get to have some fun! **Window** functions work by breaking a relation into _independent_ partitions, ordering the rows within each partition, and then computing a new column value for each row. If this sounds complex, it might be at first, but we'll break it down for you.

If this sounds computationally intensive, it is, so we'll have to be careful when windowing over large datasets.

When we build a window function, we compute each row's value from a slice of data defined _relative_ to that row and the rows around it. This is a powerful pattern and one that makes SQL stand out as an excellent language for querying relational data.
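To make the partition, order, compute sequence concrete, here is a minimal pure-Python sketch of what `MAX(...) OVER (PARTITION BY ...)` does conceptually. The rows and the helper `max_over_partition` are hypothetical toy values, not the NPS data, and this is not how DuckDB implements windows internally.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy rows: (park, campground, sites). Not from the NPS dataset.
rows = [
    ("Park A", "North", 30),
    ("Park B", "Lake", 12),
    ("Park A", "South", 80),
    ("Park B", "Ridge", 45),
    ("Park A", "East", 55),
]


def max_over_partition(rows):
    """Mimic MAX(sites) OVER (PARTITION BY park): one output row per input row."""
    out = []
    # Step 1: partition. groupby needs its input sorted by the partition key.
    by_park = sorted(rows, key=itemgetter(0))
    for _park, group in groupby(by_park, key=itemgetter(0)):
        # Step 2: order the rows inside the partition (largest campground first).
        ordered = sorted(group, key=itemgetter(2), reverse=True)
        # Step 3: compute the new column and attach it to every row.
        partition_max = ordered[0][2]
        out.extend((p, name, sites, partition_max) for p, name, sites in ordered)
    return out


result = max_over_partition(rows)
for row in result:
    print(row)
```

Unlike a `GROUP BY`, the row count does not shrink: every input row comes back with one extra column.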


Here are some sample questions well suited to windows:
- Which park has the most campsites (we'll show you why this is easy with windows)?
- What is the last event in January?

In [None]:
%%sql
SELECT
    DISTINCT p.fullname as park_name,

    -- For each park, what is the largest number of campsites at any single campground?
    MAX(c.numberofsitesfirstcomefirstserve) OVER (PARTITION BY park_name) as max_num_fcfs,
    MAX(c.numberofsitesreservable) OVER (PARTITION BY park_name) as max_num_reserve,
    MAX(c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable) OVER (PARTITION BY park_name) as max_num_campsites,

    -- For each park, which _campground_ has the most campsites?
    FIRST(c.name) OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve DESC) as max_num_fcfs_site,
    FIRST(c.name) OVER (PARTITION BY park_name ORDER BY c.numberofsitesreservable DESC) as max_num_reserve_site,
    FIRST(c.name) OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as max_num_campsites_site
FROM nps_public_data.campgrounds c
INNER JOIN nps_public_data.parks p
    ON c.parkcode = p.parkcode
    AND p.designation = 'National Park'
ORDER BY max_num_campsites DESC
LIMIT 10;
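The query above needs `DISTINCT` because a window function keeps one output row per input row, so each park would otherwise repeat once per campground. The sketch below shows that on a hypothetical toy table. It uses Python's built-in `sqlite3` as a stand-in so it runs without the NPS database, and it uses the standard `FIRST_VALUE` spelling on the assumption that it plays the same role as `FIRST` in the DuckDB cell.

```python
import sqlite3

# Hypothetical toy table; names and numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campgrounds (park TEXT, name TEXT, sites INTEGER)")
conn.executemany(
    "INSERT INTO campgrounds VALUES (?, ?, ?)",
    [
        ("Park A", "North", 30),
        ("Park A", "South", 80),
        ("Park A", "East", 55),
        ("Park B", "Lake", 12),
        ("Park B", "Ridge", 45),
    ],
)

window_sql = """
SELECT
    park,
    MAX(sites) OVER (PARTITION BY park) AS max_sites,
    FIRST_VALUE(name) OVER (PARTITION BY park ORDER BY sites DESC) AS biggest
FROM campgrounds
"""

# Without DISTINCT: one row per campground, with the park-level values repeated.
all_rows = conn.execute(window_sql).fetchall()

# With DISTINCT: the repeated rows collapse to one row per park.
summary = conn.execute(
    f"SELECT DISTINCT * FROM ({window_sql}) ORDER BY max_sites DESC"
).fetchall()

print(len(all_rows))
print(summary)
conn.close()
```

Ordering the partition by `sites DESC` is what makes the first row the largest campground; change the `ORDER BY` and the "first" name changes with it.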

In [None]:
%%sql
SELECT
    p.fullname as park_name,
    c.name as campground_name,

    -- Total campsites (first-come-first-served plus reservable) at each campground
    c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable as num_campsites,
    -- Rank campgrounds within each park: RANK leaves gaps after ties,
    -- DENSE_RANK does not, and ROW_NUMBER is always unique
    RANK() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as park_campsites_rank,
    ROW_NUMBER() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as campsites_row_num,
    DENSE_RANK() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as campsites_dense_rank,
FROM nps_public_data.campgrounds c
INNER JOIN nps_public_data.parks p
    ON c.parkcode = p.parkcode
    AND p.designation = 'National Park'
-- WHERE p.fullname = 'Death Valley National Park'
ORDER BY park_name, park_campsites_rank ASC
LIMIT 12;
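The three ranking functions only differ when there are ties, which may not show up in the first few rows of real data. Here is a small sketch with a deliberate tie, again using a hypothetical toy table and Python's built-in `sqlite3` as a stand-in for DuckDB.

```python
import sqlite3

# Hypothetical toy table with a tie: North and East both have 50 sites.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campgrounds (park TEXT, name TEXT, sites INTEGER)")
conn.executemany(
    "INSERT INTO campgrounds VALUES (?, ?, ?)",
    [
        ("Park A", "North", 50),
        ("Park A", "South", 80),
        ("Park A", "East", 50),
        ("Park A", "West", 20),
    ],
)

ranked = conn.execute(
    """
    SELECT
        name,
        sites,
        RANK()       OVER (PARTITION BY park ORDER BY sites DESC) AS rnk,
        ROW_NUMBER() OVER (PARTITION BY park ORDER BY sites DESC) AS row_num,
        DENSE_RANK() OVER (PARTITION BY park ORDER BY sites DESC) AS dense_rnk
    FROM campgrounds
    ORDER BY sites DESC, name
    """
).fetchall()

for row in ranked:
    print(row)
conn.close()
```

The tied campgrounds share a `RANK` of 2 and the next campground jumps to 4, while `DENSE_RANK` continues at 3. `ROW_NUMBER` still hands out 1 through 4, but which tied row gets 2 and which gets 3 is not guaranteed unless the window's `ORDER BY` includes a tiebreaker.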
