In [None]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Now that we've discussed `GROUP BY` and window functions, we can cover two advanced filtering clauses: `HAVING` and `QUALIFY`.

`HAVING` is like `WHERE`, but for aggregates. It's evaluated after `GROUP BY`, which lets you filter on aggregated values such as `SUM(...)` or `COUNT(*)`.

In [None]:
%%sql
SELECT
    p.fullname,
    SUM(c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable) AS num_campsites
FROM nps_public_data.campgrounds c
INNER JOIN nps_public_data.parks p
    ON c.parkcode = p.parkcode
    AND p.designation = 'National Park'
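-- This won't work: WHERE is evaluated before GROUP BY, so the aggregate num_campsites doesn't exist yet
-- WHERE num_campsites > 100
GROUP BY 1
-- This will: HAVING is evaluated after GROUP BY, so it can filter on the aggregate
HAVING num_campsites > 100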
ORDER BY 2 ASC
LIMIT 10;
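
To see the evaluation order without the NPS database, here is a minimal, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for DuckDB, and the `campgrounds` table, its columns, and its numbers are hypothetical toy data. The same filter threshold gives different results depending on whether it is applied per row (`WHERE`) or per group (`HAVING`).

```python
import sqlite3

# Hypothetical toy data standing in for the NPS campgrounds table.
rows = [
    ("Park A", "Camp 1", 80),
    ("Park A", "Camp 2", 40),
    ("Park B", "Camp 3", 60),
    ("Park C", "Camp 4", 150),
]

# Named demo_conn so it doesn't shadow the notebook's DuckDB `conn`.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE campgrounds (park TEXT, name TEXT, num_sites INTEGER)")
demo_conn.executemany("INSERT INTO campgrounds VALUES (?, ?, ?)", rows)

# WHERE runs before GROUP BY, so it drops individual rows before summing...
where_result = demo_conn.execute(
    """
    SELECT park, SUM(num_sites) AS num_campsites
    FROM campgrounds
    WHERE num_sites > 50
    GROUP BY park
    ORDER BY park
    """
).fetchall()

# ...while HAVING runs after GROUP BY, so it drops whole groups by their totals.
having_result = demo_conn.execute(
    """
    SELECT park, SUM(num_sites) AS num_campsites
    FROM campgrounds
    GROUP BY park
    HAVING SUM(num_sites) > 100
    ORDER BY park
    """
).fetchall()
demo_conn.close()

print(where_result)   # [('Park A', 80), ('Park B', 60), ('Park C', 150)]
print(having_result)  # [('Park A', 120), ('Park C', 150)]
```

Note that Park A survives the `HAVING` filter only because its two campgrounds are summed first. No single campground there exceeds 100 sites.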

`QUALIFY` is like `WHERE` and `HAVING`, _but_ it applies to window functions! This can be exceedingly helpful: without it, filtering on a window function's result almost always requires a second CTE (or a subquery).

One particularly useful case for `QUALIFY` is filtering the output of `ROW_NUMBER` or `RANK`, for example to keep only the top N rows per group.
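
For comparison, here is a sketch of the CTE pattern that `QUALIFY` saves you from writing. It uses Python's built-in `sqlite3` module, which supports window functions (SQLite 3.25+) but not `QUALIFY`. The table and data are hypothetical toy values. Because `WHERE` is evaluated before window functions, the rank has to be computed in a CTE first and filtered in an outer query.

```python
import sqlite3

# Hypothetical toy data; SQLite needs version 3.25+ for window functions.
rows = [
    ("Park A", "Camp 1", 80),
    ("Park A", "Camp 2", 40),
    ("Park A", "Camp 3", 20),
    ("Park B", "Camp 4", 60),
    ("Park B", "Camp 5", 90),
]

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE campgrounds (park TEXT, name TEXT, num_sites INTEGER)")
demo_conn.executemany("INSERT INTO campgrounds VALUES (?, ?, ?)", rows)

# Step 1 (CTE): compute the window function.
# Step 2 (outer query): filter on it with an ordinary WHERE.
second_largest = demo_conn.execute(
    """
    WITH ranked AS (
        SELECT
            park,
            name,
            num_sites,
            RANK() OVER (PARTITION BY park ORDER BY num_sites DESC) AS site_rank
        FROM campgrounds
    )
    SELECT park, name, num_sites
    FROM ranked
    WHERE site_rank = 2
    ORDER BY park
    """
).fetchall()
demo_conn.close()

print(second_largest)  # [('Park A', 'Camp 2', 40), ('Park B', 'Camp 4', 60)]
```

In DuckDB, the CTE is unnecessary. Appending `QUALIFY site_rank = 2` to the inner `SELECT` gives the same result in a single query.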

In [None]:
%%sql
SELECT
    p.fullname as park_name,
    c.name as campground_name,

    -- For each park, which campground has the maximum number of campsites?
    c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable as num_campsites,
    -- RANK, ROW_NUMBER, DENSE_RANK     
    RANK() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as park_campsites_rank,
    ROW_NUMBER() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as campsites_row_num,
    DENSE_RANK() OVER (PARTITION BY park_name ORDER BY c.numberofsitesfirstcomefirstserve + c.numberofsitesreservable DESC) as campsites_dense_rank,
FROM nps_public_data.campgrounds c
INNER JOIN nps_public_data.parks p
    ON c.parkcode = p.parkcode
    AND p.designation = 'National Park'
-- Get the sencond largest campground for each park
QUALIFY park_campsites_rank = 2
ORDER BY park_name, park_campsites_rank ASC
LIMIT 12;
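
The query above computes all three numbering functions side by side. The following sketch shows where they diverge: on ties. It again uses Python's built-in `sqlite3` as a stand-in for DuckDB (3.25+ for window functions), with hypothetical toy data in which two campgrounds share second place.

```python
import sqlite3

# Hypothetical toy data with a tie: Camp 2 and Camp 3 both have 50 sites.
rows = [
    ("Park A", "Camp 1", 80),
    ("Park A", "Camp 2", 50),
    ("Park A", "Camp 3", 50),
    ("Park A", "Camp 4", 20),
]

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE campgrounds (park TEXT, name TEXT, num_sites INTEGER)")
demo_conn.executemany("INSERT INTO campgrounds VALUES (?, ?, ?)", rows)

ranks = demo_conn.execute(
    """
    SELECT
        name,
        num_sites,
        RANK()       OVER (PARTITION BY park ORDER BY num_sites DESC) AS rnk,
        ROW_NUMBER() OVER (PARTITION BY park ORDER BY num_sites DESC) AS row_num,
        DENSE_RANK() OVER (PARTITION BY park ORDER BY num_sites DESC) AS dense_rnk
    FROM campgrounds
    ORDER BY num_sites DESC, name
    """
).fetchall()
demo_conn.close()

# RANK leaves a gap after the tie (1, 2, 2, 4); DENSE_RANK doesn't (1, 2, 2, 3).
rank_cols = [(name, rnk, dense_rnk) for name, _, rnk, _, dense_rnk in ranks]
# ROW_NUMBER always yields 1..N, but which tied row gets 2 vs 3 is not
# guaranteed, so only the sorted values are shown.
row_nums = sorted(r[3] for r in ranks)

print(rank_cols)  # [('Camp 1', 1, 1), ('Camp 2', 2, 2), ('Camp 3', 2, 2), ('Camp 4', 4, 3)]
print(row_nums)   # [1, 2, 3, 4]
```

So `QUALIFY park_campsites_rank = 2` can return more than one campground for a park when there's a tie for second place (and none if two campgrounds tie for first). Filtering on `ROW_NUMBER()` instead returns at most one row per park, chosen arbitrarily among ties.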

While advanced filters are a simple concept, knowing when and where to use them can save you an extra CTE... and possibly a few lines of code 😄