## Topics
1. `WHERE` clause
2. [Comparisons](https://duckdb.org/docs/sql/expressions/comparison_operators)
   1. Logical
   2. `IS NULL`
   3. `BETWEEN`
3. Filtering with `JOIN`s
4. `LIKE` & `ILIKE` [comparisons](https://duckdb.org/docs/sql/functions/patternmatching)
   1. Mention GLOB, REGEX
5. `IN` [operator](https://duckdb.org/docs/sql/expressions/in.html)
6. `REGEXP`
7. `ORDER BY`
8. `GROUP BY`

In [1]:
import duckdb

%load_ext sql
%config SqlMagic.displaylimit = 5

In [2]:
conn = duckdb.connect()
%sql conn --alias duckdb

In [3]:
%%sql
IMPORT DATABASE '../../data/nps';

Count
448


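Deriving the fields you commonly filter on, for example with string or column manipulation, can make a query much easier to read. Below, `closed_open` and `is_closed` are derived once in a CTE, and the final filter is simply `WHERE is_closed`.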

In [4]:
%%sql 
WITH thursday AS (
    SELECT
        p.name,
        closed_thurs.category,
        closed_thurs.thursday,
        -- The join only matches 'Closed' rows, so NULL means no 'Closed' entry was found
        COALESCE(closed_thurs.thursday, 'Open') as closed_open,
        closed_thurs.thursday IS NOT NULL as is_closed
    FROM nps_public_data.parks p
    LEFT JOIN nps_public_data.park_hours closed_thurs
        ON closed_thurs.park_id = p.id
        AND closed_thurs.thursday = 'Closed'
    WHERE 1 = 1
)
SELECT
    *
FROM thursday
WHERE is_closed
LIMIT 5

name,category,thursday,closed_open,is_closed
Eleanor Roosevelt,Val-Kill Cottage Tours,Closed,Closed,True
Morristown,Winter Hours,Closed,Closed,True
Freedom Riders,Anniston Greyhound Bus Depot,Closed,Closed,True
Black Canyon Of The Gunnison,East Portal,Closed,Closed,True
Golden Gate,Fort Point National Historic Site,Closed,Closed,True


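We commonly filter rows with the `WHERE` clause, but we can also filter in a `JOIN`'s `ON` condition, as the query above does with `closed_thurs.thursday = 'Closed'`.

However, the two are not interchangeable. A `WHERE` condition on a column from the right side of a `LEFT JOIN` also removes the rows that had no match, because those columns are `NULL` for them.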

In [7]:
%%sql
SELECT
    p.name,
    vc.name as visitor_center_name
FROM nps_public_data.parks p
LEFT JOIN nps_public_data.visitorcenters vc
    ON p.parkcode = vc.parkcode
WHERE 1 = 1
-- Filter base query (parks) for national monument
    AND p.designation = 'National Monument'
-- Filter the joined table (!) for passport stamp locations.
-- What will happen to parks without visitor centers?
    AND vc.ispassportstamplocation
LIMIT 1

name,visitor_center_name
Statue Of Liberty,Liberty Island Information Center
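
To make the question in the query comments concrete, here is a minimal sketch on made-up data. It uses Python's built-in `sqlite3` rather than DuckDB so that it runs without the NPS database. The tables, columns, and rows are hypothetical stand-ins for `parks` and `visitorcenters`. The `LEFT JOIN` behavior it shows is standard SQL, so the same result should be expected in DuckDB.

```python
import sqlite3

# Toy stand-ins for parks / visitorcenters (hypothetical rows, not the NPS data).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parks (parkcode TEXT, name TEXT);
    CREATE TABLE visitorcenters (
        parkcode TEXT, name TEXT, ispassportstamplocation INTEGER
    );
    INSERT INTO parks VALUES
        ('aaa', 'Alpha'), ('bbb', 'Bravo'), ('ccc', 'Charlie');
    -- Charlie has no visitor center; Bravo's center is not a stamp location.
    INSERT INTO visitorcenters VALUES
        ('aaa', 'Alpha Center', 1), ('bbb', 'Bravo Center', 0);
""")

# Filter in WHERE: applied after the join, so rows whose vc columns are
# NULL (Charlie) or false (Bravo) are dropped from the result.
where_rows = con.execute("""
    SELECT p.name, vc.name
    FROM parks p
    LEFT JOIN visitorcenters vc
        ON p.parkcode = vc.parkcode
    WHERE vc.ispassportstamplocation = 1
    ORDER BY p.name
""").fetchall()

# Filter in ON: applied while matching, so every park is kept and
# parks without a qualifying center get NULL for vc.name.
on_rows = con.execute("""
    SELECT p.name, vc.name
    FROM parks p
    LEFT JOIN visitorcenters vc
        ON p.parkcode = vc.parkcode
        AND vc.ispassportstamplocation = 1
    ORDER BY p.name
""").fetchall()

print(where_rows)
print(on_rows)
```

With the filter in `WHERE`, only Alpha survives. With the filter in `ON`, all three parks are returned and the two without a stamp location have a `NULL` visitor center name.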


How many rows are returned with/without the `LEFT JOIN`? What does that say about the number of parks we're querying? Why do you think that is?

In [9]:
%%sql
WITH filter_in_join AS (
    SELECT
        p.name,
        vc.name as visitor_center_name
    FROM nps_public_data.parks p
    INNER JOIN nps_public_data.visitorcenters vc
        ON p.parkcode = vc.parkcode
), filter_in_where AS (
    SELECT
        p.name,
        vc.name as visitor_center_name
    FROM nps_public_data.parks p
    LEFT JOIN nps_public_data.visitorcenters vc
        ON p.parkcode = vc.parkcode
    WHERE vc.parkcode IS NOT NULL
)
SELECT
    COUNT(*) as ct
FROM filter_in_join

UNION ALL

SELECT
    COUNT(*) as ct
FROM filter_in_where


ct
2820
2820
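
The two counts match because filtering a `LEFT JOIN` on `vc.parkcode IS NOT NULL` discards exactly the unmatched rows, which is what an `INNER JOIN` does. The sketch below reproduces this on a small hypothetical dataset, again with Python's built-in `sqlite3` as a stand-in for DuckDB. The helper `count_rows` and all rows are illustrative assumptions, not part of the NPS data.

```python
import sqlite3

# Hypothetical toy tables: Alpha has two centers, Bravo one, Charlie none.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parks (parkcode TEXT, name TEXT);
    CREATE TABLE visitorcenters (parkcode TEXT, name TEXT);
    INSERT INTO parks VALUES
        ('aaa', 'Alpha'), ('bbb', 'Bravo'), ('ccc', 'Charlie');
    INSERT INTO visitorcenters VALUES
        ('aaa', 'Alpha North'), ('aaa', 'Alpha South'), ('bbb', 'Bravo Center');
""")

def count_rows(join_type, where_clause=""):
    """Count joined rows for a given join type and optional WHERE clause."""
    sql = f"""
        SELECT COUNT(*)
        FROM parks p
        {join_type} visitorcenters vc
            ON p.parkcode = vc.parkcode
        {where_clause}
    """
    return con.execute(sql).fetchone()[0]

inner_ct = count_rows("INNER JOIN")
left_filtered_ct = count_rows("LEFT JOIN", "WHERE vc.parkcode IS NOT NULL")
left_ct = count_rows("LEFT JOIN")

print(inner_ct, left_filtered_ct, left_ct)
```

Here the inner join and the filtered left join both return 3 rows, while the unfiltered left join returns 4 because Charlie is kept with `NULL` visitor center columns. Note that the counts are joined rows, not parks: Alpha appears twice because it has two visitor centers.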
