In [None]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Most SQL databases come with handy ways to generate data. This is very useful for building an index, a complete series of numbers or dates, to aggregate data against.

In [None]:
%%sql
SELECT
    r.range
FROM range(0,100) r
LIMIT 5

We can also increment by a different step, here 2:

In [None]:
%%sql
SELECT
    r.range
FROM range(0,100,2) r
LIMIT 5

Or generate dates:

In [None]:
%%sql
SELECT
    r.range
FROM range(DATE '2019-01-01', DATE '2025-01-01', INTERVAL '1 day') r
LIMIT 5;

Why is this useful? Well, imagine you'd like to pull in data from multiple sources or generate a running aggregation. We can't always be sure that every date or number appears in the data. Generating a range and joining against it lets us _be sure_ every date is covered!

In [None]:
%%sql
WITH date_range AS (
    SELECT
        r.range
    -- range() excludes its end value, so stop at March 1 to include Feb 29
    FROM range(DATE '2024-02-01', DATE '2024-03-01', INTERVAL '1 day') r
)
SELECT
    dr.range as dt,
    COUNT(DISTINCT a.title) as num_alerts
FROM date_range dr
LEFT JOIN nps_public_data.alerts a
    ON dr.range::DATE = a.lastindexeddate::DATE
GROUP BY 1
ORDER BY 1
LIMIT 12

Note the days with zero alerts: those would have been skipped without our generated range! DuckDB has a few aliases for its [range functions](https://duckdb.org/docs/sql/functions/nested.html#range-functions), and the equivalents look different in every variant of SQL. Some lack them entirely!
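
For engines without a built-in range function, a recursive CTE is one common workaround. The sketch below uses Python's built-in `sqlite3` module purely as an illustration. The `alerts` table and its three rows are made-up toy data standing in for the NPS alerts, not the real dataset, and the four-day window is arbitrary.

```python
import sqlite3

# Hypothetical toy data standing in for an alerts table (not the NPS dataset).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alerts (title TEXT, lastindexeddate TEXT)")
conn.executemany(
    "INSERT INTO alerts VALUES (?, ?)",
    [
        ("Road closure", "2024-02-01"),
        ("Trail washout", "2024-02-01"),
        ("High winds", "2024-02-03"),
    ],
)

# Build the date range with a recursive CTE: start at the first date and
# keep adding one day until the last date we want has been produced.
query = """
WITH RECURSIVE date_range(dt) AS (
    SELECT DATE('2024-02-01')
    UNION ALL
    SELECT DATE(dt, '+1 day') FROM date_range WHERE dt < '2024-02-04'
)
SELECT
    dr.dt,
    COUNT(DISTINCT a.title) AS num_alerts
FROM date_range dr
LEFT JOIN alerts a
    ON dr.dt = a.lastindexeddate
GROUP BY dr.dt
ORDER BY dr.dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike DuckDB's `range`, the end date here is included, because the recursion stops only once `dt` has reached it. The days with no matching alerts still show up with a count of zero, just as in the DuckDB query.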