The SQL component of this course will use DuckDB, an _in-memory_ analytics engine that lets you write full-featured SQL without the need for a stand-alone database.

If you nerd out over that stuff, like me, you can read more [here](https://open.substack.com/pub/casewhen/p/data-explained-what-is-duckdb?r=rnul&utm_campaign=post&utm_medium=web). 

For now, you can just assume that the following code will load DuckDB + the necessary datasets, so you can sit back and relax:

In [None]:
import duckdb

# Load SQL extension
%load_ext sql

# Initialize 🦆 DuckDB connection
conn = duckdb.connect()

# Import database
%sql conn --alias duckdb
%sql IMPORT DATABASE '../../data/nps';

Now, we can focus on writing SQL! DuckDB works like any other SQL dialect: you can `SELECT` columns `FROM` some data source. (Note that we need `%%sql` at the beginning of the cell to make it work with our setup.)

In [None]:
%%sql 
SELECT
    *
FROM nps_public_data.parks
LIMIT 1

Some SQL basics and refreshers:
- Every query is made up of a `SELECT` and a `FROM`.
- Between those two, we list the columns, separated by commas.
- We can _alias_ columns or our data source using `AS` (technically not required, but a good idea).

In [None]:
%%sql 
SELECT
    fullName as full_name,
    weatherInfo as weather_info,
    operatingHours as operating_hours
FROM nps_public_data.parks as not_parks
LIMIT 3

Using `DISTINCT` in SQL will only return unique values; it's like `set()` in Python. We can also use `UNION` to stack the results of multiple queries together, as long as they have the same number of columns with compatible types. Note that plain `UNION` also removes duplicate rows from the combined result, while `UNION ALL` keeps every row. This query shows how `DISTINCT` works: duplicated values are collapsed, so the distinct example only counts 10 rows, while the complete one counts 100.

In [None]:
%%sql
SELECT
    COUNT(*) as num_rows
FROM range(0, 10) r
CROSS JOIN range(0, 10) r2

UNION

SELECT
    COUNT(DISTINCT r) as num_rows
FROM range(0, 10) r
CROSS JOIN range(0, 10) r2

`CASE WHEN` is SQL's conditional logic: it lets us define conditions on columns. Here we're taking the `designation` column and saying "when designation is one of the national park variants, return `National Park`. Otherwise, return `Not a National Park`".

You can use:
```sql
CASE [column] WHEN [key_1] THEN [value_1] WHEN [key_2] THEN [value_2] ... WHEN [key_n] THEN [value_n] ... ELSE [default_value] END
```

OR

```sql
CASE WHEN [logical_condition] THEN [value_1] ... ELSE [default_value] END
```

In [None]:
%%sql
SELECT
    DISTINCT designation,
    CASE designation 
        WHEN 'National Parks' 
        THEN 'National Park' 
        WHEN 'National Park' 
        THEN 'National Park' 
        WHEN 'National and State Parks'
        THEN 'National Park'
        ELSE 'Not a National Park' 
    END as park_type,
    CASE
        WHEN len(designation) > 20
        THEN 'I am not reading that'
        ELSE designation
    END as short_designation
FROM nps_public_data.parks
WHERE len(designation) > 0
ORDER BY 1 DESC
LIMIT 10

You might notice from the above example that we're adding `LIMIT` and `ORDER BY` clauses to our query. They're pretty self-explanatory: `LIMIT` caps the number of rows being returned and `ORDER BY` defines how those rows are sorted.

When we say `ORDER BY 1 DESC`, we're saying "Order by the first column descending," but you could also say `ORDER BY designation DESC`.

The `WHERE` clause acts as a filter, returning only the rows that meet the logical condition `len(designation) > 0`.

Do note that the clauses _must_ appear in this order: `FROM`, `WHERE`, `ORDER BY`, `LIMIT`.

We're going to keep our SQL refresher short & sweet, so if you'd like to read more, feel free to check out [this article](https://www.dataquest.io/blog/sql-basics/), which does an excellent job of going over some of these intro concepts.