# Part 1: `SUM()` and `NULL`: WHERE vs HAVING

We are using the following dataset, stored in a table named `sales`:


| region | amount |
| ------ | ------ |
| A      | 100    |
| A      | NULL   |
| B      | NULL   |
| B      | NULL   |
| C      | 0      |

Important rule:

> `SUM()` ignores NULL values.
> But if **all values in a group are NULL**, then `SUM()` returns NULL.

So grouping by region:

* Region A → 100 and NULL (the NULL is skipped) → **100**
* Region B → NULL and NULL (nothing left to add) → **NULL**
* Region C → 0 → **0**
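
The grouped totals above can be checked with a minimal sketch. It assumes a throwaway in-memory SQLite database; the connection name `demo` is chosen here for illustration so that it does not overwrite the notebook's own `conn`.

```python
import sqlite3

# Throwaway in-memory copy of the example table (same five rows as above).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (region TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
)

# SUM() skips NULLs inside a group; an all-NULL group yields NULL,
# which sqlite3 hands back to Python as None.
totals = demo.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()

for region, total in totals:
    print(region, total)
```

Region B is the only group whose total comes back as `None`; region C comes back as a real zero, which is a value and not a NULL.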

## 🔎 Query 1: Filtering in HAVING

```sql
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) IS NULL
ORDER BY region;
```

### What happens?

1. All rows are grouped.
2. `SUM(amount)` is calculated for each region.
3. Only groups where the final result is NULL are kept.

Result:

* Region B only.

Because:

* Region B's total is NULL (all values were NULL).
* A and C have non-NULL totals.
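
As a hedged check of Query 1, the sketch below runs it against a scratch in-memory database. The helper `make_sales()` and the variable names are illustrative assumptions, not part of the notebook.

```python
import sqlite3


def make_sales():
    """Build a scratch in-memory `sales` table with the example rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    db.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
    )
    return db


# HAVING is evaluated after SUM() has been computed for every group,
# so it sees one total per region and keeps only the NULL one.
having_rows = make_sales().execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING SUM(amount) IS NULL
    ORDER BY region
    """
).fetchall()

print(having_rows)
```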


## 🔎 Query 2: Filtering in WHERE

```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount IS NULL
GROUP BY region
ORDER BY region;
```

### What happens?

1. We remove all rows where `amount` is NOT NULL.
2. Only NULL rows remain.
3. Then we group and compute SUM.

Now the dataset becomes:

| region | amount |
| ------ | ------ |
| A      | NULL   |
| B      | NULL   |
| B      | NULL   |

Now:

* Region A → only NULL → SUM = NULL
* Region B → only NULL → SUM = NULL

Result:

* Region A
* Region B
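
To see both stages of Query 2, the following sketch first lists the rows that survive the WHERE filter and then runs the full query. It again assumes a scratch in-memory database; `scratch`, `surviving_rows`, and `where_rows` are names picked for this example.

```python
import sqlite3

scratch = sqlite3.connect(":memory:")
scratch.execute("CREATE TABLE sales (region TEXT, amount REAL)")
scratch.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
)

# Stage 1: WHERE runs per row, before any grouping happens.
surviving_rows = scratch.execute(
    "SELECT region, amount FROM sales WHERE amount IS NULL ORDER BY region"
).fetchall()

# Stage 2: the same filter followed by GROUP BY and SUM().
where_rows = scratch.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount IS NULL
    GROUP BY region
    ORDER BY region
    """
).fetchall()

print(surviving_rows)
print(where_rows)
```

Region A now shows a NULL total because its only non-NULL row (100) was removed before the sum was computed.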



## Why The Results Are Different

Because:

> WHERE decides which rows participate in aggregation.
> HAVING decides which aggregated results we keep.

Filtering before aggregation changes the data being aggregated.

Filtering after aggregation evaluates the final computed result.

That is the core conceptual difference.
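
The logical order can also be imitated in plain Python. This is only an illustrative model of the evaluation order (filter rows, group, aggregate, filter groups), not how a database engine is implemented; `sql_sum` and `run` are hypothetical helpers written for this sketch.

```python
from collections import defaultdict

ROWS = [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)]


def sql_sum(values):
    """Mimic SUM(): skip None, and return None when nothing is left."""
    kept = [v for v in values if v is not None]
    return sum(kept) if kept else None


def run(rows, where=None, having=None):
    # 1. WHERE: keep or drop individual rows.
    if where is not None:
        rows = [row for row in rows if where(row)]
    # 2. GROUP BY region.
    groups = defaultdict(list)
    for region, amount in rows:
        groups[region].append(amount)
    # 3. Aggregate each group.
    totals = {region: sql_sum(amounts) for region, amounts in groups.items()}
    # 4. HAVING: keep or drop whole groups based on the aggregate.
    if having is not None:
        totals = {r: t for r, t in totals.items() if having(t)}
    return sorted(totals.items())


with_having = run(ROWS, having=lambda total: total is None)
with_where = run(ROWS, where=lambda row: row[1] is None)

print(with_having)
print(with_where)
```

The same predicate ("is NULL") produces different answers purely because of the step at which it is applied.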

# Part 2: `COUNT()` and `NULL`

Now we switch to:

```sql
SELECT region, COUNT(amount) AS cnt
FROM sales
GROUP BY region
HAVING COUNT(amount) IS NOT NULL
ORDER BY region;
```

Important rule:

> `COUNT(column)` ignores NULL values.
> If all values are NULL, it returns **0**, not NULL.

So grouped results:

* A → one non-NULL → COUNT = 1
* B → all NULL → COUNT = 0
* C → one non-NULL (0 is a value!) → COUNT = 1

Now:

```sql
HAVING COUNT(amount) IS NOT NULL
```

But COUNT never returns NULL.

So this condition is always TRUE.

Result:

* A
* B
* C
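
One way to confirm that this HAVING clause filters nothing is to run the query with and without it and compare. The sketch assumes a scratch in-memory database; `count_db` and the other names are illustrative.

```python
import sqlite3

count_db = sqlite3.connect(":memory:")
count_db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
count_db.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
)

base = "SELECT region, COUNT(amount) AS cnt FROM sales GROUP BY region"

# Same query twice: once plain, once with the always-true HAVING condition.
without_having = count_db.execute(base + " ORDER BY region").fetchall()
with_having_count = count_db.execute(
    base + " HAVING COUNT(amount) IS NOT NULL ORDER BY region"
).fetchall()

print(without_having)
print(with_having_count == without_having)
```

Region B stays in the result with a count of 0, because 0 is not NULL.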



## Now compare with WHERE

```sql
SELECT region, COUNT(amount) AS cnt
FROM sales
WHERE amount IS NOT NULL
GROUP BY region
ORDER BY region;
```

Now we remove NULL rows before grouping.

Remaining rows:

| region | amount |
| ------ | ------ |
| A      | 100    |
| C      | 0      |

Now grouping:

* A → COUNT = 1
* C → COUNT = 1
* B disappears completely, because none of its rows pass the WHERE filter

Result:

* A
* C
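
The WHERE version can be checked the same way. As before, this is a sketch against a scratch in-memory database, and `where_db` and `where_counts` are names chosen for the example.

```python
import sqlite3

where_db = sqlite3.connect(":memory:")
where_db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
where_db.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
)

# Rows with a NULL amount are dropped first, so region B never forms a group.
where_counts = where_db.execute(
    """
    SELECT region, COUNT(amount) AS cnt
    FROM sales
    WHERE amount IS NOT NULL
    GROUP BY region
    ORDER BY region
    """
).fetchall()

regions = [region for region, _ in where_counts]
print(where_counts)
print("B" in regions)
```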


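# Why it matters

This example shows two things:

1. Aggregates treat an all-NULL group differently: `SUM` returns NULL, while `COUNT(column)` returns 0.
2. WHERE and HAVING operate at different logical layers: WHERE filters rows before aggregation, HAVING filters groups after it.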
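
Putting both aggregates in one query makes the contrast visible in a single result set. This is a small sketch on a scratch in-memory database; `side_db` and `side_by_side` are illustrative names.

```python
import sqlite3

side_db = sqlite3.connect(":memory:")
side_db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
side_db.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("A", 100), ("A", None), ("B", None), ("B", None), ("C", 0)],
)

# One row per region: (region, SUM(amount), COUNT(amount)).
side_by_side = side_db.execute(
    """
    SELECT region, SUM(amount) AS total, COUNT(amount) AS cnt
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, total, cnt in side_by_side:
    print(region, total, cnt)
```

For region B the two aggregates disagree on how to report "no values": the sum is `None`, the count is `0`.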


# Want to Experiment?

Below you'll find a small SQL GUI interface connected to the same database used in this notebook.

Use it to:
- Rewrite the queries from the examples  
- Change filters from WHERE to HAVING and observe the difference  
- Test how NULL values affect aggregates  
- Try your own reporting ideas  

Understanding SQL comes from experimenting, not just reading.


# **SQL Environment Setup (do not edit)**

In [24]:
# @title
%%capture
!mkdir -p notebook_lib
!wget -q -O notebook_lib/sql_runner.py \
  https://raw.githubusercontent.com/Haross/sql_notebook/8021f5c05b7d973b8db549a1398a3c9a5c7829d5/notebook_lib/sql_runner.py
!wget -q -O notebook_lib/validators.py \
  https://raw.githubusercontent.com/Haross/sql_notebook/7baff2c6485cdf641cabcdb55d92a51317cd18b9/notebook_lib/validators.py

from notebook_lib.sql_runner import make_sql_runner
from notebook_lib.validators import make_df_validator_nospoilers, check_process_rules

import sqlite3
import pandas as pd
from pathlib import Path


In [26]:
# @title
%%capture
DB_FILE = 'class.db'

if DB_FILE != ":memory:":
    Path(DB_FILE).unlink(missing_ok=True)

conn = sqlite3.connect(DB_FILE)
conn.execute("PRAGMA foreign_keys = ON;")

conn.executescript(r'''
DROP TABLE IF EXISTS sales;
-- Create table

CREATE TABLE sales (
    region TEXT,
    amount REAL
);


-- Insert rows
-- String literals use single quotes (standard SQL); double quotes denote identifiers.
INSERT INTO sales (region, amount) VALUES
    ('A', 100),
    ('A', NULL),
    ('B', NULL),
    ('B', NULL),
    ('C', 0);
''')
print(f"Database ready ✅ ({DB_FILE})")


# SQL editor

In [47]:
# @title
make_sql_runner(
    conn,
    runner_id="sql_editor",
)

