# 04 – Aggregations, GROUP BY, and HAVING

Core SQL concepts: aggregating data, grouping rows, and filtering grouped results.

Part of the [Foundations: Python, R & SQL](../README.md) repository.


## 1. Sample Table

In [1]:
import duckdb

duckdb.sql("""
CREATE OR REPLACE TABLE sales (
  id INTEGER,
  region TEXT,
  sales_rep TEXT,
  amount INTEGER,
  sale_date DATE
);

INSERT INTO sales VALUES
(1, 'North', 'Alice', 5000, '2023-01-05'),
(2, 'South', 'Bob', 7000, '2023-01-07'),
(3, 'North', 'Alice', 6000, '2023-02-12'),
(4, 'East', 'Clara', 4000, '2023-02-15'),
(5, 'West', 'David', 3000, '2023-02-20'),
(6, 'South', 'Bob', 9000, '2023-03-01'),
(7, 'East', 'Clara', 3500, '2023-03-05');
""")

In [2]:
duckdb.sql("SELECT * FROM sales")

┌───────┬─────────┬───────────┬────────┬────────────┐
│  id   │ region  │ sales_rep │ amount │ sale_date  │
│ int32 │ varchar │  varchar  │ int32  │    date    │
├───────┼─────────┼───────────┼────────┼────────────┤
│     1 │ North   │ Alice     │   5000 │ 2023-01-05 │
│     2 │ South   │ Bob       │   7000 │ 2023-01-07 │
│     3 │ North   │ Alice     │   6000 │ 2023-02-12 │
│     4 │ East    │ Clara     │   4000 │ 2023-02-15 │
│     5 │ West    │ David     │   3000 │ 2023-02-20 │
│     6 │ South   │ Bob       │   9000 │ 2023-03-01 │
│     7 │ East    │ Clara     │   3500 │ 2023-03-05 │
└───────┴─────────┴───────────┴────────┴────────────┘

## 2. Aggregation Functions

An aggregate function collapses many rows into a single value. With no `GROUP BY`, the whole table is treated as one group, so the query returns exactly one row.

In [3]:
duckdb.sql("""
SELECT
  COUNT(*) AS total_sales,
  SUM(amount) AS total_amount,
  AVG(amount) AS avg_amount,
  MIN(amount) AS min_amount,
  MAX(amount) AS max_amount
FROM sales;
""")

┌─────────────┬──────────────┬───────────────────┬────────────┬────────────┐
│ total_sales │ total_amount │    avg_amount     │ min_amount │ max_amount │
│    int64    │    int128    │      double       │   int32    │   int32    │
├─────────────┼──────────────┼───────────────────┼────────────┼────────────┤
│           7 │        37500 │ 5357.142857142857 │       3000 │       9000 │
└─────────────┴──────────────┴───────────────────┴────────────┴────────────┘
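
As a cross-check, the same five aggregates can be reproduced in plain Python. This is only a sketch: the `amounts` list is copied by hand from the sample table, and the variable names are illustrative rather than part of the notebook.

```python
# Amounts copied by hand from the seven rows of the sample `sales` table.
amounts = [5000, 7000, 6000, 4000, 3000, 9000, 3500]

total_sales = len(amounts)                 # COUNT(*)
total_amount = sum(amounts)                # SUM(amount)
avg_amount = total_amount / total_sales    # AVG(amount)
min_amount = min(amounts)                  # MIN(amount)
max_amount = max(amounts)                  # MAX(amount)

print(total_sales, total_amount, min_amount, max_amount)  # 7 37500 3000 9000
print(round(avg_amount, 2))                                # 5357.14
```

Each line maps to one column of the query result, which makes it easy to see that every aggregate reads all seven rows and returns one value.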

## 3. GROUP BY Clause

`GROUP BY region` splits the table into one group per distinct region and evaluates each aggregate once per group. Row order in the result is not guaranteed unless the query adds an `ORDER BY`.

In [4]:
duckdb.sql("""
SELECT
  region,
  COUNT(*) AS sales_count,
  SUM(amount) AS total_amount
FROM sales
GROUP BY region;
""")

┌─────────┬─────────────┬──────────────┐
│ region  │ sales_count │ total_amount │
│ varchar │    int64    │    int128    │
├─────────┼─────────────┼──────────────┤
│ East    │           2 │         7500 │
│ West    │           1 │         3000 │
│ North   │           2 │        11000 │
│ South   │           2 │        16000 │
└─────────┴─────────────┴──────────────┘
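
To make the grouping step concrete, here is a rough plain-Python equivalent of the query. It is a sketch under assumptions: the `(region, amount)` pairs are copied by hand from the sample table, and the `groups` accumulator is a hypothetical name chosen for illustration. A real engine may implement grouping quite differently.

```python
from collections import defaultdict

# (region, amount) pairs copied by hand from the sample `sales` table.
rows = [
    ("North", 5000), ("South", 7000), ("North", 6000), ("East", 4000),
    ("West", 3000), ("South", 9000), ("East", 3500),
]

# One accumulator per distinct region: [sales_count, total_amount].
groups = defaultdict(lambda: [0, 0])
for region, amount in rows:
    groups[region][0] += 1        # COUNT(*)
    groups[region][1] += amount   # SUM(amount)

# Sort by region for a stable order, similar to adding ORDER BY region.
for region in sorted(groups):
    sales_count, total_amount = groups[region]
    print(region, sales_count, total_amount)
# East 2 7500
# North 2 11000
# South 2 16000
# West 1 3000
```

The totals match the query output; only the row order differs, because this sketch sorts by region while the query leaves the order unspecified.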

## 4. Filtering Grouped Results with HAVING

`HAVING` filters whole groups after aggregation, so its condition can test aggregate values such as each region's total.

In [5]:
duckdb.sql("""
SELECT
  region,
  COUNT(*) AS sales_count,
  SUM(amount) AS total_amount
FROM sales
GROUP BY region
HAVING SUM(amount) > 8000;  -- DuckDB also accepts the alias total_amount here; some engines (e.g. PostgreSQL) do not
""")

┌─────────┬─────────────┬──────────────┐
│ region  │ sales_count │ total_amount │
│ varchar │    int64    │    int128    │
├─────────┼─────────────┼──────────────┤
│ South   │           2 │        16000 │
│ North   │           2 │        11000 │
└─────────┴─────────────┴──────────────┘
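
The difference between filtering rows and filtering groups is easier to see side by side. The sketch below is an assumption-laden illustration, not part of the notebook: it uses Python's built-in `sqlite3` module instead of DuckDB so that it runs without extra installs, stores `sale_date` as text, and picks the thresholds 4000 and 7000 purely for demonstration. The same SQL should also run in DuckDB.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for DuckDB to keep the sketch dependency-free.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sales (id INTEGER, region TEXT, sales_rep TEXT, "
    "amount INTEGER, sale_date TEXT)"
)
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "North", "Alice", 5000, "2023-01-05"),
        (2, "South", "Bob", 7000, "2023-01-07"),
        (3, "North", "Alice", 6000, "2023-02-12"),
        (4, "East", "Clara", 4000, "2023-02-15"),
        (5, "West", "David", 3000, "2023-02-20"),
        (6, "South", "Bob", 9000, "2023-03-01"),
        (7, "East", "Clara", 3500, "2023-03-05"),
    ],
)

# HAVING only: all rows are grouped, then groups totalling 7000 or less are dropped.
having_only = con.execute("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 7000
    ORDER BY region
""").fetchall()

# WHERE + HAVING: rows below 4000 are removed before grouping,
# so East's total drops from 7500 to 4000 and East no longer passes HAVING.
where_then_having = con.execute("""
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    WHERE amount >= 4000
    GROUP BY region
    HAVING SUM(amount) > 7000
    ORDER BY region
""").fetchall()

print(having_only)        # [('East', 7500), ('North', 11000), ('South', 16000)]
print(where_then_having)  # [('North', 11000), ('South', 16000)]
con.close()
```

`WHERE` changes which rows feed the aggregates, while `HAVING` only decides which finished groups are kept.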

## Summary

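- Use **aggregation functions** (`SUM`, `AVG`, `MIN`, `MAX`, `COUNT`) to collapse many rows into a single summary value.
- Use **GROUP BY** to compute those aggregates once per category, such as per region.
- Use **HAVING** to filter groups after aggregation; `WHERE` filters individual rows before grouping.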
