# Fifty SQL Exercises

> Grouping and Filtering Aggregated Results

[數據交點](https://www.datainpoint.com/) | 郭耀仁 <yaojenkuo@datainpoint.com>

```sql
%LOAD ../databases/imdb.db
```

```sql
ATTACH '../databases/nba.db' AS nba;
```

```sql
ATTACH '../databases/twElection2020.db' AS twElection2020;
```

```sql
ATTACH '../databases/covid19.db' AS covid19;
```

## Grouping with `GROUP BY`

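## Using `GROUP BY` can be viewed as `DISTINCT` and `ORDER BY` acting together.

Strictly speaking, only the de-duplication is guaranteed. SQLite often returns grouped rows in key order, as in the output below, but SQL promises no row order without an explicit `ORDER BY`.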

```sql
-- DISTINCT and ORDER BY working together
SELECT DISTINCT pos AS distinct_pos
  FROM players
 ORDER BY distinct_pos;
```

| distinct_pos |
| --- |
| C |
| C-F |
| F |
| F-C |
| F-G |
| G |
| G-F |


```sql
-- Using GROUP BY
SELECT pos AS distinct_pos
  FROM players
 GROUP BY pos;
```

| distinct_pos |
| --- |
| C |
| C-F |
| F |
| F-C |
| F-G |
| G |
| G-F |
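
You can check this equivalence outside the notebook. The sketch below is only a stand-in. It assumes Python's built-in `sqlite3` module and an in-memory `players` table with a few made-up rows; only the `pos` and `heightMeters` columns mirror the real table. It adds an explicit `ORDER BY` to the `GROUP BY` query, so the comparison does not rely on SQLite's incidental ordering.

```python
import sqlite3

# Hypothetical in-memory stand-in for the players table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("a", "G", 1.90), ("b", "F", 2.03), ("c", "G", 1.93),
     ("d", "C", 2.13), ("e", "F", 2.01)],
)

# DISTINCT + ORDER BY
distinct_rows = conn.execute(
    "SELECT DISTINCT pos FROM players ORDER BY pos"
).fetchall()

# GROUP BY, with an explicit ORDER BY so the row order is guaranteed
grouped_rows = conn.execute(
    "SELECT pos FROM players GROUP BY pos ORDER BY pos"
).fetchall()

print(distinct_rows)                  # [('C',), ('F',), ('G',)]
print(distinct_rows == grouped_rows)  # True
```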


## In the chapter on functions, we introduced a kind of function that summarizes information, called an aggregate function.

## Used on its own, an aggregate function summarizes an entire column and outputs the result as a single value.

```sql
-- Average height of all players
SELECT AVG(heightMeters) AS height_meters_avg
  FROM players;
```

| height_meters_avg |
| --- |
| 1.98917171717171 |
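
Here is a minimal sketch of the same idea. It assumes Python's built-in `sqlite3` module and a made-up three-row `players` table. However many aggregate functions appear in the `SELECT`, a query without `GROUP BY` collapses the whole table into exactly one row.

```python
import sqlite3

# Hypothetical three-row players table (made-up heights).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("G", 1.90), ("F", 2.00), ("C", 2.10)],
)

# Four aggregate functions, no GROUP BY: the result is a single row.
rows = conn.execute(
    """
    SELECT COUNT(*)                    AS n_players,
           MIN(heightMeters)           AS min_height,
           MAX(heightMeters)           AS max_height,
           ROUND(AVG(heightMeters), 2) AS avg_height
      FROM players
    """
).fetchall()

print(rows)  # [(3, 1.9, 2.1, 2.0)]
```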


## Suppose we want the average height of players at each `pos` (position). With what we know so far, how could we do it?

- First, find out which positions exist.
- Then filter the players at each position and compute their average height one by one.

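```sql
-- First find out which positions exist
-- (no ORDER BY, so the rows come back in no particular order)
SELECT DISTINCT pos
  FROM players;
```

| pos |
| --- |
| F |
| C-F |
| G-F |
| G |
| F-G |
| C |
| F-C |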


## Filter the players at each position and compute their average height one by one

```sql
SELECT AVG(heightMeters) AS forward_avg_height_meters
  FROM players
 WHERE pos = 'F';
SELECT AVG(heightMeters) AS center_forward_avg_height_meters
  FROM players
 WHERE pos = 'C-F';
SELECT AVG(heightMeters) AS guard_forward_avg_height_meters
  FROM players
 WHERE pos = 'G-F';
SELECT AVG(heightMeters) AS guard_avg_height_meters
  FROM players
 WHERE pos = 'G';
SELECT AVG(heightMeters) AS forward_guard_avg_height_meters
  FROM players
 WHERE pos = 'F-G';
SELECT AVG(heightMeters) AS center_avg_height_meters
  FROM players
 WHERE pos = 'C';
SELECT AVG(heightMeters) AS forward_center_avg_height_meters
  FROM players
 WHERE pos = 'F-C';
```
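
The two steps above can also be automated from a host language, with one `WHERE` query per position. This sketch assumes Python's built-in `sqlite3` module and a hypothetical in-memory `players` table with made-up rows. It then checks that a single `GROUP BY` query produces the same averages.

```python
import sqlite3

# Hypothetical players table (made-up heights).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("G", 1.88), ("G", 1.94), ("F", 2.01), ("F", 2.05), ("C", 2.13)],
)

# Step 1: find out which positions exist.
positions = [p for (p,) in conn.execute("SELECT DISTINCT pos FROM players")]

# Step 2: run one WHERE-filtered aggregate query per position.
manual = {}
for pos in positions:
    (avg_height,) = conn.execute(
        "SELECT ROUND(AVG(heightMeters), 2) FROM players WHERE pos = ?",
        (pos,),
    ).fetchone()
    manual[pos] = avg_height

# The same result from a single GROUP BY query.
grouped = dict(
    conn.execute(
        "SELECT pos, ROUND(AVG(heightMeters), 2) FROM players GROUP BY pos"
    ).fetchall()
)

print(manual == grouped)  # True
```

The loop needs one query per distinct value. `GROUP BY` does the same work in one pass, however many positions there are.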

## Combining an aggregate function with `GROUP BY` performs this grouped aggregation in a single query

```sql
-- GROUP BY syntax: one output row per distinct value of column_name
SELECT column_name,
       AGGREGATE_FUNCTION(other_column) AS alias
  FROM table_name
 GROUP BY column_name;
```

```sql
-- Average player height for each pos (position)
SELECT pos,
       ROUND(AVG(heightMeters), 2) AS avg_height_meters
  FROM players
 GROUP BY pos;
```

| pos | avg_height_meters |
| --- | --- |
| C | 2.12 |
| C-F | 2.1 |
| F | 2.02 |
| F-C | 2.08 |
| F-G | 2.0 |
| G | 1.91 |
| G-F | 1.98 |
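
Several aggregate functions can share one `GROUP BY`, and `ORDER BY` can then sort the groups by an aggregated value. This sketch uses the same assumptions: Python's built-in `sqlite3` and an in-memory `players` table with made-up heights.

```python
import sqlite3

# Hypothetical players table (made-up heights).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("G", 1.85), ("G", 1.95), ("F", 2.00), ("F", 2.06), ("C", 2.13)],
)

# Three aggregates per position, groups sorted by their average height.
rows = conn.execute(
    """
    SELECT pos,
           COUNT(*)                    AS n_players,
           MAX(heightMeters)           AS tallest,
           ROUND(AVG(heightMeters), 2) AS avg_height_meters
      FROM players
     GROUP BY pos
     ORDER BY avg_height_meters DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Sorting by `avg_height_meters DESC` puts the tallest position first. Without the `ORDER BY`, the order of the groups would not be guaranteed.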


## `GROUP BY` can take more than one column; each distinct combination of their values forms one group

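```sql
-- Number of teams in each conference/division combination
SELECT confName,
       divName,
       COUNT(*) AS n_teams
  FROM teams
 GROUP BY confName,
          divName;
```

| confName | divName | n_teams |
| --- | --- | --- |
| East | Atlantic | 5 |
| East | Central | 5 |
| East | Southeast | 5 |
| West | Northwest | 5 |
| West | Pacific | 5 |
| West | Southwest | 5 |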
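
This sketch shows how each distinct combination of the grouping columns becomes its own group. It uses a hypothetical four-row `teams` table, with made-up team names `t1` to `t4`, and again relies on Python's built-in `sqlite3`. Grouping by `confName` alone merges the divisions back together.

```python
import sqlite3

# Hypothetical teams table (made-up team names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (name TEXT, confName TEXT, divName TEXT)")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?, ?)",
    [
        ("t1", "East", "Atlantic"),
        ("t2", "East", "Atlantic"),
        ("t3", "East", "Central"),
        ("t4", "West", "Pacific"),
    ],
)

# One group per distinct (confName, divName) pair.
by_conf_div = conn.execute(
    """
    SELECT confName, divName, COUNT(*) AS n_teams
      FROM teams
     GROUP BY confName, divName
     ORDER BY confName, divName
    """
).fetchall()

# One group per confName only.
by_conf = conn.execute(
    """
    SELECT confName, COUNT(*) AS n_teams
      FROM teams
     GROUP BY confName
     ORDER BY confName
    """
).fetchall()

print(by_conf_div)  # [('East', 'Atlantic', 2), ('East', 'Central', 1), ('West', 'Pacific', 1)]
print(by_conf)      # [('East', 3), ('West', 1)]
```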


## Filtering grouped results with `HAVING`

## SQL has two clauses for filtering query results:

1. `WHERE`, which filters individual rows (observations) before grouping.
2. `HAVING`, which filters grouped aggregation results after grouping.
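
The difference lies in when each filter runs. This sketch assumes Python's built-in `sqlite3` and a hypothetical `players` table with made-up heights. `WHERE` removes short players before anyone is counted. `HAVING` counts everyone, then removes whole positions whose average is under 2 meters.

```python
import sqlite3

# Hypothetical players table (made-up heights).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("G", 1.85), ("G", 2.05), ("F", 2.00), ("F", 2.10), ("C", 2.15)],
)

# WHERE drops individual rows before grouping:
# only players at least 2 m tall are counted at all.
where_rows = conn.execute(
    """
    SELECT pos, COUNT(*) AS n_players
      FROM players
     WHERE heightMeters >= 2
     GROUP BY pos
     ORDER BY pos
    """
).fetchall()

# HAVING drops whole groups after aggregation:
# every player is counted, then low-average positions are removed.
having_rows = conn.execute(
    """
    SELECT pos, COUNT(*) AS n_players
      FROM players
     GROUP BY pos
    HAVING AVG(heightMeters) >= 2
     ORDER BY pos
    """
).fetchall()

print(where_rows)   # [('C', 1), ('F', 2), ('G', 1)]
print(having_rows)  # [('C', 1), ('F', 2)]
```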

## `HAVING` filters grouped aggregation results

```sql
-- HAVING syntax: keep only the groups that satisfy condition
SELECT column_name,
       AGGREGATE_FUNCTION(other_column) AS alias
  FROM table_name
 GROUP BY column_name
HAVING condition;
```

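```sql
-- Filter the grouped results on the raw (unrounded) average height
SELECT pos,
       ROUND(AVG(heightMeters), 2) AS avg_height_meters
  FROM players
 GROUP BY pos
HAVING AVG(heightMeters) > 2;
```

| pos | avg_height_meters |
| --- | --- |
| C | 2.12 |
| C-F | 2.1 |
| F | 2.02 |
| F-C | 2.08 |
| F-G | 2.0 |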


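```sql
-- Keep positions whose average height exceeds 2 meters.
-- SQLite lets HAVING refer to the SELECT alias; here the alias holds the
-- rounded value, so F-G (rounded to 2.0) no longer passes > 2.
SELECT pos,
       ROUND(AVG(heightMeters), 2) AS avg_height_meters
  FROM players
 GROUP BY pos
HAVING avg_height_meters > 2;
```

| pos | avg_height_meters |
| --- | --- |
| C | 2.12 |
| C-F | 2.1 |
| F | 2.02 |
| F-C | 2.08 |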
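
The two queries above differ only in what `HAVING` compares: the raw `AVG(heightMeters)` or the rounded `avg_height_meters` alias. This sketch reproduces the effect with a hypothetical `F-G` group whose raw average is 2.003 m, using made-up rows and Python's built-in `sqlite3`. SQLite accepts a `SELECT` alias inside `HAVING`, but not every database does, so repeating the aggregate expression is the more portable choice.

```python
import sqlite3

# Hypothetical rows: the F-G group averages 2.003 m before rounding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("F-G", 2.001), ("F-G", 2.005), ("C", 2.12)],
)

# HAVING on the raw aggregate: 2.003 > 2, so F-G is kept.
raw = conn.execute(
    """
    SELECT pos, ROUND(AVG(heightMeters), 2) AS avg_height_meters
      FROM players
     GROUP BY pos
    HAVING AVG(heightMeters) > 2
     ORDER BY pos
    """
).fetchall()

# HAVING on the rounded alias: 2.0 > 2 is false, so F-G is dropped.
rounded = conn.execute(
    """
    SELECT pos, ROUND(AVG(heightMeters), 2) AS avg_height_meters
      FROM players
     GROUP BY pos
    HAVING avg_height_meters > 2
     ORDER BY pos
    """
).fetchall()

raw_positions = [pos for pos, _ in raw]
rounded_positions = [pos for pos, _ in rounded]
print(raw_positions)      # ['C', 'F-G']
print(rounded_positions)  # ['C']
```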


## Key Takeaways

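- `GROUP BY` can be viewed as `DISTINCT` and `ORDER BY` acting together, though only the de-duplication is guaranteed; add `ORDER BY` whenever row order matters.
- Combining an aggregate function with `GROUP BY` makes grouped aggregation straightforward.
- `HAVING` filters grouped aggregation results.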

## The SQL We Know So Far

```sql
SELECT column_name
  FROM table_name
 WHERE condition
 GROUP BY column_name
HAVING condition
 ORDER BY column_name
 LIMIT m;
```
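
The clauses must be written in the order shown. Conceptually, though, they are evaluated in a different order: `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `SELECT`, `ORDER BY`, then `LIMIT`. This sketch uses every clause at once. It assumes Python's built-in `sqlite3` and a hypothetical `players` table with made-up rows, including one missing height.

```python
import sqlite3

# Hypothetical players table (made-up rows; one height is missing).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, pos TEXT, heightMeters REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("a", "G", 1.88), ("b", "G", 1.96), ("c", "F", 2.03),
     ("d", "F", 2.07), ("e", "C", 2.12), ("f", "C", 2.16),
     ("g", "C", None), ("h", "C-F", 2.08)],
)

# Written order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT.
rows = conn.execute(
    """
    SELECT pos,
           COUNT(*)                    AS n_players,
           ROUND(AVG(heightMeters), 2) AS avg_height_meters
      FROM players
     WHERE heightMeters IS NOT NULL   -- drop rows with no height
     GROUP BY pos                     -- one group per position
    HAVING COUNT(*) >= 2              -- drop positions with a single player
     ORDER BY avg_height_meters DESC  -- tallest positions first
     LIMIT 2                          -- keep only the top two
    """
).fetchall()

for row in rows:
    print(row)
```

Because `WHERE` runs before grouping, the row with the missing height is not counted. The `C` group therefore has two players, not three.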