# Fifty Advanced SQL Exercises

> Advanced SQL queries

[數聚點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

In [1]:
%LOAD mysql db=imdb user=root password='hahowsql'

## (Review) Categories of SQL

- In the introductory course "Fifty SQL Exercises" we focused on the data query language; this advanced course introduces the other SQL sublanguages in full.
- In practice, the data query language is classified as a branch of the data manipulation language.
    - Data Definition Language (DDL)
    - Data Manipulation Language (DML)
        - **Data Query Language (DQL)**
    - Data Control Language (DCL)
    - Transaction Control Language (TCL)

## Advanced SQL query techniques

- Common table expressions (Common Table Expression, CTE)
- Set operations
- Cross joins and self joins
- Metadata
- MySQL window functions

## Common table expressions

## What is a common table expression

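- A common table expression (CTE) sits somewhere between a subquery and a view.
- A CTE can be thought of as a temporary view that is valid only within a single SQL statement.
- For that reason, the CTE's definition and the query that uses it must be written in the same SQL statement.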

## CTEs require lower privileges than views

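- Highest privilege: creating tables (MySQL's `CREATE` privilege), usually granted to administrators.
- Next highest: creating views (the `CREATE VIEW` privilege), usually granted to active users.
- Lower: CTEs and subqueries. General users can use them because they only need `SELECT` on the underlying tables.
- Creating tables `>` creating views `>` CTEs/subqueries.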

## Creating a CTE

Use `WITH` to create and name a CTE, and place a SQL statement inside it as the CTE's body. Then query the CTE straight away by naming it after the `FROM` keyword.

```sql
WITH cte_name AS (
    SQL Statement
)
SELECT columns FROM cte_name ...;
```

In [2]:
WITH avg_rating_by_release_year AS (
    SELECT release_year,
           AVG(rating) AS avg_rating
      FROM imdb.movies
     GROUP BY release_year
) 
SELECT *
  FROM avg_rating_by_release_year
 WHERE avg_rating >= 8.7;

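The same idea can be tried without a MySQL server. Below is a minimal sketch, with these assumptions:

- Python's built-in `sqlite3` module stands in for MySQL.
- The `movies` table is hypothetical, with four rows of made-up titles and ratings (not the course's `imdb` data).

It runs the CTE query above and then shows that the CTE name no longer exists once the statement has finished.

```python
import sqlite3

# Hypothetical miniature version of imdb.movies; SQLite stands in for MySQL here
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, release_year INTEGER, rating REAL)"
)
conn.executemany(
    "INSERT INTO movies (title, release_year, rating) VALUES (?, ?, ?)",
    [("A", 1994, 9.5), ("B", 1994, 8.5), ("C", 2008, 9.0), ("D", 2010, 8.5)],
)

# Definition (WITH ... AS) and use (SELECT ... FROM cte) live in one statement
rows = conn.execute("""
WITH avg_rating_by_release_year AS (
    SELECT release_year,
           AVG(rating) AS avg_rating
      FROM movies
     GROUP BY release_year
)
SELECT release_year, avg_rating
  FROM avg_rating_by_release_year
 WHERE avg_rating >= 8.7
 ORDER BY release_year;
""").fetchall()
print(rows)

# After that statement has finished, the CTE name no longer exists
try:
    conn.execute("SELECT * FROM avg_rating_by_release_year")
    cte_still_exists = True
except sqlite3.OperationalError:
    cte_still_exists = False
print(cte_still_exists)
```

Running it prints `[(1994, 9.0), (2008, 9.0)]` and then `False`:

- The 1994 average is (9.5 + 8.5) / 2 = 9.0.
- The 2010 average of 8.5 is filtered out.
- The second query fails because the CTE only lived inside the first statement.
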
## Reviewing MySQL's join types with CTEs

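- `JOIN` keeps the observations in the "intersection" of the two tables, i.e. rows whose join key matches on both sides.
- `LEFT JOIN` keeps every observation of the left table (the one after `FROM`).
- `RIGHT JOIN` keeps every observation of the right table (the one after `JOIN`).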

## Creating two or more CTEs

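Separate the CTEs from one another with commas. Each CTE consists of its name and its SQL statement, and `WITH` is written only once.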

```sql
WITH cte_name_1 AS (
    SELECT Statement
),
cte_name_2 AS (
    SELECT Statement
)
SELECT columns FROM cte_name_1 JOIN cte_name_2 ...;
```

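Here is a minimal sketch of the three join examples that follow, with these assumptions:

- Python's `sqlite3` module stands in for MySQL.
- The `movies` and `movies_actors` rows are hypothetical. Ids 1, 3 and 4 are assumed to be Shawshank, The Dark Knight and Forrest Gump, and the actor ids are made up.

SQLite only gained `RIGHT JOIN` in version 3.39.0, so the right join is written as a `LEFT JOIN` with the two CTEs swapped. That form keeps the same rows.

```python
import sqlite3

# Hypothetical ids: 1 = Shawshank, 3 = The Dark Knight, 4 = Forrest Gump (made up for this sketch)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE movies_actors (movie_id INTEGER, actor_id INTEGER);
INSERT INTO movies VALUES
    (1, 'The Shawshank Redemption'), (2, 'The Godfather'),
    (3, 'The Dark Knight'), (4, 'Forrest Gump');
INSERT INTO movies_actors VALUES (1, 10), (1, 11), (2, 13), (3, 12);
""")

# One WITH, two CTEs separated by a comma
CTES = """
WITH casting_shawshank_darkknight AS (
    SELECT * FROM movies_actors WHERE movie_id IN (1, 3)
),
movies_shawshank_forrest AS (
    SELECT * FROM movies
     WHERE title IN ('The Shawshank Redemption', 'Forrest Gump')
)
"""

def run(main_query):
    """Prepend the shared CTEs to a query and return all rows."""
    return conn.execute(CTES + main_query).fetchall()

# Inner join: only movie_id values present on both sides (movie 1)
inner = run("""
SELECT m.title, c.actor_id
  FROM casting_shawshank_darkknight AS c
  JOIN movies_shawshank_forrest AS m ON c.movie_id = m.id
 ORDER BY c.actor_id;""")

# Left join: every casting row; movie 3 has no match, so its title is NULL (None)
left = run("""
SELECT m.title, c.actor_id
  FROM casting_shawshank_darkknight AS c
  LEFT JOIN movies_shawshank_forrest AS m ON c.movie_id = m.id
 ORDER BY c.actor_id;""")

# "Right join" written as a left join with the two CTEs swapped:
# every movie row is kept; Forrest Gump has no casting rows, so actor_id is None
right = run("""
SELECT m.title, c.actor_id
  FROM movies_shawshank_forrest AS m
  LEFT JOIN casting_shawshank_darkknight AS c ON c.movie_id = m.id
 ORDER BY m.title, c.actor_id;""")

for name, rows in [("JOIN", inner), ("LEFT JOIN", left), ("RIGHT JOIN (emulated)", right)]:
    print(name, rows)
```

Comparing the three lists makes the rules concrete:

- The inner join keeps only Shawshank's two actors.
- The left join adds actor 12 with a `None` title.
- The swapped left join adds Forrest Gump with a `None` actor id.
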
## `JOIN` keeps the observations in the "intersection" of the two tables

In [3]:
WITH casting_shawshank_darkknight AS (
    SELECT *
      FROM imdb.movies_actors
     WHERE movie_id IN (1, 3)
),
movies_shawshank_forrest AS (
    SELECT *
      FROM imdb.movies
     WHERE title IN ('The Shawshank Redemption', 'Forrest Gump')
)
SELECT movies_shawshank_forrest.title,
       casting_shawshank_darkknight.actor_id
  FROM casting_shawshank_darkknight
  JOIN movies_shawshank_forrest
    ON casting_shawshank_darkknight.movie_id = movies_shawshank_forrest.id;

## `LEFT JOIN` keeps every observation of the left table

In [4]:
WITH casting_shawshank_darkknight AS (
    SELECT *
      FROM imdb.movies_actors
     WHERE movie_id IN (1, 3)
),
movies_shawshank_forrest AS (
    SELECT *
      FROM imdb.movies
     WHERE title IN ('The Shawshank Redemption', 'Forrest Gump')
)
SELECT movies_shawshank_forrest.title,
       casting_shawshank_darkknight.actor_id
  FROM casting_shawshank_darkknight
  LEFT JOIN movies_shawshank_forrest
    ON casting_shawshank_darkknight.movie_id = movies_shawshank_forrest.id;

## `RIGHT JOIN` keeps every observation of the right table

In [5]:
WITH casting_shawshank_darkknight AS (
    SELECT *
      FROM imdb.movies_actors
     WHERE movie_id IN (1, 3)
),
movies_shawshank_forrest AS (
    SELECT *
      FROM imdb.movies
     WHERE title IN ('The Shawshank Redemption', 'Forrest Gump')
)
SELECT movies_shawshank_forrest.title,
       casting_shawshank_darkknight.actor_id
  FROM casting_shawshank_darkknight
 RIGHT JOIN movies_shawshank_forrest
    ON casting_shawshank_darkknight.movie_id = movies_shawshank_forrest.id;

## Set operations

## What are set operations

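Set operations combine several SQL statements vertically: the set-operation keywords stack the result of one query on top of the result of another. Both queries must return the same number of columns with compatible data types. The set-operation keywords MySQL supports are:

- `UNION`/`UNION ALL`
- `INTERSECT` (MySQL 8.0.31 and later)
- `EXCEPT` (MySQL 8.0.31 and later)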

## Combining two SQL statements vertically with `UNION` or `UNION ALL`

```sql
SELECT Statement
UNION | UNION ALL
SELECT Statement;
```

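## `UNION` returns the union and removes duplicate values

`UNION` does not guarantee any row order. Add `ORDER BY` after the last `SELECT` when the order matters.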

In [6]:
SELECT name
  FROM imdb.directors
 WHERE name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan')
 UNION
SELECT name
  FROM imdb.actors
 WHERE name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan');

| name |
| --- |
| Aamir Khan |
| Christopher Nolan |
| Steven Spielberg |
| Tom Cruise |
| Tom Hanks |


## `UNION ALL` returns the union and keeps duplicate values

In [7]:
SELECT name
  FROM imdb.directors
 WHERE name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan')
 UNION ALL
SELECT name
  FROM imdb.actors
 WHERE name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan');

| name |
| --- |
| Aamir Khan |
| Christopher Nolan |
| Steven Spielberg |
| Aamir Khan |
| Tom Cruise |
| Tom Hanks |

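This small sketch checks the duplicate handling and makes the row order explicit. It assumes:

- Python's `sqlite3` module stands in for MySQL.
- The `directors` and `actors` tables hold only the six names above; the real tables are much larger.

The trailing `ORDER BY` applies to the combined result, not just to the last `SELECT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE directors (name TEXT);
CREATE TABLE actors (name TEXT);
INSERT INTO directors VALUES ('Steven Spielberg'), ('Christopher Nolan'), ('Aamir Khan');
INSERT INTO actors VALUES ('Tom Hanks'), ('Tom Cruise'), ('Aamir Khan');
""")

def names(query):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(query)]

# A trailing ORDER BY sorts the whole compound result
union_names = names("""
SELECT name FROM directors
 UNION
SELECT name FROM actors
 ORDER BY name;""")

union_all_names = names("""
SELECT name FROM directors
 UNION ALL
SELECT name FROM actors
 ORDER BY name;""")

print(union_names)      # 'Aamir Khan' appears once
print(union_all_names)  # 'Aamir Khan' appears twice
```

The `UNION` list has five names and the `UNION ALL` list has six, because only `UNION ALL` keeps the second `Aamir Khan`.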

## Combining two SQL statements vertically with `INTERSECT` to keep the "intersection"

```sql
SELECT Statement
INTERSECT
SELECT Statement;
```

In [8]:
SELECT name
  FROM imdb.directors
 WHERE name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan')
INTERSECT
SELECT name
  FROM imdb.actors
 WHERE name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan');

| name |
| --- |
| Aamir Khan |


## Combining two SQL statements vertically with `EXCEPT` to take the "difference"

```sql
SELECT Statement
EXCEPT
SELECT Statement;
```

## `directors` minus `actors`

In [9]:
SELECT name
  FROM imdb.directors
 WHERE name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan')
EXCEPT
SELECT name
  FROM imdb.actors
 WHERE name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan');

| name |
| --- |
| Christopher Nolan |
| Steven Spielberg |


## `actors` minus `directors`

In [10]:
SELECT name
  FROM imdb.actors
 WHERE name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan')
EXCEPT
SELECT name
  FROM imdb.directors
 WHERE name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan');

| name |
| --- |
| Tom Cruise |
| Tom Hanks |

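Here is a sketch of `INTERSECT` and `EXCEPT` under the same assumptions: Python's `sqlite3` module in place of MySQL, and tables holding only the six names above. SQLite has supported both operators for a long time, so the sketch does not depend on MySQL 8.0.31 or later. Python's own set operations serve as a cross-check.

```python
import sqlite3

directors = ["Steven Spielberg", "Christopher Nolan", "Aamir Khan"]
actors = ["Tom Hanks", "Tom Cruise", "Aamir Khan"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directors (name TEXT)")
conn.execute("CREATE TABLE actors (name TEXT)")
conn.executemany("INSERT INTO directors VALUES (?)", [(n,) for n in directors])
conn.executemany("INSERT INTO actors VALUES (?)", [(n,) for n in actors])

def names(query):
    """Run a one-column query and return its values as a list."""
    return [row[0] for row in conn.execute(query)]

both = names("""
SELECT name FROM directors
INTERSECT
SELECT name FROM actors
 ORDER BY name;""")

directors_only = names("""
SELECT name FROM directors
EXCEPT
SELECT name FROM actors
 ORDER BY name;""")

actors_only = names("""
SELECT name FROM actors
EXCEPT
SELECT name FROM directors
 ORDER BY name;""")

# Cross-check against Python's built-in set operations
assert both == sorted(set(directors) & set(actors))
assert directors_only == sorted(set(directors) - set(actors))
assert actors_only == sorted(set(actors) - set(directors))
print(both, directors_only, actors_only)
```

`directors_only` and `actors_only` differ, which is the point of the two queries above: the difference depends on which statement comes first.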

## Cross joins and self joins

## What is a cross join

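- A cross join (Cross-join) is also called a Cartesian join or a product.
- A cross join returns every possible combination of rows from the joined tables, i.e. it enumerates them. Joining a table of m rows with a table of n rows yields m × n rows.
- Use the `CROSS JOIN` keyword; no join key is given with `ON`.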

```sql
SELECT columns
  FROM database_name.left_table_name
 CROSS JOIN database_name.right_table_name;
```

## Cross joining with `CROSS JOIN`

In [11]:
SELECT directors.name AS director_name,
       actors.name AS actor_name
  FROM imdb.directors
 CROSS JOIN imdb.actors
 WHERE directors.name IN ('Steven Spielberg', 'Christopher Nolan', 'Aamir Khan') AND
       actors.name IN ('Tom Hanks', 'Tom Cruise', 'Aamir Khan');

| director_name | actor_name |
| --- | --- |
| Steven Spielberg | Aamir Khan |
| Christopher Nolan | Aamir Khan |
| Aamir Khan | Aamir Khan |
| Steven Spielberg | Tom Cruise |
| Christopher Nolan | Tom Cruise |
| Aamir Khan | Tom Cruise |
| Steven Spielberg | Tom Hanks |
| Christopher Nolan | Tom Hanks |
| Aamir Khan | Tom Hanks |

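Here is a minimal sketch of the same cross join. It assumes Python's `sqlite3` module in place of MySQL, and tables that hold only the three directors and three actors above. It checks the m × n row count and compares the result with `itertools.product`, Python's own Cartesian product.

```python
import itertools
import sqlite3

directors = ["Steven Spielberg", "Christopher Nolan", "Aamir Khan"]
actors = ["Tom Hanks", "Tom Cruise", "Aamir Khan"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directors (name TEXT)")
conn.execute("CREATE TABLE actors (name TEXT)")
conn.executemany("INSERT INTO directors VALUES (?)", [(n,) for n in directors])
conn.executemany("INSERT INTO actors VALUES (?)", [(n,) for n in actors])

# No ON clause: every director row is paired with every actor row
pairs = conn.execute("""
SELECT d.name AS director_name,
       a.name AS actor_name
  FROM directors AS d
 CROSS JOIN actors AS a;""").fetchall()

print(len(pairs))  # m * n = 3 * 3
# The same enumeration in plain Python is the Cartesian product
same_as_product = set(pairs) == set(itertools.product(directors, actors))
print(same_as_product)
```

It prints `9` and `True`.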

## What is a self join

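- When the left and right tables of a join are the same table, the join is called a self join (Self-join).
- List the same table twice after the `FROM` keyword and give each copy an alias.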

```sql
SELECT columns
  FROM database_name.table_name AS t1,
       database_name.table_name AS t2;
```

## Self-join use case: finding actors who share the same name

In [12]:
SELECT DISTINCT a1.name,
       a1.link
  FROM imdb.actors AS a1,
       imdb.actors AS a2
 WHERE a1.name = a2.name AND
       a1.link != a2.link;

| name | link |
| --- | --- |
| Alan Lee | https://imdb.com/name/nm0496768/ |
| Alan Lee | https://imdb.com/name/nm0496769/ |
| Alan Lee | https://imdb.com/name/nm4556632/ |
| Bill Thompson | https://imdb.com/name/nm0859895/ |
| Bill Thompson | https://imdb.com/name/nm8560460/ |
| Bill Walker | https://imdb.com/name/nm4350793/ |
| Bill Walker | https://imdb.com/name/nm0907553/ |
| Charles Irwin | https://imdb.com/name/nm1558579/ |
| Charles Irwin | https://imdb.com/name/nm0410361/ |
| Charles West | https://imdb.com/name/nm0921983/ |

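Why does the query above need `DISTINCT`? This minimal sketch runs the self join with and without it. It assumes:

- Python's `sqlite3` module stands in for MySQL.
- The `actors` table is hypothetical. The names echo the output above, but the `link` values are made up.

```python
import sqlite3

# Hypothetical rows: two names are shared; links are made-up identifiers
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (name TEXT, link TEXT)")
conn.executemany("INSERT INTO actors VALUES (?, ?)", [
    ("Alan Lee", "nm01"), ("Alan Lee", "nm02"), ("Alan Lee", "nm03"),
    ("Bill Walker", "nm04"), ("Bill Walker", "nm05"),
    ("Tom Hanks", "nm06"),
])

SELF_JOIN = """
SELECT {distinct} a1.name, a1.link
  FROM actors AS a1,
       actors AS a2
 WHERE a1.name = a2.name AND
       a1.link != a2.link
 ORDER BY a1.name, a1.link;"""

# Without DISTINCT, each row appears once per *other* row sharing its name
raw = conn.execute(SELF_JOIN.format(distinct="")).fetchall()
deduped = conn.execute(SELF_JOIN.format(distinct="DISTINCT")).fetchall()
print(len(raw), len(deduped))
print(deduped)
```

Without `DISTINCT`, the join returns 8 rows:

- Each of the three Alan Lee rows matches the two other Alan Lee rows (6 rows).
- Each Bill Walker row matches the other one (2 rows).

With `DISTINCT`, only the 5 distinct actor rows remain. The single `Tom Hanks` row never appears.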

## When to use a self join

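- Self joins are typically used when a condition refers to columns of the same table more than once, for example to compare a row with other rows of that table.
- Self joins are relatively hard to read.
- Similar problems can often be written more readably with grouped aggregation plus a subquery.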

## Finding actors who share the same name: a rewrite

In [13]:
SELECT name,
       COUNT(*) AS number_of_actors
  FROM imdb.actors
 GROUP BY name
HAVING number_of_actors > 1;

| name | number_of_actors |
| --- | --- |
| Alan Lee | 3 |
| Bill Thompson | 2 |
| Bill Walker | 2 |
| Charles Irwin | 2 |
| Charles West | 2 |
| David Murray | 2 |
| Frank Stallone | 2 |
| Gerard Murphy | 2 |
| Greg Ellis | 2 |
| Harrison Young | 2 |


In [14]:
SELECT name,
       link
  FROM imdb.actors
 WHERE name IN (SELECT name
                  FROM imdb.actors
                 GROUP BY name
                HAVING COUNT(*) > 1)
 ORDER BY name;

| name | link |
| --- | --- |
| Alan Lee | https://imdb.com/name/nm4556632/ |
| Alan Lee | https://imdb.com/name/nm0496769/ |
| Alan Lee | https://imdb.com/name/nm0496768/ |
| Bill Thompson | https://imdb.com/name/nm8560460/ |
| Bill Thompson | https://imdb.com/name/nm0859895/ |
| Bill Walker | https://imdb.com/name/nm0907553/ |
| Bill Walker | https://imdb.com/name/nm4350793/ |
| Charles Irwin | https://imdb.com/name/nm0410361/ |
| Charles Irwin | https://imdb.com/name/nm1558579/ |
| Charles West | https://imdb.com/name/nm0921980/ |

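The two approaches can be compared directly. This sketch again assumes Python's `sqlite3` module and the same hypothetical `actors` rows. It runs the self join and the `GROUP BY`/subquery rewrite and compares the results as sets.

They agree here because no two rows share both `name` and `link`. If two rows were exact duplicates, the self join's `a1.link != a2.link` condition would skip them, while `COUNT(*) > 1` would not.

```python
import sqlite3

# Same hypothetical rows as before; link values are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (name TEXT, link TEXT)")
conn.executemany("INSERT INTO actors VALUES (?, ?)", [
    ("Alan Lee", "nm01"), ("Alan Lee", "nm02"), ("Alan Lee", "nm03"),
    ("Bill Walker", "nm04"), ("Bill Walker", "nm05"),
    ("Tom Hanks", "nm06"),
])

self_join = conn.execute("""
SELECT DISTINCT a1.name, a1.link
  FROM actors AS a1,
       actors AS a2
 WHERE a1.name = a2.name AND
       a1.link != a2.link;""").fetchall()

# Group first to find repeated names, then filter the table with a subquery
grouped = conn.execute("""
SELECT name, link
  FROM actors
 WHERE name IN (SELECT name
                  FROM actors
                 GROUP BY name
                HAVING COUNT(*) > 1);""").fetchall()

print(set(self_join) == set(grouped))
```
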

## Metadata

## What is metadata

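- A collection of data only qualifies as a database if it carries metadata.
- Metadata is data that describes the database itself: its schemas, tables, columns, and constraints.
- Metadata is to data what an English-English dictionary is to English words.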

## MySQL provides metadata through the `information_schema` database

- Every object in the `information_schema` database is a view.
- Put `information_schema.{view_name}` after the `FROM` keyword to query the corresponding metadata.
- For the view names available as `{view_name}`, see: <https://dev.mysql.com/doc/refman/8.0/en/information-schema-table-reference.html>

## Commonly used metadata views

- `information_schema.schemata`
- `information_schema.tables`
- `information_schema.columns`
- `information_schema.table_constraints`
- `information_schema.key_column_usage`

## Commonly used metadata views: `information_schema.schemata`

Provides information about databases.

In [15]:
SELECT SCHEMA_NAME
  FROM information_schema.schemata;

| SCHEMA_NAME |
| --- |
| mysql |
| information_schema |
| performance_schema |
| sys |
| cloned_imdb |
| cloned_covid19 |
| tcl |
| transaction_control |
| imdb |
| covid19 |


## Commonly used metadata views: `information_schema.tables`

Provides information about tables and views.

In [16]:
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       TABLE_TYPE
  FROM information_schema.tables
 WHERE TABLE_SCHEMA IN ('imdb', 'information_schema')
 LIMIT 10;

| TABLE_SCHEMA | TABLE_NAME | TABLE_TYPE |
| --- | --- | --- |
| information_schema | CHARACTER_SETS | SYSTEM VIEW |
| information_schema | CHECK_CONSTRAINTS | SYSTEM VIEW |
| information_schema | COLLATIONS | SYSTEM VIEW |
| information_schema | COLLATION_CHARACTER_SET_APPLICABILITY | SYSTEM VIEW |
| information_schema | COLUMNS | SYSTEM VIEW |
| information_schema | COLUMNS_EXTENSIONS | SYSTEM VIEW |
| information_schema | COLUMN_STATISTICS | SYSTEM VIEW |
| information_schema | EVENTS | SYSTEM VIEW |
| information_schema | FILES | SYSTEM VIEW |
| information_schema | INNODB_DATAFILES | SYSTEM VIEW |


## Commonly used metadata views: `information_schema.columns`

Provides column information for tables and views.

In [17]:
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME
  FROM information_schema.columns
 WHERE TABLE_SCHEMA = 'imdb' AND
       TABLE_NAME = 'movies';

| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME |
| --- | --- | --- |
| imdb | movies | id |
| imdb | movies | title |
| imdb | movies | release_year |
| imdb | movies | runtime |
| imdb | movies | rating |
| imdb | movies | link |


## Finding the shape `(m, n)` of a table

- `m` (the number of rows) comes from `SELECT COUNT(*) FROM database_name.table_name`
- `n` (the number of columns) comes from `SELECT COUNT(*) FROM information_schema.columns WHERE TABLE_SCHEMA = '{database_name}' AND TABLE_NAME = '{table_name}'`

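SQLite has no `information_schema`, but the same two-query recipe can be sketched against SQLite's own catalog. This is an analogue, not MySQL. It assumes:

- Python's `sqlite3` module.
- A hypothetical two-row `movies` table with the six columns listed above.
- SQLite's `pragma_table_info()` table-valued function (available since SQLite 3.16), used in place of `information_schema.columns`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (
    id INTEGER PRIMARY KEY,
    title TEXT,
    release_year INTEGER,
    runtime INTEGER,
    rating REAL,
    link TEXT
);
INSERT INTO movies (title, release_year, runtime, rating, link) VALUES
    ('A', 1994, 142, 9.3, 'link-a'),
    ('B', 1972, 175, 9.2, 'link-b');
""")

def table_shape(conn, table_name):
    """Return (m, n): the row count and column count of table_name."""
    # Identifiers cannot be bound as parameters, so table_name must be trusted input
    m = conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    # SQLite's catalog analogue of information_schema.columns: one row per column
    n = conn.execute(
        "SELECT COUNT(*) FROM pragma_table_info(?)", (table_name,)
    ).fetchone()[0]
    return m, n

print(table_shape(conn, "movies"))
```

`table_shape(conn, "movies")` returns `(2, 6)`.
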
## Finding the shape of a table: `imdb.movies` as an example

In [18]:
SELECT COUNT(*) AS m
  FROM imdb.movies;

| m |
| --- |
| 250 |


In [19]:
SELECT COUNT(*) AS n
  FROM information_schema.columns
 WHERE TABLE_SCHEMA = 'imdb' AND
       TABLE_NAME = 'movies';

| n |
| --- |
| 6 |


## Commonly used metadata views: `information_schema.table_constraints`

Provides information about unique, foreign key, and primary key constraints.

In [20]:
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       CONSTRAINT_TYPE
  FROM information_schema.table_constraints
 WHERE TABLE_SCHEMA = 'imdb';

| TABLE_SCHEMA | TABLE_NAME | CONSTRAINT_TYPE |
| --- | --- | --- |
| imdb | actors | PRIMARY KEY |
| imdb | directors | PRIMARY KEY |
| imdb | movies | PRIMARY KEY |
| imdb | movies_actors | PRIMARY KEY |
| imdb | movies_actors | FOREIGN KEY |
| imdb | movies_actors | FOREIGN KEY |
| imdb | movies_directors | PRIMARY KEY |
| imdb | movies_directors | FOREIGN KEY |
| imdb | movies_directors | FOREIGN KEY |
| imdb | movies_writers | PRIMARY KEY |


## Commonly used metadata views: `information_schema.key_column_usage`

Provides information about the columns involved in constraints.

In [21]:
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       CONSTRAINT_NAME,
       COLUMN_NAME,
       REFERENCED_TABLE_NAME,
       REFERENCED_COLUMN_NAME
  FROM information_schema.key_column_usage
 WHERE TABLE_SCHEMA = 'imdb';

| TABLE_SCHEMA | TABLE_NAME | CONSTRAINT_NAME | COLUMN_NAME | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
| --- | --- | --- | --- | --- | --- |
| imdb | actors | PRIMARY | id |  |  |
| imdb | directors | PRIMARY | id |  |  |
| imdb | movies | PRIMARY | id |  |  |
| imdb | movies_actors | PRIMARY | id |  |  |
| imdb | movies_actors | fk_movies_actors_actors | actor_id | actors | id |
| imdb | movies_actors | fk_movies_actors_movies | movie_id | movies | id |
| imdb | movies_directors | PRIMARY | id |  |  |
| imdb | movies_directors | fk_movies_directors_directors | director_id | directors | id |
| imdb | movies_directors | fk_movies_directors_movies | movie_id | movies | id |
| imdb | movies_writers | PRIMARY | id |  |  |


## MySQL window functions

## What is a window function

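- Window functions are conceptually similar to grouped aggregation (`GROUP BY` plus aggregate functions).
- What they have in common: both aggregate across rows.
- Where they differ: grouped aggregation collapses multiple rows into a single output row per group, whereas window functions keep every input row in the output.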

## How to use window functions

- Use them together with the keywords:
    - `OVER()`
    - `PARTITION BY`
- `PARTITION BY` is to window functions what `GROUP BY` is to aggregate functions.

```sql
SELECT window_function() OVER (PARTITION BY columns) AS alias
  FROM database_name.table_name;
```

## Aggregate functions collapse multiple rows into a single output row

In [22]:
SELECT rating,
       COUNT(*) AS number_of_movies
  FROM imdb.movies
 GROUP BY rating;

| rating | number_of_movies |
| --- | --- |
| 9.3 | 1 |
| 9.2 | 1 |
| 9.0 | 5 |
| 8.9 | 2 |
| 8.8 | 8 |
| 8.7 | 5 |
| 8.6 | 12 |
| 8.5 | 21 |
| 8.4 | 27 |
| 8.3 | 41 |


## Window functions do not collapse multiple rows into a single output row

In [23]:
SELECT title,
       rating,
       COUNT(*) OVER (PARTITION BY rating) AS number_of_movies_by_rating
  FROM imdb.movies
 ORDER BY rating DESC
 LIMIT 10;

| title | rating | number_of_movies_by_rating |
| --- | --- | --- |
| The Shawshank Redemption | 9.3 | 1 |
| The Godfather | 9.2 | 1 |
| The Dark Knight | 9.0 | 5 |
| The Godfather Part II | 9.0 | 5 |
| 12 Angry Men | 9.0 | 5 |
| Schindler's List | 9.0 | 5 |
| The Lord of the Rings: The Return of the King | 9.0 | 5 |
| Pulp Fiction | 8.9 | 2 |
| Spider-Man: Across the Spider-Verse | 8.9 | 2 |
| The Lord of the Rings: The Fellowship of the Ring | 8.8 | 8 |

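The contrast between the two previous queries can be reproduced on a tiny hypothetical table. This sketch assumes:

- Python's `sqlite3` module stands in for MySQL. Window functions need SQLite 3.25 or newer.
- The table holds four made-up movies, two of which share a rating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("A", 9.3), ("B", 9.0), ("C", 9.0), ("D", 8.5)])

# GROUP BY: one output row per distinct rating
grouped = conn.execute("""
SELECT rating, COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY rating
 ORDER BY rating DESC;""").fetchall()

# Window function: every movie row survives, each carrying its partition's count
windowed = conn.execute("""
SELECT title,
       rating,
       COUNT(*) OVER (PARTITION BY rating) AS number_of_movies_by_rating
  FROM movies
 ORDER BY rating DESC, title;""").fetchall()

print(grouped)
print(windowed)
```

`grouped` has 3 rows, one per rating. `windowed` keeps all 4 movies and repeats the count 2 on both 9.0 rows.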

## Categories of window functions

1. Ordinary aggregate functions.
2. Window functions that depend on ordering.
3. The `LEAD()` and `LAG()` functions.

## Categories of window functions: ordinary aggregate functions

In [24]:
SELECT title,
       rating,
       AVG(rating) OVER (PARTITION BY release_year) AS avg_rating_by_year
  FROM imdb.movies
 ORDER BY release_year DESC
 LIMIT 10;

| title | rating | avg_rating_by_year |
| --- | --- | --- |
| Spider-Man: Across the Spider-Verse | 8.9 | 8.85 |
| Oppenheimer | 8.8 | 8.85 |
| Top Gun: Maverick | 8.3 | 8.3 |
| Spider-Man: No Way Home | 8.2 | 8.5 |
| Jai Bhim | 8.8 | 8.5 |
| Hamilton | 8.3 | 8.25 |
| The Father | 8.2 | 8.25 |
| Parasite | 8.5 | 8.3 |
| Joker | 8.4 | 8.3 |
| Avengers: Endgame | 8.4 | 8.3 |


## Categories of window functions: ordering-based window functions

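- `ROW_NUMBER()`: assigns distinct sequential numbers, even to tied values.
- `RANK()`: gives tied values the same rank and then skips ahead. For example, five titles tied at 9.0 all get rank 3, and the next title gets rank 8.
- `FIRST_VALUE()`/`LAST_VALUE()`/`NTH_VALUE()`: the value from the first, last, and N-th row of the window frame.
- Add `ORDER BY` inside `OVER()` when using ordering-based window functions. For `LAST_VALUE()` and `NTH_VALUE()`, also specify a frame covering the whole partition (`ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`). Otherwise the default frame ends at the current row.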

## Ordering-based window functions: `ROW_NUMBER()` and `RANK()`

In [25]:
SELECT title,
       rating,
       ROW_NUMBER() OVER (ORDER BY rating DESC) AS row_num,
       RANK() OVER (ORDER BY rating DESC) AS rating_rank
  FROM imdb.movies
 ORDER BY row_num
 LIMIT 10;

| title | rating | row_num | rating_rank |
| --- | --- | --- | --- |
| The Shawshank Redemption | 9.3 | 1 | 1 |
| The Godfather | 9.2 | 2 | 2 |
| The Dark Knight | 9.0 | 3 | 3 |
| The Godfather Part II | 9.0 | 4 | 3 |
| 12 Angry Men | 9.0 | 5 | 3 |
| Schindler's List | 9.0 | 6 | 3 |
| The Lord of the Rings: The Return of the King | 9.0 | 7 | 3 |
| Pulp Fiction | 8.9 | 8 | 8 |
| Spider-Man: Across the Spider-Verse | 8.9 | 9 | 8 |
| The Lord of the Rings: The Fellowship of the Ring | 8.8 | 10 | 10 |

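Here is a minimal sketch of the tie behaviour. It assumes:

- Python's `sqlite3` module (SQLite 3.25 or newer) stands in for MySQL.
- The table holds four made-up movies, two of them tied at 9.0.

`title` is added as a tie-breaker inside `ROW_NUMBER()`'s `ORDER BY` so that the numbering is deterministic. That tie-breaker is an addition, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?)",
                 [("A", 9.3), ("B", 9.0), ("C", 9.0), ("D", 8.9)])

rows = conn.execute("""
SELECT title,
       rating,
       ROW_NUMBER() OVER (ORDER BY rating DESC, title) AS row_num,
       RANK() OVER (ORDER BY rating DESC) AS rating_rank
  FROM movies
 ORDER BY row_num;""").fetchall()

for title, rating, row_num, rating_rank in rows:
    print(title, rating, row_num, rating_rank)
```

B and C share rank 2, so D gets rank 4. `ROW_NUMBER()` numbers B, C and D as 2, 3, 4 without gaps.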

## Ordering-based window functions: `FIRST_VALUE()`/`LAST_VALUE()`/`NTH_VALUE()`

In [26]:
SELECT title,
       release_year,
       rating,
       FIRST_VALUE(rating) OVER w AS highest_rating_by_year,
       LAST_VALUE(rating) OVER w AS lowest_rating_by_year,
       NTH_VALUE(rating, 2) OVER w AS second_highest_rating_by_year
  FROM imdb.movies
-- Sort each year by rating and let the frame span the whole partition
WINDOW w AS (PARTITION BY release_year
             ORDER BY rating DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
 ORDER BY release_year DESC, rating DESC
 LIMIT 10;

| title | release_year | rating | highest_rating_by_year | lowest_rating_by_year | second_highest_rating_by_year |
| --- | --- | --- | --- | --- | --- |
| Spider-Man: Across the Spider-Verse | 2023 | 8.9 | 8.9 | 8.8 | 8.8 |
| Oppenheimer | 2023 | 8.8 | 8.9 | 8.8 | 8.8 |
| Top Gun: Maverick | 2022 | 8.3 | 8.3 | 8.3 |  |
| Jai Bhim | 2021 | 8.8 | 8.8 | 8.2 | 8.2 |
| Spider-Man: No Way Home | 2021 | 8.2 | 8.8 | 8.2 | 8.2 |
| Hamilton | 2020 | 8.3 | 8.3 | 8.2 | 8.2 |
| The Father | 2020 | 8.2 | 8.3 | 8.2 | 8.2 |
| Parasite | 2019 | 8.5 | 8.5 | 8.1 | 8.4 |
| Joker | 2019 | 8.4 | 8.5 | 8.1 | 8.4 |
| Avengers: Endgame | 2019 | 8.4 | 8.5 | 8.1 | 8.4 |

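This sketch shows why the explicit frame matters. It assumes:

- Python's `sqlite3` module (SQLite 3.25 or newer) stands in for MySQL.
- The table holds six made-up movies across three years.

It computes `FIRST_VALUE()`, `LAST_VALUE()` and `NTH_VALUE()` over a named window with a full-partition frame. It then repeats `LAST_VALUE()` with the default frame for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("X", 2023, 8.9), ("Y", 2023, 8.8),
    ("Z", 2022, 8.3),
    ("P", 2019, 8.5), ("J", 2019, 8.4), ("E", 2019, 8.1),
])

# Named window with an explicit frame spanning the whole partition
full_frame = conn.execute("""
SELECT title,
       FIRST_VALUE(rating) OVER w AS highest,
       LAST_VALUE(rating) OVER w AS lowest,
       NTH_VALUE(rating, 2) OVER w AS second_highest
  FROM movies
WINDOW w AS (PARTITION BY release_year
             ORDER BY rating DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
 ORDER BY release_year DESC, rating DESC;""").fetchall()

# Same ORDER BY, but the default frame, which stops at the current row
default_frame = conn.execute("""
SELECT title,
       LAST_VALUE(rating) OVER (PARTITION BY release_year ORDER BY rating DESC) AS lowest
  FROM movies
 ORDER BY release_year DESC, rating DESC;""").fetchall()

print(full_frame)
print(default_frame)
```

With the full frame, every 2019 row reports 8.5, 8.1 and 8.4, and the single 2022 movie gets `None` for the second-highest value. With the default frame, `LAST_VALUE()` simply echoes each row's own rating, because the frame ends at the current row.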

## Categories of window functions: `LAG()`

In [27]:
SELECT calendars.recorded_on,
       locations.country_name AS country,
       accumulative_cases.confirmed AS accumulative_confirmed,
       LAG(accumulative_cases.confirmed) OVER (PARTITION BY accumulative_cases.location_id ORDER BY calendars.recorded_on) AS lag_accumulative_confirmed,
       accumulative_cases.deaths,
       LAG(accumulative_cases.deaths) OVER (PARTITION BY accumulative_cases.location_id ORDER BY calendars.recorded_on) AS lag_accumulative_deaths
  FROM covid19.accumulative_cases
  JOIN covid19.calendars
    ON accumulative_cases.calendar_id = calendars.id
  JOIN covid19.locations
    ON accumulative_cases.location_id = locations.id
 WHERE locations.country_name = 'Taiwan'
 ORDER BY calendars.recorded_on
 LIMIT 30;

| recorded_on | country | accumulative_confirmed | lag_accumulative_confirmed | deaths | lag_accumulative_deaths |
| --- | --- | --- | --- | --- | --- |
| 2020-01-22 | Taiwan | 1 |  | 0 |  |
| 2020-01-23 | Taiwan | 1 | 1.0 | 0 | 0.0 |
| 2020-01-24 | Taiwan | 3 | 1.0 | 0 | 0.0 |
| 2020-01-25 | Taiwan | 3 | 3.0 | 0 | 0.0 |
| 2020-01-26 | Taiwan | 4 | 3.0 | 0 | 0.0 |
| 2020-01-27 | Taiwan | 5 | 4.0 | 0 | 0.0 |
| 2020-01-28 | Taiwan | 8 | 5.0 | 0 | 0.0 |
| 2020-01-29 | Taiwan | 8 | 8.0 | 0 | 0.0 |
| 2020-01-30 | Taiwan | 9 | 8.0 | 0 | 0.0 |
| 2020-01-31 | Taiwan | 10 | 9.0 | 0 | 0.0 |


## Using `LAG()` to compute daily new confirmed cases and deaths

In [28]:
SELECT calendars.recorded_on,
       locations.country_name AS country,
       accumulative_cases.confirmed AS accumulative_confirmed,
       accumulative_cases.confirmed - LAG(accumulative_cases.confirmed) OVER (PARTITION BY accumulative_cases.location_id ORDER BY calendars.recorded_on) AS daily_confirmed,
       accumulative_cases.deaths AS accumulative_deaths,
       accumulative_cases.deaths - LAG(accumulative_cases.deaths) OVER (PARTITION BY accumulative_cases.location_id ORDER BY calendars.recorded_on) AS daily_deaths
  FROM covid19.accumulative_cases
  JOIN covid19.calendars
    ON accumulative_cases.calendar_id = calendars.id
  JOIN covid19.locations
    ON accumulative_cases.location_id = locations.id
 WHERE locations.country_name = 'Taiwan'
 ORDER BY calendars.recorded_on
 LIMIT 30;

| recorded_on | country | accumulative_confirmed | daily_confirmed | accumulative_deaths | daily_deaths |
| --- | --- | --- | --- | --- | --- |
| 2020-01-22 | Taiwan | 1 |  | 0 |  |
| 2020-01-23 | Taiwan | 1 | 0.0 | 0 | 0.0 |
| 2020-01-24 | Taiwan | 3 | 2.0 | 0 | 0.0 |
| 2020-01-25 | Taiwan | 3 | 0.0 | 0 | 0.0 |
| 2020-01-26 | Taiwan | 4 | 1.0 | 0 | 0.0 |
| 2020-01-27 | Taiwan | 5 | 1.0 | 0 | 0.0 |
| 2020-01-28 | Taiwan | 8 | 3.0 | 0 | 0.0 |
| 2020-01-29 | Taiwan | 8 | 0.0 | 0 | 0.0 |
| 2020-01-30 | Taiwan | 9 | 1.0 | 0 | 0.0 |
| 2020-01-31 | Taiwan | 10 | 1.0 | 0 | 0.0 |

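Here is the same calculation on a small hypothetical dataset. It assumes:

- Python's `sqlite3` module (SQLite 3.25 or newer) stands in for MySQL.
- Location 1 reuses the first five cumulative counts shown above. Location 2 and both location ids are made up.

It also shows two things the queries above do not:

- `LAG(expr, 1, 0)` supplies 0 instead of `NULL` on the first day.
- `LEAD()` looks one row ahead instead of one row behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accumulative_cases (location_id INTEGER, recorded_on TEXT, confirmed INTEGER)"
)
conn.executemany("INSERT INTO accumulative_cases VALUES (?, ?, ?)", [
    (1, "2020-01-22", 1), (1, "2020-01-23", 1), (1, "2020-01-24", 3),
    (1, "2020-01-25", 3), (1, "2020-01-26", 4),
    (2, "2020-01-22", 5), (2, "2020-01-23", 7),
])

rows = conn.execute("""
SELECT location_id,
       recorded_on,
       confirmed,
       confirmed - LAG(confirmed) OVER w AS daily_confirmed,
       confirmed - LAG(confirmed, 1, 0) OVER w AS daily_confirmed_from_zero,
       LEAD(confirmed) OVER w AS next_day_confirmed
  FROM accumulative_cases
WINDOW w AS (PARTITION BY location_id ORDER BY recorded_on)
 ORDER BY location_id, recorded_on;""").fetchall()

for row in rows:
    print(row)
```

For location 1:

- `daily_confirmed` is `None, 0, 2, 0, 1`.
- The zero-default version is `1, 0, 2, 0, 1`.
- `next_day_confirmed` is `1, 3, 3, 4, None`.

`PARTITION BY location_id` restarts the lag at location 2, whose first day is again `None`.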

## Key takeaways

- Advanced SQL query techniques
    - Common table expressions (Common Table Expression, CTE)
    - Set operations
    - Cross joins and self joins
    - Metadata
    - MySQL window functions

## Key takeaways (continued)

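- A CTE's definition and the query that uses it must be written in the same SQL statement.
- The set-operation keywords MySQL supports are:
    - `UNION`: union with duplicates removed; the row order is not guaranteed without `ORDER BY`.
    - `UNION ALL`: union with duplicates kept.
    - `INTERSECT`: intersection (MySQL 8.0.31 and later).
    - `EXCEPT`: difference (MySQL 8.0.31 and later).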

## Key takeaways (continued)

- A cross join returns every possible combination of rows from the joined tables, i.e. it enumerates them.
- MySQL provides metadata through the `information_schema` database, whose objects are all views.
- Commonly used metadata views
    - `information_schema.schemata`
    - `information_schema.tables`
    - `information_schema.columns`
    - `information_schema.table_constraints`
    - `information_schema.key_column_usage`

## Key takeaways (continued)

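- Window functions are conceptually similar to grouped aggregation (`GROUP BY` plus aggregate functions).
- What they have in common: both aggregate across rows.
- Where they differ: grouped aggregation collapses multiple rows into a single output row per group, whereas window functions keep every row.
- Window functions are used with special keywords:
    - `OVER()`
    - `PARTITION BY`, which is to window functions what `GROUP BY` is to aggregate functions.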

## Key takeaways (continued)

- Categories of window functions
    - Ordinary aggregate functions.
    - Ordering-based window functions.
        - `ROW_NUMBER()`
        - `RANK()`
        - `FIRST_VALUE()`/`LAST_VALUE()`/`NTH_VALUE()`
    - The `LEAD()` and `LAG()` functions.