# Fifty SQL Exercises

> Filtering data

[數據交點](https://www.datainpoint.com/) | 郭耀仁 <yaojenkuo@datainpoint.com>

## SQL keywords to learn in this chapter

- `WHERE`
- `LIKE`
- `AND`
- `BETWEEN`
- `OR`
- `IN`
- `NOT`
- `IS NULL`

In [1]:
%LOAD sqlite3 db=../databases/imdb.db timeout=2 shared_cache=true

In [2]:
ATTACH "../databases/nba.db" AS nba;

In [3]:
ATTACH "../databases/twElection2020.db" AS twElection2020;

In [4]:
ATTACH "../databases/covid19.db" AS covid19;

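## Filtering data with `WHERE`

## In practice, a common need is to select the observations in a table that meet specific conditions, for example:

- Find the data for Taiwan in the covid19 database.
- Find the classic movies released in 1994 in the imdb database.
- Find the players you would pick for a Fantasy Game in the nba database.
- Find the data for Taipei City in the twElection2020 database.

## Adding `WHERE` lets conditions determine which observations are kept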

```sql
SELECT column_names
  FROM table_name
 WHERE conditions;
```

![filter](filter.png)

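## Before writing conditions, we need to understand two concepts:

1. Comparison operators: operators that produce a Boolean.
2. Boolean: one of the two values (true, false) that represent the result of a comparison.

## Basic comparison operators

|Comparison operator|Description|
|--------|-------|
|`=`|Equal to|
|`!=`|Not equal to|
|`>`|Greater than|
|`<`|Less than|
|`>=`|Greater than or equal to|
|`<=`|Less than or equal to|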

## SQLite represents a true comparison result as `1`

In [5]:
SELECT 5566 = 5566 AS bool_true;

bool_true
1


## SQLite represents a false comparison result as `0`

In [6]:
SELECT 5566 != 5566 AS bool_false;

bool_false
0
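
As a quick check of the operator table, the sketch below evaluates all six basic comparison operators on literal values in a single query. The column aliases are illustrative names chosen for this example and do not come from the course databases.

```sql
-- Each expression evaluates to 1 (true) or 0 (false) in SQLite.
SELECT 3 = 3  AS equal,
       3 != 3 AS not_equal,
       3 > 2  AS greater_than,
       3 < 2  AS less_than,
       3 >= 3 AS greater_or_equal,
       3 <= 2 AS less_or_equal;
-- Returns one row: 1, 0, 1, 0, 1, 0
```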


## Applying a comparison operator to a table column produces one Boolean for every row

In [7]:
SELECT release_year = 1994 AS bool_values
  FROM movies
 LIMIT 10;

bool_values
1
0
0
0
0
0
0
1
0
0


## A comparison written after `WHERE` keeps only the observations whose Boolean is true (`1`)

In [8]:
SELECT release_year = 1994 AS bool_values
  FROM movies
 WHERE release_year = 1994;

bool_values
1
1
1
1
1


In [9]:
SELECT *
  FROM movies
 WHERE release_year = 1994;  -- movies released in 1994

id,title,release_year,rating,director,runtime
1,The Shawshank Redemption,1994,9.3,Frank Darabont,142
8,Pulp Fiction,1994,8.9,Quentin Tarantino,154
12,Forrest Gump,1994,8.8,Robert Zemeckis,142
31,Léon: The Professional,1994,8.5,Luc Besson,110
34,The Lion King,1994,8.5,Roger Allers,88
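
The same pattern works with any of the comparison operators. Here is a minimal self-contained sketch, assuming a hypothetical inline table `sample_movies` built from three titles that appear in the imdb examples, so that it runs without the course database:

```sql
-- sample_movies is a hypothetical inline table, not part of imdb.db.
WITH sample_movies(title, release_year) AS (
    VALUES ('Forrest Gump', 1994),
           ('Inception', 2010),
           ('12 Angry Men', 1957)
)
SELECT title,
       release_year
  FROM sample_movies
 WHERE release_year < 2000;  -- keeps Forrest Gump and 12 Angry Men
```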


## Comparison operators also work on text columns

In [10]:
SELECT *
  FROM movies
 WHERE director = 'Christopher Nolan';

id,title,release_year,rating,director,runtime
4,The Dark Knight,2008,9.0,Christopher Nolan,152
13,Inception,2010,8.8,Christopher Nolan,148
29,Interstellar,2014,8.6,Christopher Nolan,169
47,The Prestige,2006,8.5,Christopher Nolan,130
54,Memento,2000,8.4,Christopher Nolan,113
71,The Dark Knight Rises,2012,8.4,Christopher Nolan,164
127,Batman Begins,2005,8.2,Christopher Nolan,140
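
One detail worth checking when comparing text: with SQLite's default `BINARY` collation, `=` is case-sensitive. The sketch below compares string literals directly, so it does not depend on any table.

```sql
SELECT 'Christopher Nolan' = 'Christopher Nolan' AS same_case,       -- 1
       'Christopher Nolan' = 'christopher nolan' AS different_case;  -- 0
```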


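## Pattern matching

## Besides the basic comparison operators, conditions on text columns can use `LIKE`, a comparison operator that performs pattern matching

## `LIKE` is used together with wildcards

|Wildcard|Description|
|-------|------|
|`%`|Matches any sequence of characters, including the empty string|
|`_`|Matches exactly one character|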

In [11]:
SELECT *
  FROM players
 WHERE firstName LIKE 'L%';  -- players whose first name starts with L

firstName,lastName,temporaryDisplayName,personId,teamId,jersey,pos,heightFeet,heightInches,heightMeters,weightPounds,weightKilograms,dateOfBirthUTC,nbaDebutYear,yearsPro,collegeName,lastAffiliation,country
LeBron,James,"James, LeBron",2544,1610612747,23,F,6,9,2.06,250,113.4,1984-12-30,2003,17,St. Vincent-St. Mary HS (OH),St. Vincent-St. Mary HS (OH)/USA,USA
Lou,Williams,"Williams, Lou",101150,1610612737,6,G,6,1,1.85,175,79.4,1986-10-27,2005,15,South Gwinnett HS (GA),South Gwinnett HS (GA)/USA,USA
LaMarcus,Aldridge,"Aldridge, LaMarcus",200746,1610612751,12,C-F,6,11,2.11,250,113.4,1985-07-19,2006,14,Texas,Texas/USA,USA
Langston,Galloway,"Galloway, Langston",204038,1610612756,2,G,6,1,1.85,200,90.7,1991-12-09,2014,6,St. Joseph's (PA),St. Joseph's (PA)/USA,USA
Larry,Nance Jr.,"Nance Jr., Larry",1626204,1610612739,22,F-C,6,7,2.01,245,111.1,1993-01-01,2015,5,Wyoming,Wyoming/USA,USA
Lonzo,Ball,"Ball, Lonzo",1628366,1610612740,2,G,6,6,1.98,190,86.2,1997-10-27,2017,3,UCLA,UCLA/USA,USA
Lauri,Markkanen,"Markkanen, Lauri",1628374,1610612741,24,F-C,7,0,2.13,240,108.9,1997-05-22,2017,3,Arizona,Arizona/Finland,Finland
Luke,Kennard,"Kennard, Luke",1628379,1610612746,5,G,6,5,1.96,206,93.4,1996-06-24,2017,3,Duke,Duke/USA,USA
Luke,Kornet,"Kornet, Luke",1628436,1610612738,40,F-C,7,2,2.18,250,113.4,1995-07-15,2017,3,Vanderbilt,Vanderbilt/USA,USA
Landry,Shamet,"Shamet, Landry",1629013,1610612751,20,G,6,4,1.93,190,86.2,1997-03-13,2018,2,Wichita State,Wichita State/USA,USA


In [12]:
SELECT *
  FROM players
 WHERE firstName LIKE 'L_____'; -- players whose first name is L followed by exactly five characters

firstName,lastName,temporaryDisplayName,personId,teamId,jersey,pos,heightFeet,heightInches,heightMeters,weightPounds,weightKilograms,dateOfBirthUTC,nbaDebutYear,yearsPro,collegeName,lastAffiliation,country
LeBron,James,"James, LeBron",2544,1610612747,23,F,6,9,2.06,250,113.4,1984-12-30,2003,17,St. Vincent-St. Mary HS (OH),St. Vincent-St. Mary HS (OH)/USA,USA
Landry,Shamet,"Shamet, Landry",1629013,1610612751,20,G,6,4,1.93,190,86.2,1997-03-13,2018,2,Wichita State,Wichita State/USA,USA
Lonnie,Walker IV,"Walker IV, Lonnie",1629022,1610612759,1,G-F,6,4,1.93,204,92.5,1998-12-14,2018,2,Miami,Miami/USA,USA
LaMelo,Ball,"Ball, LaMelo",1630163,1610612766,2,G,6,6,1.98,180,81.6,2001-08-22,2020,0,Illawarra,Illawarra/USA,USA
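
The behavior of the two wildcards can be verified on literal strings. This sketch assumes SQLite's default settings, under which `LIKE` ignores case for ASCII letters; the aliases are illustrative only.

```sql
SELECT 'L'      LIKE 'L%'     AS percent_matches_empty,   -- 1: % also matches zero characters
       'Lou'    LIKE 'L_____' AS too_short,               -- 0: only two characters follow L
       'LeBron' LIKE 'L_____' AS exactly_six_characters,  -- 1: five characters follow L
       'lebron' LIKE 'L%'     AS ascii_case_insensitive;  -- 1 under SQLite defaults
```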


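## Logical operators

## When `WHERE` is followed by more than one condition, the conditions must be combined with logical operators

## The basic logical operators are:

- `AND`: the intersection of conditions.
- `BETWEEN`: the intersection of two numeric comparisons, with both endpoints included.
- `OR`: the union of conditions.
- `IN`: the union of equality conditions on the same column.
- `NOT`: flips true and false.

## `AND` is true only when both conditions are true; otherwise it is false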

In [13]:
SELECT rating >= 8.8 AS condition_1,
       rating <= 9.0 AS condition_2,
       rating >= 8.8 AND rating <= 9.0 AS condition_1_and_condition_2
  FROM movies
 LIMIT 15;

condition_1,condition_2,condition_1_and_condition_2
1,0,0
1,0,0
1,1,1
1,1,1
1,1,1
1,1,1
1,1,1
1,1,1
1,1,1
1,1,1


In [14]:
SELECT *
  FROM movies
 WHERE rating >= 8.8 AND
       rating <= 9.0;  -- movies rated between 8.8 and 9.0

id,title,release_year,rating,director,runtime
3,The Godfather: Part II,1974,9.0,Francis Ford Coppola,202
4,The Dark Knight,2008,9.0,Christopher Nolan,152
5,12 Angry Men,1957,9.0,Sidney Lumet,96
6,Schindler's List,1993,8.9,Steven Spielberg,195
7,The Lord of the Rings: The Return of the King,2003,8.9,Peter Jackson,201
8,Pulp Fiction,1994,8.9,Quentin Tarantino,154
9,"The Good, the Bad and the Ugly",1966,8.8,Sergio Leone,178
10,The Lord of the Rings: The Fellowship of the Ring,2001,8.8,Peter Jackson,178
11,Fight Club,1999,8.8,David Fincher,139
12,Forrest Gump,1994,8.8,Robert Zemeckis,142


## When `AND` combines two numeric comparisons on the same column, `BETWEEN` is the recommended form

In [15]:
SELECT *
  FROM movies
 WHERE rating BETWEEN 8.8 AND 9.0;  -- movies rated between 8.8 and 9.0

id,title,release_year,rating,director,runtime
3,The Godfather: Part II,1974,9.0,Francis Ford Coppola,202
4,The Dark Knight,2008,9.0,Christopher Nolan,152
5,12 Angry Men,1957,9.0,Sidney Lumet,96
6,Schindler's List,1993,8.9,Steven Spielberg,195
7,The Lord of the Rings: The Return of the King,2003,8.9,Peter Jackson,201
8,Pulp Fiction,1994,8.9,Quentin Tarantino,154
9,"The Good, the Bad and the Ugly",1966,8.8,Sergio Leone,178
10,The Lord of the Rings: The Fellowship of the Ring,2001,8.8,Peter Jackson,178
11,Fight Club,1999,8.8,David Fincher,139
12,Forrest Gump,1994,8.8,Robert Zemeckis,142
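
Because `BETWEEN` returns the same rows as `>= ... AND <= ...`, both endpoints are included. A small sketch on literal values makes this explicit; the aliases are illustrative only.

```sql
SELECT 8.8 BETWEEN 8.8 AND 9.0     AS lower_bound_included,  -- 1
       9.0 BETWEEN 8.8 AND 9.0     AS upper_bound_included,  -- 1
       9.1 BETWEEN 8.8 AND 9.0     AS outside_range,         -- 0
       9.1 NOT BETWEEN 8.8 AND 9.0 AS negated;               -- 1
```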


## `OR` is false only when both conditions are false; otherwise it is true

In [16]:
SELECT divName = 'Atlantic' AS condition_1,
       divName = 'Pacific' AS condition_2,
       divName = 'Atlantic' OR divName = 'Pacific' AS condition_1_or_condition_2
  FROM teams
 LIMIT 10;

condition_1,condition_2,condition_1_or_condition_2
0,0,0
1,0,1
0,0,0
0,0,0
0,0,0
0,0,0
0,0,0
0,1,1
0,0,0
0,1,1


In [17]:
SELECT *
  FROM teams
 WHERE divName = 'Atlantic' OR
       divName = 'Pacific';  -- teams in the Atlantic or Pacific division

isNBAFranchise,isAllStar,city,altCityName,fullName,tricode,teamId,nickname,urlName,teamShortName,confName,divName
1,0,Boston,Boston,Boston Celtics,BOS,1610612738,Celtics,celtics,Boston,East,Atlantic
1,0,Golden State,Golden State,Golden State Warriors,GSW,1610612744,Warriors,warriors,Golden State,West,Pacific
1,0,LA,LA Clippers,LA Clippers,LAC,1610612746,Clippers,clippers,LA Clippers,West,Pacific
1,0,Los Angeles,Los Angeles Lakers,Los Angeles Lakers,LAL,1610612747,Lakers,lakers,L.A. Lakers,West,Pacific
1,0,Brooklyn,Brooklyn,Brooklyn Nets,BKN,1610612751,Nets,nets,Brooklyn,East,Atlantic
1,0,New York,New York,New York Knicks,NYK,1610612752,Knicks,knicks,New York,East,Atlantic
1,0,Philadelphia,Philadelphia,Philadelphia 76ers,PHI,1610612755,76ers,sixers,Philadelphia,East,Atlantic
1,0,Phoenix,Phoenix,Phoenix Suns,PHX,1610612756,Suns,suns,Phoenix,West,Pacific
1,0,Sacramento,Sacramento,Sacramento Kings,SAC,1610612758,Kings,kings,Sacramento,West,Pacific
1,0,Toronto,Toronto,Toronto Raptors,TOR,1610612761,Raptors,raptors,Toronto,East,Atlantic


## When `OR` combines equality comparisons on the same column, `IN` is the recommended form

In [18]:
SELECT *
  FROM teams
 WHERE divName IN ('Atlantic', 'Pacific');  -- teams in the Atlantic or Pacific division

isNBAFranchise,isAllStar,city,altCityName,fullName,tricode,teamId,nickname,urlName,teamShortName,confName,divName
1,0,Boston,Boston,Boston Celtics,BOS,1610612738,Celtics,celtics,Boston,East,Atlantic
1,0,Golden State,Golden State,Golden State Warriors,GSW,1610612744,Warriors,warriors,Golden State,West,Pacific
1,0,LA,LA Clippers,LA Clippers,LAC,1610612746,Clippers,clippers,LA Clippers,West,Pacific
1,0,Los Angeles,Los Angeles Lakers,Los Angeles Lakers,LAL,1610612747,Lakers,lakers,L.A. Lakers,West,Pacific
1,0,Brooklyn,Brooklyn,Brooklyn Nets,BKN,1610612751,Nets,nets,Brooklyn,East,Atlantic
1,0,New York,New York,New York Knicks,NYK,1610612752,Knicks,knicks,New York,East,Atlantic
1,0,Philadelphia,Philadelphia,Philadelphia 76ers,PHI,1610612755,76ers,sixers,Philadelphia,East,Atlantic
1,0,Phoenix,Phoenix,Phoenix Suns,PHX,1610612756,Suns,suns,Phoenix,West,Pacific
1,0,Sacramento,Sacramento,Sacramento Kings,SAC,1610612758,Kings,kings,Sacramento,West,Pacific
1,0,Toronto,Toronto,Toronto Raptors,TOR,1610612761,Raptors,raptors,Toronto,East,Atlantic
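
When `AND` and `OR` appear in the same condition, `AND` binds more tightly than `OR`, so parentheses may be needed to get the intended grouping. The sketch below assumes a hypothetical inline table `sample_teams` holding four teams taken from the nba examples.

```sql
-- sample_teams is a hypothetical inline table, not part of nba.db.
WITH sample_teams(fullName, confName, divName) AS (
    VALUES ('Boston Celtics', 'East', 'Atlantic'),
           ('Golden State Warriors', 'West', 'Pacific'),
           ('Atlanta Hawks', 'East', 'Southeast'),
           ('Dallas Mavericks', 'West', 'Southwest')
)
SELECT fullName,
       -- Parsed as: Atlantic OR (Pacific AND West)
       divName = 'Atlantic' OR divName = 'Pacific' AND confName = 'West' AS without_parentheses,
       -- Parentheses force the OR to be evaluated first
       (divName = 'Atlantic' OR divName = 'Pacific') AND confName = 'West' AS with_parentheses
  FROM sample_teams;
```

Only the Boston Celtics row differs between the two columns: it is `1` without parentheses and `0` with them, because the team is in the Atlantic division but not in the West.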


## `NOT` reverses the result of a condition, swapping true and false

In [19]:
SELECT divName = 'Atlantic' AS condition_1,
       divName = 'Pacific' AS condition_2,
       NOT (divName = 'Atlantic' OR divName = 'Pacific') AS not_condition_1_nor_condition_2
  FROM teams;

condition_1,condition_2,not_condition_1_nor_condition_2
0,0,1
1,0,0
0,0,1
0,0,1
0,0,1
0,0,1
0,0,1
0,1,0
0,0,1
0,1,0


In [20]:
SELECT *
  FROM teams
 WHERE divName NOT IN ('Atlantic', 'Pacific');

isNBAFranchise,isAllStar,city,altCityName,fullName,tricode,teamId,nickname,urlName,teamShortName,confName,divName
1,0,Atlanta,Atlanta,Atlanta Hawks,ATL,1610612737,Hawks,hawks,Atlanta,East,Southeast
1,0,Cleveland,Cleveland,Cleveland Cavaliers,CLE,1610612739,Cavaliers,cavaliers,Cleveland,East,Central
1,0,New Orleans,New Orleans,New Orleans Pelicans,NOP,1610612740,Pelicans,pelicans,New Orleans,West,Southwest
1,0,Chicago,Chicago,Chicago Bulls,CHI,1610612741,Bulls,bulls,Chicago,East,Central
1,0,Dallas,Dallas,Dallas Mavericks,DAL,1610612742,Mavericks,mavericks,Dallas,West,Southwest
1,0,Denver,Denver,Denver Nuggets,DEN,1610612743,Nuggets,nuggets,Denver,West,Northwest
1,0,Houston,Houston,Houston Rockets,HOU,1610612745,Rockets,rockets,Houston,West,Southwest
1,0,Miami,Miami,Miami Heat,MIA,1610612748,Heat,heat,Miami,East,Southeast
1,0,Milwaukee,Milwaukee,Milwaukee Bucks,MIL,1610612749,Bucks,bucks,Milwaukee,East,Central
1,0,Minnesota,Minnesota,Minnesota Timberwolves,MIN,1610612750,Timberwolves,timberwolves,Minnesota,West,Northwest


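## Comparison operators for missing values

## The basic comparison operators do not work with `NULL` (a missing value, also called a null value): comparing with `= NULL` yields `NULL` rather than true, so the query below returns no rows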

In [21]:
SELECT *
  FROM lookup_table
 WHERE Province_State = NULL AND
       Admin2 = NULL;

## To test for a missing value, use `IS NULL` as the comparison operator

In [22]:
SELECT *
  FROM lookup_table
 WHERE Province_State IS NULL AND
       Admin2 IS NULL
 LIMIT 10;

UID,Combined_Key,iso2,iso3,Country_Region,Province_State,Admin2,Lat,Long_,Population
4,Afghanistan,AF,AFG,Afghanistan,,,33.93911,67.709953,38928341
8,Albania,AL,ALB,Albania,,,41.1533,20.1683,2877800
12,Algeria,DZ,DZA,Algeria,,,28.0339,1.6596,43851043
20,Andorra,AD,AND,Andorra,,,42.5063,1.5218,77265
24,Angola,AO,AGO,Angola,,,-11.2027,17.8739,32866268
28,Antigua and Barbuda,AG,ATG,Antigua and Barbuda,,,17.0608,-61.7964,97928
32,Argentina,AR,ARG,Argentina,,,-38.4161,-63.6167,45195777
51,Armenia,AM,ARM,Armenia,,,40.0691,45.0382,2963234
40,Austria,AT,AUT,Austria,,,47.5162,14.5501,9006400
31,Azerbaijan,AZ,AZE,Azerbaijan,,,40.1431,47.5769,10139175


In [23]:
SELECT *
  FROM lookup_table
 WHERE Province_State IS NOT NULL AND
       Admin2 IS NOT NULL
 LIMIT 10;

UID,Combined_Key,iso2,iso3,Country_Region,Province_State,Admin2,Lat,Long_,Population
535,"Bonaire, Sint Eustatius and Saba, Netherlands",BQ,BES,Netherlands,Sint Eustatius and Saba,Bonaire,12.1784,-68.2385,26221
654,"Saint Helena, Ascension and Tristan da Cunha, United Kingdom",SH,SHN,United Kingdom,Ascension and Tristan da Cunha,Saint Helena,-7.9467,-14.3559,5661
63072001,"Adjuntas, Puerto Rico, US",PR,PRI,US,Puerto Rico,Adjuntas,18.180117,-66.754367,17363
63072003,"Aguada, Puerto Rico, US",PR,PRI,US,Puerto Rico,Aguada,18.360255,-67.175131,36694
63072005,"Aguadilla, Puerto Rico, US",PR,PRI,US,Puerto Rico,Aguadilla,18.459681,-67.120815,50265
63072007,"Aguas Buenas, Puerto Rico, US",PR,PRI,US,Puerto Rico,Aguas Buenas,18.251619,-66.126806,24814
63072009,"Aibonito, Puerto Rico, US",PR,PRI,US,Puerto Rico,Aibonito,18.131361,-66.264131,22108
63072011,"Anasco, Puerto Rico, US",PR,PRI,US,Puerto Rico,Anasco,18.287985,-67.120611,26161
63072013,"Arecibo, Puerto Rico, US",PR,PRI,US,Puerto Rico,Arecibo,18.406631,-66.675077,81966
63072015,"Arroyo, Puerto Rico, US",PR,PRI,US,Puerto Rico,Arroyo,17.998457,-66.056546,17238
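
The difference between `=` and `IS NULL` shows up even on literal values. In this sketch the first two expressions evaluate to `NULL`, which `WHERE` does not treat as true, while the last two evaluate to `1`. The aliases are illustrative only.

```sql
SELECT NULL = NULL   AS equals_null,      -- NULL, not 1
       1 = NULL      AS one_equals_null,  -- NULL
       NULL IS NULL  AS is_null,          -- 1
       1 IS NOT NULL AS is_not_null;      -- 1
```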


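## Key takeaways

- Adding `WHERE` lets conditions determine which observations are kept.
- Apply comparison operators to columns to build conditions.
- For conditions on text columns, the pattern-matching operator `LIKE` can be used together with wildcards.
- When `WHERE` is followed by more than one condition, combine the conditions with logical operators.
- To test for a missing value, use `IS NULL` as the comparison operator.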

```sql
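/*
Which SQL clauses have we learned so far?
The clauses must be written in the order that standard SQL prescribes.
*/
SELECT column_names     -- which columns to select
  FROM table_name       -- from which table in which database
 WHERE conditions       -- which observations to keep
 ORDER BY column_names  -- which columns to sort by
 LIMIT m;               -- show only the first m rows of the result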
```
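
To see the template filled in, here is a minimal runnable sketch. It assumes a hypothetical inline table `sample_movies` containing four rows copied from the imdb examples in this chapter, so it does not need the course database.

```sql
-- sample_movies is a hypothetical inline table, not part of imdb.db.
WITH sample_movies(title, release_year, rating) AS (
    VALUES ('The Shawshank Redemption', 1994, 9.3),
           ('Pulp Fiction', 1994, 8.9),
           ('Forrest Gump', 1994, 8.8),
           ('The Dark Knight', 2008, 9.0)
)
SELECT title,
       rating
  FROM sample_movies
 WHERE release_year = 1994  -- filter first
 ORDER BY rating DESC       -- then sort from highest rating
 LIMIT 2;                   -- The Shawshank Redemption, then Pulp Fiction
```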