# Fifty SQL Exercises

> Grouping and Filtering Aggregated Results

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

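## Exercise Guidelines

- At the start of every exercise notebook, the four learning databases are loaded into the environment.
- Your SQL can therefore reference tables in any of the four learning databases without specifying which database they belong to.
- Write your SQL between the two single-line comments that mark the start and end of the SQL query, so that it produces the expected result.
- You can first write SQL that reproduces the expected result in SQLiteStudio or DBeaver on your own computer, then copy and paste it into the exercise.
- To run the tests, click Kernel -> Restart & Run All -> Restart and Run All Cells in the menu at the top.
- You can run the tests after each exercise or after finishing all of them.
- An exercise left idle for more than 10 minutes disconnects automatically; click the exercise link again to restart it.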
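The second guideline works because of how SQLite resolves table names: if a name is unique across the main database and all attached databases, the `schema.table` prefix can be left out. The sketch below illustrates this under stated assumptions: in-memory databases stand in for the real exercise files, and the table contents are made up.

```python
import sqlite3

# Hypothetical stand-ins: in-memory databases instead of the real
# nba.db / imdb.db files used by the exercises.
conn = sqlite3.connect(":memory:")                  # plays the role of the main database
conn.execute("ATTACH DATABASE ':memory:' AS imdb")  # attached under the schema name imdb

# Create a table only in the attached schema, with made-up rows.
conn.execute("CREATE TABLE imdb.movies (title TEXT, release_year INTEGER)")
conn.execute("INSERT INTO imdb.movies VALUES ('Movie A', 1994), ('Movie B', 1994)")

# The qualified and unqualified names reach the same table, because
# "movies" exists in exactly one of the connected databases.
qualified = conn.execute("SELECT COUNT(*) FROM imdb.movies").fetchone()[0]
unqualified = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(qualified, unqualified)  # 2 2
```

If the same table name existed in more than one database, SQLite would pick one according to its search order, so spelling out `schema.table` is the safer choice in that situation.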

```python
import sqlite3
import unittest
import json
import os
import numpy as np
import pandas as pd

# nba.db is the main database; the other three are attached under their own schema names.
conn = sqlite3.connect('../databases/nba.db')
conn.execute("""ATTACH '../databases/covid19.db' AS covid19""")
conn.execute("""ATTACH '../databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH '../databases/imdb.db' AS imdb""")
```

```
<sqlite3.Cursor at 0x7f93bd36ea40>
```

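## 25. From the `movies` table in the `imdb` database, count how many highly rated classic movies on IMDb.com were released in each year, matching the expected query result below.

- Expected input: an SQL query.
- Expected output: a query result of shape (85, 2).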

```
    release_year  number_of_movies
0           1921                 1
1           1924                 1
2           1925                 1
3           1926                 1
4           1927                 1
..           ...               ...
80          2017                 3
81          2018                 6
82          2019                 8
83          2020                 2
84          2021                 1

[85 rows x 2 columns]
```

```python
count_number_of_movies_by_year_from_movies =\
"""
-- SQL query starts here
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY release_year
 ORDER BY release_year;
-- SQL query ends here
"""
```

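## 26. From the `movies` table in the `imdb` database, count how many highly rated classic movies on IMDb.com were released in each year, showing only years with 5 or more movies (`>= 5`), matching the expected query result below.

- Expected input: an SQL query.
- Expected output: a query result of shape (19, 2).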

```
    release_year  number_of_movies
0           1957                 6
1           1988                 5
2           1994                 5
3           1995                 8
4           1997                 6
5           1998                 5
6           1999                 5
7           2000                 6
8           2001                 5
9           2003                 5
10          2004                 7
11          2006                 5
12          2009                 6
13          2010                 5
14          2013                 6
15          2014                 5
16          2015                 5
17          2018                 6
18          2019                 8
```

```python
count_number_of_movies_by_year_having_from_movies =\
"""
-- SQL query starts here
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY release_year
HAVING number_of_movies >= 5
 ORDER BY release_year;
-- SQL query ends here
"""
```

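## 27. From the `presidential` table in the `twElection2020` database, summarize the results of Taiwan's 2020 presidential and vice-presidential election by totaling the votes for each candidate, matching the expected query result below.

- Expected input: an SQL query.
- Expected output: a query result of shape (3, 2).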

```
   candidate_id  total_votes
0             1       608590
1             2      5522119
2             3      8170231
```

```python
find_summary_by_candidate_id_from_presidential =\
"""
-- SQL query starts here
SELECT candidate_id,
       SUM(votes) AS total_votes
  FROM presidential
 GROUP BY candidate_id
 ORDER BY candidate_id;
-- SQL query ends here
"""
```

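## 28. From the `players` table in the `nba` database, use `country` to find which countries NBA players came from as of 2021-03-31 and how many players each country contributed, matching the expected query result below.

- Expected input: an SQL query.
- Expected output: a query result of shape (43, 2).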

```
                   country  number_of_players
0                      USA                370
1                   Canada                 17
2                   France                 11
3                Australia                  9
4                  Germany                  6
5                   Serbia                  6
6                  Croatia                  4
7                    Spain                  4
8                   Turkey                  4
9                   Greece                  3
10                   Italy                  3
11                  Latvia                  3
12               Lithuania                  3
13                 Nigeria                  3
14                 Senegal                  3
15                Slovenia                  3
16                 Bahamas                  2
17                  Brazil                  2
18                Cameroon                  2
19                   Japan                  2
20                 Ukraine                  2
21                  Angola                  1
22               Argentina                  1
23                 Austria                  1
24  Bosnia and Herzegovina                  1
25          Czech Republic                  1
26                     DRC                  1
27      Dominican Republic                  1
28                   Egypt                  1
29                 Finland                  1
30                   Gabon                  1
31                 Georgia                  1
32                  Guinea                  1
33                  Israel                  1
34                 Jamaica                  1
35              Montenegro                  1
36             New Zealand                  1
37   Republic of the Congo                  1
38             Saint Lucia                  1
39             South Sudan                  1
40                   Sudan                  1
41             Switzerland                  1
42          United Kingdom                  1
```

```python
count_number_of_players_by_country_from_players =\
"""
-- SQL query starts here
SELECT country,
       COUNT(*) AS number_of_players
  FROM players
 GROUP BY country
 ORDER BY number_of_players DESC,
          country;
-- SQL query ends here
"""
```

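## 29. From the `players` table in the `nba` database, use `country` to find which countries NBA players came from as of 2021-03-31, showing only countries with 2 to 9 players inclusive (`>= 2` and `<= 9`), matching the expected query result below.

- Expected input: an SQL query.
- Expected output: a query result of shape (18, 2).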

```
      country  number_of_players
0   Australia                  9
1     Germany                  6
2      Serbia                  6
3     Croatia                  4
4       Spain                  4
5      Turkey                  4
6      Greece                  3
7       Italy                  3
8      Latvia                  3
9   Lithuania                  3
10    Nigeria                  3
11    Senegal                  3
12   Slovenia                  3
13    Bahamas                  2
14     Brazil                  2
15   Cameroon                  2
16      Japan                  2
17    Ukraine                  2
```

```python
count_number_of_players_by_country_having_from_players =\
"""
-- SQL query starts here
SELECT country,
       COUNT(*) AS number_of_players
  FROM players
 GROUP BY country
HAVING number_of_players BETWEEN 2 AND 9
 ORDER BY number_of_players DESC,
          country;
-- SQL query ends here
"""
```

## Run the tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

```python
class TestGroupByHaving(unittest.TestCase):
    def test_25_count_number_of_movies_by_year_from_movies(self):
        number_of_movies_by_year_from_movies = pd.read_sql(count_number_of_movies_by_year_from_movies, conn)
        self.assertEqual(number_of_movies_by_year_from_movies.shape, (85, 2))
        column_values = number_of_movies_by_year_from_movies.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 250)

    def test_26_count_number_of_movies_by_year_having_from_movies(self):
        number_of_movies_by_year_having_from_movies = pd.read_sql(count_number_of_movies_by_year_having_from_movies, conn)
        self.assertEqual(number_of_movies_by_year_having_from_movies.shape, (19, 2))
        column_values = number_of_movies_by_year_having_from_movies.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 109)

    def test_27_find_summary_by_candidate_id_from_presidential(self):
        summary_by_candidate_id_from_presidential = pd.read_sql(find_summary_by_candidate_id_from_presidential, conn)
        self.assertEqual(summary_by_candidate_id_from_presidential.shape, (3, 2))
        # Compare as a set so the check does not depend on row order.
        column_values = set(summary_by_candidate_id_from_presidential.iloc[:, 1].values)
        self.assertIn(608590, column_values)
        self.assertIn(5522119, column_values)
        self.assertIn(8170231, column_values)

    def test_28_count_number_of_players_by_country_from_players(self):
        number_of_players_by_country_from_players = pd.read_sql(count_number_of_players_by_country_from_players, conn)
        self.assertEqual(number_of_players_by_country_from_players.shape, (43, 2))
        column_values = number_of_players_by_country_from_players.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 484)

    def test_29_count_number_of_players_by_country_having_from_players(self):
        number_of_players_by_country_having_from_players = pd.read_sql(count_number_of_players_by_country_having_from_players, conn)
        self.assertEqual(number_of_players_by_country_having_from_players.shape, (18, 2))
        column_values = number_of_players_by_country_having_from_players.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 64)

# Run the tests and count how many passed.
suite = unittest.TestLoader().loadTestsFromTestCase(TestGroupByHaving)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

# Look up the chapter name, using the current folder name as the key.
cwd = os.getcwd()
folder_name = os.path.basename(cwd)
with open("../exercise_index.json", "r", encoding="utf-8") as content:
    exercise_index = json.load(content)
chapter_name = exercise_index[folder_name]
```

```
test_25_count_number_of_movies_by_year_from_movies (__main__.TestGroupByHaving) ... ok
test_26_count_number_of_movies_by_year_having_from_movies (__main__.TestGroupByHaving) ... ok
test_27_find_summary_by_candidate_id_from_presidential (__main__.TestGroupByHaving) ... ok
test_28_count_number_of_players_by_country_from_players (__main__.TestGroupByHaving) ... ok
test_29_count_number_of_players_by_country_having_from_players (__main__.TestGroupByHaving) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.186s

OK
```


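```python
print('In the chapter "{}", you answered {} of {} SQL exercises correctly.'.format(chapter_name, number_of_successes, number_of_test_runs))
```

```
In the chapter "Grouping and Filtering Aggregated Results", you answered 5 of 5 SQL exercises correctly.
```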
