# Fifty SQL Exercises

> Comprehensive exercises

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

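## Exercise guide

- At the start of each exercise notebook, the four learning databases are loaded into the environment.
- SQL can therefore reference tables in any of the four learning databases without specifying the database name.
- Write SQL that produces the expected result between the two single-line comments that mark the start and the end of the SQL query.
- You can first write SQL that matches the expected result in SQLiteStudio or DBeaver on your own computer, then copy and paste it into the exercise.
- To run the tests, choose Kernel -> Restart & Run All -> Restart and Run All Cells from the top menu.
- You can run the tests after each exercise, or after finishing all of them.
- The exercise notebook disconnects automatically after 10 minutes of inactivity; to restart it, click the exercise link again.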

In [1]:
import sqlite3
import unittest
import json
import os
import numpy as np
import pandas as pd
conn = sqlite3.connect('../databases/nba.db')
conn.execute("""ATTACH '../databases/covid19.db' AS covid19""")
conn.execute("""ATTACH '../databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH '../databases/imdb.db' AS imdb""")

<sqlite3.Cursor at 0x7fcdf04aca40>
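
The setup cell relies on SQLite resolving an unqualified table name across attached databases. The following is a minimal sketch of that behaviour, assuming a throwaway database file and a hypothetical table `demo_table`; neither is part of the learning databases.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small database file to attach (hypothetical, for illustration only).
    extra_path = os.path.join(tmp, "extra.db")
    extra = sqlite3.connect(extra_path)
    extra.execute("CREATE TABLE demo_table (id INTEGER, label TEXT)")
    extra.execute("INSERT INTO demo_table VALUES (1, 'attached')")
    extra.commit()
    extra.close()

    # Attach it to a separate connection under the schema name "extra".
    demo_conn = sqlite3.connect(":memory:")
    demo_conn.execute("ATTACH ? AS extra", (extra_path,))

    # The table can be reached with or without the schema prefix.
    unqualified = demo_conn.execute("SELECT label FROM demo_table").fetchall()
    qualified = demo_conn.execute("SELECT label FROM extra.demo_table").fetchall()
    demo_conn.close()

print(unqualified, qualified)
```

The unqualified form is only unambiguous while table names are unique across the attached databases; when two databases define the same table name, writing `schema.table` selects a specific one.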

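## 46. Query the `covid19` database for information on the two cruise ships (Grand Princess and Diamond Princess); refer to the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (4, 4).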

```
  iso2 Country_Region    Province_State  Confirmed
0   CA         Canada  Diamond Princess          0
1   CA         Canada    Grand Princess         13
2   US             US  Diamond Princess         49
3   US             US    Grand Princess        103
```

In [2]:
find_cruise_ships_from_covid19 =\
"""
-- SQL query starts here
SELECT lookup_table.iso2,
       lookup_table.Country_Region,
       lookup_table.Province_State,
       Confirmed
  FROM daily_report
  JOIN lookup_table
    ON daily_report.Combined_Key = lookup_table.Combined_key
 WHERE Province_State IN ('Grand Princess', 'Diamond Princess');
-- SQL query ends here
"""

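## 47. Query the `covid19` database for the confirmed cases and deaths of every country as of 2021-03-31; refer to the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (192, 3).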

```
         Country_Region  Confirmed  Deaths
0           Afghanistan      56454    2484
1               Albania     125157    2235
2               Algeria     117192    3093
3               Andorra      12010     115
4                Angola      22311     537
..                  ...        ...     ...
187             Vietnam       2603      35
188  West Bank and Gaza     242353    2627
189               Yemen       4357     888
190              Zambia      88418    1208
191            Zimbabwe      36882    1523

[192 rows x 3 columns]
```

In [3]:
summarize_confirmed_deaths_by_country_from_covid19 =\
"""
-- SQL query starts here
SELECT lookup_table.Country_Region,
       SUM(Confirmed) AS Confirmed,
       SUM(Deaths) AS Deaths
  FROM daily_report
  JOIN lookup_table
    ON daily_report.Combined_Key = lookup_table.Combined_key
 GROUP BY lookup_table.Country_Region;
-- SQL query ends here
"""
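
To see how the join plus `GROUP BY` collapses several regional rows into one row per country, here is a small sketch on made-up data. The table and column names mirror the ones used in the query, but the rows and the country names are invented for illustration. The explicit `ORDER BY` is an addition that makes the row order deterministic.

```python
import sqlite3

# Toy stand-ins for lookup_table and daily_report; all rows are made up.
toy_conn = sqlite3.connect(":memory:")
toy_conn.executescript("""
CREATE TABLE lookup_table (Combined_Key TEXT, Country_Region TEXT);
CREATE TABLE daily_report (Combined_Key TEXT, Confirmed INTEGER, Deaths INTEGER);
INSERT INTO lookup_table VALUES ('A1', 'Aland'), ('A2', 'Aland'), ('B1', 'Borduria');
INSERT INTO daily_report VALUES ('A1', 10, 1), ('A2', 5, 0), ('B1', 7, 2);
""")

# Two regions of 'Aland' are summed into a single country-level row.
country_totals = toy_conn.execute("""
SELECT lookup_table.Country_Region,
       SUM(daily_report.Confirmed) AS Confirmed,
       SUM(daily_report.Deaths) AS Deaths
  FROM daily_report
  JOIN lookup_table
    ON daily_report.Combined_Key = lookup_table.Combined_Key
 GROUP BY lookup_table.Country_Region
 ORDER BY lookup_table.Country_Region;
""").fetchall()
toy_conn.close()

print(country_totals)  # [('Aland', 15, 1), ('Borduria', 7, 2)]
```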

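## 48. Query the `imdb` database for the movie information and cast lists of the "Lord of the Rings trilogy" and the "Batman trilogy". It is normal for an actor to appear in more than one film of a trilogy; in that case, show each actor only once. Refer to the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (66, 2).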

```
                          trilogy                  name
0                  Batman Trilogy         Aaron Eckhart
1                  Batman Trilogy          Aidan Gillen
2                  Batman Trilogy        Alon Aboutboul
3                  Batman Trilogy         Anne Hathaway
4                  Batman Trilogy  Anthony Michael Hall
..                            ...                   ...
61  The Lord of the Rings Trilogy         Sadwyn Brophy
62  The Lord of the Rings Trilogy            Sala Baker
63  The Lord of the Rings Trilogy            Sam Comery
64  The Lord of the Rings Trilogy            Sean Astin
65  The Lord of the Rings Trilogy             Sean Bean

[66 rows x 2 columns]
```

In [4]:
find_two_trilogy_casting_list =\
"""
-- SQL query starts here
SELECT CASE WHEN movies.title LIKE '%Lord of the Rings%' THEN 'The Lord of the Rings Trilogy'
            ELSE 'Batman Trilogy' END AS trilogy,
       actors.name
  FROM actors
  JOIN casting
    ON actors.id = casting.actor_id
  JOIN movies
    ON casting.movie_id = movies.id
 WHERE movies.title LIKE '%Lord of the Rings%' OR
       movies.title LIKE '%Batman%' OR
       movies.title LIKE '%The Dark Knight%'
 GROUP BY trilogy, actors.name;
-- SQL query ends here
"""
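
The query above removes repeated actors with `GROUP BY`. `SELECT DISTINCT` is another common way to get unique rows. The sketch below compares the two on a hypothetical `credits_demo` table, which is not part of the `imdb` schema.

```python
import sqlite3

# Hypothetical table: one row per (film, actor) credit.
dedupe_conn = sqlite3.connect(":memory:")
dedupe_conn.executescript("""
CREATE TABLE credits_demo (title TEXT, name TEXT);
INSERT INTO credits_demo VALUES
    ('Batman Begins', 'Christian Bale'),
    ('The Dark Knight', 'Christian Bale'),
    ('The Dark Knight', 'Heath Ledger');
""")

# Grouping by the selected column keeps one row per distinct value.
with_group_by = dedupe_conn.execute(
    "SELECT name FROM credits_demo GROUP BY name ORDER BY name"
).fetchall()

# DISTINCT expresses the same intent directly.
with_distinct = dedupe_conn.execute(
    "SELECT DISTINCT name FROM credits_demo ORDER BY name"
).fetchall()
dedupe_conn.close()

print(with_group_by == with_distinct)  # True
```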

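## 49. Query the `nba` database for, as of 2021-03-31, the scoring leader (highest career points per game, `ppg`), the assists leader (highest career assists per game, `apg`), the rebounding leader (highest career rebounds per game, `rpg`), the steals leader (highest career steals per game, `spg`), and the blocks leader (highest career blocks per game, `bpg`); refer to the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (6, 4).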

```
  firstName   lastName category  value
0     Andre   Drummond      rpg   13.8
1   Anthony      Davis      bpg    2.4
2     Chris       Paul      apg    9.4
3     Chris       Paul      spg    2.2
4    Hassan  Whiteside      bpg    2.4
5     Kevin     Durant      ppg   27.1
```

In [5]:
find_max_stats_per_game_from_nba =\
"""
-- SQL query starts here
SELECT players.firstName,
       players.lastName,
       'ppg' AS category,
       max_stats.ppg AS value
  FROM players
  JOIN (SELECT personId,
               ppg
          FROM career_summaries
         WHERE ppg = (SELECT MAX(ppg)
                        FROM career_summaries)) AS max_stats
    ON players.personId = max_stats.personId
 UNION
SELECT players.firstName,
       players.lastName,
       'apg' AS category,
       max_stats.apg AS value
  FROM players
  JOIN (SELECT personId,
               apg
          FROM career_summaries
         WHERE apg = (SELECT MAX(apg)
                        FROM career_summaries)) AS max_stats
    ON players.personId = max_stats.personId
 UNION
SELECT players.firstName,
       players.lastName,
       'rpg' AS category,
       max_stats.rpg AS value
  FROM players
  JOIN (SELECT personId,
               rpg
          FROM career_summaries
         WHERE rpg = (SELECT MAX(rpg)
                        FROM career_summaries)) AS max_stats
    ON players.personId = max_stats.personId
 UNION
SELECT players.firstName,
       players.lastName,
       'spg' AS category,
       max_stats.spg AS value
  FROM players
  JOIN (SELECT personId,
               spg
          FROM career_summaries
         WHERE spg = (SELECT MAX(spg)
                        FROM career_summaries)) AS max_stats
    ON players.personId = max_stats.personId
 UNION
SELECT players.firstName,
       players.lastName,
       'bpg' AS category,
       max_stats.bpg AS value
  FROM players
  JOIN (SELECT personId,
               bpg
          FROM career_summaries
         WHERE bpg = (SELECT MAX(bpg)
                        FROM career_summaries)) AS max_stats
    ON players.personId = max_stats.personId;
-- SQL query ends here
"""
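
The expected result has six rows for five categories because two players share the highest `bpg`. Filtering with `WHERE column = (SELECT MAX(column) ...)` keeps every tied row, whereas `ORDER BY ... LIMIT 1` would keep only one. The sketch below shows the difference on a hypothetical `career_demo` table with made-up values.

```python
import sqlite3

# Hypothetical table with a tie for the highest bpg.
ties_conn = sqlite3.connect(":memory:")
ties_conn.executescript("""
CREATE TABLE career_demo (player TEXT, bpg REAL);
INSERT INTO career_demo VALUES ('Player A', 2.4), ('Player B', 2.4), ('Player C', 1.0);
""")

# Comparing against the maximum returns every player who reaches it.
all_leaders = ties_conn.execute("""
SELECT player, bpg
  FROM career_demo
 WHERE bpg = (SELECT MAX(bpg) FROM career_demo)
 ORDER BY player;
""").fetchall()

# Sorting and taking the first row silently drops the other tied player.
single_leader = ties_conn.execute("""
SELECT player, bpg
  FROM career_demo
 ORDER BY bpg DESC, player
 LIMIT 1;
""").fetchall()
ties_conn.close()

print(len(all_leaders), len(single_leader))  # 2 1
```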

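## 50. Query the `twElection2020` database for the number of votes that each of the three presidential tickets received in every county and city; refer to the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (22, 4).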

```
   county  soong_yu_votes  han_chang_votes  tsai_lai_votes
0     南投縣           13315           133791          152046
1     嘉義市            6204            56269           99265
2     嘉義縣           11138            98810          197342
3     基隆市           11878            99360          114966
4     宜蘭縣           10739            90010          173657
5     屏東縣           14021           179353          317676
6     彰化縣           35060           291835          436336
7     新北市          112620           959631         1393936
8     新竹市           14103           102725          144274
9     新竹縣           18435           154224          152380
10    桃園市           63132           529749          718260
11    澎湖縣            2583            20911           27410
12    臺中市           84800           646366          967304
13    臺北市           70769           685830          875854
14    臺南市           41075           339702          786471
15    臺東縣            4163            67413           44092
16    花蓮縣            6869           111834           66509
17    苗栗縣           15222           164345          147034
18    連江縣             188             4776            1226
19    金門縣            1636            35948           10456
20    雲林縣           15331           138341          246116
21    高雄市           55309           610896         1097621
```

In [6]:
summarize_presidential_votes_from_twelection2020 =\
"""
-- SQL query starts here
SELECT soong_yu.county,
       soong_yu.soong_yu_votes,
       han_chang.han_chang_votes,
       tsai_lai.tsai_lai_votes
  FROM (SELECT admin_regions.county,
               SUM(votes) AS soong_yu_votes
          FROM presidential
          JOIN admin_regions
            ON presidential.admin_region_id = admin_regions.id
         WHERE presidential.candidate_id = 1
         GROUP BY admin_regions.county) AS soong_yu
  JOIN (SELECT admin_regions.county,
               SUM(votes) AS han_chang_votes
          FROM presidential
          JOIN admin_regions
            ON presidential.admin_region_id = admin_regions.id
         WHERE presidential.candidate_id = 2
         GROUP BY admin_regions.county) AS han_chang
    ON soong_yu.county = han_chang.county
  JOIN (SELECT admin_regions.county,
               SUM(votes) AS tsai_lai_votes
          FROM presidential
          JOIN admin_regions
            ON presidential.admin_region_id = admin_regions.id
         WHERE presidential.candidate_id = 3
         GROUP BY admin_regions.county) AS tsai_lai
    ON soong_yu.county = tsai_lai.county;
-- SQL query ends here
"""
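
The query above builds one subquery per candidate and joins them on `county`. Conditional aggregation, `SUM(CASE WHEN ... END)` in a single `GROUP BY`, is a different technique that can produce the same wide layout. It is sketched here on a hypothetical `votes_demo` table with made-up numbers and is not the notebook's own solution.

```python
import sqlite3

# Hypothetical long-format table: one row per (county, candidate, polling place).
pivot_conn = sqlite3.connect(":memory:")
pivot_conn.executescript("""
CREATE TABLE votes_demo (county TEXT, candidate_id INTEGER, votes INTEGER);
INSERT INTO votes_demo VALUES
    ('X', 1, 10), ('X', 2, 20), ('X', 3, 30), ('X', 1, 1),
    ('Y', 1, 5),  ('Y', 2, 6),  ('Y', 3, 7);
""")

# Each CASE expression contributes votes only for its own candidate,
# so one pass over the table yields one column per candidate.
votes_by_county = pivot_conn.execute("""
SELECT county,
       SUM(CASE WHEN candidate_id = 1 THEN votes ELSE 0 END) AS candidate_1_votes,
       SUM(CASE WHEN candidate_id = 2 THEN votes ELSE 0 END) AS candidate_2_votes,
       SUM(CASE WHEN candidate_id = 3 THEN votes ELSE 0 END) AS candidate_3_votes
  FROM votes_demo
 GROUP BY county
 ORDER BY county;
""").fetchall()
pivot_conn.close()

print(votes_by_county)  # [('X', 11, 20, 30), ('Y', 5, 6, 7)]
```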

## Run the tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

In [7]:
class TestMoreExercises(unittest.TestCase):
    def test_46_find_cruise_ships_from_covid19(self):
        cruise_ships_from_covid19 = pd.read_sql(find_cruise_ships_from_covid19, conn)
        self.assertEqual(cruise_ships_from_covid19.shape, (4, 4))
        column_values = set(cruise_ships_from_covid19.iloc[:, 1].values)
        self.assertTrue('Canada' in column_values)
        self.assertTrue('US' in column_values)
        column_values = set(cruise_ships_from_covid19.iloc[:, 2].values)
        self.assertTrue('Diamond Princess' in column_values)
        self.assertTrue('Grand Princess' in column_values)
    def test_47_summarize_confirmed_deaths_by_country_from_covid19(self):
        confirmed_deaths_by_country_from_covid19 = pd.read_sql(summarize_confirmed_deaths_by_country_from_covid19, conn)
        self.assertEqual(confirmed_deaths_by_country_from_covid19.shape, (192, 3))
    def test_48_find_two_trilogy_casting_list(self):
        two_trilogy_casting_list = pd.read_sql(find_two_trilogy_casting_list, conn)
        self.assertEqual(two_trilogy_casting_list.shape, (66, 2))
        column_values = set(two_trilogy_casting_list.iloc[:, 1].values)
        self.assertTrue('Christian Bale' in column_values)
        self.assertTrue('Heath Ledger' in column_values)
        self.assertTrue('Anne Hathaway' in column_values)
        self.assertTrue('Sean Astin' in column_values)
        self.assertTrue('Cate Blanchett' in column_values)
        self.assertTrue('Orlando Bloom' in column_values)
    def test_49_find_max_stats_per_game_from_nba(self):
        max_stats_per_game_from_nba = pd.read_sql(find_max_stats_per_game_from_nba, conn)
        self.assertEqual(max_stats_per_game_from_nba.shape, (6, 4))
        first_names = set(max_stats_per_game_from_nba.iloc[:, 0].values)
        self.assertTrue('Chris' in first_names)
        self.assertTrue('Anthony' in first_names)
        self.assertTrue('Kevin' in first_names)
        categories = set(max_stats_per_game_from_nba.iloc[:, 2].values)
        self.assertTrue('apg' in categories)
        self.assertTrue('ppg' in categories)
        self.assertTrue('spg' in categories)
        self.assertTrue('rpg' in categories)
        self.assertTrue('bpg' in categories)
    def test_50_summarize_presidential_votes_from_twelection2020(self):
        presidential_votes_from_twelection2020 = pd.read_sql(summarize_presidential_votes_from_twelection2020, conn)
        self.assertEqual(presidential_votes_from_twelection2020.shape, (22, 4))
        counties = set(presidential_votes_from_twelection2020.iloc[:, 0].values)
        self.assertTrue('臺北市' in counties)
        self.assertTrue('新北市' in counties)
        self.assertTrue('桃園市' in counties)
        self.assertTrue('臺中市' in counties)
        self.assertTrue('臺南市' in counties)
        self.assertTrue('高雄市' in counties)
        self.assertEqual(presidential_votes_from_twelection2020.iloc[:, 1].sum(), 608590)
        self.assertEqual(presidential_votes_from_twelection2020.iloc[:, 2].sum(), 5522119)
        self.assertEqual(presidential_votes_from_twelection2020.iloc[:, 3].sum(), 8170231)

suite = unittest.TestLoader().loadTestsFromTestCase(TestMoreExercises)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)
cwd = os.getcwd()
folder_name = cwd.split("/")[-1]
with open("../exercise_index.json", "r") as content:
    exercise_index = json.load(content)
chapter_name = exercise_index[folder_name]

test_46_find_cruise_ships_from_covid19 (__main__.TestMoreExercises) ... ok
test_47_summarize_confirmed_deaths_by_country_from_covid19 (__main__.TestMoreExercises) ... ok
test_48_find_two_trilogy_casting_list (__main__.TestMoreExercises) ... ok
test_49_find_max_stats_per_game_from_nba (__main__.TestMoreExercises) ... ok
test_50_summarize_presidential_votes_from_twelection2020 (__main__.TestMoreExercises) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.199s

OK
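
The cell above derives the number of correct answers from the test result object: tests run, minus failures and errors. The following is a minimal sketch of that arithmetic with two hypothetical tests, one of which fails on purpose. The test class is built inside a function only to keep the example self-contained.

```python
import io
import unittest

def make_demo_suite():
    # Hypothetical test case: one passing test and one deliberately failing test.
    class DemoTests(unittest.TestCase):
        def test_passes(self):
            self.assertEqual(1 + 1, 2)

        def test_fails(self):
            self.assertEqual(1 + 1, 3)

    return unittest.TestLoader().loadTestsFromTestCase(DemoTests)

# Send the runner's report to an in-memory buffer to keep the output quiet.
demo_runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
demo_results = demo_runner.run(make_demo_suite())

# Same bookkeeping as the notebook: successes = runs - (failures + errors).
demo_failures = len(demo_results.failures)
demo_errors = len(demo_results.errors)
demo_successes = demo_results.testsRun - (demo_failures + demo_errors)

print(demo_results.testsRun, demo_failures, demo_errors, demo_successes)  # 2 1 0 1
```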


In [8]:
print('In the "{}" chapter, you answered {} SQL exercises and got {} correct.'.format(chapter_name, number_of_test_runs, number_of_successes))

In the "綜合練習題" chapter, you answered 5 SQL exercises and got 5 correct.
