# Fifty SQL Exercises

> Structures that contain subqueries

[數據交點](https://www.datainpoint.com) | 郭耀仁 <yaojenkuo@datainpoint.com>

In [1]:
import sqlite3
import unittest
import numpy as np
import pandas as pd
conn = sqlite3.connect('../databases/nba.db')
conn.execute("""ATTACH '../databases/covid19.db' AS covid19""")
conn.execute("""ATTACH '../databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH '../databases/imdb.db' AS imdb""")

<sqlite3.Cursor at 0x7f9973380650>

## Find the tallest and shortest players in the league from the player profile table `players`.

- Expected input: a SQL query.
- Expected output: a query result of shape (5, 3).

```
  firstName  lastName  heightMeters
0     Tacko      Fall          2.26
1     Jared    Harper          1.78
2   Tremont    Waters          1.78
3    Markus    Howard          1.78
4   Facundo  Campazzo          1.78
```

In [2]:
find_tallest_shortest_players =\
"""
-- Start of SQL query
SELECT firstName,
       lastName,
       heightMeters
  FROM players
 WHERE heightMeters = (SELECT MAX(heightMeters) FROM players) OR
       heightMeters = (SELECT MIN(heightMeters) FROM players);
-- End of SQL query
"""
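
The scalar-subquery pattern in this query can be tried without `nba.db`. The following is a minimal sketch that assumes a made-up `players` table in an in-memory SQLite database. The rows, the `demo_conn` name, and the `ORDER BY` clause are illustrative additions, not part of the exercise data or the expected answer.

```python
import sqlite3

# Hypothetical toy data in an in-memory database; not taken from nba.db.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE players (firstName TEXT, lastName TEXT, heightMeters REAL)"
)
demo_conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [
        ("Ann", "Tall", 2.20),
        ("Bob", "Mid", 2.00),
        ("Cy", "Short", 1.80),
        ("Di", "Short", 1.80),
    ],
)

# Each parenthesized SELECT returns a single value (a scalar subquery),
# so it can sit on the right-hand side of "=" in the WHERE clause.
extremes_query = """
SELECT firstName,
       lastName,
       heightMeters
  FROM players
 WHERE heightMeters = (SELECT MAX(heightMeters) FROM players) OR
       heightMeters = (SELECT MIN(heightMeters) FROM players)
 ORDER BY heightMeters DESC, firstName;
"""
extremes = demo_conn.execute(extremes_query).fetchall()
print(extremes)
```

Ties are kept: both players at the minimum height come back, which is why the exercise expects five rows rather than two.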

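## Group the player profile table `players` by `country`, compute each country's share of players, and show the six countries with the highest share.

- Expected input: a SQL query.
- Expected output: a query result of shape (6, 2).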

```
     country  player_percentage
0        USA           0.769697
1     Canada           0.036364
2     France           0.022222
3  Australia           0.018182
4    Germany           0.012121
5     Serbia           0.012121
```

In [3]:
calculate_top_six_player_percentage =\
"""
-- Start of SQL query
SELECT country,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(*)
                                   FROM players) AS player_percentage
  FROM players
 GROUP BY country
 ORDER BY player_percentage DESC
 LIMIT 6;
-- End of SQL query
"""
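
The `CAST(... AS REAL)` in this query matters because SQLite performs integer division when both operands are integers. A minimal sketch of the difference, assuming a hypothetical four-row `players` table in an in-memory database (the names `share_conn`, `integer_share`, and `real_share` are made up for illustration):

```python
import sqlite3

# Hypothetical toy data; not taken from nba.db.
share_conn = sqlite3.connect(":memory:")
share_conn.execute("CREATE TABLE players (name TEXT, country TEXT)")
share_conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("A", "USA"), ("B", "USA"), ("C", "USA"), ("D", "Canada")],
)

# integer_share divides two integers, so the fraction is truncated to 0.
# real_share casts the numerator to REAL first, which keeps the fraction.
share_query = """
SELECT country,
       COUNT(*) / (SELECT COUNT(*) FROM players) AS integer_share,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(*) FROM players) AS real_share
  FROM players
 GROUP BY country
 ORDER BY real_share DESC;
"""
shares = share_conn.execute(share_query).fetchall()
print(shares)
```

The subquery in the `SELECT` list is evaluated against the whole table, not against each group, so it supplies the same denominator to every row.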

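## Using the player profile table `players` and the team profile table `teams`, find the current Los Angeles Lakers roster.

- Expected input: a SQL query.
- Expected output: a query result of shape (16, 2).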

```
     firstName       lastName
0       LeBron          James
1        Jared         Dudley
2         Marc          Gasol
3       Wesley       Matthews
4     Markieff         Morris
5      Anthony          Davis
6       Dennis       Schroder
7   Kentavious  Caldwell-Pope
8     Montrezl        Harrell
9        Quinn           Cook
10        Alex         Caruso
11     Alfonzo       McKinnie
12        Kyle          Kuzma
13      Kostas  Antetokounmpo
14       Talen  Horton-Tucker
15    Devontae          Cacok
```

In [4]:
find_los_angelas_lakers_roster =\
"""
-- Start of SQL query
SELECT firstName,
       lastName
  FROM players
 WHERE teamId = (SELECT teamId
                   FROM teams
                  WHERE nickname = 'Lakers');
-- End of SQL query
"""
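
Comparing with `=` works here because the subquery is expected to return exactly one `teamId`. When a subquery can return several rows, SQLite uses only the first row for the `=` comparison, and `IN` is the safer operator. The sketch below illustrates this with hypothetical `teams` and `players` tables; the `city` column and all rows are assumptions made for the example, not the schema of `nba.db`.

```python
import sqlite3

# Hypothetical toy schema and data; not taken from nba.db.
roster_conn = sqlite3.connect(":memory:")
roster_conn.executescript("""
CREATE TABLE teams (teamId INTEGER, city TEXT, nickname TEXT);
CREATE TABLE players (firstName TEXT, teamId INTEGER);
INSERT INTO teams VALUES (1, 'Los Angeles', 'Lakers');
INSERT INTO teams VALUES (2, 'Los Angeles', 'Clippers');
INSERT INTO teams VALUES (3, 'Boston', 'Celtics');
INSERT INTO players VALUES ('P1', 1);
INSERT INTO players VALUES ('P2', 1);
INSERT INTO players VALUES ('P3', 2);
INSERT INTO players VALUES ('P4', 3);
""")

# The subquery matches two teams. With "=", only one of them is used.
equals_rows = roster_conn.execute("""
SELECT firstName
  FROM players
 WHERE teamId = (SELECT teamId FROM teams WHERE city = 'Los Angeles');
""").fetchall()

# With IN, every teamId returned by the subquery is matched.
in_rows = roster_conn.execute("""
SELECT firstName
  FROM players
 WHERE teamId IN (SELECT teamId FROM teams WHERE city = 'Los Angeles')
 ORDER BY firstName;
""").fetchall()

print(len(equals_rows), len(in_rows))
```

Filtering on `nickname = 'Lakers'`, as the exercise does, sidesteps the issue as long as nicknames are unique in `teams`.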

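## From the `legislative_at_large` table, which holds per-polling-station vote details for Taiwan's 2020 at-large legislator election, compute each party's vote share and sort by vote share in descending order.

- Expected input: a SQL query.
- Expected output: a query result of shape (19, 2).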

```
      party  votes_percentage
0     民主進步黨          0.339774
1     中國國民黨          0.333578
2     台灣民眾黨          0.112203
3      時代力量          0.077549
4       親民黨          0.036647
5      台灣基進          0.031588
6        綠黨          0.024115
7        新黨          0.010408
8   一邊一國行動黨          0.010142
9      安定力量          0.006678
10   台灣團結聯盟          0.003562
11   國會政黨聯盟          0.002848
12  中華統一促進黨          0.002328
13     宗教聯盟          0.002198
14    喜樂島聯盟          0.002071
15      勞動黨          0.001408
16   合一行動聯盟          0.001237
17     台灣維新          0.000844
18      台澎黨          0.000825
```

In [5]:
calculate_party_votes_percentage =\
"""
-- Start of SQL query
SELECT party,
       CAST(SUM(votes) AS REAL) / (SELECT SUM(votes) FROM legislative_at_large) AS votes_percentage
  FROM legislative_at_large
 GROUP BY party
 ORDER BY votes_percentage DESC;
-- End of SQL query
"""

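## From the `presidential` table, which holds per-polling-station vote details for Taiwan's 2020 presidential and vice-presidential election, compute each ticket's vote share, and use the `||` string operator to append the percent sign `%` to the query result.

- Expected input: a SQL query.
- Expected output: a query result of shape (3, 3).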

```
   number candidate votes_percentage
0       1    宋楚瑜/余湘            4.26%
1       2   韓國瑜/張善政           38.61%
2       3   蔡英文/賴清德           57.13%
```

In [6]:
calculate_president_votes_percentage =\
"""
-- Start of SQL query
SELECT number,
       candidate,
       CAST(ROUND(CAST(SUM(votes) AS REAL) / (SELECT SUM(votes) FROM presidential), 4) * 100 AS TEXT) || '%' AS votes_percentage
  FROM presidential
 GROUP BY number;
-- End of SQL query
"""
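
The exercise asks for the `||` operator, which is what the query uses. As a point of comparison only, SQLite's built-in `printf()` function can round and append the percent sign in one step. The sketch below assumes a hypothetical `presidential` table with made-up vote counts in an in-memory database; it is an alternative formatting approach, not the expected answer.

```python
import sqlite3

# Hypothetical toy data; the vote counts are invented for illustration.
votes_conn = sqlite3.connect(":memory:")
votes_conn.execute(
    "CREATE TABLE presidential (number INTEGER, candidate TEXT, votes INTEGER)"
)
votes_conn.executemany(
    "INSERT INTO presidential VALUES (?, ?, ?)",
    [
        (1, "A", 400),   # two polling stations for ticket 1
        (1, "A", 26),
        (2, "B", 3861),
        (3, "C", 5713),
    ],
)

# '%.2f' keeps two decimal places and '%%' emits a literal percent sign.
# Multiplying by 100.0 forces floating-point division.
printf_query = """
SELECT number,
       candidate,
       printf('%.2f%%',
              100.0 * SUM(votes) / (SELECT SUM(votes) FROM presidential)) AS votes_percentage
  FROM presidential
 GROUP BY number, candidate
 ORDER BY number;
"""
formatted = votes_conn.execute(printf_query).fetchall()
print(formatted)
```

One practical difference: `printf('%.2f%%', ...)` always shows two decimal places, whereas casting a rounded number to text drops trailing zeros.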

## Run the tests!

Kernel -> Restart & Run All.

In [7]:
class TestSubQuery(unittest.TestCase):
    def test_01_find_tallest_shortest_players(self):
        tallest_shortest_players = pd.read_sql(find_tallest_shortest_players, conn)
        self.assertEqual(tallest_shortest_players.shape, (5, 3))
        np.testing.assert_equal(tallest_shortest_players['heightMeters'].values,
                               np.array([2.26, 1.78, 1.78, 1.78, 1.78]))
    def test_02_calculate_top_six_player_percentage(self):
        top_six_player_percentage = pd.read_sql(calculate_top_six_player_percentage, conn)
        self.assertEqual(top_six_player_percentage.shape, (6, 2))
        np.testing.assert_equal(top_six_player_percentage['country'].values,
                               np.array(['USA', 'Canada', 'France', 'Australia', 'Germany', 'Serbia']))
    def test_03_find_los_angelas_lakers_roster(self):
        los_angelas_lakers_roster = pd.read_sql(find_los_angelas_lakers_roster, conn)
        self.assertEqual(los_angelas_lakers_roster.shape, (16, 2))
        self.assertEqual(np.isin(los_angelas_lakers_roster['firstName'].values, ['LeBron', 'Anthony']).sum(), 2)
        self.assertEqual(np.isin(los_angelas_lakers_roster['lastName'].values, ['James', 'Davis']).sum(), 2)
    def test_04_calculate_party_votes_percentage(self):
        party_votes_percentage = pd.read_sql(calculate_party_votes_percentage, conn)
        self.assertEqual(party_votes_percentage.shape, (19, 2))
        np.testing.assert_equal(party_votes_percentage['party'].values[:5], 
                               np.array(['民主進步黨', '中國國民黨', '台灣民眾黨', '時代力量', '親民黨']))
    def test_05_calculate_president_votes_percentage(self):
        president_votes_percentage = pd.read_sql(calculate_president_votes_percentage, conn)
        self.assertEqual(president_votes_percentage.shape, (3, 3))
        np.testing.assert_equal(president_votes_percentage['votes_percentage'].values, 
                               np.array(['4.26%', '38.61%', '57.13%']))
        
suite = unittest.TestLoader().loadTestsFromTestCase(TestSubQuery)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

test_01_find_tallest_shortest_players (__main__.TestSubQuery) ... ok
test_02_calculate_top_six_player_percentage (__main__.TestSubQuery) ... ok
test_03_find_los_angelas_lakers_roster (__main__.TestSubQuery) ... ok
test_04_calculate_party_votes_percentage (__main__.TestSubQuery) ... ok
test_05_calculate_president_votes_percentage (__main__.TestSubQuery) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.423s

OK


In [8]:
print("You answered {} of {} SQL exercises correctly.".format(number_of_successes, number_of_test_runs))

You answered 5 of 5 SQL exercises correctly.
