# Fifty SQL Exercises

> Functions

Yao-Jen Kuo (郭耀仁) <yaojenkuo@datainpoint.com>, [數據交點](https://www.datainpoint.com)

In [1]:
import sqlite3
import unittest
import numpy as np
import pandas as pd
conn = sqlite3.connect('../databases/nba.db')
conn.execute("""ATTACH '../databases/covid19.db' AS covid19""")
conn.execute("""ATTACH '../databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH '../databases/imdb.db' AS imdb""")

<sqlite3.Cursor at 0x7ff240261650>
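The setup cell attaches three extra database files to a single connection, which is why the later queries can name tables such as `presidential` or `daily_report` without a schema prefix. The following is a minimal sketch of that behavior, assuming an in-memory database in place of the real `.db` files; the table contents are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in: an in-memory database attached under the same
# schema name the notebook uses for covid19.db.
demo = sqlite3.connect(":memory:")
demo.execute("ATTACH ':memory:' AS covid19")
demo.execute("CREATE TABLE covid19.daily_report (Country_Region TEXT, Confirmed INTEGER)")
demo.execute("INSERT INTO covid19.daily_report VALUES ('Taiwan', 100)")

# The table can be referenced with or without the schema prefix, as long as
# the unqualified name is not shadowed by a table in another schema.
qualified = demo.execute("SELECT * FROM covid19.daily_report").fetchall()
unqualified = demo.execute("SELECT * FROM daily_report").fetchall()

# PRAGMA database_list reports every schema visible on this connection.
schemas = [row[1] for row in demo.execute("PRAGMA database_list")]
demo.close()

print(qualified == unqualified, schemas)
```

If two attached databases contained a table with the same name, the schema prefix would be needed to pick one explicitly.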

## From the NBA player table `players`, derive a computed column `bmi` from height `heightMeters` and weight `weightKilograms`, and use the `ROUND()` function to round `bmi` to 2 decimal places.

\begin{equation}
BMI = \frac{weight_{kg}}{height_{m}^2}
\end{equation}

- Expected input: a SQL query.
- Expected output: a query result of shape (495, 3).

```
    heightMeters weightKilograms    bmi
0           2.03           102.1  24.78
1           1.83           102.1  30.49
2           2.11           120.2  27.00
3           2.06           115.7  27.26
4           2.11           113.4  25.47
..           ...             ...    ...
490         1.96            83.9  21.84
491         2.03           106.6  25.87
492         1.85            81.6  23.84
493         2.11           108.9  24.46
494         2.13           108.9  24.00

[495 rows x 3 columns]
```
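To see what `ROUND()` contributes, the sketch below applies the BMI formula to the first two rows of the expected output above, once without rounding and once with it. The in-memory table is a hypothetical stand-in for the real `players` table and holds only those two rows.

```python
import sqlite3

# Hypothetical in-memory stand-in for the real `players` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE players (heightMeters REAL, weightKilograms REAL)")
demo.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(2.03, 102.1), (1.83, 102.1)],  # first two rows of the expected output
)

bmi_rows = demo.execute(
    """
    SELECT weightKilograms / (heightMeters * heightMeters) AS raw_bmi,
           ROUND(weightKilograms / (heightMeters * heightMeters), 2) AS bmi
      FROM players;
    """
).fetchall()
demo.close()

# Print the unrounded and rounded value side by side for each row.
for raw_bmi, bmi in bmi_rows:
    print(f"raw={raw_bmi:.6f} rounded={bmi}")
```

The rounded values agree with the `bmi` column shown in the expected output (24.78 and 30.49).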

In [2]:
calculate_rounded_bmi =\
"""
-- Start of SQL query
SELECT heightMeters,
       weightKilograms,
       ROUND(weightKilograms / (heightMeters*heightMeters), 2) AS bmi
  FROM players;
-- End of SQL query
"""

## From the NBA career summary table `career_summaries`, select `assists` and `turnovers`, derive the assist-to-turnover ratio `ast_to_ratio`, and use the `CAST()` function so that `ast_to_ratio` has the floating-point type `REAL`.

- Expected input: a SQL query.
- Expected output: a query result of shape (495, 3).

```
    assists turnovers  ast_to_ratio
0        16        23      0.695652
1        65        28      2.321429
2       660       793      0.832282
3       746       446      1.672646
4      2020      1587      1.272842
..      ...       ...           ...
490     871       290      3.003448
491    1666      1422      1.171589
492    1368       670      2.041791
493     600       457      1.312910
494     223       206      1.082524

[495 rows x 3 columns]
```
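The reason `CAST()` matters here is that SQLite performs integer division when both operands are integers, which would truncate every ratio. The sketch below contrasts the two forms on the first two rows of the expected output above. It assumes both columns are stored as `INTEGER`; the real schema of `career_summaries` is not shown in this notebook.

```python
import sqlite3

# Hypothetical in-memory stand-in; the INTEGER column types are an assumption.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE career_summaries (assists INTEGER, turnovers INTEGER)")
demo.executemany(
    "INSERT INTO career_summaries VALUES (?, ?)",
    [(16, 23), (65, 28)],  # first two rows of the expected output
)

ratio_rows = demo.execute(
    """
    SELECT assists / turnovers AS int_ratio,
           CAST(assists AS REAL) / turnovers AS ast_to_ratio,
           TYPEOF(CAST(assists AS REAL) / turnovers) AS ratio_type
      FROM career_summaries;
    """
).fetchall()
demo.close()

# int_ratio is truncated toward zero, ast_to_ratio keeps the fraction.
for int_ratio, ast_to_ratio, ratio_type in ratio_rows:
    print(int_ratio, round(ast_to_ratio, 6), ratio_type)
```

Casting only one operand is enough, because SQLite promotes the division to floating point as soon as either side is `REAL`.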

In [3]:
calculate_ast_to_ratio =\
"""
-- Start of SQL query
SELECT assists,
       turnovers,
       CAST(assists AS REAL) / turnovers AS ast_to_ratio
  FROM career_summaries;
-- End of SQL query
"""

## From the NBA player table `players`, select `firstName` and `lastName`, and use the `LOWER()` and `UPPER()` functions to convert `firstName` to all lowercase and `lastName` to all uppercase.

- Expected input: a SQL query.
- Expected output: a query result of shape (495, 2).

```
    lower_first_name upper_last_name
0           precious         ACHIUWA
1             jaylen           ADAMS
2             steven           ADAMS
3                bam         ADEBAYO
4           lamarcus        ALDRIDGE
..               ...             ...
490            delon          WRIGHT
491         thaddeus           YOUNG
492             trae           YOUNG
493             cody          ZELLER
494            ivica           ZUBAC

[495 rows x 2 columns]
```
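As a quick illustration of the two case functions, the sketch below runs them against a hypothetical in-memory `players` table. The mixed-case input names are inferred from the lowercase and uppercase values in the expected output above, so treat them as assumed inputs rather than rows copied from the database.

```python
import sqlite3

# Hypothetical in-memory stand-in for the real `players` table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE players (firstName TEXT, lastName TEXT)")
demo.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("Precious", "Achiuwa"), ("Bam", "Adebayo")],  # assumed original casing
)

name_rows = demo.execute(
    """
    SELECT LOWER(firstName) AS lower_first_name,
           UPPER(lastName) AS upper_last_name
      FROM players;
    """
).fetchall()
demo.close()

print(name_rows)
```

According to the SQLite documentation, the built-in `LOWER()` and `UPPER()` convert only ASCII characters unless the ICU extension is loaded, so names with accented letters may be only partly converted.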

In [4]:
select_lower_firstname_upper_lastname =\
"""
-- Start of SQL query
SELECT LOWER(firstName) AS lower_first_name,
       UPPER(lastName) AS upper_last_name
  FROM players;
-- End of SQL query
"""

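## From the 2020 Taiwan presidential election table `presidential`, use aggregate functions to total the number of votes cast in the presidential election as `total_votes` (valid votes), and concatenate `county`, `town`, and `village` to count the number of unique electoral areas in Taiwan as `n_electoral_area`.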

- Expected input: a SQL query.
- Expected output: a query result of shape (1, 2).

```
   total_votes  n_electoral_area
0     14300940              7737
```
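Counting electoral areas needs all three location columns, because the same village name can appear under different counties or towns. The sketch below shows the difference on a tiny made-up table. The place names, the vote counts, and the `candidate_id` column are all hypothetical; only `county`, `town`, `village`, and `votes` are taken from the exercise text.

```python
import sqlite3

# Hypothetical in-memory stand-in: two candidates per village, three villages.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE presidential "
    "(county TEXT, town TEXT, village TEXT, candidate_id INTEGER, votes INTEGER)"
)
demo.executemany(
    "INSERT INTO presidential VALUES (?, ?, ?, ?, ?)",
    [
        ("A County", "X Town", "V1", 1, 10),
        ("A County", "X Town", "V1", 2, 20),
        ("A County", "X Town", "V2", 1, 5),
        ("A County", "X Town", "V2", 2, 15),
        ("B County", "Y Town", "V1", 1, 7),  # same village name, different county
        ("B County", "Y Town", "V1", 2, 3),
    ],
)

total_votes, n_village_names, n_electoral_area = demo.execute(
    """
    SELECT SUM(votes) AS total_votes,
           COUNT(DISTINCT village) AS n_village_names,
           COUNT(DISTINCT county || town || village) AS n_electoral_area
      FROM presidential;
    """
).fetchone()
demo.close()

print(total_votes, n_village_names, n_electoral_area)  # 60 2 3
```

Plain concatenation with `||` could in principle merge two different combinations into the same string; adding a separator such as `county || '/' || town || '/' || village` is one way to rule that out.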

In [5]:
summarize_presidential_votes_electoral_area =\
"""
-- Start of SQL query
SELECT SUM(votes) AS total_votes,
       COUNT(DISTINCT county || town || village) AS n_electoral_area
  FROM presidential;
-- End of SQL query
"""

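## From the COVID-19 daily report table `daily_report`, use aggregate functions to summarize, as of 2021-01-31, the number of countries worldwide with confirmed cases `n_countries_affected`, the worldwide total of confirmed cases `total_confirmed`, the worldwide total of recovered cases `total_recovered`, and the worldwide total of deaths `total_deaths`.

- Expected input: a SQL query.
- Expected output: a query result of shape (1, 4).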

```
   n_countries_affected  total_confirmed  total_recovered  total_deaths
0                   192        102965855         57049238       2227905
```
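Two aggregate behaviors are worth checking before writing this query: `COUNT(DISTINCT ...)` collapses repeated country names, and `SUM()` skips `NULL` values. The sketch below demonstrates both on a made-up table. The figures and the `Province_State` column are hypothetical, and whether the real `daily_report` table has several rows per country or any `NULL` values is an assumption.

```python
import sqlite3

# Hypothetical in-memory stand-in: one country appears twice, one value is NULL.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE daily_report "
    "(Country_Region TEXT, Province_State TEXT, "
    "Confirmed INTEGER, Recovered INTEGER, Deaths INTEGER)"
)
demo.executemany(
    "INSERT INTO daily_report VALUES (?, ?, ?, ?, ?)",
    [
        ("Taiwan", None, 100, 90, 1),
        ("Australia", "New South Wales", 50, 40, 2),
        ("Australia", "Victoria", 30, None, 3),  # Recovered is NULL here
    ],
)

summary = demo.execute(
    """
    SELECT COUNT(*) AS n_rows,
           COUNT(DISTINCT Country_Region) AS n_countries_affected,
           SUM(Confirmed) AS total_confirmed,
           SUM(Recovered) AS total_recovered,
           SUM(Deaths) AS total_deaths
      FROM daily_report;
    """
).fetchone()
demo.close()

print(summary)  # (3, 2, 180, 130, 6)
```

`COUNT(*)` returns 3 while `COUNT(DISTINCT Country_Region)` returns 2, and the `NULL` in `Recovered` is left out of the sum instead of turning the total into `NULL`.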

In [6]:
summarize_daily_report_stats =\
"""
-- Start of SQL query
SELECT COUNT(DISTINCT Country_Region) AS n_countries_affected,
       SUM(Confirmed) AS total_confirmed,
       SUM(Recovered) AS total_recovered,
       SUM(Deaths) AS total_deaths
  FROM daily_report;
-- End of SQL query
"""

## Run the tests!

Kernel -> Restart & Run All.

In [7]:
class TestFunctions(unittest.TestCase):
    def test_calculate_rounded_bmi(self):
        rounded_bmi = pd.read_sql(calculate_rounded_bmi, conn)
        self.assertEqual(rounded_bmi.shape, (495, 3))
        np.testing.assert_equal(rounded_bmi.columns.values,
                               np.array(['heightMeters', 'weightKilograms', 'bmi']))
        np.testing.assert_almost_equal(rounded_bmi['bmi'].values[:5],
                               np.array([24.78, 30.49, 27.00, 27.26, 25.47]))
        np.testing.assert_almost_equal(rounded_bmi['bmi'].values[-5:],
                               np.array([21.84, 25.87, 23.84, 24.46, 24.00]))

    def test_calculate_ast_to_ratio(self):
        ast_to_ratio = pd.read_sql(calculate_ast_to_ratio, conn)
        self.assertEqual(ast_to_ratio.shape, (495, 3))
        np.testing.assert_equal(ast_to_ratio.columns.values,
                               np.array(['assists', 'turnovers', 'ast_to_ratio']))
        self.assertEqual(str(ast_to_ratio['ast_to_ratio'].dtype), 'float64')
        
    def test_select_lower_firstname_upper_lastname(self):
        lower_firstname_upper_lastname = pd.read_sql(select_lower_firstname_upper_lastname, conn)
        self.assertEqual(lower_firstname_upper_lastname.shape, (495, 2))
        np.testing.assert_equal(lower_firstname_upper_lastname['lower_first_name'].values[:5],
                               np.array(['precious', 'jaylen', 'steven', 'bam', 'lamarcus']))
        np.testing.assert_equal(lower_firstname_upper_lastname['upper_last_name'].values[:5],
                               np.array(['ACHIUWA', 'ADAMS', 'ADAMS', 'ADEBAYO', 'ALDRIDGE']))
        
    def test_summarize_daily_report_stats(self):
        daily_report_stats = pd.read_sql(summarize_daily_report_stats, conn)
        self.assertEqual(daily_report_stats.shape, (1, 4))
        np.testing.assert_equal(daily_report_stats.loc[0, :].values,
                               np.array([192, 102965855, 57049238, 2227905]))
        
    def test_summarize_presidential_votes_electoral_area(self):
        presidential_votes_electoral_area = pd.read_sql(summarize_presidential_votes_electoral_area, conn)
        self.assertEqual(presidential_votes_electoral_area.shape, (1, 2))
        np.testing.assert_equal(presidential_votes_electoral_area.loc[0, :].values,
                               np.array([14300940, 7737]))

suite = unittest.TestLoader().loadTestsFromTestCase(TestFunctions)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

test_calculate_ast_to_ratio (__main__.TestFunctions) ... ok
test_calculate_rounded_bmi (__main__.TestFunctions) ... ok
test_select_lower_firstname_upper_lastname (__main__.TestFunctions) ... ok
test_summarize_daily_report_stats (__main__.TestFunctions) ... ok
test_summarize_presidential_votes_electoral_area (__main__.TestFunctions) ... ok

----------------------------------------------------------------------
Ran 5 tests in 0.233s

OK


In [8]:
print("You answered {} of the {} SQL exercises correctly.".format(number_of_successes, number_of_test_runs))

You answered 5 of the 5 SQL exercises correctly.
