# SQL Fundamentals

> Exercises

[數據交點](https://www.datainpoint.com/) | 郭耀仁 <yaojenkuo@datainpoint.com>

In [1]:
import sqlite3
import unittest
import numpy as np
import pandas as pd
conn = sqlite3.connect('imdb.db')

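## Exercise guidelines

- These 10 exercises are the `imdb.db`-related problems selected from [SQL 的五十道練習](https://hahow.in/cr/sqlfifty) (Fifty SQL Exercises).
- Write SQL that produces the expected result between the two single-line comments that mark the start and the end of the SQL query.
- To run the tests, choose Kernel -> Restart & Run All -> Restart and Run All Cells from the menu at the top.
- You can run the tests after each exercise, or after finishing all of them.
- The exercise session disconnects automatically after more than 10 minutes of inactivity; if that happens, click the exercise link again to restart it.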
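
Before writing the queries, it can help to check which tables and columns a SQLite database exposes. The following is a minimal sketch and not part of the original exercises: it builds a hypothetical in-memory database (`demo_conn`, with made-up `actors` and `movies` tables) as a stand-in for `imdb.db`, then lists its tables through `sqlite_master` and the columns of one table through `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical stand-in for imdb.db: an in-memory database with toy tables.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, release_year INTEGER);
""")

# sqlite_master lists every object in the database; keep only the tables.
table_names = [
    row[0]
    for row in demo_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    )
]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
actor_columns = [row[1] for row in demo_conn.execute("PRAGMA table_info(actors);")]

print(table_names)
print(actor_columns)
```

Running the same two queries against the notebook's own connection should list the tables and columns that `imdb.db` actually contains.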

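## 01. From the `actors` table in the `imdb` database, select the rows for Tom Hanks, Christian Bale, and Leonardo DiCaprio; see the expected query result below.

PS: Tom Hanks is a renowned American actor and television producer known for his accomplished acting. He is the second actor in history to win the Academy Award for Best Actor in two consecutive years, and the youngest recipient of the American Film Institute Life Achievement Award. Christian Bale is a British actor and film producer who earned widespread acclaim and commercial recognition for playing Bruce Wayne in the Batman trilogy. Leonardo DiCaprio is a well-known American actor, film producer, and environmental advocate who rose to fame with the epic romance Titanic.
Source: Wikipedia

- Expected input: a SQL query.
- Expected output: a query result of shape (3, 2).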

```
     id               name
0   502     Christian Bale
1  1773  Leonardo DiCaprio
2  2865          Tom Hanks
```

In [2]:
filter_three_male_actors_from_actors =\
"""
-- SQL query starts here
SELECT *
  FROM actors
 WHERE name IN ('Tom Hanks', 'Christian Bale', 'Leonardo DiCaprio');
-- SQL query ends here
"""
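
The `IN` list used above is shorthand for a chain of `OR` comparisons. A minimal sketch, assuming a hypothetical in-memory `actors` table with made-up rows (the names `in_demo`, `with_in`, and `with_or` are illustrative only):

```python
import sqlite3

# Hypothetical toy table; the ids and names are made up for illustration.
in_demo = sqlite3.connect(":memory:")
in_demo.execute("CREATE TABLE actors (id INTEGER, name TEXT)")
in_demo.executemany(
    "INSERT INTO actors VALUES (?, ?)",
    [(1, "Actor A"), (2, "Actor B"), (3, "Actor C")],
)

# Membership test with IN.
with_in = in_demo.execute("""
SELECT id, name
  FROM actors
 WHERE name IN ('Actor A', 'Actor C')
 ORDER BY id;
""").fetchall()

# The same filter spelled out with OR.
with_or = in_demo.execute("""
SELECT id, name
  FROM actors
 WHERE name = 'Actor A' OR
       name = 'Actor C'
 ORDER BY id;
""").fetchall()

print(with_in == with_or)
```

`IN` tends to stay readable as the list of values grows, which is presumably why the answer uses it for three names.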

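## 02. From the `movies` table in the `imdb` database, select the movies directed by Christopher Nolan or Peter Jackson; see the expected query result below.

PS: Christopher Nolan is a British director, screenwriter, and producer whose ten films have grossed more than 4.7 billion US dollars worldwide; his best-known films include The Dark Knight trilogy, Inception, Interstellar, and Dunkirk. Peter Jackson is a New Zealand director, screenwriter, and producer best known for The Lord of the Rings film trilogy and The Hobbit film series.
Source: Wikipedia

- Expected input: a SQL query.
- Expected output: a query result of shape (10, 2).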

```
                                               title           director
0                                    The Dark Knight  Christopher Nolan
1                                          Inception  Christopher Nolan
2                                       Interstellar  Christopher Nolan
3                                       The Prestige  Christopher Nolan
4                                            Memento  Christopher Nolan
5                              The Dark Knight Rises  Christopher Nolan
6                                      Batman Begins  Christopher Nolan
7      The Lord of the Rings: The Return of the King      Peter Jackson
8  The Lord of the Rings: The Fellowship of the Ring      Peter Jackson
9              The Lord of the Rings: The Two Towers      Peter Jackson
```

In [3]:
filter_directed_by_two_directors_from_movies =\
"""
-- SQL query starts here
SELECT title,
       director
  FROM movies
 WHERE director IN ('Christopher Nolan', 'Peter Jackson')
 ORDER BY director;
-- SQL query ends here
"""

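## 03. From the `movies` table in the `imdb` database, select the movies released in 1994; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (5, 4).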

```
                      title  rating           director  runtime
0  The Shawshank Redemption     9.3     Frank Darabont      142
1              Pulp Fiction     8.9  Quentin Tarantino      154
2              Forrest Gump     8.8    Robert Zemeckis      142
3    Léon: The Professional     8.5         Luc Besson      110
4             The Lion King     8.5       Roger Allers       88
```

In [4]:
filter_year_1994_from_movies =\
"""
-- SQL query starts here
SELECT title,
       rating,
       director,
       runtime
  FROM movies
 WHERE release_year = 1994;
-- SQL query ends here
"""

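## 04. From the `movies` table in the `imdb` database, label movies rated above 8.7 (`>8.7`) as `'Awesome'`, movies rated above 8.4 (`>8.4`) but not above 8.7 as `'Terrific'`, and all remaining movies as `'Great'`; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (250, 3).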

```
                                              title  rating rating_category
0                          The Shawshank Redemption     9.3         Awesome
1                                     The Godfather     9.2         Awesome
2                            The Godfather: Part II     9.0         Awesome
3                                   The Dark Knight     9.0         Awesome
4                                      12 Angry Men     9.0         Awesome
..                                              ...     ...             ...
245  Neon Genesis Evangelion: The End of Evangelion     8.1           Great
246                              7 Kogustaki Mucize     8.2           Great
247                                      Tangerines     8.2           Great
248                                        Drishyam     8.2           Great
249                                          Swades     8.2           Great

[250 rows x 3 columns]
```

In [5]:
case_rating_category_from_movies =\
"""
-- SQL query starts here
SELECT title,
       rating,
       CASE WHEN rating > 8.7 THEN 'Awesome'
            WHEN rating > 8.4 THEN 'Terrific'
            ELSE 'Great' END AS rating_category
  FROM movies;
-- SQL query ends here
"""
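
A `CASE` expression returns the result of the first `WHEN` branch whose condition holds, so the order of the branches matters. A minimal sketch, assuming a hypothetical in-memory `movies` table with three made-up rows (`case_demo`, `strict_first`, and `loose_first` are illustrative names):

```python
import sqlite3

# Hypothetical toy data; the real ratings live in imdb.db.
case_demo = sqlite3.connect(":memory:")
case_demo.execute("CREATE TABLE movies (title TEXT, rating REAL)")
case_demo.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Movie A", 9.3), ("Movie B", 8.5), ("Movie C", 8.1)],
)

# Stricter condition first: each row gets the first branch it satisfies.
strict_first = case_demo.execute("""
SELECT title,
       CASE WHEN rating > 8.7 THEN 'Awesome'
            WHEN rating > 8.4 THEN 'Terrific'
            ELSE 'Great' END AS rating_category
  FROM movies
 ORDER BY rating DESC;
""").fetchall()

# Looser condition first: 'Awesome' becomes unreachable, because every
# rating above 8.7 is also above 8.4 and matches the first branch.
loose_first = case_demo.execute("""
SELECT title,
       CASE WHEN rating > 8.4 THEN 'Terrific'
            WHEN rating > 8.7 THEN 'Awesome'
            ELSE 'Great' END AS rating_category
  FROM movies
 ORDER BY rating DESC;
""").fetchall()

print(strict_first)
print(loose_first)
```

This is why the answer tests `rating > 8.7` before `rating > 8.4`.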

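## 05. From the `movies` table in the `imdb` database, count, for each release year, how many classic movies highly rated on IMDb.com there are; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (85, 2).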

```
    release_year  number_of_movies
0           1921                 1
1           1924                 1
2           1925                 1
3           1926                 1
4           1927                 1
..           ...               ...
80          2017                 3
81          2018                 6
82          2019                 8
83          2020                 2
84          2021                 1

[85 rows x 2 columns]
```

In [6]:
count_number_of_movies_by_year_from_movies =\
"""
-- SQL query starts here
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY release_year;
-- SQL query ends here
"""

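## 06. From the `movies` table in the `imdb` database, count, for each release year, how many classic movies highly rated on IMDb.com there are, showing only the years with 5 or more movies (`>= 5`); see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (19, 2).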

```
    release_year  number_of_movies
0           1957                 6
1           1988                 5
2           1994                 5
3           1995                 8
4           1997                 6
5           1998                 5
6           1999                 5
7           2000                 6
8           2001                 5
9           2003                 5
10          2004                 7
11          2006                 5
12          2009                 6
13          2010                 5
14          2013                 6
15          2014                 5
16          2015                 5
17          2018                 6
18          2019                 8
```

In [7]:
count_number_of_movies_by_year_having_from_movies =\
"""
-- SQL query starts here
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY release_year
HAVING number_of_movies >= 5
 ORDER BY release_year;
-- SQL query ends here
"""
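
`HAVING` filters groups after aggregation, whereas `WHERE` filters individual rows before they are grouped. A minimal sketch of the difference, assuming a hypothetical in-memory `movies` table with made-up rows (`having_demo`, `filtered_groups`, and `filtered_rows` are illustrative names):

```python
import sqlite3

# Hypothetical toy data: three movies in 1994, one in 1995, two in 1999.
having_demo = sqlite3.connect(":memory:")
having_demo.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
having_demo.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("A", 1994), ("B", 1994), ("C", 1994), ("D", 1995), ("E", 1999), ("F", 1999)],
)

# HAVING sees the aggregated count, so it can drop whole groups.
filtered_groups = having_demo.execute("""
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 GROUP BY release_year
HAVING COUNT(*) >= 2
 ORDER BY release_year;
""").fetchall()

# WHERE runs before grouping, so it can only test per-row values.
filtered_rows = having_demo.execute("""
SELECT release_year,
       COUNT(*) AS number_of_movies
  FROM movies
 WHERE release_year >= 1995
 GROUP BY release_year
 ORDER BY release_year;
""").fetchall()

print(filtered_groups)
print(filtered_rows)
```

The sketch repeats `COUNT(*)` in the `HAVING` clause. SQLite also accepts the column alias there, as the answer does, but some other database engines do not.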

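## 07. Query the `imdb` database for the movies among IMDb.com's 250 top-rated films in which Tom Hanks and Leonardo DiCaprio appeared. Derive a computed column `is_lead_actor` from `ord` in the `casting` table to flag whether the actor is the lead (`ord` equal to 1 means the lead), and sort the result by `release_year`; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (12, 4).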

```
    release_year                    title               name  is_lead_actor
0           1994             Forrest Gump          Tom Hanks              1
1           1995                Toy Story          Tom Hanks              1
2           1998      Saving Private Ryan          Tom Hanks              1
3           1999           The Green Mile          Tom Hanks              1
4           2002      Catch Me If You Can  Leonardo DiCaprio              1
5           2002      Catch Me If You Can          Tom Hanks              0
6           2006             The Departed  Leonardo DiCaprio              1
7           2010                Inception  Leonardo DiCaprio              1
8           2010              Toy Story 3          Tom Hanks              1
9           2010           Shutter Island  Leonardo DiCaprio              1
10          2012         Django Unchained  Leonardo DiCaprio              0
11          2013  The Wolf of Wall Street  Leonardo DiCaprio              1
```

In [8]:
list_movies_in_which_tom_and_leonardo_appeared =\
"""
-- SQL query starts here
SELECT movies.release_year,
       movies.title,
       actors.name,
       CASE WHEN casting.ord = 1 THEN 1
            ELSE 0 END AS is_lead_actor
  FROM actors
  JOIN casting
    ON actors.id = casting.actor_id
  JOIN movies
    ON casting.movie_id = movies.id
 WHERE actors.name IN ('Tom Hanks', 'Leonardo DiCaprio')
 ORDER BY movies.release_year;
-- SQL query ends here
"""
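
The three-table join and the derived flag can be tried on a small scale. A minimal sketch, assuming hypothetical in-memory `actors`, `movies`, and `casting` tables that mirror only the columns used above, filled with made-up rows. It also shows that in SQLite a comparison such as `casting.ord = 1` already evaluates to 1 or 0, so it gives the same flag as the `CASE` expression whenever `ord` is not NULL.

```python
import sqlite3

# Hypothetical toy schema mirroring only the columns the query uses.
join_demo = sqlite3.connect(":memory:")
join_demo.executescript("""
CREATE TABLE actors (id INTEGER, name TEXT);
CREATE TABLE movies (id INTEGER, title TEXT, release_year INTEGER);
CREATE TABLE casting (movie_id INTEGER, actor_id INTEGER, ord INTEGER);
INSERT INTO actors VALUES (1, 'Actor A'), (2, 'Actor B');
INSERT INTO movies VALUES (10, 'Film X', 2001), (11, 'Film Y', 2005);
INSERT INTO casting VALUES (10, 1, 1), (10, 2, 2), (11, 2, 1);
""")

# casting is the bridge table: it links each actor to each movie.
with_case = join_demo.execute("""
SELECT movies.release_year,
       movies.title,
       actors.name,
       CASE WHEN casting.ord = 1 THEN 1
            ELSE 0 END AS is_lead_actor
  FROM actors
  JOIN casting
    ON actors.id = casting.actor_id
  JOIN movies
    ON casting.movie_id = movies.id
 ORDER BY movies.release_year, actors.name;
""").fetchall()

# SQLite-specific shortcut: the comparison itself yields 1 or 0.
# (For a NULL ord it would yield NULL, whereas the CASE form yields 0.)
with_comparison = join_demo.execute("""
SELECT movies.release_year,
       movies.title,
       actors.name,
       casting.ord = 1 AS is_lead_actor
  FROM actors
  JOIN casting
    ON actors.id = casting.actor_id
  JOIN movies
    ON casting.movie_id = movies.id
 ORDER BY movies.release_year, actors.name;
""").fetchall()

print(with_case)
```

The explicit `CASE` form is the more portable of the two, since not every database engine returns an integer from a comparison.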

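## 08. Query the `imdb` database for the movie information and cast lists of The Lord of the Rings trilogy and the Batman trilogy. It is normal for an actor to appear in more than one film of a trilogy; in that case, show each distinct value only once; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (66, 2).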

```
                          trilogy                  name
0                  Batman Trilogy         Aaron Eckhart
1                  Batman Trilogy          Aidan Gillen
2                  Batman Trilogy        Alon Aboutboul
3                  Batman Trilogy         Anne Hathaway
4                  Batman Trilogy  Anthony Michael Hall
..                            ...                   ...
61  The Lord of the Rings Trilogy         Sadwyn Brophy
62  The Lord of the Rings Trilogy            Sala Baker
63  The Lord of the Rings Trilogy            Sam Comery
64  The Lord of the Rings Trilogy            Sean Astin
65  The Lord of the Rings Trilogy             Sean Bean

[66 rows x 2 columns]
```

In [9]:
find_two_trilogy_casting_list =\
"""
-- SQL query starts here
SELECT CASE WHEN movies.title LIKE '%Lord of the Rings%' THEN 'The Lord of the Rings Trilogy'
            ELSE 'Batman Trilogy' END AS trilogy,
       actors.name
  FROM actors
  JOIN casting
    ON actors.id = casting.actor_id
  JOIN movies
    ON casting.movie_id = movies.id
 WHERE movies.title LIKE '%Lord of the Rings%' OR
       movies.title LIKE '%Batman%' OR
       movies.title LIKE '%The Dark Knight%'
 GROUP BY trilogy, actors.name;
-- SQL query ends here
"""
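
The answer above removes repeated actors with `GROUP BY`; `SELECT DISTINCT` is another common way to get the same deduplicated rows. A minimal sketch, assuming a hypothetical in-memory table `cast_rows` with made-up values in which one actor appears in all three films of a trilogy:

```python
import sqlite3

# Hypothetical toy data: 'Actor 1' appears three times in 'Trilogy A'.
distinct_demo = sqlite3.connect(":memory:")
distinct_demo.execute("CREATE TABLE cast_rows (trilogy TEXT, name TEXT)")
distinct_demo.executemany(
    "INSERT INTO cast_rows VALUES (?, ?)",
    [
        ("Trilogy A", "Actor 1"),
        ("Trilogy A", "Actor 1"),
        ("Trilogy A", "Actor 1"),
        ("Trilogy A", "Actor 2"),
        ("Trilogy B", "Actor 1"),
    ],
)

# Deduplicate by grouping on every selected column.
via_group_by = distinct_demo.execute("""
SELECT trilogy,
       name
  FROM cast_rows
 GROUP BY trilogy, name
 ORDER BY trilogy, name;
""").fetchall()

# Deduplicate with DISTINCT, which applies to the whole selected row.
via_distinct = distinct_demo.execute("""
SELECT DISTINCT trilogy,
       name
  FROM cast_rows
 ORDER BY trilogy, name;
""").fetchall()

print(via_distinct)
```

Both queries keep 'Actor 1' once per trilogy, because uniqueness is judged on the (`trilogy`, `name`) pair rather than on `name` alone.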

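## 09. Query the `imdb` database to find which director appears most often; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (3, 2).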

```
            director  counts
0  Christopher Nolan       7
1    Martin Scorsese       7
2    Stanley Kubrick       7
```

In [10]:
find_most_frequent_directors_from_imdb =\
"""
-- SQL query starts here
SELECT director,
       counts
  FROM (SELECT director,
               COUNT(*) AS counts
          FROM movies
         GROUP BY director) AS director_counts
 WHERE director_counts.counts = (SELECT MAX(counts) AS max_counts
                                   FROM (SELECT director,
                                                COUNT(*) AS counts
                                           FROM movies
                                          GROUP BY director) AS director_counts);
-- SQL query ends here
"""
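
The answer above writes the per-director count subquery twice. One alternative, offered here as a sketch rather than as the intended solution, is a common table expression (`WITH`), which names the subquery once and reuses it. The example assumes a hypothetical in-memory `movies` table with made-up rows and a SQLite version that supports `WITH` (3.8.3 or later):

```python
import sqlite3

# Hypothetical toy data: two directors tie with two movies each.
cte_demo = sqlite3.connect(":memory:")
cte_demo.execute("CREATE TABLE movies (title TEXT, director TEXT)")
cte_demo.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("A", "Director 1"),
        ("B", "Director 1"),
        ("C", "Director 2"),
        ("D", "Director 2"),
        ("E", "Director 3"),
    ],
)

# director_counts is defined once and referenced twice:
# once as the row source, once to find the maximum count.
most_frequent = cte_demo.execute("""
WITH director_counts AS (
    SELECT director,
           COUNT(*) AS counts
      FROM movies
     GROUP BY director
)
SELECT director,
       counts
  FROM director_counts
 WHERE counts = (SELECT MAX(counts)
                   FROM director_counts)
 ORDER BY director;
""").fetchall()

print(most_frequent)
```

Comparing against `MAX(counts)` instead of using `ORDER BY counts DESC LIMIT 1` keeps every director tied for first place, which matters for the three-way tie in the expected result.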

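## 10. Query the `imdb` database to find which actor appears most often; see the expected query result below.

- Expected input: a SQL query.
- Expected output: a query result of shape (1, 3).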

```
   actor_id            name  counts
0      2472  Robert De Niro       9
```

In [11]:
find_most_frequent_actors_from_imdb =\
"""
-- SQL query starts here
SELECT actor_id,
       actors.name,
       counts
  FROM (SELECT actor_id,
               COUNT(*) AS counts
          FROM casting
         GROUP BY actor_id) AS actor_counts
  JOIN actors
    ON actor_counts.actor_id = actors.id
 WHERE actor_counts.counts = (SELECT MAX(counts) AS max_counts
                                FROM (SELECT actor_id,
                                             COUNT(*) AS counts
                                        FROM casting
                                       GROUP BY actor_id) AS actor_counts);
-- SQL query ends here
"""

## Run the tests!

Kernel -> Restart & Run All -> Restart and Run All Cells.

In [12]:
class TestWeekElevenMileStone(unittest.TestCase):
    def test_01_filter_year_1994_from_movies(self):
        year_1994_from_movies = pd.read_sql(filter_year_1994_from_movies, conn)
        self.assertEqual(year_1994_from_movies.shape, (5, 4))
        column_values = set(year_1994_from_movies['title'].values)
        self.assertTrue('The Shawshank Redemption' in column_values)
        self.assertTrue('Forrest Gump' in column_values)
    def test_02_filter_three_male_actors_from_actors(self):
        three_male_actors_from_actors = pd.read_sql(filter_three_male_actors_from_actors, conn)
        self.assertEqual(three_male_actors_from_actors.shape, (3, 2))
        column_values = set(three_male_actors_from_actors['name'].values)
        self.assertTrue('Christian Bale' in column_values)
        self.assertTrue('Leonardo DiCaprio' in column_values)
        self.assertTrue('Tom Hanks' in column_values)
    def test_03_filter_directed_by_two_directors_from_movies(self):
        directed_by_two_directors_from_movies = pd.read_sql(filter_directed_by_two_directors_from_movies, conn)
        self.assertEqual(directed_by_two_directors_from_movies.shape, (10, 2))
        column_values = set(directed_by_two_directors_from_movies['director'].values)
        self.assertTrue('Christopher Nolan' in column_values)
        self.assertTrue('Peter Jackson' in column_values)
    def test_04_case_rating_category_from_movies(self):
        rating_category_from_movies = pd.read_sql(case_rating_category_from_movies, conn)
        self.assertEqual(rating_category_from_movies.shape, (250, 3))
        column_values = set(rating_category_from_movies.iloc[:, 2].values)
        self.assertTrue('Awesome' in column_values)
        self.assertTrue('Terrific' in column_values)
        self.assertTrue('Great' in column_values)
    def test_05_count_number_of_movies_by_year_from_movies(self):
        number_of_movies_by_year_from_movies = pd.read_sql(count_number_of_movies_by_year_from_movies, conn)
        self.assertEqual(number_of_movies_by_year_from_movies.shape, (85, 2))
        column_values = number_of_movies_by_year_from_movies.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 250)
    def test_06_count_number_of_movies_by_year_having_from_movies(self):
        number_of_movies_by_year_having_from_movies = pd.read_sql(count_number_of_movies_by_year_having_from_movies, conn)
        self.assertEqual(number_of_movies_by_year_having_from_movies.shape, (19, 2))
        column_values = number_of_movies_by_year_having_from_movies.iloc[:, 1].values
        self.assertEqual(column_values.sum(), 109)
    def test_07_list_movies_in_which_tom_and_leonardo_appeared(self):
        movies_in_which_tom_and_leonardo_appeared = pd.read_sql(list_movies_in_which_tom_and_leonardo_appeared, conn)
        self.assertEqual(movies_in_which_tom_and_leonardo_appeared.shape, (12, 4))
        actors = set(movies_in_which_tom_and_leonardo_appeared.iloc[:, 2].values)
        self.assertTrue('Tom Hanks' in actors)
        self.assertTrue('Leonardo DiCaprio' in actors)
        titles = set(movies_in_which_tom_and_leonardo_appeared.iloc[:, 1].values)
        self.assertTrue('Forrest Gump' in titles)
        self.assertTrue('Saving Private Ryan' in titles)
        self.assertTrue('Catch Me If You Can' in titles)
        self.assertTrue('Inception' in titles)
        self.assertTrue('The Wolf of Wall Street' in titles)
    def test_08_find_two_trilogy_casting_list(self):
        two_trilogy_casting_list = pd.read_sql(find_two_trilogy_casting_list, conn)
        self.assertEqual(two_trilogy_casting_list.shape, (66, 2))
        column_values = set(two_trilogy_casting_list.iloc[:, 1].values)
        self.assertTrue('Christian Bale' in column_values)
        self.assertTrue('Heath Ledger' in column_values)
        self.assertTrue('Anne Hathaway' in column_values)
        self.assertTrue('Sean Astin' in column_values)
        self.assertTrue('Cate Blanchett' in column_values)
        self.assertTrue('Orlando Bloom' in column_values)
    def test_09_find_most_frequent_directors_from_imdb(self):
        most_frequent_directors_from_imdb = pd.read_sql(find_most_frequent_directors_from_imdb, conn)
        self.assertEqual(most_frequent_directors_from_imdb.shape, (3, 2))
        column_values = set(most_frequent_directors_from_imdb.iloc[:, 0].values)
        self.assertTrue('Christopher Nolan' in column_values)
        self.assertTrue('Martin Scorsese' in column_values)
        self.assertTrue('Stanley Kubrick' in column_values)
        column_values = set(most_frequent_directors_from_imdb.iloc[:, 1].values)
        self.assertTrue(7 in column_values)
    def test_10_find_most_frequent_actors_from_imdb(self):
        most_frequent_actors_from_imdb = pd.read_sql(find_most_frequent_actors_from_imdb, conn)
        self.assertEqual(most_frequent_actors_from_imdb.shape, (1, 3))
        column_values = set(most_frequent_actors_from_imdb.iloc[:, 1].values)
        self.assertTrue('Robert De Niro' in column_values)
        column_values = set(most_frequent_actors_from_imdb.iloc[:, 2].values)
        self.assertTrue(9 in column_values)

suite = unittest.TestLoader().loadTestsFromTestCase(TestWeekElevenMileStone)
runner = unittest.TextTestRunner(verbosity=2)
test_results = runner.run(suite)
number_of_failures = len(test_results.failures)
number_of_errors = len(test_results.errors)
number_of_test_runs = test_results.testsRun
number_of_successes = number_of_test_runs - (number_of_failures + number_of_errors)

test_01_filter_year_1994_from_movies (__main__.TestWeekElevenMileStone) ... ok
test_02_filter_three_male_actors_from_actors (__main__.TestWeekElevenMileStone) ... ok
test_03_filter_directed_by_two_directors_from_movies (__main__.TestWeekElevenMileStone) ... ok
test_04_case_rating_category_from_movies (__main__.TestWeekElevenMileStone) ... ok
test_05_count_number_of_movies_by_year_from_movies (__main__.TestWeekElevenMileStone) ... ok
test_06_count_number_of_movies_by_year_having_from_movies (__main__.TestWeekElevenMileStone) ... ok
test_07_list_movies_in_which_tom_and_leonardo_appeared (__main__.TestWeekElevenMileStone) ... ok
test_08_find_two_trilogy_casting_list (__main__.TestWeekElevenMileStone) ... ok
test_09_find_most_frequent_directors_from_imdb (__main__.TestWeekElevenMileStone) ... ok
test_10_find_most_frequent_actors_from_imdb (__main__.TestWeekElevenMileStone) ... ok

----------------------------------------------------------------------
Ran 10 tests in 0.060s

OK


In [13]:
print("Out of {} SQL exercises, you answered {} correctly.".format(number_of_test_runs, number_of_successes))

Out of 10 SQL exercises, you answered 10 correctly.
