<a href="https://colab.research.google.com/github/ccwu0918/book-sqlfifty/blob/main/ch06-order-by/ch06-order-by.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

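# Fifty SQL Exercises: A Beginner-Friendly Introduction to Databases

> Sorting query results

Readers who are new to data science can skip the code below. Readers who are not, and who want to run this chapter in JupyterLab, must first run the code below to load the required modules and connect to the databases.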

In [None]:
!pip install SQLAlchemy==1.4.46

In [None]:
!git clone https://github.com/ccwu0918/book-sqlfifty

In [None]:
# %LOAD sqlite3 db=../databases/imdb.db timeout=2 shared_cache=true

In [None]:
import sqlite3
import unittest
import json
import os
import numpy as np
import pandas as pd
conn = sqlite3.connect('./databases/imdb.db')
conn.execute("""ATTACH './databases/covid19.db' AS covid19""")
conn.execute("""ATTACH './databases/twElection2020.db' AS twElection2020""")
conn.execute("""ATTACH './databases/nba.db' AS nba""")
conn.execute("""ATTACH './databases/northwind.db' AS Northwind""")
conn.execute("""ATTACH './databases/Chinook_Sqlite.sqlite' AS Chinook""")

In [None]:
# %%capture
# load the SQL magic extension
# https://github.com/catherinedevlin/ipython-sql
# this extension allows us to connect to DBs and issue SQL command
%load_ext sql

# now we can use the magic extension to connect to our SQLite DB
# use %sql to write an inline SQL command
# use %%sql to write SQL commands in a cell
%sql sqlite:///databases/imdb.db

In [None]:
%%sql
ATTACH "./databases/covid19.db" AS covid19;
ATTACH "./databases/twElection2020.db" AS twElection2020;
ATTACH "./databases/nba.db" AS nba;
ATTACH "./databases/northwind.db" AS Northwind;
ATTACH "./databases/Chinook_Sqlite.sqlite" AS Chinook;

In [None]:
%%sql
SELECT sqlite_version();

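## Sorting query results with `ORDER BY`

So far, the results of the SQL statements we have written have come back in the order in which the rows are stored in the table, for example by movie serial number (the `id` column of the `movies` table in the `imdb` database) or by actor serial number (the `id` column of the `actors` table in the `imdb` database). Adding the `ORDER BY` keyword to a SQL statement lets us name the columns to sort by.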

```sql
SELECT columns
  FROM table
 ORDER BY columns;
```

Without `ORDER BY`, the `movies` table in the `imdb` database comes back sorted by movie serial number, from smallest to largest.

In [None]:
%%sql
SELECT *
  FROM movies
 LIMIT 10;

id,title,release_year,rating,director,runtime
1,The Shawshank Redemption,1994,9.3,Frank Darabont,142
2,The Godfather,1972,9.2,Francis Ford Coppola,175
3,The Dark Knight,2008,9.0,Christopher Nolan,152
4,The Godfather Part II,1974,9.0,Francis Ford Coppola,202
5,12 Angry Men,1957,9.0,Sidney Lumet,96
6,Schindler's List,1993,9.0,Steven Spielberg,195
7,The Lord of the Rings: The Return of the King,2003,9.0,Peter Jackson,201
8,Pulp Fiction,1994,8.9,Quentin Tarantino,154
9,The Lord of the Rings: The Fellowship of the Ring,2001,8.8,Peter Jackson,178
10,"The Good, the Bad and the Ugly",1966,8.8,Sergio Leone,178


Adding `ORDER BY release_year` sorts the result by each movie's release year.

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY release_year
 LIMIT 10;

id,title,release_year,rating,director,runtime
130,The Kid,1921,8.3,Charles Chaplin,68
193,Sherlock Jr.,1924,8.2,Buster Keaton,45
175,The Gold Rush,1925,8.2,Charles Chaplin,95
182,The General,1926,8.2,Clyde Bruckman,67
115,Metropolis,1927,8.3,Fritz Lang,153
207,The Passion of Joan of Arc,1928,8.2,Carl Theodor Dreyer,114
52,City Lights,1931,8.5,Charles Chaplin,87
97,M,1931,8.3,Fritz Lang,99
244,It Happened One Night,1934,8.1,Frank Capra,105
46,Modern Times,1936,8.5,Charles Chaplin,87
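
The same behavior can be reproduced outside the notebook with Python's built-in `sqlite3` module. The sketch below is only an illustration: it assumes a small in-memory stand-in for the `movies` table, populated with four rows copied from the outputs above, rather than the real `imdb.db` file.

```python
import sqlite3

# In-memory stand-in for the movies table (an assumption for illustration);
# the four rows are copied from the query outputs shown above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER, title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "The Shawshank Redemption", 1994),
        (2, "The Godfather", 1972),
        (3, "The Dark Knight", 2008),
        (130, "The Kid", 1921),
    ],
)

# ORDER BY release_year returns the rows from the earliest year to the latest.
by_year = conn.execute(
    "SELECT title, release_year FROM movies ORDER BY release_year"
).fetchall()

for title, year in by_year:
    print(year, title)
```

The rows were inserted in `id` order, yet the result comes back ordered by `release_year`, with `The Kid` (1921) first.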


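## Two sort orders

When sorting query results with `ORDER BY`, there are two sort orders to choose from:

1. Ascending order.
2. Descending order.

The default is ascending order: numeric values (integers and floating-point numbers) go from smallest to largest, and text goes from A to Z.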

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY release_year -- ascending
 LIMIT 10;

id,title,release_year,rating,director,runtime
130,The Kid,1921,8.3,Charles Chaplin,68
193,Sherlock Jr.,1924,8.2,Buster Keaton,45
175,The Gold Rush,1925,8.2,Charles Chaplin,95
182,The General,1926,8.2,Clyde Bruckman,67
115,Metropolis,1927,8.3,Fritz Lang,153
207,The Passion of Joan of Arc,1928,8.2,Carl Theodor Dreyer,114
52,City Lights,1931,8.5,Charles Chaplin,87
97,M,1931,8.3,Fritz Lang,99
244,It Happened One Night,1934,8.1,Frank Capra,105
46,Modern Times,1936,8.5,Charles Chaplin,87


In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY director -- ascending
 LIMIT 10;

id,title,release_year,rating,director,runtime
125,Like Stars on Earth,2007,8.3,Aamir Khan,165
204,Mary and Max,2009,8.1,Adam Elliot,92
20,Seven Samurai,1954,8.6,Akira Kurosawa,207
92,High and Low,1963,8.4,Akira Kurosawa,143
107,Ikiru,1952,8.3,Akira Kurosawa,143
139,Ran,1985,8.2,Akira Kurosawa,162
146,Yojimbo,1961,8.2,Akira Kurosawa,110
148,Rashomon,1950,8.2,Akira Kurosawa,88
240,Dersu Uzala,1975,8.2,Akira Kurosawa,142
235,Amores perros,2000,8.1,Alejandro G. Iñárritu,154
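
One detail worth knowing about "A to Z": SQLite's default `BINARY` collation compares text byte by byte, so every uppercase ASCII letter sorts before every lowercase one. The director names above all start with a capital letter, so the effect is not visible there. The following sketch uses a hypothetical `names` table with made-up values to show the difference, and how `COLLATE NOCASE` gives a case-insensitive order for ASCII letters.

```python
import sqlite3

# Hypothetical table and values, chosen only to mix upper and lower case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?)",
    [("bob",), ("Zoe",), ("alice",), ("Carl",)],
)

# Default BINARY collation: uppercase letters come before lowercase letters.
binary_order = [row[0] for row in conn.execute(
    "SELECT name FROM names ORDER BY name"
)]

# NOCASE collation: ASCII letters are compared without regard to case.
nocase_order = [row[0] for row in conn.execute(
    "SELECT name FROM names ORDER BY name COLLATE NOCASE"
)]

print(binary_order)  # ['Carl', 'Zoe', 'alice', 'bob']
print(nocase_order)  # ['alice', 'bob', 'Carl', 'Zoe']
```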


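To sort the query result in descending order, so that numeric values (integers and floating-point numbers) go from largest to smallest and text goes from Z to A, add the keyword `DESC` after the column name.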

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY release_year DESC -- descending
 LIMIT 10;

id,title,release_year,rating,director,runtime
43,Top Gun: Maverick,2022,8.6,Joseph Kosinski,130
124,Everything Everywhere All at Once,2022,8.3,Dan Kwan,139
121,Spider-Man: No Way Home,2021,8.3,Jon Watts,148
114,Hamilton,2020,8.4,Thomas Kail,160
132,The Father,2020,8.2,Florian Zeller,97
35,Parasite,2019,8.5,Bong Joon Ho,132
71,Joker,2019,8.4,Todd Phillips,122
81,Avengers: Endgame,2019,8.4,Anthony Russo,181
123,1917,2019,8.2,Sam Mendes,119
198,Klaus,2019,8.1,Sergio Pablos,96


In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY director DESC -- descending
 LIMIT 10;

id,title,release_year,rating,director,runtime
211,Tokyo Story,1953,8.2,Yasujirô Ozu,136
77,The Boat,1981,8.4,Wolfgang Petersen,105
181,Ben-Hur,1959,8.1,William Wyler,212
227,The Best Years of Our Lives,1946,8.1,William Wyler,170
225,The Exorcist,1973,8.1,William Friedkin,122
190,The Grand Budapest Hotel,2014,8.1,Wes Anderson,99
122,Bicycle Thieves,1948,8.3,Vittorio De Sica,89
158,Gone with the Wind,1939,8.2,Victor Fleming,238
221,The Wizard of Oz,1939,8.1,Victor Fleming,102
38,American History X,1998,8.5,Tony Kaye,119
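
As a side note, SQLite also accepts an explicit `ASC` keyword, which behaves the same as the default, and it treats `NULL` as smaller than any other value, so missing values come first in ascending order and last in descending order. The sketch below assumes an in-memory table with three rows taken from the output above plus one made-up row whose year is `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Parasite", 2019),
        ("Hamilton", 2020),
        ("Top Gun: Maverick", 2022),
        ("Hypothetical Unreleased Film", None),  # made-up row with a NULL year
    ],
)


def years(order_clause):
    """Return release_year values sorted by the given ORDER BY clause."""
    sql = f"SELECT release_year FROM movies ORDER BY {order_clause}"
    return [row[0] for row in conn.execute(sql)]


default_order = years("release_year")      # no keyword: ascending
explicit_asc = years("release_year ASC")   # same as the default
descending = years("release_year DESC")    # reversed, NULL moves to the end

print(default_order)  # [None, 2019, 2020, 2022]
print(descending)     # [2022, 2020, 2019, None]
```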


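## Sorting by multiple columns

We can list more than one column, and more than one sort order, after `ORDER BY`. For example, when sorting by `release_year`, `rating`, or `director`, we find that some movies share the same release year, IMDb rating, or director. In that case we can name additional columns, so that rows tied on an earlier sort key are then ordered by the later ones. For example, sort by `release_year` descending, then by `rating` ascending, then by `title` ascending.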

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY release_year DESC, -- descending
          rating,            -- ascending
          title              -- ascending
 LIMIT 10;

id,title,release_year,rating,director,runtime
124,Everything Everywhere All at Once,2022,8.3,Dan Kwan,139
43,Top Gun: Maverick,2022,8.6,Joseph Kosinski,130
121,Spider-Man: No Way Home,2021,8.3,Jon Watts,148
132,The Father,2020,8.2,Florian Zeller,97
114,Hamilton,2020,8.4,Thomas Kail,160
214,Ford v Ferrari,2019,8.1,James Mangold,152
198,Klaus,2019,8.1,Sergio Pablos,96
123,1917,2019,8.2,Sam Mendes,119
81,Avengers: Endgame,2019,8.4,Anthony Russo,181
71,Joker,2019,8.4,Todd Phillips,122
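
To make the tie-breaking rule concrete, the sketch below runs the same three-key `ORDER BY` on an in-memory table holding six rows copied from the output above, and compares it with a plain Python sort on the tuple `(-release_year, rating, title)`. The Python comparison is only an analogy for how the keys are applied from left to right.

```python
import sqlite3

# Six rows copied from the output above, inserted in a deliberately mixed order.
rows = [
    ("Klaus", 2019, 8.1),
    ("Hamilton", 2020, 8.4),
    ("Top Gun: Maverick", 2022, 8.6),
    ("Ford v Ferrari", 2019, 8.1),
    ("The Father", 2020, 8.2),
    ("Everything Everywhere All at Once", 2022, 8.3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, release_year INTEGER, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", rows)

sql_titles = [row[0] for row in conn.execute(
    """
    SELECT title
      FROM movies
     ORDER BY release_year DESC,
              rating,
              title
    """
)]

# Same idea in Python: negate the year for descending, then rating, then title.
py_titles = [r[0] for r in sorted(rows, key=lambda r: (-r[1], r[2], r[0]))]

print(sql_titles)
```

`Ford v Ferrari` and `Klaus` tie on both year (2019) and rating (8.1), so only the third key, `title`, decides their order.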


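## Sorting by derived columns

Besides columns that exist in the table, `ORDER BY` can also take derived (computed) columns as sort keys. For example, we use the division operator `/` and the remainder operator `%` to express each movie's runtime as `x` hours and `y` minutes, then sort by `hours` descending and `minutes` ascending in the `ORDER BY` clause.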

In [None]:
%%sql
SELECT runtime,
       runtime / 60 AS hours,
       runtime % 60 AS minutes
  FROM movies
 ORDER BY hours DESC,
          minutes
 LIMIT 10;

runtime,hours,minutes
180,3,0
181,3,1
181,3,1
183,3,3
185,3,5
189,3,9
191,3,11
195,3,15
201,3,21
202,3,22
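
The arithmetic works because SQLite performs integer division when both operands are integers, so `201 / 60` is `3` and `201 % 60` is `21`. The sketch below checks this on an in-memory table with five runtimes taken from earlier outputs, and compares the result with Python's `divmod`. It is an illustration under those assumptions, not a replacement for the notebook cell.

```python
import sqlite3

runtimes = [142, 201, 180, 96, 202]  # runtimes in minutes, taken from earlier outputs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (runtime INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?)", [(r,) for r in runtimes])

# The aliases hours and minutes can be used directly in ORDER BY.
derived = conn.execute(
    """
    SELECT runtime,
           runtime / 60 AS hours,
           runtime % 60 AS minutes
      FROM movies
     ORDER BY hours DESC,
              minutes
    """
).fetchall()

for runtime, hours, minutes in derived:
    # divmod gives the same quotient and remainder for these positive integers.
    assert (hours, minutes) == divmod(runtime, 60)
    print(runtime, hours, minutes)
```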


## `ORDER BY` with `LIMIT`

Combining `ORDER BY` with `LIMIT` makes "bottom `m`" and "top `m`" analyses easy. A "bottom `m`" analysis uses the default ascending order together with `LIMIT m`.

```sql
-- bottom m observations
SELECT columns
  FROM table
 ORDER BY columns
 LIMIT m;
```

For example, find the ten movies with the shortest runtime.

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY runtime
 LIMIT 10;

id,title,release_year,rating,director,runtime
193,Sherlock Jr.,1924,8.2,Buster Keaton,45
182,The General,1926,8.2,Clyde Bruckman,67
130,The Kid,1921,8.3,Charles Chaplin,68
226,Before Sunset,2004,8.1,Richard Linklater,80
75,Toy Story,1995,8.3,John Lasseter,81
239,Persona,1966,8.1,Ingmar Bergman,83
249,Beauty and the Beast,1991,8.0,Gary Trousdale,84
172,My Neighbor Totoro,1988,8.1,Hayao Miyazaki,86
46,Modern Times,1936,8.5,Charles Chaplin,87
52,City Lights,1931,8.5,Charles Chaplin,87


A "top `m`" analysis uses descending order together with `LIMIT m`.

```sql
-- top m observations
SELECT columns
  FROM table
 ORDER BY columns DESC
 LIMIT m;
```

For example, find the ten movies with the longest runtime.

In [None]:
%%sql
SELECT *
  FROM movies
 ORDER BY runtime DESC
 LIMIT 10;

id,title,release_year,rating,director,runtime
158,Gone with the Wind,1939,8.2,Victor Fleming,238
80,Once Upon a Time in America,1984,8.3,Sergio Leone,229
95,Lawrence of Arabia,1962,8.3,David Lean,218
181,Ben-Hur,1959,8.1,William Wyler,212
20,Seven Samurai,1954,8.6,Akira Kurosawa,207
4,The Godfather Part II,1974,9.0,Francis Ford Coppola,202
7,The Lord of the Rings: The Return of the King,2003,9.0,Peter Jackson,201
6,Schindler's List,1993,9.0,Steven Spielberg,195
247,Gandhi,1982,8.1,Richard Attenborough,191
26,The Green Mile,1999,8.6,Frank Darabont,189
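
Both patterns can be wrapped in one small helper. The sketch below is a hypothetical function, `extreme_runtimes`, run against an in-memory table with seven rows copied from the outputs above. It adds `id` as a second sort key, which is an addition not present in the queries above, so that ties such as `Modern Times` and `City Lights` (both 87 minutes) are cut off by `LIMIT` in a predictable way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER, title TEXT, runtime INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (193, "Sherlock Jr.", 45),
        (182, "The General", 67),
        (130, "The Kid", 68),
        (46, "Modern Times", 87),
        (52, "City Lights", 87),
        (158, "Gone with the Wind", 238),
        (80, "Once Upon a Time in America", 229),
    ],
)


def extreme_runtimes(conn, m, longest=False):
    """Hypothetical helper: titles of the m shortest (or longest) movies."""
    # The sort direction cannot be bound as a parameter, so it is picked
    # from two fixed strings; the row count m is bound with a placeholder.
    direction = "DESC" if longest else "ASC"
    sql = f"SELECT title FROM movies ORDER BY runtime {direction}, id LIMIT ?"
    return [row[0] for row in conn.execute(sql, (m,))]


shortest = extreme_runtimes(conn, 4)
longest = extreme_runtimes(conn, 2, longest=True)
print(shortest)
print(longest)
```

With `m = 4`, only one of the two 87-minute movies fits; the `id` tie-breaker makes it `Modern Times` (id 46) every time.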


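## Key takeaways

- When sorting query results with `ORDER BY`, there are two sort orders:
    - The default is ascending order.
    - Descending order requires the keyword `DESC` after the column name.
- SQL keywords learned in this chapter:
    - `ORDER BY`
    - `DESC`
- When the SQL keywords learned so far are combined in one statement, they must be written in a fixed order.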

```sql
SELECT DISTINCT columns AS alias
  FROM table
 ORDER BY columns DESC
 LIMIT m;
```
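
The effect of the clause order can be checked with a quick experiment. The sketch below assumes an in-memory `movies` table with a single `director` column and five rows using director names from earlier outputs. It runs the statement once in the order shown above and once with `LIMIT` placed before `ORDER BY`, which SQLite rejects as a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (director TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?)",
    [
        ("Peter Jackson",),
        ("Akira Kurosawa",),
        ("Peter Jackson",),
        ("Frank Darabont",),
        ("Akira Kurosawa",),
    ],
)

# Correct order: SELECT DISTINCT ... FROM ... ORDER BY ... LIMIT
top_two = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT director AS name
      FROM movies
     ORDER BY name DESC
     LIMIT 2
    """
)]
print(top_two)

# Wrong order: LIMIT before ORDER BY is rejected by the SQLite parser.
wrong_order_failed = False
try:
    conn.execute("SELECT DISTINCT director FROM movies LIMIT 2 ORDER BY director")
except sqlite3.OperationalError as error:
    wrong_order_failed = True
    print("rejected:", error)
```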

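## Exercises 14-16

The exercises cover four learning databases. Remember to switch the learning database in the editor menu as each exercise requires, and write SQL statements in SQLiteStudio on your own computer that produce the expected output. If you get stuck along the way, see Appendix 2, "Reference Solutions to the Exercises".

### 14. From the `career_summaries` table in the `nba` database, use `ppg` (points per game) to find the 10 players with the highest points per game. Refer to the expected query result below.

Expected output: a query result of shape (10, 2).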

In [None]:
%%sql


personId,ppg
201142,27.2
2544,27.1
1629029,26.4
203954,26.0
1629627,25.7
1629027,25.3
201935,24.9
203081,24.6
201939,24.3
1628378,23.9


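### 15. From the `time_series` table in the `covid19` database, use `Daily_Cases` to find the ten dates with the most new confirmed cases in a single day. Refer to the expected query result below.

Expected output: a query result of shape (10, 3).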

In [None]:
%%sql


Date,Country_Region,Daily_Cases
2022-01-10,US,1383913
2022-01-18,US,1129521
2022-01-03,US,1044956
2022-01-24,US,922164
2022-01-19,US,908133
2022-01-14,US,880068
2022-01-07,US,869756
2022-01-13,US,861347
2022-01-12,US,848642
2022-01-31,United Kingdom,848169


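### 16. From the `career_summaries` table in the `nba` database, use the `assists` and `turnovers` columns and the formula below to derive the assist-to-turnover ratio, making the derived column's data type floating-point (REAL), and find the 10 players with the highest assist-to-turnover ratio. Refer to the expected query result below.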

\begin{equation}
\text{Assists Turnover Ratio} = \frac{\text{Assists}}{\text{Turnovers}}
\end{equation}

Expected output: a query result of shape (10, 3).

In [None]:
%%sql


personId,assists,turnovers
1630540,41,5
1626145,1691,326
1630573,10,2
1628420,1042,218
1630200,272,59
1630580,9,2
1629162,498,120
1628221,4,1
1629875,24,6
1630602,4,1
