<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 4. Sorting and Grouping
---

This chapter provides a brief introduction to sorting and grouping your results.

In [1]:
# %pip install ipython-sql

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///data/database.db

'Connected: @data/database.db'

## ORDER BY
---

Congratulations on making it this far! You now know how to select and filter your results.

In this chapter you'll learn how to sort and group your results to gain further insight. Let's go!

In SQL, the `ORDER BY` keyword is used to sort results in ascending or descending order according to the values of one or more columns.

By default, `ORDER BY` sorts in ascending order. To sort in descending order instead, add the `DESC` keyword after the column name. For example,

`SELECT title
 FROM films
 ORDER BY release_year DESC`

gives you the titles of films sorted by release year, from newest to oldest.
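
To see both sort directions side by side, here is a minimal sketch that uses Python's built-in `sqlite3` module. The three-row `films` table and its titles are made up for illustration; they are not the course data.

```python
import sqlite3

# Hypothetical three-row stand-in for the films table (not the course data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Alpha", 1999), ("Beta", 2015), ("Gamma", 2007)],
)

# Ascending is the default, so no keyword is needed.
oldest_first = [row[0] for row in conn.execute(
    "SELECT title FROM films ORDER BY release_year")]

# DESC after the column name reverses the order.
newest_first = [row[0] for row in conn.execute(
    "SELECT title FROM films ORDER BY release_year DESC")]

print(oldest_first)  # ['Alpha', 'Gamma', 'Beta']
print(newest_first)  # ['Beta', 'Gamma', 'Alpha']
conn.close()
```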


## Sorting single columns
---

Now that you understand how `ORDER BY` works, give these exercises a go!

### Instructions
Get the names of people from the `people` table, sorted alphabetically.

In [4]:
%%sql

SELECT name
FROM people
ORDER BY name
LIMIT 10

 * sqlite:///data/database.db
Done.


name
50 Cent
A. Michael Baldwin
A. Raven Cruz
A.J. Buckley
A.J. DeLucia
A.J. Langer
AJ Michalka
Aaliyah
Aaron Ashmore
Aaron Hann


Get the names of people, sorted by birth date.

In [5]:
%%sql

SELECT name
FROM   people
ORDER  BY birthdate
LIMIT  10 

 * sqlite:///data/database.db
Done.


name
A. Raven Cruz
A.J. DeLucia
Aaron Hann
Aaron Hughes
Aaron Schneider
Aaron Staton
Aasheekaa Bathija
Abby Mukiibi Nkaaga
Abigail Evans
Adam Boyer


Get the birth date and name for every person, in order of when they were born.

In [6]:
%%sql

SELECT birthdate,
       name
FROM   people
ORDER  BY birthdate
LIMIT  10 

 * sqlite:///data/database.db
Done.


birthdate,name
,A. Raven Cruz
,A.J. DeLucia
,Aaron Hann
,Aaron Hughes
,Aaron Schneider
,Aaron Staton
,Aasheekaa Bathija
,Abby Mukiibi Nkaaga
,Abigail Evans
,Adam Boyer
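
The blank `birthdate` values above are missing (`NULL`) entries. SQLite treats `NULL` as smaller than any other value, so those rows come first in an ascending sort. The sketch below shows this on a made-up three-row `people` table (an assumption for illustration), along with one possible way to leave the missing rows out.

```python
import sqlite3

# Hypothetical stand-in for the people table; Bob has no birth date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "1980-05-01"), ("Bob", None), ("Cy", "1975-11-23")],
)

# In an ascending sort, SQLite places NULL before every other value.
with_nulls = conn.execute(
    "SELECT name, birthdate FROM people ORDER BY birthdate").fetchall()

# Filtering with IS NOT NULL drops the rows that have no birth date.
without_nulls = conn.execute(
    "SELECT name, birthdate FROM people "
    "WHERE birthdate IS NOT NULL ORDER BY birthdate").fetchall()

print(with_nulls)     # [('Bob', None), ('Cy', '1975-11-23'), ('Ann', '1980-05-01')]
print(without_nulls)  # [('Cy', '1975-11-23'), ('Ann', '1980-05-01')]
conn.close()
```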


## Sorting single columns (2)
---

Let's get some more practice with `ORDER BY`!

### Instructions
Get the title of films released in 2000 or 2012, in the order they were released.

In [7]:
%%sql

SELECT title
FROM   films
WHERE  release_year IN ( 2000, 2012 )
ORDER  BY release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


title
102 Dalmatians
28 Days
3 Strikes
Aberdeen
All the Pretty Horses
Almost Famous
American Psycho
Amores Perros
An Everlasting Piece
Anatomy


Get all details for all films except those released in 2015 and order them by duration.

In [8]:
%%sql

SELECT *
FROM   films
WHERE  release_year <> 2015
ORDER  BY duration
LIMIT  10 

 * sqlite:///data/database.db
Done.


id,title,release_year,country,duration,language,certification,gross,budget
1398,Hum To Mohabbat Karega,2000,India,,Hindi,,,
2326,Dil Jo Bhi Kahey...,2005,India,,English,,129319.0,70000000.0
2712,The Naked Ape,2006,USA,,English,,,
3208,Black Water Transit,2009,USA,,English,,,23000000.0
3504,Harry Potter and the Deathly Hallows: Part I,2010,UK,,English,,,
3552,N-Secure,2010,USA,,English,R,2592808.0,
3728,Harry Potter and the Deathly Hallows: Part II,2011,UK,,English,,,
4018,Should've Been Romeo,2012,USA,,English,,,5000000.0
4138,Barfi,2013,India,,Kannada,,,
4396,Destiny,2014,USA,,English,,,


Get the title and gross earnings for movies which begin with the letter 'M' and order the results alphabetically.

In [9]:
%%sql

SELECT title,
       gross
FROM   films
WHERE  title LIKE 'M%'
ORDER  BY title
LIMIT  10 

 * sqlite:///data/database.db
Done.


title,gross
MacGruber,8460995.0
Machete,26589953.0
Machete Kills,7268659.0
Machine Gun McCain,
Machine Gun Preacher,537580.0
Mad City,10556196.0
Mad Hot Ballroom,8044906.0
Mad Max,
Mad Max 2: The Road Warrior,9003011.0
Mad Max Beyond Thunderdome,36200000.0


## Sorting single columns (DESC)
---

To order results in descending order, put the keyword `DESC` after the column name in your `ORDER BY` clause. For example, to get all the names in the `people` table in reverse alphabetical order:

`SELECT name
 FROM people
 ORDER BY name DESC`

Now practice using `ORDER BY` with `DESC` to sort single columns in descending order!

### Instructions
Get the IMDB score and film ID for every film from the reviews table, sorted from highest to lowest score.

In [10]:
%%sql

SELECT imdb_score,
       film_id
FROM   reviews
ORDER  BY imdb_score DESC
LIMIT  10 

 * sqlite:///data/database.db
Done.


imdb_score,film_id
9.5,4960
9.3,742
9.2,178
9.1,4866
9.0,3110
9.0,192
8.9,120
8.9,676
8.9,2045
8.9,723


Get the title for every film, in reverse alphabetical order.

In [11]:
%%sql

SELECT title
FROM   films
ORDER  BY title DESC
LIMIT  10 

 * sqlite:///data/database.db
Done.


title
Æon Flux
xXx: State of the Union
xXx
eXistenZ
[Rec] 2
[Rec]
Zulu
Zoom
Zoolander 2
Zoolander


Get the title and duration for every film, in order of longest duration to shortest.

In [12]:
%%sql

SELECT title,
       duration
FROM   films
ORDER  BY duration DESC
LIMIT  10 

 * sqlite:///data/database.db
Done.


title,duration
Carlos,334
"Blood In, Blood Out",330
Heaven's Gate,325
The Legend of Suriyothai,300
Das Boot,293
Apocalypse Now,289
The Company,286
Gods and Generals,280
Gettysburg,271
Arn: The Knight Templar,270


## Sorting multiple columns
---

`ORDER BY` can also sort on multiple columns. It sorts by the first column specified; rows that tie on that column are then sorted by the next column, and so on. For example,

`SELECT birthdate, name
 FROM people
 ORDER BY birthdate, name`

sorts on birth dates first (oldest to newest) and then, among people who share a birth date, sorts the names in alphabetical order. **The order of columns is important!**

Try using `ORDER BY` to sort multiple columns! Remember, to specify multiple columns you separate the column names with a comma.
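
Here is a small sketch of why the column order matters, using Python's built-in `sqlite3` module. The three people and their birth dates are invented for illustration. Two of them share a birth date, so the second sort column decides their order.

```python
import sqlite3

# Hypothetical people table; Zed and Bea share a birth date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Zed", "1970-01-01"), ("Amy", "1980-01-01"), ("Bea", "1970-01-01")],
)

# Birth date first; name only breaks ties between equal birth dates.
by_date_then_name = [row[0] for row in conn.execute(
    "SELECT name FROM people ORDER BY birthdate, name")]

# Name first; birth date would only break ties between equal names.
by_name_then_date = [row[0] for row in conn.execute(
    "SELECT name FROM people ORDER BY name, birthdate")]

print(by_date_then_name)  # ['Bea', 'Zed', 'Amy']
print(by_name_then_date)  # ['Amy', 'Bea', 'Zed']
conn.close()
```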

### Instructions
Get the birth date and name of people in the `people` table, in order of when they were born and alphabetically by name.

In [13]:
%%sql

SELECT birthdate,
       name
FROM   people
ORDER  BY birthdate,
          name
LIMIT  10 

 * sqlite:///data/database.db
Done.


birthdate,name
,A. Raven Cruz
,A.J. DeLucia
,Aaron Hann
,Aaron Hughes
,Aaron Schneider
,Aaron Staton
,Aasheekaa Bathija
,Abby Mukiibi Nkaaga
,Abigail Evans
,Adam Boyer


Get the release year, duration, and title of films ordered by their release year and duration.

In [14]:
%%sql

SELECT release_year,
       duration,
       title
FROM   films
ORDER  BY release_year,
          duration
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,duration,title
,,Wolf Creek
,22.0,"10,000 B.C."
,22.0,Anger Management
,24.0,Lovesick
,24.0,Yu-Gi-Oh! Duel Monsters
,30.0,Fired Up
,30.0,Jesse
,30.0,Meet the Browns
,30.0,The Doombolt Chase
,30.0,The Honeymooners


Get certifications, release years, and titles of films ordered by certification (alphabetically) and release year.

In [15]:
%%sql

SELECT certification,
       release_year,
       title
FROM   films
ORDER  BY certification,
          release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


certification,release_year,title
,,"10,000 B.C."
,,A Touch of Frost
,,Anger Management
,,Animal Kingdom
,,BrainDead
,,Creature
,,Deadline Gallipoli
,,Del 1 - Män som hatar kvinnor
,,Emma
,,Fired Up


Get the names and birthdates of people ordered by name and birth date.

In [16]:
%%sql

SELECT name,
       birthdate
FROM   people
ORDER  BY name,
          birthdate
LIMIT  10 

 * sqlite:///data/database.db
Done.


name,birthdate
50 Cent,1975-07-06
A. Michael Baldwin,1963-04-04
A. Raven Cruz,
A.J. Buckley,1978-02-09
A.J. DeLucia,
A.J. Langer,1974-05-22
AJ Michalka,1991-04-10
Aaliyah,1979-01-16
Aaron Ashmore,1979-10-07
Aaron Hann,


## GROUP BY 
---

Now you know how to sort results! Often you'll need to aggregate results. For example, you might want to count the number of male and female employees in your company. Here, what you want is to group all the males together and count them, and group all the females together and count them. In SQL, `GROUP BY` allows you to group a result by one or more columns, like so:

`SELECT sex, COUNT(*)
 FROM employees
 GROUP BY sex`

Commonly, `GROUP BY` is used with aggregate functions like `COUNT()` or `MAX()`. Note that `GROUP BY` always goes after the `FROM` clause (and after `WHERE`, if there is one)!
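
As a rough illustration of the employee example, the sketch below builds a tiny `employees` table in memory with Python's `sqlite3` module. The rows and the `salary` column are assumptions added so that `MAX()` has something to work on.

```python
import sqlite3

# Hypothetical employees table; the salary column is invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "F", 50000), ("Bea", "F", 65000), ("Cat", "F", 58000),
     ("Dan", "M", 52000), ("Eli", "M", 61000)],
)

# One output row per distinct value of sex, with aggregates computed per group.
rows = conn.execute(
    "SELECT sex, COUNT(*), MAX(salary) FROM employees GROUP BY sex").fetchall()

# Store as a dict because row order is not guaranteed without ORDER BY.
summary = {sex: (count, top_salary) for sex, count, top_salary in rows}
print(summary["F"])  # (3, 65000)
print(summary["M"])  # (2, 61000)
conn.close()
```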

## GROUP BY practice
---

As you've just seen, combining aggregate functions with `GROUP BY` can yield some powerful results!

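A word of warning: most SQL databases return an error if you try to `SELECT` a field that is not in your `GROUP BY` clause without using it to calculate some kind of value about the entire group. SQLite, which this notebook connects to, is more permissive: it runs the query and fills that field with a value taken from an arbitrary row in each group, so the result can be misleading.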

Note that you can combine `GROUP BY` with `ORDER BY` to group your results, calculate something about them, and then order your results. For example,

`SELECT sex, COUNT(*) AS count
 FROM employees
 GROUP BY sex
 ORDER BY count DESC`
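
The sketch below tries the group-then-order pattern on a small made-up `films` table, using Python's built-in `sqlite3` module. The alias `n` is a hypothetical name chosen here so that `ORDER BY` can refer to the count; the second sort key only makes the order of ties predictable.

```python
import sqlite3

# Hypothetical films table: three films in 2001, one in 2002, two in 2003.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 2001), ("B", 2001), ("C", 2001),
     ("D", 2002), ("E", 2003), ("F", 2003)],
)

# Group, count each group, then sort the groups by that count (largest first).
films_per_year = conn.execute(
    "SELECT release_year, COUNT(*) AS n "
    "FROM films "
    "GROUP BY release_year "
    "ORDER BY n DESC, release_year").fetchall()

print(films_per_year)  # [(2001, 3), (2003, 2), (2002, 1)]
conn.close()
```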

### Instructions
Get the release year and count of films released in each year.

In [17]:
%%sql

SELECT release_year,
       COUNT(*)
FROM   films
GROUP  BY release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,COUNT(*)
,42
1916.0,1
1920.0,1
1925.0,1
1927.0,1
1929.0,2
1930.0,1
1932.0,1
1933.0,2
1934.0,1


Get the release year and average duration of all films, grouped by release year.

In [18]:
%%sql

SELECT release_year,
       AVG( duration )
FROM   films
GROUP  BY release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,AVG( duration )
,77.4390243902439
1916.0,123.0
1920.0,110.0
1925.0,151.0
1927.0,145.0
1929.0,105.0
1930.0,96.0
1932.0,79.0
1933.0,77.5
1934.0,65.0


Get the release year and largest budget for all films, grouped by release year.

In [19]:
%%sql

SELECT release_year,
       MAX( budget )
FROM   films
GROUP  BY release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,MAX( budget )
,15000000
1916.0,385907
1920.0,100000
1925.0,245000
1927.0,6000000
1929.0,379000
1930.0,3950000
1932.0,800000
1933.0,439000
1934.0,325000


Get the IMDB score and count of film reviews grouped by IMDB score in the `reviews` table.

In [20]:
%%sql

SELECT imdb_score,
       COUNT(*)
FROM   reviews
GROUP  BY imdb_score
LIMIT  10 

 * sqlite:///data/database.db
Done.


imdb_score,COUNT(*)
1.6,1
1.7,1
1.9,3
2.0,2
2.1,3
2.2,3
2.3,3
2.4,2
2.5,2
2.6,2


## GROUP BY practice (2)
---

Now practice your new skills by combining `GROUP BY` and `ORDER BY` with some more aggregate functions!

Make sure to put the `ORDER BY` clause after `GROUP BY` and `HAVING`, at the end of your query (only `LIMIT` may follow it). You can't sort values that you haven't calculated yet!

### Instructions
Get the release year and lowest gross earnings per release year.

In [21]:
%%sql

SELECT release_year,
       MIN( gross )
FROM   films
GROUP  BY release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,MIN( gross )
,145118.0
1916.0,
1920.0,3000000.0
1925.0,
1927.0,26435.0
1929.0,9950.0
1930.0,
1932.0,
1933.0,2300000.0
1934.0,


Get the language and the total gross earnings of the films made in each language.

In [22]:
%%sql

SELECT language,
       SUM( gross )
FROM   films
GROUP  BY language
LIMIT  10 

 * sqlite:///data/database.db
Done.


language,SUM( gross )
,4319281
Aboriginal,78680789
Arabic,1681831
Aramaic,499263
Bosnian,301305
Cantonese,64294253
Chinese,50000
Czech,617228
Danish,2403857
Dari,16925238


Get the country and total budget spent making movies in each country.

In [23]:
%%sql

SELECT country,
       SUM( budget )
FROM   films
GROUP  BY country
LIMIT  10 

 * sqlite:///data/database.db
Done.


country,SUM( budget )
,3500000.0
Afghanistan,46000.0
Argentina,5700000.0
Aruba,35000000.0
Australia,1558605523.0
Bahamas,5000000.0
Belgium,49000000.0
Brazil,29200000.0
Bulgaria,7000000.0
Cambodia,


Get the release year, country, and highest budget spent making a film for each year, for each country. Sort your results by release year and country.

In [24]:
%%sql

SELECT release_year,
       country,
       MAX( budget )
FROM   films
GROUP  BY release_year,
          country
ORDER  BY release_year,
          country
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,country,MAX( budget )
,,
,Australia,15000000.0
,Canada,
,France,
,Iceland,
,Japan,
,Poland,
,Sweden,
,UK,
,USA,5000000.0


Get the country, release year, and lowest amount grossed per release year per country. Order your results by country and release year.

In [25]:
%%sql

SELECT country,
       release_year,
       MIN( gross )
FROM   films
GROUP  BY release_year,
          country
ORDER  BY country,
          release_year
LIMIT  10 

 * sqlite:///data/database.db
Done.


country,release_year,MIN( gross )
,,
,2014.0,
Afghanistan,2003.0,1127331.0
Argentina,2000.0,1221261.0
Argentina,2004.0,304124.0
Argentina,2009.0,20167424.0
Aruba,1998.0,10076136.0
Australia,,
Australia,1979.0,
Australia,1981.0,9003011.0


## HAVING a great time
---

In SQL, aggregate functions can't be used in `WHERE` clauses. For example, the following query is invalid:

`SELECT release_year
 FROM films
 GROUP BY release_year
 WHERE COUNT( title ) > 10`

This means that if you want to filter based on the result of an aggregate function, you need another way! That's where the `HAVING` clause comes in. For example,

`SELECT release_year
 FROM films
 GROUP BY release_year
 HAVING COUNT( title ) > 10`

shows only those years in which more than 10 films were released.
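
To see the difference in practice, here is a minimal sketch with Python's built-in `sqlite3` module and a made-up four-row `films` table. It assumes SQLite, where putting an aggregate in `WHERE` raises `sqlite3.OperationalError`; other databases report the problem with their own error types.

```python
import sqlite3

# Hypothetical films table: three films in 2001 and one in 2002.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 2001), ("B", 2001), ("C", 2001), ("D", 2002)],
)

# An aggregate inside WHERE is rejected before any rows are read.
try:
    conn.execute(
        "SELECT release_year FROM films "
        "WHERE COUNT(title) > 2 GROUP BY release_year")
    where_failed = False
except sqlite3.OperationalError as exc:
    where_failed = True
    print("WHERE with an aggregate failed:", exc)

# HAVING filters the groups after the aggregate has been computed.
busy_years = [row[0] for row in conn.execute(
    "SELECT release_year FROM films "
    "GROUP BY release_year HAVING COUNT(title) > 2")]

print(busy_years)  # [2001]
conn.close()
```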

In how many different years were more than 200 movies released?

In [26]:
%%sql

SELECT release_year
FROM   films
GROUP  BY release_year
HAVING COUNT( title ) > 200 

 * sqlite:///data/database.db
Done.


release_year
2002
2004
2005
2006
2007
2008
2009
2010
2011
2012


## All together now
---

Time to practice using `ORDER BY`, `GROUP BY` and `HAVING` together.

Now you're going to write a query that returns the average budget and average gross earnings for films in each year after 1990, if the average budget is greater than $60 million.

This is going to be a big query, but you can handle it!

### Instructions
Get the release year, budget and gross earnings for each film in the `films` table.

In [27]:
%%sql

SELECT release_year,
       budget,
       gross
FROM   films
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,budget,gross
1916,385907.0,
1920,100000.0,3000000.0
1925,245000.0,
1927,6000000.0,26435.0
1929,,9950.0
1929,379000.0,2808000.0
1930,3950000.0,
1932,800000.0,
1933,439000.0,2300000.0
1933,200000.0,


Modify your query so that only records with a `release_year` after 1990 are included.

In [28]:
%%sql

SELECT release_year,
       budget,
       gross
FROM   films
WHERE  release_year > 1990
LIMIT  10 

 * sqlite:///data/database.db
Done.


release_year,budget,gross
1991,6000000,869325
1991,20000000,38037513
1991,6000000,57504069
1991,35000000,79100000
1991,15000000,30102717
1991,35000000,14587732
1991,8500000,34872293
1991,23000000,7434726
1991,70000000,119654900
1991,5000000,19281235


Remove the budget and gross columns, and group your results by release year.

In [29]:
%%sql

SELECT release_year
FROM   films
WHERE  release_year > 1990
GROUP  BY release_year

 * sqlite:///data/database.db
Done.


release_year
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000


Modify your query to include the average budget and average gross earnings for the results you have so far. Alias the average budget as `avg_budget`; alias the average gross earnings as `avg_gross`.

In [30]:
%%sql

SELECT release_year,
       AVG( budget ) AS avg_budget,
       AVG( gross )  AS avg_gross
FROM   films
WHERE  release_year > 1990
GROUP  BY release_year

 * sqlite:///data/database.db
Done.


release_year,avg_budget,avg_gross
1991,25176548.387096774,53844501.66666666
1992,25982030.303030305,63665195.14705882
1993,20729787.23404255,45302091.41304348
1994,29013773.58490566,59395666.16981132
1995,32775000.0,44909519.98550725
1996,31620612.24489796,42044174.25263158
1997,59424490.74074074,44793772.43103448
1998,40460000.0,38377007.96124031
1999,38981780.48780488,38072176.27710843
2000,34931375.75757576,42172627.58083832


Modify your query so that only years with an average budget of greater than $60 million are included.

In [31]:
%%sql

SELECT release_year,
       AVG( budget ) AS avg_budget,
       AVG( gross )  AS avg_gross
FROM   films
WHERE  release_year > 1990
GROUP  BY release_year
HAVING AVG( budget ) > 60000000 

 * sqlite:///data/database.db
Done.


release_year,avg_budget,avg_gross
2005,70323938.23152709,41159143.29064039
2006,93968929.5774648,39237855.9537037


Finally, modify your query to order the results from highest average gross earnings to lowest.

In [32]:
%%sql

SELECT release_year,
       AVG( budget ) AS avg_budget,
       AVG( gross )  AS avg_gross
FROM   films
WHERE  release_year > 1990
GROUP  BY release_year
HAVING AVG( budget ) > 60000000
ORDER  BY AVG( gross ) DESC 

 * sqlite:///data/database.db
Done.


release_year,avg_budget,avg_gross
2005,70323938.23152709,41159143.29064039
2006,93968929.5774648,39237855.9537037
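
The finished query applies its clauses in a fixed logical order: `WHERE` filters individual rows, `GROUP BY` forms the groups, `HAVING` filters the groups, and `ORDER BY` sorts what is left. The sketch below walks a tiny made-up `films` table through the same steps with Python's `sqlite3` module; the numbers are invented and the threshold is scaled down to 60 to match them.

```python
import sqlite3

# Hypothetical films table with small, made-up budget and gross figures.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE films (release_year INTEGER, budget REAL, gross REAL)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [(1985, 100, 500),                  # removed by WHERE (not after 1990)
     (1995, 10, 20), (1995, 30, 40),    # removed by HAVING (avg budget 20)
     (2005, 80, 50), (2005, 100, 70),   # kept: avg budget 90, avg gross 60
     (2006, 70, 90), (2006, 90, 110)],  # kept: avg budget 80, avg gross 100
)

result = conn.execute(
    "SELECT release_year, "
    "       AVG(budget) AS avg_budget, "
    "       AVG(gross)  AS avg_gross "
    "FROM films "
    "WHERE release_year > 1990 "   # 1. filter rows
    "GROUP BY release_year "       # 2. form groups
    "HAVING AVG(budget) > 60 "     # 3. filter groups
    "ORDER BY avg_gross DESC"      # 4. sort the surviving groups
).fetchall()

print(result)  # [(2006, 80.0, 100.0), (2005, 90.0, 60.0)]
conn.close()
```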


## All together now (2)
---

Great work! Now try another large query. This time, all in one go!

Remember, if you only want to return a certain number of results, you can use the `LIMIT` keyword to limit the number of rows returned.

### Instructions
Get the country, average budget, and average gross take of countries that have made more than 10 films. Order the result by country name, and limit the number of results displayed to 5. You should alias the averages as `avg_budget` and `avg_gross` respectively.

In [33]:
%%sql

SELECT country,
       AVG( budget ) AS avg_budget,
       AVG( gross )  AS avg_gross
FROM   films
GROUP  BY country
HAVING COUNT( title ) > 10
ORDER  BY country
LIMIT  5 

 * sqlite:///data/database.db
Done.


country,avg_budget,avg_gross
Australia,31172110.46,40205909.571428575
Canada,14798458.71559633,22432066.68055556
China,62219000.0,14143040.736842103
Denmark,13922222.222222222,1418469.111111111
France,30672034.615384616,16350593.578512397


## A taste of things to come
---

Congrats on making it to the end of the course! By now you should have a good understanding of the basics of SQL.

There's one more concept we're going to introduce. You may have noticed that all your results so far have been from just one table, e.g., **films** or **people**.

In the real world, however, you will often want to query multiple tables. For example, what if you want to see the IMDB score for a particular movie?

In this case, you'd want to get the ID of the movie from the **films** table and then use it to get IMDB information from the **reviews** table. In SQL, this concept is known as a join, and a basic join is shown in the cell below.

The query below gets the IMDB score for the film To Kill a Mockingbird! Cool, right?

As you can see, joins are incredibly useful and important to understand for anyone using SQL.

In [34]:
%%sql

SELECT title,
       imdb_score
FROM   films
       JOIN reviews
         ON films.id = reviews.film_id
WHERE  title = 'To Kill a Mockingbird' 

 * sqlite:///data/database.db
Done.


title,imdb_score
To Kill a Mockingbird,8.4
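
If you'd like to experiment with joins away from the course database, here is a minimal sketch using Python's built-in `sqlite3` module. The two tiny tables mirror the `films.id` and `reviews.film_id` columns used above, but the titles and scores are made up.

```python
import sqlite3

# Hypothetical two-table setup linked by films.id = reviews.film_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE reviews (film_id INTEGER, imdb_score REAL)")
conn.executemany("INSERT INTO films VALUES (?, ?)",
                 [(1, "Film A"), (2, "Film B")])
conn.executemany("INSERT INTO reviews VALUES (?, ?)",
                 [(1, 6.1), (2, 7.5)])

# JOIN pairs each film with the review row that carries its ID.
score_rows = conn.execute(
    "SELECT title, imdb_score "
    "FROM films "
    "JOIN reviews ON films.id = reviews.film_id "
    "WHERE title = ?",
    ("Film B",),
).fetchall()

print(score_rows)  # [('Film B', 7.5)]
conn.close()
```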
