# SQL Cheat Sheet: Sorting & Grouping
ORDER BY | GROUP BY | HAVING

___

## 0. Load Database

In [1]:
%reload_ext sql

In [2]:
%%sql

postgresql://localhost/films

'Connected: @films'

In [3]:
%%sql
SELECT * FROM films LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| id | title | release_year | country | duration | language | certification | gross | budget |
|---|---|---|---|---|---|---|---|---|
| 1 | Intolerance: Love's Struggle Throughout the Ages | 1916 | USA | 123 |  | Not Rated |  | 385907 |
| 2 | Over the Hill to the Poorhouse | 1920 | USA | 110 |  |  | 3000000.0 | 100000 |
| 3 | The Big Parade | 1925 | USA | 151 |  | Not Rated |  | 245000 |


___

## 1. ORDER BY
- In SQL, the ORDER BY keyword is used to sort results in ascending or descending order according to the values of one or more columns.
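- By default, ORDER BY sorts in ascending order (ASC); add the DESC keyword after a column to sort it in descending order. In PostgreSQL, NULLs sort as if larger than any non-NULL value, so they come last in ascending order and first in descending order (append NULLS FIRST or NULLS LAST to override this). The example below sorts by release_year in descending order and excludes NULL years with WHERE release_year IS NOT NULL: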

In [4]:
%%sql
SELECT title, release_year FROM films
WHERE release_year IS NOT NULL
ORDER BY release_year DESC
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| title | release_year |
|---|---|
| 10 Cloverfield Lane | 2016 |
| 13 Hours | 2016 |
| A Beginner's Guide to Snuff | 2016 |


In [23]:
%%sql
SELECT title, release_year FROM films
WHERE release_year IN (2000, 2012)
ORDER BY release_year LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| title | release_year |
|---|---|
| 28 Days | 2000 |
| 3 Strikes | 2000 |
| 102 Dalmatians | 2000 |


In [24]:
%%sql
SELECT * FROM films
WHERE release_year <> 2015
ORDER BY duration LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| id | title | release_year | country | duration | language | certification | gross | budget |
|---|---|---|---|---|---|---|---|---|
| 2926 | The Touch | 2007 | USA | 7 | English |  |  | 13000.0 |
| 4098 | Vessel | 2012 | USA | 14 | English |  |  |  |
| 2501 | Wal-Mart: The High Cost of Low Price | 2005 | USA | 20 | English | Not Rated |  | 1500000.0 |


In [7]:
%%sql
SELECT title, gross FROM films
WHERE title LIKE 'M%'
ORDER BY title LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| title | gross |
|---|---|
| MacGruber | 8460995 |
| Machete | 26589953 |
| Machete Kills | 7268659 |


In [8]:
%%sql
SELECT film_id, imdb_score FROM reviews
ORDER BY imdb_score DESC LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| film_id | imdb_score |
|---|---|
| 4960 | 9.5 |
| 742 | 9.3 |
| 178 | 9.2 |

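To experiment with ORDER BY without a PostgreSQL server, one option is Python's built-in `sqlite3` module with a tiny in-memory table. This is a sketch under stated assumptions: the `films` table and its rows are made up, and SQLite stands in for PostgreSQL. The two agree on these queries, but they place NULLs differently when sorting, so the toy data contains none.

```python
import sqlite3

# Toy in-memory stand-in for the films table; titles and years are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("Film A", 2001), ("Film B", 1999), ("Film C", 2016), ("Film D", 2010)],
)

# No direction given: ascending (ASC) is the default.
ascending = conn.execute(
    "SELECT title, release_year FROM films ORDER BY release_year"
).fetchall()

# DESC reverses the order; LIMIT then keeps only the first rows of the sorted result.
newest_two = conn.execute(
    "SELECT title, release_year FROM films ORDER BY release_year DESC LIMIT 2"
).fetchall()

print(ascending)   # [('Film B', 1999), ('Film A', 2001), ('Film D', 2010), ('Film C', 2016)]
print(newest_two)  # [('Film C', 2016), ('Film D', 2010)]
```

Because LIMIT is applied after sorting, ORDER BY ... DESC LIMIT n is the usual way to ask for the "top n" rows.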

- #### Sorting Multiple Columns
- ORDER BY can also sort on multiple columns. Rows are sorted by the first column listed; rows that tie on that column are then sorted by the second column, and so on. Each column can take its own ASC or DESC.
- In the example below, people who share a birthdate are sorted by name.

In [9]:
%%sql
SELECT birthdate, name FROM people
ORDER BY birthdate, name
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| birthdate | name |
|---|---|
| 1837-10-10 | Robert Shaw |
| 1872-11-07 | Lucille La Verne |
| 1874-03-14 | Mary Carr |

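A small sketch of tie-breaking, again using Python's `sqlite3` with a hypothetical `people` table (names and dates are invented). Two rows share a birthdate, so the second sort key decides their order. The dates are stored as ISO `YYYY-MM-DD` text, which sorts chronologically.

```python
import sqlite3

# Hypothetical people: two of them share a birthdate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, birthdate TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Zoe", "1980-05-01"), ("Adam", "1980-05-01"), ("Mia", "1975-12-24")],
)

# birthdate decides first; name only orders rows with equal birthdates.
by_date_then_name = conn.execute(
    "SELECT birthdate, name FROM people ORDER BY birthdate, name"
).fetchall()

# Each sort key has its own direction: latest birthdate first, names still A to Z.
latest_first = conn.execute(
    "SELECT birthdate, name FROM people ORDER BY birthdate DESC, name ASC"
).fetchall()

print(by_date_then_name)
# [('1975-12-24', 'Mia'), ('1980-05-01', 'Adam'), ('1980-05-01', 'Zoe')]
print(latest_first)
# [('1980-05-01', 'Adam'), ('1980-05-01', 'Zoe'), ('1975-12-24', 'Mia')]
```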

___


## 2. GROUP BY
- Often you'll need to aggregate results. For example, you might want to count the number of male and female employees in your company.
- In SQL, GROUP BY groups rows that share the same values in one or more columns, so that aggregate functions such as count(*) return one value per group, like so:
>SELECT sex, count(*)
>FROM employees
>GROUP BY sex;

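- Most databases, including PostgreSQL, return an error **if you SELECT a column that is neither listed in the GROUP BY clause nor used inside an aggregate function (such as COUNT, AVG, or MAX) that calculates a value for the entire group.**
- GROUP BY on its own does not guarantee any row order, as the unsorted years in the outputs below show. Combine GROUP BY with ORDER BY to group your results, calculate something about each group, and then order the results. Because PostgreSQL names an unaliased count(*) output column count, ORDER BY can refer to it by that name:

>SELECT sex, count(*)
>FROM employees
>GROUP BY sex
>ORDER BY count DESC;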

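The employees example can be tried on a toy in-memory SQLite table; the rows are invented and Python's `sqlite3` stands in for PostgreSQL. The sketch also rebuilds the result in plain Python with `collections.Counter` to show what GROUP BY plus count(*) computes: one tally per distinct value.

```python
import sqlite3
from collections import Counter

# Hypothetical employee rows: (name, sex).
rows = [("Ann", "F"), ("Bob", "M"), ("Cat", "F"), ("Dan", "M"), ("Eve", "F")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, sex TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)

# One output row per distinct value of sex; count(*) is evaluated per group.
# Aliasing count(*) AS count makes the ORDER BY reference explicit.
by_sex = conn.execute(
    "SELECT sex, count(*) AS count FROM employees GROUP BY sex ORDER BY count DESC"
).fetchall()

# Plain-Python equivalent: tally rows per key, largest tally first.
manual = Counter(sex for _, sex in rows).most_common()

print(by_sex)            # [('F', 3), ('M', 2)]
print(by_sex == manual)  # True
```

One caveat: SQLite is more permissive than PostgreSQL about non-aggregated columns missing from GROUP BY (it accepts them instead of raising an error), so that particular error is best observed in PostgreSQL.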
In [10]:
%%sql
-- the release year and count of films released in each year.
SELECT release_year, count(*) FROM films
GROUP BY release_year 
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| release_year | count |
|---|---|
| 1954 | 5 |
| 1988 | 31 |
| 1959 | 3 |


In [26]:
%%sql
-- the release year and average duration of films per year
SELECT release_year, AVG(duration) FROM films
GROUP BY release_year
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| release_year | avg |
|---|---|
| 1954 | 140.6 |
| 1988 | 107.0 |
| 1959 | 136.66666666666666 |


In [28]:
%%sql
-- the release year and max budget of films per year
SELECT release_year, MAX(budget) FROM films
GROUP BY release_year 
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| release_year | max |
|---|---|
| 1954 | 5000000 |
| 1988 | 1100000000 |
| 1959 | 5000000 |

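Each aggregate in these queries collapses one group of rows into a single value. The sketch below (hypothetical rows; SQLite via Python's `sqlite3`) computes count(*), AVG and MAX per release_year in SQL, then repeats the computation by hand with a dictionary of buckets to show that the two agree.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (release_year, duration, budget) rows with no NULLs.
rows = [(1954, 100, 5_000_000), (1954, 120, 2_000_000), (1988, 90, 7_000_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (release_year INTEGER, duration INTEGER, budget INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", rows)

sql_result = conn.execute(
    "SELECT release_year, count(*), AVG(duration), MAX(budget) "
    "FROM films GROUP BY release_year ORDER BY release_year"
).fetchall()

# By hand: put each row in the bucket for its release_year ...
groups = defaultdict(list)
for year, duration, budget in rows:
    groups[year].append((duration, budget))

# ... then reduce every bucket to one output row.
# (Real SQL aggregates such as AVG skip NULLs; this toy version assumes there are none.)
manual_result = [
    (year,
     len(bucket),
     sum(d for d, _ in bucket) / len(bucket),
     max(b for _, b in bucket))
    for year, bucket in sorted(groups.items())
]

print(sql_result)                   # [(1954, 2, 110.0, 5000000), (1988, 1, 90.0, 7000000)]
print(sql_result == manual_result)  # True
```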

___

## 3. GROUP BY & ORDER BY
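- Clauses must appear in this order: SELECT - FROM - WHERE - GROUP BY - HAVING - ORDER BY - LIMIT
- WHERE filters individual rows, so it goes after FROM and before GROUP BY
- HAVING filters whole groups, so it comes after GROUP BY
- ORDER BY comes after HAVING, and LIMIT (when used) comes last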

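The clause order also matches the order in which the clauses are conceptually applied: FROM supplies rows, WHERE filters them, GROUP BY buckets them, HAVING filters the buckets, the SELECT aggregates are computed, and ORDER BY and LIMIT shape the output. The database may execute a query differently internally as long as the result is the same. This sketch replays those steps in plain Python on made-up rows (budgets in arbitrary units) and checks them against the same query run through SQLite's `sqlite3` module.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (country, release_year, budget) rows.
rows = [
    ("USA", 1995, 100), ("USA", 2001, 90), ("USA", 2002, 40),
    ("France", 2003, 30), ("France", 2004, 50),
    ("Japan", 2005, 200),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (country TEXT, release_year INTEGER, budget INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", rows)

sql_result = conn.execute("""
    SELECT country, MAX(budget) AS max_budget
    FROM films
    WHERE release_year > 2000
    GROUP BY country
    HAVING COUNT(*) >= 2
    ORDER BY max_budget DESC
    LIMIT 1
""").fetchall()

# The same query, one clause at a time, in its conceptual order.
kept = [r for r in rows if r[1] > 2000]                    # WHERE: drops USA 1995
groups = defaultdict(list)
for country, _, budget in kept:                            # GROUP BY country
    groups[country].append(budget)
groups = {c: b for c, b in groups.items() if len(b) >= 2}  # HAVING: drops Japan
result = [(c, max(b)) for c, b in groups.items()]          # SELECT MAX(budget)
result.sort(key=lambda row: row[1], reverse=True)          # ORDER BY ... DESC
manual_result = result[:1]                                 # LIMIT 1

print(sql_result, manual_result)  # [('USA', 90)] [('USA', 90)]
```

Removing the WHERE line would raise USA's maximum to 100, and removing the HAVING line would let Japan's single film win, which is a quick way to see each clause at work.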

In [30]:
%%sql
SELECT release_year, country, max(budget) AS max_budget FROM films
GROUP BY release_year, country
ORDER BY release_year, country
LIMIT 4;

 * postgresql://localhost/films
4 rows affected.


| release_year | country | max_budget |
|---|---|---|
| 1916 | USA | 385907 |
| 1920 | USA | 100000 |
| 1925 | USA | 245000 |
| 1927 | Germany | 6000000 |


In [15]:
%%sql
SELECT country, release_year, min(gross) FROM films
GROUP BY country, release_year
ORDER BY country, release_year
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


| country | release_year | min |
|---|---|---|
| Afghanistan | 2003 | 1127331 |
| Argentina | 2000 | 1221261 |
| Argentina | 2004 | 304124 |

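Grouping by two columns produces one output row per distinct combination of their values. This sketch (invented rows, SQLite via Python's `sqlite3`) compares grouping by country and release_year with grouping by country alone.

```python
import sqlite3

# Hypothetical (country, release_year, gross) rows.
rows = [("USA", 2000, 10), ("USA", 2000, 30), ("USA", 2001, 5), ("Canada", 2000, 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (country TEXT, release_year INTEGER, gross INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", rows)

# One row per distinct (country, release_year) pair.
per_pair = conn.execute(
    "SELECT country, release_year, MIN(gross), COUNT(*) FROM films "
    "GROUP BY country, release_year ORDER BY country, release_year"
).fetchall()

# One row per country: the two USA years merge into a single group.
per_country = conn.execute(
    "SELECT country, MIN(gross), COUNT(*) FROM films "
    "GROUP BY country ORDER BY country"
).fetchall()

print(per_pair)     # [('Canada', 2000, 7, 1), ('USA', 2000, 10, 2), ('USA', 2001, 5, 1)]
print(per_country)  # [('Canada', 7, 1), ('USA', 5, 3)]
```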

___

## 4. GROUP BY & HAVING
- #### Filter Using Aggregate Function
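- In SQL, aggregate functions can't be used in WHERE clauses, because WHERE filters individual rows before any groups (and their aggregates) exist. For example, the following query is invalid even though its clauses are in the correct order:
> SELECT release_year
FROM films
**WHERE COUNT(title) > 10**
GROUP BY release_year;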

- So to filter on the result of an aggregate function, you need a different clause: HAVING, which is applied to each group after GROUP BY has formed it. For example,
>SELECT release_year
FROM films
GROUP BY release_year
**HAVING COUNT(title) > 10;**

- This returns only the years in which more than 10 films were released.

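The WHERE-versus-HAVING rule can be checked directly. In this sketch (made-up rows, SQLite via Python's `sqlite3`, with a threshold of 2 instead of 10 to suit the tiny table), the aggregate in WHERE is rejected when the statement is prepared, while the same condition in HAVING runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 2000), ("B", 2000), ("C", 2000), ("D", 2001)],
)

# Aggregate in WHERE: rejected, because WHERE is applied to single rows
# before any groups exist. (PostgreSQL rejects this query as well.)
try:
    conn.execute(
        "SELECT release_year FROM films "
        "WHERE COUNT(title) > 2 GROUP BY release_year"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# Same condition in HAVING: evaluated once per group, after GROUP BY.
busy_years = conn.execute(
    "SELECT release_year, COUNT(title) FROM films "
    "GROUP BY release_year HAVING COUNT(title) > 2"
).fetchall()

print(where_rejected, busy_years)  # True [(2000, 3)]
```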
In [16]:
%%sql
SELECT release_year, count(title) FROM films 
GROUP BY release_year
HAVING COUNT(*) > 10
LIMIT 3;

 * postgresql://localhost/films
3 rows affected.


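| release_year | count |
|---|---|
| 1988.0 | 31 |
|  | 42 |
| 2008.0 | 225 |

- The row with an empty release_year is the group of films whose release_year is NULL: GROUP BY puts all NULLs in the grouping column into a single group.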

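GROUP BY treats every NULL in the grouping column as part of the same group, even though NULL = NULL is not true in a WHERE comparison. A quick sketch with invented rows (SQLite via Python's `sqlite3`; PostgreSQL groups NULLs the same way) shows the NULL group and how WHERE ... IS NOT NULL removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 2000), ("B", None), ("C", None), ("D", 2000), ("E", 2001)],
)

# Python's None is stored as SQL NULL; both NULL rows land in one group.
with_nulls = dict(conn.execute(
    "SELECT release_year, COUNT(*) FROM films "
    "GROUP BY release_year HAVING COUNT(*) > 1"
).fetchall())

# Filtering NULLs out before grouping removes that group entirely.
without_nulls = dict(conn.execute(
    "SELECT release_year, COUNT(*) FROM films "
    "WHERE release_year IS NOT NULL "
    "GROUP BY release_year HAVING COUNT(*) > 1"
).fetchall())

# Dicts compare without regard to order, since no ORDER BY was used.
print(with_nulls == {None: 2, 2000: 2})  # True
print(without_nulls)                     # {2000: 2}
```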

___

## 5. WHERE, GROUP BY, HAVING


In [17]:
%%sql
SELECT release_year, 
       AVG(budget) AS avg_budget,
       AVG(gross) AS avg_gross FROM films
WHERE release_year > 1990
GROUP BY release_year
HAVING AVG(budget) > 60000000
ORDER BY avg_gross;

 * postgresql://localhost/films
2 rows affected.


| release_year | avg_budget | avg_gross |
|---|---|---|
| 2006 | 93968929.5774648 | 39237855.953703694 |
| 2005 | 70323938.23152709 | 41159143.2906404 |

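WHERE and HAVING can look interchangeable in a query like the one above, but they answer different questions: WHERE decides which rows feed the averages, while HAVING decides which finished groups are kept. The sketch below (made-up rows in arbitrary units; SQLite via Python's `sqlite3`) runs the same shape of query twice, once filtering groups on AVG(budget) and once filtering rows on budget, and the results differ.

```python
import sqlite3

# Hypothetical (release_year, budget, gross) rows.
rows = [
    (2005, 100, 10), (2005, 40, 50),
    (2006, 80, 30), (2006, 70, 40),
    (2007, 10, 5), (2007, 20, 15),
    (1989, 90, 60),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (release_year INTEGER, budget INTEGER, gross INTEGER)")
conn.executemany("INSERT INTO films VALUES (?, ?, ?)", rows)

# Group filter: keep years whose AVERAGE budget exceeds 60.
group_filter = conn.execute("""
    SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
    FROM films
    WHERE release_year > 1990
    GROUP BY release_year
    HAVING AVG(budget) > 60
    ORDER BY avg_gross
""").fetchall()

# Row filter: rows with budget <= 60 are dropped BEFORE averaging,
# so both the surviving years and their averages change.
row_filter = conn.execute("""
    SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
    FROM films
    WHERE release_year > 1990 AND budget > 60
    GROUP BY release_year
    ORDER BY avg_gross
""").fetchall()

print(group_filter)  # [(2005, 70.0, 30.0), (2006, 75.0, 35.0)]
print(row_filter)    # [(2005, 100.0, 10.0), (2006, 75.0, 35.0)]
```

Both queries also show that ORDER BY may refer to a SELECT alias such as avg_gross.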

___

- Get the country, average budget, and average gross of countries that have made more than 10 films. Order the result by country name, limit it to 5 rows, and alias the averages as avg_budget and avg_gross, respectively.

In [18]:
%%sql
SELECT country, AVG(budget) as avg_budget, AVG(gross) as avg_gross
FROM films
GROUP BY country
HAVING count(*) > 10
ORDER BY country
LIMIT 5;


 * postgresql://localhost/films
5 rows affected.


| country | avg_budget | avg_gross |
|---|---|---|
| Australia | 31172110.46 | 40205909.57142857 |
| Canada | 14798458.71559633 | 22432066.680555556 |
| China | 62219000.0 | 14143040.736842103 |
| Denmark | 13922222.222222222 | 1418469.1111111112 |
| France | 30672034.615384612 | 16350593.578512397 |
