# **SQL Notes**

## General 

### Order of Execution
SQL code is not processed in the order in which it is written. The actual order of execution of the operations is:

- `FROM`	first, SQL accesses the table from which the data is read
- `JOIN`	then SQL accesses the other tables that are joined to the initial table
- `WHERE`	then SQL filters the rows of the (joined) table
- `GROUP BY`	next, SQL groups the rows of the (joined) table, if applicable
- `HAVING`	then SQL filters the grouped data
- Window functions	next, SQL evaluates any window functions
- `SELECT`	then SQL evaluates the columns that the user has requested
- `DISTINCT`	then SQL removes duplicates from the selected columns, leaving only the unique values
- `UNION`	then SQL can combine the result with the results of other queries using set operations
- `ORDER BY`	then SQL sorts the rows of the resulting table
- `OFFSET`	then SQL skips a given number of rows of the resulting table
- `LIMIT / TOP`	finally, SQL outputs only part of the resulting table
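
The ordering above can be made concrete with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database; the `films` table and its rows are made up for illustration and are not the `cinema.db` data.

```python
import sqlite3

# Hypothetical in-memory table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [
        ("A", 1990, "USA"),
        ("B", 1990, "USA"),
        ("C", 1990, "France"),
        ("D", 1991, "USA"),
        ("E", 1992, "USA"),
        ("F", 1992, "USA"),
    ],
)

# The clauses are written in one order but evaluated in another
# (the numbers give the logical evaluation order).
query = """
SELECT release_year, COUNT(*) AS n_films  -- 5. build the output columns
FROM films                                -- 1. read the table
WHERE country = 'USA'                     -- 2. filter individual rows
GROUP BY release_year                     -- 3. group the remaining rows
HAVING COUNT(*) > 1                       -- 4. filter the groups
ORDER BY release_year DESC                -- 6. sort the result
LIMIT 1                                   -- 7. keep only the first row
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Film `C` is removed by `WHERE` before the grouping happens, the year 1991 is removed by `HAVING` after the grouping, and only then is the result sorted and cut down to a single row.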



## Filter Data

In [1]:
%load_ext sql
%sql sqlite:///cinema.db

### Where - Numeric

The `WHERE` clause can be used to filter numeric data with the following comparison operators:
- `=`	equal to
- `>`	greater than
- `<`	less than
- `>=`	greater than or equal to
- `<=`	less than or equal to
- `<>`	not equal to

In [2]:
%%sql
SELECT title 
FROM films
WHERE release_year = 1960

 * sqlite:///cinema.db
Done.


title
Elmer Gantry
Psycho
The Apartment


The `BETWEEN` operator can be used to filter numeric data between two values (both bounds are inclusive).

In [3]:
%%sql 
SELECT title
FROM films
WHERE release_year BETWEEN 1994 AND 2000
LIMIT 5

 * sqlite:///cinema.db
Done.


title
3 Ninjas Kick Back
A Low Down Dirty Shame
Ace Ventura: Pet Detective
Baby's Day Out
Beverly Hills Cop III
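
That both bounds are included can be checked with a minimal sketch, assuming a made-up in-memory `films` table (not the `cinema.db` data) and Python's built-in `sqlite3` module. It compares `BETWEEN` with the equivalent pair of `>=` and `<=` comparisons.

```python
import sqlite3

# Hypothetical in-memory table; titles and years are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Film 1993", 1993),
        ("Film 1994", 1994),
        ("Film 1997", 1997),
        ("Film 2000", 2000),
        ("Film 2001", 2001),
    ],
)

# BETWEEN keeps rows equal to either bound.
between_rows = conn.execute(
    "SELECT title FROM films "
    "WHERE release_year BETWEEN 1994 AND 2000 "
    "ORDER BY release_year"
).fetchall()

# The same filter written with explicit comparisons.
range_rows = conn.execute(
    "SELECT title FROM films "
    "WHERE release_year >= 1994 AND release_year <= 2000 "
    "ORDER BY release_year"
).fetchall()

print(between_rows)
print(between_rows == range_rows)
```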


To filter on several specific values in a `WHERE` clause, the `IN` operator may be used.

In [4]:
%%sql
SELECT title 
FROM films
WHERE release_year IN (1920, 1930, 1940)
LIMIT 5

 * sqlite:///cinema.db
Done.


title
Over the Hill to the Poorhouse
Hell's Angels
Boom Town
Fantasia
Pinocchio


### Where - Text

To filter the data on an entire (exact) string, the same comparison operators as for numeric data can be used.

The `LIKE` operator can be used to look for a pattern in the string values of a column. In the pattern, the `%` wildcard matches any sequence of zero or more characters:
- `'The%'`	String has to start with 'The'
- `'%The'`	String has to end with 'The'
- `'%The%'`	String must contain 'The'
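
The three `%` placements and an exact match can be compared side by side in a short sketch. It assumes a made-up in-memory `films` table and a hypothetical helper `titles_like`; note that in SQLite `LIKE` is case-insensitive for ASCII letters by default, so the sample titles are chosen so that case does not matter.

```python
import sqlite3

# Hypothetical in-memory table; the titles are chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?)",
    [("The Train",), ("Over The Hill",), ("Who Is The",), ("Psycho",)],
)


def titles_like(pattern):
    """Return the titles matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT title FROM films WHERE title LIKE ? ORDER BY title", (pattern,)
    )
    return [title for (title,) in rows]


starts_with = titles_like("The%")   # 'The' followed by anything
ends_with = titles_like("%The")     # anything followed by 'The'
contains = titles_like("%The%")     # 'The' anywhere in the string

# An exact match uses the ordinary = operator instead of LIKE.
exact = [
    title
    for (title,) in conn.execute("SELECT title FROM films WHERE title = 'Psycho'")
]

print(starts_with, ends_with, contains, exact)
```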

In [5]:
%%sql 
SELECT name
FROM people
WHERE name LIKE 'Ade%'

 * sqlite:///cinema.db
Done.


name
Adel Karam
Adelaide Kane
Aden Young


Alternatively, the `_` wildcard matches *exactly one character* at the position where it is placed.

In [6]:
%%sql
SELECT name
FROM people
WHERE name LIKE 'Ev_'

 * sqlite:///cinema.db
Done.


name
Eve


Similar to filtering numeric data, the `IN` operator may be used to check whether a string is in a list.

In [7]:
%%sql
SELECT title 
FROM films 
WHERE country IN ('Germany', 'France')
LIMIT 5

 * sqlite:///cinema.db
Done.


title
Metropolis
Pandora's Box
The Train
Une Femme Mariée
Pierrot le Fou


### Having - Grouped

The `WHERE` clause cannot filter on grouped (aggregated) data, as it is executed before the `GROUP BY` operation. The `HAVING` clause can be used instead.

In [8]:
%%sql 
SELECT release_year, COUNT(title) AS title_count
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10
LIMIT 5

 * sqlite:///cinema.db
Done.


release_year,title_count
1968,11
1970,12
1971,11
1977,16
1978,16
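
The difference between the two clauses can be sketched with Python's built-in `sqlite3` module on a made-up in-memory `films` table (not the `cinema.db` data). The sketch assumes SQLite's behaviour of rejecting an aggregate inside `WHERE` with an `OperationalError`; other databases report the same mistake with their own error types.

```python
import sqlite3

# Hypothetical in-memory table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 1990), ("B", 1990), ("C", 1991)],
)

# An aggregate in WHERE is rejected: the groups do not exist yet
# at the point where WHERE is evaluated.
try:
    conn.execute(
        "SELECT release_year, COUNT(title) FROM films "
        "WHERE COUNT(title) > 1 GROUP BY release_year"
    )
    where_failed = False
except sqlite3.OperationalError as err:
    where_failed = True
    print("WHERE with an aggregate failed:", err)

# The same condition in HAVING is evaluated after the grouping.
having_rows = conn.execute(
    "SELECT release_year, COUNT(title) AS title_count FROM films "
    "GROUP BY release_year HAVING COUNT(title) > 1"
).fetchall()
print(having_rows)
```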


## Functions

### Aggregate Functions

The five most commonly used built-in aggregate functions in SQL are
- `COUNT()`
- `MIN()`
- `MAX()`
- `AVG()`
- `SUM()`

`AVG()` and `SUM()` only make sense for numeric data, as they involve arithmetic. `COUNT()`, `MIN()` and `MAX()` may also be used on non-numeric data.

- `COUNT(field_name)`	number of records/rows with a non-null value in a field/column (excludes null values)
- `COUNT(*)`	number of records/rows in a table (includes rows that contain null values)
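
The difference between the two `COUNT` forms, and the use of `MIN()` and `MAX()` on text, can be seen in a minimal sketch. It assumes a made-up in-memory `films` table with a hypothetical `language` column in which some values are null.

```python
import sqlite3

# Hypothetical in-memory table; None is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, language TEXT)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [
        ("Film A", "English"),
        ("Film B", None),
        ("Film C", "French"),
        ("Film D", None),
    ],
)

# COUNT(*) counts rows, COUNT(language) counts non-null values,
# MIN/MAX on text compare the strings alphabetically.
n_rows, n_languages, first_title, last_title = conn.execute(
    "SELECT COUNT(*), COUNT(language), MIN(title), MAX(title) FROM films"
).fetchone()

print(n_rows, n_languages, first_title, last_title)
```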

### Arithmetic

SQL can perform basic arithmetic in the form of `+`, `-`, `*`, and `/`. It should be noted that in SQLite (and several other databases, such as PostgreSQL) dividing two integers returns an integer, e.g. `3 / 2` returns `1`; for decimal output at least one of the inputs should be a float, e.g. `3.0 / 2`.

The main difference between the aggregate functions and arithmetic is that the former perform the operation down a field/column, across all of its records/rows, while the latter performs the operation on the values within each record/row.

In [9]:
%%sql
SELECT(1*3)

 * sqlite:///cinema.db
Done.


(1*3)
3
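
The integer versus float behaviour of `/` can be checked with a short sketch using Python's built-in `sqlite3` module. It shows SQLite's behaviour only; other databases may treat integer division differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates; a float operand or an explicit CAST
# gives a decimal result.
int_div, float_div, cast_div = conn.execute(
    "SELECT 7 / 2, 7 / 2.0, CAST(7 AS REAL) / 2"
).fetchone()

print(int_div, float_div, cast_div)
```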


## Case

### `CASE` in `SELECT` Statement

When a new column/field needs to be created based on existing data the `CASE` operation may be used. This operation outputs a new value based on the existing data on a case-by-case basis.

In [10]:
%sql sqlite:///football.db

In [11]:
%%sql
SELECT id, home_goal, away_goal,
    CASE  
        WHEN home_goal > away_goal THEN 'Home team win'
        WHEN home_goal < away_goal THEN 'Away team win'
        ELSE 'TIE'
    END AS outcome
FROM matches
WHERE season = '2013/2014'
LIMIT 5

   sqlite:///cinema.db
 * sqlite:///football.db
Done.


id,home_goal,away_goal,outcome
1237,2,0,Home team win
1238,0,1,Away team win
1239,1,0,Home team win
1240,0,0,TIE
1241,2,1,Home team win


### `CASE` in `WHERE` Statement

The `WHERE` clause is executed before the `SELECT` clause. As a result, in standard SQL the alias of a `CASE` expression in the `SELECT` clause cannot be used in the `WHERE` clause (some databases, including SQLite, accept it as an extension). Instead, the entire `CASE` expression can be repeated in the `WHERE` clause in order to filter by its output.

In [16]:
%%sql
SELECT date, season,
    (CASE 
        WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win'
        WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'Chelsea away win'
    END) AS outcome
FROM matches
WHERE 
    (CASE 
        WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win'
        WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'Chelsea away win'
    END) IS NOT NULL
LIMIT 5

   sqlite:///cinema.db
 * sqlite:///football.db
Done.


date,season,outcome
2011-11-05T00:00:00.000,2011/2012,Chelsea away win
2011-11-26T00:00:00.000,2011/2012,Chelsea home win
2011-12-03T00:00:00.000,2011/2012,Chelsea away win
2011-12-12T00:00:00.000,2011/2012,Chelsea home win
2011-08-20T00:00:00.000,2011/2012,Chelsea home win
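
An alternative to repeating the `CASE` expression is to compute it once in a subquery and filter on its alias in the outer query. This is a different technique from the cell above, sketched here on a made-up in-memory `matches` table with invented rows; the subquery alias `labelled` is a hypothetical name.

```python
import sqlite3

# Hypothetical in-memory table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, hometeam_id INTEGER, "
    "awayteam_id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 8455, 1, 2, 0),  # home win for team 8455
        (2, 2, 8455, 0, 1),  # away win for team 8455
        (3, 8455, 3, 0, 0),  # tie, so the CASE returns NULL
        (4, 4, 5, 3, 1),     # other teams, so the CASE returns NULL
    ],
)

# The inner query builds the outcome column; by the time the outer
# WHERE runs, the alias already exists as a regular column.
query = """
SELECT id, outcome
FROM (
    SELECT id,
        CASE
            WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'home win'
            WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'away win'
        END AS outcome
    FROM matches
) AS labelled
WHERE outcome IS NOT NULL
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)
```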


### `CASE` with Aggregate Functions

The `CASE` operation will return a column of filtered data; one can then use an aggregate function to summarize the data into a single value.

The `COUNT` operator will count the number of non-null rows in this newly created column (so it does not actually matter what value is being returned in the column if the `CASE` operation is only used for counting). 

In [18]:
%%sql 
SELECT 
    season,
    COUNT(
        CASE 
            WHEN hometeam_id = 8650 AND home_goal > away_goal THEN id 
        END
    ) AS home_wins,
    COUNT(
        CASE 
            WHEN awayteam_id = 8650 AND away_goal > home_goal THEN id 
        END
    ) AS away_wins
FROM matches
GROUP BY season

   sqlite:///cinema.db
 * sqlite:///football.db
Done.


season,home_wins,away_wins
2011/2012,6,8
2012/2013,9,7
2013/2014,16,10
2014/2015,10,8


The `SUM` operator will sum all the numeric values in the newly created column (so in this case it does matter what values are being returned). 

In [19]:
%%sql 
SELECT 
    season,
    SUM(
        CASE 
            WHEN hometeam_id = 8650 THEN home_goal 
        END
    ) AS home_goals,
    SUM(
        CASE 
            WHEN awayteam_id = 8650 THEN away_goal
        END
    ) AS away_goals
FROM matches
GROUP BY season

   sqlite:///cinema.db
 * sqlite:///football.db
Done.


season,home_goals,away_goals
2011/2012,24,23
2012/2013,33,38
2013/2014,53,48
2014/2015,30,22


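By assigning a binary value (1 or 0) in the `CASE` operation, and subsequently using `AVG()`, one is able to easily calculate the proportion of how often something occurs. Rows that match none of the `WHEN` conditions (here ties and matches of other teams) return `NULL` and are ignored by `AVG()`.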

In [23]:
%%sql
SELECT
    season,
    ROUND(AVG(
        CASE
            WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
            WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
        END), 2) AS pct_homewins,
    ROUND(AVG(
        CASE
            WHEN awayteam_id = 8455 AND home_goal < away_goal THEN 1
            WHEN awayteam_id = 8455 AND home_goal > away_goal THEN 0
        END), 2) AS pct_awaywins
FROM matches
GROUP BY season

   sqlite:///cinema.db
 * sqlite:///football.db
Done.


season,pct_homewins,pct_awaywins
2011/2012,0.75,0.5
2012/2013,0.86,0.67
2013/2014,0.94,0.67
2014/2015,1.0,0.79
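
Whether ties count towards the denominator depends on what the `CASE` operation returns for them. The sketch below assumes a made-up in-memory `matches` table with four invented home matches of one team, and compares leaving ties as `NULL` with mapping them to 0 through `ELSE`.

```python
import sqlite3

# Hypothetical in-memory table: two wins, one tie, one loss at home.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, hometeam_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [(1, 8455, 2, 0), (2, 8455, 3, 1), (3, 8455, 1, 1), (4, 8455, 0, 2)],
)

query = """
SELECT
    -- Ties match no WHEN, become NULL and are skipped by AVG: 2 of 3.
    ROUND(AVG(CASE
        WHEN home_goal > away_goal THEN 1
        WHEN home_goal < away_goal THEN 0
    END), 2) AS pct_wins_excl_ties,
    -- ELSE 0 turns ties into 0, so they are counted: 2 of 4.
    ROUND(AVG(CASE
        WHEN home_goal > away_goal THEN 1
        ELSE 0
    END), 2) AS pct_wins_incl_ties
FROM matches
WHERE hometeam_id = 8455
"""
excl_ties, incl_ties = conn.execute(query).fetchone()
print(excl_ties, incl_ties)
```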


## Joining Tables

### Inner Join

The `INNER JOIN` command looks for records that match in both tables, and only returns the matching records.

<img src="images/inner_join.png" width="600"></img>

In [34]:
%sql sqlite:///world.db

In [33]:
%%sql
SELECT pm.country, pm.continent, pm.prime_minister, p.president 
FROM prime_ministers AS pm
INNER JOIN presidents AS p USING(country)
LIMIT 5;

-- OR --

SELECT pm.country, pm.continent, pm.prime_minister, p.president 
FROM prime_ministers AS pm
INNER JOIN presidents AS p ON pm.country = p.country
LIMIT 5;

   sqlite:///cinema.db
   sqlite:///football.db
 * sqlite:///world.db
Done.
Done.


country,continent,prime_minister,president
Egypt,Africa,Sherif Ismail,Abdel Fattah el-Sisi
Portugal,Europe,Antonio Costa,Marcelo Rebelo de Sousa
Vietnam,Asia,Nguyen Xuan Phuc,Tran Dai Quang
Haiti,North America,Jack Guy Lafontant,Jovenel Moise


One can join on multiple keys by combining the conditions in the `ON` clause with the `AND` keyword.

<img src="images/inner_join_and.png" width="800"></img>
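
A minimal sketch of a join on two keys, assuming two made-up in-memory tables, `populations` and `economies`, whose names, columns and values are all hypothetical. It also shows how many extra rows appear when the second key is left out.

```python
import sqlite3

# Hypothetical in-memory tables; codes and numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE populations (country_code TEXT, year INTEGER, size INTEGER)")
conn.execute("CREATE TABLE economies (code TEXT, year INTEGER, gdp INTEGER)")
conn.executemany(
    "INSERT INTO populations VALUES (?, ?, ?)",
    [("AAA", 2010, 100), ("AAA", 2015, 110), ("BBB", 2010, 50)],
)
conn.executemany(
    "INSERT INTO economies VALUES (?, ?, ?)",
    [("AAA", 2010, 7), ("AAA", 2015, 8), ("BBB", 2015, 3)],
)

# Both the country code and the year have to match.
both_keys = conn.execute(
    """
    SELECT p.country_code, p.year, p.size, e.gdp
    FROM populations AS p
    INNER JOIN economies AS e
        ON p.country_code = e.code AND p.year = e.year
    ORDER BY p.country_code, p.year
    """
).fetchall()

# With the code alone, every year of a country pairs with every other year.
(code_only_count,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM populations AS p
    INNER JOIN economies AS e ON p.country_code = e.code
    """
).fetchone()

print(both_keys)
print(code_only_count)
```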


One can join multiple tables by chaining `JOIN ... ON` commands

<img src="images/inner_join_chain.png" width="700"></img>


In [61]:
%%sql
SELECT 
    ci.name AS city_name, 
    co.name AS country_name, 
    cu.basic_unit AS currency
FROM cities AS ci
INNER JOIN countries AS co ON co.code = ci.country_code
INNER JOIN currencies AS cu ON cu.code = ci.country_code
LIMIT 5

   sqlite:///cinema.db
   sqlite:///football.db
 * sqlite:///world.db
Done.


city_name,country_name,currency
Abidjan,Cote d'Ivoire,West African CFA franc
Abu Dhabi,United Arab Emirates,United Arab Emirates dirham
Abuja,Nigeria,Nigerian naira
Accra,Ghana,Ghanaian cedi
Addis Ababa,Ethiopia,Ethiopian birr


