# **Extract Data**

In [1]:
%load_ext sql
%sql sqlite:///databases/football.db

## Aggregate Functions

Five commonly used built-in aggregate functions in SQL are:
- `COUNT()`
- `MIN()`
- `MAX()`
- `AVG()`
- `SUM()`

The `AVG()` and `SUM()` functions are meant for numeric data, since they involve arithmetic. `COUNT()`, `MIN()`, and `MAX()` also work on non-numeric data such as text and dates.
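As a minimal sketch of the point above, the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The `teams` table and its rows are made up for illustration and are not part of `football.db`.

```python
import sqlite3

# Toy in-memory table; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (name TEXT, founded INTEGER)")
conn.executemany(
    "INSERT INTO teams VALUES (?, ?)",
    [("Chelsea", 1905), ("Arsenal", 1886), ("Everton", 1878)],
)

# MIN/MAX on a text column compare strings (alphabetical order here).
first_name, last_name = conn.execute(
    "SELECT MIN(name), MAX(name) FROM teams"
).fetchone()

# AVG is applied to a numeric column.
avg_founded = conn.execute("SELECT AVG(founded) FROM teams").fetchone()[0]
conn.close()

print(first_name, last_name, avg_founded)
```

Here `MIN(name)` and `MAX(name)` return the alphabetically first and last team names, while `AVG(founded)` returns a float.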


- `COUNT(field_name)`: number of records/rows with a value in the given field/column (excludes null values)
- `COUNT(*)`: number of records/rows in a table (includes rows that contain null values)
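The difference between the two forms of `COUNT` can be sketched on a toy table. The `players` table and its contents below are hypothetical and exist only in memory.

```python
import sqlite3

# Toy table with two NULL nicknames; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("A", "Ace"), ("B", None), ("C", None), ("D", "Dee")],
)

# COUNT(*) counts every row; COUNT(nickname) skips rows where nickname is NULL.
total_rows, with_nickname = conn.execute(
    "SELECT COUNT(*), COUNT(nickname) FROM players"
).fetchone()
conn.close()

print(total_rows, with_nickname)  # 4 2
```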

## Arithmetic

SQL can perform basic arithmetic with `+`, `-`, `*`, and `/`. Note that in SQLite (the engine used here) integer inputs return integer output, so dividing two integers discards the fractional part; for decimal output, make at least one operand a float.
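The integer-versus-float behaviour can be checked with a short sketch against an in-memory SQLite database (no tables needed). Other database engines may treat `/` differently, so this is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integer / integer truncates; a float operand or an explicit CAST keeps the decimals.
int_div, float_div, cast_div = conn.execute(
    "SELECT 7 / 2, 7 / 2.0, CAST(7 AS REAL) / 2"
).fetchone()
conn.close()

print(int_div, float_div, cast_div)  # 3 3.5 3.5
```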

The main difference between aggregate functions and arithmetic is direction: an aggregate function works down a field/column, combining the values of many records/rows into one result, while arithmetic works across a single record/row, combining the values of its fields and producing one result per row.
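To make the contrast concrete, here is a small sketch on a toy `matches` table. The three rows are made up and are not taken from `football.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, 2, 0), (2, 0, 1), (3, 1, 1)],
)

# Arithmetic: one result per row (total goals in each match).
per_row = conn.execute(
    "SELECT id, home_goal + away_goal FROM matches ORDER BY id"
).fetchall()

# Aggregate: one result per column (goals summed over all matches).
totals = conn.execute(
    "SELECT SUM(home_goal), SUM(away_goal) FROM matches"
).fetchone()
conn.close()

print(per_row)  # [(1, 2), (2, 1), (3, 2)]
print(totals)   # (3, 2)
```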

In [2]:
%%sql
SELECT(1*3)

 * sqlite:///databases/football.db
Done.


(1*3)
3


## Case

### `CASE` in `SELECT` Statement

When a new column/field needs to be created based on existing data, the `CASE` expression may be used. It evaluates its `WHEN` conditions in order for each row and outputs the value of the first one that matches, falling back to `ELSE` (or null if there is no `ELSE`).

In [3]:
%%sql
SELECT id, home_goal, away_goal,
    CASE  
        WHEN home_goal > away_goal THEN 'Home team win'
        WHEN home_goal < away_goal THEN 'Away team win'
        ELSE 'TIE'
    END AS outcome
FROM matches
WHERE season = '2013/2014'
LIMIT 5

 * sqlite:///databases/football.db
Done.


id,home_goal,away_goal,outcome
1237,2,0,Home team win
1238,0,1,Away team win
1239,1,0,Home team win
1240,0,0,TIE
1241,2,1,Home team win
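Two details of `CASE` are easy to miss: the `WHEN` branches are tested in order and the first match wins, and a row that matches no branch gets null when there is no `ELSE`. The sketch below illustrates both on a toy in-memory table with made-up rows and hypothetical labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, 3, 0), (2, 1, 0), (3, 0, 0)],
)

# The second branch is never reached: any margin >= 3 already matched the first.
# Match 3 matches no branch and there is no ELSE, so its label is NULL (None).
labels = conn.execute(
    """
    SELECT id,
        CASE
            WHEN home_goal - away_goal >= 1 THEN 'win'
            WHEN home_goal - away_goal >= 3 THEN 'big win'
        END AS label
    FROM matches
    ORDER BY id
    """
).fetchall()
conn.close()

print(labels)  # [(1, 'win'), (2, 'win'), (3, None)]
```

Placing the most specific condition first would let the `'big win'` label apply to match 1.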


### `CASE` in `WHERE` Statement

The `WHERE` clause is evaluated before the `SELECT` list. As a result, in standard SQL the alias given to a `CASE` expression in the `SELECT` list cannot be referenced in the `WHERE` clause (some engines are more lenient, but relying on that is not portable). Instead, the entire `CASE` expression can be repeated in the `WHERE` clause in order to filter by its output.

In [4]:
%%sql
SELECT date, season,
    (CASE 
        WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win'
        WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'Chelsea away win'
    END) AS outcome
FROM matches
WHERE 
    (CASE 
        WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win'
        WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'Chelsea away win'
    END) IS NOT NULL
LIMIT 5


 * sqlite:///databases/football.db
Done.


date,season,outcome
2011-11-05T00:00:00.000,2011/2012,Chelsea away win
2011-11-26T00:00:00.000,2011/2012,Chelsea home win
2011-12-03T00:00:00.000,2011/2012,Chelsea away win
2011-12-12T00:00:00.000,2011/2012,Chelsea home win
2011-08-20T00:00:00.000,2011/2012,Chelsea home win
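An alternative to repeating the `CASE` expression, not used in this notebook, is to compute it once in a subquery and filter on the alias in the outer query. The sketch below shows the idea on a toy in-memory `matches` table; the rows, the `labelled` alias, and the shortened outcome labels are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE matches (
        id INTEGER, hometeam_id INTEGER, awayteam_id INTEGER,
        home_goal INTEGER, away_goal INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 8455, 1, 2, 0),  # team 8455 wins at home
        (2, 1, 8455, 0, 1),  # team 8455 wins away
        (3, 8455, 2, 0, 0),  # tie: no branch matches, outcome is NULL
        (4, 1, 2, 3, 1),     # team 8455 not involved
    ],
)

# The inner query names the CASE result; the outer WHERE can then use that name.
wins = conn.execute(
    """
    SELECT id, outcome
    FROM (
        SELECT id,
            CASE
                WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'home win'
                WHEN awayteam_id = 8455 AND away_goal > home_goal THEN 'away win'
            END AS outcome
        FROM matches
    ) AS labelled
    WHERE outcome IS NOT NULL
    ORDER BY id
    """
).fetchall()
conn.close()

print(wins)  # [(1, 'home win'), (2, 'away win')]
```

This keeps the conditions in one place, at the cost of an extra level of nesting.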


### `CASE` with Aggregate Functions

A `CASE` expression without an `ELSE` returns a value for the rows that match a `WHEN` condition and null for all other rows. An aggregate function can then summarize this column into a single value, ignoring the nulls.

The `COUNT()` function counts the number of non-null rows in this newly created column, so it does not actually matter which value `CASE` returns when it is only used for counting.

In [5]:
%%sql 
SELECT 
    season,
    COUNT(
        CASE 
            WHEN hometeam_id = 8650 AND home_goal > away_goal THEN id 
        END
    ) AS home_wins,
    COUNT(
        CASE 
            WHEN awayteam_id = 8650 AND away_goal > home_goal THEN id 
        END
    ) AS away_wins
FROM matches
GROUP BY season

 * sqlite:///databases/football.db
Done.


season,home_wins,away_wins
2011/2012,6,8
2012/2013,9,7
2013/2014,16,10
2014/2015,10,8
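To check that the value returned inside `COUNT(CASE ...)` is irrelevant, the sketch below counts the same condition three ways on a toy table: returning `id`, returning a constant string, and the equivalent `SUM(CASE ... THEN 1 ELSE 0 END)` form. The rows and the season labels `s1` and `s2` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE matches (
        id INTEGER, season TEXT, hometeam_id INTEGER,
        home_goal INTEGER, away_goal INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?, ?)",
    [
        (10, "s1", 8650, 2, 0),
        (11, "s1", 8650, 0, 0),
        (12, "s1", 8650, 3, 1),
        (13, "s2", 8650, 1, 0),
        (14, "s2", 7, 4, 0),
    ],
)

home_wins = conn.execute(
    """
    SELECT season,
        COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal THEN id END),
        COUNT(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal THEN 'x' END),
        SUM(CASE WHEN hometeam_id = 8650 AND home_goal > away_goal THEN 1 ELSE 0 END)
    FROM matches
    GROUP BY season
    ORDER BY season
    """
).fetchall()
conn.close()

print(home_wins)  # [('s1', 2, 2, 2), ('s2', 1, 1, 1)]
```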


The `SUM()` function adds up the numeric values in the newly created column and skips nulls, so in this case the value returned by `CASE` does matter.

In [6]:
%%sql 
SELECT 
    season,
    SUM(
        CASE 
            WHEN hometeam_id = 8650 THEN home_goal 
        END
    ) AS home_goals,
    SUM(
        CASE 
            WHEN awayteam_id = 8650 THEN away_goal
        END
    ) AS away_goals
FROM matches
GROUP BY season

 * sqlite:///databases/football.db
Done.


season,home_goals,away_goals
2011/2012,24,23
2012/2013,33,38
2013/2014,53,48
2014/2015,30,22
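One edge case to keep in mind: when no row in a group matches the `WHEN` condition, `SUM()` receives only nulls and returns null rather than 0. Adding `ELSE 0` is one way to get a zero instead. The sketch below shows both variants on a toy table with made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (season TEXT, hometeam_id INTEGER, home_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [("s1", 8650, 2), ("s1", 8650, 3), ("s2", 7, 1)],
)

# In season 's2' team 8650 has no home matches, so the first SUM sees only NULLs.
home_goals = conn.execute(
    """
    SELECT season,
        SUM(CASE WHEN hometeam_id = 8650 THEN home_goal END),
        SUM(CASE WHEN hometeam_id = 8650 THEN home_goal ELSE 0 END)
    FROM matches
    GROUP BY season
    ORDER BY season
    """
).fetchall()
conn.close()

print(home_goals)  # [('s1', 5, 5), ('s2', None, 0)]
```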


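By assigning a binary value (1 or 0) in the `CASE` expression and then applying `AVG()`, one can easily calculate the proportion of rows in which something occurs. Rows that match no `WHEN` condition return null and are ignored by `AVG()`; in the query below that applies to ties, so the result is the share of wins among matches that were either won or lost.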

In [7]:
%%sql
SELECT
    season,
    ROUND(AVG(
        CASE
            WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 1
            WHEN hometeam_id = 8455 AND home_goal < away_goal THEN 0
        END), 2) AS pct_homewins,
    ROUND(AVG(
        CASE
            WHEN awayteam_id = 8455 AND home_goal < away_goal THEN 1
            WHEN awayteam_id = 8455 AND home_goal > away_goal THEN 0
        END), 2) AS pct_awaywins
FROM matches
GROUP BY season

 * sqlite:///databases/football.db
Done.


season,pct_homewins,pct_awaywins
2011/2012,0.75,0.5
2012/2013,0.86,0.67
2013/2014,0.94,0.67
2014/2015,1.0,0.79
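Because `AVG()` skips nulls, the choice between leaving ties unmatched and mapping them to 0 changes the denominator. The sketch below compares both on a toy table of four made-up home matches (two wins, one tie, one loss).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (home_goal INTEGER, away_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [(2, 0), (1, 0), (1, 1), (0, 3)],  # win, win, tie, loss
)

excl_ties, incl_ties = conn.execute(
    """
    SELECT
        -- Ties match no branch, become NULL, and are left out: 2 wins / 3 matches.
        ROUND(AVG(CASE
            WHEN home_goal > away_goal THEN 1
            WHEN home_goal < away_goal THEN 0
        END), 2),
        -- Ties count as 0 and stay in the denominator: 2 wins / 4 matches.
        ROUND(AVG(CASE WHEN home_goal > away_goal THEN 1 ELSE 0 END), 2)
    FROM matches
    """
).fetchone()
conn.close()

print(excl_ties, incl_ties)  # 0.67 0.5
```

Which variant is appropriate depends on whether a tie should count against the win rate.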
