# **Window Functions**

A window function performs a calculation over an already generated set of results (a window). The window can be either the entire data set or a smaller section of it.

In [1]:
%load_ext sql
%sql sqlite:///databases/football.db

## `OVER()` Function

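The `OVER()` clause tells SQL to compute an aggregate over a window and attach the result to every row, instead of collapsing the rows into one. With empty parentheses, the window is the entire result set. The syntax is simpler and clearer than the equivalent scalar subquery, and it is usually faster in terms of processing time as well.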

In [2]:
%%sql
SELECT 
    date,
    (home_goal + away_goal) AS goals,
    (
        SELECT AVG(home_goal + away_goal) 
        FROM matches
        WHERE season = '2011/2012'
    ) AS overall_avg
FROM matches
WHERE season = '2011/2012'
LIMIT(5);

-- is equivalent to --

SELECT 
    date,
    (home_goal + away_goal) AS goals,
    AVG(home_goal + away_goal) OVER() AS overall_avg
FROM matches
WHERE season = '2011/2012'
LIMIT(5)


 * sqlite:///databases/football.db
Done.
Done.


date,goals,overall_avg
2011-07-29T00:00:00.000,3,2.7164596273291925
2011-07-30T00:00:00.000,2,2.7164596273291925
2011-07-30T00:00:00.000,4,2.7164596273291925
2011-07-30T00:00:00.000,1,2.7164596273291925
2011-07-30T00:00:00.000,0,2.7164596273291925
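The equivalence can also be checked outside the notebook. The following is a minimal sketch using Python's built-in `sqlite3` module on a made-up, in-memory `matches` table (the three rows are illustrative assumptions, not data from `football.db`). It assumes the bundled SQLite is version 3.25 or later, since earlier versions do not support window functions.

```python
import sqlite3

# Hypothetical toy table standing in for `matches`; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (season TEXT, home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [("2011/2012", 2, 1), ("2011/2012", 0, 0), ("2011/2012", 3, 3)],
)

# Scalar subquery: the overall average comes from a separate nested query.
subquery_rows = conn.execute("""
    SELECT
        (home_goal + away_goal) AS goals,
        (SELECT AVG(home_goal + away_goal) FROM matches) AS overall_avg
    FROM matches
    ORDER BY goals
""").fetchall()

# Window function: OVER() with empty parentheses spans every row of the result.
window_rows = conn.execute("""
    SELECT
        (home_goal + away_goal) AS goals,
        AVG(home_goal + away_goal) OVER() AS overall_avg
    FROM matches
    ORDER BY goals
""").fetchall()

print(subquery_rows == window_rows)
for row in window_rows:
    print(row)
```

Both queries keep one output row per match and repeat the same average on each of them, which is why the two result lists compare equal.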


## `RANK()` Function

The `RANK()` function assigns a rank to each row according to the ordering given in the `OVER` clause. For example, to rank matches by the number of goals scored, an `ORDER BY` is passed inside `OVER()`.

Note that in case of a tie, the tied rows receive the same rank and the ranks that follow are skipped. For example, if two rows tie for rank #1, both rows receive rank #1, rank #2 is skipped, and the ranking continues at rank #3.

In [3]:
%%sql
SELECT
    date,
    (home_goal + away_goal) AS goals,
    RANK() OVER(ORDER BY home_goal + away_goal DESC) AS goals_rank
FROM matches
WHERE season = '2011/2012'
LIMIT(5)

 * sqlite:///databases/football.db
Done.


date,goals,goals_rank
2011-08-28T00:00:00.000,10,1
2011-11-06T00:00:00.000,10,1
2011-10-29T00:00:00.000,9,3
2012-02-12T00:00:00.000,9,3
2012-03-09T00:00:00.000,9,3
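The tie behaviour can be reproduced on a small table. This is a minimal sketch with Python's built-in `sqlite3` module; the `scores` table, its columns, and its values are hypothetical and chosen only to produce ties, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Hypothetical table with deliberate ties in `goals`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (match_id INTEGER, goals INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 9), (4, 9), (5, 9), (6, 8)],
)

# RANK() gives tied rows the same rank and then skips the ranks they occupy.
ranked = conn.execute("""
    SELECT
        match_id,
        goals,
        RANK() OVER(ORDER BY goals DESC) AS goals_rank
    FROM scores
    ORDER BY goals DESC, match_id
""").fetchall()

for row in ranked:
    print(row)
```

The two 10-goal rows share rank 1, the three 9-goal rows share rank 3, and the single 8-goal row lands on rank 6, because ranks 4 and 5 are skipped.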


## `PARTITION BY` Operation

The `PARTITION BY` operation allows different values to be calculated for different categories within the same column. It is similar to a `GROUP BY` operation, except that with `GROUP BY` the resulting aggregate table needs to be joined back onto the original table, whereas `PARTITION BY` keeps every original row.
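The relationship between the two approaches can be sketched on a toy table with Python's built-in `sqlite3` module. The table below is hypothetical: it uses a single made-up `goals` column instead of `home_goal + away_goal`, and it assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical simplified table: one `goals` value per match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, season TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [
        (1, "2011/2012", 2),
        (2, "2011/2012", 4),
        (3, "2012/2013", 1),
        (4, "2012/2013", 2),
    ],
)

# GROUP BY collapses each season to one row, so it must be joined back.
joined = conn.execute("""
    WITH seasonal_avg AS (
        SELECT season, AVG(goals) AS season_avg
        FROM matches
        GROUP BY season
    )
    SELECT m.id, m.goals, s.season_avg
    FROM matches AS m
    INNER JOIN seasonal_avg AS s ON m.season = s.season
    ORDER BY m.id
""").fetchall()

# PARTITION BY computes the same per-season average without losing any rows.
windowed = conn.execute("""
    SELECT id, goals, AVG(goals) OVER(PARTITION BY season) AS season_avg
    FROM matches
    ORDER BY id
""").fetchall()

print(joined == windowed)
for row in windowed:
    print(row)
```

Each match keeps its own `goals` value while the average is repeated for every row of the same season.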

<img src="images/partition_by.png" width="800"></img>

In [4]:
%%sql
WITH seasonal_avg AS (
    SELECT 
        season,
        AVG(home_goal + away_goal) AS season_avg
    FROM matches
    GROUP BY season
)
SELECT 
    m.date,
    (m.home_goal + m.away_goal) AS goals,
    s.season_avg 
FROM matches AS m
INNER JOIN seasonal_avg AS s ON m.season = s.season
ORDER BY m.date
LIMIT(5);

-- is equivalent to --

SELECT 
    date,
    (home_goal + away_goal) AS goals,
    AVG(home_goal + away_goal) OVER(PARTITION BY season) AS season_avg
FROM matches
ORDER BY date
LIMIT(5)


The partition can also be based on multiple columns, simply by passing multiple columns to `PARTITION BY` inside the `OVER` clause.

In [5]:
%%sql
SELECT 
    c.name,
    m.season,
    (m.home_goal + m.away_goal) AS goals,
    (
        ROUND(AVG(home_goal + away_goal) 
        OVER(PARTITION BY m.season, c.name), 2)   
    ) AS season_country_avg
FROM country AS c
LEFT JOIN matches AS m ON c.id = m.country_id
ORDER BY RANDOM()
LIMIT(5)

 * sqlite:///databases/football.db
Done.


name,season,goals,season_country_avg
Germany,2012/2013,7,2.93
France,2011/2012,2,2.52
Italy,2013/2014,3,2.72
Germany,2012/2013,0,2.93
Switzerland,2014/2015,4,2.87
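A multi-column partition can be verified by hand on a small data set. The sketch below uses Python's built-in `sqlite3` module with a hypothetical, already-joined `results` table (the table name, columns, and values are assumptions for illustration), and assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical flattened table: country and season already on each row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (country TEXT, season TEXT, goals INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("Germany", "2011/2012", 2),
        ("Germany", "2011/2012", 4),
        ("Germany", "2012/2013", 5),
        ("France", "2011/2012", 1),
        ("France", "2011/2012", 2),
    ],
)

# One partition per distinct (season, country) pair.
season_country = conn.execute("""
    SELECT
        country,
        season,
        goals,
        ROUND(AVG(goals) OVER(PARTITION BY season, country), 2)
            AS season_country_avg
    FROM results
    ORDER BY country, season, goals
""").fetchall()

for row in season_country:
    print(row)
```

Rows only share an average when both the season and the country match, so Germany's two seasons get different values.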


## Sliding Window

A sliding window is a window function where the calculation is performed on a set of sequential rows defined relative to the current row, so the window moves as the current row changes.

<img src="images/sliding_window.png" width="800"></img>

The general syntax is as follows:

```
ROWS BETWEEN <start> AND <finish>
```

Where `<start>` and `<finish>` are each replaced by one of the following:
- `CURRENT ROW` = the current row
- `n PRECEDING` = `n` rows before the current row
- `n FOLLOWING` = `n` rows after the current row
- `UNBOUNDED PRECEDING` = the first row of the partition (only valid as `<start>`)
- `UNBOUNDED FOLLOWING` = the last row of the partition (only valid as `<finish>`)
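The effect of the different frame boundaries can be compared side by side. This is a minimal sketch with Python's built-in `sqlite3` module on a hypothetical `games` table (the table name, the integer `day` column, and the values are assumptions for illustration); SQLite 3.25 or later is assumed.

```python
import sqlite3

# Hypothetical table: one row per game, ordered by an integer `day`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (day INTEGER, goals INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?)",
    [(1, 4), (2, 3), (3, 2), (4, 4), (5, 3)],
)

frames = conn.execute("""
    SELECT
        day,
        goals,
        -- everything from the first row up to the current row
        SUM(goals) OVER(
            ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total,
        -- the previous row, the current row, and the next row
        SUM(goals) OVER(
            ORDER BY day ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        ) AS neighbour_sum,
        -- everything from the current row to the last row
        SUM(goals) OVER(
            ORDER BY day ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
        ) AS remaining_total
    FROM games
    ORDER BY day
""").fetchall()

for row in frames:
    print(row)
```

At the edges of the table the `1 PRECEDING AND 1 FOLLOWING` frame simply contains fewer rows, so the first and last values of `neighbour_sum` are sums of two rows rather than three.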


In [6]:
%%sql
SELECT  
    date,
    home_goal,
    away_goal,
    SUM(home_goal) OVER(
        ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM matches
WHERE hometeam_id=8456 AND season = '2011/2012'
LIMIT(5)

 * sqlite:///databases/football.db
Done.


date,home_goal,away_goal,running_total
2011-08-15T00:00:00.000,4,0,4
2011-09-10T00:00:00.000,3,0,7
2011-09-24T00:00:00.000,2,0,9
2011-10-15T00:00:00.000,4,1,13
2011-10-29T00:00:00.000,3,1,16
