<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 2. Fetching, ranking, and paging
---
In this chapter, you'll learn three practical applications of window functions: fetching values from different parts of the table, ranking rows according to their values, and binning rows into pages of roughly equal size.

In [1]:
%load_ext sql

In [2]:
%sql sqlite:///data/summer.db

'Connected: @data/summer.db'

### The four functions

**Relative**

- `LAG(column, n)` returns the `column` value from the row `n` rows before the current row

- `LEAD(column, n)` returns the `column` value from the row `n` rows after the current row

**Absolute**

- `FIRST_VALUE(column)` returns the first value in the table or partition

- `LAST_VALUE(column)` returns the last value in the table or partition, provided the window frame extends to the last row
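
To see the four functions side by side, here is a minimal sketch using Python's built-in `sqlite3` module on an in-memory table. The `champions` table and the athlete names `A` to `D` are hypothetical placeholders, not rows from `summer.db`, and the sketch assumes the bundled SQLite is version 3.25 or later (the first release with window functions).

```python
import sqlite3

# Hypothetical toy table: one placeholder champion per Games.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE champions (year INTEGER, athlete TEXT)")
conn.executemany(
    "INSERT INTO champions VALUES (?, ?)",
    [(2000, "A"), (2004, "B"), (2008, "C"), (2012, "D")],
)

rows = conn.execute(
    """
    SELECT year,
           athlete,
           LAG(athlete, 1)      OVER w AS previous_champion,
           LEAD(athlete, 1)     OVER w AS next_champion,
           FIRST_VALUE(athlete) OVER w AS first_champion,
           LAST_VALUE(athlete)  OVER w AS last_champion
    FROM   champions
    -- The explicit frame lets LAST_VALUE see every row, not just those up to the current one.
    WINDOW w AS (ORDER BY year ASC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    ORDER  BY year ASC
    """
).fetchall()

for row in rows:
    print(row)
```

The relative functions return `None` (SQL `NULL`) at the edges of the table, where there is no previous or next row, while the absolute functions return the same value on every row.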

## Future gold medalists
---

Fetching functions allow you to get values from different parts of the table into one row. If you have time-ordered data, you can "peek into the future" with the `LEAD` fetching function. This is especially useful if you want to compare a current value to a future value.

### Instructions

For each year, fetch the current gold medalist and the gold medalist 3 competitions ahead of the current row.

In [3]:
%%sql

WITH Discus_Medalists 
     AS (SELECT DISTINCT year,
                         athlete
         FROM   summer_medals
         WHERE  medal = 'Gold'
                AND event = 'Discus Throw'
                AND gender = 'Women'
                AND year >= 2000)
SELECT year,
       athlete,
       LEAD(athlete, 3)
         OVER (
           ORDER BY year ASC) AS Future_Champion
FROM   Discus_Medalists 
ORDER  BY year ASC 

 * sqlite:///data/summer.db
Done.


year,athlete,Future_Champion
2000,ZVEREVA Ellina,PERKOVIC Sandra
2004,SADOVA Natalya,
2008,BROWN TRAFTON Stephanie,
2012,PERKOVIC Sandra,
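
The empty `Future_Champion` cells are `NULL`s: `LEAD(athlete, 3)` has no row three positions ahead for 2004, 2008, and 2012. As a hedged illustration, the sketch below reproduces that behavior with Python's `sqlite3` on a hypothetical `medalists` table (placeholder names `A` to `D`) and shows the optional third argument of `LEAD`, which SQLite accepts as a default value to return instead of `NULL`.

```python
import sqlite3

# Hypothetical stand-in for the Discus_Medalists CTE, with placeholder names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medalists (year INTEGER, athlete TEXT)")
conn.executemany(
    "INSERT INTO medalists VALUES (?, ?)",
    [(2000, "A"), (2004, "B"), (2008, "C"), (2012, "D")],
)

future = conn.execute(
    """
    SELECT year,
           athlete,
           -- NULL when there is no row 3 positions ahead
           LEAD(athlete, 3)        OVER (ORDER BY year ASC) AS future_champion,
           -- third argument: value to use instead of NULL
           LEAD(athlete, 3, 'n/a') OVER (ORDER BY year ASC) AS future_or_default
    FROM   medalists
    ORDER  BY year ASC
    """
).fetchall()

for row in future:
    print(row)
```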


## First athlete by name
---

It's often useful to get the first or last value in a dataset to compare all other values to it. With absolute fetching functions like `FIRST_VALUE`, you can fetch a value at an absolute position in the table, like its beginning or end.

### Instructions

Return all athletes and, alongside each one, the first athlete in alphabetical order.

In [5]:
%%sql

WITH All_Male_Medalists 
     AS (SELECT DISTINCT athlete
         FROM   summer_medals
         WHERE  medal = 'Gold'
                AND gender = 'Men')
SELECT athlete,
       FIRST_VALUE(athlete)
         OVER (
           ORDER BY athlete ASC ) AS First_Athlete
FROM   All_Male_Medalists 
LIMIT  5

 * sqlite:///data/summer.db
Done.


athlete,First_Athlete
AABYE Edgar,AABYE Edgar
AALTONEN Paavo Johannes,AABYE Edgar
AAS Thomas Valentin,AABYE Edgar
ABALMASAU Aliaksei,AABYE Edgar
ABALO Luc,AABYE Edgar


## Last city by year
---

Just like you can get the first row's value in a dataset, you can get the last row's value. This is often useful when you want to compare the most recent value to previous values.

### Instructions

Return the year and the city in which each Olympic Games was held.

Fetch the last city in which the Olympic Games were held.

In [6]:
%%sql

WITH Hosts AS (
  SELECT DISTINCT year, city
    FROM Summer_Medals)

SELECT year,
       city,
       LAST_VALUE(city) OVER (ORDER BY year ASC
           RANGE BETWEEN UNBOUNDED PRECEDING AND
           UNBOUNDED FOLLOWING) AS Last_City
FROM   Hosts
ORDER  BY year ASC
LIMIT  10

 * sqlite:///data/summer.db
Done.


Year,City,Last_City
1896,Athens,London
1900,Paris,London
1904,St Louis,London
1908,London,London
1912,Stockholm,London
1920,Antwerp,London
1924,Paris,London
1928,Amsterdam,London
1932,Los Angeles,London
1936,Berlin,London
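
The `RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` clause in the query above matters. When a window has an `ORDER BY` but no explicit frame, the default frame ends at the current row (and its peers), so `LAST_VALUE` simply returns the current row's own value. The sketch below, using Python's `sqlite3` and the first four host cities from the output above loaded into a hypothetical `hosts` table, compares the two.

```python
import sqlite3

# First four hosts from the output above, loaded into a hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (year INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?)",
    [(1896, "Athens"), (1900, "Paris"), (1904, "St Louis"), (1908, "London")],
)

frames = conn.execute(
    """
    SELECT year,
           city,
           -- default frame: stops at the current row
           LAST_VALUE(city) OVER (ORDER BY year ASC) AS default_frame,
           -- explicit frame: spans the whole table
           LAST_VALUE(city) OVER (ORDER BY year ASC
               RANGE BETWEEN UNBOUNDED PRECEDING AND
               UNBOUNDED FOLLOWING) AS full_frame
    FROM   hosts
    ORDER  BY year ASC
    """
).fetchall()

for row in frames:
    print(row)
```

With the default frame, `default_frame` just repeats `city`; only the explicit frame yields the last city on every row.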


## Ranking
---

`ROW_NUMBER()` always assigns unique numbers, even if two rows' values are the same.

`RANK()` assigns the same number to rows with identical values, then skips the numbers those tied rows would otherwise have taken.

`DENSE_RANK()` also assigns the same number to rows with identical values, but doesn't skip any numbers afterward.
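
A small sketch makes the difference concrete. It uses Python's `sqlite3` and a hypothetical `athlete_medals` table filled with the top seven medal counts that appear later in this chapter. The alphabetical tie-breaker in the `ROW_NUMBER()` window is an addition for illustration, so that its numbering is deterministic.

```python
import sqlite3

# Hypothetical table holding seven athlete/medal-count pairs from this chapter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_medals (athlete TEXT, medals INTEGER)")
conn.executemany(
    "INSERT INTO athlete_medals VALUES (?, ?)",
    [
        ("PHELPS Michael", 22),
        ("LATYNINA Larisa", 18),
        ("ANDRIANOV Nikolay", 15),
        ("MANGIAROTTI Edoardo", 13),
        ("ONO Takashi", 13),
        ("SHAKHLIN Boris", 13),
        ("COUGHLIN Natalie", 12),
    ],
)

ranking = conn.execute(
    """
    SELECT athlete,
           medals,
           ROW_NUMBER() OVER (ORDER BY medals DESC, athlete ASC) AS row_n,
           RANK()       OVER (ORDER BY medals DESC)              AS rank_n,
           DENSE_RANK() OVER (ORDER BY medals DESC)              AS dense_rank_n
    FROM   athlete_medals
    ORDER  BY medals DESC, athlete ASC
    """
).fetchall()

for row in ranking:
    print(row)
```

The three athletes tied on 13 medals get row numbers 4, 5, 6 but rank 4 under both ranking functions; the next athlete is ranked 7 by `RANK()` and 5 by `DENSE_RANK()`.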

## Ranking athletes by medals earned
---
In chapter 1, you used `ROW_NUMBER` to rank athletes by awarded medals. However, `ROW_NUMBER` assigns different numbers to athletes with the same count of awarded medals, so it's not a useful ranking function; if two athletes earned the same number of medals, they should have the same rank.

### Instructions

Rank each athlete by the number of medals they've earned -- the higher the count, the higher the rank -- with identical numbers in case of identical values.

In [7]:
%%sql

WITH Athlete_Medals 
     AS (SELECT athlete,
                COUNT(*) AS Medals
         FROM   summer_medals
         GROUP  BY athlete)
SELECT athlete,
       medals,
       RANK() OVER (ORDER BY medals DESC) AS Rank_N
FROM   Athlete_Medals 
ORDER  BY medals DESC 
LIMIT  10

 * sqlite:///data/summer.db
Done.


athlete,Medals,Rank_N
PHELPS Michael,22,1
LATYNINA Larisa,18,2
ANDRIANOV Nikolay,15,3
MANGIAROTTI Edoardo,13,4
ONO Takashi,13,4
SHAKHLIN Boris,13,4
COUGHLIN Natalie,12,7
FISCHER Birgit,12,7
KATO Sawao,12,7
NEMOV Alexei,12,7


## Ranking athletes from multiple countries
---

In the previous exercise, you used `RANK` to assign rankings to one group of athletes. In real-world data, however, you'll often find numerous groups within your data. Without partitioning your data, one group's values will influence the rankings of the others.

Also, while `RANK` skips numbers in case of identical values, the most natural way to assign rankings is not to skip numbers. If two countries are tied for second place, the country after them is considered to be third by most people.
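
To illustrate how one group's values leak into another group's rankings, the sketch below ranks the same rows twice with Python's `sqlite3`: once over the whole table and once with `PARTITION BY country`. The table, the athlete labels `a` to `g`, and the medal counts are hypothetical, chosen only to show the effect.

```python
import sqlite3

# Hypothetical medal counts for two countries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE athlete_medals (country TEXT, athlete TEXT, medals INTEGER)"
)
conn.executemany(
    "INSERT INTO athlete_medals VALUES (?, ?, ?)",
    [
        ("JPN", "a", 4), ("JPN", "b", 3), ("JPN", "c", 3), ("JPN", "d", 2),
        ("KOR", "e", 5), ("KOR", "f", 2), ("KOR", "g", 2),
    ],
)

ranked = conn.execute(
    """
    SELECT country,
           athlete,
           medals,
           -- one ranking across both countries
           DENSE_RANK() OVER (ORDER BY medals DESC) AS overall_rank,
           -- ranking restarts at 1 for each country
           DENSE_RANK() OVER (PARTITION BY country
                              ORDER BY medals DESC) AS country_rank
    FROM   athlete_medals
    ORDER  BY country ASC, medals DESC, athlete ASC
    """
).fetchall()

for row in ranked:
    print(row)
```

Without the partition, Japan's best athlete is ranked 2 because a Korean athlete has more medals; with it, each country's ranking starts at 1.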

### Instructions

Rank each country's athletes by the count of medals they've earned -- the higher the count, the higher the rank -- without skipping numbers in case of identical values.

In [9]:
%%sql

WITH athlete_medals
     AS (SELECT country,
                athlete,
                COUNT(*) AS Medals
         FROM   summer_medals
         WHERE  country IN ( 'JPN', 'KOR' )
                AND year >= 2000
         GROUP  BY country,
                   athlete
         HAVING COUNT(*) > 1)
SELECT country,
       athlete,
       DENSE_RANK() OVER (PARTITION BY country ORDER BY medals DESC) AS Rank_N
FROM   athlete_medals
ORDER  BY country ASC,
          rank_n ASC 

 * sqlite:///data/summer.db
Done.


country,athlete,Rank_N
JPN,KITAJIMA Kosuke,1
JPN,UCHIMURA Kohei,2
JPN,TACHIBANA Miya,3
JPN,TAKEDA Miho,3
JPN,ICHO Kaori,4
JPN,IRIE Ryosuke,4
JPN,KASHIMA Takehiro,4
JPN,MATSUDA Takeshi,4
JPN,SUZUKI Satomi,4
JPN,TANI Ryoko,4


## What is paging?
---
- **Paging:** Splitting data into (approximately) equal chunks

- **Uses**

    - Many APIs return data in "pages" to reduce data being sent
    
    - Separating data into quartiles or thirds (top, middle, and bottom thirds) to judge performance
    
Enter NTILE

- `NTILE(n)` splits the data into `n` approximately equal pages
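
"Approximately equal" matters when the row count is not a multiple of `n`. As a minimal sketch, the code below uses Python's `sqlite3` to split ten rows of a hypothetical `items` table into three pages. SQLite places the larger pages first, so the split is 4, 3, 3.

```python
import sqlite3

# Hypothetical table with ten rows, numbered 1 to 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 11)])

pages = [
    page
    for (_id, page) in conn.execute(
        """
        SELECT id,
               NTILE(3) OVER (ORDER BY id ASC) AS page
        FROM   items
        ORDER  BY id ASC
        """
    )
]

print(pages)  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```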

## Paging events
---

There are exactly 666 unique events in the Summer Medals Olympics dataset. If you want to chunk them up to analyze them piece by piece, you'll need to split the events into groups of approximately equal size.

### Instructions

Split the distinct events into exactly 111 groups, ordered by event in alphabetical order.

In [12]:
%%sql

WITH Events
     AS (SELECT DISTINCT event
         FROM   summer_medals)
SELECT event,
       NTILE(111) OVER (ORDER BY event ASC) AS Page
FROM   Events
ORDER  BY event ASC
LIMIT  20

 * sqlite:///data/summer.db
Done.


event,Page
+ 100KG,1
+ 100KG (Heavyweight),1
+ 100KG (Super Heavyweight),1
+ 105KG,1
+ 108KG Total (Super Heavyweight),1
+ 110KG Total (Super Heavyweight),1
+ 67 KG,2
+ 71.67KG (Heavyweight),2
+ 72KG (Heavyweight),2
+ 73KG (Heavyweight),2


## Top, middle, and bottom thirds
---

Splitting your data into thirds or quartiles is often useful to understand how the values in your dataset are spread. Getting summary statistics (averages, sums, standard deviations, etc.) of the top, middle, and bottom thirds can help you determine what distribution your values follow.

### Instructions

Split the athletes into top, middle, and bottom thirds based on their count of medals.

In [13]:
%%sql

WITH Athlete_Medals
     AS (SELECT athlete,
                COUNT(*) AS Medals
         FROM   summer_medals
         GROUP  BY athlete
         HAVING COUNT(*) > 1)
SELECT athlete,
       medals,
       NTILE(3) OVER(ORDER BY medals DESC) AS Third
FROM   Athlete_Medals
ORDER  BY medals DESC,
          athlete ASC
LIMIT  20

 * sqlite:///data/summer.db
Done.


athlete,Medals,Third
PHELPS Michael,22,1
LATYNINA Larisa,18,1
ANDRIANOV Nikolay,15,1
MANGIAROTTI Edoardo,13,1
ONO Takashi,13,1
SHAKHLIN Boris,13,1
COUGHLIN Natalie,12,1
FISCHER Birgit,12,1
KATO Sawao,12,1
NEMOV Alexei,12,1


Return the average medal count of each third.

In [14]:
%%sql

WITH Athlete_Medals
     AS (SELECT athlete,
                COUNT(*) AS Medals
         FROM   summer_medals
         GROUP  BY athlete
         HAVING COUNT(*) > 1),
     Thirds AS (
         SELECT athlete,
                medals,
                NTILE(3) OVER (ORDER BY medals DESC) AS Third
         FROM   athlete_medals)
SELECT third,
       AVG(medals) AS Avg_Medals
FROM   thirds
GROUP  BY third
ORDER  BY third ASC 

 * sqlite:///data/summer.db
Done.


Third,Avg_Medals
1,3.786446469248292
2,2.0
3,2.0
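
Thirds 2 and 3 share an average of exactly 2.0. A likely reading is that both consist entirely of athletes with two medals: `NTILE` assigns buckets by row position, not by value, so rows with identical values can be split across adjacent buckets. The sketch below reproduces the effect with Python's `sqlite3` on a hypothetical six-row table (placeholder athlete labels and counts).

```python
import sqlite3

# Hypothetical medal counts with many athletes tied at the minimum of 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE athlete_medals (athlete TEXT, medals INTEGER)")
conn.executemany(
    "INSERT INTO athlete_medals VALUES (?, ?)",
    [("a", 5), ("b", 3), ("c", 2), ("d", 2), ("e", 2), ("f", 2)],
)

averages = conn.execute(
    """
    WITH thirds AS (
        SELECT athlete,
               medals,
               NTILE(3) OVER (ORDER BY medals DESC) AS third
        FROM   athlete_medals)
    SELECT third,
           AVG(medals) AS avg_medals
    FROM   thirds
    GROUP  BY third
    ORDER  BY third ASC
    """
).fetchall()

print(averages)  # [(1, 4.0), (2, 2.0), (3, 2.0)]
```

When tied rows should always share a bucket, a ranking function such as `DENSE_RANK` may be a better fit than `NTILE`.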
