<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# SQL window functions
© ExploreAI Academy

In this exercise, we will test our understanding of SQL window functions by running rankings, running totals, date differences, and moving averages against Northwind, a sample SQLite database for a retail company. Ensure that you have downloaded the database file, Northwind.db.

## Learning objectives

By the end of this train, you should be able to:
- Use the RANK() function to assign a ranking number to each row based on the order specified within the window.
- Use aggregate window functions to calculate running totals. 
- Use the LAG() function to help calculate the difference, in days, between consecutive date readings in our dataset.
- Use aggregate window functions to calculate the moving average. 

First, let's load our sample database:

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.
%load_ext sql


In [2]:
# Load the Northwind database stored on your local machine.
# The connection string below expects the file at db/Northwind.db, relative to this notebook.
%sql sqlite:///db/Northwind.db
    

Here is a view of all of our tables in the database:

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Northwind_ERD.png"  style="width:500px";/>
<br>
<br>
    <em>Figure 1: Northwind ERD</em>
</div>

## Exercise

Run the necessary queries that will provide us with the following information. Compare your queries with the solutions at the end of this notebook.

### Exercise 1

Rank all the orders of a specific customer from the most recent to the least recent using window functions. Assume that the customer ID is `'ALFKI'`.

In [8]:
%%sql

SELECT 
    CustomerID,
    OrderID,
    OrderDate,
    RANK() OVER(ORDER BY OrderDate DESC)
FROM 
    Orders
WHERE 
    CustomerID='ALFKI';

 * sqlite:///db/Northwind.db
Done.


CustomerID,OrderID,OrderDate,RANK() OVER(ORDER BY OrderDate DESC)
ALFKI,11011,1998-04-09 00:00:00,1
ALFKI,10952,1998-03-16 00:00:00,2
ALFKI,10835,1998-01-15 00:00:00,3
ALFKI,10702,1997-10-13 00:00:00,4
ALFKI,10692,1997-10-03 00:00:00,5
ALFKI,10643,1997-08-25 00:00:00,6


### Exercise 2

Calculate a running total of the quantity of orders using window functions.

In [28]:
%%sql
SELECT 
    CustomerID,
    o.OrderID,
    o.OrderDate,
    od.Quantity,
    SUM(od.Quantity) OVER(PARTITION BY o.OrderID ORDER BY OrderDate) as running_total
FROM 
    Orders as o, OrderDetails as od
WHERE 
    o.OrderID=od.OrderID
LIMIT 20;

 * sqlite:///db/Northwind.db
Done.


CustomerID,OrderID,OrderDate,Quantity,running_total
VINET,10248,1996-07-04 00:00:00,12,27
VINET,10248,1996-07-04 00:00:00,10,27
VINET,10248,1996-07-04 00:00:00,5,27
TOMSP,10249,1996-07-05 00:00:00,9,49
TOMSP,10249,1996-07-05 00:00:00,40,49
HANAR,10250,1996-07-08 00:00:00,10,60
HANAR,10250,1996-07-08 00:00:00,35,60
HANAR,10250,1996-07-08 00:00:00,15,60
VICTE,10251,1996-07-08 00:00:00,6,41
VICTE,10251,1996-07-08 00:00:00,15,41


### Exercise 3


Use window functions to find the difference in successive order dates for each customer. **HINT:** The `TIMESTAMPDIFF()` function in MySQL is not available in SQLite. We can use the `julianday()` function to convert the dates to a floating point number and then calculate the difference.

In [33]:
%%sql

SELECT
    CustomerID,
    OrderDate,
    (julianday(OrderDate)-julianday(LAG(OrderDate) OVER(PARTITION BY CustomerID ORDER BY OrderDate))) as successive_order_date_diff
FROM
    Orders
LIMIT 10;

 * sqlite:///db/Northwind.db
Done.


CustomerID,OrderDate,successive_order_date_diff
ALFKI,1997-08-25 00:00:00,
ALFKI,1997-10-03 00:00:00,39.0
ALFKI,1997-10-13 00:00:00,10.0
ALFKI,1998-01-15 00:00:00,94.0
ALFKI,1998-03-16 00:00:00,60.0
ALFKI,1998-04-09 00:00:00,24.0
ANATR,1996-09-18 00:00:00,
ANATR,1997-08-08 00:00:00,324.0
ANATR,1997-11-28 00:00:00,112.0
ANATR,1998-03-04 00:00:00,96.0


### Exercise 4

Calculate the moving average of the quantity of the last 3 orders for each product using window functions.

In [39]:
%%sql
SELECT 
    *, 
    AVG(Quantity) OVER(PARTITION BY ProductID ORDER BY OrderID) as moving_avg_quantity
FROM OrderDetails LIMIT 10;

 * sqlite:///db/Northwind.db
Done.


OrderID,ProductID,UnitPrice,Quantity,Discount,moving_avg_quantity
10285,1,14.4,45,0.0,45.0
10294,1,14.4,18,0.0,31.5
10317,1,14.4,20,0.0,27.666666666666668
10348,1,14.4,15,0.0,24.5
10354,1,14.4,12,0.0,22.0
10370,1,14.4,15,0.0,20.83333333333333
10406,1,14.4,10,0.0,19.285714285714285
10413,1,14.4,24,0.0,19.875
10477,1,14.4,15,0.0,19.33333333333333
10522,1,18.0,40,0.0,21.4


## Solutions

### Exercise 1

In [9]:
%%sql

SELECT 
    OrderID, 
    OrderDate,
    RANK() OVER (ORDER BY OrderDate DESC) as Order_rank
FROM 
    Orders 
WHERE 
    CustomerID = 'ALFKI';

 * sqlite:///db/Northwind.db
Done.


OrderID,OrderDate,Order_rank
11011,1998-04-09 00:00:00,1
10952,1998-03-16 00:00:00,2
10835,1998-01-15 00:00:00,3
10702,1997-10-13 00:00:00,4
10692,1997-10-03 00:00:00,5
10643,1997-08-25 00:00:00,6


The `RANK()` window function is used here to rank each order of the customer with the ID `'ALFKI'` based on the `OrderDate`. The `DESC` keyword is used so that the most recent order gets the highest rank (i.e. 1).
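
When two orders share the same `OrderDate`, `RANK()` gives them the same rank and skips the next one. The sketch below illustrates this on a small in-memory table. `DemoOrders` and its rows are hypothetical and are not taken from Northwind, and we assume the SQLite library bundled with Python is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

# Hypothetical in-memory table for illustration only (not Northwind data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DemoOrders (OrderID INTEGER, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO DemoOrders VALUES (?, ?)",
    [
        (1, "1998-01-15"),
        (2, "1998-03-16"),
        (3, "1998-03-16"),  # same date as order 2, so the two rows tie
        (4, "1997-10-13"),
    ],
)

# Rank from most recent to least recent; tied dates share a rank.
ranked = conn.execute(
    """
    SELECT
        OrderID,
        OrderDate,
        RANK() OVER (ORDER BY OrderDate DESC) AS Order_rank
    FROM DemoOrders
    ORDER BY Order_rank, OrderID
    """
).fetchall()

for row in ranked:
    print(row)

conn.close()
```

Orders 2 and 3 both receive rank 1, and the next order receives rank 3 rather than 2.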

### Exercise 2

In [29]:
%%sql

SELECT 
    OrderID, 
    Quantity, 
    SUM(Quantity) OVER (
    ORDER BY OrderID) as RunningTotal 
FROM 
    OrderDetails
GROUP BY 
    OrderID

LIMIT 10;


 * sqlite:///db/Northwind.db
Done.


OrderID,Quantity,RunningTotal
10248,12,12
10249,9,21
10250,10,31
10251,6,37
10252,40,77
10253,20,97
10254,15,112
10255,20,132
10256,15,147
10257,25,172


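The `SUM()` window function is used here to calculate a running total of order quantities. The `ORDER BY` clause inside the `OVER()` clause ensures that the running total accumulates in `OrderID` order. The `GROUP BY OrderID` clause collapses the data to one row per order before the window function is applied. Note that `Quantity` itself is not aggregated, so SQLite takes its value from a single row of each group: the running total above accumulates one line item per order (12 for order 10248, which has line items of 12, 10, and 5) rather than the order's full quantity. To accumulate whole-order quantities, total `Quantity` per order first and apply the window function to those totals.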
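
One way to build a running total over whole-order quantities is to total each order in a CTE and then apply `SUM() OVER` to those totals. The following is a minimal sketch, not the notebook's official solution. `DemoOrderDetails` is a hypothetical in-memory table; the quantities mirror the first two orders shown in Exercise 2, and the `ProductID` values are made up.

```python
import sqlite3

# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DemoOrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO DemoOrderDetails VALUES (?, ?, ?)",
    [
        (10248, 1, 12),
        (10248, 2, 10),
        (10248, 3, 5),
        (10249, 4, 9),
        (10249, 5, 40),
    ],
)

# Step 1 (CTE): total quantity per order.
# Step 2: running total of those per-order totals, in OrderID order.
running = conn.execute(
    """
    WITH OrderTotals AS (
        SELECT OrderID, SUM(Quantity) AS OrderQuantity
        FROM DemoOrderDetails
        GROUP BY OrderID
    )
    SELECT
        OrderID,
        OrderQuantity,
        SUM(OrderQuantity) OVER (ORDER BY OrderID) AS RunningTotal
    FROM OrderTotals
    ORDER BY OrderID
    """
).fetchall()

for row in running:
    print(row)

conn.close()
```

Here order 10248 contributes 27 (12 + 10 + 5) and order 10249 contributes 49, so the running total after the second order is 76.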

### Exercise 3

In [35]:
%%sql

SELECT 
    CustomerID, 
    OrderDate, 
    LAG(OrderDate, 1) OVER 
        (PARTITION BY CustomerID 
        ORDER BY OrderDate) as PrevOrderDate, 
    julianday(OrderDate)-
    julianday(LAG(OrderDate, 1) OVER
            (PARTITION BY CustomerID 
            ORDER BY OrderDate)) as DateDiff
FROM 
    Orders
LIMIT 10;

 * sqlite:///db/Northwind.db
Done.


CustomerID,OrderDate,PrevOrderDate,DateDiff
ALFKI,1997-08-25 00:00:00,,
ALFKI,1997-10-03 00:00:00,1997-08-25 00:00:00,39.0
ALFKI,1997-10-13 00:00:00,1997-10-03 00:00:00,10.0
ALFKI,1998-01-15 00:00:00,1997-10-13 00:00:00,94.0
ALFKI,1998-03-16 00:00:00,1998-01-15 00:00:00,60.0
ALFKI,1998-04-09 00:00:00,1998-03-16 00:00:00,24.0
ANATR,1996-09-18 00:00:00,,
ANATR,1997-08-08 00:00:00,1996-09-18 00:00:00,324.0
ANATR,1997-11-28 00:00:00,1997-08-08 00:00:00,112.0
ANATR,1998-03-04 00:00:00,1997-11-28 00:00:00,96.0


The `LAG()` window function is used twice here: once to return the previous order date for each customer, and again inside `julianday()` to calculate the difference between the current and previous order dates. The `PARTITION BY` clause separates the data into partitions based on the `CustomerID`, so the first order of each customer has no previous row and both `PrevOrderDate` and `DateDiff` are empty (NULL) for it.

Since the `TIMESTAMPDIFF()` function in MySQL is not available in SQLite, we use the `julianday()` function to convert each date to a Julian day number, a floating-point count of days. Subtracting two of these values gives the difference between the dates in days.
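
The same calculation can also be written so that `LAG()` appears only once: compute the previous date in a subquery, then subtract the `julianday()` values in the outer query. This is a sketch under stated assumptions. `DemoOrders` is a hypothetical in-memory table whose rows copy a few of the dates shown above, and SQLite 3.25 or later is assumed.

```python
import sqlite3

# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DemoOrders (CustomerID TEXT, OrderDate TEXT)")
conn.executemany(
    "INSERT INTO DemoOrders VALUES (?, ?)",
    [
        ("ALFKI", "1997-08-25 00:00:00"),
        ("ALFKI", "1997-10-03 00:00:00"),
        ("ALFKI", "1997-10-13 00:00:00"),
        ("ANATR", "1996-09-18 00:00:00"),
        ("ANATR", "1997-08-08 00:00:00"),
    ],
)

# Inner query: previous order date per customer via LAG().
# Outer query: difference in days via julianday() subtraction.
diffs = conn.execute(
    """
    SELECT
        CustomerID,
        OrderDate,
        PrevOrderDate,
        julianday(OrderDate) - julianday(PrevOrderDate) AS DateDiff
    FROM (
        SELECT
            CustomerID,
            OrderDate,
            LAG(OrderDate) OVER (
                PARTITION BY CustomerID
                ORDER BY OrderDate
            ) AS PrevOrderDate
        FROM DemoOrders
    )
    ORDER BY CustomerID, OrderDate
    """
).fetchall()

for row in diffs:
    print(row)

conn.close()
```

The first row of each customer has `None` for both `PrevOrderDate` and `DateDiff`, because `julianday(NULL)` is NULL and so is any arithmetic on it.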

### Exercise 4

In [41]:
%%sql

SELECT 
    OrderID, 
    ProductID, 
    Quantity,
    AVG(Quantity) OVER (PARTITION BY ProductID ORDER BY OrderID ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as MovingAvgQuantity
FROM 
    OrderDetails
ORDER BY 
    ProductID, 
    OrderID
LIMIT 10;

 * sqlite:///db/Northwind.db
Done.


OrderID,ProductID,Quantity,MovingAvgQuantity
10285,1,45,45.0
10294,1,18,31.5
10317,1,20,27.666666666666668
10348,1,15,17.666666666666668
10354,1,12,15.666666666666666
10370,1,15,14.0
10406,1,10,12.333333333333334
10413,1,24,16.333333333333332
10477,1,15,16.333333333333332
10522,1,40,26.33333333333333


The `AVG()` window function is used here to calculate the moving average of `Quantity` for the last 3 orders (the current order and the two preceding orders) for each product. The window is defined using the `PARTITION BY` clause (to segment the data by `ProductID`) and the `ORDER BY` clause (to arrange the data in order of `OrderID`). The `ROWS BETWEEN` clause specifies the size and location of the window – in this case, **the current row and the two rows preceding it.**
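
To see why the `ROWS BETWEEN` clause matters, the sketch below computes the average twice on a small in-memory table: once with a three-row frame and once with no frame. When `ORDER BY` is present and no frame is given, the window runs from the start of the partition to the current row, which yields a cumulative average instead of a moving one. `DemoOrderDetails` is a hypothetical table; its rows copy the first four `ProductID = 1` rows shown above.

```python
import sqlite3

# Hypothetical in-memory table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DemoOrderDetails (OrderID INTEGER, ProductID INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO DemoOrderDetails VALUES (?, ?, ?)",
    [(10285, 1, 45), (10294, 1, 18), (10317, 1, 20), (10348, 1, 15)],
)

averages = conn.execute(
    """
    SELECT
        OrderID,
        Quantity,
        -- Explicit frame: current row plus the two rows before it.
        AVG(Quantity) OVER (
            PARTITION BY ProductID
            ORDER BY OrderID
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ) AS MovingAvg,
        -- No frame: averages every row from the partition start onward.
        AVG(Quantity) OVER (
            PARTITION BY ProductID
            ORDER BY OrderID
        ) AS CumulativeAvg
    FROM DemoOrderDetails
    ORDER BY ProductID, OrderID
    """
).fetchall()

for row in averages:
    print(row)

conn.close()
```

The two columns agree for the first three rows and diverge on the fourth: the moving average uses only 18, 20, and 15, while the cumulative average still includes the first quantity of 45.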

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>