# WINDOW FUNCTIONS IN SQL

## Business Case
- Business managers often want to compare current sales to previous sales
- Previous sales can be:
    - sales during the previous month
    - average sales during the last three months
    - last year’s sales up to the current date (year-to-date)
- Window functions solve these kinds of problems in a single, efficient SQL query
- Introduced in SQL:2003


## OVER CLAUSE
- The result set of a `SELECT` is divided into partitions
- The `OVER` clause defines the partitions and the ordering within each partition
- Numbering, ranking and aggregate functions are computed per partition
- Within a partition, the function is applied to a window that shifts over the data
- The `OVER` clause can be used with standard aggregate functions (`SUM`, `AVG`, …) or specific window functions (`RANK`, `LAG`, …)
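
To make the points above concrete, here is a minimal sketch that does not need the `xtreme` database. The `Sales` data and its column names are made up for illustration; only the `OVER` syntax matters. Unlike `GROUP BY`, an aggregate with `OVER` keeps every row and adds the partition result next to it.

```sql
-- Hypothetical sample data (not part of the xtreme database).
WITH Sales (Region, Amount) AS (
    SELECT * FROM (VALUES
        ('North', 100),
        ('North', 200),
        ('South',  50)
    ) AS v (Region, Amount)
)
SELECT
  Region
 ,Amount
 ,SUM(Amount) OVER (PARTITION BY Region) AS RegionTotal  -- one total per region, rows are kept
 ,SUM(Amount) OVER ()                    AS GrandTotal   -- empty OVER (): all rows form one partition
FROM Sales
ORDER BY Region, Amount;
-- Region  Amount  RegionTotal  GrandTotal
-- North   100     300          350
-- North   200     300          350
-- South    50      50          350
```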


## Example: Running Total - Year To Date
- Database `xtreme`: give `OrderId`, `OrderDate`, `OrderAmount` and the running total (year-to-date, YTD) of `OrderAmount`. Restart the total for each new year.
- **Using a correlated subquery** is **very inefficient**, because for each row the **complete sum is recalculated** (see the chapter about subqueries).


In [None]:
SELECT 
 OrderId
,OrderDate
,OrderAmount
,(SELECT SUM([Order].OrderAmount) 
  FROM Orders [Order]
  WHERE YEAR([Order].OrderDate) = YEAR(o.OrderDate) 
    AND [Order].OrderId <= o.OrderId
 ) AS [Year To Date]
FROM Orders o
ORDER BY OrderId;

- The `OVER` clause makes the query 
    - much simpler 
    - far more efficient
- The running sum restarts for each partition (here: each year)


In [None]:
SELECT 
 OrderId
,OrderDate
,OrderAmount
,SUM(OrderAmount) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderId) AS [Year To Date]
FROM Orders
ORDER BY OrderId

> Notice: 
> - The difference in execution time between the two queries.
> - The running total restarts at the beginning of each new year.


## Window functions: `ROW_NUMBER() | RANK()`
- `PARTITION BY` is **optional**, `ORDER BY` is **mandatory**
- `ROW_NUMBER()`: running sequence number; no duplicates occur within the same partition. 
- `RANK()`: running rank per partition; duplicates can occur: 1, 2, 3, 3, **5**
    - Since two rows share rank 3, there is no rank 4; the next row immediately gets rank 5.
- `DENSE_RANK()`: like `RANK()`, but without gaps in the ranking: 1, 2, 3, 3, **4**
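
The difference between the three functions shows up only when there are ties. A small sketch on made-up data (the `Scores` table and its columns are hypothetical) puts them side by side:

```sql
-- Hypothetical sample data with one tie (C and D both score 70).
WITH Scores (Name, Score) AS (
    SELECT * FROM (VALUES
        ('A', 90), ('B', 80), ('C', 70), ('D', 70), ('E', 60)
    ) AS v (Name, Score)
)
SELECT
  Name
 ,Score
 ,ROW_NUMBER() OVER (ORDER BY Score DESC, Name) AS RowNum    -- Name breaks the tie, so the numbering is deterministic
 ,RANK()       OVER (ORDER BY Score DESC)       AS Rnk       -- ties share a rank, the next rank is skipped
 ,DENSE_RANK() OVER (ORDER BY Score DESC)       AS DenseRnk  -- ties share a rank, no gap afterwards
FROM Scores
ORDER BY RowNum;
-- Name  Score  RowNum  Rnk  DenseRnk
-- A     90     1       1    1
-- B     80     2       2    2
-- C     70     3       3    3
-- D     70     4       3    3
-- E     60     5       5    4
```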


In [None]:
SELECT 
 ROW_NUMBER() OVER (ORDER BY o.orderdate, o.orderid) AS OrderSequence
,ROW_NUMBER() OVER (PARTITION BY o.customerid ORDER BY o.orderdate, o.orderid) AS CustomerOrderSequence
,RANK() OVER (ORDER BY o.orderamount DESC) AS OrderRanking
,RANK() OVER (PARTITION BY o.customerid ORDER BY o.orderamount DESC) AS CustomerOrderRanking
,o.orderid, o.customerid, o.orderdate, o.orderamount
FROM orders o
ORDER BY OrderSequence


CustomerOrderRanking = 18 means: 
- The current order is the 18th biggest order for the current customer (customerid = 30)


## Window functions: `PERCENT_RANK()`
- `PERCENT_RANK()` shows the ranking on a scale from 0 to 1: (rank - 1) / (number of rows in the partition - 1)

In [None]:
select 
 row_number() over (order by o.orderdate, o.orderid) as OrderSequence
,rank() over (order by o.orderamount desc) as OrderRanking
,percent_rank() over (order by o.orderamount desc) as PctOrderRanking
,o.orderid, o.orderdate, o.orderamount
from orders o
order by OrderSequence
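
To see how the 0 to 1 scale relates to `RANK()`, the following sketch computes `PERCENT_RANK()` next to the formula (rank - 1) / (rows - 1) on a small made-up data set. The `Scores` table and the `ManualPctRank` column are hypothetical and only serve as illustration.

```sql
-- Hypothetical sample data: 5 rows, so the denominator is 5 - 1 = 4.
WITH Scores (Name, Score) AS (
    SELECT * FROM (VALUES
        ('A', 90), ('B', 80), ('C', 70), ('D', 70), ('E', 60)
    ) AS v (Name, Score)
)
SELECT
  Name
 ,Score
 ,RANK()         OVER (ORDER BY Score DESC) AS Rnk
 ,PERCENT_RANK() OVER (ORDER BY Score DESC) AS PctRank
 -- Same calculation written out; 1.0 forces decimal instead of integer division.
 ,(RANK() OVER (ORDER BY Score DESC) - 1.0) / (COUNT(*) OVER () - 1) AS ManualPctRank
FROM Scores
ORDER BY Rnk, Name;
-- Name  Score  Rnk  PctRank
-- A     90     1    0
-- B     80     2    0.25
-- C     70     3    0.5
-- D     70     3    0.5
-- E     60     5    1
-- ManualPctRank holds the same values as PctRank.
```

The manual formula divides by zero when a partition has only one row; `PERCENT_RANK()` returns 0 in that case.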

## Window functions: moving aggregate 
- Real meaning of window functions: they apply to a window (frame) that shifts over the result set
- The previous examples use the default window: from the start of the partition up to the current row
- The running total (YTD) query could also have been written as: 


In [None]:
select 
 orderid
,orderdate
,orderamount
,sum(orderamount) over (partition by year(o.orderdate) order by o.orderid range between unbounded preceding and current row) YTD
from orders o
order by orderid;

With `RANGE` you have three valid options:
- `range between unbounded preceding and current row`
- `range between current row and unbounded following`
- `range between unbounded preceding and unbounded following`

Example: show running total and overall total by customer


In [None]:
select 
 o.orderid
,o.customerid
,o.orderamount
,sum(o.orderamount) over (partition by o.customerid order by o.orderid
    range between unbounded preceding and current row) as RunningTotalByCustomer -- running total
,sum(o.orderamount) over (partition by o.customerid order by o.orderid  -- order by is mandatory when a frame (range/rows) is specified
    range between unbounded preceding and unbounded following) as OverallTotalByCustomer
from orders o
order by o.customerid, o.orderid;

- With `RANGE`, the frame is based on the values of the `ORDER BY` columns: rows with the same `ORDER BY` value as the current row (its peers) are treated as one group and are all included in the frame.
- This is not always desirable; you might actually want a physical offset, counted in rows.
- In this scenario, you would specify `ROWS` instead of `RANGE`. 
    This gives you three options in addition to the three options enumerated previously:
    - rows between `N preceding` and `current row`
    - rows between `current row` and `N following`
    - rows between `N preceding` and `N following`
- Example: show moving average of monthly sales for
    1. three preceding months and current month
    2. preceding, current and next month
- We first use a CTE to calculate the monthly sales



In [None]:
with monthlysales as 
(select year(orderdate)*100 + month(orderdate) MON, sum(o.orderamount) SALES
from Orders o
group by year(orderdate)*100 + month(orderdate))

select mon, sales, 
round(avg(sales) over (order by mon rows between 3 preceding and current row),0) AVG4MONTHS,
round(avg(sales) over (order by mon rows between 1 preceding and 1 following),0) AVG3MONTHS
from monthlysales
order by 1;
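
The difference between `RANGE` and `ROWS` described in this section only becomes visible when the `ORDER BY` column contains ties. The following sketch uses made-up data (the `Payments` table and its columns are hypothetical) with two rows for the same day:

```sql
-- Hypothetical sample data: DayNo 2 occurs twice.
WITH Payments (DayNo, Amount) AS (
    SELECT * FROM (VALUES
        (1, 10), (2, 20), (2, 30), (3, 40)
    ) AS v (DayNo, Amount)
)
SELECT
  DayNo
 ,Amount
 ,SUM(Amount) OVER (ORDER BY DayNo
      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RangeTotal
 ,SUM(Amount) OVER (ORDER BY DayNo
      ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsTotal
FROM Payments
ORDER BY DayNo, Amount;
-- RangeTotal: 10, 60, 60, 100
--   Both rows with DayNo = 2 are peers, so each of them already includes the other.
-- RowsTotal: grows one physical row at a time and ends at 100.
--   Because DayNo has ties, the order in which the two DayNo = 2 rows are
--   accumulated is not guaranteed; add a tie-breaker to ORDER BY if it matters.
```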

## Window functions: `LAG` and `LEAD`
- The window functions `LAG` and `LEAD` refer to the previous and next row respectively 
- Example: show monthly sales for previous and next month

In [None]:
with monthlysales as 
(select year(orderdate)*100 + month(orderdate) MON, sum(o.orderamount) SALES
from orders o
group by year(orderdate)*100 + month(orderdate))

select mon, sales,
lag(sales) over (order by mon) SALESPREVMONTH,
lead(sales) over (order by mon) SALESNEXTMONTH
from monthlysales
order by 1;
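
Both functions also accept an optional offset and a default value: `LAG(expression, offset, default)`. The sketch below illustrates this on made-up monthly figures; the values in `MonthlySales` are hypothetical and not taken from the `xtreme` database.

```sql
-- Hypothetical monthly sales figures.
WITH MonthlySales (Mon, Sales) AS (
    SELECT * FROM (VALUES
        (202401, 100), (202402, 150), (202403, 120)
    ) AS v (Mon, Sales)
)
SELECT
  Mon
 ,Sales
 ,LAG(Sales)       OVER (ORDER BY Mon) AS PrevMonth        -- offset defaults to 1, NULL when there is no previous row
 ,LAG(Sales, 1, 0) OVER (ORDER BY Mon) AS PrevMonthOrZero  -- third argument replaces the NULL
 ,LEAD(Sales, 2)   OVER (ORDER BY Mon) AS TwoMonthsAhead   -- looks two rows ahead
FROM MonthlySales
ORDER BY Mon;
-- Mon     Sales  PrevMonth  PrevMonthOrZero  TwoMonthsAhead
-- 202401  100    NULL       0                120
-- 202402  150    100        100              NULL
-- 202403  120    150        150              NULL
```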

## Exercises
### DB xtreme
1. Compare the monthly sales to the moving average of the last three months. Show month, sales and moving average. 


2. Show for each month the percentage growth (or decline) compared to the previous month. Show month, sales and growth %.


3. Show for each month for which we have sales the total quantity sold and the average of the quantities sold in the previous and the next month. Also add a row number and show the rank (highest first) of each month in the current year. 
