If you’re learning Structured Query Language (SQL), it’s important to be familiar with advanced techniques such as Window Functions, Common Table Expressions, Aggregate Functions, and Pivot Tables. In this post, I’ll focus on Window Functions and share my understanding of this topic.

# Prerequisite
Before running any SQL query, you need a database. The code below reads the sheet "test" from test_data.xlsx and loads it into the SQLite database used in this post.

In [1]:
import sqlite3
import pandas as pd
from tabulate import tabulate

# path of the SQLite database file used throughout this post
db_path = "data/db_test.db"

# path of spreadsheet file
excel_file = 'data/test_data.xlsx'
# read the "test" sheet of the spreadsheet into a DataFrame
test_data = pd.read_excel(excel_file, sheet_name='test', header=0)
# the table is named "tb" because the queries in this post select FROM tb
table_tb = """
    CREATE TABLE tb (city TEXT, id INTEGER, sold INTEGER, month INTEGER)
    """

with sqlite3.connect(db_path) as con:
    # delete the table if it exists
    con.execute("DROP TABLE IF EXISTS tb;")
    # create the empty table
    con.execute(table_tb)
    # load the spreadsheet rows into the table
    test_data.to_sql('tb', con=con, if_exists='append', index=False)


Window functions are a type of function in SQL that allow you to perform calculations across a set of rows that are related to the current row. They are similar to aggregate functions, but unlike aggregate functions, they do not group rows into a single output row. Instead, window functions calculate a value for each row based on a window of rows that you define.

![Alt text](images/diff_to_WindowFunctions.png)

*source:  https://learnsql.com/blog/why-learn-sql-window-functions/*
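To make the difference concrete, here is a minimal sketch that runs the same SUM once as an aggregate function and once as a window function. The `sales` table and its four rows are made up for illustration; they are not the data from test_data.xlsx.

```python
import sqlite3

# Hypothetical sample rows (city, sold); not the data from test_data.xlsx.
rows = [("Paris", 300), ("Paris", 500), ("Rome", 200), ("Rome", 100)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (city TEXT, sold INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Aggregate function: GROUP BY collapses each city into a single output row.
aggregated = con.execute(
    "SELECT city, SUM(sold) FROM sales GROUP BY city ORDER BY city"
).fetchall()

# Window function: every input row is kept and the city total is attached to it.
windowed = con.execute(
    """
    SELECT city, sold, SUM(sold) OVER (PARTITION BY city) AS city_total
    FROM sales
    ORDER BY city, sold
    """
).fetchall()
con.close()

print(aggregated)  # [('Paris', 800), ('Rome', 300)]
print(windowed)
```

The aggregate query returns two rows (one per city), while the window query still returns all four input rows.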

SELECT <column_1>, <column_2>,\
 <<span style="color:green">window_function </span>>() OVER (\
 PARTITION BY <...>\
 ORDER BY <...>\
 <<span style="color:green">window_frame </span>>) <window_column_alias>\
FROM <table_name>;

PARTITION BY is an optional clause that splits the query result set into partitions; the window function is then applied to each partition separately. If PARTITION BY is omitted, the function treats the entire table as a single partition.

The image below shows an example of PARTITION BY city: rows with the same city are placed in the same partition.

![Alt text](images/partition.jpg)

*source:  me*
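As a small sketch of this behaviour, the query below computes the same SUM with and without PARTITION BY. The `sales` table and its rows are hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical sample rows (city, sold); not the data from test_data.xlsx.
rows = [("Paris", 300), ("Paris", 500), ("Rome", 200), ("Rome", 100)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (city TEXT, sold INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

result = con.execute(
    """
    SELECT city, sold,
           SUM(sold) OVER (PARTITION BY city) AS city_total,  -- one partition per city
           SUM(sold) OVER ()                  AS grand_total  -- whole table is one partition
    FROM sales
    ORDER BY city, sold
    """
).fetchall()
con.close()

for row in result:
    print(row)
```

With PARTITION BY city, each row sees only the total of its own city; with an empty OVER (), every row sees the total of the whole table.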

The ORDER BY clause defines the logical order of the rows within each partition of the result set (or within the entire table if there is no PARTITION BY).
The next image shows the result after adding both PARTITION BY and ORDER BY.

![Alt text](images/order_and_partition.jpg)

*source:  me*
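Here is a minimal sketch of PARTITION BY combined with ORDER BY. It uses ROW_NUMBER() to show the position of each row inside its partition and a SUM with ORDER BY to get a running total. The table, the rows, and the aliases `row_pos` and `running_total` are assumptions made for this example.

```python
import sqlite3

# Hypothetical sample rows (city, month, sold), inserted in no particular order.
rows = [
    ("Paris", 3, 200),
    ("Rome", 2, 100),
    ("Paris", 1, 300),
    ("Rome", 1, 400),
    ("Paris", 2, 500),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (city TEXT, month INTEGER, sold INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

result = con.execute(
    """
    SELECT city, month, sold,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY month) AS row_pos,
           SUM(sold)    OVER (PARTITION BY city ORDER BY month) AS running_total
    FROM sales
    ORDER BY city, month
    """
).fetchall()
con.close()

for row in result:
    print(row)
```

The numbering and the running total both restart for each city, because the ordering is applied inside each partition.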

<<span style="color:green">window_frame </span>> is the set of rows, defined relative to the current row, that the window function is evaluated over. The window frame is evaluated separately within each partition.
        
        ROWS | RANGE | GROUPS BETWEEN lower_bound AND upper_bound
Note: the upper_bound must not come before the lower_bound.

Each bound can be one of five options. When only one bound is written, it is shorthand for a full BETWEEN clause:

| Abbreviation | Meaning |
| --- | ----------- |
| UNBOUNDED PRECEDING | BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
| n PRECEDING | BETWEEN n PRECEDING AND CURRENT ROW |
| CURRENT ROW | BETWEEN CURRENT ROW AND CURRENT ROW |
| n FOLLOWING | BETWEEN CURRENT ROW AND n FOLLOWING |
| UNBOUNDED FOLLOWING | BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING |


![Alt text](images/bounds.png)

*source:  https://learnsql.com/blog/sql-window-functions-rows-clause/*
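A quick way to convince yourself that the short and long forms are equivalent is to run both in one query. The sketch below does this in SQLite for the two PRECEDING shorthands only; the `sales` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (month, sold).
rows = [(1, 300), (2, 500), (3, 200), (4, 300)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (month INTEGER, sold INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

result = con.execute(
    """
    SELECT month,
           SUM(sold) OVER (ORDER BY month ROWS 1 PRECEDING) AS short_form,
           SUM(sold) OVER (ORDER BY month
                           ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS long_form,
           SUM(sold) OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) AS short_running,
           SUM(sold) OVER (ORDER BY month
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS long_running
    FROM sales
    ORDER BY month
    """
).fetchall()
con.close()

for month, short_form, long_form, short_running, long_running in result:
    # Each shorthand returns the same value as its spelled-out BETWEEN form.
    print(month, short_form == long_form, short_running == long_running)
```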

The differences among ROWS, RANGE, GROUPS:

![Alt text](images/rows_ranges_groups.png)

*source:  https://data-xtractor.com/blog/query-builder/window-function-framing-rows-vs-range-vs-groups/*


Let’s take a closer look at the differences between ROWS, RANGE, and GROUPS. The picture below shows an example of ROWS, RANGE, and GROUPS with PARTITION and ORDER. The window frame is defined as ‘BETWEEN 1 PRECEDING AND 2 FOLLOWING’.

![Alt text](images/row_range_group_order_partition.png)

*source:  me*



Because of PARTITION BY, the window function works inside one partition at a time and then repeats the same calculation in the other partitions. The example above focuses on the Paris partition.

ROWS: the frame goes back 1 individual row and forward 2 individual rows from the current row. The SUM of "sold" for this frame is 1300 (300 + 500 + 200 + 300).
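The ROWS logic can be emulated in plain Python by slicing on row positions. This is only a sketch: the `paris` list is hypothetical data chosen so that it reproduces the sum quoted above, and `rows_frame` is a helper name I made up.

```python
# Hypothetical Paris partition, already sorted by month: (month, sold).
paris = [(1, 300), (2, 500), (3, 200), (5, 300)]


def rows_frame(partition, current, preceding, following):
    """Emulate ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    `current` is the index of the current row; the frame is cut by position
    and clipped at the edges of the partition.
    """
    start = max(current - preceding, 0)
    end = min(current + following, len(partition) - 1)
    return partition[start:end + 1]


# The current row is the one with month 2 (index 1).
frame = rows_frame(paris, current=1, preceding=1, following=2)
total = sum(sold for _, sold in frame)
print(frame, total)  # total is 1300
```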

RANGE: unlike ROWS, PRECEDING and FOLLOWING are offsets applied to the value of the ORDER BY column (here, the month column), not to row positions. The lower bound is the current row's value minus the PRECEDING offset, and the upper bound is the current row's value plus the FOLLOWING offset. In our case the current row has month 2, so the lower bound is 2 - 1 = 1 and the upper bound is 2 + 2 = 4. The frame therefore contains every row whose city is 'Paris' and whose month is between 1 and 4. The SUM of "sold" for this frame is 1000 (300 + 500 + 200).
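The same arithmetic can be written as a few lines of plain Python. Again this is a sketch under assumptions: the `paris` list is hypothetical data that reproduces the sum above, and `range_frame` is an illustrative helper, not a library function.

```python
# Hypothetical Paris partition, already sorted by month: (month, sold).
paris = [(1, 300), (2, 500), (3, 200), (5, 300)]


def range_frame(partition, current, preceding, following):
    """Emulate RANGE BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    The offsets are applied to the ORDER BY value (month) of the current row,
    and every row whose value falls inside [lower, upper] joins the frame.
    """
    current_value = partition[current][0]
    lower = current_value - preceding
    upper = current_value + following
    return [row for row in partition if lower <= row[0] <= upper]


# Current row has month 2, so the frame covers months 1 through 4.
frame = range_frame(paris, current=1, preceding=1, following=2)
total = sum(sold for _, sold in frame)
print(frame, total)  # total is 1000
```

The row with month 5 is left out even though it is only two positions after the current row, because its value lies outside the range 1 to 4.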

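Note: RANGE with an offset (n PRECEDING or n FOLLOWING) cannot be used without ORDER BY, because the offset is applied to the value of the ORDER BY column.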
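You can check this restriction in SQLite with a short experiment. The sketch below assumes a throwaway `sales` table and only records whatever error message the engine raises; the exact wording depends on the SQLite version.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (month INTEGER, sold INTEGER)")

error_message = None
try:
    # RANGE with numeric offsets but no ORDER BY: there is no value to offset from.
    con.execute(
        "SELECT SUM(sold) OVER (RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING) FROM sales"
    )
except sqlite3.OperationalError as exc:
    error_message = str(exc)
finally:
    con.close()

print(error_message)
```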

GROUPS: similar to ROWS, but it counts peer groups (rows that share the same ORDER BY value) instead of individual rows. The frame goes back one group (1 PRECEDING) from the current row's group, which has month 2, and forward two groups (2 FOLLOWING). The SUM of "sold" for this frame is 1300 (300 + 500 + 200 + 300).
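Here is a plain-Python sketch of the GROUPS logic. The `paris` and `tied` lists are hypothetical data (the first reproduces the sum above, the second adds two rows in the same month), and `groups_frame` is a helper name of my own.

```python
from itertools import groupby

# Hypothetical Paris partition, already sorted by month: (month, sold).
paris = [(1, 300), (2, 500), (3, 200), (5, 300)]


def groups_frame(partition, current, preceding, following):
    """Emulate GROUPS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING.

    Rows with the same ORDER BY value (month) form one peer group; the frame
    is cut by group position instead of row position.
    """
    groups = [list(g) for _, g in groupby(partition, key=lambda row: row[0])]
    current_value = partition[current][0]
    # Position of the peer group that contains the current row.
    group_index = next(i for i, g in enumerate(groups) if g[0][0] == current_value)
    start = max(group_index - preceding, 0)
    end = min(group_index + following, len(groups) - 1)
    return [row for g in groups[start:end + 1] for row in g]


frame = groups_frame(paris, current=1, preceding=1, following=2)
total = sum(sold for _, sold in frame)
print(frame, total)  # total is 1300

# With two rows in month 2, a frame of just the current group holds both peers.
tied = [(1, 300), (2, 500), (2, 200), (5, 300)]
peers = groups_frame(tied, current=2, preceding=0, following=0)
print(peers)  # [(2, 500), (2, 200)]
```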

The code below applies the SUM and COUNT window functions with ROWS, RANGE, and GROUPS frames. The result table shows the small differences between these window frames; COUNT shows how many values are included in the calculation for each current row.
Query:

```sql
SELECT city,id,sold,month,
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_rows,
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_rows,
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_range,
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_range,
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_groups,
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_groups
FROM tb
ORDER BY city,month
```

In [6]:

cols = ['city', 'id','sold','month',"total_rows",'count_rows','total_range','count_range','total_groups','count_groups']
cmd3 = """
SELECT city,id,sold,month,
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[4]+""",
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[5]+""",
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[6]+""",
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[7]+""",
SUM(sold) OVER (
    PARTITION BY city
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[8]+""",
COUNT(sold) OVER (
    PARTITION BY city
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[9]+"""
FROM tb
ORDER BY city,month
"""
# print (cmd3)
with sqlite3.connect(db_path) as con:
    re=con.execute(cmd3)
print(tabulate(re, headers=cols, tablefmt='psql'))




Next is an example without PARTITION BY, which means the window function uses the entire table as one partition.

![Alt text](images/row_range_group_order.jpg)

*source:  me*

Without PARTITION BY, the rows are no longer grouped by city; the whole table is sorted only by month, the ORDER BY column. As a result, the frames and their results are different.

ROWS: the frame goes back 1 individual row and forward 2 individual rows from the current row. The SUM of "sold" for this frame is 900 (500 + 100 + 200 + 100).

RANGE: the current row has month 3, so the lower bound is 3 - 1 = 2 (current value minus the PRECEDING offset) and the upper bound is 3 + 2 = 5 (current value plus the FOLLOWING offset). The frame contains all rows whose month is between 2 and 5. The SUM of "sold" for this frame is 1600 (500 + 100 + 100 + 200 + 300 + 200 + 200).

GROUPS: the frame covers one group before (1 PRECEDING) and two groups after (2 FOLLOWING) the current group, which has month 3. The SUM of "sold" for this frame is 1600 (500 + 100 + 100 + 200 + 300 + 200 + 200).

```sql
SELECT city,id,sold,month,
SUM(sold) OVER (
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_rows,
COUNT(sold) OVER (
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_rows,
SUM(sold) OVER (
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_range,
COUNT(sold) OVER (
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_range,
SUM(sold) OVER (
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)total_groups,
COUNT(sold) OVER (
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)count_groups
FROM tb
ORDER BY month
```


In [7]:
cols = ['city', 'id','sold','month',"total_rows",'count_rows','total_range','count_range','total_groups','count_groups']
cmd3 = """
SELECT city,id,sold,month,
SUM(sold) OVER (
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[4]+""",
COUNT(sold) OVER (
    ORDER BY month
    ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[5]+""",
SUM(sold) OVER (
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[6]+""",
COUNT(sold) OVER (
    ORDER BY month
    RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[7]+""",
SUM(sold) OVER (
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[8]+""",
COUNT(sold) OVER (
    ORDER BY month
    GROUPS BETWEEN 1 PRECEDING AND 2 FOLLOWING)""" +cols[9]+"""
FROM tb
ORDER BY month
"""
print(cmd3)
with sqlite3.connect(db_path) as con:
    re=con.execute(cmd3)
print(tabulate(re, headers=cols, tablefmt='psql'))



You can also take a look at the sheet "Explaination" in the "test_data.xlsx" file. I have filled in explanations for the Paris partition covering every combination of ROWS, RANGE, GROUPS, PARTITION BY, and ORDER BY.

# Conclusion:
Understanding the window frame (ROWS, RANGE, GROUPS) is crucial when working with window functions in SQL. The window frame can be a bit tricky to grasp at first, but it’s an essential concept to master. In a future post, I’ll cover other important functions such as AVG, MAX, MIN, LAG, and RANK. For now, let’s focus on the window frame and how it works.
