## Analytic Functions – Ranking

Let us see how we can assign ranks using different **rank** functions.

* To assign ranks globally, we only need to specify **ORDER BY** in the `OVER` clause.
* To assign ranks within a key, we need to specify **PARTITION BY** followed by **ORDER BY**.
* By default, **ORDER BY** sorts the data in ascending order. We can reverse the order by adding **DESC** after the sort column.
* There are 3 main functions for assigning ranks: `rank`, `dense_rank` and `row_number`. We will see the differences between the three in a moment.
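The points above can be tried without a database connection. The following is a minimal sketch, assuming an in-memory SQLite database (version 3.25 or later, which supports window functions) instead of the PostgreSQL databases used in this notebook. The table `emp_demo` and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical sample data: (employee_id, department_id, salary)
rows = [
    (1, 10, 5000),
    (2, 10, 4000),
    (3, 20, 7000),
    (4, 20, 6000),
    (5, 20, 3000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_demo (employee_id INT, department_id INT, salary INT)"
)
conn.executemany("INSERT INTO emp_demo VALUES (?, ?, ?)", rows)

query = """
SELECT employee_id,
    -- Global rank: ORDER BY only
    rank() OVER (ORDER BY salary DESC) AS global_rnk,
    -- Rank within a key: PARTITION BY, then ORDER BY
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS dept_rnk,
    -- Without DESC the sort is ascending, so the lowest salary gets rank 1
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary
    ) AS dept_rnk_asc
FROM emp_demo
ORDER BY employee_id
"""

ranks = conn.execute(query).fetchall()
for row in ranks:
    print(row)
```

Employee 3 has the highest salary overall, so it gets `global_rnk` 1, and it is also first within department 20. With the ascending sort the same employee gets `dept_rnk_asc` 3.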

In [16]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [17]:
%env DATABASE_URL=postgresql://itv002461_retail_user:7ji8g7gg8p8olbqbna5vz1tjyikaixco@pg.itversity.com:5433/itv002461_retail_db

env: DATABASE_URL=postgresql://itv002461_retail_user:7ji8g7gg8p8olbqbna5vz1tjyikaixco@pg.itversity.com:5433/itv002461_retail_db


In [65]:
%%sql

SELECT t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) AS rnk
FROM daily_product_revenue_v t
ORDER BY order_date, revenue DESC
LIMIT 5

 * postgresql://itv002461_retail_user:***@pg.itversity.com:5433/itv002461_retail_db
5 rows affected.


order_date,order_item_product_id,revenue,rnk
2013-07-25 00:00:00,1004,10799.46,1
2013-07-25 00:00:00,957,9599.36,2
2013-07-25 00:00:00,191,8499.15,3
2013-07-25 00:00:00,365,7558.74,4
2013-07-25 00:00:00,1073,6999.65,5


```{note}
The query above is an example of assigning sparse ranks within each day based on revenue, using the daily product revenue data.
```

```{note}
Here is another example that assigns sparse ranks within each department, using the employees data set. Make sure to restart the kernel, as you might still be connected to the retail database.
```

In [57]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [58]:
%env DATABASE_URL=postgresql://itv002461_hr_user:7ji8g7gg8p8olbqbna5vz1tjyikaixco@pg.itversity.com:5433/itv002461_hr_db

env: DATABASE_URL=postgresql://itv002461_hr_user:7ji8g7gg8p8olbqbna5vz1tjyikaixco@pg.itversity.com:5433/itv002461_hr_db


In [59]:
%%sql

SELECT employee_id, department_id, salary FROM employees 
ORDER BY department_id,
    salary DESC
LIMIT 10


In [22]:
%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS rnk
FROM employees
LIMIT 20


```{note}
Here is an example that assigns dense ranks within each department, using the employees data set.
```

In [23]:
%%sql

SELECT employee_id, department_id, salary,
    dense_rank() OVER (
        PARTITION BY department_id 
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20


```{note}
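Here is an example of a global rank based on salary. If all the salaries are unique, sorting the data and using `LIMIT` is enough to get the top N employees. When salaries are not unique, `LIMIT` can cut off rows that tie with the last one returned, so we have to go with analytic functions.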
```
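To make the difference concrete, here is a minimal sketch that compares `LIMIT` with a filter on `rank` when two salaries tie. It assumes an in-memory SQLite database (version 3.25 or later) rather than the HR database. The table `sal_demo` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (employee_id, salary); employees 2 and 3 tie
rows = [(1, 9000), (2, 8000), (3, 8000), (4, 7000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sal_demo (employee_id INT, salary INT)")
conn.executemany("INSERT INTO sal_demo VALUES (?, ?)", rows)

# LIMIT returns exactly 2 rows, so one of the tied employees is dropped
top2_limit = conn.execute("""
    SELECT employee_id, salary
    FROM sal_demo
    ORDER BY salary DESC
    LIMIT 2
""").fetchall()

# Filtering on rank keeps every employee whose salary is in the top 2 ranks
top2_rank = conn.execute("""
    SELECT employee_id, salary
    FROM (
        SELECT employee_id, salary,
            rank() OVER (ORDER BY salary DESC) AS rnk
        FROM sal_demo
    ) AS q
    WHERE rnk <= 2
    ORDER BY salary DESC, employee_id
""").fetchall()

print(len(top2_limit))  # 2
print(top2_rank)        # [(1, 9000), (2, 8000), (3, 8000)]
```

The ranking function has to be computed in a subquery because a window function cannot be referenced in the `WHERE` clause of the same query level.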

In [24]:
%%sql

SELECT employee_id, department_id, salary,
    rank() OVER (
        ORDER BY salary DESC
    ) AS rnk,
    dense_rank() OVER (
        ORDER BY salary DESC
    ) AS drnk
FROM employees
LIMIT 20


Let us understand the difference between **rank**, **dense_rank** and **row_number**.

* We can use any of the three functions to generate ranks when the ranking field does not have duplicates.
* When the ranking field has duplicates, **row_number** should not be used to rank, as it generates a unique number for each record within the partition, so tied records get different numbers.
* **rank** skips the following ranks when multiple records share the same rank, while **dense_rank** continues with the next number.
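The behaviour described above can be seen on a small data set with one tie. This is a minimal sketch, assuming an in-memory SQLite database (version 3.25 or later) in place of the HR database. The table `tie_demo` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (employee_id, department_id, salary)
# Employees 102 and 103 have the same salary
rows = [
    (101, 10, 5000),
    (102, 10, 4000),
    (103, 10, 4000),
    (104, 10, 3000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tie_demo (employee_id INT, department_id INT, salary INT)"
)
conn.executemany("INSERT INTO tie_demo VALUES (?, ?, ?)", rows)

query = """
SELECT employee_id, salary,
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS rnk,
    dense_rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
    ) AS drnk,
    -- employee_id is added as a tie-breaker to make row_number deterministic
    row_number() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC, employee_id
    ) AS rn
FROM tie_demo
ORDER BY salary DESC, employee_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
# (101, 5000, 1, 1, 1)
# (102, 4000, 2, 2, 2)
# (103, 4000, 2, 2, 3)
# (104, 3000, 4, 3, 4)
```

After the tie at rank 2, `rank` jumps to 4 while `dense_rank` continues with 3. `row_number` gives the two tied employees different numbers, which is why it is not suitable for ranking when duplicates are present.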

In [25]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [26]:
%env DATABASE_URL=postgresql://itversity_hr_user:hr_password@pg.itversity.com:5432/itversity_hr_db

env: DATABASE_URL=postgresql://itversity_hr_user:hr_password@pg.itversity.com:5432/itversity_hr_db


In [27]:
%%sql

SELECT
    employee_id,
    department_id,
    salary,
    rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) rnk,
    dense_rank() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC
      ) drnk,
    row_number() OVER (
        PARTITION BY department_id
        ORDER BY salary DESC, employee_id
      ) rn
FROM employees
ORDER BY department_id, salary DESC
LIMIT 50


```{note}
Here is another example that uses all 3 functions. Make sure to restart the kernel, as you might still be connected to the HR database.
```

In [28]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [29]:
%env DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db

env: DATABASE_URL=postgresql://itversity_retail_user:retail_password@pg.itversity.com:5432/itversity_retail_db


In [30]:
%%sql

SELECT
    t.*,
    rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rnk,
    dense_rank() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) drnk,
    row_number() OVER (
        PARTITION BY order_date
        ORDER BY revenue DESC
    ) rn
FROM daily_product_revenue AS t
ORDER BY order_date, revenue DESC
LIMIT 30

