# 1. SQL Window Functions

## Setup
- `cd lessons/sql_advanced && docker-compose up -d`


````python
# Setup

from sqlalchemy import create_engine
import pandas as pd

# db connection
user = 'employees_user'
password = 'employees_password'
host = 'localhost'
port = '3306'
database = 'employees'

# create engine
engine = create_engine(f'mysql+pymysql://{user}:{password}@{host}:{port}/{database}')

columns_query = f'''
SELECT column_name, table_name, data_type
FROM information_schema.columns 
WHERE table_schema = "{database}" 
ORDER BY table_name, ordinal_position
'''
columns_df = pd.read_sql(columns_query, engine)
print(columns_df)
````

````text
   COLUMN_NAME            TABLE_NAME DATA_TYPE
0       emp_no      current_dept_emp       int
1      dept_no      current_dept_emp      char
2    from_date      current_dept_emp      date
3      to_date      current_dept_emp      date
4      dept_no           departments      char
5    dept_name           departments   varchar
6       emp_no              dept_emp       int
7      dept_no              dept_emp      char
8    from_date              dept_emp      date
9      to_date              dept_emp      date
10      emp_no  dept_emp_latest_date       int
11   from_date  dept_emp_latest_date      date
12     to_date  dept_emp_latest_date      date
13      emp_no          dept_manager       int
14     dept_no          dept_manager      char
15   from_date          dept_manager      date
16     to_date          dept_manager      date
17      emp_no             employees       int
18  birth_date             employees      date
19  first_name             employees   varchar
20   last_nam
````


# Ranked Window Functions

## Exercise 1:

Write a query that, upon execution, assigns a row number to all managers we have information about in the "employees" database (regardless of their department).


Let the numbering disregard the department the managers have worked in. Also, let it start from the value of 1. Assign that value to the manager with the lowest employee number.

````sql
SELECT
	*,
	ROW_NUMBER() OVER (ORDER BY emp_no) AS row_num 
FROM 
	dept_manager;
````
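To see what `ROW_NUMBER() OVER (ORDER BY emp_no)` does without the MySQL container, here is a minimal sketch. It assumes Python's built-in `sqlite3` module backed by SQLite 3.25 or newer (the release that added window functions), and it uses a few hypothetical rows in place of the real `dept_manager` data.

````python
import sqlite3

# In-memory SQLite database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept_manager (emp_no INTEGER, dept_no TEXT)")

# Hypothetical sample rows, deliberately inserted out of emp_no order.
conn.executemany(
    "INSERT INTO dept_manager VALUES (?, ?)",
    [(1003, "d002"), (1001, "d001"), (1004, "d003"), (1002, "d001")],
)

# Same window as the exercise: one numbering over all rows, ordered by emp_no.
rows = conn.execute(
    """
    SELECT emp_no, dept_no,
           ROW_NUMBER() OVER (ORDER BY emp_no) AS row_num
    FROM dept_manager
    ORDER BY row_num
    """
).fetchall()

for row in rows:
    print(row)
````

Because the window has no `PARTITION BY`, the numbering runs across all departments: the lowest employee number gets 1 no matter which department it belongs to.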


## Exercise 2:
Write a query that, upon execution, assigns a sequential number to each employee number registered in the "employees" table. Partition the data by the employee's first name and order it by their last name in ascending order (within each partition).

````sql
SELECT 
	*,
	ROW_NUMBER() OVER (PARTITION BY first_name ORDER BY last_name ASC) AS seq_number
FROM 
	employees;
````
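The effect of `PARTITION BY first_name` can be checked on a tiny data set. This sketch again assumes Python's `sqlite3` module with SQLite 3.25+; the five employees are hypothetical. The numbering restarts at 1 for every first name and follows the alphabetical order of last names inside each group.

````python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_no INTEGER, first_name TEXT, last_name TEXT)")

# Hypothetical employees: two first names, several last names each.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Anna", "Smith"),
        (2, "Bob", "Jones"),
        (3, "Anna", "Brown"),
        (4, "Bob", "Adams"),
        (5, "Anna", "Clark"),
    ],
)

# The counter restarts for each first_name partition.
rows = conn.execute(
    """
    SELECT emp_no, first_name, last_name,
           ROW_NUMBER() OVER (PARTITION BY first_name ORDER BY last_name ASC) AS seq_number
    FROM employees
    ORDER BY first_name, seq_number
    """
).fetchall()

for row in rows:
    print(row)
````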

# LAG() and LEAD() Window Functions

## Exercise 1:

Write a query that can extract the following information from the "employees" database:

- the salary values (in ascending order) of the contracts signed by all employees numbered between 10500 and 10600 inclusive
- a column showing the previous salary from the given ordered list
- a column showing the subsequent salary from the given ordered list
- a column displaying the difference between the current salary of a certain employee and their previous salary
- a column displaying the difference between the next salary of a certain employee and their current salary

Limit the output to salary values higher than $80,000 only.
Also, to obtain a meaningful result, partition the data by employee number.

````sql
SELECT 
	e.emp_no,
	s.salary,
	LAG(s.salary) OVER win AS previous_salary,
	LEAD(s.salary) OVER win AS next_salary,
	(s.salary - LAG(s.salary) OVER win) AS diff_curr_prev,
	(LEAD(s.salary) OVER win - s.salary) AS diff_next_curr
FROM 
	employees e
JOIN salaries s 
	ON e.emp_no = s.emp_no
WHERE s.salary > 80000 AND e.emp_no BETWEEN 10500 AND 10600
-- Order each employee's contracts by salary so that LAG()/LEAD() have a well-defined "previous" and "next" row
WINDOW win AS (PARTITION BY e.emp_no ORDER BY s.salary)
ORDER BY e.emp_no, s.salary;
````
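A small reproduction makes the interaction between `WHERE`, `LAG()` and `LEAD()` visible. The sketch assumes Python's `sqlite3` module with a recent SQLite (3.25+, including the named `WINDOW` clause) and a hypothetical `salaries` table; the join with `employees` is dropped because it does not affect the window logic.

````python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical salaries table (the employees join is omitted for brevity).
conn.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [
        (10501, 88000), (10501, 79000), (10501, 92000), (10501, 85000),
        (10502, 95000), (10502, 81000),
        (10700, 90000),  # outside the emp_no range, so it is filtered out
    ],
)

# WHERE runs before the window functions, so the 79000 row is never seen by LAG().
rows = conn.execute(
    """
    SELECT emp_no, salary,
           LAG(salary) OVER win AS previous_salary,
           LEAD(salary) OVER win AS next_salary,
           salary - LAG(salary) OVER win AS diff_curr_prev,
           LEAD(salary) OVER win - salary AS diff_next_curr
    FROM salaries
    WHERE salary > 80000 AND emp_no BETWEEN 10500 AND 10600
    WINDOW win AS (PARTITION BY emp_no ORDER BY salary)
    ORDER BY emp_no, salary
    """
).fetchall()

for row in rows:
    print(row)
````

The first row of each partition has no previous salary and the last has no next salary, so the corresponding differences come back as `NULL` (`None` in Python).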

## Exercise 2:

The MySQL LAG() and LEAD() value window functions accept an optional second argument that designates how many rows (steps) back (for LAG()) or forward (for LEAD()) we'd like to refer to with respect to a given record.
With that in mind, create a query whose result set contains the salary values associated with each employee number, arranged in ascending order. Let the output contain the following six columns:
- the employee number
- the salary value of an employee's contract (which we'll consider the employee's current salary)
- the employee's previous salary
- the employee's contract salary value preceding their previous salary
- the employee's next salary
- the employee's contract salary value subsequent to their next salary

Restrict the output to the first 1000 records you can obtain.

````sql
SELECT 
	e.emp_no,
	s.salary,
	LAG(s.salary) OVER win AS previous_salary,
	LAG(s.salary, 2) OVER win AS prec_previous_salary,
	LEAD(s.salary) OVER win AS next_salary,
	LEAD(s.salary, 2) OVER win AS after_next_salary
FROM 
	employees e
JOIN salaries s 
	ON e.emp_no = s.emp_no
-- Without ORDER BY inside the window, "previous" and "next" are not well defined
WINDOW win AS (PARTITION BY e.emp_no ORDER BY s.salary)
ORDER BY e.emp_no, s.salary
LIMIT 1000;
````
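The offset argument can be tried on a couple of hypothetical employees. This is a sketch assuming Python's `sqlite3` module with SQLite 3.25+, whose `LAG(expr, offset)` and `LEAD(expr, offset)` behave like the MySQL functions for this purpose; rows are inserted out of salary order to show that the window's `ORDER BY` is what defines the steps.

````python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER)")

# Hypothetical contracts, inserted in no particular order.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(1, 65000), (1, 60000), (1, 70000), (1, 62000), (2, 55000), (2, 50000)],
)

# Offsets of 1 and 2 rows back/forward within each employee's ordered contracts.
rows = conn.execute(
    """
    SELECT emp_no, salary,
           LAG(salary) OVER win AS previous_salary,
           LAG(salary, 2) OVER win AS prec_previous_salary,
           LEAD(salary) OVER win AS next_salary,
           LEAD(salary, 2) OVER win AS after_next_salary
    FROM salaries
    WINDOW win AS (PARTITION BY emp_no ORDER BY salary)
    ORDER BY emp_no, salary
    LIMIT 1000
    """
).fetchall()

for row in rows:
    print(row)
````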

# Aggregate Functions and Window Functions

## Exercise 1:

Create a query that, upon execution, returns a result set containing the employee numbers, contract salary values, and start and end dates of the first contract each employee signed with the company.

To obtain the desired output, refer to the data stored in the "salaries" table.

````sql
SELECT
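	s1.emp_no, s.salary, s.from_date, s.to_date 
FROM
	salaries s
	JOIN (
		SELECT
			emp_no, MIN(from_date) AS from_date
		FROM
			salaries
		GROUP BY emp_no
	) s1 ON s.emp_no = s1.emp_no
	-- Match on the start date too; joining on emp_no alone returns every contract
	AND s.from_date = s1.from_date;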
````
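Since this section is about window functions, it may help to compare the `MIN()` join with a window-based alternative. The sketch below is not the course's prescribed solution; it is one common technique (keep the row where `ROW_NUMBER()` over each employee's contracts, ordered by `from_date`, equals 1). It assumes Python's `sqlite3` module with SQLite 3.25+, ISO-formatted date strings (which sort correctly as text), and hypothetical contracts.

````python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT, to_date TEXT)"
)
# Hypothetical contracts; ISO dates compare correctly as strings.
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [
        (1, 62000, "1991-01-01", "1992-01-01"),
        (1, 60000, "1990-01-01", "1991-01-01"),
        (2, 50000, "1995-06-01", "1996-06-01"),
        (2, 48000, "1994-06-01", "1995-06-01"),
    ],
)

# Aggregate approach: join back on both emp_no and the earliest from_date.
join_rows = conn.execute(
    """
    SELECT s.emp_no, s.salary, s.from_date, s.to_date
    FROM salaries s
    JOIN (SELECT emp_no, MIN(from_date) AS from_date
          FROM salaries GROUP BY emp_no) s1
      ON s.emp_no = s1.emp_no AND s.from_date = s1.from_date
    ORDER BY s.emp_no
    """
).fetchall()

# Window approach: number each employee's contracts by start date, keep the first.
window_rows = conn.execute(
    """
    SELECT emp_no, salary, from_date, to_date
    FROM (
        SELECT emp_no, salary, from_date, to_date,
               ROW_NUMBER() OVER (PARTITION BY emp_no ORDER BY from_date) AS rn
        FROM salaries
    ) AS ranked
    WHERE rn = 1
    ORDER BY emp_no
    """
).fetchall()

print(join_rows)
print(window_rows)
````

Both queries return one row per employee here. If two contracts shared the earliest `from_date`, the join would return both while `ROW_NUMBER()` would keep only one of them.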

## Exercise 2:

Consider the employees' contracts that have been signed after the 1st of January 2000 and terminated before the 1st of January 2002 (as registered in the "dept_emp" table).
Create a MySQL query that will extract the following information about these employees:

- Their employee number
- The salary values of the latest contracts they have signed during the suggested time period
- The department they have been working in (as specified in the latest contract they've signed during the suggested time period)
- Use a window function to create a fourth field containing the average salary paid in the department the employee was last working in during the suggested time period. Name that field "average_salary_per_department".

Note 1: This exercise is related neither to the query you created nor to the output you obtained while solving the exercises after the previous lecture.

Note 2: We are asking you to create practically the same query as the one we worked on during the video lecture; the only difference is that it should refer to contracts that were valid between the 1st of January 2000 and the 1st of January 2002.

Note 3: We invite you to solve this task assuming that the "to_date" values stored in the "salaries" and "dept_emp" tables are greater than the "from_date" values stored in these same tables. If you doubt that, you could include a couple of lines in your code to ensure that this is the case anyway!

Hint: If you've worked correctly, you should obtain an output containing 200 rows.

````sql
SELECT
    de2.emp_no,
    d.dept_name,
    s2.salary,
    AVG(s2.salary) OVER w AS average_salary_per_department
FROM
    (
        -- Get the most recent department for each employee
        SELECT
            de.emp_no,
            de.dept_no,
            de.from_date,
            de.to_date
        FROM
            dept_emp de
            JOIN (
                SELECT
                    emp_no,
                    MAX(from_date) AS from_date
                FROM
                    dept_emp
                WHERE
                    from_date > '2000-01-01'
                    AND to_date < '2002-01-01'
                GROUP BY
                    emp_no
            ) de1 ON de.emp_no = de1.emp_no
            AND de.from_date = de1.from_date
        ORDER BY
            de.emp_no,
            de.dept_no
    ) de2
    JOIN (
        -- Get the most recent salary for each employee
        SELECT
            s1.emp_no,
            s.salary,
            s.from_date,
            s.to_date
        FROM
            salaries s
            JOIN (
                SELECT
                    emp_no,
                    MAX(from_date) AS from_date
                FROM
                    salaries
                GROUP BY
                    emp_no
            ) s1 ON s.emp_no = s1.emp_no
        WHERE
            s.to_date < '2002-01-01'
            AND s.from_date > '2000-01-01'
            AND s.from_date = s1.from_date
    ) s2 ON s2.emp_no = de2.emp_no
    JOIN departments d ON d.dept_no = de2.dept_no
GROUP BY
    de2.emp_no,
    d.dept_name,
    s2.salary
WINDOW
    w AS (
        PARTITION BY
            de2.dept_no
    )
ORDER BY
    de2.emp_no,
    salary;
````
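The key difference between `AVG()` as an aggregate and `AVG() OVER (PARTITION BY ...)` is that the window form keeps every employee row while attaching the department average to each one. The sketch below assumes Python's `sqlite3` module with SQLite 3.25+ and a hypothetical `latest_contracts` table that stands in for the joined `de2`/`s2` result of the query above (one row per employee with their department and salary).

````python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the de2/s2 join: one latest contract per employee.
conn.execute(
    "CREATE TABLE latest_contracts (emp_no INTEGER, dept_no TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO latest_contracts VALUES (?, ?, ?)",
    [
        (1, "d001", 60000),
        (2, "d001", 80000),
        (3, "d002", 50000),
        (4, "d002", 70000),
        (5, "d002", 60000),
    ],
)

# Window version: every employee row survives, each carrying its department average.
rows = conn.execute(
    """
    SELECT emp_no, dept_no, salary,
           AVG(salary) OVER (PARTITION BY dept_no) AS average_salary_per_department
    FROM latest_contracts
    ORDER BY emp_no
    """
).fetchall()

# GROUP BY version: collapses each department into a single row.
grouped = conn.execute(
    "SELECT dept_no, AVG(salary) FROM latest_contracts GROUP BY dept_no ORDER BY dept_no"
).fetchall()

print(rows)
print(grouped)
````

Note that SQLite's `AVG()` always returns a floating-point value, so the averages come back as `70000.0` and `60000.0`.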