# Set Operations and Analytics Functions

Let us look at set operations and analytics functions, which are used for ranking, aggregations and similar tasks.

* Set Operations
    * UNION and UNION ALL
    * INTERSECT
    * MINUS
* Analytics Functions
    * Aggregations
    * Ranking
    * Windowing Functions

### Set Operations

In [None]:
-- sql-queries-set-operations.sql
-- UNION removes duplicates, UNION ALL keeps them
-- UNION of (1), (1), (2) -> (1, 2)

SELECT * FROM
  (SELECT 1 FROM dual
   UNION
   SELECT 1 FROM dual
   UNION
   SELECT 2 FROM dual
  );

-- INTERSECT returns rows present in both sets
-- (1, 2) INTERSECT (2, 3, 4) -> (2)
SELECT * FROM (
  (SELECT 1 FROM dual
   UNION
   SELECT 1 FROM dual
   UNION
   SELECT 2 FROM dual
  )
  INTERSECT
  (SELECT 2 FROM dual
   UNION
   SELECT 3 FROM dual
   UNION
   SELECT 4 FROM dual)
);

-- MINUS returns rows of the first set that are not in the second
-- (1, 2) MINUS (2, 3, 4) -> (1)
SELECT * FROM (
  (SELECT 1 FROM dual
   UNION
   SELECT 1 FROM dual
   UNION
   SELECT 2 FROM dual
  )
  MINUS
  (SELECT 2 FROM dual
   UNION
   SELECT 3 FROM dual
   UNION
   SELECT 4 FROM dual
  )
);
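
The queries above only use UNION. As a minimal sketch of the difference, UNION ALL on the same three values keeps the duplicate. The column alias `n` is only illustrative.

```sql
-- UNION ALL keeps duplicates: (1), (1), (2) -> (1, 1, 2)
SELECT * FROM
  (SELECT 1 AS n FROM dual
   UNION ALL
   SELECT 1 FROM dual
   UNION ALL
   SELECT 2 FROM dual
  );
```

UNION ALL skips the duplicate-removal step, so it is usually the better choice when the inputs are already known to be distinct.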

### SQL – Execution Flow
Let us understand SQL Execution Flow.

* A SQL query is made up of several clauses. We typically write them in the order below.
    * SELECT – Projection of columns and transformations
    * FROM – Tables or Views or Nested Queries
    * JOIN – Join data from multiple tables
    * WHERE – Filtering on Data
    * GROUP BY – Grouping data for aggregations
        * HAVING – Filtering after grouping
    * ORDER BY – Sorting the final result
* But when it comes to the logical execution flow, the order is a bit different
    * FROM
    * JOIN
    * WHERE
    * GROUP BY
        * HAVING
    * SELECT
    * ORDER BY
* One side effect of this execution flow is that aliases defined in SELECT can be used only in ORDER BY, not in clauses such as WHERE, GROUP BY or HAVING.
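
The alias rule can be sketched with a small inline data set. The `emp` rows, column names and values below are hypothetical and exist only to make the query runnable without any tables. Referencing `annual_salary` in WHERE instead of repeating the expression would raise ORA-00904 (invalid identifier), because WHERE is evaluated before SELECT.

```sql
-- The alias annual_salary is usable in ORDER BY,
-- but the expression has to be repeated in WHERE
WITH emp AS (
  SELECT 'A' AS emp_name, 1000 AS salary FROM dual
  UNION ALL
  SELECT 'B', 3000 FROM dual
)
SELECT emp_name, salary * 12 AS annual_salary
FROM emp
WHERE salary * 12 > 20000
ORDER BY annual_salary DESC;
```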

### Analytics Functions
Let us understand the functions related to aggregations, ranking and windowing.

* Analytics Functions can be used only in the SELECT clause or the ORDER BY clause. Because of that, if we have to filter data on the result of an Analytics Function, we have to use nested sub queries.
* They are specified using the following clauses
    * over
    * partition by
    * order by
* All aggregate functions, rank functions and windowing functions can be used with over clause to get aggregations per partition or group
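* It is mandatory to specify the over clause to use a function as an Analytics Function
* e.g.: rank() over (spec) where spec can contain partition by, order by or both. Ranking functions such as rank() always need order by.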
* Aggregations – sum, avg, min, max etc
* Ranking – rank, dense_rank, row_number etc
* Windowing – lead, lag etc
* We typically use only partition by for aggregations, and partition by together with order by for ranking and windowing functions.
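
As a rough illustration of the three families, the sketch below applies one function of each kind to the same partition. The inline `emp` data and its column names are hypothetical, chosen so the query runs without any tables.

```sql
-- Aggregation uses partition by only.
-- Ranking and windowing use partition by and order by.
WITH emp AS (
  SELECT 10 AS department_id, 'A' AS emp_name, 1000 AS salary FROM dual
  UNION ALL
  SELECT 10, 'B', 3000 FROM dual
  UNION ALL
  SELECT 20, 'C', 2000 FROM dual
)
SELECT department_id, emp_name, salary,
  sum(salary) OVER (PARTITION BY department_id) AS department_total,
  rank() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rnk,
  lag(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS prev_salary
FROM emp
ORDER BY department_id, rnk;
```

Every input row is kept in the output. The functions only add columns computed over the rows of the same department.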

### Understanding Clauses
Let us understand different clauses required for analytics functions.

* Typical syntax – function(argument) over (partition by groupcolumn [order by ordercolumn [desc]])
* For aggregations we can define the groups using partition by
* For ranking or windowing we need to use partition by and then order by. partition by groups the data and order by sorts the data within each group to assign the rank.
* We cannot use these functions anywhere except in the select clause and the order by clause
* If we have to filter on these derived fields from the select clause, we need to nest the whole query into another query.

### Performing aggregations
Let us see how to perform aggregations within each group.

* We have functions such as sum, avg, min, max etc which can be used to aggregate the data.
* We need to use over (partition by) to get aggregations within each group.
* Some realistic use cases
    * Get average salary for each department and get details of all employees who earn more than the average salary
    * Get average revenue for each day and get all the orders whose revenue is more than the average revenue
    * Get highest order revenue and get all the orders which have revenue more than 75% of the highest revenue
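
The first use case can be sketched as follows, assuming the same `employees` table and columns that the other queries in this section use. The alias `avg_department_salary` is only illustrative.

```sql
-- Employees who earn more than the average salary of their department.
-- The analytic function is computed in the inner query
-- and the filter is applied in the outer query.
SELECT * FROM (
  SELECT first_name, last_name, department_id, salary,
    avg(salary) OVER (PARTITION BY department_id) AS avg_department_salary
  FROM employees)
WHERE salary > avg_department_salary
ORDER BY department_id, salary DESC;
```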

### Using windowing functions
Let us see details about windowing functions within each group.

* We have functions such as lead, lag etc
* We need to use partition by and then order by for most of the windowing functions
* Some realistic use cases
    * Salary difference between current and next/previous employee within each department
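
A minimal sketch of this use case with lead, again assuming the `employees` table used elsewhere in this section. The aliases `next_salary` and `salary_diff` are only illustrative.

```sql
-- Salary difference between each employee and the next lower paid
-- employee within the same department
SELECT first_name, last_name, department_id, salary,
  lead(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS next_salary,
  salary - lead(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_diff
FROM employees
ORDER BY department_id, salary DESC;
```

The last employee in each department has no next row, so lead returns NULL there and the difference is NULL as well.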

### Ranking within each partition or group
Let us talk about ranking functions within each group.

* We have functions like rank, dense_rank, row_number, first, last etc
* We need to use partition by and then order by for most of the ranking functions
* Some realistic use cases
    * Assign rank to employees based on salary within each department
    * Assign ranks to products based on revenue each day or month
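
The ranking functions differ only in how they treat ties. The sketch below uses hypothetical inline data with two equal salaries to make the difference visible. With this data B and C tie, so rank gives 1, 2, 2, 4, dense_rank gives 1, 2, 2, 3 and row_number gives 1, 2, 3, 4.

```sql
-- Compare rank, dense_rank and row_number on tied salaries
WITH emp AS (
  SELECT 'A' AS emp_name, 9000 AS salary FROM dual
  UNION ALL
  SELECT 'B', 6000 FROM dual
  UNION ALL
  SELECT 'C', 6000 FROM dual
  UNION ALL
  SELECT 'D', 4000 FROM dual
)
SELECT emp_name, salary,
  rank() OVER (ORDER BY salary DESC) AS rnk,
  dense_rank() OVER (ORDER BY salary DESC) AS drnk,
  row_number() OVER (ORDER BY salary DESC, emp_name) AS rn
FROM emp
ORDER BY rn;
```

The extra `emp_name` sort key in row_number is there to make the numbering of tied rows deterministic.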

In [None]:
-- sql-queries-analytics-functions.sql
-- Get % salary of each employee within their department
SELECT first_name, last_name, department_id, salary,
  sum(salary) OVER (PARTITION BY department_id) department_expense,
  round((salary / sum(salary) OVER (PARTITION BY department_id)) * 100, 2) percentage_salary
FROM employees
ORDER BY department_id;

In [None]:
-- Get top 5 paid employees within a department (department 60 here)
-- rank: ties share a rank and the ranks that follow are skipped
SELECT * FROM (
  SELECT first_name, last_name, department_id, salary,
    rank() OVER (PARTITION BY department_id ORDER BY salary DESC) rnk
  FROM employees)
WHERE rnk <= 5 AND department_id = 60
ORDER BY department_id, rnk;

In [None]:
-- dense_rank: ties share a rank and no ranks are skipped
SELECT * FROM (
  SELECT first_name, last_name, department_id, salary,
    dense_rank() OVER (PARTITION BY department_id ORDER BY salary DESC) rnk
  FROM employees)
WHERE rnk <= 5 AND department_id = 60
ORDER BY department_id, rnk;

In [None]:
SELECT * FROM employees WHERE department_id = 60;

In [None]:
-- Get number of days between each hire and the previous hire within each department
SELECT first_name, last_name, department_id, hire_date,
  lag(hire_date) OVER (PARTITION BY department_id ORDER BY hire_date) prev_hire_date,
  hire_date - lag(hire_date) OVER (PARTITION BY department_id ORDER BY hire_date) diff
FROM employees
ORDER BY department_id, hire_date;

In [None]:
-- Get salary difference between top paid employee and least paid employee within each department
SELECT DISTINCT department_id,
  max(salary) OVER (PARTITION BY department_id)
    - min(salary) OVER (PARTITION BY department_id) salary_diff
FROM employees
ORDER BY department_id;

### Exercises
*Tables: ORDERS, ORDER_ITEMS, PRODUCTS, CATEGORIES, DEPARTMENTS, CUSTOMERS*

* Using products, get top 3 priced products (dense rank) within each category by price – product id, product name, product category name, product price and rank
* Get top 3 customers for each month by revenue – customer first name, customer last name, month, revenue and rank
* Get top 3 products for each month by revenue – month, product name, revenue and rank
* Get top 3 orders by revenue for each day – order date, order revenue, rank
