## SQL Practice

> Based on notes on: https://www.youtube.com/watch?v=gwp3dJUsy5g

---

### SQL Order of execution

> https://sqlbolt.com/lesson/select_queries_order_of_execution

```sql
SELECT DISTINCT column, AGG_FUNC(column_or_expression), …
FROM mytable
    JOIN another_table
      ON mytable.column = another_table.column
    WHERE constraint_expression
    GROUP BY column
    HAVING constraint_expression
    ORDER BY column ASC/DESC
    LIMIT count OFFSET count;
```


1. FROM and JOIN

The FROM clause and subsequent JOINs are executed first to determine the total working set of data being queried. This includes subqueries in this clause, and can cause temporary tables to be created under the hood containing all the columns and rows of the tables being joined.

2. WHERE

Once we have the total working set of data, the first-pass WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded. Each of the constraints can only access columns directly from the tables requested in the FROM clause. __Aliases in the SELECT part of the query are not accessible in most databases since they may include expressions dependent on parts of the query that have not yet executed.__

3. GROUP BY

The remaining rows after the WHERE constraints are applied are then grouped based on common values in the column specified in the GROUP BY clause. __As a result of the grouping, there will only be as many rows as there are unique values in that column__. Implicitly, this means that you should only need to use this when you have aggregate functions in your query.

4. HAVING

If the query has a GROUP BY clause, the constraints in the HAVING clause are then applied to the grouped rows, discarding the groups that don't satisfy them. __Like the WHERE clause, aliases are also not accessible from this step in most databases__.

5. SELECT

Any expressions in the SELECT part of the query are finally computed.

6. DISTINCT

Of the remaining rows, those that duplicate another row across all selected columns are discarded (DISTINCT applies to the whole select list, not to a single column).

7. ORDER BY

If an order is specified by the ORDER BY clause, the rows are then sorted by the specified data in either ascending or descending order. Since all the expressions in the SELECT part of the query have been computed, you can reference aliases in this clause.


8. LIMIT / OFFSET

Finally, the rows that fall outside the range specified by the LIMIT and OFFSET are discarded, leaving the final set of rows to be returned from the query.
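The effect of this ordering can be seen in a small sketch. The `orders` table and its rows are hypothetical, made up only for illustration, and the syntax is meant to be portable across MySQL 8, PostgreSQL, and SQLite.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE orders (
    id       INT PRIMARY KEY,
    customer VARCHAR(20),
    amount   INT
);

INSERT INTO orders (id, customer, amount) VALUES
    (1, 'ann', 50),
    (2, 'ann', 70),
    (3, 'bob', 20),
    (4, 'bob', 10),
    (5, 'cat', 90);

SELECT customer, SUM(amount) AS total   -- 5. SELECT: the alias "total" is created here
FROM orders                             -- 1. FROM: build the working set
WHERE amount >= 20                      -- 2. WHERE: row filter; "total" is not visible yet
GROUP BY customer                       -- 3. GROUP BY: one row per customer
HAVING SUM(amount) > 50                 -- 4. HAVING: group filter; the aggregate is repeated
ORDER BY total DESC                     -- 7. ORDER BY: the alias is visible now
LIMIT 2;                                -- 8. LIMIT: trim the sorted result

-- Result: ('ann', 120), ('cat', 90)
```

Row 4 is dropped by WHERE before grouping, and the `bob` group (total 20) is dropped by HAVING after grouping, which is why the two filters are not interchangeable.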





### Difference between DISTINCT and GROUP BY

> https://www.sitepoint.com/community/t/mysql-when-group-by-is-faster-than-distinct/272420

In some cases GROUP BY might be faster than DISTINCT:

- When you run DISTINCT, MySQL has to compare all selected columns, whereas GROUP BY only compares the columns you explicitly list in the GROUP BY clause, so there is less work to do (my query was selecting about 15 columns).

This is true as far as it goes. However, if you GROUP BY only a few of your 15 columns and select the rest without aggregating them, the query is technically invalid. MySQL will still run it when the `ONLY_FULL_GROUP_BY` SQL mode is disabled (the mode is on by default since MySQL 5.7.5), and the values returned for the ungrouped columns are indeterminate.
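A minimal sketch of the comparison, using a hypothetical `visits` table that is not part of the original notes:

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE visits (
    city    VARCHAR(20),
    weather VARCHAR(20)
);

INSERT INTO visits (city, weather) VALUES
    ('Pune', 'Sunny'),
    ('Pune', 'Sunny'),
    ('Pune', 'Rainy'),
    ('Oslo', 'Snowy');

-- DISTINCT de-duplicates on every selected column: 3 rows.
SELECT DISTINCT city, weather
FROM visits;

-- A valid GROUP BY equivalent lists the same columns: also 3 rows.
SELECT city, weather
FROM visits
GROUP BY city, weather;

-- Grouping by fewer columns is only well-defined if the rest are aggregated.
SELECT city, COUNT(*) AS visit_count
FROM visits
GROUP BY city;
```

The first two queries return the same three rows, so any speed difference between them comes from the database's execution plan rather than from the result. The third query returns one row per city (Pune 3, Oslo 1).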



### Combining Data

#### Unions

Put simply, a union (SQL UNION) is the process of stacking two tables on top of one another. You will usually do this when your data is split up into multiple sections, like an Excel spreadsheet of a year’s sales split by month.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fa5adad37-92c6-4924-b52c-2ab340ae351b%2FScreen_Shot_2021-03-15_at_12.23.40_PM.png?table=block&id=199c2a41-408a-4898-a329-c7810d5d9a39&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1390&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)
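As a rough sketch of the idea, assume two hypothetical monthly tables, `sales_jan` and `sales_feb`, with the same columns:

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE sales_jan (item VARCHAR(20), amount INT);
CREATE TABLE sales_feb (item VARCHAR(20), amount INT);

INSERT INTO sales_jan (item, amount) VALUES ('pen', 10), ('ink', 5);
INSERT INTO sales_feb (item, amount) VALUES ('pen', 10), ('pad', 7);

-- UNION ALL keeps every row from both tables: 4 rows.
SELECT item, amount FROM sales_jan
UNION ALL
SELECT item, amount FROM sales_feb;

-- UNION also removes duplicate rows: ('pen', 10) appears once, so 3 rows.
SELECT item, amount FROM sales_jan
UNION
SELECT item, amount FROM sales_feb;
```

Both SELECTs in a union must return the same number of columns with compatible types. For stacking monthly data, `UNION ALL` is usually the one you want, since identical rows in different months are genuine records rather than duplicates.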

#### Joins

Joins combine two tables horizontally. Like a union, a join needs at least two tables, which we call the Left Table and the Right Table. You (mostly) need at least one matching column between the two tables, and rows are matched on the values in these columns. The most common way to visualize the types of joins is through Venn diagrams.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fba7b4d86-db23-4142-b860-2e23fce3bae4%2FScreen_Shot_2021-03-15_at_12.36.03_PM.png?table=block&id=31a987e2-12d3-4d95-bdbc-a37449be2b29&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=2000&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fe94f8697-5537-4458-9992-4ea02defbea0%2FScreen_Shot_2021-03-15_at_12.36.37_PM.png?table=block&id=91c32502-734f-4ef3-90b6-d9725da9cd3b&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=960&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

__Inner Join__

We’re now going to do an Inner Join on the [ID] column, which outputs only the rows whose [ID] values match exactly in both tables.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F1d7856a8-7fc5-4673-a7a6-522d5fd52a9c%2FScreen_Shot_2021-03-15_at_12.37.59_PM.png?table=block&id=b5c56e80-bb05-4dc5-955b-073c3d231bfc&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1370&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

__Left Join__

A Left Join keeps all of the data from your Left table and whatever matches from the Right table.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fa4b067dc-dd79-4456-8cae-2f81a8ac4201%2FScreen_Shot_2021-03-15_at_12.39.07_PM.png?table=block&id=f0f911be-573d-415e-b829-873cd6c300c1&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1350&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

__Right Join__

A Right Join does the exact opposite and keeps everything from your Right table while only bringing in the matches from the Left table.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F1235ef85-ef65-4e03-ba26-6661b39bad62%2FScreen_Shot_2021-03-15_at_12.39.43_PM.png?table=block&id=fd1880dd-7d29-477b-b654-7e22a9b9873c&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1360&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

__Full Join__

A Full Join brings in everything from both tables and matches whatever will match from the columns you specify.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fe42b3cca-c8df-48fb-9b63-211c498be6a9%2FScreen_Shot_2021-03-15_at_12.40.52_PM.png?table=block&id=02ceefe0-20a9-48f7-831d-88698f314d43&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1360&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)
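The four join types above can be sketched on one pair of tables. The tables `left_t` and `right_t` and their contents are hypothetical. Note that MySQL does not support `FULL OUTER JOIN` (it is commonly emulated with a `UNION` of a left and a right join), and SQLite only supports `RIGHT` and `FULL` joins from version 3.39.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE left_t  (id INT, l_val VARCHAR(10));
CREATE TABLE right_t (id INT, r_val VARCHAR(10));

INSERT INTO left_t  (id, l_val) VALUES (1, 'L1'), (2, 'L2'), (3, 'L3');
INSERT INTO right_t (id, r_val) VALUES (2, 'R2'), (3, 'R3'), (4, 'R4');

-- Inner join: only ids present in both tables (2 and 3).
SELECT l.id, l.l_val, r.r_val
FROM left_t l
INNER JOIN right_t r ON l.id = r.id;

-- Left join: ids 1, 2, 3; r_val is NULL for id 1.
SELECT l.id, l.l_val, r.r_val
FROM left_t l
LEFT JOIN right_t r ON l.id = r.id;

-- Right join: ids 2, 3, 4; l_val is NULL for id 4.
SELECT r.id, l.l_val, r.r_val
FROM left_t l
RIGHT JOIN right_t r ON l.id = r.id;

-- Full join: ids 1 through 4, with NULLs on whichever side has no match.
SELECT COALESCE(l.id, r.id) AS id, l.l_val, r.r_val
FROM left_t l
FULL OUTER JOIN right_t r ON l.id = r.id;
```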

__Cross Join__

The CROSS JOIN generates a paired combination of each row of the first table with each row of the second table. This join type is also known as a Cartesian join.

Suppose that we are sitting in a coffee shop and decide to order breakfast. We look at the menu and start thinking about which meal and drink combination would be tastiest. Our brain begins to generate all meal and drink combinations.

The following image illustrates all menu combinations that can be generated by our brain. The SQL CROSS JOIN works similarly to this mechanism, as it creates all paired combinations of the rows of the tables that will be joined.

![](https://www.sqlshack.com/wp-content/uploads/2020/02/sql-cross-join-working-mechanism.png)

Queries that contain CROSS JOIN can be very costly, because the result contains one row for every pair of input rows. Such queries can consume a lot of resources and cause performance issues.

Briefly, when we decide to use CROSS JOIN in a query, we should consider the number of rows in the tables being joined. For example, if we CROSS JOIN two tables that each contain 1,000 rows, the result set will contain 1,000,000 rows.
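The breakfast example can be sketched as follows. The `meals` and `drinks` tables and their rows are hypothetical stand-ins for the menu in the image.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE meals  (meal  VARCHAR(20));
CREATE TABLE drinks (drink VARCHAR(20));

INSERT INTO meals  (meal)  VALUES ('omelette'), ('pancakes'), ('toast');
INSERT INTO drinks (drink) VALUES ('coffee'), ('tea');

-- Every meal paired with every drink: 3 x 2 = 6 rows.
SELECT m.meal, d.drink
FROM meals m
CROSS JOIN drinks d
ORDER BY m.meal, d.drink;
```

The row count of the result is always the product of the two input row counts, which is why the cost grows so quickly with table size.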

---

Joins can get a bit tricky because of potential gotchas. The most common one is row duplication, where you accidentally duplicate rows because the columns you’re matching on have multiple potential matches. In the example below we try an Inner Join; notice that the data highlighted in orange is duplicated.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F89439f1d-e24a-4451-bd0d-37a0dfcc4487%2FScreen_Shot_2021-03-15_at_12.42.17_PM.png?table=block&id=9bac2f13-52ee-4ac9-9bc8-806a6d7187d0&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1370&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)

This isn’t an error per se but it is something to watch out for as it can cause you to duplicate data you don’t intend to duplicate.

![](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F8d64f5ba-d067-40f5-94bb-32beb9fe49d1%2FScreen_Shot_2021-03-15_at_12.42.37_PM.png?table=block&id=6a39c3e9-2900-4be6-9dc2-bbb14f68181a&spaceId=691f8197-dec0-4338-b1a8-a47162b151ba&width=1360&userId=bdc14b6b-7340-420b-85e2-540dbef29bc8&cache=v2)
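A small sketch of how this duplication arises and one way to avoid it. The `customers` and `payments` tables here are hypothetical and unrelated to the screenshots.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE customers (id INT, name VARCHAR(20));
CREATE TABLE payments  (customer_id INT, amount INT);

INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO payments (customer_id, amount) VALUES (1, 100), (1, 40), (2, 25);

-- Asha matches two payment rows, so she appears twice in the join output.
SELECT c.id, c.name, p.amount
FROM customers c
INNER JOIN payments p ON p.customer_id = c.id;

-- Counting joined rows overstates the number of customers: returns 3, not 2.
SELECT COUNT(*) AS joined_rows
FROM customers c
INNER JOIN payments p ON p.customer_id = c.id;

-- One way back to one row per customer: aggregate the "many" side first.
SELECT c.id, c.name, t.total_paid
FROM customers c
INNER JOIN (
    SELECT customer_id, SUM(amount) AS total_paid
    FROM payments
    GROUP BY customer_id
) t ON t.customer_id = c.id;
```

Comparing the row count before and after a join is a cheap way to catch unintended duplication.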



### LIKE, BETWEEN, IN

> https://www.w3schools.com/sql/sql_wildcards.asp

```
SELECT *
FROM dataset_1
WHERE weather LIKE 'Sun%';
```

```
SELECT DISTINCT temperature 
FROM dataset_1
WHERE temperature BETWEEN 29 AND 75;
```


```
SELECT occupation
FROM dataset_1
WHERE occupation IN ('Sales & Related', 'Management');
```



### Window functions

> Based on tutorial : https://www.youtube.com/watch?v=Ww71knvhQ-s

---

We create an `employee` table

![](https://i.imgur.com/dehIA2J.png)
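Since the table is only shown as an image, here is a sketch that recreates it from the rows listed in the query outputs in this section. The column types are assumptions; only the column names and values come from those outputs.

```sql
-- Column types are assumed; names and values are taken from the outputs in this section.
CREATE TABLE employee (
    emp_ID    INT PRIMARY KEY,
    emp_NAME  VARCHAR(50),
    DEPT_NAME VARCHAR(50),
    SALARY    INT
);

INSERT INTO employee (emp_ID, emp_NAME, DEPT_NAME, SALARY) VALUES
    (101, 'Mohan',    'Admin',    4000),
    (102, 'Rajkumar', 'HR',       3000),
    (103, 'Akbar',    'IT',       4000),
    (104, 'Dorvin',   'Finance',  6500),
    (105, 'Rohit',    'HR',       3000),
    (106, 'Rajesh',   'Finance',  5000),
    (107, 'Preet',    'HR',       7000),
    (108, 'Maryam',   'Admin',    4000),
    (109, 'Sanjay',   'IT',       6500),
    (110, 'Vasudha',  'IT',       7000),
    (111, 'Melinda',  'IT',       8000),
    (112, 'Komal',    'IT',      10000),
    (113, 'Gautham',  'Admin',    2000),
    (114, 'Manisha',  'HR',       3000),
    (115, 'Chandni',  'IT',       4500),
    (116, 'Satya',    'Finance',  6500),
    (117, 'Adarsh',   'HR',       3500),
    (118, 'Tejaswi',  'Finance',  5500),
    (119, 'Cory',     'HR',       8000),
    (120, 'Monica',   'Admin',    5000),
    (121, 'Rosalin',  'IT',       6000),
    (122, 'Ibrahim',  'IT',       8000),
    (123, 'Vikram',   'IT',       8000),
    (124, 'Dheeraj',  'IT',      11000);
```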

```
SELECT DEPT_NAME,
MAX(SALARY)
FROM employee
GROUP BY DEPT_NAME;
```

This gives us the max salary for each dept:

```
DEPT_NAME|MAX(SALARY)|
---------+-----------+
Admin    |       5000|
Finance  |       6500|
HR       |       8000|
IT       |      11000|
```

What I want is all the columns from the `employee` table along with a `max_salary` column that displays the overall max salary:

```
SELECT e.*,
MAX(SALARY) OVER() as max_salary 
FROM employee e;
```


```
emp_ID|emp_NAME|DEPT_NAME|SALARY|max_salary|
------+--------+---------+------+----------+
   101|Mohan   |Admin    |  4000|     11000|
   102|Rajkumar|HR       |  3000|     11000|
   103|Akbar   |IT       |  4000|     11000|
   104|Dorvin  |Finance  |  6500|     11000|
   105|Rohit   |HR       |  3000|     11000|
   106|Rajesh  |Finance  |  5000|     11000|
   107|Preet   |HR       |  7000|     11000|
   108|Maryam  |Admin    |  4000|     11000|
   109|Sanjay  |IT       |  6500|     11000|
   110|Vasudha |IT       |  7000|     11000|
   111|Melinda |IT       |  8000|     11000|
   112|Komal   |IT       | 10000|     11000|
   113|Gautham |Admin    |  2000|     11000|
   114|Manisha |HR       |  3000|     11000|
   115|Chandni |IT       |  4500|     11000|
   116|Satya   |Finance  |  6500|     11000|
   117|Adarsh  |HR       |  3500|     11000|
   118|Tejaswi |Finance  |  5500|     11000|
   119|Cory    |HR       |  8000|     11000|
   120|Monica  |Admin    |  5000|     11000|
   121|Rosalin |IT       |  6000|     11000|
   122|Ibrahim |IT       |  8000|     11000|
   123|Vikram  |IT       |  8000|     11000|
   124|Dheeraj |IT       | 11000|     11000|
```

Since we are using an `OVER` clause, SQL does not treat `MAX` as an aggregate function; it treats it as a window function. Because we have not specified anything inside the `OVER` clause, the window covers the entire dataset.


Now we want the max salary for each dept along with other cols

```
SELECT e.*,
MAX(SALARY) OVER(PARTITION BY DEPT_NAME) as max_salary 
FROM employee e;
```

Now, for every distinct value of `DEPT_NAME`, SQL creates a window and calculates the max salary within that window:

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|max_salary|
------+--------+---------+------+----------+
   101|Mohan   |Admin    |  4000|      5000|
   108|Maryam  |Admin    |  4000|      5000|
   113|Gautham |Admin    |  2000|      5000|
   120|Monica  |Admin    |  5000|      5000|
   104|Dorvin  |Finance  |  6500|      6500|
   106|Rajesh  |Finance  |  5000|      6500|
   116|Satya   |Finance  |  6500|      6500|
   118|Tejaswi |Finance  |  5500|      6500|
   102|Rajkumar|HR       |  3000|      8000|
   105|Rohit   |HR       |  3000|      8000|
   107|Preet   |HR       |  7000|      8000|
   114|Manisha |HR       |  3000|      8000|
   117|Adarsh  |HR       |  3500|      8000|
   119|Cory    |HR       |  8000|      8000|
   103|Akbar   |IT       |  4000|     11000|
   109|Sanjay  |IT       |  6500|     11000|
   110|Vasudha |IT       |  7000|     11000|
   111|Melinda |IT       |  8000|     11000|
   112|Komal   |IT       | 10000|     11000|
   115|Chandni |IT       |  4500|     11000|
   121|Rosalin |IT       |  6000|     11000|
   122|Ibrahim |IT       |  8000|     11000|
   123|Vikram  |IT       |  8000|     11000|
   124|Dheeraj |IT       | 11000|     11000|
```

We can use MAX, MIN, COUNT, SUM - the agg functions we use with GROUP BY

But there are some specific window functions as well:

#### Row number

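This simply assigns a sequential number to every row in the table. Without an `ORDER BY` inside `OVER()`, the order in which the numbers are assigned is not guaranteed.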

```
SELECT e.*,
ROW_NUMBER () OVER() AS rn
FROM employee e 
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|rn|
------+--------+---------+------+--+
   101|Mohan   |Admin    |  4000| 1|
   102|Rajkumar|HR       |  3000| 2|
   103|Akbar   |IT       |  4000| 3|
   104|Dorvin  |Finance  |  6500| 4|
   105|Rohit   |HR       |  3000| 5|
   106|Rajesh  |Finance  |  5000| 6|
   107|Preet   |HR       |  7000| 7|
   108|Maryam  |Admin    |  4000| 8|
   109|Sanjay  |IT       |  6500| 9|
   110|Vasudha |IT       |  7000|10|
   111|Melinda |IT       |  8000|11|
   112|Komal   |IT       | 10000|12|
   113|Gautham |Admin    |  2000|13|
   114|Manisha |HR       |  3000|14|
   115|Chandni |IT       |  4500|15|
   116|Satya   |Finance  |  6500|16|
   117|Adarsh  |HR       |  3500|17|
   118|Tejaswi |Finance  |  5500|18|
   119|Cory    |HR       |  8000|19|
   120|Monica  |Admin    |  5000|20|
   121|Rosalin |IT       |  6000|21|
   122|Ibrahim |IT       |  8000|22|
   123|Vikram  |IT       |  8000|23|
   124|Dheeraj |IT       | 11000|24|
```

```
SELECT e.*,
ROW_NUMBER () OVER(PARTITION BY DEPT_NAME) AS rn
FROM employee e 
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|rn|
------+--------+---------+------+--+
   101|Mohan   |Admin    |  4000| 1|
   108|Maryam  |Admin    |  4000| 2|
   113|Gautham |Admin    |  2000| 3|
   120|Monica  |Admin    |  5000| 4|
   104|Dorvin  |Finance  |  6500| 1|
   106|Rajesh  |Finance  |  5000| 2|
   116|Satya   |Finance  |  6500| 3|
   118|Tejaswi |Finance  |  5500| 4|
   102|Rajkumar|HR       |  3000| 1|
   105|Rohit   |HR       |  3000| 2|
   107|Preet   |HR       |  7000| 3|
   114|Manisha |HR       |  3000| 4|
   117|Adarsh  |HR       |  3500| 5|
   119|Cory    |HR       |  8000| 6|
   103|Akbar   |IT       |  4000| 1|
   109|Sanjay  |IT       |  6500| 2|
   110|Vasudha |IT       |  7000| 3|
   111|Melinda |IT       |  8000| 4|
   112|Komal   |IT       | 10000| 5|
   115|Chandni |IT       |  4500| 6|
   121|Rosalin |IT       |  6000| 7|
   122|Ibrahim |IT       |  8000| 8|
   123|Vikram  |IT       |  8000| 9|
   124|Dheeraj |IT       | 11000|10|
```




__Say we want to fetch the first two employees who joined the company in each dept__

Assume `emp_ID` is lower for employees who joined earlier.

```
SELECT * FROM (
	SELECT e.*,
	ROW_NUMBER () OVER(PARTITION BY DEPT_NAME ORDER BY emp_ID) AS rn
	FROM employee e 
) x
WHERE x.rn < 3
```


```
emp_ID|emp_NAME|DEPT_NAME|SALARY|rn|
------+--------+---------+------+--+
   101|Mohan   |Admin    |  4000| 1|
   108|Maryam  |Admin    |  4000| 2|
   104|Dorvin  |Finance  |  6500| 1|
   106|Rajesh  |Finance  |  5000| 2|
   102|Rajkumar|HR       |  3000| 1|
   105|Rohit   |HR       |  3000| 2|
   103|Akbar   |IT       |  4000| 1|
   109|Sanjay  |IT       |  6500| 2|
```



__Fetch top 3 employees in each dept earning max salary__

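We can use the `RANK` or `DENSE_RANK` function. With `RANK`, tied salaries share a rank, so a department can return more than three rows (IT returns five below).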

```
SELECT * FROM (
	SELECT e.*,
	RANK() OVER(PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS `rank`
	FROM employee e 
) x
WHERE x.rank < 4
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|rank|
------+--------+---------+------+----+
   120|Monica  |Admin    |  5000|   1|
   101|Mohan   |Admin    |  4000|   2|
   108|Maryam  |Admin    |  4000|   2|
   104|Dorvin  |Finance  |  6500|   1|
   116|Satya   |Finance  |  6500|   1|
   118|Tejaswi |Finance  |  5500|   3|
   119|Cory    |HR       |  8000|   1|
   107|Preet   |HR       |  7000|   2|
   117|Adarsh  |HR       |  3500|   3|
   124|Dheeraj |IT       | 11000|   1|
   112|Komal   |IT       | 10000|   2|
   111|Melinda |IT       |  8000|   3|
   122|Ibrahim |IT       |  8000|   3|
   123|Vikram  |IT       |  8000|   3|
```

__Rank vs Dense Rank vs Row no__

```
SELECT e.*,
RANK() OVER(PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS `rank`,
DENSE_RANK () OVER(PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS `dense_rank`,
ROW_NUMBER () OVER(PARTITION BY DEPT_NAME ORDER BY SALARY DESC) AS `rn`
FROM employee e 
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|rank|dense_rank|rn|
------+--------+---------+------+----+----------+--+
   120|Monica  |Admin    |  5000|   1|         1| 1|
   101|Mohan   |Admin    |  4000|   2|         2| 2|
   108|Maryam  |Admin    |  4000|   2|         2| 3|
   113|Gautham |Admin    |  2000|   4|         3| 4|
   104|Dorvin  |Finance  |  6500|   1|         1| 1|
   116|Satya   |Finance  |  6500|   1|         1| 2|
   118|Tejaswi |Finance  |  5500|   3|         2| 3|
   106|Rajesh  |Finance  |  5000|   4|         3| 4|
   119|Cory    |HR       |  8000|   1|         1| 1|
   107|Preet   |HR       |  7000|   2|         2| 2|
   117|Adarsh  |HR       |  3500|   3|         3| 3|
   102|Rajkumar|HR       |  3000|   4|         4| 4|
   105|Rohit   |HR       |  3000|   4|         4| 5|
   114|Manisha |HR       |  3000|   4|         4| 6|
   124|Dheeraj |IT       | 11000|   1|         1| 1|
   112|Komal   |IT       | 10000|   2|         2| 2|
   111|Melinda |IT       |  8000|   3|         3| 3|
   122|Ibrahim |IT       |  8000|   3|         3| 4|
   123|Vikram  |IT       |  8000|   3|         3| 5|
   110|Vasudha |IT       |  7000|   6|         4| 6|
   109|Sanjay  |IT       |  6500|   7|         5| 7|
   121|Rosalin |IT       |  6000|   8|         6| 8|
   115|Chandni |IT       |  4500|   9|         7| 9|
   103|Akbar   |IT       |  4000|  10|         8|10|
```

- `RANK` skips ranks after duplicates: 1->2->2->4
- `DENSE_RANK` does not skip ranks: 1->2->2->3
- `ROW_NUMBER` simply numbers each row and ignores duplicates: 1->2->3->4



__Lead and Lag__

> Also read: https://learnsql.com/blog/lead-and-lag-functions-in-sql/


Basic Lead and Lag

```
SELECT e.*,
LAG(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) AS prev_emp_salary,
LEAD(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) AS next_emp_salary
FROM employee e
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|prev_emp_salary|next_emp_salary|
------+--------+---------+------+---------------+---------------+
   101|Mohan   |Admin    |  4000|               |           4000|
   108|Maryam  |Admin    |  4000|           4000|           2000|
   113|Gautham |Admin    |  2000|           4000|           5000|
   120|Monica  |Admin    |  5000|           2000|               |
   104|Dorvin  |Finance  |  6500|               |           5000|
   106|Rajesh  |Finance  |  5000|           6500|           6500|
   116|Satya   |Finance  |  6500|           5000|           5500|
   118|Tejaswi |Finance  |  5500|           6500|               |
   102|Rajkumar|HR       |  3000|               |           3000|
   105|Rohit   |HR       |  3000|           3000|           7000|
   107|Preet   |HR       |  7000|           3000|           3000|
   114|Manisha |HR       |  3000|           7000|           3500|
   117|Adarsh  |HR       |  3500|           3000|           8000|
   119|Cory    |HR       |  8000|           3500|               |
   103|Akbar   |IT       |  4000|               |           6500|
   109|Sanjay  |IT       |  6500|           4000|           7000|
   110|Vasudha |IT       |  7000|           6500|           8000|
   111|Melinda |IT       |  8000|           7000|          10000|
   112|Komal   |IT       | 10000|           8000|           4500|
   115|Chandni |IT       |  4500|          10000|           6000|
   121|Rosalin |IT       |  6000|           4500|           8000|
   122|Ibrahim |IT       |  8000|           6000|           8000|
   123|Vikram  |IT       |  8000|           8000|          11000|
   124|Dheeraj |IT       | 11000|           8000|               |
```


Lead and lag follow the syntax: `LAG(expression [,offset[,default_value]]) OVER(ORDER BY columns)`

These functions take three arguments: the column or expression to read the value from, the offset (how many rows back `LAG` looks, or how many rows ahead `LEAD` looks; the default is 1), and the default value to return when there is no row at that offset within the partition. Only the first argument is required. The third argument (default value) is allowed only if you also specify the second argument, the offset.

```
-- LAG and LEAD with an offset of 2 and a default value of -1
SELECT e.*,
LAG(SALARY, 2, -1) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) AS prev_emp_salary,
LEAD(SALARY, 2, -1) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) AS next_emp_salary
FROM employee e
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|prev_emp_salary|next_emp_salary|
------+--------+---------+------+---------------+---------------+
   101|Mohan   |Admin    |  4000|             -1|           2000|
   108|Maryam  |Admin    |  4000|             -1|           5000|
   113|Gautham |Admin    |  2000|           4000|             -1|
   120|Monica  |Admin    |  5000|           4000|             -1|
   104|Dorvin  |Finance  |  6500|             -1|           6500|
   106|Rajesh  |Finance  |  5000|             -1|           5500|
   116|Satya   |Finance  |  6500|           6500|             -1|
   118|Tejaswi |Finance  |  5500|           5000|             -1|
   102|Rajkumar|HR       |  3000|             -1|           7000|
   105|Rohit   |HR       |  3000|             -1|           3000|
   107|Preet   |HR       |  7000|           3000|           3500|
   114|Manisha |HR       |  3000|           3000|           8000|
   117|Adarsh  |HR       |  3500|           7000|             -1|
   119|Cory    |HR       |  8000|           3000|             -1|
   103|Akbar   |IT       |  4000|             -1|           7000|
   109|Sanjay  |IT       |  6500|             -1|           8000|
   110|Vasudha |IT       |  7000|           4000|          10000|
   111|Melinda |IT       |  8000|           6500|           4500|
   112|Komal   |IT       | 10000|           7000|           6000|
   115|Chandni |IT       |  4500|           8000|           8000|
   121|Rosalin |IT       |  6000|          10000|           8000|
   122|Ibrahim |IT       |  8000|           4500|          11000|
   123|Vikram  |IT       |  8000|           6000|             -1|
   124|Dheeraj |IT       | 11000|           8000|             -1|
```

Compare the salary of each employee with the previous one in the dept:

```
SELECT e.*,
LAG(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) AS prev_emp_salary,
CASE WHEN e.SALARY > LAG(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) THEN 'higher than prev'
	WHEN e.SALARY < LAG(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) THEN 'lower than prev'
	WHEN e.SALARY = LAG(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY emp_ID) THEN 'same as prev'
END salary_comparison
FROM employee e
```

```
emp_ID|emp_NAME|DEPT_NAME|SALARY|prev_emp_salary|salary_comparison|
------+--------+---------+------+---------------+-----------------+
   101|Mohan   |Admin    |  4000|               |                 |
   108|Maryam  |Admin    |  4000|           4000|     same as prev|
   113|Gautham |Admin    |  2000|           4000|  lower than prev|
   120|Monica  |Admin    |  5000|           2000| higher than prev|
   104|Dorvin  |Finance  |  6500|               |                 |
   106|Rajesh  |Finance  |  5000|           6500|  lower than prev|
   116|Satya   |Finance  |  6500|           5000| higher than prev|
   118|Tejaswi |Finance  |  5500|           6500|  lower than prev|
   102|Rajkumar|HR       |  3000|               |                 |
   105|Rohit   |HR       |  3000|           3000|     same as prev|
   107|Preet   |HR       |  7000|           3000| higher than prev|
   114|Manisha |HR       |  3000|           7000|  lower than prev|
   117|Adarsh  |HR       |  3500|           3000| higher than prev|
   119|Cory    |HR       |  8000|           3500| higher than prev|
   103|Akbar   |IT       |  4000|               |                 |
   109|Sanjay  |IT       |  6500|           4000| higher than prev|
   110|Vasudha |IT       |  7000|           6500| higher than prev|
   111|Melinda |IT       |  8000|           7000| higher than prev|
   112|Komal   |IT       | 10000|           8000| higher than prev|
   115|Chandni |IT       |  4500|          10000|  lower than prev|
   121|Rosalin |IT       |  6000|           4500| higher than prev|
   122|Ibrahim |IT       |  8000|           6000| higher than prev|
   123|Vikram  |IT       |  8000|           8000|     same as prev|
   124|Dheeraj |IT       | 11000|           8000| higher than prev|
```


#### SQL Question 1

> https://platform.stratascratch.com/coding/9899-percentage-of-total-spend?python&utm_source=youtube&utm_medium=click&utm_campaign=YT+description+link

```
-- INNER JOIN because we only want customers who have placed an order
SELECT first_name,
       order_details,
       -- 1.0 * ... avoids integer division if total_order_cost is an integer column
       1.0 * total_order_cost / SUM(total_order_cost) OVER (PARTITION BY first_name) AS "percentage of the order cost"
FROM orders o
INNER JOIN customers c
ON o.cust_id = c.id
```

#### SQL Question 2

> https://platform.stratascratch.com/coding/2036-lowest-revenue-generated-restaurants?python&utm_source=youtube&utm_medium=click&utm_campaign=YT+description+link

```
-- Filter data to only use May 2020 records
SELECT *
FROM (
    SELECT
        restaurant_id,
        order_total,
        NTILE(100) OVER (ORDER BY order_total ASC) AS percentile_value
    FROM (
        SELECT
            --EXTRACT(MONTH FROM customer_placed_order_datetime) as order_month,
            --EXTRACT(YEAR FROM customer_placed_order_datetime) as order_year,
            restaurant_id,
            SUM(order_total) AS order_total
        FROM doordash_delivery
        WHERE EXTRACT(MONTH FROM customer_placed_order_datetime) = 5
          AND EXTRACT(YEAR FROM customer_placed_order_datetime) = 2020
        GROUP BY restaurant_id
    ) AS sub1
) AS sub0
WHERE percentile_value < 3
ORDER BY 2 ASC
```

Instead of nested subqueries we can use CTEs (common table expressions), which are a bit easier to read.
A CTE is a named subquery scoped to a single statement, so it can be referenced more than once within that statement.
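As a sketch, the nested query above could be restated with CTEs roughly like this. The CTE names `may_revenue` and `ranked` and the sample rows are made up for illustration, and the May 2020 filter is written as a date range instead of `EXTRACT` so that it also runs on databases without that function.

```sql
-- Hypothetical sample rows; only the columns the query needs.
CREATE TABLE doordash_delivery (
    restaurant_id                  INT,
    customer_placed_order_datetime TIMESTAMP,
    order_total                    DECIMAL(10, 2)
);

INSERT INTO doordash_delivery VALUES
    (1, '2020-05-03 12:00:00', 20.00),
    (1, '2020-05-10 18:30:00', 35.50),
    (2, '2020-05-07 09:15:00', 12.25),
    (3, '2020-04-28 20:00:00', 99.00);   -- outside May 2020, filtered out

WITH may_revenue AS (
    -- Total revenue per restaurant for May 2020.
    SELECT restaurant_id, SUM(order_total) AS order_total
    FROM doordash_delivery
    WHERE customer_placed_order_datetime >= '2020-05-01'
      AND customer_placed_order_datetime <  '2020-06-01'
    GROUP BY restaurant_id
),
ranked AS (
    -- Split the restaurants into 100 buckets, lowest revenue first.
    SELECT restaurant_id, order_total,
           NTILE(100) OVER (ORDER BY order_total ASC) AS percentile_value
    FROM may_revenue
)
SELECT restaurant_id, order_total
FROM ranked
WHERE percentile_value < 3
ORDER BY order_total ASC;
```

Each CTE reads top to bottom in the order the steps happen, which is the main readability gain over nesting. With only a handful of sample rows the bottom-2% filter is not meaningful; the data is there just to make the statement runnable.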


## Leetcode 180. Consecutive Numbers

> https://leetcode.com/problems/consecutive-numbers/

```sql
WITH CTE AS (
SELECT
num,
LAG(num, 1) OVER (ORDER BY id) AS `prev_1`,
LAG(num, 2) OVER (ORDER BY id) AS `prev_2`
FROM 
Logs
),
CTE2 AS (
    SELECT
    CASE WHEN num = prev_1 and num = prev_2 THEN num
    ELSE NULL END AS `ConsecutiveNums`
    FROM CTE
)

SELECT DISTINCT ConsecutiveNums FROM CTE2
WHERE ConsecutiveNums IS NOT NULL
```

## Leetcode 262. Trips and Users

> https://leetcode.com/problems/trips-and-users/

```sql

WITH CTE AS (
SELECT request_at, status, u.banned AS user_banned, u1.banned AS driver_banned
FROM Trips t
INNER JOIN Users u
ON t.client_id = u.users_id
INNER JOIN Users u1 
ON t.driver_id = u1.users_id
    
WHERE u.banned = "No" and u1.banned = "No" and request_at BETWEEN "2013-10-01" and "2013-10-03"
),

CTE2 AS
(
SELECT request_at,
CASE WHEN status LIKE "cancelled%" THEN 1
ELSE 0 END AS new_status
FROM CTE
),
CTE3 AS 
(
SELECT 
    request_at AS Day,
    ROUND(SUM(new_status)/COUNT(new_status), 2) AS `Cancellation Rate`
    FROM CTE2
    GROUP BY request_at
)

SELECT * FROM CTE3

```

## SQL Moving Averages

> https://www.essentialsql.com/sql-puzzle-calculate-moving-averages/
---
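A minimal sketch of a moving average using a window frame. This is not necessarily the approach taken in the linked article, and the `daily_sales` table and its values are hypothetical.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE daily_sales (
    sale_day INT,      -- day number, standing in for a date
    amount   INT
);

INSERT INTO daily_sales (sale_day, amount) VALUES
    (1, 10), (2, 20), (3, 30), (4, 40), (5, 50);

-- 3-day moving average: the current row and the two rows before it.
SELECT sale_day,
       amount,
       AVG(amount) OVER (
           ORDER BY sale_day
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM daily_sales
ORDER BY sale_day;
```

The averages come out as 10, 15, 20, 30, 40 (the decimal formatting varies by database). For the first two days the frame holds fewer than three rows, so the average is taken over the rows that exist.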


