#  Data Science Learning Journey  
*Curiosity to Capability — One Notebook at a Time*

---
Compiled and authored by **Partho Sarothi Das**   
	Dhaka, Bangladesh  
	Bachelor's & Master's in Statistics  
	Investment Banking Professional → Aspiring Data Scientist 
    
---

# Subquery

### Definition:

In SQL, subqueries (also called *nested queries* or *inner queries*) are queries written **inside another SQL statement**. They can be used in various ways depending on their **position**, **return type**, and **correlation** with the outer query.


## Types of Subqueries in SQL

### 1. Based on Return Type

| Type                | Description                                                                         | Example                                                                                                                             |
| ------------------- | ----------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| **Scalar Subquery** | Returns **a single value** (one row, one column)                                    | `SELECT name FROM users WHERE id = (SELECT MAX(user_id) FROM orders)`                                                               |
| **Column Subquery** | Returns **a single column** with **multiple rows**                                  | `SELECT name FROM users WHERE id IN (SELECT user_id FROM orders)`                                                                   |
| **Row Subquery**    | Returns **a single row with multiple columns**                                      | `SELECT * FROM employees WHERE (department_id, salary) = (SELECT department_id, salary FROM employees ORDER BY salary DESC LIMIT 1)` |
| **Table Subquery**  | Returns **multiple rows and columns**, used in `FROM` clause like a temporary table | `SELECT avg_salary FROM (SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) AS dept_avg`        |

---

### 2. Based on Correlation

| Type                        | Description                                    | Example                                                                                      |
| --------------------------- | ---------------------------------------------- | -------------------------------------------------------------------------------------------- |
| **Non-Correlated Subquery** | Does **not depend** on outer query             | `SELECT name FROM users WHERE age > (SELECT AVG(age) FROM users)`                            |
| **Correlated Subquery**     | **Depends** on outer query’s row-by-row values | `SELECT name FROM users U WHERE EXISTS (SELECT 1 FROM orders O WHERE O.user_id = U.user_id)` |
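A minimal sketch of both rows of this table, run in SQLite through Python's built-in `sqlite3` module as a stand-in for MySQL. The `users` and `orders` tables and their rows are hypothetical toy data. The non-correlated subquery is computed once for the whole query, while the `EXISTS` subquery refers to `U.user_id` and is therefore tied to each outer row.

```python
import sqlite3

# Hypothetical toy data: three users, only users 1 and 3 have placed orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 'Asha', 30), (2, 'Bilal', 20), (3, 'Chen', 40);
INSERT INTO orders VALUES (10, 1), (11, 3), (12, 3);
""")

# Non-correlated: AVG(age) does not reference the outer row, so it is computed once.
older_than_avg = [row[0] for row in conn.execute(
    "SELECT name FROM users WHERE age > (SELECT AVG(age) FROM users) ORDER BY name")]

# Correlated: the inner query uses U.user_id, so it is evaluated per outer row.
with_orders = [row[0] for row in conn.execute("""
    SELECT name FROM users U
    WHERE EXISTS (SELECT 1 FROM orders O WHERE O.user_id = U.user_id)
    ORDER BY name""")]

print(older_than_avg)  # ['Chen']
print(with_orders)     # ['Asha', 'Chen']
```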

---

### 3. Based on Location in Main Query

| Location        | Example                                                                                                                                                                             |
| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **In `SELECT`** | `SELECT name, (SELECT COUNT(*) FROM orders WHERE user_id = users.id) AS order_count FROM users`                                                                                     |
| **In `FROM`**   | `SELECT * FROM (SELECT * FROM orders WHERE amount > 1000) AS big_orders`                                                                                                            |
| **In `WHERE`**  | `SELECT name FROM users WHERE id IN (SELECT user_id FROM orders)`                                                                                                                   |
| **In `HAVING`** | `SELECT user_id, COUNT(*) FROM orders GROUP BY user_id HAVING COUNT(*) > (SELECT AVG(order_count) FROM (SELECT COUNT(*) AS order_count FROM orders GROUP BY user_id) AS avg_table)` |

---

### Example
*Find the highest-scoring movie in the movies table*

```sql
SELECT * FROM partho.movies
WHERE score = (SELECT MAX(score) FROM partho.movies)
```

# Independent Subquery - Scalar Subquery

*1. Find the movie with the highest profit (an alternative to `ORDER BY ... LIMIT 1`)*

```sql
SELECT * FROM partho.movies
WHERE (gross - budget) = (SELECT MAX(gross - budget) FROM partho.movies)
```
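One practical difference from the `ORDER BY ... LIMIT 1` approach is how ties are handled. The sketch below uses SQLite via Python's `sqlite3` as a stand-in for MySQL, with a hypothetical three-row `movies` table. It shows that the scalar-subquery version returns every movie tied for the top profit, whereas `LIMIT 1` keeps just one of them.

```python
import sqlite3

# Hypothetical toy movies table in which two movies tie for the highest profit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, gross INTEGER, budget INTEGER);
INSERT INTO movies VALUES
  ('Alpha', 500, 100),  -- profit 400
  ('Beta',  450,  50),  -- profit 400 (tie)
  ('Gamma', 300, 100);  -- profit 200
""")

# Scalar-subquery version: returns every movie whose profit equals the maximum.
by_subquery = [row[0] for row in conn.execute("""
    SELECT name FROM movies
    WHERE (gross - budget) = (SELECT MAX(gross - budget) FROM movies)
    ORDER BY name""")]

# ORDER BY ... LIMIT 1 version: always returns exactly one row, dropping any ties.
by_order_by = [row[0] for row in conn.execute(
    "SELECT name FROM movies ORDER BY (gross - budget) DESC LIMIT 1")]

print(by_subquery)       # ['Alpha', 'Beta']
print(len(by_order_by))  # 1
```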

*2. Count the movies whose rating is above the average rating of all movies*

```sql
SELECT COUNT(*) FROM partho.movies
WHERE score > (SELECT AVG(score) FROM partho.movies)
```

*3. Find the highest-rated movie of 2000*

```sql
SELECT * FROM partho.movies
WHERE year = 2000 AND score = (SELECT MAX(score) FROM partho.movies
				WHERE year = 2000)
```

*4. Find the highest-rated movie among all movies whose vote count is above the dataset's average vote count*

```sql
SELECT * FROM partho.movies
WHERE votes > (SELECT AVG(votes) FROM partho.movies)
  AND score = (SELECT MAX(score) FROM partho.movies
					WHERE votes > (SELECT AVG(votes) FROM partho.movies))
```
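Filtering on votes only inside the subquery is not enough: if the outer query compares scores alone, a low-vote movie that happens to share the winning score is also returned. The sketch below (SQLite via Python's `sqlite3` as a stand-in for MySQL; the four movies are hypothetical) contrasts the two forms.

```python
import sqlite3

# Hypothetical data: average votes = (1000 + 10 + 20 + 5) / 4 = 258.75,
# so only 'Hit' is a high-vote movie, but low-vote 'Niche' ties its score.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, score REAL, votes INTEGER);
INSERT INTO movies VALUES
  ('Hit', 8.5, 1000), ('Niche', 8.5, 10), ('Flop', 6.0, 20), ('Cult', 9.5, 5);
""")

# Vote filter only inside the subquery: 'Niche' slips through on score alone.
without_outer_filter = [row[0] for row in conn.execute("""
    SELECT name FROM movies
    WHERE score = (SELECT MAX(score) FROM movies
                   WHERE votes > (SELECT AVG(votes) FROM movies))
    ORDER BY name""")]

# Vote filter repeated in the outer query: only high-vote movies qualify.
with_outer_filter = [row[0] for row in conn.execute("""
    SELECT name FROM movies
    WHERE votes > (SELECT AVG(votes) FROM movies)
      AND score = (SELECT MAX(score) FROM movies
                   WHERE votes > (SELECT AVG(votes) FROM movies))
    ORDER BY name""")]

print(without_outer_filter)  # ['Hit', 'Niche']
print(with_outer_filter)     # ['Hit']
```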

# Independent Subquery - Column Subquery (One Col Multi Rows)

*1. Find all users who have never ordered*

```sql
SELECT * FROM zomato.users
WHERE user_id NOT IN (SELECT DISTINCT user_id FROM zomato.orders)
```
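A caveat worth knowing with `NOT IN`: if the subquery returns even one NULL, `x NOT IN (...)` can never evaluate to true, so the query returns no rows at all. `NOT EXISTS` does not have this problem. A minimal sketch in SQLite via Python's `sqlite3`, assuming (hypothetically) an order row whose `user_id` is NULL:

```python
import sqlite3

# Hypothetical data: order 12 has a NULL user_id (e.g. a guest checkout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 'Asha'), (2, 'Bilal'), (3, 'Chen');
INSERT INTO orders VALUES (10, 1), (11, 3), (12, NULL);
""")

# NOT IN: 2 NOT IN (1, 3, NULL) evaluates to NULL rather than TRUE, so no row survives.
not_in = conn.execute("""
    SELECT name FROM users
    WHERE user_id NOT IN (SELECT DISTINCT user_id FROM orders)""").fetchall()

# NOT EXISTS only asks whether a matching order exists, so NULLs do not interfere.
not_exists = conn.execute("""
    SELECT name FROM users U
    WHERE NOT EXISTS (SELECT 1 FROM orders O WHERE O.user_id = U.user_id)""").fetchall()

print(not_in)      # []
print(not_exists)  # [('Bilal',)]
```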

*2. Find all the movies made by the top 3 directors (by total gross income)*

```sql
WITH top_directors AS (SELECT director FROM partho.movies
					GROUP BY director
					ORDER BY SUM(gross) DESC LIMIT 3)

SELECT * FROM partho.movies
WHERE director IN (SELECT director FROM top_directors)
```
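The CTE here is more than style: MySQL (8.0 included) rejects `LIMIT` inside a subquery used with `IN`, with an error along the lines of "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'", and moving the `LIMIT` into a CTE is a common workaround. Below is a minimal sketch of the same pattern run in SQLite through Python's `sqlite3` (which, unlike MySQL, would also accept the `LIMIT` directly inside `IN`). The directors and gross figures are hypothetical.

```python
import sqlite3

# Hypothetical totals: D1 = 300, D2 = 250, D3 = 100, D4 = 90.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, director TEXT, gross INTEGER);
INSERT INTO movies VALUES
  ('m1', 'D1', 100), ('m2', 'D1', 200),
  ('m3', 'D2', 250),
  ('m4', 'D3', 50),  ('m5', 'D3', 50),
  ('m6', 'D4', 90);
""")

# Rank directors by total gross in a CTE, then keep the movies of the top 3.
top_director_movies = [row[0] for row in conn.execute("""
    WITH top_directors AS (
        SELECT director FROM movies
        GROUP BY director
        ORDER BY SUM(gross) DESC
        LIMIT 3
    )
    SELECT name FROM movies
    WHERE director IN (SELECT director FROM top_directors)
    ORDER BY name""")]

print(top_director_movies)  # ['m1', 'm2', 'm3', 'm4', 'm5']
```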

*3. Find all movies of the actors whose filmography has an average rating above 8.5 (counting only movies with more than 25,000 votes)*

```sql
SELECT * FROM partho.movies
WHERE star IN (SELECT star FROM partho.movies
				WHERE votes > 25000
				GROUP BY star
				HAVING AVG(score) > 8.5)
```

# Independent Subquery - Table Subquery (Multi Col Multi Rows)

*1. Find the most profitable movie of each year.*

```sql
SELECT * FROM partho.movies
WHERE (year, gross-budget) IN (SELECT year, MAX(gross-budget) 
								FROM partho.movies
								GROUP BY year)
```
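A small check of the row-value `IN` pattern, run in SQLite through Python's `sqlite3` (row values need SQLite 3.15 or later); the four movies are hypothetical. Because the match is on the `(year, profit)` pair, a year with a tie for the best profit returns both movies.

```python
import sqlite3

# Hypothetical data; 2020 has a tie for the best profit (30).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, year INTEGER, gross INTEGER, budget INTEGER);
INSERT INTO movies VALUES
  ('A', 2019, 100, 50),   -- profit 50
  ('B', 2019, 120, 40),   -- profit 80
  ('C', 2020,  60, 30),   -- profit 30
  ('D', 2020,  50, 20);   -- profit 30 (tie)
""")

# Match each movie's (year, profit) pair against the per-year maximum pairs.
best_per_year = conn.execute("""
    SELECT year, name FROM movies
    WHERE (year, gross - budget) IN (SELECT year, MAX(gross - budget)
                                     FROM movies
                                     GROUP BY year)
    ORDER BY year, name""").fetchall()

print(best_per_year)  # [(2019, 'B'), (2020, 'C'), (2020, 'D')]
```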

*2. Find the highest-rated movie of each genre, counting only movies with more than 25,000 votes*

```sql
SELECT * FROM partho.movies
WHERE (genre, score) IN (SELECT genre, MAX(score) FROM partho.movies
								WHERE votes > 25000
								GROUP BY genre) 
AND votes > 25000
```

*3. Find the highest-grossing movie of each of the top 5 actor/director combos (ranked by total gross income).*

```sql
WITH top_duos AS (
	SELECT star, director, MAX(gross) AS max_gross
	FROM partho.movies
	GROUP BY star, director
	ORDER BY SUM(gross) DESC LIMIT 5
)
SELECT * FROM partho.movies
WHERE (star, director, gross) IN (SELECT * FROM top_duos)
```

# Correlated Subquery

*1. Find all the movies that have a rating higher than the average rating of movies in the same genre.*

```sql
SELECT * FROM partho.movies M1
WHERE score > (SELECT AVG(score) FROM partho.movies M2
				WHERE M1.genre = M2.genre)
```
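The correlated subquery is conceptually re-evaluated for every outer row. An equivalent formulation computes each genre's average once in a derived table and joins it back. The sketch below (SQLite via Python's `sqlite3`, hypothetical data) checks that both forms return the same movies.

```python
import sqlite3

# Hypothetical data: Action averages 7.0, Drama averages (7 + 7 + 9) / 3, about 7.67.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, genre TEXT, score REAL);
INSERT INTO movies VALUES
  ('a1', 'Action', 6.0), ('a2', 'Action', 8.0),
  ('d1', 'Drama', 7.0),  ('d2', 'Drama', 7.0), ('d3', 'Drama', 9.0);
""")

# Correlated form: the genre average is looked up for each outer row M1.
correlated = [row[0] for row in conn.execute("""
    SELECT name FROM movies M1
    WHERE score > (SELECT AVG(score) FROM movies M2 WHERE M1.genre = M2.genre)
    ORDER BY name""")]

# Equivalent non-correlated form: compute all genre averages once, then join.
joined = [row[0] for row in conn.execute("""
    SELECT M.name FROM movies M
    JOIN (SELECT genre, AVG(score) AS avg_score FROM movies GROUP BY genre) G
      ON M.genre = G.genre
    WHERE M.score > G.avg_score
    ORDER BY M.name""")]

print(correlated)  # ['a2', 'd3']
```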

*2. Find the favorite food of each customer.*

```sql
WITH fav_food AS (
	SELECT T2.user_id, T1.name, T4.f_name, COUNT(*) AS frequency
	FROM zomato.users T1
	JOIN zomato.orders T2 ON T1.user_id = T2.user_id
	JOIN zomato.order_details T3 ON T2.order_id = T3.order_id
	JOIN zomato.food T4 ON T3.f_id = T4.f_id
	GROUP BY T2.user_id, T1.name, T4.f_name
)
SELECT * FROM fav_food F1
WHERE frequency = (SELECT MAX(frequency) FROM fav_food F2
				WHERE F2.user_id = F1.user_id)
```
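What makes this query work is that the `MAX(frequency)` subquery is correlated with the outer row's `user_id`, so each food is compared only with the same customer's other foods. A minimal sketch in SQLite via Python's `sqlite3`, where a hypothetical, already-joined `order_items` table stands in for the four zomato tables (ties are kept):

```python
import sqlite3

# Hypothetical, pre-joined order lines standing in for users/orders/order_details/food.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_items (user_id INTEGER, f_name TEXT);
INSERT INTO order_items VALUES
  (1, 'Pizza'), (1, 'Pizza'), (1, 'Dosa'),
  (2, 'Biryani'), (2, 'Pasta');
""")

favourites = conn.execute("""
    WITH fav_food AS (
        SELECT user_id, f_name, COUNT(*) AS frequency
        FROM order_items
        GROUP BY user_id, f_name
    )
    SELECT F1.user_id, F1.f_name, F1.frequency
    FROM fav_food F1
    -- Correlated: compare against the maximum frequency of the same user only.
    WHERE F1.frequency = (SELECT MAX(F2.frequency)
                          FROM fav_food F2
                          WHERE F2.user_id = F1.user_id)
    ORDER BY F1.user_id, F1.f_name""").fetchall()

print(favourites)  # [(1, 'Pizza', 2), (2, 'Biryani', 1), (2, 'Pasta', 1)]
```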

# Usage with SELECT

*1. Get each movie's votes as a percentage of the total number of votes.*

```sql
SELECT name, votes * 100 / (SELECT SUM(votes) FROM partho.movies) AS vote_pct FROM partho.movies
```

*2. Display each movie's name, genre, and score alongside the average score of its genre*

```sql
SELECT name, genre, score,
(SELECT AVG(score) FROM partho.movies M2 WHERE M1.genre = M2.genre) AS genre_avg
FROM partho.movies M1
```
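The same per-genre average can also be produced with a window function instead of a correlated subquery in `SELECT`. A sketch in SQLite via Python's `sqlite3` (window functions need SQLite 3.25 or later; MySQL supports them from 8.0), with hypothetical data, comparing the two:

```python
import sqlite3

# Hypothetical data: Action averages 7.0, Drama averages 9.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, genre TEXT, score REAL);
INSERT INTO movies VALUES
  ('a1', 'Action', 6.0), ('a2', 'Action', 8.0), ('d1', 'Drama', 9.0);
""")

# Correlated scalar subquery in SELECT: one AVG lookup per output row.
with_subquery = conn.execute("""
    SELECT name, genre, score,
           (SELECT AVG(score) FROM movies M2 WHERE M1.genre = M2.genre) AS genre_avg
    FROM movies M1
    ORDER BY name""").fetchall()

# Window-function alternative: AVG over each genre partition.
with_window = conn.execute("""
    SELECT name, genre, score,
           AVG(score) OVER (PARTITION BY genre) AS genre_avg
    FROM movies
    ORDER BY name""").fetchall()

print(with_subquery)  # [('a1', 'Action', 6.0, 7.0), ('a2', 'Action', 8.0, 7.0), ('d1', 'Drama', 9.0, 9.0)]
```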

# Usage with FROM

*1. Display the average rating of each restaurant.*

```sql
SELECT T2.r_name, T1.avg_rating
FROM (SELECT r_id, AVG(restaurant_rating) AS avg_rating
	FROM zomato.orders
	GROUP BY r_id) T1
JOIN zomato.restaurants T2 ON T1.r_id = T2.r_id
```
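Averaging per `r_id` in a derived table (a subquery in `FROM`) before joining to `restaurants` also matters when two restaurants share a name, since grouping by `r_name` merges them. A minimal sketch in SQLite via Python's `sqlite3`; the restaurants and ratings are hypothetical:

```python
import sqlite3

# Hypothetical data: two different restaurants share the name 'Dominos'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurants (r_id INTEGER PRIMARY KEY, r_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, r_id INTEGER, restaurant_rating INTEGER);
INSERT INTO restaurants VALUES (1, 'Dominos'), (2, 'Dominos'), (3, 'KFC');
INSERT INTO orders VALUES (10, 1, 5), (11, 1, 3), (12, 2, 2), (13, 3, 4);
""")

# Derived table in FROM: average per r_id first, then attach the name.
per_restaurant = conn.execute("""
    SELECT T2.r_id, T2.r_name, T1.avg_rating
    FROM (SELECT r_id, AVG(restaurant_rating) AS avg_rating
          FROM orders
          GROUP BY r_id) T1
    JOIN restaurants T2 ON T1.r_id = T2.r_id
    ORDER BY T2.r_id""").fetchall()

# Grouping by name instead merges the two 'Dominos' restaurants into one row.
per_name = conn.execute("""
    SELECT r_name, AVG(restaurant_rating)
    FROM orders T1 JOIN restaurants T2 ON T1.r_id = T2.r_id
    GROUP BY r_name
    ORDER BY r_name""").fetchall()

print(per_restaurant)  # [(1, 'Dominos', 4.0), (2, 'Dominos', 2.0), (3, 'KFC', 4.0)]
print(len(per_name))   # 2
```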

# Usage with HAVING

*1. Find genres whose average score is higher than the average score of all movies*

```sql
SELECT genre, AVG(score) 
FROM partho.movies
GROUP BY genre
HAVING AVG(score) > (SELECT AVG(score) FROM partho.movies)
```
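The subquery inside `HAVING` is non-correlated, so it yields one overall average that every genre's average is compared with. A quick check in SQLite via Python's `sqlite3`, with hypothetical scores:

```python
import sqlite3

# Hypothetical scores: overall average = (6 + 8 + 9 + 9 + 5) / 5 = 7.4.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (name TEXT, genre TEXT, score REAL);
INSERT INTO movies VALUES
  ('a1', 'Action', 6.0), ('a2', 'Action', 8.0),
  ('d1', 'Drama', 9.0),  ('d2', 'Drama', 9.0),
  ('c1', 'Comedy', 5.0);
""")

# The scalar subquery in HAVING is computed over all movies, not per group.
above_average_genres = conn.execute("""
    SELECT genre, AVG(score)
    FROM movies
    GROUP BY genre
    HAVING AVG(score) > (SELECT AVG(score) FROM movies)
    ORDER BY genre""").fetchall()

print(above_average_genres)  # [('Drama', 9.0)]
```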

# Subquery in INSERT

*1. Populate the already-created loyal_customers table with only those customers who have ordered food more than 3 times.*

```sql
INSERT INTO zomato.loyal_customers
(user_id, name)
SELECT T1.user_id, T2.name 
FROM zomato.orders T1
JOIN zomato.users T2 ON T1.user_id = T2.user_id
GROUP BY T1.user_id, T2.name
HAVING COUNT(*) > 3
```
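`INSERT ... SELECT` adds one row to the target table for each row the `SELECT` returns; columns not listed (here, `money`) get NULL or their default. A minimal sketch in SQLite via Python's `sqlite3`. The `loyal_customers` column layout, including `money`, is an assumption based on the next step, and the rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
-- Assumed shape of the pre-created table; money is filled in later.
CREATE TABLE loyal_customers (user_id INTEGER PRIMARY KEY, name TEXT, money REAL);
INSERT INTO users  VALUES (1, 'Asha'), (2, 'Bilal');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 1), (13, 1), (14, 2), (15, 2);
""")

# INSERT ... SELECT: each row produced by the SELECT becomes a new row.
conn.execute("""
    INSERT INTO loyal_customers (user_id, name)
    SELECT T1.user_id, T2.name
    FROM orders T1
    JOIN users T2 ON T1.user_id = T2.user_id
    GROUP BY T1.user_id, T2.name
    HAVING COUNT(*) > 3""")

loyal = conn.execute("SELECT user_id, name, money FROM loyal_customers").fetchall()
print(loyal)  # [(1, 'Asha', None)]
```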

# Subquery in UPDATE

*1. Populate the money column of the loyal_customers table from the orders table, giving every customer 10% of their total order value as app money.*

```sql
UPDATE zomato.loyal_customers LC
JOIN (
    SELECT user_id, SUM(amount) * 0.1 AS cashback
    FROM zomato.orders
    GROUP BY user_id
) AS sub ON LC.user_id = sub.user_id
SET LC.money = sub.cashback;
```
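SQLite does not accept MySQL's `UPDATE ... JOIN` syntax, so the sketch below (Python's `sqlite3`, hypothetical rows) uses the more portable correlated-subquery form of the same update, which MySQL also accepts. One behavioral difference: the `JOIN` version leaves customers without orders untouched, whereas the correlated version would set their `money` to NULL, so `COALESCE` is used here to write 0 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
CREATE TABLE loyal_customers (user_id INTEGER PRIMARY KEY, name TEXT, money REAL);
INSERT INTO orders VALUES (10, 1, 200), (11, 1, 300);
-- User 2 is (hypothetically) in loyal_customers but has no orders.
INSERT INTO loyal_customers VALUES (1, 'Asha', NULL), (2, 'Bilal', NULL);
""")

# Correlated subquery in SET: each customer's cashback is 10% of their own order total.
# COALESCE turns "no orders" (SUM is NULL) into 0 instead of leaving NULL.
conn.execute("""
    UPDATE loyal_customers
    SET money = COALESCE((SELECT SUM(amount) * 0.1
                          FROM orders O
                          WHERE O.user_id = loyal_customers.user_id), 0)""")

money = dict(conn.execute("SELECT user_id, money FROM loyal_customers"))
print(money)  # {1: 50.0, 2: 0.0}
```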

# Subquery in DELETE

*1. Delete the records of all customers who have never ordered.*

```sql
DELETE FROM zomato.users
WHERE user_id IN (SELECT user_id FROM (
        SELECT user_id FROM zomato.users
        WHERE user_id NOT IN (
            SELECT DISTINCT user_id FROM zomato.orders
        )
    ) AS temp
);
```
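MySQL does not normally let a subquery in a `DELETE` read the table being deleted from (error 1093), and wrapping that subquery in a derived table such as `temp` is a common workaround. The question only needs `orders`, though, so a `NOT EXISTS` filter avoids touching `users` in the subquery and also stays correct if `orders.user_id` ever contains NULLs. A minimal sketch in SQLite via Python's `sqlite3`, with hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 'Asha'), (2, 'Bilal'), (3, 'Chen');
INSERT INTO orders VALUES (10, 1), (11, 3), (12, NULL);
""")

# Delete users with no matching order; the correlated subquery reads only orders.
deleted = conn.execute("""
    DELETE FROM users
    WHERE NOT EXISTS (SELECT 1 FROM orders O WHERE O.user_id = users.user_id)""").rowcount

remaining = conn.execute("SELECT user_id, name FROM users ORDER BY user_id").fetchall()
print(deleted)    # 1
print(remaining)  # [(1, 'Asha'), (3, 'Chen')]
```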