
# **Module 3: SQL for Data Analysis**

SQL (**Structured Query Language**) is the **standard language for relational databases**. Whether the data is a small table loaded from an Excel export or a billion-row data warehouse, SQL is how you **ask the database questions** like:

👉 *“Show me only the customers I want, in the form I want.”*

---

## 🟢 1. SQL Basics

Let’s start with a sample **customers** table:

| customer_id | first_name | last_name | age | city       | email          |
| ----------- | ---------- | --------- | --- | ---------- | -------------- |
| 101         | John       | Smith     | 35  | London     | john@gmail.com |
| 102         | Mary       | Jones     | 28  | Bristol    | mary@yahoo.com |
| 103         | Alex       | Brown     | 42  | Manchester | alex@gmail.com |

---

### SELECT → Fetch data

```sql
SELECT first_name, last_name, age
FROM customers;
```

👉 Picks only chosen columns.

---

### WHERE → Filter rows

```sql
SELECT *
FROM customers
WHERE age > 30;
```

👉 Returns only customers older than 30.

---

### ORDER BY → Sort results

```sql
SELECT first_name, age
FROM customers
ORDER BY age DESC;
```

👉 Sorts customers by age, oldest first.

---

### LIMIT → Restrict rows

```sql
SELECT *
FROM customers
LIMIT 2;
```

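👉 Returns at most 2 rows. Without `ORDER BY`, which rows you get is not guaranteed, so pair `LIMIT` with `ORDER BY` whenever the choice matters. (`LIMIT` works in MySQL, PostgreSQL, and SQLite; SQL Server uses `TOP` instead.)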

---

**💡 Interview Tip:**
*What is the difference between WHERE and HAVING?*

* **WHERE** filters rows **before aggregation**.
* **HAVING** filters groups **after aggregation**, so it can test aggregates such as `SUM()` or `COUNT()`.
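To see both filters in one query, here is a minimal sketch that runs against the **orders** table from section 3. It assumes Python's built-in `sqlite3` module and an in-memory SQLite database (neither is part of this module); the thresholds 300 and 400 are chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database holding the orders table from section 3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (2001, 101, "2024-01-12", 500),
        (2002, 102, "2024-02-15", 300),
        (2003, 101, "2024-03-05", 200),
        (2004, 103, "2024-03-10", 400),
    ],
)

query = """
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
WHERE amount >= 300          -- row filter: applied before grouping
GROUP BY customer_id
HAVING SUM(amount) >= 400    -- group filter: applied after aggregation
ORDER BY customer_id;
"""
rows = conn.execute(query).fetchall()
print(rows)  # -> [(101, 500), (103, 400)]
```

👉 Customer 101 shows 500, not 700, because `WHERE` removed the 200 order before the sum was calculated. Customer 102 is then dropped by `HAVING` because the remaining total (300) is below 400.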

---

## 🟡 2. Filtering Data

More powerful filtering with logical operators:

---

### AND / OR

```sql
SELECT *
FROM customers
WHERE age > 30 AND city = 'London';
```

👉 Customers older than 30 **and** living in London.

---

### IN

```sql
SELECT *
FROM customers
WHERE city IN ('London', 'Manchester');
```

👉 Customers from London **or** Manchester.

---

### BETWEEN

```sql
SELECT *
FROM customers
WHERE age BETWEEN 30 AND 40;
```

👉 Customers aged between 30 and 40 (inclusive).

---

### LIKE → Pattern matching

```sql
SELECT *
FROM customers
WHERE email LIKE '%@gmail.com';
```

👉 Finds all customers with Gmail accounts.

* `%` = any number of characters
* `_` = exactly one character
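The two wildcards can be compared side by side. This is a small sketch using the sample **customers** table, assuming Python's built-in `sqlite3` module and an in-memory database; the `'_a%'` pattern is an extra example that is not in the text above.

```python
import sqlite3

# In-memory SQLite database with the sample customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, "
    "last_name TEXT, age INTEGER, city TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)",
    [
        (101, "John", "Smith", 35, "London", "john@gmail.com"),
        (102, "Mary", "Jones", 28, "Bristol", "mary@yahoo.com"),
        (103, "Alex", "Brown", 42, "Manchester", "alex@gmail.com"),
    ],
)

# % matches any number of characters: anything that ends in @gmail.com
gmail = conn.execute(
    "SELECT first_name FROM customers "
    "WHERE email LIKE '%@gmail.com' ORDER BY customer_id"
).fetchall()

# _ matches exactly one character: names whose second letter is 'a'
second_letter_a = conn.execute(
    "SELECT first_name FROM customers WHERE first_name LIKE '_a%'"
).fetchall()

print(gmail)            # -> [('John',), ('Alex',)]
print(second_letter_a)  # -> [('Mary',)]
```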

---

## 🔵 3. Aggregations

Aggregations summarize data.

Let’s add an **orders** table:

| order_id | customer_id | order_date | amount |
| -------- | ----------- | ---------- | ------ |
| 2001     | 101         | 2024-01-12 | 500    |
| 2002     | 102         | 2024-02-15 | 300    |
| 2003     | 101         | 2024-03-05 | 200    |
| 2004     | 103         | 2024-03-10 | 400    |

---

### COUNT, SUM, AVG, MIN, MAX

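```sql
SELECT COUNT(*)    AS total_orders,
       SUM(amount) AS total_revenue,
       AVG(amount) AS avg_order,
       MIN(amount) AS smallest_order,
       MAX(amount) AS largest_order
FROM orders;
```

👉 Returns the number of orders, total revenue, average order amount, and the smallest and largest order. For the sample table that is 4 orders, 1400 in revenue, an average of 350, a minimum of 200, and a maximum of 500.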

---

### GROUP BY → Grouped aggregation

```sql
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
```

👉 Shows total spending per customer.

---

### HAVING → Filter aggregated results

```sql
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 400;
```

👉 Only customers who spent more than 400.

---

## 🟣 4. Joins

Joins combine data across tables.

---

### INNER JOIN → Only matching rows

```sql
SELECT c.customer_id, c.first_name, o.order_id, o.amount
FROM customers c
INNER JOIN orders o
ON c.customer_id = o.customer_id;
```

👉 Returns one row per order, and only for customers **who placed orders**.

---

### LEFT JOIN → All from left + matches

```sql
SELECT c.customer_id, c.first_name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;
```

👉 All customers (even if they never ordered).

---

### RIGHT JOIN → Opposite of LEFT

```sql
SELECT c.customer_id, c.first_name, o.order_id
FROM customers c
RIGHT JOIN orders o
ON c.customer_id = o.customer_id;
```

👉 All orders, even if customer info is missing (the `c.` columns are `NULL` for those orders).

---

### FULL OUTER JOIN → All rows from both

```sql
SELECT c.customer_id, c.first_name, o.order_id
FROM customers c
FULL JOIN orders o
ON c.customer_id = o.customer_id;
```

👉 Every customer and every order, matched where possible; columns from the missing side are `NULL`.
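Not every database supports `FULL JOIN`: MySQL does not, and SQLite only added `RIGHT` and `FULL` joins in version 3.39. A common workaround is a `LEFT JOIN` combined with the unmatched rows from the other side using `UNION ALL`. The sketch below assumes Python's built-in `sqlite3` module, a trimmed-down version of the two tables, and two hypothetical rows added for illustration: customer 104 (Sam, no orders) and order 2005 (unknown customer 999).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "John"), (102, "Mary"), (103, "Alex"),
     (104, "Sam")],        # hypothetical customer with no orders
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400),
     (2005, 999, 150)],    # hypothetical order with no matching customer
)

query = """
SELECT c.customer_id, c.first_name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id   -- all customers
UNION ALL
SELECT c.customer_id, c.first_name, o.order_id
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;                          -- plus orphan orders
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

👉 The result has 6 rows: the 4 matched orders, `(104, 'Sam', None)` for the customer without orders, and `(None, None, 2005)` for the order without a customer.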

---

**💡 Interview Tip:**

* **INNER JOIN** → Use when you only care about matched records.
* **LEFT JOIN** → Use when you want “all customers,” even if they didn’t order.
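The difference only shows up when some customers have no orders, which is not the case in the sample data. The sketch below therefore adds a hypothetical customer 104 (Sam) who never ordered. It assumes Python's built-in `sqlite3` module and a trimmed-down version of the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "John"), (102, "Mary"), (103, "Alex"),
     (104, "Sam")],  # hypothetical customer with no orders
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400)],
)

join_sql = """
SELECT c.customer_id, o.order_id
FROM customers c
{kind} JOIN orders o ON c.customer_id = o.customer_id;
"""
inner_rows = conn.execute(join_sql.format(kind="INNER")).fetchall()
left_rows = conn.execute(join_sql.format(kind="LEFT")).fetchall()

# LEFT JOIN + IS NULL keeps only the customers that found no match.
never_ordered = conn.execute("""
SELECT c.customer_id, c.first_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
""").fetchall()

print(len(inner_rows))  # -> 4  (Sam is missing)
print(len(left_rows))   # -> 5  (Sam appears with order_id NULL)
print(never_ordered)    # -> [(104, 'Sam')]
```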

---

## 🔴 5. Window Functions (Advanced SQL)

Window functions calculate **across a set of related rows** but keep every row in the output (unlike GROUP BY, which collapses each group into a single row).

---

### ROW_NUMBER → Assigns unique numbers

```sql
SELECT customer_id, amount,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS row_num
FROM orders;
```

👉 Numbers each customer’s orders from largest to smallest, restarting at 1 for every customer.

---

### RANK vs DENSE_RANK

```sql
SELECT order_id, amount,
       RANK() OVER (ORDER BY amount DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rnk
FROM orders;
```

👉 RANK leaves gaps if ties exist; DENSE_RANK does not.
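The sample **orders** table has no tied amounts, so both functions would return the same numbers there. The sketch below adds one hypothetical order (2005, amount 400) to create a tie. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or newer, the first version that supports window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400),
     (2005, 102, 400)],  # hypothetical order that ties with 2004
)

query = """
SELECT order_id, amount,
       RANK()       OVER (ORDER BY amount DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY amount DESC) AS dense_rnk
FROM orders
ORDER BY amount DESC, order_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# -> (2001, 500, 1, 1)
# -> (2004, 400, 2, 2)
# -> (2005, 400, 2, 2)
# -> (2002, 300, 4, 3)   RANK skips 3, DENSE_RANK does not
# -> (2003, 200, 5, 4)
```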

---

### SUM / AVG with OVER

```sql
SELECT customer_id, amount,
       SUM(amount) OVER (PARTITION BY customer_id) AS total_spent
FROM orders;
```

👉 Shows each order along with the customer’s total spending.
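Running the same total with `GROUP BY` and with `OVER` makes the difference visible: one collapses rows, the other keeps them. A minimal sketch, assuming Python's built-in `sqlite3` module with SQLite 3.25 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400)],
)

# GROUP BY: one row per customer.
grouped = conn.execute("""
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY customer_id;
""").fetchall()

# Window function: one row per order, with the customer total repeated.
windowed = conn.execute("""
SELECT order_id, customer_id, amount,
       SUM(amount) OVER (PARTITION BY customer_id) AS total_spent
FROM orders
ORDER BY customer_id, order_id;
""").fetchall()

print(grouped)   # -> [(101, 700), (102, 300), (103, 400)]
print(windowed)
# -> [(2001, 101, 500, 700), (2003, 101, 200, 700),
#     (2002, 102, 300, 300), (2004, 103, 400, 400)]
```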

---

## 🟤 6. Subqueries & CTEs

---

### Subquery → Query inside another query

```sql
SELECT first_name, age
FROM customers
WHERE age > (SELECT AVG(age) FROM customers);
```

👉 Finds customers older than the average.

---

### Common Table Expressions (CTEs)

Make queries readable:

```sql
WITH customer_sales AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT *
FROM customer_sales
WHERE total_spent > 400;
```

👉 First calculate spending → then filter high spenders.

---

**💡 Interview Tip:**

* **Subquery** = used inline (filters, conditions).
* **CTE** = best for multi-step logic, more readable.
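The two forms are often interchangeable. As a sketch, here is the `customer_sales` query from above written once as a CTE and once as a subquery in the `FROM` clause. It assumes Python's built-in `sqlite3` module and an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400)],
)

# CTE: the intermediate result gets a name first, then is queried.
cte_rows = conn.execute("""
WITH customer_sales AS (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM customer_sales WHERE total_spent > 400;
""").fetchall()

# Subquery: the same intermediate result, written inline in FROM.
subquery_rows = conn.execute("""
SELECT *
FROM (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer_id
) AS customer_sales
WHERE total_spent > 400;
""").fetchall()

print(cte_rows)                   # -> [(101, 700)]
print(cte_rows == subquery_rows)  # -> True
```

👉 Same result either way. The CTE version reads top to bottom, which pays off once a query has several steps.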

---

## 💡 Practice Tasks

1. **Top 5 customers by spending**

```sql
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 5;
```

2. **Find customers older than the average age.**

3. **Show each customer’s total spending, even if 0 (use LEFT JOIN).**

4. **Rank customers by total spending.**

5. **Use a CTE to list customers who spent more than 1000 in 2024.**
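Tasks 3 and 4 can be combined into one query. The following is one possible sketch, not the only solution. It assumes Python's built-in `sqlite3` module with SQLite 3.25 or newer, a trimmed-down version of the two tables, and a hypothetical customer 104 (Sam) with no orders so that the "even if 0" case actually occurs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(101, "John"), (102, "Mary"), (103, "Alex"),
     (104, "Sam")],  # hypothetical customer with no orders
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(2001, 101, 500), (2002, 102, 300), (2003, 101, 200), (2004, 103, 400)],
)

query = """
SELECT c.customer_id,
       c.first_name,
       COALESCE(SUM(o.amount), 0) AS total_spent,   -- NULL becomes 0
       RANK() OVER (ORDER BY COALESCE(SUM(o.amount), 0) DESC) AS spend_rank
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.first_name
ORDER BY spend_rank, c.customer_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# -> (101, 'John', 700, 1)
# -> (103, 'Alex', 400, 2)
# -> (102, 'Mary', 300, 3)
# -> (104, 'Sam', 0, 4)
```

👉 Without `COALESCE`, Sam's total would be `NULL` instead of 0, because `SUM` over no matching rows returns `NULL`.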

---

✅ By the end of this module, you know how to:

* Extract data (`SELECT, WHERE, ORDER BY, LIMIT`)
* Summarize data (`COUNT, SUM, GROUP BY, HAVING`)
* Combine tables (`JOINs`)
* Do advanced analytics (`Window Functions, Subqueries, CTEs`)

👉 Together, these topics cover both the **fundamentals** and the **advanced SQL** that comes up in interviews.

---


## 🚀 End of Module 3 – Key Takeaways

* Start with **SELECT, WHERE, ORDER BY, LIMIT** → build solid foundations.
* Master **aggregations & GROUP BY** → essential for reporting.
* Learn **JOINS** → most real-world queries involve more than one table.
* Advance with **window functions & CTEs** → common in interviews.

👉 In interviews, you’ll often face **case-based SQL challenges** like:

* “Find the second highest salary in the company.”
* “Show monthly revenue growth compared to last year.”
* “Identify customers who bought only once.”
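Two of these patterns can be tried on the **orders** table from section 3. The sketch below assumes Python's built-in `sqlite3` module. Since this module has no salary table, the "second highest" pattern is applied to order amounts instead, which is an adaptation for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, "
    "order_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (2001, 101, "2024-01-12", 500),
        (2002, 102, "2024-02-15", 300),
        (2003, 101, "2024-03-05", 200),
        (2004, 103, "2024-03-10", 400),
    ],
)

# Customers who bought only once: group, then keep groups with one order.
bought_once = conn.execute("""
SELECT customer_id
FROM orders
GROUP BY customer_id
HAVING COUNT(*) = 1
ORDER BY customer_id;
""").fetchall()

# "Second highest" pattern: the largest value below the overall maximum.
second_highest = conn.execute("""
SELECT MAX(amount)
FROM orders
WHERE amount < (SELECT MAX(amount) FROM orders);
""").fetchone()[0]

print(bought_once)     # -> [(102,), (103,)]
print(second_highest)  # -> 400
```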

---
