# Lecture 2: Python Basics & SQL Deep Dive

**Welcome back!**

In our last session, we got a high-level view of the data analysis lifecycle and even performed a full analysis using Python's Pandas library. We also briefly covered the *theory* of SQL.

Today, we're going to build on that. We'll start with a quick refresher on the core Python syntax that powers libraries like Pandas. Then, we will spend the majority of our time taking a deep, practical dive into SQL.

## Lecture Agenda

**Part 1: Python Fundamentals (Refresher)**
* Variables & Data Types
* Key Data Structures: Lists & Dictionaries
* Controlling the Flow: `if`/`else` and `for` loops
* Creating Reusable Code: Functions (`def`)

**Part 2: SQL In-Depth (Practical Application)**
* **A Note on W3Schools (Our Live SQL Environment)**
* Refresher: The `SELECT`, `FROM`, `WHERE` Clauses
* Filtering on Steroids: `AND`, `OR`, `IN`, `BETWEEN`, `LIKE`
* Sorting Your Results: `ORDER BY`
* The Core of Analytics: Aggregate Functions (`COUNT`, `SUM`, `AVG`)
* Answering Business Questions: `GROUP BY`
* Filtering Groups: The `HAVING` Clause
* The *Real* Power of SQL: `JOIN`s (Joining Tables)
* Advanced Topics: Subqueries
* Quick Review: DML (`INSERT`, `UPDATE`, `DELETE`)
* Quick Review: DDL (`CREATE TABLE`)

# Part 1: Python Fundamentals (Refresher)

You've already *used* Python with Pandas, but let's quickly review the building blocks. Understanding these makes Pandas (and all Python code) much less magical.

### 1.1 Variables & Data Types
Variables are just containers for storing data. Python automatically figures out the type.

```python
# String (text)
my_name = "Alice"

# Integer (whole number)
my_age = 30

# Float (decimal number)
my_score = 95.5

# Boolean (True/False)
is_student = True

print(f"Name: {my_name}, Age: {my_age}, Is Student: {is_student}")
```

### 1.2 Key Data Structures: Lists & Dictionaries
These are the two most important data structures for data analysis.

* **Lists** store a collection of items *in order*. (A list can become a `pd.Series` or a single `DataFrame` column.)
* **Dictionaries** store a collection of *key-value pairs*. (A dictionary of lists can become a `pd.DataFrame`.)

```python
# A list of numbers
prices = [10.99, 5.45, 12.00]

# Access the first item (Python is 0-indexed)
print(f"First price: {prices[0]}")

# Add an item to the list
prices.append(8.75)
print(f"All prices: {prices}")

print("---")

# A dictionary representing a person
person = {
    "name": "Bob",
    "age": 42,
    "city": "London"
}

# Access a value by its key
print(f"Bob's city: {person['city']}")

# Add a new key-value pair
person['job'] = "Engineer"
print(f"Bob's data: {person}")
```

**Connection to Pandas:** A `DataFrame` is basically a dictionary of lists!

```python
import pandas as pd

# This...
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [30, 42, 25]
}
# ...is how you make this:
df = pd.DataFrame(data)
```
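To make the "dictionary of lists" picture concrete without needing Pandas, here is a minimal sketch in plain Python that turns the same column-oriented `data` into one dictionary per row. The name `rows` is only illustrative; this is not how Pandas stores data internally, just a way to see the two views side by side.

```python
# Column-oriented data: each key is a column name, each list holds that column's values
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [30, 42, 25]
}

# Build one dictionary per row by picking the i-th value from every column
rows = []
for i in range(len(data['name'])):
    row = {}
    for column, values in data.items():
        row[column] = values[i]
    rows.append(row)

for row in rows:
    print(row)
```

Each printed dictionary corresponds to one row of the `DataFrame`, and each key corresponds to a column.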

### 1.3 Control Flow: `if`/`else` and `for` Loops

`if`/`else` lets you make decisions. `for` loops let you do something to every item in a list.

```python
# if/elif/else example
age = 19

if age < 18:
    print("Minor")
elif age >= 18 and age < 65:
    print("Adult")
else:
    print("Senior")

print("---")

# for loop example
names = ["Alice", "Bob", "Charlie"]

for name in names:
    print(f"Hello, {name}!")
```
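In practice the two ideas are usually combined: a `for` loop visits each item and an `if`/`elif`/`else` decides what to do with it. A small sketch, using a made-up list of ages:

```python
# Hypothetical sample data for illustration
ages = [12, 19, 70]
labels = []

for age in ages:
    if age < 18:
        label = "Minor"
    elif age < 65:  # we only get here when age >= 18
        label = "Adult"
    else:
        label = "Senior"
    labels.append(label)

print(labels)
```

Because the first branch already handles everyone under 18, the `elif` only needs to check the upper bound.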

### 1.4 Functions (`def`)
Functions let you bundle up a piece of code to reuse it. This is the foundation of *all* libraries (like Pandas).

```python
# Define a function
def calculate_total(price, tax_rate):
    """Calculates the total price including tax."""
    total = price * (1 + tax_rate)
    return total

# Call the function
item_price = 100
tax = 0.07
total_cost = calculate_total(item_price, tax)

print(f"The total cost is: ${total_cost:.2f}") # .2f formats to 2 decimal places
```
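The payoff of a function is reuse. As a rough sketch, the same `calculate_total` can be applied to every price in a list with a `for` loop; the price list and the 7% rate are just sample values.

```python
def calculate_total(price, tax_rate):
    """Calculates the total price including tax."""
    return price * (1 + tax_rate)

# Sample values for illustration
prices = [10.99, 5.45, 12.00]
tax = 0.07

# Call the same function once per item instead of repeating the formula
totals = []
for price in prices:
    totals.append(calculate_total(price, tax))

for price, total in zip(prices, totals):
    print(f"${price:.2f} -> ${total:.2f}")
```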

# Part 2: SQL In-Depth (Practical Application)

## A Note for Our Lecture: Using W3Schools

As we discussed, the best way to learn SQL is to write it.

As an instructor, I will present the concepts and queries here. You (the student) should open the **[W3Schools "Try SQL" Editor](https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all)**.

That link gives you a live, sample database with tables like `Customers`, `Products`, `Orders`, etc. You can run all the queries we write here directly in that editor to see the results instantly.

### 2.1 Refresher: `SELECT`, `FROM`, `WHERE`

This is the basic structure of almost every query you'll write.

* `SELECT` - chooses the **columns** you want.
* `FROM` - chooses the **table** you want them from.
* `WHERE` - **filters** the rows based on a condition.

**Business Question:** "Get the names and cities of all our customers in Germany."

```sql
SELECT CustomerName, City
FROM Customers
WHERE Country = 'Germany';
```
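If you would rather experiment from Python than from the W3Schools editor, one option is Python's built-in `sqlite3` module. The sketch below builds a tiny in-memory `Customers` table and runs the same query. The three rows are invented for illustration and are not the W3Schools data.

```python
import sqlite3

# An in-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        ("Alpha GmbH", "Berlin", "Germany"),   # made-up sample rows
        ("Beta Ltd", "London", "UK"),
        ("Gamma AG", "Munich", "Germany"),
    ],
)

# SELECT picks the columns, FROM the table, WHERE the rows
german_customers = conn.execute(
    "SELECT CustomerName, City FROM Customers WHERE Country = 'Germany'"
).fetchall()

print(german_customers)
conn.close()
```

Each result row comes back as a Python tuple with one entry per selected column.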

### 2.2 Advanced Filtering

Real-world questions are more complex. You often need multiple conditions.

**`AND` / `OR`** (Combine conditions)
* `AND`: *All* conditions must be true.
* `OR`: *At least one* condition must be true.

**Question:** "Show me customers in Germany *or* the USA."

```sql
SELECT CustomerName, Country
FROM Customers
WHERE Country = 'Germany' OR Country = 'USA';
```

**Question:** "Show me customers in Germany *who are also* in the city of Berlin."
```sql
SELECT CustomerName, City, Country
FROM Customers
WHERE Country = 'Germany' AND City = 'Berlin';
```

**`IN`** (A shortcut for multiple `OR`s)

**Question:** "Show me customers from 'Germany', 'France', or 'UK'."
```sql
/* This query is cleaner than using multiple ORs */
SELECT CustomerName, Country
FROM Customers
WHERE Country IN ('Germany', 'France', 'UK');
```

**`BETWEEN`** (Selects a range. Both endpoints are included.)

**Question:** "Show me all products with a price between $10 and $20."
```sql
SELECT ProductName, Price
FROM Products
WHERE Price BETWEEN 10 AND 20;
```

**`LIKE`** (Pattern matching for text. `%` is a wildcard for any sequence of characters, including none.)

**Question:** "Find all customers whose name starts with the letter 'A'."
```sql
SELECT CustomerName
FROM Customers
WHERE CustomerName LIKE 'A%';
```
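If the Python refresher is fresher in your mind than SQL, it may help to see rough Python counterparts of these filters. This is only an analogy on made-up data, not how a database evaluates them. One difference to keep in mind: whether `LIKE` is case-sensitive depends on the database, while `str.startswith` is always case-sensitive.

```python
# Made-up sample data for illustration
customers = [
    {"name": "Alpha GmbH", "country": "Germany"},
    {"name": "Beta Ltd", "country": "UK"},
    {"name": "Acme Corp", "country": "USA"},
]
products = [
    {"name": "Tea", "price": 18.0},
    {"name": "Coffee", "price": 46.0},
    {"name": "Syrup", "price": 10.0},
]

# WHERE Country IN ('Germany', 'France', 'UK')
in_matches = [c["name"] for c in customers if c["country"] in ("Germany", "France", "UK")]

# WHERE Price BETWEEN 10 AND 20  (both endpoints included)
between_matches = [p["name"] for p in products if 10 <= p["price"] <= 20]

# WHERE CustomerName LIKE 'A%'
like_matches = [c["name"] for c in customers if c["name"].startswith("A")]

print(in_matches)
print(between_matches)
print(like_matches)
```

Note that `Syrup`, priced at exactly 10, is included in the `BETWEEN`-style filter.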

### 2.3 Sorting Results: `ORDER BY`

This clause sorts your final output.
* `ASC`: Ascending order (A-Z, 1-10) - This is the default.
* `DESC`: Descending order (Z-A, 10-1)

**Question:** "List all products, from most expensive to least expensive."

```sql
SELECT ProductName, Price
FROM Products
ORDER BY Price DESC;
```

### 2.4 The Core of Analytics: Aggregate Functions

Aggregates perform a calculation on a set of rows and return a *single value*.

* `COUNT(*)`: Counts the number of rows.
* `SUM(column)`: Adds up all values in a column.
* `AVG(column)`: Calculates the average of a column.
* `MIN(column)`: Finds the minimum value.
* `MAX(column)`: Finds the maximum value.

**Question:** "How many customers do we have in total?"
```sql
SELECT COUNT(*)
FROM Customers;
```
**Question:** "What is the average price of all our products?"
```sql
SELECT AVG(Price) AS AveragePrice -- 'AS' gives the column a new name
FROM Products;
```
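One detail worth seeing in action is how aggregates treat missing values: `COUNT(*)` counts every row, while `COUNT(column)` and `AVG(column)` skip `NULL`s. A minimal sketch with Python's built-in `sqlite3` module and an invented three-row `Products` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [
        ("Tea", 10.0),          # made-up sample rows
        ("Coffee", 20.0),
        ("Mystery Box", None),  # None is stored as NULL
    ],
)

# Three aggregates computed in one query, returned as a single row
row_count, priced_count, average_price = conn.execute(
    "SELECT COUNT(*), COUNT(Price), AVG(Price) FROM Products"
).fetchone()

print(f"Rows: {row_count}, rows with a price: {priced_count}, average: {average_price}")
conn.close()
```

The average is taken over the two priced products only, so the row with no price does not drag it down.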

### 2.5 Answering Business Questions: `GROUP BY`

This is one of the most important concepts. `GROUP BY` is used with aggregate functions to split the data into groups and perform the calculation *for each group*.

**Business Question:** "Simple `COUNT(*)` told me the *total* customers. But now I need to know: **How many customers do we have *in each country*?**"

```sql
SELECT Country, COUNT(*) AS CustomerCount
FROM Customers
GROUP BY Country
ORDER BY CustomerCount DESC; -- Let's sort it to see the biggest first!
```
This query tells the database:
1.  Group all rows by their `Country` (all 'Germany' rows in one bucket, all 'USA' in another).
2.  For each bucket, `COUNT` the number of rows in it.
3.  Show me the `Country` and its `COUNT`.
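The three steps above can be sketched in plain Python with a dictionary acting as the set of "buckets". This is only a mental model on a made-up list of countries; a real database is free to compute the result differently.

```python
# One entry per customer row (hypothetical sample data)
countries = ["Germany", "USA", "Germany", "France", "USA", "Germany"]

# Steps 1 and 2: one bucket per country, counting rows as we go
customer_count = {}
for country in countries:
    customer_count[country] = customer_count.get(country, 0) + 1

# ORDER BY CustomerCount DESC: sort the (country, count) pairs by count, largest first
ranked = sorted(customer_count.items(), key=lambda item: item[1], reverse=True)

# Step 3: show each country with its count
for country, count in ranked:
    print(country, count)
```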

### 2.6 Filtering Groups: The `HAVING` Clause

This is a common point of confusion.
* `WHERE` filters rows **before** they are grouped.
* `HAVING` filters groups **after** they are grouped.

**Business Question:** "Show me only the countries where we have *more than 10 customers*."

```sql
/* You CANNOT use 'WHERE CustomerCount > 10' because 'WHERE' runs
   before the 'COUNT' is even calculated! */

SELECT Country, COUNT(*) AS CustomerCount
FROM Customers
GROUP BY Country
HAVING COUNT(*) > 10; -- works everywhere; some databases also accept the alias: HAVING CustomerCount > 10
```
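To see the `WHERE` versus `HAVING` difference for yourself, here is a small sketch using Python's built-in `sqlite3` module. The table contents are invented, and the threshold is lowered to 1 so the tiny sample produces a result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("A", "Germany"), ("B", "Germany"), ("C", "Germany"),
     ("D", "USA"), ("E", "USA"), ("F", "France")],  # made-up rows
)

# HAVING filters the groups after COUNT(*) has been calculated
big_countries = conn.execute("""
    SELECT Country, COUNT(*) AS CustomerCount
    FROM Customers
    GROUP BY Country
    HAVING COUNT(*) > 1
    ORDER BY CustomerCount DESC
""").fetchall()
print(big_countries)

# Putting the aggregate in WHERE is rejected by the database
try:
    conn.execute("SELECT Country FROM Customers WHERE COUNT(*) > 1 GROUP BY Country")
    where_rejected = False
except sqlite3.Error as error:
    where_rejected = True
    print(f"WHERE with an aggregate failed: {error}")

conn.close()
```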

### 2.7 The Real Power of SQL: `JOIN`s

Data is almost never in one giant table. It's 'normalized' (split) into related tables.
Example: We have a `Customers` table and an `Orders` table. The `Orders` table has a `CustomerID` column to link it back to the customer who made the order.

A `JOIN` lets us combine these tables for a query.

**`INNER JOIN`**
Selects *only* the rows that have a match in *both* tables (e.g., customers who *have* placed an order). Writing plain `JOIN`, as in the query below, means `INNER JOIN`.


**Business Question:** "Show me a list of all orders, and include the Customer's name with each order."

```sql
SELECT O.OrderID, C.CustomerName, O.OrderDate
FROM Orders AS O
JOIN Customers AS C
ON O.CustomerID = C.CustomerID; -- This is the 'link' between the tables
```

**`LEFT JOIN`**
Selects *all* rows from the 'left' table (the first one) and any matching rows from the 'right' table. If there's no match, the right side columns will be `NULL` (empty).


**Business Question:** "Start from *all* customers and attach any orders they've placed. Then show me only the customers who *haven't* ordered anything."

```sql
SELECT C.CustomerName, O.OrderID
FROM Customers AS C
LEFT JOIN Orders AS O
ON C.CustomerID = O.CustomerID
WHERE O.OrderID IS NULL; -- This part filters to find *only* the non-ordering customers
```
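The difference between the two joins is easiest to see side by side. The sketch below uses Python's built-in `sqlite3` module with two tiny invented tables in which Charlie has no orders. The column sets are trimmed down for illustration and do not match the full W3Schools schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Charlie")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(100, 1), (101, 1), (102, 2)])

# INNER JOIN: only customers with at least one matching order
inner_rows = conn.execute("""
    SELECT C.CustomerName, O.OrderID
    FROM Customers AS C
    INNER JOIN Orders AS O ON C.CustomerID = O.CustomerID
    ORDER BY C.CustomerName, O.OrderID
""").fetchall()

# LEFT JOIN + IS NULL: only customers with no matching order
no_orders = conn.execute("""
    SELECT C.CustomerName, O.OrderID
    FROM Customers AS C
    LEFT JOIN Orders AS O ON C.CustomerID = O.CustomerID
    WHERE O.OrderID IS NULL
""").fetchall()

print(inner_rows)
print(no_orders)
conn.close()
```

Charlie never appears in the `INNER JOIN` result, and the `NULL` order ID comes back to Python as `None`.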

### 2.8 Advanced Topic: Subqueries

A subquery is a SQL query *inside* another query. It's used when you need to answer one question to answer another.

**Business Question:** "Show me all products that are more expensive than our average product price."

*Step 1 (Subquery):* Find the average price. `(SELECT AVG(Price) FROM Products)`
*Step 2 (Outer Query):* Find products `WHERE Price >` that average.

```sql
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
```
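To convince yourself that the subquery really is "two questions in one", the sketch below runs both versions with Python's built-in `sqlite3` module: first as two separate queries, then as a single query with a subquery. The three products are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Tea", 10.0), ("Coffee", 20.0), ("Saffron", 60.0)],  # made-up rows
)

# Two-step version: fetch the average, then pass it in as a parameter
average_price = conn.execute("SELECT AVG(Price) FROM Products").fetchone()[0]
two_step = conn.execute(
    "SELECT ProductName, Price FROM Products WHERE Price > ?", (average_price,)
).fetchall()

# One-step version: the database computes the average inside the query
one_step = conn.execute(
    "SELECT ProductName, Price FROM Products "
    "WHERE Price > (SELECT AVG(Price) FROM Products)"
).fetchall()

print(average_price, two_step, one_step)
conn.close()
```

Both approaches return the same rows. The subquery version avoids a round trip and always uses the current average.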

### 2.9 Quick Review: DML (Manipulating Data)
This is what we covered in theory last time: `INSERT`, `UPDATE`, `DELETE`.
*(Be careful running these on the W3Schools DB. You may need to click 'Restore Database' afterwards.)*

**`INSERT INTO`** (Add a new row)
```sql
INSERT INTO Customers (CustomerName, City, Country)
VALUES ('New Customer Inc.', 'New York', 'USA');
```

**`UPDATE`** (Change an existing row. **ALWAYS use `WHERE`!** Without it, every row in the table is updated.)
```sql
UPDATE Customers
SET City = 'San Francisco'
WHERE CustomerName = 'New Customer Inc.';
```

**`DELETE`** (Remove a row. **ALWAYS use `WHERE`!** Without it, every row in the table is deleted.)
```sql
DELETE FROM Customers
WHERE CustomerName = 'New Customer Inc.';
```
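Why the insistence on `WHERE`? The sketch below, using Python's built-in `sqlite3` module and an invented three-row table, compares how many rows an `UPDATE` touches with and without it. `rowcount` is the cursor attribute that reports the number of affected rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("New Customer Inc.", "New York"),   # made-up rows
     ("Alpha GmbH", "Berlin"),
     ("Beta Ltd", "London")],
)

# With WHERE: only the matching row changes
with_where = conn.execute(
    "UPDATE Customers SET City = 'San Francisco' "
    "WHERE CustomerName = 'New Customer Inc.'"
).rowcount

# Without WHERE: every row in the table is overwritten
without_where = conn.execute(
    "UPDATE Customers SET City = 'San Francisco'"
).rowcount

cities = [row[0] for row in conn.execute("SELECT DISTINCT City FROM Customers")]
print(with_where, without_where, cities)
conn.close()
```

After the second statement every customer is in San Francisco, which is rarely what anyone intended.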

### 2.10 Quick Review: DDL (Defining Data)
This is how we create the tables themselves.

**`CREATE TABLE`** (Make a new, empty table)
```sql
CREATE TABLE Employees (
    EmployeeID INT, -- Integer type
    LastName VARCHAR(255), -- Variable-length text (255 max chars)
    FirstName VARCHAR(255),
    BirthDate DATE -- A date type
);
```

**`DROP TABLE`** (Delete the entire table and all its data!)
```sql
DROP TABLE Employees;
```

# Lecture Conclusion

Today we've accomplished a lot:
1.  **Python:** We reviewed the core building blocks: variables, lists, dictionaries, loops, and functions. You can now see how these simple parts build a powerful tool like Pandas.
2.  **SQL:** We went from basic `SELECT` queries to powerful analytical tools. You've learned to filter (`WHERE`), sort (`ORDER BY`), aggregate (`GROUP BY`), and combine tables (`JOIN`).

**Key Takeaway:**
Python/Pandas is for deep analysis, visualization, and modeling on data you *have* (like a CSV).
SQL is the language you use to *get* the data from a massive company database.

An analyst's most common workflow is:
1.  Write a complex **SQL** query to pull *only* the data you need.
2.  Export that data to a CSV or load it directly into **Pandas**.
3.  Perform your analysis in **Python**.
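
The three-step workflow above can be sketched end to end using only Python's standard library. Here an in-memory SQLite database stands in for the company database, and an in-memory text buffer stands in for the CSV file. The `Orders` table and its rows are invented for illustration. With Pandas installed, `pd.read_csv` or `pd.read_sql_query` would typically take over steps 2 and 3.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, Country TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, "Germany", 50.0), (2, "USA", 20.0), (3, "Germany", 30.0)],  # made-up rows
)

# Step 1: SQL pulls only the rows and columns we need
cursor = conn.execute(
    "SELECT OrderID, Amount FROM Orders WHERE Country = 'Germany' ORDER BY OrderID"
)
column_names = [column[0] for column in cursor.description]
rows = cursor.fetchall()
conn.close()

# Step 2: export to CSV (a StringIO buffer stands in for a file on disk)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(column_names)
writer.writerows(rows)

# Step 3: read the CSV back and analyze it in Python
buffer.seek(0)
records = list(csv.DictReader(buffer))
total_amount = sum(float(record["Amount"]) for record in records)

print(f"{len(records)} German orders, total amount {total_amount:.2f}")
```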