# Week 5: Introduction to SQL
**Asynchronous Lecture Material**

---
# LECTURE 1: Introduction to Databases and SQL
---

## Why Databases?

**Limitations of CSV files:**
- Work well for small datasets (< 10GB)
- Limited for concurrent access by multiple users
- Security and access control challenges
- No built-in data validation

**Databases solve these problems:**
- Reliable storage (survives crashes)
- Optimized for large-scale data
- Concurrent user access
- Built-in access control and validation
- Special data structures for performance

![alt text](image_files/data_storage.png)

## Common Database Systems

**Industry-standard systems:**
- Google BigQuery
- Amazon Redshift
- Snowflake
- Databricks
- Microsoft SQL Server
- PostgreSQL

**⚠️ Important:** Improper use can result in thousands of dollars wasted and days of processing time!

## Data Persistence Techniques

**1. Basic File I/O**

In [None]:
# Simple file writing example
data = """Data to write to the file, which can easily include any Python datatype
by using string formatting techniques."""

with open('test.dat', 'w') as fout:
    fout.write(data)

In [None]:
# Reading data back
with open('test.dat', 'r') as fin:
    for line in fin:
        print(line)

**Problems with basic file I/O:**
- All data handled as strings (conversion overhead)
- No concurrency control
- Large storage requirements (without compression)
- Depends entirely on file system

## Pickling

**Pickling** = Python's built-in object serialization

**Benefits:**
- Records each object's type (module and class name) along with its data
- Easy to use
- Works with complex Python objects

**⚠️ Security Warning:** Only unpickle data from trusted sources!

In [None]:
import numpy as np
import pickle as pkl

# Create and pickle data
data = np.random.rand(100)

with open('test.p', 'wb') as fout:
    pkl.dump(data, fout)

In [None]:
# Unpickle data
with open('test.p', 'rb') as fin:
    newData = pkl.load(fin)

print(newData[0:20:4])

In [None]:
!ls -l .

**Pickling limitations:**
- Still relies on file system for concurrency, consistency, and durability
- Security risks with untrusted data

**For more robust data persistence → We need database systems!**

---
# LECTURE 2: Database Systems, ACID Test, and SQLite
---

## Database-Backed Applications

**Examples you use daily:**
- Google search
- News websites
- Banking portals
- Map/navigation services
- E-commerce sites

All dynamically retrieve and display data from databases!

## Types of Database Systems

**1. Relational Databases (RDBMS)**
- Use tabular data model (rows and columns)
- Examples: MySQL, PostgreSQL, SQL Server, Oracle
- **← Focus of this course**

**2. NoSQL Databases**
- Non-tabular models for scalability
- Types: Key-value store (Dynamo), Document (MongoDB), Wide-column (Cassandra)
- Designed for "big data" challenges

![alt text](image_files/f1-1.png)

## Database Roles

**Database Administrator (DBA):**
- Hardware selection and setup
- Database server installation
- Performance optimization
- Backup and recovery

**Database Developer:**
- Database design
- Schema and table creation
- Relationship and index design
- Query optimization

**Database Application Developer:**
- Integrates apps with databases
- Uses database APIs
- Manages data storage/retrieval in code

## The ACID Test

**ACID = Atomicity, Consistency, Isolation, Durability**

Critical for reliable databases (e.g., banking systems)!

![alt text](image_files/f2-1.png)

![alt text](image_files/f3-1.png)

### ACID Components Explained

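**Atomicity:**
- Operations in a transaction are grouped as a single unit
- All operations succeed OR none do (no partial/incomplete states)
- Example: A money transfer debits one account and credits the other, or does neither

**Consistency:**
- Each transaction moves the database from one valid state to another
- All declared rules (keys, constraints, checks) still hold afterward
- Example: A transfer never creates or loses money

**Isolation:**
- Concurrent transactions don't interfere with each other
- Example: Two customers transferring funds simultaneously

**Durability:**
- Committed changes survive unexpected terminations
- Example: Power outage during or right after a transaction
- Database recovers: committed changes are kept, unfinished ones are rolled back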

**ACID-compliant databases:** IBM DB2, Oracle, SQL Server, PostgreSQL, SQLite

## SQLite

**What is SQLite?**
- Lightweight, embedded database
- Perfect for learning and prototyping
- ACID-compliant
- Used in many production applications!

In [None]:
from IPython.display import HTML
HTML('<iframe src=https://www.sqlite.org/famous.html width=750 height=300></iframe>')

### SQLite Key Features

**Self-contained:**
- Single library, no dependencies
- Bundled with Python's standard library (`sqlite3` module)

**Serverless:**
- No server process needed
- Database = single file
- Platform-independent

**Zero-configuration:**
- No setup required
- Ready to use immediately

**Transactional:**
- ACID-compliant
- Atomic commits

**Compact:**
- Library can be < 300 KB with optional features omitted!

### Installing and Using SQLite

In [None]:
# -y answers the installer's yes/no prompt automatically so the cell doesn't stall
!apt-get install -y sqlite3

In [None]:
# SQLite help
!sqlite3 -help

In [None]:
%%writefile help.txt
.help

In [None]:
# List SQLite 'dot' commands
!sqlite3 < help.txt

### Key SQLite Commands

- `.header on/off` - Show/hide column names
- `.separator ","` - Set field separator (for CSV import)
- `.import FILE TABLE` - Import data from a file into a table
- `.schema` - Show table structure
- `.stats on/off` - Show query statistics
- `.width` - Set column display widths
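The dot commands exist only in the `sqlite3` shell. As a hedged sketch of reaching the same information from Python's standard-library `sqlite3` module (assuming an in-memory database and a cut-down `myVendors` table), the stored schema lives in the built-in `sqlite_master` table, and column headers come from `cursor.description`:

```python
import sqlite3

# ":memory:" creates a throwaway database; passing a filename such as "test"
# would open the same database file the sqlite3 shell uses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myVendors (itemNumber INT NOT NULL, vendorNumber INT NOT NULL, vendorName TEXT)")

# Rough equivalent of `.schema`: SQLite keeps each CREATE statement in sqlite_master.
schema = [row[0] for row in conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'")]
print(schema[0])

# Rough equivalent of `.header on`: column names are available from cursor.description.
cur = conn.execute("SELECT itemNumber, vendorName FROM myVendors")
headers = [col[0] for col in cur.description]
print(headers)  # ['itemNumber', 'vendorName']
conn.close()
```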

---
# LECTURE 3: SQL Basics - Data Types and CREATE TABLE
---

## Creating a Database

**Three steps:**
1. Create new database (just name a file!)
2. Create schema (table definitions)
3. Populate tables with data

In [None]:
!pwd

In [None]:
# Create directory for database
!mkdir -p /content/database

In [None]:
!ls -l /content/database

## Relational Database Basics

**Key concepts:**
- **Database** = collection of related tables
- **Table** = like a spreadsheet with rows and columns
- **Row** = single record/entry
- **Column** = a named field holding one kind of value (integer, text, etc.)
- **Key column** = links tables together

![alt text](image_files/sql_terminology.png)

## Naming Conventions (SQLite)

**Rules:**
- Case insensitive (by default)
- Start with letter or underscore
- Only alphanumeric characters or underscores
- Keep reasonable length
- Avoid reserved keywords

**Our style:**
- SQL commands: UPPERCASE
- Names: camelCase
- Example: `SELECT aLongIdentifier FROM dataTable;`

## SQL = Structured Query Language

**Standardized language for databases**
- SQL-92 (1992 standard)
- SQL-99 (1999 standard)

**Two main components:**

**1. DDL (Data Definition Language)**
- CREATE, ALTER, DROP tables
- Define structure

**2. DML (Data Manipulation Language)**
- INSERT, SELECT, UPDATE, DELETE data
- Work with content

## SQL Data Types

**SQLite has 5 storage classes:**

1. **NULL** - Null value
2. **INTEGER** - Signed integer (1, 2, 3, 4, 6, or 8 bytes)
3. **REAL** - 8-byte floating-point
4. **TEXT** - String in the database encoding (UTF-8, UTF-16BE, or UTF-16LE); declared types such as VARCHAR map to TEXT
5. **BLOB** - Binary data stored exactly as input

**Note:** There is no separate DATETIME class; dates and times are stored as TEXT (ISO-8601 strings), REAL, or INTEGER.

**Note:** Boolean → INTEGER (0=False, 1=True)

![alt text](image_files/fig4-1.png)
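To see the storage classes in action, the sketch below uses a hypothetical `demo` table in an in-memory database (via Python's `sqlite3` module). SQLite's `typeof()` function reports the class each stored value actually received:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (i INTEGER, r REAL, t TEXT, b BLOB, d TEXT)")
# Python int/float/str/bytes/None bind to INTEGER/REAL/TEXT/BLOB/NULL
conn.execute("INSERT INTO demo VALUES (?, ?, ?, ?, ?)",
             (42, 3.14, "surf", b"\x00\x01", "2015-03-31"))
conn.execute("INSERT INTO demo VALUES (NULL, NULL, NULL, NULL, NULL)")

rows = conn.execute(
    "SELECT typeof(i), typeof(r), typeof(t), typeof(b), typeof(d) FROM demo ORDER BY rowid"
).fetchall()
print(rows)  # the date string in column d is stored as 'text'

# Comparisons produce plain integers: 1 for true, 0 for false
flag = conn.execute("SELECT 2 > 1").fetchone()[0]
print(flag)  # 1
conn.close()
```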

## The NULL Type

**NULL = "no value"**
- Used when field is blank/empty
- Different from 0 or empty string
- Important for database integrity
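A minimal sketch of why NULL needs special handling, assuming a small hypothetical `notes` table. Comparing with `= NULL` never matches because the result is unknown, so NULL must be tested with `IS NULL`. It is also distinct from an empty string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER, description TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)",
                 [(1, None), (2, ""), (3, "towel")])  # Python None is stored as SQL NULL

# "= NULL" evaluates to unknown for every row, so nothing matches
eq_null = conn.execute("SELECT COUNT(*) FROM notes WHERE description = NULL").fetchone()[0]
is_null = conn.execute("SELECT id FROM notes WHERE description IS NULL").fetchall()
empty = conn.execute("SELECT id FROM notes WHERE description = ''").fetchall()
print(eq_null, is_null, empty)  # 0 [(1,)] [(2,)]
conn.close()
```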

## CREATE TABLE Syntax

```sql
-- Comment describing the table

CREATE TABLE tableName (
    columnName dataType [constraints],
    columnName dataType [constraints],
    ...
);
```

**Example constraints:**
- `NOT NULL` - Must have a value
- `PRIMARY KEY` - Unique identifier
- `DEFAULT value` - Default if not specified

![alt text](image_files/fg5-1.png)

## Example: Bigdog's Surf Shop

**Two tables:**
- **Products** - Items for sale
- **Vendors** - Suppliers

In [None]:
%%writefile create.sql

-- First drop tables if they exist
DROP TABLE IF EXISTS myVendors;
DROP TABLE IF EXISTS myProducts;

-- Vendor Table
CREATE TABLE myVendors (
    itemNumber INT NOT NULL,
    vendorNumber INT NOT NULL,
    vendorName TEXT
);

-- Product Table
CREATE TABLE myProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    description TEXT
);

In [None]:
# Create the schema
!sqlite3 test < create.sql

In [None]:
# Verify tables were created
!sqlite3 test ".schema"

## Primary Keys and Foreign Keys

**Common constraints:**
- **CHECK** - Data must satisfy condition
- **PRIMARY KEY** - Uniquely identifies rows
- **NOT NULL** - Cannot be null
- **DEFAULT** - Provides default value

![alt text](image_files/f6-1.png)

**Primary Key:**
- Uniquely identifies each record
- Example: in a dragons table, `name` is the primary key
- No two dragons can have the same name!
- Ensures data integrity and optimizes access

![alt text](image_files/f7-1.png)
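A minimal sketch of primary-key enforcement, using a hypothetical `dragons` table in an in-memory database. `NOT NULL` is added because, as a long-standing SQLite quirk, primary-key columns other than `INTEGER PRIMARY KEY` otherwise accept NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dragons (name TEXT PRIMARY KEY NOT NULL, color TEXT)")
conn.execute("INSERT INTO dragons VALUES ('Smaug', 'red')")

try:
    conn.execute("INSERT INTO dragons VALUES ('Smaug', 'gold')")  # same key again
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)  # reports which constraint failed

n_rows = conn.execute("SELECT COUNT(*) FROM dragons").fetchone()[0]
print(n_rows)  # 1
conn.close()
```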

**Foreign Key:**
- References primary key in another table
- Creates relationships between tables
- Ensures referential integrity

![alt text](image_files/f8-1.png)
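One SQLite-specific detail: foreign keys are declared in the schema, but SQLite only enforces them after `PRAGMA foreign_keys = ON` is run on each connection. The sketch below assumes hypothetical, simplified `products` and `vendors` tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default; must be set per connection
fk_on = conn.execute("PRAGMA foreign_keys").fetchone()[0]

conn.execute("CREATE TABLE products (itemNumber INTEGER PRIMARY KEY, description TEXT)")
conn.execute("""CREATE TABLE vendors (
    itemNumber INTEGER NOT NULL REFERENCES products(itemNumber),
    vendorName TEXT)""")
conn.execute("INSERT INTO products VALUES (1, 'Hooded sweatshirt')")
conn.execute("INSERT INTO vendors VALUES (1, 'Luna Vista Limited')")  # parent row exists

try:
    conn.execute("INSERT INTO vendors VALUES (99, 'Nobody Inc')")  # no product 99
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

n_vendors = conn.execute("SELECT COUNT(*) FROM vendors").fetchone()[0]
conn.close()
```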

## DROP TABLE

**Delete a table:**
```sql
DROP TABLE tableName;
```

**⚠️ Warning:** Permanent deletion - no confirmation dialog!

---
# LECTURE 4: SQL INSERT and Transactions
---

## INSERT Statement

**Basic syntax:**
```sql
INSERT INTO tableName
    [(column1, column2, ...)]
    VALUES (value1, value2, ...);
```

**Two forms:**
- Single-row insert
- Multiple-row insert

## Single-Row and Multiple-Row INSERT Examples

In [None]:
%%writefile insert.sql

-- Unnamed INSERT (uses column order)
INSERT INTO myProducts
VALUES(1, 19.95, '2015-03-31', 'Hooded sweatshirt');

-- Named INSERT (explicit columns)
INSERT INTO myProducts (itemNumber, price, stockDate, description)
VALUES(2, 99.99, '2015-03-29', 'Beach umbrella');

-- Partial INSERT (some columns)
INSERT INTO myProducts (itemNumber, price, stockDate)
VALUES(3, 0.99, '2015-02-28');

-- Multiple-row INSERT
INSERT INTO myProducts (itemNumber, price, stockDate, description)
VALUES (4, 29.95, '2015-02-10', 'Male bathing suit, blue'),
       (5, 49.95, '2015-02-20', 'Female bathing suit, one piece, aqua'),
       (6, 9.95, '2015-01-15', 'Child sand toy set'),
       (7, 24.95, '2014-12-20', 'White beach towel'),
       (8, 32.95, '2014-12-22', 'Blue-striped beach towel'),
       (9, 12.95, '2015-03-12', 'Flip-flop'),
       (10, 34.95, '2015-01-24', 'Open-toed sandal');

-- Insert into myVendors
INSERT INTO myVendors(itemNumber, vendorNumber, vendorName)
VALUES (1, 1, 'Luna Vista Limited'),
       (2, 1, 'Luna Vista Limited'),
       (3, 1, 'Luna Vista Limited'),
       (4, 2, 'Mikal Arroyo Incorporated'),
       (5, 2, 'Mikal Arroyo Incorporated'),
       (6, 1, 'Luna Vista Limited'),
       (7, 1, 'Luna Vista Limited'),
       (8, 1, 'Luna Vista Limited'),
       (9, 3, 'Quiet Beach Industries'),
       (10, 3, 'Quiet Beach Industries');

In [None]:
!head -10 insert.sql

In [None]:
!sqlite3 test < insert.sql

In [None]:
# View complete database
!sqlite3 test ".dump"

## Transactions

**Transaction = Logical unit of work**

**Key statements:**
- `BEGIN TRANSACTION;` - Start transaction
- `COMMIT;` - Save all changes
- `ROLLBACK;` - Undo all changes

**ACID in action:**
- If operations fail → ROLLBACK
- If all succeed → COMMIT
- Database never in partial state
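A sketch of a transaction from Python, assuming a hypothetical `accounts` table with a `CHECK` constraint against negative balances. The `with conn:` block commits when it finishes normally and rolls back when an exception escapes, playing the role of `COMMIT` / `ROLLBACK`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money as one transaction: both UPDATEs take effect or neither does."""
    try:
        with conn:  # commit on success, rollback if an exception leaves the block
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint rejected a negative balance
        return False

ok = transfer(conn, "alice", "bob", 30.0)       # succeeds
failed = transfer(conn, "bob", "alice", 500.0)  # would overdraw bob, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)  # True False {'alice': 70.0, 'bob': 80.0}
conn.close()
```

The failed transfer had already credited alice when bob's debit broke the CHECK constraint. The rollback undoes that credit too, so a half-finished transfer is never stored.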

---
# LECTURE 5: SQL SELECT and WHERE Clause
---

## SELECT Statement

**Basic format:** `SELECT ... FROM ... WHERE ...;`

**Full syntax:**
```sql
SELECT [DISTINCT | ALL] column1, column2, ...
FROM table
[WHERE condition]
[GROUP BY column]
[HAVING condition]
[ORDER BY column [ASC | DESC]]
[LIMIT n];
```

## Evaluation Order

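**Logical processing order** (the optimizer may reorder the actual work, but results match this order):
1. FROM - Locate data
2. WHERE - Filter rows
3. GROUP BY - Group related rows
4. HAVING - Filter groups
5. SELECT - Choose columns
6. ORDER BY - Sort results
7. LIMIT - Restrict number of rows returned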

## Basic SELECT Examples

In [None]:
# Select all columns (not recommended for production!)
!sqlite3 test "SELECT * FROM myProducts;"

In [None]:
# Select specific columns (recommended!)
!sqlite3 test "SELECT price, itemNumber, description FROM myProducts;"

**Best practice:** Always explicitly list column names!

**Why?**
- Prevents bugs if table structure changes
- Documents what data you need
- Controls column order

![alt text](image_files/f9-1.png)

## WHERE Clause

**Purpose:** Filter rows before selecting data

**Benefits:**
- Improves performance
- Reduces data transfer
- Focuses on relevant data

In [None]:
# Simple WHERE condition
!sqlite3 test "SELECT p.itemNumber, p.price FROM myProducts AS p WHERE p.price > 30.00;"

In [None]:
# Multiple conditions with AND
!sqlite3 test "SELECT * FROM myProducts WHERE price > 30.00 AND stockDate < '2015-01-01';"

## Boolean Operators in WHERE

| Operator | Example | Description |
|----------|---------|-------------|
| = | `price = 29.95` | Equal to |
| < | `price < 29.95` | Less than |
| > | `price > 29.95` | Greater than |
| <= | `price <= 29.95` | Less than or equal |
| >= | `price >= 29.95` | Greater than or equal |
| <> | `price <> 29.95` | Not equal |
| IS NULL | `description IS NULL` | Is null |
| IS NOT NULL | `description IS NOT NULL` | Is not null |
| AND | `price > 29.95 AND itemNumber > 5` | Both conditions |
| OR | `price > 29.95 OR itemNumber > 5` | Either condition |
| NOT | `NOT vendorNumber = 1` | Negation |
| BETWEEN | `price BETWEEN 29.95 AND 39.95` | Range (inclusive) |
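| LIKE | `vendorName LIKE 'Lun%'` | Pattern match (`%` = any run of characters, `_` = exactly one character) |
| IN | `itemNumber IN (1, 3, 5)` | Matches any value in the list |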
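Several of these operators can be tried on a subset of the surf-shop products (item 3 left without a description, as in the partial INSERT earlier). The `ids` helper is hypothetical and splices SQL text together only for brevity; real code should pass values as `?` parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT, price REAL, description TEXT)")
conn.executemany("INSERT INTO myProducts VALUES (?, ?, ?)", [
    (3, 0.99, None),
    (4, 29.95, "Male bathing suit, blue"),
    (7, 24.95, "White beach towel"),
    (8, 32.95, "Blue-striped beach towel"),
    (10, 34.95, "Open-toed sandal"),
])

def ids(where):
    """Return item numbers matching a WHERE clause (demo only: never build SQL from user input)."""
    sql = f"SELECT itemNumber FROM myProducts WHERE {where} ORDER BY itemNumber"
    return [row[0] for row in conn.execute(sql)]

between = ids("price BETWEEN 29.95 AND 39.95")  # both ends inclusive
towels = ids("description LIKE '%towel%'")       # NULL description never matches LIKE
no_desc = ids("description IS NULL")
in_list = ids("itemNumber IN (3, 7, 99)")
print(between, towels, no_desc, in_list)  # [4, 8, 10] [7, 8] [3] [3, 7]
conn.close()
```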

## Table Aliases (AS clause)

**Purpose:** Create shorthand for table names

```sql
SELECT p.price, p.description
FROM myProducts AS p
WHERE p.price > 30.00;
```

**Benefits:**
- Shorter, cleaner queries
- Essential for joins
- Can rename result columns

---
# LECTURE 6: JOINs - Combining Tables
---

## Why JOIN Tables?

**Real databases use multiple tables:**
- Reduces data redundancy
- Organizes related information
- Enables complex queries

**Example:** Products + Vendors
- Product table: Item details
- Vendor table: Supplier info
- Link them by itemNumber

## Star Schema

**Common database organization:**
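- **Fact table** - Central table of events or measurements (e.g., individual orders), with keys linking to each dimension
- **Dimension tables** - Descriptive details referenced by the fact table (e.g., tea flavors, store locations)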

**Example:** Boba shop database
- Fact: Orders
- Dimensions: Teas, Toppings, Locations

![alt text](image_files/multidimensional.png)

![alt text](image_files/star.png)

## JOIN Syntax

```sql
SELECT column_list
FROM table_1
    JOIN table_2
    ON key_1 = key_2;
```

**Key parts:**
- FROM - First table
- JOIN - Second table
- ON - Matching condition

## Types of JOINs

**We'll cover:**
1. INNER JOIN (most common)
2. CROSS JOIN
3. LEFT OUTER JOIN
4. RIGHT OUTER JOIN
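5. FULL OUTER JOIN

**Note:** SQLite supports RIGHT and FULL OUTER JOIN only from version 3.39.0 onward; older versions support INNER, CROSS, and LEFT joins.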

### Sample Tables for Examples

![alt text](image_files/cats.png)

### INNER JOIN

**Keeps only matching rows from both tables**

![alt text](image_files/inner.png)

**Most common join type!**

### CROSS JOIN

**All possible combinations (Cartesian product)**

![alt text](image_files/cross.png)

**No ON clause needed - combines everything!**

### LEFT OUTER JOIN

**Keeps all rows from left table, matching rows from right**

![alt text](image_files/left.png)

**Missing values filled with NULL**

### RIGHT OUTER JOIN

**Keeps all rows from right table, matching rows from left**

![alt text](image_files/right.png)

### FULL OUTER JOIN

**Keeps all rows from both tables**

![alt text](image_files/full.png)

**INNER JOIN + non-matching rows with NULLs**
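A sketch contrasting three join types on tiny versions of the products and vendor tables; item 11 (`'Surf wax'`) is a hypothetical product with no vendor row. RIGHT and FULL OUTER JOIN are left out because they need SQLite 3.39.0 or newer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (itemNumber INT, description TEXT);
CREATE TABLE vendors  (itemNumber INT, vendorName TEXT);
INSERT INTO products VALUES (1, 'Hooded sweatshirt'), (2, 'Beach umbrella'), (11, 'Surf wax');
INSERT INTO vendors  VALUES (1, 'Luna Vista Limited'), (2, 'Luna Vista Limited');
""")

# INNER JOIN: only products that have a matching vendor row
inner = conn.execute("""
    SELECT p.itemNumber, v.vendorName
    FROM products AS p INNER JOIN vendors AS v ON p.itemNumber = v.itemNumber
    ORDER BY p.itemNumber""").fetchall()

# LEFT OUTER JOIN: every product; vendor columns are NULL (None) when unmatched
left = conn.execute("""
    SELECT p.itemNumber, v.vendorName
    FROM products AS p LEFT OUTER JOIN vendors AS v ON p.itemNumber = v.itemNumber
    ORDER BY p.itemNumber""").fetchall()

# CROSS JOIN: every product paired with every vendor row
cross_count = conn.execute("SELECT COUNT(*) FROM products CROSS JOIN vendors").fetchone()[0]

print(inner)        # [(1, 'Luna Vista Limited'), (2, 'Luna Vista Limited')]
print(left)         # [(1, 'Luna Vista Limited'), (2, 'Luna Vista Limited'), (11, None)]
print(cross_count)  # 6, i.e. 3 products x 2 vendor rows
conn.close()
```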

## JOIN Example: Products and Vendors

In [None]:
%%writefile select.sql

SELECT p.price, p.description AS "Item", v.vendorName AS "Vendor"
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber;

In [None]:
# Execute join query
!sqlite3 test < select.sql

**What happened:**
1. Listed both tables in FROM
2. Created aliases (p and v)
3. Matched on itemNumber in WHERE
4. Selected columns from both tables
5. Renamed columns with AS

## Explicit JOIN Syntax

```sql
SELECT p.price, p.description AS "Item", v.vendorName AS "Vendor"
FROM myProducts AS p
    INNER JOIN myVendors AS v
    ON p.itemNumber = v.itemNumber;
```

**Both styles work - explicit JOIN is clearer!**

---
# LECTURE 7: Aggregation - Functions, GROUP BY, and HAVING
---

## DISTINCT

**Returns only unique values**

```sql
SELECT DISTINCT column
FROM table;
```

![alt text](image_files/f10-1.png)

In [None]:
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #' FROM myVendors;"

In [None]:
# DISTINCT on multiple columns
!sqlite3 test "SELECT DISTINCT vendorNumber AS 'Vendor #', itemNumber as 'Item #' FROM myVendors WHERE itemNumber > 5;"

**Note:** With several columns, DISTINCT removes duplicate *combinations* of the selected columns, not duplicates within each column!

## ORDER BY

**Sorts query results**

```sql
SELECT columns
FROM table
ORDER BY column [ASC | DESC];
```

**Default:** Ascending (ASC)

![alt text](image_files/f11-1.png)

In [None]:
%%writefile orderby.sql

-- Order by single column
SELECT v.vendorNumber AS "Vendor #", v.vendorName AS "Vendor",
    p.price AS "Price", p.itemNumber AS "Item #"
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER BY v.vendorNumber;

-- Order by multiple columns
SELECT v.vendorNumber AS "Vendor #", v.vendorName AS "Vendor",
    p.price AS "Price", p.itemNumber AS "Item #"
FROM myProducts AS p, myVendors AS v
WHERE p.itemNumber = v.itemNumber AND p.price > 20.0
ORDER BY v.vendorNumber ASC, p.price DESC;

In [None]:
!sqlite3 test < orderby.sql

**Best practice:** In ORDER BY, use column names rather than column positions (e.g., `ORDER BY 1`)!

## SQL Math Operators

| Operator | Example | Description |
|----------|---------|-------------|
| + | `price + 10` | Addition |
| - | `price - 10` | Subtraction |
| * | `price * 1.0825` | Multiplication |
| / | `price / 100.0` | Division (integer / integer truncates: `7 / 2` = 3) |
| % | `itemNumber % 2` | Remainder (modulo) |
| unary - | `-price` | Negation |

**Example:**
```sql
SELECT price, price * 1.0825 AS priceWithTax
FROM myProducts;
```

## Aggregate Functions

**Compute values across multiple rows**

| Function | Example | Description |
|----------|---------|-------------|
| COUNT | `COUNT(price)` | Number of non-NULL values (`COUNT(*)` counts all rows) |
| AVG | `AVG(price)` | Average of non-NULL values |
| MAX | `MAX(price)` | Maximum value |
| MIN | `MIN(price)` | Minimum value |
| SUM | `SUM(price)` | Sum of values |

**Example query:**
```sql
SELECT COUNT(itemNumber) AS Number,
       AVG(price) AS Average,
       MIN(stockDate) AS "First Date",
       MAX(stockDate) AS "Last Date"
FROM myProducts;
```
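Aggregates skip NULLs, which is easy to miss. In the sketch below (three products, with item 3's price set to NULL purely for illustration), `COUNT(*)` counts every row, while `COUNT(price)`, `AVG(price)`, and `SUM(price)` see only the two known prices. `MIN`/`MAX` on ISO-8601 date strings work because those strings sort chronologically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT, price REAL, stockDate TEXT)")
conn.executemany("INSERT INTO myProducts VALUES (?, ?, ?)", [
    (1, 19.95, "2015-03-31"),
    (2, 99.99, "2015-03-29"),
    (3, None, "2015-02-28"),  # hypothetical: price unknown (NULL)
])

row = conn.execute("""
    SELECT COUNT(*), COUNT(price), AVG(price), SUM(price),
           MIN(stockDate), MAX(stockDate)
    FROM myProducts""").fetchone()
print(row)  # COUNT(*) is 3 but COUNT(price) is 2; AVG divides by 2, not 3
conn.close()
```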

## RANDOM and LIMIT

**Sample data randomly:**

![alt text](image_files/f12-1.png)
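One common way to sample rows in SQLite is `ORDER BY RANDOM() LIMIT n`, sketched here on ten hypothetical products. It shuffles the whole table before taking `n` rows, so it suits small tables better than very large ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT, price REAL)")
conn.executemany("INSERT INTO myProducts VALUES (?, ?)",
                 [(n, 10.0 + n) for n in range(1, 11)])

# Assign each row a random sort key, then keep the first 3 rows
sample = [row[0] for row in conn.execute(
    "SELECT itemNumber FROM myProducts ORDER BY RANDOM() LIMIT 3")]
print(sample)  # three distinct item numbers, different on each run
conn.close()
```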

## GROUP BY

**Groups rows for aggregation**

![alt text](image_files/f13-1.png)

**Compute stats per group:**

![alt text](image_files/f14-1.png)
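A sketch of per-group statistics on the surf-shop data (six of the products and their vendor numbers): `GROUP BY` collapses each vendor's rows into one output row, and the aggregates are computed per group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myProducts (itemNumber INT, price REAL);
CREATE TABLE myVendors (itemNumber INT, vendorNumber INT);
INSERT INTO myProducts VALUES (1, 19.95), (2, 99.99), (4, 29.95), (5, 49.95), (9, 12.95), (10, 34.95);
INSERT INTO myVendors VALUES (1, 1), (2, 1), (4, 2), (5, 2), (9, 3), (10, 3);
""")

# One output row per vendor: how many items it supplies and its most expensive item
per_vendor = conn.execute("""
    SELECT v.vendorNumber, COUNT(*) AS numItems, MAX(p.price) AS maxPrice
    FROM myProducts AS p INNER JOIN myVendors AS v ON p.itemNumber = v.itemNumber
    GROUP BY v.vendorNumber
    ORDER BY v.vendorNumber""").fetchall()
print(per_vendor)  # [(1, 2, 99.99), (2, 2, 49.95), (3, 2, 34.95)]
conn.close()
```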

## HAVING Clause

**Filters groups (not rows!)**

```sql
SELECT columns
FROM table
GROUP BY grouping_column
HAVING condition_on_group;
```

**WHERE vs HAVING:**
- WHERE - Filters individual rows
- HAVING - Filters groups

![alt text](image_files/f15-1.png)

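**Pandas analogues:**
```python
df.groupby("type").filter(lambda g: condition)  # keeps the original rows of each qualifying group
df.groupby("type").size().loc[lambda s: s > 2]   # aggregate, then filter groups (closer to HAVING)
```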
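Putting `WHERE` and `HAVING` in one query on the full `myVendors` data: `WHERE` removes item 1 before grouping, then `HAVING` keeps only vendors that still supply more than two items:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myVendors (itemNumber INT, vendorNumber INT, vendorName TEXT);
INSERT INTO myVendors VALUES
    (1, 1, 'Luna Vista Limited'), (2, 1, 'Luna Vista Limited'), (3, 1, 'Luna Vista Limited'),
    (4, 2, 'Mikal Arroyo Incorporated'), (5, 2, 'Mikal Arroyo Incorporated'),
    (6, 1, 'Luna Vista Limited'), (7, 1, 'Luna Vista Limited'), (8, 1, 'Luna Vista Limited'),
    (9, 3, 'Quiet Beach Industries'), (10, 3, 'Quiet Beach Industries');
""")

big_vendors = conn.execute("""
    SELECT vendorName, COUNT(*) AS numItems
    FROM myVendors
    WHERE itemNumber > 1          -- row filter, applied before grouping
    GROUP BY vendorName
    HAVING COUNT(*) > 2           -- group filter, applied to each group's aggregate
    ORDER BY numItems DESC""").fetchall()
print(big_vendors)  # [('Luna Vista Limited', 5)]
conn.close()
```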

---
# LECTURE 8: Data Modification - UPDATE, DELETE, and ALTER TABLE
---

## DELETE Statement

**Remove rows from table**

```sql
DELETE FROM tableName
[WHERE clause];
```

**⚠️ Warning:** Without WHERE, deletes ALL rows!

In [None]:
%%writefile delete.sql

-- Create a scratch table (named tempTable to avoid the TEMP keyword)
CREATE TABLE tempTable (aValue INT);

-- Insert data
INSERT INTO tempTable VALUES(0), (1), (2), (3);

-- Count rows
SELECT COUNT(*) AS numRows FROM tempTable;

-- Delete all rows (no WHERE clause)
DELETE FROM tempTable;

-- Verify deletion
SELECT COUNT(*) AS numRows FROM tempTable;

-- Drop table
DROP TABLE tempTable;

In [None]:
!sqlite3 test < delete.sql

## DELETE with WHERE

In [None]:
%%writefile delete2.sql

-- Show original data
SELECT itemNumber, description FROM myProducts;

-- Delete specific rows
DELETE FROM myProducts
WHERE description LIKE '%towel%' OR itemNumber <= 3;

-- Confirm deletion
SELECT itemNumber, description FROM myProducts;

In [None]:
!sqlite3 test < delete2.sql

**This deleted 5 rows:**
- 2 towels (itemNumber 7, 8)
- 3 items with itemNumber ≤ 3

## UPDATE Statement

**Modify existing data**

```sql
UPDATE tableName
SET column = value [, column = value]*
[WHERE clause];
```

**⚠️ Warning:** Without WHERE, updates ALL rows!

In [None]:
%%writefile update.sql

-- Show original
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6;

-- Update row
UPDATE myProducts
SET price = price * 1.25, stockDate = date('now')
WHERE itemNumber = 6;

-- Show updated
SELECT itemNumber, price, stockDate FROM myProducts WHERE itemNumber = 6;

In [None]:
!sqlite3 test < update.sql

**Updated:**
- Price increased by 25%
- Stock date set to current date

## UPDATE with Subquery

In [None]:
%%writefile update2.sql

-- Update using subquery
UPDATE myProducts
SET price = price * 1.10, description = 'NEW: ' || description
WHERE itemNumber IN
    (SELECT v.itemNumber
     FROM myProducts as p, myVendors as v
     WHERE p.itemNumber = v.itemNumber AND v.vendorNumber = 3);

-- Show results
SELECT * FROM myProducts;

In [None]:
!sqlite3 test < update2.sql

**Subqueries in UPDATE:**
- **Scalar subquery** - Single value
- **Table subquery** - Multiple values (use with IN)
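The cell above uses a table subquery with `IN`. A scalar subquery instead must return a single value. This sketch (a hypothetical subset of prices) sets item 6's price to the current maximum price:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myProducts (itemNumber INT, price REAL);
INSERT INTO myProducts VALUES (4, 29.95), (5, 49.95), (6, 9.95), (9, 12.95);
""")

# The inner SELECT yields exactly one value (the highest price),
# which the outer UPDATE assigns to item 6.
conn.execute("""
    UPDATE myProducts
    SET price = (SELECT MAX(price) FROM myProducts)
    WHERE itemNumber = 6""")
new_price = conn.execute("SELECT price FROM myProducts WHERE itemNumber = 6").fetchone()[0]
print(new_price)  # 49.95
conn.close()
```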

## ALTER TABLE

**Modify table structure**

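**Approach:** Create new table, copy data, drop the old table, rename the new one

**Why?** SQLite's ALTER TABLE supports only renaming a table, renaming a column (3.25.0+), adding a column at the end (ADD COLUMN), and dropping a column (3.35.0+). Other changes, such as placing a new column in the middle or changing a column's type or constraints, need this rebuild approach.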

In [None]:
%%writefile rename.sql

-- Create new table with extra column
CREATE TABLE newProducts (
    itemNumber INT NOT NULL,
    price REAL,
    stockDate TEXT,
    count INT NOT NULL DEFAULT 0,
    description TEXT
);

-- Copy data from old table
INSERT INTO newProducts(itemNumber, price, stockDate, description)
SELECT itemNumber, price, stockDate, description FROM myProducts;

-- Drop old table
DROP TABLE myProducts;

-- Rename new table
ALTER TABLE newProducts RENAME TO myProducts;

-- Show results
SELECT * FROM myProducts;

In [None]:
# Execute ALTER TABLE
!sqlite3 test < rename.sql

**Steps:**
1. CREATE new table with desired schema
2. INSERT data from old table
3. DROP old table
4. RENAME new table
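When the new column can simply go at the end of the table, `ALTER TABLE ... ADD COLUMN` does the job without copying. The sketch below reuses the extra `count` column from the cell above. A `NOT NULL` column added this way needs a non-NULL default so existing rows get a valid value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT NOT NULL, price REAL, stockDate TEXT, description TEXT)")
conn.execute("INSERT INTO myProducts VALUES (6, 9.95, '2015-01-15', 'Child sand toy set')")

# Appends the column after description; existing rows receive the default 0
conn.execute("ALTER TABLE myProducts ADD COLUMN count INT NOT NULL DEFAULT 0")

columns = [info[1] for info in conn.execute("PRAGMA table_info(myProducts)")]
row = conn.execute("SELECT * FROM myProducts").fetchone()
print(columns)  # ['itemNumber', 'price', 'stockDate', 'description', 'count']
print(row)      # (6, 9.95, '2015-01-15', 'Child sand toy set', 0)
conn.close()
```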

---
# LECTURE 9: Advanced SQL - Indexes and Complex Joins
---

## CREATE INDEX

**Speeds up queries on specific columns**

```sql
CREATE [UNIQUE] INDEX idx_name
ON table_name(column [, column]*)
[WHERE condition];
```

**When to use:**
- Columns in WHERE clauses
- Columns in JOIN conditions
- Frequently searched columns

**Example:**
```sql
CREATE INDEX itn ON myProducts(itemNumber);
```

**UNIQUE index:** Only allows distinct values
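To check whether a query actually uses an index, SQLite provides `EXPLAIN QUERY PLAN`. In this sketch (1,000 generated rows with made-up prices), the plan switches from a full table scan to a search through `itn` once the index exists. The exact wording of the plan text differs between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT NOT NULL, price REAL)")
conn.executemany("INSERT INTO myProducts VALUES (?, ?)",
                 [(n, n * 1.5) for n in range(1, 1001)])

query = "SELECT price FROM myProducts WHERE itemNumber = 6"

def plan(sql):
    """Join the plan's description text (the last column of each EXPLAIN QUERY PLAN row)."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # a full scan of myProducts
conn.execute("CREATE INDEX itn ON myProducts(itemNumber)")
after = plan(query)   # a search of myProducts using index itn
print(before)
print(after)
conn.close()
```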

## LIMIT Clause

**Restricts number of rows returned**

```sql
SELECT * FROM myProducts LIMIT 5;
```

**Useful for:**
- Testing queries
- Pagination
- Top N queries
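For pagination, `LIMIT` is usually paired with `OFFSET` and an `ORDER BY` so that pages stay stable between requests. A sketch with ten hypothetical rows and a page size of 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myProducts (itemNumber INT, description TEXT)")
conn.executemany("INSERT INTO myProducts VALUES (?, ?)",
                 [(n, f"item {n}") for n in range(1, 11)])

def page(page_number, page_size=3):
    """Return the item numbers on one page (pages numbered from 1)."""
    offset = (page_number - 1) * page_size
    cur = conn.execute(
        "SELECT itemNumber FROM myProducts ORDER BY itemNumber LIMIT ? OFFSET ?",
        (page_size, offset))
    return [row[0] for row in cur]

p1, p2, p4 = page(1), page(2), page(4)
print(p1, p2, p4)  # [1, 2, 3] [4, 5, 6] [10]
conn.close()
```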

## Common Table Expressions (CTEs)

**Break complex queries into manageable parts**

```sql
WITH
table_name1 AS (
    SELECT ...
),
table_name2 AS (
    SELECT ...
)
SELECT ...
FROM table_name1, table_name2, ...
```

**Example: Top action movies with popular actors**

```sql
WITH
good_action_movies AS (
    SELECT T.tconst, primaryTitle, averageRating
    FROM Title T JOIN Rating R ON T.tconst = R.tconst
    WHERE genres LIKE '%Action%' AND averageRating > 7 AND numVotes > 5000
),
prolific_actors AS (
    SELECT N.nconst, primaryName, COUNT(*) as numRoles
    FROM Name N JOIN Principal P ON N.nconst = P.nconst
    WHERE category = 'actor'
    GROUP BY N.nconst, primaryName
)
SELECT primaryTitle, primaryName, numRoles, ROUND(averageRating) AS rating
FROM good_action_movies m, prolific_actors a, principal p
WHERE p.tconst = m.tconst AND p.nconst = a.nconst
ORDER BY rating DESC, numRoles DESC
LIMIT 10;
```

**Benefits:**
- Organize complex logic
- Reuse intermediate results
- Improve readability
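The IMDb example above needs those tables in order to run. As a self-contained sketch on a subset of the surf-shop data instead, each CTE is one named step: first pick products over $25, then total them per vendor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myProducts (itemNumber INT, price REAL);
CREATE TABLE myVendors (itemNumber INT, vendorNumber INT, vendorName TEXT);
INSERT INTO myProducts VALUES (1, 19.95), (2, 99.99), (4, 29.95), (5, 49.95), (9, 12.95), (10, 34.95);
INSERT INTO myVendors VALUES (1, 1, 'Luna Vista Limited'), (2, 1, 'Luna Vista Limited'),
    (4, 2, 'Mikal Arroyo Incorporated'), (5, 2, 'Mikal Arroyo Incorporated'),
    (9, 3, 'Quiet Beach Industries'), (10, 3, 'Quiet Beach Industries');
""")

rows = conn.execute("""
WITH
pricey AS (                  -- step 1: products over $25
    SELECT itemNumber, price FROM myProducts WHERE price > 25
),
vendor_totals AS (           -- step 2: count and total those products per vendor
    SELECT v.vendorName, COUNT(*) AS numPricey, SUM(p.price) AS total
    FROM pricey AS p INNER JOIN myVendors AS v ON p.itemNumber = v.itemNumber
    GROUP BY v.vendorName
)
SELECT vendorName, numPricey FROM vendor_totals ORDER BY total DESC
""").fetchall()
print(rows)
conn.close()
```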

## Advanced JOIN Aliasing

**Create aliases without AS keyword:**

```sql
SELECT primaryTitle, averageRating
FROM Title T INNER JOIN Rating R
ON T.tconst = R.tconst;
```

**Both work - AS is more explicit!**

## Other Advanced Features

**Views:**
- Save queries as virtual tables
- Read-only in SQLite (unless INSTEAD OF triggers are defined)
- Simplify complex queries

**Triggers:**
- Automatic actions on events
- INSERT, UPDATE, DELETE triggers
- Enforce business rules
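A compact sketch of both features on a hypothetical two-product table: `cheapProducts` is a view over `myProducts`, and `logPriceChange` is a trigger that records every price change in a `priceLog` table. Deleting through the view fails because SQLite views are read-only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myProducts (itemNumber INT, price REAL, description TEXT);
CREATE TABLE priceLog (itemNumber INT, oldPrice REAL, newPrice REAL);

-- View: a saved query that can be selected from like a table
CREATE VIEW cheapProducts AS
    SELECT itemNumber, description FROM myProducts WHERE price < 20;

-- Trigger: after each price change, record the old and new price automatically
CREATE TRIGGER logPriceChange AFTER UPDATE OF price ON myProducts
BEGIN
    INSERT INTO priceLog VALUES (OLD.itemNumber, OLD.price, NEW.price);
END;

INSERT INTO myProducts VALUES (6, 9.95, 'Child sand toy set'), (7, 24.95, 'White beach towel');
UPDATE myProducts SET price = 12.95 WHERE itemNumber = 6;
""")

cheap = conn.execute("SELECT itemNumber FROM cheapProducts").fetchall()
log = conn.execute("SELECT * FROM priceLog").fetchall()
print(cheap)  # [(6,)]
print(log)    # [(6, 9.95, 12.95)]

try:
    conn.execute("DELETE FROM cheapProducts")
    view_read_only = False
except sqlite3.DatabaseError as err:  # SQLite refuses to modify a view
    view_read_only = True
    print("Rejected:", err)
conn.close()
```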

**➡️ Explore these in the full notebook!**

---
# Practice Exercises
---

**The following exercises work through a flight database use case.**

**Setup:**

In [None]:
CSV_PATH = "2001.csv"

In [None]:
%%writefile airport.sql

DROP TABLE IF EXISTS flights;

CREATE TABLE flights (
    year INT,
    month INT,
    dayOfMonth INT,
    dayOfWeek INT,
    actualDepartureTime INT,
    scheduledDepartureTime INT,
    actualArrivalTime INT,
    scheduledArrivalTime INT,
    uniqueCarrierCode TEXT,
    flightNumber INT,
    tailNumber TEXT,
    actualElapsedTime INT,
    scheduledElapsedTime INT,
    airTime INT,
    arrivalDelay INT,
    departureDelay INT,
    originCode TEXT,
    destinationCode TEXT,
    distance INT,
    taxiIn INT,
    taxiOut INT,
    cancelled INT,
    cancellationCode TEXT,
    diverted INT,
    carrierDelay INT,
    weatherDelay INT,
    nasDelay INT,
    securityDelay INT,
    lateAircraftDelay INT
);

.separator ,
.import 2001.csv flights

-- Delete header row (.import loads it as data because the table already exists)
DELETE FROM flights WHERE year = 'Year';

In [None]:
!sqlite3 assignment.db < airport.sql

## Exercise 1: Flight Count

Write an SQL statement that counts the number of rows in the flights table.

In [None]:
%%writefile count_lines_flights.sql

-- YOUR CODE HERE

In [None]:
nlines_flights = !sqlite3 assignment.db < count_lines_flights.sql
print(nlines_flights.s)
assert nlines_flights.s == "49"

## Exercise 2: Creating Another Table

Import iata.csv and create a table named iata with columns:

`airportID, name, city, country, iata, icao, latitude, longitude, altitude, timeZone, dst, tzDatabaseTimeZone`

Use correct data types (TEXT for quoted values, REAL for decimals, INT for integers).

In [None]:
%%writefile import_iata.sql

-- YOUR CODE HERE

In [None]:
!sqlite3 assignment.db < import_iata.sql

In [None]:
!sqlite3 assignment.db "SELECT * FROM iata LIMIT 10"

In [None]:
iata_exists = !sqlite3 assignment.db "SELECT name FROM sqlite_master WHERE type='table' and name='iata'"
assert iata_exists.s == "iata"

In [None]:
iata_info = !sqlite3 assignment.db "PRAGMA table_info(iata)"
iata_names = [i.split("|")[1] for i in iata_info]
iata_names_answer = [
    "airportID", "name", "city", "country", "iata", "icao",
    "latitude", "longitude", "altitude", "timeZone", "dst", "tzDatabaseTimeZone"
]
assert len(iata_names) == len(iata_names_answer)
assert set(iata_names) == set(iata_names_answer)

## Exercise 3: Joining Tables

Join flights and iata tables to create myTable with columns:

`month, dayOfMonth, uniqueCarrierCode, flightNumber, scheduledDepartureTime, diverted, city`

Convert destinationCode (IATA codes) to full city names.

In [None]:
%%writefile join.sql

-- YOUR CODE HERE

In [None]:
!sqlite3 assignment.db < join.sql

In [None]:
!sqlite3 assignment.db "SELECT * FROM myTable LIMIT 10;"

## Exercise 4: Inserting

Insert a new row into myTable for a flight on September 9, 2001:
- uniqueCarrierCode: INFO
- flightNumber: 490
- scheduledDepartureTime: 0800 (stored as the integer 800)
- diverted: 1
- city: San Francisco

In [None]:
%%writefile insert.sql

-- YOUR CODE HERE

In [None]:
!sqlite3 assignment.db < insert.sql

In [None]:
info_month = !sqlite3 assignment.db "SELECT month FROM myTable WHERE uniqueCarrierCode='INFO';"
info_day = !sqlite3 assignment.db "SELECT dayOfMonth FROM myTable WHERE uniqueCarrierCode='INFO';"
info_flight_no = !sqlite3 assignment.db "SELECT flightNumber FROM myTable WHERE uniqueCarrierCode='INFO';"
info_crs_dep = !sqlite3 assignment.db "SELECT scheduledDepartureTime FROM myTable WHERE uniqueCarrierCode='INFO';"
info_diverted = !sqlite3 assignment.db "SELECT diverted FROM myTable WHERE uniqueCarrierCode='INFO';"
info_dest = !sqlite3 assignment.db "SELECT city FROM myTable WHERE uniqueCarrierCode='INFO';"

print('''
UniqueCarrierCode: {0}
Month: {1}
Day: {2}
Flight Number: {3}
Scheduled Departure Time: {4}
Diverted: {5}
Destination City: {6}
'''.format("INFO", info_month.s, info_day.s, info_flight_no.s,
    info_crs_dep.s, info_diverted.s, info_dest.s))

In [None]:
assert "9" == info_month.s
assert "9" == info_day.s
assert "490" == info_flight_no.s
assert "800" == info_crs_dep.s
assert "1" == info_diverted.s
assert "San Francisco" == info_dest.s

## Exercise 5: Query Maximum

Compute the maximum departureDelay in the flights table.

In [None]:
%%writefile get_maximum_depdelay.sql

-- YOUR CODE HERE

In [None]:
maximum_depdelay = !sqlite3 assignment.db < get_maximum_depdelay.sql
print(maximum_depdelay)
assert maximum_depdelay.s == '100'

---
# Summary
---

**You've learned:**
- Why databases are essential for data science
- ACID properties and database reliability
- SQL basics: CREATE, INSERT, SELECT, UPDATE, DELETE
- JOINs to combine multiple tables
- Aggregate functions and GROUP BY
- Indexes for performance
- CTEs for complex queries

**Next steps:**
- Review the full notebook for details
- Complete the practice exercises
- Explore advanced SQL features
- Practice with real datasets!