# Module 00: Setup & Introduction to SQL

**Estimated Time:** 30 minutes

## Learning Objectives

By the end of this module, you will be able to:
- Set up your SQL development environment
- Connect to SQLite databases using Python
- Use SQL magic commands in Jupyter notebooks
- Explore database schema and structure
- Run basic SQL queries
- Understand the sample databases used in this course

## What is SQL?

**SQL (Structured Query Language)** is a standard language for managing and manipulating relational databases. It allows you to:
- Query data (SELECT)
- Insert data (INSERT)
- Update data (UPDATE)
- Delete data (DELETE)
- Create database structures (CREATE)
- Control access (GRANT, REVOKE)

SQL is used by data analysts, data scientists, software developers, and database administrators across many industries.
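As a rough illustration of the statement types listed above, here is a minimal sketch that runs CREATE, INSERT, UPDATE, DELETE, and SELECT against a throwaway in-memory SQLite database. The table `demo_items` and its columns are hypothetical and are not part of the course databases. GRANT and REVOKE are left out because SQLite does not implement them.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()

# CREATE: define a table (hypothetical name and columns).
demo_cur.execute(
    "CREATE TABLE demo_items (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

# INSERT: add rows; the ? placeholders bind the values.
demo_cur.executemany(
    "INSERT INTO demo_items (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("lamp", 20.0)],
)

# UPDATE: change an existing row.
demo_cur.execute("UPDATE demo_items SET price = 2.0 WHERE name = 'pen'")

# DELETE: remove a row.
demo_cur.execute("DELETE FROM demo_items WHERE name = 'lamp'")

# SELECT: read the remaining rows back.
demo_rows = demo_cur.execute(
    "SELECT name, price FROM demo_items ORDER BY id"
).fetchall()
print(demo_rows)

demo_conn.close()
```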

## 1. Environment Setup

Let's verify that all required libraries are installed and working correctly.

In [None]:
# Import required libraries
import sqlite3
import pandas as pd
import numpy as np
from pathlib import Path
import sqlalchemy
from sqlalchemy import create_engine

print("✓ All libraries imported successfully!")
print("\nLibrary versions:")
print(f"  - pandas: {pd.__version__}")
print(f"  - numpy: {np.__version__}")
print(f"  - sqlalchemy: {sqlalchemy.__version__}")
# sqlite3.version is deprecated; sqlite_version reports the SQLite library itself
print(f"  - SQLite: {sqlite3.sqlite_version}")

In [None]:
# Load SQL magic extension for Jupyter
%load_ext sql

print("✓ SQL magic commands enabled!")
print("\nYou can now use:")
print("  - %sql for single-line SQL")
print("  - %%sql for multi-line SQL cells")

## 2. Connect to Sample Databases

We'll be working with three sample databases throughout this course:

1. **ecommerce.db** - Online store with products, orders, and customers
2. **employees.db** - Company database with departments, employees, and salaries
3. **sales.db** - Sales tracking with transactions, regions, and performance

Let's verify they exist and connect to them.

In [None]:
# Set up database paths
# Assumes the notebook is run from a subfolder of the project root
BASE_DIR = Path.cwd().parent
DB_DIR = BASE_DIR / "data" / "databases"

# Database paths
ECOMMERCE_DB = DB_DIR / "ecommerce.db"
EMPLOYEES_DB = DB_DIR / "employees.db"
SALES_DB = DB_DIR / "sales.db"

# Check if databases exist
databases = {"ecommerce.db": ECOMMERCE_DB, "employees.db": EMPLOYEES_DB, "sales.db": SALES_DB}

print("Database Status:")
print("=" * 60)
all_exist = True
for name, path in databases.items():
    exists = path.exists()
    status = "✓ Found" if exists else "✗ Missing"
    print(f"{status}: {name}")
    if not exists:
        all_exist = False

print("=" * 60)

if not all_exist:
    print("\n⚠ Some databases are missing!")
    print("Please run: python scripts/setup_databases.py")
else:
    print("\n✓ All databases ready!")

In [None]:
# Connect to ecommerce database (our primary database for this module)
conn = sqlite3.connect(ECOMMERCE_DB)
cursor = conn.cursor()

print("✓ Connected to ecommerce.db")
print("\nConnection object:", type(conn))
print("Cursor object:", type(cursor))
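One caveat about `sqlite3.connect`: by default it creates a new, empty database when the file does not exist, so a mistyped path can go unnoticed until a query fails with a missing table. The sketch below shows one possible guard, SQLite's read-only URI mode, which refuses to create the file. The file name `not_there.db` is hypothetical, and the temporary directory only stands in for the course's database folder.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    missing_db = Path(tmp) / "not_there.db"  # hypothetical, deliberately absent

    # mode=ro opens read-only and fails instead of creating the file.
    uri = missing_db.as_uri() + "?mode=ro"
    try:
        sqlite3.connect(uri, uri=True)
        opened = True
    except sqlite3.OperationalError as exc:
        opened = False
        print("Could not open database:", exc)

    # The failed attempt leaves no file behind.
    created = missing_db.exists()
    print("File created:", created)
```

Read-only mode also protects the sample data from accidental writes, which may or may not be what you want in later modules.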

In [None]:
# Also create SQLAlchemy engine (for pandas integration)
engine = create_engine(f"sqlite:///{ECOMMERCE_DB}")

print("✓ SQLAlchemy engine created")
print(f"Database URL: sqlite:///{ECOMMERCE_DB}")

In [None]:
# Set default database for SQL magic commands
%sql sqlite:///$ECOMMERCE_DB

print("✓ Default database set for SQL magic commands")

## 3. Exploring Database Schema

Before querying data, it's important to understand the database structure:
- What tables exist?
- What columns are in each table?
- What are the data types?
- How are tables related?

In [None]:
# List all tables in the database
cursor.execute(
    """
    SELECT name 
    FROM sqlite_master 
    WHERE type='table'
    ORDER BY name
"""
)

tables = cursor.fetchall()

print("Tables in ecommerce.db:")
print("=" * 60)
for i, table in enumerate(tables, 1):
    print(f"{i}. {table[0]}")
print("=" * 60)

In [None]:
# Function to display table schema
def show_table_info(table_name):
    """Display detailed information about a table."""
    print(f"\nTable: {table_name}")
    print("=" * 80)

    # Get column information
    cursor.execute(f"PRAGMA table_info({table_name})")
    columns = cursor.fetchall()

    print(f"\nColumns ({len(columns)}):")
    print("-" * 80)
    print(f"{'Column Name':<20} {'Type':<15} {'Not Null':<10} {'Primary Key'}")
    print("-" * 80)

    for col in columns:
        col_id, name, col_type, not_null, default, pk = col
        print(f"{name:<20} {col_type:<15} {str(bool(not_null)):<10} {str(bool(pk))}")

    # Get row count
    cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
    count = cursor.fetchone()[0]
    print("-" * 80)
    print(f"Total Rows: {count}")
    print("=" * 80)


# Display info for all tables
for table in tables:
    show_table_info(table[0])
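The schema output above covers tables, columns, and types, but not how tables are related. One way to inspect that is `PRAGMA foreign_key_list`, sketched here on an in-memory database. The two tables are simplified stand-ins with assumed column names, and the pragma only reports relationships that were declared as foreign keys when the table was created, so it may return nothing for some databases.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(
    """
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_id INTEGER REFERENCES categories (category_id)
    );
    """
)

# Each row is (id, seq, table, from, to, on_update, on_delete, match).
fk_rows = demo_conn.execute("PRAGMA foreign_key_list(products)").fetchall()

# Keep the referencing column, the referenced table, and the referenced column.
relations = [(row[3], row[2], row[4]) for row in fk_rows]
for from_col, ref_table, to_col in relations:
    print(f"products.{from_col} -> {ref_table}.{to_col}")

demo_conn.close()
```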

## 4. Your First SQL Queries

Let's run some basic queries to explore the data. We'll use three methods:
1. Direct `sqlite3` cursor
2. pandas `read_sql_query`
3. SQL magic commands

### Method 1: Using sqlite3 cursor

In [None]:
# Query categories using cursor
cursor.execute("SELECT * FROM categories")
results = cursor.fetchall()

print("Categories (using cursor):")
print("=" * 60)
for row in results:
    print(row)
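Cursor results come back as plain tuples, so the column names are not visible in the printed rows. A minimal sketch of two standard `sqlite3` features that recover them: `cursor.description` and the `sqlite3.Row` row factory. The in-memory `categories` table and its two sample rows are made up for illustration.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.row_factory = sqlite3.Row  # rows now support access by column name

demo_conn.execute(
    "CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT)"
)
demo_conn.executemany(
    "INSERT INTO categories (category_name) VALUES (?)",
    [("Books",), ("Toys",)],
)

demo_cursor = demo_conn.execute("SELECT * FROM categories ORDER BY category_id")

# description holds one 7-item tuple per column; the name is the first item.
column_names = [col[0] for col in demo_cursor.description]

# Convert each Row to a dict to see names and values together.
named_rows = [dict(row) for row in demo_cursor.fetchall()]

print(column_names)
for row in named_rows:
    print(row)

demo_conn.close()
```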

### Method 2: Using pandas (Recommended for data analysis)

In [None]:
# Query categories using pandas
df_categories = pd.read_sql_query("SELECT * FROM categories", conn)

print("Categories (using pandas):")
print("=" * 60)
display(df_categories)

### Method 3: Using SQL Magic Commands (Great for learning)

In [None]:
%%sql
SELECT * FROM categories

## 5. Exploring Sample Data

Let's look at some sample data from each table to understand what we're working with.

In [None]:
%%sql
-- Products: first 10 rows
SELECT * FROM products LIMIT 10

In [None]:
%%sql
-- Customers: first 10 rows
SELECT * FROM customers LIMIT 10

In [None]:
%%sql
-- Orders: first 10 rows
SELECT * FROM orders LIMIT 10

In [None]:
%%sql
-- Order items: first 10 rows
SELECT * FROM order_items LIMIT 10

## 6. Understanding Table Relationships

Our ecommerce database has the following relationships:

```
categories
    |
    | (one-to-many)
    |
products
    |
    | (one-to-many; products and orders are many-to-many through order_items)
    |
order_items ---- orders ---- customers
```

- A **category** can have many **products**
- A **customer** can have many **orders**
- An **order** can have many **order_items**
- A **product** can appear in many **order_items**
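To make the link-table idea concrete, here is a small sketch of a many-to-many relationship between products and orders resolved through `order_items`. The tables are cut down to the key columns, and the names, columns, and rows are assumptions for illustration rather than the actual course schema.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript(
    """
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders (order_id),
        product_id INTEGER REFERENCES products (product_id),
        quantity INTEGER
    );
    INSERT INTO products VALUES (1, 'Pen'), (2, 'Lamp');
    INSERT INTO orders VALUES (10), (11);
    -- Order 10 contains both products; the pen appears in both orders.
    INSERT INTO order_items VALUES (10, 1, 3), (10, 2, 1), (11, 1, 2);
    """
)

# Each order_items row pairs one order with one product.
order_lines = demo_conn.execute(
    """
    SELECT o.order_id, p.product_name, oi.quantity
    FROM order_items oi
    JOIN orders o ON oi.order_id = o.order_id
    JOIN products p ON oi.product_id = p.product_id
    ORDER BY o.order_id, p.product_name
    """
).fetchall()

for line in order_lines:
    print(line)

demo_conn.close()
```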

In [None]:
%%sql
-- Example: Show products with their category names
SELECT 
    p.product_name,
    c.category_name,
    p.price,
    p.stock_quantity
FROM products p
JOIN categories c ON p.category_id = c.category_id
LIMIT 10

## 7. Basic SELECT Syntax Review

The basic SQL SELECT statement has this structure:

```sql
SELECT column1, column2, ...
FROM table_name
WHERE condition
ORDER BY column
LIMIT number;
```

Let's practice:

In [None]:
%%sql
-- Select specific columns
SELECT product_name, price 
FROM products
LIMIT 5

In [None]:
%%sql
-- Select with WHERE condition
SELECT product_name, price 
FROM products
WHERE price > 100
LIMIT 5

In [None]:
%%sql
-- Select with ORDER BY
SELECT product_name, price 
FROM products
ORDER BY price DESC
LIMIT 5
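The practice queries each exercise one clause at a time. As a sketch of how WHERE, ORDER BY, and LIMIT combine in a single statement, the example below runs one from Python with `?` placeholders so the threshold and row count come from variables. The in-memory `products` table, its rows, and the variable names `min_price` and `top_n` are illustrative assumptions.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Pen", 2.0), ("Lamp", 120.0), ("Desk", 350.0), ("Chair", 180.0)],
)

min_price = 100  # hypothetical filter threshold
top_n = 2        # hypothetical number of rows to keep

# Clauses run in this logical order: filter, then sort, then cut off.
query = """
    SELECT product_name, price
    FROM products
    WHERE price > ?
    ORDER BY price DESC
    LIMIT ?
"""
top_products = demo_conn.execute(query, (min_price, top_n)).fetchall()
print(top_products)

demo_conn.close()
```

Binding values with placeholders rather than formatting them into the SQL string avoids quoting mistakes and SQL injection.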

## 8. Quick Database Statistics

Let's get an overview of our data using COUNT and other aggregate functions such as AVG.

In [None]:
# Database statistics
stats_queries = {
    "Total Products": "SELECT COUNT(*) FROM products",
    "Total Customers": "SELECT COUNT(*) FROM customers",
    "Total Orders": "SELECT COUNT(*) FROM orders",
    "Total Order Items": "SELECT COUNT(*) FROM order_items",
    "Average Product Price": "SELECT ROUND(AVG(price), 2) FROM products",
    "Average Order Amount": "SELECT ROUND(AVG(total_amount), 2) FROM orders",
}

print("E-Commerce Database Statistics")
print("=" * 60)
for stat_name, query in stats_queries.items():
    cursor.execute(query)
    result = cursor.fetchone()[0]
    print(f"{stat_name:<30} {result}")
print("=" * 60)
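Several aggregates over the same table can also be computed in one query rather than one query per statistic. A minimal sketch on an in-memory table with made-up prices:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE products (product_name TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("C", 30.0)],
)

# One pass over the table yields all four statistics in a single row.
price_stats = demo_conn.execute(
    """
    SELECT COUNT(*), ROUND(AVG(price), 2), MIN(price), MAX(price)
    FROM products
    """
).fetchone()

labels = ["Count", "Average price", "Lowest price", "Highest price"]
for label, value in zip(labels, price_stats):
    print(f"{label:<15} {value}")

demo_conn.close()
```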

## 9. Exercises

Try these exercises to practice what you've learned:

### Exercise 1: List all customer cities
Write a query to show all unique cities where customers are located.

In [None]:
%%sql
-- Your code here, for example: SELECT DISTINCT ...

### Exercise 2: Find expensive products
List all products with a price greater than $100, ordered by price (highest first).

In [None]:
%%sql
-- Your code here, for example: SELECT product_name, price FROM products WHERE ...

### Exercise 3: Count products by category
How many products are in each category?

In [None]:
%%sql
-- Your code here, for example: SELECT c.category_name, COUNT(*) as product_count ...

## 10. Cleanup and Next Steps

Always close database connections when you're done!

In [None]:
# Close the cursor and connection, then release the SQLAlchemy engine's pool
cursor.close()
conn.close()
engine.dispose()
print("✓ Database connections closed")
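If you would rather not rely on remembering to call `close()`, one option is `contextlib.closing`, sketched below. Note that using a `sqlite3` connection directly in a `with` statement only commits or rolls back the transaction; it does not close the connection. The table `t` is a throwaway example.

```python
import sqlite3
from contextlib import closing

# closing() guarantees demo_conn.close() runs when the block exits.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.execute("CREATE TABLE t (x INTEGER)")

    # "with demo_conn" manages the transaction: commit on success,
    # rollback if an exception is raised. It does not close anything.
    with demo_conn:
        demo_conn.execute("INSERT INTO t VALUES (1)")

    total = demo_conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    print("Rows:", total)

# Outside the block the connection is closed, so further use fails.
try:
    demo_conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False
print("Still open:", still_open)
```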

## Summary

In this module, you learned:
- ✓ How to set up your SQL environment in Jupyter
- ✓ How to connect to SQLite databases
- ✓ Three methods for running SQL queries (cursor, pandas, SQL magic)
- ✓ How to explore database schema and structure
- ✓ Basic SELECT statement syntax
- ✓ How tables are related in a relational database

## Next Steps

In **Module 01: SELECT, FROM, WHERE**, you'll learn:
- Advanced SELECT techniques
- Column aliases and expressions
- Complex WHERE conditions
- Working with NULL values
- Pattern matching with LIKE

## Additional Resources

- SQL Cheat Sheet: `docs/SQL_CHEAT_SHEET.md`
- SQL Glossary: `docs/SQL_GLOSSARY.md`
- FAQ: `docs/FAQ.md`

**Great work!** You're ready to move on to Module 01.