# Homework: SQL SELECT Statements with Northwind Database

**Name:** _________________________

**Date:** _________________________

---

**Objective:** Practice writing SQL SELECT statements, filtering data with WHERE clauses, using JOINs to combine tables, and performing aggregate operations.

**Database:** Northwind (classic sales database with customers, orders, products, employees)

---

## Instructions
1. Complete all exercises below
2. Write your SQL queries in the provided code cells
3. Run each cell to verify your queries work
4. Ensure your output makes sense for the question asked
5. Submit your completed notebook

**Tip:** Use the SOLUTIONS notebook to check your work if you get stuck!

## Part 0: Database Setup

**Run this cell first!** This cell will:
1. Import necessary libraries
2. Set database parameters
3. Terminate any active connections to the database
4. Drop and recreate the Northwind database
5. Load the Northwind SQL file
6. Create a SQLAlchemy engine and test the connection

**You don't need to modify this cell - just run it!**

In [None]:
# Import libraries
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, text
import subprocess

# Database parameters
db_params = {
    'host': 'localhost',
    'database': 'northwind',
    'user': 'student',
    'password': ''
}

# Step 1: Terminate active connections and recreate database
print("Step 1: Setting up database...")
terminate_cmd = f"psql -U {db_params['user']} -d postgres -c \"SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = '{db_params['database']}' AND pid <> pg_backend_pid();\""
drop_cmd = f"psql -U {db_params['user']} -d postgres -c 'DROP DATABASE IF EXISTS {db_params['database']};'"
create_cmd = f"psql -U {db_params['user']} -d postgres -c 'CREATE DATABASE {db_params['database']};'"

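subprocess.run(terminate_cmd, shell=True, capture_output=True)
subprocess.run(drop_cmd, shell=True, capture_output=True)
result = subprocess.run(create_cmd, shell=True, capture_output=True, text=True)
# Report success only if psql exited cleanly; otherwise show its error output
if result.returncode == 0:
    print(f"Database created: {result.stdout.strip()}")
else:
    print(f"Database creation failed: {result.stderr.strip()}")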

# Step 2: Load Northwind SQL file
print("\nStep 2: Loading Northwind database...")
sql_file = "/workspaces/Fall2025-MS3083-Base_Template/databases/northwind.sql"
load_cmd = f"psql -U {db_params['user']} -d {db_params['database']} -f {sql_file}"
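result = subprocess.run(load_cmd, shell=True, capture_output=True, text=True)
# Report success only if psql exited cleanly; otherwise show its error output
if result.returncode == 0:
    print("Northwind database loaded successfully!")
else:
    print(f"Failed to load Northwind database: {result.stderr.strip()}")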

# Step 3: Create SQLAlchemy engine
print("\nStep 3: Creating database connection...")
engine = create_engine(
    f"postgresql://{db_params['user']}@{db_params['host']}/{db_params['database']}"
)

# Test connection
with engine.connect() as conn:
    result = conn.execute(text("SELECT version();"))
    version = result.fetchone()[0]
    print(f"✓ Connected to: {version[:50]}...")

print("\n✓ Setup complete! Ready to run queries.")

## Part 1: Exploring the Database

Before writing queries, let's explore what tables and data are available.

### Exercise 1.1: List all tables

Write a query to show all tables in the `northwind` schema.

**Hint:** Use `information_schema.tables` with `WHERE table_schema = 'northwind'`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total tables: {len(df)}\n")
df

### Exercise 1.2: Explore the Products table

Write a query to show the first 5 products with all their columns.

**Hint:** Use `SELECT *` and `LIMIT 5`

In [None]:
# YOUR CODE HERE
query = """

"""

pd.read_sql(text(query), engine)

### Exercise 1.3: Count records in each table

Write queries to count how many records are in the `products`, `customers`, `orders`, and `employees` tables.

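**Hint:** Use `COUNT(*)` for each table. The cell below reads the result column named `count`, which is the default name PostgreSQL gives to `COUNT(*)`, so keep that name if you add an alias.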
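
As a rough illustration of the counting loop, here is a minimal sketch that runs against an in-memory SQLite database instead of Northwind. The `books` and `authors` tables and the `row_count` alias are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO books (title) VALUES ('Dune'), ('Emma'), ('Ulysses');
    INSERT INTO authors (name) VALUES ('Herbert'), ('Austen');
""")

counts = {}
for table in ["books", "authors"]:
    # Table names cannot be bound as query parameters, so the name is
    # formatted into the SQL; that is acceptable here only because the
    # list of names is hard-coded.
    row = conn.execute(f"SELECT COUNT(*) AS row_count FROM {table}").fetchone()
    counts[table] = row[0]

print(counts)
conn.close()
```

The same pattern applies to the exercise: one `COUNT(*)` query per table name, with the single value read from the first row of each result.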

In [None]:
# YOUR CODE HERE
tables = ['products', 'customers', 'orders', 'employees']
counts = {}

for table in tables:
    # Write your query here (use {table} for the table name)
    query = f"""
    
    """
    result = pd.read_sql(text(query), engine)
    counts[table] = result['count'][0]

pd.DataFrame(list(counts.items()), columns=['Table', 'Record Count'])

## Part 2: Basic SELECT Statements

Practice selecting specific columns and filtering data.

### Exercise 2.1: Select specific columns

Select only the `product_name`, `unit_price`, and `units_in_stock` from the products table.

**Hint:** List the column names after SELECT, separated by commas

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total products: {len(df)}\n")
df.head(10)

### Exercise 2.2: Filter with WHERE clause

Find all products where the `unit_price` is greater than 50.

**Hint:** Use `WHERE unit_price > 50`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Products with price > $50: {len(df)}\n")
df

### Exercise 2.3: Multiple conditions

Find products where `unit_price` is between 20 and 50 AND `units_in_stock` is greater than 0.

**Hint:** Use `WHERE unit_price BETWEEN 20 AND 50 AND units_in_stock > 0`
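
To see how `BETWEEN` combines with a second condition, the sketch below uses a hypothetical `items` table in an in-memory SQLite database (not Northwind). It assumes the same semantics as PostgreSQL for this case: `BETWEEN` includes both endpoints, and `AND` requires every condition to hold.

```python
import sqlite3

# Hypothetical toy table in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (name TEXT, price REAL, stock INTEGER);
    INSERT INTO items VALUES
        ('pen',    5, 10),
        ('lamp',  20,  0),
        ('bag',   35,  4),
        ('desk',  50,  2),
        ('chair', 80,  7);
""")

# BETWEEN is inclusive, so a price of exactly 20 or 50 passes the range test;
# the stock condition then removes rows that are out of stock.
rows = conn.execute("""
    SELECT name
    FROM items
    WHERE price BETWEEN 20 AND 50
      AND stock > 0
    ORDER BY name
""").fetchall()

names = [r[0] for r in rows]
print(names)
conn.close()
```

Here `desk` is kept because its price sits on the upper bound, while `lamp` is inside the price range but dropped because it has no stock.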

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Products matching criteria: {len(df)}\n")
df

### Exercise 2.4: Using LIKE for pattern matching

Find all customers whose `company_name` starts with the letter 'A'.

**Hint:** Use `WHERE company_name LIKE 'A%'`
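
The `%` wildcard matches any sequence of characters, so `'A%'` means "starts with A". The sketch below demonstrates this on a hypothetical `vendors` table in an in-memory SQLite database (not Northwind). One difference to keep in mind: SQLite's `LIKE` ignores ASCII case by default, whereas PostgreSQL's `LIKE` is case-sensitive, so the sample names are all capitalized to keep the result the same in both.

```python
import sqlite3

# Hypothetical toy table in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors (name TEXT);
    INSERT INTO vendors VALUES ('Acme'), ('Atlas'), ('Beta'), ('Zenith');
""")

# 'A%' matches names whose first character is A, followed by anything
rows = conn.execute("""
    SELECT name
    FROM vendors
    WHERE name LIKE 'A%'
    ORDER BY name
""").fetchall()

matches = [r[0] for r in rows]
print(matches)
conn.close()
```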

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Customers starting with 'A': {len(df)}\n")
df

### Exercise 2.5: Using IN for multiple values

Find all customers located in 'USA', 'Canada', or 'Mexico'.

**Hint:** Use `WHERE country IN ('USA', 'Canada', 'Mexico')`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"North American customers: {len(df)}\n")
df

## Part 3: JOINs - Combining Tables

Practice joining multiple tables to get related information.

### Exercise 3.1: INNER JOIN - Products with Categories

Join the `products` and `categories` tables to show product names with their category names.

**Hint:** Use `INNER JOIN northwind.categories c ON p.category_id = c.category_id`
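
An `INNER JOIN` keeps only rows that have a match on both sides of the `ON` condition. The following sketch shows that behavior with hypothetical `books` and `genres` tables in an in-memory SQLite database (not Northwind); the table aliases `b` and `g` play the same role as `p` and `c` in the hint.

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (title TEXT, genre_id INTEGER);
    INSERT INTO genres VALUES (1, 'Sci-Fi'), (2, 'Romance');
    INSERT INTO books VALUES ('Dune', 1), ('Emma', 2), ('Untitled', NULL);
""")

# Each book row is paired with the genre row whose id equals its genre_id.
# 'Untitled' has no genre, so the inner join leaves it out.
pairs = conn.execute("""
    SELECT b.title, g.name
    FROM books b
    INNER JOIN genres g ON b.genre_id = g.id
    ORDER BY b.title
""").fetchall()

print(pairs)
conn.close()
```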

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total products with categories: {len(df)}\n")
df.head(10)

### Exercise 3.2: Multiple JOINs - Orders with Customer and Employee Info

Join `orders`, `customers`, and `employees` to show:
- Order ID
- Customer company name
- Employee first and last name (concatenated)
- Order date

**Hint:** Use `||` to concatenate strings: `e.first_name || ' ' || e.last_name`
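
Chaining joins works one table at a time: each `JOIN ... ON` attaches another table to the rows built so far. The sketch below combines two joins with `||` concatenation, using hypothetical `loans`, `members`, and `librarians` tables in an in-memory SQLite database (not Northwind). SQLite supports the `||` operator in the same way for this case.

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE librarians (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE loans (id INTEGER PRIMARY KEY, member_id INTEGER, librarian_id INTEGER);
    INSERT INTO members VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO librarians VALUES (1, 'Kim', 'Lee'), (2, 'Sam', 'Roy');
    INSERT INTO loans VALUES (100, 1, 2), (101, 2, 1);
""")

# The first join attaches the member, the second attaches the librarian;
# || glues first and last name together with a space in between.
loan_rows = conn.execute("""
    SELECT l.id,
           m.name,
           lb.first_name || ' ' || lb.last_name AS librarian_name
    FROM loans l
    JOIN members m ON l.member_id = m.id
    JOIN librarians lb ON l.librarian_id = lb.id
    ORDER BY l.id
""").fetchall()

print(loan_rows)
conn.close()
```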

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total orders: {len(df)}\n")
df.head(10)

### Exercise 3.3: JOIN with ORDER BY - Products by Supplier

Join `products` and `suppliers` to show products sorted by supplier name.

**Hint:** Join on `supplier_id` and use `ORDER BY` on the supplier's company name

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total products: {len(df)}\n")
df.head(15)

### Exercise 3.4: Complex JOIN - Order Details with Full Information

Join `order_details`, `orders`, `products`, and `customers` to show:
- Order ID
- Customer company name
- Product name
- Quantity
- Unit price
- Line total (quantity × unit_price)

**Hint:** Calculate line total as `(od.quantity * od.unit_price)`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Total order line items: {len(df)}\n")
df.head(15)

## Part 4: Aggregate Functions and GROUP BY

Practice using aggregate functions to summarize data.

### Exercise 4.1: Count products by category

Show how many products are in each category.

**Hint:** Use `COUNT(p.product_id)` with `GROUP BY c.category_name`
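
`GROUP BY` collapses all rows that share a value into one output row, and `COUNT` then reports how many rows fell into each group. A minimal sketch of that idea, using hypothetical `shelves` and `volumes` tables in an in-memory SQLite database (not Northwind):

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shelves (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE volumes (id INTEGER PRIMARY KEY, title TEXT, shelf_id INTEGER);
    INSERT INTO shelves VALUES (1, 'History'), (2, 'Poetry');
    INSERT INTO volumes (title, shelf_id) VALUES
        ('Rome', 1), ('Byzantium', 1), ('Odes', 2);
""")

# One output row per shelf label, with the number of volumes on that shelf
shelf_counts = conn.execute("""
    SELECT s.label, COUNT(v.id) AS volume_count
    FROM shelves s
    JOIN volumes v ON v.shelf_id = s.id
    GROUP BY s.label
    ORDER BY s.label
""").fetchall()

print(shelf_counts)
conn.close()
```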

In [None]:
# YOUR CODE HERE
query = """

"""

pd.read_sql(text(query), engine)

### Exercise 4.2: Average, Min, and Max prices by category

Calculate the average, minimum, and maximum price for products in each category.

**Hint:** Use `AVG()`, `MIN()`, and `MAX()` functions

In [None]:
# YOUR CODE HERE
query = """

"""

pd.read_sql(text(query), engine)

### Exercise 4.3: Total sales by customer

Calculate the total sales amount for each customer (sum of quantity × unit_price from order_details), and return the 10 customers with the highest totals.

**Hint:** Use `SUM(od.quantity * od.unit_price)` and join multiple tables

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print("Top 10 Customers by Total Sales\n")
df

### Exercise 4.4: HAVING clause - Categories with high average price

Find categories where the average product price is greater than 30.

**Hint:** Use `HAVING AVG(p.unit_price) > 30` after GROUP BY
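
`WHERE` filters individual rows before grouping, while `HAVING` filters whole groups after the aggregate has been computed. The sketch below illustrates the `HAVING` side with a hypothetical `stock` table in an in-memory SQLite database (not Northwind).

```python
import sqlite3

# Hypothetical toy table in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (category TEXT, price REAL);
    INSERT INTO stock VALUES
        ('tools', 10), ('tools', 20),
        ('toys',  40), ('toys',  50),
        ('art',   30);
""")

# Averages per category are tools = 15, toys = 45, art = 30.
# HAVING keeps only groups whose average is strictly greater than 30.
pricey = conn.execute("""
    SELECT category, AVG(price) AS avg_price
    FROM stock
    GROUP BY category
    HAVING AVG(price) > 30
    ORDER BY category
""").fetchall()

print(pricey)
conn.close()
```

The `art` category is excluded because its average equals 30 and the comparison is strict.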

In [None]:
# YOUR CODE HERE
query = """

"""

pd.read_sql(text(query), engine)

### Exercise 4.5: Orders per employee

Show the number of orders handled by each employee, sorted by order count.

**Hint:** Use `COUNT(o.order_id)` and `GROUP BY` employee information

In [None]:
# YOUR CODE HERE
query = """

"""

pd.read_sql(text(query), engine)

## Part 5: Advanced Queries

Combine multiple concepts to answer business questions.

### Exercise 5.1: Products that need reordering

Find products where `units_in_stock` is less than or equal to `reorder_level` and the product is not discontinued.

**Hint:** Use multiple conditions in WHERE clause: `discontinued = FALSE`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Products needing reorder: {len(df)}\n")
df

### Exercise 5.2: Most expensive orders

Find the top 5 orders with the highest total value (sum of quantity × unit_price).

**Hint:** Use `SUM()`, `GROUP BY order_id`, `ORDER BY` the total in descending order, and `LIMIT 5`
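
A "top N" query needs three pieces working together: an aggregate per group, a descending sort on that aggregate, and a `LIMIT`. Without the sort, `LIMIT` returns an arbitrary subset. Here is a minimal sketch on a hypothetical `lines` table in an in-memory SQLite database (not Northwind), returning the top 2 rather than the top 5.

```python
import sqlite3

# Hypothetical toy table in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lines (order_id INTEGER, quantity INTEGER, unit_price REAL);
    INSERT INTO lines VALUES
        (1, 2, 10.0), (1, 1,  5.0),
        (2, 3, 20.0),
        (3, 1, 15.0), (3, 2, 15.0);
""")

# Order totals are 1 -> 25, 2 -> 60, 3 -> 45.
# Sorting by the total (descending) before LIMIT keeps the largest ones.
top = conn.execute("""
    SELECT order_id, SUM(quantity * unit_price) AS order_total
    FROM lines
    GROUP BY order_id
    ORDER BY order_total DESC
    LIMIT 2
""").fetchall()

print(top)
conn.close()
```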

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print("Top 5 Most Expensive Orders\n")
df

### Exercise 5.3: Customer order frequency

Show customers who have placed more than 10 orders, with their total order count and total sales.

**Hint:** Use `HAVING COUNT(DISTINCT o.order_id) > 10`
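
The `DISTINCT` in the hint matters because joining orders to their line items repeats each order once per line, so a plain `COUNT` would count line items rather than orders. The sketch below makes the difference visible using hypothetical `purchases` and `purchase_lines` tables in an in-memory SQLite database (not Northwind).

```python
import sqlite3

# Hypothetical toy tables in an in-memory SQLite database (not Northwind)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE purchase_lines (purchase_id INTEGER, amount INTEGER);
    INSERT INTO purchases VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO purchase_lines VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 40),
        (3, 5);
""")

# Customer A has 2 purchases but 4 line items, so the join yields 4 rows
# for A. COUNT(p.id) counts those joined rows; COUNT(DISTINCT p.id)
# counts the purchases themselves.
summary = conn.execute("""
    SELECT p.customer,
           COUNT(p.id)          AS joined_rows,
           COUNT(DISTINCT p.id) AS purchase_count,
           SUM(pl.amount)       AS total
    FROM purchases p
    JOIN purchase_lines pl ON pl.purchase_id = p.id
    GROUP BY p.customer
    ORDER BY p.customer
""").fetchall()

print(summary)
conn.close()
```

The sum is unaffected by the duplication because each line item appears exactly once in the joined result; only the count of parent rows needs `DISTINCT`.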

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print(f"Customers with more than 10 orders: {len(df)}\n")
df

### Exercise 5.4: Product popularity

Find the 10 most popular products, ranked by total quantity sold.

**Hint:** Join products with order_details, sum quantities, sort by the total in descending order, and use `LIMIT 10`

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print("Top 10 Most Popular Products\n")
df

### Exercise 5.5: Sales by country

Calculate total sales for each country, showing only countries with total sales over 10000.

**Hint:** Use customer country, join to orders and order_details, sum sales, use HAVING

In [None]:
# YOUR CODE HERE
query = """

"""

df = pd.read_sql(text(query), engine)
print("Countries with Total Sales > $10,000\n")
df

## Summary

Great work! You've practiced:
- ✓ Basic SELECT statements with specific columns
- ✓ Filtering data with WHERE, LIKE, IN, and BETWEEN
- ✓ INNER JOINs to combine related tables
- ✓ Multiple JOINs across 3-4 tables
- ✓ Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- ✓ GROUP BY for summarizing data
- ✓ HAVING clause for filtering grouped results
- ✓ Complex queries combining multiple concepts

These are essential SQL skills for data analysis and database work!

---

## Submission Checklist
- [ ] All code cells run without errors
- [ ] Query results make sense for each question
- [ ] Name and date filled in at the top
- [ ] Notebook saved

**Submit this completed notebook to your instructor.**