# Homework: SQL SELECT Statements with Northwind Database

**Name:** _Luis Amaya De Leon_

**Date:** _11/16/2025_

---

**Objective:** Practice writing SQL SELECT statements, filtering data with WHERE clauses, using JOINs to combine tables, and performing aggregate operations.

**Database:** Northwind (classic sales database with customers, orders, products, employees)

---

## Instructions
1. Complete all exercises below
2. Write your SQL queries in the provided code cells
3. Run each cell to verify your queries work
4. Ensure your output makes sense for the question asked
5. Submit your completed notebook

**Tip:** Use the SOLUTIONS notebook to check your work if you get stuck!

## Part 0: Database Setup

**Run this cell first!** This cell will:
1. Import necessary libraries
2. Set database parameters
3. Terminate any active connections to the database
4. Drop and recreate the Northwind database
5. Load the Northwind SQL file
6. Create a SQLAlchemy engine and test the connection

**You don't need to modify this cell - just run it!**

In [23]:
# Import libraries
import pandas as pd
import psycopg2
from sqlalchemy import create_engine, text
import subprocess

# Database parameters
db_params = {
    'host': 'localhost',
    'database': 'northwind',
    'user': 'student',
    'password': ''
}

# Step 1: Terminate active connections and recreate database
print("Step 1: Setting up database...")
terminate_cmd = f"psql -U {db_params['user']} -d postgres -c \"SELECT pg_terminate_backend(pg_stat_activity.pid) FROM pg_stat_activity WHERE pg_stat_activity.datname = '{db_params['database']}' AND pid <> pg_backend_pid();\""
drop_cmd = f"psql -U {db_params['user']} -d postgres -c 'DROP DATABASE IF EXISTS {db_params['database']};'"
create_cmd = f"psql -U {db_params['user']} -d postgres -c 'CREATE DATABASE {db_params['database']};'"

subprocess.run(terminate_cmd, shell=True, capture_output=True)
subprocess.run(drop_cmd, shell=True, capture_output=True)
result = subprocess.run(create_cmd, shell=True, capture_output=True, text=True)
print(f"Database created: {result.stdout.strip()}")

# Step 2: Load Northwind SQL file
print("\nStep 2: Loading Northwind database...")
sql_file = "/workspaces/assignment-1-version-3-Legion-codex/databases/northwind.sql"
load_cmd = f"psql -U {db_params['user']} -d {db_params['database']} -f {sql_file}"
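result = subprocess.run(load_cmd, shell=True, capture_output=True, text=True)
# Report success only when psql exits cleanly; otherwise show its error output
if result.returncode == 0:
    print("Northwind database loaded successfully!")
else:
    print(f"Failed to load Northwind database: {result.stderr.strip()}")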

# Step 3: Create SQLAlchemy engine
print("\nStep 3: Creating database connection...")
engine = create_engine(
    f"postgresql://{db_params['user']}@{db_params['host']}/{db_params['database']}"
)

# Test connection
with engine.connect() as conn:
    result = conn.execute(text("SELECT version();"))
    version = result.fetchone()[0]
    print(f"✓ Connected to: {version[:50]}...")

print("\n✓ Setup complete! Ready to run queries.")

Step 1: Setting up database...
Database created: CREATE DATABASE

Step 2: Loading Northwind database...
Northwind database loaded successfully!

Step 3: Creating database connection...
✓ Connected to: PostgreSQL 18.0 on x86_64-conda-linux-gnu, compile...

✓ Setup complete! Ready to run queries.
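
The setup cell passes each command to the shell as one formatted string. As a minimal sketch of an alternative, the hypothetical helper `run_checked` below (not part of the assignment) takes an argument list, which avoids nested shell quoting, and raises when the exit code is non-zero. The Python interpreter stands in for `psql` so the example runs anywhere.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command given as an argument list and return its stdout.

    Raises RuntimeError with the captured stderr if the exit code is non-zero.
    (`run_checked` is a hypothetical helper, not part of the assignment.)
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"{args[0]} failed: {result.stderr.strip()}")
    return result.stdout.strip()


# Stand-in commands: the Python interpreter plays the role of psql here.
ok_output = run_checked([sys.executable, "-c", "print('CREATE DATABASE')"])
print(ok_output)

try:
    # sys.exit with a string writes it to stderr and exits with code 1.
    run_checked([sys.executable, "-c", "import sys; sys.exit('connection refused')"])
except RuntimeError as exc:
    failure_message = str(exc)
    print(failure_message)
```

With `psql`, the argument list would look like `["psql", "-U", db_params["user"], "-d", db_params["database"], "-f", sql_file]`, using the same flags as the setup cell.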


## Part 1: Exploring the Database

Before writing queries, let's explore what tables and data are available.

### Exercise 1.1: List all tables

Write a query to show all tables in the `northwind` schema.

**Hint:** Use `information_schema.tables` with `WHERE table_schema = 'northwind'`

In [2]:
# Exercise 1.1: List all tables in the northwind schema
query = """
SELECT 
    table_schema,     -- Schema name (like a folder for tables)
    table_name        -- Name of the table
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'     -- Only actual tables (not views)
  AND table_schema = 'northwind'    -- Only the northwind schema, as the exercise asks
ORDER BY table_schema, table_name;  -- Sort by schema, then table name
"""

df = pd.read_sql(text(query), engine)
print(f"Total tables: {len(df)}\n")
df

Total tables: 8



Unnamed: 0,table_schema,table_name
0,northwind,categories
1,northwind,customers
2,northwind,employees
3,northwind,order_details
4,northwind,orders
5,northwind,products
6,northwind,shippers
7,northwind,suppliers


### Exercise 1.2: Explore the Products table

Write a query to show the first 5 products with all their columns.

**Hint:** Use `SELECT *` and `LIMIT 5`

In [3]:
# Exercise 1.2: First 5 products (all columns)
query = """
SELECT *
FROM northwind.products
LIMIT 5;
"""

pd.read_sql(text(query), engine)

Unnamed: 0,product_id,product_name,supplier_id,category_id,quantity_per_unit,unit_price,units_in_stock,units_on_order,reorder_level,discontinued
0,1,Chai,1,1,,18.0,39,0,0,False
1,2,Chang,1,1,,19.0,17,0,0,False
2,3,Aniseed Syrup,1,2,,10.0,13,0,0,False
3,4,Chef Anton's Cajun Seasoning,2,2,,22.0,53,0,0,False
4,5,Chef Anton's Gumbo Mix,2,2,,21.35,0,0,0,False


### Exercise 1.3: Count records in each table

Write queries to count how many records are in the `products`, `customers`, `orders`, and `employees` tables.

**Hint:** Use `COUNT(*)` for each table

In [4]:
# Exercise 1.3: Record counts for key tables
tables = ['products', 'customers', 'orders', 'employees']
counts = []

for table in tables:
    query = f"SELECT COUNT(*) AS count FROM northwind.{table};"
    count_df = pd.read_sql(text(query), engine)
    counts.append({'Table': table, 'Record Count': int(count_df.loc[0, 'count'])})

pd.DataFrame(counts)

Unnamed: 0,Table,Record Count
0,products,10
1,customers,5
2,orders,5
3,employees,5
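
The loop above issues one query per table. As a sketch of a single-query alternative, `UNION ALL` can stack one `COUNT(*)` per table into one result set. The example uses an in-memory SQLite database with two tiny hypothetical tables so it runs without PostgreSQL; against the homework database the table names would carry the `northwind.` prefix.

```python
import sqlite3

# Hypothetical toy tables; the real Northwind tables have more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products  (product_id INTEGER PRIMARY KEY);
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY);
    INSERT INTO products  VALUES (1), (2), (3);
    INSERT INTO customers VALUES ('ALFKI'), ('ANATR');
""")

# Each SELECT yields one row; UNION ALL concatenates them without deduplicating.
query = """
SELECT 'products'  AS table_name, COUNT(*) AS record_count FROM products
UNION ALL
SELECT 'customers' AS table_name, COUNT(*) AS record_count FROM customers;
"""
counts = conn.execute(query).fetchall()
print(counts)
```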


## Part 2: Basic SELECT Statements

Practice selecting specific columns and filtering data.

### Exercise 2.1: Select specific columns

Select only the `product_name`, `unit_price`, and `units_in_stock` from the products table.

**Hint:** List the column names after SELECT, separated by commas

In [5]:
# Exercise 2.1: Select specific columns
query = """
SELECT product_name, unit_price, units_in_stock
FROM northwind.products;
"""

df = pd.read_sql(text(query), engine)
print(f"Total products: {len(df)}\n")
df.head(10)

Total products: 10



Unnamed: 0,product_name,unit_price,units_in_stock
0,Chai,18.0,39
1,Chang,19.0,17
2,Aniseed Syrup,10.0,13
3,Chef Anton's Cajun Seasoning,22.0,53
4,Chef Anton's Gumbo Mix,21.35,0
5,Grandma's Boysenberry Spread,25.0,120
6,Uncle Bob's Organic Dried Pears,30.0,15
7,Northwoods Cranberry Sauce,40.0,6
8,Mishi Kobe Niku,97.0,29
9,Ikura,31.0,31


### Exercise 2.2: Filter with WHERE clause

Find all products where the `unit_price` is greater than 50.

**Hint:** Use `WHERE unit_price > 50`

In [6]:
# Exercise 2.2: Products with unit_price > 50
query = """
SELECT product_id, product_name, unit_price, units_in_stock
FROM northwind.products
WHERE unit_price > 50
ORDER BY unit_price DESC;
"""

df = pd.read_sql(text(query), engine)
print(f"Products with price > $50: {len(df)}\n")
df

Products with price > $50: 1



Unnamed: 0,product_id,product_name,unit_price,units_in_stock
0,9,Mishi Kobe Niku,97.0,29


### Exercise 2.3: Multiple conditions

Find products where `unit_price` is between 20 and 50 AND `units_in_stock` is greater than 0.

**Hint:** Use `WHERE unit_price BETWEEN 20 AND 50 AND units_in_stock > 0`

In [7]:
# Exercise 2.3: Products priced between 20 and 50 and in stock
query = """
SELECT product_id, product_name, unit_price, units_in_stock
FROM northwind.products
WHERE unit_price BETWEEN 20 AND 50
  AND units_in_stock > 0
ORDER BY unit_price, product_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Products matching criteria: {len(df)}\n")
df

Products matching criteria: 5



Unnamed: 0,product_id,product_name,unit_price,units_in_stock
0,4,Chef Anton's Cajun Seasoning,22.0,53
1,6,Grandma's Boysenberry Spread,25.0,120
2,7,Uncle Bob's Organic Dried Pears,30.0,15
3,10,Ikura,31.0,31
4,8,Northwoods Cranberry Sauce,40.0,6
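
`BETWEEN` includes both endpoints, so a product priced at exactly 20 or exactly 50 would match. The sketch below checks this on an in-memory SQLite table with hypothetical product names and prices, and compares the result with the equivalent `>=` / `<=` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, unit_price REAL)")
# Hypothetical rows chosen to sit on and just outside the boundaries.
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("A", 19.99), ("B", 20.0), ("C", 35.0), ("D", 50.0), ("E", 50.01)],
)

between_rows = conn.execute(
    "SELECT product_name FROM products "
    "WHERE unit_price BETWEEN 20 AND 50 ORDER BY product_name"
).fetchall()

explicit_rows = conn.execute(
    "SELECT product_name FROM products "
    "WHERE unit_price >= 20 AND unit_price <= 50 ORDER BY product_name"
).fetchall()

print(between_rows)   # [('B',), ('C',), ('D',)]
print(between_rows == explicit_rows)
```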


### Exercise 2.4: Using LIKE for pattern matching

Find all customers whose `company_name` starts with the letter 'A'.

**Hint:** Use `WHERE company_name LIKE 'A%'`

In [8]:
# Exercise 2.4: Customers whose company_name starts with 'A'
query = """
SELECT customer_id, company_name, contact_name, country
FROM northwind.customers
WHERE company_name LIKE 'A%'
ORDER BY company_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Customers starting with 'A': {len(df)}\n")
df

Customers starting with 'A': 4



Unnamed: 0,customer_id,company_name,contact_name,country
0,ALFKI,Alfreds Futterkiste,Maria Anders,Germany
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Mexico
2,ANTON,Antonio Moreno Taquería,Antonio Moreno,Mexico
3,AROUT,Around the Horn,Thomas Hardy,UK
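
When the prefix comes from a variable rather than being typed into the query, the `LIKE` pattern can be passed as a bound parameter instead of being formatted into the SQL string. This is a minimal sketch on an in-memory SQLite table; `companies_starting_with` is a hypothetical helper. One caveat: PostgreSQL's `LIKE` is case-sensitive, while SQLite's default `LIKE` ignores case for ASCII letters, so the two engines can differ on lowercase data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, company_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("ALFKI", "Alfreds Futterkiste"),
        ("AROUT", "Around the Horn"),
        ("BERGS", "Berglunds snabbköp"),
    ],
)


def companies_starting_with(prefix):
    """Return company names that begin with `prefix` (hypothetical helper)."""
    pattern = prefix + "%"  # % matches any run of characters after the prefix
    rows = conn.execute(
        "SELECT company_name FROM customers "
        "WHERE company_name LIKE ? ORDER BY company_name",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]


a_companies = companies_starting_with("A")
b_companies = companies_starting_with("B")
print(a_companies)  # ['Alfreds Futterkiste', 'Around the Horn']
```

A prefix that itself contains `%` or `_` would still be read as a wildcard, so this sketch assumes plain-text prefixes.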


### Exercise 2.5: Using IN for multiple values

Find all customers located in 'USA', 'Canada', or 'Mexico'.

**Hint:** Use `WHERE country IN ('USA', 'Canada', 'Mexico')`

In [9]:
# Exercise 2.5: Customers in USA, Canada, or Mexico
query = """
SELECT customer_id, company_name, contact_name, country
FROM northwind.customers
WHERE country IN ('USA', 'Canada', 'Mexico')
ORDER BY country, company_name;
"""

df = pd.read_sql(text(query), engine)
print(f"North American customers: {len(df)}\n")
df

North American customers: 2



Unnamed: 0,customer_id,company_name,contact_name,country
0,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Mexico
1,ANTON,Antonio Moreno Taquería,Antonio Moreno,Mexico
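
Only Mexican customers appear above because this sample contains no rows for 'USA' or 'Canada'. If the country list lived in a Python variable, one way to build the `IN (...)` clause is to generate one placeholder per value and bind the list. The sketch below assumes an in-memory SQLite table with a hypothetical subset of the customer columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("ALFKI", "Germany"), ("ANATR", "Mexico"), ("ANTON", "Mexico"), ("AROUT", "UK")],
)

countries = ["USA", "Canada", "Mexico"]
# One "?" per value; the values themselves are bound, never formatted into the SQL.
placeholders = ", ".join("?" for _ in countries)
query = (
    f"SELECT customer_id FROM customers "
    f"WHERE country IN ({placeholders}) ORDER BY customer_id"
)
matches = [cid for (cid,) in conn.execute(query, countries).fetchall()]
print(matches)  # ['ANATR', 'ANTON']
```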


## Part 3: JOINs - Combining Tables

Practice joining multiple tables to get related information.

### Exercise 3.1: INNER JOIN - Products with Categories

Join the `products` and `categories` tables to show product names with their category names.

**Hint:** Use `INNER JOIN northwind.categories c ON p.category_id = c.category_id`

In [10]:
# Exercise 3.1: Products with their Category names
query = """
SELECT p.product_id,
       p.product_name,
       c.category_name
FROM northwind.products p
INNER JOIN northwind.categories c ON p.category_id = c.category_id
ORDER BY c.category_name, p.product_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Total products with categories: {len(df)}\n")
df.head(10)

Total products with categories: 10



Unnamed: 0,product_id,product_name,category_name
0,1,Chai,Beverages
1,2,Chang,Beverages
2,3,Aniseed Syrup,Condiments
3,4,Chef Anton's Cajun Seasoning,Condiments
4,5,Chef Anton's Gumbo Mix,Condiments
5,6,Grandma's Boysenberry Spread,Condiments
6,8,Northwoods Cranberry Sauce,Condiments
7,10,Ikura,Confections
8,9,Mishi Kobe Niku,Produce
9,7,Uncle Bob's Organic Dried Pears,Seafood
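
All 10 products appear here because each one has a matching category. An `INNER JOIN` silently drops rows with no match, which the following sketch illustrates on hypothetical toy tables in SQLite: a product with a `NULL` `category_id` disappears under `INNER JOIN` but survives under `LEFT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER, category_name TEXT);
    CREATE TABLE products   (product_name TEXT, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'Beverages'), (2, 'Condiments');
    -- 'Mystery Item' is a made-up product with no category.
    INSERT INTO products VALUES ('Chai', 1), ('Aniseed Syrup', 2), ('Mystery Item', NULL);
""")

inner_rows = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM products p
    INNER JOIN categories c ON p.category_id = c.category_id
    ORDER BY p.product_name
""").fetchall()

left_rows = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM products p
    LEFT JOIN categories c ON p.category_id = c.category_id
    ORDER BY p.product_name
""").fetchall()

print(len(inner_rows), len(left_rows))  # 2 3
```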


### Exercise 3.2: Multiple JOINs - Orders with Customer and Employee Info

Join `orders`, `customers`, and `employees` to show:
- Order ID
- Customer company name
- Employee first and last name (concatenated)
- Order date

**Hint:** Use `||` to concatenate strings: `e.first_name || ' ' || e.last_name`

In [11]:
# Exercise 3.2: Orders joined with Customer and Employee info
query = """
SELECT o.order_id,
       c.company_name AS customer_company,
       e.first_name || ' ' || e.last_name AS employee_name,
       o.order_date
FROM northwind.orders o
INNER JOIN northwind.customers c ON o.customer_id = c.customer_id
INNER JOIN northwind.employees e ON o.employee_id = e.employee_id
ORDER BY o.order_date;
"""

df = pd.read_sql(text(query), engine)
print(f"Total orders returned: {len(df)}\n")
df.head(10)

Total orders returned: 5



Unnamed: 0,order_id,customer_company,employee_name,order_date
0,1,Alfreds Futterkiste,Nancy Davolio,1996-07-04
1,2,Ana Trujillo Emparedados y helados,Andrew Fuller,1996-07-05
2,3,Antonio Moreno Taquería,Janet Leverling,1996-07-08
3,4,Around the Horn,Margaret Peacock,1996-07-08
4,5,Berglunds snabbköp,Steven Buchanan,1996-07-09
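
One thing to watch with `||`: if either operand is `NULL`, the whole concatenation is `NULL`. Every employee in this sample has both names, so the issue does not show up above. The sketch below uses a hypothetical employee with a missing last name in SQLite to show the effect, along with one possible guard using `COALESCE` and `TRIM`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Nancy", "Davolio"), (2, "Cher", None)],  # second row is made up
)

plain_names = conn.execute(
    "SELECT first_name || ' ' || last_name FROM employees ORDER BY employee_id"
).fetchall()

# COALESCE swaps NULL for an empty string; TRIM removes the leftover space.
safe_names = conn.execute(
    "SELECT TRIM(first_name || ' ' || COALESCE(last_name, '')) "
    "FROM employees ORDER BY employee_id"
).fetchall()

print(plain_names)  # [('Nancy Davolio',), (None,)]
print(safe_names)   # [('Nancy Davolio',), ('Cher',)]
```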


### Exercise 3.3: JOIN with ORDER BY - Products by Supplier

Join `products` and `suppliers` to show products sorted by supplier name.

**Hint:** Join on `supplier_id` and use `ORDER BY` on the supplier's company name

In [12]:
# Exercise 3.3: Products by Supplier (sorted by supplier name)
query = """
SELECT s.company_name AS supplier,
       p.product_id,
       p.product_name,
       p.unit_price,
       p.units_in_stock
FROM northwind.products p
INNER JOIN northwind.suppliers s ON p.supplier_id = s.supplier_id
ORDER BY s.company_name, p.product_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Total products: {len(df)}\n")
df.head(15)

Total products: 10



Unnamed: 0,supplier,product_id,product_name,unit_price,units_in_stock
0,Exotic Liquids,3,Aniseed Syrup,10.0,13
1,Exotic Liquids,1,Chai,18.0,39
2,Exotic Liquids,2,Chang,19.0,17
3,Grandma Kelly's Homestead,6,Grandma's Boysenberry Spread,25.0,120
4,Grandma Kelly's Homestead,8,Northwoods Cranberry Sauce,40.0,6
5,Grandma Kelly's Homestead,7,Uncle Bob's Organic Dried Pears,30.0,15
6,New Orleans Cajun Delights,4,Chef Anton's Cajun Seasoning,22.0,53
7,New Orleans Cajun Delights,5,Chef Anton's Gumbo Mix,21.35,0
8,Tokyo Traders,10,Ikura,31.0,31
9,Tokyo Traders,9,Mishi Kobe Niku,97.0,29


### Exercise 3.4: Complex JOIN - Order Details with Full Information

Join `order_details`, `orders`, `products`, and `customers` to show:
- Order ID
- Customer company name
- Product name
- Quantity
- Unit price
- Line total (quantity × unit_price)

**Hint:** Calculate line total as `(od.quantity * od.unit_price)`

In [13]:
# Exercise 3.4: Order Details with full information (and line totals)
query = """
SELECT 
  o.order_id,
  c.company_name AS customer,
  p.product_name,
  od.quantity,
  od.unit_price,
  (od.quantity * od.unit_price) AS line_total
FROM northwind.order_details od
INNER JOIN northwind.orders o      ON od.order_id  = o.order_id
INNER JOIN northwind.products p    ON od.product_id = p.product_id
INNER JOIN northwind.customers c   ON o.customer_id = c.customer_id
ORDER BY o.order_id, customer, p.product_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Total order line items: {len(df)}\n")
df.head(15)

Total order line items: 5



Unnamed: 0,order_id,customer,product_name,quantity,unit_price,line_total
0,1,Alfreds Futterkiste,Chai,12,18.0,216.0
1,1,Alfreds Futterkiste,Chang,10,19.0,190.0
2,2,Ana Trujillo Emparedados y helados,Aniseed Syrup,5,10.0,50.0
3,3,Antonio Moreno Taquería,Chef Anton's Cajun Seasoning,9,22.0,198.0
4,4,Around the Horn,Chef Anton's Gumbo Mix,40,21.35,854.0


## Part 4: Aggregate Functions and GROUP BY

Practice using aggregate functions to summarize data.

### Exercise 4.1: Count products by category

Show how many products are in each category.

**Hint:** Use `COUNT(p.product_id)` with `GROUP BY c.category_name`

In [14]:
# Exercise 4.1: Count products by category
query = """
SELECT c.category_name,
       COUNT(p.product_id) AS product_count
FROM northwind.categories c
LEFT JOIN northwind.products p
  ON p.category_id = c.category_id
GROUP BY c.category_name
ORDER BY product_count DESC, c.category_name;
"""

pd.read_sql(text(query), engine)

Unnamed: 0,category_name,product_count
0,Condiments,5
1,Beverages,2
2,Confections,1
3,Produce,1
4,Seafood,1
5,Dairy Products,0
6,Grains/Cereals,0
7,Meat/Poultry,0
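
The zero counts above depend on counting `p.product_id` rather than `*`. After a `LEFT JOIN`, an empty category still produces one row (with `NULL` product columns), so `COUNT(*)` would report 1 while `COUNT(p.product_id)` skips the `NULL` and reports 0. A small sketch on hypothetical toy tables in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER, category_name TEXT);
    CREATE TABLE products   (product_id INTEGER, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'Beverages'), (2, 'Dairy Products');
    INSERT INTO products   VALUES (1, 1), (2, 1);  -- no dairy products
""")

rows = conn.execute("""
    SELECT c.category_name,
           COUNT(*)            AS row_count,      -- counts the NULL-padded row too
           COUNT(p.product_id) AS product_count   -- ignores NULLs
    FROM categories c
    LEFT JOIN products p ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(rows)  # [('Beverages', 2, 2), ('Dairy Products', 1, 0)]
```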


### Exercise 4.2: Average, Min, and Max prices by category

Calculate the average, minimum, and maximum price for products in each category.

**Hint:** Use `AVG()`, `MIN()`, and `MAX()` functions

In [15]:
# Exercise 4.2: Average, Min, and Max prices by category
query = """
SELECT c.category_name,
       AVG(p.unit_price) AS avg_price,
       MIN(p.unit_price) AS min_price,
       MAX(p.unit_price) AS max_price
FROM northwind.categories c
LEFT JOIN northwind.products p
  ON p.category_id = c.category_id
GROUP BY c.category_name
ORDER BY avg_price DESC, c.category_name;
"""

pd.read_sql(text(query), engine)

Unnamed: 0,category_name,avg_price,min_price,max_price
0,Dairy Products,,,
1,Grains/Cereals,,,
2,Meat/Poultry,,,
3,Produce,97.0,97.0,97.0
4,Confections,31.0,31.0,31.0
5,Seafood,30.0,30.0,30.0
6,Condiments,23.67,10.0,40.0
7,Beverages,18.5,18.0,19.0
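
The empty categories sort to the top because PostgreSQL places `NULL` first under `ORDER BY ... DESC`. PostgreSQL accepts `ORDER BY avg_price DESC NULLS LAST` to change that. A more portable sketch sorts on `avg_price IS NULL` first, which pushes the `NULL` rows to the end on engines that order false before true. The table below is a hypothetical stand-in for the aggregated result, run in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category_prices (category_name TEXT, avg_price REAL)")
conn.executemany(
    "INSERT INTO category_prices VALUES (?, ?)",
    [("Dairy Products", None), ("Produce", 97.0), ("Beverages", 18.5)],
)

# "avg_price IS NULL" evaluates to 0 for real prices and 1 for NULL,
# so rows with a price come first, then sorted by price descending.
ordered = conn.execute("""
    SELECT category_name, avg_price
    FROM category_prices
    ORDER BY avg_price IS NULL, avg_price DESC
""").fetchall()

print(ordered)  # [('Produce', 97.0), ('Beverages', 18.5), ('Dairy Products', None)]
```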


### Exercise 4.3: Total sales by customer

Calculate the total sales amount for each customer (sum of quantity × unit_price from order_details).

**Hint:** Use `SUM(od.quantity * od.unit_price)` and join multiple tables

In [16]:
# Exercise 4.3: Total sales by customer
query = """
SELECT
  c.customer_id,
  c.company_name,
  COALESCE(SUM(od.quantity * od.unit_price), 0) AS total_sales
FROM northwind.customers c
LEFT JOIN northwind.orders o
  ON o.customer_id = c.customer_id
LEFT JOIN northwind.order_details od
  ON od.order_id = o.order_id
GROUP BY c.customer_id, c.company_name
ORDER BY total_sales DESC, c.company_name
LIMIT 10;
"""

df = pd.read_sql(text(query), engine)
print("Top 10 Customers by Total Sales\n")
df

Top 10 Customers by Total Sales



Unnamed: 0,customer_id,company_name,total_sales
0,AROUT,Around the Horn,854.0
1,ALFKI,Alfreds Futterkiste,406.0
2,ANTON,Antonio Moreno Taquería,198.0
3,ANATR,Ana Trujillo Emparedados y helados,50.0
4,BERGS,Berglunds snabbköp,0.0


### Exercise 4.4: HAVING clause - Categories with high average price

Find categories where the average product price is greater than 30.

**Hint:** Use `HAVING AVG(p.unit_price) > 30` after GROUP BY

In [17]:
# Exercise 4.4: Categories with average price > 30
query = """
SELECT c.category_name,
       AVG(p.unit_price) AS avg_price
FROM northwind.categories c
LEFT JOIN northwind.products p
  ON p.category_id = c.category_id
GROUP BY c.category_name
HAVING AVG(p.unit_price) > 30
ORDER BY avg_price DESC, c.category_name;
"""

pd.read_sql(text(query), engine)

Unnamed: 0,category_name,avg_price
0,Produce,97.0
1,Confections,31.0
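
`HAVING` filters groups after aggregation, whereas `WHERE` filters individual rows before it, and the two can give different answers. The sketch below contrasts them on a hypothetical flattened product table in SQLite ('Konbu' and its price are made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_name TEXT, category_name TEXT, unit_price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("Chai", "Beverages", 18),
        ("Chang", "Beverages", 19),
        ("Ikura", "Seafood", 31),
        ("Konbu", "Seafood", 6),
        ("Mishi Kobe Niku", "Meat/Poultry", 97),
    ],
)

# WHERE: drop cheap products first, then average what is left in each category.
where_rows = conn.execute("""
    SELECT category_name, AVG(unit_price)
    FROM products
    WHERE unit_price > 30
    GROUP BY category_name
    ORDER BY category_name
""").fetchall()

# HAVING: average every product, then keep only categories whose average exceeds 30.
having_rows = conn.execute("""
    SELECT category_name, AVG(unit_price)
    FROM products
    GROUP BY category_name
    HAVING AVG(unit_price) > 30
    ORDER BY category_name
""").fetchall()

print(where_rows)   # [('Meat/Poultry', 97.0), ('Seafood', 31.0)]
print(having_rows)  # [('Meat/Poultry', 97.0)]
```

Seafood passes the `WHERE` version because only Ikura survives the row filter, but it fails the `HAVING` version because its full average is 18.5.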


### Exercise 4.5: Orders per employee

Show the number of orders handled by each employee, sorted by order count.

**Hint:** Use `COUNT(o.order_id)` and `GROUP BY` employee information

In [18]:
# Exercise 4.5: Orders per employee
query = """
SELECT
  e.employee_id,
  e.first_name || ' ' || e.last_name AS employee_name,
  COUNT(o.order_id) AS order_count
FROM northwind.employees e
LEFT JOIN northwind.orders o
  ON o.employee_id = e.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name
ORDER BY order_count DESC, employee_name;
"""

pd.read_sql(text(query), engine)

Unnamed: 0,employee_id,employee_name,order_count
0,2,Andrew Fuller,1
1,3,Janet Leverling,1
2,4,Margaret Peacock,1
3,1,Nancy Davolio,1
4,5,Steven Buchanan,1


## Part 5: Advanced Queries

Combine multiple concepts to answer business questions.

### Exercise 5.1: Products that need reordering

Find products where `units_in_stock` is less than or equal to `reorder_level` and the product is not discontinued.

**Hint:** Use multiple conditions in WHERE clause: `discontinued = FALSE`

In [19]:
# Exercise 5.1: Products that need reordering
query = """
SELECT 
  product_id,
  product_name,
  units_in_stock,
  reorder_level,
  discontinued
FROM northwind.products
WHERE discontinued = FALSE
  AND units_in_stock <= reorder_level
ORDER BY units_in_stock ASC, reorder_level DESC, product_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Products needing reorder: {len(df)}\n")
df

Products needing reorder: 1



Unnamed: 0,product_id,product_name,units_in_stock,reorder_level,discontinued
0,5,Chef Anton's Gumbo Mix,0,0,False


### Exercise 5.2: Most expensive orders

Find the top 5 orders with the highest total value (sum of quantity × unit_price).

**Hint:** Use `SUM()`, `GROUP BY order_id`, and `LIMIT 5`

In [20]:
# Exercise 5.2: Top 5 most expensive orders
query = """
SELECT 
  o.order_id,
  o.order_date,
  c.company_name AS customer_company,
  SUM(od.quantity * od.unit_price) AS order_total
FROM northwind.orders o
INNER JOIN northwind.order_details od ON o.order_id = od.order_id
INNER JOIN northwind.customers c ON o.customer_id = c.customer_id
GROUP BY o.order_id, o.order_date, c.company_name
ORDER BY order_total DESC
LIMIT 5;
"""

df = pd.read_sql(text(query), engine)
print("Top 5 Most Expensive Orders\n")
df

Top 5 Most Expensive Orders



Unnamed: 0,order_id,order_date,customer_company,order_total
0,4,1996-07-08,Around the Horn,854.0
1,1,1996-07-04,Alfreds Futterkiste,406.0
2,3,1996-07-08,Antonio Moreno Taquería,198.0
3,2,1996-07-05,Ana Trujillo Emparedados y helados,50.0


### Exercise 5.3: Customer order frequency

Show customers who have placed more than 10 orders, with their total order count and total sales.

**Hint:** Use `HAVING COUNT(DISTINCT o.order_id) > 10`

In [21]:
# Exercise 5.3: Customers with more than 10 orders (frequency + total sales)
query = """
SELECT 
  c.customer_id,
  c.company_name,
  COUNT(DISTINCT o.order_id) AS order_count,
  COALESCE(SUM(od.quantity * od.unit_price), 0) AS total_sales
FROM northwind.customers c
INNER JOIN northwind.orders o ON o.customer_id = c.customer_id
INNER JOIN northwind.order_details od ON od.order_id = o.order_id
GROUP BY c.customer_id, c.company_name
HAVING COUNT(DISTINCT o.order_id) > 10
ORDER BY order_count DESC, total_sales DESC, c.company_name;
"""

df = pd.read_sql(text(query), engine)
print(f"Customers with more than 10 orders: {len(df)}\n")
df

Customers with more than 10 orders: 0



Unnamed: 0,customer_id,company_name,order_count,total_sales
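
The result is empty because no customer in this sample has more than 10 orders. The `DISTINCT` in the hint still matters: joining `orders` to `order_details` repeats each order once per line item, so a plain `COUNT(o.order_id)` counts line items rather than orders. A sketch on hypothetical toy tables in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders        (order_id INTEGER, customer_id TEXT);
    CREATE TABLE order_details (order_id INTEGER, product_id INTEGER);
    INSERT INTO orders VALUES (1, 'ALFKI'), (2, 'ALFKI');
    -- Order 1 has two line items, order 2 has one.
    INSERT INTO order_details VALUES (1, 1), (1, 2), (2, 3);
""")

row = conn.execute("""
    SELECT o.customer_id,
           COUNT(o.order_id)          AS joined_rows,   -- one per line item
           COUNT(DISTINCT o.order_id) AS order_count    -- one per order
    FROM orders o
    INNER JOIN order_details od ON od.order_id = o.order_id
    GROUP BY o.customer_id
""").fetchone()

print(row)  # ('ALFKI', 3, 2)
```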


### Exercise 5.4: Product popularity

Find the top 10 most frequently ordered products (by total quantity sold).

**Hint:** Join products with order_details, sum quantities, and order by total

In [22]:
# Exercise 5.4: Top 10 most popular products by total quantity sold
query = """
SELECT 
  p.product_id,
  p.product_name,
  SUM(od.quantity) AS total_quantity
FROM northwind.order_details od
INNER JOIN northwind.products p ON od.product_id = p.product_id
GROUP BY p.product_id, p.product_name
ORDER BY total_quantity DESC, p.product_name
LIMIT 10;
"""

df = pd.read_sql(text(query), engine)
print("Top 10 Most Popular Products (by quantity sold)\n")
df

Top 10 Most Popular Products (by quantity sold)



Unnamed: 0,product_id,product_name,total_quantity
0,5,Chef Anton's Gumbo Mix,40
1,1,Chai,12
2,2,Chang,10
3,4,Chef Anton's Cajun Seasoning,9
4,3,Aniseed Syrup,5


### Exercise 5.5: Sales by country

Calculate total sales for each country, showing only countries with total sales over 10000.

**Hint:** Use customer country, join to orders and order_details, sum sales, use HAVING

In [24]:
# Exercise 5.5: Sales by country (total sales > 10000)
query = """
SELECT 
  c.country,
  SUM(od.quantity * od.unit_price) AS total_sales
FROM northwind.customers c
INNER JOIN northwind.orders o      ON o.customer_id = c.customer_id
INNER JOIN northwind.order_details od ON od.order_id = o.order_id
GROUP BY c.country
HAVING SUM(od.quantity * od.unit_price) > 10000
ORDER BY total_sales DESC, c.country;
"""

df = pd.read_sql(text(query), engine)
print("Countries with Total Sales > $10,000\n")
df

Countries with Total Sales > $10,000



Unnamed: 0,country,total_sales
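
No country reaches 10000 in this small sample, so the result is empty. To experiment with other cutoffs, the threshold in `HAVING` can be supplied as a bound parameter. The sketch below uses a hypothetical flattened table, `order_lines`, in SQLite, with quantities and prices taken from the sample rows shown in Exercise 3.4; `countries_over` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (country TEXT, quantity INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [
        ("Germany", 12, 18.0),
        ("Germany", 10, 19.0),
        ("Mexico", 5, 10.0),
        ("Mexico", 9, 22.0),
        ("UK", 40, 21.35),
    ],
)


def countries_over(threshold):
    """Return countries whose summed line totals exceed `threshold`, highest first."""
    rows = conn.execute(
        """
        SELECT country, SUM(quantity * unit_price) AS total_sales
        FROM order_lines
        GROUP BY country
        HAVING SUM(quantity * unit_price) > ?
        ORDER BY total_sales DESC, country
        """,
        (threshold,),
    ).fetchall()
    return [country for country, _ in rows]


over_10000 = countries_over(10000)
over_300 = countries_over(300)
print(over_10000)  # []
print(over_300)    # ['UK', 'Germany']
```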


## Summary

Great work! You've practiced:
- ✓ Basic SELECT statements with specific columns
- ✓ Filtering data with WHERE, LIKE, IN, and BETWEEN
- ✓ INNER JOINs to combine related tables
- ✓ Multiple JOINs across 3-4 tables
- ✓ Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
- ✓ GROUP BY for summarizing data
- ✓ HAVING clause for filtering grouped results
- ✓ Complex queries combining multiple concepts

These are essential SQL skills for data analysis and database work!

---

## Submission Checklist
- [ ] All code cells run without errors
- [ ] Query results make sense for each question
- [ ] Name and date filled in at the top
- [ ] Notebook saved

**Submit this completed notebook to your instructor.**