# SQL for Data Engineers: Hands-On Exercise

Objective: By the end of this hands-on exercise, students will understand how to design a simple data model and implement a basic data warehouse schema using Python and SQL. This exercise is tailored for beginners with no prior coding experience and will cover fundamental concepts essential for data engineering tasks.

## Exercise 1: Designing a Data Model

In this exercise, we will design a conceptual data model for a fictional `online retail store`. The data model will serve as the blueprint for our database structure, outlining how data is organized and related within the system.

### Task 1: Identifying Entities and Relationships
Description: Identify the main entities involved in the online retail store and understand how they interact with each other.

**Instructions:** Review the entities below, then the relationships that connect them.

Entities:

- Customers: Represent individuals who purchase products from the store.
- Orders: Represent transactions made by customers.
- Products: Represent items available for purchase.
- Categories: Represent groupings or classifications of products.

Relationships:

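- A Customer can place multiple Orders; each Order belongs to one Customer.
- An Order can contain multiple Products, and a Product can appear in multiple Orders (a many-to-many relationship).
- A Product belongs to one Category; a Category can group multiple Products.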

Example:

Customer John Doe places an order containing 2 items: a Laptop and a Wireless Mouse. The Laptop belongs to the Electronics category, and the Wireless Mouse belongs to the Accessories category.
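
The example above can be sketched in plain Python to make the relationships concrete. This is only an illustration, not part of the database schema: the class and field names are assumptions chosen to mirror the entity names, and each class carries only the fields needed to show how the entities link together.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str


@dataclass
class Product:
    name: str
    category: Category  # a Product belongs to one Category


@dataclass
class Order:
    # an Order can contain multiple Products
    products: list[Product] = field(default_factory=list)


@dataclass
class Customer:
    name: str
    # a Customer can place multiple Orders
    orders: list[Order] = field(default_factory=list)


electronics = Category("Electronics")
accessories = Category("Accessories")
laptop = Product("Laptop", electronics)
mouse = Product("Wireless Mouse", accessories)

john = Customer("John Doe")
john.orders.append(Order(products=[laptop, mouse]))

# Walk from the customer to each product and its category.
for product in john.orders[0].products:
    print(f"{product.name} -> {product.category.name}")
```

Following the references from `john` to his order, its products, and their categories is the same path a SQL join takes through the tables later in this exercise.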

### Task 2: Defining Attributes for Each Entity
Description: Define the specific pieces of information (attributes) that need to be stored for each entity.

**Instructions:**

- Ensure that each entity has a primary key (`*_id`) that uniquely identifies each record.
- Define appropriate data types for each attribute when implementing in SQL (e.g., integer, varchar, date, decimal).
- Identify and establish foreign key relationships between entities where applicable.

Attributes:

- for "Customers":
	- customer_id (unique identifier)
	- first_name
	- last_name
	- email
	- phone_number
	- address
	- city
	- state
	- zip_code
	- registration_date

- for "Orders":
	- order_id (unique identifier)
	- customer_id (foreign key referencing Customers)
	- order_date
	- total_amount
	- shipping_address
	- shipping_city
	- shipping_state
	- shipping_zip_code
	- status (e.g., Pending, Shipped, Delivered, Cancelled)

- for "Products":
	- product_id (unique identifier)
	- category_id (foreign key referencing Categories)
	- product_name
	- description
	- price
	- stock_quantity

- for "Categories":
	- category_id (unique identifier)
	- category_name
	- category_description


### Task 3: Creating an Entity-Relationship (ER) Diagram
Description: Visualize the data model by creating an ER diagram that illustrates entities, attributes, and relationships.

TODO: Add ER diagram

## Exercise 2: Implementing the Data Model using SQL
In this exercise, we will translate the conceptual data model into a physical database schema using SQL. We will create tables, define relationships, and populate the database with sample data.

### Task 4: Setting Up the Database Environment
Description: Set up a SQL database environment where you can execute SQL commands.

**Instructions:** Deploy PostgreSQL with the following `docker compose` command:

In [None]:
%% cli

docker compose \
	-f week1/sql/postgres_docker_compose.yml \
	--project-name de_course \
	up -d

### Task 5: Creating Tables with SQL
Description: Write SQL statements to create tables for each entity, including defining primary keys, foreign keys, and appropriate data types.

**Instructions:**

1. Create the Tables for Customers, Orders, Products, and Categories using the following SQL statements:


In [None]:
%% SQL

CREATE TABLE Categories (
    category_id SERIAL PRIMARY KEY,
    category_name VARCHAR(100) NOT NULL,
    category_description TEXT
);

CREATE TABLE Products (
    product_id SERIAL PRIMARY KEY,
    category_id INTEGER NOT NULL,
    product_name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INTEGER NOT NULL,
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

CREATE TABLE Customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    phone_number VARCHAR(20),
    address VARCHAR(200),
    city VARCHAR(50),
    state VARCHAR(50),
    zip_code VARCHAR(10),
    registration_date DATE DEFAULT CURRENT_DATE
);

CREATE TABLE Orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date DATE DEFAULT CURRENT_DATE,
    total_amount DECIMAL(10, 2) NOT NULL,
    shipping_address VARCHAR(200),
    shipping_city VARCHAR(50),
    shipping_state VARCHAR(50),
    shipping_zip_code VARCHAR(10),
    status VARCHAR(20) DEFAULT 'Pending',
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

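2. Create the OrderItems Table: This table represents the many-to-many relationship between Orders and Products. Each row links one order to one product and records the quantity ordered and the unit price charged.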

In [None]:
%% SQL

CREATE TABLE OrderItems (
    order_item_id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Explanation of SQL Commands:

- CREATE TABLE: Defines a new table in the database.
- PRIMARY KEY: Specifies the primary key for the table, which uniquely identifies each record.
- SERIAL: PostgreSQL shorthand for an auto-incrementing integer column; each new record receives the next value from a sequence when no value is supplied.
- FOREIGN KEY: Links a column to the primary key of another table and rejects values that have no matching record there.
- NOT NULL: Ensures that a column cannot have a NULL value.
- DEFAULT: Sets a default value for a column if no value is provided during insertion.
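
To see these constraints in action without a running PostgreSQL container, the sketch below uses Python's built-in `sqlite3` module as a stand-in. It rests on two assumptions: the tables are trimmed-down versions of Customers and Orders, and `INTEGER PRIMARY KEY` replaces `SERIAL`, which SQLite does not support. The `try_insert` helper is hypothetical and exists only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Trimmed-down versions of the Customers and Orders tables.
conn.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,   -- SQLite stand-in for SERIAL PRIMARY KEY
    first_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    status VARCHAR(20) DEFAULT 'Pending',
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
""")

conn.execute("INSERT INTO Customers (first_name, email) VALUES ('John', 'john.doe@example.com')")
conn.execute("INSERT INTO Orders (customer_id, total_amount) VALUES (1, 789.98)")

# The key was generated automatically and DEFAULT filled in the status.
print(conn.execute("SELECT order_id, status FROM Orders").fetchall())


def try_insert(sql):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = {
    # first_name is missing, so NOT NULL should reject the row
    "NOT NULL": try_insert("INSERT INTO Customers (email) VALUES ('no.name@example.com')"),
    # customer 99 does not exist, so FOREIGN KEY should reject the row
    "FOREIGN KEY": try_insert("INSERT INTO Orders (customer_id, total_amount) VALUES (99, 10.00)"),
}
for constraint, outcome in results.items():
    print(constraint, "->", outcome)
```

PostgreSQL reports the same violations with different error messages, so treat the exact wording printed here as specific to SQLite.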


### Task 6: Inserting Sample Data into Tables
Description: Populate the tables with sample data to enable testing and querying.

**Instructions:**
1. Insert Data into Categories, Products, and Customers Tables:

In [None]:
%% SQL

INSERT INTO Categories (category_name, category_description)
VALUES
	('Electronics', 'Devices and gadgets such as phones, laptops, and tablets.'),
	('Home Appliances', 'Appliances for household use like refrigerators and microwaves.'),
	('Books', 'Various genres of books and literature.'),
	('Clothing', 'Apparel for men, women, and children.')
;

INSERT INTO Products (category_id, product_name, description, price, stock_quantity)
VALUES
	(1, 'Smartphone', 'Latest model smartphone with advanced features.', 699.99, 50),
	(1, 'Laptop', 'High-performance laptop suitable for gaming and work.', 1199.99, 30),
	(2, 'Microwave Oven', '800W microwave oven with multiple settings.', 89.99, 100),
	(3, 'Science Fiction Novel', 'A thrilling journey through space and time.', 15.99, 200),
	(4, 'Mens T-Shirt', '100% cotton t-shirt available in various sizes.', 9.99, 150)
;

INSERT INTO Customers (first_name, last_name, email, phone_number, address, city, state, zip_code)
VALUES
	('John', 'Doe', 'john.doe@example.com', '123-456-7890', '123 Elm Street', 'Springfield', 'IL', '62704'),
	('Jane', 'Smith', 'jane.smith@example.com', '987-654-3210', '456 Oak Avenue', 'Metropolis', 'NY', '10001'),
	('Alice', 'Johnson', 'alice.johnson@example.com', '555-123-4567', '789 Pine Road', 'Gotham', 'CA', '90001')
;

2. Insert Data into Orders and OrderItems Tables:

In [None]:
%% SQL

-- Order 1
INSERT INTO Orders (customer_id, total_amount, shipping_address, shipping_city, shipping_state, shipping_zip_code, status)
VALUES
	(1, 789.98, '123 Elm Street', 'Springfield', 'IL', '62704', 'Shipped')
;

INSERT INTO OrderItems (order_id, product_id, quantity, unit_price)
VALUES
	(1, 1, 1, 699.99),
	(1, 3, 1, 89.99)
;

-- Order 2
INSERT INTO Orders (customer_id, total_amount, shipping_address, shipping_city, shipping_state, shipping_zip_code, status)
VALUES
	(2, 25.98, '456 Oak Avenue', 'Metropolis', 'NY', '10001', 'Delivered')
;

INSERT INTO OrderItems (order_id, product_id, quantity, unit_price)
VALUES
	(2, 4, 2, 12.99)
;

-- Order 3
INSERT INTO Orders (customer_id, total_amount, shipping_address, shipping_city, shipping_state, shipping_zip_code, status)
VALUES
	(3, 1209.98, '789 Pine Road', 'Gotham', 'CA', '90001', 'Pending')
;

INSERT INTO OrderItems (order_id, product_id, quantity, unit_price)
VALUES
	(3, 2, 1, 1199.99),
	(3, 5, 1, 9.99)
;
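
Since `total_amount` is stored separately from the line items, it is worth checking that the two agree. The sketch below recomputes each order total from the `quantity` and `unit_price` values in the INSERT statements above. It uses Python's `Decimal` type, which, like `DECIMAL(10, 2)`, avoids binary floating-point rounding; the variable names are illustrative.

```python
from decimal import Decimal

# order_id -> total_amount, copied from the Orders inserts.
orders = {1: Decimal("789.98"), 2: Decimal("25.98"), 3: Decimal("1209.98")}

# (order_id, quantity, unit_price), copied from the OrderItems inserts.
order_items = [
    (1, 1, Decimal("699.99")),
    (1, 1, Decimal("89.99")),
    (2, 2, Decimal("12.99")),
    (3, 1, Decimal("1199.99")),
    (3, 1, Decimal("9.99")),
]

# Sum quantity * unit_price per order.
computed = {}
for order_id, quantity, unit_price in order_items:
    computed[order_id] = computed.get(order_id, Decimal("0")) + quantity * unit_price

for order_id, total_amount in orders.items():
    status = "OK" if computed[order_id] == total_amount else "MISMATCH"
    print(order_id, total_amount, computed[order_id], status)
```

In the database itself, the same check can be written as a `GROUP BY` over OrderItems joined to Orders.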

### Task 7: Performing Basic SQL Queries
Description: Write and execute SQL queries to retrieve and manipulate data from the database.

**Explanation of SQL Commands:**

- **SELECT**: Retrieves data from one or more tables.
- **JOIN**: Combines rows from two or more tables based on related columns.
- **WHERE**: Filters records based on specified conditions.
- **GROUP BY**: Aggregates data across rows that share common values.
- **ORDER BY**: Sorts the result set in ascending or descending order.
- **UPDATE**: Modifies existing records in a table.
- **DELETE**: Removes records from a table.


**Instructions:**
1. Retrieve All Customers:

In [None]:
%% SQL

SELECT * FROM Customers;

2. Retrieve Orders with Customer Information:

In [None]:
%% SQL

SELECT 
    Orders.order_id,
    Orders.order_date,
    Orders.total_amount,
    Customers.first_name,
    Customers.last_name,
    Orders.status
FROM Orders
JOIN Customers ON Orders.customer_id = Customers.customer_id
;

3. Retrieve Order Details Including Products:

In [None]:
%% SQL

SELECT 
    Orders.order_id,
    Customers.first_name || ' ' || Customers.last_name AS customer_name,
    Products.product_name,
    OrderItems.quantity,
    OrderItems.unit_price,
    (OrderItems.quantity * OrderItems.unit_price) AS total_price
FROM OrderItems
JOIN Orders ON OrderItems.order_id = Orders.order_id
JOIN Customers ON Orders.customer_id = Customers.customer_id
JOIN Products ON OrderItems.product_id = Products.product_id
;

4. Find Products with Low Stock (Less than 50 units):

In [None]:
%% SQL

SELECT 
    product_id,
    product_name,
    stock_quantity
FROM Products
WHERE stock_quantity < 50
;

5. Calculate Total Sales per Product:

In [None]:
%% SQL

SELECT 
    Products.product_name,
    SUM(OrderItems.quantity) AS total_units_sold,
    SUM(OrderItems.quantity * OrderItems.unit_price) AS total_sales
FROM OrderItems
JOIN Products ON OrderItems.product_id = Products.product_id
GROUP BY Products.product_name
ORDER BY total_sales DESC
;

6. List Customers with Their Total Orders and Amount Spent:

In [None]:
%% SQL

SELECT 
    Customers.first_name || ' ' || Customers.last_name AS customer_name,
    COUNT(Orders.order_id) AS total_orders,
    SUM(Orders.total_amount) AS total_amount_spent
FROM Customers
LEFT JOIN Orders ON Customers.customer_id = Orders.customer_id
GROUP BY Customers.customer_id
ORDER BY total_amount_spent DESC
;
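
The `LEFT JOIN` in this query matters when a customer has no orders: the customer still appears, with an order count of 0 and a `NULL` sum. The sketch below shows that behaviour using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The customer Bob Brown is hypothetical and added only to show the no-orders case, the tables are reduced to the columns the query needs, and `COALESCE` is one possible way to display 0 instead of `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total_amount REAL);
INSERT INTO Customers VALUES (1, 'John', 'Doe'), (2, 'Jane', 'Smith'), (4, 'Bob', 'Brown');
INSERT INTO Orders VALUES (1, 1, 789.98), (2, 2, 25.98);
""")

rows = conn.execute("""
SELECT
    Customers.first_name || ' ' || Customers.last_name AS customer_name,
    COUNT(Orders.order_id) AS total_orders,                    -- counts only matched orders
    COALESCE(SUM(Orders.total_amount), 0) AS total_amount_spent -- NULL becomes 0
FROM Customers
LEFT JOIN Orders ON Customers.customer_id = Orders.customer_id
GROUP BY Customers.customer_id
ORDER BY total_amount_spent DESC
""").fetchall()

for row in rows:
    print(row)
```

With a plain `JOIN`, the customer without orders would be dropped from the result entirely.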

7. Update Order Status:

In [None]:
%% SQL

UPDATE Orders
SET status = 'Delivered'
WHERE order_id = 3
;

8. Delete a Product from the Catalog:

In [None]:
%% SQL

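-- OrderItems still references product 5 (Order 3), so remove those rows first;
-- otherwise the foreign key constraint rejects the delete.
DELETE FROM OrderItems WHERE product_id = 5;
DELETE FROM Products
WHERE product_id = 5
;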
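
Because OrderItems references Products through a foreign key, a product that still appears in an order cannot be deleted until the referencing rows are gone. The sketch below reproduces this with Python's built-in `sqlite3` as a stand-in for PostgreSQL. The tables are reduced to the columns involved, and the error text is SQLite's, not PostgreSQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL);
CREATE TABLE OrderItems (
    order_item_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
INSERT INTO Products VALUES (5, 'Mens T-Shirt');
INSERT INTO OrderItems (product_id) VALUES (5);
""")

# First attempt: the product is still referenced by an order item.
try:
    conn.execute("DELETE FROM Products WHERE product_id = 5")
    first_attempt = "deleted"
except sqlite3.IntegrityError as exc:
    first_attempt = f"rejected: {exc}"
print(first_attempt)

# Removing the referencing rows first lets the delete go through.
conn.execute("DELETE FROM OrderItems WHERE product_id = 5")
conn.execute("DELETE FROM Products WHERE product_id = 5")
remaining = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
print(remaining)
```

An alternative design, not used in this exercise, is to declare the foreign key with `ON DELETE CASCADE` so that dependent rows are removed automatically.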

----------------------------------------------------------------------------------------------------------------------------

## Exercise 3: NoSQL for Data Engineers
In this exercise, we will explore NoSQL databases and how they differ from traditional relational databases. We will use MongoDB as an example of a document-oriented NoSQL database and perform basic CRUD operations.
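
One way to picture the difference is the shape of a single order. A relational database spreads it across flat rows linked by IDs, while a document database can store it as one nested record. The sketch below uses plain Python dictionaries as an illustration only. The field names follow this exercise's naming, and the `Items` field is an assumption, since the collections in this exercise mirror the relational tables and do not embed line items.

```python
import json

# Relational shape: flat rows in separate tables, linked by IDs.
order_row = {"OrderID": 1, "CustomerID": 1}
order_detail_rows = [
    {"OrderID": 1, "ProductID": 1, "Quantity": 1, "UnitPrice": 799.99},
    {"OrderID": 1, "ProductID": 3, "Quantity": 1, "UnitPrice": 19.99},
]

# Document shape: one self-contained record with the line items nested inside it.
order_document = {
    **order_row,
    "Items": [
        # The nested items no longer need OrderID; the parent document supplies it.
        {key: value for key, value in row.items() if key != "OrderID"}
        for row in order_detail_rows
        if row["OrderID"] == order_row["OrderID"]
    ],
}

print(json.dumps(order_document, indent=2))
```

Reading the nested document needs no join, at the cost of duplicating data when the same details are embedded in several places.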

### Task 1: Access the MongoDB Shell
Description: Connect to the MongoDB instance using the MongoDB shell from within the container.

Commands:

In [None]:
%% cli
docker exec -it mongodb mongosh

### Task 2: Create the Database and Collections
Description: Create a database named OnlineRetailStore and collections for Customers, Orders, Products, Categories, and OrderDetails.

MongoDB Shell Commands:

In [None]:
%% MongoDB

// Switch to the OnlineRetailStore database
use OnlineRetailStore;

// Create Collections
db.createCollection("Customers");
db.createCollection("Orders");
db.createCollection("Products");
db.createCollection("Categories");
db.createCollection("OrderDetails");

// Verify Collections
show collections;

### Task 3: Insert Sample Data into Collections
Description: Populate each collection with sample data to represent entities like Customers, Orders, Products, and Categories.

MongoDB Shell Commands:

In [None]:
%% MongoDB

// Insert into Customers Collection
db.Customers.insertMany([
  { CustomerID: 1, FirstName: "John", LastName: "Doe", Email: "john.doe@example.com", Phone: "123-456-7890", Address: "123 Maple St", City: "Springfield", Country: "USA" },
  { CustomerID: 2, FirstName: "Jane", LastName: "Smith", Email: "jane.smith@example.com", Phone: "987-654-3210", Address: "456 Oak St", City: "Shelbyville", Country: "USA" }
]);

// Insert into Categories Collection
db.Categories.insertMany([
  { CategoryID: 1, CategoryName: "Electronics", Description: "Devices and gadgets" },
  { CategoryID: 2, CategoryName: "Books", Description: "Fiction and non-fiction books" }
]);

// Insert into Products Collection
db.Products.insertMany([
  { ProductID: 1, ProductName: "Laptop", Description: "14-inch laptop with 8GB RAM", Price: 799.99, CategoryID: 1 },
  { ProductID: 2, ProductName: "Smartphone", Description: "5G enabled smartphone", Price: 699.99, CategoryID: 1 },
  { ProductID: 3, ProductName: "Novel", Description: "Bestselling novel", Price: 19.99, CategoryID: 2 }
]);

// Insert into Orders Collection
db.Orders.insertMany([
  { OrderID: 1, OrderDate: new Date("2024-08-15"), CustomerID: 1, TotalAmount: 819.98 },
  { OrderID: 2, OrderDate: new Date("2024-08-16"), CustomerID: 2, TotalAmount: 719.98 }
]);

// Insert into OrderDetails Collection
db.OrderDetails.insertMany([
  { OrderDetailID: 1, OrderID: 1, ProductID: 1, Quantity: 1, UnitPrice: 799.99 },
  { OrderDetailID: 2, OrderID: 1, ProductID: 3, Quantity: 1, UnitPrice: 19.99 },
  { OrderDetailID: 3, OrderID: 2, ProductID: 2, Quantity: 1, UnitPrice: 699.99 },
  { OrderDetailID: 4, OrderID: 2, ProductID: 3, Quantity: 1, UnitPrice: 19.99 }
]);


### Task 4: Read Operations
Description: Perform read operations to retrieve data from the collections.

MongoDB Shell Commands:

In [None]:
%% MongoDB

// Find All Customers
db.Customers.find().pretty();

// Find Orders for a Specific Customer
db.Orders.find({ CustomerID: 1 }).pretty();

// Find Products in a Specific Category
db.Products.find({ CategoryID: 1 }).pretty();

### Task 5: Update Operations
Description: Update specific documents in the collections.

MongoDB Shell Commands:

In [None]:
%% MongoDB

// Update Customer's Phone Number
db.Customers.updateOne(
  { CustomerID: 1 },
  { $set: { Phone: "111-222-3333" } }
);

// Increase the Price of Every Product by 10%
db.Products.updateMany(
  {},
  { $mul: { Price: 1.10 } }
);
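
To show what `$set` and `$mul` do to a document, the sketch below emulates both operators on plain Python dictionaries. The `apply_update` helper is hypothetical, written only for this illustration. It is not how MongoDB implements updates and it does not talk to a database.

```python
def apply_update(document, update):
    """Return a copy of `document` with $set and $mul applied (illustrative helper)."""
    updated = dict(document)
    for field_name, value in update.get("$set", {}).items():
        updated[field_name] = value  # $set overwrites or creates the field
    for field_name, factor in update.get("$mul", {}).items():
        # $mul multiplies the current value; a missing field is treated as 0
        updated[field_name] = updated.get(field_name, 0) * factor
    return updated


customer = {"CustomerID": 1, "Phone": "123-456-7890"}
products = [
    {"ProductID": 1, "Price": 799.99},
    {"ProductID": 2, "Price": 699.99},
    {"ProductID": 3, "Price": 19.99},
]

customer = apply_update(customer, {"$set": {"Phone": "111-222-3333"}})
products = [apply_update(p, {"$mul": {"Price": 1.10}}) for p in products]

print(customer["Phone"])
for p in products:
    # Floating-point multiplication can leave more than two decimals, so round for display.
    print(p["ProductID"], p["Price"], round(p["Price"], 2))
```

Because the prices are floating-point numbers, the multiplied values may not land on exact cents, which is one reason to round prices when displaying them.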

### Task 6: Delete Operations
Description: Delete specific documents from the collections.

MongoDB Shell Commands:

In [None]:
%% MongoDB

// Delete a Customer by ID
db.Customers.deleteOne({ CustomerID: 2 });

// Delete All Products in a Specific Category
db.Products.deleteMany({ CategoryID: 2 });

----------------------------------------------------------------------------------------------------------------------------