# Practice

You work for ShopNow, a growing e-commerce platform. The company wants to create a data warehouse to analyze business performance.

Tasks:
1. Design the data warehouse schema.
2. Build and populate the tables in PostgreSQL.
3. Create a sales data mart for analyzing revenue trends and customer behavior.

# Dataset

Tables (in raw transactional format):

**orders**
- order_id (INT): Unique identifier for the order.
- customer_id (INT): Identifier for the customer placing the order.
- order_date (DATE): Date when the order was placed.
- order_status (VARCHAR): Status of the order (e.g., "Completed", "Cancelled").
- total_amount (NUMERIC): Total value of the order.

**products**
- product_id (INT): Unique identifier for the product.
- product_name (VARCHAR): Name of the product.
- category (VARCHAR): Category to which the product belongs.
- price (NUMERIC): Price of the product.

**order_items**
- order_item_id (INT): Unique identifier for the order item.
- order_id (INT): Foreign key to the orders table.
- product_id (INT): Foreign key to the products table.
- quantity (INT): Quantity of the product in the order.
- subtotal (NUMERIC): Total price for this order item (quantity × price).

**customers**
- customer_id (INT): Unique identifier for the customer.
- first_name (VARCHAR): Customer's first name.
- last_name (VARCHAR): Customer's last name.
- email (VARCHAR): Customer's email.
- country (VARCHAR): Country of the customer.

**employees**
- employee_id (INT): Unique identifier for the employee.
- employee_name (VARCHAR): Employee's full name.
- role (VARCHAR): Role of the employee (e.g., "Sales Rep").
- region (VARCHAR): Region associated with the employee.

Here are the SQL statements to create the database and tables and to insert the sample data:

```sql

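-- Create the database, then connect to it before creating the tables
-- (in psql: \c ecommerce). Otherwise the tables land in the current database.
CREATE DATABASE ecommerce;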

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    order_status VARCHAR(50) NOT NULL,
    total_amount NUMERIC(10, 2) NOT NULL
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    category VARCHAR(50) NOT NULL,
    price NUMERIC(10, 2) NOT NULL
);

CREATE TABLE order_items (
    order_item_id INT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    subtotal NUMERIC(10, 2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    country VARCHAR(50) NOT NULL
);


CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    role VARCHAR(50) NOT NULL,
    region VARCHAR(50) NOT NULL
);


INSERT INTO orders (order_id, customer_id, order_date, order_status, total_amount)
VALUES
(1, 101, '2023-01-01', 'Completed', 200),
(2, 102, '2023-01-05', 'Completed', 300),
(3, 103, '2023-01-10', 'Cancelled', 0),
(4, 101, '2023-01-15', 'Completed', 150);

INSERT INTO products (product_id, product_name, category, price)
VALUES
(1, 'Laptop', 'Electronics', 800),
(2, 'Phone', 'Electronics', 500),
(3, 'Headphones', 'Accessories', 100),
(4, 'Notebook', 'Stationery', 20);

INSERT INTO order_items (order_item_id, order_id, product_id, quantity, subtotal)
VALUES
(1, 1, 1, 1, 800),
(2, 1, 3, 2, 200),
(3, 2, 2, 1, 500),
(4, 4, 4, 5, 100);

INSERT INTO customers (customer_id, first_name, last_name, email, country)
VALUES
(101, 'John', 'Doe', 'john.doe@example.com', 'USA'),
(102, 'Jane', 'Smith', 'jane.smith@example.com', 'UK'),
(103, 'Michael', 'Brown', 'michael.brown@example.com', 'Canada');

INSERT INTO employees (employee_id, employee_name, role, region)
VALUES
(1, 'Alice Johnson', 'Sales Rep', 'North America'),
(2, 'Bob White', 'Sales Rep', 'Europe');

```
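
Both `orders.total_amount` and `order_items.subtotal` could serve as a revenue measure, so it can help to compare them before designing the fact table. The query below is a minimal sketch of such a check, using only the tables defined above. With the sample rows, the two values do not agree for every order (order 1 has a `total_amount` of 200, while its line items sum to 1000), so the choice of measure matters.

```sql
-- Compare each order's header total with the sum of its line items.
-- LEFT JOIN keeps orders that have no items (e.g. the cancelled order).
SELECT
    o.order_id,
    o.order_status,
    o.total_amount,
    COALESCE(SUM(oi.subtotal), 0) AS items_total
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.order_id, o.order_status, o.total_amount
ORDER BY o.order_id;
```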

# Task 1

Schema Type: Star Schema

- Fact Table: fact_sales
  - Measures: Total amount, quantity.
  - Keys: Foreign keys to dimension tables (customer_id, product_id, date_id, employee_id).

- Dimension Tables:
  - dim_customers: Contains customer details.
  - dim_products: Contains product details.
  - dim_date: Contains a full calendar for date-based analysis.
  - dim_employees: Contains employee details (name, role, region).


# Task 2

## A. Create Tables in PostgreSQL

### Dimension Tables
`dim_customers`
```sql
CREATE TABLE dim_customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    country VARCHAR(50)
);

```

`dim_products`
```sql
CREATE TABLE dim_products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    category VARCHAR(50),
    price NUMERIC
);
```

`dim_date`
```sql
CREATE TABLE dim_date (
    date_id SERIAL PRIMARY KEY,
    full_date DATE UNIQUE,
    year INT,
    quarter INT,
    month INT,
    day INT,
    weekday VARCHAR(10)
);
```
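
`dim_employees`

The fact table references `dim_employees(employee_id)`, so that dimension has to exist before `fact_sales` is created. The block below is a minimal sketch that mirrors the source `employees` table; the column types are assumed to match the source. It also loads the rows here so the example stays self-contained, although the load could equally sit with the other inserts in part B.

```sql
-- Assumed structure: same columns as the source employees table.
CREATE TABLE dim_employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    role VARCHAR(50),
    region VARCHAR(50)
);

-- Copy the employee rows from the transactional table.
INSERT INTO dim_employees (employee_id, employee_name, role, region)
SELECT employee_id, employee_name, role, region
FROM employees;
```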

### Fact Table
`fact_sales`
```sql
CREATE TABLE fact_sales (
    sales_id SERIAL PRIMARY KEY,
    order_id INT,
    customer_id INT,
    product_id INT,
    date_id INT,
    employee_id INT,
    quantity INT,
    total_amount NUMERIC,
    FOREIGN KEY (customer_id) REFERENCES dim_customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES dim_products(product_id),
    FOREIGN KEY (date_id) REFERENCES dim_date(date_id),
    -- dim_employees must already exist, or this CREATE TABLE fails.
    FOREIGN KEY (employee_id) REFERENCES dim_employees(employee_id)
);
```
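
PostgreSQL does not create indexes on foreign key columns automatically, and the analysis queries join `fact_sales` to the dimensions on exactly these columns. As an optional sketch (the index names are hypothetical, and on a table this small the planner will likely ignore them), the keys could be indexed like this:

```sql
-- Optional: index the foreign key columns used in joins and filters.
CREATE INDEX idx_fact_sales_customer_id ON fact_sales (customer_id);
CREATE INDEX idx_fact_sales_product_id ON fact_sales (product_id);
CREATE INDEX idx_fact_sales_date_id ON fact_sales (date_id);
CREATE INDEX idx_fact_sales_employee_id ON fact_sales (employee_id);
```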

## B. Populate Tables

**Insert data into dim_customers**
```sql
INSERT INTO dim_customers (customer_id, first_name, last_name, email, country)
SELECT DISTINCT customer_id, first_name, last_name, email, country
FROM customers;
```

**Insert data into dim_products**
```sql
INSERT INTO dim_products (product_id, product_name, category, price)
SELECT DISTINCT product_id, product_name, category, price
FROM products;
```

**Insert data into dim_date**
```sql
INSERT INTO dim_date (full_date, year, quarter, month, day, weekday)
SELECT DISTINCT
    order_date AS full_date,
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(QUARTER FROM order_date) AS quarter,
    EXTRACT(MONTH FROM order_date) AS month,
    EXTRACT(DAY FROM order_date) AS day,
    -- 'FMDay' suppresses the blank padding that plain 'Day' adds (e.g. 'Sunday   ').
    TO_CHAR(order_date, 'FMDay') AS weekday
FROM orders;
```
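
The insert above only loads dates on which an order was placed. If `dim_date` should hold a full calendar, as described in Task 1, one option is to generate the dates with `generate_series`. The sketch below assumes a calendar covering the year 2023 (the range is an assumption, chosen to cover the sample orders) and skips dates that are already present.

```sql
-- Fill dim_date with every day of 2023 (assumed range; adjust as needed).
INSERT INTO dim_date (full_date, year, quarter, month, day, weekday)
SELECT
    d::date AS full_date,
    EXTRACT(YEAR FROM d)::int AS year,
    EXTRACT(QUARTER FROM d)::int AS quarter,
    EXTRACT(MONTH FROM d)::int AS month,
    EXTRACT(DAY FROM d)::int AS day,
    TO_CHAR(d, 'FMDay') AS weekday
FROM generate_series(
    TIMESTAMP '2023-01-01',
    TIMESTAMP '2023-12-31',
    INTERVAL '1 day'
) AS d
-- full_date is UNIQUE, so dates loaded earlier are left untouched.
ON CONFLICT (full_date) DO NOTHING;
```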

**Insert data into fact_sales**
```sql
INSERT INTO fact_sales (order_id, customer_id, product_id, date_id, quantity, total_amount)
SELECT
    o.order_id,
    o.customer_id,
    oi.product_id,
    dd.date_id,
    oi.quantity,
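    -- Use the line-item subtotal: o.total_amount is an order-level value and
    -- would be repeated (and double counted) for every item in the order.
    oi.subtotal AS total_amount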
FROM
    orders o
JOIN
    order_items oi ON o.order_id = oi.order_id
JOIN
    dim_date dd ON o.order_date = dd.full_date;
```
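
The load produces one fact row per order item, provided every order date has a matching row in `dim_date`. A quick way to confirm that no rows were lost or duplicated by the joins is to compare row counts. This is a minimal sketch of such a check:

```sql
-- Both counts should match if each order item produced exactly one fact row.
SELECT
    (SELECT COUNT(*) FROM order_items) AS source_rows,
    (SELECT COUNT(*) FROM fact_sales) AS fact_rows;
```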

# Task 3 - Create Data Mart

**Create data_mart_sales Schema**
```sql
CREATE SCHEMA data_mart_sales;
```

**Create Data Mart Table**
```sql
CREATE TABLE data_mart_sales.sales_summary (
    customer_id INT,
    order_date DATE,
    category VARCHAR(50),
    total_sales NUMERIC
);
```

**Populate the Data Mart**
```sql
INSERT INTO data_mart_sales.sales_summary (customer_id, order_date, category, total_sales)
SELECT
    f.customer_id,
    d.full_date AS order_date,
    p.category,
    SUM(f.total_amount) AS total_sales
FROM
    fact_sales f
JOIN
    dim_products p ON f.product_id = p.product_id
JOIN
    dim_date d ON f.date_id = d.date_id
GROUP BY
    f.customer_id, d.full_date, p.category;
```
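
Re-running the `INSERT` above would append a second copy of every row to `sales_summary`. One simple way to make the load repeatable is a full refresh inside a transaction, sketched below. This is only one possible approach, and it assumes the data mart is small enough to rebuild each time.

```sql
-- Full refresh: empty the table and reload it atomically.
BEGIN;

TRUNCATE data_mart_sales.sales_summary;

INSERT INTO data_mart_sales.sales_summary (customer_id, order_date, category, total_sales)
SELECT
    f.customer_id,
    d.full_date AS order_date,
    p.category,
    SUM(f.total_amount) AS total_sales
FROM fact_sales f
JOIN dim_products p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
GROUP BY f.customer_id, d.full_date, p.category;

COMMIT;
```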

# Data Analysis

## From Data Mart

### Top 5 Customers with Highest Revenue

```sql
SELECT customer_id, SUM(total_sales) AS total_revenue
FROM data_mart_sales.sales_summary
GROUP BY customer_id
ORDER BY total_revenue DESC
LIMIT 5;
```
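
The data mart stores only `customer_id`, so the result above is a list of IDs. If names are wanted in the report, the summary can be joined back to `dim_customers`. A minimal sketch:

```sql
-- Same ranking, with customer name and country from the dimension table.
SELECT
    c.customer_id,
    c.first_name || ' ' || c.last_name AS customer_name,
    c.country,
    SUM(s.total_sales) AS total_revenue
FROM data_mart_sales.sales_summary s
JOIN dim_customers c ON c.customer_id = s.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.country
ORDER BY total_revenue DESC
LIMIT 5;
```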

### Monthly Sales Trends by Category
```sql
SELECT DATE_TRUNC('month', order_date) AS month, category, SUM(total_sales) AS total_revenue
FROM data_mart_sales.sales_summary
GROUP BY DATE_TRUNC('month', order_date), category
ORDER BY month, category;
```

## From Data Warehouse

### Top-Selling Products (Last 6 Months)
```sql
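-- Note: the sample orders are dated January 2023, so the 6-month filter
-- returns no rows unless more recent data has been loaded.
SELECT p.product_name, SUM(f.quantity) AS total_quantity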
FROM fact_sales f
JOIN dim_products p ON f.product_id = p.product_id
JOIN dim_date d ON f.date_id = d.date_id
WHERE d.full_date >= CURRENT_DATE - INTERVAL '6 months'
GROUP BY p.product_name
ORDER BY total_quantity DESC;
```

### Region with Highest Revenue

```sql
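-- Note: this query needs fact_sales.employee_id to be populated. The load in
-- Task 2 leaves that column NULL, so the join returns no rows until it is filled.
SELECT e.region, SUM(f.total_amount) AS total_revenue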
FROM fact_sales f
JOIN dim_employees e ON f.employee_id = e.employee_id
GROUP BY e.region
ORDER BY total_revenue DESC;
```