## Overview of Data Model

We will use the retail data model in this section. It contains 6 tables.
* Table list
  * orders
  * order_items
  * products
  * categories
  * departments
  * customers

* **orders** and **order_items** are transactional tables.

* **products**, **categories** and **departments** are non-transactional tables that hold the product catalog.

* **customers** is a non-transactional table that holds customer details.

* There is a one-to-many relationship between **orders** and **order_items**.

* There is a one-to-many relationship between **products** and **order_items**. Each order item refers to exactly one product, and a product can appear in many order items.

* There is a one-to-many relationship between **customers** and **orders**. A customer can place many orders over time, but each order belongs to exactly one customer.

* There is a one-to-many relationship between **departments** and **categories**, and another between **categories** and **products**.

* There is a hierarchical relationship from departments to products: **departments** -> **categories** -> **products**
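
The relationships listed above can be expressed as foreign keys. The following is a minimal sketch, not the course's actual DDL: it uses Python's built-in `sqlite3` module as an in-memory stand-in for PostgreSQL, keeps only key columns, and assumes the column names `department_id`, `category_id`, `category_department_id` and `customer_id`, which this section does not list. The helper `parent_tables` is hypothetical and exists only for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Trimmed-down tables: key columns only. Columns of departments, categories
# and customers are assumptions, not taken from the course scripts.
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY);
CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    category_department_id INTEGER NOT NULL REFERENCES departments (department_id)
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_category_id INTEGER NOT NULL REFERENCES categories (category_id)
);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER NOT NULL REFERENCES orders (order_id),
    order_item_product_id INTEGER NOT NULL REFERENCES products (product_id)
);
""")


def parent_tables(table):
    """Return the tables that `table` references through foreign keys."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Column 2 of each row is the referenced (parent) table name.
    return sorted({row[2] for row in rows})


print(parent_tables("order_items"))  # ['orders', 'products']
print(parent_tables("products"))     # ['categories']
```

Each "many" side carries the foreign key, so **order_items** points at both **orders** and **products**, while **departments** and **customers** reference nothing.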

## Define Problem Statement – Daily Product Revenue

Let us get the daily product revenue using the retail tables.
* The day is derived from orders.order_date

* The product is derived from products.product_name

* The revenue is derived from order_items.order_item_subtotal

* We need to join all 3 tables, then group by order_date, product_id and product_name, and sum order_item_subtotal to get the revenue

* The result is the Daily Product Revenue, computed from the products, orders and order_items data sets

* We have the following fields in **orders**
  * order_id
  * order_date
  * order_customer_id
  * order_status

* We have the following fields in **order_items**
  * order_item_id
  * order_item_order_id
  * order_item_product_id
  * order_item_quantity
  * order_item_subtotal
  * order_item_product_price

* We have the following fields in **products**
  * product_id
  * product_category_id
  * product_name
  * product_description
  * product_price
  * product_image

* We have a one-to-many relationship between orders and order_items

* **orders.order_id** is the **primary key** and **order_items.order_item_order_id** is a foreign key to **orders.order_id**

* We have a one-to-many relationship between products and order_items

* **products.product_id** is the **primary key** and **order_items.order_item_product_id** is a foreign key to **products.product_id**

* By the end of this module we will have explored all the standard transformations and computed the daily product revenue using the following fields
  * **orders.order_date**
  * **order_items.order_item_product_id**
  * **products.product_name**
  * **order_items.order_item_subtotal** (aggregated using date and product_id)

* We will consider only **COMPLETE** or **CLOSED** orders

* As the same product name can appear under more than one product_id, we have to include product_id in the key by which we group the data
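
The problem statement above can be sketched as a single query. This is a hedged illustration rather than the module's final solution: it runs against an in-memory SQLite database (a stand-in for PostgreSQL) with a trimmed column list, and the sample rows, dates and product names are made up. Note that products 10 and 11 share a name, which is why `order_item_product_id` is part of the grouping key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database

# Trimmed-down tables with made-up sample rows (not the course data).
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT,
    order_customer_id INTEGER,
    order_status TEXT
);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_item_order_id INTEGER,
    order_item_product_id INTEGER,
    order_item_subtotal REAL
);
INSERT INTO orders VALUES
    (1, '2013-07-25', 100, 'COMPLETE'),
    (2, '2013-07-25', 101, 'CLOSED'),
    (3, '2013-07-25', 102, 'PENDING'),
    (4, '2013-07-26', 100, 'COMPLETE');
INSERT INTO products VALUES (10, 'Soccer Ball'), (11, 'Soccer Ball'), (12, 'Tent');
INSERT INTO order_items VALUES
    (1, 1, 10, 50.0),
    (2, 1, 12, 200.0),
    (3, 2, 10, 25.0),
    (4, 2, 11, 30.0),
    (5, 3, 12, 200.0),
    (6, 4, 12, 400.0);
""")

# Join the 3 tables, keep COMPLETE or CLOSED orders, then aggregate the
# subtotal per order_date and product.
DAILY_PRODUCT_REVENUE = """
SELECT o.order_date,
       oi.order_item_product_id,
       p.product_name,
       SUM(oi.order_item_subtotal) AS revenue
FROM orders AS o
    JOIN order_items AS oi ON o.order_id = oi.order_item_order_id
    JOIN products AS p ON p.product_id = oi.order_item_product_id
WHERE o.order_status IN ('COMPLETE', 'CLOSED')
GROUP BY o.order_date, oi.order_item_product_id, p.product_name
ORDER BY o.order_date, revenue DESC
"""

rows = conn.execute(DAILY_PRODUCT_REVENUE).fetchall()
for row in rows:
    print(row)
```

In this sample the PENDING order is excluded, and the two products named "Soccer Ball" are reported on separate rows because they have different ids.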

## Prepare Tables

Let us prepare the retail tables so that we can solve the problem statement.
* Ensure that the required database and user for the retail data exist. The following steps use `psql` to set up the required tables.

* Launch `psql` from the terminal

```shell
psql
```

* List databases

```shell
\l
```

* Switch to the database **suryakantkumar**

```shell
\c suryakantkumar
```

* List all the tables in the database **suryakantkumar**

```shell
\d
```

* Create all the related tables using the provided SQL script

```shell
\i /Users/suryakantkumar/Documents/SQL/data/retail/create_db_tables_pg.sql
```

* Insert data into all the related tables using the provided SQL script

```shell
\i /Users/suryakantkumar/Documents/SQL/data/retail/load_db_tables_pg.sql
```
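
The two `\i` steps above could also be run non-interactively. The sketch below only builds and prints the commands and does not execute them. It assumes that `psql` is on the `PATH` and that the database and script paths are the ones used above. The helper `psql_command` is hypothetical; `-d`, `-f` and `-v ON_ERROR_STOP=1` are standard `psql` options.

```python
import shlex


def psql_command(database, script_path):
    """Return the argv list that runs one SQL script and stops on the first error."""
    return ["psql", "-d", database, "-v", "ON_ERROR_STOP=1", "-f", script_path]


# Same scripts as in the interactive steps: create the tables, then load them.
scripts = [
    "/Users/suryakantkumar/Documents/SQL/data/retail/create_db_tables_pg.sql",
    "/Users/suryakantkumar/Documents/SQL/data/retail/load_db_tables_pg.sql",
]
commands = [psql_command("suryakantkumar", script) for script in scripts]

for command in commands:
    # Print a copy-pasteable shell line; to execute it from Python instead,
    # pass `command` to subprocess.run(command, check=True).
    print(shlex.join(command))
```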

In [1]:
%load_ext sql

In [2]:
%env DATABASE_URL=postgresql://suryakantkumar:None@localhost:5432/suryakantkumar

env: DATABASE_URL=postgresql://suryakantkumar:None@localhost:5432/suryakantkumar


* Check the current database

In [3]:
%sql SELECT current_database()

1 rows affected.


current_database
suryakantkumar


* Check that the tables have been created

In [4]:
%%sql

SELECT 
    *
FROM
    information_schema.tables
WHERE
    table_catalog = 'suryakantkumar'
    AND
    table_schema = 'public'
LIMIT 10

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
8 rows affected.


table_catalog,table_schema,table_name,table_type,self_referencing_column_name,reference_generation,user_defined_type_catalog,user_defined_type_schema,user_defined_type_name,is_insertable_into,is_typed,commit_action
suryakantkumar,public,departments,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,categories,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,products,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,customers,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,orders,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,order_items,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,users,BASE TABLE,,,,,,YES,NO,
suryakantkumar,public,courses,BASE TABLE,,,,,,YES,NO,


* Validate the tables by counting the records in each

In [5]:
%%sql 

SELECT 'orders' AS table_name, COUNT(*) AS rows_count FROM orders
UNION ALL
SELECT 'order_items' AS table_name, COUNT(*) AS rows_count FROM order_items
UNION ALL
SELECT 'products' AS table_name, COUNT(*) AS rows_count FROM products
UNION ALL
SELECT 'categories' AS table_name, COUNT(*) AS rows_count FROM categories
UNION ALL
SELECT 'departments' AS table_name, COUNT(*) AS rows_count FROM departments
UNION ALL
SELECT 'customers' AS table_name, COUNT(*) AS rows_count FROM customers
ORDER BY rows_count

 * postgresql://suryakantkumar:***@localhost:5432/suryakantkumar
6 rows affected.


table_name,rows_count
departments,6
categories,58
products,1345
customers,12435
orders,68883
order_items,172198
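
The validation query above repeats one `SELECT` per table, so it can be generated from a list of table names. The following is a minimal sketch under stated assumptions: `build_count_query` is a hypothetical helper, and the demonstration runs on an in-memory SQLite database with made-up rows rather than on the PostgreSQL database used in this section.

```python
import sqlite3


def build_count_query(tables):
    """Build one UNION ALL query returning (table_name, rows_count) per table."""
    selects = []
    for table in tables:
        # Table names are interpolated into the SQL text, so accept only
        # plain identifiers and never pass untrusted input here.
        if not table.isidentifier():
            raise ValueError(f"unexpected table name: {table!r}")
        selects.append(
            f"SELECT '{table}' AS table_name, COUNT(*) AS rows_count FROM {table}"
        )
    return "\nUNION ALL\n".join(selects) + "\nORDER BY rows_count"


# Tiny stand-in tables with made-up rows, just to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY);
CREATE TABLE categories (category_id INTEGER PRIMARY KEY);
INSERT INTO departments VALUES (1), (2);
INSERT INTO categories VALUES (1), (2), (3);
""")

query = build_count_query(["categories", "departments"])
counts = conn.execute(query).fetchall()
print(counts)  # [('departments', 2), ('categories', 3)]
```

The generated text has the same shape as the query in the cell above, so it could be pasted into a `%%sql` cell after checking it.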
