# SQL 4 Data Science: Predict Project
## Bhejane Online Trading Store

### Starter Notebook

#### © Explore Data Science Academy

### Honour Code

I {**YINKA**, **YOUR AKINDELE**}, confirm - by submitting this document - that the solutions in this notebook are a result of my own work and that I abide by the EDSA honour code (https://drive.google.com/file/d/1QDCjGZJ8-FmJE3bZdIQNwnJyQKPhHZBn/view?usp=sharing).

Non-compliance with the honour code constitutes a material breach of contract.

<a id='Context'></a>
### 1. Context

The Bhejane trading store is an online retailer specialising in Covid essential items. The store has recently been struggling with the management of its database-related inventory system. Luckily for them, you've been hired as a consultant to fix the problem. 

It's time to flex your Ninja SQL skills! 
 
**Your mission, should you choose to accept it:**

A denormalised database consisting of two tables was given. The task was to transform the database so that it is in third normal form (3NF). To ensure that a consistent normalisation process is followed, an Entity Relationship Diagram (ERD) was attached as a guideline on which tables need to be produced. Once the normalised database was obtained, a set of questions had to be answered using it.

The normalisation process also gave guidance on acknowledging the client's (Bhejane Trading) use-cases for the database, and on dealing with data anomalies in SQL. 

<img src="images/Bhejane.PNG"/>

<div align="center" >
    Bhejane, or the Black Rhino. Image by <a href="https://commons.wikimedia.org/wiki/File:Black_Rhino_(Diceros_bicornis)_browsing_..._(46584052962).jpg">Wikimedia Commons</a>
</div>

<a id='Imports'></a>
### 2. Imports


In [1]:
# NOT TO BE EDITED CELL
import sqlite3
import csv
from sqlalchemy import create_engine
%load_ext sql_magic

# Load SQLite database
engine  = create_engine("sqlite:///data/bhejane.db")
%config SQL.conn_name ='engine'

<a id='Data_description'></a>
### 3. Data description

The original database consists of 2 tables. 
* Products Table
* Transactions Table

The `Products` table lists all inventory that Bhejane currently has on hand, or has had on hand historically. Items in this table can be purchased, and every sale (transaction) in 2020 is recorded in the `Transactions` table. The two tables are linked by the `Barcode` column. Any item in the `Transactions` table must therefore appear in the `Products` table. 

#### Reading the data in the spreadsheet

pip install openpyxl

In [2]:
# NOT TO BE EDITED CELL
import pandas as pd
data_description = pd.read_excel('data/Data Description.xlsx')
data_description

Unnamed: 0,Table Name,Column Name,Desciption
0,Products,Width,Width of the product once assembled
1,Products,Length,Length of the product once assembled
2,Products,Height,Height of the product once assembled
3,Products,Barcode,The unique product identifier
4,Products,Quantity,Number of goods in stock
5,Products,Brand,Product brand name relating to product company
6,Products,NavigationPath,Navigation path to specific product
7,Products,Colour,Name default colour for the product
8,Products,StockCountry,Country where the stock was bought from
9,Products,ProductDescription,Descriptive product name


<a id='Setting_up'></a>
### 4. Setting up the database

In [3]:
# NOT TO BE EDITED CELL
conn = sqlite3.connect('data/bhejane.db')
cursor = conn.cursor()

#### Creating both the Products and Transactions tables

In [4]:
%%read_sql

--NOT TO BE EDITED

DROP TABLE IF EXISTS "Products";
DROP TABLE IF EXISTS "Transactions";

CREATE TABLE "Products" (
    "Width"   REAL,
    "Length"  REAL,
    "Height"  REAL,
    "Barcode" VARCHAR(150),
    "Quantity" REAL,
    "Brand" VARCHAR(150), 
    "NavigationPath" VARCHAR(150),
    "Colour" VARCHAR(150),
    "StockCountry" VARCHAR(150),
    "ProductDescription" VARCHAR(150),
    "PackType" VARCHAR(150), 
    "Volume_litre" REAL, 
    "Warranty" VARCHAR(150), 
    "Weight_kg" REAL,
    "ItemDescription" VARCHAR(150), 
    "Price" REAL
);


CREATE TABLE "Transactions" (
    "CartID" INTEGER,
    "Barcode" VARCHAR(150), 
    "Total" REAL,
    "UserName" VARCHAR(150), 
    "InvoiceDate" DATETIME
);

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1adbbd0>

#### Load Data into Product and Transaction tables

The data engineering expert has created a brief script which can be used to extract the data from the csv files and load it into a sqlite database. 

This will be included in the project folder, so that the end-to-end processing of the data is visible, and repeatable for any additional consultants who may be brought on board at a later stage. 


#### The script provided by the data engineering expert to read csv files into sqlite database is provided below

In [5]:
# NOT TO BE EDITED
#Load data into Product table
with open('data/bhejane_covid_essentials_Products.csv','r') as fin: # `with` statement available in 2.5+
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(fin) # comma is default delimiter
    to_db = [(i['Width'],i['Length'],i['Height'], i['Barcode'], i['Quantity'], i['Brand'], i['NavigationPath'], i['Colour'], i['StockCountry'], i['ProductDescription'],i['PackType'],i['Volume_litre'],i['Warranty'],i['Weight_kg'],i["ItemDescription"],i['Price']) for i in dr]

cursor.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", to_db)
conn.commit()

with open('data/bhejane_covid_essentials_Transactions.csv','r') as fin: # `with` statement available in 2.5+
    # csv.DictReader uses first line in file for column headings by default
    dr = csv.DictReader(fin) # comma is default delimiter
    to_db = [(i['CartID'],i['Barcode'], i['Total'], i['UserName'], i['InvoiceDate']) for i in dr]

cursor.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?);", to_db)
conn.commit()
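
Because `csv.DictReader` yields every field as a string, blank cells in the CSV files reach the database as empty strings (`''`) rather than `NULL`. The sketch below shows one possible way to convert them before inserting. The helper name `blank_to_none`, the two-column `Demo` table and the sample rows are hypothetical and are not part of the provided script.

```python
import csv
import io
import sqlite3


def blank_to_none(value):
    """Return None for empty or whitespace-only CSV fields (hypothetical helper)."""
    return None if value is None or value.strip() == "" else value


# Tiny in-memory stand-in for a real CSV file (illustrative data only).
sample_csv = io.StringIO("Barcode,Colour\nA1,Grey\nA2,\n")

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE Demo (Barcode TEXT, Colour TEXT)")

rows = [
    (blank_to_none(r["Barcode"]), blank_to_none(r["Colour"]))
    for r in csv.DictReader(sample_csv)
]
demo_conn.executemany("INSERT INTO Demo VALUES (?, ?)", rows)

# Rows whose Colour was blank in the CSV are now stored as NULL.
null_colours = demo_conn.execute(
    "SELECT COUNT(*) FROM Demo WHERE Colour IS NULL"
).fetchone()[0]
print(null_colours)
```

The notebook itself keeps the empty strings and filters them later with `<> ''`, so this conversion is an optional alternative rather than a required step.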

<a id='ERD'></a>
### 5. Denormalized Database Tables

<br>

<img src="images/Denormalized_Tables.PNG" alt="Denormalized Tables" border="0">

#### 5.1 Exploring the Denormalized Tables

The data in the two tables is explored by writing SQL queries that examine properties of the dataset, that is, looking for data inconsistencies, anomalies, redundancies, etc. to guide the normalization process. 
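
As a minimal sketch of this kind of exploration, the snippet below counts rows per barcode (to spot repeated groups) and counts blank values in a column. It runs against a small in-memory table with made-up rows, so the table contents are assumptions used only for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Products (Barcode TEXT, NavigationPath TEXT, Colour TEXT)")
demo.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        ("A1", "Beauty / Makeup", "Fresco"),
        ("A1", "Beauty / Brands", "Fresco"),
        ("B2", "Fashion / Scarves", ""),
    ],
)

# Barcodes that appear on more than one row hint at redundancy.
repeated = demo.execute(
    """
    SELECT Barcode, COUNT(*) AS n_rows
    FROM Products
    GROUP BY Barcode
    HAVING COUNT(*) > 1
    """
).fetchall()

# Empty strings stand in for missing values in this sample.
blank_colours = demo.execute(
    "SELECT COUNT(*) FROM Products WHERE Colour = ''"
).fetchone()[0]
```

The same two queries can be pointed at the real `Products` table through the notebook's `conn` object.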


In [6]:
%%read_sql

SELECT *
FROM Products
LIMIT 2;

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Width,Length,Height,Barcode,Quantity,Brand,NavigationPath,Colour,StockCountry,ProductDescription,PackType,Volume_litre,Warranty,Weight_kg,ItemDescription,Price
0,,,,300507946,493.0,Hikvision,Computers & Tablets / Smart Home & Connected L...,,,Hikvision 1080P 2MP Turbo HD IR Bullet Camera,,0.0,Limited (6 months),,1 x Hikvision 1080P Bullet camera,399.0
1,,,,300507946,493.0,Hikvision,Computers & Tablets / Smart Home & Connected L...,,,Hikvision 1080P 2MP Turbo HD IR Bullet Camera,,0.0,Limited (6 months),,Manual,399.0


In [7]:
%%read_sql

SELECT *
FROM Transactions
--LIMIT 2;

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,CartID,Barcode,Total,UserName,InvoiceDate
0,102,300507946,1523.0,DIMPHO,2020-07-02 0:00:00
1,1,43859499182,149.0,Hendrik,2020-08-05 0:00:00
2,2,614143543746,99.0,Faristha,2020-07-29 0:00:00
3,179,617566827837,3464.0,Zanele,2020-04-04 0:00:00
4,136,619659097318,3301.0,Junaid,2020-08-04 0:00:00
...,...,...,...,...,...
273,190,MPTAL72849953,126.0,Shameer,2020-07-18 0:00:00
274,191,MPTAL72849955,53.0,Eathon,2020-05-14 0:00:00
275,192,MPTAL72849955,757.0,Janet,2020-06-01 0:00:00
276,113,MPTALP14631,2535.0,Melandi,2020-08-11 0:00:00


In [8]:
%%read_sql

SELECT *
FROM Products
--LIMIT 2;

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Width,Length,Height,Barcode,Quantity,Brand,NavigationPath,Colour,StockCountry,ProductDescription,PackType,Volume_litre,Warranty,Weight_kg,ItemDescription,Price
0,,,,300507946,493.0,Hikvision,Computers & Tablets / Smart Home & Connected L...,,,Hikvision 1080P 2MP Turbo HD IR Bullet Camera,,0.0,Limited (6 months),,1 x Hikvision 1080P Bullet camera,399.0
1,,,,300507946,493.0,Hikvision,Computers & Tablets / Smart Home & Connected L...,,,Hikvision 1080P 2MP Turbo HD IR Bullet Camera,,0.0,Limited (6 months),,Manual,399.0
2,,,,10325354918,467.0,ZEE,Fashion / Accessories / Scarves,Grey,,ZEE 3-in-1 Unisex Gaiter,,0.0,Limited (6 months),,,139.0
3,,,,27131187035,275.0,Estee Lauder,Beauty / Luxury Beauty / Makeup / Face / Found...,Fresco,South Africa,Estee Lauder Double Wear Stay In Place Makeup,,0.0,Non-Returnable,,,655.0
4,,,,27131187035,275.0,Estee Lauder,Beauty / Luxury Beauty / Shop By Brand / Estee...,Fresco,South Africa,Estee Lauder Double Wear Stay In Place Makeup,,0.0,Non-Returnable,,,655.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1157,,,,TAL00035388021,220.0,Angelcare,Baby & Toddler / Nappies & Changing / Changing...,Blue,South Africa,Angelcare - Nappy Bin Refill - 3 Pack,,0.0,Limited (6 months),,x 3 Nappy Bin Refills,269.0
1158,,,,TAL00035388021,220.0,Angelcare,Baby & Toddler / Nappies & Changing / Changing...,Blue,South Africa,Angelcare - Nappy Bin Refill - 3 Pack,,0.0,Limited (6 months),,x 3 Nappy Bin Refills,269.0
1159,213,198,32,TAL00035388407,130.0,Pampers,Baby & Toddler / Nappies & Changing / Wipes,White,South Africa,Pampers Complete Clean Baby Wipes - 6 x 64 - 3...,,0.0,Non-Returnable (6 months),,384 Complete Clean Wipes,159.0
1160,,,,TAL00035394505,410.0,RCT,Computers & Tablets / Smart Home & Connected L...,,South Africa,RCT 650VA Line Interactive UPS,,1072678,Limited (12 months),,1 x UPS,675.0


### 6. Normalizing the given Database tables to the 1st Normal Form (1NF)

Given the target ERD below, create new tables such that the database conforms to the 1st Normal Form.

<img src="images/1stNF_bhejane.PNG"/>


It is suggested that all tables be created before attempting to populate them with data; this will help reduce errors that might creep in due to logical dependencies.
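
One logical dependency worth knowing about: SQLite only checks `FOREIGN KEY` clauses when enforcement is switched on for the connection with `PRAGMA foreign_keys = ON`. The sketch below demonstrates the difference on two hypothetical tables (`Parent` and `Child`), which are not part of the project schema.

```python
import sqlite3

fk_demo = sqlite3.connect(":memory:")
# Make the usual default explicit so the demo behaves the same on every build.
fk_demo.execute("PRAGMA foreign_keys = OFF")
fk_demo.executescript(
    """
    CREATE TABLE Parent (Barcode TEXT PRIMARY KEY);
    CREATE TABLE Child (
        Barcode TEXT,
        FOREIGN KEY (Barcode) REFERENCES Parent (Barcode)
    );
    """
)

# With enforcement off, an orphan row is accepted silently.
fk_demo.execute("INSERT INTO Child VALUES ('missing')")
fk_demo.commit()
orphans_before = fk_demo.execute("SELECT COUNT(*) FROM Child").fetchone()[0]

# Turn enforcement on (outside a transaction); new orphan rows are now rejected.
fk_demo.execute("PRAGMA foreign_keys = ON")
try:
    fk_demo.execute("INSERT INTO Child VALUES ('also missing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

With enforcement off, the foreign keys declared in this notebook act as documentation of the intended relationships, which is why explicit orphan checks are still useful.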

#### 6.1 Create the tables required for the 1st Normal Form

The above ERD sketch was used to create the required tables.

In [9]:
%%read_sql

DROP TABLE IF EXISTS "Products_1NF";
DROP TABLE IF EXISTS "Transactions_1NF";

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c3e310>

In [10]:
%%read_sql
--#Create tables required for 1NF

CREATE TABLE "Products_1NF"(
    "Barcode" VARCHAR(150),
    "NavigationPath" VARCHAR(150),
    "ItemDescription" VARCHAR(150),
    "Colour" VARCHAR(150),
    "ProductDescription" VARCHAR(150) NOT NULL,
    "Brand" VARCHAR(150),
    "Price" REAL NOT NULL,
    "Quantity" INTEGER NOT NULL,
    "PackType" VARCHAR(150),
    "Warranty" VARCHAR(150),
    "StockCountry" VARCHAR(150),
    "Weight_kg" REAL,
    "Volume_litre" REAL,
    "Length" REAL,
    "Width" REAL,
    "Height" REAL,
    PRIMARY KEY("Barcode","NavigationPath","ItemDescription")
);

CREATE TABLE "Transactions_1NF" (
    "CartID" INTEGER NOT NULL,
    "Barcode" VARCHAR(150) NOT NULL,
    "UserName" VARCHAR(150) NOT NULL,
    "InvoiceDate" DATETIME NOT NULL,
    "Total" REAL NOT NULL,
    FOREIGN KEY ("Barcode") REFERENCES "Products_1NF" ("Barcode"),
    PRIMARY KEY("CartID", "Barcode", "UserName")
);

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c3e590>

#### 6.2 Populate the tables you have created in the above section.

Populate the tables such that the database conforms to the 1st Normal Form.

In [11]:
%%read_sql

DELETE FROM "Products_1NF";
DELETE FROM "Transactions_1NF";

Query started at 02:05:29 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c84050>

In [12]:
%%read_sql

--#Populate the 1NF tables
INSERT INTO "Products_1NF" ("Barcode","NavigationPath","ItemDescription","Colour","ProductDescription","Brand",
                            "Price","Quantity","PackType","Warranty","StockCountry","Weight_kg",
                            "Volume_litre","Length","Width","Height")
SELECT DISTINCT
    Barcode,
    NavigationPath,
    ItemDescription,
    Colour,
    ProductDescription,
    Brand,
    Price,
    Quantity,
    PackType,
    Warranty,
    StockCountry,
    Weight_kg,
    Volume_litre,
    Length,
    Width,
    Height 
FROM 
    Products;

INSERT INTO "Transactions_1NF"("CartID","Barcode","UserName","InvoiceDate","Total")
SELECT DISTINCT 
    CartID,
    Barcode,
    UserName,
    InvoiceDate,
    Total
FROM
    Transactions;
    

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c85390>

###  7. Converting the database into its 2nd Normal Form (2NF).

#### 7.1 Entity Relationship Diagram

<img src="images/2ndNF_bhejane.PNG"/>

#### 7.2 2NF Requirements

To transition from 1NF to 2NF, no column in any table may have a partial dependency on the PK of the table. This means that any table with a composite key, e.g. `PRIMARY KEY("Barcode","NavigationPath","ItemDescription")`, cannot contain columns that depend on only part of that key, such as `Barcode`, `NavigationPath`, or `ItemDescription` alone. 
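
A partial dependency can be probed with a simple grouping query. The sketch below checks whether `Price` is determined by `Barcode` alone in a small made-up table `P`. The table name and rows are assumptions for illustration, and a clean result on sample data only suggests, rather than proves, the dependency.

```python
import sqlite3

dep = sqlite3.connect(":memory:")
dep.execute(
    "CREATE TABLE P (Barcode TEXT, NavigationPath TEXT, ItemDescription TEXT, Price REAL)"
)
dep.executemany(
    "INSERT INTO P VALUES (?, ?, ?, ?)",
    [
        ("A1", "Path 1", "Manual", 399.0),
        ("A1", "Path 1", "Camera", 399.0),
        ("B2", "Path 2", "Scarf", 139.0),
    ],
)

# A barcode with more than one distinct price would show that Price needs
# more than Barcode to be determined. No such barcode means Price depends on
# Barcode alone (in this sample), i.e. a partial dependency on the composite key.
violations = dep.execute(
    """
    SELECT Barcode
    FROM P
    GROUP BY Barcode
    HAVING COUNT(DISTINCT Price) > 1
    """
).fetchall()
price_depends_on_barcode_only = len(violations) == 0
```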

In [13]:
%%read_sql

SELECT count(distinct barcode)
FROM Transactions_1NF
WHERE barcode not in (
    SELECT barcode
    FROM products_1NF
)

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,count(distinct barcode)
0,0


### <font color='turquoise'>Activity: Removing anomalous entries with a `delete` query </font>

In [14]:
%%read_sql
DELETE FROM Transactions_1NF AS T1
WHERE T1.barcode NOT IN (
    SELECT barcode
    FROM Products_1NF
)

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1ad6cd0>

In [15]:
%%read_sql
-- Should show no entries!
SELECT COUNT(DISTINCT barcode)
FROM Transactions_1NF
WHERE barcode NOT IN (
    SELECT barcode
    FROM products_1NF)

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,COUNT(DISTINCT barcode)
0,0


### <font color='turquoise'>Activity: Investigate anomalies in tables </font>

In [16]:
%%read_sql 

SELECT DISTINCT 
    ItemDescription,
    PackType,
    Warranty 
FROM
    Products
WHERE ItemDescription = '';

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ItemDescription,PackType,Warranty
0,,,Limited (6 months)
1,,,Non-Returnable
2,,,Limited (12 months)
3,,,Limited (120 months)
4,,,Limited (180 months)
5,,,Limited (18 months)
6,,,Supplier (12 months)
7,,Single,Limited (6 months)
8,,,Limited (24 months)
9,,,


### Constructing the database in second normal form

In [17]:
%%read_sql

DROP TABLE IF EXISTS "Products_2NF";
DROP TABLE IF EXISTS "Transactions_2NF";
DROP TABLE IF EXISTS "Navigation_2NF";
DROP TABLE IF EXISTS "PackageContents_2NF";
DROP TABLE IF EXISTS "Colours_2NF";

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c87fd0>

In [18]:
%%read_sql
--#Create tables required for 2NF

CREATE TABLE Products_2NF(
    "RegistryID" INTEGER NOT NULL,
    "Barcode" VARCHAR(150) NOT NULL,
    "ProductDescription" VARCHAR(150) NOT NULL,
    "Brand" VARCHAR(150),
    "Price" REAL NOT NULL,
    "Quantity" INTEGER NOT NULL,
    "StockCountry" VARCHAR(150),
    "Weight_kg" REAL,
    "Volume_litre" REAL,
    "Length" REAL,
    "Width" REAL,
    "Height" REAL,
    "PathID" INTEGER,
    "ItemID" INTEGER,
    "ColourID" INTEGER,
    PRIMARY KEY(RegistryID AUTOINCREMENT),
    CONSTRAINT fk_nav FOREIGN KEY('PathID') REFERENCES Navigation_2NF ('PathID'),
    CONSTRAINT fk_pack FOREIGN KEY('ItemID') REFERENCES PackageContents_2NF ('ItemID'),
    CONSTRAINT fk_col FOREIGN KEY('ColourID') REFERENCES Colours_2NF ('ColourID') 
);

CREATE TABLE "Transactions_2NF" (
    "CartID||Barcode||UserName" VARCHAR(150) NOT NULL,
    "CartID" INTEGER NOT NULL,
    "Barcode" VARCHAR(150) NOT NULL,
    "UserName" VARCHAR(150) NOT NULL,
    "InvoiceDate" DATETIME NOT NULL,
    "Total" REAL NOT NULL,
    PRIMARY KEY("CartID||Barcode||UserName"),
    CONSTRAINT fk_trans FOREIGN KEY('Barcode') REFERENCES Products_2NF ('Barcode')
);


CREATE TABLE "Navigation_2NF"(
    "PathID" INTEGER NOT NULL,
    "NavigationPath" VARCHAR(150),
    PRIMARY KEY("PathID" AUTOINCREMENT)
);

CREATE TABLE "Colours_2NF"(
    "ColourID" INTEGER NOT NULL,
    "Colour" VARCHAR(150),
    PRIMARY KEY("ColourID" AUTOINCREMENT)
);

CREATE TABLE "PackageContents_2NF"(
    "ItemID" INTEGER NOT NULL,
    "ItemDescription" VARCHAR(150),
    "PackType" VARCHAR(150),
    "Warranty" VARCHAR(150),
    PRIMARY KEY("ItemID" AUTOINCREMENT)
);


Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c9b210>

### _Populating_ the database in second normal form

First, consider the entries which are `null` or `= ''`. Only the relevant (non-empty) entries are inserted into the lookup tables. 
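
The difference between `null` and `''` matters for filters. As a small sketch on a hypothetical one-column table `T`, a `<> ''` condition drops both empty strings and `NULL` values (a comparison with `NULL` evaluates to `NULL`, which is not true), while `NULLIF` can be used to treat the two kinds of missing value alike.

```python
import sqlite3

nulls = sqlite3.connect(":memory:")
nulls.execute("CREATE TABLE T (Colour TEXT)")
nulls.executemany("INSERT INTO T VALUES (?)", [("Grey",), ("",), (None,)])

# `<> ''` is only true for real values: '' fails the test and NULL yields NULL.
kept = nulls.execute("SELECT Colour FROM T WHERE Colour <> ''").fetchall()

# NULLIF maps '' to NULL, so both kinds of "missing" can be counted together.
missing = nulls.execute(
    "SELECT COUNT(*) FROM T WHERE NULLIF(Colour, '') IS NULL"
).fetchone()[0]
```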

In [19]:
%%read_sql
-- #Populate the tables so that they conform to 2NF

DELETE FROM "Products_2NF";
DELETE FROM "Transactions_2NF";
DELETE FROM "Navigation_2NF";
DELETE FROM "PackageContents_2NF";
DELETE FROM "Colours_2NF";


INSERT INTO "Navigation_2NF"("NavigationPath")
SELECT DISTINCT 
    NavigationPath 
FROM
    Products_1NF;
    
INSERT INTO "PackageContents_2NF"("ItemDescription","PackType","Warranty")
SELECT DISTINCT 
    ItemDescription,
    PackType,
    Warranty 
FROM
    Products_1NF
WHERE ItemDescription <> ''
        ;
    
INSERT INTO "Colours_2NF"("Colour")
SELECT DISTINCT 
    Colour
FROM
    Products_1NF
WHERE Colour <> ''
        ;

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1ca2910>

Inserting into `Products_2NF` is a more complicated scenario. We insert from the `Products_1NF` table and use `LEFT JOIN` to look up the new IDs. The data in each of the tables that `Products_2NF` has a FK to originally came from `Products_1NF`, so every non-empty value is guaranteed to find a match; rows whose `ItemDescription` or `Colour` is empty were excluded from the lookup tables above and are kept with a `NULL` ID. Normally, however, what you would get from the FK restriction is the requirement to use _parent-table_ `LEFT JOIN` _child-table_. 
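
One case worth checking is a product whose lookup value was filtered out of the lookup table, for example a blank `Colour`. The sketch below compares `LEFT JOIN` and `INNER JOIN` on two hypothetical tables (`Products_demo` and `Colours_demo`) populated in the same way as the notebook's lookup tables.

```python
import sqlite3

j = sqlite3.connect(":memory:")
j.executescript(
    """
    CREATE TABLE Products_demo (Barcode TEXT, Colour TEXT);
    CREATE TABLE Colours_demo (ColourID INTEGER PRIMARY KEY AUTOINCREMENT, Colour TEXT);
    INSERT INTO Products_demo VALUES ('A1', 'Grey'), ('B2', '');
    INSERT INTO Colours_demo (Colour)
        SELECT DISTINCT Colour FROM Products_demo WHERE Colour <> '';
    """
)

# LEFT JOIN keeps every product; unmatched rows get a NULL ColourID.
left_rows = j.execute(
    """
    SELECT p.Barcode, c.ColourID
    FROM Products_demo AS p
    LEFT JOIN Colours_demo AS c ON p.Colour = c.Colour
    ORDER BY p.Barcode
    """
).fetchall()

# INNER JOIN would silently drop the product with the blank colour.
inner_rows = j.execute(
    """
    SELECT p.Barcode, c.ColourID
    FROM Products_demo AS p
    INNER JOIN Colours_demo AS c ON p.Colour = c.Colour
    ORDER BY p.Barcode
    """
).fetchall()
```

In this sample the `LEFT JOIN` result has two rows and the `INNER JOIN` result has one, which is the reason for preferring `LEFT JOIN` when no product should be lost.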

In [20]:
%%read_sql 

INSERT INTO "Products_2NF" ("PathID","ItemID","ColourID","Barcode","ProductDescription","Brand","Price",
                            "Quantity","StockCountry","Weight_kg","Volume_litre",
                            "Length","Width","Height")
SELECT
    PathID,
    ItemID,
    ColourID,
    Products_1NF.Barcode,
    ProductDescription,
    Brand,
    Price,
    Quantity,
    StockCountry,
    Weight_kg, 
    Volume_litre, 
    Length,
    Width,
    Height 
FROM 
    Products_1NF
LEFT JOIN Navigation_2NF ON Products_1NF.NavigationPath = Navigation_2NF.NavigationPath 
LEFT JOIN PackageContents_2NF ON Products_1NF.ItemDescription = PackageContents_2NF.ItemDescription
LEFT JOIN Colours_2NF ON Products_1NF.Colour = Colours_2NF.Colour

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1ca94d0>

Last but not least, we insert into the `Transactions_2NF` table. Recall that we created a new PK in this table, and pay attention to how we are inserting values into the table. 
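
The new PK is built by concatenating three columns with SQLite's `||` operator. The sketch below, on a hypothetical table `T` with made-up values, shows how the key is formed. It also shows that, without a separator, two different combinations can produce the same string, so a delimited variant is included as a possible alternative (the notebook itself uses the plain form).

```python
import sqlite3

k = sqlite3.connect(":memory:")
k.execute("CREATE TABLE T (CartID INTEGER, Barcode TEXT, UserName TEXT)")
k.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "23", "Ann"), (12, "3", "Ann")],
)

# Plain concatenation, as used for the notebook's key column.
plain = k.execute(
    "SELECT CartID || Barcode || UserName FROM T ORDER BY CartID"
).fetchall()

# Alternative (an assumption, not the notebook's method): add a separator.
delimited = k.execute(
    "SELECT CartID || '|' || Barcode || '|' || UserName FROM T ORDER BY CartID"
).fetchall()
```

Here both rows give the plain key `123Ann`, whereas the delimited keys `1|23|Ann` and `12|3|Ann` stay distinct.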

In [21]:
%%read_sql 

INSERT INTO "Transactions_2NF"("CartID||Barcode||UserName","CartID","Barcode","UserName","InvoiceDate","Total")
SELECT DISTINCT 
    CartID||Transactions_1NF.Barcode||UserName,
    CartID,
    Transactions_1NF.Barcode,
    UserName,
    InvoiceDate,
    Total
FROM
    Transactions_1NF
LEFT JOIN Products_2NF ON Transactions_1NF.Barcode = Products_2NF.Barcode ;

Query started at 02:05:30 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1cc4910>

<a id='Target_ERD'></a>
### 8. Convert the table into its 3rd Normal Form (3NF)

<br>

<img src="images/3rdNF_bhejane.PNG"/>


In [22]:
%%read_sql

DROP TABLE IF EXISTS "Transactions_3NF";
DROP TABLE IF EXISTS "Carts_3NF";
DROP TABLE IF EXISTS "Products_3NF";
DROP TABLE IF EXISTS "Users_3NF";
DROP TABLE IF EXISTS "Navigation_3NF";
DROP TABLE IF EXISTS "PackageContents_3NF";
DROP TABLE IF EXISTS "Colours_3NF";
DROP TABLE IF EXISTS "Brands_3NF";
DROP TABLE IF EXISTS "Locations_3NF";

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1cc7110>

In [23]:
%%read_sql

--#Create tables required for 3NF
CREATE TABLE "Products_3NF" (
    "RegistryID" INTEGER NOT NULL,
    "Barcode" VARCHAR(150) NOT NULL,
    "ProductDescription" VARCHAR(150) NOT NULL,
    "Price" REAL NOT NULL,
    "Quantity" INTEGER NOT NULL,
    "Weight_kg" REAL,
    "Volume_litre" REAL,
    "Length" REAL,
    "Width" REAL,
    "Height" REAL,
    "PathID" INTEGER,
    "ItemID" INTEGER,
    "ColourID" INTEGER,
    "BrandID" INTEGER,
    "LocationID" INTEGER,
    PRIMARY KEY (RegistryID AUTOINCREMENT),
    CONSTRAINT fk_nav FOREIGN KEY ('PathID') REFERENCES Navigation_3NF ("PathID"),
    CONSTRAINT fk_pack FOREIGN KEY ('ItemID') REFERENCES PackageContents_3NF ("ItemID"),
    CONSTRAINT fk_col FOREIGN KEY ('ColourID') REFERENCES Colours_3NF ("ColourID") ,
    FOREIGN KEY ('BrandID') REFERENCES Brands_3NF ("BrandID"),
    FOREIGN KEY ('LocationID') REFERENCES Locations_3NF ("LocationID")
);


CREATE TABLE "Carts_3NF" (
    "CartID" INTEGER NOT NULL,
    "InvoiceDate" DATETIME NOT NULL,
    "Total" REAL NOT NULL,
    PRIMARY KEY ("CartID")
);


CREATE TABLE "Users_3NF"(
    "UserID" INTEGER NOT NULL,
    UserName VARCHAR(50) NOT NULL,
    PRIMARY KEY("UserID" AUTOINCREMENT)
);

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1cc51d0>

### <font color='turquoise'>Action: proceed with the construction of the remaining tables mentioned </font>

In [24]:
%%read_sql

CREATE TABLE "Transactions_3NF" (
    "CartID||Barcode||UserName" VARCHAR(150) NOT NULL,
    "Barcode" VARCHAR(150) NOT NULL,
    "CartID" INTEGER NOT NULL,
    "UserID" INTEGER NOT NULL,
    PRIMARY KEY ("CartID||Barcode||UserName"),
    FOREIGN KEY ("Barcode") REFERENCES Products_3NF ("Barcode"),
    FOREIGN KEY ("CartID") REFERENCES Carts_3NF ("CartID") ,
    FOREIGN KEY ("UserID") REFERENCES Users_3NF ("UserID")
);
    
CREATE TABLE "Navigation_3NF" (
    "PathID" INTEGER,
    "NavigationPath" VARCHAR(150),
    PRIMARY KEY ("PathID" AUTOINCREMENT)
    
);

CREATE TABLE "PackageContents_3NF"(
    "ItemID" INTEGER NOT NULL,
    "ItemDescription" VARCHAR(150),
    "PackType" VARCHAR(150),
    "Warranty" VARCHAR(150),
    PRIMARY KEY ("ItemID" AUTOINCREMENT)
);

CREATE TABLE "Colours_3NF"(
    "ColourID" INTEGER NOT NULL,
    "Colour" VARCHAR(150),
    PRIMARY KEY ("ColourID" AUTOINCREMENT)
);

CREATE TABLE "Brands_3NF"(
    "BrandID" INTEGER NOT NULL,
    "Brand" VARCHAR(150),
    PRIMARY KEY ("BrandID" AUTOINCREMENT)
);
    
CREATE TABLE "Locations_3NF"(
    "LocationID" INTEGER NOT NULL,
    "StockCountry" VARCHAR(150),
    PRIMARY KEY ("LocationID" AUTOINCREMENT)
);

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1cabc90>

In [25]:
%%read_sql

DELETE FROM "Products_3NF";
DELETE FROM "Transactions_3NF";
DELETE FROM "Carts_3NF";
DELETE FROM "Users_3NF";
DELETE FROM "Navigation_3NF";
DELETE FROM "PackageContents_3NF";
DELETE FROM "Colours_3NF";
DELETE FROM "Brands_3NF";
DELETE FROM "Locations_3NF";

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c45390>

In [26]:
%%read_sql

--# Populate the tables so that they conform to the 3rd Normal Form
INSERT INTO "Users_3NF" ("UserName")
SELECT DISTINCT UserName FROM Transactions_2NF;


Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c37c90>

### <font color='turquoise'>Action: proceed with the insertion into `Carts_3NF`.   </font>

Take note that there are duplicate values coming from `Transactions_2NF` - retain the `distinct` combinations of `CartID`,`InvoiceDate`,`Total` only.
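
A minimal sketch of this step, using a hypothetical table `Tx` with made-up rows: it selects the distinct cart combinations and then checks that no `CartID` maps to more than one combination, since that would violate a primary key on `CartID`.

```python
import sqlite3

c = sqlite3.connect(":memory:")
c.execute("CREATE TABLE Tx (CartID INTEGER, Barcode TEXT, InvoiceDate TEXT, Total REAL)")
c.executemany(
    "INSERT INTO Tx VALUES (?, ?, ?, ?)",
    [
        (1, "A1", "2020-08-05 0:00:00", 149.0),
        (1, "B2", "2020-08-05 0:00:00", 149.0),
        (2, "A1", "2020-07-29 0:00:00", 99.0),
    ],
)

# One row per cart once the per-item duplicates are collapsed.
distinct_carts = c.execute(
    "SELECT DISTINCT CartID, InvoiceDate, Total FROM Tx ORDER BY CartID"
).fetchall()

# A CartID with more than one (InvoiceDate, Total) pair would break PRIMARY KEY (CartID).
conflicting = c.execute(
    """
    SELECT CartID
    FROM (SELECT DISTINCT CartID, InvoiceDate, Total FROM Tx)
    GROUP BY CartID
    HAVING COUNT(*) > 1
    """
).fetchall()
```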

In [27]:
%%read_sql

INSERT INTO "Carts_3NF"("CartID","InvoiceDate","Total")
SELECT DISTINCT
    CartID,
    InvoiceDate,
    Total
FROM Transactions_2NF;

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c9ba10>

### <font color='turquoise'>Action: proceed with the insertion into the tables below. </font>

As with the table above, mind the occurrences of duplicate entries. Using `SELECT DISTINCT` avoids inserting redundant rows. 


In [28]:
%%read_sql

INSERT INTO "Locations_3NF" ("StockCountry")
SELECT DISTINCT
    StockCountry
FROM
    Products_2NF
WHERE StockCountry <> ''
        ;
    

INSERT INTO "Brands_3NF" ("Brand")
SELECT DISTINCT
    Brand
FROM
    Products_2NF
WHERE Brand <> ''
        ;
    

INSERT INTO "Colours_3NF" ("Colour")
SELECT DISTINCT
    Colour
FROM
    Colours_2NF;
    

INSERT INTO "PackageContents_3NF" ("ItemDescription","PackType","Warranty")
SELECT DISTINCT
    ItemDescription,
    PackType,
    Warranty
FROM
    PackageContents_2NF;


INSERT INTO "Navigation_3NF" ("NavigationPath")
SELECT DISTINCT
    NavigationPath
FROM
    Navigation_2NF;

Query started at 02:05:31 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c45a50>

### <font color='turquoise'>Action: proceed with the insertion into `Products_3NF`. </font>
Take note of the joins required for additional data contained in other tables. 

In [29]:
%%read_sql

INSERT INTO "Products_3NF" ("PathID","ItemID","ColourID","BrandID","LocationID",
                            "Barcode","ProductDescription","Price",
                            "Quantity","Weight_kg","Volume_litre",
                            "Length","Width","Height")
SELECT
    Products_2NF.PathID,
    Products_2NF.ItemID,
    Products_2NF.ColourID,
    BrandID,
    LocationID,
    Products_2NF.Barcode,
    ProductDescription,
    Price,
    Quantity,
    Weight_kg,
    Volume_litre,
    Length,
    Width,
    Height
FROM
    Products_2NF

LEFT JOIN Navigation_3NF ON Products_2NF.PathID = Navigation_3NF.PathID
LEFT JOIN PackageContents_3NF ON Products_2NF.ItemID = PackageContents_3NF.ItemID
LEFT JOIN Colours_3NF ON Products_2NF.ColourID = Colours_3NF.ColourID
LEFT JOIN Brands_3NF ON Products_2NF.Brand = Brands_3NF.Brand
LEFT JOIN Locations_3NF ON Products_2NF.StockCountry = Locations_3NF.StockCountry


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c86150>

### <font color='turquoise'>Action: proceed with the insertion into `Transactions_3NF`.   </font>

Follow the previous procedure and checks. 

In [30]:
%%read_sql

INSERT INTO "Transactions_3NF"("CartID||Barcode||UserName","CartID","Barcode","UserID")
SELECT
    Transactions_2NF.CartID||Transactions_2NF.Barcode||Transactions_2NF.UserName,
    Transactions_2NF.CartID,
    Transactions_2NF.Barcode,
    UserID
FROM
    Transactions_2NF
LEFT JOIN Users_3NF ON Transactions_2NF.UserName = Users_3NF.UserName

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

<sql_magic.exceptions.EmptyResult at 0x22fd1c370d0>

### <font color='turquoise'>Action: Investigate the DB.   </font>

Note - the table counts below are what you should arrive at. If you don't, reconsider how you have done the inserts above. Having the tables correctly populated will allow you to answer the questions more easily, and without errors.

| Table Name | Count |
| --- | --- |
| Brands_3NF | 232 |
| Carts_3NF |  190 |
| Colours_3NF | 17 |
| Locations_3NF | 2 |
| Navigation_3NF | 396 |
| PackageContents_3NF | 600 |
| Products_3NF | 1214 |
| Transactions_3NF | 275 |
| Users_3NF | 158 |
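
Comparing actual and expected counts can also be automated from Python. The helper `table_counts` below is a hypothetical sketch, demonstrated on a throwaway in-memory table. With the real database, the notebook's `conn` object and the counts from the table above could be passed in instead.

```python
import sqlite3


def table_counts(conn, table_names):
    """Return {table_name: row_count} for the given tables (hypothetical helper)."""
    counts = {}
    for name in table_names:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Users_demo (UserName TEXT)")
demo.executemany("INSERT INTO Users_demo VALUES (?)", [("Ann",), ("Ben",)])

expected = {"Users_demo": 2}  # illustrative target, not a figure from the project
actual = table_counts(demo, expected)

# Any table whose count differs from the target shows up here as (actual, expected).
mismatches = {t: (actual[t], n) for t, n in expected.items() if actual[t] != n}
```

An empty `mismatches` dictionary means every table matched its target.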

In [31]:
%%read_sql 

-- How many entries are here in each of the tables in the database now? 
SELECT 'Products_3NF' as table_name, count(*) FROM Products_3NF
UNION
SELECT 'Transactions_3NF' as table_name, count(*) FROM Transactions_3NF
UNION
SELECT 'Users_3NF' as table_name, count(*) FROM Users_3NF
UNION
SELECT 'Navigation_3NF' as table_name, count(*) FROM Navigation_3NF
UNION
SELECT 'PackageContents_3NF' as table_name, count(*) FROM PackageContents_3NF
UNION
SELECT 'Colours_3NF' as table_name, count(*) FROM Colours_3NF
UNION
SELECT 'Brands_3NF' as table_name, count(*) FROM Brands_3NF
UNION
SELECT 'Locations_3NF' as table_name, count(*) FROM Locations_3NF
UNION
SELECT 'Carts_3NF' as table_name, count(*) FROM Carts_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,table_name,count(*)
0,Brands_3NF,0
1,Carts_3NF,0
2,Colours_3NF,0
3,Locations_3NF,0
4,Navigation_3NF,0
5,PackageContents_3NF,0
6,Products_3NF,0
7,Transactions_3NF,0
8,Users_3NF,0


Q1) How many unique products does the company have?

In [32]:
%%read_sql

SELECT *
FROM Brands_3NF
--LIMIT 5;

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,BrandID,Brand


In [33]:
%%read_sql

SELECT *
FROM Carts_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,CartID,InvoiceDate,Total


In [34]:
%%read_sql

SELECT *
FROM Colours_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ColourID,Colour


In [35]:
%%read_sql

SELECT *
FROM Locations_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,LocationID,StockCountry


In [36]:
%%read_sql

SELECT *
FROM Navigation_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,PathID,NavigationPath


In [37]:
%%read_sql

SELECT *
FROM Packagecontents_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ItemID,ItemDescription,PackType,Warranty


In [38]:
%%read_sql

SELECT *
FROM Products_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,RegistryID,Barcode,ProductDescription,Price,Quantity,Weight_kg,Volume_litre,Length,Width,Height,PathID,ItemID,ColourID,BrandID,LocationID


In [39]:
%%read_sql

SELECT *
FROM Transactions_3NF


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,CartID||Barcode||UserName,Barcode,CartID,UserID


In [40]:
%%read_sql

SELECT *
FROM Users_3NF

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,UserID,UserName


<br>
<br>
<br>

<a id='MCQ_questions'></a>
# Questions

Having completed the normalisation of the database, the following cells were used to answer specific questions.

#### How many unique products does the company have?

In [41]:
%%read_sql


SELECT COUNT (DISTINCT Barcode) AS "Number_of_Unique_products"
FROM Products_3NF

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Number_of_Unique_products
0,0


<br>

#### How many users bought from Bhejane in April 2020?

In [42]:
%%read_sql


SELECT COUNT(DISTINCT Users_3NF.UserID) AS "Number_of_users"
FROM Users_3NF
INNER JOIN Transactions_3NF
ON Users_3NF.UserID = Transactions_3NF.UserID
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
WHERE SUBSTR(CAST(Carts_3NF.InvoiceDate AS VARCHAR), 1, 7) = '2020-04'


Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Number_of_users
0,0
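
The join above yields one row per purchased item, so counting rows and counting distinct users can give different answers. The following is a minimal sketch on an in-memory SQLite database. The rows are made up for illustration and the aliases `t` and `c` are assumptions; only the table and column names come from this notebook.

```python
import sqlite3

# Hypothetical toy data; the real Bhejane tables are not reproduced here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Carts_3NF (CartID INTEGER PRIMARY KEY, InvoiceDate TEXT, Total REAL);
CREATE TABLE Transactions_3NF (Barcode TEXT, CartID INTEGER, UserID INTEGER);
INSERT INTO Carts_3NF VALUES (1, '2020-04-03 0:00:00', 250.0),
                             (2, '2020-04-17 0:00:00', 90.0),
                             (3, '2020-05-01 0:00:00', 40.0);
INSERT INTO Transactions_3NF VALUES ('A', 1, 1), ('B', 1, 1), ('C', 2, 2), ('D', 3, 2);
""")

query = """
SELECT COUNT(t.UserID)          AS row_count,   -- one per purchased item
       COUNT(DISTINCT t.UserID) AS user_count   -- one per user
FROM Transactions_3NF AS t
INNER JOIN Carts_3NF AS c ON t.CartID = c.CartID
WHERE SUBSTR(c.InvoiceDate, 1, 7) = '2020-04'
"""
row_count, user_count = conn.execute(query).fetchone()
print(row_count, user_count)
```

In this toy data, user 1 buys two items in April, so the row count exceeds the number of distinct users.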


<br>

#### How many users bought 3 or more items that cost more than R1000?

In [43]:
%%read_sql


SELECT Transactions_3NF.UserID, Carts_3NF.Total
FROM Transactions_3NF
LEFT JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
GROUP BY Carts_3NF.InvoiceDate, Transactions_3NF.UserID
HAVING COUNT(Transactions_3NF.UserID) >= 3
AND Carts_3NF.Total > 1000
ORDER BY Carts_3NF.Total DESC

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,UserID,Total


In [44]:
%%read_sql

-- Count, per user, the purchased items priced above R1000
SELECT COUNT(Products_3NF.ItemID) AS Number_of_Trans_Per_User, Products_3NF.Price
FROM Transactions_3NF
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
INNER JOIN Products_3NF
ON Transactions_3NF.Barcode = Products_3NF.Barcode
WHERE Products_3NF.Price > 1000
GROUP BY Transactions_3NF.UserID
HAVING [Number_of_Trans_Per_User] >= 3
ORDER BY Products_3NF.Price DESC

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Number_of_Trans_Per_User,Price
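
The question can be read in more than one way. The sketch below assumes one reading: count the users who bought at least three items whose individual price exceeds R1000. It runs on an in-memory SQLite database with made-up rows; the aliases and the variable `qualifying_users` are hypothetical, and only the table and column names come from this notebook.

```python
import sqlite3

# Hypothetical toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products_3NF (Barcode TEXT PRIMARY KEY, Price REAL);
CREATE TABLE Transactions_3NF (Barcode TEXT, CartID INTEGER, UserID INTEGER);
INSERT INTO Products_3NF VALUES ('A', 1500), ('B', 2000), ('C', 1200), ('D', 50);
INSERT INTO Transactions_3NF VALUES
    ('A', 1, 1), ('B', 1, 1), ('C', 2, 1),
    ('A', 3, 2), ('B', 3, 2), ('D', 3, 2);
""")

query = """
SELECT COUNT(*)
FROM (
    SELECT t.UserID
    FROM Transactions_3NF AS t
    INNER JOIN Products_3NF AS p ON t.Barcode = p.Barcode
    WHERE p.Price > 1000          -- filter rows before grouping
    GROUP BY t.UserID
    HAVING COUNT(*) >= 3          -- "3 or more" includes exactly 3
)
"""
qualifying_users = conn.execute(query).fetchone()[0]
print(qualifying_users)
```

Here user 1 has three items above R1000 and qualifies, while user 2 has only two. Filtering on price in `WHERE` before grouping ensures that cheaper items do not count towards the threshold.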


<br>

#### Which user made the largest purchase on a single transaction?

In [45]:
%%read_sql

SELECT Users_3NF.UserName, Carts_3NF.Total
FROM Users_3NF
INNER JOIN Transactions_3NF
ON Users_3NF.UserID = Transactions_3NF.UserID
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID

ORDER BY Carts_3NF.Total DESC
LIMIT 5;

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,UserName,Total


<br>

#### How many components does the product: "5m Colour Changing RGB LED Strip Light" (MPTAL57588104) come with?

In [46]:
%%read_sql


SELECT Packagecontents_3NF.ItemDescription, Products_3NF.ProductDescription
FROM Products_3NF
INNER JOIN Packagecontents_3NF
ON Packagecontents_3NF.ItemID = Products_3NF.ItemID
WHERE Products_3NF.Barcode = 'MPTAL57588104'

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ItemDescription,ProductDescription


<br>

#### How many brands are available at Bhejane?

In [47]:
%%read_sql


SELECT COUNT(DISTINCT Brand) AS "Number_of_brands_in_Bhejane"
FROM Brands_3NF

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Number_of_brands_in_Bhejane
0,0


<br>

#### What is the price of the "Verimark - Floorwiz 2in1 Mop"?

In [48]:
%%read_sql

SELECT DISTINCT Barcode, Quantity, Price
FROM Products_3NF
WHERE ProductDescription = 'Verimark - Floorwiz 2in1 Mop'

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Barcode,Quantity,Price


<br>

#### Calculate the package volume of the "Russell Hobbs - Slow Cooker" using the given dimensions

In [49]:
%%read_sql


SELECT DISTINCT ProductDescription, Length*Width*Height AS Package_Volume
FROM Products_3NF
WHERE ProductDescription = 'Russell Hobbs - Slow Cooker'

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ProductDescription,Package_Volume


<br>

#### Which user made the most transactions in the Year 2020?

In [50]:
%%read_sql


SELECT Carts_3NF.InvoiceDate, Users_3NF.UserName, COUNT(Transactions_3NF.UserID) AS Number_of_Transactions, Carts_3NF.Total
FROM Users_3NF
INNER JOIN Transactions_3NF
ON Users_3NF.UserID = Transactions_3NF.UserID
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
WHERE SUBSTR(CAST(Carts_3NF.InvoiceDate AS VARCHAR), 1, 4) = '2020'

GROUP BY Transactions_3NF.UserID
ORDER BY [Number_of_Transactions] DESC

Query started at 02:05:32 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,InvoiceDate,UserName,Number_of_Transactions,Total


<br>

#### What is the total number of users that shop at Bhejane?

In [51]:
%%read_sql


SELECT COUNT(DISTINCT UserId) AS "Total_users_that_shop_at_Bhejane"
FROM Transactions_3NF

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Total_users_that_shop_at_Bhejane
0,0


<br>

#### What is the record count for the Colours_3NF Table?

In [52]:
%%read_sql


SELECT COUNT(DISTINCT Colour)
FROM Colours_3NF
WHERE Colour IS NOT NULL

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,COUNT(DISTINCT Colour)
0,0


<br>

#### What would the total price be if I had the following items in my cart?
* MPTAL57588104
* 5000394203921
* 6932391917652

In [53]:
%%read_sql
-- Write your query here:

SELECT SUM(Price) AS "Total_price"
FROM (
    SELECT DISTINCT Barcode, Price
    FROM Products_3NF
    WHERE Barcode IN ('MPTAL57588104', '5000394203921', '6932391917652')
)

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Total_price
0,
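
`SUM(DISTINCT Price)` and a sum over distinct `(Barcode, Price)` pairs can give different totals when two different products share the same price. The sketch below illustrates this on an in-memory SQLite database. The barcodes `X`, `Y`, `Z` and their prices are made up, and the presence of duplicate barcode rows is an assumption for the sake of the example.

```python
import sqlite3

# Hypothetical toy data: barcode 'X' is duplicated, and 'Y' shares X's price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products_3NF (RegistryID INTEGER PRIMARY KEY, Barcode TEXT, Price REAL);
INSERT INTO Products_3NF (Barcode, Price)
VALUES ('X', 100.0), ('X', 100.0), ('Y', 100.0), ('Z', 50.0);
""")

# Deduplicates on price alone, so X and Y collapse into a single 100.0.
sum_distinct_price = conn.execute("""
SELECT SUM(DISTINCT Price)
FROM Products_3NF
WHERE Barcode IN ('X', 'Y', 'Z')
""").fetchone()[0]

# Deduplicates on (Barcode, Price), so each product is counted once.
sum_per_barcode = conn.execute("""
SELECT SUM(Price)
FROM (
    SELECT DISTINCT Barcode, Price
    FROM Products_3NF
    WHERE Barcode IN ('X', 'Y', 'Z')
)
""").fetchone()[0]

print(sum_distinct_price, sum_per_barcode)
```

Deduplicating on the barcode and price together keeps each product exactly once, which matches a cart containing one of each item.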


<br>

#### What is the barcode of the most sold product?

In [54]:
%%read_sql


SELECT Barcode, COUNT(Barcode) AS Number_of_Barcodes
FROM Transactions_3NF
GROUP BY Barcode
ORDER BY [Number_of_Barcodes] DESC

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,Barcode,Number_of_Barcodes


<br>

#### What products were in Cornelis’ cart on 2020-06-28 0:00:00?

In [55]:
%%read_sql


SELECT DISTINCT Products_3NF.ProductDescription
FROM Products_3NF
INNER JOIN Transactions_3NF
ON Products_3NF.Barcode = Transactions_3NF.Barcode
INNER JOIN Users_3NF
ON Transactions_3NF.UserID = Users_3NF.UserID
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
WHERE Users_3NF.UserName = 'Cornelis'
AND Carts_3NF.InvoiceDate = '2020-06-28 0:00:00'

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,ProductDescription


<br>

#### Which users bought locally produced, black-coloured products on odd-numbered months of the year, and what was the total cost (rounded to the nearest integer) of the carts containing these products?

In [56]:
%%read_sql


SELECT DISTINCT Users_3NF.UserName
FROM Users_3NF
INNER JOIN Transactions_3NF
ON Users_3NF.UserID = Transactions_3NF.UserID
INNER JOIN Carts_3NF
ON Transactions_3NF.CartID = Carts_3NF.CartID
INNER JOIN Products_3NF
ON Transactions_3NF.Barcode = Products_3NF.Barcode
INNER JOIN Colours_3NF
ON Products_3NF.ColourID = Colours_3NF.ColourID
INNER JOIN Locations_3NF
ON Products_3NF.LocationID = Locations_3NF.LocationID
WHERE LOWER(Locations_3NF.StockCountry) = 'south africa'
AND LOWER(Colours_3NF.Colour) = 'black'
AND CAST(SUBSTR(CAST(Carts_3NF.InvoiceDate AS VARCHAR), 6, 2) AS INT) % 2 = 1

Query started at 02:05:33 AM W. Central Africa Standard Time; Query executed in 0.00 m

Unnamed: 0,UserName
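
An odd-numbered month leaves a remainder of 1 when divided by 2. The sketch below checks the month-parity expression in isolation, assuming invoice dates are stored as text in the `YYYY-MM-DD h:mm:ss` form seen in the questions above. The sample dates and the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample dates in the format used by the notebook's questions.
dates = ["2020-01-15 0:00:00", "2020-06-28 0:00:00", "2020-11-02 0:00:00"]

# Characters 6-7 hold the month; the remainder mod 2 is 1 for odd months.
parity_sql = "SELECT CAST(SUBSTR(?, 6, 2) AS INT) % 2"
parities = [conn.execute(parity_sql, (d,)).fetchone()[0] for d in dates]

for d, p in zip(dates, parities):
    print(d, "odd month" if p == 1 else "even month")
```

January and November evaluate to 1 (odd) and June to 0 (even), so a filter for odd months compares the expression with 1.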
