### Project Overview: Northwind Traders Dataset Analysis

### Table of Contents

#### 1. Purpose/Objective

#### 2. Project Workflow
- Data Downloading
- Data Transformation in Excel
- Database Creation and Table Design in MySQL Workbench
- Importing CSV Files into MySQL Database
- Data Retrieval and Analysis with Jupyter Notebook
#### 3. Showcases in Data Retrieving and Analysis
- Using Having clause, Group by clause, and In operator
- Using With clause, Window functions, Case statement, Join, Union, Group_Concat, Format, and Aggregated functions



####  Purpose/Objective:

The purpose of this project is to showcase proficiency in MySQL database management, data transformation, and analysis using real-world data from the Northwind Traders dataset. 

                                           

###  Project Workflow
The Northwind dataset obtained from Kaggle comprises seven CSV files. However, for this project, I focused on six of these files.<br> 
##### 1. Data Downloading:
- Six CSV files were downloaded from the Kaggle dataset.<br>
##### 2. Data Transformation in Excel:
- Opened the six downloaded CSV files in Microsoft Excel for cleaning.<br>
- Transformed columns to ensure data accuracy and relevance.<br>
- Dropped unnecessary columns to streamline data before importation.<br>
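
The Excel steps above can also be expressed in code. The following is a minimal sketch, not the method used in this project: it assumes the `customers.csv` header shown later in this document, keeps only the columns that appear in the `Customers` table, renames them, and assigns sequential integer IDs in file order. The function name `transform_customers` and the sequential numbering rule are assumptions made for illustration.

```python
import csv
import io

# Two sample rows copied from the customers.csv preview in this document.
RAW_CUSTOMERS = """,customerID,companyName,contactName,contactTitle,city,country
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Berlin,Germany
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Mexico City,Mexico
"""


def transform_customers(raw_text):
    """Keep the needed columns, rename them, and assign integer IDs.

    Returns the transformed rows plus a code-to-ID map that can be reused
    to translate the customerID column of orders.csv.
    Assumption: IDs are assigned sequentially in file order.
    """
    rows = []
    code_to_id = {}
    reader = csv.DictReader(io.StringIO(raw_text))
    for new_id, record in enumerate(reader, start=1):
        code_to_id[record["customerID"]] = new_id
        rows.append({
            "CustomerID": new_id,
            "CustomerName": record["companyName"],
            "City": record["city"],
            "Country": record["country"],
        })
    return rows, code_to_id


customers, code_to_id = transform_customers(RAW_CUSTOMERS)
print(customers[0])
print(code_to_id)
```

The returned `code_to_id` map is what would let string codes in `orders.csv` be replaced with the integer `CustomerID` values used by the `Orders` table.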
##### 3. Database Creation and Table Design in MySQL Workbench:
- Created the Northwind database in MySQL Workbench.<br>
- Designed six tables representing the dataset.<br>
   - Categories
   - Customers
   - Employees
   - OrderDetails
   - Orders
   - Products
- Tables were designed with appropriate column names, data types, and constraints to accurately reflect the dataset structure.<br>

* The following statements were used to create the **Northwind** database and its tables in MySQL Workbench.

%%sql

CREATE DATABASE IF NOT EXISTS Northwind;
USE Northwind;


CREATE TABLE Categories
(      
    CategoryID INT PRIMARY KEY AUTO_INCREMENT,
    CategoryName VARCHAR(25),
    Description VARCHAR(255)
);

CREATE TABLE Customers
(      
    CustomerID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerName VARCHAR(50),
    City VARCHAR(30),
    Country VARCHAR(30)
);

CREATE TABLE Employee
(
    EmployeeID INT PRIMARY KEY AUTO_INCREMENT,
    employeeName VARCHAR(50),
    Title VARCHAR(50),
    City VARCHAR(30),
    Country VARCHAR(30)
);

CREATE TABLE Shippers(
    ShipperID INT PRIMARY KEY AUTO_INCREMENT,
    ShipperName VARCHAR(25)

);

CREATE TABLE Products(
    ProductID INT PRIMARY KEY AUTO_INCREMENT,
    ProductName VARCHAR(50),
    CategoryID INT,
    FOREIGN KEY (CategoryID) REFERENCES Categories (CategoryID)
);

CREATE TABLE Orders(
    OrderID INT PRIMARY KEY AUTO_INCREMENT,
    CustomerID INT,
    EmployeeID INT,
    OrderDate DATETIME,
    ShipperID INT,
    FOREIGN KEY (EmployeeID) REFERENCES Employee (EmployeeID),
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID),
    FOREIGN KEY (ShipperID) REFERENCES Shippers (ShipperID)
);

CREATE TABLE OrderDetails(
    OrderDetailID INT PRIMARY KEY AUTO_INCREMENT,
    OrderID INT,
    ProductID INT,
    unitPrice FLOAT,
    Quantity INT,
    FOREIGN KEY (OrderID) REFERENCES Orders (OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products (ProductID)
);

##### 4. Importing CSV Files into MySQL Database:
   - Imported the transformed CSV files into the Northwind database using MySQL Workbench.<br>
   - Mapped CSV columns to MySQL table columns to ensure data integrity.<br>
   - Resolved any importation challenges encountered during the process.<br>
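
The import itself was done through MySQL Workbench. As a rough illustration of what the column mapping and load order involve, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, so that it runs without a server. The `import_csv` helper and `COLUMN_MAP` are hypothetical names. Because of the foreign keys in the schema, a parent table such as `Categories` has to be populated before a child table such as `Products`.

```python
import csv
import io
import sqlite3

# Rows taken from the Products table preview in this document.
PRODUCTS_CSV = """ProductID,ProductName,CategoryID
1,Chai,1
2,Chang,1
"""

# CSV header -> (table column, converter). Hypothetical mapping for illustration.
COLUMN_MAP = {
    "ProductID": ("ProductID", int),
    "ProductName": ("ProductName", str),
    "CategoryID": ("CategoryID", int),
}


def import_csv(conn, table, csv_text, column_map):
    """Insert every CSV record into `table`, converting each mapped field."""
    columns = [column for column, _ in column_map.values()]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = [
        [convert(record[header]) for header, (_, convert) in column_map.items()]
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, ProductName TEXT, "
    "CategoryID INTEGER REFERENCES Categories (CategoryID))"
)

# Parent row first, so the foreign key on Products.CategoryID is satisfied.
conn.execute("INSERT INTO Categories VALUES (1, 'Beverages')")
imported = import_csv(conn, "Products", PRODUCTS_CSV, COLUMN_MAP)
print(imported, conn.execute("SELECT * FROM Products").fetchall())
```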


##### 5. Data Retrieval and Analysis with Jupyter Notebook:
   - Utilized Jupyter Notebook with Python libraries (pymysql, sqlalchemy) to connect to the MySQL database.<br>
   - Executed complex SQL queries to retrieve and analyze data from the Northwind database.<br>
   - Leveraged Jupyter Notebook magic commands to streamline query execution.<br>
   - Applied data analysis techniques to derive insights and generate reports.<br>
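
The connection details are not shown in this document. A minimal sketch of how a SQLAlchemy connection URL for the PyMySQL driver might be assembled is given below; the user name, password, host, and port are placeholders, and `mysql_url` is a hypothetical helper. The `%%sql` cells used throughout suggest a SQL magic extension such as ipython-sql, which is an assumption here.

```python
from urllib.parse import quote_plus


def mysql_url(user, password, host="localhost", port=3306, database="Northwind"):
    """Build a SQLAlchemy URL for the PyMySQL driver.

    The user name and password are percent-encoded so that characters
    such as '@' or '/' do not break the URL.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder credentials, for illustration only.
url = mysql_url("analyst", "p@ss/word")
print(url)

# In a notebook, a URL like this would typically be passed to
#   sqlalchemy.create_engine(url)
# or, assuming the ipython-sql extension, used as
#   %load_ext sql
#   %sql <url>
```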


**Note:** <div style='text-align:justify;'>
 The MySQL tables used in this project are shown below next to their source CSV files, so that the transformation steps and the queries are easier to follow.</div>

**Customers CSV file vs Customers Table**

In [1]:
import pandas as pd

In [7]:
Customers = pd.read_csv(r"C:\Users\tinsu\Desktop\Northwind Traders\customers.csv", encoding='latin1')
Customers.head(2)


Unnamed: 0,customerID,companyName,contactName,contactTitle,city,country
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Berlin,Germany
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Mexico City,Mexico


In [8]:
%%sql
SELECT * FROM Customers
LIMIT 2

CustomerID,CustomerName,City,Country
1,Alfreds Futterkiste,Berlin,Germany
2,Ana Trujillo Emparedados y helados,Mexico City,Mexico


**Orders CSV file vs Orders Table**

In [9]:
Orders = pd.read_csv(r"C:\Users\tinsu\Desktop\Northwind Traders\orders.csv", encoding='latin1')
Orders.head(2)

Unnamed: 0,orderID,customerID,employeeID,orderDate,requiredDate,shippedDate,shipperID,freight
0,10248,VINET,5,2013-07-04,2013-08-01,2013-07-16,3,32.38
1,10249,TOMSP,6,2013-07-05,2013-08-16,2013-07-10,1,11.61


In [10]:
%%sql
SELECT * FROM Orders
LIMIT 2

OrderID,CustomerID,EmployeeID,OrderDate,ShipperID
10248,85,5,2013-07-04 00:00:00,3
10249,79,6,2013-07-05 00:00:00,1


**order_details CSV file vs OrderDetails Table**

In [11]:
Orderdetails = pd.read_csv(r"C:\Users\tinsu\Desktop\Northwind Traders\order_details.csv", encoding='latin1')
Orderdetails.head(2)

Unnamed: 0,orderID,productID,unitPrice,quantity,discount
0,10248,11,14.0,12,0.0
1,10248,42,9.8,10,0.0


In [12]:
%%sql

SELECT * FROM OrderDetails
LIMIT 2

OrderDetailID,OrderID,ProductID,unitPrice,Quantity
1,10248,11,14.0,12
2,10248,42,9.8,10


**products CSV file vs Products Table**

In [13]:
Products = pd.read_csv(r"C:\Users\tinsu\Desktop\Northwind Traders\products.csv", encoding='latin1')
Products.head(2)

Unnamed: 0,productID,productName,quantityPerUnit,unitPrice,discontinued,categoryID
0,1,Chai,10 boxes x 20 bags,18.0,0,1
1,2,Chang,24 - 12 oz bottles,19.0,0,1


In [14]:
%%sql
SELECT * FROM Products
LIMIT 2

ProductID,ProductName,CategoryID
1,Chai,1
2,Chang,1


### SHOWCASES IN DATA RETRIEVING AND ANALYSIS

##### 1. USING HAVING CLAUSE, GROUP BY CLAUSE, AND IN OPERATOR

Display the countries that have more than 10 customers in the Customers table.

In [15]:
%%sql
 
SELECT 
    Country,
    COUNT(CustomerName) as Number_Of_Customer   
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerName) > 10
ORDER BY COUNT(CustomerName) DESC 




Country,Number_Of_Customer
USA,13
Germany,11
France,11


Display the cities, along with their countries, that have more than 3 customers in the Customers table.

In [16]:
%%sql 
SELECT 
    Country,
    City,
    COUNT(CustomerName) as Number_Of_customer
FROM Customers
GROUP BY Country, City
HAVING COUNT(CustomerName) > 3
ORDER BY COUNT(CustomerName) DESC 



Country,City,Number_Of_customer
UK,London,6
Mexico,Mexico City,5
Brazil,Sao Paulo,4


Return the number of distinct customers in each of 'USA', 'France', and 'UK'.

In [17]:
%%sql
SELECT 
    Country,
    COUNT(DISTINCT CustomerName) AS Number_Of_Customers
FROM Customers
WHERE Country IN ('USA', 'France', 'UK')
GROUP BY Country
ORDER BY Number_Of_Customers DESC

Country,Number_Of_Customers
USA,13
France,11
UK,7


Return the countries, other than 'USA', 'France', and 'UK', that have more than 3 distinct customers.

In [18]:
%%sql

SELECT 
    Country,
    COUNT(DISTINCT CustomerName) AS Number_Of_Customers  
FROM Customers
WHERE Country NOT IN ('USA', 'France', 'UK')
GROUP BY Country
HAVING Number_Of_Customers > 3
ORDER BY Number_Of_Customers desc

Country,Number_Of_Customers
Germany,11
Brazil,9
Mexico,5
Spain,5
Venezuela,4


### COMPLEX QUERIES

##### 2. USING WITH CLAUSE, WINDOW FUNCTIONS, CASE STATEMENT, JOIN, UNION, GROUP_CONCAT, FORMAT, AND AGGREGATED FUNCTIONS.

Fetch the customers ranked first to third by number of orders in 2014, along with CustomerID, CustomerName, number of orders, and dense rank.

In [19]:
%%sql    
WITH Customer_Order_Join AS(   
    SELECT
        CustomerID,
        Order_2014 AS Top3_Order_2014,
        rn
    FROM(
        WITH Order_Years AS
             (
            SELECT 
                CustomerID, 
                COUNT(CASE WHEN YEAR(OrderDate) = 2013 THEN OrderID END) AS Order_2013,
                COUNT(CASE WHEN YEAR(OrderDate) = 2014 THEN OrderID END) AS Order_2014, 
                COUNT(CASE WHEN YEAR(OrderDate) = 2015 THEN OrderID END) AS Order_2015
            FROM Orders  
            GROUP BY CustomerID
          )
        SELECT 
            CustomerID, 
            Order_2014,
            DENSE_RANK() OVER(ORDER BY Order_2014 DESC) AS rn   
        FROM Order_Years
      ) AS subquery
    
    WHERE rn <= 3
            )
SELECT 
    COJ.CustomerID,
    CustomerName,
    COJ.Top3_Order_2014,
    rn
FROM Customer_Order_Join COJ
JOIN Customers C
ON C.CustomerID = COJ.CustomerID

CustomerID,CustomerName,Top3_Order_2014,rn
71,Save-a-lot Markets,17,1
20,Ernst Handel,15,2
37,Hungry Owl All-Night Grocers,15,2
63,QUICK-Stop,14,3
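
The result contains four rows even though the filter is `rn <= 3`, because `DENSE_RANK()` gives tied values the same rank and does not skip the next rank. The sketch below mimics that behaviour in plain Python using the 2014 order counts from the result above plus one hypothetical lower count; `dense_rank` is an illustrative helper, not part of any library.

```python
def dense_rank(values):
    """Map each value to its rank, like DENSE_RANK() OVER (ORDER BY value DESC)."""
    distinct_desc = sorted(set(values), reverse=True)
    return {value: position for position, value in enumerate(distinct_desc, start=1)}


# Order counts from the query result, plus one hypothetical customer with 13 orders.
order_counts = {
    "Save-a-lot Markets": 17,
    "Ernst Handel": 15,
    "Hungry Owl All-Night Grocers": 15,
    "QUICK-Stop": 14,
    "Hypothetical Customer": 13,
}

ranks = dense_rank(order_counts.values())
top3 = [
    (name, count, ranks[count])
    for name, count in order_counts.items()
    if ranks[count] <= 3
]
for row in top3:
    print(row)
```

With `RANK()` instead, the customer with 14 orders would receive rank 4, because the two tied customers would consume ranks 2 and 3.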


Fetch the first- and second-ranked customers by number of orders in each year (2013, 2014, and 2015), along with CustomerID, CustomerName, and number of orders.

In [20]:
%%sql
WITH Customer_Order_Join AS(
WITH Order_Years AS (
    SELECT 
        CustomerID, 
        COUNT(CASE WHEN Year(OrderDate) = 2013 THEN OrderID END) AS Order_2013,
        COUNT(CASE WHEN Year(OrderDate) = 2014 THEN OrderID END) AS Order_2014, 
        COUNT(CASE WHEN Year(OrderDate) = 2015 THEN OrderID END) AS Order_2015    
    FROM Orders  
    GROUP BY CustomerID
),
Order_2013 AS (
    SELECT 
        CustomerID, 
        CASE WHEN Order_2013 THEN Order_2013 END AS Top2_Order_2013,
        COALESCE('') AS Top2_Order_2014,
        COALESCE('') AS Top2_Order_2015,
        DENSE_RANK() OVER(ORDER BY Order_2013 DESC) AS RN
    FROM Order_Years 
),

Order_2014 AS (
SELECT 
         CustomerID,    
         COALESCE("") AS Top2_Order_2013,
         CASE WHEN Order_2014 THEN Order_2014 END AS Top2_Order_2014,
         COALESCE("") AS Top2_Order_2015,                 
         DENSE_RANK() OVER(ORDER BY Order_2014 DESC) AS RN
    FROM Order_Years 
),

Order_2015 as(
    SELECT 
         CustomerID,    
         COALESCE("") AS Top2_Order_2013,
         COALESCE("") AS Top2_Order_2014,
         CASE WHEN Order_2015 THEN Order_2015 END AS Top2_Order_2015,                  
         DENSE_RANK() OVER(ORDER BY Order_2015 DESC) AS RN
    FROM Order_Years
)

SELECT * FROM Order_2013 WHERE RN <= 2
UNION ALL
SELECT * FROM Order_2014 WHERE RN <= 2
UNION ALL
SELECT * FROM Order_2015 WHERE RN <= 2

 )           
SELECT 
    CUJ.CustomerID,
    C.CustomerName, 
    CUJ.Top2_Order_2013,
    CUJ.Top2_Order_2014,
    CUJ.Top2_Order_2015
FROM Customer_Order_Join CUJ 
JOIN Customers C 
ON CUJ.CustomerID = C.CustomerID


CustomerID,CustomerName,Top2_Order_2013,Top2_Order_2014,Top2_Order_2015
37,Hungry Owl All-Night Grocers,8.0,,
63,QUICK-Stop,6.0,,
20,Ernst Handel,6.0,,
65,Rattlesnake Canyon Grocery,6.0,,
71,Save-a-lot Markets,,17.0,
20,Ernst Handel,,15.0,
37,Hungry Owl All-Night Grocers,,15.0,
71,Save-a-lot Markets,,,11.0
24,Folk och f HB,,,9.0
20,Ernst Handel,,,9.0


Display the number of orders shipped by each shipper.

In [21]:
%%sql
SELECT
    ShipperName,
    COUNT(OrderID) AS Number_of_Orders   
FROM orders O
JOIN shippers S
ON O.ShipperID = S.shipperID
GROUP BY ShipperName
ORDER BY Number_of_Orders DESC

ShipperName,Number_of_Orders
United Package,326
Federal Shipping,255
Speedy Express,249


Fetch the name of the employee(s) who handled the highest number of orders.

In [22]:
%%sql
 WITH Employee_Orders AS (  
   
    SELECT
        EmployeeName,
        COUNT(OrderID) As Number_of_Orders
    FROM employee E
    JOIN orders O
    ON E.EmployeeID = O.EmployeeID
    GROUP BY EmployeeName
     )
SELECT *
FROM Employee_Orders
WHERE Number_of_Orders = (SELECT Max(Number_of_Orders) FROM Employee_Orders )

EmployeeName,Number_of_Orders
Margaret Peacock,156


Fetch the highest-sold product by quantity.

In [23]:
%%sql
SELECT 
    ProductName, 
    FORMAT(Max_Total_Quantity, 0) as Max_Total_Quantity
FROM(
    WITH Max_Total_Quantity AS (
        SELECT 
            O.ProductID, 
            SUM(O.Quantity) AS Total_Quantity, 
            ProductName 
        FROM orderdetails O
        JOIN products P 
        ON O.ProductID = P.ProductID
        GROUP BY O.ProductID, ProductName
    )
    SELECT 
        ProductName,
        Total_Quantity AS Max_Total_Quantity
    FROM Max_Total_Quantity
    WHERE Total_Quantity = (SELECT MAX(Total_Quantity) FROM Max_Total_Quantity)

) AS Product_max_Total_Quantity 
               
           

ProductName,Max_Total_Quantity
Camembert Pierrot,1577


Fetch the highest-sold category by quantity.

In [24]:
%%sql


SELECT
    CategoryID,
    CategoryName,
    FORMAT(Total_Quantity_By_category, 0) AS Total_Quantity_By_Category
FROM(
    WITH Max_Sold_Category AS ( 
        SELECT
            C.CategoryID,
            CategoryName,
            SUM(Quantity)  AS Total_Quantity_By_Category
        FROM categories C
        JOIN products P
        ON C.CategoryID = P.CategoryID
        JOIN orderdetails O
        ON P.ProductID = O.ProductID
        GROUP BY C.CategoryID, CategoryName
    
    ) 
    SELECT *
    FROM Max_Sold_Category
    WHERE Total_Quantity_By_Category = (
                                        SELECT
                                            MAX(Total_Quantity_By_Category) AS Max_Total_Quantity_By_Category 
                                        FROM Max_Sold_Category
                                        )
)AS Total_Quantity


        

CategoryID,CategoryName,Total_Quantity_By_Category
1,Beverages,9532


Fetch the top 5 products by estimated total sales (total quantity multiplied by average unit price).

In [25]:
%%sql
       
SELECT
    ProductName,
    Total_Quantity,
    FORMAT(Average_Unit_Price, 2) AS Average_Unit_Price,
    FORMAT(Total_Sales, 2) AS Total_Sales
FROM(  
    SELECT *
    
    FROM(
        WITH Products_Total_Sales AS(
            SELECT 
                ProductName,
                SUM(Quantity) AS Total_Quantity,
                AVG(unitPrice) AS Average_Unit_Price  
            FROM products P
            JOIN orderdetails O
            ON P.ProductID = O.ProductID
            GROUP BY ProductName
        )
        SELECT * ,
        
            Total_Quantity * Average_Unit_Price AS Total_Sales
        FROM Products_Total_Sales
    ) AS Total_Sales
ORDER BY Total_Sales DESC
LIMIT 5
) AS Top5_Total_Sales


ProductName,Total_Quantity,Average_Unit_Price,Total_Sales
Côte de Blaye,623,246.33,153465.67
Thüringer Rostbratwurst,746,116.19,86675.88
Raclette Courdavault,1496,51.13,76489.93
Camembert Pierrot,1577,32.08,50587.69
Tarte au sucre,1083,46.08,49908.25
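
`Total_Sales` in this query is `SUM(Quantity) * AVG(unitPrice)`, which matches line-by-line revenue only when a product's unit price is the same on every order line. The sketch below uses hypothetical order lines to show how the two figures can diverge; it is an illustration of the arithmetic, not a correction of the results above.

```python
# Hypothetical order lines for one product: (unit_price, quantity).
order_lines = [(10.0, 100), (20.0, 10)]

total_quantity = sum(quantity for _, quantity in order_lines)
average_unit_price = sum(price for price, _ in order_lines) / len(order_lines)

# What the query computes: SUM(Quantity) * AVG(unitPrice).
estimated_sales = total_quantity * average_unit_price

# Line-level alternative: SUM(Quantity * unitPrice).
exact_revenue = sum(price * quantity for price, quantity in order_lines)

print(total_quantity, average_unit_price, estimated_sales, exact_revenue)
```

In MySQL the line-level figure would correspond to `SUM(Quantity * unitPrice)` grouped by product. Neither figure accounts for the `discount` column, which does not appear in the `OrderDetails` table.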


Fetch each category's total quantity sold and estimated total sales (the sum, over its products, of total quantity multiplied by average unit price).

In [26]:
%%sql




SELECT
    CategoryID, 
    CategoryName,
    FORMAT(Total_Quantity, 0) AS Total_Quantity_By_Category,
    FORMAT(Total_Average_Sales, 2) AS Total_Average_Sales_By_Category
FROM(
    WITH Total_Average_Sales as (    
        SELECT *
        
        FROM(
            WITH Products_Total_Sales AS(
                SELECT 
                    ProductName,
                    SUM(Quantity) AS Total_Quantity,
                    AVG(unitPrice) AS Average_Unit_Price  
                FROM products P
                JOIN orderdetails O
                ON P.ProductID = O.ProductID
                GROUP BY ProductName
                                 )
            SELECT * ,
                Total_Quantity * Average_Unit_Price AS Total_Sales
            FROM Products_Total_Sales 
           )  AS PTS
    
                     ) 
    SELECT 
        C.CategoryID, 
        CategoryName, 
        SUM(Total_Quantity) AS Total_Quantity,
        SUM(Total_Sales) AS Total_Average_Sales            
    FROM Total_Average_Sales  TAS                
    JOIN products P
    ON P.ProductName = TAS.ProductName
    JOIN categories C
    ON P.categoryID = C.categoryID
    GROUP BY CategoryName, CategoryID
    ORDER BY Total_Average_Sales DESC
   ) AS TAS;
  
               

CategoryID,CategoryName,Total_Quantity_By_Category,Total_Average_Sales_By_Category
1,Beverages,9532,290587.22
4,Dairy Products,9149,252580.26
6,Meat & Poultry,4199,178076.18
3,Confections,7906,177490.79
8,Seafood,7681,142050.59
2,Condiments,5298,115527.66
7,Produce,2990,105121.63
5,Grains & Cereals,4562,100084.27


Fetch the three highest-selling products in each category (ranked by estimated total sales), along with CategoryID, CategoryName, and ProductName.

In [27]:
%%sql

SELECT 
    CategoryID,
    CategoryName,
    GROUP_CONCAT(ProductName ORDER BY RN) AS ProductName
FROM (
    WITH Total_Average_Sales as (         
        SELECT * 
        FROM(       
            WITH Products_Total_Sales AS( 
                SELECT 
                    ProductName,
                    SUM(Quantity) AS Total_Quantity,             
                    AVG(unitPrice) AS Average_Unit_Price
                FROM products P            
                JOIN orderdetails O 
                ON P.ProductID = O.ProductID 
                GROUP BY ProductName 
                                  ) 
            SELECT * ,  
            Total_Quantity * Average_Unit_Price AS Total_Sales
            FROM Products_Total_Sales 
         ) AS PTS                 
    )           
    
    SELECT          
        C.CategoryID,
        CategoryName, 
        P.ProductName, 
        Total_Sales ,    
        ROW_NUMBER() OVER(PARTITION BY CategoryName ORDER BY Total_Sales DESC) AS RN   
    FROM Total_Average_Sales  TAS 
    JOIN products P   
    ON P.ProductName = TAS.ProductName   
    JOIN categories C   
    ON P.categoryID = C.categoryID
) AS RN_TAS
WHERE RN <= 3
GROUP BY CategoryID, CategoryName


CategoryID,CategoryName,ProductName
1,Beverages,"Côte de Blaye,Ipoh Coffee,Chang"
2,Condiments,"Vegie-spread,Sirop d'érable,Louisiana Fiery Hot Pepper Sauce"
3,Confections,"Tarte au sucre,Sir Rodney's Marmalade,Gumbär Gummibärchen"
4,Dairy Products,"Raclette Courdavault,Camembert Pierrot,Mozzarella di Giovanni"
5,Grains & Cereals,"Gnocchi di nonna Alice,Wimmers gute Semmelknödel,Singaporean Hokkien Fried Mee"
6,Meat & Poultry,"Thüringer Rostbratwurst,Alice Mutton,Perth Pasties"
7,Produce,"Manjimup Dried Apples,Rössle Sauerkraut,Uncle Bob's Organic Dried Pears"
8,Seafood,"Carnarvon Tigers,Ikura,Boston Crab Meat"
