[![General Assembly Logo](https://camo.githubusercontent.com/1a91b05b8f4d44b5bbfb83abac2b0996d8e26c92/687474703a2f2f692e696d6775722e636f6d2f6b6538555354712e706e67)](https://generalassemb.ly/education/web-development-immersive)
![Misk Logo](https://i.ibb.co/KmXhJbm/Webp-net-resizeimage-1.png)

*Instructor: Marcus Lim*

In [1]:
import sqlite3
import pandas as pd

In [2]:
conn = sqlite3.connect('northwind')

This is a stripped-down version of the Northwind database, a set of tables representing shipping information. The database contains three tables: `order_details`, `orders` and `products`. Your task is to answer the questions *using only SQL*. You can use `pd.read_sql` to query the database and `pandas` to check your answers, but every result should come from executing a SQL query.

## The `products` table

**1. Display the first three rows of the `products` table.**

In [4]:
df = pd.read_sql('SELECT * FROM products', conn)

In [5]:
pd.read_sql('SELECT * FROM products LIMIT 3', conn)

Unnamed: 0,productid,productname,supplierid,categoryid,quantityperunit,unitprice,unitsinstock,unitsonorder,reorderlevel,discontinued
0,1,Chai,1,1,10 boxes x 20 bags,18.0,39,0,10,0
1,2,Chang,1,1,24 - 12 oz bottles,19.0,17,40,25,0
2,3,Aniseed Syrup,1,2,12 - 550 ml bottles,10.0,13,70,25,0


**2. What categories of products does the company sell?**

In [8]:
df.categoryid.unique()

array([1, 2, 7, 6, 8, 4, 3, 5])

In [9]:
sql = '''
SELECT DISTINCT categoryid
FROM products
'''

pd.read_sql(sql, conn)

Unnamed: 0,categoryid
0,1
1,2
2,7
3,6
4,8
5,4
6,3
7,5


**3. How many products per category are there?**

In [13]:
sql = '''
SELECT categoryid, COUNT(productid) AS num_of_products
FROM products
GROUP BY categoryid
'''

pd.read_sql(sql, conn)

Unnamed: 0,categoryid,num_of_products
0,1,12
1,2,12
2,3,13
3,4,10
4,5,7
5,6,6
6,7,5
7,8,12


**4. How many products per category have *not* been discontinued?**

In [15]:
sql = '''
SELECT categoryid, COUNT(productid) AS num_of_products
FROM products
WHERE discontinued = 0
GROUP BY categoryid
'''

pd.read_sql(sql, conn)

Unnamed: 0,categoryid,num_of_products
0,1,11
1,2,11
2,3,13
3,4,10
4,5,6
5,6,2
6,7,4
7,8,12


**5a. What are the five most expensive products (that haven't been discontinued)?**

In [21]:
sql = '''
-- MAX() without GROUP BY collapses the result to a single row, so LIMIT 5 has no effect here
SELECT productname, MAX(unitprice)
FROM products
WHERE discontinued = 0
LIMIT 5
'''

pd.read_sql(sql, conn)

Unnamed: 0,productname,MAX(unitprice)
0,Côte de Blaye,263.5


In [20]:
sql = '''
SELECT productname , unitprice  
FROM products 
WHERE discontinued = 0
ORDER BY unitprice DESC
LIMIT 5
'''
pd.read_sql(sql , conn)

Unnamed: 0,productname,unitprice
0,Côte de Blaye,263.5
1,Sir Rodney's Marmalade,81.0
2,Carnarvon Tigers,62.5
3,Raclette Courdavault,55.0
4,Manjimup Dried Apples,53.0


**5b. How many units of each of the answers to 5a are in stock?**

In [25]:
sql = '''
SELECT productname , unitprice, unitsinstock
FROM products 
WHERE discontinued = 0
ORDER BY unitprice DESC
LIMIT 5
'''
pd.read_sql(sql , conn)

Unnamed: 0,productname,unitprice,unitsinstock
0,Côte de Blaye,263.5,17
1,Sir Rodney's Marmalade,81.0,40
2,Carnarvon Tigers,62.5,42
3,Raclette Courdavault,55.0,79
4,Manjimup Dried Apples,53.0,20


**6. What is the ratio of units in stock to units on order for all the products?**

In [26]:
sql = '''
-- Returns the two averages; dividing average_stock by average_order gives the ratio
SELECT AVG(unitsinstock) AS average_stock,
       AVG(unitsonorder) AS average_order
FROM products
'''
pd.read_sql(sql , conn)

Unnamed: 0,average_stock,average_order
0,40.506494,10.12987
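
Computing the ratio as a single value could be sketched as follows. This is a minimal example that runs against a small in-memory table with made-up rows (not the Northwind data), so the number is illustrative only. The connection name `demo` and the alias `stock_to_order_ratio` are assumptions introduced here; the same SQL string can be passed to `pd.read_sql` with the real connection. Multiplying by `1.0` forces floating-point division, because SQLite divides two integers as integers.

```python
import sqlite3

# Toy stand-in for the products table (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute(
    'CREATE TABLE products (productid INTEGER, unitsinstock INTEGER, unitsonorder INTEGER)'
)
demo.executemany(
    'INSERT INTO products VALUES (?, ?, ?)',
    [(1, 30, 10), (2, 50, 0), (3, 20, 15)],
)

sql = '''
SELECT SUM(unitsinstock) * 1.0 / SUM(unitsonorder) AS stock_to_order_ratio
FROM products
'''

# Total stock is 100 and total on order is 25 in the toy rows.
ratio = demo.execute(sql).fetchone()[0]
print(ratio)  # 4.0
```

When neither column contains NULLs, `AVG(unitsinstock) / AVG(unitsonorder)` yields the same value, since both averages divide by the same row count.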


**7. [Challenge] What is the total cost of all the stock?**

In [30]:
sql = '''
-- Stock value per non-discontinued product: one row per product rather than a single total
SELECT productname, (unitprice * unitsinstock) AS Total_Cost
FROM products
WHERE discontinued = 0
'''
pd.read_sql(sql , conn)

Unnamed: 0,productname,Total_Cost
0,Chai,702.00
1,Chang,323.00
2,Aniseed Syrup,130.00
3,Chef Anton's Cajun Seasoning,1166.00
4,Grandma's Boysenberry Spread,3000.00
...,...,...
64,Röd Kaviar,1515.00
65,Longlife Tofu,40.00
66,Rhönbräu Klosterbier,968.75
67,Lakkalikööri,1026.00
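
A single total can be obtained by wrapping the per-product expression in `SUM`. The sketch below uses a toy in-memory table with made-up rows (not the Northwind data); the connection name `demo` and the alias `total_stock_cost` are assumptions. Whether discontinued products belong in the total depends on how the question is read; this version includes them, and adding `WHERE discontinued = 0` would exclude them.

```python
import sqlite3

# Toy stand-in for the products table (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute(
    'CREATE TABLE products '
    '(productname TEXT, unitprice REAL, unitsinstock INTEGER, discontinued INTEGER)'
)
demo.executemany(
    'INSERT INTO products VALUES (?, ?, ?, ?)',
    [('Tea', 2.5, 10, 0), ('Jam', 4.0, 5, 0), ('Old Sauce', 10.0, 3, 1)],
)

sql = '''
SELECT SUM(unitprice * unitsinstock) AS total_stock_cost
FROM products
'''

# Per-product values are 25.0, 20.0 and 30.0.
total = demo.execute(sql).fetchone()[0]
print(total)  # 75.0
```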


## The `order_details` and `orders` tables

In [34]:
df_order_details = pd.read_sql('SELECT * FROM order_details', conn)
df_orders = pd.read_sql('SELECT * FROM orders', conn)

In [35]:
df_order_details.head(2)

Unnamed: 0,index,orderid,productid,unitprice,quantity,discount
0,0,10248,11,14.0,12,0.0
1,1,10248,42,9.8,10,0.0


In [36]:
df_orders.head(2)

Unnamed: 0,orderid,customerid,employeeid,orderdate,requireddate,shippeddate,shipvia,freight,shipname,shipaddress,shipcity,shipregion,shippostalcode,shipcountry
0,10248,VINET,5,1996-07-04 00:00:00.000,1996-08-01 00:00:00.000,1996-07-16 00:00:00.000,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France
1,10249,TOMSP,6,1996-07-05 00:00:00.000,1996-08-16 00:00:00.000,1996-07-10 00:00:00.000,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany


**1. How many orders are there in total?**

In [37]:
sql = '''
SELECT COUNT(orderid) 
FROM orders 
'''


pd.read_sql(sql , conn)

Unnamed: 0,COUNT(orderid)
0,830


**2. List all the countries receiving orders.**

In [38]:
sql = '''
SELECT DISTINCT shipcountry
FROM orders
'''

pd.read_sql(sql, conn)

Unnamed: 0,shipcountry
0,France
1,Germany
2,Brazil
3,Belgium
4,Switzerland
5,Venezuela
6,Austria
7,Mexico
8,USA
9,Sweden


**3. Which country receives the most orders?**

In [45]:
sql = '''
-- Ranks countries by total units ordered (SUM of quantity), not by the number of orders
SELECT shipcountry, SUM(quantity) AS orders
FROM orders
JOIN order_details
ON orders.orderid = order_details.orderid
GROUP BY 1
ORDER BY 2 DESC
LIMIT 1
'''

pd.read_sql(sql, conn)

Unnamed: 0,shipcountry,orders
0,USA,9330
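
Counting orders rather than units needs only the `orders` table. The following is a minimal sketch on a toy in-memory table with made-up rows (not the Northwind data); the connection name `demo` and the alias `num_orders` are assumptions.

```python
import sqlite3

# Toy stand-in for the orders table (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE orders (orderid INTEGER, customerid TEXT, shipcountry TEXT)')
demo.executemany(
    'INSERT INTO orders VALUES (?, ?, ?)',
    [(1, 'AAA', 'France'), (2, 'BBB', 'Brazil'), (3, 'AAA', 'France'), (4, 'CCC', 'Mexico')],
)

sql = '''
SELECT shipcountry, COUNT(*) AS num_orders
FROM orders
GROUP BY shipcountry
ORDER BY num_orders DESC
LIMIT 1
'''

# Each row of orders is one order, so COUNT(*) per country is the order count.
top_country = demo.execute(sql).fetchone()
print(top_country)  # ('France', 2)
```

Note that `LIMIT 1` hides ties: if two countries share the highest count, only one of them is returned.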


**4. Which customers are submitting more orders than average?**

In [52]:
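sql = '''
-- Returns one row per order line whose quantity exceeds the average line quantity,
-- so customers repeat and order counts are not compared
SELECT customerid
FROM orders
JOIN order_details
ON orders.orderid = order_details.orderid
WHERE quantity > (
    SELECT AVG(quantity)
    FROM order_details)
'''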
pd.read_sql(sql , conn)

Unnamed: 0,customerid
0,TOMSP
1,HANAR
2,SUPRD
3,SUPRD
4,SUPRD
...,...
855,LEHMS
856,ERNSH
857,ERNSH
858,RICSU
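
One way to compare each customer's order count with the average is to group by `customerid` and filter with `HAVING` against a subquery that averages the per-customer counts. This is a sketch under that reading of the question, run on a toy in-memory table with made-up rows (not the Northwind data); the names `demo`, `num_orders` and `per_customer` are assumptions.

```python
import sqlite3

# Toy stand-in for the orders table (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE orders (orderid INTEGER, customerid TEXT)')
demo.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [(1, 'AAA'), (2, 'AAA'), (3, 'AAA'), (4, 'BBB'), (5, 'CCC'), (6, 'CCC')],
)

sql = '''
SELECT customerid, COUNT(*) AS num_orders
FROM orders
GROUP BY customerid
HAVING COUNT(*) > (
    SELECT AVG(num_orders)
    FROM (
        SELECT COUNT(*) AS num_orders
        FROM orders
        GROUP BY customerid
    ) AS per_customer
)
ORDER BY num_orders DESC
'''

# Counts are AAA=3, BBB=1, CCC=2, so the average is 2.0 and only AAA exceeds it.
above_average = demo.execute(sql).fetchall()
print(above_average)  # [('AAA', 3)]
```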


**5. Which order generated the most revenue? (Don't forget the discount)**

In [51]:
sql = '''
-- Computes revenue per order line (not per order) and subtracts discount as an absolute amount
SELECT orderid, unitprice * quantity - discount AS revenue
FROM order_details
ORDER BY 2 DESC
LIMIT 1
'''
pd.read_sql(sql , conn)

Unnamed: 0,orderid,revenue
0,10981,15810.0
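
Revenue per order could instead be computed by summing all lines of each order. The sketch below assumes that `discount` stores a fraction of the line price (for example `0.05` for 5%), which the source does not state explicitly. It uses a toy in-memory table with made-up rows (not the Northwind data); the connection name `demo` is an assumption.

```python
import sqlite3

# Toy stand-in for the order_details table (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute(
    'CREATE TABLE order_details '
    '(orderid INTEGER, unitprice REAL, quantity INTEGER, discount REAL)'
)
demo.executemany(
    'INSERT INTO order_details VALUES (?, ?, ?, ?)',
    [
        (1, 10.0, 2, 0.0),  # line revenue 20.0
        (1, 5.0, 4, 0.5),   # line revenue 10.0 after a 50% discount
        (2, 25.0, 1, 0.0),  # line revenue 25.0
    ],
)

sql = '''
-- Assumes discount is a fraction of the line price
SELECT orderid, SUM(unitprice * quantity * (1 - discount)) AS revenue
FROM order_details
GROUP BY orderid
ORDER BY revenue DESC
LIMIT 1
'''

top_order = demo.execute(sql).fetchone()
print(top_order)  # (1, 30.0)
```

In the toy rows the largest single line belongs to order 2, yet order 1 has the higher total, which is why the lines are grouped before ranking.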


**6. [Challenge] Which 3 customers generated the most revenue?**

In [50]:
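sql = '''
-- Ranks individual order lines, not per-customer totals; discount is subtracted as an absolute amount
SELECT customerid, unitprice * quantity - discount AS revenue
FROM orders
JOIN order_details
ON orders.orderid = order_details.orderid
ORDER BY 2 DESC
LIMIT 3
'''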
pd.read_sql(sql , conn)

Unnamed: 0,customerid,revenue
0,HANAR,15810.0
1,QUICK,15809.95
2,SIMOB,10540.0
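
Totalling revenue per customer requires joining the two tables and grouping by `customerid`. The sketch below again assumes that `discount` is a fraction of the line price, and it runs on toy in-memory tables with made-up rows (not the Northwind data); the connection name `demo` is an assumption.

```python
import sqlite3

# Toy stand-ins for orders and order_details (made-up rows, not Northwind data).
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE orders (orderid INTEGER, customerid TEXT)')
demo.execute(
    'CREATE TABLE order_details '
    '(orderid INTEGER, unitprice REAL, quantity INTEGER, discount REAL)'
)
demo.executemany(
    'INSERT INTO orders VALUES (?, ?)',
    [(1, 'AAA'), (2, 'BBB'), (3, 'AAA'), (4, 'CCC'), (5, 'DDD')],
)
demo.executemany(
    'INSERT INTO order_details VALUES (?, ?, ?, ?)',
    [
        (1, 10.0, 2, 0.0),  # AAA: 20.0
        (2, 50.0, 1, 0.5),  # BBB: 25.0 after a 50% discount
        (3, 10.0, 3, 0.0),  # AAA: 30.0
        (4, 8.0, 5, 0.0),   # CCC: 40.0
        (5, 1.0, 4, 0.0),   # DDD: 4.0
    ],
)

sql = '''
-- Assumes discount is a fraction of the line price
SELECT orders.customerid,
       SUM(order_details.unitprice * order_details.quantity
           * (1 - order_details.discount)) AS revenue
FROM orders
JOIN order_details
ON orders.orderid = order_details.orderid
GROUP BY orders.customerid
ORDER BY revenue DESC
LIMIT 3
'''

top_customers = demo.execute(sql).fetchall()
print(top_customers)  # [('AAA', 50.0), ('CCC', 40.0), ('BBB', 25.0)]
```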
