<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice SQL with Pandas pt. 2


---

We've learned about relational databases and the language most use to query them, SQL.  

In this lab we are going to gain more practice converting information to a SQL DB, querying the data and then analyzing it with Python.

In [1]:
# Necessary Libraries
import pandas as pd
import sqlite3
from pandas.io import sql

#### 1.  Read in the EuroMart CSV Data.
- 'EuroMart-ListOfOrders.csv'
- 'EuroMart-OrderBreakdown.csv'
- 'EuroMart-SalesTargets.csv'

In [9]:
# Reading CSV to Dataframe
orders = pd.read_csv('./datasets/csv/EuroMart-ListOfOrders.csv', encoding = 'utf-8')
OBD =  pd.read_csv('./datasets/csv/EuroMart-OrderBreakdown.csv', encoding = 'utf-8')
sales_targets =  pd.read_csv('./datasets/csv/EuroMart-SalesTargets.csv', encoding = 'utf-8')

#### 2. Rename columns to remove any spaces.

In [35]:
# A: 
new_o_col = {o:o.replace(' ', '_') for o in orders.columns}
new_obd_col = {obd:obd.replace(' ', '_') for obd in OBD.columns}
new_st_col = {st:st.replace(' ', '_') for st in sales_targets.columns}
orders.rename(columns=new_o_col, inplace=True)
OBD.rename(columns=new_obd_col, inplace=True)
sales_targets.rename(columns=new_st_col, inplace=True)

#### 3. Remove dollar signs from sales and profit columns in the order breakdown dataframe.

Convert the columns to float.

In [36]:
# A: 
OBD.head()

Unnamed: 0,Order_ID,Product_Name,Discount,Sales,Profit,Quantity,Category,Sub-Category
0,BN-2011-7407039,"Enermax Note Cards, Premium",0.5,45.0,-26.0,3,Office Supplies,Paper
1,AZ-2011-9050313,"Dania Corner Shelving, Traditional",0.0,854.0,290.0,7,Furniture,Bookcases
2,AZ-2011-6674300,"Binney & Smith Sketch Pad, Easy-Erase",0.0,140.0,21.0,3,Office Supplies,Art
3,BN-2011-2819714,"Boston Markers, Easy-Erase",0.5,27.0,-22.0,2,Office Supplies,Art
4,BN-2011-2819714,"Eldon Folders, Single Width",0.5,17.0,-1.0,2,Office Supplies,Storage


In [21]:
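def remove_str(ele):
    # Strip the dollar sign and thousands separator; str() keeps this safe on re-runs
    ele = str(ele).replace('$','')
    ele = ele.replace(',','')
    return ele

In [23]:
# Clean both columns, then convert them to float as the task requires
OBD[['Sales', 'Profit']] = OBD[['Sales', 'Profit']].applymap(remove_str).astype(float)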
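
As a rough alternative to the element-wise approach above, the same cleanup can be sketched with vectorized string methods. The `toy` frame and the `to_number` helper below are made up for illustration and are not part of the EuroMart data; the sketch assumes the raw values are strings such as `'$1,854'`.

```python
import pandas as pd

# Toy data standing in for the raw order breakdown (values are invented)
toy = pd.DataFrame({
    'Sales': ['$45', '$1,854', '$140'],
    'Profit': ['-$26', '$290', '$21'],
})

def to_number(col):
    # Hypothetical helper: drop '$' and ',' literally (regex=False), then cast to float
    cleaned = col.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
    return cleaned.astype(float)

# apply() passes each column (a Series) to the helper
toy[['Sales', 'Profit']] = toy[['Sales', 'Profit']].apply(to_number)
print(toy.dtypes)
```

Checking `dtypes` afterwards is a quick way to confirm that both columns really ended up numeric.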

In [37]:
OBD.head()

Unnamed: 0,Order_ID,Product_Name,Discount,Sales,Profit,Quantity,Category,Sub-Category
0,BN-2011-7407039,"Enermax Note Cards, Premium",0.5,45.0,-26.0,3,Office Supplies,Paper
1,AZ-2011-9050313,"Dania Corner Shelving, Traditional",0.0,854.0,290.0,7,Furniture,Bookcases
2,AZ-2011-6674300,"Binney & Smith Sketch Pad, Easy-Erase",0.0,140.0,21.0,3,Office Supplies,Art
3,BN-2011-2819714,"Boston Markers, Easy-Erase",0.5,27.0,-22.0,2,Office Supplies,Art
4,BN-2011-2819714,"Eldon Folders, Single Width",0.5,17.0,-1.0,2,Office Supplies,Storage


#### 4. Create a SQL Database called 'EuroMart' and save the three dataframes as SQL tables. 

In [28]:
# Establishing Local DB connection
db_connection = sqlite3.connect('datasets/sql/EuroMart.db.sqlite')


In [41]:
# A: 
OBD.to_sql('OBD', db_connection, if_exists='replace')
orders.to_sql('orders_new', db_connection, if_exists='replace')
sales_targets.to_sql('sales_targets_new', db_connection, if_exists='replace')
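
To confirm that a table was actually written, one option is to list the tables in the database and read one back. The sketch below uses a throwaway in-memory database and an invented `toy_orders` table rather than the lab's files. It also passes `index=False`, which is an optional choice (not what the cell above does) that keeps the DataFrame index from being stored as an extra column.

```python
import sqlite3
import pandas as pd

# In-memory database so nothing is written to disk
conn = sqlite3.connect(':memory:')

# Invented stand-in for the orders dataframe
toy_orders = pd.DataFrame({
    'Order_ID': ['A-1', 'A-2'],
    'Country': ['Sweden', 'France'],
})

# index=False leaves the DataFrame index out of the SQL table
toy_orders.to_sql('toy_orders', conn, if_exists='replace', index=False)

# sqlite_master lists every table stored in the database
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type = 'table'", conn)
round_trip = pd.read_sql('SELECT * FROM toy_orders', conn)
print(tables)
print(round_trip)
```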

#### 5. How many orders has each Customer placed? 

In [42]:
orders.head(3)

Unnamed: 0,Order_ID,Order_Date,Customer_Name,City,Country,Region,Segment,Ship_Date,Ship_Mode,State
0,BN-2011-7407039,1/1/2011,Ruby Patel,Stockholm,Sweden,North,Home Office,1/5/2011,Economy Plus,Stockholm
1,AZ-2011-9050313,1/3/2011,Summer Hayward,Southport,United Kingdom,North,Consumer,1/7/2011,Economy,England
2,AZ-2011-6674300,1/4/2011,Devin Huddleston,Valence,France,Central,Consumer,1/8/2011,Economy,Auvergne-Rhône-Alpes


In [78]:
# A:
query = '''
SELECT "Customer_Name", COUNT("Order_ID") as Number_of_Orders
FROM orders_new
GROUP BY "Customer_Name"
ORDER BY "Number_of_Orders" DESC
'''
df = pd.read_sql(query, db_connection)
df

Unnamed: 0,Customer_Name,Number_of_Orders
0,Jose Gambino,13
1,Kayla Tearle,12
2,Mark Washington,12
3,Aaron Bootman,11
4,Georgina Garner,11
5,Hayden Perkins,11
6,Jason Roger,11
7,Jessica Paramor,11
8,Lilly Le Grand,11
9,Lori Miller,11


> *If you're doubting your output, check it using Pandas.*
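
One way to run that check is a `groupby` that mirrors the SQL `GROUP BY` / `COUNT` / `ORDER BY`. The sketch below uses a small invented frame (`toy_orders`) so it runs on its own; on the real data the same chain would be applied to `orders`.

```python
import pandas as pd

# Invented stand-in for the orders dataframe
toy_orders = pd.DataFrame({
    'Order_ID': ['A-1', 'A-2', 'A-3', 'A-4'],
    'Customer_Name': ['Ruby Patel', 'Ruby Patel', 'Mary Parker', 'Ruby Patel'],
})

# Count orders per customer and sort from most to fewest, like the SQL query
check = (
    toy_orders.groupby('Customer_Name')['Order_ID']
    .count()
    .sort_values(ascending=False)
    .rename('Number_of_Orders')
    .reset_index()
)
print(check)
```

If the Pandas counts and the SQL counts disagree, the table written to the database is a good first place to look.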

#### 6. Create a Query to return a table of only geographic features from the List of Orders Table.

In [47]:
orders.head(3)

Unnamed: 0,Order_ID,Order_Date,Customer_Name,City,Country,Region,Segment,Ship_Date,Ship_Mode,State
0,BN-2011-7407039,1/1/2011,Ruby Patel,Stockholm,Sweden,North,Home Office,1/5/2011,Economy Plus,Stockholm
1,AZ-2011-9050313,1/3/2011,Summer Hayward,Southport,United Kingdom,North,Consumer,1/7/2011,Economy,England
2,AZ-2011-6674300,1/4/2011,Devin Huddleston,Valence,France,Central,Consumer,1/8/2011,Economy,Auvergne-Rhône-Alpes


In [49]:
# A:
query = '''
SELECT "Order_ID", "Country", "Region", "State", "City"
FROM orders_new
'''
df = pd.read_sql(query, db_connection)
df.head()

Unnamed: 0,Order_ID,Country,Region,State,City
0,BN-2011-7407039,Sweden,North,Stockholm,Stockholm
1,AZ-2011-9050313,United Kingdom,North,England,Southport
2,AZ-2011-6674300,France,Central,Auvergne-Rhône-Alpes,Valence
3,BN-2011-2819714,United Kingdom,North,England,Birmingham
4,AZ-2011-617423,France,Central,Auvergne-Rhône-Alpes,Echirolles


#### 7. Create a Query to return a table with all of the orders that had a negative profit from the Order Breakdown Table.

In [50]:
OBD.head()

Unnamed: 0,Order_ID,Product_Name,Discount,Sales,Profit,Quantity,Category,Sub-Category
0,BN-2011-7407039,"Enermax Note Cards, Premium",0.5,45.0,-26.0,3,Office Supplies,Paper
1,AZ-2011-9050313,"Dania Corner Shelving, Traditional",0.0,854.0,290.0,7,Furniture,Bookcases
2,AZ-2011-6674300,"Binney & Smith Sketch Pad, Easy-Erase",0.0,140.0,21.0,3,Office Supplies,Art
3,BN-2011-2819714,"Boston Markers, Easy-Erase",0.5,27.0,-22.0,2,Office Supplies,Art
4,BN-2011-2819714,"Eldon Folders, Single Width",0.5,17.0,-1.0,2,Office Supplies,Storage


In [51]:
# A:
query = '''
SELECT "Order_ID", "Profit"
FROM OBD
WHERE "Profit" < 0
'''
df = pd.read_sql(query, db_connection)
df.head()

Unnamed: 0,Order_ID,Profit
0,BN-2011-7407039,-26.0
1,BN-2011-2819714,-22.0
2,BN-2011-2819714,-1.0
3,BN-2011-3248724,-1012.0
4,BN-2011-3248724,-56.0


#### 8. Construct a query to return a table with the Customer Name and Product Name.  

> **Note:** This will require a join!

In [52]:
orders.head(2)

Unnamed: 0,Order_ID,Order_Date,Customer_Name,City,Country,Region,Segment,Ship_Date,Ship_Mode,State
0,BN-2011-7407039,1/1/2011,Ruby Patel,Stockholm,Sweden,North,Home Office,1/5/2011,Economy Plus,Stockholm
1,AZ-2011-9050313,1/3/2011,Summer Hayward,Southport,United Kingdom,North,Consumer,1/7/2011,Economy,England


In [54]:
OBD.head(2)

Unnamed: 0,Order_ID,Product_Name,Discount,Sales,Profit,Quantity,Category,Sub-Category
0,BN-2011-7407039,"Enermax Note Cards, Premium",0.5,45.0,-26.0,3,Office Supplies,Paper
1,AZ-2011-9050313,"Dania Corner Shelving, Traditional",0.0,854.0,290.0,7,Furniture,Bookcases


In [55]:
# A:
query = '''
SELECT orders_new."Customer_Name", OBD."Product_Name"
FROM orders_new INNER JOIN OBD
ON orders_new."Order_ID" = OBD."Order_ID"
'''
df = pd.read_sql(query, db_connection)
df.head()

Unnamed: 0,Customer_Name,Product_Name
0,Ruby Patel,"Enermax Note Cards, Premium"
1,Summer Hayward,"Dania Corner Shelving, Traditional"
2,Devin Huddleston,"Binney & Smith Sketch Pad, Easy-Erase"
3,Mary Parker,"Boston Markers, Easy-Erase"
4,Mary Parker,"Eldon Folders, Single Width"


#### 9.  How many orders for "Office Supplies" (Category) has Sweden made?

> **Note:** From this point on you'll probably be combining SQL and Pandas: use SQL queries to gather the relevant information, then use Pandas to analyze it.
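
As a minimal sketch of that workflow, the example below lets SQL do the join and leaves the filtering and counting to Pandas. The tables `toy_orders` and `toy_obd` and their rows are invented for illustration. It also shows that counting joined rows (line items) and counting distinct `Order_ID` values can give different answers, which is worth keeping in mind when a question asks for "orders".

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')  # throwaway database

# Invented stand-ins for the orders and order breakdown tables
pd.DataFrame({
    'Order_ID': ['A-1', 'A-2', 'A-3'],
    'Country': ['Sweden', 'France', 'Sweden'],
}).to_sql('toy_orders', conn, index=False)
pd.DataFrame({
    'Order_ID': ['A-1', 'A-1', 'A-2', 'A-3', 'A-3'],
    'Category': ['Office Supplies', 'Furniture', 'Office Supplies',
                 'Office Supplies', 'Office Supplies'],
}).to_sql('toy_obd', conn, index=False)

# SQL gathers the joined rows
query = '''
SELECT o."Country", b."Category", b."Order_ID"
FROM toy_orders AS o INNER JOIN toy_obd AS b
ON o."Order_ID" = b."Order_ID"
'''
joined = pd.read_sql(query, conn)

# Pandas does the analysis
mask = (joined['Country'] == 'Sweden') & (joined['Category'] == 'Office Supplies')
line_items = int(mask.sum())                               # rows in the breakdown
distinct_orders = joined.loc[mask, 'Order_ID'].nunique()   # unique orders
print(line_items, distinct_orders)
```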

In [56]:
# A:
query = '''
SELECT "Country", COUNT("Category") as Number_of_Orders
FROM (SELECT orders_new."Country", OBD."Category"
    FROM orders_new INNER JOIN OBD
    ON orders_new."Order_ID" = OBD."Order_ID") as sub
WHERE "Country" = 'Sweden' AND "Category" = 'Office Supplies'
'''

df = pd.read_sql(query, db_connection)
df.head()

Unnamed: 0,Country,Number_of_Orders
0,Sweden,133


#### 10. What was the total sales for products that have been discounted? 

In [80]:
# A:
query = '''
SELECT "Product_Name", "Discount", COUNT("Order_ID") as Number_of_Orders, SUM("Sales") as Total_Sales
FROM OBD
WHERE "Discount" > 0
GROUP BY "Product_Name"
'''
df = pd.read_sql(query, db_connection)
df['Total_Sales'].sum()

1115614.0

#### 11. What is the total quantity of objects sold for each country?

In [63]:
# A:
query = '''
SELECT orders_new."Country", SUM(OBD."Quantity") as Total_Quantity_Sold
FROM orders_new INNER JOIN OBD
ON orders_new."Order_ID" = OBD."Order_ID"
GROUP BY orders_new."Country"
'''
df = pd.read_sql(query, db_connection)
df.head()

Unnamed: 0,Country,Total_Quantity_Sold
0,Austria,973
1,Belgium,532
2,Denmark,204
3,Finland,201
4,France,7329


#### 12. In what Countries are profits lowest? (Report lowest 5-10)

In [73]:
# A:
query = '''
SELECT orders_new."Country", SUM(OBD."Profit") as Total_Profit
FROM orders_new INNER JOIN OBD
ON orders_new."Order_ID" = OBD."Order_ID"
GROUP BY orders_new."Country"
ORDER BY Total_Profit ASC
'''
df = pd.read_sql(query, db_connection)
df.head(10)

Unnamed: 0,Country,Total_Profit
0,Netherlands,-37188.0
1,Sweden,-17524.0
2,Portugal,-8704.0
3,Ireland,-6886.0
4,Denmark,-3608.0
5,Finland,3908.0
6,Norway,5167.0
7,Switzerland,7234.0
8,Belgium,9912.0
9,Italy,15802.0


#### 13. What Countries have the best and worst Profit to Sales Ratios?
(Total Profit divided by Total Sales.)
Essentially, this asks: for every dollar of product sold, how much is profit?

In [76]:
# A:
query = '''
SELECT orders_new."Country", SUM(OBD."Profit")/SUM(OBD."Sales") as Profit_Margin_Ratio
FROM orders_new INNER JOIN OBD
ON orders_new."Order_ID" = OBD."Order_ID"
GROUP BY orders_new."Country"
ORDER BY Profit_Margin_Ratio ASC
'''
df = pd.read_sql(query, db_connection)
df

Unnamed: 0,Country,Profit_Margin_Ratio
0,Portugal,-0.576195
1,Sweden,-0.574746
2,Netherlands,-0.528892
3,Denmark,-0.464769
4,Ireland,-0.430429
5,Italy,0.062522
6,France,0.114924
7,Germany,0.176555
8,Spain,0.188719
9,Finland,0.188774


#### 14. What Shipping method is most common for 'Bookcases' (Sub Category)?

In [16]:
# A:


#### 15. What city in the Orders table generated the highest net sales?  (List all the cities and countries in descending order by net sales.)

In [17]:
# A:

#### BONUS: Create a Column called 'Shipping Delay' on the 'orders' table, which is the difference in days between 'Order Date' and 'Ship Date'.

In [18]:
# A:

In [19]:
# A:

#### BONUS: Update your Orders table in your Sqlite DB to include the 'Shipping Delay' feature.

In [20]:
# A:

#### BONUS: Which Product Category has the highest average 'Shipping Delay'?

In [21]:
# A:

### Challenge problem:   
**In what months and Categories were Sales Targets Exceeded?**

---

This may require a considerable amount of data processing.

In [22]:
# A: