<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Practice SQL with Pandas pt. 2

---

We've learned about relational databases and SQL, the language most commonly used to query them.

In this lab we will practice loading data into a SQL database, querying it, and then analysing the results with Python.

In [37]:
# Necessary Libraries
import pandas as pd
import sqlite3
from pandas.io import sql

### 1.  Read in the EuroMart CSV Data
- 'EuroMart-ListOfOrders.csv'
- 'EuroMart-OrderBreakdown.csv'
- 'EuroMart-SalesTargets.csv'

In [38]:
# Reading CSV to Dataframe
orders = pd.read_csv('./datasets/csv/EuroMart-ListOfOrders.csv', encoding='utf-8')
OBD = pd.read_csv('./datasets/csv/EuroMart-OrderBreakdown.csv', encoding='utf-8')
sales_targets = pd.read_csv('./datasets/csv/EuroMart-SalesTargets.csv', encoding='utf-8')

### 2. Rename columns to remove any spaces

In [39]:
# A: 
orders.columns = [col.strip().replace(' ','_') for col in orders.columns]
OBD.columns = [col.strip().replace(' ','_') for col in OBD.columns]
sales_targets.columns = [col.strip().replace(' ','_') for col in sales_targets.columns]
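The renaming above only replaces spaces, so a column such as `Sub-Category` keeps its hyphen and has to be double-quoted in SQL later on. As an optional alternative, here is a minimal sketch of a reusable helper that replaces every run of non-alphanumeric characters. The name `clean_columns` and the toy column names are made up for illustration and are not part of the lab.

```python
import re
import pandas as pd

def clean_columns(df):
    """Return a copy of df whose column names use only letters, digits and underscores."""
    cleaned = [re.sub(r'[^0-9A-Za-z]+', '_', col.strip()) for col in df.columns]
    return df.rename(columns=dict(zip(df.columns, cleaned)))

# Toy frame with the kinds of names that cause trouble in SQL
demo = pd.DataFrame(columns=['Order ID', ' Product Name', 'Sub-Category'])
cleaned_names = list(clean_columns(demo).columns)
print(cleaned_names)
```

If you adopt this, remember that the later queries in this lab refer to `"Sub-Category"` and would need to use `Sub_Category` instead.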

### 3. Remove dollar signs from sales and profit columns in the order breakdown dataframe.

Convert the columns to float.

In [40]:
# A: 
# Strip currency symbols and thousands separators, then convert to float
for col in ['Sales', 'Profit']:
    OBD[col] = OBD[col].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)
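Before trusting a conversion like this, it can help to check whether any values failed to parse. The sketch below uses made-up values (including a deliberately bad `'n/a'` entry, which is an assumption and not something known to be in the EuroMart data) and `pd.to_numeric(..., errors='coerce')` to flag them.

```python
import pandas as pd

# Made-up sample values; 'n/a' stands in for anything that is not a number
raw = pd.Series(['$45', '$1,234', 'n/a'], name='Sales')

stripped = raw.str.replace('$', '', regex=False).str.replace(',', '', regex=False)
numeric = pd.to_numeric(stripped, errors='coerce')  # unparseable values become NaN

# Original strings that could not be converted
unparsed = raw[numeric.isna()]
print(unparsed.tolist())
```

An empty list here means every value converted cleanly.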

### 4. Create a SQL Database called 'EuroMart' and save the three dataframes as SQL tables. 

In [41]:
# Establishing Local DB connection
db_connection = sqlite3.connect('./datasets/EuroMart.db.sqlite')


In [42]:
# A:
orders.to_sql('orders', con=db_connection, if_exists='replace', index=False)
OBD.to_sql('OBD', con=db_connection, if_exists='replace', index=False)
sales_targets.to_sql('sales_targets', con=db_connection, if_exists='replace', index=False)
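To confirm that tables were actually written, you can query SQLite's built-in `sqlite_master` catalogue. This is a small sketch against a throwaway in-memory database with a toy dataframe, so it does not touch the lab's `.sqlite` file.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
toy = pd.DataFrame({'Order_ID': ['A-1', 'A-2'], 'Country': ['Sweden', 'France']})
toy.to_sql('orders', con=conn, if_exists='replace', index=False)

# List the tables SQLite knows about, then count the rows that were written
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type = 'table'", con=conn)
row_count = pd.read_sql('SELECT COUNT(*) AS n FROM orders', con=conn)['n'][0]
print(tables['name'].tolist(), row_count)
```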

### 5. How many orders has each Customer placed? 

In [121]:
# A:
pd.read_sql('''SELECT Customer_Name, COUNT(Order_ID)
                FROM orders
                GROUP BY Customer_Name
                ORDER BY COUNT(Order_ID) DESC
                LIMIT 5''', con=db_connection)

| | Customer_Name | COUNT(Order_ID) |
|---|---|---|
| 0 | Jose Gambino | 13 |
| 1 | Kayla Tearle | 12 |
| 2 | Mark Washington | 12 |
| 3 | Aaron Bootman | 11 |
| 4 | Georgina Garner | 11 |


> *If you're doubting your output, check it using Pandas.*
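One way to run that check is to compute the same aggregate with `groupby` and compare it with the SQL result. The sketch below uses a toy `orders` table with made-up customers; the alias `n_orders` is introduced here for convenience.

```python
import sqlite3
import pandas as pd

toy_orders = pd.DataFrame({
    'Order_ID': ['A-1', 'A-2', 'A-3', 'A-4'],
    'Customer_Name': ['Ann', 'Bob', 'Ann', 'Ann'],
})
conn = sqlite3.connect(':memory:')
toy_orders.to_sql('orders', con=conn, index=False)

# SQL version of the count
sql_counts = pd.read_sql('''SELECT Customer_Name, COUNT(Order_ID) AS n_orders
                            FROM orders
                            GROUP BY Customer_Name
                            ORDER BY n_orders DESC''', con=conn)

# Pandas version of the same count
pandas_counts = toy_orders.groupby('Customer_Name')['Order_ID'].count()

# Compare the two as plain dictionaries so row order does not matter
sql_as_series = sql_counts.set_index('Customer_Name')['n_orders']
print(sql_as_series.to_dict() == pandas_counts.to_dict())
```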

### 6. Create a Query to return a table of only geographic features from the List of Orders Table.

In [52]:
# A:
pd.read_sql('''SELECT City, Country, Region, State FROM orders LIMIT 5''', con=db_connection)

| | City | Country | Region | State |
|---|---|---|---|---|
| 0 | Stockholm | Sweden | North | Stockholm |
| 1 | Southport | United Kingdom | North | England |
| 2 | Valence | France | Central | Auvergne-Rhône-Alpes |
| 3 | Birmingham | United Kingdom | North | England |
| 4 | Echirolles | France | Central | Auvergne-Rhône-Alpes |


### 7. Create a Query to return a table with all of the orders that had a negative profit from the Order Breakdown Table.

In [56]:
# A:
pd.read_sql('''SELECT * FROM OBD WHERE Profit < 0 LIMIT 5''', con=db_connection)

| | Order_ID | Product_Name | Discount | Sales | Profit | Quantity | Category | Sub-Category |
|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | Enermax Note Cards, Premium | 0.5 | 45.0 | -26.0 | 3 | Office Supplies | Paper |
| 1 | BN-2011-2819714 | Boston Markers, Easy-Erase | 0.5 | 27.0 | -22.0 | 2 | Office Supplies | Art |
| 2 | BN-2011-2819714 | Eldon Folders, Single Width | 0.5 | 17.0 | -1.0 | 2 | Office Supplies | Storage |
| 3 | BN-2011-3248724 | Ikea Classic Bookcase, Metal | 0.6 | 987.0 | -1012.0 | 6 | Furniture | Bookcases |
| 4 | BN-2011-3248724 | Binney & Smith Sketch Pad, Blue | 0.5 | 116.0 | -56.0 | 5 | Office Supplies | Art |


### 8. Construct a query to return a table with the Customer Name and Product Name.  

> **Note:** This will require a join!

In [70]:
pd.read_sql('''SELECT A.Product_Name, B.Customer_Name
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID
                LIMIT 5''', con=db_connection)

| | Product_Name | Customer_Name |
|---|---|---|
| 0 | Enermax Note Cards, Premium | Ruby Patel |
| 1 | Dania Corner Shelving, Traditional | Summer Hayward |
| 2 | Binney & Smith Sketch Pad, Easy-Erase | Devin Huddleston |
| 3 | Boston Markers, Easy-Erase | Mary Parker |
| 4 | Eldon Folders, Single Width | Mary Parker |
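A point worth knowing about joins in SQLite: the matching condition of a `LEFT JOIN` belongs in `ON`. If it is written in `WHERE` instead, unmatched rows from the left table are filtered out and the query behaves like an inner join. The sketch below demonstrates the difference on two toy tables in which one product row has no matching order; all values are made up.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
pd.DataFrame({'Order_ID': ['A-1', 'A-2'],
              'Product_Name': ['Pen', 'Desk']}).to_sql('OBD', con=conn, index=False)
# Order 'A-2' is deliberately missing from the orders table
pd.DataFrame({'Order_ID': ['A-1'],
              'Customer_Name': ['Ann']}).to_sql('orders', con=conn, index=False)

# Condition in ON: the unmatched 'Desk' row is kept with a NULL customer
on_join = pd.read_sql('''SELECT A.Product_Name, B.Customer_Name
                         FROM OBD A LEFT JOIN orders B
                         ON A.Order_ID = B.Order_ID''', con=conn)

# Condition in WHERE: the unmatched row is filtered away
where_join = pd.read_sql('''SELECT A.Product_Name, B.Customer_Name
                            FROM OBD A LEFT JOIN orders B
                            WHERE A.Order_ID = B.Order_ID''', con=conn)

print(len(on_join), len(where_join))
```

When every row has a match the two forms return the same rows, which is why the difference is easy to miss.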


### 9.  How many orders for "Office Supplies" (Category) has Sweden made?

> **Note:** From this point on you'll probably be combining SQL and Pandas: use SQL queries to gather the relevant information and Pandas to analyze it.

In [84]:
country_offices = pd.read_sql('''SELECT A.Country, B.Category
                FROM orders A LEFT JOIN OBD B
                ON A.Order_ID = B.Order_ID''', con=db_connection)
country_offices[country_offices['Country']=='Sweden'].groupby('Category').count()

| Category | Country |
|---|---|
| Furniture | 36 |
| Office Supplies | 133 |
| Technology | 34 |
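Note that counting joined rows counts order line items, not orders: an order containing two "Office Supplies" products contributes two rows. Which number the question wants is a judgment call. The sketch below, on made-up toy tables, computes both in a single SQL query; the aliases `line_items` and `distinct_orders` are introduced here for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
pd.DataFrame({'Order_ID': ['A-1', 'A-2'],
              'Country': ['Sweden', 'France']}).to_sql('orders', con=conn, index=False)
# Order 'A-1' contains two Office Supplies line items
pd.DataFrame({'Order_ID': ['A-1', 'A-1', 'A-2'],
              'Category': ['Office Supplies'] * 3}).to_sql('OBD', con=conn, index=False)

result = pd.read_sql('''SELECT COUNT(*) AS line_items,
                               COUNT(DISTINCT A.Order_ID) AS distinct_orders
                        FROM orders A JOIN OBD B
                        ON A.Order_ID = B.Order_ID
                        WHERE A.Country = ? AND B.Category = ?''',
                     con=conn, params=('Sweden', 'Office Supplies'))
print(result)
```

Passing the filter values through `params` keeps them out of the SQL string itself.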


### 10. What was the total sales for products that have been discounted? 

In [92]:
# A:
pd.read_sql('''SELECT * FROM OBD WHERE Discount>0''',con=db_connection)['Sales'].sum()

1115614.0

### 11. What is the total quantity of objects sold for each country?

In [98]:
# A:
pd.read_sql('''SELECT A.Quantity, B.Country
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID''', con=db_connection).groupby('Country').sum()

| Country | Quantity |
|---|---|
| Austria | 973 |
| Belgium | 532 |
| Denmark | 204 |
| Finland | 201 |
| France | 7329 |
| Germany | 6179 |
| Ireland | 392 |
| Italy | 3612 |
| Netherlands | 1526 |
| Norway | 261 |


In [130]:
pd.read_sql('''SELECT * FROM OBD LIMIT 5''', con=db_connection)

| | Order_ID | Product_Name | Discount | Sales | Profit | Quantity | Category | Sub-Category |
|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | Enermax Note Cards, Premium | 0.5 | 45.0 | -26.0 | 3 | Office Supplies | Paper |
| 1 | AZ-2011-9050313 | Dania Corner Shelving, Traditional | 0.0 | 854.0 | 290.0 | 7 | Furniture | Bookcases |
| 2 | AZ-2011-6674300 | Binney & Smith Sketch Pad, Easy-Erase | 0.0 | 140.0 | 21.0 | 3 | Office Supplies | Art |
| 3 | BN-2011-2819714 | Boston Markers, Easy-Erase | 0.5 | 27.0 | -22.0 | 2 | Office Supplies | Art |
| 4 | BN-2011-2819714 | Eldon Folders, Single Width | 0.5 | 17.0 | -1.0 | 2 | Office Supplies | Storage |


In [94]:
pd.read_sql('''SELECT * FROM orders LIMIT 5''', con=db_connection)

| | Order_ID | Order_Date | Customer_Name | City | Country | Region | Segment | Ship_Date | Ship_Mode | State |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BN-2011-7407039 | 1/1/2011 | Ruby Patel | Stockholm | Sweden | North | Home Office | 1/5/2011 | Economy Plus | Stockholm |
| 1 | AZ-2011-9050313 | 1/3/2011 | Summer Hayward | Southport | United Kingdom | North | Consumer | 1/7/2011 | Economy | England |
| 2 | AZ-2011-6674300 | 1/4/2011 | Devin Huddleston | Valence | France | Central | Consumer | 1/8/2011 | Economy | Auvergne-Rhône-Alpes |
| 3 | BN-2011-2819714 | 1/4/2011 | Mary Parker | Birmingham | United Kingdom | North | Corporate | 1/9/2011 | Economy | England |
| 4 | AZ-2011-617423 | 1/5/2011 | Daniel Burke | Echirolles | France | Central | Home Office | 1/7/2011 | Priority | Auvergne-Rhône-Alpes |


### 12. In what countries are profits lowest? (Report lowest 5-10)

In [104]:
# A:
pd.read_sql('''SELECT A.Profit, B.Country
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID''', con=db_connection).groupby('Country').sum().sort_values('Profit').head(1)

| Country | Profit |
|---|---|
| Netherlands | -37188.0 |
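The question asks for the lowest five to ten countries, and the whole ranking can also be done in SQL with `GROUP BY`, `ORDER BY` and `LIMIT`. The following is a sketch on made-up toy tables, so its numbers say nothing about the EuroMart data; the alias `Total_Profit` is introduced here for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
              'Country': ['Sweden', 'France', 'Sweden']}).to_sql('orders', con=conn, index=False)
pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
              'Profit': [-26.0, 290.0, 10.0]}).to_sql('OBD', con=conn, index=False)

# Sum profit per country and keep the (up to) five lowest totals
lowest = pd.read_sql('''SELECT B.Country, SUM(A.Profit) AS Total_Profit
                        FROM OBD A JOIN orders B
                        ON A.Order_ID = B.Order_ID
                        GROUP BY B.Country
                        ORDER BY Total_Profit ASC
                        LIMIT 5''', con=conn)
print(lowest)
```

Changing `LIMIT 5` to `LIMIT 10` gives the upper end of the range the question asks for.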


### 13. What countries have the best and worst profit to sales ratios?

The ratio is total profit divided by total sales: for every dollar of product sold, how much is profit.

In [114]:
# A:
Ratio = pd.read_sql('''SELECT A.Profit, A.Sales, B.Country
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID''', con=db_connection).groupby('Country').sum()

# Vectorized division is equivalent to the row-wise apply and faster
Ratio['Ratio'] = Ratio['Profit'] / Ratio['Sales']
Ratio.sort_values('Ratio', ascending=False).head(1)

| Country | Profit | Sales | Ratio |
|---|---|---|---|
| Switzerland | 7234.0 | 24874.0 | 0.290826 |
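Since the question asks for both the best and the worst ratio, `idxmax` and `idxmin` can return both country labels in one step. The sketch below uses a made-up per-country totals table with placeholder countries X, Y and Z, not the real EuroMart figures.

```python
import pandas as pd

# Made-up per-country totals, indexed by country like the grouped result
totals = pd.DataFrame({'Profit': [50.0, -20.0, 30.0],
                       'Sales': [200.0, 100.0, 300.0]},
                      index=pd.Index(['X', 'Y', 'Z'], name='Country'))

totals['Ratio'] = totals['Profit'] / totals['Sales']

# Index labels of the highest and lowest ratio
best = totals['Ratio'].idxmax()
worst = totals['Ratio'].idxmin()
print(best, worst)
```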


### 14. What Shipping method is most common for 'Bookcases' (Sub Category)?

In [139]:
# A:
pd.read_sql('''SELECT A."Sub-Category", B.Ship_Mode, COUNT(Ship_Mode)
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID
                WHERE A."Sub-Category" = 'Bookcases'
                GROUP BY Ship_Mode''', con=db_connection)

| | Sub-Category | Ship_Mode | COUNT(Ship_Mode) |
|---|---|---|---|
| 0 | Bookcases | Economy | 234 |
| 1 | Bookcases | Economy Plus | 76 |
| 2 | Bookcases | Immediate | 22 |
| 3 | Bookcases | Priority | 59 |


In [141]:
pd.read_sql('''SELECT A."Sub-Category", B.Ship_Mode
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID
                WHERE A."Sub-Category" = 'Bookcases'
                ''', con=db_connection).groupby('Ship_Mode').count()

| Ship_Mode | Sub-Category |
|---|---|
| Economy | 234 |
| Economy Plus | 76 |
| Immediate | 22 |
| Priority | 59 |


### 15. What city in the Orders table generated the highest net sales?  (List all the cities and countries in descending order by net sales.)

In [150]:
# A:
pd.read_sql('''SELECT B.City, B.Country, SUM(Sales)
                FROM OBD A LEFT JOIN orders B
                ON A.Order_ID = B.Order_ID
                GROUP BY City
                ORDER BY SUM(Sales) DESC''', con=db_connection)

| | City | Country | SUM(Sales) |
|---|---|---|---|
| 0 | London | United Kingdom | 69230.0 |
| 1 | Berlin | Germany | 52555.0 |
| 2 | Vienna | Austria | 51844.0 |
| 3 | Madrid | Spain | 44981.0 |
| 4 | Paris | France | 42245.0 |
| 5 | Rome | Italy | 28330.0 |
| 6 | Barcelona | Spain | 27405.0 |
| 7 | Hamburg | Germany | 23574.0 |
| 8 | Marseille | France | 21677.0 |
| 9 | Turin | Italy | 19829.0 |
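One caveat with grouping on `City` alone: two cities with the same name in different countries would be merged into a single row. Whether that happens in the EuroMart data is not checked here. The sketch below uses made-up toy tables to show how grouping on both `City` and `Country` keeps such cities apart; the alias `Net_Sales` is introduced for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
# Two different (made-up) countries that each have a city called Springfield
pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
              'City': ['Springfield', 'Springfield', 'Oslo'],
              'Country': ['Country A', 'Country B', 'Norway']}
             ).to_sql('orders', con=conn, index=False)
pd.DataFrame({'Order_ID': ['A-1', 'A-2', 'A-3'],
              'Sales': [100.0, 40.0, 75.0]}).to_sql('OBD', con=conn, index=False)

net_sales = pd.read_sql('''SELECT B.City, B.Country, SUM(A.Sales) AS Net_Sales
                           FROM OBD A JOIN orders B
                           ON A.Order_ID = B.Order_ID
                           GROUP BY B.City, B.Country
                           ORDER BY Net_Sales DESC''', con=conn)
print(net_sales)
```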
