## Business requirements
Sprocket Central Pty Ltd needs help with its customer and transaction data. The organisation has a large dataset relating to its customers, but its team is unsure how to analyse it effectively to help optimise its marketing strategy.
## Task 1
Please find attached the three datasets from Sprocket Central Pty Ltd:
1) *Customer Demographic* <br />
2) *Customer Addresses* <br />
3) *Transaction data for the past three months*
<br />
Can you please review the data quality to ensure that it is ready for our analysis in phase two? Remember to take note of any assumptions or issues we need to raise with the client, as well as recommendations for mitigating the current data quality concerns going forward.


“Hi there, welcome again to the team! The client has asked our team to assess the quality of their data and to recommend ways to clean the underlying data and mitigate these issues. Can you please take a look at the datasets we’ve received and draft an email to them identifying the data quality issues and how these may impact our analysis going forward?

I will send through an example of a typical data quality framework that can be used as a guide. Remember to consider the join keys between the tables too. Thanks again for your help.”
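The data quality framework mentioned in the email is not included here, so the checks below are only an assumed starting point covering two common dimensions, completeness and uniqueness. This is a minimal sketch: `quality_summary` is a hypothetical helper, and the `toy` frame holds made-up values standing in for one of the client sheets.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with basic completeness and uniqueness counts."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "unique": df.nunique(dropna=True),
        }
    )


# Toy frame with made-up values (hypothetical, not taken from the client data).
toy = pd.DataFrame(
    {
        "customer_id": [1, 2, 2, 4],
        "gender": ["F", "Female", None, "M"],
    }
)

summary = quality_summary(toy)

# A join key should normally be unique in the customer tables.
duplicate_ids = int(toy["customer_id"].duplicated().sum())

print(summary)
print("duplicated customer_id values:", duplicate_ids)
```

Running the same helper on each of the three sheets, and on the merged frame, would give a quick table of candidate issues to raise in the email.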

In [108]:
import pandas as pd 
from matplotlib import pyplot as plt 

In [109]:
excelFile = pd.ExcelFile("kpmg.xlsx")       # pip install openpyxl 

In [110]:
# Skip the first row of each sheet so the header is read from the second row.
Transactions = pd.read_excel(excelFile, 'Transactions', skiprows=[0])
CustomerDemographic = pd.read_excel(excelFile, 'CustomerDemographic', skiprows=[0])
CustomerAddress = pd.read_excel(excelFile, 'CustomerAddress', skiprows=[0])
# Show up to 100 columns and every row when displaying a DataFrame.
pd.set_option("display.max_columns", 100)
pd.set_option("display.max_rows", None)

In [111]:
Transactions.columns 

Index(['transaction_id', 'product_id', 'customer_id', 'transaction_date',
       'online_order', 'order_status', 'brand', 'product_line',
       'product_class', 'product_size', 'list_price', 'standard_cost',
       'product_first_sold_date', 'Unnamed: 13', 'Unnamed: 14', 'Unnamed: 15',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25'],
      dtype='object')

In [112]:
# Keep the first 13 columns, dropping the trailing 'Unnamed: N' columns.
Transactions = Transactions.iloc[:, 0:13]
CustomerDemographic.columns

Index(['customer_id', 'first_name', 'last_name', 'gender',
       'past_3_years_bike_related_purchases', 'DOB', 'job_title',
       'job_industry_category', 'wealth_segment', 'deceased_indicator',
       'default', 'owns_car', 'tenure', 'Unnamed: 13', 'Unnamed: 14',
       'Unnamed: 15', 'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18',
       'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22',
       'Unnamed: 23', 'Unnamed: 24', 'Unnamed: 25'],
      dtype='object')

In [113]:
# Keep the first 13 columns, dropping the trailing 'Unnamed: N' columns.
CustomerDemographic = CustomerDemographic.iloc[:, 0:13]
CustomerAddress.columns 

Index(['customer_id', 'address', 'postcode', 'state', 'country',
       'property_valuation', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Unnamed: 9', 'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12',
       'Unnamed: 13', 'Unnamed: 14', 'Unnamed: 15', 'Unnamed: 16',
       'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20',
       'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24',
       'Unnamed: 25'],
      dtype='object')

In [114]:
# Keep the first 6 columns, dropping the trailing 'Unnamed: N' columns.
CustomerAddress = CustomerAddress.iloc[:, 0:6]
CustomerAddress.head(0) 

Unnamed: 0,customer_id,address,postcode,state,country,property_valuation
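The positional slices above (`iloc[:, 0:13]` and `iloc[:, 0:6]`) depend on the column order staying fixed. As an alternative sketch, the auto-generated `Unnamed: N` headers can be filtered by name instead. `drop_unnamed` is a hypothetical helper, and the `toy` frame uses made-up values.

```python
import pandas as pd


def drop_unnamed(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns whose header pandas auto-generated as 'Unnamed: N'."""
    keep = ~df.columns.astype(str).str.startswith("Unnamed:")
    return df.loc[:, keep]


# Toy frame with made-up values mimicking trailing empty spreadsheet columns.
toy = pd.DataFrame(
    {
        "customer_id": [1, 2],
        "postcode": [2016, 2153],
        "Unnamed: 2": [None, None],
        "Unnamed: 3": [None, None],
    }
)

trimmed = drop_unnamed(toy)
print(list(trimmed.columns))
```

This filter would also discard an unnamed column that happens to hold real data, so it is worth confirming that those columns are empty (for example with `df[col].isna().all()`) before dropping them.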


In [115]:
Transactions.head(0) 

Unnamed: 0,transaction_id,product_id,customer_id,transaction_date,online_order,order_status,brand,product_line,product_class,product_size,list_price,standard_cost,product_first_sold_date


In [116]:
CustomerDemographic.head(0) 

Unnamed: 0,customer_id,first_name,last_name,gender,past_3_years_bike_related_purchases,DOB,job_title,job_industry_category,wealth_segment,deceased_indicator,default,owns_car,tenure


In [118]:
data = pd.merge(CustomerDemographic,CustomerAddress, on="customer_id")  

In [119]:
data = pd.merge(Transactions,data, on="customer_id")  
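`pd.merge` defaults to an inner join, so any `customer_id` that is missing from one of the tables is dropped from `data` without warning. Since the brief asks us to consider the join keys, a check along the following lines may help quantify that. It is a sketch on made-up ids; the frame names `transactions` and `demographic` are stand-ins for the real `Transactions` and `CustomerDemographic` frames.

```python
import pandas as pd

# Toy frames with made-up ids (hypothetical, not taken from the client data).
transactions = pd.DataFrame(
    {"transaction_id": [10, 11, 12], "customer_id": [1, 2, 5]}
)
demographic = pd.DataFrame({"customer_id": [1, 2, 3]})

# An outer merge with indicator=True labels each row by where its key was found.
check = pd.merge(
    transactions, demographic, on="customer_id", how="outer", indicator=True
)
match_counts = check["_merge"].value_counts()

# Keys present in one table only; an inner merge would drop these rows.
only_in_transactions = sorted(
    check.loc[check["_merge"] == "left_only", "customer_id"]
)
only_in_demographic = sorted(
    check.loc[check["_merge"] == "right_only", "customer_id"]
)

print(match_counts)
print("only in transactions:", only_in_transactions)
print("only in demographic:", only_in_demographic)
```

The same pattern applies to the `CustomerDemographic` and `CustomerAddress` merge; unmatched ids from either check are candidates for the list of issues to take back to the client.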

In [120]:
data.head(2)  

Unnamed: 0,transaction_id,product_id,customer_id,transaction_date,online_order,order_status,brand,product_line,product_class,product_size,list_price,standard_cost,product_first_sold_date,...,DOB,job_title,job_industry_category,wealth_segment,deceased_indicator,default,owns_car,tenure,address,postcode,state,country,property_valuation
0,1,2,2950,2017-02-25,0.0,Approved,Solex,Standard,medium,medium,71.49,53.62,41245.0,...,1955-01-11,Software Engineer I,Financial Services,Mass Customer,N,ã»(ï¿£âï¿£)ã»:*:,Yes,10.0,984 Hoepker Court,3064,VIC,Australia,6
1,11065,1,2950,2017-10-16,0.0,Approved,Giant Bicycles,Standard,medium,medium,1403.5,954.82,37659.0,...,1955-01-11,Software Engineer I,Financial Services,Mass Customer,N,ã»(ï¿£âï¿£)ã»:*:,Yes,10.0,984 Hoepker Court,3064,VIC,Australia,6
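In the output above, `product_first_sold_date` appears as numbers such as `41245.0` and `37659.0` rather than dates. One plausible reading, which is an assumption and should be confirmed with the client, is that these are Excel day serials. Under that assumption they can be converted as sketched here.

```python
import pandas as pd

# The two product_first_sold_date values shown in the output above.
serials = pd.Series([41245.0, 37659.0], name="product_first_sold_date")

# Assumption: the values are Excel day serials (1900 date system), for which
# 1899-12-30 is the conventional origin when converting with pandas.
first_sold = pd.to_datetime(serials, unit="D", origin="1899-12-30")

print(first_sold.dt.strftime("%Y-%m-%d").tolist())
```

If the converted dates look reasonable against `transaction_date`, the same call could be applied to the whole column; missing values would come through as `NaT`.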


In [122]:
# Print each column's name and dtype; head(0) returns an empty Series.
for column in data.columns:
    print(data[column].head(0))

Series([], Name: transaction_id, dtype: int64)
Series([], Name: product_id, dtype: int64)
Series([], Name: customer_id, dtype: int64)
Series([], Name: transaction_date, dtype: datetime64[ns])
Series([], Name: online_order, dtype: float64)
Series([], Name: order_status, dtype: object)
Series([], Name: brand, dtype: object)
Series([], Name: product_line, dtype: object)
Series([], Name: product_class, dtype: object)
Series([], Name: product_size, dtype: object)
Series([], Name: list_price, dtype: float64)
Series([], Name: standard_cost, dtype: float64)
Series([], Name: product_first_sold_date, dtype: float64)
Series([], Name: first_name, dtype: object)
Series([], Name: last_name, dtype: object)
Series([], Name: gender, dtype: object)
Series([], Name: past_3_years_bike_related_purchases, dtype: int64)
Series([], Name: DOB, dtype: datetime64[ns])
Series([], Name: job_title, dtype: object)
Series([], Name: job_industry_category, dtype: object)
Series([], Name: wealth_segment, dtype: object)
