### Prepping Data Challenge: Bike Sales Target (week 4)

This challenge focuses on joins.

#### Requirements:
 1. Input the file 
 2. Union the Stores data together 
 3. Remove any unnecessary data fields your Input step might create and rename the 'Table Names' as 'Store' 
 4. Pivot the product columns 
 5. Split the 'Customer Type - Product' field to create: 
     - Customer Type
     - Product
 6. Also rename the Values column resulting from your pivot as 'Products Sold'
 7. Turn the date into a 'Quarter' number 
 8. Sum up the products sold by Store and Quarter 
 9. Add the Targets data 
 10. Join the Targets data with the aggregated Stores data 
     - Note: this should give you 20 rows of data
 11. Remove any duplicate fields formed by the Join
 12. Calculate the Variance between each Store's Quarterly actual sales and the target. Call this field 'Variance to Target' 
 13. Rank the Stores based on the Variance to Target in each quarter 
     - The greater the variance the better the rank
 14. Output the data 

### 1 & 2. Input the file and Union the Stores data together

In [1]:
#import libraries
import pandas as pd

In [2]:
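# Open the workbook once so each sheet can be parsed from it
xlsx = pd.ExcelFile('WK4-Bike Sales Target.xlsx')

# The targets live on their own sheet
Target = xlsx.parse('Targets')

# Stack every store sheet, tagging each row with its sheet name as 'Store'
df = None
for sheet_name in [x for x in xlsx.sheet_names if x != 'Targets']:
    df1 = xlsx.parse(sheet_name)
    df1['Store'] = sheet_name
    df = pd.concat([df, df1])  # pd.concat skips the initial None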

In [3]:
df.head()

|   | Date | New - Saddles | New - Mudguards | New - Wheels | New - Bags | Existing - Saddles | Existing - Mudguards | Existing - Wheels | Existing - Bags | Store |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2021-01-21 | 13.0 | 42.0 | 19.0 | 38.0 | 17.0 | 48.0 | 19.0 | 13.0 | Manchester |
| 1 | 2021-02-21 | 1.0 | 9.0 | 14.0 | 6.0 | 2.0 | 4.0 | 19.0 | 24.0 | Manchester |
| 2 | 2021-03-21 | 8.0 | 22.0 | 6.0 | 35.0 | 0.0 | 48.0 | 17.0 | 16.0 | Manchester |
| 3 | 2021-04-21 | 3.0 | 9.0 | 8.0 | 16.0 | 18.0 | 50.0 | 18.0 | 25.0 | Manchester |
| 4 | 2021-05-21 | 2.0 | 8.0 | 5.0 | 34.0 | 17.0 | 3.0 | 12.0 | 19.0 | Manchester |
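The union step can also be written without the running `df` variable. The sketch below uses small in-memory frames as stand-ins for the store sheets; the `sheets` dict and its values are made up for illustration. With the real workbook, `pd.read_excel(path, sheet_name=None)` returns a dict of the same shape (sheet name to DataFrame), and the 'Targets' sheet would need to be left out first.

```python
import pandas as pd

# Hypothetical stand-ins for two store sheets (values are illustrative only).
sheets = {
    "Manchester": pd.DataFrame({"Date": ["2021-01-21"], "New - Saddles": [13]}),
    "London": pd.DataFrame({"Date": ["2021-01-21"], "New - Saddles": [7]}),
}

# Tag each frame with its sheet name, then stack them in a single concat call.
# ignore_index=True gives the result a fresh 0..n-1 index.
stores = pd.concat(
    [frame.assign(Store=name) for name, frame in sheets.items()],
    ignore_index=True,
)
print(stores)
```

Concatenating once at the end avoids copying the accumulated frame on every pass through the loop, which matters more as the number of sheets grows.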


### 4 & 6. Pivot the product columns and rename the resulting column as 'Products Sold'

In [4]:
df = pd.melt(df, id_vars=['Date','Store'], var_name='Customer', value_name='Product Sold')
#df
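To see what the pivot does to the shape of the data, here is a minimal sketch on a made-up two-row frame (the values and the choice of two product columns are illustrative only). Every column not listed in `id_vars` is turned into a pair of name and value.

```python
import pandas as pd

# Illustrative wide frame: one row per date and store, one column per
# customer type / product pair.
wide = pd.DataFrame({
    "Date": ["2021-01-21", "2021-02-21"],
    "Store": ["Manchester", "Manchester"],
    "New - Saddles": [13, 1],
    "Existing - Bags": [13, 24],
})

# id_vars are repeated on every output row; the remaining column names go
# into 'Customer' and their cell values into 'Product Sold'.
long_df = wide.melt(id_vars=["Date", "Store"],
                    var_name="Customer", value_name="Product Sold")
print(long_df)
```

Two input rows and two product columns give four output rows, so the real data should grow by a factor equal to the number of product columns.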

### 3 & 5. Remove any unnecessary data fields and split the 'Customer Type - Product' field to create: 
   - Customer Type
   - Product

In [5]:
# Split on ' - ' (with the spaces) so the new fields carry no stray whitespace
df[['Customer Type','Product']] = df['Customer'].str.split(' - ', expand=True)
#df

In [6]:
df.drop('Customer', axis = 'columns', inplace=True)
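The choice of delimiter matters for this split, because the column labels have a space on each side of the hyphen. A small sketch on two sample labels taken from the column names above shows the difference; the variable names are just for illustration.

```python
import pandas as pd

labels = pd.Series(["New - Saddles", "Existing - Bags"])

# Splitting on the bare hyphen keeps the surrounding spaces in each part.
loose = labels.str.split("-", expand=True)

# Splitting on ' - ' (space, hyphen, space) returns clean labels.
clean = labels.str.split(" - ", expand=True)
clean.columns = ["Customer Type", "Product"]

print(repr(loose.iloc[0, 0]), repr(loose.iloc[0, 1]))
print(repr(clean.iloc[0, 0]), repr(clean.iloc[0, 1]))
```

Stray whitespace does not affect the Store and Quarter totals in this challenge, but it would break any later filter or join on 'Customer Type' or 'Product'.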

### 7. Turn the date into a 'Quarter' number 

In [7]:
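# The dates come out of Excel as ISO-style values (see df.head() above),
# so no explicit day/month/year format string is needed here
df['Date'] = pd.to_datetime(df['Date'])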
df['Quarter'] = df['Date'].dt.quarter
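As a quick check of the quarter logic, here is a minimal sketch on a few hand-picked dates. The second part assumes a hypothetical source where dates arrive as day-first text, which is the case where an explicit `format` is worth passing.

```python
import pandas as pd

# ISO-style dates parse without a format string.
dates = pd.Series(pd.to_datetime(["2021-01-21", "2021-04-21", "2021-12-21"]))
quarters = dates.dt.quarter
print(quarters.tolist())  # [1, 2, 4]

# Hypothetical day-first text dates: state the format so 03/04 is read as 3 April.
text_dates = pd.Series(["21/01/2021", "03/04/2021"])
parsed = pd.to_datetime(text_dates, format="%d/%m/%Y")
print(parsed.dt.quarter.tolist())  # [1, 2]
```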

### 8. Sum up the products sold by Store and Quarter

In [8]:
df_sum = df.groupby(['Store','Quarter'])['Product Sold'].sum().reset_index()
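The same aggregation on a tiny made-up frame, as a sketch of what the grouped sum returns. It uses `as_index=False`, which keeps the grouping keys as ordinary columns and so plays the same role as the trailing `reset_index()`.

```python
import pandas as pd

# Illustrative sales rows (values are made up).
sales = pd.DataFrame({
    "Store": ["York", "York", "York", "Leeds"],
    "Quarter": [1, 1, 2, 1],
    "Product Sold": [10, 5, 7, 3],
})

# One output row per (Store, Quarter) pair, sorted by the grouping keys.
totals = sales.groupby(["Store", "Quarter"], as_index=False)["Product Sold"].sum()
print(totals)
```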

### 9 & 10. Add the Targets data and Join the Targets data with the aggregated Stores data 
   - Note: this should give you 20 rows of data

In [9]:
final_output = df_sum.merge(Target, how = 'left', on=['Store','Quarter'])
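One way to confirm the join behaves as expected (one target per store and quarter, no rows gained or lost) is to let `merge` check it. The sketch below uses small illustrative frames, not the real data; `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Illustrative actuals and targets for two stores and two quarters.
actuals = pd.DataFrame({
    "Store": ["Leeds", "Leeds", "York", "York"],
    "Quarter": [1, 2, 1, 2],
    "Product Sold": [488, 510, 499, 470],
})
targets = pd.DataFrame({
    "Store": ["Leeds", "Leeds", "York", "York"],
    "Quarter": [1, 2, 1, 2],
    "Target": [490, 500, 490, 480],
})

# validate='one_to_one' raises MergeError if either side repeats a key;
# indicator=True adds a '_merge' column showing where each row came from.
joined = actuals.merge(targets, how="left", on=["Store", "Quarter"],
                       validate="one_to_one", indicator=True)

# Every actuals row should have found a target, and no rows should be added.
assert len(joined) == len(actuals)
assert (joined["_merge"] == "both").all()
print(joined.drop(columns="_merge"))
```

On the challenge data, the equivalent check is that the joined frame has 20 rows.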

### 11. Remove any duplicate fields formed by the Join

In [10]:
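# Merging on the shared key names ('Store', 'Quarter') should not create
# duplicate columns, so there are no fields to remove here. Note that
# drop_duplicates() removes duplicate rows, not columns; it is kept as a safety check.
final_output = final_output.drop_duplicates()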

### 12. Calculate the Variance between each Store's Quarterly actual sales and the target. Call this field 'Variance to Target' 

In [11]:
final_output['Variance to Target'] = final_output['Product Sold'] - final_output['Target']

### 13. Rank the Stores based on the Variance to Target in each quarter 
   - The greater the variance the better the rank

In [12]:
final_output['Rank'] = final_output.groupby(['Quarter'])['Variance to Target'].rank(ascending=False).astype(int)
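The ranking relies on there being no ties within a quarter. With the default `method='average'`, tied stores get a fractional rank such as 1.5, and casting that with `astype(int)` would truncate it. The sketch below, on made-up variances with a deliberate tie in quarter 2, compares the default with `method='min'`.

```python
import pandas as pd

scores = pd.DataFrame({
    "Quarter": [1, 1, 1, 2, 2, 2],
    "Store": ["York", "Leeds", "London", "York", "Leeds", "London"],
    "Variance to Target": [9, -2, -50, 5, 5, -1],  # Q2 has a deliberate tie
})

grouped = scores.groupby("Quarter")["Variance to Target"]

# Default method='average': the tied Q2 stores both get 1.5.
avg_rank = grouped.rank(ascending=False)

# method='min': tied stores share the best rank (1), the next store gets 3,
# and the values are whole numbers before the cast to int.
min_rank = grouped.rank(ascending=False, method="min").astype(int)

scores["Rank (average)"] = avg_rank
scores["Rank (min)"] = min_rank
print(scores)
```

Whether tied stores should share a rank is a judgment call the challenge does not settle; `method='min'` is just one reasonable option.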

In [13]:
final_output = final_output.sort_values(by = ['Quarter','Rank'])[['Quarter','Rank', 'Store', 'Product Sold', 'Target', 
                                                                  'Variance to Target']]
final_output.head()

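|   | Quarter | Rank | Store | Product Sold | Target | Variance to Target |
|---|---|---|---|---|---|---|
| 16 | 1 | 1 | York | 499.0 | 490.0 | 9.0 |
| 0 | 1 | 2 | Birmingham | 477.0 | 475.0 | 2.0 |
| 4 | 1 | 3 | Leeds | 488.0 | 490.0 | -2.0 |
| 12 | 1 | 4 | Manchester | 440.0 | 475.0 | -35.0 |
| 8 | 1 | 5 | London | 425.0 | 475.0 | -50.0 |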


###  14. Output the data

In [14]:
final_output.to_csv('WK4-Bike Sales Target Output.csv', index=False)