### Prepping Data Challenge: C&BSCo Actual Sales Values (Week 30)
 
### Requirements
- Input the 'Top 3 Sales People per Store' for both regions: East & West
- Combine these files
  - Bonus challenge for experienced Preppers: take the Region name from the file name. Newer Preppers can use the Region field from the Week 27 Input later in the challenge
- Input the 'Store Lookup' file to provide the name of the Stores instead of the ID number
- Remove any duplicate fields you have in the data set so far
- Input the Week 27 Input file
- Use Week 27 Input file to create Sales Values for each Store
- Combine this data with the rest of the prepared data
- Use the data set you have created to determine the actual sales value (rather than percentage) for each sales person
  - Multiply each Sales Person's percentage contribution by their Store's total sales for the year
- Output the data (removing any remaining duplicated fields)
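
The final calculation step can be checked by hand on a single store. The sketch below is a minimal illustration, not the notebook's own pipeline: it uses the Lewisham figures that appear in the final output table (store total 114420.21 and the three percentages), and the frame names `people` and `store_totals` are placeholders chosen for this example.

```python
import pandas as pd

# Lewisham figures copied from the notebook's final output table
people = pd.DataFrame({
    "Store Name": ["Lewisham"] * 3,
    "Sales Person": ["ML", "PR", "JK"],
    "Percent of Store Sales": [26, 21, 17],
})
store_totals = pd.DataFrame({"Store Name": ["Lewisham"], "Sale Value": [114420.21]})

# Attach the store total to every sales person, then convert percent to value
merged = people.merge(store_totals, on="Store Name", how="left")
merged["Sales per Person"] = (
    merged["Sale Value"] * merged["Percent of Store Sales"] / 100
)
print(merged[["Sales Person", "Sales per Person"]].round(2))
```

The percentages are whole numbers (26 rather than 0.26), which is why the division by 100 is needed.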

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
#Input the 'Top 3 Sales People per Store' for both regions: East & West
#Combine these files
all_file = ['wk30 - Top 3 Sales People per Store (East).csv','wk30 - Top 3 Sales People per Store (West).csv']
df1 = pd.concat([pd.read_csv(X).assign(Region=os.path.basename(X).split('.')[0]) for X in all_file])
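#Keep only the text inside the brackets; raw string avoids an invalid-escape warning, expand=False returns a Series
df1['Region'] = df1['Region'].str.extract(r'\((.*)\)', expand=False)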
#Input the 'Store Lookup' file 
df2 = pd.read_csv('wk30 - Store Lookup.csv')
#Input the Week 27 Input file
df3 = pd.read_csv('wk27-input.csv', parse_dates=['Sale Date'], dayfirst=True)
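
The region extraction for the bonus challenge can be tried without reading any files. This is a sketch under one assumption: the region is the text in the last pair of brackets at the end of the file name. The helper `region_from_filename` is hypothetical and is not part of the notebook.

```python
import os
import re

def region_from_filename(path):
    """Return the text inside the trailing brackets of the file name, or None."""
    # Drop any directory and the extension, e.g. '... per Store (East)'
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.search(r"\(([^()]*)\)$", stem)
    return match.group(1) if match else None

files = [
    "wk30 - Top 3 Sales People per Store (East).csv",
    "wk30 - Top 3 Sales People per Store (West).csv",
]
regions = [region_from_filename(f) for f in files]
print(regions)
```

Returning `None` for names without brackets makes a badly named file show up as a missing Region instead of a wrong one.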

In [3]:
df1.head()

Unnamed: 0,Store,Sales Person,Percent of Store Sales,Region
0,1,ML,26,East
1,1,PR,21,East
2,1,JK,17,East
3,3,CA,21,East
4,3,JM,15,East


In [4]:
df2.head()

Unnamed: 0,StoreID,Store Name,Region
0,1,Lewisham,East
1,2,Wimbledon,West
2,3,Dulwich,East
3,4,Chelsea,West
4,5,Shoreditch,East


In [5]:
df3.head()

Unnamed: 0,Sale Date,Order ID,Sale Value,Product Name,Store Name,Region,Scent Name
0,2022-12-12,937,109.84,Liquid - 25ml,Lewisham,East,Rose
1,2022-10-14,427,207.61,Liquid - 25ml,Lewisham,East,Rose
2,2022-09-09,135,111.96,Liquid - 25ml,Lewisham,East,Rose
3,2022-12-11,791,170.68,Liquid - 25ml,Wimbledon,West,Rose
4,2022-09-08,270,214.12,Liquid - 25ml,Wimbledon,West,Rose


In [6]:
#Input the 'Store Lookup' file to provide the name of the Stores instead of the ID number
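#Drop both ID columns after the join: 'Store' and 'StoreID' hold the same value and 'Store Name' replaces them
df = pd.merge(df1, df2, how = 'left', left_on = ['Store','Region'], right_on = ['StoreID','Region'])\
       .drop(columns = ['Store', 'StoreID'])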
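
A left join silently leaves `Store Name` empty for any store that is missing from the lookup. One way to surface that is sketched below on made-up rows, using the `validate` and `indicator` options of `DataFrame.merge`. Store ID 99 and sales person `ZZ` are invented for this example and do not come from the challenge data.

```python
import pandas as pd

people = pd.DataFrame({
    "Store": [1, 1, 3, 99],  # 99 is a made-up ID with no lookup entry
    "Sales Person": ["ML", "PR", "CA", "ZZ"],
    "Region": ["East", "East", "East", "East"],
})
lookup = pd.DataFrame({
    "StoreID": [1, 2, 3],
    "Store Name": ["Lewisham", "Wimbledon", "Dulwich"],
    "Region": ["East", "West", "East"],
})

checked = people.merge(
    lookup,
    how="left",
    left_on=["Store", "Region"],
    right_on=["StoreID", "Region"],
    validate="many_to_one",  # raises MergeError if a lookup key is duplicated
    indicator=True,          # adds a _merge column: 'both' or 'left_only'
)
# Rows marked 'left_only' found no matching store in the lookup
unmatched = checked.loc[checked["_merge"] == "left_only", "Store"].tolist()
print(unmatched)
```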

In [7]:
#df.head(10)

In [8]:
#Use Week 27 Input file to create Sales Values for each Store
df4 = df3.groupby(['Store Name', 'Region'])['Sale Value'].sum().reset_index()
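
To see what the aggregation produces, the sketch below runs the same kind of group-and-sum on the five rows shown in the `df3.head()` output above. The totals are therefore for those five rows only, not for the full year. Using `as_index=False` is an alternative to `reset_index()` and is a choice made for this example.

```python
import pandas as pd

# The five rows displayed by df3.head(), reduced to the columns used here
sales = pd.DataFrame({
    "Store Name": ["Lewisham", "Lewisham", "Lewisham", "Wimbledon", "Wimbledon"],
    "Region": ["East", "East", "East", "West", "West"],
    "Sale Value": [109.84, 207.61, 111.96, 170.68, 214.12],
})

# One row per store and region, with Sale Value summed
totals = sales.groupby(["Store Name", "Region"], as_index=False)["Sale Value"].sum()
print(totals.round(2))
```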

In [9]:
#df4.head()

In [10]:
#Combine this data with the rest of the prepared data
df = pd.merge(df, df4, how = 'left', on = ['Store Name','Region'])

In [11]:
#Use the data set you have created to determine the actual sales value (rather than percentage) for each sales person
df['Sales per Person'] = df['Sale Value'] * (df['Percent of Store Sales']/100)

In [12]:
output = df[['Sales per Person','Store Name','Region','Sales Person','Percent of Store Sales','Sale Value']]

In [13]:
output.head(10)

Unnamed: 0,Sales per Person,Store Name,Region,Sales Person,Percent of Store Sales,Sale Value
0,29749.2546,Lewisham,East,ML,26,114420.21
1,24028.2441,Lewisham,East,PR,21,114420.21
2,19451.4357,Lewisham,East,JK,17,114420.21
3,22388.8959,Dulwich,East,CA,21,106613.79
4,15992.0685,Dulwich,East,JM,15,106613.79
5,12793.6548,Dulwich,East,TP,12,106613.79
6,39349.08,Shoreditch,East,JA,40,98372.7
7,14755.905,Shoreditch,East,OF,15,98372.7
8,8853.543,Shoreditch,East,TF,9,98372.7
9,15097.8352,Wimbledon,West,YP,14,107841.68


In [14]:
#Output data
output.to_csv('wk30-output.csv', index=False)