### Prepping Data Challenge: C&BSCo Meeting Targets? (Week 29)
 
### Requirements
- Input both data sets
- Remove unnecessary values from the Product Name field to just leave the Product Type
- Total Sales for each Store and Product Type
- Change the Targets data set into three columns
  - Product
  - Store
  - Sales Target (k's)
- Multiply the Sales Target (k's) by 1000 to create the full sales target number (i.e. 75 becomes 75000)
- Prepare your data sets for joining together by choosing your next step:
  - Easy - make your Sales input Product Type and Store name UPPER CASE
  - Hard - make your Targets' Store and Product fields TitleCase
- Join the data sets together and remove any duplicated fields
- Calculate whether each product in each store beats the target
- Output the results

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Input both data sets
df1 = pd.read_csv('wk29 input1.csv', parse_dates = ["Sale Date"], dayfirst=True )
df2 = pd.read_csv('wk29 input2.csv' )
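
The `dayfirst=True` argument matters for ambiguous dates such as 08/09/2022. A minimal sketch of its effect, assuming the input file stores dates as day/month/year (the raw file layout is not shown in this notebook, so the CSV text here is made up):

```python
import io
import pandas as pd

# Made-up two-row CSV; the day/month/year layout is an assumption about the input file
csv_text = "Sale Date,Sale Value\n08/09/2022,214.12\n14/10/2022,207.61\n"

# dayfirst=True reads 08/09/2022 as 8 September rather than 9 August
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Sale Date"], dayfirst=True)

print(sample["Sale Date"].dt.strftime("%Y-%m-%d").tolist())
```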

In [3]:
df1.head()

Unnamed: 0,Sale Date,Order ID,Sale Value,Product Name,Store Name,Region,Scent Name
0,2022-12-12,937,109.84,Liquid - 25ml,Lewisham,East,Rose
1,2022-10-14,427,207.61,Liquid - 25ml,Lewisham,East,Rose
2,2022-09-09,135,111.96,Liquid - 25ml,Lewisham,East,Rose
3,2022-12-11,791,170.68,Liquid - 25ml,Wimbledon,West,Rose
4,2022-09-08,270,214.12,Liquid - 25ml,Wimbledon,West,Rose


In [4]:
#Remove unnecessary values from the Product Name field to just leave the Product Type
# Raw string avoids invalid-escape warnings; expand=False returns a Series
df1['Product Name'] = df1['Product Name'].str.extract(r'(.*)\s-', expand=False)
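
To see what the pattern captures, here is a small sketch on standalone values. Only "Liquid - 25ml" appears in the data preview above; the other two strings are hypothetical values in the same "Type - Size" shape.

```python
import pandas as pd

# "Liquid - 25ml" is from the preview; the other values are hypothetical
names = pd.Series(["Liquid - 25ml", "Bar - 100g", "Liquid - 1L"])

# Capture everything before the whitespace-and-hyphen separator.
# expand=False returns a Series instead of a one-column DataFrame.
product_type = names.str.extract(r"(.*)\s-", expand=False)

print(product_type.tolist())
```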

In [5]:
#Total Sales for each Store and Product Type
df1['Total Sales'] = df1.groupby(['Store Name','Product Name'])['Sale Value'].transform('sum')
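
`transform('sum')` keeps every sale row and repeats the group total on each one, which is why duplicates are dropped later on. A rough sketch contrasting it with a plain aggregation, using made-up sale values:

```python
import pandas as pd

# Made-up miniature of the sales data
sales = pd.DataFrame({
    "Store Name": ["Lewisham", "Lewisham", "Wimbledon"],
    "Product Name": ["Liquid", "Liquid", "Liquid"],
    "Sale Value": [100.0, 50.0, 70.0],
})
keys = ["Store Name", "Product Name"]

# transform: same number of rows as the input, group total repeated per row
sales["Total Sales"] = sales.groupby(keys)["Sale Value"].transform("sum")

# aggregation: one row per store and product
totals = sales.groupby(keys, as_index=False)["Sale Value"].sum()

print(sales)
print(totals)
```

The aggregated form would remove the need for `drop_duplicates` later, at the cost of losing row-level columns such as Region unless they are added to the grouping keys.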

In [6]:
df1.head()

Unnamed: 0,Sale Date,Order ID,Sale Value,Product Name,Store Name,Region,Scent Name,Total Sales
0,2022-12-12,937,109.84,Liquid,Lewisham,East,Rose,78734.58
1,2022-10-14,427,207.61,Liquid,Lewisham,East,Rose,78734.58
2,2022-09-09,135,111.96,Liquid,Lewisham,East,Rose,78734.58
3,2022-12-11,791,170.68,Liquid,Wimbledon,West,Rose,72279.03
4,2022-09-08,270,214.12,Liquid,Wimbledon,West,Rose,72279.03


In [7]:
df2.head()

Unnamed: 0,PRODUCT,CHELSEA,DULWICH,LEWISHAM,NOTTING HILL,SHOREDITCH,WIMBLEDON
0,BAR,25,30,35,30,30,35
1,LIQUID,60,75,75,65,70,70


In [8]:
#Change the Targets data set into three columns
target = pd.melt(df2, id_vars=['PRODUCT'], var_name='Store', value_name="Target")

In [9]:
# Multiply the Sales Target (k's) by 1000 to create the full sales target number (i.e. 75 becomes 75000)
target["Target"] = target["Target"] * 1000
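
The reshape and scaling steps can be sketched on a two-store excerpt of the targets table shown above. The variable names `wide` and `long_targets` are illustrative only.

```python
import pandas as pd

# Two-store excerpt of the targets table
wide = pd.DataFrame({
    "PRODUCT": ["BAR", "LIQUID"],
    "CHELSEA": [25, 60],
    "DULWICH": [30, 75],
})

# Each store column becomes a (Store, Target) pair per product
long_targets = pd.melt(wide, id_vars=["PRODUCT"], var_name="Store", value_name="Target")

# Targets are given in thousands
long_targets["Target"] = long_targets["Target"] * 1000

print(long_targets)
```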

In [10]:
target.head()

Unnamed: 0,PRODUCT,Store,Target
0,BAR,CHELSEA,25000
1,LIQUID,CHELSEA,60000
2,BAR,DULWICH,30000
3,LIQUID,DULWICH,75000
4,BAR,LEWISHAM,35000


In [11]:
# Prepare the data sets for joining (Hard option): make the Targets' Store and Product fields title case
target['PRODUCT'] = target['PRODUCT'].str.title()
target['Store'] = target['Store'].str.title()
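
A quick check of `str.title()` on store names from the targets table shows that multi-word names are handled word by word, so they line up with the sales data's spelling:

```python
import pandas as pd

# Store names as they appear in the targets header
stores = pd.Series(["CHELSEA", "NOTTING HILL", "WIMBLEDON"])

# Each word is capitalised and the remaining letters lower-cased
titled = stores.str.title()

print(titled.tolist())
```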

In [12]:
#Join the data sets together and remove any duplicated fields
df = df1.merge(target, how='left', left_on = ['Product Name','Store Name'], right_on = ['PRODUCT','Store'])
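
A left join silently leaves `NaN` targets when a key fails to match. One way to check for that, sketched here as an optional addition rather than part of the original solution, is to pass `indicator=True` and `validate="many_to_one"`. The "Camden" row is a made-up store with no target, included only to show an unmatched key.

```python
import pandas as pd

sales = pd.DataFrame({
    "Product Name": ["Liquid", "Bar", "Liquid"],
    "Store Name": ["Lewisham", "Lewisham", "Camden"],  # "Camden" is hypothetical
    "Total Sales": [78734.58, 35685.63, 1000.0],
})
target = pd.DataFrame({
    "PRODUCT": ["Liquid", "Bar"],
    "Store": ["Lewisham", "Lewisham"],
    "Target": [75000, 35000],
})

merged = sales.merge(
    target,
    how="left",
    left_on=["Product Name", "Store Name"],
    right_on=["PRODUCT", "Store"],
    validate="many_to_one",  # raises if a product/store pair has more than one target
    indicator=True,          # adds a _merge column describing each row's match status
)

# Rows from the sales side that found no target
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Product Name", "Store Name"]])
```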

In [13]:
df.head()

Unnamed: 0,Sale Date,Order ID,Sale Value,Product Name,Store Name,Region,Scent Name,Total Sales,PRODUCT,Store,Target
0,2022-12-12,937,109.84,Liquid,Lewisham,East,Rose,78734.58,Liquid,Lewisham,75000
1,2022-10-14,427,207.61,Liquid,Lewisham,East,Rose,78734.58,Liquid,Lewisham,75000
2,2022-09-09,135,111.96,Liquid,Lewisham,East,Rose,78734.58,Liquid,Lewisham,75000
3,2022-12-11,791,170.68,Liquid,Wimbledon,West,Rose,72279.03,Liquid,Wimbledon,70000
4,2022-09-08,270,214.12,Liquid,Wimbledon,West,Rose,72279.03,Liquid,Wimbledon,70000


In [14]:
#Calculate whether each product in each store beats the target
df["Beats Target?"] = np.where(df["Total Sales"] > df["Target"], "TRUE", "FALSE")
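
Note that `np.where` with string labels stores text, not booleans. If a real boolean column is preferred, the comparison can be used directly. A small sketch of both forms, using the Lewisham and Chelsea Liquid figures from this notebook:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Total Sales": [78734.58, 59640.50],
    "Target": [75000, 60000],
})

# String labels, as in the cell above
as_text = np.where(demo["Total Sales"] > demo["Target"], "TRUE", "FALSE")

# Boolean Series; a missing target (NaN) would compare as False
as_bool = demo["Total Sales"] > demo["Target"]

print(as_text.tolist(), as_bool.tolist())
```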

In [15]:
output = df[["Beats Target?","Target",'Store Name','Region','Total Sales','PRODUCT']]
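
A column selection like this can trigger pandas' `SettingWithCopyWarning` if it is later modified in place. One common way around it, sketched here on made-up rows, is to take an explicit copy and reassign the result of `drop_duplicates`:

```python
import pandas as pd

# Made-up rows; the first two share the same store, product and total
demo = pd.DataFrame({
    "Store Name": ["Lewisham", "Lewisham", "Chelsea"],
    "PRODUCT": ["Liquid", "Liquid", "Liquid"],
    "Total Sales": [78734.58, 78734.58, 59640.50],
    "Order ID": [937, 427, 270],
})

# .copy() makes the selection an independent DataFrame
selected = demo[["Store Name", "PRODUCT", "Total Sales"]].copy()

# Reassigning the result leaves nothing modified in place
selected = selected.drop_duplicates(keep="first")

print(selected)
```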

In [16]:
# Reassign rather than drop in place, so no SettingWithCopyWarning is raised
output = output.drop_duplicates(keep='first')


In [17]:
output

Unnamed: 0,Beats Target?,Target,Store Name,Region,Total Sales,PRODUCT
0,True,75000,Lewisham,East,78734.58,Liquid
3,True,70000,Wimbledon,West,72279.03,Liquid
5,True,75000,Dulwich,East,76457.58,Liquid
14,False,60000,Chelsea,West,59640.5,Liquid
18,False,70000,Shoreditch,East,68881.38,Liquid
27,True,65000,Notting Hill,West,67772.14,Liquid
2832,True,35000,Lewisham,East,35685.63,Bar
2843,True,35000,Wimbledon,West,35562.65,Bar
2851,True,30000,Dulwich,East,30156.21,Bar
2854,True,25000,Chelsea,West,29245.81,Bar


In [18]:
#Output data
output.to_csv('wk29-output.csv', index=False)