### Prepping Data Challenge: C&BSCo Good Sales but Wrong Sizes (Week 19)
The data team have raised an issue that sales are being made but the products sold are being recorded poorly. Each product sold is recorded but it seems the sales team in each store has been wrongly recording the size of the product. 

Each product in our range has a set size. Can you help prepare the data to show how bad the issue is?

### Requirements
- Input all three sheets of data
- Change the Size ID to an actual Size value in the Sales table
- Link the Product Code to the Sales Table to provide the Scent information
- Create an Output that contains the products sold that have the sizes recorded correctly (Output 1)
- Create another data set that contains all the Products sold with the incorrect sizes and what the sizes should have been
   - Aggregate this data to show each Product Sold, the Scent and the Size it should be with how many sales have incorrectly been recorded for each. 
   - Output this data (Output 2)

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Input all three sheets of data.
with pd.ExcelFile("PD 2022 Wk 19 Input.xlsx") as xlsx:
    product = pd.read_excel(xlsx, 'Product Set')
    sales = pd.read_excel(xlsx, 'Sales')
    size = pd.read_excel(xlsx, 'Size Table')

In [3]:
sales.head()

Unnamed: 0,Product,Store,Size
0,B25,Lewisham,4.0
1,L32,Wimbledon,7.0
2,L2,Wimbledon,5.0
3,B18,Wimbledon,2.0
4,L12,Lewisham,3.0


In [4]:
product.tail()

Unnamed: 0,Product Code,Size,Scent
65,LS31,Multi-Size,Coffee
66,LS32,Multi-Size,Lemongrass
67,LS33,Multi-Size,Strawberry
68,LS34,Multi-Size,Lavender
69,LS35,Multi-Size,Rose


In [5]:
size.head(7)

Unnamed: 0,Size,Size ID
0,25.0,1.0
1,50.0,2.0
2,100.0,3.0
3,250.0,4.0
4,500.0,5.0
5,1000.0,6.0
6,Multi-Size,7.0


In [6]:
#Change the Size ID to an actual Size value in the Sales table
size_dict = dict(zip(size['Size ID'], size["Size"]))
sales['Product Size'] = sales['Size'].map(size_dict)

In [7]:
#Correct the product code
product['Re_Prod_code'] = product['Product Code'].str.replace("S",'', regex=True)

In [8]:
#Link the Product Code to the Sales Table to provide the Scent information
df = pd.merge(sales, product, left_on='Product', right_on='Re_Prod_code', how='left').drop(['Product Code','Size_x'], axis=1)\
           .rename(columns={'Size_y':'Product Size2'})

In [9]:
df['verify'] = np.where(df["Product Size"] == df['Product Size2'], "correct", "False")
output1 = df.loc[(df['verify'] == "correct")]

In [10]:
#Create an Output that contains the products sold that have the sizes recorded correctly
output1 = output1[['Product Size','Scent','Product','Store']]
output1.head(10)

Unnamed: 0,Product Size,Scent,Product,Store
1,Multi-Size,Lemongrass,L32,Wimbledon
4,100.0,Lemongrass,L12,Lewisham
7,Multi-Size,Coffee,L31,Lewisham
8,Multi-Size,Lavender,B34,Lewisham
10,500.0,Lemon,B22,Wimbledon
12,25.0,Lemon,B2,Lewisham
20,500.0,Rose,L25,Wimbledon
37,500.0,Mint,B21,Wimbledon
51,250.0,Lemongrass,L17,Lewisham
52,250.0,Mint,B16,Wimbledon


In [11]:
#output the data (output 1)
output1.to_excel('wk19-output1.xlsx', index=False)

In [12]:
#Create another data set that contains all the Products sold with the incorrect sizes and what the sizes should have been
output2 = df.loc[(df['verify'] == "False")]
output2 = output2.drop(['Product','Product Size'], axis=1)\
                 .rename(columns={'Product Size2':"Product Size (based on the product list)",
                                  'Re_Prod_code':'Product Code'})

In [13]:
output2.head()

Unnamed: 0,Store,Product Size (based on the product list),Scent,Product Code,verify
0,Lewisham,500.0,Raspberry,B25,False
2,Wimbledon,25.0,Lemongrass,L2,False
3,Wimbledon,250.0,Lime,B18,False
5,Wimbledon,1000.0,Rose,L30,False
6,Lewisham,25.0,Lemon,B2,False


In [14]:
re_size_dict = dict(zip(size['Size'], size["Size ID"]))
output2['Number of Sales with wrong size'] = output2["Product Size (based on the product list)"].map(re_size_dict)

In [15]:
output2 = output2[['Number of Sales with wrong size','Product Code',"Product Size (based on the product list)",'Scent']]

In [16]:
output2.head(10)

Unnamed: 0,Number of Sales with wrong size,Product Code,Product Size (based on the product list),Scent
0,5.0,B25,500.0,Raspberry
2,1.0,L2,25.0,Lemongrass
3,4.0,B18,250.0,Lime
5,6.0,L30,1000.0,Rose
6,1.0,B2,25.0,Lemon
9,2.0,B10,50.0,Raspberry
11,2.0,L10,50.0,Rose
13,4.0,L19,250.0,Lavender
14,6.0,B30,1000.0,Raspberry
15,7.0,L31,Multi-Size,Coffee


In [17]:
#output the data (output2)
output2.to_excel('wk19-output2.xlsx', index=False)