### Prepping Data Challenge: Cocktail Dataset (week 11)


#### Requirements:

1. Input the dataset
2. Split out the recipes into the different ingredients and their measurements
3. Calculate the price in pounds for the required measurement of each ingredient
4. Join the ingredient costs to their respective cocktails
5. Find the total cost of each cocktail
6. Include a calculated field for the profit margin, i.e. the difference between each cocktail's price and its overall cost
7. Round all numeric fields to 2 decimal places
8. Output the data

### 1. Input the data

```python
# Import libraries
import pandas as pd
```

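```python
# Load all three sheets from the workbook
with pd.ExcelFile('WK11-Cocktails Dataset.xlsx') as xlsx:
    cocktail = pd.read_excel(xlsx, 'Cocktails', index_col='Cocktail')
    sourcing = pd.read_excel(xlsx, 'Sourcing')
    # read_excel no longer accepts squeeze=True (removed in pandas 2.0), so squeeze
    # the single remaining column into a Series explicitly, then build a dict
    conversion = (pd.read_excel(xlsx, 'Conversion Rates', index_col='Currency')
                  .squeeze('columns')
                  .to_dict())

# Optional: inspect the inputs
# cocktail.head()
# sourcing.head()
# conversion
```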
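The conversion sheet holds a currency column and a single rate column, so it can be collapsed into a dictionary keyed by currency. Below is a minimal sketch of that step on an in-memory frame. It uses `DataFrame.squeeze('columns')`, which still works in pandas 2.0, where `read_excel`'s `squeeze=True` argument no longer exists. The column name `Conversion Rate` and the rate values are hypothetical stand-ins for the real sheet.

```python
import pandas as pd

# Hypothetical stand-in for the 'Conversion Rates' sheet (made-up rates)
rates = pd.DataFrame({
    'Currency': ['USD', 'EUR'],
    'Conversion Rate': [1.25, 1.15],
})

# With Currency as the index only one data column is left;
# squeeze('columns') turns that one-column frame into a Series keyed by currency
conversion = rates.set_index('Currency').squeeze('columns').to_dict()
print(conversion)
```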

### 2. Split out the recipes into the different ingredients and their measurements

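```python
# Each recipe is a '; '-separated list of 'Ingredient:NNml' entries:
# split it, give each entry its own row, then pull out the name and volume
regex = r'(?P<Ingredient>.+):(?P<ml>\d+)ml'
recipe = (cocktail['Recipe (ml)']
          .str.split('; ')
          .explode()
          .str.extract(regex))
recipe['ml'] = recipe['ml'].astype(float)

# recipe.head()
```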
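To make the split / explode / extract chain concrete, here is a small sketch on two hypothetical recipes written in the same `Ingredient:NNml; Ingredient:NNml` shape that the regex expects. The drink names and volumes are made up for illustration.

```python
import pandas as pd

# Hypothetical recipes, indexed by cocktail name like the real sheet
cocktail = pd.DataFrame(
    {'Recipe (ml)': ['Vodka:50ml; Lime Juice:25ml', 'Gin:40ml']},
    index=pd.Index(['Drink A', 'Drink B'], name='Cocktail'),
)

regex = r'(?P<Ingredient>.+):(?P<ml>\d+)ml'
recipe = (
    cocktail['Recipe (ml)']
    .str.split('; ')      # one list of entries per cocktail
    .explode()            # one row per entry; the cocktail name stays in the index
    .str.extract(regex)   # named groups become the Ingredient and ml columns
)
recipe['ml'] = recipe['ml'].astype(float)
print(recipe)
```

Because `explode` repeats the original index for each element, every ingredient row still carries its cocktail name. That is what lets the later `reset_index()` turn it back into a joinable column.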

### 3. Calculate the price in pounds for the required measurement of each ingredient

```python
# Convert each bottle price to pounds by dividing by the rate for its currency
sourcing['price'] = (sourcing['Price'].astype(float)
                     / sourcing['Currency'].map(conversion).astype(float))
```
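The list comprehension can also be written as a vectorised lookup with `Series.map`. The sketch below assumes, as the division in this step implies, that each rate is expressed as units of foreign currency per £1. The prices and rates are hypothetical and were chosen so the results are exact.

```python
import pandas as pd

# Hypothetical bottle prices in mixed currencies, with made-up rates
sourcing = pd.DataFrame({
    'Ingredient': ['Vodka', 'Gin'],
    'Price': [20.0, 30.0],
    'Currency': ['USD', 'EUR'],
})
conversion = {'USD': 1.25, 'EUR': 1.5}

# Look up each row's rate, then divide to get a price in pounds
sourcing['price'] = sourcing['Price'] / sourcing['Currency'].map(conversion)

# map() yields NaN for any currency missing from the dict, so check for gaps
assert sourcing['price'].notna().all(), 'unmapped currency found'
print(sourcing)
```

A missing currency raises a `KeyError` in the list-comprehension version. With `map` it silently becomes `NaN`, which is why the explicit check is there.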

### 4. Join the ingredient costs to their respective cocktails

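```python
# Move Cocktail from the index back into a column, then attach
# each ingredient's bottle size and price in pounds
recipe = recipe.reset_index().merge(
    sourcing[['Ingredient', 'ml per Bottle', 'price']], on='Ingredient', how='left'
)
```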
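A left join keeps every recipe line even when an ingredient has no match in the sourcing table, and such lines would later produce `NaN` costs. One way to catch that is to use `merge`'s `indicator=True` and `validate='many_to_one'` options. The sketch below shows this on hypothetical data in which one ingredient is deliberately missing from the price list.

```python
import pandas as pd

recipe = pd.DataFrame({
    'Cocktail': ['Drink A', 'Drink A', 'Drink B'],
    'Ingredient': ['Vodka', 'Lime Juice', 'Gin'],
    'ml': [50.0, 25.0, 40.0],
})
# Hypothetical price list with no entry for Lime Juice
sourcing = pd.DataFrame({
    'Ingredient': ['Vodka', 'Gin'],
    'ml per Bottle': [700, 700],
    'price': [16.0, 20.0],
})

# validate= raises if an ingredient appears twice in sourcing;
# indicator= adds a _merge column describing where each row matched
merged = recipe.merge(sourcing, on='Ingredient', how='left',
                      indicator=True, validate='many_to_one')

# 'left_only' rows found no price and would give NaN costs later
missing = merged.loc[merged['_merge'] == 'left_only', 'Ingredient'].tolist()
print(missing)
```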

### 5 & 7. Find the total cost of each cocktail and round all numeric fields to 2 decimal places

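```python
# Cost of the poured amount = ml used * bottle price / ml per bottle
recipe['Cost'] = recipe['ml'] * recipe['price'] / recipe['ml per Bottle'].astype(float)

# Total cost per cocktail, rounded to 2 decimal places
T_cost = recipe.groupby('Cocktail', as_index=False)['Cost'].sum().round(2)
```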
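The cost of each ingredient is the fraction of the bottle used times the bottle price, i.e. `ml * price / ml per Bottle`. For example, 50 ml from a 1000 ml bottle costing £20 costs £1.00. The sketch below works through that arithmetic and the per-cocktail sum on hypothetical numbers chosen to give exact results.

```python
import pandas as pd

# Hypothetical joined recipe lines (prices already in pounds)
recipe = pd.DataFrame({
    'Cocktail': ['Drink A', 'Drink A', 'Drink B'],
    'ml': [50.0, 25.0, 40.0],
    'ml per Bottle': [1000, 500, 700],
    'price': [20.0, 10.0, 21.0],
})

# 50*20/1000 = 1.00, 25*10/500 = 0.50, 40*21/700 = 1.20
recipe['Cost'] = recipe['ml'] * recipe['price'] / recipe['ml per Bottle']

# Drink A totals 1.50, Drink B totals 1.20
T_cost = recipe.groupby('Cocktail', as_index=False)['Cost'].sum().round(2)
print(T_cost)
```

Rounding only after the `groupby` sum avoids accumulating rounding error across ingredients. `DataFrame.round` leaves the text `Cocktail` column untouched.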

### 6. Include a calculated field for the profit margin, i.e. the difference between each cocktail's price and its overall cost

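```python
# Attach each cocktail's total cost to its price and shorten the price column name
output = (cocktail.reset_index()
          .merge(T_cost, on='Cocktail', how='left')
          .rename(columns={'Price (£)': 'Price'}))

output['Price'] = output['Price'].round(2)
# Round the margin too: subtracting floats can leave trailing binary noise
output['Margin'] = (output['Price'] - output['Cost']).round(2)
```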
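Even when `Price` and `Cost` are both rounded to two places, their difference is a binary floating-point result and may not be exactly two decimal places. Rounding the margin as well is what keeps requirement 7 satisfied. The sketch below uses the `Price` and `Cost` values from the first two rows of the notebook's final output.

```python
import pandas as pd

# Price and Cost from the first two rows of the final output
output = pd.DataFrame({'Price': [8.5, 7.2], 'Cost': [2.85, 1.78]})

raw_margin = output['Price'] - output['Cost']  # may carry tiny float error
output['Margin'] = raw_margin.round(2)         # clean two-decimal values
print(output)
```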

### 8. Output the data 

```python
# Keep only the required fields and write them out
output = output[['Cocktail', 'Price', 'Cost', 'Margin']]
output.to_csv('WK11-Cocktail Output.csv', index=False)

output.head()
```

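|   | Cocktail             | Price | Cost | Margin |
|---|----------------------|-------|------|--------|
| 0 | Raspberry Lemon Drop | 8.5   | 2.85 | 5.65   |
| 1 | Bay Breeze           | 7.2   | 1.78 | 5.42   |
| 2 | Alabama Slammer      | 8.25  | 1.52 | 6.73   |
| 3 | Watermelon Man       | 7.0   | 3.58 | 3.42   |
| 4 | Orange Blossom       | 8.7   | 0.88 | 7.82   |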
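Rounding controls the stored values, but `to_csv` still writes `8.5` rather than `8.50`. If a fixed two-decimal display is wanted in the file itself, `to_csv`'s `float_format` argument can enforce it. The sketch below writes to an in-memory buffer instead of a file, using two rows from the table above.

```python
import io

import pandas as pd

output = pd.DataFrame({
    'Cocktail': ['Raspberry Lemon Drop', 'Bay Breeze'],
    'Price': [8.5, 7.2],
    'Cost': [2.85, 1.78],
    'Margin': [5.65, 5.42],
})

buffer = io.StringIO()
# float_format formats every float column with exactly two decimals (8.5 -> 8.50)
output.to_csv(buffer, index=False, float_format='%.2f')
csv_text = buffer.getvalue()
print(csv_text)
```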
