#### Requirements
Input the data
<br>Clean up the Model field to leave only the letters to represent the Brand of the bike
<br>Work out the Order Value using Value per Bike and Quantity.
<br>Aggregate Value per Bike, Order Value and Quantity by Brand and Bike Type to form:
<br>Quantity Sold
<br>Order Value
<br>Average Value Sold per Brand, Type
<br>Calculate 'Days to Ship' as the number of days between when an order was placed and when it was shipped
<br>Aggregate Order Value, Quantity and Days to Ship by Brand and Store to form:
<br>Total Quantity Sold
<br>Total Order Value
<br>Average Days to Ship
<br>Round any averaged values to one decimal place to make the values easier to read
<br>Output both data sets

<br> Solution Reference: https://github.com/kelly-gilbert/preppin-data-challenge/blob/master/2021/preppin-data-2021-02/preppin-data-2021-02.py

In [16]:
import pandas as pd

In [28]:
df = pd.read_csv('./input/PD 2021 Wk 2 Input - Bike Model Sales.csv',
                parse_dates=['Order Date','Shipping Date'], dayfirst=True)
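
To illustrate why `dayfirst=True` matters in the read above, here is a minimal sketch. It assumes the raw file writes dates day-first as `DD/MM/YYYY`; the raw file is not shown in this notebook, so the sample value below is made up for illustration.

```python
import pandas as pd

# Assumed sample value; the real file's date format is not shown here
raw = pd.Series(['04/05/2020'])

# dayfirst=True reads the string as 4 May 2020
day_first = pd.to_datetime(raw, dayfirst=True)

# dayfirst=False (the default) reads the same string as 5 April 2020
month_first = pd.to_datetime(raw, dayfirst=False)

print(day_first.dt.month.iloc[0], month_first.dt.month.iloc[0])
```

Without the flag, an ambiguous date like this would be parsed month-first, which would silently distort any date arithmetic later on.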

In [38]:
df.head()

Unnamed: 0,Bike Type,Store,Order Date,Quantity,Value per Bike,Shipping Date,Model,Brand,Order Value
0,Mountain,Manchester,2020-05-15,4,1543,2020-06-01,GIA31292/003,GIA,6172
1,Gravel,Manchester,2020-06-16,2,2076,2020-06-24,GIA21312/001,GIA,4152
2,Road,Birmingham,2020-05-04,1,2616,2020-05-13,GIA94221/129,GIA,2616
3,Gravel,York,2020-09-05,2,1359,2020-09-19,GIA12442/120,GIA,2718
4,Gravel,Birmingham,2020-03-28,4,1599,2020-04-04,GIA12492/123,GIA,6396


In [36]:
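# clean up the Model field to leave only the letters that represent the
# Brand of the bike
# [^A-Z]+ matches one or more characters that are not capital letters A-Z
# regex=True is needed because str.replace treats the pattern as a literal
# string by default in pandas 2.0 and later

df['Brand'] = df['Model'].str.replace('[^A-Z]+', '', regex=True)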
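
As a quick check of the pattern, the sketch below applies it to three Model codes copied from the `df.head()` output. The `str.extract` variant is an alternative shown for comparison, not something the original notebook uses.

```python
import pandas as pd

# Model codes copied from the df.head() output above
models = pd.Series(['GIA31292/003', 'GIA21312/001', 'GIA94221/129'])

# Delete every run of characters that is not a capital letter
brands = models.str.replace(r'[^A-Z]+', '', regex=True)

# Alternative: capture the leading run of capital letters instead
brands_alt = models.str.extract(r'^([A-Z]+)', expand=False)

print(brands.tolist(), brands_alt.tolist())
```

Both approaches agree on these codes; they would differ only if capital letters also appeared after the digits.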

In [37]:
# work out the Order Value using Value per Bike and Quantity

df['Order Value'] = df['Value per Bike'] * df['Quantity']

In [40]:
# aggregate Value per Bike, Order Value and Quantity by Brand and Bike Type 
# note, the avg value sold is the straight average and not weighted

#https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.aggregate.html?highlight=aggregation

df_brand = df.groupby(['Brand', 'Bike Type']).agg({ 'Quantity' : ['sum'],
                                                    'Order Value' : ['sum'],
                                                    'Value per Bike' : ['mean'] })
df_brand.head()

Brand,Bike Type,Quantity (sum),Order Value (sum),Value per Bike (mean)
BROM,Gravel,186,433885,2335.461538
BROM,Mountain,277,674770,2359.348315
BROM,Road,257,656539,2500.743243
GIA,Gravel,323,733087,2303.18018
GIA,Mountain,425,1021329,2378.900709
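
The comment in the cell above notes that the average is a straight average rather than a weighted one. A minimal sketch of the difference, using the three GIA Gravel rows visible in the `df.head()` output (a tiny subset, so the numbers will not match the full results):

```python
import pandas as pd

# Three Gravel rows copied from the df.head() output above
sample = pd.DataFrame({
    'Brand': ['GIA', 'GIA', 'GIA'],
    'Bike Type': ['Gravel', 'Gravel', 'Gravel'],
    'Quantity': [2, 2, 4],
    'Value per Bike': [2076, 1359, 1599],
})
sample['Order Value'] = sample['Value per Bike'] * sample['Quantity']

grouped = sample.groupby(['Brand', 'Bike Type'])

# Straight average: every order line counts once, whatever its quantity
straight = grouped['Value per Bike'].mean()

# Quantity-weighted average: total order value divided by total bikes sold
weighted = grouped['Order Value'].sum() / grouped['Quantity'].sum()

print(straight.iloc[0], weighted.iloc[0])  # 1678.0 1658.25
```

The weighted figure is lower here because the largest order (4 bikes) has a below-average unit value.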


In [43]:
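# flatten the two-level column index; the names are assigned by position, so
# they must follow the order of the agg dict above (Quantity, Order Value,
# Value per Bike)
df_brand.columns = ['Quantity Sold', 'Order Value', 'Avg Bike Value Sold per Brand, Type']
df_brand.reset_index(inplace=True)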
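
Assigning column names by position is easy to get wrong if the aggregation order ever changes. As an alternative, pandas named aggregation ties each output name to its source column and function. The sketch below is not the notebook's method, and it runs on the five rows from `df.head()` only:

```python
import pandas as pd

# Rows copied from the df.head() output above
sample = pd.DataFrame({
    'Brand': ['GIA', 'GIA', 'GIA', 'GIA', 'GIA'],
    'Bike Type': ['Mountain', 'Gravel', 'Road', 'Gravel', 'Gravel'],
    'Quantity': [4, 2, 1, 2, 4],
    'Value per Bike': [1543, 2076, 2616, 1359, 1599],
})
sample['Order Value'] = sample['Value per Bike'] * sample['Quantity']

# Each output column is declared as name -> (source column, function);
# the dict is unpacked because the names contain spaces and a comma
summary = (
    sample.groupby(['Brand', 'Bike Type'], as_index=False)
          .agg(**{
              'Quantity Sold': ('Quantity', 'sum'),
              'Order Value': ('Order Value', 'sum'),
              'Avg Bike Value Sold per Brand, Type': ('Value per Bike', 'mean'),
          })
)
print(summary)
```

This produces flat, correctly labelled columns in one step, so no separate rename or `reset_index` is needed.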

In [45]:
df_brand.head()

Unnamed: 0,Brand,Bike Type,Quantity Sold,Order Value,"Avg Bike Value Sold per Brand, Type"
0,BROM,Gravel,186,433885,2335.461538
1,BROM,Mountain,277,674770,2359.348315
2,BROM,Road,257,656539,2500.743243
3,GIA,Gravel,323,733087,2303.18018
4,GIA,Mountain,425,1021329,2378.900709


In [47]:
df_brand['Avg Bike Value Sold per Brand, Type'] = df_brand['Avg Bike Value Sold per Brand, Type'].round(1)
df_brand.head()

Unnamed: 0,Brand,Bike Type,Quantity Sold,Order Value,"Avg Bike Value Sold per Brand, Type"
0,BROM,Gravel,186,433885,2335.5
1,BROM,Mountain,277,674770,2359.3
2,BROM,Road,257,656539,2500.7
3,GIA,Gravel,323,733087,2303.2
4,GIA,Mountain,425,1021329,2378.9


In [48]:
# calculate Days to ship by measuring the difference between when an order was 
#  placed and when it was shipped as 'Days to Ship'

df['Days to Ship'] = (df['Shipping Date'] - df['Order Date']).dt.days
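
A small sketch of what this subtraction does, using the first three date pairs from the `df.head()` output: subtracting two datetime columns gives a timedelta column, and `.dt.days` extracts the whole-day count.

```python
import pandas as pd

# Order and shipping dates copied from the df.head() output above
dates = pd.DataFrame({
    'Order Date': pd.to_datetime(['2020-05-15', '2020-06-16', '2020-05-04']),
    'Shipping Date': pd.to_datetime(['2020-06-01', '2020-06-24', '2020-05-13']),
})

# Subtracting two datetime64 columns gives a timedelta64 column
elapsed = dates['Shipping Date'] - dates['Order Date']

# .dt.days keeps the whole-day component as integers
dates['Days to Ship'] = elapsed.dt.days

print(dates['Days to Ship'].tolist())  # [17, 8, 9]
```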

In [50]:
# aggregate Order Value, Quantity and Days to Ship by Brand and Store

df_store = df.groupby(['Brand', 'Store']).agg({ 'Quantity' : ['sum'],
                                                'Order Value' : ['sum'],
                                                'Days to Ship' : ['mean']})
df_store.head()

Brand,Store,Quantity (sum),Order Value (sum),Days to Ship (mean)
BROM,Birmingham,155,349759,11.755556
BROM,Leeds,150,389116,9.833333
BROM,London,133,324635,10.97619
BROM,Manchester,137,339832,10.888889
BROM,York,145,361852,9.833333


In [51]:
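# flatten the two-level column index; the names are assigned by position, so
# they must follow the order of the agg dict above (Quantity, Order Value,
# Days to Ship)
df_store.columns = ['Total Quantity Sold', 'Total Order Value', 'Avg Days to Ship']
df_store.reset_index(inplace=True)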

In [53]:
df_store.head()

Unnamed: 0,Brand,Store,Total Quantity Sold,Total Order Value,Avg Days to Ship
0,BROM,Birmingham,155,349759,11.755556
1,BROM,Leeds,150,389116,9.833333
2,BROM,London,133,324635,10.97619
3,BROM,Manchester,137,339832,10.888889
4,BROM,York,145,361852,9.833333


In [55]:
df_store['Avg Days to Ship'] = df_store['Avg Days to Ship'].round(1)
df_store.head()

Unnamed: 0,Brand,Store,Total Quantity Sold,Total Order Value,Avg Days to Ship
0,BROM,Birmingham,155,349759,11.8
1,BROM,Leeds,150,389116,9.8
2,BROM,London,133,324635,11.0
3,BROM,Manchester,137,339832,10.9
4,BROM,York,145,361852,9.8


In [56]:
# output both data sets

df_brand.to_csv('./output/output-2021-02-brand-type.csv', index = False)
df_store.to_csv('./output/output-2021-02-brand-store.csv', index = False)
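
`to_csv` raises an error if the `./output` folder does not exist yet. A minimal sketch of one way to guard against that, assuming a hypothetical helper `write_csv` (not part of the original notebook) and a temporary directory so the example leaves no files behind:

```python
from pathlib import Path
import tempfile

import pandas as pd


def write_csv(frame: pd.DataFrame, path: Path) -> Path:
    """Hypothetical helper: create the parent folder, then write without the index."""
    path.parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({'Brand': ['GIA'], 'Total Quantity Sold': [4]})
    written = write_csv(demo, Path(tmp) / 'output' / 'demo.csv')
    # Read the file back to confirm the index column was not written
    round_trip = pd.read_csv(written)

print(round_trip)
```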