### Prepping Data Challenge: Bike Model Sales (week 2)

Building on Week 1's challenge, this week covers the following:

 - Aggregation - changing the level of granularity of your data. The combination of categorical fields usually determines what each row represents, so aggregating over some of those fields changes what a row means.
 
 - Calculations - If the value or variable that you need to use isn't in your data set, you will often be able to create it from the other data fields you do have. 
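As a minimal sketch of both ideas, the snippet below uses a small made-up table (the `orders` frame and its three rows are illustrative, not the challenge file). It derives a new field from existing ones, then aggregates from one row per order to one row per store.

```python
import pandas as pd

# Hypothetical toy orders: one row per order (not the challenge data).
orders = pd.DataFrame({
    "Store": ["York", "York", "Leeds"],
    "Quantity": [2, 4, 5],
    "Value per Bike": [1359, 1343, 1411],
})

# Calculation: derive a field that is not in the input.
orders["Order Value"] = orders["Quantity"] * orders["Value per Bike"]

# Aggregation: one row per order becomes one row per store.
per_store = orders.groupby("Store", as_index=False)[["Quantity", "Order Value"]].sum()
print(per_store)
```

After the group-by, the Store field alone identifies a row, so the granularity has changed from order level to store level.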

#### Requirement:

 1. Input the data
 2. Clean up the Model field to leave only the letters to represent the Brand of the bike 
 3. Work out the Order Value using Value per Bike and Quantity.
 4. Aggregate Value per Bike, Order Value and Quantity by Brand and Bike Type to form:  
     - Quantity Sold
     - Order Value
     - Average Value Sold per Brand, Type
 5. Calculate Days to ship by measuring the difference between when an order was placed and when it was shipped as 'Days to Ship' 
 6. Aggregate Order Value, Quantity and Days to Ship by Brand and Store to form:
     - Total Quantity Sold
     - Total Order Value
     - Average Days to Ship
 7. Round any averaged values to one decimal place to make the values easier to read
 8. Output both data sets (help)

### 1. Input the data

In [1]:
# import libraries and csv file
import pandas as pd
import numpy as np
import re

Data = pd.read_csv('WK2-Bike Model Sales.csv')
Data.head(10)

Unnamed: 0,Bike Type,Store,Order Date,Quantity,Value per Bike,Shipping Date,Model
0,Mountain,Manchester,15/05/2020,4,1543,01/06/2020,GIA31292/003
1,Gravel,Manchester,16/06/2020,2,2076,24/06/2020,GIA21312/001
2,Road,Birmingham,04/05/2020,1,2616,13/05/2020,GIA94221/129
3,Gravel,York,05/09/2020,2,1359,19/09/2020,GIA12442/120
4,Gravel,Birmingham,28/03/2020,4,1599,04/04/2020,GIA12492/123
5,Mountain,Birmingham,08/09/2020,3,1263,28/09/2020,GIA14293/003
6,Gravel,Birmingham,01/01/2020,4,3140,10/01/2020,GIA31292/004
7,Gravel,Leeds,08/06/2020,5,1411,13/06/2020,GIA21312/002
8,Road,York,30/06/2020,4,1343,17/07/2020,GIA94221/130
9,Road,London,12/09/2020,2,3138,24/09/2020,GIA12442/121


###  2. Clean up the Model field to leave only the letters to represent the Brand of the bike

In [2]:
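# strip everything that is not an uppercase letter; regex=True is required in pandas 2.x
Data['Brand'] = Data['Model'].str.replace(r'[^A-Z]+', '', regex=True)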


In [3]:
Data['Brand'].unique()

array(['GIA', 'SPEC', 'ORRO', 'BROM', 'KONA'], dtype=object)
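To illustrate what the pattern does, here is a small sketch on a few sample codes. The first code is taken from the data preview; the other two are made up for illustration. It also shows `str.extract` as one possible alternative that keeps only the leading letters, which gives the same result when the letters appear only at the start of the code.

```python
import pandas as pd

# First value comes from the data preview; the other two are hypothetical.
models = pd.Series(["GIA31292/003", "SPEC1234/001", "KONA5678/010"])

# Approach used in this notebook: delete every run of non-uppercase characters.
brand_replace = models.str.replace(r"[^A-Z]+", "", regex=True)

# Alternative sketch: capture the run of uppercase letters at the start.
brand_extract = models.str.extract(r"^([A-Z]+)", expand=False)

print(brand_replace.tolist())
print(brand_extract.tolist())
```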

###  3. Work out the Order Value using Value per Bike and Quantity.

In [4]:
Data['Order Value'] = Data['Quantity']*Data['Value per Bike']

### 4. Aggregate Value per Bike, Order Value and Quantity by Brand and Bike Type

In [5]:
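# select several columns with a list (double brackets); tuple-style selection is not supported in current pandas
Agg1 = Data.groupby(['Brand', 'Bike Type'])[['Quantity','Order Value']].sum().reset_index()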
Agg2 = Data.groupby(['Brand', 'Bike Type'])['Value per Bike'].mean().reset_index()
Agg2.rename(columns={'Value per Bike':'Avg Value per Bike, Type'},inplace=True)
Brand_Type = pd.merge(Agg1,Agg2,on=['Brand','Bike Type'])
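The two group-bys and the merge in this step could also be written as a single named aggregation. The sketch below shows that variant on a small illustrative frame (`sample` and its rows are assumptions, not the challenge file); the output column names mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical sample rows with the same columns as the challenge data.
sample = pd.DataFrame({
    "Brand": ["GIA", "GIA", "KONA"],
    "Bike Type": ["Road", "Road", "Gravel"],
    "Quantity": [1, 4, 2],
    "Value per Bike": [2616, 1343, 1359],
})
sample["Order Value"] = sample["Quantity"] * sample["Value per Bike"]

# Named aggregation: output name -> (input column, function).
# The dict is unpacked with ** because the output names contain spaces.
brand_type = (
    sample.groupby(["Brand", "Bike Type"])
    .agg(**{
        "Quantity": ("Quantity", "sum"),
        "Order Value": ("Order Value", "sum"),
        "Avg Value per Bike, Type": ("Value per Bike", "mean"),
    })
    .reset_index()
)
print(brand_type)
```

This avoids the intermediate frames and the merge, at the cost of a slightly denser expression.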


### 5. Calculate Days to ship by measuring the difference between when an order was placed and when it was shipped as 'Days to Ship'

In [6]:
#assign datetime datatypes to the columns
Data['Order Date'] = pd.to_datetime(Data['Order Date'],format='%d/%m/%Y')
Data['Shipping Date'] = pd.to_datetime(Data['Shipping Date'],format='%d/%m/%Y')
Data['Days to Ship'] = (Data['Shipping Date'] - Data['Order Date']).dt.days
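As a quick check of the date arithmetic, the sketch below applies the same steps to two date pairs copied from the data preview (rows 0 and 2). The explicit `format='%d/%m/%Y'` matters because the dates are day-first; without it, pandas may read an ambiguous value such as `04/05/2020` as month-first.

```python
import pandas as pd

# Two order/shipping date pairs taken from the data preview (rows 0 and 2).
dates = pd.DataFrame({
    "Order Date": ["15/05/2020", "04/05/2020"],
    "Shipping Date": ["01/06/2020", "13/05/2020"],
})

# Parse day-first strings explicitly so 04/05/2020 is read as 4 May.
for col in ["Order Date", "Shipping Date"]:
    dates[col] = pd.to_datetime(dates[col], format="%d/%m/%Y")

# Subtracting two datetime columns gives timedeltas; .dt.days keeps whole days.
dates["Days to Ship"] = (dates["Shipping Date"] - dates["Order Date"]).dt.days
print(dates["Days to Ship"].tolist())  # [17, 9]
```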

### 6. Aggregate Order Value, Quantity and Days to Ship by Brand and Store

In [7]:
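# select several columns with a list (double brackets); tuple-style selection is not supported in current pandas
Agg3 = Data.groupby(['Brand', 'Store'])[['Quantity','Order Value']].sum().reset_index()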
Agg4 = Data.groupby(['Brand', 'Store'])['Days to Ship'].mean().reset_index()
Agg4.rename(columns={'Days to Ship':'Avg Days to Ship'},inplace=True)
Brand_Store = pd.merge(Agg3,Agg4,on=['Brand','Store'])


### 7. Round any averaged values to one decimal place to make the values easier to read

In [8]:
Brand_Type['Avg Value per Bike, Type']=round(Brand_Type['Avg Value per Bike, Type'],1)
Brand_Store['Avg Days to Ship']=round(Brand_Store['Avg Days to Ship'],1)

### 8. Output both data sets (help)

In [9]:
Brand_Type.to_csv('WK2-Bike Brand_type Sales Output.csv', index=False)
Brand_Store.to_csv('WK2-Bike Brand_Store Sales Output.csv', index=False)