### Prepping Data Challenge: Bike Sales (week 1)

This week's focus is on cleaning the dataset and making it ready to answer some questions from our stakeholders.

#### Requirements:

 1. Connect and load the csv file.
 2. Split the 'Store-Bike' field into 'Store' and 'Bike' 
 3. Clean up the 'Bike' field to leave just three values in the 'Bike' field (Mountain, Gravel, Road) 
 4. Create two different cuts of the date field: 'quarter' and 'day of month' 
 5. Remove the first 10 orders as they are test values
 6. Output the data as a csv 

### 1. Connect and Load the CSV file

In [1]:
# import libraries and csv file
import numpy as np
import pandas as pd

Data = pd.read_csv('WK1-Bike Sales.csv')
Data.head(10)

Unnamed: 0,Order ID,Customer Age,Bike Value,Existing Customer?,Date,Store - Bike
0,1,22,481,No,25/04/2021,York - Road
1,2,28,1825,No,23/01/2021,York - Road
2,3,51,1903,No,03/07/2021,York - Rood
3,4,59,1059,No,24/01/2021,York - Road
4,5,44,1764,Yes,12/08/2021,York - Mountain
5,6,16,967,Yes,15/08/2021,London - Mountain
6,7,35,1575,Yes,13/03/2021,London - Mountain
7,8,50,1074,No,22/09/2021,London - Mountain
8,9,37,1977,No,09/02/2021,London - Gravel
9,10,55,1352,No,24/11/2021,Leeds - Gravel


### 2. Split the 'Store-Bike' field into 'Store' and 'Bike'

In [2]:
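# split on the hyphen; the spaces around it stay attached to each part
Data[['Store','Bike']] = Data['Store - Bike'].str.split('-', expand = True)
# trim the trailing space left on the store name
Data['Store'] = Data['Store'].str.strip()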
Data.drop('Store - Bike', axis = 'columns', inplace=True)
Data.head(10)

Unnamed: 0,Order ID,Customer Age,Bike Value,Existing Customer?,Date,Store,Bike
0,1,22,481,No,25/04/2021,York,Road
1,2,28,1825,No,23/01/2021,York,Road
2,3,51,1903,No,03/07/2021,York,Rood
3,4,59,1059,No,24/01/2021,York,Road
4,5,44,1764,Yes,12/08/2021,York,Mountain
5,6,16,967,Yes,15/08/2021,London,Mountain
6,7,35,1575,Yes,13/03/2021,London,Mountain
7,8,50,1074,No,22/09/2021,London,Mountain
8,9,37,1977,No,09/02/2021,London,Gravel
9,10,55,1352,No,24/11/2021,Leeds,Gravel


### 3. Clean up the 'Bike' field to leave just three values in the 'Bike' field (Mountain, Gravel, Road)

In [3]:
# .str.strip() removes leading and trailing whitespace from each value
Data['Bike'] = Data['Bike'].str.strip()

In [4]:
# check the distinct values in the Bike column
Data['Bike'].unique()

array(['Road', 'Rood', 'Mountain', 'Gravel', 'Mountaen', 'Graval', 'Rowd',
       'Gravle'], dtype=object)

The check above shows several misspellings of the three bike types. When the same operation has to be applied to every item in a column, an explicit Python for loop works, but the pandas *.map()* method does the job without one.

We first need to define the transformation, that is, what we want to map to what. In this code I used a Python dictionary whose keys are the misspellings and whose values are the correct names.

*.map()* returns _NaN_ for every value that is not a key in the dictionary, so the values that were already correct would be lost. Calling _.fillna_ with the original column puts them back.

Background on the built-in *map()* function: https://realpython.com/python-map-function/
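A minimal sketch of that behaviour on a toy Series (the names `bikes` and `fixes` and the four values are made up for illustration, not the full dataset): mapping with a dictionary blanks out the unmatched values, and `fillna` with the original Series restores them.

```python
import pandas as pd

# toy data, invented for illustration
bikes = pd.Series(['Road', 'Rood', 'Gravle', 'Mountain'])
fixes = {'Rood': 'Road', 'Gravle': 'Gravel'}

# values that are not keys of the dictionary become NaN
mapped = bikes.map(fixes)

# fall back to the original value wherever the mapping produced NaN
cleaned = mapped.fillna(bikes)

print(mapped.tolist())
print(cleaned.tolist())
```

`bikes.replace(fixes)` should give the same result in one call, because `replace` leaves unmatched values untouched.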

In [5]:
correct_values={'Rood':'Road','Rowd':'Road',
              'Mountaen':'Mountain',
              'Graval':'Gravel','Gravle':'Gravel'}

Data["Bike"]=Data["Bike"].map(correct_values).fillna(Data["Bike"])

### 4. Create two different cuts of the date field: 'quarter' and 'day of month' 

In [6]:
Data['Date'] = pd.to_datetime(Data['Date'], format='%d/%m/%Y')
Data['Quarter'] = Data['Date'].dt.quarter
Data['Day of Month'] = Data['Date'].dt.day
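The dates in this file are written day first, which is why the format string is spelled out. As a small sketch, here are three dates copied from the preview in step 1, parsed the same way (the names `dates`, `parsed` and `parts` are only for this example):

```python
import pandas as pd

# three dates taken from the preview of the raw data
dates = pd.Series(['03/07/2021', '12/08/2021', '09/02/2021'])

# %d/%m/%Y tells pandas the day comes before the month
parsed = pd.to_datetime(dates, format='%d/%m/%Y')

parts = pd.DataFrame({
    'Date': parsed,
    'Quarter': parsed.dt.quarter,    # 1 to 4
    'Day of Month': parsed.dt.day,   # 1 to 31
})
print(parts)
```

Without an explicit format, a value such as 03/07/2021 could be read month first (7 March instead of 3 July), which would silently change both the quarter and the day.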

### 5. Remove the first 10 orders as they are test values

In [7]:
# drop the first 10 rows by position; the file is ordered by Order ID
Data = Data.iloc[10:]
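Slicing by position only removes the test orders if the rows are still in Order ID order, as they appear to be in the preview. A possible alternative, sketched here on an invented and deliberately shuffled frame, is to filter on the column itself, assuming the test orders are exactly Order IDs 1 to 10:

```python
import pandas as pd

# toy frame with shuffled rows; values invented for illustration
orders = pd.DataFrame({
    'Order ID': [12, 3, 11, 10, 1],
    'Bike': ['Road', 'Road', 'Gravel', 'Gravel', 'Mountain'],
})

# keep only the orders after the ten test orders (assumed to be IDs 1 to 10)
real_orders = orders[orders['Order ID'] > 10].reset_index(drop=True)
print(real_orders)
```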

### 6. Output the data as a csv 

In [8]:
Data.columns

Index(['Order ID', 'Customer Age', 'Bike Value', 'Existing Customer?', 'Date',
       'Store', 'Bike', 'Quarter', 'Day of Month'],
      dtype='object')

In [9]:
Data.head(20)

Unnamed: 0,Order ID,Customer Age,Bike Value,Existing Customer?,Date,Store,Bike,Quarter,Day of Month
10,11,57,902,No,2021-10-04,Birmingham,Road,4,4
11,12,31,946,Yes,2021-01-17,Leeds,Road,1,17
12,13,17,1296,Yes,2021-10-25,Birmingham,Road,4,25
13,14,59,1166,Yes,2021-07-18,Manchester,Road,3,18
14,15,24,1781,No,2021-10-10,Manchester,Mountain,4,10
15,16,59,1074,No,2021-10-06,York,Mountain,4,6
16,17,57,1188,No,2021-09-14,York,Mountain,3,14
17,18,56,544,No,2021-11-23,York,Mountain,4,23
18,19,34,579,Yes,2021-11-24,York,Gravel,4,24
19,20,17,1021,Yes,2021-06-24,York,Gravel,2,24


In [10]:
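# reorder the columns for the output; 'Date' is left out now that
# 'Quarter' and 'Day of Month' have been derived from it
columns = ['Quarter', 'Store', 'Bike', 'Order ID', 'Customer Age', 'Bike Value',
           'Existing Customer?', 'Day of Month']
Data = Data[columns]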

In [11]:
Data.to_csv('WK1-Bike Sales Output.csv', index=False)
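A quick way to sanity-check the export is to read it back and look at the columns. The sketch below writes to an in-memory buffer instead of a file so it can run on its own; the single row is order 11 from the preview above, and the names `out`, `buffer` and `check` are only for this example.

```python
import io
import pandas as pd

# one row from the preview, using the output column order
out = pd.DataFrame([{
    'Quarter': 4, 'Store': 'Birmingham', 'Bike': 'Road', 'Order ID': 11,
    'Customer Age': 57, 'Bike Value': 902, 'Existing Customer?': 'No',
    'Day of Month': 4,
}])

# write without the index, then read the text back as a new frame
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)
check = pd.read_csv(buffer)

print(check.columns.tolist())
```

With `index=False` the row labels are not written, so the file read back has no extra `Unnamed: 0` column.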