### Prepping Data Challenge: On yer bike! (week 44)

### Requirements
- Input the data
- Convert the Value field to just be Kilometres ridden 
  - Carl cycles at an average of 30 kilometres per hour whenever he is measuring his sessions in minutes
- Create a field called measure that labels KM measurements as 'Outdoors' and any measurement in 'mins' as 'Turbo Trainer'.
- Create a separate column for Outdoors and Turbo Trainer (indoor static bike values)
- Ensure there is a row for each date between 1st Jan 2021 and 1st Nov 2021 (inclusive)
- Count the number of activities per day and work out the total distance cycled Outdoors or on the Turbo Trainer
- Change any null values to zero
- Work out how many days I did no activities
- Output a file to help me explore the analysis further

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
with pd.ExcelFile(r"\Dataprep\2021\Carl's 2021 cycling.xlsx") as xl:
    df = pd.read_excel(xl)

df.head()

Unnamed: 0,Date,Value,Measure,Type,Detail
0,2021-01-01,30.0,min,Apple Fitness,Gregg - 422 - everything rock
1,2021-01-01,20.0,min,Apple Fitness,Kym - 306 - everything rock
2,2021-01-02,30.0,min,Apple Fitness,Kym - 425 - upbeat anthems
3,2021-01-03,45.0,min,Apple Fitness,Tyrell - 668 - hiphop
4,2021-01-04,20.0,min,Apple Fitness,Tyrell - 263 - latin grooves


In [3]:
# Convert the Value field to kilometres ridden.
# Sessions measured in minutes are converted at Carl's average of 30 km/h (0.5 km per minute);
# values already in km are kept as they are.
df['km'] = np.where(df['Measure'].str.lower() == 'min', df['Value'] * 30 / 60, df['Value'])
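
As a quick sanity check on the conversion, here is a minimal sketch on a hypothetical three-row frame. The sample values and the `SPEED_KMH` name are made up for illustration and do not come from Carl's workbook. At 30 km/h each minute corresponds to 0.5 km, so a 30-minute session becomes 15 km, while a value already in km is left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows, not taken from the real workbook
sample = pd.DataFrame({
    'Value': [30.0, 45.0, 18.7],
    'Measure': ['min', 'min', 'km'],
})

SPEED_KMH = 30  # average speed stated in the requirements

# Minutes -> km at 30 km/h; km values pass through unchanged
sample['km'] = np.where(
    sample['Measure'].str.lower() == 'min',
    sample['Value'] * SPEED_KMH / 60,
    sample['Value'],
)
print(sample['km'].tolist())  # [15.0, 22.5, 18.7]
```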

In [4]:
print(df['Measure'].unique())

['min' 'km']


In [5]:
# Create a field (here named measure2) that labels 'km' rows as 'Outdoors'
# and 'min' rows as 'Turbo Trainer'; anything else falls back to 'Unknown'.
df['measure2'] = np.where(df['Measure'].str.contains('km', case=False), 'Outdoors',
    np.where(df['Measure'].str.contains('min', case=False), 'Turbo Trainer', 'Unknown'))

In [6]:
print(df['measure2'].unique())

['Turbo Trainer' 'Outdoors']


In [7]:
# Create a separate column for Outdoors and Turbo Trainer (indoor static bike values).
# Each cell holds the total km for that date and type; missing combinations are NaN.
df2 = df.pivot_table(values='km', index='Date', columns='measure2', aggfunc='sum')
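
To illustrate what the pivot produces, here is a small sketch on hypothetical long-format rows (the `rides` frame and its values are invented for illustration). It suggests why the later null-to-zero step is needed: a date with only one type of ride gets NaN in the other column.

```python
import pandas as pd

# Hypothetical long-format rows, not taken from the real workbook
rides = pd.DataFrame({
    'Date': pd.to_datetime(['2021-01-01', '2021-01-01', '2021-01-02']),
    'measure2': ['Turbo Trainer', 'Turbo Trainer', 'Outdoors'],
    'km': [15.0, 10.0, 18.7],
})

# One row per date, one column per ride type, km summed within each cell
wide = rides.pivot_table(values='km', index='Date', columns='measure2', aggfunc='sum')
print(wide)
```

In this sketch the two Turbo Trainer sessions on 2021-01-01 are summed to 25.0, and the Outdoors cell for that date is NaN.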

In [8]:
# Count the number of activities per day. The pivot above already holds the total
# distance cycled Outdoors and on the Turbo Trainer for each date.
# The assignment aligns on the shared Date index.
df2['Activities per day'] = df.groupby('Date')['km'].count().astype('Int64')

In [9]:
# Ensure there is a row for each date between 1st Jan 2021 and 1st Nov 2021 (inclusive);
# date_range includes both endpoints by default.
# Change any null values to zero.
d_r = pd.date_range(start='2021-01-01', end='2021-11-01')
df2 = df2.reindex(d_r).fillna(0).rename_axis('Date').reset_index()
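
The reindex step can be sketched on a hypothetical frame with a gap in its dates (the `daily` frame and its values are invented for illustration). Dates missing from the index appear as NaN rows after `reindex`, and `fillna(0)` turns them into zeros. The sketch also checks that the full challenge range spans 305 days.

```python
import pandas as pd

# Hypothetical daily totals with no rows for 2021-01-02 and 2021-01-03
daily = pd.DataFrame(
    {'Outdoors': [10.0, 5.0], 'Turbo Trainer': [0.0, 15.0]},
    index=pd.to_datetime(['2021-01-01', '2021-01-04']),
)

# date_range includes both endpoints by default
short_range = pd.date_range(start='2021-01-01', end='2021-01-04')

# Missing dates become NaN rows, which fillna(0) replaces with zeros
filled = daily.reindex(short_range).fillna(0)
print(filled)

# The challenge range, 1st Jan to 1st Nov 2021 inclusive
full_range = pd.date_range(start='2021-01-01', end='2021-11-01')
print(len(full_range))  # 305
```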

In [10]:
#Output a file to help me explore the analysis further
output = df2[['Date','Activities per day','Turbo Trainer', 'Outdoors']]
output.columns.name = None

In [11]:
output

Unnamed: 0,Date,Activities per day,Turbo Trainer,Outdoors
0,2021-01-01,2,25.0,0.0
1,2021-01-02,1,15.0,0.0
2,2021-01-03,1,22.5,0.0
3,2021-01-04,1,10.0,0.0
4,2021-01-05,1,15.0,0.0
...,...,...,...,...
300,2021-10-28,1,0.0,18.7
301,2021-10-29,1,0.0,1.6
302,2021-10-30,0,0.0,0.0
303,2021-10-31,0,0.0,0.0
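
The requirements also ask for the number of days with no activities, which the cells above do not compute. One possible approach, sketched here on a hypothetical stand-in for the `output` frame (the `sample_output` values are invented), is to count the rows whose activity count is zero after the null-to-zero step.

```python
import pandas as pd

# Hypothetical stand-in for the `output` frame built above
sample_output = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', periods=5),
    'Activities per day': [2, 0, 1, 0, 0],
})

# A day with no activities has a count of zero once nulls are filled
no_activity_days = int((sample_output['Activities per day'] == 0).sum())
print(no_activity_days)  # 3
```

On the real data the same expression would be applied to `output['Activities per day']`; the result is not shown here because it depends on the full workbook.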


In [12]:
# Write the output to a CSV file, leaving out the index column
output.to_csv('wk44-output.csv', index=False)