### Preppin' Data Challenge: C&BSCo Parameters, Parameters, Parameters (week 34)

### Requirements
- Input data
- Merge Km's and min's as Minutes
- Change 'Value' to Mins
- Split up the unnamed column into:
    - Coach
    - Calories
    - Music Type
- Change Music Type values to be Title Case (first letter of each word is capitalised)
- Create three parameters:
    - Music Type
    - Coach
    - Top N
- Create a way to return the Top N value selected and order the file with the highest calories burnt at the top
- Create filters so only the parameter selection remains in the output data set
    - For Top N parameter it's all the values up to that number
- Output the data but use the parameter values in the file name so the CEO knows what it contains
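
The split and title-case requirements can be sketched on a few made-up strings. The sample values and the names `raw` and `parts` are assumptions for illustration; the ` - ` separated layout is inferred from the regular expression used in the solution, not from the raw file itself.

```python
import pandas as pd

# Hypothetical sample values in a "Coach - Calories - Music Type" layout
raw = pd.Series(['Kym - 134 - everything rock',
                 'Bakari - 565 - LATEST HITS',
                 'Emily - 279 - hiphop '])

# One capture group per output column; \s+-\s+ tolerates uneven spacing around the dashes
parts = raw.str.extract(r'(.*)\s+-\s+(\d+)\s+-\s+(.*)')
parts.columns = ['Coach', 'Calories', 'Music Type']

# Calories is extracted as text, so convert it before any ranking or sorting
parts['Calories'] = parts['Calories'].astype(int)

# Trim stray whitespace, then capitalise the first letter of each word
parts['Music Type'] = parts['Music Type'].str.strip().str.title()

print(parts)
```

Title-casing after the split keeps values such as `everything rock` and `Everything Rock` from being treated as two different music types when the parameter list is built.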


For this solution, we took inspiration from Kelly Gilbert's Python solution.

In [13]:
import pandas as pd
import numpy as np

In [15]:
def get_user_input(value_name, value_list):
    
    value_list = sorted(value_list)
    options_str = '\n'.join([f'  {i+1} - {c}' for i,c in enumerate(value_list)])
    
    while True:
        user_input = input(f'\n{value_name.title()} list:\n{options_str}\n\n'
                           + f'Select an option (1 - {len(value_list)}) or press Enter to cancel:')
        
        if user_input.isnumeric() and int(user_input) in range(1, len(value_list)+1):
            return value_list[int(user_input)-1]
        elif user_input == '':
            return user_input
        else:
            print(f'\n*** ERROR: {user_input} is not a valid choice. '
                  + f'Please enter a number between 1 and {len(value_list)}.')


def input_and_prep_data(input_path):
    
    # input the data
    df = ( pd.read_csv(input_path, parse_dates=['Date'], dayfirst=True)
             .rename(columns={'Value' : 'Mins'})
             .rename(columns=lambda c: 'Unnamed' if 'Unnamed' in c else c))

    # split the unnamed column and change music type to title case
    df[['Coach', 'Calories', 'Music Type']] = df['Unnamed'].str.extract(r'(.*)\s+-\s+(\d+)\s+-\s+(.*)')
    df['Music Type'] = df['Music Type'].str.strip().str.title()
    df['Calories'] = df['Calories'].astype(int)
    
    return df[['Coach', 'Calories', 'Music Type', 'Date', 'Mins']]


def output_file(df, coach, music_type, n):
    
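    # keep only the rows for the selected coach and music type
    # (.copy() avoids a SettingWithCopyWarning when the Rank column is added below)
    df_out = df.loc[(df['Coach'] == coach) & (df['Music Type'] == music_type)].copy()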
    
    # rank by calories burned
    df_out['Rank'] = df_out.groupby(['Coach', 'Music Type'])['Calories'].rank(ascending=False,  method='dense')
    
    # output the file
    filepath = f'.\\Dataprep\\2022\\wk34-output - Top {n} for rides with {coach} powered by {music_type}.csv'
    ( df_out[df_out['Rank'] <= n].sort_values('Rank', ascending=True).to_csv(filepath, index=False, date_format='%d/%m/%Y'))
    
    print(f"\n*** File created: {filepath}")
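
The effect of `method='dense'` on the Top N filter is easiest to see with a tie. The following is a minimal sketch on made-up calorie values (the `rides` frame is hypothetical and is not taken from the challenge data):

```python
import pandas as pd

# Hypothetical calorie values with a tie for second place
rides = pd.DataFrame({'Calories': [300, 450, 500, 450]})

# Dense ranking: tied values share a rank and the next rank is not skipped
rides['Rank'] = rides['Calories'].rank(ascending=False, method='dense')

# "Top 2" keeps every row ranked 1 or 2, so both tied rides are returned
top_2 = rides[rides['Rank'] <= 2].sort_values('Rank')
print(top_2)
```

With dense ranking, a Top 2 request returns three rows here because of the tie. If exactly N rows were required instead, `method='first'` or `DataFrame.nlargest` would be possible alternatives.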

In [20]:
#Input the data

df = input_and_prep_data(r"\Dataprep\2022\Preppin' Summer 2022 - CEO Cycling.csv")

while True:    
    coach = get_user_input('Coach', df['Coach'].unique())
    if coach == '':
        break
        
    music_type = get_user_input('Music type', df['Music Type'].unique())
    if music_type == '':
        break
    
    n = input('\nReturn the top n sessions by calories burned (or press Enter to cancel):\n')
    if n == '':
        break
    elif not n.isnumeric() or int(n) <= 0:
        print(f'\n*** ERROR: {n} is not a valid number. Please enter a number greater than zero.')
        continue  # restart the prompts instead of exiting after an invalid entry
    
    output_file(df, coach, music_type, int(n))
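
One edge case in the Top N check above: `str.isnumeric()` also returns `True` for characters such as `²` or `½`, which `int()` cannot convert. A possible stricter check is sketched below; `parse_positive_int` is a hypothetical helper and is not part of the original solution.

```python
def parse_positive_int(text):
    """Return text as a positive int, or None if it is not one.

    Hypothetical helper, shown only to illustrate the validation step.
    """
    text = text.strip()
    # isdecimal() only accepts characters that int() can convert;
    # isnumeric() also accepts characters such as '²' or '½', which int() rejects
    if text.isdecimal() and int(text) > 0:
        return int(text)
    return None


# Try a few inputs, including blank and non-numeric entries
for candidate in ['5', ' 12 ', '0', '', 'abc', '²']:
    print(repr(candidate), '->', parse_positive_int(candidate))
```

Returning `None` for anything invalid lets the calling loop decide whether to re-prompt or cancel, rather than mixing that decision into the parsing.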

In [22]:
df.head(10)

|   | Coach | Calories | Music Type | Date | Mins |
|---|---|---|---|---|---|
| 0 | Bakari | 125 | Everything Rock | 16/12/2020 | 10 |
| 1 | Kym | 134 | Everything Rock | 16/12/2020 | 10 |
| 2 | Gregg | 375 | Everything Rock | 17/12/2020 | 30 |
| 3 | Kym | 232 | Everything Rock | 18/12/2020 | 20 |
| 4 | Bakari | 565 | Latest Hits | 19/12/2020 | 45 |
| 5 | Kym | 271 | Hiphop | 21/12/2020 | 20 |
| 6 | Emily | 279 | Latest Hits | 23/12/2020 | 20 |
| 7 | Sherica | 588 | Latest Hits | 24/12/2020 | 45 |
| 8 | Emily | 401 | Everything Rock | 28/12/2020 | 30 |
| 9 | Kym | 445 | Upbeat Anthems | 29/12/2020 | 30 |
