---
## <font color=#FF8181>Unit 10 - Exercises: </font>

In the following exercises, we will apply the preprocessing techniques we covered to a real-life dataset. The dataset is stored in **bike_rental.csv** and contains daily bike rental records together with weather information for each day. The exercises also draw on earlier material: element-wise operations, aggregations, formatting and handling missing data.


In [None]:
import pandas as pd
import numpy as np

In [None]:
# Data Load
filename = r'../data/bike_rental.csv'
df = pd.read_csv(filename)
display(df)

---
### <font color=#14F278> Task 1: Create a Python function `numerical_transformations()` which does the following: </font>
- takes the dataframe `df` as an argument
- drops columns `'day'`,`'mnth'` and `'year'`
- renames any columns whose names are not lowercase, so that all column names are lowercase
- drops any duplicate records
- drops any records with missing values
- standardises columns `'temp'`, `'atemp'`, `'humidity'` and `'windspeed'`
- returns the dataframe object

**NB**: Ensure all formatting is performed in place to optimise memory usage. This, however, means that the function has **side effects**: the dataframe passed in is itself modified.
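
To make the side effect concrete, here is a minimal sketch on a toy dataframe. The function `drop_day_in_place` and the values are hypothetical and exist only for illustration; they are not part of the exercise dataset.

```python
import pandas as pd

# Toy frame with made-up values; only the column names echo the exercise.
toy = pd.DataFrame({"day": [1, 2], "temp": [0.3, 0.5]})

def drop_day_in_place(frame):
    # inplace=True mutates the object the caller passed in
    frame.drop(columns=["day"], inplace=True)
    return frame

backup = toy.copy()              # independent copy taken before the call
result = drop_day_in_place(toy)

print(result is toy)             # the returned object is the caller's frame
print(list(toy.columns))         # 'day' is gone from the original as well
print(list(backup.columns))      # the copy still has both columns
```

If the raw data is still needed afterwards, one option is to pass a copy to the function, for example `numerical_transformations(df.copy())`, at the cost of the extra memory.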

In [None]:
# Solution

def numerical_transformations(df):
    # Drop the date-part columns that are not needed
    df.drop(columns=['day', 'mnth', 'year'], inplace=True)
    # Rename the columns that are not lowercase
    df.rename(columns={'HUMIDITY': 'humidity', 'Rentals': 'rentals'}, inplace=True)
    # Remove duplicate records, then records with missing values
    df.drop_duplicates(inplace=True)
    df.dropna(axis=0, inplace=True)
    # Standardise: subtract the mean and divide by the standard deviation
    for col in ['temp', 'atemp', 'humidity', 'windspeed']:
        df[col] = (df[col] - df[col].mean()) / df[col].std()
    return df

In [None]:
# Call the function and display returned object
numerical_transformations(df)
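
The standardisation step in Task 1 can be sanity-checked: after subtracting the mean and dividing by the standard deviation, each column should have a mean of roughly 0 and a standard deviation of roughly 1. The sketch below shows this on a toy dataframe with hypothetical values, and standardises several columns in one expression as an alternative to handling them one at a time.

```python
import pandas as pd

# Hypothetical values, not taken from bike_rental.csv.
toy = pd.DataFrame({
    "temp": [0.2, 0.4, 0.6, 0.8],
    "windspeed": [0.10, 0.15, 0.30, 0.25],
})

cols = ["temp", "windspeed"]
# Subtract each column's mean and divide by its standard deviation.
# pandas' .std() defaults to the sample standard deviation (ddof=1).
toy[cols] = (toy[cols] - toy[cols].mean()) / toy[cols].std()

# After standardising, every column should have mean ~0 and std ~1.
print(toy[cols].mean().abs().lt(1e-9).all())
print((toy[cols].std() - 1).abs().lt(1e-9).all())
```

The same two checks can be run on the dataframe returned by `numerical_transformations()`.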

---
### <font color=#14F278> Task 2: Create a Python function `categorical_transformations()` which does the following: </font>
- takes the output dataframe from Task 1 (important to use the formatted dataframe, not the raw one)
- replaces the values in column `'season'` with the appropriate season names: 1 = winter, 2 = spring, 3 = summer, 4 = autumn
- performs one-hot encoding on the `'season'` column, using the prefix `'is'`
- returns the dataframe object

In [None]:
# Solution

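def categorical_transformations(df):
    # Map the numeric season codes to season names (modifies df in place)
    season_dict = {'season': {1: 'winter', 2: 'spring', 3: 'summer', 4: 'autumn'}}
    df.replace(season_dict, inplace=True)
    # One-hot encode 'season' into is_<season> columns; this returns a new dataframe
    df = pd.get_dummies(df, columns=['season'], prefix=['is'])
    return df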

In [None]:
# Call the function and display returned object
df = categorical_transformations(df)
display(df)
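
To see what the encoding in Task 2 produces, the sketch below runs the same two steps on a toy dataframe with hypothetical values. It uses `Series.map` for the replacement, which is one alternative to `DataFrame.replace` when every code has a mapping. The resulting column names are the ones Task 3 relies on.

```python
import pandas as pd

# Hypothetical season codes; the mapping is the one given in the task.
toy = pd.DataFrame({"season": [1, 2, 3, 4, 1], "rentals": [10, 20, 30, 40, 50]})

season_names = {1: "winter", 2: "spring", 3: "summer", 4: "autumn"}
toy["season"] = toy["season"].map(season_names)

# prefix="is" with the default separator "_" yields is_<season> columns.
encoded = pd.get_dummies(toy, columns=["season"], prefix="is")

season_cols = sorted(c for c in encoded.columns if c.startswith("is_"))
print(season_cols)

# Each row belongs to exactly one season, so the flags sum to 1 per row.
print((encoded[season_cols].sum(axis=1) == 1).all())
```

Depending on the pandas version, the new columns hold either 0/1 integers or `True`/`False` values; both can be summed to count days.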

---
### <font color=#14F278> Task 3: Create a Python function `above_average()` which does the following: </font>
- takes the output dataframe from Task 2 (important to use this one rather than the raw dataset)
- creates a column `'temp_above_average'`, based on the standardised `'temp'` column - this column should have binary 0/1 values
- drops all columns except for `'rentals'`, `'temp_above_average'`, `'is_autumn'`, `'is_spring'`, `'is_summer'`, `'is_winter'`
- groups by `'temp_above_average'` and, for each group, calculates the total number of days in each season and the average number of bike rentals
- returns the aggregated dataframe
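
Because `'temp'` was standardised in Task 1, its mean is 0, so comparing the standardised value with 0 is equivalent to comparing the raw temperature with its mean. The sketch below illustrates this on a toy series with hypothetical values.

```python
import pandas as pd

# Hypothetical raw temperatures; their mean is 15.0.
raw = pd.Series([5.0, 10.0, 15.0, 30.0], name="temp")
standardised = (raw - raw.mean()) / raw.std()

# Flag built from the standardised column, thresholded at 0 ...
flag_from_standardised = (standardised >= 0).astype(int)
# ... and the same flag built from the raw values and their mean.
flag_from_raw = (raw >= raw.mean()).astype(int)

print(flag_from_standardised.tolist())               # [0, 0, 1, 1]
print(flag_from_standardised.equals(flag_from_raw))  # True
```

Note that `>=` also flags a day whose temperature sits exactly at the mean; a strict `>` would exclude it.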

In [None]:
# Solution

def above_average(df):
    # Standardised 'temp' has mean 0, so values >= 0 are at or above average
    df['temp_above_average'] = (df['temp'] >= 0).astype(int)
    # Keep only the columns needed for the aggregation
    df = df[['temp_above_average', 'is_winter', 'is_spring', 'is_summer', 'is_autumn', 'rentals']]
    # Count days per season and average the rentals within each group
    group_df = df.groupby('temp_above_average').agg({'is_winter': 'sum', 'is_spring': 'sum',
                                                     'is_summer': 'sum', 'is_autumn': 'sum',
                                                     'rentals': 'mean'})
    return group_df


In [None]:
# Call the function and display returned object
above_average(df)
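
As a possible variation on the aggregation in Task 3, pandas' named aggregation gives the output columns descriptive names. The sketch below uses a toy dataframe with hypothetical values and only two of the four season columns; the output names (`winter_days`, `summer_days`, `avg_rentals`) are illustrative choices, not part of the exercise.

```python
import pandas as pd

# Hypothetical, already-encoded data.
toy = pd.DataFrame({
    "temp_above_average": [0, 0, 1, 1, 1],
    "is_winter": [1, 1, 0, 0, 0],
    "is_summer": [0, 0, 1, 1, 1],
    "rentals": [100, 200, 300, 400, 500],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = toy.groupby("temp_above_average").agg(
    winter_days=("is_winter", "sum"),
    summer_days=("is_summer", "sum"),
    avg_rentals=("rentals", "mean"),
)
print(summary)
```

Here the group with `temp_above_average == 0` contains the two winter days and the group with `temp_above_average == 1` contains the three summer days.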

__NB__: *Solutions to these exercises are distributed separately, as a stand-alone unit, at a later date. This gives consultants the chance to attempt the exercises independently, using the reading materials and concept check solutions.*