# Deep Learning with Python
Performing Deep Learning on the Bike Sharing Dataset

Defining Steps for the Project:
- Load the dataset from a CSV file into a pandas DataFrame
- Data Preprocessing
    - Clean the dataset
        - Remove Missing Values
    - Encode Data as per their type
        - Convert Datetime Variables to Categorical Variables
        - Convert Binary Categorical Variables to [0,1]
        - Convert Categorical Variables to Dummy Variables (One Hot Encoding)
        - Convert Numerical Variables to Standardized Variables
            - This can be done either beforehand in pandas or inside the Sequential model.
            - Doing it inside the model is preferable, because the model then stores the scaling statistics of each variable and applies them consistently at inference time.
    - Split the dataset into Training, Validation, and Test Sets
- Data Preparation
    - Convert the dataset into a tf.data.Dataset object
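
The encoding steps listed above can be sketched on a toy frame. This is only a minimal illustration under stated assumptions, not the notebook's implementation: the frame `toy`, the helper `encode_features`, and its arguments are hypothetical names, and standardization is done here in pandas, whereas the notebook may do it inside the model.

```python
import pandas as pd


def encode_features(frame, binary_maps, categorical_cols, numeric_cols):
    """Return an encoded copy of `frame` plus the scaling statistics (hypothetical helper)."""
    out = frame.copy()

    # Binary categorical variables -> {0, 1} using an explicit mapping per column
    for col, mapping in binary_maps.items():
        out[col] = out[col].map(mapping).astype("int64")

    # Numerical variables -> zero mean and unit variance (population std, ddof=0)
    means = out[numeric_cols].mean()
    stds = out[numeric_cols].std(ddof=0)
    out[numeric_cols] = (out[numeric_cols] - means) / stds

    # Remaining categorical variables -> one-hot (dummy) columns
    out = pd.get_dummies(out, columns=categorical_cols, dtype="int64")
    return out, means, stds


# Toy data that mimics three of the dataset's columns (values are illustrative)
toy = pd.DataFrame({
    "Holiday": ["No Holiday", "Holiday", "No Holiday", "No Holiday"],
    "Seasons": ["Winter", "Winter", "Spring", "Autumn"],
    "Temperature": [-5.2, -5.5, 12.0, 8.7],
})

encoded, means, stds = encode_features(
    toy,
    binary_maps={"Holiday": {"No Holiday": 0, "Holiday": 1}},
    categorical_cols=["Seasons"],
    numeric_cols=["Temperature"],
)
print(encoded.columns.tolist())
```

The helper returns `means` and `stds` so the same statistics can be reused on the validation and test sets. Computing them on the training set only avoids leaking information from the held-out data. A column with zero variance would divide by zero here, so it would need separate handling.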

## Initializations

### Defining CONSTANTS

#### Importing required Libraries

In [12]:
# supporting libraries
import pandas as pd

In [13]:
# deep learning libraries and modules
import tensorflow as tf

# Data Loading, Cleaning, and Preparation


In [14]:
df = pd.read_csv(
    "data/SeoulBikeData.csv",
    encoding_errors='ignore',
    header = 0,
    names = [
        'Date', 'RentedBikeCount', 'Hour', 'Temperature', 'Humidity',
        'Windspeed', 'Visibility', 'DewPointTemperature', 'SolarRadiation',
        'Rainfall', 'Snowfall', 'Seasons', 'Holiday', 'FunctionalDay'
    ],
    dtype={
        'Date': 'str',
        'RentedBikeCount': 'int',
        'Hour': 'str',
        'Temperature': 'float',
        'Humidity': 'int',
        'Windspeed': 'float',
        'Visibility': 'int',
        'DewPointTemperature': 'float',
        'SolarRadiation': 'float',
        'Rainfall': 'float',
        'Snowfall': 'float',
        'Seasons': 'str',
        'Holiday': 'str',
        'FunctionalDay': 'str'
    }
)
df.head()

Unnamed: 0,Date,RentedBikeCount,Hour,Temperature,Humidity,Windspeed,Visibility,DewPointTemperature,SolarRadiation,Rainfall,Snowfall,Seasons,Holiday,FunctionalDay
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes


In [15]:
# Checking for null values
print("Null Value columns in the dataset: ")
df.isnull().sum()

Null Value columns in the dataset: 


Date                   0
RentedBikeCount        0
Hour                   0
Temperature            0
Humidity               0
Windspeed              0
Visibility             0
DewPointTemperature    0
SolarRadiation         0
Rainfall               0
Snowfall               0
Seasons                0
Holiday                0
FunctionalDay          0
dtype: int64

No null values were observed in the dataset.

In [16]:
# Converting the date column to datetime format
df['Date'] = pd.to_datetime(df['Date'], format = '%d/%m/%Y')

In [17]:
df['DayMonth'] = df['Date'].dt.day.astype('str')
df['Month'] = df['Date'].dt.month_name()
# df['Year'] = df['Date'].dt.year       # Not using year as a feature, since we don't have enough data for multiple years
df['DayWeek'] = df['Date'].dt.day_name()
df['DayYear'] = df['Date'].dt.dayofyear

In [18]:
df.head()

Unnamed: 0,Date,RentedBikeCount,Hour,Temperature,Humidity,Windspeed,Visibility,DewPointTemperature,SolarRadiation,Rainfall,Snowfall,Seasons,Holiday,FunctionalDay,DayMonth,Month,DayWeek,DayYear
0,2017-12-01,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
1,2017-12-01,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
2,2017-12-01,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
3,2017-12-01,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
4,2017-12-01,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   Date                 8760 non-null   datetime64[ns]
 1   RentedBikeCount      8760 non-null   int32         
 2   Hour                 8760 non-null   object        
 3   Temperature          8760 non-null   float64       
 4   Humidity             8760 non-null   int32         
 5   Windspeed            8760 non-null   float64       
 6   Visibility           8760 non-null   int32         
 7   DewPointTemperature  8760 non-null   float64       
 8   SolarRadiation       8760 non-null   float64       
 9   Rainfall             8760 non-null   float64       
 10  Snowfall             8760 non-null   float64       
 11  Seasons              8760 non-null   object        
 12  Holiday              8760 non-null   object        
 13  FunctionalDay        8760 non-nul

In [20]:
df.describe()

Unnamed: 0,RentedBikeCount,Temperature,Humidity,Windspeed,Visibility,DewPointTemperature,SolarRadiation,Rainfall,Snowfall,DayYear
count,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0,8760.0
mean,704.602055,12.882922,58.226256,1.724909,1436.825799,4.073813,0.569111,0.148687,0.075068,183.0
std,644.997468,11.944825,20.362413,1.0363,608.298712,13.060369,0.868746,1.128193,0.436746,105.372043
min,0.0,-17.8,0.0,0.0,27.0,-30.6,0.0,0.0,0.0,1.0
25%,191.0,3.5,42.0,0.9,940.0,-4.7,0.0,0.0,0.0,92.0
50%,504.5,13.7,57.0,1.5,1698.0,5.1,0.01,0.0,0.0,183.0
75%,1065.25,22.5,74.0,2.3,2000.0,14.8,0.93,0.0,0.0,274.0
max,3556.0,39.4,98.0,7.4,2000.0,27.2,3.52,35.0,8.8,365.0


Edit the lists below to select the features used to train the model. Since we are using deep learning, we use all of the features.

In [21]:
# Creating a list of features to be used in the model
target = ['RentedBikeCount']
binary_features = ['Holiday', 'FunctionalDay']
# DayMonth is stored as a string (see the astype('str') cast above), so it is listed as categorical
numeric_features = [
    'Temperature', 'Humidity', 'Windspeed', 'Visibility', 'DewPointTemperature',
    'SolarRadiation', 'Rainfall', 'Snowfall', 'DayYear'
]
categorical_features = ['Hour', 'Seasons', 'Month', 'DayWeek', 'DayMonth']

# The test split is 10% of the data, and the validation split is 15% of the remaining training data
TEST_SPLIT = 0.1
VAL_SPLIT = 0.15
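
To make the two-stage split concrete, the sketch below works out the resulting set sizes for a frame with the dataset's 8760 rows. It assumes a random shuffle before splitting, which the notebook may or may not do (a chronological split is another option for hourly data). The helper `split_frame`, the `seed` argument, and the placeholder frame `demo` are hypothetical.

```python
import pandas as pd

TEST_SPLIT = 0.1
VAL_SPLIT = 0.15


def split_frame(frame, test_split, val_split, seed=42):
    """Shuffle `frame`, then carve out test and validation sets (hypothetical helper)."""
    shuffled = frame.sample(frac=1.0, random_state=seed).reset_index(drop=True)

    # Test set: a fraction of the full data
    n_test = int(len(shuffled) * test_split)
    test = shuffled.iloc[:n_test]
    remaining = shuffled.iloc[n_test:]

    # Validation set: a fraction of what is left after removing the test set
    n_val = int(len(remaining) * val_split)
    val = remaining.iloc[:n_val]
    train = remaining.iloc[n_val:]
    return train, val, test


# Placeholder frame with the same number of rows as the dataset
demo = pd.DataFrame({"row": range(8760)})
train, val, test = split_frame(demo, TEST_SPLIT, VAL_SPLIT)
print(len(train), len(val), len(test))  # 6702 1182 876
```

Because the validation fraction applies to the 7884 rows left after the test split, the validation set is about 13.5% of the full data rather than 15%.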

In [22]:
df

Unnamed: 0,Date,RentedBikeCount,Hour,Temperature,Humidity,Windspeed,Visibility,DewPointTemperature,SolarRadiation,Rainfall,Snowfall,Seasons,Holiday,FunctionalDay,DayMonth,Month,DayWeek,DayYear
0,2017-12-01,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
1,2017-12-01,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
2,2017-12-01,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
3,2017-12-01,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
4,2017-12-01,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes,1,December,Friday,335
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8755,2018-11-30,1003,19,4.2,34,2.6,1894,-10.3,0.0,0.0,0.0,Autumn,No Holiday,Yes,30,November,Friday,334
8756,2018-11-30,764,20,3.4,37,2.3,2000,-9.9,0.0,0.0,0.0,Autumn,No Holiday,Yes,30,November,Friday,334
8757,2018-11-30,694,21,2.6,39,0.3,1968,-9.9,0.0,0.0,0.0,Autumn,No Holiday,Yes,30,November,Friday,334
8758,2018-11-30,712,22,2.1,41,1.0,1859,-9.8,0.0,0.0,0.0,Autumn,No Holiday,Yes,30,November,Friday,334
