## Data preparation
### Dataset Description
Rental bikes have been introduced in many cities to
improve mobility. Making rental bikes available and
accessible to the public at the right time matters
because it reduces waiting time, so providing the city
with a stable supply of rental bikes becomes a major
concern. The crucial part is predicting the bike count
required at each hour to keep that supply stable. The
dataset contains weather information (temperature,
humidity, wind speed, visibility, dew point, solar
radiation, snowfall, rainfall), the number of bikes
rented per hour, and date information.

### Import packages and datasets

In [46]:
# Import package
import pandas as pd

In [47]:
# Import dataset
df = pd.read_csv("Raw_data.csv", encoding='latin1', parse_dates=['Date'])
df.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
0,2017-01-12,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,2017-01-12,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,2017-01-12,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,2017-01-12,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,2017-01-12,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes


In [48]:
df.tail()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
8755,2018-11-30,1003,19,4.2,34,2.6,1894,-10.3,0.0,0.0,0.0,Autumn,No Holiday,Yes
8756,2018-11-30,764,20,3.4,37,2.3,2000,-9.9,0.0,0.0,0.0,Autumn,No Holiday,Yes
8757,2018-11-30,694,21,2.6,39,0.3,1968,-9.9,0.0,0.0,0.0,Autumn,No Holiday,Yes
8758,2018-11-30,712,22,2.1,41,1.0,1859,-9.8,0.0,0.0,0.0,Autumn,No Holiday,Yes
8759,2018-11-30,584,23,1.9,43,1.3,1909,-9.3,0.0,0.0,0.0,Autumn,No Holiday,Yes


### More information based on inspection
The dataset covers the dates 01/12/2017 to 30/11/2018 (DD/MM/YYYY).
Each day is split into 24 hours, so there are 24 rows for
each day.
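
Dates such as 01/12/2017 are ambiguous: read month-first, they become 12 January 2017 instead of 1 December 2017. The sketch below uses a made-up sample of date strings, not the real file, to show one way of parsing day-first dates with an explicit format. The names `raw` and `parsed` are illustrative only.

```python
import pandas as pd

# Made-up sample of date strings in a DD/MM/YYYY layout (not read from the file).
raw = pd.Series(['01/12/2017', '02/12/2017', '13/12/2017', '30/11/2018'])

# An explicit format removes any ambiguity between day and month.
parsed = pd.to_datetime(raw, format='%d/%m/%Y')

# The earliest and latest dates should match the period covered by the data.
print(parsed.min(), parsed.max())
```

When reading a file directly, the `dayfirst=True` argument of `pd.read_csv` is intended for the same situation.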



### Display some basic information about the data frame

In [49]:
# Print the number of rows and columns in the DataFrame
print('No. of columns = ',len(df.columns))
print('No. of rows = ',len(df))
print(df.shape)

No. of columns =  14
No. of rows =  8760
(8760, 14)


In [50]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   Date                       8760 non-null   datetime64[ns]
 1   Rented Bike Count          8760 non-null   int64         
 2   Hour                       8760 non-null   int64         
 3   Temperature(°C)            8760 non-null   float64       
 4   Humidity(%)                8760 non-null   int64         
 5   Wind speed (m/s)           8760 non-null   float64       
 6   Visibility (10m)           8760 non-null   int64         
 7   Dew point temperature(°C)  8760 non-null   float64       
 8   Solar Radiation (MJ/m2)    8760 non-null   float64       
 9   Rainfall(mm)               8760 non-null   float64       
 10  Snowfall (cm)              8760 non-null   float64       
 11  Seasons                    8760 non-null   object        
 12  Holida

### Check for null values

In [51]:
df.isna().sum()

Date                         0
Rented Bike Count            0
Hour                         0
Temperature(°C)              0
Humidity(%)                  0
Wind speed (m/s)             0
Visibility (10m)             0
Dew point temperature(°C)    0
Solar Radiation (MJ/m2)      0
Rainfall(mm)                 0
Snowfall (cm)                0
Seasons                      0
Holiday                      0
Functioning Day              0
dtype: int64

There are no null values in the dataset.

### Check for missing dates

In [52]:
# Unique dates we expect to see for the full period:
pd.date_range('2017-12-01', '2018-11-30')

DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04',
               '2017-12-05', '2017-12-06', '2017-12-07', '2017-12-08',
               '2017-12-09', '2017-12-10',
               ...
               '2018-11-21', '2018-11-22', '2018-11-23', '2018-11-24',
               '2018-11-25', '2018-11-26', '2018-11-27', '2018-11-28',
               '2018-11-29', '2018-11-30'],
              dtype='datetime64[ns]', length=365, freq='D')

In [53]:
# Actual unique dates in dataframe:
df['Date'].unique()

array(['2017-01-12T00:00:00.000000000', '2017-02-12T00:00:00.000000000',
       '2017-03-12T00:00:00.000000000', '2017-04-12T00:00:00.000000000',
       '2017-05-12T00:00:00.000000000', '2017-06-12T00:00:00.000000000',
       '2017-07-12T00:00:00.000000000', '2017-08-12T00:00:00.000000000',
       '2017-09-12T00:00:00.000000000', '2017-10-12T00:00:00.000000000',
       '2017-11-12T00:00:00.000000000', '2017-12-12T00:00:00.000000000',
       '2017-12-13T00:00:00.000000000', '2017-12-14T00:00:00.000000000',
       '2017-12-15T00:00:00.000000000', '2017-12-16T00:00:00.000000000',
       '2017-12-17T00:00:00.000000000', '2017-12-18T00:00:00.000000000',
       '2017-12-19T00:00:00.000000000', '2017-12-20T00:00:00.000000000',
       '2017-12-21T00:00:00.000000000', '2017-12-22T00:00:00.000000000',
       '2017-12-23T00:00:00.000000000', '2017-12-24T00:00:00.000000000',
       '2017-12-25T00:00:00.000000000', '2017-12-26T00:00:00.000000000',
       '2017-12-27T00:00:00.000000000', '2017-12-28

The actual number of unique dates meets our expectation,
so there are no missing dates in the period
'2017-12-01' to '2018-11-30'.
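
Comparing counts by eye works, but the comparison can also be made explicit. The following is a minimal sketch on a made-up five-day sample, not the real data. It assumes the dates are already parsed as datetimes, and the names `expected`, `sample_dates` and `missing` are illustrative.

```python
import pandas as pd

# Expected calendar for a hypothetical five-day period.
expected = pd.date_range('2017-12-01', '2017-12-05')

# Made-up observed dates: 2017-12-03 is absent and 2017-12-02 is repeated.
sample_dates = pd.Series(pd.to_datetime([
    '2017-12-01', '2017-12-02', '2017-12-02', '2017-12-04', '2017-12-05',
]))

# Expected dates that never appear in the observed column.
missing = expected.difference(pd.DatetimeIndex(sample_dates.unique()))

print('Unique dates found:', sample_dates.nunique())
print('Missing dates:', list(missing))
```

An empty `missing` index means every expected date is present, which is a stronger statement than the counts merely being equal.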


### Check for missing hours


In [54]:
pd.date_range('2017-12-01', '2018-11-30')

DatetimeIndex(['2017-12-01', '2017-12-02', '2017-12-03', '2017-12-04',
               '2017-12-05', '2017-12-06', '2017-12-07', '2017-12-08',
               '2017-12-09', '2017-12-10',
               ...
               '2018-11-21', '2018-11-22', '2018-11-23', '2018-11-24',
               '2018-11-25', '2018-11-26', '2018-11-27', '2018-11-28',
               '2018-11-29', '2018-11-30'],
              dtype='datetime64[ns]', length=365, freq='D')

There are 365 days, and each day should contribute 24
rows, so the 'Hour' column should have 365 * 24 = 8760 rows.
The DataFrame has 8760 rows, so no hours are missing either.
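
The total row count cannot tell a missing hour apart from a duplicated one, so a per-day check may be a useful complement. Below is a small sketch on a made-up two-day sample, not the real data. The column names follow the dataset; the names `sample`, `hours_per_day` and `incomplete_days` are illustrative.

```python
import pandas as pd

# Hypothetical two-day sample; the second day is missing hour 23.
dates = pd.to_datetime(['2017-12-01'] * 24 + ['2017-12-02'] * 23)
hours = list(range(24)) + list(range(23))
sample = pd.DataFrame({'Date': dates, 'Hour': hours})

# Count the distinct hours recorded for each day.
hours_per_day = sample.groupby('Date')['Hour'].nunique()

# Keep only the days that do not have all 24 hours.
incomplete_days = hours_per_day[hours_per_day != 24]
print(incomplete_days)
```

If `incomplete_days` is empty, every day has all 24 distinct hours.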

### Rename the longer column names

In [55]:
df.rename(columns={
    'Temperature(°C)': 'Temperature',
    'Dew point temperature(°C)': 'Dew point temperature',
}, inplace=True)
# Print the column labels and data types.
df.info(verbose=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Date                     8760 non-null   datetime64[ns]
 1   Rented Bike Count        8760 non-null   int64         
 2   Hour                     8760 non-null   int64         
 3   Temperature              8760 non-null   float64       
 4   Humidity(%)              8760 non-null   int64         
 5   Wind speed (m/s)         8760 non-null   float64       
 6   Visibility (10m)         8760 non-null   int64         
 7   Dew point temperature    8760 non-null   float64       
 8   Solar Radiation (MJ/m2)  8760 non-null   float64       
 9   Rainfall(mm)             8760 non-null   float64       
 10  Snowfall (cm)            8760 non-null   float64       
 11  Seasons                  8760 non-null   object        
 12  Holiday                  8760 non-

### Delete unnecessary columns

'Dew point temperature' is highly dependent on
'Temperature', so it will be removed.
'Holiday' is not related to our question, so it will
also be removed.
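
One way to check such a dependence before dropping a column is a Pearson correlation. The sketch below uses made-up values purely to illustrate the call; it is not a result from the dataset, and the name `sample` is illustrative.

```python
import pandas as pd

# Made-up temperature and dew point readings (illustrative, not dataset values).
sample = pd.DataFrame({
    'Temperature': [-5.0, 0.0, 10.0, 20.0, 30.0],
    'Dew point temperature': [-17.0, -11.0, 1.0, 12.0, 22.0],
})

# Pearson correlation between the two columns; values near 1 indicate
# a strong linear relationship.
r = sample['Temperature'].corr(sample['Dew point temperature'])
print(round(r, 3))
```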

In [56]:
# Pass the labels via `columns=`; recent pandas versions no longer accept
# the axis as a positional argument.
df.drop(columns=['Dew point temperature', 'Holiday'], inplace=True)
df.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype         
---  ------                   --------------  -----         
 0   Date                     8760 non-null   datetime64[ns]
 1   Rented Bike Count        8760 non-null   int64         
 2   Hour                     8760 non-null   int64         
 3   Temperature              8760 non-null   float64       
 4   Humidity(%)              8760 non-null   int64         
 5   Wind speed (m/s)         8760 non-null   float64       
 6   Visibility (10m)         8760 non-null   int64         
 7   Solar Radiation (MJ/m2)  8760 non-null   float64       
 8   Rainfall(mm)             8760 non-null   float64       
 9   Snowfall (cm)            8760 non-null   float64       
 10  Seasons                  8760 non-null   object        
 11  Functioning Day          8760 non-null   object        
dtypes: datetime64[ns](1), float64(5), 



### Save the prepared dataset

In [57]:
df.to_csv('Prepared_data.csv',index=False)
