# Data preparation
## Dataset Description
Rental bikes have been introduced in many cities to make urban mobility more convenient. Rental bikes need to be available and accessible to the public at the right time, because this shortens waiting times. Keeping the city stably supplied with rental bikes therefore becomes a major concern, and the crucial part is predicting how many bikes are required at each hour.

The dataset contains weather information (temperature, humidity, wind speed, visibility, dew point, solar radiation, snowfall, rainfall), the number of bikes rented per hour, and date information, plus the season and whether the day is a holiday or a functioning day.


In [1]:
# Import pandas
import pandas as pd

In [2]:
# Load the dataset (read as Latin-1; the column names contain non-ASCII characters such as '°')
df = pd.read_csv("SeoulBikeData.csv",
                 encoding='latin1')
df.head()

| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Seasons | Holiday | Functioning Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/12/2017 | 254 | 0 | -5.2 | 37 | 2.2 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 1 | 01/12/2017 | 204 | 1 | -5.5 | 38 | 0.8 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 2 | 01/12/2017 | 173 | 2 | -6.0 | 39 | 1.0 | 2000 | -17.7 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 3 | 01/12/2017 | 107 | 3 | -6.2 | 40 | 0.9 | 2000 | -17.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
| 4 | 01/12/2017 | 78 | 4 | -6.0 | 36 | 2.3 | 2000 | -18.6 | 0.0 | 0.0 | 0.0 | Winter | No Holiday | Yes |
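The `encoding='latin1'` argument is presumably needed because the file is not valid UTF-8: the column labels contain the degree sign (°), which a Latin-1 file stores as the single byte `0xB0`. The following is a minimal sketch, assuming that byte layout, of a loader that tries UTF-8 first and falls back to Latin-1. The helper name `read_csv_with_fallback` and the toy bytes are hypothetical.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw: bytes, encodings=("utf-8", "latin1")) -> pd.DataFrame:
    """Parse CSV bytes with the first encoding that decodes without error."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            # A fresh buffer per attempt, so a failed decode leaves no partial state
            return pd.read_csv(io.BytesIO(raw), encoding=encoding)
        except UnicodeDecodeError as err:
            last_error = err
    raise last_error


# Toy bytes (hypothetical): the degree sign encoded as the Latin-1 byte 0xB0
raw = "Date,Temperature(\u00b0C)\n01/12/2017,-5.2\n".encode("latin1")
sample = read_csv_with_fallback(raw)
print(list(sample.columns))  # ['Date', 'Temperature(°C)']
```

Latin-1 maps every byte to a character, so it never raises a decode error. That is why it belongs last in the list: it always "succeeds", even on a file that is really in some other encoding.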


In [3]:
df.tail()

| | Date | Rented Bike Count | Hour | Temperature(°C) | Humidity(%) | Wind speed (m/s) | Visibility (10m) | Dew point temperature(°C) | Solar Radiation (MJ/m2) | Rainfall(mm) | Snowfall (cm) | Seasons | Holiday | Functioning Day |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8755 | 30/11/2018 | 1003 | 19 | 4.2 | 34 | 2.6 | 1894 | -10.3 | 0.0 | 0.0 | 0.0 | Autumn | No Holiday | Yes |
| 8756 | 30/11/2018 | 764 | 20 | 3.4 | 37 | 2.3 | 2000 | -9.9 | 0.0 | 0.0 | 0.0 | Autumn | No Holiday | Yes |
| 8757 | 30/11/2018 | 694 | 21 | 2.6 | 39 | 0.3 | 1968 | -9.9 | 0.0 | 0.0 | 0.0 | Autumn | No Holiday | Yes |
| 8758 | 30/11/2018 | 712 | 22 | 2.1 | 41 | 1.0 | 1859 | -9.8 | 0.0 | 0.0 | 0.0 | Autumn | No Holiday | Yes |
| 8759 | 30/11/2018 | 584 | 23 | 1.9 | 43 | 1.3 | 1909 | -9.3 | 0.0 | 0.0 | 0.0 | Autumn | No Holiday | Yes |


## More information based on inspection
The dataset covers 01/12/2017 to 30/11/2018, one full year of 365 days.
Each day is split into 24 hourly rows, so the dataset should contain
365 × 24 = 8,760 rows.
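You can check the 24-rows-per-day observation directly instead of by eye. The following is a minimal sketch that assumes the `Date` and `Hour` column names from the table above. `has_full_hourly_coverage` is a hypothetical helper, shown here on a toy frame. On the real data you would call `has_full_hourly_coverage(df)`.

```python
import pandas as pd


def has_full_hourly_coverage(df: pd.DataFrame, date_col: str = "Date", hour_col: str = "Hour") -> bool:
    """True when every date has exactly the 24 distinct hours 0-23 and no duplicates."""
    per_day = df.groupby(date_col)[hour_col].agg(["count", "nunique", "min", "max"])
    return bool(
        (per_day["count"] == 24).all()
        and (per_day["nunique"] == 24).all()
        and (per_day["min"] == 0).all()
        and (per_day["max"] == 23).all()
    )


# Toy frame (hypothetical): two complete days of hourly rows
toy = pd.DataFrame({
    "Date": ["01/12/2017"] * 24 + ["02/12/2017"] * 24,
    "Hour": list(range(24)) * 2,
})
print(has_full_hourly_coverage(toy))                # True
print(has_full_hourly_coverage(toy.drop(index=5)))  # False: one hour missing on day 1
```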



In [4]:
# Display basic information about the DataFrame:
# the number of columns and rows, then both together via .shape
print('No. of columns = ',len(df.columns))
print('No. of rows = ',len(df))
print(df.shape)

No. of columns =  14
No. of rows =  8760
(8760, 14)


In [5]:
# Shorten the two temperature column names by dropping the '(°C)' suffix
df.rename(columns={
    'Temperature(°C)': 'Temperature',
    'Dew point temperature(°C)': 'Dew point temperature'
}, inplace=True)

# Print the column labels and data types
df.info(verbose=True)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Date                     8760 non-null   object 
 1   Rented Bike Count        8760 non-null   int64  
 2   Hour                     8760 non-null   int64  
 3   Temperature              8760 non-null   float64
 4   Humidity(%)              8760 non-null   int64  
 5   Wind speed (m/s)         8760 non-null   float64
 6   Visibility (10m)         8760 non-null   int64  
 7   Dew point temperature    8760 non-null   float64
 8   Solar Radiation (MJ/m2)  8760 non-null   float64
 9   Rainfall(mm)             8760 non-null   float64
 10  Snowfall (cm)            8760 non-null   float64
 11  Seasons                  8760 non-null   object 
 12  Holiday                  8760 non-null   object 
 13  Functioning Day          8760 non-null   object 
dtypes: float64(6), int64(4), object(4)
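The rename above handles only the two temperature columns. To treat every column the same way, you could strip the trailing parenthesized unit from all labels at once. The following is a sketch of that option. The regex and the helper name `strip_unit_suffixes` are assumptions, not part of the original notebook. The trade-off is that the units disappear from the labels, so they would need to be recorded elsewhere.

```python
import re

import pandas as pd

# Matches a trailing "(unit)" group plus any whitespace before it
UNIT_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")


def strip_unit_suffixes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column labels have their trailing '(unit)' removed."""
    return df.rename(columns=lambda name: UNIT_SUFFIX.sub("", name))


# Toy frame (hypothetical) using a few of the original labels
toy = pd.DataFrame(columns=["Temperature(°C)", "Humidity(%)", "Wind speed (m/s)", "Rented Bike Count"])
print(list(strip_unit_suffixes(toy).columns))
# ['Temperature', 'Humidity', 'Wind speed', 'Rented Bike Count']
```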

In [14]:
# Convert 'Date' to datetime; the strings are day-first (dd/mm/yyyy)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
print(df['Date'].dtypes)
# 'Date' now has a proper datetime dtype.

datetime64[ns]
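The raw strings are day-first: the dataset starts on 01/12/2017 and ends on 30/11/2018. By default, pandas reads an ambiguous string like `01/12/2017` month-first. The following small sketch shows the difference using sample strings in the dataset's format.

```python
import pandas as pd

raw_dates = pd.Series(["01/12/2017", "13/12/2017", "30/11/2018"])

# Explicit day-first format: "01/12/2017" is 1 December 2017
parsed = pd.to_datetime(raw_dates, format="%d/%m/%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2017-12-01', '2017-12-13', '2018-11-30']

# Without a format, the same string is read month-first (12 January 2017)
single = pd.to_datetime("01/12/2017")
print(single.month)  # 1
```

Passing `dayfirst=True` is an alternative. An explicit `format`, however, also rejects any string that does not match the pattern.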


## Delete unnecessary columns

The 'Dew point temperature' column is highly dependent on
'Temperature', so it is removed as redundant.
The 'Holiday' column is not related to our question, so it
is also removed.
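You can make the redundancy argument measurable before dropping the column. The following is a minimal sketch using the absolute Pearson correlation. The 0.9 threshold is chosen only for illustration and is not taken from the original analysis, and `is_redundant` is a hypothetical helper. On the real data you would call it on `df` before the drop.

```python
import pandas as pd


def is_redundant(df: pd.DataFrame, col_a: str, col_b: str, threshold: float = 0.9) -> bool:
    """True when the absolute Pearson correlation of two numeric columns reaches threshold."""
    return bool(abs(df[col_a].corr(df[col_b])) >= threshold)


# Toy values (hypothetical), loosely shaped like the hourly readings above
toy = pd.DataFrame({
    "Temperature": [-5.0, 0.0, 5.0, 10.0, 15.0],
    "Dew point temperature": [-17.0, -12.0, -8.0, -3.0, 2.0],
    "Wind speed (m/s)": [2.2, 0.8, 1.0, 0.9, 2.3],
})
print(is_redundant(toy, "Temperature", "Dew point temperature"))  # True
print(is_redundant(toy, "Temperature", "Wind speed (m/s)"))       # False
```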

In [7]:
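# Drop both columns in one call; pass them by keyword, since
# positional axis arguments are no longer accepted by recent pandas
df.drop(columns=['Dew point temperature', 'Holiday'], inplace=True)
df.info(verbose=True)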

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 12 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Date                     8760 non-null   object 
 1   Rented Bike Count        8760 non-null   int64  
 2   Hour                     8760 non-null   int64  
 3   Temperature              8760 non-null   float64
 4   Humidity(%)              8760 non-null   int64  
 5   Wind speed (m/s)         8760 non-null   float64
 6   Visibility (10m)         8760 non-null   int64  
 7   Solar Radiation (MJ/m2)  8760 non-null   float64
 8   Rainfall(mm)             8760 non-null   float64
 9   Snowfall (cm)            8760 non-null   float64
 10  Seasons                  8760 non-null   object 
 11  Functioning Day          8760 non-null   object 
dtypes: float64(5), int64(4), object(3)
memory usage: 821.4+ KB
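In a notebook, re-running the drop cell raises a `KeyError` because the columns are already gone. Passing `errors='ignore'` to `DataFrame.drop` makes the step safe to repeat. The following sketch demonstrates this on a toy frame.

```python
import pandas as pd

# Toy frame (hypothetical) with the two columns to remove
toy = pd.DataFrame({
    "Temperature": [-5.2, -5.5],
    "Dew point temperature": [-17.6, -17.6],
    "Holiday": ["No Holiday", "No Holiday"],
})

to_drop = ["Dew point temperature", "Holiday"]
toy = toy.drop(columns=to_drop, errors="ignore")
# Running the same line again is a no-op instead of a KeyError
toy = toy.drop(columns=to_drop, errors="ignore")
print(list(toy.columns))  # ['Temperature']
```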




In [8]:
# check for null values:
df.isna().sum()

Date                       0
Rented Bike Count          0
Hour                       0
Temperature                0
Humidity(%)                0
Wind speed (m/s)           0
Visibility (10m)           0
Solar Radiation (MJ/m2)    0
Rainfall(mm)               0
Snowfall (cm)              0
Seasons                    0
Functioning Day            0
dtype: int64

There are no null values in the dataset.
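For a wide frame, listing only the columns that actually contain nulls is easier to read than the full per-column table. You can also assert on an empty result, so the notebook fails loudly if the input data changes. The following is a minimal sketch with a hypothetical helper, `columns_with_missing`, shown on a toy frame.

```python
import numpy as np
import pandas as pd


def columns_with_missing(df: pd.DataFrame) -> pd.Series:
    """Per-column count of missing values, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Toy frame (hypothetical) with one missing rainfall reading
toy = pd.DataFrame({"Hour": [0, 1, 2], "Rainfall(mm)": [0.0, np.nan, 0.0]})
print(columns_with_missing(toy).to_dict())  # {'Rainfall(mm)': 1}

# On a clean frame the result is empty, which is easy to assert on
assert columns_with_missing(toy.dropna()).empty
```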

Save the prepared data to a new CSV file:


In [10]:
df.to_csv('Prepared_data.csv',index=False)
print('saved')

saved
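CSV stores no type information, so the `Date` conversion above does not survive a save and reload by itself. The following sketch shows the round trip, using an in-memory buffer in place of `Prepared_data.csv`. Re-parsing the column on load with `parse_dates` restores the datetime dtype.

```python
import io

import pandas as pd

# Toy prepared frame (hypothetical values)
prepared = pd.DataFrame({
    "Date": pd.to_datetime(["01/12/2017", "02/12/2017"], format="%d/%m/%Y"),
    "Rented Bike Count": [254, 204],
})

buffer = io.StringIO()
prepared.to_csv(buffer, index=False)  # date-only values are written as ISO strings, e.g. 2017-12-01
buffer.seek(0)

# Without parse_dates the column would come back as plain strings
reloaded = pd.read_csv(buffer, parse_dates=["Date"])
print(reloaded["Date"].dt.day.tolist())  # [1, 2]
```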
