<a href="https://colab.research.google.com/github/Rushabhtikale92/Bike-Sharing-Demand-Prediction/blob/main/Bike_Sharing_Demand_Prediction_Rushabh_Tikale_Capstone_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <b><u> Project Title : Seoul Bike Sharing Demand Prediction </u></b>

## <b> Problem Description </b>

### Rental bikes have been introduced in many cities to make urban travel more convenient. Bikes must be available and accessible to the public at the right time, because this reduces waiting time. Keeping the city supplied with a stable number of rental bikes therefore becomes a major concern, and the crucial part is predicting the bike count required at each hour.


## <b> Data Description </b>

### <b> The dataset contains weather information (Temperature, Humidity, Windspeed, Visibility, Dewpoint, Solar radiation, Snowfall, Rainfall), the number of bikes rented per hour and date information.</b>


### <b>Attribute Information: </b>

* ### Date - day/month/year (e.g. 01/12/2017)
* ### Rented Bike count - Count of bikes rented at each hour
* ### Hour - Hour of the day
* ### Temperature - Temperature in Celsius
* ### Humidity - %
* ### Windspeed - m/s
* ### Visibility - in units of 10 m
* ### Dew point temperature - Celsius
* ### Solar radiation - MJ/m2
* ### Rainfall - mm
* ### Snowfall - cm
* ### Seasons - Winter, Spring, Summer, Autumn
* ### Holiday - Holiday/No holiday
* ### Functional Day - NoFunc (non-functional hours), Fun (functional hours)

Importing the required libraries.

In [1]:
# Importing the library.
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
import seaborn as sns 
import datetime as dt
from sklearn.model_selection import train_test_split 
from sklearn.preprocessing import OneHotEncoder  
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet
import warnings
warnings.filterwarnings('ignore')

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Importing the dataset
data = pd.read_csv("/content/drive/MyDrive/CSV/SeoulBikeData.csv", encoding= 'unicode_escape') 

In [4]:
# Presenting the sample 
data.head()

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes


# Renaming the columns to shorter names

In [5]:
column_dict = {'Date':'Date', 'Rented Bike Count':'Rented Bikes', 'Hour':'Hour', 'Temperature(°C)':'Temperature', 'Humidity(%)':'Humidity', 'Wind speed (m/s)':'wind speed', 'Visibility (10m)':'visibility', 
               'Dew point temperature(°C)':'Dew point temperature','Solar Radiation (MJ/m2)':'Solar Radiation', 'Rainfall(mm)':'Rainfall', 'Snowfall (cm)': 'snowfall'}
data.rename(columns= column_dict, inplace=True)

# Checking For NaN Values

In [6]:
data.columns[data.isna().any()]

Index([], dtype='object')
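
An empty `Index`, as returned above, means that no column contains a missing value. As a minimal sketch of how the same check behaves when values are missing, the toy frame below (hypothetical columns `a` and `b`, not the Seoul data) lists the affected columns and also counts the missing entries per column with `isna().sum()`.

```python
import numpy as np
import pandas as pd

# Toy frame for illustration only: column "b" has one missing value.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [1.0, np.nan, 3.0]})

# Columns that contain at least one NaN (same idiom as the cell above).
cols_with_nan = list(toy.columns[toy.isna().any()])

# Number of missing values in each column.
nan_counts = toy.isna().sum()

print(cols_with_nan)
print(nan_counts)
```

The per-column count is often more useful than the bare column list, because it shows how much data would be lost if rows with missing values were dropped.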

Converting the date column into a proper datetime type lets us derive calendar features (year, month, day, weekday) that are easier to analyse and to feed into a model later.



In [7]:
data['Date'] = data['Date'].apply(lambda x: dt.datetime.strptime(x,"%d/%m/%Y"))

In [8]:
data['year'] = data['Date'].dt.year
data['month'] = data['Date'].dt.month
data['month name'] = data['Date'].dt.month_name()
data['day'] = data['Date'].dt.day
data['day name'] = data['Date'].dt.day_name()
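
The parsing and feature extraction steps can also be sketched with the vectorised `pd.to_datetime`, shown here on three date strings in the same day/month/year layout as the dataset. The variable names `dates`, `parsed` and `features` are illustrative only. Passing an explicit `format` matters because a string such as `01/12/2017` means 1 December here, not 12 January.

```python
import pandas as pd

# Three sample strings in the dataset's day/month/year layout.
dates = pd.Series(["01/12/2017", "02/12/2017", "03/12/2017"])

# Explicit format so the first field is read as the day, not the month.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

# Calendar features derived through the .dt accessor.
features = pd.DataFrame({
    "year": parsed.dt.year,
    "month": parsed.dt.month,
    "month name": parsed.dt.month_name(),
    "day": parsed.dt.day,
    "day name": parsed.dt.day_name(),
})

print(features)
```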

In [9]:
data['week'] = data['day name'].apply(lambda x: "weekend" if x=='Saturday' or x == 'Sunday' else 'weekday')
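
An alternative way to build the same weekday/weekend flag, sketched below on a few sample dates, is to compare the numeric day of the week instead of the day name. The names `parsed` and `week` are illustrative, and this is one possible variant rather than the method used in the notebook.

```python
import numpy as np
import pandas as pd

# Friday, Saturday, Sunday and Monday in day/month/year layout.
parsed = pd.to_datetime(
    pd.Series(["01/12/2017", "02/12/2017", "03/12/2017", "04/12/2017"]),
    format="%d/%m/%Y",
)

# dayofweek counts Monday as 0 and Sunday as 6, so 5 and 6 are the weekend.
week = pd.Series(np.where(parsed.dt.dayofweek >= 5, "weekend", "weekday"))

print(week.tolist())
```

Working with the numeric day avoids string comparisons and does not depend on the language of the day names.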

Since the date has been split into separate features (year, month, month name, day, day name), the original Date column is now redundant, so we drop it to avoid carrying repeated information.

In [10]:
# Dropping the Date feature
data.drop(columns= ['Date'], inplace=True)

In [11]:
data.head()

Unnamed: 0,Rented Bikes,Hour,Temperature,Humidity,wind speed,visibility,Dew point temperature,Solar Radiation,Rainfall,snowfall,Seasons,Holiday,Functioning Day,year,month,month name,day,day name,week
0,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,2017,12,December,1,Friday,weekday
1,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,2017,12,December,1,Friday,weekday
2,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes,2017,12,December,1,Friday,weekday
3,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes,2017,12,December,1,Friday,weekday
4,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes,2017,12,December,1,Friday,weekday


In [12]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Rented Bikes           8760 non-null   int64  
 1   Hour                   8760 non-null   int64  
 2   Temperature            8760 non-null   float64
 3   Humidity               8760 non-null   int64  
 4   wind speed             8760 non-null   float64
 5   visibility             8760 non-null   int64  
 6   Dew point temperature  8760 non-null   float64
 7   Solar Radiation        8760 non-null   float64
 8   Rainfall               8760 non-null   float64
 9   snowfall               8760 non-null   float64
 10  Seasons                8760 non-null   object 
 11  Holiday                8760 non-null   object 
 12  Functioning Day        8760 non-null   object 
 13  year                   8760 non-null   int64  
 14  month                  8760 non-null   int64  
 15  mont

In [13]:
data['Seasons'].value_counts()

Spring    2208
Summer    2208
Autumn    2184
Winter    2160
Name: Seasons, dtype: int64

In [14]:
data['Holiday'].value_counts()

No Holiday    8328
Holiday        432
Name: Holiday, dtype: int64

In [15]:
data['Functioning Day'].value_counts()

Yes    8465
No      295
Name: Functioning Day, dtype: int64

The counts above give a clear picture of each categorical feature: the four seasons are almost evenly represented, only 432 of the 8760 hourly records fall on a holiday, and 295 records belong to non-functioning days.
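
Raw counts can be easier to compare as proportions. As a small sketch, the Series below is rebuilt from the `Holiday` counts reported above (8328 and 432) rather than read from the file, and `value_counts(normalize=True)` returns the share of each class. The names `holiday` and `shares` are illustrative.

```python
import pandas as pd

# Rebuild a Series with the class counts reported above for "Holiday".
holiday = pd.Series(["No Holiday"] * 8328 + ["Holiday"] * 432)

# normalize=True returns relative frequencies instead of raw counts.
shares = holiday.value_counts(normalize=True)

print(shares.round(4))
```

On the real data the same call would be `data['Holiday'].value_counts(normalize=True)`. It makes the class imbalance explicit: holidays account for roughly 5% of the hourly records.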



In [16]:
data.columns

Index(['Rented Bikes', 'Hour', 'Temperature', 'Humidity', 'wind speed',
       'visibility', 'Dew point temperature', 'Solar Radiation', 'Rainfall',
       'snowfall', 'Seasons', 'Holiday', 'Functioning Day', 'year', 'month',
       'month name', 'day', 'day name', 'week'],
      dtype='object')