

# Project Title : Seoul Bike Sharing Demand Prediction

### Project Type - Regression

### Contribution - Individual

## Problem Description

Rental bikes have been introduced in many cities to improve mobility and comfort. Making a rental bike available and accessible to the public at the right time matters because it reduces waiting time, so providing the city with a stable supply of rental bikes becomes a major concern. The crucial part is predicting the number of bikes required at each hour so that the supply stays stable.

## Attribute Information:

Date - day/month/year (for example 01/12/2017)

Rented Bike Count - Count of bikes rented at each hour

Hour - Hour of the day

Temperature - Temperature in Celsius

Humidity - %

Windspeed - m/s

Visibility - 10m

Dew point temperature - Celsius

Solar radiation - MJ/m2

Rainfall - mm

Snowfall - cm

Seasons - Winter, Spring, Summer, Autumn

Holiday - Holiday/No holiday

Functioning Day - NoFunc (non-functional hours), Fun (functional hours)

In [1]:
from google.colab import files
upload = files.upload()

Saving SeoulBikeData.csv to SeoulBikeData (1).csv


In [2]:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import datetime as dt
from sklearn.model_selection import train_test_split , GridSearchCV
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import Ridge, Lasso, LinearRegression

import warnings
warnings.filterwarnings('ignore')

%matplotlib inline
sns.set_style("whitegrid",{'grid.linestyle': '--'})


In [3]:
#loading data
bike_df = pd.read_csv("SeoulBikeData.csv" , encoding = "unicode_escape")
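The `encoding` argument is presumably needed because the column headers contain a degree sign that is not stored as UTF-8. The sketch below uses a small hypothetical sample (not the real file) and assumes Latin-1 bytes. It shows that `unicode_escape` decodes such bytes like Latin-1 but also interprets backslash sequences, so `latin-1` may be the more conventional choice if that assumption holds.

```python
import io
import pandas as pd

# Hypothetical sample: a header with a degree sign, stored as Latin-1 bytes.
raw = "Temperature(°C),Hour\n-5.2,0\n".encode("latin-1")

# For this sample both codecs decode the degree sign (byte 0xB0) the same way.
same_text = raw.decode("latin-1") == raw.decode("unicode_escape")

# unicode_escape additionally interprets backslash sequences; latin-1 does not.
kept = b"a\\tb".decode("latin-1")            # backslash and 't' stay as two characters
escaped = b"a\\tb".decode("unicode_escape")  # becomes a single tab character

# Reading the sample with latin-1 preserves the header as written.
sample_df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(sample_df.columns.tolist())
```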

## Getting to know the data

In [4]:
#look at the first 20 rows
bike_df.head(n = 20)

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
0,01/12/2017,254,0,-5.2,37,2.2,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
1,01/12/2017,204,1,-5.5,38,0.8,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
2,01/12/2017,173,2,-6.0,39,1.0,2000,-17.7,0.0,0.0,0.0,Winter,No Holiday,Yes
3,01/12/2017,107,3,-6.2,40,0.9,2000,-17.6,0.0,0.0,0.0,Winter,No Holiday,Yes
4,01/12/2017,78,4,-6.0,36,2.3,2000,-18.6,0.0,0.0,0.0,Winter,No Holiday,Yes
5,01/12/2017,100,5,-6.4,37,1.5,2000,-18.7,0.0,0.0,0.0,Winter,No Holiday,Yes
6,01/12/2017,181,6,-6.6,35,1.3,2000,-19.5,0.0,0.0,0.0,Winter,No Holiday,Yes
7,01/12/2017,460,7,-7.4,38,0.9,2000,-19.3,0.0,0.0,0.0,Winter,No Holiday,Yes
8,01/12/2017,930,8,-7.6,37,1.1,2000,-19.8,0.01,0.0,0.0,Winter,No Holiday,Yes
9,01/12/2017,490,9,-6.5,27,0.5,1928,-22.4,0.23,0.0,0.0,Winter,No Holiday,Yes


In [5]:
#look at the last 20 rows
bike_df.tail(n = 20)

Unnamed: 0,Date,Rented Bike Count,Hour,Temperature(°C),Humidity(%),Wind speed (m/s),Visibility (10m),Dew point temperature(°C),Solar Radiation (MJ/m2),Rainfall(mm),Snowfall (cm),Seasons,Holiday,Functioning Day
8740,30/11/2018,116,4,-0.5,71,0.4,1345,-5.1,0.0,0.0,0.0,Autumn,No Holiday,Yes
8741,30/11/2018,149,5,-0.7,66,0.5,1336,-6.2,0.0,0.0,0.0,Autumn,No Holiday,Yes
8742,30/11/2018,293,6,-0.8,68,0.8,1322,-5.9,0.0,0.0,0.0,Autumn,No Holiday,Yes
8743,30/11/2018,750,7,-1.2,70,0.8,1351,-5.9,0.0,0.0,0.0,Autumn,No Holiday,Yes
8744,30/11/2018,1527,8,-1.5,68,1.1,1286,-6.6,0.02,0.0,0.0,Autumn,No Holiday,Yes
8745,30/11/2018,809,9,-0.4,57,0.6,1270,-7.8,0.45,0.0,0.0,Autumn,No Holiday,Yes
8746,30/11/2018,554,10,1.9,51,0.8,1029,-7.1,1.01,0.0,0.0,Autumn,No Holiday,Yes
8747,30/11/2018,642,11,5.3,43,1.8,1177,-6.2,1.38,0.0,0.0,Autumn,No Holiday,Yes
8748,30/11/2018,720,12,6.6,35,1.3,1409,-7.8,1.7,0.0,0.0,Autumn,No Holiday,Yes
8749,30/11/2018,740,13,7.1,24,2.8,1838,-12.1,1.83,0.0,0.0,Autumn,No Holiday,Yes


The data consists of hourly bike rental records from 01/12/2017 to 30/11/2018, along with the weather conditions for each hour.
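As a quick sanity check on that date range, one row per hour over the whole period should give 365 × 24 = 8760 records. The sketch below builds the expected hourly index independently of the dataset; the variable names are illustrative only.

```python
import pandas as pd

# First and last hourly timestamps implied by the stated date range.
start = pd.Timestamp("2017-12-01 00:00")
end = pd.Timestamp("2018-11-30 23:00")

# One timestamp per hour, both endpoints included.
expected_hours = pd.date_range(start, end, freq=pd.Timedelta(hours=1))

# Number of calendar days covered, counting both the first and the last day.
n_days = (end.normalize() - start.normalize()).days + 1

print(n_days, len(expected_hours))  # 365 8760
```

If the number of rows in the loaded frame matches this count, that is consistent with no hours being missing, although it does not by itself prove that every timestamp is present exactly once.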

In [6]:
#checking dimensions of dataset
bike_df.shape

(8760, 14)

The dataset has 8760 rows and 14 columns: 13 features and one target, Rented Bike Count.
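The split into features and target can be sketched as follows. This is a minimal illustration on a tiny hypothetical frame that borrows a few of the dataset's column names; `toy_df`, `X`, and `y` are names introduced here, not part of the notebook.

```python
import pandas as pd

# Tiny hypothetical frame with a few of the dataset's column names.
toy_df = pd.DataFrame({
    "Rented Bike Count": [254, 204, 173],
    "Hour": [0, 1, 2],
    "Temperature(°C)": [-5.2, -5.5, -6.0],
})

target_col = "Rented Bike Count"
X = toy_df.drop(columns=target_col)  # every column except the target
y = toy_df[target_col]               # the target as a Series

print(X.shape, y.shape)
```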

In [7]:
#summary
bike_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8760 entries, 0 to 8759
Data columns (total 14 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Date                       8760 non-null   object 
 1   Rented Bike Count          8760 non-null   int64  
 2   Hour                       8760 non-null   int64  
 3   Temperature(°C)            8760 non-null   float64
 4   Humidity(%)                8760 non-null   int64  
 5   Wind speed (m/s)           8760 non-null   float64
 6   Visibility (10m)           8760 non-null   int64  
 7   Dew point temperature(°C)  8760 non-null   float64
 8   Solar Radiation (MJ/m2)    8760 non-null   float64
 9   Rainfall(mm)               8760 non-null   float64
 10  Snowfall (cm)              8760 non-null   float64
 11  Seasons                    8760 non-null   object 
 12  Holiday                    8760 non-null   object 
 13  Functioning Day            8760 non-null   object 

Both numerical and categorical variables are present, and no column contains null values. The Date column has dtype object (plain strings), so it needs to be converted to a datetime type.


In [8]:
#creating function to change data type of date
def get_date(date_str):
  # parse a day/month/year string such as "01/12/2017" into a pandas Timestamp
  return pd.to_datetime(date_str, format = "%d/%m/%Y")


In [9]:
#applying function
bike_df["Date"] = bike_df["Date"].apply(get_date)
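The same conversion can also be done on a whole column at once. The sketch below uses a small hypothetical Series rather than `bike_df`. The explicit `format` matters because a string such as 01/12/2017 is ambiguous, and pandas reads ambiguous dates month-first unless told otherwise.

```python
import pandas as pd

# Hypothetical sample of day/month/year strings in the dataset's format.
dates = pd.Series(["01/12/2017", "30/11/2018"])

# Vectorized parse; the format string pins the day/month/year order.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

print(parsed.dt.month.tolist())  # [12, 11]
```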

In [10]:
# checking for duplicate values
bike_df.duplicated().sum()

0

In [11]:
# checking for missing values
bike_df.isnull().sum()

Date                         0
Rented Bike Count            0
Hour                         0
Temperature(°C)              0
Humidity(%)                  0
Wind speed (m/s)             0
Visibility (10m)             0
Dew point temperature(°C)    0
Solar Radiation (MJ/m2)      0
Rainfall(mm)                 0
Snowfall (cm)                0
Seasons                      0
Holiday                      0
Functioning Day              0
dtype: int64

There are no duplicate rows and no missing values.
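The two checks above can be bundled into one small helper if they need to be repeated after later transformations. This is a sketch only: `quality_report` is a hypothetical name, and it is shown on a toy frame that deliberately contains one duplicate row and one missing value.

```python
import pandas as pd

def quality_report(df):
    """Return the number of duplicate rows and of missing cells in df."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isnull().sum().sum()),
    }

# Toy frame: the second row repeats the first, and the last row has a missing value.
toy_df = pd.DataFrame({"a": [1, 1, 2], "b": [3.0, 3.0, None]})
report = quality_report(toy_df)
print(report)  # {'duplicate_rows': 1, 'missing_values': 1}
```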

## Exploratory Data Analysis