# Case Study - Bike Sharing system 

### Problem Statement: 
A bike-sharing system is a service in which bikes are made available for shared use to individuals on a short term basis for a price or free. Many bike share systems allow people to borrow a bike from a "dock" which is usually computer-controlled wherein the user enters the payment information, and the system unlocks it. This bike can then be returned to another dock belonging to the same system.


A US bike-sharing provider, BoomBikes, has recently suffered considerable dips in revenue due to the ongoing Covid-19 pandemic. BoomBikes wants to understand the demand for shared bikes once the nationwide quarantine ends. The company plans to use this understanding to cater to people's needs when the situation improves, stand out from other service providers and make large profits.

### Aim

The company wants to understand the factors affecting the demand, particularly:
* Which variables are significant in predicting the demand for shared bikes.
* How well those variables describe the bike demand.

#### The steps we will follow:
1. Reading, understanding and visualizing the data
2. Preparing the data for modelling
3. Training the model
4. Residual analysis
5. Prediction and evaluation on test set

### Step 1: Reading, understanding and visualizing the data


In [1]:
# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
# Load the dataset and preview the first rows
bike = pd.read_csv('day.csv')
bike.head()

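| | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 01-01-2018 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 14.110847 | 18.18125 | 80.5833 | 10.749882 | 331 | 654 | 985 |
| 1 | 2 | 02-01-2018 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 14.902598 | 17.68695 | 69.6087 | 16.652113 | 131 | 670 | 801 |
| 2 | 3 | 03-01-2018 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 8.050924 | 9.47025 | 43.7273 | 16.636703 | 120 | 1229 | 1349 |
| 3 | 4 | 04-01-2018 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 8.2 | 10.6061 | 59.0435 | 10.739832 | 108 | 1454 | 1562 |
| 4 | 5 | 05-01-2018 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 9.305237 | 11.4635 | 43.6957 | 12.5223 | 82 | 1518 | 1600 |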


In [3]:
bike.shape

(730, 16)

In [4]:
bike.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     730 non-null    int64  
 1   dteday      730 non-null    object 
 2   season      730 non-null    int64  
 3   yr          730 non-null    int64  
 4   mnth        730 non-null    int64  
 5   holiday     730 non-null    int64  
 6   weekday     730 non-null    int64  
 7   workingday  730 non-null    int64  
 8   weathersit  730 non-null    int64  
 9   temp        730 non-null    float64
 10  atemp       730 non-null    float64
 11  hum         730 non-null    float64
 12  windspeed   730 non-null    float64
 13  casual      730 non-null    int64  
 14  registered  730 non-null    int64  
 15  cnt         730 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.4+ KB


In [5]:
bike.describe()

| | instant | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 | 730.0 |
| mean | 365.5 | 2.49863 | 0.5 | 6.526027 | 0.028767 | 2.99726 | 0.683562 | 1.394521 | 20.319259 | 23.726322 | 62.765175 | 12.76362 | 849.249315 | 3658.757534 | 4508.006849 |
| std | 210.877136 | 1.110184 | 0.500343 | 3.450215 | 0.167266 | 2.006161 | 0.465405 | 0.544807 | 7.506729 | 8.150308 | 14.237589 | 5.195841 | 686.479875 | 1559.758728 | 1936.011647 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.424346 | 3.95348 | 0.0 | 1.500244 | 2.0 | 20.0 | 22.0 |
| 25% | 183.25 | 2.0 | 0.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 13.811885 | 16.889713 | 52.0 | 9.04165 | 316.25 | 2502.25 | 3169.75 |
| 50% | 365.5 | 3.0 | 0.5 | 7.0 | 0.0 | 3.0 | 1.0 | 1.0 | 20.465826 | 24.368225 | 62.625 | 12.125325 | 717.0 | 3664.5 | 4548.5 |
| 75% | 547.75 | 3.0 | 1.0 | 10.0 | 0.0 | 5.0 | 1.0 | 2.0 | 26.880615 | 30.445775 | 72.989575 | 15.625589 | 1096.5 | 4783.25 | 5966.0 |
| max | 730.0 | 4.0 | 1.0 | 12.0 | 1.0 | 6.0 | 1.0 | 3.0 | 35.328347 | 42.0448 | 97.25 | 34.000021 | 3410.0 | 6946.0 | 8714.0 |


##### Checking for null Columns/Rows

In [6]:
missing_col = bike.isnull().sum()
missing_col

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

In [7]:
missing_row =bike.isnull().sum(axis=1).sort_values(ascending=False)
missing_row

0      0
479    0
481    0
482    0
483    0
      ..
245    0
246    0
247    0
248    0
729    0
Length: 730, dtype: int64

No null Columns or Rows

##### Checking for Duplicates

In [8]:
# duplicated() flags repeated rows; sum() counts them (count() would only return the number of rows)
bike.duplicated(subset=None, keep='first').sum()

0

No duplicates
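
On a boolean mask, `Series.count()` returns the number of non-null entries, so it equals the row count whether or not duplicates exist, while `Series.sum()` counts the `True` flags. A minimal sketch of the difference, using a hypothetical toy frame (`toy` is not part of the dataset):

```python
import pandas as pd

# Hypothetical toy frame: the third row repeats the first.
toy = pd.DataFrame({"a": [1, 2, 1], "b": ["x", "y", "x"]})

mask = toy.duplicated(keep="first")   # boolean Series: True marks a repeated row
n_entries = int(mask.count())         # length of the mask, i.e. the number of rows
n_duplicates = int(mask.sum())        # number of rows flagged as duplicates

print(n_entries, n_duplicates)
```

Only the second number says anything about duplicates.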

Looking at the data dictionary, the variables below do not serve much purpose for modelling, so we will remove them:

* instant - records index values; all are unique.
* dteday - records the date; we already have the 'yr' and 'mnth' columns, so it is redundant.
* casual - contains the count of casual users.
* registered - contains the count of registered users.
    * cnt, our target variable, is the total of these two counts, so we remove both.

In [9]:
bike.columns

Index(['instant', 'dteday', 'season', 'yr', 'mnth', 'holiday', 'weekday',
       'workingday', 'weathersit', 'temp', 'atemp', 'hum', 'windspeed',
       'casual', 'registered', 'cnt'],
      dtype='object')

In [10]:
bike_new=bike.drop(['instant','dteday','casual','registered'],axis=1)
bike_new.columns

Index(['season', 'yr', 'mnth', 'holiday', 'weekday', 'workingday',
       'weathersit', 'temp', 'atemp', 'hum', 'windspeed', 'cnt'],
      dtype='object')

In [11]:
bike_new.shape

(730, 12)

In [12]:
bike_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 730 entries, 0 to 729
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   season      730 non-null    int64  
 1   yr          730 non-null    int64  
 2   mnth        730 non-null    int64  
 3   holiday     730 non-null    int64  
 4   weekday     730 non-null    int64  
 5   workingday  730 non-null    int64  
 6   weathersit  730 non-null    int64  
 7   temp        730 non-null    float64
 8   atemp       730 non-null    float64
 9   hum         730 non-null    float64
 10  windspeed   730 non-null    float64
 11  cnt         730 non-null    int64  
dtypes: float64(4), int64(8)
memory usage: 68.6 KB


### Let's visualize the dataset

In [None]:
sns.pairplot(bike_new)
plt.show()

In [None]:
# Numeric variables:
sns.pairplot(data=bike_new,vars=['cnt', 'temp', 'atemp', 'hum','windspeed'])
plt.show()

In [None]:
sns.heatmap(bike_new[['temp','atemp','hum','windspeed','cnt']].corr(), cmap='Blues', annot = True)
plt.show()

* Observation:
1. temp and atemp are positively correlated with cnt.
2. They have the highest correlation with the target variable cnt.
3. temp and atemp are highly correlated with each other, so we can remove one of them.
4. hum and windspeed values are more scattered.
5. They are negatively correlated with the target: cnt decreases as hum or windspeed increases.
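
The same reading of the heatmap can be done numerically. The sketch below uses synthetic stand-in data (the values, the `demo` frame and the 0.9 threshold are assumptions, not taken from `day.csv`) to rank predictors by their correlation with the target and to list predictor pairs that are nearly collinear:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the numeric columns (values are made up).
rng = np.random.default_rng(0)
temp = rng.uniform(0, 35, size=200)
demo = pd.DataFrame({
    "temp": temp,
    "atemp": temp * 1.1 + rng.normal(0, 0.5, size=200),   # nearly a copy of temp
    "hum": rng.uniform(20, 100, size=200),
})
demo["cnt"] = 100 * demo["temp"] - 10 * demo["hum"] + rng.normal(0, 200, size=200)

corr = demo.corr()

# Correlation of each predictor with the target, strongest (by magnitude) first.
target_corr = corr["cnt"].drop("cnt").sort_values(key=np.abs, ascending=False)

# Predictor pairs whose absolute correlation exceeds a chosen threshold.
threshold = 0.9   # assumed cut-off, adjust as needed
predictors = corr.drop(index="cnt", columns="cnt")
high_pairs = [
    (a, b, round(predictors.loc[a, b], 3))
    for i, a in enumerate(predictors.columns)
    for b in predictors.columns[i + 1:]
    if abs(predictors.loc[a, b]) > threshold
]

print(target_corr)
print(high_pairs)
```

On the real data, the equivalent call would be made on `bike_new[['temp','atemp','hum','windspeed','cnt']]`.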

In [None]:
# Categorical variables: 
plt.figure(figsize=(20, 12))
plt.subplot(3,3,1)
sns.boxplot(x = 'season', y = 'cnt', data = bike_new)
plt.subplot(3,3,2)
sns.boxplot(x = 'mnth', y = 'cnt', data = bike_new)
plt.subplot(3,3,3)
sns.boxplot(x = 'holiday', y = 'cnt', data = bike_new)
plt.subplot(3,3,4)
sns.boxplot(x = 'weathersit', y = 'cnt', data = bike_new)
plt.subplot(3,3,5)
sns.boxplot(x = 'yr', y = 'cnt', data = bike_new)
plt.subplot(3,3,6)
sns.boxplot(x = 'weekday', y = 'cnt', data = bike_new)
plt.subplot(3,3,7)
sns.boxplot(x = 'workingday', y = 'cnt', data = bike_new)
plt.show()


* Observation:
1. Demand is lowest in the spring season.
2. The number of bike shares gradually increases until September and then starts to decrease.
3. The cnt values are lower on holidays.
4. There are no records for the weathersit category Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog.
5. Bike sharing is highest in the Clear, Few clouds, Partly cloudy weathersit category.
6. The cnt values increased in 2019 compared to 2018.
7. weekday does not reveal much about the demand trend.
### Step 2: Preparing the data for modelling

#### Encoding
* Converting binary categorical variables to 1/0
* Converting other categorical variables to dummy variables

Let's create dummy variables for the categorical variables season, weathersit, mnth and weekday. First, convert them to the values mentioned in the data dictionary:
* season : season (1:spring, 2:summer, 3:fall, 4:winter)
* weathersit : 
		- 1: Clear, Few clouds, Partly cloudy, Partly cloudy
		- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
		- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
		- 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog
* mnth : month ( 1 to 12)
* weekday : day of the week
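
The encoding described above can be sketched on a tiny frame. The four-row `sample` below is hypothetical; only the season codes come from the data dictionary. With `drop_first=True`, pandas drops the first level in sorted order, which becomes the baseline that the remaining dummies are compared against:

```python
import pandas as pd

# Hypothetical four-row sample; the codes follow the data dictionary above.
sample = pd.DataFrame({"season": [1, 2, 3, 4], "cnt": [10, 20, 30, 40]})
sample["season"] = sample["season"].map({1: "spring", 2: "summer", 3: "fall", 4: "winter"})

# drop_first=True drops the alphabetically first level ('fall'),
# which then acts as the baseline category.
dummies = pd.get_dummies(sample["season"], prefix="season", drop_first=True)
encoded = pd.concat([sample.drop(columns="season"), dummies], axis=1)

print(encoded.columns.tolist())
```

A row with all three season dummies equal to 0 therefore represents fall.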

In [None]:
bike_new.season.value_counts()

In [None]:
bike_new.info()

In [None]:
#season:
bike_new['season'] = bike_new['season'].map({1: 'spring',2:'summer',3:'fall',4:'winter'}) 
bike_new.head()

In [None]:
bike_new.shape

In [None]:
#weathersit
bike_new.weathersit.value_counts()

In [None]:
bike_new['weathersit'] = bike_new['weathersit'].map({1:'Clear',2:'Mist',3:'Snow'})
bike_new.head()

In [None]:
#mnth
bike_new.mnth.value_counts()

In [None]:
bike_new['mnth'] = bike_new['mnth'].map({1: 'jan',2: 'feb',3: 'mar',4: 'apr',5: 'may',6: 'jun',7: 'jul',8: 'aug',9: 'sept',10: 'oct',11: 'nov',12: 'dec'})
bike_new.head()

In [None]:
#weekday
bike_new.weekday.value_counts()

In [None]:
bike_new['weekday'] = bike_new['weekday'].map({0: 'tue',1: 'wed',2: 'thur',3: 'fri',4: 'sat',5: 'sun',6: 'mon'})
bike_new.head()

In [None]:
bike_new.head()

In [None]:
bike_new.info()

In [None]:
# Create dummies for the above categories 
# Drop original variable for which the dummy was created

dummy = bike_new[['season','mnth','weekday','weathersit']]
dummy = pd.get_dummies(dummy,drop_first=True )

In [None]:
bike_new = pd.concat([bike_new,dummy],axis = 1)

In [None]:
bike_new.shape

In [None]:
bike_new.head()

In [None]:
bike_new.drop(['season', 'mnth', 'weekday','weathersit'], axis = 1, inplace = True)

In [None]:
bike_new.head()

In [None]:
bike_new.shape

In [None]:
bike_new.info()

#### Splitting into train-test and Rescaling the variables

In [None]:
#splitting into train-test
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


In [None]:
df_train, df_test =train_test_split(bike_new, train_size=0.7,random_state=100)
print(df_train.shape)
print(df_test.shape)

In [None]:
df_train.head()

In [None]:
scaler = MinMaxScaler()

# Scale only the continuous (non-binary) variables; the 0/1 and dummy columns are left as they are
num_vars = ['cnt','hum','windspeed','temp','atemp']

In [None]:
df_train[num_vars] = scaler.fit_transform(df_train[num_vars])
df_train.head()
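
The scaler is fitted on the training split only, and the same fitted scaler is reused on the test split later. A minimal sketch with made-up numbers (the arrays `train` and `test` are hypothetical) shows why test values can fall outside the 0 to 1 range:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data; the test set contains a value above the training maximum.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[15.0], [40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)   # learns min=10 and max=30 from train only
test_scaled = scaler.transform(test)         # reuses the training min and max

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Calling `fit_transform` on the test split instead would let information about the test set shape the scaling.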

In [None]:
df_train.describe()

In [None]:
# Let's check the correlation coefficients to see which variables are highly correlated

plt.figure(figsize = (16, 10))
sns.heatmap(df_train.corr(), annot = True, cmap="Greens")
plt.show()
# We will refer to this heatmap while building the linear model.

### Step 3: Training the model

In [None]:
#Building a linear model:

In [None]:
# lets create X_train and y_train
y_train = df_train.pop('cnt')
X_train = df_train

In [None]:
X_train.head()

statsmodels does not add an intercept by default, so we need to add a constant column explicitly with sm.add_constant(X) before fitting.
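
To see what the constant column does, here is a small sketch that uses NumPy's least-squares solver rather than statsmodels (a swapped-in tool, chosen to keep the example dependency-light). The data are made up and follow y = 2 + 3x exactly:

```python
import numpy as np

# Made-up data generated from y = 2 + 3x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x

# Without a constant column, the fitted line is forced through the origin.
X_no_const = x.reshape(-1, 1)
slope_only, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)

# With a leading column of ones (what add_constant supplies), the intercept is estimated too.
X_with_const = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(X_with_const, y, rcond=None)[0]

print(slope_only[0], intercept, slope)
```

The fit without the column of ones has to absorb the offset into a distorted slope, which is why the constant is added before every `sm.OLS` call in this notebook.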

In [None]:
y_train.head()

#### Variable Selection using RFE

In [None]:
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

In [None]:
lm = LinearRegression()
lm.fit(X_train, y_train)

rfe = RFE(lm, n_features_to_select=20)   # keep 20 variables; the argument must be passed by keyword in recent scikit-learn
rfe = rfe.fit(X_train, y_train)

In [None]:
list(zip(X_train.columns,rfe.support_,rfe.ranking_))

In [None]:
#Print Columns selected by RFE. We will start with these columns for manual elimination
col = X_train.columns[rfe.support_]
col

In [None]:
X_train.columns[~rfe.support_]
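
For intuition on how `support_` and `ranking_` should be read, here is a minimal RFE sketch on synthetic data, where only the first two of five features drive the target (the data and the choice of two features are assumptions for illustration):

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: only the first two of five features drive the target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# n_features_to_select is passed by keyword, as recent scikit-learn versions require.
selector = RFE(LinearRegression(), n_features_to_select=2)
selector.fit(X, y)

print(selector.support_)   # boolean mask of kept features
print(selector.ranking_)   # 1 = selected; larger numbers were eliminated earlier
```

RFE ranks features by coefficient size at each step, so it is sensitive to feature scale. That is one reason the numeric columns were scaled before this step.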

In [None]:
# Creating X_train dataframe with RFE selected variables
# Model 1
X_train_rfe = X_train[col]
X_train_rfe

In [None]:
# Adding a constant variable 
import statsmodels.api as sm  
X_train_rfe = sm.add_constant(X_train_rfe)

In [None]:
# Running the linear model
lm = sm.OLS(y_train,X_train_rfe).fit()   
print(lm.summary())

In [None]:
lm.params

In [None]:
import statsmodels.api as sm  
from statsmodels.stats.outliers_influence import variance_inflation_factor

In [None]:
# Calculate the VIFs for the new model
vif = pd.DataFrame()
X = X_train_rfe
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif
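
The VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on the other features. The helper below is a NumPy sketch of that definition, not the statsmodels implementation: `vif_table` is a hypothetical name, and it always adds an intercept to each auxiliary regression, so it should be given the predictors without the `const` column. Wrapping the calculation in a function also avoids repeating the same block for every model.

```python
import numpy as np
import pandas as pd

def vif_table(features: pd.DataFrame) -> pd.DataFrame:
    """Return VIF = 1 / (1 - R^2) per column, regressing it on the others plus an intercept."""
    rows = []
    for col in features.columns:
        target = features[col].to_numpy(dtype=float)
        others = features.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(features)), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r_squared = 1.0 - residual.var() / target.var()
        rows.append({"Features": col, "VIF": round(1.0 / (1.0 - r_squared), 2)})
    return pd.DataFrame(rows).sort_values("VIF", ascending=False, ignore_index=True)

# Synthetic check: x2 is almost a copy of x1, x3 is independent noise.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
demo = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=300),
    "x3": rng.normal(size=300),
})
vif_demo = vif_table(demo)
print(vif_demo)
```

The two nearly identical columns get large VIFs, while the independent column stays close to 1.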

In [None]:
# Model 2
# Removing 'atemp' as it has a high P value and high VIF
X_train_new = X_train_rfe.drop(['atemp'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 3
# Removing 'mnth_may' as it has a high P value
X_train_new = X_train_rfe.drop(['mnth_may'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 4
# Removing 'mnth_feb' as it has a high P value
X_train_new = X_train_rfe.drop(['mnth_feb'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 5
# Removing 'holiday' as it has a high P value
X_train_new = X_train_rfe.drop(['holiday'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 6
# P values are now below 0.05
# Removing 'season_spring' as it has a high VIF value
X_train_new = X_train_rfe.drop(['season_spring'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 7
# Removing 'mnth_nov' due to high P value
X_train_new = X_train_rfe.drop(['mnth_nov'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 8
# Removing 'mnth_dec' due to high P value
X_train_new = X_train_rfe.drop(['mnth_dec'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 9
# Removing 'mnth_jan': its p-value is not zero, and the visualization above showed that demand in this month is not high
X_train_new = X_train_rfe.drop(['mnth_jan'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 10
# Removing 'mnth_jul': its p-value is not zero, and the visualization above showed that demand in this month is not high
X_train_new = X_train_rfe.drop(['mnth_jul'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 11
# Dropping 'const' before recomputing the VIFs, as its VIF is high; the constant is added back below before the model is refit
X_train_new = X_train_rfe.drop(['const'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

In [None]:
# Model 12
# Removing 'hum' as it has high VIF
X_train_new = X_train_rfe.drop(['hum'], axis=1)

In [None]:
vif = pd.DataFrame()
X = X_train_new
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

In [None]:
X_train_rfe = sm.add_constant(X_train_new)
lm = sm.OLS(y_train,X_train_rfe).fit()   # Running the linear model
print(lm.summary())

* We can take this as our final model, as the p-values are all approximately zero and the VIFs are within the acceptable range.
* The F-statistic of 253.0 (well above 1), with a p-value of approximately 0, implies that the overall model is significant.

### Step 4: Residual Analysis

In [None]:
y_train_cnt = lm.predict(X_train_rfe)

##### Error terms are following a normal distribution: 

In [None]:
fig = plt.figure()
sns.histplot(y_train - y_train_cnt, bins = 20, kde = True)  # histplot replaces the deprecated distplot
fig.suptitle('Error Terms', fontsize = 20)                  # Plot heading 
plt.xlabel('Errors', fontsize = 18) 
plt.show()

##### VIF values are less than 5 which is good.

In [None]:
# Multi-collinearity check:
vif = pd.DataFrame()
X = X_train_rfe
vif['Features'] = X.columns
vif['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
vif['VIF'] = round(vif['VIF'], 2)
vif = vif.sort_values(by = "VIF", ascending = False)
vif

##### Homoscedasticity test: no visible pattern is observed in the residual plot below.

In [None]:
y_train_cnt = lm.predict(X_train_rfe)
res = y_train -y_train_cnt

p = sns.scatterplot(x=y_train_cnt, y=res)   # recent seaborn versions require x and y as keywords
plt.xlabel('predicted values')
plt.ylabel('residuals')
p = plt.plot([0,1],[0,0],color='blue')
p = plt.title('homoscedasticity check')

##### Independence of errors can be checked using the Durbin-Watson statistic:
* The Durbin-Watson value of the final model is 2.089. Values close to 2 indicate no first-order autocorrelation in the residuals.
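
The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4: values near 0 suggest positive autocorrelation and values near 4 suggest negative autocorrelation. A minimal NumPy sketch on made-up residual series (`durbin_watson_stat` is a hypothetical helper name):

```python
import numpy as np

def durbin_watson_stat(residuals) -> float:
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Made-up residual series for illustration.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]   # strong negative autocorrelation
constant_sign = [1.0, 1.0, 1.0, 1.0, 1.0]   # strong positive autocorrelation

dw_alternating = durbin_watson_stat(alternating)
dw_constant = durbin_watson_stat(constant_sign)
print(dw_alternating, dw_constant)
```

Applied to `y_train - y_train_cnt`, this formula should give the value that the OLS summary reports.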

### Step 5: Prediction and evaluation on test set

In [None]:
# Scale the test data with the scaler fitted on the training data (transform only, no refit)
num_vars = ['cnt','hum','windspeed','temp','atemp']
df_test[num_vars] = scaler.transform(df_test[num_vars])

In [None]:
df_test.describe()

In [None]:
df_test.shape

In [None]:
y_test = df_test.pop('cnt')
X_test = df_test

In [None]:
X_test = sm.add_constant(X_test)

In [None]:
# Keep only the columns used by the final model (the constant was already added to X_test above)
X_test_new = X_test[X_train_rfe.columns]

X_test_new.info()

In [None]:
y_pred = lm.predict(X_test_new)

In [None]:
from sklearn.metrics import r2_score
r2 =r2_score(y_test, y_pred)
r2

In [None]:
# Calculating Adjusted-R^2 value for the test dataset
n = X_test_new.shape[0]        # number of test observations
p = X_test_new.shape[1] - 1    # number of predictors, excluding the constant column
adjust_r2 = round(1-(1-r2)*(n-1)/(n-p-1),4)
adjust_r2
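
The adjustment penalizes R² for the number of predictors p, which counts the explanatory variables but not the intercept. A small sketch of the formula as a function, with hypothetical inputs that are not results from this notebook:

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical numbers, not results from this notebook.
example = adjusted_r2(r2=0.80, n_samples=101, n_predictors=10)
print(round(example, 4))
```

For a fixed R², adding predictors lowers the adjusted value, so it is the fairer figure to compare between the training and test sets.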

In [None]:
# Plotting y_test and y_pred to understand the spread.
fig = plt.figure()
plt.scatter(y_test,y_pred)
fig.suptitle('y_test vs y_pred', fontsize=20)              # Plot heading 
plt.xlabel('y_test', fontsize=18)                          # X-label
plt.ylabel('y_pred', fontsize=16)                          # Y-label


In [None]:
lm.summary()

### Final Equation
We can see that the equation of our best-fitted line (with cnt, temp and windspeed on the min-max scaled range) is:
* cnt = 0.0750 + (temp × 0.5499) + (yr × 0.2331) + (season_winter × 0.1307) + (mnth_sept × 0.0974) + (season_summer × 0.0886) + (weekday_mon × 0.0675) + (workingday × 0.0561) − (weathersit_Mist × 0.0800) − (windspeed × 0.1552) − (weathersit_Snow × 0.2871)
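
As an illustration of how the equation is used, the sketch below evaluates it for one hypothetical day. The coefficients are copied from the equation; the helper `predict_scaled_cnt` and the input values are assumptions. The result is on the scaled cnt range, since the target was min-max scaled before fitting.

```python
# Coefficients copied from the final equation above (target and numeric inputs are min-max scaled).
INTERCEPT = 0.0750
COEFS = {
    "temp": 0.5499, "yr": 0.2331, "season_winter": 0.1307, "mnth_sept": 0.0974,
    "season_summer": 0.0886, "weekday_mon": 0.0675, "workingday": 0.0561,
    "weathersit_Mist": -0.0800, "windspeed": -0.1552, "weathersit_Snow": -0.2871,
}

def predict_scaled_cnt(features: dict) -> float:
    """Evaluate the linear equation; features missing from the dict are treated as 0."""
    return INTERCEPT + sum(COEFS[name] * features.get(name, 0.0) for name in COEFS)

# Hypothetical day: second year, a working day in summer, mid-range temperature, light wind.
day = {"temp": 0.5, "yr": 1, "season_summer": 1, "workingday": 1, "windspeed": 0.2}
scaled_prediction = predict_scaled_cnt(day)
print(round(scaled_prediction, 4))
```

Converting this back to a rental count would need the minimum and maximum of cnt in the training split, which the fitted `scaler` holds.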

### Conclusion:

* The positive coefficients (temp, yr, season_winter, mnth_sept, season_summer, weekday_mon and workingday) indicate that an increase in these variables is associated with an increase in the target variable, cnt, with the other variables held constant.

* The negative coefficients (weathersit_Mist, windspeed and weathersit_Snow) indicate that an increase in these variables leads to a decrease in cnt.
* Temp has the largest coefficient, 0.549892, so it has the strongest effect on demand.
* Among the positive effects, it is followed by year and season.
* Bike sharing is higher during the month of September.
* Demand decreases when the weather situation is bad (Snow), which has the largest negative coefficient.

* Therefore, as per our final model, the top 3 variables that influence the demand are:
1. Temperature
2. Year
3. Season