
<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   1. Problem Statement

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

A bike-sharing system is a service in which bikes are made available to individuals for shared use on a short-term basis, either for a price or for free. Many bike-share systems let people borrow a bike from a computer-controlled "dock": the user enters payment information and the system unlocks a bike. The bike can then be returned to another dock belonging to the same system.


A US bike-sharing provider, <b>BoomBikes</b>, has recently suffered a considerable dip in revenue due to the ongoing COVID-19 pandemic. The company is finding it very difficult to sustain itself in the current market. It has therefore decided to draw up a business plan to accelerate its revenue as soon as the ongoing lockdown ends and the economy returns to a healthy state.


To that end, BoomBikes wants to understand the demand for shared bikes once the nationwide COVID-19 quarantine ends. The aim is to be ready to cater to people's needs when the situation improves, to stand out from other service providers, and to make huge profits.


They have contracted a consulting company to understand the factors on which the demand for these shared bikes depends. Specifically, they want to understand the factors affecting the demand for these shared bikes in the American market. The company wants to know:
 - Which variables are significant in predicting the demand for shared bikes.
 - How well those variables describe the bike demands.
 
Drawing on various meteorological surveys and people's styles, the service provider has gathered a large dataset on daily bike demand across the American market.


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   2. Business Goal

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

The goal is to model the demand for shared bikes using the available independent variables. Management will use the model to understand how demand varies with different features, so that they can adjust the business strategy to meet demand levels and customers' expectations. The model will also help management understand the demand dynamics of a new market.

<hr>
<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   3. Setting Up Jupyter Notebook

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Importing the warnings module and suppressing warnings

import warnings
warnings.filterwarnings("ignore")

In [None]:
# Disabling output scroll bar (using Javascript)

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   4. Reading and Understanding the Data

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Importing required libraries

import numpy as np
import pandas as pd

In [None]:
# Reading the data dictionary

from pathlib import Path
file = Path.cwd() / 'Readme.txt'
with open(file, 'r') as text:
    textfile = text.read()
    print(textfile)

In [None]:
# Reading the dataset

bike=pd.read_csv("day.csv")

In [None]:
# Inspecting the top 5 rows of the dataset

bike.head()

In [None]:
# Inspecting the rows,columns (shape) of the dataset

bike.shape

In [None]:
# Getting the information about NaN Values and Data Types of all the features in the dataset

bike.info()

In [None]:
# Inspecting the statistical summary of the dataset

bike.describe()

In [None]:
# Checking for the null values

bike.isnull().sum()

In [None]:
# Checking number of distinct elements in each feature

bike.nunique()

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

   ### Summary:
- The feature `dteday` is of object type while the rest are of int/float type.
- Some features are categorical in nature but are stored as int/float.
- There are no missing/null values in the dataset.

<p style="padding: 0.05px;
              color:white;">
</p>
</div>
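
The observation that some integer columns are really categorical can be checked with a distinct-value count. The following is a minimal sketch on a made-up frame; the column values and the threshold of 4 are assumptions chosen for the toy data, not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame; column names mirror the dataset but values are made up.
toy = pd.DataFrame({
    "season": [1, 2, 3, 4, 1, 2],
    "workingday": [0, 1, 1, 1, 0, 1],
    "temp": [9.1, 14.3, 27.8, 12.0, 8.4, 15.9],
})

# Treat a column as categorical when it has at most `max_levels` distinct values.
# The threshold is an assumption that suits this toy frame.
max_levels = 4
levels = toy.nunique()
categorical_like = levels[levels <= max_levels].index.tolist()
numeric_like = levels[levels > max_levels].index.tolist()

print(categorical_like)
print(numeric_like)
```

A count-based rule like this is only a heuristic: a genuinely numeric column with few distinct values would be misclassified, so the result is worth checking against the data dictionary.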

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   5. Cleaning the Data

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Removing Redundant/Unwanted/Irrelevant Columns

bike_clean = bike.drop(columns=['instant','dteday','casual','registered'])

In [None]:
# Segregating numerical and categorical type features

num_feat=[]
cat_feat=[]
for col in bike_clean.columns:
    if (bike_clean[col].nunique()>12):
        num_feat.append(col)
    else:
        cat_feat.append(col)
print(f'Numeric features: {num_feat}')
print()
print(f'Categorical features: {cat_feat}')

In [None]:
# Checking correlation among the numeric features

bike_clean[num_feat].corr()

In [None]:
# Removing atemp due to high correlation with temp

bike_clean = bike_clean.drop(columns=["atemp"])

In [None]:
# Mapping numeric values of features with actual values as per the data dictionary

bike_clean.season = bike_clean.season.map({1:'spring', 2:'summer', 3:'fall', 4:'winter'})
bike_clean.mnth = bike_clean.mnth.map({1:'jan',2:'feb',3:'mar',4:'apr',5:'may',6:'june',7:'july',8:'aug',9:'sep',10:'oct',11:'nov',12:'dec'})
bike_clean.weekday = bike_clean.weekday.map({0:'sun',1:'mon',2:'tue',3:'wed',4:'thu',5:'fri',6:'sat'})
bike_clean.weathersit = bike_clean.weathersit.map({1:'clear',2:'misty',3:'light_snowrain',4:'heavy_snowrain'})
bike_clean.head()

In [None]:
# Checking the information of cleaned dataset

bike_clean.info()

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary

The following variables were dropped because they were found to be either redundant, unwanted or irrelevant:
1. `instant`:    
    It is only an index value.  
2. `dteday`:              
    This feature can be removed as we already have separate columns for `yr` and `mnth`.    
3. `casual` & `registered`:
    Both columns contain the count of bikes booked by a specific category of customer. Since our objective is to model 
    the total count of bikes (`cnt`) and not the count per category, we ignore these two columns. Their sum is the target 
    `cnt`, so keeping them would also leak the target into the predictors.
4. `atemp`:
    It is highly correlated with `temp`, so only `temp` is kept, as both features capture more or less the same information.
    

Also, the features `season`, `mnth`, `weekday` and `weathersit` were mapped with their actual values for better understanding.
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
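
The decision to drop one of two highly correlated features can also be made programmatically. The sketch below uses synthetic data and an assumed cut-off of 0.9; both are illustrative choices, and the made-up `atemp` relation is not the dataset's actual one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temp = rng.uniform(5, 35, size=200)
toy = pd.DataFrame({
    "temp": temp,
    # "Feels-like" temperature: almost a linear function of temp (made-up relation).
    "atemp": 1.1 * temp + rng.normal(0, 0.5, size=200),
    "windspeed": rng.uniform(0, 30, size=200),
})

threshold = 0.9  # assumed cut-off for "highly correlated"
corr = toy.corr().abs()

# Keep only the upper triangle so each pair of features is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# A column is dropped when it is highly correlated with any earlier column.
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = toy.drop(columns=to_drop)

print(to_drop)
```

Because only the upper triangle is scanned, the later column of each correlated pair is the one removed, which matches the choice made here of keeping `temp` and dropping `atemp`.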

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

#   6. Exploratory Data Analysis

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Importing required libraries

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style("whitegrid")

In [None]:
# Segregating numerical and categorical type features

num_vars=[]
cat_vars=[]
for col in bike_clean.columns:
    if (bike_clean[col].nunique()>12):
        num_vars.append(col)
    else:
        cat_vars.append(col)
print(f'Numeric features: {num_vars}')
print()
print(f'Categorical features: {cat_vars}')

In [None]:
# Visualizing the correlation coefficients of numerical features using pair plot

sns.pairplot(bike_clean[num_vars])
plt.show()

In [None]:
# Visualizing categorical features using box plot and bar plot

for feature in cat_vars:
    # Share of total bookings (in %) contributed by each level of the feature
    df1=bike_clean[[feature,'cnt']].groupby([feature],as_index=False).sum()
    df1['cnt']=df1["cnt"]/(df1["cnt"].sum())*100
    df1.sort_values(by='cnt', ascending=False, inplace=True)
          
    fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14,6))
    
    # 1. Subplot 1: box plot of categorical column
    s=sns.boxplot(ax=ax1,x=feature,y="cnt",data=bike_clean)
    ax1.set_xlabel("")
    ax1.set_title(f"Feature : {feature} \n", fontdict={'fontsize' : 15, 'fontweight' : 12, 'color' : 'Black'})
    
    # 2. Subplot 2: cnt% within the categorical column
    s = sns.barplot(ax=ax2,x = feature, y='cnt', data=df1,order=df1[feature], palette='YlOrBr')
    ax2.set_xlabel("")
    ax2.bar_label(ax2.containers[0], fmt='\n%.2f', label_type='edge')
    ax2.set_title(f"Feature : {feature} \n", fontdict={'fontsize' : 15, 'fontweight' : 12, 'color' : 'Black'})
    ax2.set_ylabel("cnt %")
    plt.tight_layout()
    plt.show()
    print("-"*127)

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

<b>Insights from the pairplot of the numerical features:</b>

1. There is a roughly linear relationship between `temp` and `cnt`. 
2. `temp` and `atemp` are strongly correlated (see the correlation check in the cleaning step), so `atemp` was dropped to avoid multicollinearity. 

<b>Insights from the box-plot of the categorical features:</b>

1. `season`, `yr`, `mnth`, `weathersit` show trends for booking, thus seem to be a good predictor for the dependent variable
2. `weekday` shows close trend and may or may not be a good predictor for the dependent variable.
3. `workingday` does not show any trends for booking hence it may not be a good predictor for the dependent variable.
4. `holiday` shows trends for booking and can be a predictor for the dependent variable.
    
<b>Insights from the bar-plot of the categorical features:</b>    
1. `season`: The fall season attracted the most bookings.
2. `yr`: 2019 attracted more bookings than 2018, which shows good progress in terms of business.
3. `mnth`: Most of the bookings were made between May and October, with August recording the highest number.
4. `holiday`: Fewer bookings on holidays, as people may want to spend time at home with family.
5. `weekday`: The number of bookings looks similar across the weekdays.
6. `workingday`: More bookings on working days.
7. `weathersit`: Clear weather attracted more bookings, as expected.
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
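
One caveat when reading the bar plots: they show each level's share of total bookings, which also depends on how many days fall into that level. The sketch below, on made-up counts, contrasts the share of the total with the mean per day; the numbers are hypothetical and only illustrate the difference between the two views.

```python
import pandas as pd

# Made-up daily counts: 8 ordinary days and 2 holidays.
toy = pd.DataFrame({
    "holiday": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
    "cnt": [500, 520, 480, 510, 490, 505, 495, 500, 400, 420],
})

grouped = toy.groupby("holiday")["cnt"]

# Share of all bookings contributed by each level (what the bar plots show).
share_of_total = grouped.sum() / toy["cnt"].sum() * 100

# Typical demand on a single day of each level.
mean_per_day = grouped.mean()

summary = pd.DataFrame({"share_pct": share_of_total.round(2), "mean_per_day": mean_per_day})
print(summary)
```

In this toy data holidays contribute a small share mostly because there are few of them, while the per-day mean shows the actual drop in demand. The box plots in this section play the role of the per-day view.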


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 7. Preparing the Data for Modelling

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

### a. Encoding

In [None]:
# Selecting the columns fit for categorical type

dummy_vars=[]
for col in bike_clean.columns:
    if (bike_clean[col].nunique()<13 and bike_clean[col].nunique()>2):
        dummy_vars.append(col)
dummy_vars

In [None]:
# Converting the selected columns to categorical type

for col in dummy_vars:
    bike_clean[col]=bike_clean[col].astype('category')

In [None]:
# Validating changes

bike_clean.info()

In [None]:
# Creating dummy variables for the selected features

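# dtype=int gives 0/1 integer dummies (recent pandas versions default to bool), keeping the data numeric for statsmodels
bike_clean = pd.get_dummies(bike_clean, drop_first=True, dtype=int)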

In [None]:
# Checking information of the prepared dataset

bike_clean.info()

In [None]:
# Checking the shape of prepared data

bike_clean.shape

### b. Test-Train Splitting 

In [None]:
# Importing required libraries

import sklearn 
from sklearn.model_selection import train_test_split

In [None]:
# Splitting the data into train and test

np.random.seed(0)
df_train,df_test=train_test_split(bike_clean,train_size=0.7,random_state=100)
print(f'df_train shape: {df_train.shape}')
print(f'df_test shape: {df_test.shape}')

### c. Rescaling

In [None]:
# Importing required libraries

from sklearn.preprocessing import MinMaxScaler

In [None]:
# Instantiating an object

scaler=MinMaxScaler()

# Fitting and transforming the train-set

df_train[num_vars]=scaler.fit_transform(df_train[num_vars])

# Displaying statistical summary of train-set after rescaling

df_train[num_vars].describe()

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- Created dummy variables for categorical features `season`, `mnth`, `weekday` and `weathersit`
- Performed the test-train split
- Fitted and transformed the numerical features `temp`, `hum`, `windspeed` and `cnt` of the train data with Min-Max scaling
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
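
Min-Max scaling maps each value to `(x - min) / (max - min)`, with the minimum and maximum learned from the train data only. The following is a minimal NumPy sketch of that fit/transform split on made-up values; it illustrates the arithmetic and is not the scikit-learn implementation.

```python
import numpy as np

train = np.array([[10.0], [20.0], [30.0]])   # made-up training values
test = np.array([[5.0], [25.0], [40.0]])     # made-up test values

# "Fit": learn the minimum and the range from the training data only.
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "Transform": apply the same training statistics to both sets.
train_scaled = (train - col_min) / col_range
test_scaled = (test - col_min) / col_range

print(train_scaled.ravel())
print(test_scaled.ravel())
```

The train values land exactly in the range 0 to 1, while test values outside the training range fall below 0 or above 1. That is expected behaviour and the reason the test set is only transformed, never refitted.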


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 8. Training the Model

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

### a. Checking correlation coefficients

In [None]:
# Visualizing the correlation coefficients of all features using heatmap

plt.figure(figsize=(20,20))
sns.heatmap(df_train.corr(),annot=True,cmap='RdYlGn')
plt.tight_layout()
plt.show()

### b. Splitting into X_train and y_train

In [None]:
# Creating X_train and y_train

y_train=df_train.pop("cnt")
X_train=df_train

### c. Selecting features using RFE

In [None]:
# Importing RFE and LinearRegression

from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

In [None]:
# Running RFE to select the 15 top-ranked features

lm = LinearRegression()

# RFE fits the estimator internally, so no separate lm.fit() call is needed
rfe = RFE(lm, n_features_to_select=15)

rfe = rfe.fit(X_train, y_train)

In [None]:
# Investigating feature ranking

list(zip(X_train.columns,rfe.support_,rfe.ranking_))

In [None]:
# Checking top ranked features

col = X_train.columns[rfe.support_]
col

In [None]:
# Checking insignificant features

X_train.columns[~rfe.support_]

In [None]:
# Creating X_train_rfe dataframe with RFE selected variables

X_train_rfe = X_train[col]

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- It is evident from the heatmap that the target variable has correlation with some of the independent variables.
- The heatmap also suggests that some of the independent variables are correlated.
- The data is fit for building a linear regression model, hence proceeded with the X/y split and recursive feature elimination (RFE) to rank and select the features.
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
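
The idea behind recursive feature elimination can be sketched in a few lines: fit a model, drop the weakest feature, and repeat until the desired number remains. The hand-rolled version below uses NumPy least squares on synthetic data and ranks features by absolute coefficient; the function name `backward_eliminate` and the data are hypothetical, and this is a simplified illustration rather than scikit-learn's `RFE` itself.

```python
import numpy as np

def backward_eliminate(X, y, n_keep):
    """Repeatedly fit least squares and drop the feature with the smallest |coefficient|."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # Fit y ~ intercept + remaining features.
        design = np.column_stack([np.ones(len(y)), X[:, remaining]])
        coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
        # Skip the intercept when looking for the weakest feature.
        weakest = int(np.argmin(np.abs(coefs[1:])))
        remaining.pop(weakest)
    return remaining

rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 5))   # five made-up features on a 0-1 scale

# Only features 0 and 3 drive the target; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(0, 0.1, size=300)

selected = backward_eliminate(X, y, n_keep=2)
print(selected)
```

Ranking by coefficient size is only meaningful when the features share a comparable scale, which is one reason the numeric features are rescaled before this step.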


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 9. Modelling the Data

<p style="padding: 0.015px;
              color:white;">
</p>
</div>


In [None]:
# Importing required library

from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

In [None]:
# Defining VIF function

def vif(train_set):
    vif = pd.DataFrame()
    vif['Features'] = train_set.columns
    vif['VIF'] = [variance_inflation_factor(train_set.values, i) for i in range(train_set.shape[1])]
    vif['VIF'] = round(vif['VIF'], 2)
    vif = vif.sort_values(by = "VIF", ascending = False)
    return vif

In [None]:
# Defining modelling function

def modelling(train_set,lm_name,lr_name):
    # Adding a constant
    globals()[lm_name]=sm.add_constant(train_set)
    
    # Creating a fitted model
    globals()[lr_name]=sm.OLS(y_train, globals()[lm_name]).fit()

    
    # Printing OLS Regression Results
    print((globals()[lr_name]).summary())

##  Model 1

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm1","lr1")

## Model 2

In [None]:
# Removing the variable 'hum' based on its high VIF

X_train_rfe = X_train_rfe.drop(["hum"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm2","lr2")

## Model 3

In [None]:
# Removing the variable 'mnth_nov' based on its high p-value

X_train_rfe = X_train_rfe.drop(["mnth_nov"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm3","lr3")

## Model 4

In [None]:
# Removing the variable 'mnth_dec' based on its high p-value

X_train_rfe= X_train_rfe.drop(["mnth_dec"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm4","lr4")

## Model 5

In [None]:
# Removing the variable 'mnth_jan' based on its high p-value

X_train_rfe = X_train_rfe.drop(["mnth_jan"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm5","lr5")

## Model 6

In [None]:
# Removing the variable 'mnth_july' based on its high VIF

X_train_rfe = X_train_rfe.drop(["mnth_july"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm6","lr6")

## Model 7

In [None]:
# Removing the variable 'season_spring' based on its high p-value

X_train_rfe = X_train_rfe.drop(["season_spring"], axis = 1)

In [None]:
# VIF check

vif(X_train_rfe)

In [None]:
# Adding constant, creating a fitted model and printing the OLS Regression Results

modelling(X_train_rfe,"X_train_lm7","lr7")

In [None]:
# Checking the parameters and their coefficient values of final model (lr7)

lr7.params

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- Manually eliminated independent variables on the basis of high p-value and high VIF
- Considered Model 7 (lr7) as the final model, since its key statistics (R-squared, Adjusted R-squared and F-statistic) are high, with no insignificant variables and no multicollinear (high-VIF) variables. 
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
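
The VIF of a feature is `1 / (1 - R^2)`, where `R^2` comes from regressing that feature on all the others. The sketch below computes it with NumPy on synthetic data. The helper name `vif_scores` is hypothetical, and including an intercept in each auxiliary regression is an assumption of this sketch that may differ from how the `vif` helper above is called.

```python
import numpy as np

def vif_scores(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    n, p = X.shape
    scores = []
    for j in range(p):
        target = X[:, j]
        # Auxiliary regression: column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        scores.append(1.0 / (1.0 - r_squared))
    return np.array(scores)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)   # nearly a copy of `a`

scores = vif_scores(np.column_stack([a, b, c]))
print(scores.round(1))
```

The two near-duplicate columns receive very large scores while the independent column stays close to 1, which is the pattern that motivates dropping one high-VIF variable at a time and re-checking.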


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 10. Residual Analysis and Validation of Assumptions

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Making predictions on the train set

y_train_pred=lr7.predict(X_train_lm7)

# Finding residuals

res=y_train-y_train_pred

### a. Linear Relationship

In [None]:
# Plotting CCPR plot to check linear relationship between the outcome and the independent variables

fig = plt.figure(figsize=(12, 20))
sm.graphics.plot_ccpr_grid(lr7, fig=fig)
plt.show()

### b. Multivariate Normality

In [None]:
# Plotting the histogram of the error terms

fig = plt.figure()
sns.histplot(res, bins=20, kde=True, stat="density")   # histplot replaces the deprecated distplot
fig.suptitle('Error Terms', fontsize = 20)                  # Plot heading 
plt.xlabel('Errors', fontsize = 18)                         # X-label
plt.show()

### c. No multicollinearity

In [None]:
# Validating no multicollinearity using heatmap

plt.figure(figsize=(12,6))
sns.heatmap(X_train_rfe.corr(),annot = True, cmap="coolwarm")
plt.show()

### d. Homoscedasticity

In [None]:
# Homoscedasticity check using scatter plot

sns.regplot(x=y_train_pred, y=res)   # recent seaborn versions require x and y as keyword arguments
plt.xlabel('Predicted count')
plt.ylabel('Residual')
plt.show()

### e. No Autocorrelation

In [None]:
# Checking Durbin Watson for Autocorrelation

from statsmodels.stats.stattools import durbin_watson
round(durbin_watson(lr7.resid),3)

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- The CCPR plot supports a linear relationship between the independent and the dependent variables.
- The histogram indicates that the errors between observed and predicted values (i.e., the residuals of the regression) are approximately normally distributed.
- The heatmap shows that all pairwise correlation magnitudes are below 0.80, which, together with the absence of high-VIF variables in the final model, indicates no serious multicollinearity.
- The regplot of residuals versus predicted values shows no clear pattern in the distribution, which is consistent with homoscedasticity.
- The Durbin-Watson value of the final model lr7 is 2.097, which is close to 2 and signifies that there is no autocorrelation.
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
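
The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 indicate no first-order autocorrelation, while values near 0 indicate strong positive autocorrelation. The sketch below computes it by hand on synthetic series; the function name `durbin_watson_stat` and both series are made up for illustration.

```python
import numpy as np

def durbin_watson_stat(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(7)
independent = rng.normal(size=2000)   # no autocorrelation
trending = np.cumsum(independent)     # random walk: strongly autocorrelated

dw_independent = durbin_watson_stat(independent)
dw_trending = durbin_watson_stat(trending)

print(round(dw_independent, 3), round(dw_trending, 3))
```

The independent series gives a value close to 2, in line with the value reported for lr7, while the random walk gives a value close to 0.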


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 11. Prediction on Test Set

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

### a. Applying Scaling on Test Set

In [None]:
# Transforming numeric features using min-max scaler

df_test[num_vars]=scaler.transform(df_test[num_vars])
df_test.describe()

### b. Splitting into X_test and y_test

In [None]:
# X_test and y_test split

y_test = df_test.pop('cnt')
X_test = df_test

### c. Making Predictions

In [None]:
# Now let's use our model to make predictions.

# Creating X_test_new dataframe by dropping variables from X_test
X_test_new = X_test[X_train_rfe.columns]

# Adding a constant variable 
X_test_new = sm.add_constant(X_test_new)

# Making predictions
y_test_pred = lr7.predict(X_test_new)

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- Transformed the numeric features of the test data with the Min-Max scaler fitted on the train data.
- Split the test data into X_test and y_test and dropped features from X_test to align with X_train.
- Made predictions on the test set using the final model (lr7).
<p style="padding: 0.05px;
              color:white;">
</p>
</div>


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 12. Model Evaluation

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

In [None]:
# Plotting Actual vs Predicted to understand the spread.

fig, ax = plt.subplots()
ax.scatter(y_test, y_test_pred)
ax.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=4)
ax.set_title("Actual Vs Predicted",fontsize=25)             # Plot heading 
ax.set_xlabel('Actual',fontsize=12)                         # X-label
ax.set_ylabel('Predicted',fontsize=12)                      # Y-label
plt.show()

In [None]:
# Evaluating on R2_Score

from sklearn.metrics import r2_score

print(f'R2-Score of Train Set: {r2_score(y_true=y_train,y_pred=y_train_pred)}')
print(f'R2-Score of Test Set: {r2_score(y_true=y_test,y_pred=y_test_pred)}')                                       

In [None]:
# Evaluating on Adjusted_R2

# Both sets use the predictor count X_train_rfe.shape[1]; X_test_new also holds the constant column, which is not a predictor
print(f'Adjusted R2 of Train Set: {1-(1-r2_score(y_true=y_train,y_pred=y_train_pred))*(X_train_rfe.shape[0]-1)/(X_train_rfe.shape[0]-X_train_rfe.shape[1]-1)}')
print(f'Adjusted R2 of Test Set: {1-(1-r2_score(y_true=y_test,y_pred=y_test_pred))*(X_test_new.shape[0]-1)/(X_test_new.shape[0]-X_train_rfe.shape[1]-1)}')

In [None]:
# Plotting actual vs predicted values for the train set

c = [i for i in range(0,len(X_train_rfe),1)]
plt.figure(figsize=(14,5))
plt.plot(c,y_train, color="blue",linewidth=2.0)
plt.plot(c,y_train_pred, color="red",linewidth=2.0)
plt.suptitle('Actual vs Predicted - Train Set', fontsize = 20)  # Plot title 
plt.xlabel('Index', fontsize=18)                                 # X-label
plt.ylabel('Counts', fontsize=16)                                # Y-label
plt.show()

In [None]:
# Plotting actual vs predicted values for the test set

c = [i for i in range(0,len(X_test_new),1)]
plt.figure(figsize=(14,5))
plt.plot(c,y_test, color="blue",linewidth=2.0)
plt.plot(c,y_test_pred, color="red",linewidth=2.0)
plt.suptitle('Actual vs Predicted - Test Set', fontsize = 20)  # Plot title 
plt.xlabel('Index', fontsize=18)                                 # X-label
plt.ylabel('Counts', fontsize=16)                                # Y-label
plt.show()

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### Summary:

- The variance of the residuals (error terms) is roughly constant across predictions, i.e., the error term does not vary much as the predicted value changes.
- The R2 scores of train and test data are 0.83 and 0.79 respectively.
- The adjusted R2 scores of train and test data are 0.83 and 0.78 respectively; the small gap between train and test suggests that the model generalizes reasonably well.
- Predictions on train and test data are close to the actual values.
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
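
Adjusted R2 penalizes R2 for the number of predictors: `1 - (1 - R2) * (n - 1) / (n - p - 1)`, where `n` is the number of samples and `p` the number of predictors excluding the constant. The helper below is a minimal sketch; the function name `adjusted_r2` is hypothetical, the R2 of 0.79 and the nine predictors come from this notebook, and the sample size of 220 is an assumed value for illustration.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1); p excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# R^2 of 0.79 is the reported test score; the sample size of 220 is hypothetical.
example = adjusted_r2(0.79, n_samples=220, n_predictors=9)
print(round(example, 3))
```

With a couple of hundred samples and nine predictors the penalty is small, which is why the adjusted scores stay close to the plain R2 scores.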


<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#1663BE;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.015px;
              color:black;">  

# 13. Conclusion

<p style="padding: 0.015px;
              color:white;">
</p>
</div>

<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### 1. Equation of best fit line:  

`cnt` = 0.125926 + 0.232861 x `yr` - 0.098685 x `holiday` + 0.548008 x `temp` - 0.153246 x `windspeed` + 0.088080 x `season_summer` + 0.129345 x `season_winter` + 0.101195 x `mnth_sep` - 0.282869 x `weathersit_light_snowrain` - 0.078375 x `weathersit_misty`
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
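
The fitted equation can be evaluated directly as a small function. The sketch below copies the coefficients from the equation above; the function name `predict_scaled_cnt` and the example day are hypothetical. Both the inputs `temp` and `windspeed` and the output `cnt` are on the Min-Max scaled 0 to 1 range used during training.

```python
# Coefficients copied from the equation of the best fit line.
INTERCEPT = 0.125926
COEFFICIENTS = {
    "yr": 0.232861,
    "holiday": -0.098685,
    "temp": 0.548008,
    "windspeed": -0.153246,
    "season_summer": 0.088080,
    "season_winter": 0.129345,
    "mnth_sep": 0.101195,
    "weathersit_light_snowrain": -0.282869,
    "weathersit_misty": -0.078375,
}

def predict_scaled_cnt(features):
    """Return the scaled `cnt` for a dict of feature values; missing features count as 0."""
    return INTERCEPT + sum(coef * features.get(name, 0.0) for name, coef in COEFFICIENTS.items())

# Hypothetical day: second year, scaled temperature 0.5, everything else at its baseline of 0.
scaled = predict_scaled_cnt({"yr": 1, "temp": 0.5})
print(round(scaled, 6))
```

To turn the result back into a number of bikes, the scaling would have to be inverted with the minimum and maximum of `cnt` from the train data, which are not listed in this notebook.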


<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

### 2. Variables significant in predicting the demand for shared bikes:  
    
   
`yr`, `holiday`, `temp`, `windspeed`, `season_summer`, `season_winter`, `mnth_sep`, `weathersit_light_snowrain`, `weathersit_misty`    
<p style="padding: 0.05px;
              color:white;">
</p>
</div>


<div style="color:maroon;
           display:fill;
           border-radius:5px;
           background-color:#FFE5B4;
           font-size:110%;
           font-family:Calibri;
           letter-spacing:0.5px">

<p style="padding: 0.02px;
              color:white;">

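### 3. How well do the significant variables describe the bike demand?  

- Together, these variables explain about 83% of the variance in demand on the train set and about 79% on the test set (the R2 scores reported in the model evaluation).
- `yr`, `temp`, `season_summer`, `season_winter`, `mnth_sep`: Positively impact the bike demand.
- `holiday`, `windspeed`, `weathersit_light_snowrain`, `weathersit_misty`: Negatively impact the bike demand.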
<p style="padding: 0.05px;
              color:white;">
</p>
</div>
