<a href="https://colab.research.google.com/github/ItzmeAkash/Bike-Sharing-Demand-Prediction-ML-Regression./blob/main/Bike_Sharing_Demand_Prediction_ML_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**  -  Bike Sharing Demand Prediction.



##### **Project Type** - Regression
##### **Contribution** - Individual
##### **Name** -  Akash Ps

# **Project Summary -**

Bike rental companies often encounter challenges in accurately predicting bike demand, crucial for optimizing inventory and pricing strategies. This project focuses on developing a supervised machine learning regression model to forecast bike demand within a specific time frame.

The dataset comes from a bike sharing company and includes rental details such as bike count and rental timestamps, weather and seasonal factors, and other relevant variables such as holidays and operational status.

Following data preprocessing and partitioning into training and test sets, the training data was used to train multiple machine learning models, exploring various architectures and hyperparameter configurations to identify the most effective model based on performance metrics.

Performance evaluation of the model was conducted using metrics like mean absolute error, root mean squared error, and R-squared, demonstrating high predictive accuracy with an R-squared value of 0.88 and a mean absolute error of 2.58 on the test dataset.
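For reference, the three metrics named above can be computed directly from their definitions. The sketch below uses NumPy on made-up numbers; `regression_metrics`, `y_true` and `y_pred` are illustrative names and values, not results from this project.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R-squared) for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))                    # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))             # root mean squared error
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Illustrative values only (hypothetical, not taken from the project)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f}, RMSE={rmse:.3f}, R2={r2:.3f}")
```

MAE and RMSE are expressed in the units of the target, so they are only comparable between models trained on the same target scale.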

Furthermore, ablation studies were conducted to analyze the impact of individual features on model performance, revealing temperature, weather, and seasonality as significant factors influencing bike demand.

# **GitHub Link -**

https://github.com/ItzmeAkash/Bike-Sharing-Demand-Prediction-ML-Regression.

# **Problem Statement**


Cities are introducing rental bikes to make it easier to get around. It's important to have enough bikes available when people need them to avoid waiting. Predicting how many bikes are needed each hour is crucial to keep a steady supply.

My goal is to build a model that's really accurate and can tell us what factors affect how many bikes people want. This will help bike rental companies make better decisions about how to run their service.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

# Data handling and visualization

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


# Datetime library
from datetime import datetime
import datetime as dt

import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
data_path = '/content/drive/MyDrive/Self Projects/AlmaBetter Capstone Projects/Ml Regression/SeoulBikeData.csv'

In [None]:
# Load Dataset
bikeDf = pd.read_csv(data_path, encoding='latin')

### Dataset First View

In [None]:
# First 5 rows
bikeDf.head()


In [None]:
# Last 5 rows
bikeDf.tail()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(bikeDf.shape)

In [None]:
# Printing the Number of row and columns
rows_count, columns_count = bikeDf.shape
print("Number of Rows:", rows_count)
print("Number of Columns:", columns_count)

In [None]:
# Getting all the Columns
bikeDf.columns

In [None]:
# Print each column name on its own line
column_names = bikeDf.columns.tolist()
for column in column_names:
    print(column)

### Dataset Information

In [None]:
# Dataset Info
bikeDf.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicates = bikeDf.duplicated()  # True for every repeat of an earlier row
duplicate_count = duplicates.sum()
# Number of distinct rows among those repeats
unique_duplicates = len(bikeDf[duplicates].drop_duplicates())

print(f"Number of duplicate rows: {duplicate_count}, unique duplicates: {unique_duplicates}")


#### Check Unique Values for Each Variable

In [None]:
# Check Unique Values for Each variables
for column in bikeDf.columns.tolist():
    unique_values_count = bikeDf[column].nunique()
    print(f"No of Unique values in {column} is {unique_values_count}")


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values_count = bikeDf.isnull().sum()
print(missing_values_count)

In [None]:
# Total Missing Values
total_missing_values = bikeDf.isnull().sum().sum()
print("Total missing values:", total_missing_values)


In [None]:
# Visualizing the missing values


# Create a heatmap of missing values
plt.figure(figsize=(10, 6))
sns.heatmap(bikeDf.isnull(), cmap='viridis', cbar=False)
plt.title('Missing Values Heatmap')
plt.show()


* **As the counts and the heatmap above show, there are no missing values.**

### What did you know about your dataset?



* The dataset consists of 8,760 rows and 14 columns.

* It contains one record for each of the 8,760 hours in a year (365 days × 24 hours).

* There are no null values.

* There are no duplicate rows, so no deduplication is needed and repeated records cannot skew later summaries.
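The "one record per hour" claim can be checked rather than assumed. The sketch below is a minimal example on a toy frame (`toy` is a hypothetical stand-in with two made-up days); the same two checks could be run on `bikeDf` before the `Date` column is dropped.

```python
import pandas as pd

# Toy stand-in for bikeDf: two full days of hourly records (hypothetical data)
toy = pd.DataFrame({
    "Date": ["01/12/2017"] * 24 + ["02/12/2017"] * 24,
    "Hour": list(range(24)) * 2,
})

# Every date should appear with exactly 24 distinct hours
hours_per_day = toy.groupby("Date")["Hour"].nunique()
all_days_complete = bool((hours_per_day == 24).all())

# No (Date, Hour) pair should appear more than once
has_repeated_hours = bool(toy.duplicated(subset=["Date", "Hour"]).any())

print(len(toy), all_days_complete, has_repeated_hours)
```

With 365 complete days, the expected row count is 365 × 24 = 8,760, which matches the shape reported above.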




## ***2. Understanding Your Variables***

In [None]:
# Print the list of features in the DataFrame
features_list = bikeDf.columns.tolist()
print(f"Features: {features_list}")

In [None]:
# Dataset Describe
bikeDf.describe()

### Variables Description

**Date** : The date of each record, covering 365 days from 01/12/2017 to 30/11/2018

**Rented Bike Count** : Number of bikes rented per hour; this is our dependent variable

**Hour** : The hour of the day

**Temperature(°C)** : Temperature in degrees Celsius

**Humidity(%)** : Humidity of the air, in percent

**Wind speed (m/s)** : Speed of the wind in m/s

**Visibility (10m)** : Visibility, in units of 10 m

**Dew point temperature(°C)** : Dew point, the temperature at which the air becomes saturated with water vapour, in degrees Celsius

**Solar Radiation (MJ/m2)** : Measurement of the electromagnetic energy emitted by the Sun, including visible light, ultraviolet light and infrared radiation, in MJ/m2

**Rainfall(mm)** : Amount of rainfall in mm

**Snowfall (cm)** : Amount of snowfall in cm

**Seasons** : Season of the year

**Holiday** : Whether the day is a holiday or not

**Functioning Day** : Whether the day is a functioning day or not



## 3. ***Data Wrangling***

### Data Wrangling Code

#### Changing column name

In [None]:
# Rename columns to simple, code-friendly names
bikeDf = bikeDf.rename(columns={'Rented Bike Count':'Rented_Bike_Count',
                                'Temperature(°C)':'Temperature',
                                'Humidity(%)':'Humidity',
                                'Wind speed (m/s)':'Wind_speed',
                                'Visibility (10m)':'Visibility',
                                'Dew point temperature(°C)':'Dew_point_temperature',
                                'Solar Radiation (MJ/m2)':'Solar_Radiation',
                                'Rainfall(mm)':'Rainfall',
                                'Snowfall (cm)':'Snowfall',
                                'Functioning Day':'Functioning_Day'

})

In [None]:
# new feature names
bikeDf.columns.tolist()

In [None]:
bikeDf.head()

**Convert the Date column to datetime, then derive Year, Month and Day columns**

In [None]:
# Convert the Date column from object (string) to datetime format

bikeDf['Date'] = pd.to_datetime(bikeDf['Date'], format="%d/%m/%Y")

In [None]:
bikeDf.head()

In [None]:
bikeDf.info()

In [None]:
bikeDf['Year'] = bikeDf['Date'].dt.year
bikeDf['Month'] = bikeDf['Date'].dt.month
bikeDf['Day'] =  bikeDf['Date'].dt.day_name()

In [None]:
bikeDf.tail()

In [None]:
# Create a "weekdays_weekend" flag (1 = Saturday/Sunday, 0 = otherwise), then drop "Date", "Day" and "Year"
bikeDf['weekdays_weekend'] = bikeDf['Day'].isin(['Saturday', 'Sunday']).astype(int)
bikeDf = bikeDf.drop(columns=['Date', 'Day', 'Year'])

In [None]:
bikeDf.head()

In [None]:
# Count the weekend and weekdays values
bikeDf['weekdays_weekend'].value_counts()

**0 = weekday**

**1 = weekend**
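The same 0/1 flag can also be derived from the numeric day of the week instead of the day name. This is a minimal sketch on four made-up dates, not the notebook's own code; `dt.dayofweek` counts Monday as 0, so the values 5 and 6 correspond to Saturday and Sunday.

```python
import pandas as pd

# Four consecutive dates in the dataset's day/month/year format (illustrative only)
dates = pd.to_datetime(
    pd.Series(["01/12/2017", "02/12/2017", "03/12/2017", "04/12/2017"]),
    format="%d/%m/%Y",
)

# Monday = 0 ... Sunday = 6, so values >= 5 are weekend days
is_weekend = (dates.dt.dayofweek >= 5).astype(int)
print(is_weekend.tolist())
```

Comparing integers avoids depending on the spelling or locale of the day names.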

#### Changing data type

* **The "Hour", "Month" and "weekdays_weekend" columns are stored as integers, but they actually represent categories, so we convert them to the category data type.**

In [None]:
cols=['Hour','Month','weekdays_weekend']
for col in cols:
  bikeDf[col]=bikeDf[col].astype('category')

In [None]:
bikeDf

In [None]:
# Let's check the data types
bikeDf.info()

In [None]:
# List final Columns
bikeDf.columns

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

### **Analysis of categorical variables**

* **Our dependent variable is "Rented Bike Count," so we need to analyze this column along with the other columns using visualization plots. First, we'll analyze the columns with categorical data types, and then we'll proceed with those containing numerical data types.**

* **Month**

In [None]:
fig, ax = plt.subplots(figsize=(12, 7))
sns.barplot(data=bikeDf, x='Month', y='Rented_Bike_Count', ax=ax, capsize=0.2, palette='bright')
ax.set(title='Count of Rented bikes according to Month')
plt.show()

**Observations** : The bar plot shows that bike rentals are highest in the warmer months, from May to October.

* **Weekdays_weekend**

In [None]:
fig, ax = plt.subplots(figsize=(12, 7))
sns.barplot(data=bikeDf, x='weekdays_weekend', y='Rented_Bike_Count', ax=ax, capsize=0.2, palette='bright')
ax.set(title='Count of Rented bikes according to weekdays_weekend')
plt.show()

**Observations** : In the bar plot above, average bike rentals are slightly higher on weekdays (0) than on weekends (1); the difference is small.

In [None]:
fig, ax = plt.subplots(figsize=(12, 7))
sns.pointplot(data=bikeDf, x='Hour', y='Rented_Bike_Count', hue='weekdays_weekend',ax=ax)
ax.set(title='Count of Rented bikes by Hour, split by weekdays_weekend')
plt.show()

**Observations** : Based on the point plot and bar plot above, the blue line (weekdays) shows higher bike demand, likely due to commuting to and from work, with peaks between 7 am and 9 am and between 5 pm and 7 pm. The orange line (weekends) shows lower demand, especially in the morning hours, with a slight increase in the evening from 4 pm to 8 pm.
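The peak hours read off the point plot can also be extracted numerically. The sketch below is a minimal example on hypothetical hourly profiles, shaped only to mimic the pattern described above; the column names follow the notebook, while `toy`, `hourly_mean` and the counts are made up.

```python
import pandas as pd

hours = list(range(24))
# Hypothetical hourly profiles: weekday peaks at 8 and 18, weekend peak at 17
weekday_counts = [100 + (900 if h in (8, 18) else 0) for h in hours]
weekend_counts = [80 + (400 if h == 17 else 0) for h in hours]

toy = pd.DataFrame({
    "Hour": hours * 2,
    "weekdays_weekend": [0] * 24 + [1] * 24,
    "Rented_Bike_Count": weekday_counts + weekend_counts,
})

# Mean count per hour, with one column per weekdays_weekend value
hourly_mean = (
    toy.groupby(["weekdays_weekend", "Hour"])["Rented_Bike_Count"]
       .mean()
       .unstack("weekdays_weekend")
)

top_weekday_hours = sorted(hourly_mean[0].nlargest(2).index.tolist())
top_weekend_hour = int(hourly_mean[1].idxmax())
print(top_weekday_hours, top_weekend_hour)
```

On `bikeDf`, where `Hour` and `weekdays_weekend` are categorical, passing `observed=True` to `groupby` keeps only the category combinations that actually occur.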

* **Hour**

In [None]:
fig,ax=plt.subplots(figsize=(12,7))
sns.barplot(data=bikeDf,x='Hour',y='Rented_Bike_Count',ax=ax,capsize=.2, palette='pastel')
ax.set(title='Count of Rented bikes according to Hour')

**Observation**
* The plot above displays the average hourly usage of rented bikes across all months of the year.

* Usage peaks during commuting hours, primarily from 7 am to 9 am and from 5 pm to 7 pm.

**Functioning Day**

In [None]:
fig,ax=plt.subplots(figsize=(8,6))
sns.barplot(data=bikeDf,x='Functioning_Day',y='Rented_Bike_Count',ax=ax,capsize=.2)
ax.set(title='Count of Rented bikes according to Functioning Day')

In [None]:
fig,ax=plt.subplots(figsize=(12,7))
sns.pointplot(data=bikeDf,x='Hour',y='Rented_Bike_Count',hue='Functioning_Day',ax=ax)
ax.set(title='Count of Rented bikes by Hour, split by Functioning Day')

**Observations**

* The bar plot and point plot above compare rented bike usage on functioning days and non-functioning days.

* People do not rent bikes on non-functioning days.

**Seasons**

In [None]:
fig,ax=plt.subplots(figsize=(12,6))
sns.barplot(data=bikeDf,x='Seasons',y='Rented_Bike_Count',ax=ax,capsize=.2,palette='dark')
ax.set(title='Count of Rented bikes according to Seasons')

In [None]:
fig,ax=plt.subplots(figsize=(12,6))
sns.pointplot(data=bikeDf,x='Hour',y='Rented_Bike_Count',hue='Seasons',ax=ax)
ax.set(title='Count of Rented bikes by Hour, split by Seasons')

* The bar plot and point plot above show rented bike usage across the four seasons.
* In summer, usage is highest, with peak times at 7 am to 9 am and 5 pm to 7 pm.
* In winter, usage is very low, possibly because of snowfall, fog and cold.

**Holiday**

In [None]:
fig,ax=plt.subplots(figsize=(8,6))
sns.barplot(data=bikeDf,x='Holiday',y='Rented_Bike_Count',ax=ax,capsize=.2,palette='pastel')
ax.set(title='Count of Rented bikes according to Holiday')

In [None]:
fig,ax=plt.subplots(figsize=(12,6))
sns.pointplot(data=bikeDf,x='Hour',y='Rented_Bike_Count',hue='Holiday',ax=ax)
ax.set(title='Count of Rented bikes by Hour, split by Holiday')

* The bar plot and point plot above show rented bike usage on holidays and non-holidays.

* On holidays, people mostly use rented bikes from 2 pm to 8 pm.

### **Analysis of numerical variables**

In [None]:
numericalFeatures= bikeDf.select_dtypes(exclude=['object','category'])
numericalFeatures

In [None]:
# Listing all the numerical features
numericalFeatures.columns.tolist()

In [None]:
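# Analysing the distribution of all numerical features

plt.figure(figsize=(15, 10))
for n, col in enumerate(numericalFeatures.columns, start=1):
  plt.subplot(3, 3, n)
  # histplot with kde=True replaces the deprecated sns.distplot
  sns.histplot(bikeDf[col], kde=True, stat='density')
  plt.title(col)
plt.tight_layout()
plt.show()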


**Right skewed columns are**

Rented Bike Count (also our dependent variable), Wind speed (m/s), Solar Radiation (MJ/m2), Rainfall(mm), Snowfall (cm)

**Left skewed columns are**


Visibility (10m), Dew point temperature(°C)
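Skew direction can be confirmed numerically as well as visually. The sketch below is a minimal example on synthetic columns (the column names and distributions are made up for illustration); `DataFrame.skew()` returns a positive value for a right-skewed column and a negative value for a left-skewed one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic columns with known skew direction (hypothetical data)
toy = pd.DataFrame({
    "right_skewed": rng.exponential(scale=1.0, size=5000),
    "left_skewed": -rng.exponential(scale=1.0, size=5000),
    "symmetric": rng.normal(size=5000),
})

# Sample skewness per column: > 0 right-skewed, < 0 left-skewed, near 0 symmetric
skewness = toy.skew()
print(skewness.round(2))
```

Applied to the notebook's data, the equivalent call would be `numericalFeatures.skew()`.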

**Find out the relation of numerical features with our dependent variable**


**1. Numerical vs Rented Bike Count**

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Temperature"
# Select the target column before averaging so non-numeric columns are not aggregated
bikeDf.groupby('Temperature')['Rented_Bike_Count'].mean().plot()

**Observation**

The plot shows that people prefer riding bikes when it's around 25°C, suggesting they enjoy warmer weather.

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Dew_point_temperature"

bikeDf.groupby('Dew_point_temperature')['Rented_Bike_Count'].mean().plot()

**Observations**

The plot for 'Dew_point_temperature' looks almost the same as the one for 'Temperature', which suggests the two features carry similar information; we check this in the next step.
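One simple way to quantify that similarity is the Pearson correlation between the two columns. The sketch below is a minimal example on synthetic data in which the dew point is assumed to track temperature with some noise; the numbers are made up and only the column names come from the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

temperature = rng.uniform(-15, 35, size=1000)
# Hypothetical relationship: dew point follows temperature, offset and noisy
dew_point = temperature - 8 + rng.normal(scale=3, size=1000)

toy = pd.DataFrame({
    "Temperature": temperature,
    "Dew_point_temperature": dew_point,
})

# Pearson correlation coefficient between the two columns
r = toy["Temperature"].corr(toy["Dew_point_temperature"])
print(round(r, 3))
```

A coefficient close to 1 would indicate that the two features are largely redundant for a linear model, which is a common reason to drop one of them.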

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Solar_Radiation"

bikeDf.groupby('Solar_Radiation')['Rented_Bike_Count'].mean().plot()

**Observations**


The plot indicates that when solar radiation is present, the average number of rented bikes is high, usually around 1000.

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Snowfall"

bikeDf.groupby('Snowfall')['Rented_Bike_Count'].mean().plot()

**Observations**


The plot shows that average bike rentals (y-axis) are low whenever there is snowfall, and they drop sharply once snowfall exceeds 4 cm.

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Rainfall"

bikeDf.groupby('Rainfall')['Rented_Bike_Count'].mean().plot()

**Observations**

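From the plot above, the average demand for rented bikes does not fall steadily as rainfall increases; for instance, there is a pronounced peak at 20 mm of rain. Each point is the mean over the hours that share one exact rainfall value, so peaks at rare, high rainfall values may rest on very few observations and should be read with caution.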
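To judge how much weight a point on such a grouped-mean plot deserves, the group size can be reported next to the mean. The sketch below is a minimal example on made-up values, where a single hour at 20 mm of rain produces an apparent peak.

```python
import pandas as pd

# Hypothetical records: many dry hours, a few light-rain hours, one heavy-rain hour
toy = pd.DataFrame({
    "Rainfall": [0.0] * 6 + [0.5] * 3 + [20.0],
    "Rented_Bike_Count": [700, 650, 800, 720, 690, 760, 150, 120, 90, 600],
})

# Mean count and number of observations for each distinct rainfall value
summary = toy.groupby("Rainfall")["Rented_Bike_Count"].agg(["mean", "count"])
print(summary)
```

In this toy data the mean at 20 mm comes from one row, so it says little about typical demand in heavy rain. The same `agg(["mean", "count"])` call applies to the other grouped plots in this section.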

In [None]:
# Analyze the relationship between "Rented_Bike_Count" and "Wind_speed"

bikeDf.groupby('Wind_speed')['Rented_Bike_Count'].mean().plot()

**Observations**

From the plot above, we observe that the average demand for rented bikes stays roughly constant across wind speeds. At a wind speed of 7 m/s there is an increase in bike demand, which may suggest that people enjoy biking when it's a bit windy, although a mean at such an extreme value may rest on few observations.

**REGRESSION PLOT**

Seaborn's regression plots are used to highlight patterns in a dataset during data analysis. Each plot draws a scatter of two variables and fits a regression line through it, showing their linear relationship visually.

In [None]:
# Regression plot for all the numerical features

for col in numericalFeatures.columns:
  fig, ax = plt.subplots(figsize=(8, 4))
  sns.regplot(x=bikeDf[col], y=bikeDf['Rented_Bike_Count'], scatter_kws={"color": 'lightgreen'}, line_kws={"color": "black"}, ax=ax)



**Observations**

* Looking at the regression plots above for all numerical features, the columns 'Temperature', 'Wind_speed', 'Visibility', 'Dew_point_temperature' and 'Solar_Radiation' are positively related to the target variable, which means the rented bike count increases as these features increase.


* 'Rainfall', 'Snowfall' and 'Humidity' are negatively related to the target variable, which means the rented bike count decreases as these features increase.
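The direction of each relationship corresponds to the sign of the slope of a straight-line fit. The sketch below is a minimal example that fits a degree-1 polynomial with `np.polyfit` on synthetic data; the two feature/target pairs are made up to stand in for a positively and a negatively related feature.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical positively related feature (a stand-in for something like Temperature)
x_pos = rng.uniform(0, 30, size=500)
y_pos = 20 * x_pos + rng.normal(scale=50, size=500)

# Hypothetical negatively related feature (a stand-in for something like Humidity)
x_neg = rng.uniform(0, 100, size=500)
y_neg = 900 - 5 * x_neg + rng.normal(scale=50, size=500)

# np.polyfit returns coefficients from highest degree down: [slope, intercept]
slope_pos, intercept_pos = np.polyfit(x_pos, y_pos, deg=1)
slope_neg, intercept_neg = np.polyfit(x_neg, y_neg, deg=1)

print(f"positive feature slope: {slope_pos:.2f}")
print(f"negative feature slope: {slope_neg:.2f}")
```

A positive slope means the target tends to rise with the feature and a negative slope means it tends to fall. The slope's magnitude depends on the feature's units, so it should not be compared across features without scaling.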



**Normalise Rented_Bike_Count column data**

Here, normalising means transforming the target variable so that its distribution is closer to a normal (bell-shaped) distribution. It is a pre-processing step: a strongly skewed target can make linear models fit poorly, so we first inspect the distribution and then apply a transformation that reduces the skew.

In [None]:
#Distribution plot of Rented Bike Count
plt.figure(figsize=(10,6))
# histplot with kde=True replaces the deprecated sns.distplot
ax=sns.histplot(bikeDf['Rented_Bike_Count'], kde=True, stat='density', color="y")
ax.set(xlabel='Rented_Bike_Count', ylabel='Density')
# Dashed lines mark the mean (magenta) and the median (black)
ax.axvline(bikeDf['Rented_Bike_Count'].mean(), color='magenta', linestyle='dashed', linewidth=2)
ax.axvline(bikeDf['Rented_Bike_Count'].median(), color='black', linestyle='dashed', linewidth=2)
plt.show()

**Observations**

The graph above shows that Rented Bike Count is moderately right-skewed. Strictly speaking, linear regression assumes normally distributed residuals rather than a normally distributed dependent variable, but a strongly skewed target often leads to skewed residuals, so we apply a transformation to bring its distribution closer to normal.

**Finding and treating outliers**

In [None]:
# Boxplot for Rented bike Count to check outliers
plt.figure(figsize=(10,6))

sns.boxplot(x=bikeDf['Rented_Bike_Count'])
plt.xlabel('Rented_Bike_Count')  # the values run along the x-axis
plt.show()

In [None]:
# Treat outliers by capping values at specified thresholds
bikeDf['Rainfall'] = bikeDf['Rainfall'].clip(upper=4)
bikeDf['Solar_Radiation'] = bikeDf['Solar_Radiation'].clip(upper=2.5)
bikeDf['Snowfall'] = bikeDf['Snowfall'].clip(upper=2)
bikeDf['Wind_speed'] = bikeDf['Wind_speed'].clip(upper=4)


We treated outliers by capping: any value above the chosen threshold for a feature is replaced with that threshold.
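The effect of capping is easiest to see on a small series. The sketch below is a minimal example with made-up rainfall values. It first applies a fixed cap of 4, as in the cell above, and then shows one common data-driven alternative, the upper fence Q3 + 1.5 × IQR. That alternative is an assumption added for illustration, not the rule used to pick the thresholds in this notebook.

```python
import pandas as pd

# Hypothetical rainfall values with two extreme readings
rainfall = pd.Series([0.0, 0.0, 0.5, 1.0, 2.0, 3.5, 9.0, 35.0])

# Fixed cap: every value above 4 becomes 4, all others are unchanged
capped = rainfall.clip(upper=4)
print(capped.tolist())

# Alternative (assumption): derive the cap from the interquartile range
q1, q3 = rainfall.quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
capped_iqr = rainfall.clip(upper=upper_fence)
print(upper_fence, capped_iqr.max())
```

Capping keeps every row in the dataset, which matters here because each row is one hour of the year.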

In [None]:
#Applying square root to Rented Bike Count to improve skewness
plt.figure(figsize=(8,6))
sqrt_count = np.sqrt(bikeDf['Rented_Bike_Count'])

# histplot with kde=True replaces the deprecated sns.distplot
ax=sns.histplot(sqrt_count, kde=True, stat='density', color="y")
ax.set(xlabel='Square root of Rented Bike Count', ylabel='Density')
ax.axvline(sqrt_count.mean(), color='magenta', linestyle='dashed', linewidth=2)
ax.axvline(sqrt_count.median(), color='black', linestyle='dashed', linewidth=2)

plt.show()

A common rule of thumb for a moderately right-skewed variable is to apply a square root transformation. After applying the square root to Rented Bike Count, we get an almost normal distribution.
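The reduction in skew can be measured, and the transformation has a consequence for later steps: a model trained on the square-root scale predicts on that scale, so its predictions must be squared to get back to bike counts. The sketch below is a minimal example on synthetic right-skewed data (gamma-distributed values chosen only for illustration, not the real target).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical right-skewed counts standing in for Rented_Bike_Count
counts = pd.Series(rng.gamma(shape=1.5, scale=400.0, size=5000))

sqrt_counts = np.sqrt(counts)
skew_before = counts.skew()
skew_after = sqrt_counts.skew()
print(f"skew before: {skew_before:.2f}, after sqrt: {skew_after:.2f}")

# Inverse transform: squaring returns values to the original scale
recovered = sqrt_counts ** 2
```

Error metrics computed on the square-root scale are not in units of bikes, so it helps to state which scale a reported MAE or RMSE refers to.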

In [None]:

# After applying sqrt to Rented Bike Count, check whether we still have outliers
plt.figure(figsize=(10,6))

sns.boxplot(x=np.sqrt(bikeDf['Rented_Bike_Count']))
plt.xlabel('Square root of Rented_Bike_Count')
plt.show()


**Observations**

After applying the square root to the Rented Bike Count column, no outliers remain.

In [None]:
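# Restrict to numeric columns; recent pandas versions raise an error on text columns otherwise
bikeDf.corr(numeric_only=True)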

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words and digits contain digits.

In [None]:
# Remove URLs & Remove words and digits contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***