<a href="https://colab.research.google.com/github/Ash99-commits/Unsupervised_ML_Zomato-Restaurant_Clustering_And_Sentiment_Analysis/blob/main/Unsupervised_ML_Zomato_Restaurant_Clustering_And_Sentiment_Analysis_Ashwani_Kumar_Patra.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Unsupervised ML - Zomato Restaurant Clustering & Sentiment Analysis



##### **Project Type**    - Unsupervised
##### **Contribution**    - Individual
##### **Name**   - Ashwani Kumar Patra


# **Project Summary -**

**Project Aim** : Restaurant Clustering & Review Sentiment Analysis for Zomato

**Key Objectives** :-

1.   Cluster Restaurants: Group restaurants into meaningful clusters based on their attributes.
2.   Analyze Customer Sentiments: Extract insights from user reviews to classify sentiments as positive or negative, revealing customer satisfaction trends and preferences.

**Business Context** :-

Zomato, a leading restaurant aggregator and food delivery platform in India, has witnessed massive growth in restaurant listings and customer engagement over the years.
With the surge in the number and variety of restaurants, deriving actionable insights becomes vital for enhancing user experience and maintaining competitive advantage.

This project aims to provide data-driven recommendations that help both customers and Zomato improve decision-making and strategic planning.

**Data Overview** :-

*   Restaurant Information: Dataset comprising restaurant details such as name, cuisine, ratings, and more.
*   Customer Reviews: Textual feedback shared by diners, containing reviews and ratings.

**Approach & Methodology** :-

Data Preparation :-
1. Performing thorough data cleaning: Handling missing values, standardizing formats, and normalizing data.
2. Performing a complete exploratory data analysis (EDA).
3. Structuring the data for analytical tasks.
4. Performing hypothesis testing.

Restaurant Clustering :-
1. Applying unsupervised learning techniques to group similar restaurants:
    * K-Means Clustering
    * Hierarchical Clustering
    * DBSCAN (Density-Based Spatial Clustering)
2. Determining the optimal number of clusters using the Elbow Method and the Silhouette Score.
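
The cluster-count selection described above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual feature matrix: the `make_blobs` data, the range of `k`, and all variable names are assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the restaurant feature matrix
# (assumption: three well-separated groups).
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

inertias = {}     # within-cluster sum of squares, the quantity plotted in an elbow chart
silhouettes = {}  # mean silhouette coefficient, higher is better

for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, labels)

# Pick the k with the highest mean silhouette coefficient.
best_k = max(silhouettes, key=silhouettes.get)
print("Inertia by k:", inertias)
print("Silhouette by k:", silhouettes)
print("k with the highest silhouette score:", best_k)
```

The elbow is read off the inertia values by eye (the point where the decrease flattens), while the silhouette score gives a single number per `k` that can be compared directly.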

Sentiment Analysis :-
1. Using Natural Language Processing (NLP) methods to analyze customer reviews.
2. Classifying review sentiments as Positive or Negative using sentiment scoring techniques or pre-trained models.
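
One possible way to set up the positive/negative classification is sketched below. The toy reviews, the 3.5 rating threshold used to derive labels, and the TF-IDF plus logistic regression pipeline are all assumptions for illustration; the notebook may use a different labelling rule or model.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviews (invented for illustration; not rows from the Zomato dataset).
toy_reviews = pd.DataFrame({
    "Review": [
        "great food and friendly staff",
        "amazing biryani loved the ambience",
        "excellent service tasty starters",
        "wonderful place great desserts",
        "terrible service cold food",
        "awful experience rude waiter",
        "bad taste stale bread",
        "horrible delay poor packaging",
    ],
    "Rating": [5, 4.5, 4, 5, 1, 1.5, 2, 1],
})

# Assumption: ratings of 3.5 and above count as positive (1), the rest as negative (0).
toy_reviews["Sentiment"] = (toy_reviews["Rating"] >= 3.5).astype(int)

# TF-IDF features feeding a logistic regression classifier.
sentiment_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
sentiment_model.fit(toy_reviews["Review"], toy_reviews["Sentiment"])

new_reviews = ["great tasty food", "rude waiter and cold food"]
predictions = sentiment_model.predict(new_reviews)
print(dict(zip(new_reviews, predictions)))
```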

**Deliverables & Insights** :-
1. Visual Outputs :
    * 16 intuitive visualizations presenting restaurant clusters and sentiment distribution across various dimensions (e.g., cost distribution, cuisine type, etc.).

2. Customer-Focused Insights :
    * Recommendations to help customers discover restaurants matching their taste, price range, and quality expectations.

3. Strategic Insights for Zomato :
    * Identification of top-performing cuisines and areas needing service improvement.
    * Understanding of customer pain points and preferences to drive targeted marketing campaigns.
    * Analysis of reviewer influence to identify key critics or opinion leaders.

**Business Impact** :-
1. For Customers:
    * Simplified decision-making process when selecting restaurants.
    * Clear overview of positive and negative aspects of establishments.

2. For Zomato:
    * Actionable insights to enhance service delivery.
    * Better resource allocation based on customer feedback trends.
    * Improved customer segmentation for personalized promotions.

**Future Applications** :-

1. Personalized Recommendations: Leveraging clustering and sentiment data to build customized restaurant suggestions.
2. Market Trends Analysis: Monitoring shifts in customer preferences, popular cuisines, and emerging dining habits.
3. Service Quality Improvement: Pinpointing service-related bottlenecks from negative sentiment analysis.
4. Influencer Tracking: Identifying key reviewers and understanding their impact on customer decisions.

# **GitHub Link -**

[Github Repository Link](https://github.com/Ash99-commits/Unsupervised_ML_Zomato-Restaurant_Clustering_And_Sentiment_Analysis)

# **Problem Statement**


**The rapid growth of restaurants in India has created a need for deeper insights into customer preferences and restaurant performance. While customers seek the best dining and delivery options in their locality, Zomato as a company must identify strengths and areas for improvement. This project aims to analyze customer reviews through sentiment analysis and cluster restaurants into meaningful segments, enabling better decision-making for both customers and the company. The insights will help customers discover the best options while guiding Zomato to enhance its services and competitiveness.**

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure that the following format is answered for each algorithm.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing Libraries

!pip install contractions

import contractions  # It Expands words like "can't" → "cannot" for cleaner text
import pandas as pd  # For Data handling and analysis (tables, CSVs, etc.)
import numpy as np  # For Numerical operations
import matplotlib.pyplot as plt  # For Static plots/graphs
import seaborn as sns  # For Prettier statistical plots
from wordcloud import WordCloud  # To Visualize most frequent words in reviews
from scipy import stats, sparse as sp  # stats = statistical tests, sparse = memory-efficient text matrices
import statsmodels.api as sm  # For Advanced statistical analysis (regression, tests)
from sklearn.preprocessing import OneHotEncoder  # To Convert categorical data (e.g., cuisines) into numeric format
from sklearn.feature_extraction.text import TfidfVectorizer  # To Convert reviews into numeric word importance
from sklearn.cluster import KMeans, DBSCAN  # Algorithms to cluster/group restaurants
from sklearn.metrics import silhouette_score, classification_report, accuracy_score, confusion_matrix  # To evaluate clustering & sentiment models
from sklearn.decomposition import PCA  # To reduce features for easier visualization (2D/3D)
import plotly.express as px  # For Quick interactive plots
import plotly.graph_objects as go  # For Detailed/custom interactive plots
import joblib  # To save and load trained ML models
from joblib import dump, load  # Direct access to joblib's save (dump) and load helpers
import nltk  # NLP toolkit for cleaning and processing reviews
from nltk.corpus import stopwords  # To remove common useless words like "is", "the"
from nltk.tokenize import word_tokenize  # To split reviews into words
from nltk.stem import WordNetLemmatizer  # To reduce words to their base form ("loved" → "love")
import string  # To handle punctuation removal
import re  # To clean text with regex (remove URLs, numbers, etc.)
from scipy.cluster.hierarchy import dendrogram, linkage  # For Hierarchical clustering visualization (tree of clusters)
from sklearn.model_selection import GridSearchCV, train_test_split  # To tune model parameters & split data for training/testing
from sklearn.linear_model import LogisticRegression  # Classification model for sentiment analysis
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier  # For Advanced tree-based classifiers for sentiment
import xgboost as xgb  # Fast and powerful gradient boosting for classification
from collections import Counter  # To count most common words, cuisines, etc.

# NLTK (Natural Language ToolKit) Downloads
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

### Dataset Loading

In [None]:
# Loading the Datasets via Github Repository

# Loading the 'Zomato Restaurant names and Metadata' Dataset
restaurants_raw_df = pd.read_csv('https://raw.githubusercontent.com/Ash99-commits/Unsupervised_ML_Zomato-Restaurant_Clustering_And_Sentiment_Analysis/main/Zomato%20Restaurant%20names%20and%20Metadata.csv')

# Loading the 'Zomato Restaurant reviews' Dataset
reviews_raw_df = pd.read_csv('https://raw.githubusercontent.com/Ash99-commits/Unsupervised_ML_Zomato-Restaurant_Clustering_And_Sentiment_Analysis/main/Zomato%20Restaurant%20reviews.csv')

### Dataset First View

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset First Look

restaurants_raw_df.head()

In [None]:
# 'Zomato Restaurant reviews' Dataset First Look

reviews_raw_df.head()

### Dataset Rows & Columns count

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Rows & Columns count
restaurants_raw_df.shape

restaurants_raw_df Dataset: 105 rows and 6 columns.

In [None]:
# 'Zomato Restaurant reviews' Dataset Rows & Columns count
reviews_raw_df.shape

reviews_raw_df Dataset: 10,000 rows and 7 columns.

### Dataset Information

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Info

restaurants_raw_df.info()

restaurants_raw_df Dataset :-
* All columns are of object type.
* The 'Collections' column has a significant number of missing values.
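
Because every column is read as `object`, a numeric field such as 'Cost' needs an explicit conversion before it can be analysed. A minimal sketch, assuming the cost strings use a comma as a thousands separator (for example "1,200"); the toy values below are invented:

```python
import pandas as pd

# Toy frame (invented values) mimicking a 'Cost' column stored as text.
toy_restaurants = pd.DataFrame({"Name": ["A", "B", "C"], "Cost": ["800", "1,200", "500"]})

# Strip thousands separators, then convert; unparseable values become NaN instead of raising.
toy_restaurants["Cost"] = pd.to_numeric(
    toy_restaurants["Cost"].str.replace(",", "", regex=False),
    errors="coerce",
)
print(toy_restaurants.dtypes)
print(toy_restaurants["Cost"].tolist())
```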

In [None]:
# 'Zomato Restaurant reviews' Dataset Info

reviews_raw_df.info()

reviews_raw_df Dataset :-
* Most columns are of object type, except for the 'Pictures' column, which is of integer type.
* There are also missing values in the 'Reviewer', 'Review', 'Rating', 'Metadata' and 'Time' columns.

#### Duplicate Values

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Duplicate Value Count

len(restaurants_raw_df[restaurants_raw_df.duplicated()])

restaurants_raw_df Dataset has no duplicate rows.

In [None]:
# 'Zomato Restaurant reviews' Dataset Duplicate Value Count

len(reviews_raw_df[reviews_raw_df.duplicated()])

reviews_raw_df Dataset has 36 duplicate rows, which need to be handled during data cleaning.
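
A minimal sketch of how duplicate rows could be counted and removed with pandas. The toy frame is invented, and keeping the first occurrence of each row is an assumption about the desired treatment:

```python
import pandas as pd

# Toy frame (invented): the second row repeats the first.
toy_reviews = pd.DataFrame({
    "Restaurant": ["X", "X", "Y", "Y"],
    "Review": ["good", "good", "bad", None],
})

# duplicated() flags every repeat of an earlier row; sum() counts the flags.
n_duplicates = int(toy_reviews.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduplicated = toy_reviews.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduplicated))
```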

#### Missing Values/Null Values

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Missing/Null Values Count

restaurants_raw_df.isna().sum()

We observe 54 missing/null values in 'Collections' column and 1 in 'Timings' column.

In [None]:
# 'Zomato Restaurant reviews' Dataset Missing/Null Values Count

reviews_raw_df.isna().sum()

There are 38 missing/null values each in the columns 'Reviewer', 'Rating', 'Metadata' and 'Time'. There are 45 missing/null values in 'Review' column.

In [None]:
# Visualizing the missing values

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
sns.heatmap(restaurants_raw_df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values in Restaurants Dataset')

plt.subplot(1, 2, 2)
sns.heatmap(reviews_raw_df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values in Reviews Dataset')

plt.tight_layout()
plt.show()

print("\nNull values in reviews_raw_df:")
print(reviews_raw_df.isnull().sum())

Observations :-
1. restaurants_raw_df has missing values in the 'Collections' and 'Timings' columns.
2. reviews_raw_df has missing values in the 'Reviewer', 'Review', 'Rating', 'Metadata', and 'Time' columns.
3. In restaurants_raw_df, the missing values are scattered across rows rather than appearing in large blocks. This suggests the missingness is not tied to other variables in any way that is visible in this plot.
4. In reviews_raw_df, there is a clear horizontal band of missing values towards the bottom of the heatmap, which indicates that the missing values in 'Reviewer', 'Review', 'Rating', 'Metadata', and 'Time' are concentrated in the same rows. These rows appear to carry no review information at all, possibly due to issues with data collection or processing for those entries.
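
The heatmap only suggests that the missing review fields fall in the same rows; a quick check can confirm it. The sketch below runs on an invented toy frame with the same column names, so the counts it prints are not the real dataset's:

```python
import numpy as np
import pandas as pd

review_cols = ["Reviewer", "Review", "Rating", "Metadata", "Time"]

# Toy frame (invented): the last two rows carry only the restaurant name.
toy_reviews = pd.DataFrame({
    "Restaurant": ["X", "X", "Y", "Y"],
    "Reviewer": ["a", "b", np.nan, np.nan],
    "Review": ["good", np.nan, np.nan, np.nan],
    "Rating": ["5", "4", np.nan, np.nan],
    "Metadata": ["1 Review", "2 Reviews", np.nan, np.nan],
    "Time": ["5/25/2019 15:54", "5/24/2019 22:11", np.nan, np.nan],
})

# True where every review-related field is missing at once.
fully_empty = toy_reviews[review_cols].isna().all(axis=1)
# True where at least one, but not all, of those fields is missing.
partially_empty = toy_reviews[review_cols].isna().any(axis=1) & ~fully_empty

print("Rows missing all review fields:", int(fully_empty.sum()))
print("Rows missing only some fields:", int(partially_empty.sum()))

# Dropping the fully empty rows loses no review information.
cleaned = toy_reviews.loc[~fully_empty].reset_index(drop=True)
```

Separating fully empty rows from partially empty ones matters because the former can simply be dropped, while the latter (for example a rating without review text) may still be useful.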

### What did you know about your dataset?

The restaurant dataset offers key details such as names, URLs, costs, cuisines, timings, and category tags. Notably, the ‘Collections’ field has significant missing entries, which may limit analyses that depend on this attribute.

The reviews dataset captures rich customer feedback, including reviewer information, ratings, review text, timestamps, and images. Its large volume makes it highly suitable for sentiment analysis and customer insights, though missing data in crucial fields like ‘Reviewer’ and ‘Review’ must be carefully addressed to ensure reliability.

## ***2. Understanding Your Variables***

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Columns

restaurants_raw_df.columns.tolist()

In [None]:
# 'Zomato Restaurant reviews' Dataset Columns

reviews_raw_df.columns.tolist()

In [None]:
# 'Zomato Restaurant names and Metadata' Dataset Describe

restaurants_raw_df.describe(include='all')

**Restaurant Dataset :-**

1. Each restaurant has a unique name and link.
2. The 'Cost' variable shows a range of values, with 500 appearing most frequently.
3. 'Collections' and 'Cuisines' show a wide variety of categories and types.
4. Most common timing is "11 AM to 11 PM".

In [None]:
# 'Zomato Restaurant reviews' Dataset Describe

reviews_raw_df.describe(include='all')

**Reviews Dataset:**

1. Of the 105 restaurants in the Restaurants Dataset, only 100 have reviews in the Reviews Dataset. For example, 'Beyond Flavours' appears 100 times in the Reviews Dataset.
2. The dataset features a large number of unique reviewers and reviews.
3. The most common rating is 5.
4. A significant portion of the reviews does not include pictures.
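
The gap between listed and reviewed restaurants can be identified with a set difference on the name columns. A minimal sketch on invented toy data, assuming the restaurant names match exactly between the two datasets ('Name' in one, 'Restaurant' in the other):

```python
import pandas as pd

# Toy frames (invented) standing in for the two datasets.
toy_restaurants = pd.DataFrame({"Name": ["A", "B", "C"]})
toy_reviews = pd.DataFrame({"Restaurant": ["A", "A", "B"]})

# Restaurants listed in the metadata but absent from the reviews.
unreviewed = sorted(set(toy_restaurants["Name"]) - set(toy_reviews["Restaurant"]))

# Number of reviews per restaurant, most-reviewed first.
reviews_per_restaurant = toy_reviews["Restaurant"].value_counts()
print(unreviewed)
print(reviews_per_restaurant)
```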

### Variables Description

**Zomato Restaurant names and Metadata** (columns listed below with their description) :-
- **Name** : Name of the restaurant
- **Links** : URL links of the restaurants
- **Cost** : Per person estimated cost of dining
- **Collections** : Tagging of restaurants w.r.t Zomato categories
- **Cuisines** : Cuisines served by restaurants
- **Timings** : Restaurant Timings

**Zomato Restaurant Reviews** (columns listed below with their description) :-
- **Restaurant** : Name of the restaurant being reviewed
- **Reviewer** : Name of the reviewer
- **Review** : Review text
- **Rating** : Rating provided
- **Metadata** : Reviewer metadata - No. of reviews and followers
- **Time** : Date and Time of Review
- **Pictures** : No. of pictures posted with review
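
Since the 'Metadata' column packs the reviewer's review count and follower count into one text field, it usually has to be split into two numeric features. The sketch below assumes strings shaped like "3 Reviews , 12 Followers"; that exact format, the helper name `parse_reviewer_metadata`, and the choice to report a missing part as 0 are assumptions, not something confirmed by the dataset description.

```python
import re

def parse_reviewer_metadata(text):
    """Return (review_count, follower_count); a missing part is reported as 0."""
    if not isinstance(text, str):
        # NaN or None: no metadata available for this review.
        return (0, 0)
    reviews = re.search(r"(\d+)\s+Reviews?", text)
    followers = re.search(r"(\d+)\s+Followers?", text)
    return (
        int(reviews.group(1)) if reviews else 0,
        int(followers.group(1)) if followers else 0,
    )

print(parse_reviewer_metadata("3 Reviews , 12 Followers"))
print(parse_reviewer_metadata("1 Review"))
```

With pandas, the helper could be applied column-wise, for example via `df["Metadata"].apply(parse_reviewer_metadata)`, and the resulting tuples expanded into two new columns.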

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable in 'Zomato Restaurant names and Metadata' Dataset

restaurants_raw_df.nunique()

**Restaurant Dataset:-**

Name: 105 unique restaurant names.

Links: 105 unique links (each restaurant has a unique link).

Cost: 29 unique cost values.

Collections: 42 unique collections.

Cuisines: 92 unique cuisines.

Timings: 77 unique timings.

In [None]:
# Check Unique Values for each variable in 'Zomato Restaurant Reviews' Dataset

reviews_raw_df.nunique()

**Reviews Dataset:**

Restaurant: 100 unique restaurants reviewed.

Reviewer: 7,446 unique reviewers.

Review: 9,364 unique reviews.

Rating: 10 unique rating scores.

Metadata: 2,477 unique metadata entries.

Time: 9,782 unique timestamps.

Pictures: 36 unique counts of pictures included in reviews.

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)
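
The cleaning steps listed in this subsection (contractions, casing, URLs, digits, punctuation, stopwords, whitespace) can be chained into one function. The sketch below uses only the standard library so that it runs anywhere; the small `CONTRACTIONS` map and `STOPWORDS` set are stand-ins (assumptions) for the `contractions` package and NLTK's stopword list imported earlier, and lowercasing is done first so the toy contraction map matches.

```python
import re
import string

# Tiny stand-ins (assumptions) for the `contractions` package and NLTK's stopword list.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "didn't": "did not"}
STOPWORDS = {"the", "a", "an", "is", "it", "and", "was", "at", "to", "of"}

def clean_review(text):
    """Apply the basic cleaning steps to a single review string."""
    # Lower casing (done first so the contraction keys match).
    text = text.lower()
    # Expand contractions.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Remove URLs.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove tokens that contain digits.
    text = re.sub(r"\S*\d\S*", " ", text)
    # Remove punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove stopwords; splitting and re-joining also collapses extra white space.
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)

print(clean_review("It's GREAT!! Visit https://example.com at 9pm, can't wait."))
```

URLs are removed before punctuation on purpose: stripping punctuation first would break "https://..." into fragments that the URL pattern no longer matches.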

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing Words Containing Digits

In [None]:
# Remove URLs & remove words containing digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalanced dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Load the saved model file again and predict on unseen data as a sanity check.


In [None]:
# Load the File and predict unseen data.
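
The save-and-reload sanity check can be sketched with `joblib` as follows. The toy model, the training data, and the file name `sentiment_model.joblib` are hypothetical placeholders; in the notebook, the best-performing fitted model would take the place of `model`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data (invented): one feature, two clearly separated classes.
X_train = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Save to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "sentiment_model.joblib")  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# Sanity check: the restored model must reproduce the original predictions on unseen inputs.
X_unseen = np.array([[0.5], [9.5]])
assert (restored.predict(X_unseen) == model.predict(X_unseen)).all()
print(restored.predict(X_unseen))
```

Comparing the restored model's predictions against the in-memory model, rather than against hard-coded labels, checks the serialization round trip itself and stays valid if the model is retrained.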

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***