<a href="https://colab.research.google.com/github/Nethra0503/Hotel_Booking_Data_Analysis_Project/blob/main/Zomato%5BRestaurant%20clustering%5D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Zomato Data Analysis



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual/Team
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**
The Indian restaurant industry is growing rapidly, with customers increasingly opting for both dining out and food delivery services. With thousands of restaurants across various cities, finding the best dining options can be overwhelming for customers. Similarly, restaurants and food delivery platforms like Zomato need data-driven strategies to optimize their services, improve customer satisfaction, and enhance business performance.

This project aims to analyze Zomato’s restaurant data using one key approach:

Restaurant Clustering – Grouping restaurants into meaningful segments based on cuisine, pricing, ratings, and location.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


With the rapid expansion of the restaurant industry in India, customers often struggle to find the best dining options that match their preferences and budget. Similarly, food delivery platforms like Zomato face challenges in categorizing restaurants effectively and optimizing pricing strategies for better customer satisfaction.

This project aims to address the following key problems:

Restaurant Clustering:

How can we group restaurants based on key attributes such as cuisine type, price range, customer ratings, and location to improve restaurant discovery and business insights?

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with the updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
zom_review = pd.read_csv('/content/Zomato Restaurant reviews.csv')
zom_review

In [None]:
zom_res_names = pd.read_csv('/content/Zomato Restaurant names and Metadata.csv')
zom_res_names

### Dataset First View

In [None]:
# Dataset First Look
zom_review.head(5)

In [None]:
zom_res_names.head(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(zom_review.shape)  # (rows, columns)
print(list(zom_review.columns))

In [None]:
print(zom_res_names.shape)  # (rows, columns)
print(list(zom_res_names.columns))

### Dataset Information

In [None]:
# Dataset Info of restaurant
zom_res_names.info()

In [None]:
# Dataset info of reviews
zom_review.info()

#### Duplicate Values

In [None]:
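# Duplicate value count of review dataset
zom_review.duplicated().sum()

In [None]:
# drop_duplicates() returns a new DataFrame, so assign the result back
zom_review = zom_review.drop_duplicates()
zom_review.shape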

In [None]:
# duplicates in restaurant dataset
zom_res_names.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
zom_review.isnull().sum()

In [None]:
import missingno as msno

In [None]:
# Visualizing the missing values
msno.bar(zom_review)
plt.show()

In [None]:
zom_review.dropna(subset=['Reviewer','Rating','Review','Metadata'],inplace=True)

In [None]:
zom_review.isnull().sum()

In [None]:
zom_res_names.shape

In [None]:
zom_res_names.isnull().sum()

In [None]:
# viewing what type of values are found in collections
zom_res_names['Collections'].value_counts()

In [None]:
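# Most common collection per cuisine; take the first mode so each group yields a single value
zom_res_names.groupby('Cuisines')['Collections'].agg(lambda x: x.mode().iloc[0] if not x.mode().empty else None)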

In [None]:
#By replacing null values with most common collections in each cuisines
zom_res_names["Collections"] = zom_res_names["Collections"].fillna(
    zom_res_names.groupby("Cuisines")["Collections"].transform(lambda x: x.mode()[0] if not x.mode().empty else "Unknown")
)


In [None]:
zom_res_names['Collections'].isnull().sum()
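
The group-wise mode imputation used above can be checked on a small toy frame. This is only an illustrative sketch: the `toy` DataFrame and the helper `fill_with_group_mode` are hypothetical names, not part of the Zomato data or of this notebook.

```python
import pandas as pd

def fill_with_group_mode(df, group_col, target_col, fallback="Unknown"):
    """Return target_col with missing values filled by the most common value in each group."""
    def group_mode(values):
        modes = values.mode()  # mode() ignores missing values by default
        return modes.iloc[0] if not modes.empty else fallback

    per_row_mode = df.groupby(group_col)[target_col].transform(group_mode)
    return df[target_col].fillna(per_row_mode)

# Hypothetical toy data: one group has a clear mode, the other is entirely missing
toy = pd.DataFrame({
    "Cuisines": ["Chinese", "Chinese", "Chinese", "Italian", "Italian"],
    "Collections": ["Trending", "Trending", None, None, None],
})
toy["Collections"] = fill_with_group_mode(toy, "Cuisines", "Collections")
print(toy)
```

A group with no observed value at all falls back to the placeholder, which is why it is worth counting how many rows end up as "Unknown" after the fill.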

### What did you know about your dataset?

The Zomato reviews dataset contains the reviewer's name, the review text, and the rating given to a particular restaurant by that reviewer. The Zomato restaurant dataset, on the other hand, describes each restaurant's cuisines and the Zomato collections it is tagged under.



## ***2. Understanding Your Variables***

In [None]:
list(zom_res_names.columns)

In [None]:
list(zom_review.columns)

In [None]:
# Dataset Describe
zom_res_names.describe()

In [None]:
zom_review.describe()

In [None]:
zom_review.dtypes

In [None]:
zom_res_names.dtypes

### Variables Description

#### Review Dataset
1. Reviewer - name of the reviewer

2. Review - review text

3. Rating - rating provided

4. Metadata - number of reviews and followers

5. Time - time of review

6. Pictures - number of pictures posted with the review




#### Restaurant Dataset
1. Name - name of the restaurant

2. Cost - estimated per-person cost of dining

3. Collections - tagging of restaurants with respect to Zomato categories

4. Cuisines - cuisines offered by the restaurant

5. Timings - restaurant timings



### Check Unique Values for each variable.

In [None]:
selected = zom_review[["Reviewer","Rating","Restaurant"]]
unique_count = {col : zom_review[col].nunique() for col in selected.columns}
print(unique_count)

In [None]:
selected_columns = zom_res_names[["Collections","Cuisines","Name"]]


In [None]:
unique_count = {col : zom_res_names[col].nunique() for col in selected_columns.columns}
print(unique_count)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
zom_review.rename(columns={"Restaurant": "Name"}, inplace=True)


In [None]:
zom_review.columns

In [None]:
# Write your code to make your dataset analysis ready.
#merging datsets
new_data = pd.merge(zom_res_names,zom_review,on="Name")

In [None]:
new_data.columns

In [None]:
new_data.shape
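
An inner merge silently drops names that appear in only one of the two frames, so it can help to check what is lost. The sketch below uses two small hypothetical frames (`restaurants`, `reviews`) in place of the real datasets; `indicator` and `validate` are standard `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical stand-ins for the restaurant and review datasets
restaurants = pd.DataFrame({"Name": ["A", "B", "C"], "Cost": ["800", "1,200", "500"]})
reviews = pd.DataFrame({"Name": ["A", "A", "B", "D"], "Rating": ["5", "4", "3", "4"]})

# how="outer" with indicator=True records which side each row came from
check = pd.merge(restaurants, reviews, on="Name", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Name", "_merge"]])

# validate="one_to_many" raises MergeError if a restaurant name is duplicated on the left
merged = pd.merge(restaurants, reviews, on="Name", how="inner", validate="one_to_many")
print(merged.shape)
```

Here restaurant "C" has no reviews and review "D" has no restaurant record, so both disappear from the inner merge.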

In [None]:
new_data.isnull().sum()

In [None]:
new_data.dropna(axis=0,inplace=True)

In [None]:
new_data.isnull().sum()

In [None]:
# index=False avoids writing the row index as an extra "Unnamed: 0" column
new_data.to_excel("Transformed_data.xlsx", index=False)

In [None]:
new_data.duplicated().sum()

### What all manipulations have you done and insights you found?

Both datasets were checked for duplicates, and duplicate rows were dropped from the review dataset. The Zomato restaurant dataset contained 54 null values in the 'Collections' column. Dropping those rows would have created inconsistencies in the data, so the missing values were replaced with the most common collection for each cuisine. The review dataset had approximately 1% null values. Since dropping them would not affect the analysis, those rows were removed.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
new_data =pd.read_excel('/content/Transformed_data.xlsx')

In [None]:
#view of new data
new_data.head()


In [None]:
new_data.dtypes

In [None]:
new_data['Cost']= new_data['Cost'].str.replace(',','').astype(int)

In [None]:
new_data['Cost'].dtype

In [None]:
mean_cost = new_data.groupby('Cuisines',as_index=False)['Cost'].mean()
mean_cost

In [None]:
mean_cost.sort_values(by='Cost',ascending=False)

#### Chart - 1

In [None]:
import plotly.express as px

fig = px.treemap(mean_cost, path=['Cuisines'], values='Cost', color='Cost',
                  color_continuous_scale='blues', title="Cost Distribution by Cuisine")
fig.show()


##### 1. Why did you pick the specific chart?

A treemap was chosen to show how cost is distributed across cuisines. Because the cuisine column has many categories, a treemap gives a clear view of each category's relative share.

##### 2. What is/are the insight(s) found from the chart?

The combination of Italian, Chinese, and North Indian cuisines has the highest average cost, at about 2800, whereas street food and Arabian cuisines have the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Yes. Knowing that the Italian, Chinese, and North Indian combination sits at the high end of cost (~2800) while Street Food and Arabian sit at the low end can help businesses optimize menu pricing, marketing, and cost control. Premium cuisines can be priced strategically to maximize revenue, while affordable options can drive high-volume sales. Restaurants can also use this data for targeted promotions, location-based expansion, and better supply chain management. Overall, these insights enable data-driven decisions for improved profitability and customer satisfaction.
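
The `Cuisines` column holds combinations of cuisines, so the treemap compares combinations rather than individual cuisines. A minimal sketch of one way to get a per-cuisine average is shown below. It assumes the combinations are comma-separated, and the `toy` frame and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row per restaurant, cuisines stored as a comma-separated string
toy = pd.DataFrame({
    "Name": ["R1", "R2", "R3"],
    "Cuisines": ["Italian, Chinese", "Chinese", "Street Food, Chinese"],
    "Cost": [2000, 800, 300],
})

per_cuisine = (
    toy.assign(Cuisine=toy["Cuisines"].str.split(","))    # string -> list of cuisines
       .explode("Cuisine")                                 # one row per (restaurant, cuisine)
       .assign(Cuisine=lambda d: d["Cuisine"].str.strip()) # remove spaces left by the split
       .groupby("Cuisine", as_index=False)["Cost"].mean()
       .sort_values("Cost", ascending=False)
)
print(per_cuisine)
```

On the merged notebook data, which has one row per review, it would be worth reducing to one row per restaurant first so that heavily reviewed restaurants do not dominate the mean.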









#### Chart - 2

In [None]:
# Chart - 2 visualization code
new_data['Collections'].value_counts()

In [None]:
#converting both collection and name to string type to perform aggregation
new_data['Name'] = new_data['Name'].astype(str)
new_data['Collections'] = new_data['Collections'].astype(str)


In [None]:
# Check the column dtypes (type() would only report that each column is a Series)
print(new_data['Collections'].dtype, new_data['Name'].dtype)

In [None]:
print(type(new_data))

In [None]:
new_data.columns = new_data.columns.str.strip()


In [None]:
summary = new_data.groupby(['Name', 'Collections']).size().reset_index(name='Count').sort_values(by='Count',ascending=False)
summary

In [None]:
summary['Count'].value_counts()

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Sort data
top_trending = summary.sort_values(by="Count", ascending=False).head(10)  # Top 10 trending

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(x="Count", y="Name", data=top_trending, palette="Blues_r")
plt.xlabel("Trending Count")
plt.ylabel("Restaurant Name")
plt.title("Top Trending Restaurants")
plt.show()


##### 1. Why did you pick the specific chart?
To visualize which restaurants were trending the most.

##### 2. What is/are the insight(s) found from the chart?

All the restaurants have equal counts, so every restaurant shown in the chart was equally on trend for that particular week.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
new_data['Name'] = new_data['Name'].astype(str).str.strip()  # Ensure it's string & remove extra spaces
restaurant_counts = new_data['Name'].value_counts().reset_index()
restaurant_counts.columns = ['Restaurant', 'Frequency']
restaurant_counts


In [None]:


# Select top 15 most frequent restaurants
top_15 = restaurant_counts.head(15)

# Plot bar chart
plt.figure(figsize=(12, 6))
sns.barplot(x=top_15['Frequency'], y=top_15['Restaurant'], palette="viridis")

plt.xlabel("Number of Occurrences")
plt.ylabel("Restaurant Name")
plt.title("Top 15 Most Frequent Restaurants")
plt.show()


In [None]:
top_15

##### 1. Why did you pick the specific chart?

To identify the top 15 restaurants in the city by how often they appear in the dataset.

##### 2. What is/are the insight(s) found from the chart?

The bar chart displays the top 15 most frequent restaurants based on their occurrences in the dataset.
Beyond Flavours appears the most, indicating it might be a popular or frequently listed restaurant.
The occurrences are fairly uniform, suggesting these restaurants are consistently present in the dataset.
Various cuisines and dining types (fine dining, casual, hotels) are represented, highlighting diverse consumer preferences.
These insights can help in targeting high-traffic restaurants for marketing, collaborations, or trend analysis.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Restaurants with high occurrences can be prioritized for promotions, partnerships, or premium listings.
Understanding consumer preferences helps in tailoring menus, pricing, and marketing strategies.
Businesses can use this data to expand high-demand cuisines or improve underperforming locations.
Overall, leveraging these insights can enhance customer engagement, increase sales, and boost market presence.

In [None]:
new_data.columns

#### Chart - 14 - Correlation Heatmap

Answer Here

## ***5. Hypothesis Testing***

### Hypothetical Statement - 1

In [None]:
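# Remove rows where "Rating" or "Cost" contains the non-numeric value "Like"
mask = new_data[['Rating', 'Cost']].astype(str).apply(lambda x: x.str.contains("Like")).any(axis=1)
new = new_data[~mask].copy()  # copy so the assignment below does not act on a view

# Convert "Rating" and "Cost" to float on the filtered frame (not on new_data, which still has "Like")
new[['Rating', 'Cost']] = new[['Rating', 'Cost']].astype(float)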
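
An alternative to matching the literal string "Like" is to let pandas coerce anything non-numeric to missing and then drop it. The sketch below runs on a hypothetical `toy` frame, not on the notebook's data.

```python
import pandas as pd

# Hypothetical sample with one non-numeric rating and comma-formatted costs
toy = pd.DataFrame({
    "Rating": ["5", "4.5", "Like", "3"],
    "Cost": ["800", "1,200", "500", "1,500"],
})

# errors="coerce" turns unparseable entries such as "Like" into NaN
toy["Rating"] = pd.to_numeric(toy["Rating"], errors="coerce")
toy["Cost"] = pd.to_numeric(toy["Cost"].str.replace(",", "", regex=False), errors="coerce")

clean = toy.dropna(subset=["Rating", "Cost"])
print(clean)
```

This also catches other stray text values, at the cost of hiding what they were, so counting the dropped rows is a sensible check.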


#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.
Null hypothesis: There is no relationship between the cost of restaurant and the rating it receives. (H0: 𝛽1 = 0)

Alternative hypothesis: There is a positive relationship between the cost of a restaurant and the rating it receives. (H1: 𝛽1 > 0)

Test : Simple Linear Regression Analysis

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value
import statsmodels.formula.api as smf

# Fit the linear model Rating = b0 + b1 * Cost
model = smf.ols(formula='Rating ~ Cost', data=new).fit()

# statsmodels reports a two-sided p-value; convert it for the one-sided H1: b1 > 0
slope = model.params['Cost']
p_two_sided = model.pvalues['Cost']
p_value = p_two_sided / 2 if slope > 0 else 1 - p_two_sided / 2

if p_value < 0.05:
    print("Reject Null Hypothesis - There is a positive relationship between "
          "the cost of a restaurant and the rating it receives.")
else:
    print("Fail to reject Null Hypothesis - There is no evidence of a positive "
          "relationship between the cost of a restaurant and the rating it receives.")

##### Which statistical test have you done to obtain P-Value?

I used a simple linear regression to test the relationship between the cost of a restaurant and its rating.

##### Why did you choose the specific statistical test?

I chose this test because it is a common and straightforward method for testing the relationship between two continuous variables. It involves fitting a linear model with the rating as the dependent variable and the cost as the independent variable. The p-value of the coefficient for the cost variable then indicates whether there is a statistically significant relationship between the two variables.
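
Because the alternative hypothesis is one-sided (𝛽1 > 0), the slope test can also be run directly as a one-sided test. The sketch below swaps in `scipy.stats.linregress` instead of statsmodels and runs on synthetic data. The slope, noise level, and sample size are made-up assumptions and say nothing about the real result.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data: rating rises slightly with cost, plus noise
rng = np.random.default_rng(0)
cost = np.linspace(200, 3000, 200)
rating = 3.0 + 0.0005 * cost + rng.normal(0, 0.3, size=cost.size)

# alternative="greater" tests H1: slope > 0 (available in SciPy 1.7 and later)
result = stats.linregress(cost, rating, alternative="greater")

alpha = 0.05
decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(f"slope={result.slope:.5f}, one-sided p={result.pvalue:.3g} -> {decision}")
```

With a two-sided p-value, the equivalent one-sided value is half of it when the estimated slope is positive.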

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation
new_data.isnull().sum()

In [None]:
new_data['Timings'].shape

# Feature Engineering

In [None]:
data = new_data.copy()

**Using label encoding to transform categorical variables to numerical type**

In [None]:
# labeling the columns cuisines and collections by using labelencoder
from sklearn.preprocessing import LabelEncoder

label_enc = LabelEncoder()
data['Cuisines'] = label_enc.fit_transform(data['Cuisines'])
data['Collections'] = label_enc.fit_transform(data['Collections'])


In [None]:
data['Reviewer'] = label_enc.fit_transform(data['Reviewer'])
data['Rating'] = label_enc.fit_transform(data['Rating'])


In [None]:
data['Cuisines']

**Normalization of variables cost,rating,reviewer**

In [None]:
data.dtypes

In [None]:
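# Cost was already converted to int in Section 4, so .str alone would fail here;
# casting through str keeps this cell safe whether Cost is text or numeric
data['Cost'] = data['Cost'].astype(str).str.replace(',', '').astype(float)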


In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
data[['Cost', 'Rating', 'Reviewer']] = scaler.fit_transform(data[['Cost', 'Rating', 'Reviewer']])


**Reason for opting for StandardScaler:**

I used StandardScaler because cost, rating, and reviewer are three variables measured on different scales. StandardScaler rescales each one to zero mean and unit standard deviation, so all three are standardised onto a comparable scale.
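
The standardisation itself is the z-score, z = (x - mean) / std. A minimal sketch on hypothetical values is shown below. It uses the population standard deviation (`ddof=0`), which is the convention `StandardScaler` follows, whereas pandas' `std()` defaults to `ddof=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical values on very different scales
toy = pd.DataFrame({
    "Cost": [300.0, 800.0, 1200.0, 2800.0],
    "Rating": [3.0, 4.5, 4.0, 3.5],
})

# z = (x - mean) / std, column by column, with the population std (ddof=0)
z = (toy - toy.mean()) / toy.std(ddof=0)

print(z.round(3))
print("means:", np.allclose(z.mean(), 0.0), "stds:", np.allclose(z.std(ddof=0), 1.0))
```

After scaling, a difference of 1 in any column means one standard deviation, so no single variable dominates a Euclidean distance simply because of its units.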

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Try different values of k
inertia = []
K = range(1, 11)

for k in K:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(data[['Cuisines', 'Rating', 'Cost']])
    inertia.append(kmeans.inertia_)

# Plot Elbow Curve
plt.figure(figsize=(8, 5))
plt.plot(K, inertia, 'bo-')
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method for Optimal K")
plt.show()


In [None]:
from sklearn.cluster import KMeans
from yellowbrick.cluster import SilhouetteVisualizer
from sklearn.metrics import silhouette_score

def silhouette_analysis(n):
  """Print and plot silhouette scores for k = 2 .. n-1."""
  features = data[['Cuisines', 'Rating', 'Cost']]
  for n_clusters in range(2, n):
    # Fixed seed so the scores are reproducible across runs
    km = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    preds = km.fit_predict(features)

    score = silhouette_score(features, preds, metric='euclidean')
    print('For n_clusters = {}, silhouette score is {}'.format(n_clusters, score))

    visualizer = SilhouetteVisualizer(km)
    visualizer.fit(features)
    visualizer.show()  # poof() is deprecated in favour of show()

In [None]:
silhouette_analysis(8)
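
The silhouette scores printed above can also be collected and compared programmatically. The sketch below runs on synthetic blobs instead of the notebook's features. The helper `silhouette_by_k` and the data are hypothetical, and picking the highest score is only one possible selection rule.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three well-separated groups
X, _ = make_blobs(n_samples=300, centers=[(-10, -10), (0, 0), (10, 10)],
                  cluster_std=0.5, random_state=42)

def silhouette_by_k(features, k_values, seed=42):
    """Return {k: silhouette score} for KMeans fitted at each k."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On real data the scores are usually much closer together, so the elbow curve and the interpretability of the clusters still matter when choosing k.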

In [None]:
# .copy() so that adding cluster columns later does not act on a view of data
data_cluster = data[['Cuisines', 'Rating', 'Cost']].copy()

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
optimal_k = 4  # Replace this with the best value from the Elbow Method
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
data_cluster['Cluster'] = kmeans.fit_predict(data_cluster)

# Display first few rows with clusters
print(data_cluster.head())


In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(data_cluster['Cost'], data_cluster['Rating'], c=data_cluster['Cluster'], cmap='viridis', alpha=0.6)
plt.xlabel('Cost')
plt.ylabel('Rating')
plt.title('KMeans Clustering of Restaurants')
plt.colorbar(label='Cluster')
plt.show()


In [None]:
cluster_summary = data_cluster.groupby('Cluster').agg({'Cost': 'mean', 'Rating': 'mean'}).reset_index()
cluster_summary
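
Cluster means computed on standardised columns are z-scores, which makes labels such as "high cost" hard to read off directly. One option is to map the centroids back to the original units with the scaler's `inverse_transform`. The sketch below is a toy example under the assumption that the scaler was fitted on exactly the columns being clustered, which differs from the notebook's setup. All values are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with two obvious groups
raw = pd.DataFrame({
    "Cost": [300, 350, 400, 2500, 2600, 2700],
    "Rating": [4.5, 4.0, 4.2, 3.0, 3.2, 2.8],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(raw)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(scaled)

# Map centroids from z-scores back to the original cost and rating units
centroids = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=raw.columns
)
print(centroids.round(2))
```

Reading centroids in original units makes it easier to justify a label like "Low Cost, high Rating" against actual prices and ratings.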

In [None]:
cluster_labels = {
    0: "Moderate Cost, low Rating",
    1: "High Cost, moderate Rating",
    2: "Low Cost, high Rating",
    3: "Low Cost, low Rating"
}
data_cluster['Cluster_Label'] = data_cluster['Cluster'].map(cluster_labels)

# Create scatter plot
plt.figure(figsize=(10, 6))
scatter = sns.scatterplot(
    x=data_cluster['Cost'],
    y=data_cluster['Rating'],
    hue=data_cluster['Cluster_Label'],
    palette='viridis',
    alpha=0.7
)

# Label clusters at their centroid positions
for cluster, row in cluster_summary.iterrows():
    plt.text(
        row['Cost'], row['Rating'],
        cluster_labels[cluster], fontsize=12,
        bbox=dict(facecolor='white', alpha=0.7),
        ha='center'
    )

plt.title("KMeans Clustering of Restaurants")
plt.xlabel("Cost")
plt.ylabel("Rating")
plt.legend(title="Cluster Categories")
plt.show()

#### Business Insights:

Business insights from the restaurant clustering analysis:

**Identifying Budget-Friendly Popular Restaurants**

**Cluster 2** contains low-cost restaurants with high ratings, indicating customer preference for affordable yet quality dining.
Business Strategy: Promote these restaurants through marketing campaigns, emphasizing their affordability and positive reviews.

**Premium Dining with Moderate Satisfaction**

**Cluster 1** consists of high-cost restaurants with moderate ratings, suggesting a need for service or quality improvement.
Business Strategy: Gather customer feedback to enhance the dining experience and justify premium pricing.

**Low-Rated, Low-Cost Restaurants – Need for Improvement**

**Cluster 3** includes low-cost restaurants with low ratings, which could indicate poor service, hygiene, or food quality.
Business Strategy: Conduct quality checks, improve menu offerings, and provide incentives for better customer engagement.

**Moderate-Cost, Low-Rated Restaurants – Competitive Risk**

**Cluster 0** represents moderate-cost restaurants with low ratings, possibly struggling to differentiate themselves.
Business Strategy: Offer discounts, loyalty programs, or collaborations to enhance their competitive positioning.

**Strategic Recommendations for Growth**

*   Enhance partnerships with top-rated budget-friendly restaurants to drive more traffic.
*   Work with higher-cost restaurants to improve customer satisfaction and justify pricing.
*   Monitor low-rated clusters for necessary business interventions such as menu revamps, better service training, or improved marketing efforts.

### ML Model - 2

# Hierarchical Clustering

In [None]:
data_cluster.head()

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster

In [None]:
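# Perform hierarchical clustering using Ward's method on the numeric features only
# (data_cluster also holds the KMeans 'Cluster' and text 'Cluster_Label' columns)
features = data_cluster[['Cuisines', 'Rating', 'Cost']]
Z = linkage(features, method='ward')  # Ward minimizes variance within clusters
Z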

In [None]:
plt.figure(figsize=(10, 5))
dendrogram(Z, leaf_rotation=90, leaf_font_size=10)
plt.title("Dendrogram for Hierarchical Clustering")
plt.xlabel("Data Points")
plt.ylabel("Euclidean Distance")
plt.show()


In [None]:
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Cluster on the numeric features only, not on earlier cluster labels
features = data_cluster[['Cuisines', 'Rating', 'Cost']]

range_n_clusters = range(2, 12)
for n_clusters in range_n_clusters:
    hc = AgglomerativeClustering(n_clusters=n_clusters, linkage='ward')
    y_hc = hc.fit_predict(features)
    score = silhouette_score(features, y_hc)
    print("For n_clusters = {}, silhouette score is {}".format(n_clusters, score))


In [None]:
from sklearn.cluster import AgglomerativeClustering
import seaborn as sns

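# Perform Hierarchical Clustering on the numeric features only
# (this overwrites the KMeans labels stored in 'Cluster')
hc = AgglomerativeClustering(n_clusters=4, linkage='ward')
data_cluster['Cluster'] = hc.fit_predict(data_cluster[['Cuisines', 'Rating', 'Cost']])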

# Plot clusters
plt.figure(figsize=(8, 5))
sns.scatterplot(data=data_cluster, x='Cost', y='Rating', hue='Cluster', palette='viridis', s=50)
plt.title("Hierarchical Clustering of Restaurants")
plt.xlabel("Cost")
plt.ylabel("Rating")
plt.legend(title="Cluster")
plt.show()
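
To judge how far K-Means and hierarchical clustering agree, the two label sets can be compared with the adjusted Rand index. The sketch below uses synthetic blobs as a stand-in for the restaurant features. The data and parameters are hypothetical.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data: four well-separated groups
X, _ = make_blobs(n_samples=200, centers=[(-8, 0), (0, 8), (8, 0), (0, -8)],
                  cluster_std=0.6, random_state=0)

km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X)

# ARI is 1.0 when the two partitions are identical up to a renaming of labels
ari = adjusted_rand_score(km_labels, hc_labels)
print(f"Adjusted Rand index: {ari:.3f}")
```

An index near 1 means the two methods found essentially the same groups, while a value near 0 means their agreement is no better than chance. The index compares partitions, so it does not matter that the two methods number their clusters differently.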


#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

**Summary of Findings**

Using K-Means and Hierarchical Clustering, we categorized restaurants based on key features such as Cuisines, Cost, and Rating. The clusters revealed distinct groups of restaurants that share common characteristics. After analyzing the clusters, we identified four key restaurant categories:

1️⃣ Premium Fine Dining (High Cost, High Rating) – High-end restaurants that maintain a strong reputation with excellent reviews.

2️⃣ Affordable Popular Restaurants (Low Cost, High Rating) – Budget-friendly eateries that provide high value for money.


3️⃣ Mid-Range Casual Dining (Moderate Cost, Moderate Rating) – Restaurants that attract a broad audience but have mixed reviews.

4️⃣ Overpriced or Underperforming (High Cost, Low Rating) – Restaurants that charge premium prices but do not meet customer expectations.

**Business Use Cases**

📌 1. Customer Segmentation & Targeted Marketing

*   Use clustering insights to identify key customer segments and create personalized promotions.
*   Offer discounts for underperforming high-cost restaurants to improve customer engagement.

📌 2. Pricing & Revenue Optimization

*   Adjust pricing strategies based on the cost and rating of competitors within the same cluster.
*   Identify restaurants with high cost but low ratings and suggest improvements in service or menu.

📌 3. Expansion & Location Strategy

*   Determine gaps in the market by analyzing clusters where certain types of restaurants are missing.
*   Expand popular, budget-friendly restaurants in areas where demand is high.

📌 4. Operational Improvements

*   Restaurants with low ratings but high prices can focus on enhancing food quality, service, or ambiance.
*   High-rated budget restaurants can scale operations while maintaining cost efficiency.

📌 5. Customer Experience Enhancement

*   Develop a recommendation system that suggests restaurants to customers based on their preferences.
*   Improve restaurant offerings based on the feedback and behavior of different clusters.

**Final Thoughts**

This clustering analysis provides actionable business insights that help restaurant owners and food delivery platforms make data-driven decisions. Whether it's improving customer satisfaction, optimizing pricing, or expanding strategically, these insights ensure that restaurants stay competitive in the market.

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.
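
A minimal sketch of the save, reload, and predict round trip is shown below, using `joblib` and a tiny K-Means model fitted on made-up points. The file name and the data are hypothetical, and the notebook's own fitted model would take the place of `model`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

# Tiny hypothetical dataset with two obvious groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
model = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# Save the fitted model to a temporary location (hypothetical file name)
path = os.path.join(tempfile.gettempdir(), "kmeans_demo.joblib")
joblib.dump(model, path)

# Load it back and assign unseen points to the learned clusters
restored = joblib.load(path)
unseen = np.array([[0.05, 0.1], [4.9, 5.2]])
print(restored.predict(unseen))
```

This works for K-Means because it exposes `predict`. `AgglomerativeClustering` has no `predict` method, so new points cannot be assigned to its clusters in the same way.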

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***