<a href="https://colab.research.google.com/github/Nagasai122/Covid_sentiment_analysis/blob/main/Netflix_movie_suggestion_UnsuvervisedML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Netflix Movies and TV shows Clustering

##### **Project Type**    - Unsupervised
##### **Contribution**    - Individual
##### **Member 1 -** Dasari Naga Sai

# **Project Summary -**

Netflix is the world’s leading internet entertainment service, offering TV series, documentaries, and feature films across a wide variety of genres and languages. The company was founded on August 29, 1997 by Marc Randolph and Reed Hastings and is headquartered in Los Gatos, California. The figures quoted here are 69 million members in over 60 countries, watching more than 100 million hours of TV shows and movies per day. I was curious to analyze the content available on the Netflix platform, which led me to create these simple, interactive visualizations and to look for groups of similar titles.

This dataset consists of TV shows and movies available on Netflix as of 2019. The dataset was collected from Flixable, a third-party Netflix search engine. In 2018, Flixable released a report showing that the number of TV shows on Netflix has nearly tripled since 2010, while the number of movies has decreased by more than 2,000 titles. It will be interesting to explore what other insights can be obtained from the same dataset. Integrating it with external datasets such as IMDb ratings and Rotten Tomatoes scores could also provide many interesting findings.

In this project, we worked on a text clustering problem: grouping Netflix movies and shows into clusters such that titles within a cluster are similar to each other and titles in different clusters are dissimilar.

The dataset contains 7,787 records and 12 attributes.

In the initial phase, we focused on data cleaning, and then carried out exploratory data analysis (EDA) across several categories.

We created clusters using the following attributes: director, cast, country, genre, rating, and description. These attributes were tokenized, preprocessed, and then vectorized using a TF-IDF vectorizer.
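
A minimal sketch of this vectorization step, assuming a tiny made-up table in place of the Netflix data. The text columns are joined into one string per title and passed to `TfidfVectorizer`. The `toy` frame and the `combined` column name are illustrative assumptions, not the notebook's actual variables.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-row sample standing in for the real dataset
toy = pd.DataFrame({
    "director": ["Jane Doe", "John Roe", "Jane Doe"],
    "listed_in": ["Dramas", "Comedies", "Dramas, Thrillers"],
    "description": [
        "A detective hunts a killer in a small town.",
        "Two friends open a bakery and chaos follows.",
        "A detective returns to her town to solve a cold case.",
    ],
})

# Join the chosen attributes into a single text field per title
toy["combined"] = toy[["director", "listed_in", "description"]].agg(" ".join, axis=1)

# Lowercase, drop English stop words, and weight the remaining terms by TF-IDF
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
tfidf_matrix = vectorizer.fit_transform(toy["combined"])

print(tfidf_matrix.shape)  # (number of titles, vocabulary size)
```

The result is a sparse matrix with one row per title, which is why a dimensionality reduction step is usually applied before clustering.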

We used Principal Component Analysis (PCA) to reduce the number of features and mitigate the curse of dimensionality.
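
As a rough illustration, the sketch below keeps the smallest number of principal components that explain a chosen share of the variance. The random matrix `X` and the 95% threshold are assumptions made for the example, not values taken from this project.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical dense feature matrix: 200 samples, 50 features built from 5 latent factors
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))

# A float n_components keeps the fewest components that explain that share of variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape[1], pca.explained_variance_ratio_.sum())
```

`PCA` expects a dense array, so a sparse TF-IDF matrix would need `.toarray()` first (or a sparse-friendly alternative such as `TruncatedSVD`).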

We built two types of clusters, using K-Means and agglomerative hierarchical clustering, and determined the optimal number of clusters with techniques such as the elbow method, the silhouette score, and a dendrogram.
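
The following sketch shows one way the elbow method and silhouette score can be computed side by side for K-Means. It runs on synthetic blobs from `make_blobs` rather than the Netflix features, and the range of `k` values is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 well-separated groups (a stand-in for the reduced features)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_                         # elbow method: look for the bend in this curve
    silhouettes[k] = silhouette_score(X, km.labels_)  # closer to 1 means better-separated clusters

# Pick the k with the highest silhouette score
best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

Plotting `inertias` against `k` gives the elbow curve, while `silhouettes` gives a single score per `k` that can be compared directly.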

A content-based recommender system was built from the similarity matrix obtained with cosine similarity. It makes 10 recommendations to the user based on the show they watched.
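
A minimal sketch of such a recommender, assuming a four-title made-up catalogue. The `catalogue` frame, its `text` column, and the `recommend` helper are hypothetical names introduced for illustration only.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini catalogue; the real notebook would use the cleaned Netflix text
catalogue = pd.DataFrame({
    "title": ["Town Detective", "Bakery Chaos", "Cold Case", "Cake Wars"],
    "text": [
        "detective hunts killer small town crime drama",
        "friends open bakery comedy cake chaos",
        "detective solves cold case crime thriller town",
        "bakers compete cake contest comedy reality",
    ],
})

tfidf = TfidfVectorizer().fit_transform(catalogue["text"])
similarity = cosine_similarity(tfidf)  # square matrix of pairwise similarities


def recommend(title, top_n=10):
    """Return up to top_n titles most similar to `title`, best match first."""
    idx = catalogue.index[catalogue["title"] == title][0]
    scores = pd.Series(similarity[idx], index=catalogue["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


print(recommend("Town Detective", top_n=2))
```

The title being queried is dropped from its own results, since every item has a cosine similarity of 1 with itself.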

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(style="darkgrid")
import plotly.express as px

from datetime import datetime

# Word Cloud library
from wordcloud import WordCloud, STOPWORDS

# libraries used for textual data preprocessing
import string, unicodedata
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

# library used for building the recommendation system
from sklearn.metrics.pairwise import cosine_similarity

# libraries used for cluster implementation
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as shc


import warnings
warnings.filterwarnings('ignore')




### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('https://raw.githubusercontent.com/Nagasai122/Netflix_movie_recommedations/main/NETFLIX%20MOVIES%20AND%20TV%20SHOWS%20CLUSTERING.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

In [None]:
# missing value percentage
columnwise_percent_missing_values = round((df.isna().sum())/len(df)*100,2).sort_values(ascending=False)
columnwise_percent_missing_values

In [None]:
# Visualizing the missing values

plt.figure(figsize=(12,6))
plt.title("Missing Values per Column",fontsize=15)
sns.heatmap(df.isnull(),yticklabels=False,cbar=False,cmap='Greens')
plt.xlabel('Column name')
plt.ylabel('Rows (dark cells are missing values)')
plt.xticks(rotation=90)
plt.show()

### What did you know about your dataset?

**NaN values are present in the following columns:**

*   director
*   cast
*   country
*   date_added
*   rating

Since the dataset is small, we will not drop rows with null values. Instead, we will impute the NaN values with the mode of each column.
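
Mode imputation can be sketched as follows on a small hypothetical frame. The `sample` data is made up for the example; only the column names mirror the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in two categorical columns
sample = pd.DataFrame({
    "country": ["India", np.nan, "United States", "India"],
    "rating": ["TV-MA", "TV-14", np.nan, "TV-MA"],
})

# Fill each column with its own most frequent value
for col in ["country", "rating"]:
    sample[col] = sample[col].fillna(sample[col].mode()[0])

print(sample.isna().sum().sum())
```

Each column is filled from its own mode; taking the mode of a different column would silently insert values of the wrong kind.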

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe(include='all').transpose()

### Variables Description 


*   **show_id**      : Unique ID for every Movie/Show
*   **type**         : Identifier - Movie/Show
*   **title**        : Title of the Movie/Show
*   **director**     : Director of the Movie/Show
*   **cast**         : Actors involved in the Movie/Show
*   **country**      : Country where the Movie/Show was produced
*   **date_added**   : Date it was added on Netflix
*   **release_year** : Actual Release year of the Movie/Show
*   **rating**       : TV Rating of the Movie/Show
*   **duration**     : Total Duration - in minutes or number of seasons
*   **listed_in**    : Genre
*   **description**  : The Summary description





### Check Unique Values for each variable

In [None]:
# Check Unique Values for each variable.
for i in df.columns.tolist():
  print("No. of unique values in ",i,"is",df[i].nunique())

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert date_added to datetime (strip stray whitespace first so parsing is consistent)
df['date_added']= pd.to_datetime(df['date_added'].str.strip())
df.head()

In [None]:
df["added_year"] = df["date_added"].dt.year
df["added_month_num"] = df["date_added"].dt.month
df["added_day_num"] = df["date_added"].dt.day
df["added_day"] = df["date_added"].dt.day_name()
df["added_month"] = df["date_added"].dt.month_name()
df

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
print(df.type.value_counts())
print('-'*50)
sns.countplot(x='type',data=df)
plt.title('Count vs Type of show')

In [None]:
yearwise_addition = df.groupby(['added_year','type'])['type'].count().unstack().rename(columns={'Movie':'movie','TV Show':'tv_show'})
yearwise_addition.columns

In [None]:

plt.figure(figsize=(16,8))

# ax= df['added_year'].unique()
# ay = df['added_year'].value_counts()
plt.bar(yearwise_addition.index, yearwise_addition.movie, color='Olive')
plt.bar(yearwise_addition.index, yearwise_addition.tv_show,bottom=yearwise_addition.movie,  color= 'Teal')
plt.ylabel('No of content added')
plt.xlabel("Added Year")
plt.title("Shows Added Year wise",fontsize= 15)
plt.legend(['Movies','Tv Shows'])
plt.show()

In [None]:
Monthwise_addition = df['added_month'].value_counts()
Monthwise_addition

In [None]:
Monthwise_addition.plot(kind='bar',color='skyblue')
#plt.bar(Monthwise_addition.index,Monthwise_addition.values(),color='Cadetblue')

In [None]:
(df['added_day'].value_counts()).plot(kind='barh')

In [None]:
# count the number of occurrences for each genre in the data set

counts = dict()
for i in df.index:
   for g in df.loc[i,'listed_in'].split(','):
      if g.strip() not in counts:
         counts[g.strip()] = 1
      else:
         counts[g.strip()] = counts[g.strip()] + 1

In [None]:
s= pd.DataFrame.from_dict(counts,orient='index',columns=['count'])
s =s['count'].sort_values(ascending=False)
top20_genres = s[:20]
print(top20_genres)

print('-'*150)


# create a bar chart
plt.figure(figsize=(20,8))
top20_genres.plot(kind='bar',color='teal')
plt.xticks(rotation=90)
plt.title('Top 20 genre Categories', fontsize=15)
plt.xlabel('Genres')
plt.ylabel('Counts')

In [None]:
df['cast']= df['cast'].fillna('Unknown')

In [None]:
# count the number of occurrences for each cast member in the data set

cast_counts = dict()
for i in df.index:
   for g in df.loc[i,'cast'].split(','):
      if g.strip() not in cast_counts:
         cast_counts[g.strip()] = 1
      else:
         cast_counts[g.strip()] = cast_counts[g.strip()] + 1

In [None]:
cast= pd.DataFrame.from_dict(cast_counts,orient='index',columns=['count'])
cast =cast['count'].sort_values(ascending=False)
# skip position 0, which is expected to be the 'Unknown' placeholder filled in for missing cast
top30_actress = cast[1:31]
print(top30_actress)

print('-'*150)


# create a bar chart
plt.figure(figsize=(20,8))
top30_actress.plot(kind='bar',color='olivedrab')
plt.xticks(rotation=90)
plt.title('Top 30 Actor/Actress with most films', fontsize=15)
plt.xlabel('Cast')
plt.ylabel('Counts')

In [None]:
# Checking the number of titles featuring Anupam Kher
df[df['cast'].str.contains('Anupam Kher')].shape

In [None]:
df.head()

In [None]:
# Dividing the dataset into two categories: Movies and TV Shows
# (.copy() avoids SettingWithCopyWarning when the duration column is converted later)
movie_df = df[df['type'] == 'Movie'].copy()
tv_df = df[df['type'] == 'TV Show'].copy()

In [None]:
# MOVIE RATINGS 

plt.figure(figsize=(16,8))
sns.countplot(x="rating", data= movie_df, palette="Set2", order=movie_df['rating'].value_counts().index[0:15])

* The largest number of movies carry the 'TV-MA' rating, which the TV Parental Guidelines assign to programs designed for mature audiences only.
* The second largest is 'TV-14', which stands for content that may be inappropriate for children younger than 14 years of age.
* The third largest is the 'R' rating. An R-rated film has been assessed as having material that may be unsuitable for children under the age of 17.

In [None]:
# TV SHOWS RATINGS
plt.figure(figsize=(12,10))
sns.countplot(x="rating", data=tv_df, palette="Accent", order=tv_df['rating'].value_counts().index[0:15])

* Most TV shows are rated 'TV-MA', meaning the content is intended for mature audiences only.
* The second most common rating is 'TV-14', which stands for content that may be inappropriate for children under 14 years of age.
* 'TV-Y7-FV' is the least common rating among TV shows.

In [None]:
# Release year wise Tv Shows Vs Movies released
plt.figure(figsize=(12,10))
sns.countplot(y="release_year", data= df, palette="magma",hue=df['type'], order= df['release_year'].value_counts().index[0:15])

* Most of the TV shows and movies were released in 2018.
* 2017 is the year with the highest number of movies released.
* 2020 is the year with the highest number of TV shows released, possibly because demand for OTT content rose rapidly during the COVID-19 pandemic.

In [None]:
# Movies Duration
movie_df['duration']=movie_df['duration'].str.replace(' min','')
movie_df['duration']=movie_df['duration'].astype(str).astype(int)
movie_df['duration']

In [None]:
#plotting the distribution of movies duration
plt.figure(figsize=(16,8))
sns.kdeplot(data=movie_df['duration'], fill=True, color='Green')  # fill replaces the deprecated shade argument
plt.title("Movie Duration distribution",fontsize=15)

* A large share of the movies on Netflix have a duration between 75 and 130 minutes.

In [None]:
# average movie duration per release year (computed before it is plotted below)
duration_year = movie_df.groupby('release_year')[['duration']].mean()
duration_year = duration_year.sort_index()
duration_year

In [None]:
# year wise movie duration trend analysis
plt.figure(figsize=(15,6))
sns.lineplot(x=duration_year.index, y=duration_year.duration.values, color= 'slateblue',linewidth=3)
plt.box(on=None)
plt.ylabel('Movie duration in minutes');
plt.xlabel('Release year');
plt.title("Trends of Movie's Duration over the Years", fontsize=15);

In [None]:
yearwise_trend = df.groupby(['added_year','type'])['type'].count().unstack()
yearwise_trend

In [None]:
# Plotting the above results

plt.figure(figsize=(18,8))
plt.title('Number of titles added per year by content type', fontsize=15)
sns.lineplot(data= yearwise_trend)
plt.ylabel('Number of titles added')
plt.xlabel('Year added')
plt.show()

In [None]:
# TV show - Seasons 
# removing textual data from duration column and replacing with nothing
tv_df['duration']=tv_df['duration'].str.replace(' Season','')
tv_df['duration']=tv_df['duration'].str.replace('s','')
tv_df['duration']=tv_df['duration'].astype(str).astype(int) # for converting string season numbers to integer
tv_df['duration']

In [None]:
#Extract the columns from tv_df
columns=['title','duration']
tv_shows = tv_df[columns]

In [None]:
#sort the dataframe by number of seasons
tv_shows = tv_shows.sort_values(by='duration',ascending=False)
#tv_shows = tv_shows.drop('index')
top20 = tv_shows[0:20]
top20

In [None]:
# Plotting the above results 

plt.figure(figsize=(16,6))
#top20.plot(kind='bar',x='title',y='duration', color='purple')
plt.bar(top20.title, top20.duration, color='#003f5c',width=0.4)
plt.xlabel('TV show')
plt.xticks(rotation=90)
plt.ylabel('No of Seasons')
plt.title('Top 20 - TV shows with highest seasons')

In [None]:
# rechecking the results with filters 
df[df['title']== 'Grey\'s Anatomy']

* "Grey's Anatomy" is the TV show with the highest number of seasons.
* "NCIS" and "Supernatural" share second place with 15 seasons each.

In [None]:
tv_df.duration.value_counts()[:5]

In [None]:
# TV SHOWS AND THEIR SEASONS
# use tv_df here: df.duration also contains movie run times such as '90 min'
season_counts = tv_df.duration.value_counts()[:5]
labels = [f"{n} Season" if n == 1 else f"{n} Seasons" for n in season_counts.index]
plt.figure(figsize=(16, 8))
_, _, texts = plt.pie(season_counts, labels=labels, autopct='%1.2f%%', startangle=0,
                      explode=(0.0, 0.05, 0.1, 0.15, 0.2), colors=['#003f5c', '#bc5090', '#ffa600', 'Green', 'Maroon'])
plt.axis('equal')
plt.title('Seasons Available on Netflix', fontsize=15, fontweight='bold');
for text in texts:
    text.set_color('white')

* The chart covers only the five most common season counts, so each percentage is a share of those shows rather than of all TV shows. Single-season shows form by far the largest slice, followed by shows with 2 and 3 seasons.

In [None]:
plt.figure(figsize=(15,8))
sns.set(style="darkgrid")
sns.countplot(x="country", data=movie_df, palette="Accent", order=movie_df['country'].value_counts().index[0:15])
plt.xticks(rotation=90)

In [None]:
plt.figure(figsize=(18,8))
sns.countplot(x="country", data=tv_df, palette="Accent", order=tv_df['country'].value_counts().index[0:15])
plt.xticks(rotation=90)

In [None]:
plt.figure(figsize=(18,10))
sns.barplot(x= df.listed_in.value_counts()[:10].sort_values().index, y=df.listed_in.value_counts()[:10].sort_values().values,palette='GnBu');
plt.title('Most Popular Genre', color='Blue', fontsize=20)
plt.yticks(df.listed_in.value_counts()[:10].sort_values().values);
plt.xlabel('GENRES');
plt.xticks(rotation=90)
plt.ylabel('Number of contents');

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact? 
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot 

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

df['country'] = df['country'].fillna(df['country'].mode()[0])
df['date_added'] = df['date_added'].fillna(df['date_added'].mode()[0])
df['rating'] = df['rating'].fillna(df['rating'].mode()[0])  # use the mode of rating itself, not of country

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing 
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words containing digits

In [None]:
# Remove URLs & Remove words containing digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why? 

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with an updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***