<a href="https://colab.research.google.com/github/Nikkkhhill97/Tata_Steel_Machine_Failure_Prediction/blob/main/Tata_Steel_Machine_Failure_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



# Tata Steel Machine Failure Prediction  
### Capstone Project: Machine Learning & GenAI with Microsoft Azure  
**Domain:** Manufacturing | **Type:** Classification | **Tools:** Pandas, Scikit-Learn, XGBoost, LightGBM, SHAP  





# **Project Summary -**

This capstone project develops a predictive maintenance system for Tata Steel, aimed at reducing the financial and operational risks of unplanned machine downtime. The training data contains 136,429 records (about 1.36 lakh) and is severely imbalanced: the summary statistics below report a mean of 0.016 for the Machine failure column, so only about 1.6% of observations are actual failures. To tackle this "needle in a haystack" problem, the workflow began with data cleaning, which confirmed zero missing values, followed by an Exploratory Data Analysis (EDA) phase that validated expected mechanical behaviour, such as the strong inverse correlation between Torque and Rotational Speed. The analysis covered five failure modes: Tool Wear Failure (TWF), Heat Dissipation Failure (HDF), Power Failure (PWF), Overstrain Failure (OSF), and Random Failures (RNF). It indicated that mechanical fatigue and thermal inefficiency were the primary drivers of breakdown. To connect raw sensor readings to the underlying physics, three engineered features were created: temp_diff (to monitor cooling efficiency), power_est (the product of Torque and Rotational Speed, reflecting total workload), and torque_per_speed (to flag mechanical strain).
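The three engineered features described above could be derived roughly as follows. This is a minimal sketch, not the notebook's actual implementation: the helper name `add_engineered_features` is hypothetical, the direction of `temp_diff` (process minus air) and the form of `torque_per_speed` (torque divided by speed) are assumptions inferred from the feature names, and the two sample rows are copied from the data-loading output shown later in this notebook.

```python
import pandas as pd

# Two rows copied from the sample output in this notebook (column names match the dataset).
sample = pd.DataFrame({
    "Air temperature [K]": [302.3, 301.7],
    "Process temperature [K]": [311.5, 311.0],
    "Rotational speed [rpm]": [1499, 1713],
    "Torque [Nm]": [38.0, 28.8],
})


def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the three derived columns (hypothetical helper)."""
    out = df.copy()
    # Assumed direction: process temperature minus air temperature.
    out["temp_diff"] = out["Process temperature [K]"] - out["Air temperature [K]"]
    # Product of torque and speed, as stated in the summary. This is a workload
    # proxy in Nm*rpm, not watts (that would need a factor of 2*pi/60).
    out["power_est"] = out["Torque [Nm]"] * out["Rotational speed [rpm]"]
    # Assumed form: torque divided by speed; speed is never zero in this dataset.
    out["torque_per_speed"] = out["Torque [Nm]"] / out["Rotational speed [rpm]"]
    return out


features = add_engineered_features(sample)
print(features[["temp_diff", "power_est", "torque_per_speed"]])
```

Applying the same function to both the training and test frames keeps the derived columns consistent between fitting and prediction.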

The modeling phase moved beyond a baseline Logistic Regression, which served as a performance floor, to evaluate more sophisticated ensemble architectures including Random Forest and XGBoost. Because the dataset was heavily skewed toward "No Failure," the training process utilized SMOTE (Synthetic Minority Over-sampling Technique) and the scale_pos_weight parameter to ensure the model was highly sensitive to rare failure events. The Tuned XGBoost Classifier was identified as the champion model after undergoing hyperparameter optimization via RandomizedSearchCV with 3-fold Stratified Cross-Validation, specifically targeting the F1-macro score to ensure a robust balance between Precision and Recall. To ensure the model was not a "black box," SHAP (SHapley Additive exPlanations) was implemented, revealing that Torque, Tool Wear, and Temperature Difference are the three most critical indicators of an impending failure. From a business perspective, these results allow Tata Steel to transition from a reactive "break-fix" mentality to a proactive strategy, potentially reducing unplanned downtime by 20–30%. By monitoring real-time sensor thresholds identified by this model, maintenance teams can schedule interventions during planned windows, thereby optimizing resource allocation, improving workplace safety, and ensuring a consistent production output that aligns with the rigorous demands of modern steel manufacturing. This end-to-end machine learning pipeline demonstrates the power of integrating domain expertise with advanced gradient boosting to solve complex, real-world industrial challenges.
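The tuning setup described above (a class-imbalance weight, 3-fold stratified cross-validation, and a randomized search scored on F1-macro) can be sketched as below. Several things here are assumptions for illustration: the data is synthetic, the search space is hypothetical because the notebook's actual grid is not shown, and `RandomForestClassifier` with `class_weight="balanced"` is only a stand-in so the sketch needs nothing beyond scikit-learn. The champion model in this project is XGBoost, where the computed ratio would be passed as `scale_pos_weight`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic imbalanced data (about 5% positives) standing in for the sensor features.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=42
)

# Ratio of negatives to positives: a common starting value for XGBoost's scale_pos_weight.
n_pos = int(y.sum())
n_neg = int(len(y) - n_pos)
scale_pos_weight = n_neg / n_pos
print(f"scale_pos_weight ~ {scale_pos_weight:.1f}")

# Stratified folds keep the failure rate roughly constant across the 3 splits.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

# Hypothetical search space for the stand-in model.
param_distributions = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=42),
    param_distributions=param_distributions,
    n_iter=4,              # number of sampled parameter combinations
    scoring="f1_macro",    # averages F1 over both classes equally
    cv=cv,
    random_state=42,
    n_jobs=1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

If SMOTE is used together with cross-validation, it is generally safer to apply it inside each training fold (for example through the pipeline class in imbalanced-learn) so that synthetic samples never leak into the validation fold.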

# **GitHub Link -**

https://github.com/Nikkkhhill97/Tata_Steel_Machine_Failure_Prediction

# **Problem Statement**


The core challenge in Tata Steel's manufacturing units is unplanned machine downtime, which leads to significant production delays, high repair costs, and safety risks. Currently, maintenance is often performed reactively, only after a failure has occurred, or on fixed schedules that do not account for the actual condition of the machine.

The goal of this project is to develop a Predictive Maintenance System using Machine Learning. By analyzing real-time sensor data such as Air Temperature, Process Temperature, Rotational Speed, Torque, and Tool Wear, the model must accurately:

*   **Predict Binary Failure:** Determine whether a machine is likely to fail in the near future (Yes/No).
*   **Identify Failure Modes:** Classify the specific type of failure (e.g., Tool Wear Failure, Heat Dissipation Failure, or Power Failure) to help maintenance teams carry out the right repairs.

Success is defined by building a model that minimizes False Negatives (missed failures) while maintaining high Precision, ensuring that Tata Steel can shift from a reactive to a proactive, data-driven maintenance strategy.
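The success criterion above trades off missed failures (False Negatives) against false alarms. A minimal sketch of how the two quantities relate, using a hypothetical helper `failure_metrics` and made-up confusion-matrix counts rather than results from this project:

```python
def failure_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute recall, precision and F1 for the failure class from raw counts."""
    # Recall: share of real failures that were caught (low FN -> high recall).
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of raised alarms that were real failures (low FP -> high precision).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of the two, zero if both are zero.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical counts: 80 failures caught, 20 missed, 20 false alarms.
metrics = failure_metrics(tp=80, fp=20, fn=20)
print(metrics)
```

In this framing, minimizing False Negatives means pushing recall up, while the precision term keeps maintenance teams from being flooded with false alarms.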

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.

# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [2]:
# 1. CORE LIBRARIES & CONFIGURATION

import pandas as pd
import numpy as np
import warnings

# Ignore warnings for clean output
warnings.filterwarnings('ignore')

# General pandas configuration for better visibility
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# 2. VISUALIZATION LIBRARIES

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# 3. MODELING & PREPROCESSING

from sklearn.model_selection import train_test_split, StratifiedKFold, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    f1_score,
    accuracy_score,
    precision_score,
    recall_score
)

# Advanced Algorithms
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Handling Imbalance
from imblearn.over_sampling import SMOTE

# 4. MODEL EXPLAINABILITY

import shap

print("All libraries imported successfully!")

All libraries imported successfully!


### Dataset Loading

In [3]:
# Load the train and test datasets from Google Drive

# File IDs for Google Drive
train_file_id = "1GO6z8wd3MV6LtIL2dp44phaP27oIYKMV"
test_file_id = "1N2a_AgZ4hCil9-jJbXbgGZYwmM9Amc1V"

# Constructing URLs
train_url = f"https://drive.google.com/uc?id={train_file_id}"
test_url = f"https://drive.google.com/uc?id={test_file_id}"

# Loading datasets
df_train = pd.read_csv(train_url)
df_test = pd.read_csv(test_url)

# Initial Validation
print("Data loaded successfully!")
print(f"Training Set Shape: {df_train.shape}")
print(f"Testing Set Shape:  {df_test.shape}")


Data loaded successfully!
Training Set Shape: (136429, 14)
Testing Set Shape:  (90954, 13)


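|   | id | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | TWF | HDF | PWF | OSF | RNF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 136429 | L50896 | L | 302.3 | 311.5 | 1499 | 38.0 | 60 | 0 | 0 | 0 | 0 | 0 |
| 1 | 136430 | L53866 | L | 301.7 | 311.0 | 1713 | 28.8 | 17 | 0 | 0 | 0 | 0 | 0 |
| 2 | 136431 | L50498 | L | 301.3 | 310.4 | 1525 | 37.7 | 96 | 0 | 0 | 0 | 0 | 0 |
| 3 | 136432 | M21232 | M | 300.1 | 309.6 | 1479 | 47.6 | 5 | 0 | 0 | 0 | 0 | 0 |
| 4 | 136433 | M19751 | M | 303.4 | 312.3 | 1515 | 41.3 | 114 | 0 | 0 | 0 | 0 | 0 |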


### Dataset First View

In [4]:
# Dataset First Look

# 3.1 Inspecting Dataset Structure & Data Types
print("--- TRAIN DATA INFO ---")
df_train.info()

print("\n--- TEST DATA INFO ---")
df_test.info()

# 3.2 Checking for Missing Values
print("\n--- Missing Values in Train ---")
print(df_train.isnull().sum())

print("\n--- Missing Values in Test ---")
print(df_test.isnull().sum())

# 3.3 Summary Statistics
print("\n--- Summary Statistics (Train) ---")
display(df_train.describe().T)

# 3.4 Random Sample Peeking
print("\n--- Sample Records ---")
display(df_train.sample(5))

--- TRAIN DATA INFO ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136429 entries, 0 to 136428
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   id                       136429 non-null  int64  
 1   Product ID               136429 non-null  object 
 2   Type                     136429 non-null  object 
 3   Air temperature [K]      136429 non-null  float64
 4   Process temperature [K]  136429 non-null  float64
 5   Rotational speed [rpm]   136429 non-null  int64  
 6   Torque [Nm]              136429 non-null  float64
 7   Tool wear [min]          136429 non-null  int64  
 8   Machine failure          136429 non-null  int64  
 9   TWF                      136429 non-null  int64  
 10  HDF                      136429 non-null  int64  
 11  PWF                      136429 non-null  int64  
 12  OSF                      136429 non-null  int64  
 13  RNF                      136429 non

|   | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| id | 136429.0 | 68214.0 | 39383.804 | 0.0 | 34107.0 | 68214.0 | 102321.0 | 136428.0 |
| Air temperature [K] | 136429.0 | 299.863 | 1.862 | 295.3 | 298.3 | 300.0 | 301.2 | 304.4 |
| Process temperature [K] | 136429.0 | 309.941 | 1.385 | 305.8 | 308.7 | 310.0 | 310.9 | 313.8 |
| Rotational speed [rpm] | 136429.0 | 1520.331 | 138.737 | 1181.0 | 1432.0 | 1493.0 | 1580.0 | 2886.0 |
| Torque [Nm] | 136429.0 | 40.349 | 8.502 | 3.8 | 34.6 | 40.4 | 46.1 | 76.6 |
| Tool wear [min] | 136429.0 | 104.409 | 63.965 | 0.0 | 48.0 | 106.0 | 159.0 | 253.0 |
| Machine failure | 136429.0 | 0.016 | 0.124 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| TWF | 136429.0 | 0.002 | 0.039 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| HDF | 136429.0 | 0.005 | 0.072 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| PWF | 136429.0 | 0.002 | 0.049 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |



--- Sample Records ---


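|   | id | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Machine failure | TWF | HDF | PWF | OSF | RNF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 55415 | 55415 | M24142 | M | 297.5 | 308.1 | 1610 | 32.4 | 115 | 0 | 0 | 0 | 0 | 0 | 0 |
| 52952 | 52952 | M22606 | M | 300.4 | 311.7 | 1508 | 36.3 | 164 | 0 | 0 | 0 | 0 | 0 | 0 |
| 79153 | 79153 | M18813 | M | 302.3 | 311.4 | 1556 | 39.5 | 22 | 0 | 0 | 0 | 0 | 0 | 0 |
| 101572 | 101572 | L53138 | L | 300.6 | 310.8 | 1533 | 38.5 | 137 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10965 | 10965 | L54255 | L | 300.7 | 310.5 | 1663 | 35.2 | 189 | 0 | 0 | 0 | 0 | 0 | 0 |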


### Dataset Rows & Columns count

In [5]:
# Dataset Rows & Columns count

# Calculating counts for Train Dataset
train_rows, train_cols = df_train.shape
print(f"Number of rows in Training set:    {train_rows:,}")
print(f"Number of columns in Training set: {train_cols}")

print("-" * 40)

# Calculating counts for Test Dataset
test_rows, test_cols = df_test.shape
print(f"Number of rows in Test set:        {test_rows:,}")
print(f"Number of columns in Test set:     {test_cols}")

# Quick summary of the data split
print(f"\nTotal Records: {train_rows + test_rows:,}")

Number of rows in Training set:    136,429
Number of columns in Training set: 14
----------------------------------------
Number of rows in Test set:        90,954
Number of columns in Test set:     13

Total Records: 227,383
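The 14-versus-13 column counts above suggest the test set lacks exactly one column. A minimal sketch of how to confirm which one, using empty stand-in frames (`df_train_demo` and `df_test_demo` are illustrative names) built from the column names visible in this notebook's outputs. On the real data the same call would be made on `df_train` and `df_test`.

```python
import pandas as pd

# Column names as they appear in the train info output of this notebook.
train_columns = [
    "id", "Product ID", "Type", "Air temperature [K]", "Process temperature [K]",
    "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]", "Machine failure",
    "TWF", "HDF", "PWF", "OSF", "RNF",
]
# The test preview shows the same columns without the target.
test_columns = [c for c in train_columns if c != "Machine failure"]

# Empty stand-in frames: only the column labels matter for this check.
df_train_demo = pd.DataFrame(columns=train_columns)
df_test_demo = pd.DataFrame(columns=test_columns)

# Columns present in train but absent from test.
missing_in_test = df_train_demo.columns.difference(df_test_demo.columns).tolist()
print(missing_in_test)  # ['Machine failure']
```

The missing column is the binary target, which is why predictions on the test set cannot be scored locally without a held-out split of the training data.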


### Dataset Information

In [7]:
# Dataset Info

# Detailed information for Training Data
print("Detailed Information: Training Dataset")
print("="*45)
df_train.info()

print("\n" + "="*45 + "\n")

# Detailed information for Test Data
print("Detailed Information: Test Dataset")
print("="*45)
df_test.info()

Detailed Information: Training Dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136429 entries, 0 to 136428
Data columns (total 14 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   id                       136429 non-null  int64  
 1   Product ID               136429 non-null  object 
 2   Type                     136429 non-null  object 
 3   Air temperature [K]      136429 non-null  float64
 4   Process temperature [K]  136429 non-null  float64
 5   Rotational speed [rpm]   136429 non-null  int64  
 6   Torque [Nm]              136429 non-null  float64
 7   Tool wear [min]          136429 non-null  int64  
 8   Machine failure          136429 non-null  int64  
 9   TWF                      136429 non-null  int64  
 10  HDF                      136429 non-null  int64  
 11  PWF                      136429 non-null  int64  
 12  OSF                      136429 non-null  int64  
 13  RNF                 

#### Duplicate Values

In [8]:
# Dataset Duplicate Value Count

# Checking for duplicates in the training set
train_duplicates = df_train.duplicated().sum()
print(f"Number of duplicate rows in Training set: {train_duplicates}")

# Checking for duplicates in the test set
test_duplicates = df_test.duplicated().sum()
print(f"Number of duplicate rows in Test set:     {test_duplicates}")

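# Verification logic: report on both datasets, not just the training set
if train_duplicates == 0 and test_duplicates == 0:
    print("\nNo duplicate records found. Data integrity is maintained.")
else:
    print(f"\nAction Required: {train_duplicates} duplicate(s) in Training set, "
          f"{test_duplicates} in Test set.")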

Number of duplicate rows in Training set: 0
Number of duplicate rows in Test set:     0

No duplicate records found. Data integrity is maintained.
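One caveat about the check above: a full-row comparison includes the `id` column, and if `id` is a unique row identifier (the summary statistics show it running from 0 to 136428 across 136,429 rows, which is consistent with that assumption), no two rows can ever match. A minimal sketch on a toy frame, with made-up values, of how repeated sensor readings could be counted by excluding `id` first:

```python
import pandas as pd

# Toy frame: rows 0 and 1 have identical sensor readings but different ids.
demo = pd.DataFrame({
    "id": [0, 1, 2],
    "Torque [Nm]": [38.0, 38.0, 28.8],
    "Tool wear [min]": [60, 60, 17],
})

# Full-row check: the unique id makes every row distinct.
full_row_dupes = int(demo.duplicated().sum())

# Same check with the identifier dropped: the repeated reading is now visible.
sensor_dupes = int(demo.drop(columns=["id"]).duplicated().sum())

print(full_row_dupes, sensor_dupes)  # 0 1
```

Whether such repeats should be dropped is a separate judgment call, since identical readings from different machine runs can be legitimate observations.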


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

In [None]:
# Visualizing the missing values

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

In [None]:
# Dataset Describe

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with Charts: Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Answer Here.

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation

#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.

### 2. Handling Outliers

In [None]:
# Handling Outliers & Outlier treatments

##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing Words Containing Digits

In [None]:
# Remove URLs & remove words containing digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalanced dataset and why? (If it needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

## ***8.*** ***Future Work (Optional)***

### 1. Save the best performing ml model in a pickle file or joblib file format for deployment process.


In [None]:
# Save the File

### 2. Again Load the saved model file and try to predict unseen data for a sanity check.


In [None]:
# Load the File and predict unseen data.

### ***Congrats! Your model is successfully created and ready for deployment on a live server for a real user interaction !!!***

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***