<a href="https://colab.research.google.com/github/ankijoshi2011/Credit-card-Default-Predictiontion/blob/main/Credit_Card_Default_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Credit Card Default Prediction



##### **Project Type**    - Classification
##### **Contribution**    - Individual
##### **Name -** Akanksha Joshi

# **Project Summary -**

Credit card default prediction is like foreseeing whether someone might have trouble paying their credit card bills on time. This is important because when people can't pay, banks lose money. By using clever computer models, banks can guess who might struggle to pay and help them before things get worse. This helps banks save money, use their resources better, and keep their customers happy. It's like giving a heads-up to both the bank and the customers to prevent money troubles and make things work smoothly for everyone.

The "Credit Card Default Prediction" project aims to develop a predictive model for identifying potential credit card defaulters using historical transaction data and customer information. By accurately assessing the likelihood of default, the project aims to reduce financial losses, improve resource allocation, and enhance customer relationships through proactive measures. The project involves data collection, preprocessing, feature engineering, model selection, training, and validation. The chosen model will be integrated into operational systems to enable early risk identification and tailored interventions. Successful implementation is expected to mitigate risks, lower costs, increase customer satisfaction, and enhance overall operational efficiency within the credit card industry.

# **GitHub Link -**

https://github.com/ankijoshi2011/Credit-card-Default-Predictiontion

# **Problem Statement**


The challenge at hand is to create a reliable system for predicting credit card defaults. This system needs to use historical data and customer details to forecast if someone is likely to have difficulty paying their credit card bills on time. By doing this, we aim to help banks and financial institutions avoid losing money due to unpaid bills. We also want to assist customers by giving them early support to manage their finances better. This project requires developing smart models that can analyze patterns and make predictions.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ML algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross-Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import scipy.stats as stats

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
import sklearn.metrics as metrics
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, precision_score, recall_score, roc_auc_score,f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV
import xgboost


### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset

df=pd.read_excel('/content/drive/MyDrive/data/default of credit card clients.xlsx')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

In [None]:
df.columns

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isna())

### What did you know about your dataset?

The dataset has 30,000 rows and 25 columns. All the columns are of integer data type. There are no null or duplicated values in the data.
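
The checks above (shape, data types, missing values, duplicates) can be gathered into one helper. This is a minimal sketch; `summarize` and the toy frame are hypothetical names and made-up values used only for illustration, not part of the original notebook.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> dict:
    """Collect the basic dataset facts into one dictionary."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "dtypes": sorted({str(t) for t in frame.dtypes}),
        "missing_values": int(frame.isna().sum().sum()),
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Toy stand-in for the real data (values are made up for illustration)
toy = pd.DataFrame({"LIMIT_BAL": [20000, 120000, 90000], "AGE": [24, 26, 34]})
summary = summarize(toy)
print(summary)
```

On the real data the same call would be `summarize(df)`.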

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

**ID:** An identifier for each individual in the dataset. This is likely not directly related to credit card default prediction but can be useful for tracking and indexing individuals.

**LIMIT_BAL:** The credit limit granted to the individual by the credit card company. A higher credit limit might indicate a greater ability to manage credit, but it could also lead to higher potential debt.

**SEX:** Gender of the individual (1 = male; 2 = female). This could potentially impact their financial behaviors, but gender alone is unlikely to be a strong predictor of credit card default.

**EDUCATION:** Educational background of the individual (1 = graduate school; 2 = university; 3 = high school; 4 = others). Education might influence financial literacy and income, which could impact credit behavior.

**MARRIAGE:** Marital status of the individual (1 = married; 2 = single; 3 = others). Similar to gender, marital status alone might not be a strong predictor, but it could be related to financial stability.

**AGE:** Age of the individual. Younger individuals might have less financial experience and stability, potentially leading to higher default rates.

**PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6:** Repayment status of the credit card for the last six months (-1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; . . .; 8 = payment delay for eight months; 9 = payment delay for nine months and above). These variables can give insight into the payment history of the individual. A history of late payments might suggest an increased risk of default.

**BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6:** Billing amounts for the credit card for the last six months. These can indicate the individual's credit card usage and debt accumulation over time.

**PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6:** Payment amounts made by the individual for the credit card for the last six months. Timely and sufficient payments might indicate responsible credit behavior.

**default_payment_next_month:** The target variable. This binary variable (Yes = 1, No = 0) indicates whether the individual defaulted on their credit card payment in the next month. This is the variable I am trying to predict.

In the context of credit card default prediction, features related to payment history, credit utilization (billing amounts), and payment behavior are likely to be strong predictors. Variables like age, education, and marital status might also play a role. However, building an effective prediction model would involve analyzing the relationships between these variables and the target variable through techniques such as exploratory data analysis, feature engineering, and machine learning algorithms.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
df['SEX'].value_counts()

In [None]:
df['EDUCATION'].value_counts()

In [None]:
df['MARRIAGE'].value_counts()

Here we can see that the EDUCATION and MARRIAGE columns contain more categories than the documented ones. That needs to be fixed.

In [None]:
# Fixing 'EDUCATION' column by replacing the 0,5,6 category with 4
edu = (df.EDUCATION == 5) | (df.EDUCATION == 6) | (df.EDUCATION == 0)
df.loc[edu, 'EDUCATION'] = 4

# Fixing 'MARRIAGE' column by replacing the 0 or unknown category with 3
df.loc[df.MARRIAGE == 0, 'MARRIAGE'] = 3

In [None]:
pay_status_columns = ['PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6']

# Collapse the "no delay" codes -1 and -2 into 0 in the pay status columns
df[pay_status_columns] = df[pay_status_columns].replace([-1, -2], 0)

In [None]:
df = df.drop(columns=['ID'])

### What all manipulations have you done and insights you found?

The ID column was dropped because it carries no information for the analysis. The data has three categorical variables: SEX, EDUCATION and MARRIAGE. SEX takes only the values 1 and 2, which matches the documentation. EDUCATION and MARRIAGE contained extra, undocumented codes (0, 5 and 6 for EDUCATION; 0 for MARRIAGE), so I merged them into the existing "others" category (4 for EDUCATION, 3 for MARRIAGE). In the pay status columns, the values -2, -1 and 0 all indicate that there was no payment delay, so I collapsed them into the single value 0 to make the columns easier to read.
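
The recoding can be tried out on a small frame before touching the real data. This is a sketch on made-up values; it mirrors the wrangling cells above, where the "no delay" codes end up as 0.

```python
import pandas as pd

# Toy rows covering every code that needs recoding (values are made up)
toy = pd.DataFrame({
    "EDUCATION": [0, 1, 2, 3, 4, 5, 6],
    "MARRIAGE": [0, 1, 2, 3, 1, 2, 3],
    "PAY_0": [-2, -1, 0, 1, 2, -1, 3],
})

# Fold the undocumented EDUCATION codes (0, 5, 6) into 4 = others
toy["EDUCATION"] = toy["EDUCATION"].replace({0: 4, 5: 4, 6: 4})
# Fold the undocumented MARRIAGE code 0 into 3 = others
toy["MARRIAGE"] = toy["MARRIAGE"].replace({0: 3})
# Collapse the "no delay" codes -2 and -1 into 0
toy["PAY_0"] = toy["PAY_0"].replace([-1, -2], 0)

print(toy)
```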

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

In [None]:
df['default_payment_next_month'].value_counts()

In [None]:
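# Pie chart of the class balance; value_counts() is indexed by the 0/1 target values
class_counts = df['default_payment_next_month'].value_counts()
plt.pie(class_counts, labels=class_counts.index.map({0: 'No default', 1: 'Default'}), autopct='%1.2f%%')
plt.show()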

##### 1. Why did you pick the specific chart?


A pie chart visually represents the proportional distribution of data categories, making it easy to grasp the relative sizes or percentages of each category at a glance, aiding in quick comparisons and identifying dominant categories in a dataset.

##### 2. What is/are the insight(s) found from the chart?

22.12% of the 30,000 customers in the dataset are defaulters.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Strategies need to be put in place to reduce the number of defaulters.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Filter the DataFrame for rows where 'default_payment_next_month' is 1
defaulted_df = df[df['default_payment_next_month'] == 1]

# Count the number of males and females in the filtered DataFrame
male_count = defaulted_df[defaulted_df['SEX'] == 1]['SEX'].count()
female_count = defaulted_df[defaulted_df['SEX'] == 2]['SEX'].count()

# Create a bar plot
labels = ['Male', 'Female']
counts = [male_count, female_count]

plt.bar(labels, counts, color=['blue', 'pink'])
plt.xlabel('Gender')
plt.ylabel('Count')
plt.title('Number of Defaults by Gender')
plt.show()
# The bar plot shows 'Male' and 'Female' on the x-axis and the number of defaults for each gender on the y-axis.



##### 1. Why did you pick the specific chart?

A bar chart is effective for comparing and displaying discrete data categories or values, allowing for clear visualization of differences in magnitude, making it useful for showing trends, comparisons, and ranking of data in a straightforward and easily interpretable manner.

##### 2. What is/are the insight(s) found from the chart?

The number of defaulters is almost equal for both genders, so gender alone does not tell defaulters apart.
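
Raw counts depend on how many customers are in each group, so a default rate per group is a useful complement to the bar chart. A minimal sketch on made-up data (the toy values are assumptions for illustration, not results from the dataset):

```python
import pandas as pd

# Toy data: 4 customers with SEX = 1 and 6 customers with SEX = 2
toy = pd.DataFrame({
    "SEX": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "default_payment_next_month": [1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

grouped = toy.groupby("SEX")["default_payment_next_month"]
counts = grouped.sum()   # number of defaulters per group
rates = grouped.mean()   # share of defaulters per group

print(pd.DataFrame({"defaulters": counts, "default_rate": rates}))
```

Here the second group has twice as many defaulters, but its default rate is only moderately higher because the group itself is larger.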

#### Chart - 3

In [None]:
# Chart - 3 visualization code

education_counts = defaulted_df['EDUCATION'].value_counts().sort_index()

# Labels for the education categories, in the order of the sorted codes 1-4
education_labels = ['Graduate School', 'University', 'High School', 'Others']

# Create a bar plot
plt.bar(education_labels, education_counts)
plt.xlabel('Education Category')
plt.ylabel('Count')
plt.title('Number of Defaults by Education Category')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()

##### 2. What is/are the insight(s) found from the chart?

Most defaulters come from the University and Graduate School education categories.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
marriage_counts = defaulted_df['MARRIAGE'].value_counts().sort_index()

# Labels for the marriage categories, in the order of the sorted codes 1-3
marriage_labels = ['Married', 'Single', 'Others']

# Create a bar plot
plt.bar(marriage_labels, marriage_counts)
plt.xlabel('Marriage Category')
plt.ylabel('Count')
plt.title('Number of Defaults by Marriage Category')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()

##### 2. What is/are the insight(s) found from the chart?

The Married and Single categories contribute about equally to the number of defaulters.

#### Chart - 5

In [None]:
# Chart - 5 visualization code

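# Define the age groups as (inclusive lower bound, exclusive upper bound);
# the 20-30 and 70-80 bins are included so that no customer is left out of the chart
age_groups = {
    '20-30': (20, 30),
    '30-40': (30, 40),
    '40-50': (40, 50),
    '50-60': (50, 60),
    '60-70': (60, 70),
    '70-80': (70, 80)
}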

# Create an empty dictionary to store the counts for each age group
default_counts = {}

# Loop through the age groups and count the defaulters
for group, (min_age, max_age) in age_groups.items():
    filtered_df = df[(df['AGE'] >= min_age) & (df['AGE'] < max_age) & (df['default_payment_next_month'] == 1)]
    default_counts[group] = len(filtered_df)


# Create a bar plot
plt.bar(default_counts.keys(), default_counts.values())
plt.xlabel('Age Group')
plt.ylabel('Number of Defaulters')
plt.title('Number of Defaulters in Different Age Groups')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()


#### Chart - 6

In [None]:
df['limit']=df['LIMIT_BAL']/10000

In [None]:
# Chart - 6 visualization code

for col in ['limit','AGE']:
    fig = plt.figure(figsize=(9, 6))
    ax = fig.gca()
    feature = df[col]
    feature.hist(bins=50, ax = ax)
    ax.axvline(feature.mean(), color='Red', linestyle='dashed', linewidth=2)
    ax.axvline(feature.median(), color='magenta', linestyle='dashed', linewidth=2)
    ax.set_title(col)
plt.show()

##### 1. Why did you pick the specific chart?

A histogram shows how the values of a numerical variable are distributed, so it gives a quick overview of each variable and summarizes the measured data.

##### 2. What is/are the insight(s) found from the chart?

The red line marks the mean and the magenta line marks the median. The average credit limit is about 15-16 on the `limit` axis, which corresponds to roughly 150,000-160,000 in the original LIMIT_BAL units because `limit` is LIMIT_BAL divided by 10,000. The average age in the dataset is around 36-37 years.



#### Chart - 07 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize= (20,15))
sns.heatmap(df.corr(), annot=True)


##### 1. Why did you pick the specific chart?

To check for multicollinearity.

##### 2. What is/are the insight(s) found from the chart?

From the correlation plot above, the bill amount columns (BILL_AMT1 to BILL_AMT6, covering April to September) are highly correlated with each other, which makes sense because they all record bill amounts.


Apart from that there are no highly correlated inputs in our dataset, so there is no multicollinearity problem.
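
Reading a large annotated heatmap by eye is error-prone, so the strongly correlated pairs can also be listed programmatically. This is a sketch on synthetic data; `correlated_pairs` and the 0.8 threshold are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def correlated_pairs(frame: pd.DataFrame, threshold: float = 0.8) -> pd.Series:
    """Return column pairs whose absolute correlation exceeds the threshold."""
    corr = frame.corr().abs()
    # Keep only the upper triangle so each pair appears once and the diagonal is skipped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack()
    # NaN entries (masked cells) never pass the comparison, so they are filtered out here
    return pairs[pairs > threshold].sort_values(ascending=False)


# Synthetic stand-in: two nearly identical "bill" columns and one unrelated column
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "BILL_AMT1": base,
    "BILL_AMT2": base + rng.normal(scale=0.1, size=200),
    "AGE": rng.normal(size=200),
})

high = correlated_pairs(toy)
print(high)
```

On the real data the call would be `correlated_pairs(df)`.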

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Handling Missing Values & Missing Value Imputation
df.isna().sum()

As checked there are no missing values

### 2. Handling Outliers

In [None]:
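# Handling Outliers & Outlier treatments
# Draw one boxplot per column; the grid is sized from the column count so no panel is overwritten
n_cols = 3
n_rows = int(np.ceil(len(df.columns) / n_cols))
fig = plt.figure(figsize=(16,32))
for c, col in enumerate(df.columns, start=1):
    plt.subplot(n_rows, n_cols, c)
    sns.boxplot(x=col, data=df)
    plt.xlabel('Distribution of {}'.format(col))
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=5.0)
plt.show()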

##### What all outlier treatment techniques have you used and why did you use those techniques?

Outliers were detected in the columns LIMIT_BAL, EDUCATION (category "others") and AGE (60+).
No outlier treatment was applied: these rows were kept because they play an important role in model accuracy.
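
Boxplots flag points beyond 1.5 times the interquartile range (IQR) from the quartiles, and the same rule can be applied in code to count flagged values per column. A minimal sketch on made-up ages; `iqr_outlier_count` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values lying more than k * IQR outside the first and third quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(((series < lower) | (series > upper)).sum())


# Toy ages (made up): only the value 75 lies far from the rest
ages = pd.Series([25, 28, 30, 32, 35, 38, 40, 75])
n_outliers = iqr_outlier_count(ages)
print(n_outliers)
```

On the real data, `df.apply(iqr_outlier_count)` would give one count per column.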

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns
dum=pd.get_dummies(df['SEX'])
df=pd.concat([df,dum],axis=1)
df = df.rename(columns={1: 'MALE', 2: 'FEMALE', 'default_payment_next_month' : 'default'})


In [None]:
dum=pd.get_dummies(df['MARRIAGE'])
df=pd.concat([df,dum],axis=1)
df=df.rename(columns={1: 'Married', 2: 'Single', 3:'others'})



In [None]:
dum=pd.get_dummies(df['EDUCATION'])
df=pd.concat([df,dum],axis=1)
df=df.rename(columns={1: 'graduate school', 2: 'university', 3: 'high school'})

df.drop(columns=['others', 4, 'SEX', 'EDUCATION', 'MARRIAGE', 'limit'], inplace=True)



In [None]:
df.columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

I have used dummy (one-hot) encoding for the categorical columns SEX, EDUCATION and MARRIAGE.
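
As an alternative to encoding one column at a time and renaming the integer column labels afterwards, `pd.get_dummies` can encode several columns in one call and prefix the new column names. A sketch on a toy frame (the values are made up; this is not the encoding cell used above):

```python
import pandas as pd

toy = pd.DataFrame({"SEX": [1, 2, 2], "MARRIAGE": [1, 2, 3]})

# One indicator column per category, named <prefix>_<code>
encoded = pd.get_dummies(
    toy,
    columns=["SEX", "MARRIAGE"],
    prefix=["SEX", "MARRIAGE"],
    dtype=int,
)

print(encoded)
```

Passing `drop_first=True` would drop one indicator per variable, which avoids perfectly redundant columns for linear models.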

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data
X = df.drop(labels='default', axis=1)
Y = df['default']

In [None]:
# print the shape of X and Y
print(f"The Number of Rows and Columns in X is {X.shape} respectively.")
print(f"The Number of Rows and Columns in Y is {Y.shape} respectively.")

### 6. Data Scaling

In [None]:
# Scaling your data

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

##### Which method have you used to scale you data and why?


The StandardScaler is used in machine learning to standardize numerical features by centering them around zero and scaling them to have a standard deviation of 1. This preprocessing step ensures that all features contribute equally to model training, improving the performance of algorithms sensitive to feature scales. It aids convergence in algorithms like support vector machines and enhances visualization. StandardScaler is particularly helpful when dealing with features of different units or scales, promoting consistent and stable model behavior. However, not all algorithms require standardization, and its usage depends on the specific problem and data characteristics.
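
Scaling only has an effect if the scaled array is the one passed on to the model, and fitting the scaler on the training rows alone keeps test-set statistics out of training. The following is a minimal sketch of that split-then-scale order on synthetic data; the `_toy` names and the generated values are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (a "limit"-like and an "age"-like column)
rng = np.random.default_rng(42)
X_toy = rng.normal(loc=[50_000, 35], scale=[20_000, 9], size=(500, 2))
y_toy = rng.integers(0, 2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42)

toy_scaler = StandardScaler()
X_tr_scaled = toy_scaler.fit_transform(X_tr)  # learn mean and std from the training rows only
X_te_scaled = toy_scaler.transform(X_te)      # reuse those statistics on the test rows

print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
```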

### 8. Data Splitting

In [None]:
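# Split your data to train and test. Choose Splitting ratio wisely.
# Pass the scaled features so that scale-sensitive models such as KNN benefit from the scaling step
X_train, X_test, Y_train, Y_test = train_test_split( X_scaled , Y , test_size = 0.2, random_state = 42)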

##### What data splitting ratio have you used and why?

An 80:20 train-test split strikes a balance between having sufficient training data for model learning and a substantial test set for robust evaluation, facilitating both model development and performance assessment. However, the choice of ratio should consider specific dataset characteristics and project requirements.
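
With an imbalanced target, a plain random split can give the train and test sets slightly different class ratios. Passing `stratify` to `train_test_split` keeps the ratio the same in both parts. A sketch on a made-up 80/20 class balance (not the dataset's actual balance):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy target: 80 non-defaulters and 20 defaulters (made-up balance)
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(y_tr.mean(), y_te.mean())
```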

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# ML Model - 1 Implementation

knn = KNeighborsClassifier(n_neighbors=6, weights='distance')

# Fit the Algorithm
knn.fit(X_train, Y_train)

# Predict on the model
knn_y_pred_train = knn.predict(X_train)
knn_tree_y_pred_test = knn.predict(X_test)

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart
print('Untuned K Nearest Neighbors Model')
print("Training F1 Score: ", metrics.f1_score(Y_train, knn_y_pred_train))
print("Testing F1 Score: ", metrics.f1_score(Y_test, knn_tree_y_pred_test))


#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)
k_near = KNeighborsClassifier(weights='distance')
# Candidate values for the number of neighbors (4 to 14)
k_param_dict={'n_neighbors': range(4,15,1)}
# Fit the Algorithm: 5-fold cross-validated grid search scored on F1
grid_k_neighbors = GridSearchCV(k_near, k_param_dict, cv=5, scoring='f1', verbose=1)
grid_k_neighbors.fit(X_train, Y_train)
# Report the best cross-validated score and the selected hyperparameters
print('F1 Score:', grid_k_neighbors.best_score_)
print('Best Hyperparameters:', grid_k_neighbors.best_params_)
print('Model object with best parameters: ')
print(grid_k_neighbors.best_estimator_)

In [None]:
#Predicting the response on both train and test data set respectively.
tuned_k_neighbors_y_pred_train = grid_k_neighbors.best_estimator_.predict(X_train)
tuned_k_neighbors_y_pred_test = grid_k_neighbors.best_estimator_.predict(X_test)

print('Tuned K Nearest Neighbors Predictions')
print("F1 on train set:",metrics.f1_score(Y_train, tuned_k_neighbors_y_pred_train))
print("F1 on test set:",metrics.f1_score(Y_test, tuned_k_neighbors_y_pred_test))

##### Which hyperparameter optimization technique have you used and why?

The candidate hyperparameter values are predefined and the search space is small, so grid search was used to evaluate every candidate value and tune the hyperparameter.
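
Grid search can also be run over a pipeline, so that scaling is refitted inside every cross-validation fold. This is a minimal sketch on synthetic data; the step names `scale` and `knn`, the candidate values and the generated data are assumptions for illustration, not the notebook's setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary classification data
X_toy, y_toy = make_classification(
    n_samples=300, n_features=6, weights=[0.78, 0.22], random_state=0
)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"knn__n_neighbors": [3, 5, 7, 9]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_toy, y_toy)

print(search.best_params_, round(search.best_score_, 3))
```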


##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Tuning gives only a very slight improvement, which does not help. An F1 score of 0.27 on the test data against 0.99 on the train data shows a large gap between training and testing performance. The model fits the training data almost perfectly but fails to generalize to unseen data, which is a sign of overfitting. Part of the near-perfect train score also comes from `weights='distance'`: each training point is its own nearest neighbor at distance zero, so it is predicted with its own label, and the train score therefore says little about real performance.
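
One thing worth checking when reading a KNN train score: with `weights='distance'`, a training point is at distance zero from itself and is therefore predicted with its own label. The sketch below compares both weighting schemes on synthetic, deliberately noisy data (the data and settings are assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with 20% randomly assigned labels, so a perfect fit is not expected
X_toy, y_toy = make_classification(
    n_samples=400, n_features=8, flip_y=0.2, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=1)

scores = {}
for weights in ("distance", "uniform"):
    model = KNeighborsClassifier(n_neighbors=6, weights=weights).fit(X_tr, y_tr)
    scores[weights] = (
        f1_score(y_tr, model.predict(X_tr)),  # train F1
        f1_score(y_te, model.predict(X_te)),  # test F1
    )

for weights, (train_f1, test_f1) in scores.items():
    print(f"{weights}: train F1 = {train_f1:.2f}, test F1 = {test_f1:.2f}")
```

The train F1 for `weights='distance'` is perfect by construction, so the cross-validated or test score is the one to compare between models.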

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

logreg = LogisticRegression(C=1e9, class_weight='balanced')
logreg.fit(X_train, Y_train)
logreg_y_pred_train = logreg.predict(X_train)
logreg_y_pred_test = logreg.predict(X_test)


In [None]:
print('Untuned Logistic Regression Model')
print("Training F1 Score: ", metrics.f1_score(Y_train, logreg_y_pred_train))
print("Testing F1 Score: ", metrics.f1_score(Y_test, logreg_y_pred_test))

In [None]:
y_pred_prob = logreg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(Y_test, y_pred_prob)
plt.plot(fpr, tpr)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.title('ROC curve for Logisitic Regression')
plt.xlabel('False Positive Rate (1 - Specificity)')
plt.ylabel('True Positive Rate (Sensitivity)')
plt.grid(True)

In [None]:
print(metrics.roc_auc_score(Y_test, y_pred_prob))

In [None]:
print('accuracy:', metrics.accuracy_score(Y_test, logreg_y_pred_test))
print('precision:', metrics.precision_score(Y_test, logreg_y_pred_test))
print('recall:', metrics.recall_score(Y_test, logreg_y_pred_test))
print('f1 score:', metrics.f1_score(Y_test, logreg_y_pred_test))

In [None]:
# confusion matrix
labels = ['Not default', 'default']
cm = confusion_matrix(Y_train, logreg_y_pred_train)
print(cm)

ax= plt.subplot()
sns.heatmap(cm, annot=True, fmt='d', ax = ax) # annot=True writes the count in each cell; fmt='d' prints it as an integer

# labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix for Train data')
ax.xaxis.set_ticklabels(labels)
ax.yaxis.set_ticklabels(labels)

In [None]:
labels = ['Not default', 'default']
cm = confusion_matrix(Y_test, logreg_y_pred_test)
print(cm)

ax= plt.subplot()
sns.heatmap(cm, annot=True, fmt='d', ax = ax) # annot=True writes the count in each cell; fmt='d' prints it as an integer

# labels, title and ticks
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix for Test data')
ax.xaxis.set_ticklabels(labels)
ax.yaxis.set_ticklabels(labels)
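
Precision, recall and F1 can all be derived from the four cells of the confusion matrix, which makes the link between the matrix and the printed scores explicit. A small worked sketch on made-up labels (not model output):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Made-up labels: 1 = default, 0 = not default
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# For binary labels, ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted defaulters who really default
recall = tp / (tp + fn)     # share of real defaulters that the model catches
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

For a lender, low recall means missed defaulters, while low precision means customers flagged as risky who would have paid.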

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
import warnings
warnings.filterwarnings("ignore")

In [None]:
# ML Model - 2 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)
# solver='liblinear' supports both the 'l1' and 'l2' penalties searched below
tuned_log = LogisticRegression(random_state=42, class_weight='balanced', solver='liblinear')
# Fit the Algorithm
log_param_dict={'C': range(3,7,1),
                'penalty': ['l1','l2']}
grid_log = GridSearchCV(tuned_log, log_param_dict, cv=5, scoring='f1', verbose=1)
grid_log.fit(X_train, Y_train)

In [None]:
print('F1 Score:', grid_log.best_score_)
print('Best Hyperparameters:', grid_log.best_params_)
print('Model object with best parameters: ')
print(grid_log.best_estimator_)

In [None]:
grid_log_y_pred_train = grid_log.best_estimator_.predict(X_train)
grid_log_y_pred_test = grid_log.best_estimator_.predict(X_test)

print('Tuned Logistic Regression Model Predictions')
print("F1 on train set:",metrics.f1_score(Y_train, grid_log_y_pred_train))
print("F1 on test set:",metrics.f1_score(Y_test, grid_log_y_pred_test))

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# ML Model - 3 Implementation

# Fit the Algorithm

# Predict on the model

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updates Evaluation metric Score Chart.

Answer Here.

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***