**Practical 7**

 **Aim:** To develop and evaluate a Multilayer Perceptron (MLP) classifier for predicting loan default risk in banking and financial markets by
 preprocessing banking data, training the model, and analyzing its performance metrics.

In [None]:
from google.colab import drive

In [None]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
data=pd.read_csv('/content/drive/MyDrive/BANK LOAN.csv')

In [None]:
data

      SN  AGE  EMPLOY  ADDRESS  DEBTINC  CREDDEBT  OTHDEBT  DEFAULTER
0      1    3      17       12      9.3     11.36     5.01          1
1      2    1      10        6     17.3      1.36     4.00          0
2      3    2      15       14      5.5      0.86     2.17          0
3      4    3      15       14      2.9      2.66     0.82          0
4      5    1       2        0     17.3      1.79     3.06          1
...  ...  ...     ...      ...      ...       ...      ...        ...
695  696    2       6       15      4.6      0.26     0.98          1
696  697    1       6        4     11.5      0.37     2.05          0
697  698    2      15        3      7.6      0.49     1.94          0
698  699    3      19       22      8.4      2.30     4.17          0


In [None]:
dummies=pd.get_dummies(data['AGE'],prefix='AGE',drop_first=True)

In [None]:
data=pd.concat([data,dummies],axis=1)
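To see what the two cells above produce, here is a small sketch on a hypothetical toy `AGE` column using the same coded age groups (1, 2, 3) as the dataset. With `drop_first=True`, the first category (`AGE_1`) is dropped and serves as the reference level, leaving only `AGE_2` and `AGE_3`; a row with `AGE == 1` is then encoded as all zeros. `dtype=int` is passed only so the result prints as 0/1 rather than booleans (recent pandas versions default to `bool`).

```python
import pandas as pd

# Hypothetical toy frame with the same coded AGE groups (1, 2, 3) as the dataset
toy = pd.DataFrame({"AGE": [3, 1, 2, 3, 1]})

# drop_first=True drops AGE_1, so AGE == 1 becomes the all-zeros reference row
dummies = pd.get_dummies(toy["AGE"], prefix="AGE", drop_first=True, dtype=int)
toy = pd.concat([toy, dummies], axis=1)
print(toy)
```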

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()


In [None]:
# Select the columns to scale (excluding 'AGE', 'SN', and 'DEFAULTER')
columns_to_scale = data.drop(['AGE', 'SN', 'DEFAULTER'], axis=1).columns

# Apply MinMaxScaler to the selected columns
data[columns_to_scale] = scaler.fit_transform(data[columns_to_scale])

# Display the first few rows of the updated DataFrame to see the scaled values
display(data.head())
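`MinMaxScaler` rescales each column with x' = (x - min) / (max - min), using the min and max it saw during `fit`. In the cell above the scaler is fit on the full dataset before the train/test split, so test-set statistics leak into preprocessing. The sketch below, on hypothetical toy numbers, checks the formula by hand and shows the leakage-free alternative of fitting on the training rows only. Test values outside the training range then map outside [0, 1], which is expected behaviour rather than an error.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data (e.g. years employed), already split
train = np.array([[0.0], [10.0], [17.0], [31.0]])
test = np.array([[15.0], [40.0]])

# Fit on the training rows only, then reuse those statistics for the test rows
scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# Same result by hand: (x - train_min) / (train_max - train_min)
x_min, x_max = train.min(), train.max()
manual = (test - x_min) / (x_max - x_min)

print(train_scaled.ravel())  # spans exactly 0 to 1
print(test_scaled.ravel())   # 40 lies above the training max, so it maps above 1
print(np.allclose(test_scaled, manual))
```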

   SN  AGE    EMPLOY   ADDRESS   DEBTINC  CREDDEBT   OTHDEBT  DEFAULTER  AGE_2  AGE_3
0   1    3  0.548387  0.352941  0.217604  0.552311  0.183840          1    0.0    1.0
1   2    1  0.322581  0.176471  0.413203  0.065693  0.146405          0    0.0    0.0
2   3    2  0.483871  0.411765  0.124694  0.041363  0.078577          0    1.0    0.0
3   4    3  0.483871  0.411765  0.061125  0.128954  0.028540          0    0.0    1.0
4   5    1  0.064516  0.000000  0.413203  0.086618  0.111564          1    0.0    0.0


In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# One hidden layer of 3 ReLU units, trained with the Adam optimizer for at most 500 iterations
classifier=MLPClassifier(hidden_layer_sizes=(3,),max_iter=500,activation='relu',solver='adam',random_state=42)

In [None]:
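# Features: every column except the target and the raw AGE code (now encoded as AGE_2/AGE_3).
# Note: 'SN' is a row serial number and is still included (unscaled) here;
# dropping it as well is usually advisable for an identifier column.
x=data.drop(['DEFAULTER','AGE'],axis=1)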

In [None]:
y=data['DEFAULTER']

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)


In [None]:
classifier.fit(x_train,y_train)
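With `hidden_layer_sizes=(3,)` and `activation='relu'`, the fitted network is a 3-unit ReLU hidden layer followed by a single logistic (sigmoid) output unit for this binary target. The sketch below checks that on synthetic data from `make_classification` (a hypothetical stand-in, not the bank dataset) by rebuilding `predict_proba` by hand from the fitted `coefs_` and `intercepts_` attributes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the bank features (8 columns, like x in this notebook)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(3,), max_iter=500, activation="relu",
                    solver="adam", random_state=42).fit(X, y)

# Hidden layer: ReLU(X @ W1 + b1), shape (n_samples, 3)
hidden = np.maximum(0, X @ clf.coefs_[0] + clf.intercepts_[0])
# Output layer: one logistic unit giving P(class 1)
logit = hidden @ clf.coefs_[1] + clf.intercepts_[1]
p_default = 1.0 / (1.0 + np.exp(-logit.ravel()))

print(clf.out_activation_)  # 'logistic' for binary targets
print(np.allclose(p_default, clf.predict_proba(X)[:, 1]))
```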



In [None]:
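pred=classifier.predict(x_test)  # class labels at the default 0.5 threshold
# Probability of the positive class (DEFAULTER = 1) for every test row
pred_prob=classifier.predict_proba(x_test)[:,1]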

In [None]:
accuracy_score(y_test,pred)

0.7428571428571429
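An accuracy figure is easier to judge next to a trivial baseline. A hedged sketch with scikit-learn's `DummyClassifier(strategy="most_frequent")`, which always predicts the majority class: the labels below are hypothetical but use the same split as this notebook's test set (102 non-defaulters, 38 defaulters), so the baseline accuracy is 102/140. In the notebook you would fit it on `x_train, y_train` and score it on `x_test, y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels with the same class balance as the test set: 102 zeros, 38 ones
y_true = np.array([0] * 102 + [1] * 38)
X_dummy = np.zeros((len(y_true), 1))  # features are ignored by this baseline

# Always predict the most frequent class (0 = non-defaulter)
baseline = DummyClassifier(strategy="most_frequent").fit(X_dummy, y_true)
baseline_acc = accuracy_score(y_true, baseline.predict(X_dummy))
print(round(baseline_acc, 4))  # 0.7286
```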

In [None]:
# Convert probabilities to labels with a custom threshold of 0.3 instead of the default 0.5
predicted_class=np.zeros(pred_prob.shape)
predicted_class[pred_prob>0.3]=1

In [None]:
predicted_class

array([0., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0., 1., 0., 0., 1., 1., 0.,
       1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 1., 0., 1., 1., 0.,
       0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1., 1.,
       0., 1., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0.,
       1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0.,
       1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 1., 0.,
       0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1.,
       1., 0., 0., 0.])

In [None]:
accuracy_score(y_test,predicted_class)

0.6785714285714286
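The thresholding step above is equivalent to `(pred_prob > 0.3).astype(int)`. To choose a threshold rather than fix it at 0.3, one option is to sweep a few values and compare precision and recall for the defaulter class. The sketch uses ten hypothetical labels and probabilities; in the notebook you would pass `y_test` and `pred_prob` instead. Lowering the threshold flags more applicants, which raises recall and lowers precision.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predicted P(default) for ten applicants
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.25, 0.45, 0.70, 0.90])

results = {}
for threshold in (0.3, 0.5):
    # Same rule as the notebook: flag as defaulter when P(default) > threshold
    y_pred = (proba > threshold).astype(int)
    results[threshold] = (precision_score(y_true, y_pred), recall_score(y_true, y_pred))
    p, r = results[threshold]
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.50, recall=0.75
# threshold=0.5: precision=0.67, recall=0.50
```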

In [None]:
print(classification_report(y_test,predicted_class))

              precision    recall  f1-score   support

           0       0.82      0.72      0.76       102
           1       0.43      0.58      0.49        38

    accuracy                           0.68       140
   macro avg       0.63      0.65      0.63       140
weighted avg       0.71      0.68      0.69       140
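The report's figures can be traced back to a confusion matrix. From the supports (102 and 38), the class 1 recall of 0.58 (22 of 38) and the 51 positive predictions in the `predicted_class` array above, the implied counts at the 0.3 threshold are TN = 73, FP = 29, FN = 16, TP = 22; `confusion_matrix(y_test, predicted_class)`, already imported in the notebook, should therefore show `[[73, 29], [16, 22]]`. The sketch below is a hand check that recomputes the report's rounded metrics from those counts, not output from the notebook.

```python
# Counts implied by the classification report above (threshold 0.3)
tn, fp, fn, tp = 73, 29, 16, 22

precision_1 = tp / (tp + fp)  # share of flagged applicants who actually default
recall_1 = tp / (tp + fn)     # share of actual defaulters that get flagged
precision_0 = tn / (tn + fn)  # share of "non-defaulter" predictions that are correct
recall_0 = tn / (tn + fp)     # share of actual non-defaulters predicted as such
accuracy = (tp + tn) / (tn + fp + fn + tp)

for name, value in [("precision_1", precision_1), ("recall_1", recall_1),
                    ("precision_0", precision_0), ("recall_0", recall_0),
                    ("accuracy", accuracy)]:
    print(f"{name}: {value:.2f}")
```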



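Observation:
The model achieved an accuracy of approximately 74% (0.743) on the test data using the default probability threshold of 0.5. Lowering the threshold to 0.3 reduced overall accuracy to about 68% (0.679). Because the test set is imbalanced (102 non-defaulters vs. 38 defaulters), a model that always predicted "non-defaulter" would already reach 102/140 ≈ 72.9% accuracy, so accuracy alone overstates how well either setting identifies defaulters.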

At the 0.3 threshold, the classification report shows high precision for class 0 (non-defaulters, 0.82) but lower recall (0.72). For class 1 (defaulters), precision is low (0.43) while recall is higher (0.58): the model catches 58% of actual defaulters, but fewer than half of the applicants it flags actually default. This indicates a trade-off between correctly identifying defaulters and incorrectly flagging non-defaulters.

Conclusion:
The neural network model shows a limited ability to predict loan defaulters: its accuracy at the default threshold is only slightly above that of always predicting "non-defaulter", and at the 0.3 threshold it identifies 58% of defaulters while also raising many false alarms. The choice of probability threshold strongly affects the model's performance metrics, particularly the trade-off between precision and recall for each class. Depending on the business requirements (e.g., minimizing false positives vs. minimizing false negatives), the threshold can be adjusted. Further model tuning, feature engineering, or exploring different model architectures could potentially improve performance.
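As one hedged illustration of the further model tuning mentioned above, scaling and the MLP can be wrapped in a scikit-learn `Pipeline` and tuned with `GridSearchCV`, so the scaler is re-fit inside each training fold instead of on the full dataset. The grid values, the `f1` scoring choice and the stratified split are assumptions, and synthetic data from `make_classification` stands in for the bank features; with the real data you would probably also drop the `SN` identifier column before fitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic, imbalanced stand-in for the bank data (roughly 73% class 0)
X, y = make_classification(n_samples=700, n_features=8, weights=[0.73],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

pipe = Pipeline([
    ("scale", MinMaxScaler()),  # re-fit on each training fold only
    ("mlp", MLPClassifier(max_iter=500, random_state=42)),
])

# Hypothetical search space; the "mlp__" prefix routes parameters to the MLP step
param_grid = {
    "mlp__hidden_layer_sizes": [(3,), (8,), (8, 4)],
    "mlp__alpha": [1e-4, 1e-2],
}

# F1 on the defaulter class balances precision and recall; 5-fold cross-validation
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"test F1: {search.score(X_test, y_test):.2f}")
```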