# Model 2

This notebook covers training and performance evaluation of the neural network (bonus challenge). It also presents the general conclusions of the project.

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used for the ROC plot title below

import joblib

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import RocCurveDisplay, classification_report, roc_auc_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from xgboost import XGBClassifier

%load_ext nb_black
%config InlineBackend.figure_format = 'svg'

In [2]:
df = pd.read_csv('/Users/drkazimieras/Turing College/Home credit default risk/df_for_Neural_net.csv')

# Neural network training

The neural network architecture used for training is explained below (summary generated by GPT).

**Neural Network Architecture**

- Sequential Model: Sequential() initializes a linear stack of layers in the neural network, meaning each layer has exactly one input tensor and one output tensor.

- Dense Layers:
  - Dense(128, activation='relu', input_shape=(X_train.shape[1],)): The first hidden layer with 128 neurons. The relu (Rectified Linear Unit) activation function is used, which is a common choice for hidden layers. The input_shape parameter is set to the shape of the input data.
  - Dense(64, activation='relu'): The second hidden layer with 64 neurons, also using the relu activation function.
  - Dense(1, activation='sigmoid'): The output layer with a single neuron. The sigmoid activation function is used, which is suitable for binary classification as it outputs a value between 0 and 1, representing the probability of belonging to the positive class.

**Compilation Parameters**

- Optimizer - Adam: The adam optimizer is an extension of stochastic gradient descent that has become the default optimizer for many deep learning applications. It adapts the learning rate for each parameter and handles sparse gradients efficiently.
- Loss Function - Binary Crossentropy: The loss function is binary_crossentropy, the standard choice for binary classification. It compares the predicted probability (a value between 0 and 1) with the true label and penalizes confident wrong predictions most heavily.
- Metrics - Accuracy: The metric used to evaluate the model is accuracy, which calculates how often predictions match binary labels. It is the ratio of the number of correct predictions to the total number of predictions.
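
To make the loss and metric above concrete, here is a minimal NumPy sketch of what binary cross-entropy and thresholded accuracy compute. The function names and the four-sample example are illustrative assumptions, not part of the notebook, and the Keras implementations may differ in details such as the clipping constant.

```python
import numpy as np


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(per_sample))


def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))


# Hypothetical labels and predicted probabilities
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.3])

loss = binary_crossentropy(y_true, y_prob)
acc = binary_accuracy(y_true, y_prob)
print(f"loss={loss:.4f}, accuracy={acc:.2f}")
```

The last sample (true label 1, predicted 0.3) contributes the largest term to the loss, which illustrates how confident mistakes are penalized.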

**Training and Evaluation**

- Training the Model - fit: The fit method trains the model for a fixed number of epochs (iterations over the entire dataset). Here, it's set to 10 epochs with a batch size of 32, meaning in each epoch, the dataset is divided into batches of 32 samples, and the network weights are updated after processing each batch.
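
The relationship between dataset size, batch size, and number of weight updates can be sketched with a little arithmetic. The sample count below is a hypothetical value chosen for illustration; the helper name is also an assumption.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches (weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)


n_train = 1000  # hypothetical training-set size
batch_size = 32
epochs = 10

steps = steps_per_epoch(n_train, batch_size)
last_batch_size = n_train - (steps - 1) * batch_size
total_updates = steps * epochs
print(steps, last_batch_size, total_updates)
```

When the sample count is not a multiple of the batch size, the final batch of each epoch is smaller than the rest.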

In [3]:
categorical_cols = df.select_dtypes(include=['object']).columns
df = pd.get_dummies(df, columns=categorical_cols)

y = df['TARGET']  
X = df.drop(['TARGET', 'SK_ID_CURR'], axis=1)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=93, stratify=y)
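
In the cell above the scaler is fitted on all rows before the split, so test-set statistics influence the scaling. A common alternative is to split first and fit the scaler on the training rows only. The sketch below shows that pattern on synthetic data; X_demo, y_demo, and the other names are hypothetical and do not come from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the project's dataset)
rng = np.random.default_rng(93)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# Split first, keeping the class ratio in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=93, stratify=y_demo
)

# Fit the scaler on training rows only, then reuse it for the test rows
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)
```

With this ordering the test set is transformed with the training mean and standard deviation, which mirrors how unseen data would be handled after deployment.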

In [4]:
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(X_train.shape[1],)))
model.add(Dense(64, activation='relu'))
model.add(Dense(1, activation='sigmoid'))  

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=32)

model.evaluate(X_test, y_test)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[0.00992470420897007, 0.9972733855247498]

In [5]:
y_pred = model.predict(X_test).ravel()

y_pred_label = (y_pred > 0.5).astype(int)

print(classification_report(y_test, y_pred_label))

roc_auc = roc_auc_score(y_test, y_pred)
print(f'ROC-AUC Score: {roc_auc}')

              precision    recall  f1-score   support

           0       1.00      1.00      1.00   1032543
           1       0.98      0.98      0.98     86785

    accuracy                           1.00   1119328
   macro avg       0.99      0.99      0.99   1119328
weighted avg       1.00      1.00      1.00   1119328

ROC-AUC Score: 0.999529997252514
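
To put the accuracy figure in context, the class supports printed in the report above can be used to compute the accuracy of a trivial classifier that always predicts class 0. This is a small arithmetic sketch; the variable names are illustrative.

```python
# Supports copied from the classification report above
support_negative = 1_032_543
support_positive = 86_785

n_test = support_negative + support_positive
positive_rate = support_positive / n_test

# Accuracy of a model that always predicts the majority class (0)
majority_baseline_accuracy = support_negative / n_test

print(f"positive rate: {positive_rate:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
```

Because roughly 92% of the test rows belong to class 0, precision and recall for class 1 and the ROC-AUC are more informative here than accuracy alone.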


In [None]:
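# Keras models are persisted with their own save method; pickling them with joblib is not reliably supported
model.save('NN_model.keras')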

## XGBoost training for comparison

In [None]:
XGB_model = XGBClassifier(
    scale_pos_weight=sum(y_train == 0) / sum(y_train == 1), eval_metric="logloss"
)

XGB_model.fit(X_train, y_train)

In [None]:
RocCurveDisplay.from_estimator(XGB_model, X_test, y_test)
plt.title("ROC for XGBoost")
plt.show()
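
For a like-for-like comparison, both models could be scored with the same metrics used for the neural network. The helper below is a sketch under that assumption; evaluate_binary_classifier is a hypothetical name, and the demo arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score


def evaluate_binary_classifier(y_true, y_score, threshold=0.5):
    """Return ROC-AUC on raw scores and a report on thresholded labels."""
    y_label = (np.asarray(y_score) > threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "report": classification_report(y_true, y_label, zero_division=0),
    }


# Hypothetical labels and predicted probabilities
y_true_demo = np.array([0, 0, 1, 1])
y_score_demo = np.array([0.1, 0.4, 0.35, 0.8])

result = evaluate_binary_classifier(y_true_demo, y_score_demo)
print(result["report"])
print(f"ROC-AUC Score: {result['roc_auc']}")
```

In this notebook the helper could be called with `XGB_model.predict_proba(X_test)[:, 1]` for XGBoost and `model.predict(X_test).ravel()` for the neural network, each together with `y_test`.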

# Conclusions

## Model 2

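If additional data is available to the bank, larger models can be built. Neural networks seem to perform better for this task: on the test set, the network above reached a ROC-AUC of about 0.9995, with precision and recall of 0.98 for the positive class.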

## General

To achieve the real and imaginary goals, the following steps were taken:

1. Financial data of past clients were structured, analyzed, and presented.
2. Model 1, which helps the bank quickly evaluate customers' creditworthiness from limited data, was created; a CatBoost model was optimized for this purpose.
3. The model was deployed to the cloud and is accessible via HTTP requests.
4. The model is accessible through a simple user interface so that bank employees can quickly evaluate whether a client is worth further consideration for a loan. A live demo of the model was given to the bank.
5. A more complex model (Model 2), which may be used after the initial filtering by Model 1 or instead of it, was created and evaluated.

Possible improvements:

1. Explore all the features included in the datasets.
2. Improve the model pipelines to be able to accept new data and retrain.
3. Create a fully functional website to access Model 1.
4. Convert the notebook code to Python files for more versatile and professional use.