## Introduction

In this challenge, you'll tackle one of the most industry-relevant machine learning problems, using a distinctive dataset that will put your modeling skills to the test. Financial loan services are offered by organizations across many sectors, from large banks and other financial institutions to government lending programs. A primary objective of any lender is to reduce payment defaults and ensure that borrowers repay their loans as expected. To do this efficiently and systematically, many lenders use machine learning to predict which individuals are at the highest risk of defaulting, so that appropriate interventions can be directed to the right borrowers.

In this challenge, we will tackle the loan default prediction problem for a distinctive and interesting group of borrowers.

Imagine that you are a new data scientist at a major financial institution, tasked with building a model that predicts which individuals will default on their loan payments. The dataset provided is a sample of individuals who received loans in 2021.

This financial institution has a vested interest in understanding the likelihood of each individual to default on their loan payments so that resources can be allocated appropriately to support these borrowers. In this challenge, you will use your machine learning toolkit to do just that!

## Import Python Modules

First, import the primary modules used in this project. Since this is an open-ended project, feel free to use any of your favorite libraries that may help with this challenge. For example, the following popular packages, among others, may be useful:

- pandas
- NumPy
- SciPy
- scikit-learn
- Keras
- matplotlib
- seaborn

In [67]:
# Import required packages

# Data packages
import pandas as pd
import numpy as np

# Machine Learning / Classification packages
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier

# Visualization Packages
from matplotlib import pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Import any other packages you may want to use


## Load the Data

Start by loading `train.csv` into a dataframe `train_df` and `test.csv` into a dataframe `test_df`, then display the shape of each dataframe.

In [None]:
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
print('train_df Shape:', train_df.shape)
print('test_df Shape:', test_df.shape)
train_df.head()

## Explore, Clean, Validate, and Visualize the Data (optional)

Feel free to explore, clean, validate, and visualize the data however you see fit to help choose or optimize your predictive model. Please note that the final autograding is based only on the accuracy of the `prediction_df` predictions.
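Before modeling, a few quick data-quality checks can catch problems early. The sketch below is one possible approach, assuming a pandas dataframe with a `Default` target column; the helper name `validation_summary` and the tiny sample frame are hypothetical, and the values in it are made up rather than taken from `train.csv`.

```python
import pandas as pd

def validation_summary(df: pd.DataFrame, target: str = "Default") -> dict:
    """Return a few basic data-quality checks for a loan dataframe (hypothetical helper)."""
    return {
        "n_rows": len(df),
        # Total count of missing cells across all columns
        "n_missing": int(df.isna().sum().sum()),
        # Rows that exactly duplicate an earlier row
        "n_duplicates": int(df.duplicated().sum()),
        # Share of each target class; defaults are typically the minority class
        "class_share": df[target].value_counts(normalize=True).round(3).to_dict(),
    }

# Tiny hand-made frame using a few of the real column names (values are illustrative)
sample = pd.DataFrame({
    "Age": [56, 69, 46, 32, 32],
    "CreditScore": [520, 458, 451, 743, 743],
    "Default": [0, 0, 1, 0, 0],
})
summary = validation_summary(sample)
print(summary)
```

On the real data, the same call would be `validation_summary(train_df)`; a strongly imbalanced `class_share` is a hint that accuracy alone will be a weak metric.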

In [None]:
# your code here (optional)

In [68]:
df = train_df.copy()  # work on a copy instead of reading train.csv a second time
df.head()

| | LoanID | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio | Education | EmploymentType | MaritalStatus | HasMortgage | HasDependents | LoanPurpose | HasCoSigner | Default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | I38PQUQS96 | 56 | 85994 | 50587 | 520 | 80 | 4 | 15.23 | 36 | 0.44 | Bachelor's | Full-time | Divorced | Yes | Yes | Other | Yes | 0 |
| 1 | HPSK72WA7R | 69 | 50432 | 124440 | 458 | 15 | 1 | 4.81 | 60 | 0.68 | Master's | Full-time | Married | No | No | Other | Yes | 0 |
| 2 | C1OZ6DPJ8Y | 46 | 84208 | 129188 | 451 | 26 | 3 | 21.17 | 24 | 0.31 | Master's | Unemployed | Divorced | Yes | Yes | Auto | No | 1 |
| 3 | V2KKSFM3UN | 32 | 31713 | 44799 | 743 | 0 | 3 | 7.07 | 24 | 0.23 | High School | Full-time | Married | No | No | Business | No | 0 |
| 4 | EY08JDHTZP | 60 | 20437 | 9139 | 633 | 8 | 4 | 6.51 | 48 | 0.73 | Bachelor's | Unemployed | Divorced | No | Yes | Auto | No | 0 |


In [69]:
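# Categorical columns to one-hot encode
onehot_columns = ['Education', 'EmploymentType', 'MaritalStatus',
                  'HasMortgage', 'HasDependents', 'LoanPurpose', 'HasCoSigner']

In [70]:
# One 0/1 indicator column per category level
onehot_df = pd.get_dummies(df[onehot_columns], dtype=int)

In [71]:
# For the binary Yes/No columns the _No indicator is redundant with _Yes, so drop it
onehot_df.drop(['HasMortgage_No', 'HasDependents_No', 'HasCoSigner_No'], axis=1, inplace=True)

In [72]:
# Remove the original categorical columns now that they are encoded
df.drop(onehot_columns, axis=1, inplace=True)

In [73]:
# LoanID is a unique identifier, not a predictive feature
df.drop(["LoanID"], axis=1, inplace=True)

In [74]:
# Combine the remaining numeric columns (and Default) with the indicator columns
df2 = pd.concat([df, onehot_df], axis=1)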
df2.head()

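| | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio | Default | ... | MaritalStatus_Married | MaritalStatus_Single | HasMortgage_Yes | HasDependents_Yes | LoanPurpose_Auto | LoanPurpose_Business | LoanPurpose_Education | LoanPurpose_Home | LoanPurpose_Other | HasCoSigner_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | 85994 | 50587 | 520 | 80 | 4 | 15.23 | 36 | 0.44 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 69 | 50432 | 124440 | 458 | 15 | 1 | 4.81 | 60 | 0.68 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 46 | 84208 | 129188 | 451 | 26 | 3 | 21.17 | 24 | 0.31 | 1 | ... | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 32 | 31713 | 44799 | 743 | 0 | 3 | 7.07 | 24 | 0.23 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 60 | 20437 | 9139 | 633 | 8 | 4 | 6.51 | 48 | 0.73 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |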
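Dropping the `_No` indicators by hand is equivalent to letting pandas drop the first level of each binary column. The sketch below, on a few made-up rows, checks that equivalence using `pd.get_dummies(..., drop_first=True)` applied only to the binary columns; it relies on "No" sorting before "Yes", and it assumes every binary column actually contains both values (otherwise `drop_first` would remove its only indicator).

```python
import pandas as pd

# A few rows shaped like the loan data (values are illustrative)
raw = pd.DataFrame({
    "Education": ["Bachelor's", "Master's", "High School"],
    "HasMortgage": ["Yes", "No", "Yes"],
    "HasCoSigner": ["Yes", "Yes", "No"],
})

# Approach used above: encode everything, then drop the "_No" indicators by hand
manual = pd.get_dummies(raw, dtype=int).drop(["HasMortgage_No", "HasCoSigner_No"], axis=1)

# Alternative: drop_first on the binary columns only ("No" < "Yes", so "_No" is dropped)
binary_cols = ["HasMortgage", "HasCoSigner"]
binary = pd.get_dummies(raw[binary_cols], drop_first=True, dtype=int)
multi = pd.get_dummies(raw.drop(columns=binary_cols), dtype=int)
auto = pd.concat([multi, binary], axis=1)

print(manual.equals(auto[manual.columns]))  # both encodings agree
```

Multi-level columns such as `Education` keep every level here, matching the notebook; dropping one level there too would only matter for unregularized linear models.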


# Numeric columns flagged for possible outlier inspection
outlier_col = ["DTIRatio", "InterestRate", "CreditScore", "LoanAmount", "Income"]
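The notebook lists `outlier_col` without showing how those columns are checked. One common option, assumed here rather than taken from the source, is Tukey's IQR rule: flag values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The helper `iqr_outlier_counts` and the sample values are hypothetical.

```python
import pandas as pd

def iqr_outlier_counts(df: pd.DataFrame, cols: list, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column (Tukey's rule)."""
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Boolean frame: True where a value falls outside its column's fences
    mask = (df[cols] < lower) | (df[cols] > upper)
    return mask.sum()

# Made-up values: one extreme DTIRatio, no extreme Income
demo = pd.DataFrame({
    "DTIRatio": [0.1, 0.2, 0.3, 0.4, 5.0],
    "Income": [10, 20, 30, 40, 50],
})
counts = iqr_outlier_counts(demo, ["DTIRatio", "Income"])
print(counts)
```

On the real data this would be `iqr_outlier_counts(df2, outlier_col)`; whether to clip, drop, or keep flagged rows is a separate modeling decision.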

In [75]:
# Numeric columns to standardize
scal_col = ['Age', 'Income', 'LoanAmount', 'CreditScore',
            'MonthsEmployed', 'NumCreditLines', 'InterestRate', 'LoanTerm',
            'DTIRatio']

In [76]:
from sklearn.preprocessing import StandardScaler

In [77]:
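# Apply StandardScaler to the numeric columns
# Note: the scaler is fit on the full training file before the train/test split,
# so the held-out rows influence the scaling statistics (a mild form of leakage).
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df2[scal_col])

# Build a DataFrame from the scaled values, keeping df2's row index for the later concat
scaled_df = pd.DataFrame(scaled_data, columns=scal_col, index=df2.index)

scaled_df.head()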


| | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.83399 | 0.089693 | -1.086833 | -0.341492 | 0.590533 | 1.341937 | 0.261771 | -0.001526 | -0.260753 |
| 1 | 1.701221 | -0.823021 | -0.044309 | -0.731666 | -1.285731 | -1.343791 | -1.30835 | 1.412793 | 0.778585 |
| 2 | 0.166888 | 0.043854 | 0.022715 | -0.775718 | -0.968209 | 0.446694 | 1.156831 | -0.708685 | -0.823728 |
| 3 | -0.767053 | -1.303452 | -1.168538 | 1.061875 | -1.718715 | 0.446694 | -0.967805 | -0.708685 | -1.170174 |
| 4 | 1.10083 | -1.592855 | -1.671921 | 0.369631 | -1.48779 | 1.341937 | -1.052188 | 0.705634 | 0.995114 |
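Because the scaler above is fit before the train/test split, the test rows contribute to the means and standard deviations. A leakage-free alternative is to split first and fit the scaling on the training rows only. The sketch below uses scikit-learn's `ColumnTransformer` to scale the numeric columns while passing indicator columns through unchanged; the data is synthetic and only two numeric columns plus one indicator are used for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a few loan columns (made-up values)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "Income": rng.normal(80_000, 30_000, 1_000),
    "CreditScore": rng.normal(575, 150, 1_000),
    "HasCoSigner_Yes": rng.integers(0, 2, 1_000),
})
y = rng.integers(0, 2, 1_000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

numeric_cols = ["Income", "CreditScore"]
preprocess = ColumnTransformer(
    [("num", StandardScaler(), numeric_cols)],
    remainder="passthrough",  # indicator columns are left unchanged
)
# Scaling statistics are learned from the training rows only...
X_train_t = preprocess.fit_transform(X_train)
# ...and reused, unchanged, on the test rows
X_test_t = preprocess.transform(X_test)
```

In the notebook, `numeric_cols` would be `scal_col`, and the same fitted `preprocess` object would be saved and reused for `test.csv`.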


In [78]:
# Keep the target and indicator columns; the numeric columns are replaced by their scaled versions
droped_df2=df2.drop(scal_col,axis=1)
droped_df2.head(1)

| | Default | Education_Bachelor's | Education_High School | Education_Master's | Education_PhD | EmploymentType_Full-time | EmploymentType_Part-time | EmploymentType_Self-employed | EmploymentType_Unemployed | MaritalStatus_Divorced | MaritalStatus_Married | MaritalStatus_Single | HasMortgage_Yes | HasDependents_Yes | LoanPurpose_Auto | LoanPurpose_Business | LoanPurpose_Education | LoanPurpose_Home | LoanPurpose_Other | HasCoSigner_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |


In [79]:
df3=pd.concat([scaled_df,droped_df2],axis=1)
df3.head()

| | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio | Default | ... | MaritalStatus_Married | MaritalStatus_Single | HasMortgage_Yes | HasDependents_Yes | LoanPurpose_Auto | LoanPurpose_Business | LoanPurpose_Education | LoanPurpose_Home | LoanPurpose_Other | HasCoSigner_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.83399 | 0.089693 | -1.086833 | -0.341492 | 0.590533 | 1.341937 | 0.261771 | -0.001526 | -0.260753 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1.701221 | -0.823021 | -0.044309 | -0.731666 | -1.285731 | -1.343791 | -1.30835 | 1.412793 | 0.778585 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 0.166888 | 0.043854 | 0.022715 | -0.775718 | -0.968209 | 0.446694 | 1.156831 | -0.708685 | -0.823728 | 1 | ... | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | -0.767053 | -1.303452 | -1.168538 | 1.061875 | -1.718715 | 0.446694 | -0.967805 | -0.708685 | -1.170174 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 1.10083 | -1.592855 | -1.671921 | 0.369631 | -1.48779 | 1.341937 | -1.052188 | 0.705634 | 0.995114 | 0 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


### Example prediction submission:

The challenge template originally placed a very naive `DummyClassifier` here, purely as an example of the required submission format, with the instruction to replace it with an improved prediction method for generating `prediction_df`. The cells below follow that instruction: they separate the features from the `Default` target, hold out 20% of the rows for evaluation, and train a feed-forward neural network in Keras.
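A naive baseline is still useful as a reference point for the neural network's ROC AUC. The sketch below fits scikit-learn's `DummyClassifier(strategy="prior")`, which predicts the training default rate for every row; the features and labels are synthetic, with roughly the ~12% default rate seen in this dataset's test split assumed for illustration.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score

# Synthetic features and labels with about 12% positives (made-up data)
rng = np.random.default_rng(42)
X = rng.normal(size=(1_000, 3))
y = (rng.random(1_000) < 0.12).astype(int)

baseline = DummyClassifier(strategy="prior").fit(X, y)
proba = baseline.predict_proba(X)[:, 1]

# "prior" gives every row the same probability, so it cannot rank borrowers at all
print(roc_auc_score(y, proba))  # → 0.5
```

Any model worth submitting should clear this 0.5 ROC AUC floor by a clear margin.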

In [92]:
x=df3.drop(["Default"],axis=1)
y=df3["Default"]
print(y)
x.head()

0         0
1         0
2         1
3         0
4         0
         ..
255342    0
255343    1
255344    0
255345    0
255346    0
Name: Default, Length: 255347, dtype: int64


| | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio | Education_Bachelor's | ... | MaritalStatus_Married | MaritalStatus_Single | HasMortgage_Yes | HasDependents_Yes | LoanPurpose_Auto | LoanPurpose_Business | LoanPurpose_Education | LoanPurpose_Home | LoanPurpose_Other | HasCoSigner_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.83399 | 0.089693 | -1.086833 | -0.341492 | 0.590533 | 1.341937 | 0.261771 | -0.001526 | -0.260753 | 1 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 1 | 1.701221 | -0.823021 | -0.044309 | -0.731666 | -1.285731 | -1.343791 | -1.30835 | 1.412793 | 0.778585 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 2 | 0.166888 | 0.043854 | 0.022715 | -0.775718 | -0.968209 | 0.446694 | 1.156831 | -0.708685 | -0.823728 | 0 | ... | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | -0.767053 | -1.303452 | -1.168538 | 1.061875 | -1.718715 | 0.446694 | -0.967805 | -0.708685 | -1.170174 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 1.10083 | -1.592855 | -1.671921 | 0.369631 | -1.48779 | 1.341937 | -1.052188 | 0.705634 | 0.995114 | 1 | ... | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |


In [94]:
# Split the dataset into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
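The split above is random but not stratified. With an imbalanced target (the test split ends up with 5,900 defaults out of 51,070 rows, about 11.6%), passing `stratify=y` to `train_test_split` keeps the default rate essentially identical in both parts. This is a suggested variation, not what the notebook ran; the sketch uses synthetic labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and labels with about 12% positives (made-up data)
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 2))
y = (rng.random(1_000) < 0.12).astype(int)

# stratify=y preserves the class proportions in both the training and test parts
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
print(round(y.mean(), 3), round(y_tr.mean(), 3), round(y_te.mean(), 3))
```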

### Build and Train a Neural Network

In [101]:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, InputLayer
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping

In [102]:
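# Build the neural network: three ReLU hidden layers (64, 32, 16 units), each followed by
# batch normalization and 50% dropout, then a single sigmoid output unit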
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1],)))
model.add(Dense(64, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(16, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
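As a sanity check on the model size, the parameter count can be worked out by hand: a `Dense` layer has `n_in * n_out` weights plus `n_out` biases, `BatchNormalization` keeps 4 values per unit (gamma and beta are trainable, the moving mean and variance are not), and `Dropout` has none. The feature count of 28 is inferred from the column listing above (9 scaled numeric columns plus 19 indicators); the helper functions are hypothetical, and `model.summary()` should report the same total.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # Weight matrix plus one bias per unit
    return n_in * n_out + n_out

def batchnorm_params(n: int) -> int:
    # gamma, beta (trainable) plus moving mean and moving variance (non-trainable)
    return 4 * n

def count_params(n_features: int, hidden=(64, 32, 16)) -> int:
    total, n_in = 0, n_features
    for units in hidden:
        # Dense + BatchNormalization; the Dropout layer adds no parameters
        total += dense_params(n_in, units) + batchnorm_params(units)
        n_in = units
    return total + dense_params(n_in, 1)  # single sigmoid output unit

print(count_params(28))  # → 4929
```

At under 5,000 parameters the network is small relative to roughly 204,000 training rows, so heavy 50% dropout on every layer may be more regularization than it needs.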

In [103]:
# Define the optimizer (with learning rate and momentum)
optimizer = SGD(learning_rate=0.01, momentum=0.9)
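With `momentum > 0` and `nesterov=False`, Keras's `SGD` keeps a velocity term: `velocity = momentum * velocity - learning_rate * g`, then `w = w + velocity`. The plain-Python sketch below applies that rule to a one-dimensional quadratic to show how the accumulated velocity carries the parameter to the minimum; it is an illustration of the update rule, not Keras's implementation.

```python
def sgd_momentum(grad, w0: float, lr: float = 0.01, momentum: float = 0.9, steps: int = 300) -> float:
    """Minimise a 1-D function using the SGD-with-momentum update rule."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(w)  # decayed history plus new gradient step
        w = w + velocity
    return w

# f(w) = (w - 3)^2 has gradient 2(w - 3) and its minimum at w = 3
w_final = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(w_final)
```

With `momentum=0.9`, past gradients are decayed by a factor of 0.9 per step rather than discarded, which smooths noisy mini-batch gradients such as those from the very small batches used below.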

In [104]:
# Compile the model
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])



In [105]:
# Early-stopping callback: stop after 10 epochs without val_loss improvement and restore the best weights
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
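The early-stopping rule can be modeled in a few lines: every epoch whose `val_loss` does not beat the best value so far increments a counter, any improvement resets it, and training stops once the counter reaches `patience`. The function below is a simplified model of that behavior with default `min_delta=0`, not Keras's source code. Under this rule, the run below stopping after epoch 32 of 35 with `patience=10` suggests the best validation loss, and therefore the restored weights, came from epoch 22.

```python
def early_stop(val_losses, patience: int):
    """Simplified model of EarlyStopping(monitor='val_loss', restore_best_weights=True).

    Returns (last_epoch_run, best_epoch), both 0-based.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:           # improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:  # `patience` epochs in a row without improvement
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

print(early_stop([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.69], patience=3))  # → (5, 2)
```

The later improvement to 0.69 is never seen, which is the trade-off a small `patience` makes.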

In [107]:
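# Train the model. Note: the held-out test split doubles as validation data for
# early stopping, so the evaluation below is not fully independent of training.
model.fit(X_train, y_train, epochs=35, batch_size=10, validation_data=(X_test, y_test), callbacks=[early_stopping])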

Train on 204277 samples, validate on 51070 samples
Epoch 1/35
...
Epoch 32/35


<tensorflow.python.keras.callbacks.History at 0x7e3159849990>
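Defaults are the minority class, and the classification report for this model shows a recall of only 0.07 on class 1. One common remedy, suggested here rather than used in the notebook, is to weight the loss by class frequency. The sketch computes "balanced" weights with scikit-learn's `compute_class_weight`, using a hypothetical label vector with the same class counts as the test split's `support` column; in the notebook they would be computed from `y_train` and passed to Keras through the `class_weight` argument of `fit`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels with the test split's class counts: 45,170 non-defaults, 5,900 defaults
y_demo = np.array([0] * 45170 + [1] * 5900)

# "balanced" weight for class c is n_samples / (n_classes * count_c)
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y_demo)
class_weight = {0: float(weights[0]), 1: float(weights[1])}
print(class_weight)

# In the notebook (not run here):
# model.fit(X_train, y_train, ..., class_weight=class_weight)
```

Each default then counts roughly 7.7 times as much as a non-default in the loss, which usually trades some precision for much higher recall.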

In [108]:
# Evaluate the model's performance
loss, accuracy = model.evaluate(X_test, y_test)
print(f'Loss: {loss}, Accuracy: {accuracy}')

Loss: 0.30992395734166084, Accuracy: 0.8878010511398315
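Accuracy needs context on imbalanced data: a model that always predicts "no default" is already right for every non-defaulting borrower. The arithmetic below uses the class counts from the `support` column of the classification report for this test split and the accuracy printed by `model.evaluate`.

```python
# Class counts from the `support` column of the classification report for this test split
n_no_default, n_default = 45170, 5900
majority_accuracy = n_no_default / (n_no_default + n_default)
model_accuracy = 0.8878010511398315  # value printed by model.evaluate

print(round(majority_accuracy, 4))                   # → 0.8845
print(round(model_accuracy - majority_accuracy, 4))  # → 0.0033
```

The network's accuracy is only about a third of a percentage point above the always-"no default" baseline, so ROC AUC and class-1 recall are more informative metrics here.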


In [109]:
# Save the model (HDF5 format)
model.save('model_with_momentum.h5')

# Save the fitted scaler
import pickle
with open('scaler_with_momentum.pkl', 'wb') as file:
    pickle.dump(scaler, file)
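Saved artifacts are only useful if they reload correctly. The sketch below round-trips a `StandardScaler` through `pickle` the same way as the cell above, using made-up data and a temporary directory, and checks that the reloaded scaler reproduces the original transform. Restoring the Keras model would use `tensorflow.keras.models.load_model`, shown only as a comment because it needs TensorFlow and the saved `.h5` file.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit a scaler on made-up data
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler().fit(X)

# Save and reload it with pickle, as in the cell above
path = Path(tempfile.mkdtemp()) / "scaler_with_momentum.pkl"
with open(path, "wb") as f:
    pickle.dump(scaler, f)
with open(path, "rb") as f:
    loaded_scaler = pickle.load(f)

# The reloaded scaler should reproduce the original transform exactly
print(np.allclose(loaded_scaler.transform(X), scaler.transform(X)))

# Restoring the Keras model (not run here):
# from tensorflow.keras.models import load_model
# model = load_model("model_with_momentum.h5")
```

When scoring `test.csv`, the reloaded scaler should be applied with `transform` only, never refit.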


In [131]:
from sklearn.metrics import classification_report, roc_auc_score

# Predicted default probabilities, flattened from shape (n, 1) to (n,)
predictions = model.predict(X_test).ravel()
# Convert probabilities to class labels with a 0.5 threshold
binary_predictions = np.where(predictions > 0.5, 1, 0)

# Build the classification report
report = classification_report(y_test, binary_predictions)
print("Classification Report:\n", report)

# Compute the ROC AUC score from the probabilities (not the thresholded labels)
roc_auc = roc_auc_score(y_test, predictions)
print("ROC AUC Score:", roc_auc)


Classification Report:
               precision    recall  f1-score   support

           0       0.89      0.99      0.94     45170
           1       0.62      0.07      0.13      5900

    accuracy                           0.89     51070
   macro avg       0.76      0.53      0.54     51070
weighted avg       0.86      0.89      0.85     51070

ROC AUC Score: 0.7591789360720143
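A ROC AUC of about 0.76 with a class-1 recall of only 0.07 suggests the fixed 0.5 threshold, rather than the ranking itself, is limiting default detection. One option, assumed here rather than taken from the notebook, is to pick the threshold that maximizes F1 using scikit-learn's `precision_recall_curve`. The helper `best_f1_threshold` and the toy scores are hypothetical; on real data the threshold should be chosen on a validation split, not on the test split used for reporting.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) for the score threshold that maximises F1 on the positive class."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision/recall have one more entry than thresholds; drop the final (recall = 0) point
    p, r = precision[:-1], recall[:-1]
    f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
    i = int(np.argmax(f1))
    return thresholds[i], f1[i]

# Toy labels and scores (made up)
thr, f1 = best_f1_threshold([0, 0, 1, 1, 0, 1], [0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
print(thr, round(f1, 3))  # → 0.35 0.857
```

The chosen threshold would then replace 0.5, e.g. `binary_predictions = (predictions >= thr).astype(int)`; `precision_recall_curve` treats scores at or above a threshold as positive.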


| | Age | Income | LoanAmount | CreditScore | MonthsEmployed | NumCreditLines | InterestRate | LoanTerm | DTIRatio | Education_Bachelor's | ... | MaritalStatus_Married | MaritalStatus_Single | HasMortgage_Yes | HasDependents_Yes | LoanPurpose_Auto | LoanPurpose_Business | LoanPurpose_Education | LoanPurpose_Home | LoanPurpose_Other | HasCoSigner_Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 137187 | -1.700995 | 1.413566 | 1.151277 | 1.710067 | -0.968209 | -0.448549 | -0.455482 | 1.412793 | 1.34156 | 0 | ... | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 230334 | 0.233598 | -0.649471 | -1.716656 | 1.09334 | -0.852747 | -0.448549 | 0.93834 | -0.001526 | 0.995114 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 19687 | -1.167314 | 0.046934 | -0.458972 | -0.763132 | -1.516656 | -0.448549 | 1.620936 | -1.415845 | -0.217447 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| 106509 | 0.633859 | -0.83937 | 1.439897 | -0.259682 | 1.369905 | 0.446694 | 0.142731 | 1.412793 | -1.430008 | 1 | ... | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 242291 | 0.367019 | 0.845693 | -1.489357 | 1.672308 | -1.718715 | 1.341937 | 1.655593 | -1.415845 | -1.689843 | 1 | ... | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
