INTRODUCTION

This data set includes customers who have paid off their loans, customers who went past due and were put into collection without repaying the loan and interest, and customers who paid off only after being put into collection. The financial product is a bullet loan: customers repay the entire debt in a single payment at the end of the term rather than on an installment schedule. They may also pay off the loan before the due date.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [None]:
data = pd.read_csv(r'/content/loan_data.csv')

In [None]:
data

Unnamed: 0,credit.policy,purpose,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid
0,1,debt_consolidation,0.1189,829.10,11.350407,19.48,737,5639.958333,28854,52.1,0,0,0,0
1,1,credit_card,0.1071,228.22,11.082143,14.29,707,2760.000000,33623,76.7,0,0,0,0
2,1,debt_consolidation,0.1357,366.86,10.373491,11.63,682,4710.000000,3511,25.6,1,0,0,0
3,1,debt_consolidation,0.1008,162.34,11.350407,8.10,712,2699.958333,33667,73.2,1,0,0,0
4,1,credit_card,0.1426,102.92,11.299732,14.97,667,4066.000000,4740,39.5,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9573,0,all_other,0.1461,344.76,12.180755,10.39,672,10474.000000,215372,82.1,2,0,0,1
9574,0,all_other,0.1253,257.70,11.141862,0.21,722,4380.000000,184,1.1,5,0,0,1
9575,0,debt_consolidation,0.1071,97.81,10.596635,13.09,687,3450.041667,10036,82.9,8,0,0,1
9576,0,home_improvement,0.1600,351.58,10.819778,19.18,692,1800.000000,0,3.2,5,0,0,1


credit.policy: 1 if the customer meets the credit underwriting criteria of LendingClub.com, and 0 otherwise.

purpose: The purpose of the loan (takes values "credit_card", "debt_consolidation", "educational", "home_improvement", "major_purchase", "small_business", and "all_other").

int.rate: The interest rate of the loan, as a proportion (a rate of 11% would be stored as 0.11). Borrowers judged by LendingClub.com to be more risky are assigned higher interest rates.

installment: The monthly installments owed by the borrower if the loan is funded.

log.annual.inc: The natural log of the self-reported annual income of the borrower.

dti: The debt-to-income ratio of the borrower (amount of debt divided by annual income).

fico: The FICO credit score of the borrower.

days.with.cr.line: The number of days the borrower has had a credit line.

revol.bal: The borrower's revolving balance (amount unpaid at the end of the credit card billing cycle).

revol.util: The borrower's revolving line utilization rate (the amount of the credit line used relative to total credit available).

inq.last.6mths: The borrower's number of inquiries by creditors in the last 6 months.

delinq.2yrs: The number of times the borrower had been 30+ days past due on a payment in the past 2 years.

pub.rec: The borrower's number of derogatory public records (bankruptcy filings, tax liens, or judgments).

not.fully.paid: The target column used below; 1 if the loan was not fully paid, and 0 otherwise.
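
Because log.annual.inc stores the natural log of income, the original figure can be recovered with the exponential. A small sketch, using the value from the first row of the table above; the variable names are illustrative only:

```python
import numpy as np

# log.annual.inc of the first row shown above
log_annual_inc = 11.350407

# Invert the natural log to get the self-reported annual income back
annual_income = np.exp(log_annual_inc)
print(round(float(annual_income)))
```

For this row the result is roughly 85,000 in the currency of the original income figure.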

In [None]:
data['purpose'].value_counts()

debt_consolidation    3957
all_other             2331
credit_card           1262
home_improvement       629
small_business         619
major_purchase         437
educational            343
Name: purpose, dtype: int64

In [None]:
data['credit.policy'].value_counts()

1    7710
0    1868
Name: credit.policy, dtype: int64

In [None]:
# columns= already names the axis, so axis=1 is redundant
data.drop(columns='purpose', inplace=True)

In [None]:
data

Unnamed: 0,credit.policy,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid
0,1,0.1189,829.10,11.350407,19.48,737,5639.958333,28854,52.1,0,0,0,0
1,1,0.1071,228.22,11.082143,14.29,707,2760.000000,33623,76.7,0,0,0,0
2,1,0.1357,366.86,10.373491,11.63,682,4710.000000,3511,25.6,1,0,0,0
3,1,0.1008,162.34,11.350407,8.10,712,2699.958333,33667,73.2,1,0,0,0
4,1,0.1426,102.92,11.299732,14.97,667,4066.000000,4740,39.5,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9573,0,0.1461,344.76,12.180755,10.39,672,10474.000000,215372,82.1,2,0,0,1
9574,0,0.1253,257.70,11.141862,0.21,722,4380.000000,184,1.1,5,0,0,1
9575,0,0.1071,97.81,10.596635,13.09,687,3450.041667,10036,82.9,8,0,0,1
9576,0,0.1600,351.58,10.819778,19.18,692,1800.000000,0,3.2,5,0,0,1


In [None]:
data.isnull().sum()

credit.policy        0
int.rate             0
installment          0
log.annual.inc       0
dti                  0
fico                 0
days.with.cr.line    0
revol.bal            0
revol.util           0
inq.last.6mths       0
delinq.2yrs          0
pub.rec              0
not.fully.paid       0
dtype: int64

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9578 entries, 0 to 9577
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   credit.policy      9578 non-null   int64  
 1   int.rate           9578 non-null   float64
 2   installment        9578 non-null   float64
 3   log.annual.inc     9578 non-null   float64
 4   dti                9578 non-null   float64
 5   fico               9578 non-null   int64  
 6   days.with.cr.line  9578 non-null   float64
 7   revol.bal          9578 non-null   int64  
 8   revol.util         9578 non-null   float64
 9   inq.last.6mths     9578 non-null   int64  
 10  delinq.2yrs        9578 non-null   int64  
 11  pub.rec            9578 non-null   int64  
 12  not.fully.paid     9578 non-null   int64  
dtypes: float64(6), int64(7)
memory usage: 972.9 KB


In [None]:
data.describe()

Unnamed: 0,credit.policy,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util,inq.last.6mths,delinq.2yrs,pub.rec,not.fully.paid
count,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0,9578.0
mean,0.80497,0.12264,319.089413,10.932117,12.606679,710.846314,4560.767197,16913.96,46.799236,1.577469,0.163708,0.062122,0.160054
std,0.396245,0.026847,207.071301,0.614813,6.88397,37.970537,2496.930377,33756.19,29.014417,2.200245,0.546215,0.262126,0.366676
min,0.0,0.06,15.67,7.547502,0.0,612.0,178.958333,0.0,0.0,0.0,0.0,0.0,0.0
25%,1.0,0.1039,163.77,10.558414,7.2125,682.0,2820.0,3187.0,22.6,0.0,0.0,0.0,0.0
50%,1.0,0.1221,268.95,10.928884,12.665,707.0,4139.958333,8596.0,46.3,1.0,0.0,0.0,0.0
75%,1.0,0.1407,432.7625,11.291293,17.95,737.0,5730.0,18249.5,70.9,2.0,0.0,0.0,0.0
max,1.0,0.2164,940.14,14.528354,29.96,827.0,17639.95833,1207359.0,119.0,33.0,13.0,5.0,1.0


In [None]:
data.drop(columns='credit.policy', inplace=True)

In [None]:
data.drop(columns=['delinq.2yrs', 'pub.rec', 'inq.last.6mths'], inplace=True)

In [None]:
# Features: every column except the last one (not.fully.paid)
x = data.iloc[:, :-1]

In [None]:
x

Unnamed: 0,int.rate,installment,log.annual.inc,dti,fico,days.with.cr.line,revol.bal,revol.util
0,0.1189,829.10,11.350407,19.48,737,5639.958333,28854,52.1
1,0.1071,228.22,11.082143,14.29,707,2760.000000,33623,76.7
2,0.1357,366.86,10.373491,11.63,682,4710.000000,3511,25.6
3,0.1008,162.34,11.350407,8.10,712,2699.958333,33667,73.2
4,0.1426,102.92,11.299732,14.97,667,4066.000000,4740,39.5
...,...,...,...,...,...,...,...,...
9573,0.1461,344.76,12.180755,10.39,672,10474.000000,215372,82.1
9574,0.1253,257.70,11.141862,0.21,722,4380.000000,184,1.1
9575,0.1071,97.81,10.596635,13.09,687,3450.041667,10036,82.9
9576,0.1600,351.58,10.819778,19.18,692,1800.000000,0,3.2


In [None]:
# Target: the last column, not.fully.paid
y = data.iloc[:, -1]

In [None]:
y

0       0
1       0
2       0
3       0
4       0
       ..
9573    1
9574    1
9575    1
9576    1
9577    1
Name: not.fully.paid, Length: 9578, dtype: int64

In [None]:
from sklearn.model_selection import train_test_split
# Hold out 30% of the rows for testing; random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
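
The split above is purely random, so the share of not.fully.paid = 1 rows in the test set can drift from the share in the full data. One option, not used in this notebook, is the stratify argument of train_test_split. A minimal sketch on made-up labels with a 16% positive rate (close to the mean of not.fully.paid reported by describe()); demo_X and demo_y are hypothetical stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 16% positives
demo_y = np.array([0] * 840 + [1] * 160)
demo_X = np.arange(len(demo_y)).reshape(-1, 1)

# stratify=demo_y keeps the class proportions the same in both parts
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    demo_X, demo_y, test_size=0.3, random_state=42, stratify=demo_y
)
print(ys_train.mean(), ys_test.mean())
```

Both printed shares should sit at or very near 0.16, whichever random_state is chosen.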

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

In [None]:
from sklearn.metrics import classification_report,accuracy_score

In [None]:
logreg = LogisticRegression()
knn = KNeighborsClassifier()
dt = DecisionTreeClassifier()
svm = SVC()

In [None]:
def mymodel(model):
    """Fit the model on the training split and print a report for the test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    return model

In [None]:
mymodel(logreg)

              precision    recall  f1-score   support

           0       0.84      1.00      0.91      2408
           1       1.00      0.00      0.00       466

    accuracy                           0.84      2874
   macro avg       0.92      0.50      0.46      2874
weighted avg       0.86      0.84      0.76      2874
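
An accuracy of 0.84 looks good until it is compared with a model that always predicts class 0. The sketch below rebuilds labels with the same support as the report above (2408 zeros, 466 ones) and scores scikit-learn's DummyClassifier on them; the feature matrix is a placeholder because this baseline ignores the features.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Labels with the same class counts as the test set in the report
y_demo = np.array([0] * 2408 + [1] * 466)
X_demo = np.zeros((len(y_demo), 1))  # placeholder features, unused by the baseline

# Always predict the most frequent class
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_demo, y_demo)
baseline_accuracy = baseline.score(X_demo, y_demo)
print(round(baseline_accuracy, 2))
```

The baseline accuracy is 2408 / 2874, which also rounds to 0.84, so recall and F1 for class 1 say more about model quality here than accuracy does.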



In [None]:
mymodel(knn)


              precision    recall  f1-score   support

           0       0.84      0.96      0.90      2408
           1       0.19      0.05      0.08       466

    accuracy                           0.81      2874
   macro avg       0.52      0.50      0.49      2874
weighted avg       0.73      0.81      0.76      2874
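
KNN relies on distances, so a column with a large range such as revol.bal (up to about 1.2 million) can dominate one with a small range such as int.rate (0.06 to about 0.22). A common remedy, not applied in this notebook, is to standardize the features inside a pipeline. A sketch on synthetic data; the two columns and the labelling rule are made up to mimic that scale gap:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
small = rng.uniform(0.06, 0.22, n)      # mimics the range of int.rate
large = rng.uniform(0, 1_000_000, n)    # mimics the range of revol.bal
X_syn = np.column_stack([small, large])
# Made-up rule: the label depends only on the small-range column
y_syn = (small > 0.14).astype(int)

# First 400 rows for training, last 100 for testing
raw_knn = KNeighborsClassifier().fit(X_syn[:400], y_syn[:400])
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier())
scaled_knn.fit(X_syn[:400], y_syn[:400])

raw_acc = raw_knn.score(X_syn[400:], y_syn[400:])
scaled_acc = scaled_knn.score(X_syn[400:], y_syn[400:])
print(raw_acc, scaled_acc)
```

Without scaling, the neighbours are chosen almost entirely by the large column, which carries no signal in this toy setup. The same consideration applies to SVC with non-linear kernels.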



In [None]:
mymodel(dt)

              precision    recall  f1-score   support

           0       0.84      0.80      0.82      2408
           1       0.18      0.23      0.20       466

    accuracy                           0.71      2874
   macro avg       0.51      0.52      0.51      2874
weighted avg       0.74      0.71      0.72      2874



In [None]:
mymodel(svm)

              precision    recall  f1-score   support

           0       0.84      1.00      0.91      2408
           1       0.00      0.00      0.00       466

    accuracy                           0.84      2874
   macro avg       0.42      0.50      0.46      2874
weighted avg       0.70      0.84      0.76      2874
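
A recall of 0.00 for class 1 means the model never flags a loan as not fully paid. One common response to this kind of imbalance, which this notebook does not try, is the class_weight='balanced' option that both LogisticRegression and SVC accept. A sketch on synthetic data from make_classification with roughly 16% positives; none of these numbers come from the loan data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data (about 84% class 0)
X_imb, y_imb = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.84], flip_y=0.05, random_state=0,
)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    X_imb, y_imb, test_size=0.3, random_state=42, stratify=y_imb
)

plain = LogisticRegression(max_iter=1000).fit(Xi_train, yi_train)
# 'balanced' weights each class inversely to its frequency in the training labels
weighted = LogisticRegression(max_iter=1000, class_weight='balanced')
weighted.fit(Xi_train, yi_train)

plain_recall = recall_score(yi_test, plain.predict(Xi_test))
weighted_recall = recall_score(yi_test, weighted.predict(Xi_test))
print(plain_recall, weighted_recall)
```

Reweighting typically raises recall on the minority class at the cost of precision, so both should be checked before preferring one model.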



In [None]:
svm = SVC(kernel='sigmoid')

In [None]:
mymodel(svm)

              precision    recall  f1-score   support

           0       0.84      0.87      0.86      2408
           1       0.18      0.15      0.16       466

    accuracy                           0.75      2874
   macro avg       0.51      0.51      0.51      2874
weighted avg       0.73      0.75      0.74      2874



In [None]:
svm = SVC(kernel='poly')

In [None]:
mymodel(svm)

              precision    recall  f1-score   support

           0       0.84      1.00      0.91      2408
           1       0.00      0.00      0.00       466

    accuracy                           0.84      2874
   macro avg       0.42      0.50      0.46      2874
weighted avg       0.70      0.84      0.76      2874



In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
param_grid = {'C':[1,10],'gamma':[1,10]}

In [None]:
gs = GridSearchCV(svm,param_grid,verbose=3)

In [None]:
gs.fit(X_train,y_train)

Fitting 5 folds for each of 4 candidates, totalling 20 fits
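
By default GridSearchCV ranks candidates with the estimator's score method, which is accuracy for SVC, so on imbalanced labels it can favor a model that only predicts class 0. The scoring argument selects a different metric. A sketch on synthetic data, with scaling placed inside a pipeline so that it is refit on each fold; the grid values and the data are illustrative assumptions, not tuned for the loan data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small synthetic, imbalanced data set (about 84% class 0)
X_gs, y_gs = make_classification(
    n_samples=300, n_features=6, weights=[0.84], random_state=0
)

pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC())])
# Pipeline parameters are addressed as <step name>__<parameter>
demo_grid = {'svc__C': [1, 10], 'svc__gamma': [0.01, 0.1]}

# scoring='f1' ranks candidates by F1 on class 1 instead of accuracy
demo_gs = GridSearchCV(pipe, demo_grid, scoring='f1', cv=3)
demo_gs.fit(X_gs, y_gs)
print(demo_gs.best_params_)
```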


In [None]:
gs.best_params_

In [None]:
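# gs is already fitted above; predict with its refit best estimator instead of rerunning the search
print(classification_report(y_test, gs.predict(X_test)))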

In [None]:
# gamma has no effect with the linear kernel, so only C is set
svm = SVC(kernel='linear', C=1)

In [None]:
mymodel(svm)