# Loan Classifier Project

The following algorithms were used to build prediction models:
- k-Nearest Neighbour
- Decision Tree
- Support Vector Machine
- Logistic Regression

The results are reported as the accuracy of each classifier, using the following metrics where applicable:
- Jaccard index
- F1-score
- LogLoss
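
As a rough illustration of what these three metrics measure, the sketch below computes them by hand for a small binary example. The labels, predictions, and probabilities are made up for illustration, and the helper names (`confusion_counts`, `jaccard_index`, `f1`, `log_loss`) are hypothetical, not part of the project code.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def jaccard_index(y_true, y_pred):
    """Overlap of predicted and actual positives: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp + fn)

def f1(y_true, y_pred):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

def log_loss(y_true, p_pos):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(p if t == 1 else 1 - p) for t, p in zip(y_true, p_pos))
    return -total / len(y_true)

# Made-up example: five rows, predictions thresholded at 0.5.
y_true = [1, 1, 1, 0, 0]
p_pos = [0.9, 0.8, 0.4, 0.6, 0.2]   # predicted probability of class 1
y_pred = [1 if p >= 0.5 else 0 for p in p_pos]

print('Jaccard: %.3f' % jaccard_index(y_true, y_pred))
print('F1:      %.3f' % f1(y_true, y_pred))
print('LogLoss: %.3f' % log_loss(y_true, p_pos))
```

Jaccard and F1 only need hard class predictions, while log loss needs predicted probabilities, so it applies only to models that produce them.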

Dataset Details:
- 346 rows
- 10 columns

---

[Code on IBM Watson Studio](https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/ee3cecd1-8813-4c59-90ad-fe4e431ef407/view?access_token=367cb53e15dad3acdc058045e4cd28f69709fc25521840024d515aacda57ea74)

### Clean
- check data types
- change dtypes with astype
- reorganize dataframe

### Prep
- select features
- normalize data using StandardScaler
- split data with train_test_split
- preprocessing - fit_transform, LabelEncoder, StandardScaler
- model_selection - train_test_split
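
The two numeric prep steps can be sketched without any library, to show what they do to the data. This is a minimal stand-in under stated assumptions: `standardize` and `split_rows` are hypothetical helpers, the `ages` values are made up, and the project itself uses scikit-learn's `StandardScaler` and `train_test_split` rather than this code.

```python
import math
import random
import statistics

def standardize(column):
    """Z-score a column: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def split_rows(rows, test_size=0.2, seed=4):
    """Shuffle row indices reproducibly, then hold out a test fraction."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * len(rows))   # round the test count up
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

# Made-up ages, standing in for one feature column.
ages = [25, 30, 35, 40, 45, 50, 28, 33, 38, 43]
z = standardize(ages)              # mean 0, standard deviation 1
train, test = split_rows(z)        # 8 training rows, 2 test rows
print(len(train), len(test))
```

Standardizing matters most for distance-based and margin-based models such as KNN and SVM, where a feature on a large scale would otherwise dominate the others.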

### Model
- linear_model - LogisticRegression
- cluster - KMeans
- neighbors - KNeighborsClassifier
- tree - DecisionTreeClassifier
- svm - SVC

### Evaluate
- metrics - r2_score, classification_report, jaccard_similarity_score, f1_score, log_loss

---

In [159]:
import pandas as pd
import math
from sklearn import preprocessing, model_selection, neighbors, metrics, tree, svm, linear_model

df = pd.read_csv('loan_train.csv')

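# clean: keep only the columns used below, in a fixed order
rearrange_cols = ['Principal', 'terms', 'age', 'education', 'Gender', 'loan_status']
df = df[rearrange_cols].copy()   # copy so the encoded columns can be assigned safely
df_visual = df.copy()            # untouched copy taken before encoding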

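# prep: encode the categorical columns as integers
# (labels passed to fit must match the CSV values exactly, including spelling;
#  transform raises a ValueError for labels it has not seen)
label_encoder = preprocessing.LabelEncoder()
df['Gender'] = label_encoder.fit(['male', 'female']).transform(df['Gender'].values)
df['education'] = label_encoder.fit(['High School or Below', 'college', 'Bechalor', 'Master or Above']).transform(df['education'].values)
df['loan_status'] = label_encoder.fit_transform(df['loan_status'].values)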

features = ['Principal', 'terms','age', 'education', 'Gender']
y = df['loan_status'].values
x = df[features].values

In [171]:
# Prep - KNN
x_KNN = preprocessing.StandardScaler().fit_transform(x)
x_train_KNN, x_test_KNN, y_train_KNN, y_test_KNN = model_selection.train_test_split(x_KNN,y, test_size=0.2, random_state=4)

# Model - KNN
# heuristic: k = ceil(ln(number of rows))
k = math.ceil(math.log(df.shape[0]))

KNN = neighbors.KNeighborsClassifier(n_neighbors=k).fit(x_train_KNN,y_train_KNN)
y_hat_KNN = KNN.predict(x_test_KNN)
y_prob_KNN = KNN.predict_proba(x_test_KNN)

# Evaluate - KNN
jaccard_score_KNN = metrics.jaccard_similarity_score(y_test_KNN, y_hat_KNN)
f1_score_KNN = metrics.f1_score(y_test_KNN, y_hat_KNN)
class_report_KNN = metrics.classification_report(y_test_KNN, y_hat_KNN)

print('Jaccard Score (KNN): %.03f' % jaccard_score_KNN)
print('F1 Score (KNN): %.03f' % f1_score_KNN)
print('\n Classification Report: KNN \n', class_report_KNN)

Jaccard Score: 0.714
F1 Score: 0.828
Log Loss: 9.868

 Classification Report: KNN 
               precision    recall  f1-score   support

           0       0.22      0.13      0.17        15
           1       0.79      0.87      0.83        55

   micro avg       0.71      0.71      0.71        70
   macro avg       0.50      0.50      0.50        70
weighted avg       0.67      0.71      0.69        70
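
The KNN cell picks `k` as the natural log of the row count, rounded up. A quick worked check, assuming the 346 rows stated under Dataset Details:

```python
import math

n_rows = 346                        # dataset size stated in this README
k = math.ceil(math.log(n_rows))     # ln(346) is about 5.85, rounded up
print(k)                            # prints 6
```

This is only a rule of thumb; the notebook does not compare it against other values of `k`.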



In [176]:
# Prep - Decision Tree
x_train_tree, x_test_tree, y_train_tree, y_test_tree = model_selection.train_test_split(x,y, test_size=0.2, random_state=4)

# Model - Decision Tree
decision_tree = tree.DecisionTreeClassifier(criterion='entropy', max_depth=3).fit(x_train_tree, y_train_tree)
y_hat_tree = decision_tree.predict(x_test_tree)
y_prob_tree = decision_tree.predict_proba(x_test_tree)

jaccard_score_tree = metrics.jaccard_similarity_score(y_test_tree, y_hat_tree)
f1_score_tree = metrics.f1_score(y_test_tree, y_hat_tree)
class_report_tree = metrics.classification_report(y_test_tree, y_hat_tree)
log_loss_tree = metrics.log_loss(y_test_tree, y_prob_tree)

print('Jaccard Score (Tree): %.03f' % jaccard_score_tree)
print('F1 Score (Tree): %.03f' % f1_score_tree)
print('\n Classification Report: Tree \n', class_report_tree)

Jaccard Score: 0.786
F1 Score: 0.880

 Classification Report: Tree 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        15
           1       0.79      1.00      0.88        55

   micro avg       0.79      0.79      0.79        70
   macro avg       0.39      0.50      0.44        70
weighted avg       0.62      0.79      0.69        70
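
The report above shows recall 1.00 for class 1 and 0.00 for class 0, which is what a model that predicts class 1 for every test row would produce. A minimal sketch, using only the support counts from the report (15 and 55), reproduces the headline scores under that assumption. It also assumes that `jaccard_similarity_score` behaves like plain accuracy for binary labels, as documented for the scikit-learn versions that still ship it.

```python
# Support counts taken from the classification report.
n_neg, n_pos = 15, 55
y_true = [0] * n_neg + [1] * n_pos
y_pred = [1] * len(y_true)          # assumption: every row predicted as class 1

# Accuracy: fraction of rows where prediction and label agree.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Precision, recall and F1 for class 1.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
precision = tp / sum(y_pred)
recall = tp / n_pos
f1 = 2 * precision * recall / (precision + recall)

print('%.3f %.3f' % (accuracy, f1))   # prints 0.786 0.880
```

Scores of 0.786 and 0.880 are therefore what the class balance alone gives, so they say little about how well class 0 is separated.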



In [180]:
# Prep - SVM
x_SVM = preprocessing.StandardScaler().fit_transform(x)
x_train_SVM, x_test_SVM, y_train_SVM, y_test_SVM = model_selection.train_test_split(x_SVM, y, test_size=0.2, random_state=4)

# Model
SVM = svm.SVC(kernel='linear').fit(x_train_SVM, y_train_SVM)
y_hat_SVM = SVM.predict(x_test_SVM)

# Evaluation
jaccard_score_SVM = metrics.jaccard_similarity_score(y_test_SVM, y_hat_SVM)
class_report_SVM = metrics.classification_report(y_test_SVM, y_hat_SVM)
f1_score_SVM = metrics.f1_score(y_test_SVM, y_hat_SVM)


print('Jaccard Score (SVM): %.03f' % jaccard_score_SVM)
print('F1 Score (SVM): %.03f' % f1_score_SVM)
print('\n Classification Report: SVM \n', class_report_SVM)

Jaccard Score (SVM): 0.786
F1 Score (SVM): 0.880

 Classification Report: SVM 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        15
           1       0.79      1.00      0.88        55

   micro avg       0.79      0.79      0.79        70
   macro avg       0.39      0.50      0.44        70
weighted avg       0.62      0.79      0.69        70



In [179]:
# Prep - Logistic Regression
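# NOTE: x_LR (the standardized features) is computed but not used; the split
# below passes the raw x, so the model is trained on unscaled features.
# Pass x_LR to train_test_split instead to train on the scaled features.
x_LR = preprocessing.StandardScaler().fit_transform(x)
x_train_LR, x_test_LR, y_train_LR, y_test_LR = model_selection.train_test_split(x, y, test_size=0.2, random_state=4)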

# Model
log_reg = linear_model.LogisticRegression(C=0.01, solver='liblinear').fit(x_train_LR, y_train_LR)
y_hat_LR = log_reg.predict(x_test_LR)
y_prob_LR = log_reg.predict_proba(x_test_LR)

# Evaluation
f1_score_LR = metrics.f1_score(y_test_LR, y_hat_LR)
jaccard_score_LR = metrics.jaccard_similarity_score(y_test_LR, y_hat_LR)
log_loss_score_LR = metrics.log_loss(y_test_LR, y_prob_LR)

print('Jaccard Score (LR): %.03f' % jaccard_score_LR)
print('F1 Score (LR): %.03f' % f1_score_LR)
print('Log Loss (LR): %.03f' % log_loss_score_LR)

Jaccard Score: 0.786
F1 Score: 0.880
Log Loss: 0.556
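
To give the log loss figure a reference point, the sketch below computes the log loss of a constant predictor that assigns every test row the class-1 base rate. The 15/55 class counts are taken from the classification reports in this README; using the test-set base rate is an assumption made for illustration (in practice it would be estimated from the training data).

```python
import math

n_neg, n_pos = 15, 55                       # test-set support per class
n = n_neg + n_pos
p = n_pos / n                               # constant predicted P(class 1)

# Each positive row contributes -ln(p), each negative row -ln(1 - p).
baseline_log_loss = -(n_pos * math.log(p) + n_neg * math.log(1 - p)) / n
print('%.3f' % baseline_log_loss)
```

Lower log loss is better, and a value is most informative when read against a baseline of this kind.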


## Evaluation:

In [186]:
print('Jaccard Score (KNN): %.05f' % jaccard_score_KNN)
print('Jaccard Score (Tree): %.05f' % jaccard_score_tree)
print('Jaccard Score (LR): %.05f' % jaccard_score_LR)
print('Jaccard Score (SVM): %.05f' % jaccard_score_SVM)


print('F1 Score (KNN): %.05f' % f1_score_KNN)
print('F1 Score (Tree): %.05f' % f1_score_tree)
print('F1 Score (LR): %.05f' % f1_score_LR)
print('F1 Score (SVM): %.05f' % f1_score_SVM)

print('Log Loss (LR): %.05f' % log_loss_score_LR)

print('\n Classification Report: KNN \n', class_report_KNN)
print('\n Classification Report: Tree \n', class_report_tree)
print('\n Classification Report: SVM \n', class_report_SVM)

Jaccard Score (KNN): 0.71429
Jaccard Score (Tree): 0.78571
Jaccard Score (LR): 0.78571
Jaccard Score (SVM): 0.78571
F1 Score (KNN): 0.82759
F1 Score (Tree): 0.88000
F1 Score (LR): 0.88000
F1 Score (SVM): 0.88000
Log Loss (LR): 0.55586

 Classification Report: KNN 
               precision    recall  f1-score   support

           0       0.22      0.13      0.17        15
           1       0.79      0.87      0.83        55

   micro avg       0.71      0.71      0.71        70
   macro avg       0.50      0.50      0.50        70
weighted avg       0.67      0.71      0.69        70


 Classification Report: Tree 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        15
           1       0.79      1.00      0.88        55

   micro avg       0.79      0.79      0.79        70
   macro avg       0.39      0.50      0.44        70
weighted avg       0.62      0.79      0.69        70


 Classification Report: SVM 
               prec