# Neural Networks Project - Credit Risk Analysis

An experimental study on applying a neural network model to a real-world problem.

Neural Networks course, taught by Germano Vasconcelos

Team:  
- Lucas Alves Rufno  
- Rodrigo de Lima Oliveira  
- Ullayne Fernandes Farias de Lima 
- Vitor Jose da Silva Lima

## Ensemble of Multilayer Perceptrons

A series of experiments evaluating credit risk analysis with an ensemble of statistical models.

### Imports:
Libraries used to solve the problem.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from scipy.stats import ks_2samp as ksTest
from sklearn.utils import shuffle
from sklearn import metrics
import numpy as np
import pandas as pd
```

### Read dataset:
Read the train, validation, and test sets from HDF5 (`.h5`) files with pandas, each stored under its own key.

```python
tr = pd.read_hdf("datasets/repeat/Train.h5", key='train')
va = pd.read_hdf("datasets/repeat/Validation.h5", key='validation')
te = pd.read_hdf("datasets/repeat/Test.h5", key='test')
```

### Modify dataset:
Merge the training and validation sets so that only two sets remain (train and test). Validation data is instead split off inside each MLP by `early_stopping=True` with `validation_fraction=0.1`.

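```python
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames
tr = pd.concat([tr, va])
tr = shuffle(tr)
# Features: every column except the last (assumes the target IND_BOM_1_1 is last)
tr1 = tr.iloc[:, :-1]
tr2 = tr['IND_BOM_1_1']
```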
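A minimal sketch of the merge-and-split step on a toy DataFrame. Only the target name `IND_BOM_1_1` comes from the real data; the feature columns `f1`, `f2` and all values are made up. Selecting features by dropping the target by name, rather than by position with `iloc[:, :-1]`, keeps the split correct even if the column order changes; `ignore_index=True` and `random_state=0` are additions for unique row labels and a reproducible shuffle.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy train/validation frames; only the target name is from the real data
tr = pd.DataFrame({"f1": [0.1, 0.4, 0.7], "f2": [1, 0, 1], "IND_BOM_1_1": [1, 0, 1]})
va = pd.DataFrame({"f1": [0.2, 0.9], "f2": [0, 1], "IND_BOM_1_1": [0, 1]})

# Stack the two sets and renumber rows so index labels stay unique
merged = pd.concat([tr, va], ignore_index=True)
# Shuffle the rows; a fixed random_state makes the order reproducible
merged = shuffle(merged, random_state=0)

# Split features and target by column name instead of by position
X = merged.drop(columns=["IND_BOM_1_1"])
y = merged["IND_BOM_1_1"]

print(X.shape, y.shape)
```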

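### Training the models:
Define the base models that will be combined in the ensemble. Each function builds one classifier and fits it on the merged training set.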

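```python
def mlp1():
    # One hidden layer of 200 ReLU units, trained with constant-rate SGD.
    # early_stopping holds out validation_fraction (10%) of the training data.
    clf = MLPClassifier(
        hidden_layer_sizes=(200,),
        solver='sgd',
        activation='relu',
        learning_rate='constant',
        learning_rate_init=0.03,
        early_stopping=True,
        validation_fraction=0.1)
    # n_layers_ is computed by fit() from hidden_layer_sizes
    # (input + 1 hidden + output = 3), so it is not assigned by hand
    clf.fit(tr1, tr2)
    #rClass = clf.predict(te.iloc[:,:-1])
    #rProba = clf.predict_proba(te.iloc[:,:-1])[:,1]
    return clf
```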

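```python
def mlp2():
    clf = MLPClassifier(
        hidden_layer_sizes=(200,),
        solver='sgd',
        activation='relu',
        learning_rate='constant',
        learning_rate_init=0.03,
        early_stopping=True,
        validation_fraction=0.1)
    # Same settings as mlp1. Setting clf.n_layers_ = 2 before fit() would have no
    # effect, since fit() recomputes n_layers_ from hidden_layer_sizes; mlp1 and mlp2
    # differ only through randomness (weight init, validation split, batch shuffling)
    clf.fit(tr1, tr2)
    #rClass = clf.predict(te.iloc[:,:-1])
    #rProba = clf.predict_proba(te.iloc[:,:-1])[:,1]
    return clf
```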
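Assigning `n_layers_` before `fit` has no effect: it is a fitted attribute that counts the input layer, the hidden layers, and the output layer, and `fit` recomputes it from `hidden_layer_sizes`. The sketch below checks this on a small synthetic problem from `make_classification`; the data, the helper name `depth_after_fit`, and the tiny `max_iter` are illustrative assumptions, not the project's settings.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small synthetic binary problem standing in for the credit data
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

def depth_after_fit(hidden_layer_sizes, preset=None):
    """Hypothetical helper: fit an MLP and report n_layers_ afterwards."""
    clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=20, random_state=0)
    if preset is not None:
        clf.n_layers_ = preset  # assigning the attribute by hand before fitting
    with warnings.catch_warnings():
        # max_iter is deliberately tiny, so silence the convergence warning
        warnings.simplefilter("ignore", ConvergenceWarning)
        clf.fit(X, y)
    return clf.n_layers_

print(depth_after_fit((200,), preset=2))  # fit() overwrites the preset value
print(depth_after_fit((200, 100)))        # two hidden layers
```

To actually change the depth, pass more entries in `hidden_layer_sizes`, e.g. `(200, 100)`.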

```python
def mlp3():
    # Wider network: one hidden layer of 1000 ReLU units, otherwise the same settings as mlp1
    clf = MLPClassifier(
        hidden_layer_sizes=(1000,),
        solver='sgd',
        activation='relu',
        learning_rate='constant',
        learning_rate_init=0.03,
        early_stopping=True,
        validation_fraction=0.1)
    clf.fit(tr1, tr2)
    #rClass = clf.predict(te.iloc[:,:-1])
    #rProba = clf.predict_proba(te.iloc[:,:-1])[:,1]
    return clf
```
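The three MLP functions differ only in `hidden_layer_sizes`, so one way to avoid the repetition is a single factory that takes the architecture as an argument. This is a sketch, not the project's code: the name `make_mlp`, its `random_state` parameter, and the synthetic data are assumptions. With `early_stopping=True`, scikit-learn holds out `validation_fraction` of the training data and records one validation accuracy per epoch in `validation_scores_`, keeping the best one in `best_validation_score_`.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

def make_mlp(hidden_layer_sizes, random_state=None):
    """Hypothetical factory mirroring the settings of mlp1/mlp2/mlp3."""
    return MLPClassifier(
        hidden_layer_sizes=hidden_layer_sizes,
        solver='sgd',
        activation='relu',
        learning_rate='constant',
        learning_rate_init=0.03,
        early_stopping=True,      # hold out part of the training data...
        validation_fraction=0.1,  # ...10% of it, to decide when to stop
        random_state=random_state)

# Synthetic stand-in for the credit data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {}
for name, sizes in [('mlp1', (200,)), ('mlp3', (1000,))]:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        models[name] = make_mlp(sizes, random_state=0).fit(X, y)

for name, clf in models.items():
    # One validation score per epoch; training stopped after n_iter_ epochs
    print(name, clf.n_iter_, len(clf.validation_scores_), clf.best_validation_score_)
```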

```python
def random_forest():
    # Random Forest with 200 trees; other hyperparameters keep scikit-learn defaults
    clf = RandomForestClassifier(n_estimators=200)
    clf.fit(tr1, tr2)
    #rClass = clf.predict(te.iloc[:,:-1])
    #rProba = clf.predict_proba(te.iloc[:,:-1])[:,1]
    return clf
```

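```python
def med(rProba, rClass):
    """Print test-set metrics for predicted probabilities (rProba) and classes (rClass)."""
    y = te['IND_BOM_1_1']
    # MSE on probabilities (equivalent to the Brier score)
    print('MSE:', metrics.mean_squared_error(y, rProba))
    # Note: this compares the 0/1 labels with the predicted probabilities; when all
    # probabilities lie strictly in (0, 1) it reduces to the larger class share of the
    # test set, not the credit-scoring KS between the two classes' score distributions
    print('KS Test:', ksTest(y, rProba)[0])
    print('ROC AUC:', metrics.roc_auc_score(y, rProba))
    print('Accuracy:', metrics.accuracy_score(y, rClass))
    print('Precision, Recall and FScore:')
    # [:-1] drops the support entry, which is None when average='binary'
    print(metrics.precision_recall_fscore_support(y, rClass, average='binary')[:-1])
    print('Confusion Matrix:')
    print(metrics.confusion_matrix(y, rClass))
```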
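In `med`, `ksTest(te['IND_BOM_1_1'], rProba)` compares the distribution of the 0/1 labels with the distribution of the predicted probabilities. When every probability lies strictly between 0 and 1, that statistic equals the larger of the two class shares, whatever the model; this is why both runs below print the identical KS value. In credit scoring, the KS statistic usually compares the score distribution of one class against that of the other. The sketch below contrasts the two on made-up scores (the data and variable names are assumptions) and also checks that the MSE printed on probabilities is the Brier score.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn import metrics

rng = np.random.default_rng(0)

# Made-up test set: 650 positives (label 1) and 350 negatives (label 0)
y = np.r_[np.ones(650, dtype=int), np.zeros(350, dtype=int)]
# Made-up scores kept strictly inside (0, 1); positives tend to score higher
proba = np.clip(np.r_[rng.normal(0.65, 0.15, 650), rng.normal(0.45, 0.15, 350)], 0.01, 0.99)

# As computed in med(): labels vs probabilities, equal to the larger class share
ks_labels_vs_scores = ks_2samp(y, proba).statistic
# Usual credit-scoring KS: score distribution of positives vs that of negatives
ks_pos_vs_neg = ks_2samp(proba[y == 1], proba[y == 0]).statistic

print(ks_labels_vs_scores)  # 650 / 1000 for any scores strictly inside (0, 1)
print(ks_pos_vs_neg)        # depends on how well the scores separate the classes

# The MSE that med() prints on probabilities is the Brier score
print(np.isclose(metrics.mean_squared_error(y, proba),
                 metrics.brier_score_loss(y, proba)))
```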

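### Training the ensemble:
Train three MLPs (`mlp1` and `mlp2` with one hidden layer of 200 units, `mlp3` with one hidden layer of 1000 units) and a Random Forest, then combine them in a soft-voting ensemble.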

```python
clf1 = mlp1()
#rClass = clf1.predict(te.iloc[:,:-1])
#rProba = clf1.predict_proba(te.iloc[:,:-1])[:,1]
#med(rProba,rClass)
```

```python
clf2 = mlp2()
#rClass = clf2.predict(te.iloc[:,:-1])
#rProba = clf2.predict_proba(te.iloc[:,:-1])[:,1]
#med(rProba,rClass)
```

```python
clf3 = mlp3()
#rClass = clf3.predict(te.iloc[:,:-1])
#rProba = clf3.predict_proba(te.iloc[:,:-1])[:,1]
#med(rProba,rClass)
```

```python
clf4 = random_forest()
#rClass = clf4.predict(te.iloc[:,:-1])
#rProba = clf4.predict_proba(te.iloc[:,:-1])[:,1]
#med(rProba,rClass)
```

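```python
# Soft voting averages the predict_proba outputs of the four models.
# VotingClassifier.fit clones and refits every estimator, so the models are trained
# again here (clf1 to clf4 themselves are left untouched).
eclf = VotingClassifier(estimators=[('mlp1', clf1), ('mlp2', clf2), ('mlp3', clf3), ('random forest', clf4)], voting='soft')
eclf = eclf.fit(tr1, tr2)
rClass = eclf.predict(te.iloc[:, :-1])
rProba = eclf.predict_proba(te.iloc[:, :-1])[:, 1]
med(rProba, rClass)
```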


```
MSE: 0.21437902321
KS Test: 0.655450266193
ROC AUC: 0.696936151051
Accuracy: 0.648523094
Precision, Recall and FScore:
(0.7682971080227875, 0.66401668391507507, 0.71236079803519159)
Confusion Matrix:
[[20753 12771]
 [21427 42347]]
```
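With `voting='soft'`, `VotingClassifier` averages the `predict_proba` outputs of its fitted estimators and predicts the class with the highest average probability. Because `fit` clones and refits every estimator, the fitted copies live in `eclf.estimators_`, not in the objects passed in. The sketch below checks both points on synthetic data; the two small base models (a logistic regression and a random forest) are stand-ins chosen for speed, not the project's MLPs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the credit data
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

base = [('lr', LogisticRegression(max_iter=1000)),
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0))]
eclf = VotingClassifier(estimators=base, voting='soft').fit(X, y)

# Average the class probabilities of the fitted clones (equal weights)
manual = np.mean([est.predict_proba(X) for est in eclf.estimators_], axis=0)
print(np.allclose(manual, eclf.predict_proba(X)))

# Predicted class = column with the highest averaged probability
print(np.array_equal(eclf.classes_[manual.argmax(axis=1)], eclf.predict(X)))

# fit() works on clones, so the original estimator objects stay unfitted
print(eclf.estimators_[0] is base[0][1], hasattr(base[0][1], 'coef_'))
```

Unequal contributions can be set with the `weights` parameter of `VotingClassifier`, which turns the plain mean into a weighted average.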


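### Evaluating the model:
Evaluate the predictions on the test set. This cell prints the same metrics as `med`; its recorded output comes from a different execution of the notebook, so the figures differ from those printed above.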

```python
# Same metrics as in the cell above, computed by the med() helper
med(rProba, rClass)
```

```
MSE: 0.22201866924001076
KS Test: 0.6554502661925219
ROC AUC: 0.6789338906564995
Accuracy: 0.6346481942074863
Precision, Recall and FScore:
(0.7577152039735583, 0.6506413271866278, 0.7001079840723494)
Confusion Matrix:
[[20256 13268]
 [22280 41494]]
```
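As a cross-check, the printed accuracy, precision, recall, and F-score can be recomputed by hand from the confusion matrix just above. scikit-learn's `confusion_matrix` puts true labels in rows and predicted labels in columns (class 0 first), so with class 1 as the positive label, TN = 20256, FP = 13268, FN = 22280, and TP = 41494. The row sums also give the share of class 1 in the test set, 63774 / 97298, which matches the printed KS Test value.

```python
import numpy as np

# Confusion matrix printed above (rows: true 0/1, columns: predicted 0/1)
cm = np.array([[20256, 13268],
               [22280, 41494]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
positive_share = (tp + fn) / cm.sum()  # share of class 1 in the test set

print(accuracy, precision, recall, f1)
print(positive_share)
```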
