# CodeJam 2017 - Datadive - McGill
---

# Predicting whether a loan is going to default given data from the time of the request

## Goal
Our goal with this project was to train a neural network to recognize, given a set of inputs, whether a loan was going to default or not. 
We did not know beforehand whether this was feasible. We therefore accepted that we might have to conclude that our data was insufficient or incomplete, or that there was no pattern clear enough to predict the delinquency of a loan with better-than-random accuracy.

## Datasets
We used the provided dataset Lending_Club_Loans.csv, which contains data on loans offered to clients of the Lending Club between 2009 and 2016 (to be verified). 
Among all the available information, we used 5 columns:
- 4 input columns
    - 'annual_inc': Self-reported annual income -- reported at the time at which the loan was requested
    - 'dti': debt-to-income ratio -- reported at the time at which the loan was requested
    - 'installment': amount to be paid monthly -- amount agreed upon between the loan requester and the lender
    - 'home_ownership': 'OWN', 'MORTGAGE', 'RENT', 'OTHER', 'NONE' and 'ANY' -- living situation at the time of the request
- 1 output column
    - 'acc_now_delinq': number of accounts delinquent at the time when this dataset was saved or recorded

If we had more time, we might have used more columns, such as:
- the employment length in years: knowing how long the borrower has held a stable job could improve the accuracy
- the month the borrower's credit line was first opened: combined with the debt-to-income ratio, it might have revealed some correlations
- some columns containing textual information, such as the job title, the 'category' of the reason for the loan, or the reason given -- that would have required natural language processing with sentiment analysis, with which we have limited experience and knowledge

Unfortunately, the borrower's credit score was not available; it would of course have been very valuable data. 

We did not use any other datasets.

## Methodology
### Data treatment

Load the required libraries for the project. 

We use the [pytorch](http://pytorch.org/) framework as well as the [pandas](http://pandas.pydata.org/) and [numpy](http://www.numpy.org/) libraries.

In [1]:
from __future__ import print_function, division
import os
import torch
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

Load the data from the csv file

In [2]:
loan_data = pd.read_csv("/home/vivien/Lending-Club-Loans.csv")

### Data Treatment
We started by selecting the columns that we wanted to use for the input and output data:
- 'annual_inc': Self-reported annual income
- 'dti': debt-to-income ratio
- 'installment': amount to be paid monthly
- 'home_ownership': 'OWN', 'MORTGAGE', 'RENT', 'OTHER', 'NONE' and 'ANY'
    - encoded as one-hot labels, with NONE and ANY treated as missing data  

Then, we transformed the 'home_ownership' text data into numeric one-hot labels and concatenated them with the three numeric columns to form our input data.

In [3]:
#number of input columns; it also sets the number of neurons
#in each layer of the neural network
col_num = 7
param_num = col_num

#getting the relevant input columns from the data
input_data = loan_data.loc[:,['annual_inc',
                              'dti',
                              'installment',
                              'home_ownership'
                             ]].values

#getting the relevant output columns from the data
output_data = loan_data.loc[:,['acc_now_delinq']].values.astype('int')

#temporary array to transform the text data 
#from 'home_ownership' to one hot labels
input_home = []
for i in range(input_data.shape[0]):
    if input_data[i, 3] == "OWN":
        input_home.append([1,0,0,0])  #OWN, #MORTGAGE, #RENT, #OTHER
    elif input_data[i, 3] == "MORTGAGE":
        input_home.append([0,1,0,0])  #OWN, #MORTGAGE, #RENT, #OTHER
    elif input_data[i, 3] == "RENT":
        input_home.append([0,0,1,0])  #OWN, #MORTGAGE, #RENT, #OTHER
    elif input_data[i, 3] == "OTHER":
        input_home.append([0,0,0,1])  #OWN, #MORTGAGE, #RENT, #OTHER
    else:
        input_home.append([0,0,0,0])  #OWN, #MORTGAGE, #RENT, #OTHER

#removing the original 'home_ownership' column from the input_data
input_data = np.delete(input_data, 3, 1)
#adding the home data to the input_data
input_data = np.concatenate((input_data, input_home), axis=1)
input_data = input_data.astype('float')
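
The loop above can also be written as a single vectorized comparison. The following is a minimal sketch rather than the code used in this notebook; the helper name `one_hot_home_ownership` and the `CATEGORIES` constant are assumptions introduced for illustration, and the column order matches the loop (OWN, MORTGAGE, RENT, OTHER).

```python
import numpy as np

# Column order matches the loop above: OWN, MORTGAGE, RENT, OTHER.
CATEGORIES = np.array(["OWN", "MORTGAGE", "RENT", "OTHER"], dtype=object)


def one_hot_home_ownership(values):
    """Hypothetical helper: return an (n, 4) array of 0.0/1.0 one-hot rows.

    Labels outside CATEGORIES (for example NONE or ANY) become all-zero rows.
    """
    column = np.asarray(values, dtype=object).reshape(-1, 1)
    # Broadcasting compares every label against every category at once.
    return (column == CATEGORIES.reshape(1, -1)).astype(float)


encoded = one_hot_home_ownership(["OWN", "RENT", "ANY"])
print(encoded)
```

The result could then be concatenated with the numeric columns exactly as `input_home` is in the cell above.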

### Cleaning the data
Some rows have no value in the output column, so we discard those rows.

In [4]:
mask = output_data >= 0
mask = np.reshape(mask, (-1))
print(mask.shape)

output_data = output_data[mask]
input_data = input_data[mask,:]

print(input_data.shape)
print(output_data.shape)

(887379,)
(887350, 7)
(887350, 1)
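
The mask `output_data >= 0` is applied after the column has been cast to integers, so it presumably works because missing values become negative integers during that cast, which NumPy does not guarantee. A more explicit variant, sketched here under the assumption that missing targets are stored as NaN, tests for NaN before casting. The helper name `drop_missing_targets` is hypothetical.

```python
import numpy as np


def drop_missing_targets(inputs, targets):
    """Hypothetical helper: keep only the rows whose target is not NaN."""
    targets = np.asarray(targets, dtype=float).reshape(-1)
    keep = ~np.isnan(targets)  # True where a target value is present
    # Cast to int only after the missing rows have been removed.
    return np.asarray(inputs)[keep], targets[keep].astype(int).reshape(-1, 1)


# Tiny illustrative arrays (not taken from the dataset).
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([[0.0], [np.nan], [2.0]])
x_clean, y_clean = drop_missing_targets(x, y)
print(x_clean.shape, y_clean.shape)
```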


The output column gives the *number* of delinquent accounts, but we only want to know whether there is at least one. Any value greater than 1 is therefore reduced to 1, so as to have a binary category.

In [5]:
for i in range(output_data.shape[0]):
    if output_data[i,0] > 0:
        output_data[i,0] = 1
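
The same binarization can be done without a Python loop. This is a small sketch of an equivalent vectorized form; the function name `binarize_delinquency` is an assumption made for illustration.

```python
import numpy as np


def binarize_delinquency(counts):
    """Hypothetical helper: map any count above 0 to 1 and leave 0 unchanged."""
    return (np.asarray(counts) > 0).astype(int)


example_counts = np.array([[0], [1], [3]])
binary = binarize_delinquency(example_counts)
print(binary.ravel())
```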

Here are the shapes of the input and output data.  

In [6]:
print("Shape of input_data:")
print(input_data.shape)
print("Shape of output_data:")
print(output_data.shape)

Shape of input_data:
(887350, 7)
Shape of output_data:
(887350, 1)


### Separation of the defaulting data for testing (and for training with different proportions of each class)
In order to see how much data there is for each outcome, we made two sets of arrays, one with the "will default" data and one with the "will not default" data.  
>For reference, 'ndef' corresponds to "will not default".

In [7]:
mask_default = output_data == 1
mask_ndef = output_data == 0
output_data_default = output_data[mask_default]
output_data_ndef = output_data[mask_ndef]

mask_default = np.reshape(mask_default, -1)
mask_ndef = np.reshape(mask_ndef, -1)
input_data_default = input_data[mask_default,:]
input_data_ndef = input_data[mask_ndef,:]

#### Visualization of the data

In [8]:
r_dta = output_data_default.shape[0]/output_data.shape[0]
print("Total number of loans:")
print(input_data.shape[0])
print("Number of defaulting loans:")
print(input_data_default.shape[0])
print("Number of loans NOT defaulting:")
print(input_data_ndef.shape[0])
print("Ratio of default to all")
print('%.6f' % r_dta)
print("In percent:")
print('%.3f' % (r_dta*100) + "%")

Total number of loans:
887350
Number of defaulting loans:
4114
Number of loans NOT defaulting:
883236
Ratio of default to all
0.004636
In percent:
0.464%


We can see that very few loans default compared to the size of the entire dataset.  
**During our first attempts to train the net, we used the entire dataset, but we quickly realized that we would get better results by using a dataset with similar numbers of default and ndef loans.**
This is the reason why we decided to use only as many non-defaulting loans as we had defaulting ones.  
This decision probably has consequences, but we haven't yet taken the time to examine them closely.

---

#### Splitting the data between training and testing sets
As discussed above, we do not use the entire dataset in order to have an equal amount of defaulting and non-defaulting loans.

We separate the data because a neural net cannot be meaningfully tested on the data that it was trained with. The data that is not used to train the neural net is therefore used to test it afterwards.

In [9]:
#create the training and test datasets here
#(each set must contain a share of the defaulting loans)
num_def = input_data_default.shape[0]
half_num_def = int(num_def/2)
input_data_train = np.concatenate((input_data_default[0:half_num_def], input_data_ndef[0:half_num_def]), axis=0)
input_data_train_default = input_data_default
input_data_train_ndef = input_data_ndef[0:num_def]
output_data_train = np.concatenate((output_data_default[0:half_num_def], output_data_ndef[0:half_num_def]))
input_data_test = np.concatenate((input_data_default[half_num_def+1:num_def], input_data_ndef[half_num_def+1:num_def]), axis=0)
input_data_test_default = input_data_default[half_num_def+1:num_def]
input_data_test_ndef = input_data_ndef[half_num_def+1:num_def]
output_data_test = np.concatenate((output_data_default[half_num_def+1:num_def], output_data_ndef[half_num_def+1:num_def]))
output_data_test_default = output_data_default[half_num_def+1:num_def]
output_data_test_ndef = output_data_ndef[half_num_def+1:num_def]
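
Two details of the cell above are worth noting: the test slices start at `half_num_def+1`, so row `half_num_def` ends up in neither set, and `input_data_train_default` holds all defaulting rows, including those later used for testing. The sketch below shows one way to build a balanced split with shuffling and without overlap. It is an illustration under stated assumptions, not the split used for the results in this notebook; `balanced_split`, `train_fraction` and `seed` are hypothetical names.

```python
import numpy as np


def balanced_split(x_default, x_ndef, train_fraction=0.5, seed=0):
    """Hypothetical helper: undersample the majority class, then split.

    Assumes x_ndef has at least as many rows as x_default.
    Labels follow the notebook: 1 = default, 0 = not default.
    """
    x_default = np.asarray(x_default)
    x_ndef = np.asarray(x_ndef)
    rng = np.random.default_rng(seed)
    n = len(x_default)
    # Shuffle the defaulting rows and draw n non-defaulting rows without replacement.
    pos = x_default[rng.permutation(n)]
    neg = x_ndef[rng.choice(len(x_ndef), size=n, replace=False)]
    cut = int(n * train_fraction)
    x_train = np.concatenate((pos[:cut], neg[:cut]), axis=0)
    y_train = np.concatenate((np.ones(cut, dtype=int), np.zeros(cut, dtype=int)))
    # Everything from index cut onwards goes to the test set, so no row is skipped.
    x_test = np.concatenate((pos[cut:], neg[cut:]), axis=0)
    y_test = np.concatenate((np.ones(n - cut, dtype=int), np.zeros(n - cut, dtype=int)))
    return x_train, y_train, x_test, y_test


# Tiny illustrative arrays (not taken from the dataset).
demo_default = np.arange(10).reshape(5, 2) + 100
demo_ndef = np.arange(40).reshape(20, 2)
x_tr, y_tr, x_te, y_te = balanced_split(demo_default, demo_ndef)
print(x_tr.shape, x_te.shape)
```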

### Neural network implementation
For the neural network implementation, we partially followed this [tutorial](http://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html) that offers a good base that we then modified to our needs.

In [10]:
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

#### Declaration of the net
It is made of 4 layers; the number of parameters in each layer depends on the number of columns present in the input data.

In [11]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # each layer is an affine operation: y = Wx + b
        self.fc1 = nn.Linear(col_num,param_num)    # input layer
        self.fc3 = nn.Linear(param_num,param_num)  # hidden layer
        self.fc4 = nn.Linear(param_num,param_num)  # hidden layer
        self.fc2 = nn.Linear(param_num,2)          # output layer: one score per class

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        # ReLU is also applied to the output layer, so both class scores are >= 0
        x = F.relu(self.fc2(x))
        return x

    def num_flat_features(self, x):
        # helper that is not called anywhere in this notebook
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()

For 7 columns in the input data, we have:

In [12]:
print(net)

Net (
  (fc1): Linear (7 -> 7)
  (fc3): Linear (7 -> 7)
  (fc4): Linear (7 -> 7)
  (fc2): Linear (7 -> 2)
)


We can see that the final output is a combination of 2 values.
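
Because the training target is 1 for a defaulting loan and 0 otherwise, the first of the two output values is the score for "will not default" and the second is the score for "will default". The later cells turn these two scores into probabilities with a softmax. As a minimal NumPy sketch of that operation (independent of PyTorch, with `softmax` written here as an illustrative helper):

```python
import numpy as np


def softmax(scores):
    """Convert rows of raw scores into rows of probabilities that sum to 1."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the row maximum does not change the result but avoids overflow.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Two illustrative score pairs (not taken from the trained net).
probs = softmax([[0.0, 0.0], [1.0, 0.0]])
print(probs)
```

Equal scores give 0.5 for each class; a higher first score pushes the first probability above 0.5.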

---
Part of the net is a set of parameters: the weights associated with each piece of information given as input. 
We set the parameters to random values between 0 and 1.

In [13]:
params = list(net.parameters())
for par in params:
    for p in par:
        p.data.uniform_(0, 1)
print(params[0])

Parameter containing:
 0.7907  0.0669  0.1690  0.0038  0.8367  0.0138  0.1079
 0.4388  0.5240  0.8189  0.2457  0.6980  0.8258  0.6703
 0.8013  0.3473  0.1702  0.0716  0.4446  0.6174  0.7090
 0.7707  0.5348  0.4092  0.4479  0.4359  0.5640  0.8792
 0.7390  0.4351  0.2730  0.8421  0.9516  0.1133  0.8564
 0.1610  0.4816  0.5535  0.1207  0.4755  0.3246  0.2840
 0.9842  0.2202  0.6269  0.7427  0.7348  0.0188  0.2262
[torch.FloatTensor of size 7x7]



#### Testing the neural net on random data
The purpose of this is just to make sure that the net runs without errors when it is used.

In [14]:
test_net = Net()
random_input = Variable(torch.randn(5, col_num))
print(random_input)
test_out = test_net(random_input)
print(test_out)

Variable containing:
-0.2852  2.4541 -2.5827 -0.0814  1.9959 -0.0096  0.1887
 0.3453  0.3512  0.9091 -0.8473  0.2835 -0.4485 -0.0090
 0.1916  0.3022 -0.8995 -1.1556  0.3704  0.0765  1.7199
-2.0099  0.1947  1.2209  1.2088  1.7082  0.2674 -1.6513
 0.0421 -0.2005  0.1479 -1.2144  0.6888  0.1857 -0.3128
[torch.FloatTensor of size 5x7]

Variable containing:
 0.0647  0.0000
 0.1056  0.0000
 0.1040  0.0000
 0.0908  0.0139
 0.1130  0.0000
[torch.FloatTensor of size 5x2]



### Create optimizer, set input and target
#### Creation of the optimizer with the net parameters and the learning rate
The learning rate (lr) is extremely important and sensitive.  
It determines how fast the network learns, but setting it too high won't give any useful result because the parameters will "bounce around" without converging. 


**It is important to note that we changed the learning rate as training progressed:** the learning rate needs to get smaller as the loss gets smaller. The learning rate that you can see is the one used in the latest iteration of our training.
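
In this project the learning rate was changed by hand between runs. Purely to illustrate the idea of lowering the rate as training progresses, a step-decay schedule could look like the sketch below. The function name `step_decay` and its default values are assumptions, not settings used for the results reported here.

```python
def step_decay(initial_lr, iteration, drop_factor=0.1, iterations_per_drop=1000):
    """Hypothetical schedule: multiply the rate by drop_factor every iterations_per_drop steps."""
    return initial_lr * drop_factor ** (iteration // iterations_per_drop)


# Illustrative values only.
for it in (0, 999, 1000, 2500):
    print(it, step_decay(1e-7, it))
```

The returned value would be assigned to the optimizer's learning rate before each update; how that is done depends on the framework version.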

In [24]:
import torch.optim as optim

# create your optimizer
#working version: lr=0.0000001
optimizer = optim.SGD(net.parameters(), lr=0.00000001) #the mother of all rates

#### Setting the input tensor, the output and the target tensor

In [25]:
input = Variable(torch.from_numpy(input_data_train))
input = input.float()
output_data_train = np.reshape(output_data_train, -1)
target = Variable(torch.from_numpy(output_data_train))
#target = target.float()

In [26]:
params = list(net.parameters())
#print(params)

#### Training Loop
- Run the input into the net
- Calculate the loss between the output and the target
- Calculate the gradient
- Optimize the net by modifying the parameters using the gradient and the learning rate

In [27]:
for i in range(1000):
    # in your training loop:
    optimizer.zero_grad()   # zero the gradient buffers

    
    output = net(input)
    #print(output)
    
    criterion = nn.CrossEntropyLoss()

    loss = criterion(output, target)
    loss.backward()     #calculates gradient
    optimizer.step()    # Does the update (applies gradient (multiplies by learning rate and add))
    if i % 100 == 0:
        print('LOSS iteration: %d' %i)
        print(loss)
        

LOSS iteration: 0
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 100
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 200
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 300
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 400
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 500
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 600
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 700
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 800
Variable containing:
 0.6578
[torch.FloatTensor of size 1]

LOSS iteration: 900
Variable containing:
 0.6578
[torch.FloatTensor of size 1]



In [28]:
print(loss)

Variable containing:
 0.6578
[torch.FloatTensor of size 1]
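
For reference, a two-class classifier that always outputs a probability of 0.5 for each class has a cross-entropy of ln 2, about 0.6931, so the loss of 0.6578 reported above is only slightly below that chance level. The short sketch below works this baseline out; `mean_cross_entropy` is an illustrative helper, not part of the notebook.

```python
import math


def mean_cross_entropy(prob_of_true_class):
    """Average negative log-likelihood, given the probability assigned to each true label."""
    return -sum(math.log(p) for p in prob_of_true_class) / len(prob_of_true_class)


# A classifier that assigns 0.5 to the correct class of every sample.
chance_loss = mean_cross_entropy([0.5] * 8)
print("chance-level loss: %.4f" % chance_loss)
```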



In [29]:
print(output_data_test.shape)

(4112,)


### Testing with training data

Testing with the training data allows us to verify that our training is working, which is information also available from the loss variable.

#### Defaulting


In [30]:
in_data_default = Variable(torch.from_numpy(input_data_train_default))
in_data_default = in_data_default.float()

print(in_data_default.size())
out = net(in_data_default)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])

training_psum_d = 0
count_d = 0
for o_d in softout:
    training_psum_d = training_psum_d + o_d[0]
    count_d = count_d + 1
    
print(training_psum_d)
print(count_d)
training_psum_d = training_psum_d / count_d
print(training_psum_d)


torch.Size([4114, 7])
softmax prob
Variable containing:
 0.4857  0.5143
 0.6343  0.3657
 0.3701  0.6299
 0.5531  0.4469
 0.3991  0.6009
 0.4170  0.5830
 0.4416  0.5584
 0.3759  0.6241
 0.5164  0.4836
 0.5386  0.4614
[torch.FloatTensor of size 10x2]

Variable containing:
 1844.3379
[torch.FloatTensor of size 1]

4114
Variable containing:
 0.4483
[torch.FloatTensor of size 1]



We can see that the first value of the output of the net, when we give it defaulting inputs, is on average 0.4483. We will use this result below in order to predict the delinquency of the loan.

---
#### Not defaulting

In [31]:
in_data_ndef = Variable(torch.from_numpy(input_data_train_ndef))
in_data_ndef = in_data_ndef.float()

out = net(in_data_ndef)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])
print(output_data_ndef[:10])

training_psum_nd = 0
count_nd = 0
for o_nd in softout:
    training_psum_nd = training_psum_nd + o_nd[0]
    count_nd = count_nd + 1
    
print(training_psum_nd)
print(count_nd)
training_psum_nd = training_psum_nd / count_nd
print(training_psum_nd)


softmax prob
Variable containing:
 0.5947  0.4053
 0.5911  0.4089
 0.6246  0.3754
 0.5318  0.4682
 0.4848  0.5152
 0.5718  0.4282
 0.5470  0.4530
 0.5504  0.4496
 0.5643  0.4357
 0.6161  0.3839
[torch.FloatTensor of size 10x2]

[0 0 0 0 0 0 0 0 0 0]
Variable containing:
 2039.7925
[torch.FloatTensor of size 1]

4114
Variable containing:
 0.4958
[torch.FloatTensor of size 1]



We can see that the first value of the output of the net, when we give it non-defaulting inputs, is on average 0.4958. We will use this result below in order to predict the delinquency of the loan.

---


In [32]:
print('Average probability of defaulting is: ', training_psum_d)
print('Average probability of NOT defaulting is: ', training_psum_nd)

Average probability of defaulting is:  Variable containing:
 0.4483
[torch.FloatTensor of size 1]

Average probability of NOT defaulting is:  Variable containing:
 0.4958
[torch.FloatTensor of size 1]



### Testing with test data
We then follow the same steps with our test data: this is data that the net has not been trained on, therefore more accurately testing the net's ability.
#### Defaulting


In [33]:
in_test_data_default = Variable(torch.from_numpy(input_data_test_default))
in_test_data_default = in_test_data_default.float()

print(in_test_data_default.size())
out = net(in_test_data_default)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])

testing_psum_d = 0
count_d = 0
for o_d in softout:
    testing_psum_d = testing_psum_d + o_d[0]
    count_d = count_d + 1
    
print(testing_psum_d)
print(count_d)
testing_psum_d = testing_psum_d / count_d
print(testing_psum_d)


torch.Size([2056, 7])
softmax prob
Variable containing:
 0.4907  0.5093
 0.1177  0.8823
 0.3367  0.6633
 0.3055  0.6945
 0.4902  0.5098
 0.3239  0.6761
 0.5218  0.4782
 0.5034  0.4966
 0.5438  0.4562
 0.4642  0.5358
[torch.FloatTensor of size 10x2]

Variable containing:
 921.5962
[torch.FloatTensor of size 1]

2056
Variable containing:
 0.4482
[torch.FloatTensor of size 1]



We can see that the first value of the output of the net, when we give it defaulting inputs, is on average 0.4482. This is very close to the value obtained from the training data. We will use this result below in order to predict the delinquency of the loan.

---
#### Not defaulting

In [34]:
in_test_data_ndef = Variable(torch.from_numpy(input_data_test_ndef))
in_test_data_ndef = in_test_data_ndef.float()

out = net(in_test_data_ndef)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])
print(output_data_test_ndef[:10])

testing_psum_nd = 0
count_nd = 0
for o_nd in softout:
    testing_psum_nd = testing_psum_nd + o_nd[0]
    count_nd = count_nd + 1
    
print(testing_psum_nd)
print(count_nd)
testing_psum_nd = testing_psum_nd / count_nd
print(testing_psum_nd)

softmax prob
Variable containing:
 0.5993  0.4007
 0.5150  0.4850
 0.4359  0.5641
 0.5211  0.4789
 0.5571  0.4429
 0.4637  0.5363
 0.3825  0.6175
 0.4175  0.5825
 0.5992  0.4008
 0.4079  0.5921
[torch.FloatTensor of size 10x2]

[0 0 0 0 0 0 0 0 0 0]
Variable containing:
 1007.0131
[torch.FloatTensor of size 1]

2056
Variable containing:
 0.4898
[torch.FloatTensor of size 1]



We can see that the first value of the output of the net, when we give it non-defaulting inputs, is on average 0.4898. We will use this result below in order to predict the delinquency of the loan.

---


In [35]:
print('Average probability of defaulting is: ', testing_psum_d)
print('Average probability of NOT defaulting is: ', testing_psum_nd)

Average probability of defaulting is:  Variable containing:
 0.4482
[torch.FloatTensor of size 1]

Average probability of NOT defaulting is:  Variable containing:
 0.4898
[torch.FloatTensor of size 1]



### Predicting the delinquency

To predict the delinquency, we chose a cutoff equal to the mean of the average first output value for defaulting loans and the average first output value for non-defaulting loans.  
Any value at or below the cutoff corresponds to a prediction that the loan will default, and conversely, any value above the cutoff corresponds to a prediction that the loan will NOT default.
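
The cutoff rule can be sketched in a vectorized form as follows. The two averages are the rounded training values printed earlier, and the four probabilities are the first four rows of the training output shown above; `predict_default` and `accuracy` are hypothetical helper names.

```python
import numpy as np


def predict_default(first_outputs, cutoff):
    """Return 1 (default) where the first softmax value is at or below the cutoff, else 0."""
    return (np.asarray(first_outputs) <= cutoff).astype(int)


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))


# Midpoint of the two (rounded) training averages reported above.
cutoff = (0.4483 + 0.4958) / 2
first_outputs = [0.4857, 0.6343, 0.3701, 0.5531]
labels = [1, 1, 1, 1]  # these four rows are defaulting loans
predictions = predict_default(first_outputs, cutoff)
print(predictions, accuracy(predictions, labels))
```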

#### Training data
We start by trying to predict the delinquency using the training data.  
We should probably have been able to get much better results in this situation, given that the neural net can be overfitted to this data.  
We were not able to overfit the neural net to the training data, and therefore these are the best results that we have for it. 


Note that this might be a blessing in disguise, as overfitting the neural net usually means poor testing results, and we are satisfied with the result that we obtained using the test data (see next section).

In [37]:
cutoff = (training_psum_d + training_psum_nd)/2
print("cutoff value is %.5f" % cutoff.data[0])

in_train_data = Variable(torch.from_numpy(input_data_train))
in_train_data = in_train_data.float()

out = net(in_train_data)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])
print(output_data_train[:10])

correct_count_train = 0
incorrect_count_train = 0
count_nd_train = 0
for i in range(softout.size()[0]):
    o_pred = softout[i][0]
    pred_default = o_pred <= cutoff
    if (pred_default.data[0] == output_data_train[i]):
        correct_count_train += 1
    else:
        incorrect_count_train += 1
    count_nd_train = count_nd_train + 1

print(correct_count_train)
print(incorrect_count_train)
print("Percent of correct:")
print('%.3f' %(correct_count_train/count_nd_train))
print("Percent of incorrect:")
print('%.3f' %(incorrect_count_train/count_nd_train))

cutoff value is 0.47206
softmax prob
Variable containing:
 0.4857  0.5143
 0.6343  0.3657
 0.3701  0.6299
 0.5531  0.4469
 0.3991  0.6009
 0.4170  0.5830
 0.4416  0.5584
 0.3759  0.6241
 0.5164  0.4836
 0.5386  0.4614
[torch.FloatTensor of size 10x2]

[1 1 1 1 1 1 1 1 1 1]
2588
1526
Percent of correct:
0.629
Percent of incorrect:
0.371


#### Testing data
Using the test data, we find that we correctly predict whether a loan will default 59.2% of the time, compared with 50% for a random prediction on this balanced test set.

In [38]:
cutoff = (training_psum_d + training_psum_nd)/2
print("cutoff value is %.5f" % cutoff.data[0])

in_test_data = Variable(torch.from_numpy(input_data_test))
in_test_data = in_test_data.float()

out = net(in_test_data)

print('softmax prob')
softout = F.softmax(out)
print(softout[:10])
print(output_data_test[:10])

correct_count = 0
incorrect_count = 0
count_nd = 0
for i in range(softout.size()[0]):
    o_pred = softout[i][0]
    pred_default = o_pred <= cutoff
    if (pred_default.data[0] == output_data_test[i]):
        correct_count += 1
    else:
        incorrect_count += 1
    count_nd = count_nd + 1

#print(psum_nd)
#print(count_nd)
#psum_nd = psum_nd / count_nd
print(correct_count)
print(incorrect_count)
print("Percent of correct:")
print('%.3f' %(correct_count/count_nd))
print("Percent of incorrect:")
print('%.3f' %(incorrect_count/count_nd))

cutoff value is 0.47206
softmax prob
Variable containing:
 0.4907  0.5093
 0.1177  0.8823
 0.3367  0.6633
 0.3055  0.6945
 0.4902  0.5098
 0.3239  0.6761
 0.5218  0.4782
 0.5034  0.4966
 0.5438  0.4562
 0.4642  0.5358
[torch.FloatTensor of size 10x2]

[1 1 1 1 1 1 1 1 1 1]
2433
1679
Percent of correct:
0.592
Percent of incorrect:
0.408
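
The overall accuracy does not show how the errors are distributed between the two classes. As a possible follow-up, which was not computed for this report, the counts of each kind of error could be collected as in the sketch below. The helper names are assumptions, and labels follow the notebook (1 = default, 0 = not default).

```python
def confusion_counts(predictions, labels):
    """Count true/false positives and negatives, treating 1 (default) as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, label in zip(predictions, labels):
        if pred == 1 and label == 1:
            counts["tp"] += 1
        elif pred == 1 and label == 0:
            counts["fp"] += 1
        elif pred == 0 and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def recall(counts):
    """Share of actual defaults that were predicted as defaults."""
    total = counts["tp"] + counts["fn"]
    return counts["tp"] / total if total else 0.0


def precision(counts):
    """Share of predicted defaults that were actual defaults."""
    total = counts["tp"] + counts["fp"]
    return counts["tp"] / total if total else 0.0


# Illustrative values only, not results from the net.
demo = confusion_counts([1, 0, 1, 0], [1, 1, 0, 0])
print(demo, recall(demo), precision(demo))
```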


## Discussion and analysis
### Real life application
We know that banks and other lenders already do a preliminary check to decide whether they should grant a loan or not. In this dataset, we only had the loans that they accepted, but there are also datasets with the rejected applications. We decided not to use those sets because the data did not correspond, and we needed to use one of the given sets.  
In reality, it is worthwhile for lenders to accept some of the people who might default, as long as the probability of default is small enough to be offset by the profits from those who do not default.  

Depending on the accuracy of the current system for predicting loan outcomes (which is, to our knowledge, mostly based on fixed criteria), it might be possible to improve it significantly. Of course, that would require a better set of data (lenders have this kind of data available) as well as some time-sensitive data (not all data is comparable between periods of economic stability and crashes).  

Another potential use would be to give borrowers an idea of how large a loan they can hope to obtain given their current financial situation. 

### Suggestions
We lack a good understanding of the financial system, specifically the way loans are accepted or refused and the theory behind the value of lending to some people who might default as long as the chance is small enough. Working with someone with knowledge of the banking industry would be helpful.  

It would be interesting to give a percentage chance that the loan will default. This should be possible using the standard deviation of the results from the training set to create a scale on which each result can fall.  

We do not have a good explanation as to why we could not get the neural network to overfit the training data. With that knowledge, we would have been certain that everything was working correctly and could have tried to derive the best fit for the testing results from there. Instead, we spent a lot of time trying to get the loss down without much success.
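
One thing that could be checked, offered here as an untested assumption rather than a diagnosis, is the scale of the inputs: annual income is in dollars while the home-ownership columns are 0 or 1. Standardizing each column to zero mean and unit variance, using statistics from the training set only, is a common preprocessing step. The sketch below shows it with hypothetical helper names; it was not part of the experiments reported above.

```python
import numpy as np


def fit_standardizer(train_inputs):
    """Compute per-column mean and standard deviation from the training inputs."""
    train_inputs = np.asarray(train_inputs, dtype=float)
    mean = train_inputs.mean(axis=0)
    std = train_inputs.std(axis=0)
    std[std == 0] = 1.0  # constant columns are left unscaled to avoid division by zero
    return mean, std


def standardize(inputs, mean, std):
    """Apply the training-set statistics to any set of inputs."""
    return (np.asarray(inputs, dtype=float) - mean) / std


# Illustrative rows (income, dti, one constant flag), not taken from the dataset.
demo_train = np.array([[50000.0, 10.0, 1.0],
                       [70000.0, 20.0, 1.0],
                       [90000.0, 30.0, 1.0]])
mean, std = fit_standardizer(demo_train)
scaled = standardize(demo_train, mean, std)
print(scaled)
```

The same `mean` and `std` would then be reused for the test inputs, so that no information from the test set leaks into training.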


### Authors
Brennan Nichyporuk  
Vivien Traineau
