In [1]:
using Revise, LinearAlgebra, Random, CSV, Printf

Load the NeuralNetworks module.

In [2]:
includet("./NeuralNetworks/NeuralNetworks.jl")
using .NeuralNetworks

Read the training data and drop the ID column, since it has no predictive value. Then split the data into a design matrix and an outcome vector.

In [4]:
Xytrain = CSV.read("Data/CCDataCleanTrain.csv")
colnames = names(Xytrain)
println(colnames)
Xytrain = Matrix(Xytrain[:, 2:end])  # drop the ID column (column 1)
Xtrain, ytrain = Xytrain[:, 1:27], Xytrain[:, 28]

Symbol[:ID, :LIMIT_BAL, :SEX, :AGE, :EDUCATION2, :EDUCATION3, :EDUCATION4, :EDUCATION5, :MARRIAGE2, :MARRIAGE3, :PAY_0, :PAY_2, :PAY_3, :PAY_4, :PAY_5, :PAY_6, :BILL_AMT1, :BILL_AMT2, :BILL_AMT3, :BILL_AMT4, :BILL_AMT5, :BILL_AMT6, :PAY_AMT1, :PAY_AMT2, :PAY_AMT3, :PAY_AMT4, :PAY_AMT5, :PAY_AMT6, Symbol("default.payment.next.month")]




([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0])

In [5]:
size(Xytrain)

(21000, 28)
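The drop-and-split step can be illustrated on a small made-up table. This is only a sketch: the numbers and the names `Xy_demo`, `X_demo` and `y_demo` are hypothetical, and a plain matrix stands in for the table read by `CSV.read`.

```julia
# Toy table: column 1 is an ID, columns 2-3 are features, column 4 is the outcome.
Xy_demo = [101 0.5 1.0 0.0;
           102 0.1 0.0 1.0;
           103 0.9 1.0 0.0]

Xy_noid = Xy_demo[:, 2:end]            # drop the ID column
n_features = size(Xy_noid, 2) - 1      # every remaining column except the last

X_demo = Xy_noid[:, 1:n_features]      # design matrix
y_demo = Xy_noid[:, end]               # outcome vector

println(size(X_demo))
println(y_demo)
```

With the real data the same pattern gives 27 feature columns and the outcome in column 28.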

One-hot encode the outcome vector: column 1 indicates class 0 and column 2 indicates class 1.

In [6]:
y_onehot = Float64.([yi == c for yi in ytrain, c in (0, 1)])

21000×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 0.0  1.0
 1.0  0.0
 1.0  0.0
 ⋮       
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 1.0  0.0
 0.0  1.0
 1.0  0.0
 0.0  1.0
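As a small check of the encoding, the same comprehension can be applied to a short label vector and then decoded again. The vector `labels_demo` is made up for illustration.

```julia
labels_demo = [0, 1, 0, 0, 1]
classes = (0, 1)

# One row per observation, one column per class; exactly one 1.0 per row.
onehot_demo = Float64.([yi == c for yi in labels_demo, c in classes])

# Decode: the column holding the 1.0 gives the position of the class in `classes`.
decoded = [classes[argmax(onehot_demo[i, :])] for i in 1:size(onehot_demo, 1)]

println(onehot_demo)
println(decoded == labels_demo)
```

Decoding returns the original labels, which confirms that the column order follows the order of `classes`.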

### Setting up the NN

#### FeedForward Model Setup

We use a feed-forward network as the architecture: 20 hidden neurons and 2 output categories, trained for 100 epochs with a batch size of 500.

In [7]:
ff = FeedForwardNet{Float64}(Xtrain,     #X_data
                             y_onehot,   #Y_data
                             20,         #n_hidden_neurons
                             2,          #n_categories
                             100,        #epochs
                             500,        #batch_size
                             0.1,        #η
                             0.5)        #λ

ff

FeedForwardNet{Float64}([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [1.0 0.0; 0.0 1.0; … ; 1.0 0.0; 0.0 1.0], 20, 2, 100, 500, 0.1, 0.5, 21000, 27, 42)

In [8]:
ff.iterations

42
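The value 42 is consistent with the number of mini-batches per epoch, that is, the number of training rows divided by the batch size. This is an inference from the printed fields (21000 rows, batch size 500), not something the module source shown here confirms, so the arithmetic below is only a sketch.

```julia
n_inputs = 21000     # rows in Xtrain
batch_size = 500
epochs = 100

# Assumed relation: one iteration per full mini-batch in an epoch.
iterations = n_inputs ÷ batch_size

# If every epoch runs that many updates, the total number of updates would be:
total_updates = epochs * iterations

println(iterations)      # 42
println(total_updates)   # 4200
```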

#### Parameters

Initialize the parameters for the neural network. `rng` is a seeded random number generator used to draw the starting values for the weights.

In [9]:
rng = MersenneTwister(1234)
params = Parameters(rng, ff)

Parameters{Float64}([0.8673472019512456 0.7692782605345824 … -0.946598947349706 1.2725914022859486; -0.9017438158568171 -0.31015257323306406 … -0.023545616079346052 -1.899221574511888; … ; -0.07401454242444336 -0.6864935365141717 … -0.18761736894830652 -1.4919454611677114; 0.1509756176321479 -0.7129319615024848 … -0.7321608989850492 -2.8398124127085147], [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01], [0.34416258762111895 0.25437048313045746; -0.3265358894486727 -0.06812133639648942; … ; -0.3254657985816864 0.9634421310448461; -1.5906210270117098 0.4767669134455632], [0.01, 0.01], [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.0 0.0; 0.0 0.0; … ; 0.0 0.0; 0.0 0.0], [0.0 0.0; 0.0 0.0; … ; 0.0 0.0; 0.0 0.0])
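Seeding the generator makes the starting weights reproducible, but each draw also advances the generator's state. The sketch below shows both effects; `randn` is used as a stand-in, since how `Parameters` actually draws its weights is not shown here.

```julia
using Random

rng_a = MersenneTwister(1234)
rng_b = MersenneTwister(1234)

# Same seed, same first draw.
w_a = randn(rng_a, 3, 2)
w_b = randn(rng_b, 3, 2)
println(w_a == w_b)   # true

# Drawing again from the same generator gives different values,
# because the first draw advanced its internal state.
w_c = randn(rng_a, 3, 2)
println(w_c == w_a)   # false
```

This matters later in the notebook: reusing one `rng` across several models means each model starts from different initial weights.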

#### Neural Network Model

Create the neural network model with the architecture and the unfitted parameter values.

In [10]:
nn = NeuralNetwork(ff, params, false)

NeuralNetwork{FeedForwardNet{Float64},Parameters{Float64}}(FeedForwardNet{Float64}([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [1.0 0.0; 0.0 1.0; … ; 1.0 0.0; 0.0 1.0], 20, 2, 100, 500, 0.1, 0.5, 21000, 27, 42), Parameters{Float64}([0.8673472019512456 0.7692782605345824 … -0.946598947349706 1.2725914022859486; -0.9017438158568171 -0.31015257323306406 … -0.023545616079346052 -1.899221574511888; … ; -0.07401454242444336 -0.6864935365141717 … -0.18761736894830652 -1.4919454611677114; 0.1509756176321479 -0.7129319615024848 … -0.7321608989850492 -2.8398124127085147], [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01], [0.34416258762111895 0.25437048313045746; -0.3265358894486727 -0.0681213363964894

#### Training

Train the neural network. The random number generator is used when sampling batches from the training data.

In [11]:
Train!(nn, rng)

NeuralNetwork{FeedForwardNet{Float64},Parameters{Float64}}(FeedForwardNet{Float64}([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [1.0 0.0; 0.0 1.0; … ; 1.0 0.0; 0.0 1.0], 20, 2, 100, 500, 0.1, 0.5, 21000, 27, 42), Parameters{Float64}([0.8673472019512456 0.7692782605345824 … -0.946598947349706 1.2725914022859486; -0.9017438158568171 -0.31015257323306406 … -0.023545616079346052 -1.899221574511888; … ; -0.07401454242444336 -0.6864935365141717 … -0.18761736894830652 -1.4919454611677114; 0.1509756176321479 -0.7129319615024848 … -0.7321608989850492 -2.8398124127085147], [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01], [0.34416258762111895 0.25437048313045746; -0.3265358894486727 -0.0681213363964894
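To make the role of the random number generator concrete, here is one common way to sample a mini-batch. Whether `Train!` samples with or without replacement is not shown in this notebook, so this is an assumed scheme with hypothetical names (`rng_demo`, `X_small`, `idx`).

```julia
using Random

rng_demo = MersenneTwister(1234)
n_rows, batch = 10, 4
X_small = reshape(collect(1.0:20.0), n_rows, 2)   # toy 10×2 design matrix

# Shuffle the row indices and keep the first `batch` of them
# (sampling without replacement).
idx = randperm(rng_demo, n_rows)[1:batch]
X_batch = X_small[idx, :]

println(idx)
println(size(X_batch))
```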

#### Predicting and scoring

Predict on the training set and compute the accuracy. Because Julia is 1-indexed, `Predict` returns class indices 1 and 2, so we subtract 1 to recover the original labels 0 and 1.

In [12]:
y_hat = Predict(nn, Xtrain) .- 1
Score(ytrain, y_hat)

0.23485714285714285

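An accuracy of about 0.23 is poor: it is well below the 0.5 expected from random guessing between two equally likely classes, so the model needs to be improved.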
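The index shift and the accuracy computation can be sketched on a few made-up output rows. The function `accuracy_demo` is a hypothetical stand-in and is not the module's `Score`; it simply computes the fraction of matching labels.

```julia
using Statistics

# Made-up network outputs: one row per observation, one column per class.
probs = [0.9 0.1;
         0.2 0.8;
         0.6 0.4]

# argmax over each row gives a 1-based column index.
class_index = [argmax(probs[i, :]) for i in 1:size(probs, 1)]   # [1, 2, 1]

# Shift to the 0/1 labels used in the data.
labels_hat = class_index .- 1                                   # [0, 1, 0]

# Accuracy as the fraction of predictions that match the true labels.
accuracy_demo(y, y_hat) = mean(y .== y_hat)

y_true = [0, 1, 1]
acc_demo = accuracy_demo(y_true, labels_hat)
println(acc_demo)
```

Two of the three toy predictions match, so the accuracy here is 2/3.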

## Test Data

Read in the test data, drop the ID-column, and split into design matrix and outcome vector.

In [14]:
Xytest = CSV.read("Data/CCDataCleanTest.csv")
Xytest = Matrix(Xytest[:, 2:end])  # drop the ID column (column 1)
Xtest, ytest = Xytest[:, 1:27], Xytest[:, 28]



([-0.3668060213103317 0.0 … -0.056996312638463886 -0.13643213062186701; -0.6741364117571912 0.0 … -0.2800454829928283 -0.2787270573719562; … ; -1.1351319974274805 0.0 … -0.2423557339855811 -0.3012860091737996; 2.16866969987626 0.0 … 0.4667058391409246 0.08788483114107844], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0])

## Evaluation on Test Set
Perform a grid search, scoring each combination on the test set, to find the best values of $\eta$ and $\lambda$.

In [15]:
η_grid = exp10.(range(-5, stop=1, length=7))
λ_grid = exp10.(range(-5, stop=1, length=7))

grid = zeros(Float64, (7, 7))

for (i, η) in enumerate(η_grid)
    for (j, λ) in enumerate(λ_grid)
        
        ffn = FeedForwardNet{Float64}(Xtrain,    #X_data
                                      y_onehot,  #Y_data
                                      20,        #n_hidden_neurons
                                      2,         #n_categories
                                      100,       #epochs
                                      500,       #batch_size
                                      η,         #η
                                      λ)         #λ
        
        params = Parameters(rng, ffn)
        
        nn = NeuralNetwork(ffn, params, false)
        
        Train!(nn, rng)
        
        y_hat = Predict(nn, Xtest) .- 1
        
        s = Score(ytest, y_hat)
        
        println(@sprintf "η: %.5e  λ: %.5e Accuracy: %.6f" η λ s)
        
        grid[i, j] = s
    end
end

η: 1.00000e-05  λ: 1.00000e-05 Accuracy: 0.231000
η: 1.00000e-05  λ: 1.00000e-04 Accuracy: 0.304000
η: 1.00000e-05  λ: 1.00000e-03 Accuracy: 0.667444
η: 1.00000e-05  λ: 1.00000e-02 Accuracy: 0.254667
η: 1.00000e-05  λ: 1.00000e-01 Accuracy: 0.774222
η: 1.00000e-05  λ: 1.00000e+00 Accuracy: 0.314889
η: 1.00000e-05  λ: 1.00000e+01 Accuracy: 0.366000
η: 1.00000e-04  λ: 1.00000e-05 Accuracy: 0.391667
η: 1.00000e-04  λ: 1.00000e-04 Accuracy: 0.240000
η: 1.00000e-04  λ: 1.00000e-03 Accuracy: 0.450889
η: 1.00000e-04  λ: 1.00000e-02 Accuracy: 0.761889
η: 1.00000e-04  λ: 1.00000e-01 Accuracy: 0.322778
η: 1.00000e-04  λ: 1.00000e+00 Accuracy: 0.220111
η: 1.00000e-04  λ: 1.00000e+01 Accuracy: 0.779889
η: 1.00000e-03  λ: 1.00000e-05 Accuracy: 0.610889
η: 1.00000e-03  λ: 1.00000e-04 Accuracy: 0.236778
η: 1.00000e-03  λ: 1.00000e-03 Accuracy: 0.646000
η: 1.00000e-03  λ: 1.00000e-02 Accuracy: 0.370778
η: 1.00000e-03  λ: 1.00000e-01 Accuracy: 0.752111
η: 1.00000e-03  λ: 1.00000e+00 Accuracy: 0.403889
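Once the score matrix is filled, the best combination can be located with `argmax`, which returns a `CartesianIndex` for a matrix. The sketch below uses a made-up score matrix (`scores_demo`) with a single nonzero entry to show how the row and column indices map back to the grid values.

```julia
# Seven logarithmically spaced values from 1e-5 to 1e1, as in the grid search.
η_vals = exp10.(range(-5, stop=1, length=7))
λ_vals = exp10.(range(-5, stop=1, length=7))

scores_demo = zeros(7, 7)
scores_demo[3, 5] = 0.75          # made-up best score at row 3, column 5

best_demo = argmax(scores_demo)   # CartesianIndex(3, 5)

η_best = η_vals[best_demo[1]]     # the row index selects η
λ_best = λ_vals[best_demo[2]]     # the column index selects λ

println(best_demo)
println((η_best, λ_best))
```

Rows correspond to η and columns to λ because the loop stores each score as `grid[i, j]`, with `i` running over `η_grid` and `j` over `λ_grid`.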


In [17]:
acc = maximum(grid)
best = argmax(grid)          # CartesianIndex: row is the η index, column is the λ index
η_optim = η_grid[best[1]]
λ_optim = λ_grid[best[2]]

@sprintf "Maximum Acc.: %.6f with η %.3e and λ %.3e" acc η_optim λ_optim

"Maximum Acc.: 0.789111 with η 1.000e+01 and λ 1.000e-03"

## Fitting With Optimal Parameters

Now we use the optimal values for $\eta$ and $\lambda$ to train the network, and then predict on the test set. 

In [18]:
ff_optim = FeedForwardNet{Float64}(Xtrain,    #X_data
                                   y_onehot,  #Y_data
                                   20,        #n_hidden_neurons
                                   2,         #n_categories
                                   100,       #epochs
                                   500,       #batch_size
                                   η_optim,   #η
                                   λ_optim)   #λ

params = Parameters(rng, ff_optim)
nn_optim = NeuralNetwork(ff_optim, params, false)

NeuralNetwork{FeedForwardNet{Float64},Parameters{Float64}}(FeedForwardNet{Float64}([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [1.0 0.0; 0.0 1.0; … ; 1.0 0.0; 0.0 1.0], 20, 2, 100, 500, 10.0, 0.001, 21000, 27, 42), Parameters{Float64}([0.16148374241835553 0.6565250687919649 … -0.23539422287993353 -0.3820236247275385; -0.5067231977166837 2.774496194457105 … 0.6876281991793981 1.8518329009130197; … ; 0.33179883794538484 -0.5682512512863014 … 1.072089221362957 -0.4656796061826088; -0.33088193393745485 -1.1321523606201436 … -1.1725323201758697 1.1069535428126334], [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01], [-0.22071181842164986 -1.5774811868198162; 1.907083899461616 -0.4966072961166837; …

In [19]:
Train!(nn_optim, rng)

NeuralNetwork{FeedForwardNet{Float64},Parameters{Float64}}(FeedForwardNet{Float64}([0.555185150030247 1.0 … 0.6189477105406902 -0.3012860091737996; 0.09418956435995768 1.0 … -0.2434678905136638 -0.2772231272518333; … ; 0.7856829428653916 1.0 … -0.30414220776795353 -0.15060377983071738; -0.136308228475187 0.0 … -0.3019178947117881 -0.24066605663961527], [1.0 0.0; 0.0 1.0; … ; 1.0 0.0; 0.0 1.0], 20, 2, 100, 500, 10.0, 0.001, 21000, 27, 42), Parameters{Float64}([0.16148374241835553 0.6565250687919649 … -0.23539422287993353 -0.3820236247275385; -0.5067231977166837 2.774496194457105 … 0.6876281991793981 1.8518329009130197; … ; 0.33179883794538484 -0.5682512512863014 … 1.072089221362957 -0.4656796061826088; -0.33088193393745485 -1.1321523606201436 … -1.1725323201758697 1.1069535428126334], [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01], [-0.22071181842164986 -1.5774811868198162; 1.907083899461616 -0.4966072961166837; …

Predict and evaluate the accuracy on the training set.

In [21]:
y_hat = Predict(nn_optim, Xtrain) .- 1
Score(ytrain, y_hat)

0.6994285714285714

Predict and evaluate the accuracy on the test set.

In [22]:
y_hat_test = Predict(nn_optim, Xtest) .- 1
Score(ytest, y_hat_test)

0.7043333333333334

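The score has improved substantially: training accuracy rose from about 0.23 to about 0.70. Somewhat surprisingly, the network performs slightly better on the test data (0.704) than on the training data (0.699). Note, however, that $\eta$ and $\lambda$ were chosen by their test-set score, which may favor the test result.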