#### Code Review
    - with DuBose Tuller

- Learned what the cu() function does, and learned that we had not actually set the code up to run on the GPU correctly.
- I think comments would help both parties understand what the code actually does, because it has been a long time since we worked on the project.
- Updated my code to make it more concise; in several places repeated code could be pulled into a single variable or a cleaner function.
- Overall, the code was similar to mine, so not many changes were needed.

In [1]:
# You should only need to run this cell once when you (re)start the kernel. Thereafter, includet should import any changes.
using Pkg
Pkg.activate("..") # change this to your package-install location
using BenchmarkTools
using MLDatasets: MNIST
using ImageCore
using Flux: onehotbatch, onecold
using CUDA
CUDA.allowscalar(false)

using Revise
includet("activations_and_losses.jl")
includet("dense_network_model.jl")
includet("dense_network_training.jl")

  Activating project at `~/Workspace/CSC381`


In [18]:
train_x, train_y = MNIST.traindata()
test_x,  test_y  = MNIST.testdata()

train_set_size = size(train_x)[end]
test_set_size = size(test_x)[end]
image_dimensions = size(train_x)[1:end-1]

println(train_set_size, " points in the training set")
println(test_set_size, " points in the test set")
println("image inputs have dimension ", image_dimensions)

60000 points in the training set
10000 points in the test set
image inputs have dimension (28, 28)


In [12]:
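# Fraction of samples whose predicted probability for the true class exceeds 0.9.
# y_pred and real are (samples × classes); real is one-hot.
function test_accuracy(y_pred, real)
    # Element-wise indexing below needs scalar indexing if y_pred lives on the GPU.
    # This re-enables it globally, undoing CUDA.allowscalar(false) from the setup cell.
    CUDA.allowscalar(true)
    correct = 0
    incorrect = 0
    for i in 1:size(y_pred, 1)
        test_pred = y_pred[i, :]
        test_real = real[i, :]
        # test_real is a Bool mask, so this selects the prediction for the true label
        if test_pred[test_real][1] > 0.9
            correct += 1
        else
            incorrect += 1
        end
    end
    return correct / (correct + incorrect)
end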

test_accuracy (generic function with 1 method)

# First Test

In [19]:
num_inputs = image_dimensions[1] * image_dimensions[2]
num_labels = length(unique(train_y))
hidden_sizes = [15]
hidden_activations = [ReLU_activation]
output_activation = softmax_activation
nn1 = DenseNetworkCPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
nn2 = DenseNetworkGPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);

In [20]:
inputs = Array{Float32}(reshape(permutedims(train_x, [3,1,2]), train_set_size, num_inputs))
GPU_inputs = cu(inputs)
targets = onehotbatch(train_y, 0:9)'
GPU_targets = cu(Array{Float32}(targets))
println("input shape: ", size(inputs))
println("target shape: ", size(targets))

input shape: (60000, 784)
target shape: (60000, 10)


In [21]:
epochs = 10
batch_size = 10

10

In [22]:
train!(nn1, inputs, targets, 0.1, epochs, batch_size; verbose=true)
train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 1.1113095
epoch #2 ... average loss = 0.64510024
epoch #3 ... average loss = 0.5162058
epoch #4 ... average loss = 0.4586566
epoch #5 ... average loss = 0.4240687
epoch #6 ... average loss = 0.40288737
epoch #7 ... average loss = 0.38576153
epoch #8 ... average loss = 0.37362728
epoch #9 ... average loss = 0.36390746
epoch #10 ... average loss = 0.35574207
epoch #1 ... average loss = 1.1996245
epoch #2 ... average loss = 0.71958816
epoch #3 ... average loss = 0.5601815
epoch #4 ... average loss = 0.48820668
epoch #5 ... average loss = 0.44905385
epoch #6 ... average loss = 0.42498124
epoch #7 ... average loss = 0.40758044
epoch #8 ... average loss = 0.39734954
epoch #9 ... average loss = 0.38775736
epoch #10 ... average loss = 0.37959337


In [24]:
test_inputs = Array{Float32}(reshape(permutedims(test_x, [3,1,2]), test_set_size, num_inputs))
targets = onehotbatch(test_y, 0:9)'

10000×10 adjoint(OneHotMatrix(::Vector{UInt32})) with eltype Bool:
 0  0  0  0  0  0  0  1  0  0
 0  0  1  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0
 1  0  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  1
 0  0  0  0  0  1  0  0  0  0
 0  0  0  0  0  0  0  0  0  1
 1  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  1  0  0  0
 0  0  0  0  0  0  0  0  0  1
 ⋮              ⋮           
 0  0  0  0  0  1  0  0  0  0
 0  0  0  0  0  0  1  0  0  0
 0  0  0  0  0  0  0  1  0  0
 0  0  0  0  0  0  0  0  1  0
 0  0  0  0  0  0  0  0  0  1
 1  0  0  0  0  0  0  0  0  0
 0  1  0  0  0  0  0  0  0  0
 0  0  1  0  0  0  0  0  0  0
 0  0  0  1  0  0  0  0  0  0
 0  0  0  0  1  0  0  0  0  0
 0  0  0  0  0  1  0  0  0  0
 0  0  0  0  0  0  1  0  0  0

In [25]:
y_pred_cpu = predict(nn1, test_inputs)
test_accuracy(y_pred_cpu, targets)

0.6436

In [26]:
y_pred_gpu = predict(nn2, test_inputs)
test_accuracy(y_pred_gpu, targets)

0.6228

# Second Test

In [27]:
hidden_sizes = [30, 30, 30]
hidden_activations = [ReLU_activation, ReLU_activation, ReLU_activation]
output_activation = softmax_activation
nn1 = DenseNetworkCPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
nn2 = DenseNetworkGPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);

In [28]:
inputs = Array{Float32}(reshape(permutedims(train_x, [3,1,2]), train_set_size, num_inputs))
GPU_inputs = cu(inputs)
targets = onehotbatch(train_y, 0:9)'
GPU_targets = cu(Array{Float32}(targets))
println("input shape: ", size(inputs))
println("target shape: ", size(targets))

input shape: (60000, 784)
target shape: (60000, 10)


In [29]:
epochs = 10
batch_size = 10

10

In [30]:
train!(nn1, inputs, targets, 0.1, epochs, batch_size; verbose=true)
println("************************")
train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 2.0384738
epoch #2 ... average loss = 1.0680647
epoch #3 ... average loss = 0.60658896
epoch #4 ... average loss = 0.47340494
epoch #5 ... average loss = 0.4213134
epoch #6 ... average loss = 0.3880435
epoch #7 ... average loss = 0.36332297
epoch #8 ... average loss = 0.34977695
epoch #9 ... average loss = 0.33501244
epoch #10 ... average loss = 0.32421112
************************
epoch #1 ... average loss = 2.0046678
epoch #2 ... average loss = 0.951921
epoch #3 ... average loss = 0.5621962
epoch #4 ... average loss = 0.4755887
epoch #5 ... average loss = 0.43496954
epoch #6 ... average loss = 0.40817577
epoch #7 ... average loss = 0.38704714
epoch #8 ... average loss = 0.37035236
epoch #9 ... average loss = 0.35649133
epoch #10 ... average loss = 0.34702477


In [31]:
test_inputs = Array{Float32}(reshape(permutedims(test_x, [3,1,2]), test_set_size, num_inputs))
targets = onehotbatch(test_y, 0:9)'
y_pred_cpu = predict(nn1, test_inputs)
display(test_accuracy(y_pred_cpu, targets))
y_pred_gpu = predict(nn2, test_inputs)
display(test_accuracy(y_pred_gpu, targets))

0.6848

0.6825

# Third Test

In [8]:
hidden_sizes = [1000]
hidden_activations = [ReLU_activation]
output_activation = softmax_activation
nn2 = DenseNetworkGPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
inputs = Array{Float32}(reshape(permutedims(train_x, [3,1,2]), train_set_size, num_inputs))
GPU_inputs = cu(inputs)
targets = onehotbatch(train_y, 0:9)'
GPU_targets = cu(Array{Float32}(targets))
println("input shape: ", size(inputs))
println("target shape: ", size(targets))

input shape: (60000, 784)
target shape: (60000, 10)


In [10]:
epochs = 10
batch_size = 10
train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 0.382697
epoch #2 ... average loss = 0.33146527
epoch #3 ... average loss = 0.3039106
epoch #4 ... average loss = 0.28622618
epoch #5 ... average loss = 0.27361915
epoch #6 ... average loss = 0.26480538
epoch #7 ... average loss = 0.2556093
epoch #8 ... average loss = 0.24655691
epoch #9 ... average loss = 0.2402749
epoch #10 ... average loss = 0.23451093


In [13]:
test_inputs = Array{Float32}(reshape(permutedims(test_x, [3,1,2]), test_set_size, num_inputs))
targets = onehotbatch(test_y, 0:9)'
println("************************")
println("GPU ACCURACY")
y_pred_gpu = predict(nn2, test_inputs)
display(test_accuracy(y_pred_gpu, targets))

************************
GPU ACCURACY


0.7526

# Fourth Test

In [14]:
hidden_sizes = [1000, 1000]
hidden_activations = [ReLU_activation, ReLU_activation]
output_activation = softmax_activation
nn2 = DenseNetworkGPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
inputs = Array{Float32}(reshape(permutedims(train_x, [3,1,2]), train_set_size, num_inputs))
GPU_inputs = cu(inputs)
targets = onehotbatch(train_y, 0:9)'
GPU_targets = cu(Array{Float32}(targets))
println("input shape: ", size(inputs))
println("target shape: ", size(targets))

input shape: (60000, 784)
target shape: (60000, 10)


In [15]:
epochs = 10
batch_size = 10
train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 0.3447325
epoch #2 ... average loss = 0.2710708
epoch #3 ... average loss = 0.23837233
epoch #4 ... average loss = 0.21882996
epoch #5 ... average loss = 0.20484327
epoch #6 ... average loss = 0.18103561
epoch #7 ... average loss = 0.16922922
epoch #8 ... average loss = 0.16545115
epoch #9 ... average loss = 0.14983746
epoch #10 ... average loss = 0.1450526


In [17]:
test_inputs = Array{Float32}(reshape(permutedims(test_x, [3,1,2]), test_set_size, num_inputs))
targets = onehotbatch(test_y, 0:9)'
println("************************")
println("GPU ACCURACY")
y_pred_gpu = predict(nn2, test_inputs)
display(test_accuracy(y_pred_gpu, targets))

************************
GPU ACCURACY


0.8448

# Benchmark for Time

In [2]:
train_x, train_y = MNIST.traindata()
test_x,  test_y  = MNIST.testdata()

train_set_size = size(train_x)[end]
test_set_size = size(test_x)[end]
image_dimensions = size(train_x)[1:end-1]

println(train_set_size, " points in the training set")
println(test_set_size, " points in the test set")
println("image inputs have dimension ", image_dimensions)
num_inputs = image_dimensions[1] * image_dimensions[2]
num_labels = length(unique(train_y))
hidden_sizes = [100, 30, 30, 30]
hidden_activations = [ReLU_activation, ReLU_activation, ReLU_activation, ReLU_activation]
output_activation = softmax_activation
nn1 = DenseNetworkCPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
nn2 = DenseNetworkGPU(num_inputs, num_labels, hidden_sizes; hidden_activations=hidden_activations, output_activation=output_activation);
inputs = Array{Float32}(reshape(permutedims(train_x, [3,1,2]), train_set_size, num_inputs))
GPU_inputs = cu(inputs)
targets = onehotbatch(train_y, 0:9)'
GPU_targets = cu(Array{Float32}(targets))
println("input shape: ", size(inputs))
println("target shape: ", size(targets))

60000 points in the training set
10000 points in the test set
image inputs have dimension (28, 28)
input shape: (60000, 784)
target shape: (60000, 10)


In [4]:
epochs = 10
batch_size = 10

10

In [8]:
@benchmark CUDA.@sync train!(nn1, inputs, targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 1.7144024
epoch #2 ... average loss = 0.70992094
epoch #3 ... average loss = 0.546513
epoch #4 ... average loss = 0.4560476
epoch #5 ... average loss = 0.4077106
epoch #6 ... average loss = 0.3860169
epoch #7 ... average loss = 0.37056953
epoch #8 ... average loss = 0.34829906
epoch #9 ... average loss = 0.33297735
epoch #10 ... average loss = 0.31371465
epoch #1 ... average loss = 0.30302054
epoch #2 ... average loss = 0.28794122
epoch #3 ... average loss = 0.28256056
epoch #4 ... average loss = 0.27182493
epoch #5 ... average loss = 0.2618159
epoch #6 ... average loss = 0.25784504
epoch #7 ... average loss = 0.2602559
epoch #8 ... average loss = 0.25141895
epoch #9 ... average loss = 0.242885
epoch #10 ... average loss = 0.24037871
epoch #1 ... average loss = 0.23723467
epoch #2 ... average loss = 0.22892968
epoch #3 ... average loss = 0.22854039
epoch #4 ... average loss = 0.22322017
epoch #5 ... average loss = 0.22453482
epoch #6 ... average loss = 0.220

BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 141.642 s (1.01% GC) to evaluate,
 with a memory estimate of 110.51 GiB, over 10382840 allocations.

In [5]:
@benchmark CUDA.@sync train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 1.8711956
epoch #2 ... average loss = 0.9251965
epoch #3 ... average loss = 0.6509949
epoch #4 ... average loss = 0.49028608
epoch #5 ... average loss = 0.41271523
epoch #6 ... average loss = 0.3658207
epoch #7 ... average loss = 0.33951637
epoch #8 ... average loss = 0.31961238
epoch #9 ... average loss = 0.31122878
epoch #10 ... average loss = 0.29561642
epoch #1 ... average loss = 0.28984374
epoch #2 ... average loss = 0.28074974
epoch #3 ... average loss = 0.27309895
epoch #4 ... average loss = 0.2675291
epoch #5 ... average loss = 0.2632172
epoch #6 ... average loss = 0.25606444
epoch #7 ... average loss = 0.25427902
epoch #8 ... average loss = 0.2514099
epoch #9 ... average loss = 0.2506071
epoch #10 ... average loss = 0.23990731
epoch #1 ... average loss = 0.24599384
epoch #2 ... average loss = 0.23747373
epoch #3 ... average loss = 0.2340533
epoch #4 ... average loss = 0.22914945
epoch #5 ... average loss = 0.22640067
epoch #6 ... average loss = 0.22

BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 110.690 s (2.90% GC) to evaluate,
 with a memory estimate of 70.10 GiB, over 129007071 allocations.

In [28]:
epochs = 1
batch_size = 1000000

@benchmark CUDA.@sync train!(nn1, inputs, targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 2.2111588
epoch #1 ... average loss = 2.0372195
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN


BenchmarkTools.Trial: 2 samples with 1 evaluation.
 Range (min … max):  2.802 s … 5.694 s  ┊ GC (min … max): 0.29% … 0.16%
 Time  (median):     4.248 s            ┊ GC (median):    0.20%
 Time  (mean ± σ):   4.248 s ± 2.045 s  ┊ GC (mean ± σ):  0.20% ± 0.09%

  (histogram of sample times truncated in the export)

In [29]:
@benchmark CUDA.@sync train!(nn2, GPU_inputs, GPU_targets, 0.1, epochs, batch_size; verbose=true)

epoch #1 ... average loss = 2.2608552
epoch #1 ... average loss = 2.2526152
epoch #1 ... average loss = 2.0883303
epoch #1 ... average loss = 21.581697
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN
epoch #1 ... average loss = NaN


BenchmarkTools.Trial: 11 samples with 1 evaluation.
 Range (min … max):  274.125 ms … 825.842 ms  ┊ GC (min … max): 1.70% … 0.65%
 Time  (median):     465.639 ms               ┊ GC (median):    1.13%
 Time  (mean ± σ):   485.937 ms ± 149.229 ms  ┊ GC (mean ± σ):  1.25% ± 0.53%

  (histogram of sample times truncated in the export)

This shows the difference between training on a CPU and on a GPU.

The top benchmark output shows the CPU run time over multiple samples, and the bottom one shows the same for the GPU.

# Conclusions

## Best Architecture

We arrived at the best architecture by starting with a basic one: a single hidden layer with 15 neurons.

The reason for this was to give the network training a simple starting point. It resulted in an accuracy of about 64% for the CPU network and 62% for the GPU network. We calculated accuracy by thresholding the predicted probability vector: a prediction counted as correct only if the entry at the position of the true label was greater than 0.9, which we treated as rounding that entry up to 1. This is stricter than taking the most probable class, so the reported figures are a lower bound on top-1 accuracy.
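To make the accuracy rule concrete, here is a minimal CPU-only sketch that compares it with the more common argmax rule. The function names `threshold_accuracy` and `argmax_accuracy` and the toy matrices are hypothetical and are not part of the project code; rows are samples and columns are classes, as in the notebook.

```julia
# Toy comparison of two accuracy rules (hypothetical helper names).
# Rows are samples, columns are classes, matching the notebook's layout.

# Correct only when the probability assigned to the true class exceeds `threshold`.
function threshold_accuracy(y_pred::AbstractMatrix, y_true::AbstractMatrix{Bool}; threshold=0.9)
    hits = count(i -> sum(y_pred[i, :] .* y_true[i, :]) > threshold, 1:size(y_pred, 1))
    return hits / size(y_pred, 1)
end

# Correct when the most probable class is the true class.
function argmax_accuracy(y_pred::AbstractMatrix, y_true::AbstractMatrix{Bool})
    hits = count(i -> argmax(y_pred[i, :]) == argmax(y_true[i, :]), 1:size(y_pred, 1))
    return hits / size(y_pred, 1)
end

y_pred = [0.95 0.03 0.02;   # confident and right
          0.60 0.30 0.10;   # right, but below the 0.9 threshold
          0.20 0.70 0.10]   # wrong
y_true = Bool[1 0 0;
              1 0 0;
              0 0 1]

println("threshold rule: ", threshold_accuracy(y_pred, y_true))
println("argmax rule:    ", argmax_accuracy(y_pred, y_true))
```

On this toy data the threshold rule credits one of the three samples and the argmax rule credits two, which is why the two measures should not be compared directly.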

We experimented with the architecture, adding layers and varying the number of neurons, but kept the epochs and batch size the same. Accuracy barely increased: three hidden layers of 30 neurons each reached about 68%.

For the third test we decided to radically increase the number of neurons in the hidden layer to 1000, and got a much better result of about 75%.

For the fourth test we added another hidden layer of 1000 neurons, which further increased the accuracy to about 84%. We decided that this is the best architecture.
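As a rough way to compare the four architectures, the sketch below counts trainable parameters for each one. It assumes a standard fully connected layout with one bias per neuron, which may not match how `DenseNetworkCPU` and `DenseNetworkGPU` store their parameters; `dense_parameter_count` is a hypothetical helper.

```julia
# Count weights and biases in a fully connected network.
# Assumes one bias per neuron, which may not match the project's DenseNetwork types.
function dense_parameter_count(num_inputs::Int, hidden_sizes::Vector{Int}, num_labels::Int)
    layer_sizes = [num_inputs; hidden_sizes; num_labels]
    total = 0
    for (fan_in, fan_out) in zip(layer_sizes[1:end-1], layer_sizes[2:end])
        total += fan_in * fan_out + fan_out   # weight matrix plus bias vector
    end
    return total
end

# The four hidden-layer configurations used in the tests (784 inputs, 10 labels).
for hidden_sizes in ([15], [30, 30, 30], [1000], [1000, 1000])
    println(hidden_sizes, " => ", dense_parameter_count(784, hidden_sizes, 10), " parameters")
end
```

Under that assumption the jump from the second to the third test is a change of roughly thirty times in parameter count, far larger than the change between the first two tests.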

We tinkered with learning rates from 0.001 to 0.1 in the scratchwork notebook. We chose 0.1 because we were not running many epochs, so we needed larger steps along the gradient. We kept the number of epochs low because training with these handcoded networks is slow.
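The effect of the learning rate under a small step budget can be illustrated on a toy problem. The sketch below runs plain gradient descent on f(w) = w^2, which is only a stand-in for the real loss; `descend` is a hypothetical helper and says nothing about how `train!` is implemented.

```julia
# Gradient descent on f(w) = w^2 (gradient 2w), a toy stand-in for the real loss.
function descend(learning_rate::Float64, steps::Int; w0::Float64=1.0)
    w = w0
    for _ in 1:steps
        w -= learning_rate * 2w   # step against the gradient
    end
    return w
end

# Same number of steps, three learning rates from the range we tried.
for lr in (0.001, 0.01, 0.1)
    println("learning rate ", lr, ": w after 10 steps = ", descend(lr, 10))
end
```

With only ten steps, the largest rate ends much closer to the minimum at w = 0 than the smaller ones, which mirrors the reasoning for picking 0.1 when epochs are limited.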

## Time

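The GPU runs faster because it can carry out many calculations in parallel. With 10 epochs and a batch size of 10, training took 110.690 s on the GPU against 141.642 s on the CPU; with one epoch and a batch size larger than the training set, the median time was 465.639 ms on the GPU against 4.248 s on the CPU.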

We found that the increased speed comes with a tradeoff. To do more calculations in parallel, the computer needs to make many more allocations: the 10-epoch GPU run made 129,007,071 allocations against 10,382,840 on the CPU. We found that this was worth it because the handcoded networks are not as efficient as well-established packages.
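The speedups implied by the benchmark output can be worked out directly. The sketch below uses only timings copied from the BenchmarkTools results in this notebook; `speedup` is a hypothetical helper. Note that the large-batch runs produced NaN losses, so those timings reflect raw throughput rather than useful training.

```julia
# Timings copied from the BenchmarkTools output in this notebook (seconds).
cpu_small_batch = 141.642    # 10 epochs, batch size 10, single sample
gpu_small_batch = 110.690
cpu_large_batch = 4.248      # 1 epoch, batch_size = 1000000, median
gpu_large_batch = 0.465639   # median, 465.639 ms

# Ratio of CPU time to GPU time; values above 1 mean the GPU was faster.
speedup(cpu, gpu) = cpu / gpu

println("batch size 10:    GPU is ", round(speedup(cpu_small_batch, gpu_small_batch), digits=2), "x faster")
println("very large batch: GPU is ", round(speedup(cpu_large_batch, gpu_large_batch), digits=2), "x faster")
```

The gap widens sharply with the larger batch, which fits the idea that the GPU pays off when each step contains enough parallel work.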