## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/
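
As a minimal sketch of the kind of workflow the cheat sheet covers, the built-in sampling profiler can be run as below. The function `sum_of_squares` is a hypothetical stand-in for the code being profiled, and the `using Profile` line assumes Julia 1.x (on 0.6 the macro is available from Base).

```julia
using Profile  # standard library on Julia 1.x (assumption: not needed on 0.6)

# Hypothetical hot function, used only to give the profiler something to sample.
function sum_of_squares(x)
    s = zero(eltype(x))
    for v in x
        s += v * v
    end
    return s
end

x = rand(Float32, 10^6)
sum_of_squares(x)             # run once so compilation is not profiled

Profile.clear()               # discard samples from earlier runs
@profile for _ in 1:200
    sum_of_squares(x)
end
Profile.print(maxdepth = 12)  # text report of where the samples were collected
```

Lines that collect the most samples are the first candidates for the optimizations benchmarked in this notebook.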

#### Examples of speeding up code

A small number of "tricks" can be applied to reduce execution time and memory allocations. They are essential for getting C-like speed out of Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
versioninfo()

Julia Version 0.6.0-rc1.0
Commit 6bdb3950bd (2017-05-07 00:00 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools

In [3]:
# Add the sibling "source" directory to LOAD_PATH so the modules below can be found
source_path = joinpath(dirname(pwd()), "source")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
y_train = y_train + 1
y_test = y_test + 1;

T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
 [2] Array(::Type{Float64}, ::Int64, ::Int64) at ./deprecated.jl:51
 [3] traindata() at /Users/david/.julia/v0.6/MNIST/src/MNIST.jl:88
 [4] include_string(::String, ::String) at ./loading.jl:498
 [5] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/david/.julia/v0.6/IJulia/src/execute_request.jl:156
 [6] eventloop(::ZMQ.Socket) at /Users/david/.julia/v0.6/IJulia/src/eventloop.jl:8
 [7] (::IJulia.##9#12)() at ./task.jl:335
while loading In[4], in expression starting on line 1
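
The min-max scaling in the cell above can be sketched as a small helper. `minmax_scale` is a hypothetical name, not part of the notebook or of MNIST. The point it illustrates is that the conversion to `T` has to wrap the whole quotient; converting first and dividing afterwards promotes the result back to `Float64`.

```julia
# Hypothetical helper: rescale an array to [0, 1] and convert it to element type T.
# The whole quotient sits inside the conversion, so the result really has eltype T.
function minmax_scale(::Type{T}, X) where T
    lo, hi = extrema(X)
    return Array{T}((X .- lo) ./ (hi - lo))
end

X = [0.0 127.5; 255.0 51.0]        # stand-in for raw pixel intensities
Xs = minmax_scale(Float32, X)

# Pitfall: converting before dividing by a Float64 scalar gives a Float64 array.
lo, hi = extrema(X)
Xbad = Array{Float32}(X .- lo) ./ (hi - lo)

println(eltype(Xs), " vs ", eltype(Xbad))
```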

#### MulticlassPerceptron1

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6132166666666666
Accuracy epoch 1 is :0.7161166666666666
Accuracy epoch 1 is :0.7579333333333333
Accuracy epoch 1 is :0.78195
Accuracy epoch 1 is :0.7981
Accuracy epoch 1 is :0.8085
Accuracy epoch 1 is :0.8167
Accuracy epoch 1 is :0.8238
Accuracy epoch 1 is :0.8295333333333333
Accuracy epoch 1 is :0.8339166666666666
Accuracy epoch 1 is :0.8383833333333334
Accuracy epoch 1 is :0.842
Accuracy epoch 1 is :0.8450333333333333
Accuracy epoch 1 is :0.8477333333333333
Accuracy epoch 1 is :0.8500333333333333
Accuracy epoch 1 is :0.8519166666666667
Accuracy epoch 1 is :0.8539833333333333
Accuracy epoch 1 is :0.8558
Accuracy epoch 1 is :0.8575666666666667


BenchmarkTools.Trial: 
  memory estimate:  570.32 MiB
  allocs estimate:  651377
  --------------
  minimum time:     774.821 ms (11.54% GC)
  median time:      794.364 ms (11.25% GC)
  mean time:        797.025 ms (11.14% GC)
  maximum time:     828.901 ms (10.31% GC)
  --------------
  samples:          7
  evals/sample:     1
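
The accuracy lines above climb from one run to the next even though each is labelled epoch 1, which suggests that every benchmark sample keeps training the same `percep1`. If a fresh model per sample is wanted, BenchmarkTools offers `setup` and `evals` for that. The sketch below uses a hypothetical `ToyModel` and `toy_fit!` rather than the notebook's perceptron modules.

```julia
using BenchmarkTools

# Hypothetical stand-in for a model whose fit function mutates its weights in place.
mutable struct ToyModel
    W::Matrix{Float32}
end

function toy_fit!(m::ToyModel, X)
    m.W .+= 0.01f0 .* sum(X)   # placeholder update, not a perceptron rule
    return m
end

X = rand(Float32, 784, 100)

# `setup` builds a fresh model before every sample and `evals = 1` keeps one
# evaluation per sample, so no sample starts from already-updated weights.
# `$X` interpolates the global so its type is known inside the benchmark.
trial = @benchmark toy_fit!(m, $X) setup = (m = ToyModel(zeros(Float32, 10, 784))) evals = 1 samples = 20
```

Whether to reset the model is a design choice: timings of `fit!` should not depend much on the weight values, but the printed accuracies do.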

#### MulticlassPerceptron2

- Using views instead of copying examples

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5672
Accuracy epoch 1 is :0.6941
Accuracy epoch 1 is :0.74735
Accuracy epoch 1 is :0.7748
Accuracy epoch 1 is :0.7930166666666667
Accuracy epoch 1 is :0.80565
Accuracy epoch 1 is :0.81495
Accuracy epoch 1 is :0.8223
Accuracy epoch 1 is :0.8286333333333333
Accuracy epoch 1 is :0.8332666666666667
Accuracy epoch 1 is :0.8368833333333333
Accuracy epoch 1 is :0.8402
Accuracy epoch 1 is :0.8428833333333333
Accuracy epoch 1 is :0.8455
Accuracy epoch 1 is :0.8479
Accuracy epoch 1 is :0.85005
Accuracy epoch 1 is :0.8517
Accuracy epoch 1 is :0.8534666666666667
Accuracy epoch 1 is :0.8553
Accuracy epoch 1 is :0.8570333333333333
Accuracy epoch 1 is :0.8585
Accuracy epoch 1 is :0.8598333333333333
Accuracy epoch 1 is :0.8609166666666667
Accuracy epoch 1 is :0.8624166666666667
Accuracy epoch 1 is :0.86385
Accuracy epoch 1 is :0.8649166666666667
Accuracy epoch 1 is :0.8654333333333334
Accuracy epoch 1 is :0.8663833333333333
Accuracy epoch 1 is :0.8672333333333333
Accuracy epoch 

BenchmarkTools.Trial: 
  memory estimate:  168.74 MiB
  allocs estimate:  403891
  --------------
  minimum time:     181.127 ms (12.42% GC)
  median time:      191.025 ms (13.01% GC)
  mean time:        193.447 ms (12.77% GC)
  maximum time:     229.107 ms (11.54% GC)
  --------------
  samples:          26
  evals/sample:     1
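
The "views instead of copying" change can be illustrated in isolation. This is a sketch under assumed names (`colsum_copy`, `colsum_view`), not the code inside MulticlassPerceptron2: slicing with `X[:, i]` allocates a new vector per example, while `view(X, :, i)` refers to the existing memory.

```julia
# Summing every column through a copy: `X[:, i]` allocates a new vector each time.
function colsum_copy(X)
    total = zero(eltype(X))
    for i in 1:size(X, 2)
        total += sum(X[:, i])
    end
    return total
end

# Same loop with `view`, which wraps the existing memory instead of copying it.
function colsum_view(X)
    total = zero(eltype(X))
    for i in 1:size(X, 2)
        total += sum(view(X, :, i))
    end
    return total
end

X = rand(Float32, 784, 1000)
colsum_copy(X); colsum_view(X)      # warm up so compilation is not measured

bytes_copy = @allocated colsum_copy(X)
bytes_view = @allocated colsum_view(X)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```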

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5811833333333334
Accuracy epoch 1 is :0.7026
Accuracy epoch 1 is :0.7492333333333333
Accuracy epoch 1 is :0.7760166666666667
Accuracy epoch 1 is :0.7925166666666666
Accuracy epoch 1 is :0.8052666666666667
Accuracy epoch 1 is :0.81565
Accuracy epoch 1 is :0.8234666666666667
Accuracy epoch 1 is :0.8293666666666667
Accuracy epoch 1 is :0.8346
Accuracy epoch 1 is :0.8378
Accuracy epoch 1 is :0.8417666666666667
Accuracy epoch 1 is :0.8444
Accuracy epoch 1 is :0.8469666666666666
Accuracy epoch 1 is :0.8491833333333333
Accuracy epoch 1 is :0.8509166666666667
Accuracy epoch 1 is :0.8526
Accuracy epoch 1 is :0.8544833333333334
Accuracy epoch 1 is :0.8558666666666667
Accuracy epoch 1 is :0.8576166666666667
Accuracy epoch 1 is :0.85935
Accuracy epoch 1 is :0.8606166666666667
Accuracy epoch 1 is :0.8615666666666667
Accuracy epoch 1 is :0.8626333333333334
Accuracy epoch 1 is :0.86355
Accuracy epoch 1 is :0.8646166666666667
Accuracy epoch 1 is :0.8653
Accuracy epoch 1 is :0.86

BenchmarkTools.Trial: 
  memory estimate:  137.82 MiB
  allocs estimate:  163364
  --------------
  minimum time:     172.431 ms (11.11% GC)
  median time:      184.856 ms (11.81% GC)
  mean time:        185.669 ms (11.43% GC)
  maximum time:     206.155 ms (11.67% GC)
  --------------
  samples:          27
  evals/sample:     1
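
A minimal sketch of the `@inbounds` idea, using a hypothetical `rowdot` function rather than the module's own code. Bounds checks are skipped inside the loop, which is only safe because the sizes are validated once before it.

```julia
# Dot product between row `r` of W and vector x, with bounds checks skipped.
# `@inbounds` is only safe here because the sizes are checked up front.
function rowdot(W::Matrix{T}, x::Vector{T}, r::Int) where T
    size(W, 2) == length(x) || throw(DimensionMismatch("W and x disagree"))
    1 <= r <= size(W, 1) || throw(BoundsError(W, (r, 1)))
    acc = zero(T)
    @inbounds for j in 1:length(x)
        acc += W[r, j] * x[j]
    end
    return acc
end

W = Float32[1 2 3; 4 5 6]
x = Float32[1, 0, 2]
rowdot(W, x, 2)   # 4*1 + 5*0 + 6*2
```

The gain from this change alone is small in the benchmark above, which is consistent with bounds checks being a minor part of the total cost.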

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all datapoints
- Using `.*` dot syntax for loop fusion

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5978833333333333
Accuracy epoch 1 is :0.704
Accuracy epoch 1 is :0.7501166666666667
Accuracy epoch 1 is :0.7755833333333333
Accuracy epoch 1 is :0.79225
Accuracy epoch 1 is :0.8042
Accuracy epoch 1 is :0.8130166666666667
Accuracy epoch 1 is :0.8214166666666667
Accuracy epoch 1 is :0.8270166666666666
Accuracy epoch 1 is :0.8322666666666667
Accuracy epoch 1 is :0.8369
Accuracy epoch 1 is :0.8408833333333333
Accuracy epoch 1 is :0.8436333333333333
Accuracy epoch 1 is :0.8469
Accuracy epoch 1 is :0.84945
Accuracy epoch 1 is :0.8514
Accuracy epoch 1 is :0.8533333333333334
Accuracy epoch 1 is :0.8551333333333333
Accuracy epoch 1 is :0.8568833333333333
Accuracy epoch 1 is :0.85835
Accuracy epoch 1 is :0.8596166666666667
Accuracy epoch 1 is :0.8614166666666667
Accuracy epoch 1 is :0.8627
Accuracy epoch 1 is :0.8636333333333334
Accuracy epoch 1 is :0.8645666666666667
Accuracy epoch 1 is :0.8657333333333334
Accuracy epoch 1 is :0.8663666666666666
Accuracy epoch 1 is :0.867

BenchmarkTools.Trial: 
  memory estimate:  135.88 MiB
  allocs estimate:  162894
  --------------
  minimum time:     166.493 ms (12.82% GC)
  median time:      184.598 ms (11.08% GC)
  mean time:        183.802 ms (11.35% GC)
  maximum time:     199.643 ms (9.84% GC)
  --------------
  samples:          28
  evals/sample:     1
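
Preallocation and loop fusion can be sketched together. The names `scale_alloc` and `scale_inplace!` are hypothetical and the update is a generic `w + lr * x`, not necessarily the rule used in MulticlassPerceptron4.

```julia
# Allocating version: every call builds new arrays for `lr * x` and for the sum.
scale_alloc(w, x, lr) = w + lr * x

# In-place version: `.=` writes into the existing `w`, and the dotted operators
# fuse `w .+ lr .* x` into one loop with no temporary arrays.
function scale_inplace!(w, x, lr)
    w .= w .+ lr .* x
    return w
end

w = zeros(Float32, 784)
x = rand(Float32, 784)
lr = 0.0001f0

scale_alloc(w, x, lr); scale_inplace!(w, x, lr)   # warm-up calls

bytes_alloc   = @allocated scale_alloc(w, x, lr)
bytes_inplace = @allocated scale_inplace!(w, x, lr)
println("allocating: ", bytes_alloc, " bytes, in place: ", bytes_inplace, " bytes")
```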

#### MulticlassPerceptron5

**What else can be improved?**

**Can we push the memory estimate to 0?**

**Are we really using BLAS to its full potential?**
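
One possible direction for these questions, offered as a sketch and not as what MulticlassPerceptron5 does: compute the class scores with an in-place matrix-vector product so that BLAS writes into a reused buffer. `predict!` and the stand-in arrays are hypothetical. `mul!` is the Julia 1.x name from `LinearAlgebra`; Julia 0.6 called it `A_mul_B!`.

```julia
using LinearAlgebra   # provides mul! on Julia 1.x (assumption: 0.6 would use A_mul_B!)

n_classes, n_features = 10, 784
W = rand(Float32, n_classes, n_features)   # stand-in weight matrix
X = rand(Float32, n_features, 100)         # stand-in data, one example per column
scores = zeros(Float32, n_classes)         # buffer reused for every example

# Predict the class of example i without allocating a score vector each time.
function predict!(scores, W, X, i)
    mul!(scores, W, view(X, :, i))   # scores = W * X[:, i], written in place
    return findmax(scores)[2]        # index of the largest score
end

predict!(scores, W, X, 1)            # warm-up call
bytes = @allocated predict!(scores, W, X, 1)
println("bytes allocated per prediction: ", bytes)
```

Measuring with `@allocated` or `@benchmark` after each such change shows how close the memory estimate gets to zero.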
