## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/
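As a quick illustration of the workflow the cheat sheet covers, here is a minimal sketch of the built-in sampling profiler. It assumes Julia 1.x, where `Profile` is a standard library (in the 0.6 release used in this notebook the same tools live in `Base`). `busy_work` is a hypothetical stand-in for a training loop.

```julia
using Profile  # standard library in Julia 1.x

# Hypothetical workload standing in for a training loop.
function busy_work(n)
    acc = 0.0
    for i in 1:n
        acc += sqrt(i) * sin(i)
    end
    return acc
end

busy_work(10)                    # run once so compilation is not profiled
Profile.clear()                  # drop samples left over from earlier runs
@profile busy_work(10_000_000)   # collect stack samples while this runs
Profile.print(maxdepth = 8)      # text report of where the samples landed
```

Lines that collect many samples are the ones worth optimizing first.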

#### Examples of speeding up code

A small number of "tricks" can reduce execution time and memory allocations. Applying them is key to getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
versioninfo()

Julia Version 0.6.0-rc1.0
Commit 6bdb3950bd (2017-05-07 00:00 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools

In [3]:
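# Path to the sibling "source" directory (one level above the working directory)
source_path = joinpath(dirname(pwd()), "source")

# Add it to LOAD_PATH only once so the modules below can be found
if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end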

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
# Shift the labels by one so they can serve as 1-based class indices
y_train = y_train .+ 1
y_test = y_test .+ 1;

T = Float32
# Min-max scale the features to [0, 1], then convert to T
X_train = Array{T}((X_train .- minimum(X_train)) ./ (maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test .- minimum(X_test)) ./ (maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
 [2] Array(::Type{Float64}, ::Int64, ::Int64) at ./deprecated.jl:51
 [3] traindata() at /Users/david/.julia/v0.6/MNIST/src/MNIST.jl:88
 [4] include_string(::String, ::String) at ./loading.jl:498
 [5] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/david/.julia/v0.6/IJulia/src/execute_request.jl:156
 [6] eventloop(::ZMQ.Socket) at /Users/david/.julia/v0.6/IJulia/src/eventloop.jl:8
 [7] (::IJulia.##9#12)() at ./task.jl:335
while loading In[4], in expression starting on line 1
Stacktrace:
 [1] depwarn(::S

#### MulticlassPerceptron1

- Baseline implementation (reference for the versions below)

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5877166666666667
Accuracy epoch 1 is :0.70205
Accuracy epoch 1 is :0.7513666666666666
Accuracy epoch 1 is :0.7771833333333333
Accuracy epoch 1 is :0.7952
Accuracy epoch 1 is :0.8076166666666666
Accuracy epoch 1 is :0.8163666666666667
Accuracy epoch 1 is :0.8235333333333333
Accuracy epoch 1 is :0.82915
Accuracy epoch 1 is :0.8340166666666666
Accuracy epoch 1 is :0.8383833333333334
Accuracy epoch 1 is :0.8424
Accuracy epoch 1 is :0.8450833333333333
Accuracy epoch 1 is :0.8476833333333333


BenchmarkTools.Trial: 
  memory estimate:  581.69 MiB
  allocs estimate:  655081
  --------------
  minimum time:     911.939 ms (9.59% GC)
  median time:      945.465 ms (8.91% GC)
  mean time:        952.875 ms (8.76% GC)
  maximum time:     1.011 s (7.75% GC)
  --------------
  samples:          6
  evals/sample:     1
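
The `Accuracy epoch 1` line appears once per benchmark sample, and the reported accuracy keeps rising from sample to sample. This suggests (an inference from the output, not something the notebook states) that every sample continues training the same `percep1` object, so later samples may not measure the same work as the first. A minimal sketch of one way to avoid that with BenchmarkTools, using a hypothetical `Counter` type in place of the perceptron:

```julia
using BenchmarkTools

# Hypothetical stand-in for a model whose fit! mutates its own state.
mutable struct Counter
    steps::Int
end

function fit!(c::Counter, data)
    for x in data
        c.steps += 1
    end
    return c
end

data = rand(Float32, 1000)

# `setup` runs before every sample, so each sample starts from a fresh model.
# `evals=1` ensures the fresh model is used exactly once per sample.
# `$data` interpolates the global so access to it is not part of the timing.
trial = @benchmark fit!(c, $data) setup=(c = Counter(0)) evals=1 samples=20

println("samples collected: ", length(trial.times))
```

Applied to the perceptron, the `setup` expression would construct a new `MPerceptron` for each sample.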

#### MulticlassPerceptron2

- Using views instead of copying examples
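
To illustrate the difference, here is a minimal sketch that is not taken from the `MulticlassPerceptron2` source. It assumes one example per column of `X`, and the function names are hypothetical. Slicing with `X[:, i]` allocates a new vector for each example, while `view(X, :, i)` refers to the existing memory.

```julia
# Dot product written as a plain loop so it accepts arrays and views alike.
function dot_loop(w, x)
    s = zero(eltype(w))
    for j in 1:length(w)
        s += w[j] * x[j]
    end
    return s
end

# X[:, i] allocates a fresh copy of the column on every iteration.
function total_score_copy(w, X)
    total = zero(eltype(w))
    for i in 1:size(X, 2)
        total += dot_loop(w, X[:, i])
    end
    return total
end

# view(X, :, i) wraps the same memory, so no column is copied.
function total_score_view(w, X)
    total = zero(eltype(w))
    for i in 1:size(X, 2)
        total += dot_loop(w, view(X, :, i))
    end
    return total
end

X = rand(Float32, 784, 1000)   # assumed layout: one example per column
w = rand(Float32, 784)

total_score_copy(w, X); total_score_view(w, X)   # warm up (compile) first
bytes_copy = @allocated total_score_copy(w, X)
bytes_view = @allocated total_score_view(w, X)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```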

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6088666666666667
Accuracy epoch 1 is :0.70935
Accuracy epoch 1 is :0.7524333333333333
Accuracy epoch 1 is :0.7760833333333333
Accuracy epoch 1 is :0.7926333333333333
Accuracy epoch 1 is :0.80415
Accuracy epoch 1 is :0.8133333333333334
Accuracy epoch 1 is :0.8204666666666667
Accuracy epoch 1 is :0.8267
Accuracy epoch 1 is :0.8313833333333334
Accuracy epoch 1 is :0.8357166666666667
Accuracy epoch 1 is :0.8392666666666667
Accuracy epoch 1 is :0.8424
Accuracy epoch 1 is :0.8453166666666667
Accuracy epoch 1 is :0.84805
Accuracy epoch 1 is :0.85035
Accuracy epoch 1 is :0.8523166666666666
Accuracy epoch 1 is :0.8540166666666666
Accuracy epoch 1 is :0.85625
Accuracy epoch 1 is :0.8575833333333334
Accuracy epoch 1 is :0.8591333333333333
Accuracy epoch 1 is :0.8600833333333333
Accuracy epoch 1 is :0.8613833333333333
Accuracy epoch 1 is :0.8629333333333333
Accuracy epoch 1 is :0.86425
Accuracy epoch 1 is :0.865


BenchmarkTools.Trial: 
  memory estimate:  187.84 MiB
  allocs estimate:  490908
  --------------
  minimum time:     352.676 ms (7.80% GC)
  median time:      362.943 ms (7.25% GC)
  mean time:        368.206 ms (7.13% GC)
  maximum time:     411.802 ms (6.23% GC)
  --------------
  samples:          14
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks
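
As a sketch of the `@inbounds` technique, the hypothetical `update_column!` below adds `lr * x` to one column of a weight matrix. It is not the `MulticlassPerceptron3` code. The sizes are validated once up front, which is what makes skipping the per-element bounds checks safe.

```julia
# Hypothetical perceptron-style update: W[:, c] += lr * x, done in place.
function update_column!(W, x, c, lr)
    # Validate once, so the loop below cannot index out of range.
    size(W, 1) == length(x) || throw(DimensionMismatch("x does not match the rows of W"))
    1 <= c <= size(W, 2) || throw(BoundsError(W, (1, c)))
    @inbounds for j in 1:length(x)
        W[j, c] += lr * x[j]     # no bounds check is emitted for these accesses
    end
    return W
end

W = zeros(Float32, 3, 2)
update_column!(W, Float32[1, 2, 3], 2, 0.5f0)
println(W)
```

Without the up-front checks, an out-of-range index under `@inbounds` can silently corrupt memory instead of raising an error.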


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.60565
Accuracy epoch 1 is :0.71085
Accuracy epoch 1 is :0.7527833333333334
Accuracy epoch 1 is :0.7762833333333333
Accuracy epoch 1 is :0.7918833333333334
Accuracy epoch 1 is :0.8034833333333333
Accuracy epoch 1 is :0.8123666666666667
Accuracy epoch 1 is :0.8195166666666667
Accuracy epoch 1 is :0.8252
Accuracy epoch 1 is :0.83075
Accuracy epoch 1 is :0.83485
Accuracy epoch 1 is :0.8379833333333333
Accuracy epoch 1 is :0.8409833333333333
Accuracy epoch 1 is :0.8438833333333333
Accuracy epoch 1 is :0.8464
Accuracy epoch 1 is :0.8487333333333333
Accuracy epoch 1 is :0.8504666666666667
Accuracy epoch 1 is :0.8524
Accuracy epoch 1 is :0.8540166666666666
Accuracy epoch 1 is :0.8558833333333333
Accuracy epoch 1 is :0.8576666666666667
Accuracy epoch 1 is :0.8585333333333334
Accuracy epoch 1 is :0.85965
Accuracy epoch 1 is :0.861
Accuracy epoch 1 is :0.8621333333333333
Accuracy epoch 1 is :0.8630166666666667
Accuracy epoch 1 is :0.8639
Accuracy epoch 1 is :0.8647166666666

BenchmarkTools.Trial: 
  memory estimate:  153.74 MiB
  allocs estimate:  168537
  --------------
  minimum time:     334.486 ms (6.96% GC)
  median time:      354.597 ms (6.24% GC)
  mean time:        358.858 ms (6.28% GC)
  maximum time:     386.456 ms (4.97% GC)
  --------------
  samples:          15
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all datapoints
- Using `.*` (dot) syntax for loop fusion
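
The last two points can be sketched together. The example below is illustrative and not taken from `MulticlassPerceptron4`; the function names are hypothetical. A dotted expression assigned with `.=` fuses into a single loop that writes into a buffer allocated once, whereas the undotted version creates temporaries on every call.

```julia
# Allocating: `lr * x` builds a temporary, and `w + ...` builds the result.
step_alloc(w, x, lr) = w + lr * x

# Fused and in place: the dotted expression becomes one loop
# that writes directly into the preallocated `out`.
function step_fused!(out, w, x, lr)
    out .= w .+ lr .* x
    return out
end

w = rand(Float32, 784)
x = rand(Float32, 784)
lr = 0.1f0
out = similar(w)               # allocated once, reused on every call

step_alloc(w, x, lr); step_fused!(out, w, x, lr)   # warm up (compile) first
bytes_alloc = @allocated step_alloc(w, x, lr)
bytes_fused = @allocated step_fused!(out, w, x, lr)
println("allocating: ", bytes_alloc, " bytes, fused: ", bytes_fused, " bytes")
```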

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6072666666666666
Accuracy epoch 1 is :0.7085
Accuracy epoch 1 is :0.7536333333333334
Accuracy epoch 1 is :0.7773
Accuracy epoch 1 is :0.79215
Accuracy epoch 1 is :0.8029
Accuracy epoch 1 is :0.81185
Accuracy epoch 1 is :0.8189
Accuracy epoch 1 is :0.82535
Accuracy epoch 1 is :0.8301833333333334
Accuracy epoch 1 is :0.83405
Accuracy epoch 1 is :0.83755
Accuracy epoch 1 is :0.8409
Accuracy epoch 1 is :0.84315
Accuracy epoch 1 is :0.8458333333333333
Accuracy epoch 1 is :0.8478166666666667
Accuracy epoch 1 is :0.8499833333333333
Accuracy epoch 1 is :0.8519166666666667
Accuracy epoch 1 is :0.8538166666666667
Accuracy epoch 1 is :0.85535
Accuracy epoch 1 is :0.8572
Accuracy epoch 1 is :0.85865
Accuracy epoch 1 is :0.8598166666666667
Accuracy epoch 1 is :0.8607166666666667
Accuracy epoch 1 is :0.8616666666666667
Accuracy epoch 1 is :0.8625833333333334
Accuracy epoch 1 is :0.8635666666666667
Accuracy epoch 1 is :0.8642666666666666
Accuracy epoch 1 is :0.8654
Accuracy epo

BenchmarkTools.Trial: 
  memory estimate:  55.41 MiB
  allocs estimate:  152284
  --------------
  minimum time:     292.554 ms (2.45% GC)
  median time:      311.382 ms (2.84% GC)
  mean time:        315.694 ms (2.79% GC)
  maximum time:     350.124 ms (1.75% GC)
  --------------
  samples:          16
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

**Can we push the memory estimate down to 0?**

**Are we really using BLAS to its full potential?**
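
One possible direction for the last two questions, offered as a sketch rather than an answer: score all examples with a single in-place matrix product. The code assumes Julia 1.x, where `mul!` comes from `LinearAlgebra` (in 0.6 the equivalent was `A_mul_B!`). The `n_classes × n_features` weight layout is an assumption, not something confirmed by the perceptron modules.

```julia
using LinearAlgebra   # Julia 1.x; provides mul!

n_classes, n_features, n_examples = 10, 784, 1000
W = rand(Float32, n_classes, n_features)    # hypothetical weight layout
X = rand(Float32, n_features, n_examples)   # assumed: one example per column
scores = Matrix{Float32}(undef, n_classes, n_examples)   # reusable buffer

# A single matrix-matrix product scores every example at once and
# writes into `scores` instead of allocating a new matrix.
mul!(scores, W, X)

# Predicted class per example: row index of the largest score in each column.
predictions = [argmax(view(scores, :, i)) for i in 1:n_examples]
println(predictions[1:5])
```

Because `scores` is reused, repeated calls to `mul!` avoid allocating the score matrix; whether the full `fit!` loop can reach a memory estimate of 0 remains the open question.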
