## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can reduce execution time and memory allocations. Applying them is key to getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666
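
Before applying any such trick it helps to measure. The sketch below is a minimal illustration using only Base's `@allocated` and `@elapsed`; the helper names `sumsq_temp` and `sumsq_lazy` are hypothetical and unrelated to the perceptron modules, and the code assumes Julia 1.x.

```julia
# Hypothetical helpers computing the same sum of squares in two ways.
sumsq_temp(x) = sum(x .^ 2)    # builds a temporary array, then sums it
sumsq_lazy(x) = sum(abs2, x)   # applies abs2 while summing, no temporary array

x = rand(Float32, 784)

# Call each function once first so compilation is not part of the measurement.
sumsq_temp(x)
sumsq_lazy(x)

bytes_temp   = @allocated sumsq_temp(x)
bytes_lazy   = @allocated sumsq_lazy(x)
seconds_temp = @elapsed sumsq_temp(x)
seconds_lazy = @elapsed sumsq_lazy(x)

println("temporary: ", bytes_temp, " bytes, ", seconds_temp, " s")
println("lazy:      ", bytes_lazy, " bytes, ", seconds_lazy, " s")
```

A single `@elapsed` call is noisy, which is why the cells below rely on `@benchmark` for timing.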

In [1]:
versioninfo()

Julia Version 0.6.0-rc1.0
Commit 6bdb3950bd (2017-05-07 00:00 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools

In [3]:
# Add the sibling "source" directory to LOAD_PATH so the local modules can be found.
source_path = joinpath(dirname(pwd()), "source")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
# Shift labels from 0-9 to 1-10 (Julia arrays are 1-based).
y_train = y_train .+ 1
y_test = y_test .+ 1;

# Rescale features to [0, 1] and convert them to Float32; convert labels to Int64.
T = Float32
X_train = Array{T}((X_train .- minimum(X_train)) ./ (maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test .- minimum(X_test)) ./ (maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

Stacktrace:
 [1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
 [2] Array(::Type{Float64}, ::Int64, ::Int64) at ./deprecated.jl:51
 [3] traindata() at /Users/david/.julia/v0.6/MNIST/src/MNIST.jl:88
 [4] include_string(::String, ::String) at ./loading.jl:498
 [5] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/david/.julia/v0.6/IJulia/src/execute_request.jl:156
 [6] eventloop(::ZMQ.Socket) at /Users/david/.julia/v0.6/IJulia/src/eventloop.jl:8
 [7] (::IJulia.##9#12)() at ./task.jl:335
while loading In[4], in expression starting on line 1

#### MulticlassPerceptron1

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5818833333333333
Accuracy epoch 1 is :0.6920666666666667
Accuracy epoch 1 is :0.7402
Accuracy epoch 1 is :0.7685166666666666
Accuracy epoch 1 is :0.7872166666666667
Accuracy epoch 1 is :0.7997
Accuracy epoch 1 is :0.8089
Accuracy epoch 1 is :0.8170833333333334
Accuracy epoch 1 is :0.8233833333333334
Accuracy epoch 1 is :0.8283333333333334
Accuracy epoch 1 is :0.8327333333333333
Accuracy epoch 1 is :0.83655
Accuracy epoch 1 is :0.8397666666666667
Accuracy epoch 1 is :0.8432166666666666


BenchmarkTools.Trial: 
  memory estimate:  586.84 MiB
  allocs estimate:  656755
  --------------
  minimum time:     803.514 ms (11.57% GC)
  median time:      844.181 ms (10.22% GC)
  mean time:        845.986 ms (10.18% GC)
  maximum time:     915.224 ms (8.96% GC)
  --------------
  samples:          6
  evals/sample:     1
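
The accuracy lines above rise from one call to the next, which suggests that every benchmark sample calls `fit!` on the same, already-updated `percep1`. If a fresh model per sample is wanted, BenchmarkTools accepts a `setup` expression together with `evals=1`. The sketch below shows the pattern on a toy model; `ToyModel` and `toy_fit!` are hypothetical stand-ins, since the internals of `MPerceptron` are not shown here.

```julia
using BenchmarkTools

# Hypothetical stand-in for a perceptron: a weight matrix that training mutates.
struct ToyModel
    W::Matrix{Float32}
end

# Toy in-place "training" step; it mutates the model's weights.
function toy_fit!(m::ToyModel, X::Matrix{Float32})
    m.W .+= 0.01f0 .* X
    return m
end

X = rand(Float32, 10, 784)

# `setup` runs before every sample, so each sample starts from zero weights.
# `evals=1` ensures a mutated model is never reused within a sample.
trial = @benchmark toy_fit!(m, $X) setup=(m = ToyModel(zeros(Float32, 10, 784))) evals=1 samples=50

println(minimum(trial))
```

Whether resetting is desirable depends on what is being measured: the cost of one epoch should not depend much on the starting weights, but the printed accuracies would then be comparable across samples.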

#### MulticlassPerceptron2

- Using views instead of copying examples
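
To illustrate the idea, the sketch below contrasts `X[:, i]`, which copies a column, with `@view X[:, i]`, which wraps the same memory. It assumes that examples are stored as columns of a 784-row matrix; `score_copy` and `score_view` are hypothetical names, not functions from `MulticlassPerceptron2`, and the code targets Julia 1.x.

```julia
using LinearAlgebra

W = rand(Float32, 10, 784)     # toy weights: n_classes x n_features
X = rand(Float32, 784, 100)    # toy data: one example per column (assumed layout)

# Slicing with X[:, i] copies the column before the product.
score_copy(W, X, i) = W * X[:, i]

# @view wraps the same memory, so the column itself is not copied.
score_view(W, X, i) = W * @view(X[:, i])

# Warm up so compilation is excluded from the measurement.
score_copy(W, X, 1)
score_view(W, X, 1)

bytes_copy = @allocated score_copy(W, X, 1)
bytes_view = @allocated score_view(W, X, 1)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

Both versions still allocate the 10-element result vector; only the copy of the 784-element column is avoided.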

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5744
Accuracy epoch 1 is :0.6903833333333333
Accuracy epoch 1 is :0.7405
Accuracy epoch 1 is :0.7701333333333333
Accuracy epoch 1 is :0.78705
Accuracy epoch 1 is :0.7994833333333333
Accuracy epoch 1 is :0.8094
Accuracy epoch 1 is :0.8174
Accuracy epoch 1 is :0.8227833333333333
Accuracy epoch 1 is :0.8274166666666667
Accuracy epoch 1 is :0.83135
Accuracy epoch 1 is :0.8353333333333334
Accuracy epoch 1 is :0.83845
Accuracy epoch 1 is :0.8416333333333333
Accuracy epoch 1 is :0.84455
Accuracy epoch 1 is :0.8477333333333333
Accuracy epoch 1 is :0.8500166666666666
Accuracy epoch 1 is :0.8518333333333333
Accuracy epoch 1 is :0.8535833333333334
Accuracy epoch 1 is :0.8547666666666667
Accuracy epoch 1 is :0.85655
Accuracy epoch 1 is :0.8584666666666667
Accuracy epoch 1 is :0.8598
Accuracy epoch 1 is :0.8607
Accuracy epoch 1 is :0.8618166666666667
Accuracy epoch 1 is :0.86275
Accuracy epoch 1 is :0.86385
Accuracy epoch 1 is :0.8649
Accuracy epoch 1 is :0.8657333333333334

BenchmarkTools.Trial: 
  memory estimate:  169.67 MiB
  allocs estimate:  404174
  --------------
  minimum time:     179.531 ms (13.01% GC)
  median time:      207.601 ms (12.09% GC)
  mean time:        202.122 ms (12.53% GC)
  maximum time:     223.196 ms (13.18% GC)
  --------------
  samples:          25
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip array bounds checks
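
As a sketch of what `@inbounds` does, the hypothetical `dot_checked` and `dot_inbounds` below compute the same dot product; the second tells the compiler to omit bounds checks inside the loop. This is only safe when the indices are known to be valid, which `eachindex(w, x)` guarantees here.

```julia
# Hypothetical helper: accesses are bounds-checked unless the compiler
# can prove on its own that they are safe.
function dot_checked(w::AbstractVector{T}, x::AbstractVector{T}) where {T}
    s = zero(T)
    for i in eachindex(w, x)   # throws if w and x have different indices
        s += w[i] * x[i]
    end
    return s
end

# Same loop with bounds checks explicitly disabled.
function dot_inbounds(w::AbstractVector{T}, x::AbstractVector{T}) where {T}
    s = zero(T)
    @inbounds for i in eachindex(w, x)
        s += w[i] * x[i]
    end
    return s
end

w = rand(Float32, 784)
x = rand(Float32, 784)
println(dot_checked(w, x), " ", dot_inbounds(w, x))
```

With an out-of-range index, `@inbounds` can silently read invalid memory, so it belongs only in loops whose index range is derived from the arrays themselves.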


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6027166666666667
Accuracy epoch 1 is :0.7050333333333333
Accuracy epoch 1 is :0.7494666666666666
Accuracy epoch 1 is :0.7739
Accuracy epoch 1 is :0.7906333333333333
Accuracy epoch 1 is :0.8027166666666666
Accuracy epoch 1 is :0.8119
Accuracy epoch 1 is :0.8189833333333333
Accuracy epoch 1 is :0.8249166666666666
Accuracy epoch 1 is :0.82995
Accuracy epoch 1 is :0.83455
Accuracy epoch 1 is :0.8379666666666666
Accuracy epoch 1 is :0.8413666666666667
Accuracy epoch 1 is :0.8445
Accuracy epoch 1 is :0.8467666666666667
Accuracy epoch 1 is :0.84895
Accuracy epoch 1 is :0.8515
Accuracy epoch 1 is :0.8532
Accuracy epoch 1 is :0.8549666666666667
Accuracy epoch 1 is :0.8566333333333334
Accuracy epoch 1 is :0.8580666666666666
Accuracy epoch 1 is :0.8595333333333334
Accuracy epoch 1 is :0.8609333333333333
Accuracy epoch 1 is :0.8621166666666666
Accuracy epoch 1 is :0.8633166666666666
Accuracy epoch 1 is :0.8641
Accuracy epoch 1 is :0.8649166666666667
Accuracy epoch 1 is :0.86

BenchmarkTools.Trial: 
  memory estimate:  138.58 MiB
  allocs estimate:  163609
  --------------
  minimum time:     166.153 ms (9.73% GC)
  median time:      186.867 ms (12.05% GC)
  mean time:        186.546 ms (11.97% GC)
  maximum time:     200.547 ms (12.37% GC)
  --------------
  samples:          27
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all data points
- Using `.*` dot syntax for loop fusion
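
The last two points can be sketched together: a buffer allocated once is reused for every update, and dotted operators fuse into a single loop that writes into it. The names `update_alloc` and `update_fused!` and the update rule itself are illustrative assumptions, not the code in `MulticlassPerceptron4`.

```julia
# Illustrative update rule (assumed for demonstration): w + lr * x.

# Unfused version: `lr * x` and the sum each allocate a new vector.
update_alloc(w, x, lr) = w + lr * x

# Fused, in-place version: `.=` with dotted operators becomes one loop
# that writes into the preallocated `out` without temporaries.
function update_fused!(out, w, x, lr)
    out .= w .+ lr .* x
    return out
end

w   = rand(Float32, 784)
x   = rand(Float32, 784)
out = similar(w)                  # allocated once, reused across calls

update_alloc(w, x, 0.1f0)         # warm-up calls exclude compilation
update_fused!(out, w, x, 0.1f0)

bytes_alloc = @allocated update_alloc(w, x, 0.1f0)
bytes_fused = @allocated update_fused!(out, w, x, 0.1f0)
println("allocating: ", bytes_alloc, " bytes, fused: ", bytes_fused, " bytes")
```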

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5922166666666666
Accuracy epoch 1 is :0.7018666666666666
Accuracy epoch 1 is :0.7483333333333333
Accuracy epoch 1 is :0.7743666666666666
Accuracy epoch 1 is :0.7906
Accuracy epoch 1 is :0.8036
Accuracy epoch 1 is :0.8131333333333334
Accuracy epoch 1 is :0.8203
Accuracy epoch 1 is :0.8267833333333333
Accuracy epoch 1 is :0.8311833333333334
Accuracy epoch 1 is :0.8355166666666667
Accuracy epoch 1 is :0.8388833333333333
Accuracy epoch 1 is :0.8424
Accuracy epoch 1 is :0.8451166666666666
Accuracy epoch 1 is :0.8475333333333334
Accuracy epoch 1 is :0.8497333333333333
Accuracy epoch 1 is :0.8521666666666666
Accuracy epoch 1 is :0.8539333333333333
Accuracy epoch 1 is :0.8555666666666667
Accuracy epoch 1 is :0.8571333333333333
Accuracy epoch 1 is :0.8587833333333333
Accuracy epoch 1 is :0.8600833333333333
Accuracy epoch 1 is :0.8607833333333333
Accuracy epoch 1 is :0.8621166666666666
Accuracy epoch 1 is :0.8635666666666667
Accuracy epoch 1 is :0.8643333333333333

BenchmarkTools.Trial: 
  memory estimate:  50.93 MiB
  allocs estimate:  217951
  --------------
  minimum time:     135.942 ms (4.60% GC)
  median time:      155.537 ms (5.01% GC)
  mean time:        152.846 ms (5.75% GC)
  maximum time:     160.916 ms (4.70% GC)
  --------------
  samples:          33
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

**Can we push the memory estimate down to 0?**

**Are we really using BLAS to its full potential?**
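
One direction these questions point to, offered as a sketch rather than a result, is scoring all examples with a single in-place matrix product instead of one matrix-vector product per example. `mul!` from the `LinearAlgebra` standard library (Julia 1.x; Julia 0.6 called it `A_mul_B!`) writes into a preallocated output. The array names and sizes below are assumptions, and no timing is claimed for the perceptron modules.

```julia
using LinearAlgebra

n_classes, n_features, n_examples = 10, 784, 1000

W      = rand(Float32, n_classes, n_features)            # toy weights
X      = rand(Float32, n_features, n_examples)           # toy data, one example per column
scores = Matrix{Float32}(undef, n_classes, n_examples)   # allocated once, reusable

# One matrix-matrix product for all examples, written into `scores`.
mul!(scores, W, X)

# Predicted class per example: row index of the largest score in each column.
y_hat = [argmax(view(scores, :, j)) for j in 1:n_examples]

println(y_hat[1:5])
```

If `scores` is kept alive between epochs, the product itself adds no allocations; the comprehension building `y_hat` still allocates and could be replaced by a loop filling a preallocated vector.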
