## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can cut execution time and memory allocations. Applying them is essential for getting C-like speed out of Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
workspace()
versioninfo()

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools



In [3]:

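# Build the path to the sibling "source/" directory, one level above the working directory.
source_path = join(push!(split(pwd(),"/")[1:end-1],"source/" ),"/")

# Add it to LOAD_PATH only once, even if this cell is re-run.
if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end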

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
# Shift the digit labels from 0-9 to 1-10 so they can be used as 1-based class indices.
y_train = y_train + 1
y_test = y_test + 1;

# Rescale pixel values to [0, 1] and convert them to Float32.
T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

#### MulticlassPerceptron1

- Baseline implementation

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5694833333333333
Accuracy epoch 1 is :0.6897666666666666
Accuracy epoch 1 is :0.7429333333333333
Accuracy epoch 1 is :0.7706333333333333
Accuracy epoch 1 is :0.7892833333333333
Accuracy epoch 1 is :0.8021833333333334
Accuracy epoch 1 is :0.8119333333333333
Accuracy epoch 1 is :0.8197
Accuracy epoch 1 is :0.8252333333333334
Accuracy epoch 1 is :0.83065
Accuracy epoch 1 is :0.8352166666666667
Accuracy epoch 1 is :0.8387166666666667
Accuracy epoch 1 is :0.8422666666666667


BenchmarkTools.Trial: 
  memory estimate:  712.49 MiB
  allocs estimate:  974412
  --------------
  minimum time:     987.298 ms (8.42% GC)
  median time:      1.200 s (7.94% GC)
  mean time:        1.211 s (8.77% GC)
  maximum time:     1.492 s (10.67% GC)
  --------------
  samples:          5
  evals/sample:     1

#### MulticlassPerceptron2

- Using views instead of copying examples
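
To make the difference between copying and viewing concrete, here is a minimal sketch. The matrix shape and the helpers `col_sum_copy`, `col_sum_view`, and `bytes_allocated` are hypothetical and are not taken from the `MulticlassPerceptron` modules; only `view` and `@allocated` are standard Julia.

```julia
# Hypothetical data: one example per column, as in a 784 x N image matrix.
X = rand(Float32, 784, 100)

# X[:, i] copies the column into a freshly allocated vector on every call.
col_sum_copy(X, i) = sum(X[:, i])

# view(X, :, i) wraps the same memory without copying the column.
col_sum_view(X, i) = sum(view(X, :, i))

# Bytes allocated by one call of f, measured after a warm-up call
# so that compilation is not counted.
function bytes_allocated(f, X, i)
    f(X, i)
    return @allocated f(X, i)
end

println("copy: ", bytes_allocated(col_sum_copy, X, 1), " bytes")
println("view: ", bytes_allocated(col_sum_view, X, 1), " bytes")
```

The copying version pays for a new 784-element vector each time an example is touched, which adds up quickly over 60000 training examples.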

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5942
Accuracy epoch 1 is :0.7090666666666666
Accuracy epoch 1 is :0.7551166666666667
Accuracy epoch 1 is :0.7815
Accuracy epoch 1 is :0.79785
Accuracy epoch 1 is :0.8095833333333333
Accuracy epoch 1 is :0.8181333333333334
Accuracy epoch 1 is :0.8250333333333333
Accuracy epoch 1 is :0.8307833333333333
Accuracy epoch 1 is :0.8353666666666667
Accuracy epoch 1 is :0.83925
Accuracy epoch 1 is :0.8430666666666666
Accuracy epoch 1 is :0.8461166666666666
Accuracy epoch 1 is :0.8485833333333334
Accuracy epoch 1 is :0.8511
Accuracy epoch 1 is :0.8528
Accuracy epoch 1 is :0.8549166666666667
Accuracy epoch 1 is :0.8566
Accuracy epoch 1 is :0.8579
Accuracy epoch 1 is :0.8593166666666666
Accuracy epoch 1 is :0.8607666666666667
Accuracy epoch 1 is :0.8619
Accuracy epoch 1 is :0.8629166666666667
Accuracy epoch 1 is :0.8639
Accuracy epoch 1 is :0.8648666666666667
Accuracy epoch 1 is :0.8658833333333333
Accuracy epoch 1 is :0.8667833333333334
Accuracy epoch 1 is :0.867283333333333

BenchmarkTools.Trial: 
  memory estimate:  179.90 MiB
  allocs estimate:  719121
  --------------
  minimum time:     211.234 ms (11.97% GC)
  median time:      231.052 ms (10.69% GC)
  mean time:        230.448 ms (10.77% GC)
  maximum time:     259.418 ms (10.83% GC)
  --------------
  samples:          22
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks
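
As an illustration of the second item, the sketch below shows the same loop with and without `@inbounds`. The function names `dot_checked` and `dot_inbounds` are hypothetical and do not come from the `MulticlassPerceptron` modules.

```julia
# Bounds-checked version: every w[j] and x[j] access is checked against the array size.
function dot_checked(w, x)
    s = zero(eltype(w))
    for j in 1:length(w)
        s += w[j] * x[j]
    end
    return s
end

# Same loop with bounds checks disabled. The length check up front is what
# keeps this safe: an out-of-range index under @inbounds is undefined behavior.
function dot_inbounds(w, x)
    length(w) == length(x) || throw(DimensionMismatch("w and x must have equal length"))
    s = zero(eltype(w))
    @inbounds for j in 1:length(w)
        s += w[j] * x[j]
    end
    return s
end

w = rand(Float32, 784)
x = rand(Float32, 784)
println(dot_checked(w, x) ≈ dot_inbounds(w, x))
```

How much this helps depends on the loop; it is worth benchmarking both versions rather than assuming a gain.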


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5858666666666666
Accuracy epoch 1 is :0.7003
Accuracy epoch 1 is :0.7468333333333333
Accuracy epoch 1 is :0.7737333333333334
Accuracy epoch 1 is :0.79125
Accuracy epoch 1 is :0.80305
Accuracy epoch 1 is :0.8128
Accuracy epoch 1 is :0.8205833333333333
Accuracy epoch 1 is :0.8262666666666667
Accuracy epoch 1 is :0.8312333333333334
Accuracy epoch 1 is :0.8356
Accuracy epoch 1 is :0.8387833333333333
Accuracy epoch 1 is :0.84235
Accuracy epoch 1 is :0.8453666666666667
Accuracy epoch 1 is :0.8476
Accuracy epoch 1 is :0.8498666666666667
Accuracy epoch 1 is :0.8515166666666667
Accuracy epoch 1 is :0.8533166666666666
Accuracy epoch 1 is :0.8548166666666667
Accuracy epoch 1 is :0.8564166666666667
Accuracy epoch 1 is :0.8571833333333333
Accuracy epoch 1 is :0.8585833333333334
Accuracy epoch 1 is :0.8599333333333333
Accuracy epoch 1 is :0.86105
Accuracy epoch 1 is :0.86165
Accuracy epoch 1 is :0.8625666666666667
Accuracy epoch 1 is :0.86395
Accuracy epoch 1 is :0.86481666666

BenchmarkTools.Trial: 
  memory estimate:  167.94 MiB
  allocs estimate:  599860
  --------------
  minimum time:     213.061 ms (12.94% GC)
  median time:      230.968 ms (11.86% GC)
  mean time:        231.726 ms (11.62% GC)
  maximum time:     251.474 ms (10.74% GC)
  --------------
  samples:          22
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Using a preallocated vector for predicting all data points
- Using dot syntax (`.*`) for loop fusion
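
The preallocation and loop-fusion items can be sketched as follows. This is an illustration only, not the code in `MulticlassPerceptron4`: `predict_all!`, `update_alloc`, and `update!` are hypothetical names, and the fused form assumes Julia 0.6 or later, where dotted operators such as `.*` and `.+` fuse into a single loop.

```julia
# Fill a preallocated vector y_hat with the predicted class of every column of X,
# instead of allocating a new prediction vector on each call.
# W is n_classes x n_features, X is n_features x n_samples.
function predict_all!(y_hat, W, X)
    n_classes, n_features = size(W)
    for i in 1:size(X, 2)
        best_class = 1
        best_score = typemin(eltype(W))
        for c in 1:n_classes
            s = zero(eltype(W))
            for j in 1:n_features
                s += W[c, j] * X[j, i]
            end
            if s > best_score
                best_score = s
                best_class = c
            end
        end
        y_hat[i] = best_class
    end
    return y_hat
end

# Allocating update: lr * x and the sum each create a temporary vector.
update_alloc(w, x, lr) = w + lr * x

# Fused, in-place update: a single loop that writes straight into w.
function update!(w, x, lr)
    w .= w .+ lr .* x
    return w
end

W = rand(Float32, 10, 784)
X = rand(Float32, 784, 5)
y_hat = zeros(Int, size(X, 2))   # allocated once, reused across calls
predict_all!(y_hat, W, X)

w = zeros(Float32, 784)
update!(w, X[:, 1], 0.5f0)
```

The trailing `!` follows the Julia convention for functions that modify their arguments.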

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5830166666666666
Accuracy epoch 1 is :0.69805
Accuracy epoch 1 is :0.7456333333333334
Accuracy epoch 1 is :0.7720333333333333
Accuracy epoch 1 is :0.7898166666666666
Accuracy epoch 1 is :0.80155
Accuracy epoch 1 is :0.8115666666666667
Accuracy epoch 1 is :0.8188666666666666
Accuracy epoch 1 is :0.82475
Accuracy epoch 1 is :0.8298166666666666
Accuracy epoch 1 is :0.8339
Accuracy epoch 1 is :0.8381166666666666
Accuracy epoch 1 is :0.8415333333333334
Accuracy epoch 1 is :0.8446666666666667
Accuracy epoch 1 is :0.8472166666666666
Accuracy epoch 1 is :0.8495833333333334
Accuracy epoch 1 is :0.8520833333333333
Accuracy epoch 1 is :0.8540166666666666
Accuracy epoch 1 is :0.8558
Accuracy epoch 1 is :0.8575666666666667
Accuracy epoch 1 is :0.85935
Accuracy epoch 1 is :0.8607833333333333
Accuracy epoch 1 is :0.8617833333333333
Accuracy epoch 1 is :0.8625166666666667
Accuracy epoch 1 is :0.8634833333333334
Accuracy epoch 1 is :0.8644666666666667
Accuracy epoch 1 is :0.86531

BenchmarkTools.Trial: 
  memory estimate:  165.31 MiB
  allocs estimate:  598078
  --------------
  minimum time:     206.841 ms (12.28% GC)
  median time:      220.375 ms (12.35% GC)
  mean time:        222.287 ms (12.33% GC)
  maximum time:     245.430 ms (11.09% GC)
  --------------
  samples:          23
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the memory estimate down to 0?**

**Are we really using BLAS to its full potential?**
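
One direction worth trying for both questions, sketched under assumptions: compute the class scores for a whole batch of examples with a single in-place matrix product, so that BLAS does the work and no result array is allocated per call. The sketch assumes Julia 1.x, where `mul!` lives in the `LinearAlgebra` standard library (on the Julia 0.5 release shown at the top, the corresponding function was `A_mul_B!`). The array shapes and the helpers `alloc_bytes_product` and `alloc_bytes_mul!` are hypothetical, and whether this brings the memory estimate of `fit!` to exactly 0 has not been measured here.

```julia
using LinearAlgebra

W = rand(Float32, 10, 784)                 # n_classes x n_features
X = rand(Float32, 784, 256)                # n_features x batch of examples
scores = Matrix{Float32}(undef, 10, 256)   # allocated once, reused

# Bytes allocated by W * X, which creates a new 10 x 256 matrix on every call.
function alloc_bytes_product(W, X)
    W * X                                  # warm-up call, excludes compilation
    return @allocated W * X
end

# Bytes allocated by mul!, which writes W * X into the preallocated buffer.
function alloc_bytes_mul!(scores, W, X)
    mul!(scores, W, X)                     # warm-up call, excludes compilation
    return @allocated mul!(scores, W, X)
end

println("W * X: ", alloc_bytes_product(W, X), " bytes")
println("mul!:  ", alloc_bytes_mul!(scores, W, X), " bytes")
```

Because the perceptron updates its weights one example at a time, a batched product like this fits most naturally where all examples are scored with fixed weights, for instance when computing the accuracy after an epoch.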
