## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/
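
As a minimal sketch of the kind of workflow a profiling cheat sheet covers, the `Profile` standard library can sample a running function and report where the time goes. The code assumes Julia 1.x syntax (the notebook below ran on 0.5), and `hot_loop` is a hypothetical stand-in for real training code.

```julia
using Profile

# Hypothetical workload standing in for a training loop.
function hot_loop(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

hot_loop(10)                 # run once so compilation is not profiled
Profile.clear()              # discard samples from earlier runs
@profile hot_loop(10^7)      # collect stack samples while the call runs
Profile.print(maxdepth = 8)  # tree view of where the samples landed
```

Lines with many samples are the ones worth optimizing first.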

#### Examples of speeding up code

A small number of "tricks" can reduce execution time and memory allocations. Applying them is essential for getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
workspace()
versioninfo()

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, ivybridge)


In [2]:
using MNIST
using BenchmarkTools



In [3]:

# Add the sibling "source" directory to LOAD_PATH so the modules below can be found
source_path = joinpath(dirname(pwd()), "source")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
# MNIST labels are 0-9; shift them to 1-10 to match Julia's 1-based indexing
y_train = y_train + 1
y_test = y_test + 1;

T = Float32
# Min-max scale the pixels to [0, 1], then convert to T
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);
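
The scaling in the cell above can be wrapped in a small helper so that train and test data go through exactly the same code path and end up with the same element type. This is only a sketch in Julia 1.x syntax; `minmax_scale` is a hypothetical name, not a function from `MNIST` or the perceptron modules.

```julia
# Hypothetical helper: rescale X to [0, 1] and convert the result to element type T.
function minmax_scale(::Type{T}, X::AbstractArray) where {T<:AbstractFloat}
    lo, hi = extrema(X)
    hi > lo || throw(ArgumentError("cannot scale an array whose values are all equal"))
    # Divide before converting, so the whole expression ends up with element type T
    return T.((X .- lo) ./ (hi - lo))
end

A = minmax_scale(Float32, [0.0 127.5; 255.0 51.0])
println(eltype(A), " ", extrema(A))
```

Converting only after the division keeps the result in `T`; converting first and then dividing by a `Float64` range would promote the array back to `Float64`.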

#### MulticlassPerceptron1

- Baseline implementation

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5903833333333334
Accuracy epoch 1 is :0.7073666666666667
Accuracy epoch 1 is :0.7545666666666667
Accuracy epoch 1 is :0.7810833333333334
Accuracy epoch 1 is :0.7972
Accuracy epoch 1 is :0.8089833333333334
Accuracy epoch 1 is :0.8175666666666667
Accuracy epoch 1 is :0.8243333333333334
Accuracy epoch 1 is :0.8298833333333333
Accuracy epoch 1 is :0.8346833333333333
Accuracy epoch 1 is :0.8386666666666667
Accuracy epoch 1 is :0.8422666666666667


BenchmarkTools.Trial: 
  memory estimate:  709.89 MiB
  allocs estimate:  973207
  --------------
  minimum time:     1.199 s (5.91% GC)
  median time:      1.407 s (6.33% GC)
  mean time:        1.415 s (6.15% GC)
  maximum time:     1.644 s (6.02% GC)
  --------------
  samples:          4
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

#### MulticlassPerceptron2

- Using views instead of copying examples
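
A minimal sketch of the difference, written for Julia 1.x. The shapes (one example per column of `X`, one class per column of `W`) and the function names are assumptions for illustration, not the code inside `MulticlassPerceptron2`.

```julia
using LinearAlgebra

# Assumed layout: one example per column of X, one class per column of W.
X = rand(Float32, 784, 100)
W = rand(Float32, 784, 10)

score_copy(W, X, i) = W' * X[:, i]        # X[:, i] allocates a new 784-element vector
score_view(W, X, i) = W' * view(X, :, i)  # view(X, :, i) references the column in place

# Bytes allocated by one call, measured after a warm-up call
function bytes_allocated(f, W, X, i)
    f(W, X, i)
    return @allocated f(W, X, i)
end

copy_bytes = bytes_allocated(score_copy, W, X, 1)
view_bytes = bytes_allocated(score_view, W, X, 1)
println("copy: ", copy_bytes, " bytes, view: ", view_bytes, " bytes")
```

Both functions return the same scores; the view version skips the per-example copy, which adds up over 60000 training examples.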

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5722333333333334
Accuracy epoch 1 is :0.6888833333333333
Accuracy epoch 1 is :0.7408333333333333
Accuracy epoch 1 is :0.7699666666666667
Accuracy epoch 1 is :0.7884333333333333
Accuracy epoch 1 is :0.8015666666666666
Accuracy epoch 1 is :0.8125666666666667
Accuracy epoch 1 is :0.8203666666666667
Accuracy epoch 1 is :0.82675
Accuracy epoch 1 is :0.8321833333333334
Accuracy epoch 1 is :0.8368
Accuracy epoch 1 is :0.8404666666666667
Accuracy epoch 1 is :0.8433166666666667
Accuracy epoch 1 is :0.8458333333333333


BenchmarkTools.Trial: 
  memory estimate:  215.62 MiB
  allocs estimate:  749275
  --------------
  minimum time:     893.643 ms (3.07% GC)
  median time:      935.521 ms (2.92% GC)
  mean time:        962.262 ms (2.88% GC)
  maximum time:     1.136 s (2.39% GC)
  --------------
  samples:          6
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks
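
As an illustration of the second point, here is a hedged sketch of an inner loop with bounds checks disabled (Julia 1.x syntax). `dot_inbounds` is a hypothetical helper, not a function from `MulticlassPerceptron3`.

```julia
# Hypothetical inner product used to score one class against one example.
function dot_inbounds(w::AbstractVector{T}, x::AbstractVector{T}) where {T}
    s = zero(T)
    # eachindex(w, x) throws if the index ranges differ, so every j below is valid
    @inbounds for j in eachindex(w, x)
        s += w[j] * x[j]
    end
    return s
end

println(dot_inbounds(Float32[1, 2, 3], Float32[4, 5, 6]))
```

`@inbounds` is only safe when the indices are known to be valid, which is why the loop iterates over `eachindex(w, x)` rather than a hand-written range.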


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.61255
Accuracy epoch 1 is :0.7124666666666667
Accuracy epoch 1 is :0.7564333333333333
Accuracy epoch 1 is :0.78075
Accuracy epoch 1 is :0.79665
Accuracy epoch 1 is :0.8093
Accuracy epoch 1 is :0.8179333333333333
Accuracy epoch 1 is :0.8249833333333333
Accuracy epoch 1 is :0.8312333333333334
Accuracy epoch 1 is :0.8364666666666667
Accuracy epoch 1 is :0.8404666666666667
Accuracy epoch 1 is :0.8437166666666667
Accuracy epoch 1 is :0.8471166666666666
Accuracy epoch 1 is :0.8497833333333333


BenchmarkTools.Trial: 
  memory estimate:  199.08 MiB
  allocs estimate:  626183
  --------------
  minimum time:     824.302 ms (2.33% GC)
  median time:      838.756 ms (2.55% GC)
  mean time:        854.798 ms (2.60% GC)
  maximum time:     939.911 ms (2.16% GC)
  --------------
  samples:          6
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all datapoints
- Using `.*` syntax for loop fusion
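
The last two points could look roughly like the sketch below (Julia 1.x syntax). The function names, the argument order, and the weight layout (`W` is `n_features × n_classes`) are assumptions made for illustration; the actual `MulticlassPerceptron4` code may differ.

```julia
using LinearAlgebra

# Fill a caller-provided label vector instead of allocating a new one per call.
function predict!(y_hat::AbstractVector{<:Integer}, W::AbstractMatrix{T}, X::AbstractMatrix{T}) where {T}
    scores = Vector{T}(undef, size(W, 2))   # one buffer, reused for every example
    for i in axes(X, 2)
        mul!(scores, W', view(X, :, i))     # class scores written in place
        y_hat[i] = argmax(scores)
    end
    return y_hat
end

# Perceptron update after a misclassification, using fused in-place broadcasts.
function update!(W, x, y_true, y_pred, lr)
    @views W[:, y_true] .+= lr .* x   # one fused loop, no temporary array
    @views W[:, y_pred] .-= lr .* x
    return W
end

W = Float32[1 0; 0 1]
X = Float32[2 0; 0 3]
println(predict!(zeros(Int, 2), W, X))
```

Without `@views`, the `W[:, k]` on the right-hand side of `.+=` would copy the column before the fused loop runs.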

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.58355
Accuracy epoch 1 is :0.69715
Accuracy epoch 1 is :0.74565
Accuracy epoch 1 is :0.7748
Accuracy epoch 1 is :0.7925166666666666
Accuracy epoch 1 is :0.80565
Accuracy epoch 1 is :0.81535
Accuracy epoch 1 is :0.82295
Accuracy epoch 1 is :0.8287666666666667
Accuracy epoch 1 is :0.83395
Accuracy epoch 1 is :0.8381
Accuracy epoch 1 is :0.8415
Accuracy epoch 1 is :0.8446333333333333
Accuracy epoch 1 is :0.8471333333333333


BenchmarkTools.Trial: 
  memory estimate:  201.32 MiB
  allocs estimate:  628515
  --------------
  minimum time:     833.639 ms (2.82% GC)
  median time:      864.468 ms (3.15% GC)
  mean time:        900.856 ms (2.96% GC)
  maximum time:     1.035 s (3.15% GC)
  --------------
  samples:          6
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the memory estimate down to 0?**

**Are we really using BLAS to its full potential?**
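
One possible direction for both questions, offered only as a sketch under stated assumptions (Julia 1.x syntax, `W` stored as `n_features × n_classes`, hypothetical function name): score all examples with a single matrix-matrix product into a preallocated buffer, rather than one matrix-vector product per example.

```julia
using LinearAlgebra

# Assumed shapes: W is n_features × n_classes, X is n_features × n_samples,
# S is an n_classes × n_samples buffer, y_hat holds one label per sample.
function predict_batch!(y_hat, S, W, X)
    mul!(S, W', X)                 # one matrix-matrix product written into S
    for i in axes(S, 2)
        best = 1
        for k in 2:size(S, 1)      # manual argmax over column i, no temporaries
            if S[k, i] > S[best, i]
                best = k
            end
        end
        y_hat[i] = best
    end
    return y_hat
end

W = rand(Float32, 784, 10)
X = rand(Float32, 784, 256)
S = Matrix{Float32}(undef, 10, 256)
y_hat = Vector{Int}(undef, 256)

predict_batch!(y_hat, S, W, X)                      # warm-up call
println(@allocated predict_batch!(y_hat, S, W, X))  # bytes allocated by one call
```

Whether this helps training depends on how often predictions over the whole dataset are needed; the perceptron update itself still proceeds example by example.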
