## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can noticeably reduce execution time and memory allocations. Applying them is essential for getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [13]:
workspace()
versioninfo()

Julia Version 0.6.0-dev.2069
Commit ff9a949 (2017-01-13 02:17 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)


In [17]:
using MNIST
using BenchmarkTools

source_path = join(push!(split(pwd(),"/")[1:end-1],"source/" ),"/")

if !contains(==,LOAD_PATH, source_path) 
    push!(LOAD_PATH, source_path)
end
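The path setup above relies on Julia 0.6 APIs. For instance, the `contains(==, itr, x)` method is not available in Julia 1.x. Here is a minimal sketch of the same setup for Julia 1.x. Like the cell above, it assumes that `source/` sits next to the notebook's working directory.

```julia
# Sketch for Julia 1.x: add the sibling "source" directory to LOAD_PATH once.
# Assumption: the notebook runs from a directory that is a sibling of "source/".
source_path = joinpath(dirname(pwd()), "source")

# `in` performs the membership test, so the path is only pushed once
if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end
```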

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

Float32

In [18]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
y_train = y_train + 1
y_test = y_test + 1;

T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);
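The min-max normalization above maps pixel values to [0, 1]. Below is a minimal Julia 1.x sketch of the same step as a reusable function. The name `minmax_scale` is hypothetical and is not part of the MNIST package.

```julia
# Hypothetical helper: min-max scale an array into [0, 1] with element type T.
function minmax_scale(X::AbstractArray{<:Real}, ::Type{T}) where {T<:AbstractFloat}
    lo, hi = T.(extrema(X))             # one pass for min and max, converted to T
    return (T.(X) .- lo) ./ (hi - lo)   # fused broadcast; every operand is T
end

# Default to Float32, matching `T = Float32` in the notebook
minmax_scale(X::AbstractArray{<:Real}) = minmax_scale(X, Float32)

X = [0.0 127.5; 255.0 51.0]
Xs = minmax_scale(X)
```

Converting `lo` and `hi` to `T` before the arithmetic keeps the whole result in `T`. Dividing a `Float32` array by a `Float64` scalar would instead promote the result to `Float64`, so the placement of parentheses around the conversion matters. If all values are equal (`hi == lo`), the division yields `NaN`.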

In [4]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6012666666666666
Accuracy epoch 1 is :0.7109
Accuracy epoch 1 is :0.7563666666666666
Accuracy epoch 1 is :0.78075
Accuracy epoch 1 is :0.7967
Accuracy epoch 1 is :0.8081833333333334
Accuracy epoch 1 is :0.81715
Accuracy epoch 1 is :0.8237666666666666


BenchmarkTools.Trial: 
  memory estimate:  628.05 MiB
  allocs estimate:  1611772
  --------------
  minimum time:     2.028 s (0.00% GC)
  median time:      2.085 s (14.42% GC)
  mean time:        2.150 s (10.59% GC)
  maximum time:     2.338 s (12.86% GC)
  --------------
  samples:          3
  evals/sample:     1

#### MulticlassPerceptron2

- Using views instead of copying examples
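Slicing a matrix with `X[:, m]` allocates a new array. By contrast, `view(X, :, m)` returns a lightweight `SubArray` that refers to the parent's memory. Here is a small Julia 1.x illustration, independent of the perceptron code:

```julia
# Copy vs. view of one column (one example) of a features × examples matrix.
X = reshape(Float32.(1:12), 4, 3)   # 4 features, 3 examples

col_copy = X[:, 2]          # indexing with [] allocates a new Vector{Float32}
col_view = view(X, :, 2)    # SubArray sharing X's memory, no copy

X[1, 2] = 100f0             # mutate the parent matrix
# col_copy keeps the old value 5f0; col_view now sees 100f0
```

The `@views` macro turns every slice in an expression or block into a view. This is convenient when a loop body contains several slices.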

In [5]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5819666666666666
Accuracy epoch 1 is :0.7014166666666667
Accuracy epoch 1 is :0.7470666666666667
Accuracy epoch 1 is :0.7752333333333333
Accuracy epoch 1 is :0.79275
Accuracy epoch 1 is :0.8056333333333333
Accuracy epoch 1 is :0.8148
Accuracy epoch 1 is :0.8216
Accuracy epoch 1 is :0.82805


BenchmarkTools.Trial: 
  memory estimate:  251.18 MiB
  allocs estimate:  1499874
  --------------
  minimum time:     1.276 s (2.47% GC)
  median time:      1.361 s (2.49% GC)
  mean time:        1.371 s (2.54% GC)
  maximum time:     1.485 s (2.37% GC)
  --------------
  samples:          4
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds`
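`@inbounds` removes array bounds checks inside a block. This can help tight loops such as the dot products in a perceptron's scoring step. Below is a Julia 1.x sketch with hypothetical helper names that compares a checked loop with an unchecked one:

```julia
# Hypothetical helpers: the same dot product with and without bounds checks.
function dot_checked(a::AbstractVector{T}, b::AbstractVector{T}) where {T}
    length(a) == length(b) || throw(DimensionMismatch("lengths differ"))
    s = zero(T)
    for i in eachindex(a, b)
        s += a[i] * b[i]        # every a[i] and b[i] access is bounds-checked
    end
    return s
end

function dot_inbounds(a::AbstractVector{T}, b::AbstractVector{T}) where {T}
    length(a) == length(b) || throw(DimensionMismatch("lengths differ"))
    s = zero(T)
    @inbounds for i in eachindex(a, b)
        s += a[i] * b[i]        # checks elided; safe because the lengths were verified
    end
    return s
end
```

`@inbounds` is only safe when every index is known to be valid. Here, the length check and `eachindex(a, b)` guarantee that. An out-of-bounds access inside `@inbounds` is undefined behavior, not an error.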


In [43]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.58455
Accuracy epoch 1 is :0.6983666666666667
Accuracy epoch 1 is :0.74455
Accuracy epoch 1 is :0.7709666666666667
Accuracy epoch 1 is :0.78955
Accuracy epoch 1 is :0.8026
Accuracy epoch 1 is :0.8122833333333334
Accuracy epoch 1 is :0.8204166666666667
Accuracy epoch 1 is :0.8261666666666667
Accuracy epoch 1 is :0.8315
Accuracy epoch 1 is :0.8355666666666667
Accuracy epoch 1 is :0.8387
Accuracy epoch 1 is :0.8418833333333333


BenchmarkTools.Trial: 
  memory estimate:  210.27 MiB
  allocs estimate:  1500277
  --------------
  minimum time:     1.117 s (3.12% GC)
  median time:      1.179 s (2.95% GC)
  mean time:        1.182 s (2.97% GC)
  maximum time:     1.240 s (2.99% GC)
  --------------
  samples:          5
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all data points
- Using `.*` syntax for loop fusion
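Below is a Julia 1.x sketch of the two new ideas in this version. It assumes a weight matrix `W` of size classes × features and one example per column of `X`. `predict_all!` and `update_row!` are hypothetical names, not the functions in `MulticlassPerceptron4`.

```julia
using LinearAlgebra  # mul!

# 1) Predict every example into a preallocated vector, reusing one score buffer.
function predict_all!(y_hat::Vector{Int}, scores::Vector{T},
                      W::Matrix{T}, b::Vector{T}, X::Matrix{T}) where {T}
    for m in axes(X, 2)
        mul!(scores, W, view(X, :, m))   # scores = W * x, written in place
        scores .+= b                     # in-place broadcast, no temporary
        y_hat[m] = argmax(scores)
    end
    return y_hat
end

# 2) Fused in-place update of one weight row: one loop, no temporary for lr .* x.
function update_row!(W::Matrix{T}, row::Int, lr::T, x::AbstractVector{T}) where {T}
    @views W[row, :] .+= lr .* x
    return W
end
```

In `update_row!`, the dotted operators `.+=` and `.*` fuse into a single loop that writes straight into `W`. Requiring `lr::T` keeps element types consistent, so a `Float64` learning rate would need converting first, for example `Float32(0.0001)`.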

In [42]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.8626166666666667
Accuracy epoch 1 is :0.8637166666666667
Accuracy epoch 1 is :0.8647833333333333
Accuracy epoch 1 is :0.8659
Accuracy epoch 1 is :0.8664166666666666
Accuracy epoch 1 is :0.8675
Accuracy epoch 1 is :0.8682
Accuracy epoch 1 is :0.8687166666666667
Accuracy epoch 1 is :0.8691333333333333
Accuracy epoch 1 is :0.8701
Accuracy epoch 1 is :0.8707166666666667
Accuracy epoch 1 is :0.8714333333333333
Accuracy epoch 1 is :0.8720333333333333
Accuracy epoch 1 is :0.8725666666666667


BenchmarkTools.Trial: 
  memory estimate:  79.56 MiB
  allocs estimate:  1398264
  --------------
  minimum time:     946.407 ms (1.25% GC)
  median time:      978.270 ms (1.49% GC)
  mean time:        972.575 ms (1.50% GC)
  maximum time:     995.436 ms (1.69% GC)
  --------------
  samples:          6
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the code to a memory estimate of 0?**

**Are we really using BLAS to its full potential?**

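One possible direction for both questions is sketched below for Julia 1.x. It assumes that prediction over the whole data set is a hot spot. The idea is to score every example with a single matrix-matrix product (`mul!`, which calls BLAS for dense `Float32`/`Float64` matrices) into a preallocated buffer, then take each column's argmax. `predict_batch!` is a hypothetical name, and the sketch covers prediction only, not the weight updates inside `fit!`.

```julia
using LinearAlgebra  # mul!

# Hypothetical sketch: one BLAS gemm for all scores, preallocated outputs.
function predict_batch!(y_hat::Vector{Int}, S::Matrix{T},
                        W::Matrix{T}, b::Vector{T}, X::Matrix{T}) where {T<:AbstractFloat}
    mul!(S, W, X)           # S = W * X (classes × examples), overwrites S in place
    S .+= b                 # add the class bias to every column
    for m in axes(S, 2)     # argmax per column; strict > keeps the first maximum
        best, best_i = S[1, m], 1
        @inbounds for c in 2:size(S, 1)
            if S[c, m] > best
                best, best_i = S[c, m], c
            end
        end
        y_hat[m] = best_i
    end
    return y_hat
end
```

A single `gemm` call over the full score matrix lets BLAS use its blocked (and possibly multithreaded) kernels, instead of running one matrix-vector product per example. Because `S` and `y_hat` are reused across calls, the function should not need to allocate once it is compiled.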