## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can be applied to reduce execution time and memory allocations. Applying them is essential for getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
workspace()
versioninfo()

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools



In [3]:

# Add the sibling "source" directory to LOAD_PATH so the modules below can be found.
source_path = joinpath(dirname(pwd()), "source")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
y_train = y_train + 1
y_test = y_test + 1;

T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

#### MulticlassPerceptron1

- Baseline version (copies each example instead of using views)

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6029333333333333
Accuracy epoch 1 is :0.7073
Accuracy epoch 1 is :0.7523833333333333
Accuracy epoch 1 is :0.7765333333333333
Accuracy epoch 1 is :0.7937333333333333
Accuracy epoch 1 is :0.8056333333333333
Accuracy epoch 1 is :0.8144166666666667
Accuracy epoch 1 is :0.8219166666666666
Accuracy epoch 1 is :0.82865
Accuracy epoch 1 is :0.8334
Accuracy epoch 1 is :0.8372833333333334
Accuracy epoch 1 is :0.8408666666666667
Accuracy epoch 1 is :0.8443333333333334


BenchmarkTools.Trial: 
  memory estimate:  705.54 MiB
  allocs estimate:  971238
  --------------
  minimum time:     1.125 s (8.42% GC)
  median time:      1.142 s (8.28% GC)
  mean time:        1.155 s (7.83% GC)
  maximum time:     1.214 s (6.16% GC)
  --------------
  samples:          5
  evals/sample:     1

#### MulticlassPerceptron2

- Using views instead of copying examples
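
A minimal sketch of the difference, not the code inside `MulticlassPerceptron2`: slicing with `X[:, i]` copies the column into a new vector, while `view(X, :, i)` refers to the same memory. The helper `column_sum` and the array sizes are made up for the illustration.

```julia
# Sum the entries of a column; accepts a copied slice or a view.
function column_sum(col)
    s = zero(eltype(col))
    for v in col
        s += v
    end
    return s
end

X = rand(Float32, 784, 100)

copied = X[:, 3]        # allocates a new 784-element vector
viewed = view(X, :, 3)  # wraps the existing memory, the data is not copied

# Both give the same result because they hold the same values.
@assert column_sum(copied) == column_sum(viewed)

# Writing through the view changes X; writing to the copy does not.
viewed[1] = 2.0f0
copied[2] = 3.0f0
println(X[1, 3] == 2.0f0)  # true
println(X[2, 3] == 3.0f0)  # false
```

Inside a training loop that visits every example, avoiding one copy per example is presumably where most of the drop in the memory estimate comes from.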

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5750833333333333
Accuracy epoch 1 is :0.6893166666666667
Accuracy epoch 1 is :0.741
Accuracy epoch 1 is :0.7695833333333333
Accuracy epoch 1 is :0.7873833333333333
Accuracy epoch 1 is :0.7999166666666667
Accuracy epoch 1 is :0.80935
Accuracy epoch 1 is :0.8174666666666667
Accuracy epoch 1 is :0.8239333333333333
Accuracy epoch 1 is :0.82875
Accuracy epoch 1 is :0.8329166666666666
Accuracy epoch 1 is :0.8365
Accuracy epoch 1 is :0.83975
Accuracy epoch 1 is :0.8424833333333334
Accuracy epoch 1 is :0.8452333333333333
Accuracy epoch 1 is :0.8474666666666667
Accuracy epoch 1 is :0.84985
Accuracy epoch 1 is :0.8515666666666667
Accuracy epoch 1 is :0.8531666666666666
Accuracy epoch 1 is :0.8552166666666666
Accuracy epoch 1 is :0.85645
Accuracy epoch 1 is :0.8578666666666667
Accuracy epoch 1 is :0.8593333333333333
Accuracy epoch 1 is :0.8605333333333334
Accuracy epoch 1 is :0.862
Accuracy epoch 1 is :0.8631
Accuracy epoch 1 is :0.8638833333333333
Accuracy epoch 1 is :0.86

BenchmarkTools.Trial: 
  memory estimate:  192.61 MiB
  allocs estimate:  745367
  --------------
  minimum time:     371.087 ms (7.39% GC)
  median time:      388.771 ms (7.06% GC)
  mean time:        392.453 ms (7.15% GC)
  maximum time:     420.398 ms (6.21% GC)
  --------------
  samples:          13
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks on array accesses
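
As an illustration of the `@inbounds` idea (a sketch under assumptions, not the code inside `MulticlassPerceptron3`): the hypothetical function `score` below computes the dot product between one row of a weight matrix and one sample. The sizes are validated once before the loop, which is what makes skipping the per-access bounds checks safe.

```julia
# Score of one class for one sample: dot product of W[class, :] and X[:, sample].
function score(W, X, class, sample)
    # Validate once up front; the loop below then only touches valid indices.
    size(X, 1) == size(W, 2) || throw(DimensionMismatch("feature counts differ"))
    (1 <= class <= size(W, 1) && 1 <= sample <= size(X, 2)) ||
        throw(ArgumentError("class or sample index out of range"))
    s = zero(eltype(W))
    # @inbounds removes the bounds check on every W[...] and X[...] access.
    @inbounds for j in 1:size(W, 2)
        s += W[class, j] * X[j, sample]
    end
    return s
end

W = ones(Float32, 10, 784)
X = ones(Float32, 784, 5)
println(score(W, X, 1, 1) == 784.0f0)  # true
```

`@inbounds` without such a check is unsafe: an out-of-range index then reads or writes arbitrary memory instead of throwing an error.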


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.58715
Accuracy epoch 1 is :0.7063666666666667
Accuracy epoch 1 is :0.75265
Accuracy epoch 1 is :0.77875
Accuracy epoch 1 is :0.79605
Accuracy epoch 1 is :0.8074333333333333
Accuracy epoch 1 is :0.8168166666666666
Accuracy epoch 1 is :0.82355
Accuracy epoch 1 is :0.82915
Accuracy epoch 1 is :0.8333333333333334
Accuracy epoch 1 is :0.8376333333333333
Accuracy epoch 1 is :0.8411833333333333
Accuracy epoch 1 is :0.84465
Accuracy epoch 1 is :0.84715
Accuracy epoch 1 is :0.84955
Accuracy epoch 1 is :0.8521833333333333
Accuracy epoch 1 is :0.8541333333333333
Accuracy epoch 1 is :0.8559833333333333
Accuracy epoch 1 is :0.8577666666666667
Accuracy epoch 1 is :0.8592333333333333
Accuracy epoch 1 is :0.8606333333333334
Accuracy epoch 1 is :0.86195
Accuracy epoch 1 is :0.8631166666666666
Accuracy epoch 1 is :0.8638833333333333
Accuracy epoch 1 is :0.8649166666666667
Accuracy epoch 1 is :0.8658166666666667
Accuracy epoch 1 is :0.8665833333333334
Accuracy epoch 1 is :0.86735

BenchmarkTools.Trial: 
  memory estimate:  175.35 MiB
  allocs estimate:  606096
  --------------
  minimum time:     371.294 ms (6.60% GC)
  median time:      393.596 ms (5.86% GC)
  mean time:        393.375 ms (5.83% GC)
  maximum time:     408.442 ms (5.31% GC)
  --------------
  samples:          13
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for the predictions of all data points
- Using dot syntax (`.*`) for loop fusion
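
The two new ideas in the list above can be sketched as follows. This is an illustration, not the code inside `MulticlassPerceptron4`; the names `update!` and `predict!` are hypothetical. It assumes Julia 0.6 or later, where dotted operators such as `.*` and `.+=` fuse into a single in-place loop. On the 0.5 release used in this notebook, as far as I know only dotted function calls `f.(x)` were fused, which may be one reason the timings for versions 3 and 4 are so close.

```julia
# Perceptron-style weight update written with fused, in-place broadcasting.
# W is n_classes x n_features, x is one sample, lr is the learning rate.
function update!(W, x, y_true, y_pred, lr)
    w_true = view(W, y_true, :)
    w_pred = view(W, y_pred, :)
    w_true .+= lr .* x   # one fused loop, no temporary array for lr .* x
    w_pred .-= lr .* x
    return W
end

# Fill the preallocated vector y_preds with the best class for each column of X.
function predict!(y_preds, W, X)
    n_classes, n_features = size(W)
    size(X, 1) == n_features || throw(DimensionMismatch("feature counts differ"))
    length(y_preds) == size(X, 2) || throw(DimensionMismatch("one slot per sample needed"))
    for s in 1:size(X, 2)
        best_class, best_score = 1, typemin(eltype(W))
        for c in 1:n_classes
            score = zero(eltype(W))
            for j in 1:n_features
                score += W[c, j] * X[j, s]
            end
            if score > best_score
                best_class, best_score = c, score
            end
        end
        y_preds[s] = best_class
    end
    return y_preds
end

W = zeros(Float32, 10, 784)
x = ones(Float32, 784)
update!(W, x, 3, 7, 0.5f0)         # raise class 3, lower class 7

X = ones(Float32, 784, 4)
y_preds = zeros(Int, size(X, 2))   # allocated once, reused on every call
predict!(y_preds, W, X)
println(y_preds)                   # every sample is assigned class 3
```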

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5976666666666667
Accuracy epoch 1 is :0.7063333333333334
Accuracy epoch 1 is :0.7502833333333333
Accuracy epoch 1 is :0.7754333333333333
Accuracy epoch 1 is :0.7918666666666667
Accuracy epoch 1 is :0.8035
Accuracy epoch 1 is :0.8128833333333333
Accuracy epoch 1 is :0.8202333333333334
Accuracy epoch 1 is :0.8261166666666667
Accuracy epoch 1 is :0.83105
Accuracy epoch 1 is :0.8361333333333333
Accuracy epoch 1 is :0.8396666666666667
Accuracy epoch 1 is :0.8419666666666666
Accuracy epoch 1 is :0.8445333333333334
Accuracy epoch 1 is :0.8472333333333333
Accuracy epoch 1 is :0.8497166666666667
Accuracy epoch 1 is :0.8513666666666667
Accuracy epoch 1 is :0.8531666666666666
Accuracy epoch 1 is :0.8547833333333333
Accuracy epoch 1 is :0.8560666666666666
Accuracy epoch 1 is :0.8579333333333333
Accuracy epoch 1 is :0.8592666666666666
Accuracy epoch 1 is :0.8608166666666667
Accuracy epoch 1 is :0.8617
Accuracy epoch 1 is :0.8629666666666667
Accuracy epoch 1 is :0.863933333333

BenchmarkTools.Trial: 
  memory estimate:  176.38 MiB
  allocs estimate:  607427
  --------------
  minimum time:     365.966 ms (6.86% GC)
  median time:      394.657 ms (6.35% GC)
  mean time:        390.801 ms (6.33% GC)
  maximum time:     415.926 ms (5.89% GC)
  --------------
  samples:          13
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the code to a memory estimate of 0?**

**Are we really using BLAS to its full potential?**
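
One possible direction for both questions, sketched under assumptions: compute the class scores with an in-place matrix-vector product, so that BLAS writes into a buffer that is allocated once. The code targets Julia 1.x, where the in-place product is `mul!` from `LinearAlgebra`; on the 0.5 release used in this notebook the counterpart was `A_mul_B!`. The helper names `bytes_inplace` and `bytes_alloc` are made up for the example, and nothing here is taken from the perceptron modules.

```julia
using LinearAlgebra  # provides mul! on Julia 1.x

W = rand(Float32, 10, 784)
x = rand(Float32, 784)
scores = zeros(Float32, 10)   # allocated once, reused for every sample

mul!(scores, W, x)            # scores = W * x, written into the existing buffer

# Wrap the calls in functions so @allocated measures the call itself
# rather than overhead from working with global variables.
bytes_inplace(scores, W, x) = @allocated mul!(scores, W, x)
bytes_alloc(W, x) = @allocated W * x

# Warm-up calls so compilation is not counted in the measurement.
bytes_inplace(scores, W, x)
bytes_alloc(W, x)

println("in-place:   ", bytes_inplace(scores, W, x), " bytes")
println("allocating: ", bytes_alloc(W, x), " bytes")
```

If the in-place call reports no allocations, the remaining memory in `fit!` has to come from somewhere else, for example temporaries created while updating the weights or while computing the accuracy.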
