## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can reduce execution time and memory allocations. Applying them is key to getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
workspace()
versioninfo()

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools

[1m[34mINFO: Recompiling stale cache file /Users/david/.julia/lib/v0.5/HDF5.ji for module HDF5.
[0m[1m[34mINFO: Recompiling stale cache file /Users/david/.julia/lib/v0.5/JLD.ji for module JLD.
[0m

In [3]:

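# Path to the "source/" directory that sits next to the current working directory
source_path = join(push!(split(pwd(), "/")[1:end-1], "source/"), "/")

# Make the perceptron modules loadable without adding the path twice
if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end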

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
# Shift labels from 0-9 to 1-10 so they can be used as 1-based indices
y_train = y_train + 1
y_test = y_test + 1;

# Rescale pixel values to [0, 1] and convert to Float32
T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5796666666666667
Accuracy epoch 1 is :0.7022
Accuracy epoch 1 is :0.7491833333333333
Accuracy epoch 1 is :0.77645
Accuracy epoch 1 is :0.7931333333333334
Accuracy epoch 1 is :0.8054333333333333
Accuracy epoch 1 is :0.81405
Accuracy epoch 1 is :0.8209166666666666
Accuracy epoch 1 is :0.82665
Accuracy epoch 1 is :0.83145
Accuracy epoch 1 is :0.8356166666666667
Accuracy epoch 1 is :0.83925
Accuracy epoch 1 is :0.84205


BenchmarkTools.Trial: 
  memory estimate:  709.59 MiB
  allocs estimate:  973140
  --------------
  minimum time:     1.173 s (8.96% GC)
  median time:      1.184 s (8.72% GC)
  mean time:        1.195 s (8.47% GC)
  maximum time:     1.249 s (7.47% GC)
  --------------
  samples:          5
  evals/sample:     1

#### MulticlassPerceptron2

- Using views instead of copying examples
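
As a rough illustration of this change (a sketch in Julia 1.x syntax, not the code from `MulticlassPerceptron2`): indexing a column with `X[:, i]` copies it, while `view(X, :, i)` refers to the parent array. The array `X` and the helpers `sum_copy` and `sum_view` are hypothetical names, and storing one example per column is an assumption.

```julia
# Hypothetical data: 1000 examples with 784 features, one example per column.
X = rand(Float32, 784, 1000)

# Slicing allocates a fresh 784-element vector before summing it.
sum_copy(X, i) = sum(X[:, i])

# A view reads the column directly from the parent array.
sum_view(X, i) = sum(view(X, :, i))

# Warm up both functions so compilation is not counted below.
sum_copy(X, 1); sum_view(X, 1)

bytes_copy = @allocated sum_copy(X, 1)
bytes_view = @allocated sum_view(X, 1)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

Inside a training loop that visits every example, the per-example copy adds up, which is consistent with the lower memory estimate reported for this version.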

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5883666666666667
Accuracy epoch 1 is :0.7023833333333334
Accuracy epoch 1 is :0.74805
Accuracy epoch 1 is :0.77335
Accuracy epoch 1 is :0.7912833333333333
Accuracy epoch 1 is :0.80395
Accuracy epoch 1 is :0.8132
Accuracy epoch 1 is :0.8216333333333333
Accuracy epoch 1 is :0.8285
Accuracy epoch 1 is :0.8330833333333333
Accuracy epoch 1 is :0.8373333333333334
Accuracy epoch 1 is :0.8410833333333333
Accuracy epoch 1 is :0.8439166666666666
Accuracy epoch 1 is :0.8469833333333333
Accuracy epoch 1 is :0.84905
Accuracy epoch 1 is :0.8511333333333333
Accuracy epoch 1 is :0.8530333333333333
Accuracy epoch 1 is :0.8551333333333333
Accuracy epoch 1 is :0.8563666666666667
Accuracy epoch 1 is :0.8579166666666667
Accuracy epoch 1 is :0.8590666666666666
Accuracy epoch 1 is :0.8601833333333333
Accuracy epoch 1 is :0.8615166666666667
Accuracy epoch 1 is :0.8625
Accuracy epoch 1 is :0.8636666666666667
Accuracy epoch 1 is :0.8646166666666667
Accuracy epoch 1 is :0.8657666666666667


BenchmarkTools.Trial: 
  memory estimate:  192.18 MiB
  allocs estimate:  744946
  --------------
  minimum time:     387.767 ms (6.67% GC)
  median time:      427.512 ms (6.42% GC)
  mean time:        422.479 ms (6.46% GC)
  maximum time:     438.480 ms (6.39% GC)
  --------------
  samples:          12
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds`
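
`@inbounds` skips array bounds checks inside the annotated block, so it is only safe when the indices are known to be valid. A minimal sketch, assuming Julia 1.x syntax; `dot_inbounds` is a hypothetical helper and is not taken from `MulticlassPerceptron3`:

```julia
# Dot product of a weight row with one example, without bounds checks.
function dot_inbounds(w::AbstractVector{T}, x::AbstractVector{T}) where {T}
    # Check the lengths once up front, so skipping the per-element checks is safe.
    length(w) == length(x) || throw(DimensionMismatch("lengths differ"))
    acc = zero(T)
    @inbounds for j in eachindex(w, x)
        acc += w[j] * x[j]
    end
    return acc
end

result = dot_inbounds(Float32[1, 2, 3], Float32[4, 5, 6])
println(result)
```

The explicit length check keeps the function safe to call, while the loop body itself runs without bounds checks.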


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6104166666666667
Accuracy epoch 1 is :0.7093333333333334
Accuracy epoch 1 is :0.7542833333333333
Accuracy epoch 1 is :0.7803333333333333
Accuracy epoch 1 is :0.7979666666666667
Accuracy epoch 1 is :0.8096833333333333
Accuracy epoch 1 is :0.8182333333333334
Accuracy epoch 1 is :0.8257166666666667
Accuracy epoch 1 is :0.8314166666666667
Accuracy epoch 1 is :0.8358833333333333
Accuracy epoch 1 is :0.84005
Accuracy epoch 1 is :0.8439666666666666
Accuracy epoch 1 is :0.84705
Accuracy epoch 1 is :0.84905
Accuracy epoch 1 is :0.8513833333333334
Accuracy epoch 1 is :0.85335
Accuracy epoch 1 is :0.8556166666666667
Accuracy epoch 1 is :0.8572833333333333
Accuracy epoch 1 is :0.8587
Accuracy epoch 1 is :0.8601333333333333
Accuracy epoch 1 is :0.8611333333333333
Accuracy epoch 1 is :0.8625666666666667
Accuracy epoch 1 is :0.86375
Accuracy epoch 1 is :0.8647833333333333
Accuracy epoch 1 is :0.8656
Accuracy epoch 1 is :0.8664666666666667
Accuracy epoch 1 is :0.86725
Accuracy e

BenchmarkTools.Trial: 
  memory estimate:  174.01 MiB
  allocs estimate:  604995
  --------------
  minimum time:     391.165 ms (6.12% GC)
  median time:      416.557 ms (6.37% GC)
  mean time:        414.780 ms (6.50% GC)
  maximum time:     444.447 ms (5.51% GC)
  --------------
  samples:          13
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all datapoints
- Using `.*` syntax for loop fusion
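
The last two points can be sketched as follows. This is an illustration under stated assumptions, not the code from `MulticlassPerceptron4`: the names `update_alloc` and `update!` are hypothetical, and the sketch assumes a Julia version newer than the 0.5.0 used in this notebook, since dotted operators such as `.*` take part in loop fusion only from Julia 0.6 onward.

```julia
# Allocating version: `lr * x` and the sum each create a temporary vector.
update_alloc(w, x, lr) = w + lr * x

# In-place version: the dotted expression fuses into a single loop
# that writes its result directly into `w`.
function update!(w, x, lr)
    w .+= lr .* x
    return w
end

x  = ones(Float32, 784)
lr = 0.5f0
w  = zeros(Float32, 784)   # preallocated once, reused across updates

update!(w, x, lr)          # mutates w in place
w_ref = update_alloc(zeros(Float32, 784), x, lr)
```

Both versions compute the same values; the difference is that `update!` reuses the existing buffer instead of allocating new vectors on every call.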

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.60905
Accuracy epoch 1 is :0.71725
Accuracy epoch 1 is :0.7590666666666667
Accuracy epoch 1 is :0.78205
Accuracy epoch 1 is :0.7967333333333333
Accuracy epoch 1 is :0.8072666666666667
Accuracy epoch 1 is :0.81635
Accuracy epoch 1 is :0.8227833333333333
Accuracy epoch 1 is :0.8286833333333333
Accuracy epoch 1 is :0.83315
Accuracy epoch 1 is :0.8372333333333334
Accuracy epoch 1 is :0.8404666666666667
Accuracy epoch 1 is :0.8436833333333333
Accuracy epoch 1 is :0.8462
Accuracy epoch 1 is :0.8488
Accuracy epoch 1 is :0.85095
Accuracy epoch 1 is :0.8525833333333334
Accuracy epoch 1 is :0.8540333333333333
Accuracy epoch 1 is :0.8558
Accuracy epoch 1 is :0.85725
Accuracy epoch 1 is :0.8587333333333333
Accuracy epoch 1 is :0.85995
Accuracy epoch 1 is :0.8608166666666667
Accuracy epoch 1 is :0.8619333333333333
Accuracy epoch 1 is :0.86305
Accuracy epoch 1 is :0.86425
Accuracy epoch 1 is :0.8652166666666666
Accuracy epoch 1 is :0.8659666666666667


BenchmarkTools.Trial: 
  memory estimate:  178.69 MiB
  allocs estimate:  609384
  --------------
  minimum time:     382.655 ms (8.37% GC)
  median time:      421.207 ms (6.73% GC)
  mean time:        462.182 ms (6.11% GC)
  maximum time:     576.767 ms (4.71% GC)
  --------------
  samples:          11
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the memory estimate down to 0?**

**Are we really using BLAS to its full potential?**
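
One possible direction for both questions, sketched in Julia 1.x syntax: compute the class scores with the in-place `mul!` from the `LinearAlgebra` standard library, writing into buffers allocated once. For `Float32` arrays this typically dispatches to BLAS. The names `W`, `X`, `scores`, and `scores_all` are hypothetical, and the one-example-per-column layout is an assumption, not something taken from the perceptron modules.

```julia
using LinearAlgebra   # provides mul!

n_classes, n_features = 10, 784
W = rand(Float32, n_classes, n_features)   # hypothetical weight matrix
X = rand(Float32, n_features, 100)         # hypothetical batch, one example per column

# Buffers allocated once, outside any training loop.
scores     = Vector{Float32}(undef, n_classes)
scores_all = Matrix{Float32}(undef, n_classes, size(X, 2))

# Scores for a single example: in-place matrix-vector product.
mul!(scores, W, view(X, :, 1))
y_hat = argmax(scores)

# Scores for the whole batch: one in-place matrix-matrix product.
mul!(scores_all, W, X)
```

Whether this brings the memory estimate all the way to 0 depends on the rest of the training loop, so it would need to be checked with `@benchmark`.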
