## Benchmarking Perceptron


#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/
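As a minimal sketch of what profiling looks like in practice, the snippet below uses the `Profile` standard library. It assumes Julia 1.x (the notebook itself was run on 0.5), and `sum_of_squares` is a hypothetical workload that only gives the sampler something to measure.

```julia
using Profile

# Hypothetical workload, present only so the profiler has something to sample.
function sum_of_squares(n)
    total = 0.0
    for i in 1:n
        total += Float64(i)^2
    end
    return total
end

sum_of_squares(10)             # run once so compilation is not profiled
Profile.clear()                # discard samples from earlier runs
@profile sum_of_squares(10^7)  # collect stack samples while the call runs
Profile.print(maxdepth = 8)    # text report of where the samples landed
```

The report lists the lines that collected the most samples, which is usually where optimization effort pays off first.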

#### Examples of speeding up code

A small number of "tricks" can reduce execution time and memory allocations. Applying them is essential for getting C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666

In [1]:
workspace()
versioninfo()

Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i7-4650U CPU @ 1.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


In [2]:
using MNIST
using BenchmarkTools



In [3]:

# Add the sibling `source/` directory to LOAD_PATH so the perceptron modules can be loaded.
source_path = joinpath(dirname(pwd()), "source")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron4
using MulticlassPerceptron3
using MulticlassPerceptron2
using MulticlassPerceptron1

percep1 = MulticlassPerceptron1.MPerceptron(Float32, 10, 784)
percep2 = MulticlassPerceptron2.MPerceptron(Float32, 10, 784)
percep3 = MulticlassPerceptron3.MPerceptron(Float32, 10, 784)
percep4 = MulticlassPerceptron4.MPerceptron(Float32, 10, 784)

n_classes = 10
n_features = 784

784

In [4]:
X_train, y_train = MNIST.traindata();
X_test, y_test = MNIST.testdata();
y_train = y_train + 1
y_test = y_test + 1;

T = Float32
X_train = Array{T}((X_train - minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
X_test = Array{T}((X_test - minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);
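The scaling in the cell above is min-max normalization, x' = (x - min) / (max - min), applied over the whole array. A minimal sketch of the same step as a reusable function follows. It assumes Julia 1.x syntax, `minmax_scale` is a hypothetical name that is not part of the notebook's modules, and the input is assumed to contain at least two distinct values.

```julia
# Hypothetical helper: rescale every entry of `X` to [0, 1] and return
# the result with element type `T`.
# Assumes maximum(X) > minimum(X); otherwise the division yields NaN or Inf.
function minmax_scale(::Type{T}, X::AbstractArray) where {T<:AbstractFloat}
    lo, hi = extrema(X)                # single pass for both bounds
    return T.((X .- lo) ./ (hi - lo))  # fused broadcast, then convert to T
end

# Toy input standing in for raw pixel intensities.
raw = [0.0 127.5; 255.0 51.0]
scaled = minmax_scale(Float32, raw)
println(scaled)
```

Using one function for both the training and test arrays avoids the two normalization lines drifting apart.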

In [5]:
@benchmark MulticlassPerceptron1.fit!(percep1, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5894666666666667
Accuracy epoch 1 is :0.7051833333333334
Accuracy epoch 1 is :0.7507666666666667
Accuracy epoch 1 is :0.7770666666666667
Accuracy epoch 1 is :0.79365
Accuracy epoch 1 is :0.8064166666666667
Accuracy epoch 1 is :0.81505
Accuracy epoch 1 is :0.8219666666666666
Accuracy epoch 1 is :0.8288666666666666
Accuracy epoch 1 is :0.8337333333333333
Accuracy epoch 1 is :0.8371
Accuracy epoch 1 is :0.8411666666666666
Accuracy epoch 1 is :0.8443833333333334
Accuracy epoch 1 is :0.8470166666666666


BenchmarkTools.Trial: 
  memory estimate:  700.82 MiB
  allocs estimate:  969112
  --------------
  minimum time:     860.834 ms (10.09% GC)
  median time:      875.897 ms (10.42% GC)
  mean time:        876.136 ms (10.42% GC)
  maximum time:     896.919 ms (11.08% GC)
  --------------
  samples:          6
  evals/sample:     1

#### MulticlassPerceptron2

- Using views instead of copying examples
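To illustrate why views help, the sketch below compares scoring one example through a copied slice with scoring it through a view. It assumes Julia 1.x; the data and the `score_copy` / `score_view` functions are hypothetical and are not taken from `MulticlassPerceptron2`.

```julia
using LinearAlgebra

# Hypothetical data laid out like the notebook's: one example per column.
X = rand(Float32, 784, 1000)
w = rand(Float32, 784)

# `X[:, i]` copies the column into a freshly allocated vector on every call.
score_copy(w, X, i) = dot(w, X[:, i])

# `view(X, :, i)` refers to the same memory as `X`, so no column is copied.
score_view(w, X, i) = dot(w, view(X, :, i))

# Warm up both functions so compilation is not counted as allocation.
score_copy(w, X, 1)
score_view(w, X, 1)

bytes_copy = @allocated score_copy(w, X, 1)
bytes_view = @allocated score_view(w, X, 1)
println("copy: ", bytes_copy, " bytes, view: ", bytes_view, " bytes")
```

Repeated over every training example, the per-call copy is the kind of allocation that a drop in the memory estimate between the two versions would reflect.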

In [6]:
@benchmark MulticlassPerceptron2.fit!(percep2, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.5932333333333333
Accuracy epoch 1 is :0.70485
Accuracy epoch 1 is :0.75085
Accuracy epoch 1 is :0.7765
Accuracy epoch 1 is :0.7934
Accuracy epoch 1 is :0.8061333333333334
Accuracy epoch 1 is :0.8143333333333334
Accuracy epoch 1 is :0.8213166666666667
Accuracy epoch 1 is :0.8264166666666667
Accuracy epoch 1 is :0.8309833333333333
Accuracy epoch 1 is :0.8352666666666667
Accuracy epoch 1 is :0.8387166666666667
Accuracy epoch 1 is :0.84155
Accuracy epoch 1 is :0.8442666666666667
Accuracy epoch 1 is :0.8470166666666666
Accuracy epoch 1 is :0.8488833333333333
Accuracy epoch 1 is :0.8509166666666667
Accuracy epoch 1 is :0.8524666666666667
Accuracy epoch 1 is :0.85395
Accuracy epoch 1 is :0.8556166666666667
Accuracy epoch 1 is :0.8570666666666666
Accuracy epoch 1 is :0.8582833333333333
Accuracy epoch 1 is :0.8593166666666666
Accuracy epoch 1 is :0.8604333333333334
Accuracy epoch 1 is :0.8614
Accuracy epoch 1 is :0.8624833333333334
Accuracy epoch 1 is :0.8633
[output truncated]

BenchmarkTools.Trial: 
  memory estimate:  181.04 MiB
  allocs estimate:  720092
  --------------
  minimum time:     209.718 ms (10.96% GC)
  median time:      227.584 ms (11.73% GC)
  mean time:        226.012 ms (11.55% GC)
  maximum time:     240.665 ms (9.36% GC)
  --------------
  samples:          23
  evals/sample:     1

#### MulticlassPerceptron3

- Using views instead of copying examples
- Using `@inbounds` to skip bounds checks
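A minimal sketch of the bounds-check trick, assuming Julia 1.x: the lengths are validated once up front, after which `@inbounds` lets the compiler omit the per-element checks inside the loop. `dot_inbounds` is a hypothetical function, not the code used in `MulticlassPerceptron3`.

```julia
# Hypothetical dot product with bounds checks disabled inside the loop.
function dot_inbounds(w::AbstractVector{T}, x::AbstractVector{T}) where {T}
    # Validate once, so skipping the checks below is safe.
    length(w) == length(x) || throw(DimensionMismatch("w and x must have equal length"))
    acc = zero(T)
    @inbounds for j in eachindex(w, x)
        acc += w[j] * x[j]
    end
    return acc
end

println(dot_inbounds([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

`@inbounds` is only safe when the indices are known to be valid; on an out-of-range index it can read or write arbitrary memory instead of raising an error.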


In [7]:
@benchmark MulticlassPerceptron3.fit!(percep3, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.59695
Accuracy epoch 1 is :0.7084666666666667
Accuracy epoch 1 is :0.7534
Accuracy epoch 1 is :0.77915
Accuracy epoch 1 is :0.7957833333333333
Accuracy epoch 1 is :0.8073666666666667
Accuracy epoch 1 is :0.8166166666666667
Accuracy epoch 1 is :0.8236166666666667
Accuracy epoch 1 is :0.82945
Accuracy epoch 1 is :0.8336166666666667
Accuracy epoch 1 is :0.83795
Accuracy epoch 1 is :0.84165
Accuracy epoch 1 is :0.8445833333333334
Accuracy epoch 1 is :0.8474833333333334
Accuracy epoch 1 is :0.8503166666666667
Accuracy epoch 1 is :0.8523833333333334
Accuracy epoch 1 is :0.8544833333333334
Accuracy epoch 1 is :0.8561
Accuracy epoch 1 is :0.8578666666666667
Accuracy epoch 1 is :0.8595
Accuracy epoch 1 is :0.8606666666666667
Accuracy epoch 1 is :0.8614166666666667
Accuracy epoch 1 is :0.8623833333333333
Accuracy epoch 1 is :0.8632
Accuracy epoch 1 is :0.8646666666666667
Accuracy epoch 1 is :0.8657166666666667
Accuracy epoch 1 is :0.8668
Accuracy epoch 1 is :0.86795
[output truncated]

BenchmarkTools.Trial: 
  memory estimate:  165.55 MiB
  allocs estimate:  597863
  --------------
  minimum time:     211.919 ms (12.93% GC)
  median time:      220.678 ms (12.17% GC)
  mean time:        222.796 ms (11.57% GC)
  maximum time:     236.384 ms (9.65% GC)
  --------------
  samples:          23
  evals/sample:     1

#### MulticlassPerceptron4

- Using views instead of copying examples
- Preallocated vector for predicting all datapoints
- Using `.*` syntax for loop fusion
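The two newer ideas in this list can be sketched as follows, assuming Julia 1.x. Dotted operators such as `.*` fuse into a single loop only from Julia 0.6 onward, so this sketch does not describe how the code behaves on 0.5. `predict!` and `update!` are hypothetical names and do not reproduce the internals of `MulticlassPerceptron4`.

```julia
using LinearAlgebra

# Fill a caller-supplied vector with the highest-scoring class for each
# column of X, instead of allocating a new prediction vector per call.
# Assumes length(y_hat) == size(X, 2) and one weight column per class.
function predict!(y_hat, W, X)
    for i in eachindex(y_hat)
        x = view(X, :, i)
        best_class, best_score = 1, dot(view(W, :, 1), x)
        for k in 2:size(W, 2)
            s = dot(view(W, :, k), x)
            if s > best_score
                best_class, best_score = k, s
            end
        end
        y_hat[i] = best_class
    end
    return y_hat
end

# Perceptron-style update written with fused broadcasts: each line runs as
# one loop and writes into W without creating temporary arrays.
function update!(W, x, y_true, y_pred, lr)
    w_true = view(W, :, y_true)
    w_pred = view(W, :, y_pred)
    w_true .= w_true .+ lr .* x
    w_pred .= w_pred .- lr .* x
    return W
end

# Hypothetical toy problem: 3 features, 2 classes, 2 examples.
W = zeros(Float32, 3, 2)
X = Float32[1 -1; 1 -1; 1 -1]
y_hat = zeros(Int, 2)            # allocated once, reused on every call
update!(W, view(X, :, 1), 1, 2, 0.5f0)
predict!(y_hat, W, X)
println(y_hat)
```

Preallocating `y_hat` matters most when accuracy is evaluated repeatedly during training, since the same buffer is reused on every pass.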

In [8]:
@benchmark MulticlassPerceptron4.fit!(percep4, X_train, y_train, 1, 0.0001)

Accuracy epoch 1 is :0.6235833333333334
Accuracy epoch 1 is :0.7199333333333333
Accuracy epoch 1 is :0.7609666666666667
Accuracy epoch 1 is :0.7843833333333333
Accuracy epoch 1 is :0.7973833333333333
Accuracy epoch 1 is :0.8086666666666666
Accuracy epoch 1 is :0.8170333333333333
Accuracy epoch 1 is :0.8238
Accuracy epoch 1 is :0.8295333333333333
Accuracy epoch 1 is :0.8341666666666666
Accuracy epoch 1 is :0.8380666666666666
Accuracy epoch 1 is :0.8412166666666666
Accuracy epoch 1 is :0.8442166666666666
Accuracy epoch 1 is :0.8469
Accuracy epoch 1 is :0.8497166666666667
Accuracy epoch 1 is :0.8519166666666667
Accuracy epoch 1 is :0.8538333333333333
Accuracy epoch 1 is :0.8555333333333334
Accuracy epoch 1 is :0.8568166666666667
Accuracy epoch 1 is :0.8581666666666666
Accuracy epoch 1 is :0.8594166666666667
Accuracy epoch 1 is :0.86095
Accuracy epoch 1 is :0.8619166666666667
Accuracy epoch 1 is :0.8630833333333333
Accuracy epoch 1 is :0.8638333333333333
Accuracy epoch 1 is :0.8646
[output truncated]

BenchmarkTools.Trial: 
  memory estimate:  166.40 MiB
  allocs estimate:  613176
  --------------
  minimum time:     206.422 ms (13.18% GC)
  median time:      220.692 ms (12.33% GC)
  mean time:        221.952 ms (12.18% GC)
  maximum time:     233.333 ms (9.98% GC)
  --------------
  samples:          23
  evals/sample:     1

#### MulticlassPerceptron5

**What else can be improved?**

`memory estimate:  79.56 MiB`

**Can we push the memory estimate to 0?**

**Are we really using BLAS to its full potential?**
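One direction worth exploring for both questions, sketched under the assumption of Julia 1.x: score a whole batch with a single in-place matrix product via `LinearAlgebra.mul!`, which dispatches to BLAS for dense `Float32` arrays and writes into a preallocated buffer. The shapes mirror the notebook (784 features, 10 classes), but the arrays and `score_all!` are hypothetical.

```julia
using LinearAlgebra

W = rand(Float32, 784, 10)                 # hypothetical weights, one column per class
X = rand(Float32, 784, 1000)               # hypothetical batch, one example per column
scores = Matrix{Float32}(undef, 10, 1000)  # preallocated output buffer

# In-place product: scores = W' * X, computed without allocating a result matrix.
score_all!(scores, W, X) = mul!(scores, W', X)

score_all!(scores, W, X)                   # warm up so compilation is not measured
bytes = @allocated score_all!(scores, W, X)
println("bytes allocated by in-place scoring: ", bytes)
```

Whether this beats per-example updates depends on the training scheme, since an online perceptron changes `W` after each mistake. The batched product fits the accuracy evaluation step most directly.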
