## Benchmarking Perceptron



## MNIST 

In [6]:
using MLDatasets
using BenchmarkTools
#using PyPlot

In [7]:
peakflops()

9.458961641658127e10

In [8]:
versioninfo()

Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)


In [9]:
# Add the sibling "source/" directory to LOAD_PATH so that the package can be found.
source_path = joinpath(dirname(pwd()), "source/")

if !(source_path in LOAD_PATH)
    push!(LOAD_PATH, source_path)
end

using MulticlassPerceptron
percep = MulticlassPerceptron.MPerceptron(Float32, 10,784)
n_features = 784

X_train, y_train = MLDatasets.MNIST.traindata();
X_test, y_test = MLDatasets.MNIST.testdata();
X_train = reshape(X_train, 784, 60000);
X_test = reshape(X_test, 784, 10000);

# MNIST labels are 0-9; shift them to 1-10 so they can be used as class indices.
y_train = y_train .+ 1
y_test = y_test .+ 1;

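T = Float32
# Min-max scale the pixel values to [0, 1], then convert to element type T.
X_train = Array{T}((X_train .- minimum(X_train))/(maximum(X_train) - minimum(X_train)))
y_train = Array{Int64}(y_train)
# Scale inside the conversion, as for X_train, so that X_test also ends up with eltype T.
X_test = Array{T}((X_test .- minimum(X_test))/(maximum(X_test) - minimum(X_test)))
y_test = Array{Int64}(y_test);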

In [10]:
?MulticlassPerceptron.fit!

> fit!(h::Perceptron,         X::Array,         y::Array;         n_epochs=50,         learning_rate=0.1,         print_flag=false,         compute_accuracy=true,         seed=srand(1234),         pocket=false,         shuffle_data=false)


##### Arguments

  * **`h`**, (MPerceptron{T} type), initialized perceptron.
  * **`X`**, (Array{T,2} type), data contained in the columns of X.
  * **`y`**, (Vector{T} type), class labels (as integers from 1 to n_classes).

##### Keyword arguments

  * **`n_epochs`**, (Integer type), number of passes (epochs) through the data.
  * **`learning_rate`**, (Float type), learning rate (the standard perceptron uses learning_rate=1).
  * **`compute_accuracy`**, (Bool type), if `true` the accuracy is computed at the end of every epoch.
  * **`print_flag`**, (Bool type), if `true` the accuracy is printed at the end of every epoch.
  * **`seed`**, (MersenneTwister type), seed for the permutation of the datapoints in case the data is shuffled.
  * **`pocket`**, (Bool type), if `true` the best weights are saved (in the pocket) during learning.
  * **`shuffle_data`**, (Bool type), if `true` the data is shuffled at every epoch (in reality only the indices are shuffled, for performance).
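
To make the keyword arguments above more concrete, here is a minimal sketch of a multiclass perceptron pass. It is not the code behind `MulticlassPerceptron.fit!`: the helper `perceptron_epoch!`, the toy data and the early stop are assumptions made for illustration, and the snippet targets Julia 1.x (the notebook itself was run on 0.6). It shows the mistake-driven update scaled by `learning_rate`, and shuffling the sample indices rather than the data.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not the package's fit!): one pass over the samples in `order`.
# W is n_features × n_classes, b has n_classes entries, X holds one sample per column.
function perceptron_epoch!(W, b, X, y, order; learning_rate=1.0)
    mistakes = 0
    for m in order
        x = view(X, :, m)                  # column m without copying it
        y_hat = argmax(W' * x .+ b)        # predicted class (1-based)
        if y_hat != y[m]
            # Pull the true class towards x and push the predicted class away.
            W[:, y[m]]  .+= learning_rate .* x
            W[:, y_hat] .-= learning_rate .* x
            b[y[m]]  += learning_rate
            b[y_hat] -= learning_rate
            mistakes += 1
        end
    end
    return mistakes
end

# Toy, linearly separable data: 2 features × 4 samples, labels in 1:2.
X = [0.0 0.1 1.0 0.9;
     1.0 0.9 0.0 0.1]
y = [1, 1, 2, 2]
W = zeros(2, 2)
b = zeros(2)

rng = MersenneTwister(1234)
order = collect(1:size(X, 2))
for epoch in 1:50
    shuffle!(rng, order)       # permute the indices only; X is never reordered
    perceptron_epoch!(W, b, X, y, order) == 0 && break   # stop after a clean pass
end
```

A pocket variant would additionally keep a copy of the weights from the epoch with the best accuracy; that part is left out here.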


### Testing the perceptron on MNIST

  2.430426 seconds (2.37 M allocations: 512.297 MiB, 13.22% gc time)


In [11]:
percep = MulticlassPerceptron.MPerceptron(Float32, 10, 784)

Perceptron{Float32}(n_classes=10, n_features=784)

In [12]:
@time fit!(percep, X_train, y_train; n_epochs=10, print_flag=true)

  7.501581 seconds (8.84 M allocations: 696.995 MiB, 3.55% gc time)


In [13]:
@time fit!(percep, X_train, y_train; n_epochs=1, print_flag=true)

  0.561922 seconds (663.24 k allocations: 56.801 MiB, 1.26% gc time)


In [14]:
@time fit2!(percep, X_train, y_train; n_epochs=10, print_flag=true)

670.266875 seconds (3.98 G allocations: 59.326 GiB, 0.49% gc time)


In [None]:
percep.W[2,3]

In [None]:

percep.W[1,2] *= 23

In [None]:
y_test_hat = [ predict(percep,view(X_test,:,m)) for m in 1:size(X_test,2) ];

In [None]:
mean(y_test_hat .== y_test)

## Averaged Perceptron vs standard perceptron

- ERROR: the averaged and the standard perceptron seem to return exactly the same weights.
- NOTE: given the same seed, both should report the same per-epoch accuracy, because the weights used during learning are identical. Once learning has finished, however, the averaged perceptron should hold different weights, since its final weights are replaced by the average of the weights seen during learning.
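
The expected behaviour can be checked on a toy problem. The sketch below is an illustration under assumptions, not the package's implementation: `fit_with_average` is a hypothetical function, it omits the bias term, and it targets Julia 1.x. Both weight matrices come from the same run, so every prediction made during learning is shared, yet the averaged weights differ from the last ones.

```julia
using LinearAlgebra

# Hypothetical sketch: train a multiclass perceptron (no bias) and return both
# the last weights and the average of the weights held after every sample.
function fit_with_average(X, y, n_classes; n_epochs=5)
    n_features, n_samples = size(X)
    W = zeros(n_features, n_classes)       # weights used for every prediction
    W_sum = zeros(n_features, n_classes)   # running sum of W over all steps
    for epoch in 1:n_epochs, m in 1:n_samples
        x = view(X, :, m)
        y_hat = argmax(W' * x)
        if y_hat != y[m]
            W[:, y[m]]  .+= x
            W[:, y_hat] .-= x
        end
        W_sum .+= W
    end
    return W, W_sum ./ (n_epochs * n_samples)
end

X = [0.0 0.1 1.0 0.9;
     1.0 0.9 0.0 0.1]
y = [1, 1, 2, 2]

W_last, W_avg = fit_with_average(X, y, 2)
W_last == W_avg    # false on this data: averaging changes the returned weights
```

If a comparison like `W_last == W_avg` comes out `true` on real data, the averaging step is probably not being applied when the final weights are stored.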

In [None]:
n_samples = size(X_train,2)
@time Array(1:n_samples);

In [None]:
fieldnames(percep)

In [None]:
percep = MulticlassPerceptron.MPerceptron(Float32, 10,784)
fit!(percep, X_train, y_train;
     n_epochs=5, print_flag=true)

In [None]:
percep.W[1:5]

In [None]:
av_percep = MulticlassPerceptron.MPerceptron(Float32, 10,784)
fit!(av_percep, X_train, y_train;
     n_epochs=5, average_weights=true, print_flag=true)

In [None]:
av_percep.W[1:5]

In [None]:
percep.accuracy[1:5] 

In [None]:
av_percep.accuracy[1:5]

In [None]:
av_percep.W[1:4] 

In [None]:
percep.W[1:4] 

### Shuffle data at every epoch

In [None]:
percep = MulticlassPerceptron.MPerceptron(Float32, 10,784)

In [None]:
fit!(percep, X_train, y_train; shuffle_data=true)

In [None]:
percep.accuracy

# Improving the code



#### About profiling Julia code

- https://thirld.com/blog/2015/05/30/julia-profiling-cheat-sheet/

#### Examples of speeding up code

A small number of "tricks" can be applied to cut execution time and avoid memory allocations. Applying them is essential to get C-like speed from Julia code.

- https://discourse.julialang.org/t/speed-up-this-code-game/3666
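
One such trick is reusing a preallocated buffer instead of creating a new array on every call. The sketch below targets Julia 1.x and uses random stand-ins for the perceptron's weights. `scores_alloc` and `scores!` are hypothetical names, not functions from `MulticlassPerceptron`.

```julia
using LinearAlgebra

W = rand(Float32, 784, 10)   # stand-in for the weight matrix
b = zeros(Float32, 10)       # stand-in for the bias vector
x = rand(Float32, 784)       # stand-in for one sample

# Allocating version: the product and the broadcast each create a new vector.
scores_alloc(W, b, x) = W' * x .+ b

# In-place version: the result is written into a buffer supplied by the caller.
function scores!(out, W, b, x)
    mul!(out, W', x)     # out = W' * x, stored in `out`
    out .+= b            # add the bias in place
    return out
end

out = similar(b)
scores!(out, W, b, x)    # run once so that compilation is not measured
scores_alloc(W, b, x)

bytes_inplace = @allocated scores!(out, W, b, x)
bytes_alloc   = @allocated scores_alloc(W, b, x)
```

Comparing `bytes_inplace` with `bytes_alloc` gives a quick indication of how much memory the buffer saves per call. In a training loop over 60000 samples that saving is paid back on every iteration.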

## Allowing the perceptron to use sparse matrices

In [None]:
h = MulticlassPerceptron.MPerceptron(Float32, 10,784)

In [None]:
Xsp = sparse(zeros(100,1000))

In [None]:
X_tr_sp = sparse(X_train);

In [None]:
@time MulticlassPerceptron.predict(h, X_tr_sp[:,1])

In [None]:
@time MulticlassPerceptron.predict(h, X_train[:,1])

#### Why is sparse multiplication slower?

In [None]:
x = deepcopy(X_tr_sp[:,1]);

In [None]:
@time indmax(h.W' * x .+ h.b)

In [None]:
hW = sparse(rand(T, 784, 10));
hb = sparse(zeros(T,10))

In [None]:
typeof(hb), typeof(hW)

In [None]:
eltype(hW), eltype(hb), eltype(x)

In [None]:
@time indmax(hW' * x .+ hb)

In [None]:
@time MulticlassPerceptron.predict(h, X_train[:,1])

In [None]:
function testspeedsparse(hW,x,hb)
    for i in 1:100 
        indmax(hW' * x .+ hb)
    end
end

In [None]:
# Why is the sparse version slower??
@time testspeedsparse(hW,x,hb)
@time testspeedsparse(h.W,X_train[:,1],h.b)

In [None]:
# The slowness does not come from the indmax function
# the same happens in this version
function testspeedsparse_(hW, x, hb)
    for i in 1:100 
        hW' * x .+ hb
    end
end

In [None]:
@time testspeedsparse_(hW,x,hb)
@time testspeedsparse_(h.W,X_train[:,1],h.b)
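
One plausible contributor to the slowdown, offered as a hypothesis rather than a measured conclusion: `sparse(rand(T, 784, 10))` stores a matrix in which essentially every entry is nonzero, so the sparse format adds index bookkeeping without skipping any work. The sketch below (Julia 1.x, where sparse arrays live in the `SparseArrays` standard library) checks the fill ratio of such a matrix and confirms that both representations give the same product. The variable names are illustrative.

```julia
using LinearAlgebra, SparseArrays

W_dense = rand(Float32, 784, 10)
W_sp = sparse(W_dense)                        # sparse storage of a dense matrix

fill_W = nnz(W_sp) / length(W_sp)             # fraction of entries that are stored

x_dense = zeros(Float32, 784)
x_dense[1:50] .= 1                            # a vector that really is sparse
x_sp = sparse(x_dense)

fill_x = nnz(x_sp) / length(x_sp)             # 50 stored entries out of 784

# Same result either way; only the storage format (and its overhead) differs.
same = Vector(W_sp' * x_sp) ≈ W_dense' * x_dense
```

Sparse storage tends to pay off only when the fill ratio is low, so keeping the weights dense and only the input sparse may be the fairer comparison.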

In [None]:
# Views do not seem to be worth using when the data is
# a sparse matrix
@time X_tr_sp[:,1];
@time view(X_tr_sp,:,1);

In [None]:
hW' * x .+ hb

In [None]:
@time for i in 1:100000 X_train[:,1] end
@time for i in 1:100000  view(X_train,:,1) end
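
For dense data the difference between the two loops above comes down to allocation: `X[:, j]` copies the column, while `view(X, :, j)` refers to the existing memory. The sketch below (Julia 1.x, with a random matrix standing in for `X_train`) compares allocations directly. `col_copy` and `col_view` are hypothetical helpers.

```julia
X = rand(Float32, 784, 1000)             # stand-in for a dense data matrix

col_copy(X, j) = sum(X[:, j])            # X[:, j] allocates a new 784-element vector
col_view(X, j) = sum(view(X, :, j))      # the view reuses the memory of X

col_copy(X, 1); col_view(X, 1)           # run once so that compilation is not measured

bytes_copy = @allocated col_copy(X, 1)
bytes_view = @allocated col_view(X, 1)
```

Wrapping the access in a function matters here: timing an expression in global scope, as in the loops above, also measures dynamic dispatch on the untyped global.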

## Defining pipeline