Julia has a built-in profiler, but I propose using the ProfileView package on top of it, because it gives a nice visual representation of the time spent in each function.
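For reference, here is a minimal sketch of the profiling workflow using only the standard-library profiler. It assumes Julia 1.x syntax, and `busywork` is a made-up workload rather than anything from this repository. With ProfileView loaded, calling `ProfileView.view()` after the `@profile` line should display the same samples graphically.

```julia
using Profile  # standard-library sampling profiler (Julia 1.x)

# Hypothetical workload, used only to give the profiler something to measure.
function busywork(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i) * rand()
    end
    return s
end

busywork(10)                 # run once so compilation is not profiled
Profile.clear()              # drop samples left over from earlier runs
@profile busywork(10^7)      # collect stack samples while the call runs
Profile.print(maxdepth = 8)  # plain-text report of where the samples landed
```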

In [1]:
using Laplacians

In [2]:
PROFILEVIEW_USEGTK = true
using ProfileView

In [3]:
M = [100,200]

2-element Array{Int64,1}:
 100
 200

In [32]:
@time for i = 1:1e7
    t = rand(1:M[1])
end

  2.503612 seconds (10.00 M allocations: 305.176 MB, 3.56% gc time)


In [33]:
@time for i = 1:1e7
    t = rand(1:100)
end

  0.843543 seconds


In [36]:
@time for i = 1:1e7
    t = rand()*ceil(Int64,100)
end

  0.062625 seconds


In [35]:
2.503612/0.843543

2.967971994314457

In [37]:
0.843543/0.062625

13.469748502994012

In [31]:
include("../src/fastSampler.jl")

epsequal (generic function with 1 method)

In [32]:
p = ones(100); p'

1x100 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  …  1.0  1.0  1.0  1.0  1.0  1.0  1.0

In [33]:
Sp = sampler(p)

Sampler{Float64,Int32}([1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0  …  1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0],Int32[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],Int32[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],100)

In [34]:
@time for i = 1:1e7
    sample(Sp)
end

  2.949322 seconds (30.04 M allocations: 611.924 MB, 3.55% gc time)


In [6]:
include("../src/newFastSampler.jl")

epsequal (generic function with 1 method)

In [7]:
nSp = newSampler(p)

NewSampler{Float64,Int64}([1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0  …  1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],100)

In [10]:
@time for i = 1:1e7
    newSample(nSp)
end

  0.661902 seconds


In [11]:
2.86756/0.661902

4.332302969321742

This is the speed-up from making sure n is typed!
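To illustrate what "typed" means here, the sketch below compares a struct whose field has no annotation with one whose field type is a parameter. The struct and function names are hypothetical and are not the definitions in newFastSampler.jl; the code assumes Julia 1.x syntax.

```julia
# Hypothetical structs for illustration only.
struct LooseCounter
    n            # no annotation: the field is stored as Any
end

struct TypedCounter{Ti<:Integer}
    n::Ti        # concrete once Ti is known, e.g. Int64
end

# Same loop for both; only the type information available to the compiler differs.
function drawmany(c, k)
    acc = 0
    for _ in 1:k
        acc += rand(1:c.n)
    end
    return acc
end

loose = LooseCounter(100)
typed = TypedCounter(100)
drawmany(loose, 10); drawmany(typed, 10)   # compile before timing
@time drawmany(loose, 10^6)
@time drawmany(typed, 10^6)
```

With the untyped field, the compiler cannot know the type of `c.n` inside the loop, so each `rand(1:c.n)` call has to be resolved at run time.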

In [3]:
nSp = newSampler(p)

NewSampler{Float64,Int64}([1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0  …  1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],100)

In [6]:
@time for i = 1:1e7
    newSample(nSp)
end

  0.352735 seconds


In [7]:
0.661902/0.352735

1.876485180092704

This is the speed-up from switching to ceil(Ti,rand()*n).
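The two ways of drawing an index can be sketched side by side as follows. The function names are illustrative and the code assumes Julia 1.x syntax. One caveat: `rand()` is uniform on [0, 1), so `ceil(Ti, rand()*n)` is 0 in the rare case that `rand()` returns exactly 0.0; the sketch guards against that with `max`, which the original code may or may not do.

```julia
# Draw an index in 1:n through the range-sampling method.
draw_range(n::Integer) = rand(1:n)

# Draw an index in 1:n by scaling a uniform float and rounding up.
# max(..., 1) covers the case rand() == 0.0, where ceil would give 0.
draw_ceil(n::Ti) where {Ti<:Integer} = max(ceil(Ti, rand() * n), one(Ti))

# Sum k draws so the loop has a result the compiler cannot discard.
function count_draws(draw, n, k)
    acc = 0
    for _ in 1:k
        acc += draw(n)
    end
    return acc
end

count_draws(draw_range, 100, 10); count_draws(draw_ceil, 100, 10)  # compile first
@time count_draws(draw_range, 100, 10^7)
@time count_draws(draw_ceil, 100, 10^7)
```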

Total speed-up from type and ceil:

In [8]:
2.86756/0.352735

8.129502317603867

Time compared to a single sample without look-ups:

In [11]:
0.352735/0.062625

5.63249500998004

In [13]:
nSp = newSampler(p)

NewSampler{Float64,Int64}([1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0  …  1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],[1,2,3,4,5,6,7,8,9,10  …  91,92,93,94,95,96,97,98,99,100],100)

In [28]:
@time t = newSampleMany(nSp,round(Int64,1e7));

  0.159039 seconds (7 allocations: 76.294 MB, 4.81% gc time)


Speed-up from sampling many at once:

In [22]:
0.352735/0.158651

2.2233392792985867
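The many-at-once pattern can be sketched as below: allocate the output once and run the whole loop inside a single function call. `ToySampler`, `toysample`, and `toysamplemany` are hypothetical stand-ins whose fields mimic the printed `NewSampler` layout; the meaning given to each field is an assumption, and the code assumes Julia 1.x syntax.

```julia
# Hypothetical stand-in for the sampler: a threshold and two candidates per slot.
struct ToySampler{Tv<:AbstractFloat,Ti<:Integer}
    F::Vector{Tv}   # assumed: probability of returning A[i] rather than V[i]
    A::Vector{Ti}
    V::Vector{Ti}
    n::Ti
end

# One draw: pick a slot, then choose between its two candidates.
function toysample(s::ToySampler{Tv,Ti}) where {Tv,Ti}
    i = max(ceil(Ti, rand() * s.n), one(Ti))
    return rand() < s.F[i] ? s.A[i] : s.V[i]
end

# Many draws: allocate the output once and fill it in one function call.
function toysamplemany(s::ToySampler{Tv,Ti}, k::Integer) where {Tv,Ti}
    out = Vector{Ti}(undef, k)
    for j in 1:k
        out[j] = toysample(s)
    end
    return out
end

s = ToySampler(ones(4), [1, 2, 3, 4], [1, 2, 3, 4], 4)  # uniform over 1:4
t = toysamplemany(s, 10^6)
```

Because the loop lives inside a function with concretely typed arguments, none of its variables are untyped globals, which is a plausible reason for the difference from the top-level `for` loop timed earlier.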

Total speed-up from type, ceil, and many-at-once:

In [23]:
2.86756/0.158651

18.07464182387757

Time compared to a single sample without look-ups:

In [24]:
0.158651/0.062625

2.5333493013972053

Wow, pretty good. As a sanity check, the product of the three individual speed-ups should roughly match the total:

In [1]:
4.3*1.9*2.2

17.974

# Sampling with a different distribution

In [29]:
distrSize = round(Int64,1e4)
sampCount = round(Int64,1e7)

10000000

Compare it to the old sampler again:

In [30]:
include("../src/fastSampler.jl")

epsequal (generic function with 1 method)

In [31]:
p = rand(distrSize); p'

1x10000 Array{Float64,2}:
 0.942461  0.672257  0.211922  0.18137  …  0.439821  0.0888726  0.394198

In [32]:
nSp = newSampler(p)

NewSampler{Float64,Int64}([0.423718,0.362632,0.908496,0.959493,0.0102091,0.716698,0.139153,0.664286,0.660671,0.555271  …  0.596707,0.744528,0.997771,0.998678,0.879132,0.915059,0.975977,0.867482,0.963732,1.0],[3,4,5,9,10,11,12,14,16,17  …  9973,9975,9979,9981,9983,9985,9986,9989,9995,9996],[1,1,2,2,2,6,6,7,7,7  …  9989,9989,9995,9995,9995,9995,9996,9996,9996,9996],10000)

In [34]:
@time t = newSampleMany(nSp,sampCount);

  0.620302 seconds (6 allocations: 76.294 MB, 0.48% gc time)


In [35]:
sp = sampler(p)

Sampler{Float64,Int32}([0.423718,0.362632,0.908496,0.959493,0.0102091,0.716698,0.139153,0.664286,0.660671,0.555271  …  0.596707,0.744528,0.997771,0.998678,0.879132,0.915059,0.975977,0.867482,0.963732,1.0],Int32[3,4,5,9,10,11,12,14,16,17  …  9973,9975,9979,9981,9983,9985,9986,9989,9995,9996],Int32[1,1,2,2,2,6,6,7,7,7  …  9989,9989,9995,9995,9995,9995,9996,9996,9996,9996],10000)

In [38]:
@time for i = 1:sampCount
    sample(sp)
end

 10.846836 seconds (78.97 M allocations: 1.475 GB, 2.19% gc time)


In [27]:
6/0.28

21.428571428571427

# Looking at preallocation of randomness

In [1]:
include("../src/newFastSampler.jl")

epsequal (generic function with 1 method)

In [65]:
distrSize = round(Int64,1e4)
sampCount = round(Int64,1e7)

10000000

In [68]:
@time for i=1:sampCount
    t = rand()
end

  1.967678 seconds (40.00 M allocations: 762.924 MB, 8.29% gc time)


In [69]:
@time rand(sampCount);

  0.042698 seconds (8 allocations: 76.294 MB, 6.43% gc time)


In [70]:
p = rand(distrSize);

In [71]:
nSp = newSampler(p)

NewSampler{Float64,Int64}([0.409884,0.202567,0.587932,0.74016,0.913545,0.930345,0.344543,0.536824,0.134419,0.616749  …  0.923074,0.785685,0.997634,0.773605,0.781762,0.729834,0.970726,0.951223,0.828569,1.0],[3,4,5,6,10,11,15,19,24,28  …  9977,9978,9981,9982,9984,9985,9992,9994,9995,9998],[1,2,7,7,8,8,8,9,12,13  …  9995,9995,9998,9998,9998,9998,9998,9998,9998,9998],10000)

Let's try preallocating the randomness:

In [75]:
@time t = newSampleMany(nSp,sampCount);

  0.264126 seconds (6 allocations: 76.294 MB, 14.76% gc time)


In [76]:
@time t = newSampleManyPrealloc(nSp,sampCount);

  0.261617 seconds (12 allocations: 152.588 MB, 8.19% gc time)


Wow, that's slow. But maybe it's mainly the memory allocation?

In [89]:
distrSize = round(Int64,1e4)
sampCount = round(Int64,1e4)

10000

In [90]:
@time t = newSampleMany(nSp,sampCount);

  0.000524 seconds (6 allocations: 78.344 KB)


In [91]:
@time t = newSampleManyPrealloc(nSp,sampCount);

  0.000544 seconds (12 allocations: 156.594 KB)
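The two strategies being compared can be sketched as follows: draw one random number per iteration, or draw all of them up front into an array and index into it. The functions below are hypothetical and use a plain lookup table in place of the sampler; they assume Julia 1.x syntax and are not the code in newFastSampler.jl.

```julia
# Draw one random number per iteration.
function pick_inline(table::Vector{Int}, k::Integer)
    n = length(table)
    out = Vector{Int}(undef, k)
    for j in 1:k
        out[j] = table[max(ceil(Int, rand() * n), 1)]
    end
    return out
end

# Draw all random numbers up front, at the cost of an extra k-element array.
function pick_prealloc(table::Vector{Int}, k::Integer)
    n = length(table)
    r = rand(k)
    out = Vector{Int}(undef, k)
    for j in 1:k
        out[j] = table[max(ceil(Int, r[j] * n), 1)]
    end
    return out
end

table = collect(1:10_000)
pick_inline(table, 10); pick_prealloc(table, 10)   # compile before timing
@time pick_inline(table, 10^7);
@time pick_prealloc(table, 10^7);
```

The extra array of `Float64` values is the same size as the output, which matches the doubled allocation reported for `newSampleManyPrealloc` above.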


# Should rand be allocating the array?

In [96]:
sampCount = round(Int64,1e7)

10000000

In [97]:
@time for i = 1:10
    t = rand(sampCount);
end

  0.270408 seconds (40 allocations: 762.940 MB, 22.10% gc time)


In [98]:
@time for i = 1:10
    t = Array{Float64,1}(sampCount)
    rand!(t);
end

  0.270643 seconds (20 allocations: 762.940 MB, 21.38% gc time)


Preallocating this array makes very little difference here: both versions allocate 762.940 MB in total.
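A variant worth trying is to hoist the allocation out of the loop, so that `rand!` overwrites one buffer instead of filling a new array on every iteration. The sketch below assumes Julia 1.x, where `rand!` lives in the `Random` standard library and uninitialized arrays are created with `undef`; the function names are made up.

```julia
using Random  # provides rand! in Julia 1.x

# Allocate a fresh array on every iteration.
function fill_fresh(k, reps)
    s = 0.0
    for _ in 1:reps
        t = rand(k)
        s += t[1]
    end
    return s
end

# Allocate once outside the loop and overwrite in place.
function fill_reuse(k, reps)
    t = Vector{Float64}(undef, k)
    s = 0.0
    for _ in 1:reps
        rand!(t)
        s += t[1]
    end
    return s
end

fill_fresh(10, 1); fill_reuse(10, 1)   # compile before timing
@time fill_fresh(10^7, 10)
@time fill_reuse(10^7, 10)
```

The returned sum exists only so that the work inside the loop is not optimized away.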

# Optimization macros?
## @inbounds and @simd?

In [1]:
include("../src/newFastSampler.jl")

epsequal (generic function with 1 method)

In [2]:
distrSize = round(Int64,1e4)
sampCount = round(Int64,1e8)

100000000

In [3]:
p = rand(distrSize);
nSp = newSampler(p);

In [19]:
@time t = newSampleMany(nSp,sampCount);

  3.453693 seconds (6 allocations: 762.940 MB, 2.68% gc time)


In [20]:
@time t = newSampleManyInbounds(nSp,sampCount);

  3.253960 seconds (6 allocations: 762.940 MB, 2.22% gc time)


In [21]:
@time t = newSampleManyInboundsLines(nSp,sampCount);

  3.176713 seconds (6 allocations: 762.940 MB, 2.58% gc time)


In [22]:
@time t = newSampleManyInboundsSgnFn(nSp,sampCount);

  3.284361 seconds (6 allocations: 762.940 MB, 2.99% gc time)


In [23]:
@time t = newSampleManyInboundsSgnFnSimd(nSp,sampCount);

  3.405632 seconds (6 allocations: 762.940 MB, 2.41% gc time)


I don't think the @simd macro is "active" here, since the array access pattern is random.

Rerunning the above, there's quite a lot of variance.
It seems perhaps that @inbounds does help a little.
Maybe pick newSampleManyInboundsSgnFn?
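For readers unfamiliar with `@inbounds`, here is a minimal sketch of what it changes in a loop with a data-dependent access pattern. The functions are hypothetical and are not the `newSampleManyInbounds*` variants timed above; the code assumes Julia 1.x syntax.

```julia
# Gather with the default bounds checks on every array access.
function gather_checked(table::Vector{Int}, idx::Vector{Int})
    out = similar(idx)
    for j in eachindex(idx)
        out[j] = table[idx[j]]
    end
    return out
end

# Same loop without bounds checks; only safe if every idx[j] is in 1:length(table).
function gather_inbounds(table::Vector{Int}, idx::Vector{Int})
    out = similar(idx)
    @inbounds for j in eachindex(idx)
        out[j] = table[idx[j]]
    end
    return out
end

table = collect(1:10_000)
idx = rand(1:length(table), 10^6)
@assert gather_checked(table, idx) == gather_inbounds(table, idx)
@time gather_checked(table, idx);
@time gather_inbounds(table, idx);
```

Since each `table[idx[j]]` read depends on a random index, the loads cannot be turned into contiguous vector loads, which is consistent with `@simd` showing no benefit above.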

In [13]:
@time t = newSampleManyInboundsAllSgnFnSimd(nSp,sampCount);

LoadError: LoadError: MethodError: `isless` has no method matching isless(::Float64, ::Void)
Closest candidates are:
  isless(::Float64, !Matched::Float64)
  isless(::AbstractFloat, !Matched::AbstractFloat)
  isless(::Real, !Matched::AbstractFloat)
  ...
while loading In[13], in expression starting on line 155