
slow array allocation #4707

Closed
ghost opened this issue Nov 2, 2013 · 35 comments · Fixed by #4720
Labels
domain:linear algebra Linear algebra performance Must go faster
Milestone

Comments

@ghost

ghost commented Nov 2, 2013

I am generating the 3D Laplacian in Matlab and Julia.
For a 3D mesh of 64^3 cells I have a sparse matrix A with 7 diagonals,
of size 262144 × 262144.
I generate a random vector v of the right size,
then run the following:

----------------- Julia ----------------------

tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.248922994 seconds

%-------------- matlab ---------------------
tic; for i=1:100, z = A*v; end; toc
Elapsed time is 0.282192 seconds.

I am not sure why Julia is so much slower for mat-vec products.

@ViralBShah
Member

Could you post the matrix generator?

@ghost
Author

ghost commented Nov 2, 2013

You'll need three functions for that.

example:

A = getDivGrad(8,8,8);

#----------------- Get the A matrix
function getDivGrad(n1,n2,n3)

        # the Divergence
        D1 = kron(speye(n3),kron(speye(n2),ddx(n1)))
        D2 = kron(speye(n3),kron(ddx(n2),speye(n1)))
        D3 = kron(ddx(n3),kron(speye(n2),speye(n1)))
        # DIV from faces to cell-centers
        Div = [D1 D2 D3]

        return Div*Div';
end

#----------------- 1D finite difference on staggered grid
function ddx(n)
# generate 1D derivatives
    return spdiags(ones(n)*[-1 1],[0,1],n,n+1)

end

#------------- Build a diagonal matrix
function spdiags(B,d,m,n)
#   spdiags(B,d,m,n)
# creates a sparse matrix from its diagonals


d = d[:]
p = length(d)

len = zeros(p+1,1)
for k = 1:p
    len[k+1] = int(len[k]+length(max(1,1-d[k]):min(m,n-d[k])))
end
a = zeros(int(len[p+1]),3)
for k = 1:p
    # Append new d[k]-th diagonal to compact form
    i = max(1,1-d[k]):min(m,n-d[k])
    a[(int(len[k])+1):int(len[k+1]),:] = [i i+d[k] B[i+(m>=n)*d[k],k]]
end

A = sparse(int(a[:,1]),int(a[:,2]),a[:,3],m,n);

return A

end
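For readers who want to cross-check the construction above outside Julia or Matlab, here is a rough numpy sketch of the same Kronecker-product build, using dense matrices for a tiny grid. All names here (`ddx`, `get_div_grad`) are illustrative translations, not the original code's API.

```python
import numpy as np

def ddx(n):
    # 1D staggered-grid difference: n x (n+1), with -1 and +1 on
    # diagonals 0 and 1, mirroring spdiags(ones(n)*[-1 1],[0,1],n,n+1)
    d = np.zeros((n, n + 1))
    for i in range(n):
        d[i, i] = -1.0
        d[i, i + 1] = 1.0
    return d

def get_div_grad(n1, n2, n3):
    I = np.eye
    D1 = np.kron(I(n3), np.kron(I(n2), ddx(n1)))
    D2 = np.kron(I(n3), np.kron(ddx(n2), I(n1)))
    D3 = np.kron(ddx(n3), np.kron(I(n2), I(n1)))
    Div = np.hstack([D1, D2, D3])   # divergence: faces -> cell centers
    return Div @ Div.T              # symmetric 7-point Laplacian-like operator

A = get_div_grad(2, 2, 2)
print(A.shape)  # (8, 8)
```

For a 64^3 grid this dense version is of course infeasible; the sparse Kronecker construction in the Julia code above is the practical form.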

@timholy
Sponsor Member

timholy commented Nov 2, 2013

It would also be helpful to know how many cores you have, and to test when starting matlab with the -singleCompThread option.

@ghost
Author

ghost commented Nov 2, 2013

Comparison on a single core with Matlab, grid size 64^3,
matrix size 64^3 × 64^3

tic; for i=1:100,z = A*v;end; toc
Elapsed time is 0.378893 seconds.

tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.09633341 seconds

@timholy
Sponsor Member

timholy commented Nov 3, 2013

Hmm, I can't replicate a difference anywhere close to this. For me, Matlab runs in 0.9 seconds, slower than for you. Julia runs in 1.45 seconds, faster than for you. It would be great to close this gap, of course, but it's not the tenfold difference you see.

Are you using a recently-compiled Julia? What platform? What version of Matlab? I'm testing on R2012b.

@timholy
Sponsor Member

timholy commented Nov 3, 2013

#4720 gets Julia's performance down to about 1.1 seconds, still a little slower than matlab but now they're pretty close. Somewhat surprisingly to me, disabling bounds-checking was the most important change, but every tweak made some difference.
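The hot loop being tuned here is the standard CSR (compressed sparse row) matrix-vector kernel. A minimal Python/numpy sketch of that loop — illustrative only, not Julia's actual implementation — shows the kind of tight inner loop where bounds checks and per-call allocation dominate:

```python
import numpy as np

def csr_matvec(rowptr, colval, nzval, v):
    # z[i] = sum of nzval[k] * v[colval[k]] over row i's stored entries.
    # In a compiled language, removing bounds checks from this inner
    # loop is exactly the kind of tweak #4720 applies.
    z = np.zeros(len(rowptr) - 1)
    for i in range(len(rowptr) - 1):
        s = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            s += nzval[k] * v[colval[k]]
        z[i] = s
    return z

# 2x2 example: A = [[2, -1], [-1, 2]]
rowptr = [0, 2, 4]
colval = [0, 1, 0, 1]
nzval  = [2.0, -1.0, -1.0, 2.0]
print(csr_matvec(rowptr, colval, nzval, np.array([1.0, 1.0])))  # [1. 1.]
```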

@ViralBShah
Member

@ehaber99 which version of julia are you using?

@ViralBShah
Member

Also which OS?

@ghost ghost closed this as completed Nov 3, 2013
@ghost ghost reopened this Nov 3, 2013
@ghost
Author

ghost commented Nov 3, 2013

I'm running everything through Julia Studio 0.4.1 and downloaded the latest version.
my matlab is 8.0.0.783 (R2012b).
I'm on OS X 10.8.5

I replicated this on another mac I have. I'll try to see tomorrow if my students can replicate it on their systems

@ghost
Author

ghost commented Nov 3, 2013

timholy - can you post the code that made julia and matlab around the same speed?

@timholy
Sponsor Member

timholy commented Nov 3, 2013

The test code is just your code; I translated the first two functions directly into matlab, too. (Did you translate it differently somehow? You're not doing any row/column rearrangements, are you?)

The code that provides a ~35% speed improvement for Julia is in #4720. If you were building Julia yourself, you could just git pull and rebuild. (But you're not, if you're using Studio.) However, that speed improvement is far too modest to account for the difference you're observing.

I have no idea what Julia version Julia Studio 0.4.1 corresponds to; versioninfo() will give us what we need.

@ghost
Author

ghost commented Nov 3, 2013

Thanks

I have
Julia Version 0.2.0-rc1
Commit 040f3ab 2013-10-15 05:58:39 UTC
Platform Info:
System: Darwin (x86_64-apple-darwin12.5.0)
WORD_SIZE: 64
BLAS: libgfortblas
LAPACK: liblapack
LIBM: libopenlibm

@ghost
Author

ghost commented Nov 3, 2013

by the way, my matlab is almost identical. No tricks ...

@timholy
Sponsor Member

timholy commented Nov 3, 2013

The only things I can recommend are asking Forio to update to the latest, building Julia yourself, or profiling to figure out the issue. I've definitely seen at least one example of code that should have been fast, but wasn't at some point in the last month, but was fast again when I tried the latest Julia. The details escape me, unfortunately.

@IainNZ
Member

IainNZ commented Nov 3, 2013

Could this be a good thing to capture in our codespeed tests?

@ghost
Author

ghost commented Nov 4, 2013

I believe so.
Sparse matrix-vector products are at the base of any iterative solution of PDEs as well as eigenvalue problems. A comparison of sparse mat-vec showing that it's around the same speed as Matlab would be an excellent promotional tool.

@JeffBezanson
Sponsor Member

We need some code generation improvements here too.

@ViralBShah
Member

Cc @staticfloat

We should certainly track this in code speed. Matvec performance is important and will be improved significantly as part of the iterative solvers work that is just starting out.

@andreasnoack
Member

I can't get such a large difference either. I am running Julia pre #4720 and get
MATLAB: 0.48 sec
Julia: 0.77 sec

@ViralBShah
Member

Perhaps @WestleyArgentum can provide a new julia studio build to try this out?

@WestleyArgentum
Member

Sorry about the delay, we've been meaning to update since last week. There's a new build (0.4.2) out now though, and it includes rc2

@timholy
Sponsor Member

timholy commented Nov 5, 2013

No worries, you're updating quite frequently given that it's a little more involved than git pull; make :-).

@staticfloat
Sponsor Member

@ViralBShah Would this be tracked by the axpy test in our BLAS suite?

@ViralBShah
Member

@staticfloat That is just a.*x .+ y for dense vectors.

@staticfloat
Sponsor Member

Oh, of course, this is for sparse vectors. Got it.

@ViralBShah
Member

This is for sparse matrix times dense vector.

@lruthotto
Copy link

To me it seems that most of the confusion in this thread arose from the slightly different syntax of the for-loop in Julia and Matlab. In Julia, a semicolon separates the for header from the loop body, while a comma introduces a nested loop variable. Thus, in the example above, Julia additionally iterates z over all the elements of A*v on every outer iteration, which explains the huge difference in timings. This may also be why most people here could not reproduce it.

However, on my MacBookPro, Julia (Julia Version 0.2.0-rc3) is still slower by a factor of 2 than Matlab R2013a.

#----------------- Julia (wrong) ----------------------
tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.267891737 seconds

#----------------- Julia (correct) ----------------------
tic(); for i=1:100; z=A*v; end; toc()
elapsed time: 0.610396601 seconds

%-------------- Matlab R2013a ---------------------
tic; for i=1:100, z = A*v; end; toc
Elapsed time is 0.311197 seconds.

@pao
Member

pao commented Nov 6, 2013

To me it seems that most of the confusion in this thread arose from the slightly different syntax of the for-loop in Julia and Matlab.

Perhaps it's best to use multiline for loops, then, to avoid this possible misunderstanding?

@lruthotto
Copy link

Sure, in this case that would have helped. I actually like Julia's ability to concisely define nested for loops by separating the ranges with commas. I think this pitfall for Matlab users may be a candidate for the Noteworthy Differences from other Languages section.

@andreasnoack
Member

@lruthotto Good catch. The difference between MATLAB and Julia is not that big on my machine, which is

Julia Version 0.2.0-rc3+5
anj/sparsemkl/7f12025 (fork: 1 commits, 0 days)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libm

It takes 0.31 seconds in Julia and 0.23 seconds in MATLAB. The anj/sparsemkl branch wraps the sparse BLAS in MKL and then the multiplication runs in 0.26 seconds. The last difference must be the allocation time. MATLAB allocates 100 vectors with length 160000 in 0.006 seconds, but Julia needs 0.04 seconds.
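The remaining allocation cost can be seen in miniature with a hedged numpy sketch: allocating a fresh output vector on every iteration versus reusing one preallocated buffer, which is what an in-place matvec avoids. (Python/numpy here purely for illustration; the dimensions are arbitrary.)

```python
import numpy as np

A = np.random.rand(100, 100)
v = np.random.rand(100)

# Allocating variant: a new z is created on every iteration,
# analogous to z = A*v in the loop above.
for _ in range(100):
    z_alloc = A @ v

# Reusing variant: one buffer allocated up front, the product
# is written into it in place on each iteration.
z = np.empty(100)
for _ in range(100):
    np.dot(A, v, out=z)
```

Both loops compute the same product; the second performs one allocation instead of one hundred, which is the gap between the 0.006 s and 0.04 s allocation timings above.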

@timholy
Sponsor Member

timholy commented Nov 7, 2013

@lruthotto, nice catch indeed. @ehaber99, I'm sorry I didn't notice that as the reason earlier.

@lruthotto, were you running Matlab single-threaded? Obviously it's a relevant comparison to check the multithreaded Matlab, but it's also worth knowing as it may explain the timing difference.

@lruthotto

Yes, I ran the code in a single threaded Matlab.

@andreasnoackjensen: I also see a huge difference in allocation speed:

---- MATLAB ----
tic; for i=1:100; z = zeros(160000,1); end; toc;
Elapsed time is 0.008080 seconds.

---- Julia ----
tic(); for i=1:100; z = zeros(160000); end; toc();
elapsed time: 0.080487484 seconds

Maybe that is the real bottleneck on my machine? Here's my versioninfo()

Julia Version 0.2.0-rc3+8
Commit 4b2a4e1 (0 days old master)
Platform Info:
System: Darwin (x86_64-apple-darwin12.5.0)
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm

@ViralBShah ViralBShah mentioned this issue Nov 19, 2013
21 tasks
@ViralBShah
Member

@JeffBezanson Do we need codegen improvements here, or is it memory allocation / GC improvements to get on par? Is it possible that Matlab is using SIMD instructions?

gitfoxi pushed a commit to gitfoxi/julia that referenced this issue Dec 13, 2013
* upstream/master: (53 commits)
  Add sparse matvec to perf benchmark (JuliaLang#4707)
  ...
@andreasnoack
Member

The title should either be changed to something like "slow array allocation", or the issue should be closed, because the original problem was due to a typo.

@JeffBezanson
Sponsor Member

I think we can close this. All that's left here is generic performance improvements.
