
slow array allocation #4707

Closed
ghost opened this issue Nov 2, 2013 · 35 comments · Fixed by #4720
Labels
domain:linear algebra Linear algebra performance Must go faster
Milestone

Comments

@ghost

ghost commented Nov 2, 2013

I am generating the 3D Laplacian in Matlab and Julia.
For a 3D mesh of 64^3 cells I have a sparse matrix A with 7 diagonals,
of size 262144 × 262144.
I generate a random vector v of the right size,
then run the following:

----------------- Julia ----------------------

tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.248922994 seconds

%-------------- matlab ---------------------
tic; for i=1:100, z = A*v; end; toc
Elapsed time is 0.282192 seconds.

I am not sure why Julia is so much slower for mat-vec products.

@ViralBShah
Member

Could you post the matrix generator?

@ghost
Author

ghost commented Nov 2, 2013

You'll need three functions for that.

example:

A = getDivGrad(8,8,8);

#----------------- Get the A matrix
function getDivGrad(n1,n2,n3)

        # the Divergence
        D1 = kron(speye(n3),kron(speye(n2),ddx(n1)))
        D2 = kron(speye(n3),kron(ddx(n2),speye(n1)))
        D3 = kron(ddx(n3),kron(speye(n2),speye(n1)))
        # DIV from faces to cell-centers
        Div = [D1 D2 D3]

        return Div*Div';
end

#----------------- 1D finite difference on staggered grid
function ddx(n)
# generate 1D derivatives
    return spdiags(ones(n)*[-1 1],[0,1],n,n+1)

end

#------------- Build a diagonal matrix
function spdiags(B,d,m,n)
#   spdiags(B,d,m,n)
# creates a sparse matrix from its diagonals


d = d[:]
p = length(d)

len = zeros(p+1,1)
for k = 1:p
    len[k+1] = int(len[k]+length(max(1,1-d[k]):min(m,n-d[k])))
end
a = zeros(int(len[p+1]),3)
for k = 1:p
    # Append new d[k]-th diagonal to compact form
    i = max(1,1-d[k]):min(m,n-d[k])
    a[(int(len[k])+1):int(len[k+1]),:] = [i i+d[k] B[i+(m>=n)*d[k],k]]
end

A = sparse(int(a[:,1]),int(a[:,2]),a[:,3],m,n);

return A

end
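For readers who want to cross-check the construction above outside Julia or Matlab, here is a rough numpy sketch of the same Kronecker-product build, using dense matrices for a tiny grid. All names here (`ddx`, `get_div_grad`) are illustrative translations, not the original code's API.

```python
import numpy as np

def ddx(n):
    # 1D staggered-grid difference: n x (n+1), with -1 and +1 on
    # diagonals 0 and 1, mirroring spdiags(ones(n)*[-1 1],[0,1],n,n+1)
    d = np.zeros((n, n + 1))
    for i in range(n):
        d[i, i] = -1.0
        d[i, i + 1] = 1.0
    return d

def get_div_grad(n1, n2, n3):
    I = np.eye
    D1 = np.kron(I(n3), np.kron(I(n2), ddx(n1)))
    D2 = np.kron(I(n3), np.kron(ddx(n2), I(n1)))
    D3 = np.kron(ddx(n3), np.kron(I(n2), I(n1)))
    Div = np.hstack([D1, D2, D3])   # divergence: faces -> cell centers
    return Div @ Div.T              # symmetric 7-point Laplacian-like operator

A = get_div_grad(2, 2, 2)
print(A.shape)  # (8, 8)
```

For a 64^3 grid this dense version is of course infeasible; the sparse Kronecker construction in the Julia code above is the practical form.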

@timholy
Sponsor Member

timholy commented Nov 2, 2013

It would also be helpful to know how many cores you have, and to test when starting matlab with the -singleCompThread option.

@ghost
Author

ghost commented Nov 2, 2013

Comparison on a single core with Matlab, grid size 64^3,
matrix size 64^3 × 64^3

tic; for i=1:100,z = A*v;end; toc
Elapsed time is 0.378893 seconds.

tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.09633341 seconds

@timholy
Sponsor Member

timholy commented Nov 3, 2013

Hmm, I can't replicate a difference anywhere close to this. For me, Matlab runs in 0.9 seconds, slower than for you. Julia runs in 1.45 seconds, faster than for you. It would be great to close this gap, of course, but it's not the tenfold difference you see.

Are you using a recently-compiled Julia? What platform? What version of Matlab? I'm testing on R2012b.

@timholy
Sponsor Member

timholy commented Nov 3, 2013

#4720 gets Julia's performance down to about 1.1 seconds, still a little slower than matlab but now they're pretty close. Somewhat surprisingly to me, disabling bounds-checking was the most important change, but every tweak made some difference.
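The hot loop being tuned here is the standard CSR (compressed sparse row) matrix-vector kernel. A minimal Python/numpy sketch of that loop — illustrative only, not Julia's actual implementation — shows the kind of tight inner loop where bounds checks and per-call allocation dominate:

```python
import numpy as np

def csr_matvec(rowptr, colval, nzval, v):
    # z[i] = sum of nzval[k] * v[colval[k]] over row i's stored entries.
    # In a compiled language, removing bounds checks from this inner
    # loop is exactly the kind of tweak #4720 applies.
    z = np.zeros(len(rowptr) - 1)
    for i in range(len(rowptr) - 1):
        s = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            s += nzval[k] * v[colval[k]]
        z[i] = s
    return z

# 2x2 example: A = [[2, -1], [-1, 2]]
rowptr = [0, 2, 4]
colval = [0, 1, 0, 1]
nzval  = [2.0, -1.0, -1.0, 2.0]
print(csr_matvec(rowptr, colval, nzval, np.array([1.0, 1.0])))  # [1. 1.]
```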

@ViralBShah
Member

@ehaber99 which version of julia are you using?

@ViralBShah
Member

Also which OS?

@ghost ghost closed this as completed Nov 3, 2013
@ghost ghost reopened this Nov 3, 2013
@ghost
Author

ghost commented Nov 3, 2013

I'm running everything through Julia Studio 0.4.1 and downloaded the latest version.
my matlab is 8.0.0.783 (R2012b).
I'm on OS X 10.8.5

I replicated this on another mac I have. I'll try to see tomorrow if my students can replicate it on their systems

@ghost
Author

ghost commented Nov 3, 2013

timholy - can you post the code that made julia and matlab around the same speed?

@timholy
Sponsor Member

timholy commented Nov 3, 2013

The test code is just your code; I translated the first two functions directly into matlab, too. (Did you translate it differently somehow? You're not doing any row/column rearrangements, are you?)

The code that provides a ~35% speed improvement for Julia is in #4720. If you were building Julia yourself, you could just git pull and rebuild. (But you're not, if you're using Studio.) However, that speed improvement is far too modest to account for the difference you're observing.

I have no idea what Julia version Julia Studio 0.4.1 corresponds to; versioninfo() will give us what we need.

@ghost
Author

ghost commented Nov 3, 2013

Thanks

I have
Julia Version 0.2.0-rc1
Commit 040f3ab 2013-10-15 05:58:39 UTC
Platform Info:
System: Darwin (x86_64-apple-darwin12.5.0)
WORD_SIZE: 64
BLAS: libgfortblas
LAPACK: liblapack
LIBM: libopenlibm

@ghost
Author

ghost commented Nov 3, 2013

by the way, my matlab is almost identical. No tricks ...

@timholy
Sponsor Member

timholy commented Nov 3, 2013

The only things I can recommend are asking Forio to update to the latest, building Julia yourself, or profiling to figure out the issue. I've definitely seen at least one example of code that should have been fast, but wasn't at some point in the last month, but was fast again when I tried the latest Julia. The details escape me, unfortunately.

@IainNZ
Member

IainNZ commented Nov 3, 2013

Could this be a good thing to capture in our codespeed tests?

@ghost
Author

ghost commented Nov 4, 2013

I believe so.
Sparse matrix-vector products are at the base of any iterative solution of PDEs as well as eigenvalue problems. A comparison of sparse mat-vec showing that it's around the same speed as Matlab would be an excellent promotional tool.

@JeffBezanson
Sponsor Member

We need some code generation improvements here too.

@ViralBShah
Member

Cc @staticfloat

We should certainly track this in code speed. Matvec performance is important and will be improved significantly as part of the iterative solvers work that is just starting out.

@andreasnoack
Member

I can't get such a large difference either. I am running Julia pre #4720 and get
MATLAB: 0.48 sec
Julia: 0.77 sec

@ViralBShah
Member

Perhaps @WestleyArgentum can provide a new julia studio build to try this out?

@WestleyArgentum
Member

Sorry about the delay, we've been meaning to update since last week. There's a new build (0.4.2) out now though, and it includes rc2

@timholy
Sponsor Member

timholy commented Nov 5, 2013

No worries, you're updating quite frequently given that it's a little more involved than git pull; make :-).

@staticfloat
Sponsor Member

@ViralBShah Would this be tracked by the axpy test in our BLAS suite?

@ViralBShah
Member

@staticfloat That is just a.*x .+ y for dense vectors.

@staticfloat
Sponsor Member

Oh, of course, this is for sparse vectors. Got it.

@ViralBShah
Member

This is for sparse matrix times dense vector.

@lruthotto
Copy link

To me it seems that most of the confusion in this thread arose from the slightly different syntax of the for-loop in Julia and Matlab. In Julia, a semicolon separates the for header from the loop body, while a comma introduces a nested loop variable. Thus, in the example above, Julia additionally iterates z over all the elements of A*v on every outer iteration, which explains the huge difference in timings. This may also be why most people here could not reproduce it.

However, on my MacBookPro, Julia (Julia Version 0.2.0-rc3) is still slower by a factor of 2 than Matlab R2013a.

#----------------- Julia (wrong) ----------------------
tic(); for i=1:100, z=A*v; end; toc()
elapsed time: 4.267891737 seconds

#----------------- Julia (correct) ----------------------
tic(); for i=1:100; z=A*v; end; toc()
elapsed time: 0.610396601 seconds

%-------------- Matlab R2013a ---------------------
tic; for i=1:100, z = A*v; end; toc
Elapsed time is 0.311197 seconds.

@pao
Member

pao commented Nov 6, 2013

To me it seems that most of the confusion in this thread arose from the slightly different syntax of the for-loop in Julia and Matlab.

Perhaps it's best to use multiline for loops, then, to avoid this possible misunderstanding?

@lruthotto
Copy link

Sure, in this case that would have helped. I actually like Julia's ability to concisely define nested for loops by separating the ranges with commas. I think this pitfall for Matlab users may be a candidate for the Noteworthy Differences from other Languages section.

@andreasnoack
Member

@lruthotto Good catch. The difference between MATLAB and Julia is not that big on my machine, which is

Julia Version 0.2.0-rc3+5
anj/sparsemkl/7f12025 (fork: 1 commits, 0 days)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  WORD_SIZE: 64
  BLAS: libmkl_rt
  LAPACK: libmkl_rt
  LIBM: libm

It takes 0.31 seconds in Julia and 0.23 seconds in MATLAB. The anj/sparsemkl branch wraps the sparse BLAS in MKL and then the multiplication runs in 0.26 seconds. The last difference must be the allocation time. MATLAB allocates 100 vectors with length 160000 in 0.006 seconds, but Julia needs 0.04 seconds.
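The remaining allocation cost can be seen in miniature with a hedged numpy sketch: allocating a fresh output vector on every iteration versus reusing one preallocated buffer, which is what an in-place matvec avoids. (Python/numpy here purely for illustration; the dimensions are arbitrary.)

```python
import numpy as np

A = np.random.rand(100, 100)
v = np.random.rand(100)

# Allocating variant: a new z is created on every iteration,
# analogous to z = A*v in the loop above.
for _ in range(100):
    z_alloc = A @ v

# Reusing variant: one buffer allocated up front, the product
# is written into it in place on each iteration.
z = np.empty(100)
for _ in range(100):
    np.dot(A, v, out=z)
```

Both loops compute the same product; the second performs one allocation instead of one hundred, which is the gap between the 0.006 s and 0.04 s allocation timings above.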

@timholy
Sponsor Member

timholy commented Nov 7, 2013

@lruthotto, nice catch indeed. @ehaber99, I'm sorry I didn't notice that as the reason earlier.

@lruthotto, were you running Matlab single-threaded? Obviously it's a relevant comparison to check the multithreaded Matlab, but it's also worth knowing as it may explain the timing difference.

@lruthotto

Yes, I ran the code in a single threaded Matlab.

@andreasnoackjensen: I also see a huge difference in allocation speed:

---- MATLAB ----
tic; for i=1:100; z = zeros(160000,1); end; toc;
Elapsed time is 0.008080 seconds.

---- Julia ----
tic(); for i=1:100; z = zeros(160000); end; toc();
elapsed time: 0.080487484 seconds

Maybe that is the real bottleneck on my machine? Here's my versioninfo()

Julia Version 0.2.0-rc3+8
Commit 4b2a4e1 (0 days old master)
Platform Info:
System: Darwin (x86_64-apple-darwin12.5.0)
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm

@ViralBShah ViralBShah mentioned this issue Nov 19, 2013
21 tasks
@ViralBShah
Member

@JeffBezanson Do we need codegen improvements here, or is it memory allocation / GC improvements to get on par? Is it possible that Matlab is using SIMD instructions?

gitfoxi pushed a commit to gitfoxi/julia that referenced this issue Dec 13, 2013
* upstream/master: (53 commits)
  Add sparse matvec to perf benchmark (JuliaLang#4707)
  ...
@andreasnoack
Member

The title should either be changed to something like "slow array allocation", or the issue should be closed, because the original problem was due to a typo.

@JeffBezanson
Sponsor Member

I think we can close this. All that's left here is generic performance improvements.
