
Same LLVM IR, radically different performance #22408

Open
timholy opened this issue Jun 17, 2017 · 7 comments
Labels
performance Must go faster

Comments


timholy commented Jun 17, 2017

This came up in the context of #22210, where I'm noticing a big performance hit on transpose for sparse matrices. A convenient test case comes from copying these lines to a separate file, annotating _computecolptrs_halfperm! with @noinline (not strictly necessary, since it doesn't inline on master), and then comparing the result of using either @noinline or @inline on _distributevals_halfperm!.

Demo:

A = sprand(600, 600, 0.01);
X = transpose(A);
using BenchmarkTools

With @inline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  166.98 KiB
  allocs estimate:  10685
  --------------
  minimum time:     921.938 μs (0.00% GC)
  median time:      936.064 μs (0.00% GC)
  mean time:        954.923 μs (0.40% GC)
  maximum time:     1.627 ms (38.60% GC)
  --------------
  samples:          1046
  evals/sample:     1

With @noinline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     23.175 μs (0.00% GC)
  median time:      23.390 μs (0.00% GC)
  mean time:        23.658 μs (0.00% GC)
  maximum time:     52.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Inspection does not suggest an immediate reason for this 40x performance gap; profiling places all the blame at this line with the function evaluation. It made me wonder whether there is some problem inlining the function call.

However, the truly bizarre part is that, with @inline, the output of @code_llvm _distributevals_halfperm!(X, A, 1:A.n, identity) is, for all practical purposes I can see, identical to that of @code_llvm halfperm!(X, A, 1:A.n, identity) (aside from the obvious call to _computecolptrs_halfperm!). I am not at all good at reading assembly, but even there the differences do not seem dramatic to me (there are some constant differences in movq statements that might be problematic?).

This seems really puzzling. LLVM bug? Present at least on 0.6.0-rc3 and master.
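For anyone trying to reproduce the IR comparison offline, the two dumps can be written to files and diffed with an external tool. A minimal sketch, assuming a 0.6-era Julia where these functions live in Base.SparseArrays (paths and names as in the thread above):

```julia
# Dump the LLVM IR for both entry points to files so the two builds
# can be diffed outside the REPL.
A = sprand(600, 600, 0.01)
X = transpose(A)
argtypes = Tuple{typeof(X), typeof(A), UnitRange{Int}, typeof(identity)}

open("inner.ll", "w") do io
    code_llvm(io, Base.SparseArrays._distributevals_halfperm!, argtypes)
end
open("outer.ll", "w") do io
    code_llvm(io, Base.SparseArrays.halfperm!, argtypes)
end
```

Then something like diff inner.ll outer.ll makes any divergence (beyond the _computecolptrs_halfperm! call) easy to spot.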

@ararslan ararslan added the performance Must go faster label Jun 18, 2017
@A-Shah19

I'm trying to apply this @noinline annotation to the _distributevals_halfperm! function, as I also noticed a significant slowdown when transposing large sparse matrices. How would I go about doing this to see if it makes a difference in performance on my tasks? (Sorry if this comment is basic, but I'm confused about how to edit the source code for the sparse matrix library locally.) Thank you for the help.


timholy commented Jun 28, 2017

Do you have a git checkout of Julia, or did you download a binary? If you have the source, you just edit the file and type make in the top-level directory. But on 0.6 and higher you can also use eval(mod, quote <code> end) to redefine the function within a running Julia session. Here mod (the module) would presumably be Base.SparseArrays.

@A-Shah19

I believe I have the binary, as I installed with Homebrew on macOS, and I'm not sure whether building from a git checkout is supported on this OS. How would I do it for this version?


timholy commented Jun 29, 2017

Just try

eval(Base.SparseArrays, quote
<Your revised function definition goes here>
end)

You can copy/paste the original version into the <> to test it, and then start pasting modified versions in once you've verified it works.
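As a toy illustration of that redefinition pattern (Demo and f here are made-up names, not the actual sparse-matrix code):

```julia
# A throwaway module standing in for Base.SparseArrays.
module Demo
f(x) = x + 1
end

Demo.f(1)  # returns 2

# Redefine Demo.f from outside the module, without rebuilding anything,
# using the eval(mod, quote ... end) pattern described above.
eval(Demo, quote
    f(x) = x + 2
end)

Demo.f(1)  # now returns 3
```

The same mechanism works with Base.SparseArrays as the module, pasting the (possibly modified) function body inside the quote block.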


timholy commented Jun 29, 2017

But macOS is well supported; see https://github.com/JuliaLang/julia#source-download-and-compilation

@A-Shah19

You're right. I just upgraded to 0.6, so I'll try it with the eval method. My attempt at the git-checkout method failed when running make, possibly due to proxy issues, but regardless I think eval should do the trick.


gbaraldi commented Sep 1, 2023

Running the original test locally, I can't reproduce this. @timholy, can we close?
