
Same LLVM IR, radically different performance #22408

Open
timholy opened this issue Jun 17, 2017 · 7 comments
Labels
performance Must go faster

Comments


timholy commented Jun 17, 2017

This came up in the context of #22210, where I'm noticing a big performance hit on transpose for sparse matrices. A convenient test case comes from copying these lines to a separate file, annotating _computecolptrs_halfperm! with @noinline (not strictly necessary, since it doesn't inline on master), and then comparing the result of using either @noinline or @inline on _distributevals_halfperm!.

Demo:

A = sprand(600, 600, 0.01);
X = transpose(A);
using BenchmarkTools

With @inline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  166.98 KiB
  allocs estimate:  10685
  --------------
  minimum time:     921.938 μs (0.00% GC)
  median time:      936.064 μs (0.00% GC)
  mean time:        954.923 μs (0.40% GC)
  maximum time:     1.627 ms (38.60% GC)
  --------------
  samples:          1046
  evals/sample:     1

With @noinline on _distributevals_halfperm!:

julia> @benchmark halfperm!($X, $A, $(1:A.n), $(identity)) seconds=1
BenchmarkTools.Trial: 
  memory estimate:  64 bytes
  allocs estimate:  2
  --------------
  minimum time:     23.175 μs (0.00% GC)
  median time:      23.390 μs (0.00% GC)
  mean time:        23.658 μs (0.00% GC)
  maximum time:     52.727 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Inspection does not suggest an immediate reason for this 40x performance gap; profiling places all the blame at this line with the function evaluation. It made me wonder whether there is some problem inlining the function call.

However, the truly bizarre part is that, with @inline, the output of @code_llvm _distributevals_halfperm!(X, A, 1:A.n, identity) is, for all practical purposes I can see, identical to that of @code_llvm halfperm!(X, A, 1:A.n, identity) (aside from the obvious call to _computecolptrs_halfperm!). I am not at all good at reading assembly, but even there the differences do not seem dramatic to me (there are some constant differences in movq statements that might be problematic?).

This seems really puzzling. LLVM bug? Present at least on 0.6.0-rc3 and master.
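For anyone trying to reproduce the IR comparison offline, the two dumps can be written to files and diffed with an external tool. A minimal sketch, assuming a 0.6-era Julia where these functions live in Base.SparseArrays (paths and names as in the thread above):

```julia
# Dump the LLVM IR for both entry points to files so the two builds
# can be diffed outside the REPL.
A = sprand(600, 600, 0.01)
X = transpose(A)
argtypes = Tuple{typeof(X), typeof(A), UnitRange{Int}, typeof(identity)}

open("inner.ll", "w") do io
    code_llvm(io, Base.SparseArrays._distributevals_halfperm!, argtypes)
end
open("outer.ll", "w") do io
    code_llvm(io, Base.SparseArrays.halfperm!, argtypes)
end
```

Then something like diff inner.ll outer.ll makes any divergence (beyond the _computecolptrs_halfperm! call) easy to spot.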

@ararslan ararslan added the performance Must go faster label Jun 18, 2017
@A-Shah19

I'm trying to apply this @noinline annotation to the _distributevals_halfperm! function, as I also noticed a significant slowdown when transposing large sparse matrices. How would I go about doing this to see if it makes a difference in performance on my tasks? (Sorry if this comment is basic, but I'm confused about how to edit the source code for the sparse matrix library locally.) Thank you for the help.


timholy commented Jun 28, 2017

Do you have a git checkout of Julia, or did you download a binary? If you have the source, you just edit the file and type make in the top-level directory. But on 0.6 and higher you can also use eval(mod, quote <code> end) to redefine the function within a running Julia session. Here mod (the module) would presumably be Base.SparseArrays.

@A-Shah19

I believe I have the binary, as I installed with Homebrew on macOS, and I'm not sure whether building from a git checkout is supported on this OS. How would I do it for this version?


timholy commented Jun 29, 2017

Just try

eval(Base.SparseArrays, quote
<Your revised function definition goes here>
end)

You can copy/paste the original version into the <> to test it, and then start pasting modified versions in once you've verified it works.
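As a toy illustration of that redefinition pattern (Demo and f here are made-up names, not the actual sparse-matrix code):

```julia
# A throwaway module standing in for Base.SparseArrays.
module Demo
f(x) = x + 1
end

Demo.f(1)  # returns 2

# Redefine Demo.f from outside the module, without rebuilding anything,
# using the eval(mod, quote ... end) pattern described above.
eval(Demo, quote
    f(x) = x + 2
end)

Demo.f(1)  # now returns 3
```

The same mechanism works with Base.SparseArrays as the module, pasting the (possibly modified) function body inside the quote block.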


timholy commented Jun 29, 2017

But macOS is well supported; see https://github.com/JuliaLang/julia#source-download-and-compilation

@A-Shah19

You're right. I just upgraded to 0.6, so I'll try it with the eval method. My attempt at the git-checkout method failed when running make, possibly due to proxy issues, but regardless I think eval should do the trick.


gbaraldi commented Sep 1, 2023

Running the original test locally, I can't reproduce this. @timholy, can we close?
