add an inductive range check elimination pass #42573

Merged
merged 2 commits into from Oct 14, 2021

@pchintalapudi
Member

pchintalapudi commented Oct 10, 2021

As stated in #42521, using @inbounds to mark regions where bounds checks are unnecessary is dangerous. LLVM provides an InductiveRangeCheckElimination (IRCE) pass that can remove some bounds checks from loops when it can show they never fail.

Godbolt Links:

@noinline function f!(ys, xs)
    for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2*x
        end
    end
end

Original: https://godbolt.org/z/YY3WrYThh
With @inbounds: https://godbolt.org/z/cEW9GTsT3
Original+IRCE: https://godbolt.org/z/j1YndbzrE
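
For anyone who wants to look at the generated code locally rather than on Godbolt, here is a minimal sketch using `code_llvm` from the `InteractiveUtils` standard library. The regular expression is only a rough heuristic for spotting vector instructions, and the result depends on the Julia build and the host CPU, so treat it as an illustration rather than a reliable check.

```julia
using InteractiveUtils  # provides code_llvm; loaded automatically in the REPL

@noinline function f!(ys, xs)
    for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2*x
        end
    end
end

# Capture the optimized LLVM IR for the Vector{Float64} specialization as a string.
ir = sprint(code_llvm, f!, Tuple{Vector{Float64}, Vector{Float64}})

# Heuristic: vector types such as "<4 x double>" in the IR suggest the loop was vectorized.
has_vector_ops = occursin(r"<\d+ x double>", ir)
println("vector instructions found: ", has_vector_ops)
```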

The pass makes the control flow graph a little messier than @inbounds does, but it enables vectorization that the bounds checks would otherwise prevent.
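
As a rough illustration of the idea behind range check elimination, the sketch below splits a loop by hand into a main loop whose indices are valid by construction and a remainder loop that keeps its checks. This is not the code the LLVM pass emits; `scale_checked!` and `scale_split!` are hypothetical names used only for this example, and the manual `@inbounds` stands in for the checks the pass would drop on its own.

```julia
# Reference version: every array access is bounds checked.
function scale_checked!(ys::Vector{Float64}, xs::Vector{Float64}, n::Int)
    for i in 1:n
        ys[i] = 2 * xs[i]
    end
    return ys
end

# Hand-split version (illustration only, not the pass output).
function scale_split!(ys::Vector{Float64}, xs::Vector{Float64}, n::Int)
    # Largest index that is known to be valid for both vectors.
    safe = min(n, length(ys), length(xs))
    # Main loop: indices 1:safe are in bounds by construction, so no checks are needed.
    @inbounds for i in 1:safe
        ys[i] = 2 * xs[i]
    end
    # Remainder loop: keeps its checks, so an out-of-range n still throws BoundsError.
    for i in (safe + 1):n
        ys[i] = 2 * xs[i]
    end
    return ys
end

xs = [0.1, 0.2, 0.3, 0.4]
println(scale_split!(zeros(4), xs, 4) == scale_checked!(zeros(4), xs, 4))
```

The check-free main loop is the part a vectorizer can work with, while the remainder loop preserves the original error behavior.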

@vchuravy
Sponsor Member

Nice! Could you time the passes during sysimage compilation? So that we can see how much additional time this takes.

@vchuravy
Sponsor Member

@nanosoldier runtests(ALL, vs = ":master")

@vchuravy
Sponsor Member

Can you post some benchmark numbers from your test-cases?

@vchuravy added the compiler:codegen label (Generation of LLVM IR and native code) Oct 10, 2021
@pchintalapudi
Member Author

> Nice! Could you time the passes during sysimage compilation? So that we can see how much additional time this takes.

Core.Compiler: 45.6049 seconds
Base ───────────── 23.928286 seconds
ArgTools ───────── 4.344550 seconds
Artifacts ──────── 0.106307 seconds
Base64 ─────────── 0.106040 seconds
CRC32c ─────────── 0.007023 seconds
FileWatching ───── 0.098718 seconds
Libdl ──────────── 0.001533 seconds
Logging ────────── 0.036474 seconds
Mmap ───────────── 0.083878 seconds
NetworkOptions ─── 0.094496 seconds
SHA ────────────── 0.187798 seconds
Serialization ──── 0.288951 seconds
Sockets ────────── 0.335865 seconds
Unicode ────────── 0.007169 seconds
DelimitedFiles ─── 0.103579 seconds
LinearAlgebra ──── 7.549069 seconds
Markdown ───────── 0.824861 seconds
Printf ─────────── 0.267628 seconds
Random ─────────── 1.139277 seconds
Tar ────────────── 0.275640 seconds
Dates ──────────── 1.595620 seconds
Distributed ────── 0.787926 seconds
Future ─────────── 0.004703 seconds
InteractiveUtils ─ 0.526222 seconds
LibGit2 ────────── 1.342156 seconds
Profile ────────── 0.271113 seconds
SparseArrays ───── 2.652343 seconds
UUIDs ──────────── 0.015049 seconds
REPL ───────────── 3.619972 seconds
SharedArrays ───── 0.520581 seconds
Statistics ─────── 0.193729 seconds
SuiteSparse ────── 1.590027 seconds
TOML ───────────── 0.068915 seconds
Test ───────────── 0.296838 seconds
LibCURL ────────── 0.415338 seconds
Downloads ──────── 0.385399 seconds
Pkg ────────────── 4.512018 seconds
LazyArtifacts ──── 0.001587 seconds
Stdlibs total ──── 34.670334 seconds
Sysimage built. Summary:
Total ─────── 58.600201 seconds
Base: ─────── 23.928286 seconds 40.8331%
Stdlibs: ──── 34.670334 seconds 59.1642%

Generating REPL precompile statements... 31/31
Executing precompile statements... 1221/1253
Precompilation complete. Summary:
Total ─────── 100.895169 seconds
Generation ── 76.441509 seconds 75.7633%
Execution ─── 24.453660 seconds 24.2367%

@pchintalapudi
Member Author

> Can you post some benchmark numbers from your test-cases?

Test Case 1:

@noinline function f!(ys, xs)
    for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2*x
        end
    end
end

Original:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  7.703 μs … 274.071 μs  ┊ GC (min … max): 0.00% … 95.81%
 Time  (median):     8.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.682 μs ±   6.729 μs  ┊ GC (mean ± σ):  2.09% ±  2.66%

   ▁▆▇█▇▆▄▁    ▁▃▂▂▁                   ▁▁▁▁▁                  ▂
  ▅████████▇▆▇▇███████▇▆▅▄▃▃▃▃▁▁▅▄▆▆▇████████▇▆▆▄▅▄▃▄▃▄▄▄▁▄▅▃ █
  7.7 μs       Histogram: log(frequency) by time      13.9 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Original + IRCE:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.650 μs … 272.520 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     2.946 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.972 μs ±   7.472 μs  ┊ GC (mean ± σ):  9.37% ±  5.22%

  ▂▆██▇▄                                  ▁▂▃▃▃▂▁▁  ▁▂▃▃▃▂▁▁  ▂
  ███████▇▆▄▄▁▃▄▁▁▁▃▁▃▅▅▄▅▄▃▁▃▁▁▃▃▃▁▁▁▁▄▆███████████████████▇ █
  2.65 μs      Histogram: log(frequency) by time      7.64 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Original + @inbounds:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.593 μs … 300.730 μs  ┊ GC (min … max):  0.00% … 97.46%
 Time  (median):     2.934 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.029 μs ±   8.744 μs  ┊ GC (mean ± σ):  11.03% ±  5.24%

  ▆▇▆██▆▂                                 ▁▂▂▃▃▃▂▁  ▁▂▂▃▃▃▂▂  ▂
  ███████▇▆▆▅▁▃▅▅▃▆▆▆▆▅▆▅▅▃▁▁▅▃▁▁▁▃▁▁▃▅▅▅▇███████████████████ █
  2.59 μs      Histogram: log(frequency) by time      7.73 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Test Case 2:

function f2!(a, xs)
    xs = Base.Experimental.Const(xs)
    Base.Experimental.@aliasscope begin
        for i in eachindex(xs)
            a[1] += xs[i]
        end
    end
end

Original:

julia> @benchmark f2!(zeros(Float64, 1), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.666 μs … 299.508 μs  ┊ GC (min … max): 0.00% … 97.83%
 Time  (median):     6.269 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.914 μs ±   7.616 μs  ┊ GC (mean ± σ):  3.72% ±  3.04%

   █▂
  ▇██▅▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▂▂▂▂▃▄▆▇▇▆▄▃▂▂▂▂▂▂▂▃▃▃▃▃▃▂▂ ▃
  4.67 μs         Histogram: frequency by time        7.23 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.

Original + IRCE:

julia> @benchmark f2!(zeros(Float64, 1), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.306 μs … 237.869 μs  ┊ GC (min … max): 0.00% … 98.01%
 Time  (median):     2.399 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.922 μs ±   4.680 μs  ┊ GC (mean ± σ):  4.97% ±  3.55%

  ▅█▇▄▁                              ▂▃▃▃▂▂▁           ▁▁▁    ▂
  ██████▇▆▄▅▃▁▁▅▅▅▄▄▃▁▃▃▃▁▃▁▁▁▁▁▁▁▃▆██████████▇▆▆▃▅▃▅▆██████▇ █
  2.31 μs      Histogram: log(frequency) by time      5.34 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.

Original + @inbounds:

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.330 μs … 253.731 μs  ┊ GC (min … max): 0.00% … 98.23%
 Time  (median):     2.448 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.927 μs ±   4.951 μs  ┊ GC (mean ± σ):  5.70% ±  3.60%

  ▅▇██▄                                  ▂▃▃▃▃▂▁▁▁  ▁▁▁▂▁ ▁   ▂
  ██████▇▇▆▆▆▅▅▃▃▁▁▄▁▁▄▄▅▅▄▁▄▃▄▁▁▄▁▁▃▃▁▆████████████████████▇ █
  2.33 μs      Histogram: log(frequency) by time      4.75 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.
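
To put the numbers above side by side, here is a small sketch that computes the median speedups relative to the original build. The medians are copied from the benchmark output in this comment; `speedup` is a helper introduced only for this example. Each benchmark expression also allocates its input arrays (see the memory estimates), so these ratios probably understate the speedup of the loops themselves.

```julia
# Median times in μs, copied from the benchmark output above.
medians = (
    case1 = (original = 8.200, irce = 2.946, inbounds = 2.934),
    case2 = (original = 6.269, irce = 2.399, inbounds = 2.448),
)

# Speedup of a variant relative to the original build, rounded to two decimals.
speedup(base, t) = round(base / t; digits = 2)

for (name, m) in pairs(medians)
    println(name, ": IRCE ", speedup(m.original, m.irce), "x, @inbounds ",
            speedup(m.original, m.inbounds), "x")
end
```

In both test cases the IRCE build comes out within a few percent of the @inbounds build.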

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@vtjnash added the needs nanosoldier run label (This PR should have benchmarks run on it) Oct 12, 2021
@vtjnash
Sponsor Member

vtjnash commented Oct 12, 2021

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Something went wrong when running your job:

NanosoldierError: error when preparing/pushing to report repo: failed process: Process(setenv(`git push`; dir="/run/media/system/data/nanosoldier/workdir/NanosoldierReports"), ProcessExited(1)) [1]

Unfortunately, the logs could not be uploaded.
cc @christopher-dG

@vchuravy
Sponsor Member

@vtjnash can you upload the logs? Also, maybe we should remove @christopher-dG from the ping list.

@vtjnash
Sponsor Member

vtjnash commented Oct 13, 2021

@vtjnash
Sponsor Member

vtjnash commented Oct 13, 2021

Is this related: ["misc", "allocation elision view", "no conditional"] 2.14 (5%) ❌ 1.00 (1%)?

Otherwise, all the others seem to be noise.
