add an inductive range check elimination pass #42573

pchintalapudi · 2021-10-10T02:08:56Z

As stated in #42521, using @inbounds to mark regions where bounds checks are unnecessary is dangerous. LLVM provides an InductiveRangeCheckElimination pass, which can eliminate some bounds checks when the accesses are known to be safe.

Godbolt Links:

@noinline function f!(ys, xs)
    for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2*x
        end
    end
end

Original: https://godbolt.org/z/YY3WrYThh
With @inbounds: https://godbolt.org/z/cEW9GTsT3
Original+IRCE: https://godbolt.org/z/j1YndbzrE

The pass makes the control flow graph a little messier than @inbounds does, but it does allow vectorization that is not possible when bounds checks are present.
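For readers unfamiliar with the pass, the idea can be sketched by hand in Julia. This is only an illustration under stated assumptions, not output from the pass or code from this PR: LLVM's IRCE splits a loop's iteration space so that a main loop provably needs no range checks, while the remaining iterations keep them. The function names `sum_checked` and `sum_split` are hypothetical, and the manual `@inbounds` below merely stands in for a check the compiler has proven redundant.

```julia
# Hypothetical example: a loop whose upper bound `n` is unrelated to
# `length(xs)`, so every access needs a bounds check.
function sum_checked(xs::Vector{Float64}, n::Int)
    s = 0.0
    for i in 1:n
        s += xs[i]          # checked on every iteration
    end
    return s
end

# Hand-written sketch of the split form. `safe` is the last index that is
# provably in bounds, so the main loop needs no checks and can vectorize.
function sum_split(xs::Vector{Float64}, n::Int)
    s = 0.0
    safe = min(n, length(xs))
    for i in 1:safe         # main loop: indices 1..safe are always valid
        s += @inbounds xs[i]
    end
    for i in safe+1:n       # post loop: keeps the check, throws if reached
        s += xs[i]
    end
    return s
end
```

Both versions return the same result when `n <= length(xs)`, and both throw a `BoundsError` when `n` exceeds it; only the placement of the check differs.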


vchuravy · 2021-10-10T02:33:54Z

Nice! Could you time the passes during sysimage compilation? So that we can see how much additional time this takes.

vchuravy · 2021-10-10T02:41:39Z

@nanosoldier runtests(ALL, vs = ":master")

vchuravy · 2021-10-10T02:47:44Z

Can you post some benchmark numbers from your test-cases?

pchintalapudi · 2021-10-10T04:21:00Z

> Nice! Could you time the passes during sysimage compilation? So that we can see how much additional time this takes.

Core.Compiler: 45.6049 seconds
Base ───────────── 23.928286 seconds
ArgTools ───────── 4.344550 seconds
Artifacts ──────── 0.106307 seconds
Base64 ─────────── 0.106040 seconds
CRC32c ─────────── 0.007023 seconds
FileWatching ───── 0.098718 seconds
Libdl ──────────── 0.001533 seconds
Logging ────────── 0.036474 seconds
Mmap ───────────── 0.083878 seconds
NetworkOptions ─── 0.094496 seconds
SHA ────────────── 0.187798 seconds
Serialization ──── 0.288951 seconds
Sockets ────────── 0.335865 seconds
Unicode ────────── 0.007169 seconds
DelimitedFiles ─── 0.103579 seconds
LinearAlgebra ──── 7.549069 seconds
Markdown ───────── 0.824861 seconds
Printf ─────────── 0.267628 seconds
Random ─────────── 1.139277 seconds
Tar ────────────── 0.275640 seconds
Dates ──────────── 1.595620 seconds
Distributed ────── 0.787926 seconds
Future ─────────── 0.004703 seconds
InteractiveUtils ─ 0.526222 seconds
LibGit2 ────────── 1.342156 seconds
Profile ────────── 0.271113 seconds
SparseArrays ───── 2.652343 seconds
UUIDs ──────────── 0.015049 seconds
REPL ───────────── 3.619972 seconds
SharedArrays ───── 0.520581 seconds
Statistics ─────── 0.193729 seconds
SuiteSparse ────── 1.590027 seconds
TOML ───────────── 0.068915 seconds
Test ───────────── 0.296838 seconds
LibCURL ────────── 0.415338 seconds
Downloads ──────── 0.385399 seconds
Pkg ────────────── 4.512018 seconds
LazyArtifacts ──── 0.001587 seconds
Stdlibs total ──── 34.670334 seconds
Sysimage built. Summary:
Total ─────── 58.600201 seconds
Base: ─────── 23.928286 seconds 40.8331%
Stdlibs: ──── 34.670334 seconds 59.1642%

Generating REPL precompile statements... 31/31
Executing precompile statements... 1221/1253
Precompilation complete. Summary:
Total ─────── 100.895169 seconds
Generation ── 76.441509 seconds 75.7633%
Execution ─── 24.453660 seconds 24.2367%

pchintalapudi · 2021-10-10T04:51:29Z

> Can you post some benchmark numbers from your test-cases?

Test Case 1:

@noinline function f!(ys, xs)
    for i in eachindex(ys, xs)
        x = xs[i]
        if -0.5 < x < 0.5
            ys[i] = 2*x
        end
    end
end

Original:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 3 evaluations.
 Range (min … max):  7.703 μs … 274.071 μs  ┊ GC (min … max): 0.00% … 95.81%
 Time  (median):     8.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.682 μs ±   6.729 μs  ┊ GC (mean ± σ):  2.09% ±  2.66%

   ▁▆▇█▇▆▄▁    ▁▃▂▂▁                   ▁▁▁▁▁                  ▂
  ▅████████▇▆▇▇███████▇▆▅▄▃▃▃▃▁▁▅▄▆▆▇████████▇▆▆▄▅▄▃▄▃▄▄▄▁▄▅▃ █
  7.7 μs       Histogram: log(frequency) by time      13.9 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Original + IRCE:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.650 μs … 272.520 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     2.946 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.972 μs ±   7.472 μs  ┊ GC (mean ± σ):  9.37% ±  5.22%

  ▂▆██▇▄                                  ▁▂▃▃▃▂▁▁  ▁▂▃▃▃▂▁▁  ▂
  ███████▇▆▄▄▁▃▄▁▁▁▃▁▃▅▅▄▅▄▃▁▃▁▁▃▃▃▁▁▁▁▄▆███████████████████▇ █
  2.65 μs      Histogram: log(frequency) by time      7.64 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Original + inbounds:

julia> @benchmark f!(zeros(Float64, 1000), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.593 μs … 300.730 μs  ┊ GC (min … max):  0.00% … 97.46%
 Time  (median):     2.934 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.029 μs ±   8.744 μs  ┊ GC (mean ± σ):  11.03% ±  5.24%

  ▆▇▆██▆▂                                 ▁▂▂▃▃▃▂▁  ▁▂▂▃▃▃▂▂  ▂
  ███████▇▆▆▅▁▃▅▅▃▆▆▆▆▅▆▅▅▃▁▁▅▃▁▁▁▃▁▁▃▅▅▅▇███████████████████ █
  2.59 μs      Histogram: log(frequency) by time      7.73 μs <

 Memory estimate: 15.88 KiB, allocs estimate: 2.

Test Case 2:

function f2!(a, xs)
    xs = Base.Experimental.Const(xs)
    Base.Experimental.@aliasscope begin
        for i in eachindex(xs)
            a[1] += xs[i]
        end
    end
end

Original:

julia> @benchmark f2!(zeros(Float64, 1), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.666 μs … 299.508 μs  ┊ GC (min … max): 0.00% … 97.83%
 Time  (median):     6.269 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.914 μs ±   7.616 μs  ┊ GC (mean ± σ):  3.72% ±  3.04%

   █▂
  ▇██▅▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▂▂▂▂▃▄▆▇▇▆▄▃▂▂▂▂▂▂▂▃▃▃▃▃▃▂▂ ▃
  4.67 μs         Histogram: frequency by time        7.23 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.

Original + IRCE:

julia> @benchmark f2!(zeros(Float64, 1), rand(Float64, 1000))
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.306 μs … 237.869 μs  ┊ GC (min … max): 0.00% … 98.01%
 Time  (median):     2.399 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.922 μs ±   4.680 μs  ┊ GC (mean ± σ):  4.97% ±  3.55%

  ▅█▇▄▁                              ▂▃▃▃▂▂▁           ▁▁▁    ▂
  ██████▇▆▄▅▃▁▁▅▅▅▄▄▃▁▃▃▃▁▃▁▁▁▁▁▁▁▃▆██████████▇▆▆▃▅▃▅▆██████▇ █
  2.31 μs      Histogram: log(frequency) by time      5.34 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.

Original + inbounds:

BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.330 μs … 253.731 μs  ┊ GC (min … max): 0.00% … 98.23%
 Time  (median):     2.448 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.927 μs ±   4.951 μs  ┊ GC (mean ± σ):  5.70% ±  3.60%

  ▅▇██▄                                  ▂▃▃▃▃▂▁▁▁  ▁▁▁▂▁ ▁   ▂
  ██████▇▇▆▆▆▅▅▃▃▁▁▄▁▁▄▄▅▅▄▁▄▃▄▁▁▄▁▁▃▃▁▆████████████████████▇ █
  2.33 μs      Histogram: log(frequency) by time      4.75 μs <

 Memory estimate: 8.00 KiB, allocs estimate: 2.

nanosoldier · 2021-10-10T10:47:48Z

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

vtjnash · 2021-10-12T17:57:37Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2021-10-13T00:22:37Z

Something went wrong when running your job:

NanosoldierError: error when preparing/pushing to report repo: failed process: Process(setenv(`git push`; dir="/run/media/system/data/nanosoldier/workdir/NanosoldierReports"), ProcessExited(1)) [1]

Unfortunately, the logs could not be uploaded.
cc @christopher-dG

vchuravy · 2021-10-13T18:51:49Z

@vtjnash can you upload the logs? Also, maybe we should remove @christopher-dG from the ping list.

vtjnash · 2021-10-13T20:36:55Z

https://github.com/JuliaCI/NanosoldierReports/tree/master/benchmark/by_hash/125170e_vs_1389c2f/report.md

vtjnash · 2021-10-13T20:57:51Z

Is this related? ["misc", "allocation elision view", "no conditional"] 2.14 (5%) ❌ 1.00 (1%)

Otherwise, all the others seem to be noise.

add an inductive range check elimination pass

ebef173

vchuravy reviewed Oct 10, 2021

src/debuginfo.cpp (outdated, resolved)

vchuravy added the compiler:codegen Generation of LLVM IR and native code label Oct 10, 2021

Remove debuginfo change

e4dc08b

vtjnash added the needs nanosoldier run This PR should have benchmarks run on it label Oct 12, 2021

vtjnash merged commit 8c47de5 into JuliaLang:master Oct 14, 2021

maleadt mentioned this pull request Oct 15, 2021

Inductive range check elimination pass JuliaGPU/GPUCompiler.jl#256

Closed

vchuravy mentioned this pull request Oct 24, 2021

make counting more robust to input datatype JuliaStats/StatsBase.jl#722

Merged

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022

add the LLVM inductive range check elimination pass (JuliaLang#42573)

9d894c9

pchintalapudi deleted the pc/boundscheck branch March 6, 2022 21:57

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022

add the LLVM inductive range check elimination pass (JuliaLang#42573)

3ea7970

vchuravy mentioned this pull request Jul 4, 2022

Incorrect @inbounds annotation in Base.last can result in out-of-bounds memory accesses #41267

Closed
