
Check operations in @turbo automatically with can_avx; if failure, switch to @inbounds @fastmath #431

Merged
31 commits merged into JuliaSIMD:main on Sep 27, 2022

Conversation

@MilesCranmer (Contributor) commented Sep 18, 2022

This solves #430 and prevents the StackOverflowError from appearing in #232. cc @timholy @chriselrod. Thanks to @chriselrod for walking me through how to implement this with detailed instructions.

Basically, this checks every instruction in a @turbo loop with ArrayInterface.can_avx (when the safe=true kwarg is set; it is false by default). If can_avx returns false for any operation, the check fails and @inbounds @fastmath is used instead, along with a warning message unless warn_check_arg=false.

using LoopVectorization
using SpecialFunctions

x = Float32.(1:0.1:10)
y = similar(x)

@turbo safe=true for i in indices(x)
    y[i] = gamma(x[i])
end

Note that safe defaults to false, so safe=true needs to be passed explicitly here to enable the check.

Before this change, this code would result in a mysterious StackOverflowError, since gamma does not have an AVX version implemented.

As described in #430, this change is useful for cases where the user can pass an arbitrary operator: one still wants @turbo to work when that operator can be AVX'd, but to fall back to @inbounds @fastmath otherwise.
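
For illustration, here is roughly what the fallback amounts to when the check rejects gamma (a sketch of the intended semantics, not the literal macro expansion):

using SpecialFunctions

x = Float32.(1:0.1:10)
y = similar(x)

# When the check fails for gamma, the loop effectively runs as this plain
# @inbounds @fastmath version (plus a warning), instead of throwing:
@inbounds @fastmath for i in eachindex(x)
    y[i] = gamma(x[i])
end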


Minor additional changes: I refactored can_avx.jl, and included SpecialFunctions.gamma as an operator which cannot currently be AVX'd.

@MilesCranmer (Contributor, Author) commented Sep 18, 2022

I'm not sure whether can_avx is as general as one would like. There are some operators that @turbo handles fine, but for which can_avx fails. E.g., if I just write:

f(x) = exp(x)
can_avx(f)

this returns false. Is there a more general way to check this, @chriselrod? I suppose I should make safe=false the default anyway, just in case situations like this occur.
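
(As I understand it, the reason the wrapper fails is that can_avx is an opt-in trait: a generic fallback returns false, and only explicitly listed functions return true. A self-contained sketch of that pattern, using a stand-in my_can_avx rather than the real ArrayInterface method table:)

my_can_avx(::Any) = false          # generic fallback: unknown callables are assumed unsafe
my_can_avx(::typeof(exp)) = true   # explicit opt-in for a known SIMD-able function

f(x) = exp(x)
my_can_avx(exp)  # true
my_can_avx(f)    # false: the wrapper has no opt-in method, even though it only calls exp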

@chriselrod (Member) commented Sep 18, 2022

Is there a more general way to check this @chriselrod?

#430 (comment)
You could use promote_op as described there, checking the number of arguments used.
Getting the argument types would be harder, but you could assume Vec{2,Int} if you don't want to do anything fancy.
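
Concretely, the idea is (a minimal preview of the check, consistent with the full REPL session below, and assuming the same packages):

using SpecialFunctions, VectorizationBase, SLEEFPirates

Base.promote_op(exp, Vec{2,Int}) !== Union{}    # true: exp has Vec (SIMD) methods, so inference succeeds
Base.promote_op(gamma, Vec{2,Int}) !== Union{}  # false: gamma(::Vec) has no method, so inference gives Union{}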

codecov bot commented Sep 18, 2022

Codecov Report

Base: 86.35% // Head: 83.70% // Decreases project coverage by -2.65% ⚠️

Coverage data is based on head (4efdb90) compared to base (1238fc8).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   86.35%   83.70%   -2.66%     
==========================================
  Files          37       37              
  Lines        9310     9336      +26     
==========================================
- Hits         8040     7815     -225     
- Misses       1270     1521     +251     
Impacted Files Coverage Δ
src/codegen/lower_threads.jl 0.63% <ø> (-52.34%) ⬇️
src/reconstruct_loopset.jl 92.01% <ø> (-0.40%) ⬇️
src/broadcast.jl 89.25% <100.00%> (ø)
src/condense_loopset.jl 96.00% <100.00%> (+0.15%) ⬆️
src/constructors.jl 98.74% <100.00%> (+0.01%) ⬆️
src/modeling/graphs.jl 89.71% <100.00%> (+0.29%) ⬆️
src/modeling/costs.jl 53.05% <0.00%> (-0.94%) ⬇️
... and 1 more


@chriselrod (Member)
For example:

julia> using SpecialFunctions, VectorizationBase, SLEEFPirates

julia> can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} = Base.promote_op(f, ntuple(Returns(Vec{2,Int}), Val(NARGS))...) !== Union{}
can_turbo (generic function with 1 method)

julia> can_turbo(+, Val(1))
true

julia> can_turbo(exp, Val(1))
true

julia> f(x) = exp(x)
f (generic function with 1 method)

julia> can_turbo(f, Val(1))
true

julia> can_turbo(gamma, Val(1))
false

This isn't necessarily precise, since it relies on what inference can prove, but it should work well in practice.

@MilesCranmer (Contributor, Author)

What is the best way to get NARGS? Maybe length(op.dependencies)?

@chriselrod (Member)

What is the best way to get NARGS? Maybe length(op.dependencies)?

length(parents(op))
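
For context, here is a self-contained toy of how the per-operation check composes; the ops list and the way functions are listed are illustrative, not the actual LoopVectorization internals (in the real code the argument count comes from length(parents(op))):

using SpecialFunctions, VectorizationBase, SLEEFPirates

can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} =
    Base.promote_op(f, ntuple(Returns(Vec{2,Int}), Val(NARGS))...) !== Union{}

# One (function, argument-count) pair per operation in the loop; the loop is
# only considered @turbo-safe if every operation passes the check.
ops = [(exp, 1), (+, 2), (gamma, 1)]
all(can_turbo(f, Val(n)) for (f, n) in ops)  # false, because gamma fails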

@MilesCranmer (Contributor, Author)

Awesome, thanks! Everything is implemented, and I added a few tests too. Let me know what you think.

Also, let me know if you'd rather have safe=true by default. Right now it is false since I wasn't sure how general this technique is.

@MilesCranmer (Contributor, Author)

I'm not sure why the tests are failing - did I miss a , safe snippet somewhere?

(probably would be good to refactor away the shotgun surgery in the future, if possible)

@chriselrod (Member)

Okay, avx_config_val is a bit of a mess. It's defined in condense_loopset.jl and has a different length depending on whether or not it's used for broadcasting.

Currently it's dropping safe, so safe is missing in the broadcast files, which results in the indexing errors.

@MilesCranmer (Contributor, Author)

I tried to add warncheckarg and safe everywhere the packing/unpacking occurs, but I'm still not sure where the other errors are coming from. It's really hard to debug this message:

    [1] indexed_iterate(t::Tuple{Bool, Int8, Int8, Int8, Bool, Int64, Int64, Int64, Int64, UInt64}, i::Int64, state::Int64)
      @ Base ./tuple.jl:88
    [2] #s191#153
      @ ~/Documents/LoopVectorization.jl/src/broadcast.jl:551 [inlined]
    [3] var"#s191#153"(T::Any, N::Any, BC::Any, Mod::Any, UNROLL::Any, dontbc::Any, ::Any, dest::Any, bc::Any, #unused#::Type, #unused#::Type, #unused#::Any)
      @ LoopVectorization ./none:0
    [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
      @ Core ./boot.jl:582
    [5] vmaterialize!(dest::OffsetVector{Float64, Vector{Float64}}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(identity), Tuple{OffsetVector{Float64, Vector{Float64}}}}, #unused#::Val{:Main}, #unused#::Val{(true, 0, 0, 0, true, 0, 16, 31, 128, 0x0000000000000001)})
      @ LoopVectorization ~/Documents/LoopVectorization.jl/src/broadcast.jl:666
    [6] macro expansion
      @ ~/Documents/LoopVectorization.jl/test/iteration_bound_tests.jl:19 [inlined]
    [7] macro expansion
      @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]

Frame [6] in the traceback is just a call with @turbo, and [5] is vmaterialize; I have no idea where UNROLL actually gets packed...
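
(For what it's worth, the failure mode in that trace can be reproduced in miniature: destructuring the packed UNROLL tuple into more names than it has elements throws exactly this error from Base.indexed_iterate. The tuple below only mimics the shape seen in the trace.)

t = (true, 0, 0, 0, true, 0, 16, 31, 128, 0x0000000000000001)  # 10 elements, as in the trace
# a, b, c, d, e, f, g, h, i, j, k = t   # 11 targets: BoundsError from Base.indexed_iterate in tuple.jl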

@MilesCranmer (Contributor, Author)

Okay, I think I found them all. (I would definitely recommend making a dedicated type for UNROLL instead of packing/unpacking it by hand, maybe a NamedTuple; I think the current pattern will inevitably cause bugs.)
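
For example, a minimal sketch of that idea (the field names below are invented for illustration; the real UNROLL entries are LoopVectorization-specific):

unroll = (inline = true, u1 = 0, u2 = 0, v = 0, nthreads = 1, warncheckarg = 1, safe = false)

# Fields are accessed by name rather than by position, so adding a new field
# (like safe) can't silently shift every downstream destructuring:
unroll.safe                    # false
merge(unroll, (safe = true,))  # non-mutating update that returns a new NamedTuple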

Let me know how this PR looks.
Thanks,
Miles

(Review comment on test/safe_turbo.jl; marked resolved.)
@chriselrod (Member)

 shuffles load/stores     | 4442    257   4699  1m15.4s

These nightly failures happen on the main branch, too.

@chriselrod (Member)

Add SpecialFunctions to test/Project.toml.

@MilesCranmer (Contributor, Author)

Seeing this issue now:

  Test threw exception
  Expression: LoopVectorization.can_turbo(f2, Val(1))
  UndefVarError: Returns not defined
  Stacktrace:
   [1] can_turbo(f::var"#f2#2", #unused#::Val{1})
     @ LoopVectorization ~/work/LoopVectorization.jl/LoopVectorization.jl/src/condense_loopset.jl:913
   [2] macro expansion
     @ ~/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:21 [inlined]
   [3] macro expansion
     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [4] top-level scope
     @ ~/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:4

I guess this is from the Returns(Vec{2,Int}) snippet? Not sure how to fix this

@MilesCranmer (Contributor, Author)

I'm traveling for the rest of this week unfortunately, so I won't be online much. I can retry this weekend though. Let me know if you have any clues as to why this breaks. Also, feel free to push to the branch (and potentially merge) as you wish.

@chriselrod (Member)

I guess this is from the Returns(Vec{2,Int}) snippet? Not sure how to fix this

Returns was new in Julia 1.7; it didn't exist in Julia 1.6.
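
One possible workaround (a sketch, not necessarily the exact change suggested below) is to avoid Returns entirely with an anonymous function, which also works on Julia 1.6:

using VectorizationBase: Vec

can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} =
    Base.promote_op(f, ntuple(_ -> Vec{2,Int}, Val(NARGS))...) !== Union{}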

@chriselrod (Member) left a comment


This is an example workaround for Returns not existing yet.

(Suggested changes on src/condense_loopset.jl; marked resolved.)
MilesCranmer and others added 3 commits September 19, 2022 15:23
Co-authored-by: Chris Elrod <elrodc@gmail.com>
@chriselrod (Member)

Have you looked into

Safe @turbo: Error During Test at /home/runner/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:8
  Got exception outside of a @test
  UndefVarError: #f###6### not defined

?

@MilesCranmer (Contributor, Author)

I couldn't figure that one out; I was hoping you would have some idea. I tried defining the function both inside and outside of the test set, and ensured that each function has a unique name, but the error persists. I assume it's some weird test-syntax quirk.

chriselrod enabled auto-merge (squash) September 27, 2022 21:11
chriselrod merged commit e123cb2 into JuliaSIMD:main Sep 27, 2022
@MilesCranmer (Contributor, Author)

Nice!! Thanks

@chriselrod (Member)

Thanks for all the work on this!
