
Check operations in @turbo automatically with can_avx; if failure, switch to @inbounds @fastmath #431

Merged
31 commits merged into JuliaSIMD:main on Sep 27, 2022

Conversation

@MilesCranmer (Contributor) commented Sep 18, 2022

This solves #430 and prevents the StackOverflowError from appearing in #232. cc @timholy @chriselrod. Thanks to @chriselrod for walking me through how to implement this with detailed instructions.

Basically, this checks every instruction in a @turbo loop with ArrayInterface.can_avx (when the safe=true kwarg is set; it is false by default). If can_avx returns false for any operation, the check fails and @inbounds @fastmath is used instead, along with a warning message unless warn_check_arg=false.

using LoopVectorization
using SpecialFunctions

x = Float32.(1:0.1:10)
y = similar(x)

@turbo safe=true for i in indices(x)
    y[i] = gamma(x[i])
end

Note that safe defaults to false, so safe=true needs to be passed explicitly here to enable the check.

Before this change, this code would result in a mysterious StackOverflowError, since gamma does not have an AVX version implemented.

As described in #430, this change is useful for cases where the user can pass an arbitrary operator: one still wants @turbo to work when that operator can be AVX'd, but to fall back to @inbounds @fastmath otherwise.
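
For illustration, here is roughly what the fallback amounts to when the check rejects gamma (a sketch of the intended semantics, not the literal macro expansion):

using SpecialFunctions

x = Float32.(1:0.1:10)
y = similar(x)

# When the check fails for gamma, the loop effectively runs as this plain
# @inbounds @fastmath version (plus a warning), instead of throwing:
@inbounds @fastmath for i in eachindex(x)
    y[i] = gamma(x[i])
end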


Minor additional changes: I refactored can_avx.jl, and included SpecialFunctions.gamma as an operator which cannot currently be AVX'd.

@MilesCranmer (Contributor, Author) commented Sep 18, 2022

I'm not sure whether can_avx is as general as one would like. There are some operators that @turbo handles fine, but for which can_avx fails. E.g., if I just write:

f(x) = exp(x)
can_avx(f)

this returns false. Is there a more general way to check this, @chriselrod? I suppose I should make safe=false the default anyway, just in case situations like this occur.
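
(As I understand it, the reason the wrapper fails is that can_avx is an opt-in trait: a generic fallback returns false, and only explicitly listed functions return true. A self-contained sketch of that pattern, using a stand-in my_can_avx rather than the real ArrayInterface method table:)

my_can_avx(::Any) = false          # generic fallback: unknown callables are assumed unsafe
my_can_avx(::typeof(exp)) = true   # explicit opt-in for a known SIMD-able function

f(x) = exp(x)
my_can_avx(exp)  # true
my_can_avx(f)    # false: the wrapper has no opt-in method, even though it only calls exp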

@chriselrod (Member) commented Sep 18, 2022

Is there a more general way to check this @chriselrod?

#430 (comment)
You could use promote_op as described there, checking the number of arguments used.
Getting the argument types would be harder, but you could assume Vec{2,Int} if you don't want to do anything fancy.
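
Concretely, the idea is (a minimal preview of the check, consistent with the full REPL session below, and assuming the same packages):

using SpecialFunctions, VectorizationBase, SLEEFPirates

Base.promote_op(exp, Vec{2,Int}) !== Union{}    # true: exp has Vec (SIMD) methods, so inference succeeds
Base.promote_op(gamma, Vec{2,Int}) !== Union{}  # false: gamma(::Vec) has no method, so inference gives Union{}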

codecov bot commented Sep 18, 2022

Codecov Report

Base: 86.35% // Head: 83.70% // Decreases project coverage by -2.65% ⚠️

Coverage data is based on head (4efdb90) compared to base (1238fc8).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   86.35%   83.70%   -2.66%     
==========================================
  Files          37       37              
  Lines        9310     9336      +26     
==========================================
- Hits         8040     7815     -225     
- Misses       1270     1521     +251     
Impacted Files Coverage Δ
src/codegen/lower_threads.jl 0.63% <ø> (-52.34%) ⬇️
src/reconstruct_loopset.jl 92.01% <ø> (-0.40%) ⬇️
src/broadcast.jl 89.25% <100.00%> (ø)
src/condense_loopset.jl 96.00% <100.00%> (+0.15%) ⬆️
src/constructors.jl 98.74% <100.00%> (+0.01%) ⬆️
src/modeling/graphs.jl 89.71% <100.00%> (+0.29%) ⬆️
src/modeling/costs.jl 53.05% <0.00%> (-0.94%) ⬇️
... and 1 more


@chriselrod (Member)
For example:

julia> using SpecialFunctions, VectorizationBase, SLEEFPirates

julia> can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} = Base.promote_op(f, ntuple(Returns(Vec{2,Int}), Val(NARGS))...) !== Union{}
can_turbo (generic function with 1 method)

julia> can_turbo(+, Val(1))
true

julia> can_turbo(exp, Val(1))
true

julia> f(x) = exp(x)
f (generic function with 1 method)

julia> can_turbo(f, Val(1))
true

julia> can_turbo(gamma, Val(1))
false

This isn't necessarily precise, since it relies on what inference can prove, but it should work well in practice.

@MilesCranmer (Contributor, Author)

What is the best way to get NARGS? Maybe length(op.dependencies)?

@chriselrod (Member)

What is the best way to get NARGS? Maybe length(op.dependencies)?

length(parents(op))
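
For context, here is a self-contained toy of how the per-operation check composes; the ops list and the way functions are listed are illustrative, not the actual LoopVectorization internals (in the real code the argument count comes from length(parents(op))):

using SpecialFunctions, VectorizationBase, SLEEFPirates

can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} =
    Base.promote_op(f, ntuple(Returns(Vec{2,Int}), Val(NARGS))...) !== Union{}

# One (function, argument-count) pair per operation in the loop; the loop is
# only considered @turbo-safe if every operation passes the check.
ops = [(exp, 1), (+, 2), (gamma, 1)]
all(can_turbo(f, Val(n)) for (f, n) in ops)  # false, because gamma fails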

@MilesCranmer (Contributor, Author)

Awesome, thanks! Everything is implemented, and I added a few tests too. Let me know what you think.

Also, let me know if you'd rather have safe=true by default. Right now it is false since I wasn't sure how general this technique is.

@MilesCranmer (Contributor, Author)

I'm not sure why the tests are failing - did I miss a , safe snippet somewhere?

(probably would be good to refactor away the shotgun surgery in the future, if possible)

@chriselrod (Member)

Okay, avx_config_val is a bit of a mess. It's defined in condense_loopset.jl and has a different length depending on whether or not it's used for broadcasting.

Currently it's dropping safe, so safe is missing in the broadcast files, which results in the indexing errors.

@MilesCranmer (Contributor, Author)

I tried to add warncheckarg and safe everywhere the packing/unpacking occurs, but I'm still not sure where the other errors are coming from. It's really hard to debug this message:

    [1] indexed_iterate(t::Tuple{Bool, Int8, Int8, Int8, Bool, Int64, Int64, Int64, Int64, UInt64}, i::Int64, state::Int64)
      @ Base ./tuple.jl:88
    [2] #s191#153
      @ ~/Documents/LoopVectorization.jl/src/broadcast.jl:551 [inlined]
    [3] var"#s191#153"(T::Any, N::Any, BC::Any, Mod::Any, UNROLL::Any, dontbc::Any, ::Any, dest::Any, bc::Any, #unused#::Type, #unused#::Type, #unused#::Any)
      @ LoopVectorization ./none:0
    [4] (::Core.GeneratedFunctionStub)(::Any, ::Vararg{Any})
      @ Core ./boot.jl:582
    [5] vmaterialize!(dest::OffsetVector{Float64, Vector{Float64}}, bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(identity), Tuple{OffsetVector{Float64, Vector{Float64}}}}, #unused#::Val{:Main}, #unused#::Val{(true, 0, 0, 0, true, 0, 16, 31, 128, 0x0000000000000001)})
      @ LoopVectorization ~/Documents/LoopVectorization.jl/src/broadcast.jl:666
    [6] macro expansion
      @ ~/Documents/LoopVectorization.jl/test/iteration_bound_tests.jl:19 [inlined]
    [7] macro expansion
      @ /Applications/Julia-1.8.app/Contents/Resources/julia/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]

Frame [6] in the traceback is just a call with @turbo, and [5] is vmaterialize; I have no idea where UNROLL actually gets packed...
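
(For what it's worth, the failure mode in that trace can be reproduced in miniature: destructuring the packed UNROLL tuple into more names than it has elements throws exactly this error from Base.indexed_iterate. The tuple below only mimics the shape seen in the trace.)

t = (true, 0, 0, 0, true, 0, 16, 31, 128, 0x0000000000000001)  # 10 elements, as in the trace
# a, b, c, d, e, f, g, h, i, j, k = t   # 11 targets: BoundsError from Base.indexed_iterate in tuple.jl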

@MilesCranmer (Contributor, Author)

Okay, I think I found them all. (I would definitely recommend making a dedicated type for UNROLL instead of packing/unpacking it by hand, maybe a NamedTuple; I think the current pattern will inevitably cause bugs.)
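
For example, a minimal sketch of that idea (the field names below are invented for illustration; the real UNROLL entries are LoopVectorization-specific):

unroll = (inline = true, u1 = 0, u2 = 0, v = 0, nthreads = 1, warncheckarg = 1, safe = false)

# Fields are accessed by name rather than by position, so adding a new field
# (like safe) can't silently shift every downstream destructuring:
unroll.safe                    # false
merge(unroll, (safe = true,))  # non-mutating update that returns a new NamedTuple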

Let me know how this PR looks.
Thanks,
Miles

(Review comment on test/safe_turbo.jl; marked resolved.)
@chriselrod (Member)

 shuffles load/stores     | 4442    257   4699  1m15.4s

These nightly failures happen on the main branch, too.

@chriselrod (Member)

Add SpecialFunctions to test/Project.toml.

@MilesCranmer (Contributor, Author)

Seeing this issue now:

  Test threw exception
  Expression: LoopVectorization.can_turbo(f2, Val(1))
  UndefVarError: Returns not defined
  Stacktrace:
   [1] can_turbo(f::var"#f2#2", #unused#::Val{1})
     @ LoopVectorization ~/work/LoopVectorization.jl/LoopVectorization.jl/src/condense_loopset.jl:913
   [2] macro expansion
     @ ~/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:21 [inlined]
   [3] macro expansion
     @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [4] top-level scope
     @ ~/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:4

I guess this is from the Returns(Vec{2,Int}) snippet? Not sure how to fix this

@MilesCranmer (Contributor, Author)

I'm traveling for the rest of this week unfortunately, so I won't be online much. I can retry this weekend though. Let me know if you have any clues as to why this breaks. Also, feel free to push to the branch (and potentially merge) as you wish.

@chriselrod (Member)

I guess this is from the Returns(Vec{2,Int}) snippet? Not sure how to fix this

Returns was new in Julia 1.7; it didn't exist in Julia 1.6.
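
One possible workaround (a sketch, not necessarily the exact change suggested below) is to avoid Returns entirely with an anonymous function, which also works on Julia 1.6:

using VectorizationBase: Vec

can_turbo(f::F, ::Val{NARGS}) where {F,NARGS} =
    Base.promote_op(f, ntuple(_ -> Vec{2,Int}, Val(NARGS))...) !== Union{}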

@chriselrod (Member) left a comment


This is an example workaround for Returns not existing yet.

(Suggested changes on src/condense_loopset.jl; marked resolved.)
MilesCranmer and others added 3 commits September 19, 2022 15:23
Co-authored-by: Chris Elrod <elrodc@gmail.com>
@chriselrod (Member)

Have you looked into

Safe @turbo: Error During Test at /home/runner/work/LoopVectorization.jl/LoopVectorization.jl/test/safe_turbo.jl:8
  Got exception outside of a @test
  UndefVarError: #f###6### not defined

?

@MilesCranmer (Contributor, Author)

I couldn't figure that one out; I was hoping you would have some idea. I tried defining the function both inside and outside of the test set, and ensured that each function has a unique name, but the error persists. I assume it's some weird test-syntax quirk.

chriselrod enabled auto-merge (squash) September 27, 2022 21:11
chriselrod merged commit e123cb2 into JuliaSIMD:main Sep 27, 2022
@MilesCranmer (Contributor, Author)

Nice!! Thanks

@chriselrod (Member)

Thanks for all the work on this!
