Use LoopVectorization.@turbo for evaluation loops #132

Closed
wants to merge 7 commits into from


MilesCranmer
Owner

This switches most of the inner evaluation loops from @inbounds @simd to @turbo. This gives a 30% speedup, which is substantial.
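As a minimal sketch of the kind of change described above (not the actual code from this PR), the two loop styles can be compared side by side. The function names `binary_simd!` and `binary_turbo!` and the argument layout are hypothetical, chosen only for illustration; the sketch assumes LoopVectorization.jl is installed and that `op` is an operator `@turbo` is able to vectorize.

```julia
using LoopVectorization: @turbo

# Hypothetical helper: check that all buffers have the same length,
# since both loops below skip bounds checks.
function check_lengths(out, l, r)
    length(out) == length(l) == length(r) ||
        throw(DimensionMismatch("out, l, and r must have equal length"))
    return nothing
end

# Baseline: plain Julia loop with @inbounds @simd.
function binary_simd!(out, op::F, l, r) where {F}
    check_lengths(out, l, r)
    @inbounds @simd for j in eachindex(l, r)
        out[j] = op(l[j], r[j])
    end
    return out
end

# Same loop body, vectorized by LoopVectorization's @turbo macro.
function binary_turbo!(out, op::F, l, r) where {F}
    check_lengths(out, l, r)
    @turbo for j in eachindex(l, r)
        out[j] = op(l[j], r[j])
    end
    return out
end

l = rand(Float32, 1_000)
r = rand(Float32, 1_000)
a = binary_simd!(similar(l), *, l, r)
b = binary_turbo!(similar(l), *, l, r)
```

Both functions should produce matching results for simple arithmetic operators. Whether an arbitrary user-defined `op` works inside `@turbo` is exactly the open question raised in this thread.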

However, the Float16 tests currently fail with a StackOverflowError. I'm also not sure whether LoopVectorization is safe to use here: ideally I want to allow any user-defined operator, so would this mean that certain passed operators fail? This also needs to be tested on a distributed system.

@MilesCranmer
Owner Author

Okay, it looks like LoopVectorization.jl doesn't support SpecialFunctions yet, and raises a StackOverflowError if any of them is used as an operator: JuliaSIMD/LoopVectorization.jl#233. I'm only seeing it in Float16 because that's the first test, and it tries a gamma function as an operator.

So this change doesn't seem possible at the moment, unless @turbo can fall back to non-SIMD operations to avoid the StackOverflowError.

@MilesCranmer
Owner Author

Raised this as an issue: JuliaSIMD/LoopVectorization.jl#430

@MilesCranmer
Owner Author

We probably also want to switch to using similar throughout, rather than Array{T,1}(undef, n). This is needed independently of this change, so that the code generalizes over the input array type.
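To illustrate the point about `similar`, here is a small sketch using only Base Julia. `TaggedVector`, `alloc_fixed`, and `alloc_similar` are hypothetical names made up for this example; the wrapper type stands in for any non-`Array` input (for instance a GPU or static array type).

```julia
# Hypothetical array wrapper, standing in for any non-Array input type.
struct TaggedVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::TaggedVector) = size(v.data)
Base.getindex(v::TaggedVector, i::Int) = v.data[i]
Base.setindex!(v::TaggedVector, val, i::Int) = (v.data[i] = val)
# Make `similar` return the same wrapper type for 1-D outputs.
function Base.similar(::TaggedVector, ::Type{T}, dims::Dims{1}) where {T}
    return TaggedVector(Vector{T}(undef, dims))
end

# Always allocates a plain Vector, whatever the input type is.
alloc_fixed(x::AbstractVector{T}) where {T} = Array{T,1}(undef, length(x))

# Lets the input's array type decide what the buffer looks like.
alloc_similar(x::AbstractVector) = similar(x)

x = TaggedVector(Float32[1, 2, 3])
buf_fixed = alloc_fixed(x)
buf_similar = alloc_similar(x)
```

Here `buf_fixed` is a plain `Vector{Float32}`, while `buf_similar` keeps the input's wrapper type, which is the generality the comment above is after.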

@MilesCranmer MilesCranmer reopened this Sep 29, 2022
@MilesCranmer
Owner Author

Now that JuliaSIMD/LoopVectorization.jl#431 has been merged, this should work. Let's see how stable things are.

@MilesCranmer MilesCranmer deleted the fast-loop-vectorization branch January 1, 2024 07:14