Default to hardware floating-point atomics. #604

Merged: 2 commits, Feb 27, 2024

pxl-th (Collaborator) commented Feb 27, 2024

Default to 'unsafe' hardware floating-point atomics.
TL;DR: instead of emulating them via a CAS (compare-and-swap) loop, use the hardware RMW (read-modify-write) instruction, which is significantly faster.
More details: link.
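
For readers unfamiliar with the distinction, here is a minimal CPU-side sketch of the two strategies using Julia's `Base.Threads` atomics. It is only an analogue of what the GPU does, not the AMDGPU.jl implementation, and the helper names `cas_add!` and `count_up` are hypothetical:

```julia
using Base.Threads: Atomic, atomic_add!, atomic_cas!, @threads

# Emulated add: retry compare-and-swap until no other thread interfered.
function cas_add!(a::Atomic{Float32}, v::Float32)
    old = a[]
    while true
        seen = atomic_cas!(a, old, old + v)  # returns the value found in `a`
        seen === old && return old           # bitwise match: our swap went through
        old = seen                           # lost the race, retry with the fresh value
    end
end

# Add 1f0 to a shared accumulator `n` times using the given atomic add.
function count_up(add!, n)
    acc = Atomic{Float32}(0f0)
    @threads for _ in 1:n
        add!(acc, 1f0)
    end
    return acc[]
end

emulated = count_up(cas_add!, 1000)     # CAS-loop emulation
direct   = count_up(atomic_add!, 1000)  # single read-modify-write operation
```

Both variants produce the same sum. The difference is that the CAS loop may have to retry under contention, while the direct add is issued as one atomic operation.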

E.g. assembly atomic instruction before & after this PR for the following kernel:

function ker!(x)
    @inline @atomic x[1] += 1f0
    return
end
  • Before: global_atomic_cmpswap_b32 v0, v2, v[0:1], s[0:1] glc
  • After: global_atomic_add_f32 v0, v1, s[0:1]

I'm inclined to make this the default because of the huge performance increase.
On the Nerf.jl benchmark this gives a ~2x performance improvement, and on the yet-unreleased GaussianSplatting.jl a 17x boost, matching CUDA performance.

However, on a per-kernel basis this can be disabled with:

@roc unsafe_fp_atomics=false f(...)

CC @luraess @OsKnoth

luraess (Collaborator) commented Feb 27, 2024

cc: @albert-de-montserrat

pxl-th merged commit dbad788 into master on Feb 27, 2024
1 check was pending
pxl-th deleted the pxl-th/unsafe-atomics branch on February 27, 2024 15:15