CachePadded: use 128-byte alignment on x86-64 #331
Conversation
/// The size of `CachePadded<T>` is the smallest multiple of 64 bytes large enough to accommodate
/// Cache lines are assumed to be of N bytes, depending on the architecture:
///
/// * On x86-64, N = 128.
Isn't it more the case that x86-64 CPUs fetch two cache lines at a time, aligning them to 128-byte chunks, rather than a single cache line being 128 bytes?
Yes. Maybe the wording is not great, but N is just a pessimistic guess of how long a cache line is. So N is at least the length of a cache line, but may be larger than that in order to accommodate architectures with longer cache lines or ones that have unusual prefetching strategies.
Okay, perhaps this could be reworded slightly to avoid confusion later? In any case LGTM.
Sure - just added an additional comment for clarification.
@@ -49,7 +52,14 @@ use core::ops::{Deref, DerefMut};
/// }
/// ```
#[derive(Clone, Copy, Default, Hash, PartialEq, Eq)]
#[repr(align(64))]
// Starting from Intel's Sandy Bridge, spatial prefetcher is now pulling pairs of 64-byte cache
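The change under discussion can be sketched as a conditional `repr(align(...))` attribute. This is a minimal illustration, not the crate's exact code: the type name `Padded` is made up here, and only the x86-64 case from the PR (128 bytes there, 64 bytes elsewhere) is shown.

```rust
// Sketch of per-architecture cache-line padding, modeled on this PR.
// On x86-64 the spatial prefetcher pulls cache lines in pairs, so the
// wrapper is padded to 128 bytes there; 64 bytes on other targets.
#[cfg_attr(target_arch = "x86_64", repr(align(128)))]
#[cfg_attr(not(target_arch = "x86_64"), repr(align(64)))]
#[derive(Clone, Copy, Default)]
struct Padded<T> {
    value: T,
}

fn main() {
    // The alignment of the wrapper is independent of the payload size:
    // a one-byte payload still gets the full architecture-specific slot.
    println!("align = {}", core::mem::align_of::<Padded<u8>>());
    println!("size  = {}", core::mem::size_of::<Padded<u8>>());
}
```

Because `repr(align)` also rounds the struct's size up to a multiple of the alignment, consecutive `Padded<T>` values in an array never share a padded slot.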
Is this also the case for AMD?
Judging by a quick skim through AMD's optimization manual, I'd say no.
https://developer.amd.com/wordpress/media/2013/12/55723_SOG_Fam_17h_Processors_3.00.pdf
It's sad that only Intel processors are addressed here.
From what I've found, Rust has flags to indicate CPU features:
https://rust-lang.github.io/packed_simd/perf-guide/target-feature/rustflags.html
@Mart-Bogdan I would accept a PR to adjust the alignment of `CachePadded` based on the CPU when the user specifies `-C target-cpu` in RUSTFLAGS.
bors r+
331: CachePadded: use 128-byte alignment on x86-64 r=stjepang a=stjepang

Motivation: https://www.reddit.com/r/rust/comments/akzas0/lockfree_rust_crossbeam_in_2019/efa26r1/

Co-authored-by: Stjepan Glavina <stjepang@gmail.com>
Build succeeded
636: CachePadded: update alignment based on golang's CacheLinePadSize r=jeehoonkang a=taiki-e

This updates the alignment of `CachePadded` on the following architectures based on golang's [`CacheLinePadSize`](https://github.com/golang/go/blob/3510a1e32cbc86b73db143aefcc00aadc44c27bd/src/internal/cpu/cpu.go#L20):

- On [powerpc64](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_ppc64x.go#L9), use 128-byte alignment.
- On [arm](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_arm.go#L7), [mips](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_mips.go#L7), [mips64](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_mips64x.go#L9), and [riscv64](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_riscv64.go#L7), use 32-byte alignment.
- On [s390x](https://github.com/golang/go/blob/3dd58676054223962cd915bb0934d1f9f489d4d2/src/internal/cpu/cpu_s390x.go#L7), use 256-byte alignment.

On x86_64 and aarch64, use 128-byte alignment as before. See #331 and #499 for the reasons. On all other architectures, use 64-byte alignment as before.

Closes #427

cc @stjepang (when this is merged, I'll send the same patch to smol-rs/cache-padded)

FYI @yoshuawuyts @cuviper (because you were [discussing adding `CachePadded` to the standard library](https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/.60std.3A.3Amem.3A.3ACachePadded.60))

Co-authored-by: Taiki Endo <te316e89@gmail.com>
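The doc comment quoted earlier says the size of `CachePadded<T>` is the smallest multiple of the padding size large enough to accommodate `T`. That rounding behavior follows directly from `repr(align)`; a quick check using a hypothetical fixed 128-byte wrapper (`Padded128` is illustrative, since the real alignment varies by target):

```rust
use core::mem::{align_of, size_of};

// Hypothetical stand-in for CachePadded with a fixed 128-byte alignment.
#[repr(align(128))]
struct Padded128<T>(T);

fn main() {
    // A 1-byte payload still occupies a full 128-byte slot...
    assert_eq!(size_of::<Padded128<u8>>(), 128);
    // ...and a 130-byte payload rounds up to the next multiple, 256 bytes.
    assert_eq!(size_of::<Padded128<[u8; 130]>>(), 256);
    assert_eq!(align_of::<Padded128<u8>>(), 128);
    println!("ok");
}
```

This rounding is what guarantees that two adjacent `CachePadded` values in memory never fall on the same (prefetched pair of) cache lines.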