"Relaxed" multiply and add operations. #214
@swift-ci test
Hrm, why are we using a Swift-5.3.3 Linux toolchain for testing instead of something more recent? Still, good to know--if unfortunate--that `reassociate(on)` is not supported there. I'll have to add a workaround and a note for that.
This commit adds the following to the `RealFunctions` protocol:
    static func _relaxedAdd(_: Self, _: Self) -> Self
    static func _relaxedMul(_: Self, _: Self) -> Self
These are equivalent to `+` and `*`, but have "relaxed semantics"; specifically, they license the compiler to reassociate them and to form FMA nodes, which are both significant optimizations that can easily make many common loops 8-10x faster. These transformations perturb results slightly, so they should not be enabled without care, but the results with the relaxed operations are--for most purposes--"just as good as" (and often better than) what strict operations produce. The main thing to beware of is that they are no longer portable; different compiler versions, targets, and optimization flags will produce different results.
They are underscored because they are not stable API. In particular:
- `RealFunctions` is not really the right protocol for these (and neither is `Real`). I need to do some thinking about where to attach them.
- Even if it were the right protocol, these are more like implementation hooks than the API I really want people to use (TBD).
- I like "relaxed" more than other commonly used idioms ("fast"), but I'm not sure it's the name I ultimately want.
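To make the "perturb results slightly" caveat concrete, here is a minimal sketch that uses only standard-library `Float` operations, not the `_relaxedAdd` / `_relaxedMul` hooks themselves. The values are chosen purely for illustration. It performs by hand the two transformations that relaxed semantics license: regrouping additions, and fusing a multiply with an add.

```swift
// Reassociation: Float addition is not associative, so regrouping can change the result.
let a: Float = 1e8
let b: Float = -1e8
let c: Float = 1

let leftToRight = (a + b) + c   // source order: the large terms cancel first
let regrouped   = a + (b + c)   // one regrouping a reassociating compiler might choose

// FMA formation: a fused multiply-add rounds once instead of twice.
let x: Float = 1 + 2048 * Float.ulpOfOne   // 1 + 2^-12, exactly representable
let p = x * x                              // product, rounded to Float
let separate = x * x - p                   // multiply, round, then subtract
let fused = (-p).addingProduct(x, x)       // x * x - p with a single rounding

print(leftToRight, regrouped)   // the two groupings disagree
print(separate, fused)          // the fused form recovers the rounding error of p
```

Under IEEE 754 single precision, the two groupings give 1 and 0, and the fused form returns the rounding error of `p` (half of `Float.ulpOfOne`) where the separate form returns 0. Neither answer is "wrong"; which one you get just depends on the order and fusion the compiler picks, which is the portability concern described above.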
The C type isn't available yet in clang on i386 or x86_64; once the calling conventions are stabilized in clang, we can reenable this.
a1045b0 to 740cc50
Some quick perf numbers from my M1 laptop, for two benchmarks: repeatedly summing 1024 Floats, and a repeated dot product of 1024 Floats. [Timing tables not preserved.] For "typical" reduction workloads as above, we see about a 10x speedup over the strict operators, and we're about 2x off of hand-written SIMD.
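For readers wondering why reassociation matters so much for a reduction, the following sketch may help. It is not the benchmark code from this PR, and `strictSum` and `fourAccumulatorSum` are hypothetical names introduced here. The second function does by hand the kind of regrouping that relaxed addition would allow the compiler to do on its own: it splits one long dependency chain into several independent ones that can be vectorized or pipelined.

```swift
// Strict sum: every addition depends on the previous one, in source order.
func strictSum(_ values: [Float]) -> Float {
    var total: Float = 0
    for v in values { total += v }
    return total
}

// Hand-reassociated sum (illustrative): four independent accumulators,
// combined at the end. This changes the order of the additions, so the
// result may differ from strictSum in the last few bits for general input.
func fourAccumulatorSum(_ values: [Float]) -> Float {
    var s0: Float = 0, s1: Float = 0, s2: Float = 0, s3: Float = 0
    var i = 0
    while i + 4 <= values.count {
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
        i += 4
    }
    var total = (s0 + s1) + (s2 + s3)
    // Handle any leftover elements when the count is not a multiple of four.
    while i < values.count {
        total += values[i]
        i += 1
    }
    return total
}

// Small integers keep every partial sum exact, so both orders agree here.
let input = (1...1024).map { Float($0) }
print(strictSum(input), fourAccumulatorSum(input))
```

With the strict operators the compiler has to preserve the single chain in `strictSum`. With relaxed operations it is free to produce something shaped like `fourAccumulatorSum` (or a wider SIMD version) without the source having to spell it out.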
This commit adds the following implementation hooks to the AlgebraicField protocol:
These are equivalent to `+` and `*`, but have "relaxed semantics"; specifically, they license the compiler to reassociate them and to form FMA nodes, which are both significant optimizations that can easily make many common loops 8-10x faster. These transformations perturb results slightly, so they should not be enabled without care, but the results with the relaxed operations are--for most purposes--"just as good as" (and often better than) what strict operations produce. The main thing to beware of is that they are no longer portable; different compiler versions, targets, and optimization flags will produce different results.
These are then exposed under the `Relaxed` namespace as: