"Relaxed" multiply and add operations. #214

Merged
merged 10 commits into from
Apr 26, 2023

@stephentyrone stephentyrone commented Nov 18, 2021

This commit adds the following implementation hooks to the AlgebraicField protocol:

static func _relaxedAdd(_:Self, _:Self) -> Self
static func _relaxedMul(_:Self, _:Self) -> Self

These are equivalent to + and *, but have "relaxed semantics"; specifically, they license the compiler to reassociate them and to form FMA nodes, which are both significant optimizations that can easily make many common loops 8-10x faster. These transformations perturb results slightly, so they should not be enabled without care, but the results with the relaxed operations are--for most purposes--"just as good as" (and often better than) what strict operations produce. The main thing to beware of is that they are no longer portable; different compiler versions, targets, and optimization flags can produce different results.
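
A minimal sketch of why these two transformations change results, using only the standard library. The input values are chosen purely for illustration; the code applies the regrouping and the fused multiply-add by hand, which is what relaxed semantics would permit the compiler to do on its own.

```swift
// Reassociation: floating-point addition is not associative.
let a: Float = 1e8
let b: Float = -1e8
let c: Float = 1
let leftToRight = (a + b) + c   // a + b is exactly 0, so the result is 1
let regrouped = a + (b + c)     // b + c rounds back to -1e8, so the result is 0

// FMA formation: x*x + z computed with one rounding instead of two.
let x = Double(1).nextUp        // 1 + 2^-52
let p = x * x                   // exact square is 1 + 2^-51 + 2^-104; rounds to 1 + 2^-51
let separate = x * x - p        // product is rounded first, so the difference is 0
let fused = (-p).addingProduct(x, x)  // single rounding keeps the 2^-104 term

print(leftToRight, regrouped)
print(separate, fused)
```

Neither answer in each pair is wrong in isolation; they are different roundings of the same real-number expression, which is why the relaxed results are not reproducible across compilers and targets.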

These are then exposed under the Relaxed namespace as:

Relaxed.sum(a, b)
Relaxed.product(a, b)
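
A short usage sketch, assuming swift-numerics is available as a package dependency and that `Relaxed` is reachable through `import RealModule` (the module name is an assumption here, not something stated above). The inputs are small integers so that every partial sum is exact and the result does not depend on evaluation order.

```swift
import RealModule  // assumption: swift-numerics is a dependency of this target

let a: Float = 1.5
let b: Float = 2.25

// Two-operand forms: same values as + and * here, since there is
// nothing to reassociate or fuse with only two operands.
let s = Relaxed.sum(a, b)        // 3.75
let p = Relaxed.product(a, b)    // 3.375

// The payoff is in reductions, where the compiler may reorder and vectorize.
let values: [Float] = [1, 2, 3, 4]
let total = values.reduce(0) { Relaxed.sum($0, $1) }   // 10, exact for these inputs

print(s, p, total)
```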

@stephentyrone
Member Author

@swift-ci test

@stephentyrone
Member Author

Hrm, why are we using a Swift-5.3.3 Linux toolchain for testing instead of something more recent? Still, good to know--if unfortunate--that reassociate(on) is not supported there. I'll have to add a workaround and a note for that.

This commit adds the following to the RealFunctions protocol:

    static func _relaxedAdd(_:Self, _:Self) -> Self
    static func _relaxedMul(_:Self, _:Self) -> Self

These are equivalent to + and *, but have "relaxed semantics"; specifically, they license the compiler to reassociate them and to form FMA nodes, which are both significant optimizations that can easily make many common loops 8-10x faster. These transformations perturb results slightly, so they should not be enabled without care, but the results with the relaxed operations are--for most purposes--"just as good as" (and often better than) what strict operations produce. The main thing to beware of is that they are no longer portable; different compiler versions, targets, and optimization flags can produce different results.

They are underscored because they are not stable API. In particular:
- `RealFunctions` is not really the right protocol for these (and neither is `Real`). I need to do some thinking about where to attach them.
- Even if it were the right protocol, these are more like implementation hooks than the API I really want people to use (TBD).
- I like "relaxed" more than other commonly used idioms ("fast"), but I'm not sure it's the name I ultimately want.
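
The split between underscored hooks and a public namespace can be sketched as follows. Everything here is hypothetical: `RelaxedHooks` and `RelaxedSketch` are stand-in names, and the hook bodies use the strict `+` and `*` as placeholders, because this sketch does not attempt to reproduce the compiler-level relaxed semantics.

```swift
// Hypothetical stand-in for the protocol that carries the hooks.
protocol RelaxedHooks {
  static func _relaxedAdd(_ a: Self, _ b: Self) -> Self
  static func _relaxedMul(_ a: Self, _ b: Self) -> Self
}

// Placeholder conformance: strict arithmetic stands in for the relaxed operations.
extension Float: RelaxedHooks {
  static func _relaxedAdd(_ a: Float, _ b: Float) -> Float { a + b }
  static func _relaxedMul(_ a: Float, _ b: Float) -> Float { a * b }
}

// Caseless enum used as a namespace; callers never touch the underscored hooks.
enum RelaxedSketch {
  static func sum<T: RelaxedHooks>(_ a: T, _ b: T) -> T { T._relaxedAdd(a, b) }
  static func product<T: RelaxedHooks>(_ a: T, _ b: T) -> T { T._relaxedMul(a, b) }
}

let total = RelaxedSketch.sum(Float(1.5), Float(2.25))     // 3.75
let scaled = RelaxedSketch.product(Float(1.5), Float(2))   // 3.0
print(total, scaled)
```

Keeping the hooks underscored means the protocol they hang off can change later without breaking callers of the namespaced functions.
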

The C type isn't available yet in clang on i386 or x86_64; once the calling conventions are stabilized in clang, we can reenable this.

@stephentyrone
Member Author

Some quick perf numbers from my M1 laptop:

repeatedly summing 1024 Floats

time using reduce(0, +): 0.091 sec
time using reduce(0, Relaxed.sum): 0.009 sec
time using vDSP.sum from Accelerate: 0.004 sec

repeated dot-product of 1024 Floats

time using reduce(0) { $0 + $1*$1 }: 0.085 sec
time using reduce(0) { Relaxed.multiplyAdd($1, $1, $0) }: 0.011 sec
time using vDSP.sumOfSquares from Accelerate: 0.005 sec

For "typical" reduction workloads as above, we see about a 10x speedup over the strict operators, and we're about 2x off of hand-written SIMD.
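
A rough sketch of how a comparison like this could be timed. This is not the harness used for the numbers above; `measure`, the repetition count, and the test data are all hypothetical, and it assumes swift-numerics is a dependency reachable via `import RealModule`. Build with optimizations enabled, since the relaxed operations only pay off when the optimizer can act on them.

```swift
import Foundation
import RealModule  // assumption: swift-numerics is a dependency of this target

// Hypothetical helper: runs `body` repeatedly and reports elapsed seconds.
// The accumulated checksum is returned so the work is not trivially dead code.
func measure(repetitions: Int, _ body: () -> Float) -> (seconds: Double, checksum: Float) {
  var checksum: Float = 0
  let start = Date()
  for _ in 0..<repetitions { checksum += body() }
  return (Date().timeIntervalSince(start), checksum)
}

// Arbitrary data: 1024 small integers, so sums are exact in any order.
let values = (0..<1024).map { Float($0 % 7) }

let strictSum = measure(repetitions: 10_000) { values.reduce(0, +) }
let relaxedSum = measure(repetitions: 10_000) {
  values.reduce(0) { Relaxed.sum($0, $1) }
}
let relaxedSquares = measure(repetitions: 10_000) {
  values.reduce(0) { Relaxed.multiplyAdd($1, $1, $0) }
}

print("reduce(0, +):", strictSum.seconds, "sec")
print("reduce(0, Relaxed.sum):", relaxedSum.seconds, "sec")
print("Relaxed.multiplyAdd sum of squares:", relaxedSquares.seconds, "sec")
```

A more careful benchmark would also guard against the optimizer hoisting the loop-invariant reduction out of the repetition loop, for example by varying the input between repetitions.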

@stephentyrone stephentyrone merged commit 93e5499 into apple:main Apr 26, 2023
@stephentyrone stephentyrone deleted the rilakkuma branch April 26, 2023 15:57
@stephentyrone stephentyrone restored the rilakkuma branch April 26, 2023 15:57
@stephentyrone stephentyrone changed the title Initial pass at "relaxed" multiply and add operations. "Relaxed" multiply and add operations. Apr 28, 2023