
Conversation

@ffevotte ffevotte commented Jul 10, 2020

Most of what I wanted to explore with Float32 is now done, so I'm opening this as a draft PR to let anyone interested have a look and comment. I'm planning to release this as v0.3.5 once it's finished.

Here is a list of the changes (to be) included in this PR:

  • optimized, vectorized, mixed-precision implementations of sums and dot products, relying on a Float64 accumulator for Float32 inputs: sum_mixed and dot_mixed (see the sketch after this list);
  • inclusion of Float32 inputs in all verification tests;
  • the possibility to run performance tests on 32-bit inputs; as a side benefit of this work, generating input data with a given condition number for sums and dot products is now much more reliable;
  • updated README advertising these new features.
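
For reference, here is a minimal, scalar sketch of the mixed-precision idea from the first item above: Float32 inputs are promoted into a Float64 accumulator, which yields a result accurate to roughly the input precision at close to the cost of a naive sum. The actual sum_mixed/dot_mixed kernels in this PR are vectorized and more elaborate; the function name below is purely illustrative.

```julia
# Scalar illustration of a mixed-precision sum (not the PR's vectorized kernel).
function sum_mixed_sketch(x::AbstractVector{Float32})
    acc = 0.0                             # Float64 accumulator
    @inbounds @simd for i in eachindex(x)
        acc += Float64(x[i])              # promote each element before accumulating
    end
    return Float32(acc)                   # round back to the input precision once
end
```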

I'm still benchmarking dot_mixed, but my preliminary conclusions are:

  • mixed-precision implementations are always faster than compensated ones; they should be the preferred choice to get additional accuracy with Float32 inputs;
  • on AVX512 systems, mixed-precision implementations are almost as fast as naive ones; we could probably go as far as suggesting that mixed implementations be used by default (i.e. in Base.sum or LinearAlgebra.dot) for Float32 inputs on newer or high-end systems;
  • I'm considering introducing single, high-level entry points that would dispatch to the most efficient accurate implementation based on the input element types and the CPU being used (a minimal sketch follows this list). These entry points could be unexported functions with common names (e.g. AccurateArithmetic.sum and AccurateArithmetic.dot) that users would need to call explicitly, or exported functions with specific names (like accurate_sum and accurate_dot).
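
To make the last point more concrete, here is a hedged sketch of what such an entry point could look like when dispatching on the element type alone (the CPU-dependent part of the choice is not shown). sum_mixed is the kernel introduced in this PR; sum_oro is assumed here to be the package's existing compensated sum.

```julia
using AccurateArithmetic: sum_mixed, sum_oro

# Unexported entry point: pick an accurate kernel based on the element type.
accurate_sum(x::AbstractVector{Float32}) = sum_mixed(x)         # mixed precision: fast and accurate
accurate_sum(x::AbstractVector{<:AbstractFloat}) = sum_oro(x)   # compensated fallback (assumed kernel)
```

Users would then call AccurateArithmetic.accurate_sum(x) (or an exported accurate_sum) explicitly, leaving Base.sum untouched.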

@ffevotte ffevotte marked this pull request as ready for review July 14, 2020 15:38
@ffevotte ffevotte mentioned this pull request Jul 14, 2020
@ffevotte ffevotte merged commit 74a6445 into master Jul 16, 2020
@ffevotte ffevotte deleted the ff/float32 branch July 16, 2020 22:06
ffevotte referenced this pull request Jul 16, 2020