Naive dot product performance #7

The naive dot product implementation currently uses "explicit" SIMD instructions:

 https://github.com/JuliaMath/AccurateArithmetic.jl/blob/1fd8e58a273bd028590b6a02032c0d874cb724b1/src/accumulators/dot.jl#L7

This probably prevents LLVM from fusing each multiply/add pair into a `vfmadd` instruction, which is a shame, since fused instructions would likely be both more accurate (one rounding instead of two) and more efficient. We should
- either change those to non-explicit ("fuseable") SIMD instructions, relying on LLVM to turn them into `vfmadd`s, or
- explicitly use an fma here.
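
To make the two options concrete, here is a minimal scalar sketch. It is not the package's SIMD code, and the function names `dot_unfused`, `dot_muladd` and `dot_fma` are hypothetical. It only uses `muladd` and `fma` from Julia's `Base`: `muladd` lets the compiler fuse when that is profitable, whereas `fma` always computes `x*y + z` with a single rounding.

```julia
# Scalar illustration only; the names below are hypothetical, not part of
# AccurateArithmetic.jl.

# Separate multiply and add: the product is rounded before the addition.
function dot_unfused(x, y)
    acc = zero(eltype(x))
    @inbounds for i in eachindex(x, y)
        acc = acc + x[i] * y[i]
    end
    return acc
end

# "Fuseable" variant: muladd permits (but does not require) the compiler
# to emit a fused multiply-add.
function dot_muladd(x, y)
    acc = zero(eltype(x))
    @inbounds for i in eachindex(x, y)
        acc = muladd(x[i], y[i], acc)
    end
    return acc
end

# Explicit variant: fma always rounds x[i]*y[i] + acc exactly once.
function dot_fma(x, y)
    acc = zero(eltype(x))
    @inbounds for i in eachindex(x, y)
        acc = fma(x[i], y[i], acc)
    end
    return acc
end

# The exact dot product of these inputs is -2^-60: the second product is
# 1 - 2^-60, which rounds to 1.0 when the multiply is rounded on its own.
x = [1.0, 1.0 + 2.0^-30]
y = [-1.0, 1.0 - 2.0^-30]
```

On these inputs, `dot_fma(x, y)` keeps the `-2^-60` term while `dot_unfused(x, y)` loses it to the intermediate rounding. The result of `dot_muladd(x, y)` depends on whether the compiler chooses to fuse on the target hardware.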

I wonder if this explains the differences between `OpenBLAS.ddot` and `AccurateArithmetic.dot_naive` observed in the paper (fig. 4).
