Added explicit dependency on ArrayInterface #29
Merged
BLASBenchmarks currently uses `reshape(view(x, prod(dims)), dims)` to create its arrays. This is a workaround for a BenchmarkTools issue, where interpolated variables don't get freed by the GC. Doing it that way makes it easier to run benchmarks on computers with limited RAM, because I can use the same `x` across all iterations and matrix sizes (rather than allocating new arrays every time that never get freed).

However, I realized that Octavian was unusually slow at small sizes. This was because
Because two of Octavian's deps already make heavy use of `ArrayInterface.jl`, I figured the easiest way to handle this case would be to add an explicit dependency. This also opens the door to using it in more places.

A couple asides:
- We should really optimize the fast path. That was important for getting PaddedMatrices' fully dynamic `jmul!(::Matrix, ::Matrix, ::Matrix)` to quickly overtake StaticArrays, e.g. here. Hitting 20 GFLOPS for 6x6 and 35 GFLOPS for 8x8 requires around 22ns and 30ns respectively; every nanosecond matters there.
- Also, we should add threading for loop 5 at smallish sizes, particularly when we don't pack `B`. This matters most on computers with a lot of threads looking for a chunk of matrix to multiply.
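For reference, the buffer-reuse pattern described at the top of this PR can be sketched roughly as follows. This is an illustration, not the BLASBenchmarks code: the helper name `buffer_matrix` is hypothetical, and the sketch assumes the view covers the range `1:prod(dims)` of the shared buffer.

```julia
# Hypothetical helper (not part of BLASBenchmarks): carve a matrix out of a
# shared buffer instead of allocating a fresh array for every benchmark.
# Assumption: the view covers the first prod(dims) elements of the buffer.
function buffer_matrix(x::Vector{T}, dims::NTuple{2,Int}) where {T}
    n = prod(dims)
    n <= length(x) || throw(ArgumentError("buffer too small for dims = $dims"))
    return reshape(view(x, 1:n), dims)
end

x = zeros(Float64, 100 * 100)  # one buffer, sized for the largest matrix
A = buffer_matrix(x, (6, 6))   # 6x6 matrix backed by x
B = buffer_matrix(x, (8, 8))   # 8x8 matrix backed by the same memory

A[1, 1] = 1.0                  # writes through to x, so B sees it too
```

Note that `A` here is a `Base.ReshapedArray` wrapping a `SubArray`, not a plain `Matrix`, so any method that dispatches specifically on `Matrix` will not apply to it.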
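The timing budgets quoted in the first aside can be checked with a little arithmetic. The sketch below assumes the conventional count of `2n^3` floating-point operations for an `n`x`n` matrix multiply; the function names are made up for illustration.

```julia
# Conventional flop count for multiplying two n-by-n matrices:
# n^3 multiplications plus (roughly) n^3 additions.
matmul_flops(n) = 2 * n^3

# 1 GFLOPS is one flop per nanosecond, so flops / GFLOPS gives nanoseconds.
time_budget_ns(n, gflops) = matmul_flops(n) / gflops

budget_6 = time_budget_ns(6, 20)  # 432 flops at 20 GFLOPS: 21.6 ns
budget_8 = time_budget_ns(8, 35)  # 1024 flops at 35 GFLOPS: about 29.3 ns
```

With a total budget this small, a few nanoseconds of dispatch or setup overhead are a sizeable fraction of the run time.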