Skip to content

Performance of using CartesianIndices with StaticArray types #1010

@nchisholm

Description

@nchisholm

Multidimensional algorithms built on CartesianIndices seem to have significant performance overhead when applied to StaticArrays.MArray​s, but not regular Array​s. One can recover performance by writing specific functions with nested loops for each dimension, or by leveraging the macros in Base.Cartesian to automate that process.

To demonstrate, consider an N-dimensional algorithm that fills the elements of a generic input array arr using a function f(i_1, ⋯, i_N) of the indices.

function fillfn_cartesianindices!(arr::AbstractArray, f::F) where F  # force specialization
    @inbounds for I  CartesianIndices(arr)
        arr[I] = f(I)
    end
    return arr
end

Here is a similar @generated function using the Base.Cartesian macros.

using Base.Cartesian

@generated function fillfn_basecartesian!(arr::AbstractArray{<:Any,N}, f::F) where {N,F}
    # Generates a function body like
    # for i_N ∈ axes(arr, N), ..., i_1 ∈ axes(arr, 1)
    #     arr[i_1, ..., i_N] = f(i_1, ..., i_N)
    # end
    quote
        @inbounds @nloops $N i arr begin
            (@nref $N arr i) = (@ncall $N f i)
        end
        return arr
    end
end

Finally, here is function of indices to feed to either of the above functions (for the four dimensional case).

indexfunc4d(i, j, k, l) = (i==k) & (j==l) - (i==l) & (j==k)
@inline indexfunc4d(I::CartesianIndex{4}) = indexfunc4d(Tuple(I)...)

For a regular Array, using CartesianIndices in the 3×3×3×3 case has no performance overhead. (In fact, it is consistently 1% to 3% faster than using explicit loops on my machine.)

using BenchmarkTools
using StaticArrays

let dims = (3,3,3,3)
    T = eltype(indexfunc4d(map(_ -> 1, dims)...))

    println("Using `CartesianIndices`")
    out1 = @btime(
        fillfn_cartesianindices!(A, $indexfunc4d),
        setup=(A = Array{$T}(undef, $dims))
    )
    println("Using `Base.Cartesian`")
    out2 = @btime(
        fillfn_basecartesian!(A, $indexfunc4d),
        setup=(A = Array{$T}(undef, $dims))
    )
    @assert out1 == out2
end;
Using `CartesianIndices`
  101.466 ns (0 allocations: 0 bytes)
Using `Base.Cartesian`
  104.874 ns (0 allocations: 0 bytes)

A similar benchmark on an MArray shows that fillfn_basecartesian is a good deal quicker than fillfn_cartesianindices.

let SDims = NTuple{4,3}
    dims = fieldtypes(SDims)
    T = eltype(indexfunc4d(map(_ -> 1, dims)...))  # Element type

    println("Using `CartesianIndices`")
    out1 = @btime(
        fillfn_cartesianindices!(A, $indexfunc4d),
        setup=(A = MArray{$SDims,$T}(undef))
    )
    println("Using `Base.Cartesian`")
    out2 = @btime(
        fillfn_basecartesian!(A, $indexfunc4d),
        setup=(A = MArray{$SDims,$T}(undef))
    )
end;
Using `CartesianIndices`
  88.504 ns (0 allocations: 0 bytes)
Using `Base.Cartesian`
  14.276 ns (0 allocations: 0 bytes)

The result is the same for a SizedArray.

let SDims = NTuple{4,3}
    dims = fieldtypes(SDims)
    T = eltype(indexfunc4d(map(_ -> 1, dims)...))  # Element type

    println("Using `CartesianIndices`")
    out1 = @btime(
        fillfn_cartesianindices!(A, $indexfunc4d),
        setup=(A = SizedArray{$SDims,$T}(undef))
    )
    println("Using `Base.Cartesian`")
    out2 = @btime(
        fillfn_basecartesian!(A, $indexfunc4d),
        setup=(A = SizedArray{$SDims,$T}(undef))
    )
end;
Using `CartesianIndices`
  87.301 ns (0 allocations: 0 bytes)
Using `Base.Cartesian`
  13.225 ns (0 allocations: 0 bytes)

Note that this problem does not apply to SArray (since it is not mutable); using StaticArrays.sacollect with CartesianIndices is efficient.

@btime StaticArrays.sacollect(SArray{NTuple{4,3}},
                              $indexfunc4d(I) for I  CartesianIndices(ntuple(_ -> SOneTo(3), Val(4))));
@btime StaticArrays.sacollect(MArray{NTuple{4,3}},
                              $indexfunc4d(I) for I  CartesianIndices(ntuple(_ -> SOneTo(3), Val(4))));
10.907 ns (0 allocations: 0 bytes)
51.259 ns (1 allocation: 672 bytes)

However, this way is different than trying to mutate the elements of a preallocated MArray or SizedArray.

Is there some necessary special support for CartesianIndices that is not implemented? Is it possible to implement? I need to look into exactly how CartesianIndices works in each case.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions