need efficient `reinterpret` for extracting array elements #31305

ExpandingMan · 2019-03-09T20:42:10Z

The performance of reinterpret is impressive, however there is still one bit of functionality (which is sometimes needed for efficient random-access IO) which is a little slow. Extracting individual elements from a buffer using reinterpret is only as fast as using pointers if the size of the eltype of the buffer matches the size of the type being converted to.

See this example:

using BenchmarkTools

v = zeros(UInt8, 128)

function unsafe(::Type{T}, v::Vector, i::Integer) where {T}
    p = convert(Ptr{T}, pointer(v))
    checkbounds(v, i)
    checkbounds(v, i+sizeof(T)-1)
    unsafe_load(p, i)
end

function safe(::Type{T}, v::Vector, i::Integer) where {T}
    first(reinterpret(T, @view v[i:(i+sizeof(T)-1)]))
end

julia> @btime unsafe(Int, v, 1);
  42.647 ns (0 allocations: 0 bytes)

julia> @btime safe(Int, v, 1);
  57.300 ns (2 allocations: 80 bytes)

There are a few things going on here. First, reinterpret can only be called on an entire array, not just a slice of it, so a view of the array must be allocated before reinterpret can even be called. Then, the return value is itself an array, so if you only want a single value, you need to index it (I think what's going on is that, since the returned array is itself a view which must be allocated, the allocation of both of these views are the two allocations seen in the benchmark).

It would be really nice if we can get a version of reinterpret which would allow one to extract individual elements without allocating. It would also be nice to have a reinterpret that takes views from the middle of an array rather than the start of it, however the performance of that is a little less crucial since you'd only be eliminating 1 allocation of a view for an entire array.

I'm not sure what that function would best be called or how to structure its arguments. Were I to make a PR for this, I'd probably put in a function a bit like my unsafe method here (which ought to be safe if properly bounds checked).

Thoughts?

The text was updated successfully, but these errors were encountered:

Keno · 2019-03-09T22:54:56Z

There's two unrelated problems here:

julia> function safe(::Type{T}, v::Vector, i::Integer) where {T}
           @inbounds reinterpret(T, v)[fld1(i, sizeof(T))]
       end

julia> @btime safe(Int, v, 1)
  50.717 ns (0 allocations: 0 bytes)
0

julia> @btime unsafe(Int, v, 1)
  49.655 ns (0 allocations: 0 bytes)
0

We allocate the ReinterpretArray despite it only being used in the error path causing an allocation.
LLVM doesn't combine the Int8 sized loads into an Int size load.

ExpandingMan · 2019-03-10T04:02:33Z

Are those problems realistically resolvable? If not, would it still make sense to define a new method in Base which is safe but non-allocating?

KristofferC · 2019-03-10T09:10:17Z

1 sounds similar to #29867 but it's not really scalable to add that "fix" to all array wrappers.

Keno · 2019-03-10T19:30:16Z

It is legal for the optimizer to move the allocation into the error path. We should figure out how to teach it to do that.

maleadt · 2019-03-10T21:35:23Z

LLVM doesn't combine the Int8 sized loads into an Int size load.

Would that be legal? It's an issue with PTX too, where these 1-byte loads are very costly (hence a custom unsafe_load with alignment at https://github.com/JuliaGPU/CUDAnative.jl/blob/e28c5f07cd2dae78a41d0d67ea2dfd4b374a92fe/src/device/pointer.jl#L157).

chethega · 2019-03-11T22:46:54Z

LLVM doesn't combine the Int8 sized loads into an Int size load.

Would that be legal?

Combining in IR should definitely be legal. And the backend knows whether unaligned access is supported by the hardware (yes on x86, no on ptx). Since pointer(v) is only byte-aligned (e.g. after popfirst!), llvm probably must emit the slow code on ptx, but could do better on x86.

JeffBezanson added performance Must go faster arrays [a, r, r, a, y, s] labels Mar 11, 2019

ChrisRackauckas mentioned this issue Dec 2, 2022

Add FixedSizeDiffCache SciML/PreallocationTools.jl#53

Merged

ndinsmore mentioned this issue Jan 5, 2023

Breaking up ReinterpretArray methods by typcasting type to enable more optimizations #48138

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

need efficient `reinterpret` for extracting array elements #31305

need efficient `reinterpret` for extracting array elements #31305

ExpandingMan commented Mar 9, 2019

Keno commented Mar 9, 2019

ExpandingMan commented Mar 10, 2019

KristofferC commented Mar 10, 2019

Keno commented Mar 10, 2019

maleadt commented Mar 10, 2019

chethega commented Mar 11, 2019

need efficient reinterpret for extracting array elements #31305

need efficient reinterpret for extracting array elements #31305

Comments

ExpandingMan commented Mar 9, 2019

Keno commented Mar 9, 2019

ExpandingMan commented Mar 10, 2019

KristofferC commented Mar 10, 2019

Keno commented Mar 10, 2019

maleadt commented Mar 10, 2019

chethega commented Mar 11, 2019

need efficient `reinterpret` for extracting array elements #31305

need efficient `reinterpret` for extracting array elements #31305