
Performance issues when reading multiple packed structures #13

Open
evetion opened this issue Dec 30, 2017 · 2 comments

evetion commented Dec 30, 2017

I encountered this while replacing manually written-out read and write methods with StructIO in visr/LasIO.jl#10. Using StructIO is elegant, but it's also slower.

I've put up a gist here: https://gist.github.com/evetion/2b57d6105cca39b2d3c6ef670a5cc393 with the following results for reading a thousand TwoUInt64s.

```
➜ julia performance.jl

Using StructIO:
  4.143 ms (2000 allocations: 62.50 KiB)

Using read_generic_array:
  6.138 ms (12000 allocations: 406.25 KiB)

Using read_generic_tuple:
  480.796 μs (8000 allocations: 265.63 KiB)

Using read_written_out:
  29.229 μs (2000 allocations: 31.25 KiB)

Using generated_read:
  29.704 μs (2000 allocations: 31.25 KiB)
```

The handwritten read version, which can also be generated, is roughly 200 times faster.

I know this is an unfair comparison, since StructIO has much more functionality than these simple read functions, but it seems it could be faster, especially given that its allocation count (2000) is on par with the handwritten version.

Let me know if you can't duplicate these results, or if I'm missing a StructIO method for reading multiple packed structures.


visr commented Jan 16, 2019

I updated and reduced the gist above a little:

```julia
using BenchmarkTools
using StructIO

const io = IOBuffer(zeros(UInt8, 16*1000));
abstract type TwoUInt64s end

@io struct TwoUInt64sDefault <: TwoUInt64s
    x::UInt64
    y::UInt64
end align_default

@io struct TwoUInt64sPacked <: TwoUInt64s
    x::UInt64
    y::UInt64
end align_packed

function read_written_out(io::IOBuffer, t::Type{<:TwoUInt64s})
    x = read(io, UInt64)
    y = read(io, UInt64)
    t(x, y)
end

println("Using read_written_out:")
@btime read_written_out($io, TwoUInt64sPacked) setup=seekstart($io)

println("Using StructIO Default:")
@btime unpack($io, TwoUInt64sDefault) setup=seekstart($io)

println("Using StructIO Packed:")
@btime unpack($io, TwoUInt64sPacked) setup=seekstart($io)
```

Which gives:

```
Using read_written_out:
  6.158 ns (0 allocations: 0 bytes)
Using StructIO Default:
  24.222 ns (1 allocation: 32 bytes)
Using StructIO Packed:
  1.273 μs (5 allocations: 96 bytes)
```

StructIO allocates on every unpack here, which makes it slower, especially when the struct is marked as packed. In this example struct there is no padding, so perhaps an idea is to fall back to the faster default unpack whenever `StructIO.packed_sizeof(T) == sizeof(T)` for the concrete type `T`.
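As a rough illustration of that size check (`packed_size` below is a hand-rolled stand-in, not the actual `StructIO.packed_sizeof`), comparing the sum of the field sizes against `sizeof(T)` distinguishes padding-free layouts, where the fast default path would be safe, from padded ones:

```julia
# Stand-in for StructIO.packed_sizeof: the sum of the field sizes,
# i.e. the size of the struct with all padding removed.
packed_size(T) = sum(sizeof(fieldtype(T, i)) for i in 1:fieldcount(T))

struct NoPad          # two 8-byte fields: no padding inserted
    x::UInt64
    y::UInt64
end

struct Padded         # 7 padding bytes between a and b
    a::UInt8
    b::UInt64
end

packed_size(NoPad) == sizeof(NoPad)    # true: layouts agree
packed_size(Padded) == sizeof(Padded)  # false: padding differs
```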

I'm not sure whether the current general unpack method can easily be sped up to avoid the allocation. Another option would be for `@io struct T` to define not only a `packing_strategy(::Type{T})` but also a complete `unpack(::Type{T})`, similar to @evetion's generated_read above?
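A minimal sketch of that idea (`fast_unpack` and `Pair64` are hypothetical names, not StructIO API): a `@generated` function can expand, at compile time, into one `read` call per field and a single constructor call, with no intermediate buffer:

```julia
struct Pair64
    x::UInt64
    y::UInt64
end

# Expands to `T(read(io, UInt64), read(io, UInt64))` for Pair64;
# the field loop runs once per type, at code-generation time.
@generated function fast_unpack(io::IO, ::Type{T}) where {T}
    reads = [:(read(io, $(fieldtype(T, i)))) for i in 1:fieldcount(T)]
    return :(T($(reads...)))
end

buf = IOBuffer(zeros(UInt8, sizeof(Pair64)))
p = fast_unpack(buf, Pair64)
```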


stevengj commented Jan 8, 2023

Couldn't unpack take a pre-allocated buffer parameter if needed, for use when it is called repeatedly?

And/or maybe you could have an unpack(io, T, n) method that reads up to n elements of type T into an array, so that it does the allocation only once.
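A rough sketch of such a vectorized method (`unpack_many` and `PairU64` are assumed names, not StructIO API): for an isbits struct whose on-disk packed layout matches its in-memory layout, a single `read!` into a pre-allocated vector performs one allocation for all n elements:

```julia
struct PairU64
    x::UInt64
    y::UInt64
end

# Read n consecutive structs in one bulk read. Only valid when the
# on-disk packed layout matches T's in-memory layout (no padding).
function unpack_many(io::IO, ::Type{T}, n::Integer) where {T}
    out = Vector{T}(undef, n)   # the only allocation
    read!(io, out)              # bulk byte copy into the vector
    return out
end

io = IOBuffer(zeros(UInt8, sizeof(PairU64) * 1000))
v = unpack_many(io, PairU64, 1000)
```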
