Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

slower than expected decompression #13

Closed
jrevels opened this issue Jul 24, 2019 · 7 comments
Closed

slower than expected decompression #13

jrevels opened this issue Jul 24, 2019 · 7 comments

Comments

@jrevels
Copy link

jrevels commented Jul 24, 2019

Hi, great packages, thanks for writing them! I'm trying to figure out the source of some overhead I'm seeing when decompressing via transcode vs. using the zstandard Python wrapper.

Using the zstandard Python wrapper:

In [89]: d = zstd.ZstdDecompressor()

In [90]: byts = open('recordings.ordered.json.zst', 'rb').read();

In [91]: %timeit -n 1 -r 1 d.decompress(byts)
481 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Using this package + TranscodingStreams:

julia> bytes = read("recordings.ordered.json.zst");

julia> @time transcode(ZstdDecompressor, bytes);
  2.748998 seconds (22 allocations: 256.937 MiB, 39.16% gc time)

If it helps, recordings.ordered.json.zst is ~16 MB and decompresses to ~620 MB.

@bicycle1885
Copy link
Member

Thank you. I think this is mainly due to cross-compiling of BinaryBuilder.jl. We can see a similar slowdown in CodecZlib.jl, which I've confirmed it is caused by the binaries compiled with BinaryBuilder.jl. I believe Elliot is working hard on this problem, but if you need a faster version now, I think you need to compile your own Zstandard on your machine and edit the library loading path.

@jrevels
Copy link
Author

jrevels commented Jul 30, 2019

Thanks for the response! I'm kind of dumb when it comes to binary dependencies in general; I built zstd v1.3.8 from source and replaced this line in the auto-generated deps/deps.jl:

const libzstd = joinpath(dirname(@__FILE__), "usr/lib/libzstd.1.3.5.dylib")

with

# built from source the same version used by the python package I have installed, 
# though the python package is actually loading a thingy called
# /usr/local/lib/python3.7/site-packages/zstd.cpython-37m-darwin.so
const libzstd = "path/to/my/lib/libzstd.1.3.8.dylib"

Now:

julia> @time transcode(ZstdDecompressor, bytes);
  1.924619 seconds (1.60 k allocations: 753.789 MiB, 2.10% gc time)

So performance has change a bit, but still not matching the python wrapper. Anything else I could try (or did I even follow your advice correctly)?

@jrevels
Copy link
Author

jrevels commented Jul 30, 2019

Tried calling into zstd directly:

julia> using CodecZstd, TranscodingStreams

julia> const libzstd = "/Users/jarrettrevels/data/repos/zstd/lib/libzstd.dylib"
"/Users/jarrettrevels/data/repos/zstd/lib/libzstd.dylib"

julia> using Libdl

julia> dlopen(libzstd)
Ptr{Nothing} @0x00007fcca2c06570

julia> bytes = read("recordings.ordered.json.zst");

julia> dest = Vector{UInt8}(undef, 649866551);

julia> ccall(:ZSTD_decompress, Cint, (Ptr{UInt8}, Cint, Ptr{UInt8}, Cint), pointer(dest), Cint(length(dest)), pointer(bytes), Cint(length(bytes)));

julia> dest = Vector{UInt8}(undef, 649866551);

julia> @time ccall(:ZSTD_decompress, Cint, (Ptr{UInt8}, Cint, Ptr{UInt8}, Cint), pointer(dest), Cint(length(dest)), pointer(bytes), Cint(length(bytes)));
  0.558639 seconds (14 allocations: 224 bytes)

julia> @time dest2 = transcode(ZstdDecompressor, bytes);
  1.793545 seconds (1.60 k allocations: 753.789 MiB, 2.33% gc time)

julia> dest2 == dest
true

So BinaryBuilder issues aside, it seems we could simply improve the transcode method maybe by getting the decompressed size from the src frame, preallocating the destination buffer, and then calling the above? That's essentially what the python zstandard is doing. Maybe transcode is already doing something similar though, haven't dug into it yet!

@bicycle1885
Copy link
Member

Thank you for investigation! You seem to be right; I'm going to investigate it further. Can you show me the package versions and the environment you are using?

@bicycle1885
Copy link
Member

I've found an inefficiency in data processing due to silly buffering. I guess this causes the problem for the most part. I'll fix that soon.

@jrevels
Copy link
Author

jrevels commented Jul 31, 2019

Glad to help :)

Can you show me the package versions and the environment you are using?

  • CodecZstd: v0.5.0
  • TranscodingStreams: v0.9.4
  • zstd (python): 1.3.8
  • zstd (julia): tests above were on 1.3.8 to match python, but now that I'm building my own, I'm using 1.4.2. Doesn't seem to cause much of a difference performance wise.

If it's helpful, the small functions I cooked up yesterday that I'm now using are something like:

function transcode(::Type{ZstdCompressor}, compressed_bytes::Vector{UInt8}, level)
    compressed_bound = ccall(:ZSTD_compressBound, Cint, (Cint,),
                             length(decompressed_bytes))
    compressed_bytes = Vector{UInt8}(undef, compressed_bound)
    acutal_length = ccall(:ZSTD_compress, Cint, (Ptr{UInt8}, Cint, Ptr{UInt8}, Cint, Cint),
                          compressed_bytes, length(compressed_bytes),
                          decompressed_bytes, length(decompressed_bytes),
                          level)
    resize!(compressed_bytes, actual_length)
    return compressed_bytes
end

function transcode(::Type{ZstdDecompressor}, compressed_bytes::Vector{UInt8})
    decompressed_length = ccall(:ZSTD_getFrameContentSize, Clong,
                                (Ptr{UInt8}, Clong), compressed_bytes,
                                length(compressed_bytes))
    decompressed_bytes = Vector{UInt8}(undef, decompressed_length)
    ccall(:ZSTD_decompress, Cint, (Ptr{UInt8}, Cint, Ptr{UInt8}, Cint),
          decompressed_bytes, length(decompressed_bytes),
          compressed_bytes, length(compressed_bytes))
    return decompressed_bytes
end

@bicycle1885
Copy link
Member

Thank you! I'm pretty sure this is fixed by the latest master branches of TranscodingStreams.jl and CodecZstd.jl. If you still see the performance degradation, please feel free to reopen this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants