Support decompressing concatenated gzip members (stream) #38271

Closed
amassalha opened this issue Oct 15, 2023 · 0 comments · Fixed by #38272
Describe the enhancement requested

The current gzip decompression calls 'inflate' until it returns 'Z_STREAM_END' or an error, but according to the gzip (zlib) documentation, this may not be enough:

"inflate() will not automatically decode concatenated gzip members. inflate() will return Z_STREAM_END at the end of the gzip member. The state would need to be reset to continue decoding a subsequent gzip member. This must be done if there is more data after a gzip member, in order for the decompression to be compliant with the gzip standard (RFC 1952)." (https://www.zlib.net/manual.html)

This PR adds support for reading Parquet files that contain more than one gzip member (example file attached).
concatenated_gzip_members.zip
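
To illustrate the behavior described above, here is a minimal sketch using Python's standard `zlib` module, which wraps the same zlib library. It is not the Arrow C++ change itself; the function name `decompress_concatenated_gzip` and the sample data are made up for illustration. It shows that a single decompressor stops at the end of the first member, and that starting a fresh decompressor on the leftover bytes recovers the rest.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def decompress_concatenated_gzip(data: bytes) -> bytes:
    """Decompress every gzip member in `data`, not just the first one.

    Hypothetical helper for illustration only.
    """
    chunks = []
    remaining = data
    while remaining:
        # A fresh decompressor per member plays the role of resetting
        # the inflate state after Z_STREAM_END.
        member = zlib.decompressobj(GZIP_WBITS)
        chunks.append(member.decompress(remaining))
        chunks.append(member.flush())
        if not member.eof:
            raise ValueError("truncated gzip member")
        # Bytes past the end of this member belong to the next one.
        remaining = member.unused_data
    return b"".join(chunks)


# Two independently compressed members, concatenated back to back.
two_members = gzip.compress(b"hello ") + gzip.compress(b"world")

# A single decompressor stops at the end of the first member.
first_only = zlib.decompressobj(GZIP_WBITS).decompress(two_members)

# Looping over the leftover bytes decodes both members.
all_members = decompress_concatenated_gzip(two_members)
```

Here `first_only` holds only the payload of the first member, while `all_members` holds the payloads of both, which is the result RFC 1952 requires for concatenated members.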

Component(s)

C++, Parquet

mapleFU pushed a commit that referenced this issue Nov 29, 2023
…gzip members (#38272)

### What changes are included in this PR?
Adding support in GZipCodec to decompress concatenated gzip members

### Are these changes tested?
test is attached

### Are there any user-facing changes?
no

* Closes: #38271

Lead-authored-by: amassalha <amassalha@speedata.io>
Co-authored-by: Atheel Massalha <amassalha@speedata.io>
Signed-off-by: mwish <maplewish117@gmail.com>
@mapleFU mapleFU added this to the 15.0.0 milestone Nov 29, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…tiple gzip members (apache#38272)
