The current gzip decompression calls `inflate` until it gets `Z_STREAM_END` or an error is returned, but according to the gzip (zlib) documentation, this might not be enough:
"inflate() will not automatically decode concatenated gzip members. inflate() will return Z_STREAM_END at the end of the gzip member. The state would need to be reset to continue decoding a subsequent gzip member. This must be done if there is more data after a gzip member, in order for the decompression to be compliant with the gzip standard (RFC 1952)." (https://www.zlib.net/manual.html)
This PR adds support for reading Parquet files that contain more than one gzip member (example file attached: concatenated_gzip_members.zip).
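The behavior described above can be reproduced outside Arrow. The following is a minimal sketch in Python using the standard `zlib` module, not the GZipCodec change itself. The function names `decompress_first_member` and `decompress_all_members` are hypothetical, and creating a fresh decompressor for each member stands in for resetting the inflate state in C++.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def decompress_first_member(data: bytes) -> bytes:
    """Stop at the first end-of-stream, like a loop that exits on Z_STREAM_END."""
    decoder = zlib.decompressobj(wbits=GZIP_WBITS)
    return decoder.decompress(data)


def decompress_all_members(data: bytes) -> bytes:
    """Decode every concatenated gzip member in `data`."""
    chunks = []
    while data:
        # A new decompressor per member plays the role of resetting the state.
        decoder = zlib.decompressobj(wbits=GZIP_WBITS)
        chunks.append(decoder.decompress(data))
        chunks.append(decoder.flush())
        if not decoder.eof:
            raise ValueError("truncated gzip member")
        # Bytes after the end of this member belong to the next one.
        data = decoder.unused_data
    return b"".join(chunks)


# Two independently compressed members, concatenated back to back.
payload = gzip.compress(b"hello ") + gzip.compress(b"world")
first_only = decompress_first_member(payload)
everything = decompress_all_members(payload)
```

With this input, `first_only` contains only the data from the first member, while `everything` contains the data from both. This sketch assumes that any bytes following a member are another gzip member; trailing padding would raise `zlib.error`.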
Component(s)
C++, Parquet
…gzip members (#38272)
### What changes are included in this PR?
Adding support in GZipCodec to decompress concatenated gzip members
### Are these changes tested?
Yes, a test is attached.
### Are there any user-facing changes?
no
* Closes: #38271
Lead-authored-by: amassalha <amassalha@speedata.io>
Co-authored-by: Atheel Massalha <amassalha@speedata.io>
Signed-off-by: mwish <maplewish117@gmail.com>