Describe the bug, including details regarding any error messages, version, and platform.
The docstring for decompressed_size in pyarrow.decompress and the underlying Codec.decompress states:
decompressed_size : int, default None
If not specified, will be computed if the codec is able to determine the uncompressed buffer size.
However, the implementation (arrow/python/pyarrow/io.pxi, lines 2091 to 2124 in 4e9158d) shows that this behavior isn't implemented. The argument is effectively required, because we always raise when it's None:
ValueError: Must pass decompressed_size for <pyarrow.Codec name=lz4 compression_level=1> codec
I think the original intent was to support detecting or estimating the size of the output buffer for codecs that support it (like Snappy), but the last time this was brought up, it was marked as low priority and it remains unimplemented.
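For Snappy specifically, detection is cheap in principle. Snappy's raw block format begins with the uncompressed length encoded as a little-endian base-128 varint. The helper below is a standard-library sketch of reading that preamble. `snappy_uncompressed_length` is a hypothetical name, not a pyarrow API, and it assumes the input is a raw Snappy block rather than Snappy's framed/streaming format.

```python
from __future__ import annotations


def snappy_uncompressed_length(block: bytes) -> tuple[int, int]:
    """Return (uncompressed_length, preamble_size) for a raw Snappy block.

    Assumes the raw block format: the block starts with the uncompressed
    length as a little-endian varint of at most 5 bytes (value < 2**32).
    """
    result = 0
    shift = 0
    for i, byte in enumerate(block[:5]):
        # The low 7 bits carry data; the high bit says "more bytes follow".
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            if result >= 2**32:
                raise ValueError("Snappy length preamble exceeds 32 bits")
            return result, i + 1
        shift += 7
    raise ValueError("truncated or invalid Snappy length preamble")
```

For example, the hand-built block `b"\x0b\x28hello world"` starts with the single varint byte `0x0b`, so the helper returns `(11, 1)`. A two-byte preamble such as `b"\xac\x02"` decodes to 300.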
Should the argument be made required and the bit about "If not specified" be removed?
Component(s)
Python
amoeba changed the title from "Change docstring for decompressed_size arg in pyarrow.decompress to reflect implementation" to "[Python] Change docstring for decompressed_size arg in pyarrow.decompress to reflect implementation" on Dec 20, 2022
Yes, I think we should certainly update the docs to match the current behaviour (which is indeed to always require the decompressed_size).
amoeba changed the title from "[Python] Change docstring for decompressed_size arg in pyarrow.decompress to reflect implementation" to "[Python] [Docs] Change docstring for decompressed_size arg in pyarrow.decompress to reflect implementation" on Dec 21, 2022
…5061)
I opted to keep `decompressed_size` optional to reduce churn (optional->required->optional) if we ever add in detection/estimation for certain codecs in the future. Let me know what you think.
* Closes: #15043
Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>