[Java] ArrowFileWriter/ArrowStreamWriter lack compression support #15203

Closed
lidavidm opened this issue Jan 5, 2023 · 2 comments · Fixed by #15223

Comments

@lidavidm
Member

lidavidm commented Jan 5, 2023

Describe the enhancement requested

Their reader counterparts support compression, but the writers don't give you any way to enable it. Furthermore, the design of the compression API means that enabling this will require modifying several classes: the compression API closes the input buffer after compression, but this runs contrary to the expectation of VectorUnloader, which isn't supposed to mutate the source VectorSchemaRoot. (Arguably, this is a flaw in the VectorUnloader API that prevents efficient memory usage, but fixing that would mean invalidating essentially all current Arrow Java code.)

There are also various other flaws. For example, VectorUnloader says it is OK to pass a null compression codec, but it will crash if you do so because it assumes the codec is never null.

See also: #15102

It is possible to generate compressed data manually in Java, but this requires stitching together several low-level APIs yourself.
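
To make the ownership conflict described above concrete, here is a minimal, self-contained sketch. It does not use Arrow: `RefBuf`, `Codec`, `NO_COMPRESSION`, `DEFLATE`, and `unload` are hypothetical stand-ins invented for illustration, and `java.util.zip.Deflater` is swapped in for a real Arrow codec. It models a codec that closes its input after compressing, an unloader that must leave the source untouched, and one possible way to reconcile the two: take an extra reference before handing each buffer to the codec, and treat a null codec as "no compression" instead of dereferencing it.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

class CompressionOwnershipSketch {

    /** Hypothetical reference-counted buffer, loosely modeled on the idea of an Arrow buffer. */
    static final class RefBuf implements AutoCloseable {
        final byte[] data;
        private int refCount = 1;

        RefBuf(byte[] data) { this.data = data; }

        void retain() {
            if (refCount <= 0) throw new IllegalStateException("buffer already released");
            refCount++;
        }

        @Override
        public void close() {
            if (refCount <= 0) throw new IllegalStateException("buffer already released");
            refCount--;
        }

        int refCount() { return refCount; }
    }

    /** Hypothetical codec contract: compress() takes ownership of its input. */
    interface Codec {
        RefBuf compress(RefBuf input);
    }

    /** Pass-through codec: the reference it was given simply moves to the output. */
    static final Codec NO_COMPRESSION = input -> input;

    /** Deflate-based codec that releases its input once it is done, as the issue describes. */
    static final Codec DEFLATE = input -> {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input.data);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return new RefBuf(out.toByteArray());
        } finally {
            deflater.end();
            input.close(); // the codec consumes the reference it was handed
        }
    };

    /** Unloads buffers without mutating the source: each buffer is retained before compression. */
    static List<RefBuf> unload(List<RefBuf> source, Codec codec) {
        // Assumption for this sketch: null means "no compression" rather than an error.
        Codec effective = (codec == null) ? NO_COMPRESSION : codec;
        List<RefBuf> out = new ArrayList<>();
        for (RefBuf buf : source) {
            buf.retain(); // extra reference for the codec to release
            out.add(effective.compress(buf));
        }
        return out;
    }

    public static void main(String[] args) {
        List<RefBuf> source = List.of(new RefBuf(new byte[1024]));
        List<RefBuf> compressed = unload(source, DEFLATE);
        System.out.println("source refCount after unload: " + source.get(0).refCount());
        System.out.println("compressed size: " + compressed.get(0).data.length);
        List<RefBuf> plain = unload(source, null); // must not throw
        plain.get(0).close(); // releases the reference taken by unload
        System.out.println("source refCount at end: " + source.get(0).refCount());
    }
}
```

Without the `retain()` call, the codec's `close()` would release the caller's only reference, which is the mutation the unloader is not supposed to cause. Whether the actual fix in #15223 takes this approach is not stated in this thread.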

Component(s)

Java

@lidavidm
Member Author

lidavidm commented Jan 5, 2023

CC @liyafan82 did I miss anything about the Java compression API here?

@liyafan82
Contributor

> CC @liyafan82 did I miss anything about the Java compression API here?

I guess not. Thanks for bringing up the problems.

lidavidm added a commit to lidavidm/arrow that referenced this issue Jan 6, 2023
lidavidm added a commit that referenced this issue Jan 19, 2023
* Closes: #15203

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 12.0.0 milestone Jan 19, 2023