Tighten up our compression framework. #11279
Conversation
I'm reading the description again and I've probably not been clear enough about what changed:
I opened #11280 to fix the BinaryFieldMapper issue.
Force-pushed 8741a59 to f5fceb6
I pushed a new commit to also use DEFLATE (with a compression level of 3) instead of LZF. The reason is that even though LZF might be faster, we have had issues with it recently, like #7210 and #7468, and we expect that using something more widely adopted like DEFLATE will protect us better against corruption in the future.
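For illustration, here is a minimal sketch of compressing at DEFLATE level 3 with the JDK's `java.util.zip` classes. This is not the Elasticsearch code from the PR; the class and method names below are made up, and only the level (3) comes from the discussion above.

```java
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateLevel3Demo {
    // Level 3 trades a little compression ratio for speed; per the PR it is
    // the same level used for Lucene's high-compression codec.
    static final int LEVEL = 3;

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(LEVEL);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length + 64];
        int n = 0;
        while (!deflater.finished()) {
            if (n == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow if output exceeds the guess
            }
            n += deflater.deflate(buf, n, buf.length - n);
        }
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    static byte[] decompress(byte[] input, int originalLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        byte[] out = new byte[originalLength];
        int n = inflater.inflate(out);
        inflater.end();
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "some xcontent bytes, repeated repeated repeated"
                .getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original), original.length);
        System.out.println(Arrays.equals(original, restored)); // prints "true"
    }
}
```

Unlike LZF, a zlib/DEFLATE stream also carries an Adler-32 checksum, which is part of why it detects corruption better.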
Review comment on `package org.elasticsearch.common.compress`, at the line:

    /** Exception indicating that we were expecting something compressed, which

copy-pasted from NotCompressedException?
LGTM, I left a couple of minor comments.
Force-pushed 2bc10bf to e725532
@rjernst Thanks for the review, I pushed a new commit.
LGTM
We have a compression framework that we use internally, mainly to compress xcontent bytes. However, it is quite lenient: for instance, it relies on the assumption that compression-format detection is only ever run on either compressed xcontent bytes or raw xcontent bytes, but nothing enforces this. Worse, we are misusing it in BinaryFieldMapper: if someone indexes a binary field that happens to start with the same header as an LZF stream, then at read time we will try to decompress it. This change also simplifies the API by removing block compression (leaving only streaming) and the code duplication caused by some methods accepting a byte[] and others a BytesReference.

LZF stays only for backward-compatibility reasons and can now only be read, not written. DEFLATE is configured with level=3, which is a nice trade-off between speed and compression ratio and is the same level we use for Lucene's high-compression codec.
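The BinaryFieldMapper problem can be sketched as follows. This is an illustrative toy, not the actual framework code: the class name is made up, and it assumes the LZF chunk magic bytes 'Z','V' used by the ning-compress library that Elasticsearch's LZF support is based on.

```java
public class CompressionSniffer {
    // LZF chunks start with the two ASCII bytes 'Z' and 'V' (ning-compress
    // LZFChunk magic). A lenient framework that sniffs these bytes on read
    // cannot distinguish a real LZF stream from arbitrary user data that
    // merely starts with the same two bytes.
    static boolean looksLikeLZF(byte[] bytes) {
        return bytes.length >= 2 && bytes[0] == 'Z' && bytes[1] == 'V';
    }

    public static void main(String[] args) {
        // A perfectly valid raw binary field value that an end user might index:
        byte[] userBinary = new byte[] { 'Z', 'V', 1, 2, 3 };
        // The sniffer misclassifies it, so a lenient reader would attempt to
        // LZF-decompress it and fail or return garbage.
        System.out.println(looksLikeLZF(userBinary)); // prints "true" — a false positive
    }
}
```

This is why the tightened framework stops auto-detecting compression on arbitrary stored bytes and only decompresses where compression is known to have been applied.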
Force-pushed e725532 to b6a3952
Internal: tighten up our compression framework.