
Tighten up our compression framework. #11279

Merged

jpountz merged 2 commits into elastic:master from fix/simplify_compression on May 29, 2015

Conversation

@jpountz (Contributor) commented on May 21, 2015

We have a compression framework that we use internally, mainly to compress some
xcontent bytes. However, it is quite lenient: for instance, it relies on the
assumption that detection of the compression format is only ever invoked on
either some compressed xcontent bytes or some raw xcontent bytes, but nothing
checks this. In fact, we are misusing it in BinaryFieldMapper: if someone
indexes a binary field that happens to have the same header as an LZF stream,
then at read time we will try to decompress it.

This change also simplifies the API by removing block compression (only
streaming remains) and some code duplication caused by some methods accepting a
byte[] and others a BytesReference.
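
To make the BinaryFieldMapper hazard concrete, here is a minimal, self-contained sketch: the two 'Z', 'V' magic bytes are LZF's real chunk header, but the header-only check shown is a simplification for illustration, not the actual CompressorFactory code.

public class LzfHeaderCollision {
    public static void main(String[] args) {
        // Arbitrary user data stored in a binary field, which by bad luck
        // starts with LZF's two magic bytes 'Z' and 'V'.
        byte[] userBytes = { 'Z', 'V', 0x01, 0x00, 0x02, 0x2A, 0x2B };

        // A lenient, header-only detector cannot tell this apart from a real
        // LZF stream, so at read time it would try to decompress plain data.
        boolean looksLikeLzf = userBytes.length >= 2
                && userBytes[0] == 'Z' && userBytes[1] == 'V';

        System.out.println("misdetected as LZF: " + looksLikeLzf); // prints true
    }
}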

@jpountz (Contributor, Author) commented on May 21, 2015

I'm reading the description again and I've probably not been clear enough about what changed:

  • If you call CompressorFactory.compressor(ChannelBuffer) and the compression format is not detected, you will get an exception. The call site, MessageChannelHandler.messageReceived, already behaved this way; now the check is done directly in the framework.
  • If you call CompressorFactory.compressor(BytesReference) and the bytes are neither raw xcontent bytes nor compressed xcontent bytes, you will get an exception. In general there is no way to detect whether some bytes are compressed, because nothing prevents an arbitrary bytes reference from having the same header as an LZF block. So we enforce that this method only be used on xcontent bytes, which is how we already use it (with the notable exception of BinaryFieldMapper, which is buggy as described above); see the sketch after this list.
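
Here is a rough, self-contained sketch of the tightened detection contract described in the two bullets above; the header and xcontent checks are simplified stand-ins for the framework code, and the exception type is illustrative.

final class CompressorDetectionSketch {

    // LZF chunks really start with the magic bytes 'Z', 'V'.
    static boolean hasLzfHeader(byte[] bytes) {
        return bytes.length >= 2 && bytes[0] == 'Z' && bytes[1] == 'V';
    }

    // Simplified: real xcontent detection inspects the leading byte(s) for
    // JSON, SMILE, YAML or CBOR markers, not just '{'.
    static boolean looksLikeXContent(byte[] bytes) {
        return bytes.length > 0 && bytes[0] == '{';
    }

    /** "LZF" for compressed xcontent, null for raw xcontent, exception otherwise. */
    static String compressor(byte[] bytes) {
        if (hasLzfHeader(bytes)) {
            return "LZF";
        }
        if (looksLikeXContent(bytes)) {
            return null; // raw xcontent: nothing to decompress
        }
        // Neither compressed nor xcontent: fail loudly instead of guessing.
        throw new IllegalStateException("bytes are neither compressed nor xcontent");
    }
}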

@jpountz (Contributor, Author) commented on May 21, 2015

I opened #11280 to fix the BinaryFieldMapper issue.

@jpountz (Contributor, Author) commented on May 22, 2015

I pushed a new commit to also use DEFLATE (with a compression level of 3) instead of LZF. The reason is that even though LZF might be faster, we have had issues with it recently, like #7210 and #7468, and we expect that something more widely used, like DEFLATE, will protect us better from corruption in the future.
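
In plain JDK terms, "DEFLATE with a compression level of 3" boils down to something like the following; the framework wraps this in its own stream abstractions, so this is only a sketch of the setting, not the actual implementation.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class DeflateLevel3 {
    public static void main(String[] args) throws IOException {
        byte[] source = "{\"field\":\"some xcontent bytes\"}".getBytes(StandardCharsets.UTF_8);

        // Level 3 trades a little compression ratio for speed.
        Deflater deflater = new Deflater(3);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream stream = new DeflaterOutputStream(out, deflater)) {
            stream.write(source);
        }
        deflater.end();

        System.out.println(source.length + " -> " + out.size() + " bytes");
    }
}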


package org.elasticsearch.common.compress;

/** Exception indicating that we were expecting something compressed, which

@rjernst (Member) commented on this diff:

copy-pasted from NotCompressedException?

@rjernst (Member) commented on May 23, 2015

LGTM, I left a couple minor comments.

@jpountz force-pushed the fix/simplify_compression branch 2 times, most recently from 2bc10bf to e725532, on May 29, 2015 08:39
@jpountz (Contributor, Author) commented on May 29, 2015

@rjernst Thanks for the review, I pushed a new commit.

@rjernst (Member) commented on May 29, 2015

LGTM

jpountz added a commit that referenced this pull request on May 29, 2015:

Internal: tighten up our compression framework.

LZF only stays for backward-compatibility reasons and can only be read, not
written. DEFLATE is configured to use level=3, which is a nice trade-off
between speed and compression ratio, and is the same level we use for Lucene's
high-compression codec.
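
As a rough illustration of that backward-compatibility split, a read-side dispatcher along these lines keeps legacy LZF blobs readable while all new writes go through DEFLATE. The 'Z', 'V' magic is LZF's real chunk header, but the "DFL\0" marker below is an assumption for illustration, not necessarily what the framework actually writes.

import java.util.Arrays;

final class CompressionDispatchSketch {

    private static final byte[] LZF_MAGIC = { 'Z', 'V' };
    // Assumed marker for illustration only; the real DEFLATE header may differ.
    private static final byte[] DEFLATE_MAGIC = { 'D', 'F', 'L', 0 };

    static boolean startsWith(byte[] bytes, byte[] prefix) {
        return bytes.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(bytes, prefix.length), prefix);
    }

    /** Reads may encounter either format; writes should always pick DEFLATE. */
    static String codecFor(byte[] stored) {
        if (startsWith(stored, LZF_MAGIC)) {
            return "LZF (legacy, read-only)";
        }
        if (startsWith(stored, DEFLATE_MAGIC)) {
            return "DEFLATE (level 3, read/write)";
        }
        throw new IllegalStateException("unknown stored format");
    }
}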
@jpountz merged commit 0f3206e into elastic:master on May 29, 2015
@kevinkluge removed the review label on May 29, 2015
@jpountz deleted the fix/simplify_compression branch on May 29, 2015 15:23
@clintongormley changed the title from "Internal: tighten up our compression framework." to "Tighten up our compression framework." on Jun 7, 2015