@Myasuka
Member

@Myasuka Myasuka commented Feb 25, 2019

What is the purpose of the change

This PR was created based on the discussion in PR-7515.

LZ4 is a popular lightweight compression algorithm that performs better than Snappy in many cases and is also recommended by RocksDB.

Based on this, I introduce LZ4 in addition to the existing Snappy compression for keyed state in full checkpoints and savepoints.

Brief change log

  • Introduce a new StreamCompressionDecoratorSnapshot interface. Its relationship to StreamCompressionDecorator is just like that of TypeSerializerSnapshot to TypeSerializer. We serialize the StreamCompressionDecoratorSnapshot within KeyedBackendSerializationProxy so that we can even support a user-defined StreamCompressionDecorator.
  • Add a new abstract method setCompressionDecorator in AbstractStateBackend.
  • Bump KeyedBackendSerializationProxy to a newer version to support customized compression decorator.
  • Migrate existing tests to use LZ4 compression.
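
The idea behind the first and third items can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: the `Decorator` interface, `snapshotId()`, `restore()`, and the class name are hypothetical stand-ins, and the JDK's Deflate streams are used in place of LZ4 only because lz4-java is not part of the JDK. It shows how writing an identifier for the compression decorator ahead of the compressed payload lets the reader pick the matching decorator on restore.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class CompressionSnapshotSketch {

    /** Hypothetical stand-in for a stream compression decorator. */
    interface Decorator {
        String snapshotId(); // hypothetical identifier persisted as the "snapshot"
        OutputStream wrap(OutputStream out) throws IOException;
        InputStream wrap(InputStream in) throws IOException;
    }

    /** Pass-through decorator: no compression. */
    static final Decorator NONE = new Decorator() {
        public String snapshotId() { return "none"; }
        public OutputStream wrap(OutputStream out) { return out; }
        public InputStream wrap(InputStream in) { return in; }
    };

    /** Deflate decorator; stands in for LZ4, which is not in the JDK. */
    static final Decorator DEFLATE = new Decorator() {
        public String snapshotId() { return "deflate"; }
        public OutputStream wrap(OutputStream out) { return new DeflaterOutputStream(out); }
        public InputStream wrap(InputStream in) { return new InflaterInputStream(in); }
    };

    /** Resolves a persisted identifier back to a decorator. */
    static Decorator restore(String id) {
        switch (id) {
            case "none": return NONE;
            case "deflate": return DEFLATE;
            default: throw new IllegalArgumentException("Unknown decorator: " + id);
        }
    }

    /** Writes the decorator identifier uncompressed, then the decorated payload. */
    static byte[] write(Decorator decorator, byte[] state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream header = new DataOutputStream(bytes);
        header.writeUTF(decorator.snapshotId());
        header.flush();
        try (OutputStream body = decorator.wrap(bytes)) {
            body.write(state);
        }
        return bytes.toByteArray();
    }

    /** Reads the identifier first, then decodes the payload with the matching decorator. */
    static byte[] read(byte[] checkpoint) throws IOException {
        ByteArrayInputStream bytes = new ByteArrayInputStream(checkpoint);
        DataInputStream header = new DataInputStream(bytes);
        Decorator decorator = restore(header.readUTF());
        try (InputStream body = decorator.wrap(bytes)) {
            return body.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] state = "keyed-state keyed-state keyed-state".getBytes(StandardCharsets.UTF_8);
        byte[] restored = read(write(DEFLATE, state));
        System.out.println("round trip ok: " + Arrays.equals(state, restored));
    }
}
```

Because the identifier travels with the data, the reader does not need to know in advance which compression was configured when the snapshot was taken.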

Verifying this change

This change added tests and can be verified as follows:

  • Extend unit tests SerializationProxiesTest and StateSnapshotCompressionTest for the newly added compression type.
  • Add unit test testSetCompressionDecorator within StateBackendTestBase to verify that different state backends can set a StreamCompressionDecorator correctly.
  • Migrate EventTimeWindowCheckpointingITCase IT cases to use LZ4 compression.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes, add lz4-java dependency.
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no, but changed the KeyedBackendSerializationProxy
  • The runtime per-record code paths (performance sensitive): no, should not affect topology task performance.
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

@flinkbot
Collaborator

flinkbot commented Feb 25, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@Myasuka Myasuka marked this pull request as ready for review February 25, 2019 09:36
@Myasuka Myasuka force-pushed the lz4-compression-verify branch from 38210f1 to ce48769 Compare February 27, 2019 15:05
@Myasuka
Member Author

Myasuka commented Feb 27, 2019

Since #7586 has already been merged, I need to rebase onto the latest code.

@Myasuka
Member Author

Myasuka commented Apr 1, 2019

Since Yu has finished refactoring the state backend builder, I have to rebase onto the latest code.
CC @StefanRRichter

@Myasuka Myasuka force-pushed the lz4-compression-verify branch from c59b5a7 to def5147 Compare June 11, 2019 10:13
@Myasuka
Member Author

Myasuka commented Jun 11, 2019

Rebased onto the latest code again...

@github-actions

This PR is being marked as stale since it has not had any activity in the last 180 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the [community](https://flink.apache.org/what-is-flink/community/).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.

@flinkbot
Collaborator

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@github-actions

This PR has been closed since it has not had any activity in 120 days.
If you feel like this was a mistake, or you would like to continue working on it,
please feel free to re-open the PR and ask for a review.

4 participants