@Myasuka
Member

@Myasuka Myasuka commented Jan 23, 2020

What is the purpose of the change

Add an end-to-end test for controlling RocksDB memory usage. The test job has 4 states in 4 different operators, and all the operators share one slot.

NOTE: This end-to-end test could be unstable when there are too many unflushed immutable mem-tables. I wrote a doc explaining how the write buffer manager works in RocksDB; it also explains why, in the worst case, the total memory usage can be much higher than expected.

Below are the general test results:
1GB TM, 2 slots, each without memory control. For a fair comparison, I also cache index & filter blocks in the block cache but do not change any other RocksDB configuration.
When we do not control memory usage across RocksDB instances, the total memory is the sum of block-cache-usage + total-mem-table over all 4 states. As you can see, the total memory usage in one slot can exceed 400MB.
[Screenshot: memory usage without memory control]
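
The summation described above can be sketched in a few lines of Java. This is only an illustration: `InstanceUsage` and `totalBytes` are hypothetical names, not part of Flink or of this PR, and the numbers in `main` are made-up placeholders rather than the measured values from the screenshot.

```java
import java.util.Arrays;
import java.util.List;

public class UncontrolledMemoryEstimate {

    /** Hypothetical holder for the two metrics of one RocksDB instance, in bytes. */
    static final class InstanceUsage {
        final long blockCacheUsage;
        final long memTableTotal;

        InstanceUsage(long blockCacheUsage, long memTableTotal) {
            this.blockCacheUsage = blockCacheUsage;
            this.memTableTotal = memTableTotal;
        }
    }

    /** Sums block cache usage and total mem-table size over all instances in one slot. */
    static long totalBytes(List<InstanceUsage> instances) {
        long total = 0L;
        for (InstanceUsage usage : instances) {
            total += usage.blockCacheUsage + usage.memTableTotal;
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Placeholder values for the 4 states; NOT measurements from this PR.
        List<InstanceUsage> slot = Arrays.asList(
                new InstanceUsage(60 * mb, 40 * mb),
                new InstanceUsage(60 * mb, 40 * mb),
                new InstanceUsage(60 * mb, 40 * mb),
                new InstanceUsage(60 * mb, 40 * mb));
        System.out.println("Total in slot: " + totalBytes(slot) / mb + " MB");
    }
}
```

Without a shared cache each instance contributes its own block cache and mem-tables, so the total grows with the number of states in the slot.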

1GB TM, 2 slots, each with 161061276 bytes of managed off-heap memory
Since all RocksDB instances share the same cache, the total memory usage equals the block cache usage. As you can see, the memory usage stays close to 161061276 bytes.
[Screenshot: block cache usage with memory control]
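
A check of the kind "block cache usage stays close to the per-slot budget" could look roughly like the sketch below. The class name, the `withinBudget` helper, and the 10% tolerance are assumptions made for illustration; the actual test program in this PR may verify the bound differently.

```java
public class SharedCacheBudgetCheck {

    /** Managed off-heap memory per slot as quoted in the description, in bytes. */
    static final long MANAGED_MEMORY_PER_SLOT = 161061276L;

    /**
     * Returns true if the observed block cache usage does not exceed the budget
     * by more than the given fraction (an assumed tolerance, e.g. 0.1 for 10%).
     */
    static boolean withinBudget(long blockCacheUsage, long budgetBytes, double tolerance) {
        if (budgetBytes <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("budget must be positive, tolerance non-negative");
        }
        long allowed = budgetBytes + (long) (budgetBytes * tolerance);
        return blockCacheUsage <= allowed;
    }

    public static void main(String[] args) {
        // Hypothetical observed value; NOT a measurement from this PR.
        long observed = 160_000_000L;
        System.out.println(withinBudget(observed, MANAGED_MEMORY_PER_SLOT, 0.1));
    }
}
```

A tolerance is included because, as the note about unflushed immutable mem-tables says, the actual usage can overshoot the configured budget.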

Brief change log

Add end-to-end test for controlling RocksDB memory usage.

Verifying this change

This change added tests and can be verified as follows:

  • Added RocksDBStateMemoryControlTestProgram to verify end-to-end.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit abfc351 (Thu Jan 23 07:53:33 UTC 2020)

Warnings:

  • 2 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Member

@carp84 carp84 left a comment

The design of the test looks good to me, please check my inline comments.

Could you also trigger the e2e tests in Travis after resolving the comments, to confirm that the newly added tests pass, @Myasuka? Thanks.

@flinkbot
Collaborator

flinkbot commented Jan 23, 2020

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@StephanEwen
Contributor

Nice work, @Myasuka and @carp84.

With Chinese New Year happening now, I can take this over from here and address the remaining comments.

@asfgit asfgit closed this in 6128028 Jan 23, 2020
StephanEwen pushed a commit to StephanEwen/flink that referenced this pull request Jan 23, 2020
asfgit pushed a commit that referenced this pull request Jan 23, 2020
JTaky pushed a commit to JTaky/flink that referenced this pull request Feb 20, 2020

5 participants