
Conversation

@Myasuka (Member) commented Aug 16, 2021

What is the purpose of the change

Enable the strict capacity limit for RocksDB's memory usage.

Brief change log

Change the LRUCache constructor call to enable strict capacity limit mode.
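For context, a minimal RocksJava sketch of what the strict capacity limit flag looks like; the class name and constants are illustrative, not the actual Flink call site:

import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.RocksDB;

public class StrictCacheSketch {
    public static void main(String[] args) {
        RocksDB.loadLibrary();

        long capacity = 256 * 1024 * 1024;   // block cache capacity in bytes (illustrative)
        int numShardBits = -1;               // let RocksDB pick the shard count
        boolean strictCapacityLimit = true;  // the flag this PR enables
        double highPriPoolRatio = 0.1;       // share reserved for high-priority blocks

        // With strictCapacityLimit = true, an insert that would exceed `capacity`
        // fails instead of letting the cache grow past its budget.
        try (Cache blockCache =
                new LRUCache(capacity, numShardBits, strictCapacityLimit, highPriPoolRatio)) {
            System.out.println("Created strict-capacity LRUCache of " + capacity + " bytes");
        }
    }
}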

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot (Collaborator) commented

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit cb9bab9 (Mon Aug 16 04:41:38 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@carp84 (Member) left a comment


+1. However, it seems to me the fix in RocksDB only prevents the process from crashing rather than completely restricting the memory usage, right? And we still need the calculations to reserve enough space and prevent memory over-usage? If so, I suggest we leave an explicit note in JIRA.

cc @StephanEwen

@flinkbot (Collaborator) commented Aug 16, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@Myasuka (Member, Author) commented Aug 16, 2021

@carp84 , RocksDB's implementation of memory control actually has two parts:

  1. The WriteBufferManager controls the memory usage of the write buffers; it accounts for that memory by inserting dummy blocks into the block cache.
  2. The block cache manages the memory of all index, filter, data, and dummy blocks.

A strict-limit block cache helps bound the block cache usage, and we still need the calculations in the write buffer manager since it controls the write buffers.
However, I noticed the CI failed due to the memory control end-to-end test. I am still investigating...
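For reference, a minimal RocksJava sketch of how these two parts fit together; the class name and sizes are illustrative and this is not Flink's actual configuration code. The WriteBufferManager charges write-buffer memory against the same cache that serves index, filter, and data blocks, so a strict-capacity cache bounds both:

import org.rocksdb.Cache;
import org.rocksdb.DBOptions;
import org.rocksdb.LRUCache;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBufferManager;

public class SharedMemoryBudgetSketch {
    public static void main(String[] args) {
        RocksDB.loadLibrary();

        long totalBudget = 256 * 1024 * 1024;     // shared memory budget in bytes (illustrative)
        long writeBufferBudget = totalBudget / 2; // portion accounted to write buffers (illustrative)

        try (Cache blockCache = new LRUCache(totalBudget, -1, /* strictCapacityLimit */ true, 0.1);
             // The write buffer manager books its usage as dummy blocks in the same cache.
             WriteBufferManager writeBufferManager = new WriteBufferManager(writeBufferBudget, blockCache);
             DBOptions dbOptions = new DBOptions()
                     .setCreateIfMissing(true)
                     .setWriteBufferManager(writeBufferManager)) {
            // Pass dbOptions when opening the database: memtables and the block cache
            // then draw from one budget that the strict-capacity LRUCache enforces.
        }
    }
}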

@carp84 (Member) commented Aug 16, 2021

(quoting @Myasuka's explanation above)

I see, thanks for the information.

@Myasuka (Member, Author) commented Aug 17, 2021

The test failed because an insert into the block cache failed; it seems RocksDB throws an exception directly when there is no more space in the block cache. I'll check what can be done to avoid this.

2021-08-16 07:18:45,145 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - ValueStateMapper (2/2)#0 (f3c4c9f3b6ff23a52ac2d2c0fbf66b96) switched from RUNNING to FAILED with failure cause: org.apache.flink.util.FlinkRuntimeException: Error while retrieving data from RocksDB.
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:91)
	at org.apache.flink.streaming.tests.RocksDBStateMemoryControlTestProgram$ValueStateMapper.map(RocksDBStateMemoryControlTestProgram.java:130)
	at org.apache.flink.streaming.tests.RocksDBStateMemoryControlTestProgram$ValueStateMapper.map(RocksDBStateMemoryControlTestProgram.java:103)
	at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:38)
	at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:230)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
	at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:487)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:817)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:744)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:783)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:726)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:786)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:572)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.rocksdb.RocksDBException: Insert failed due to LRU cache being full.
	at org.rocksdb.RocksDB.get(Native Method)
	at org.rocksdb.RocksDB.get(RocksDB.java:2084)
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.value(RocksDBValueState.java:83)
	... 16 more

@github-actions commented

This PR is being marked as stale since it has not had any activity in the last 180 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the community (https://flink.apache.org/what-is-flink/community/).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.

@github-actions commented

This PR has been closed since it has not had any activity in 120 days.
If you feel like this was a mistake, or you would like to continue working on it,
please feel free to re-open the PR and ask for a review.
