Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-30328][tests] Fix unstable TaskManagerWideRocksDbMemorySharingITCase #21778

Merged
merged 2 commits into from Feb 4, 2023

Conversation

rkhachatryan
Copy link
Contributor

TaskManagerWideRocksDbMemorySharingITCase is unstable because it uses metrics to validate RocksDB cache/wbm sharing.
This PR extract an interface to create those shared objects (WBM/cache) and configures state backend to use its test implementation.
As a side-effect, mocking is removed from RocksDBMemoryControllerUtilsTest.

cc: @Myasuka

@rkhachatryan
Copy link
Contributor Author

@AlanConfluent could you please review this PR?

@flinkbot
Copy link
Collaborator

flinkbot commented Jan 29, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

}
Assert.assertEquals(1, createdCaches.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this seems like a much more reliable way to check the cache directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, if they were not sharing the cache as intended, you would have more than one in this list right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right: there will be a cache per task if memory is shared inside slot, and no caches if memory is not shared.

@Myasuka
Copy link
Member

Myasuka commented Feb 1, 2023

RocksDB actually cannot limit the native memory usage perfectly, why we can make the test stable if not leveraging metrics to detect the memory usage?

@rkhachatryan
Copy link
Contributor Author

RocksDB actually cannot limit the native memory usage perfectly, why we can make the test stable if not leveraging metrics to detect the memory usage?

This PR changes the test from measuring memory usage to directly verifying the number of Cache/WBM objects that were actually created (regardless of capacity). Sharing of those objects was the goal of the main PR.

@rkhachatryan
Copy link
Contributor Author

CI failures are caused by FLINK-29427 - waiting to be resolved
(the test that this PR fixes is currently disabled on master).

@rkhachatryan rkhachatryan merged commit 7b65644 into apache:master Feb 4, 2023
@rkhachatryan rkhachatryan deleted the f30328-extract-interface branch February 4, 2023 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants