[FLINK-33354][runtime] Cache TaskInformation and JobInformation to avoid deserializing duplicate big objects #23599
Conversation
Force-pushed from dfcfdbe to 3bdc6b8 (…o a generic GroupCache)
Force-pushed from 3bdc6b8 to bb483e4 (…oid deserializing duplicate big objects)
Force-pushed from bb483e4 to 63697a2 (… avoid contiguous huge memory usage)
Force-pushed from 63697a2 to 93ce477
Hi @pnowojski @RocMarshal, would you mind taking a look at this PR in your free time? Thanks a lot!
Hi @huwh, would you mind taking a look at this PR in your free time as well? This improvement is very similar to FLINK-32386, which you contributed, and this PR refactors the ShuffleDescriptorsCache into a generic GroupCache, so it would be great if you could join this review. Thanks a lot!
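For context, here is a rough sketch of the shape such a generic GroupCache could take; the names and signatures below are illustrative and may differ from the actual interface introduced by the PR:

/**
 * Illustrative sketch of a generic group-scoped cache; not the exact Flink API.
 * G is the group (e.g. a JobID), K the cache key, V the cached value.
 */
public interface GroupCache<G, K, V> {

    /** Returns the cached value for the given group and key, or null if absent. */
    V get(G group, K key);

    /** Caches the value under the given group and key. */
    void put(G group, K key, V value);

    /** Drops every entry belonging to the group, e.g. when its job terminates. */
    void clearCacheForGroup(G group);
}

Scoping eviction to a group lets all entries for a finished job be released at once, which is the property the cache needs to avoid leaking entries across jobs.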
Thanks @1996fanrui for the contribution.
I left a few comments.
Please let me know what you think~ :)
cachedBlobKeysPerJob.computeIfPresent(
        cacheKey.getGroup(),
        (group, keys) -> {
            keys.remove(cacheKey);
            if (keys.isEmpty()) {
                return null;
            } else {
                return keys;
            }
        });
Would there be a risk of memory leakage here?
For example, consider this situation:
- There are very many groups.
- For each group, one by one: a set is added for the group and later removed for that same group, but the key has not been removed.
Would there be many entries of the form Entry-i<Group-i, set-i> (where set-i is empty or null)?
In short, could cachedBlobKeysPerJob degenerate into a collection with too many elements?
Please correct me if I have misread anything.
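For context on this question: Map.computeIfPresent removes the mapping entirely when the remapping function returns null, so an emptied group entry does not linger. A minimal standalone sketch (with simple String keys standing in for the real cache types):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class GroupRemovalDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> cachedBlobKeysPerJob = new ConcurrentHashMap<>();
        cachedBlobKeysPerJob
                .computeIfAbsent("job-1", group -> ConcurrentHashMap.newKeySet())
                .add("blob-key-1");

        // Returning null from computeIfPresent removes the whole map entry,
        // so the group does not survive as an Entry<Group, emptySet>.
        cachedBlobKeysPerJob.computeIfPresent(
                "job-1",
                (group, keys) -> {
                    keys.remove("blob-key-1");
                    return keys.isEmpty() ? null : keys;
                });

        System.out.println(cachedBlobKeysPerJob.containsKey("job-1")); // prints: false
    }
}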
if (serializedTaskInformation instanceof NonOffloaded) {
    NonOffloaded<TaskInformation> taskInformation =
            (NonOffloaded<TaskInformation>) serializedTaskInformation;
    return taskInformation.serializedValue.deserializeValue(getClass().getClassLoader());
}
throw new IllegalStateException(
        "Trying to work with offloaded serialized task information.");
How about
Preconditions.checkState(
serializedJobInformation instanceof NonOffloaded,
"Trying to work with offloaded serialized job information.");
NonOffloaded<JobInformation> jobInformation =
(NonOffloaded<JobInformation>) serializedJobInformation;
return jobInformation.serializedValue.deserializeValue(getClass().getClassLoader());
?
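As a side note, the deserializeValue call in both variants follows Flink's usual SerializedValue round trip; a minimal standalone sketch of that round trip (not the PR code):

import org.apache.flink.util.SerializedValue;

public class SerializedValueDemo {
    public static void main(String[] args) throws Exception {
        // Wrap an object into serialized bytes once...
        SerializedValue<String> value = new SerializedValue<>("task information payload");

        // ...and deserialize it later with an explicit class loader,
        // as the snippets above do with getClass().getClassLoader().
        String restored = value.deserializeValue(SerializedValueDemo.class.getClassLoader());
        System.out.println(restored);
    }
}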
Thanks @1996fanrui for preparing this PR. LGTM with a minor comment.
flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java
Thanks for the update & review.
LGTM +1.
Thanks for the review, merging~
What is the purpose of the change
The background is similar to FLINK-33315.
We have a Hive table with a lot of data, and its HiveSource#partitionBytes is 281 MB. With slotPerTM = 4, one TM runs 4 HiveSources at the same time.
How does the TaskExecutor submit a large task?
Based on the above process, TM memory holds 2 big byte arrays for each task:
When one TM runs 4 HiveSources at the same time, it holds 8 big byte arrays.
In our production environment, this situation often leads to TM OOMs.
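Back-of-the-envelope, assuming each of those big byte arrays is roughly the 281 MB of partitionBytes: 4 tasks × 2 arrays × 281 MB ≈ 2.2 GB of duplicated serialized data on a single TM.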
Brief change log
Verifying this change
Improve the old tests:
Add a new test: DefaultGroupCacheTest#testTaskInformationCache
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation