[FLINK-33354][runtime] Cache TaskInformation and JobInformation to avoid deserializing duplicate big objects #23599
Conversation
Force-pushed from dfcfdbe to 3bdc6b8 (…o a generic GroupCache)
Force-pushed from 3bdc6b8 to bb483e4 (…oid deserializing duplicate big objects)
Force-pushed from bb483e4 to 63697a2 (… avoid contiguous huge memory usage)
Force-pushed from 63697a2 to 93ce477
Hi @pnowojski @RocMarshal, would you mind taking a look at this PR in your free time? Thanks a lot!
Hi @huwh, would you mind taking a look at this PR in your free time as well? This improvement is very similar to FLINK-32386, which you contributed, and this PR refactors the ShuffleDescriptorsCache into a generic GroupCache, so it would be great if you could join this review. Thanks a lot!
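For context, here is a rough sketch of the shape such a generic GroupCache could take; the names and signatures below are illustrative and may differ from the actual interface introduced by the PR:

/**
 * Illustrative sketch of a generic group-scoped cache; not the exact Flink API.
 * G is the group (e.g. a JobID), K the cache key, V the cached value.
 */
public interface GroupCache<G, K, V> {

    /** Returns the cached value for the given group and key, or null if absent. */
    V get(G group, K key);

    /** Caches the value under the given group and key. */
    void put(G group, K key, V value);

    /** Drops every entry belonging to the group, e.g. when its job terminates. */
    void clearCacheForGroup(G group);
}

Scoping eviction to a group lets all entries for a finished job be released at once, which is the property the cache needs to avoid leaking entries across jobs.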
Thanks @1996fanrui for the contribution.
I left a few comments.
Please let me know what you think~ :)
cachedBlobKeysPerJob.computeIfPresent(
        cacheKey.getGroup(),
        (group, keys) -> {
            keys.remove(cacheKey);
            if (keys.isEmpty()) {
                return null;
            } else {
                return keys;
            }
        });
Would there be a risk of memory leakage here?
For example, consider this situation:
- There are very many groups.
- For each group, one by one: a set is added for the group and later removed for that same group, but the key has not been removed.
Would there be many entries of the form Entry-i<Group-i, set-i> (where set-i is empty or null)?
In short, could cachedBlobKeysPerJob degenerate into a collection with too many elements?
Please correct me if I have misread anything.
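For context on this question: Map.computeIfPresent removes the mapping entirely when the remapping function returns null, so an emptied group entry does not linger. A minimal standalone sketch (with simple String keys standing in for the real cache types):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class GroupRemovalDemo {
    public static void main(String[] args) {
        Map<String, Set<String>> cachedBlobKeysPerJob = new ConcurrentHashMap<>();
        cachedBlobKeysPerJob
                .computeIfAbsent("job-1", group -> ConcurrentHashMap.newKeySet())
                .add("blob-key-1");

        // Returning null from computeIfPresent removes the whole map entry,
        // so the group does not survive as an Entry<Group, emptySet>.
        cachedBlobKeysPerJob.computeIfPresent(
                "job-1",
                (group, keys) -> {
                    keys.remove("blob-key-1");
                    return keys.isEmpty() ? null : keys;
                });

        System.out.println(cachedBlobKeysPerJob.containsKey("job-1")); // prints: false
    }
}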
if (serializedTaskInformation instanceof NonOffloaded) {
    NonOffloaded<TaskInformation> taskInformation =
            (NonOffloaded<TaskInformation>) serializedTaskInformation;
    return taskInformation.serializedValue.deserializeValue(getClass().getClassLoader());
}
throw new IllegalStateException(
        "Trying to work with offloaded serialized task information.");
How about
Preconditions.checkState(
serializedJobInformation instanceof NonOffloaded,
"Trying to work with offloaded serialized job information.");
NonOffloaded<JobInformation> jobInformation =
(NonOffloaded<JobInformation>) serializedJobInformation;
return jobInformation.serializedValue.deserializeValue(getClass().getClassLoader());
?
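As a side note, the deserializeValue call in both variants follows Flink's usual SerializedValue round trip; a minimal standalone sketch of that round trip (not the PR code):

import org.apache.flink.util.SerializedValue;

public class SerializedValueDemo {
    public static void main(String[] args) throws Exception {
        // Wrap an object into serialized bytes once...
        SerializedValue<String> value = new SerializedValue<>("task information payload");

        // ...and deserialize it later with an explicit class loader,
        // as the snippets above do with getClass().getClassLoader().
        String restored = value.deserializeValue(SerializedValueDemo.class.getClassLoader());
        System.out.println(restored);
    }
}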
Thanks @1996fanrui for preparing this PR. LGTM with a minor comment.
flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java
Thanks for the update & review.
LGTM +1.
Thanks for the review, merging~
What is the purpose of the change
The background is similar to FLINK-33315.
We have a Hive table with a lot of data, and its HiveSource#partitionBytes is 281 MB. With slotPerTM = 4, one TM runs 4 HiveSources at the same time.
How does the TaskExecutor submit a large task?
Based on the above process, TM memory holds 2 big byte arrays for each task:
When one TM runs 4 HiveSources at the same time, it holds 8 big byte arrays.
In our production environment, this situation often leads to TM OOMs.
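Back-of-the-envelope, assuming each of those big byte arrays is roughly the 281 MB of partitionBytes: 4 tasks × 2 arrays × 281 MB ≈ 2.2 GB of duplicated serialized data on a single TM.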
Brief change log
Verifying this change
Improve the old tests:
Add a new test: DefaultGroupCacheTest#testTaskInformationCache
Does this pull request potentially affect one of the following parts:
@Public(Evolving): no
Documentation