Skip to content

Conversation

wsry
Copy link
Contributor

@wsry wsry commented Mar 10, 2023

What is the purpose of the change

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

Brief change log

  • Reserve the guaranteed buffers on initializing for SortMergeResultPartition.

Verifying this change

This change added tests.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@wsry wsry requested a review from reswqa March 10, 2023 08:34
@flinkbot
Copy link
Collaborator

flinkbot commented Mar 10, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

wsry pushed a commit to wsry/flink that referenced this pull request Mar 10, 2023
…huffle

This closes apache#22148.

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.
@wsry wsry force-pushed the release-1.17-FLINK-31386 branch from 782cd9d to e7de8c4 Compare March 10, 2023 08:41
wsry pushed a commit to wsry/flink that referenced this pull request Mar 10, 2023
…huffle

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

This closes apache#22148.
@wsry wsry force-pushed the release-1.17-FLINK-31386 branch from e7de8c4 to ed627bf Compare March 10, 2023 08:56
Copy link
Member

@reswqa reswqa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wsry for the fix, this changes looks good to me overall. I only left some minor comments, PTAL.

Comment on lines 192 to 199
try {
for (int i = 0; i < bufferPool.getNumberOfRequiredMemorySegments(); ++i) {
freeSegments.add(checkNotNull(bufferPool.requestMemorySegmentBlocking()));
}
} catch (Throwable throwable) {
freeSegments.forEach(bufferPool::recycle);
throw new IOException("Failed to reserve required buffers.", throwable);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract this logic to a method to reuse this, because requestNetworkBuffers also request buffers to guaranteed.

…huffle

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

This closes apache#22148.
@wsry wsry force-pushed the release-1.17-FLINK-31386 branch from ed627bf to 7b5e869 Compare March 10, 2023 11:58
@wsry
Copy link
Contributor Author

wsry commented Mar 10, 2023

@reswqa Thanks for the review and feedback. I have updated the PR.

Copy link
Member

@reswqa reswqa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wsry for the updated,LGTM!

@wsry wsry merged commit b90bf7a into apache:release-1.17 Mar 10, 2023
wsry pushed a commit to wsry/flink that referenced this pull request Mar 10, 2023
…huffle

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

This closes apache#22148.
wsry pushed a commit that referenced this pull request Mar 11, 2023
…huffle

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

This closes #22148.
reswqa pushed a commit to reswqa/flink that referenced this pull request Mar 14, 2023
…huffle

Currently, the SortMergeResultPartition may allocate more network buffers than the guaranteed size of the LocalBufferPool. As a result, some result partitions may need to wait other result partitions to release the over-allocated network buffers to continue. However, the result partitions which have allocated more than guaranteed buffers relies on the processing of input data to trigger data spilling and buffer recycling. The input data further relies on batch reading buffers used by the SortMergeResultPartitionReadScheduler which may already taken by those blocked result partitions that are waiting for buffers. Then deadlock occurs. This patch fixes the deadlock issue by reserving the guaranteed buffers on initializing.

This closes apache#22148.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants