Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-33360] HybridSourceSplitEnumerator clear finishedReaders when … #23593

Merged
merged 4 commits into from
Oct 30, 2023

Conversation

fengjiajie
Copy link
Contributor

@fengjiajie fengjiajie commented Oct 25, 2023

…switchEnumerator

What is the purpose of the change

org.apache.flink.connector.base.source.hybrid.HybridSourceSplitEnumerator:

            // track readers that have finished processing for current enumerator
            finishedReaders.add(subtaskId);
            if (finishedReaders.size() == context.currentParallelism()) {
                LOG.debug("All readers finished, ready to switch enumerator!");
                if (currentSourceIndex + 1 < sources.size()) {
                    switchEnumerator();
                    // switch all readers prior to sending split assignments
                    for (int i = 0; i < context.currentParallelism(); i++) {
                        sendSwitchSourceEvent(i, currentSourceIndex);
                    }
                }
            }

I think that finishedReaders is used to keep track of all the subTaskIds that have finished reading the current round of the source. Therefore, in the switchEnumerator function, finishedReaders should be cleared:

If it's not cleared, then in the next source reading, whenever any SourceReader reports a SourceReaderFinishedEvent (while other SourceReaders may not have finished processing in parallel), the condition finishedReaders.size() == context.currentParallelism() will be satisfied and it will trigger sendSwitchSourceEvent(i, currentSourceIndex), sending a SwitchSourceEvent to all SourceReaders.
If a SourceReader receives a SwitchSourceEvent before it finishes reading the previous source, it will execute currentReader.close(), and some data may not be fully read, resulting in a partial data loss in the source.

Brief change log

Fix the issue of HybridSource prematurely closing the reader and missing data.

Verifying this change

When each shard takes a long time to process, it may result in data loss. The fix has been validated in my scenario and proven effective.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Oct 25, 2023

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@fengjiajie
Copy link
Contributor Author

@flinkbot run azure

@Apache9
Copy link

Apache9 commented Oct 26, 2023

Can we add a UT to reproduce the problem?

assertThat(getCurrentSourceIndex(enumerator))
.as(
"only reader-0 has finished reading, reader-1 is not yet done, so do not switch to the next source")
.isEqualTo(1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR modification, the assert will fail here, which means switching to the next source only after one reader completes reading

@leonardBang leonardBang self-requested a review October 26, 2023 16:40
Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fengjiajie for the contribution and @Apache9 for the review, I left one comment.

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fengjiajie for the update, LGTM, wait the CI green.

@leonardBang leonardBang merged commit 8d21c32 into apache:master Oct 30, 2023
@leonardBang
Copy link
Contributor

CI passed, merging, @fengjiajie Would you like to backport the fix to release-1.17 and release-1.18 branch?

@fengjiajie
Copy link
Contributor Author

Thanks @leonardBang @Apache9.

I have already submitted the backport:

1.18: #23616
1.17: #23617

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants