@talatuyarer
Contributor

When a job has multiple KafkaIO sources, Beam's default max parallelism applies to all of them, even when a Kafka topic does not have that many partitions. To prevent that, we need to set the max parallelism on the UnboundedSource based on the split count.

This PR fixes that issue.
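
A minimal sketch of the idea, not the code in this PR: cap the parallelism used for a source at the number of splits it actually produces. The class and method names (`SourceParallelism`, `effectiveMaxParallelism`) are hypothetical and exist only for illustration.

```java
// Hypothetical helper, for illustration only; not part of Beam or this PR.
final class SourceParallelism {
    private SourceParallelism() {}

    /**
     * Returns the parallelism to use for one source: the job-wide default,
     * capped at the number of splits the source produced.
     */
    public static int effectiveMaxParallelism(int defaultMaxParallelism, int splitCount) {
        if (defaultMaxParallelism < 1 || splitCount < 1) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return Math.min(defaultMaxParallelism, splitCount);
    }

    public static void main(String[] args) {
        // A topic with 4 partitions under a default max parallelism of 128.
        System.out.println(effectiveMaxParallelism(128, 4)); // prints 4
        // A default below the split count is left unchanged.
        System.out.println(effectiveMaxParallelism(2, 4)); // prints 2
    }
}
```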

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Feb 16, 2024
@talatuyarer
Contributor Author

@je-ik Could you review this? Do you see any issues with this PR?

@je-ik
Contributor

je-ik commented Feb 22, 2024

Are there any issues you see with the empty splits? In general, these should simply emit MAX_WATERMARK and then should not cause any overhead. If this is not happening, then there might be some other issue we might want to uncover.

github-actions bot removed the stale label Feb 22, 2024
@talatuyarer
Contributor Author

@je-ik If you set maxParallelism beyond the number of partitions in Kafka, you're basically giving workers extra splits that they don't need for that Kafka topic. This means each worker ends up using more threads than necessary. And when you've got a bunch of KafkaIO sources doing this, it eats up more memory and makes the checkpoints bigger. Bigger checkpoints can limit how often you can take them, which isn't great for keeping things running smoothly.

Is there any benefit to setting it higher than the partition count?
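
To make the concern concrete, here is a rough sketch of how many subtasks end up with nothing to read when parallelism exceeds the partition count. It assumes a simple round-robin assignment, which may not match the runner's actual assignment logic, and `IdleSplits` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: assumes round-robin assignment of partitions to subtasks.
final class IdleSplits {
    private IdleSplits() {}

    /** Distributes partition ids 0..partitionCount-1 over `parallelism` subtasks. */
    public static List<List<Integer>> assignRoundRobin(int partitionCount, int parallelism) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            assignment.get(p % parallelism).add(p);
        }
        return assignment;
    }

    /** Counts subtasks that received no partitions at all. */
    public static long countEmpty(List<List<Integer>> assignment) {
        return assignment.stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        // 4 partitions spread over 16 subtasks leaves 12 subtasks idle.
        System.out.println(countEmpty(assignRoundRobin(4, 16))); // prints 12
    }
}
```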

@je-ik
Contributor

je-ik commented Feb 22, 2024

I think there is no particular benefit, at least given the current implementation. On the other hand, before we place a hard limit, we should know it is really needed, that it is not just a manifestation of another issue.

From my understanding, if a reader gets an empty set of partitions, that split should not connect to Kafka or contribute to checkpoint size. It should only emit MAX_WATERMARK (which means that there will be no more data from this split) and sleep forever (or terminate after shutdownSourcesAfterIdleMs milliseconds). Is this not working in your case?
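
Roughly, the behavior I would expect looks like the sketch below. This is plain Java for illustration, not the actual reader code; `EmptySplitWatermark` and its methods are hypothetical, and `Long.MAX_VALUE` stands in for the final watermark.

```java
import java.util.List;

// Illustrative model of the expected behavior of a split with no partitions.
final class EmptySplitWatermark {
    // Stand-in for the final watermark: no more data will come from this split.
    public static final long MAX_WATERMARK = Long.MAX_VALUE;

    private EmptySplitWatermark() {}

    /** An empty split reports the final watermark; otherwise the last event time. */
    public static long watermarkFor(List<Integer> assignedPartitions, long lastEventTimeMillis) {
        return assignedPartitions.isEmpty() ? MAX_WATERMARK : lastEventTimeMillis;
    }

    /** Only a split with at least one partition needs a Kafka connection. */
    public static boolean needsConnection(List<Integer> assignedPartitions) {
        return !assignedPartitions.isEmpty();
    }
}
```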

@github-actions
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

github-actions bot added the stale label Apr 23, 2024
@github-actions
Contributor

github-actions bot commented May 1, 2024

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

github-actions bot closed this May 1, 2024