Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partitioned transactional source phase 1 #930

Merged
merged 2 commits into from
Nov 29, 2019

Conversation

seglo
Copy link
Member

@seglo seglo commented Oct 11, 2019

Purpose

This PR adds a new partitioned source with support for Kafka transactions, Transactions.partitionedSource. A Source is created per partition in a consumer group subscription. Each Source must have a downstream Transactional.sink|flow to commit messages transactionally. A derived transactional.id is used per Transactional.sink|flow that's based on a user-provided transactional.id along with the group.id, topic and partition of the originally consumed message. Transactions.partitionedSource will make it easier to run transactional workloads across consumer groups with multiple members.

This PR is based on the extensive work already completed by @charlibot earlier this year. It was rebased by @raboof and updated to accommodate transactional consistency work completed by @2m and @szymonm. This new branch will serve as the integration point to get this functionality into Alpakka Kafka 2.0.0.

Update 27/11/2019

Due to the large number of changes introduced by this PR it has been repurposed as a "phase 1" to lay the groundwork to continue this work. This PR will not expose the Transactional.partitionedSource to the end user.

References

Phase 1 tasks

Phase 2 tasks (preliminary)

  • Test: Consistency of partitioned source when calling shutdown using draining control
  • Set ConsumerSettings.stopTimeout to 0 to allow for faster shutdowns when a stream completes/fails.
  • Update docs. (to be continued in phase 2)
  • Add more low level testing of draining control logic

@seglo
Copy link
Member Author

seglo commented Oct 17, 2019

I received confirmation that Kafka Streams EoS creates one producer per source partition. This will introduce performance implications which will be addressed in KIP-447, Producer scalability for exactly once semantics. This work is in progress and will land in a future version of Kafka (not 2.4.0).

In the meantime we can proceed with our current implementation to have one producer per partitioned source.

@seglo seglo requested a review from ennru October 17, 2019 22:29
Copy link
Member

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many places would improve with named classes.

@ennru ennru added this to the 2.0.0 milestone Oct 18, 2019
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch 6 times, most recently from 6f77477 to 405687b Compare October 30, 2019 21:08
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch 12 times, most recently from 6364a0a to 209b5b7 Compare November 2, 2019 15:04
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch from b42561a to 44cb47d Compare November 5, 2019 22:41
@ennru ennru added this to In progress in Akka streaming via automation Nov 21, 2019
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch 2 times, most recently from 7934d20 to fad9b2f Compare November 27, 2019 21:39
@seglo seglo changed the title Add partitioned transactional source (2) Add partitioned transactional source phase 1 Nov 27, 2019
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch from fad9b2f to e394850 Compare November 27, 2019 22:34
@seglo seglo requested a review from ennru November 27, 2019 22:37
Copy link
Member

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite a chunk of code.
I'm not sure what is tested and what is not. We should find a way to document that, especially when we move things across integration tests and "fast" tests.

Sprinkle more final on classes.

@ennru ennru moved this from In progress to Ready for review in Akka streaming Nov 28, 2019
@seglo seglo requested a review from ennru November 28, 2019 19:24
@ennru
Copy link
Member

ennru commented Nov 29, 2019

I commented it out and added final, private, and protected at a few places.

@ennru ennru moved this from Ready for review to Review in progress in Akka streaming Nov 29, 2019
Copy link
Member

@ennru ennru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Akka streaming automation moved this from Review in progress to Reviewer approved Nov 29, 2019
@ennru ennru force-pushed the transactional-partitioned-source-rebased branch from 5dd9017 to bb5394b Compare November 29, 2019 11:30
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch from bb5394b to 4617e3e Compare November 29, 2019 15:14
@seglo seglo force-pushed the transactional-partitioned-source-rebased branch from 4617e3e to df5bf2e Compare November 29, 2019 15:35
@seglo seglo merged commit 367edfb into master Nov 29, 2019
Akka streaming automation moved this from Reviewer approved to Done Nov 29, 2019
@seglo seglo deleted the transactional-partitioned-source-rebased branch November 29, 2019 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

3 participants