Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run loop refactor and starting offsets feature #87

Merged
merged 26 commits into from
Oct 19, 2021

Conversation

xianwill
Copy link
Collaborator

@xianwill xianwill commented Oct 13, 2021

This PR refactors the run loop structure and adds support for specifying explicit starting offsets.

Apologies, there is a lot to look at here, but it needed to be done. The lions share of changes are in lib.rs.

The main changes include:

  • Made the entry point a function (start_ingest) instead of a struct method and we now execute the run loop directly within it.
  • Centralized control as much as possible to execute in the run loop.
  • Merged ProcessingState and IngestProcessor into just IngestProcessor.
  • Instead of an Arc<Mutex<PartitionAssignment>>, we now have an Arc<RwLock<Option<RebalanceSignal>>> for responding to rebalance events so we no longer have to hold a lock while updating the PartitionAssignment.

In the near future, I think we may look at breaking some files out of lib.rs for another cleanup pass.

@xianwill xianwill requested review from mosyp and houqp and removed request for mosyp October 18, 2021 15:40
@xianwill xianwill marked this pull request as ready for review October 18, 2021 15:40
@xianwill xianwill requested a review from mosyp October 18, 2021 15:41
Copy link
Contributor

@mosyp mosyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments but overall lgtm, great job!

src/deltalake_ext.rs Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
tests/emails_s3_tests.rs Outdated Show resolved Hide resolved
* Add rustdocs for behavior when starting offsets reference invalid
Kafka offsets
* Log topic instead of table uri (except for checkpoints) to shorten the
log message weight
* Lengthen wait_for_version_created timeout to 3 minutes and panic
instead of returning
* Add token.cancel back to emails_s3_tests
src/lib.rs Outdated Show resolved Hide resolved
@xianwill xianwill merged commit 13917b0 into delta-io:main Oct 19, 2021
@xianwill xianwill deleted the run_loop_refactor-4 branch October 19, 2021 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants