New plumber #1639
 * This is useful when something *may* want to make a Committer, but also may not, and when making the actual
 * Committer could be expensive.
 */
public interface CommitterMaker
Why not just Supplier<Committer>?
No real reason
Sounds good.
If there is no real reason, then I would be inclined to not have another interface and just use Supplier<Committer>.
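To make the suggestion concrete, here is a minimal sketch of how a Supplier<Committer> could defer building an expensive Committer until a persist is actually needed. Everything in it is an assumption for illustration: the Committer interface is a stand-in (a Runnable that also exposes metadata), java.util.function.Supplier is used although the PR may well mean a different Supplier type such as Guava's, and LazyCommitterExample, committerSupplier, and maybePersist are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for the Committer type discussed in this PR:
// a Runnable that also carries commit metadata.
interface Committer extends Runnable {
    Object getMetadata();
}

class LazyCommitterExample {
    // Counts how many Committers were actually built (illustration only).
    static final AtomicInteger built = new AtomicInteger();

    // Returns a Supplier that defers the potentially expensive construction
    // of the Committer until get() is called.
    static Supplier<Committer> committerSupplier(final String offset) {
        return () -> {
            built.incrementAndGet();
            return new Committer() {
                @Override
                public Object getMetadata() {
                    return offset;
                }

                @Override
                public void run() {
                    // Real commit work would go here.
                }
            };
        };
    }

    // Calls get() only when a persist is needed, so no Committer is built otherwise.
    static Object maybePersist(boolean needsPersist, Supplier<Committer> supplier) {
        if (!needsPersist) {
            return null;
        }
        Committer committer = supplier.get();
        committer.run();
        return committer.getMetadata();
    }
}
```

With this shape, a separate CommitterMaker interface adds nothing beyond a more descriptive name, which is the trade-off the thread is weighing.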
What is the path of reconciliation if the commit fails? How is a commit failure indicated?
@drcrallen Commit failure of what kind indicated to who?
@gianm Right now any sort of error in the commit results in complete failure of the persist. I think I'm misinterpreting the purpose of this PR. Is it possible to split up the PR more? At a cursory glance it might be able to be split into:
@drcrallen This PR is relatively small and I think there is a decent description of the scope of changes. Why do we need to split into 5 different PRs?
@fjy Because there are a lot of concepts in one PR, even though the line count is small. There seem to be code maintenance changes, responsibility changes (who handles time and row related persisting), and API version-ish changes (migrating from Runnable to Committer from Firehosev2)
And they are done in a "will affect everybody (who uses plumber/indexing stuff)" kind of manner.
In general, on its own, this PR looks OK to me but I have the following general comments.
I would really like to at least get the 1st done before this PR is merged.
I think 1/3/4/5 in your list should be the same PR, since I think it makes sense to make all the Plumber interface changes at once. And those are all Plumber interface changes. 2 could be a separate PR, it's not really related to anything, just cleaning up dead code.
pushed a couple changes,
👍
@@ -376,10 +381,6 @@ public DataSegment push(File file, DataSegment segment) throws IOException
);
}
metrics.incrementProcessed();

if (numRows >= myRowFlushBoundary) {
plumber.persist(firehose.commit());
It seems removing this changes the behavior: rowFlushBoundary is not honored anymore.
It's passed to the plumber through a RealtimeTuningConfig as maxRowsInMemory.
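As a rough illustration of that answer, the sketch below shows a plumber that owns the row limit itself, so callers no longer compare numRows against a flush boundary. It is a simplified, hypothetical class and not Druid's actual Plumber interface: RowCountingPlumber, its methods, and the String row type are all assumptions, and a plain Runnable stands in for the commit callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified plumber; not Druid's actual Plumber interface.
class RowCountingPlumber {
    private final int maxRowsInMemory;
    private final List<String> inMemoryRows = new ArrayList<>();
    private int persistCount = 0;

    RowCountingPlumber(int maxRowsInMemory) {
        if (maxRowsInMemory <= 0) {
            throw new IllegalArgumentException("maxRowsInMemory must be positive");
        }
        this.maxRowsInMemory = maxRowsInMemory;
    }

    // Adds a row and persists automatically once the in-memory limit is reached.
    void add(String row, Runnable committer) {
        inMemoryRows.add(row);
        if (inMemoryRows.size() >= maxRowsInMemory) {
            persist(committer);
        }
    }

    // Flushes the in-memory rows, then runs the commit callback.
    void persist(Runnable committer) {
        inMemoryRows.clear(); // stand-in for writing the rows to disk
        persistCount++;
        committer.run();
    }

    int getPersistCount() {
        return persistCount;
    }

    int getRowsInMemory() {
        return inMemoryRows.size();
    }
}
```

In this arrangement the caller only calls add, and the limit is still honored because the check lives inside the plumber.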
1) Persist using Committer instead of Runnable. (Although the metadata object is ignored in this patch) 2) Remove the getSink method. 3) Plumbers are now responsible for time-based and hydrant-full-based periodic committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)
LGTM (after Travis)
Modify Plumbers in these ways:
1) Persist using Committer instead of Runnable. (Although the metadata object is ignored in this patch.)
2) Remove the getSink method.
3) Plumbers are now responsible for time-based and hydrant-full-based periodic committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this.)
The purpose of these changes is to make it possible to implement Plumbers using
Appenderators, which are hopefully a more useful interface for things like transactional
ingestion and windowPeriod-less ingestion. I'm thinking of something like this for
the interface:
https://gist.github.com/gianm/7d8f83619aada126fcd9
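For the first change in the list (persisting with a Committer instead of a Runnable), one way existing Runnable-based callers could be bridged is an adapter that reports no metadata. This is only a sketch under assumptions: the Committer interface below is a stand-in (a Runnable plus a metadata accessor), and LegacyCommitAdapter and fromRunnable are hypothetical names that do not come from the PR.

```java
// Hypothetical stand-in for the Committer type: a Runnable that also exposes metadata.
interface Committer extends Runnable {
    Object getMetadata();
}

// Hypothetical helper for callers that still only have a Runnable commit callback.
class LegacyCommitAdapter {
    // Wraps a Runnable as a Committer that carries no metadata.
    static Committer fromRunnable(final Runnable runnable) {
        return new Committer() {
            @Override
            public Object getMetadata() {
                return null; // no metadata available from a plain Runnable
            }

            @Override
            public void run() {
                runnable.run();
            }
        };
    }
}
```

Since the description says the metadata object is ignored in this patch, an adapter like this would lose nothing today, while leaving room for metadata-aware Committers later.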