Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New plumber #1639

Merged
merged 1 commit into from
Sep 4, 2015
Merged

New plumber #1639

merged 1 commit into from
Sep 4, 2015

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Aug 18, 2015

Modify Plumbers in these ways,

  1. Persist using Committer instead of Runnable. (Although the metadata object
    is ignored in this patch)

  2. Remove the getSink method.

  3. Plumbers are now responsible for time-based and hydrant-full-based periodic
    committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)

The purpose of these changes is to make it possible to implement Plumbers using
Appenderators, which is a hopefully more useful interface for things like transactional
ingestion and windowPeriodless ingestion. Thinking of something like this for
the interface:

https://gist.github.com/gianm/7d8f83619aada126fcd9

* This is useful when something *may* want to make a Committer, but also may not, and when making the actual
* Committer could be expensive.
*/
public interface CommitterMaker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just Supplier<Committer> ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real reason

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no real reason, then I would possibly be inclined to not have another interface and use Supplier<Committer>

@drcrallen
Copy link
Contributor

What is the path of reconciliation if the commit fails? How is a commit failure indicated?

@gianm
Copy link
Contributor Author

gianm commented Aug 18, 2015

@drcrallen Commit failure of what kind indicated to who?

@drcrallen
Copy link
Contributor

@gianm Right now any sort of error in the commit results in complete failure of the persist. I think I'm mis-interpreting the purpose of this PR.

Is it possible to split up the PR more?

At a cursory glance it might be able to be split into:

  1. IndexTask.convertTuningConfig changes
  2. Removing Indexer
  3. Moving time and row persisting
  4. Moving getSink to private
  5. Moving to Committer interface

@fjy
Copy link
Contributor

fjy commented Aug 19, 2015

@drcrallen This PR is relatively small and I think there is a decent description of the scope of changes. Why do we need to split into 5 different PRs?

@drcrallen
Copy link
Contributor

@fjy Because there's a lot of concepts in one PR, even thought the line count is small.

There seem to be code maintenance changes, responsibility changes (who handles time and row related persisting), and API version-ish changes (migrating from Runnable to Committer from Firehosev2)

@drcrallen
Copy link
Contributor

And they are done in a "will affect everybody (who uses plumber/indexing stuff)" kind of manner.

@himanshug
Copy link
Contributor

In general, on its own, this PR looks OK to me but I have following general comments.

  1. It has definite merge conflicts with Experimental kafa simple consumer based firehose #1609 , is it possible to base the work on top of that PR (may be merge that one). I have incorporated review comments, by adding code comments, to what was last discussed in the dev sync-up meeting. That would save me some work.

  2. Without seeing a PR that includes a firechiefy driver that drives a plumber on top of Appenderator ... it is very hard to imagine how this is really going to work all the way to provide the results we are trying to achieve. That PR could just be bare minimum code (not necessarily working version and may contain lots of TODOs) that shows the essentials.

I would really like to at least get 1st done before this PR is merged.

@gianm
Copy link
Contributor Author

gianm commented Aug 19, 2015

@himanshug,

  1. sounds good, I think we should get Experimental kafa simple consumer based firehose #1609 merged before this one. I was trying to do this one in a way that would minimize conflicts since I was assuming Experimental kafa simple consumer based firehose #1609 would get merged first…

  2. ok, that's fair, I'll work on a sketch of how you might use this stuff.

@gianm
Copy link
Contributor Author

gianm commented Aug 19, 2015

@drcrallen,

I think 1/3/4/5 in your list should be the same PR, since I think it makes sense to make all the Plumber interface changes at once. And those are all Plumber interface changes.

2 could be a separate PR, it's not really related to anything, just cleaning up dead code.

@gianm
Copy link
Contributor Author

gianm commented Aug 28, 2015

pushed a couple changes,

  1. Rebased on master (post Experimental kafa simple consumer based firehose #1609)
  2. Split the Indexer interface removal into a separate PR (Remove unused Indexer interface. #1683)

@fjy
Copy link
Contributor

fjy commented Aug 30, 2015

👍

@@ -376,10 +381,6 @@ public DataSegment push(File file, DataSegment segment) throws IOException
);
}
metrics.incrementProcessed();

if (numRows >= myRowFlushBoundary) {
plumber.persist(firehose.commit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems removing this changes the behavior, rowFlushBoundary is not honored anymore .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's passed to the plumber through a RealtimeTuningConfig as maxRowsInMemory

1) Persist using Committer instead of Runnable. (Although the metadata object
   is ignored in this patch)

2) Remove the getSink method.

3) Plumbers are now responsible for time-based and hydrant-full-based periodic
   committing. (FireChief, RealtimeIndexTask, and IndexTask used to do this)
@himanshug
Copy link
Contributor

LGTM (after Travis)

@fjy fjy closed this Sep 3, 2015
@fjy fjy reopened this Sep 3, 2015
@fjy fjy closed this Sep 4, 2015
@fjy fjy reopened this Sep 4, 2015
fjy added a commit that referenced this pull request Sep 4, 2015
@fjy fjy merged commit 75a5829 into apache:master Sep 4, 2015
@gianm gianm deleted the new-plumber branch February 23, 2016 21:14
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants