Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoff information #2015

Merged: 2 commits, Dec 8, 2015

nishantmonu51 (Member)

  • Fixes #1970 (Peons put high load on zookeeper on disconnects).
  • Extracted the segment handoff callbacks into SegmentHandoffNotifier,
    which tracks segment handoffs and invokes the callbacks
    when handoff is complete.
  • The Coordinator now maintains a timeline of the segments, which makes it faster to provide the serverView for an interval.
  • Added a new endpoint to DatasourcesResource that exposes which nodes segments are loaded on, based on interval.
  • Realtime index tasks and realtime nodes now use the HTTP endpoints exposed by the
    Coordinator to get the serverView.

The image below shows the improvement these changes make to ZooKeeper read load with nearly 500 realtime index tasks running:

[Image: screen shot 2015-12-09 at 3 21 58 am]
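
To make the handoff-tracking idea in the description concrete, here is a minimal sketch of a polling notifier. It is not the SegmentHandoffNotifier added by this PR: the class name `PollingHandoffNotifier`, its methods, the string segment ids, and the `Predicate` that stands in for the HTTP call to the coordinator are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch; not Druid's SegmentHandoffNotifier implementation.
class PollingHandoffNotifier {
    // Stand-in for an HTTP request to the coordinator (assumption):
    // returns true once the given segment has been handed off.
    private final Predicate<String> isHandedOff;

    // Callbacks still waiting for handoff, keyed by a segment identifier.
    private final Map<String, Runnable> pending = new ConcurrentHashMap<>();

    PollingHandoffNotifier(Predicate<String> isHandedOff) {
        this.isHandedOff = isHandedOff;
    }

    // Remember a callback to run once the segment has been handed off.
    void registerCallback(String segmentId, Runnable callback) {
        pending.put(segmentId, callback);
    }

    // One polling pass: run and drop the callback of every segment that is done.
    // A long-running task would call this on a fixed schedule.
    void checkForHandoffs() {
        pending.entrySet().removeIf(entry -> {
            if (isHandedOff.test(entry.getKey())) {
                entry.getValue().run();
                return true;
            }
            return false;
        });
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch each task only asks about the segments it is waiting on, rather than watching the full server view, which is the kind of load shift the description is aiming for.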

@@ -35,6 +35,8 @@
*/
public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookup(Interval interval);

public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookup(Interval interval, boolean incompleteOk);
Member

allowIncomplete would be clearer I think. Maybe add some javadoc as well while we're at it.

Contributor

Can we instead expose the method

public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookupWithIncomplete(Interval interval);

Boolean-flag-style methods often result in pretty confusing function signatures, because you are not always sure what the boolean is supposed to mean, especially if you don't actually have the code. If we instead move those semantics into a completely different method name, it becomes easier for users to know what is going on.

Also, 👍 on adding javadoc.
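
A minimal sketch of the suggestion, assuming a much simpler timeline than the real one: `SimpleTimeline`, `Entry`, and the `long` interval bounds are hypothetical stand-ins, not the interface in the diff. The boolean survives only as a private detail, so call sites read as either `lookup` or `lookupWithIncomplete`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a timeline; not the Druid interface under review.
class SimpleTimeline {
    // One entry covering the half-open interval [start, end).
    static final class Entry {
        final long start;
        final long end;
        final boolean complete; // whether every partition of this entry is present

        Entry(long start, long end, boolean complete) {
            this.start = start;
            this.end = end;
            this.complete = complete;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(long start, long end, boolean complete) {
        entries.add(new Entry(start, end, complete));
    }

    /** Returns only the complete entries that overlap [start, end). */
    List<Entry> lookup(long start, long end) {
        return find(start, end, false);
    }

    /** Returns every entry that overlaps [start, end), including incomplete ones. */
    List<Entry> lookupWithIncomplete(long start, long end) {
        return find(start, end, true);
    }

    // The flag stays private, so callers never pass a bare true or false.
    private List<Entry> find(long start, long end, boolean allowIncomplete) {
        List<Entry> result = new ArrayList<>();
        for (Entry e : entries) {
            boolean overlaps = e.start < end && start < e.end;
            if (overlaps && (allowIncomplete || e.complete)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

A call such as `timeline.lookupWithIncomplete(0, 20)` states its intent at the call site, whereas `timeline.lookup(0, 20, true)` would require reading the signature or javadoc.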

Member Author

fixed.

xvrl (Member) commented Dec 3, 2015

👍 looks good to me.

If we change the way coordinator discovery is done, we should also change the way overlord discovery in RemoteTaskActionClient is done, since they are effectively the same. In order to avoid scope creep of this PR, I would be in favor of doing this change in a separate one. I'll file an issue for this.


|Property|Description|Default|
|--------|-----------|-------|
|`druid.selectors.coordinator.serviceName`|The druid.service name of the coordinator node. To start the Coordinator with a different name, set it with this property. |druid/coordinator|

Contributor

Even if this is "druid/coordinator" by default, it should be set to "coordinator" in the example common.runtime.properties, since the example coordinator properties set the druid.service to "coordinator". Many people start off their clusters by copying the example configs.

I wonder if we should also just change the defaults in the code to match those…

Member

I think the configs were created before we had defaults. I'd rather just change the example configs to match the defaults, or leave it out of the example configs entirely, since it's not necessary.

Contributor

The default overlord service name in tranquility is "overlord" to match the examples, so if we do change the examples we should change that too. All three of those things should match though (druid defaults, tranquility defaults, druid example configs)

Member Author

#2046 to track

nishantmonu51 (Member Author)

bouncing for travis.

nishantmonu51 (Member Author)

Also added graph showing the improvement on zookeeper load with ~500 realtime index tasks running in PR description.

xvrl added a commit that referenced this pull request Dec 8, 2015
Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoff information
xvrl merged commit dcd1573 into apache:master Dec 8, 2015
xvrl deleted the handoff-notifier-coordinator branch December 8, 2015 22:06
gianm (Contributor) commented Dec 9, 2015

I think this patch makes the rolling update process not work, since it suggests doing coordinators after indexing nodes, but we'd need the coordinator endpoint up first.

gianm (Contributor) commented Dec 9, 2015

added a "release notes" label

xvrl added a commit that referenced this pull request Dec 9, 2015
gianm (Contributor) commented Dec 21, 2015

We also need to add to the release notes that you need to make sure your druid.selectors.coordinator.serviceName is set properly. Otherwise realtime indexing will stop working after updating because the handoff notifier won't be linked up.
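
The mismatch described here can be sketched as a small check. `ServiceNameCheck` and `selectorMatches` are hypothetical names, not part of Druid; the selector default `druid/coordinator` comes from the property table in this PR, while using the same fallback for an unset `druid.service` on the coordinator is an assumption.

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of Druid.
class ServiceNameCheck {
    // Default of druid.selectors.coordinator.serviceName, per the table in this PR.
    static final String DEFAULT_NAME = "druid/coordinator";

    // True when the name that tasks look up equals the name the coordinator runs under.
    static boolean selectorMatches(Properties common, Properties coordinator) {
        String selector =
            common.getProperty("druid.selectors.coordinator.serviceName", DEFAULT_NAME);
        // Assumption: an unset druid.service on the coordinator resolves to the same default.
        String announced = coordinator.getProperty("druid.service", DEFAULT_NAME);
        return selector.equals(announced);
    }
}
```

With a coordinator started as `druid.service=coordinator` (as in the example configs mentioned earlier in this thread) and no selector set, the check returns false, which is the situation where handoff would not be linked up.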

xvrl (Member) commented Jan 8, 2016

@gianm oddly this PR is not even mentioned at all in the release notes

gianm (Contributor) commented Jan 8, 2016

Probably it was added after the first RC and missed. We should double check that all the stuff after that actually made it into the notes.

gianm (Contributor) commented Jan 8, 2016

@xvrl will take a look now

gianm mentioned this pull request Jan 8, 2016
gianm (Contributor) commented Jan 8, 2016

@xvrl updated the release notes & added a few other missing things

xvrl (Member) commented Jan 8, 2016

@gianm thx I was updating the release notes but you beat me to it.

rasahner (Contributor)

I just realized that the new endpoint isn't documented at http://druid.io/docs/latest/design/coordinator.html. Should it be? Or, are there some types of endpoints that are deliberately not documented at druid.io/docs and this is one of them?

fjy (Contributor) commented Jan 10, 2016

@rasahner This should be documented. Can you submit a PR?

rasahner (Contributor)

see #2238

fjy (Contributor) commented Jan 27, 2016

@nishantmonu51 @gianm @pjain1 @xvrl @cheddar

None of the examples were updated to reflect that serviceName now talks to the coordinator, so realtimes won't be able to do handoff.

fjy (Contributor) commented Jan 27, 2016

Also, does anything that requires the coordinator to talk to the overlord still work? Kill tasks, merge tasks, etc?

Edit: yes they will, I thought we'd removed indexing.serviceName for a second there

fjy modified the milestone: 0.9.0 Feb 4, 2016
xvrl modified the milestones: 0.8.3, 0.9.0 Feb 10, 2016
Development

Successfully merging this pull request may close these issues.

Peons put high load on zookeeper on disconnects
7 participants