Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoff information #2015

nishantmonu51 · 2015-11-26T18:25:40Z

Fixes #1970 (Peons put high load on zookeeper on disconnects).
Extracted the segment handoff callbacks into SegmentHandoffNotifier, which is responsible for tracking segment handoffs and invoking callbacks when handoff is complete.
The Coordinator now maintains a timeline for the segments, which makes it faster to provide the serverView for an interval.
Added a new endpoint to DatasourcesResource that exposes information about which nodes segments are loaded on, based on interval.
Realtime index tasks and realtime nodes now use the HTTP endpoints exposed by the coordinator to get the serverView.
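
For readers unfamiliar with the pattern, the notifier idea described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class name `HandoffNotifierSketch`, the methods `registerCallback` and `checkForHandoffs`, and the `isHandedOff` predicate (which stands in for the HTTP call to the coordinator) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch: tracks pending segments and fires a callback once an
// external check reports that the segment has been handed off.
final class HandoffNotifierSketch {
  // Stand-in for "ask the coordinator over HTTP whether this segment is loaded".
  private final Predicate<String> isHandedOff;
  private final Map<String, Runnable> pending = new ConcurrentHashMap<>();

  HandoffNotifierSketch(Predicate<String> isHandedOff) {
    this.isHandedOff = isHandedOff;
  }

  void registerCallback(String segmentId, Runnable onHandoff) {
    pending.put(segmentId, onHandoff);
  }

  // One polling pass; returns the ids whose callbacks fired during this pass.
  List<String> checkForHandoffs() {
    List<String> completed = new ArrayList<>();
    Iterator<Map.Entry<String, Runnable>> it = pending.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Runnable> entry = it.next();
      String segmentId = entry.getKey();
      Runnable callback = entry.getValue();
      if (isHandedOff.test(segmentId)) {
        it.remove();      // stop tracking before notifying
        callback.run();
        completed.add(segmentId);
      }
    }
    return completed;
  }

  int pendingCount() {
    return pending.size();
  }

  public static void main(String[] args) {
    // Pretend the coordinator reports every segment as loaded.
    HandoffNotifierSketch notifier = new HandoffNotifierSketch(id -> true);
    notifier.registerCallback("segment-1", () -> System.out.println("handoff complete"));
    notifier.checkForHandoffs();
  }
}
```

In a real task the polling pass would presumably run on a schedule; the sketch leaves scheduling and HTTP details out on purpose.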

The image below shows the improvement in zookeeper read load from the above changes, with nearly 500 realtime index tasks running.

xvrl · 2015-11-30T18:49:47Z

common/src/main/java/io/druid/timeline/TimelineLookup.java

@@ -35,6 +35,8 @@
   */
  public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookup(Interval interval);

+  public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookup(Interval interval, boolean incompleteOk);


allowIncomplete would be clearer I think. Maybe add some javadoc as well while we're at it.

Can we instead expose the method

public Iterable<TimelineObjectHolder<VersionType, ObjectType>> lookupWithIncomplete(Interval interval);

Boolean-flag-style methods often result in pretty confusing function signatures, because you are not always sure what the boolean is supposed to mean, especially if you don't actually have the code. If we can instead move those semantics into a completely different method name, it becomes easier for users to know what is going on.

Also, 👍 on adding javadoc.
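
To make the suggestion above concrete, here is a toy sketch of the "separate method name instead of a boolean flag" style. It is not Druid's `TimelineLookup`: the class `TimelineSketch`, its `Holder` type, and the `complete` field are hypothetical, and the `Interval` argument is dropped for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each holder only records whether all of its partitions are present.
final class TimelineSketch {
  static final class Holder {
    final String version;
    final boolean complete;

    Holder(String version, boolean complete) {
      this.version = version;
      this.complete = complete;
    }
  }

  private final List<Holder> holders = new ArrayList<>();

  void add(String version, boolean complete) {
    holders.add(new Holder(version, complete));
  }

  /** Returns only holders whose partition set is complete. */
  List<Holder> lookup() {
    return filter(false);
  }

  /** Also returns holders that are still missing partitions. */
  List<Holder> lookupWithIncomplete() {
    return filter(true);
  }

  // The boolean lives in one private place instead of at every call site.
  private List<Holder> filter(boolean allowIncomplete) {
    List<Holder> result = new ArrayList<>();
    for (Holder holder : holders) {
      if (holder.complete || allowIncomplete) {
        result.add(holder);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    TimelineSketch timeline = new TimelineSketch();
    timeline.add("v1", true);
    timeline.add("v2", false);
    System.out.println(timeline.lookup().size());
    System.out.println(timeline.lookupWithIncomplete().size());
  }
}
```

A call site then reads `timeline.lookupWithIncomplete()` rather than `timeline.lookup(true)`, so the intent is visible without opening the implementation.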

xvrl · 2015-12-03T19:59:37Z

👍 looks good to me.

If we change the way coordinator discovery is done, we should also change the way overlord discovery in RemoteTaskActionClient is done, since they are effectively the same. In order to avoid scope creep of this PR, I would be in favor of doing this change in a separate one. I'll file an issue for this.

gianm · 2015-12-03T20:25:04Z

docs/content/configuration/index.md

+
+|Property|Description|Default|
+|--------|-----------|-------|
+|`druid.selectors.coordinator.serviceName`|The druid.service name of the coordinator node. To start the Coordinator with a different name, set it with this property. |druid/coordinator|


Even if this is "druid/coordinator" by default, it should be set to "coordinator" in the example common.runtime.properties, since the example coordinator properties set the druid.service to "coordinator". Many people start off their clusters by copying the example configs.

I wonder if we should also just change the defaults in the code to match those…

I think the configs were created before we had defaults. I'd rather just change the example configs to match the defaults, or leave it out of the example configs entirely, since it's not necessary.

The default overlord service name in tranquility is "overlord" to match the examples, so if we do change the examples we should change that too. All three of those things should match though (druid defaults, tranquility defaults, druid example configs)

#2046 to track
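
As an illustration of the mismatch discussed in this thread, the following sketch compares the selector name against the name the coordinator runs under. The helper `ServiceNameCheck.selectorMatches` is hypothetical, and it assumes the coordinator's own `druid.service` default is also `druid/coordinator`, which the discussion implies but this thread does not show.

```java
import java.util.Properties;

// Hypothetical helper: checks whether the name that indexing and realtime
// processes look up matches the name the coordinator announces itself under.
final class ServiceNameCheck {
  // Default taken from the docs table in this PR; assumed to apply to both sides.
  static final String DEFAULT_NAME = "druid/coordinator";

  static boolean selectorMatches(Properties common, Properties coordinator) {
    String selector =
        common.getProperty("druid.selectors.coordinator.serviceName", DEFAULT_NAME);
    String announced = coordinator.getProperty("druid.service", DEFAULT_NAME);
    return selector.equals(announced);
  }

  public static void main(String[] args) {
    Properties common = new Properties();
    Properties coordinator = new Properties();
    // The example coordinator config sets druid.service to "coordinator".
    coordinator.setProperty("druid.service", "coordinator");
    System.out.println(selectorMatches(common, coordinator));
  }
}
```

With the example coordinator config and no selector override, the check fails; setting `druid.selectors.coordinator.serviceName=coordinator` in the common properties makes the two names agree.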

nishantmonu51 · 2015-12-08T21:01:13Z

bouncing for travis.

nishantmonu51 · 2015-12-08T21:57:08Z

Also added graph showing the improvement on zookeeper load with ~500 realtime index tasks running in PR description.

gianm · 2015-12-09T16:29:20Z

I think this patch breaks the rolling update process, since that process suggests updating coordinators after indexing nodes, but we'd need the coordinator endpoint up first.

gianm · 2015-12-09T16:29:29Z

added a "release notes" label

gianm · 2015-12-21T19:15:20Z

We also need to add to the release notes that you need to make sure your druid.selectors.coordinator.serviceName is set properly. Otherwise realtime indexing will stop working after updating because the handoff notifier won't be linked up.

xvrl · 2016-01-08T21:38:11Z

@gianm oddly this PR is not even mentioned at all in the release notes

gianm · 2016-01-08T21:42:40Z

Probably it was added after the first RC and missed. We should double check that all the stuff after that actually made it into the notes.

gianm · 2016-01-08T21:43:34Z

@xvrl will take a look now

gianm · 2016-01-08T22:35:17Z

@xvrl updated the release notes & added a few other missing things

xvrl · 2016-01-08T22:36:54Z

@gianm thx I was updating the release notes but you beat me to it.

rasahner · 2016-01-10T17:49:35Z

I just realized that the new endpoint isn't documented at http://druid.io/docs/latest/design/coordinator.html. Should it be? Or, are there some types of endpoints that are deliberately not documented at druid.io/docs and this is one of them?

fjy · 2016-01-10T17:50:38Z

@rasahner This should be documented. Can you submit a PR?

rasahner · 2016-01-11T18:42:34Z

see #2238

fjy · 2016-01-27T00:07:49Z

@nishantmonu51 @gianm @pjain1 @xvrl @cheddar

None of the examples were updated to reflect that serviceName is now used to talk to the coordinator, so realtimes won't be able to do handoff.

fjy · 2016-01-27T00:11:50Z

Also, does anything that requires the coordinator to talk to the overlord still work? Kill tasks, merge tasks, etc?

Edit: yes they will, I thought we'd removed indexing.serviceName for a second there

nishantmonu51 mentioned this pull request Nov 26, 2015

Remove ServerView from RealtimeIndexTasks and use overlord for handoffs to reduce load on ZK #2007

Closed

nishantmonu51 force-pushed the handoff-notifier-coordinator branch 2 times, most recently from 5571aa2 to 209011f, November 27, 2015 10:28

xvrl reviewed Nov 30, 2015

nishantmonu51 force-pushed the handoff-notifier-coordinator branch 4 times, most recently from 42641f5 to f1281c0, December 1, 2015 17:18

nishantmonu51 mentioned this pull request Dec 1, 2015

Inconsistent handling for interval in DataSourcesResource #2021

Closed

nishantmonu51 force-pushed the handoff-notifier-coordinator branch from f1281c0 to c26d333, December 2, 2015 05:44

gianm reviewed Dec 3, 2015

nishantmonu51 closed this Dec 8, 2015

nishantmonu51 reopened this Dec 8, 2015

xvrl added a commit that referenced this pull request Dec 8, 2015

Merge pull request #2015 from metamx/handoff-notifier-coordinator

dcd1573

Remove ServerView from RealtimeIndexTasks and use coordinator http endpoint for handoff information

xvrl merged commit dcd1573 into apache:master Dec 8, 2015

xvrl deleted the handoff-notifier-coordinator branch December 8, 2015 22:06

nishantmonu51 mentioned this pull request Dec 9, 2015

0.8.3 backport for #2015 #2071

Merged

gianm added the Release Notes label Dec 9, 2015

xvrl added a commit that referenced this pull request Dec 9, 2015

Merge pull request #2071 from metamx/0.8.3-backport-2015

1c96168

0.8.3 backport for #2015

pjain1 mentioned this pull request Jan 7, 2016

Completely rework the Druid getting started process #2216

Merged

gianm mentioned this pull request Jan 8, 2016

druid-0.8.3 release notes #2044

Closed

drcrallen mentioned this pull request Jan 11, 2016

doc: add information about new serverview coordinator endpoint #2238

Merged

fjy modified the milestone: 0.9.0 Feb 4, 2016

xvrl modified the milestones: 0.8.3, 0.9.0 Feb 10, 2016
