Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SelfDiscoveryResource; rename org.apache.druid.discovery.NodeType to NodeRole #6702

Merged
merged 25 commits into from Dec 8, 2019

Conversation

leventov
Copy link
Member

@leventov leventov commented Dec 3, 2018

Fixes #6372.

Added /status/selfDiscoveredStatus endpoint. It returns a JSON map of the form {"selfDiscovered": true/false}, indicating whether the node has recieved a confirmation
from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the
cluster. It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns {"selfDiscovered": true} from this endpoint.

Also added /status/selfDiscovered endpoint which does the same as /status/selfDiscoveredStatus but responses in the form of 200/503 return code depending on whether the node has discovered itself already or not.

Design Review tag because endpoint is added.
Release Notes tag to make Druid cluster operators notice this change.

Also, in this PR, I renamed org.apache.druid.discovery.NodeType to NodeRole.

@kaijianding
Copy link
Contributor

kaijianding commented Dec 4, 2018

@leventov Currently only broker/coordinator watches all compute nodes, according to #6683 , if a compute node self watches all other compute nodes to find itself (this is how current CuratorDruidNodeDiscoveryProvider works, it watches all compute nodes), the number of watches in zk can grow to huge number.

Is it better to add similar listener mechanism when DruidNodeAnnouncer.announce(DiscoveryDruidNode) is called?

@leventov
Copy link
Member Author

@kaijianding originally I created this endpoint to be used on brokers only, but then generalized to other node types, because why not. What do you think if I change the code so that the new endpoint is added only on brokers and coordinators for now?

Unfortunately #6683 seems to be far from merging, and the next Druid release process should be started between 10th and 20th of January. I think it would be useful to add the new endpoint at least on brokers and coordinators.

Is it better to add similar listener mechanism when DruidNodeAnnouncer.announce(DiscoveryDruidNode) is called?

I didn't understand what could be added when that method is called. Could you please elaborate?

Copy link
Contributor

@egor-ryashin egor-ryashin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns {"selfDiscovered": true} from this endpoint.
Please, describe in comments why it's not safe, probably with an example scenario.


Returns a JSON map of the form `{"selfDiscovered": true/false}`, indicating whether the node has recieved a confirmation
from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the
cluster. It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns {"selfDiscovered": true} from this endpoint.

Please, describe why it's not safe, probably with an example scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, extended docs to describe that.

@kaijianding
Copy link
Contributor

@leventov even for broker, this PR still add too many watches: the number of brokers * the number all of nodes. Still think should do some modification base on #6683 . Sorry for doesn't finish #6683 yet, I will finish it ASAP.
Is it better to add similar listener mechanism when DruidNodeAnnouncer.announce(DiscoveryDruidNode) is called?

I mean like current ServerView callback mechanism, when NodeAnnouncer done register, an initial-done callback should be called.

@stale
Copy link

stale bot commented Feb 28, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Feb 28, 2019
@stale
Copy link

stale bot commented Mar 7, 2019

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Mar 7, 2019
@leventov leventov reopened this Mar 7, 2019
@stale
Copy link

stale bot commented Mar 7, 2019

This pull request is no longer marked as stale.

@stale stale bot removed the stale label Mar 7, 2019
@leventov
Copy link
Member Author

leventov commented Mar 7, 2019

Still blocked by #6683.

Copy link
Contributor

@egor-ryashin egor-ryashin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add listener just for <internal-discover>/<nodeType>/<node-host-port> instead of <internal-discover>/<nodeType> ?

@leventov
Copy link
Member Author

leventov commented Apr 15, 2019

@egor-ryashin @kaijianding made listening fine-grained.

@leventov leventov changed the title Add SelfDiscoveryResource Add SelfDiscoveryResource; rename org.apache.druid.discovery.NodeType to NodeRole Apr 15, 2019
leventov added a commit that referenced this pull request Apr 19, 2019
…ile (#7499)

In the IDE interface, "Non-TeamCity Warning" looks exactly like an ordinary warning, but TeamCity should be unaware of it.

This may help to workaround these issues: https://youtrack.jetbrains.com/issue/IDEA-209789 and https://youtrack.jetbrains.com/issue/IDEA-209791, that block the upgrade of IntelliJ engine used in the TeamCity build. It seems like there may be a bug that leads to false positive error and the build fail in this PR: #6702.

Removed the comment regarding "StaticPseudoFunctionalStyleMethod" inspection because the IntelliJ keeps removing it, see this issue: https://youtrack.jetbrains.com/issue/IDEA-211087
@leventov
Copy link
Member Author

The inspections build is failing with an "unresolved Javadoc reference" error which is now fixed in IntelliJ 2018, but we cannot upgrade to it because of persisting problems (see #7589). So it shouldn't be a blocker for this PR.

@kaijianding do you have any more comments?

@jon-wei could you please review the REST API design?

* DI configuration phase.
*/
@Singleton
@Path("/selfDiscovered")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a kind of status/health check, could consider putting the endpoint under status/selfDiscovered to go with status and status/health

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to status/selfDiscoveredStatus

@Produces(MediaType.APPLICATION_JSON)
public Response getSelfDiscovered()
{
return Response.ok(Collections.singletonMap("selfDiscovered", selfDiscovered.getAsBoolean())).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to return 200 OK/503 SERVICE_UNAVAILABLE instead of a JSON response (like HistoricalResource.getReadiness()? Some monitoring checks such as AWS load balancer health checks are not able to look at the response body.

Maybe reporting status via response code could be an option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added status/selfDiscovered in addition to status/selfDiscoveredStatus which responses in the form of returning 200/503 codes.

Copy link
Contributor

@jon-wei jon-wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

REST API changes LGTM


if [[ "$OSTYPE" == "darwin"* ]]; then
# On Mac OS X resolveip may not be available
export HOST_IP=$(dig +short $HOSTNAME | awk '{ print ; exit }')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use dig +short $HOSTNAME everywhere?

@leventov
Copy link
Member Author

leventov commented Dec 8, 2019

Merging as there was an approval from @egor-ryashin and API design approval from @jon-wei.

@leventov leventov merged commit 1c62987 into apache:master Dec 8, 2019
@leventov leventov deleted the selfDiscoveryResource branch December 8, 2019 15:48
@leventov leventov added this to the 0.17.0 milestone Dec 8, 2019
@jon-wei jon-wei mentioned this pull request Dec 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Brokers should become "ready" only after they prove themselves registered in ZK
5 participants