Add SelfDiscoveryResource; rename org.apache.druid.discovery.NodeType to NodeRole #6702
@leventov Currently only the broker and coordinator watch all compute nodes (see #6683). If a compute node watches all other compute nodes just to find itself (this is how the current CuratorDruidNodeDiscoveryProvider works: it watches all compute nodes), the number of watches in ZooKeeper can grow huge. Would it be better to add a similar listener mechanism that is triggered when DruidNodeAnnouncer.announce(DiscoveryDruidNode) is called?
@kaijianding originally I created this endpoint to be used on brokers only, but then generalized to other node types, because why not. What do you think if I change the code so that the new endpoint is added only on brokers and coordinators for now? Unfortunately #6683 seems to be far from merging, and the next Druid release process should be started between 10th and 20th of January. I think it would be useful to add the new endpoint at least on brokers and coordinators.
I didn't understand what could be added when that method is called. Could you please elaborate?
It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns {"selfDiscovered": true} from this endpoint.
Please, describe in comments why it's not safe, probably with an example scenario.
Returns a JSON map of the form `{"selfDiscovered": true/false}`, indicating whether the node has received a confirmation
from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the
cluster. It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container
management systems until it returns {"selfDiscovered": true} from this endpoint.
Please, describe why it's not safe, probably with an example scenario.
Thanks, extended docs to describe that.
@leventov even for brokers, this PR still adds too many watches: the number of brokers * the number of all nodes. I still think some modification should be made based on #6683. Sorry for not finishing #6683 yet; I will finish it ASAP. What I mean is something like the current ServerView callback mechanism: when NodeAnnouncer finishes registering, an initial-done callback should be called.
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
This pull request is no longer marked as stale.
Still blocked by #6683.
Could we add a listener just for <internal-discover>/<nodeType>/<node-host-port> instead of <internal-discover>/<nodeType>?
…torDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself).
@egor-ryashin @kaijianding made listening fine-grained.
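To make the difference between the two listening granularities concrete, here is a minimal arithmetic sketch. It takes the estimate from the discussion above (number of watching nodes * number of all nodes) at face value and compares it with each node listening only for its own entry. The class name, the base path, the host name and the cluster sizes are all hypothetical; this is not code from the PR.

```java
public final class WatchCountSketch {

    /** Path covering every node of one role: <internal-discover>/<nodeType>. */
    static String rolePath(String base, String nodeRole) {
        return base + "/" + nodeRole;
    }

    /** Path of a single node: <internal-discover>/<nodeType>/<node-host-port>. */
    static String nodePath(String base, String nodeRole, String hostAndPort) {
        return rolePath(base, nodeRole) + "/" + hostAndPort;
    }

    /** Coarse listening: each watching node tracks every node in the cluster. */
    static long coarseWatches(long watchingNodes, long totalNodes) {
        return watchingNodes * totalNodes;
    }

    /** Fine-grained listening: each watching node tracks only its own entry. */
    static long fineGrainedWatches(long watchingNodes) {
        return watchingNodes;
    }

    public static void main(String[] args) {
        String base = "/druid/internal-discovery"; // hypothetical base path
        System.out.println("coarse path: " + rolePath(base, "BROKER"));
        System.out.println("fine path:   " + nodePath(base, "BROKER", "broker-1:8082"));
        // Hypothetical cluster: 10 brokers, 500 nodes in total.
        System.out.println("coarse watches: " + coarseWatches(10, 500));
        System.out.println("fine watches:   " + fineGrainedWatches(10));
    }
}
```

Under these assumptions the coarse approach grows with the product of the two counts, while the fine-grained approach grows only with the number of nodes that check themselves.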
…ile (#7499) In the IDE interface, "Non-TeamCity Warning" looks exactly like an ordinary warning, but TeamCity should be unaware of it. This may help to workaround these issues: https://youtrack.jetbrains.com/issue/IDEA-209789 and https://youtrack.jetbrains.com/issue/IDEA-209791, that block the upgrade of IntelliJ engine used in the TeamCity build. It seems like there may be a bug that leads to false positive error and the build fail in this PR: #6702. Removed the comment regarding "StaticPseudoFunctionalStyleMethod" inspection because the IntelliJ keeps removing it, see this issue: https://youtrack.jetbrains.com/issue/IDEA-211087
The inspections build is failing with an "unresolved Javadoc reference" error which is now fixed in IntelliJ 2018, but we cannot upgrade to it because of persisting problems (see #7589). So it shouldn't be a blocker for this PR. @kaijianding do you have any more comments? @jon-wei could you please review the REST API design?
 * DI configuration phase.
 */
@Singleton
@Path("/selfDiscovered")
Since this is a kind of status/health check, could consider putting the endpoint under status/selfDiscovered to go with status and status/health
Renamed to status/selfDiscoveredStatus
@Produces(MediaType.APPLICATION_JSON)
public Response getSelfDiscovered()
{
  return Response.ok(Collections.singletonMap("selfDiscovered", selfDiscovered.getAsBoolean())).build();
Maybe better to return 200 OK/503 SERVICE_UNAVAILABLE instead of a JSON response (like HistoricalResource.getReadiness())? Some monitoring checks such as AWS load balancer health checks are not able to look at the response body.
Maybe reporting status via response code could be an option.
Added status/selfDiscovered in addition to status/selfDiscoveredStatus; the new endpoint responds with a 200 or 503 status code instead of a JSON body.
REST API changes LGTM
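The split agreed on in this thread (a JSON body on one path, a bare 200/503 status code on the other) can be sketched with the JDK's built-in `com.sun.net.httpserver.HttpServer`. This is only an illustration of the response contract, not the JAX-RS resource from the PR; the class name, the helper methods and the `BooleanSupplier` wiring are assumptions.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

public final class SelfDiscoveryEndpointsSketch {

    /** Status-code form: 200 once the node has discovered itself, 503 before that. */
    static int statusCodeFor(boolean selfDiscovered) {
        return selfDiscovered ? 200 : 503;
    }

    /** JSON form: always served with 200, the flag is carried in the body. */
    static String jsonBodyFor(boolean selfDiscovered) {
        return "{\"selfDiscovered\": " + selfDiscovered + "}";
    }

    /** Starts a toy server exposing both endpoints; pass port 0 for an ephemeral port. */
    static HttpServer start(int port, BooleanSupplier selfDiscovered) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);

        server.createContext("/status/selfDiscoveredStatus", exchange -> {
            byte[] body = jsonBodyFor(selfDiscovered.getAsBoolean()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.createContext("/status/selfDiscovered", exchange -> {
            // -1 means "no response body": the status code alone carries the answer.
            exchange.sendResponseHeaders(statusCodeFor(selfDiscovered.getAsBoolean()), -1);
            exchange.close();
        });

        server.start();
        return server;
    }
}
```

The status-code endpoint suits checks that cannot inspect a response body, such as the load balancer health checks mentioned above, while the JSON endpoint stays readable for humans and scripts.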
if [[ "$OSTYPE" == "darwin"* ]]; then
  # On Mac OS X resolveip may not be available
  export HOST_IP=$(dig +short $HOSTNAME | awk '{ print ; exit }')
why not use dig +short $HOSTNAME everywhere?
Merging as there was an approval from @egor-ryashin and API design approval from @jon-wei.
Fixes #6372.
Added the /status/selfDiscoveredStatus endpoint. It returns a JSON map of the form {"selfDiscovered": true/false}, indicating whether the node has received a confirmation from the central node discovery mechanism (currently ZooKeeper) of the Druid cluster that the node has been added to the cluster. It is recommended to not consider a Druid node "healthy" or "ready" in automated deployment/container management systems until it returns {"selfDiscovered": true} from this endpoint.
Also added the /status/selfDiscovered endpoint, which does the same as /status/selfDiscoveredStatus but responds with a 200 or 503 status code depending on whether the node has already discovered itself.
Design Review tag because an endpoint is added.
Release Notes tag to make Druid cluster operators notice this change.
Also, in this PR, I renamed org.apache.druid.discovery.NodeType to NodeRole.
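On the consumer side, the readiness recommendation could be applied with a small polling gate like the one below. It is a sketch under assumptions: it needs Java 11+ for `java.net.http.HttpClient`, the class and method names are hypothetical, and the host and port in the usage note are placeholders rather than values taken from this PR.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.IntSupplier;

public final class ReadinessGateSketch {

    /**
     * Polls the probe until it reports 200 or the attempts run out.
     * Returns true only if a 200 was observed.
     */
    static boolean waitUntilSelfDiscovered(IntSupplier statusProbe, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (statusProbe.getAsInt() == 200) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    /** Probe that issues a GET and returns the HTTP status code, or -1 if the node is unreachable. */
    static IntSupplier httpProbe(URI endpoint) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(endpoint).GET().build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1; // connection refused, timeout, etc.: treat as "not ready yet"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }
}
```

A deployment script might call `waitUntilSelfDiscovered(httpProbe(URI.create("http://localhost:8082/status/selfDiscovered")), 30, 2000)` and mark the node ready only if it returns true. Keeping the probe behind an `IntSupplier` lets the retry loop be exercised without a running node.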