Transport: shortcut local execution #10350
Conversation
In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool. This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local. Note: this was discovered by elastic#10247, which tries to import a dangling index quickly after the cluster forms. When sending an import-dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. This will only succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
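The shortcut described above can be sketched as follows. This is a minimal, hypothetical illustration of the dispatch decision, not the actual Elasticsearch API: the names (`sendRequest`, `sendLocalRequest`, `LOCAL_NODE`, `Handler`) are simplified stand-ins for the real TransportService types.

```java
import java.util.concurrent.*;

// Sketch: when the target node is the local node, skip wire
// serialization and dispatch straight to the registered handler.
public class LocalShortcutSketch {
    static final String LOCAL_NODE = "node-local";                 // stand-in for the local DiscoveryNode
    static final ExecutorService pool = Executors.newFixedThreadPool(2);

    interface Handler { String handle(String request); }           // simplified request handler

    static Future<String> sendRequest(String targetNode, String request, Handler handler) {
        if (LOCAL_NODE.equals(targetNode)) {
            return sendLocalRequest(request, handler);             // shortcut: no serialization, no wire
        }
        throw new UnsupportedOperationException("wire transport elided in this sketch");
    }

    static Future<String> sendLocalRequest(String request, Handler handler) {
        // Run on a thread pool rather than the caller's thread, mirroring
        // how a remote response would arrive on a transport thread.
        return pool.submit(() -> handler.handle(request));
    }

    public static void main(String[] args) throws Exception {
        Future<String> f = sendRequest(LOCAL_NODE, "import-dangling", r -> "handled:" + r);
        System.out.println(f.get());
        pool.shutdown();
    }
}
```

Because the local branch never touches the wire, a node that happens to be master can handle its own request (such as the dangling-index import above) even before it has "connected" to itself.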
final TransportResponseHandler handler = adapter.onResponseReceived(requestId);
// ignore if its null, the adapter logs it
if (handler != null) {
    threadPool.executor(handler.executor()).execute(new Runnable() {
can we check if it's the same executor as the one it's executing on (in sendLocalRequest), or if it's SAME? In those cases there is no need to use the executor.
I was doubting this one. I ended up going for the simplest code, feeling it's not a big deal as most of the time these requests will go to another node. Please educate me if I'm wrong.
I think it's worth it. Most of the time the response handler's executor is SAME, so it's not a big problem, but it allows us to avoid overflowing the same thread pool when it happens.
sure. will do.
@bleskes added a few comments
@kimchy thx. Pushed a minor commit plus responded.
@bleskes somehow my comment about disconnect-from-node got lost. I am not sure we should throw an unsupported exception if it's from the local node; it breaks the contract of not doing anything for connect if it's a local node (and what happens with unicast disco, where the existing host is provided as well?)
@kimchy your comment is folded up here. Here is my response: re this being a noop: yeah, that's how I originally implemented it as well. Then I thought that strictly speaking we don't honor the disconnect (you can still ask nodeConnected(localNode) and get true), so we should throw an exception. I think your point is valid regarding connectToNodeLight, but in that case, at least currently, we never use a known node id. I'm doubting which one is the lesser evil.
@kimchy I pushed another commit. I was hesitant to add the optimization where we check that the response is returned on the same executor as the request. The API doesn't guarantee that the channel was not handed off to another thread. I would prefer not to do that one; it feels dangerous.
@kimchy pushed the noop disconnect thing.
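The "noop disconnect" settled on above can be sketched like this. This is a hypothetical simplification of the behavior under discussion, not the real TransportService code: connect and disconnect ignore the local node, and the local node always reports as connected.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: connection tracking that treats the local node specially.
public class LocalNodeConnectionSketch {
    static final String LOCAL = "node-local";          // stand-in for the local DiscoveryNode
    static final Set<String> connected = new HashSet<>();

    static void connectToNode(String node) {
        if (LOCAL.equals(node)) return;                // noop: no real connection to self is needed
        connected.add(node);
    }

    static void disconnectFromNode(String node) {
        if (LOCAL.equals(node)) return;                // noop rather than throwing, as agreed in the PR
        connected.remove(node);
    }

    static boolean nodeConnected(String node) {
        // The local node is always "connected", even after a disconnect call.
        return LOCAL.equals(node) || connected.contains(node);
    }

    public static void main(String[] args) {
        disconnectFromNode(LOCAL);
        System.out.println(nodeConnected(LOCAL));      // the disconnect was a noop
    }
}
```

This is the trade-off debated above: the disconnect is not strictly honored for the local node, but the connect contract stays symmetric.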
// ignore if its null, the adapter logs it
if (handler != null) {
    final String executor = handler.executor();
    if (ThreadPool.Names.SAME.equals(executor)) {
can we pass to the channel the executor the TransportRequestHandler is executing on, and, on top of SAME, if it's the same as the executor the request is executing on, skip forking again?
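The SAME-executor dispatch in the diff above can be sketched as follows. This is a hedged simplification: `"same"` stands in for `ThreadPool.Names.SAME`, and the single generic pool stands in for Elasticsearch's named thread pools.

```java
import java.util.concurrent.*;

// Sketch: a SAME executor means "run on the calling thread";
// any other executor name forks the callback to that pool.
public class SameExecutorSketch {
    static final ExecutorService generic = Executors.newSingleThreadExecutor();

    static String dispatch(String executorName, Runnable callback) throws Exception {
        if ("same".equals(executorName)) {
            callback.run();                  // inline: no fork, no extra pool pressure
            return "inline";
        }
        generic.submit(callback).get();      // fork to the named pool and wait (for the demo)
        return "forked";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch("same", () -> {}));
        System.out.println(dispatch("generic", () -> {}));
        generic.shutdown();
    }
}
```

The further optimization kimchy suggests, skipping the fork when the response executor matches the one the request is already running on, was ultimately rejected in this PR because the API does not guarantee the channel stays on the request thread.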
@bleskes left another small comment; other than that, LGTM. There are several other places where we check for the local node; we can remove those and simplify the code.
In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool. This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local. Note: this was discovered by elastic#10247, which tries to import a dangling index quickly after the cluster forms. When sending an import-dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. This will only succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update. Closes elastic#10350
@kimchy thx. I committed. Agreed on cleaning up more places as a second iteration.
This reverts commit d8bb760. The change causes BWC issues for some plugins.
In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool.
This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local.
Note: this was discovered by #10247, which tries to import a dangling index quickly after the cluster forms. When sending an import-dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. This will only succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.