
Transport: shortcut local execution #10350

Closed · wants to merge 4 commits

Conversation

@bleskes (Contributor) commented Mar 31, 2015

In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool.

This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local.

Note: this was discovered by #10247, which tries to import a dangling index quickly after the cluster forms. When sending an import-dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. The import would only succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
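
For illustration, here is a minimal self-contained sketch of the pattern the description refers to. All names below (MiniTransportService, Wire, RequestHandler) are simplified stand-ins invented for this example, not the actual Elasticsearch classes:

    import java.util.HashMap;
    import java.util.Map;

    // Simplified stand-ins, not the real Elasticsearch transport classes.
    final class MiniTransportService {
        interface RequestHandler { void handle(Object request); }
        interface Wire { void send(String nodeId, String action, Object request); }

        private final String localNodeId;
        private final Wire wire; // stand-in for the network layer
        private final Map<String, RequestHandler> handlers = new HashMap<>();

        MiniTransportService(String localNodeId, Wire wire) {
            this.localNodeId = localNodeId;
            this.wire = wire;
        }

        void registerHandler(String action, RequestHandler handler) {
            handlers.put(action, handler);
        }

        void sendRequest(String targetNodeId, String action, Object request) {
            if (localNodeId.equals(targetNodeId)) {
                // local shortcut: call the registered handler directly,
                // skipping serialization and the network layer entirely
                handlers.get(action).handle(request);
            } else {
                wire.send(targetNodeId, action, request);
            }
        }
    }

With the check living in one place, callers never need to special-case the local node again, which is exactly the duplication the commit removes.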

final TransportResponseHandler handler = adapter.onResponseReceived(requestId);
// ignore if its null, the adapter logs it
if (handler != null) {
    threadPool.executor(handler.executor()).execute(new Runnable() {
Member:

can we check whether it's the same executor as the one it's executing on (in sendLocalRequest)? If it is the same, there is no need to use the executor at all.

Contributor Author:

I had doubts about this one. I ended up going for the simplest code, feeling it's not a big deal since most of the time these requests will go to another node. Please educate me if I'm wrong.

Member:

I think it's worth it? Most of the time the response handler is SAME, so it's not a big problem, but it allows us not to overflow the same thread pool when it happens.

Contributor Author:

sure. will do.
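
A minimal sketch of the change agreed on above, using plain java.util.concurrent types rather than the real ThreadPool API; the string "same" stands in for ThreadPool.Names.SAME:

    import java.util.concurrent.Executor;

    // Sketch: fork the response handler only when it asks for a real executor.
    final class ResponseDispatch {
        static final String SAME = "same"; // stand-in for ThreadPool.Names.SAME

        static void dispatch(String executorName, Executor executor, Runnable handleResponse) {
            if (SAME.equals(executorName)) {
                handleResponse.run(); // run inline on the calling thread, no hand-off
            } else {
                executor.execute(handleResponse); // fork to the handler's thread pool
            }
        }
    }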

@kimchy (Member) commented Apr 2, 2015

@bleskes added a few comments

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy thx. pushed a minor commit plus responded.

@kimchy (Member) commented Apr 2, 2015

@bleskes somehow my comment about disconnecting from a node got lost. I am not sure we should throw an unsupported exception if it's the local node; it breaks the contract of doing nothing on connect when it's the local node (and what happens with unicast discovery, where the existing host is provided as well?)

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy your comment is folded up here. Here is my response:

re this being a noop - yeah, that's how I originally implemented it as well. Then I thought that, strictly speaking, we don't honor the disconnect (you can still ask nodeConnected(localNode) and get true), so we should throw an exception. I think your point is valid regarding connectToNodeLight, but in that case, at least currently, we never use a known node id. I'm not sure which one is the lesser evil.

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy I pushed another commit. I was hesitant to add the optimization where we check that the response is returned on the same executor as the request: the API doesn't guarantee that the channel was not handed off to another thread. I would prefer not to do that one. Feels dangerous.
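
To illustrate the concern: nothing stops a request handler from handing its channel to another thread before responding, so the thread that eventually calls sendResponse says nothing about which executor the response handler should run on. A contrived stand-alone sketch (Channel is a stand-in type, not the real API):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Contrived illustration of a channel hand-off, with stand-in types.
    final class ChannelHandoff {
        interface Channel { void sendResponse(Object response); }

        public static void main(String[] args) {
            ExecutorService generic = Executors.newSingleThreadExecutor();
            Channel channel = response ->
                    System.out.println(Thread.currentThread().getName() + " -> " + response);

            // The request handler runs on one pool but responds from another:
            // by the time sendResponse executes, the calling thread no longer
            // matches the executor the request arrived on.
            generic.execute(() -> channel.sendResponse("done"));
            generic.shutdown();
        }
    }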

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy pushed the noop disconnect thing.
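
A sketch of that noop behaviour, with simplified stand-in types rather than the real TransportService: disconnecting from the local node silently does nothing, mirroring the existing contract that connecting to the local node is also a noop:

    import java.util.HashSet;
    import java.util.Set;

    // Simplified stand-in, not the real TransportService API.
    final class MiniConnections {
        private final String localNodeId;
        private final Set<String> connected = new HashSet<>();

        MiniConnections(String localNodeId) { this.localNodeId = localNodeId; }

        void disconnectFromNode(String nodeId) {
            if (localNodeId.equals(nodeId)) {
                return; // noop: there is no wire connection to the local node
            }
            connected.remove(nodeId);
            // ... close the underlying channels here ...
        }

        boolean nodeConnected(String nodeId) {
            // the local node always counts as connected
            return localNodeId.equals(nodeId) || connected.contains(nodeId);
        }
    }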

// ignore if its null, the adapter logs it
if (handler != null) {
    final String executor = handler.executor();
    if (ThreadPool.Names.SAME.equals(executor)) {
Member:

can we pass to the channel the executor the TransportRequestHandler is executing on, so that on top of SAME, if it's the same as the executor the request is executing on, we don't need to fork again?

@kimchy (Member) commented Apr 5, 2015

@bleskes left another small comment, other than that LGTM. There are several other places where we check on the local node that we can remove to simplify the code; SearchServiceTransportAction is a great example :). We can push this and keep it small, and then go and clean up those places to keep the changes manageable?

@kimchy assigned bleskes and unassigned kimchy Apr 5, 2015
@bleskes closed this in 80e86e5 Apr 8, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 8, 2015
Closes elastic#10350
@bleskes deleted the transport_local_shortcut branch April 8, 2015 07:46
@bleskes (Contributor Author) commented Apr 8, 2015

@kimchy thx. I committed. Agreed on cleaning up more places as a second iteration.

bleskes added a commit that referenced this pull request Apr 8, 2015
This reverts commit d8bb760.

This causes BWC issues for some plugins.
@bleskes bleskes removed the v1.6.0 label Apr 8, 2015