
Transport: shortcut local execution #10350

Closed · wants to merge 4 commits

Conversation

@bleskes (Contributor) commented Mar 31, 2015

In several places in the code we need to notify a node that it needs to do something (typically the master). When that node is the local node, we have an optimization in several places that runs the execution code immediately instead of sending the request over the wire to itself. This is a shame, as we need to implement the same pattern again and again. On top of that, we may forget to do so (see note below), and we might have to write some cruft if the code needs to run under another thread pool.

This commit folds the optimization into the TransportService, shortcutting wire serialization if the target node is local.

Note: this was discovered by #10247, which tries to import a dangling index quickly after the cluster forms. When sending an import-dangling request to the master, the code didn't take into account the fact that the local node may be the master. If this happens quickly enough, one would get a NodeNotConnected exception, causing the dangling indices not to be imported. The import would only succeed after 10s, when InternalClusterService.ReconnectToNodes runs and actively connects the local node to itself (which is not needed), potentially after another cluster state update.
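
For illustration, here is a minimal self-contained sketch of the pattern the description refers to. All names below (MiniTransportService, Wire, RequestHandler) are simplified stand-ins invented for this example, not the actual Elasticsearch classes:

    import java.util.HashMap;
    import java.util.Map;

    // Simplified stand-ins, not the real Elasticsearch transport classes.
    final class MiniTransportService {
        interface RequestHandler { void handle(Object request); }
        interface Wire { void send(String nodeId, String action, Object request); }

        private final String localNodeId;
        private final Wire wire; // stand-in for the network layer
        private final Map<String, RequestHandler> handlers = new HashMap<>();

        MiniTransportService(String localNodeId, Wire wire) {
            this.localNodeId = localNodeId;
            this.wire = wire;
        }

        void registerHandler(String action, RequestHandler handler) {
            handlers.put(action, handler);
        }

        void sendRequest(String targetNodeId, String action, Object request) {
            if (localNodeId.equals(targetNodeId)) {
                // local shortcut: call the registered handler directly,
                // skipping serialization and the network layer entirely
                handlers.get(action).handle(request);
            } else {
                wire.send(targetNodeId, action, request);
            }
        }
    }

With the check living in one place, callers never need to special-case the local node again, which is exactly the duplication the commit removes.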

final TransportResponseHandler handler = adapter.onResponseReceived(requestId);
// ignore if its null, the adapter logs it
if (handler != null) {
    threadPool.executor(handler.executor()).execute(new Runnable() {
Member:

can we check whether it's the same executor as the one it's executing on (in sendLocalRequest)? If it is the same, there is no need to use the executor at all.

Contributor Author:

I had doubts about this one. I ended up going for the simplest code, feeling it's not a big deal since most of the time these requests will go to another node. Please educate me if I'm wrong.

Member:

I think it's worth it? Most of the time the response handler is SAME, so it's not a big problem, but it allows us not to overflow the same thread pool when it happens.

Contributor Author:

sure. will do.
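
A minimal sketch of the change agreed on above, using plain java.util.concurrent types rather than the real ThreadPool API; the string "same" stands in for ThreadPool.Names.SAME:

    import java.util.concurrent.Executor;

    // Sketch: fork the response handler only when it asks for a real executor.
    final class ResponseDispatch {
        static final String SAME = "same"; // stand-in for ThreadPool.Names.SAME

        static void dispatch(String executorName, Executor executor, Runnable handleResponse) {
            if (SAME.equals(executorName)) {
                handleResponse.run(); // run inline on the calling thread, no hand-off
            } else {
                executor.execute(handleResponse); // fork to the handler's thread pool
            }
        }
    }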

@kimchy (Member) commented Apr 2, 2015

@bleskes added a few comments

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy thx. pushed a minor commit plus responded.

@kimchy (Member) commented Apr 2, 2015

@bleskes somehow my comment about disconnecting from a node got lost. I am not sure we should throw an unsupported exception if it's the local node; it breaks the contract of doing nothing on connect when it's the local node (and what happens with unicast discovery, where the existing host is provided as well?)

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy your comment is folded up here. Here is my response:

re this being a noop - yeah, that's how I originally implemented it as well. Then I thought that, strictly speaking, we don't honor the disconnect (you can still ask nodeConnected(localNode) and get true), so we should throw an exception. I think your point is valid regarding connectToNodeLight, but in that case, at least currently, we never use a known node id. I'm not sure which one is the lesser evil.

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy I pushed another commit. I was hesitant to add the optimization where we check that the response is returned on the same executor as the request: the API doesn't guarantee that the channel was not handed off to another thread. I would prefer not to do that one. Feels dangerous.
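
To illustrate the concern: nothing stops a request handler from handing its channel to another thread before responding, so the thread that eventually calls sendResponse says nothing about which executor the response handler should run on. A contrived stand-alone sketch (Channel is a stand-in type, not the real API):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Contrived illustration of a channel hand-off, with stand-in types.
    final class ChannelHandoff {
        interface Channel { void sendResponse(Object response); }

        public static void main(String[] args) {
            ExecutorService generic = Executors.newSingleThreadExecutor();
            Channel channel = response ->
                    System.out.println(Thread.currentThread().getName() + " -> " + response);

            // The request handler runs on one pool but responds from another:
            // by the time sendResponse executes, the calling thread no longer
            // matches the executor the request arrived on.
            generic.execute(() -> channel.sendResponse("done"));
            generic.shutdown();
        }
    }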

@bleskes (Contributor Author) commented Apr 2, 2015

@kimchy pushed the noop disconnect thing.
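
A sketch of that noop behaviour, with simplified stand-in types rather than the real TransportService: disconnecting from the local node silently does nothing, mirroring the existing contract that connecting to the local node is also a noop:

    import java.util.HashSet;
    import java.util.Set;

    // Simplified stand-in, not the real TransportService API.
    final class MiniConnections {
        private final String localNodeId;
        private final Set<String> connected = new HashSet<>();

        MiniConnections(String localNodeId) { this.localNodeId = localNodeId; }

        void disconnectFromNode(String nodeId) {
            if (localNodeId.equals(nodeId)) {
                return; // noop: there is no wire connection to the local node
            }
            connected.remove(nodeId);
            // ... close the underlying channels here ...
        }

        boolean nodeConnected(String nodeId) {
            // the local node always counts as connected
            return localNodeId.equals(nodeId) || connected.contains(nodeId);
        }
    }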

// ignore if its null, the adapter logs it
if (handler != null) {
    final String executor = handler.executor();
    if (ThreadPool.Names.SAME.equals(executor)) {
Member:

can we pass to the channel the executor the TransportRequestHandler is executing on, so that on top of SAME, if it's the same as the executor the request is executing on, we don't need to fork again?

@kimchy (Member) commented Apr 5, 2015

@bleskes left another small comment, other than that LGTM. There are several other places where we check on the local node that we can remove to simplify the code; SearchServiceTransportAction is a great example :). We can push this and keep it small, and then go and clean up those places to keep the changes manageable?

@kimchy assigned bleskes and unassigned kimchy Apr 5, 2015
@bleskes closed this in 80e86e5 Apr 8, 2015
bleskes added a commit to bleskes/elasticsearch that referenced this pull request Apr 8, 2015
Closes elastic#10350
@bleskes deleted the transport_local_shortcut branch April 8, 2015 07:46
@bleskes (Contributor Author) commented Apr 8, 2015

@kimchy thx. I committed. Agreed on cleaning up more places as a second iteration.

bleskes added a commit that referenced this pull request Apr 8, 2015
This reverts commit d8bb760.

This causes BWC issues for some plugins.
@bleskes bleskes removed the v1.6.0 label Apr 8, 2015