Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Warning] Do not execute a blocking operation on a networking thread #58

Closed
lukas-vlcek opened this issue Oct 3, 2017 · 9 comments
Closed

Comments

@lukas-vlcek
Copy link
Collaborator

I am currently implementing integration test for this plugin based on ESIntegTestCase. See docs about integration tests for more details. (PS: I am about to send PR soon)

Within that integration test, when doing HTTP request to ES node pointing to /_prometheus/metrics endpoint I am getting the following assertion error:

10 03, 2017 4:22:20 午後 org.elasticsearch.rest.prometheus.SmokeElasticsearchIntegrationTest testClusterIsEmpty
INFO: Node http address: 127.0.0.1:19301
10 03, 2017 4:22:20 午後 org.elasticsearch.rest.BytesRestResponse convert
WARNING: path: /_prometheus/metrics, params: {}
java.lang.AssertionError: Expected current thread [Thread[elasticsearch[node_s1][http_server_worker][T#1]{New I/O worker #6},5,TGRP-SmokeElasticsearchIntegrationTest]] to not be a transport thread. Reason: [Blocking operation]
	at org.elasticsearch.transport.Transports.assertNotTransportThread(Transports.java:62)
	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:119)
	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:42)
	at org.compuscene.metrics.prometheus.PrometheusMetricsCollector.<init>(PrometheusMetricsCollector.java:35)
...

This can happen when calls like the following one are made (taken from PrometheusMetricsCollector.java)

NodesStatsResponse nodesStatsResponse = this.client.admin().cluster().nodesStats(nodesStatsRequest).actionGet();

I think this is similar problem that is discussed in elastic/elasticsearch#17865

As explained in the ticket this assertion error is not thrown when ES is run standalone (unless enabled) but in general I think this might be a good indication the internals of the plugin should not use blocking calls but should be reimplemented to async fashion.

@jcantrill
Copy link

@lukas-vlcek I also see this in the integration test but am uncertain how to resolve.

@lukas-vlcek
Copy link
Collaborator Author

@jcantrill You are getting the same exception? Where? I know how Prometheus exporter plugin needs to be improved (I am almost done with my PR here).

@jcantrill
Copy link

try adding the '-da' switch to your args and it will go away. The take away though is there is some async work that needs to be done but I don't know what the correct solution is.

@lukas-vlcek
Copy link
Collaborator Author

Correct solution is to go full async (I am baking the change in my branch, you can see the comparison of branches here), which means the real work can not occur on rest layer but deeper at transport layer. This means messing with all the transport actions, requests and responses. Then the IntegTests are working like a charm.

But there is more to it. I think the other issue with current implementation of Prometheus exporter plugin is that single instance of PrometheusMetricCollector is kept and shared at the RestPrometheusMetricsAction layer for all HTTP requests which I think can not work correctly for concurrent HTTP calls. In other words if there are overlapping rest calls then results of Prometheus metrics may not be reliable (I did not find much information about "thread safiness" of Prometheus Java client that this plugin is using). I think the solution here is to create a new instance of collector per rest call (this is one of the last things I need to solve before my PR is done).

@lukas-vlcek
Copy link
Collaborator Author

And just to make it clear, all this apply to 2.x branch. (Though I think the same will have to be fixed for master as well).

lukas-vlcek added a commit to ViaQ/elasticsearch-prometheus-exporter that referenced this issue Oct 15, 2017
Fixing the following issues:

- [vvanholl#57] thread context handling is buggy
- [vvanholl#58] Do not execute a blocking operation on a networking thread
- [vvanholl#63] Why using node name instead of node ID for label?

Adding new features:

- Possibility to control access to node REST API with SearchGuard
- Introduction of integration tests
- Some other minor updates (like updating Prometheus java client)
@lukas-vlcek
Copy link
Collaborator Author

Just for the record: I found some resources that make it very clear that using blocking calls on transport thread (*) should be avoided. The roots of the assert that does this check can be tracked down to this discussion and it led to explicit PR elastic/elasticsearch#9164.

If I understand the reasons for this check correctly then what is going on here is blocking calls can lead to creating new threads in generic thread pool (which is of cached type, meaning there is no upper bound wrt to how many threads can be crated) and this is unsafe. This practice should be avoided hence the Assert.

(*) It is possible that my previous terminology may not have been valid all the time. It is evident that what I earlier called rest layer is probably the transport thread context. Well, if only internals of ES were better documented :-(

@lukas-vlcek
Copy link
Collaborator Author

@jcantrill I was looking again into the code and if I am not mistaken then a lot of the changes (if not all) from the 2.x fix were brought into master branch and should be part of the plugin since 5.6.5.0, see this commit 775b226

This includes full async operational mode and ability to protect and control direct access at the REST layer. This is great!

@lukas-vlcek
Copy link
Collaborator Author

@jcantrill the only thing that I think is really missing here is a set of unit and integration tests. We should consider help building them. If using unit test framework by Elastic will not work well here (for example because it may be discontinued, see #66) then we can try to implement unit tests in similar way we did it for some of our other plugins.

@lukas-vlcek
Copy link
Collaborator Author

This issue has been fixed quite some time ago. There no longer any blocking calls on transport thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants