Repository-hdfs plugin not always closing TCP connections #220
What version of the repository-hdfs plugin are you using? Any information on where the TCP connections point to? Also, what version of Hadoop are you using? Thanks
I'm using:
The TCP connections point to the nodes of my Hadoop cluster. Here is an extract of the `lsof` output for the Elasticsearch process:
I noticed that even a few hours after the last snapshot, the connections are still not closed. Thanks for your help
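An `lsof` dump like the one above can be summarized per remote host with a short `awk` filter. This is a sketch: the heredoc is illustrative sample data (the hostnames, and port 50010, the default HDFS datanode transfer port, are assumptions, not output from this report); in practice you would pipe `lsof -iTCP -p <es_pid>` into the filter instead.

```shell
# Count CLOSE_WAIT connections per remote host from lsof-style output.
# $9 is the NAME column ("local->remote"); $10 is the TCP state.
awk '/CLOSE_WAIT/ { split($9, a, "->"); sub(/:.*/, "", a[2]); n[a[2]]++ }
     END { for (h in n) print h, n[h] }' <<'EOF'
java 1234 es 100u IPv4 0x1 0t0 TCP es-node:51000->datanode1:50010 (CLOSE_WAIT)
java 1234 es 101u IPv4 0x2 0t0 TCP es-node:51001->datanode1:50010 (CLOSE_WAIT)
java 1234 es 102u IPv4 0x3 0t0 TCP es-node:51002->datanode2:50010 (ESTABLISHED)
EOF
```

With the sample data this prints `datanode1 2`, since only the two CLOSE_WAIT lines match.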
I've done some quick searches and it looks like this is likely caused by Hadoop itself. For example, see this thread and this issue.
@jubagarie Can you try a quick fix that will hopefully force Hadoop to close the connections? After you do the backup, can you try unregistering the repository and see whether it has any effect on the number of connections opened? Let me know how it goes - thanks!
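For anyone following along, unregistering and re-registering goes through the snapshot API: `DELETE /_snapshot/<name>`, then `PUT` it back. A hypothetical repository definition is sketched below; the `uri` and `path` values are examples, not taken from this report.

```json
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020",
    "path": "/backups/elasticsearch"
  }
}
```

The idea behind the suggestion is that re-creating the repository should, in theory, tear down and re-create the underlying Hadoop file-system instance along with its connections.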
Unfortunately, unregistering the repository and registering it again doesn't affect the number of connections. I can also add that my HDFS servers run CDH 4.1.2, while my Elasticsearch servers run CDH 4.6.0. Both seem to include the patch created for the issue you found. Thanks
Hmm, I'm afraid I'm not sure what else can be done on this front. Can you confirm in the ES logs that the file-system is created again when you register the repository after unregistering it? The only thing I can think of is restarting the node, which is clearly not ideal...
Here are the ES logs when I unregister/register the repository, so it seems fine:
I will dig on the Hadoop side. Thanks for your time!
As a work-around, you could potentially add some firewall rules to kill the stale connections.
Closing this as won't-fix, since there's not much we can do unfortunately...
Since this bug in Hadoop keeps occurring, some pointers in the docs on how to work around it would help.
Hmmm. As a point of reference: Elasticsearch 1.1.1 + CDH 5.2.1 (Hadoop 2.5.x) here, and I don't see any such connections.
Aside: is there a recommended way to set up the light plugin with the Hadoop jars (like CDH's) for startup? Is ES_CLASSPATH the way to go? It might be worth a mention in the documentation, especially since with the CentOS/RHEL-provided files it's not obvious: the variable is not in the sysconfig file or the init script. I found it in the Elasticsearch shell script itself.
Oh wait, totally lying, the connections were on the master nodes:
So yes, still an issue! |
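Regarding the ES_CLASSPATH aside above, a minimal sketch of that approach follows. The CDH parcel path is an assumption about a typical layout; adjust it for your install.

```shell
# Hypothetical: put the CDH Hadoop client jars on Elasticsearch's classpath
# before invoking the startup script, which honors ES_CLASSPATH.
# The parcel path below is an example, not a universal default.
export ES_CLASSPATH="/opt/cloudera/parcels/CDH/lib/hadoop/client/*"
echo "ES_CLASSPATH=$ES_CLASSPATH"
```

Exporting the variable in the environment that launches the init script (or in a sysconfig file it sources) is one way to make this survive restarts.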
@bflad Sorry to hear that.
@bflad @jubagarie you might want to try 2.1.0.rc1, since it improves the creation and closing of the underlying file-system.
As there hasn't been any update, I'm closing the issue. |
I'm using the "repository-hdfs" plugin to store snapshots on HDFS with Elasticsearch 1.1.1. It seems that Elasticsearch doesn't properly close the TCP connections after a snapshot is created.
The result for me was "too many open files" errors in the Elasticsearch logs. Using the `lsof` command, I found a pile of more than 50k TCP connections in the CLOSE_WAIT state, and as many open file descriptors.
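A quick way to watch for this condition is to compare a process's open descriptors against its limit. This sketch uses the current shell's own pid as a stand-in for the Elasticsearch pid, and reads `/proc`, so it is Linux-specific.

```shell
# Compare open file descriptors against the per-process limit (Linux /proc).
pid=$$                              # stand-in for the Elasticsearch pid
open=$(ls "/proc/$pid/fd" | wc -l)  # one entry per open descriptor
max=$(ulimit -n)                    # soft limit on open files
echo "$open open fds (limit $max)"
```

When `open` creeps toward `max` between snapshots without coming back down, leaked CLOSE_WAIT sockets like the ones above are a likely culprit.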