Connection to AWS ElasticSearch will be lost after a certain period of time #15
Comments
Hi @darwin67, thank you for the report. Maybe the credentials had expired. I'm sorry, but it will take a long time to fix... |
There's a similar issue with the plain elasticsearch plugin + aws-es-proxy... :( |
I'm seeing this issue also. My credentials are OK. Logging works for a few days then dies.
|
+1 |
The issue seems to be that AWS Elasticsearch is not 100% compatible with native ES when it comes to reloading connections to ES. Reloading connections happens every 10,000 requests by default. This can be useful when you have multiple hosts configured, but in the case of AWS ES there is only a single (HA) endpoint. For the compatibility issue, see: https://forums.aws.amazon.com/thread.jspa?threadID=222600 In our case, preventing fluentd from reloading the host configuration (added to the output plugin part of the fluentd config) seems to work as a workaround:
Note: Prior to applying this workaround the connection was lost every ~3 hours (~10,800 seconds), which makes sense since we are flushing our data to ES every 1s. I think this should be fixed in the Ruby Elasticsearch client, since it will affect not only Fluentd but potentially every Ruby/AWS ES implementation. |
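The timing in the note above can be checked with a little arithmetic. This is only a rough sketch: it assumes each 1s flush produces exactly one request to ES, which the comment does not state, and the constant and method names are made up for illustration.

```ruby
# Back-of-the-envelope check of the reload timing described above.
RELOAD_AFTER_REQUESTS  = 10_000 # default reload interval quoted in the comment
FLUSH_INTERVAL_SECONDS = 1      # flush interval from the comment; assumes one request per flush

# Seconds until the client would reload its connections.
def seconds_until_reload(reload_after, flush_interval)
  reload_after * flush_interval
end

seconds = seconds_until_reload(RELOAD_AFTER_REQUESTS, FLUSH_INTERVAL_SECONDS)
hours   = seconds / 3600.0

puts format("connections reloaded after ~%d s (~%.1f h)", seconds, hours)
```

Under that assumption the reload lands a little under three hours after startup, which is in the same range as the ~3 hours reported.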
@steynovich Thank you for the research. |
+1 @steynovich We are experiencing the same issue. Can you please give some more info on where to place the mentioned workaround? |
@mpas Those options are part of the 'parent' plugin that this plugin uses. https://github.com/uken/fluent-plugin-elasticsearch#reload_on_failure The https://github.com/uken/fluent-plugin-elasticsearch#usage section shows where to put it (in the match block). |
Hi, I'm afraid specifying "reload_connections false" won't work due to a type mismatch.
The parent plugin handles the "reload_connections" option as a string type. However, the Elasticsearch Ruby client handles "reload_connections" as a Fixnum or Boolean. The parent plugin also sets "reload_connections" to true by default. |
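To illustrate why a string-typed option would matter here: in Ruby every String is truthy, including "false". The sketch below is not code from either library; `normalize_reload_option` is a hypothetical helper that shows one way a string value could be coerced before being handed to a client that expects a boolean or an integer.

```ruby
# Hypothetical helper (not part of fluentd or the Elasticsearch client):
# turn a config value that may arrive as a String into a real boolean or Integer.
def normalize_reload_option(value)
  case value
  when true, false, Integer then value       # already the right type
  when /\A\d+\z/            then value.to_i  # "10000" -> 10000
  when /\Atrue\z/i          then true
  when /\Afalse\z/i         then false
  else raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

raw = "false"                        # what a string-typed config option would hold
naive_enabled = raw ? true : false   # every String is truthy in Ruby, even "false"
fixed_enabled = normalize_reload_option(raw)

puts "naive check: #{naive_enabled}, normalized: #{fixed_enabled}"
```

With the naive truthiness check, "false" still counts as enabled, which matches the behavior described in the comment above.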
Would there be a fix to this issue then? I keep running into the same problem.
|
I don't think there's a fix yet. The current workaround is to restart fluentd regularly (we use monit). :( |
After looking into this issue, I found that it doesn't happen if you don't use Dynamic configuration: Also, I think we must modify the parent plugin code. I forked the plugin and made some modifications: |
After running for 24 hours, there are no connection errors, though I usually get errors after 16 hours without this fix. I'll make a PR to the parent plugin repository. |
The upstream PR is merged and now released. Is there any chance of pulling the new version into this plugin? |
@tanaka-takayoshi If the repo owner isn't available, would you mind forking it and creating another plugin? This issue is extremely critical, and it has been waiting more than 26 days for the owner's response. |
@vendrov It would be best if the repo owner released a new version to fix this issue. I think no code change is necessary, because the plugin picks up the latest version of the parent plugin when it is built. However, if the repo owner does not respond, I can do it. |
@vendrov I forked it and uploaded the gem. Could you test it? I have poor knowledge of Ruby gem versioning. @atomita I'll take down my hotfix gem once you release a new version. There's no need to update any file. All that is required is to build the gem again. |
@tanaka-takayoshi with your hotfix plugin is it still necessary to specify |
@malford Yes, may have to specify |
haven't taken a look at this for a while. |
Hi,
I'm seeing something like this in the logs recently.
Basically, my understanding is that this plugin loses its connection to the AWS ElasticSearch service 1 to 2 days after the start/restart of td-agent. There is a Websocket connection duration limit listed here, but I'm not sure if it's related. I'm guessing that the connection is closed from the AWS side, but I couldn't find any documents mentioning it. Any ideas why this is happening?
Also, is there already a solution to this problem (besides manually restarting td-agent)? I'm assuming that this plugin is being used elsewhere too.