Connection to AWS ElasticSearch will be lost after a certain period of time #15

darwin67 · 2016-06-30T21:17:55Z

Hi,
I'm seeing something like this in the logs recently.

2016-06-30 19:30:53 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-06-30 19:30:54 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f9e9deeaf04"
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/base.rb:249:in `perform_request'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-1.0.18/lib/elasticsearch/transport/client.rb:128:in `perform_request'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-1.0.18/lib/elasticsearch/api/actions/bulk.rb:90:in `bulk'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:278:in `send'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:271:in `write'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:345:in `write_chunk'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/buffer.rb:324:in `pop'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:329:in `try_flush'
  2016-06-30 19:30:53 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.20/lib/fluent/output.rb:140:in `run'
2016-06-30 19:30:54 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-06-30 19:30:56 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f9e9deeaf04"
  2016-06-30 19:30:54 +0000 [warn]: suppressed same stacktrace

As far as I understand, this plugin loses its connection to the AWS Elasticsearch service 1 to 2 days after td-agent is started or restarted.

There is a WebSocket connection duration limit listed here, but I'm not sure if it's related.

However, I'm guessing that the connection is closed from the AWS side, but I couldn't find any documents mentioning it. Any ideas why this is happening?

Also, is there already a solution to this problem (besides manually restarting td-agent)? I'm assuming that this plugin is being used elsewhere too.


atomita · 2016-07-01T02:13:17Z

Hi @darwin67 ,

Thank you for the report.

Maybe the credentials had expired.

I'm sorry, but it will take a long time to fix...

therc · 2016-07-28T15:56:58Z

There's a similar issue with the plain elasticsearch plugin + aws-es-proxy... :(


aerickson · 2016-08-15T20:50:57Z

I'm seeing this issue also. My credentials are OK. Logging works for a few days then dies.

2016-08-13 21:20:01 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:20:02 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-2.0.0/lib/elasticsearch/transport/transport/base.rb:249:in `perform_request'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-2.0.0/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-transport-2.0.0/lib/elasticsearch/transport/client.rb:128:in `perform_request'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/elasticsearch-api-2.0.0/lib/elasticsearch/api/actions/bulk.rb:93:in `bulk'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:278:in `send'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluent-plugin-elasticsearch-1.5.0/lib/fluent/plugin/out_elasticsearch.rb:271:in `write'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.26/lib/fluent/buffer.rb:354:in `write_chunk'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.26/lib/fluent/buffer.rb:333:in `pop'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.26/lib/fluent/output.rb:338:in `try_flush'
  2016-08-13 21:20:01 +0000 [warn]: /opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/fluentd-0.12.26/lib/fluent/output.rb:149:in `run'
2016-08-13 21:20:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:20:04 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:02 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:20:04 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:20:08 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:04 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:20:08 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:20:16 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:08 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:20:16 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:20:33 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:16 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:20:33 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:21:02 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:20:33 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:21:02 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:22:14 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:21:02 +0000 [warn]: suppressed same stacktrace
2016-08-13 21:22:14 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2016-08-13 21:24:25 +0000 error_class="Elasticsearch::Transport::Transport::Error" error="Cannot get new connection from pool." plugin_id="object:3f859d6ae4a8"
  2016-08-13 21:22:14 +0000 [warn]: suppressed same stacktrace

vendrov · 2016-09-19T06:58:53Z

+1
The same problem here

steynovich · 2016-10-19T12:06:38Z

The issue seems to be that AWS Elasticsearch is not 100% compatible with native ES when it comes to reloading connections. By default, connections are reloaded every 10,000 requests. This can be useful when you have multiple hosts configured, but in the case of AWS ES there is only a single (HA) endpoint.

For the compatibility issue, see: https://forums.aws.amazon.com/thread.jspa?threadID=222600

In our case, preventing fluentd from reloading the host configuration seems to work as a workaround (add this to the output plugin part of the fluentd config):

# Prevent reloading connections to AWS ES
reload_on_failure false
reload_connections false

Note: reload_connections is false by default.

Prior to applying this workaround the connection was lost every ~3 hours (~10,800 seconds), which makes sense since we are flushing our data every 1s to ES.
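The timing estimate above can be checked with a quick calculation. This is only a sketch: it assumes each 1s flush results in exactly one request to ES, which the thread does not confirm, and the variable names are made up for illustration.

```ruby
# Default reload threshold mentioned above (requests between reloads).
RELOAD_AFTER_REQUESTS = 10_000

# Assumption: one request per flush, one flush per second.
flush_interval_seconds = 1

seconds_until_reload = RELOAD_AFTER_REQUESTS * flush_interval_seconds
hours_until_reload = (seconds_until_reload / 3600.0).round(2)

puts "#{seconds_until_reload} s = #{hours_until_reload} h"
# prints "10000 s = 2.78 h"
```

Under that assumption a reload would be triggered a little under every 3 hours, which is in the same range as the observed interval.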

I think this should be fixed in the Ruby Elasticsearch client, since it will affect not only Fluentd but potentially every Ruby/AWS ES implementation.

darwin67 · 2016-10-19T16:34:01Z

@steynovich Thank you for the research.
My current workaround is a cron job that restarts td-agent every day.
That has worked fine so far, but I'd like to try what you suggested once I have the time.

mpas · 2016-10-26T17:58:42Z

+1 Experiencing the same issue

@steynovich We are experiencing the same issue. Can you please give some more info on where to place the workaround you mentioned?

aerickson · 2016-10-26T18:04:03Z

@mpas Those options are part of the 'parent' plugin that this plugin uses.

https://github.com/uken/fluent-plugin-elasticsearch#reload_on_failure

The https://github.com/uken/fluent-plugin-elasticsearch#usage section shows where to put it (in the match block).
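As a rough sketch of that placement: the tag pattern, endpoint URL and region below are placeholders (assumptions, not values from this thread), and only the two reload lines come from the workaround above. Other comments in this thread report that, in some plugin versions, the value is not converted to a boolean, so whether the setting takes effect may depend on the installed version.

```
<match es.**>
  @type aws-elasticsearch-service
  logstash_format true
  flush_interval 1s

  # Workaround discussed in this thread
  reload_connections false
  reload_on_failure false

  <endpoint>
    # Placeholder values (assumptions); replace with your own domain
    url https://CLUSTER_ENDPOINT_URL
    region us-east-1
  </endpoint>
</match>
```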

tanaka-takayoshi · 2016-11-30T08:32:08Z

Hi, I'm afraid specifying "reload_connections false" won't work due to a type mismatch.

reload_connections false

The parent plugin handles the "reload_connections" option as a string.
https://github.com/uken/fluent-plugin-elasticsearch/blob/v1.9.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb#L15
https://github.com/uken/fluent-plugin-elasticsearch/blob/v1.9.0/lib/fluent/plugin/out_elasticsearch_dynamic.rb#L222

However, the Elasticsearch Ruby client handles "reload_connections" as a Fixnum or Boolean.
https://github.com/elastic/elasticsearch-ruby/blob/v2.0.0/elasticsearch-transport/lib/elasticsearch/transport/transport/base.rb#L51-L52
https://github.com/elastic/elasticsearch-ruby/blob/v2.0.0/elasticsearch-transport/lib/elasticsearch/transport/transport/base.rb#L70
elastic/elasticsearch-ruby#164

Also, the parent plugin sets "reload_connections" to true by default.
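To illustrate why a string value matters here: in Ruby, the string "false" is truthy, so a client that only checks the option for truthiness would still reload. The snippet below is a simplified stand-in, not the actual client or plugin code; `reload_enabled?` and `to_bool` are hypothetical helpers introduced for this example.

```ruby
# Hypothetical stand-in for a client that checks the option for truthiness.
def reload_enabled?(options)
  options[:reload_connections] ? true : false
end

# Hypothetical helper that turns a config string into a real boolean.
def to_bool(value)
  return value if value == true || value == false
  value.to_s.strip.downcase == "true"
end

# The string "false" is truthy in Ruby, so reloading stays enabled.
puts reload_enabled?(reload_connections: "false")          # => true

# After converting the string, the option behaves as intended.
puts reload_enabled?(reload_connections: to_bool("false")) # => false
```

Converting the string to a boolean before handing it to the client is one way to avoid the mismatch.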

mpas · 2016-11-30T12:01:14Z

Is there a fix for this issue, then? I keep running into the same problem.

aerickson · 2016-11-30T21:23:49Z

I don't think there's a fix yet. The current workaround is to restart fluentd regularly (we use monit). :(

tanaka-takayoshi · 2016-12-01T05:17:03Z

After I looked into this issue, I found it won't happen if you don't use Dynamic configuration:
https://github.com/uken/fluent-plugin-elasticsearch/blob/master/lib/fluent/plugin/out_elasticsearch.rb#L41-L42

Also, I think we must modify the parent plugin's code. I forked the plugin and made some modifications:
tanaka-takayoshi/fluent-plugin-elasticsearch@646fe26
I'm now testing this code and will report back with the results.

tanaka-takayoshi · 2016-12-02T07:03:08Z

After running for 24 hours, there are no connection errors, though I usually get errors after 16 hours without this fix. I'll make a PR to the parent plugin repository.

tanaka-takayoshi · 2016-12-14T02:14:31Z

The upstream PR is merged and now released. Is there any chance of pulling the new version into this plugin?

vendrov · 2017-01-09T09:51:39Z

@tanaka-takayoshi If the repo owner isn't available, would you mind forking it and creating another plugin? This issue is extremely critical, and it has been waiting more than 26 days for the owner's response.

tanaka-takayoshi · 2017-01-09T15:52:26Z

@vendrov It would be best if the repo owner released a new version to fix this issue. I think no code change is necessary, because the plugin refers to the latest version of the parent plugin when it is built.
https://github.com/atomita/fluent-plugin-aws-elasticsearch-service/blob/master/fluent-plugin-aws-elasticsearch-service.gemspec#L27

However, if the repo owner does not respond, I can do it.

tanaka-takayoshi · 2017-01-18T07:46:59Z

@vendrov I forked the plugin and uploaded the gem. Could you test it? I don't know much about Ruby gem versioning.
https://rubygems.org/gems/fluent-plugin-aws-elasticsearch-service-hotfix

@atomita I'll take down my hotfix gem once you release a new version. There's no need to update any files; all that's required is to build the gem again.

malford · 2017-04-27T19:13:46Z

@tanaka-takayoshi With your hotfix plugin, is it still necessary to specify reload_connections false in the config?

tanaka-takayoshi · 2017-04-28T14:01:56Z

@malford Yes, you may have to specify reload_connections false, as I intended to inherit the parent plugin's settings and it's true by default.
https://github.com/uken/fluent-plugin-elasticsearch/blob/v1.9.3/lib/fluent/plugin/out_elasticsearch.rb#L41

darwin67 · 2017-07-10T16:28:43Z

I haven't looked at this for a while.
It looks like it's solved now, so I'm closing this issue.

therc mentioned this issue Jul 26, 2016

Can't reconnect under load with reload_connections=false uken/fluent-plugin-elasticsearch#182

Closed

atomita added a commit that referenced this issue Jul 30, 2016

There is a possibility of an error to memorized the credentials, by t…

62ed767

…he expired. #15

tanaka-takayoshi mentioned this issue Dec 2, 2016

Converting string to bool in dynamicconfig. uken/fluent-plugin-elasticsearch#220

Merged


repeatedly mentioned this issue Jan 31, 2017

Cannot get new connection from pool fluent/fluentd#1435

Closed

darwin67 closed this as completed Jul 10, 2017

goruha mentioned this issue Dec 14, 2018

[fluentd-elasticsearch-logs] Update fluentd version cloudposse/helmfiles#67

Merged

nlowe mentioned this issue Jul 24, 2019

Logging: AWS Elasticsearch: Cannot get new connection from pool rancher/rancher#21744

Closed
