
Fluent Bit (k8s log) connections not catching up if Fluentd goes down or on first-time deployment #1551

Closed
vmayerupgrade opened this issue Apr 12, 2021 · 4 comments
Labels
bug Something isn't working question Further information is requested

Comments

@vmayerupgrade

Hello!

My Fluent Bit (k8s log) pods have a hard time recovering when Fluentd goes down, or during a first-time deployment if the Fluent Bit pods become ready before Fluentd.
Once Fluentd is back up, new logs are sent properly, but the older connections seem to struggle, and the Fluent Bit pods log more and more of the errors below. It gets worse over time, as more connections/lines are added on every refresh.
The only fix I have found is to restart the Fluent Bit pod. Everything looks good after the restart.

[2021/04/12 19:46:49] [error] [upstream] connection #123 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
[2021/04/12 19:46:49] [error] [upstream] connection #131 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
[2021/04/12 19:46:49] [error] [upstream] connection #132 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
.....

How to replicate this

  • Install the collection with at least logs and Fluentd enabled
  • Scale the Fluentd StatefulSet down to 0, then back to its normal replica count
  • Watch the Fluent Bit pods' logs
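The steps above can be sketched with kubectl. The namespace, StatefulSet name, and label selector below are assumptions inferred from the upstream address in the error logs; adjust them to match your deployment:

```shell
# Assumed names, inferred from the service address in the error logs above
NS=sumologic
STS=sumologic-kubernetes-collection-k8s-fluentd-logs

# Simulate a Fluentd outage, then restore it
kubectl -n "$NS" scale statefulset "$STS" --replicas=0
kubectl -n "$NS" scale statefulset "$STS" --replicas=3   # restore your original replica count

# Watch the Fluent Bit pod logs for the connection timeout errors
# (the label selector is an assumption; check your pods' labels)
kubectl -n "$NS" logs -l app=fluent-bit -f
```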

I am using version v2.0.6 and deploying with Helm to generate the manifests.

@vmayerupgrade vmayerupgrade added the question Further information is requested label Apr 12, 2021
@sumo-drosiek
Contributor

sumo-drosiek commented Apr 13, 2021

Hi, thanks for the report.
This is (probably) related to fluent/fluent-bit#3192.

We are going to backport #1543 and release v2.0.7.
In the meantime, you can override the Fluent Bit image version:

fluent-bit:
  image:
    tag: 1.7.3
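Since the issue is deployed via Helm, the same override can also be applied on the command line. The release name and chart reference below are placeholders, not values confirmed in this thread:

```shell
# Release name and chart reference are placeholders; adjust for your installation
helm upgrade my-collection sumologic/sumologic \
  --reuse-values \
  --set fluent-bit.image.tag=1.7.3
```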

@sumo-drosiek sumo-drosiek added this to the v2.0 milestone Apr 13, 2021
@sumo-drosiek sumo-drosiek added the bug Something isn't working label Apr 13, 2021
@sumo-drosiek sumo-drosiek self-assigned this Apr 13, 2021
@vmayerupgrade
Author

Your suggestion to bump Fluent Bit to 1.7.3 solved the issue. Thanks!

@sumo-drosiek
Contributor

@vmayerupgrade we released v2.1.1, which should solve the issue. We downgraded fluent-bit to 1.6.10, but you can use 1.7.3 if that works for you.

Please close the issue if everything is fine

@sumo-drosiek
Contributor

I'm closing this issue due to no further comments. Please reopen it if the error reoccurs.
