
Fluent Bit (k8s log) connections not catching up if Fluentd goes down or on first-time deployment #1551

Closed
vmayerupgrade opened this issue Apr 12, 2021 · 4 comments
Labels
bug Something isn't working question Further information is requested

Comments

@vmayerupgrade

Hello!

My Fluent Bit (k8s log) pods have a hard time recovering when Fluentd goes down, or during a first-time deployment if the Fluent Bit pods become ready before Fluentd.
Once Fluentd is back up, new logs are sent properly, but the older connections seem to struggle, and the Fluent Bit pods log more and more of the errors below. It gets worse over time, as more connections/lines are added on every refresh.
The only fix I have found is to restart the Fluent Bit pod. Everything looks good after the restart.

[2021/04/12 19:46:49] [error] [upstream] connection #123 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
[2021/04/12 19:46:49] [error] [upstream] connection #131 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
[2021/04/12 19:46:49] [error] [upstream] connection #132 to sumologic-kubernetes-collection-k8s-fluentd-logs.sumologic.svc.cluster.local.:24321 timed out after 10 seconds
.....

How to replicate this

  • Install the collection with at least logs and Fluentd enabled
  • Scale the Fluentd StatefulSet down to 0, then back to its normal replica count
  • Watch the Fluent Bit pods' logs
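The steps above can be sketched with kubectl. The namespace, StatefulSet name, and label selector below are assumptions inferred from the upstream address in the error logs; adjust them to match your deployment:

```shell
# Assumed names, inferred from the service address in the error logs above
NS=sumologic
STS=sumologic-kubernetes-collection-k8s-fluentd-logs

# Simulate a Fluentd outage, then restore it
kubectl -n "$NS" scale statefulset "$STS" --replicas=0
kubectl -n "$NS" scale statefulset "$STS" --replicas=3   # restore your original replica count

# Watch the Fluent Bit pod logs for the connection timeout errors
# (the label selector is an assumption; check your pods' labels)
kubectl -n "$NS" logs -l app=fluent-bit -f
```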

I am using version v2.0.6 and deploying with Helm to generate the manifests.

@vmayerupgrade vmayerupgrade added the question Further information is requested label Apr 12, 2021
@sumo-drosiek
Contributor

sumo-drosiek commented Apr 13, 2021

Hi, thanks for the report.
This is (probably) related to fluent/fluent-bit#3192.

We are going to backport #1543 and release v2.0.7.
In the meantime, you can override the Fluent Bit image version:

fluent-bit:
  image:
    tag: 1.7.3
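Since the issue is deployed via Helm, the same override can also be applied on the command line. The release name and chart reference below are placeholders, not values confirmed in this thread:

```shell
# Release name and chart reference are placeholders; adjust for your installation
helm upgrade my-collection sumologic/sumologic \
  --reuse-values \
  --set fluent-bit.image.tag=1.7.3
```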

@sumo-drosiek sumo-drosiek added this to the v2.0 milestone Apr 13, 2021
@sumo-drosiek sumo-drosiek added the bug Something isn't working label Apr 13, 2021
@sumo-drosiek sumo-drosiek self-assigned this Apr 13, 2021
@vmayerupgrade
Author

Your suggestion to bump Fluent Bit to 1.7.3 solved the issue. Thanks!

@sumo-drosiek
Contributor

@vmayerupgrade we released v2.1.1, which should solve the issue. We downgraded fluent-bit to 1.6.10, but you can use 1.7.3 if that works for you.

Please close the issue if everything is fine

@sumo-drosiek
Contributor

I'm closing this issue due to no further comments. Please reopen it if the error reoccurs.
