Application log group not created after deploying FluentBit #35

Closed
steveellis opened this issue Apr 8, 2022 · 15 comments

@steveellis

I'm deploying the FluentBit part of this chart. Here's what my pods look like:

$ kubectl get pods --all-namespaces | grep amazon
amazon-cloudwatch   fluent-bit-448z6                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-9s8jz                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-jblg5                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-ts4kg                                     1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-2s4zj                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-9fhd7                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-g6t9m                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-qdcf2                       1/1     Running   0          123m

The docs here say that four log groups should be created once the pods are deployed, but in the CloudWatch dashboard I only see one group, /performance, for the cluster.

What additional config do I need to see application logs? The only thing I'm doing now is setting fluentBit.enabled to true in the values.
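
For reference, here's roughly how I'm installing it (a sketch only; the release name and local chart path are from my setup, and the only value I'm overriding is fluentBit.enabled):

# Install the chart with FluentBit enabled; everything else stays at chart defaults.
# Release name and chart path are from my setup.
helm install observability ./adot-exporter-for-eks-on-ec2 \
  --namespace default \
  --set fluentBit.enabled=true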

My pods generate logs (I can see them by doing kubectl logs <pod name> at least).

The logs for a given fluent-bit pod look like this:

Fluent Bit v1.8.9
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/04/08 17:57:22] [ info] [engine] started (pid=1)
[2022/04/08 17:57:22] [ info] [storage] created root path /var/fluent-bit/state/flb-storage/
[2022/04/08 17:57:22] [ info] [storage] version=1.1.5, initializing...
[2022/04/08 17:57:22] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/08 17:57:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/08 17:57:22] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/08 17:57:22] [ info] [cmetrics] version=0.2.2
[2022/04/08 17:57:22] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
@sunnygoel87
Contributor

@steveellis, is the following managed IAM policy attached to your EKS cluster's node group NodeInstanceRole? If not, please attach it, and it will create the application, dataplane, and host log groups in CloudWatch.

CloudWatchAgentServerPolicy
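
If it's easier than the console, something like this with the AWS CLI should show whether the policy is attached, and attach it if not (the role name is a placeholder for the node group's actual instance role):

# Check what's currently attached to the node instance role (placeholder name).
aws iam list-attached-role-policies --role-name <NodeInstanceRole>
# Attach the CloudWatch agent managed policy if it's missing.
aws iam attach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy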

@steveellis
Author

@sunnygoel87 thanks again for your help. Yes, the CloudWatchAgentServerPolicy managed policy is attached to the node group's node instance role (at least I think it is). Here's how we do that, just to make sure we're talking about the same thing:

  1. Create a role with an assume-role policy for the ec2.amazonaws.com service principal. That's here and here.
  2. For each managed policy (the above being one of them), create a role policy attachment to the role we just created.
  3. Bind that role to an instance profile. Here and here.
  4. Finally, when we create the cluster, we bind this same role to the instance roles of the cluster.

Code for all this starts here, which is where the managed policy is referenced. A rough CLI equivalent of these steps is sketched below.
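
To make those steps concrete, here's roughly the AWS CLI equivalent (role and profile names are placeholders; our actual setup does this through the IaC code linked above):

# Step 1: create a role that the ec2.amazonaws.com service principal can assume
# (placeholder names throughout).
aws iam create-role \
  --role-name eks-node-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
# Step 2: attach each managed policy, CloudWatchAgentServerPolicy among them.
aws iam attach-role-policy \
  --role-name eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
# Step 3: bind the role to an instance profile.
aws iam create-instance-profile --instance-profile-name eks-node-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name eks-node-profile \
  --role-name eks-node-role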

@sunnygoel87
Contributor

sunnygoel87 commented Apr 8, 2022

@steveellis - I normally use Terraform/eksctl IaC tools to provision EKS clusters on AWS. But regardless of the IaC tool used to create the cluster, as long as you can see the CloudWatch managed policy attached to the node group's node instance role in the IAM console, it should be fine.

Could you please retrieve the fluent-bit pod logs again and share them here? The first batch of pod logs was mostly about starting the container, testing connectivity with the k8s API server, etc.
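
Something like this will grab the most recent lines from one of the fluent-bit pods (pod name taken from your earlier kubectl get pods output):

# Tail the latest log lines from one of the fluent-bit pods.
kubectl -n amazon-cloudwatch logs fluent-bit-448z6 --tail=100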

@steveellis
Author

@sunnygoel87 thanks for the idea of checking the console. Indeed the managed policy wasn't there, so my IaC code wasn't adding it to the role. I added the CloudWatchAgentServerPolicy to the role using the console and redeployed the chart.

Sadly that hasn't made the application logs appear in CW. Some screenshots from the console.

[Screenshots from the IAM console: Snip20220408_6, Snip20220408_5, Snip20220408_7]

Here are the logs from one of the FB pods (maybe that warn means something?). This is the whole log, so there's not much going on.

[2022/04/08 22:31:50] [ info] [engine] started (pid=1)
[2022/04/08 22:31:50] [ info] [storage] version=1.1.5, initializing...
[2022/04/08 22:31:50] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/08 22:31:50] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/08 22:31:50] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/08 22:31:50] [ info] [cmetrics] version=0.2.2
[2022/04/08 22:31:50] [ warn] [input:systemd:systemd.3] seek_cursor failed
[2022/04/08 22:31:50] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] connectivity OK

@ruthvik17
Contributor

Hi @steveellis, could you provide the config files for the FluentBit deployment? That might give us an idea of the issue, as I'm trying to reproduce it on my end.

@steveellis
Author

Hi @ruthvik17 - I'm just installing the Helm chart without customizing the FluentBit config. In other words, it's your chart's FluentBit config.

I see that the config makes some assumptions about where the logs are in the cluster. Is this location consistent in all EKS deployments? We haven't changed our log locations AFAIK.
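
If it helps, I can check whether the container logs exist at the path I believe the config expects with something like this (this assumes the standard kubelet location /var/log/containers and that the daemonset mounts /var/log from the node):

# Exec into one of the fluent-bit pods and list the node's container log directory.
kubectl -n amazon-cloudwatch exec fluent-bit-448z6 -- ls /var/log/containers | head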

@steveellis
Author

steveellis commented Apr 12, 2022

@ruthvik17 these are the FluentBit config files that I'm referring to above. We're not modifying these. Also, we're running k8s version 1.21 without any changes to the default pod logging configuration. I can see a given pod's application logs with the standard kubectl logs <pod name> command.

@ruthvik17
Contributor

Hi @steveellis, I'm not sure whether the issue I raised with FluentBit was causing this bug as well, but could you please pull down the new code with the fix applied and try again? I ran it on my local setup and it worked as expected; I could see updated logs for all the log groups.

@steveellis
Author

Thanks @ruthvik17. I'm out until Monday; I'll give it a try then and let you know.

@steveellis
Author

@ruthvik17 I have installed 0.5.0 and am still not getting the application logs (only performance) in the console.

$ helm ls
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                  APP VERSION
observability   default         1               2022-04-18 08:19:28.654448 -0400 EDT    deployed        adot-exporter-for-eks-on-ec2-0.5.0     0.17.0 

However, if I install everything via the command provided here, everything works. I can see in a FB pod's logs that events are being sent (messages like [2022/04/18 12:41:56] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 7 events to CloudWatch, which don't appear after installing via Helm), and I can see the application logs in the console. This is the command:

ClusterName=<my-cluster-name>
RegionName=<my-cluster-region>
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f - 

Perhaps the solution is in the diff between what your chart is doing for config and what this yaml is doing.
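
If it would help, I can render the chart locally and diff it against that YAML; roughly something like this (file names are arbitrary, and the chart path/values are from my setup):

# Render the chart's manifests and diff them against the quickstart YAML.
# Note: the quickstart file still has its {{...}} placeholders here; the sed
# substitutions from the command above can be applied first for a cleaner diff.
helm template observability ./adot-exporter-for-eks-on-ec2 \
  --set fluentBit.enabled=true > chart-rendered.yaml
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml > quickstart.yaml
diff chart-rendered.yaml quickstart.yaml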

@ruthvik17
Contributor

Hi @steveellis, thanks for bringing this up. I'll look into this and let you know.

@mbeaucha

Have there been any findings on this? I'm having the same issue with the same setup. I'm running in us-gov-west-1.

@mbeaucha

I was finally able to get the application, dataplane, and host log groups created after changing the daemonset configuration to use hostNetwork: true. It still has not created the performance log group, but I'm looking into that now.
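
For anyone else hitting this, the change was roughly the following (a sketch via kubectl patch; I made the change in the daemonset configuration itself, and the dnsPolicy setting is the usual companion to hostNetwork rather than something I can confirm this chart needs):

# Switch the fluent-bit daemonset to host networking (daemonset name matches the
# pods above; dnsPolicy added as the usual companion to hostNetwork).
kubectl -n amazon-cloudwatch patch daemonset fluent-bit --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'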

@github-actions

This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled

@github-actions github-actions bot added the stale label Dec 25, 2022
@github-actions

This issue was closed because it has been marked as stale for 30 days with no activity.

@github-actions github-actions bot closed this as not planned (stale) on Jan 29, 2023