Application log group not created after deploying FluentBit #35

Closed
steveellis opened this issue Apr 8, 2022 · 15 comments

@steveellis

I'm deploying the FluentBit part of this chart. Here's what my pods look like:

$ kubectl get pods --all-namespaces | grep amazon
amazon-cloudwatch   fluent-bit-448z6                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-9s8jz                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-jblg5                                     1/1     Running   0          123m
amazon-cloudwatch   fluent-bit-ts4kg                                     1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-2s4zj                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-9fhd7                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-g6t9m                       1/1     Running   0          123m
amazon-metrics      adot-collector-daemonset-qdcf2                       1/1     Running   0          123m

The docs here say that four log groups should be created once the pods are deployed, but in the CloudWatch dashboard I only see one group, /performance, for the cluster.

What additional config do I need to see application logs? The only thing I'm doing now is setting fluentBit.enabled to true in the values.
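
For reference, here's roughly how I'm installing it (a sketch only; the release name and local chart path are from my setup, and the only value I'm overriding is fluentBit.enabled):

# Install the chart with FluentBit enabled; everything else stays at chart defaults.
# Release name and chart path are from my setup.
helm install observability ./adot-exporter-for-eks-on-ec2 \
  --namespace default \
  --set fluentBit.enabled=true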

My pods generate logs (I can see them by doing kubectl logs <pod name> at least).

The logs for a given fluent-bit pod look like this:

Fluent Bit v1.8.9
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/04/08 17:57:22] [ info] [engine] started (pid=1)
[2022/04/08 17:57:22] [ info] [storage] created root path /var/fluent-bit/state/flb-storage/
[2022/04/08 17:57:22] [ info] [storage] version=1.1.5, initializing...
[2022/04/08 17:57:22] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/08 17:57:22] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/08 17:57:22] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/08 17:57:22] [ info] [cmetrics] version=0.2.2
[2022/04/08 17:57:22] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/08 17:57:22] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
@sunnygoel87
Contributor

@steveellis, is the following managed IAM policy attached to your EKS cluster's node group NodeInstanceRole? If not, please attach it, and it will create the application, dataplane, and host log groups in CloudWatch.

CloudWatchAgentServerPolicy
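
If it's easier than the console, something like this with the AWS CLI should show whether the policy is attached, and attach it if not (the role name is a placeholder for the node group's actual instance role):

# Check what's currently attached to the node instance role (placeholder name).
aws iam list-attached-role-policies --role-name <NodeInstanceRole>
# Attach the CloudWatch agent managed policy if it's missing.
aws iam attach-role-policy \
  --role-name <NodeInstanceRole> \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy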

@steveellis
Author

@sunnygoel87 thanks again for your help. Yes, the CloudWatchAgentServerPolicy managed policy is attached to the node group's node instance role (at least I think it is). Here's how we do that, just to make sure we're talking about the same thing:

  1. Create a role with an assume-role policy for the ec2.amazonaws.com service principal. That's here and here.
  2. For each managed policy (the above being one of them), create a role policy attachment to the role we just created.
  3. Bind that role to an instance profile. Here and here.
  4. Finally, when we create the cluster, we bind this same role to the instance roles of the cluster.

Code for all this starts here, which is where the managed policy is referenced. A rough CLI equivalent of these steps is sketched below.
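
To make those steps concrete, here's roughly the AWS CLI equivalent (role and profile names are placeholders; our actual setup does this through the IaC code linked above):

# Step 1: create a role that the ec2.amazonaws.com service principal can assume
# (placeholder names throughout).
aws iam create-role \
  --role-name eks-node-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'
# Step 2: attach each managed policy, CloudWatchAgentServerPolicy among them.
aws iam attach-role-policy \
  --role-name eks-node-role \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
# Step 3: bind the role to an instance profile.
aws iam create-instance-profile --instance-profile-name eks-node-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name eks-node-profile \
  --role-name eks-node-role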

@sunnygoel87
Contributor

sunnygoel87 commented Apr 8, 2022

@steveellis - I normally use Terraform/eksctl IaC tools to provision EKS clusters on AWS. But regardless of the IaC tool used to create the cluster, as long as you can see the CloudWatch managed policy attached to the node group's node instance role in the IAM console, it should be fine.

Could you please retrieve the fluent-bit pod logs again and share them here? The first batch of pod logs was mostly about starting the container, testing connectivity with the k8s API server, etc.
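
Something like this will grab the most recent lines from one of the fluent-bit pods (pod name taken from your earlier kubectl get pods output):

# Tail the latest log lines from one of the fluent-bit pods.
kubectl -n amazon-cloudwatch logs fluent-bit-448z6 --tail=100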

@steveellis
Author

@sunnygoel87 thanks for the idea of checking the console. Indeed the managed policy wasn't there, so my IaC code wasn't adding it to the role. I added the CloudWatchAgentServerPolicy to the role using the console and redeployed the chart.

Sadly that hasn't made the application logs appear in CW. Some screenshots from the console.

[Screenshots from the IAM console: Snip20220408_6, Snip20220408_5, Snip20220408_7]

Here are the logs from one of the FB pods (maybe that warn means something?). This is the whole log, so there's not much going on.

[2022/04/08 22:31:50] [ info] [engine] started (pid=1)
[2022/04/08 22:31:50] [ info] [storage] version=1.1.5, initializing...
[2022/04/08 22:31:50] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/08 22:31:50] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/08 22:31:50] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/08 22:31:50] [ info] [cmetrics] version=0.2.2
[2022/04/08 22:31:50] [ warn] [input:systemd:systemd.3] seek_cursor failed
[2022/04/08 22:31:50] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/08 22:31:50] [ info] [filter:kubernetes:kubernetes.0] connectivity OK

@ruthvik17
Contributor

Hi @steveellis, could you provide the config files for the FluentBit deployment? That might give us an idea of the issue, as I'm trying to reproduce it on my end.

@steveellis
Author

Hi @ruthvik17 - I'm just installing the Helm chart without customizing the FluentBit config. In other words, it's your chart's FluentBit config.

I see that the config makes some assumptions about where the logs are in the cluster. Is this location consistent in all EKS deployments? We haven't changed our log locations AFAIK.
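
If it helps, I can check whether the container logs exist at the path I believe the config expects with something like this (this assumes the standard kubelet location /var/log/containers and that the daemonset mounts /var/log from the node):

# Exec into one of the fluent-bit pods and list the node's container log directory.
kubectl -n amazon-cloudwatch exec fluent-bit-448z6 -- ls /var/log/containers | head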

@steveellis
Author

steveellis commented Apr 12, 2022

@ruthvik17 these are the FluentBit config files that I'm referring to above. We're not modifying these. Also, we're running k8s version 1.21 without any changes to the default pod logging configuration. I can see a given pod's application logs with the standard kubectl logs <pod name> command.

@ruthvik17
Contributor

Hi @steveellis, I'm not sure whether the issue I raised with FluentBit was causing this bug as well, but could you please pull down the new code with the fix applied and try again? I ran it on my local setup and it worked as expected; I could see updated logs for all the log groups.

@steveellis
Author

Thanks @ruthvik17. I'm out until Monday; I'll give it a try then and let you know.

@steveellis
Author

@ruthvik17 I have installed 0.5.0 and am still not getting the application logs (only performance) in the console.

$ helm ls
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                  APP VERSION
observability   default         1               2022-04-18 08:19:28.654448 -0400 EDT    deployed        adot-exporter-for-eks-on-ec2-0.5.0     0.17.0 

However, if I install everything via the command provided here, everything works. I can see in a FB pod's logs that events are being sent (messages like [2022/04/18 12:41:56] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Sent 7 events to CloudWatch, which don't appear after installing via Helm), and I can see the application logs in the console. This is the command:

ClusterName=<my-cluster-name>
RegionName=<my-cluster-region>
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
curl https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${ClusterName}'/;s/{{region_name}}/'${RegionName}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f - 

Perhaps the solution is in the diff between what your chart is doing for config and what this yaml is doing.
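
If it would help, I can render the chart locally and diff it against that YAML; roughly something like this (file names are arbitrary, and the chart path/values are from my setup):

# Render the chart's manifests and diff them against the quickstart YAML.
# Note: the quickstart file still has its {{...}} placeholders here; the sed
# substitutions from the command above can be applied first for a cleaner diff.
helm template observability ./adot-exporter-for-eks-on-ec2 \
  --set fluentBit.enabled=true > chart-rendered.yaml
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml > quickstart.yaml
diff chart-rendered.yaml quickstart.yaml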

@ruthvik17
Contributor

Hi @steveellis, thanks for bringing this up. I'll look into this and let you know.

@mbeaucha

Have there been any findings on this? I'm having the same issue with the same setup. I'm running in us-gov-west-1.

@mbeaucha

I was finally able to get the application, dataplane, and host log groups created after changing the daemonset configuration to use hostNetwork: true. It still has not created the performance log group, but I'm looking into that now.
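
For anyone else hitting this, the change was roughly the following (a sketch via kubectl patch; I made the change in the daemonset configuration itself, and the dnsPolicy setting is the usual companion to hostNetwork rather than something I can confirm this chart needs):

# Switch the fluent-bit daemonset to host networking (daemonset name matches the
# pods above; dnsPolicy added as the usual companion to hostNetwork).
kubectl -n amazon-cloudwatch patch daemonset fluent-bit --type merge \
  -p '{"spec":{"template":{"spec":{"hostNetwork":true,"dnsPolicy":"ClusterFirstWithHostNet"}}}}'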

@github-actions

This issue is stale because it has been open 90 days with no activity. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled

@github-actions github-actions bot added the stale label Dec 25, 2022
@github-actions

This issue was closed because it has been marked as stale for 30 days with no activity.

@github-actions github-actions bot closed this as not planned (stale) on Jan 29, 2023