EC2 IMDS errors upon launch #333

Open
byrneo opened this issue Apr 21, 2022 · 8 comments

byrneo commented Apr 21, 2022

Describe the question/issue

Noticing some errors appearing when fluentbit launches

[error] [src/flb_network.c:224 errno=9] Bad file descriptor
[error] [http_client] broken connection to 169.254.169.254:80 ?
[error] [http_client] broken connection to 169.254.169.254:80 ?
AWS for Fluent Bit Container Image Version 2.23.3[2022/04/21 09:58:51] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit-1.8.15/lib/monkey/mk_core/mk_event_epoll.c:136

These errors appear a few times upon startup but don't cause the pod to crash.

Configuration

Fluent Bit Log Output

Fluent Bit v1.8.15
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/04/21 09:58:50] [ info] [engine] started (pid=1)
[2022/04/21 09:58:50] [ info] [storage] version=1.1.6, initializing...
[2022/04/21 09:58:50] [ info] [storage] root path '/var/fluent-bit/state/flb-storage/'
[2022/04/21 09:58:50] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/04/21 09:58:50] [ info] [storage] backlog input plugin: storage_backlog.8
[2022/04/21 09:58:50] [ info] [cmetrics] version=0.2.2
[2022/04/21 09:58:50] [ info] [input:systemd:systemd.3] seek_cursor=s=7028adf2155a4b3ca09a2a342ca71203;i=ffa... OK
[2022/04/21 09:58:50] [ info] [input:storage_backlog:storage_backlog.8] queue memory limit: 4.8M
[2022/04/21 09:58:50] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2022/04/21 09:58:50] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2022/04/21 09:58:50] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2022/04/21 09:58:50] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2022/04/21 09:58:51] [ warn] [net] io_read #119 timeout after 1 seconds from: 169.254.169.254:80
[2022/04/21 09:58:51] [error] [src/flb_network.c:224 errno=9] Bad file descriptor
[2022/04/21 09:58:51] [error] [http_client] broken connection to 169.254.169.254:80 ?
AWS for Fluent Bit Container Image Version 2.23.3[2022/04/21 09:58:51] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit-1.8.15/lib/monkey/mk_core/mk_event_epoll.c:136
[2022/04/21 09:58:51] [ info] [imds] to use IMDSv2, set --http-put-response-limit to 2
[2022/04/21 09:58:51] [ warn] [imds] falling back on IMDSv1
[2022/04/21 09:58:52] [ warn] [net] io_read #121 timeout after 1 seconds from: 169.254.169.254:80
[2022/04/21 09:58:52] [error] [src/flb_network.c:224 errno=9] Bad file descriptor
[2022/04/21 09:58:52] [error] [http_client] broken connection to 169.254.169.254:80 ?
[2022/04/21 09:58:52] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit-1.8.15/lib/monkey/mk_core/mk_event_epoll.c:136
[2022/04/21 09:58:52] [ info] [imds] to use IMDSv2, set --http-put-response-limit to 2
[2022/04/21 09:58:52] [ warn] [imds] falling back on IMDSv1
[2022/04/21 09:58:53] [  Error] epoll_ctl: Bad file descriptor, errno=9 at /tmp/fluent-bit-1.8.15/lib/monkey/mk_core/mk_event_epoll.c:136
[2022/04/21 09:58:53] [ warn] [net] io_read #123 timeout after 1 seconds from: 169.254.169.254:80
[2022/04/21 09:58:53] [error] [src/flb_network.c:224 errno=9] Bad file descriptor
[2022/04/21 09:58:53] [error] [http_client] broken connection to 169.254.169.254:80 ?
[2022/04/21 09:58:53] [ info] [imds] to use IMDSv2, set --http-put-response-limit to 2
[2022/04/21 09:58:53] [ warn] [imds] falling back on IMDSv1

Fluent Bit Version Info

Fluent Bit v1.8.15

AWS for Fluent Bit Container Image Version 2.23.3

Cluster Details

Application Details

Steps to reproduce issue

Related Issues

@byrneo byrneo changed the title from "Errors upon" to "Log errors upon launch" Apr 21, 2022
@PettitWesley PettitWesley changed the title from "Log errors upon launch" to "EC2 IMDS errors upon launch" Apr 25, 2022
@PettitWesley
Contributor

Does Fluent Bit function normally and successfully send logs after startup? Do these errors only occur on startup?

[2022/04/21 09:58:51] [error] [src/flb_network.c:224 errno=9] Bad file descriptor
[2022/04/21 09:58:51] [error] [http_client] broken connection to 169.254.169.254:80 ?

Both of these errors almost certainly share the same root cause: first the core network library logs the "Bad file descriptor" message, then the HTTP client logs that the connection is therefore broken. 169.254.169.254 is the EC2 IMDS IP. Notice the lines after this about setting a hop limit.

What's happening here is that when each AWS plugin instance is initialized, each one must initialize its credential providers. So it will go through the standard chain of AWS credential sources, including EC2 IMDS, and look for creds. This will happen for each AWS output instance. Hence, you probably got one error message per output instance. For the EC2 provider, it tries IMDS version 2 first: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html

If this fails it falls back to IMDS v1 style requests, where the auth token is omitted.
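To make the "try IMDSv2 first, then fall back to IMDSv1" flow concrete, here is a minimal Python sketch. It is not Fluent Bit's implementation, only an illustration of the documented IMDS request shapes: a `PUT` to `/latest/api/token` with a TTL header, then a `GET` that carries the token. The function names and the injectable `send` callable are assumptions made for this example.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds=21600):
    """IMDSv2 step 1: PUT request that asks for a session token."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path, token=None):
    """GET request for a metadata path; without a token it is an IMDSv1-style call."""
    headers = {"X-aws-ec2-metadata-token": token} if token else {}
    return urllib.request.Request(
        IMDS_BASE + "/latest/meta-data/" + path, headers=headers
    )


def urlopen_send(request, timeout):
    """Default transport: perform the request and return the response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()


def fetch_metadata(path, send=urlopen_send, timeout=1.0):
    """Try IMDSv2 first; if the token request fails, retry without a token.

    `send` is injected so the flow can be exercised without a real IMDS endpoint.
    """
    try:
        token = send(build_token_request(), timeout).decode()
    except OSError:
        # Timeouts and connection errors are OSError subclasses. With a hop
        # limit of 1, the token response may never reach a container.
        token = None
    return send(build_metadata_request(path, token), timeout).decode()
```

On an EC2 host, `fetch_metadata("instance-id")` would return the instance ID. If the token `PUT` times out (as in the `io_read ... timeout` lines above) and IMDSv1 is also disabled, the second request fails as well.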

So I think what is happening here is expected. We wish the errors here were more clear to prevent confusion. @matthewfala Did I miss anything and can you think of any ways to improve the error messaging here?

@matthewfala
Contributor

That's right, @PettitWesley. We're thinking that the issue is a combination of the following:

  1. hop limit is set to 1
  2. IMDSv1 is disabled (which is good)

AWS recommends using IMDSv2. To do that, you'll need to set the hop limit to 2 or greater so that network traffic from within the container can reach the IMDS endpoint: #259 (comment)

If you don't want to go through the trouble of increasing the hop limit, you can also enable IMDSv1, in which case it should be detected and used by Fluent Bit.
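As a rough sketch of raising the hop limit programmatically, the following Python builds the arguments for the EC2 `ModifyInstanceMetadataOptions` API and passes them to boto3. It assumes boto3 is installed and AWS credentials are configured. The helper names and the instance ID in the usage note are placeholders, not values from this issue.

```python
def hop_limit_options(instance_id, hop_limit=2, require_tokens=True):
    """Build keyword arguments for ModifyInstanceMetadataOptions."""
    # EC2 accepts hop limits from 1 to 64.
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop_limit must be between 1 and 64")
    return {
        "InstanceId": instance_id,
        "HttpEndpoint": "enabled",
        # "required" disables IMDSv1; "optional" allows both versions.
        "HttpTokens": "required" if require_tokens else "optional",
        "HttpPutResponseHopLimit": hop_limit,
    }


def apply_hop_limit(instance_id, hop_limit=2):
    """Apply the options with boto3 (third-party; imported lazily)."""
    import boto3

    ec2 = boto3.client("ec2")
    return ec2.modify_instance_metadata_options(
        **hop_limit_options(instance_id, hop_limit)
    )
```

For example, `apply_hop_limit("i-0123456789abcdef0")` (a made-up instance ID) would request a hop limit of 2 with tokens required. For nodes managed by an Auto Scaling group or EKS node group, the same settings normally belong in the launch template so that replacement nodes keep them.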

@byrneo
Author

byrneo commented May 10, 2022

Sorry for the late response, @PettitWesley @matthewfala. Yes: Fluent Bit did indeed appear to function normally despite the startup errors.

I've made a bunch of changes in my environment since creating this issue, one of which was to use IRSA with Fluent Bit (previously I had been using an IAM instance role/profile for the EC2 host). I can't be 100% certain that made the difference, but I no longer see the errors during startup.

@vkadi

vkadi commented Feb 22, 2023

@PettitWesley @matthewfala I have been struggling with IMDS-related issues. I am using the latest image, 2.31.2.

[2023/02/22 22:03:10] [error] [net] connection #44 timeout after 10 seconds to: 169.254.169.254:80
[2023/02/22 22:03:10] [error] [filter:aws:aws.0] connection initialization error
[2023/02/22 22:03:10] [error] [filter:aws:aws.0] Could not retrieve ec2 metadata from IMDS
[0] dummy: [1677103380.297254617, {"message"=>"dummy"}]

This is what I have in the ConfigMap:


[INPUT]
    Name dummy
    Tag dummy

[FILTER]
    Name aws
    Match *
    imds_version v2
    az true
    ec2_instance_id true
    ec2_instance_type true
    private_ip true
    ami_id true
    account_id true
    hostname true
    vpc_id true

[OUTPUT]
    Name stdout
    Match *

I tried changing the hop limit to 2. Snippet from the EC2 describe output:

"MetadataOptions": {
    "State": "applied",
    "HttpTokens": "optional",  --> tried even with required
    "HttpPutResponseHopLimit": 2,
    "HttpEndpoint": "enabled",
    "HttpProtocolIpv6": "disabled",
    "InstanceMetadataTags": "disabled"
}

I am trying to use this metadata plugin to enrich the logs, specifically with the instance ID. Is there something I am missing? What needs to be set on the EC2 side to get https://docs.fluentbit.io/manual/pipeline/filters/aws-metadata to work?

@PettitWesley
Contributor

@vkadi that should work... what network setup are your containers running in? Can you try ssh/kubectl exec into the pod and see if you can reach IMDS via curl?

@vkadi

vkadi commented Feb 23, 2023

@PettitWesley I am running this on an EKS cluster, and from the pods I am not able to access the metadata:

bash-4.2# curl http://169.254.169.254/latest/meta-data/
curl: (28) Failed to connect to 169.254.169.254 port 80 after 129614 ms: Couldn't connect to server

@PettitWesley
Contributor

@vkadi then something about your network configuration is blocking access. I am not sure what. I know there are some CNI plugins that will block link local IP addresses from pods, which would block IMDS.
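For images that do not ship curl, the same reachability check can be approximated with a short Python probe. This is only a sketch: it tests whether a TCP connection to the IMDS address can be opened from inside the pod, not whether a metadata request would succeed. The function name is an assumption made for this example.

```python
import socket


def can_reach(host="169.254.169.254", port=80, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

Running `can_reach()` inside the pod and again on the node itself helps separate a pod-level network restriction (for example, a CNI or network policy that blocks link-local addresses) from an instance-level IMDS setting.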

@vkadi

vkadi commented Feb 24, 2023

@PettitWesley By enabling "hostNetwork: true", I was able to access IMDS from the Fluent Bit pod, as mentioned in this doc: https://docs.fluentbit.io/manual/pipeline/filters/kubernetes
