Agent cannot connect to kubelet #6621

Closed
Gowiem opened this issue Oct 23, 2020 · 6 comments

Gowiem commented Oct 23, 2020

Output of the info page (if this is a bug)

Getting the status from the agent.

===============
Agent (v7.23.1)
===============

  Status date: 2020-10-23 00:47:05.200537 UTC
  Agent start: 2020-10-22 23:53:34.100610 UTC
  Pid: 435
  Go Version: go1.14.7
  Python Version: 3.8.5
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System UTC time: 2020-10-23 00:47:05.200537 UTC

  Host Info
  =========
    bootTime: 2020-10-22 23:40:26.000000 UTC
    kernelArch: x86_64
    kernelVersion: 4.14.193-149.317.amzn2.x86_64
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 15
    uptime: 13m25s
    virtualizationRole: guest
    virtualizationSystem: xen

  Hostnames
  =========
    socket-fqdn: REDACTED-7db7bfc879-s9df7
    socket-hostname: REDACTED-7db7bfc879-s9df7
    host tags:
      cluster_name:REDACTED
      env:REDACTED
    hostname provider: 
    unused hostname providers:
      configuration/environment: hostname is empty

  Metadata
  ========

=========
Collector
=========

  Running Checks
  ==============
    
    eks_fargate (1.1.1)
    -------------------
      Instance ID: eks_fargate:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/eks_fargate.d/conf.yaml.default
      Total Runs: 214
      Metric Samples: Last Run: 1, Total: 214
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-10-23 00:47:04.000000 UTC
      Last Successful Execution Date : 2020-10-23 00:47:04.000000 UTC
      
    
    kubelet (5.0.0)
    ---------------
      Instance ID: kubelet:d884b5186b651429 [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/kubelet.d/conf.yaml.default
      Total Runs: 213
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-10-23 00:46:56.000000 UTC
      Last Successful Execution Date : Never
      Error: Unable to detect the kubelet URL automatically: cannot connect: https: "Get \"https://:10250/pods\": dial tcp :10250: connect: connection refused", http: "Get \"http://:10255/pods\": dial tcp :10255: connect: connection refused"
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 828, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py", line 295, in check
          raise CheckException("Unable to detect the kubelet URL automatically: " + kubelet_conn_info.get('err', ''))
      datadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot connect: https: "Get \"https://:10250/pods\": dial tcp :10250: connect: connection refused", http: "Get \"http://:10255/pods\": dial tcp :10255: connect: connection refused"
========
JMXFetch
========

  Initialized checks
  ==================
    no checks
    
  Failed checks
  =============
    no checks
    
=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 213
    Connections: 0
    Containers: 0
    Deployments: 0
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 22
    Metadata: 0
    Nodes: 0
    Pods: 0
    Processes: 0
    RTContainers: 0
    RTProcesses: 0
    ReplicaSets: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    Services: 0
    SketchSeries: 0
    Success: 448
    TimeseriesV1: 213

  API Keys status
  ===============
    API key ending with 42793: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - 42793
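In the kubelet check error above, both attempted URLs (`https://:10250/pods` and `http://:10255/pods`) have nothing between the scheme and the port. In other words, the check had no kubelet host to dial. This may be related to the blank `hostname provider:` entry in the status output, though that link is an assumption. The following is a small standalone illustration in Python, not Datadog agent code. It parses the two URLs with the standard library and confirms that the host component is empty while the ports and path are intact. The helper name `describe_endpoint` is hypothetical.

```python
from urllib.parse import urlparse

# Kubelet endpoints copied verbatim from the kubelet check error in the status output above.
ATTEMPTED_URLS = [
    "https://:10250/pods",
    "http://:10255/pods",
]


def describe_endpoint(url: str) -> dict:
    """Split a URL into scheme, host, port and path (hypothetical helper).

    urlparse() reports .hostname as None when the authority part of the URL
    has no host before the port, which is the case for both URLs above.
    """
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None means no host was filled in
        "port": parts.port,
        "path": parts.path,
    }


if __name__ == "__main__":
    for url in ATTEMPTED_URLS:
        info = describe_endpoint(url)
        status = "missing host" if info["host"] is None else info["host"]
        print(f"{url}: {status}, port {info['port']}, path {info['path']}")
```

A connection to an empty host generally falls back to the local address, so the "connection refused" errors are at least consistent with nothing listening on ports 10250/10255 inside the Fargate pod itself.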

Describe what happened:

I'm trying to run Datadog as a sidecar on my EKS Fargate nodes/pods, but I keep getting the seemingly common "cannot connect to kubelet"-style errors. This is the latest iteration:

2020-10-23T01:27:58.521589856Z starting agent
2020-10-23T01:27:58.522683161Z starting system-probe
2020-10-23T01:27:59.224556973Z [services.d] done.
2020-10-23T01:28:01.423422387Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/util/log/log.go:465 in func1) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:01.423628412Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/util/log/log.go:460 in func1) | Skipping TLS verification
2020-10-23T01:28:01.423877368Z 2020-10-23 01:28:01 UTC | PROCESS | WARN | (pkg/util/log/log.go:480 in func1) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:01.424106681Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/util/log/log.go:465 in func1) | overriding API key from env DD_API_KEY value
2020-10-23T01:28:01.424163083Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/process/config/config.go:295 in mergeConfigIfExists) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-10-23T01:28:01.424191510Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/process/config/config.go:454 in loadEnvVariables) | overriding API key from env DD_API_KEY value
2020-10-23T01:28:01.424647590Z 2020-10-23 01:28:01 UTC | PROCESS | INFO | (pkg/process/config/yaml_config.go:189 in loadSysProbeYamlConfig) | network_config not found, enabling network check by default
2020-10-23T01:28:02.321248893Z 2020-10-23 01:28:02 UTC | CORE | INFO | (cmd/agent/app/run.go:183 in StartAgent) | Starting Datadog Agent v7.23.1
2020-10-23T01:28:02.321272392Z 2020-10-23 01:28:02 UTC | CORE | INFO | (cmd/agent/app/run.go:227 in StartAgent) | Hostname is: 
2020-10-23T01:28:04.321989049Z 2020-10-23 01:28:04 UTC | CORE | INFO | (pkg/api/security/security.go:145 in fetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2020-10-23T01:28:04.424047973Z 2020-10-23 01:28:04 UTC | CORE | INFO | (cmd/agent/app/run.go:254 in StartAgent) | GUI server port -1 specified: not starting the GUI.
2020-10-23T01:28:04.424569167Z 2020-10-23 01:28:04 UTC | CORE | INFO | (pkg/forwarder/forwarder.go:270 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://7-23-1-app.agent.datadoghq.com" (1 api key(s))
2020-10-23T01:28:04.424723548Z 2020-10-23 01:28:04 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:176 in CheckConnectivity) | Checking HTTP connectivity...
2020-10-23T01:28:04.524575498Z 2020-10-23 01:28:04 UTC | TRACE | INFO | (pkg/util/log/log.go:465 in func1) | Loaded configuration: /etc/datadog-agent/datadog.yaml
2020-10-23T01:28:04.619969109Z 2020-10-23 01:28:04 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:182 in CheckConnectivity) | Sending HTTP connectivity request to https://agent-http-intake.logs.datadoghq.com/v1/input/***************************42793...
2020-10-23T01:28:04.620174238Z 2020-10-23 01:28:04 UTC | CORE | INFO | (pkg/dogstatsd/listeners/udp.go:97 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2020-10-23T01:28:05.723968645Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/client/http/destination.go:187 in CheckConnectivity) | HTTP connectivity successful
2020-10-23T01:28:05.723991173Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:55 in NewLauncher) | Could not setup the docker launcher: temporary failure in dockerutil, will retry later: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2020-10-23T01:28:05.724000970Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:62 in NewLauncher) | Could not setup the kubernetes launcher: /var/log/pods not found
2020-10-23T01:28:05.724059580Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/input/container/launcher.go:71 in NewLauncher) | Container logs won't be collected unless a docker daemon is eventually started
2020-10-23T01:28:05.724072482Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/logs.go:85 in Start) | Starting logs-agent...
2020-10-23T01:28:05.724098372Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/logs/logs.go:88 in Start) | logs-agent started
2020-10-23T01:28:05.724107432Z 2020-10-23 01:28:05 UTC | CORE | INFO | (cmd/agent/app/run.go:312 in StartAgent) | System probe config not found, disabling pulling system probe info in the status page: open /etc/datadog-agent/system-probe.yaml: no such file or directory
2020-10-23T01:28:05.724114782Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/util/version_history.go:43 in logVersionHistoryToFile) | Cannot read file: /opt/datadog-agent/run/version-history.json, will create a new one. open /opt/datadog-agent/run/version-history.json: no such file or directory
2020-10-23T01:28:05.724122054Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:05.724138193Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:05.729744533Z 2020-10-23 01:28:05 UTC | CORE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:05.730304078Z 2020-10-23 01:28:05 UTC | CORE | INFO | (pkg/tagger/tagger.go:158 in tryCollectors) | static tag collector successfully started
2020-10-23T01:28:06.623134486Z 2020-10-23 01:28:06 UTC | TRACE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:06.623158307Z 2020-10-23 01:28:06 UTC | TRACE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:06.628984833Z 2020-10-23 01:28:06 UTC | TRACE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:06.724218223Z 2020-10-23 01:28:06 UTC | TRACE | INFO | (pkg/tagger/tagger.go:158 in tryCollectors) | static tag collector successfully started
2020-10-23T01:28:06.724338855Z 2020-10-23 01:28:06 UTC | TRACE | INFO | (pkg/trace/agent/run.go:131 in Run) | Trace agent running on host oc-location-7db7bfc879-9zjw5
2020-10-23T01:28:06.724352273Z 2020-10-23 01:28:06 UTC | TRACE | INFO | (pkg/trace/api/api.go:125 in Start) | Listening for traces at http://0.0.0.0:8126
2020-10-23T01:28:07.226389221Z 2020-10-23 01:28:07 UTC | PROCESS | INFO | (main_common.go:107 in runAgent) | running on platform: linux-4.14.193-149.317.amzn2.x86_64-x86_64-with-glibc2.2.5
2020-10-23T01:28:07.226479677Z 2020-10-23 01:28:07 UTC | PROCESS | INFO | (main_common.go:110 in runAgent) | running version: Version: 7.23.1, Git hash: 8099db1, Git branch: HEAD, Build date: 2020-10-20T22:32:54, Go Version: go version go1.14.7 linux/amd64, 
2020-10-23T01:28:07.821651248Z 2020-10-23 01:28:07 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:07.821673758Z 2020-10-23 01:28:07 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:07.826900130Z 2020-10-23 01:28:07 UTC | CORE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:07.828126859Z 2020-10-23 01:28:07 UTC | CORE | INFO | (pkg/collector/runner/runner.go:92 in NewRunner) | Runner started with 4 workers.
2020-10-23T01:28:07.919867043Z 2020-10-23 01:28:07 UTC | CORE | INFO | (pkg/collector/python/init.go:311 in Initialize) | Initializing rtloader with python3 /opt/datadog-agent/embedded
2020-10-23T01:28:09.227917701Z 2020-10-23 01:28:09 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:09.227951706Z 2020-10-23 01:28:09 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:09.321818958Z 2020-10-23 01:28:09 UTC | PROCESS | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:09.323896283Z 2020-10-23 01:28:09 UTC | PROCESS | INFO | (pkg/tagger/tagger.go:158 in tryCollectors) | static tag collector successfully started
2020-10-23T01:28:09.626237584Z 2020-10-23 01:28:09 UTC | CORE | INFO | (pkg/util/cloudprovider.go:54 in DetectCloudProvider) | No cloud provider detected
2020-10-23T01:28:10.725954689Z 2020-10-23 01:28:10 UTC | PROCESS | INFO | (pkg/process/checks/process.go:48 in Init) | no network ID detected: could not detect network ID
2020-10-23T01:28:10.726097973Z 2020-10-23 01:28:10 UTC | PROCESS | INFO | (collector.go:175 in run) | Starting process-agent for host=fargate-ip-10-9-22-178.ec2.internal, endpoints=[https://process.datadoghq.com], orchestrator endpoints=[https://orchestrator.datadoghq.com], enabled checks=[process rtprocess Network]
2020-10-23T01:28:10.726363579Z 2020-10-23 01:28:10 UTC | PROCESS | INFO | (pkg/forwarder/forwarder.go:270 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://process.datadoghq.com" (1 api key(s))
2020-10-23T01:28:10.726497573Z 2020-10-23 01:28:10 UTC | PROCESS | INFO | (pkg/forwarder/forwarder.go:270 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://orchestrator.datadoghq.com" (1 api key(s))
2020-10-23T01:28:10.821284195Z 2020-10-23 01:28:10 UTC | PROCESS | INFO | (collector.go:157 in runCheck) | Finished process check #1 in 94.543378ms
2020-10-23T01:28:10.922702914Z 2020-10-23 01:28:10 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:465 in func1) | no config exists at /etc/datadog-agent/system-probe.yaml, ignoring...
2020-10-23T01:28:10.922771328Z 2020-10-23 01:28:10 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:465 in func1) | overriding API key from env DD_API_KEY value
2020-10-23T01:28:10.922784613Z 2020-10-23 01:28:10 UTC | SYS-PROBE | INFO | (pkg/util/log/log.go:460 in func1) | network_config not found, enabling network check by default
2020-10-23T01:28:10.922871375Z 2020-10-23 01:28:10 UTC | SYS-PROBE | INFO | (cmd/system-probe/main.go:84 in runAgent) | system probe not enabled. exiting.
2020-10-23T01:28:11.426357142Z 2020-10-23 01:28:11 UTC | SECURITY | INFO | (app/app.go:165 in start) | All security-agent components are deactivated, exiting
2020-10-23T01:28:14.925534798Z 2020-10-23 01:28:14 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:122 in LogMessage) | - | (ddyaml.py:123) | monkey patching yaml.load...
2020-10-23T01:28:14.925694867Z 2020-10-23 01:28:14 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:122 in LogMessage) | - | (ddyaml.py:127) | monkey patching yaml.load_all...
2020-10-23T01:28:14.925905775Z 2020-10-23 01:28:14 UTC | CORE | INFO | (pkg/collector/python/datadog_agent.go:122 in LogMessage) | - | (ddyaml.py:131) | monkey patching yaml.dump_all... (affects all yaml dump operations)
2020-10-23T01:28:15.321453333Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/collector/collector.go:57 in NewCollector) | Embedding Python 3.8.5 (default, Oct 20 2020, 22:31:39) [GCC 4.7.2]
2020-10-23T01:28:15.525722651Z 2020-10-23 01:28:15 UTC | CORE | INFO | (cmd/agent/common/autoconfig.go:72 in SetupAutoConfig) | Registering kubelet config provider polled every 10s
2020-10-23T01:28:15.526204380Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:15.526332972Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:15.531728062Z 2020-10-23 01:28:15 UTC | CORE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:15.535856068Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:363 in initListenerCandidates) | kubelet listener cannot start, will retry: temporary failure in kubeutil, will retry later: cannot connect: https: "Get \"https://:10250/pods\": dial tcp :10250: connect: connection refused", http: "Get \"http://:10255/pods\": dial tcp :10255: connect: connection refused"
2020-10-23T01:28:15.536002068Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: /etc/datadog-agent/conf.d
2020-10-23T01:28:15.922179191Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: /opt/datadog-agent/bin/agent/dist/conf.d
2020-10-23T01:28:15.922327117Z 2020-10-23 01:28:15 UTC | CORE | WARN | (pkg/autodiscovery/providers/file.go:78 in Collect) | Skipping, open /opt/datadog-agent/bin/agent/dist/conf.d: no such file or directory
2020-10-23T01:28:15.922478553Z 2020-10-23 01:28:15 UTC | CORE | INFO | (pkg/autodiscovery/providers/file.go:74 in Collect) | file: searching for configuration files at: 
2020-10-23T01:28:15.922568319Z 2020-10-23 01:28:15 UTC | CORE | WARN | (pkg/autodiscovery/providers/file.go:78 in Collect) | Skipping, open : no such file or directory
2020-10-23T01:28:16.020076169Z system-probe exited with code 0, disabling
2020-10-23T01:28:16.039140785Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:85 in Enter) | Scheduling check eks_fargate with an interval of 15s
2020-10-23T01:28:16.039247025Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/collector/scheduler/scheduler.go:85 in Enter) | Scheduling check kubelet with an interval of 15s
2020-10-23T01:28:16.039383659Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/logs/scheduler/scheduler.go:66 in Schedule) | Received a new logs config: custom_log_collection
2020-10-23T01:28:16.120692131Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/logs/input/file/scanner.go:248 in handleTailingModeChange) | Tailing mode changed for file:/var/log/containers/application.log. Was: end: Now: beginning
2020-10-23T01:28:16.120876105Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/logs/input/file/scanner.go:225 in startNewTailer) | Starting a new tailer for: /var/log/containers/application.log (offset: 0, whence: 0) for tailer key /var/log/containers/application.log
2020-10-23T01:28:16.120974288Z 2020-10-23 01:28:16 UTC | CORE | INFO | (pkg/logs/input/file/tailer_nix.go:29 in setup) | Opening /var/log/containers/application.log for tailer key /var/log/containers/application.log
2020-10-23T01:28:16.430223728Z security-agent exited with code 0, disabling
2020-10-23T01:28:16.725359315Z 2020-10-23 01:28:16 UTC | TRACE | INFO | (pkg/trace/info/stats.go:101 in LogStats) | No data received
2020-10-23T01:28:17.039473708Z 2020-10-23 01:28:17 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:eks_fargate | Running check
2020-10-23T01:28:17.040322890Z 2020-10-23 01:28:17 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:eks_fargate | Done running check
2020-10-23T01:28:19.447889419Z 2020-10-23 01:28:19 UTC | CORE | INFO | (pkg/forwarder/transaction.go:293 in internalProcess) | Successfully posted payload to "https://7-23-1-app.agent.datadoghq.com/api/v1/check_run?api_key=*************************42793", the agent will only log transaction success every 500 transactions
2020-10-23T01:28:20.824332895Z 2020-10-23 01:28:20 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:20.824514815Z 2020-10-23 01:28:20 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:20.829773592Z 2020-10-23 01:28:20 UTC | PROCESS | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:20.831014266Z 2020-10-23 01:28:20 UTC | PROCESS | INFO | (collector.go:157 in runCheck) | Finished process check #2 in 9.506448ms
2020-10-23T01:28:20.924494655Z 2020-10-23 01:28:20 UTC | CORE | WARN | (pkg/util/ec2/ec2_tags.go:90 in GetTags) | unable to get tags from aws and cache is empty: unable to fetch EC2 API, Get "http://169.254.169.254/latest/dynamic/instance-identity/document/": dial tcp 169.254.169.254:80: i/o timeout (Client.Timeout exceeded while awaiting headers)
2020-10-23T01:28:20.924685111Z 2020-10-23 01:28:20 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:20.924829655Z 2020-10-23 01:28:20 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:21.021496382Z 2020-10-23 01:28:21 UTC | CORE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:21.034212379Z 2020-10-23 01:28:21 UTC | PROCESS | INFO | (pkg/forwarder/transaction.go:293 in internalProcess) | Successfully posted payload to "https://process.datadoghq.com/api/v1/collector", the agent will only log transaction success every 500 transactions
2020-10-23T01:28:22.022345546Z 2020-10-23 01:28:22 UTC | CORE | WARN | (pkg/util/gce/gce_tags.go:48 in getCachedTags) | unable to get tags from gce and cache is empty: Get "http://169.254.169.254/computeMetadata/v1/?recursive=true": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2020-10-23T01:28:23.323157843Z 2020-10-23 01:28:23 UTC | CORE | INFO | (pkg/metadata/host/host.go:187 in getNetworkMeta) | could not get network metadata: could not detect network ID
2020-10-23T01:28:23.340037660Z 2020-10-23 01:28:23 UTC | CORE | INFO | (pkg/serializer/serializer.go:356 in sendMetadata) | Sent metadata payload, size (raw/compressed): 2839/1257 bytes.
2020-10-23T01:28:24.039319627Z 2020-10-23 01:28:24 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:kubelet | Running check
2020-10-23T01:28:24.039763654Z 2020-10-23 01:28:24 UTC | CORE | ERROR | (pkg/collector/python/kubeutil.go:40 in getConnections) | connection to kubelet failed: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-10-23T01:28:24.041136843Z 2020-10-23 01:28:24 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 828, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 295, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically: \" + kubelet_conn_info.get('err', ''))\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"\n"}]
2020-10-23T01:28:24.041290328Z 2020-10-23 01:28:24 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:kubelet | Done running check
2020-10-23T01:28:25.526248759Z 2020-10-23 01:28:25 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-10-23T01:28:30.824412748Z 2020-10-23 01:28:30 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:30.824612150Z 2020-10-23 01:28:30 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:30.829810716Z 2020-10-23 01:28:30 UTC | PROCESS | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:30.830633075Z 2020-10-23 01:28:30 UTC | PROCESS | INFO | (collector.go:157 in runCheck) | Finished process check #3 in 9.154047ms
2020-10-23T01:28:31.523832204Z 2020-10-23 01:28:31 UTC | TRACE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:31.523950319Z 2020-10-23 01:28:31 UTC | TRACE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:31.528839712Z 2020-10-23 01:28:31 UTC | TRACE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:32.039495149Z 2020-10-23 01:28:32 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:eks_fargate | Running check
2020-10-23T01:28:32.039518121Z 2020-10-23 01:28:32 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:eks_fargate | Done running check
2020-10-23T01:28:33.627787576Z 2020-10-23 01:28:33 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:33.627878933Z 2020-10-23 01:28:33 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:33.632979356Z 2020-10-23 01:28:33 UTC | CORE | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:35.526102059Z 2020-10-23 01:28:35 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-10-23T01:28:39.039470107Z 2020-10-23 01:28:39 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:kubelet | Running check
2020-10-23T01:28:39.040448300Z 2020-10-23 01:28:39 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 828, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 295, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically: \" + kubelet_conn_info.get('err', ''))\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"\n"}]
2020-10-23T01:28:39.040604712Z 2020-10-23 01:28:39 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:kubelet | Done running check
2020-10-23T01:28:40.824446620Z 2020-10-23 01:28:40 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:40.824470177Z 2020-10-23 01:28:40 UTC | PROCESS | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification
2020-10-23T01:28:40.829379036Z 2020-10-23 01:28:40 UTC | PROCESS | WARN | (pkg/util/kubernetes/kubelet/kubelet.go:526 in setupKubeletAPIEndpoint) | Failed to securely reach the kubelet over HTTPS, received a status 403. Trying a non secure connection over HTTP. We highly recommend configuring TLS to access the kubelet
2020-10-23T01:28:40.830267214Z 2020-10-23 01:28:40 UTC | PROCESS | INFO | (collector.go:157 in runCheck) | Finished process check #4 in 8.786437ms
2020-10-23T01:28:45.526222028Z 2020-10-23 01:28:45 UTC | CORE | ERROR | (pkg/autodiscovery/config_poller.go:123 in collect) | Unable to collect configurations from provider kubernetes: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-10-23T01:28:45.536641144Z 2020-10-23 01:28:45 UTC | CORE | INFO | (pkg/autodiscovery/autoconfig.go:363 in initListenerCandidates) | kubelet listener cannot start, will retry: temporary failure in kubeutil, will retry later: try delay not elapsed yet
2020-10-23T01:28:47.039415952Z 2020-10-23 01:28:47 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:eks_fargate | Running check
2020-10-23T01:28:47.039750169Z 2020-10-23 01:28:47 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:eks_fargate | Done running check
2020-10-23T01:28:50.824650192Z 2020-10-23 01:28:50 UTC | PROCESS | INFO | (collector.go:159 in runCheck) | Finished process check #5 in 3.081572ms. First 5 check runs finished, next runs will be logged every 20 runs.
2020-10-23T01:28:54.039344258Z 2020-10-23 01:28:54 UTC | CORE | INFO | (pkg/collector/runner/runner.go:261 in work) | check:kubelet | Running check
2020-10-23T01:28:54.040321386Z 2020-10-23 01:28:54 UTC | CORE | ERROR | (pkg/collector/runner/runner.go:292 in work) | Error running check kubelet: [{"message": "Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"", "traceback": "Traceback (most recent call last):\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py\", line 828, in run\n    self.check(instance)\n  File \"/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/kubelet/kubelet.py\", line 295, in check\n    raise CheckException(\"Unable to detect the kubelet URL automatically: \" + kubelet_conn_info.get('err', ''))\ndatadog_checks.base.errors.CheckException: Unable to detect the kubelet URL automatically: cannot connect: https: \"Get \\\"https://:10250/pods\\\": dial tcp :10250: connect: connection refused\", http: \"Get \\\"http://:10255/pods\\\": dial tcp :10255: connect: connection refused\"\n"}]
2020-10-23T01:28:54.040404723Z 2020-10-23 01:28:54 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:kubelet | Done running check
2020-10-23T01:28:55.526610792Z 2020-10-23 01:28:55 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/kubelet.go:714 in setKubeletHost) | EKS on Fargate mode detected, will proxy calls to the Kubelet through the APIServer at https://172.20.0.1:443/api/v1/nodes/fargate-ip-10-9-22-178.ec2.internal/proxy/
2020-10-23T01:28:55.526759965Z 2020-10-23 01:28:55 UTC | CORE | INFO | (pkg/util/kubernetes/kubelet/init.go:30 in buildTLSConfig) | Skipping TLS verification

The important part, and the one that keeps repeating, is Get "http://:10255/pods": dial tcp :10255: connect: connection refused.
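The host part of both URLs is empty (`https://:10250/pods`), so the check appears to be dialing a blank address rather than a real kubelet. Below is a minimal sketch for spotting this in saved agent logs. It assumes the log has been written to a file. The function name `check_kubelet_urls` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read agent log lines on stdin, extract every kubelet
# /pods URL the check tried, and flag the ones whose host part is empty.
check_kubelet_urls() {
  grep -oE 'https?://[^/"\\]*:[0-9]+/pods' |
    sort -u |
    while IFS= read -r url; do
      host=${url#*://}   # drop the scheme, e.g. "https://"
      host=${host%%:*}   # drop ":port/pods", leaving only the host
      if [ -z "$host" ]; then
        echo "EMPTY_HOST $url"
      else
        echo "OK $url"
      fi
    done
}

# Usage (the file name is an assumption):
#   check_kubelet_urls < agent.log
```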

I followed this tutorial and the documentation to set this up, but there is very little documentation on EKS + Fargate.

This is similar to datadog/integrations#2582 and datadog/datadog-agent#2582 (and a bunch of others).

It is worth noting that I do have the Datadog agent running successfully on my normal EKS worker nodes, but I have yet to have any success with Fargate. I'd appreciate a pointer in the right direction, or suggestions on how to debug this further. For example, I believe I have RBAC set up correctly (YAML below), but how can I test that? Thanks!
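One way to test RBAC without involving the agent is `kubectl auth can-i`, impersonating the service account the pod runs as. The sketch below assumes that account is `datadog-agent` in `default`. Both are placeholders that you can override with `SA_NAME` and `SA_NAMESPACE`, and the function name is hypothetical. Your own kubectl user needs impersonation rights for `--as` to work. On Fargate the agent proxies kubelet calls through the API server (`/api/v1/nodes/<node>/proxy/`, per the log above), so `nodes/proxy` is the subresource that matters most.

```shell
# Placeholders: set these to the service account your app pod runs as.
SA_NAMESPACE=${SA_NAMESPACE:-default}
SA_NAME=${SA_NAME:-datadog-agent}

# Ask the API server whether that service account may GET each node
# subresource listed in the ClusterRole. Prints "<subresource> yes|no".
check_node_subresources() {
  local sub
  for sub in proxy metrics spec stats pods healthz; do
    printf '%s ' "$sub"
    kubectl auth can-i get nodes --subresource="$sub" \
      --as="system:serviceaccount:${SA_NAMESPACE}:${SA_NAME}"
  done
}
```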

Describe what you expected:

I expected the pod to run without errors and be able to reach the kubelet.

Steps to reproduce the issue:

Here is my datadog agent sidecar helm template:

- image: datadog/agent:7
  name: datadog-agent

  ## Enabling port 8125 for DogStatsD metric collection
  ports:
  - containerPort: 8125
    name: dogstatsdport
    protocol: UDP

  - containerPort: 8126
    name: traceport
    protocol: TCP

  env:

  - name: DD_API_KEY
    valueFrom:
      secretKeyRef:
        name: datadog-secrets
        key: api-key

  - name: DD_APP_KEY
    valueFrom:
      secretKeyRef:
        name: datadog-secrets
        key: app-key

  - name: DD_ENV
    value: {{ include "REDACTED" . }}

  - name: DD_TAGS
    value: 'cluster_name:{{ include "REDACTED" . }}'

  - name: DD_KUBERNETES_POD_LABELS_AS_TAGS
    value: '{"app.kubernetes.io/name": "kube_app_name","app.kubernetes.io/version": "kube_app_version"}'

  - name: DD_COLLECT_KUBERNETES_EVENTS
    value: "true"

  - name: DD_LEADER_ELECTION
    value: "true"

  - name: DD_PROCESS_AGENT_ENABLED
    value: "true"

  - name: DD_LOG_LEVEL
    value: "INFO"

  - name: DD_LOGS_ENABLED
    value: "true"

  - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
    value: "true"

  - name: DD_CONTAINER_EXCLUDE
    value: "name:datadog-agent"

  - name: DD_APM_ENABLED
    value: "true"

  - name: DD_EKS_FARGATE
    value: "true"

  - name: DD_KUBELET_TLS_VERIFY
    value: "false"

  - name: DD_KUBERNETES_KUBELET_NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

  resources:
    requests:
      memory: "256Mi"
      cpu: "200m"
    limits:
      memory: "256Mi"
      cpu: "200m"

  volumeMounts:
    - name: app-logs
      mountPath: /var/log/containers/

    - name: {{ include "oc-lib.ddConfigMapName" . }}
      mountPath: /etc/datadog-agent/conf.d/custom_log_collection.d/
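With `DD_EKS_FARGATE` set, the `setKubeletHost` log line above shows that the agent does not dial the kubelet directly. It goes through the API server at `/api/v1/nodes/<node>/proxy/`, using the node name from `DD_KUBERNETES_KUBELET_NODENAME`. The sketch below reproduces that request by hand from inside the pod. It assumes the standard in-cluster `KUBERNETES_SERVICE_HOST`/`KUBERNETES_SERVICE_PORT` variables and the default service-account mount. The function names are hypothetical, and the code mirrors the logged URL rather than the agent's actual implementation.

```shell
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount

# Build the API-server proxy URL for a kubelet path, in the same shape as the
# URL printed by the agent's "EKS on Fargate mode detected" log line.
kubelet_proxy_url() {
  local node=$1 path=$2
  echo "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${node}/proxy/${path}"
}

# Call that URL with the pod's service-account token and print the HTTP status.
# A 403 here would match the agent's "received a status 403" warning.
fetch_via_proxy() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    --cacert "$SA_DIR/ca.crt" \
    -H "Authorization: Bearer $(cat "$SA_DIR/token")" \
    "$(kubelet_proxy_url "$DD_KUBERNETES_KUBELET_NODENAME" pods)"
}
```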

The underlying app's service account has the following RBAC permissions bound to it, and the service account directory is mounted:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
      - nodes/spec
      - nodes/stats
      - nodes/proxy
      - nodes/pods
      - nodes/healthz
    verbs:
      - get
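A ClusterRole grants nothing by itself. It only takes effect once a ClusterRoleBinding ties it to the exact service account (name and namespace) that the pod runs as. If that binding is in doubt, the sketch below generates one with `kubectl create clusterrolebinding`. `APP_NAMESPACE`, `APP_SA`, and the binding name are placeholders. `--dry-run=client -o yaml` prints the manifest for review instead of creating it. kubectl versions older than 1.18 spell this as a plain `--dry-run`.

```shell
# Placeholders: the ClusterRole above plus your app's namespace and service account.
CLUSTER_ROLE=${CLUSTER_ROLE:-datadog-agent}
APP_NAMESPACE=${APP_NAMESPACE:-default}
APP_SA=${APP_SA:-my-app}

# Print a ClusterRoleBinding manifest granting CLUSTER_ROLE to the app's
# service account. Pipe it to "kubectl apply -f -" once it looks right.
render_binding() {
  kubectl create clusterrolebinding "${CLUSTER_ROLE}-${APP_SA}" \
    --clusterrole="$CLUSTER_ROLE" \
    --serviceaccount="${APP_NAMESPACE}:${APP_SA}" \
    --dry-run=client -o yaml
}
```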

Additional environment details (Operating System, Cloud provider, etc):

Kubernetes Version: 1.17
EKS Platform: eks.3

@dogewithit

I have the same issue

@nmadmon

nmadmon commented Oct 28, 2020

I have the same issue
root@datadog-rx29v:/# env | grep DD_KUBERNETES_KUBELET_HOST
DD_KUBERNETES_KUBELET_HOST=172.50.0.90
root@datadog-rx29v:/# curl $DD_KUBERNETES_KUBELET_HOST:10255/healthz
curl: (7) Failed to connect to 172.50.0.90 port 10255: Connection refused
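curl exit code 7 ("Failed to connect") means nothing accepted the TCP connection at all, which is a different failure from an authentication or authorization error. Port 10255 is the kubelet's unauthenticated read-only port, which is often disabled. Port 10250 is the authenticated HTTPS port. The sketch below probes an endpoint and translates curl's exit status into a hint. It assumes `DD_KUBERNETES_KUBELET_HOST` is set as in the output above, and the function name is hypothetical.

```shell
# Probe one kubelet endpoint and translate curl's exit status into a hint.
# Only connectivity is tested: -k skips certificate checks, and without -f
# curl exits 0 even on an HTTP 401/403, so "reachable" says nothing about auth.
probe_kubelet() {
  local scheme=$1 host=$2 port=$3 rc
  curl -sk --max-time 5 -o /dev/null "${scheme}://${host}:${port}/healthz"
  rc=$?
  case $rc in
    0)  echo "reachable" ;;
    6)  echo "could not resolve host" ;;
    7)  echo "connection refused: nothing is listening on that port" ;;
    28) echo "timed out" ;;
    *)  echo "curl exit $rc" ;;
  esac
}

# Usage:
#   probe_kubelet https "$DD_KUBERNETES_KUBELET_HOST" 10250
#   probe_kubelet http  "$DD_KUBERNETES_KUBELET_HOST" 10255
```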

@Gowiem
Author

Gowiem commented Oct 28, 2020

@assinnata @nmadmon I'm being told by DD support that this is likely related to permissions, which I suspected but didn't have a good way to test. I'll be testing that out today, and if that ends up being the case I'll let you folks know.

@nmadmon

nmadmon commented Nov 2, 2020

@Gowiem, did you manage to find the root cause?

@Gowiem
Author

Gowiem commented Nov 4, 2020

@nmadmon @assinnata I did. It did end up being an RBAC permissions issue. Here are the notes from DD support that helped me figure that out:

Could you please run the following command inside the Datadog pod?
TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v -k -H "Authorization: Bearer $TOKEN"

If this works, try again with the cluster CA certificate instead of skipping verification with -k:
TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token) && curl https://$DD_KUBERNETES_KUBELET_HOST:10250/pods -v --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt -H "Authorization: Bearer $TOKEN"

If this doesn't work, this issue might be an authorization issue.
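The HTTP status code usually tells the two failure modes apart. A 401 typically means the kubelet did not accept the token as an identity. A 403 (as in the agent's own "received a status 403" warning) typically means the caller was authenticated but RBAC denied the request. The sketch below runs the same request as the second curl command and labels the result. The function names are hypothetical.

```shell
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount

# Map a kubelet HTTP status to a likely cause.
explain_kubelet_status() {
  case $1 in
    200) echo "ok: token authenticated and authorized" ;;
    401) echo "unauthenticated: kubelet rejected the token" ;;
    403) echo "forbidden: check RBAC for the service account" ;;
    000) echo "no HTTP response: connection or TLS failure" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Same request as the second curl command, but printing only the status code
# (curl's %{http_code} is 000 when no HTTP response was received).
kubelet_pods_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    --cacert "$SA_DIR/ca.crt" \
    -H "Authorization: Bearer $(cat "$SA_DIR/token")" \
    "https://${DD_KUBERNETES_KUBELET_HOST}:10250/pods"
}

# Usage, inside the agent container:
#   explain_kubelet_status "$(kubelet_pods_status)"
```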

Use the following Agent RBAC when deploying the Agent as a sidecar in AWS EKS Fargate:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: datadog-agent
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
      - nodes/spec
      - nodes/stats
      - nodes/proxy
      - nodes/pods
      - nodes/healthz
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: datadog-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: datadog-agent
subjects:
  - kind: ServiceAccount
    name: datadog-agent
    namespace: default
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: datadog-agent
  namespace: default
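With a sidecar setup, the binding's subject in this manifest is easy to get wrong. The subject is `datadog-agent` in `default`, so the app pod itself must run with `serviceAccountName: datadog-agent` in the `default` namespace. Otherwise, change the subject to match the pod. The sketch below compares a pod's service account against an expected subject. The function names and the pod name are placeholders.

```shell
# Print "<namespace>:<serviceAccountName>" for a pod; this should match
# one of the ClusterRoleBinding's subjects.
pod_identity() {
  local pod=$1 ns=${2:-default}
  local sa
  sa=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.serviceAccountName}')
  echo "${ns}:${sa:-default}"
}

# Succeed if the pod runs as the given "namespace:serviceaccount".
pod_matches_subject() {
  [ "$(pod_identity "$1" "$2")" = "$3" ]
}

# Usage (pod name is a placeholder):
#   pod_matches_subject my-app-pod default default:datadog-agent \
#     && echo "matches binding subject" || echo "pod runs as a different service account"
```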

Could you connect directly to the host and run the following?
ps aux | grep kubelet | grep -v grep

Is --authentication-token-webhook set?
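Without token webhook authentication, the kubelet does not validate bearer tokens against the API server, so the service-account token in the curl test above cannot authenticate. The sketch below checks a kubelet command line for the flag. It can only run where you have a shell on the node (EC2 worker nodes, not Fargate). It also ignores any kubelet config file, where the equivalent setting is `authentication.webhook.enabled`. The function name is hypothetical.

```shell
# Read a kubelet command line on stdin; exit 0 if token webhook auth is on.
# Matches "--authentication-token-webhook" and "...=true", but not "=false"
# or longer flags such as --authentication-token-webhook-cache-ttl.
has_token_webhook() {
  grep -Eq -- '--authentication-token-webhook(=true)?( |$)'
}

# Usage on a node:
#   ps -eo args | grep '[k]ubelet' | has_token_webhook && echo enabled || echo "not set on the command line"
```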

Good luck with it!

@bhanu8824

bhanu8824 commented Oct 12, 2022

"Unable to detect the kubelet URL automatically: impossible to reach Kubelet with host: 172.31.33.128. Please check if your setup requires kubelet_tls_verify = false. Activate debug logs to see all attempts made"

I am getting this error.
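The error message suggests two things to try. The first is `kubelet_tls_verify = false`, which is set through `DD_KUBELET_TLS_VERIFY=false` as in the template earlier in this issue. The second is debug logging, to see every connection attempt. The sketch below sets both on the agent container with `kubectl set env`. The deployment and container names are placeholders.

```shell
# Placeholders: the workload running the agent and the agent's container name.
DEPLOYMENT=${DEPLOYMENT:-my-app}
AGENT_CONTAINER=${AGENT_CONTAINER:-datadog-agent}

# Update the agent container's env; this triggers a new rollout of the pods.
enable_kubelet_debug() {
  kubectl set env "deployment/${DEPLOYMENT}" -c "$AGENT_CONTAINER" \
    DD_LOG_LEVEL=debug DD_KUBELET_TLS_VERIFY=false
}
```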
