[BUG] Unable to read authentication token file #16559

Closed
ljsinclair opened this issue Apr 13, 2023 · 8 comments

@ljsinclair

Agent Environment

  • Agent v7
  • Ubuntu (running in WSL2 on Windows 10)
  • Logged in as an administrator user (non-root)

Describe what happened:

  1. Datadog not successfully connecting to datadoghq
  2. Datadog reports error "Error: unable to read authentication token file: open /etc/datadog-agent/auth_token: no such file or directory"

Describe what you expected:

  • Datadog connects successfully to datadoghq
  • Datadog does not report errors

Steps to reproduce the issue:

  • Installed Datadog-Agent via the New account UI

Result: Agent installed and started up correctly:

* Starting the Datadog Agent...

  Your Datadog Agent is running and functioning properly.
  It will continue to run in the background and submit metrics to Datadog.
  If you ever want to stop the Datadog Agent, run:

      sudo systemctl stop datadog-agent

  And to run it again run:

      sudo systemctl start datadog-agent
  • Run sudo datadog-agent status

Result:

Error: unable to read authentication token file: open /etc/datadog-agent/auth_token: no such file or directory

Additional environment details (Operating System, Cloud provider, etc):

  • Host system = windows 10
  • WSL2 running Ubuntu
  • Datadog installed into WSL2 Ubuntu
  • Docker installed on WSL2 but not running any containers
@ljsinclair changed the title from "[BUG]" to "[BUG] Unable to read authentication token file" on Apr 13, 2023
@azaky

azaky commented Aug 1, 2023

I have the same exact issue. I was able to fix this by manually setting hostname in /etc/datadog-agent/datadog.yaml.

Running from Windows 11, WSL2 Ubuntu 22.04, Agent version 7.46.0.
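
For readers trying the workaround above, a minimal sketch of the config change might look like the fragment below. `hostname` is a top-level key in `datadog.yaml`; the value `my-wsl-host` is only a placeholder and not something from this thread. The Agent needs a restart to pick up the change, and later comments report that this did not help in every environment.

```yaml
# /etc/datadog-agent/datadog.yaml (fragment)
# "my-wsl-host" is a placeholder; use the name this host should report as.
hostname: my-wsl-host
```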

@fagianijunior

I have the same issue running as a sidecar on ECS Fargate.

@wemersonferreira

Hi everyone, I'm having the same issue. This problem started when I updated the Datadog agent from 7.38.2 to 7.49.1. I can't use the same solution as @azaky because we have a fleet of hosts and we don't have the exact hostname for all of them.
My orchestration is EKS with EC2 linux2.

@tofi86

tofi86 commented Dec 12, 2023

Happens for me as well when upgrading from 7.45.1 to 7.49 or even 7.49.1.

hostname has already been declared manually in /etc/datadog-agent/datadog.yaml

Downgrading to 7.45.1 worked for me.

@2rs2ts

2rs2ts commented Dec 19, 2023

Happening for me too. (Agent is running on Kubernetes, on nodes using Debian Bullseye.) Why is this marked as completed when there is no resolution? Manually declaring the hostname doesn't even make sense as a solution here; it has nothing to do with "no such file or directory" errors. Our agents are not having problems finding their hostnames anyway.

Some sample logs when it goes to write the file:

2023-12-19 20:14:48 UTC | CORE | INFO | (pkg/api/security/security.go:250 in saveAuthToken) | Wrote auth token
2023-12-19 20:14:48 UTC | CORE | INFO | (pkg/api/security/security.go:144 in fetchAuthToken) | Saved a new authentication token to /etc/datadog-agent/auth_token
2023-12-19 20:14:48 UTC | CORE | ERROR | (cmd/agent/subcommands/run/command.go:472 in startAgent) | Error while starting api server, exiting: unable to read authentication token file: open /etc/datadog-agent/auth_token: permission denied

Of course we have tried changing DD_AUTH_TOKEN_FILE_PATH to point to a mounted emptyDir, but no dice. We're wondering if the issue is with something that happened to the docker images themselves, because the docs site has example templates that run the agent as root instead of as dd-agent, which is a departure from the norm.

By the way, I tested 7.45.1 per the last user's comment, but it still has this issue. I even tried images as old as 7.38.2, and they still have the issue. I know 7.32.4 works, because that's what we're running now.

@2rs2ts

2rs2ts commented Jan 10, 2024

We figured the problem out. At some point, the docker image was changed so that it no longer declares these directories as VOLUMEs in the Dockerfile. Since the docker image runs as non-root by default, you run into a funny case in Kubernetes: the uid in the container doesn't have permission to write to kubelet's dir where it creates emptyDirs, so you can't write to any emptyDirs unless you give your container the DAC_OVERRIDE capability or run it as root. When you download manifests from Datadog's docs site, you will see they configure it to run as root, but that seemed like using a sledgehammer to drill a hole in a wall, so we just gave our agent the capability necessary.

Apparently, other features of the agent require even more capabilities, so running as root removes the most roadblocks. Pick your poison, I guess.

@baltox61

baltox61 commented May 9, 2024

@2rs2ts so what did you do to give it the necessary capabilities?

@2rs2ts

2rs2ts commented May 13, 2024

@baltox61 it seems to depend on exactly what your container is doing, but you at least have to give the container DAC_OVERRIDE. (Forgive me for being inexact. We've had to deal with other issues, so our capabilities list isn't just this one capability for this one issue; I just know for sure that DAC_OVERRIDE is needed.)

Alternatively you can just run the container as root, which is the default configuration when you pull a manifest from DataDog's docs site. The image itself says not to run as root, but the manifest they provide on their docs site overrides the securityContext to make the container run as root. (I have, at times, wondered if we should just run as root... even though that's not considered a best practice, it also isn't good to give containers DAC_OVERRIDE. It's kind of nuts that they introduced such a major security regression.)

If you are asking "how does one give a container extra capabilities in Kubernetes," then it's all in the securityContext: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
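
As a rough illustration of the two options discussed above, a container-level securityContext could look something like the fragment below. This is a sketch, not the manifest from this thread or from Datadog's docs: the container name `agent` is hypothetical, and as noted, other agent features may need additional capabilities beyond DAC_OVERRIDE.

```yaml
# Fragment of a pod spec; only the securityContext matters here.
containers:
  - name: agent              # hypothetical container name
    securityContext:
      capabilities:
        add:
          - DAC_OVERRIDE     # Kubernetes capability names omit the CAP_ prefix
      # Alternative mentioned in the thread: run the container as root
      # instead of adding the capability.
      # runAsUser: 0
```

The fragment belongs under `spec.containers` of a Pod, or under `spec.template.spec.containers` of a DaemonSet or Deployment.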
