Root Daemon: Not running #3551

Closed
phooijenga opened this issue Mar 26, 2024 · 9 comments · Fixed by #3669
Labels
feature: New feature or enhancement request
stale: Issue is stale and will be closed

Comments

@phooijenga
Contributor

Describe the bug

The telepresence root daemon does not start. The daemon.log file is empty.

To Reproduce

  1. Run telepresence connect. It tells me it needs root privileges, which I provide:
    $ telepresence connect
    Launching Telepresence Root Daemon
    Need root privileges to run: /Users/paul/bin/telepresence daemon-foreground /Users/paul/Library/Logs/telepresence '/Users/paul/Library/Application Support/telepresence'
  2. Run telepresence status, observe that root daemon is not running
    $ telepresence status
    OSS User Daemon: Running
      Version           : v2.18.0
      Executable        : /Users/paul/bin/telepresence
      Install ID        : b4931622-dabf-46bc-8218-266d4b782476
      Status            : Connected
      Kubernetes server : https://198.19.249.184:6443
      Kubernetes context: founda-k3s-1
      Namespace         : apps
      Manager namespace : ambassador
      Intercepts        : 0 total
    Root Daemon: Not running
    OSS Traffic Manager: Connected
      Version      : v2.18.0
      Traffic Agent: docker.io/datawire/tel2:2.18.0

I don't see a daemon-foreground process in the ps output. When I run it manually, it doesn't seem to crash (and it writes a startup message to daemon.log), but telepresence status still reports "Not running". It does create /var/run/telepresence-daemon.socket.

Expected behavior

A clear and concise description of what you expected to happen.

Versions (please complete the following information):

OSS Client         : v2.18.0
OSS Root Daemon    : v2.18.0
OSS User Daemon    : v2.18.0
OSS Traffic Manager: v2.18.0
Traffic Agent      : docker.io/datawire/tel2:2.18.0

macOS Sonoma 14.4.1 (23E224)

Additional context

It appears as if this issue started happening after upgrading to macOS Sonoma 14.4.1.

daemon.log

@thallgren
Member

Is this amd64 or arm64 (M1)?

@phooijenga
Contributor Author

M1.

$ arch
arm64
$ file `which telepresence`
/Users/paul/bin/telepresence: Mach-O 64-bit executable arm64

@phooijenga
Contributor Author

I did some debugging, and it turns out that EnsureUserDaemon swallows the error returned by ensureRootDaemonRunning here.

In my case, the error is "daemon service did not start: timeout while waiting for daemon to start", which unfortunately does not tell us anything new.

@phooijenga
Contributor Author

phooijenga commented Mar 28, 2024

So, it turns out that this system has timestamp_timeout=0 configured in sudoers, which means sudo never caches credentials, so running sudo true beforehand doesn't actually do anything.

Apparently timestamp_timeout=0 is now company policy, so I can't simply change it.

@phooijenga
Contributor Author

Alright, to wrap this all up: if I manually start the root daemon with sudo before running telepresence connect, it works.

@thallgren
Member

Thanks for the info. Any ideas on how we can improve how this is handled in Telepresence?

@phooijenga
Contributor Author

I think not hiding the error is a good start (#3559), but I'm not sure if the underlying problem can be solved completely. Maybe ensureRootDaemonRunning could check if the process is still alive as well as trying to connect to the socket. That way the user wouldn't have to wait the full 10 seconds to be told the daemon failed to start.

Another possibility (which I've not extensively tested) might be to run sudo --list (instead of sudo true) and check for timestamp_timeout=0 in the output. If it's there, telepresence can instruct the user how to run the daemon themselves.

A third possibility would be to run sudo --non-interactive --no-update --validate twice to check whether the user's cached credentials are valid (or no authentication is required): once before prompting (instead of the current sudo --non-interactive true), and once again afterwards to make sure the credentials are indeed cached.

@cindymullins-dw
Collaborator

It looks like the error display has been addressed. I'll leave this open as a feature request for the process check suggestions.

@cindymullins-dw cindymullins-dw added the feature New feature or enhancement request label Apr 8, 2024

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment, or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and will be closed label Aug 16, 2024