lxc exec inconsistently fails to return when detached #13425

Closed
masnax opened this issue May 2, 2024 · 5 comments

@masnax
Contributor

masnax commented May 2, 2024

If lxc exec is run in the background many times concurrently and the command returns quickly enough (for example, because the command doesn't exist), there is a chance that lxc exec will fail to return:

#!/bin/bash

for i in $(seq 0 50) ; do
  # This will return `command not found` in most cases, but sometimes get stuck.
  lxc exec c1 -- a-fake-command &
done

wait
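
A variant of the script above can make a hang easier to spot by putting each invocation under a time limit. This is only a sketch: the count_hangs helper and the limit you pass to it are illustrative assumptions, not part of the original report. It relies on timeout from GNU coreutils, which exits with status 124 when it has to kill the command, and it redirects stdin from /dev/null, which differs from the original loop.

```shell
#!/bin/bash

# count_hangs LIMIT RUNS COMMAND [ARGS...]
# Hypothetical helper: starts COMMAND in the background RUNS times, each under
# `timeout LIMIT`, and prints how many invocations had to be killed.
count_hangs() {
  local limit="$1" runs="$2"
  shift 2

  local tmp i
  tmp="$(mktemp -d)"

  for i in $(seq 1 "$runs"); do
    (
      # stdin comes from /dev/null so no invocation waits on the terminal
      # (a deviation from the original script).
      timeout "$limit" "$@" </dev/null >/dev/null 2>&1
      # timeout(1) exits with 124 when the time limit was reached.
      if [ "$?" -eq 124 ]; then
        touch "$tmp/hung.$i"
      fi
    ) &
  done

  # Wait for every background invocation before counting the markers.
  wait

  find "$tmp" -name 'hung.*' | wc -l | tr -d ' '
  rm -rf "$tmp"
}
```

For the case in this issue it could be called as count_hangs 30 51 lxc exec c1 -- a-fake-command; a non-zero result would mean at least one invocation did not return within 30 seconds.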
@simondeziel
Member

simondeziel commented May 2, 2024

Unfortunately, I haven't been able to reproduce this, even after bumping the loop count from 50 to 500. I've run close to 3k iterations at this point and it hasn't hung :/

I'm running this on my laptop (not a GH runner) with latest/edge and c1 is a container, if that matters.

@masnax
Contributor Author

masnax commented May 2, 2024

Yes, I'm getting quite perplexed about it myself. I've seen it happen on the GitHub runners, and in both a container and a VM in LXD running the MicroCloud test suite.

However, if I spin up a pristine VM and try it, it always passes, so it might be something to do with the configuration of the VM?

@simondeziel
Member

I've had a bit more luck in a single-CPU VM running LXD 5.0.3:

Error: Command not found
...
Error: Command not found
Error: Command not found
Error: Command not found
Error: Command not found
Error: Command not found
Error: Command not found
Error: Command not found
Error: Command not found
Error: unexpected EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: Get "http://unix.socket/1.0": EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: unexpected EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: unexpected EOF
Error: read unix @->/var/snap/lxd/common/lxd/unix.socket: read: connection reset by peer
Error: unexpected EOF
Error: Get "http://unix.socket/1.0/operations/ef0eaa43-8ef8-4f22-8ad4-6c4013e9ed2d": EOF
Error: Post "http://unix.socket/1.0/instances/c1/exec": EOF
Error: websocket: close 1006 (abnormal closure): unexpected EOF
Error: Operation not found
Error: Operation not found
Error: Operation not found
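
Output like the above is easier to compare between runs as a tally per distinct message. A minimal sketch, assuming the output was captured to a file; the tally_errors helper is a made-up name for illustration, and it masks operation UUIDs so those lines group together:

```shell
#!/bin/bash

# tally_errors: reads error lines on stdin and prints each distinct message
# with its count, most frequent first. Hypothetical helper for illustration.
tally_errors() {
  # Replace the per-operation UUID with a placeholder so that otherwise
  # identical "Get .../operations/<uuid>" lines are counted together.
  sed -E 's#/operations/[0-9a-f-]{36}#/operations/<uuid>#' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

It reads from stdin, so with the output saved in a file (exec-errors.log is a hypothetical name) it would be run as tally_errors < exec-errors.log.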

@simondeziel
Member

With latest/edge in that same VM, it seems to "work" with the caveat that the automatic scope allocation by systemd is having trouble coping with the "DoS":

Error: Command not found
....
Error: Command not found
internal error, please report: running "lxd.lxc" failed: transient scope not created in 10s
internal error, please report: running "lxd.lxc" failed: transient scope not created in 10s
...
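
If the burst of simultaneous launches is what systemd struggles with here (an assumption, not something confirmed in this thread), capping the number of concurrent invocations is one way to test that. A rough sketch using xargs -P, which is available in GNU and BSD xargs but is not part of POSIX; run_bounded is a hypothetical helper name:

```shell
#!/bin/bash

# run_bounded JOBS RUNS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND a total of RUNS times, with at most JOBS
# invocations in flight at once.
run_bounded() {
  local jobs="$1" runs="$2"
  shift 2

  # -I{} makes xargs start one invocation per input line; the line itself is
  # unused because COMMAND contains no {} placeholder.
  seq 1 "$runs" | xargs -P "$jobs" -I{} "$@"
}
```

For example, run_bounded 4 51 lxc exec c1 -- a-fake-command would issue the same 51 calls as the original loop, four at a time. Note that this changes the load pattern, so it may also hide the original hang.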

@masnax
Contributor Author

masnax commented May 2, 2024

Yeah, I've narrowed this down to something being off with my old containers.

If I spin up a fresh container with the same config as the old one, I don't run into any issues. But if I copy the original container, the new container still has this issue.

So it looks like whatever is causing this is on my end. I'm still not sure what it could be, but we can at least close the issue.

@masnax masnax closed this as completed May 2, 2024