Process crashes and hangs if invalid entrypoint provided #326

Closed
jsturtevant opened this issue Sep 20, 2023 · 4 comments · Fixed by #342
Comments

@jsturtevant
Contributor

Using the latest commit, df35a387799248b3e82b52609890d02f58a411e5.

Pass an invalid entrypoint and the process hangs for ~30 seconds before eventually emitting the error Others("failed to receive. \"waiting for init ready\". BrokenChannel"). During that time, no logs are written to containerd's log stream.

❯ sudo ./ctr run --rm --runtime=io.containerd.wasmtime.v1 ghcr.io/containerd/runwasi/wasi-demo-app:latest testwasm /something.exe echo

~ 30sec

ctr: Others("failed to receive. \"waiting for init ready\". BrokenChannel"): unknown

Running it a second time, the process hangs indefinitely.

❯ sudo ./ctr run --rm --runtime=io.containerd.wasmtime.v1 ghcr.io/containerd/runwasi/wasi-demo-app:latest testwasm /something.exe echo
hangs....

analysis

I was able to trace that hang to the earlier failed process, which didn't clean up properly, so the second run ends up with:

bind(5, {sa_family=AF_UNIX, sun_path="/run/containerd/dfe5ace9c8dbc561.sock"}, 40) = -1 EADDRINUSE (Address already in use)
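To illustrate the EADDRINUSE failure in isolation, here is a minimal sketch using only the Rust standard library. It is not the shim's code: the helper names and the temporary socket path are hypothetical, and it only shows that a second bind to a Unix socket path that is still held (or whose socket file was never removed) fails with "Address already in use".

```rust
use std::io;
use std::os::unix::net::UnixListener;
use std::path::{Path, PathBuf};

/// Hypothetical helper: a per-process socket path in the temp directory.
fn temp_socket_path(name: &str) -> PathBuf {
    std::env::temp_dir().join(format!("{}-{}.sock", name, std::process::id()))
}

/// Binds the same path twice and returns the error kind of the second bind.
/// The first listener stands in for the dangling process that still owns
/// the socket from the earlier, failed run.
fn second_bind_error(path: &Path) -> Option<io::ErrorKind> {
    // Start from a clean slate in case a previous run left the file behind.
    let _ = std::fs::remove_file(path);

    let first = UnixListener::bind(path).expect("first bind should succeed");

    // The socket file already exists, so this bind fails with EADDRINUSE.
    let second = UnixListener::bind(path).err().map(|e| e.kind());

    // Clean up: dropping the listener does not remove the file by itself.
    drop(first);
    let _ = std::fs::remove_file(path);

    second
}
```

Calling `second_bind_error(&temp_socket_path("demo"))` should yield `Some(io::ErrorKind::AddrInUse)`. Note that the socket file outlives the listener unless it is explicitly removed, which is why cleanup after a failed start matters.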

I've tracked the error "failed to receive. \"waiting for init ready\". BrokenChannel" down to https://github.com/containers/youki/blob/57ffefe89318df1cd1d1487c3cbcd4576a2d4dea/crates/libcontainer/src/process/channel.rs#L166-L173

I believe the reason for the first "init ready" failure is somewhere in

https://github.com/containerd/runwasi/blob/main/crates/containerd-shim-wasm/src/sys/unix/container/executor.rs#L87C1-L106

and it is not handling the errors properly. I'm not sure whether libcontainer or runwasi is handling it wrong.
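As a rough analogy for the "waiting for init ready" error, the sketch below uses `std::sync::mpsc` rather than libcontainer's actual channel implementation, so every name in it is hypothetical. It shows the general pattern: if the side that is supposed to report "ready" exits without sending anything, the waiting side sees a closed channel instead of a message.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for waiting on the init process. The spawned thread plays the
/// role of init: it either reports readiness or exits early without sending.
fn wait_for_init_ready(init_fails_early: bool) -> Result<(), String> {
    let (tx, rx) = mpsc::channel::<()>();

    let init = thread::spawn(move || {
        if init_fails_early {
            // Returning here drops `tx` without ever sending "ready".
            return;
        }
        tx.send(()).expect("receiver should still be alive");
    });

    // With the only sender dropped and no message queued, `recv` returns
    // an error instead of blocking forever.
    let result = rx
        .recv()
        .map_err(|e| format!("failed to receive \"init ready\": {e}"));

    init.join().expect("init thread panicked");
    result
}
```

`wait_for_init_ready(false)` returns `Ok(())`, while `wait_for_init_ready(true)` returns an `Err` describing the closed channel. That is the same shape as the error reported above, assuming the init side died before signalling.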

@jprendes
Collaborator

jprendes commented Sep 20, 2023

Interesting. We probe for Linux entrypoint in https://github.com/containerd/runwasi/blob/main/crates/containerd-shim-wasm/src/sys/unix/container/executor.rs#L87C1-L106 as you pointed out, and if that fails, we probe for a wasm entrypoint here https://github.com/containerd/runwasi/blob/main/crates/containerd-shim-wasm/src/container/engine.rs#L22-L37. If that also fails, we tell libcontainer that we can't handle that container.
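The probing order described above can be sketched roughly as follows. This is not the runwasi implementation: the type and function names are hypothetical, and the magic-byte checks are an assumption used only to make the fallback chain concrete.

```rust
/// Hypothetical outcome of probing a container entrypoint.
#[derive(Debug, PartialEq)]
enum Probe {
    Linux,
    Wasm,
    CantHandle,
}

/// Assumed check: ELF binaries start with 0x7f 'E' 'L' 'F'; scripts with "#!".
fn looks_like_linux_executable(bytes: &[u8]) -> bool {
    bytes.starts_with(b"\x7fELF") || bytes.starts_with(b"#!")
}

/// Assumed check: WebAssembly binary modules start with "\0asm".
fn looks_like_wasm_module(bytes: &[u8]) -> bool {
    bytes.starts_with(b"\0asm")
}

/// `contents` is `None` when the entrypoint cannot be read at all,
/// for example a path that does not exist in the image.
fn probe_entrypoint(contents: Option<&[u8]>) -> Probe {
    match contents {
        Some(bytes) if looks_like_linux_executable(bytes) => Probe::Linux,
        Some(bytes) if looks_like_wasm_module(bytes) => Probe::Wasm,
        // Both probes failed: report that this container can't be handled.
        _ => Probe::CantHandle,
    }
}
```

In this sketch, a missing entrypoint such as `/something.exe` falls through both probes and ends up as `Probe::CantHandle`, which corresponds to the "we can't handle that container" path.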

I don't see where it might be getting stuck.

@jprendes
Collaborator

Ok, I see a slightly different behaviour.
When I run it the first time, it errors immediately with ctr: Others("failed to receive. \"waiting for init ready\". BrokenChannel"): unknown
When I run it a second time, it hangs forever.

I also noticed that after the first run, there's a dangling containerd-shim-wasmtime-v1 process.
If I kill that process and run it a second time, then the second time it also errors immediately.

I also tested with plain youki, and it always errors immediately, so the dangling process is due to something we are (not?) doing.

@jprendes
Collaborator

Ok, I did some more debugging.
The dangling process is waiting here: https://github.com/containerd/runwasi/blob/main/crates/containerd-shim-wasm/src/sandbox/shim.rs#L1132
Will debug further tomorrow.

@jsturtevant
Contributor Author

This should be resolved in #340. Some future improvements could be #342 and containers/youki#2389, which are being tracked independently.
