[intermittent]: podman run -a stdin&stderr: read unixpacket: connection reset by peer #3302

Closed
edsantiago opened this issue Jun 11, 2019 · 27 comments · Fixed by #4818
Labels: do-not-close, locked - please file new issue/PR, stale-issue

@edsantiago
Collaborator

This is a combination that I can find no realistic use for, but even so I think it merits attention, because it might present itself in other, more real-world situations:

# echo true | podman run -a stdin -a stderr --tty=false alpine sh
Error: error attaching to container 22e7b45a00508b0e60a27288bb6c745100a2ee42335800de1d8e246cfe24bd48: read unixpacket @->/var/run/libpod/socket/22e7b45a00508b0e60a27288bb6c745100a2ee42335800de1d8e246cfe24bd48/attach: read: connection reset by peer

Reproduces maybe one in five attempts; the rest of the time it succeeds. It also fails with echo ls / and echo ls /sdf, but does not (seem to) fail with </dev/null (redirection, not pipe).
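Since it reproduces only about one time in five, a retry loop along these lines should flush it out within a handful of iterations (a sketch, not part of the original report; it watches stderr for the error message rather than relying on the exit status):

# re-run the original command until the attach error shows up on stderr
i=0
while :; do
    i=$((i+1))
    err=$(echo true | podman run -a stdin -a stderr --tty=false alpine sh 2>&1 >/dev/null)
    if printf '%s\n' "$err" | grep -q 'connection reset by peer'; then
        echo "hit the attach error on attempt $i:"
        printf '%s\n' "$err"
        break
    fi
done
# note: containers accumulate across iterations; clean up with podman rm -a afterwards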

--log-level=debug adds nothing of value AFAICT (both pass and fail look identical to my eye), but I will provide it on request.

Seen with podman-1.4.0-2 on fc29 and fc30; I think I've seen it before, but never took the time to pursue it. I can look in logs if necessary.

@rhatdan
Member

rhatdan commented Jun 12, 2019

Must be a race condition.

@edsantiago
Collaborator Author

Still present in podman-1.4.2-1.fc29 and fc30.

There's another failure which I think must be related. Instead of sh, run something that outputs to stderr:

# echo hi | podman run -a stdin -a stderr --interactive --tty=false  alpine ls /nonesuch
ls: /nonesuch: No such file or directory

At this point it hangs - AFAICT forever. ^D has no effect. podman ps in a separate window shows the container running. ^C yields:

^CERRO[0000] container not running
container not running
ERRO[0178] Error forwarding signal 2 to container 0af91336dd3700b2acd9889e92cd91e558a016510efc9dc5f5bd669ef921d69a: error sending signal to container 0af91336dd3700b2acd9889e92cd91e558a016510efc9dc5f5bd669ef921d69a: `/usr/bin/runc kill 0af91336dd3700b2acd9889e92cd91e558a016510efc9dc5f5bd669ef921d69a 2` failed: exit status 1

"container not running" even though a few seconds ago podman ps showed it running and, right now, after the ^C, podman ps still shows it running. But:

# podman exec 0af9 date
ERRO[0000] exec failed: cannot exec a container that has stopped
exec failed: cannot exec a container that has stopped
Error: exit status 1
# podman ps
CONTAINER ID  IMAGE                            COMMAND       CREATED        STATUS            PORTS  NAMES
0af91336dd37  docker.io/library/alpine:latest  ls /nonesuch  6 minutes ago  Up 6 minutes ago         gifted_zhukovsky

This one happens less frequently -- one in ten times, maybe -- but is still worth someone looking at, pretty please?
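For what it's worth, the wedged container left behind by the hang can usually be cleaned up with a forced removal (assuming the state is merely out of sync, as it appears to be here):

# force-remove the stuck container by ID prefix
podman rm -f 0af9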

@mheon
Member

mheon commented Jun 19, 2019

The exec thing will hopefully not be a problem after the wide-reaching refactor by @haircommander lands, which should be by the end of the week. We're going to force it in if we need to.

For the rest... I'm going to bet it has something to do with the lack of a terminal. Terminal-less attach is poorly tested and rarely used.

@edsantiago
Collaborator Author

Excellent - thank you.

@rhatdan
Member

rhatdan commented Aug 5, 2019

Well, the refactor has happened, thanks to @haircommander. @edsantiago, could you confirm whether it works so we can close this issue?

@edsantiago
Collaborator Author

I still see it - albeit only after tens of iterations - with podman 1.4.4-4.fc30 (rpm) and master @ b5618d9 (hand-built). Do I need a new conmon?

@rhatdan
Member

rhatdan commented Aug 8, 2019

@haircommander Could you answer @edsantiago's question?

@haircommander
Collaborator

The first issue as posted is not helped by a new conmon, and I have also observed a failure with:
echo hi | podman run -a stdin -a stderr --interactive --tty=false --rm alpine ls /nonesuch
However, the failure is similar to the issue as posted: it is intermittent:

Error: error attaching to container 417eda8d842a3037d9be62108df9d20e85f03a963641b486a2b5ac467c31784b: read unixpacket @->/run/user/1000/libpod/tmp/socket/417eda8d842a3037d9be62108df9d20e85f03a963641b486a2b5ac467c31784b/attach: read: connection reset by peer

I did not observe a hang like you did, however.

@edsantiago
Collaborator Author

Issue still present in podman-1.5.0-2.fc30.x86_64 with, presumably, new conmon and everything.

@fkaempfer

fkaempfer commented Aug 31, 2019

I get the same error consistently when running

podman run -it  --rm --name=alpine alpine:latest cat

And while that container is running:

echo hi | podman exec -i alpine cat
Error: read unixpacket @->/run/user/1000/libpod/tmp/socket/e7e03cd4fdc245d1dd391c443ad3fa532a4b92d18f62ba53ba79112968a1af3a/attach: read: connection reset by peer

It works fine when you run a command without a pipe.

Is this the same issue? Can somebody reproduce this? I was wondering if something is wrong on my end (Fedora 30, rootless, with podman from both master and the official package).

EDIT: This seems to be a regression. On another machine it worked fine with f29 and podman 1.1.2, but as soon as I dnf update to podman 1.5.1, the error appears.
EDIT2: It also works fine with 1.4.4.

@rhatdan
Member

rhatdan commented Sep 1, 2019

$ echo hi | podman run -i alpine cat
hi

But I get the same error with exec:

$ podman run -d --name alpine alpine sleep 1000
8377f8b78b4d93d437f132d85ce6e2b656bba508e1ca979b77bc3d0678544367
$ echo hi | podman exec -i alpine cat
Error: read unixpacket @->/run/user/3267/libpod/tmp/socket/9d2580cac3db7a5251b34e0beb6b6412cb14482ea230bd65d8f77e7ef353b3aa/attach: read: connection reset by peer

@rhatdan
Member

rhatdan commented Sep 1, 2019

@mheon @haircommander PTAL

@tinyzimmer

tinyzimmer commented Sep 11, 2019

I don't know if it's addressable (it might even be desired), because sometimes the command inside the container is actually succeeding, just with nothing on stdin.

But I thought I might mention that when this happens, the exit code is 0. That was very confusing to track down in an automated workflow.
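As a stopgap for automated workflows, one option (sketched here with a thread-style command, not an official recommendation) is to treat the attach error on stderr as a failure even when the exit status is 0:

# hypothetical CI guard around a piped podman invocation
out=$(echo hi | podman run -a stdin -a stderr --interactive --tty=false --rm alpine cat 2>&1)
status=$?
if [ "$status" -eq 0 ] && printf '%s\n' "$out" | grep -q 'connection reset by peer'; then
    echo "attach failed even though podman exited 0" >&2
    exit 1
fi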

@edsantiago
Collaborator Author

ping, still seeing this in podman-1.6.0-2.fc30

@maflcko

maflcko commented Sep 25, 2019

This can be worked around by downgrading the package to 1.4.4, it seems:

$ podman --version
podman version 1.5.1
$ podman run -d --name alpine alpine sleep 1000
$ echo hi | podman exec -i alpine cat
Error: read unixpacket @->/run/user/1000/libpod/tmp/socket/a87821f16138588f14f583486683fc1bce72258be4bde94d549db5ec2f544d6f/attach: read: connection reset by peer

$ podman --version
podman version 1.4.4
$ podman run -d --name alpine alpine sleep 1000
$ echo hi | podman exec -i alpine cat
hi
$ rpm -q podman
podman-1.4.4-4.fc30.x86_64
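For reference, the downgrade itself can presumably be done with dnf, pinning the NVR shown above (assuming that build is still available in the Fedora repos):

# roll back to the last known-good build
sudo dnf downgrade podman-1.4.4-4.fc30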

@edsantiago
Collaborator Author

"This can be worked around by downgrading the package to 1.4.4, it seems"

Unlikely: I filed against 1.4.0, and the problem has never been fixed. I suspect if you try enough times, you'll run into it again. After all, it's an intermittent problem.

@mheon
Member

mheon commented Sep 25, 2019 via email

@mheon
Member

mheon commented Oct 1, 2019

Both @haircommander and I are looking at this one. We're pretty sure it has something to do with the attach socket - it seems like it's being closed prematurely, potentially on Conmon's side?

@mheon
Member

mheon commented Oct 3, 2019

Debugging further:

We're getting an EOF on the container STDERR fd in Conmon, but I'm not sure if it's actually the container - the only things I see being copied there are Conmon debug logs?

@haircommander
Collaborator

@mheon I think that EOF comes from the piped input. If you run the same command but without the pipe (cat hello), the goroutine redirecting stdin to conmon never gets past a buf.Read(), because it is waiting for enough bytes, whereas the EOF from the pipe makes the read terminate. I think stdin actually gets passed down through the os.Exec() call, and the goroutine reading stdin is just a way to figure out when to terminate podman (which it seems to be doing prematurely). These are all suspicions; I haven't had much more time to dig in.
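One quick way to poke at that suspicion (a sketch; it assumes a running container named alpine, as created earlier in this thread) is to vary only how stdin reaches podman, so the reader's buf.Read() would see EOF at different points:

# piped stdin that closes immediately: the read should return EOF right away (if the suspicion is right)
printf '' | podman exec -i alpine cat

# redirected stdin: EOF is available as soon as the fd is read
podman exec -i alpine cat < /dev/null

# terminal stdin: the read blocks until something is typed, so there is no early EOF
podman exec -i alpine cat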

@mheon
Member

mheon commented Oct 3, 2019

@haircommander I think that Conmon might not be handling -i properly in this case - it seems like the exec session is finishing immediately (cat decided to output nothing and exit), when it should be waiting for input.

@haircommander
Collaborator

there's an option --leave-stdin-open that doesn't close stdin immediately when it thinks input is done, but that didn't help. I would not be surprised if -i wasn't being handled right. That said, I'm not even sure we pass -i down to conmon in exec as is.

@github-actions

github-actions bot commented Nov 3, 2019

This issue had no activity for 30 days. In the absence of activity or the "do-not-close" label, the issue will be automatically closed within 7 days.

@rhatdan
Member

rhatdan commented Nov 3, 2019

@haircommander Did your merge fix this issue?
@edsantiago Could you check if this issue is fixed now?

@haircommander
Collaborator

I'm not sure this issue is fixed yet

@edsantiago
Collaborator Author

Still present in podman-1.6.2-2.fc30 and in master @ efc7f15

@maflcko

maflcko commented Nov 12, 2019

It would be nice to get this fixed in the next couple of months, because it prevents affected users from upgrading to Fedora 31: as far as I know, there haven't been any compiled and tested pre-1.5 podman builds for Fedora 31.
