Report error events on pull failures #1842

rhatdan · 2024-02-05T16:19:05Z

Fixes: containers/podman#21458

openshift-ci · 2024-02-05T16:19:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [rhatdan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

rhatdan · 2024-02-05T16:19:19Z

@vrothberg PTAL

rhatdan · 2024-02-05T16:23:15Z

@vrothberg @mheon Would this work?

rhatdan · 2024-02-05T21:08:33Z

@benoitf PTAL

rhatdan · 2024-02-05T21:12:25Z

@mheon @baude is this something we want to support?

benoitf · 2024-02-05T21:12:58Z

thanks @rhatdan for the notice !

I was wondering if in the error message it will display as image name, the image name that has been resolved or the image name the user entered.

for example if I use 'podman pull hello' and I've an error, will it provide hello as image name or quay.io/podman/hello ?

rhatdan · 2024-02-05T21:15:26Z

Currently what I have written here will not work, because Events do not currently cover errors. We would need to handle the Error struct in podman.

I think the Name would be the expanded name.

vrothberg

This breaks compatibility with Docker and previous versions of Podman. Until now, a pull event has only been emitted on successful pull. With this change, we'd always emit an event - even in case of errors.

We can - for sure - do this, but we'd be more and more leaving Docker compat. An alternative could be to make this behavior opt-in via a containers.conf field.

benoitf · 2024-02-06T08:39:59Z

could it be a separate channel event like 'pull_error' (so it doesn't break the 'pull' channel)

vrothberg · 2024-02-06T08:43:55Z

for example if I use 'podman pull hello' and I've an error, will it provide hello as image name or quay.io/podman/hello ?

It would be hello but the error would include the fully-qualified name.

vrothberg · 2024-02-06T08:45:04Z

could it be a separate channel event like 'pull_error' (so it doesn't break the 'pull' channel)

That would also be a good option. I don't expect a potential pull_error to cause much noise, so I'd be totally on board.

Luap99 · 2024-02-06T09:46:43Z

I don't think this is a good idea long term, events are for things that happened not for things that failed.
Errors are always directly reported to the caller, I cannot see how another application can make much sense of such error and do something useful with it. The existing events are used by a lot of application to update their state and I rather not start including something completely different in the event stream.

From the issue:

Then, we could for example prompt the user to authenticate to some registries if not done, etc.

Pull errors are fundamentally flawed because registries can report whatever they choose. For example pulling docker.io/library/notexists results in an error saying requested access to the resource is denied so you would think that you need to login when the real error is that the image simply does not exists so promoting a user here is not helpful as the error was likely some typo in the image name instead.
And if I already run via cli then see such auth error then I for one would try to login via podman login on the cli, no? Would podman-desktop open a pop-up in the foreground to tell me to log in or how would this work

vrothberg · 2024-02-06T09:51:45Z

Pull errors are fundamentally flawed because registries can report whatever they choose. For example pulling docker.io/library/notexists results in an error saying requested access to the resource is denied so you would think that you need to login when the real error is that the image simply does not exists so promoting a user here is not helpful as the error was likely some typo in the image name instead.

It would always be an approximation. The good news is that we only care about registry.redhat.io and we know how this registry behaves.

And if I already run via cli then see such auth error then I for one would try to login via podman login on the cli, no? Would podman-desktop open a pop-up in the foreground to tell me to log in or how would this work

How I understand the approach is to also consider CLI workflows which are common for, let's say, using compose. Podman Desktop can then, after the fact, look into the logs/events and see whether there are errors related to the registry we care about. If there are such errors, Podman Desktop may choose to propose installing a given extension and to sign into the RH account.

rhatdan · 2024-02-06T12:26:00Z

Ok, I think this should be a cabal issue, where we all talk about the pros and cons of allowing this information flow.

Options:

Do nothing, keep the status quo
Fixup this patch and allow events to have a success/failure/error message
Create a new type of failure events where we can monitor for failures. ( I don't think this should be just for pull, since we could find other failure modes later.)

vrothberg · 2024-02-06T12:30:12Z

* Do nothing, keep the status quo

I am lazy but not that lazy :^)

* Fixup this patch and allow events to have a success/failure/error message

It's breaking backwards compat, so we if go forward, we need a good justification.

* Create a new type of failure events where we can monitor for failures. ( I don't think this should be just for pull, since we could find other failure modes later.)

I am perfectly fine with doing that only for pull. For that, we have a good use case which is currently lacking for other kinds of events. If other events need it in the future, we could use pull as a precedent.

So I am in favor of option 3): create pull_error event.

rhatdan · 2024-02-06T12:34:56Z

Ok so this is a normal event, just a pull error event. Where does the text go then?

vrothberg · 2024-02-06T12:38:54Z

Good question.

I see beauty in both options but have a slight tendency toward adding an Error error field to the event and let Podman deal with it. If it's a pullevent and the .Error is set, then it should write a pull_error event.

vrothberg

A test would be good. Set the channel, pull a bogus image (e.g., with an invalid transport) and make sure the .Error is set.

vrothberg · 2024-02-06T12:41:00Z

libimage/pull.go


 	// Dispatch the copy operation.
 	switch ref.Transport().Name() {
 	// DOCKER REGISTRY
 	case registryTransport.Transport.Name():
-		pulledImages, pullError = r.copyFromRegistry(ctx, ref, possiblyUnqualifiedName, pullPolicy, options)
+		pulledImages, err = r.copyFromRegistry(ctx, ref, possiblyUnqualifiedName, pullPolicy, options)


Shouldn't this continue to be pullError?

I could leave it as pullError, but wanted to avoid the confusion that pullError is only set here, when code above returning an error will get set as a pullError in the defer block.

vrothberg · 2024-02-06T12:41:33Z

libimage/events.go

@@ -18,6 +18,8 @@ const (
 	EventTypeUnknown EventType = iota
 	// EventTypeImagePull represents an image pull.
 	EventTypeImagePull
+	// EventTypeImagePullError represents an image pull failed.
+	EventTypeImagePullError


Do we need that here as well OR should we let Podman deal with it? I feel like Podman should handle it.

I am fine with going that way, but thought this might not be a breaking change, since we are adding a new event and not forcing apps that are looking for pullevent as a success indicator.

We can break here as Podman is the only user/caller of c/common and we're below 1.0.0.

What I meant by breaking change is that we cannot make this user visible. So external users of Podman reading the events should not notice a change. So we need a new pull_error (or something like that) for podman events (and the backends).

Assume some automation that is reading the logs/events for pulled images. Now, pull errors would show up and cause false positives.

We need to know that a pull happened, if we want to put this into Podman then we have to plumb this into tons of different places, and we would probably need to plumb it into buildah as well.

Luap99 · 2024-02-06T12:46:54Z

@mheon WDYT?

mheon · 2024-02-06T20:42:45Z

I'm inclined not to do this. We're already abusing the events system too much, and we have more than one request to scale back the amount we're logging. Adding errors on top of everything else makes things an order of magnitude worse.

rhatdan · 2024-02-06T20:45:28Z

Well this would just be on pull errors. Not all errors.

vrothberg · 2024-02-28T08:13:47Z

I'm inclined not to do this. We're already abusing the events system too much, and we have more than one request to scale back the amount we're logging. Adding errors on top of everything else makes things an order of magnitude worse.

We can make this opt-in, and hide it behind a containers.conf option. We did the very same for the container audit events.

Let's circle back to the problem at hand to make sure we're not getting stuck in discussing a solution:

Podman Desktop has a number of extensions that plug into the Red Hat ecosystem. The SSO extension, for instance, is used to register subscription-manager and to log into the RH registry.

The goal is to make Podman Desktop (and the extensions) a bit smarter in guiding/helping the users. Imagine somebody tries to pull a protected image but isn't logged in. In such a case, the user can be prompted by the SSO extension to sign into their Red Hat account.

Such prompting is fairly easy when using Podman Desktop directly. The new /resolve endpoint can help to catch such issues pro-actively and errors can be parsed. BUT here we are looking into catching errors after the fact which includes CLI usage. Imagine a podman compose [...].

So we need some record for pull errors. The events subsystem seems like the right place to me.

Luap99 · 2024-02-28T09:39:50Z

I understand the use case but I disagree that using events is the right thing. Today you say it is only pull error but tomorrow comes the next ask... We need to draw the line somewhere and I think it is here. Errors are not events IMO.

As you said if podman desktop makes the API request it can already see the error and offer solutions. For cli users I do not see how offering them some GUI pop-up to prompt them to log in is that helpful. It seems to be you are trying to solve the problem backwards. If the problem is cli users having no idea how to log in into the RH registry then the podman error message is bad in the first place and should be improved to guide the user, however I don't think this is the case. The message seems good to me:

$ podman pull registry.redhat.io/ubi9
Trying to pull registry.redhat.io/ubi9:latest...
Error: initializing source docker://registry.redhat.io/ubi9:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication

I think you should look into getting the link updated to mention the Podman Desktop login method as well.

benoitf · 2024-02-28T09:47:39Z

As you said if podman desktop makes the API request it can already see the error and offer solutions

The problem is that we can't do the request if we don't know what users are trying to do in case of errors
we're only able to capture successful commands which won't help the user as user need help on errors.

vrothberg · 2024-02-28T10:00:40Z

Today you say it is only pull error but tomorrow comes the next ask...

I don't perceive this as a problem as tomorrow's asks guarantee our paychecks ;-) I understand we're talking about it from a technical perspective, but there is a precedent for the audit events.

For cli users I do not see how offering them some GUI pop-up to prompt them to log in is that helpful. It seems to be you are trying to solve the problem backwards. If the problem is cli users having no idea how to log in into the RH registry then the podman error message is bad in the first place and should be improved to guide the user, however I don't think this is the case. The message seems good to me:

I think the devil lies in the details. It is certainly something versed users can deal with, but we are looking into improving on that and assist users who just want to get their job done. Making it as easy as possible to create and sign into a Red Hat account is incremental. It's important to understand that it is more than "just logging in" and more than just accessing the container registry. Trying to pull a trusted/protected container image will very very very likely entail accessing protected content/repositories inside the container which in turn requires subscription-manager to be installed and registered.

There are certain workflows where users are just not going through the GUI. So it is quite useful for Podman Desktop to know whether a image pull from a certain registry has failed. The assumption is not that users are incapable to do that own their own but to help them do it.

So in other words, for Podman Desktop to know which image pulls have failed is an important corner stone for the UX of https://github.com/redhat-developer/podman-desktop-redhat-account-ext.

rhatdan · 2024-02-28T15:16:30Z

@Luap99 what other method could we allow machine readable events to improve the user experience? If I get this type of error inside of compose of build, the message will be logged to stderr, but there is no real way for podman desktop to react. I think enhancing events to record pull failures might be a slippery sloap but as long as it can be turned off, I don't see the big issue. I don't think failures on pull is all that common. Most people complaining about events are around flooding of events, and not using something like Journal.

Fixes: containers/podman#21458 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

baude · 2024-02-29T16:42:37Z

code LGTM ... @mheon im tagging you for the merge

mheon · 2024-02-29T16:53:01Z

/lgtm

openshift-ci bot added the do-not-merge/work-in-progress label Feb 5, 2024

openshift-ci bot added the approved label Feb 5, 2024

rhatdan force-pushed the pull branch from 22844b5 to fb82686 Compare February 5, 2024 16:21

rhatdan force-pushed the pull branch 6 times, most recently from f2e7cfd to 200b9b7 Compare February 5, 2024 20:40

vrothberg reviewed Feb 6, 2024

View reviewed changes

rhatdan force-pushed the pull branch from 200b9b7 to ea1c8ee Compare February 6, 2024 12:38

vrothberg reviewed Feb 6, 2024

View reviewed changes

rhatdan changed the title ~~[WIP] Report error events on pull failures~~ Report error events on pull failures Feb 29, 2024

openshift-ci bot removed the do-not-merge/work-in-progress label Feb 29, 2024

Report error events on pull failures

a7c6a01

Fixes: containers/podman#21458 Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>

rhatdan force-pushed the pull branch from ea1c8ee to a7c6a01 Compare February 29, 2024 16:32

openshift-ci bot assigned mheon Feb 29, 2024

openshift-ci bot added the lgtm label Feb 29, 2024

openshift-merge-bot bot merged commit cec0992 into containers:main Feb 29, 2024
7 checks passed

Report error events on pull failures #1842

Report error events on pull failures #1842

Conversation

rhatdan commented Feb 5, 2024 • edited

openshift-ci bot commented Feb 5, 2024

rhatdan commented Feb 5, 2024

rhatdan commented Feb 5, 2024

rhatdan commented Feb 5, 2024

rhatdan commented Feb 5, 2024

benoitf commented Feb 5, 2024

rhatdan commented Feb 5, 2024

vrothberg left a comment

Choose a reason for hiding this comment

benoitf commented Feb 6, 2024

vrothberg commented Feb 6, 2024

vrothberg commented Feb 6, 2024

Luap99 commented Feb 6, 2024

vrothberg commented Feb 6, 2024

rhatdan commented Feb 6, 2024

vrothberg commented Feb 6, 2024

rhatdan commented Feb 6, 2024

vrothberg commented Feb 6, 2024 • edited

vrothberg left a comment

Choose a reason for hiding this comment

vrothberg Feb 6, 2024

Choose a reason for hiding this comment

rhatdan Feb 6, 2024

Choose a reason for hiding this comment

vrothberg Feb 6, 2024

Choose a reason for hiding this comment

rhatdan Feb 6, 2024

Choose a reason for hiding this comment

vrothberg Feb 6, 2024 • edited

Choose a reason for hiding this comment

vrothberg Feb 6, 2024

Choose a reason for hiding this comment

rhatdan Feb 29, 2024

Choose a reason for hiding this comment

Luap99 commented Feb 6, 2024

mheon commented Feb 6, 2024

rhatdan commented Feb 6, 2024

vrothberg commented Feb 28, 2024

Luap99 commented Feb 28, 2024

benoitf commented Feb 28, 2024

vrothberg commented Feb 28, 2024

rhatdan commented Feb 28, 2024

baude commented Feb 29, 2024

mheon commented Feb 29, 2024

rhatdan commented Feb 5, 2024 •

edited

vrothberg commented Feb 6, 2024 •

edited

vrothberg Feb 6, 2024 •

edited