Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report error events on pull failures #1842

Merged
merged 1 commit into from
Feb 29, 2024

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Feb 5, 2024

Copy link
Contributor

openshift-ci bot commented Feb 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 5, 2024
@rhatdan
Copy link
Member Author

rhatdan commented Feb 5, 2024

@vrothberg PTAL

@rhatdan
Copy link
Member Author

rhatdan commented Feb 5, 2024

@vrothberg @mheon Would this work?

@rhatdan
Copy link
Member Author

rhatdan commented Feb 5, 2024

@benoitf PTAL

@rhatdan
Copy link
Member Author

rhatdan commented Feb 5, 2024

@mheon @baude is this something we want to support?

@benoitf
Copy link

benoitf commented Feb 5, 2024

thanks @rhatdan for the notice !

I was wondering if in the error message it will display as image name, the image name that has been resolved or the image name the user entered.

for example if I use 'podman pull hello' and I've an error, will it provide hello as image name or quay.io/podman/hello ?

@rhatdan
Copy link
Member Author

rhatdan commented Feb 5, 2024

Currently what I have written here will not work, because Events do not currently cover errors. We would need to handle the Error struct in podman.

I think the Name would be the expanded name.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks compatibility with Docker and previous versions of Podman. Until now, a pull event has only been emitted on successful pull. With this change, we'd always emit an event - even in case of errors.

We can - for sure - do this, but we'd be more and more leaving Docker compat. An alternative could be to make this behavior opt-in via a containers.conf field.

@benoitf
Copy link

benoitf commented Feb 6, 2024

could it be a separate channel event like 'pull_error' (so it doesn't break the 'pull' channel)

@vrothberg
Copy link
Member

for example if I use 'podman pull hello' and I've an error, will it provide hello as image name or quay.io/podman/hello ?

It would be hello but the error would include the fully-qualified name.

@vrothberg
Copy link
Member

could it be a separate channel event like 'pull_error' (so it doesn't break the 'pull' channel)

That would also be a good option. I don't expect a potential pull_error to cause much noise, so I'd be totally on board.

@Luap99
Copy link
Member

Luap99 commented Feb 6, 2024

I don't think this is a good idea long term, events are for things that happened not for things that failed.
Errors are always directly reported to the caller, I cannot see how another application can make much sense of such error and do something useful with it. The existing events are used by a lot of application to update their state and I rather not start including something completely different in the event stream.

From the issue:

Then, we could for example prompt the user to authenticate to some registries if not done, etc.

Pull errors are fundamentally flawed because registries can report whatever they choose. For example pulling docker.io/library/notexists results in an error saying requested access to the resource is denied so you would think that you need to login when the real error is that the image simply does not exists so promoting a user here is not helpful as the error was likely some typo in the image name instead.
And if I already run via cli then see such auth error then I for one would try to login via podman login on the cli, no? Would podman-desktop open a pop-up in the foreground to tell me to log in or how would this work

@vrothberg
Copy link
Member

Pull errors are fundamentally flawed because registries can report whatever they choose. For example pulling docker.io/library/notexists results in an error saying requested access to the resource is denied so you would think that you need to login when the real error is that the image simply does not exists so promoting a user here is not helpful as the error was likely some typo in the image name instead.

It would always be an approximation. The good news is that we only care about registry.redhat.io and we know how this registry behaves.

And if I already run via cli then see such auth error then I for one would try to login via podman login on the cli, no? Would podman-desktop open a pop-up in the foreground to tell me to log in or how would this work

How I understand the approach is to also consider CLI workflows which are common for, let's say, using compose. Podman Desktop can then, after the fact, look into the logs/events and see whether there are errors related to the registry we care about. If there are such errors, Podman Desktop may choose to propose installing a given extension and to sign into the RH account.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 6, 2024

Ok, I think this should be a cabal issue, where we all talk about the pros and cons of allowing this information flow.

Options:

  • Do nothing, keep the status quo
  • Fixup this patch and allow events to have a success/failure/error message
  • Create a new type of failure events where we can monitor for failures. ( I don't think this should be just for pull, since we could find other failure modes later.)

@vrothberg
Copy link
Member

* Do nothing, keep the status quo

I am lazy but not that lazy :^)

* Fixup this patch and allow events to have a success/failure/error message

It's breaking backwards compat, so we if go forward, we need a good justification.

* Create a new type of failure events where we can monitor for failures. ( I don't think this should be just for pull, since we could find other failure modes later.)

I am perfectly fine with doing that only for pull. For that, we have a good use case which is currently lacking for other kinds of events. If other events need it in the future, we could use pull as a precedent.

So I am in favor of option 3): create pull_error event.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 6, 2024

Ok so this is a normal event, just a pull error event. Where does the text go then?

@vrothberg
Copy link
Member

vrothberg commented Feb 6, 2024

Good question.

I see beauty in both options but have a slight tendency toward adding an Error error field to the event and let Podman deal with it. If it's a pullevent and the .Error is set, then it should write a pull_error event.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test would be good. Set the channel, pull a bogus image (e.g., with an invalid transport) and make sure the .Error is set.


// Dispatch the copy operation.
switch ref.Transport().Name() {
// DOCKER REGISTRY
case registryTransport.Transport.Name():
pulledImages, pullError = r.copyFromRegistry(ctx, ref, possiblyUnqualifiedName, pullPolicy, options)
pulledImages, err = r.copyFromRegistry(ctx, ref, possiblyUnqualifiedName, pullPolicy, options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this continue to be pullError?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could leave it as pullError, but wanted to avoid the confusion that pullError is only set here, when code above returning an error will get set as a pullError in the defer block.

@@ -18,6 +18,8 @@ const (
EventTypeUnknown EventType = iota
// EventTypeImagePull represents an image pull.
EventTypeImagePull
// EventTypeImagePullError represents an image pull failed.
EventTypeImagePullError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need that here as well OR should we let Podman deal with it? I feel like Podman should handle it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with going that way, but thought this might not be a breaking change, since we are adding a new event and not forcing apps that are looking for pullevent as a success indicator.

Copy link
Member

@vrothberg vrothberg Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can break here as Podman is the only user/caller of c/common and we're below 1.0.0.

What I meant by breaking change is that we cannot make this user visible. So external users of Podman reading the events should not notice a change. So we need a new pull_error (or something like that) for podman events (and the backends).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume some automation that is reading the logs/events for pulled images. Now, pull errors would show up and cause false positives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to know that a pull happened, if we want to put this into Podman then we have to plumb this into tons of different places, and we would probably need to plumb it into buildah as well.

@Luap99
Copy link
Member

Luap99 commented Feb 6, 2024

@mheon WDYT?

@mheon
Copy link
Member

mheon commented Feb 6, 2024

I'm inclined not to do this. We're already abusing the events system too much, and we have more than one request to scale back the amount we're logging. Adding errors on top of everything else makes things an order of magnitude worse.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 6, 2024

Well this would just be on pull errors. Not all errors.

@vrothberg
Copy link
Member

I'm inclined not to do this. We're already abusing the events system too much, and we have more than one request to scale back the amount we're logging. Adding errors on top of everything else makes things an order of magnitude worse.

We can make this opt-in, and hide it behind a containers.conf option. We did the very same for the container audit events.

Let's circle back to the problem at hand to make sure we're not getting stuck in discussing a solution:

Podman Desktop has a number of extensions that plug into the Red Hat ecosystem. The SSO extension, for instance, is used to register subscription-manager and to log into the RH registry.

The goal is to make Podman Desktop (and the extensions) a bit smarter in guiding/helping the users. Imagine somebody tries to pull a protected image but isn't logged in. In such a case, the user can be prompted by the SSO extension to sign into their Red Hat account.

Such prompting is fairly easy when using Podman Desktop directly. The new /resolve endpoint can help to catch such issues pro-actively and errors can be parsed. BUT here we are looking into catching errors after the fact which includes CLI usage. Imagine a podman compose [...].

So we need some record for pull errors. The events subsystem seems like the right place to me.

@Luap99
Copy link
Member

Luap99 commented Feb 28, 2024

I understand the use case but I disagree that using events is the right thing. Today you say it is only pull error but tomorrow comes the next ask... We need to draw the line somewhere and I think it is here. Errors are not events IMO.

As you said if podman desktop makes the API request it can already see the error and offer solutions. For cli users I do not see how offering them some GUI pop-up to prompt them to log in is that helpful. It seems to be you are trying to solve the problem backwards. If the problem is cli users having no idea how to log in into the RH registry then the podman error message is bad in the first place and should be improved to guide the user, however I don't think this is the case. The message seems good to me:

$ podman pull registry.redhat.io/ubi9
Trying to pull registry.redhat.io/ubi9:latest...
Error: initializing source docker://registry.redhat.io/ubi9:latest: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication

I think you should look into getting the link updated to mention the Podman Desktop login method as well.

@benoitf
Copy link

benoitf commented Feb 28, 2024

As you said if podman desktop makes the API request it can already see the error and offer solutions

The problem is that we can't do the request if we don't know what users are trying to do in case of errors
we're only able to capture successful commands which won't help the user as user need help on errors.

@vrothberg
Copy link
Member

Today you say it is only pull error but tomorrow comes the next ask...

I don't perceive this as a problem as tomorrow's asks guarantee our paychecks ;-) I understand we're talking about it from a technical perspective, but there is a precedent for the audit events.

For cli users I do not see how offering them some GUI pop-up to prompt them to log in is that helpful. It seems to be you are trying to solve the problem backwards. If the problem is cli users having no idea how to log in into the RH registry then the podman error message is bad in the first place and should be improved to guide the user, however I don't think this is the case. The message seems good to me:

I think the devil lies in the details. It is certainly something versed users can deal with, but we are looking into improving on that and assist users who just want to get their job done. Making it as easy as possible to create and sign into a Red Hat account is incremental. It's important to understand that it is more than "just logging in" and more than just accessing the container registry. Trying to pull a trusted/protected container image will very very very likely entail accessing protected content/repositories inside the container which in turn requires subscription-manager to be installed and registered.

There are certain workflows where users are just not going through the GUI. So it is quite useful for Podman Desktop to know whether a image pull from a certain registry has failed. The assumption is not that users are incapable to do that own their own but to help them do it.

So in other words, for Podman Desktop to know which image pulls have failed is an important corner stone for the UX of https://github.com/redhat-developer/podman-desktop-redhat-account-ext.

@rhatdan
Copy link
Member Author

rhatdan commented Feb 28, 2024

@Luap99 what other method could we allow machine readable events to improve the user experience? If I get this type of error inside of compose of build, the message will be logged to stderr, but there is no real way for podman desktop to react. I think enhancing events to record pull failures might be a slippery sloap but as long as it can be turned off, I don't see the big issue. I don't think failures on pull is all that common. Most people complaining about events are around flooding of events, and not using something like Journal.

@rhatdan rhatdan changed the title [WIP] Report error events on pull failures Report error events on pull failures Feb 29, 2024
Fixes: containers/podman#21458

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
@baude
Copy link
Member

baude commented Feb 29, 2024

code LGTM ... @mheon im tagging you for the merge

@mheon
Copy link
Member

mheon commented Feb 29, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Feb 29, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit cec0992 into containers:main Feb 29, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

publish event when there are errors as well
6 participants