feat: webhook does not return error on 'instance not found'. #28

maigl · 2022-10-06T15:05:51Z

This case is probably caused by a webhook event that was meant for another runner controller / manager. No need to report this as error, we can simply ignore this and avoid noise in the logs.

_{Michael Kuhnt michael.kuhnt@mercedes-benz.com Mercedes-Benz Tech Innovation GmbH (ProviderInformation)}

maigl · 2022-10-06T15:07:43Z

Hi @gabriel-samfira, here is another nit. I don't know if you're interested in those little things .. we might add bigger features in future :)

gabriel-samfira · 2022-10-06T15:42:43Z

That's right. We get all webhooks, including those that target other runner types (Github hosted for example). Garm currently returns an error in all cases (queued, in_progress and completed) if the action targets tags we don't care about, or if the webhook targets a worker that we don't manage (like a github hosted one).

In all cases, the store will return a runnerErrors.ErrNotFound. The only potentially undesirable effects of this are:

log spam
you see a 404 error in github webhooks under "Recent deliveries"

I am a bit torn between knowing it's pointless to log/react to these events and correctly reporting back the fact that the webhook we got, was not meant for us 😄.

I think we can just not report an error if we get an ErrNotFound when a webhook is fired. Will leave a comment on the changed lines in a sec.

gabriel-samfira · 2022-10-06T15:46:15Z

apiserver/controllers/controllers.go

@@ -103,6 +104,11 @@ func (a *APIController) handleWorkflowJobEvent(w http.ResponseWriter, r *http.Re

 	if err := a.r.DispatchWorkflowJob(hookType, signature, body); err != nil {
 		log.Printf("failed to dispatch work: %s", err)
+		// in this case we probably got a webhook that was not meant for us
+		// no need to return an error
+		if strings.Contains(err.Error(), "fetching instance by name: not found") {


We try to return meaningful errors we can test for, as much as we can. This case can be handled with:

if errors.Is(err, gErrors.ErrNotFound) { log.Printf("got not found error from DispatchWorkflowJob. webhook not meant for us?: %q", err) return nil }

It would also be useful to move the log.Printf("failed to dispatch work: %s", err) statement under the if statement.

sure .. I'd love to use errors.Is but I was under the impression that there are multiple cases where we return ErrNotFound but only one of this cases means that its not our runner ("fetching instance by name: not found"). Other cases still might point to a real error. Is that wrong ?

TL;DR: It should be safe to ignore any ErrNotFound in this particular code path (webhook handling).

TS; NM: I will elaborate ( bare with me, I sometimes tend to be verbose 😄 ).

The web hook handler is how we consume GitHub events and react to them in 3 particular ways:

When a new job is queued (a queued event is sent)

When a job is started (an in_progress event is sent)

When a job is completed (a completed event is sent)

In the case of a queued event, garm will look at the job info and using the tags that were requested by the workflow, will attempt to find a pool where it will try to spawn a worker. Garm looks for a pool that matches all requested tags. If a pool is not found, then an ErrNotFound is returned.

This is not necessarily an error. It's just that we don't have a pool to handle that particular workflow, so for us, that webhook is a noop.

In the case of an in_progress event, garm will attempt to determine the worker name that is handling the job we were just notified about. When a worker name is found, garm will look for that particular worker and attempt to change it's status from idle to active. This is important because once that happens, if we have min_idle_runners set on the pool, garm will attempt to automatically create a new worker, if one was not already created during the queued event (which can happen if max_runners is reached).

If garm can't find the runner in the database, it will return ErrNotFound. But this means that we never managed the runner we got a hook for. So this may be handled by another controller or by github itself. So it is safe to ignore.

In the case of a completed event, we do pretty much the same thing as in the in_progress event, but we mark the runner as pending_delete. At this point the runner was already removed from github, so all that's left for us to do is remove it from our provider. If we can't find the runner, then it's either a runner we don't manage, or it was already deleted. Desired state is current state, so this too will be a noop.

There are a few things to consider here. This change only affects the webhook API endpoint that GitHub sends hook events to. The only consumer of this endpoint is GitHub itself. We don't interact with this at all. Ignoring the ErrNotFound error here, will only have the side effect of not seeing an error in the GitHub console. But there is no further handling of the error.

We should still log that error condition. We may want to know that GitHub requested something and we generated an ErrNotFound, even though we're ignoring it.

Also, thanks for taking the time to create this PR! I was on the fence about this issue, but considering it bugs other people too, we might as well change it. 😅

thx for the elaborate explanation :) .. I changed the code as you have suggested.

This case is probably caused by a webhook event that was meant for another runner controller / manager. No need to report this as error, we can simply ignore this and avoid noise in the logs.

gabriel-samfira · 2022-10-07T09:27:58Z

Thanks for the PR!

gabriel-samfira reviewed Oct 6, 2022

View reviewed changes

feat: webhook does not return error on 'instance not found'.

5bfecba

This case is probably caused by a webhook event that was meant for another runner controller / manager. No need to report this as error, we can simply ignore this and avoid noise in the logs.

maigl force-pushed the no-404 branch from ac27c73 to 5bfecba Compare October 7, 2022 09:19

gabriel-samfira merged commit 59c8f39 into cloudbase:main Oct 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: webhook does not return error on 'instance not found'. #28

feat: webhook does not return error on 'instance not found'. #28

maigl commented Oct 6, 2022 •

edited

Loading

maigl commented Oct 6, 2022

gabriel-samfira commented Oct 6, 2022

gabriel-samfira Oct 6, 2022

maigl Oct 6, 2022

gabriel-samfira Oct 7, 2022 •

edited

Loading

gabriel-samfira Oct 7, 2022

maigl Oct 7, 2022

gabriel-samfira commented Oct 7, 2022

feat: webhook does not return error on 'instance not found'. #28

feat: webhook does not return error on 'instance not found'. #28

Conversation

maigl commented Oct 6, 2022 • edited Loading

maigl commented Oct 6, 2022

gabriel-samfira commented Oct 6, 2022

gabriel-samfira Oct 6, 2022

Choose a reason for hiding this comment

maigl Oct 6, 2022

Choose a reason for hiding this comment

gabriel-samfira Oct 7, 2022 • edited Loading

Choose a reason for hiding this comment

gabriel-samfira Oct 7, 2022

Choose a reason for hiding this comment

maigl Oct 7, 2022

Choose a reason for hiding this comment

gabriel-samfira commented Oct 7, 2022

maigl commented Oct 6, 2022 •

edited

Loading

gabriel-samfira Oct 7, 2022 •

edited

Loading