fix(image): resolve scan deadlock when error occurs in slow mode #4336

mpoindexter · 2023-05-11T19:54:11Z

Description

See #4335

Related issues

Close #4343 (Trivy can hang if an error occurs scanning a container image with --slow option)

Checklist

I've read the guidelines for contributing to this repository.
I've followed the conventions in the PR title.
I've added tests that prove my fix is effective or that my feature works.

CLAassistant · 2023-05-11T19:54:17Z

All committers have signed the CLA.

CLAassistant · 2023-05-11T19:54:17Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

DmitriyLewen · 2023-05-12T06:34:01Z

Hello @mpoindexter
Thanks for your work!

As you said in #4335, I reproduced this case, but I can't find any information on why the semaphore blocks the channel. Do you have docs about this case?
Also, I found that there is no problem if a buffered channel is used (I mean errCh := make(chan error, 1)).

mpoindexter · 2023-05-12T06:43:51Z

It's not that the semaphore blocks the channel send directly. The send blocks because the channel is unbuffered and nothing is reading from it yet. In turn, the goroutine that will eventually read from the channel is blocked because the goroutine trying to send holds the semaphore, which allows at most one holder in slow mode. Using a buffered channel sort of fixes the problem, but for a full fix the buffer size must equal the maximum number of errors that could occur, not 1. Otherwise, if errors occur on more than one layer, the problem can still happen.
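The stall described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not the Trivy code: a buffered channel stands in for the semaphore, every layer is assumed to fail, and the names `scanStalls`, `slots` and `errBuf` are hypothetical. A timeout replaces the blocking acquire so the program reports the stall instead of hanging.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// scanStalls mimics the loop shape described above: the loop takes a slot
// before starting each worker, every worker fails and tries to report its
// error on errCh, and nothing reads errCh while the loop is still running.
// It reports whether the loop got stuck waiting for a slot.
func scanStalls(layers, slots, errBuf int) bool {
	sem := make(chan struct{}, slots) // stand-in for the semaphore
	errCh := make(chan error, errBuf)

	for i := 0; i < layers; i++ {
		select {
		case sem <- struct{}{}: // acquire a slot
		case <-time.After(200 * time.Millisecond):
			// A real acquire would block here forever. The timeout exists
			// only so the demo can report the stall instead of hanging.
			return true
		}
		go func(i int) {
			// The slot is released only after the send completes.
			errCh <- fmt.Errorf("layer %d: %w", i, errors.New("inspect failed"))
			<-sem
		}(i)
	}
	return false
}

func main() {
	fmt.Println(scanStalls(3, 1, 0)) // unbuffered channel, one slot: true
	fmt.Println(scanStalls(3, 1, 1)) // buffer of 1, three failing layers: true
	fmt.Println(scanStalls(3, 1, 3)) // buffer sized to the layer count: false
}
```

The second call illustrates the point about buffer size: a buffer of 1 absorbs the first error, but the second failing layer blocks on its send while still holding the only slot.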

DmitriyLewen · 2023-05-12T06:48:04Z

Thanks for the more detailed explanation and the fast answer!

DmitriyLewen

This approach looks good to me.
Thanks for your work, @mpoindexter.
@knqyf263 I approved this PR.

knqyf263 · 2023-05-14T13:53:32Z

What about putting limit.Acquire into the goroutine? The downside is that a goroutine would be created for every layer up front. But I think that is acceptable since goroutines are lightweight, and the number of layers wouldn't be in the thousands.

mpoindexter · 2023-05-15T20:53:58Z

@knqyf263 I moved limit.Acquire into the goroutine. I think the select blocks around all the sends from the goroutine are still needed: the scan is aborted on the first error, so the other goroutines must be able to skip their sends once one has errored.
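The shape described in this comment might look roughly like the sketch below. It is an illustration under assumptions rather than the code in the PR: a buffered channel stands in for the semaphore, and `inspectAll`, `failing` and `doneCh` are hypothetical names. Each goroutine takes its slot itself, and every send is paired with `ctx.Done()` so that workers can give up once the scan has been aborted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// inspectAll starts one goroutine per layer. The launching loop never blocks
// because the slot is acquired inside the goroutine. failing is a
// hypothetical set of layer indexes whose inspection fails.
func inspectAll(layers, slots int, failing map[int]bool) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	sem := make(chan struct{}, slots) // stand-in for the semaphore
	errCh := make(chan error)         // unbuffered on purpose
	doneCh := make(chan struct{})     // unbuffered on purpose
	var wg sync.WaitGroup

	for i := 0; i < layers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			select {
			case sem <- struct{}{}: // acquire inside the goroutine
				defer func() { <-sem }()
			case <-ctx.Done():
				return // scan already aborted
			}
			if failing[i] {
				select {
				case errCh <- fmt.Errorf("layer %d failed", i):
				case <-ctx.Done(): // another layer already failed
				}
				return
			}
			select {
			case doneCh <- struct{}{}:
			case <-ctx.Done():
			}
		}(i)
	}

	// Collect results and abort on the first error.
	var firstErr error
	for n := 0; n < layers && firstErr == nil; n++ {
		select {
		case firstErr = <-errCh:
			cancel() // wakes every worker blocked in a select
		case <-doneCh:
		}
	}
	wg.Wait()
	return firstErr
}

func main() {
	fmt.Println(inspectAll(4, 1, map[int]bool{1: true, 2: true}) != nil) // true
	fmt.Println(inspectAll(4, 1, nil) == nil)                            // true
}
```

Without the `ctx.Done()` cases, a second failing layer would block forever on `errCh` after the collector has stopped reading, and `wg.Wait()` would never return.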

knqyf263 · 2023-05-16T02:29:14Z

We may want to use errgroup in this case.
https://pkg.go.dev/golang.org/x/sync/errgroup

mpoindexter · 2023-05-16T06:10:54Z

OK, updated to use errgroup

knqyf263 · 2023-05-17T08:18:52Z

pkg/fanal/artifact/image/image.go

+		if ctx.Err() != nil {
+			break
+		}


I think it is acceptable to run goroutines even after an error. What about removing this error check?

mpoindexter · May 18, 2023

I think it would be worse to remove it. There's no correctness problem with removing the check, but inspecting a layer can be quite expensive, so it seems like we should stop doing it once we already know that group.Wait() is going to return an error.

knqyf263 · 2023-05-17T08:22:39Z

pkg/fanal/artifact/image/image.go

-			}()
-
+		layerKey := k
+		ctx := groupCtx


groupCtx is not updated in the loop. Is there a specific reason to overwrite ctx on every iteration here?

mpoindexter · May 18, 2023

I thought it looked cleaner to pass ctx in the body of the goroutine. The code previously took ctx as an argument to the goroutine, but with errgroup we can't pass arguments to the goroutine, so binding some variables was the replacement. We could change it to use groupCtx directly within the goroutine; let me know.

Fix scan deadlock when error in slow mode

e4b54a7

mpoindexter requested a review from knqyf263 as a code owner May 11, 2023 19:54

knqyf263 requested a review from DmitriyLewen May 12, 2023 05:00

DmitriyLewen approved these changes May 12, 2023

AliDatadog mentioned this pull request May 12, 2023

fix(image): fix deadlock occuring in slow mode for multiple layers images #4345

Closed

6 tasks

Move semaphore into goroutine

48fff72

Use errgroup

891cb4b

knqyf263 reviewed May 17, 2023

knqyf263 approved these changes May 21, 2023

knqyf263 merged commit 29b5f7e into aquasecurity:main May 21, 2023
8 checks passed

knqyf263 mentioned this pull request May 21, 2023

refactor: enable cases where return values are not needed in pipeline #4443

Merged

7 tasks
