
Grype scan command gets stuck #1731

Closed
githala-deepak opened this issue Feb 28, 2024 · 11 comments
Labels
bug Something isn't working

@githala-deepak

What happened:
The Grype command gets stuck, and after 3 hours I get the following error:
failed to load vulnerability db: unable to update vulnerability database: unable to download db: stream error: stream ID 1; INTERNAL_ERROR; received from peer
What you expected to happen:
The Grype scan should complete in under a minute.
How to reproduce it (as minimally and precisely as possible):
Occurs randomly, can't reproduce
Anything else we need to know?:

Environment:

  • Output of grype version: Application: grype
    Version: 0.74.5
    BuildDate: 2024-02-07T21:34:47Z
    GitCommit: 7478090
    GitDescription: v0.74.5
    Platform: linux/amd64
    GoVersion: go1.21.6
    Compiler: gc
    Syft Version: v0.104.0
    Supported DB Schema: 5

  • OS (e.g: cat /etc/os-release or similar): PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
    NAME="Debian GNU/Linux"
    VERSION_ID="11"
    VERSION="11 (bullseye)"
    VERSION_CODENAME=bullseye
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

@githala-deepak githala-deepak added the bug Something isn't working label Feb 28, 2024
@willmurphyscode
Contributor

Hi @githala-deepak,

Thanks for the report.

It sounds like grype is having trouble downloading its updated vulnerability DB, which it will try to do about once per day.

If you run grype db update -vvv, do you see any errors?

If you download the db directly, with a command like this:

curl -vvv -o /tmp/db.tar.gz 'https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-02-28T01:23:28Z_ea5efb77a61bf939917f.tar.gz'

Do you see any errors? Does the download succeed? I think you probably need to troubleshoot a network issue, and that curl command will start you in the right direction.
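
A minimal sketch of a few more db subcommands that can help isolate whether the problem is the local cache or the network (assuming a grype release recent enough to include them):

# show the location, build date, and validity of the locally cached DB
grype db status

# ask whether a newer DB is available, without downloading it
grype db check

# retry the update with verbose logging to surface any network errors
grype db update -vvv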

@hkadakia

hkadakia commented Mar 8, 2024

I am having a similar issue.

Syft: Summary of packages by <count> <type>
00:03:13 See mediaimage.syft.json for full package details
00:03:13     122 "go-module"
00:03:13       3 "python"
00:03:13     159 "rpm"
00:03:13 
00:03:13 Grype: scanning for vulnerabilities 
00:03:13 /root/.local/bin/grype -q -o json --config=default-ignore-rules.yaml  --only-fixed  sbom:mediaimage
00:08:06 Killed
SYFT_VER=0.92.0
GRYPE_VER=0.69.1

@mathrock

I have recently noticed that requests to fetch the listing.json file are occasionally super slow, as if there's a bad/slow backend in rotation. I suspect the same thing is happening when fetching the larger tar.gz SQLite DB files, causing the hangs that users are reporting.

Additionally it seems as though there is no retry/timeout logic on the db update process, so that may also be an area to look into improving.

Are the DB files served directly from S3, from an S3 bucket fronted by Cloudflare, or straight from Cloudflare R2?

Here are some examples from earlier today, in case it helps you dig into the logs on toolbox-data.anchore.io and diagnose the issue. The initial request to download the ~156KB listing.json file took over 30s!

The following requests were made around Tue, 12 Mar 2024 15:42:00 GMT

[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 10597

real    0m32.164s
user    0m0.060s
sys     0m0.071s

Then other requests are quick, which again suggests we're hitting a bad/slow backend in the rotation:

[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1039k      0 --:--:-- --:--:-- --:--:-- 1044k

real    0m0.160s
user    0m0.061s
sys     0m0.056s
[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0   939k      0 --:--:-- --:--:-- --:--:--  940k

real    0m0.177s
user    0m0.062s
sys     0m0.055s
[mathrock ~]$ time curl https://toolbox-data.anchore.io/grype/databases/listing.json -o /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  159k  100  159k    0     0  1004k      0 --:--:-- --:--:-- --:--:-- 1011k

real    0m0.166s
user    0m0.049s
sys     0m0.070s

@willmurphyscode
Contributor

Thanks for the detailed info @mathrock! I've also seen grype db updates be slow, but haven't yet figured out why. We're investigating on our end.

@willmurphyscode
Contributor

Hi all! Thanks for reporting this.

We've changed some configs with our CDN to try to fix the issue. Since it's only intermittent, it's hard to know for sure that it's fixed, so please let us know if you continue to see any slowness or hangs with grype database downloads.

We'll also look into putting in some timeouts in grype, since that should prevent the client from hanging regardless of the behavior of the CDN / database download.

I'll leave this issue open while we continue to monitor, and until we have client-side timeouts merged.
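
In the meantime, one possible workaround is to bound the download yourself and hand the archive to grype; a rough sketch, assuming the archive URL is taken from listing.json (the URL below is the one quoted earlier in this thread and goes stale as new DBs are published daily):

# fetch the listing with a hard time limit so a stalled backend can't hang the job
curl --max-time 60 -o /tmp/listing.json https://toolbox-data.anchore.io/grype/databases/listing.json

# download a DB archive from the listing, again with a time limit and a few retries
curl --max-time 300 --retry 3 -o /tmp/db.tar.gz 'https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-02-28T01:23:28Z_ea5efb77a61bf939917f.tar.gz'

# import the archive into grype's cache so the scan itself makes no network calls
grype db import /tmp/db.tar.gz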

@jcote-tc

jcote-tc commented Apr 3, 2024

I'm having the issue today:

[0000] DEBUG checking for available database updates
[0000] DEBUG found database update candidate: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-04-03T01:24:31Z_1712118027.tar.gz)
[0000] DEBUG cannot find existing metadata, using update...
[0000] DEBUG database update available: Listing(url=https://toolbox-data.anchore.io/grype/databases/vulnerability-db_v5_2024-04-03T01:24:31Z_1712118027.tar.gz)
[0000]  INFO downloading new vulnerability DB

It's stuck on the last line: "[0000]  INFO downloading new vulnerability DB"

@jcote-tc

jcote-tc commented Apr 3, 2024

FYI: It fixed itself after a few hours.

@spiffcs
Contributor

spiffcs commented Apr 4, 2024

Hey everyone! Check out the latest release of grype, which now ships with default timeouts (user-configurable as well).

PR that was merged: #1777

We're currently looking into why the CDN that hosts the listing and db files ever gets into a state where it connects but fails to transfer the bytes.
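
For anyone who wants to tighten or relax those defaults, grype's config keys can also be set through GRYPE_* environment variables; the specific names below are an assumption based on the timeout settings added in #1777 and may differ, so verify them against the release notes for your version:

# key names are assumptions; check the release notes / grype config docs for the exact spelling
export GRYPE_DB_UPDATE_AVAILABLE_TIMEOUT=30s
export GRYPE_DB_UPDATE_DOWNLOAD_TIMEOUT=300s
grype db update -vvv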

@Fajkowsky

@spiffcs Any update on why the CDN is acting so slow?

@willmurphyscode
Contributor

Hi @Fajkowsky, can you tell us a bit about when you're seeing this slowness?

The only deterministic bit of slowness we've found happens right after a new Grype DB is published: every Grype invocation shortly after publication downloads the new DB, but once that initial burst of traffic passes, a large percentage of Grype clients have the new DB cached and the download traffic drops off sharply. We're looking at ways to put some jitter in there.

So when you see the slow downloads, is it shortly after 5AM UTC or so? If so, we expect this situation to improve when we introduce some jitter/staggering in when different Grype installs download the new DB.

If it's at a different time, we would really appreciate some more details if you don't mind sharing them, like what time the slow runs were at and what geographic region they're in. (Feel free to join the community slack and DM one of us if you'd rather not post that information publicly.)
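
Until that staggering exists server-side, a scheduled CI job can approximate it client-side; a minimal sketch (the 10-minute window is arbitrary, and sbom:mediaimage just stands in for whatever target you scan):

# spread a fleet of nightly jobs over ~10 minutes so they don't all hit the CDN
# at the moment the new DB is published
sleep $(( RANDOM % 600 ))
grype db update

# then run the scan as usual
grype -q -o json sbom:mediaimage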

@Fajkowsky

Hi @willmurphyscode,

It's happening again today.
curl -o listing.json https://toolbox-data.anchore.io/grype/databases/listing.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  152k  100  152k    0     0   4974      0  0:00:31  0:00:31 --:--:--  8563

The transfer speed is so low that downloading the listing JSON file took 31 seconds.
