
Command terminated by signal 9 due to OOM (Out of Memory) #1509

Closed
ammarkhanone opened this issue Sep 20, 2023 · 10 comments

@ammarkhanone

I have a Kubernetes pod in an EKS cluster with a 1 GB memory limit. I run grype to scan container images using a local cache DB with GRYPE_DB_AUTO_UPDATE=false. When I tried to scan a large image (1300+ MB) with the command /usr/bin/time -v grype "image-name" -c config.yaml, it failed with "Command terminated by signal 9"; searching for this error, I found it indicates an OOM (Out of Memory) kill.
How much memory does grype need to have reserved for this task?
Is there any way to limit grype's resource usage?
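For reference, the failing invocation (image name and config path are placeholders; GRYPE_DB_AUTO_UPDATE may instead be set in the pod spec) looks like this:

GRYPE_DB_AUTO_UPDATE=false /usr/bin/time -v grype "image-name" -c config.yaml

When wrapped with GNU time like this, the "Maximum resident set size" line in the output shows how much memory grype reached before being killed.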

@edgeinterrupts

Can the grype team help?

@willmurphyscode
Contributor

Hi @edgeinterrupts, thanks for the report! I don't know the answer off the top of my head, so I've added the needs-discussion label so that we'll discuss this the next time we read community issues as a team. In the meantime, you might be able to unblock yourself by experimenting with giving grype more memory. I don't think there's any configurable way to limit the memory use of grype, but that's an interesting feature request.
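For the Kubernetes setup described in the original report, "giving grype more memory" means raising the container's memory limit in the pod spec. A minimal sketch; the pod/container names and the 4Gi value are only illustrative, not a recommendation from the grype team:

apiVersion: v1
kind: Pod
metadata:
  name: grype-scan            # hypothetical name
spec:
  containers:
    - name: grype
      image: anchore/grype:latest
      resources:
        requests:
          memory: "1Gi"
        limits:
          memory: "4Gi"       # raised from the original 1Gi limit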

@spiffcs
Contributor

spiffcs commented Sep 26, 2023

👋 The grype config can also profile the process. If you can generate a pprof report for us, we can take a look.

.grype.yaml

dev:
  profile-mem: true
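
With that config in place, a run like the following should produce the memory profile (the image name is just the example used below; where the profile gets written is shown later in this thread):

grype nvidia/cuda:12.2.0-devel-ubuntu20.04 -c .grype.yaml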

I ran some large images locally, e.g. nvidia/cuda:12.2.0-devel-ubuntu20.04 (3.27 GB compressed).

The largest memory reading for that was 69.88 MB of in-use space, which leads me to believe the problem might not be correlated with image size.

If you're passing an image as the input to grype, as in the example above, it might be that a specific syft cataloger is causing the memory usage to spike. If we get the specific pprof report, we'll be able to narrow it down. =)

@edgeinterrupts

@spiffcs thanks, I will try to break things into two steps: 1) syft SBOM -> JSON, 2) feed the JSON file to grype.
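
A sketch of that two-step approach (the image name is a placeholder):

# 1) generate an SBOM with syft and save it as JSON
syft <image-name> -o json > sbom.json
# 2) scan the saved SBOM with grype
grype sbom:./sbom.json

Passing the SBOM via stdin (cat sbom.json | grype) should work as well.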

@tgerla added the question (Further information is requested) and changelog-ignore (Don't include this issue in the release changelog) labels and removed the needs-discussion label Nov 9, 2023
@tgerla
Contributor

tgerla commented Nov 9, 2023

Hi @edgeinterrupts, sounds good. If you run into more trouble, please let us know and we can take a look at the profiling report.

@tgerla closed this as completed Nov 9, 2023
@sfc-gh-dbasavin

Hi @spiffcs , is the profile-mem config option still supported?

dev:
  profile-mem: true

If so, where can I find the output?

For context, I am running grype on a 15 GB image, and the peak memory footprint I am observing is ~10 GB. I am trying to understand whether I can do anything to reduce grype's memory consumption.

@spiffcs
Contributor

spiffcs commented May 14, 2024

👋 hey @sfc-gh-dbasavin - what version of grype are you currently using?

I think the config may have changed a bit when grype upgraded its config values:

  log:
      quiet: false
      level: trace
      file: ""
  dev:
      profile: mem

If you run grype -vvv <your-image> you should see a log at the end showing the path it's written to:

2024/05/14 14:00:33 profile: memory profiling disabled, /var/folders/l0/_71m09512ss7lv9c64ldzld80000gn/T/profile1094353490/mem.pprof
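
Once you have that mem.pprof file, it can be inspected with Go's pprof tool; for example, a top report like the ones pasted below can be produced with (substitute the path from the log line above):

go tool pprof -top /path/to/mem.pprof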

The reason I asked what version you're using is that we recently merged a new syft (consumed by grype) that improved performance:
anchore/syft#2814

Are you using the latest version and still seeing an issue?

@sfc-gh-dbasavin

sfc-gh-dbasavin commented May 14, 2024

Hi @spiffcs, thank you very much for the quick response! I am using the most recent version I found on the Releases page:

> grype --version
grype 0.77.4

So yes, I am still seeing this issue with the latest version of grype.

Actually, the issue can be seen when running grype on a public MATLAB image (link), so feel free to check for yourself. Here is the command that results in ~10 GB of memory usage at peak:

grype docker:mathworks/matlab-deep-learning:latest
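
For anyone reproducing this, the peak memory can be captured the same way as in the original report by wrapping the command with GNU time on Linux (on macOS, /usr/bin/time -l reports the maximum resident set size instead of -v):

/usr/bin/time -v grype docker:mathworks/matlab-deep-learning:latest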

Edit: I was able to generate the PPROF report. Here it is for illustrative purposes:

Showing nodes accounting for 3.37GB, 94.08% of 3.58GB total
Dropped 429 nodes (cum <= 0.02GB)
Showing top 10 nodes out of 53
      flat  flat%   sum%        cum   cum%
    1.80GB 50.30% 50.30%     2.69GB 75.21%  github.com/anchore/stereoscope/pkg/tree.(*Tree).Copy
    0.89GB 24.91% 75.21%     0.89GB 24.91%  github.com/anchore/stereoscope/pkg/filetree/filenode.(*FileNode).Copy
    0.15GB  4.27% 79.48%     0.15GB  4.27%  github.com/anchore/stereoscope/pkg/tree.(*Tree).addNode
    0.11GB  3.06% 82.54%     0.55GB 15.47%  github.com/anchore/stereoscope/pkg/file.NewTarIndex.func1
    0.10GB  2.69% 85.23%     0.10GB  2.79%  github.com/anchore/stereoscope/pkg/file.NewMetadata
    0.08GB  2.36% 87.59%     0.44GB 12.41%  github.com/anchore/stereoscope/pkg/image.(*Layer).readStandardImageLayer.layerTarIndexer.func1
    0.08GB  2.20% 89.79%     0.08GB  2.22%  io.copyBuffer
    0.07GB  1.98% 91.77%     0.11GB  3.11%  github.com/anchore/stereoscope/pkg/filetree.(*index).Add
    0.05GB  1.37% 93.14%     0.20GB  5.63%  github.com/anchore/stereoscope/pkg/tree.(*Tree).AddChild
    0.03GB  0.94% 94.08%     0.03GB  0.94%  github.com/anchore/stereoscope/pkg/tree/node.IDSet.Add

I am not sure what to make of it.

@spiffcs
Contributor

spiffcs commented May 15, 2024

@sfc-gh-dbasavin - Thanks a million for the reproducible image and the report. We'll try to come back to grype performance tuning when we have a bit more time freed up from other tasks.

@sfc-gh-dbasavin

sfc-gh-dbasavin commented May 15, 2024

Hey @spiffcs, another update from me. I did more testing today, and it looks like one possible cause of high memory consumption is that grype/syft does something special when scanning images with a lot of files. To confirm, I created a test image with 3 million dummy files; the total image size was only 30 MB (a sketch of one way to build such an image is included at the end of this comment). Despite its small size, grype/syft took 10 minutes to scan that image, and peak memory consumption was 12 GB, as reported by my Mac's Activity Monitor. The pprof report for that test looks a bit different from the one above, though:

Showing nodes accounting for 4495.41MB, 92.63% of 4853.32MB total
Dropped 146 nodes (cum <= 24.27MB)
Showing top 10 nodes out of 40
      flat  flat%   sum%        cum   cum%
  887.49MB 18.29% 18.29%  4763.23MB 98.14%  github.com/anchore/stereoscope/pkg/file.NewTarIndex.func1
  745.66MB 15.36% 33.65%  3875.73MB 79.86%  github.com/anchore/stereoscope/pkg/image.(*Layer).readStandardImageLayer.layerTarIndexer.func1
  732.63MB 15.10% 48.75%   853.49MB 17.59%  github.com/anchore/stereoscope/pkg/file.NewMetadata
  655.19MB 13.50% 62.25%  1220.10MB 25.14%  github.com/anchore/stereoscope/pkg/filetree.(*index).Add
  548.29MB 11.30% 73.54%   548.29MB 11.30%  github.com/anchore/stereoscope/pkg/tree.(*Tree).addNode
  352.98MB  7.27% 80.82%   352.98MB  7.27%  github.com/anchore/stereoscope/pkg/file.IDSet.Add (inline)
  159.02MB  3.28% 84.09%   159.02MB  3.28%  github.com/anchore/stereoscope/pkg/image.(*FileCatalog).addImageReferences
  150.56MB  3.10% 87.19%   698.85MB 14.40%  github.com/anchore/stereoscope/pkg/tree.(*Tree).AddChild
  132.35MB  2.73% 89.92%   132.35MB  2.73%  github.com/anchore/stereoscope/pkg/file.NewIDSet (inline)
  131.23MB  2.70% 92.63%   131.23MB  2.70%  github.com/anchore/stereoscope/pkg/filetree/filenode.NewFile (inline)

I hope that information will be useful to you. We are trying to figure out how to make grype consume less memory (so it doesn't get killed by the Linux OOM killer), regardless of the size/shape of the target image, but no luck so far. If you have any ideas whatsoever, please do let me know.
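
The thread doesn't show how the test image was built; purely as an illustration, one way to create an image containing millions of empty dummy files would be something like:

# create 3,000,000 empty files spread across 3,000 directories
mkdir files
for d in $(seq 1 3000); do
  mkdir "files/$d"
  (cd "files/$d" && touch $(seq 1 1000))
done
# package the files into a minimal image
printf 'FROM scratch\nCOPY files/ /files/\n' > Dockerfile
docker build -t many-files-test .

This is only a sketch under assumed conditions; the exact layout and resulting image size will differ from the test described above.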
