-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(report): Replacing source_location
in github
report when scanning an image
#5999
Conversation
pkg/report/github/github.go
Outdated
SrcLocation: result.Target, | ||
if report.ArtifactType == "container_image" { | ||
// Replacing `source_location` by the image name and tag | ||
image_reference := strings.Join(report.Metadata.RepoTags, ", ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using report.ArtifactName
instead of report.Metadata.RepoTags
would miss the image tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use "RepoDigests" instead, as that included both the tag and digest? That would address the comment I just put on the PR discussion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to confirm using RepoDigests
would work I did some testing and it seems to be working fine.
Each time you scan your image with a new sha it will close all the previously open alerts for that image and only keep the latest version open.
But you're still able to see the previously closed alerts by simply using the is:closed
filter and searching for a specific manifest works fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mix of suggestion at #5999 (comment) and yours would give us the longest form possible storing all information:
<image_registry>/<image_name>:<tag>@sha256:<image_hash> - <filePath_to_package>
I tested it and although it's very long, filtering seems to work too (the image_hash
is fake).
Important
One big drawback of using DmitriyLewen@8431664 is that since each vulnerable package has its own source_location
set using its FilePath
, it means there is no more grouping in Github.
So, if you scan an image with 10 vulnerable packages, Github will show 10 different manifest instead of 1.
To me this defeats the purpose, and that's why I would advise to only use <image_registry>/<image_name>:<tag>@sha256:<image_hash>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, showing the package location (while having those disadvantages) will help with investigations about what actually needs to be fixed/updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @Maxim-Durand
Thanks for your work!
I left some comments. Take a look, please.
Can you also update tests, please
@@ -105,8 +105,17 @@ func (w Writer) Write(ctx context.Context, report types.Report) error { | |||
manifest.Name = string(result.Type) | |||
// show path for language-specific packages only | |||
if result.Class == types.ClassLangPkg { | |||
manifest.File = &File{ | |||
SrcLocation: result.Target, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aggregate some packages to 1 result -
trivy/pkg/fanal/applier/docker.go
Lines 263 to 291 in fb36c4e
func aggregate(detail *ftypes.ArtifactDetail) { | |
var apps []ftypes.Application | |
aggregatedApps := map[ftypes.LangType]*ftypes.Application{ | |
ftypes.PythonPkg: {Type: ftypes.PythonPkg}, | |
ftypes.CondaPkg: {Type: ftypes.CondaPkg}, | |
ftypes.GemSpec: {Type: ftypes.GemSpec}, | |
ftypes.NodePkg: {Type: ftypes.NodePkg}, | |
ftypes.Jar: {Type: ftypes.Jar}, | |
} | |
for _, app := range detail.Applications { | |
a, ok := aggregatedApps[app.Type] | |
if !ok { | |
apps = append(apps, app) | |
continue | |
} | |
a.Libraries = append(a.Libraries, app.Libraries...) | |
} | |
for _, app := range aggregatedApps { | |
if len(app.Libraries) > 0 { | |
apps = append(apps, *app) | |
} | |
} | |
// Overwrite Applications | |
detail.Applications = apps | |
} |
But we can find path from Results.Packages.FilePath:
{
"Target": "Python",
"Class": "lang-pkgs",
"Type": "python-pkg",
"Packages": [
{
"Name": "pip",
"Version": "23.2.1",
"Licenses": [
"MIT"
],
"Layer": {
"DiffID": "sha256:a7d483b6066e2ed6c9cd1d5890a93326e3f192ef6e537c046296053fa547a844"
},
"FilePath": "usr/local/lib/python3.12/site-packages/pip-23.2.1.dist-info/METADATA"
},
Maybe we just use FilePath for these packages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I agree with you it's an interesting change as it would make the manifest more precise.
Nonetheless, I think the full FilePath
would be too much to render in Github Dependency and would bloat the UI.
I can do a test with that change if you want and show the results here ?
Otherwise we could try to extract important values from the FilePath
and adding them to result.Target
, but that seems complex and prone to errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For other languages we already use filePath (filePath is stored in target
):
➜ 5999 trivy -d fs -f github . | jq .manifests
...
{
"foo/bar/requirements.txt": {
"name": "pip",
"file": {
"source_location": "foo/bar/requirements.txt"
},
"resolved": {
"click": {
"package_url": "pkg:pypi/click@8.0.0",
"relationship": "direct",
"scope": "runtime"
}
}
}
}
And we didn't get issues about UI.
I can do a test with that change if you want and show the results here ?
Yes please. If you have time, check it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have a few things to report:
-
As you pointed out trivy aggregates multiple packages in the same "App" (for instance
Python
orNode
) into one result.
The issue is that in the sbom format, thesource_location
is set at themanifest
level not at the package level (resolved
key in sbom format). Meaning, if we keep the default trivy aggregation then we have to use only one of all the packagesFilePath
. How to define that value is either arbitrary, or we need some sort of heuristic ? -
In terms of UI it isn't as bloated as I expected, so I think you're right and it's ok to use.
Thesource_location
used in the above screenshot is simply theFilePath
of the last package inresult.Packages
so it's not even correct. -
In the case trivy is scanning an image I think it's still pretty useless to replace the
manifest
'ssource_location
with thisFilePath
value and we should instead reference the image's name and tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meaning, if we keep the default trivy aggregation then we have to use only one of all the packages FilePath. How to define that value is either arbitrary, or we need some sort of heuristic ?
We have already seen cases where it is necessary to remove aggregation, but this requires large changes and we are not sure about it.
I created a small function to get filePath for each package - DmitriyLewen@8431664
Can you take a look?
In the case trivy is scanning an image I think it's still pretty useless to replace the manifest's source_location with this FilePath value and we should instead reference the image's name and tag.
This makes sense for UI. But what worries me is case where user only knows that image contains vulnerable package, but doesn't have filePath to find and update that package. It would be great to add filePath somehow.
e.g. we can use <image_name>:<tag> - <filePath>
format for manifest map key. Is it possible to see gsbom
file after uploading to GitHub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So user can only get package metadata when taking snapshot, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they use trivy then there's no issue as the user will now be able to find the file path for each package by parsing the JSON report before sending it to GitHub.
But if they use trivy-action then the report is sent automatically and the user doesn't have access to the original report hence he wouldn't be able to parse it.
Maybe if the user then export it from GitHub, the metadata tag with each package file path will still be there, I'll test that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I got from github export to SBOM for the same package as in #5999 (comment):
{
"SPDXID": "SPDXRef-pip-pillow-10.1.0",
"name": "pip:pillow",
"versionInfo": "10.1.0",
"downloadLocation": "NOASSERTION",
"filesAnalyzed": false,
"licenseConcluded": "CAL-1.0",
"supplier": "NOASSERTION",
"externalRefs": [
{
"referenceCategory": "PACKAGE-MANAGER",
"referenceLocator": "pkg:pypi/pillow@10.1.0",
"referenceType": "purl"
}
]
},
As you can see the metadata
tag doesn't show up in the exported SBOM.
Important
Conclusion: users of trivy-action
would currently have no way of finding out the filePath
of each package when using the github
format.
As this issue only involves trivy-action
and not trivy
itself anymore, I think we should continue this discussion once this PR is merged to find a solution for trivy-action
users at aquasecurity/trivy-action#286
(I would recommend to add an option for upload as an artifact so that user can easily fetch the report).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created aquasecurity/trivy-action#307 to make sure users of trivy-action
will know where to find the missing details from Github Dependency.
@DmitriyLewen and @knqyf263 do you agree with this compromise ?
If so, can you review this PR code ?
Co-authored-by: DmitriyLewen <91113035+DmitriyLewen@users.noreply.github.com>
This comment was marked as resolved.
This comment was marked as resolved.
source_location
in github
report when scanning an imagesource_location
in github
report when scanning an image
…ag as source_location
This will be very handy, to give an example I do nightly scans of "nightly" containers, and the most recently release of my project kube-audit-rest and it's not possible to tell them apart which means I'm not sure if something's a regression or not Given aquasecurity/trivy-action#279 (comment) I think it should also include the digest e.g. example.com/my-image:my-tag@sha256-fake-hash so that it's possible to tell which version of an image was scanned, as the image could have been retagged Below is an example screenshot showing how it's not possible to tell them apart currently |
…t for source_location of an image scan
source_location
in github
report when scanning an imagesource_location
in github
report when scanning an image
pkg/report/github/github.go
Outdated
image_with_hash := strings.Join(report.Metadata.RepoDigests, ", ") | ||
_, image_hash, found := strings.Cut(image_with_hash, "@") | ||
if found { | ||
image_reference += "@" + image_hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you doing this rather than just using RepoDigests directly? Mind adding a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RepoDigests
looks like <registry>/<image_name>@sha256:<image_hash>
.
Whereas RepoTag
looks like looks like <registry>/<image_name>:<image_tag>
.
So if only using RepoDigests
it means loosing the <image_tag>
.
Good question, I added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @Maxim-Durand
Sorry for delay.
Left comments. Take a look, please.
Co-authored-by: DmitriyLewen <91113035+DmitriyLewen@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Maxim-Duran
Thank you very much for your work, discussion, opinions and patience.
LGTM. I think we choosed the best solution for this problem.
@knqyf263 I approved this PR. Take a look, please.
Description
Relates to #5998
Fixes #6008
The goal is to change the
source_location
of themanifest
in SBOM report so it provides useful information when displayed in Github Dependency.The current solution is to use
<image_registry>/<image_name>:<tag>@sha256:<image_hash>
format.Before changes
As you can see instead of specifying the image scanned, it only show
Python
.After changes
The report now shows the image name and tag as the
source_location
for all packages.Checklist