fix: Go main module pseudo version matching: turn off by default #1894
Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
Thanks for the insights here! Are there any specific module/version/CVE data points that you have? I'd like to try and capture these as new data that we use continually in the quality gates for vunnel/grype-db/grype (happy to help).
Great idea. Here's one handful of examples: https://github.com/search?q=repo:wolfi-dev/advisories+%22which+means+the+installed+version+was+misidentified%22&type=code
I could certainly be wrong here, but for what it's worth, I think the right behavior is to compare pseudo-versions only with other pseudo-versions, and perhaps to use the timestamp portion when doing so. What I mean by this is as follows: I've seen Go module versions be one of 3 distinct things (although the require syntax docs only mention 2):
Vulnerabilities are reported against the best version someone can find, and people use modules from different sources including, say a
The |
Hm, I think it's a little simpler than that... In general, I think pseudo versions are fine to use in vulnerability matching. I don't believe we need to create a new kind of version comparison logic just for them.

The specific issue here is that there's a case where Syft creates new pseudo versions on the fly, that is, without sourcing them from data found in the actual Go binary. It only does this for Go binary main modules, and only when no semver or pseudo version has already been detected for the module. This logic is here:

So from Grype's perspective, the sign that a false positive is likely to happen is when a Go main module has a version prefixed by
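To make the heuristic concrete, here is a minimal sketch of the check described above, using the `v0.0.0-` prefix mentioned in the PR description. The `GoModule` type and `LikelyInventedVersion` function are hypothetical names for illustration only; they are not Syft's or Grype's actual types or API.

```go
package main

import (
	"fmt"
	"strings"
)

// GoModule is a hypothetical, simplified view of a Go package found in a
// binary. It is not Syft's or Grype's actual package type.
type GoModule struct {
	Name         string
	Version      string
	IsMainModule bool
}

// inventedVersionPrefix is the prefix this thread associates with versions
// that were manufactured rather than read from the binary.
const inventedVersionPrefix = "v0.0.0-"

// LikelyInventedVersion reports whether a module's version looks like one
// that was made up for a main module. It is only a heuristic: a non-main
// module with the same prefix is left alone.
func LikelyInventedVersion(m GoModule) bool {
	return m.IsMainModule && strings.HasPrefix(m.Version, inventedVersionPrefix)
}

func main() {
	// Made-up example modules, not real scan results.
	mods := []GoModule{
		{Name: "example.com/app", Version: "v0.0.0-20240101120000-abcdefabcdef", IsMainModule: true},
		{Name: "example.com/app", Version: "v1.4.2", IsMainModule: true},
		{Name: "example.com/dep", Version: "v0.0.0-20230615093000-0123456789ab", IsMainModule: false},
	}
	for _, m := range mods {
		fmt.Printf("%s %s main=%t likelyInvented=%t\n",
			m.Name, m.Version, m.IsMainModule, LikelyInventedVersion(m))
	}
}
```

Note that the prefix alone cannot distinguish an invented version from a genuine pseudo version of a main module, which is why this remains a heuristic rather than a definitive signal.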
I think the change in this PR will help recover Grype's accuracy the fastest. But maybe we could treat this as an incremental improvement, and do more to help after? 🤔 For example, what if Syft were more explicit in its output data that it had invented a module version rather than detected it? This would let us adjust the logic in Grype to be more precise, where it could turn off matching (by default) only for the case where the
Throwing in a 👍 to the @luhring comment of:
I'm pro changes that make our tool better by default for people to use. If we merge this PR, I can file an issue summarizing the other problems we want to fix going forward surrounding Syft's hallucinated Golang versions and its ability to communicate to scanners whether the version in the package is a best guess or actually read from the metadata.
Thanks @spiffcs!
About 6 weeks ago, I opened #1797 in an effort to reduce false negatives for the main modules of Go binaries. Before that PR, Grype's previous behavior was to avoid vulnerability matching for the main module altogether if the detected module version was prefixed with `v0.0.0-`. Grype's reasoning for this had been that Syft will manufacture pseudo versions that start with `v0.0.0-` in the case where it's not able to find a real version in the Go binary itself.

By making the change in the PR linked above, we indeed saw some reduction in false negatives 🎉, which was great. We weren't sure about the larger impact of false positives. I only speculated about the potential effects:
The bad news: After seeing this change released and used on a broader set of images, it looks like the net effect is that Grype's F1 score has worsened considerably more than we anticipated. We've been able to recover some recall, but at a painful hit to precision.
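As a rough illustration of how recovering recall can still lower F1, here is a small sketch that computes precision, recall, and F1 from match counts. The counts in `main` are invented for the example and are not measurements from Grype's quality gate.

```go
package main

import "fmt"

// Precision is the share of reported matches that are true positives.
func Precision(tp, fp int) float64 {
	if tp+fp == 0 {
		return 0
	}
	return float64(tp) / float64(tp+fp)
}

// Recall is the share of real vulnerabilities that were reported.
func Recall(tp, fn int) float64 {
	if tp+fn == 0 {
		return 0
	}
	return float64(tp) / float64(tp+fn)
}

// F1 is the harmonic mean of precision and recall.
func F1(tp, fp, fn int) float64 {
	p, r := Precision(tp, fp), Recall(tp, fn)
	if p+r == 0 {
		return 0
	}
	return 2 * p * r / (p + r)
}

func main() {
	// Hypothetical counts (true positives, false positives, false negatives).
	// "before": main module pseudo version matching disabled.
	// "after": matching enabled, so fewer misses but many more false alarms.
	fmt.Printf("before: precision=%.3f recall=%.3f F1=%.3f\n",
		Precision(80, 10), Recall(80, 20), F1(80, 10, 20))
	fmt.Printf("after:  precision=%.3f recall=%.3f F1=%.3f\n",
		Precision(90, 60), Recall(90, 10), F1(90, 60, 10))
}
```

With these made-up counts, recall rises from 0.8 to 0.9 while precision falls from roughly 0.89 to 0.6, and F1 drops as a result. That is the shape of the tradeoff described above, not its actual magnitude.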
The good news: I saw that after my original PR was merged, @spiffcs opened #1816, which made this new behavior configurable and defaulted to on. IMHO, that was brilliant, and in hindsight I probably should've added that in my original PR, too. 😞
At scale, this now-configurable behavior is working very much like CPE matching: it's great for scenarios where the user is willing to pay a high FP cost for the chance at not missing any matches, but on average it adds more noise than signal.
So with this PR, I suggest we adjust the noisy config option to work just like language package CPE matching: off by default but configurable to on, such that consumers can get more sane behavior by default, but they can crank up the sensitivity knob if they know what they're doing.
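A minimal sketch of what "off by default but configurable to on" could look like follows. The `GolangMatcherConfig` type, its field name, and `ShouldMatch` are hypothetical and chosen for illustration; they are not taken from Grype's actual configuration or code.

```go
package main

import (
	"fmt"
	"strings"
)

// GolangMatcherConfig is a hypothetical config shape for this sketch.
type GolangMatcherConfig struct {
	// AllowMainModulePseudoVersionComparison opts in to matching main
	// modules whose version starts with "v0.0.0-". The zero value (false)
	// is the proposed default.
	AllowMainModulePseudoVersionComparison bool
}

// DefaultGolangMatcherConfig returns the proposed default: the noisy
// comparison is turned off unless the user opts in.
func DefaultGolangMatcherConfig() GolangMatcherConfig {
	return GolangMatcherConfig{AllowMainModulePseudoVersionComparison: false}
}

// ShouldMatch decides whether vulnerability matching runs for a module.
// Only main modules with a "v0.0.0-" version are affected by the option.
func ShouldMatch(cfg GolangMatcherConfig, isMainModule bool, version string) bool {
	if !isMainModule || !strings.HasPrefix(version, "v0.0.0-") {
		return true
	}
	return cfg.AllowMainModulePseudoVersionComparison
}

func main() {
	version := "v0.0.0-20240101120000-abcdefabcdef" // made-up example version

	def := DefaultGolangMatcherConfig()
	fmt.Println("default config matches main module:", ShouldMatch(def, true, version))

	optIn := GolangMatcherConfig{AllowMainModulePseudoVersionComparison: true}
	fmt.Println("opt-in config matches main module: ", ShouldMatch(optIn, true, version))
}
```

Making the zero value the conservative choice means a consumer who never touches the option gets the higher-precision behavior, while anyone who wants maximum sensitivity has to ask for it explicitly.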
Curious for your thoughts!