Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Go main module pseudo version matching: turn off by default #1894

Merged
merged 1 commit into from
May 30, 2024

Conversation

luhring
Copy link
Contributor

@luhring luhring commented May 28, 2024

About 6 weeks ago, I opened #1797 in an effort to reduce false negatives for the main modules of Go binaries. Before that PR, Grype's previous behavior was to avoid vulnerability matching for the main module altogether if the detected module version was prefixed with v0.0.0-. Grype's reasoning for this had been that Syft will manufacture pseudo versions that start with v0.0.0- in the case where it's not able to find a real version in the Go binary itself.

By making the change in the PR linked above, we indeed saw some reduction in false negatives 🎉 , which was great. We weren't sure about the larger impact of false positives. I only speculated about the potential effects:

My reason for expecting them is that I can dream up edge cases, like when the GHSA fixed version is v3.0.1, but the computed pseudoversion would be v0.0.0-<some commit that's later than v3.0.1>, and so Grype would show the main module as vulnerable when it's not.

The bad news: After seeing this change released and used on a broader set of images, it looks like this net effect is that Grype's F1 score is worsened considerably more than we anticipated. We've been able to recover some recall, but at a painful hit to precision.

The good news: I saw that after my original PR was merged, @spiffcs opened #1816, which made this new behavior configurable and defaulted to on. IMHO, that was brilliant, and in hindsight I probably should've added that in my original PR, too. 😞

At scale, this now-configurable behavior is working very much like CPE matching: it's great for scenarios where the user is willing to pay a high FP cost for the chance at not missing any matches, but on average it adds more noise than signal.

So with this PR, I suggest we adjust the noisy config option to work just like language package CPE matching: off by default but configurable to on, such that consumers can get more sane behavior by default, but they can crank up the sensitivity knob if they know what they're doing.

Curious for your thoughts!

Signed-off-by: Dan Luhring <dluhring@chainguard.dev>
@wagoodman
Copy link
Contributor

Thanks for the insights here! Are there any specific modules/version/CVE data points that you have? I'd like to try and capture these as new data that we use continually in the quality gates for vunnel/grype-db/grype (happy to help).

@luhring
Copy link
Contributor Author

luhring commented May 28, 2024

Thanks for the insights here! Are there any specific modules/version/CVE data points that you have? I'd like to try and capture these as new data that we use continually in the quality gates for vunnel/grype-db/grype (happy to help).

Great idea. Here's one handful of examples: https://github.com/search?q=repo:wolfi-dev/advisories+%22which+means+the+installed+version+was+misidentified%22&type=code

@kzantow
Copy link
Contributor

kzantow commented May 28, 2024

I could certainly be wrong here, but for what it's worth, I think the right behavior is to consider only comparing pseudo-versions with pseudo-versions, but maybe consider using the timestamp portion. What I mean by this is as follows: I've seen go module versions be one of 3 distinct things (although the require syntax docs only mention 2):

  • a complete pseudo-version: v0.0.0-<date>-<hash>
  • a speculated pseudo-version: v3.2.4-<date>-<hash>
  • a semver-like version: v2.5.6

Vulnerabilities are reported against the best version someone can find, and people use modules from different sources including, say a main branch, which means we will potentially see any combination of these versions on both sides of the matching equation.
If we have a pseudo-version with all zeroes, there's no direct relation to a real semver-like version and we really just can't compare them -- it's the proverbial apples and oranges. If we have a speculated pseudo-version, there's a relation to both and we could potentially compare to either but comparing to a full-zero pseudo-version again isn't going to be especially fruitful. So my proposal would be as follows:

if <both versions have semver-like numbering>
  compare semver section
else if <both versions have date-like numbering>
  compare date section
else
  don't compare

The <date> portion is supposed to be repository revision date, which admittedly is not a substitute for proper version lineage. But I believe it's the best we can do without having more context about the hash info. Or I could be all wrong!

@luhring
Copy link
Contributor Author

luhring commented May 28, 2024

Hm, I think it's a little simpler than that... In general, I think pseudo versions are fine to use in vulnerability matching. I don't believe we need to create a new kind of version comparison logic just for them.

The specific issue here is that there's a case where Syft is creating new pseudo versions on the fly, that is, without sourcing them from data found in the actual Go binary. It only does this for Go binary main modules, and only when there's no detected semver or pseudo version for the module already.

This logic is here:

https://github.com/anchore/syft/blob/ac34808b9c55bb274b1205f9b5d9cf495239577d/syft/pkg/cataloger/golang/parse_go_binary.go#L196-L207

So from Grype's perspective, the sign that a false positive is likely to happen is when a Go main module has a version prefixed by v0.0.0-. This doesn't guarantee a false positive, or even guarantee that Syft crafted this value on the fly, but it seems to produce false positives more often then not, due to the (seemingly common) case where Syft has gotten the version value completely wrong (e.g. v0.0.0... instead of v3.4.2...).

@luhring
Copy link
Contributor Author

luhring commented May 28, 2024

I think the change in this PR will help recover Grype's accuracy the fastest. But maybe we could treat this as an incremental improvement, and do more to help after? 🤔

For example, what if Syft was more explicit in its output data that it had invented a module version rather than detected it? This would let us adjust the logic in Grype to be more precise, where it could turn off matching (by default) only for the case where the v0.0.0- version was known to be made up rather than detected?

@spiffcs
Copy link
Contributor

spiffcs commented May 30, 2024

Throwing in a 👍 to the @luhring comment of:

I think the change in this PR will help recover Grype's accuracy the fastest.

I'm pro changes that make our tool by default better for people to use. If we merge this PR I can file an issue summarizing the other problems we want to fix going forward surrounding syft's hallucinated Golang versions and it's ability to communicate to scanners whether the version in the package is a best guess or actually read from the metadata

@spiffcs spiffcs merged commit 316c0e9 into anchore:main May 30, 2024
10 checks passed
@luhring
Copy link
Contributor Author

luhring commented May 30, 2024

Thanks @spiffcs!

@wagoodman wagoodman added the bug Something isn't working label Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants