Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/use go mod for go bom generation #42

Merged

Conversation

sullivtr
Copy link

@sullivtr sullivtr commented Mar 2, 2021

Description

  • Golang BOM generation should use go.mod for dependency analysis instead of using go.sum

Changes

  • Refactor createGoBom to parse dependencies from go.mod, and use go.sum to gather checksum data for each dependency listed in go.mod
  • Generated BOM no longer includes all components gathered from go.sum, but only includes components gathered from go.mod

Checklist

  • Unit tests
  • Documentation

References

https://github.com/golang/go/wiki/Modules#is-gosum-a-lock-file-why-does-gosum-include-information-for-module-versions-i-am-no-longer-using

No, go.sum is not a lock file. The go.mod files in a build provide enough information for 100% reproducible builds.

@sullivtr
Copy link
Author

sullivtr commented Mar 2, 2021

I just found one more edge case to handle when parsing the go.mod, so Ill close this for now and reopen once I have this fixed

@sullivtr sullivtr closed this Mar 2, 2021
@sullivtr sullivtr reopened this Mar 2, 2021
@prabhu
Copy link
Contributor

prabhu commented Mar 2, 2021

Hi @sullivtr Thank you for this significant contribution! If I understand correctly go.mod would have direct dependencies for a reproducible build where as go.sum would have all packages including direct and indirect dependencies right?

From a bill of materials perspective we can include all dependencies both direct and indirect but set the scope attribute to required for direct dependencies and optional for indirect ones. This way we would not break any existing go users who are looking for all the packages in the BoM file.

Would you be able to implement this change?

@sullivtr
Copy link
Author

sullivtr commented Mar 2, 2021

Hi @sullivtr Thank you for this significant contribution! If I understand correctly go.mod would have direct dependencies for a reproducible build where as go.sum would have all packages including direct and indirect dependencies right?

From a bill of materials perspective we can include all dependencies both direct and indirect but set the scope attribute to required for direct dependencies and optional for indirect ones. This way we would not break any existing go users who are looking for all the packages in the BoM file.

Would you be able to implement this change?

Hi @prabhu, thank you for the quick response on this.

There are a couple of reasons the go.sum should not be the primary source of truth when building the bill of materials.
(The reference material I linked above provides more detail). Ultimately, it is possible, and very common, for a go.sum file to contain checksums for modules that are no longer in use.

In part because go.sum is not a lock file, it retains cryptographic checksums for module versions even after you stop using a module or particular module version. This allows validation of the checksums if you later resume using something, which provides additional safety.

go.mod does in fact contain indirect dependencies which are typically added automatically during a go mod tidy or at build time. With that said, the BOM generation should use go.mod as its source of truth for dependency analysis, as go.sum does is not an accurate representation of transitive dependencies of a go project.

@prabhu
Copy link
Contributor

prabhu commented Mar 3, 2021

Thank you for the detailed clarification @sullivtr

What we need is a way to retain both the old and new modes for go since we are more likely to return less results for existing users. Also noticed that this PR is changing the package lock file format. Let me take this PR to a new branch and add some flags to support both the invocation modes along with a clear warning that the old mode might be returning legacy package versions that are not in use.

Once this is done I will share the PR with you for comments.

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

@prabhu this should be fine. I think as long as we have the ability to generate with go.mod as the source of truth, it certainly does not hurt to have an option to generate based on go.sum if one so desires. I am happy to help in the effort to make this happen.
In the meantime, I'll take a quick stab at adding a --gosum flag to generate golang BOMs using the current behavior.

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

@prabhu So I just went and added the flag (see my latest commit). The output looks like this when passing the --gosum flag:
image

Let me know if this works for you 😄

@prabhu
Copy link
Contributor

prabhu commented Mar 3, 2021

oh nice, love the warning! Instead of cli argument can we use env variable USE_GOSUM. The reason is that while invoking this tool via slscan it is a bit easier to pass env variables without having to change the default commands.

Last change is to revert package-lock.json change and submit it as a separate PR.

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

@prabhu done on a bun. I now added the USE_GOSUM env variable. I also reverted the changes on package lock json

utils.js Outdated
group = name;
}
const version = tmpA[1];
await getGoPkgComponent(gosumData, group, name, version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an expensive operation since it is iterating all the lines in the text and constructing the string look to determine the hash. Is it possible to build a dict of key and hash as the value once and refer it here directly?

utils.js Outdated
group = name;
}
const version = tmpA[3]
await getGoPkgComponent(gosumData, group, name, version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

index.js Outdated
if (DEBUG_MODE) {
console.log(`Parsing ${f}`);
}
goSumReader.push(fs.readFileSync(f, { encoding: "utf-8" }));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of holding the raw text perhaps here we build up an object with key being the full name and value containing the hash. This can be passed to parseGoModData directly.

@prabhu
Copy link
Contributor

prabhu commented Mar 3, 2021

I was about to click merge and then noticed few more issues. Any thoughts on the refactoring suggested?

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

I was about to click merge and then noticed few more issues. Any thoughts on the refactoring suggested?

I was on the fence about optimizing this since its not something typically ran in a performance dependent state (as far as I know). However, I can refactor for the optimizations.

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

@prabhu Just pushed some changes to use a dict for faster comparison with go.sum for hash. I did not benchmark the changes or anything, but theoretically, it should be much faster on larger datasets.

@prabhu
Copy link
Contributor

prabhu commented Mar 3, 2021

This is looking a lot better and is quite fast too! I just started testing with some of the sample go apps and got the below error for one of them. Happy to still merge this but make a release after a bit more testing actually. Or can wait if you can take a look at what is going on.

bin/cdxgen ~/go/opa -o /tmp/bom.json
Parsing /Users/prabhu/go/opa/go.sum
(node:71610) UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'replace' of undefined
    at createGoBom (/Volumes/Work/cdxgen/index.js:1063:33)
    at createXBom (/Volumes/Work/cdxgen/index.js:1503:18)

Problematic file

go.sum.txt
go.mod.txt

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

No worries. Ill look into this today, and see if I can repro

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

@prabhu were you running this with USE_GOSUM enabled? Or do you also have a go.mod file I can use alongside the go.sum you attached?

EDIT: NVM I was able to repo. Looking into this now

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

Okay, so I found the issue (whitespace/empty line at the end of the sum file). I just need to ensure there is a value before I try to process it. I will have this fix in before the end of the day 😄

@sullivtr
Copy link
Author

sullivtr commented Mar 3, 2021

Can you try now, @prabhu?

checkSums = data.split("\n")
for (let d in checkSums) {
const l = checkSums[d];
if (l.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! Not a biggie that this might not be effective if the line is empty but with a space character.

Copy link
Author

@sullivtr sullivtr Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in this case the go.sum should always be generated, and the /linendings are likely dependent on the machine the go.sum was generated on. In theory a go.sum shouldn't ever contain an empty line with added whitespace

@prabhu prabhu merged commit 47e738f into CycloneDX:master Mar 3, 2021
@prabhu
Copy link
Contributor

prabhu commented Mar 3, 2021

Thanks a ton @sullivtr ! Will do a bit more testing by setting FETCH_LICENSE=true and make a release tonight.

@sullivtr sullivtr deleted the fix/use-go-mod-for-go-bom-generation branch March 3, 2021 22:06
@dlorenc dlorenc mentioned this pull request Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants