Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan local go mod licenses for golang packages #1645

Merged
merged 21 commits into from
Mar 23, 2023

Conversation

deitch
Copy link
Contributor

@deitch deitch commented Mar 3, 2023

Partial replacement for #1630

After discussions with @kzantow , it was decided to split #1630 into 2 PRs: one that scans local $GOPATH/mod, which is guaranteed to be safe, and another (after) that will take a CLI option to enable pulling down the golang package from the Internet via the GOPROXY, if it is not found in local GOPATH.

This is the first part: look for the go package in GOPATH/mod, and thus has no CLI changes, and does not reach out to the Internet.

@deitch
Copy link
Contributor Author

deitch commented Mar 3, 2023

Note: I did not remove the remote package retrieval code, as it has been tested and works. Instead, I put in a const disableRemotePackage, which we can just remove when the future PR opens.

Signed-off-by: Avi Deitcher <avi@deitcher.net>
@deitch
Copy link
Contributor Author

deitch commented Mar 7, 2023

CI is clean. What else can I do to help move this along?

@kzantow kzantow changed the title support for scanning license files in golang packages support for scanning local go mod licenses for golang packages Mar 17, 2023
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Copy link
Contributor

@spiffcs spiffcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred resolver stuff for finding resolver outside of the source looks good to me. Thanks for updating this from the fs pattern to the internal resolver stuff.

I just had some minor comments about licenses being nil vs empty and if the check for splitting on the go.mod ever runs into cases where we get 2 parts for a valid module name

syft/pkg/cataloger/golang/package.go Show resolved Hide resolved
syft/pkg/cataloger/golang/licenses.go Outdated Show resolved Hide resolved
@kzantow kzantow changed the title support for scanning local go mod licenses for golang packages Scan local go mod licenses for golang packages Mar 17, 2023
@deitch
Copy link
Contributor Author

deitch commented Mar 20, 2023

Thanks for pushing the rewrite onto it.

How does this work now, then? As far as I can tell, you pushed in a few key changes:

  • enabling search for licenses outside of current tree using env var
  • resolver to find those licenses
  • nice test structure (I like what you did there)

Is that about correct? So would golang license searching happen only if the env var is set SYFT_PACKAGE_CATALOGER_GOLANG_SEARCH_LOCAL_GO_MOD_LICENSES=true?

And if so, would a similar env var enable network search, e.g. SYFT_PACKAGE_CATALOGER_GOLANG_SEARCH_INTERNET_GO_MOD_LICENSES=true?

Also, why is it LOCAL_GO_MOD_LICENSES? The licenses aren't in go.mod, they are in the actual packages?

And does it work when scanning a go binary, rather than source? (in which case the GO_MOD in the env var is confusing)

@kzantow
Copy link
Contributor

kzantow commented Mar 20, 2023

How does this work now, then? As far as I can tell, you pushed in a few key changes:

  • enabling search for licenses outside of current tree using env var
  • resolver to find those licenses
  • nice test structure (I like what you did there)

Yes, exactly.

So would golang license searching happen only if the env var is set SYFT_PACKAGE_CATALOGER_GOLANG_SEARCH_LOCAL_GO_MOD_LICENSES=true?

Also correct.

And if so, would a similar env var enable network search, e.g. SYFT_PACKAGE_CATALOGER_GOLANG_SEARCH_INTERNET_GO_MOD_LICENSES=true?

I think that's the idea for the follow-on network search PR, something like that.

Also, why is it LOCAL_GO_MOD_LICENSES? The licenses aren't in go.mod, they are in the actual packages?

It is LOCAL_GO_MOD_LICENSES because it is enabling scanning outside the scan target on the machine running Syft -- e.g. if I run syft some-container-with-go-binaries, Syft will now by default try to find licenses in the container but it will not look at my local go mod cache. If I run SYFT_PACKAGE_CATALOGER_GOLANG_SEARCH_INTERNET_GO_MOD_LICENSES=true syft some-container-with-go-binaries it will look first in the container, and then fall back to looking for go mod cache on the local machine running Syft itself.

And does it work when scanning a go binary, rather than source? (in which case the GO_MOD in the env var is confusing)

It works for scanning both source and go binaries. GO_MOD is the technology, not the file. I had GO_MOD_CACHE but the env var is already so unwieldy, I just left it as GO_MOD... happy for naming suggestions here.

I hope that all makes sense!

Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
@kzantow
Copy link
Contributor

kzantow commented Mar 20, 2023

Based on feedback, I've updated the env var to: SYFT_GOLANG_SEARCH_LOCAL_MOD_CACHE_LICENSES is that more clear @deitch ?

@deitch
Copy link
Contributor Author

deitch commented Mar 20, 2023

Yup. All clear. I would request (as a user) that we document that capabilities somewhere? Whether CLI flag with --help explaining it, or docs in the repo (or both?). I guarantee I will hunt to find it in 2 months, and I am thinking the last thing maintainers need is people opening issues or slack messages asking how to do this. 😄

@kzantow
Copy link
Contributor

kzantow commented Mar 20, 2023

@deitch good call, I'll add it to the configuration section of the documentation. NOTE: you can always dump the application config by running syft -vv, and you'd see the golang section, like:

golang:
  search-local-mod-cache-licenses: false

The environment variables are always prefixed by SYFT_ and then match the full path in the config file with underscores and capitalized, so as noted above golang.search-local-mod-cache-licenses translates to SYFT_GOLANG_SEARCH_LOCAL_MOD_CACHE_LICENSES

Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
if err != nil {
log.Debug("unable to determine user home dir: %v", err)
}
goPath = path.Join(homeDir, "go")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like we should use the value of homeDir if error != nil... should we return an error instead (and remove the log statement)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above comment. I don't think we should try and recreate the GOPATH. Either it is set and we look there, or it is not, and we do not even try.

If we get requests later that lots of people want us to determine this automatically, we always can add it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think I like that better. If goPath is empty, then return don't return a resolver to search against at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the go docs, the default gopath is:

The GOPATH environment variable specifies the location of your workspace. It defaults to a directory named go inside your home directory, so $HOME/go on Unix, $home/go on Plan 9, and %USERPROFILE%\go (usually C:\Users\YourName\go) on Windows.

Why wouldn't we default to this same location if GOPATH is not explicitly set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, you are right. I was thinking that the logic of, "if you set this flag, we will look wherever GOPATH points, trusting that you know where you want us to go; if not, we will not check." But that is not aligned with the go standard, as @kzantow pointed out.

That leaves us with two possibilities, based on two assumptions:

  • we assume that the user only wants us to go where they explicitly declare; therefore if GOPATH is not set, we do not go anywhere.
  • we assume that the user wants to follow the go standard, whether GOPATH is implicit or explicit; therefore if GOPATH is not set, we replicate the logic

Much as I like the first, I think the second is what people will expect. Requiring them to set the env var to enable it is sufficient for them to say, "go where GOPATH takes you, whether explicit or implicit." If you don't want us to go there, do not enable it.

I was hoping the go command's resolution of GOPATH was a library we could just hook into, but not such luck; it is a private func inside src/cmd/go/internal/cfg/, i.e. internal.

Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deitch and @kzantow nice work!

Future work for the config: as more parameters are added that equate to relatively the same thing we might want to think about "single switch" options that enable all options of that kind.

@kzantow kzantow merged commit 9fd5322 into anchore:main Mar 23, 2023
@deitch
Copy link
Contributor Author

deitch commented Mar 23, 2023

nice work!

I just got the train rolling; I think that @kzantow did all of the real heavy lifting here.

Either way, I am happy to see this go in. Network-based should be straightforward after that, methinks?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants