Unable to identify license on Golang packages imported by URL #1056

Closed
lucasgonze opened this issue Jun 21, 2022 · 7 comments · Fixed by #1630
Labels
enhancement New feature or request license relating to software licensing

Comments

@lucasgonze

What happened:

Given a Golang package that imports from a URL, licenseConcluded in the associated SPDX file is "NONE", even though the remote package has a clearly identifiable license file.

What you expected to happen:

The remote repo (https://github.com/aws/aws-sdk-go) has a LICENSE.txt. I expected that to be dereferenced and used.

How to reproduce it (as minimally and precisely as possible):

I have a golang file:
./test.go

Containing a remote import:

package test
import (
        awsSDK "github.com/aws/aws-sdk-go/aws"
)

I process the file with the syft CLI and generate SPDX output. The relevant part of the generated SPDX is:


 "packages": [
  {
   "SPDXID": "SPDXRef-320fa0ac8fde1360",
   "name": "github.com/aws/aws-sdk-go",
   "licenseConcluded": "NONE",

Anything else we need to know?:

Environment:

  • Output of syft version:

syft 0.46.3

  • OS (e.g: cat /etc/os-release or similar):

OS X 11.6


@lucasgonze lucasgonze added the bug Something isn't working label Jun 21, 2022
@spiffcs
Contributor

spiffcs commented Jun 22, 2022

Thanks for filing the issue @lucasgonze. We're currently taking a look at making license detection consistent for all outputs of syft. I'll add this issue in with the group of other license bugs that have been filed.

@wagoodman wagoodman added the license relating to software licensing label Jun 23, 2022
@deitch
Contributor

deitch commented Jan 31, 2023

I have been struggling with a similar issue.

My sample main.go:

package main

import (
        "fmt"
        "github.com/eriknordmark/ipinfo"
)

func main() {
        fmt.Println(ipinfo.IPInfo{})
}

and the relevant part of spdx-json:

  {
   "name": "github.com/eriknordmark/ipinfo",
   "SPDXID": "SPDXRef-Package-go-module-github.com-eriknordmark-ipinfo-fd9b9f9ec75774ef",
   "versionInfo": "v0.0.0-20190220084921-7ee0839158f9",
   "downloadLocation": "NOASSERTION",
   "checksums": [
    {
     "algorithm": "SHA256",
     "checksumValue": "0e2222f1766f677ea05598fe650e31313fd9927289035c9162ff32aa2e8371b9"
    }
   ],
   "sourceInfo": "acquired package info from go module information: /tmp/spdxtest/dtest",
   "licenseConcluded": "NONE",

The license file exists at https://github.com/eriknordmark/ipinfo/blob/master/LICENSE, just as in the original report.

So what am I missing? More to the point, how can I help fix this?

@deitch
Contributor

deitch commented Feb 1, 2023

I just spent a while going through the syft code. I gained a bit of an understanding of how it works.

As far as I can tell, there is no stage wherein it gets the golang pkg license. Is that correct? The package info is added here, which includes no information about the license.

I can see a few ways to get the license:

  • Inside the golang cataloger: look at the package as downloaded onto the system, e.g. $GOPATH/pkg/mod/<pkg>@<version>/<license>. This only works when scanning the system that downloaded the package, so it is highly limited, but it would work. It is not done today, and it would have to live inside the golang cataloger.
  • Inside the golang cataloger: use the package URL to retrieve the license from the Internet. This would work fairly reliably even without the source, but it would also have to live inside the golang cataloger, and it is not there today.
  • Outside the golang cataloger: have some standard way for specific catalogers (like golang) to report, "this is my package URL and commit/version", so that the generic cataloger knows, "if I found that, I can get the license". I didn't see this either.

Is my analysis correct @spiffcs ? Is there anything in progress? If not, how can we assist?

@deitch
Contributor

deitch commented Feb 7, 2023

Having resolved the apk package issue with #1540 with the help of @kzantow , I want to try and submit a PR for this as well, but need some help.

Send me some pointers based on the previous comment, and I will get a PR started.

@deitch
Contributor

deitch commented Feb 9, 2023

OK, so it is a fair analysis. @spiffcs I am willing to take a stab at this, if you (or someone else on the team) can point me to which of the approaches above to take.

@kzantow
Contributor

kzantow commented Feb 9, 2023

Hey @deitch -- one thing we want to be very cautious of is connecting to external (network) sources. We have users of Syft on air-gapped environments where this won't work. We are not opposed to adding things that reach out to the internet, but need to make sure these are not enabled by default and/or can be disabled.

We are currently searching for Node package licenses in node_modules if they are downloaded -- much like the suggestion to look in $GOPATH/pkg/mod/<pkg>@<version>/<license>, so I don't think anyone would object to using that approach if we are able to find downloaded files.

That said, are there any go commands that might get this information for you? We are also looking at improving build-time information by shelling out to ecosystem tools like go and mvn. Maybe this could be a start?

@deitch
Contributor

deitch commented Feb 10, 2023

I'll dig back into the tools, but I don't recall any that provides it offhand.

The challenge with relying on the pkg directory is that it only exists on the build machine. If you scan anywhere else (a container image is the obvious example), none of it exists.

I agree about reaching out to the network as a problem for airgapped environments, or anyone who doesn't want to be connecting outwards. My thought was to do it as an option. Something like

syft packages --retrieve-licenses ./path/to/root

Or retrieve-licenses=true where the default is false.

That says, "if you come across a package (of any kind) whose license might be retrievable from the network, attempt to do so if the option is true."

Would that work?
