Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse license from the pom.xml if not contained in the manifest #2115

Merged
merged 9 commits into from
Oct 10, 2023

Conversation

coheigea
Copy link
Contributor

@coheigea coheigea commented Sep 11, 2023

Right now, the license is only found for a Maven artifact if it's an OSGi Bundle with "Bundle-License" declared in the manifest. This PR adds functionality to also check for the license information in the pom.xml.

Running syft before on https://repo1.maven.org/maven2/eu/neilalexander/jnacl/1.0.0/ does not output any license information, with this PR for cyclonedx-json it outputs correctly:

"licenses": [
        {
          "license": {
            "name": "The BSD 2-Clause License"
          }
        }
      ],

@kzantow
Copy link
Contributor

kzantow commented Sep 11, 2023

Thanks for the contribution @coheigea -- NOTE: this modifies the Syft JSON, so you'd need to update the JSON schema. Basically: bump the patch version (since this is just addition) in https://github.com/anchore/syft/blob/main/internal/constants.go#L6, run make generate-json-schema, and commit the changes.

…fest

Signed-off-by: Colm O hEigeartaigh <coheigea@apache.org>
@coheigea
Copy link
Contributor Author

Thanks for the contribution @coheigea -- NOTE: this modifies the Syft JSON, so you'd need to update the JSON schema. Basically: bump the patch version (since this is just addition) in https://github.com/anchore/syft/blob/main/internal/constants.go#L6, run make generate-json-schema, and commit the changes.

Done thanks, I rebased the PR. Please re-run the workflows so that I can check if all the tests are fixed.

@spiffcs
Copy link
Contributor

spiffcs commented Sep 12, 2023

@coheigea thanks for the PR here! I really appreciate you doing the schema bump - would you mind if I added a commit that possibly makes it so we don't have to change the schema here? We might not have to add licenses to the metadata for this to work =)

We can remove the commit if we don't like it, but I think it might be advantageous to not have duplicate values across Metadata and Package.

I think we can take your code and curry the licenses up to where package creation happens and have them all accounted for on the pkg.Licenses field without having to stop off in the Metadata here.

WDYT?

@coheigea
Copy link
Contributor Author

@spiffcs Sure, whatever works really! Let me know when the PR is updated and I'll review and test it again.

The only thing to add is that I'm working on a few follow-ups, where if the license is not defined in the pom.xml, but in a parent pom, then it will go to maven central and extract the license from the parent. So for this I would need access to the pomPropertiesObject found in guessMainPackageNameAndVersionFromPomInfo.

@coheigea
Copy link
Contributor Author

coheigea commented Oct 4, 2023

@spiffcs Any update on getting this merged?

@coheigea
Copy link
Contributor Author

coheigea commented Oct 4, 2023

Maybe we could just merge this and look at refactoring later? It's blocking a number of contributions I want to make to Syft to improve license detection for Maven projects.

@spiffcs
Copy link
Contributor

spiffcs commented Oct 9, 2023

@coheigea you're right - and huge apologies for dragging my feet on this - let me try and get these conflicts resolved and try to get this merged in for our next release

spiffcs and others added 5 commits October 9, 2023 14:06
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
@spiffcs
Copy link
Contributor

spiffcs commented Oct 9, 2023

cc @wagoodman for review so we can get this into tomorrow's planned release

@spiffcs spiffcs requested a review from wagoodman October 9, 2023 20:53
if name == "" {
name = selectName(manifest, j.fileInfo)
}
if version == "" {
version = selectVersion(manifest, j.fileInfo)
}
if len(pomLicenses) > 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @spiffcs ! I just have a query about this line, in my PR the pomLicenses were only added if we didn't already get licenses from the Bundle-License. I'm going to contribute PRs after this is merged to reach out to Maven Central to retrieve the license if not contained in the pom, which we probably want to avoid if we already have a license from the Manifest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting! cc @wagoodman for the morning - we discussed this yesterday. I thought that it might be best to surface all licenses found regardless of source to provide the most information to the user (manifest or pom). @coheigea's comment makes sense though with respect to the other PRs coming. Leaving this open for @wagoodman comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason to make this mutually exclusive? Typically we raise up as much information as is useful, and I feel that license info should be fairly unbiased. That is, if there are two licenses then knowing that they agree or disagree is useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a big deal for now, we can revisit it later if it's an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I take back my comment -- today we can't distinguish paths within the jar. We have a future enhancement to the be able to capture paths within archives, but without that enhancement the licenses captured appear to be from the same location (the top-level jar path) and not a manifest vs the pom.xml. I think the results would initially be confusing without this information.

I'll restore the behavior you originally put in @coheigea such that the pom licenses are only added if the license within the manifest cannot be found. When the Location objects get this additional pathing information we could enhance this to show both licenses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been talking about the Location enhancement for a while but I couldn't find an issue for it -- created #2211 to track (and linking with this comment 😄 )

spiffcs and others added 3 commits October 10, 2023 10:14
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman
Copy link
Contributor

wagoodman commented Oct 10, 2023

I adjusted the parsing logic to preserve what the specific license data we are cataloging (name vs URL) since the pkg.License object keeps track of both separately. I also added a parser-level test to ensure this was working as expected.

@wagoodman wagoodman enabled auto-merge (squash) October 10, 2023 15:56
@wagoodman wagoodman merged commit 0748945 into anchore:main Oct 10, 2023
9 checks passed
@wagoodman wagoodman changed the title Parse the Maven license from the pom.xml if not contained in the mani… Parse license from the pom.xml if not contained in the manifest Oct 10, 2023
@wagoodman wagoodman added the enhancement New feature or request label Oct 10, 2023
@coheigea coheigea deleted the coheigea/maven-pom-license branch October 11, 2023 06:20
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
anchore#2115)

* Parse the Maven license from the pom.xml if not contained in the manifest

Signed-off-by: Colm O hEigeartaigh <coheigea@apache.org>

* chore: restore 10.0.2 schema

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* chore: generate new 11.0.1 schema

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* refactor: remove schema change

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* test: update unit tests to align with new pattern

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* chore: pr feedback

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* chore: remove struct tags

Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>

* keep license name and url semantics preserved on the pkg object

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

---------

Signed-off-by: Colm O hEigeartaigh <coheigea@apache.org>
Signed-off-by: Christopher Phillips <christopher.phillips@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Co-authored-by: Christopher Angelo Phillips <32073428+spiffcs@users.noreply.github.com>
Co-authored-by: Christopher Phillips <christopher.phillips@anchore.com>
Co-authored-by: Alex Goodman <wagoodman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

4 participants