Import transitive dependencies from SBOMs if available #497

Open
ppkarwasz opened this issue May 13, 2024 · 8 comments

Comments

@ppkarwasz
Contributor

Some Maven libraries publish shaded artifacts that contain many, if not all, of their dependencies.

Since it is impossible to tell from the POM file alone which artifacts were shaded, the CycloneDX plugin should try to use the CycloneDX SBOMs published by those dependencies, if available.

This feature request is related to #472.

@VinodAnandan

@ppkarwasz Thanks for creating this issue. I had a similar idea, which I discussed with @hboutemy in the CycloneDX Slack, but I failed to provide him with concrete examples. Could you, or anyone else interested in this feature (Cc: @raboof, @prabhu, @lfrancke, @stevespringett), provide some concrete examples?


@lfrancke

Examples of where this would be useful?

@ppkarwasz
Contributor Author

ppkarwasz commented May 15, 2024

In many cases, shaded artifacts are the final product and are not consumed by other Java artifacts.
They end up in the binary tar.gz distribution of an application, so they are not a problem for the CycloneDX Maven plugin.

There are, however, valid (or at least justified) cases where a library shades another one and often repackages it (in the sense that it changes the names of its Java packages):

  • Shading and repackaging ASM was a common practice. Now it is less common, since ASM has a stable API.
  • pax-logging-log4j2 shades log4j-core and makes minor modifications to improve its OSGi support. This case is very unfortunate, since versions of pax-logging-log4j2 prior to 2.0.13 are affected by at least one of the Log4Shell CVEs.
  • tomcat-dbcp is a repackaged version of commons-dbcp2 with the logging API replaced by tomcat-juli (which itself is a repackaged version of an old commons-logging).
  • Most of the Bouncy Castle artifacts might be considered a "shaded" version of another BC artifact.

I consider SBOMs a build-tool- and language-independent way to expose a project's dependencies. It would be useful to use them to complement Maven's simplified dependency system, e.g. regarding conflicts.

@raboof

raboof commented May 15, 2024

As a scenario, imagine:

  • You're using SBOMs to scan for advisories: you want to know what advisories exist for a product p and its parts.
  • This project has a dependency (d) that shades artifact a: the d jar contains all classes from a, but moved to a different package
  • d publishes an SBOM that correctly reports the fact that d contains a (e.g. through support for maven-shade-plugin #472 or otherwise)
  • There is an advisory published for a
  • The SBOM for p is created with cyclonedx-maven-plugin

When running the vulnerability scanner, it should identify that p is potentially affected by the advisory for a. There are two ways the vulnerability scanner could learn that a is part of p:

  • If the vulnerability scanner sees the dependency of p on d, fetches the SBOM for d, and finds out about the shaded a from there
  • If cyclonedx-maven-plugin sees the dependency of p on d, fetches the SBOM for d, and uses this information to include a into the SBOM for p (i.e., the feature described in this issue). The vulnerability scanner then takes this information from the SBOM of p.

So the choice is between implementing this in all vulnerability scanners (first approach) and implementing this in all SBOM generators (including cyclonedx-maven-plugin, second approach). AFAICS there is no obvious 'architectural' reason to choose one over the other. For 'regular' dependencies, you definitely want the second approach (because the pom of p may influence which transitive dependencies of d get picked, so looking at d's SBOM would not be accurate for these). For 'shaded' dependencies, either approach would work. The fact that you want the second approach for 'regular' dependencies might be a motivation to use the second approach here too and implement this in SBOM generators such as cyclonedx-maven-plugin.
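To make the second approach a bit more concrete, here is a minimal Java sketch of what "include a into the SBOM for p" could look like. It deliberately does not use the real CycloneDX Java library: `Component`, `importShaded` and the map of published SBOM root components are hypothetical simplifications. It assumes that d's SBOM describes its shaded content as components nested under d's root component (CycloneDX does allow nested components, but whether #472 would produce exactly this shape is an assumption). In line with the reasoning above, only the shaded content is copied; regular transitive dependencies of d are still left to Maven resolution for p.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShadedImport {
    // Hypothetical, simplified model: a component identified by its pURL,
    // with the components it physically contains (e.g. shaded classes) nested under it.
    record Component(String purl, List<Component> nested) {
        Component(String purl) {
            this(purl, List.of());
        }
    }

    /**
     * Returns p's direct dependencies, each enriched with the nested (shaded) components
     * that the dependency's own published SBOM declares for its root component.
     * Dependencies without a published SBOM are returned unchanged.
     */
    static List<Component> importShaded(List<Component> directDeps,
                                        Map<String, Component> publishedSbomRoots) {
        List<Component> result = new ArrayList<>();
        for (Component dep : directDeps) {
            Component published = publishedSbomRoots.get(dep.purl());
            if (published == null || published.nested().isEmpty()) {
                result.add(dep); // nothing to import: keep the dependency as-is
                continue;
            }
            // Merge what p's generator already knows with the published nested components,
            // deduplicated by pURL and keeping the original order.
            Map<String, Component> merged = new LinkedHashMap<>();
            for (Component c : dep.nested()) merged.put(c.purl(), c);
            for (Component c : published.nested()) merged.putIfAbsent(c.purl(), c);
            result.add(new Component(dep.purl(), List.copyOf(merged.values())));
        }
        return result;
    }

    public static void main(String[] args) {
        Component d = new Component("pkg:maven/com.example/d@1.0");
        Component a = new Component("pkg:maven/com.example/a@2.3");
        // Hypothetical SBOM published by d, declaring that it contains a.
        Map<String, Component> sboms = Map.of(d.purl(), new Component(d.purl(), List.of(a)));
        List<Component> enriched = importShaded(List.of(d), sboms);
        System.out.println(enriched.get(0).nested().get(0).purl());
    }
}
```

Running `main` prints `pkg:maven/com.example/a@2.3`, i.e. the shaded a now appears under d in p's SBOM, where a vulnerability scanner can match it against the advisory for a.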

@prabhu

prabhu commented May 15, 2024

For cases like these, we need to go beyond package names to a vulnerability database that offers affected modules, imports, symbols, etc., which doesn't exist in the open-source world. When running cdxgen with the --deep argument, the namespaces belonging to each package are also collected and stored as an internal property, so some work on the SBOM side is possible.

@hboutemy
Contributor

to the examples of shaded content shared previously, I'd add one typical case: in the same gav, there are both the initial .jar and a shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/

in this case, what should THE sbom contain to describe the 2 different jars? how would a project consuming one of these jars as a dependency know what to use? Additional question: as the wagon project is a multi-module build, what about the aggregate SBOM vs the per-gav ones? And is this question about aggregates valid both from a producer perspective (wagon) and a consumer perspective (a project consuming one artifact of wagon)?

does cyclonedx-maven-plugin really have a chance to magically detect the different cases without deep user configuration? How many additional files will the plugin have to download to do the advanced analysis?

notice: is this specific to the java world or do other ecosystems have such cases?

there are serious deep-dive discussions to have to get the whole picture

@raboof

raboof commented May 16, 2024

in the same gav, there are both the initial .jar and a shaded one, like https://repo1.maven.org/maven2/org/apache/maven/wagon/wagon-http/3.5.3/

in this case, what should THE sbom contain to describe the 2 different jars?

Even though those are in the same gav, shouldn't we treat those jars as different artifacts and thus create different SBOMs for them?

there are serious deep-dive discussions to have to get the whole picture

Indeed!

@ppkarwasz
Contributor Author

in this case, what should THE sbom contain to describe the 2 different jars? how would a project consuming one of these jars as a dependency know what to use?

I think that the SBOM should describe all the artifacts sharing the same GAV (at least the binary ones).
Some will be described as components, others as assemblies. The classifier and type qualifiers of a pURL should be enough to tell them apart.
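As an illustration of how the classifier and type qualifiers separate artifacts that share one GAV, here is a minimal Java sketch that builds Maven pURL strings. The `purl` helper is hypothetical and simplified: it skips percent-encoding (it assumes the coordinates contain no characters that need escaping) and always emits `type`. Using the `shaded` classifier for wagon-http 3.5.3 is an assumption based on the repository listing linked earlier in the thread.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class MavenPurl {
    /**
     * Builds a Maven pURL with optional classifier and a type qualifier.
     * Simplified sketch: no percent-encoding is performed.
     */
    static String purl(String groupId, String artifactId, String version,
                       String classifier, String type) {
        // TreeMap keeps qualifier keys in lexicographic order, as the canonical pURL form expects.
        Map<String, String> qualifiers = new TreeMap<>();
        if (classifier != null && !classifier.isEmpty()) {
            qualifiers.put("classifier", classifier);
        }
        qualifiers.put("type", type);
        StringJoiner q = new StringJoiner("&", "?", "");
        qualifiers.forEach((k, v) -> q.add(k + "=" + v));
        return "pkg:maven/" + groupId + "/" + artifactId + "@" + version + q;
    }

    public static void main(String[] args) {
        // Same GAV, two different binary artifacts.
        System.out.println(purl("org.apache.maven.wagon", "wagon-http", "3.5.3", null, "jar"));
        System.out.println(purl("org.apache.maven.wagon", "wagon-http", "3.5.3", "shaded", "jar"));
    }
}
```

This prints `pkg:maven/org.apache.maven.wagon/wagon-http@3.5.3?type=jar` and `pkg:maven/org.apache.maven.wagon/wagon-http@3.5.3?classifier=shaded&type=jar`, so a single SBOM for the GAV could list both jars as distinct components.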

A complex example: jakartaee-migration has three assemblies:

  • a shaded.jar,
  • a bin.zip,
  • and a bin.tar.gz.

BTW: I think that if VEX documents become compulsory, developers will think twice before publishing this kind of assembly. jakartaee-migration contains commons-compress and is vulnerable to all of its CVEs.
