SBOMs generated from source code repos have missing or repetitive component inventory #1260

Closed
sprathod369 opened this issue Oct 13, 2022 · 6 comments

Comments

@sprathod369

Although Syft focuses on container image scans, it can also create an SBOM for arbitrary filesystem paths. My understanding is that one can use Syft to index a host's packages by scanning directories that commonly contain software binaries and libraries.

However, it's not clear to me whether Syft can generate a reasonable and accurate SBOM from a source-code repository. I tested it against a Java repository that has a pom.xml (WebGoat), a TypeScript repository that has a package.json, and a reference Python repository that has a requirements.txt. When the CycloneDX SBOM generated by Syft is ingested into Dependency-Track, the resulting component inventory is not close to what a tool such as cdxgen produces for the same source-code repositories.

My questions (setting container image scans aside) are:

  1. Is Syft meant to generate a reasonable CycloneDX SBOM from source-code-only repositories (with no jars or binaries in the code base) as part of a directory scan?
  2. Is Syft meant to generate a reasonable operating-system BOM in CycloneDX format from a filesystem scan of a VM, starting at the / directory?
@kzantow
Contributor

kzantow commented Oct 13, 2022

Hi @sprathod369, to answer your specific questions:

  1. Yes, Syft is intended to generate an accurate SBOM from source code. When running directory scans, Syft locates things like pom.xml, package-lock.json, and requirements.txt among a number of other things to catalog dependencies.
  2. Yes, Syft is intended to generate an accurate SBOM from a root scan of the filesystem, e.g. for Linux distributions it should locate /etc/os-release and the appropriate package manager files, along with whatever else is found in the filesystem. I should note that currently a slightly different set of catalogers runs depending on whether you're running an image scan or a filesystem scan. For example, a filesystem scan does not process package.json but does process package-lock.json, whereas an image scan does the opposite.
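For illustration, the directory-scan idea described in point 1 can be sketched in a few lines of Python. This is a minimal sketch and not Syft's implementation: the `MANIFESTS` set and the `find_manifests` helper are hypothetical, and the file names are just the ones mentioned in this thread, not Syft's actual cataloger list.

```python
import os

# Hypothetical manifest names, taken from the files mentioned in this thread.
# This is NOT Syft's real cataloger list.
MANIFESTS = {"pom.xml", "package-lock.json", "requirements.txt"}

# Directories skipped during the walk (an assumption made for this sketch).
SKIPPED_DIRS = {".git", "node_modules"}


def find_manifests(root):
    """Return sorted paths (relative to root) of known manifest files."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        for name in filenames:
            if name in MANIFESTS:
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

Calling `find_manifests(".")` at the root of a checkout lists the files a static scanner of this kind would have to work from. Anything not written down in those files, such as dependencies resolved only at build time, is invisible to it.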

Having said that, it looks like cdxgen has explicit Dependency Track support, whereas Syft has not been designed to be imported by any specific tool. CycloneDX documents can be structured differently, and it's possible the way Syft structures CycloneDX is not especially good for Dependency Track.

Do you have any specific repositories that you could point to where you've run cdxgen and Syft and you could explain the differences you see in Dependency Track?

@sprathod369
Author

Thanks @kzantow, I appreciate your input. Yes, I'll run a scan against WebGoat (a Java, pom.xml-based repo) and share my findings. Thanks again!

@sprathod369
Author

sprathod369 commented Oct 20, 2022

@kzantow - I generated SBOMs using Syft and cdxgen against the same code base and branch of the WebGoat source repo. Here are some observations:

  1. Component inventory: 106 components in the Syft SBOM vs. 188 in the cdxgen SBOM (sorted by risk score in the screenshot below).
  2. The version number of some components is not reflected in the Syft SBOM, and license information is missing. Some components are repeated in the inventory (e.g. postgresql in the screenshot below).
  3. A dependency graph tree is generated and visible for the cdxgen SBOM but is not available for the Syft SBOM.
  4. A search for jackson* does not return all of the components in the Syft SBOM, but these components are present in the cdxgen SBOM (screenshot below).

Your comment above confirms that Syft's CycloneDX output is structured differently and is not designed to integrate with any specific tool (points 2 and 3 bear that out), but the smaller inventory and missing components (points 1 and 4) may be an issue worth investigating further.
I understand that the SBOM generation landscape is still evolving and that tools differ in design and approach, hence my earlier question: is Syft suited to SBOM generation for source-code repositories, or is it better suited to container image scans?
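The observations above can also be checked outside Dependency-Track by comparing the two CycloneDX JSON files directly. The following is a rough sketch, not an official tool: the `summarize` and `only_in` helpers are hypothetical, and it assumes both SBOMs were exported as CycloneDX JSON with the standard top-level `components` and `dependencies` arrays.

```python
import json
from collections import Counter


def summarize(bom):
    """Summarize a parsed CycloneDX JSON document (a dict)."""
    components = bom.get("components", [])
    # Count (name, version) pairs; a missing version is treated as "".
    pairs = Counter(
        (c.get("name", ""), c.get("version") or "") for c in components
    )
    return {
        "total": len(components),
        "duplicates": sorted(p for p, n in pairs.items() if n > 1),
        "missing_version": sorted(
            c.get("name", "") for c in components if not c.get("version")
        ),
        "missing_license": sum(1 for c in components if not c.get("licenses")),
        "has_dependency_graph": bool(bom.get("dependencies")),
    }


def only_in(a, b):
    """Return sorted component names present in BOM a but absent from BOM b."""
    names_a = {c.get("name", "") for c in a.get("components", [])}
    names_b = {c.get("name", "") for c in b.get("components", [])}
    return sorted(names_a - names_b)


def load_bom(path):
    """Load a CycloneDX JSON file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With two hypothetical file names, `summarize(load_bom("syft.cdx.json"))` and `only_in(load_bom("cdxgen.cdx.json"), load_bom("syft.cdx.json"))` would surface the totals, repeated entries, missing versions, and the components that appear only in the cdxgen output. Matching by name alone is a simplification; comparing `purl` values would be stricter.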

[Four screenshots referenced in the observations above]

@kzantow
Contributor

kzantow commented Oct 20, 2022

Hi @sprathod369, I've looked into this a bit further. It seems cdxgen uses external tools (maven, gradle, sbt) to handle Java projects. Syft has avoided this, remaining a static analysis tool rather than invoking external tools. The drawback is that for some ecosystems, especially Maven-like projects, Syft uses only the information in the files it finds: for example, the current Maven pom.xml but not necessarily a parent pom or transitive dependencies. We definitely have some ideas to improve this, but I suspect that's the main reason you're seeing differences in the dependency counts and version information: cdxgen runs Maven to download external pom.xml files from a repository, which brings in transitive dependencies and license information. Without doing this, neither license information nor transitive dependencies are available.
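To make the limitation concrete, here is a minimal sketch of what reading a single pom.xml statically can and cannot yield. It is not Syft's cataloger code: `direct_dependencies` is a hypothetical helper, and it assumes the pom declares the standard Maven POM 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: the pom declares the standard Maven POM 4.0.0 namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def direct_dependencies(pom_text):
    """List (groupId, artifactId, version) tuples declared in one pom.xml.

    The version is None when this pom omits it, for example because a
    parent pom supplies it. Transitive dependencies never appear here,
    since they are declared in other artifacts' pom files.
    """
    root = ET.fromstring(pom_text)
    deps = []
    # Only <dependencies> directly under <project> is read; the parent pom
    # is not fetched or resolved.
    for dep in root.findall("./m:dependencies/m:dependency", NS):
        deps.append((
            dep.findtext("m:groupId", namespaces=NS),
            dep.findtext("m:artifactId", namespaces=NS),
            dep.findtext("m:version", namespaces=NS),
        ))
    return deps
```

A dependency whose version is managed by a parent pom comes back with `None` in the version slot, which is consistent with the missing versions reported above. Resolving it would require fetching and merging the parent pom, which is the step a build tool such as Maven performs.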

I think this may be a bit of a duplicate of #1251 -- would you agree?

@kzantow
Contributor

kzantow commented Nov 9, 2022

NOTE: this is referring to WebGoat: https://github.com/WebGoat/WebGoat

@tgerla
Contributor

tgerla commented May 4, 2023

I'll go ahead and close this ticket, but please feel free to let us know if you have any more questions or concerns. Thanks!

@tgerla closed this as not planned on May 4, 2023