Replace core SBOM-creation API with builder pattern #1383

Merged: wagoodman merged 35 commits into main from refactor-cataloging-api on Jan 12, 2024


@wagoodman wagoodman commented Dec 2, 2022

Adds a top-level replacement for the syft API. The idea is to allow for encapsulation of more kinds of cataloging without the need to share a data interface. This allows file-based cataloging and package-based cataloging to share the same approach to selection and configuration.

The existing cataloging functions have been removed, which is why this is a breaking change. I initially attempted to keep both schemes in place; however, managing configuration for both became too confusing.

This PR adds high-level configuration:

  • syft/cataloging/*.go: cross-cutting configuration that could affect all catalogers, the artifacts they produce, or add downstream artifacts based on these descriptions. These are configurations, NOT capabilities (that is, not behavior such as the catalogers themselves).
  • syft/cataloging/pkgcataloging/*.go: wires up configurations for all package catalogers
  • syft/cataloging/filecataloging/*.go: wires up all configurations for file catalogers

This PR removes the existing configurations:

  • syft/pkg/cataloger/config.go

At a high level, CreateSBOMConfig is the entrypoint to all cataloging, and the configuration itself describes what should be done. All capabilities (file cataloging, package cataloging, Linux distro identification, and cross-cutting relationship additions) are expressed as "tasks". A task acts like a facade, similar to the command pattern, and encapsulates pre-configured behavior that ultimately writes to an SBOM. The notion of "tasks" has not been exported to the public API.

Secondarily, this PR makes the following adjustments:

  • migrates all cross-cutting relationship functions to an internal relationship package
  • leverages the recent UI enhancements to show catalogers in a tree on the CLI
  • migrates some application configurations in a breaking fashion:
    • exclude-binary-overlap-by-ownership has been moved to package.exclude-binary-overlap-by-ownership
    • default-image-pull-source has been moved to source.image.default-pull-source
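To make the two key moves above concrete, here is a small sketch of how a tool consuming an old configuration might report legacy keys. The key names come from the list above; the `movedKeys` table and the `checkLegacyKeys` helper are hypothetical and are not part of syft.

```go
package main

import (
	"fmt"
	"sort"
)

// movedKeys maps legacy application config keys to their new locations,
// as listed in this PR description.
var movedKeys = map[string]string{
	"exclude-binary-overlap-by-ownership": "package.exclude-binary-overlap-by-ownership",
	"default-image-pull-source":           "source.image.default-pull-source",
}

// checkLegacyKeys returns one error per legacy key found among the given
// top-level config keys. Hypothetical helper for illustration only.
func checkLegacyKeys(keys []string) []error {
	// Sort a copy so the reported order is stable and the input is untouched.
	sorted := append([]string(nil), keys...)
	sort.Strings(sorted)

	var errs []error
	for _, k := range sorted {
		if replacement, ok := movedKeys[k]; ok {
			errs = append(errs, fmt.Errorf("config key %q has moved to %q", k, replacement))
		}
	}
	return errs
}

func main() {
	for _, err := range checkLegacyKeys([]string{"default-image-pull-source", "quiet"}) {
		fmt.Println(err)
	}
}
```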

Minimal example of using the new API:

src := source.New(...)
sbom, err := syft.CreateSBOM(ctx, src, nil)

or

src := source.New(...)
cfg := syft.DefaultCreateSBOMConfig()
sbom, err := syft.CreateSBOM(ctx, src, cfg)

Leveraging a little more of the API:

cfg := syft.DefaultCreateSBOMConfig().
	WithTool("my-tool", "v1.0.2").
	WithParallelism(5).
	WithCatalogerSelection("+sbom-cataloger", "-rpm").
	WithRelationshipsConfig(
		cataloging.RelationshipsConfig{
			FileOwnership:        false,
			FileOwnershipOverlap: false,
			ExcludeBinaryPackagesWithFileOwnershipOverlap: false,
		},
	).
	WithSearchConfig(
		cataloging.SearchConfig{
			Scope: source.SquashedScope,
		},
	).
	WithDataGenerationConfig(
		cataloging.DataGenerationConfig{
			GenerateCPEs: false,
		},
	).
	WithFilesConfig(
		filecataloging.Config{
			Selection: file.OwnedFilesSelection,
			Hashers: []crypto.Hash{
				crypto.SHA256,
				crypto.SHA1,
			},
		},
	).
	WithPackagesConfig(
		pkgcataloging.Config{
			Golang: golang.CatalogerConfig{
				SearchLocalModCacheLicenses: true,
				SearchRemoteLicenses:        true,
			},
			LinuxKernel: kernel.LinuxKernelCatalogerConfig{
				CatalogModules: true,
			},
			Python: python.CatalogerConfig{
				GuessUnpinnedRequirements: true,
			},
			Java: java.CatalogerConfig{
				ArchiveSearchConfig: cataloging.ArchiveSearchConfig{
					IncludeIndexedArchives:   true,
					IncludeUnindexedArchives: false,
				},
				UseNetwork: true,
			},
		},
	)

sbom, err := syft.CreateSBOM(ctx, src, cfg)
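The `WithCatalogerSelection("+sbom-cataloger", "-rpm")` call above takes string expressions. The sketch below shows one way such expressions could be resolved into add and remove actions against a set. It assumes that `+x` adds and `-x` removes any cataloger whose name or tag equals `x`; the real syft rules may differ, and the `cataloger` type, its tags, and `selectCatalogers` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// cataloger is a hypothetical stand-in: a name plus tags it can be selected by.
type cataloger struct {
	name string
	tags []string
}

// matches reports whether the token equals the cataloger name or one of its tags.
func (c cataloger) matches(token string) bool {
	if c.name == token {
		return true
	}
	for _, t := range c.tags {
		if t == token {
			return true
		}
	}
	return false
}

// selectCatalogers starts from the default names and applies each expression
// in order. The +/- semantics here are an assumption made for this sketch.
func selectCatalogers(all []cataloger, defaults []string, exprs ...string) ([]string, error) {
	selected := map[string]bool{}
	for _, d := range defaults {
		selected[d] = true
	}
	for _, e := range exprs {
		var add bool
		switch {
		case strings.HasPrefix(e, "+"):
			add = true
		case strings.HasPrefix(e, "-"):
			add = false
		default:
			return nil, fmt.Errorf("unsupported expression %q: expected a + or - prefix", e)
		}
		token := e[1:]
		for _, c := range all {
			if !c.matches(token) {
				continue
			}
			if add {
				selected[c.name] = true
			} else {
				delete(selected, c.name)
			}
		}
	}
	names := make([]string, 0, len(selected))
	for n := range selected {
		names = append(names, n)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	all := []cataloger{
		{name: "sbom-cataloger", tags: []string{"sbom"}},
		{name: "rpm-db-cataloger", tags: []string{"rpm"}},
		{name: "dpkgdb-cataloger", tags: []string{"dpkg"}},
	}
	defaults := []string{"rpm-db-cataloger", "dpkgdb-cataloger"}

	names, err := selectCatalogers(all, defaults, "+sbom-cataloger", "-rpm")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```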

Today, when the cataloging process is run, the application configuration is captured to show the exact input. I've changed this to capture an API-level construct rather than a construct from the cmd package. Here is an example of the syft-json descriptor section:

{
  "name": "syft",
  "version": "v0.99.0",
  "configuration": {
    "catalogers": {
      "requested": {
        "default": [
          "binary"
        ],
        "selection": []
      },
      "used": [
        "binary-cataloger",
        "cargo-auditable-binary-cataloger",
        "dotnet-portable-executable-cataloger",
        "go-module-binary-cataloger"
      ]
    },
    "data-generation": {
      "generate-cpes": true
    },
    "extra": null,
    "files": {
      "hashers": [
        "sha-1",
        "sha-256"
      ],
      "selection": "owned-files"
    },
    "packages": {
      "golang": {...},
      "java": {...},
      "javascript": {...},
      "linux-kernel": {...},
      "python": {...}
    },
    "relationships": {
      "exclude-binary-packages-with-file-ownership-overlap": true,
      "file-ownership": true,
      "file-ownership-overlap": true
    },
    "search": {
      "scope": "squashed"
    }
  }
}
Example showing all options:
{
  "name": "syft",
  "version": "[not provided]",
  "configuration": {
    "catalogers": {
      "requested": {
        "default": [
          "binary"
        ],
        "selection": []
      },
      "used": [
        "binary-cataloger",
        "cargo-auditable-binary-cataloger",
        "dotnet-portable-executable-cataloger",
        "go-module-binary-cataloger"
      ]
    },
    "data-generation": {
      "generate-cpes": true
    },
    "extra": null,
    "files": {
      "hashers": [
        "sha-1",
        "sha-256"
      ],
      "selection": "owned-files"
    },
    "packages": {
      "golang": {
        "local-mod-cache-dir": "/Users/wagoodman/.local/share/rtx/installs/go/1.21.1/packages/pkg/mod",
        "proxies": [
          "https://proxy.golang.org",
          "direct"
        ],
        "search-local-mod-cache-licenses": false,
        "search-remote-licenses": false
      },
      "java-archive": {
        "include-indexed-archives": true,
        "include-unindexed-archives": false,
        "maven-base-url": "https://repo1.maven.org/maven2",
        "max-parent-recursive-depth": 5,
        "use-network": false
      },
      "javascript": {
        "npm-base-url": "https://registry.npmjs.org",
        "search-remote-licenses": false
      },
      "linux-kernel": {
        "catalog-modules": true
      },
      "python": {
        "guess-unpinned-requirements": false
      }
    },
    "relationships": {
      "exclude-binary-packages-with-file-ownership-overlap": true,
      "file-ownership": true,
      "file-ownership-overlap": true
    },
    "search": {
      "scope": "squashed"
    }
  }
}

This beats the current approach of using the catalogers: https://gist.github.com/wagoodman/57ed59a6d57600c23913071b8470175b
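For consumers of the descriptor section shown above, here is a minimal sketch of reading a few fields with the standard library. The `descriptor` struct and `parseDescriptor` are hypothetical and cover only a handful of the keys from the example; they are not syft's own types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor mirrors a small subset of the descriptor section shown above.
// Hypothetical type for illustration; unknown keys are ignored when decoding.
type descriptor struct {
	Name          string `json:"name"`
	Version       string `json:"version"`
	Configuration struct {
		Catalogers struct {
			Used []string `json:"used"`
		} `json:"catalogers"`
		Search struct {
			Scope string `json:"scope"`
		} `json:"search"`
	} `json:"configuration"`
}

// sample is a trimmed copy of the example descriptor from this PR description.
const sample = `{
  "name": "syft",
  "version": "v0.99.0",
  "configuration": {
    "catalogers": {
      "requested": {"default": ["binary"], "selection": []},
      "used": ["binary-cataloger", "go-module-binary-cataloger"]
    },
    "search": {"scope": "squashed"}
  }
}`

// parseDescriptor decodes the descriptor section from raw JSON bytes.
func parseDescriptor(data []byte) (descriptor, error) {
	var d descriptor
	if err := json.Unmarshal(data, &d); err != nil {
		return descriptor{}, fmt.Errorf("unable to parse descriptor: %w", err)
	}
	return d, nil
}

func main() {
	d, err := parseDescriptor([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Name, d.Version, d.Configuration.Search.Scope)
	for _, name := range d.Configuration.Catalogers.Used {
		fmt.Println("used:", name)
	}
}
```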

PRs broken off of this one

Follow up PRs

Partially implements #558

Fixes #2136
Closes #1731
Closes #1039
Closes #477

@github-actions

github-actions bot commented Dec 16, 2022

Benchmark Test Results

Benchmark results from the latest changes vs base branch
name                                                       old time/op    new time/op    delta
ImagePackageCatalogers/alpmdb-cataloger-2                    11.4ms ± 1%    13.8ms ± 4%  +20.83%  (p=0.016 n=4+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2              1.31ms ± 1%    1.68ms ± 7%  +28.60%  (p=0.016 n=4+5)
ImagePackageCatalogers/python-package-cataloger-2            3.34ms ± 1%    4.00ms ± 4%  +19.64%  (p=0.008 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2    1.08ms ± 1%    1.31ms ± 3%  +21.59%  (p=0.008 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         768µs ± 2%     932µs ± 3%  +21.35%  (p=0.008 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                     886µs ± 1%    1116µs ± 5%  +25.94%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                    1.30ms ± 1%    1.62ms ± 5%  +25.00%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                      14.8ms ± 1%    17.4ms ± 5%  +17.32%  (p=0.008 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                      891µs ± 2%    1084µs ± 2%  +21.64%  (p=0.008 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2          6.41µs ± 1%    7.41µs ± 4%  +15.58%  (p=0.008 n=5+5)
ImagePackageCatalogers/dotnet-deps-cataloger-2               1.37ms ± 2%    1.70ms ± 4%  +23.99%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    715µs ± 1%     882µs ± 3%  +23.45%  (p=0.008 n=5+5)
ImagePackageCatalogers/sbom-cataloger-2                      4.46ms ± 0%    5.30ms ± 1%  +18.79%  (p=0.008 n=5+5)
ImagePackageCatalogers/binary-cataloger-2                    3.90ms ± 1%    4.72ms ± 2%  +21.03%  (p=0.008 n=5+5)

name                                                       old alloc/op   new alloc/op   delta
ImagePackageCatalogers/alpmdb-cataloger-2                    5.26MB ± 0%    5.27MB ± 0%     ~     (p=0.095 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               205kB ± 0%     205kB ± 0%   -0.08%  (p=0.032 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2             963kB ± 0%     961kB ± 0%   -0.12%  (p=0.008 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     218kB ± 0%     217kB ± 0%   -0.11%  (p=0.032 n=5+5)
ImagePackageCatalogers/javascript-package-cataloger-2         159kB ± 0%     159kB ± 0%     ~     (p=0.444 n=5+5)
ImagePackageCatalogers/dpkgdb-cataloger-2                     200kB ± 0%     199kB ± 0%   -0.18%  (p=0.008 n=5+5)
ImagePackageCatalogers/rpm-db-cataloger-2                     303kB ± 0%     302kB ± 0%   -0.20%  (p=0.008 n=5+5)
ImagePackageCatalogers/java-cataloger-2                      3.49MB ± 0%    3.49MB ± 0%     ~     (p=0.548 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                      182kB ± 0%     182kB ± 0%   -0.06%  (p=0.032 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2          1.12kB ± 0%    1.12kB ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                374kB ± 0%     375kB ± 0%   +0.15%  (p=0.008 n=5+5)
ImagePackageCatalogers/portage-cataloger-2                    139kB ± 0%     138kB ± 0%   -0.07%  (p=0.032 n=5+5)
ImagePackageCatalogers/sbom-cataloger-2                       722kB ± 0%     722kB ± 0%   +0.02%  (p=0.008 n=5+5)
ImagePackageCatalogers/binary-cataloger-2                     656kB ± 0%     656kB ± 0%   +0.04%  (p=0.008 n=5+5)

name                                                       old allocs/op  new allocs/op  delta
ImagePackageCatalogers/alpmdb-cataloger-2                     85.7k ± 0%     85.7k ± 0%     ~     (p=0.683 n=5+5)
ImagePackageCatalogers/ruby-gemspec-cataloger-2               4.25k ± 0%     4.25k ± 0%     ~     (p=0.444 n=5+5)
ImagePackageCatalogers/python-package-cataloger-2             16.5k ± 0%     16.5k ± 0%   -0.05%  (p=0.008 n=5+5)
ImagePackageCatalogers/php-composer-installed-cataloger-2     5.50k ± 0%     5.50k ± 0%     ~     (p=0.556 n=4+5)
ImagePackageCatalogers/javascript-package-cataloger-2         3.33k ± 0%     3.33k ± 0%     ~     (all equal)
ImagePackageCatalogers/dpkgdb-cataloger-2                     4.47k ± 0%     4.47k ± 0%     ~     (all equal)
ImagePackageCatalogers/rpm-db-cataloger-2                     8.12k ± 0%     8.12k ± 0%     ~     (all equal)
ImagePackageCatalogers/java-cataloger-2                       57.5k ± 0%     57.5k ± 0%     ~     (p=0.111 n=5+5)
ImagePackageCatalogers/apkdb-cataloger-2                      5.23k ± 0%     5.23k ± 0%     ~     (p=0.444 n=5+5)
ImagePackageCatalogers/go-module-binary-cataloger-2            38.0 ± 0%      38.0 ± 0%     ~     (all equal)
ImagePackageCatalogers/dotnet-deps-cataloger-2                7.12k ± 0%     7.12k ± 0%     ~     (all equal)
ImagePackageCatalogers/portage-cataloger-2                    3.58k ± 0%     3.58k ± 0%     ~     (p=1.000 n=5+4)
ImagePackageCatalogers/sbom-cataloger-2                       24.4k ± 0%     24.4k ± 0%     ~     (all equal)
ImagePackageCatalogers/binary-cataloger-2                     22.2k ± 0%     22.2k ± 0%     ~     (all equal)

@wagoodman wagoodman self-assigned this Apr 26, 2023
@wagoodman wagoodman added the breaking-change Change is not backwards compatible label Nov 14, 2023
@wagoodman wagoodman changed the title Add SBOM builder configuration Replace core SBOM-creation API with builder pattern Nov 14, 2023
@github-actions github-actions bot removed the breaking-change Change is not backwards compatible label Nov 16, 2023
@wagoodman wagoodman force-pushed the refactor-cataloging-api branch 5 times, most recently from 624ff9f to f8aaae5 Compare November 27, 2023 20:16
}
})

result, err := digestsCataloger.Catalog(resolver, coordinates...)
@wagoodman commented:

From @willmurphyscode, blocking: we need to explicitly pass all coordinates, since there is no guarantee of having any results from an owned-files indication.

@wagoodman commented:

I'm going to fix the functional problem in this PR, but the signature and generator issue really should be addressed in a separate follow-up PR.

@wagoodman commented:

I think this will play into the solution for #2487


@willmurphyscode willmurphyscode left a comment


Thanks for all the careful thought in making the API and configs easier to use for the future.

@wagoodman wagoodman merged commit b0ab75f into main Jan 12, 2024
10 checks passed
@wagoodman wagoodman deleted the refactor-cataloging-api branch January 12, 2024 22:39
@wagoodman wagoodman added the enhancement New feature or request label Jan 17, 2024
GijsCalis pushed a commit to GijsCalis/syft that referenced this pull request Feb 19, 2024
* remove existing cataloging API
* add file cataloging config
* add package cataloging config
* add configs for cross-cutting concerns
* rename CLI option configs to not require import aliases later
* update all nested structs for the Catalog struct
* update Catalog cli options

- add new cataloger selection options (selection and default)
- remove the excludeBinaryOverlapByOwnership
- deprecate "catalogers" flag
- add new javascript configuration

* migrate relationship capabilities to separate internal package
* refactor golang cataloger to use configuration options when creating packages
* create internal object to facilitate reading from and writing to an SBOM
* create a command-like object (task) to facilitate partial SBOM creation
* add cataloger selection capability

- be able to parse string expressions into a set of resolved actions against sets
- be able to use expressions to select/add/remove tasks to/from the final set of tasks to run

* add package, file, and environment related tasks
* update existing file catalogers to use nested UI elements
* add CreateSBOMConfig that drives the SBOM creation process
* capture SBOM creation info as a struct
* add CreateSBOM() function
* fix tests
* update docs with SBOM selection help + breaking changes
* fix multiple override default inputs
* fix deprecation flag printing to stdout
* refactor cataloger selection description to separate object
* address review comments
* keep expression errors and show specific suggestions only
* address additional review feedback
* address more review comments
* addressed additional PR review feedback
* fix file selection references
* remove guess language data generation option
* add tests for coordinatesForSelection
* rename relationship attributes
* add descriptions to relationships config fields
* improve documentation around configuration options
* add explicit errors around legacy config entries

---------

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Labels
enhancement New feature or request
Projects
None yet
4 participants