Contributing New Catalogers - A Guide for Hacktoberfest and Beyond #2184

Closed
spiffcs opened this issue Sep 28, 2023 · 1 comment
Labels
enhancement (New feature or request), hacktoberfest (Pointing Users to Hacktoberfest activities)

Comments

@spiffcs
Contributor

spiffcs commented Sep 28, 2023

Contributing New Catalogers to Syft

Join the community meeting on October 12, where we will discuss new features being added to syft and the in-flight cataloger work. Calendar Link

To see the currently open cataloger requests, check out the list here. If an issue is not assigned, it is open for anyone to contribute.


What is a Cataloger

A cataloger is syft's term for a module that knows how to detect and analyze components from a particular package manager or ecosystem.

If you're interested in contributing a new cataloger, take a look at the documentation below. This issue also goes further into how to think about which type of cataloger to contribute once you have selected an ecosystem. If you have questions, you can always tag @anchore/tools on this thread and we will come by and answer them.
DEVELOPING.md
Be sure to follow the Contributing documentation when authoring your feature!

Types of Cataloger

Catalogers generally come in two flavors:

Declared package Cataloger

One type of cataloger describes declared packages. These catalogers are used by default when scanning directories ("directory catalogers"). The default list can be found here:

syft/README.md

Lines 185 to 215 in 44e5480

##### Directory Scanning:
- alpmdb
- apkdb
- binary
- cocoapods
- conan
- dartlang-lock
- dotnet-deps
- dpkgdb
- elixir-mix-lock
- erlang-rebar-lock
- go-mod-file
- go-module-binary
- graalvm-native-image
- haskell
- java
- java-gradle-lockfile
- java-pom
- javascript-lock
- linux-kernel
- nix-store
- php-composer-lock
- portage
- python-index
- python-package
- rpm-db
- rpm-file
- ruby-gemfile
- rust-cargo-lock
- sbom
- swift-package-manager

These catalogers typically parse manifest or lock files written for package managers (e.g. a Python requirements.txt, a Ruby Gemfile.lock, a JavaScript package.json).

Example:

Given this package-lock.json

{
  "requires": true,
  "lockfileVersion": 1,
  "dependencies": {
    "@actions/core": {
      "version": "1.6.0",
      "resolved": "https://registry.npmjs.org/@actions/core/-/core-1.6.0.tgz",
      "integrity": "sha512-NB1UAZomZlCV/LmJqkLhNTqtKfFXJZAUPcfl/zqG7EfsQdeUJtaWO98SGbuQ3pydJ3fHl2CvI/51OKYlCYYcaw==",
      "requires": {
        "@actions/http-client": "^1.0.11"
      }
    }
  }
}

Syft would construct this package

{
	Name:      "@actions/core",
	Version:   "1.6.0",
	FoundBy:   "javascript-lock-cataloger",
	PURL:      "pkg:npm/%40actions/core@1.6.0",
	Language:  pkg.JavaScript,
	Type:      pkg.NpmPkg,
	MetadataType: pkg.NpmPackageLockJSONMetadataType,
},
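
To make the mapping from lock file to package concrete, here is a minimal, self-contained sketch of the idea. It is not syft's actual javascript-lock cataloger: the `Package` type, `parsePackageLock`, and `npmPURL` are hypothetical names invented for illustration, and the PURL handling is deliberately simplified (a real implementation would use a PURL library and fill in fields such as locations and licenses).

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// lockFile mirrors only the fields of a lockfileVersion 1 package-lock.json
// that this sketch needs.
type lockFile struct {
	Dependencies map[string]struct {
		Version string `json:"version"`
	} `json:"dependencies"`
}

// Package is a hypothetical, cut-down stand-in for a syft package.
type Package struct {
	Name    string
	Version string
	PURL    string
}

// npmPURL builds a simplified npm package URL. Assumption: the only character
// that needs escaping is the "@" that starts a scoped package name.
func npmPURL(name, version string) string {
	return "pkg:npm/" + strings.Replace(name, "@", "%40", 1) + "@" + version
}

// parsePackageLock turns the "dependencies" map into a name-sorted package list.
func parsePackageLock(data []byte) ([]Package, error) {
	var lock lockFile
	if err := json.Unmarshal(data, &lock); err != nil {
		return nil, fmt.Errorf("unable to parse package-lock.json: %w", err)
	}
	pkgs := make([]Package, 0, len(lock.Dependencies))
	for name, dep := range lock.Dependencies {
		pkgs = append(pkgs, Package{
			Name:    name,
			Version: dep.Version,
			PURL:    npmPURL(name, dep.Version),
		})
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(pkgs, func(i, j int) bool { return pkgs[i].Name < pkgs[j].Name })
	return pkgs, nil
}

func main() {
	data := []byte(`{
	  "requires": true,
	  "lockfileVersion": 1,
	  "dependencies": {
	    "@actions/core": {"version": "1.6.0"}
	  }
	}`)
	pkgs, err := parsePackageLock(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pkgs {
		fmt.Printf("%+v\n", p)
	}
}
```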

Installed

The second type of cataloger catalogs installed packages. These catalogers are used by default when scanning container images ("image catalogers"). They typically parse the files that package managers leave behind when they install software packages (e.g. the RPM database, Python egg or wheel metadata, etc.). The default list used for image scanning can be found here:

syft/README.md

Lines 166 to 183 in 38d5ef2

##### Image Scanning:
- alpmdb
- apkdb
- binary
- dotnet-deps
- dpkgdb
- go-module-binary
- graalvm-native-image
- java
- javascript-package
- linux-kernel
- nix-store
- php-composer-installed
- portage
- python-package
- rpm-db
- ruby-gemspec
- sbom

Example:

The ALPM cataloger searches for desc files using the following glob:
**/var/lib/pacman/local/**/desc. Here is a quick primer on glob matching.

So, given a desc file like the one below:

%NAME%
gmp

%VERSION%
6.2.1-2

%BASE%
gmp

%DESC%
A free library for arbitrary precision arithmetic

%URL%
https://gmplib.org/

...........

Syft would construct:

{
	Name:    "gmp",
	Version: "6.2.1-2",
	Type:    pkg.AlpmPkg,
	FoundBy: "alpmdb-cataloger",
	MetadataType: "AlpmMetadata",
}
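
As a rough illustration of how a desc file like the one above could be read, here is a self-contained sketch. It is not syft's ALPM cataloger: `parseDesc`, `parseDescString`, and `first` are hypothetical helpers, and the sketch assumes the layout shown above, where each field is a `%KEY%` line followed by value lines and terminated by a blank line.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseDesc collects the value lines that follow each %KEY% header.
// A blank line ends the current field.
func parseDesc(r io.Reader) (map[string][]string, error) {
	fields := make(map[string][]string)
	scanner := bufio.NewScanner(r)
	key := ""
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case line == "":
			key = "" // end of the current field
		case len(line) > 2 && strings.HasPrefix(line, "%") && strings.HasSuffix(line, "%"):
			key = strings.Trim(line, "%") // start of a new field
		case key != "":
			fields[key] = append(fields[key], line)
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, fmt.Errorf("unable to read desc file: %w", err)
	}
	return fields, nil
}

// parseDescString is a convenience wrapper for in-memory content.
func parseDescString(content string) (map[string][]string, error) {
	return parseDesc(strings.NewReader(content))
}

// first returns the first value of a field, or "" when the field is absent.
func first(fields map[string][]string, key string) string {
	if values := fields[key]; len(values) > 0 {
		return values[0]
	}
	return ""
}

func main() {
	desc := "%NAME%\ngmp\n\n%VERSION%\n6.2.1-2\n\n%URL%\nhttps://gmplib.org/\n"
	fields, err := parseDescString(desc)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(first(fields, "NAME"), first(fields, "VERSION"))
}
```

The name and version extracted this way correspond to the `Name` and `Version` fields of the package shown above.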

IMPORTANT

Make sure to read through different cataloger examples to be sure you're including all required information for a syft package. The above examples have been cut down for brevity, but other important fields like purl, licenses, and locations should also be considered.

How to write a cataloger

Before you start implementing a cataloger, you need to determine which of the above flavors you are trying to build. If you’re not sure, feel free to tag @anchore/tools in the issue - we’re always happy to answer questions or help with the design of new features.

Examples

Here are some good examples of recently added catalogers to work from when considering a contribution:

  1. The GitHub Actions and workflow cataloger is the most recent: Add GitHub actions and shared workflow usage catalogers #2140
  2. The R cataloger is a good example of a new image cataloger: feat: Add R cataloger #1790
  3. The Haskell cataloger: Update haskell cataloger to use updated generic cataloger #1290

Components of a Cataloger

Let's take a look at a single cataloger and its constituent components and how it gets wired up into syft, starting with the Haskell cataloger:

The parser function

This function does all of the work in a cataloger. It takes an io.Reader for a file that contains the content to be cataloged, in this case a stack.yaml file.

The cataloger object itself

This object pairs up a parser function with one or more globs to files that should be cataloged, in this case **/stack.yaml files. For a primer on globs see the previous link from this issue.
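
The pairing of a parser function with globs can be sketched in a self-contained way as follows. This only imitates the shape described above and is not syft's API: `Package`, `Parser`, `Cataloger`, `NewCataloger`, `WithParserByGlobs`, `catalogString`, the `**/example.lock` glob, and the toy "name version" file format are all hypothetical. The glob matching is also simplified to patterns of the form `**/<file name>`; see DEVELOPING.md for the real interfaces.

```go
package main

import (
	"fmt"
	"io"
	"path"
	"strings"
)

// Package is a hypothetical, cut-down package record.
type Package struct {
	Name, Version, FoundBy string
}

// Parser is a hypothetical parser signature: it reads one file's content.
type Parser func(r io.Reader) ([]Package, error)

// Cataloger pairs globs with the parser that handles matching files.
type Cataloger struct {
	name    string
	parsers map[string]Parser
}

func NewCataloger(name string) *Cataloger {
	return &Cataloger{name: name, parsers: make(map[string]Parser)}
}

// WithParserByGlobs registers one parser for one or more globs.
func (c *Cataloger) WithParserByGlobs(p Parser, globs ...string) *Cataloger {
	for _, g := range globs {
		c.parsers[g] = p
	}
	return c
}

// matches is a simplification: it only understands "**/<file name pattern>".
func matches(glob, filePath string) bool {
	ok, err := path.Match(strings.TrimPrefix(glob, "**/"), path.Base(filePath))
	return err == nil && ok
}

// Catalog runs the first parser whose glob matches filePath.
func (c *Cataloger) Catalog(filePath string, r io.Reader) ([]Package, error) {
	for glob, parse := range c.parsers {
		if !matches(glob, filePath) {
			continue
		}
		pkgs, err := parse(r)
		if err != nil {
			return nil, fmt.Errorf("%s: unable to parse %s: %w", c.name, filePath, err)
		}
		for i := range pkgs {
			pkgs[i].FoundBy = c.name
		}
		return pkgs, nil
	}
	return nil, nil
}

// parseExampleLock parses a toy format with one "name version" pair per line.
func parseExampleLock(r io.Reader) ([]Package, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	var pkgs []Package
	for _, line := range strings.Split(string(data), "\n") {
		if f := strings.Fields(line); len(f) == 2 {
			pkgs = append(pkgs, Package{Name: f[0], Version: f[1]})
		}
	}
	return pkgs, nil
}

// catalogString is a convenience wrapper for in-memory content.
func catalogString(c *Cataloger, filePath, content string) ([]Package, error) {
	return c.Catalog(filePath, strings.NewReader(content))
}

func main() {
	c := NewCataloger("example-lock-cataloger").
		WithParserByGlobs(parseExampleLock, "**/example.lock")
	pkgs, err := catalogString(c, "/src/app/example.lock", "gmp 6.2.1-2\n")
	fmt.Println(pkgs, err)
}
```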

The list of catalogers

Lastly, you need to wire your cataloger into the list of catalogers that syft will use at runtime. As mentioned earlier, there are two kinds of catalogers (not necessarily mutually exclusive), so you’ll need to add your cataloger to one or both lists.
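
The "one or both lists" idea can be sketched like this. The function and type names are hypothetical and do not reflect where syft actually keeps these lists; the cataloger names other than `example-lock` are taken from the README lists quoted earlier in this issue, where `python-package` appears under both directory and image scanning.

```go
package main

import "fmt"

// Cataloger is a hypothetical stand-in; only the name matters in this sketch.
type Cataloger struct {
	Name string
}

// directoryCatalogers lists catalogers used by default for directory scans.
func directoryCatalogers() []Cataloger {
	return []Cataloger{
		{Name: "javascript-lock"},
		{Name: "python-package"},
		{Name: "example-lock"}, // hypothetical new cataloger for declared packages
	}
}

// imageCatalogers lists catalogers used by default for container image scans.
func imageCatalogers() []Cataloger {
	return []Cataloger{
		{Name: "javascript-package"},
		{Name: "python-package"},
	}
}

// contains reports whether a cataloger with the given name is in the list.
func contains(list []Cataloger, name string) bool {
	for _, c := range list {
		if c.Name == name {
			return true
		}
	}
	return false
}

func main() {
	for _, name := range []string{"python-package", "example-lock"} {
		fmt.Printf("%s: directory=%t image=%t\n",
			name,
			contains(directoryCatalogers(), name),
			contains(imageCatalogers(), name))
	}
}
```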

After looking through the above examples section, the developing document is the best place to head next for details on building a new cataloger:
https://github.com/anchore/syft/blob/main/DEVELOPING.md#building-a-new-cataloger

Summary

As maintainers, we're always happy to help guide this process, so if you're interested, please feel free to tag @anchore/tools wherever you have questions.

@spiffcs added the enhancement and hacktoberfest labels on Sep 28, 2023
@wagoodman
Contributor

Cleaning up hacktoberfest activities

@wagoodman closed this as not planned on Mar 15, 2024