Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add python wheel egg relationships #2903

Merged
merged 6 commits into from
May 25, 2024
Merged

Conversation

wagoodman
Copy link
Contributor

@wagoodman wagoodman commented May 24, 2024

Partially implements #572

Adds dependency relationships between python packages installed within site-package directories. This only associates packages that are either:

  • within global python installation site-packages dir
  • within a virtual env site-packages dir (which explicitly does not link to a global site package dir)
  • across a virtual env site-packages dir and global site-packages dir (when the pyenv.cfg indicates to do this)

This adds indications of Requires-Dist, Provides-Extra, and Requires-Python to the existing pkg.PythonPackage metadata struct.

The extras field is only partially supported, in that, if package A has an extra declared that will install package B conditionally, and package B is also found to be installed, then there is a relationship explicitly created that links B dependency-of A, even if A was not installed with any extras. There are a couple of reasons to do this:

  1. there is not (as far as I can tell) a great way to really determine if a package was installed with extras in a declarative sense, really the existence of installed packages is the best way to do this
  2. many python packages have try/except clauses around dependencies that are conditionally installed via extras. If the package happens to be installed anyway, then the import will function, and it would be as if the user installed the package with that particular extra (in the simplest of cases). In that way, even if an extra is partially installed, there is a chance that that dependency would be used, so we make the relationship.

What is not supported in this PR is validating any version constraints from any Requires-Dist declaration... if the package is installed, then it is assumed that the virtual environment is valid and consistent... so pip or other tooling has already performed to correct version constraint processing during installation.

Regarding changes made to cataloger/generic: In this particular instance we could not use exclusively material that was already present on packages, we needed to gather additional material via the file.Resolver to do so. Additionally, this material pyenv.cfg is not evidence for any single package, but instead configuration for an entire virtual environment, so didn't belong as additional data attached to package metadata. For this reason a new pattern was added next to generic.Processor, called generic.ResolvingProcessor as to not make any breaking changes.

I did decide that users should be able to specify both Processors and ResolvingProcessors in the same slice of post-cataloging processor functions, so that callers can closely control the order between these two kinds of processors. This lead to additional non-exported interfaces to facilitate this (and added more complexity than I wanted) but seemed like the best way to achieve the stated goals.

Lastly, this rearranges some internal functionality for necessary re-use (specifically using the relationship.Index capability). This caused additional utility functions to be moved because of this (otherwise it would be called binary.RelationshipIndex which makes no sense).

@wagoodman wagoodman requested a review from a team May 24, 2024 15:10
@wagoodman wagoodman added the enhancement New feature or request label May 24, 2024
@wagoodman wagoodman self-assigned this May 24, 2024
@github-actions github-actions bot added the json-schema Changes the json schema label May 24, 2024
Copy link
Contributor

@spiffcs spiffcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see the relationship index being useful over in other ecosystems!

I don't have any comments blocking this to merge. I tried it on a couple of anchore projects written in python to get an idea of the before and after for what new relationships show up and it all LGTM as far as what was expected to be added appeared with this branch.

The schema bump is 🟢 so no comments there.

This comment was covered in the summary. There was a new processExecutor private interface added with a private function.

So resolvingProcessorWrapper implements processorWrapper which embeds the public ResolvingProcessor

I agree that it's complex and took me a little while to get a mental model of how everything behaved. I like how it works now and can get on board with the tradeoffs made here.

There was a dependency_test.go stubbed out syft/pkg/cataloger/python/dependency_test.go did you mean to go back and fill this out or delete it?

I tried a couple different configurations for virtual env and found syft/pkg/cataloger/python/virtual_env.go worked, but suspect if there is anything I missed it's in this file as I don't have a great mental model of all python virtual env configurations or if there is something missing here.

@wagoodman
Copy link
Contributor Author

There was a dependency_test.go stubbed out syft/pkg/cataloger/python/dependency_test.go did you mean to go back and fill this out or delete it?

indeed, I was going to add a test here but then realized the code in question was already covered... however, I have a follow up PR #2906 that will add a test in this file, so I'll leave this for now.

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman force-pushed the add-python-wheel-egg-relationships branch from b05f536 to 27eee8c Compare May 25, 2024 20:02
@wagoodman
Copy link
Contributor Author

note: force pushed to rebased on main (resolving conflicting schema changes)

@wagoodman wagoodman enabled auto-merge (squash) May 25, 2024 20:03
@wagoodman wagoodman merged commit 05e8ba9 into main May 25, 2024
11 checks passed
@wagoodman wagoodman deleted the add-python-wheel-egg-relationships branch May 25, 2024 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request json-schema Changes the json schema
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants