Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add relationships for python poetry packages #2906

Merged
merged 7 commits into from
Jun 4, 2024

Conversation

wagoodman
Copy link
Contributor

@wagoodman wagoodman commented May 25, 2024

Partially implements #572

Creates relationships for packages found within poetry.lock files. Only packages within the same poetry.lock file are created (no cross-file relationships allowed at this time). Honors python package "extras" so that optional additional package requirements may be specified. At this time the top-level package requirements residing in the pyproject.toml are not captured (thus if there are extras specified on dependencies there, they will not be considered). All test-fixtures added to test this behavior captures both the poetry.lock and pyproject.toml in case we want to extend this functionality in the future.

In order to support python "extras", the generic dependency.Specification and dependency.Resolver have been amended to consider specification variants. These variants represent all additional requirement configurations based on conditional names being present. In python's case this is represented as a package spec and a package[extra-name-goes-here] spec nested onto the first spec. A python package that provides multiple extras must make a variant spec for each extra, and a package that requires more than one extra must specify a requirement for each extra individually (e.g. package[extra1] and package[extra2] not packages[extra1,extra2]).

@wagoodman wagoodman self-assigned this May 25, 2024
@github-actions github-actions bot added the json-schema Changes the json schema label May 25, 2024
@wagoodman wagoodman requested a review from a team May 25, 2024 19:47
@wagoodman wagoodman marked this pull request as ready for review May 25, 2024 19:47
internal/set.go Outdated
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the internal string set is not being used, and also had a need for another set implementation for artifact.ID. I decided to simply replace this implementation with a generic one.

@wagoodman

This comment was marked as outdated.

@wagoodman wagoodman force-pushed the add-python-wheel-egg-relationships branch from b05f536 to 27eee8c Compare May 25, 2024 20:02
Base automatically changed from add-python-wheel-egg-relationships to main May 25, 2024 20:11
@github-actions github-actions bot removed the json-schema Changes the json schema label May 25, 2024
@github-actions github-actions bot added the json-schema Changes the json schema label May 25, 2024

// Variants allows for specifying multiple sets of provides/requires for a single package. This is useful
// in cases when you have conditional optional dependencies for a package.
Variants []Specification
Copy link
Contributor Author

@wagoodman wagoodman May 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: a design variation of this is to instead change the dependency.Specifier to return []Specification instead of just Specification, which would be simpler, but would require more changes to the existing code base -- any thoughts/opinions on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a question before I decide on which variation to land on.

Should dependency.Specifier ever be able to return more than a single instance of Specification, and if so how are those deemed to be separate instances of a Specification?

Since Specification already allows for multiple Provides []string, Requires []string

then

If dependency.Specifier returns []Specification with two distinct items:

s0 := Specification{
    Provides: [foo, bar]
    Requires: [baz]
}

s1 := Specification{
    Provides: [faa, bas]
    Requires: [bat]
}

Why are those returned as separate Specification vs the combined?

s0 := Specification{
    Provides: [foo, bar, faa, bas]
    Requires: [baz, bat]
}

Basically, at this point I don't think I understand yet circumstances where a Specifier would return a list vs a single merged instance when the struct doesn't have any other fields that label its origin or unique value that it originated from

Copy link
Contributor Author

@wagoodman wagoodman Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Short answer:if you combine these together into a single specification then you don't know which conditional "extra" is associated with which "provides" hint.

I think the python package extras features motivates this nicely. You can install a python package foo with pip install foo. But there may be optional features that you may choose to install (provided by the package)... say this package has the ability to work with the aws api, but not by default, since it wouldn't install the boto3 aws sdk... unless you specify the correct "extra" indication in the installation step:

pip install foo[aws]

This assumes that the package author provides the extra called "aws". This extra comes registered a set of required packages that also must be installed... such as boto3. So:

pip install foo
# install standard dependencies (baz)

pip install foo[aws]
# install standard dependencies (baz) + boto3 package and it's dependencies

So! that means, in terms of the dependency.Specification this means that the foo package can be broken into distinct requirements:

fooStandard := Specification{
    Provides: []string{"foo"},
    Requires: []string{"baz"},
}

fooWithAwsPartial := Specification{
    Provides: []string{"foo[aws]"},
    Requires: []string{"boto3"},
}

This allows for matching on each extra separately (so pip install foo[aws,gcp,azure] we'd look for 4 package specs: foo, foo[aws], foo[gcp], foo[azure]).

If a user only specifies foo[gcp], and we combined all 4 together... then it would appear that the specification both provides all variants and requires all dependencies, which is not true!

// this is not true
all := Specification{
    Provides: []string{"foo", "foo[aws]", "foo[gcp]", "foo[azure]"},
    Requires: []string{"batz", "boto3", "gcp-sdk", "azure-sdk"},
}

What we're really looking for is:

standard := Specification{
    Provides: []string{"foo"},
    Requires: []string{"batz"},
}

aws := Specification{
    Provides: []string{"foo[aws]"},
    Requires: []string{"boto3"},
}

gcp := Specification{
    Provides: []string{"foo[gcp]"},
    Requires: []string{"gcp-sdk"},
}

azure := Specification{
    Provides: []string{"foo[azure]"},
    Requires: []string{"azure-sdk"},
}

Where the assumption is that the matching process will specifically look for all parts of the request... so a user input of foo[aws] will result in a specification search for both foo and foo[aws] partials.

The current data structure allows for:

all := Specification{
    Provides: []string{"foo"},
    Requires: []string{"batz"},
    Variant: []Specification{aws, gcp, azure}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some offline discussion we decided that:

  • An infinitely recursive data structure isn't useful nor simple to reason about if it happens to be leveraged (specs having specs)
  • Changing Specifier to return []Specification implies that returning no specification is valid, however, technically it is not. There should always be at least a single specification that provides the package in question and requires nothing.

Instead we elected to modify the existing spec to this:

type Specification struct {
	ProvidesRequires
	Variants []ProvidesRequires
}

type ProvidesRequires struct {
	Provides []string
	Requires []string
}

spiffcs
spiffcs previously approved these changes Jun 3, 2024
Copy link
Contributor

@spiffcs spiffcs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 given the tests look correct from my understanding of what edges we're trying to add with this PR

I asked some questions around the Specifier design decisions that you asked for feedback on.

This might be easier to talk about on a sync review where I write up the conclusion from that as the final note on this PR


// Variants allows for specifying multiple sets of provides/requires for a single package. This is useful
// in cases when you have conditional optional dependencies for a package.
Variants []Specification
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a question before I decide on which variation to land on.

Should dependency.Specifier ever be able to return more than a single instance of Specification, and if so how are those deemed to be separate instances of a Specification?

Since Specification already allows for multiple Provides []string, Requires []string

then

If dependency.Specifier returns []Specification with two distinct items:

s0 := Specification{
    Provides: [foo, bar]
    Requires: [baz]
}

s1 := Specification{
    Provides: [faa, bas]
    Requires: [bat]
}

Why are those returned as separate Specification vs the combined?

s0 := Specification{
    Provides: [foo, bar, faa, bas]
    Requires: [baz, bat]
}

Basically, at this point I don't think I understand yet circumstances where a Specifier would return a list vs a single merged instance when the struct doesn't have any other fields that label its origin or unique value that it originated from

syft/pkg/cataloger/internal/dependency/resolver.go Outdated Show resolved Hide resolved
@wagoodman wagoodman dismissed spiffcs’s stale review June 3, 2024 21:24

I think there are still design questions in the air

@spiffcs
Copy link
Contributor

spiffcs commented Jun 4, 2024

🟢 🥳 Nice change on making the Specification structure easier to understand and not potentially infinite

@wagoodman wagoodman enabled auto-merge (squash) June 4, 2024 19:22
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman merged commit 3472b48 into main Jun 4, 2024
11 checks passed
@wagoodman wagoodman deleted the add-poetry-relationships branch June 4, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
json-schema Changes the json schema
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants