Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make python requirements.txt parser more inclusive #1597

Conversation

manifestori
Copy link
Contributor

Hi :)

This is my first contribution to the fantastic Syft CLI. I tried my best to keep to the standard 馃懐

Today, when parsing the requirements.txt file, Syft ignores unpinned packages when creating an SBOM.
In most cases, this is OK. However, if we want to use the generated SBOM for vulnerability detection, we won't be able to know if an asset contains "any" occurrences of a skipped package.

I know that unpinned version cannot suggest the exact package version, which is problematic. However, I used the "closest version" approach to at least enable CPE matching when searching for vulnerabilities.

This behavior exists in other generators such as cdxgen

Thank you for your time :))

Signed-off-by: manifestori <ori@manifestcyber.com>
Signed-off-by: manifestori <ori@manifestcyber.com>
@manifestori manifestori force-pushed the feat-python-requirementstxt-parsing branch from 2590aaa to 23acfe5 Compare February 21, 2023 12:06
@wagoodman
Copy link
Contributor

Sorry for the delay on this PR -- thanks for the contributing!

I'm a little hesitant to guess at a specific version for these dependencies. We're starting to add more cataloger specific configurations in the future, I feel that this would be a great opt-in functionality, where by default there would still be no guessing, but an app config could allow for turning it on:

# example .syft.yaml

...
python:
  # when running across entries in requirements.txt that do not specify a specific 
  # version, attempt to guess what the version could be based on the version 
  # requirements specified.
  #
  # allowable input: "lowest-version", "highest-version", "false", "true"
  # note: "true" is an alias for "lowest-version"
  # default: "false"
  guess-unpinned-requirements: true

@wagoodman wagoodman self-assigned this Apr 6, 2023
@wagoodman
Copy link
Contributor

wagoodman commented Apr 6, 2023

Since this would be opening up what's expressible for python packages from requirements.txt this is probably a good time to start capturing more metadata from the requirements line too, something like:

The metadata could include all elements from https://peps.python.org/pep-0508/

type PythonRequirementsMetadata struct {
  name string
  extras []string
  versionConstraint string
  url string
  markers map[string]string
}

So for example...

# example requirements.txt
requests [security] >= 2.8.1, == 2.8.* ; python_version < "2.7"
{
  "name": "requests[security]"
  "version": "2.8.1",
  "metadata": {
    "name": "requests",
    "extras": [ "security" ],
    "version": ">= 2.8.1, == 2.8.*",
    "markers": {
      "python_version": "< \"2.7\""
    }
  }
}

Would all this parsing be due in this PR? Not at all. But adding a new PythonRequirementsMetadata that at least had the versionConstraint captured for folks downstream to understand what the original requirements were would be awesome in this PR (I felt the need to motivate the remaining options for the future here).

I can help out with this and push code to this branch if you wanted help / didn't mind.

Edit: this was implemented in #1759 , but on rebase this could nicely leverage it 馃憤

@wagoodman
Copy link
Contributor

Now that #1759 is merged, we'll need to account for URLs that are parsed to see if we can extract tags from URLs in cases where == is not specified in the version constraint:

# requirements.txt

package-one @ git+https://github.com/owner/repo@main

package-two @ git+https://github.com/owner/repo@0.1

package-three @ git+https://github.com/owner/repo@releases/tag/v3.7.1

@manifestori
Copy link
Contributor Author

Sorry for my delay; I've lost track of this PR and want to review all your comments. Pinging back soon :)

@wagoodman
Copy link
Contributor

no worries @manifestori, I was ready to rebase, add a little bit of code, and bring it across the finish line in case you were short on time (I got to this PR rather late 馃槼 ). Up to you!

@wagoodman
Copy link
Contributor

Hey @manifestori I went to rebase this and push more changes however it seems that this PR might not be enabled to allow maintainers to modify the PR. I could change the target branch to a feature branch in this repo, merge it, then make further adjustments in a subsequent PR of the in-repo feature branch. Are you ok with this?

@manifestori
Copy link
Contributor Author

wagoodman

Hey @manifestori I went to rebase this and push more changes however it seems that this PR might not be enabled to allow maintainers to modify the PR. I could change the target branch to a feature branch in this repo, merge it, then make further adjustments in a subsequent PR of the in-repo feature branch. Are you ok with this?

I was afk for awhile, yeah go ahead. as long as people benefit from this change, this would be amazing :)

@wagoodman wagoodman changed the base branch from main to feat-python-requirementstxt-parsing July 27, 2023 12:48
Copy link
Contributor

@wagoodman wagoodman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! I've changed the base branch to an in-repo branch so I can incorporate a rebase and configuration updates as discussed in the comments above.

@wagoodman wagoodman changed the base branch from feat-python-requirementstxt-parsing to main July 27, 2023 12:51
@wagoodman wagoodman changed the base branch from main to feat-python-requirementstxt-parsing July 27, 2023 12:52
@wagoodman wagoodman added enhancement New feature or request changelog-ignore Don't include this issue in the release changelog labels Jul 27, 2023
@wagoodman wagoodman merged commit 7d73751 into anchore:feat-python-requirementstxt-parsing Jul 27, 2023
@wagoodman wagoodman removed the changelog-ignore Don't include this issue in the release changelog label Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants