I see that the `PackageContentType` value is set in various collectors, but it is missing for the PyPI one.
Most Python packages usually have one or more binary wheels and a source distribution (tar.gz, zip, ...) available on PyPI.
It would be very useful to have the `package_content` value for PyPI packages on the PurlDB API consumer side (DejaCode, for example) to easily select the source as the primary package when multiple records are available (usually 2 or more per PURL).
For example, for `pkg:pypi/boto3@1.37.26`, 2 records are available in the PurlDB:

- `pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26-py3-none-any.whl`
- `pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26.tar.gz`

Those 2 are properly part of the same PackageSet, but lack a `package_content` value.
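To illustrate the consumer-side benefit, here is a minimal sketch of how an API consumer could pick the primary package once `package_content` is populated. It assumes each record is a plain dict with `purl` and `package_content` keys, and it uses a hypothetical `ContentType` enum as a stand-in for the real `PackageContentType`, since the exact serialized form returned by the API is not confirmed here.

```python
from enum import Enum


class ContentType(Enum):
    # Hypothetical stand-in for packagedb.models.PackageContentType
    SOURCE_ARCHIVE = "source_archive"
    BINARY = "binary"


def select_primary_package(records):
    """Return the first source archive record, else the first record, else None."""
    for record in records:
        if record.get("package_content") == ContentType.SOURCE_ARCHIVE:
            return record
    # No source archive available: fall back to the first record, if any
    return records[0] if records else None


records = [
    {
        "purl": "pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26-py3-none-any.whl",
        "package_content": ContentType.BINARY,
    },
    {
        "purl": "pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26.tar.gz",
        "package_content": ContentType.SOURCE_ARCHIVE,
    },
]

primary = select_primary_package(records)
print(primary["purl"])  # pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26.tar.gz
```

Falling back to the first record when no source archive exists (wheel-only releases) is one possible choice; a consumer could equally return `None` and let the user decide.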
For the PyPI collector, something along these lines could be implemented (it would need to be adapted and tested, though):
```python
from pathlib import Path
from urllib.parse import urlparse

from packagedb.models import PackageContentType


def get_pypi_package_content_type(download_url):
    # Source distributions (sdists) vs. binary distributions (wheels, eggs)
    source_extensions = (".tar.gz", ".zip", ".tar.bz2", ".tar.xz", ".tar.Z", ".tgz", ".tbz")
    binary_extensions = (".whl", ".egg")

    # Extract the file name from the download URL path
    filename = Path(urlparse(download_url).path).name

    if filename.endswith(source_extensions):
        return PackageContentType.SOURCE_ARCHIVE
    if filename.endswith(binary_extensions):
        return PackageContentType.BINARY
    # Unknown file type: leave package_content unset
    return None
```
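Since the boto3 records above already carry the file name in the purl `file_name` qualifier, the same extension check could also be exercised without a download URL. This is a self-contained sketch under stated assumptions: `get_content_type_from_purl` is a hypothetical helper that parses the qualifier with the standard library rather than a purl library, and plain strings stand in for the `PackageContentType` members so it runs outside PurlDB.

```python
from urllib.parse import parse_qs

SOURCE_EXTENSIONS = (".tar.gz", ".zip", ".tar.bz2", ".tar.xz", ".tar.Z", ".tgz", ".tbz")
BINARY_EXTENSIONS = (".whl", ".egg")


def get_content_type_from_purl(purl):
    """Hypothetical helper: classify a PyPI purl by its file_name qualifier."""
    # Qualifiers sit between "?" and an optional "#subpath"
    _, _, rest = purl.partition("?")
    qualifiers = parse_qs(rest.partition("#")[0])
    filename = qualifiers.get("file_name", [""])[0]
    if filename.endswith(SOURCE_EXTENSIONS):
        return "SOURCE_ARCHIVE"  # stand-in for PackageContentType.SOURCE_ARCHIVE
    if filename.endswith(BINARY_EXTENSIONS):
        return "BINARY"  # stand-in for PackageContentType.BINARY
    return None  # no file_name qualifier or unknown extension


for purl in (
    "pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26-py3-none-any.whl",
    "pkg:pypi/boto3@1.37.26?file_name=boto3-1.37.26.tar.gz",
):
    print(purl, get_content_type_from_purl(purl))
```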