enh(sbom-licenses): refactor common code; enable multi-version diffs#1597
Conversation
Signed-off-by: Will Killian <wkillian@nvidia.com>
WalkthroughCentralizes PyPI license resolution into Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 116-122: The current logic uses the sets added and removed
directly which leads to nondeterministic pairing and drops packages when the
sets differ in size; change the block so you first convert and sort the
candidates into deterministic lists (e.g., sort removed_list and added_list by a
stable key such as version or package name extracted from base_variants[...] and
head_variants[...]), then pair up to min(len(removed_list), len(added_list)) and
append those pairs to changed_entries, and finally extend added_entries with any
remaining items from added_list and removed_entries with any remaining items
from removed_list; update the code paths that reference added, removed,
base_variants, head_variants, changed_entries, added_entries, and
removed_entries accordingly.
In `@ci/scripts/package_utils.py`:
- Around line 21-28: The TypedDict Package currently declares version: str as
required but runtime code treats it as optional; change the declaration to make
version optional using typing.NotRequired (e.g., replace "version: str" with
"version: typing.NotRequired[str]") and ensure typing.NotRequired is imported;
leave UvLock as-is. Update any callsites (e.g., package_variant_key and usages
in license_diff.py) that rely on pkg.get("version") remain unchanged but now
type-check correctly. Run the type checker to confirm no further TypedDict
errors.
- Line 62: Add a single trailing newline to the end of the file that contains
the statement returning typing.cast(str, min(candidates, key=len,
default="(License not found)")); locate the return expression using the unique
symbols typing.cast and min(candidates, key=len, default="(License not found)")
and ensure the file ends with exactly one newline character (no extra blank
lines).
- Around line 46-51: The URL fetch in package_utils.py uses request.urlopen(url)
without a timeout which can hang CI; update the call to include a reasonable
timeout (e.g., timeout=10) when calling request.urlopen(url, timeout=10) and
ensure exception handling covers timeout/URLError (the existing except
Exception: return "(License not found)" is fine but you can optionally catch
socket.timeout/URLError explicitly); reference the variables/url building around
url, name, version and the request.urlopen/json.load block to locate and update
the code.
In `@ci/scripts/sbom_list.py`:
- Around line 47-48: The docstring is stale: the function signature returns None
and writes its output to output_path, yet the "Returns" section still states
"Path to the generated SBOM list file." Update the function's docstring (the
function with signature "-> None" that writes to output_path) to either remove
the incorrect "Returns" entry or replace it with "Returns: None" and a brief
note that the SBOM list is written to the provided output_path, ensuring the
docstring accurately matches the function behavior.
🧹 Nitpick comments (4)
ci/scripts/license_diff.py (2)
72-73: Redundantiter()calls —groupbyalready returns an iterator.
head_by_nameandbase_by_nameare already iterators (returned byitertools.groupby), so wrapping them initer()is a no-op.Proposed cleanup
- # iterators over the grouped entries - heads: Iterator[tuple[str, Iterator[Package]]] = iter(head_by_name) - bases: Iterator[tuple[str, Iterator[Package]]] = iter(base_by_name) - - # cursors over the grouped entries - current_head: tuple[str, Iterator[Package]] | None = next(heads, None) - current_base: tuple[str, Iterator[Package]] | None = next(bases, None) + # cursors over the grouped entries + current_head: tuple[str, Iterator[Package]] | None = next(head_by_name, None) + current_base: tuple[str, Iterator[Package]] | None = next(base_by_name, None)Then replace
next(heads, None)→next(head_by_name, None)andnext(bases, None)→next(base_by_name, None)on lines 87, 95, 104, 105.
59-60:UPPER_CASEnaming for a local variable is misleading.
FILTERED_PACKAGE_PREFIXESuses module-constant naming but is declared inside a function body. Either move it to module scope as a true constant or rename it tofiltered_package_prefixesper PEP 8 conventions for locals. As per coding guidelines,UPPER_CASEis reserved for constants.ci/scripts/sbom_list.py (2)
71-71: Specifyencoding="utf-8"when writing the TSV.
open(output_path, "w")uses the platform default encoding, which may not be UTF-8 on all systems (e.g., Windows). Package names and license strings can contain non-ASCII characters.Proposed fix
- with open(output_path, "w") as f: + with open(output_path, "w", encoding="utf-8") as f:
21-25: Import grouping nit:argparseis separated from the other stdlib imports by a blank line.PEP 8 groups all standard-library imports together. Move
argparseinto the same block ascsv,tomllib, etc.Proposed fix
-import argparse - -import csv +import argparse +import csv
Signed-off-by: Will Killian <wkillian@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 149-150: The bare "except KeyError: pass" around accesses like
head_pkg["name"] and the variant key lookup should be narrowed and replaced with
explicit handling: limit the try/except to the specific dict access, catch
KeyError and log which package/variant was skipped (include head_pkg or its repr
and the missing key) instead of silently passing, or pre-check keys using "in"
before indexing; update the block around head_pkg["name"]/variant lookup to use
logger.warning (or print) so missing-key incidents are visible and debuggable.
- Around line 63-64: The grouping uses itertools.groupby on head["package"] and
base["package"] which requires sorted input; sort each package list by the
"name" key before calling itertools.groupby to avoid fragmented groups and
incorrect merge-join results. Specifically, replace direct use of
head["package"]/base["package"] with their sorted equivalents (sorted(...,
key=itemgetter("name"))) and then call itertools.groupby to produce head_by_name
and base_by_name so the downstream merge-join logic correctly classifies
added/removed packages.
🧹 Nitpick comments (3)
ci/scripts/license_diff.py (1)
59-60: MoveFILTERED_PACKAGE_PREFIXESto module level.UPPER_CASE names conventionally denote module-level constants. Defining it inside
main()re-creates the list on every call and obscures it from discoverability. Consider moving it to the top of the module after imports.ci/scripts/sbom_list.py (2)
68-68: Specifyencoding="utf-8"when writing the TSV.
open(output_path, "w")uses the platform default encoding, which varies across systems. For a CI artifact, pin to UTF-8 for reproducibility.Proposed fix
- with open(output_path, "w") as f: + with open(output_path, "w", encoding="utf-8") as f:
89-90: Lines exceed the 120-character column limit.Both
add_argumentcalls are over 120 columns. As per coding guidelines, yapf is configured withcolumn_limit = 120.Proposed fix
- parser.add_argument("--uvlock", type=Path, help="Path to the lockfile to process. Defaults to 'uv.lock'.", default="uv.lock") - parser.add_argument("--output", type=Path, help="Path to the output file. Defaults to 'sbom_list.tsv'.", default="sbom_list.tsv") + parser.add_argument("--uvlock", + type=Path, + help="Path to the lockfile to process. Defaults to 'uv.lock'.", + default="uv.lock") + parser.add_argument("--output", + type=Path, + help="Path to the output file. Defaults to 'sbom_list.tsv'.", + default="sbom_list.tsv")
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
mnajafian-nv
left a comment
There was a problem hiding this comment.
LGTM! plus inline nits
|
/merge |
Description
Closes
By Submitting this PR I confirm:
Summary by CodeRabbit