
enh(sbom-licenses): refactor common code; enable multi-version diffs#1597

Merged
rapids-bot[bot] merged 6 commits into NVIDIA:develop from willkill07:wkk_enhance-license-tools
Feb 13, 2026


@willkill07
Member

@willkill07 willkill07 commented Feb 12, 2026

Description

  • refactor common code to a single python file
  • enable flags for options / output
  • add missing support for duplicate versions of same package for diff

Closes

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features
    • CLI now accepts configurable input/output paths and an optional base branch (default "develop"); SBOM list is written to the specified output file.
  • Chores
    • Centralized license resolution into a shared utility for consistent license lookup and reporting.
  • Bug Fixes
    • Improved package diffing with name-grouped, filtered comparisons, deduplicated SBOM entries, and clearer added/removed/changed version and license deltas.
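The optional base branch for license_diff.py can be expressed in argparse as a positional argument with nargs="?", which falls back to its default when omitted. The sketch below assumes that approach; build_parser and the description text are hypothetical, and only the "develop" default comes from the summary above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring license_diff.py's optional base_branch argument."""
    parser = argparse.ArgumentParser(description="Diff package licenses against a base branch.")
    # nargs="?" makes the positional optional; the default is used when it is omitted.
    parser.add_argument("base_branch",
                        nargs="?",
                        default="develop",
                        help="Branch to compare against. Defaults to 'develop'.")
    return parser


args = build_parser().parse_args(["main"])
print(args.base_branch)  # main
```

Calling parse_args([]) instead yields base_branch == "develop".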

Signed-off-by: Will Killian <wkillian@nvidia.com>
@willkill07 willkill07 requested a review from a team as a code owner February 12, 2026 17:17
@willkill07 willkill07 added improvement Improvement to existing functionality non-breaking Non-breaking change labels Feb 12, 2026
@coderabbitai

coderabbitai bot commented Feb 12, 2026

Walkthrough

Centralizes PyPI license resolution into ci/scripts/package_utils.py; removes in-file license logic from ci/scripts/license_diff.py and ci/scripts/sbom_list.py; refactors lockfile parsing and diffing to typed Package/UvLock structures, single-pass grouped comparisons, and parameterized SBOM output.

Changes

  • New utility (ci/scripts/package_utils.py): Adds Package and UvLock TypedDicts, a package_variant_key() helper, and pypi_license(name, version=None), which fetches PyPI metadata and returns a best-effort license string with robust error handling.
  • License diff (ci/scripts/license_diff.py): Removed the local pypi_license; imports license/type helpers from package_utils. Replaced raw dict diffing with a name-grouped, single-pass comparison (itertools.groupby), added internal-prefix filtering and per-variant logic, and reworked output formatting and the CLI (optional base_branch positional, default "develop").
  • SBOM generator (ci/scripts/sbom_list.py): Removed the local pypi_license; imports UvLock and pypi_license. Introduces a public SbomEntry TypedDict. process_uvlock() now accepts a UvLock and an output_path, deduplicates by (name, version), resolves licenses via pypi_license, and writes the consolidated SBOM to the provided path. The CLI now accepts --uvlock and --output.
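A minimal sketch of the (name, version) deduplication step, assuming each lockfile entry is a dict with a name and an optional version, and with the license resolver passed in as a callable so pypi_license (or a stub) can be plugged in. dedupe_sbom and the SbomEntry field names are hypothetical; the real process_uvlock() is not shown in this thread.

```python
from collections.abc import Callable, Iterable
from typing import TypedDict


class SbomEntry(TypedDict):
    """One SBOM row (hypothetical field names)."""
    name: str
    version: str
    license: str


def dedupe_sbom(packages: Iterable[dict[str, str]],
                resolve_license: Callable[[str, str], str]) -> list[SbomEntry]:
    """Keep the first entry for each (name, version) and resolve its license once."""
    seen: set[tuple[str, str]] = set()
    entries: list[SbomEntry] = []
    for pkg in packages:
        name, version = pkg["name"], pkg.get("version", "")
        if (name, version) in seen:
            continue  # repeated (name, version): skip so the license is looked up only once
        seen.add((name, version))
        entries.append(SbomEntry(name=name, version=version, license=resolve_license(name, version)))
    return entries
```

Deduplicating before resolving means each distinct (name, version) triggers at most one license lookup, which matters when the resolver makes a network request.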

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title is concise, descriptive, and uses imperative mood, clearly summarizing the main changes: refactoring common license-related code into a shared module and enabling multi-version package diffs.
  • Docstring coverage: ✅ Passed. Docstring coverage is 80.00%, which meets the required 80.00% threshold.
  • Merge conflict detection: ✅ Passed. No merge conflicts detected when merging into develop.



No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
ci/scripts/sbom_list.py (1)

66-70: Add newline="" and explicit encoding to the open() call for csv.writer.

Per the Python csv module documentation, files opened for writing should specify newline="" to prevent the writer from double-translating line endings on Windows. Adding encoding="utf-8" is also good practice for reproducible output.

Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", newline="", encoding="utf-8") as f:
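The effect of both arguments can be seen in a small TSV writer. write_sbom_tsv and the column names are hypothetical, not the actual sbom_list.py code; the open() call matches the proposed fix.

```python
import csv
from pathlib import Path


def write_sbom_tsv(rows: list[dict[str, str]], output_path: Path) -> None:
    """Write SBOM rows as a tab-separated file (hypothetical columns)."""
    # newline="" lets csv.writer control line endings itself (it emits "\r\n" by default),
    # so text-mode newline translation cannot turn them into "\r\r\n" on Windows.
    # encoding="utf-8" keeps the artifact identical regardless of the platform locale.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "version", "license"])
        for row in rows:
            writer.writerow([row["name"], row["version"], row["license"]])
```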



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 116-122: The current logic uses the sets added and removed
directly, which leads to nondeterministic pairing and drops packages when the
sets differ in size. Change the block so that it first converts the
candidates into deterministically sorted lists (e.g., sort removed_list and added_list by a
stable key such as the version or package name extracted from base_variants[...] and
head_variants[...]), then pairs up to min(len(removed_list), len(added_list)) items and
appends those pairs to changed_entries, and finally extends added_entries with any
remaining items from added_list and removed_entries with any remaining items
from removed_list. Update the code paths that reference added, removed,
base_variants, head_variants, changed_entries, added_entries, and
removed_entries accordingly.
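A minimal sketch of the suggested pairing, assuming the removed and added variants have already been reduced to version strings; pair_variants is a hypothetical helper. It uses a plain string sort for determinism, so "1.10" sorts before "1.9"; a PEP 440-aware key such as packaging.version.Version would pair versions in release order instead.

```python
def pair_variants(removed: set[str], added: set[str]) -> tuple[list[tuple[str, str]], list[str], list[str]]:
    """Pair removed/added versions deterministically; leftovers stay added or removed.

    Returns (changed, added_only, removed_only).
    """
    # Sorting fixes the iteration order, so the same inputs always produce the same pairs.
    removed_list = sorted(removed)
    added_list = sorted(added)
    n = min(len(removed_list), len(added_list))
    changed = list(zip(removed_list[:n], added_list[:n]))
    # Nothing is dropped when the sets differ in size: the surplus is reported separately.
    return changed, added_list[n:], removed_list[n:]
```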

In `@ci/scripts/package_utils.py`:
- Around line 21-28: The TypedDict Package currently declares version: str as
required, but runtime code treats it as optional. Make version optional using
typing.NotRequired (e.g., replace "version: str" with
"version: typing.NotRequired[str]") and ensure typing is imported
(NotRequired requires Python 3.11+). Leave UvLock as-is. Callsites that read the
field with pkg.get("version") (e.g., package_variant_key and usages in
license_diff.py) can remain unchanged and will now type-check correctly. Run the
type checker to confirm no further TypedDict errors.
- Line 62: Add a single trailing newline to the end of the file that contains
the statement returning typing.cast(str, min(candidates, key=len,
default="(License not found)")); locate the return expression using the unique
symbols typing.cast and min(candidates, key=len, default="(License not found)")
and ensure the file ends with exactly one newline character (no extra blank
lines).
- Around line 46-51: The URL fetch in package_utils.py calls request.urlopen(url)
without a timeout, which can hang CI. Pass a reasonable timeout (e.g.,
request.urlopen(url, timeout=10)). The existing except Exception: return
"(License not found)" already covers timeouts and URLError, but you can optionally
catch socket.timeout/URLError explicitly. The change belongs in the
request.urlopen/json.load block that follows the construction of url from name
and version.

In `@ci/scripts/sbom_list.py`:
- Around line 47-48: The docstring is stale: the function signature returns None
and writes its output to output_path, yet the "Returns" section still states
"Path to the generated SBOM list file." Update the function's docstring (the
function with signature "-> None" that writes to output_path) to either remove
the incorrect "Returns" entry or replace it with "Returns: None" and a brief
note that the SBOM list is written to the provided output_path, ensuring the
docstring accurately matches the function behavior.
🧹 Nitpick comments (4)
ci/scripts/license_diff.py (2)

72-73: Redundant iter() calls — groupby already returns an iterator.

head_by_name and base_by_name are already iterators (returned by itertools.groupby), so wrapping them in iter() is a no-op.

Proposed cleanup
-    # iterators over the grouped entries
-    heads: Iterator[tuple[str, Iterator[Package]]] = iter(head_by_name)
-    bases: Iterator[tuple[str, Iterator[Package]]] = iter(base_by_name)
-
-    # cursors over the grouped entries
-    current_head: tuple[str, Iterator[Package]] | None = next(heads, None)
-    current_base: tuple[str, Iterator[Package]] | None = next(bases, None)
+    # cursors over the grouped entries
+    current_head: tuple[str, Iterator[Package]] | None = next(head_by_name, None)
+    current_base: tuple[str, Iterator[Package]] | None = next(base_by_name, None)

Then replace next(heads, None) with next(head_by_name, None) and next(bases, None) with next(base_by_name, None) on lines 87, 95, 104, and 105.


59-60: UPPER_CASE naming for a local variable is misleading.

FILTERED_PACKAGE_PREFIXES uses module-constant naming but is declared inside a function body. Either move it to module scope as a true constant or rename it to filtered_package_prefixes per PEP 8 conventions for locals. As per coding guidelines, UPPER_CASE is reserved for constants.
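Moving the prefixes to module scope might look like the following. The prefix values are placeholders, since the actual list is not shown, and is_reported is a hypothetical helper. A tuple also has a practical advantage: str.startswith accepts a tuple of prefixes, whereas passing a list raises TypeError.

```python
# Placeholder prefixes for internal packages; the real list is not shown in the review.
FILTERED_PACKAGE_PREFIXES: tuple[str, ...] = ("example-internal-", "example-plugin-")


def is_reported(name: str) -> bool:
    """Return True if the package should appear in the license diff."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return not name.startswith(FILTERED_PACKAGE_PREFIXES)
```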

ci/scripts/sbom_list.py (2)

71-71: Specify encoding="utf-8" when writing the TSV.

open(output_path, "w") uses the platform default encoding, which may not be UTF-8 on all systems (e.g., Windows). Package names and license strings can contain non-ASCII characters.

Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", encoding="utf-8") as f:

21-25: Import grouping nit: argparse is separated from the other stdlib imports by a blank line.

PEP 8 groups all standard-library imports together. Move argparse into the same block as csv, tomllib, etc.

Proposed fix
-import argparse
-
-import csv
+import argparse
+import csv

@willkill07 willkill07 self-assigned this Feb 12, 2026
Signed-off-by: Will Killian <wkillian@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@ci/scripts/license_diff.py`:
- Around line 149-150: The silent "except KeyError: pass" around accesses like
head_pkg["name"] and the variant key lookup should be narrowed and made explicit:
limit the try/except to the specific dict access, or pre-check keys with "in"
before indexing, and log which package/variant was skipped (include head_pkg or its
repr and the missing key) instead of silently passing. Use logger.warning (or print)
so missing-key incidents are visible and debuggable.
- Around line 63-64: The grouping calls itertools.groupby on head["package"] and
base["package"] directly. groupby only merges adjacent items with equal keys, so it
requires input sorted by that key; otherwise groups fragment and the merge-join
misclassifies added/removed packages. Sort each package list by the "name" key
first (sorted(..., key=itemgetter("name"))) and pass the sorted lists to
itertools.groupby to produce head_by_name and base_by_name, so the downstream
merge-join logic classifies added/removed packages correctly.
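A sketch of the sorted groupby followed by a merge-join, reduced to classifying package names; grouped and diff_names are hypothetical helpers, not the code in license_diff.py. Each group is materialized with list() because advancing a groupby iterator invalidates the previous group. Without the sort, itertools.groupby(["b", "a", "b"]) yields the key "b" twice.

```python
import itertools
from collections.abc import Iterator
from operator import itemgetter

Pkg = dict[str, str]


def grouped(packages: list[Pkg]) -> Iterator[tuple[str, list[Pkg]]]:
    """Group packages by name; groupby only merges adjacent items, so sort first."""
    ordered = sorted(packages, key=itemgetter("name"))
    for name, group in itertools.groupby(ordered, key=itemgetter("name")):
        # Materialize the group: advancing groupby invalidates the previous group iterator.
        yield name, list(group)


def diff_names(base: list[Pkg], head: list[Pkg]) -> tuple[list[str], list[str], list[str]]:
    """Merge-join two name-sorted groupings into (added, removed, common) names."""
    added: list[str] = []
    removed: list[str] = []
    common: list[str] = []
    bases, heads = grouped(base), grouped(head)
    cur_base, cur_head = next(bases, None), next(heads, None)
    while cur_base is not None or cur_head is not None:
        if cur_head is None or (cur_base is not None and cur_base[0] < cur_head[0]):
            removed.append(cur_base[0])  # name only present in base
            cur_base = next(bases, None)
        elif cur_base is None or cur_head[0] < cur_base[0]:
            added.append(cur_head[0])  # name only present in head
            cur_head = next(heads, None)
        else:
            common.append(cur_head[0])  # same name on both sides: compare variants here
            cur_base, cur_head = next(bases, None), next(heads, None)
    return added, removed, common
```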
🧹 Nitpick comments (3)
ci/scripts/license_diff.py (1)

59-60: Move FILTERED_PACKAGE_PREFIXES to module level.

UPPER_CASE names conventionally denote module-level constants. Defining it inside main() re-creates the list on every call and obscures it from discoverability. Consider moving it to the top of the module after imports.

ci/scripts/sbom_list.py (2)

68-68: Specify encoding="utf-8" when writing the TSV.

open(output_path, "w") uses the platform default encoding, which varies across systems. For a CI artifact, pin to UTF-8 for reproducibility.

Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", encoding="utf-8") as f:

89-90: Lines exceed the 120-character column limit.

Both add_argument calls are over 120 columns. As per coding guidelines, yapf is configured with column_limit = 120.

Proposed fix
-    parser.add_argument("--uvlock", type=Path, help="Path to the lockfile to process. Defaults to 'uv.lock'.", default="uv.lock")
-    parser.add_argument("--output", type=Path, help="Path to the output file. Defaults to 'sbom_list.tsv'.", default="sbom_list.tsv")
+    parser.add_argument("--uvlock",
+                        type=Path,
+                        help="Path to the lockfile to process. Defaults to 'uv.lock'.",
+                        default="uv.lock")
+    parser.add_argument("--output",
+                        type=Path,
+                        help="Path to the output file. Defaults to 'sbom_list.tsv'.",
+                        default="sbom_list.tsv")
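When a string default is combined with type=Path, argparse runs the default through the type converter, so args.uvlock and args.output are Path objects even when the flags are omitted. A self-contained version of the wrapped calls follows; build_parser and the parser description are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical wrapper around the sbom_list.py arguments proposed above."""
    parser = argparse.ArgumentParser(description="Generate an SBOM list from a uv lockfile.")
    # String defaults are passed through type=Path, so omitted flags still yield Path objects.
    parser.add_argument("--uvlock",
                        type=Path,
                        help="Path to the lockfile to process. Defaults to 'uv.lock'.",
                        default="uv.lock")
    parser.add_argument("--output",
                        type=Path,
                        help="Path to the output file. Defaults to 'sbom_list.tsv'.",
                        default="sbom_list.tsv")
    return parser
```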

Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Contributor

@mnajafian-nv mnajafian-nv left a comment


LGTM! plus inline nits

@willkill07
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 920d5fc into NVIDIA:develop Feb 13, 2026
17 checks passed
@willkill07 willkill07 deleted the wkk_enhance-license-tools branch February 25, 2026 12:36

Labels

improvement Improvement to existing functionality non-breaking Non-breaking change

Projects

None yet


2 participants