enh(sbom-licenses): refactor common code; enable multi-version diffs by willkill07 · Pull Request #1597 · NVIDIA/NeMo-Agent-Toolkit

willkill07 · 2026-02-12T17:17:05Z

Description

- Refactor common code into a single Python file
- Enable flags for options and output
- Add missing support for diffing duplicate versions of the same package

Closes

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

New Features
- CLI now accepts configurable input/output paths and an optional base branch (default "develop"); SBOM list is written to the specified output file.
Chores
- Centralized license resolution into a shared utility for consistent license lookup and reporting.
Bug Fixes
- Improved package diffing with name-grouped, filtered comparisons, deduplicated SBOM entries, and clearer added/removed/changed version and license deltas.

Signed-off-by: Will Killian <wkillian@nvidia.com>

coderabbitai · 2026-02-12T17:17:31Z

Walkthrough

Centralizes PyPI license resolution into ci/scripts/package_utils.py; removes in-file license logic from ci/scripts/license_diff.py and ci/scripts/sbom_list.py; refactors lockfile parsing and diffing to typed Package/UvLock structures, single-pass grouped comparisons, and parameterized SBOM output.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New utility `ci/scripts/package_utils.py` | Adds `Package` and `UvLock` TypedDicts, `package_variant_key()` helper, and `pypi_license(name, version=None)` that fetches PyPI metadata and returns a best-effort license string with robust error handling. |
| License diff `ci/scripts/license_diff.py` | Removed the local `pypi_license`; imports license/type helpers from `package_utils`. Replaced raw dict diffing with a name-grouped, single-pass comparison (`itertools.groupby`), added internal-prefix filtering and per-variant logic, and reworked output formatting and CLI (optional `base_branch` positional, default "develop"). |
| SBOM generator `ci/scripts/sbom_list.py` | Removed local `pypi_license`; imports `UvLock` and `pypi_license`. Introduces public `SbomEntry` TypedDict. `process_uvlock()` now accepts a `UvLock` and `output_path`, deduplicates by (name, version), resolves licenses via `pypi_license`, and writes the consolidated SBOM to the provided path. CLI updated to accept `--uvlock` and `--output`. |
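The deduplication by (name, version) described for `process_uvlock()` can be illustrated with a minimal sketch. This is not the PR's code: the `SbomEntry` field names, the `dedupe_packages` helper, and the injected `resolve_license` callable are assumptions made so the example runs without network access.

```python
from typing import Callable, Iterable, TypedDict


class SbomEntry(TypedDict):
    # Field names are assumed for illustration; the PR text does not list them.
    name: str
    version: str
    license: str


def dedupe_packages(
    packages: Iterable[dict],
    resolve_license: Callable[[str, str], str],
) -> list[SbomEntry]:
    """Collapse repeated (name, version) pairs, keeping first-seen order."""
    seen: set[tuple[str, str]] = set()
    entries: list[SbomEntry] = []
    for pkg in packages:
        key = (pkg["name"], pkg.get("version", ""))
        if key in seen:
            continue  # the same variant listed twice counts once
        seen.add(key)
        entries.append({"name": key[0], "version": key[1], "license": resolve_license(*key)})
    return entries
```

Passing the license resolver as a parameter keeps the lookup (a PyPI request in the real script) out of the deduplication logic, so each unique variant triggers at most one lookup.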

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is concise, descriptive, and uses imperative mood, clearly summarizing the main changes: refactoring common license-related code into a shared module and enabling multi-version package diffs.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection	✅ Passed	✅ No merge conflicts detected when merging into `develop`

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments

ci/scripts/sbom_list.py (1)
66-70: Add newline="" and explicit encoding to the open() call for csv.writer.

Per the Python csv module documentation, files opened for writing should specify newline="" to prevent the writer from double-translating line endings on Windows. Adding encoding="utf-8" is also good practice for reproducible output.
Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", newline="", encoding="utf-8") as f:
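A small, self-contained sketch of the suggested `open()` call in context. The `write_tsv` helper and the column headers are hypothetical; only the `newline=""` and `encoding="utf-8"` arguments come from the review comment.

```python
import csv
from pathlib import Path


def write_tsv(output_path: Path, rows: list[tuple[str, str, str]]) -> None:
    """Write rows as tab-separated values with stable line endings and encoding."""
    # newline="" leaves line-ending handling to csv.writer, so "\r\n" is not
    # translated a second time on Windows; utf-8 avoids the platform default.
    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["Name", "Version", "License"])  # assumed header names
        writer.writerows(rows)
```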

coderabbitai

Actionable comments posted: 5

🤖 Fix all issues with AI agents

In `@ci/scripts/license_diff.py`:
- Around line 116-122: The current logic uses the sets added and removed
directly which leads to nondeterministic pairing and drops packages when the
sets differ in size; change the block so you first convert and sort the
candidates into deterministic lists (e.g., sort removed_list and added_list by a
stable key such as version or package name extracted from base_variants[...] and
head_variants[...]), then pair up to min(len(removed_list), len(added_list)) and
append those pairs to changed_entries, and finally extend added_entries with any
remaining items from added_list and removed_entries with any remaining items
from removed_list; update the code paths that reference added, removed,
base_variants, head_variants, changed_entries, added_entries, and
removed_entries accordingly.
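The pairing described above can be sketched as follows. This is a simplified illustration, not the script's code: variants are represented as plain version strings, and `pair_variants` is a hypothetical name.

```python
def pair_variants(
    removed: set[str],
    added: set[str],
) -> tuple[list[tuple[str, str]], list[str], list[str]]:
    """Pair removed/added variants deterministically and keep the leftovers.

    Returns (changed_pairs, remaining_added, remaining_removed).
    """
    # Sorting gives a stable order; plain string sorting is an assumption here
    # and is not version-aware ("1.10" sorts before "1.9").
    removed_list = sorted(removed)
    added_list = sorted(added)
    n = min(len(removed_list), len(added_list))
    changed = list(zip(removed_list[:n], added_list[:n]))
    # Anything beyond the shorter list is reported as purely added or removed,
    # so no package is dropped when the sets differ in size.
    return changed, added_list[n:], removed_list[n:]
```

For example, `pair_variants({"1.0", "2.0"}, {"3.0"})` pairs `"1.0"` with `"3.0"` and reports `"2.0"` as removed.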

In `@ci/scripts/package_utils.py`:
- Around line 21-28: The TypedDict Package currently declares version: str as
required but runtime code treats it as optional; change the declaration to make
version optional using typing.NotRequired (e.g., replace "version: str" with
"version: typing.NotRequired[str]") and ensure typing.NotRequired is imported;
leave UvLock as-is. Callsites that rely on pkg.get("version") (e.g.,
package_variant_key and usages in license_diff.py) remain unchanged but now
type-check correctly. Run the type checker to confirm no further TypedDict
errors.
- Line 62: Add a single trailing newline to the end of the file that contains
the statement returning typing.cast(str, min(candidates, key=len,
default="(License not found)")); locate the return expression using the unique
symbols typing.cast and min(candidates, key=len, default="(License not found)")
and ensure the file ends with exactly one newline character (no extra blank
lines).
- Around line 46-51: The URL fetch in package_utils.py uses request.urlopen(url)
without a timeout which can hang CI; update the call to include a reasonable
timeout (e.g., timeout=10) when calling request.urlopen(url, timeout=10) and
ensure exception handling covers timeout/URLError (the existing except
Exception: return "(License not found)" is fine but you can optionally catch
socket.timeout/URLError explicitly); reference the variables/url building around
url, name, version and the request.urlopen/json.load block to locate and update
the code.

In `@ci/scripts/sbom_list.py`:
- Around line 47-48: The docstring is stale: the function signature returns None
and writes its output to output_path, yet the "Returns" section still states
"Path to the generated SBOM list file." Update the function's docstring (the
function with signature "-> None" that writes to output_path) to either remove
the incorrect "Returns" entry or replace it with "Returns: None" and a brief
note that the SBOM list is written to the provided output_path, ensuring the
docstring accurately matches the function behavior.

🧹 Nitpick comments (4)

ci/scripts/license_diff.py (2)
72-73: Redundant iter() calls — groupby already returns an iterator.

head_by_name and base_by_name are already iterators (returned by itertools.groupby), so wrapping them in iter() is a no-op.
Proposed cleanup
-    # iterators over the grouped entries
-    heads: Iterator[tuple[str, Iterator[Package]]] = iter(head_by_name)
-    bases: Iterator[tuple[str, Iterator[Package]]] = iter(base_by_name)
-
-    # cursors over the grouped entries
-    current_head: tuple[str, Iterator[Package]] | None = next(heads, None)
-    current_base: tuple[str, Iterator[Package]] | None = next(bases, None)
+    # cursors over the grouped entries
+    current_head: tuple[str, Iterator[Package]] | None = next(head_by_name, None)
+    current_base: tuple[str, Iterator[Package]] | None = next(base_by_name, None)
Then replace next(heads, None) → next(head_by_name, None) and next(bases, None) → next(base_by_name, None) on lines 87, 95, 104, 105.
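A quick check of the claim that `iter()` is a no-op here, using made-up package data: a `groupby` object is its own iterator, so `next()` can be called on it directly.

```python
import itertools
from operator import itemgetter

# Hypothetical lockfile entries, already sorted by name.
packages = [
    {"name": "a", "version": "1"},
    {"name": "a", "version": "2"},
    {"name": "b", "version": "1"},
]

by_name = itertools.groupby(packages, key=itemgetter("name"))

# iter() on an iterator returns the same object.
same_object = iter(by_name) is by_name

# next() works on the groupby object without any wrapping.
first_name, first_group = next(by_name)
first_versions = [pkg["version"] for pkg in first_group]
```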
59-60: UPPER_CASE naming for a local variable is misleading.

FILTERED_PACKAGE_PREFIXES uses module-constant naming but is declared inside a function body. Either move it to module scope as a true constant or rename it to filtered_package_prefixes per PEP 8 conventions for locals. As per coding guidelines, UPPER_CASE is reserved for constants.
ci/scripts/sbom_list.py (2)
71-71: Specify encoding="utf-8" when writing the TSV.

open(output_path, "w") uses the platform default encoding, which may not be UTF-8 on all systems (e.g., Windows). Package names and license strings can contain non-ASCII characters.
Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", encoding="utf-8") as f:
21-25: Import grouping nit: argparse is separated from the other stdlib imports by a blank line.

PEP 8 groups all standard-library imports together. Move argparse into the same block as csv, tomllib, etc.
Proposed fix
-import argparse
-
-import csv
+import argparse
+import csv

ci/scripts/license_diff.py

ci/scripts/package_utils.py

ci/scripts/sbom_list.py

Signed-off-by: Will Killian <wkillian@nvidia.com>

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@ci/scripts/license_diff.py`:
- Around line 149-150: The bare "except KeyError: pass" around accesses like
head_pkg["name"] and the variant key lookup should be narrowed and replaced with
explicit handling: limit the try/except to the specific dict access, catch
KeyError and log which package/variant was skipped (include head_pkg or its repr
and the missing key) instead of silently passing, or pre-check keys using "in"
before indexing; update the block around head_pkg["name"]/variant lookup to use
logger.warning (or print) so missing-key incidents are visible and debuggable.
- Around line 63-64: The grouping uses itertools.groupby on head["package"] and
base["package"] which requires sorted input; sort each package list by the
"name" key before calling itertools.groupby to avoid fragmented groups and
incorrect merge-join results. Specifically, replace direct use of
head["package"]/base["package"] with their sorted equivalents (sorted(...,
key=itemgetter("name"))) and then call itertools.groupby to produce head_by_name
and base_by_name so the downstream merge-join logic correctly classifies
added/removed packages.
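The effect of sorting before `itertools.groupby` can be seen in a short sketch with made-up package data. `group_names` is a hypothetical helper written for this illustration.

```python
import itertools
from operator import itemgetter


def group_names(packages: list[dict], presort: bool) -> list[tuple[str, list[str]]]:
    """Group packages by name, optionally sorting by name first."""
    items = sorted(packages, key=itemgetter("name")) if presort else packages
    return [
        (name, [pkg["version"] for pkg in group])
        for name, group in itertools.groupby(items, key=itemgetter("name"))
    ]


packages = [
    {"name": "a", "version": "1"},
    {"name": "b", "version": "1"},
    {"name": "a", "version": "2"},
]

# groupby only merges adjacent items, so unsorted input splits "a" in two.
fragmented = group_names(packages, presort=False)
# Sorting first yields one group per name; sorted() is stable, so versions
# keep their original relative order within each group.
grouped = group_names(packages, presort=True)
```

With unsorted input, a merge-join over the two lockfiles would see `"a"` twice and could misclassify one fragment as added or removed.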

🧹 Nitpick comments (3)

ci/scripts/license_diff.py (1)

59-60: Move FILTERED_PACKAGE_PREFIXES to module level.

UPPER_CASE names conventionally denote module-level constants. Defining it inside main() re-creates the list on every call and obscures it from discoverability. Consider moving it to the top of the module after imports.
ci/scripts/sbom_list.py (2)
68-68: Specify encoding="utf-8" when writing the TSV.

open(output_path, "w") uses the platform default encoding, which varies across systems. For a CI artifact, pin to UTF-8 for reproducibility.
Proposed fix
-    with open(output_path, "w") as f:
+    with open(output_path, "w", encoding="utf-8") as f:
89-90: Lines exceed the 120-character column limit.

Both add_argument calls are over 120 columns. As per coding guidelines, yapf is configured with column_limit = 120.
Proposed fix
-    parser.add_argument("--uvlock", type=Path, help="Path to the lockfile to process. Defaults to 'uv.lock'.", default="uv.lock")
-    parser.add_argument("--output", type=Path, help="Path to the output file. Defaults to 'sbom_list.tsv'.", default="sbom_list.tsv")
+    parser.add_argument("--uvlock",
+                        type=Path,
+                        help="Path to the lockfile to process. Defaults to 'uv.lock'.",
+                        default="uv.lock")
+    parser.add_argument("--output",
+                        type=Path,
+                        help="Path to the output file. Defaults to 'sbom_list.tsv'.",
+                        default="sbom_list.tsv")
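A runnable sketch of the two options, wrapped in a hypothetical `build_parser` helper (the parser description is also an assumption). It shows one detail that is easy to miss: argparse applies `type` to string defaults, so both values arrive as `Path` objects even when the flags are omitted.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate an SBOM list from a uv lockfile.")
    parser.add_argument("--uvlock",
                        type=Path,
                        help="Path to the lockfile to process. Defaults to 'uv.lock'.",
                        default="uv.lock")
    parser.add_argument("--output",
                        type=Path,
                        help="Path to the output file. Defaults to 'sbom_list.tsv'.",
                        default="sbom_list.tsv")
    return parser


# Only --output is given; --uvlock falls back to its default.
args = build_parser().parse_args(["--output", "out/sbom.tsv"])
```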

ci/scripts/license_diff.py

Signed-off-by: Will Killian <wkillian@nvidia.com>

mnajafian-nv

LGTM! plus inline nits

ci/scripts/package_utils.py

ci/scripts/license_diff.py

willkill07 · 2026-02-13T19:56:43Z

/merge

enh(sbom-licenses): refactor common code; enable multi-version diffs

27247ba

Signed-off-by: Will Killian <wkillian@nvidia.com>

willkill07 requested a review from a team as a code owner February 12, 2026 17:17

willkill07 added the "improvement" (Improvement to existing functionality) and "non-breaking" (Non-breaking change) labels Feb 12, 2026

Merge branch 'develop' into wkk_enhance-license-tools

8cd167e

coderabbitai bot reviewed Feb 12, 2026

ci/scripts/license_diff.py

ci/scripts/package_utils.py

ci/scripts/package_utils.py

ci/scripts/package_utils.py

ci/scripts/sbom_list.py (outdated)

Merge branch 'develop' into wkk_enhance-license-tools

3d23b97

willkill07 self-assigned this Feb 12, 2026

Address coderabbit feedback

5f9de2d

Signed-off-by: Will Killian <wkillian@nvidia.com>

coderabbitai bot reviewed Feb 12, 2026

ci/scripts/license_diff.py

ci/scripts/license_diff.py

willkill07 added 2 commits February 13, 2026 09:20

Formatting

22f514b

Signed-off-by: Will Killian <wkillian@nvidia.com>

Formatting

fa90116

Signed-off-by: Will Killian <wkillian@nvidia.com>

mnajafian-nv approved these changes Feb 13, 2026

ci/scripts/package_utils.py

ci/scripts/license_diff.py

ci/scripts/license_diff.py

rapids-bot bot merged commit 920d5fc into NVIDIA:develop Feb 13, 2026
17 checks passed

willkill07 deleted the wkk_enhance-license-tools branch February 25, 2026 12:36

rapids-bot[bot] merged 6 commits into NVIDIA:develop from willkill07:wkk_enhance-license-tools
