#30-add boost-library-usage-dashboard app#73
snowfox1003 merged 23 commits into cppalliance:develop
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
📝 Walkthrough

Introduces a new Django app "boost_library_usage_dashboard" that orchestrates end-to-end data collection and rendering of a Boost library usage dashboard. Includes analyzer modules for data gathering, metrics calculation, HTML rendering with Chart.js visualizations, Django models for data representation, a management command for workflow integration, and comprehensive tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Cmd as Management<br/>Command
    participant Analyzer as BoostUsage<br/>DashboardAnalyzer
    participant LibData as Library & Metrics<br/>Helpers
    participant DB as Database<br/>(ORM Models)
    participant Output as Output<br/>Assembly
    participant Render as HTML<br/>Renderer
    participant FS as File System
    Cmd->>Analyzer: __init__(base_dir, output_dir)
    Analyzer->>DB: Load repositories & libraries
    Analyzer->>DB: Query BoostUsage, BoostExternalRepository
    Analyzer->>LibData: Collect library metrics by file usage
    LibData->>DB: Aggregate BoostUsage by library/year
    LibData-->>Analyzer: metrics_by_library dict
    Analyzer->>LibData: Collect dependents & external consumers
    LibData->>DB: Query dependency graph
    LibData-->>Analyzer: consumer_data dict
    Analyzer->>Analyzer: Compute statistics (versions, top libs)
    Analyzer-->>Cmd: Return stats dict
    Cmd->>Output: Assemble dashboard data
    Output->>Analyzer: Fetch filtered libraries, top repos
    Output-->>Cmd: Return dashboard_data
    Cmd->>FS: Write dashboard_data.json
    Cmd->>Cmd: Generate markdown report
    Cmd->>FS: Write summary report
    Cmd->>Render: render_dashboard_html(output_dir)
    Render->>FS: Read dashboard_data.json
    Render->>Render: build_index_page(data)
    Render->>FS: Write index.html
    Render->>Render: build_library_page() for each library
    Render->>FS: Write per-library HTML files
    Cmd->>Cmd: Success
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 10
🧹 Nitpick comments (1)
boost_library_usage_dashboard/analyzer_metrics.py (1)
132-132: Use `strict=True` in `zip()` to enforce sequence alignment.

Both `x_values` and `ratios` are derived from the same data source (`to_last_year_data`), so they are guaranteed to have equal length. Adding `strict=True` prevents silent truncation and resolves Ruff B905. The project targets Python 3.11+, which fully supports this parameter.

💡 Proposed fix

```diff
- sum_xy = sum(x * y for x, y in zip(x_values, ratios))
+ sum_xy = sum(x * y for x, y in zip(x_values, ratios, strict=True))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_metrics.py` at line 132, Replace the plain zip(...) used to compute sum_xy with zip(..., strict=True) to enforce that x_values and ratios have the same length; update the generator expression at sum_xy = sum(x * y for x, y in zip(x_values, ratios)) to use zip(x_values, ratios, strict=True) so sequence misalignment raises an error (this targets the sum_xy calculation in analyzer_metrics.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 247-249: Replace lexical string comparisons with semantic version
comparisons: parse versions using a semantic version parser (e.g.,
packaging.version.parse or distutils.version.LooseVersion) before comparing or
sorting. Specifically, when evaluating if version >= (lib.get("created_version")
or "") use parsed Version objects for both sides; when building chart_data
ensure keys or comparisons use parsed versions; change
table_data.sort(key=lambda x: (x["version"], x["commit_count"]), reverse=True)
to sort by parsed versions (e.g., parse x["version"]) together with
commit_count; and apply the same parsed-version logic in the other
comparison/assignment sites referenced around lines 253-256 (e.g.,
last_updated_version and any chart filtering) so all version comparisons and
orderings use semantic ordering.
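The semantic ordering the comment calls for can be sketched with a small stdlib-only key function (a stand-in for the `packaging.version.parse` approach it suggests; the name `version_sort_key` is illustrative, not the project's actual helper):

```python
import re

def version_sort_key(version_str: str) -> tuple[int, ...]:
    """Turn '1.82.0', 'boost-1.84', or '1_75_0' into a comparable int tuple."""
    cleaned = (version_str or "").strip().removeprefix("boost-")
    parts = re.split(r"[._-]", cleaned)
    return tuple(int(p) for p in parts if p.isdigit()) or (0,)

versions = ["1.9.0", "1.84.0", "1.10.0"]
print(sorted(versions))                        # lexical: ['1.10.0', '1.84.0', '1.9.0']
print(sorted(versions, key=version_sort_key))  # semantic: ['1.9.0', '1.10.0', '1.84.0']
```

The same key works for `table_data.sort` by combining it with the secondary field, e.g. `key=lambda x: (version_sort_key(x["version"]), x["commit_count"])`.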
- Around line 239-244: The dashboard is emitting raw contributor emails in the
payload at the table_data.append call (see keys "email_address", uses variable
email and person_data); remove or redact PII there instead of returning the full
address—either drop the "email_address" field, replace it with a non-identifying
value (e.g. masked local-part like a*****@domain or only the domain), or store a
deterministic hash/ID if de-duplication is needed; update the same place where
table_data is built (and any downstream serialization) so the dashboard never
receives the cleartext email.
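A minimal masking helper along the lines the comment suggests (the exact redaction policy — masked local part vs. domain only vs. hash — remains a project decision; this sketch keeps one character plus the domain):

```python
def mask_email(email: str) -> str:
    """Keep only the first character of the local part plus the domain,
    so the dashboard never emits a full address."""
    if not email or "@" not in email:
        return ""
    local, _, domain = email.partition("@")
    first = local[0] if local else "*"
    return f"{first}*****@{domain}"

print(mask_email("jane.doe@example.com"))  # j*****@example.com
```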
In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Around line 99-106: The code currently truncates year_data into
to_last_year_data unconditionally, dropping the newest year and skewing
recent_usage, past_usage, and total_usage; change the logic so you only exclude
the latest year when it represents the current incomplete year (e.g., compare
the latest year value in year_data to the current year and check if the data for
that year is partial), otherwise keep all entries in to_last_year_data; update
the creation of to_last_year_data and ensure subsequent sums for recent_usage,
past_usage, and total_usage use that corrected list (referencing the variables
to_last_year_data, year_data, recent_usage, past_usage, total_usage and the
recent_year_threshold).
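The intended truncation logic can be sketched as follows (`complete_years` and the explicit `current_year` parameter are illustrative names; the real code works with `year_data`/`to_last_year_data`):

```python
def complete_years(
    year_data: list[tuple[int, int]], current_year: int
) -> list[tuple[int, int]]:
    """Drop the trailing (year, count) entry only when it is the current,
    still-incomplete year; otherwise keep every entry."""
    if year_data and year_data[-1][0] == current_year:
        return year_data[:-1]
    return year_data

history = [(2022, 5), (2023, 7), (2024, 2)]
print(complete_years(history, current_year=2024))  # drops partial 2024
print(complete_years(history, current_year=2026))  # unchanged: all years complete
```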
In `@boost_library_usage_dashboard/dashboard_html_common.py`:
- Around line 76-77: The header cells currently render clickable <th> elements
using the list comprehension that builds f"<th class='sortable'
data-key='{e(key)}'>{e(label)} <span class='sort-indicator'>↕</span></th>" (and
similar code at the other locations); make them keyboard-accessible by moving
the interactive target into a <button type="button"> inside the <th> (e.g.,
f"<th class='sortable' data-key='{e(key)}'><button type='button'
class='sort-button' aria-label='Sort by {e(label)}'>{e(label)} <span
class='sort-indicator'>↕</span></button></th>"), update the event binding so
click handlers (the sort function currently relying on clicks on the <th>) are
attached to the button instead of the th, and ensure aria-sort or aria-pressed
is updated when sorting changes; apply the same change to the other header
generation blocks referenced (the occurrences around the 136-141 and 158-165
code paths) and adjust CSS selectors from .sortable to .sort-button if needed so
styling and indicators continue to work.
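Since the headers are built with Python f-strings, the accessible variant can be sketched like this (assuming the module's `e()` escape helper behaves like `html.escape`; `sortable_header` is a hypothetical name):

```python
from html import escape as e

def sortable_header(key: str, label: str) -> str:
    """Render a sortable column header whose interactive target is a real
    <button>, making it reachable and operable from the keyboard."""
    return (
        f"<th class='sortable' data-key='{e(key)}'>"
        f"<button type='button' class='sort-button' aria-label='Sort by {e(label)}'>"
        f"{e(label)} <span class='sort-indicator'>↕</span>"
        f"</button></th>"
    )

print(sortable_header("stars", "Stars"))
```

The click handler then binds to `.sort-button` instead of the `<th>`, and sorting state is reflected via `aria-sort` on the header cell.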
In `@boost_library_usage_dashboard/dashboard_html_library.py`:
- Line 175: The contributors table currently exposes raw contributor emails via
the headers mapping using "email_address" and in the other render location that
references the same field; replace any direct use of "email_address" with a
masked or truncated representation (for example compute an "email_masked" value
showing only the first character and domain or just the domain) and update the
headers tuple to use that masked key instead of "email_address"; add a small
helper function (e.g., mask_email(email)) and call it wherever contributor
email_address is rendered or exported so both the headers list and the other
occurrence no longer output full addresses.
- Around line 184-225: The injected JSON and interpolated template values are
unescaped causing XSS; replace the direct json.dumps(...) used to build
headersRows, internalRows, externalRows, contribRows, depVersions, depFirst,
depAll, usageYears, byCreated, byUpdated, repoCounts, commitVersions and
commitCounts with a safe serializer (e.g. _json_for_script(...)) and wrap any
interpolated values in the rowHtml template (r.name, r.stars, r.usage_count,
r.created_at, r.updated_at, r.version, r.person, r.email_address,
r.commit_count, r.depth) with the existing HTML-escape helper e()/esc(...) so
that initDataTable rowHtml strings emit escaped content before being inserted
via innerHTML; keep initDataTable and rowHtml structure but call the escape
helper for every user-sourced insertion and switch JSON serialization to the
safe helper.
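A safe serializer of the kind the comment names `_json_for_script` could look like this (a sketch; escaping to `\uXXXX` sequences keeps the payload valid JSON while making it inert inside a `<script>` block):

```python
import json

def json_for_script(value: object) -> str:
    """json.dumps plus escaping of <, >, and & so a value such as
    '</script><script>...' cannot terminate the surrounding script tag."""
    return (
        json.dumps(value)
        .replace("&", "\\u0026")
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
    )

payload = json_for_script({"name": "</script><script>alert(1)</script>"})
print(payload)              # no literal "</script>" remains
print(json.loads(payload))  # round-trips to the original dict
```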
In `@boost_library_usage_dashboard/dashboard_html.py`:
- Around line 21-25: Ensure the output directory exists before calling
build_index_page: check/create output_dir (using output_dir.mkdir(parents=True,
exist_ok=True)) prior to reading dashboard_data_file and invoking
build_index_page; update the block that sets lib_dir and then calls
build_index_page so output_dir is created when missing (referencing variables
lib_dir, output_dir, dashboard_data_file and the function build_index_page).
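The guard amounts to one idempotent call before anything is read or written (sketch; the `dashboard/libraries` path is illustrative):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "dashboard" / "libraries"
    # Creates missing parents; a no-op if the directory already exists,
    # so it is safe to run unconditionally before build_index_page().
    output_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)  # second call is harmless
    print(output_dir.is_dir())  # True
```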
In `@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 77-80: Replace the ValueError raised in the publish argument
validation with Django's CommandError: in the run_boost_library_usage_dashboard
Command (the block checking if options["publish"] and using target_repo
variable), raise CommandError("--publish requires --target-repo") instead of
ValueError; also update the corresponding unit test in
boost_library_usage_dashboard/tests/test_command.py to expect CommandError
(replace ValueError expectation with CommandError) so the test asserts the
management command raises the correct exception type.
In `@boost_library_usage_dashboard/tests/test_analyzer.py`:
- Around line 11-12: Replace hardcoded /tmp paths by using a secure temporary
directory mechanism: change the test so that analyzer.base_dir and
analyzer.output_dir are set from a pytest tmp_path (or a
tempfile.TemporaryDirectory()/tempfile.mkdtemp()) instead of Path("/tmp/...");
update the assignment locations where analyzer.base_dir and analyzer.output_dir
are set in tests/test_analyzer.py to derive their Path values from the tmp_path
fixture (or the created secure temp dir) so Ruff S108 is satisfied.
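The shape of the fixed test, hedged: `FakeAnalyzer` is a stand-in for the real analyzer object, and under pytest the `tmp_path` argument is injected automatically as a fixture.

```python
import tempfile
from pathlib import Path

class FakeAnalyzer:
    """Stand-in for the analyzer under test."""
    base_dir: Path
    output_dir: Path

def test_analyzer_dirs(tmp_path: Path) -> None:
    analyzer = FakeAnalyzer()
    # Derive both paths from the per-test temp directory (satisfies Ruff S108).
    analyzer.base_dir = tmp_path / "base"
    analyzer.output_dir = tmp_path / "output"
    assert str(analyzer.base_dir).startswith(str(tmp_path))

# Outside pytest, an equivalent secure temp dir can drive the same check:
test_analyzer_dirs(Path(tempfile.mkdtemp()))
```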
In `@boost_library_usage_dashboard/utils.py`:
- Line 5: The normalization preserves leading/trailing whitespace; change the
initialization of version to strip whitespace first by using (version_str or
"").strip() before the .replace calls so the line becomes: version =
(version_str or "").strip().replace("-", ".").replace("_", "."); update any
usages expecting version to ensure they still work with the trimmed value (refer
to the variable name "version" and the input "version_str").
---
Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Line 132: Replace the plain zip(...) used to compute sum_xy with zip(...,
strict=True) to enforce that x_values and ratios have the same length; update
the generator expression at sum_xy = sum(x * y for x, y in zip(x_values,
ratios)) to use zip(x_values, ratios, strict=True) so sequence misalignment
raises an error (this targets the sum_xy calculation in analyzer_metrics.py).
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
boost_library_usage_dashboard/__init__.py
boost_library_usage_dashboard/analyzer.py
boost_library_usage_dashboard/analyzer_libraries.py
boost_library_usage_dashboard/analyzer_metrics.py
boost_library_usage_dashboard/analyzer_output.py
boost_library_usage_dashboard/apps.py
boost_library_usage_dashboard/dashboard_html.py
boost_library_usage_dashboard/dashboard_html_common.py
boost_library_usage_dashboard/dashboard_html_index.py
boost_library_usage_dashboard/dashboard_html_library.py
boost_library_usage_dashboard/management/__init__.py
boost_library_usage_dashboard/management/commands/__init__.py
boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
boost_library_usage_dashboard/models.py
boost_library_usage_dashboard/renderer.py
boost_library_usage_dashboard/report.py
boost_library_usage_dashboard/tests/__init__.py
boost_library_usage_dashboard/tests/fixtures.py
boost_library_usage_dashboard/tests/test_analyzer.py
boost_library_usage_dashboard/tests/test_command.py
boost_library_usage_dashboard/tests/test_models.py
boost_library_usage_dashboard/tests/test_renderer.py
boost_library_usage_dashboard/tests/test_report.py
boost_library_usage_dashboard/tests/test_utils.py
boost_library_usage_dashboard/utils.py
config/settings.py
conftest.py
workflow/management/commands/run_all_collectors.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
boost_library_usage_dashboard/analyzer.py (1)
50-52: ⚠️ Potential issue | 🟠 Major: Use semantic version ordering consistently across analyzer paths.
Line 50, Line 136, and Line 415 still rely on lexical ordering of version strings. That can select wrong “latest/created” versions and misorder version-sorted outputs.
🛠️ Patch sketch
```diff
-from django.db.models import Count, Min
+from django.db.models import Count
@@
-from .utils import format_percent
+from .utils import _version_tuple, format_percent
@@
-        self.version_info = list(
-            BoostVersion.objects.all().order_by("version")
-        )  # pylint: disable=no-member
+        self.version_info = sorted(
+            BoostVersion.objects.all(),  # pylint: disable=no-member
+            key=lambda v: _version_tuple(v.version),
+        )
@@
-        created_versions = {
-            row["library_id"]: row["version__version"]
-            for row in BoostLibraryVersion.objects.values(
-                "library_id"
-            ).annotate(  # pylint: disable=no-member
-                version__version=Min("version__version")
-            )
-        }
+        created_versions: dict[int, str] = {}
+        for row in BoostLibraryVersion.objects.values(  # pylint: disable=no-member
+            "library_id", "version__version"
+        ):
+            lib_id = row["library_id"]
+            version = row["version__version"] or ""
+            prev = created_versions.get(lib_id)
+            if prev is None or _version_tuple(version) < _version_tuple(prev):
+                created_versions[lib_id] = version
@@
-        for row in (
-            BoostLibraryVersion.objects.exclude(
-                description=""
-            )  # pylint: disable=no-member
-            .values("library_id", "description", "version__version")
-            .order_by("library_id", "-version__version")
-        ):
-            desc_map.setdefault(row["library_id"], row["description"])
+        latest_desc_version: dict[int, str] = {}
+        for row in (
+            BoostLibraryVersion.objects.exclude(description="")  # pylint: disable=no-member
+            .values("library_id", "description", "version__version")
+            .order_by("library_id")
+        ):
+            lib_id = row["library_id"]
+            version = row["version__version"] or ""
+            prev = latest_desc_version.get(lib_id)
+            if prev is None or _version_tuple(version) > _version_tuple(prev):
+                latest_desc_version[lib_id] = version
+                desc_map[lib_id] = row["description"]
@@
-        if sort_field:
-            ret_data.sort(
-                key=lambda x: x.get(sort_field) or 0,
-                reverse=sort_order.lower() == "desc",
-            )
+        if sort_field:
+            def _sort_key(item: dict[str, Any]) -> Any:
+                value = item.get(sort_field)
+                if "version" in sort_field:
+                    return _version_tuple(value or "")
+                return value or 0
+            ret_data.sort(
+                key=_sort_key,
+                reverse=sort_order.lower() == "desc",
+            )
```

```bash
#!/bin/bash
# Verify lexical-version sites still present in analyzer.py
rg -nP 'order_by\("version"\)|Min\("version__version"\)|-version__version|key=lambda x: x.get\(sort_field\) or 0' boost_library_usage_dashboard/analyzer.py

# Demonstrate why lexical ordering is unsafe for versions.
python - <<'PY'
versions = ["boost-1.9.0", "boost-1.10.0", "boost-1.84.0"]
print("lexical :", sorted(versions))

def vtuple(v):
    parts = v.replace("boost-", "").split(".")
    nums = [int("".join(ch for ch in p if ch.isdigit()) or 0) for p in parts[:3]]
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)

print("semantic:", sorted(versions, key=vtuple))
PY
```

Also applies to: 136-151, 395-417
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer.py` around lines 50 - 52, Several places (the self.version_info assignment, any Min("version__version") usages, and the custom key=lambda x: x.get(sort_field) sort) rely on lexical ordering of version strings; replace these with semantic version ordering by using packaging.version.parse (from packaging import version) to compare versions. Concretely: import packaging.version and after fetching querysets (e.g., BoostVersion.objects.all()) convert to list and sort with list.sort(key=lambda v: version.parse(v.version) or version.parse(v.version.version) as appropriate), replace Min("version__version")/Max-style aggregations by computing min/max on the list of parsed versions instead of DB lexical aggregates, and update any key=lambda x: x.get(sort_field) to parse that field with packaging.version.parse so all comparisons/select-latest logic (including in methods that populate self.version_info and the code paths around the earlier Min/Max usages) use semantic ordering consistently.
🧹 Nitpick comments (10)
boost_library_usage_dashboard/models.py (1)
1-57: Enforce read-only intent at the model layer.

These models are described as read-only mappings, but writes are still technically possible. Adding a small guard prevents accidental mutation of externally owned tables.
♻️ Proposed safeguard
```diff
 from django.db import models

-class BoostExternalRepository(models.Model):
+class _ReadOnlyExternalModel(models.Model):
+    class Meta:
+        abstract = True
+
+    def save(self, *args, **kwargs):
+        raise RuntimeError(f"{self.__class__.__name__} is read-only.")
+
+    def delete(self, *args, **kwargs):
+        raise RuntimeError(f"{self.__class__.__name__} is read-only.")
+
+
+class BoostExternalRepository(_ReadOnlyExternalModel):
@@
-class BoostUsage(models.Model):
+class BoostUsage(_ReadOnlyExternalModel):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/models.py` around lines 1 - 57, Add explicit read-only guards to prevent accidental writes to these unmanaged mappings: implement and override save() and delete() on both BoostExternalRepository and BoostUsage (and optionally bulk methods like update/refresh_from_db if present) to raise a clear exception (e.g., RuntimeError or PermissionDenied) indicating the model is read-only; keep Meta.managed = False as-is and ensure the exception message names the class (BoostExternalRepository or BoostUsage) so callers know which model is protected.

boost_library_usage_dashboard/tests/test_command.py (1)
44-52: Assert renderer call arguments, not only call count.

Right now the test would still pass if `render_dashboard_html` is called with the wrong path. Verifying the propagated `output_dir` will catch command wiring regressions.

🧪 Proposed test hardening

```diff
 write_report.assert_called_once_with(
     fake_analyzer.report_file,
     {"total_repositories": 0},
     stars_min_threshold=10,
 )
 render_html.assert_called_once()
+args, kwargs = render_html.call_args
+called_output_dir = kwargs.get("output_dir", args[1] if len(args) > 1 else None)
+assert called_output_dir == Path(tmp_path)
```

🤖 Prompt for AI Agents
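The `call_args` assertion pattern can be shown in isolation with `unittest.mock` (the `/srv/dashboard` path is an arbitrary stand-in, not the project's real output location):

```python
from pathlib import Path
from unittest.mock import MagicMock

render_html = MagicMock()
output_dir = Path("/srv/dashboard")

# The code under test would invoke the renderer like this:
render_html(output_dir=output_dir)

# Assert the propagated argument, not just the call count.
render_html.assert_called_once()
args, kwargs = render_html.call_args
called_output_dir = kwargs.get("output_dir", args[0] if args else None)
assert called_output_dir == output_dir
print("renderer received", called_output_dir)
```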
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/tests/test_command.py` around lines 44 - 52, The test currently only checks that render_html was called once but not that it received the correct output location; update the test to assert the exact call arguments for render_html (e.g., that it was invoked with the expected output_dir or HTML path produced by render_dashboard_html), so the command wiring is validated. Locate the existing assertions around analyzer_cls, fake_analyzer.run, write_report and render_html in the test and replace or augment the render_html.assert_called_once() with a render_html.assert_called_once_with(...) check that uses the test's output_dir/expected HTML filename and any other expected params.

boost_library_usage_dashboard/analyzer_metrics.py (2)
136-136: Consider adding `strict=True` to `zip()` for defensive coding.

While `x_values` and `ratios` are derived from the same source and are guaranteed to have equal length, adding `strict=True` would catch any future refactoring that breaks this invariant.

Suggested change

```diff
- sum_xy = sum(x * y for x, y in zip(x_values, ratios))
+ sum_xy = sum(x * y for x, y in zip(x_values, ratios, strict=True))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_metrics.py` at line 136, The zip call used to compute sum_xy (sum_xy = sum(x * y for x, y in zip(x_values, ratios))) should be made defensive by adding strict=True to the zip invocation to ensure x_values and ratios are the same length at runtime; update the generator expression to use zip(x_values, ratios, strict=True) in the sum_xy computation (and run tests or ensure the project targets Python 3.10+ where strict is supported).
22-35: Large iterator query without batching could cause memory pressure under extreme data volumes.

The query uses `.iterator()`, which is good, but it iterates over all matching `BoostUsage` rows in a single pass while accumulating results in memory (`library_year_data`, `library_top_repo`, `library_headers`). For very large datasets, consider whether chunking or streaming aggregation is needed.
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_metrics.py` around lines 22 - 35, The large BoostUsage query assigned to rows is iterated while accumulating library_year_data, library_top_repo, and library_headers, which can cause memory pressure for huge datasets; update the loop to process results in bounded batches (e.g., use BoostUsage.objects.filter(...).iterator(chunk_size=...) or implement manual chunking with .order_by(...).values_list('pk', flat=True) and slicing) and ensure each batch updates library_year_data, library_top_repo, and library_headers incrementally and releases references before loading the next batch so memory usage stays bounded; reference the rows variable, BoostUsage.select_related(...), analyzer.stars_min_threshold, and the accumulator structures when making this change.

boost_library_usage_dashboard/utils.py (2)
18-26: Consider stripping before removing the "boost-" prefix.

If the input has leading whitespace like `" boost-1.82"`, the `"boost-"` prefix won't be removed because `.strip()` is applied after the replacement on line 21.

Suggested fix

```diff
 def normalize_version_str(version_str: str) -> str | None:
-
-    version = (version_str or "").replace("boost-", "")
-    version = version.strip().replace("-", ".").replace("_", ".")
+    version = (version_str or "").strip().replace("boost-", "")
+    version = version.replace("-", ".").replace("_", ".")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/utils.py` around lines 18 - 26, normalize_version_str currently calls .replace("boost-", "") before .strip(), so inputs with leading/trailing whitespace like " boost-1.82" won’t have the "boost-" prefix removed; update normalize_version_str to strip whitespace first (e.g., call .strip() on version_str or its fallback before removing the "boost-" prefix), then perform the "boost-" removal and the subsequent replacements and checks so that prefixed versions with surrounding spaces are normalized correctly.
4-15: Minor: Redundant null check on line 8.

Lines 6-7 already return `(0, 0, 0)` when `version` is falsy, so the `(version or "")` on line 8 is unnecessary.

Suggested simplification

```diff
 if not version:
     return (0, 0, 0)
-parts = (version or "").strip().split(".")
+parts = version.strip().split(".")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/utils.py` around lines 4 - 15, In _version_tuple, remove the redundant (version or "") guard: since the function already returns on falsy version, directly call version.strip().split(".") to parse parts; update the parts assignment in the _version_tuple function and leave the rest of the logic (digit extraction, padding to three components, and returning tuple) unchanged.

boost_library_usage_dashboard/analyzer_output.py (2)
80-96: Protected member access is acknowledged but consider refactoring.

The `_filter_and_sort_libraries` method is called with protected access (noqa: SLF001). If this method is needed by multiple modules, consider making it public or extracting it to a shared utility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_output.py` around lines 80 - 96, The code calls the protected method analyzer._filter_and_sort_libraries to build dashboard_data["metrics_by_library"]; make this call use a public API or shared utility instead: either rename and expose _filter_and_sort_libraries as filter_and_sort_libraries on the Analyzer class and update callers (including this call), or extract the filtering/sorting logic into a new utility function (e.g., lib_metrics.filter_and_sort_libraries) and import/use that here; update any tests/imports accordingly and remove the noqa/disable comments once the protected access is eliminated.
62-70: Validate row structure more defensively.

The code assumes `row` is a list/tuple with at least 4 elements, but if `distribution_by_version` contains differently structured data (e.g., dicts), the code would silently skip it. Consider logging skipped rows for debugging.

Suggested enhancement

```diff
 for row in version_counts:
     if not isinstance(row, (list, tuple)) or len(row) < 4:
+        # Log unexpected row format for debugging
         continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_output.py` around lines 62 - 70, The loop over version_counts in analyzer_output.py is too permissive and silently skips malformed rows; update the logic in the block that references version_counts, _version_tuple, repos_by_version_rows and min_key to validate and handle unexpected row types: if row is a dict, extract keys like "version", "created_at", "confirmed", "guessed"; if it is a list/tuple check length>=4 as today; for any skipped or malformed row emit a debug/warn log that includes the raw row content and reason for skipping so issues can be traced; keep the existing behavior of appending (version, created_at, confirmed+guessed) when values are valid and comparable to min_key.

boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py (1)
94-96: Consider atomic publish: clone to temp directory first.

If `clone_repo()` fails after `shutil.rmtree(publish_root)`, the publish directory is lost. Consider cloning to a temporary directory first, then replacing the target only on success.

Suggested approach

```diff
+import tempfile
+
 publish_root = (
     get_workspace_path("shared") / "boost_library_usage_dashboard_publish"
 )
-if publish_root.exists():
-    shutil.rmtree(publish_root)
-clone_repo(target_repo, publish_root)
+# Clone to temp first to avoid losing existing state on failure
+with tempfile.TemporaryDirectory() as tmp:
+    tmp_clone = Path(tmp) / "clone"
+    clone_repo(target_repo, tmp_clone)
+    if publish_root.exists():
+        shutil.rmtree(publish_root)
+    shutil.move(str(tmp_clone), str(publish_root))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py` around lines 94 - 96, The current flow removes publish_root then calls clone_repo(target_repo, publish_root), which can leave the publish directory deleted if cloning fails; change the logic to clone into a temporary directory (e.g., create a temp path next to publish_root), call clone_repo(target_repo, temp_dir), verify success, then atomically replace publish_root with temp_dir (use shutil.move or os.replace for atomic rename on the same filesystem), and ensure proper cleanup of temp_dir on failure; update references in this file to use publish_root and clone_repo accordingly and wrap the sequence in try/except/finally to avoid leaking temp directories.

boost_library_usage_dashboard/analyzer_libraries.py (1)
134-136: Consider indexing release dates for faster commit→version lookup.

This path does a full `version_info` scan per row. Pre-sorting release timestamps once and using binary search will reduce hot-path cost on large histories.

⚡ Optional optimization sketch

```diff
+from bisect import bisect_right
+
+def _build_release_index(version_info: list[Any]) -> list[tuple[datetime, str]]:
+    return sorted(
+        [
+            (v.version_created_at, v.version)
+            for v in version_info
+            if v.version_created_at is not None
+        ],
+        key=lambda x: x[0],
+    )
+
+def _first_release_after(
+    release_index: list[tuple[datetime, str]], commit_at: datetime
+) -> str | None:
+    pos = bisect_right(release_index, (commit_at, "\uffff"))
+    return release_index[pos][1] if pos < len(release_index) else None
```

Also applies to: 170-183
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_libraries.py` around lines 134 - 136, The current per-row scan using analyzer.version_info in get_first_version_released_after is causing repeated full scans; precompute and cache a sorted list/array of release timestamps (and parallel version refs) for each analyzer (e.g., build_release_timestamps or a cached attribute on Analyzer) and replace the linear scan with a binary search (bisect) lookup using commit_dt to find the first release after the commit; update calls (the lookup at commit_dt and the similar block around the other occurrence) to use the new precomputed index or change get_first_version_released_after to accept the pre-sorted timestamps argument so the hot path uses O(log n) instead of O(n).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_library_usage_dashboard/dashboard_html_index.py`:
- Around line 204-219: The JS context is being populated with raw
json.dumps(...) which doesn't escape characters that can break out of <script>,
so replace all json.dumps(...) uses (the constants yearLabels, yearCounts,
yearCum, versionLabels, versionCounts, versionCum, boostLabels, boostOver10,
nonBoostOver10, boostRate, languageData, allLanguages, tableLibraryOverview,
tableTopStars, tableTopUsage, tableTopCreated) with json_for_script(...) and add
the corresponding import for json_for_script from the same helper module used by
dashboard_html_library.py so the values are properly escaped for inlined
scripts.
- Around line 282-286: The templates rowHtml and topRow are inserting
user-controlled fields (e.g., r.name, r.created_version, r.safe_name,
r.repo_name, r.stars, r.usage_count, r.created_at, r.activity_score,
r.average_stars) directly into HTML; update both functions to apply the same
esc() escaping used elsewhere: wrap every interpolated value with esc(...)
(including formatted numeric/string results like toNumber(...).toLocaleString(),
Number(...).toFixed(3), and (r.created_at || '').toString().slice(...)) and
escape values used in URLs (e.g., esc(r.safe_name) for libraries/${...}.html and
esc(r.repo_name) for the GitHub href) so no external data is inserted unescaped.
---
Duplicate comments:
In `@boost_library_usage_dashboard/analyzer.py`:
- Around line 50-52: Several places (the self.version_info assignment, any
Min("version__version") usages, and the custom key=lambda x: x.get(sort_field)
sort) rely on lexical ordering of version strings; replace these with semantic
version ordering by using packaging.version.parse (from packaging import
version) to compare versions. Concretely: import packaging.version and after
fetching querysets (e.g., BoostVersion.objects.all()) convert to list and sort
with list.sort(key=lambda v: version.parse(v.version) or
version.parse(v.version.version) as appropriate), replace
Min("version__version")/Max-style aggregations by computing min/max on the list
of parsed versions instead of DB lexical aggregates, and update any key=lambda
x: x.get(sort_field) to parse that field with packaging.version.parse so all
comparisons/select-latest logic (including in methods that populate
self.version_info and the code paths around the earlier Min/Max usages) use
semantic ordering consistently.
---
Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 134-136: The current per-row scan using analyzer.version_info in
get_first_version_released_after is causing repeated full scans; precompute and
cache a sorted list/array of release timestamps (and parallel version refs) for
each analyzer (e.g., build_release_timestamps or a cached attribute on Analyzer)
and replace the linear scan with a binary search (bisect) lookup using commit_dt
to find the first release after the commit; update calls (the lookup at
commit_dt and the similar block around the other occurrence) to use the new
precomputed index or change get_first_version_released_after to accept the
pre-sorted timestamps argument so the hot path uses O(log n) instead of O(n).
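The O(log n) lookup this comment describes can be sketched as follows. The names (`build_release_index`, `first_version_released_after`) and the `(timestamp, version)` pair shape are illustrative assumptions, not the analyzer's actual API:

```python
from bisect import bisect_right
from datetime import datetime, timezone


def build_release_index(version_info):
    """Precompute once: sorted (timestamp, version) pairs plus a parallel
    timestamp list that bisect can search."""
    points = sorted((ts, name) for ts, name in version_info if ts is not None)
    return [ts for ts, _ in points], [name for _, name in points]


def first_version_released_after(times, names, commit_at):
    """Binary search for the first release strictly after commit_at."""
    if commit_at is None:
        return None
    idx = bisect_right(times, commit_at)
    return names[idx] if idx < len(names) else None


utc = timezone.utc
releases = [
    (datetime(2023, 8, 11, tzinfo=utc), "1.83.0"),
    (datetime(2023, 4, 12, tzinfo=utc), "1.82.0"),
    (None, "develop"),  # entries without a timestamp are skipped
]
times, names = build_release_index(releases)
print(first_version_released_after(times, names, datetime(2023, 5, 1, tzinfo=utc)))
# 1.83.0
```

The index is built once per analyzer run; every per-commit lookup in the hot loop then costs a single `bisect_right` call instead of a full scan.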
In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Line 136: The zip call used to compute sum_xy (sum_xy = sum(x * y for x, y in
zip(x_values, ratios))) should be made defensive by adding strict=True to the
zip invocation to ensure x_values and ratios are the same length at runtime;
update the generator expression to use zip(x_values, ratios, strict=True) in the
sum_xy computation (and run tests or ensure the project targets Python 3.10+
where strict is supported).
- Around line 22-35: The large BoostUsage query assigned to rows is iterated
while accumulating library_year_data, library_top_repo, and library_headers,
which can cause memory pressure for huge datasets; update the loop to process
results in bounded batches (e.g., use
BoostUsage.objects.filter(...).iterator(chunk_size=...) or implement manual
chunking with .order_by(...).values_list('pk', flat=True) and slicing) and
ensure each batch updates library_year_data, library_top_repo, and
library_headers incrementally and releases references before loading the next
batch so memory usage stays bounded; reference the rows variable,
BoostUsage.select_related(...), analyzer.stars_min_threshold, and the
accumulator structures when making this change.
In `@boost_library_usage_dashboard/analyzer_output.py`:
- Around line 80-96: The code calls the protected method
analyzer._filter_and_sort_libraries to build
dashboard_data["metrics_by_library"]; make this call use a public API or shared
utility instead: either rename and expose _filter_and_sort_libraries as
filter_and_sort_libraries on the Analyzer class and update callers (including
this call), or extract the filtering/sorting logic into a new utility function
(e.g., lib_metrics.filter_and_sort_libraries) and import/use that here; update
any tests/imports accordingly and remove the noqa/disable comments once the
protected access is eliminated.
- Around line 62-70: The loop over version_counts in analyzer_output.py is too
permissive and silently skips malformed rows; update the logic in the block that
references version_counts, _version_tuple, repos_by_version_rows and min_key to
validate and handle unexpected row types: if row is a dict, extract keys like
"version", "created_at", "confirmed", "guessed"; if it is a list/tuple check
length>=4 as today; for any skipped or malformed row emit a debug/warn log that
includes the raw row content and reason for skipping so issues can be traced;
keep the existing behavior of appending (version, created_at, confirmed+guessed)
when values are valid and comparable to min_key.
In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 94-96: The current flow removes publish_root then calls
clone_repo(target_repo, publish_root), which can leave the publish directory
deleted if cloning fails; change the logic to clone into a temporary directory
(e.g., create a temp path next to publish_root), call clone_repo(target_repo,
temp_dir), verify success, then atomically replace publish_root with temp_dir
(use shutil.move or os.replace for atomic rename on the same filesystem), and
ensure proper cleanup of temp_dir on failure; update references in this file to
use publish_root and clone_repo accordingly and wrap the sequence in
try/except/finally to avoid leaking temp directories.
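The clone-into-temp-then-swap sequence proposed above might look like this sketch. `clone_repo` is passed in as a stand-in for the project's git helper; the function name and the `.old` backup convention are assumptions for illustration:

```python
import shutil
import tempfile
from pathlib import Path


def refresh_publish_dir(publish_root: Path, clone_repo, target_repo: str) -> None:
    """Clone into a sibling temp dir, then swap it into place.

    publish_root is only replaced after the clone has succeeded, so a failed
    clone leaves the previously published content untouched.
    """
    temp_dir = Path(tempfile.mkdtemp(dir=publish_root.parent))
    try:
        clone_repo(target_repo, temp_dir)  # may raise on failure
        backup = publish_root.parent / (publish_root.name + ".old")
        shutil.rmtree(backup, ignore_errors=True)
        if publish_root.exists():
            publish_root.rename(backup)
        temp_dir.rename(publish_root)  # same filesystem, so this is a cheap rename
        shutil.rmtree(backup, ignore_errors=True)
    except BaseException:
        shutil.rmtree(temp_dir, ignore_errors=True)
        raise
```

Creating the temp dir next to `publish_root` (same filesystem) is what keeps the final `rename` a metadata-only operation rather than a copy.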
In `@boost_library_usage_dashboard/models.py`:
- Around line 1-57: Add explicit read-only guards to prevent accidental writes
to these unmanaged mappings: implement and override save() and delete() on both
BoostExternalRepository and BoostUsage (and optionally bulk methods like
update/refresh_from_db if present) to raise a clear exception (e.g.,
RuntimeError or PermissionDenied) indicating the model is read-only; keep
Meta.managed = False as-is and ensure the exception message names the class
(BoostExternalRepository or BoostUsage) so callers know which model is
protected.
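The read-only guard described for the models can be sketched as a small mixin. This is a Django-free illustration of the pattern; in the real code it would be mixed into `models.Model` alongside the existing `Meta.managed = False`:

```python
class ReadOnlyModelError(RuntimeError):
    """Raised when code attempts to write a read-only mapped model."""


class ReadOnlyMixin:
    def save(self, *args, **kwargs):
        raise ReadOnlyModelError(
            f"{type(self).__name__} is a read-only mapping; writes are not allowed"
        )

    def delete(self, *args, **kwargs):
        raise ReadOnlyModelError(
            f"{type(self).__name__} is a read-only mapping; deletes are not allowed"
        )


class BoostUsage(ReadOnlyMixin):  # in the real code: class BoostUsage(ReadOnlyMixin, models.Model)
    pass


try:
    BoostUsage().save()
except ReadOnlyModelError as exc:
    print(exc)  # the message names the protected class, as the review asks
```

Putting the class name in the exception message via `type(self).__name__` means one mixin covers both `BoostExternalRepository` and `BoostUsage` without duplicated strings.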
In `@boost_library_usage_dashboard/tests/test_command.py`:
- Around line 44-52: The test currently only checks that render_html was called
once but not that it received the correct output location; update the test to
assert the exact call arguments for render_html (e.g., that it was invoked with
the expected output_dir or HTML path produced by render_dashboard_html), so the
command wiring is validated. Locate the existing assertions around analyzer_cls,
fake_analyzer.run, write_report and render_html in the test and replace or
augment the render_html.assert_called_once() with a
render_html.assert_called_once_with(...) check that uses the test's
output_dir/expected HTML filename and any other expected params.
In `@boost_library_usage_dashboard/utils.py`:
- Around line 18-26: normalize_version_str currently calls .replace("boost-",
"") before .strip(), so inputs with leading/trailing whitespace like "
boost-1.82" won’t have the "boost-" prefix removed; update normalize_version_str
to strip whitespace first (e.g., call .strip() on version_str or its fallback
before removing the "boost-" prefix), then perform the "boost-" removal and the
subsequent replacements and checks so that prefixed versions with surrounding
spaces are normalized correctly.
- Around line 4-15: In _version_tuple, remove the redundant (version or "")
guard: since the function already returns on falsy version, directly call
version.strip().split(".") to parse parts; update the parts assignment in the
_version_tuple function and leave the rest of the logic (digit extraction,
padding to three components, and returning tuple) unchanged.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- boost_library_usage_dashboard/__init__.py
- boost_library_usage_dashboard/analyzer.py
- boost_library_usage_dashboard/analyzer_libraries.py
- boost_library_usage_dashboard/analyzer_metrics.py
- boost_library_usage_dashboard/analyzer_output.py
- boost_library_usage_dashboard/apps.py
- boost_library_usage_dashboard/dashboard_html.py
- boost_library_usage_dashboard/dashboard_html_common.py
- boost_library_usage_dashboard/dashboard_html_index.py
- boost_library_usage_dashboard/dashboard_html_library.py
- boost_library_usage_dashboard/management/__init__.py
- boost_library_usage_dashboard/management/commands/__init__.py
- boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
- boost_library_usage_dashboard/models.py
- boost_library_usage_dashboard/renderer.py
- boost_library_usage_dashboard/report.py
- boost_library_usage_dashboard/tests/__init__.py
- boost_library_usage_dashboard/tests/fixtures.py
- boost_library_usage_dashboard/tests/test_analyzer.py
- boost_library_usage_dashboard/tests/test_command.py
- boost_library_usage_dashboard/tests/test_models.py
- boost_library_usage_dashboard/tests/test_renderer.py
- boost_library_usage_dashboard/tests/test_report.py
- boost_library_usage_dashboard/tests/test_utils.py
- boost_library_usage_dashboard/utils.py
- config/settings.py
- conftest.py
- workflow/management/commands/run_all_collectors.py
@coderabbitai continue to review

✅ Actions performed

Review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (7)
boost_library_usage_dashboard/dashboard_html.py (1)
27-28: Optional: Remove redundant fallback. The `or {}` is redundant since `data.get("libraries_page_data", {})` already returns `{}` when the key is missing.

♻️ Suggested simplification

```diff
- for library_name in data.get("libraries_page_data", {}) or {}:
+ for library_name in data.get("libraries_page_data", {}):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/dashboard_html.py` around lines 27 - 28, The for-loop uses a redundant fallback expression; replace the iteration target `data.get("libraries_page_data", {}) or {}` with just `data.get("libraries_page_data", {})` in the loop that calls `build_library_page(data, library_name, lib_dir)` so the default empty dict from get already handles the missing key and removes the unnecessary `or {}`.

boost_library_usage_dashboard/tests/test_analyzer.py (1)
10-19: Consider using pytest's `tmp_path` fixture instead. While `tempfile.gettempdir()` resolves the security warning, using pytest's `tmp_path` fixture would provide automatic cleanup and isolation between test runs.

♻️ Alternative approach using tmp_path

```diff
-def _make_analyzer() -> BoostUsageDashboardAnalyzer:
+def _make_analyzer(tmp_path: Path | None = None) -> BoostUsageDashboardAnalyzer:
     analyzer = BoostUsageDashboardAnalyzer.__new__(BoostUsageDashboardAnalyzer)
-    analyzer.base_dir = Path(tempfile.gettempdir()) / "boost-dashboard-test-base"
-    analyzer.output_dir = Path(tempfile.gettempdir()) / "boost-dashboard-test-output"
+    base = tmp_path if tmp_path else Path(tempfile.gettempdir())
+    analyzer.base_dir = base / "boost-dashboard-test-base"
+    analyzer.output_dir = base / "boost-dashboard-test-output"
     analyzer.version_name_list = ["1.50.0", "1.51.0", "1.52.0", "1.53.0", "1.54.0"]
```

Then update tests that need actual directories to pass `tmp_path`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/tests/test_analyzer.py` around lines 10 - 19, Replace the hardcoded tempfile.gettempdir() usage in the _make_analyzer helper with pytest's tmp_path fixture: change _make_analyzer to accept a tmp_path parameter (or create a wrapper that calls it) and set analyzer.base_dir and analyzer.output_dir to subpaths of tmp_path (e.g., tmp_path / "boost-dashboard-test-base" and tmp_path / "boost-dashboard-test-output"); update any tests that call _make_analyzer to pass the tmp_path fixture so each test gets isolated, auto-cleaned directories for BoostUsageDashboardAnalyzer instances, and avoid global tempdir usage.

boost_library_usage_dashboard/tests/test_command.py (1)
52-56: Minor: Simplify path handling. The `Path(str(tmp_path)).resolve()` conversion is redundant since `tmp_path` is already a `pathlib.Path`. Similarly at line 112, `Path(tmp_path)` is unnecessary.

♻️ Suggested simplification

```diff
- expected_output_dir = Path(str(tmp_path)).resolve()
+ expected_output_dir = tmp_path.resolve()
```

And at line 112:

```diff
- output_dir=Path(tmp_path),
+ output_dir=tmp_path,
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/tests/test_command.py` around lines 52 - 56, The test unnecessarily wraps tmp_path in Path(...) twice; replace Path(str(tmp_path)).resolve() with tmp_path.resolve() (or tmp_path if resolve not needed) when setting expected_output_dir, and replace Path(tmp_path) at the other occurrence (around line 112) with tmp_path directly; update assertions that call render_html(...) to pass the resulting tmp_path/tmp_path.resolve() and keep settings.BASE_DIR as-is so render_html.assert_called_once_with(base_dir=settings.BASE_DIR, output_dir=expected_output_dir) still matches.

boost_library_usage_dashboard/analyzer.py (1)
133-135: Prefer lazy `%` formatting for logger calls. Using f-strings in logger calls evaluates the string even when the log level is disabled. Use `%` formatting for lazy evaluation.

Proposed fix

```diff
 logger.debug(
-    f"Loaded {len(libs)} libraries. Latest version: {latest_version_id}"
+    "Loaded %d libraries. Latest version: %s", len(libs), latest_version_id
 )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer.py` around lines 133 - 135, Replace the f-string in the logger.debug call with lazy %-style formatting so the message isn't eagerly evaluated; locate the logger.debug call that references libs and latest_version_id (the line calling logger.debug(f"Loaded {len(libs)} libraries. Latest version: {latest_version_id}")) and change it to use a format string with % placeholders and pass len(libs) and latest_version_id as arguments to logger.debug.

boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py (1)
127-132: Consider adding error handling for git commit failures. If the push fails (e.g., due to network issues or permission problems), the error message from `subprocess.run(..., check=True)` may not be user-friendly. The `push()` helper from `git_ops` is used, which should handle this, but the commit step before it doesn't capture output for diagnostics.

Proposed improvement for better error messages

```diff
 commit_message = "Update Boost library usage dashboard artifacts"
-subprocess.run(
-    ["git", "-C", str(publish_root), "commit", "-m", commit_message],
-    check=True,
-)
+try:
+    subprocess.run(
+        ["git", "-C", str(publish_root), "commit", "-m", commit_message],
+        check=True,
+        capture_output=True,
+        text=True,
+    )
+except subprocess.CalledProcessError as e:
+    raise CommandError(f"Git commit failed: {e.stderr}") from e
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py` around lines 127 - 132, The commit call using subprocess.run(["git", "-C", str(publish_root), "commit", "-m", commit_message], check=True) lacks error handling and doesn't capture stdout/stderr; wrap this in a try/except for subprocess.CalledProcessError (or run with capture_output=True, text=True and check=False) to collect stdout/stderr, then log or include those details in the error message and surface a clear message before calling push(publish_root, branch=branch) (use processLogger or the existing logging mechanism). Ensure the handler references commit_message, publish_root, and branch and either exits/raises after logging the detailed error so push() isn't called on a failed commit.

boost_library_usage_dashboard/utils.py (1)
4-15: Type annotation mismatch on return value. The function is annotated to return `tuple[int, int, int]`, but `tuple(out[:3])` returns a generic `tuple[int, ...]`. This causes a type checker warning.

Proposed fix

```diff
- return tuple(out[:3])
+ return (out[0], out[1], out[2])
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/utils.py` around lines 4 - 15, The type checker warns because tuple(out[:3]) yields a generic tuple type; update _version_tuple to return an explicit 3-tuple to match the annotation: replace the final return tuple(out[:3]) with an explicit construction like (out[0], out[1], out[2]) (or use typing.cast(tuple[int,int,int], tuple(out[:3])) if you prefer a cast); this ensures the return value from _version_tuple is statically typed as tuple[int,int,int].

boost_library_usage_dashboard/models.py (1)
51-51: The field name `excepted_at` is intentional, not a typo. It correctly refers to marking records as "excepted" (excluded/marked as exception). This is confirmed by dedicated test coverage (`test_boost_usage_excepted_at_accepts_valid_date`) and consistent usage across the analyzer modules filtering for `excepted_at__isnull=True`.

However, "excepted" is an uncommon word choice that may confuse developers unfamiliar with the codebase. Consider adding a docstring to the model field or renaming to `excluded_at` or `exception_marked_at` for improved clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/models.py` at line 51, The field name excepted_at is intentionally used to mark records as "excepted" but is unclear to readers; add clarifying documentation rather than changing behavior: update the models.py definition of the excepted_at DateField to include a help_text/docstring (e.g., help_text="Timestamp when a record was marked as excepted (excluded)"), and if you choose to rename the field to excluded_at or exception_marked_at instead, update all references (tests like test_boost_usage_excepted_at_accepts_valid_date and analyzer filters using excepted_at__isnull=True) and run migrations and tests to keep behavior consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 141-149: The current identity_name computation assumes
change.commit.account is non-null and will raise AttributeError if account is
None; update the logic in analyzer_libraries.py around the identity_name
assignment to first check whether change.commit.account is truthy (e.g.,
change.commit.account is not None) before accessing identity_id or any account
attributes, and if account is None return a sensible fallback such as "unknown";
ensure you reference change.commit.account.identity.display_name,
change.commit.account.display_name, change.commit.account.username and
change.commit.account.identity_id in the guarded branches so attribute access
only occurs when account exists.
In `@boost_library_usage_dashboard/report.py`:
- Around line 40-45: The loop directly indexes lib['name'], lib['repo_count'],
and lib['total_usage'], which can raise KeyError for malformed entries; change
those direct accesses to use .get() with safe defaults (e.g., name =
lib.get("name","N/A"), repo_count = lib.get("repo_count",0), total_usage =
lib.get("total_usage",0)) and then use these variables in the
report_lines.append call (keeping earliest/latest handling as-is) so the code is
defensive against missing keys.
---
Nitpick comments:
In `@boost_library_usage_dashboard/analyzer.py`:
- Around line 133-135: Replace the f-string in the logger.debug call with lazy
%-style formatting so the message isn't eagerly evaluated; locate the
logger.debug call that references libs and latest_version_id (the line calling
logger.debug(f"Loaded {len(libs)} libraries. Latest version:
{latest_version_id}")) and change it to use a format string with % placeholders
and pass len(libs) and latest_version_id as arguments to logger.debug.
In `@boost_library_usage_dashboard/dashboard_html.py`:
- Around line 27-28: The for-loop uses a redundant fallback expression; replace
the iteration target `data.get("libraries_page_data", {}) or {}` with just
`data.get("libraries_page_data", {})` in the loop that calls
`build_library_page(data, library_name, lib_dir)` so the default empty dict from
get already handles the missing key and removes the unnecessary `or {}`.
In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 127-132: The commit call using subprocess.run(["git", "-C",
str(publish_root), "commit", "-m", commit_message], check=True) lacks error
handling and doesn't capture stdout/stderr; wrap this in a try/except for
subprocess.CalledProcessError (or run with capture_output=True, text=True and
check=False) to collect stdout/stderr, then log or include those details in the
error message and surface a clear message before calling push(publish_root,
branch=branch) (use processLogger or the existing logging mechanism). Ensure the
handler references commit_message, publish_root, and branch and either
exits/raises after logging the detailed error so push() isn’t called on a failed
commit.
In `@boost_library_usage_dashboard/models.py`:
- Line 51: The field name excepted_at is intentionally used to mark records as
"excepted" but is unclear to readers; add clarifying documentation rather than
changing behavior: update the models.py definition of the excepted_at DateField
to include a help_text/docstring (e.g., help_text="Timestamp when a record was
marked as excepted (excluded)"), and if you choose to rename the field to
excluded_at or exception_marked_at instead, update all references (tests like
test_boost_usage_excepted_at_accepts_valid_date and analyzer filters using
excepted_at__isnull=True) and run migrations and tests to keep behavior
consistent.
In `@boost_library_usage_dashboard/tests/test_analyzer.py`:
- Around line 10-19: Replace the hardcoded tempfile.gettempdir() usage in the
_make_analyzer helper with pytest's tmp_path fixture: change _make_analyzer to
accept a tmp_path parameter (or create a wrapper that calls it) and set
analyzer.base_dir and analyzer.output_dir to subpaths of tmp_path (e.g.,
tmp_path / "boost-dashboard-test-base" and tmp_path /
"boost-dashboard-test-output"); update any tests that call _make_analyzer to
pass the tmp_path fixture so each test gets isolated, auto-cleaned directories
for BoostUsageDashboardAnalyzer instances, and avoid global tempdir usage.
In `@boost_library_usage_dashboard/tests/test_command.py`:
- Around line 52-56: The test unnecessarily wraps tmp_path in Path(...) twice;
replace Path(str(tmp_path)).resolve() with tmp_path.resolve() (or tmp_path if
resolve not needed) when setting expected_output_dir, and replace Path(tmp_path)
at the other occurrence (around line 112) with tmp_path directly; update
assertions that call render_html(...) to pass the resulting
tmp_path/tmp_path.resolve() and keep settings.BASE_DIR as-is so
render_html.assert_called_once_with(base_dir=settings.BASE_DIR,
output_dir=expected_output_dir) still matches.
In `@boost_library_usage_dashboard/utils.py`:
- Around line 4-15: The type checker warns because tuple(out[:3]) yields a
generic tuple type; update _version_tuple to return an explicit 3-tuple to match
the annotation: replace the final return tuple(out[:3]) with an explicit
construction like (out[0], out[1], out[2]) (or use
typing.cast(tuple[int,int,int], tuple(out[:3])) if you prefer a cast); this
ensures the return value from _version_tuple is statically typed as
tuple[int,int,int].
📒 Files selected for processing (28)
- boost_library_usage_dashboard/__init__.py
- boost_library_usage_dashboard/analyzer.py
- boost_library_usage_dashboard/analyzer_libraries.py
- boost_library_usage_dashboard/analyzer_metrics.py
- boost_library_usage_dashboard/analyzer_output.py
- boost_library_usage_dashboard/apps.py
- boost_library_usage_dashboard/dashboard_html.py
- boost_library_usage_dashboard/dashboard_html_common.py
- boost_library_usage_dashboard/dashboard_html_index.py
- boost_library_usage_dashboard/dashboard_html_library.py
- boost_library_usage_dashboard/management/__init__.py
- boost_library_usage_dashboard/management/commands/__init__.py
- boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
- boost_library_usage_dashboard/models.py
- boost_library_usage_dashboard/renderer.py
- boost_library_usage_dashboard/report.py
- boost_library_usage_dashboard/tests/__init__.py
- boost_library_usage_dashboard/tests/fixtures.py
- boost_library_usage_dashboard/tests/test_analyzer.py
- boost_library_usage_dashboard/tests/test_command.py
- boost_library_usage_dashboard/tests/test_models.py
- boost_library_usage_dashboard/tests/test_renderer.py
- boost_library_usage_dashboard/tests/test_report.py
- boost_library_usage_dashboard/tests/test_utils.py
- boost_library_usage_dashboard/utils.py
- config/settings.py
- conftest.py
- workflow/management/commands/run_all_collectors.py
Actionable comments posted: 1
🧹 Nitpick comments (2)
boost_library_usage_dashboard/report.py (2)
78-83: Tuple unpacking assumes exactly 4 elements. The loop unpacks each item as `(version, created_at, confirmed_count, no_confirmed_count)`. If `distribution_by_version` contains malformed entries with a different number of elements, this will raise `ValueError`.

This is likely acceptable if the data source is trusted, but consider adding a `try/except` or validation if robustness against malformed data is desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/report.py` around lines 78 - 83, The loop over version_stats["distribution_by_version"] currently unpacks each item as (version, created_at, confirmed_count, no_confirmed_count) which fails on malformed tuples; update the loop in the same block (iterating version_stats["distribution_by_version"] and appending to report_lines) to validate each item before unpacking (e.g., check len(item) == 4) or wrap the unpack in a try/except that logs the bad entry and continues so malformed entries don’t raise ValueError and stop the report generation.
29-32: Consider defensive access for required stats keys. Lines 29-32 directly access `stats['total_repositories']`, `stats['affected_repositories']`, etc., which will raise `KeyError` if missing. While these are core stats and failing fast may be intentional, this is inconsistent with the defensive `.get()` pattern used in lines 40-48.

If failing fast is intended, consider adding explicit validation at the function entry to provide a clearer error message.

Option A: Defensive access with defaults

```diff
- f"- **Total Repositories (≥{stars_min_threshold} stars)**: {stats['total_repositories']:,}",
- f"- **Repositories Using System Boost**: {stats['affected_repositories']:,}",
- f"- **Total Boost Libraries**: {stats['total_libraries']:,}",
- f"- **Total Usage Records**: {stats['total_usage_records']:,}",
+ f"- **Total Repositories (≥{stars_min_threshold} stars)**: {stats.get('total_repositories', 0):,}",
+ f"- **Repositories Using System Boost**: {stats.get('affected_repositories', 0):,}",
+ f"- **Total Boost Libraries**: {stats.get('total_libraries', 0):,}",
+ f"- **Total Usage Records**: {stats.get('total_usage_records', 0):,}",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/report.py` around lines 29 - 32, The code directly indexes stats['total_repositories'], stats['affected_repositories'], stats['total_libraries'], and stats['total_usage_records'] which can raise KeyError and is inconsistent with later .get() usage; fix by either replacing these direct accesses with defensive lookups (e.g., use stats.get('total_repositories', 0) etc.) or add an explicit validation at the start of the function that contains this block to assert/raise a clear ValueError if any of those required keys are missing (mentioning the missing key names). Ensure the change is applied where the four lines are constructed so the rest of the function can rely on consistent, well-defined values.
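A defensive version of the distribution loop, under the assumption stated in the review that well-formed rows are 4-tuples of `(version, created_at, confirmed_count, no_confirmed_count)`; the sample data is illustrative:

```python
import logging

logger = logging.getLogger(__name__)

distribution_by_version = [
    ("1.82.0", "2023-04-12", 10, 3),
    ("1.83.0",),  # malformed: wrong arity, would raise ValueError on unpack
]

report_lines = []
for item in distribution_by_version:
    if not (isinstance(item, (list, tuple)) and len(item) == 4):
        # Log the raw row and keep going instead of aborting the report.
        logger.warning("Skipping malformed distribution row: %r", item)
        continue
    version, created_at, confirmed_count, no_confirmed_count = item
    report_lines.append(
        f"| {version} | {created_at} | {confirmed_count + no_confirmed_count} |"
    )

print(report_lines)  # only the well-formed row survives
```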
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 62-65: In the never-used libraries loop, change the direct access
lib['name'] to a defensive access like lib.get('name') (with a sensible default
such as 'N/A') so it matches the defensive pattern used in the top libraries
loop (see the block around lines 43-45); update the f-string in the
report_lines.append inside that loop to use lib.get('name') (or 'N/A') instead
of lib['name'] to avoid KeyError on missing keys.
---
Nitpick comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 78-83: The loop over version_stats["distribution_by_version"]
currently unpacks each item as (version, created_at, confirmed_count,
no_confirmed_count) which fails on malformed tuples; update the loop in the same
block (iterating version_stats["distribution_by_version"] and appending to
report_lines) to validate each item before unpacking (e.g., check len(item) ==
4) or wrap the unpack in a try/except that logs the bad entry and continues so
malformed entries don’t raise ValueError and stop the report generation.
- Around line 29-32: The code directly indexes stats['total_repositories'],
stats['affected_repositories'], stats['total_libraries'], and
stats['total_usage_records'] which can raise KeyError and is inconsistent with
later .get() usage; fix by either replacing these direct accesses with defensive
lookups (e.g., use stats.get('total_repositories', 0) etc.) or add an explicit
validation at the start of the function that contains this block to assert/raise
a clear ValueError if any of those required keys are missing (mentioning the
missing key names). Ensure the change is applied where the four lines are
constructed so the rest of the function can rely on consistent, well-defined
values.
📒 Files selected for processing (2)
- boost_library_usage_dashboard/analyzer_libraries.py
- boost_library_usage_dashboard/report.py
🚧 Files skipped from review as they are similar to previous changes (1)
- boost_library_usage_dashboard/analyzer_libraries.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
boost_library_usage_dashboard/analyzer_libraries.py (1)
134-136: Optimize version lookup in the hot loop. `get_first_version_released_after()` linearly scans `version_info` for every file change. On large datasets this becomes a dominant cost. Precompute sorted release timestamps once and use binary search.

♻️ Suggested refactor

```diff
+from bisect import bisect_right
 ...
 def collect_commit_info_by_library(analyzer: Any) -> dict[str, Any]:
@@
+    release_points = sorted(
+        (v.version_created_at, v.version)
+        for v in analyzer.version_info
+        if v.version_created_at is not None
+    )
+    release_times = [ts for ts, _ in release_points]
@@
-    version = get_first_version_released_after(analyzer.version_info, commit_dt)
+    version = get_first_version_released_after(commit_dt, release_points, release_times)
@@
-def get_first_version_released_after(
-    version_info: list[Any], commit_at: datetime | None
-) -> str | None:
+def get_first_version_released_after(
+    commit_at: datetime | None,
+    release_points: list[tuple[datetime, str]],
+    release_times: list[datetime],
+) -> str | None:
@@
-    candidates = [
-        v
-        for v in version_info
-        if v.version_created_at is not None and v.version_created_at > commit_at
-    ]
-    if not candidates:
-        return None
-    return min(candidates, key=lambda v: v.version_created_at).version
+    idx = bisect_right(release_times, commit_at)
+    return release_points[idx][1] if idx < len(release_points) else None
```

Also applies to: 171-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_usage_dashboard/analyzer_libraries.py` around lines 134 - 136, The current hot loop calls get_first_version_released_after(analyzer.version_info, commit_dt) for every change which scans version_info linearly; precompute a sorted list/array of release timestamps (and corresponding versions) once (e.g., in Analyzer initialization or before iterating changes) and replace the per-change linear scan with a binary search (bisect) lookup using commit_dt to find the first release after the commit; update the loop sites around commit_dt (references in analyzer.version_info, get_first_version_released_after usage at the shown block and the similar block at 171-184) to use the precomputed timestamps and a fast index-to-version mapping instead of calling the linear helper repeatedly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 29-32: The four f-strings directly index the stats dict and can
raise KeyError; update them to use the same defensive .get() pattern used
elsewhere by replacing stats['total_repositories'],
stats['affected_repositories'], stats['total_libraries'], and
stats['total_usage_records'] with stats.get('total_repositories', 0),
stats.get('affected_repositories', 0), stats.get('total_libraries', 0), and
stats.get('total_usage_records', 0) respectively so missing keys default to 0
while preserving the thousand-separator formatting in the f-strings.
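The defensive pattern the comment asks for is straightforward; a minimal sketch (the `stats` dict and its keys mirror the ones named in the comment, with illustrative values) shows how `.get()` keeps the thousand-separator formatting working even when a key is absent:

```python
# Direct indexing like stats["affected_repositories"] raises KeyError when the
# key is missing; .get(..., 0) defaults to 0 and still formats cleanly.
stats = {"total_repositories": 1234567, "total_libraries": 181}

lines = [
    f"Total repositories: {stats.get('total_repositories', 0):,}",
    f"Affected repositories: {stats.get('affected_repositories', 0):,}",
    f"Total libraries: {stats.get('total_libraries', 0):,}",
    f"Total usage records: {stats.get('total_usage_records', 0):,}",
]
print("\n".join(lines))
# Total repositories: 1,234,567
# Affected repositories: 0
# Total libraries: 181
# Total usage records: 0
```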
---
Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 134-136: The current hot loop calls
get_first_version_released_after(analyzer.version_info, commit_dt) for every
change which scans version_info linearly; precompute a sorted list/array of
release timestamps (and corresponding versions) once (e.g., in Analyzer
initialization or before iterating changes) and replace the per-change linear
scan with a binary search (bisect) lookup using commit_dt to find the first
release after the commit; update the loop sites around commit_dt (references in
analyzer.version_info, get_first_version_released_after usage at the shown block
and the similar block at 171-184) to use the precomputed timestamps and a fast
index-to-version mapping instead of calling the linear helper repeatedly.
ℹ️ Review info
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- boost_library_usage_dashboard/analyzer_libraries.py
- boost_library_usage_dashboard/report.py
- config/settings.py
- conftest.py
- workflow/management/commands/run_all_collectors.py
🚧 Files skipped from review as they are similar to previous changes (2)
- conftest.py
- config/settings.py
Summary by CodeRabbit
New Features
Chores
Tests