
#30-add boost-library-usage-dashboard app #73

Merged
snowfox1003 merged 23 commits into cppalliance:develop from henry0816191:dev-30
Mar 4, 2026

Conversation


@henry0816191 henry0816191 commented Feb 25, 2026

Summary by CodeRabbit

  • New Features

    • Added Boost Library Usage Dashboard: interactive HTML index with charts/tables and per-library detail pages.
    • New command to generate dashboard data and a Markdown summary report; supports optional publishing of artifacts to a target repository/branch.
  • Chores

    • Integrated the dashboard app into project settings and the collectors workflow.
  • Tests

    • Added unit tests and fixtures covering analyzer, rendering, models, utils, and the management command.


coderabbitai Bot commented Feb 25, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


Introduces a new Django app "boost_library_usage_dashboard" that orchestrates end-to-end data collection and rendering of a Boost library usage dashboard. Includes analyzer modules for data gathering, metrics calculation, HTML rendering with Chart.js visualizations, Django models for data representation, a management command for workflow integration, and comprehensive tests.

Changes

Cohort / File(s) Summary
Core Analyzer Orchestration
boost_library_usage_dashboard/analyzer.py
Introduces BoostUsageDashboardAnalyzer class with initialization, repository/library data loading, statistics computation (version distributions, top/never-used libraries), filtering/sorting utilities, and delegation to external analytics helpers.
Library & Metrics Data Gathering
boost_library_usage_dashboard/analyzer_libraries.py, boost_library_usage_dashboard/analyzer_metrics.py
Adds functions for building dependency graphs, computing transitive dependencies, aggregating commit info, collecting external consumer data, calculating trend metrics (activity scores), building per-library overview summaries, and assembling complete dashboard datasets.
Dashboard Data Assembly & Output
boost_library_usage_dashboard/analyzer_output.py, boost_library_usage_dashboard/report.py
Implements dashboard data collection (top repositories by stars/usage/date), comprehensive dashboard JSON generation with version/year breakdowns, and Markdown report writing with overview, top libraries, never-used libraries, and version distribution tables.
HTML Rendering Framework
boost_library_usage_dashboard/dashboard_html.py, boost_library_usage_dashboard/dashboard_html_common.py, boost_library_usage_dashboard/dashboard_html_index.py, boost_library_usage_dashboard/dashboard_html_library.py
Provides shared HTML utilities (CSS, escaping, JSON serialization, table controls, Chart.js integration), builds index page with charts (repos by year/version, language comparison, library overviews), and renders individual library detail pages with dependency/usage/contribution charts.
HTML Rendering Orchestration
boost_library_usage_dashboard/renderer.py
Exposes render_dashboard_html entry point that invokes dashboard HTML generation with proper directory setup.
Django Models & Configuration
boost_library_usage_dashboard/models.py, boost_library_usage_dashboard/apps.py, boost_library_usage_dashboard/__init__.py
Adds unmanaged ORM models BoostExternalRepository and BoostUsage linking to existing trackers, Django app configuration, and module docstring.
Utilities & Helpers
boost_library_usage_dashboard/utils.py
Introduces version parsing (_version_tuple, normalize_version_str), percentage formatting, and filesystem-safe library name sanitization.
Django Management Command
boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
Implements management command orchestrating analyzer execution, report generation, HTML rendering, with optional publish workflow (clone repo, commit, push to target branch).
Test Suite
boost_library_usage_dashboard/tests/*
Adds comprehensive tests for analyzer behavior (trend metrics, transitive deps, filtering), command execution (analyzer invocation, report/HTML generation, publish flow), Django models (field validation, Meta configurations), renderer/HTML generation, report sections, utility functions, and fixtures.
Project Integration
config/settings.py, workflow/management/commands/run_all_collectors.py, conftest.py
Registers new app in INSTALLED_APPS and workspace slugs, adds dashboard command to collector workflow, and extends pytest fixture discovery.

Sequence Diagram(s)

sequenceDiagram
    participant Cmd as Management<br/>Command
    participant Analyzer as BoostUsage<br/>DashboardAnalyzer
    participant LibData as Library & Metrics<br/>Helpers
    participant DB as Database<br/>(ORM Models)
    participant Output as Output<br/>Assembly
    participant Render as HTML<br/>Renderer
    participant FS as File System

    Cmd->>Analyzer: __init__(base_dir, output_dir)
    Analyzer->>DB: Load repositories & libraries
    Analyzer->>DB: Query BoostUsage, BoostExternalRepository
    Analyzer->>LibData: Collect library metrics by file usage
    LibData->>DB: Aggregate BoostUsage by library/year
    LibData-->>Analyzer: metrics_by_library dict
    Analyzer->>LibData: Collect dependents & external consumers
    LibData->>DB: Query dependency graph
    LibData-->>Analyzer: consumer_data dict
    Analyzer->>Analyzer: Compute statistics (versions, top libs)
    Analyzer-->>Cmd: Return stats dict
    Cmd->>Output: Assemble dashboard data
    Output->>Analyzer: Fetch filtered libraries, top repos
    Output-->>Cmd: Return dashboard_data
    Cmd->>FS: Write dashboard_data.json
    Cmd->>Cmd: Generate markdown report
    Cmd->>FS: Write summary report
    Cmd->>Render: render_dashboard_html(output_dir)
    Render->>FS: Read dashboard_data.json
    Render->>Render: build_index_page(data)
    Render->>FS: Write index.html
    Render->>Render: build_library_page() for each library
    Render->>FS: Write per-library HTML files
    Cmd->>Cmd: Success

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A dashboard blooms, Boost libraries shine,
Data flows through pipelines, metrics align,
Charts and tables dance, dependencies clear,
HTML pages render the usage frontier!
From database queries to visualization bright,
The Boost story now glows in full light ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 34.41%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title '#30-add boost-library-usage-dashboard app' clearly and specifically summarizes the main change: adding a new boost-library-usage-dashboard application to the project.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (1)
boost_library_usage_dashboard/analyzer_metrics.py (1)

132-132: Use strict=True in zip() to enforce sequence alignment.

Both x_values and ratios are derived from the same data source (to_last_year_data), so they are guaranteed to have equal length. Adding strict=True prevents silent truncation and resolves Ruff B905. The project targets Python 3.11+, which fully supports this parameter.

💡 Proposed fix
-        sum_xy = sum(x * y for x, y in zip(x_values, ratios))
+        sum_xy = sum(x * y for x, y in zip(x_values, ratios, strict=True))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_metrics.py` at line 132, Replace the
plain zip(...) used to compute sum_xy with zip(..., strict=True) to enforce that
x_values and ratios have the same length; update the generator expression at
sum_xy = sum(x * y for x, y in zip(x_values, ratios)) to use zip(x_values,
ratios, strict=True) so sequence misalignment raises an error (this targets the
sum_xy calculation in analyzer_metrics.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 247-249: Replace lexical string comparisons with semantic version
comparisons: parse versions using a semantic version parser (e.g.,
packaging.version.parse or distutils.version.LooseVersion) before comparing or
sorting. Specifically, when evaluating if version >= (lib.get("created_version")
or "") use parsed Version objects for both sides; when building chart_data
ensure keys or comparisons use parsed versions; change
table_data.sort(key=lambda x: (x["version"], x["commit_count"]), reverse=True)
to sort by parsed versions (e.g., parse x["version"]) together with
commit_count; and apply the same parsed-version logic in the other
comparison/assignment sites referenced around lines 253-256 (e.g.,
last_updated_version and any chart filtering) so all version comparisons and
orderings use semantic ordering.
- Around line 239-244: The dashboard is emitting raw contributor emails in the
payload at the table_data.append call (see keys "email_address", uses variable
email and person_data); remove or redact PII there instead of returning the full
address—either drop the "email_address" field, replace it with a non-identifying
value (e.g. masked local-part like a*****@domain or only the domain), or store a
deterministic hash/ID if de-duplication is needed; update the same place where
table_data is built (and any downstream serialization) so the dashboard never
receives the cleartext email.
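The semantic-version fix described above can be sketched with a pure-Python sort key (packaging.version.parse is the more robust option the comment names; the helper name and the "boost-" prefix handling below are illustrative assumptions):

```python
# Hypothetical sketch of the version-key fix: sort and compare Boost
# version strings numerically instead of lexically. The real code may
# prefer packaging.version.parse; this stdlib-only key shows the same
# ordering for simple "major.minor.patch" strings.
import re


def version_key(version_str: str) -> tuple[int, ...]:
    """Turn 'boost-1.84.0' / '1.9' into a numeric tuple for sorting."""
    cleaned = (version_str or "").strip().removeprefix("boost-")
    nums = [int(m) for m in re.findall(r"\d+", cleaned)[:3]]
    return tuple(nums + [0] * (3 - len(nums)))


versions = ["boost-1.9.0", "boost-1.10.0", "boost-1.84.0"]
lexical = sorted(versions)                    # puts 1.10.0 before 1.9.0
semantic = sorted(versions, key=version_key)  # 1.9.0 first, as intended
```

Applying the same key to `table_data.sort(...)` would order rows by parsed version rather than string value.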

In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Around line 99-106: The code currently truncates year_data into
to_last_year_data unconditionally, dropping the newest year and skewing
recent_usage, past_usage, and total_usage; change the logic so you only exclude
the latest year when it represents the current incomplete year (e.g., compare
the latest year value in year_data to the current year and check if the data for
that year is partial), otherwise keep all entries in to_last_year_data; update
the creation of to_last_year_data and ensure subsequent sums for recent_usage,
past_usage, and total_usage use that corrected list (referencing the variables
to_last_year_data, year_data, recent_usage, past_usage, total_usage and the
recent_year_threshold).

In `@boost_library_usage_dashboard/dashboard_html_common.py`:
- Around line 76-77: The header cells currently render clickable <th> elements
using the list comprehension that builds f"<th class='sortable'
data-key='{e(key)}'>{e(label)} <span class='sort-indicator'>↕</span></th>" (and
similar code at the other locations); make them keyboard-accessible by moving
the interactive target into a <button type="button"> inside the <th> (e.g.,
f"<th class='sortable' data-key='{e(key)}'><button type='button'
class='sort-button' aria-label='Sort by {e(label)}'>{e(label)} <span
class='sort-indicator'>↕</span></button></th>"), update the event binding so
click handlers (the sort function currently relying on clicks on the <th>) are
attached to the button instead of the th, and ensure aria-sort or aria-pressed
is updated when sorting changes; apply the same change to the other header
generation blocks referenced (the occurrences around the 136-141 and 158-165
code paths) and adjust CSS selectors from .sortable to .sort-button if needed so
styling and indicators continue to work.
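A minimal sketch of the accessible header cell, assuming the class names from the comment (`sortable`, `sort-button`, `sort-indicator`) and a generic escape helper; the real renderer's helpers may differ:

```python
# Hypothetical builder for a keyboard-accessible sortable header: the
# interactive target is a <button> inside the <th>, so keyboard users can
# trigger sorting, and the label is escaped before interpolation.
from html import escape as e


def sortable_th(key: str, label: str) -> str:
    return (
        f"<th class='sortable' data-key='{e(key)}'>"
        f"<button type='button' class='sort-button' "
        f"aria-label='Sort by {e(label)}'>"
        f"{e(label)} <span class='sort-indicator'>&#8597;</span>"
        f"</button></th>"
    )


header = sortable_th("usage_count", "Usage <count>")
```

The click handler would then be bound to `.sort-button` rather than the `<th>`, with `aria-sort` updated on each sort.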

In `@boost_library_usage_dashboard/dashboard_html_library.py`:
- Line 175: The contributors table currently exposes raw contributor emails via
the headers mapping using "email_address" and in the other render location that
references the same field; replace any direct use of "email_address" with a
masked or truncated representation (for example compute an "email_masked" value
showing only the first character and domain or just the domain) and update the
headers tuple to use that masked key instead of "email_address"; add a small
helper function (e.g., mask_email(email)) and call it wherever contributor
email_address is rendered or exported so both the headers list and the other
occurrence no longer output full addresses.
- Around line 184-225: The injected JSON and interpolated template values are
unescaped causing XSS; replace the direct json.dumps(...) used to build
headersRows, internalRows, externalRows, contribRows, depVersions, depFirst,
depAll, usageYears, byCreated, byUpdated, repoCounts, commitVersions and
commitCounts with a safe serializer (e.g. _json_for_script(...)) and wrap any
interpolated values in the rowHtml template (r.name, r.stars, r.usage_count,
r.created_at, r.updated_at, r.version, r.person, r.email_address,
r.commit_count, r.depth) with the existing HTML-escape helper e()/esc(...) so
that initDataTable rowHtml strings emit escaped content before being inserted
via innerHTML; keep initDataTable and rowHtml structure but call the escape
helper for every user-sourced insertion and switch JSON serialization to the
safe helper.
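The safe serializer the comment refers to can be sketched like this (the name `json_for_script` mirrors the `_json_for_script` mentioned above but is an assumption; the escape set is the standard one for inlining JSON into HTML):

```python
# Hypothetical script-safe serializer: escape characters that could
# terminate a <script> block or open a tag, so json.dumps output is safe
# to inline into generated HTML. The result is still valid JSON.
import json
from typing import Any


def json_for_script(data: Any) -> str:
    return (
        json.dumps(data)
        .replace("&", "\\u0026")
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
    )


payload = json_for_script({"name": "</script><script>alert(1)</script>"})
```

Because `\u003c` escapes only ever occur inside JSON string literals, a later `JSON.parse` (or `json.loads`) recovers the original data unchanged.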

In `@boost_library_usage_dashboard/dashboard_html.py`:
- Around line 21-25: Ensure the output directory exists before calling
build_index_page: check/create output_dir (using output_dir.mkdir(parents=True,
exist_ok=True)) prior to reading dashboard_data_file and invoking
build_index_page; update the block that sets lib_dir and then calls
build_index_page so output_dir is created when missing (referencing variables
lib_dir, output_dir, dashboard_data_file and the function build_index_page).

In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 77-80: Replace the ValueError raised in the publish argument
validation with Django's CommandError: in the run_boost_library_usage_dashboard
Command (the block checking if options["publish"] and using target_repo
variable), raise CommandError("--publish requires --target-repo") instead of
ValueError; also update the corresponding unit test in
boost_library_usage_dashboard/tests/test_command.py to expect CommandError
(replace ValueError expectation with CommandError) so the test asserts the
management command raises the correct exception type.

In `@boost_library_usage_dashboard/tests/test_analyzer.py`:
- Around line 11-12: Replace hardcoded /tmp paths by using a secure temporary
directory mechanism: change the test so that analyzer.base_dir and
analyzer.output_dir are set from a pytest tmp_path (or a
tempfile.TemporaryDirectory()/tempfile.mkdtemp()) instead of Path("/tmp/...");
update the assignment locations where analyzer.base_dir and analyzer.output_dir
are set in tests/test_analyzer.py to derive their Path values from the tmp_path
fixture (or the created secure temp dir) so Ruff S108 is satisfied.
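The stdlib variant of the fix reads as follows (in pytest the `tmp_path` fixture is the idiomatic route; `tempfile` is the equivalent shown here, and the directory names are illustrative):

```python
# Sketch: derive base_dir/output_dir from a secure temporary directory
# instead of hardcoded /tmp paths (Ruff S108). The directory and its
# contents are removed automatically when the context exits.
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base_dir = Path(tmp) / "base"
    output_dir = Path(tmp) / "output"
    output_dir.mkdir(parents=True, exist_ok=True)
    created = output_dir.is_dir()  # True while the context is open
```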

In `@boost_library_usage_dashboard/utils.py`:
- Line 5: The normalization preserves leading/trailing whitespace; change the
initialization of version to strip whitespace first by using (version_str or
"").strip() before the .replace calls so the line becomes: version =
(version_str or "").strip().replace("-", ".").replace("_", "."); update any
usages expecting version to ensure they still work with the trimmed value (refer
to the variable name "version" and the input "version_str").

---

Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Line 132: Replace the plain zip(...) used to compute sum_xy with zip(...,
strict=True) to enforce that x_values and ratios have the same length; update
the generator expression at sum_xy = sum(x * y for x, y in zip(x_values,
ratios)) to use zip(x_values, ratios, strict=True) so sequence misalignment
raises an error (this targets the sum_xy calculation in analyzer_metrics.py).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34257ca and 2a30f4a.

📒 Files selected for processing (28)
  • boost_library_usage_dashboard/__init__.py
  • boost_library_usage_dashboard/analyzer.py
  • boost_library_usage_dashboard/analyzer_libraries.py
  • boost_library_usage_dashboard/analyzer_metrics.py
  • boost_library_usage_dashboard/analyzer_output.py
  • boost_library_usage_dashboard/apps.py
  • boost_library_usage_dashboard/dashboard_html.py
  • boost_library_usage_dashboard/dashboard_html_common.py
  • boost_library_usage_dashboard/dashboard_html_index.py
  • boost_library_usage_dashboard/dashboard_html_library.py
  • boost_library_usage_dashboard/management/__init__.py
  • boost_library_usage_dashboard/management/commands/__init__.py
  • boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
  • boost_library_usage_dashboard/models.py
  • boost_library_usage_dashboard/renderer.py
  • boost_library_usage_dashboard/report.py
  • boost_library_usage_dashboard/tests/__init__.py
  • boost_library_usage_dashboard/tests/fixtures.py
  • boost_library_usage_dashboard/tests/test_analyzer.py
  • boost_library_usage_dashboard/tests/test_command.py
  • boost_library_usage_dashboard/tests/test_models.py
  • boost_library_usage_dashboard/tests/test_renderer.py
  • boost_library_usage_dashboard/tests/test_report.py
  • boost_library_usage_dashboard/tests/test_utils.py
  • boost_library_usage_dashboard/utils.py
  • config/settings.py
  • conftest.py
  • workflow/management/commands/run_all_collectors.py


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
boost_library_usage_dashboard/analyzer.py (1)

50-52: ⚠️ Potential issue | 🟠 Major

Use semantic version ordering consistently across analyzer paths.

Line 50, Line 136, and Line 415 still rely on lexical ordering of version strings. That can select wrong “latest/created” versions and misorder version-sorted outputs.

🛠️ Patch sketch
-from django.db.models import Count, Min
+from django.db.models import Count
@@
-from .utils import format_percent
+from .utils import _version_tuple, format_percent
@@
-        self.version_info = list(
-            BoostVersion.objects.all().order_by("version")
-        )  # pylint: disable=no-member
+        self.version_info = sorted(
+            BoostVersion.objects.all(),  # pylint: disable=no-member
+            key=lambda v: _version_tuple(v.version),
+        )
@@
-        created_versions = {
-            row["library_id"]: row["version__version"]
-            for row in BoostLibraryVersion.objects.values(
-                "library_id"
-            ).annotate(  # pylint: disable=no-member
-                version__version=Min("version__version")
-            )
-        }
+        created_versions: dict[int, str] = {}
+        for row in BoostLibraryVersion.objects.values(  # pylint: disable=no-member
+            "library_id", "version__version"
+        ):
+            lib_id = row["library_id"]
+            version = row["version__version"] or ""
+            prev = created_versions.get(lib_id)
+            if prev is None or _version_tuple(version) < _version_tuple(prev):
+                created_versions[lib_id] = version
@@
-        for row in (
-            BoostLibraryVersion.objects.exclude(
-                description=""
-            )  # pylint: disable=no-member
-            .values("library_id", "description", "version__version")
-            .order_by("library_id", "-version__version")
-        ):
-            desc_map.setdefault(row["library_id"], row["description"])
+        latest_desc_version: dict[int, str] = {}
+        for row in (
+            BoostLibraryVersion.objects.exclude(description="")  # pylint: disable=no-member
+            .values("library_id", "description", "version__version")
+            .order_by("library_id")
+        ):
+            lib_id = row["library_id"]
+            version = row["version__version"] or ""
+            prev = latest_desc_version.get(lib_id)
+            if prev is None or _version_tuple(version) > _version_tuple(prev):
+                latest_desc_version[lib_id] = version
+                desc_map[lib_id] = row["description"]
@@
-        if sort_field:
-            ret_data.sort(
-                key=lambda x: x.get(sort_field) or 0,
-                reverse=sort_order.lower() == "desc",
-            )
+        if sort_field:
+            def _sort_key(item: dict[str, Any]) -> Any:
+                value = item.get(sort_field)
+                if "version" in sort_field:
+                    return _version_tuple(value or "")
+                return value or 0
+            ret_data.sort(
+                key=_sort_key,
+                reverse=sort_order.lower() == "desc",
+            )
#!/bin/bash
# Verify lexical-version sites still present in analyzer.py
rg -nP 'order_by\("version"\)|Min\("version__version"\)|-version__version|key=lambda x: x.get\(sort_field\) or 0' boost_library_usage_dashboard/analyzer.py

# Demonstrate why lexical ordering is unsafe for versions.
python - <<'PY'
versions = ["boost-1.9.0", "boost-1.10.0", "boost-1.84.0"]
print("lexical :", sorted(versions))

def vtuple(v):
    parts = v.replace("boost-", "").split(".")
    nums = [int("".join(ch for ch in p if ch.isdigit()) or 0) for p in parts[:3]]
    while len(nums) < 3:
        nums.append(0)
    return tuple(nums)

print("semantic:", sorted(versions, key=vtuple))
PY

Also applies to: 136-151, 395-417

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer.py` around lines 50 - 52, Several
places (the self.version_info assignment, any Min("version__version") usages,
and the custom key=lambda x: x.get(sort_field) sort) rely on lexical ordering of
version strings; replace these with semantic version ordering by using
packaging.version.parse (from packaging import version) to compare versions.
Concretely: import packaging.version and after fetching querysets (e.g.,
BoostVersion.objects.all()) convert to list and sort with list.sort(key=lambda
v: version.parse(v.version) or version.parse(v.version.version) as appropriate),
replace Min("version__version")/Max-style aggregations by computing min/max on
the list of parsed versions instead of DB lexical aggregates, and update any
key=lambda x: x.get(sort_field) to parse that field with packaging.version.parse
so all comparisons/select-latest logic (including in methods that populate
self.version_info and the code paths around the earlier Min/Max usages) use
semantic ordering consistently.
🧹 Nitpick comments (10)
boost_library_usage_dashboard/models.py (1)

1-57: Enforce read-only intent at the model layer.

These models are described as read-only mappings, but writes are still technically possible. Adding a small guard prevents accidental mutation of external-owned tables.

♻️ Proposed safeguard
 from django.db import models
 
 
-class BoostExternalRepository(models.Model):
+class _ReadOnlyExternalModel(models.Model):
+    class Meta:
+        abstract = True
+
+    def save(self, *args, **kwargs):
+        raise RuntimeError(f"{self.__class__.__name__} is read-only.")
+
+    def delete(self, *args, **kwargs):
+        raise RuntimeError(f"{self.__class__.__name__} is read-only.")
+
+
+class BoostExternalRepository(_ReadOnlyExternalModel):
@@
-class BoostUsage(models.Model):
+class BoostUsage(_ReadOnlyExternalModel):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/models.py` around lines 1 - 57, Add explicit
read-only guards to prevent accidental writes to these unmanaged mappings:
implement and override save() and delete() on both BoostExternalRepository and
BoostUsage (and optionally bulk methods like update/refresh_from_db if present)
to raise a clear exception (e.g., RuntimeError or PermissionDenied) indicating
the model is read-only; keep Meta.managed = False as-is and ensure the exception
message names the class (BoostExternalRepository or BoostUsage) so callers know
which model is protected.
boost_library_usage_dashboard/tests/test_command.py (1)

44-52: Assert renderer call arguments, not only call count.

Right now the test would still pass if render_dashboard_html is called with the wrong path. Verifying propagated output_dir will catch command wiring regressions.

🧪 Proposed test hardening
     write_report.assert_called_once_with(
         fake_analyzer.report_file,
         {"total_repositories": 0},
         stars_min_threshold=10,
     )
     render_html.assert_called_once()
+    args, kwargs = render_html.call_args
+    called_output_dir = kwargs.get("output_dir", args[1] if len(args) > 1 else None)
+    assert called_output_dir == Path(tmp_path)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/tests/test_command.py` around lines 44 - 52,
The test currently only checks that render_html was called once but not that it
received the correct output location; update the test to assert the exact call
arguments for render_html (e.g., that it was invoked with the expected
output_dir or HTML path produced by render_dashboard_html), so the command
wiring is validated. Locate the existing assertions around analyzer_cls,
fake_analyzer.run, write_report and render_html in the test and replace or
augment the render_html.assert_called_once() with a
render_html.assert_called_once_with(...) check that uses the test's
output_dir/expected HTML filename and any other expected params.
boost_library_usage_dashboard/analyzer_metrics.py (2)

136-136: Consider adding strict=True to zip() for defensive coding.

While x_values and ratios are derived from the same source and are guaranteed to have equal length, adding strict=True would catch any future refactoring that breaks this invariant.

Suggested change
-        sum_xy = sum(x * y for x, y in zip(x_values, ratios))
+        sum_xy = sum(x * y for x, y in zip(x_values, ratios, strict=True))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_metrics.py` at line 136, The zip call
used to compute sum_xy (sum_xy = sum(x * y for x, y in zip(x_values, ratios)))
should be made defensive by adding strict=True to the zip invocation to ensure
x_values and ratios are the same length at runtime; update the generator
expression to use zip(x_values, ratios, strict=True) in the sum_xy computation
(and run tests or ensure the project targets Python 3.10+ where strict is
supported).

22-35: Large iterator query without batching could cause memory pressure under extreme data volumes.

The query uses .iterator() which is good, but iterates over all matching BoostUsage rows in a single pass while accumulating results in memory (library_year_data, library_top_repo, library_headers). For very large datasets, consider whether chunking or streaming aggregation is needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_metrics.py` around lines 22 - 35, The
large BoostUsage query assigned to rows is iterated while accumulating
library_year_data, library_top_repo, and library_headers, which can cause memory
pressure for huge datasets; update the loop to process results in bounded
batches (e.g., use BoostUsage.objects.filter(...).iterator(chunk_size=...) or
implement manual chunking with .order_by(...).values_list('pk', flat=True) and
slicing) and ensure each batch updates library_year_data, library_top_repo, and
library_headers incrementally and releases references before loading the next
batch so memory usage stays bounded; reference the rows variable,
BoostUsage.select_related(...), analyzer.stars_min_threshold, and the
accumulator structures when making this change.
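The bounded-batch accumulation suggested above can be sketched generically (in Django this maps to `.iterator(chunk_size=...)`; here a plain list stands in for the `rows` queryset, and the accumulator name follows the comment):

```python
# Generic sketch of chunked iteration: consume an iterator in fixed-size
# batches and fold each batch into the accumulators before the next one
# is materialized, keeping peak memory bounded by the chunk size.
from itertools import islice


def iter_chunks(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


library_year_data: dict[tuple[str, int], int] = {}
rows = [("asio", 2023), ("asio", 2023), ("json", 2024)]
for chunk in iter_chunks(rows, size=2):
    for library, year in chunk:
        key = (library, year)
        library_year_data[key] = library_year_data.get(key, 0) + 1
```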
boost_library_usage_dashboard/utils.py (2)

18-26: Consider stripping before removing the "boost-" prefix.

If input has leading whitespace like " boost-1.82", the "boost-" prefix won't be removed because .strip() is applied after the replacement on line 21.

Suggested fix
 def normalize_version_str(version_str: str) -> str | None:
-
-    version = (version_str or "").replace("boost-", "")
-    version = version.strip().replace("-", ".").replace("_", ".")
+    version = (version_str or "").strip().replace("boost-", "")
+    version = version.replace("-", ".").replace("_", ".")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/utils.py` around lines 18 - 26,
normalize_version_str currently calls .replace("boost-", "") before .strip(), so
inputs with leading/trailing whitespace like " boost-1.82" won’t have the
"boost-" prefix removed; update normalize_version_str to strip whitespace first
(e.g., call .strip() on version_str or its fallback before removing the "boost-"
prefix), then perform the "boost-" removal and the subsequent replacements and
checks so that prefixed versions with surrounding spaces are normalized
correctly.

4-15: Minor: Redundant null check on line 8.

Line 6-7 already returns (0, 0, 0) when version is falsy, so the (version or "") on line 8 is unnecessary.

Suggested simplification
     if not version:
         return (0, 0, 0)
-    parts = (version or "").strip().split(".")
+    parts = version.strip().split(".")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/utils.py` around lines 4 - 15, In
_version_tuple, remove the redundant (version or "") guard: since the function
already returns on falsy version, directly call version.strip().split(".") to
parse parts; update the parts assignment in the _version_tuple function and
leave the rest of the logic (digit extraction, padding to three components, and
returning tuple) unchanged.
boost_library_usage_dashboard/analyzer_output.py (2)

80-96: Protected member access is acknowledged, but consider refactoring.

The _filter_and_sort_libraries method is called with protected access (noqa: SLF001). If this method is needed by multiple modules, consider making it public or extracting it to a shared utility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_output.py` around lines 80 - 96, The
code calls the protected method analyzer._filter_and_sort_libraries to build
dashboard_data["metrics_by_library"]; make this call use a public API or shared
utility instead: either rename and expose _filter_and_sort_libraries as
filter_and_sort_libraries on the Analyzer class and update callers (including
this call), or extract the filtering/sorting logic into a new utility function
(e.g., lib_metrics.filter_and_sort_libraries) and import/use that here; update
any tests/imports accordingly and remove the noqa/disable comments once the
protected access is eliminated.
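One way to satisfy this prompt is to extract the logic into a module-level helper that both the analyzer and the output builder import. A minimal sketch, assuming illustrative field names and defaults (the project's actual fields and thresholds may differ):

```python
from typing import Any


def filter_and_sort_libraries(
    libraries: list[dict[str, Any]],
    sort_field: str = "total_usage",
    min_usage: int = 0,
) -> list[dict[str, Any]]:
    """Public replacement for Analyzer._filter_and_sort_libraries.

    Keeps libraries meeting the usage threshold and sorts them
    descending by the given field. Field names are assumptions.
    """
    kept = [lib for lib in libraries if lib.get("total_usage", 0) >= min_usage]
    return sorted(kept, key=lambda lib: lib.get(sort_field, 0), reverse=True)
```

With the helper public, the `noqa: SLF001` suppressions can be deleted at both call sites.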

62-70: Validate row structure more defensively.

The code assumes row is a list/tuple with at least 4 elements, but if distribution_by_version contains differently structured data (e.g., dicts), the code would silently skip them. Consider logging skipped rows for debugging.

Suggested enhancement
     for row in version_counts:
         if not isinstance(row, (list, tuple)) or len(row) < 4:
+            # Log unexpected row format for debugging
             continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_output.py` around lines 62 - 70, The
loop over version_counts in analyzer_output.py is too permissive and silently
skips malformed rows; update the logic in the block that references
version_counts, _version_tuple, repos_by_version_rows and min_key to validate
and handle unexpected row types: if row is a dict, extract keys like "version",
"created_at", "confirmed", "guessed"; if it is a list/tuple check length>=4 as
today; for any skipped or malformed row emit a debug/warn log that includes the
raw row content and reason for skipping so issues can be traced; keep the
existing behavior of appending (version, created_at, confirmed+guessed) when
values are valid and comparable to min_key.
boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py (1)

94-96: Consider atomic publish: clone to temp directory first.

If clone_repo() fails after shutil.rmtree(publish_root), the publish directory is lost. Consider cloning to a temporary directory first, then replacing the target only on success.

Suggested approach
+        import tempfile
+        
         publish_root = (
             get_workspace_path("shared") / "boost_library_usage_dashboard_publish"
         )
-        if publish_root.exists():
-            shutil.rmtree(publish_root)
-        clone_repo(target_repo, publish_root)
+        # Clone to temp first to avoid losing existing state on failure
+        with tempfile.TemporaryDirectory() as tmp:
+            tmp_clone = Path(tmp) / "clone"
+            clone_repo(target_repo, tmp_clone)
+            if publish_root.exists():
+                shutil.rmtree(publish_root)
+            shutil.move(str(tmp_clone), str(publish_root))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`
around lines 94 - 96, The current flow removes publish_root then calls
clone_repo(target_repo, publish_root), which can leave the publish directory
deleted if cloning fails; change the logic to clone into a temporary directory
(e.g., create a temp path next to publish_root), call clone_repo(target_repo,
temp_dir), verify success, then atomically replace publish_root with temp_dir
(use shutil.move or os.replace for atomic rename on the same filesystem), and
ensure proper cleanup of temp_dir on failure; update references in this file to
use publish_root and clone_repo accordingly and wrap the sequence in
try/except/finally to avoid leaking temp directories.
boost_library_usage_dashboard/analyzer_libraries.py (1)

134-136: Consider indexing release dates for faster commit→version lookup.

This path does a full version_info scan per row. Pre-sorting release timestamps once and using binary search will reduce hot-path cost on large histories.

⚡ Optional optimization sketch
+from bisect import bisect_right
+
+def _build_release_index(version_info: list[Any]) -> list[tuple[datetime, str]]:
+    return sorted(
+        [
+            (v.version_created_at, v.version)
+            for v in version_info
+            if v.version_created_at is not None
+        ],
+        key=lambda x: x[0],
+    )
+
+def _first_release_after(
+    release_index: list[tuple[datetime, str]], commit_at: datetime
+) -> str | None:
+    pos = bisect_right(release_index, (commit_at, "\uffff"))
+    return release_index[pos][1] if pos < len(release_index) else None

Also applies to: 170-183

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_libraries.py` around lines 134 - 136,
The current per-row scan using analyzer.version_info in
get_first_version_released_after is causing repeated full scans; precompute and
cache a sorted list/array of release timestamps (and parallel version refs) for
each analyzer (e.g., build_release_timestamps or a cached attribute on Analyzer)
and replace the linear scan with a binary search (bisect) lookup using commit_dt
to find the first release after the commit; update calls (the lookup at
commit_dt and the similar block around the other occurrence) to use the new
precomputed index or change get_first_version_released_after to accept the
pre-sorted timestamps argument so the hot path uses O(log n) instead of O(n).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_library_usage_dashboard/dashboard_html_index.py`:
- Around line 204-219: The JS context is being populated with raw
json.dumps(...) which doesn't escape characters that can break out of <script>,
so replace all json.dumps(...) uses (the constants yearLabels, yearCounts,
yearCum, versionLabels, versionCounts, versionCum, boostLabels, boostOver10,
nonBoostOver10, boostRate, languageData, allLanguages, tableLibraryOverview,
tableTopStars, tableTopUsage, tableTopCreated) with json_for_script(...) and add
the corresponding import for json_for_script from the same helper module used by
dashboard_html_library.py so the values are properly escaped for inlined
scripts.
- Around line 282-286: The templates rowHtml and topRow are inserting
user-controlled fields (e.g., r.name, r.created_version, r.safe_name,
r.repo_name, r.stars, r.usage_count, r.created_at, r.activity_score,
r.average_stars) directly into HTML; update both functions to apply the same
esc() escaping used elsewhere: wrap every interpolated value with esc(...)
(including formatted numeric/string results like toNumber(...).toLocaleString(),
Number(...).toFixed(3), and (r.created_at || '').toString().slice(...)) and
escape values used in URLs (e.g., esc(r.safe_name) for libraries/${...}.html and
esc(r.repo_name) for the GitHub href) so no external data is inserted unescaped.

---

Duplicate comments:
In `@boost_library_usage_dashboard/analyzer.py`:
- Around line 50-52: Several places (the self.version_info assignment, any
Min("version__version") usages, and the custom key=lambda x: x.get(sort_field)
sort) rely on lexical ordering of version strings; replace these with semantic
version ordering by using packaging.version.parse (from packaging import
version) to compare versions. Concretely: import packaging.version and after
fetching querysets (e.g., BoostVersion.objects.all()) convert to list and sort
with list.sort(key=lambda v: version.parse(v.version) or
version.parse(v.version.version) as appropriate), replace
Min("version__version")/Max-style aggregations by computing min/max on the list
of parsed versions instead of DB lexical aggregates, and update any key=lambda
x: x.get(sort_field) to parse that field with packaging.version.parse so all
comparisons/select-latest logic (including in methods that populate
self.version_info and the code paths around the earlier Min/Max usages) use
semantic ordering consistently.

---

Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 134-136: The current per-row scan using analyzer.version_info in
get_first_version_released_after is causing repeated full scans; precompute and
cache a sorted list/array of release timestamps (and parallel version refs) for
each analyzer (e.g., build_release_timestamps or a cached attribute on Analyzer)
and replace the linear scan with a binary search (bisect) lookup using commit_dt
to find the first release after the commit; update calls (the lookup at
commit_dt and the similar block around the other occurrence) to use the new
precomputed index or change get_first_version_released_after to accept the
pre-sorted timestamps argument so the hot path uses O(log n) instead of O(n).

In `@boost_library_usage_dashboard/analyzer_metrics.py`:
- Line 136: The zip call used to compute sum_xy (sum_xy = sum(x * y for x, y in
zip(x_values, ratios))) should be made defensive by adding strict=True to the
zip invocation to ensure x_values and ratios are the same length at runtime;
update the generator expression to use zip(x_values, ratios, strict=True) in the
sum_xy computation (and run tests or ensure the project targets Python 3.10+
where strict is supported).
- Around line 22-35: The large BoostUsage query assigned to rows is iterated
while accumulating library_year_data, library_top_repo, and library_headers,
which can cause memory pressure for huge datasets; update the loop to process
results in bounded batches (e.g., use
BoostUsage.objects.filter(...).iterator(chunk_size=...) or implement manual
chunking with .order_by(...).values_list('pk', flat=True) and slicing) and
ensure each batch updates library_year_data, library_top_repo, and
library_headers incrementally and releases references before loading the next
batch so memory usage stays bounded; reference the rows variable,
BoostUsage.select_related(...), analyzer.stars_min_threshold, and the
accumulator structures when making this change.

In `@boost_library_usage_dashboard/analyzer_output.py`:
- Around line 80-96: The code calls the protected method
analyzer._filter_and_sort_libraries to build
dashboard_data["metrics_by_library"]; make this call use a public API or shared
utility instead: either rename and expose _filter_and_sort_libraries as
filter_and_sort_libraries on the Analyzer class and update callers (including
this call), or extract the filtering/sorting logic into a new utility function
(e.g., lib_metrics.filter_and_sort_libraries) and import/use that here; update
any tests/imports accordingly and remove the noqa/disable comments once the
protected access is eliminated.
- Around line 62-70: The loop over version_counts in analyzer_output.py is too
permissive and silently skips malformed rows; update the logic in the block that
references version_counts, _version_tuple, repos_by_version_rows and min_key to
validate and handle unexpected row types: if row is a dict, extract keys like
"version", "created_at", "confirmed", "guessed"; if it is a list/tuple check
length>=4 as today; for any skipped or malformed row emit a debug/warn log that
includes the raw row content and reason for skipping so issues can be traced;
keep the existing behavior of appending (version, created_at, confirmed+guessed)
when values are valid and comparable to min_key.

In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 94-96: The current flow removes publish_root then calls
clone_repo(target_repo, publish_root), which can leave the publish directory
deleted if cloning fails; change the logic to clone into a temporary directory
(e.g., create a temp path next to publish_root), call clone_repo(target_repo,
temp_dir), verify success, then atomically replace publish_root with temp_dir
(use shutil.move or os.replace for atomic rename on the same filesystem), and
ensure proper cleanup of temp_dir on failure; update references in this file to
use publish_root and clone_repo accordingly and wrap the sequence in
try/except/finally to avoid leaking temp directories.

In `@boost_library_usage_dashboard/models.py`:
- Around line 1-57: Add explicit read-only guards to prevent accidental writes
to these unmanaged mappings: implement and override save() and delete() on both
BoostExternalRepository and BoostUsage (and optionally bulk methods like
update/refresh_from_db if present) to raise a clear exception (e.g.,
RuntimeError or PermissionDenied) indicating the model is read-only; keep
Meta.managed = False as-is and ensure the exception message names the class
(BoostExternalRepository or BoostUsage) so callers know which model is
protected.

In `@boost_library_usage_dashboard/tests/test_command.py`:
- Around line 44-52: The test currently only checks that render_html was called
once but not that it received the correct output location; update the test to
assert the exact call arguments for render_html (e.g., that it was invoked with
the expected output_dir or HTML path produced by render_dashboard_html), so the
command wiring is validated. Locate the existing assertions around analyzer_cls,
fake_analyzer.run, write_report and render_html in the test and replace or
augment the render_html.assert_called_once() with a
render_html.assert_called_once_with(...) check that uses the test's
output_dir/expected HTML filename and any other expected params.

In `@boost_library_usage_dashboard/utils.py`:
- Around line 18-26: normalize_version_str currently calls .replace("boost-",
"") before .strip(), so inputs with leading/trailing whitespace like "
boost-1.82" won’t have the "boost-" prefix removed; update normalize_version_str
to strip whitespace first (e.g., call .strip() on version_str or its fallback
before removing the "boost-" prefix), then perform the "boost-" removal and the
subsequent replacements and checks so that prefixed versions with surrounding
spaces are normalized correctly.
- Around line 4-15: In _version_tuple, remove the redundant (version or "")
guard: since the function already returns on falsy version, directly call
version.strip().split(".") to parse parts; update the parts assignment in the
_version_tuple function and leave the rest of the logic (digit extraction,
padding to three components, and returning tuple) unchanged.
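The semantic version ordering called out for analyzer.py above can be sketched with packaging.version; the version strings are hypothetical, and boost-style tags would need the existing normalization first:

```python
from packaging import version

raw = ["1.9.0", "1.82.0", "1.10.0"]

# Lexical (string) ordering misplaces multi-digit components.
lexical = sorted(raw)

# Semantic ordering compares each numeric component.
semantic = sorted(raw, key=version.parse)
```

The same `key=version.parse` pattern replaces lexical `Min`/`Max` aggregates: compute the min/max over the parsed list in Python instead of in the database.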

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34257ca and 8a2d7a6.

📒 Files selected for processing (28)
  • boost_library_usage_dashboard/__init__.py
  • boost_library_usage_dashboard/analyzer.py
  • boost_library_usage_dashboard/analyzer_libraries.py
  • boost_library_usage_dashboard/analyzer_metrics.py
  • boost_library_usage_dashboard/analyzer_output.py
  • boost_library_usage_dashboard/apps.py
  • boost_library_usage_dashboard/dashboard_html.py
  • boost_library_usage_dashboard/dashboard_html_common.py
  • boost_library_usage_dashboard/dashboard_html_index.py
  • boost_library_usage_dashboard/dashboard_html_library.py
  • boost_library_usage_dashboard/management/__init__.py
  • boost_library_usage_dashboard/management/commands/__init__.py
  • boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
  • boost_library_usage_dashboard/models.py
  • boost_library_usage_dashboard/renderer.py
  • boost_library_usage_dashboard/report.py
  • boost_library_usage_dashboard/tests/__init__.py
  • boost_library_usage_dashboard/tests/fixtures.py
  • boost_library_usage_dashboard/tests/test_analyzer.py
  • boost_library_usage_dashboard/tests/test_command.py
  • boost_library_usage_dashboard/tests/test_models.py
  • boost_library_usage_dashboard/tests/test_renderer.py
  • boost_library_usage_dashboard/tests/test_report.py
  • boost_library_usage_dashboard/tests/test_utils.py
  • boost_library_usage_dashboard/utils.py
  • config/settings.py
  • conftest.py
  • workflow/management/commands/run_all_collectors.py

Comment thread boost_library_usage_dashboard/dashboard_html_index.py Outdated
Comment thread boost_library_usage_dashboard/dashboard_html_index.py Outdated
@henry0816191
Collaborator Author

@coderabbitai continue to review

@coderabbitai

coderabbitai Bot commented Feb 26, 2026

@henry0816191, I'll continue reviewing the changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (7)
boost_library_usage_dashboard/dashboard_html.py (1)

27-28: Optional: Remove redundant fallback.

The or {} is redundant since data.get("libraries_page_data", {}) already returns {} when the key is missing.

♻️ Suggested simplification
-    for library_name in data.get("libraries_page_data", {}) or {}:
+    for library_name in data.get("libraries_page_data", {}):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/dashboard_html.py` around lines 27 - 28, The
for-loop uses a redundant fallback expression; replace the iteration target
`data.get("libraries_page_data", {}) or {}` with just
`data.get("libraries_page_data", {})` in the loop that calls
`build_library_page(data, library_name, lib_dir)` so the default empty dict from
get already handles the missing key and removes the unnecessary `or {}`.
boost_library_usage_dashboard/tests/test_analyzer.py (1)

10-19: Consider using pytest's tmp_path fixture instead.

While tempfile.gettempdir() resolves the security warning, using pytest's tmp_path fixture would provide automatic cleanup and isolation between test runs.

♻️ Alternative approach using tmp_path
-def _make_analyzer() -> BoostUsageDashboardAnalyzer:
+def _make_analyzer(tmp_path: Path | None = None) -> BoostUsageDashboardAnalyzer:
     analyzer = BoostUsageDashboardAnalyzer.__new__(BoostUsageDashboardAnalyzer)
-    analyzer.base_dir = Path(tempfile.gettempdir()) / "boost-dashboard-test-base"
-    analyzer.output_dir = Path(tempfile.gettempdir()) / "boost-dashboard-test-output"
+    base = tmp_path if tmp_path else Path(tempfile.gettempdir())
+    analyzer.base_dir = base / "boost-dashboard-test-base"
+    analyzer.output_dir = base / "boost-dashboard-test-output"
     analyzer.version_name_list = ["1.50.0", "1.51.0", "1.52.0", "1.53.0", "1.54.0"]

Then update tests that need actual directories to pass tmp_path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/tests/test_analyzer.py` around lines 10 - 19,
Replace the hardcoded tempfile.gettempdir() usage in the _make_analyzer helper
with pytest's tmp_path fixture: change _make_analyzer to accept a tmp_path
parameter (or create a wrapper that calls it) and set analyzer.base_dir and
analyzer.output_dir to subpaths of tmp_path (e.g., tmp_path /
"boost-dashboard-test-base" and tmp_path / "boost-dashboard-test-output");
update any tests that call _make_analyzer to pass the tmp_path fixture so each
test gets isolated, auto-cleaned directories for BoostUsageDashboardAnalyzer
instances, and avoid global tempdir usage.
boost_library_usage_dashboard/tests/test_command.py (1)

52-56: Minor: Simplify path handling.

The Path(str(tmp_path)).resolve() conversion is redundant since tmp_path is already a pathlib.Path. Similarly at line 112, Path(tmp_path) is unnecessary.

♻️ Suggested simplification
-    expected_output_dir = Path(str(tmp_path)).resolve()
+    expected_output_dir = tmp_path.resolve()

And at line 112:

-        output_dir=Path(tmp_path),
+        output_dir=tmp_path,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/tests/test_command.py` around lines 52 - 56,
The test unnecessarily wraps tmp_path in Path(...) twice; replace
Path(str(tmp_path)).resolve() with tmp_path.resolve() (or tmp_path if resolve
not needed) when setting expected_output_dir, and replace Path(tmp_path) at the
other occurrence (around line 112) with tmp_path directly; update assertions
that call render_html(...) to pass the resulting tmp_path/tmp_path.resolve() and
keep settings.BASE_DIR as-is so
render_html.assert_called_once_with(base_dir=settings.BASE_DIR,
output_dir=expected_output_dir) still matches.
boost_library_usage_dashboard/analyzer.py (1)

133-135: Prefer lazy % formatting for logger calls.

Using f-strings in logger calls evaluates the string even when the log level is disabled. Use % formatting for lazy evaluation.

Proposed fix
         logger.debug(
-            f"Loaded {len(libs)} libraries. Latest version: {latest_version_id}"
+            "Loaded %d libraries. Latest version: %s", len(libs), latest_version_id
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer.py` around lines 133 - 135, Replace
the f-string in the logger.debug call with lazy %-style formatting so the
message isn't eagerly evaluated; locate the logger.debug call that references
libs and latest_version_id (the line calling logger.debug(f"Loaded {len(libs)}
libraries. Latest version: {latest_version_id}")) and change it to use a format
string with % placeholders and pass len(libs) and latest_version_id as arguments
to logger.debug.
boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py (1)

127-132: Consider adding error handling for git push failures.

If the push fails (e.g., due to network issues or permission problems), the error message from subprocess.run(..., check=True) may not be user-friendly. The push() helper from git_ops should handle its own errors, but the preceding commit step doesn't capture output for diagnostics.

Proposed improvement for better error messages
         commit_message = "Update Boost library usage dashboard artifacts"
-        subprocess.run(
-            ["git", "-C", str(publish_root), "commit", "-m", commit_message],
-            check=True,
-        )
+        try:
+            subprocess.run(
+                ["git", "-C", str(publish_root), "commit", "-m", commit_message],
+                check=True,
+                capture_output=True,
+                text=True,
+            )
+        except subprocess.CalledProcessError as e:
+            raise CommandError(f"Git commit failed: {e.stderr}") from e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`
around lines 127 - 132, The commit call using subprocess.run(["git", "-C",
str(publish_root), "commit", "-m", commit_message], check=True) lacks error
handling and doesn't capture stdout/stderr; wrap this in a try/except for
subprocess.CalledProcessError (or run with capture_output=True, text=True and
check=False) to collect stdout/stderr, then log or include those details in the
error message and surface a clear message before calling push(publish_root,
branch=branch) (use processLogger or the existing logging mechanism). Ensure the
handler references commit_message, publish_root, and branch and either
exits/raises after logging the detailed error so push() isn’t called on a failed
commit.
boost_library_usage_dashboard/utils.py (1)

4-15: Type annotation mismatch on return value.

The function is annotated to return tuple[int, int, int], but tuple(out[:3]) returns a generic tuple[int, ...]. This causes a type checker warning.

Proposed fix
-    return tuple(out[:3])
+    return (out[0], out[1], out[2])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/utils.py` around lines 4 - 15, The type checker
warns because tuple(out[:3]) yields a generic tuple type; update _version_tuple
to return an explicit 3-tuple to match the annotation: replace the final return
tuple(out[:3]) with an explicit construction like (out[0], out[1], out[2]) (or
use typing.cast(tuple[int,int,int], tuple(out[:3])) if you prefer a cast); this
ensures the return value from _version_tuple is statically typed as
tuple[int,int,int].
boost_library_usage_dashboard/models.py (1)

51-51: The field name excepted_at is intentional, not a typo. It correctly refers to marking records as "excepted" (excluded/marked as exception). This is confirmed by dedicated test coverage (test_boost_usage_excepted_at_accepts_valid_date) and consistent usage across the analyzer modules filtering for excepted_at__isnull=True.

However, "excepted" is an uncommon word choice that may confuse developers unfamiliar with the codebase. Consider adding a docstring to the model field or renaming to excluded_at or exception_marked_at for improved clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/models.py` at line 51, The field name
excepted_at is intentionally used to mark records as "excepted" but is unclear
to readers; add clarifying documentation rather than changing behavior: update
the models.py definition of the excepted_at DateField to include a
help_text/docstring (e.g., help_text="Timestamp when a record was marked as
excepted (excluded)"), and if you choose to rename the field to excluded_at or
exception_marked_at instead, update all references (tests like
test_boost_usage_excepted_at_accepts_valid_date and analyzer filters using
excepted_at__isnull=True) and run migrations and tests to keep behavior
consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 141-149: The current identity_name computation assumes
change.commit.account is non-null and will raise AttributeError if account is
None; update the logic in analyzer_libraries.py around the identity_name
assignment to first check whether change.commit.account is truthy (e.g.,
change.commit.account is not None) before accessing identity_id or any account
attributes, and if account is None return a sensible fallback such as "unknown";
ensure you reference change.commit.account.identity.display_name,
change.commit.account.display_name, change.commit.account.username and
change.commit.account.identity_id in the guarded branches so attribute access
only occurs when account exists.
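The guarded attribute access described above can be sketched as a helper; the attribute chain mirrors the review's description and should be treated as an assumption about the actual models:

```python
from types import SimpleNamespace


def resolve_identity_name(account) -> str:
    """Null-safe name resolution for a commit's account."""
    if account is None:
        return "unknown"
    identity = getattr(account, "identity", None)
    if identity is not None and getattr(identity, "display_name", None):
        return identity.display_name
    return (
        getattr(account, "display_name", None)
        or getattr(account, "username", None)
        or "unknown"
    )


# Demo object with a hypothetical shape for illustration.
acct = SimpleNamespace(
    identity=SimpleNamespace(display_name="Jane"),
    display_name="jdoe",
    username="jane",
)
```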

In `@boost_library_usage_dashboard/report.py`:
- Around line 40-45: The loop directly indexes lib['name'], lib['repo_count'],
and lib['total_usage'], which can raise KeyError for malformed entries; change
those direct accesses to use .get() with safe defaults (e.g., name =
lib.get("name","N/A"), repo_count = lib.get("repo_count",0), total_usage =
lib.get("total_usage",0)) and then use these variables in the
report_lines.append call (keeping earliest/latest handling as-is) so the code is
defensive against missing keys.
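The defensive .get() access suggested above might look like this in the report's row formatter; the key names and Markdown-table layout are assumptions:

```python
def format_library_line(lib: dict) -> str:
    """Build one report row without raising KeyError on malformed entries."""
    name = lib.get("name", "N/A")
    repo_count = lib.get("repo_count", 0)
    total_usage = lib.get("total_usage", 0)
    return f"| {name} | {repo_count} | {total_usage} |"
```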

---

Nitpick comments:
In `@boost_library_usage_dashboard/analyzer.py`:
- Around line 133-135: Replace the f-string in the logger.debug call with lazy
%-style formatting so the message isn't eagerly evaluated; locate the
logger.debug call that references libs and latest_version_id (the line calling
logger.debug(f"Loaded {len(libs)} libraries. Latest version:
{latest_version_id}")) and change it to use a format string with % placeholders
and pass len(libs) and latest_version_id as arguments to logger.debug.

In `@boost_library_usage_dashboard/dashboard_html.py`:
- Around line 27-28: The for-loop uses a redundant fallback expression; replace
the iteration target `data.get("libraries_page_data", {}) or {}` with just
`data.get("libraries_page_data", {})` in the loop that calls
`build_library_page(data, library_name, lib_dir)` so the default empty dict from
get already handles the missing key and removes the unnecessary `or {}`.

In
`@boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py`:
- Around line 127-132: The commit call using subprocess.run(["git", "-C",
str(publish_root), "commit", "-m", commit_message], check=True) lacks error
handling and doesn't capture stdout/stderr; wrap this in a try/except for
subprocess.CalledProcessError (or run with capture_output=True, text=True and
check=False) to collect stdout/stderr, then log or include those details in the
error message and surface a clear message before calling push(publish_root,
branch=branch) (use processLogger or the existing logging mechanism). Ensure the
handler references commit_message, publish_root, and branch and either
exits/raises after logging the detailed error so push() isn’t called on a failed
commit.

In `@boost_library_usage_dashboard/models.py`:
- Line 51: The field name excepted_at is intentionally used to mark records as
"excepted" but is unclear to readers; add clarifying documentation rather than
changing behavior: update the models.py definition of the excepted_at DateField
to include a help_text/docstring (e.g., help_text="Timestamp when a record was
marked as excepted (excluded)"), and if you choose to rename the field to
excluded_at or exception_marked_at instead, update all references (tests like
test_boost_usage_excepted_at_accepts_valid_date and analyzer filters using
excepted_at__isnull=True) and run migrations and tests to keep behavior
consistent.

In `@boost_library_usage_dashboard/tests/test_analyzer.py`:
- Around line 10-19: Replace the hardcoded tempfile.gettempdir() usage in the
_make_analyzer helper with pytest's tmp_path fixture: change _make_analyzer to
accept a tmp_path parameter (or create a wrapper that calls it) and set
analyzer.base_dir and analyzer.output_dir to subpaths of tmp_path (e.g.,
tmp_path / "boost-dashboard-test-base" and tmp_path /
"boost-dashboard-test-output"); update any tests that call _make_analyzer to
pass the tmp_path fixture so each test gets isolated, auto-cleaned directories
for BoostUsageDashboardAnalyzer instances, and avoid global tempdir usage.

In `@boost_library_usage_dashboard/tests/test_command.py`:
- Around line 52-56: The test unnecessarily wraps tmp_path in Path(...) twice;
replace Path(str(tmp_path)).resolve() with tmp_path.resolve() (or tmp_path if
resolve not needed) when setting expected_output_dir, and replace Path(tmp_path)
at the other occurrence (around line 112) with tmp_path directly; update
assertions that call render_html(...) to pass the resulting
tmp_path/tmp_path.resolve() and keep settings.BASE_DIR as-is so
render_html.assert_called_once_with(base_dir=settings.BASE_DIR,
output_dir=expected_output_dir) still matches.

In `@boost_library_usage_dashboard/utils.py`:
- Around line 4-15: The type checker warns because tuple(out[:3]) yields a
generic tuple type; update _version_tuple to return an explicit 3-tuple to match
the annotation: replace the final return tuple(out[:3]) with an explicit
construction like (out[0], out[1], out[2]) (or use
typing.cast(tuple[int,int,int], tuple(out[:3])) if you prefer a cast); this
ensures the return value from _version_tuple is statically typed as
tuple[int,int,int].

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34257ca and 744b9a1.

📒 Files selected for processing (28)
  • boost_library_usage_dashboard/__init__.py
  • boost_library_usage_dashboard/analyzer.py
  • boost_library_usage_dashboard/analyzer_libraries.py
  • boost_library_usage_dashboard/analyzer_metrics.py
  • boost_library_usage_dashboard/analyzer_output.py
  • boost_library_usage_dashboard/apps.py
  • boost_library_usage_dashboard/dashboard_html.py
  • boost_library_usage_dashboard/dashboard_html_common.py
  • boost_library_usage_dashboard/dashboard_html_index.py
  • boost_library_usage_dashboard/dashboard_html_library.py
  • boost_library_usage_dashboard/management/__init__.py
  • boost_library_usage_dashboard/management/commands/__init__.py
  • boost_library_usage_dashboard/management/commands/run_boost_library_usage_dashboard.py
  • boost_library_usage_dashboard/models.py
  • boost_library_usage_dashboard/renderer.py
  • boost_library_usage_dashboard/report.py
  • boost_library_usage_dashboard/tests/__init__.py
  • boost_library_usage_dashboard/tests/fixtures.py
  • boost_library_usage_dashboard/tests/test_analyzer.py
  • boost_library_usage_dashboard/tests/test_command.py
  • boost_library_usage_dashboard/tests/test_models.py
  • boost_library_usage_dashboard/tests/test_renderer.py
  • boost_library_usage_dashboard/tests/test_report.py
  • boost_library_usage_dashboard/tests/test_utils.py
  • boost_library_usage_dashboard/utils.py
  • config/settings.py
  • conftest.py
  • workflow/management/commands/run_all_collectors.py

Comment thread boost_library_usage_dashboard/analyzer_libraries.py
Comment thread boost_library_usage_dashboard/report.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
boost_library_usage_dashboard/report.py (2)

78-83: Tuple unpacking assumes exactly 4 elements.

The loop unpacks each item as (version, created_at, confirmed_count, no_confirmed_count). If distribution_by_version contains malformed entries with a different number of elements, this will raise ValueError.

This is likely acceptable if the data source is trusted, but consider adding a try/except or validation if robustness against malformed data is desired.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/report.py` around lines 78 - 83, The loop over
version_stats["distribution_by_version"] currently unpacks each item as
(version, created_at, confirmed_count, no_confirmed_count) which fails on
malformed tuples; update the loop in the same block (iterating
version_stats["distribution_by_version"] and appending to report_lines) to
validate each item before unpacking (e.g., check len(item) == 4) or wrap the
unpack in a try/except that logs the bad entry and continues so malformed
entries don’t raise ValueError and stop the report generation.
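One way to sketch the suggested guard; the helper name and row format are illustrative, not taken from report.py.

```python
import logging

logger = logging.getLogger(__name__)

def append_version_rows(report_lines, distribution_by_version):
    # Skip and log malformed entries instead of letting a bare unpack
    # raise ValueError and abort report generation.
    for item in distribution_by_version:
        if len(item) != 4:
            logger.warning("Skipping malformed version entry: %r", item)
            continue
        version, created_at, confirmed_count, no_confirmed_count = item
        report_lines.append(
            f"| {version} | {created_at} | {confirmed_count} | {no_confirmed_count} |"
        )
    return report_lines
```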

29-32: Consider defensive access for required stats keys.

Lines 29-32 directly access stats['total_repositories'], stats['affected_repositories'], etc., which will raise KeyError if missing. While these are core stats and failing fast may be intentional, this is inconsistent with the defensive .get() pattern used in lines 40-48.

If failing fast is intended, consider adding explicit validation at the function entry to provide a clearer error message.

Option A: Defensive access with defaults
-        f"- **Total Repositories (≥{stars_min_threshold} stars)**: {stats['total_repositories']:,}",
-        f"- **Repositories Using System Boost**: {stats['affected_repositories']:,}",
-        f"- **Total Boost Libraries**: {stats['total_libraries']:,}",
-        f"- **Total Usage Records**: {stats['total_usage_records']:,}",
+        f"- **Total Repositories (≥{stars_min_threshold} stars)**: {stats.get('total_repositories', 0):,}",
+        f"- **Repositories Using System Boost**: {stats.get('affected_repositories', 0):,}",
+        f"- **Total Boost Libraries**: {stats.get('total_libraries', 0):,}",
+        f"- **Total Usage Records**: {stats.get('total_usage_records', 0):,}",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/report.py` around lines 29 - 32, The code
directly indexes stats['total_repositories'], stats['affected_repositories'],
stats['total_libraries'], and stats['total_usage_records'] which can raise
KeyError and is inconsistent with later .get() usage; fix by either replacing
these direct accesses with defensive lookups (e.g., use
stats.get('total_repositories', 0) etc.) or add an explicit validation at the
start of the function that contains this block to assert/raise a clear
ValueError if any of those required keys are missing (mentioning the missing key
names). Ensure the change is applied where the four lines are constructed so the
rest of the function can rely on consistent, well-defined values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 62-65: In the never-used libraries loop, change the direct access
lib['name'] to a defensive access like lib.get('name') (with a sensible default
such as 'N/A') so it matches the defensive pattern used in the top libraries
loop (see the block around lines 43-45); update the f-string in the
report_lines.append inside that loop to use lib.get('name') (or 'N/A') instead
of lib['name'] to avoid KeyError on missing keys.
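The defensive pattern in isolation, with hypothetical entries:

```python
# Hypothetical library entries; the second is missing the 'name' key.
libs = [{"name": "asio"}, {}]

# lib.get('name', 'N/A') degrades gracefully where lib['name'] would
# raise KeyError.
report_lines = [f"- {lib.get('name', 'N/A')}" for lib in libs]
```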

---

Nitpick comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 78-83: The loop over version_stats["distribution_by_version"]
currently unpacks each item as (version, created_at, confirmed_count,
no_confirmed_count) which fails on malformed tuples; update the loop in the same
block (iterating version_stats["distribution_by_version"] and appending to
report_lines) to validate each item before unpacking (e.g., check len(item) ==
4) or wrap the unpack in a try/except that logs the bad entry and continues so
malformed entries don’t raise ValueError and stop the report generation.
- Around line 29-32: The code directly indexes stats['total_repositories'],
stats['affected_repositories'], stats['total_libraries'], and
stats['total_usage_records'] which can raise KeyError and is inconsistent with
later .get() usage; fix by either replacing these direct accesses with defensive
lookups (e.g., use stats.get('total_repositories', 0) etc.) or add an explicit
validation at the start of the function that contains this block to assert/raise
a clear ValueError if any of those required keys are missing (mentioning the
missing key names). Ensure the change is applied where the four lines are
constructed so the rest of the function can rely on consistent, well-defined
values.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 744b9a1 and 9244306.

📒 Files selected for processing (2)
  • boost_library_usage_dashboard/analyzer_libraries.py
  • boost_library_usage_dashboard/report.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • boost_library_usage_dashboard/analyzer_libraries.py

Comment thread boost_library_usage_dashboard/report.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
boost_library_usage_dashboard/analyzer_libraries.py (1)

134-136: Optimize version lookup in the hot loop.

get_first_version_released_after() linearly scans version_info for every file change. On large datasets this becomes a dominant cost. Precompute sorted release timestamps once and use binary search.

♻️ Suggested refactor
+from bisect import bisect_right
 ...
 def collect_commit_info_by_library(analyzer: Any) -> dict[str, Any]:
@@
+    release_points = sorted(
+        (v.version_created_at, v.version)
+        for v in analyzer.version_info
+        if v.version_created_at is not None
+    )
+    release_times = [ts for ts, _ in release_points]
@@
-        version = get_first_version_released_after(analyzer.version_info, commit_dt)
+        version = get_first_version_released_after(commit_dt, release_points, release_times)
@@
-def get_first_version_released_after(
-    version_info: list[Any], commit_at: datetime | None
-) -> str | None:
+def get_first_version_released_after(
+    commit_at: datetime | None,
+    release_points: list[tuple[datetime, str]],
+    release_times: list[datetime],
+) -> str | None:
@@
-    candidates = [
-        v
-        for v in version_info
-        if v.version_created_at is not None and v.version_created_at > commit_at
-    ]
-    if not candidates:
-        return None
-    return min(candidates, key=lambda v: v.version_created_at).version
+    idx = bisect_right(release_times, commit_at)
+    return release_points[idx][1] if idx < len(release_points) else None

Also applies to: 171-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@boost_library_usage_dashboard/analyzer_libraries.py` around lines 134 - 136,
The current hot loop calls
get_first_version_released_after(analyzer.version_info, commit_dt) for every
change which scans version_info linearly; precompute a sorted list/array of
release timestamps (and corresponding versions) once (e.g., in Analyzer
initialization or before iterating changes) and replace the per-change linear
scan with a binary search (bisect) lookup using commit_dt to find the first
release after the commit; update the loop sites around commit_dt (references in
analyzer.version_info, get_first_version_released_after usage at the shown block
and the similar block at 171-184) to use the precomputed timestamps and a fast
index-to-version mapping instead of calling the linear helper repeatedly.
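A self-contained version of the bisect-based lookup sketched in the diff above. SimpleNamespace entries stand in for the real version_info objects, and the sample versions/dates are illustrative.

```python
from bisect import bisect_right
from datetime import datetime
from types import SimpleNamespace

def build_release_index(version_info):
    # Precompute once: (timestamp, version) pairs sorted by release time,
    # plus a parallel timestamp list for bisect.
    points = sorted(
        (v.version_created_at, v.version)
        for v in version_info
        if v.version_created_at is not None
    )
    return points, [ts for ts, _ in points]

def first_version_released_after(commit_at, points, times):
    # O(log n) replacement for the linear scan over version_info.
    if commit_at is None:
        return None
    idx = bisect_right(times, commit_at)  # first release strictly after commit_at
    return points[idx][1] if idx < len(points) else None

versions = [
    SimpleNamespace(version="1.84.0", version_created_at=datetime(2023, 12, 13)),
    SimpleNamespace(version="1.83.0", version_created_at=datetime(2023, 8, 11)),
]
points, times = build_release_index(versions)
```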
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@boost_library_usage_dashboard/report.py`:
- Around line 29-32: The four f-strings directly index the stats dict and can
raise KeyError; update them to use the same defensive .get() pattern used
elsewhere by replacing stats['total_repositories'],
stats['affected_repositories'], stats['total_libraries'], and
stats['total_usage_records'] with stats.get('total_repositories', 0),
stats.get('affected_repositories', 0), stats.get('total_libraries', 0), and
stats.get('total_usage_records', 0) respectively so missing keys default to 0
while preserving the thousand-separator formatting in the f-strings.

---

Nitpick comments:
In `@boost_library_usage_dashboard/analyzer_libraries.py`:
- Around line 134-136: The current hot loop calls
get_first_version_released_after(analyzer.version_info, commit_dt) for every
change which scans version_info linearly; precompute a sorted list/array of
release timestamps (and corresponding versions) once (e.g., in Analyzer
initialization or before iterating changes) and replace the per-change linear
scan with a binary search (bisect) lookup using commit_dt to find the first
release after the commit; update the loop sites around commit_dt (references in
analyzer.version_info, get_first_version_released_after usage at the shown block
and the similar block at 171-184) to use the precomputed timestamps and a fast
index-to-version mapping instead of calling the linear helper repeatedly.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 744b9a1 and df5da69.

📒 Files selected for processing (5)
  • boost_library_usage_dashboard/analyzer_libraries.py
  • boost_library_usage_dashboard/report.py
  • config/settings.py
  • conftest.py
  • workflow/management/commands/run_all_collectors.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • conftest.py
  • config/settings.py

Comment thread boost_library_usage_dashboard/report.py