feat: configurable overview script#58
Conversation
AlexanderLanin
left a comment
There was a problem hiding this comment.
AI-assisted review (Claude):
Critical: 0 | Important: 1 | Suggestions: 2
Good PR overall — the generalization from hardcoded docs-as-code / daily-workflow constants to config-driven deps and signals is clean, well-tested, and the TOML schema is straightforward. The JS sort verified generic (column-index based), so dynamic dep/signal columns sort correctly.
One correctness issue to fix before merge, two design observations worth discussing.
| current_signal_labels = tuple( | ||
| s.label for s in config.workflow_signals | ||
| ) | ||
| if current_signal_labels == existing_snapshot.workflow_signal_labels: |
There was a problem hiding this comment.
AI-assisted review (Claude):
[Important] Cache invalidation compares labels only — reference-string changes are invisible
The comparison uses only the labels stored in the snapshot:
current_signal_labels = tuple(s.label for s in config.workflow_signals)
if current_signal_labels == existing_snapshot.workflow_signal_labels:workflow_signal_labels in the snapshot contains only human-readable labels, not reference strings. If an operator changes a reference (e.g. pins daily.yml@v1 → daily.yml@v2) while keeping the same label, the cache is not invalidated, and every repo keeps its stale matched_workflow_signals from the previous run.
Fix: store the reference strings in the snapshot alongside the labels and include them in the comparison.
There was a problem hiding this comment.
AI fix-commit (Claude): ceb882e — RepoSnapshot.workflow_signal_labels replaced by workflow_signals: tuple[WorkflowSignal, ...]; the snapshot now stores both label and reference, and cache invalidation compares the full tuple.
There was a problem hiding this comment.
Pull request overview
Replaces hard-coded organization assumptions (eclipse-score, docs-as-code, cicd-workflows, bazel_registry, reference_integration) with a TOML-driven org_config.toml. Adds OrgConfig, generic TrackedDep and WorkflowSignal models, refactors signal detection and renderers to be data-driven from the config, and bumps the snapshot schema to 17.
Changes:
- New
org_config.pymodule +org_config.tomldefining org name, repo filters, tracked Bazel deps, workflow signals, registry/ref-int repos;collectCLI now requires--org-config. - Generalized
docs_as_code_version/uses_cicd_daily_workflowintobazel_depslookups andmatched_workflow_signals; metrics/HTML renderers iterate over configured deps and signal labels. - Cache invalidation when workflow signal labels change; reference-integration org name and registry repo lookup made config-driven.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Unrelated downgrades of pygments/pytest/urllib3 |
| tests/test_repo_overview.py | Updates tests for new models, adds cache-invalidation and multi-dep tests |
| tests/test_org_config.py | New test module covering OrgConfig loading/validation |
| tests/test_generator.py | Threads org_name through render_readme calls |
| tests/test_cli_render.py | Adds tracked-dep detail-page rendering test |
| src/generate_repo_overview/org_config.py | New OrgConfig dataclass + TOML loader/validator |
| src/generate_repo_overview/models.py | Adds TrackedDep, WorkflowSignal, schema v17, removes legacy fields |
| src/generate_repo_overview/metrics_report.py | Renders Versions/Automation tables from config-driven deps/signals |
| src/generate_repo_overview/metrics_html.py | Passes tracked-dep latest versions through to HTML pages |
| src/generate_repo_overview/_html_index.py | Index page rebuilt around tracked deps / signal labels |
| src/generate_repo_overview/_html_detail.py | Detail page renders per-dep version badges and signal rows |
| src/generate_repo_overview/_html_common.py | version_badge arg renamed latest_dac→latest_dep_version |
| src/generate_repo_overview/collector/init.py | Threads OrgConfig through collection; conditional registry/ref-int fetches; cache invalidation on signal label change |
| src/generate_repo_overview/collector/signal_detection.py | Replaces daily-workflow detection with multi-signal matcher |
| src/generate_repo_overview/collector/reference_integration.py | Takes org_name parameter instead of hard-coded value |
| src/generate_repo_overview/collector/repo_entry.py | Threads workflow_signals into deep inspection |
| src/generate_repo_overview/collector/traceability.py | Generalizes docs-as-code repo detection to tracked deps; drops pr-484 fallback |
| src/generate_repo_overview/cli.py | Adds --org-config arg, loads OrgConfig |
| src/generate_repo_overview/constants.py | Removes DEFAULT_ORG |
| src/generate_repo_overview/profile_readme.py | Makes org_name a required kw-only arg |
| README.md / AGENTS.md / CLAUDE.md / docs/repo-overview-tool-design.md / src/generate_repo_overview/README.md | Doc updates for --org-config |
| .github/workflows/refresh-profile.yml | Passes --org-config org_config.toml |
| org_config.toml | New default org configuration file |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _parse_tracked_deps(value: object) -> tuple[TrackedDep, ...]: | ||
| if not isinstance(value, list): | ||
| return () | ||
| result: list[TrackedDep] = [] | ||
| for item in cast("list[Any]", value): | ||
| if not isinstance(item, dict): | ||
| continue | ||
| repo = item.get("repo") | ||
| module_name = item.get("module_name") | ||
| if ( | ||
| isinstance(repo, str) | ||
| and repo.strip() | ||
| and isinstance(module_name, str) | ||
| and module_name.strip() | ||
| ): | ||
| result.append(TrackedDep(repo=repo.strip(), module_name=module_name.strip())) | ||
| return tuple(result) |
There was a problem hiding this comment.
AI fix-commit (Claude): ceb882e — _parse_tracked_deps in org_config.py now rejects entries where repo doesn't contain a /, consistent with the validation for reference_integration_repo and registry_repo.
| tracked_deps=tuple( | ||
| dep | ||
| for item in (data.get("tracked_deps") or ()) | ||
| if isinstance(item, dict) | ||
| and (dep := TrackedDep.from_dict(cast("Mapping[str, Any]", item))) | ||
| and dep.repo | ||
| and dep.module_name | ||
| ), |
There was a problem hiding this comment.
AI fix-commit (Claude): ceb882e — replaced the walrus-as-filter pattern in RepoSnapshot.from_dict with explicit isinstance checks; the walrus was always truthy because frozen dataclass instances are never falsy.
| cached_by_name: dict[str, RepoEntry] = {} | ||
| if existing_snapshot is not None: | ||
| current_signal_labels = tuple( | ||
| s.label for s in config.workflow_signals | ||
| ) | ||
| if current_signal_labels == existing_snapshot.workflow_signal_labels: | ||
| cached_by_name = {repo.name: repo for repo in existing_snapshot.repos} | ||
| else: | ||
| print_status( | ||
| "Workflow signal definitions changed — ignoring content cache", | ||
| prefix=status_prefix, | ||
| ) |
There was a problem hiding this comment.
AI fix-commit (Claude): ceb882e — same fix: snapshot now stores full WorkflowSignal objects; changing a reference string correctly invalidates the content cache.
|
|
||
| def is_docs_as_code_repo(entry: RepoEntry) -> bool: | ||
| return bool(entry.content.docs_as_code_version) or entry.name == _DAC_REPO_NAME | ||
| def is_tracked_dep_repo( |
There was a problem hiding this comment.
AI-assisted review (Claude):
[Suggestion] _is_tracked_dep_repo is duplicated — same logic in two files
collector/traceability.py:is_tracked_dep_repo and _html_index.py:_is_tracked_dep_repo contain identical logic. If the definition of "tracked dep repo" ever changes (e.g. matching on org prefix too), it has to be updated in both places and tests only cover one of them.
Move the function to models.py (or metrics_report.py) and import it in both call sites.
There was a problem hiding this comment.
AI fix-commit (Claude): ceb882e — is_tracked_dep_repo moved into models.py alongside the data it operates on; both _html_index.py and collector/traceability.py now import from there.
| ) -> dict[str, tuple[TraceabilityTypeMetrics, ...]]: | ||
| dac_repos = [r for r in repos if is_docs_as_code_repo(r)] | ||
| if not dac_repos: | ||
| eligible_repos = [r for r in repos if is_tracked_dep_repo(r, tracked_deps)] |
There was a problem hiding this comment.
AI-assisted review (Claude):
[Suggestion] Traceability fetching is still docs-as-code-specific, but now runs for all tracked deps
The URL being probed — https://{org_name}.github.io/{repo_name}/main/metrics.json — only exists for repos that publish docs-as-code traceability data. With the current config (one tracked dep = docs-as-code) this works fine.
But as soon as a second tracked dep is added (e.g. a Bazel toolchain), repos that use that dep will appear in the Traceability section with no data, shown as "not reporting." The section title and summary stats will be confusing to readers who do not know which dep triggers it.
Consider either keeping the traceability filter explicitly tied to docs-as-code (separate from tracked deps), or adding a config flag like traceability_dep to org_config.toml so only the intended dep drives this section.
There was a problem hiding this comment.
mtrics.json is provided by docs-as-code tooling.
There was a problem hiding this comment.
AI fix-commit (Claude): No code change. The user confirmed that metrics.json is published by all repos running docs-as-code tooling, so the filter on tracked-dep repos is correct as-is.
- Store full WorkflowSignal (label + reference) in RepoSnapshot instead of labels-only; cache invalidation now compares both fields so changing a reference string correctly busts the content cache - Consolidate is_tracked_dep_repo into models.py; remove the duplicate copy from _html_index.py and the one from collector/traceability.py - Validate that tracked_deps[].repo is in org/repo format in org_config, matching the existing validation for reference_integration_repo - Fix always-truthy walrus expression in RepoSnapshot.from_dict; filter tracked_deps with explicit isinstance checks instead - Bump SNAPSHOT_SCHEMA_VERSION 17 → 18
AlexanderLanin
left a comment
There was a problem hiding this comment.
AI-assisted review (Claude):
All findings from this review resolved in ceb882e. Summary of what changed:
Cache invalidation (my comment + external): RepoSnapshot.workflow_signal_labels: tuple[str, ...] replaced by workflow_signals: tuple[WorkflowSignal, ...]. The snapshot now stores both label and reference for each signal. The cache comparison uses the full tuple, so changing a reference string (e.g. pinning daily.yml@v1 → daily.yml@v2) correctly busts the content cache. Schema version bumped 17 → 18.
Walrus-as-filter (external): RepoSnapshot.from_dict was filtering tracked_deps with (dep := TrackedDep.from_dict(...)) and dep.repo and dep.module_name. Frozen dataclass instances are always truthy, so the walrus clause did nothing. Replaced with explicit isinstance guards on the raw dict fields before constructing the object.
tracked_deps[].repo validation (external): _parse_tracked_deps in org_config.py now rejects entries where repo doesn't contain a /, matching the existing validation applied to reference_integration_repo and registry_repo.
is_tracked_dep_repo duplication (my suggestion): Identical function existed in both collector/traceability.py and _html_index.py. Moved to models.py next to the data it operates on; both callers import from there.
Traceability scope (my suggestion, no fix): Closed as non-issue — metrics.json is published by all repos running docs-as-code tooling, so filtering the traceability section by tracked-dep repos is correct.
Introduce a configuration file for the overview script, allowing users to specify organization settings and tracked dependencies. Update the collection and rendering commands to utilize this configuration, enhancing flexibility and usability.