Improve ecosystem stats refresh resilience#344
Conversation
Code Review — PR #344: Improve ecosystem stats refresh resilienceOverviewThis PR adds two improvements to the ecosystem-stats pipeline task:
The structure is clear, the new methods are well-scoped, and XML parsing is hardened against XXE. Test coverage is a genuine positive. A few concerns are worth addressing before merge. Issues1.
|
79420c8 to
0ad0ff1
Compare
Code Review — PR #344: Improve ecosystem stats refresh resilienceOverviewThis PR adds two distinct features to the
The test coverage covers both paths. The XML parsing hardening (DTD prohibition) and OData escaping are solid. A few concerns below. Issues and Suggestions🟡 Behavioral default change —
|
0ad0ff1 to
6a998d9
Compare
Code Review — PR #344: Improve ecosystem stats refresh resilienceOverviewThis PR adds two resilience features to the
Both features are opt-in ( Issues & Suggestions
|
| Category | Status |
|---|---|
| Correctness | ✅ No functional bugs found |
| Security | ✅ XXE protected; OData injection prevented |
| Performance | HttpClient lifecycle needs attention |
| Test coverage | ✅ Three targeted tests; minor duplication |
| Conventions | ✅ Follows existing multi-alias config pattern |
| Line budget | linecount.js before merging |
The core logic is sound. The HttpClient lifetime is the one change I'd recommend making before merge; the rest are low-priority improvements.
6a998d9 to
a6d9e17
Compare
|
Code Review - PR 344: Improve ecosystem stats refresh resilience This PR adds two capabilities to the ecosystem-stats pipeline task: (1) PSGallery per-module fallback refresh when the owner-wide Gallery query times out, re-fetching modules individually via the OData v2 API; and (2) Project catalog telemetry sync that merges stats back into a project catalog JSON after each run. The implementation is well-structured. A few areas worth discussing: CORRECTNESS: The outer catch blocks for HttpRequestException and TaskCanceledException in TryRefreshPowerShellGalleryModulesById are effectively dead code - per-module exceptions are already caught per-task and converted to Warning strings, so Task.WhenAll rarely throws. Also, if the CancellationToken fires mid-batch, partial refresh results (some modules refreshed, others not) can be returned silently; a comment clarifying the best-effort intent would help. DESIGN: The static HttpClient with Timeout.InfiniteTimeSpan is the correct pattern. However the default for powerShellGalleryByIdRefreshTimeoutSeconds falls back to timeoutSeconds, coupling the two timeouts unless explicitly overridden - worth documenting. The sync-over-async GetAwaiter().GetResult() bridge is fine given the single-threaded pipeline runner with no SynchronizationContext. MergeProjectTelemetry and ProjectCatalogDocument are not visible in this diff - a cross-reference to their location would help reviewers. CODE QUALITY: (1) In SyncProjectCatalogTelemetryFromStats all three catch blocks (JsonException, IOException, UnauthorizedAccessException) share an identical body - collapse into a single type-pattern handler. (2) The null-forgiveness operator on result.Warning! is redundant after the IsNullOrWhiteSpace check. (3) summary.json now always emits a projectCatalogTelemetry object even when the feature is disabled - soft breaking change for strict-schema consumers. TESTS: (1) Tests for SyncProjectCatalogTelemetryFromStats, TryPreserveEcosystemSources, and BuildPowerShellGalleryPackageByIdUrl use NonPublic reflection with positional argument arrays. This is brittle - TryPreserveEcosystemSources already required a positional update (6 to 8 args) in this PR. Consider InternalsVisibleTo instead. (2) RunPipeline_EcosystemStats_SyncsProjectCatalogTelemetry uses timeoutSeconds=1 and a random non-existent GitHub org to force fallback. Potentially flaky in CI if PSGallery returns an empty success fast enough. SECURITY: DTD processing correctly disabled (DtdProcessing.Prohibit, XmlResolver=null) - no XXE risk. OData single-quote escaping in BuildPowerShellGalleryPackageByIdUrl is correct and verified by the dedicated unit test. No unsanitised user-controlled paths beyond what ResolvePath covers. SUMMARY: Solid, defensively-implemented feature. Actionable: (1) Collapse the three identical catch blocks. (2) Use InternalsVisibleTo over NonPublic reflection for private-method tests. (3) Document that powerShellGalleryByIdRefreshTimeoutSeconds can be set independently of timeoutSeconds. (4) Cross-reference MergeProjectTelemetry and ProjectCatalogDocument. Good work on the OData escaping test and secure XML reader settings. |
a6d9e17 to
783788f
Compare
Code Review — PR #344: Improve ecosystem stats refresh resilienceOverviewThis PR adds two resilience features to the
Both features are opt-in ( Code Quality & StyleStrengths:
Specific Issues1. Dead catch clauses in
|
783788f to
71ba1b3
Compare
Code Review — PR #344: Improve ecosystem stats refresh resilienceOverviewThis PR adds two independent features to the
Both features are gated behind new Code Quality & Best PracticesStrengths
Concerns 1. Missing error isolation for optional catalog sync (medium)// ExecuteEcosystemStats — no try/catch around this block
if (syncProjectCatalogTelemetry)
{
projectCatalogTelemetryMerged = SyncProjectCatalogTelemetryFromStats(
outputPath, projectCatalogPath, projectCatalogPublishPath,
out projectCatalogTelemetryWarning);
}
2. Undocumented timeout clamping interaction (minor)var boundedTimeoutSeconds = Math.Clamp(timeoutSeconds, 5, 300);When 3. Reflection-based private-method test is fragile (minor)var syncMethod = typeof(WebPipelineRunner)
.GetMethod("SyncProjectCatalogTelemetryFromStats", BindingFlags.NonPublic | BindingFlags.Static);
var args = new object?[] { statsPath, catalogPath, publishedCatalogPath, null };
var merged = Assert.IsType<int>(syncMethod!.Invoke(null, args));
Assert.Null(args[3]); // checks the out-warningReflection tests are fragile across refactors and provide no compile-time safety. Since 4. Duplicated JSON fixtures across tests (minor)The Missing Test CoverageThe following paths have no automated coverage:
The per-ID fallback refresh is the headline feature of this PR; not having a test that exercises it (even with a pre-canned XML response) is a gap worth closing before shipping. SummaryThe implementation is thoughtful — security is well handled (XXE, OData injection), the async-to-sync bridge is correctly contained, and the feature flags keep the default behaviour stable. The two items worth addressing before merge:
Everything else is minor polish. |
Summary
Validation
WebPipelineRunnerEcosystemStatsTests|FullyQualifiedNameWebEcosystemStatsGeneratorTests"