Mobile failure scanner workflow misclassifies failures and misses context — pilot arcade-skills plugin as verifier #127866

@steveisok

Context

The "Mobile Platform Failure Scanner" agentic workflow (.github/workflows/mobile-scan.md, run 25430707131) auto-files tracking issues for CI failures. Issue #127859 (filed by this scanner against runtime-diagnostics def 309) is a useful case study because it exhibits three concrete defects that are likely to recur:

  1. Sample window too narrow. The issue cited 5 builds and "past week", but the failure has actually been 100% red on main since at least build 1390492 (Apr 21), which is over 2 weeks. The scanner's caveat ("computed within the scanned window and may not be the true origin") is correct, but the upstream look-back of ~20 builds undersamples persistent failures.
  2. Recommended fix doesn't reflect existing code. The issue's "preferred" fix was "split the Helix payload into per-platform jobs" — but cdac-dump-xplat-test-helix.proj already does exactly that and the file's header comment says so explicitly. The scanner did not read the cited file before recommending a fix.
  3. Mis-routed area label. The failure is in Microsoft.DotNet.Helix.Sdk / arcade payload upload (MemoryStream 2 GiB ceiling in DirectoryPayload.DoUploadAsync), not a cDAC product issue, but the issue is labeled area-Diagnostics-coreclr. The scanner has no notion of "this stack trace points at arcade infrastructure, not the code under test".
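
One way to address defect (1) is to keep paging backwards until a passing build is found, instead of stopping at a fixed window. The sketch below is illustrative only: `Build`, `fetch_page`, `page_size`, and `max_builds` are hypothetical names, not part of the scanner or of any Azure DevOps client, and the build IDs are synthetic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Build:
    build_id: int
    failed: bool


def find_failure_origin(
    fetch_page: Callable[[int, int], List[Build]],
    page_size: int = 20,
    max_builds: int = 200,
) -> Tuple[Optional[Build], int, bool]:
    """Walk back from the newest build, one page at a time.

    fetch_page(skip, take) is assumed to return builds newest-first.
    Returns (oldest failing build seen, builds scanned, origin confirmed).
    The origin is only confirmed once a passing build has been reached.
    """
    oldest_failing: Optional[Build] = None
    scanned = 0
    while scanned < max_builds:
        page = fetch_page(scanned, page_size)
        if not page:
            break  # history exhausted without finding a passing build
        for build in page:
            scanned += 1
            if not build.failed:
                return oldest_failing, scanned, True
            oldest_failing = build
    return oldest_failing, scanned, False


# Synthetic history: 45 consecutive failures, then one passing build.
history = [Build(build_id=1000 - i, failed=True) for i in range(45)]
history.append(Build(build_id=955, failed=False))


def fake_fetch(skip: int, take: int) -> List[Build]:
    return history[skip:skip + take]


origin, scanned, confirmed = find_failure_origin(fake_fetch)
```

With a fixed 20-build window the same history would report an origin 20 builds back. Here the walk continues into the third page, and the `confirmed` flag lets the issue body say whether the reported origin is real or just the edge of the scanned range.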

Bonus oddity: #127859 was filed by mobile-scan even though runtime-diagnostics (def 309) is not a mobile pipeline.

Proposal

Pilot the dotnet-dnceng skills plugin as a second-opinion verifier before the workflow files an issue:

  • ci-analysis would catch the area mis-label by running stack-trace → owner mapping.
  • pipeline-investigation is the correct route for non-Helix-test errors (build-time MSBuild task failures like this one) and would have surfaced the "Send cDAC X-Plat Dump Tests to Helix (Unix)" timeline record with its succeededWithIssues result and proper recordId/logId.
  • known-issue-history would compute a real failure-rate baseline by mining the build-analysis bot's hit-count edits, instead of approximating from a 20-build slice.
  • The CiInvestigator agent at plugins/dotnet-dnceng/agents/CiInvestigator.agent.md already encodes the routing this scanner is missing.
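
To make the stack-trace → owner mapping concrete, here is a minimal sketch of the idea. It is not the ci-analysis implementation. The prefix table, the owner string `hypothetical-arcade-infra`, and the sample trace (including the `HypotheticalUploadStep` frame) are invented for illustration and are not taken from the actual build log.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical table: namespace prefix -> owning area. A real table would be
# maintained alongside the repo's area-owner documentation.
OWNER_BY_PREFIX: List[Tuple[str, str]] = [
    ("Microsoft.DotNet.Helix.", "hypothetical-arcade-infra"),
    ("Microsoft.DotNet.Arcade.", "hypothetical-arcade-infra"),
]

# Matches .NET-style frames such as "   at Namespace.Type.Method(args)".
FRAME_RE = re.compile(r"^\s*at\s+(\S+?)\(")


def owner_for_stack(stack_trace: str) -> Optional[str]:
    """Return the owner of the first frame that matches a known prefix.

    Frames with no known owner (for example BCL frames) are skipped, so the
    result reflects the nearest infrastructure caller. None means no frame
    matched and the caller should keep the pipeline's default label.
    """
    for line in stack_trace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue
        symbol = match.group(1)
        for prefix, owner in OWNER_BY_PREFIX:
            if symbol.startswith(prefix):
                return owner
    return None


# Made-up, abbreviated trace for demonstration purposes only.
SAMPLE_TRACE = """\
System.IO.IOException: Stream was too long.
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at Microsoft.DotNet.Helix.Sdk.HypotheticalUploadStep.Run()
"""

suggested = owner_for_stack(SAMPLE_TRACE)
```

A verifier could compare `suggested` against the label the scanner chose and flag a mismatch before the issue is filed.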

Obstacles to integration

The skills plugin isn't a drop-in: it depends on MCP servers (hlx, maestro, mcp-binlog-tool, mihubot) and the gh/az CLIs, while the gh-aw runtime today has a strict bash allowlist (no gh, no pwsh, no python, no $(...)) and doesn't currently load MCP servers. Two paths:

  • (A) Wire MCP servers into the gh-aw engine config.
  • (B) Port the skill scripts under plugins/dotnet-dnceng/skills/*/scripts/ to the workflow's allowlist (most are bash + curl + jq today).
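
To size path (B), a rough lint can flag which lines of a script would trip the allowlist. The sketch below is a deliberately naive approximation: the disallowed set mirrors only the restrictions listed above, the tokenizer ignores shell quoting rules, and `allowlist_violations` is a hypothetical helper, not something gh-aw provides.

```python
import re
from typing import List, Tuple

# Mirrors the restrictions named in this issue; the real allowlist may differ.
DISALLOWED_COMMANDS = {"gh", "pwsh", "python"}


def allowlist_violations(script_text: str) -> List[Tuple[int, str]]:
    """Return (line number, offending construct) pairs for a shell script."""
    violations: List[Tuple[int, str]] = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        # Naive comment strip: drops everything after the first '#'.
        code = line.split("#", 1)[0]
        if "$(" in code:
            violations.append((lineno, "$(...)"))
        words = set(re.findall(r"[A-Za-z0-9_.-]+", code))
        for command in sorted(words & DISALLOWED_COMMANDS):
            violations.append((lineno, command))
    return violations


sample_script = (
    "#!/usr/bin/env bash\n"
    'curl -s "$URL" | jq .\n'
    "id=$(gh api user)\n"
)
found = allowlist_violations(sample_script)
```

Running this over each script gives a per-file count of lines to rewrite, which is a cheap first estimate of the porting effort before committing to (A) or (B).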

Recommended next step

Run ci-analysis + pipeline-investigation against the proposed issue body before the workflow files it. Even if the workflow keeps producing the body itself, this gate would have rejected #127859 for defects (2) and (3) above. This is lower effort than a full migration and gives a concrete signal on whether deeper integration is worth doing.
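
The gate could be as simple as a list of checks that must all pass before filing. The following is a sketch under stated assumptions: `ProposedIssue`, `Verdict`, and both check factories are hypothetical, and in practice the expected label and the set of files read would come from the verifier skills rather than being hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class ProposedIssue:
    area_label: str
    cited_files: List[str]


@dataclass
class Verdict:
    check: str
    passed: bool
    reason: str = ""


Check = Callable[[ProposedIssue], Verdict]


def label_matches(expected_label: str) -> Check:
    """Fail when the scanner's label differs from the verifier's suggestion."""
    def check(issue: ProposedIssue) -> Verdict:
        ok = issue.area_label == expected_label
        reason = "" if ok else (
            f"label {issue.area_label!r}, verifier suggests {expected_label!r}"
        )
        return Verdict("area-label", ok, reason)
    return check


def cited_files_were_read(files_read: Set[str]) -> Check:
    """Fail when the issue recommends changes to files that were never opened."""
    def check(issue: ProposedIssue) -> Verdict:
        unread = [f for f in issue.cited_files if f not in files_read]
        reason = "" if not unread else "cited but never read: " + ", ".join(unread)
        return Verdict("cited-files", not unread, reason)
    return check


def gate(issue: ProposedIssue, checks: List[Check]) -> Tuple[bool, List[Verdict]]:
    """Return (ok to file, failed verdicts)."""
    failures = [v for v in (c(issue) for c in checks) if not v.passed]
    return not failures, failures


issue = ProposedIssue(
    area_label="area-Diagnostics-coreclr",
    cited_files=["cdac-dump-xplat-test-helix.proj"],
)
ok, failures = gate(
    issue,
    [label_matches("hypothetical-arcade-infra"), cited_files_were_read(set())],
)
```

In this example both checks fail, so the issue would be held back with two reasons attached, corresponding to defects (3) and (2).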

cc @steveisok @dotnet/runtime-infrastructure
