Mobile failure scanner workflow misclassifies failures and misses context — pilot arcade-skills plugin as verifier

### Context

The "Mobile Platform Failure Scanner" agentic workflow (`.github/workflows/mobile-scan.md`, run [25430707131](https://github.com/dotnet/runtime/actions/runs/25430707131)) auto-files tracking issues for CI failures. Issue #127859 (filed by this scanner against `runtime-diagnostics` def 309) is a useful case study because it exhibits three concrete defects that are likely to recur:

1. **Sample window too narrow.** The issue cited 5 builds and "past week"; the failure has actually been 100% red on `main` since at least build [1390492](https://dev.azure.com/dnceng-public/public/_build/results?buildId=1390492) (Apr 21) — over 2 weeks. The scanner's caveat ("computed within the scanned window and may not be the true origin") is correct but the upstream `~20 builds` look-back undersamples persistent failures.
2. **Recommended fix doesn't reflect existing code.** The issue's "preferred" fix was *"split the Helix payload into per-platform jobs"* — but [`cdac-dump-xplat-test-helix.proj`](https://github.com/dotnet/runtime/blob/main/src/native/managed/cdac/tests/DumpTests/cdac-dump-xplat-test-helix.proj) already does exactly that and the file's header comment says so explicitly. The scanner did not read the cited file before recommending a fix.
3. **Mis-routed area label.** The failure is in `Microsoft.DotNet.Helix.Sdk` / arcade payload upload (`MemoryStream` 2 GiB ceiling in `DirectoryPayload.DoUploadAsync`), not a cDAC product issue, but the issue is labeled `area-Diagnostics-coreclr`. The scanner has no notion of "this stack trace points at arcade infrastructure, not the code under test".

Bonus oddity: this issue was filed by `mobile-scan` even though `runtime-diagnostics` (def 309) is not a mobile pipeline.
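
Defect 1 is essentially a fixed-window problem. As a minimal sketch (not the scanner's actual logic), a look-back could walk builds newest-first until it sees a passing build, and say so explicitly when it runs out of history before finding one. `Build`, `streak_origin`, and the `max_lookback` cap are hypothetical names introduced for illustration; a real version would page through the build history for the pipeline definition.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Build:
    build_id: int   # hypothetical: the pipeline build ID
    failed: bool    # True if the tracked failure reproduced in this build


def streak_origin(
    builds_newest_first: Iterable[Build], max_lookback: int = 200
) -> Tuple[Optional[Build], bool]:
    """Return (oldest failing build in the current streak, origin_confirmed).

    origin_confirmed is True only when a passing build was seen right
    before the streak, so the caller can tell "started here" apart from
    "already red at the edge of the scanned window".
    """
    oldest_failure: Optional[Build] = None
    for scanned, build in enumerate(builds_newest_first):
        if scanned >= max_lookback:
            return oldest_failure, False  # hit the cap: origin unknown
        if not build.failed:
            return oldest_failure, True   # found the green/red boundary
        oldest_failure = build
    return oldest_failure, False          # ran out of history


history = [Build(5, True), Build(4, True), Build(3, False), Build(2, True)]
origin, confirmed = streak_origin(history)
print(origin, confirmed)
```

With an unconfirmed origin, the issue text could say "red for at least N builds" instead of quoting a window-sized time span.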

### Proposal

Pilot the [`dotnet-dnceng` skills plugin](https://github.com/dotnet/arcade-skills/tree/main/plugins/dotnet-dnceng) as a *second-opinion verifier* before the workflow files an issue:

- **`ci-analysis`** would catch the area mis-label by running stack-trace → owner mapping.
- **`pipeline-investigation`** is the correct route for non-Helix-test errors (build-time MSBuild task failures like this one) and would have surfaced the "Send cDAC X-Plat Dump Tests to Helix (Unix)" timeline record with its `succeededWithIssues` result and proper recordId/logId.
- **`known-issue-history`** would compute a real failure-rate baseline by mining the build-analysis bot's hit-count edits, instead of approximating from a 20-build slice.
- The `CiInvestigator` agent at [`plugins/dotnet-dnceng/agents/CiInvestigator.agent.md`](https://github.com/steveisok/arcade-skills/blob/main/plugins/dotnet-dnceng/agents/CiInvestigator.agent.md) already encodes the routing this scanner is missing.
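
To make the stack-trace → owner idea concrete, here is a rough sketch of the kind of routing the scanner lacks. It is not how `ci-analysis` is implemented; the prefix tables, the category names, and the function names are all assumptions made for illustration. It skips BCL frames (which carry no ownership signal) and classifies on the first remaining frame.

```python
import re

# Hypothetical prefix tables; a real mapping would come from ownership data.
BCL_PREFIXES = ("System.",)
INFRA_PREFIXES = ("Microsoft.DotNet.Helix.", "Microsoft.DotNet.Arcade.")

# Matches .NET-style frames such as "   at Namespace.Type.Method(args)".
FRAME_RE = re.compile(r"^\s*at\s+(\S+?)\(")


def first_owned_frame(stack_trace: str):
    """Return the first frame that is not plain BCL code, or None."""
    for line in stack_trace.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue
        method = match.group(1)
        if method.startswith(BCL_PREFIXES):
            continue
        return method
    return None


def classify(stack_trace: str) -> str:
    """Classify a failure as 'infrastructure', 'product', or 'unknown'."""
    frame = first_owned_frame(stack_trace)
    if frame is None:
        return "unknown"
    if frame.startswith(INFRA_PREFIXES):
        return "infrastructure"
    return "product"
```

A scanner could refuse to apply a product area label whenever `classify` returns `"infrastructure"` and route the issue for human triage instead.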

### Obstacles to integration

The skills plugin isn't a drop-in: it depends on MCP servers (`hlx`, `maestro`, `mcp-binlog-tool`, `mihubot`) and the `gh`/`az` CLIs, while the gh-aw runtime today has a strict bash allowlist (no `gh`, no `pwsh`, no `python`, no `$(...)`) and doesn't currently load MCP servers. Two paths:

- (A) Wire MCP servers into the gh-aw engine config.
- (B) Port the skill scripts under `plugins/dotnet-dnceng/skills/*/scripts/` to the workflow's allowlist (most are bash + curl + jq today).
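
One way to size path (B) would be to lint each skill script against the restrictions listed above before attempting a port. The sketch below only knows the four restrictions named in this issue (`gh`, `pwsh`, `python`, `$(...)`); the real gh-aw allowlist semantics are an assumption here, and `allowlist_violations` is a hypothetical helper, not part of either project.

```python
import re

# Only the commands this issue names as disallowed; the real list may differ.
DISALLOWED_COMMANDS = {"gh", "pwsh", "python"}


def allowlist_violations(script: str):
    """Return (line_number, construct) pairs for disallowed constructs.

    This is a crude textual check, not a shell parser: it looks at the first
    word of each pipeline/list segment and at literal "$(" occurrences.
    """
    violations = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blanks, comments, and the shebang line
        if "$(" in stripped:
            violations.append((lineno, "$(...)"))
        # Split on ||, &&, | and ; to find the command starting each segment.
        for segment in re.split(r"\|\||&&|[|;]", stripped):
            words = segment.split()
            if words and words[0] in DISALLOWED_COMMANDS:
                violations.append((lineno, words[0]))
    return violations


sample = "\n".join([
    "#!/bin/bash",
    "curl -s https://example.invalid/builds | jq .",
    "gh api repos/example/example",
    "id=$(jq -r .id build.json)",
])
print(allowlist_violations(sample))
```

Scripts with no reported violations are candidates for a direct port; the rest need rewriting or path (A).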

### Recommended next step

Run `ci-analysis` + `pipeline-investigation` against the *proposed issue body* before the workflow files it. Even if the workflow keeps producing the body itself, this gate would have rejected #127859 for defects (2) and (3) above. This is lower effort than a full migration and gives a concrete signal on whether deeper integration is worth doing.
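
The gate itself can be very small. The following is a sketch under stated assumptions: `ProposedIssue`, `expect_label`, `forbid_phrase`, and `gate` are hypothetical names, and the two check factories merely stand in for whatever `ci-analysis` and `pipeline-investigation` would actually report. The point is the shape: each check returns a list of objections, and the workflow files the issue only when that list is empty.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProposedIssue:
    title: str
    body: str
    labels: List[str]


# A check inspects the proposed issue and returns zero or more objections.
Check = Callable[[ProposedIssue], List[str]]


def expect_label(expected: str) -> Check:
    """Stand-in for an owner-mapping result: object if the label is missing."""
    def check(issue: ProposedIssue) -> List[str]:
        if expected in issue.labels:
            return []
        return [f"expected label {expected!r}, found {issue.labels}"]
    return check


def forbid_phrase(phrase: str, reason: str) -> Check:
    """Stand-in for a 'recommended fix already exists in the repo' finding."""
    def check(issue: ProposedIssue) -> List[str]:
        return [reason] if phrase.lower() in issue.body.lower() else []
    return check


def gate(issue: ProposedIssue, checks: List[Check]) -> List[str]:
    """Run every check and collect objections; an empty list means 'file it'."""
    objections: List[str] = []
    for check in checks:
        objections.extend(check(issue))
    return objections
```

Collecting all objections instead of stopping at the first one keeps the rejection report useful for tuning the scanner prompt.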

cc @steveisok @dotnet/runtime-infrastructure

