Add repo health check agentic workflows by JanKrivanek · Pull Request #7583 · dotnet/machinelearning

JanKrivanek · 2026-03-06T19:18:39Z

Adds the 3-tier repo health monitoring system:

Orchestrator (`repo-health-check`): daily at 6:00 UTC, collects issues/PRs/CI data, maintains dashboard issue
Investigator (`repo-health-investigate`): dispatched for critical findings, deep-dive analysis
Groomer (`repo-health-groom`): daily at 9:00 UTC, links results, hides stale comments

Also includes `.github/health-baseline.md` with 24 known P0/P1 issues and 6 long-running PRs.

… and known baseline

Copilot

Pull request overview

This PR adds a 3-tier agentic repo health monitoring system to the dotnet/machinelearning repository using GitHub's gh-aw (agentic workflows) framework. The system automatically monitors issues, PRs, and CI pipelines, and takes automated actions on the dashboard.

Changes:

Adds an Orchestrator workflow (repo-health-check) that runs daily at 6:00 UTC, collects health data, diffs against previous runs, updates a pinned dashboard issue, and dispatches investigators for critical findings.
Adds an Investigator workflow (repo-health-investigate) dispatched for critical/high findings to perform deep-dive analysis and post results back to the dashboard.
Adds a Groomer workflow (repo-health-groom) that runs daily at 9:00 UTC, links investigation results, hides stale comments, and enforces dashboard structure.
Adds .github/health-baseline.md cataloguing 24 known P0/P1 issues and 6 long-running PRs to suppress false positives.
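
As a rough illustration of how a baseline file can suppress known findings, the sketch below checks an issue number against a Markdown baseline. The file format (issue references written as `#NNNN`) and the helper names `is_baselined` and `filter_new_findings` are assumptions for illustration; the actual layout of `.github/health-baseline.md` is not shown in this PR page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: assume the baseline file mentions known issues as "#NNNN".

# is_baselined <issue-number> <baseline-file>
# Succeeds when the exact issue number appears in the baseline file.
is_baselined() {
  # The trailing group prevents "#123" from matching "#1234".
  grep -Eq "#$1([^0-9]|\$)" "$2"
}

# filter_new_findings <baseline-file>
# Reads issue numbers on stdin (one per line) and prints only the non-baselined ones.
filter_new_findings() {
  local baseline=$1 number
  while IFS= read -r number; do
    is_baselined "$number" "$baseline" || printf '%s\n' "$number"
  done
}
```

For example, `printf '1234\n9999\n' | filter_new_findings .github/health-baseline.md` would print only the numbers that the baseline does not mention.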

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `.github/workflows/repo-health-check.md` | Orchestrator prompt/playbook for the daily health check agentic workflow |
| `.github/workflows/repo-health-check.lock.yml` | Auto-generated compiled workflow for the orchestrator |
| `.github/workflows/repo-health-investigate.md` | Investigator prompt/playbook for deep-dive analysis |
| `.github/workflows/repo-health-investigate.lock.yml` | Auto-generated compiled workflow for the investigator |
| `.github/workflows/repo-health-groom.md` | Groomer prompt/playbook for dashboard maintenance |
| `.github/workflows/repo-health-groom.lock.yml` | Auto-generated compiled workflow for the groomer |
| `.github/health-baseline.md` | Known baseline of accepted issues/PRs to suppress from new finding alerts |
| `.github/aw/actions-lock.json` | Action version pins for gh-aw toolchain |

Comments suppressed due to low confidence (2)

.github/workflows/repo-health-groom.md:155

The repo-health-groom.md playbook instructs the agent to hide comments using gh api graphql (line 145), but the compiled lock file's agent prompt explicitly states "The gh CLI is NOT authenticated. Do NOT use gh commands for GitHub operations." The agent cannot use gh api graphql for the minimize mutation; instead it should use the hide_comment safe-output tool (which is defined in the lock file's safe outputs config). The hide-comment operation in the playbook should reference the safe-output tool call rather than a gh command, otherwise the hide operation will silently fail or error on every run.

### Hide Operation

```bash
# Minimize comment (hide with reason)
gh api graphql -f query='
  mutation {
    minimizeComment(input: {
      subjectId: "COMMENT_NODE_ID",
      classifier: OUTDATED
    }) {
      minimizedComment { isMinimized }
    }
  }
'
```

.github/workflows/repo-health-check.md:413

In `repo-health-check.md`, Step 5 instructs the agent to dispatch the investigator using `gh workflow run repo-health-investigate.lock.yml ...`. However, the gh CLI is not authenticated, so this command will fail. The agent should instead use the `repo_health_investigate` safe-output tool (which is defined in the lock file as a `dispatch_workflow` safe output) to trigger the investigation workflow.

```bash
# Budget: max 5 dispatches
DISPATCHED=0

for finding in critical_and_high_findings; do
  if [ $DISPATCHED -ge 5 ]; then
    break
  fi

  gh workflow run repo-health-investigate.lock.yml \
    --repo dotnet/machinelearning \
    -f finding_id="$FINDING_ID" \
    -f category="$CATEGORY" \
    -f severity="$SEVERITY" \
    -f summary="$SUMMARY" \
    -f health_issue_number="$ISSUE"

  DISPATCHED=$((DISPATCHED + 1))
done
```
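
The dispatch budget itself is independent of how the dispatch is performed. A minimal sketch of the capped loop follows, assuming findings arrive as `id:severity` strings; `dispatch_investigation` is a hypothetical stand-in that only prints what it would dispatch, since the call shape of the `repo_health_investigate` safe-output tool is not shown here.

```shell
#!/usr/bin/env bash
# Sketch only: dispatch_investigation is a hypothetical placeholder,
# not the real safe-output tool or a gh command.
MAX_DISPATCHES=5

dispatch_investigation() {
  printf 'dispatch finding_id=%s severity=%s\n' "$1" "$2"
}

# dispatch_with_budget <id:severity>...
# Dispatches findings in order and stops once the budget is used up.
dispatch_with_budget() {
  local dispatched=0 finding id severity
  for finding in "$@"; do
    if [ "$dispatched" -ge "$MAX_DISPATCHES" ]; then
      echo "dispatch budget exhausted; skipping remaining findings" >&2
      break
    fi
    id=${finding%%:*}        # text before the first colon
    severity=${finding#*:}   # text after the first colon
    dispatch_investigation "$id" "$severity"
    dispatched=$((dispatched + 1))
  done
}
```

Passing seven findings to `dispatch_with_budget` emits five dispatch lines and one budget notice on stderr.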


Copilot · 2026-03-06T19:23:28Z

.github/workflows/repo-health-groom.md

+# Only if validation passes
+gh issue edit "$ISSUE" --repo dotnet/machinelearning --body "$UPDATED_BODY"


In repo-health-groom.md, Step 6 instructs the agent to apply the issue body update via gh issue edit ... (line 204). Since the gh CLI is not authenticated, this will always fail. The agent should instead use the update_issue safe-output tool to apply the validated update, consistent with how the safe-outputs config is set up in the lock file.

This issue also appears on line 141 of the same file.

Suggested change

-# Only if validation passes
-gh issue edit "$ISSUE" --repo dotnet/machinelearning --body "$UPDATED_BODY"
+# Only if validation passes; use the safe-output tool to apply the update
+update_issue "$ISSUE" "$UPDATED_BODY"

Copilot · 2026-03-06T19:23:29Z

.github/workflows/repo-health-check.md

+```bash
+# Check if AZDO_PAT is available; skip AzDO checks if not
+if [ -z "$AZDO_PAT" ]; then
+  echo "AZDO_PAT not set — skipping Azure DevOps pipeline checks"
+else
+for pipeline in vsts-ci codecoverage-ci night-build outer-loop-build; do
+  curl -s -u ":$AZDO_PAT" \
+    "https://dev.azure.com/dnceng/public/_apis/build/builds?definitions=$pipeline&\$top=1&api-version=7.0" \
+    | jq '.value[0] | {id, buildNumber, status, result, queueTime, finishTime}'
+done
+```
+
+**A2. Pipeline failure rate (last 7 days)**
+
+```bash
+SINCE=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
+for pipeline in vsts-ci codecoverage-ci night-build outer-loop-build; do
+  curl -s -u ":$AZDO_PAT" \
+    "https://dev.azure.com/dnceng/public/_apis/build/builds?definitions=$pipeline&minTime=$SINCE&api-version=7.0" \
+    | jq '[.value[] | .result] | group_by(.) | map({result: .[0], count: length})'
+done
+```
+
+**A3. Queue times**
+
+```bash
+for pipeline in vsts-ci codecoverage-ci night-build outer-loop-build; do
+  curl -s -u ":$AZDO_PAT" \
+    "https://dev.azure.com/dnceng/public/_apis/build/builds?definitions=$pipeline&\$top=10&api-version=7.0" \
+    | jq '[.value[] | {queueTime, startTime} | {wait: ((.startTime | fromdateiso8601) - (.queueTime | fromdateiso8601))}] | {avg_wait_seconds: (map(.wait) | add / length)}'
+done
+fi


The AzDO bash code in the orchestrator instructions is split across three separate fenced code blocks (A1, A2, A3), but the if/else control flow spans all three:

The A1 block opens `if [ -z "$AZDO_PAT" ]; then ... else` and starts the else body with a for loop, but never reaches `fi`.

The A2 block continues the else body (the `SINCE` assignment and one for loop) with no `if`/`fi` of its own.

The A3 block contains the final for loop followed by the `fi` that closes the A1 `if`.

Since these are separate fenced Markdown code blocks, the agent will likely treat them as distinct snippets. Taken on its own, A1 does not parse because its `if` is never closed, A3 ends in a stray `fi` with no matching `if`, and A2 has no guard at all. If the agent executes A2 or A3 as standalone blocks, their curl commands run unconditionally even when `AZDO_PAT` is not set, causing authentication errors on every run. The entire if/else/fi guard should be contained within a single code block, or each block should have its own `if [ -n "$AZDO_PAT" ]` guard.
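
One way to make each snippet safe in isolation is to give it its own guard. The sketch below shows that pattern for the A2 query; the helper name `azdo_available` and the function wrapper are illustrative assumptions, not part of the PR, while the curl and jq invocations are copied from the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when AZDO_PAT is set and non-empty.
azdo_available() {
  if [ -z "${AZDO_PAT:-}" ]; then
    echo "AZDO_PAT not set; skipping Azure DevOps pipeline checks" >&2
    return 1
  fi
}

# A2-style snippet with its own guard, so it is harmless when run standalone.
pipeline_failure_rate() {
  azdo_available || return 0
  local since pipeline
  # GNU date first, BSD date as fallback (same approach as the original snippet).
  since=$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -v-7d +%Y-%m-%dT%H:%M:%SZ)
  for pipeline in vsts-ci codecoverage-ci night-build outer-loop-build; do
    curl -s -u ":$AZDO_PAT" \
      "https://dev.azure.com/dnceng/public/_apis/build/builds?definitions=$pipeline&minTime=$since&api-version=7.0" \
      | jq '[.value[] | .result] | group_by(.) | map({result: .[0], count: length})'
  done
}
```

With `AZDO_PAT` unset, calling `pipeline_failure_rate` prints the skip notice and makes no network request. The same one-line guard can open the A1 and A3 snippets.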

Copilot · 2026-03-06T19:23:29Z

.github/workflows/repo-health-check.md

+
+```bash
+# Find existing dashboard issue
+ISSUE=$(gh issue list --repo dotnet/machinelearning \
+  --label "repo-health" --state open \
+  --json number --jq '.[0].number')
+
+if [ -z "$ISSUE" ]; then
+  # Create new dashboard issue
+  ISSUE=$(gh issue create --repo dotnet/machinelearning \
+    --title "🏥 Repo Health Dashboard" \
+    --label "repo-health" \
+    --body "$DASHBOARD_BODY")
+  # Pin the issue
+  gh issue pin "$ISSUE" --repo dotnet/machinelearning
+fi
+```
+
+### Update Issue Body
+
+Replace the entire issue body with the current state using the dashboard format. Include:
+
+1. **Header** — Last updated timestamp, overall status emoji and counts
+2. **Summary** — Executive summary (1-2 sentences)
+3. **Findings tables** — Critical, Warning, Recently Resolved, Baselined
+4. **Trends (7-day)** — Key metrics with directional arrows
+5. **Footer** — Link to workflow run and baseline file
+
+### Post Daily Comment
+
+```bash
+gh issue comment "$ISSUE" --repo dotnet/machinelearning \
+  --body "$DELTA_SUMMARY"
+```


The repo-health-check.md orchestrator instructs the agent to use gh issue create and gh issue pin to create/pin the dashboard issue (lines 356–362), and gh issue comment (line 379), but the compiled lock file explicitly states "The gh CLI is NOT authenticated." The agent should instead use the create_issue, update_issue, and add_comment safe-output tool calls for these operations. As written, these gh commands will fail silently on every run, meaning the dashboard issue will never be created or commented on.

This issue also appears on line 394 of the same file.
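
If any `gh` calls remain in a playbook, a preflight check can turn a silent failure into an explicit one. This is a minimal sketch, assuming the environment has bash; `require_gh_auth` is a hypothetical helper name, and `gh auth status` is the GitHub CLI subcommand that exits non-zero when no authenticated account is available.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: fail loudly before attempting any gh operation.
require_gh_auth() {
  if ! command -v gh >/dev/null 2>&1; then
    echo "gh CLI not found" >&2
    return 1
  fi
  if ! gh auth status >/dev/null 2>&1; then
    echo "gh CLI is not authenticated; use the safe-output tools instead" >&2
    return 1
  fi
}
```

A snippet could then start with `require_gh_auth || exit 1`, so an unauthenticated run stops with a clear message instead of producing an empty `$ISSUE`.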

Copilot · 2026-03-06T19:23:29Z

.github/workflows/repo-health-investigate.md

+Post a single comment on the dashboard issue (#${{ inputs.health_issue_number }}).
+
+```bash
+gh issue comment ${{ inputs.health_issue_number }} --repo dotnet/machinelearning --body "$REPORT"


In repo-health-investigate.md, the Step 5 report-back uses a gh issue comment command (line 199) to post to the dashboard issue. However, the compiled lock file explicitly notes "The gh CLI is NOT authenticated." The agent should instead use the add_comment safe-output tool to post the investigation report. As written, this command will fail and the investigation result will never be posted back to the dashboard.

Suggested change

-gh issue comment ${{ inputs.health_issue_number }} --repo dotnet/machinelearning --body "$REPORT"
+add_comment "$REPORT"

Add repo health check workflows (orchestrator, investigator, groomer)…

c982403

… and known baseline

Copilot AI review requested due to automatic review settings March 6, 2026 19:18

JanKrivanek merged commit df690c2 into main Mar 6, 2026

Copilot started reviewing on behalf of JanKrivanek March 6, 2026 19:19 View session

dotnet-policy-service bot assigned JanKrivanek Mar 6, 2026

Copilot AI reviewed Mar 6, 2026

JanKrivanek merged 1 commit into main from repo-health-check
