chore(ci): add ARC baseline collector for OS-49 runner migration#927
Conversation
Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
All contributors have signed the DCO ✍️ ✅

I have read the DCO document and I hereby sign the DCO.

recheck

What are you planning to use to keep track of the data? I.e., where is it going to be stored? I wonder if hooking up some kind of observability platform would make sense here, so we can track things long term rather than just as a one-off. There is a hosted Grafana that can use GitHub as a datasource; maybe that would work here? Or do we do something custom? cc @TaylorMutch: I know our previous project built custom metrics/storage/dashboards around CI metrics. Not sure why it was custom, but you may have some answers here.

Intent here is one-shot diagnostic — the script is the evidence trail for OS-49's Phase 1 exit criterion (baseline captured), not a durable metrics pipeline. Numbers land in Linear OS-125 as a point-in-time snapshot, and we re-run manually when we need to diff against a cut-over (Phases 5–7). Long-term CI observability is out of scope for this migration but genuinely worthwhile. Happy to file a follow-up issue if that sounds right — a hosted-Grafana-with-GitHub-datasource approach would avoid the custom-metrics-platform pattern @TaylorMutch is probably thinking of. For now, PR 927 is just a stdlib Python one-shot: zero deps, zero infra, drop-in disposable.

Sounds good @jtoelke2. I was thinking that if getting something longer-term were as easy as enabling an integration in Grafana, it would be a win-win, but it doesn't look that way, so a one-off solution for the migration sounds good for now.

@pimlock @TaylorMutch — follow-up issue filed: #954, "Long-term CI observability via OTLP → Observability Service (Mimir) + Grafana", based on a dig through NVIDIA's Observability Onboarding Guide. One open row in the alternatives table is explicitly marked TBD pending your input, @TaylorMutch: curious whether your prior custom pipeline has label/schema conventions we should reuse, or if the LGTM stack didn't exist yet when you built it.
Summary
Add a stdlib-only Python script that pulls 30-day GitHub Actions baseline metrics (runs, success rate, wall p50/p95, queue p50/p95) for the ten workflows in scope for the ARC → nv-gha-runners migration. This gives us a before-snapshot to compare against as we cut workflows over in Phases 2-7.
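For reviewers who want a feel for the aggregation without opening the script, here is a minimal stdlib-only sketch of the summary step. The function names (`summarize`, `percentile`) and the wall-time approximation (`updated_at` minus `run_started_at`) are illustrative assumptions, not necessarily what `scripts/baseline_workflow_metrics.py` does; the run fields (`status`, `conclusion`, `created_at`, `run_started_at`, `updated_at`) come from the GitHub Actions workflow-runs API.

```python
from datetime import datetime

# Conclusions excluded from duration percentiles, per the PR description.
EXCLUDED = {"skipped", "cancelled", "startup_failure"}

def _ts(value):
    # GitHub API timestamps are ISO-8601 with a trailing 'Z'.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def percentile(values, pct):
    """Nearest-rank percentile over a small sample; stdlib-only."""
    if not values:
        return None
    ordered = sorted(values)
    return ordered[round(pct / 100 * (len(ordered) - 1))]

def summarize(runs):
    """Success rate plus wall/queue p50/p95 for completed workflow runs.

    Excluded conclusions stay in the run count but are dropped from the
    duration percentiles. Wall time is approximated here as
    updated_at - run_started_at (an assumption, not the script's code).
    """
    completed = [r for r in runs if r.get("status") == "completed"]
    timed = [r for r in completed if r.get("conclusion") not in EXCLUDED]
    wall = [(_ts(r["updated_at"]) - _ts(r["run_started_at"])).total_seconds()
            for r in timed]
    queue = [(_ts(r["run_started_at"]) - _ts(r["created_at"])).total_seconds()
             for r in timed]
    successes = sum(1 for r in completed if r.get("conclusion") == "success")
    return {
        "runs": len(completed),
        "success_rate": successes / len(completed) if completed else None,
        "wall_p50": percentile(wall, 50),
        "wall_p95": percentile(wall, 95),
        "queue_p50": percentile(queue, 50),
        "queue_p95": percentile(queue, 95),
    }
```

Pagination and the per-workflow API calls are omitted; the idea is to feed `summarize` the JSON run objects returned by the endpoints the script queries.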
Related Issue
Part of the OS-49 runner migration. See Linear OS-49 (parent) and OS-125 (Phase 1 baseline). The baseline numbers produced by this script are captured in the OS-125 Linear document.
Changes
scripts/baseline_workflow_metrics.py: queries `/repos/{owner}/{repo}/actions/workflows/{id}/runs` for top-level workflows and `/repos/{owner}/{repo}/actions/runs` filtered by `referenced_workflows[].path` for reusable workflows (`docker-build.yml`, `e2e-test.yml`). Excludes `skipped`/`cancelled`/`startup_failure` runs from wall/queue percentiles. Outputs JSON and Markdown.
Testing
`mise run pre-commit` passes
Checklist