fix: PROJECT_NUMBER typo (P1) + ci-status run_id comparison (P2)#2
Conversation
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #2 (.github)
Summary
Two real bugs, two correct fixes. The P1 fix is a clean one-character rename that restores broken automation. The P2 fix is a meaningful improvement over same_wf[0]-by-position, though it leaves one minor edge case unresolved.
Verdict: Approve — ship as-is. The remaining nit can be a follow-up.
P1 — PROJECT_NUUMBER typo (pr-merged-done.yml)
✅ Correct. The only consumer is:
const projectResp = await github.graphql(projectQuery, {
org: ORG,
number: PROJECT_NUMBER,
});at .github/workflows/pr-merged-done.yml:96. Renaming the declaration matches the consumer. No other references to PROJECT_NUUMBER exist in the file or repo.
Impact confirmed: before this fix, PROJECT_NUMBER is not defined would have thrown at the GraphQL call site, so issues never moved to Done — the workflow has been silently broken since it landed.
P2 — run_id-based current run selection (octo-ci-status.yml)
✅ Correct core fix.
run_idwas already declared asrequired: truein theworkflow_callinputs onmain, so wiring it throughenv.RUN_IDis not a breaking change for callers — they already pass it.next((r for r in same_wf if r['id'] == run_id), same_wf[0])correctly identifies the triggering run. Type comparison is sound: GitHub's REST API returnsidas an integer, andint(os.environ.get('RUN_ID', 0))produces an int.- The fallback to
same_wf[0]whenrun_iddoesn't match is a reasonable degradation (matches the prior behavior).
⚠️ Nit — others[0] is "most recent by created_at", not "most recent before current"
others = [r for r in same_wf if r['id'] != current['id']]
previous = others[0] if others else Nonesame_wf is the API ordering (created_at desc). After excluding current, others[0] is the most recently created run — which can still be a run that was created and completed after current, e.g.:
- Run A starts at T0, completes at T+3 (slow).
- Run B starts at T+1, completes at T+2.
- Octo's notify job for A starts late (after B has also finished). API returns
[B, A, …]. current = A(matched byrun_id) ✅others = [B, …]→previous = B, but B completed after A.
So when the notify job for A picks "previous", it's actually picking a later run, which can flip the alert/recovery semantics. The PR description identifies the original race exactly as same_wf[0] being the wrong run; this fix solves "which is current" but leaves "which is previous" anchored on created_at ordering.
A tighter implementation would scope others to runs older than current:
older = [r for r in same_wf
if r['id'] != current['id']
and r['created_at'] < current['created_at']]
previous = older[0] if older else None(Or use updated_at if you want completion-time ordering specifically; for CI runs they're effectively equivalent.)
This is a corner case — concurrent runs finishing out-of-order with delayed notify jobs — and the fix is strictly better than same_wf[0]/same_wf[1]. Not a blocker. File a follow-up.
Minor observation — silent fallback
If run_id is somehow not in same_wf (e.g., the triggering run dropped out of the most-recent-10 window before the notify job ran), the code silently falls back to same_wf[0] — i.e., the exact behavior the PR is fixing. A one-line print('WARN: run_id %s not in recent runs, falling back' % run_id) would help when triaging future false alerts. Optional.
Other findings (out of scope, unchanged code)
per_page=10combined with thename-based filter insame_wf = [r for r in runs if r['name'] == wf_name]could yield zero or one matching run when many other workflows exist on the repo. Not introduced by this PR; flagging because the P2 fix relies on the current run being in this window.same_wfis filtered by workflowname, which is the human-readable name. If the.github/workflows/<x>.ymlfile is renamed butname:stays the same, this still works; if thename:field is changed, history is severed. Acceptable trade-off, just worth knowing.
Both pre-existed and are unrelated to this PR's scope.
Verification checklist
- ✅
PROJECT_NUMBERdeclaration matches its single consumer at line 96. - ✅
run_idworkflow input was already declared onmain— no caller changes required. - ✅ Python
intcomparison against APIr['id']is type-correct. - ✅
previous = Nonepath is handled by the existingprev_conclusion = previous['conclusion'] if previous else Noneguard. - ✅ State-change logic (
if curr_conclusion == prev_conclusion) preserves first-completion-success silence and first-completion-failure alerting. ⚠️ others[0]may be a chronologically-later run under concurrent completion + delayed notify; minor.
LGTM — approving.
yujiawei
left a comment
There was a problem hiding this comment.
Code Review — PR #2 (.github)
Summary
Two small, well-scoped fixes to org-level reusable workflows. Both bugs are real, the fixes are correct, and the risk surface is limited to these two workflows.
Verdict: APPROVED — minor non-blocking observations below.
1. Verification
P1 — PROJECT_NUUMBER typo (.github/workflows/pr-merged-done.yml)
✅ Confirmed bug, correctly fixed.
- Before fix:
const PROJECT_NUUMBER = 2;declared but never referenced again. The later GraphQL call at.github/workflows/pr-merged-done.yml:85and thecore.setFailedmessage at line 89 both reference the undeclaredPROJECT_NUMBER, which would have thrownReferenceError: PROJECT_NUMBER is not definedat runtime in theactions/github-scriptstep. - After fix: declaration at line 28 matches both use sites (
pr-merged-done.yml:85,pr-merged-done.yml:89).grep -n PROJECT_Nshows three consistent occurrences, no stragglers. - Impact statement in PR body matches reality: this workflow has indeed never moved issues to Done; this is the first time the script can actually reach the GraphQL query.
P2 — run_id based current/previous selection (.github/workflows/octo-ci-status.yml)
✅ Confirmed bug, correct primary fix; one residual edge case worth flagging.
- Original
current = same_wf[0]; previous = same_wf[1]relied on the GitHub Actions API window happening to put the just-completed run at index 0. When two runs of the same workflow finish close in time, the?status=completed&per_page=10listing (created_at desc) can place a sibling run ahead of the triggering run, so the wrong pair is compared and the alert/recovery semantics flip. - The new logic anchors
currentto therun_idinput (which is${{ github.run_id }}at the caller, hence guaranteed to identify this execution), then derivespreviousfrom runs strictly older bycreated_at, with the current run excluded by id. ISO‑8601 strings sort lexicographically, and the API ordering meansolder[0]is the most-recent-strictly-older run — that is exactly what the alert/recovery comparison wants. run_idwas already declared as a requiredworkflow_callinput onmain, so this PR does not introduce a breaking change for callers — they're already passing it; the input just wasn't being used.
2. Findings
P2 — fallback path preserves the original broken behavior (low)
When RUN_ID is not found in the 10-run window (e.g. the workflow had a delayed dispatch and >10 same-named runs landed first, or the env var is misconfigured), the code falls back to current = same_wf[0] and then computes older from that. That is exactly the pre-fix behavior, just behind a WARN: log line. In practice the recent-window-miss should be rare, but two suggestions:
- Consider exiting silently (or with a non-zero status that's caught) when the warn fires, instead of silently producing a possibly wrong alert. A spurious page is worse than a missed one for a meta-monitoring workflow.
- Or widen the window (
per_page=30) so the fallback rarely triggers in practice.
Either is fine; current behavior is acceptable and strictly better than before.
P2 — created_at ties (very low)
r['created_at'] < current['created_at'] excludes runs created in the same second. That's an unrealistic but possible edge if two pushes land within sub-second window. Optionally tighten with (r['created_at'], r['id']) < (current['created_at'], current['id']) to break ties deterministically. Not worth blocking on.
P2 — cancelled runs in previous (pre-existing, low)
?status=completed includes cancelled. The current branch already short-circuits when the current conclusion is cancelled, but a cancelled previous produces prev_conclusion='cancelled' which falls through to the Unhandled transition arm and exits cleanly — so no spurious alert today. Worth noting only because if the message-decision logic ever grows new cases, cancelled-as-previous needs to be filtered explicitly. Not introduced by this PR; not a blocker.
Style nit — single-quote f-string mix
'WARN: run_id %s not found ...' % run_id is fine, but the surrounding script uses f-strings throughout. Optional cosmetic alignment. No functional impact.
3. Recommendations
- Ship as-is. Both bugs are real, both fixes are correct, and the diff is appropriately minimal.
- Optional follow-up (separate PR): decide whether the
run_id-not-found fallback should be soft (current) or hard (skip the notification). My preference is hard-skip, because the whole point of the fix is to stop sending miscalibrated alerts. - Optional follow-up: add a note in
octo-ci-status.ymlcallers' README thatrun_id: ${{ github.run_id }}is required, since the field has been required-but-unused onmainand a missing call site would only be caught at workflow-call time.
4. Additional Observations (out of scope, surfaced from reading)
pr-merged-done.ymlusessame_wffiltering byr['name'] == wf_name. If a repo ever renames a workflow file in-place, recent runs from the old name disappear from the same-name window — this is desired here, just noting it as an implicit assumption. Not a problem for this PR.- The
keywordReregex inpr-merged-done.ymlalready handles plural forms and same-repo URL form. Cross-repo closing references are explicitly out-of-scope per the file header. Reasonable.
* ci: restore auto-add-to-project as reusable workflow Re-add the org-level reusable workflow for automatically adding new issues and PRs to Octo Board (Project #2). Supports: - workflow_call: callable from any org repo via caller workflow - Direct trigger: issues/pull_request_target for .github repo itself Replaces the previous standalone version with the reusable pattern consistent with other org-level workflows (issue-welcome, pr-feed, etc.). * ci: address review feedback on auto-add-to-project - Rename to 'Auto add to project (reusable)' for consistency - Add rationale for Actions-based approach vs native auto-add rule - Document PROJECT_TOKEN requirement for direct triggers - Add security warning against actions/checkout under pull_request_target - Add timeout-minutes: 5 to job --------- Co-authored-by: Octo Bot <octo-bot@mininglamp.com>
Workflow Bug Fixes (Codex Review)
Two bugs found in org-level reusable workflows via automated review.
P1 🔴 Critical:
PROJECT_NUMBERtypo inpr-merged-done.ymlconst PROJECT_NUUMBER = 2(extra U) butPROJECT_NUMBERis used laterReferenceError— issues have NEVER been moved to Done automaticallyPROJECT_NUUMBER→PROJECT_NUMBERP2 🟡 Logic:
octo-ci-status.ymlwrong run comparisonsame_wf[0]may not be the triggering run when two CI runs complete simultaneouslyrun_idinput to precisely identify the current run before selecting the previous one