Skip to content

fix: PROJECT_NUMBER typo (P1) + ci-status run_id comparison (P2)#2

Merged
lml2468 merged 3 commits into
mainfrom
fix/workflow-bugs
May 13, 2026
Merged

fix: PROJECT_NUMBER typo (P1) + ci-status run_id comparison (P2)#2
lml2468 merged 3 commits into
mainfrom
fix/workflow-bugs

Conversation

@lml2468
Copy link
Copy Markdown
Contributor

@lml2468 lml2468 commented May 13, 2026

Workflow Bug Fixes (Codex Review)

Two bugs found in org-level reusable workflows via automated review.

P1 🔴 Critical: PROJECT_NUMBER typo in pr-merged-done.yml

  • Bug: const PROJECT_NUUMBER = 2 (extra U) but PROJECT_NUMBER is used later
  • Impact: Board automation throws ReferenceError — issues have NEVER been moved to Done automatically
  • Fix: Rename PROJECT_NUUMBERPROJECT_NUMBER

P2 🟡 Logic: octo-ci-status.yml wrong run comparison

  • Bug: same_wf[0] may not be the triggering run when two CI runs complete simultaneously
  • Impact: Wrong state-change pair compared; could send false alerts or miss real failures
  • Fix: Use run_id input to precisely identify the current run before selecting the previous one

yujiawei
yujiawei previously approved these changes May 13, 2026
Copy link
Copy Markdown

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #2 (.github)

Summary

Two real bugs, two correct fixes. The P1 fix is a clean one-character rename that restores broken automation. The P2 fix is a meaningful improvement over same_wf[0]-by-position, though it leaves one minor edge case unresolved.

Verdict: Approve — ship as-is. The remaining nit can be a follow-up.


P1 — PROJECT_NUUMBER typo (pr-merged-done.yml)

Correct. The only consumer is:

const projectResp = await github.graphql(projectQuery, {
  org: ORG,
  number: PROJECT_NUMBER,
});

at .github/workflows/pr-merged-done.yml:96. Renaming the declaration matches the consumer. No other references to PROJECT_NUUMBER exist in the file or repo.

Impact confirmed: before this fix, PROJECT_NUMBER is not defined would have thrown at the GraphQL call site, so issues never moved to Done — the workflow has been silently broken since it landed.


P2 — run_id-based current run selection (octo-ci-status.yml)

Correct core fix.

  • run_id was already declared as required: true in the workflow_call inputs on main, so wiring it through env.RUN_ID is not a breaking change for callers — they already pass it.
  • next((r for r in same_wf if r['id'] == run_id), same_wf[0]) correctly identifies the triggering run. Type comparison is sound: GitHub's REST API returns id as an integer, and int(os.environ.get('RUN_ID', 0)) produces an int.
  • The fallback to same_wf[0] when run_id doesn't match is a reasonable degradation (matches the prior behavior).

⚠️ Nit — others[0] is "most recent by created_at", not "most recent before current"

others  = [r for r in same_wf if r['id'] != current['id']]
previous = others[0] if others else None

same_wf is the API ordering (created_at desc). After excluding current, others[0] is the most recently created run — which can still be a run that was created and completed after current, e.g.:

  • Run A starts at T0, completes at T+3 (slow).
  • Run B starts at T+1, completes at T+2.
  • Octo's notify job for A starts late (after B has also finished). API returns [B, A, …].
  • current = A (matched by run_id) ✅
  • others = [B, …]previous = B, but B completed after A.

So when the notify job for A picks "previous", it's actually picking a later run, which can flip the alert/recovery semantics. The PR description identifies the original race exactly as same_wf[0] being the wrong run; this fix solves "which is current" but leaves "which is previous" anchored on created_at ordering.

A tighter implementation would scope others to runs older than current:

older = [r for r in same_wf
         if r['id'] != current['id']
         and r['created_at'] < current['created_at']]
previous = older[0] if older else None

(Or use updated_at if you want completion-time ordering specifically; for CI runs they're effectively equivalent.)

This is a corner case — concurrent runs finishing out-of-order with delayed notify jobs — and the fix is strictly better than same_wf[0]/same_wf[1]. Not a blocker. File a follow-up.

Minor observation — silent fallback

If run_id is somehow not in same_wf (e.g., the triggering run dropped out of the most-recent-10 window before the notify job ran), the code silently falls back to same_wf[0] — i.e., the exact behavior the PR is fixing. A one-line print('WARN: run_id %s not in recent runs, falling back' % run_id) would help when triaging future false alerts. Optional.


Other findings (out of scope, unchanged code)

  1. per_page=10 combined with the name-based filter in same_wf = [r for r in runs if r['name'] == wf_name] could yield zero or one matching run when many other workflows exist on the repo. Not introduced by this PR; flagging because the P2 fix relies on the current run being in this window.
  2. same_wf is filtered by workflow name, which is the human-readable name. If the .github/workflows/<x>.yml file is renamed but name: stays the same, this still works; if the name: field is changed, history is severed. Acceptable trade-off, just worth knowing.

Both pre-existed and are unrelated to this PR's scope.


Verification checklist

  • PROJECT_NUMBER declaration matches its single consumer at line 96.
  • run_id workflow input was already declared on main — no caller changes required.
  • ✅ Python int comparison against API r['id'] is type-correct.
  • previous = None path is handled by the existing prev_conclusion = previous['conclusion'] if previous else None guard.
  • ✅ State-change logic (if curr_conclusion == prev_conclusion) preserves first-completion-success silence and first-completion-failure alerting.
  • ⚠️ others[0] may be a chronologically-later run under concurrent completion + delayed notify; minor.

LGTM — approving.

Copy link
Copy Markdown

@yujiawei yujiawei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #2 (.github)

Summary

Two small, well-scoped fixes to org-level reusable workflows. Both bugs are real, the fixes are correct, and the risk surface is limited to these two workflows.

Verdict: APPROVED — minor non-blocking observations below.


1. Verification

P1 — PROJECT_NUUMBER typo (.github/workflows/pr-merged-done.yml)

Confirmed bug, correctly fixed.

  • Before fix: const PROJECT_NUUMBER = 2; declared but never referenced again. The later GraphQL call at .github/workflows/pr-merged-done.yml:85 and the core.setFailed message at line 89 both reference the undeclared PROJECT_NUMBER, which would have thrown ReferenceError: PROJECT_NUMBER is not defined at runtime in the actions/github-script step.
  • After fix: declaration at line 28 matches both use sites (pr-merged-done.yml:85, pr-merged-done.yml:89). grep -n PROJECT_N shows three consistent occurrences, no stragglers.
  • Impact statement in PR body matches reality: this workflow has indeed never moved issues to Done; this is the first time the script can actually reach the GraphQL query.

P2 — run_id based current/previous selection (.github/workflows/octo-ci-status.yml)

Confirmed bug, correct primary fix; one residual edge case worth flagging.

  • Original current = same_wf[0]; previous = same_wf[1] relied on the GitHub Actions API window happening to put the just-completed run at index 0. When two runs of the same workflow finish close in time, the ?status=completed&per_page=10 listing (created_at desc) can place a sibling run ahead of the triggering run, so the wrong pair is compared and the alert/recovery semantics flip.
  • The new logic anchors current to the run_id input (which is ${{ github.run_id }} at the caller, hence guaranteed to identify this execution), then derives previous from runs strictly older by created_at, with the current run excluded by id. ISO‑8601 strings sort lexicographically, and the API ordering means older[0] is the most-recent-strictly-older run — that is exactly what the alert/recovery comparison wants.
  • run_id was already declared as a required workflow_call input on main, so this PR does not introduce a breaking change for callers — they're already passing it; the input just wasn't being used.

2. Findings

P2 — fallback path preserves the original broken behavior (low)

When RUN_ID is not found in the 10-run window (e.g. the workflow had a delayed dispatch and >10 same-named runs landed first, or the env var is misconfigured), the code falls back to current = same_wf[0] and then computes older from that. That is exactly the pre-fix behavior, just behind a WARN: log line. In practice the recent-window-miss should be rare, but two suggestions:

  • Consider exiting silently (or with a non-zero status that's caught) when the warn fires, instead of silently producing a possibly wrong alert. A spurious page is worse than a missed one for a meta-monitoring workflow.
  • Or widen the window (per_page=30) so the fallback rarely triggers in practice.

Either is fine; current behavior is acceptable and strictly better than before.

P2 — created_at ties (very low)

r['created_at'] < current['created_at'] excludes runs created in the same second. That's an unrealistic but possible edge if two pushes land within sub-second window. Optionally tighten with (r['created_at'], r['id']) < (current['created_at'], current['id']) to break ties deterministically. Not worth blocking on.

P2 — cancelled runs in previous (pre-existing, low)

?status=completed includes cancelled. The current branch already short-circuits when the current conclusion is cancelled, but a cancelled previous produces prev_conclusion='cancelled' which falls through to the Unhandled transition arm and exits cleanly — so no spurious alert today. Worth noting only because if the message-decision logic ever grows new cases, cancelled-as-previous needs to be filtered explicitly. Not introduced by this PR; not a blocker.

Style nit — single-quote f-string mix

'WARN: run_id %s not found ...' % run_id is fine, but the surrounding script uses f-strings throughout. Optional cosmetic alignment. No functional impact.


3. Recommendations

  • Ship as-is. Both bugs are real, both fixes are correct, and the diff is appropriately minimal.
  • Optional follow-up (separate PR): decide whether the run_id-not-found fallback should be soft (current) or hard (skip the notification). My preference is hard-skip, because the whole point of the fix is to stop sending miscalibrated alerts.
  • Optional follow-up: add a note in octo-ci-status.yml callers' README that run_id: ${{ github.run_id }} is required, since the field has been required-but-unused on main and a missing call site would only be caught at workflow-call time.

4. Additional Observations (out of scope, surfaced from reading)

  • pr-merged-done.yml uses same_wf filtering by r['name'] == wf_name. If a repo ever renames a workflow file in-place, recent runs from the old name disappear from the same-name window — this is desired here, just noting it as an implicit assumption. Not a problem for this PR.
  • The keywordRe regex in pr-merged-done.yml already handles plural forms and same-repo URL form. Cross-repo closing references are explicitly out-of-scope per the file header. Reasonable.

@lml2468 lml2468 merged commit d115330 into main May 13, 2026
1 check passed
lml2468 added a commit that referenced this pull request May 15, 2026
* ci: restore auto-add-to-project as reusable workflow

Re-add the org-level reusable workflow for automatically adding
new issues and PRs to Octo Board (Project #2).

Supports:
- workflow_call: callable from any org repo via caller workflow
- Direct trigger: issues/pull_request_target for .github repo itself

Replaces the previous standalone version with the reusable pattern
consistent with other org-level workflows (issue-welcome, pr-feed, etc.).

* ci: address review feedback on auto-add-to-project

- Rename to 'Auto add to project (reusable)' for consistency
- Add rationale for Actions-based approach vs native auto-add rule
- Document PROJECT_TOKEN requirement for direct triggers
- Add security warning against actions/checkout under pull_request_target
- Add timeout-minutes: 5 to job

---------

Co-authored-by: Octo Bot <octo-bot@mininglamp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants