Skip to content

Fix CI Slack notifications never firing recovery alerts#65119

Merged
potiuk merged 1 commit intoapache:mainfrom
potiuk:fix-slack-notification-artifact-lookup
Apr 13, 2026
Merged

Fix CI Slack notifications never firing recovery alerts#65119
potiuk merged 1 commit intoapache:mainfrom
potiuk:fix-slack-notification-artifact-lookup

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented Apr 13, 2026

Summary

The CI "Notify Slack" job (in ci-amd-arm.yml, ci-notification.yml, etc.) never sent the "all tests passing" recovery message after a failure was fixed. Looking at run 24310256628 / job 70982878551, the determination step printed:

Root cause: scripts/ci/slack_notification_state.py calls gh api repos/.../actions/artifacts -f name=.... The gh CLI defaults to POST when any -f/-F parameter is passed, and the artifacts list endpoint returns 404 on POST. As a result, download_previous_state() always returned None, every run looked like the first run with no prior state, and determine_action([], None) always returned \"skip\" — so notify_recovery could never trigger.

Fix

  • Pass --method GET explicitly so the -f parameters are encoded as query string instead of a POST body.
  • Add scripts/tests/ci/test_slack_notification_state.py covering:
    • The regression: download_previous_state must call gh api with --method GET.
    • The determine_action state machine (skip / notify_new / notify_recovery / change-of-failures).

Verified the broken call locally returns 404, and the fixed call returns the expected artifact metadata.

Test plan

  • uv run --project scripts pytest scripts/tests/ci/test_slack_notification_state.py -xvs
  • Next scheduled ci-amd-arm.yml run on main should now correctly send a recovery notification when a previously-failing run flips to all green.

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.6)

Generated-by: Claude Code (Opus 4.6) following the guidelines

The artifact lookup for the previous notification state was using
'gh api' with -f parameters, which makes the CLI default to POST
instead of GET. The artifacts endpoint returns 404 on POST, so the
script always treated each run as the first one with no prior state.
That meant the 'all tests passing' recovery notification could never
fire, since it requires the previous state to contain failures.

Force --method GET on the artifact lookup and add unit tests covering
both the GET requirement and the determine_action state machine.
@boring-cyborg boring-cyborg bot added area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels Apr 13, 2026
Copy link
Copy Markdown
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@potiuk potiuk merged commit 35e002a into apache:main Apr 13, 2026
65 checks passed
@potiuk potiuk deleted the fix-slack-notification-artifact-lookup branch April 13, 2026 16:40
github-actions bot pushed a commit that referenced this pull request Apr 13, 2026
…65119)

The artifact lookup for the previous notification state was using
'gh api' with -f parameters, which makes the CLI default to POST
instead of GET. The artifacts endpoint returns 404 on POST, so the
script always treated each run as the first one with no prior state.
That meant the 'all tests passing' recovery notification could never
fire, since it requires the previous state to contain failures.

Force --method GET on the artifact lookup and add unit tests covering
both the GET requirement and the determine_action state machine.
(cherry picked from commit 35e002a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@github-actions
Copy link
Copy Markdown

Backport successfully created: v3-2-test

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

dandanseo123 pushed a commit to dandanseo123/airflow that referenced this pull request Apr 13, 2026
The artifact lookup for the previous notification state was using
'gh api' with -f parameters, which makes the CLI default to POST
instead of GET. The artifacts endpoint returns 404 on POST, so the
script always treated each run as the first one with no prior state.
That meant the 'all tests passing' recovery notification could never
fire, since it requires the previous state to contain failures.

Force --method GET on the artifact lookup and add unit tests covering
both the GET requirement and the determine_action state machine.
eladkal pushed a commit that referenced this pull request Apr 14, 2026
…65119) (#65164)

The artifact lookup for the previous notification state was using
'gh api' with -f parameters, which makes the CLI default to POST
instead of GET. The artifacts endpoint returns 404 on POST, so the
script always treated each run as the first one with no prior state.
That meant the 'all tests passing' recovery notification could never
fire, since it requires the previous state to contain failures.

Force --method GET on the artifact lookup and add unit tests covering
both the GET requirement and the determine_action state machine.
(cherry picked from commit 35e002a)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants