Skip to content

Reigstry: Fix wave-release auto-trigger meta-token expansion#66100

Merged
kaxil merged 1 commit intoapache:mainfrom
astronomer:fix-1305-wave-release-meta-token-expansion
Apr 30, 2026
Merged

Reigstry: Fix wave-release auto-trigger meta-token expansion#66100
kaxil merged 1 commit intoapache:mainfrom
astronomer:fix-1305-wave-release-meta-token-expansion

Conversation

@kaxil
Copy link
Copy Markdown
Member

@kaxil kaxil commented Apr 29, 2026

publish-docs-to-s3.yml's build-info job dropped the all-providers and apache-airflow-providers meta-tokens when computing registry-providers. Wave releases dispatch publish-docs with exactly those tokens (per README_RELEASE_PROVIDERS.md and workflow_commands.py:188-195 which auto-appends apache-airflow-providers for any providers dispatch), so registry-providers was always empty, the gated update-registry job was skipped, and the registry was silently never rebuilt.

Fix

Extracts the registry-trigger logic into a small unit-tested Python script (dev/registry/derive_wave_providers.py).

  1. Wave-aware derivation. When INCLUDE_DOCS contains a meta-token AND the dispatch ref matches the wave-tag pattern (providers/YYYY-MM-DD), the script derives the wave's actual provider list from per-provider tags reachable from the wave ref but not from the previous wave tag. Only those providers go through the incremental registry build -- non-wave providers' PyPI download counts, latest pointers, and modules stay untouched.

  2. Safe fallback. For non-wave-tag dispatches (manual rebuild against main, etc.), or when the derivation produces an empty list (first-ever wave, or a wave with no new per-provider tags), the script sets a new registry-full-build output and the gate honors it. registry-build.yml already interprets provider: "" as "full build", so no change is needed there. Suspect empty-list cases emit ::warning:: annotations for monitoring.

Why derive instead of full-rebuild

A full rebuild on every wave touches all 86 providers: re-extracts metadata, refreshes PyPI counts, and races the 30-min job timeout. Wave-incremental cuts that to the wave's ~22 providers and leaves everyone else alone.

Behavior matrix

Scenario INCLUDE_DOCS REF Outcome
Standard wave all-providers ... providers/2026-04-21 Incremental: derived ~22 providers
Wave RC ref all-providers ... providers/2026-04-21-rc1 Full rebuild (regex doesn't match)
First wave ever all-providers ... providers/2026-01-15 Full rebuild + ::warning::
Manual against main all-providers ... main Full rebuild
Explicit packages amazon google any Incremental: amazon google
Per-provider doc apache-airflow-providers-amazon providers-amazon/9.27.0 Incremental: amazon

Verification

  • 15 pytest cases pass locally and on CI.
  • Local sanity check against providers/2026-04-21 derives the expected 22 providers (amazon, apache-kafka, ..., vespa).
  • Post-merge: dispatch publish-docs-to-s3 against providers/2026-04-21 to staging and confirm update-registry runs incrementally on the derived 22 (visible in job logs as --provider "amazon ...").

Out of scope (follow-ups)

  • registry-build.yml lacks the pre-sync content guards that registry-backfill.yml got in Make registry-backfill workflow actually publish backfilled pages #66027 -- will have PR once this is merged separately.
  • The build-and-publish-registry 30-minute timeout is now less critical for wave dispatches (incremental is ~5-10m), but a manual full rebuild against main could still race -- will have PR once this is merged separately.
  • Soon I will probably move all of this to a single breeze command but that's isn't on critical path yet -- and want to test few things so not creating issues for it yet.

@boring-cyborg boring-cyborg Bot added area:dev-tools area:registry backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels Apr 29, 2026
@kaxil kaxil added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 29, 2026
@kaxil kaxil force-pushed the fix-1305-wave-release-meta-token-expansion branch from b52654b to ebc0f56 Compare April 29, 2026 18:34
@kaxil kaxil force-pushed the fix-1305-wave-release-meta-token-expansion branch from ebc0f56 to 88bfe04 Compare April 29, 2026 23:16
`publish-docs-to-s3.yml`'s build-info job dropped `all-providers` and
`apache-airflow-providers` meta-tokens when computing `registry-providers`,
which left the `update-registry` gate empty and silently skipped the
registry update on every wave release.

This fix extracts the registry-trigger logic into a small, unit-tested
Python script (`dev/registry/derive_wave_providers.py`). When `INCLUDE_DOCS`
contains a meta-token (`all`, `all-providers`, `apache-airflow-providers`)
AND the dispatch ref matches the wave-tag pattern (`providers/YYYY-MM-DD`),
the script derives the wave's actual provider list from per-provider tags
reachable from the wave ref but not from the previous wave tag. Only those
providers go through the incremental registry build; non-wave providers'
download counts and latest pointers stay untouched.

For non-wave-tag dispatches (e.g., manual rebuild against `main`) or when
the derivation produces an empty list, the script falls back to a full
registry rebuild via a new `registry-full-build` output, which the gate
honors alongside the existing per-provider list. The downstream
`registry-build.yml` already interprets `provider: ""` as "full build", so
no change is needed there.
@kaxil kaxil force-pushed the fix-1305-wave-release-meta-token-expansion branch from 88bfe04 to 5df8945 Compare April 29, 2026 23:18
@kaxil kaxil merged commit b5e9bae into apache:main Apr 30, 2026
137 checks passed
@kaxil kaxil deleted the fix-1305-wave-release-meta-token-expansion branch April 30, 2026 00:17
@github-project-automation github-project-automation Bot moved this from Backlog to Done in Airflow Registry Apr 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Backport failed to create: v3-2-test. View the failure log Run details

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test Commit Link

You can attempt to backport this manually by running:

cherry_picker b5e9bae v3-2-test

This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

seruman pushed a commit to seruman/airflow that referenced this pull request Apr 30, 2026
`publish-docs-to-s3.yml`'s build-info job dropped `all-providers` and
`apache-airflow-providers` meta-tokens when computing `registry-providers`,
which left the `update-registry` gate empty and silently skipped the
registry update on every wave release.

This fix extracts the registry-trigger logic into a small, unit-tested
Python script (`dev/registry/derive_wave_providers.py`). When `INCLUDE_DOCS`
contains a meta-token (`all`, `all-providers`, `apache-airflow-providers`)
AND the dispatch ref matches the wave-tag pattern (`providers/YYYY-MM-DD`),
the script derives the wave's actual provider list from per-provider tags
reachable from the wave ref but not from the previous wave tag. Only those
providers go through the incremental registry build; non-wave providers'
download counts and latest pointers stay untouched.

For non-wave-tag dispatches (e.g., manual rebuild against `main`) or when
the derivation produces an empty list, the script falls back to a full
registry rebuild via a new `registry-full-build` output, which the gate
honors alongside the existing per-provider list. The downstream
`registry-build.yml` already interprets `provider: ""` as "full build", so
no change is needed there.
kaxil added a commit that referenced this pull request Apr 30, 2026
Recent successful full builds run 26-30 minutes (most recent: 29m49s,
26m13s, 20m43s). The 30-minute timeout left near-zero headroom -- a
modest pypistats slowdown or transient registry/network blip during
the per-provider PyPI fetches in `extract_metadata.py` would race the
timeout and silently fail the registry update.

After #66100 (#1305 fix), `registry-build.yml` fires automatically on
every wave-release dispatch. Bumping the timeout to 45 minutes gives
meaningful headroom without going so high that a genuinely stuck job
sits around forever.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools area:registry backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants