Registry: Fix wave-release auto-trigger meta-token expansion #66100
Merged
kaxil merged 1 commit into apache:main on Apr 30, 2026
Force-pushed from b52654b to ebc0f56
potiuk approved these changes on Apr 29, 2026
Force-pushed from ebc0f56 to 88bfe04
`publish-docs-to-s3.yml`'s build-info job dropped the `all-providers` and `apache-airflow-providers` meta-tokens when computing `registry-providers`, which left the `update-registry` gate empty and silently skipped the registry update on every wave release. This fix extracts the registry-trigger logic into a small, unit-tested Python script (`dev/registry/derive_wave_providers.py`). When `INCLUDE_DOCS` contains a meta-token (`all`, `all-providers`, `apache-airflow-providers`) AND the dispatch ref matches the wave-tag pattern (`providers/YYYY-MM-DD`), the script derives the wave's actual provider list from per-provider tags reachable from the wave ref but not from the previous wave tag. Only those providers go through the incremental registry build; non-wave providers' download counts and latest pointers stay untouched. For non-wave-tag dispatches (e.g., a manual rebuild against `main`) or when the derivation produces an empty list, the script falls back to a full registry rebuild via a new `registry-full-build` output, which the gate honors alongside the existing per-provider list. The downstream `registry-build.yml` already interprets `provider: ""` as "full build", so no change is needed there.
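The trigger decision described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the contents of `dev/registry/derive_wave_providers.py`. It assumes the following:

- `INCLUDE_DOCS` is a whitespace-separated list of provider IDs, provider package names, or meta-tokens.
- The wave-tag regex is anchored, so a suffixed ref such as `providers/2026-04-21-rc1` counts as a non-wave ref.
- `decide_registry_trigger`, `write_outputs`, and the `derive_wave` callback are hypothetical names.

The output names `registry-providers` and `registry-full-build` come from the description. The `$GITHUB_OUTPUT` file and the `::warning::` workflow command are standard GitHub Actions mechanisms.

```python
import os
import re
from typing import Callable, List, Tuple

# Meta-tokens that mean "every provider" (from the PR description).
META_TOKENS = {"all", "all-providers", "apache-airflow-providers"}
# Wave tags look like providers/YYYY-MM-DD. Anchoring with $ is an assumption:
# it makes suffixed refs such as providers/2026-04-21-rc1 non-wave refs.
WAVE_TAG_RE = re.compile(r"^(?:refs/tags/)?providers/\d{4}-\d{2}-\d{2}$")
PACKAGE_PREFIX = "apache-airflow-providers-"


def decide_registry_trigger(
    include_docs: str,
    ref: str,
    derive_wave: Callable[[str], List[str]],
) -> Tuple[List[str], bool]:
    """Return (registry_providers, full_build) for the update-registry gate.

    Hypothetical helper; assumes INCLUDE_DOCS holds only provider IDs,
    provider package names, or meta-tokens.
    """
    tokens = include_docs.split()
    if not any(token in META_TOKENS for token in tokens):
        # Explicit providers pass straight through, minus the package prefix.
        return [token.removeprefix(PACKAGE_PREFIX) for token in tokens], False
    if WAVE_TAG_RE.match(ref):
        providers = derive_wave(ref)
        if providers:
            # Wave dispatch: incremental build of just the wave's providers.
            return providers, False
        print(f"::warning::No per-provider tags derived for {ref}; running a full build")
    # Non-wave ref (e.g. main) or empty derivation: fall back to a full build.
    return [], True


def write_outputs(providers: List[str], full_build: bool) -> None:
    """Append step outputs to $GITHUB_OUTPUT (or print them when run locally)."""
    lines = [
        f"registry-providers={' '.join(providers)}",
        f"registry-full-build={'true' if full_build else 'false'}",
    ]
    path = os.environ.get("GITHUB_OUTPUT")
    if path:
        with open(path, "a", encoding="utf-8") as fh:
            fh.write("\n".join(lines) + "\n")
    else:
        print("\n".join(lines))
```

For example, `decide_registry_trigger("apache-airflow-providers-amazon", "providers-amazon/9.27.0", lambda ref: [])` yields `(["amazon"], False)`. The callback keeps the git-dependent derivation out of the decision logic, so the decision can be unit-tested with plain lambdas.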
Force-pushed from 88bfe04 to 5df8945
Backport failed to create: v3-2-test. View the failure log: Run details

Note: As of Merging PRs targeted for Airflow 3.X … If in doubt, please ask in the #release-management Slack channel.

You can attempt to backport this manually by running: `cherry_picker b5e9bae v3-2-test`

This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking … After you have resolved the conflicts, you can continue the backport process by running: `cherry_picker --continue`

If you don't have cherry-picker installed, see the installation guide.
seruman pushed a commit to seruman/airflow that referenced this pull request on Apr 30, 2026
kaxil added a commit that referenced this pull request on Apr 30, 2026
Recent successful full builds take up to about 30 minutes (most recent: 29m49s, 26m13s, 20m43s). The 30-minute timeout left near-zero headroom. A modest pypistats slowdown, or a transient registry or network blip during the per-provider PyPI fetches in `extract_metadata.py`, could push the job past the timeout and silently fail the registry update. After #66100 (the #1305 fix), `registry-build.yml` fires automatically on every wave-release dispatch. Bumping the timeout to 45 minutes gives meaningful headroom without going so high that a genuinely stuck job sits around forever.
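The headroom argument can be checked directly against the three durations quoted above. This is a quick illustrative sketch. `to_seconds` and `headroom` are hypothetical helpers, and the slowest recent run is treated as the baseline.

```python
import re
from typing import List

RECENT_RUNS = ["29m49s", "26m13s", "20m43s"]  # durations quoted in the commit message


def to_seconds(duration: str) -> int:
    """Parse a duration such as '29m49s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not duration or not match:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def headroom(timeout_minutes: int, runs: List[str]) -> int:
    """Seconds left before the timeout for the slowest of the given runs."""
    return timeout_minutes * 60 - max(to_seconds(run) for run in runs)


for timeout in (30, 45):
    left = headroom(timeout, RECENT_RUNS)
    print(f"timeout={timeout}m headroom={left // 60}m{left % 60:02d}s")
# timeout=30m headroom=0m11s
# timeout=45m headroom=15m11s
```

With a 30-minute timeout, the slowest recent run finished with only 11 seconds to spare. At 45 minutes, that margin grows to about 15 minutes.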
`publish-docs-to-s3.yml`'s `build-info` job dropped the `all-providers` and `apache-airflow-providers` meta-tokens when computing `registry-providers`. Wave releases dispatch publish-docs with exactly those tokens (per `README_RELEASE_PROVIDERS.md` and `workflow_commands.py:188-195`, which auto-appends `apache-airflow-providers` for any providers dispatch). As a result, `registry-providers` was always empty, the gated `update-registry` job was skipped, and the registry was silently never rebuilt.

Fix
Extracts the registry-trigger logic into a small, unit-tested Python script (`dev/registry/derive_wave_providers.py`).

Wave-aware derivation. When `INCLUDE_DOCS` contains a meta-token AND the dispatch ref matches the wave-tag pattern (`providers/YYYY-MM-DD`), the script derives the wave's actual provider list from per-provider tags reachable from the wave ref but not from the previous wave tag. Only those providers go through the incremental registry build; non-wave providers' PyPI download counts, latest pointers, and modules stay untouched.

Safe fallback. In two cases the script sets a new `registry-full-build` output, and the gate honors it:

- non-wave-tag dispatches (a manual rebuild against `main`, etc.)
- derivations that produce an empty list (the first-ever wave, or a wave with no new per-provider tags)

`registry-build.yml` already interprets `provider: ""` as "full build", so no change is needed there. Suspect empty-list cases emit `::warning::` annotations for monitoring.

Why derive instead of a full rebuild
A full rebuild on every wave touches all 86 providers. It re-extracts metadata, refreshes PyPI counts, and races the 30-minute job timeout. The wave-incremental build cuts that to the wave's ~22 providers and leaves every other provider alone.
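The tag diff behind the wave-aware derivation can be sketched as set arithmetic over `git tag --merged` output. This is a minimal sketch, not the code in `dev/registry/derive_wave_providers.py`, and it rests on three assumptions:

- The function names are hypothetical.
- The `providers-<name>/<version>` tag shape is inferred from the `providers-amazon/9.27.0` example in this PR.
- The previous wave is taken to be the newest older `providers/YYYY-MM-DD` tag reachable from the current wave, and RC tags are ignored.

```python
import re
import subprocess
from typing import Iterable, List, Optional, Set

# Assumed tag shapes: waves are providers/YYYY-MM-DD; per-provider releases are
# providers-<name>/<version>, e.g. providers-amazon/9.27.0. Only final versions
# are counted, so RC tags such as providers-amazon/9.27.0rc1 are ignored.
WAVE_RE = re.compile(r"^providers/\d{4}-\d{2}-\d{2}$")
PROVIDER_TAG_RE = re.compile(r"^providers-(?P<name>[a-z0-9-]+)/\d+(?:\.\d+)*$")


def previous_wave(tags: Iterable[str], wave: str) -> Optional[str]:
    """Newest wave tag older than `wave`; ISO dates sort correctly as strings."""
    older = sorted(tag for tag in tags if WAVE_RE.match(tag) and tag < wave)
    return older[-1] if older else None


def providers_in_wave(tags_at_wave: Set[str], tags_at_previous: Set[str]) -> List[str]:
    """Provider names tagged since the previous wave (reachable now, not before)."""
    names = set()
    for tag in tags_at_wave - tags_at_previous:
        match = PROVIDER_TAG_RE.match(tag)
        if match:
            names.add(match.group("name"))
    return sorted(names)


def merged_tags(ref: str, repo: str = ".") -> Set[str]:
    """Tags whose commits are reachable from `ref`, via `git tag --merged`."""
    result = subprocess.run(
        ["git", "-C", repo, "tag", "--merged", ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.split())


def derive_wave_providers(wave: str, repo: str = ".") -> List[str]:
    """Hypothetical end-to-end helper; an empty list means "fall back to full build"."""
    tags_now = merged_tags(wave, repo)
    prev = previous_wave(tags_now, wave)
    if prev is None:
        return []  # first-ever wave: nothing to diff against
    return providers_in_wave(tags_now, merged_tags(prev, repo))
```

Only `merged_tags` shells out to git. `previous_wave` and `providers_in_wave` are pure functions, so they can be unit-tested without a git checkout.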
Behavior matrix
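| `INCLUDE_DOCS` | Dispatch ref | Result |
|---|---|---|
| `all-providers ...` | `providers/2026-04-21` | derived wave providers (incremental) |
| `all-providers ...` | `providers/2026-04-21-rc1` | |
| `all-providers ...` | `providers/2026-01-15` | `::warning::` |
| `all-providers ...` | `main` | full build (`registry-full-build`) |
| `amazon google` | | `amazon google` |
| `apache-airflow-providers-amazon` | `providers-amazon/9.27.0` | `amazon` |

Verification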
- `providers/2026-04-21` derives the expected 22 providers (amazon, apache-kafka, ..., vespa).
- … `providers/2026-04-21` to staging and confirm that `update-registry` runs incrementally on the derived 22 (visible in job logs as `--provider "amazon ..."`).

Out of scope (follow-ups)
- `registry-build.yml` lacks the pre-sync content guards that `registry-backfill.yml` got in "Make registry-backfill workflow actually publish backfilled pages" (#66027). A separate PR will follow once this one is merged.
- The `build-and-publish-registry` 30-minute timeout is now less critical for wave dispatches (an incremental build takes ~5-10 minutes). However, a manual full rebuild against `main` could still race it. A separate PR will follow once this one is merged.