Skip to content

Make registry-backfill workflow actually publish backfilled pages#66027

Merged
kaxil merged 1 commit intoapache:mainfrom
astronomer:registry-backfill-patch-and-guard
Apr 28, 2026
Merged

Make registry-backfill workflow actually publish backfilled pages#66027
kaxil merged 1 commit intoapache:mainfrom
astronomer:registry-backfill-patch-and-guard

Conversation

@kaxil
Copy link
Copy Markdown
Member

@kaxil kaxil commented Apr 28, 2026

Summary

The Registry Backfill workflow has been reporting success while uploading nothing. Three related bugs combine:

  1. Cached providers.json overwrite drops the backfill version. registry-backfill.yml's Download data files from S3 for build step writes the cached S3 providers.json to registry/src/_data/providers.json AFTER extraction has run. The cached copy predates the backfilled version, so provider.versions[] doesn't list it. registry/src/_data/providerVersions.js:48-50 then intersects provider.versions ∩ availableSet and drops the on-disk metadata.json for the new version. Eleventy emits no per-version page.

  2. Silent aws s3 sync on missing source. With (1) preventing the page from being built, the next step's aws s3 sync exits 0 on the missing source and the job goes green having uploaded nothing.

  3. A redundant Extract version metadata from git tags step that was also broken. It duplicated breeze registry backfill's internal call to extract_versions.py AND was broken for multi-version matrix entries because extract_versions.py:446 declares --version as single-valued (no action="append"). When the workflow looped --version A --version B argparse only kept the last one. The trailing || true masked the failure.

Fix

  • New dev/registry/patch_providers_json.py runs after the S3 download and appends the backfilled version(s) into provider.versions[]. Defensively keeps provider.version (latest) in the array so patching a previously-empty list doesn't leave the latest field excluded once the array becomes authoritative for providerVersions.js.

  • Pre-sync guards assert (a) the per-version index.html was emitted, (b) the API directory exists, (c) parameters.json is non-empty, before each aws s3 sync. Silent no-ops become hard failures with ::error:: annotations.

  • Delete the redundant standalone Extract version metadata step. breeze registry backfill already runs _run_extract_versions internally and propagates exit codes correctly.

Why patch providers.json instead of changing the intersection in providerVersions.js

The intersection is shared by registry-build.yml (full builds) and registry-backfill.yml. Changing it affects both flows. Patching is scoped to backfill, mirrors the mental model that providers.json IS the source of truth for "which versions to expose", and matches what breeze registry publish-versions is supposed to do at the end of the run anyway.

Gotchas

  • packaging is now declared as a runtime dep of dev/registry/. It was already available transitively via pydantic, but the patch script uses packaging.version for newest-first sorting and shouldn't rely on a transitive dep.
  • The pre-sync guard's content checks deliberately don't assert classes is non-empty: providers like common-compat legitimately have zero modules ("Configuration-only provider"), and the page renders a placeholder for them.

Known follow-up (separate PR)

The publish-versions job runs in a fresh checkout where providers.json is gitignored, so publish_registry_versions.py:119-127 raises FileNotFoundError after the per-version sync. Tracked separately and shipping right after this PR; not in scope here.

Verification

This PR alone makes the per-version pages land on S3 correctly. Once the follow-up publish-versions fix lands, the smoke test (amazon/9.24.0 google/21.0.0 common-compat/1.14.2) should go fully green:

gh workflow run registry-backfill.yml -R apache/airflow \
  -f destination=staging \
  -f provider-versions="amazon/9.24.0 google/21.0.0 common-compat/1.14.2"

Tests in dev/registry/tests/test_patch_providers_json.py cover 11 cases including the empty-list / latest-field edges.

Linked prior fixes that built the chain to here: #65972, #65975, #65987, #65984, #65989.

Three related bugs in `.github/workflows/registry-backfill.yml` that
combine to make backfill jobs report success while uploading nothing:

1. **`Download data files from S3 for build` clobbers the in-flight
   `providers.json`.** The workflow downloads the cached S3 copy of
   `providers.json` to `registry/src/_data/providers.json` AFTER
   extraction has run. The cached copy predates the backfilled version,
   so `provider.versions[]` doesn't list it.
   `registry/src/_data/providerVersions.js:48-50` then intersects
   `provider.versions ∩ availableSet` and drops the on-disk
   `metadata.json` for the new version. Eleventy emits no per-version
   page for the backfilled version.

2. **`aws s3 sync` exits 0 on missing source.** Combined with (1), the
   workflow's "Sync backfilled version pages to S3" step silently
   no-ops when the build didn't emit a page, and the job goes green.

3. **A redundant `Extract version metadata from git tags` step
   duplicated `breeze registry backfill`'s internal call AND was broken
   for multi-version matrix entries.** `extract_versions.py:446`
   declared `--version` as single-valued (no `action="append"`), so
   when the workflow looped `--version A --version B` argparse only
   kept the last one. The trailing `|| true` masked the failure.

Fixes:

- New script `dev/registry/patch_providers_json.py` runs after the S3
  download and appends backfilled version(s) into the targeted
  `provider.versions` array. It also defensively keeps `provider.version`
  (the latest) in the array -- otherwise patching into a previously-empty
  list could leave the latest field excluded once the array becomes
  the authoritative filter for `providerVersions.js`.

- Sync step now asserts (a) the per-version `index.html` was emitted,
  (b) the API directory exists, (c) `parameters.json` is non-empty,
  before running each `aws s3 sync`. Silent no-ops are now hard
  failures with `::error::` annotations.

- The redundant standalone `Extract version metadata` step is deleted.
  `breeze registry backfill` already runs `_run_extract_versions`
  internally and propagates exit codes correctly.

`packaging` is now declared as a runtime dep of `dev/registry/`
(previously only available transitively via pydantic) since
`patch_providers_json.py` uses `packaging.version` for newest-first
sorting.

Tests: 11 unit tests in `dev/registry/tests/test_patch_providers_json.py`
covering append, dedup, unknown-provider error, multi-version,
invalid-version sort fallback, empty-list edge (the latest-included
guarantee), latest-field-untouched, other-providers-untouched, and
three CLI invocation paths.
@kaxil kaxil merged commit 7e49884 into apache:main Apr 28, 2026
144 checks passed
@kaxil kaxil deleted the registry-backfill-patch-and-guard branch April 28, 2026 20:55
@github-actions
Copy link
Copy Markdown
Contributor

Backport failed to create: v3-2-test. View the failure log Run details

Note: As of Merging PRs targeted for Airflow 3.X
the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In matter of doubt please ask in #release-management Slack channel.

Status Branch Result
v3-2-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 7e49884 v3-2-test

This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

kaxil added a commit that referenced this pull request Apr 30, 2026
`aws s3 sync` exits 0 silently when its source directory is missing or
empty, so a partial `breeze registry extract-data` failure (one provider
errors mid-run, Eleventy still produces a tree, sync uploads stale or
empty JSON) would currently report green while leaving the live registry
in an inconsistent state.

This adds the same pre-sync content guards `registry-backfill.yml` got
in PR #66027: for incremental builds, assert each target provider's HTML
directory + API directory + `versions.json` exist and are non-empty;
for full builds, assert the top-level `index.html`, `api/providers.json`
listing, and at least one provider subtree under `api/providers/` are
present. Any missing artifact aborts the sync with `::error::` before
S3 is touched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools area:registry backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

2 participants