perf(ci): use postgres service container for python job #4633

Merged
Yicong-Huang merged 2 commits into apache:main from Yicong-Huang:perf/python-postgres-service on May 2, 2026

Yicong-Huang (Contributor) commented May 2, 2026

What changes were proposed in this PR?

Switch the python job in build.yml from apt-get install postgresql + systemctl start to a services: postgres container, mirroring what the scala job already does:

  • Add services.postgres (image postgres, POSTGRES_PASSWORD=postgres, port 5432, pg_isready healthcheck).
  • Drop Install PostgreSQL, Start PostgreSQL Service, and the sudo -u postgres psql -f seed step.
  • Add a single Create iceberg catalog database step that runs psql -h localhost -U postgres -f sql/iceberg_postgres_catalog.sql (same pattern as the scala job).
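
As a rough illustration of the first bullet, a services block along these lines would match the description. This is a minimal sketch, not the actual diff: the job id `python`, the `runs-on` value, and the health-check interval, timeout, and retry values are assumptions. Only the image, password, port, and `pg_isready` check come from the PR text.

```yaml
jobs:
  python:                      # assumed job id
    runs-on: ubuntu-latest     # assumed runner
    services:
      postgres:
        image: postgres
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432          # expose the container on localhost:5432
        # Gate the job's steps on the database accepting connections.
        # The interval/timeout/retries values are illustrative assumptions.
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
```

With the port mapped this way, steps that run directly on the runner reach the database at localhost:5432.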

Any related issues, documentation, discussions?

Closes #4634.

Driven by repeated python-job failures on apt-get update against azure.archive.ubuntu.com, which has been unreliable: runs stall for tens of seconds while ignoring the InRelease responses, then either fail or surface stale package metadata. The Docker registry path used by services is independent of that mirror.

Side benefit: the postgres container starts in seconds, versus ~30 s of apt-get update even on a healthy day. This also removes the only place in build.yml that still needed the apt mirror.
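
For contrast, the apt-based sequence being dropped might have looked roughly like the steps below. This is a reconstruction from the step names and commands mentioned in this description, not a copy of the old workflow: the exact flags, the `postgresql` service unit name, the third step's name, and the seed file path are assumptions.

```yaml
steps:
  - name: Install PostgreSQL
    # apt-get update is the part that depends on the Ubuntu mirror.
    run: |
      sudo apt-get update
      sudo apt-get install -y postgresql
  - name: Start PostgreSQL Service
    run: sudo systemctl start postgresql          # assumed unit name
  - name: Seed iceberg catalog database           # hypothetical step name
    # Assumed to use the same seed file as the new step.
    run: sudo -u postgres psql -f sql/iceberg_postgres_catalog.sql
```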

How was this PR tested?

Will be exercised by this PR's own python matrix once the CI runs. The seed SQL is the same one the scala job already runs successfully against the same container image.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

Replace 'apt-get install postgresql' + systemctl start in the python
matrix with the same docker service container the scala job already
uses (postgres image with POSTGRES_PASSWORD=postgres on
localhost:5432, plus a pg_isready healthcheck). Drop the apt-get update step
to remove the dependency on the Azure Ubuntu mirror, which has been
unreliable; container pulls go through a separate Docker registry
CDN.

Mechanics: connect via 'psql -h localhost -U postgres' with
PGPASSWORD=postgres, the same way the scala job does. The single
'Create iceberg catalog database' step replaces the previous
install/start/seed sequence.
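
The mechanics above could be expressed as a single workflow step like the one below. This is a sketch under stated assumptions: the step name, command, and password come from this commit message, while passing PGPASSWORD through a step-level `env:` block is an assumption about how it is wired up.

```yaml
- name: Create iceberg catalog database
  run: psql -h localhost -U postgres -f sql/iceberg_postgres_catalog.sql
  env:
    # psql reads the password from PGPASSWORD, so no interactive prompt is needed.
    PGPASSWORD: postgres
```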

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot added the ci label (changes related to CI) May 2, 2026
Yicong-Huang added the release/v1.1.0-incubating label (back porting to release/v1.1.0-incubating) May 2, 2026
Yicong-Huang requested a review from aglinxinyuan May 2, 2026 00:40
aglinxinyuan (Contributor) left a comment

LGTM!

Yicong-Huang enabled auto-merge (squash) May 2, 2026 00:55
Yicong-Huang merged commit f32c974 into apache:main May 2, 2026
22 checks passed
github-actions Bot pushed a commit that referenced this pull request May 2, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

(backported from commit f32c974)
SarahAsad23 pushed a commit to SarahAsad23/texera that referenced this pull request May 4, 2026
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Use postgres service container in python CI job

2 participants