Overview
This issue tracks the implementation of a Layer 3 upgrade-drift CI test. The goal is to catch cases where ALTER EXTENSION cat_tools UPDATE (from 0.2.2 to 0.3.0) produces a different schema than a fresh CREATE EXTENSION cat_tools. Any such difference is a bug in the upgrade script.
Implementation Spec
Procedure
- make install PGUSER=postgres (installs all SQL files)
- Create two databases:
  - fresh: CREATE EXTENSION cat_tools (gets 0.3.0 directly)
  - upgraded: CREATE EXTENSION cat_tools VERSION '0.2.2', then ALTER EXTENSION cat_tools UPDATE (arrives at 0.3.0 via upgrade)
- For each database: run unmark-extension.sql to remove all objects from extension membership (so pg_dump includes them as regular objects)
- Run pg_dump --schema-only --no-owner --no-privileges on each database
- Normalize both dumps (strip noise, sort object blocks)
- Diff the normalized dumps; any difference is an upgrade drift bug
- CI job passes if the diff is empty; otherwise it fails with the diff shown
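The steps above could be wired together roughly as follows. This is only a sketch of what run-drift-test.sh might look like, not a settled design: the database names (drift_fresh, drift_upgraded), the helper function names, the DRIFT_DIR variable, and the assumption that normalize-dump.pl takes a file argument and writes to stdout are all hypothetical.

```shell
#!/bin/sh
# Sketch of test/upgrade-drift/run-drift-test.sh.
# Hypothetical names: DRIFT_DIR, drift_fresh, drift_upgraded, and all helpers.

HERE=${DRIFT_DIR:-test/upgrade-drift}

# make_db DBNAME SQL: recreate a scratch database and run one SQL command in it.
make_db() {
    dropdb --if-exists "$1"
    createdb "$1"
    psql -X -q -v ON_ERROR_STOP=1 -d "$1" -c "$2"
}

# dump_schema DBNAME OUTFILE: detach all extension members, then dump the schema.
dump_schema() {
    # unmark-extension.sql prints one ALTER EXTENSION ... DROP statement per row;
    # -A -t makes psql emit bare rows, which are piped back in for execution.
    psql -X -q -A -t -v ON_ERROR_STOP=1 -d "$1" -f "$HERE/unmark-extension.sql" |
        psql -X -q -v ON_ERROR_STOP=1 -d "$1"
    pg_dump --schema-only --no-owner --no-privileges "$1" > "$2"
}

# compare_dumps FRESH UPGRADED: print the diff and return non-zero on drift.
compare_dumps() {
    if diff -u "$1" "$2"; then
        echo "PASS: no upgrade drift"
    else
        echo "FAIL: upgrade drift detected (diff above)" >&2
        return 1
    fi
}

main() {
    set -eu    # abort on the first failing step
    export PGUSER="${PGUSER:-postgres}"
    work=$(mktemp -d)

    make_db drift_fresh    "CREATE EXTENSION cat_tools"
    make_db drift_upgraded "CREATE EXTENSION cat_tools VERSION '0.2.2'"
    psql -X -q -v ON_ERROR_STOP=1 -d drift_upgraded \
         -c "ALTER EXTENSION cat_tools UPDATE"

    for db in drift_fresh drift_upgraded; do
        dump_schema "$db" "$work/$db.raw.sql"
        # Assumption: the normalizer reads a file argument and writes to stdout.
        "$HERE/normalize-dump.pl" "$work/$db.raw.sql" > "$work/$db.sql"
    done

    compare_dumps "$work/drift_fresh.sql" "$work/drift_upgraded.sql"
}
```

The block only defines functions; the real script would end with a call to main. Keeping compare_dumps separate makes the pass/fail logic easy to exercise without a running PostgreSQL.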
Unmarking Extension Objects
/*
 * Generate ALTER EXTENSION cat_tools DROP ... statements for every
 * object owned by the extension, so pg_dump includes them as regular objects.
 */
SELECT format(
         'ALTER EXTENSION cat_tools DROP %s %s;',
         (pg_identify_object(classid, objid, 0)).type,
         (pg_identify_object(classid, objid, 0)).identity
       )
  FROM pg_depend
 WHERE refclassid = 'pg_extension'::regclass  -- OIDs are only unique within a catalog
   AND refobjid = (SELECT oid FROM pg_extension WHERE extname = 'cat_tools')
   AND deptype = 'e'
   AND classid != 'pg_extension'::regclass;
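pg_identify_object returns (type, schema, name, identity); the identity field is suitable for use in ALTER EXTENSION DROP. Available in PostgreSQL 9.3+. The query only generates the statements; they still have to be executed against the database (for example by piping the output back into psql). Some object types may need special handling, so test on PG 11, PG 12, and PG 18.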
Dump Normalization Script
The normalization script should:
- Strip pg_dump header boilerplate (lines before first SET or -- section)
- Strip SET statements (search_path etc.)
- Strip -- Name: ...; Type: ...; Schema: ... section comment lines
- Split remaining content into blocks on blank-line boundaries
- Within each block, normalize whitespace (collapse runs, trim line ends)
- Sort blocks lexicographically
- Rejoin and diff the two outputs
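A minimal sketch of these normalization steps, written with awk paragraph mode (RS="") purely for illustration; it does not settle the language question discussed later. Assumptions not in the spec above: the function name normalize_dump is hypothetical, the header is taken to end at the first SET line, and bare "--" ruler lines around section comments are dropped as well.

```shell
#!/bin/sh
# normalize_dump FILE: print a normalized, block-sorted copy of a pg_dump file.
# Hypothetical helper; a sketch of the listed steps, not a final implementation.
normalize_dump() {
    # Pass 1 (line mode): drop header, SET statements, and section comments,
    # then normalize whitespace on every surviving line.
    awk '
        !started && /^SET / { started = 1 }   # header ends at the first SET
        !started            { next }
        /^SET /             { next }
        /^-- Name: .*; Type: .*; Schema: / { next }
        /^--$/              { next }          # assumption: bare rulers are noise
        {
            gsub(/[ \t]+/, " ")               # collapse whitespace runs
            sub(/ $/, "")                     # trim line ends
            print
        }
    ' "$1" |
    # Pass 2 (paragraph mode): fold each blank-line-separated block onto a
    # single line, joining its lines with the control character \001.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\001" } { $1 = $1; print }' |
    # Sort blocks bytewise so the order does not depend on the locale.
    LC_ALL=C sort |
    # Pass 3: unfold each block and separate blocks with one blank line.
    awk 'BEGIN { FS = "\001"; OFS = "\n" } { $1 = $1; print; print "" }'
}
```

Usage would be along the lines of `normalize_dump fresh.raw.sql > fresh.sql`, followed by a diff of the two normalized files. Note that collapsing whitespace runs also applies inside function bodies, so a stricter variant may be needed wherever whitespace differences are meant to be detected.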
File Layout
test/upgrade-drift/
  PLAN.md               <- Design doc (write first)
  unmark-extension.sql  <- SQL to generate DROP statements
  run-drift-test.sh     <- Orchestrator: creates DBs, runs full test
  normalize-dump.pl     <- Normalization script (or .sh)
CI Job
upgrade-drift-test:
  strategy:
    matrix:
      pg: [11, 12, 18]
  name: Upgrade drift test on PostgreSQL ${{ matrix.pg }}
  runs-on: ubuntu-latest
  container: pgxn/pgxn-tools
  steps:
    - name: Start PostgreSQL ${{ matrix.pg }}
      run: pg-start ${{ matrix.pg }}
    - name: Check out the repo
      uses: actions/checkout@v4
    - name: Install rsync
      run: apt-get install -y rsync
    - name: Install cat_tools
      run: make install PGUSER=postgres
    - name: Run upgrade drift test
      run: test/upgrade-drift/run-drift-test.sh
Known Edge Cases to Address
- Objects intentionally different between fresh/upgraded (allowlist approach: start with an empty test/upgrade-drift/expected-diffs.txt)
- PG version differences in pg_dump output format
- The _cat_tools private schema (included or excluded?)
- Table data (schema-only dump ignores it; note this limitation)
- The prosrc field in pg_dump output (function bodies) will be byte-for-byte identical if the upgrade script copies them correctly, so whitespace differences there are signal, not noise
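One possible shape for the allowlist check is sketched below. The format of expected-diffs.txt is not defined in this issue, so the sketch assumes the simplest thing that could work: each line of the file is one line of plain diff output (for example "> COMMENT ON ..."), matched verbatim. The function name unexpected_drift is hypothetical.

```shell
#!/bin/sh
# unexpected_drift FRESH UPGRADED ALLOWLIST
# Print changed lines that are not allowlisted; return 1 if any remain.
# Assumption: ALLOWLIST holds verbatim "< ..." / "> ..." lines of diff output.
unexpected_drift() {
    # A missing allowlist must not silently turn into a passing test.
    if [ ! -f "$3" ]; then
        echo "missing allowlist: $3" >&2
        return 2
    fi

    # diff and grep exit non-zero in normal operation here (files differ,
    # nothing left after filtering), so the pipeline status is ignored.
    leftover=$(
        diff "$1" "$2" |
        grep '^[<>] ' |            # keep only removed/added lines
        grep -v -x -F -f "$3"      # drop lines listed verbatim in the allowlist
    ) || true

    if [ -n "$leftover" ]; then
        printf '%s\n' "$leftover"
        return 1
    fi
}
```

With an empty expected-diffs.txt every changed line is reported, which matches the "start with empty" approach; entries are then added only for differences that have been reviewed and judged intentional.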
Prior Analysis Findings
- (pg_identify_object(classid, objid, 0)).identity gives the right string for ALTER EXTENSION DROP without needing to handle each object type separately, but test this assumption for edge cases.
- Paragraph mode (splitting on blank lines) aligns well with pg_dump's output format.
- An allowlist of known acceptable diffs is the right pattern for intentional differences.
Language Choice for Normalization Script (Open Question)
The language for normalize-dump.pl (or equivalent) is still open. The leading candidate is Perl, but the tradeoffs are:
Perl (leading candidate)
- Pro: Ships with every Debian system (perl-base is an Essential package); available in the pgxn/pgxn-tools CI image with zero extra install steps; paragraph mode (local $/ = "") is perfect for block-splitting; strong regex with /xsm modifiers
- Con: Less readable to developers who aren't Perl users; idiomatic Perl can be cryptic
Python
- Pro: More readable to cold readers; re.DOTALL + re.VERBOSE cover the same ground as Perl's regex modifiers
- Con: Not guaranteed in the CI image (needs apt-get install -y python3); potential version and venv headaches on dev machines
Shell/awk
- Pro: Already used in the Makefile; zero new dependencies; awk RS="" paragraph mode exists
- Con: Set arithmetic and multi-file logic are awkward; limited for the full requirements of both scripts
Go
- Pro: Fast and statically typed
- Con: Requires an install and build step; RE2 engine (no lookahead/lookbehind); overkill for these scripts
Recommendation: Use Perl unless there is a strong team preference otherwise. If Perl is chosen, keep idioms straightforward — avoid write-only constructions, prefer named captures over positional, and include comments explaining any non-obvious regex.
Success Criteria
- test/upgrade-drift/run-drift-test.sh runs locally (with PG available) and PASSes when fresh install and upgrade produce identical schema, FAILs with a readable diff when they differ
- CI job defined in ci.yml and syntactically valid
- PLAN.md is complete enough that a new developer understands the full design