A PostgreSQL Table Access Method (TAM) extension for storing versioned data with automatic delta compression. Built on the xpatch delta encoding library.
Store versioned rows (document revisions, config history, audit logs, etc.) with large space savings - typically 10-20x when versions share content (see the benchmarks below). Compression is automatic and transparent: you just INSERT and SELECT normally.
-- Create a table using the xpatch access method
CREATE TABLE documents (
doc_id INT,
version INT,
content TEXT
) USING xpatch;
-- Configure grouping and ordering (optional - auto-detection works for most cases)
SELECT xpatch.configure('documents',
group_by => 'doc_id',
order_by => 'version'
);
-- Insert versions normally
INSERT INTO documents VALUES (1, 1, 'Hello World');
INSERT INTO documents VALUES (1, 2, 'Hello PostgreSQL World!');
INSERT INTO documents VALUES (1, 3, 'Hello PostgreSQL World! Updated.');
-- Query normally - reconstruction is automatic
SELECT * FROM documents WHERE doc_id = 1 ORDER BY version;
-- Check compression stats
SELECT * FROM xpatch_stats('documents');

- Automatic delta compression - Only stores differences between versions
- Transparent reconstruction - SELECT works normally, deltas are decoded on-the-fly
- Auto-detection - Automatically detects group_by, order_by, and delta columns
- Parallel scans - Full support for parallel query execution
- Index support - B-tree indexes work on all columns (including delta-compressed ones)
- MVCC - Basic multi-version concurrency control
- DELETE - Cascade delete removes a version and all subsequent versions in the chain (see the example after this list)
- VACUUM - Dead tuple cleanup works
- WAL logging - Crash recovery supported
- Shared memory cache - LRU cache for reconstructed content across all backends
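
For example, cascade DELETE on the quick-start table removes the targeted version and everything after it in the same chain; a quick sketch:

```sql
-- Removes doc 1 version 2 AND all subsequent versions in its chain (here, version 3)
DELETE FROM documents WHERE doc_id = 1 AND version = 2;
```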
- UPDATE - Not supported. Insert a new version instead (see the sketch after this list). This is intentional for append-only versioned data.
- Out-of-order inserts - Versions must be inserted in order within each group
- Hidden columns - The internal `_xp_seq` column is visible in `SELECT *` (PostgreSQL limitation)
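
Since UPDATE is unsupported, the workaround is always to append the next version (using the quick-start schema above):

```sql
-- Instead of: UPDATE documents SET content = '...' WHERE doc_id = 1 AND version = 3;
-- append a new version (versions must ascend within each group):
INSERT INTO documents VALUES (1, 4, 'Hello PostgreSQL World! Updated twice.');
```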
-- Describe a table: shows config, schema, and storage stats
SELECT * FROM xpatch.describe('documents');
-- Warm the cache for faster subsequent queries
SELECT * FROM xpatch.warm_cache('documents');
SELECT * FROM xpatch.warm_cache('documents', max_groups => 100);
-- Get compression statistics
SELECT * FROM xpatch_stats('documents');
-- Inspect internal storage for a specific group (debugging/analysis)
SELECT * FROM xpatch_inspect('documents', 1); -- group_value = 1
-- Get cache statistics (requires shared_preload_libraries)
SELECT * FROM xpatch_cache_stats();
-- Get xpatch library version
SELECT xpatch_version();
-- Dump all table configs as SQL (for backup/migration)
SELECT * FROM xpatch.dump_configs();
-- Fix config OIDs after pg_restore
SELECT xpatch.fix_restored_configs();

Space savings depend heavily on your data patterns. Here are real benchmarks with 5000 rows (100 documents x 50 versions each):
| Data Pattern | xpatch Size | heap Size | Space Saved |
|---|---|---|---|
| Incremental changes (base content + small additions) | 416 KB | 9.2 MB | 95% smaller |
| Identical content across versions | 488 KB | 648 KB | 25% smaller |
| Completely random data | 816 KB | 728 KB | 12% larger (overhead) |
Key insight: xpatch shines when versions share content. For typical document versioning (where each version is similar to the previous), expect 10-20x space savings. For random/unrelated data, xpatch adds overhead and provides no benefit.
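
As a rough illustration of the "incremental changes" pattern from the table above, synthetic data like this compresses very well (a sketch, not the exact benchmark generator; assumes an empty `documents` table configured as in the quick start):

```sql
-- 100 documents x 50 versions: each version shares a large base and adds a
-- small unique suffix, mimicking the "incremental changes" row above.
-- ORDER BY feeds rows per group in ascending version order, as xpatch requires.
INSERT INTO documents
SELECT d, v, repeat('shared base content. ', 200) || 'revision note ' || v
FROM generate_series(1, 100) AS d, generate_series(1, 50) AS v
ORDER BY d, v;
```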
Important: These benchmarks are rough indicators, not precise measurements. Your mileage will vary based on hardware, data patterns, cache state, and query complexity.
Benchmark setup: 10,100 rows (101 documents x 100 versions), incremental text data. xpatch: 776 KB, heap: 17 MB.
| Operation | xpatch | heap | Slowdown |
|---|---|---|---|
| Full table COUNT | | | |
| - Cold cache | 44ms | 2.6ms | 17x slower |
| - Warm cache | 20ms | 1.4ms | 14x slower |
| Point lookup (single doc, 100 rows) | 0.7ms | 0.05ms | 14x slower |
| Point lookup (single row) | 0.13ms | 0.02ms | 6x slower |
| GROUP BY aggregate | 27ms | 3ms | 9x slower |
| Latest version per doc | 28ms | 5.5ms | 5x slower |
| Text search (LIKE) | 3.4ms | 1.5ms | 2x slower |
| INSERT (100 rows) | 33ms | 0.3ms | 100x slower |
| Parallel scan (2 workers) | 31ms | 3.5ms | 9x slower |
Key observations:
- Reads are 5-17x slower due to delta reconstruction overhead
- Writes are ~100x slower due to delta encoding (this is the main trade-off)
- Cache helps but doesn't eliminate the reconstruction cost
- Indexed lookups are faster than full scans but still have overhead
A note on write performance:
"100x slower" sounds alarming, but look at the absolute numbers: 33ms for 100 rows is ~0.33ms per row. For versioned document storage, audit logs, or real-time collaboration (saving state every few seconds), sub-millisecond writes are more than fast enough.
Also remember that writes parallelize across groups. If you have 50 users editing 50 different documents, all 50 writes happen concurrently. Sequential writes only apply within a single group's version chain—which is inherent to delta compression, not a limitation.
When to use xpatch:
- Storage cost is a primary concern (95% space savings)
- Data is written once and read occasionally
- Append-only versioned data (audit logs, document history, config snapshots)
- You can tolerate higher write latency
When NOT to use xpatch:
- High-frequency reads on the same data
- Write-heavy workloads where latency matters
- Data with no similarity between versions (random data)
The shared memory cache dramatically improves read performance for repeated access:
-- First query (cold): ~35ms for 5000 rows
SELECT COUNT(*) FROM documents;
-- Second query (warm): ~1ms for 5000 rows
SELECT COUNT(*) FROM documents;

To enable the shared memory cache, add to postgresql.conf:
shared_preload_libraries = 'pg_xpatch'
To test pg-xpatch against a real versioning system, we stored the complete file history of the tokio repository (35,827 file versions across 2,641 files, 296 MB raw content) and compared storage size and random access speed against git's packfile format.
Setup:
- Config: `group_by=path, compress_depth=1000, keyframe_every=100, enable_zstd=true` (see the schema sketch after this list)
- Each row stores: file path, content, commit message, commit hash, parent hash, timestamp
- No optimization was done: the schema is a single flat denormalized table with commit metadata repeated per row, no content deduplication, no normalization. Just naive INSERTs, letting xpatch handle the compression.
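
The exact benchmark schema isn't published here, but based on the setup above it would look roughly like this (table and column names are illustrative):

```sql
-- Illustrative schema: one flat denormalized row per file version
CREATE TABLE file_history (
    path         TEXT,
    content      TEXT,
    message      TEXT,
    commit_hash  TEXT,
    parent_hash  TEXT,
    committed_at TIMESTAMPTZ
) USING xpatch;

-- Apply the stated config; order_by is left to auto-detection, which would
-- pick committed_at (the last timestamp column before _xp_seq).
SELECT xpatch.configure('file_history',
    group_by       => 'path',
    compress_depth => 1000,
    keyframe_every => 100,
    enable_zstd    => true
);
```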
Storage comparison (total on-disk size):
| Storage Method | Total Size | What's Included |
|---|---|---|
| Raw file content | 296 MB | Just the file blobs |
| Git (normal bare clone) | 18 MB | Packfile + commit/tree objects + refs |
| Git (`gc --aggressive`) | 11 MB | Same, with aggressive packing |
| pg-xpatch | 12 MB | Table + TOAST + indexes + per-row metadata |
Git with aggressive packing is slightly smaller (11 MB vs 12 MB), but pg-xpatch provides full SQL queryability over the data. Both systems store more than just file content — git stores commit/tree objects, pg-xpatch stores commit hashes, timestamps, messages, and indexes.
Internal delta compression: xpatch's delta engine achieves 42x compression on the content columns alone (296 MB → 7 MB), but the total relation size (12 MB) includes non-delta columns and index overhead.
Random access performance (100 random point lookups, warmed cache):
| Method | Median | p95 | Max |
|---|---|---|---|
| pg-xpatch | 0.20 ms | < 1 ms | 22 ms |
| git (`git show`) | 1.32 ms | 1.47 ms | 1.81 ms |
pg-xpatch is ~6.5x faster on median, with 95% of queries under 1 ms. Git has more consistent latency (tight 1-2 ms range), while pg-xpatch has occasional tail latency spikes from cold chain reconstructions.
Insert and engine performance:
The delta engine performs up to 22,000 base comparisons per second (~45μs per comparison). Each insert compares against up to `compress_depth` previous versions to find the best delta base, so practical insert speed depends on how many comparisons are needed. With `group_by`, chains are short (avg 13.57 versions here), meaning fewer comparisons per insert and higher row throughput, up to 336 rows/s for this dataset.
Parallel writes: Since groups are independent, multiple groups can be written concurrently from separate connections with no performance degradation.
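
A minimal sketch of what that looks like in practice, assuming the quick-start documents table (database name and content are illustrative):

```bash
# Two independent groups written concurrently from separate connections;
# neither blocks the other, since their version chains don't overlap.
psql -d mydb -c "INSERT INTO documents VALUES (10, 1, 'doc ten, v1')" &
psql -d mydb -c "INSERT INTO documents VALUES (20, 1, 'doc twenty, v1')" &
wait
```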
# Run PostgreSQL with pg-xpatch pre-installed
docker run -d --name pg-xpatch \
-p 5432:5432 \
-e POSTGRES_PASSWORD=secret \
ghcr.io/imgajeed76/pg-xpatch:latest
# Connect and enable the extension
psql -h localhost -U postgres -c "CREATE EXTENSION pg_xpatch;"

Download from GitHub Releases:
# Download and extract (replace VERSION with actual version, e.g., v0.1.1)
tar -xzf pg_xpatch-VERSION-pg16-linux-amd64.tar.gz
cd pg_xpatch-VERSION-pg16-linux-amd64
# Install
sudo cp pg_xpatch.so $(pg_config --pkglibdir)/
sudo cp pg_xpatch.control *.sql $(pg_config --sharedir)/extension/
# (Optional) Enable shared memory cache - add to postgresql.conf:
# shared_preload_libraries = 'pg_xpatch'
# Restart PostgreSQL, then:
psql -c "CREATE EXTENSION pg_xpatch;"Requirements:
- PostgreSQL 16+ (with dev headers)
- Rust 1.70+
- cbindgen (`cargo install cbindgen`)
git clone https://github.com/ImGajeed76/pg-xpatch
cd pg-xpatch
make clean && make && make install
psql -c "CREATE EXTENSION pg_xpatch;"A pre-configured Docker environment is available in .devcontainer/:
# Build and run
docker build -t pg-xpatch-dev .devcontainer/
docker run -d --name pg-xpatch-dev -v $(pwd):/workspace pg-xpatch-dev
# Build and test inside container
docker exec pg-xpatch-dev bash -c "cd /workspace && make && make install"
docker exec -u postgres pg-xpatch-dev psql -c "CREATE EXTENSION pg_xpatch;"

For most tables, xpatch auto-detects the configuration:
- `group_by`: Not set (whole table is one version chain) or explicitly configured
- `order_by`: Last INT/SMALLINT/BIGINT/TIMESTAMP/TIMESTAMPTZ column before `_xp_seq`
- `delta_columns`: All TEXT, VARCHAR, BYTEA, JSON, JSONB columns
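
For example, a table like this needs no configure call at all (a sketch based on the rules above; table and column names are illustrative):

```sql
-- Auto-detection: 'version' is the last integer column before _xp_seq, so it
-- becomes order_by; 'content' (TEXT) becomes a delta column. group_by stays
-- unset, so the whole table forms one version chain.
CREATE TABLE notes (
    version INT,
    content TEXT
) USING xpatch;
```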
SELECT xpatch.configure('my_table',
group_by => 'doc_id', -- Column that groups versions (optional)
order_by => 'version', -- Column that orders versions
delta_columns => ARRAY['content', 'metadata']::text[], -- Columns to compress
keyframe_every => 100, -- Full snapshot every N versions (default: 100)
compress_depth => 1, -- How many previous versions to consider (default: 1)
enable_zstd => true -- Enable zstd compression (default: true)
);

-- Full table description
SELECT * FROM xpatch.describe('my_table');
-- Just the config
SELECT * FROM xpatch.get_config('my_table');

- Grouping: Rows are grouped by an optional `group_by` column (e.g., document ID)
- Ordering: Within each group, rows are ordered by an `order_by` column (e.g., version number)
- Keyframes: Every Nth row (default: 100) stores full content
- Deltas: Other rows store only the differences from the previous version
When you SELECT a delta-compressed row:
- Find the nearest keyframe
- Apply deltas sequentially to reconstruct the content
- Cache the result for future queries

With the default keyframe_every of 100, that means at most 99 delta applications on top of the nearest keyframe per reconstruction.
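
You can see this layout for a single group with the inspection function from the utilities section (its output columns aren't documented here, so treat this as a sketch):

```sql
-- Internal storage for group_value = 1: keyframe rows hold full content,
-- the rows between them hold deltas against their predecessor.
SELECT * FROM xpatch_inspect('documents', 1);
```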
xpatch automatically adds an `_xp_seq` column to track sequence numbers. This column:

- Is added automatically via event trigger on `CREATE TABLE ... USING xpatch`
- Is used internally for efficient delta chain traversal
- Is visible in `SELECT *` (PostgreSQL doesn't support truly hidden columns)
- Should be excluded from your queries if you don't want to see it: `SELECT doc_id, version, content FROM ...`
xpatch automatically creates indexes for efficient lookups:
- Basic `_xp_seq` index on table creation
- Composite `(group_by, _xp_seq)` index when `group_by` is configured
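
Beyond the automatic indexes, you can create your own B-tree indexes, including on delta-compressed columns (see Features above); a quick sketch on the quick-start table:

```sql
-- B-tree indexes work on all columns, including delta-compressed ones
CREATE INDEX documents_doc_version_idx ON documents (doc_id, version);
CREATE INDEX documents_content_idx ON documents (content);
```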
# Run all tests (20 test files)
# First create the test database and extension
createdb xpatch_test
psql -d xpatch_test -c "CREATE EXTENSION pg_xpatch;"
# Then run all test files
for f in test/sql/*.sql; do
psql -d xpatch_test -f "$f"
done
# Or use the test runner
./test/run_tests.sh run

The basic test suite covers:
- Basic INSERT/SELECT operations
- Delta compression and reconstruction
- Keyframe behavior
- Index support
- Parallel scans
- DELETE with cascade
- VACUUM
- Error handling
- Edge cases (empty tables, NULL values, unusual types, large data)
In addition to the basic test suite, pg-xpatch has been validated with 240 comprehensive tests covering production scenarios. These tests are not yet included in the repository but document the testing that was performed.
| Category | Tests | Description |
|---|---|---|
| Data Types | 41 | TEXT, BYTEA, JSON, JSONB, Unicode, binary data, empty strings, very large content (100MB+) |
| Transactions | 12 | Commit, rollback, savepoints, nested transactions, isolation levels (READ COMMITTED, REPEATABLE READ, SERIALIZABLE) |
| SQL Compatibility | 25 | JOINs, CTEs, window functions, aggregates, subqueries, UNION/INTERSECT, EXPLAIN, prepared statements |
| Concurrency | 10 | Parallel inserts (different/same groups), concurrent read/write, connection storms, long vs short transactions |
| Crash Recovery | 4 | Kill -9 survival, data integrity after crash, repeated crash cycles |
| Adversarial Inputs | 54 | SQL injection (8 patterns), integer overflow/underflow, invalid UTF-8, malformed data, resource exhaustion |
| Edge Cases | 57 | Views, triggers, foreign keys, cursors, indexes, RETURNING clause, COPY operations, locking |
| Backup/Restore | 18 | pg_dump/pg_restore cycle, data integrity verification, extension upgrade simulation |
- Concurrency: ~68 inserts/sec with parallel workers, race conditions handled correctly
- Crash Recovery: Zero data loss after 3 consecutive kill -9 crashes, all delta chains intact
- SQL Injection: All 8 injection patterns safely stored as literal text (no execution)
- Large Transactions: 10,000 inserts in a single transaction commits successfully
- Stability: 60-second mixed workload (insert/read/delete) with zero errors
- Backup/Restore: Full pg_dump/pg_restore cycle preserves data integrity, OIDs auto-fixed on first access
- No UPDATE: Use INSERT with a new version number instead
- Ordered inserts only: Versions must be inserted in ascending order within each group
- Append-only design: Optimized for immutable version history
- `_xp_seq` visible: PostgreSQL doesn't support hidden columns
- Cold read performance: First query on uncached data is slow
- Write overhead: Delta encoding adds INSERT latency
These issues exist in the current implementation and may be addressed in future versions:
- MVCC for reconstructed tuples: The `xpatch_tuple_satisfies_snapshot()` function uses proper MVCC checks for buffer-backed tuples, but trusts that virtual tuples (created during delta reconstruction) were built from visible source tuples. This is correct behavior but means visibility is checked at reconstruction time, not query time.
These issues are documented for transparency. For typical workloads (versioned document storage, audit logs), they don't cause problems.
Thoroughly tested on PostgreSQL 16 with 240+ test cases. Other versions may work but are not officially supported.
pg-xpatch is dual-licensed: AGPL-3.0-or-later for open source, with a commercial option for proprietary use.
I'm a huge fan of open source. I also don't want massive corporations extracting value from community work without giving anything back. AGPL solves this - if you modify pg-xpatch and distribute it (including running it as a service), those improvements stay open.
That said, I'm not trying to build a licensing business here. This is about fairness, not revenue.
Probably not if you're:
- Building open source software (AGPL is perfect)
- A small team or indie developer
- Experimenting or doing research
- A startup figuring things out
Maybe if you're:
- A large company with AGPL restrictions
- Integrating this into proprietary infrastructure at scale
- Need legal certainty for closed-source use
Email me at xpatch-commercial@alias.oseifert.ch and let's talk.
Small businesses? Probably free - I just want to know who's using it and how.
Larger companies? Yeah, I'll ask for something, but it'll be reasonable. You have the resources to support open source work, so let's make it fair.
Would rather contribute code than pay? Even better. Help make pg-xpatch better and we'll figure out the licensing stuff.
I'm not interested in complex contracts or pricing games. Just don't be a massive corp that takes community work and gives nothing back. That's literally the only thing I'm trying to prevent.
If you contribute code, you're granting us rights to use it under both AGPL and commercial terms. This sounds scarier than it is - it just means we can handle licensing requests without tracking down every contributor for permission.
The AGPL version stays open forever. This just gives us flexibility to be reasonable with companies that need commercial licenses.
See LICENSE-AGPL.txt for the full text, or LICENSE-COMMERCIAL.txt for commercial terms.
Contributions are welcome! Please open an issue or pull request on GitHub.
Before submitting:
- Run the test suite (`test/sql/*.sql`)
- Add tests for new functionality
- Keep commits focused and well-documented
- xpatch library: https://github.com/ImGajeed76/xpatch
- Issue tracker: https://github.com/ImGajeed76/pg-xpatch/issues
- Tests: `test/sql/*.sql`