pg-xpatch

A PostgreSQL Table Access Method (TAM) extension for storing versioned data with automatic delta compression. Built on the xpatch delta encoding library.

What It Does

Store versioned rows (document revisions, config history, audit logs, etc.) with massive space savings. Compression is automatic and transparent - you just INSERT and SELECT normally.

-- Create a table using the xpatch access method
CREATE TABLE documents (
    doc_id   INT,
    version  INT,
    content  TEXT
) USING xpatch;

-- Configure grouping and ordering (optional - auto-detection works for most cases)
SELECT xpatch.configure('documents',
    group_by => 'doc_id',
    order_by => 'version'
);

-- Insert versions normally
INSERT INTO documents VALUES (1, 1, 'Hello World');
INSERT INTO documents VALUES (1, 2, 'Hello PostgreSQL World!');
INSERT INTO documents VALUES (1, 3, 'Hello PostgreSQL World! Updated.');

-- Query normally - reconstruction is automatic
SELECT * FROM documents WHERE doc_id = 1 ORDER BY version;

-- Check compression stats
SELECT * FROM xpatch_stats('documents');

Features

What Works

Automatic delta compression - Only stores differences between versions
Transparent reconstruction - SELECT works normally, deltas are decoded on-the-fly
Auto-detection - Automatically detects group_by, order_by, and delta columns
Parallel scans - Full support for parallel query execution
Index support - B-tree indexes work on all columns (including delta-compressed ones)
MVCC - Basic multi-version concurrency control
DELETE - Cascade delete removes a version and all subsequent versions in the chain
VACUUM - Dead tuple cleanup works
WAL logging - Crash recovery supported
Shared memory cache - LRU cache for reconstructed content across all backends

What Doesn't Work (By Design)

UPDATE - Not supported. Insert a new version instead. This is intentional for append-only versioned data.
Out-of-order inserts - Versions must be inserted in order within each group
Hidden columns - The internal _xp_seq column is visible in SELECT * (PostgreSQL limitation)

Utility Functions

-- Describe a table: shows config, schema, and storage stats
SELECT * FROM xpatch.describe('documents');

-- Warm the cache for faster subsequent queries
SELECT * FROM xpatch.warm_cache('documents');
SELECT * FROM xpatch.warm_cache('documents', max_groups => 100);

-- Get compression statistics
SELECT * FROM xpatch_stats('documents');

-- Inspect internal storage for a specific group (debugging/analysis)
SELECT * FROM xpatch_inspect('documents', 1);  -- group_value = 1

-- Get cache statistics (requires shared_preload_libraries)
SELECT * FROM xpatch_cache_stats();

-- Get xpatch library version
SELECT xpatch_version();

-- Dump all table configs as SQL (for backup/migration)
SELECT * FROM xpatch.dump_configs();

-- Fix config OIDs after pg_restore
SELECT xpatch.fix_restored_configs();

Performance

Storage Compression

Space savings depend heavily on your data patterns. Here are real benchmarks with 5000 rows (100 documents x 50 versions each):

Data Pattern	xpatch Size	heap Size	Space Saved
Incremental changes (base content + small additions)	416 KB	9.2 MB	95% smaller
Identical content across versions	488 KB	648 KB	25% smaller
Completely random data	816 KB	728 KB	12% larger (overhead)

Key insight: xpatch shines when versions share content. For typical document versioning (where each version is similar to the previous), expect 10-20x space savings. For random/unrelated data, xpatch adds overhead and provides no benefit.

Query Performance

Important: These benchmarks are rough indicators, not precise measurements. Your mileage will vary based on hardware, data patterns, cache state, and query complexity.

Benchmark setup: 10,100 rows (101 documents x 100 versions), incremental text data. xpatch: 776 KB, heap: 17 MB.

Operation	xpatch	heap	Slowdown
Full table COUNT
- Cold cache	44ms	2.6ms	17x slower
- Warm cache	20ms	1.4ms	14x slower
Point lookup (single doc, 100 rows)	0.7ms	0.05ms	14x slower
Point lookup (single row)	0.13ms	0.02ms	6x slower
GROUP BY aggregate	27ms	3ms	9x slower
Latest version per doc	28ms	5.5ms	5x slower
Text search (LIKE)	3.4ms	1.5ms	2x slower
INSERT (100 rows)	33ms	0.3ms	100x slower
Parallel scan (2 workers)	31ms	3.5ms	9x slower

Key observations:

Reads are 5-17x slower due to delta reconstruction overhead
Writes are ~100x slower due to delta encoding (this is the main trade-off)
Cache helps but doesn't eliminate the reconstruction cost
Indexed lookups are faster than full scans but still have overhead

A note on write performance:

"100x slower" sounds alarming, but look at the absolute numbers: 33ms for 100 rows is ~0.33ms per row. For versioned document storage, audit logs, or real-time collaboration (saving state every few seconds), sub-millisecond writes are more than fast enough.

Also remember that writes parallelize across groups. If you have 50 users editing 50 different documents, all 50 writes happen concurrently. Sequential writes only apply within a single group's version chain—which is inherent to delta compression, not a limitation.

When to use xpatch:

Storage cost is a primary concern (95% space savings)
Data is written once and read occasionally
Append-only versioned data (audit logs, document history, config snapshots)
You can tolerate higher write latency

When NOT to use xpatch:

High-frequency reads on the same data
Write-heavy workloads where latency matters
Data with no similarity between versions (random data)

Cache Behavior

The shared memory cache dramatically improves read performance for repeated access:

-- First query (cold): ~35ms for 5000 rows
SELECT COUNT(*) FROM documents;

-- Second query (warm): ~1ms for 5000 rows  
SELECT COUNT(*) FROM documents;

To enable the shared memory cache, add to postgresql.conf:

shared_preload_libraries = 'pg_xpatch'

Real-World Benchmark: pg-xpatch vs Git

To test pg-xpatch against a real versioning system, we stored the complete file history of the tokio repository (35,827 file versions across 2,641 files, 296 MB raw content) and compared storage size and random access speed against git's packfile format.

Setup:

group_by=path, compress_depth=1000, keyframe_every=100, enable_zstd=true
Each row stores: file path, content, commit message, commit hash, parent hash, timestamp
No optimization was done — the schema is a single flat denormalized table with commit metadata repeated per row, no content deduplication, no normalization. Just naive INSERTs and let xpatch handle compression.

Storage comparison (total on-disk size):

Storage Method	Total Size	What's Included
Raw file content	296 MB	Just the file blobs
Git (normal bare clone)	18 MB	Packfile + commit/tree objects + refs
Git (`gc --aggressive`)	11 MB	Same, with aggressive packing
pg-xpatch	12 MB	Table + TOAST + indexes + per-row metadata

Git with aggressive packing is slightly smaller (11 MB vs 12 MB), but pg-xpatch provides full SQL queryability over the data. Both systems store more than just file content — git stores commit/tree objects, pg-xpatch stores commit hashes, timestamps, messages, and indexes.

Internal delta compression: xpatch's delta engine achieves 42x compression on the content columns alone (296 MB → 7 MB), but the total relation size (12 MB) includes non-delta columns and index overhead.

Random access performance (100 random point lookups, warmed cache):

Method	Median	p95	Max
pg-xpatch	0.20 ms	< 1 ms	22 ms
git (`git show`)	1.32 ms	1.47 ms	1.81 ms

pg-xpatch is ~6.5x faster on median, with 95% of queries under 1 ms. Git has more consistent latency (tight 1-2 ms range), while pg-xpatch has occasional tail latency spikes from cold chain reconstructions.

Insert and engine performance:

The delta engine performs up to 22,000 base comparisons per second (~45μs per comparison). Each insert compares against up to compress_depth previous versions to find the best delta base, so practical insert speed depends on how many comparisons are needed. With group_by, chains are short (avg 13.57 versions here), meaning fewer comparisons per insert and higher row throughput — up to 336 rows/s for this dataset.

Parallel writes: Since groups are independent, multiple groups can be written concurrently from separate connections with no performance degradation.

Installation

Docker (Easiest)

# Run PostgreSQL with pg-xpatch pre-installed
docker run -d --name pg-xpatch \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=secret \
  ghcr.io/imgajeed76/pg-xpatch:latest

# Connect and enable the extension
psql -h localhost -U postgres -c "CREATE EXTENSION pg_xpatch;"

Pre-built Binaries

Download from GitHub Releases:

# Download and extract (replace VERSION with actual version, e.g., v0.1.1)
tar -xzf pg_xpatch-VERSION-pg16-linux-amd64.tar.gz
cd pg_xpatch-VERSION-pg16-linux-amd64

# Install
sudo cp pg_xpatch.so $(pg_config --pkglibdir)/
sudo cp pg_xpatch.control *.sql $(pg_config --sharedir)/extension/

# (Optional) Enable shared memory cache - add to postgresql.conf:
# shared_preload_libraries = 'pg_xpatch'

# Restart PostgreSQL, then:
psql -c "CREATE EXTENSION pg_xpatch;"

Building from Source

Requirements:

PostgreSQL 16+ (with dev headers)
Rust 1.70+
cbindgen (cargo install cbindgen)

git clone https://github.com/ImGajeed76/pg-xpatch
cd pg-xpatch
make clean && make && make install
psql -c "CREATE EXTENSION pg_xpatch;"

Docker Development Environment

A pre-configured Docker environment is available in .devcontainer/:

# Build and run
docker build -t pg-xpatch-dev .devcontainer/
docker run -d --name pg-xpatch-dev -v $(pwd):/workspace pg-xpatch-dev

# Build and test inside container
docker exec pg-xpatch-dev bash -c "cd /workspace && make && make install"
docker exec -u postgres pg-xpatch-dev psql -c "CREATE EXTENSION pg_xpatch;"

Configuration

Auto-Detection

For most tables, xpatch auto-detects the configuration:

group_by: Not set (whole table is one version chain) or explicitly configured
order_by: Last INT/SMALLINT/BIGINT/TIMESTAMP/TIMESTAMPTZ column before _xp_seq
delta_columns: All TEXT, VARCHAR, BYTEA, JSON, JSONB columns

Explicit Configuration

SELECT xpatch.configure('my_table',
    group_by => 'doc_id',           -- Column that groups versions (optional)
    order_by => 'version',          -- Column that orders versions
    delta_columns => ARRAY['content', 'metadata']::text[],  -- Columns to compress
    keyframe_every => 100,          -- Full snapshot every N versions (default: 100)
    compress_depth => 1,            -- How many previous versions to consider (default: 1)
    enable_zstd => true             -- Enable zstd compression (default: true)
);

Inspecting Configuration

-- Full table description
SELECT * FROM xpatch.describe('my_table');

-- Just the config
SELECT * FROM xpatch.get_config('my_table');

How It Works

Storage Model

Grouping: Rows are grouped by an optional group_by column (e.g., document ID)
Ordering: Within each group, rows are ordered by an order_by column (e.g., version number)
Keyframes: Every Nth row (default: 100) stores full content
Deltas: Other rows store only the differences from the previous version

Reconstruction

When you SELECT a delta-compressed row:

Find the nearest keyframe
Apply deltas sequentially to reconstruct the content
Cache the result for future queries

Internal Column

xpatch automatically adds an _xp_seq column to track sequence numbers. This column:

Is added automatically via event trigger on CREATE TABLE ... USING xpatch
Is used internally for efficient delta chain traversal
Is visible in SELECT * (PostgreSQL doesn't support truly hidden columns)
Should be excluded in your queries if you don't want to see it: SELECT doc_id, version, content FROM ...

Automatic Indexes

xpatch automatically creates indexes for efficient lookups:

Basic _xp_seq index on table creation
Composite (group_by, _xp_seq) index when group_by is configured

Testing

Basic Test Suite

# Run all tests (20 test files)
# First create the test database and extension
createdb xpatch_test
psql -d xpatch_test -c "CREATE EXTENSION pg_xpatch;"

# Then run all test files
for f in test/sql/*.sql; do
    psql -d xpatch_test -f "$f"
done

# Or use the test runner
./test/run_tests.sh run

The basic test suite covers:

Basic INSERT/SELECT operations
Delta compression and reconstruction
Keyframe behavior
Index support
Parallel scans
DELETE with cascade
VACUUM
Error handling
Edge cases (empty tables, NULL values, unusual types, large data)

Comprehensive Stress Testing

In addition to the basic test suite, pg-xpatch has been validated with 240 comprehensive tests covering production scenarios. These tests are not yet included in the repository but document the testing that was performed.

Test Categories

Category	Tests	Description
Data Types	41	TEXT, BYTEA, JSON, JSONB, Unicode, binary data, empty strings, very large content (100MB+)
Transactions	12	Commit, rollback, savepoints, nested transactions, isolation levels (READ COMMITTED, REPEATABLE READ, SERIALIZABLE)
SQL Compatibility	25	JOINs, CTEs, window functions, aggregates, subqueries, UNION/INTERSECT, EXPLAIN, prepared statements
Concurrency	10	Parallel inserts (different/same groups), concurrent read/write, connection storms, long vs short transactions
Crash Recovery	4	Kill -9 survival, data integrity after crash, repeated crash cycles
Adversarial Inputs	54	SQL injection (8 patterns), integer overflow/underflow, invalid UTF-8, malformed data, resource exhaustion
Edge Cases	57	Views, triggers, foreign keys, cursors, indexes, RETURNING clause, COPY operations, locking
Backup/Restore	18	pg_dump/pg_restore cycle, data integrity verification, extension upgrade simulation

Key Results

Concurrency: ~68 inserts/sec with parallel workers, race conditions handled correctly
Crash Recovery: Zero data loss after 3 consecutive kill -9 crashes, all delta chains intact
SQL Injection: All 8 injection patterns safely stored as literal text (no execution)
Large Transactions: 10,000 inserts in a single transaction commits successfully
Stability: 60-second mixed workload (insert/read/delete) with zero errors
Backup/Restore: Full pg_dump/pg_restore cycle preserves data integrity, OIDs auto-fixed on first access

Limitations and Known Issues

Intentional Limitations

No UPDATE: Use INSERT with a new version number instead
Ordered inserts only: Versions must be inserted in ascending order within each group
Append-only design: Optimized for immutable version history

Current Limitations (May Be Addressed Later)

_xp_seq visible: PostgreSQL doesn't support hidden columns
Cold read performance: First query on uncached data is slow
Write overhead: Delta encoding adds INSERT latency

Technical Debt (Known Implementation Issues)

These issues exist in the current implementation and may be addressed in future versions:

MVCC for reconstructed tuples: The xpatch_tuple_satisfies_snapshot() function uses proper MVCC checks for buffer-backed tuples, but trusts that virtual tuples (created during delta reconstruction) were built from visible source tuples. This is correct behavior but means visibility is checked at reconstruction time, not query time.

These issues are documented for transparency. For typical workloads (versioned document storage, audit logs), they don't cause problems.

PostgreSQL Version

Thoroughly tested on PostgreSQL 16 with 240+ test cases. Other versions may work but are not officially supported.

License

pg-xpatch is dual-licensed: AGPL-3.0-or-later for open source, with a commercial option for proprietary use.

The Philosophy

I'm a huge fan of open source. I also don't want massive corporations extracting value from community work without giving anything back. AGPL solves this - if you modify pg-xpatch and distribute it (including running it as a service), those improvements stay open.

That said, I'm not trying to build a licensing business here. This is about fairness, not revenue.

Do You Need a Commercial License?

Probably not if you're:

Building open source software (AGPL is perfect)
A small team or indie developer
Experimenting or doing research
A startup figuring things out

Maybe if you're:

A large company with AGPL restrictions
Integrating this into proprietary infrastructure at scale
Need legal certainty for closed-source use

How Commercial Licensing Works

Email me at xpatch-commercial@alias.oseifert.ch and let's talk.

Small businesses? Probably free - I just want to know who's using it and how.

Larger companies? Yeah, I'll ask for something, but it'll be reasonable. You have the resources to support open source work, so let's make it fair.

Would rather contribute code than pay? Even better. Help make pg-xpatch better and we'll figure out the licensing stuff.

I'm not interested in complex contracts or pricing games. Just don't be a massive corp that takes community work and gives nothing back. That's literally the only thing I'm trying to prevent.

Contributor License Agreement

If you contribute code, you're granting us rights to use it under both AGPL and commercial terms. This sounds scarier than it is - it just means we can handle licensing requests without tracking down every contributor for permission.

The AGPL version stays open forever. This just gives us flexibility to be reasonable with companies that need commercial licenses.

See LICENSE-AGPL.txt for the full text, or LICENSE-COMMERCIAL.txt for commercial terms.

Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

Before submitting:

Run the test suite (test/sql/*.sql)
Add tests for new functionality
Keep commits focused and well-documented

Links

xpatch library: https://github.com/ImGajeed76/xpatch
Issue tracker: https://github.com/ImGajeed76/pg-xpatch/issues
Tests: test/sql/*.sql

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
benchmark		benchmark
examples/notes-app		examples/notes-app
lib/blake3		lib/blake3
sql		sql
src		src
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE-AGPL.txt		LICENSE-AGPL.txt
LICENSE-COMMERCIAL.txt		LICENSE-COMMERCIAL.txt
Makefile		Makefile
README.md		README.md
pg_xpatch.control		pg_xpatch.control

License

Licenses found

ImGajeed76/pg-xpatch

Folders and files

Latest commit

History

Repository files navigation

pg-xpatch

What It Does

Features

What Works

What Doesn't Work (By Design)

Utility Functions

Performance

Storage Compression

Query Performance

Cache Behavior

Real-World Benchmark: pg-xpatch vs Git

Installation

Docker (Easiest)

Pre-built Binaries

Building from Source

Docker Development Environment

Configuration

Auto-Detection

Explicit Configuration

Inspecting Configuration

How It Works

Storage Model

Reconstruction

Internal Column

Automatic Indexes

Testing

Basic Test Suite

Comprehensive Stress Testing

Test Categories

Key Results

Limitations and Known Issues

Intentional Limitations

Current Limitations (May Be Addressed Later)

Technical Debt (Known Implementation Issues)

PostgreSQL Version

License

The Philosophy

Do You Need a Commercial License?

How Commercial Licensing Works

Contributor License Agreement

Contributing

Links

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Languages

Packages