v1.17.1 -- COPY-based match log (#424)
goldenmatch 1.17.1 -- 2026-05-22
Patch release for the throughput fix landed in PR #426.
Fix
User reported post-#423 ceiling at ~125 rows/sec writing to
gm_match_log. Root cause: psycopg3 executemany is NOT pipelined
without an explicit with conn.pipeline(): context, so each batch
incurred N round-trips at ~7ms RTT to managed Postgres.
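
As a rough sanity check on the numbers above, a round-trip-bound writer can be modeled as one network round-trip per row. This is a back-of-the-envelope sketch, not a measurement: the function name is hypothetical, and the model ignores server-side execution and client overhead, which is one plausible reason the observed ~125 rows/sec sits somewhat below the modeled ceiling.

```python
def row_ceiling(rtt_seconds, round_trips_per_row=1):
    """Upper bound on rows/sec when every row waits on the network."""
    return 1.0 / (rtt_seconds * round_trips_per_row)


# ~7ms RTT is the figure quoted in the notes above.
ceiling = row_ceiling(0.007)
print(f"{ceiling:.0f} rows/sec")  # prints "143 rows/sec"
```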
Three changes:
- cursor.copy() on psycopg3 in log_matches_batch. ~10-100x
  faster on bulk loads. Falls back to executemany for non-psycopg3
  cursors (SQLite / DuckDB test paths).
- Buffer pairs across blocks before flushing. Default
  GOLDENMATCH_MATCH_LOG_FLUSH_PAIRS=10000. Set to 1 for per-block
  flushing (preserves historical incremental-progress behavior).
- GOLDENMATCH_SKIP_MATCH_LOG=1 opt-out for nightly-cron
  pipelines that only consume gm_clusters/gm_golden_records.
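
The three changes can be sketched together as follows. This is a minimal illustration under stated assumptions, not the goldenmatch implementation: the column names (id_a, id_b, score), the write_pairs helper, and the MatchLogBuffer class are hypothetical, and detecting psycopg3 with hasattr(cursor, "copy") is a simplification. Only the table name, the two environment variables, and the psycopg3 cursor.copy() / write_row() calls come from the notes or the library.

```python
import os

# Variable names are from the release notes; defaults mirror the text above.
FLUSH_PAIRS = int(os.environ.get("GOLDENMATCH_MATCH_LOG_FLUSH_PAIRS", "10000"))
SKIP_LOG = os.environ.get("GOLDENMATCH_SKIP_MATCH_LOG") == "1"


def write_pairs(cursor, pairs):
    """Write (id_a, id_b, score) tuples to gm_match_log (hypothetical columns)."""
    if hasattr(cursor, "copy"):
        # psycopg3 cursor: stream every row through a single COPY statement.
        sql = "COPY gm_match_log (id_a, id_b, score) FROM STDIN"
        with cursor.copy(sql) as copy:
            for row in pairs:
                copy.write_row(row)
    else:
        # Fallback for cursors without COPY support (e.g. sqlite3 in tests).
        cursor.executemany(
            "INSERT INTO gm_match_log (id_a, id_b, score) VALUES (?, ?, ?)",
            pairs,
        )


class MatchLogBuffer:
    """Accumulate pairs across blocks and flush once a threshold is reached."""

    def __init__(self, cursor, flush_pairs=FLUSH_PAIRS, skip=SKIP_LOG):
        self.cursor = cursor
        self.flush_pairs = flush_pairs
        self.skip = skip
        self.pending = []

    def add_block(self, pairs):
        if self.skip:
            return  # opt-out: the match log is never written
        self.pending.extend(pairs)
        if len(self.pending) >= self.flush_pairs:
            self.flush()

    def flush(self):
        if self.pending:
            write_pairs(self.cursor, self.pending)
            self.pending = []
```

With flush_pairs=1 every non-empty block is flushed as soon as it is added, which corresponds to the per-block behavior described above. A caller would still need a final flush() at the end of the run to write any remaining buffered pairs.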
Expected impact
User's ~3M-pair workload: previously ~5 hours of writes -> projected
< 5 min.
Full CHANGELOG: https://github.com/benseverndev-oss/goldenmatch/blob/v1.17.1/packages/python/goldenmatch/CHANGELOG.md