
v1.17.1 -- COPY-based match log (#424)


benzsevern released this on 22 May at 01:53 (commit c79675c, 535 commits to main since this release)

goldenmatch 1.17.1 -- 2026-05-22

Patch release for the throughput fix that landed in PR #426.

Fix

After #423, a user reported a write ceiling of ~125 rows/sec to
gm_match_log. Root cause: psycopg3's executemany is NOT pipelined
unless it is wrapped in an explicit with conn.pipeline(): block, so
each batch of N rows incurred N round-trips at ~7 ms RTT to managed
Postgres.
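A back-of-the-envelope check of that root cause. It assumes each row costs one full, unoverlapped round-trip and ignores server-side execution time, so it gives an upper bound; max_rows_per_sec is a hypothetical helper, not part of goldenmatch.

```python
def max_rows_per_sec(rtt_ms: float, rows_per_round_trip: int = 1) -> float:
    """Upper bound on insert throughput when each round-trip must finish
    before the next one starts (no pipelining, zero server time)."""
    return rows_per_round_trip * 1000.0 / rtt_ms


# One row per round-trip at ~7 ms RTT caps out near 143 rows/sec,
# consistent with the ~125 rows/sec ceiling once server time is added.
print(round(max_rows_per_sec(7.0)))  # 143
```

Because the bound scales with rows per round-trip, the fix is to stop paying one RTT per row rather than to tune the per-row cost.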

Three changes:

  • log_matches_batch now uses cursor.copy() on psycopg3 cursors,
    which is ~10-100x faster on bulk loads. Non-psycopg3 cursors
    (the SQLite / DuckDB test paths) fall back to executemany.
  • Pairs are buffered across blocks before flushing, controlled by
    GOLDENMATCH_MATCH_LOG_FLUSH_PAIRS (default 10000). Set it to 1 to
    flush after every block, which preserves the historical
    incremental-progress behavior.
  • GOLDENMATCH_SKIP_MATCH_LOG=1 opts out of match-log writes, for
    nightly-cron pipelines that only consume gm_clusters /
    gm_golden_records.
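A minimal sketch of how the three changes can fit together, assuming a DB-API-style cursor. MatchLogBuffer, _write, and the (id_a, id_b, score) column list are hypothetical stand-ins, not goldenmatch's actual log_matches_batch; only the table name and the two environment variables come from these notes. Detecting psycopg3 by the presence of a .copy() method is also an assumption.

```python
import os
from typing import Any, Iterable, List, Tuple

# Hypothetical row shape; the real gm_match_log schema is not shown in these notes.
Pair = Tuple[int, int, float]
COLUMNS = ("id_a", "id_b", "score")


def _write(cur: Any, rows: List[Pair]) -> None:
    cols = ", ".join(COLUMNS)
    if hasattr(cur, "copy"):
        # psycopg3-style cursor (assumed detection): stream every row
        # through a single COPY ... FROM STDIN instead of per-row INSERTs.
        with cur.copy(f"COPY gm_match_log ({cols}) FROM STDIN") as copy:
            for row in rows:
                copy.write_row(row)
    else:
        # Fallback for cursors without COPY (e.g. sqlite3), using "?" placeholders.
        marks = ", ".join("?" for _ in COLUMNS)
        cur.executemany(f"INSERT INTO gm_match_log ({cols}) VALUES ({marks})", rows)


class MatchLogBuffer:
    """Accumulates pairs across blocks and flushes once a threshold is reached."""

    def __init__(self, cur: Any) -> None:
        self.cur = cur
        # Environment is read once, at construction time.
        self.skip = os.environ.get("GOLDENMATCH_SKIP_MATCH_LOG") == "1"
        self.flush_pairs = int(os.environ.get("GOLDENMATCH_MATCH_LOG_FLUSH_PAIRS", "10000"))
        self.pending: List[Pair] = []

    def add_block(self, pairs: Iterable[Pair]) -> None:
        if self.skip:
            return  # opt-out: nothing is buffered or written
        self.pending.extend(pairs)
        # With flush_pairs=1, any non-empty block triggers a flush (per-block behavior).
        if len(self.pending) >= self.flush_pairs:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            _write(self.cur, self.pending)
            self.pending = []
```

Call flush() once after the last block so a partially filled buffer is still written. In this sketch, a run that dies mid-way can leave up to flush_pairs - 1 pairs unwritten, which is the trade-off that a threshold of 1 avoids.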

Expected impact

For the reporting user's ~3M-pair workload, match-log writes
previously took ~5 hours; the projected time is now under 5 minutes.
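The projection implies a minimum sustained write rate. A quick check using the figures in these notes (required_rows_per_sec is a hypothetical helper):

```python
def required_rows_per_sec(rows: int, budget_s: float) -> float:
    """Sustained rate needed to write `rows` within `budget_s` seconds."""
    return rows / budget_s


# ~3M pairs in under 5 minutes needs at least 10,000 rows/sec,
# i.e. about 80x the ~125 rows/sec ceiling reported earlier.
need = required_rows_per_sec(3_000_000, 5 * 60)
print(need, need / 125)  # 10000.0 80.0
```

An 80x speedup sits inside the ~10-100x range quoted for COPY on bulk loads, so the projection is plausible but close to the optimistic end.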

Full CHANGELOG: https://github.com/benseverndev-oss/goldenmatch/blob/v1.17.1/packages/python/goldenmatch/CHANGELOG.md