Replace clickhouse-cpp with clickhouse-c #254

Merged: serprex merged 1 commit into main from binary-c on May 28, 2026

@serprex (Member) commented May 21, 2026

Resolves #241 and #254.

@serprex serprex requested a review from theory May 21, 2026 22:24
@serprex serprex force-pushed the binary-c branch 11 times, most recently from 17d92e9 to 500c05b Compare May 22, 2026 14:35
Comment thread README.md
Comment thread README.md Outdated
Comment thread src/binary/insert.c
Comment thread src/binary.c Outdated
Comment thread src/binary.c Outdated
@serprex serprex force-pushed the binary-c branch 4 times, most recently from b4bc592 to 9ee9486 Compare May 27, 2026 05:53
@theory theory changed the title replace clickhouse-cpp with clickhouse-c Replace clickhouse-cpp with clickhouse-c May 27, 2026
Comment thread src/binary/binary.c Outdated
Comment thread src/binary/binary.c Outdated
Comment thread src/binary/connection.c Outdated
Comment thread src/binary/connection.c
@serprex serprex force-pushed the binary-c branch 2 times, most recently from b9e53d5 to 8c62cc3 Compare May 27, 2026 13:05

@theory (Collaborator) left a comment

SO close! Happy to help resolve some of these. Would also like to squash it into a single commit with a nice description.

Comment thread src/binary/encode.c Outdated
Comment thread src/binary/decode.c
Comment thread src/binary/decode.c Outdated
Comment thread src/binary/decode.c Outdated
Comment thread src/binary/decode.c Outdated
Comment thread src/include/binary.h Outdated
Comment thread src/include/binary.h
Comment thread src/binary/convert.c Outdated
Comment thread src/binary/convert.c Outdated
Comment thread src/pglink.c Outdated
@serprex serprex force-pushed the binary-c branch 7 times, most recently from bad039f to 8548ff7 Compare May 28, 2026 00:23
Comment thread src/binary/encode.c Outdated

@theory (Collaborator) left a comment

This is great.

Comment thread test/expected/import_schema.out
Comment thread test/expected/binary_queries_8.out Outdated
@serprex serprex force-pushed the binary-c branch 2 times, most recently from 9214615 to 300dfc5 Compare May 28, 2026 17:19
The clickhouse-cpp library's use of [RAII] and C++ exceptions conflicted
with PostgreSQL's memory management and `PG_TRY`/`setjmp`/`longjmp`
patterns. Analysis of these conflicts led to the conclusion that
continuing to use the C++ library was unsafe.
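
The hazard can be illustrated in plain C. This is a minimal sketch, not
code from this change: `error_jump`, `open_handles`, and the three
functions are hypothetical stand-ins, with `setjmp`/`longjmp` playing
the role of `PG_TRY` and error reporting. Cleanup placed after the
failing call never runs, which is the same way a C++ destructor gets
skipped when a `longjmp` crosses its frame (undefined behavior in C++).

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf error_jump;      /* stand-in for the PG_TRY landing site */
static int open_handles = 0;    /* resources acquired but not yet released */

/* Stand-in for error reporting: jump straight to the landing site. */
static void raise_error(void)
{
    longjmp(error_jump, 1);
}

/* Acquire a resource, then fail before the cleanup line is reached. */
static void work_then_fail(void)
{
    open_handles++;             /* acquire */
    assert(open_handles > 0);
    raise_error();              /* control never comes back here */
    open_handles--;             /* cleanup: skipped by the jump */
}

/* Returns 1 if the error path was taken, 0 otherwise. */
static int run_guarded(void)
{
    if (setjmp(error_jump) == 0)
    {
        work_then_fail();
        return 0;
    }
    return 1;
}
```

After `run_guarded()` returns through the error path, `open_handles` is
still 1: nothing on the skipped frames had a chance to release it.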

Instead, replace the vendored clickhouse-cpp library with the new
header-only [clickhouse-c] client, also vendored. This package provides
an interface for memory management functions, the `chc_alloc` struct,
mapped here to the Postgres `palloc` family of functions, to ensure
consistent memory handling. It also supports fetching results by block,
rather than all at once, for reduced memory consumption.
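
As a rough illustration of the allocator-interface idea, here is a
minimal sketch. The `demo_alloc` struct and every name below are
hypothetical; this is not the actual `chc_alloc` definition, whose
fields are not shown in this description. The host callbacks wrap
`malloc` so the sketch runs on its own; in the extension they would
forward to the `palloc` family instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table; NOT the real chc_alloc layout. */
typedef struct demo_alloc
{
    void *(*alloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void  (*free_fn)(void *ctx, void *ptr);
    void  *ctx;                 /* opaque state passed to every callback */
} demo_alloc;

/* Host-side bookkeeping so we can see who owns the memory. */
typedef struct host_stats
{
    size_t live;                /* allocations not yet freed */
} host_stats;

static void *host_alloc(void *ctx, size_t size)
{
    host_stats *st = ctx;
    void *p = malloc(size);

    if (p != NULL)
        st->live++;
    return p;
}

static void *host_realloc(void *ctx, void *ptr, size_t size)
{
    if (ptr == NULL)
        return host_alloc(ctx, size);
    return realloc(ptr, size);
}

static void host_free(void *ctx, void *ptr)
{
    host_stats *st = ctx;

    if (ptr != NULL)
    {
        free(ptr);
        st->live--;
    }
}

/* Library-side code only ever allocates through the table. */
static char *lib_make_buffer(const demo_alloc *a, size_t size)
{
    assert(a != NULL);
    return a->alloc_fn(a->ctx, size);
}

static char *lib_grow_buffer(const demo_alloc *a, char *buf, size_t size)
{
    return a->realloc_fn(a->ctx, buf, size);
}
```

Because the library never calls `malloc` directly, the host decides
where every byte lives and can release it on its own terms.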

This change greatly reduces build time, since there is no longer a
vendored C++ library to compile, and shrinks the resulting shared
library by ~75%.

The new library supports all the same features as clickhouse-cpp, but at
a lower level, such that we must handle the conversion of values from
the ClickHouse binary wire format. Most of the conversions are fairly
straightforward, with a few exceptions:

*   Encoding and decoding Decimal values
*   Encoding parameters requires special quoting and escaping
*   Connecting via TCP or TLS requires lower-level socket configuration
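
To give a flavor of the Decimal case, here is a minimal sketch, not the
code in this change. ClickHouse stores a `Decimal32(S)` as a
little-endian signed 32-bit integer scaled by 10^S. The helper names
`read_le_i32` and `decimal32_to_text` are hypothetical, and the sketch
renders text rather than building a Postgres `numeric`. Wider Decimal
types need wider arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a little-endian signed 32-bit integer from wire bytes. */
static int32_t read_le_i32(const unsigned char *p)
{
    uint32_t u = (uint32_t) p[0]
               | ((uint32_t) p[1] << 8)
               | ((uint32_t) p[2] << 16)
               | ((uint32_t) p[3] << 24);
    int32_t v;

    memcpy(&v, &u, sizeof v);   /* reinterpret the bits as signed */
    return v;
}

/* Render an unscaled Decimal32 value with `scale` fractional digits. */
static void decimal32_to_text(int32_t unscaled, unsigned scale,
                              char *out, size_t outlen)
{
    int64_t v = unscaled;       /* widen so negating INT32_MIN is safe */
    int64_t pow10 = 1;
    const char *sign = "";
    unsigned i;

    assert(scale <= 9);         /* Decimal32 holds at most 9 digits */
    if (v < 0)
    {
        sign = "-";
        v = -v;
    }
    for (i = 0; i < scale; i++)
        pow10 *= 10;

    if (scale == 0)
        snprintf(out, outlen, "%s%lld", sign, (long long) v);
    else
        snprintf(out, outlen, "%s%lld.%0*lld", sign,
                 (long long) (v / pow10), (int) scale,
                 (long long) (v % pow10));
}
```

For example, the bytes `39 30 00 00` decode to the integer 12345, which
at scale 2 renders as `123.45`.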

However, it also means we no longer rely on clickhouse-cpp's vendored
dependencies (absl, cityhash, lz4, and zstd). Instead, we simply import
the appropriate headers to support compression and encryption and
require the necessary libraries (`liblz4` and `libzstd`). In other
words, no more vendoring aside from the headers.

Take advantage of the consistent use of PostgreSQL's memory management by
creating memory contexts specific to each operation for easy cleanup.

Replace the `src/binary.cpp` and `src/convert.c` files with the
`src/binary` directory, which splits the various responsibilities
across files for clearer organization:

*   `binary.c` - Core glue for the driver
*   `binary_internal.h` - Private state for the driver
*   `connection.c` - TCP connections, with and without TLS
*   `convert.c` - Binary value conversion, mostly unchanged from
    `src/convert.c`
*   `decode.c` - Convert ClickHouse wire column values to Postgres
    Datums; used for `SELECT`
*   `encode.c` - Convert Postgres Datums to ClickHouse binary values and
    append to columns; used for `INSERT`
*   `insert.c` - Handles the `INSERT` process: prepare, insert (flush),
    and finalization, along with appending values to ClickHouse columns
*   `select.c` - Handles the `SELECT` process: simple query, fetching
    rows (pump), and cleanup; returns results by block, rather than
    buffering all the results at once

Preserve all previous behaviors with three exceptions:

*   UInt16 values were previously incorrectly cast to `int16` instead of
    `int32`. Fixed here, along with the tests, bringing the behavior in
    line with the http driver.
*   Bool values are now cast to `bool`.
*   Improved some messages and added additional query context to error
    messages.
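
The UInt16 fix is easy to see in isolation. The following is a minimal
sketch with hypothetical helper names, not the driver's code: any
UInt16 above 32767 cannot fit in a signed 16-bit integer, so squeezing
it into `int16` yields a negative number, while `int32` holds the full
range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode a little-endian UInt16 as it appears on the wire. */
static uint16_t read_le_u16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Old behavior per the description: keep only 16 signed bits. */
static int16_t as_int16(uint16_t v)
{
    int16_t out;

    memcpy(&out, &v, sizeof out);   /* reinterpret the same 16 bits */
    return out;
}

/* Fixed behavior: widen to 32 bits, preserving the value. */
static int32_t as_int32(uint16_t v)
{
    return (int32_t) v;
}
```

With the wire bytes `40 9C` (40000), `as_int16` gives -25536 and
`as_int32` gives 40000.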

Update the `Dockerfile` to require `liblz4` and `libzstd`.

Vastly simplify the `Makefile`, now that we no longer need to manage a
vendored C++ library that uses `cmake`. Instead, simply add all the C
files to `OBJS` and pull the vendored submodule if it's not already
present (it ships with release packages). Add the no-longer-vendored
`lz4` and `zstd` dependencies to `PG_LDFLAGS`, drop `clang-format`,
which we'd used only for the C++ files, and teach `pg_bsd_indent` and
`clang-tidy` to find the new `*.c` files in `src/binary`.

Update the GitHub workflow to eliminate caching, since we no longer
build `clickhouse-cpp`, and to install the `lz4` and `zstd`
dependencies. We also remove the separate static and dynamic tests,
since there is just the static build now.

  [RAII]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
  [clickhouse-c]: https://github.com/ClickHouse/clickhouse-c

Co-authored-by: David E. Wheeler <david.wheeler@clickhouse.com>
@serprex serprex merged commit cbe39e5 into main May 28, 2026
21 checks passed
@serprex serprex deleted the binary-c branch May 28, 2026 18:51

Development

Successfully merging this pull request may close these issues.

Audit and fix binary driver RAII vs PG_TRY setjmp/longjmp issues

2 participants