feat(connectors): Clickhouse Sink Connector #2886
kriti-sc wants to merge 3 commits into apache:master from
Conversation
Codecov Report — additional details and impacted files:
@@ Coverage Diff @@
## master #2886 +/- ##
============================================
+ Coverage 68.36% 68.54% +0.18%
Complexity 739 739
============================================
Files 1053 1059 +6
Lines 84763 86337 +1574
Branches 61297 62879 +1582
============================================
+ Hits 57948 59183 +1235
- Misses 24448 24649 +201
- Partials 2367 2505 +138
//! RowBinary / RowBinaryWithDefaults byte serialization.
//!
//! Follows the ClickHouse binary format specification:
//! <https://clickhouse.com/docs/en/interfaces/formats#rowbinary>
//!
//! Key layout rules:
//! - All integers are **little-endian**.
//! - Strings are prefixed with an **unsigned LEB128 varint** length.
//! - `Nullable(T)`: 1-byte null marker (`0x01` = null, `0x00` = not null)
//!   followed by T bytes when not null.
//! - `RowBinaryWithDefaults`: each top-level column is preceded by a 1-byte
//!   flag (`0x01` = use server DEFAULT, `0x00` = value follows).
Would you mind explaining the reasoning behind choosing a bespoke implementation here over the official Rust client? It uses HTTP and RowBinary serialization by default, so it's not clear what is gained.
The official client is not suitable for use with Iggy because it requires the target table schema to be defined at compile time using statically typed Rust structs. In contrast, Iggy connectors expect the schema to be provided dynamically via configuration.
Even if the ClickHouse client were used, a dynamic encoder would still need to be implemented to convert runtime data into the required binary format. In that case, the client would only simplify some HTTP request handling while leaving the core complexity unresolved.
Supporting the binary ingestion format is important because it provides the best ingestion performance in ClickHouse.
Let me know if this addresses your question, or if there are other considerations I may have overlooked.
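The "dynamic encoder" mentioned above can be sketched roughly as follows: the column types arrive from connector configuration at runtime, so encoding dispatches on a type descriptor rather than on a statically typed Rust struct. All names (`ChType`, `ChValue`, `encode_value`) are hypothetical, not taken from the PR:

```rust
// Runtime column type descriptor, as it might be parsed from configuration.
enum ChType {
    Int64,
    String,
    Nullable(Box<ChType>),
}

// Runtime value, as it might be decoded from an Iggy message.
enum ChValue {
    Int64(i64),
    String(String),
    Null,
}

// LEB128 varint, used for RowBinary string length prefixes.
fn write_leb128(buf: &mut Vec<u8>, mut v: u64) {
    loop {
        let mut byte = (v & 0x7f) as u8;
        v >>= 7;
        if v != 0 {
            byte |= 0x80;
        }
        buf.push(byte);
        if v == 0 {
            break;
        }
    }
}

// Dispatch on the runtime type descriptor instead of a compile-time struct.
fn encode_value(buf: &mut Vec<u8>, ty: &ChType, val: &ChValue) {
    match (ty, val) {
        (ChType::Int64, ChValue::Int64(n)) => buf.extend_from_slice(&n.to_le_bytes()),
        (ChType::String, ChValue::String(s)) => {
            write_leb128(buf, s.len() as u64);
            buf.extend_from_slice(s.as_bytes());
        }
        (ChType::Nullable(inner), v) => match v {
            ChValue::Null => buf.push(0x01), // null marker
            other => {
                buf.push(0x00); // not null, inner value follows
                encode_value(buf, inner, other);
            }
        },
        _ => panic!("type/value mismatch"),
    }
}

fn main() {
    let schema = vec![ChType::Int64, ChType::Nullable(Box::new(ChType::String))];
    let row = vec![ChValue::Int64(7), ChValue::Null];
    let mut buf = Vec::new();
    for (ty, val) in schema.iter().zip(&row) {
        encode_value(&mut buf, ty, val);
    }
    assert_eq!(buf, vec![7, 0, 0, 0, 0, 0, 0, 0, 0x01]);
    println!("{:?}", buf);
}
```

This is why the statically typed client does not fit: its `Row` types fix the schema at compile time, whereas the match above resolves the column type per value at runtime.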
Which issue does this PR close?
Closes #2539
Rationale
ClickHouse is a column-oriented database for real-time analytics, and it is very popular in modern analytics architectures.
What changed?
This PR introduces a ClickHouse Sink Connector that enables writing data from Iggy to ClickHouse.
The ClickHouse writing logic is heavily inspired by the official ClickHouse Kafka Connector.
Local Execution
Images 1 & 2 (screenshots omitted): produced 30,456 + 29,060 rows into Iggy in two batches.
Image 3 (screenshot omitted): verified the schema and row count in ClickHouse.
AI Usage