Expand writing type support for String/FixedString types and add ReadStringsAsByteArrays setting#174
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces binary data handling capabilities for ClickHouse String and FixedString types, along with a new connection setting to read these types as raw byte arrays instead of UTF-8 strings.
Changes:
- Adds
ReadStringsAsByteArraysconnection setting to control whether String/FixedString columns are read asbyte[]orstring - Extends String/FixedString write support to accept
ReadOnlyMemory<byte>(NET6+) andStream(seekable and non-seekable) via BulkCopy - Updates type system infrastructure to propagate the new setting through
TypeSettings,ClickHouseClientSettings, and connection string builder
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| RELEASENOTES.md | Documents the new binary write support and ReadStringsAsByteArrays setting |
| ClickHouse.Driver/Types/TypeConverter.cs | Propagates ReadStringsAsByteArrays setting to StringType instances during type parsing |
| ClickHouse.Driver/Types/StringType.cs | Adds ReadAsByteArray property and implements binary read/write for byte[], ReadOnlyMemory, and Stream |
| ClickHouse.Driver/Types/FixedStringType.cs | Adds ReadAsByteArray property, implements binary read with UTF-8 decode fallback, and Stream/ReadOnlyMemory write support |
| ClickHouse.Driver/Types/BinaryTypeDecoder.cs | Passes ReadStringsAsByteArrays setting when decoding String/FixedString from binary protocol |
| ClickHouse.Driver/TypeSettings.cs | Adds readStringsAsByteArrays parameter to TypeSettings record |
| ClickHouse.Driver/ADO/ClickHouseDefaults.cs | Defines default value (false) for ReadStringsAsByteArrays setting |
| ClickHouse.Driver/ADO/ClickHouseConnectionStringBuilder.cs | Adds ReadStringsAsByteArrays property with connection string serialization |
| ClickHouse.Driver/ADO/ClickHouseConnection.cs | Passes ReadStringsAsByteArrays to TypeSettings constructor |
| ClickHouse.Driver/ADO/ClickHouseClientSettings.cs | Adds ReadStringsAsByteArrays property with copy constructor, equality, and ToString support |
| ClickHouse.Driver.Tests/Types/DynamicTests.cs | Tests ReadStringsAsByteArrays with Dynamic type wrapping String and FixedString |
| ClickHouse.Driver.Tests/BulkCopy/BulkCopyTests.cs | Comprehensive tests for ReadOnlyMemory, Stream (seekable/non-seekable), invalid UTF-8, and default behavior |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
a1b4d3e to
b399ad7
Compare
| { | ||
| return bytes; | ||
| } | ||
| return Encoding.UTF8.GetString(bytes); |
There was a problem hiding this comment.
Maybe it's better to make this a property of ClientSettings?
ClientSettings.StringEncoding, ClientSettings.FixedStringEncoding?
There was a problem hiding this comment.
Perhaps I'm misunderstanding your comment, but the value comes from ClientSettings (passed to TypeSettings -> ReadAsByteArray property).
Do you think separate settings for String and FixedString would be valuable?
There was a problem hiding this comment.
Encoding.UTF8 - encoding hard-coded in FixedStringType. Probably better to make it as a ClientSettings part.
…sByteArrays setting
b399ad7 to
b4af15d
Compare
| public async Task ShouldInsertFixedStringFromReadOnlyMemory() | ||
| { | ||
| var targetTable = "test.bulk_fixedstring_memory"; | ||
| var testData = new byte[] { 121, 122, 123, 124 }; |
There was a problem hiding this comment.
Looks like we have only the happy path. Can we also test if we have content with a size of 4+
var testData = new byte[] { 121, 122, 123, 124, 111, 111}; -> not happy path to test FixedString
There was a problem hiding this comment.
We have that below in eg WriteToServerAsync_FixedStringWithReadOnlyMemoryTooLong_ThrowsArgumentException
Note
High Risk
Changes core type mapping/serialization for
String/FixedString(including defaultFixedStringread type shifting tostring), plus new connection-level behavior that can affect all query results. This is broadly user-facing and could be a breaking behavior change for consumers expectingFixedStringasbyte[].Overview
Adds a new connection string/client setting
ReadStringsAsByteArraysthat switches howStringandFixedStringvalues are materialized on read (byte[]vs UTF-8string), and threads this throughClickHouseConnection/TypeSettings/binary type decoding.Expands
BulkCopy/type writers forStringandFixedStringto acceptStreamandReadOnlyMemory<byte>inputs (including buffering non-seekable streams and validatingFixedString(N)lengths with clearer errors). Updates tests, JSON handling forFixedStringvalues, public API metadata, release notes, and adds a newexamples/DataTypes_004_StringHandling.cswalkthrough.Written by Cursor Bugbot for commit 0b96ab2. This will update automatically on new commits. Configure here.