Fix sparse nullable serialization inconsistency in Tuple subcolumns by amosbird · Pull Request #91932 · ClickHouse/ClickHouse

amosbird · 2025-12-11T07:31:55Z

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a serialization inconsistency between sparse and nullable substreams in Tuple columns that could lead to corrupted parts or crashes during reading. This addresses #91851.

@Algunenano, could you please check whether this fixes the stress test in the private repo? @CurtizJ, could you also take a look, please? Thanks!

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

clickhouse-gh · 2025-12-11T07:32:35Z

Workflow [PR], commit [642bad1]

Summary: ❌

job_name	test_name	status	info
Integration tests (amd_tsan, 1/6)		failure
	test_storage_nats/test_nats_jet_stream.py::test_nats_overloaded_insert	FAIL	cidb, issue
BuzzHouse (amd_debug)		failure
	Logical error: 'Inconsistent AST formatting in Function_arrayElement: the query:	FAIL	cidb, issue

Algunenano · 2025-12-11T09:30:10Z

src/DataTypes/Serializations/SerializationInfoSettings.cpp

+namespace DB
+{
+
+bool SerializationInfoSettings::supportsSparseSerialization(const IDataType & type) const


Let's find a different name, either for this one or for IDataType(). Otherwise it will be really easy to mess up again.

Not 100% sure, but maybe we could call this SerializationInfoSettings::shouldUseSparseSerialization. Another option is to keep only the IDataType method and make SerializationInfoSettings a mandatory parameter. WDYT?

> Let's find a different name, either for this one or for IDataType(). Otherwise it will be really easy to mess up again.

Sure.

> Another option is to keep only IDataType and make SerializationInfoSettings a mandatory parameter.

I did try that approach, but the required changes were large: every derived type would need to be updated. Also, the semantics feel a bit off: whether a data type supports sparse encoding shouldn't really be governed by SerializationInfoSettings. Those settings should describe the limitations of a particular serialization pipeline (e.g. MergeTree serialization, or the Native reader/writer), not the capabilities of the type system itself. DataTypeNullable should fundamentally support sparse encoding; it's the specific serialization path that may decide sparse is not allowed under certain SerializationInfoSettings.

I'll rename it to SerializationInfoSettings::canUseSparseSerialization.
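To make the capability-versus-policy split described above concrete, here is a minimal, self-contained C++ sketch. All types are simplified stand-ins, not the real ClickHouse classes: DataTypeDenseOnly and the BASIC enum value are hypothetical, and the exact rule inside canUseSparseSerialization is an assumption about how a pipeline might gate Nullable sparse encoding. The type answers "can I be sparse at all?", while the settings answer "may this pipeline use sparse for this type?".

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real ClickHouse class hierarchy.
enum class MergeTreeNullableSerializationVersion { BASIC, ALLOW_SPARSE };

struct IDataType
{
    virtual ~IDataType() = default;
    // Type-level capability: can this type be encoded sparsely at all?
    virtual bool supportsSparseSerialization() const { return true; }
    virtual bool isNullable() const { return false; }
};

// Nullable fundamentally supports sparse encoding, so it keeps the default capability.
struct DataTypeNullable : IDataType
{
    bool isNullable() const override { return true; }
};

// Hypothetical type that opts out of sparse encoding entirely.
struct DataTypeDenseOnly : IDataType
{
    bool supportsSparseSerialization() const override { return false; }
};

struct SerializationInfoSettings
{
    MergeTreeNullableSerializationVersion nullable_serialization_version = MergeTreeNullableSerializationVersion::BASIC;

    // Pipeline-level policy: may this serialization path use sparse for this type?
    bool canUseSparseSerialization(const IDataType & type) const
    {
        if (!type.supportsSparseSerialization())
            return false;  // the type itself cannot be sparse
        if (type.isNullable())
            return nullable_serialization_version == MergeTreeNullableSerializationVersion::ALLOW_SPARSE;
        return true;
    }
};
```

With this split, a Nullable type always reports the capability, and only the settings object decides whether a given reader or writer actually uses it.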

Algunenano · 2025-12-11T12:01:06Z

The crash seems related:

clickhouse-server.err.log:2025.12.11 12:36:08.297481 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Logical error: 'Bad cast from type DB::ColumnSparse to DB::ColumnArray'.
clickhouse-server.err.log:2025.12.11 12:36:08.297914 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Format string: 'Bad cast from type {} to {}'.
clickhouse-server.err.log:2025.12.11 12:36:08.374617 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Stack trace (when copying this message, always include the lines below):
clickhouse-server.err.log:2025.12.11 12:36:08.375168 [ 1143 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
clickhouse-server.err.log:2025.12.11 12:36:08.375324 [ 1143 ] {} <Fatal> BaseDaemon: (version 25.12.1.444, build id: EEBE8ABC8B4D8A05FE229F9E2FED973A2CF9E424, git hash: 7ddb4b0f3c60f709778458cb1358664b9d78e4c5, architecture: x86_64) (from thread 160989) Received signal 6
clickhouse-server.err.log:2025.12.11 12:36:08.375399 [ 1143 ] {} <Fatal> BaseDaemon: Signal description: Aborted
clickhouse-server.err.log:2025.12.11 12:36:08.375457 [ 1143 ] {} <Fatal> BaseDaemon: 
clickhouse-server.err.log:2025.12.11 12:36:08.375539 [ 1143 ] {} <Fatal> BaseDaemon: Stack trace: 0x00007ff76cb8b9fd 0x00007ff76cb37476 0x00007ff76cb1d7f3 0x0000562cd8e4ab85 0x0000562cd8e4d2ec 0x0000562cd8e4d9c2 0x0000562cc2e1f0f7 0x0000562cc2e1e227 0x0000562cc2e1ce3e 0x0000562cc5c1bcf2 0x0000562ce6c10c36 0x0000562cf37d0f1a 0x0000562cf37d35ab 0x0000562cf36299d2 0x0000562cf3611158 0x0000562cf35f310f 0x0000562cf36365d6 0x0000562d0206b02f 0x0000562d0206bd97 0x0000562d01f6096b 0x0000562d01f5a248 0x0000562cc2dc5527 0x00007ff76cb89ac3 0x00007ff76cc1b8c0
clickhouse-server.err.log:2025.12.11 12:36:08.375623 [ 1143 ] {} <Fatal> BaseDaemon: ########################################
clickhouse-server.err.log:2025.12.11 12:36:08.376002 [ 1143 ] {} <Fatal> BaseDaemon: (version 25.12.1.444, build id: EEBE8ABC8B4D8A05FE229F9E2FED973A2CF9E424, git hash: 7ddb4b0f3c60f709778458cb1358664b9d78e4c5) (from thread 160989) (query_id: 843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9) (query: select json.b as path, toTypeName(path) from test;) Received signal Aborted (6)
clickhouse-server.err.log:2025.12.11 12:36:08.376237 [ 1143 ] {} <Fatal> BaseDaemon: 
clickhouse-server.err.log:2025.12.11 12:36:08.376462 [ 1143 ] {} <Fatal> BaseDaemon: Stack trace: 0x00007ff76cb8b9fd 0x00007ff76cb37476 0x00007ff76cb1d7f3 0x0000562cd8e4ab85 0x0000562cd8e4d2ec 0x0000562cd8e4d9c2 0x0000562cc2e1f0f7 0x0000562cc2e1e227 0x0000562cc2e1ce3e 0x0000562cc5c1bcf2 0x0000562ce6c10c36 0x0000562cf37d0f1a 0x0000562cf37d35ab 0x0000562cf36299d2 0x0000562cf3611158 0x0000562cf35f310f 0x0000562cf36365d6 0x0000562d0206b02f 0x0000562d0206bd97 0x0000562d01f6096b 0x0000562d01f5a248 0x0000562cc2dc5527 0x00007ff76cb89ac3 0x00007ff76cc1b8c0
clickhouse-server.err.log:2025.12.11 12:36:08.376852 [ 1143 ] {} <Fatal> BaseDaemon: 3. pthread_kill @ 0x00000000000969fd
clickhouse-server.err.log:2025.12.11 12:36:08.377108 [ 1143 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x0000000000042476
clickhouse-server.err.log:2025.12.11 12:36:08.377215 [ 1143 ] {} <Fatal> BaseDaemon: 5. __lgamma_r_finite @ 0x00000000000287f3
clickhouse-server.err.log:2025.12.11 12:36:08.475190 [ 1143 ] {} <Fatal> BaseDaemon: 6. ./ci/tmp/build/./src/Common/Exception.cpp:54: DB::abortOnFailedAssertion(String const&, std::basic_string_view<char, std::char_traits<char>>, void* const*, unsigned long, unsigned long) @ 0x00000000246c2b85
clickhouse-server.err.log:2025.12.11 12:36:08.541622 [ 1143 ] {} <Fatal> BaseDaemon: 7. ./ci/tmp/build/./src/Common/Exception.cpp:87: DB::handle_error_code(String const&, std::basic_string_view<char, std::char_traits<char>>, int, bool, std::vector<void*, std::allocator<void*>> const&) @ 0x00000000246c52ec
clickhouse-server.err.log:2025.12.11 12:36:08.614064 [ 1143 ] {} <Fatal> BaseDaemon: 8. ./ci/tmp/build/./src/Common/Exception.cpp:138: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000246c59c2
clickhouse-server.err.log:2025.12.11 12:36:08.769073 [ 1143 ] {} <Fatal> BaseDaemon: 9. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000e6970f7
clickhouse-server.err.log:2025.12.11 12:36:08.830214 [ 1143 ] {} <Fatal> BaseDaemon: 10. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000e696227
clickhouse-server.err.log:2025.12.11 12:36:08.888819 [ 1143 ] {} <Fatal> BaseDaemon: 11. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x000000000e694e3e
clickhouse-server.err.log:2025.12.11 12:36:08.953797 [ 1143 ] {} <Fatal> BaseDaemon: 12. DB::ColumnArray const& assert_cast<DB::ColumnArray const&, DB::IColumn const&>(DB::IColumn const&) @ 0x0000000011493cf2
clickhouse-server.err.log:2025.12.11 12:36:09.029661 [ 1143 ] {} <Fatal> BaseDaemon: 13. ./ci/tmp/build/./src/DataTypes/Serializations/SerializationArray.cpp:302: DB::SerializationArray::serializeBinaryBulkStatePrefix(DB::IColumn const&, DB::ISerialization::SerializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::SerializeBinaryBulkState>&) const @ 0x0000000032488c36
clickhouse-server.err.log:2025.12.11 12:36:09.067419 [ 1143 ] {} <Fatal> BaseDaemon: 14. ./ci/tmp/build/./src/Formats/NativeWriter.cpp:100: DB::NativeWriter::writeData(DB::ISerialization const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn> const&, DB::WriteBuffer&, std::optional<DB::FormatSettings> const&, unsigned long, unsigned long, unsigned long) @ 0x000000003f048f1a
clickhouse-server.err.log:2025.12.11 12:36:09.113060 [ 1143 ] {} <Fatal> BaseDaemon: 15. ./ci/tmp/build/./src/Formats/NativeWriter.cpp:224: DB::NativeWriter::write(DB::Block const&) @ 0x000000003f04b5ab
clickhouse-server.err.log:2025.12.11 12:36:09.476226 [ 1143 ] {} <Fatal> BaseDaemon: 16. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2720: DB::TCPHandler::sendData(DB::QueryState&, DB::Block const&) @ 0x000000003eea19d2
clickhouse-server.err.log:2025.12.11 12:36:09.721331 [ 1143 ] {} <Fatal> BaseDaemon: 17. ./ci/tmp/build/./src/Server/TCPHandler.cpp:1410: DB::TCPHandler::processOrdinaryQuery(DB::QueryState&) @ 0x000000003ee89158
clickhouse-server.err.log:2025.12.11 12:36:09.899546 [ 1143 ] {} <Fatal> BaseDaemon: 18. ./ci/tmp/build/./src/Server/TCPHandler.cpp:791: DB::TCPHandler::runImpl() @ 0x000000003ee6b10f
clickhouse-server.err.log:2025.12.11 12:36:10.211288 [ 1143 ] {} <Fatal> BaseDaemon: 19. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2873: DB::TCPHandler::run() @ 0x000000003eeae5d6
clickhouse-server.err.log:2025.12.11 12:36:10.219588 [ 1143 ] {} <Fatal> BaseDaemon: 20. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x000000004d8e302f
clickhouse-server.err.log:2025.12.11 12:36:10.233490 [ 1143 ] {} <Fatal> BaseDaemon: 21. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x000000004d8e3d97
clickhouse-server.err.log:2025.12.11 12:36:10.260049 [ 1143 ] {} <Fatal> BaseDaemon: 22. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x000000004d7d896b
clickhouse-server.err.log:2025.12.11 12:36:10.286417 [ 1143 ] {} <Fatal> BaseDaemon: 23. ./base/poco/Foundation/src/Thread_POSIX.cpp:341: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000004d7d2248
clickhouse-server.err.log:2025.12.11 12:36:10.355473 [ 1143 ] {} <Fatal> BaseDaemon: 24. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:239: asan_thread_start(void*) @ 0x000000000e63d527
clickhouse-server.err.log:2025.12.11 12:36:10.355576 [ 1143 ] {} <Fatal> BaseDaemon: 25. ? @ 0x0000000000094ac3
clickhouse-server.err.log:2025.12.11 12:36:10.355648 [ 1143 ] {} <Fatal> BaseDaemon: 26. ? @ 0x00000000001268c0
clickhouse-server.err.log:2025.12.11 12:36:10.355747 [ 1143 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
clickhouse-server.err.log:2025.12.11 12:36:15.260283 [ 1143 ] {} <Fatal> BaseDaemon: Changed settings: min_compress_block_size = 2225194, max_com

amosbird · 2025-12-11T12:18:11Z

> The crash seems related:

Yes, it looks related. I'm digging into it now.
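One reading of the trace above: NativeWriter::writeData chose SerializationArray for the column, but the column it received was actually a ColumnSparse, so the assert_cast inside SerializationArray::serializeBinaryBulkStatePrefix failed. The mock below reproduces only that failure mode in plain C++. checked_cast is a hypothetical stand-in for the type check assert_cast performs in debug and sanitizer builds, and none of these types are the real ClickHouse classes.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Mock column hierarchy; not the real ClickHouse classes.
struct IColumn { virtual ~IColumn() = default; };
struct ColumnArray : IColumn {};
struct ColumnSparse : IColumn {};

// Hypothetical stand-in for assert_cast's checked mode: compare the dynamic
// type before the static_cast and fail loudly on mismatch.
template <typename To>
const To & checked_cast(const IColumn & from)
{
    if (typeid(from) != typeid(To))
        throw std::logic_error(std::string("Bad cast from type ") + typeid(from).name() + " to " + typeid(To).name());
    return static_cast<const To &>(from);
}

// Like SerializationArray, this mock "serializer" assumes its input is a ColumnArray.
bool serializeArrayPrefix(const IColumn & column)
{
    [[maybe_unused]] const auto & array = checked_cast<ColumnArray>(column);
    return true;
}
```

The point is that the serialization object and the column's in-memory representation must agree: if one side decides "sparse" and the other does not, the mismatch only surfaces later as a bad cast.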

Algunenano · 2025-12-11T14:44:29Z

This also closes #91380. We can add a test later on.

Algunenano

Some minor questions

Algunenano · 2025-12-11T15:23:56Z

src/DataTypes/IDataType.cpp

-        if (supportsSparseSerialization() && kind == ISerialization::Kind::SPARSE)
+        if (settings.canUseSparseSerialization(*this) && kind == ISerialization::Kind::SPARSE)
            serialization = std::make_shared<SerializationSparse>(serialization);
-        else if (kind == ISerialization::Kind::DETACHED)


Any reason for removing the else, other than the fact that if the first branch is taken the second one can't be taken anyway?

This is definitely an unintended change. It was probably introduced during some local debugging and not fully cleaned up. Good catch!
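Since kind cannot equal both SPARSE and DETACHED, dropping the else does not change behavior here, but keeping it states the intent that each kind selects at most one wrapper. A minimal mock of a kind-by-kind wrapping loop illustrates the shape; the loop, the DEFAULT kind, buildSerialization and the string-based describe() are assumptions for illustration, not the real IDataType code. In this mock, a SPARSE kind is simply skipped when the settings do not allow sparse.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical mock of the kind dispatch discussed above; names are
// illustrative and do not mirror the real ClickHouse signatures.
enum class Kind { DEFAULT, SPARSE, DETACHED };

struct Serialization
{
    std::string name;
    std::shared_ptr<const Serialization> nested;
    // Renders the wrapper chain, e.g. "Sparse(Default)".
    std::string describe() const { return nested ? name + "(" + nested->describe() + ")" : name; }
};
using SerializationPtr = std::shared_ptr<const Serialization>;

SerializationPtr buildSerialization(const std::vector<Kind> & kinds, bool can_use_sparse)
{
    SerializationPtr serialization = std::make_shared<Serialization>(Serialization{"Default", nullptr});
    for (Kind kind : kinds)
    {
        // The else makes the branches explicitly mutually exclusive.
        if (can_use_sparse && kind == Kind::SPARSE)
            serialization = std::make_shared<Serialization>(Serialization{"Sparse", serialization});
        else if (kind == Kind::DETACHED)
            serialization = std::make_shared<Serialization>(Serialization{"Detached", serialization});
    }
    return serialization;
}
```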

Algunenano · 2025-12-11T15:26:46Z

src/Formats/NativeReader.cpp

+            /// NativeReader must enable nullable sparse support here. Since it operates on in-memory state, it should
+            /// be able to handle all possible serialization variants.
+            SerializationInfoSettings settings;
+            settings.allowNullableSparse();


Is this necessary only if the server revision is >= DBMS_MIN_REVISION_WITH_NULLABLE_SPARSE_SERIALIZATION, or does it not matter?

It doesn't matter. The newest reader should be able to handle all serialization variants.

azat · 2025-12-11T16:16:31Z

src/DataTypes/Serializations/SerializationInfoSettings.h

+
+    bool canUseSparseSerialization(const IDataType & type) const;
+
+    void allowNullableSparse() { nullable_serialization_version = MergeTreeNullableSerializationVersion::ALLOW_SPARSE; }


Minor, but creating an object and calling a method on it afterwards looks ad hoc. Maybe it would be better to add a separate method that creates an object with nullable_serialization_version = MergeTreeNullableSerializationVersion::ALLOW_SPARSE, i.e. createSerializationInfoSettingsWithNullableSparse (or a static function)? The comment could then go there, with an explanation.

Yes. I’ll add a method that builds this setting with the widest serialization capability, so it can be extended in the future, along with proper documentation.
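Following the suggestion above, a named constructor could replace the "default-construct, then mutate" pattern. This is a sketch only: the function name withWidestCapability and the BASIC enum value are hypothetical; only nullable_serialization_version and MergeTreeNullableSerializationVersion::ALLOW_SPARSE come from the diff in this thread.

```cpp
#include <cassert>

// Simplified stand-ins; BASIC is a hypothetical placeholder value.
enum class MergeTreeNullableSerializationVersion { BASIC, ALLOW_SPARSE };

struct SerializationInfoSettings
{
    MergeTreeNullableSerializationVersion nullable_serialization_version = MergeTreeNullableSerializationVersion::BASIC;

    /// Hypothetical factory: settings for readers that operate on in-memory state
    /// (e.g. NativeReader) and therefore must accept every serialization variant.
    /// New capabilities would be enabled here as they are added.
    static SerializationInfoSettings withWidestCapability()
    {
        SerializationInfoSettings settings;
        settings.nullable_serialization_version = MergeTreeNullableSerializationVersion::ALLOW_SPARSE;
        return settings;
    }
};
```

A call site would then read auto settings = SerializationInfoSettings::withWidestCapability(); and the explanation of why readers need the widest settings lives in one place.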

Algunenano

Thanks for the quick fix

Enable sparse nullable consistently

3dc690a

amosbird force-pushed the fix-91851 branch from b8c6d7b to 3dc690a on December 11, 2025 07:32

clickhouse-gh bot added the pr-bugfix label (Pull request with bugfix, not backported by default) on Dec 11, 2025

Algunenano self-assigned this Dec 11, 2025

Algunenano mentioned this pull request Dec 11, 2025

Revert "Merge pull request #88999 from amosbird/cond-44539" #91851

Closed

Fix test

3e305bc

amosbird force-pushed the fix-91851 branch from 2c374b4 to 3e305bc on December 11, 2025 09:14

Algunenano reviewed Dec 11, 2025

Address review

d8e7717

Fix test

4718868

Algunenano reviewed Dec 11, 2025

Address review

8d5f9dd

azat reviewed Dec 11, 2025

amosbird added 3 commits December 12, 2025 00:30

Address another review

93933c2

Merge remote-tracking branch 'upstream/master' into fix-91851

966dc8e

Fix test

642bad1

Algunenano enabled auto-merge December 12, 2025 08:54

Algunenano approved these changes Dec 12, 2025

Algunenano added this pull request to the merge queue Dec 12, 2025

Merged via the queue into ClickHouse:master with commit e277cb3 Dec 12, 2025
127 of 130 checks passed

Algunenano mentioned this pull request Dec 12, 2025

Add test for #91380 #92042

Closed

robot-clickhouse added the pr-synced-to-cloud label (The PR is synced to the cloud repo) on Dec 12, 2025

pamarcos mentioned this pull request Dec 12, 2025

Fix segfault with sparse values in NULL #91738

Closed