
Fix sparse nullable serialization inconsistency in Tuple subcolumns #91932

Merged
Algunenano merged 8 commits into ClickHouse:master from amosbird:fix-91851
Dec 12, 2025

Conversation

@amosbird
Collaborator

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a serialization inconsistency between sparse and nullable substreams in Tuple columns that could lead to corrupted parts or crashes during reading.

This addresses #91851. @Algunenano Could you please help check whether this fixes the stress test in the private repo? @CurtizJ Could you also take a look, please? Thanks!

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Dec 11, 2025

Workflow [PR], commit [642bad1]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Integration tests (amd_tsan, 1/6) | | failure | | |
| | test_storage_nats/test_nats_jet_stream.py::test_nats_overloaded_insert | FAIL | cidb, issue | |
| BuzzHouse (amd_debug) | | failure | | |
| | Logical error: 'Inconsistent AST formatting in Function_arrayElement: the query: | FAIL | cidb, issue | |

@clickhouse-gh clickhouse-gh bot added the pr-bugfix Pull request with bugfix, not backported by default label Dec 11, 2025
@Algunenano Algunenano self-assigned this Dec 11, 2025
namespace DB
{

bool SerializationInfoSettings::supportsSparseSerialization(const IDataType & type) const
Member

Let's find a different name, either for this one or for IDataType(). Otherwise it will be really easy to mess up again.

I'm not 100% sure, but maybe we could call this SerializationInfoSettings::shouldUseSparseSerialization. Another option is to keep only IDataType and make SerializationInfoSettings a mandatory parameter. WDYT?

Collaborator Author

> Let's find a different name, either for this one or for IDataType(). Otherwise it will be really easy to mess up again.

Sure.

> Another option is to keep only IDataType and make SerializationInfoSettings a mandatory parameter.

I did try that approach, but the required changes were large: every derivative type would need to be updated. Also, the semantics feel a bit off: whether a data type supports sparse encoding shouldn't really be governed by SerializationInfoSettings. That setting should describe limitations of a particular serialization pipeline (e.g. MergeTree serialization, or Native reader/writer), not the capabilities of the type system itself. DataTypeNullable should fundamentally support sparse encoding. It's the specific serialization path that may decide sparse is not allowed under certain SerializationInfoSettings.

Collaborator Author

I'll rename to SerializationInfoSettings::canUseSparseSerialization
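
For readers following the naming discussion, here is a minimal sketch of the split being argued for: the type reports what it is capable of, and the settings decide what a given serialization pipeline permits. All names below (TypeSketch, SettingsSketch, NullableVersionSketch and its BASIC value) are hypothetical stand-ins, not the real ClickHouse classes; only the method name canUseSparseSerialization and the ALLOW_SPARSE value come from this thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real ClickHouse classes.
enum class NullableVersionSketch { BASIC, ALLOW_SPARSE };

struct TypeSketch
{
    bool is_nullable = false;
    bool supports_sparse = true; // capability of the type itself
};

struct SettingsSketch
{
    NullableVersionSketch nullable_version = NullableVersionSketch::BASIC;

    // The type says what is possible; the settings say what this pipeline permits.
    bool canUseSparseSerialization(const TypeSketch & type) const
    {
        if (!type.supports_sparse)
            return false; // the type can never be sparse
        if (type.is_nullable)
            return nullable_version == NullableVersionSketch::ALLOW_SPARSE;
        return true;
    }
};
```

With this shape, a nullable type keeps reporting that it supports sparse encoding, while a pipeline with stricter settings can still decline to use it.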

@Algunenano
Member

The crash seems related:

clickhouse-server.err.log:2025.12.11 12:36:08.297481 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Logical error: 'Bad cast from type DB::ColumnSparse to DB::ColumnArray'.
clickhouse-server.err.log:2025.12.11 12:36:08.297914 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Format string: 'Bad cast from type {} to {}'.
clickhouse-server.err.log:2025.12.11 12:36:08.374617 [ 160989 ] {843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9} <Fatal> : Stack trace (when copying this message, always include the lines below):
clickhouse-server.err.log:2025.12.11 12:36:08.375168 [ 1143 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
clickhouse-server.err.log:2025.12.11 12:36:08.375324 [ 1143 ] {} <Fatal> BaseDaemon: (version 25.12.1.444, build id: EEBE8ABC8B4D8A05FE229F9E2FED973A2CF9E424, git hash: 7ddb4b0f3c60f709778458cb1358664b9d78e4c5, architecture: x86_64) (from thread 160989) Received signal 6
clickhouse-server.err.log:2025.12.11 12:36:08.375399 [ 1143 ] {} <Fatal> BaseDaemon: Signal description: Aborted
clickhouse-server.err.log:2025.12.11 12:36:08.375457 [ 1143 ] {} <Fatal> BaseDaemon: 
clickhouse-server.err.log:2025.12.11 12:36:08.375539 [ 1143 ] {} <Fatal> BaseDaemon: Stack trace: 0x00007ff76cb8b9fd 0x00007ff76cb37476 0x00007ff76cb1d7f3 0x0000562cd8e4ab85 0x0000562cd8e4d2ec 0x0000562cd8e4d9c2 0x0000562cc2e1f0f7 0x0000562cc2e1e227 0x0000562cc2e1ce3e 0x0000562cc5c1bcf2 0x0000562ce6c10c36 0x0000562cf37d0f1a 0x0000562cf37d35ab 0x0000562cf36299d2 0x0000562cf3611158 0x0000562cf35f310f 0x0000562cf36365d6 0x0000562d0206b02f 0x0000562d0206bd97 0x0000562d01f6096b 0x0000562d01f5a248 0x0000562cc2dc5527 0x00007ff76cb89ac3 0x00007ff76cc1b8c0
clickhouse-server.err.log:2025.12.11 12:36:08.375623 [ 1143 ] {} <Fatal> BaseDaemon: ########################################
clickhouse-server.err.log:2025.12.11 12:36:08.376002 [ 1143 ] {} <Fatal> BaseDaemon: (version 25.12.1.444, build id: EEBE8ABC8B4D8A05FE229F9E2FED973A2CF9E424, git hash: 7ddb4b0f3c60f709778458cb1358664b9d78e4c5) (from thread 160989) (query_id: 843c1d99-aa5c-4990-bbfa-cb0cfe6a4ff9) (query: select json.b as path, toTypeName(path) from test;) Received signal Aborted (6)
clickhouse-server.err.log:2025.12.11 12:36:08.376237 [ 1143 ] {} <Fatal> BaseDaemon: 
clickhouse-server.err.log:2025.12.11 12:36:08.376462 [ 1143 ] {} <Fatal> BaseDaemon: Stack trace: 0x00007ff76cb8b9fd 0x00007ff76cb37476 0x00007ff76cb1d7f3 0x0000562cd8e4ab85 0x0000562cd8e4d2ec 0x0000562cd8e4d9c2 0x0000562cc2e1f0f7 0x0000562cc2e1e227 0x0000562cc2e1ce3e 0x0000562cc5c1bcf2 0x0000562ce6c10c36 0x0000562cf37d0f1a 0x0000562cf37d35ab 0x0000562cf36299d2 0x0000562cf3611158 0x0000562cf35f310f 0x0000562cf36365d6 0x0000562d0206b02f 0x0000562d0206bd97 0x0000562d01f6096b 0x0000562d01f5a248 0x0000562cc2dc5527 0x00007ff76cb89ac3 0x00007ff76cc1b8c0
clickhouse-server.err.log:2025.12.11 12:36:08.376852 [ 1143 ] {} <Fatal> BaseDaemon: 3. pthread_kill @ 0x00000000000969fd
clickhouse-server.err.log:2025.12.11 12:36:08.377108 [ 1143 ] {} <Fatal> BaseDaemon: 4. gsignal @ 0x0000000000042476
clickhouse-server.err.log:2025.12.11 12:36:08.377215 [ 1143 ] {} <Fatal> BaseDaemon: 5. __lgamma_r_finite @ 0x00000000000287f3
clickhouse-server.err.log:2025.12.11 12:36:08.475190 [ 1143 ] {} <Fatal> BaseDaemon: 6. ./ci/tmp/build/./src/Common/Exception.cpp:54: DB::abortOnFailedAssertion(String const&, std::basic_string_view<char, std::char_traits<char>>, void* const*, unsigned long, unsigned long) @ 0x00000000246c2b85
clickhouse-server.err.log:2025.12.11 12:36:08.541622 [ 1143 ] {} <Fatal> BaseDaemon: 7. ./ci/tmp/build/./src/Common/Exception.cpp:87: DB::handle_error_code(String const&, std::basic_string_view<char, std::char_traits<char>>, int, bool, std::vector<void*, std::allocator<void*>> const&) @ 0x00000000246c52ec
clickhouse-server.err.log:2025.12.11 12:36:08.614064 [ 1143 ] {} <Fatal> BaseDaemon: 8. ./ci/tmp/build/./src/Common/Exception.cpp:138: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000246c59c2
clickhouse-server.err.log:2025.12.11 12:36:08.769073 [ 1143 ] {} <Fatal> BaseDaemon: 9. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000e6970f7
clickhouse-server.err.log:2025.12.11 12:36:08.830214 [ 1143 ] {} <Fatal> BaseDaemon: 10. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000e696227
clickhouse-server.err.log:2025.12.11 12:36:08.888819 [ 1143 ] {} <Fatal> BaseDaemon: 11. DB::Exception::Exception<String, String>(int, FormatStringHelperImpl<std::type_identity<String>::type, std::type_identity<String>::type>, String&&, String&&) @ 0x000000000e694e3e
clickhouse-server.err.log:2025.12.11 12:36:08.953797 [ 1143 ] {} <Fatal> BaseDaemon: 12. DB::ColumnArray const& assert_cast<DB::ColumnArray const&, DB::IColumn const&>(DB::IColumn const&) @ 0x0000000011493cf2
clickhouse-server.err.log:2025.12.11 12:36:09.029661 [ 1143 ] {} <Fatal> BaseDaemon: 13. ./ci/tmp/build/./src/DataTypes/Serializations/SerializationArray.cpp:302: DB::SerializationArray::serializeBinaryBulkStatePrefix(DB::IColumn const&, DB::ISerialization::SerializeBinaryBulkSettings&, std::shared_ptr<DB::ISerialization::SerializeBinaryBulkState>&) const @ 0x0000000032488c36
clickhouse-server.err.log:2025.12.11 12:36:09.067419 [ 1143 ] {} <Fatal> BaseDaemon: 14. ./ci/tmp/build/./src/Formats/NativeWriter.cpp:100: DB::NativeWriter::writeData(DB::ISerialization const&, COW<DB::IColumn>::immutable_ptr<DB::IColumn> const&, DB::WriteBuffer&, std::optional<DB::FormatSettings> const&, unsigned long, unsigned long, unsigned long) @ 0x000000003f048f1a
clickhouse-server.err.log:2025.12.11 12:36:09.113060 [ 1143 ] {} <Fatal> BaseDaemon: 15. ./ci/tmp/build/./src/Formats/NativeWriter.cpp:224: DB::NativeWriter::write(DB::Block const&) @ 0x000000003f04b5ab
clickhouse-server.err.log:2025.12.11 12:36:09.476226 [ 1143 ] {} <Fatal> BaseDaemon: 16. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2720: DB::TCPHandler::sendData(DB::QueryState&, DB::Block const&) @ 0x000000003eea19d2
clickhouse-server.err.log:2025.12.11 12:36:09.721331 [ 1143 ] {} <Fatal> BaseDaemon: 17. ./ci/tmp/build/./src/Server/TCPHandler.cpp:1410: DB::TCPHandler::processOrdinaryQuery(DB::QueryState&) @ 0x000000003ee89158
clickhouse-server.err.log:2025.12.11 12:36:09.899546 [ 1143 ] {} <Fatal> BaseDaemon: 18. ./ci/tmp/build/./src/Server/TCPHandler.cpp:791: DB::TCPHandler::runImpl() @ 0x000000003ee6b10f
clickhouse-server.err.log:2025.12.11 12:36:10.211288 [ 1143 ] {} <Fatal> BaseDaemon: 19. ./ci/tmp/build/./src/Server/TCPHandler.cpp:2873: DB::TCPHandler::run() @ 0x000000003eeae5d6
clickhouse-server.err.log:2025.12.11 12:36:10.219588 [ 1143 ] {} <Fatal> BaseDaemon: 20. ./ci/tmp/build/./base/poco/Net/src/TCPServerConnection.cpp:40: Poco::Net::TCPServerConnection::start() @ 0x000000004d8e302f
clickhouse-server.err.log:2025.12.11 12:36:10.233490 [ 1143 ] {} <Fatal> BaseDaemon: 21. ./ci/tmp/build/./base/poco/Net/src/TCPServerDispatcher.cpp:115: Poco::Net::TCPServerDispatcher::run() @ 0x000000004d8e3d97
clickhouse-server.err.log:2025.12.11 12:36:10.260049 [ 1143 ] {} <Fatal> BaseDaemon: 22. ./ci/tmp/build/./base/poco/Foundation/src/ThreadPool.cpp:205: Poco::PooledThread::run() @ 0x000000004d7d896b
clickhouse-server.err.log:2025.12.11 12:36:10.286417 [ 1143 ] {} <Fatal> BaseDaemon: 23. ./base/poco/Foundation/src/Thread_POSIX.cpp:341: Poco::ThreadImpl::runnableEntry(void*) @ 0x000000004d7d2248
clickhouse-server.err.log:2025.12.11 12:36:10.355473 [ 1143 ] {} <Fatal> BaseDaemon: 24. /home/ubuntu/actions-runner/_work/ClickHouse/ClickHouse/contrib/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:239: asan_thread_start(void*) @ 0x000000000e63d527
clickhouse-server.err.log:2025.12.11 12:36:10.355576 [ 1143 ] {} <Fatal> BaseDaemon: 25. ? @ 0x0000000000094ac3
clickhouse-server.err.log:2025.12.11 12:36:10.355648 [ 1143 ] {} <Fatal> BaseDaemon: 26. ? @ 0x00000000001268c0
clickhouse-server.err.log:2025.12.11 12:36:10.355747 [ 1143 ] {} <Fatal> BaseDaemon: Integrity check of the executable skipped because the reference checksum could not be read.
clickhouse-server.err.log:2025.12.11 12:36:15.260283 [ 1143 ] {} <Fatal> BaseDaemon: Changed settings: min_compress_block_size = 2225194, max_com

@amosbird
Collaborator Author

> The crash seems related:

Yes, it looks related. I'm digging into it now.
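
As a rough illustration of the failure mode in the log above: a serializer that expects a concrete column type performs a checked downcast, and a sparse wrapper arriving in its place fails that check. The sketch below assumes a dynamic_cast-based check and uses hypothetical types (ColumnSketch, ArrayColumnSketch, SparseColumnSketch, checkedCast); the real assert_cast may be implemented differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical column hierarchy for illustration only.
struct ColumnSketch { virtual ~ColumnSketch() = default; };
struct ArrayColumnSketch : ColumnSketch {};
struct SparseColumnSketch : ColumnSketch {};

// Checked downcast: throws instead of silently reinterpreting the object.
template <typename To>
const To & checkedCast(const ColumnSketch & from)
{
    if (const To * casted = dynamic_cast<const To *>(&from))
        return *casted;
    throw std::logic_error(
        std::string("Bad cast from type ") + typeid(from).name() + " to " + typeid(To).name());
}

// Stand-in for a serializer that requires an array column.
bool arraySerializerAccepts(const ColumnSketch & column)
{
    try
    {
        checkedCast<ArrayColumnSketch>(column);
        return true;
    }
    catch (const std::logic_error &)
    {
        return false;
    }
}
```

In this sketch the mismatch surfaces as an exception; in the log above the equivalent check ends in abortOnFailedAssertion.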

@Algunenano
Member

This also closes #91380. We can add a test later on

Member

@Algunenano Algunenano left a comment

Some minor questions

- if (supportsSparseSerialization() && kind == ISerialization::Kind::SPARSE)
+ if (settings.canUseSparseSerialization(*this) && kind == ISerialization::Kind::SPARSE)
serialization = std::make_shared<SerializationSparse>(serialization);
else if (kind == ISerialization::Kind::DETACHED)
Member

Any reason for removing the else, other than that taking the first if already means the second one won't be taken?

Collaborator Author

This is definitely an unintended change. It was probably introduced during some local debugging and not fully cleaned up. Good catch!

/// NativeReader must enable nullable sparse support here. Since it operates on in-memory state, it should
/// be able to handle all possible serialization variants.
SerializationInfoSettings settings;
settings.allowNullableSparse();
Member

Is this necessary only if server >= DBMS_MIN_REVISION_WITH_NULLABLE_SPARSE_SERIALIZATION, or does it not matter?

Collaborator Author

It doesn't matter. The newest reader should be able to handle all serialization variants.
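
One way to picture the asymmetry in this answer: the reader accepts every variant regardless of the peer's revision. The writer side is not discussed in this thread, so the revision gate in settingsForWriter is purely an assumption added for contrast. WireSettingsSketch, both functions, and MIN_REVISION_SKETCH are hypothetical, and the constant's value is a placeholder rather than the real value of DBMS_MIN_REVISION_WITH_NULLABLE_SPARSE_SERIALIZATION.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder value; the real revision constant's value is not given in this thread.
constexpr std::uint64_t MIN_REVISION_SKETCH = 100;

// Hypothetical settings object for illustration only.
struct WireSettingsSketch
{
    bool allow_nullable_sparse = false;
};

// Reader side: accept every serialization variant, whatever the peer revision is.
WireSettingsSketch settingsForReader(std::uint64_t /*peer_revision*/)
{
    WireSettingsSketch settings;
    settings.allow_nullable_sparse = true;
    return settings;
}

// Writer side (assumption): only emit what the peer is known to understand.
WireSettingsSketch settingsForWriter(std::uint64_t peer_revision)
{
    WireSettingsSketch settings;
    settings.allow_nullable_sparse = peer_revision >= MIN_REVISION_SKETCH;
    return settings;
}
```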


bool canUseSparseSerialization(const IDataType & type) const;

void allowNullableSparse() { nullable_serialization_version = MergeTreeNullableSerializationVersion::ALLOW_SPARSE; }
Member

Minor, but creating an object and then calling a method on it looks ad hoc. Maybe a separate method that creates an object with nullable_serialization_version = MergeTreeNullableSerializationVersion::ALLOW_SPARSE would be better, e.g. createSerializationInfoSettingsWithNullableSparse (or a static function)? The comment could then be placed there with an explanation.

Collaborator Author

Yes. I’ll add a method that builds this setting with the widest serialization capability, so it can be extended in the future, along with proper documentation.
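
A possible shape for such a method, sketched under assumptions: InfoSettingsSketch, NullableSerializationVersionSketch, its BASIC value, and the name withWidestCapability are all hypothetical, and the method that was actually merged may look different.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not the real ClickHouse classes.
enum class NullableSerializationVersionSketch { BASIC, ALLOW_SPARSE };

struct InfoSettingsSketch
{
    NullableSerializationVersionSketch nullable_serialization_version
        = NullableSerializationVersionSketch::BASIC;

    /// Builds settings with the widest serialization capability.
    /// Intended for readers that work on in-memory state and must accept
    /// every serialization variant; new capabilities would be enabled here too.
    static InfoSettingsSketch withWidestCapability()
    {
        InfoSettingsSketch settings;
        settings.nullable_serialization_version = NullableSerializationVersionSketch::ALLOW_SPARSE;
        return settings;
    }
};
```

Compared with constructing the object and calling allowNullableSparse() at each call site, a single factory keeps the explanation in one place and gives future capabilities one spot to be enabled.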

@Algunenano Algunenano enabled auto-merge December 12, 2025 08:54
Member

@Algunenano Algunenano left a comment

Thanks for the quick fix

@Algunenano Algunenano added this pull request to the merge queue Dec 12, 2025
Merged via the queue into ClickHouse:master with commit e277cb3 Dec 12, 2025
127 of 130 checks passed
@Algunenano Algunenano mentioned this pull request Dec 12, 2025
1 task
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Dec 12, 2025