Skip to content

Add AVX-512VL to dynamic dispatch and optimise QBit [un]transposition#88445

Merged
rienath merged 8 commits intomasterfrom
qbit-transposition-optimisation
Oct 18, 2025
Merged

Add AVX-512VL to dynamic dispatch and optimise QBit [un]transposition#88445
rienath merged 8 commits intomasterfrom
qbit-transposition-optimisation

Conversation

@rienath
Copy link
Copy Markdown
Member

@rienath rienath commented Oct 13, 2025

Not for changelog because QBit hasn't been released yet. New things in this PR:

  1. Optimised logic around bit transposition. Previously, it would accept a vector of floats and produce a vector of bytes. This meant we had to first read the floats into the vector, then run transposition, write results to a separate vector so that we can later write them to FixedStrings. Now, we can read a single float, transpose it, and write directly into the affected FixedStrings, reducing intermediate memory allocations and increasing ingestion speed.
  2. Added VL instruction set as an option for AVX-512 dispatch. Previously, it was only available with VBMI instructions, but there are machines that have VL without VBMI. This change was needed for ↓.
  3. Added vectorised algorithms for QBit untransposition, speeding up distance calculations on it. To make this possible, simplified serialisation and removed MSB -> LSB -> MSB trickery we used before.
  4. More detailed documentation for QBit serialisation and gtest_qbit_serialization.cpp test fix.
  5. Binary [de]serialisation of QBit [de]serialised trailing padding zeroes that QBit had. This was not necessary, as we read/write vector. Thus, zeroes were removed.

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Oct 13, 2025

Workflow [PR], commit [47c15c5]

Summary:

job_name test_name status info comment
Integration tests (amd_binary, 5/5) failure
test_storage_nats/test_nats_jet_stream.py::test_nats_overloaded_insert FAIL cidb
Stress test (arm_asan, s3) failure
Server died FAIL cidb
Hung check failed, possible deadlock found (see hung_check.log) FAIL cidb
Killed by signal (in clickhouse-server.log) FAIL cidb
Fatal message in clickhouse-server.log (see fatal_messages.txt) FAIL cidb
Killed by signal (output files) FAIL cidb

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 13, 2025
@rienath rienath changed the title Add AVX512VL to dynamic dispatch and optimise QBit [un]transposition Add AVX-512VL to dynamic dispatch and optimise QBit [un]transposition Oct 13, 2025
@rienath rienath force-pushed the qbit-transposition-optimisation branch from c1cfd2c to 49f6a01 Compare October 13, 2025 11:07
Comment on lines -2 to +7
11100000
11100000
11100000
11100000
11100000
11100000
00000111
00000111
00000111
00000111
00000111
00000111
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal representation of values was reversed to simplify serialization, that is why these changed

Comment on lines -47 to +50
SELECT vec.7 FROM qbit ORDER BY id;
SELECT vec.15 FROM qbit ORDER BY id;
SELECT vec.23 FROM qbit ORDER BY id;
SELECT vec.31 FROM qbit ORDER BY id;
SELECT bin(vec.7) FROM qbit ORDER BY id;
SELECT bin(vec.15) FROM qbit ORDER BY id;
SELECT bin(vec.23) FROM qbit ORDER BY id;
SELECT bin(vec.31) FROM qbit ORDER BY id;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to look at underlying binary values, because displaying bytes as characters doesn't tell us what went wrong

@Avogar Avogar self-assigned this Oct 14, 2025
Comment on lines +120 to +121
if (size > DEFAULT_MAX_STRING_SIZE)
throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Too large QBit dimension (maximum: {})", DEFAULT_MAX_STRING_SIZE);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a setting max_binary_array_size in FormatSettings for binary formats. Let's use it instead of DEFAULT_MAX_STRING_SIZE.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +123 to +128
/// If the dimension % 8 != 0, the buffer will contain padding floats. Thus, `size` can be larger, equal, but never smaller than dimension
if (size < dimension)
throw Exception(
ErrorCodes::SERIALIZATION_ERROR, "Size of the read QBit {} doesn't match expected size {}", size, (dimension / 8) * 8);

return size;
Copy link
Copy Markdown
Member

@Avogar Avogar Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, does it mean that in RowBinary format we output padding floats? If yes, we need to fix this, we should output array with dimension floats. Otherwise user will get unexpected 0-s in their vectors during deserialization

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't client aware of the dimension? It is one of the members of SerializationQBit. If not, it might also be a good idea to remove

if (size != dimension)
    throw Exception(
        ErrorCodes::SERIALIZATION_ERROR, "Dimension of the read QBit {} doesn't match expected dimension {}", size, dimension);

in validateAndReadQBitSize too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed trailing zeroes

}

const char * value_bytes = reinterpret_cast<const char *>(value_floats.data());
/// We do not need to worry about skipping padding floats at the tail here like we do in deserializeFloatsToQBitTuple(...) .
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just should not have padding floats at all in any format

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Member Author

@rienath rienath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Avogar thanks for the review, I have addressed the highlighted problems. PTAL when you have time

@rienath rienath force-pushed the qbit-transposition-optimisation branch from 5877065 to eda0769 Compare October 16, 2025 11:37
Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 small comments, everything else looks good

Comment thread src/DataTypes/Serializations/SerializationQBit.cpp Outdated
/// Transpose data
std::vector<char> transposed_bytes(bytes_per_fixedstring * element_size);
transposeBits<Word>(reinterpret_cast<const Word *>(value_bytes), reinterpret_cast<Word *>(transposed_bytes.data()), padded_n);
while (i < dimension)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it can be just simple for loop.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done

rienath and others added 4 commits October 17, 2025 16:20
@rienath
Copy link
Copy Markdown
Member Author

rienath commented Oct 18, 2025

@rienath rienath added this pull request to the merge queue Oct 18, 2025
Merged via the queue into master with commit f3fc9af Oct 18, 2025
120 of 123 checks passed
@rienath rienath deleted the qbit-transposition-optimisation branch October 18, 2025 09:27
@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants