You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Would like to use an explicit encoding value for Nullable column types, keeping the clean semantics of NULL and compatibility with Null-friendly SQL code, but without taking the hit of the separate bitmap to record NULL records. The documentation is clear that there is almost always a performance and space impact of making a field Nullable. In cases where the full value space of the type is not being used, NULL semantics can be maintained without this impacting storage or performance.
Describe the solution you'd like
I would like to be able to specify the storage value for NULL within the type space of the Nullable type.
When specified, NULL will be stored inline with the column and transparently handled as a NULL value, but without the separate bitmaps storage or lookup.
For example:
Nullable(UInt32 AS 999999999) or Nullable(UInt32, NULL AS 999999999) or similar syntax.
The value specification would be completely optional, but when specified NULL would be stored as the specified value in the primary table. No separate NULL bitmap would be stored.
So in the example above, inserting 1, NULL, 2 into this table would result in:
1, 999999999, 2 actually being stored in the column and no separate NULL bitmap.
All of the SQL NULL handling (e.g., isNull, etc.) would be transparently mapped under the hood to comparisons to the specified value.
So isNull(v) of above would still return false, true, false. There would be no way to 'read' the 999999999. It is a declared storage-only value for the NULL representation.
When specified as the NULL value, it would no longer be possible to store that value in the column.
It would either ERROR on insert or transparently write, but read back as a NULL value (I don't care on this).
Describe alternatives you've considered
This can be handled explicitly in the logic above Clickhouse, but it's messy and inconsistent. It also makes it more difficult to translate and adapt alternate SQL code that cleanly and appropriately uses SQL NULL semantics.
As Clickhouse philosophically allows a great deal of storage control in the interest of performance and efficiency, it seems consistent to allow the encoding of NULL to be inline with the Nullable data type, in instances where the full value space of the underlying type is not in fact needed.
Additional context
Notably #16330 suggests that the lowCardinality type may also operate this way (assigning a value in the dictionary to represent the NULL, avoiding the separate mask). This effectively extends this approach to any data type where a value can be set aside for use in NULL encoding, restoring all the efficiency of the non-Nullable implementation.
Also, since this is completely optional, there would be no impact on storage or implementation for Nullable data types declared without an encoding value specified.
The text was updated successfully, but these errors were encountered:
Use case
Would like to use an explicit encoding value for Nullable column types, keeping the clean semantics of NULL and compatibility with Null-friendly SQL code, but without taking the hit of the separate bitmap to record NULL records. The documentation is clear that there is almost always a performance and space impact of making a field Nullable. In cases where the full value space of the type is not being used, NULL semantics can be maintained without this impacting storage or performance.
Describe the solution you'd like
I would like to be able to specify the storage value for NULL within the type space of the Nullable type.
When specified, NULL will be stored inline with the column and transparently handled as a NULL value, but without the separate bitmaps storage or lookup.
For example:
Nullable(UInt32 AS 999999999) or Nullable(UInt32, NULL AS 999999999) or similar syntax.
The value specification would be completely optional, but when specified NULL would be stored as the specified value in the primary table. No separate NULL bitmap would be stored.
So in the example above, inserting 1, NULL, 2 into this table would result in:
1, 999999999, 2 actually being stored in the column and no separate NULL bitmap.
All of the SQL NULL handling (e.g., isNull, etc.) would be transparently mapped under the hood to comparisons to the specified value.
So isNull(v) of above would still return false, true, false. There would be no way to 'read' the 999999999. It is a declared storage-only value for the NULL representation.
When specified as the NULL value, it would no longer be possible to store that value in the column.
It would either ERROR on insert or transparently write, but read back as a NULL value (I don't care on this).
Describe alternatives you've considered
This can be handled explicitly in the logic above Clickhouse, but it's messy and inconsistent. It also makes it more difficult to translate and adapt alternate SQL code that cleanly and appropriately uses SQL NULL semantics.
As Clickhouse philosophically allows a great deal of storage control in the interest of performance and efficiency, it seems consistent to allow the encoding of NULL to be inline with the Nullable data type, in instances where the full value space of the underlying type is not in fact needed.
Additional context
Notably #16330 suggests that the lowCardinality type may also operate this way (assigning a value in the dictionary to represent the NULL, avoiding the separate mask). This effectively extends this approach to any data type where a value can be set aside for use in NULL encoding, restoring all the efficiency of the non-Nullable implementation.
Also, since this is completely optional, there would be no impact on storage or implementation for Nullable data types declared without an encoding value specified.
The text was updated successfully, but these errors were encountered: