Variant column reading crashes on NULL values (0xFF discriminator not handled) #2789

@alex-clickhouse

Description

When reading a Variant column in RowBinary format, ClickHouse uses a single discriminator byte before each value to indicate which type from the Variant's type list it belongs to. For a Variant(Int32, String), discriminator 0x00 means Int32 and 0x01 means String. When the value is NULL, ClickHouse sends 0xFF as the discriminator with no following value bytes.
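To make the wire format above concrete, here is a minimal, self-contained sketch of decoding one RowBinary value of Variant(Int32, String). It is not driver code: the class and method names are hypothetical, and it assumes the String length fits in a single length byte (true for strings shorter than 128 bytes).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in the driver.
class VariantWireSketch {
    static final int NULL_DISCRIMINATOR = 0xFF;

    // Decodes a single Variant(Int32, String) value from its RowBinary bytes.
    static Object decode(byte[] row) {
        ByteBuffer buf = ByteBuffer.wrap(row).order(ByteOrder.LITTLE_ENDIAN);
        int discriminator = buf.get() & 0xFF; // read the discriminator as unsigned
        switch (discriminator) {
            case NULL_DISCRIMINATOR:
                return null; // NULL: no value bytes follow
            case 0x00:
                return buf.getInt(); // Int32, little-endian
            case 0x01: {
                // Assumption: single-byte length prefix (strings under 128 bytes).
                int len = buf.get() & 0xFF;
                byte[] bytes = new byte[len];
                buf.get(bytes);
                return new String(bytes, StandardCharsets.UTF_8);
            }
            default:
                throw new IllegalArgumentException("Unexpected discriminator: " + discriminator);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {(byte) 0xFF}));                            // null
        System.out.println(decode(new byte[] {0x00, 0x2a, 0, 0, 0}));                    // 42
        System.out.println(decode(new byte[] {0x01, 0x05, 'h', 'e', 'l', 'l', 'o'}));    // hello
    }
}
```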

Bug

The driver does not handle 0xFF as a NULL indicator, causing crashes when reading NULL Variant values.

Client V2 (BinaryStreamReader.java:859-862):

public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte();
    return readValue(column.getNestedColumns().get(ordNum));
}

readByte() returns a signed byte. When ClickHouse sends 0xFF (255 unsigned / -1 signed), the code passes it directly to getNestedColumns().get(ordNum) without checking for the NULL sentinel. Calling .get(-1) on a List throws IndexOutOfBoundsException.
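The failure can be shown in isolation, outside the driver. The sketch below uses a plain list of type names as a stand-in for getNestedColumns(); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone demo of the signed-byte lookup problem.
class SignedDiscriminatorDemo {
    // Mirrors the unchecked lookup: the raw signed byte is used as a list index.
    static String lookupUnchecked(List<String> nested, byte raw) {
        int ordNum = raw; // sign-extends, so 0xFF becomes -1
        return nested.get(ordNum);
    }

    public static void main(String[] args) {
        List<String> nested = Arrays.asList("Int32", "String");
        byte raw = (byte) 0xFF;

        System.out.println(raw);        // -1
        System.out.println(raw & 0xFF); // 255

        try {
            lookupUnchecked(nested, raw);
        } catch (IndexOutOfBoundsException e) {
            // Reached for the NULL discriminator: -1 is not a valid index.
            System.out.println("lookup failed for NULL discriminator");
        }
    }
}
```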

Legacy implementation (ClickHouseRowBinaryProcessor.java:268-281):

int ordTypeNum = BinaryStreamUtils.readInt8(input);
for (int i = 0; i < len; i++) {
    if (ordTypeNum == i) {
        tupleValues[i] = deserializers[i].deserialize(values[i], input).asObject();
    } else {
        tupleValues[i] = null;
    }
}

Here readInt8() returns -1 for 0xFF. The loop never matches ordTypeNum == i (i is never negative), so every tuple slot is set to null. No exception is thrown, and the caller silently receives a tuple of nulls instead of a proper NULL variant.
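A simplified stand-in for the legacy loop shows the behavior, along with one possible guard. This is only a sketch: fillSlots and isNullVariant are hypothetical names, and the deserializer call is replaced by a pre-decoded value.

```java
import java.util.Arrays;

// Hypothetical reduction of the legacy loop; not the actual processor code.
class LegacyLoopSketch {
    // Only the slot whose index equals ordTypeNum receives the decoded value.
    static Object[] fillSlots(int ordTypeNum, int len, Object decoded) {
        Object[] tupleValues = new Object[len];
        for (int i = 0; i < len; i++) {
            tupleValues[i] = (ordTypeNum == i) ? decoded : null;
        }
        return tupleValues;
    }

    // Possible guard: treat 0xFF (which reads as -1 when signed) as NULL before looping.
    static boolean isNullVariant(int ordTypeNum) {
        return (ordTypeNum & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillSlots(-1, 2, 42))); // [null, null]
        System.out.println(Arrays.toString(fillSlots(0, 2, 42)));  // [42, null]
        System.out.println(isNullVariant(-1));                     // true
        System.out.println(isNullVariant(0));                      // false
    }
}
```

With a check like this ahead of the loop, the legacy path could return a NULL variant directly rather than building a tuple of nulls.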

Expected Behavior

When the discriminator byte is 0xFF, the driver should return null without attempting to read any following value bytes.

Suggested Fix (Client V2)

public Object readVariant(ClickHouseColumn column) throws IOException {
    int ordNum = readByte() & 0xFF;  // unsigned
    if (ordNum == 0xFF) {
        return null;
    }
    return readValue(column.getNestedColumns().get(ordNum));
}

Reproduction

The simplest reproduction is a single query with a NULL cast to Variant:

SELECT NULL::Variant(Int32, String) FORMAT RowBinary
SETTINGS allow_experimental_variant_type = 1;

Inspecting the raw bytes returned by ClickHouse confirms the wire format:

# NULL Variant — single 0xFF byte, no value payload
$ curl -s 'http://localhost:8123/' \
    --data-binary "SELECT NULL::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: ff

# Int32 Variant — discriminator 0x00, then 4-byte little-endian 42
$ curl -s 'http://localhost:8123/' \
    --data-binary "SELECT 42::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: 002a 0000 00

# String Variant — discriminator 0x01, then length-prefixed 'hello'
$ curl -s 'http://localhost:8123/' \
    --data-binary "SELECT 'hello'::Variant(Int32, String) FORMAT RowBinary SETTINGS allow_experimental_variant_type=1" | xxd
00000000: 0105 6865 6c6c 6f

Reading the NULL row will crash with IndexOutOfBoundsException in Client V2, or silently return incorrect data in the legacy processor.
