@rschu1ze rschu1ze commented May 26, 2025

When StarRocks was added, I kind of missed that the "materialized" table definition

```sql
CREATE TABLE bluesky (
    `id` BIGINT AUTO_INCREMENT,
    -- Main JSON column (comes after key columns)
    `data` JSON NULL COMMENT "Main JSON object",
    -- Key columns (must come first in the schema and in the same order as DUPLICATE KEY)
    `kind` VARCHAR(255) AS get_json_string(data, '$.kind'),
    `operation` VARCHAR(255) AS get_json_string(data, '$.commit.operation'),
    `collection` VARCHAR(255) AS get_json_string(data, '$.commit.collection'),
    `did` VARCHAR(255) AS get_json_string(data, '$.did'),
    `time_us` BIGINT AS get_json_int(data, '$.time_us')
)
ORDER BY(`kind`, `operation`, `collection`, `did`, `time_us`);
```

is technically against the benchmark rules:

It is [not allowed](https://clickhouse.com/blog/json-bench-clickhouse-vs-mongodb-elasticsearch-duckdb-postgresql#no-query-results-cache) to use query results caching or flatten JSON into multiple non-JSON columns at insertion time.

Some databases do have a JSON data type but they flatten nested JSON documents at insertion time to a single level (typically using . as separator between levels). We consider this a grey zone. On the one hand, this removes the possibility to restore the original documents, on the other hand, flattening may in many practical situations be acceptable. The dashboard allows to filter out databases which do not retain the document structure (i.e. which flatten).

Basically, "implicit" flattening by the database itself is allowed (grey zone), but explicit flattening like the table definition above defeats the purpose of the benchmark. Therefore, this PR removes the corresponding "materialized" results for StarRocks.
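To illustrate why the quoted rule says that flattening "removes the possibility to restore the original documents", here is a minimal Python sketch of dot-separated flattening. The `flatten` helper and both sample documents are hypothetical and written only for this illustration; they are not taken from any database's implementation or from the benchmark data.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts to a single level, joining keys with '.'.

    Hypothetical helper for illustration; real databases may differ.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# A nested document, shaped loosely like the fields used in the DDL above.
nested = {
    "kind": "commit",
    "commit": {"operation": "create", "collection": "app.bsky.feed.post"},
}

# A different document whose top-level keys already contain dots.
dotted = {
    "kind": "commit",
    "commit.operation": "create",
    "commit.collection": "app.bsky.feed.post",
}

flat_nested = flatten(nested)
flat_dotted = flatten(dotted)

# Two distinct inputs collapse to the same flat form, so the flat form
# alone cannot tell us which document was originally inserted.
print(flat_nested == flat_dotted)
```

In this sketch, both documents end up with the keys `kind`, `commit.operation`, and `commit.collection`, which is the sense in which flattening loses the original structure.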

@rschu1ze rschu1ze merged commit f9f5b62 into main May 26, 2025
@rschu1ze rschu1ze mentioned this pull request May 26, 2025