Violation of Benchmark Rules Regarding JSON Flattening #79

@KVeschgini

The benchmark rules explicitly state: “It is not allowed to use query results caching or flatten JSON into multiple non-JSON columns at insertion time.” However, the pipeline.yaml in the Greptime benchmark appears to define transformation rules that flatten fields of the commit JSON object into separate non-JSON columns. The resulting table schema is:

CREATE TABLE IF NOT EXISTS "bluesky" (
  "did" STRING NULL,
  "kind" STRING NULL INVERTED INDEX,
  "commit_collection" STRING NULL INVERTED INDEX,
  "commit_operation" STRING NULL INVERTED INDEX,
  "commit" JSON NULL,
  "time_us" TIMESTAMP(6) NOT NULL,
  TIME INDEX ("time_us"),
  PRIMARY KEY ("kind", "commit_collection", "commit_operation")
)
ENGINE=mito
WITH(
  append_mode = 'true'
)

This schema directly violates the stated rule: commit_collection and commit_operation are stored as separate STRING columns, each with an inverted index and both part of the primary key, instead of being kept only inside the commit JSON column.
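
To make the distinction concrete, here is a minimal Python sketch of the two ingestion styles. It is not taken from the Greptime pipeline.yaml: the sample event, the function names, and the mapping of commit.collection and commit.operation to the commit_collection and commit_operation columns are assumptions inferred from the column names in the schema above.

```python
import json

# Hypothetical sample event; field names are assumed from the schema's columns.
EVENT = {
    "did": "did:plc:example",
    "time_us": 1700000000000000,
    "kind": "commit",
    "commit": {"collection": "app.bsky.feed.post", "operation": "create"},
}


def flatten_at_insert(event):
    """Mimic the reported schema: nested commit fields become their own columns."""
    commit = event.get("commit") or {}
    return {
        "did": event.get("did"),
        "kind": event.get("kind"),
        "commit_collection": commit.get("collection"),  # assumed mapping
        "commit_operation": commit.get("operation"),    # assumed mapping
        "commit": json.dumps(commit),
        "time_us": event.get("time_us"),
    }


def keep_as_json(event):
    """Store the whole event as one JSON document; extract fields at query time."""
    return {"data": json.dumps(event)}


flat_row = flatten_at_insert(EVENT)
json_row = keep_as_json(EVENT)
print(sorted(flat_row))  # column names of the flattened row
print(sorted(json_row))  # column names of the JSON-only row
```

In the first style, filters on commit_collection or commit_operation never touch the JSON at query time, which is the advantage the rule seems intended to rule out.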
