Background
Measure defines two fields that affect data routing:
- entity (Entity.tag_names): determines the series and shard for a data point
- sharding_key (ShardingKey.tag_names): intended to enhance TopN streaming performance by overriding shard routing
message Measure {
  Entity entity = 4;
  ShardingKey sharding_key = 8;
}
Problem
Although ShardingKey was designed to augment TopN performance while preserving entity locality, there is no enforcement of this contract at the schema level. A user could configure:
entity.tag_names = ["service_id"]
sharding_key.tag_names = ["instance_id"]
With this configuration, data points sharing the same service_id (same entity) but different instance_id values would be routed to different shards/nodes. Each node would then hold only a partial view of that entity, producing incorrect TopN aggregation results and breaking query correctness.
The rule that the same entity must always map to the same node is currently implicit and relies entirely on the caller (e.g., OAP server) following the convention correctly.
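The routing split described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `shard_for`, `NUM_SHARDS`, and the SHA-256 based hash are hypothetical stand-ins and do not reflect BanyanDB's actual routing code. The sketch only shows that hashing on `instance_id` instead of `service_id` spreads one entity across several shards.

```python
import hashlib
from typing import Dict, Sequence

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(point: Dict[str, str], key_tags: Sequence[str],
              num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard by hashing the values of key_tags (illustrative hash only)."""
    raw = "|".join(point[tag] for tag in key_tags).encode("utf-8")
    digest = hashlib.sha256(raw).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sixteen data points that all belong to the same entity (service_id = "svc-a").
points = [{"service_id": "svc-a", "instance_id": f"inst-{i}"} for i in range(16)]

# Routing by the entity tags: every point hashes the same value.
by_entity = {shard_for(p, ["service_id"]) for p in points}

# Routing by a sharding key that omits the entity tag: points scatter.
by_sharding_key = {shard_for(p, ["instance_id"]) for p in points}

print("shards used when routing by entity:", len(by_entity))
print("shards used when routing by sharding_key:", len(by_sharding_key))
```

Routing by `service_id` always lands on a single shard, because every point hashes the same value. Routing by `instance_id` will almost certainly use more than one shard, so a per-node TopN computation would see only part of the entity's data.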
Proposed Fix
Add a server-side validation step when a Measure schema is created or updated.
Rule: ShardingKey.tag_names must be a superset of Entity.tag_names — it may add extra tags for finer-grained routing, but it must include all entity tags to preserve locality.
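As a rough illustration of the rule as stated, the check could look like the following sketch. The function name `validate_sharding_key` and the `SchemaValidationError` type are hypothetical and not part of BanyanDB. The real check would live in the server's schema create/update path, which this example does not model.

```python
from typing import Optional, Sequence


class SchemaValidationError(ValueError):
    """Hypothetical error type for a rejected Measure schema."""


def validate_sharding_key(entity_tags: Sequence[str],
                          sharding_key_tags: Optional[Sequence[str]]) -> None:
    """Raise if sharding_key is set but does not include every entity tag."""
    if not sharding_key_tags:
        return  # sharding_key is not set, so there is nothing to check

    allowed = set(sharding_key_tags)
    # Preserve the order of entity_tags so the error message is stable.
    missing = [tag for tag in entity_tags if tag not in allowed]
    if missing:
        raise SchemaValidationError(
            "ShardingKey must contain all Entity tags to guarantee entity "
            f"locality; missing: {missing}"
        )


# Accepted: the sharding key includes the entity tag plus one extra tag.
validate_sharding_key(["service_id"], ["service_id", "instance_id"])

# Rejected: the configuration from the Problem section.
try:
    validate_sharding_key(["service_id"], ["instance_id"])
except SchemaValidationError as err:
    print("rejected:", err)
```

Listing the missing tags in the error message is a design choice of this sketch. It lets the caller see which tags to add without comparing the two lists by hand.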
Validation pseudocode:
if sharding_key is set:
    for each tag in entity.tag_names:
        assert tag ∈ sharding_key.tag_names,
            "ShardingKey must contain all Entity tags to guarantee entity locality"
Acceptance Criteria
- Creating or updating a Measure returns a validation error when sharding_key.tag_names does not contain all tags in entity.tag_names
References
- Proto definition: api/proto/banyandb/database/v1/schema.proto