Validate ShardingKey against Entity tags#1069
Validate ShardingKey against Entity tags#1069aliyasirnac wants to merge 19 commits intoapache:mainfrom
Conversation
|
LGTM |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1069 +/- ##
==========================================
+ Coverage 45.97% 51.04% +5.07%
==========================================
Files 328 417 +89
Lines 55505 68041 +12536
==========================================
+ Hits 25520 34734 +9214
- Misses 27909 30342 +2433
- Partials 2076 2965 +889
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@aliyasirnac Please fix the ci failures caused by the new restriction. |
|
Hello, following your feedback, I reviewed and tested my changes which only add a schema-level validation ensuring Any suggestions on the right approach here? @ButterBright |
It looks good to me |
Test data under /pkg/test/measure (e.g., service_instance_cpm_minute.json) should be updated to pass the validation checks. |
| } | ||
| for i, tag := range measure.ShardingKey.TagNames { | ||
| if measure.Entity.TagNames[i] != tag { | ||
| return errors.New("ShardingKey must be a prefix of Entity tags to guarantee entity locality") |
There was a problem hiding this comment.
The e2e failed. Could you print the measure.ShardingKey.TagNames and measure.Entity.TagNames. Then I can see the detail of the error.
There was a problem hiding this comment.
I added debug logging as suggested. Here are the results the validation passes correctly, no errors are returned:
{"level":"info","time":"2026-04-16T18:53:23+03:00","message":"Full ShardingKey.TagNames: [service_id entity_id]"}
{"level":"info","time":"2026-04-16T18:53:23+03:00","message":"Full Entity.TagNames: [service_id entity_id]"}
There was a problem hiding this comment.
Based on that, the validation should pass when ShardingKey is identical to the Entity, not just a subset.
|
I investigated the CI failures. These are not caused by the validation changes in this PR, the same tests also fail on the main branch. The failures are infrastructure/timing issues in the distributed integration
All measure schemas pass the new ShardingKey validation without issues. |
The e2e failed due to the validation. I have provided a suggestion. Please follow it to improve the validation logic. |
|
@aliyasirnac The OAP get the error: In order to analyze the failed reason, you should append the |
|
I ran into a Docker-related issue while trying to run the E2E tests locally. I've been quite busy with work over the past few days, but I plan to dedicate some time this weekend to resolve it. |
|
Here is a guide about debugging e2e tests locally that might help. Feel free to share the errors if you encounter any issues. |
Thanks.. |
|
Got the e2e working locally and tracked down the root cause. Legacy schemas like Instead of hard-failing, I changed the validation to log a warning instead of returning an error. Schemas register fine, OAP starts up, and operators still see the bad locality in the logs. Let me know what you think. |
Could you show me the ShardingKey and Entity for |
|
Tested this PR against the standard OAP compose ( The affected measures are the entire endpoint-scoped, browser-page, service-relation, instance-relation, and kubernetes-endpoint metric families. The schema OAP installs: entity: { tagNames: [entity_id] }
shardingKey: { tagNames: [service_id] }This is OAP's intentional composite-id pattern: Two other issues with the rule as written:
Proposal// (a) Skip when len(Entity.TagNames) == 1 — composite-id pattern (OAP's entity_id).
// (b) Otherwise require ShardingKey ⊆ Entity AND shared tags in the same relative order.(a) eliminates the 87 false positives; (b) catches reorder bugs and the superset case (ShardingKey tag absent from Entity), both of which the current rule either misses or over-rejects. |
|
Thanks for the feedback. I'll rewrite the code.
27 Nis 2026 Pzt 16:10 tarihinde Gao Hongtao ***@***.***> şunu
yazdı:
… *hanahmily* left a comment (apache/skywalking-banyandb#1069)
<#1069 (comment)>
Tested this PR against the standard OAP compose (
test/e2e-v2/cases/storage/banyandb/, OAP @ b4a8811d). The advisory check
fires *174 times across 87 unique measures* at OAP startup — every one of
them with the same mismatch:
ShardingKey [service_id] must be a prefix of Entity tags [entity_id]
The affected measures are the entire endpoint-scoped, browser-page,
service-relation, instance-relation, and kubernetes-endpoint metric
families. The schema OAP installs:
entity: { tagNames: [entity_id] }shardingKey: { tagNames: [service_id] }
This is OAP's intentional composite-id pattern: entity_id is a base64
that already contains service_id, and service_id is exposed separately so
service-level TopN aggregation is correct. The current "prefix of Entity
tags" rule can't express this — it compares tag names literally.
Two other issues with the rule as written:
1. The PR description says "ShardingKey must be a *superset* of
Entity," but the code checks for a *prefix*. The real invariant for
entity locality is ShardingKey ⊆ Entity (set-wise).
2. The test case invalid sharding key (subset with different tag)
(Entity=[s,i], ShardingKey=[i]) is rejected, but locality is fine here.
Proposal
// (a) Skip when len(Entity.TagNames) == 1 — composite-id pattern (OAP's entity_id).// (b) Otherwise require ShardingKey ⊆ Entity AND shared tags in the same relative order.
(a) eliminates the 87 false positives; (b) catches reorder bugs *and* the
superset case (ShardingKey tag absent from Entity), both of which the
current rule either misses or over-rejects.
—
Reply to this email directly, view it on GitHub
<#1069 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ASJQ434SZX5MP63OBL7IAIL4X5L5ZAVCNFSM6AAAAACXY7CWXGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DGMRXGIZDQMZQGU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Fix missing validation between Measure.ShardingKey and Measure.Entity
ShardingKeymust be a superset ofEntityto preserve entity locality;otherwise TopN results become incorrect. This PR enforces the rule at
schema validation time.
Add a unit test to verify that the fix works.
Explain briefly why the bug exists and how to fix it.
If this pull request closes/resolves/fixes an existing issue, replace the issue number. Fixes feat(banyandb/measure): add validation to enforce ShardingKey compatibility with Entity skywalking#13814.
Update the
CHANGESlog.