TimeSeries samples now Truncated, not Accumulated #5905
// the merge system is for time series data, which is safe against replay;
// however, this property is not general for all potential mergeable types.
// If a future need arises to merge another type of data, replay protection
// will likely need to be a consideration.
if (left->has_merge_timestamp() && right.has_merge_timestamp()) {
  if (left->merge_timestamp().wall_time() == right.merge_timestamp().wall_time() &&
      left->merge_timestamp().logical() == right.merge_timestamp().logical()) {
Didn't you want to disable the replay protection here?
Ah, never mind. You're saying that merge replays are now safe due to the way merges are processed.
No, you were right the first time, I wanted to remove this. I had a weird merge at the very end, didn't delete enough.
Reviewed 8 of 8 files at r1. Comments on: storage/engine/mvcc_test.go, line 3246 [r1]; storage/engine/mvcc_test.go, line 3248 [r1]; ts/doc.go, line 19 [r1]; ts/doc.go, line 26 [r1]; ts/doc.go, line 77 [r1]; ts/doc.go, line 89 [r1].
can you update internal.proto to say
This will allow us to remove these values in the future.
Review status: all files reviewed at latest revision, 8 unresolved discussions. Comments on: storage/engine/mvcc_test.go, line 3216 [r1]; storage/engine/rocksdb/db.cc, line 661 [r1].
7e4c3f4
to
4e2e7fe
Compare
Comment added to internal.proto.
Review status: 5 of 9 files reviewed at latest revision, 8 unresolved discussions. Comments on: storage/engine/mvcc_test.go, line 3216 [r1]; storage/engine/mvcc_test.go, line 3246 [r1]; storage/engine/mvcc_test.go, line 3248 [r1]; storage/engine/rocksdb/db.cc, line 661 [r1]; ts/doc.go, line 19 [r1]; ts/doc.go, line 26 [r1]; ts/doc.go, line 77 [r1]; ts/doc.go, line 89 [r1].
Reviewed 4 of 4 files at r2. Comment on: roachpb/internal.proto, line 87 [r2].
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. Comment on: storage/engine/rocksdb.go, line 307 [r2].
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. Comment on: roachpb/internal.proto, line 87 [r2].
This commit modifies the way our time series system handles multiple values in the same "sample period". Previously, the engine would accumulate the values from the individual samples, maintaining a "sum", "count", "max" and "min". However, there were a variety of consistency issues when this interacted with the concept of "partial merges":

+ Our ability to provide protection from replayed raft commands was different, depending on how partial merges were applied.
+ The accumulation of floating point values is not associative, and the order could be different depending on how partial merges are applied.

Most troublingly, this could result in inconsistencies between raft replicas: partial merges may occur on one replica independently of the others, resulting in differences in the final merged result on disk. This is unacceptable.

Unfortunately, support for "partial merges" is relatively important for performance reasons, and we are hesitant to remove it.

To fix all problems associated with this, we have decided to remove sample accumulation. As an alternative, when multiple samples exist for the same sample offset, they are discarded except for the most recently merged. This is considered an acceptable compromise for all time series maintained by cockroach.

Additional Changes:

+ Updated ts/doc.go to better reflect the current state of the /ts package.
+ Added a test designed by @bdarnell to test replay protection in the scope of compactions.
+ Removed generalized "replay protection" in merge (which did not work) and added an advisory for any future programmers looking to add a mergeable type.

Fixes cockroachdb#5658
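To illustrate why accumulation is sensitive to how partial merges are grouped while truncation is not, here is a minimal Go sketch. The function names `accumulate` and `truncate` are hypothetical and model only a single value at one sample offset; they are not the engine's actual merge operator.

```go
package main

// accumulate models the old behavior (assumption: reduced here to the
// "sum" field only). Floating point addition is not associative, so the
// result depends on how operands are grouped.
func accumulate(left, right float64) float64 {
	return left + right
}

// truncate models the new behavior: when two samples share an offset,
// the most recently merged one (right) replaces the earlier one (left).
func truncate(left, right float64) float64 {
	return right
}
```

With `a, b, c := 0.1, 0.2, 0.3` held in float64 variables, `accumulate(accumulate(a, b), c)` and `accumulate(a, accumulate(b, c))` produce different values, which mirrors two replicas applying partial merges in different groupings. The corresponding `truncate` calls both yield `c` regardless of grouping.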
4e2e7fe
to
b1373fb
Compare
Review status: 9 of 12 files reviewed at latest revision, 3 unresolved discussions. storage/engine/rocksdb.go, line 307 [r2] (raw file): Comments from Reviewable |
As part of cockroachdb#5905, general replay protection was removed from the engine's merge operator. This is safe in general because the merge operator is only used for Time Series data, which was made replay-safe in that issue. However, TestMerge was still merging string data, and as a result became flaky due to replays. This commit changes TestMerge to use time series data, which is replay-safe. Fixes cockroachdb#5976
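The difference between replay-safe and replay-unsafe merges can be sketched as follows. This is an illustrative model only: `mergeStrings` assumes that string operands are appended, and `mergeSamples` uses a hypothetical offset-to-value map in place of the real time series encoding.

```go
package main

// mergeStrings models a merge that appends the operand (assumption).
// Replaying the same operand changes the result.
func mergeStrings(existing, operand string) string {
	return existing + operand
}

// mergeSamples models truncating time series merges: for each offset,
// the operand's value overwrites whatever was there. Replaying the same
// operand leaves the result unchanged.
func mergeSamples(existing, operand map[int32]float64) map[int32]float64 {
	out := make(map[int32]float64, len(existing)+len(operand))
	for offset, value := range existing {
		out[offset] = value
	}
	for offset, value := range operand {
		out[offset] = value
	}
	return out
}
```

Under these assumptions, applying the same string operand twice yields a longer string than applying it once, while applying the same sample operand twice yields the same map as applying it once.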
…amples It looks like cockroachdb#5905 missed some comments which still maintained the notion that samples with identical offsets were accumulated instead of truncated. This change fixes these comments.