storage: reconcile manual Range splits and automatic Range merges #37487
Comments
You introduced the idea of using a range-local key in #24394 (comment). Each approach introduces some complexity. Putting it in the range descriptor is probably simpler to start with, but because range descriptors are mirrored into meta ranges and cached, it'll be a little harder to make the UNSPLIT command (it'll need to be a transaction across the local copy of the descriptor and the meta copy). Addressable range-local keys are rare, but adding one for this might be simpler in the long run. One additional stage before the UNSPLIT command: we need a way to inspect the current splits and see which ones are sticky. Maybe that's just an extra column for
Do we want a MERGE AT command (which would both clear the sticky bit and merge if possible) or just UNSPLIT to clear the sticky bit and leave the rest to the queues?
Good point. This will require an extra lookup per range if we don't add the sticky bit to the range descriptor, so we'll probably want to do something similar to
I was thinking we would start with just the UNSPLIT AT command. That's significantly simpler. I also don't know of any reason why a user would want to manually merge a Range.
@ajwerner and I discussed this earlier today and we decided that storing the sticky bit in the range descriptor will be the better long-term solution. This will simplify things in SQL because it means we won't need to issue a KV lookup per range to search for the sticky bit when populating the virtual table. It will also simplify the
The complication here is that updating the sticky bit will need to touch both the meta descriptor and the range-local descriptor. To do this, we'll need to use the functions
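To make the "touch both copies" requirement concrete, here is a minimal Go sketch, not the actual storage code. It assumes a toy in-memory store, and the `descriptor` type, the `txn` helper, and the key formats in `localKey` and `metaKey` are all hypothetical names made up for illustration. The point is only that the range-local copy and the meta copy of the descriptor are written in one all-or-nothing step, so they can never disagree about the sticky bit.

```go
package main

import (
	"fmt"
	"sync"
)

// descriptor is a simplified, hypothetical range descriptor.
type descriptor struct {
	StartKey, EndKey string
	Sticky           bool // hypothetical sticky bit: set by a manual split
}

// store is a toy in-memory map standing in for the KV layer.
type store struct {
	mu   sync.Mutex
	data map[string]descriptor
}

// txn buffers the writes made by fn and applies them all at once,
// or not at all if fn returns an error.
func (s *store) txn(fn func(writes map[string]descriptor) error) error {
	writes := map[string]descriptor{}
	if err := fn(writes); err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range writes {
		s.data[k] = v
	}
	return nil
}

// Hypothetical key formats for the two copies of a descriptor.
func localKey(startKey string) string { return "/local/" + startKey + "/desc" }
func metaKey(endKey string) string    { return "/meta2/" + endKey }

// setSticky updates the sticky bit on both copies in a single transaction.
// Passing sticky=false models what an UNSPLIT command would do.
func setSticky(s *store, d descriptor, sticky bool) error {
	d.Sticky = sticky
	return s.txn(func(w map[string]descriptor) error {
		w[localKey(d.StartKey)] = d
		w[metaKey(d.EndKey)] = d
		return nil
	})
}

func main() {
	s := &store{data: map[string]descriptor{}}
	d := descriptor{StartKey: "a", EndKey: "m"}
	if err := setSticky(s, d, true); err != nil {
		panic(err)
	}
	fmt.Println(s.data[localKey("a")].Sticky, s.data[metaKey("m")].Sticky)
}
```

The sketch leaves out reads, conflicts, and descriptor caching, which is where the real complexity mentioned above lives.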
37506: storage: reconcile manual splitting with automatic merging r=jeffrey-xiao a=jeffrey-xiao
Follows the steps outlined in #37487 to reconcile manual splitting with automatic merging. This PR includes the changes needed at the storage layer. The sticky bit indicating that a range is manually split is added to the range descriptor.
37558: docs/tla-plus: add timestamp refreshes to ParallelCommits spec r=nvanbenschoten a=nvanbenschoten
This commit adds transaction timestamp refreshes to the PlusCal model for parallel commits. This allows the committing transaction to bump its timestamp without a required epoch bump. This completes the parallel commits formal specification.
37578: opt, sql: fix type inference of TypeCheck for subqueries r=rytaft a=rytaft
Prior to this commit, the optimizer was not correctly inferring the types of columns in subqueries for expressions of the form `scalar IN (subquery)`. This was due to two problems which have now been fixed:
1. The subquery was built as a relational expression before the desired types were known. Now the subquery build is delayed until `TypeCheck` is called for the first time.
2. For subqueries on the right side of an `IN` expression, the desired type passed into `TypeCheck` was `AnyTuple`. This resulted in an error later on in `typeCheckSubqueryWithIn`, which checks to make sure the type of the subquery is `tuple{T}` where `T` is the type of the left expression. Now the desired type passed into `TypeCheck` is `tuple{T}`.
Note that this commit only fixes type inference for the optimizer. It is still broken in the heuristic planner.
Fixes #37263
Fixes #14554
Release note (bug fix): Fixed type inference of columns in subqueries for some expressions of the form `scalar IN (subquery)`.
Co-authored-by: Jeffrey Xiao <jeffrey.xiao1998@gmail.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@jeffrey-xiao we can close this now, right?
Background reading:
CockroachDB's distributed keyspace is split into a series of contiguous Ranges, each of which attempts to remain between 32 MB and 64 MB in size. These Ranges are automatically split by the `splitQueue` based on their size and foreground load. To complement this process, the recent 19.1 release introduced automatic Range merging, where previously split Ranges are merged back together by the `mergeQueue` if they fall below a size and load threshold.
In addition to these automated processes, CockroachDB has long supported manual hooks into data distribution. The `ALTER TABLE ... SPLIT AT` statement allows an operator to manually override Range splitting decisions and inject their own split point into the keyspace. There are a number of reasons why an operator might want to do this, which are discussed here.
The current design of automatic Range merges doesn't play well with these manual Range splits. The `mergeQueue` is not aware of which Ranges were automatically split and which were manually split, and will happily merge away a manual split. For this reason, using manual splits while automatic merges are enabled is currently disallowed. An operator that attempts to perform a manual Range split without disabling automatic Range merging will be greeted with the following error:
Manual Range splits and automatic Range merges should be fixed to work with one another.
The proposed solution to this was originally discussed in the Range Merges RFC. The RFC mentions:
This "sticky" bit is the first part of the solution. A manual split will need to denote somewhere that a Range was split manually. The best discussion about this is in the range merges RFC pull request. It's still not clear to me whether a range local key is a better solution than marking the bit directly on the Range descriptor. Perhaps @bdarnell has an opinion on this.
Once manual splits begin writing this sticky bit, we'll make the `mergeQueue` search for the sticky bit before deciding on whether to merge a Range into its left neighbor.
Once this is complete, we can allow manual Range splits and automatic Range merges.
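As a rough illustration of the check described here, the following Go sketch shows a merge decision that refuses to merge away a manually split Range. It is not the real `mergeQueue` logic: the `RangeDescriptor` fields, the `ManualSplit` flag, and the `minRangeBytes` threshold are hypothetical stand-ins, and the load-based part of the merge decision is left out.

```go
package main

import "fmt"

// RangeDescriptor is a simplified, hypothetical stand-in for the real
// descriptor; it carries only the fields this sketch needs.
type RangeDescriptor struct {
	StartKey, EndKey string
	SizeBytes        int64
	ManualSplit      bool // hypothetical sticky bit: StartKey came from a manual split
}

// minRangeBytes is an assumed size threshold; merged Ranges must stay below it.
const minRangeBytes = 32 << 20

// shouldMerge reports whether rhs may be merged into its left neighbor lhs.
func shouldMerge(lhs, rhs RangeDescriptor) bool {
	if lhs.EndKey != rhs.StartKey {
		return false // not adjacent, nothing to merge
	}
	if rhs.ManualSplit {
		return false // the split point was requested by an operator; keep it
	}
	return lhs.SizeBytes+rhs.SizeBytes < minRangeBytes
}

func main() {
	lhs := RangeDescriptor{StartKey: "a", EndKey: "m", SizeBytes: 1 << 20}
	rhs := RangeDescriptor{StartKey: "m", EndKey: "z", SizeBytes: 1 << 20, ManualSplit: true}
	fmt.Println(shouldMerge(lhs, rhs)) // sticky bit set, so no merge

	rhs.ManualSplit = false // what an UNSPLIT would do
	fmt.Println(shouldMerge(lhs, rhs))
}
```

Checking the bit on the right-hand Range matches the framing above: the sticky bit protects the split point at that Range's start key, which is exactly the boundary a merge into the left neighbor would erase.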
Finally, we'll want to introduce a new `ALTER TABLE ... UNSPLIT AT` command that simply removes a sticky bit if one exists. This again was discussed most thoroughly in the range merges RFC pull request.
Stages:
- `ALTER TABLE ... UNSPLIT AT` syntax to manually remove sticky bit