-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: native support for creating hash sharded indexes #42922
Conversation
This RFC proposes to provide better user experience when creating hash sharded indexes, in order to alleviate single range hotspots in sequential workloads. This PR accompanies cockroachdb#42922 which implements the ideas described in the RFC. Release note: None
e38c575
to
9b7a23f
Compare
This RFC proposes to provide better user experience when creating hash sharded indexes, in order to alleviate single range hotspots in sequential workloads. This PR accompanies cockroachdb#42922 which implements the ideas described in the RFC. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a cursory look at this after reading the RFC
Reviewed 6 of 19 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/create_index.go, line 142 at r1 (raw file):
} func createAndAddShardColToTable(
I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.
pkg/sql/create_table.go, line 1124 at r1 (raw file):
idxColumns := d.Columns if d.Sharded != nil { var colNames []string
This code feels like its been duplicated many times -- is there a way to unify this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):
FAMILY "primary" (a, a_shard__internal, rowid) )
We should write tests around CREATE INDEX
. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.
There are some logic tests which look at plans. They may be useful here.
pkg/sql/parser/sql.y, line 4814 at r2 (raw file):
opt_hash_sharded: USING HASH WITH BUCKET_COUNT '=' a_expr
I made this comment on the other PR. How do you feel about using the opt_with_clause here instead of makign BUCKET_COUNT
a token in the grammar?
pkg/sql/parser/sql.y, line 4814 at r2 (raw file): Previously, ajwerner wrote…
Yup sounds good, on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
I'm not too familiar with this part of the codebase, but I left some nits and questions.
Reviewed 12 of 19 files at r1, 1 of 1 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, and @solongordon)
pkg/sql/create_index.go, line 105 at r2 (raw file):
// check constraint here, but doing it above this function would lead to a bunch // of code duplication. ckDef, err := makeShardCheckConstraintDef(tableDesc, int(buckets), shardCol)
The error here is unchecked.
Why you don't like creating a check constraint here?
pkg/sql/create_index.go, line 145 at r2 (raw file):
shardBuckets int, desc *sqlbase.MutableTableDescriptor, colNames []string, shouldAssignID bool, ) (*sqlbase.ColumnDescriptor, error) { shardCol, err := makeShardColumnDesc(colNames, shardBuckets, false)
[nit]: please add an inline comment what argument false
is here for.
pkg/sql/create_index.go, line 149 at r2 (raw file):
return nil, err } if !hasColumn(*desc, shardCol.Name) {
This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.
pkg/sql/create_index.go, line 169 at r2 (raw file):
// index is created in a 'CREATE TABLE` statement but I can't be 100% sure. Does this // seem like a valid thing to do? if shouldAssignID {
Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?
pkg/sql/create_table.go, line 1124 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
This code feels like its been duplicated many times -- is there a way to unify this?
+1
pkg/sql/sem/tree/create.go, line 566 at r2 (raw file):
if node.Name != "" { ctx.FormatNode(&node.Name) ctx.WriteByte(' ')
We don't need this space?
pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):
if !ok { return 0, pgerror.New(pgcode.InvalidParameterValue, `BUCKET_COUNT must be a strictly positive integer value.`)
[nit]: the error message could be extracted into a variable.
pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):
// IsSharded returns whether the index is hash sharded or not. func (desc *IndexDescriptor) IsSharded() bool { return !reflect.DeepEqual(desc.Sharded, ShardedDescriptor{})
This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?
pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):
// Sharded, if it's not the zero value, describes how this index is sharded. optional ShardedDescriptor sharded = 18 [(gogoproto.nullable) = false];
I think this field could be nullable, and then IsSharded
check would compare this field against nil
, no?
9b7a23f
to
0d736cf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_index.go, line 142 at r1 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.
Thanks for pointing this out, fixed.
pkg/sql/create_index.go, line 105 at r2 (raw file):
Previously, yuzefovich wrote…
The error here is unchecked.
Why you don't like creating a check constraint here?
I didn't like the idea of creating the shard column, its check constraint, etc inside of a method named MakeIndexDescriptor
since that name makes it seem like it should only be doing that one thing. Fixed now.
pkg/sql/create_index.go, line 145 at r2 (raw file):
Previously, yuzefovich wrote…
[nit]: please add an inline comment what argument
false
is here for.
Done.
pkg/sql/create_index.go, line 149 at r2 (raw file):
Previously, yuzefovich wrote…
This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.
Made it more explicit.
pkg/sql/create_index.go, line 169 at r2 (raw file):
Previously, yuzefovich wrote…
Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?
I changed this to instead do a separate pass through all the indexes in the table (see AssignFamiliesToShardColumns
) and if they're hash sharded, assign the shard column to the column family of the first column in the set of index columns. imo, it seems cleaner this way. Let me know if any of this doesn't make sense.
pkg/sql/create_table.go, line 1124 at r1 (raw file):
Previously, yuzefovich wrote…
+1
I wanted to stay consistent to the pattern followed in the rest of that method, but in retrospect this seems misguided. Fixed now.
pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):
Previously, yuzefovich wrote…
[nit]: the error message could be extracted into a variable.
Done.
9b58934
to
7954d4c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):
Previously, ajwerner wrote…
We should write tests around
CREATE INDEX
. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.There are some logic tests which look at plans. They may be useful here.
Done.
pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):
Previously, yuzefovich wrote…
This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?
Changed that field to a pointer, see comment below.
pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):
Previously, yuzefovich wrote…
I think this field could be nullable, and then
IsSharded
check would compare this field againstnil
, no?
Done. I'm curious, is there a good reason we force nullable = false
for all the other fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have to gate sharded index creation behind 20.1, but otherwise this is RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_index.go, line 84 at r3 (raw file):
} if n.Sharded != nil {
Small nit, but I feel like this code would be cleaner if it was possible to move it into a separate function.
pkg/sql/create_table.go, line 992 at r3 (raw file):
} func hasColumn(desc sqlbase.MutableTableDescriptor, colName string) bool {
There is already a findColumnByName
function you could use
pkg/sql/drop_index.go, line 131 at r3 (raw file):
for i := range tableDesc.Columns { if tableDesc.Columns[i].ID == shardColDesc.ID { tableDesc.Columns = append(tableDesc.Columns[:i:i], tableDesc.Columns[i+1:]...)
this allocates basically everytime you delete one of the elements -- probably better to do something like
tmp := tableDesc.Columns[:0]
// do the loop
if keep element {
tmp = append(tmp, elem)
}
tableDesc.Columns = tmp
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):
SHOW CREATE TABLE sharded_primary ---- sharded_primary CREATE TABLE sharded_primary (
Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):
a INT8 NULL, INDEX sharded_secondary_a_shard_10_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=10, INDEX sharded_secondary_a_shard_4_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=4,
I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?
pkg/sql/parser/sql.y, line 5128 at r3 (raw file):
// // Hash sharded clause: // USING HASH WITH BUCKET_COUNT = <shard_buckets>
just a thought, have we ruled out the syntax of
USING HASH WITH a_expr BUCKETS
pkg/sql/parser/sql.y, line 5132 at r3 (raw file):
// %SeeAlso: CREATE TABLE, SHOW INDEXES, SHOW CREATE, // WEBDOCS/create-index.html // TODO DURING REVIEW: how do I clarify above that the hash sharded clause only applies to non-inverted indexes?
You'd want that to just show up in the docs -- i think we just do pure syntax here.
pkg/sql/sem/tree/create.go, line 217 at r3 (raw file):
} PrimaryKey struct { PrimaryKey bool
rename this to IsPrimaryKey or something -- seeing this used in other places in the code was confusing
pkg/sql/sqlbase/structured.go, line 745 at r3 (raw file):
func (desc *MutableTableDescriptor) AssignFamiliesToShardColumns() error { getColumnFamilyForShard := func(shardCol string, idxColumns []string) (string, error) { for _, f := range desc.GetFamilies() {
just looping over the Families slice would be fine. also, we usually loop over the indices and index into the list for these larger structs to avoid copies
pkg/sql/sqlbase/structured.go, line 765 at r3 (raw file):
return err } desc.RemoveColumnFromFamily(shardColDesc.ID)
Why do you remove the column from a family and then add it back?
pkg/sql/sqlbase/structured.go, line 775 at r3 (raw file):
// PrintColumnNamesAndIDs prints list of column names and ids to stdout. func (desc *TableDescriptor) PrintColumnNamesAndIDs() {
should these be removed before merging?
pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Done. I'm curious, is there a good reason we force
nullable = false
for all the other fields?
we do nullable = false to avoid a pointer (an allocation). I think you can make it not nullable and use the generated IsSet method.
pkg/sql/sqlbase/structured.proto, line 227 at r3 (raw file):
// ColumnNames lists the names of the columns used to compute the shard column's // values. repeated string column_names = 3;
would it make sense to just hold onto column ID's here? I'm not sure why we need the names.
pkg/sql/sqlbase/table.go, line 225 at r3 (raw file):
// AddShardToIndexDesc populates the given index descriptor with details pertaining to how // it's sharded. func AddShardToIndexDesc(
This method basically doesn't do anything. I'd advocate just doing whatever you do here where you need it.
pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):
// GetShardColumnName generates a name for the hidden shard column to be used to create a // hash sharded index. func GetShardColumnName(colNames []string, buckets int32) string {
this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theres alot of descriptor manipulation going on here -- since show create isn't a good way to introspect what the descriptor actually looks like at the end of the process, i think there should be some tests that look at raw table descriptors after the creation process and ensure things are as you expect.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?
No, we'd decided that we wanted it to be transparent to the user.
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?
This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.
pkg/sql/parser/sql.y, line 5128 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
just a thought, have we ruled out the syntax of
USING HASH WITH a_expr BUCKETS
We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.
pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):
Previously, rohany (Rohan Yadav) wrote…
this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?
This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the startExec
method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
No, we'd decided that we wanted it to be transparent to the user.
this is opaque though right -- users don't know (or just can't see) what using hash actually does under the hood
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.
i meant that its allowed, but you might run into name conflicts when you auto generate these names if you aren't careful
pkg/sql/parser/sql.y, line 5128 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.
makes sense
pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the
startExec
method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).
yup makes sense.
bd12739
to
b28482c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this merges, can you also add a check in alter_table.go that disallows primary key changes from tables that have a hash shared primary key?
Done.
This is RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 1 at r10 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
I added some logic tests around this, however they are currently failing for reasons I haven't looked into yet. I'll try to understand what's happening and get them deflaked soon.
The tests were flaky because the logictest package adds random column family definitions if a CREATE TABLE
statement doesn't contain any. See: #41013 for details. Fixed now.
b28482c
to
6d1cb93
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: get rid of the indentation on the release note.
I think we can spruce up the release note in general.
This is super close. Thanks for all your hard work! I'd love to get this merged this week. Again let us know if you anticipate having any trouble working on this.
Reviewed 1 of 11 files at r5, 27 of 27 files at r11.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/settings/cluster/cockroach_versions.go, line 409 at r11 (raw file):
// ranges with said shard column. Key: VersionHashShardedIndexes, Version: roachpb.Version{Major: 19, Minor: 2, Unstable: 11},
This needs an update
pkg/settings/cluster/versionkey_string.go, line 0 at r11 (raw file):
This needs to be regenerated after the fix.
pkg/sql/create_index.go, line 217 at r10 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
If we stick with the current version of the
ComputeExpr
(where we're summing the hashes of individual columns), the commutative nature of the expression makes it such that theName
of a shard column uniquely identifies it. If we end up switching back to the oldComputeExpr
(where we dofnv(columnSet..)
) then we end up in a situation whereexistingShardCol
could differ from the newshardCol
. Is this your concern?existingShardCol
would also differ fromshardCol
if we end up changing the way we compute the shard columns down the line in a subsequent release.
I mostly just didn't like the idea that this code was going to make assumptions about the only reason FindColumnByName
might return an error.
In most cases I agree with you that it's fine if the column used a different expression than maybe we might now create. I just want to call out the possibility to a reader.
pkg/sql/create_table.go, line 1519 at r4 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
wow TIL.
I still worry there are cases where this doesn't work. What does this expression actually look like for these weird cases? Try a case with a capital letter in the middle of the word. Maybe we store the string with the "
s in the colNames, I don't know, but given I had the question can you find out and call out in an NB
at the point of formatting above?
pkg/sql/create_table.go, line 1143 at r5 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Thanks, I lifted the common stuff into a shared function and it's definitely much cleaner now. There is still some duplication (in case the sharded primary key is a
ColumnTableDef
) but the wayMakeTableDesc
is currently structured makes it a bit hard to improve. Let me know if you'd like something done differently.
lgtm, thanks.
pkg/sql/create_table.go, line 1286 at r11 (raw file):
} if newColumn { buckets, _ := tree.EvalShardBucketCount(d.Sharded.ShardBuckets)
Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.
pkg/sql/drop_index.go, line 336 at r11 (raw file):
} // The index being deleted could be used as the origin index for this foreign key. if idx.IsValidOriginIndex(fk.OriginColumnIDs) {
why not &&
and lose a layer of indentation?
pkg/sql/drop_index.go, line 362 at r11 (raw file):
// If we haven't found a replacement candidate for this foreign key, then // we need a cascade to delete this index. if !indexHasReplacementCandidate(canReplace) {
nit: please use &&
on the conditions rather than the nested if
.
pkg/sql/vars.go, line 871 at r11 (raw file):
}, // CockroachDB extension.
@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.
pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 463 at r11 (raw file):
INDEX foo ("'quotes' in the column's name" ASC) USING HASH WITH BUCKET_COUNT = 4, FAMILY "primary" ("I am a column with spaces", "'quotes' in the column's name", "crdb_internal_I am a column with spaces_shard_12", "crdb_internal_'quotes' in the column's name_shard_4") )
nit: newline
pkg/sql/sqlbase/table.go, line 231 at r10 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Reverted.
Whoops. Sorry for the back and forth. I prefer it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/vars.go, line 871 at r11 (raw file):
Previously, ajwerner wrote…
@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.
I think it's best to add it as both right now, with the session variable having default value equal to the cluster setting. Thats what we've been doing elsewhere for similar things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright I'm going to take a stab at updating this to get it merged.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
6f98f6b
to
fe685e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rohany is this missing anything?
Reviewed 1 of 20 files at r12.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
I think its all good |
fe685e8
to
afdace2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I reworked the expression creation to build expressions using the AST instead of building a string and then parsing it. This was done out of an abundance of fear of an eventual sql injection attack and also to assuage any fears of quoting mistakes.
Once it builds I think I'm going to bors it. I played with it manually, it works and it's cool!
Thanks again @aayushshah15 for all of your hard work!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
This PR accompanies the hash sharded indexes RFC (cockroachdb#42924). Currently, in CockroachDB, write workloads that are sequential on a particular key will cause a hotspot on a single range if there's any sort of index with the said key as a prefix. This PR alleviates this problem by allowing users to _shard_ their indexes into a specific number of buckets by creating a computed shard column under the hood and prefixing the index with this shard column in order to distribute sequential traffic uniformly over the index's ranges. Fixes cockroachdb#39340 Release note (sql change): Add native support of hash-sharded indexes with new `USING HASH WITH BUCKET_COUNT = <n>` syntax for indices (including the primary index of a table). This feature allows users to easily relieve write hot-spots caused by sequential insert patterns at the cost of scan time for queries over the hashed dimension.
afdace2
to
3a14a3b
Compare
bors r=rohany,aayushshah |
42922: sql: native support for creating hash sharded indexes r=rohany,aayushshah a=aayushshah15 This PR accompanies the hash sharded indexes RFC (#42924). Currently, in CockroachDB, write workloads that are sequential on a particular key will cause a hotspot on a single range if there's any sort of index with the said key as a prefix. This PR alleviates this problem by allowing users to _shard_ their indexes into a specific number of buckets by creating a computed shard column under the hood and prefixing the index with this shard column in order to distribute sequential traffic uniformly over the index's ranges. Fixes #39340 TODOs: 1) Gate the creation of hash sharded indexes to 20.1 2) Better testing Release note (sql change): Improve user experience for creating hash sharded indexes. Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_table.go, line 1205 at r14 (raw file):
} if d.PrimaryKey.Sharded { if !cluster.Version.IsActive(ctx, st, cluster.VersionHashShardedIndexes) {
st
can be nil here. cluster.Version
can also be uninitialized (see comment above the if st != nil
block above).
I'm not familiar enough with this area to figure what should be done here if st == nil
or if active cluster version == cluster.ClusterVersion{}
. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.
I think using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 20 files at r12, 14 of 14 files at r13.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_table.go, line 1286 at r11 (raw file):
Previously, ajwerner wrote…
Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.
Done.
pkg/sql/create_table.go, line 1205 at r14 (raw file):
Previously, irfansharif (irfan sharif) wrote…
st
can be nil here.cluster.Version
can also be uninitialized (see comment above theif st != nil
block above).I'm not familiar enough with this area to figure what should be done here if
st == nil
or ifactive cluster version == cluster.ClusterVersion{}
. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.
in what cases can st == nil
be true here? If it's testing, can we fix those cases by plumbing cluster settings in?
When is active cluster version ClusterVersion{}
? Shouldn't we default that value to the minimum cluster version for this release?
pkg/sql/drop_index.go, line 336 at r11 (raw file):
Previously, ajwerner wrote…
why not
&&
and lose a layer of indentation?
Done.
pkg/sql/drop_index.go, line 362 at r11 (raw file):
Previously, ajwerner wrote…
nit: please use
&&
on the conditions rather than the nestedif
.
Done.
I take it back -- I can't seem to find whatever was passing a nil settings into here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_table.go, line 1205 at r14 (raw file):
Previously, ajwerner wrote…
in what cases can
st == nil
be true here? If it's testing, can we fix those cases by plumbing cluster settings in?When is active cluster version
ClusterVersion{}
? Shouldn't we default that value to the minimum cluster version for this release?
cockroach/pkg/settings/cluster/settings.go
Line 286 in 6595c62
return ClusterVersion{} |
It's ClusterVersion{}
when the version is uninitialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_table.go, line 1205 at r14 (raw file):
Previously, irfansharif (irfan sharif) wrote…
cockroach/pkg/settings/cluster/settings.go
Line 286 in 6595c62
return ClusterVersion{} It's
ClusterVersion{}
when the version is uninitialized.
I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)
pkg/sql/create_table.go, line 1205 at r14 (raw file):
Previously, ajwerner wrote…
I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.
Done (#45455 (comment)). I'll leave the nil check cleanup to Rohan, I think it affects more places than just here anyway.
This PR accompanies the hash sharded indexes RFC (#42924).
Currently, in CockroachDB, write workloads that are sequential on a
particular key will cause a hotspot on a single range if there's any
sort of index with the said key as a prefix.
This PR alleviates this problem by allowing users to shard their
indexes into a specific number of buckets by creating a computed shard
column under the hood and prefixing the index with this shard column in
order to distribute sequential traffic uniformly over the index's ranges.
Fixes #39340
TODOs:
Release note (sql change): Improve user experience for creating hash
sharded indexes.