sql: native support for creating hash sharded indexes #42922

aayushshah15 · 2019-12-03T20:39:30Z

This PR accompanies the hash sharded indexes RFC (#42924).

Currently, in CockroachDB, write workloads that are sequential on a
particular key will cause a hotspot on a single range if there's any
sort of index with the said key as a prefix.

This PR alleviates this problem by allowing users to shard their
indexes into a specific number of buckets by creating a computed shard
column under the hood and prefixing the index with this shard column in
order to distribute sequential traffic uniformly over the index's ranges.

Fixes #39340

TODOs:

Gate the creation of hash sharded indexes to 20.1
Better testing

Release note (sql change): Improve user experience for creating hash
sharded indexes.

cockroach-teamcity · 2019-12-03T20:39:37Z

This change is

This RFC proposes to provide better user experience when creating hash sharded indexes, in order to alleviate single range hotspots in sequential workloads. This PR accompanies cockroachdb#42922 which implements the ideas described in the RFC. Release note: None

rohany

I took a cursory look at this after reading the RFC

Reviewed 6 of 19 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)

pkg/sql/create_index.go, line 142 at r1 (raw file):

}

func createAndAddShardColToTable(

I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.

pkg/sql/create_table.go, line 1124 at r1 (raw file):

			idxColumns := d.Columns
			if d.Sharded != nil {
				var colNames []string

This code feels like its been duplicated many times -- is there a way to unify this?

ajwerner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):

                        FAMILY "primary" (a, a_shard__internal, rowid)
)

We should write tests around CREATE INDEX. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.

There are some logic tests which look at plans. They may be useful here.

pkg/sql/parser/sql.y, line 4814 at r2 (raw file):

opt_hash_sharded:
  USING HASH WITH BUCKET_COUNT '=' a_expr

I made this comment on the other PR. How do you feel about using the opt_with_clause here instead of makign BUCKET_COUNT a token in the grammar?

aayushshah15 · 2019-12-04T18:54:52Z

pkg/sql/parser/sql.y, line 4814 at r2 (raw file):

Previously, ajwerner wrote…

I made this comment on the other PR. How do you feel about using the opt_with_clause here instead of makign BUCKET_COUNT a token in the grammar?

Yup sounds good, on it.

yuzefovich

Nice work!

I'm not too familiar with this part of the codebase, but I left some nits and questions.

Reviewed 12 of 19 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, and @solongordon)

pkg/sql/create_index.go, line 105 at r2 (raw file):

		// check constraint here, but doing it above this function would lead to a bunch
		// of code duplication.
		ckDef, err := makeShardCheckConstraintDef(tableDesc, int(buckets), shardCol)

The error here is unchecked.

Why you don't like creating a check constraint here?

pkg/sql/create_index.go, line 145 at r2 (raw file):

	shardBuckets int, desc *sqlbase.MutableTableDescriptor, colNames []string, shouldAssignID bool,
) (*sqlbase.ColumnDescriptor, error) {
	shardCol, err := makeShardColumnDesc(colNames, shardBuckets, false)

[nit]: please add an inline comment what argument false is here for.

pkg/sql/create_index.go, line 149 at r2 (raw file):

		return nil, err
	}
	if !hasColumn(*desc, shardCol.Name) {

This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.

pkg/sql/create_index.go, line 169 at r2 (raw file):

	// index is created in a 'CREATE TABLE` statement but I can't be 100% sure. Does this
	// seem like a valid thing to do?
	if shouldAssignID {

Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?

pkg/sql/create_table.go, line 1124 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This code feels like its been duplicated many times -- is there a way to unify this?

+1

pkg/sql/sem/tree/create.go, line 566 at r2 (raw file):

	if node.Name != "" {
		ctx.FormatNode(&node.Name)
		ctx.WriteByte(' ')

We don't need this space?

pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):

	if !ok {
		return 0, pgerror.New(pgcode.InvalidParameterValue,
			`BUCKET_COUNT must be a strictly positive integer value.`)

[nit]: the error message could be extracted into a variable.

pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):

// IsSharded returns whether the index is hash sharded or not.
func (desc *IndexDescriptor) IsSharded() bool {
	return !reflect.DeepEqual(desc.Sharded, ShardedDescriptor{})

This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?

pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

  // Sharded, if it's not the zero value, describes how this index is sharded.
  optional ShardedDescriptor sharded = 18 [(gogoproto.nullable) = false];

I think this field could be nullable, and then IsSharded check would compare this field against nil, no?

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_index.go, line 142 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.

Thanks for pointing this out, fixed.

pkg/sql/create_index.go, line 105 at r2 (raw file):

Previously, yuzefovich wrote…

The error here is unchecked.

Why you don't like creating a check constraint here?

I didn't like the idea of creating the shard column, its check constraint, etc inside of a method named MakeIndexDescriptor since that name makes it seem like it should only be doing that one thing. Fixed now.

pkg/sql/create_index.go, line 145 at r2 (raw file):

Previously, yuzefovich wrote…

[nit]: please add an inline comment what argument false is here for.

Done.

pkg/sql/create_index.go, line 149 at r2 (raw file):

Previously, yuzefovich wrote…

This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.

Made it more explicit.

pkg/sql/create_index.go, line 169 at r2 (raw file):

Previously, yuzefovich wrote…

Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?

I changed this to instead do a separate pass through all the indexes in the table (see AssignFamiliesToShardColumns) and if they're hash sharded, assign the shard column to the column family of the first column in the set of index columns. imo, it seems cleaner this way. Let me know if any of this doesn't make sense.

pkg/sql/create_table.go, line 1124 at r1 (raw file):

Previously, yuzefovich wrote…

+1

I wanted to stay consistent to the pattern followed in the rest of that method, but in retrospect this seems misguided. Fixed now.

pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):

Previously, yuzefovich wrote…

[nit]: the error message could be extracted into a variable.

Done.

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):

Previously, ajwerner wrote…

We should write tests around CREATE INDEX. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.

There are some logic tests which look at plans. They may be useful here.

Done.

pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):

Previously, yuzefovich wrote…

This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?

Changed that field to a pointer, see comment below.

pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

Previously, yuzefovich wrote…

I think this field could be nullable, and then IsSharded check would compare this field against nil, no?

Done. I'm curious, is there a good reason we force nullable = false for all the other fields?

aayushshah15

I still have to gate sharded index creation behind 20.1, but otherwise this is RFAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

rohany

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_index.go, line 84 at r3 (raw file):

	}

	if n.Sharded != nil {

Small nit, but I feel like this code would be cleaner if it was possible to move it into a separate function.

pkg/sql/create_table.go, line 992 at r3 (raw file):

}

func hasColumn(desc sqlbase.MutableTableDescriptor, colName string) bool {

There is already a findColumnByName function you could use

pkg/sql/drop_index.go, line 131 at r3 (raw file):

				for i := range tableDesc.Columns {
					if tableDesc.Columns[i].ID == shardColDesc.ID {
						tableDesc.Columns = append(tableDesc.Columns[:i:i], tableDesc.Columns[i+1:]...)

this allocates basically everytime you delete one of the elements -- probably better to do something like

tmp := tableDesc.Columns[:0]
// do the loop
if keep element {
  tmp = append(tmp, elem)
}
tableDesc.Columns = tmp

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

SHOW CREATE TABLE sharded_primary
----
sharded_primary  CREATE TABLE sharded_primary (

Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

                   a INT8 NULL,
                   INDEX sharded_secondary_a_shard_10_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=10,
                   INDEX sharded_secondary_a_shard_4_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=4,

I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?

pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

// 
// Hash sharded clause: 
//    USING HASH WITH BUCKET_COUNT = <shard_buckets>

just a thought, have we ruled out the syntax of

USING HASH WITH a_expr BUCKETS

pkg/sql/parser/sql.y, line 5132 at r3 (raw file):

// %SeeAlso: CREATE TABLE, SHOW INDEXES, SHOW CREATE,
// WEBDOCS/create-index.html
// TODO DURING REVIEW: how do I clarify above that the hash sharded clause only applies to non-inverted indexes?

You'd want that to just show up in the docs -- i think we just do pure syntax here.

pkg/sql/sem/tree/create.go, line 217 at r3 (raw file):

	}
	PrimaryKey struct {
		PrimaryKey   bool

rename this to IsPrimaryKey or something -- seeing this used in other places in the code was confusing

pkg/sql/sqlbase/structured.go, line 745 at r3 (raw file):

func (desc *MutableTableDescriptor) AssignFamiliesToShardColumns() error {
	getColumnFamilyForShard := func(shardCol string, idxColumns []string) (string, error) {
		for _, f := range desc.GetFamilies() {

just looping over the Families slice would be fine. also, we usually loop over the indices and index into the list for these larger structs to avoid copies

pkg/sql/sqlbase/structured.go, line 765 at r3 (raw file):

				return err
			}
			desc.RemoveColumnFromFamily(shardColDesc.ID)

Why do you remove the column from a family and then add it back?

pkg/sql/sqlbase/structured.go, line 775 at r3 (raw file):

// PrintColumnNamesAndIDs prints list of column names and ids to stdout.
func (desc *TableDescriptor) PrintColumnNamesAndIDs() {

should these be removed before merging?

pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Done. I'm curious, is there a good reason we force nullable = false for all the other fields?

we do nullable = false to avoid a pointer (an allocation). I think you can make it not nullable and use the generated IsSet method.

pkg/sql/sqlbase/structured.proto, line 227 at r3 (raw file):

  // ColumnNames lists the names of the columns used to compute the shard column's
  // values.
  repeated string column_names = 3;

would it make sense to just hold onto column ID's here? I'm not sure why we need the names.

pkg/sql/sqlbase/table.go, line 225 at r3 (raw file):

// AddShardToIndexDesc populates the given index descriptor with details pertaining to how
// it's sharded.
func AddShardToIndexDesc(

This method basically doesn't do anything. I'd advocate just doing whatever you do here where you need it.

pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

// GetShardColumnName generates a name for the hidden shard column to be used to create a
// hash sharded index.
func GetShardColumnName(colNames []string, buckets int32) string {

this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?

rohany

Theres alot of descriptor manipulation going on here -- since show create isn't a good way to introspect what the descriptor actually looks like at the end of the process, i think there should be some tests that look at raw table descriptors after the creation process and ensure things are as you expect.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

aayushshah15

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?

No, we'd decided that we wanted it to be transparent to the user.

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?

This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.

pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

just a thought, have we ruled out the syntax of
USING HASH WITH a_expr BUCKETS

We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.

pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?

This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the startExec method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).

rohany

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

No, we'd decided that we wanted it to be transparent to the user.

this is opaque though right -- users don't know (or just can't see) what using hash actually does under the hood

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.

i meant that its allowed, but you might run into name conflicts when you auto generate these names if you aren't careful

pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.

makes sense

pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the startExec method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).

yup makes sense.

aayushshah15

Before this merges, can you also add a check in alter_table.go that disallows primary key changes from tables that have a hash shared primary key?

Done.

This is RFAL.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 1 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I added some logic tests around this, however they are currently failing for reasons I haven't looked into yet. I'll try to understand what's happening and get them deflaked soon.

The tests were flaky because the logictest package adds random column family definitions if a CREATE TABLE statement doesn't contain any. See: #41013 for details. Fixed now.

ajwerner

nit: get rid of the indentation on the release note.

I think we can spruce up the release note in general.

This is super close. Thanks for all your hard work! I'd love to get this merged this week. Again let us know if you anticipate having any trouble working on this.

Reviewed 1 of 11 files at r5, 27 of 27 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/settings/cluster/cockroach_versions.go, line 409 at r11 (raw file):

		// ranges with said shard column.
		Key:     VersionHashShardedIndexes,
		Version: roachpb.Version{Major: 19, Minor: 2, Unstable: 11},

This needs an update

pkg/settings/cluster/versionkey_string.go, line 0 at r11 (raw file):
This needs to be regenerated after the fix.

pkg/sql/create_index.go, line 217 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

If we stick with the current version of the ComputeExpr (where we're summing the hashes of individual columns), the commutative nature of the expression makes it such that the Name of a shard column uniquely identifies it. If we end up switching back to the old ComputeExpr (where we do fnv(columnSet..)) then we end up in a situation where existingShardCol could differ from the new shardCol. Is this your concern? existingShardCol would also differ from shardCol if we end up changing the way we compute the shard columns down the line in a subsequent release.

I mostly just didn't like the idea that this code was going to make assumptions about the only reason FindColumnByName might return an error.

In most cases I agree with you that it's fine if the column used a different expression than maybe we might now create. I just want to call out the possibility to a reader.

pkg/sql/create_table.go, line 1519 at r4 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

wow TIL.

I still worry there are cases where this doesn't work. What does this expression actually look like for these weird cases? Try a case with a capital letter in the middle of the word. Maybe we store the string with the "s in the colNames, I don't know, but given I had the question can you find out and call out in an NB at the point of formatting above?

pkg/sql/create_table.go, line 1143 at r5 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Thanks, I lifted the common stuff into a shared function and it's definitely much cleaner now. There is still some duplication (in case the sharded primary key is a ColumnTableDef) but the way MakeTableDesc is currently structured makes it a bit hard to improve. Let me know if you'd like something done differently.

lgtm, thanks.

pkg/sql/create_table.go, line 1286 at r11 (raw file):

		}
		if newColumn {
			buckets, _ := tree.EvalShardBucketCount(d.Sharded.ShardBuckets)

Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.

pkg/sql/drop_index.go, line 336 at r11 (raw file):

		}
		// The index being deleted could be used as the origin index for this foreign key.
		if idx.IsValidOriginIndex(fk.OriginColumnIDs) {

why not && and lose a layer of indentation?

pkg/sql/drop_index.go, line 362 at r11 (raw file):

			// If we haven't found a replacement candidate for this foreign key, then
			// we need a cascade to delete this index.
			if !indexHasReplacementCandidate(canReplace) {

nit: please use && on the conditions rather than the nested if.

pkg/sql/vars.go, line 871 at r11 (raw file):

	},

	// CockroachDB extension.

@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 463 at r11 (raw file):

                INDEX foo ("'quotes' in the column's name" ASC) USING HASH WITH BUCKET_COUNT = 4,
                FAMILY "primary" ("I am a column with spaces", "'quotes' in the column's name", "crdb_internal_I am a column with spaces_shard_12", "crdb_internal_'quotes' in the column's name_shard_4")
)

nit: newline

pkg/sql/sqlbase/table.go, line 231 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Reverted.

Whoops. Sorry for the back and forth. I prefer it this way.

rohany

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/vars.go, line 871 at r11 (raw file):

Previously, ajwerner wrote…

@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.

I think it's best to add it as both right now, with the session variable having default value equal to the cluster setting. Thats what we've been doing elsewhere for similar things.

ajwerner

Alright I'm going to take a stab at updating this to get it merged.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

ajwerner

@rohany is this missing anything?

Reviewed 1 of 20 files at r12.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

rohany · 2020-02-10T23:01:22Z

I think its all good

ajwerner

Alright, I reworked the expression creation to build expressions using the AST instead of building a string and then parsing it. This was done out of an abundance of fear of an eventual sql injection attack and also to assuage any fears of quoting mistakes.

Once it builds I think I'm going to bors it. I played with it manually, it works and it's cool!

Thanks again @aayushshah15 for all of your hard work!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

aayushshah15 · 2020-02-11T04:19:55Z

thanks for all your help @rohany and @ajwerner, glad to see it go in.

This PR accompanies the hash sharded indexes RFC (cockroachdb#42924). Currently, in CockroachDB, write workloads that are sequential on a particular key will cause a hotspot on a single range if there's any sort of index with the said key as a prefix. This PR alleviates this problem by allowing users to _shard_ their indexes into a specific number of buckets by creating a computed shard column under the hood and prefixing the index with this shard column in order to distribute sequential traffic uniformly over the index's ranges. Fixes cockroachdb#39340 Release note (sql change): Add native support of hash-sharded indexes with new `USING HASH WITH BUCKET_COUNT = <n>` syntax for indices (including the primary index of a table). This feature allows users to easily relieve write hot-spots caused by sequential insert patterns at the cost of scan time for queries over the hashed dimension.

ajwerner · 2020-02-11T14:32:19Z

bors r=rohany,aayushshah

42922: sql: native support for creating hash sharded indexes r=rohany,aayushshah a=aayushshah15 This PR accompanies the hash sharded indexes RFC (#42924). Currently, in CockroachDB, write workloads that are sequential on a particular key will cause a hotspot on a single range if there's any sort of index with the said key as a prefix. This PR alleviates this problem by allowing users to _shard_ their indexes into a specific number of buckets by creating a computed shard column under the hood and prefixing the index with this shard column in order to distribute sequential traffic uniformly over the index's ranges. Fixes #39340 TODOs: 1) Gate the creation of hash sharded indexes to 20.1 2) Better testing Release note (sql change): Improve user experience for creating hash sharded indexes. Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>

craig · 2020-02-11T15:11:43Z

Build succeeded

GitHub CI (Cockroach)

irfansharif

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_table.go, line 1205 at r14 (raw file):

			}
			if d.PrimaryKey.Sharded {
				if !cluster.Version.IsActive(ctx, st, cluster.VersionHashShardedIndexes) {

st can be nil here. cluster.Version can also be uninitialized (see comment above the if st != nil block above).

I'm not familiar enough with this area to figure what should be done here if st == nil or if active cluster version == cluster.ClusterVersion{}. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.

rohany · 2020-02-27T21:46:13Z

I think using ActiveVersionOrEmpty and allowing the hash sharded index to be created if the resulting version != cluster.ClusterVersion{} makes sense.

ajwerner

Reviewed 10 of 20 files at r12, 14 of 14 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_table.go, line 1286 at r11 (raw file):

Previously, ajwerner wrote…

Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.

Done.

pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, irfansharif (irfan sharif) wrote…

st can be nil here. cluster.Version can also be uninitialized (see comment above the if st != nil block above).

I'm not familiar enough with this area to figure what should be done here if st == nil or if active cluster version == cluster.ClusterVersion{}. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.

in what cases can st == nil be true here? If it's testing, can we fix those cases by plumbing cluster settings in?

When is active cluster version ClusterVersion{}? Shouldn't we default that value to the minimum cluster version for this release?

pkg/sql/drop_index.go, line 336 at r11 (raw file):

Previously, ajwerner wrote…

why not && and lose a layer of indentation?

Done.

pkg/sql/drop_index.go, line 362 at r11 (raw file):

Previously, ajwerner wrote…

nit: please use && on the conditions rather than the nested if.

Done.

rohany · 2020-02-27T22:27:41Z

I take it back -- I can't seem to find whatever was passing a nil settings into here.

irfansharif

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, ajwerner wrote…

in what cases can st == nil be true here? If it's testing, can we fix those cases by plumbing cluster settings in?

When is active cluster version ClusterVersion{}? Shouldn't we default that value to the minimum cluster version for this release?

cockroach/pkg/settings/cluster/settings.go

Line 286 in 6595c62

return ClusterVersion{}

It's ClusterVersion{} when the version is uninitialized.

ajwerner

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, irfansharif (irfan sharif) wrote…

cockroach/pkg/settings/cluster/settings.go

Line 286 in 6595c62

return ClusterVersion{}

It's ClusterVersion{} when the version is uninitialized.

I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.

irfansharif

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, ajwerner wrote…

I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.

Done (#45455 (comment)). I'll leave the nil check cleanup to Rohan, I think it affects more places than just here anyway.

aayushshah15 requested review from ajwerner, yuzefovich, solongordon and jordanlewis December 3, 2019 20:40

aayushshah15 mentioned this pull request Dec 3, 2019

RFC: easier hash sharded indexes #42924

Merged

aayushshah15 force-pushed the hash_sharded_impl branch from e38c575 to 9b7a23f Compare December 3, 2019 21:07

rohany suggested changes Dec 3, 2019

View reviewed changes

ajwerner reviewed Dec 4, 2019

View reviewed changes

yuzefovich reviewed Dec 10, 2019

View reviewed changes

aayushshah15 force-pushed the hash_sharded_impl branch from 9b7a23f to 0d736cf Compare December 10, 2019 20:17

aayushshah15 commented Dec 10, 2019

View reviewed changes

aayushshah15 force-pushed the hash_sharded_impl branch 6 times, most recently from 9b58934 to 7954d4c Compare December 16, 2019 23:46

aayushshah15 commented Dec 16, 2019

View reviewed changes

aayushshah15 requested review from ajwerner, yuzefovich and rohany December 16, 2019 23:49

rohany suggested changes Dec 17, 2019

View reviewed changes

rohany reviewed Dec 17, 2019

View reviewed changes

aayushshah15 commented Dec 17, 2019

View reviewed changes

rohany reviewed Dec 17, 2019

View reviewed changes

aayushshah15 force-pushed the hash_sharded_impl branch from bd12739 to b28482c Compare February 3, 2020 13:11

aayushshah15 commented Feb 3, 2020

View reviewed changes

aayushshah15 force-pushed the hash_sharded_impl branch from b28482c to 6d1cb93 Compare February 3, 2020 14:40

ajwerner reviewed Feb 3, 2020

View reviewed changes

rohany reviewed Feb 3, 2020

View reviewed changes

ajwerner reviewed Feb 10, 2020

View reviewed changes

ajwerner force-pushed the hash_sharded_impl branch 2 times, most recently from 6f98f6b to fe685e8 Compare February 10, 2020 22:37

ajwerner reviewed Feb 10, 2020

View reviewed changes

ajwerner force-pushed the hash_sharded_impl branch from fe685e8 to afdace2 Compare February 11, 2020 01:47

ajwerner reviewed Feb 11, 2020

View reviewed changes

ajwerner force-pushed the hash_sharded_impl branch from afdace2 to 3a14a3b Compare February 11, 2020 13:50

craig bot merged commit 3a14a3b into cockroachdb:master Feb 11, 2020

awoods187 mentioned this pull request Feb 11, 2020

sql: hash sharded indexes cockroachdb/docs#6518

Closed

irfansharif reviewed Feb 27, 2020

View reviewed changes

ajwerner reviewed Feb 27, 2020

View reviewed changes

irfansharif reviewed Feb 27, 2020

View reviewed changes

ajwerner reviewed Feb 27, 2020

View reviewed changes

irfansharif mentioned this pull request Feb 27, 2020

clusterversion: significantly rework cluster version handling #45455

Merged

irfansharif reviewed Feb 27, 2020

View reviewed changes

rohany mentioned this pull request Feb 27, 2020

bug: investigate use of ActiveVersionOrEmpty in create_table.go #45519

Closed

jseldess mentioned this pull request Mar 2, 2020

sql: native support for creating hash sharded indexes cockroachdb/docs#6730

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: native support for creating hash sharded indexes #42922

sql: native support for creating hash sharded indexes #42922

aayushshah15 commented Dec 3, 2019 •

edited

Loading

cockroach-teamcity commented Dec 3, 2019

rohany left a comment

ajwerner left a comment

aayushshah15 commented Dec 4, 2019

yuzefovich left a comment

aayushshah15 left a comment

aayushshah15 left a comment

aayushshah15 left a comment

rohany left a comment

rohany left a comment

aayushshah15 left a comment

rohany left a comment

aayushshah15 left a comment

ajwerner left a comment

rohany left a comment

ajwerner left a comment

ajwerner left a comment

rohany commented Feb 10, 2020

ajwerner left a comment

aayushshah15 commented Feb 11, 2020

ajwerner commented Feb 11, 2020

craig bot commented Feb 11, 2020

irfansharif left a comment

rohany commented Feb 27, 2020

ajwerner left a comment

rohany commented Feb 27, 2020

irfansharif left a comment

ajwerner left a comment

irfansharif left a comment

sql: native support for creating hash sharded indexes #42922

sql: native support for creating hash sharded indexes #42922

Conversation

aayushshah15 commented Dec 3, 2019 • edited Loading

cockroach-teamcity commented Dec 3, 2019

rohany left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

aayushshah15 commented Dec 4, 2019

yuzefovich left a comment

Choose a reason for hiding this comment

aayushshah15 left a comment

Choose a reason for hiding this comment

aayushshah15 left a comment

Choose a reason for hiding this comment

aayushshah15 left a comment

Choose a reason for hiding this comment

rohany left a comment

Choose a reason for hiding this comment

rohany left a comment

Choose a reason for hiding this comment

aayushshah15 left a comment

Choose a reason for hiding this comment

rohany left a comment

Choose a reason for hiding this comment

aayushshah15 left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

rohany left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

rohany commented Feb 10, 2020

ajwerner left a comment

Choose a reason for hiding this comment

aayushshah15 commented Feb 11, 2020

ajwerner commented Feb 11, 2020

craig bot commented Feb 11, 2020

Build succeeded

irfansharif left a comment

Choose a reason for hiding this comment

rohany commented Feb 27, 2020

ajwerner left a comment

Choose a reason for hiding this comment

rohany commented Feb 27, 2020

irfansharif left a comment

Choose a reason for hiding this comment

ajwerner left a comment

Choose a reason for hiding this comment

irfansharif left a comment

Choose a reason for hiding this comment

aayushshah15 commented Dec 3, 2019 •

edited

Loading