Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: native support for creating hash sharded indexes #42922

Merged
merged 1 commit into from
Feb 11, 2020

Conversation

aayushshah15
Copy link
Contributor

@aayushshah15 aayushshah15 commented Dec 3, 2019

This PR accompanies the hash sharded indexes RFC (#42924).

Currently, in CockroachDB, write workloads that are sequential on a
particular key will cause a hotspot on a single range if there's any
sort of index with the said key as a prefix.

This PR alleviates this problem by allowing users to shard their
indexes into a specific number of buckets by creating a computed shard
column under the hood and prefixing the index with this shard column in
order to distribute sequential traffic uniformly over the index's ranges.

Fixes #39340

TODOs:

  1. Gate the creation of hash sharded indexes to 20.1
  2. Better testing

Release note (sql change): Improve user experience for creating hash
sharded indexes.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Dec 3, 2019
This RFC proposes to provide better user experience when creating hash
sharded indexes, in order to alleviate single range hotspots in
sequential workloads.

This PR accompanies cockroachdb#42922 which implements the ideas described in the
RFC.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Dec 3, 2019
This RFC proposes to provide better user experience when creating hash
sharded indexes, in order to alleviate single range hotspots in
sequential workloads.

This PR accompanies cockroachdb#42922 which implements the ideas described in the
RFC.

Release note: None
Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a cursory look at this after reading the RFC

Reviewed 6 of 19 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)


pkg/sql/create_index.go, line 142 at r1 (raw file):

}

func createAndAddShardColToTable(

I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.


pkg/sql/create_table.go, line 1124 at r1 (raw file):

			idxColumns := d.Columns
			if d.Sharded != nil {
				var colNames []string

This code feels like its been duplicated many times -- is there a way to unify this?

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @solongordon, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):

                        FAMILY "primary" (a, a_shard__internal, rowid)
)

We should write tests around CREATE INDEX. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.

There are some logic tests which look at plans. They may be useful here.


pkg/sql/parser/sql.y, line 4814 at r2 (raw file):

opt_hash_sharded:
  USING HASH WITH BUCKET_COUNT '=' a_expr

I made this comment on the other PR. How do you feel about using the opt_with_clause here instead of makign BUCKET_COUNT a token in the grammar?

@aayushshah15
Copy link
Contributor Author


pkg/sql/parser/sql.y, line 4814 at r2 (raw file):

Previously, ajwerner wrote…

I made this comment on the other PR. How do you feel about using the opt_with_clause here instead of makign BUCKET_COUNT a token in the grammar?

Yup sounds good, on it.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I'm not too familiar with this part of the codebase, but I left some nits and questions.

Reviewed 12 of 19 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, and @solongordon)


pkg/sql/create_index.go, line 105 at r2 (raw file):

		// check constraint here, but doing it above this function would lead to a bunch
		// of code duplication.
		ckDef, err := makeShardCheckConstraintDef(tableDesc, int(buckets), shardCol)

The error here is unchecked.

Why you don't like creating a check constraint here?


pkg/sql/create_index.go, line 145 at r2 (raw file):

	shardBuckets int, desc *sqlbase.MutableTableDescriptor, colNames []string, shouldAssignID bool,
) (*sqlbase.ColumnDescriptor, error) {
	shardCol, err := makeShardColumnDesc(colNames, shardBuckets, false)

[nit]: please add an inline comment what argument false is here for.


pkg/sql/create_index.go, line 149 at r2 (raw file):

		return nil, err
	}
	if !hasColumn(*desc, shardCol.Name) {

This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.


pkg/sql/create_index.go, line 169 at r2 (raw file):

	// index is created in a 'CREATE TABLE` statement but I can't be 100% sure. Does this
	// seem like a valid thing to do?
	if shouldAssignID {

Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?


pkg/sql/create_table.go, line 1124 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

This code feels like its been duplicated many times -- is there a way to unify this?

+1


pkg/sql/sem/tree/create.go, line 566 at r2 (raw file):

	if node.Name != "" {
		ctx.FormatNode(&node.Name)
		ctx.WriteByte(' ')

We don't need this space?


pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):

	if !ok {
		return 0, pgerror.New(pgcode.InvalidParameterValue,
			`BUCKET_COUNT must be a strictly positive integer value.`)

[nit]: the error message could be extracted into a variable.


pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):

// IsSharded returns whether the index is hash sharded or not.
func (desc *IndexDescriptor) IsSharded() bool {
	return !reflect.DeepEqual(desc.Sharded, ShardedDescriptor{})

This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?


pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

  // Sharded, if it's not the zero value, describes how this index is sharded.
  optional ShardedDescriptor sharded = 18 [(gogoproto.nullable) = false]; 

I think this field could be nullable, and then IsSharded check would compare this field against nil, no?

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_index.go, line 142 at r1 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I'm not 100% sure about computed columns, but I'm pretty sure they will need a backfill if they are stored on disk -- you'll have to add the column as a mutation rather than adding it directly to the table when this isn't in a create table statement.

Thanks for pointing this out, fixed.


pkg/sql/create_index.go, line 105 at r2 (raw file):

Previously, yuzefovich wrote…

The error here is unchecked.

Why you don't like creating a check constraint here?

I didn't like the idea of creating the shard column, its check constraint, etc inside of a method named MakeIndexDescriptor since that name makes it seem like it should only be doing that one thing. Fixed now.


pkg/sql/create_index.go, line 145 at r2 (raw file):

Previously, yuzefovich wrote…

[nit]: please add an inline comment what argument false is here for.

Done.


pkg/sql/create_index.go, line 149 at r2 (raw file):

Previously, yuzefovich wrote…

This check seems slightly suspicious to me - the function is named "create and add" shard column, so I'd expect for shard column to always be added. It probably deserves a callout in the comment for the method.

Made it more explicit.


pkg/sql/create_index.go, line 169 at r2 (raw file):

Previously, yuzefovich wrote…

Why wouldn't we want to assign an ID on every creation? Is this what your comment is about as well?

I changed this to instead do a separate pass through all the indexes in the table (see AssignFamiliesToShardColumns) and if they're hash sharded, assign the shard column to the column family of the first column in the set of index columns. imo, it seems cleaner this way. Let me know if any of this doesn't make sense.


pkg/sql/create_table.go, line 1124 at r1 (raw file):

Previously, yuzefovich wrote…

+1

I wanted to stay consistent to the pattern followed in the rest of that method, but in retrospect this seems misguided. Fixed now.


pkg/sql/sqlbase/structured.go, line 499 at r2 (raw file):

Previously, yuzefovich wrote…

[nit]: the error message could be extracted into a variable.

Done.

@aayushshah15 aayushshah15 force-pushed the hash_sharded_impl branch 6 times, most recently from 9b58934 to 7954d4c Compare December 16, 2019 23:46
Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 65 at r2 (raw file):

Previously, ajwerner wrote…

We should write tests around CREATE INDEX. Also it'd be good to exercises cases of dropping and re-adding sharded indexes. Also probably worthwhile to add some rows to the table.

There are some logic tests which look at plans. They may be useful here.

Done.


pkg/sql/sqlbase/structured.go, line 587 at r2 (raw file):

Previously, yuzefovich wrote…

This reflection check seems suspicious - I think usually we do interface conversion in such a scenario, is it not possible in this case?

Changed that field to a pointer, see comment below.


pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

Previously, yuzefovich wrote…

I think this field could be nullable, and then IsSharded check would compare this field against nil, no?

Done. I'm curious, is there a good reason we force nullable = false for all the other fields?

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have to gate sharded index creation behind 20.1, but otherwise this is RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_index.go, line 84 at r3 (raw file):

	}

	if n.Sharded != nil {

Small nit, but I feel like this code would be cleaner if it was possible to move it into a separate function.


pkg/sql/create_table.go, line 992 at r3 (raw file):

}

func hasColumn(desc sqlbase.MutableTableDescriptor, colName string) bool {

There is already a findColumnByName function you could use


pkg/sql/drop_index.go, line 131 at r3 (raw file):

				for i := range tableDesc.Columns {
					if tableDesc.Columns[i].ID == shardColDesc.ID {
						tableDesc.Columns = append(tableDesc.Columns[:i:i], tableDesc.Columns[i+1:]...)

this allocates basically everytime you delete one of the elements -- probably better to do something like

tmp := tableDesc.Columns[:0]
// do the loop
if keep element {
  tmp = append(tmp, elem)
}
tableDesc.Columns = tmp

pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

SHOW CREATE TABLE sharded_primary
----
sharded_primary  CREATE TABLE sharded_primary (

Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

                   a INT8 NULL,
                   INDEX sharded_secondary_a_shard_10_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=10,
                   INDEX sharded_secondary_a_shard_4_internal_a_idx (a ASC) USING HASH WITH BUCKET_COUNT=4,

I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?


pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

// 
// Hash sharded clause: 
//    USING HASH WITH BUCKET_COUNT = <shard_buckets>

just a thought, have we ruled out the syntax of

USING HASH WITH a_expr BUCKETS

pkg/sql/parser/sql.y, line 5132 at r3 (raw file):

// %SeeAlso: CREATE TABLE, SHOW INDEXES, SHOW CREATE,
// WEBDOCS/create-index.html
// TODO DURING REVIEW: how do I clarify above that the hash sharded clause only applies to non-inverted indexes?

You'd want that to just show up in the docs -- i think we just do pure syntax here.


pkg/sql/sem/tree/create.go, line 217 at r3 (raw file):

	}
	PrimaryKey struct {
		PrimaryKey   bool

rename this to IsPrimaryKey or something -- seeing this used in other places in the code was confusing


pkg/sql/sqlbase/structured.go, line 745 at r3 (raw file):

func (desc *MutableTableDescriptor) AssignFamiliesToShardColumns() error {
	getColumnFamilyForShard := func(shardCol string, idxColumns []string) (string, error) {
		for _, f := range desc.GetFamilies() {

just looping over the Families slice would be fine. also, we usually loop over the indices and index into the list for these larger structs to avoid copies


pkg/sql/sqlbase/structured.go, line 765 at r3 (raw file):

				return err
			}
			desc.RemoveColumnFromFamily(shardColDesc.ID)

Why do you remove the column from a family and then add it back?


pkg/sql/sqlbase/structured.go, line 775 at r3 (raw file):

// PrintColumnNamesAndIDs prints list of column names and ids to stdout.
func (desc *TableDescriptor) PrintColumnNamesAndIDs() {

should these be removed before merging?


pkg/sql/sqlbase/structured.proto, line 418 at r2 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Done. I'm curious, is there a good reason we force nullable = false for all the other fields?

we do nullable = false to avoid a pointer (an allocation). I think you can make it not nullable and use the generated IsSet method.


pkg/sql/sqlbase/structured.proto, line 227 at r3 (raw file):

  // ColumnNames lists the names of the columns used to compute the shard column's
  // values.
  repeated string column_names = 3;

would it make sense to just hold onto column ID's here? I'm not sure why we need the names.


pkg/sql/sqlbase/table.go, line 225 at r3 (raw file):

// AddShardToIndexDesc populates the given index descriptor with details pertaining to how
// it's sharded.
func AddShardToIndexDesc(

This method basically doesn't do anything. I'd advocate just doing whatever you do here where you need it.


pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

// GetShardColumnName generates a name for the hidden shard column to be used to create a
// hash sharded index.
func GetShardColumnName(colNames []string, buckets int32) string {

this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres alot of descriptor manipulation going on here -- since show create isn't a good way to introspect what the descriptor actually looks like at the end of the process, i think there should be some tests that look at raw table descriptors after the creation process and ensure things are as you expect.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

Do we want users to be able to "inspect" this sharded table and see whats actually happening, or no?

No, we'd decided that we wanted it to be transparent to the user.


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

I might have missed this, but are there any issues with name conflicts if you made another sharded index on a with the same bucket counts here?

This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.


pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

just a thought, have we ruled out the syntax of

USING HASH WITH a_expr BUCKETS

We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.


pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

Previously, rohany (Rohan Yadav) wrote…

this will cause name conflicts if someone applies the duplicate index pattern on sharded indexes right?

This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the startExec method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 8 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

No, we'd decided that we wanted it to be transparent to the user.

this is opaque though right -- users don't know (or just can't see) what using hash actually does under the hood


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 117 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

This isn't a problem since postgres (and CRDB) allows creation of multiple identical indices. I'll add this to the logictests to make it clear. With regards to the hidden computed shard column, we reuse the same shard column if there is already an index on the same set of columns with the same bucket_count.

i meant that its allowed, but you might run into name conflicts when you auto generate these names if you aren't careful


pkg/sql/parser/sql.y, line 5128 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

We didn't consider it since we wanted to borrow from the SQL Server syntax. Additionally, the current approach allows us to add more options to this feature in the future.

makes sense


pkg/sql/sqlbase/table.go, line 237 at r3 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

This is intentional since we want to avoid re-creating multiple shard columns that are always equivalent. See comment above and the startExec method in drop_index (we only drop the hidden shard column once we're sure there aren't any other sharded indexes referring to it).

yup makes sense.

Copy link
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this merges, can you also add a check in alter_table.go that disallows primary key changes from tables that have a hash shared primary key?

Done.

This is RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 1 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

I added some logic tests around this, however they are currently failing for reasons I haven't looked into yet. I'll try to understand what's happening and get them deflaked soon.

The tests were flaky because the logictest package adds random column family definitions if a CREATE TABLE statement doesn't contain any. See: #41013 for details. Fixed now.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: get rid of the indentation on the release note.

I think we can spruce up the release note in general.

This is super close. Thanks for all your hard work! I'd love to get this merged this week. Again let us know if you anticipate having any trouble working on this.

Reviewed 1 of 11 files at r5, 27 of 27 files at r11.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/settings/cluster/cockroach_versions.go, line 409 at r11 (raw file):

		// ranges with said shard column.
		Key:     VersionHashShardedIndexes,
		Version: roachpb.Version{Major: 19, Minor: 2, Unstable: 11},

This needs an update


pkg/settings/cluster/versionkey_string.go, line 0 at r11 (raw file):
This needs to be regenerated after the fix.


pkg/sql/create_index.go, line 217 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

If we stick with the current version of the ComputeExpr (where we're summing the hashes of individual columns), the commutative nature of the expression makes it such that the Name of a shard column uniquely identifies it. If we end up switching back to the old ComputeExpr (where we do fnv(columnSet..)) then we end up in a situation where existingShardCol could differ from the new shardCol. Is this your concern? existingShardCol would also differ from shardCol if we end up changing the way we compute the shard columns down the line in a subsequent release.

I mostly just didn't like the idea that this code was going to make assumptions about the only reason FindColumnByName might return an error.

In most cases I agree with you that it's fine if the column used a different expression than maybe we might now create. I just want to call out the possibility to a reader.


pkg/sql/create_table.go, line 1519 at r4 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

wow TIL.

I still worry there are cases where this doesn't work. What does this expression actually look like for these weird cases? Try a case with a capital letter in the middle of the word. Maybe we store the string with the "s in the colNames, I don't know, but given I had the question can you find out and call out in an NB at the point of formatting above?


pkg/sql/create_table.go, line 1143 at r5 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Thanks, I lifted the common stuff into a shared function and it's definitely much cleaner now. There is still some duplication (in case the sharded primary key is a ColumnTableDef) but the way MakeTableDesc is currently structured makes it a bit hard to improve. Let me know if you'd like something done differently.

lgtm, thanks.


pkg/sql/create_table.go, line 1286 at r11 (raw file):

		}
		if newColumn {
			buckets, _ := tree.EvalShardBucketCount(d.Sharded.ShardBuckets)

Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.


pkg/sql/drop_index.go, line 336 at r11 (raw file):

		}
		// The index being deleted could be used as the origin index for this foreign key.
		if idx.IsValidOriginIndex(fk.OriginColumnIDs) {

why not && and lose a layer of indentation?


pkg/sql/drop_index.go, line 362 at r11 (raw file):

			// If we haven't found a replacement candidate for this foreign key, then
			// we need a cascade to delete this index.
			if !indexHasReplacementCandidate(canReplace) {

nit: please use && on the conditions rather than the nested if.


pkg/sql/vars.go, line 871 at r11 (raw file):

	},

	// CockroachDB extension.

@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.


pkg/sql/logictest/testdata/logic_test/hash_sharded_index, line 463 at r11 (raw file):

                INDEX foo ("'quotes' in the column's name" ASC) USING HASH WITH BUCKET_COUNT = 4,
                FAMILY "primary" ("I am a column with spaces", "'quotes' in the column's name", "crdb_internal_I am a column with spaces_shard_12", "crdb_internal_'quotes' in the column's name_shard_4")
)

nit: newline


pkg/sql/sqlbase/table.go, line 231 at r10 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

Reverted.

Whoops. Sorry for the back and forth. I prefer it this way.

Copy link
Contributor

@rohany rohany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/vars.go, line 871 at r11 (raw file):

Previously, ajwerner wrote…

@rohany does it make sense to have this as a session variable or a cluster setting? I was thinking cluster setting.

I think it's best to add it as both right now, with the session variable having default value equal to the cluster setting. Thats what we've been doing elsewhere for similar things.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I'm going to take a stab at updating this to get it merged.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

@ajwerner ajwerner force-pushed the hash_sharded_impl branch 2 times, most recently from 6f98f6b to fe685e8 Compare February 10, 2020 22:37
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohany is this missing anything?

Reviewed 1 of 20 files at r12.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

@rohany
Copy link
Contributor

rohany commented Feb 10, 2020

I think its all good

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I reworked the expression creation to build expressions using the AST instead of building a string and then parsing it. This was done out of an abundance of fear of an eventual sql injection attack and also to assuage any fears of quoting mistakes.

Once it builds I think I'm going to bors it. I played with it manually, it works and it's cool!

Thanks again @aayushshah15 for all of your hard work!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)

@aayushshah15
Copy link
Contributor Author

thanks for all your help @rohany and @ajwerner, glad to see it go in.

This PR accompanies the hash sharded indexes RFC (cockroachdb#42924).

Currently, in CockroachDB, write workloads that are sequential on a
particular key will cause a hotspot on a single range if there's any
sort of index with the said key as a prefix.

This PR alleviates this problem by allowing users to _shard_ their
indexes into a specific number of buckets by creating a computed shard
column under the hood and prefixing the index with this shard column in
order to distribute sequential traffic uniformly over the index's ranges.

Fixes cockroachdb#39340

Release note (sql change): Add native support of hash-sharded indexes with new
`USING HASH WITH BUCKET_COUNT = <n>` syntax for indices (including the primary
index of a table). This feature allows users to easily relieve write hot-spots
caused by sequential insert patterns at the cost of scan time for queries over
the hashed dimension.
@ajwerner
Copy link
Contributor

bors r=rohany,aayushshah

craig bot pushed a commit that referenced this pull request Feb 11, 2020
42922: sql: native support for creating hash sharded indexes r=rohany,aayushshah a=aayushshah15

This PR accompanies the hash sharded indexes RFC (#42924).

Currently, in CockroachDB, write workloads that are sequential on a
particular key will cause a hotspot on a single range if there's any
sort of index with the said key as a prefix.

This PR alleviates this problem by allowing users to _shard_ their
indexes into a specific number of buckets by creating a computed shard
column under the hood and prefixing the index with this shard column in
order to distribute sequential traffic uniformly over the index's ranges.

Fixes #39340

TODOs:
1) Gate the creation of hash sharded indexes to 20.1
2) Better testing

Release note (sql change): Improve user experience for creating hash
                           sharded indexes.

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
@craig
Copy link
Contributor

craig bot commented Feb 11, 2020

Build succeeded

@craig craig bot merged commit 3a14a3b into cockroachdb:master Feb 11, 2020
Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_table.go, line 1205 at r14 (raw file):

			}
			if d.PrimaryKey.Sharded {
				if !cluster.Version.IsActive(ctx, st, cluster.VersionHashShardedIndexes) {

st can be nil here. cluster.Version can also be uninitialized (see comment above the if st != nil block above).

I'm not familiar enough with this area to figure what should be done here if st == nil or if active cluster version == cluster.ClusterVersion{}. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.

@rohany
Copy link
Contributor

rohany commented Feb 27, 2020

I think using ActiveVersionOrEmpty and allowing the hash sharded index to be created if the resulting version != cluster.ClusterVersion{} makes sense.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 20 files at r12, 14 of 14 files at r13.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_table.go, line 1286 at r11 (raw file):

Previously, ajwerner wrote…

Why are you ignoring this error? If it's due to some assumption that this has already passed, please add a comment.

Done.


pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, irfansharif (irfan sharif) wrote…

st can be nil here. cluster.Version can also be uninitialized (see comment above the if st != nil block above).

I'm not familiar enough with this area to figure what should be done here if st == nil or if active cluster version == cluster.ClusterVersion{}. Any ideas? @rohany perhaps? Ran into this when rebasing #45455, so blocked.

in what cases can st == nil be true here? If it's testing, can we fix those cases by plumbing cluster settings in?

When is active cluster version ClusterVersion{}? Shouldn't we default that value to the minimum cluster version for this release?


pkg/sql/drop_index.go, line 336 at r11 (raw file):

Previously, ajwerner wrote…

why not && and lose a layer of indentation?

Done.


pkg/sql/drop_index.go, line 362 at r11 (raw file):

Previously, ajwerner wrote…

nit: please use && on the conditions rather than the nested if.

Done.

@rohany
Copy link
Contributor

rohany commented Feb 27, 2020

I take it back -- I can't seem to find whatever was passing a nil settings into here.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, ajwerner wrote…

in what cases can st == nil be true here? If it's testing, can we fix those cases by plumbing cluster settings in?

When is active cluster version ClusterVersion{}? Shouldn't we default that value to the minimum cluster version for this release?

return ClusterVersion{}

It's ClusterVersion{} when the version is uninitialized.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, irfansharif (irfan sharif) wrote…

return ClusterVersion{}

It's ClusterVersion{} when the version is uninitialized.

I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.

Copy link
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @ajwerner, @jordanlewis, @rohany, @solongordon, and @yuzefovich)


pkg/sql/create_table.go, line 1205 at r14 (raw file):

Previously, ajwerner wrote…

I'm fine with disallowing in the case that the version is empty though given the node startup flow I think this should still only happen in testing and is worthy of a comment. Let's not introduce the nil check on the setting if we can avoid it.

Done (#45455 (comment)). I'll leave the nil check cleanup to Rohan, I think it affects more places than just here anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: improve user experience for hash-sharded indexes
7 participants