Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Add crdb_internal.locality_value builtin function #37369

Merged
merged 1 commit into from May 15, 2019

Conversation

Projects
None yet
6 participants
@andy-kimball
Copy link
Contributor

commented May 7, 2019

There is currently no easy way to programmatically inspect the locality of the
current node. It appears in crdb_internal.gossip_nodes, but it's hard to work
with because it's JSON, not constant-foldable, and keyed by node id.

This commit adds a new crdb_internal.locality_value builtin function that returns
the value of the locality key given as its argument.

When building geo-distributed applications, this enables a very nice way to
automatically assign the partition key, as illustrated below:

CREATE TABLE charges (
region STRING NOT NULL DEFAULT crdb_internal.locality_value('region'),
id UUID NOT NULL DEFAULT gen_random_uuid(),
...

The DEFAULT expression for the region column automatically inserts the region
value from the current node's locality.

It also enables queries like this, that only touch rows in the current region
(which is necessary to avoid cross-region hops):

SELECT * FROM charges WHERE region = crdb_internal.locality_value('region') AND id = $1

The optimizer folds the function to a constant value, which can then be used
to select an optimal index.

Resolves #37310

Release note (sql change): Adds a new locality_value builtin function that
returns the value of the locality key given as its argument.

@andy-kimball andy-kimball requested review from cockroachdb/sql-execution-prs as code owners May 7, 2019

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

commented May 7, 2019

This change is Reviewable

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

@awoods187, I added you to this since I'm proposing to add a new builtin function, which has product/customer impact. See #37310 for more details on the proposal. I'm looking not just for Eng feedback on the implementation, but also Product feedback on how the builtin works.

@knz
Copy link
Member

left a comment

See my comments #37310 (comment)

The general thrust is all right, but the idea to use labeled tuples is all wrong. Arrays and unnest are the way to go, both from a future extensibility perspective (tuple types must be valid regardless of the node where you run) and from a usability perspective (unnest gives you access to relational operators to operate on the locality data, tuples wouldn't give you that).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @awoods187, @jordanlewis, @knz, and @RaduBerinde)


pkg/sql/sem/builtins/builtins.go, line 2763 at r1 (raw file):

		tree.Overload{
			Types: tree.ArgTypes{},
			ReturnType: func(ctx *tree.SemaContext, _ []tree.TypedExpr) *types.T {

I'd say drop this and have it return fixed type string[][] (array[array[string]])


pkg/sql/sem/builtins/builtins.go, line 2773 at r1 (raw file):

				datums := make(tree.Datums, len(ctx.Locality.Tiers))
				for i := range ctx.Locality.Tiers {
					datums[i] = tree.NewDString(ctx.Locality.Tiers[i].Value)

Make a 2-element array of the key, value pair here


pkg/sql/sem/builtins/builtins.go, line 2776 at r1 (raw file):

				}
				typ := localityReturnType(&ctx.Locality)
				return tree.NewDTuple(typ, datums...), nil

and keep the array here

@RaduBerinde

This comment has been minimized.

Copy link
Member

commented May 8, 2019

I also think tuples with dynamic labels can be a sharp edge.. Could we have a function variant that returns the value of a given locality key? locality-value('region')

@jordanlewis

This comment has been minimized.

Copy link
Member

commented May 8, 2019

How about a JSON return value?

@knz

This comment has been minimized.

Copy link
Member

commented May 8, 2019

I like the function too, perhaps also propose the function in addition.

(I still think we need something array-like to reveal the set of keys available. There is no way for a client to know the available keys otherwise)

@knz

This comment has been minimized.

Copy link
Member

commented May 8, 2019

JSON is not a half bad idea! Thanks jordan
(Can you provide examples of how to turn the json into a table? I think it's important to retain the ability to go from blurb to table.)

@andy-kimball
Copy link
Contributor Author

left a comment

I think we should have these discussions on the original issue, here: #37310, since that will be easier to find later on if questions come up.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @awoods187, @jordanlewis, and @RaduBerinde)

@andy-kimball andy-kimball force-pushed the andy-kimball:locality branch from fe7ded3 to ff71d46 May 9, 2019

@andy-kimball andy-kimball changed the title sql: Add locality() builtin function sql: Add crdb_internal.locality_value builtin function May 9, 2019

sql: Add crdb_internal.locality_value builtin function
There is currently no easy way to programmatically inspect the locality of the
current node. It appears in crdb_internal.gossip_nodes, but it's hard to work
with because it's JSON, not constant-foldable, and keyed by node id.

This commit adds a new crdb_internal.locality_value builtin function that returns
the value of the locality key given as its argument.

When building geo-distributed applications, this enables a very nice way to
automatically assign the partition key, as illustrated below:

  CREATE TABLE charges (
    region STRING NOT NULL DEFAULT crdb_internal.locality_value('region'),
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    ...

The DEFAULT expression for the region column automatically inserts the region
value from the current node's locality.

It also enables queries like this, that only touch rows in the current region
(which is necessary to avoid cross-region hops):

  SELECT * FROM charges WHERE region = crdb_internal.locality_value('region') AND id = $1

The optimizer folds the function to a constant value, which can then be used
to select an optimal index.

Resolves #37310

Release note (sql change): Adds a new locality_value builtin function that
returns the value of the locality key given as its argument.

@andy-kimball andy-kimball force-pushed the andy-kimball:locality branch from ff71d46 to 29a636b May 9, 2019

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2019

I changed this PR to add a crdb_internal.locality_value function, as per discussion in #37310.

@RaduBerinde
Copy link
Member

left a comment

:lgtm: Thanks!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @awoods187, @jordanlewis, and @RaduBerinde)

@rytaft
Copy link
Contributor

left a comment

Reviewed 18 of 18 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @awoods187, and @jordanlewis)


pkg/sql/opt/norm/norm_test.go, line 117 at r2 (raw file):

		if props.Category == categorySystemInfo || props.Category == categoryDateAndTime {
			switch name {
			case "crdb_internal.locality_value":

Why is this ok to fold? Isn't it possible that we get the same exact query from a different locality?

@andy-kimball
Copy link
Contributor Author

left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, @knz, and @rytaft)


pkg/sql/opt/norm/norm_test.go, line 117 at r2 (raw file):

Previously, rytaft wrote…

Why is this ok to fold? Isn't it possible that we get the same exact query from a different locality?

Should always return the same value for a given key, on a given machine. It doesn't matter how many times it's run, or when it's run. It can only return a different value if the node is restarted with a different locality.


pkg/sql/sem/builtins/builtins.go, line 2763 at r1 (raw file):

Previously, knz (kena) wrote…

I'd say drop this and have it return fixed type string[][] (array[array[string]])

Switched up the design, so resolving this.


pkg/sql/sem/builtins/builtins.go, line 2773 at r1 (raw file):

Previously, knz (kena) wrote…

Make a 2-element array of the key, value pair here

Switched up the design, so resolving this.


pkg/sql/sem/builtins/builtins.go, line 2776 at r1 (raw file):

Previously, knz (kena) wrote…

and keep the array here

Switched up the design, so resolving this.

@rytaft

This comment has been minimized.

Copy link
Contributor

commented May 14, 2019


pkg/sql/opt/norm/norm_test.go, line 117 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Should always return the same value for a given key, on a given machine. It doesn't matter how many times it's run, or when it's run. It can only return a different value if the node is restarted with a different locality.

Thanks for the explanation!

@rytaft

rytaft approved these changes May 14, 2019

Copy link
Contributor

left a comment

:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @awoods187, @jordanlewis, and @knz)

@andy-kimball

This comment has been minimized.

Copy link
Contributor Author

commented May 15, 2019

bors r+

craig bot pushed a commit that referenced this pull request May 15, 2019

Merge #37369 #37513
37369: sql: Add crdb_internal.locality_value builtin function r=andy-kimball a=andy-kimball

There is currently no easy way to programmatically inspect the locality of the
current node. It appears in crdb_internal.gossip_nodes, but it's hard to work
with because it's JSON, not constant-foldable, and keyed by node id.

This commit adds a new crdb_internal.locality_value builtin function that returns
the value of the locality key given as its argument.

When building geo-distributed applications, this enables a very nice way to
automatically assign the partition key, as illustrated below:

  CREATE TABLE charges (
    region STRING NOT NULL DEFAULT crdb_internal.locality_value('region'),
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    ...

The DEFAULT expression for the region column automatically inserts the region
value from the current node's locality.

It also enables queries like this, that only touch rows in the current region
(which is necessary to avoid cross-region hops):

  SELECT * FROM charges WHERE region = crdb_internal.locality_value('region') AND id = $1

The optimizer folds the function to a constant value, which can then be used
to select an optimal index.

Resolves #37310

Release note (sql change): Adds a new locality_value builtin function that
returns the value of the locality key given as its argument.

37513: opt: maintain a set of grouping cols instead of a list r=ridwanmsharif a=ridwanmsharif

Fixes #37317 and possibly also fixes #37444.
This commit changes the semantics of the grouping columns stored
in scope to only now save the columns if they're unique.

Release note: None

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Ridwan Sharif <ridwan@cockroachlabs.com>
@craig

This comment has been minimized.

Copy link

commented May 15, 2019

Build succeeded

@craig craig bot merged commit 29a636b into cockroachdb:master May 15, 2019

3 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details

@andy-kimball andy-kimball deleted the andy-kimball:locality branch May 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.