
sql: Add builtin function to return locality of current node #37310

Closed
andy-kimball opened this issue May 4, 2019 · 21 comments · Fixed by #37369
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments


andy-kimball commented May 4, 2019

Suggested feature

There is currently no easy way to programmatically inspect the locality of the current node (it appears in crdb_internal.gossip_nodes, but it's hard to work with). I think we should add some kind of builtin locality function that returns the tiers of the locality in a way that can easily be manipulated in SQL. Here's a strawman implementation that returns an ARRAY of (key, value) TUPLEs:

	// Note: localityTupleType is assumed to be a TUPLE(STRING, STRING)
	// type holding one (key, value) locality tier.
	"locality": makeBuiltin(
		tree.FunctionProperties{Category: categorySystemInfo},
		tree.Overload{
			Types:      tree.ArgTypes{},
			ReturnType: tree.FixedReturnType(types.MakeArray(localityTupleType)),
			Fn: func(ctx *tree.EvalContext, args tree.Datums) (tree.Datum, error) {
				// Build one (key, value) tuple per locality tier, preserving order.
				arr := tree.NewDArray(localityTupleType)
				arr.Array = make(tree.Datums, len(ctx.Locality.Tiers))
				for i := range ctx.Locality.Tiers {
					tier := &ctx.Locality.Tiers[i]
					arr.Array[i] = tree.NewDTuple(
						localityTupleType, tree.NewDString(tier.Key), tree.NewDString(tier.Value))
				}
				return arr, nil
			},
			Info: "Returns the locality of the current node.",
		},
	),

An example use case

When building geo-distributed applications, this would enable a very nice way to automatically assign the partition key, as illustrated below. This is a credit card charges table that is partitioned on region. The DEFAULT expression for the region column automatically inserts the region value from the current node's locality (e.g. cloud=gce,region=us-east1,zone=us-east1-b).

CREATE TABLE charges (
    region STRING NOT NULL DEFAULT (locality()[2]).value,
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    amount DECIMAL NOT NULL,
    currency STRING NOT NULL,
    last4 STRING NOT NULL,
    account_id INT8 NOT NULL,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (region ASC, id ASC),
    FAMILY "primary" (region, id, amount, currency, last4, account_id, created_at, updated_at)
) PARTITION BY LIST (region) (
    PARTITION us_east1 VALUES IN (('us-east1')),
    PARTITION us_west1 VALUES IN (('us-west1')),
    PARTITION europe_west2 VALUES IN (('europe-west2'))
)
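For illustration, here's a minimal Go sketch (not CockroachDB code; `Tier` and `parseLocality` are hypothetical names) of how a node could split a `--locality` flag value into the ordered tiers that the strawman's `(locality()[2]).value` expression would index into:

```go
package main

import (
	"fmt"
	"strings"
)

// Tier is one key=value pair from the --locality flag.
type Tier struct {
	Key, Value string
}

// parseLocality splits a flag value such as
// "cloud=gce,region=us-east1,zone=us-east1-b" into ordered tiers.
func parseLocality(flag string) []Tier {
	var tiers []Tier
	for _, part := range strings.Split(flag, ",") {
		if kv := strings.SplitN(part, "=", 2); len(kv) == 2 {
			tiers = append(tiers, Tier{Key: kv[0], Value: kv[1]})
		}
	}
	return tiers
}

func main() {
	tiers := parseLocality("cloud=gce,region=us-east1,zone=us-east1-b")
	// SQL arrays are 1-indexed, so (locality()[2]).value corresponds
	// to tiers[1] here: the region tier.
	fmt.Println(tiers[1].Value) // us-east1
}
```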

There are undoubtedly other use cases for this as well.

Notes

We need to take care to always evaluate this on the gateway node so that it's not shipped via DistSQL to other nodes (where each node could evaluate it to a different value).

@andy-kimball (Contributor, Author):

CC @awoods187


andy-kimball commented May 5, 2019

Another possible idea is determining the type at SQL analysis time. For example, say the locality is:

cloud=gce,region=us-east1,zone=us-east1-b

Then we'd create a tuple datum like this with type TUPLE[STRING AS <key1>, STRING AS <key2>, ...]:

('gce' AS cloud, 'us-east1' AS region, 'us-east1-b' AS zone)

This would allow more natural SQL expressions like this:

SELECT (locality()).cloud, (locality()).region, (locality()).zone;
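To illustrate the appeal of analysis-time typing: once the labels are fixed at type-checking time, a locality access is as statically typed as a struct field access. A minimal Go analogy (the `Locality` struct is purely illustrative):

```go
package main

import "fmt"

// Locality is an illustrative stand-in for the proposed analysis-time
// tuple type TUPLE[STRING AS cloud, STRING AS region, STRING AS zone].
type Locality struct {
	Cloud, Region, Zone string
}

func main() {
	loc := Locality{Cloud: "gce", Region: "us-east1", Zone: "us-east1-b"}
	// Analogue of: SELECT (locality()).cloud, (locality()).region, (locality()).zone
	fmt.Println(loc.Cloud, loc.Region, loc.Zone)
}
```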


andy-kimball commented May 6, 2019

@knz, do you have an opinion on how this function should work? I think the second idea is more usable. However, it might set a new precedent, since we'd be deriving a more specific return type during type checking based on the sema context. That context would now need to contain a new Locality field, which means that the results of SQL analysis could differ based on the node's locality.

I think I'll create a PR with this approach, and let people react there. I'd like to use this function in a demo I'm building for a talk.

andy-kimball added a commit to andy-kimball/cockroach that referenced this issue May 7, 2019
There is currently no easy way to programmatically inspect the locality of the
current node. It appears in crdb_internal.gossip_nodes, but it's hard to work
with because it's JSON, not constant-foldable, and keyed by node id.

This commit adds a new locality builtin function that:

  Returns the hierarchical location of the current node as a tuple
  of labeled values, ordered from most inclusive to least inclusive.

  For example: `region=east,datacenter=us-east-1`.

When building geo-distributed applications, this enables a very nice way to
automatically assign the partition key, as illustrated below:

  CREATE TABLE charges (
    region STRING NOT NULL DEFAULT (locality()).region,
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    ...

The DEFAULT expression for the region column automatically inserts the region
value from the current node's locality.

It also enables queries like this one, which only touch rows in the current region
(which is necessary to avoid cross-region hops):

  SELECT * FROM charges WHERE region = (locality()).region AND id = $1

The locality is constant, so the optimizer is able to fold column access to
a constant value, which can then be used to select an optimal index.

Resolves cockroachdb#37310

Release note (sql change): Adds a new locality builtin function that returns
the hierarchical location of the current node as a tuple of labeled values,
ordered from most inclusive to least inclusive.

knz commented May 8, 2019

Whoa, there is a problem here: you are defining a dynamic type. Even with full knowledge of the SQL schema (the declarative context), there is no way to predict the type signature of the result when looking at the SQL text on the client.

In fact the type signature and the validity of the SQL query as a whole may be different depending on the node where you happen to be connected. Think about this: if the client connects through a load balancer, they'll get 100% query failure on some nodes. This is not great design!

pg labeled tuples were not meant to be used as associative arrays (which is what you are doing here).

My recommendation is either to introduce an actual associative array type, or to define the result of the function as a two-dimensional array. That way the client can use unnest() on it and operate on its results using relational operators.


andy-kimball commented May 8, 2019

I initially had thoughts similar to yours, which is why I first did it exactly as you suggest (I have commits with both implementations). However, the more I thought about it, the more I saw the value in the other approach. Below are my arguments; you can decide if they're convincing.

This type is known at analysis time, so it is not a dynamic type by the traditional definition (a type not known until runtime). I'm saying that the locality would be considered part of the declarative context, just as the types of the placeholders are. That is how the user thinks about locality as well, since they use it in ALTER TABLE, ALTER PARTITION, etc.

As for the argument that depending on which nodes the clients connect to, they could get different results:

  1. It's almost certainly an application bug to have different locality fields on different nodes. So you're talking about an edge case where a user has deliberately (or mistakenly) used different fields on different nodes.
  2. If different nodes do have different fields, then they can get different results, regardless.
-- If locality is "region=east" on one node and "region=east,datacenter=east-1"
-- on another node, then the following statement will return different results
-- depending on which node happens to run it:
SELECT locality()[2] IS NULL

Why is giving different results preferable to sometimes raising an error and sometimes not?

Ultimately, we took a shortcut with localities. To be consistent with how SQL schema works, we really should have had a statement like (don't get hung up on syntax, just want to communicate the idea):

CREATE LOCALITY (cloud, region, datacenter)

Then we wouldn't even be having this discussion. We're treating locality at node startup as if it's dynamic, but it's really not. We should be erroring if you try to start a node with locality fields that don't match the other nodes'. But now we've got it in our heads that there's some kind of dynamism here, when there really isn't.

@andy-kimball (Contributor, Author):

@RaduBerinde, @jordanlewis, see here for more discussion.

@RaduBerinde (Member):

Continuing the discussion from #37369.

(Can you provide examples of how to turn the json into a table? I think it's important to retain the ability to go from blurb to table.)

root@:26257/tpcc> select * from json_each_text('{"k1": "v1", "k2": "v2"}');                                                                  
  key | value  
+-----+-------+
  k1  | v1     
  k2  | v2     
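For readers unfamiliar with json_each_text, the expansion it performs can be sketched in a few lines of Go (`jsonEachText` is a hypothetical helper; the real builtin also handles non-string values, which this sketch does not):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// jsonEachText expands a JSON object of string values into ordered
// (key, value) rows, the way json_each_text renders them as text.
func jsonEachText(blob string) ([][2]string, error) {
	var obj map[string]string
	if err := json.Unmarshal([]byte(blob), &obj); err != nil {
		return nil, err
	}
	rows := make([][2]string, 0, len(obj))
	for k, v := range obj {
		rows = append(rows, [2]string{k, v})
	}
	// Go maps iterate in random order; sort by key for stable output.
	sort.Slice(rows, func(i, j int) bool { return rows[i][0] < rows[j][0] })
	return rows, nil
}

func main() {
	rows, _ := jsonEachText(`{"k1": "v1", "k2": "v2"}`)
	for _, r := range rows {
		fmt.Printf("%s | %s\n", r[0], r[1])
	}
}
```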


knz commented May 8, 2019

So it seems to me that JSON is the superior solution so far. The syntax locality()->'cloud' reads nicer to me than (locality()).cloud.


knz commented May 8, 2019

I think GitHub had a glitch; several comments (including my last) just disappeared. Also, I see Andy's comment is reportedly posted "6 hours from now".

@awoods187 (Contributor):

I like @andy-kimball's point about the CREATE LOCALITY statement. As an aside, I do wonder if we should go back and do that work separately. I think we should be responding with errors when localities do not agree.

I also agree with @knz that JSON is much easier to read.


andy-kimball commented May 8, 2019

The JSON solution looks OK, but I find that any time we use JSON I have to go to the docs to remember how to access fields. The JSON and relational data models are a rough fit for each other.

The more I work with constraints and localities, the more I think we've missed the boat on making them intuitive to use. Each person who's tried using the new locality-sensitive optimizer feature (Andy W, Rich, Roko, Nate) has been confused by some aspect of how to set it up. One reason is the strange syntax:

CONFIGURE ZONE USING lease_preferences='[[+cloud=aws,-region=us_east], [+cloud=gce,+region=us_west]]'

instead of something that feels more natural with SQL, such as:

CONFIGURE ZONE USING LEASE PREFERENCES ON cloud='aws' AND region<>'us_east', FALLBACK TO cloud='gce' AND region='us_west'

I'd expect to use strongly-typed localities, so that I get an error if cloud is not a valid locality field. I'd expect ZONEs and LOCALITYs to be first-class schema objects that are versioned just like any other schema objects (with changes rolled out across the cluster in the same way).

I know I'm pulling a bunch of extra stuff into what seems like a simple issue. But it's context to help others understand why I think we should at least consider more strongly-typed handling of LOCALITY. If we really want geo-distributed use cases to be a centerpiece of our strategy, I think we need to recognize that we've made some usability mistakes so far and consider rectifying them (in a future release) with better syntax and stronger typing.


knz commented May 8, 2019

I know I'm pulling in a bunch of extra stuff into what seems like a simple issue.

I appreciate the issue has multiple aspects, and that using and creating zone configs are intimately related. However, please consider:

  • the issue that sparked my rejection of labeled tuples has nothing to do with the type of the values, and everything to do with the fact that the set of store attributes is variable per store and per node

    • on that specific topic I disagree with your assertion that mixing and matching multiple sets of attributes is necessarily an application bug.
    • even if we agreed, an application bug should not have any influence on whether the type of a SQL query is the same from the perspective of every node. I still stand by that point. Having the set of local attributes on the node change the types derived in a query is just wrong.
  • for context, there is no commitment to the current zone configuration facilities and we can certainly work on improvements:

    • the original mechanism was CONFIGURE ZONE '....yaml....' with just a single yaml string.
    • I split the single yaml strings into multiple parameters a year ago, so as to add the ability to copy a parameter from the parent zone, or reset an already set parameter; at that time that was the only required improvement;
    • then Ridwan added yet a 3rd inheritance mechanism, to ensure the zones can cascade;
    • At no point yet have we really engaged the discussion of typing (nor taken any decision on that topic), and I personally would welcome any additional/new effort in that direction.
    • for additional context, the reason why that conversation was delayed/postponed is that we simply did not have enough experience early on to design this properly. Arguably, with the additional/repeated experience of customers and sales engineers, now is a better time to do this design work than, say, one or two years ago.
  • I'd recommend that you stay mindful of the amount of effort required to have a proper design discussion (it doesn't need to be long and involved, but it shouldn't be too casual either). If we need to make roadmap adjustments, I would recommend we talk about that before committing the time to re-design the zone config mechanisms.

  • regardless of the outcomes on the point above, I appreciate that you're looking for a solution that gives you(us) separate types for the various attributes.

    • in that context, tuples are still not good enough because of the problems I have identified before
    • however, given your additional concerns, I would concede that JSON is better than arrays because different fields in JSON can have different types.

I find that anytime we use JSON I've got to go to the docs to remember how to access fields.

I think that's lack of familiarity. It's actually pretty simple and regular.
Furthermore, if we find usage patterns on these values that would benefit from a syntax shorthand, we can (later) define built-in functions or operators to do that.

Just don't pull locality labels in the type system. That's a bad idea all around.


andy-kimball commented May 8, 2019

Your answer seems to be mixing locality with attributes. My understanding is that they're distinct and are passed separately on the command line:

cockroach start --locality=region=us,datacenter=us-east --attrs=ram:64gb

I agree with you 100% that arbitrary attributes should not become part of type system. The proposed locality function is only about the key/value pairs passed via the locality flag, not the attrs flag. My assertion is that there's no useful scenario where users would want to pass different locality keys to different nodes, and that it should be an error to do so. Can you show a plausible scenario where that is not desirable behavior?


knz commented May 8, 2019

I'd say that it's possible/desirable for certain nodes to have more keys than the common set. For example, suppose there are 3 regions, and each region initially has just 1 server; later on, additional servers are added in one of the regions. I expect that the nodes in that region will get additional attributes to distinguish them within the region, while the other nodes will remain with just a region attribute.


knz commented May 8, 2019

Meanwhile, the argument still stands: regardless of "mistakes" made, the same query should type-check identically at a given timestamp regardless of the node where you issue it.

@andy-kimball (Contributor, Author):

This is why I brought up the idea of CREATE LOCALITY (cloud, region, datacenter), because it resolves all the issues that you're highlighting.

Given that any sort of statically declared, strongly-typed solution like this is one or more releases away, I'm going to propose that we sidestep some of these questions by implementing only the function that Radu described:

locality-value('region')

That's easy to understand and optimize, and will fit nicely with whatever we decide to do in the future.
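A rough sketch of the proposed lookup semantics, in Go (`localityValue` is an illustrative name, not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// localityValue returns the value of the given locality key, with
// ok=false standing in for the NULL that SQL would return when the
// key is absent from this node's locality.
func localityValue(tiers []string, key string) (value string, ok bool) {
	for _, tier := range tiers {
		if kv := strings.SplitN(tier, "=", 2); len(kv) == 2 && kv[0] == key {
			return kv[1], true
		}
	}
	return "", false
}

func main() {
	tiers := []string{"cloud=gce", "region=us-east1", "zone=us-east1-b"}
	if v, ok := localityValue(tiers, "region"); ok {
		fmt.Println(v) // us-east1
	}
}
```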

@RaduBerinde (Member):

I haven't seen a convincing argument against JSON. locality-value('region') would just be locality()->'region'. But JSON also gives you an easy way to inspect the entire thing or do other stuff with it. It would still be a legitimate solution even if the keys are fixed.


andy-kimball commented May 8, 2019

Here are my reasons for not wanting to add locality as a JSON function right now:

  1. It introduces a backwards-compatibility issue if we decide to have a strongly-typed locality in the future. We don't yet know what we're doing along those lines, and I'd rather reserve locality until then.

  2. The expressions you give above are subtly different:

locality-value('region') -- statically typed as <string>
locality()->'region' -- statically typed as <json>

Users will need to know to use ->> rather than ->. If you made this mistake (and I originally made it as well), I suspect almost every user will make it too. This makes proper usage of this function unintuitive. Most users will probably get a type error and then write (locality()->'region')::string, because they won't know about the undiscoverable ->> operator.

  3. Making this JSON means that we use YAML to declare the localities/constraints and then JSON to view/query them, all embedded in relational statements. Three data models. Three syntaxes. It just feels like we're taking the expedient route here, not a strongly-typed route consistent with other SQL syntax/semantics. We're requiring a lot from our users (knowledge of 3 sets of syntax/operators/usage) just to interact with these features. No wonder people are getting confused.

@RaduBerinde (Member):

I see, thanks. The yaml / json thing is pretty bad indeed. I see the value in holding off on this.

We may have some of the same backward compatibility issues with locality-value('x') (in that we might want to remove it once we have strongly-typed locality). Maybe we should make it crdb_internal.locality_value() for that reason?

@andy-kimball (Contributor, Author):

Yes, I like the suggestion to put the new function in the crdb_internal schema.

andy-kimball added a commit to andy-kimball/cockroach that referenced this issue May 9, 2019
There is currently no easy way to programmatically inspect the locality of the
current node. It appears in crdb_internal.gossip_nodes, but it's hard to work
with because it's JSON, not constant-foldable, and keyed by node id.

This commit adds a new crdb_internal.locality_value builtin function that returns
the value of the locality key given as its argument.

When building geo-distributed applications, this enables a very nice way to
automatically assign the partition key, as illustrated below:

  CREATE TABLE charges (
    region STRING NOT NULL DEFAULT crdb_internal.locality_value('region'),
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    ...

The DEFAULT expression for the region column automatically inserts the region
value from the current node's locality.

It also enables queries like this one, which only touch rows in the current region
(which is necessary to avoid cross-region hops):

  SELECT * FROM charges WHERE region = crdb_internal.locality_value('region') AND id = $1

The optimizer folds the function to a constant value, which can then be used
to select an optimal index.

Resolves cockroachdb#37310

Release note (sql change): Adds a new locality_value builtin function that
returns the value of the locality key given as its argument.
@awoods187 added the C-enhancement label on May 9, 2019
craig bot pushed a commit that referenced this issue May 15, 2019
37369: sql: Add crdb_internal.locality_value builtin function r=andy-kimball a=andy-kimball

(Commit message repeated verbatim from the commit above.)

37513: opt: maintain a set of grouping cols instead of a list r=ridwanmsharif a=ridwanmsharif

Fixes #37317 and possibly also fixes #37444.
This commit changes the semantics of the grouping columns stored
in scope to only now save the columns if they're unique.

Release note: None

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Ridwan Sharif <ridwan@cockroachlabs.com>
@craig bot closed this as completed in #37369 on May 15, 2019
5 participants