
release-21.1: sql: change physical planning heuristics a bit to prefer local execution #68613

Merged
1 commit merged into cockroachdb:release-21.1 on Aug 10, 2021

Conversation

yuzefovich (Member) commented:

Backport 1/1 commits from #68524.

/cc @cockroachdb/release

Release justification: low risk change (hidden behind a cluster setting)
to physical planner heuristics that might lead to extremely suboptimal plans.


This commit changes two parts of the physical planner heuristics:

  • we now say that the lookup join "can be distributed" rather than
    "should be distributed"
  • for top K sort we also say that it "can be" rather than "should be"
    distributed.

I'm not certain whether both of these changes are always beneficial, but
here is some justification.

The change to the lookup join heuristic means that, in distsql=auto mode,
the distribution of the join reader stage is now determined by the other
stages of the physical plan. Consider an example where the input to the
lookup join is a table reader that scans only a handful of rows.
Previously, because of the "should distribute" heuristic, such a plan
would be "distributed", meaning we would plan a single table reader on
the leaseholder for the relevant range (most likely a remote node from
the perspective of the gateway node for the query); this, in turn, would
force the planning of the join reader on the same node, and of all
subsequent stages, if any, as well. Such a decision can create a hotspot
if that particular range is queried often (think of an append-only
access pattern where the latest data is accessed most frequently).

With this change, in such a scenario we will get more even compute
utilization across the cluster because the flow will be fully planned on
the gateway (which is assumed to be chosen randomly by a load balancer),
and the lookup join will be performed from the gateway (we'll still need
to perform a remote read from the leaseholder of that single range).
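
As an illustration only (the kv table, its index, and the lookup-join query below are assumptions, not the customer's query), the planned distribution can be inspected by filtering the EXPLAIN output, similar to what the regression test further down does:

-- Sketch: observe the planned distribution of a lookup-join query in distsql=auto mode.
-- Assumes sql.distsql.prefer_local_execution.enabled is set to true (see the release
-- note below); the kv table and its secondary index are hypothetical.
CREATE TABLE kv (k INT PRIMARY KEY, v INT, INDEX (v));
SET distsql = auto;

SELECT info
FROM [EXPLAIN SELECT * FROM kv AS a INNER LOOKUP JOIN kv AS b ON a.v = b.k WHERE a.k > 1]
WHERE info LIKE 'distribution%';
-- With the setting enabled, a plan like this is more likely to report: distribution: local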

The change to the top K sort heuristic seems less controversial to me,
yet I don't have a good justification. My feeling is that usually the
value of K is small, so it's ok if we don't "force" ourselves to
distribute the sort operation if the physical plan otherwise isn't
calling for it.

Overall, the choice of making changes to these two heuristics isn't very
principled and is driven by a single query from one of our largest
customers that happened to hit the hotspot scenario described above. In
their case, they have an append-like workload that is constantly
updating a single range. Eventually that range is split automatically,
but both new ranges stay on the same node. The latest data is accessed
far more frequently than any other data in the table, yet according to
the KV heuristics the ranges aren't being reallocated because the scans
hitting the hot ranges aren't exceeding the threshold. What isn't
accounted for is the fact that other parts of the flow are far more
compute-intensive, so this change attempts to alleviate such a hot node
scenario.

Release note (sql change): Some queries with lookup joins and/or top
K sorts are now more likely to be executed in a "local" manner with
the distsql=auto session variable when the newly introduced
sql.distsql.prefer_local_execution.enabled cluster setting is set to
true (false is the default).
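
A minimal sketch of how a deployment would opt in, using the names from the release note above (the cluster setting applies cluster-wide, while the execution mode is per session):

-- Opt in to the new heuristics; the default is false.
SET CLUSTER SETTING sql.distsql.prefer_local_execution.enabled = true;

-- The heuristics only take effect when distribution is decided automatically.
SET distsql = auto;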

@yuzefovich yuzefovich requested review from rytaft, cucaroach and a team August 9, 2021 20:14
blathers-crl bot commented Aug 9, 2021

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn't know or care about this backport has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

cockroach-teamcity (Member) commented:

This change is Reviewable

rytaft (Collaborator) left a comment:

:lgtm:

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach and @yuzefovich)


pkg/sql/distsql_physical_planner.go, line 378 at r1 (raw file):

// physical plan is deemed as "should be distributed", preferLocalExecution is
// ignored.
// TODO(radu): add tests for this.

nit: now it looks like this TODO is referring to this new param


pkg/sql/opt/exec/execbuilder/testdata/distsql_auto_mode, line 139 at r1 (raw file):

SELECT info FROM [EXPLAIN SELECT * FROM kv WHERE k>1 ORDER BY v LIMIT 1] WHERE info LIKE 'distribution%'
----
distribution: local

maybe also test that this is distributed when the setting is off
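
A rough sketch of what the complementary case could look like in the same logic-test format (the exact directives and the expected "full" output are assumptions, not the actual test that was added):

# Hypothetical companion test: with the setting disabled, the same query
# should again be planned with full distribution.
statement ok
SET CLUSTER SETTING sql.distsql.prefer_local_execution.enabled = false

query T
SELECT info FROM [EXPLAIN SELECT * FROM kv WHERE k>1 ORDER BY v LIMIT 1] WHERE info LIKE 'distribution%'
----
distribution: full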

yuzefovich (Member, Author) left a comment:

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach and @rytaft)


pkg/sql/distsql_physical_planner.go, line 378 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: now it looks like this TODO is referring to this new param

Indeed, inlined the TODO.


pkg/sql/opt/exec/execbuilder/testdata/distsql_auto_mode, line 139 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

maybe also test that this is distributed when the setting is off

Done.

rytaft (Collaborator) left a comment:

Reviewed 2 of 2 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cucaroach)

yuzefovich (Member, Author) commented:

We haven't seen any fallout on master, and since the change is gated behind the cluster setting, we think it's OK not to wait for the two-week baking period. Also, 21.1.8 won't be released for a few weeks anyway.

@yuzefovich yuzefovich merged commit 87f6dcb into cockroachdb:release-21.1 Aug 10, 2021
@yuzefovich yuzefovich deleted the backport21.1-68524 branch August 10, 2021 21:12
craig bot pushed a commit that referenced this pull request Aug 23, 2021
68627: sql: fix cast from unknown to unknown r=mgartner a=mgartner

Previously, an implicit cast performed in a set operation, like `UNION`,
on a tuple type with an unknown inner type would cause an internal
error. The cast failed because the volatility of a cast from unknown to
unknown was not specified in the `validCasts` list. This commit fixes
the issue by marking such a cast as immutable.

Fixes #68308

Release note (bug fix): A bug has been fixed that caused internal errors
with set operations, like `UNION`, and columns with tuple types that
contained constant `NULL` values. This bug was introduced in version
20.2.
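
For illustration, a hedged sketch of the query shape described above (the literal values are made up): a set operation over a tuple-typed column whose inner values are constant NULLs, which previously could trigger the internal error:

-- Before the fix, casting the unknown-typed NULLs inside the tuples could
-- cause an internal error; after the fix this succeeds.
SELECT (1, NULL) AS t
UNION ALL
SELECT (2, NULL);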

69253: sql: change column in tenant_usage table r=RaduBerinde a=RaduBerinde

The tenant usage table currently has the "last update" timestamp as an
instance-only field. This field is necessary for the per-tenant state
as well (it corresponds to the current bucket level).

This change renames the `instance_last_update` column to `last_update`
and moves it up to the group of columns that apply to all rows.

Since this table was introduced only on `master`, we don't add a new
migration.

Release justification: category 2, code only used for Serverless.

Release note: None

69261: settings: retire a setting only present on 21.1 branch r=yuzefovich a=yuzefovich

`sql.distsql.prefer_local_execution.enabled` was introduced during
a backport to 21.1 branch and is only present on that branch. This
commit is retiring that setting on 21.2 branch.

Retiring a setting introduced in #68613.

Release note: None

Release justification: This commit is safe for this release because it
is essentially a non-production code change.

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>