Merge #120668
120668: opt/memo: extend OutputCols with VirtualCols in statistics builder r=DrewKimball,mgartner a=michae2

**opt: add props.Statistics.VirtualCols**

As of #118241 we now collect table statistics on virtual computed
columns, but do not yet use them in statistics builder. The difficulty
with using these stats in statistics builder is that virtual computed
columns are synthesized by various non-Scan expressions (Project,
Select, etc). When calculating stats for these non-Scan expressions, we
need to find the virtual column stats even though the virtual columns
are not produced by the input to these expressions.

To solve this, we add a VirtualCols set to props.Statistics which holds
all of the virtual columns that could be produced by the input to a
group. Expressions that could synthesize virtual columns will look in
this set to discover whether there are statistics for any of the scalar
expressions they render. If there are, they will call colStatXXX using
the virtual column ID as if the virtual column had originated from
their input.

This commit adds VirtualCols but does not yet use it.

Note that we cannot currently pass VirtualCols up through set operations
or with-scans, due to the column ID translation they use.
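
The propagation rule described above can be illustrated with a small Go sketch. It is not the code from this commit: `ColSet`, `Statistics`, `scanStats`, `passThroughStats`, and `setOpStats` are simplified, hypothetical stand-ins, and it assumes that a scan seeds VirtualCols with the table's virtual computed columns.

```go
package main

import (
	"fmt"
	"sort"
)

// ColSet is a simplified, hypothetical stand-in for the optimizer's column
// set type. Keys are column IDs.
type ColSet map[int]struct{}

// makeColSet builds a set from the given column IDs.
func makeColSet(ids ...int) ColSet {
	s := make(ColSet, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// sorted returns the column IDs in ascending order, for stable printing.
func (s ColSet) sorted() []int {
	ids := make([]int, 0, len(s))
	for id := range s {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

// Statistics is a toy version of props.Statistics with only the fields
// needed for this sketch.
type Statistics struct {
	RowCount    float64
	VirtualCols ColSet
}

// scanStats models a Scan. Assumption: the scan seeds VirtualCols with the
// table's virtual computed columns.
func scanStats(rowCount float64, tableVirtualCols ColSet) Statistics {
	return Statistics{RowCount: rowCount, VirtualCols: tableVirtualCols}
}

// passThroughStats models an operator such as Select or Project that keeps
// its input's column IDs, so the input's VirtualCols remain valid.
func passThroughStats(input Statistics, selectivity float64) Statistics {
	return Statistics{
		RowCount:    input.RowCount * selectivity,
		VirtualCols: input.VirtualCols,
	}
}

// setOpStats models a set operation. Its output uses translated column IDs,
// so VirtualCols is not passed up.
func setOpStats(left, right Statistics) Statistics {
	return Statistics{
		RowCount:    left.RowCount + right.RowCount,
		VirtualCols: ColSet{},
	}
}

func main() {
	scan := scanStats(1000, makeColSet(3))
	sel := passThroughStats(scan, 0.1)
	union := setOpStats(sel, sel)
	fmt.Println("select VirtualCols:", sel.VirtualCols.sorted())
	fmt.Println("union VirtualCols:", union.VirtualCols.sorted())
}
```

In this sketch the select keeps column 3 available for statistics lookups, while the union drops it because its output column IDs no longer match the input's.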

Informs: #68254

Epic: CRDB-8949

Release note: None

---

**sql: add optimizer_use_virtual_computed_column_stats session variable**

Informs: #68254

Epic: CRDB-8949

Release note (sql): Add a new session variable
`optimizer_use_virtual_computed_column_stats`. When this variable is
enabled, the optimizer will make use of table statistics on virtual
computed columns.
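
A rough Go sketch of how such a session setting interacts with a cached memo follows. The field and method names follow the diff for pkg/sql/opt/memo/memo.go, but the types and signatures here are simplified assumptions: the real `Init` and `IsStale` take more parameters and compare many more settings.

```go
package main

import "fmt"

// SessionData is a hypothetical, reduced stand-in for the real session data.
type SessionData struct {
	OptimizerUseVirtualComputedColumnStats bool
}

// Memo is a toy memo that remembers the setting it was built with.
type Memo struct {
	useVirtualComputedColumnStats bool
}

// Init snapshots the session setting at planning time.
func (m *Memo) Init(sd *SessionData) {
	*m = Memo{
		useVirtualComputedColumnStats: sd.OptimizerUseVirtualComputedColumnStats,
	}
}

// IsStale reports whether the memo was built under a different setting than
// the current one, in which case the plan must be rebuilt.
func (m *Memo) IsStale(sd *SessionData) bool {
	return m.useVirtualComputedColumnStats != sd.OptimizerUseVirtualComputedColumnStats
}

func main() {
	sd := &SessionData{}
	var m Memo
	m.Init(sd)

	sd.OptimizerUseVirtualComputedColumnStats = true
	fmt.Println("stale after enabling:", m.IsStale(sd))

	sd.OptimizerUseVirtualComputedColumnStats = false
	fmt.Println("stale after restoring:", m.IsStale(sd))
}
```

Snapshotting the flag in the memo means that toggling the variable invalidates cached plans, which is what the new case in TestMemoIsStale checks.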

---

**opt/memo: extend OutputCols with VirtualCols in statistics builder**

Throughout statistics builder we use OutputCols to determine which
columns come from the input to an expression. We then typically call
colStatXXX with those columns as part of statistics calculation.

In order to use statistics on virtual computed columns, we need to call
colStatXXX on any virtual columns that could come from our input, even
if they are not passed upward through OutputCols. To do this we extend
OutputCols with the VirtualCols set we built in a previous commit. This
commit replaces almost all usages of OutputCols in statistics builder
with a call to the helper function colStatCols, which returns the union
of OutputCols and VirtualCols.

This is enough to get the optimizer to use statistics on virtual
computed columns in some simple plans. More complex plans will require
matching the virtual column scalar expressions, which will be in the
next PR. I've left some TODOs marking the spots that the next PR will
touch.
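
As a minimal illustration of the union described above, the following Go sketch shows why extending OutputCols matters. Only the name `colStatCols` comes from the commit message. The `ColSet` and `relProps` types and the helper's signature are assumptions made for this example, not the actual implementation.

```go
package main

import "fmt"

// ColSet is a simplified, hypothetical stand-in for the optimizer's column
// set type. Keys are column IDs.
type ColSet map[int]struct{}

// makeColSet builds a set from the given column IDs.
func makeColSet(ids ...int) ColSet {
	s := make(ColSet, len(ids))
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// union returns a new set containing the members of both sets.
func (s ColSet) union(o ColSet) ColSet {
	out := make(ColSet, len(s)+len(o))
	for id := range s {
		out[id] = struct{}{}
	}
	for id := range o {
		out[id] = struct{}{}
	}
	return out
}

// contains reports whether id is a member of the set.
func (s ColSet) contains(id int) bool {
	_, ok := s[id]
	return ok
}

// relProps is a hypothetical bundle of the two column sets of an input
// expression that matter for this sketch.
type relProps struct {
	OutputCols  ColSet
	VirtualCols ColSet
}

// colStatCols returns the columns for which column statistics may be
// requested from the input: OutputCols extended with VirtualCols.
func colStatCols(p relProps) ColSet {
	return p.OutputCols.union(p.VirtualCols)
}

func main() {
	// The input outputs columns 1 and 2. Column 3 is a virtual computed
	// column that is not passed upward through OutputCols.
	input := relProps{
		OutputCols:  makeColSet(1, 2),
		VirtualCols: makeColSet(3),
	}
	fmt.Println("in OutputCols:", input.OutputCols.contains(3))
	fmt.Println("in colStatCols:", colStatCols(input).contains(3))
}
```

A caller that checks only OutputCols would skip column 3, while one that checks the result of colStatCols would go on to request its statistics.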

Informs: #68254

Epic: CRDB-8949

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
craig[bot] and michae2 committed Mar 22, 2024
2 parents c561383 + 10a9ab5 commit 8dc82e5
Showing 25 changed files with 732 additions and 97 deletions.
4 changes: 4 additions & 0 deletions pkg/sql/exec_util.go
@@ -3744,6 +3744,10 @@ func (m *sessionDataMutator) SetPLpgSQLUseStrictInto(val bool) {
 	m.data.PLpgSQLUseStrictInto = val
 }

+func (m *sessionDataMutator) SetOptimizerUseVirtualComputedColumnStats(val bool) {
+	m.data.OptimizerUseVirtualComputedColumnStats = val
+}
+
 // Utility functions related to scrubbing sensitive information on SQL Stats.

 // quantizeCounts ensures that the Count field in the
1 change: 1 addition & 0 deletions pkg/sql/logictest/testdata/logic_test/information_schema
@@ -6115,6 +6115,7 @@ optimizer_use_lock_op_for_serializable off
 optimizer_use_multicol_stats on
 optimizer_use_not_visible_indexes off
 optimizer_use_provided_ordering_fix on
+optimizer_use_virtual_computed_column_stats on
 override_multi_region_zone_config off
 parallelize_multi_key_lookup_joins_enabled off
 password_encryption scram-sha-256
3 changes: 3 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/pg_catalog
@@ -2897,6 +2897,7 @@ optimizer_use_lock_op_for_serializable off N
 optimizer_use_multicol_stats on NULL NULL NULL string
 optimizer_use_not_visible_indexes off NULL NULL NULL string
 optimizer_use_provided_ordering_fix on NULL NULL NULL string
+optimizer_use_virtual_computed_column_stats on NULL NULL NULL string
 override_multi_region_zone_config off NULL NULL NULL string
 parallelize_multi_key_lookup_joins_enabled off NULL NULL NULL string
 password_encryption scram-sha-256 NULL NULL NULL string
@@ -3074,6 +3075,7 @@ optimizer_use_lock_op_for_serializable off N
 optimizer_use_multicol_stats on NULL user NULL on on
 optimizer_use_not_visible_indexes off NULL user NULL off off
 optimizer_use_provided_ordering_fix on NULL user NULL on on
+optimizer_use_virtual_computed_column_stats on NULL user NULL on on
 override_multi_region_zone_config off NULL user NULL off off
 parallelize_multi_key_lookup_joins_enabled off NULL user NULL false false
 password_encryption scram-sha-256 NULL user NULL scram-sha-256 scram-sha-256
@@ -3250,6 +3252,7 @@ optimizer_use_lock_op_for_serializable NULL NULL NULL
 optimizer_use_multicol_stats NULL NULL NULL NULL NULL
 optimizer_use_not_visible_indexes NULL NULL NULL NULL NULL
 optimizer_use_provided_ordering_fix NULL NULL NULL NULL NULL
+optimizer_use_virtual_computed_column_stats NULL NULL NULL NULL NULL
 override_multi_region_zone_config NULL NULL NULL NULL NULL
 parallelize_multi_key_lookup_joins_enabled NULL NULL NULL NULL NULL
 password_encryption NULL NULL NULL NULL NULL
1 change: 1 addition & 0 deletions pkg/sql/logictest/testdata/logic_test/show_source
@@ -136,6 +136,7 @@ optimizer_use_lock_op_for_serializable off
 optimizer_use_multicol_stats on
 optimizer_use_not_visible_indexes off
 optimizer_use_provided_ordering_fix on
+optimizer_use_virtual_computed_column_stats on
 override_multi_region_zone_config off
 parallelize_multi_key_lookup_joins_enabled off
 password_encryption scram-sha-256
1 change: 1 addition & 0 deletions pkg/sql/opt/bench/bench_test.go
@@ -730,6 +730,7 @@ func newHarness(tb testing.TB, query benchQuery, schemas []string) *harness {
 	h.evalCtx.SessionData().InsertFastPath = true
 	h.evalCtx.SessionData().OptSplitScanLimit = tabledesc.MaxBucketAllowed
 	h.evalCtx.SessionData().VariableInequalityLookupJoinEnabled = true
+	h.evalCtx.SessionData().OptimizerUseVirtualComputedColumnStats = true

 	// Set up the test catalog.
 	h.testCat = testcat.New()
4 changes: 2 additions & 2 deletions pkg/sql/opt/exec/execbuilder/testdata/explain
@@ -2238,7 +2238,7 @@ quality of service: regular
 query T
 EXPLAIN (OPT, MEMO) SELECT * FROM tc JOIN t ON k=a
 ----
-memo (optimized, ~17KB, required=[presentation: info:10] [distribution: test])
+memo (optimized, ~18KB, required=[presentation: info:10] [distribution: test])
 ├── G1: (explain G2 [presentation: a:1,b:2,k:6,v:7] [distribution: test])
 │ └── [presentation: info:10] [distribution: test]
 │ ├── best: (explain G2="[presentation: a:1,b:2,k:6,v:7] [distribution: test]" [presentation: a:1,b:2,k:6,v:7] [distribution: test])
@@ -2315,7 +2315,7 @@ TABLE t
 ├── tableoid oid [hidden] [system]
 └── PRIMARY INDEX t_pkey
 └── k int not null
-memo (optimized, ~17KB, required=[presentation: info:10] [distribution: test])
+memo (optimized, ~18KB, required=[presentation: info:10] [distribution: test])
 ├── G1: (explain G2 [presentation: a:1,b:2,k:6,v:7] [distribution: test])
 │ └── [presentation: info:10] [distribution: test]
 │ ├── best: (explain G2="[presentation: a:1,b:2,k:6,v:7] [distribution: test]" [presentation: a:1,b:2,k:6,v:7] [distribution: test])
16 changes: 8 additions & 8 deletions pkg/sql/opt/exec/execbuilder/testdata/explain_redact
@@ -700,7 +700,7 @@ upsert bc
 query T
 EXPLAIN (OPT, MEMO, REDACT) INSERT INTO bc SELECT a::float + 1 FROM a ON CONFLICT (b) DO UPDATE SET b = bc.b + 100
 ----
-memo (optimized, ~33KB, required=[presentation: info:19] [distribution: test])
+memo (optimized, ~34KB, required=[presentation: info:19] [distribution: test])
 ├── G1: (explain G2 [distribution: test])
 │ └── [presentation: info:19] [distribution: test]
 │ ├── best: (explain G2="[distribution: test]" [distribution: test])
@@ -1544,7 +1544,7 @@ delete f
 query T
 EXPLAIN (OPT, MEMO, REDACT) DELETE FROM f WHERE f = 8.5
 ----
-memo (optimized, ~15KB, required=[presentation: info:10] [distribution: test])
+memo (optimized, ~16KB, required=[presentation: info:10] [distribution: test])
 ├── G1: (explain G2 [distribution: test])
 │ └── [presentation: info:10] [distribution: test]
 │ ├── best: (explain G2="[distribution: test]" [distribution: test])
@@ -1690,7 +1690,7 @@ project
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT * FROM bc WHERE b >= 1.0 AND b < 2.0
 ----
-memo (optimized, ~11KB, required=[presentation: info:5] [distribution: test])
+memo (optimized, ~12KB, required=[presentation: info:5] [distribution: test])
 ├── G1: (explain G2 [presentation: b:1,c:2] [distribution: test])
 │ └── [presentation: info:5] [distribution: test]
 │ ├── best: (explain G2="[presentation: b:1,c:2] [distribution: test]" [presentation: b:1,c:2] [distribution: test])
@@ -1841,7 +1841,7 @@ select
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT * FROM g WHERE (g || 'abc') LIKE '%ggg%'
 ----
-memo (optimized, ~12KB, required=[presentation: info:7] [distribution: test])
+memo (optimized, ~13KB, required=[presentation: info:7] [distribution: test])
 ├── G1: (explain G2 [presentation: g:1] [distribution: test])
 │ └── [presentation: info:7] [distribution: test]
 │ ├── best: (explain G2="[presentation: g:1] [distribution: test]" [presentation: g:1] [distribution: test])
@@ -1972,7 +1972,7 @@ select
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT a FROM a WHERE a % 8 > 2
 ----
-memo (optimized, ~6KB, required=[presentation: info:5] [distribution: test])
+memo (optimized, ~7KB, required=[presentation: info:5] [distribution: test])
 ├── G1: (explain G2 [presentation: a:1] [distribution: test])
 │ └── [presentation: info:5] [distribution: test]
 │ ├── best: (explain G2="[presentation: a:1] [distribution: test]" [presentation: a:1] [distribution: test])
@@ -2439,7 +2439,7 @@ project
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT * FROM bc JOIN f ON b = f + 1
 ----
-memo (optimized, ~26KB, required=[presentation: info:10] [distribution: test])
+memo (optimized, ~27KB, required=[presentation: info:10] [distribution: test])
 ├── G1: (explain G2 [presentation: b:1,c:2,f:5] [distribution: test])
 │ └── [presentation: info:10] [distribution: test]
 │ ├── best: (explain G2="[presentation: b:1,c:2,f:5] [distribution: test]" [presentation: b:1,c:2,f:5] [distribution: test])
@@ -2824,7 +2824,7 @@ project
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT f, g FROM f, LATERAL (SELECT count(DISTINCT c + f + 1) * 2 AS g FROM bc WHERE b * f < 10)
 ----
-memo (optimized, ~32KB, required=[presentation: info:12] [distribution: test])
+memo (optimized, ~33KB, required=[presentation: info:12] [distribution: test])
 ├── G1: (explain G2 [presentation: f:1,g:11] [distribution: test])
 │ └── [presentation: info:12] [distribution: test]
 │ ├── best: (explain G2="[presentation: f:1,g:11] [distribution: test]" [presentation: f:1,g:11] [distribution: test])
@@ -3103,7 +3103,7 @@ anti-join-apply
 query T
 EXPLAIN (OPT, MEMO, REDACT) SELECT * FROM a WHERE a > ALL (SELECT c::int + 2 FROM bc WHERE b > a::float * 3)
 ----
-memo (optimized, ~21KB, required=[presentation: info:11] [distribution: test])
+memo (optimized, ~22KB, required=[presentation: info:11] [distribution: test])
 ├── G1: (explain G2 [presentation: a:1] [distribution: test])
 │ └── [presentation: info:11] [distribution: test]
 │ ├── best: (explain G2="[presentation: a:1] [distribution: test]" [presentation: a:1] [distribution: test])
3 changes: 3 additions & 0 deletions pkg/sql/opt/memo/memo.go
@@ -176,6 +176,7 @@ type Memo struct {
 	useProvidedOrderingFix bool
 	mergeJoinsEnabled bool
 	plpgsqlUseStrictInto bool
+	useVirtualComputedColumnStats bool

 	// txnIsoLevel is the isolation level under which the plan was created. This
 	// affects the planning of some locking operations, so it must be included in
@@ -250,6 +251,7 @@ func (m *Memo) Init(ctx context.Context, evalCtx *eval.Context) {
 		useProvidedOrderingFix: evalCtx.SessionData().OptimizerUseProvidedOrderingFix,
 		mergeJoinsEnabled: evalCtx.SessionData().OptimizerMergeJoinsEnabled,
 		plpgsqlUseStrictInto: evalCtx.SessionData().PLpgSQLUseStrictInto,
+		useVirtualComputedColumnStats: evalCtx.SessionData().OptimizerUseVirtualComputedColumnStats,
 		txnIsoLevel: evalCtx.TxnIsoLevel,
 	}
 	m.metadata.Init()
@@ -398,6 +400,7 @@ func (m *Memo) IsStale(
 		m.useProvidedOrderingFix != evalCtx.SessionData().OptimizerUseProvidedOrderingFix ||
 		m.mergeJoinsEnabled != evalCtx.SessionData().OptimizerMergeJoinsEnabled ||
 		m.plpgsqlUseStrictInto != evalCtx.SessionData().PLpgSQLUseStrictInto ||
+		m.useVirtualComputedColumnStats != evalCtx.SessionData().OptimizerUseVirtualComputedColumnStats ||
 		m.txnIsoLevel != evalCtx.TxnIsoLevel {
 		return true, nil
 	}
6 changes: 6 additions & 0 deletions pkg/sql/opt/memo/memo_test.go
@@ -436,6 +436,12 @@ func TestMemoIsStale(t *testing.T) {
 	evalCtx.SessionData().OptimizerMergeJoinsEnabled = false
 	notStale()

+	// Stale optimizer_use_virtual_computed_column_stats.
+	evalCtx.SessionData().OptimizerUseVirtualComputedColumnStats = true
+	stale()
+	evalCtx.SessionData().OptimizerUseVirtualComputedColumnStats = false
+	notStale()
+
 	// User no longer has access to view.
 	catalog.View(tree.NewTableNameWithSchema("t", catconstants.PublicSchemaName, "abcview")).Revoked = true
 	_, err = o.Memo().IsStale(ctx, &evalCtx, catalog)
