opt: reduce allocations when building join stats

Prior to this commit, some queries with many joins would perform a large
number of allocations calculating the selectivity of null-rejecting join
filters. This was due to `statisticsBuilder.selectivityFromNullsRemoved`
allocating a single-column set for each not-null column, and allocating
column statistics for each set.

Many of those allocations, and much unnecessary computation to traverse
the expression tree, are now avoided. This is made possible by the
realization that the selectivity of a null-rejecting filter is always 1
if the column was already not-null in the input.

Release note: None
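
To illustrate the observation in the message above, here is a minimal Go sketch. It is not the optimizer's actual code: `colStat` and `nullRemovalSelectivity` are hypothetical, simplified stand-ins, and the sketch assumes that the selectivity of a null-rejecting filter on a column can be approximated as the fraction of input rows in which that column is non-null, with columns treated as independent. Under that assumption, a column with no nulls in the input contributes a factor of exactly 1, so skipping it changes nothing in the result.

```go
package main

import "fmt"

// colStat is a hypothetical, simplified per-column statistic. The real
// optimizer tracks much more than a null count.
type colStat struct {
	nullCount float64
}

// nullRemovalSelectivity returns the estimated fraction of rows that survive
// null-rejecting filters on the notNull columns, assuming the columns are
// independent. Columns present in ignore are skipped without being looked up.
func nullRemovalSelectivity(
	rowCount float64, stats map[int]colStat, notNull []int, ignore map[int]bool,
) float64 {
	selectivity := 1.0
	if rowCount <= 0 {
		return selectivity
	}
	for _, col := range notNull {
		if ignore[col] {
			continue
		}
		selectivity *= 1 - stats[col].nullCount/rowCount
	}
	return selectivity
}

func main() {
	stats := map[int]colStat{
		1: {nullCount: 0},   // already not-null in the input
		2: {nullCount: 250}, // a quarter of the rows are null
	}
	notNull := []int{1, 2}

	all := nullRemovalSelectivity(1000, stats, notNull, nil)
	skipped := nullRemovalSelectivity(1000, stats, notNull, map[int]bool{1: true})
	fmt.Println(all, skipped) // prints "0.75 0.75"
}
```

Both calls return the same estimate, which is why ignoring input not-null columns is safe in this simplified model; the saving comes from never building a statistic for column 1.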
mgartner committed Dec 1, 2023
1 parent 6eabe4b commit 5d626ba
Showing 14 changed files with 98 additions and 88 deletions.
Original file line number Diff line number Diff line change
@@ -3650,7 +3650,7 @@ project
│ ├── key columns: [13 17 12 11] = [6 2 5 1]
│ ├── lookup columns are key
│ ├── immutable
-│ ├── stats: [rows=0.01031997, distinct(1)=0.01032, null(1)=0, distinct(2)=0.01032, null(2)=0, distinct(5)=0.01032, null(5)=0, distinct(6)=0.01032, null(6)=0, distinct(10)=0.0102714, null(10)=0, distinct(11)=0.01032, null(11)=0, distinct(12)=0.01032, null(12)=0, distinct(13)=0.01032, null(13)=0, distinct(16)=0.0102638, null(16)=0, distinct(17)=0.01032, null(17)=0, distinct(18)=0.010244, null(18)=0, distinct(19)=0.010244, null(19)=0, distinct(20)=0.0102638, null(20)=0, distinct(21)=0.0102638, null(21)=0, distinct(2,5,6)=0.01032, null(2,5,6)=0]
+│ ├── stats: [rows=0.01031997, distinct(1)=0.01032, null(1)=0, distinct(2)=0.01032, null(2)=0, distinct(5)=0.01032, null(5)=0, distinct(6)=0.01032, null(6)=0, distinct(11)=0.01032, null(11)=0, distinct(12)=0.01032, null(12)=0, distinct(13)=0.01032, null(13)=0, distinct(17)=0.01032, null(17)=0, distinct(2,5,6)=0.01032, null(2,5,6)=0]
│ ├── cost: 112.088874
│ ├── fd: ()-->(2,5,6,12,13,17,20,21), (16)-->(18,19), (11)==(1,16), (16)==(1,11), (12)==(5,20), (20)==(5,12), (13)==(6,21), (21)==(6,13), (6)==(13,21), (2)==(17), (17)==(2), (5)==(12,20), (1)==(11,16)
│ ├── distribution: ap-southeast-2
@@ -3660,7 +3660,7 @@ project
│ │ ├── key columns: [21 16] = [21 16]
│ │ ├── lookup columns are key
│ │ ├── immutable
-│ │ ├── stats: [rows=1.101776, distinct(10)=1.08174, null(10)=0, distinct(11)=0.936667, null(11)=0, distinct(12)=0.936667, null(12)=0, distinct(13)=0.936667, null(13)=0, distinct(16)=0.936667, null(16)=0, distinct(17)=0.709904, null(17)=0, distinct(18)=0.693547, null(18)=0, distinct(19)=0.693547, null(19)=0, distinct(20)=0.936667, null(20)=0, distinct(21)=0.936667, null(21)=0]
+│ │ ├── stats: [rows=1.101776, distinct(11)=0.936667, null(11)=0, distinct(12)=0.936667, null(12)=0, distinct(13)=0.936667, null(13)=0, distinct(16)=0.936667, null(16)=0, distinct(17)=0.709904, null(17)=0, distinct(20)=0.936667, null(20)=0, distinct(21)=0.936667, null(21)=0]
│ │ ├── cost: 107.514948
│ │ ├── fd: ()-->(12,13,17,20,21), (16)-->(18,19), (11)==(16), (16)==(11), (12)==(20), (20)==(12), (13)==(21), (21)==(13)
│ │ ├── distribution: ap-southeast-2
@@ -3680,7 +3680,7 @@ project
│ │ │ ├── project
│ │ │ │ ├── columns: "lookup_join_const_col_@17":28 str:10 abc_id:11 x.id2:12 x.crdb_region:13
│ │ │ │ ├── immutable
-│ │ │ │ ├── stats: [rows=3.666667, distinct(10)=3.11719, null(10)=0, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(28)=1, null(28)=0]
+│ │ │ │ ├── stats: [rows=3.666667, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(28)=1, null(28)=0]
│ │ │ │ ├── cost: 73.3666668
│ │ │ │ ├── fd: ()-->(12,13,28)
│ │ │ │ ├── distribution: ap-southeast-2
@@ -3689,7 +3689,7 @@ project
│ │ │ │ │ ├── columns: str:10 abc_id:11 x.id2:12 x.crdb_region:13
│ │ │ │ │ ├── constraint: /13/12/10/11/9: [/'ap-southeast-2'/'68088706-02c6-47d1-b993-a421cd761f2b' - /'ap-southeast-2'/'68088706-02c6-47d1-b993-a421cd761f2b']
│ │ │ │ │ ├── immutable
-│ │ │ │ │ ├── stats: [rows=3.666667, distinct(10)=3.11719, null(10)=0, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(12,13)=1, null(12,13)=0]
+│ │ │ │ │ ├── stats: [rows=3.666667, distinct(11)=3.11719, null(11)=0, distinct(12)=1, null(12)=0, distinct(13)=1, null(13)=0, distinct(12,13)=1, null(12,13)=0]
│ │ │ │ │ │ histogram(13)= 0 3.6667
│ │ │ │ │ │ <--- 'ap-southeast-2'
│ │ │ │ │ ├── cost: 73.2733335
4 changes: 2 additions & 2 deletions pkg/sql/opt/exec/execbuilder/testdata/explain_redact
@@ -2308,7 +2308,7 @@ project
├── project
│ ├── columns: c:2 b:1
│ ├── immutable
-│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=1000, null(2)=0]
+│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ ├── cost: 1088.44
│ ├── key: (1)
│ ├── fd: (1)-->(2)
@@ -2378,7 +2378,7 @@ project
├── project
│ ├── columns: c:2(float!null) b:1(float!null)
│ ├── immutable
-│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=1000, null(2)=0]
+│ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
│ ├── cost: 1088.44
│ ├── key: (1)
│ ├── fd: (1)-->(2)
14 changes: 7 additions & 7 deletions pkg/sql/opt/exec/execbuilder/testdata/inverted_index
@@ -3307,14 +3307,14 @@ inner-join (lookup geo_table)
│ ├── columns: geo_table2.k:1 geo_table2.geom:2 geo_table.k:11
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
-│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(11)=999.957, null(11)=0]
+│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
-│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10]
+│ │ ├── stats: [rows=1000, distinct(2)=100, null(2)=10]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
@@ -3400,14 +3400,14 @@ left-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
-│ ├── stats: [rows=10000, distinct(1)=1000, null(1)=0, distinct(11)=999.957, null(11)=0]
+│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
-│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
+│ │ ├── stats: [rows=1000]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
@@ -3439,7 +3439,7 @@ semi-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
-│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(11)=999.957, null(11)=0]
+│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
@@ -3479,14 +3479,14 @@ anti-join (lookup geo_table)
│ ├── first join in paired joiner; continuation column: continuation:16
│ ├── inverted-expr
│ │ └── st_intersects(geo_table2.geom:2, geo_table.geom:12)
-│ ├── stats: [rows=10000, distinct(1)=1000, null(1)=0, distinct(11)=999.957, null(11)=0]
+│ ├── stats: [rows=10000, distinct(11)=999.957, null(11)=0]
│ ├── cost: 41812.84
│ ├── key: (1,11)
│ ├── fd: (1)-->(2), (11)-->(16)
│ ├── distribution: test
│ ├── scan geo_table2
│ │ ├── columns: geo_table2.k:1 geo_table2.geom:2
-│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0]
+│ │ ├── stats: [rows=1000]
│ │ ├── cost: 1108.82
│ │ ├── key: (1)
│ │ ├── fd: (1)-->(2)
12 changes: 11 additions & 1 deletion pkg/sql/opt/memo/statistics_builder.go
@@ -1340,7 +1340,17 @@ func (sb *statisticsBuilder) buildJoin(
corr := sb.correlationFromMultiColDistinctCountsForJoin(constrainedCols, leftCols, rightCols, join, s)
s.ApplySelectivity(sb.selectivityFromConstrainedCols(constrainedCols, histCols, join, s, corr))
s.ApplySelectivity(sb.selectivityFromUnappliedConjuncts(numUnappliedConjuncts))
-s.ApplySelectivity(sb.selectivityFromNullsRemoved(join, relProps.NotNullCols, constrainedCols))
+
+// Ignore columns that are already not-null in the input when calculating
+// selectivity from null-removing filters - the selectivity would always be
+// 1.
+ignoreCols := constrainedCols
+if relProps.NotNullCols.Intersects(h.leftProps.NotNullCols) ||
+	relProps.NotNullCols.Intersects(h.rightProps.NotNullCols) {
+	ignoreCols = ignoreCols.Union(h.leftProps.NotNullCols)
+	ignoreCols.UnionWith(h.rightProps.NotNullCols)
+}
+s.ApplySelectivity(sb.selectivityFromNullsRemoved(join, relProps.NotNullCols, ignoreCols))

// Update distinct counts based on equivalencies; this should happen after
// selectivityFromMultiColDistinctCounts and selectivityFromEquivalencies.
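
The `ignoreCols` logic in the hunk above can be sketched in isolation as follows. This is a minimal illustration using a hypothetical map-backed `colSet`, not the optimizer's real column-set type, and `ignoreColsForJoin` is a made-up helper name. It assumes, as the method names in the hunk suggest, that `Union` returns a new set while `UnionWith` mutates its receiver.

```go
package main

import "fmt"

// colSet is a hypothetical stand-in for the optimizer's column-set type.
type colSet map[int]struct{}

// makeColSet builds a set from the given column IDs.
func makeColSet(cols ...int) colSet {
	s := colSet{}
	for _, c := range cols {
		s[c] = struct{}{}
	}
	return s
}

// Intersects reports whether the two sets share at least one column.
func (s colSet) Intersects(o colSet) bool {
	for c := range o {
		if _, ok := s[c]; ok {
			return true
		}
	}
	return false
}

// Union returns a new set and leaves both operands untouched.
func (s colSet) Union(o colSet) colSet {
	out := make(colSet, len(s)+len(o))
	for c := range s {
		out[c] = struct{}{}
	}
	for c := range o {
		out[c] = struct{}{}
	}
	return out
}

// UnionWith adds the columns of o to s in place.
func (s colSet) UnionWith(o colSet) {
	for c := range o {
		s[c] = struct{}{}
	}
}

// ignoreColsForJoin mirrors the shape of the hunk: start from the constrained
// columns and add the input not-null columns only when at least one output
// not-null column comes from them.
func ignoreColsForJoin(constrained, outNotNull, leftNotNull, rightNotNull colSet) colSet {
	ignore := constrained
	if outNotNull.Intersects(leftNotNull) || outNotNull.Intersects(rightNotNull) {
		ignore = ignore.Union(leftNotNull) // copies, so constrained is not modified
		ignore.UnionWith(rightNotNull)     // safe: mutates only the copy
	}
	return ignore
}

func main() {
	constrained := makeColSet(1)
	ignore := ignoreColsForJoin(constrained, makeColSet(1, 2, 5), makeColSet(2), makeColSet(5))
	fmt.Println(len(ignore), len(constrained)) // prints "3 1"
}
```

In this sketch the `Intersects` guard means the copy is made only when there is something to ignore, and using `Union` for the first step keeps the caller's `constrained` set unchanged.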
16 changes: 8 additions & 8 deletions pkg/sql/opt/memo/testdata/stats/inverted-join
@@ -33,12 +33,12 @@ project
│ ├── columns: ltable.k:1(int!null) ltable.geom:2(geometry) rtable.k:10(int!null)
│ ├── inverted-expr
│ │ └── st_intersects(ltable.geom:2, rtable.geom:11) [type=bool]
-│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(10)=999.957, null(10)=0]
+│ ├── stats: [rows=10000, distinct(10)=999.957, null(10)=0]
│ ├── key: (1,10)
│ ├── fd: (1)-->(2)
│ ├── scan ltable
│ │ ├── columns: ltable.k:1(int!null) ltable.geom:2(geometry)
-│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10]
+│ │ ├── stats: [rows=1000, distinct(2)=100, null(2)=10]
│ │ ├── key: (1)
│ │ └── fd: (1)-->(2)
│ └── filters (true)
@@ -65,12 +65,12 @@ project
│ ├── columns: ltable.k:1(int!null) ltable.geom:2(geometry) rtable.k:10(int!null)
│ ├── inverted-expr
│ │ └── st_intersects(ltable.geom:2, rtable.geom:11) [type=bool]
-│ ├── stats: [rows=10000, distinct(1)=999.957, null(1)=0, distinct(10)=999.957, null(10)=0]
+│ ├── stats: [rows=10000, distinct(10)=999.957, null(10)=0]
│ ├── key: (1,10)
│ ├── fd: (1)-->(2)
│ ├── scan ltable
│ │ ├── columns: ltable.k:1(int!null) ltable.geom:2(geometry)
-│ │ ├── stats: [rows=1000, distinct(1)=1000, null(1)=0, distinct(2)=100, null(2)=10]
+│ │ ├── stats: [rows=1000, distinct(2)=100, null(2)=10]
│ │ ├── key: (1)
│ │ └── fd: (1)-->(2)
│ └── filters (true)
@@ -169,7 +169,7 @@ inner-join (lookup json_arr1 [as=t1])
│ ├── columns: t2.k:8(int!null) t2.j:9(jsonb) t2.a:10(string[]) t1.k:13(int!null)
│ ├── inverted-expr
│ │ └── (t1.a:15 @> t2.a:10) AND (t1.a:15 @> ARRAY['foo']) [type=bool]
-│ ├── stats: [rows=33.33333, distinct(8)=3.33319, null(8)=0, distinct(13)=32.9462, null(13)=0]
+│ ├── stats: [rows=33.33333, distinct(13)=32.9462, null(13)=0]
│ ├── key: (8,13)
│ ├── fd: (8)-->(9,10)
│ ├── scan json_arr2 [as=t2]
@@ -204,7 +204,7 @@ project
│ ├── columns: t2.k:8(int!null) t2.j:9(jsonb) t2.a:10(string[]) t1.k:13(int!null)
│ ├── inverted-expr
│ │ └── t1.j:14 @> t2.j:9 [type=bool]
-│ ├── stats: [rows=33.33333, distinct(8)=3.33319, null(8)=0, distinct(13)=32.9462, null(13)=0]
+│ ├── stats: [rows=33.33333, distinct(13)=32.9462, null(13)=0]
│ ├── key: (8,13)
│ ├── fd: (8)-->(9,10)
│ ├── scan json_arr2 [as=t2]
@@ -265,7 +265,7 @@ inner-join (lookup json_arr1 [as=t1])
│ ├── columns: t2.k:8(int!null) t2.j:9(jsonb) t2.a:10(string[]) t1.k:13(int!null)
│ ├── inverted-expr
│ │ └── (t1.a:15 <@ t2.a:10) AND (t1.a:15 <@ ARRAY['foo']) [type=bool]
-│ ├── stats: [rows=33.33333, distinct(8)=3.33319, null(8)=0, distinct(13)=32.9462, null(13)=0]
+│ ├── stats: [rows=33.33333, distinct(13)=32.9462, null(13)=0]
│ ├── key: (8,13)
│ ├── fd: (8)-->(9,10)
│ ├── scan json_arr2 [as=t2]
@@ -300,7 +300,7 @@ project
│ ├── columns: t2.k:8(int!null) t2.j:9(jsonb) t2.a:10(string[]) t1.k:13(int!null)
│ ├── inverted-expr
│ │ └── t1.j:14 <@ t2.j:9 [type=bool]
-│ ├── stats: [rows=33.33333, distinct(8)=3.33319, null(8)=0, distinct(13)=32.9462, null(13)=0]
+│ ├── stats: [rows=33.33333, distinct(13)=32.9462, null(13)=0]
│ ├── key: (8,13)
│ ├── fd: (8)-->(9,10)
│ ├── scan json_arr2 [as=t2]