53145: colexec: add support for DISTINCT ordered aggregation r=yuzefovich a=yuzefovich

**rowexec: speed up DISTINCT aggregation a bit**

Previously, whenever we needed to clear the `seen` map (which tracks the
tuples we have already seen for the aggregation group), we would create
a new map. However, micro-benchmarks show that it is actually faster to
delete all entries from the old map instead, which is what this commit
does.
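
A minimal sketch of the in-place clearing described above, assuming a `seen` map keyed by an encoded tuple. The helper name `clearSeen` is hypothetical and is not taken from the commit:

```go
package main

import "fmt"

// clearSeen (hypothetical name) removes every entry from seen in place so
// that the map and its already-allocated storage can be reused for the next
// aggregation group instead of allocating a fresh map.
func clearSeen(seen map[string]struct{}) {
	for k := range seen {
		delete(seen, k)
	}
}

func main() {
	seen := map[string]struct{}{"a": {}, "b": {}}
	clearSeen(seen)
	fmt.Println(len(seen)) // prints 0
}
```

Whether this beats allocating a new map depends on the map size and workload, so it is worth confirming with a benchmark as the commit did.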

Release note: None

**colexec: add support for DISTINCT ordered aggregation**

This commit adds support for DISTINCT ordered aggregation by reusing
the same code as we have for hash aggregation (with a minor
modification to reset the `seen` maps when a new group is encountered).
Quick benchmarks show about a 6-7x improvement over a wrapped
row-by-row processor.
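
The idea of resetting `seen` at group boundaries can be sketched as follows. This is a simplified row-at-a-time illustration, not the vectorized implementation; the `row` and `result` types and the `countDistinctOrdered` function are hypothetical, and the input is assumed to arrive already sorted by group:

```go
package main

import "fmt"

// row and result are hypothetical types used only for this illustration.
type row struct {
	group string
	value string
}

type result struct {
	group string
	count int
}

// countDistinctOrdered computes COUNT(DISTINCT value) per group, assuming
// the rows are already ordered by group.
func countDistinctOrdered(rows []row) []result {
	var out []result
	seen := make(map[string]struct{})
	for i, r := range rows {
		if i == 0 || r.group != rows[i-1].group {
			// A new group starts here: reset the seen map in place
			// and open a new result entry.
			for k := range seen {
				delete(seen, k)
			}
			out = append(out, result{group: r.group})
		}
		if _, ok := seen[r.value]; ok {
			// Duplicate within the current group: skip it.
			continue
		}
		seen[r.value] = struct{}{}
		out[len(out)-1].count++
	}
	return out
}

func main() {
	rows := []row{{"a", "x"}, {"a", "x"}, {"a", "y"}, {"b", "x"}}
	fmt.Println(countDistinctOrdered(rows)) // prints [{a 2} {b 1}]
}
```

Because the input is ordered, a single `seen` map suffices: once a group ends, its values can never reappear in that group.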

Closes: #39242.

Release note (sql change): Vectorized execution now natively supports
ordered aggregation with a DISTINCT clause.

53154: sql: pass down index descriptors to stats generators r=mjibson a=mjibson

For geo indexes, this allows the configuration of the first defined
inverted index to be used when generating inverted index entries for
histogram stats.

Release note: None

Fixes #52363

53216: colexec: fix recent problem with hash joiner r=yuzefovich a=yuzefovich

#53169 recently introduced an optimization for populating the `toCheck`
slice by keeping a default `hjInitialToCheck` slice pre-populated.
However, that slice could be of insufficient length due to the
randomization of `coldata.BatchSize()`; this is now fixed.
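
The actual fix is not shown in this part of the diff. As an illustration only, one way to guard a pre-populated default slice against a larger-than-expected batch size is to grow it on demand. All names below (`defaultToCheck`, `makeSequence`, `populateToCheck`) are hypothetical:

```go
package main

import "fmt"

// makeSequence returns the slice [0, 1, ..., n-1].
func makeSequence(n int) []uint64 {
	s := make([]uint64, n)
	for i := range s {
		s[i] = uint64(i)
	}
	return s
}

// defaultToCheck plays the role of a pre-populated default slice. Its
// initial length of 4 is an arbitrary value chosen for this sketch.
var defaultToCheck = makeSequence(4)

// populateToCheck fills dst with [0, 1, ..., n-1] by copying from the
// default slice, growing the default first if it is too short for n.
func populateToCheck(dst []uint64, n int) []uint64 {
	if n > len(defaultToCheck) {
		defaultToCheck = makeSequence(n)
	}
	if cap(dst) < n {
		dst = make([]uint64, n)
	}
	dst = dst[:n]
	copy(dst, defaultToCheck[:n])
	return dst
}

func main() {
	fmt.Println(populateToCheck(nil, 6)) // prints [0 1 2 3 4 5]
}
```

Without the length check, a request for more elements than the default holds would panic when slicing `defaultToCheck[:n]`.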

Release note: None

53217: builtins: implement ST_IsEmpty and ST_IsCollection r=otan a=erikgrinaker

`ST_IsCollection` does not handle `geopb.ShapeType_Geometry` in any special way and will always return `false` for it. Is this sufficient?

Release note (sql change): Implement the geometry builtins `ST_IsEmpty`
and `ST_IsCollection`.

Closes #48954.
Closes #48955.

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Matt Jibson <matt.jibson@gmail.com>
Co-authored-by: Erik Grinaker <erik@grinaker.org>
4 people committed Aug 21, 2020
5 parents 86165b9 + c5d5760 + 1c0e607 + 9af54b7 + 7d4c1ca commit cab0b31
Showing 20 changed files with 787 additions and 364 deletions.
4 changes: 4 additions & 0 deletions docs/generated/sql/functions.md
@@ -1308,6 +1308,10 @@ calculated, the result is transformed back into a Geography with SRID 4326.</p>
<p>This function variant will attempt to utilize any available geospatial index.</p>
<p>This variant will cast all geometry_str arguments into Geometry types.</p>
</span></td></tr>
<tr><td><a name="st_iscollection"></a><code>st_iscollection(geometry: geometry) &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Returns whether the geometry is of a collection type (including multi-types).</p>
</span></td></tr>
<tr><td><a name="st_isempty"></a><code>st_isempty(geometry: geometry) &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Returns whether the geometry is empty.</p>
</span></td></tr>
<tr><td><a name="st_isvalid"></a><code>st_isvalid(geometry: geometry) &rarr; <a href="bool.html">bool</a></code></td><td><span class="funcdesc"><p>Returns whether the geometry is valid as defined by the OGC spec.</p>
<p>This function utilizes the GEOS module.</p>
</span></td></tr>
32 changes: 32 additions & 0 deletions pkg/geo/geomfn/unary_predicates.go
@@ -0,0 +1,32 @@
// Copyright 2020 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package geomfn

import (
	"github.com/cockroachdb/cockroach/pkg/geo"
	"github.com/cockroachdb/cockroach/pkg/geo/geopb"
)

// IsCollection returns whether the given geometry is of a collection type.
func IsCollection(g *geo.Geometry) (bool, error) {
	switch g.ShapeType() {
	case geopb.ShapeType_MultiPoint, geopb.ShapeType_MultiLineString, geopb.ShapeType_MultiPolygon,
		geopb.ShapeType_GeometryCollection:
		return true, nil
	default:
		return false, nil
	}
}

// IsEmpty returns whether the given geometry is empty.
func IsEmpty(g *geo.Geometry) (bool, error) {
	return g.Empty(), nil
}
86 changes: 86 additions & 0 deletions pkg/geo/geomfn/unary_predicates_test.go
@@ -0,0 +1,86 @@
// Copyright 2020 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package geomfn

import (
	"testing"

	"github.com/cockroachdb/cockroach/pkg/geo"
	"github.com/stretchr/testify/require"
)

func TestIsCollection(t *testing.T) {
	testCases := []struct {
		wkt      string
		expected bool
	}{
		{"POINT(1.0 1.0)", false},
		{"POINT EMPTY", false},
		{"LINESTRING(1.0 1.0, 2.0 2.0)", false},
		{"LINESTRING EMPTY", false},
		{"POLYGON((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 0.0))", false},
		{"POLYGON EMPTY", false},
		{"MULTIPOINT((1.0 1.0), (2.0 2.0))", true},
		{"MULTIPOINT EMPTY", true},
		{"MULTILINESTRING((1.0 1.0, 2.0 2.0, 3.0 3.0), (6.0 6.0, 7.0 6.0))", true},
		{"MULTILINESTRING EMPTY", true},
		{"MULTIPOLYGON(((3.0 3.0, 4.0 3.0, 4.0 4.0, 3.0 3.0)), ((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 0.0), (0.1 0.1, 0.2 0.1, 0.2 0.2, 0.1 0.1)))", true},
		{"MULTIPOLYGON EMPTY", true},
		{"GEOMETRYCOLLECTION (POINT (40 10),LINESTRING (10 10, 20 20, 10 40))", true},
		{"GEOMETRYCOLLECTION EMPTY", true},
		{"GEOMETRYCOLLECTION (GEOMETRYCOLLECTION(POINT (40 10),LINESTRING (10 10, 20 20, 10 40)))", true},
		{"GEOMETRYCOLLECTION (GEOMETRYCOLLECTION EMPTY)", true},
	}

	for _, tc := range testCases {
		t.Run(tc.wkt, func(t *testing.T) {
			g, err := geo.ParseGeometry(tc.wkt)
			require.NoError(t, err)
			ret, err := IsCollection(g)
			require.NoError(t, err)
			require.Equal(t, tc.expected, ret)
		})
	}
}

func TestIsEmpty(t *testing.T) {
	testCases := []struct {
		wkt      string
		expected bool
	}{
		{"POINT(1.0 1.0)", false},
		{"POINT EMPTY", true},
		{"LINESTRING(1.0 1.0, 2.0 2.0)", false},
		{"LINESTRING EMPTY", true},
		{"POLYGON((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 0.0))", false},
		{"POLYGON EMPTY", true},
		{"MULTIPOINT((1.0 1.0), (2.0 2.0))", false},
		{"MULTIPOINT EMPTY", true},
		{"MULTILINESTRING((1.0 1.0, 2.0 2.0, 3.0 3.0), (6.0 6.0, 7.0 6.0))", false},
		{"MULTILINESTRING EMPTY", true},
		{"MULTIPOLYGON(((3.0 3.0, 4.0 3.0, 4.0 4.0, 3.0 3.0)), ((0.0 0.0, 1.0 0.0, 1.0 1.0, 0.0 0.0), (0.1 0.1, 0.2 0.1, 0.2 0.2, 0.1 0.1)))", false},
		{"MULTIPOLYGON EMPTY", true},
		{"GEOMETRYCOLLECTION (POINT (40 10),LINESTRING (10 10, 20 20, 10 40))", false},
		{"GEOMETRYCOLLECTION EMPTY", true},
		{"GEOMETRYCOLLECTION (GEOMETRYCOLLECTION(POINT (40 10),LINESTRING (10 10, 20 20, 10 40)))", false},
		{"GEOMETRYCOLLECTION (GEOMETRYCOLLECTION EMPTY)", true},
	}

	for _, tc := range testCases {
		t.Run(tc.wkt, func(t *testing.T) {
			g, err := geo.ParseGeometry(tc.wkt)
			require.NoError(t, err)
			ret, err := IsEmpty(g)
			require.NoError(t, err)
			require.Equal(t, tc.expected, ret)
		})
	}
}
39 changes: 39 additions & 0 deletions pkg/sql/colexec/aggregate_funcs.go
@@ -361,3 +361,42 @@ func ProcessAggregations(
	}
	return
}

// aggBucket stores the aggregation functions for the corresponding aggregation
// group as well as other utility information.
type aggBucket struct {
	fns []aggregateFunc
	// seen is a slice of maps used to handle distinct aggregation. A
	// corresponding entry in the slice is nil if the function doesn't have a
	// DISTINCT clause. The slice itself will be nil whenever no aggregate
	// function has a DISTINCT clause.
	seen []map[string]struct{}
}

func (b *aggBucket) init(
	batch coldata.Batch, fns []aggregateFunc, seen []map[string]struct{}, groups []bool,
) {
	b.fns = fns
	for fnIdx, fn := range b.fns {
		fn.Init(groups, batch.ColVec(fnIdx))
	}
	b.seen = seen
}

const sizeOfAggBucket = unsafe.Sizeof(aggBucket{})

// aggBucketAlloc is a utility struct that batches allocations of aggBuckets.
type aggBucketAlloc struct {
	allocator *colmem.Allocator
	buf       []aggBucket
}

func (a *aggBucketAlloc) newAggBucket() *aggBucket {
	if len(a.buf) == 0 {
		a.allocator.AdjustMemoryUsage(int64(hashAggregatorAllocSize * sizeOfAggBucket))
		a.buf = make([]aggBucket, hashAggregatorAllocSize)
	}
	ret := &a.buf[0]
	a.buf = a.buf[1:]
	return ret
}
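
The batching pattern used by `aggBucketAlloc` can be illustrated in isolation. The following is a self-contained sketch with hypothetical names (`bucket`, `bucketAlloc`, `allocSize`); it omits the memory accounting done through `colmem.Allocator` and instead counts slab allocations so the effect is visible:

```go
package main

import "fmt"

// bucket is a hypothetical stand-in for aggBucket.
type bucket struct {
	id int
}

// allocSize is a hypothetical batch size chosen for this sketch.
const allocSize = 4

// bucketAlloc hands out buckets from a pre-allocated slab, making one
// slice allocation per allocSize buckets rather than one per bucket.
type bucketAlloc struct {
	buf         []bucket
	allocations int // number of slabs allocated so far
}

func (a *bucketAlloc) newBucket() *bucket {
	if len(a.buf) == 0 {
		a.buf = make([]bucket, allocSize)
		a.allocations++
	}
	ret := &a.buf[0]
	a.buf = a.buf[1:]
	return ret
}

func main() {
	var a bucketAlloc
	for i := 0; i < 10; i++ {
		a.newBucket().id = i
	}
	fmt.Println(a.allocations) // prints 3
}
```

Pointers returned earlier stay valid after the slab is exhausted, because each slab's backing array remains reachable for as long as any bucket in it is referenced.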
