New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.1.9-rc: invertedidx: derive geog/geom filters which enable inverted index scan #109395
release-23.1.9-rc: invertedidx: derive geog/geom filters which enable inverted index scan #109395
Conversation
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 6 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
@mgartner OK to merge this one to |
Sure, let's do it. |
Backport 3/3 commits from #108954.
/cc @cockroachdb/release
The query normalization rules FoldEqZeroSTDistance,
FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft
and FoldCmpSTMaxDistanceRight replace binary comparison operations,
=, <, >, <=, >= involving the
st_distance
orst_maxdistance
functionswith equivalent function calls which may enable inverted index scan, for
example st_dwithin, and st_dfullywithin. For non-null and non-empty
geographies and geometries, this rewrite is semantically equivalent, but
for null or empty inputs,
st_distance
uses three-valued boolean logicand returns a null.
st_dwithin
always returns true or false, so mayreturn a different value when the expression is negated. For example,
the following test returns
true
, but should returnnull
:The solution is to move all of the logic in the above normalization
rules into
geoFilterPlanner.extractInvertedFilterConditionFromLeaf
,and use the derived predicate only for the purposes of finding inverted
index scan spans. The spans will always include all rows qualified for
the original predicate, but may include additional rows. The original
filter is then applied to the results of the inverted index scan to
filter out any results with nulls or empty geographies and geometries.
Epic: none
Fixes: #103616
Release note (bug fix): This patch fixes a bug in geospatial queries,
where a query filter of the form
ST_Distance(geog1, geog2) > constant
,or
ST_MaxDistance(geom1, geom2) > constant
, where the operator is oneof >, <, >=, <=, or a filter of the form
ST_Distance(geog1, geog2, false) = 0
may mistakenly evaluate to
true
when one or both of the inputs is nullor an empty geography/geometry. More rows could be returned by the query
than expected.
xform: move st_distance rules tests from norm to xform
This commit moves the tests for the old normalization rules
FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight,
FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm
into xform. The tests now check whether the derived filter triggers the
GenerateInvertedIndexScans rule.
Epic: none
Informs: #103616
Release note: None
invertedidx: move geospatial filter derivation functions out of norm
This commit moves functions previously associated with normalization
rules, but now associated with exploration-time geospatial filter
derivation out of the norm package.
Epic: none
Informs: #103616
Release note: None
Release justification: fix for incorrect results from filters using the st_distance function on null or empty geography/geometry values