Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-1861] Spatial index, based on Hilbert space-filling curve #2111

Closed
wants to merge 2 commits into from

Conversation

julianhyde
Copy link
Contributor

Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

julianhyde added a commit to julianhyde/calcite that referenced this pull request Aug 19, 2020
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

In tests and examples, call ST_Point(longitude, latitude)
because conventionally x is longitude and y is latitude. (Yes,
there's a tension here. In map references and English
sentences latitude comes before longitude, but in Cartesian
geometry x comes before y.)

Close apache#2111
julianhyde added a commit to julianhyde/calcite that referenced this pull request Aug 19, 2020
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

In tests and examples, call ST_Point(longitude, latitude)
because conventionally x is longitude and y is latitude. (Yes,
there's a tension here. In map references and English
sentences latitude comes before longitude, but in Cartesian
geometry x comes before y.)

Close apache#2111
@michaelmior michaelmior self-requested a review August 19, 2020 20:57
Copy link
Member

@michaelmior michaelmior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a rather large diff and some of it touches parts of the code base I'm not as familiar with, so I'll admit my review did not go into a lot of depth. I left a few minor comments but overall LGTM.

Comment on lines +369 to +370
if (hint.startsWith("SqlKind:")) {
return SqlKind.valueOf(hint.substring("SqlKind:".length()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having hints just be plain strings seems like it could up creating a bit of a mess if they start getting used more. As a simple extension, what if a hint was just a pair with the first part being an enum defined in the hints class specifying the type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Hints annotation is marked experimental, so we can change it without notice if it is not fit for purpose. For now, the free-format string is good for the mostly-internal current uses.

If we made it more rigidly typed it would increase coupling.

The purpose here was to allow functions such as Hilbert to be defined using the UDF framework but still recognized by internal code. In other words, looser coupling than we could do previously.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how the coupling would be increased. I'm really just talking about having a type for the hint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair point. I didn't want to overthink this, so I marked it 'experimental'.

Maybe you're right that we should more strongly type the hint. If so, we will iterate in that direction.

I think you'll agree that if someone wants a function with a particular SqlKind, it's better to allow them to write it as a UDF (and have some means, such as annotation, to link the function to the SqlKind) than to require them to sub-class SqlFunction. That's all I meant by 'less coupling'.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds reasonable to me.

builder.push(filter.getInput())
.filter(conjunctions)
.build());
return; // we found one useful constraint; don't look for more
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there's a good reason, but it's not immediately obvious to me why you wouldn't want to keep looking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general principle, if there are multiple function calls that can be optimized, we'd like the rule once for each, rather than trying to do too much in one shot.

We haven't really thought through complex cases for spatial optimizations. Maybe that can be follow-up work. For this change, I'm trying to make small rewrites that will (hopefully) compose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The end goal makes sense but won't the check for an existing predicate stop the optimization from being applied multiple times in this case?

final RexCall eqCall = (RexCall) predicate;
if (eqCall.operands.get(0) instanceof RexInputRef
&& eqCall.operands.get(1).getKind() == SqlKind.HILBERT) {
final RexInputRef ref = (RexInputRef) eqCall.operands.get(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, it would be nice if this equality condition was not necessary, but I understand this is likely to require DB-specific implementation so it's a reasonable starting point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this check is crucial to the algorithm. Without it, the algorithm loops. The algorithm is looking for artifacts that it created, so looking for a very particular pattern isn't so bad.

Maybe in future we can make more use of predicates rather than looking for Hilbert specifically. My work on sargs should allow us to identify whether the relation has already been filtered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're talking about different checks. I understand the check for the Hilbert predicate is necessary to avoid infinite loops. (My student ran into the same problem.) My point was that it would be nice to be able to get the "Hilbert column" to use during rewriting without relying on the predicate with the Hilbert function already existing in the query.

Copy link
Contributor Author

@julianhyde julianhyde Aug 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this check is not whether Hilbert is called in the query. It is whether a term column = Hilbert(x, y) is in the constraints. Which is what happens when you have defined a calculated column in your CREATE TABLE statement. (It will also happen if you have previously applied Filter(i = Hilbert(x, y)) but that's less interesting.)

Remember, this rewrite requires a table that has a Hilbert computed column and is sorted on that column. The query that uses, say, ST_DWithin, will be rewritten to use it, even if it does not mention the Hilbert function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. I didn't realize this was checking in the constraint. My point was that eventually we would ideally want to read metadata from the database to populate those constraints automatically.

julianhyde added a commit to julianhyde/calcite that referenced this pull request Aug 20, 2020
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

In tests and examples, call ST_Point(longitude, latitude)
because conventionally x is longitude and y is latitude. (Yes,
there's a tension here. In map references and English
sentences latitude comes before longitude, but in Cartesian
geometry x comes before y.)

Close apache#2111
Rationale: A SqlOperator has a lifetime that spans several
statements; but a type factory is only for one statement, and
each type belongs to that factory. We want to share
SqlOperator instances across statements and connections,
therefore we need to create them before there is a type
factory.

In particular:
* Remove (deprecate) method
  `List<RelDataType> SqlOperator.getParamTypes()`;
* Remove `RelDataTypeFactory` argument from
  `SqlUserDefinedAggFunction` constructor, and remove its
  `typeFactory` field.
* Add `interface SqlOperandMetadata extends
  SqlOperatorTypeChecker`, which has new methods
  `List<RelDataType>> paramTypes(RelDataTypeFactory)`
  and `List<String> paramNames()`.

The `SqlOperandMetadata` interface will typically be
implemented only for user-defined functions. Unlike SQL
built-in functions, UDFs have a fixed set of parameters
(although some of them may be optional), and the parameters
have names.

In `interface SqlOperandTypeChecker`, add method `boolean
isFixedParameters()`. Will typically return true for UDFs,
false for built-in functions. Returns false for table window
functions (e.g. HOP), even though these have named parameters
(which tends to make them look a bit like UDFs).

Following [CALCITE-4171], change `instanceof SqlWindowTableFunction`
to `checker.isFixedParameters()`, because it is less specific.

Add `SqlKind` argument to UDF constructors; we will populate it
using Java annotations in [CALCITE-1861].
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

In tests and examples, call ST_Point(longitude, latitude)
because conventionally x is longitude and y is latitude. (Yes,
there's a tension here. In map references and English
sentences latitude comes before longitude, but in Cartesian
geometry x comes before y.)

Close apache#2111
@asfgit asfgit closed this in eab043f Aug 21, 2020
MalteBellmann pushed a commit to caespes/calcite that referenced this pull request Feb 21, 2021
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.

Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.

Implement function ST_MakeEnvelope.

Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.

Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.

Geometry literals (not in the SQL parser (yet), but in
RexNode space).

Make Geom implement Comparable, so that it can be a literal
value.

Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.

Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.

Deprecate RelReferentialConstraint.getNumColumns().

In tests and examples, call ST_Point(longitude, latitude)
because conventionally x is longitude and y is latitude. (Yes,
there's a tension here. In map references and English
sentences latitude comes before longitude, but in Cartesian
geometry x comes before y.)

Close apache#2111
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants