-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-1861] Spatial index, based on Hilbert space-filling curve #2111
Conversation
8de4451
to
1c02997
Compare
Add SQL function Hilbert. It uses Google's uzaygezen library, and our wraper code is copied from LocationTech's SFCurve project. Rewrite calls to ST_DWithin to use a range of on a Hilbert column, plus the call to ST_DWithin for safety. Implement function ST_MakeEnvelope. Use constant reduction to recognize constant geometry expressions before we apply SpatialRules. Move interface Geom, and other classes, from GeoFunctions into new utility class Geometries. Geometry literals (not in the SQL parser (yet), but in RexNode space). Make Geom implement Comparable, so that it can be a literal value. Move SqlStdOperatorTables to new package, and rename to SqlOperatorTables. Add RelOptTestBase.Sql.withCatalogReaderFactory and withConformance, to make it easier to run planner tests with a different schema or SQL dialect. Deprecate RelReferentialConstraint.getNumColumns(). In tests and examples, call ST_Point(longitude, latitude) because conventionally x is longitude and y is latitude. (Yes, there's a tension here. In map references and English sentences latitude comes before longitude, but in Cartesian geometry x comes before y.) Close apache#2111
1c02997
to
d623025
Compare
Add SQL function Hilbert. It uses Google's uzaygezen library, and our wraper code is copied from LocationTech's SFCurve project. Rewrite calls to ST_DWithin to use a range of on a Hilbert column, plus the call to ST_DWithin for safety. Implement function ST_MakeEnvelope. Use constant reduction to recognize constant geometry expressions before we apply SpatialRules. Move interface Geom, and other classes, from GeoFunctions into new utility class Geometries. Geometry literals (not in the SQL parser (yet), but in RexNode space). Make Geom implement Comparable, so that it can be a literal value. Move SqlStdOperatorTables to new package, and rename to SqlOperatorTables. Add RelOptTestBase.Sql.withCatalogReaderFactory and withConformance, to make it easier to run planner tests with a different schema or SQL dialect. Deprecate RelReferentialConstraint.getNumColumns(). In tests and examples, call ST_Point(longitude, latitude) because conventionally x is longitude and y is latitude. (Yes, there's a tension here. In map references and English sentences latitude comes before longitude, but in Cartesian geometry x comes before y.) Close apache#2111
d623025
to
52e193c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a rather large diff and some of it touches parts of the code base I'm not as familiar with, so I'll admit my review did not go into a lot of depth. I left a few minor comments but overall LGTM.
if (hint.startsWith("SqlKind:")) { | ||
return SqlKind.valueOf(hint.substring("SqlKind:".length())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having hints just be plain strings seems like it could up creating a bit of a mess if they start getting used more. As a simple extension, what if a hint was just a pair with the first part being an enum defined in the hints class specifying the type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Hints
annotation is marked experimental, so we can change it without notice if it is not fit for purpose. For now, the free-format string is good for the mostly-internal current uses.
If we made it more rigidly typed it would increase coupling.
The purpose here was to allow functions such as Hilbert
to be defined using the UDF framework but still recognized by internal code. In other words, looser coupling than we could do previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see how the coupling would be increased. I'm really just talking about having a type for the hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair point. I didn't want to overthink this, so I marked it 'experimental'.
Maybe you're right that we should more strongly type the hint. If so, we will iterate in that direction.
I think you'll agree that if someone wants a function with a particular SqlKind, it's better to allow them to write it as a UDF (and have some means, such as annotation, to link the function to the SqlKind) than to require them to sub-class SqlFunction. That's all I meant by 'less coupling'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me.
core/src/main/java/org/apache/calcite/prepare/CalciteCatalogReader.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/calcite/rel/RelReferentialConstraint.java
Outdated
Show resolved
Hide resolved
builder.push(filter.getInput()) | ||
.filter(conjunctions) | ||
.build()); | ||
return; // we found one useful constraint; don't look for more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure there's a good reason, but it's not immediately obvious to me why you wouldn't want to keep looking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general principle, if there are multiple function calls that can be optimized, we'd like the rule once for each, rather than trying to do too much in one shot.
We haven't really thought through complex cases for spatial optimizations. Maybe that can be follow-up work. For this change, I'm trying to make small rewrites that will (hopefully) compose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The end goal makes sense but won't the check for an existing predicate stop the optimization from being applied multiple times in this case?
final RexCall eqCall = (RexCall) predicate; | ||
if (eqCall.operands.get(0) instanceof RexInputRef | ||
&& eqCall.operands.get(1).getKind() == SqlKind.HILBERT) { | ||
final RexInputRef ref = (RexInputRef) eqCall.operands.get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, it would be nice if this equality condition was not necessary, but I understand this is likely to require DB-specific implementation so it's a reasonable starting point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, this check is crucial to the algorithm. Without it, the algorithm loops. The algorithm is looking for artifacts that it created, so looking for a very particular pattern isn't so bad.
Maybe in future we can make more use of predicates rather than looking for Hilbert specifically. My work on sargs should allow us to identify whether the relation has already been filtered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're talking about different checks. I understand the check for the Hilbert predicate is necessary to avoid infinite loops. (My student ran into the same problem.) My point was that it would be nice to be able to get the "Hilbert column" to use during rewriting without relying on the predicate with the Hilbert function already existing in the query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this check is not whether Hilbert is called in the query. It is whether a term column = Hilbert(x, y)
is in the constraints. Which is what happens when you have defined a calculated column in your CREATE TABLE
statement. (It will also happen if you have previously applied Filter(i = Hilbert(x, y))
but that's less interesting.)
Remember, this rewrite requires a table that has a Hilbert computed column and is sorted on that column. The query that uses, say, ST_DWithin
, will be rewritten to use it, even if it does not mention the Hilbert
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake. I didn't realize this was checking in the constraint. My point was that eventually we would ideally want to read metadata from the database to populate those constraints automatically.
Add SQL function Hilbert. It uses Google's uzaygezen library, and our wraper code is copied from LocationTech's SFCurve project. Rewrite calls to ST_DWithin to use a range of on a Hilbert column, plus the call to ST_DWithin for safety. Implement function ST_MakeEnvelope. Use constant reduction to recognize constant geometry expressions before we apply SpatialRules. Move interface Geom, and other classes, from GeoFunctions into new utility class Geometries. Geometry literals (not in the SQL parser (yet), but in RexNode space). Make Geom implement Comparable, so that it can be a literal value. Move SqlStdOperatorTables to new package, and rename to SqlOperatorTables. Add RelOptTestBase.Sql.withCatalogReaderFactory and withConformance, to make it easier to run planner tests with a different schema or SQL dialect. Deprecate RelReferentialConstraint.getNumColumns(). In tests and examples, call ST_Point(longitude, latitude) because conventionally x is longitude and y is latitude. (Yes, there's a tension here. In map references and English sentences latitude comes before longitude, but in Cartesian geometry x comes before y.) Close apache#2111
52e193c
to
a955971
Compare
Rationale: A SqlOperator has a lifetime that spans several statements; but a type factory is only for one statement, and each type belongs to that factory. We want to share SqlOperator instances across statements and connections, therefore we need to create them before there is a type factory. In particular: * Remove (deprecate) method `List<RelDataType> SqlOperator.getParamTypes()`; * Remove `RelDataTypeFactory` argument from `SqlUserDefinedAggFunction` constructor, and remove its `typeFactory` field. * Add `interface SqlOperandMetadata extends SqlOperatorTypeChecker`, which has new methods `List<RelDataType>> paramTypes(RelDataTypeFactory)` and `List<String> paramNames()`. The `SqlOperandMetadata` interface will typically be implemented only for user-defined functions. Unlike SQL built-in functions, UDFs have a fixed set of parameters (although some of them may be optional), and the parameters have names. In `interface SqlOperandTypeChecker`, add method `boolean isFixedParameters()`. Will typically return true for UDFs, false for built-in functions. Returns false for table window functions (e.g. HOP), even though these have named parameters (which tends to make them look a bit like UDFs). Following [CALCITE-4171], change `instanceof SqlWindowTableFunction` to `checker.isFixedParameters()`, because it is less specific. Add `SqlKind` argument to UDF constructors; we will populate it using Java annotations in [CALCITE-1861].
Add SQL function Hilbert. It uses Google's uzaygezen library, and our wraper code is copied from LocationTech's SFCurve project. Rewrite calls to ST_DWithin to use a range of on a Hilbert column, plus the call to ST_DWithin for safety. Implement function ST_MakeEnvelope. Use constant reduction to recognize constant geometry expressions before we apply SpatialRules. Move interface Geom, and other classes, from GeoFunctions into new utility class Geometries. Geometry literals (not in the SQL parser (yet), but in RexNode space). Make Geom implement Comparable, so that it can be a literal value. Move SqlStdOperatorTables to new package, and rename to SqlOperatorTables. Add RelOptTestBase.Sql.withCatalogReaderFactory and withConformance, to make it easier to run planner tests with a different schema or SQL dialect. Deprecate RelReferentialConstraint.getNumColumns(). In tests and examples, call ST_Point(longitude, latitude) because conventionally x is longitude and y is latitude. (Yes, there's a tension here. In map references and English sentences latitude comes before longitude, but in Cartesian geometry x comes before y.) Close apache#2111
a955971
to
ca8d41f
Compare
Add SQL function Hilbert. It uses Google's uzaygezen library, and our wraper code is copied from LocationTech's SFCurve project. Rewrite calls to ST_DWithin to use a range of on a Hilbert column, plus the call to ST_DWithin for safety. Implement function ST_MakeEnvelope. Use constant reduction to recognize constant geometry expressions before we apply SpatialRules. Move interface Geom, and other classes, from GeoFunctions into new utility class Geometries. Geometry literals (not in the SQL parser (yet), but in RexNode space). Make Geom implement Comparable, so that it can be a literal value. Move SqlStdOperatorTables to new package, and rename to SqlOperatorTables. Add RelOptTestBase.Sql.withCatalogReaderFactory and withConformance, to make it easier to run planner tests with a different schema or SQL dialect. Deprecate RelReferentialConstraint.getNumColumns(). In tests and examples, call ST_Point(longitude, latitude) because conventionally x is longitude and y is latitude. (Yes, there's a tension here. In map references and English sentences latitude comes before longitude, but in Cartesian geometry x comes before y.) Close apache#2111
Add SQL function Hilbert. It uses Google's uzaygezen library,
and our wraper code is copied from LocationTech's SFCurve
project.
Rewrite calls to ST_DWithin to use a range of on a Hilbert
column, plus the call to ST_DWithin for safety.
Implement function ST_MakeEnvelope.
Use constant reduction to recognize constant geometry
expressions before we apply SpatialRules.
Move interface Geom, and other classes, from GeoFunctions into
new utility class Geometries.
Geometry literals (not in the SQL parser (yet), but in
RexNode space).
Make Geom implement Comparable, so that it can be a literal
value.
Move SqlStdOperatorTables to new package, and rename to
SqlOperatorTables.
Add RelOptTestBase.Sql.withCatalogReaderFactory and
withConformance, to make it easier to run planner tests with
a different schema or SQL dialect.
Deprecate RelReferentialConstraint.getNumColumns().