geos C++ dependecy removed#4431
Closed
sp-202 wants to merge 68 commits into
Closed
Conversation
…T_Within, ST_Area, ST_Centroid) Implements six Sedona-compatible spatial functions as native DataFusion ScalarUDFs backed by the pure-Rust `geo` crate, replacing JVM round-trips for geometry predicates and measurements. - Add geo/wkt/geos crate dependencies to native/core/Cargo.toml - Implement StContains, StIntersects, StWithin, StDistance, StArea, StCentroid as ScalarUDFImpl in native/core/src/execution/expressions/geo/ - Register all six UDFs with the DataFusion SessionContext in jni_api.rs - Add Scala serde (geo.scala) that reflectively maps Sedona ST_ expression classes to the corresponding ScalarFunc proto at runtime; Comet compiles cleanly without Sedona on the classpath - Wire geoExpressions into exprSerdeMap in QueryPlanSerde.scala Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Split long import line in geo.scala (col 110 violation) - Wrap Class.forName with scalastyle:off/on classforname guard - Replace non-ASCII em-dashes with hyphens in comments Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Revert multi-line import to single line, rewrap doc comments to match the project's spotless/scalafmt rules, and fix scalastyle:on guard indentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spark's analyzer cannot resolve st_contains etc. unless they are registered in the Spark function catalog. Add CometGeoFunctions, an injectResolutionRule that registers all six geo UDFs as ScalaUDFs on the SparkSession at startup. When Comet native execution is active, CometScalaUDF serde intercepts the ScalaUDF expression and routes it to the native Rust path via DataFusion. The JVM lambda bodies (CometGeoFallback) are only reached when Comet is disabled, and throw UnsupportedOperationException with a clear message directing the user to enable Comet or add Sedona. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
injectResolutionRule fires per-query, after LookupFunctions already failed with UNRESOLVED_ROUTINE. Switch to calling registerAll() inside the injectColumnar lambda which Spark invokes once at session-build time, ensuring the UDFs are in the catalog before any SQL runs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… access injectColumnar and injectResolutionRule both fire after LookupFunctions has already failed. injectOptimizerRule receives the SparkSession at session-build time before any query analysis runs, guaranteeing the UDFs are in the catalog when SQL first executes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asses
ScalaUDF is intercepted by Comet's codegen path which rejected the
lambdas, so the native Rust UDFs were never reached. Replace with
proper Spark Expression subclasses (StContains, StIntersects, etc.)
registered via SparkSessionExtensions.injectFunction. The serde
maps these expression classes directly to ScalarFunc { func = name }
so the DataFusion planner resolves them to the Rust geo UDFs.
- GeoExpressions.scala: BinaryExpression/UnaryExpression subclasses
for each ST_ function with JVM fallback via CometGeoFallback
- CometGeoFallback moved to expressions package (referenced from codegen)
- geo.scala serde: maps native expression classes + optional Sedona classes
- CometSparkSessionExtensions: registers via injectFunction (compile-time,
no session required, guaranteed to fire before any query analysis)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…for injectFunction extensions.FunctionDescription is a 3-tuple (FunctionIdentifier, ExpressionInfo, builder); our type alias was a 2-tuple missing FunctionIdentifier. ExpressionInfo 10-arg constructor does not exist in Spark 3.5 -- use the 2-arg (className, name) form instead. Also drop unused InternalRow and TypeCheckResult imports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Merge GeoExpressions import into comet group (no blank line, alphabetical order) - Break doGenCode string literals at 100-char boundary for contains/intersects/within/distance Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nt geometries - st_contains/st_within/st_intersects: replace geo::Contains/Within/Intersects traits with geo::relate::Relate DE-9IM matrix to match OGC/Sedona/JTS boundary semantics. The trait impls used closed-set (covers) semantics; DE-9IM requires interior intersection. - st_distance: geo::EuclideanDistance on Geometry enum returns 0.0 for disjoint Polygon/Point pairs. Fix by checking intersection first, then falling through to euclidean_distance which correctly handles disjoint geometries via nearest-point. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ype, st_numpoints, st_x, st_y, st_envelope, st_convexhull, st_simplify, st_buffer, st_union, st_intersection) Pure-geo tier: st_length (EuclideanLength), st_isempty, st_geometrytype, st_numpoints (CoordsIter), st_x/st_y (Point coord extraction), st_envelope (BoundingRect), st_convexhull (ConvexHull), st_simplify (Ramer-Douglas-Peucker). GEOS tier: st_buffer, st_union, st_intersection via static geos crate. All registered in DataFusion SessionContext, mapped in exprSerdeMap, and exposed via SparkSessionExtensions.injectFunction with JVM fallback stubs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isempty/st_length/st_simplify - st_isempty: import HasDimensions and CoordsIter traits (required for is_empty/coords_count) - st_length: EuclideanLength not impl for Geometry enum; match on LineString/MultiLineString/ Polygon/MultiPolygon variants explicitly - st_simplify: Simplify not impl for Geometry enum; match on concrete variants, pass others through Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moved to org.apache.comet.expressions in a previous commit; the old file was left behind causing a duplicate object definition compile error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
values_to_arrays does not broadcast scalar literals to batch length. Extract the numeric arg directly from ColumnarValue::Scalar before falling back to array extraction, preventing unwrap() panic on literals. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d POLYGON EMPTY bug
…nts, transformations, set ops)
…Block type Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove code\"\"\"...\"\"\" interpolation from StMakeEnvelope (confuses scalariform parser); use supportCodegen=false instead - Reorder geo.scala import: StGeometryType before StGeomFromGeoJson/Wkt (case-insensitive alphabetical order) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scalariform parser rejects (x, y) => when inline with defineCodeGen args; must place lambda on its own indented line. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e error Scalariform fails to parse (x, y) => in defineCodeGen context; rename params to g1/g2 which match the established pattern in the codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Semicolons between val declarations confuse scalariform's parser, causing it to misidentify the next => in the file as a parse error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
StMakeEnvelope extends Expression directly (not BinaryExpression) and uses IndexedSeq[Expression] in withNewChildrenInternal. This unique structure causes scalariform 0.2.6 to corrupt its AST state, causing subsequent (g1, g2) => lambdas in the same file to fail to parse. Moving it to a separate file isolates the issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The 4-line split format was not in the original file. Only intended
change is removing s"POINT(${...})" interpolation from nullSafeEval.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ode for scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ard style for scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rd 2-line form for scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…deGen format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rse error Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… for scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ns for scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s breaking scalariform Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xpression has neither) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… class) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tting Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…) tuple style Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents all 40 geospatial SQL functions registered by Comet as Spark SQL extensions. Each entry covers the SQL signature, parameter types, return type, a description of the semantics, and a usage example. Functions are grouped by category (constructors, serializers, predicates, measurements, accessors, transformations, set operations) with a summary table at the end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…functions The geos Rust crate (features = ["static"]) compiles GEOS from bundled C++ source and requires cmake. libgeos-dev provides headers used by pkg-config. On macOS, brew install geos adds the native library needed by the geos crate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes rustfmt --check failures in CI: splits long import lines, expands single-line trait methods, and wraps long register_udf calls in mod.rs to satisfy max_width = 100. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…y clippy Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…:BooleanOps GEOS is LGPL-licensed and a C++ dependency, both of which are incompatible with Apache project policy. Replace ST_Union, ST_Intersection, ST_Difference, and ST_SymDifference with geo::BooleanOps (MIT/Apache-2.0). Also revert the CI build action changes that added cmake and libgeos-dev. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.