Tracking issue for adding a native Box2D bounding-box type to Sedona. Each child issue corresponds to one PR.
Background
Sedona has no first-class bounding-box value type. ST_Envelope returns a polygon Geometry, and users reconstruct bboxes via ST_MinX / ST_MaxX / ST_MinY / ST_MaxY. This is awkward for common operations — bbox-from-geometry, dataset extent, GeoParquet covering columns, partition pruning.
Sister project apache/sedona-db has an internal BoundingBox (rust/sedona-geometry/src/bounding_box.rs) but doesn't expose it as a SQL type. PostGIS has box2d / box3d. GeoParquet 1.1 standardizes a struct<xmin, ymin, xmax, ymax> bbox covering column, which Sedona already reads/writes as a raw struct (GeoParquetMetaData.scala, GeoParquetSpatialFilter.scala).
Plan
Add Box2D as a native value type. Phase 1 covers the Spark/JVM side, Python and Flink mirrors, and GeoParquet writer integration. Box3D and geography bboxes are out of scope and tracked as follow-ups.
Type
Box2DUDT is a struct-backed UDT with sqlType = struct<xmin: double, ymin: double, xmax: double, ymax: double> (all non-nullable). Struct-backed (not binary-backed) so values round-trip natively to Parquet and align zero-copy with GeoParquet 1.1 bbox covering columns.
Field names match the GeoParquet 1.1 spec and sedona-db's GeoParquet writer.
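As a rough illustration of that alignment, the sketch below builds the GeoParquet 1.1 `covering` metadata entry that points at a struct column with these four fields. The column name `bbox` and the helper name `bbox_covering` are assumptions made for the example; only the field names come from the text above.

```python
import json

# Field order of the proposed Box2D struct (taken from the text above).
BOX2D_FIELDS = ["xmin", "ymin", "xmax", "ymax"]


def bbox_covering(bbox_column: str) -> dict:
    """Build a GeoParquet 1.1 'covering' entry for a bbox struct column.

    Each bound maps to a [column, field] path inside the Parquet schema.
    """
    return {"bbox": {name: [bbox_column, name] for name in BOX2D_FIELDS}}


# "bbox" is an assumed column name, used here for illustration only.
covering = bbox_covering("bbox")
print(json.dumps(covering, indent=2))
```

Because the struct fields and the covering paths share the same names, a writer could emit this metadata without renaming or repacking the column.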
A Box2D value is always a valid finite bbox. Absence of a bbox (e.g. ST_Box2D of an empty geometry, ST_Extent over zero rows) is represented by SQL NULL at the column level, not by an in-band sentinel. This matches PostGIS behavior (where Box2D(POINT EMPTY) returns NULL) and leaves xmin > xmax reserved for future antimeridian-wraparound semantics on geography bboxes (cf. sedona-db's WraparoundInterval, S2's S2LatLngRect).
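A minimal Python model of this invariant might look like the following. It is a sketch of the semantics, not the planned Spark implementation: the class, the `box2d_of` helper, and the choice to enforce the invariant at construction time are all assumptions for illustration, and Python's `None` stands in for SQL NULL.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Box2D:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def __post_init__(self) -> None:
        coords = (self.xmin, self.ymin, self.xmax, self.ymax)
        # No NaN or infinity: there is no in-band "empty" sentinel.
        if not all(math.isfinite(c) for c in coords):
            raise ValueError("Box2D coordinates must be finite")
        # xmin > xmax stays reserved, so it is rejected here.
        if self.xmin > self.xmax or self.ymin > self.ymax:
            raise ValueError("Box2D requires xmin <= xmax and ymin <= ymax")


def box2d_of(points: Iterable[Tuple[float, float]]) -> Optional[Box2D]:
    """Bounding box of a point set; None (SQL NULL) when there are no points."""
    pts = list(points)
    if not pts:
        return None
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return Box2D(min(xs), min(ys), max(xs), max(ys))
```

With absence pushed to the column level, every non-NULL value can be consumed without a validity check.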
Split Box2D / Box3D rather than a unified type with optional Z. Reasons:
- GeoParquet 1.1 covering columns are 2D-only. A dedicated Box2D matches the spec bit-for-bit.
- Storage: 32 bytes/row vs. ~56 bytes for a unified type with nullable Z. Material cost on ST_Extent shuffles.
- Static dispatch for dimension-specific functions (ST_Area(box2d) vs ST_Volume(box3d)).
- PostGIS familiarity.
Box3D is deferred until a concrete need (point clouds, BIM, voxel data) lands.
SQL surface (Phase 1)
| Function | Signature |
| --- | --- |
| ST_Box2D(geom) | Geometry → Box2D (NULL for empty geom) |
| ST_MakeBox2D(point, point) | (Point, Point) → Box2D |
| ST_Extent(geom) | aggregate Geometry → Box2D (NULL over zero rows) |
| ST_XMin / ST_XMax / ST_YMin / ST_YMax(box2d) | Box2D → Double (overload existing accessors) |
| CAST(box2d AS geometry) | Box2D → Polygon |
| ST_AsText(box2d) | Box2D → 'BOX(x1 y1, x2 y2)' |
ST_Envelope keeps returning a polygon Geometry (no break). ST_Envelope_Aggr is left untouched.
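The intended behavior of these functions can be modeled in plain Python as below. This is a sketch under stated assumptions, not Sedona code: all names are hypothetical, `make_box2d` treats its arguments as lower-left and upper-right corners (as PostGIS's ST_MakeBox2D does), `extent` folds over per-row boxes and skips NULLs, and the number formatting and polygon ring order are illustrative choices the source does not specify.

```python
from typing import Iterable, NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class Box2D(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def make_box2d(lower_left: Point, upper_right: Point) -> Box2D:
    """Model of ST_MakeBox2D: corners are taken as given, not reordered."""
    return Box2D(lower_left[0], lower_left[1], upper_right[0], upper_right[1])


def extent(boxes: Iterable[Optional[Box2D]]) -> Optional[Box2D]:
    """Model of ST_Extent: union of per-row boxes, None over zero rows."""
    result: Optional[Box2D] = None
    for b in boxes:
        if b is None:  # row with no bbox (e.g. empty geometry); assumed skipped
            continue
        if result is None:
            result = b
        else:
            result = Box2D(min(result.xmin, b.xmin), min(result.ymin, b.ymin),
                           max(result.xmax, b.xmax), max(result.ymax, b.ymax))
    return result


def as_text(b: Box2D) -> str:
    """Model of ST_AsText(box2d); %g-style formatting is a simplification."""
    return f"BOX({b.xmin:g} {b.ymin:g}, {b.xmax:g} {b.ymax:g})"


def to_polygon_wkt(b: Box2D) -> str:
    """Model of CAST(box2d AS geometry): a closed five-vertex ring."""
    ring = [(b.xmin, b.ymin), (b.xmin, b.ymax), (b.xmax, b.ymax),
            (b.xmax, b.ymin), (b.xmin, b.ymin)]
    return "POLYGON((" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + "))"


merged = extent([make_box2d((0.0, 0.0), (1.0, 1.0)), None,
                 make_box2d((0.5, -1.0), (2.0, 0.5))])
print(as_text(merged))  # BOX(0 -1, 2 1)
```

Note that `extent` returns `None` both for zero rows and for input consisting only of NULLs, which mirrors the NULL-for-absence rule in the Type section.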
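Sub-issues
Foundation
- Box2DType
SQL surface
- ST_Box2D(geom) scalar
- ST_XMin / XMax / YMin / YMax(box2d) accessor overloads
- ST_MakeBox2D(p1, p2) scalar constructor
- ST_Extent(geom) aggregate
- CAST(box2d AS geometry) and ST_AsText(box2d)
Storage
Cross-language bindings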
Out of scope (future phases)
- ST_Expand(box, dx, dy)
- Box predicates (ST_BoxIntersects, ST_BoxContains)
- Implicit geometry → box2d cast
- Box3D, ST_3DExtent, ST_3DMakeBox, ST_ZMin/ZMax
- ST_Box2dFromGeoHash, ST_EstimatedExtent
- Reader-side auto-materialization of GeoParquet bbox covering columns as Box2D
- Geography bboxes (likely path: reuse Box2D with antimeridian-wraparound semantics on the X axis, encoded via xmin > xmax)
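To make the last item concrete, here is one way the xmin > xmax encoding could be interpreted for the X axis. This is only a sketch of the general wraparound-interval idea; the function name is hypothetical and nothing here is a committed design.

```python
def x_interval_contains(xmin: float, xmax: float, x: float) -> bool:
    """Longitude containment where xmin > xmax means 'crosses the antimeridian'."""
    if xmin <= xmax:
        # Ordinary interval, e.g. [-10, 10].
        return xmin <= x <= xmax
    # Wrapped interval, e.g. xmin=170, xmax=-170 covers [170, 180] and [-180, -170].
    return x >= xmin or x <= xmax
```

For example, with xmin = 170 and xmax = -170, longitudes 175 and -175 are inside the box while 0 is not. Keeping xmin > xmax invalid in Phase 1 is what leaves room for this reading later.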
Coordination with sedona-db
sedona-db's GeoParquet writer uses xmin/ymin/xmax/ymax (Float32), but its st_analyze_agg returns minx/miny/maxx/maxy (Float64). It is worth aligning both on the Parquet-spec naming (xmin/ymin/xmax/ymax) as part of this work.