Description
Catalog.scala currently registers ~340 functions in a single flat Seq[FunctionDescription], with sparse comments delimiting groups (// Expression for vectors, // Expression for rasters, // geom <-> geog conversion functions). The flat structure has two practical drawbacks:
- Categorization drifts. Comment-based grouping is loose, so over time predicates, accessors, editors, and operations end up interleaved. A new contributor adding a function has no clear hint about where to place it.
- The taxonomy is duplicated, not shared. The SQL docs already organize the same functions into 17 well-defined categories (Geometry Constructors, Predicates, Measurement Functions, Overlay Functions, etc.), and the raster docs follow the same pattern. Catalog.scala doesn't reflect that taxonomy at all.
Proposal
Split the flat expressions list into named category sequences whose names match the existing docs categories, then concatenate them. Pure code organization — no behavior change, no new/removed functions, registration order preserved.
val geometryConstructorExprs: Seq[FunctionDescription] = Seq(...)
val predicateExprs: Seq[FunctionDescription] = Seq(...)
val measurementExprs: Seq[FunctionDescription] = Seq(...)
// ... etc

override val expressions: Seq[FunctionDescription] =
  geometryConstructorExprs ++
  geometryAccessorExprs ++
  geometryEditorExprs ++
  geometryOutputExprs ++
  geometryProcessingExprs ++
  geometryValidationExprs ++
  predicateExprs ++
  measurementExprs ++
  overlayExprs ++
  affineTransformationExprs ++
  linearReferencingExprs ++
  boundingBoxExprs ++
  spatialReferenceSystemExprs ++
  spatialIndexingExprs ++
  addressExprs ++
  otherExprs ++
  geographyExprs ++
  rasterConstructorExprs ++
  rasterAccessorExprs ++
  rasterBandAccessorExprs ++
  rasterOperatorExprs ++
  rasterOutputExprs ++
  rasterPredicateExprs ++
  rasterGeometryExprs ++
  pixelExprs ++
  mapAlgebraExprs ++
  rasterTileExprs ++
  geoStatsFunctions()
Categories (aligned with the docs taxonomy)
ST_ (vector) functions — 17 vals matching the Geometry Functions docs:
| val name | docs page |
| --- | --- |
| geometryConstructorExprs | Geometry-Constructors |
| geometryAccessorExprs | Geometry-Accessors |
| geometryEditorExprs | Geometry-Editors |
| geometryOutputExprs | Geometry-Output (also ST_XZ2) |
| geometryProcessingExprs | Geometry-Processing |
| geometryValidationExprs | Geometry-Validation |
| predicateExprs | Predicates |
| measurementExprs | Measurement-Functions |
| overlayExprs | Overlay-Functions |
| affineTransformationExprs | Affine-Transformations |
| linearReferencingExprs | Linear-Referencing |
| boundingBoxExprs | Bounding-Box-Functions |
| spatialReferenceSystemExprs | Spatial-Reference-System |
| spatialIndexingExprs | Spatial-Indexing (also receives ST_KNN, see notes below) |
| clusteringExprs | Clustering-Functions (registered via geoStatsFunctions()) |
| spatialStatisticsExprs | Spatial-Statistics (registered via geoStatsFunctions()) |
| addressExprs | Address-Functions |
ST_Geog (geography) functions — match the docs/api/sql/geography/ subfolder:
| val name | description |
| --- | --- |
| geographyExprs | ST_GeogFromText, ST_GeogToGeometry, ST_GeomToGeography, etc. |
RS_ (raster) functions — match the raster docs structure:
| val name | docs page |
| --- | --- |
| rasterConstructorExprs | Raster-Constructors |
| rasterAccessorExprs | Raster-Accessors |
| rasterBandAccessorExprs | Raster-Band-Accessors |
| rasterOperatorExprs | Raster-Operators |
| rasterOutputExprs | Raster-Output |
| rasterPredicateExprs | Raster-Predicates |
| rasterGeometryExprs | Raster-Geometry-Functions |
| pixelExprs | Pixel-Functions |
| mapAlgebraExprs | Map-Algebra-Operators |
| rasterTileExprs | Raster-Tiles |
Aggregate functions stay in aggregateExpressions (different registration path), matching the Aggregate-Functions / Raster-Aggregate-Functions docs pages.
Two functions don't fit any docs category — propose explicit handling
- Barrier: internal helper for join planning, not in any docs page. Add to a small otherExprs catch-all.
- ST_KNN: has its own docs page (NearestNeighbourSearching.md) but isn't listed in any of the 18 categories on /api/sql/Geometry-Functions/. Group it under spatialIndexingExprs since it's a nearest-neighbor lookup helper.
dbx-incompatible group splits naturally by docs category
The current geoStatsFunctions() block can be split into two categorized sub-sequences:
- ST_DBSCAN, ST_LocalOutlierFactor → Clustering-Functions
- ST_GLocal, ST_BinaryDistanceBandColumn, ST_WeightedDistanceBandColumn → Spatial-Statistics

Both stay wrapped in the same try/catch that returns empty sequences on unsupported DBR versions, so the registration semantics don't change.
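A minimal sketch of how that split might look with one guard around both groups. Everything outside the five function names is a stand-in: `FunctionDescription` is reduced to a name-only case class, the `describe` helper and the `supported` flag are hypothetical, and the exception type caught by the real geoStatsFunctions() is an assumption.

```scala
object GeoStatsSplitSketch {
  // Stand-in for Sedona's FunctionDescription; the real type carries more than a name.
  final case class FunctionDescription(name: String)

  // Hypothetical helper: simulates a description that cannot be built on an
  // unsupported runtime. The real failure mode is an assumption.
  private def describe(name: String, supported: Boolean): FunctionDescription =
    if (supported) FunctionDescription(name)
    else throw new NoClassDefFoundError(s"$name is unavailable on this runtime")

  // One guard covers both categories, so either both register or neither does.
  def geoStatsGroups(
      supported: Boolean): (Seq[FunctionDescription], Seq[FunctionDescription]) =
    try {
      val clusteringExprs =
        Seq("ST_DBSCAN", "ST_LocalOutlierFactor").map(n => describe(n, supported))
      val spatialStatisticsExprs =
        Seq("ST_GLocal", "ST_BinaryDistanceBandColumn", "ST_WeightedDistanceBandColumn")
          .map(n => describe(n, supported))
      (clusteringExprs, spatialStatisticsExprs)
    } catch {
      // Assumed exception type; fall back to empty sequences as described above.
      case _: NoClassDefFoundError => (Seq.empty, Seq.empty)
    }

  // Same shape as today: a single flat sequence appended to `expressions`.
  def geoStatsFunctions(supported: Boolean): Seq[FunctionDescription] = {
    val (clustering, stats) = geoStatsGroups(supported)
    clustering ++ stats
  }
}
```

Keeping `geoStatsFunctions` as the only thing the concatenation sees means the flat list is built exactly as before, while the two named groups remain available to anything that wants them.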
Recommended companion test
Add a small test asserting that every entry in Catalog.expressions lives in exactly one of the named sequences (for example, assert that no function name appears twice and that the union covers every registered function name). When I implemented this proposal in an internal fork, that test caught three functions that had been added on master and silently dropped during the move. Without it, regressions of this kind are easy to miss.
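The two checks could be sketched roughly as below. This is an illustration only: `FunctionDescription` is a name-only stand-in for the real type, and `duplicates` and `uncovered` are hypothetical helper names. A real test would pass in the actual category vals and a reference list of registered names.

```scala
object CatalogPartitionCheck {
  // Stand-in for Sedona's FunctionDescription; only the name matters here.
  final case class FunctionDescription(name: String)

  // Names that occur more than once across all category sequences.
  def duplicates(categories: Map[String, Seq[FunctionDescription]]): Set[String] =
    categories.values.flatten.toSeq
      .groupBy(_.name)
      .filter { case (_, hits) => hits.size > 1 }
      .keySet

  // Registered names that no category sequence contains.
  def uncovered(
      registered: Seq[FunctionDescription],
      categories: Map[String, Seq[FunctionDescription]]): Set[String] =
    registered.map(_.name).toSet -- categories.values.flatten.map(_.name)
}
```

A test would then assert that both results are empty. Reporting the offending names, rather than a bare boolean, makes a dropped or double-registered function easy to spot in the failure message.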
Benefits
- Single canonical taxonomy across docs and code. A contributor who looks up a function in the docs finds the same category name in Catalog.scala.
- Explicit categorization in the code. Adding a new function means picking a category sequence, which is much clearer than "add it somewhere in this 340-entry list".
- No new vocabulary. We borrow names the project already maintains; if a docs category gets added later, a corresponding val gets created in lockstep.
- Reusable for downstream needs. Any consumer that wants category-level information (telemetry buckets, docs generation, selective registration) can map over the named sequences directly without maintaining a parallel mapping.
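As an illustration of the last point, a consumer could derive a name-to-category lookup from the named sequences. This is a sketch under assumptions: the `categories` val and `categoryOf` map are hypothetical and not part of the proposal, `FunctionDescription` is a name-only stand-in, and only two categories are shown.

```scala
object CategoryLookupSketch {
  // Stand-in for Sedona's FunctionDescription.
  final case class FunctionDescription(name: String)

  val predicateExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Contains"), FunctionDescription("ST_Intersects"))
  val measurementExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Area"), FunctionDescription("ST_Length"))

  // Hypothetical: ordered (docs category -> sequence) pairs. A Seq of pairs,
  // unlike a Map, keeps the registration order.
  val categories: Seq[(String, Seq[FunctionDescription])] = Seq(
    "Predicates" -> predicateExprs,
    "Measurement-Functions" -> measurementExprs)

  // The flat list is still just the concatenation, in the same order.
  val expressions: Seq[FunctionDescription] = categories.flatMap(_._2)

  // Function name -> docs category, e.g. for telemetry buckets.
  val categoryOf: Map[String, String] =
    categories.flatMap { case (category, exprs) =>
      exprs.map(e => e.name -> category)
    }.toMap
}
```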
Non-goals
- No new functions, no removals, no signature changes. Pure code organization.
- registerAll semantics unchanged; expressions is still a Seq[FunctionDescription] of the same size and order.