[REFACTOR] Group Catalog.scala expressions by category for readability and maintainability #2861

@jiayuasu

Description

Catalog.scala currently registers ~340 functions in a single flat Seq[FunctionDescription], with sparse comments delimiting groups (// Expression for vectors, // Expression for rasters, // geom <-> geog conversion functions). The flat structure has two practical drawbacks:

  1. Categorization drifts. Comment-based grouping is loose, so over time predicates, accessors, editors, and operations end up interleaved. A new contributor adding a function has no clear hint about where to place it.
  2. The taxonomy is duplicated, not shared. The SQL docs already organize the same functions into 17 well-defined categories (Geometry Constructors, Predicates, Measurement Functions, Overlay Functions, etc.), and the raster docs follow the same pattern. Catalog.scala doesn't reflect that taxonomy at all.

Proposal

Split the flat expressions list into named category sequences whose names match the existing docs categories, then concatenate them. Pure code organization — no behavior change, no new/removed functions, registration order preserved.

val geometryConstructorExprs: Seq[FunctionDescription] = Seq(...)
val predicateExprs: Seq[FunctionDescription] = Seq(...)
val measurementExprs: Seq[FunctionDescription] = Seq(...)
// ... etc

override val expressions: Seq[FunctionDescription] =
  geometryConstructorExprs ++
    geometryAccessorExprs ++
    geometryEditorExprs ++
    geometryOutputExprs ++
    geometryProcessingExprs ++
    geometryValidationExprs ++
    predicateExprs ++
    measurementExprs ++
    overlayExprs ++
    affineTransformationExprs ++
    linearReferencingExprs ++
    boundingBoxExprs ++
    spatialReferenceSystemExprs ++
    spatialIndexingExprs ++
    addressExprs ++
    otherExprs ++
    geographyExprs ++
    rasterConstructorExprs ++
    rasterAccessorExprs ++
    rasterBandAccessorExprs ++
    rasterOperatorExprs ++
    rasterOutputExprs ++
    rasterPredicateExprs ++
    rasterGeometryExprs ++
    pixelExprs ++
    mapAlgebraExprs ++
    rasterTileExprs ++
    geoStatsFunctions()

Categories (aligned with the docs taxonomy)

ST_ (vector) functions — 17 vals matching the Geometry Functions docs:

val name                     docs page
geometryConstructorExprs     Geometry-Constructors
geometryAccessorExprs        Geometry-Accessors
geometryEditorExprs          Geometry-Editors
geometryOutputExprs          Geometry-Output (also ST_XZ2)
geometryProcessingExprs      Geometry-Processing
geometryValidationExprs      Geometry-Validation
predicateExprs               Predicates
measurementExprs             Measurement-Functions
overlayExprs                 Overlay-Functions
affineTransformationExprs    Affine-Transformations
linearReferencingExprs       Linear-Referencing
boundingBoxExprs             Bounding-Box-Functions
spatialReferenceSystemExprs  Spatial-Reference-System
spatialIndexingExprs         Spatial-Indexing (also receives ST_KNN, see notes below)
clusteringExprs              Clustering-Functions (registered via geoStatsFunctions())
spatialStatisticsExprs       Spatial-Statistics (registered via geoStatsFunctions())
addressExprs                 Address-Functions

ST_Geog (geography) functions — match the docs/api/sql/geography/ subfolder:

val name        description
geographyExprs  ST_GeogFromText, ST_GeogToGeometry, ST_GeomToGeography, etc.

RS_ (raster) functions — match the raster docs structure:

val name                 docs page
rasterConstructorExprs   Raster-Constructors
rasterAccessorExprs      Raster-Accessors
rasterBandAccessorExprs  Raster-Band-Accessors
rasterOperatorExprs      Raster-Operators
rasterOutputExprs        Raster-Output
rasterPredicateExprs     Raster-Predicates
rasterGeometryExprs      Raster-Geometry-Functions
pixelExprs               Pixel-Functions
mapAlgebraExprs          Map-Algebra-Operators
rasterTileExprs          Raster-Tiles

Aggregate functions stay in aggregateExpressions (different registration path), matching the Aggregate-Functions / Raster-Aggregate-Functions docs pages.

Two functions don't fit any docs category — propose explicit handling

  • Barrier — internal helper for join planning, not in any docs page. Add to a small otherExprs catch-all.
  • ST_KNN — has its own docs page (NearestNeighbourSearching.md) but isn't listed in any of the 18 categories on /api/sql/Geometry-Functions/. Group it under spatialIndexingExprs since it's a nearest-neighbor lookup helper.

dbx-incompatible group splits naturally by docs category

The current geoStatsFunctions() block can be split into two categorized sub-sequences:

  • ST_DBSCAN, ST_LocalOutlierFactor → Clustering-Functions
  • ST_GLocal, ST_BinaryDistanceBandColumn, ST_WeightedDistanceBandColumn → Spatial-Statistics

Both sub-sequences stay wrapped in the same try/catch, which returns empty sequences on unsupported DBR versions, so the registration semantics don't change.
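
A rough sketch of how that split could look, assuming a simplified stand-in for FunctionDescription that carries only a name. The object name, the geoStatsByCategory helper, the build parameter, and the choice to catch NonFatal are all illustrative assumptions; this issue does not state which exceptions the real try/catch handles.

```scala
import scala.util.control.NonFatal

object GeoStatsSketch {
  // Hypothetical stand-in: the real FunctionDescription carries more than a name.
  final case class FunctionDescription(name: String)

  // `build` stands in for whatever constructs each expression and may throw
  // on runtimes where the underlying classes are unavailable (assumption).
  // A single try covers both groups, so they succeed or fall back together.
  def geoStatsByCategory(
      build: String => FunctionDescription)
      : (Seq[FunctionDescription], Seq[FunctionDescription]) =
    try {
      val clusteringExprs =
        Seq("ST_DBSCAN", "ST_LocalOutlierFactor").map(build)
      val spatialStatisticsExprs =
        Seq("ST_GLocal", "ST_BinaryDistanceBandColumn", "ST_WeightedDistanceBandColumn")
          .map(build)
      (clusteringExprs, spatialStatisticsExprs)
    } catch {
      // Assumption: non-fatal failures mean "unsupported runtime".
      case NonFatal(_) => (Seq.empty, Seq.empty)
    }

  // Same shape as before: one flat sequence, clustering first.
  def geoStatsFunctions(build: String => FunctionDescription): Seq[FunctionDescription] = {
    val (clusteringExprs, spatialStatisticsExprs) = geoStatsByCategory(build)
    clusteringExprs ++ spatialStatisticsExprs
  }
}
```

Keeping one try around both groups means a failure in either leaves both empty, which mirrors the all-or-nothing behavior described above.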

Recommended companion test

A tiny test asserting every entry in Catalog.expressions lives in exactly one of the named sequences (e.g., assert no function name appears twice; assert the union covers every registered function name). When I implemented this proposal in an internal fork, that test caught three functions that had been added on master and silently dropped during the move. Without it, regressions of this kind are easy to miss.
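
A minimal sketch of such a check, assuming a simplified stand-in for FunctionDescription that carries only a name (the real type holds more). The helper names duplicates and missing are hypothetical, as is the idea of comparing against a snapshot of expected names taken before the move.

```scala
object CatalogPartitionCheck {
  // Hypothetical stand-in: only the function name matters for this check.
  final case class FunctionDescription(name: String)

  // Names that occur more than once across all category sequences.
  def duplicates(categories: Seq[Seq[FunctionDescription]]): Set[String] =
    categories.flatten
      .groupBy(_.name)
      .collect { case (name, hits) if hits.size > 1 => name }
      .toSet

  // Expected names that no category sequence contains, i.e. functions
  // that were dropped while moving entries around.
  def missing(
      expectedNames: Set[String],
      categories: Seq[Seq[FunctionDescription]]): Set[String] =
    expectedNames -- categories.flatten.map(_.name)
}
```

In an actual test, categories would be the named sequences from Catalog.scala and expectedNames a list captured from master before the refactor; both results should be empty.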

Benefits

  • Single canonical taxonomy across docs and code. A contributor who looks up a function in the docs finds the same category name in Catalog.scala.
  • Explicit categorization in the code structure. Adding a new function means picking a category sequence, which is much clearer than "add it somewhere in this ~340-entry list".
  • No new vocabulary. We borrow names the project already maintains; if a docs category gets added later, a corresponding val gets created in lockstep.
  • Reusable for downstream needs. Any consumer that wants category-level information (telemetry buckets, docs generation, selective registration) can map over the named sequences directly without maintaining a parallel mapping.
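
As an illustration of the last point, a downstream consumer could derive category lookups directly from the named sequences. This is only a sketch: FunctionDescription is a name-only stand-in, the two example sequences are abbreviated, and byCategory and categoryOf are hypothetical names that do not exist in Catalog.scala.

```scala
object CategorySketch {
  // Hypothetical stand-in: the real FunctionDescription carries more than a name.
  final case class FunctionDescription(name: String)

  // Abbreviated example sequences; the real ones would list every function.
  val predicateExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Contains"), FunctionDescription("ST_Intersects"))
  val measurementExprs: Seq[FunctionDescription] =
    Seq(FunctionDescription("ST_Area"), FunctionDescription("ST_Distance"))

  // Docs category name paired with its sequence, in registration order.
  val byCategory: Seq[(String, Seq[FunctionDescription])] = Seq(
    "Predicates" -> predicateExprs,
    "Measurement-Functions" -> measurementExprs)

  // The flat list is still just the concatenation of the categories.
  val expressions: Seq[FunctionDescription] = byCategory.flatMap(_._2)

  // Reverse lookup from function name to docs category,
  // e.g. for telemetry buckets or docs generation.
  val categoryOf: Map[String, String] =
    byCategory.flatMap { case (category, exprs) =>
      exprs.map(e => e.name -> category)
    }.toMap
}
```

Using an ordered Seq of pairs rather than a Map keeps the concatenation order explicit, so the flat expressions list stays deterministic.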

Non-goals

  • No new functions, no removals, no signature changes. Pure code organization.
  • registerAll semantics unchanged; expressions is still a Seq[FunctionDescription] of the same size and order.
