[GH-2674] Add RS_SetCRS and RS_CRS for custom CRS string support #2677

Merged
jiayuasu merged 12 commits into master from rs-set-crs-2674 on Mar 2, 2026
@jiayuasu jiayuasu commented Mar 1, 2026

Did you read the Contributor Guide?

Is this PR related to a ticket?

What changes were proposed in this PR?

Add two new raster functions: RS_SetCRS and RS_CRS, to support custom CRS string definitions beyond simple integer SRID codes.

RS_SetCRS(raster, crsString)

Sets the CRS of a raster using a CRS definition string. Unlike RS_SetSRID which only accepts integer EPSG codes, RS_SetCRS accepts CRS definitions in multiple formats:

  • EPSG codes: EPSG:4326
  • WKT1: GEOGCS["WGS 84", ...]
  • WKT2: GEOGCRS["WGS 84", ...]
  • PROJ strings: +proj=longlat +datum=WGS84 +no_defs
  • PROJJSON: {"type": "GeographicCRS", ...}
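
To make the accepted input formats concrete, here is a minimal sketch of how a CRS string could be classified by its leading syntax. `CrsFormatSketch` and `detect` are hypothetical names for illustration only; they are not part of this PR, and the actual `parseCrsString` pipeline may work differently (the commit notes suggest it tries parsers in steps rather than sniffing the prefix).

```java
import java.util.Locale;

/** Hypothetical helper, not part of Sedona: guesses the syntax of a CRS string. */
final class CrsFormatSketch {
    enum Format { EPSG, WKT1, WKT2, PROJ, PROJJSON, UNKNOWN }

    static Format detect(String crs) {
        if (crs == null || crs.trim().isEmpty()) {
            throw new IllegalArgumentException("CRS string must not be null or blank");
        }
        String s = crs.trim();
        String upper = s.toUpperCase(Locale.ROOT);
        if (upper.matches("EPSG:\\d+")) return Format.EPSG;
        if (s.startsWith("{")) return Format.PROJJSON; // JSON object
        if (s.startsWith("+")) return Format.PROJ;     // +proj=... parameters
        // Only a few common root keywords are checked here; real WKT has more.
        if (upper.startsWith("GEOGCRS[") || upper.startsWith("PROJCRS[")) return Format.WKT2;
        if (upper.startsWith("GEOGCS[") || upper.startsWith("PROJCS[")) return Format.WKT1;
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] samples = {
            "EPSG:4326",
            "GEOGCS[\"WGS 84\", ...]",
            "GEOGCRS[\"WGS 84\", ...]",
            "+proj=longlat +datum=WGS84 +no_defs",
            "{\"type\": \"GeographicCRS\", ...}"
        };
        for (String s : samples) {
            System.out.println(detect(s) + " <- " + s);
        }
    }
}
```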

Internally, non-WKT1 formats (WKT2, PROJ, PROJJSON) are parsed using proj4sedona 0.0.8 and converted to WKT1 for GeoTools compatibility. The function includes a 3-tier projection name resolution strategy (exact alias match, normalized matching, hardcoded fallback) to handle naming differences between proj4sedona and GeoTools.
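
The 3-tier resolution strategy can be sketched roughly as follows. This is an illustration under assumptions, not the code in `CrsNormalization.java`: the class name, the constructor, the normalization rule (lower-case, strip non-alphanumerics) and the sample entries are all hypothetical. Only the tier order (exact match, normalized match, hardcoded fallback) comes from the description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a three-tier projection name lookup. */
final class ProjectionNameResolverSketch {
    private final Set<String> exact = new HashSet<>();
    private final Map<String, String> byNormalized = new HashMap<>();
    private final Map<String, String> fallback = new HashMap<>();

    ProjectionNameResolverSketch(List<String> targetNames, Map<String, String> fallbackEntries) {
        for (String name : targetNames) {
            exact.add(name);
            byNormalized.put(normalizeForMatch(name), name);
        }
        // Fallback keys are normalized once up front so each lookup is a single map get.
        fallbackEntries.forEach((k, v) -> fallback.put(normalizeForMatch(k), v));
    }

    /** Assumed normalization: lower-case and drop everything except letters and digits. */
    static String normalizeForMatch(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    Optional<String> resolve(String name) {
        if (exact.contains(name)) return Optional.of(name);   // tier 1: exact match
        String key = normalizeForMatch(name);
        String match = byNormalized.get(key);
        if (match != null) return Optional.of(match);          // tier 2: normalized match
        return Optional.ofNullable(fallback.get(key));         // tier 3: hardcoded fallback
    }

    public static void main(String[] args) {
        ProjectionNameResolverSketch resolver = new ProjectionNameResolverSketch(
            List.of("Transverse_Mercator", "Lambert_Conformal_Conic_2SP"),
            Map.of("Lambert Conformal Conic", "Lambert_Conformal_Conic_2SP")); // illustrative entry
        System.out.println(resolver.resolve("Transverse_Mercator"));
        System.out.println(resolver.resolve("transverse mercator"));
        System.out.println(resolver.resolve("Lambert_Conformal_Conic"));
        System.out.println(resolver.resolve("No_Such_Projection"));
    }
}
```

Keeping the tiers in this order means the cheap exact lookup handles the common case, and the hand-maintained fallback is only consulted when the names genuinely differ.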

RS_CRS(raster[, format])

Returns the CRS of a raster as a string in the specified format:

  • projjson (default) - Modern JSON representation
  • wkt2 - ISO 19162 Well-Known Text 2
  • wkt1 - OGC Well-Known Text 1
  • proj - PROJ string format

Returns null if the raster has no CRS defined.
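
For illustration, the format argument handling could look like the sketch below. It assumes that a null or blank format falls back to the default and that matching is case-insensitive; neither is confirmed by this description (the real guard might throw instead). `CrsOutputFormatSketch` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical sketch of validating the RS_CRS format argument. */
final class CrsOutputFormatSketch {
    enum OutputFormat { PROJJSON, WKT2, WKT1, PROJ }

    static OutputFormat parse(String format) {
        // Trim once; treat null and blank the same way (assumed behaviour).
        String key = format == null ? "" : format.trim().toLowerCase(Locale.ROOT);
        if (key.isEmpty()) {
            return OutputFormat.PROJJSON; // documented default
        }
        switch (key) {
            case "projjson": return OutputFormat.PROJJSON;
            case "wkt2":     return OutputFormat.WKT2;
            case "wkt1":     return OutputFormat.WKT1;
            case "proj":     return OutputFormat.PROJ;
            default:
                throw new IllegalArgumentException("Unsupported CRS format: " + format);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(null));
        System.out.println(parse(" WKT2 "));
    }
}
```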

Design decisions

  • RS_SetCRS does not resolve EPSG codes from WKT/PROJ input. If the input CRS string doesn't contain an explicit AUTHORITY clause, RS_SRID will return 0. Users should use RS_CRS to retrieve the full CRS definition. This avoids expensive EPSG database scans for every RS_SetCRS call.
  • WKT parsing uses longitude-first axis order (FORCE_LONGITUDE_FIRST_AXIS_ORDER), consistent with Sedona's existing CRS handling in FunctionsGeoTools.
  • Thread-safe projection name caches using ConcurrentHashMap for safe concurrent Spark execution.
  • Export path prefers EPSG SRID when available, bypassing WKT1 projection name compatibility issues between GeoTools and proj4sedona.
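
The first design decision (SRID is 0 unless the input carries an explicit AUTHORITY clause) can be pictured with the toy example below. It is a regex-based sketch under assumptions, not how Sedona or GeoTools derive the SRID: it only looks for an `AUTHORITY["EPSG","<code>"]` element that closes the root WKT1 object, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: read the EPSG code from the root AUTHORITY clause of a WKT1 string. */
final class AuthoritySridSketch {
    // Matches an EPSG AUTHORITY element that is immediately followed by the final closing bracket.
    private static final Pattern ROOT_AUTHORITY = Pattern.compile(
        "AUTHORITY\\[\\s*\"EPSG\"\\s*,\\s*\"(\\d+)\"\\s*\\]\\s*\\]\\s*$");

    /** Returns the EPSG code, or 0 when the root object has no AUTHORITY clause. */
    static int sridOf(String wkt1) {
        Matcher m = ROOT_AUTHORITY.matcher(wkt1);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        String withAuthority = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\","
            + "SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],"
            + "UNIT[\"degree\",0.0174532925199433],AUTHORITY[\"EPSG\",\"4326\"]]";
        String withoutAuthority = "GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\","
            + "SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],"
            + "UNIT[\"degree\",0.0174532925199433]]";
        System.out.println(sridOf(withAuthority));
        System.out.println(sridOf(withoutAuthority));
    }
}
```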

Files changed

Java common layer:

  • CrsNormalization.java - Centralized CRS name normalization utility bridging GeoTools ↔ proj4sedona
  • RasterEditors.java - setCrs() implementation with CRS parsing pipeline
  • RasterAccessors.java - crs() implementation with multi-format export

Spark SQL:

  • RasterEditors.scala, RasterAccessors.scala - Spark SQL expression wrappers
  • Catalog.scala - Function registration

Tests:

  • RasterEditorsTest.java - 7 unit tests (EPSG, WKT1, WKT2, PROJ, PROJJSON, all proj4sedona projections)
  • RasterAccessorsTest.java - 7 unit tests (all output formats, null handling, invalid format)
  • CrsRoundTripComplianceTest.java - 81 round-trip compliance tests across 22+ representative EPSG codes × 4 formats (PROJ, PROJJSON, WKT1, WKT2), verifying idempotency of export-import-re-export cycles
  • rasteralgebraTest.scala - 9 Spark integration tests
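
The idempotency property checked by the round-trip tests can be stated in a few lines. The sketch below is not taken from `CrsRoundTripComplianceTest.java`; it uses a toy stand-in for "import, then export" (sorting PROJ parameters) purely to show the shape of the assertion: a first export, once re-imported and re-exported, should reproduce itself. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an export-import-re-export stability check. */
final class RoundTripSketch {
    /** An export is stable if re-importing it and exporting again yields the same text. */
    static boolean isStable(String exported, UnaryOperator<String> reimportAndExport) {
        return exported.equals(reimportAndExport.apply(exported));
    }

    /** Toy stand-in for import + export: puts PROJ parameters into a canonical order. */
    static String canonicalizeProj(String proj) {
        String[] tokens = proj.trim().split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String firstExport = canonicalizeProj("+proj=longlat +datum=WGS84 +no_defs");
        // A well-behaved converter reproduces its own output.
        System.out.println(isStable(firstExport, RoundTripSketch::canonicalizeProj));
        // A lossy converter (here: one that drops +no_defs) does not.
        System.out.println(isStable(firstExport, s -> s.replace(" +no_defs", "")));
    }
}
```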

Documentation:

  • RS_SetCRS.md, RS_CRS.md - Individual function docs with limitations sections
  • RS_SRID.md - Updated to document that 0 can mean custom (non-EPSG) CRS
  • Raster-Functions.md - Index page entries

Dependency:

  • Bumped proj4sedona from 0.0.6 to 0.0.8 (fixes datum name loss, lat_ts drift, ellipsoid expansion, WKT2 float drift, WKT2/PROJJSON import for Polar Stereographic & LAEA)

How was this patch tested?

  • 7 new RasterEditorsTest tests (EPSG, WKT1, WKT2, PROJ, PROJJSON, all proj4sedona projections)
  • 7 new RasterAccessorsTest tests (all output formats, null handling, invalid format)
  • 81 new CrsRoundTripComplianceTest tests verifying idempotency across 22+ representative EPSG codes × 4 formats
  • 9 new Spark SQL integration tests in rasteralgebraTest.scala
  • All 144 tests pass via mvn test -pl common

Did this PR include necessary documentation updates?

Implements RS_SetCRS(raster, crsString) that accepts CRS definitions in
EPSG, WKT1, WKT2, PROJ, and PROJJSON formats. Also implements
RS_CRS(raster[, format]) that exports the raster CRS in any of these
formats (default: PROJJSON).

Closes #2674

Copilot AI left a comment

Pull request overview

Adds new raster CRS utilities to Sedona to support setting and retrieving raster CRS using full CRS definition strings (not limited to integer EPSG/SRID), exposed both in the common Java raster layer and as Spark SQL functions.

Changes:

  • Introduces RS_SetCRS(raster, crsString) to set raster CRS from EPSG/WKT1/WKT2/PROJ/PROJJSON inputs via a GeoTools + proj4sedona parsing pipeline.
  • Introduces RS_CRS(raster[, format]) to export raster CRS as projjson (default), wkt2, wkt1, or proj.
  • Adds extensive unit + integration + round-trip compliance tests and SQL docs for the new functions.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| common/src/main/java/org/apache/sedona/common/raster/RasterEditors.java | Adds CRS string parsing + projection-name normalization to support setCrs. |
| common/src/main/java/org/apache/sedona/common/raster/RasterAccessors.java | Adds CRS export in multiple formats via proj4sedona. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterEditors.scala | Adds Spark SQL expression wrapper for RS_SetCRS. |
| spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/raster/RasterAccessors.scala | Adds Spark SQL expression wrapper for RS_CRS (1-arg + 2-arg). |
| spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala | Registers RS_SetCRS and RS_CRS in the function catalog. |
| spark/common/src/test/scala/org/apache/sedona/sql/rasteralgebraTest.scala | Adds Spark integration tests for the new SQL functions and formats. |
| common/src/test/java/org/apache/sedona/common/raster/RasterEditorsTest.java | Adds unit tests for setCrs across input formats and projection support. |
| common/src/test/java/org/apache/sedona/common/raster/RasterAccessorsTest.java | Adds unit tests for crs() output formats, null handling, and invalid format. |
| common/src/test/java/org/apache/sedona/common/raster/CrsRoundTripComplianceTest.java | Adds broad CRS round-trip/idempotency compliance coverage across formats/EPSG codes. |
| docs/api/sql/Raster-Operators/RS_SetCRS.md | Adds SQL documentation for RS_SetCRS. |
| docs/api/sql/Raster-Operators/RS_CRS.md | Adds SQL documentation for RS_CRS output formats and limitations. |
| docs/api/sql/Raster-Functions.md | Adds index entries for the two new raster functions. |

jiayuasu added 3 commits March 1, 2026 02:11
- Use longitude-first axis order for WKT parsing (FORCE_LONGITUDE_FIRST_AXIS_ORDER)
- Remove tryResolveToEpsg() - RS_SRID returns 0 for custom CRS, use RS_CRS instead
- Add null/empty guard for format parameter in crs()
- Use ConcurrentHashMap for thread-safe alias cache writes
- Guard DefaultMathTransformFactory downcast with instanceof
- Catch specific exceptions in proj4sedona parsing, attach as suppressed
- Remove 'lossless' claim from PROJJSON docs
- Update RS_SRID docs: 0 can mean custom (non-EPSG) CRS
- Fix Javadoc to include WKT2 in format list
- Update Spark test to expect SRID=0 for WKT1 without AUTHORITY
- Bump proj4sedona from 0.0.6 to 0.0.7 (fixes bugs #44-#48)
- Remove documented limitations for datum name loss (#47),
  lat_ts drift (#44), ellipsoid expansion (#45), WKT2 drift (#46)
- Convert WKT2/PROJJSON import-fail tests to normal round-trips (#48)
- Add EPSG:28992 to WKT2 round-trip tests (floating-point drift fixed)
- Add EPSG:6933 test (Lambert Cylindrical Equal Area now works)
- Fix export path: prefer EPSG SRID over WKT1 to avoid projection
  name compatibility issues between GeoTools and proj4sedona
- Add WKT1 projection name normalization for export fallback
- Handle +proj=sterea PROJ string re-import (normalize to +proj=stere)
- Use normalized matching in Tier 3 fallback (handles space vs underscore)
- Update docs to remove resolved limitations
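
As a rough picture of the thread-safe alias cache mentioned in these commit notes, the sketch below wraps a slow lookup in a `ConcurrentHashMap`. The class name, the constructor and the lookup function are hypothetical; the PR only states that `ConcurrentHashMap` is used for cache writes, not how the cache is populated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a thread-safe projection alias cache. */
final class AliasCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowLookup;

    AliasCacheSketch(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /** Computes the alias at most once per key, even under concurrent callers. */
    String resolve(String projectionName) {
        // computeIfAbsent is atomic per key; a null result is simply not cached.
        return cache.computeIfAbsent(projectionName, slowLookup);
    }

    public static void main(String[] args) {
        AliasCacheSketch cache = new AliasCacheSketch(name -> name.replace(' ', '_'));
        System.out.println(cache.resolve("Transverse Mercator"));
        System.out.println(cache.resolve("Transverse Mercator")); // served from the cache
    }
}
```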
@github-actions github-actions bot added the root label Mar 2, 2026
Consolidate all CRS name normalization into a single utility class:
- Shared PROJECTION_PATTERN regex (was duplicated in RasterEditors + RasterAccessors)
- Pre-normalized fallback map keys for O(1) Tier 3 lookup (was O(n) per call)
- Remove duplicate entries from fallback map (space vs underscore variants collapse)
- Single entry points: normalizeProjInput(), normalizeWkt1ForGeoTools(), normalizeWkt1ForProj4sedona()
- Remove ~210 lines of scattered normalization code from RasterEditors and RasterAccessors
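
A shared, precompiled projection pattern might be used along these lines. The regex and the `renameProjection` helper are assumptions made for illustration; the actual `PROJECTION_PATTERN` and the `normalizeWkt1For...` entry points in `CrsNormalization` may be defined differently.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: rewrite the projection name inside a WKT1 string. */
final class ProjectionRenameSketch {
    // Compiled once and reused; the exact pattern in the PR is not shown, so this is a guess.
    static final Pattern PROJECTION_PATTERN = Pattern.compile("PROJECTION\\[\\s*\"([^\"]+)\"");

    static String renameProjection(String wkt1, UnaryOperator<String> rename) {
        Matcher m = PROJECTION_PATTERN.matcher(wkt1);
        if (!m.find()) {
            return wkt1; // e.g. a geographic CRS has no PROJECTION element
        }
        String newName = rename.apply(m.group(1));
        // Splice by index so special characters in the new name are taken literally.
        return wkt1.substring(0, m.start(1)) + newName + wkt1.substring(m.end(1));
    }

    public static void main(String[] args) {
        String wkt = "PROJCS[\"demo\",GEOGCS[\"WGS 84\"],PROJECTION[\"Lambert Conformal Conic\"]]";
        System.out.println(renameProjection(wkt, name -> name.replace(' ', '_')));
    }
}
```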

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

…y RS_CRS null semantics

- RasterAccessors.crs(): apply normalizeWkt1ForProj4sedona() when EPSG code
  fails and raw WKT1 also fails (consistent with srid==0 branch)
- RS_CRS.md: clarify that RS_SRID=0 can mean either no CRS or custom CRS,
  recommend RS_CRS(raster) IS NULL to test for missing CRS
proj4sedona 0.0.8 registers sterea as alias for Stereographic (#57),
so the +proj=sterea → +proj=stere normalization in CrsNormalization
is no longer needed.

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

- parseCrsString: add null/blank validation up front
- Parameter stripping loop: detect no-op strips, always record lastError
- Rename testSetCrsWithAllProj4SedonaProjections → Representative
- CrsNormalization.normalizeForMatch: use Locale.ROOT for toLowerCase
- RasterAccessors: extract createProjFromWkt1() helper to deduplicate
  the try/catch/normalize/retry logic used in both srid>0 and srid==0 branches
- RasterEditors.parseCrsString: hoist Hints+CRSFactory creation out of
  Step 2 try-block so Step 3 reuses the same instances
- RasterEditors.stripWktParameter: use Pattern.compile() explicitly
  instead of String.replaceAll() which recompiles on every call
- RasterAccessors.crs: avoid trimming format string twice
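
The try/catch/normalize/retry logic that these notes describe could be sketched as below. This is not the `createProjFromWkt1()` helper itself: the generic signature, the class name and the use of `RuntimeException` are simplifications for illustration (the commit notes say the PR catches specific exceptions). What it does show is the no-op check before retrying and attaching the first failure as a suppressed exception.

```java
import java.util.function.Function;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of parse, normalize, retry with the first error preserved. */
final class RetryWithNormalizationSketch {
    static <T> T parseWithRetry(String wkt1,
                                Function<String, T> parser,
                                UnaryOperator<String> normalizer) {
        try {
            return parser.apply(wkt1);
        } catch (RuntimeException first) { // the PR catches specific types; broad here for brevity
            String normalized = normalizer.apply(wkt1);
            if (normalized.equals(wkt1)) {
                throw first; // normalization changed nothing, so a retry cannot help
            }
            try {
                return parser.apply(normalized);
            } catch (RuntimeException second) {
                if (second != first) {
                    second.addSuppressed(first); // keep the original failure for diagnosis
                }
                throw second;
            }
        }
    }

    public static void main(String[] args) {
        // Toy parser that rejects spaces; toy normalizer that replaces them.
        Function<String, String> parser = s -> {
            if (s.contains(" ")) throw new IllegalArgumentException("unexpected space: " + s);
            return s;
        };
        System.out.println(parseWithRetry("Lambert Conformal Conic", parser, s -> s.replace(' ', '_')));
    }
}
```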

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

@jiayuasu jiayuasu added this to the sedona-1.9.0 milestone Mar 2, 2026
@jiayuasu jiayuasu requested a review from Copilot March 2, 2026 08:52

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

@jiayuasu jiayuasu marked this pull request as ready for review March 2, 2026 09:29
@jiayuasu jiayuasu merged commit d13b692 into master Mar 2, 2026
46 checks passed

Development

Successfully merging this pull request may close these issues.

Can I register a custom CRS to a user-defined EPSG code to use with RS_SetSRID?

2 participants