doc(geography): document the DuckDB GEOGRAPHY boundary design#168

Open
estebanzimanyi wants to merge 1 commit into
MobilityDB:mainfrom
estebanzimanyi:doc/geography-boundary-design

@estebanzimanyi
Member

Adds doc/geography-boundary.md as the canonical write-up of how MobilityDuck represents geodetic geography values across the MEOS ↔ DuckDB columnar boundary.

Why this doc exists

Two layers of context get mixed up regularly when geodetic results come out of a query:

  1. MEOS has the closed-algebra property for geography. geog_in, geog_area, eIntersects(geog, geog), tgeog_length, tgeog_speed all take geodetic inputs, compute on the WGS-84 spheroid, and return properly-typed geodetic results — without leaving the MEOS C runtime.
  2. DuckDB's bundled spatial extension exposes one logical type, GEOMETRY, which has no geodetic bit, so the geodetic flag is at risk of being lost the moment a MEOS geography result becomes a DuckDB column value.

The doc explains the boundary-layer solution: register a GEOGRAPHY LogicalType in MobilityDuck (a BLOB alias whose payload is MEOS-WKB with the geodetic flag in the type tag), so the columnar engine carries the type information verbatim while every operation stays inside MEOS.
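To make the "flag in the type tag" idea concrete, here is a minimal Python sketch of a tagged binary payload whose geodetic bit survives a round trip through an opaque blob. It is illustration only: the bit position `GEODETIC_FLAG`, the byte layout, and the helper names `encode_point` / `decode_point` are assumptions made for this example, not the actual MEOS-WKB format or any MEOS or DuckDB API.

```python
import struct

# Hypothetical layout, for illustration only; the real MEOS-WKB encoding may differ.
POINT_TYPE = 1
GEODETIC_FLAG = 0x10000000  # assumed bit position inside the 32-bit type tag


def encode_point(lon: float, lat: float, geodetic: bool) -> bytes:
    """Pack a 2D point as: byte-order marker, 32-bit type tag, two doubles."""
    tag = POINT_TYPE | (GEODETIC_FLAG if geodetic else 0)
    return struct.pack("<BIdd", 1, tag, lon, lat)


def decode_point(blob: bytes) -> dict:
    """Unpack the payload and recover the geodetic bit from the type tag."""
    _order, tag, lon, lat = struct.unpack("<BIdd", blob)
    return {
        "type": tag & ~GEODETIC_FLAG,
        "geodetic": bool(tag & GEODETIC_FLAG),
        "lon": lon,
        "lat": lat,
    }


# The blob is opaque to whatever engine stores it; the flag travels inside it.
blob = encode_point(4.35, 50.85, geodetic=True)
decoded = decode_point(blob)
```

Because the flag lives inside the payload rather than in engine-side type metadata, a store that treats the value as plain bytes cannot drop it; only code that rewrites the tag can.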

What the doc covers

  • The problem in one paragraph + an ASCII boundary diagram.
  • The GEOGRAPHY LogicalType registration sketch (~10 LoC at registration time).
  • I/O surface — ST_GeogFromText, ST_AsText, ST_AsBinary, ST_GeogFromBinary — all thin shims over existing MEOS exports.
  • Operation surface — length, area, eIntersects, etc. — every call delegates to a MEOS function that takes geodetic input and returns the correct type.
  • The complete cast matrix (GEOMETRY / GEOGRAPHY / TGEOGPOINT / TGEOMPOINT), mirroring the MobilityDB-on-Postgres surface.
  • TemporalParquet round-trip preservation via the footer JSON.
  • Pitfalls a binding implementation must avoid (using ST_GeomFromText to construct a GEOGRAPHY value, reusing DuckDB Spatial Cartesian functions on a GEOGRAPHY BLOB, stripping the geodetic flag in Parquet output, etc.).
  • Current state and the bounded pending work (~430 LoC, single PR) to register the LogicalType + I/O UDFs + casts + tests.
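The Parquet round-trip item can be sketched the same way. The commit message names three footer JSON fields: base_type, geodetic, and srid. The Python below assumes those three keys and nothing else; the helper names `write_footer` / `read_footer` and the sample values are hypothetical, and the real TemporalParquet footer may carry more fields or a different structure.

```python
import json

REQUIRED_KEYS = {"base_type", "geodetic", "srid"}  # field names from the commit message


def write_footer(base_type: str, geodetic: bool, srid: int) -> str:
    """Serialize the type metadata that must survive a Parquet round trip."""
    return json.dumps({"base_type": base_type, "geodetic": geodetic, "srid": srid})


def read_footer(text: str) -> dict:
    """Parse the footer and refuse metadata that has lost the geodetic flag."""
    meta = json.loads(text)
    missing = REQUIRED_KEYS - meta.keys()
    if missing:
        raise ValueError(f"footer is missing fields: {sorted(missing)}")
    if not isinstance(meta["geodetic"], bool):
        raise ValueError("geodetic must be a JSON boolean")
    return meta


# Sample values are illustrative only.
footer = write_footer("tgeogpoint", True, 4326)
restored = read_footer(footer)
```

Failing loudly when the geodetic field is absent, instead of defaulting it to false, is one way to avoid the "stripping the geodetic flag in Parquet output" pitfall listed above.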

Where it's linked

This is the doc the user asked for: "document all the DuckDB geography issue and solution properly so it can be widely available and findable in the documentation". The implementation (the ~430 LoC PR registering GEOGRAPHY + UDFs + casts + tests) is the natural next step.

Adds doc/geography-boundary.md as the canonical write-up of how
MobilityDuck represents geodetic geography values across the
MEOS<->DuckDB columnar boundary.

Covers:
- The closed-algebra property in MEOS and why it doesn't survive
  the columnar boundary without a dedicated LogicalType.
- The GEOGRAPHY LogicalType registration: a BLOB alias carrying
  MEOS-WKB with the geodetic flag preserved in the type tag, with
  no dependence on a DuckDB upstream change or on a third-party
  duckdb-geography extension.
- The I/O surface (ST_GeogFromText, ST_AsText, ST_AsBinary,
  ST_GeogFromBinary), all thin shims over existing MEOS exports.
- The operation surface (length, area, eIntersects, etc.) — every
  call delegates to a MEOS function that takes geodetic input and
  returns the correct type; DuckDB never sees a non-geodetic
  representation of a geodetic value during a computation.
- The complete inter-type cast matrix (GEOMETRY / GEOGRAPHY /
  TGEOGPOINT / TGEOMPOINT), mirroring the MobilityDB-on-Postgres
  surface.
- TemporalParquet round-trip preservation via the footer JSON's
  base_type / geodetic / srid fields.
- Pitfalls a binding implementation must avoid (using
  ST_GeomFromText to construct a GEOGRAPHY value, reusing DuckDB
  Spatial Cartesian functions on a GEOGRAPHY blob, stripping the
  geodetic flag in Parquet output, etc.).
- Current state of the implementation and the bounded pending
  work (~430 LoC, single PR) to register the LogicalType, the
  I/O UDFs, the casts, and the tests.

README updated with a single-paragraph pointer in the
parity-gaps neighbourhood so adopters land here when looking for
geography semantics on the DuckDB side.
