feat(geography): register GEOGRAPHY LogicalType (skeleton; PR #168 design)#169
Open
estebanzimanyi wants to merge 8 commits into
Open
feat(geography): register GEOGRAPHY LogicalType (skeleton; PR #168 design)#169estebanzimanyi wants to merge 8 commits into
estebanzimanyi wants to merge 8 commits into
Conversation
Adds doc/geography-boundary.md as the canonical write-up of how MobilityDuck represents geodetic geography values across the MEOS<->DuckDB columnar boundary. Covers: - The closed-algebra property in MEOS and why it doesn't survive the columnar boundary without a dedicated LogicalType. - The GEOGRAPHY LogicalType registration: a BLOB alias carrying MEOS-WKB with the geodetic flag preserved in the type tag, with no dependence on a DuckDB upstream change or on a third-party duckdb-geography extension. - The I/O surface (ST_GeogFromText, ST_AsText, ST_AsBinary, ST_GeogFromBinary), all thin shims over existing MEOS exports. - The operation surface (length, area, eIntersects, etc.) — every call delegates to a MEOS function that takes geodetic input and returns the correct type; DuckDB never sees a non-geodetic representation of a geodetic value during a computation. - The complete inter-type cast matrix (GEOMETRY / GEOGRAPHY / TGEOGPOINT / TGEOMPOINT), mirroring the MobilityDB-on-Postgres surface. - TemporalParquet round-trip preservation via the footer JSON's base_type / geodetic / srid fields. - Pitfalls a binding implementation must avoid (using ST_GeomFromText to construct a GEOGRAPHY value, reusing DuckDB Spatial Cartesian functions on a GEOGRAPHY blob, stripping the geodetic flag in Parquet output, etc.). - Current state of the implementation and the bounded pending work (~430 LoC, single PR) to register the LogicalType, the I/O UDFs, the casts, and the tests. README updated with a single-paragraph pointer in the parity-gaps neighbourhood so adopters land here when looking for geography semantics on the DuckDB side.
…yDB#168 design) Foundation for the DuckDB GEOGRAPHY boundary documented in MobilityDB#168. Registers the LogicalType alias (BLOB carrying MEOS-WKB with the geodetic flag preserved in the type tag) so DuckDB columns can be declared as `GEOGRAPHY`. Files: - src/include/geo/geography.hpp — GeographyType struct declaration (LogicalType getter + RegisterType entry point), mirroring the shape of StboxType / TgeogpointType. - src/geo/geography.cpp — implementation: BLOB with SetAlias ("GEOGRAPHY") and loader.RegisterType("GEOGRAPHY", GEOGRAPHY()). - src/mobilityduck_extension.cpp — Load() wires GeographyType::RegisterType after StboxType. - CMakeLists.txt — adds src/geo/geography.cpp to EXTENSION_SOURCES. - test/sql/geography.test — CREATE TABLE / INSERT NULL / SELECT round-trip sanity test verifying the alias is registered. Follow-up PRs build on this: - I/O UDFs: ST_GeogFromText, ST_AsText, ST_AsBinary, ST_GeogFromBinary (~150 LoC) - Casts: GEOMETRY <-> GEOGRAPHY, GEOGRAPHY <-> TGEOGPOINT (~80 LoC) - Operations: length / area / eIntersects / nearestApproachDistance on GEOGRAPHY columns dispatching to MEOS geog_* functions - Tests: round-trip, cast-matrix, numeric checks against MEOS-on-Postgres ground truth (~200 LoC) Total target surface (from MobilityDB#168): ~430 LoC across the four follow-ups. Each follow-up is independently mergeable on top of this skeleton. Stacks on MobilityDB#168 (geography boundary design doc).
… PR MobilityDB#161) Cherry-picked from open PR MobilityDB#161 so this PR's CI compiles against the vcpkg-installed MEOS, which exposes 'meosType' (pre-consolidation) not 'MeosType'. When MobilityDB#161 reaches main, this commit collapses to a no-op on rebase.
…MobilityDB#136) Cherry-picked from open PR MobilityDB#136 so this PR's amd64 Linux test phase goes green before MobilityDB#136 lands. When MobilityDB#136 reaches main, this rebase collapses to a no-op.
47473d6 to
c77de02
Compare
…en PR MobilityDB#140) Cherry-picked from open PR MobilityDB#140 so this PR's osx_amd64 / osx_arm64 / wasm builds compile. On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are the same width but distinct types; clang rejects passing bigint_to_set where Set *(*)(int64_t) is expected. The cast is a no-op on Linux. When MobilityDB#140 reaches main, this rebase collapses to a no-op.
The first geography.cpp landing had a 'reference to Interval is ambiguous' arm64 build error because the duckdb header chain pulled in duckdb::Interval before meos.h's extern "C" block scoped MEOS's Interval to global linkage. Reorder includes to put meos_wrapper_simple.hpp first, mirroring the existing static-type pattern in stbox.cpp / tgeogpoint.cpp.
c77de02 to
7335e96
Compare
…e_ptr<FunctionData> in Copy()
GCC + DuckDB 1.4.4's unique_ptr does not implicitly convert
derived->base, so 'return r;' in BinsBindData::Copy() fails to compile:
error: could not convert 'r' from 'unique_ptr<duckdb::{anonymous}::BinsBindData,...>'
to 'unique_ptr<duckdb::FunctionData,...>'
Use duckdb's unique_ptr_cast helper (from duckdb/common/helper.hpp) to
do the conversion explicitly, matching the canonical pattern used by
DuckDB core (e.g. table_scan.hpp's TableScanBindData::Copy()). No
behaviour change; the move is exactly what the implicit conversion
would have done if the compiler accepted it.
This was referenced May 20, 2026
The stage_icu helper mapped only the Linux uname values, so on the macOS arm64 test runner uname -m returned "arm64" and the icu extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of .../osx_arm64, where DuckDB's autoload looks. The hub fallback is not reliably resolvable on that runner, so the osx_arm64 Test step failed to load the extension. Map the OS and architecture to the DuckDB platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so the locally built icu is staged at the path autoload expects on every tested platform; the Linux mapping is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Foundation for the DuckDB
GEOGRAPHYboundary documented in #168. Registers the LogicalType alias (BLOB carrying MEOS-WKB with the geodetic flag preserved in the type tag) so DuckDB columns can be declared asGEOGRAPHY.What ships
src/include/geo/geography.hppGeographyTypestruct declaration (LogicalType getter + RegisterType entry point), mirroring the shape ofStboxType/TgeogpointType.src/geo/geography.cppSetAlias("GEOGRAPHY")andloader.RegisterType("GEOGRAPHY", GEOGRAPHY()).src/mobilityduck_extension.cppLoad()wiresGeographyType::RegisterTypeafterStboxType.CMakeLists.txtsrc/geo/geography.cpptoEXTENSION_SOURCES.test/sql/geography.testCREATE TABLE ... GEOGRAPHY/INSERT NULL/SELECTround-trip sanity test verifying the alias is registered.Total: 4 new files + 2 edited files, ~50 LoC.
What it does NOT ship (follow-up PRs)
ST_GeogFromText,ST_AsText,ST_AsBinary,ST_GeogFromBinarygeog_in/geo_as_ewkt/geo_as_ewkb/geo_from_ewkbGEOMETRY ⇄ GEOGRAPHY,GEOGRAPHY ⇄ TGEOGPOINTextrasTGEOGPOINT(GEOGRAPHY, TIMESTAMPTZ)constructor reuses the existing temporal-geographic flowlength,area,eIntersects,nearestApproachDistanceonGEOGRAPHYcolumnsgeog_*already returns the right types; the dispatcher PR plugs them in)mobilitydbTotal target surface (from #168's design): ~430 LoC across the four follow-ups. Each follow-up is independently mergeable on top of this skeleton.
Dependency chain
Verification
The sanity test
test/sql/geography.testruns in CI:If CI is green, the LogicalType is registered and the alias parses; the boundary doc's design is buildable. If CI surfaces any C++ compile errors (the skeleton mirrors
StboxTypeexactly, so this is unlikely), they will be amended into the same commit per the green-CI-same-commit gate.