docs(iceberg): readiness doc for TemporalParquet + Apache Polaris catalog#179
Open
estebanzimanyi wants to merge 9 commits into
Open
docs(iceberg): readiness doc for TemporalParquet + Apache Polaris catalog#179estebanzimanyi wants to merge 9 commits into
estebanzimanyi wants to merge 9 commits into
Conversation
…eogpoint tests Adds the user-facing entry point for the edge-to-cloud pipeline plus a documentation/testing surface around it: - examples/quickstart/quickstart.sql — 5-vessel synthetic ingest → Parquet → Spark-readable layer. - examples/generic-ingest/generic_ingest.sql — generic CSV → tgeompoint ingest template. - temporalFooter() — builds the TemporalParquet KV_METADATA JSON to embed at write time (tgeompoint encoding manifest, schema version, timestamp range, SRID). - SRID/geodetic mixin fix in tgeompoint_functions.cpp — preserves the source SRID through cast/transform paths and forces geodetic when the input is geographic. - docs/beta-testing-edge-to-cloud.md — beta-testing guide. - docs/testing-tz-neutral-policy.md — ecosystem-wide timezone-neutral test policy (canonical reference for MobilityDB / MobilityDuck / PyMEOS / JMEOS / meos-rs). - docs/tgeogpoint-design.md — design notes. - test/sql/tgeogpoint.test, test/sql/parquet/temporal_parquet.test — test coverage for the new pieces. - .gitignore + CMakeLists.txt + extension load — wiring.
`meosType` (lower-case) is the **pre-consolidation** MEOS type name; `MeosType` (upper-case) is the **post-consolidation** target that the upstream rename sweep has not yet reached. The current vcpkg pin (`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still pre-consolidation: `meos/include/temporal/meos_catalog.h` line 121 declares the typedef as `} meosType;` and every MEOS API uses the lower-case spelling. MobilityDuck's source code consistently uses `meosType` to match — `grep -rn '\bMeosType\b' src/` finds the name only on the alias line and its comment, nowhere else. c8cad6d added `using meosType = MeosType;` as a forward-looking bridge for the eventual consolidation bump. That bridge points at `MeosType`, which the current pin does NOT yet expose, so it breaks every PR's Linux arm64 build with: /duckdb_build_dir/src/include/tydef.hpp:18:18: error: ‘MeosType’ does not name a type; did you mean ‘meosType’? The fix is to drop the premature alias and replace the misleading comment with one that documents the pre/post-consolidation distinction and the resume path for the next pin bump — at that point a reviewer can either restore the bridge (this time it'll be valid because `MeosType` will exist) or sweep the MobilityDuck source from `meosType` to `MeosType` in a single PR. Unblocks every in-flight PR's Linux arm64 build: MobilityDB#126, MobilityDB#130, MobilityDB#149, MobilityDB#158, MobilityDB#159, MobilityDB#160, plus the entire `feat/*_port_core` extended-type stack (MobilityDB#148/MobilityDB#150/MobilityDB#151/MobilityDB#153/MobilityDB#155/MobilityDB#156).
…al-feat/edge-to-cloud-quickstart-rebased
…MobilityDB#136) Cherry-picked from open PR MobilityDB#136 (commit 9e1d7a6) so this PR's CI goes green before MobilityDB#136 lands. When MobilityDB#136 reaches main, the rebase will collapse this commit to a no-op and it will drop out. --- original commit body --- Pre-stage icu extension for amd64 docker tests LoadInternal calls ExtensionHelper::AutoLoadExtension(db, "icu") so the Europe/Brussels timezone option is honoured. Inside the linux_amd64 test docker container there is no network egress and the local extension directory is empty, so the autoload fails. Copy the icu.duckdb_extension that was just built locally (declared in extension_config.cmake) into the expected path before running the unittester.
…en PR MobilityDB#140) On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are the same width but distinct types, so clang rejects passing bigint_to_set where a Set *(*)(int64_t) is expected as a non-type template arg. Cherry- picked from open PR MobilityDB#140 (a8b1755) so this PR goes green on osx_amd64, osx_arm64, and wasm_mvp before MobilityDB#140 lands. The cast is a no-op on Linux, where int64 and int64_t are both long. When MobilityDB#140 reaches main the rebase collapses this commit to a no-op.
…alog
Adds docs/iceberg-polaris.md documenting how MobilityDuck's
TemporalParquet substrate (this PR stack's edge-to-cloud quickstart)
interoperates with Apache Iceberg via Apache Polaris as the catalog
server.
Covered:
- Why Polaris (RBAC, OAuth2, credential vending, protocol-vanilla)
- How TemporalParquet shards compose with Iceberg (opaque BLOB column,
bbox sidecars feed Iceberg column stats for free manifest pruning)
- Per-engine integration (DuckDB temporal_iceberg_scan plan,
MobilitySpark zero-cost via native runtime, PyMEOS via pl.scan_iceberg)
- Snapshot-time vs valid-time orthogonality
- OAuth2 + credential vending pattern
- Deployment recipe (dev compose + production cluster + storage)
The doc is a readiness document — the substrate works today with
vanilla Iceberg; Polaris adds the production-controls layer. It
captures the recommendation captured in memory + ships as part of
the substrate PR.
…ion init)
MobilityDuck initializes MEOS with `meos_initialize_timezone("Europe/Brussels")`
in the extension entry point. tgeogpoint.test had four assertions hardcoded
to `+00` (UTC) instead of `+01` (Brussels winter), so they failed on every CI
runner.
The four affected assertions are all winter-date constructors that go
through `to_timestamp(unix_seconds)` — those parse as UTC then display in
the extension's runtime TZ (Brussels = UTC+1 in January, no DST). Updates:
- asText(TGEOGPOINT(...)) -> POINT(...)@2000-01-01 01:00:00+01
- asEWKT(TGEOGPOINT(...)) -> SRID=4326;POINT(...)@... 01:00:00+01
- asText(tgeogpointSeq(ARRAY[..., ...])) -> [POINT(...)@... 01:00:00+01, ...]
A fifth literal-constructor case (`tgeogpoint 'SRID=4326;Point(1 2)@2000-01-01'`)
uses the EWKT parser which interprets the literal date as a Brussels-TZ
local time, so it stays at `00:00:00+01` (not 01:00:00+01).
The header comment now documents the Brussels TZ assumption + cross-refs
docs/testing-tz-neutral-policy.md so future test authors don't repeat
this.
…spatch
DuckDB function resolution treats GEOMETRY, TGEOMPOINT, TGEOGPOINT,
TGEOMETRY, TGEOGRAPHY as alias-equivalent because each is a
LogicalType::BLOB with an alias label. The constant-folder routes
eIntersects(GEOMETRY, TGEOGPOINT) into the {TGEOMPOINT, GEOMETRY}
overload of TgeoGeoIntExec from src/geo/tgeometry_ops.cpp (earlier
registered), which assumes args.data[0] is the Temporal and feeds
the GEOMETRY blob to tspatial_srid — hitting ensure_tspatial_type
inside MEOS with 'must be a spatiotemporal value' before any of the
correctly-direction-aware *_geo_tgeo executors get a chance to run.
Add BlobLooksLikeTemporal to geo_util.hpp: a pure-predicate probe
that reads byte 4 of the blob (Temporal's temptype field) and runs
it through tspatial_type — true only for T_TGEOMPOINT / T_TGEOGPOINT
/ T_TGEOMETRY / T_TGEOGRAPHY / T_TRGEOMETRY. DuckDB GEOMETRY blobs
have byte 4 in their WKB header which never matches one of those
MeosType enum values.
Use the probe at the top of TgeoGeoIntExec in all three *_ops.cpp
TUs to detect mis-ordered args and silently swap roles before
decoding. Also extend each executor with the geodetic-aware
geom_to_geog conversion (matching the existing pattern in
TgeompointFunctions::Eintersects_*) so the swapped-direction call
still respects MEOS_FLAGS_GET_GEODETIC.
Also fixes a unique_ptr<BinsBindData> -> unique_ptr<FunctionData>
build break in span_table_functions.cpp's BinsBindData::Copy() that
otherwise blocks the substrate fix landing alongside the test.
Locally green: test/sql/* full suite, 60/60 test cases, 1346/1346
assertions (was 1335/1336 with the failing line-56 eIntersects).
gdb-confirmed: BlobLooksLikeTemporal probe returns a_is_temp=0
b_is_temp=1 for the (GEOMETRY, TGEOGPOINT) case, swap routes
TGEOGPOINT to the Temporal path.
The stage_icu helper mapped only the Linux uname values, so on the macOS arm64 test runner uname -m returned "arm64" and the icu extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of .../osx_arm64, where DuckDB's autoload looks. The hub fallback is not reliably resolvable on that runner, so the osx_arm64 Test step failed to load the extension. Map the OS and architecture to the DuckDB platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so the locally built icu is staged at the path autoload expects on every tested platform; the Linux mapping is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stacks on #158 (the edge-to-cloud quickstart substrate). Adds
docs/iceberg-polaris.mddocumenting how MobilityDuck's TemporalParquet shards interoperate with Apache Iceberg via Apache Polaris as the catalog server.Content
temporal_iceberg_scanplan (~1 PR), MobilitySpark zero-cost (native Iceberg runtime), PyMEOS viapl.scan_icebergWhy stack on #158
The substrate (
temporalFooterKV metadata + bbox sidecars +tgeogpointSRID/geodetic fix) is the foundation Iceberg/Polaris compose with. Stacking here keeps the readiness doc adjacent to the substrate it depends on, so a reviewer reading #158 sees the Polaris readiness in the same PR train.Branch base
feat/edge-to-cloud-quickstart-rebased(= #158's branch). Adds one commit; no submodule churn.What this doc is NOT
This is not a
temporal_iceberg_scanUDF implementation — that's a future ~1-PR delegator. It's the design + deployment recipe document that lands ahead of the implementation PR so reviewers have the design context before the UDF arrives.Local verification
Doc-only PR; no build verification needed.
Related
[[apache-polaris-readiness]]memory captures the design rationale[[iceberg-readiness]]companion memory for the broader Iceberg lift-in plan