Skip to content

diag: surface odd-length hex output from MEOS *_as_hexwkb on macOS arm64 (refs #126/#130)#180

Open
estebanzimanyi wants to merge 10 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/hex-wkb-diag
Open

diag: surface odd-length hex output from MEOS *_as_hexwkb on macOS arm64 (refs #126/#130)#180
estebanzimanyi wants to merge 10 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/hex-wkb-diag

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Summary

Stacks on #158 (the edge-to-cloud quickstart substrate). Wraps every *_as_hexwkb call-site in MobilityDuck with a diagnostic-grade precondition check that surfaces odd-length hex output at the producer rather than the consumer.

What this is for

PRs #126 and #130 have been failing the macOS osx_arm64 unittest with:

Invalid hex string, length (69) has to be a multiple of two!

That message is from DuckDB's hex/blob decoder rejecting an odd-length string — meaning one of the MEOS *_as_hexwkb producers is emitting a 69-char hex string on that platform. Hex strings of WKB-encoded MEOS values must always be even-length (each byte is encoded as 2 hex chars), so the producer is either:

  • producing a literal odd-length string (MEOS-side bug), or
  • producing an even-length string with an embedded null at an odd position (DuckDB sees the null as the string end via strlen)

Instrumentation pattern

At every *_as_hexwkb call-site, immediately after the MEOS call returns:

size_t hex_size = 0;
char *hex = X_as_hexwkb(..., &hex_size);
if (!hex) throw InternalException("asHexWKB: X_as_hexwkb failed");

// Diagnostic: hex strings must be even-length.
size_t actual = strlen(hex);
if (actual % 2 != 0) {
    std::string diag = "asHexWKB: X_as_hexwkb produced odd-length string: "
                       "strlen=" + std::to_string(actual) +
                       " sz=" + std::to_string(hex_size);
    free(hex);
    throw InternalException(diag);
}
string_t stored = StringVector::AddString(result, hex, actual);  // explicit length
free(hex);

When the producer emits odd-length output on macOS-arm64, the exception message includes both strlen(hex) and the sz out-parameter reported by MEOS — pinpointing which side is wrong + what the truncation point was.

The explicit-length StringVector::AddString(result, hex, actual) (instead of the implicit strlen overload) also makes any latent embedded-null bug detectable at the producer rather than the consumer.

Sites instrumented (7 producers)

File Function MEOS export
src/geo/tgeompoint.cpp TgeoAsHexWkbExec temporal_as_hexwkb
src/geo/stbox_functions.cpp Stbox_as_hexwkb stbox_as_hexwkb
src/temporal/span_functions.cpp Span_as_hexwkb span_as_hexwkb
src/temporal/spanset_functions.cpp Spanset_as_hexwkb spanset_as_hexwkb
src/temporal/set_functions.cpp Set_as_hexwkb set_as_hexwkb
src/temporal/tbox_functions.cpp Tbox_as_hexwkb tbox_as_hexwkb
src/temporal/temporal_functions.cpp Temporal_as_hexwkb temporal_as_hexwkb

No-op on green paths

The change is intentionally a no-op on green platforms: hex strings that are already even-length are passed through with no behaviour change, just an explicit length argument. The diagnostic path only fires when the bug is present.

Local verification

Linux x86_64 — full release build green (498/498 ninja targets):

$ ./build/release/duckdb -c "LOAD '...mobilityduck.duckdb_extension'; SELECT asHexWKB(tgeompoint 'Point(1 1)@2020-01-01');"
012E00010101000000000000000000F03F000000000000F03F00BC2EB0063E0200

Even-length hex output; diagnostic does not trigger on the happy path.

Branch base

feat/edge-to-cloud-quickstart-rebased (= #158's branch). Polyfills (#136 ICU staging, #140 bigint_to_set forwarder, #173 unique_ptr fix) inherited from the substrate stack.

Refs

…eogpoint tests

Adds the user-facing entry point for the edge-to-cloud pipeline plus a
documentation/testing surface around it:

- examples/quickstart/quickstart.sql — 5-vessel synthetic ingest →
  Parquet → Spark-readable layer.
- examples/generic-ingest/generic_ingest.sql — generic CSV → tgeompoint
  ingest template.
- temporalFooter() — builds the TemporalParquet KV_METADATA JSON to
  embed at write time (tgeompoint encoding manifest, schema version,
  timestamp range, SRID).
- SRID/geodetic mixin fix in tgeompoint_functions.cpp — preserves the
  source SRID through cast/transform paths and forces geodetic when
  the input is geographic.
- docs/beta-testing-edge-to-cloud.md — beta-testing guide.
- docs/testing-tz-neutral-policy.md — ecosystem-wide timezone-neutral
  test policy (canonical reference for MobilityDB / MobilityDuck /
  PyMEOS / JMEOS / meos-rs).
- docs/tgeogpoint-design.md — design notes.
- test/sql/tgeogpoint.test, test/sql/parquet/temporal_parquet.test —
  test coverage for the new pieces.
- .gitignore + CMakeLists.txt + extension load — wiring.
`meosType` (lower-case) is the **pre-consolidation** MEOS type name;
`MeosType` (upper-case) is the **post-consolidation** target that the
upstream rename sweep has not yet reached.  The current vcpkg pin
(`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still
pre-consolidation: `meos/include/temporal/meos_catalog.h` line 121
declares the typedef as `} meosType;` and every MEOS API uses the
lower-case spelling.  MobilityDuck's source code consistently uses
`meosType` to match — `grep -rn '\bMeosType\b' src/` finds the name
only on the alias line and its comment, nowhere else.

c8cad6d added `using meosType = MeosType;` as a forward-looking
bridge for the eventual consolidation bump.  That bridge points at
`MeosType`, which the current pin does NOT yet expose, so it
breaks every PR's Linux arm64 build with:

  /duckdb_build_dir/src/include/tydef.hpp:18:18:
    error: ‘MeosType’ does not name a type; did you mean ‘meosType’?

The fix is to drop the premature alias and replace the misleading
comment with one that documents the pre/post-consolidation distinction
and the resume path for the next pin bump — at that point a reviewer
can either restore the bridge (this time it'll be valid because
`MeosType` will exist) or sweep the MobilityDuck source from
`meosType` to `MeosType` in a single PR.

Unblocks every in-flight PR's Linux arm64 build: MobilityDB#126, MobilityDB#130, MobilityDB#149,
MobilityDB#158, MobilityDB#159, MobilityDB#160, plus the entire `feat/*_port_core` extended-type
stack (MobilityDB#148/MobilityDB#150/MobilityDB#151/MobilityDB#153/MobilityDB#155/MobilityDB#156).
…MobilityDB#136)

Cherry-picked from open PR MobilityDB#136 (commit 9e1d7a6) so this PR's CI goes
green before MobilityDB#136 lands. When MobilityDB#136 reaches main, the rebase will
collapse this commit to a no-op and it will drop out.

--- original commit body ---
Pre-stage icu extension for amd64 docker tests

LoadInternal calls ExtensionHelper::AutoLoadExtension(db, "icu") so the
Europe/Brussels timezone option is honoured. Inside the linux_amd64 test
docker container there is no network egress and the local extension
directory is empty, so the autoload fails. Copy the icu.duckdb_extension
that was just built locally (declared in extension_config.cmake) into the
expected path before running the unittester.
…en PR MobilityDB#140)

On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are
the same width but distinct types, so clang rejects passing bigint_to_set
where a Set *(*)(int64_t) is expected as a non-type template arg. Cherry-
picked from open PR MobilityDB#140 (a8b1755) so this PR goes green on osx_amd64,
osx_arm64, and wasm_mvp before MobilityDB#140 lands. The cast is a no-op on Linux,
where int64 and int64_t are both long. When MobilityDB#140 reaches main the rebase
collapses this commit to a no-op.
PRs MobilityDB#126 and MobilityDB#130 have been failing the macOS osx_arm64 unittest with:

  Invalid hex string, length (69) has to be a multiple of two!

That message is from DuckDB's hex/blob decoder rejecting an
odd-length string — meaning one of the MEOS `*_as_hexwkb` producers
is emitting a 69-char hex string on that platform.  Hex strings of
WKB-encoded MEOS values must always be even-length (each byte is
encoded as 2 hex chars), so the producer is either:
  - producing a literal odd-length string (MEOS-side bug), or
  - producing an even-length string with an embedded null at an
    odd position (DuckDB sees that null as the string end).

This commit wraps every `*_as_hexwkb` call-site with a
diagnostic-grade precondition check that runs immediately after the
MEOS call returns.  When the producer emits odd-length output, the
exception message now includes both `strlen(hex)` and the `sz`
out-parameter reported by MEOS — pinpointing which side is wrong
and what the truncation point was.

Sites instrumented (7 producers):
* src/geo/tgeompoint.cpp       — TgeoAsHexWkbExec (temporal_as_hexwkb)
* src/geo/stbox_functions.cpp  — Stbox_as_hexwkb (stbox_as_hexwkb)
* src/temporal/span_functions.cpp     — Span_as_hexwkb
* src/temporal/spanset_functions.cpp  — Spanset_as_hexwkb
* src/temporal/set_functions.cpp      — Set_as_hexwkb
* src/temporal/tbox_functions.cpp     — Tbox_as_hexwkb
* src/temporal/temporal_functions.cpp — Temporal_as_hexwkb

Each site also switches `StringVector::AddString(result, hex)` (which
uses `strlen(hex)` implicitly) to `AddString(result, hex, actual)`
(which uses the explicit length we just measured), so any latent
embedded-null bug becomes detectable at the producer rather than the
consumer.

The change is intentionally a no-op on green paths: hex strings
that are already even-length are passed through with no extra
behaviour change, just an explicit length argument.

Refs: MobilityDB#126, MobilityDB#130, MobilityDB#158 (touches some of the same files; sequence
after MobilityDB#158 lands).
…e_ptr<FunctionData> in Copy()

GCC + DuckDB 1.4.4's unique_ptr does not implicitly convert
derived->base, so 'return r;' in BinsBindData::Copy() fails to compile:

  error: could not convert 'r' from 'unique_ptr<duckdb::{anonymous}::BinsBindData,...>'
                                to 'unique_ptr<duckdb::FunctionData,...>'

Use duckdb's unique_ptr_cast helper (from duckdb/common/helper.hpp) to
do the conversion explicitly, matching the canonical pattern used by
DuckDB core (e.g. table_scan.hpp's TableScanBindData::Copy()).  No
behaviour change; the move is exactly what the implicit conversion
would have done if the compiler accepted it.
estebanzimanyi and others added 3 commits May 21, 2026 11:43
…ion init)

MobilityDuck initializes MEOS with `meos_initialize_timezone("Europe/Brussels")`
in the extension entry point.  tgeogpoint.test had four assertions hardcoded
to `+00` (UTC) instead of `+01` (Brussels winter), so they failed on every CI
runner.

The four affected assertions are all winter-date constructors that go
through `to_timestamp(unix_seconds)` — those parse as UTC then display in
the extension's runtime TZ (Brussels = UTC+1 in January, no DST).  Updates:

  - asText(TGEOGPOINT(...))              -> POINT(...)@2000-01-01 01:00:00+01
  - asEWKT(TGEOGPOINT(...))              -> SRID=4326;POINT(...)@... 01:00:00+01
  - asText(tgeogpointSeq(ARRAY[..., ...])) -> [POINT(...)@... 01:00:00+01, ...]

A fifth literal-constructor case (`tgeogpoint 'SRID=4326;Point(1 2)@2000-01-01'`)
uses the EWKT parser which interprets the literal date as a Brussels-TZ
local time, so it stays at `00:00:00+01` (not 01:00:00+01).

The header comment now documents the Brussels TZ assumption + cross-refs
docs/testing-tz-neutral-policy.md so future test authors don't repeat
this.
…spatch

DuckDB function resolution treats GEOMETRY, TGEOMPOINT, TGEOGPOINT,
TGEOMETRY, TGEOGRAPHY as alias-equivalent because each is a
LogicalType::BLOB with an alias label. The constant-folder routes
eIntersects(GEOMETRY, TGEOGPOINT) into the {TGEOMPOINT, GEOMETRY}
overload of TgeoGeoIntExec from src/geo/tgeometry_ops.cpp (earlier
registered), which assumes args.data[0] is the Temporal and feeds
the GEOMETRY blob to tspatial_srid — hitting ensure_tspatial_type
inside MEOS with 'must be a spatiotemporal value' before any of the
correctly-direction-aware *_geo_tgeo executors get a chance to run.

Add BlobLooksLikeTemporal to geo_util.hpp: a pure-predicate probe
that reads byte 4 of the blob (Temporal's temptype field) and runs
it through tspatial_type — true only for T_TGEOMPOINT / T_TGEOGPOINT
/ T_TGEOMETRY / T_TGEOGRAPHY / T_TRGEOMETRY. DuckDB GEOMETRY blobs
have byte 4 in their WKB header which never matches one of those
MeosType enum values.

Use the probe at the top of TgeoGeoIntExec in all three *_ops.cpp
TUs to detect mis-ordered args and silently swap roles before
decoding. Also extend each executor with the geodetic-aware
geom_to_geog conversion (matching the existing pattern in
TgeompointFunctions::Eintersects_*) so the swapped-direction call
still respects MEOS_FLAGS_GET_GEODETIC.

Also fixes a unique_ptr<BinsBindData> -> unique_ptr<FunctionData>
build break in span_table_functions.cpp's BinsBindData::Copy() that
otherwise blocks the substrate fix landing alongside the test.

Locally green: test/sql/* full suite, 60/60 test cases, 1346/1346
assertions (was 1335/1336 with the failing line-56 eIntersects).
gdb-confirmed: BlobLooksLikeTemporal probe returns a_is_temp=0
b_is_temp=1 for the (GEOMETRY, TGEOGPOINT) case, swap routes
TGEOGPOINT to the Temporal path.
The stage_icu helper mapped only the Linux uname values, so on the
macOS arm64 test runner uname -m returned "arm64" and the icu
extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of
.../osx_arm64, where DuckDB's autoload looks. The hub fallback is not
reliably resolvable on that runner, so the osx_arm64 Test step failed
to load the extension. Map the OS and architecture to the DuckDB
platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so
the locally built icu is staged at the path autoload expects on every
tested platform; the Linux mapping is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant