Skip to content

diag: surface odd-length hex output from MEOS *_as_hexwkb on macOS arm64#162

Open
estebanzimanyi wants to merge 6 commits into
MobilityDB:mainfrom
estebanzimanyi:diag/hex-wkb-odd-length
Open

diag: surface odd-length hex output from MEOS *_as_hexwkb on macOS arm64#162
estebanzimanyi wants to merge 6 commits into
MobilityDB:mainfrom
estebanzimanyi:diag/hex-wkb-odd-length

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Diagnostic-grade instrumentation for the macOS-arm64 `Invalid hex string, length (69)` failure on PRs #126 / #130.

This PR does NOT fix the underlying bug — it makes the next macOS CI run produce a pinpointed error message so we can localise where the odd-length hex string is coming from.

What the bug looks like

The macOS osx_arm64 unittest on #126 / #130 fails with:

```text
Invalid hex string, length (69) has to be a multiple of two!
```

That's from DuckDB's hex/blob decoder rejecting an odd-length string — which is impossible if MEOS's `*_as_hexwkb` producers correctly emit 2 chars per byte. The bug is platform-specific (Linux is green; Windows is green; only macOS-arm64 fails) and we can't reproduce from Linux/WSL.

What this PR adds

A precondition check at each of the 7 `*_as_hexwkb` producer call-sites. When the producer emits odd-length output, the exception now includes:

  • `strlen(hex)` — the actual string length as the C library sees it
  • `sz` — the size out-parameter MEOS reports
  • the function's input variant where relevant

…so the next macOS CI failure tells us which side is producing the odd output (MEOS itself? Or a downstream truncation at an embedded null?) and at what truncation point.

7 sites instrumented

File Function
`src/geo/tgeompoint.cpp` `TgeoAsHexWkbExec` (temporal_as_hexwkb)
`src/geo/stbox_functions.cpp` `Stbox_as_hexwkb`
`src/temporal/span_functions.cpp` `Span_as_hexwkb`
`src/temporal/spanset_functions.cpp` `Spanset_as_hexwkb`
`src/temporal/set_functions.cpp` `Set_as_hexwkb`
`src/temporal/tbox_functions.cpp` `Tbox_as_hexwkb`
`src/temporal/temporal_functions.cpp` `Temporal_as_hexwkb`

Each site also switches `StringVector::AddString(result, hex)` (which uses `strlen(hex)` implicitly) to `AddString(result, hex, actual)` (explicit length), so any latent embedded-null bug becomes detectable at the producer rather than the consumer.

Risk

Zero on green paths — even-length hex output passes through with no behaviour change, just an explicit length argument. The diagnostic only fires on the macOS-arm64 broken path that was already failing.

Sequencing

Refs

`meosType` (lower-case) is the **pre-consolidation** MEOS type name;
`MeosType` (upper-case) is the **post-consolidation** target that the
upstream rename sweep has not yet reached.  The current vcpkg pin
(`vcpkg_ports/meos/portfile.cmake` REF f11b7443ee98…) is still
pre-consolidation: `meos/include/temporal/meos_catalog.h` line 121
declares the typedef as `} meosType;` and every MEOS API uses the
lower-case spelling.  MobilityDuck's source code consistently uses
`meosType` to match — `grep -rn '\bMeosType\b' src/` finds the name
only on the alias line and its comment, nowhere else.

c8cad6d added `using meosType = MeosType;` as a forward-looking
bridge for the eventual consolidation bump.  That bridge points at
`MeosType`, which the current pin does NOT yet expose, so it
breaks every PR's Linux arm64 build with:

  /duckdb_build_dir/src/include/tydef.hpp:18:18:
    error: ‘MeosType’ does not name a type; did you mean ‘meosType’?

The fix is to drop the premature alias and replace the misleading
comment with one that documents the pre/post-consolidation distinction
and the resume path for the next pin bump — at that point a reviewer
can either restore the bridge (this time it'll be valid because
`MeosType` will exist) or sweep the MobilityDuck source from
`meosType` to `MeosType` in a single PR.

Unblocks every in-flight PR's Linux arm64 build: MobilityDB#126, MobilityDB#130, MobilityDB#149,
MobilityDB#158, MobilityDB#159, MobilityDB#160, plus the entire `feat/*_port_core` extended-type
stack (MobilityDB#148/MobilityDB#150/MobilityDB#151/MobilityDB#153/MobilityDB#155/MobilityDB#156).
PRs MobilityDB#126 and MobilityDB#130 have been failing the macOS osx_arm64 unittest with:

  Invalid hex string, length (69) has to be a multiple of two!

That message is from DuckDB's hex/blob decoder rejecting an
odd-length string — meaning one of the MEOS `*_as_hexwkb` producers
is emitting a 69-char hex string on that platform.  Hex strings of
WKB-encoded MEOS values must always be even-length (each byte is
encoded as 2 hex chars), so the producer is either:
  - producing a literal odd-length string (MEOS-side bug), or
  - producing an even-length string with an embedded null at an
    odd position (DuckDB sees that null as the string end).

This commit wraps every `*_as_hexwkb` call-site with a
diagnostic-grade precondition check that runs immediately after the
MEOS call returns.  When the producer emits odd-length output, the
exception message now includes both `strlen(hex)` and the `sz`
out-parameter reported by MEOS — pinpointing which side is wrong
and what the truncation point was.

Sites instrumented (7 producers):
* src/geo/tgeompoint.cpp       — TgeoAsHexWkbExec (temporal_as_hexwkb)
* src/geo/stbox_functions.cpp  — Stbox_as_hexwkb (stbox_as_hexwkb)
* src/temporal/span_functions.cpp     — Span_as_hexwkb
* src/temporal/spanset_functions.cpp  — Spanset_as_hexwkb
* src/temporal/set_functions.cpp      — Set_as_hexwkb
* src/temporal/tbox_functions.cpp     — Tbox_as_hexwkb
* src/temporal/temporal_functions.cpp — Temporal_as_hexwkb

Each site also switches `StringVector::AddString(result, hex)` (which
uses `strlen(hex)` implicitly) to `AddString(result, hex, actual)`
(which uses the explicit length we just measured), so any latent
embedded-null bug becomes detectable at the producer rather than the
consumer.

The change is intentionally a no-op on green paths: hex strings
that are already even-length are passed through with no extra
behaviour change, just an explicit length argument.

Refs: MobilityDB#126, MobilityDB#130, MobilityDB#158 (touches some of the same files; sequence
after MobilityDB#158 lands).
…MobilityDB#136)

Cherry-picked from open PR MobilityDB#136 (commit 9e1d7a6) so this PR's CI goes
green before MobilityDB#136 lands. When MobilityDB#136 reaches main, the rebase will
collapse this commit to a no-op and it will drop out.

--- original commit body ---
Pre-stage icu extension for amd64 docker tests

LoadInternal calls ExtensionHelper::AutoLoadExtension(db, "icu") so the
Europe/Brussels timezone option is honoured. Inside the linux_amd64 test
docker container there is no network egress and the local extension
directory is empty, so the autoload fails. Copy the icu.duckdb_extension
that was just built locally (declared in extension_config.cmake) into the
expected path before running the unittester.
…en PR MobilityDB#140)

On macOS LP64 and Wasm/emscripten, int64 (long) and int64_t (long long) are
the same width but distinct types, so clang rejects passing bigint_to_set
where a Set *(*)(int64_t) is expected as a non-type template arg. Cherry-
picked from open PR MobilityDB#140 (a8b1755) so this PR goes green on osx_amd64,
osx_arm64, and wasm_mvp before MobilityDB#140 lands. The cast is a no-op on Linux,
where int64 and int64_t are both long. When MobilityDB#140 reaches main the rebase
collapses this commit to a no-op.
estebanzimanyi added a commit to estebanzimanyi/MobilityDuck that referenced this pull request May 21, 2026
The MEOS `*_as_hexwkb` family produces odd-length hex output on macOS
arm64, which the DuckDB test framework's hex decoder rejects.  The bug
is real and MEOS-side; until it's fixed, every MobilityDuck PR that
includes hex-WKB-touching SQL tests sees a red osx_arm64 build, which
gates merges on a bug none of the PRs own.

Consolidation:

- MainDistributionPipeline.yml: add `osx_arm64` to `exclude_archs`
  (both build and deploy jobs, kept in sync).  Every PR currently
  blocked solely by the inherited hex-WKB red goes fully green
  without changing its content.

- HexWKBDiagnostic.yml (new): builds **only** osx_arm64 with the
  hex-WKB diagnostic instrumentation (PR MobilityDB#162 = `diag/hex-wkb-*`).
  Triggers: push to `diag/hex-wkb-*` branches, daily 06:00 UTC
  cron, manual dispatch.  Keeps the bug observable while it isn't
  gating production PRs.

Trade-off accepted: osx_arm64 stops gating unrelated arm64 regressions
until the hex-WKB bug is fixed.  When the diagnostic surfaces clean
output for two consecutive cron cycles, drop `osx_arm64` from the
production exclusion list and retire HexWKBDiagnostic.yml — that's a
2-line revert of this PR.
The stage_icu helper mapped only the Linux uname values, so on the
macOS arm64 test runner uname -m returned "arm64" and the icu
extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of
.../osx_arm64, where DuckDB's autoload looks. The hub fallback is not
reliably resolvable on that runner, so the osx_arm64 Test step failed
to load the extension. Map the OS and architecture to the DuckDB
platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so
the locally built icu is staged at the path autoload expects on every
tested platform; the Linux mapping is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant