
fix error for initializing Geos #51

Closed
ngochoawindy wants to merge 1 commit into main from geos-init-handle

Conversation

@ngochoawindy
Collaborator

This PR adds mutex serialization for MobilityDuck scalar function bodies that ultimately call MEOS / liblwgeom. MEOS internally uses GEOS's legacy global API (initGEOS / finishGEOS) from worker threads; concurrent legacy GEOS init/teardown under DuckDB's parallelism triggers intermittent GEOS errors and allocator corruption when the spatial and MobilityDuck extensions coexist.
Wrapping each MEOS-backed ScalarFunction execution in the mutex prevents overlapping legacy MEOS/GEOS init/teardown pairs across concurrent pipelines while preserving parallelism elsewhere.

@ngochoawindy
Collaborator Author

@estebanzimanyi could you help me review this PR and fix the error?
In this PR I tried to fix the error that occurs when running the query:
select tgeompointSeq(list(tgeompoint(ST_Transform(geom_4326, 'EPSG:4326', 'EPSG:25832', always_xy := true), t) ORDER BY t))
Sometimes this query throws an 'Uninitialize Geos' error, sometimes a malloc error, and sometimes it works.

@estebanzimanyi
Member

Thanks Ngoc for the detailed report. The mutex approach in this PR should in principle serialize all MobilityDuck–MEOS calls against each other, which prevents concurrent GEOS legacy-API (initGEOS/finishGEOS) calls originating from within MobilityDuck itself.

However, your specific query mixes two extensions in the same pipeline:

tgeompointSeq(list(tgeompoint(ST_Transform(geom_4326, 'EPSG:4326', 'EPSG:25832', always_xy := true), t) ORDER BY t))

ST_Transform is from DuckDB's spatial extension — it does not go through our mutex. Even though coordinate transformation uses PROJ (not GEOS) for the actual math, the spatial extension does initialize its own GEOS handle at load time. In some GEOS versions the legacy global-API (initGEOS) and the handle-based API (initGEOS_r) share internal state, which means a race can still occur between the spatial extension's GEOS instance and MEOS's.

To narrow down the root cause, could you try:

  1. Remove ST_Transform — test with a geometry already in the target SRID to see if the error disappears without the spatial extension in the same pipeline.
  2. Disable DuckDB parallelism — add SET threads=1; before the query. If it stops failing, it is definitely a concurrency issue.
  3. Share the full error stack trace — the exact message ("Uninitialize Geos" vs. "malloc error") and any accompanying backtrace would help pinpoint which GEOS/MEOS code path is failing.

If the error vanishes when ST_Transform is removed (or when threads=1), the remaining fix is to ensure the spatial extension and MobilityDuck share one GEOS context — which requires either upgrading to GEOS's thread-safe handle API within MEOS, or using a process-wide GEOS init guard shared across both extensions. The correct long-term solution is the per-thread MEOS state work tracked in issue #404.

@ngochoawindy
Collaborator Author

  1. Removing ST_Transform: there is still an error, now a segmentation fault.
  2. Setting threads = 1 works both with and without ST_Transform.
  3. Without threads = 1, the error is now a segmentation fault:
SELECT
        mmsi,
        tgeompointSeq(list(tgeompoint(ST_Transform(geom_4326, 'EPSG:4326', 'EPSG:25832', always_xy := true), t) ORDER BY t)) AS trip
      FROM AISInputTarget
      GROUP BY mmsi;
zsh: segmentation fault  ./build/release/duckdb 
SELECT
        mmsi,
        tgeompointSeq(list(tgeompoint(geom_4326, t) ORDER BY t)) AS trip
      FROM AISInputTarget
      GROUP BY mmsi;
zsh: segmentation fault  ./build/release/duckdb 

@estebanzimanyi
Member

Thank you for investigating this! The GEOS initialization segfault turned out to have a deeper root cause that the mutex approach doesn't fully address.

Root cause

MEOS PR #815 made all mutable MEOS state thread-local (MEOS_TLS): the PROJ context (PJ_CONTEXT), GSL RNGs (MEOS_GENERATION_RNG, MEOS_AGGREGATION_RNG), the error code (MEOS_ERR_NO), and the PROJ/Ways caches. The error handler (MEOS_ERROR_HANDLER) is the only piece that is process-global (written with __ATOMIC_RELEASE).

The initialization in mobilityduck_extension.cpp used std::call_once, which fires exactly once — on whichever thread first touches the extension. DuckDB's other worker threads never call meos_initialize(). With MEOS's TLS design, those threads hit NULL TLS pointers whenever they call any MEOS function → segfault.

Why the mutex doesn't fix it

The global std::mutex in this PR serializes every MEOS call through one thread, which has two problems:

  1. It destroys DuckDB's parallel execution (the whole point of a columnar engine).
  2. It still doesn't call meos_initialize() on the worker threads — the mutex only ensures exclusion, not initialization.

The correct fix

Each DuckDB worker thread must call meos_initialize() before touching any MEOS function. Branch fix/per-thread-meos-init replaces the mutex with a thread_local bool guard:

inline void EnsureMeosInitializedOnThread() {
    // One flag per thread: each DuckDB worker initializes MEOS on first use.
    thread_local bool initialized = false;
    if (!initialized) {
        meos_initialize();
        meos_initialize_error_handler(&MobilityduckMeosErrorHandler);
        initialized = true;
    }
}

RegisterSerializedScalarFunction wraps every scalar function with a call to EnsureMeosInitializedOnThread() before dispatching — cost is one bool check per call, negligible.

Verified: tgeompointSeq(list(TGEOMPOINT(geom, ts) ORDER BY ts)) with SET threads = 4, 2000 rows / 50 vessels — previously segfaulted, now returns correct results with full parallelism.

Superseding this PR with fix/per-thread-meos-init. Thank you again for surfacing the issue!

@estebanzimanyi
Member

Closing in favour of branch fix/per-thread-meos-init; see the comment above for the full analysis.

estebanzimanyi added a commit that referenced this pull request May 10, 2026
Replaces the process-global `std::mutex` (from the now-closed PR #51)
that wrapped every MEOS scalar-function call with a per-thread
`thread_local bool` guard in `EnsureMeosInitializedOnThread()`.

Each DuckDB worker thread calls `meos_initialize()` and
`meos_initialize_error_handler()` exactly once on first use; cost is
one `bool` check per scalar-function call thereafter — negligible.
Full DuckDB parallelism is preserved.

Bundles the related refinements (3 commits squashed into 1):

1. **Per-thread MEOS init** (was c237f6c)
   The mutex → thread_local swap.

2. **Cover cast functions** (was 55d4c5a)
   Cast functions bypass `RegisterSerializedScalarFunction` and would
   otherwise leave MEOS uninitialised when a cast is the first MEOS
   call (e.g. `SELECT stbox 'STBOX XT(...)'` before any scalar
   function).  Initialise on the loading thread to cover them.

3. **TZ-neutral test assertions** (was 9dd765a)
   Make `stbox.test` timestamp assertions timezone-neutral so the test
   passes under both local and CI session timezones.

## Root cause

MEOS [PR #815](MobilityDB/MobilityDB#815) made
all mutable MEOS state thread-local (`MEOS_TLS`).  Threads that hadn't
called `meos_initialize()` had NULL TLS pointers and segfaulted.  The
mutex was the conservative workaround; per-thread init is the correct
fix.

Closes #404.