
Cleaner concurrency model #31

@kylebarron

Description


Related to #26, #29, #30

If I understand correctly:

  • open is fully synchronous because it's entirely built on DuckDB, which only exposes a synchronous API. Ideally this part would also be async, since loading the STAC GeoParquet file is network I/O, but this matters less because open is relatively fast.
  • Chunk reads currently expose only a synchronous API because only the synchronous Xarray API is implemented (see Implement Xarray async integration #29). Under the hood, though, a large part of this process is already async via async-geotiff.

My opinion is that as a general principle, it's good to compose async and sync functionality as separate API calls so that the downstream user can decide when and how to manage concurrency between IO-bound calls and CPU-bound calls. This is why async-tiff has a separation between the tile fetch (async network bound) and the decompression (CPU bound).
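A minimal sketch of that separation, with hypothetical `fetch_tile`/`decode_tile` names standing in for async-tiff's actual fetch and decompression steps (this is not async-tiff's API, just the shape of the split):

```python
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor

async def fetch_tile(url: str) -> bytes:
    # Network-bound half: stands in for the async tile fetch.
    await asyncio.sleep(0)  # simulate awaiting a range request
    return zlib.compress(b"pixels:" + url.encode())

def decode_tile(raw: bytes) -> bytes:
    # CPU-bound half: stands in for tile decompression.
    return zlib.decompress(raw)

async def read_tile(url: str, pool: ThreadPoolExecutor) -> bytes:
    # The caller composes the two halves and decides where CPU work runs.
    raw = await fetch_tile(url)  # I/O awaited on the event loop
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, decode_tile, raw)  # CPU off the loop

async def main() -> list:
    with ThreadPoolExecutor() as pool:
        return await asyncio.gather(read_tile("a", pool), read_tile("b", pool))

tiles = asyncio.run(main())
```

Because the two halves are separate calls, a downstream user can overlap many fetches while bounding CPU concurrency with their own executor.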

Claude suggests there is a multitude of sync/async boundary crossings:

```
MultiBandStacBackendArray.__getitem__       SYNC   ← xarray BackendArray protocol
└── _raw_getitem                            SYNC
    ├── (resolve indexers, build _ChunkReadPlan)            sync, pure CPU
    └── _run_coroutine(_run_mosaic_all_dates(...))          sync→async bridge
        └── _run_mosaic_all_dates             ASYNC
            └── asyncio.gather(_run_one_date per time step)
                └── _run_one_date             ASYNC
                    ├── duckdb_client.search                SYNC, blocking — guarded by threading.Lock
                    └── async_mosaic_chunk    ASYNC
                        ├── per-item: async-geotiff open    ASYNC (network I/O)
                        ├── per-item: async-geotiff read    ASYNC (network I/O)
                        ├── per-item: reproject+mosaic      SYNC, dispatched via run_in_executor
                        └── pixel-selection mosaic logic    SYNC, in-line
```
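For context on the sync→async bridge at the top of that tree, here is a minimal sketch of the common pattern (a background event loop on a dedicated thread); the names `run_coroutine` and `mosaic_all_dates` are stand-ins, not the actual `_run_coroutine`/`_run_mosaic_all_dates` implementations:

```python
import asyncio
import threading

# Background event loop on a daemon thread, so sync callers (e.g. xarray's
# BackendArray protocol) can block on async work without owning a loop.
_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def run_coroutine(coro):
    # Submit to the background loop and block the calling thread on the result.
    return asyncio.run_coroutine_threadsafe(coro, _loop).result()

async def mosaic_all_dates(dates):
    # Stand-in for the async fan-out: one task per time step.
    return await asyncio.gather(*(asyncio.sleep(0, result=d) for d in dates))

# Sync entry point, as in __getitem__:
result = run_coroutine(mosaic_all_dates(["2024-01-01", "2024-02-01"]))
```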

Claude suggests:

Infra — DuckDB-inside-async: even if you exposed async def read_chunk(...), it would still block the event loop during DuckDB calls. A truly async API should at minimum await loop.run_in_executor(None, duckdb_client.search, ...) so the loop yields. (Trivial change, big honesty improvement.)
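A sketch of that fix, with `blocking_search` as a stand-in for the real `duckdb_client.search` (the actual client and signature differ):

```python
import asyncio
import time

def blocking_search(query: str) -> list:
    # Stand-in for duckdb_client.search: synchronous and blocking.
    time.sleep(0.01)
    return [f"item for {query}"]

async def search_async(query: str) -> list:
    # The suggested fix: push the blocking call onto the default executor so
    # the event loop keeps scheduling other coroutines in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_search, query)

async def main():
    # Other coroutines (COG opens/reads) progress while the search runs.
    items, _ = await asyncio.gather(search_async("chunk-0"), asyncio.sleep(0.01))
    return items

items = asyncio.run(main())
```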

Is there prior work on DuckDB threading? Can we put each DuckDB chunk query into a separate thread? Claude says:

DuckDB threading

DuckDB's official thread-safety model:

  • Connections are thread-safe with internal serialization — operations on one connection block each other (single mutex), but the connection won't crash under concurrent access. DuckDB has documented this since 0.5.x.
  • Multiple connections to the same DB file run in parallel — that's the official pattern for parallel Python workloads. (duckdb.connect(...) once per thread, or connection.cursor() for cheap parallel transaction contexts on the same DB.)
  • A single query uses multiple threads internally via DuckDB's parallel executor (SET threads = N).
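The per-thread-connection pattern looks like the following sketch. Stdlib sqlite3 stands in for DuckDB here so the example is self-contained; with DuckDB it would be `duckdb.connect(path)` once per thread, or `connection.cursor()` for a cheap per-thread context on a shared connection:

```python
import sqlite3
import threading

# One connection per worker thread, opened lazily and reused. (sqlite3 is a
# stand-in; each :memory: connection here is an independent database, which
# is fine for illustrating the threading pattern.)
local = threading.local()
results, lock = [], threading.Lock()

def get_conn() -> sqlite3.Connection:
    if not hasattr(local, "conn"):
        local.conn = sqlite3.connect(":memory:")
    return local.conn

def worker(n: int) -> None:
    row = get_conn().execute("SELECT ? + 1", (n,)).fetchone()
    with lock:
        results.append(row[0])

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```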

So the answer to "can we put each duckdb chunk query into a separate thread" is yes, and there are two flavors:

  1. Cheap fix: wrap the existing single client's .search() call in await loop.run_in_executor(None, lambda: client.search(...)). This doesn't get parallelism — the existing threading.Lock (or DuckDB's internal mutex) still serializes the queries — but it does release the event loop so other coroutines (COG opens, reads, reprojections for already-resolved time steps) can progress. Honesty improvement, no parallelism gain.

  2. Real parallelism: a thread-local or per-time-step DuckdbClient pool. Each thread calls DuckdbClient() once and reuses it; queries from different worker threads run truly in parallel (DuckDB's process-level engine handles concurrent connections fine). This requires checking what rustac.DuckdbClient actually does internally — dir() shows only sync methods (search, search_to_arrow, execute, query_to_table, get_collections), no cursor() or per-connection clone. Worth a quick question to the rustac maintainers: does each DuckdbClient instance hold its own connection, and is it safe to construct N of them for the same parquet file? If yes, the path is clear.
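Option 2 could be sketched as below. `FakeDuckdbClient` is hypothetical and stands in for `rustac.DuckdbClient`; the whole sketch assumes the open question above is answered "yes" (one client per thread is safe):

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

class FakeDuckdbClient:
    # Stand-in for rustac.DuckdbClient; each instance pretends to own its
    # own connection, so instances on different threads query in parallel.
    def search(self, query: str) -> str:
        return f"{query} via thread {threading.get_ident()}"

_local = threading.local()

def _search(query: str) -> str:
    if not hasattr(_local, "client"):
        _local.client = FakeDuckdbClient()  # one client per worker thread
    return _local.client.search(query)

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # One per-time-step query per worker thread, all awaited together.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, _search, f"date-{i}") for i in range(4))
        )

results = asyncio.run(main())
```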

A useful data point for the issue: it's worth measuring how much wall-time the per-chunk DuckDB queries actually account for. There's already a logger.debug trace for this in _backend.py:319-325. If DuckDB is <5% of chunk time on typical workloads, fix #1 alone is the right call (cheap, honest, no infra). If it's 20%+, fix #2 starts paying for itself.

One thing to verify before posting: _search_items calls client.search(...) directly inside the coroutine (_backend.py:357-368), under a threading.Lock. The lock is held across the sync call, so even loop.run_in_executor wouldn't gain parallelism without removing it. That doesn't change the conclusion (loop is blocked → fix #1 is right) but it makes the issue tighter to cite the line numbers.

Ideally, I feel there should be an API split between the sync parts and the async parts, but the problem is that shoehorning this into xarray's API doesn't allow for that split: the xarray API is either sync or async.

Perhaps this suggests that lazycogs should have a high-level xarray API plus a separate low-level API with more control? I'm not sure.
