Search before asking
Version
3.0.6
What's Wrong
When executing a large partition scan query on internal tables stored on S3/OSS (cloud storage-compute separation mode), if the query times out or is cancelled, the S3 read operations are not terminated. This causes scanner threads to remain blocked on S3 I/O, eventually exhausting the scanner thread pool and blocking all subsequent queries and Stream Load jobs.
Root cause: OlapScanner (internal table scanner) does not propagate query cancellation to the IOContext used by the remote I/O layer. Unlike FileScanner (external table scanner), which overrides try_stop() to set io_ctx->should_stop = true, the internal table data scan path has no cancellation check at the remote I/O boundary.
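To make the missing propagation concrete, here is a minimal sketch of what mirroring the FileScanner behavior could look like. The types below (`IOContext`, `ScannerSketch`, `OlapScannerSketch`) are simplified stand-ins written for this report, not the actual Doris classes, and making `should_stop` atomic is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for the IOContext handed to the remote I/O layer.
// Only the should_stop flag named in this report is modeled (assumption:
// atomic, so the cancelling thread and the reading thread can share it).
struct IOContext {
    std::atomic<bool> should_stop{false};
};

// Hypothetical scanner base class exposing a virtual stop hook.
class ScannerSketch {
public:
    virtual ~ScannerSketch() = default;
    virtual void try_stop() { _stopped = true; }
    bool stopped() const { return _stopped.load(); }

protected:
    std::atomic<bool> _stopped{false};
};

// Hypothetical internal-table scanner that forwards cancellation to the
// IOContext, the same way the report describes FileScanner doing it.
class OlapScannerSketch : public ScannerSketch {
public:
    explicit OlapScannerSketch(IOContext* io_ctx) : _io_ctx(io_ctx) {
        assert(io_ctx != nullptr);
    }

    void try_stop() override {
        ScannerSketch::try_stop();
        // Make the cancellation visible to readers blocked in remote I/O.
        _io_ctx->should_stop = true;
    }

private:
    IOContext* _io_ctx;  // not owned
};
```

Setting the flag is only half of the fix: the readers still have to check it at each blocking boundary.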
Two blocking boundaries lack cancellation checks:
- S3FileReader::read_at_impl() completely ignores the io_ctx parameter (parameter name is commented out: const IOContext* /*io_ctx*/). The S3 429 retry loop sleeps with exponential backoff (up to ~7s total) without checking cancellation.
- CachedRemoteFileReader::read_at_impl() does not check cancellation in the FileBlock wait loop (each wait round blocks ~1s by default, up to 10 rounds = ~10s).
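For the first boundary, a cancellation-aware backoff loop could look roughly like the sketch below. It is an illustration under stated assumptions, not the S3FileReader code: `read_with_backoff`, `sleep_unless_cancelled`, `ReadResult`, and the `attempt` callback (returns true on success, false when throttled) are hypothetical names, and `IOContext` is a simplified stand-in with an atomic flag.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Simplified stand-in for the real IOContext (assumption: atomic flag).
struct IOContext {
    std::atomic<bool> should_stop{false};
};

enum class ReadResult { kOk, kThrottled, kCancelled };

// Sleeps for `total`, but in short slices so that a cancellation is noticed
// within roughly one slice. Returns false if cancelled before the deadline.
inline bool sleep_unless_cancelled(
        const IOContext* io_ctx, std::chrono::milliseconds total,
        std::chrono::milliseconds slice = std::chrono::milliseconds(10)) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + total;
    while (true) {
        if (io_ctx != nullptr && io_ctx->should_stop.load()) return false;
        const auto now = Clock::now();
        if (now >= deadline) return true;
        std::this_thread::sleep_for(
                std::min<Clock::duration>(slice, deadline - now));
    }
}

// Retries `attempt` with exponential backoff, checking cancellation before
// every attempt and during every backoff sleep.
inline ReadResult read_with_backoff(const IOContext* io_ctx,
                                    const std::function<bool()>& attempt,
                                    int max_retries,
                                    std::chrono::milliseconds base_delay) {
    assert(max_retries >= 0);
    auto delay = base_delay;
    for (int i = 0; i <= max_retries; ++i) {
        if (io_ctx != nullptr && io_ctx->should_stop.load()) {
            return ReadResult::kCancelled;
        }
        if (attempt()) return ReadResult::kOk;
        if (i == max_retries) break;  // no sleep after the final attempt
        if (!sleep_unless_cancelled(io_ctx, delay)) {
            return ReadResult::kCancelled;
        }
        delay *= 2;
    }
    return ReadResult::kThrottled;
}
```

With this shape, a scanner thread that is in the middle of a multi-second backoff returns within about one slice of the cancel signal instead of sleeping out the full delay.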
Because the data scan path issues thousands to tens of thousands of S3 GET requests per query (one per data page per column per segment), many scanner threads can be stuck at these blocking boundaries simultaneously after cancellation, leading to thread pool exhaustion.
Cascade failure chain
Large cold scan on S3 → query timeout → cancel signal sent
→ Scanner threads blocked in page-by-page S3 data reads, cannot respond to cancel
→ S3 429 retry loop sleeps up to ~7s per read without checking cancel
→ FileBlock wait loop blocks up to ~10s per read without checking cancel
→ Scanner thread pool exhausted
→ All subsequent queries blocked
→ Stream Load jobs blocked
→ Cluster appears frozen (low CPU, high I/O wait)
How to Reproduce
- Set up a Doris cluster in cloud storage-compute separation mode (S3/OSS backend)
- Create a partitioned table with several years of historical data
- Execute a large scan:
SELECT sum(col) FROM table WHERE dt >= '2021-01-01'
- Wait for the query to time out, or cancel it manually with KILL QUERY
- Immediately try to execute a small query — it will be blocked
- Check Stream Load — jobs will also be blocked
Expected Behavior
After query cancellation or timeout, scanner threads should detect the cancellation at the next wait/retry boundary and be released back to the thread pool promptly.
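As an illustration of "the next wait/retry boundary" for the FileBlock wait loop, the sketch below checks the flag before each wait round, so the worst-case delay after a cancel is one round (~1s by default) rather than all ten. `BlockSketch`, `wait_for_block`, and `WaitResult` are hypothetical names for this sketch, and `IOContext` is a simplified stand-in; none of this is the actual CachedRemoteFileReader code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Simplified stand-in for the real IOContext (assumption: atomic flag).
struct IOContext {
    std::atomic<bool> should_stop{false};
};

enum class WaitResult { kReady, kTimedOut, kCancelled };

// Hypothetical stand-in for a cache block that another thread is downloading.
class BlockSketch {
public:
    void mark_downloaded() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _downloaded = true;
        }
        _cv.notify_all();
    }

    // Blocks for at most `timeout`; returns true once the block is downloaded.
    bool wait_round(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mu);
        return _cv.wait_for(lock, timeout, [this] { return _downloaded; });
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    bool _downloaded = false;
};

// Waits up to `max_rounds` rounds for the block, checking cancellation
// before each round so a cancelled query gives its thread back quickly.
inline WaitResult wait_for_block(BlockSketch& block, const IOContext* io_ctx,
                                 int max_rounds,
                                 std::chrono::milliseconds round_timeout) {
    assert(max_rounds > 0);
    for (int round = 0; round < max_rounds; ++round) {
        if (io_ctx != nullptr && io_ctx->should_stop.load()) {
            return WaitResult::kCancelled;
        }
        if (block.wait_round(round_timeout)) return WaitResult::kReady;
    }
    return WaitResult::kTimedOut;
}
```

A caller would map `kCancelled` to whatever cancelled status the read path already uses and return immediately, without falling back to a direct remote read.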
Anything else?
No response
Are you willing to submit PR?