[Bug] OlapScanner does not propagate query cancellation to S3 I/O, causing scanner thread pool exhaustion

### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

3.0.6

### What's Wrong

When executing a large partition scan query on internal tables stored on S3/OSS (cloud storage-compute separation mode), if the query times out or is cancelled, the S3 read operations are **not terminated**. This causes scanner threads to remain blocked on S3 I/O, eventually exhausting the scanner thread pool and blocking all subsequent queries and Stream Load jobs.

**Root cause**: `OlapScanner` (internal table scanner) does not propagate query cancellation to the `IOContext` used by the remote I/O layer. `FileScanner` (external table scanner) overrides `try_stop()` to set `io_ctx->should_stop = true`; the internal table data scan path has no equivalent, so there is no cancellation check at the remote I/O boundary.
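A minimal sketch of the propagation pattern described above, for illustration only. `ScannerBase`, `CancellableScanner`, and this cut-down `IOContext` are hypothetical stand-ins, not the actual Doris classes; only `try_stop()` and the `should_stop` flag come from the description. Making the flag a `std::atomic<bool>` is an assumption of the sketch.

```cpp
#include <atomic>

// Hypothetical, simplified I/O context. Only the should_stop flag is taken
// from the issue text; the atomic type is an assumption of this sketch.
struct IOContext {
    std::atomic<bool> should_stop{false};
};

// Hypothetical base scanner: try_stop() only records the stop request locally.
class ScannerBase {
public:
    virtual ~ScannerBase() = default;
    virtual void try_stop() { _should_stop = true; }
    bool stopped() const { return _should_stop.load(); }

protected:
    std::atomic<bool> _should_stop{false};
};

// Scanner that also forwards the stop request to the I/O context, so that
// readers holding the same IOContext pointer can observe the cancellation.
class CancellableScanner : public ScannerBase {
public:
    explicit CancellableScanner(IOContext* io_ctx) : _io_ctx(io_ctx) {}

    void try_stop() override {
        ScannerBase::try_stop();
        if (_io_ctx != nullptr) {
            _io_ctx->should_stop = true;
        }
    }

private:
    IOContext* _io_ctx;  // not owned
};
```

The flag only helps if the readers further down actually check it, which is what the two boundaries listed below are missing.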

Two blocking boundaries lack cancellation checks:
1. `S3FileReader::read_at_impl()` completely ignores the `io_ctx` parameter (parameter name is commented out: `const IOContext* /*io_ctx*/`). The S3 429 retry loop sleeps with exponential backoff (up to ~7s total) without checking cancellation.
2. `CachedRemoteFileReader::read_at_impl()` does not check cancellation in the FileBlock wait loop (each wait round blocks ~1s by default, up to 10 rounds = ~10s).
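For the first boundary, a retry loop can check the stop flag before each attempt and sleep in short slices during backoff. The following is a rough sketch under stated assumptions: `ReadResult`, `sleep_unless_stopped`, `read_with_retry`, the 10 ms slice, and the simplified `IOContext` are all hypothetical names and values, not the `S3FileReader` code.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical, simplified I/O context (the atomic flag is an assumption).
struct IOContext {
    std::atomic<bool> should_stop{false};
};

enum class ReadResult { kOk, kThrottled, kCancelled, kRetriesExhausted };

// Sleeps for `total`, waking every `slice` to poll the stop flag.
// Returns false if cancellation was observed, true if the full sleep elapsed.
inline bool sleep_unless_stopped(const IOContext* io_ctx, std::chrono::milliseconds total,
                                 std::chrono::milliseconds slice = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + total;
    while (true) {
        if (io_ctx != nullptr && io_ctx->should_stop.load()) return false;
        const auto now = std::chrono::steady_clock::now();
        if (now >= deadline) return true;
        const auto remaining =
                std::chrono::duration_cast<std::chrono::milliseconds>(deadline - now);
        std::this_thread::sleep_for(std::min(slice, remaining));
    }
}

// Retries `attempt` with exponential backoff while it reports throttling
// (standing in for an S3 429), checking for cancellation before every
// attempt and during every backoff sleep.
inline ReadResult read_with_retry(const std::function<ReadResult()>& attempt,
                                  const IOContext* io_ctx, int max_retries,
                                  std::chrono::milliseconds base_backoff) {
    auto backoff = base_backoff;
    for (int i = 0; i <= max_retries; ++i) {
        if (io_ctx != nullptr && io_ctx->should_stop.load()) return ReadResult::kCancelled;
        const ReadResult r = attempt();
        if (r != ReadResult::kThrottled) return r;
        if (i == max_retries) break;  // no point sleeping after the last attempt
        if (!sleep_unless_stopped(io_ctx, backoff)) return ReadResult::kCancelled;
        backoff *= 2;
    }
    return ReadResult::kRetriesExhausted;
}
```

With this shape, a cancelled query leaves the loop within roughly one slice instead of waiting out the remaining backoff.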

Because the data scan path issues **thousands to tens-of-thousands of S3 GET requests per query** (one per data page per column per segment), many scanner threads can become simultaneously stuck in these blocking boundaries after cancellation, leading to thread pool exhaustion.

### Cascade failure chain

```
Large cold scan on S3 → query timeout → cancel signal sent
  → Scanner threads blocked in page-by-page S3 data reads, cannot respond to cancel
  → S3 429 retry loop sleeps up to ~7s per read without checking cancel
  → FileBlock wait loop blocks up to ~10s per read without checking cancel
  → Scanner thread pool exhausted
  → All subsequent queries blocked
  → Stream Load jobs blocked
  → Cluster appears frozen (low CPU, high I/O wait)
```

### How to Reproduce

1. Set up a Doris cluster in cloud storage-compute separation mode (S3/OSS backend)
2. Create a partitioned table with several years of historical data
3. Execute a large scan: `SELECT sum(col) FROM table WHERE dt >= '2021-01-01'`
4. Wait for the query to time out, or manually run `KILL QUERY`
5. Immediately try to execute a small query; it will be blocked
6. Check Stream Load — jobs will also be blocked

### Expected Behavior

After a query is cancelled or times out, scanner threads should detect the cancellation at the next wait/retry boundary and return to the thread pool promptly.
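As an illustration of what "detect the cancellation at the next wait boundary" could look like for the FileBlock wait loop, here is a minimal sketch. `BlockState`, `WaitResult`, and the simplified `IOContext` are hypothetical and do not reflect the real `CachedRemoteFileReader` or file cache types; the round count and round length are parameters rather than the actual defaults.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified I/O context (the atomic flag is an assumption).
struct IOContext {
    std::atomic<bool> should_stop{false};
};

enum class WaitResult { kReady, kCancelled, kTimedOut };

// Hypothetical stand-in for a cache block that another thread is downloading.
class BlockState {
public:
    // Called by the downloader once the block's data is available.
    void mark_ready() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _ready = true;
        }
        _cv.notify_all();
    }

    // Waits up to max_rounds * round for the block, checking the stop flag
    // at the start of every round so a cancelled query gives up early.
    WaitResult wait(const IOContext* io_ctx, int max_rounds, std::chrono::milliseconds round) {
        std::unique_lock<std::mutex> lock(_mu);
        for (int i = 0; i < max_rounds; ++i) {
            if (_ready) return WaitResult::kReady;
            if (io_ctx != nullptr && io_ctx->should_stop.load()) return WaitResult::kCancelled;
            _cv.wait_for(lock, round, [this] { return _ready; });
        }
        return _ready ? WaitResult::kReady : WaitResult::kTimedOut;
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    bool _ready = false;
};
```

In this sketch the cancellation latency is bounded by one round rather than by all rounds combined; shorter rounds, or notifying the condition variable on cancel, would tighten it further.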

### Anything else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

[Bug] OlapScanner does not propagate query cancellation to S3 I/O, causing scanner thread pool exhaustion #62230
