[fix](s3) Add S3 rate limit observability#64038
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found coverage gaps in the new S3 throttling observability. The BE object-client path uses the new helper/counters, but the Cloud recycler has parallel S3/Azure object-client paths that still use the old limiter metrics and do not record final 429 failures. This means the new bvars can under-report or completely miss recycler throttling, even though the recycler uses the same retry strategy and is an important object-storage consumer.
Critical checkpoint conclusions:
- Goal/test: The PR adds S3 local limiter and 429 observability, with unit coverage for the holder metrics, but the goal is only partially met because Cloud recycler paths are not covered.
- Scope/focus: The change is generally focused, but parallel object-storage paths were missed.
- Concurrency/lifecycle: New atomic log counters and bvar adders are safe for concurrent updates; no new lock-order risk found.
- Config: The new mutable log interval has validation and is read dynamically where the BE helper is used.
- Compatibility: Bvar renames are user-visible; Cloud still exposes old names, so behavior is inconsistent across binaries.
- Parallel code paths: Cloud recycler S3/Azure clients are parallel paths and need the same accounting.
- Tests: Existing tests cover only the generic holder metric callback, not BE vs Cloud end-to-end metric coverage.
- Observability: Needs fixes below so the new observability is complete.
User focus: No additional user-provided review focus was specified.
| return [&](int64_t ns) { | ||
| inline auto metric_func_factory(bvar::Adder<int64_t>& sleep_ns_bvar, | ||
| bvar::Adder<int64_t>& sleep_count_bvar, | ||
| bvar::Adder<int64_t>* rejected_count_bvar = nullptr) { |
There was a problem hiding this comment.
Making the rejected counter optional leaves an existing S3 limiter path without the new metric. cloud/src/recycler/s3_accessor.cpp still constructs its AccessorRateLimiter with the old two-argument call (get_rate_limit_ns, get_rate_limit_exceed_req_num, etc.), and cloud/src/recycler/s3_obj_client.cpp / azure_obj_client.cpp call AccessorRateLimiter::instance().rate_limiter(op)->add(1) directly. As a result, recycler local limiter rejections still do not increment s3_get_rate_limit_rejected_count / s3_put_rate_limit_rejected_count, and the newly added throttled/rejected logging in apply_s3_rate_limit() is bypassed there. This contradicts the PR's stated metric rename/new rejection metric behavior for S3 local limiting. Please update the Cloud recycler accessor path to use the renamed bvars and pass rejection counters, or route it through a shared helper with the same logging/accounting.
| bvar::Adder<int64_t> s3_request_failed_too_many_requests_count( | ||
| "s3_request_failed_too_many_requests_count"); | ||
|
|
||
| void record_object_request_failed(int http_code) { |
There was a problem hiding this comment.
The new final-429 failure counter is common, but only the BE object clients call record_object_request_failed(). The Cloud recycler also installs S3CustomRetryStrategy / AzureRetryRecordPolicy in cloud/src/recycler/s3_accessor.cpp, so retry 429s from recycler requests are counted, but final 429 failures in cloud/src/recycler/s3_obj_client.cpp and azure_obj_client.cpp just log/return without calling this function. That makes s3_request_failed_too_many_requests_count under-report exactly the recycler failures that still exhausted retries. Please add the same failure recording to the Cloud S3/Azure non-success paths (excluding expected not-found handling) so the retry and failed 429 metrics cover the same object-client paths.
TPC-H: Total hot run time: 29261 ms |
TPC-DS: Total hot run time: 168900 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
Consider supporting recyclers as well? A general framework could be written in doris/common |
e989d8a to
ac67923
Compare
|
/review |
|
run buildall |
|
Addressed the review comments in ac67923.
Validation:
|
There was a problem hiding this comment.
I found one blocking concurrency issue in the new S3 rate-limiter logging path. The existing recycler metric coverage comments are already-known context and appear addressed in this revision, so I did not duplicate them.
Critical checkpoint conclusions:
- Goal/test: the PR adds S3/Azure retry/final-429 metrics and local limiter sleep/rejection metrics/logging, with unit coverage for metric accounting. The goal is mostly met, but the logging implementation introduces an unsafe concurrent access.
- Scope/focus: the change is focused on object-storage retry/rate-limit accounting.
- Concurrency: affected.
reset_s3_rate_limiter()can replaceTokenBucketRateLimiterHolder::rate_limiterunder the write lock while request threads callapply_s3_rate_limit(). The new log code reads holder state afteradd()releases the read lock, creating a race/use-after-free risk. - Lifecycle/static initialization: no additional cross-TU static initialization issue found beyond existing global bvars.
- Configuration:
s3_rate_limiter_log_intervalis added with validation and is read dynamically at call sites. - Compatibility: no storage/protocol compatibility issue found.
- Parallel paths: BE S3/Azure and Cloud recycler S3/Azure paths were reviewed; prior missing Cloud paths are covered in this revision.
- Conditional checks/error handling: expected not-found handling is preserved for the reviewed object-client paths.
- Test coverage: metric unit coverage exists, but no concurrent reset/logging coverage protects the race below.
- Observability/performance: new metrics/logging are useful, but the log state reads must be made concurrency-safe.
- Transactions/persistence/data writes/FE-BE variables: not applicable to this PR.
User focus response: no additional user-provided focus was specified.
| << ", sleep_ms=" << sleep_duration / 1000000 << ", sleep_count=" << count | ||
| << ", token_per_second=" << rate_limiter->get_max_speed() | ||
| << ", bucket_tokens=" << rate_limiter->get_max_burst() | ||
| << ", token_limit=" << rate_limiter->get_limit(); |
There was a problem hiding this comment.
This log path now reads rate_limiter->get_max_speed() / get_max_burst() / get_limit() after TokenBucketRateLimiterHolder::add() has returned and released its shared_lock. Both BE and Cloud expose reset_s3_rate_limiter() paths that take the holder write lock and replace the underlying unique_ptr (rate_limiter = std::make_unique<...>). A request thread that reaches this log branch can therefore race with a dynamic reset and dereference the replaced/freed limiter through these unlocked getters. Please keep the logged configuration snapshot under the same holder lock as add() (or return/snapshot the values from add() while locked) before logging.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29817 ms |
TPC-DS: Total hot run time: 169463 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29718 ms |
TPC-DS: Total hot run time: 169618 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
PR approved by at least one committer and no changes requested. |
What problem does this PR solve?
Add observability for S3 local rate limiting and S3 429 responses so throttling-related performance issues can be diagnosed and alerted more directly.
What is changed?
New or renamed bvar metrics
Renamed local S3 GET limiter metrics:
get_rate_limit_ns->s3_get_rate_limit_sleep_nsget_rate_limit_exceed_req_num->s3_get_rate_limit_sleep_countRenamed local S3 PUT limiter metrics:
put_rate_limit_ns->s3_put_rate_limit_sleep_nsput_rate_limit_exceed_req_num->s3_put_rate_limit_sleep_countNew local S3 limiter rejection metrics:
s3_get_rate_limit_rejected_counts3_put_rate_limit_rejected_countNew S3 429 metrics:
s3_request_retry_too_many_requests_counts3_request_failed_too_many_requests_countMetric examples:
Example local limiter logs:
Tests
sh run-cloud-ut.sh --run --filter=s3_rate_limiter_test:*sh run-be-ut.sh --run --filter=S3FileWriterTest.*Note: an earlier BE run used an incorrect lowercase filter (
s3_file_writer_test:*), which matched no suite and started running the full BE suite; it was stopped and replaced by the correctS3FileWriterTest.*run above.