Skip to content

Conversation

justinjung04
Copy link
Contributor

@justinjung04 justinjung04 commented Aug 26, 2025

What this PR does:

isRetryableError from blocks_store_queryable checks if the error chain includes limiter.ResourceLimitReachedError struct:

func isRetryableError(err error) bool {
	// retry upon resource exhaustion error from resource monitor
	var resourceExhaustedErr *limiter.ResourceLimitReachedError
	if errors.As(err, &resourceExhaustedErr) {
		return true
	}

https://github.com/cortexproject/cortex/blob/master/pkg/querier/blocks_store_queryable.go#L1206-L1211

However, resource based limiter was not returning that error struct, hence this check was returning false.

Since the goal of resource based throttling is to protect subset of pods with high resource utilization, we do want to retry them on other pods.

Which issue(s) this PR fixes:
n/a

Testing:

Before the fix:

=== RUN   TestBlocksStoreQuerier_ShouldRetryResourceBasedThrottlingError
    blocks_store_queryable_test.go:2516: 
        	Error Trace:	/Users/jungjust/workplace/cortex/pkg/querier/blocks_store_queryable_test.go:2516
        	Error:      	Should be true
        	Test:       	TestBlocksStoreQuerier_ShouldRetryResourceBasedThrottlingError
--- FAIL: TestBlocksStoreQuerier_ShouldRetryResourceBasedThrottlingError (0.00s)

After the fix:

=== RUN   TestBlocksStoreQuerier_ShouldRetryResourceBasedThrottlingError
--- PASS: TestBlocksStoreQuerier_ShouldRetryResourceBasedThrottlingError (0.00s)

Checklist

  • Tests updated
  • [n/a] Documentation added
  • [n/a] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

I didn't add changelog as this is a minor change in an experimental feature.

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Justin Jung <jungjust@amazon.com>
@justinjung04 justinjung04 marked this pull request as ready for review August 26, 2025 17:25
Copy link
Contributor

@rajagopalanand rajagopalanand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@erlan-z erlan-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danielblando danielblando merged commit 72d9b92 into cortexproject:master Aug 26, 2025
18 checks passed
eeldaly pushed a commit to eeldaly/cortex that referenced this pull request Sep 11, 2025
…ct#6992)

* Fix error from resource based throttling to be retryable

Signed-off-by: Justin Jung <jungjust@amazon.com>

* lint

Signed-off-by: Justin Jung <jungjust@amazon.com>

---------

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Essam Eldaly <eeldaly@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants