-
Notifications
You must be signed in to change notification settings - Fork 791
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return data fetched from a subset of store-gateways instead of returning error if a single store-gateway fails #4532
Return data fetched from a subset of store-gateways instead of returning error if a single store-gateway fails #4532
Conversation
88550a2
to
e45c8e3
Compare
492feae
to
412eb6a
Compare
b09eb12
to
a787055
Compare
Hi, |
@@ -654,12 +659,14 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores( | |||
if h := resp.GetHints(); h != nil { | |||
hints := hintspb.SeriesResponseHints{} | |||
if err := types.UnmarshalAny(h, &hints); err != nil { | |||
return errors.Wrapf(err, "failed to unmarshal series hints from %s", c.RemoteAddress()) | |||
level.Warn(spanLog).Log("err", errors.Wrapf(err, "failed to unmarshal series hints from %s", c.RemoteAddress())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -590,6 +590,9 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores( | |||
// TODO(goutham): we should ideally be passing the hints down to the storage layer | |||
// and let the TSDB return us data with no chunks as in prometheus#8050. | |||
// But this is an acceptable workaround for now. | |||
|
|||
// Only fail the function if we have validation error. We should return blocks that were successfully |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also fail the function when store-gateways return a limit error. It'll be good to avoid unnecessary retries in that case.
3aea3af
to
8ebebae
Compare
Nice! LGTM |
@@ -595,6 +602,10 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores( | |||
|
|||
stream, err := c.Series(gCtx, req) | |||
if err != nil { | |||
if isRetryableError(err) { | |||
level.Warn(spanLog).Log("err", errors.Wrapf(err, "failed to fetch series from %s", c.RemoteAddress())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it clear that this is a retriable error in the log message? Something like "failed to fetch series from %s due to retriable error".
…ing error if a single store-gateway fails Signed-off-by: Roy Chiang <roychi@amazon.com>
Signed-off-by: Roy Chiang <roychi@amazon.com>
Signed-off-by: Roy Chiang <roychi@amazon.com>
Signed-off-by: Roy Chiang <roychi@amazon.com>
Signed-off-by: Roy Chiang <roychi@amazon.com>
Signed-off-by: Roy Chiang <roychi@amazon.com>
546b41c
to
d1f10b2
Compare
What this PR does:
Previously, if a single store-gateway fails to return data, the whole function fails. This causes queries to fail without reattempting other available store-gateways.
Now, the function responsible for fetching data returns whatever was retrieved, and relies on the caller to determine whether all the necessary blocks were gathered.
Which issue(s) this PR fixes:
Fixes #4529
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]