-
-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fail hard on ES shard failure #23724
Conversation
this seems to happen only very rarely anyways, but returning incomplete results seems worse in most cases than just getting an error
corehq/elastic.py
Outdated
@@ -536,3 +540,6 @@ def report_shard_failures(search_result): | |||
|
|||
if search_result.get('_shards', {}).get('failed'): | |||
datadog_counter('commcare.es.partial_results', value=1) | |||
# Example message: | |||
# "_shards: {'successful': 2, 'failed': 0, 'total': 2}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: except the failed
value should never be zero if we get here
corehq/elastic.py
Outdated
@@ -536,3 +540,6 @@ def report_shard_failures(search_result): | |||
|
|||
if search_result.get('_shards', {}).get('failed'): | |||
datadog_counter('commcare.es.partial_results', value=1) | |||
# Example message: | |||
# "_shards: {'successful': 2, 'failed': 0, 'total': 2}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function (and docstring) is no longer well named as it now raises an exception.
Would it be useful to also include the query in the error message so we can figure out what happened? Or is that included in the stacktrace already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ES requests show up in the sentry breadcrumbs. I've confirmed that's the case for some ESError 500 sentry events, so I'm fairly sure about this
this seems to happen only very rarely anyways, but returning incomplete results seems
worse in most cases than just getting an error