New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise an exception if any indexer workers fail their health check #3726
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. I've shared some feedback on the pook usage to give hints for how to make the tests a bit simpler and leverage pook's abilities better (namely, by adding the expected body to the matcher, rather than manually checking it after the fact), but none of them are blockers, especially as the reviews on this PR have been delayed anyway.
for worker, mock_post, (start_id, end_id) in zip( | ||
workers, post_mocks, expected_ranges | ||
): | ||
assert mock_post.calls > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion to use mock_post.matched
(which itself does .calls > 0
), or probably better, mock_post.done
to ensure it was matched the expected number of times (in this case, once). matched
and calls > 0
just checks if it was matched at all, not that it was matched the expected number of times. In this case it is 1, so there's no real difference, but semantically, and when considering the approach to take for other tests that may have multi-use mocks, .done
is probably a more consistent and reliable assertion.
ingestion_server/test/unit_tests/test_distributed_reindex_scheduler.py
Outdated
Show resolved
Hide resolved
assert data == { | ||
"model_name": "sample_model", | ||
"table_name": "sample_table", | ||
"target_index": "sample_index", | ||
"start_id": start_id, | ||
"end_id": end_id, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than pulling the request data out of the pook mock, it's better to tell pook that's what to match in the first place:
pook.post(...).json({
"model_name": "sample_model",
"table_name": "sample_table",
"target_index": "sample_index",
"start_id": start_id,
"end_id": end_id,
})
If pook sees a request that doesn't match the expected body, it will raise a no match exception, with a detailed diff showing why the actual request didn't match the expected request.
This also lets you avoid needing to think about pook's internals when pulling out the matched request data manually like this, and would remove the need for this complex secondary loop, which could be changed to just [assert mock.done for mock in post_mocks]
or, better yet, pook.done
, which checks that all mocks registered in context are exhausted.
Co-authored-by: sarayourfriend <24264157+sarayourfriend@users.noreply.github.com>
Thanks so much for the idiomatic pook tips! I've applied those, hopefully they look okay :) |
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @obulat Excluding weekend1 days, this PR was ready for review 5 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @AetherUnbound, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT!
"start_id": start_id, | ||
"end_id": end_id, | ||
} | ||
# Pook will raise an exception here if any requests don't match the above |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does pook raise the exception? I think it just returns the boolean 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, you're right - I'll reword this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI passes and the code looks good.
if failures: | ||
raise ValueError( | ||
f"Some workers didn't respond to health check: {','.join(failures)}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just curious, do you know what value will failures
have?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the worker URLs that were unable to be reached, see line 58 above!
Fixes
Fixes #2708 by @AetherUnbound
Description
This PR adds some exception handling to the case where the indexer workers fail their healthchecks. Those exceptions are now raised appropriately in the ingestion server and will be alerted to Slack along with other exceptions.
Testing Instructions
I added some unit tests for these cases - this is my first time using
pook
and I'm basing it off of other tests in the ingestion server suite, so let me know if I'm doing something that isn't idiomatic!CI passing should be a positive indicator.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin