-
Notifications
You must be signed in to change notification settings - Fork 16
Smoke tests: Fix random seed on smoke tests and add asserts on results #97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| continue | ||
|
|
||
| metrics_found = True | ||
| _assert_metrics_dict(metrics_to_assert, metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something, so correct me if I'm wrong, but this looks like we're going to compare any metrics json dumped by a client to a single metrics_to_assert dictionary. For some of our example, that's definitely fine, because the different clients are loading the same dataset and running the same training cycle. However, there are instances where we might want to test when that isn't the case. That is, each client loads a distinct dataset. So their metrics won't be the same as each other.
Again, correct me if that interpretation is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, but I think the solution might be somewhere else. I think for that use case we could pass in different dictionaries for each client, giving each a client name or id, and then compare them appropriately when pulling their json files. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that makes sense. We can do another smoke test where two clients load slightly different datasets and include it as a separate test.
emersodb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks really great. Some fairly minor comments. The only one of moderate significance is the assumption that each client will produce matching metrics. In the general case, they won't. So we probably want that flexibility built into the smoke test infra.
PR Type
Other
Short Description
Clickup Ticket(s): https://app.clickup.com/t/8686mur37
Tests Added
The smoke tests themselves.