
Conversation

@lotif lotif commented Jan 15, 2024

PR Type

Other

Short Description

Clickup Ticket(s): https://app.clickup.com/t/8686mur37

  • Passing a random seed to FedProx, APFL and SCAFFOLD
  • Making them save the metrics to a file at the end of their execution
  • Adding assertions on their metric values in the smoke tests

Tests Added

The smoke tests themselves.
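The seed-then-assert flow described above can be sketched as follows. This is a minimal illustration, not the actual FL4Health code: `set_all_random_seeds` and `run_client` are hypothetical stand-ins, and only Python's `random` module is seeded here (the real clients would also need numpy/torch seeded).

```python
import json
import random
import tempfile
from pathlib import Path


def set_all_random_seeds(seed: int) -> None:
    # Sketch: seed every RNG the client uses. The real helper would
    # also seed numpy and torch; only Python's RNG is seeded here.
    random.seed(seed)


def run_client(seed: int, metrics_path: Path) -> dict:
    # Stand-in for a training run: with a fixed seed the "metrics" are
    # reproducible, so the smoke test can assert on exact values.
    set_all_random_seeds(seed)
    metrics = {"accuracy": random.random(), "loss": random.random()}
    metrics_path.write_text(json.dumps(metrics))
    return metrics


out_dir = Path(tempfile.mkdtemp())
first = run_client(2024, out_dir / "metrics.json")
second = run_client(2024, out_dir / "metrics.json")
assert first == second  # identical seed -> identical metrics file
```

With the seed fixed, two runs dump byte-identical metrics files, which is what makes the smoke-test assertions stable.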

@lotif lotif changed the title Fix seed Smoke tests: Fix random seed on smoke tests and add asserts on results Jan 16, 2024
@lotif lotif marked this pull request as ready for review January 16, 2024 17:47
@lotif lotif requested a review from fatemetkl January 16, 2024 17:48
```python
continue

metrics_found = True
_assert_metrics_dict(metrics_to_assert, metrics)
```
Collaborator

I may be missing something, so correct me if I'm wrong, but this looks like we're going to compare any metrics json dumped by a client to a single metrics_to_assert dictionary. For some of our examples that's definitely fine, because the different clients are loading the same dataset and running the same training cycle. However, there are instances where we might want to test the case where each client loads a distinct dataset, so their metrics won't match each other's.

Again, correct me if that interpretation is wrong.

Collaborator Author

That's correct, but I think the solution might lie elsewhere. For that use case we could pass in a different dictionary for each client, keyed by a client name or id, and then compare them appropriately when pulling their json files. Thoughts?

Collaborator

Yes, I think that makes sense. We can do another smoke test where two clients load slightly different datasets and include it as a separate test.
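The per-client approach discussed here could be sketched as below, under the assumption that each client dumps its metrics to a file named after its id. All names (`metrics_to_assert_by_client`, `assert_client_metrics`) are hypothetical, not the actual smoke-test helpers.

```python
import json
import tempfile
from pathlib import Path

# Expected metrics keyed by client id, so clients loading distinct
# datasets can each be checked against their own values.
metrics_to_assert_by_client = {
    "client_0": {"accuracy": 0.81},
    "client_1": {"accuracy": 0.64},
}


def assert_client_metrics(metrics_dir: Path, expected_by_client: dict, tolerance: float = 0.0005) -> None:
    # Pull each client's json dump and compare it to that client's
    # expected values, rather than to a single shared dictionary.
    for client_id, expected in expected_by_client.items():
        actual = json.loads((metrics_dir / f"{client_id}.json").read_text())
        for name, value in expected.items():
            assert abs(actual[name] - value) <= tolerance, f"{client_id}: {name}"


# Simulate two clients dumping distinct metrics files, then check them.
metrics_dir = Path(tempfile.mkdtemp())
(metrics_dir / "client_0.json").write_text(json.dumps({"accuracy": 0.81}))
(metrics_dir / "client_1.json").write_text(json.dumps({"accuracy": 0.64}))
assert_client_metrics(metrics_dir, metrics_to_assert_by_client)
```

Keying expectations by client id keeps the single-dictionary case working (one entry) while supporting heterogeneous-dataset tests (one entry per client).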

Collaborator

@emersodb emersodb left a comment

Overall it looks really great. Some fairly minor comments. The only one of moderate significance is the assumption that each client will produce matching metrics. In the general case, they won't. So we probably want that flexibility built into the smoke test infra.

@emersodb emersodb self-requested a review January 19, 2024 20:35
@lotif lotif merged commit 5229a88 into main Jan 19, 2024
@lotif lotif deleted the fix-seed branch January 19, 2024 20:36