Skip to content

Conversation

@awaelchli
Copy link
Contributor

@awaelchli awaelchli commented Feb 28, 2023

What does this PR do?

Rewrite of all parity tests. Cleaner and easier to read.

  • Asserts the final weights are equal (as before)
  • Asserts the iteration time overhead (we allow ~1ms difference)
  • DDP and mixed precision tested as well
  • Asserts that peak memory usage is no bigger than that of torch
  • Standardized model interface so it can be switched out / parameterized later. We may want to test larger models as well with smaller tolerances (and fewer iteration steps) in the future.

Since these tests don't run on an isolated system, there will be noise from other tasks that can lead to flakiness in the assertions for timings we make. To mitigate this, we rerun the test up to 3 times if it fails.
Each test runs for about 8 seconds, 1000 training iterations. This is to ensure we get a robust measurement.

cc @carmocca @justusschock @awaelchli @Borda

@mergify mergify bot added the has conflicts label Mar 3, 2023
@mergify mergify bot removed the has conflicts label Mar 3, 2023
@mergify mergify bot added the ready PRs ready to be merged label Mar 4, 2023
@awaelchli awaelchli requested a review from carmocca March 6, 2023 12:55
@awaelchli awaelchli requested a review from carmocca March 6, 2023 15:04
@awaelchli awaelchli enabled auto-merge (squash) March 6, 2023 17:18
@awaelchli awaelchli merged commit 3e04353 into master Mar 6, 2023
@awaelchli awaelchli deleted the fabric/framework-overhead branch March 6, 2023 20:19
@Borda Borda mentioned this pull request Mar 23, 2023
7 tasks
Borda pushed a commit that referenced this pull request Mar 30, 2023
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Jirka <jirka.borovec@seznam.cz>

(cherry picked from commit 3e04353)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fabric lightning.fabric.Fabric performance ready PRs ready to be merged tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants