Update compile test case to use larger test system #310

Merged (3 commits) on Mar 27, 2024

Conversation

@hatemhelal (Contributor) commented on Jan 30, 2024

This PR follows up on the earlier torch.compile support (#300) and makes the test input a bit more realistic by using a system of 64 carbon atoms. It also adds test cases that use the pytest-benchmark plugin to collect timings for the different compile options (a sketch of one such benchmark is below).
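A minimal sketch of a benchmark in this style, using the `benchmark` fixture that pytest-benchmark provides; the `TinyModel` module here is a hypothetical stand-in for the MACE model and fixtures used in the actual tests:

```python
import pytest
import torch


class TinyModel(torch.nn.Module):
    """Hypothetical stand-in for the MACE model under test."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self.linear(x)


@pytest.mark.parametrize("dtype", [torch.float64, torch.float32])
def test_inference_benchmark(benchmark, dtype):
    model = TinyModel().to(dtype)
    # 64 positions with 3 coordinates each, mirroring the 64-carbon test system.
    positions = torch.randn(64, 3, dtype=dtype)
    # The benchmark fixture times repeated calls and reports statistics.
    benchmark(model, positions)
```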

One subtle (and potentially controversial) change is that the correctness test (test_mace) now uses torch.testing.assert_allclose, since it selects dtype-dependent comparison tolerances that are more permissive than the fixed defaults applied when asserting torch.allclose directly.
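To illustrate the difference: torch.allclose applies fixed defaults (rtol=1e-05, atol=1e-08), while torch.testing.assert_allclose picks tolerances from the dtype (atol=1e-05 for fp32), which matters most for values near zero. The tensors below are illustrative, not taken from the test:

```python
import torch

# A near-zero expected value with a small absolute error, as can arise
# when comparing fp32 results against an fp64 reference.
expected = torch.tensor([0.0], dtype=torch.float32)
actual = torch.tensor([1e-6], dtype=torch.float32)

# torch.allclose uses fixed defaults (rtol=1e-05, atol=1e-08),
# so the 1e-6 absolute error fails the check.
assert not torch.allclose(actual, expected)

# torch.testing.assert_allclose uses dtype-aware defaults
# (atol=1e-05 for fp32), so the same comparison passes.
torch.testing.assert_allclose(actual, expected)
```

Note that torch.testing.assert_allclose is deprecated in recent PyTorch releases in favour of torch.testing.assert_close, which applies the same dtype-aware default tolerances.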

Inference times measured on an A10G GPU:

| Configuration | Time (ms) | Speedup vs eager fp64 |
| --- | --- | --- |
| Eager fp64 | 65.1 | 1.0 |
| Eager fp32 | 23.4 | 2.8 |
| compile default fp32 | 11.17 | 5.8 |
| reduce-overhead fp32 | 9.75 | 6.7 |
| compile default mixed precision | 8.84 | 7.4 |
| max-autotune fp32 | 6.75 | 9.6 |
| reduce-overhead mixed precision | 4.81 | 13.5 |
| max-autotune mixed precision | 4.25 | 15.3 |
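For reference, the compile rows correspond to torch.compile mode settings along these lines; the Linear module is just a placeholder for the model under test:

```python
import torch

# Placeholder module standing in for the MACE model under test.
model = torch.nn.Linear(3, 3)

# "compile default": the standard inductor compilation path.
compiled_default = torch.compile(model)

# "reduce-overhead": uses CUDA graphs to cut per-call launch overhead,
# which helps most for small inputs.
compiled_reduce_overhead = torch.compile(model, mode="reduce-overhead")

# "max-autotune": spends extra compile time autotuning kernel choices.
compiled_max_autotune = torch.compile(model, mode="max-autotune")
```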

@hatemhelal (Contributor, Author) commented:

As a quick experiment, I tried torch.autocast for mixed-precision fp16/fp32 and measured an inference time of 4.28 ms, which corresponds to roughly a 15x speedup over eager mode with fp64.
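A minimal sketch of that experiment, assuming a CUDA device is available; the compiled Linear module is again a placeholder for the real model:

```python
import torch

# Placeholder for the compiled MACE model; weights stay in fp32.
model = torch.compile(torch.nn.Linear(3, 3).cuda())
x = torch.randn(64, 3, device="cuda")

# autocast runs matmul-heavy ops in fp16 while keeping
# precision-sensitive ops in fp32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)
```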
