Proposed refactor
- The tests written for the `horovod` strategy may be outdated, as most of them were written roughly two years ago.
- Many tests only check that the horovod run finished without errors; the actual purpose of the test may not be verified.
- `accelerator="auto"` can be used wherever possible to avoid separate tests for CPU and GPU devices (see the sketch after this list).
- For some tests, we don't need to pass arguments like `default_root_dir` and `weights_save_path`; we can pass only the argument relevant to that test (such as `gradient_clip_algorithm`).
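As a rough illustration of the `accelerator="auto"` point, here is a minimal sketch of what a device-agnostic test could look like. The `BoringModel` import path is an assumption (it varies across Lightning versions), and only the arguments relevant to the test are passed:

```python
from pytorch_lightning import Trainer
from tests.helpers.boring_model import BoringModel  # hypothetical import path; varies by version


def test_with_auto_accelerator(tmpdir):
    # "auto" selects a GPU when one is visible and falls back to CPU otherwise,
    # so a single test body covers both device types.
    trainer = Trainer(
        default_root_dir=tmpdir,
        accelerator="auto",
        devices=1,
        fast_dev_run=True,  # run a single batch; we only exercise the code path
    )
    trainer.fit(BoringModel())
```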
Motivation
While working on #11911, @carmocca explained how these tests could be refactored and suggested that creating an issue to rethink the strategy for testing `horovod` would be a good idea.
Pitch
Comments and discussion are welcome on this one.
An example:
`test_horovod_cpu_clip_grad_by_value` only checks that the horovod run finished without errors; it doesn't verify that `gradient_clip_val` was actually applied. We could avoid relying on the run's exit status alone and instead assert that `gradient_clip_val` served its purpose, as sketched below.
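One way such a check could look: override the gradient-clipping hook, delegate to the default implementation, and then assert that the gradients are actually bounded, so the test verifies the clipping behavior rather than just that the run finished. This is a minimal sketch assuming the 1.x `configure_gradient_clipping` hook signature and a hypothetical `BoringModel` import path:

```python
from pytorch_lightning import Trainer
from tests.helpers.boring_model import BoringModel  # hypothetical import path; varies by version


class GradClipAssertionModel(BoringModel):
    def configure_gradient_clipping(
        self, optimizer, optimizer_idx, gradient_clip_val=None, gradient_clip_algorithm=None
    ):
        # Delegate to the default implementation; once it returns,
        # the gradients must already be clipped.
        super().configure_gradient_clipping(
            optimizer, optimizer_idx, gradient_clip_val, gradient_clip_algorithm
        )
        for p in self.parameters():
            if p.grad is not None:
                # Clip-by-value bounds every gradient element by +/- clip_val.
                assert p.grad.abs().max().item() <= gradient_clip_val


def test_clip_grad_by_value(tmpdir):
    trainer = Trainer(
        default_root_dir=tmpdir,
        gradient_clip_val=1e-4,
        gradient_clip_algorithm="value",
        fast_dev_run=True,
    )
    trainer.fit(GradClipAssertionModel())
```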
If you enjoy Lightning, check out our other projects! ⚡
- Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
- Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
- Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
- Bolts: Pretrained SOTA deep learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
- Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @carmocca @ananthsub @justusschock @awaelchli @rohitgr7 @Borda @akihironitta @kaushikb11