Kevin Wang
Latest commit ca32010 Jul 9, 2018


Horovod

Horovod is a distributed training framework for TensorFlow, Keras, and PyTorch. Its goal is to make distributed deep learning fast and easy to use.

See the official Horovod GitHub page.

Horovod-Tensorflow

This recipe shows how to run a Horovod distributed training job for TensorFlow on a GPU cluster with Batch AI.

Horovod-PyTorch

This recipe shows how to run a Horovod distributed training job for PyTorch on a GPU cluster with Batch AI.

Horovod-Infiniband-Benchmark

This recipe shows how to reproduce the Horovod distributed training benchmarks with InfiniBand support using Batch AI.

Help or Feedback


If you have any problems or questions, contact the Batch AI team at AzureBatchAITrainingPreview@service.microsoft.com or create an issue on GitHub.

We also welcome your contributions of additional sample notebooks, scripts, or other examples of working with Batch AI.