TensorFlow Distributed GPU
This example demonstrates how to run the standard TensorFlow sample (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dist_test/python/mnist_replica.py) on an Azure Batch AI cluster of 2 nodes.
- For demonstration purposes, the MNIST dataset and mnist_replica.py will be deployed to an Azure File Share;
- Standard output of the job will be stored on the Azure File Share;
- The MNIST dataset (http://yann.lecun.com/exdb/mnist/) is archived and uploaded to a blob.
- The recipe modifies the official mnist_replica.py (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dist_test/python/mnist_replica.py) to generate model checkpoints and TensorBoard event output files.
- Please refer to the official tutorial on distributed TensorFlow training for more details.
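To give a rough idea of what a 2-node distributed run involves, the sketch below builds the per-task command-line arguments that a script like mnist_replica.py expects. It is an illustration only, under these assumptions: the node addresses and ports are hypothetical, the layout (one parameter server on the first node, one worker per node) is an example rather than the layout this recipe necessarily uses, and the flag names are those of the upstream mnist_replica.py, which the modified copy in this recipe may or may not keep unchanged. The helper names `build_cluster` and `task_args` are made up for this sketch.

```python
from typing import Dict, List


def build_cluster(nodes: List[str],
                  ps_port: int = 2222,
                  worker_port: int = 2223) -> Dict[str, List[str]]:
    """Describe a cluster with one parameter server and one worker per node.

    The ports and the "ps on the first node" layout are assumptions
    chosen for illustration.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return {
        "ps": ["{}:{}".format(nodes[0], ps_port)],
        "worker": ["{}:{}".format(node, worker_port) for node in nodes],
    }


def task_args(cluster: Dict[str, List[str]],
              job_name: str,
              task_index: int) -> List[str]:
    """Return the arguments for a single task (one process) in the cluster."""
    if job_name not in cluster:
        raise ValueError("unknown job name: {}".format(job_name))
    if not 0 <= task_index < len(cluster[job_name]):
        raise IndexError("task_index out of range for job {}".format(job_name))
    return [
        "--ps_hosts=" + ",".join(cluster["ps"]),
        "--worker_hosts=" + ",".join(cluster["worker"]),
        "--job_name=" + job_name,
        "--task_index=" + str(task_index),
    ]


if __name__ == "__main__":
    # Hypothetical private addresses of the two cluster nodes.
    cluster = build_cluster(["10.0.0.4", "10.0.0.5"])
    for job_name, hosts in cluster.items():
        for index in range(len(hosts)):
            args = " ".join(task_args(cluster, job_name, index))
            print("python mnist_replica.py " + args)
```

Every task receives the same host lists and differs only in its job name and task index, which is how each process learns its role. When the job runs on the cluster, the actual host lists come from the service rather than from hard-coded addresses like the ones above.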
Instructions to Run Recipe
Python Jupyter Notebook
You can find a Jupyter Notebook for this recipe in TensorFlow-GPU-Distributed.ipynb.
Azure CLI 2.0
You can find Azure CLI 2.0 instructions for this recipe in cli-instructions.md.
Help or Feedback
If you have any problems or questions, you can reach the Batch AI team at AzureBatchAITrainingPreview@service.microsoft.com or you can create an issue on GitHub.
We also welcome your contributions of additional sample notebooks, scripts, or other examples of working with Batch AI.