Parallelization with PyTorch on multi-GPU machines (tested on Google Cloud GPU machines)

This is an assortment of utilities/scripts:
- Scripts to set up a Google Cloud machine with one or more GPUs (to procure a GPU machine, see https://medium.com/@ajitrajasekharan/setting-up-a-machine-with-gpu-s-in-google-cloud-step-by-step-instructions-c6aa1086d8f9 for instructions)
- PyTorch installation steps/scripts
- A test utility to verify multi-GPU execution
After procuring a GPU machine (see the link above for instructions):
- Run first.sh - this installs the basic utilities needed by the later steps
- Run second.sh - follow the instructions at the displayed link to install the NVIDIA drivers. An install for Ubuntu 16.04 is provided in this repository (in second.sh, commented out by default)
- Confirm proper installation using nvidia-smi
- Run third.sh - this installs Anaconda and PyTorch
- Activate the environment: conda activate bert
- Run the test: python multi_gpu.py
Note how the batch of 30 inputs is spread across 8 GPUs: seven GPUs get 4 inputs each and the last gets 2 (7*4 + 2 = 30).
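That split follows from PyTorch's scatter logic, which hands each device a chunk of ceil(batch_size / num_devices) items until the batch runs out. A minimal pure-Python sketch of the arithmetic (chunk_sizes is an illustrative helper, not a PyTorch API):

```python
import math

def chunk_sizes(batch_size, num_devices):
    """Mimic how a batch is scattered across devices: each device gets
    ceil(batch_size / num_devices) items until the batch is exhausted."""
    per_device = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per_device, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

print(chunk_sizes(30, 8))  # [4, 4, 4, 4, 4, 4, 4, 2]
```

With 30 inputs and 8 GPUs, ceil(30/8) = 4, so the first seven devices take 4 inputs each and the last device receives the remaining 2.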
- The multi-GPU test (multi_gpu.py) is a near-verbatim extraction from the PyTorch tutorial https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
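A minimal sketch of the pattern that tutorial uses (the model and dimensions here are illustrative stand-ins, not taken from multi_gpu.py); it falls back to a single device when fewer than two GPUs are present:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Illustrative stand-in for the tutorial's model."""
    def __init__(self, in_size=5, out_size=2):
        super().__init__()
        self.fc = nn.Linear(in_size, out_size)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyModel()
if torch.cuda.device_count() > 1:
    # DataParallel replicates the model on each GPU and scatters the batch.
    model = nn.DataParallel(model)
model.to(device)

batch = torch.randn(30, 5, device=device)  # batch of 30, as in the test
output = model(batch)
print(output.shape)  # torch.Size([30, 2])
```

On an 8-GPU machine, the forward pass scatters the 30 inputs as described above and gathers the per-device outputs back into one tensor of 30 rows.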
- The TensorFlow test is a near-verbatim extraction from the blog post https://jhui.github.io/2017/03/07/TensorFlow-GPU/
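That blog predates TensorFlow 2, so it uses tf.Session with log_device_placement=True; a rough TF2-style equivalent for checking GPU visibility and pinning an op to an explicit device might look like:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see (empty list on a CPU-only machine).
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible:", len(gpus))

# Pin a small matmul to an explicit device, falling back to CPU if needed.
device_name = '/GPU:0' if gpus else '/CPU:0'
with tf.device(device_name):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[2.0, 0.0], [0.0, 2.0]])
    c = tf.matmul(a, b)
print(c.numpy())  # [[2. 4.] [6. 8.]]
```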
MIT License