Building on Google Cloud Platform

Ralf Gommers edited this page May 17, 2019 · 1 revision

tl;dr: decently fast and a bit expensive, but a reasonable option if you don't have a good build box available locally.

Config:

  • 16 vCPUs, 16 GB memory, 1 Tesla P100 GPU, $1.44/hr
  • Intel optimized Deep Learning image: CUDA 10, MKL-DNN
  • 30 GB boot disk (SSD)
  • firewall: no http/https

The first SSH login asks whether to install NVIDIA drivers: answer yes. There will be some warnings about 32-bit, DRM and X drivers; these are not relevant and can be ignored.

The system has GCC 6.3.0 installed by default, no Clang:

$ gcc --version
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516

We'll be using Conda compilers, so this can be safely ignored.

Some potential issues related to logging in and working over SSH:

  • SSH keys disappearing: it turns out they get overwritten, so add them to the Metadata Server via the Cloud Console instead (see GCP docs)
  • detaching or an SSH timeout kills processes. The best way to work around this is with a terminal multiplexer (tmux or screen).
  • also set a keepalive in /etc/ssh/ssh_config (this is a client-side option, so it belongs on the machine you connect from):
Host *
ServerAliveInterval 120
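
The same keepalive can be set per user instead of system-wide. The following is a minimal sketch, assuming an OpenSSH client and that ~/.ssh/config is the file you want to edit. SSH_CONFIG is a hypothetical variable name used only in this snippet; OpenSSH itself does not read it.

```shell
# Hypothetical variable: path of the client config to edit (not read by OpenSSH).
SSH_CONFIG="${SSH_CONFIG:-$HOME/.ssh/config}"

# Make sure the file exists and is only writable by the owner.
mkdir -p "$(dirname "$SSH_CONFIG")"
touch "$SSH_CONFIG"
chmod 600 "$SSH_CONFIG"

# Append the keepalive block only if no ServerAliveInterval is set yet,
# so running this twice does not duplicate the entry.
if ! grep -q 'ServerAliveInterval' "$SSH_CONFIG"; then
    printf 'Host *\n    ServerAliveInterval 120\n' >> "$SSH_CONFIG"
fi
```

For the multiplexer, `tmux new -s build` starts a named session and `tmux attach -t build` reconnects to it after a dropped connection (`build` is an arbitrary session name).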

To install conda and the PyTorch dependencies, and then build PyTorch itself:

sudo apt-get install locate
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-<etc>.sh  # install miniconda
# Now from instructions at https://github.com/pytorch/pytorch#install-dependencies:
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c pytorch magma-cuda100
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git submodule sync 
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
tmux
time python setup.py develop  # real: 25min, user: 364min, sys: 18min
time python test/test_nn.py  # passes, real: 13min, user: 69min, sys: 86min
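
To put "a bit expensive" into numbers, the hourly rate from the config can be combined with the wall-clock times above. This is a rough sketch, assuming cost scales linearly with wall-clock time and ignoring disk, network and idle time. The variable names are illustrative only.

```shell
RATE_PER_HOUR=1.44   # instance price from the config above, USD/hr
BUILD_MIN=25         # real time of `python setup.py develop`
TEST_MIN=13          # real time of `python test/test_nn.py`

# Cost of one build plus one test_nn.py run, in USD.
COST=$(awk -v r="$RATE_PER_HOUR" -v b="$BUILD_MIN" -v t="$TEST_MIN" \
    'BEGIN { printf "%.2f", (b + t) / 60 * r }')

# Average number of busy cores during the build: (user + sys) / real.
BUSY_CORES=$(awk 'BEGIN { printf "%.1f", (364 + 18) / 25 }')

echo "build + test cost: \$${COST}"
echo "average busy cores during build: ${BUSY_CORES} of 16"
```

Under these assumptions one build plus one test run costs about $0.91, and the build keeps roughly 15 of the 16 vCPUs busy on average.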