Build NVIDIA-Docker image with proper PyTorch version to run DARTS code #10
Here are the complete steps to install NVIDIA-Docker on Ubuntu 18.04 (AWS).

1. Install the CUDA driver

Get a relatively new driver:

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-driver-430 nvidia-modprobe

Test the installation:

$ nvidia-smi
Mon Oct 7 18:43:14 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 47C P0 59W / 149W | 0MiB / 11441MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

2. Install the standard Docker

Follow https://docs.docker.com/install/linux/docker-ce/ubuntu/

sudo apt-get install -y \
apt-transport-https \
ca-certificates \
curl \
gnupg-agent \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io
sudo groupadd docker
sudo usermod -aG docker $USER  # allow running docker without sudo (re-login for this to take effect)

Test the installation:

$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.

3. Install NVIDIA-Docker

Follow https://github.com/NVIDIA/nvidia-docker#quickstart

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Test the installation (https://github.com/NVIDIA/nvidia-docker#usage):

$ docker run --gpus all nvidia/cuda:9.0-base nvidia-smi
Mon Oct 7 18:46:13 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:1E.0 Off | 0 |
| N/A 52C P0 69W / 149W | 0MiB / 11441MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The easiest way to get a PyTorch-GPU image is from NVIDIA NGC. The image contains a lot of extras, including JupyterLab and TensorBoard (see the release notes).

Inside the container, try:

$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
1.2.0a0+afb7a16 True

It is the latest version.
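A slightly fuller version of that check can guard against torch being absent (e.g. when run on the host instead of inside the container). This is a hedged sketch; `torch_cuda_status` is just an illustrative helper name, not part of any API:

```python
# Hedged sketch: report the PyTorch version and CUDA availability,
# returning None instead of crashing when torch is not installed.
import importlib.util


def torch_cuda_status():
    """Return (torch version, CUDA available?) or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None  # torch not installed on this interpreter
    import torch
    return torch.__version__, torch.cuda.is_available()


print(torch_cuda_status())
```

Inside the NGC container this should print the version/availability pair; on a bare host without torch it prints None.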
Follow the README in docker/install-nvidia-docker and docker/darts-pytorch-image. Everyone should be able to run the default script and get the expected result.
@dylanrandle Here's how to run DARTS on the graphene data within the container:

# get data and source code
mkdir data
wget https://capstone2019-google.s3.amazonaws.com/graphene_processed.nc -P ./data/
git clone https://github.com/capstone2019-neuralsearch/darts.git

# run training (the last two commands are executed inside the container)
docker run --rm -it --gpus all -v $(pwd):/workdir/host_files darts-pytorch
cd host_files
python3 darts/cnn/train_search.py --data ./data/ --dataset graphene
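For non-interactive use, the interactive session plus the cd step can be collapsed into a single docker invocation. This is a sketch under assumptions: the darts-pytorch image tag and the /workdir/host_files mount point are taken from the commands above, and `darts_cmd` is a hypothetical helper that only prints the command so it can be inspected before piping it to sh:

```shell
# Hypothetical wrapper: build and print a one-shot training command
# instead of starting an interactive container session.
darts_cmd() {
    hostdir=$1
    dataset=$2
    printf '%s\n' "docker run --rm --gpus all -v $hostdir:/workdir/host_files -w /workdir/host_files darts-pytorch python3 darts/cnn/train_search.py --data ./data/ --dataset $dataset"
}

# Print the command for the graphene dataset; pipe to sh to actually run it.
darts_cmd "$(pwd)" graphene
```

The `-w` flag sets the working directory inside the container, replacing the interactive `cd host_files` step.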
This is absolutely awesome. Brilliant!
I changed it; see 2172d1c.
The best way to run the DARTS scripts in production is probably via nvidia-docker. It is important to freeze the environment, since the DARTS code requires PyTorch == 0.3.1 and torchvision == 0.2.0; newer PyTorch versions crash for various reasons. The same container image can run on
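One way to enforce that freeze is to pin the versions in the image itself. This is a minimal sketch, not the repo's actual docker/darts-pytorch-image Dockerfile; the base image and package setup are assumptions, and PyTorch 0.3.1 may need to be installed from a download.pytorch.org wheel matching your CUDA/Python versions rather than from PyPI:

```dockerfile
# Hedged sketch of a pinned environment; not the repo's actual Dockerfile.
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Exact pins the DARTS code requires; newer PyTorch versions crash.
RUN pip3 install torch==0.3.1 torchvision==0.2.0

WORKDIR /workdir
```

Pinning in the image rather than at run time means every rebuild reproduces the same environment.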