update gcp docs

CoEDL · Nov 29, 2021 · 6895e02
1 parent 4f6ee9a

Showing 2 changed files with 48 additions and 105 deletions.
diff --git a/docs/wiki/install-elpis-on-gcp-gpu.md b/docs/wiki/install-elpis-on-gcp-gpu.md
@@ -1,84 +1,47 @@
-# Install on Google Cloud with GPU
+# Install Elpis on Google Cloud with GPU
 
-## Check quotas
-
-[GPU quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop&folder&organizationId&metric=GPUs%20(all%20regions)&location=GLOBAL)
-
-[all quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop)
+If needed, do the "Setup your account" steps on the [Install Elpis on Google Cloud](install-elpis-on-gcp.md) wiki page. 
 
 
-## Install requirements
+## Create a Virtual Machine 
 
-### CPU
-
-For CPU machines to use Kaldi, we just need to install Docker. Put this code into the VM instance startup script text area. When the machine starts, it will install Docker and download and run Elpis.
-
-```
-sudo apt update
-sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
-curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
-sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
-sudo apt update
-sudo apt install ./containerd.io_1.4.3-1_amd64.deb
-sudo apt install -y docker-ce
-sudo chmod 666 /var/run/docker.sock
-sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:latest
-```
-
-
-### GPU
-
-For GPU, we need to install NVIDIA stuff. Rather than doing this in an install script, start the machine, SSH to it and then install CUDA and Docker.
+The type of machine you can create depends on the quotas you have access to. 
 
+[GPU quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop&folder&organizationId&metric=GPUs%20(all%20regions)&location=GLOBAL)
 
-#### Create a new VM
+[all quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop)
 
-* GPU family
+For a basic machine, use these settings:
+* GPU
 * N1 series
 * n1-standard-16 (16 vCPUs, 60 GB memory)
 * 1 x NVIDIA Tesla T4 (approx $600/month)
 
-* Standard persistent disk Ubuntu 20.04 300GB
+* Standard persistent disk Ubuntu 20.04 approx 300GB
 * Allow http traffic
+* Add `tensorboard` to the `Networking, Disks, Security, Management, Sole-tenancy` > `Networking` > `Network tags` section
+* Add the script below to the `Management` > `Startup scripts` section
 
-Don't use image deploy because this limits OS to container optimised, which prevents use of `--gpus all` docker run flag. To use `--gpus all` flag, we need to install specific version of nvidia drivers, not container optimised.
-
-Here's a command line version.
-```
-gcloud compute instances create instance-name --project=elpis-workshop --zone=us-central1-c --machine-type=n1-standard-16 --network-interface=network-tier=PREMIUM,subnet=default --maintenance-policy=TERMINATE --service-account=XXXXXXXXXXXX-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --accelerator=count=1,type=nvidia-tesla-t4 --tags=http-server --create-disk=auto-delete=yes,boot=yes,device-name=instance-5,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20211102,mode=rw,size=200,type=projects/elpis-workshop/zones/us-central1-c/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any
-```
-
-#### After starting, ssh to the machine
-
-```
-gcloud init
-gcloud auth login
-gcloud config set project elpis-workshop
-gcloud compute instances list
-gcloud compute ssh instance-1
-```
+```shell
+# GPU startup script v0.1
 
+# Check if this has been done before & skip if so
+if [[ -f /etc/startup_installed ]]; then exit 0; fi
 
-#### Install CUDA
 
-From https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#ubuntu-driver-steps
+# Install CUDA
 
-```
 sudo apt install linux-headers-$(uname -r)
 curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
 sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
 sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
 sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
 sudo apt update
 sudo apt -y install cuda
-```
 
 
-#### Install Docker
+# Install NVIDIA Container Toolkit
 
-From https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
-
-```
 curl https://get.docker.com | sh \
   && sudo systemctl --now enable docker
 
@@ -89,71 +52,49 @@ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
 sudo apt-get update
 sudo apt-get install -y nvidia-docker2
 sudo systemctl restart docker
-```
+sudo usermod -aG docker $USER
+sudo chown $USER /var/run/docker.sock
+sudo chmod 666 /var/run/docker.sock
+docker pull coedl/elpis:hft
 
-Verify the installation
-```
-sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
-```
+# Handy little app to check NVIDIA GPUs stats
+sudo apt install nvtop
 
-Should give you something like 
-```
-+-----------------------------------------------------------------------------+
-| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
-|-------------------------------+----------------------+----------------------+
-| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
-| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
-|                               |                      |               MIG M. |
-|===============================+======================+======================|
-|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
-| N/A   39C    P0    67W / 149W |      0MiB / 11441MiB |    100%      Default |
-|                               |                      |                  N/A |
-+-------------------------------+----------------------+----------------------+
-
-+-----------------------------------------------------------------------------+
-| Processes:                                                                  |
-|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
-|        ID   ID                                                   Usage      |
-|=============================================================================|
-|  No running processes found                                                 |
-+-----------------------------------------------------------------------------+
-```
+# Get elpis
+cd ~
+git clone https://github.com/CoEDL/elpis.git
 
+# Pull Docker image
+docker pull coedl/elpis:hft
 
-#### Set Docker permissions
+# Make a file which can be detected on next startup and thus skip doing this every time
+touch /etc/startup_installed
 
 ```
-sudo usermod -aG docker $USER
-sudo chown $USER /var/run/docker.sock
-sudo chmod 666 /var/run/docker.sock
-```
 
 
-#### Download/update Elpis
+This startup script only does the installation the first time the VM starts. On later boots it detects the marker file and exits early, which reduces the instance load time on subsequent restarts.
 
-```
-docker run --gpus all --name elpis --rm -it -p 80:5001/tcp coedl/elpis:ben-hft-gpu
-```
 
+Don't use the image deploy option, because it limits the OS to a container-optimised image, which prevents use of the `--gpus all` flag for `docker run`. To use the `--gpus all` flag, we need to install a specific version of the NVIDIA drivers, not the container-optimised ones.
 
-#### Edit the model # epochs for dev sanity
 
-Get into the container in another window
 
-```
-docker exec -it $(docker ps -q) zsh
-```
 
-Edit the model file, set `DEBUG=True`
+## After starting, ssh to the machine
 
 ```
-vim /elpis/elpis/engines/hftransformers/objects/model.py
+gcloud init
+gcloud auth login
+gcloud config set project elpis-workshop
+gcloud compute instances list
+gcloud compute ssh instance-1
 ```
 
----
+Refer to the [Handy GCP commands](handy-gcp-commands.md) page for some handy scripts.
 
 
-### Optionally, download and share data into the container
+## Optionally, download and share data into the container
 
 This may be helpful if you write a python file to run Elpis in the container and avoid the GUI.
 
@@ -171,6 +112,3 @@ sudo unzip data.zip
 ```
 docker run --gpus all --name elpis -v /na-elpis:/na-elpis --rm -it -p 80:5001/tcp coedl/elpis:ben-hft-gpu
 ```
-
-
-
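
The run-once behaviour of the GPU startup script in this file (exit early when `/etc/startup_installed` exists, `touch` it at the end) could be sketched as a small reusable function. This is a minimal sketch, not the committed script: `run_once` and the `MARKER` variable are hypothetical names, and writing the marker only when the wrapped command succeeds is an assumption about the desired behaviour.

```shell
#!/usr/bin/env bash
# Sketch of a run-once guard for a VM startup script.
# MARKER and run_once are hypothetical names used for illustration;
# the default path matches the marker file used in the diff above.

MARKER="${MARKER:-/etc/startup_installed}"

run_once() {
  # Skip everything if an earlier boot already finished provisioning
  if [[ -f "$MARKER" ]]; then
    echo "already provisioned"
    return 0
  fi

  # Run the provisioning command given as arguments; stop on failure
  # so the marker is not written for a half-finished install
  if ! "$@"; then
    echo "provisioning failed, marker not written" >&2
    return 1
  fi

  # Record success so the next boot can skip the work
  touch "$MARKER"
  echo "provisioned"
}
```

A call such as `run_once install_everything`, where `install_everything` is a hypothetical function holding the CUDA and Docker steps, would then do the slow work on the first boot only and would retry on the next boot if the install failed.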
diff --git a/docs/wiki/install-elpis-on-gcp.md b/docs/wiki/install-elpis-on-gcp.md
@@ -1,4 +1,7 @@
-# Installing Elpis on Google Cloud Platform
+# Install Elpis on Google Cloud for Kaldi
+
+
+## Setup your account
 
 Create an account at [Google Cloud](https://cloud.google.com).
 
@@ -15,6 +18,8 @@ When the project has been created, the console will show the project's Dashboard
 To add a server to the project, open the left side Navigation Menu and select "Compute Engine". Then select "VM Instances". If this is the first time your Google account has used Cloud Platform you may be offered a free trial! If so, go through the process of signing up for it. Otherwise, you may need to add billing details to access VM instances (TODO add more info about that). You will need to enter credit card details during the free trial opt-in process, but you won't be billed unless you turn on Automatic Billing.
 
 
+## Create a Virtual Machine 
+
 Now that your account has free trial or billing set up, the VM instances page should show "Create" and "Import" buttons.
 
 Click "Create"
@@ -33,12 +38,12 @@ Paste the following code into the "Startup Script" box
 sudo apt update
 sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
 curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
-curl -O https://download.docker.com/linux/debian/dists/buster/pool/stable/amd64/containerd.io_1.4.3-1_amd64.deb
 sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
 sudo apt update
 sudo apt install ./containerd.io_1.4.3-1_amd64.deb
 sudo apt install -y docker-ce
-sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:stable
+sudo chmod 666 /var/run/docker.sock
+sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:latest
 ```
 
 Then press "Create"
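
In the startup script hunk above, the `curl -O` line that downloaded `containerd.io_1.4.3-1_amd64.deb` is removed while the `sudo apt install ./containerd.io_1.4.3-1_amd64.deb` line is kept, so that step may fail on a fresh VM where the file is not present. A minimal sketch of a guard for that step follows; `install_local_deb` is a hypothetical helper, not part of the committed script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a local .deb only when the file is present,
# and report a clear error otherwise instead of letting apt fail.
install_local_deb() {
  local deb="$1"
  if [[ -f "$deb" ]]; then
    sudo apt install -y "$deb"
  else
    echo "skipping $deb: file not found" >&2
    return 1
  fi
}
```

In the script this would be called as `install_local_deb ./containerd.io_1.4.3-1_amd64.deb`, which makes a missing download visible in the startup log.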