update gcp docs
benfoley committed Nov 29, 2021
1 parent 4f6ee9a commit 6895e02
Showing 2 changed files with 48 additions and 105 deletions.
142 changes: 40 additions & 102 deletions docs/wiki/install-elpis-on-gcp-gpu.md
@@ -1,84 +1,47 @@
# Install on Google Cloud with GPU
# Install Elpis on Google Cloud with GPU

## Check quotas

[GPU quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop&folder&organizationId&metric=GPUs%20(all%20regions)&location=GLOBAL)

[all quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop)
If needed, do the "Setup your account" steps on the [Install Elpis on Google Cloud](install-elpis-on-gcp.md) wiki page.


## Install requirements
## Create a Virtual Machine

### CPU

For CPU machines to use Kaldi, we just need to install Docker. Put this code into the VM instance startup script text area. When the machine starts, it will install Docker and download and run Elpis.

```
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
sudo apt install ./containerd.io_1.4.3-1_amd64.deb
sudo apt install -y docker-ce
sudo chmod 666 /var/run/docker.sock
sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:latest
```


### GPU

For GPU, we need to install the NVIDIA CUDA drivers. Rather than doing this in an install script, start the machine, SSH to it and then install CUDA and Docker.
The type of machine you can create depends on the quotas you have access to.

[GPU quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop&folder&organizationId&metric=GPUs%20(all%20regions)&location=GLOBAL)

#### Create a new VM
[all quotas](https://console.cloud.google.com/iam-admin/quotas?authuser=2&project=elpis-workshop)

* GPU family
For a basic machine, use these settings:
* GPU
* N1 series
* n1-standard-16 (16 vCPUs, 60 GB memory)
* 1 x NVIDIA Tesla T4 (approx $600/month)

* Standard persistent disk Ubuntu 20.04 300GB
* Standard persistent disk Ubuntu 20.04 approx 300GB
* Allow http traffic
* Add `tensorboard` to the `Networking, Disks, Security, Management, Sole-tenancy` > `Networking` > `Network tags` section
* Add the script below to the `Management` > `Startup scripts` section

Don't use image deploy, because this limits the OS to a container-optimised image, which prevents use of the `--gpus all` docker run flag. To use the `--gpus all` flag, we need to install a specific version of the NVIDIA drivers on an OS that is not container optimised.

Here's a command line version.
```
gcloud compute instances create instance-name --project=elpis-workshop --zone=us-central1-c --machine-type=n1-standard-16 --network-interface=network-tier=PREMIUM,subnet=default --maintenance-policy=TERMINATE --service-account=XXXXXXXXXXXX-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append --accelerator=count=1,type=nvidia-tesla-t4 --tags=http-server --create-disk=auto-delete=yes,boot=yes,device-name=instance-5,image=projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20211102,mode=rw,size=200,type=projects/elpis-workshop/zones/us-central1-c/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --reservation-affinity=any
```
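
The one-line command above is hard to review. A minimal sketch of the same call, assembled from shell variables and printed rather than executed, might look like the following. The variable names and the `print_create_command` helper are hypothetical. The flags are a subset of those in the command above: the service account, scopes, and shielded-VM flags are left out for brevity, so treat this as a starting point rather than a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical variable names; the values are taken from the command above.
PROJECT="elpis-workshop"
ZONE="us-central1-c"
INSTANCE="instance-name"
MACHINE_TYPE="n1-standard-16"
GPU="count=1,type=nvidia-tesla-t4"
IMAGE="projects/ubuntu-os-cloud/global/images/ubuntu-2004-focal-v20211102"
DISK_SIZE_GB="200"

# Print the gcloud command instead of running it, so it can be reviewed first.
print_create_command() {
  printf '%s \\\n' "gcloud compute instances create ${INSTANCE}"
  printf '  %s \\\n' \
    "--project=${PROJECT}" \
    "--zone=${ZONE}" \
    "--machine-type=${MACHINE_TYPE}" \
    "--maintenance-policy=TERMINATE" \
    "--accelerator=${GPU}" \
    "--tags=http-server"
  printf '  %s\n' "--create-disk=auto-delete=yes,boot=yes,image=${IMAGE},size=${DISK_SIZE_GB}"
}

print_create_command
```

Once the printed command looks right, it can be pasted into a terminal where `gcloud` is already authenticated.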

#### After starting, ssh to the machine

```
gcloud init
gcloud auth login
gcloud config set project elpis-workshop
gcloud compute instances list
gcloud compute ssh instance-1
```
```shell
# GPU startup script v0.1

# Check if this has been done before & skip if so
if [[ -f /etc/startup_installed ]]; then exit 0; fi

#### Install CUDA

From https://cloud.google.com/compute/docs/gpus/install-drivers-gpu#ubuntu-driver-steps
# Install CUDA

```
sudo apt install linux-headers-$(uname -r)
curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt update
sudo apt -y install cuda
```


#### Install Docker
# Install NVIDIA Container Toolkit

From https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

```
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker

@@ -89,71 +52,49 @@ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```
sudo usermod -aG docker $USER
sudo chown $USER /var/run/docker.sock
sudo chmod 666 /var/run/docker.sock
docker pull coedl/elpis:hft

Verify the installation
```
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```
# Handy little app to check NVIDIA GPU stats
sudo apt install -y nvtop

Should give you something like
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 39C P0 67W / 149W | 0MiB / 11441MiB | 100% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
```
# Get elpis
cd ~
git clone https://github.com/CoEDL/elpis.git

# Pull Docker image
docker pull coedl/elpis:hft

#### Set Docker permissions
# Make a file which can be detected on next startup and thus skip doing this every time
touch /etc/startup_installed

```
sudo usermod -aG docker $USER
sudo chown $USER /var/run/docker.sock
sudo chmod 666 /var/run/docker.sock
```


#### Download/update Elpis
This startup script will only run the first time the VM starts, to reduce the instance load time on subsequent restarts.
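
The run-once behaviour comes from the marker file check at the top of the script and the `touch` at the end. A minimal sketch of that pattern in isolation is shown here, using a temporary directory so it can be tried without root. `MARKER` and `first_boot_setup` are hypothetical names; the real script uses `/etc/startup_installed`.

```shell
#!/bin/sh
# Hypothetical marker location for experimenting;
# the startup script itself uses /etc/startup_installed.
MARKER="$(mktemp -d)/startup_installed"

first_boot_setup() {
  # Skip if the marker exists, meaning setup completed on a previous boot.
  if [ -f "$MARKER" ]; then
    echo "skipped"
    return 0
  fi
  # ... the slow install steps would go here ...
  # Create the marker last, so an interrupted install is retried on the next boot.
  touch "$MARKER"
  echo "installed"
}

first_boot_setup   # first call does the work
first_boot_setup   # second call finds the marker and skips
```

To force the full install to run again on a VM, delete the marker file and restart the instance.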

```
docker run --gpus all --name elpis --rm -it -p 80:5001/tcp coedl/elpis:ben-hft-gpu
```

Don't use image deploy, because this limits the OS to a container-optimised image, which prevents use of the `--gpus all` docker run flag. To use the `--gpus all` flag, we need to install a specific version of the NVIDIA drivers on an OS that is not container optimised.

#### Edit the model # epochs for dev sanity

Get into the container in another window

```
docker exec -it $(docker ps -q) zsh
```

Edit the model file, set `DEBUG=True`
## After starting, ssh to the machine

```
vim /elpis/elpis/engines/hftransformers/objects/model.py
gcloud init
gcloud auth login
gcloud config set project elpis-workshop
gcloud compute instances list
gcloud compute ssh instance-1
```

---
Refer to the [Handy GCP commands](handy-gcp-commands.md) page for some handy scripts.


### Optionally, download and share data into the container
## Optionally, download and share data into the container

This may be helpful if you write a python file to run Elpis in the container and avoid the GUI.

@@ -171,6 +112,3 @@ sudo unzip data.zip
```
docker run --gpus all --name elpis -v /na-elpis:/na-elpis --rm -it -p 80:5001/tcp coedl/elpis:ben-hft-gpu
```



11 changes: 8 additions & 3 deletions docs/wiki/install-elpis-on-gcp.md
@@ -1,4 +1,7 @@
# Installing Elpis on Google Cloud Platform
# Install Elpis on Google Cloud for Kaldi


## Setup your account

Create an account at [Google Cloud](https://cloud.google.com).

@@ -15,6 +18,8 @@ When the project has been created, the console will show the project's Dashboard
To add a server to the project, open the left side Navigation Menu and select "Compute Engine". Then select "VM Instances". If this is the first time your Google account has used Cloud Platform you may be offered a free trial! If so, go through the process of signing up for it. Otherwise, you may need to add billing details to access VM instances (TODO add more info about that). You will need to enter credit card details during the free trial opt-in process, but you won't be billed unless you turn on Automatic Billing.


## Create a Virtual Machine

Now that your account has free trial or billing set up, the VM instances page should show "Create" and "Import" buttons.

Click "Create"
@@ -33,12 +38,12 @@ Paste the following code into the "Startup Script" box
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
curl -O https://download.docker.com/linux/debian/dists/buster/pool/stable/amd64/containerd.io_1.4.3-1_amd64.deb
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt update
sudo apt install ./containerd.io_1.4.3-1_amd64.deb
sudo apt install -y docker-ce
sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:stable
sudo chmod 666 /var/run/docker.sock
sudo docker run -d --rm -p 80:5001/tcp coedl/elpis:latest
```

Then press "Create"