# Earth Species Project GCP Setup

This documents the process of setting up GCP for our team at Earth Species Project. It's a real-time log, not written after the fact ... exploratory programming for our infrastructure.

(Hmm ... would be fun to be able to drop markdown into a shell REPL.)

#### Our needs are roughly ...
* Ability to quickly spin up a notebook with high speed read/write access to the ESP Library Datasets.
* Ability to quickly share work with others (internally and externally)
  * Binder running on a dedicated low cost VM?
  * Ability to fork a notebook in our cluster
  * Links to Binder from GitHub
  * What happens if that Binder link gets flooded with traffic?
  * How do we deal with private datasets?

### Let's start digging!
Began searching around for modern GCP setups. First stumbled upon [AI Hub](https://cloud.google.com/ai-hub/docs/introduction). We're not using Kubernetes and this seems quite high level, but clicking through led me to this doc on PyTorch filled with lots of marketing buzzwords ...  
https://aihub.cloud.google.com/u/0/p/products%2F9e72e779-6542-4461-9cf1-605ba96769bf

and clicking the oddly framed "Use this asset: visit" took me to ...  
https://console.cloud.google.com/marketplace/details/click-to-deploy-images/deeplearning

Now we're just getting into the GCP marketplace.

Looks like this is a quick deploy of a Deep Learning VM:

![deploy](data/img/marketplace_vm.png)

OK not exactly what we want, but cool. Can use this as a benchmark.

### fast.ai
Let's see [what fast.ai recommends](https://course.fast.ai/start_gcp.html).

Even a quick prod shows that these docs are incorrect (outdated?).

A quick glance at the [ref docs for GCP machine types (**n2d-highmem-8**)](https://cloud.google.com/compute/docs/machine-types) ...


In particular (this is in scary red bold):
```
Caution: N2 machine types do not support GPUs.

N2 machine types are only available in select zones and regions. The following list shows the available N2 predefined machine types.
```

A quick search through the forums shows that [people are running into this](https://forums.fast.ai/t/google-cloud-platform/35907/53).

TODO: Fix fast.ai docs and submit a pull request.
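A tiny guard like the one below would have caught this before any instance was created. It is only a sketch: it encodes nothing beyond the caution quoted above, it assumes `n2d-*` behaves like `n2-*` (which is what the machine-type docs suggest here), and `n1-highmem-8` is just an example of a type the rule does not exclude.

```shell
#!/usr/bin/env bash
# Sketch: refuse to pair a GPU with an N2-family machine type.
# Assumption: only the quoted caution is encoded; n2d-* is treated like n2-*.

# Return 0 if this rule does not rule the machine type out, 1 otherwise.
machine_type_allows_gpu() {
  case "$1" in
    n2-*|n2d-*) return 1 ;;
    *)          return 0 ;;
  esac
}

# n1-highmem-8 is only an illustrative alternative, not a recommendation.
for mt in n2d-highmem-8 n1-highmem-8; do
  if machine_type_allows_gpu "$mt"; then
    echo "$mt: not excluded by the N2 caution"
  else
    echo "$mt: N2 family, no GPU support"
  fi
done
```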

### OK, back to GCP docs, then ...

Reading a bit more about their GPU VMs  
https://cloud.google.com/ai-platform/deep-learning-vm/docs/pytorch_start_instance#with_one_or_more_gpus_2

Gives us ...

```
export IMAGE_FAMILY="pytorch-latest-gpu"
export ZONE="us-west1-b"
export INSTANCE_NAME="my-instance"

gcloud compute instances create $INSTANCE_NAME \
  --zone=$ZONE \
  --image-family=$IMAGE_FAMILY \
  --image-project=deeplearning-platform-release \
  --maintenance-policy=TERMINATE \
  --accelerator="type=nvidia-tesla-v100,count=1" \
  --metadata="install-nvidia-driver=True"
```

Interesting bits here are
* **pytorch-latest-gpu** which defines the Python and CUDA versions
* and that it defaults to a **tesla v100**. Options are:
  * nvidia-tesla-v100 (count=1 or 8)
  * nvidia-tesla-p100 (count=1, 2, or 4)
  * nvidia-tesla-p4 (count=1, 2, or 4)
  * nvidia-tesla-k80 (count=1, 2, 4, or 8)
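
Since only certain counts work with each GPU type, it may be worth checking the pair before calling `gcloud`. The sketch below hard-codes the list above and prints the create command instead of running it; the function names are made up for this example, and the defaults are the values from the command earlier in this section.

```shell
#!/usr/bin/env bash
# Sketch: validate an accelerator type/count pair against the list above,
# then print (not run) the matching gcloud command.

# Echo the allowed counts for a GPU type; return 1 for an unknown type.
allowed_counts() {
  case "$1" in
    nvidia-tesla-v100) echo "1 8" ;;
    nvidia-tesla-p100) echo "1 2 4" ;;
    nvidia-tesla-p4)   echo "1 2 4" ;;
    nvidia-tesla-k80)  echo "1 2 4 8" ;;
    *) return 1 ;;
  esac
}

# Return 0 if the count is allowed for the GPU type.
valid_accelerator() {
  local gpu_type="$1" count="$2" allowed c
  allowed="$(allowed_counts "$gpu_type")" || return 1
  for c in $allowed; do
    [ "$c" = "$count" ] && return 0
  done
  return 1
}

# Print the create command for a validated pair; complain otherwise.
print_create_command() {
  local gpu_type="$1" count="$2"
  if ! valid_accelerator "$gpu_type" "$count"; then
    echo "unsupported accelerator: type=${gpu_type},count=${count}" >&2
    return 1
  fi
  echo "gcloud compute instances create ${INSTANCE_NAME:-my-instance}" \
    "--zone=${ZONE:-us-west1-b}" \
    "--image-family=${IMAGE_FAMILY:-pytorch-latest-gpu}" \
    "--image-project=deeplearning-platform-release" \
    "--maintenance-policy=TERMINATE" \
    "--accelerator=type=${gpu_type},count=${count}" \
    "--metadata=install-nvidia-driver=True"
}

print_create_command nvidia-tesla-p100 2
print_create_command nvidia-tesla-v100 2 || true   # rejected: not in the list
```

Printing rather than executing keeps this safe to run anywhere; pipe the output to `sh` once it looks right.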

And found this when doing a bit of searching about modern GCP stacks  
https://blog.kovalevskyi.com/deep-learning-images-for-google-cloud-engine-the-definitive-guide-bc74f5fb02bc

In particular, what caught my eye was

```proxy-mode=project_editors```

Which is GCP's new automatic proxy for notebook access (Dope!)
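
A rough sketch of how that key could ride along with the driver flag from the earlier command: `--metadata` takes comma-separated `KEY=VALUE` pairs, so the two are simply joined. The helper name is invented for this example, the command is printed rather than run, and whether the proxy needs anything else (access scopes, for instance) is not checked here.

```shell
#!/usr/bin/env bash
# Sketch: combine metadata pairs into one --metadata value and print the command.

# Join KEY=VALUE arguments with commas, the form gcloud's --metadata flag takes.
join_metadata() {
  local IFS=,
  echo "$*"
}

metadata="$(join_metadata "install-nvidia-driver=True" "proxy-mode=project_editors")"

# Printed only; the remaining flags are the ones from the create command above.
echo "gcloud compute instances create ${INSTANCE_NAME:-my-instance}" \
  "--zone=${ZONE:-us-west1-b}" \
  "--image-family=${IMAGE_FAMILY:-pytorch-latest-gpu}" \
  "--image-project=deeplearning-platform-release" \
  "--metadata=${metadata}"
```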

### Google Cloud AI Platform Notebooks

(Google is awesome at naming things.)

Then stumbled upon this post  
https://medium.com/google-cloud/using-google-cloud-ai-platform-notebooks-as-a-web-based-python-ide-e729e0dc6eed

Which is long and seemed pretty high level so I just skimmed it a bit and stumbled upon this little gem

```
Many of the commands in this post use the Google Cloud gcloud command-line tool. Most of these steps are possible with the AI Platform Notebooks UI, but using gcloud gives more control and better reproducibility.

You can install gcloud on your local machine, but if you don’t already have a machine with gcloud just use Cloud Shell, a web-based terminal session with everything you need already installed. If you already have a Google Cloud project and are logged in, click this link to launch Cloud Shell.
```

In particular, what's this about doing the _setup of the Notebook Platform from within a notebook?_

This led me to ...  
https://cloud.google.com/ai-platform-notebooks/ and then  
https://cloud.google.com/ai-platform/notebooks/docs  
https://cloud.google.com/ai-platform/notebooks/docs/before-you-begin  
https://cloud.google.com/ai-platform/notebooks/docs/create-new  

Boom! A few clicks and I have a notebook running, including fast.ai support!

![notebook](data/img/fastai_gcp_notebook.png)

Oh yeah, except for the inevitable GCP Quotas headache.
```
pytorch-fastai-test: Quota 'GPUS_ALL_REGIONS' exceeded. Limit: 0.0 globally.
```
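
The same limit can be read from the command line before trying to create anything. This is a sketch, assuming `gcloud` is installed and authenticated (Cloud Shell has both); `gpu_quota_row` is a helper made up for this example, and the row it keeps has the shape `metric,limit,usage`.

```shell
#!/usr/bin/env bash
# Sketch: print the project-wide GPU quota row.
# Assumption: okapi-274503 is the project ID that appears in this log.
PROJECT="${PROJECT:-okapi-274503}"

# Keep only the GPUS_ALL_REGIONS row from "metric,limit,usage" lines on stdin.
gpu_quota_row() {
  grep '^GPUS_ALL_REGIONS,'
}

if command -v gcloud >/dev/null 2>&1; then
  # Flatten the quotas list so each quota becomes one CSV row.
  gcloud compute project-info describe --project "$PROJECT" \
    --flatten="quotas[]" \
    --format="csv[no-heading](quotas.metric,quotas.limit,quotas.usage)" \
    | gpu_quota_row
else
  echo "gcloud not found; try this from Cloud Shell" >&2
fi
```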

https://cloud.google.com/compute/quotas  
Metric -> GPUs (All Regions) -> Edit

```
Compute Engine API
Thank you for submitting Case # (ID:23066692) to Google Cloud Platform support for the following quota:

    Change GPUs (all regions) from 0 to 3

Your request is being processed and you should receive an email confirmation for your request. Should you need further assistance, you can respond to that email.
```

OK. Now to wait on that request. It's 9:36p right now. Let's play with a CPU instance.

_Whoops._ Immediately got an email back ...

```
Hello,

We have received your quota request for okapi-274503.

Unfortunately, we are unable to grant you additional quota at this time. If  
this is a new project please wait 48h until you resubmit the request or  
until your Billing account has additional history.

Your Sales Rep is a good Escalation Path for these requests, and we highly  
recommend you to reach out to them.

If you have any further questions, please reply to this thread or feel free  
to reach out to us at gc-team@google.com.

Thanks.

Sincerely,
Cloud Platform Support
```

┌| ◔ ▃ ◔ |┐  

Fast forward 30 min of GCP billing admin hell ...  
**Fixed!** Only 1 GPU right now. Request for 3 was declined. Will try again next week.

### Sidebar on names
I had to choose a name for the GCP VM cluster. This was a bit of a rabbit hole.

![cow](data/img/cow.png)

It makes sense to name teams/infrastructure after different species. I settled on the Okapi for our AI team. (Slightly confused evolutionary GAN or perhaps from [Joel's](https://twitter.com/_joelsimon) [Artbreeder](https://artbreeder.com/)) 

![okapi](data/img/okapi_1901.jpg)

### OK, back to the notebook!

![worked](data/img/ai_platform_notebook.png)

Booting up Jupyter Lab ...

![jupyter](data/img/jupyter_lab.png)

OK. Let's see how easy it is to get FastAI v2. First to get my head around how remote terminal access works.

From a Jupyter Lab terminal ...
```
jupyter@pytorch-fastai-test:~$ pwd
/home/jupyter
jupyter@pytorch-fastai-test:~$ whoami
jupyter
jupyter@pytorch-fastai-test:~$ 
```

From "Cloud Shell" ...
```
britt@cloudshell:~ (okapi-274503)$ pwd
/home/britt
britt@cloudshell:~ (okapi-274503)$ whoami
britt
britt@cloudshell:~ (okapi-274503)$
```
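
So these are two different machines with two different users: the Jupyter Lab terminal runs on the VM itself, while Cloud Shell is a separate box. A sketch of hopping from Cloud Shell onto the VM with `gcloud compute ssh` follows. The zone is a placeholder because this log does not record which zone the notebook instance lives in, and the command is printed rather than run.

```shell
#!/usr/bin/env bash
# Sketch: build the command that opens a shell on the notebook VM from Cloud Shell.
# Assumption: the zone default below is a placeholder, not the VM's real zone.

# Print a gcloud compute ssh command for the given user, instance, project, zone.
ssh_command() {
  local user="$1" instance="$2" project="$3" zone="$4"
  echo "gcloud compute ssh ${user}@${instance} --project=${project} --zone=${zone}"
}

ssh_command jupyter pytorch-fastai-test okapi-274503 "${ZONE:-us-west1-b}"
```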

### Let's move this doc to a notebook
I'm currently writing these notes in a .md file on my local computer. Let's move it into an .ipynb on this new instance!

...

And let's commit it to GitHub!

Note: [Personal access token auth](https://help.github.com/en/github/authenticating-to-github/creating-a-personal-access-token-for-the-command-line) doesn't seem to work from [here](https://cloud.google.com/ai-platform/notebooks/docs/save-to-github). Need to investigate.

Tomorrow:
* Test the fastai2 lib
* Get a sample of acoustic files onto a shared [storage bucket](https://cloud.google.com/vision/docs/quickstart)
* Start transfer of larger datasets