# Deploying a Dask Hub

All this material is taken from the following docs:
- https://docs.dask.org/en/latest/setup/kubernetes-helm.html
- https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/setup-kubernetes.html
- https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/setup-helm.html

## Creating a Kubernetes Cluster

First, you need to enable the Kubernetes API if not already done:
1. Go to console.cloud.google.com
2. Select the Kubernetes Engine in the menu: https://console.cloud.google.com/marketplace/product/google/container.googleapis.com
3. Enable the API.

Then you'll need a terminal with __gcloud__ and __kubectl__. The simplest option is to use the Google Cloud Shell from console.cloud.google.com. If you prefer, you can follow the links above to find out how to install everything on your own computer.

Ask Google Cloud to create a managed Kubernetes cluster and a default node pool to get nodes from:

```
gcloud container clusters create \
  --machine-type n1-standard-4 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --num-nodes 1 \
  --zone europe-west1-b \
  --cluster-version latest \
  dask-hub-k8s
```
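To get a feel for what the autoscaling bounds mean, the sketch below computes the raw capacity range of the node pool. It assumes the published shape of `n1-standard-4` (4 vCPUs, 15 GB of memory). The function name `pool_capacity` is purely illustrative, and the figures ignore the share of each node that Kubernetes reserves for system components.

```python
# Assumed machine shape for n1-standard-4 (from Google Cloud's published specs).
VCPUS_PER_NODE = 4
MEMORY_GB_PER_NODE = 15


def pool_capacity(nodes):
    """Return (total vCPUs, total memory in GB) for a given node count."""
    return nodes * VCPUS_PER_NODE, nodes * MEMORY_GB_PER_NODE


# Bounds taken from --min-nodes and --max-nodes in the command above.
for label, nodes in (("min", 1), ("max", 10)):
    vcpus, memory_gb = pool_capacity(nodes)
    print(f"{label}: {nodes} node(s), {vcpus} vCPUs, {memory_gb} GB")
```

The usable capacity for user and worker pods will be somewhat lower than these totals.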

You can test if the cluster is running:
```
kubectl get node
```

Then grant your account the permissions needed to perform all administrative actions. Don't forget to replace `<GOOGLE-EMAIL-ACCOUNT>` below with your own Google account email.

```
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=<GOOGLE-EMAIL-ACCOUNT>
```

## Setting up Helm

From your Google Cloud Shell or terminal:

```
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
helm list
```

Since nothing has been installed yet, `helm list` should return an empty list of releases:
```
NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION
```


## Helm install a Dask Hub for multiple users

We install a full Dask Hub for multiple users because:
- It enables automatic scaling of computing resources,
- If any of you is unable to deploy your own Dask cluster, you will be able to use someone else's.

Make sure your Kubernetes cluster is up and running, then add Dask's Helm chart repository:

```
helm repo add dask https://helm.dask.org/
helm repo update
```

Generate two tokens to configure JupyterHub and Dask Gateway:

```
openssl rand -hex 32  # generate token-1
openssl rand -hex 32  # generate token-2
```
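If `openssl` is not at hand, tokens of the same shape can be produced with Python's standard library. This is a minimal sketch, not part of the referenced docs; the variable names `token_1` and `token_2` are only illustrative.

```python
import secrets

# token_hex(32) returns 32 random bytes encoded as 64 hexadecimal characters,
# the same shape as the output of `openssl rand -hex 32`.
token_1 = secrets.token_hex(32)
token_2 = secrets.token_hex(32)

print("token-1:", token_1)
print("token-2:", token_2)
```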

Create the file below (for example using vi or any editor) and substitute those two values for `<token-1>` and `<token-2>`.
    
```
# file: secrets.yaml
jupyterhub:
  proxy:
    secretToken: "<token-1>"
  hub:
    services:
      dask-gateway:
        apiToken: "<token-2>"
  scheduling:
    podPriority:
      enabled: true
    userPlaceholder:
      # Number of dummy user pods used as placeholders
      replicas: 1
    userScheduler:
      enabled: true
  singleuser:
    image:
      name: pangeo/pangeo-notebook  # Image to use for singleuser environment. Must include dask-gateway.
      tag: 2021.01.16

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "<token-2>"
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String

        def options_handler(options):
          if ":" not in options.image:
            raise ValueError("When specifying an image you must also provide a tag")
          return {
            "worker_cores": options.worker_cores,
            "worker_memory": int(options.worker_memory * 2 ** 30),
            "image": options.image,
          }

        c.Backend.cluster_options = Options(
          Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
          Float("worker_memory", default=1, min=1, max=8, label="Worker Memory (GiB)"),
          String("image", default="pangeo/pangeo-notebook:2021.01.16", label="Image"),
          handler=options_handler,
        )
```
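The `options_handler` in the file above does two things: it rejects an image given without a tag, and it converts the requested memory from GiB to bytes. The sketch below replays that logic outside the gateway so you can see the values it produces. The `SimpleNamespace` object is a hypothetical stand-in for the options object that Dask Gateway passes to the handler in a real deployment.

```python
from types import SimpleNamespace


def options_handler(options):
    # Same logic as the handler in secrets.yaml.
    if ":" not in options.image:
        raise ValueError("When specifying an image you must also provide a tag")
    return {
        "worker_cores": options.worker_cores,
        # Memory is requested in GiB and converted to bytes (1 GiB = 2**30 bytes).
        "worker_memory": int(options.worker_memory * 2 ** 30),
        "image": options.image,
    }


# Hypothetical stand-in for the options object provided by Dask Gateway.
requested = SimpleNamespace(
    worker_cores=1,
    worker_memory=3.0,
    image="pangeo/pangeo-notebook:2021.01.16",
)
config = options_handler(requested)
print(config["worker_memory"])  # 3 GiB expressed in bytes
```

Passing an image such as `pangeo/pangeo-notebook` without a tag would raise the `ValueError` instead.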

Now we just install Dask Hub:
```
helm upgrade --wait --install --render-subchart-notes \
    --namespace daskhub \
    --create-namespace \
    dhub dask/daskhub \
    --values=secrets.yaml
```

## Check install and go to Jupyter!

To get the public IP of your hub deployment:
```
kubectl --namespace=daskhub get service proxy-public
```

Copy the external IP from the output and open it in your browser. You should be able to log in with any username and password.
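If you would rather read the address programmatically, the sketch below extracts it from the JSON form of the Service, assuming the usual `status.loadBalancer.ingress` layout of a LoadBalancer Service. The helper names `external_ip` and `fetch_proxy_public` are illustrative, and the sample payloads are made up for the demonstration (203.0.113.10 is a documentation-range address).

```python
import json
import subprocess


def external_ip(service):
    """Return the first load-balancer IP of a Service object, or None if pending."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return ingress[0].get("ip") if ingress else None


def fetch_proxy_public():
    """Ask kubectl for the proxy-public Service as JSON (needs cluster access)."""
    output = subprocess.run(
        ["kubectl", "--namespace=daskhub", "get", "service", "proxy-public", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(output)


# Illustrative payloads only, not real output from a cluster.
ready = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
pending = {"status": {"loadBalancer": {}}}
print(external_ip(ready))
print(external_ip(pending))
```

From a terminal where `kubectl` is configured, `external_ip(fetch_proxy_public())` would return the address once the load balancer has been provisioned, and `None` while it is still pending.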

## Ensure Dask is working, and K8S mechanisms too!

Open a notebook in your newly created Dask-enabled hub, then copy and paste the following cells:

### Connect to Dask gateway

```
from dask_gateway import Gateway
# Use values stored in your local configuration (recommended)
gateway = Gateway()
```

Is there any existing Dask cluster in there?

```
gateway.list_clusters()
```

### Launch a Dask cluster

```
cluster = gateway.new_cluster(worker_cores=1, worker_memory=3.0)
cluster
```

This should display a fancy widget. You can open the Dask Dashboard from here. 

Now connect to the cluster, and scale it to get Dask workers.

```
client = cluster.get_client()
cluster.scale(20)
```

#### _What's happening in your K8S cluster?_

### Launch some computation, what about Pi?

We'll use Dask array, a NumPy extension, for this:

```
import dask.array as da

sample = 10_000_000_000  # <- this is huge!
xxyy = da.random.uniform(-1, 1, size=(2, sample))
norm = da.linalg.norm(xxyy, axis=0)
summ = da.sum(norm <= 1)
insiders = summ.compute()
pi = 4 * insiders / sample
print("pi ~= {}".format(pi))
```
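The estimate works because points drawn uniformly in the square [-1, 1] x [-1, 1] fall inside the unit disc with probability pi / 4. The sketch below applies the same idea in plain Python on a much smaller sample, without Dask, so you can check the reasoning locally. The function name `estimate_pi` and the fixed seed are illustrative choices, not part of the original notebook.

```python
import random


def estimate_pi(sample, seed=0):
    """Estimate pi from the share of uniform points in [-1, 1]^2 inside the unit disc."""
    rng = random.Random(seed)
    insiders = 0
    for _ in range(sample):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            insiders += 1
    # Disc area / square area = pi / 4, so pi ~= 4 * insiders / sample.
    return 4 * insiders / sample


print("pi ~= {}".format(estimate_pi(200_000)))
```

The Dask version follows the same steps, but splits the random array into chunks that the workers process in parallel.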

#### _How many workers did you get? Why?_

Now just close the cluster.

```
cluster.close()
```

#### _What happens after a few minutes?_

## Access GCS Data

```
import pandas as pd
df = pd.read_csv('gs://obd-dask/train.csv', nrows=1_000_000)
df
```