# Deploying a Dask Hub

All this material is taken from the following docs:
- https://docs.dask.org/en/latest/setup/kubernetes-helm.html
- https://gateway.dask.org/install-kube.html
- https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/setup-kubernetes.html
- https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/setup-helm.html

## Creating a Kubernetes Cluster

First, you need to enable the Kubernetes and Compute APIs if not already done:
1. Go to console.cloud.google.com
2. Select the Kubernetes Engine in the menu: https://console.cloud.google.com/marketplace/product/google/container.googleapis.com
3. Enable the API if not already done.
4. Do the same for Compute Engine.

Next, you'll need a terminal with __gcloud__ and __kubectl__ installed. The easiest option is the Google Cloud Shell, available from console.cloud.google.com. If you prefer, follow the links above to install both tools on your own computer.

Ask Google Cloud to create a managed Kubernetes cluster and a default node pool to get nodes (VMs) from:

```
gcloud container clusters create \
  --machine-type n1-standard-4 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --num-nodes 1 \
  --zone europe-west4-b \
  dask-hub-k8s
```

This will take a few minutes (up to about 5). Once it finishes, check that the new cluster is listed:
```
gcloud container clusters list
```

You can then test if the cluster is running:
```
kubectl get node
```

Then get permissions to perform all administrative actions needed. 

**⚠️Don't forget to replace your email below.⚠️**

```
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=<GOOGLE-EMAIL-ACCOUNT>
```

## Setting up Helm

From your Google Cloud Shell or terminal:

```
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 | bash
helm list
```

should return:
```
NAME    NAMESPACE       REVISION        UPDATED STATUS  CHART   APP VERSION
```


## Helm install a Dask Hub

We'll use the default configuration for DaskHub, which installs JupyterHub with Dask Gateway on top of it. JupyterHub and Dask Gateway are designed for multiple users sharing a single deployment, which is more than we strictly need here, but this default setup is the easiest to install.

Make sure your Kubernetes cluster is up, then add Dask's Helm chart repository:

```
helm repo add dask https://helm.dask.org/
helm repo update
```

Generate tokens to configure Jupyterhub and Dask Gateway service:

```
openssl rand -hex 32  # generate token-1
openssl rand -hex 32  # generate token-2
```
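If `openssl` is not available in your terminal, Python's standard library can produce a comparable token. The sketch below relies on `secrets.token_hex`; the helper name `make_token` is hypothetical and only used here for illustration.

```python
import secrets


def make_token(num_bytes: int = 32) -> str:
    """Return a random hex string, comparable to `openssl rand -hex 32`."""
    # token_hex(n) draws n random bytes and encodes them as 2*n hex characters.
    return secrets.token_hex(num_bytes)


token_1 = make_token()  # intended for jupyterhub.proxy.secretToken
token_2 = make_token()  # intended for the dask-gateway apiToken entries
print(token_1)
print(token_2)
```

Each call returns a fresh 64-character string, so run it once per token and keep the values at hand for the configuration file.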

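Create the file below (for example using vim or the Cloud Shell editor) and **⚠️replace "token 1" and "token 2" with the generated token-1 and token-2 values⚠️**. Note that token 2 appears twice and must be identical in both places.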

```yaml
# file: daskhub-config.yaml
jupyterhub:
  proxy:
    secretToken: "token 1"
  hub:
    services:
      dask-gateway:
        apiToken: "token 2"
  scheduling:
    podPriority:
      enabled: true
    userPlaceholder:
      replicas: 1
    userScheduler:
      enabled: true
  singleuser:
    image:
      name: "pangeo/pytorch-notebook"
      tag: "2024.01.03"

dask-gateway:
  gateway:
    auth:
      jupyterhub:
        apiToken: "token 2"
    extraConfig:
      optionHandler: |
        from dask_gateway_server.options import Options, Integer, Float, String

        def options_handler(options):
          if ":" not in options.image:
            raise ValueError("When specifying an image you must also provide a tag")
          return {
            "worker_cores": options.worker_cores,
            "worker_memory": int(options.worker_memory * 2 ** 30),
            "image": options.image,
          }

        c.Backend.cluster_options = Options(
          Integer("worker_cores", default=1, min=1, max=4, label="Worker Cores"),
          Float("worker_memory", default=4, min=1, max=8, label="Worker Memory (GiB)"),
          String("image", default="pangeo/pytorch-notebook:2024.01.03", label="Image"),
          handler=options_handler,
        )

```
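To see what the `options_handler` in the configuration does before deploying it, the same logic can be exercised in plain Python. This is a minimal sketch: `SimpleNamespace` is only a stand-in (an assumption made for illustration) for the options object that Dask Gateway passes to the handler at runtime.

```python
from types import SimpleNamespace


def options_handler(options):
    # Same logic as in daskhub-config.yaml: require an explicit image tag
    # and convert the memory request from GiB to bytes.
    if ":" not in options.image:
        raise ValueError("When specifying an image you must also provide a tag")
    return {
        "worker_cores": options.worker_cores,
        "worker_memory": int(options.worker_memory * 2 ** 30),
        "image": options.image,
    }


# Hypothetical stand-in for the options object passed in by Dask Gateway.
requested = SimpleNamespace(
    worker_cores=1,
    worker_memory=3.0,
    image="pangeo/pytorch-notebook:2024.01.03",
)
config = options_handler(requested)
print(config["worker_memory"])  # 3 GiB expressed in bytes
```

An image name without a tag, such as `pangeo/pytorch-notebook`, makes the handler raise a `ValueError`, which is how untagged images get rejected.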


Now we just install Dask Hub:
```
helm upgrade --wait --install --render-subchart-notes \
    --namespace daskhub \
    --create-namespace \
    dhub dask/daskhub \
    --values=daskhub-config.yaml
```

This will again take a few minutes. Once it completes, check that the release is listed:
```
helm list -n daskhub
```

## Check install and go to Jupyter!

To get the public IP of your hub deployment:
```
kubectl --namespace=daskhub get service proxy-public
```

Copy the external IP and open it in your browser. You should be able to log in with any username/password.
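If you would rather script this step, the same service can be printed as JSON with `kubectl --namespace=daskhub get service proxy-public -o json` and parsed. The sketch below assumes the IP sits under `status.loadBalancer.ingress`, as it does for a LoadBalancer service; the helper `external_ip` and the sample payload (with a documentation-range IP) are hypothetical.

```python
import json


def external_ip(service_json: str):
    """Return the first load-balancer IP of a Service, or None if still pending."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    return ingress[0].get("ip") if ingress else None


# Hypothetical, trimmed output of:
#   kubectl --namespace=daskhub get service proxy-public -o json
sample = json.dumps({
    "kind": "Service",
    "metadata": {"name": "proxy-public", "namespace": "daskhub"},
    "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}},
})
print(external_ip(sample))

# While the load balancer is still being provisioned, no ingress entry exists yet.
pending = json.dumps({"status": {"loadBalancer": {}}})
print(external_ip(pending))
```

A `None` result simply means the external IP has not been assigned yet; wait a little and query the service again.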

# Ensure Dask is working, and K8S mechanisms too!

## Create a dask-gateway cluster

Using the Dask Gateway API:

```python
from dask_gateway import Gateway
gateway = Gateway()
```

Is there any existing Dask cluster in there?

```python
gateway.list_clusters()
```

### Launch a Dask cluster

```python
cluster = gateway.new_cluster(worker_cores=1, worker_memory=3.0)
cluster
```

This should display a fancy widget. You can open the Dask Dashboard from here. 

Now connect to the cluster, and scale it to get Dask workers.

```python
client = cluster.get_client()
cluster.scale(20)
```

#### _What's happening in your K8S cluster after some minutes?_

### Launch some computation, what about Pi?

We'll use Dask array, a Numpy extension for this:

```python
import dask.array as da

sample = 10_000_000_000  # <- this is huge!
xxyy = da.random.uniform(-1, 1, size=(2, sample))
norm = da.linalg.norm(xxyy, axis=0)
summ = da.sum(norm <= 1)
insiders = summ.compute()
pi = 4 * insiders / sample
print("pi ~= {}".format(pi))
```
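The estimate works because points drawn uniformly in the square [-1, 1] x [-1, 1] land inside the unit circle with probability pi/4. As a small-scale illustration of the same idea, here is a sketch using only the standard library instead of Dask; the function name `estimate_pi` and the sample size are arbitrary choices made for this example.

```python
import random


def estimate_pi(num_points: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi from points drawn in the square [-1, 1]^2."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        # A point is inside the unit circle when its squared norm is at most 1.
        if x * x + y * y <= 1:
            inside += 1
    # The fraction of points inside approximates pi / 4.
    return 4 * inside / num_points


print("pi ~= {}".format(estimate_pi(200_000)))
```

The error of such an estimate shrinks only with the square root of the number of points, which is why the Dask version uses billions of samples spread over many workers.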

You can watch the workers' activity on the Dask Dashboard, under the `/workers` page.

#### _How many workers did you get? Why?_

Some hints to find out
```
kubectl get pod -n daskhub
```
```
kubectl describe pod <YOUR_POD_NAME> -n daskhub
```
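One way to reason about the question is a back-of-the-envelope capacity check, assuming each worker pod requests the cores and memory passed to `new_cluster` and that the autoscaler stops at `--max-nodes`. The sketch below is only that: the helper `max_schedulable_workers` is hypothetical, and the per-node figures are illustrative assumptions (a node's allocatable resources are lower than the nominal size of the machine type, and the hub, proxy and notebook pods also take room). Read the real values from `kubectl describe node`.

```python
import math


def max_schedulable_workers(max_nodes, node_cpu, node_mem_gib,
                            worker_cores, worker_mem_gib):
    """Upper bound on workers that fit, ignoring every other pod on the nodes."""
    per_node_by_cpu = math.floor(node_cpu / worker_cores)
    per_node_by_mem = math.floor(node_mem_gib / worker_mem_gib)
    # The scarcer resource decides how many workers a single node can host.
    return max_nodes * min(per_node_by_cpu, per_node_by_mem)


# Illustrative assumptions, not measured values.
bound = max_schedulable_workers(
    max_nodes=10, node_cpu=3.9, node_mem_gib=12.0,
    worker_cores=1, worker_mem_gib=3.0,
)
requested = 20
print(min(requested, bound))
```

If the number of workers you actually got is lower than this bound, the `Events` section of `kubectl describe pod` for a pending worker usually tells you which resource was missing.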

Now just close the cluster.

```python
cluster.close()
```

#### _What happens after a few minutes?_

## Deleting a Kubernetes Cluster

Get your cluster name and zone:
```
gcloud container clusters list
```
Delete your Kubernetes cluster (it was created with `--zone`, so pass `--zone` here as well):
```
gcloud container clusters delete <YOUR_CLUSTER_NAME> --zone <YOUR_CLUSTER_ZONE>
```


## Access GCS Data

Just to check this is working well...

```python
import pandas as pd
df = pd.read_csv('gs://obd-dask/train.csv', nrows=1_000_000)
df
```

# Backup: If you're behind a corporate proxy (like me)

You can try enabling SSL (HTTPS) on the hub.
Be aware that this does not work with dask-gateway.

Generate a self signed certificate, and put it inside Kubernetes secrets:

```bash
openssl req -newkey rsa:2048 -nodes -keyout domain.key -x509 -days 365 -out domain.crt
kubectl --namespace daskhub create secret tls domain-tls --key="domain.key" --cert="domain.crt"
```

Update the YAML configuration file: in the `jupyterhub.proxy` section, enable https and point to your certificate:
```yaml
jupyterhub:
  proxy:
    https:
      enabled: true
      #      hosts:
      #  - 34.140.104.238
      type: secret
      secret:
        name: domain-tls
```

Refresh the Helm deployment:

```bash
helm upgrade --cleanup-on-fail --render-subchart-notes \
    --namespace daskhub \
    --create-namespace \
    dhub dask/daskhub \
    --values=daskhub-config.yaml
```