# Controlboard for Eraneos Data Science
* [Initialization](#Initialization)
* [Jupyter Datascience-Notebook](#Jupyter-Data-Science-Notebook)
* [Debug Kubernetes](#Debug-Kubernetes)

Available stacks to plug into Jupyter:
* [PostgreSQL Database](#PostgreSQL-Database)
* [MySQL Database](#MySQL-Database)
* [Cloudbeaver](#Cloudbeaver)
* [Airflow](#Airflow)
* [Elasticsearch and Kibana](#Elasticsearch-and-Kibana)


<div class="alert alert-block alert-warning">
<b>Getting started</b> Check out the [knowhow folder](./knowhow). It contains valuable code snippets and best practices.
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> To run this Notebook, certain Kubernetes privileges are necessary. The Notebook can only be run with a "Controlboard" Jupyter Notebook featuring a dark/black GUI.
</div>

<div class="alert alert-block alert-info">
<b>Note:</b> This Notebook runs with very few resources (CPU, RAM/memory) - be patient! Use a normal Jupyter Datascience-Notebook (below) for any calculations.
</div>

To start a Controlboard Jupyter Notebook:
* Windows: use `run_controlboard.cmd`
* Linux: use `--set controlboard=true`:
  ```console
  helm upgrade -i -n myproject --create-namespace -f myvalues.yaml --wait controlboard jupyter/ --set controlboard=true
  ```

## Initialization
Always run this code before doing anything else. It reads the settings from your `myvalues.yaml` file; you can override them in the cells below.

In [None]:
import yaml
import kubernetes
import secrets
import base64
from sqlalchemy import create_engine, MetaData
from sqlalchemy_utils import database_exists, create_database
from urllib import parse
from cryptography.fernet import Fernet

from modules import k8swrapper as k8s, db_utils

kubernetes.config.load_incluster_config()
api = kubernetes.client.CoreV1Api()

In [None]:
myvalues_path = './myvalues.yaml'

with open(myvalues_path) as f:
    myvalues = yaml.safe_load(f.read())
namespace = myvalues['namespace']
jupyter_release_name = myvalues['jupyterReleaseName']
sourcecode_dir = myvalues['sourcecodeDirectory']
data_dir = myvalues['dataDirectory']

print(f'Using Kubernetes namespace {namespace}, jupyterReleaseName {jupyter_release_name}')
print(f'Sourcecode directory: {sourcecode_dir}')
print(f'Data directory: {data_dir}')
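
The cell above fails with a bare `KeyError` if a setting is missing from `myvalues.yaml`. A minimal sketch of a friendlier check, assuming the four keys read above are the only required ones; `require_keys` is a hypothetical helper and not part of `modules`:

```python
REQUIRED_KEYS = ('namespace', 'jupyterReleaseName', 'sourcecodeDirectory', 'dataDirectory')

def require_keys(values, required=REQUIRED_KEYS):
    """Return the required settings, or raise one error naming every missing key."""
    missing = [key for key in required if key not in values]
    if missing:
        raise KeyError(f"Missing in myvalues.yaml: {', '.join(missing)}")
    return {key: values[key] for key in required}

# In-memory dict standing in for the parsed YAML file (placeholder values)
example = {
    'namespace': 'myproject',
    'jupyterReleaseName': 'jupyter',
    'sourcecodeDirectory': '/home/me/src',
    'dataDirectory': '/home/me/data',
}
settings = require_keys(example)
print(settings['namespace'])
```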

***
## Jupyter Data Science-Notebook
### Start or reconnect to an already running Jupyter Kubernetes Pod
If this command does not print a URL, simply re-run the cell. If you get a "Bad Gateway" error, wait a bit and refresh the page.

In [None]:
! helm upgrade --install -f {myvalues_path} {jupyter_release_name} jupyter

### Delete the Jupyter Pod
This will remove the Jupyter pod (stopping is not possible with Kubernetes). Only data in the ```sourcecode``` and ```data``` directories will be retained. Copy and paste this code into a cell to execute it.

```console
! helm delete {jupyter_release_name}
```


### Cleanup: delete secret
The secret contains the token to access Jupyter's web GUI. To delete it (and get a new token next time) type:
```console
! kubectl delete secret -l app.kubernetes.io/instance={jupyter_release_name}
```

***
## PostgreSQL Database

[PostgreSQL](https://www.postgresql.org) is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.

Check out the [Database getting started Jupyter notebook](./knowhow/database_getting_started.ipynb) for code snippets! If you want to tune this stack, check out [its Helm chart](https://github.com/bitnami/charts/tree/master/bitnami/postgresql).

### PostgreSQL passwords
Retrieves an existing or creates a new k8s secret with a couple of strong passwords. Writes the passwords to an .env file so you can use them securely in other Notebooks without k8s access.

In [None]:
# Name of the Kubernetes secret
postgresql_secret = 'postgresql'
# Path and filename of the environment file to save the sensitive data for other Notebooks
postgresql_file = './postgres.env'
# Username of the "normal" database user - do NOT use postgres
postgresql_user = 'dbuser'

# Create a new secret with random strong passwords
# or retrieve the values of an existing k8s secret
secret = k8s.create_or_get_secret(
    api,
    postgresql_secret,
    namespace,
    {
        # Admin password for user "postgres"
        'postgres-password': k8s.alphanumeric_password(16),
        # Password for the normal user (define username above)
        'password': k8s.alphanumeric_password(16),
        # Replication password for user "repl_user"
        'replication-password': k8s.alphanumeric_password(16)
    })
# Write secrets to an *.env file in order to use them
# securely in another Notebook.
k8s.dict_to_env_file(
    postgresql_file,
    {
        'USERNAME': postgresql_user,
        'PASSWORD': secret['password'],
        'ADMIN_PASSWORD': secret['postgres-password'],
        'REPLICATION_PASSWORD': secret['replication-password']
    })
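
The implementation of `k8s.alphanumeric_password` is not shown in this Notebook. As an illustration of what such a helper could look like, here is a sketch based on Python's `secrets` module; the name `alphanumeric_password_sketch` is hypothetical and the real helper may differ:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def alphanumeric_password_sketch(length):
    """Draw `length` characters from a cryptographically secure random source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

password = alphanumeric_password_sketch(16)
print(len(password))
```

Restricting the alphabet to letters and digits avoids characters that would need escaping in shell commands or connection URLs.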

### Start the stack

In [None]:
postgresql_release = 'postgresql'
version = '12.5.5'

! helm upgrade -i --version {version} {postgresql_release} bitnami/postgresql \
    --set auth.username={postgresql_user} \
    --set auth.existingSecret={postgresql_secret}

### Remove the stack (database will be retained)

```console
! helm delete $postgresql_release
```

### Cleanup: delete secret and database
The command above will leave both the Kubernetes Secret and a PVC `data-postgresql-0` (where your PostgreSQL data is stored!). The Secrets are needed to access the data!! To delete secrets and data, manually type:
```console
! kubectl delete secret {postgresql_secret}
! kubectl delete pvc -l app.kubernetes.io/instance={postgresql_release}
! rm -f {postgresql_file}
```

***
## MySQL Database

[MySQL](https://www.mysql.com) is another popular open source database.

Check out the [Database getting started Jupyter notebook](./knowhow/database_getting_started.ipynb) for code snippets! If you want to tune this stack, check out [its Helm chart](https://github.com/bitnami/charts/tree/master/bitnami/mysql).

### MySQL passwords
Retrieves an existing or creates a new k8s secret with a couple of strong passwords. Writes the passwords to an .env file so you can use them securely in other Notebooks without k8s access.

In [None]:
# Name of the Kubernetes secret
mysql_secret = 'mysql'
# Path and filename of the environment file to save the sensitive data for other Notebooks
mysql_file = './mysql.env'
# Username of the "normal" database user
mysql_user = 'dbuser'

# Create a new secret with random strong passwords
# or retrieve the values of an existing k8s secret
secret = k8s.create_or_get_secret(
    api,
    mysql_secret,
    namespace,
    {
        # Admin password for user "root"
        'mysql-root-password': k8s.alphanumeric_password(16),
        # Password for the normal user (define username above)
        'mysql-password': k8s.alphanumeric_password(16),
        # Replication password for user "replicator"
        'mysql-replication-password': k8s.alphanumeric_password(16)
    })
# Write secrets to an *.env file in order to use them
# securely in another Notebook.
k8s.dict_to_env_file(
    mysql_file,
    {
        'USERNAME': mysql_user,
        'PASSWORD': secret['mysql-password'],
        'ROOT_PASSWORD': secret['mysql-root-password'],
        'REPLICATION_PASSWORD': secret['mysql-replication-password']
    })
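
To use these credentials in another Notebook, the `*.env` file has to be read back. The sketch below assumes the file contains plain `KEY=VALUE` lines; the exact format written by `k8s.dict_to_env_file` is not shown here, so treat `read_env_file` as a hypothetical counterpart:

```python
import os
import tempfile

def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue
            key, _, value = line.partition('=')
            values[key.strip()] = value.strip()
    return values

# Temporary file standing in for ./mysql.env (placeholder values)
with tempfile.NamedTemporaryFile('w', suffix='.env', delete=False) as tmp:
    tmp.write('USERNAME=dbuser\nPASSWORD=not-a-real-password\n')
credentials = read_env_file(tmp.name)
os.remove(tmp.name)
print(credentials['USERNAME'])
```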

### Start the stack

In [None]:
mysql_release = 'mysql'
version = '9.10.1'

! helm upgrade -i --version {version} {mysql_release} bitnami/mysql \
    --set auth.username={mysql_user} \
    --set auth.createDatabase=false \
    --set auth.existingSecret={mysql_secret}

### Remove the stack (database will be retained)

```console
! helm delete $mysql_release
```

### Cleanup: delete secret and database
The command above will leave both the Kubernetes Secret and a PVC (typically `data-mysql-0`, where your MySQL data is stored!). The Secret is needed to access the data!! To delete secrets and data, manually type:
```console
! kubectl delete secret $mysql_secret
! kubectl delete pvc -l app.kubernetes.io/instance=$mysql_release
! rm -f {mysql_file}
```

***
## Cloudbeaver
Manage your database from your browser: [Cloudbeaver](https://cloudbeaver.io/)

### Start the stack

In [None]:
! helm upgrade -i cloudbeaver ./cloudbeaver

### Optional: Cloudbeaver admin password
You currently CANNOT log in as admin since we're using anonymous login. This will save the credentials to an `*.env` file.

In [None]:
# Name of the Kubernetes secret
cloudbeaver_secret = 'cloudbeaver'
# Path and filename of the environment file to save the sensitive data for other Notebooks
cloudbeaver_file = './cloudbeaver.env'

# Grab the secret that was deployed above when installing Cloudbeaver;
# Write secrets to an *.env file in order to use them
# securely in another Notebook.
k8s.dict_to_env_file(cloudbeaver_file, k8s.get_secret(api, cloudbeaver_secret, namespace))

### Remove the stack (Cloudbeaver settings will be retained)

```console
! helm delete cloudbeaver
```

### Cleanup: delete Cloudbeaver settings
The command above will leave a PVC `cloudbeaver` (where your Cloudbeaver settings are stored). To delete this data, manually type:
```console
! kubectl delete pvc -l app.kubernetes.io/instance=cloudbeaver
! rm -f ./cloudbeaver.env
```

***
## Airflow
[Apache Airflow](https://airflow.apache.org/docs) is a platform created by the community to programmatically author, schedule and monitor workflows. Especially useful in conjunction with [dbt](https://docs.getdbt.com/docs/introduction). For customization, check the [helm chart](https://github.com/bitnami/charts/tree/master/bitnami/airflow) and set values accordingly in `./airflow/values.yaml`

### Preparations
Make sure that you have started your PostgreSQL DB above. We need a SQLAlchemy `Engine` object to work with the PostgreSQL DB:

In [None]:
airflow_table = 'airflow'

airflow_engine = db_utils.create_db_engine(
    f'{postgresql_release}.{namespace}.svc.cluster.local',
    5432,
    'postgresql',
    postgresql_user,
    k8s.get_secret_key(api, postgresql_secret, namespace, 'password'),
    airflow_table)
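
`db_utils.create_db_engine` is a project helper whose source is not shown here. A plausible core step is assembling a SQLAlchemy-style connection URL from these arguments. The sketch below illustrates this with a hypothetical `build_db_url` function; the password is percent-encoded so that special characters do not break the URL:

```python
from urllib.parse import quote_plus

def build_db_url(host, port, dialect, user, password, database):
    """Assemble dialect://user:password@host:port/database with encoded credentials."""
    return f'{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}'

# Placeholder values; inside the cluster the host follows
# <release>.<namespace>.svc.cluster.local as in the cell above
url = build_db_url('postgresql.myproject.svc.cluster.local', 5432,
                   'postgresql', 'dbuser', 'p@ss/word', 'airflow')
print(url)
```

A URL of this shape can be passed to `sqlalchemy.create_engine`.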

#### Create an airflow DB table
Create a dedicated table for Airflow (it also holds the Airflow username and password for logging in). If you start from scratch, make sure this command returns `True` (the table was actually created). Otherwise a table already exists and may contain an old (wrong) login password!

In [None]:
db_utils.create_table_if_not_exist(airflow_engine)

#### Create a secret for passwords
Note: there's currently a bug in the helm chart, hence this workaround

In [None]:
# You cannot use 'airflow', helm chart needs that
airflow_secret = 'myownairflowsecret'
# Username for web gui
airflow_user = 'user'
# Path and filename of the environment file to save the sensitive data for other Notebooks
airflow_file = './airflow.env'

# Create a new secret with random strong passwords
# or retrieve the values of an existing k8s secret
secret = k8s.create_or_get_secret(
    api,
    airflow_secret,
    namespace,
    {'airflow-password': k8s.alphanumeric_password(16),
     'airflow-fernet-key': Fernet.generate_key().decode(),
     'airflow-secret-key': k8s.alphanumeric_password(64)})
# Write secrets to an *.env file in order to use them
# securely in another Notebook.
k8s.dict_to_env_file(
    airflow_file,
    {
        'USERNAME': airflow_user,
        'PASSWORD': secret['airflow-password'],
        'FERNET_KEY': secret['airflow-fernet-key'],
        'SECRET_KEY': secret['airflow-secret-key']
    })
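
A Fernet key is 32 random bytes in URL-safe base64 encoding, which is why it is generated with `Fernet.generate_key()` rather than the alphanumeric password helper. The stdlib-only sketch below produces a key of the same shape for illustration; `fernet_style_key` is a hypothetical name, and in this Notebook you should keep using `Fernet.generate_key()`:

```python
import base64
import os

def fernet_style_key():
    """Return 32 random bytes, URL-safe base64 encoded (44 ASCII characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()

key = fernet_style_key()
print(len(key))
```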

#### Create a git repo for the files you want to feed to Airflow
Create a (public or private) Repo that is accessible to Kubernetes. Alternatively, you could mount your files using a Kubernetes ConfigMap - but then you would have to delete and redeploy Airflow on every file change! If you just want to test, keep the current values.

If you use a private repository from GitHub, a possible option to clone the files is using a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and using it as part of the URL: https://USERNAME:PERSONAL_ACCESS_TOKEN@github.com/USERNAME/REPOSITORY
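
A small sketch of assembling such a URL in Python. `github_url_with_token` is a hypothetical helper and all values are placeholders; user name and token are percent-encoded in case they contain characters that are not URL-safe:

```python
from urllib.parse import quote

def github_url_with_token(username, token, repository):
    """Build https://USERNAME:TOKEN@github.com/USERNAME/REPOSITORY."""
    return (f'https://{quote(username, safe="")}:{quote(token, safe="")}'
            f'@github.com/{username}/{repository}')

# Placeholder values only; never commit a real token to a Notebook
repo_url = github_url_with_token('USERNAME', 'PERSONAL_ACCESS_TOKEN', 'REPOSITORY')
print(repo_url)
```

Keep in mind that a token embedded in the URL is handed to helm as a plain value.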

You can mount as many git Repos as you want.

See the [helm chart's "Load DAG Files"](https://github.com/bitnami/charts/tree/master/bitnami/airflow#load-dag-files) for more info.
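
Each repository gets its own index in the `--set airflow.git.dags.repositories[N]...` flags that are passed to helm when starting the stack. The sketch below generates these flags for a list of repositories. `git_repo_flags` is a hypothetical helper and the second repository is made up for illustration:

```python
def git_repo_flags(repos, prefix='airflow.git.dags.repositories'):
    """Turn a list of repo dicts into helm --set arguments, one index per repo."""
    flags = []
    for index, repo in enumerate(repos):
        for field in ('name', 'repository', 'branch'):
            flags.append(f'--set {prefix}[{index}].{field}={repo[field]}')
    return flags

repos = [
    {'name': 'SomeAirflowExamples',
     'repository': 'https://github.com/ThomasKat/airflow_example_dags',
     'branch': 'main'},
    # Hypothetical second repository, for illustration only
    {'name': 'MyOwnDags',
     'repository': 'https://github.com/USERNAME/REPOSITORY',
     'branch': 'main'},
]
print(' \\\n    '.join(git_repo_flags(repos)))
```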

In [None]:
# Do NOT use any spaces or NON-ASCII character!!
airflow_git_name = 'SomeAirflowExamples'
airflow_git_repo = 'https://github.com/ThomasKat/airflow_example_dags'
airflow_git_branch = 'main'

### Start the stack

In [None]:
# Name of the helm release
airflow_release = 'airflow'

! helm upgrade -i {airflow_release} airflow \
    --render-subchart-notes \
    --set airflow.web.baseUrl=localhost/{namespace}/airflow \
    --set airflow.auth.username={airflow_user} \
    --set airflow.auth.password={k8s.get_secret_key(api, airflow_secret, namespace, 'airflow-password')} \
    --set airflow.auth.fernetKey={k8s.get_secret_key(api, airflow_secret, namespace, 'airflow-fernet-key')} \
    --set airflow.auth.secretKey={k8s.get_secret_key(api, airflow_secret, namespace, 'airflow-secret-key')} \
    --set airflow.externalDatabase.host={postgresql_release}.{namespace}.svc.cluster.local \
    --set airflow.externalDatabase.port=5432 \
    --set airflow.externalDatabase.user={postgresql_user} \
    --set airflow.externalDatabase.existingSecret={postgresql_secret} \
    --set airflow.externalDatabase.existingSecretPasswordKey=password \
    --set airflow.externalDatabase.database={airflow_table} \
    --set airflow.git.dags.repositories[0].name={airflow_git_name} \
    --set airflow.git.dags.repositories[0].repository={airflow_git_repo} \
    --set airflow.git.dags.repositories[0].branch={airflow_git_branch}

### Delete the stack
This will delete almost all Kubernetes resources for Airflow, but leaves the secrets (=passwords) and data (stored in a PostgreSQL table) intact.
```console
! helm delete $airflow_release
```

### Cleanup: delete Airflow data and secrets
Airflow persists data in PostgreSQL (including e.g. log-in credentials). Delete the entire table:
```Python
db_utils.drop_table(airflow_engine)
```

Delete Airflow's Kubernetes secrets and PVCs. The Secret is needed to e.g. persist Airflow's web GUI login credentials.
```console
! kubectl delete secrets -l app.kubernetes.io/instance={airflow_release}
! kubectl delete secrets {airflow_secret}
! kubectl delete pvc -l app.kubernetes.io/instance={airflow_release}
! rm -f {airflow_file}
```

***
## Elasticsearch and Kibana
[Elasticsearch](https://www.elastic.co/products/elasticsearch) is a distributed search and analytics engine. It is used for web search, log monitoring, and real-time analytics. Ideal for Big Data applications.
> This will install Elasticsearch without any security measures enabled - elasticsearch won't ask for passwords. Check the [Helm chart](https://github.com/bitnami/charts/tree/master/bitnami/elasticsearch) to learn more

This will also install [Kibana](https://www.elastic.co/kibana), a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack.

### Start the stack

In [None]:
elastic_release = 'elastic'

! helm upgrade -i $elastic_release elastic \
    --set elasticsearch.kibana.configuration.server.basePath=/$namespace/kibana

### Example usage in your normal, white Jupyter Notebook
Get the Python client for elasticsearch:
```console
! pip install elasticsearch
```
Then, following the [elasticsearch Python client manual](https://elasticsearch-py.readthedocs.io/en/v8.3.3/), push an entry to elasticsearch, then search for it:
```Python
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch("http://elastic-elasticsearch:9200")

doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
    'timestamp': datetime.now(),
}
resp = es.index(index="test-index", id=1, document=doc)
print(resp['result'])

resp = es.get(index="test-index", id=1)
print(resp['_source'])

es.indices.refresh(index="test-index")

resp = es.search(index="test-index", query={"match_all": {}})
print("Got %d Hits:" % resp['hits']['total']['value'])
for hit in resp['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
```

### Delete the stack
This will delete all elasticsearch Kubernetes pods, services, etc., but leaves the data intact (stored in Kubernetes PVCs)
```console
! helm delete $elastic_release
```

### Cleanup: delete elasticsearch database
This command will delete all elasticsearch Kubernetes PVCs and thus elasticsearch's database
```console
! kubectl delete pvc -l app.kubernetes.io/instance=$elastic_release
```

***
# Debug Kubernetes

## Debugging helm
[helm](https://helm.sh/docs/) is a way of packaging and installing K8S stuff easily. If you use a certain ["helm chart"](https://helm.sh/docs/topics/charts), you will have your own "helm release" running in your K8S. In general, helm will likely create K8S [secrets](https://kubernetes.io/docs/concepts/configuration/secret/) as well as [PVCs](https://kubernetes.io/docs/concepts/storage/persistent-volumes/), but will NOT delete them once you delete a release. If you e.g. re-install a helm chart, this can lead to you not being able to log-in anymore or other weird errors. 

List all the helm releases that are currently running in your current K8S namespace

In [None]:
! helm list

Get more info on a certain helm release from above; you will probably only use "notes".

In [None]:
helm_release = 'airflow'
# available commands: all, hooks, manifest, notes, values
get_what = 'notes'

! helm get $get_what $helm_release

If something does not work: helm (and K8S) is designed so that you can tear down whatever is broken and simply try again. Again, be aware that helm might leave secrets and/or PVCs intact. To remove a certain helm release, type
```console
helm_release = 'airflow'
! helm delete $helm_release
```

List the remaining [secrets](https://kubernetes.io/docs/concepts/configuration/secret/) and [PVCs](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) for that particular helm release:

In [None]:
# You will probably only need: secret, pvc
k8s_object = 'secret'
! kubectl get $k8s_object -l app.kubernetes.io/instance=$helm_release
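
Kubernetes stores Secret values base64-encoded in the `.data` field, so `kubectl get secret NAME -o json` does not show them in plain text. A small sketch for decoding that output; `decode_secret_data` is a hypothetical helper and the sample JSON is made up:

```python
import base64
import json

def decode_secret_data(secret_json):
    """Decode the base64 values in a Secret's .data mapping into plain strings."""
    secret = json.loads(secret_json)
    return {key: base64.b64decode(value).decode()
            for key, value in secret.get('data', {}).items()}

# Made-up sample in the shape of `kubectl get secret NAME -o json`
sample = json.dumps({
    'kind': 'Secret',
    'data': {'password': base64.b64encode(b'not-a-real-password').decode()},
})
print(decode_secret_data(sample))
```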

To delete **ALL** of the listed objects, type
```console
! kubectl delete $k8s_object -l app.kubernetes.io/instance=$helm_release
```

## Debugging Pods

A [pod](https://kubernetes.io/de/docs/concepts/workloads/pods/) is the smallest unit K8S works with. It consists of one or more containers. Pods can get stuck, fail to start, fail to upgrade, etc.

Show all pods in the current namespace. If one is not `Running`, that's where you start looking.

In [None]:
! kubectl get pods
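
With many pods, the check can be scripted. The sketch below filters the output of `kubectl get pods -o json` for pods whose phase is neither `Running` nor `Succeeded`. `pods_needing_attention` is a hypothetical helper and the sample data is made up:

```python
import json

def pods_needing_attention(pods_json):
    """Return (name, phase) for pods whose phase is neither Running nor Succeeded."""
    pods = json.loads(pods_json)
    return [(item['metadata']['name'], item['status'].get('phase', 'Unknown'))
            for item in pods.get('items', [])
            if item['status'].get('phase') not in ('Running', 'Succeeded')]

# Made-up sample in the shape of `kubectl get pods -o json`
sample = json.dumps({'items': [
    {'metadata': {'name': 'airflow-web-0'}, 'status': {'phase': 'Running'}},
    {'metadata': {'name': 'airflow-worker-0'}, 'status': {'phase': 'Pending'}},
]})
print(pods_needing_attention(sample))
```

This is only a first filter: the STATUS column of `kubectl get pods` can show container-level states such as `CrashLoopBackOff` while the pod phase is still `Running`.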

Show detailed K8S information for a certain pod (its configuration, current status, and recent events)

In [None]:
pod = 'airflow-worker-0'
! kubectl describe pod $pod

Dive into a pod and show its log files. If the pod has not started completely, this command might fail

In [None]:
! kubectl logs $pod

If you want to log into a pod and get shell access, start a new terminal (do NOT use the Notebook here) and type the following
```console
export pod=airflow-worker-0
kubectl exec -it $pod -- /bin/bash
```
Type `exit` to close the shell. You might need to replace `/bin/bash` with the correct shell; check the pod's original container image.

## Debugging other K8S objects than pods
You should never directly mess with pods, but they'll give you info on what's not working. Pods are created e.g. through a "Deployment" or a "Statefulset".
Choose a K8S object that you're interested in:

In [None]:
# Other commonly used objects: ingressroute, service, deployment, statefulset, configmap, secret, pvc, pv
k8s_object = 'secret'

List all such objects in the current namespace

In [None]:
! kubectl get $k8s_object

Pick a certain object name above and get all the K8S info for it

In [None]:
instance = 'airflow'
! kubectl describe $k8s_object $instance

## Housekeeping, e.g. disk space
Kubernetes is NOT made to let you manually mess with container images, e.g. deleting them. This is automatically done by K8S. 

List all container images in the current (!!) namespace (images take up disk space)

In [None]:
! kubectl get pods -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
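
The same count can be produced in Python from the output of `kubectl get pods -o json`, which is handy if you want to process the result further. `count_images` is a hypothetical helper and the sample data is made up:

```python
import json
from collections import Counter

def count_images(pods_json):
    """Count how often each container image is used across all listed pods."""
    pods = json.loads(pods_json)
    return Counter(container['image']
                   for item in pods.get('items', [])
                   for container in item['spec']['containers'])

# Made-up sample in the shape of `kubectl get pods -o json`
sample = json.dumps({'items': [
    {'spec': {'containers': [{'image': 'example/airflow:1'}, {'image': 'example/git:2'}]}},
    {'spec': {'containers': [{'image': 'example/airflow:1'}]}},
]})
for image, count in count_images(sample).most_common():
    print(count, image)
```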

Check the nodes for any warnings, e.g. low disk space. Run this command on the host machine, not here in the Notebook!
```console
kubectl describe nodes
```