# Controlboard for Data Analytics-Stacks
* [Jupyter-Datascience-Notebook](#Jupyter-Datascience-Notebook)    
* [Very helpful: Manipulate your Kubernetes environment](#Manipulate-your-Kubernetes-environment)

Available stacks to plug into Jupyter:
* [Elastic Stack (formerly ELK-Stack)](#Elastic-Stack-(formerly-ELK-Stack))
* [PostgreSQL Database, using SQLAlchemy](#PostgreSQL-Database-using-SQLAlchemy)
* [MySQL Database, using SQLAlchemy](#MySQL)
* [Neo4j](#Neo4j)


***
## Jupyter Datascience-Notebook
#### Summary

Your "standard" [Jupyter Notebook for Datascience](https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook) plus some additional libraries and the following JupyterLab extensions:
* [jupyterlab-git: git and GitHub integration](https://github.com/jupyterlab/jupyterlab-git)
* [jupyterlab-lsp: Code completion, function definition look-up and more](https://github.com/krassowski/jupyterlab-lsp)
* [Code debugger](https://github.com/jupyterlab/debugger)

#### Set some variables first - be sure to run this code!
>Note: you can either set some variables here, or you will have to set them in each cell below!

Edit the variable definitions in the next cell to fit your needs.
* `PROJECT_NAME`: name of this project. Will show up in all container names associated with this project. No spaces or special characters allowed
* `DATALAB_SOURCECODE_DIR`: your Windows directory containing all your source code - including this datalab! Will appear as `/home/jovyan/work` in the Jupyter Notebook
* `DATALAB_DATA_DIR`: your Windows directory containing all data. Will be mounted as `/home/jovyan/data` in the Notebook

>Note: **Only data in either directory will survive the destruction of the Jupyter Notebook container!**
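
Since `PROJECT_NAME` ends up in container names, it can help to check it before creating anything. The following is only a sketch and not part of `k8swrapper`: the helper name `is_valid_project_name`, the exact rule (lowercase letters, digits and hyphens, modelled on Kubernetes DNS-label names) and the length limit of 50 are assumptions made for illustration.

```python
import re

# Assumed rule, modelled on Kubernetes DNS-label names: lowercase letters,
# digits and hyphens; must start and end with a letter or digit.
_NAME_PATTERN = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")
# Illustrative limit that leaves room for suffixes such as "-elastic".
_MAX_LENGTH = 50


def is_valid_project_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed naming rule."""
    return len(name) <= _MAX_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


print(is_valid_project_name("test1"))       # True
print(is_valid_project_name("my project"))  # False
```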

>Note: your Windows paths must be written in UNIX style .. use `wsl -e wslpath "c:\users\kat\projekte"` to
> translate your path into the valid `/mnt/c/users/kat/projekte`
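
If you prefer to do the translation in Python, the same mapping can be sketched with the standard library. This is a minimal sketch that assumes the default WSL mount point `/mnt/<drive letter>` and only handles drive-letter paths; the function name `windows_to_wsl_path` is made up for this example, and `wslpath` remains the authoritative tool.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(windows_path: str) -> str:
    """Translate a drive-letter Windows path into its /mnt/<drive>/... form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. 'c:'
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got {windows_path!r}")
    # path.parts[0] is the anchor (drive plus root); the rest are directory names.
    return "/".join(["/mnt", drive[0].lower(), *path.parts[1:]])


print(windows_to_wsl_path(r"c:\users\kat\projekte"))  # /mnt/c/users/kat/projekte
```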


Once you're done editing in the cell below, run the code (Ctrl+Enter inside the cell):

In [None]:
# import the k8s wrapper module which abstracts calls to the Kubernetes Python library
from modules import k8swrapper as k8s

# ---------------------------------------------------------------------------------------
# project specific settings

# unique project name .. needed for configuration files and access 
PROJECT_NAME = 'test1'
# data dir can be shared between notebooks - this is where the configuration file
# for this notebook is stored, too
DATALAB_DATA_DIR = '/mnt/c/Users/Baw/OneDrive - AWK Group AG/projekte/docker_mount/awk_datalab/data'

# ---------------------------------------------------------------------------------------
# setup specific settings .. stay mostly the same.
# change the next entry to point to the AWK Datalab source dir
DATALAB_SOURCECODE_DIR = '/mnt/c/Users/Baw/OneDrive - AWK Group AG/projekte/docker_mount/awk_datalab/datalab'

# ---------------------------------------------------------------------------------------
# you probably do not need to change anything below here, as they remain largely the same for all notebooks


# ---------------------------------------------------------------------------------------
print('These are your settings:')
print('')
print('Projectname:\t' + PROJECT_NAME)
print('Data directory:\t' + DATALAB_DATA_DIR)
print('Work directory:\t' + DATALAB_SOURCECODE_DIR)
print('')

#### Start the container
Next, to start a single Jupyter Notebook with your settings above, just run this code:

In [None]:
k8s.create_project('jupyter',PROJECT_NAME, DATALAB_DATA_DIR, DATALAB_SOURCECODE_DIR)

Next, get the URL to access the notebook in your browser
>Note: `k8s.get_project_url()` tries up to 5 times to read the URL from the log file, waiting 3 seconds between attempts.
>If it fails, it returns "Not ready yet, please try again."
>If it succeeds, it returns the URL to access the Jupyter Notebook.

In [None]:
print('Open your project in your browser with the following URL:')
print(k8s.get_project_url(PROJECT_NAME))
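
The retry behaviour described in the note can be pictured roughly as follows. This is not the actual implementation of `k8s.get_project_url()`; it is a generic sketch in which the function name `poll_for_url`, the `fetch` callable and the sample URL are all stand-ins invented for illustration.

```python
import time
from typing import Callable, Optional

NOT_READY = "Not ready yet, please try again."


def poll_for_url(
    fetch: Callable[[], Optional[str]],
    attempts: int = 5,
    delay_seconds: float = 3.0,
) -> str:
    """Call `fetch` up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        url = fetch()
        if url:
            return url
        if attempt < attempts - 1:  # no need to wait after the last try
            time.sleep(delay_seconds)
    return NOT_READY


# Usage with a stand-in fetcher that succeeds on the third call.
answers = iter([None, None, "http://localhost:8888/lab"])
print(poll_for_url(lambda: next(answers), delay_seconds=0))
```

If you get the "not ready" message, simply run the cell again after a few seconds.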

#### Stopping and cleaning up
Stop the Jupyter Notebook and delete the pod. The data will be retained in your DATALAB_DATA_DIR.

In [None]:
# delete project in default namespace "default"
k8s.delete_project('test1')

Terminating a pod and its container(s) can take a while .. just list the pods again until the deleted pod is gone.

If you would like to clean up all the data and the configuration file for the project (named `<PROJECT_NAME>.yml` and stored in the `DATALAB_DATA_DIR`), then just remove them after you have deleted the project as instructed above.

#### How to save your entire computational context if you installed additional packages
You might change your container by installing new **PIP** Python packages e.g. with `pip install <package name>`. This change will be lost with the container. To quickly save your entire pip environment, including all packages, copy-paste the following into your notebook:

In [None]:
! pip freeze > /home/jovyan/work/pip-environment.txt

To load your environment again from scratch, e.g. if you re-created your container:

In [None]:
! pip install -r /home/jovyan/work/pip-environment.txt

If you installed additional Python packages with **Anaconda**, `conda install <package name>`, here's how to save the entire conda environment:

In [None]:
! conda env export -n base > /home/jovyan/work/anaconda-environment.yml

To re-install all Anaconda packages from this file, do:

In [None]:
! conda env update --name base --file /home/jovyan/work/anaconda-environment.yml

***
## Elastic Stack (formerly ELK-Stack)

#### Summary

Elasticsearch, Kibana, Beats, and Logstash. Take data from any source, in any format, then search, analyze, and visualize it in real time.

* **Elasticsearch** is a distributed, RESTful search and analytics engine. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.
* **Kibana** lets you visualize your Elasticsearch data and navigate the Elastic Stack so you can do anything from tracking query load to understanding the way requests flow through your apps.
* **Logstash** is a server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash."
* **Beats** is the platform for single-purpose data shippers. They send data from hundreds or thousands of machines and systems to Logstash.

Note that Beats (e.g. Metricbeat or Filebeat) are not included in this stack.

#### Connections once the stack has been started
* Direct Kibana browser access: `https://localhost/<PROJECT_NAME>-kibana`
* Elasticsearch access for Windows: [http://localhost:9200](http://localhost:9200)
* Access Elasticsearch from **within** a Jupyter container: [http://elasticsearch:9200](http://elasticsearch:9200)
* Logstash access from **within** a Jupyter container: [http://logstash:9600](http://logstash:9600)
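
To find out whether one of these endpoints is already accepting connections, a plain TCP check is often enough. The helper below is a sketch using only the standard library; the name `is_port_open` is made up for this example, and a successful connection only tells you that the port is open, not that the service has finished starting.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

From within a Jupyter container you would call, for example, `is_port_open("elasticsearch", 9200)`; from Windows, `is_port_open("localhost", 9200)`.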

#### Create a volume to persist all data

As Rancher Desktop is able to mount local paths into the Kubernetes cluster, you simply have to create a directory called `<PROJECT_NAME>-elastic` inside your data directory (i.e. DATALAB_DATA_DIR) **before** you start the ELK stack.

In [None]:
print (DATALAB_DATA_DIR)
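
Instead of creating the directory by hand, you could let Python do it. This is a sketch under the assumption that the data directory is reachable under the given path from wherever the code runs; the helper name `ensure_stack_dir` is hypothetical, and the demo uses a temporary directory instead of your real `DATALAB_DATA_DIR`.

```python
import tempfile
from pathlib import Path


def ensure_stack_dir(data_dir: str, project_name: str, stack: str) -> Path:
    """Create `<data_dir>/<project_name>-<stack>` if missing and return it."""
    target = Path(data_dir) / f"{project_name}-{stack}"
    target.mkdir(parents=True, exist_ok=True)
    return target


# Demo in a throwaway directory; pass DATALAB_DATA_DIR and PROJECT_NAME instead.
with tempfile.TemporaryDirectory() as demo_data_dir:
    created = ensure_stack_dir(demo_data_dir, "test1", "elastic")
    print(created.name)  # test1-elastic
```

The same helper works for the other stacks by passing a different suffix, e.g. `"postgres"`.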

#### Start the stack
To start the stack, execute the next cell. 
If necessary, you can change the port settings for Elastic, Logstash and Kibana. This is only necessary if you are NOT using the defaults:
```
    ELASTICSEARCH_PORT=9200 (API)
    LOGSTASH_PORT1=5000 (Beat)
    LOGSTASH_PORT2=9600 (Monitoring)
    KIBANA_PORT=5601
```

In [None]:
PROJECT_NAME= 'test1'

k8s.create_project('elk', PROJECT_NAME, DATALAB_DATA_DIR, DATALAB_SOURCECODE_DIR,
                   elasticPort='9200', logstashPort1='5000', logstashPort2='9600', kibanaPort='5601')
                   

Once the image pull has completed and the containers are running, startup might take another 1-2 minutes!

In order to complete the setup, you need to configure the default user for Kibana in Elasticsearch. To do this, run the following command in a shell on your notebook. Make sure to replace `<PROJECT_NAME>` with the name of your project before running the command.

In [None]:
kubectl exec <PROJECT_NAME>-elastic -c elasticsearch -- bin/elasticsearch-create-enrollment-token --scope kibana

You will get an enrollment token in the form `eyJ2ZXIiOiI4L...1RHcifQ==` that is needed to connect Kibana with Elastic.
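
The `eyJ...` prefix is typical for base64-encoded JSON, so you can peek into such a token if you are curious. The sketch below decodes a made-up demo token, not a real enrollment token; the helper name `decode_token` and the `note` field are invented for illustration. Keep in mind that a real token contains a secret, so do not paste its decoded content anywhere.

```python
import base64
import json


def decode_token(token: str) -> dict:
    """Decode a base64-encoded JSON token into a dictionary."""
    padded = token + "=" * (-len(token) % 4)  # restore padding if it was stripped
    return json.loads(base64.b64decode(padded))


# Build a made-up token for demonstration; this is NOT a real enrollment token.
demo_payload = {"ver": "8.0.0", "note": "demo only"}
demo_token = base64.b64encode(json.dumps(demo_payload).encode()).decode()
print(decode_token(demo_token)["ver"])  # 8.0.0
```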

Next, get the 6-digit code from Kibana to authenticate the first time. This can be done with


In [3]:
# get Kibana setup URL incl. code
k8s.get_kibana_setup_url('test1')


'Go to https://localhost/test1-kibana/?code=037625 to get started.'
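
If you only need the 6-digit code, it can be pulled out of the returned message. A small sketch, assuming the message has the shape shown in the output above; the function name `extract_setup_code` is made up for this example.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_setup_code(message: str) -> Optional[str]:
    """Return the `code` query parameter of the first URL in `message`."""
    match = re.search(r"https?://\S+", message)
    if match is None:
        return None
    query = parse_qs(urlparse(match.group(0)).query)
    codes = query.get("code")
    return codes[0] if codes else None


message = "Go to https://localhost/test1-kibana/?code=037625 to get started."
print(extract_setup_code(message))  # 037625
```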

Now, open your browser at the URL given above (e.g. `https://localhost/test1-kibana/?code=123123`) and paste the enrollment token into the form. Press enter. This will connect Kibana with Elasticsearch and finish the setup.

Next, run the following command in a shell on your notebook. This will create a user named "superuser" with password "admin1" and the role "superuser". You can use this user to login to Kibana.

In [None]:
kubectl exec <PROJECT_NAME>-elastic -c elasticsearch -- bin/elasticsearch-users useradd superuser -p admin1 -r superuser
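
The password `admin1` is fine for a local test, but you may want something less guessable. The sketch below generates a random alphanumeric password and assembles the same command as above with shell-safe quoting; the helper name `build_useradd_command` is hypothetical, and the command is only printed, not executed.

```python
import secrets
import shlex
import string


def build_useradd_command(project_name: str, username: str, password: str) -> str:
    """Assemble the kubectl command from the step above with shell-safe quoting."""
    return shlex.join([
        "kubectl", "exec", f"{project_name}-elastic", "-c", "elasticsearch", "--",
        "bin/elasticsearch-users", "useradd", username, "-p", password, "-r", "superuser",
    ])


# 20 random letters and digits; note that printing it leaves it in the cell output.
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(20))
print(build_useradd_command("test1", "superuser", password))
```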

#### Stop and remove the stack (Elasticsearch and Kibana data will be retained)

In [None]:
# delete project in default namespace "default"
k8s.delete_project('test1')

#### Delete all Elasticsearch and Kibana data

Simply remove the specific directory `<PROJECT_NAME>-elastic` from your data directory (i.e. DATALAB_DATA_DIR) on your laptop

In [None]:
directoryToBeRemoved = DATALAB_DATA_DIR + '/' + PROJECT_NAME + '-elastic'
print('Remove the directory .. "' + directoryToBeRemoved + '" on your notebook.')
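
If the directory is reachable from where your code runs, the removal can also be scripted. This is a sketch with a hypothetical helper `remove_stack_data`; it deletes the directory irreversibly, so the demo below runs against a temporary directory only.

```python
import shutil
import tempfile
from pathlib import Path


def remove_stack_data(data_dir: str, project_name: str, stack: str) -> bool:
    """Delete `<data_dir>/<project_name>-<stack>`; return True if it was removed."""
    if not project_name or not stack:
        raise ValueError("project_name and stack must be non-empty")
    target = Path(data_dir) / f"{project_name}-{stack}"
    if not target.is_dir():
        return False  # nothing to do
    shutil.rmtree(target)
    return True


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_data_dir:
    (Path(demo_data_dir) / "test1-elastic").mkdir()
    print(remove_stack_data(demo_data_dir, "test1", "elastic"))  # True
    print(remove_stack_data(demo_data_dir, "test1", "elastic"))  # False
```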

***
## PostgreSQL Database using SQLAlchemy

* [PostgreSQL](https://en.wikipedia.org/wiki/PostgreSQL) is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.
* [SQLAlchemy](https://www.sqlalchemy.org/) is a GREAT Python wrapper to talk to almost any database.


Check out the [Database getting started Jupyter notebook](database_getting_started.ipynb) for code snippets!
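
As a quick orientation, SQLAlchemy connects via a URL of the form `driver://user:password@host:port/database`. The sketch below only builds such a URL; the helper name `build_database_url`, the credentials and the database name are hypothetical, so replace them with the values of your own setup.

```python
from urllib.parse import quote_plus


def build_database_url(driver: str, user: str, password: str,
                       host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy-style URL: driver://user:password@host:port/database."""
    # Special characters in user name and password must be URL-encoded.
    return f"{driver}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# Hypothetical credentials and database name; replace with your own.
url = build_database_url("postgresql+psycopg2", "datalab", "p@ss/word",
                         "localhost", 5432, "demo")
print(url)  # postgresql+psycopg2://datalab:p%40ss%2Fword@localhost:5432/demo
```

The resulting string can be passed to `sqlalchemy.create_engine(url)`, provided the matching database driver is installed.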

### Manage the Stack
Create a volume to persist all data

As Rancher Desktop is able to mount local paths into the Kubernetes cluster, you simply have to create a directory called `<PROJECT_NAME>-postgres` inside your data directory (i.e. DATALAB_DATA_DIR) **before** you start the Postgres database.

In [None]:
print (DATALAB_DATA_DIR)

Start the stack

In [None]:
PROJECT_NAME= 'test2'

k8s.create_project('postgres', PROJECT_NAME, DATALAB_DATA_DIR, DATALAB_SOURCECODE_DIR,
                   postgresPort='5432')


Stop and remove the stack (database will be retained)

In [None]:
# delete project in default namespace "default"
k8s.delete_project('test2')

Delete the actual database and thus all Postgres data


Simply remove the specific directory `<PROJECT_NAME>-postgres` from your data directory (i.e. DATALAB_DATA_DIR) on your laptop

In [None]:
directoryToBeRemoved = DATALAB_DATA_DIR + '/' + PROJECT_NAME + '-postgres'
print('Remove the directory .. "' + directoryToBeRemoved + '" on your notebook.')

***
## MySQL
* [MySQL](https://www.mysql.com) is another popular database.
* [SQLAlchemy](https://www.sqlalchemy.org/) is a GREAT Python wrapper to talk to almost any database.

Check out the [Database getting started Jupyter notebook](database_getting_started.ipynb) for code snippets!

### Manage the stack
Create a volume to persist all data

As Rancher Desktop is able to mount local paths into the Kubernetes cluster, you simply have to create a directory called `<PROJECT_NAME>-mysql` inside your data directory (i.e. DATALAB_DATA_DIR) **before** you start the MySQL database.

In [None]:
print (DATALAB_DATA_DIR)

Start the stack

In [None]:
PROJECT_NAME= 'test3'

k8s.create_project('mysql', PROJECT_NAME, DATALAB_DATA_DIR, DATALAB_SOURCECODE_DIR,
                   mySqlPort='3306')


Stop and remove the stack (database will be retained)

In [None]:
# delete project in default namespace "default"
k8s.delete_project('test3')

Delete the actual database and thus all MySQL data

Simply remove the specific directory `<PROJECT_NAME>-mysql` from your data directory (i.e. DATALAB_DATA_DIR) on your laptop

In [None]:
directoryToBeRemoved = DATALAB_DATA_DIR + '/' + PROJECT_NAME + '-mysql'
print('Remove the directory .. "' + directoryToBeRemoved + '" on your notebook.')

***
## Neo4j
[Neo4j](https://neo4j.com/) is the leading graph database platform. The two plugins [APOC](https://neo4j.com/developer/neo4j-apoc/) and [Graph Data Science](https://neo4j.com/docs/graph-data-science/current/) are included in the stack. All data is saved into the directory `<PROJECT_NAME>-neo4j` in your `DATALAB_DATA_DIR`.
* Neo4j web GUI: http://localhost:7474
* Bolt access: bolt://localhost:7687

Neo4j features powerful plugins. You probably want to download [Awesome Procedures APOC](https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases) and/or the [Graph Data Science Library](https://github.com/neo4j/graph-data-science/releases). Simply save the `*.jar` file into the folder `./datalab-stacks/neo4j/plugins` **BEFORE** you start the container.

### Manage the Stack
As Rancher Desktop is able to mount local paths into the Kubernetes cluster, you simply have to create a directory called `<PROJECT_NAME>-neo4j` inside your data directory (i.e. DATALAB_DATA_DIR) **before** you start the Neo4j database.

In [None]:
print (DATALAB_DATA_DIR)

Start the stack. Note: we assume that you saved the entire datalab in a subfolder `datalab` of your `DATALAB_SOURCECODE_DIR` for plugins to work.

In [None]:
PROJECT_NAME= 'test4'

k8s.create_project('neo4j', PROJECT_NAME, DATALAB_DATA_DIR, DATALAB_SOURCECODE_DIR,
                   neo4jHttpPort='7474', neo4jBoltPort='7687')


Stop and remove the stack (database will be retained)

In [None]:
# delete project in default namespace "default"
k8s.delete_project('test4')

Delete the actual database and thus all Neo4j data

Simply remove the specific directory `<PROJECT_NAME>-neo4j` from your data directory (i.e. DATALAB_DATA_DIR) on your laptop

In [None]:
directoryToBeRemoved = DATALAB_DATA_DIR + '/' + PROJECT_NAME + '-neo4j'
print('Remove the directory .. "' + directoryToBeRemoved + '" on your notebook.')

***
## Manipulate your Kubernetes environment

Show all existing pods

In [None]:
from modules import k8swrapper as k8s

# list pods in default namespace "default"
k8s.list_pods(k8s.get_pods_namespace())

Show all Docker images including their file sizes

In [None]:
! sudo docker images

Show all volumes (=data volumes if you choose to not mount a Windows directory, for example):

In [None]:
! sudo docker volume ls

In desperate need to figure out what's eating up your disk space? This command shows where Docker is using disk space:

In [None]:
! sudo docker system df -v

#### Manipulate a container
Set a container name (or CONTAINER ID) first

In [None]:
container = "jupyter"

Stop the container

In [None]:
! sudo docker stop $container

Get the running container's logs saved to the Python variable `logoutput`

In [None]:
logoutput = ! sudo docker logs $container
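
`logoutput` behaves like a list of strings, one per log line, so you can search it with ordinary Python. As a sketch, the following collects every URL that appears in the log; the function name `find_urls` and the sample log lines are made up for illustration.

```python
import re
from typing import Iterable, List


def find_urls(lines: Iterable[str]) -> List[str]:
    """Return every http(s) URL found in the given log lines, in order."""
    urls: List[str] = []
    for line in lines:
        urls.extend(re.findall(r"https?://[^\s\"'<>]+", line))
    return urls


# Made-up log lines standing in for `logoutput`.
sample_log = [
    "[I 10:00:00 ServerApp] Jupyter Server is running at:",
    "[I 10:00:00 ServerApp]     http://127.0.0.1:8888/lab?token=abc123",
    "[W 10:00:05 ServerApp] No web browser found",
]
print(find_urls(sample_log))  # ['http://127.0.0.1:8888/lab?token=abc123']
```

In the notebook you would call `find_urls(logoutput)` instead.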

Restart an existing (currently stopped) container

In [None]:
! sudo docker start $container

Remove the container completely

In [None]:
! sudo docker rm $container

#### Cleaning up and freeing disk space
Remove an image (give either its name or IMAGE ID)

In [None]:
image = "test"
! sudo docker image rm $image

Remove all stopped containers at once

In [None]:
! sudo docker container prune --force

Remove a volume (=data volume, thus potentially deleting your data!):

In [None]:
volume = "test"
! sudo docker volume rm $volume

**Danger zone**: remove all stopped containers, and all images and all volumes that are currently not associated/mounted with a **running container**. Type the following manually:
* Delete all stopped containers, all "dangling" images, the build cache, any unattached network: ```! sudo docker system prune --force```
* To also delete all currently unused images: ```! sudo docker system prune --all --force```
* To also delete all currently unused volumes (potentially deleting your data!): ```! sudo docker system prune --volumes --force```