# Run on Kubernetes

Run benchmarking on a Kubernetes cluster with the given configuration.

Running on Kubernetes creates `n` new `pods`, each with a dask worker inside, which together
form a `dask` cluster. The function specified in `config` is then imported and run with the
given arguments, and the tasks created by this function are run on the `dask` cluster for
distributed computation.

The config dict must contain the following sections:
* run
* dask_cluster

Within the `run` section you need to specify:
* function:
    The full dotted Python path to the function to be run.
* args:
    A dictionary containing the keyword args that will be used with the given function.
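
The way a `run` section can be turned into a function call is sketched below. This is a minimal illustration, not the actual implementation: the helper names `import_function` and `run_from_config` are hypothetical, and `json.dumps` is used only as a standard-library stand-in so that the sketch runs without the benchmark package installed.

```python
import importlib


def import_function(dotted_path):
    """Resolve a path like 'package.module.function' to the function object."""
    module_path, _, name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError(f'Expected a dotted path, got {dotted_path!r}')
    module = importlib.import_module(module_path)
    return getattr(module, name)


def run_from_config(run_config):
    """Import run_config['function'] and call it with run_config['args'] as keyword args."""
    function = import_function(run_config['function'])
    return function(**run_config.get('args', {}))


# Stand-in `run` section (hypothetical values, standard library only).
example_run = {
    'function': 'json.dumps',
    'args': {'obj': {'b': 1, 'a': 2}, 'sort_keys': True},
}
output = run_from_config(example_run)
```

Because `args` is unpacked with `**`, every key in it has to match a keyword parameter of the target function.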

Within the `dask_cluster` section you can specify:
* workers:
    The number of workers to use. In the example below this is given as a dictionary
    with a `maximum` key.
* worker_config: A dictionary with the following keys:
    * resources: A dictionary containing the following keys:
        * memory:
            The amount of RAM to use.
        * cpu:
            The number of CPUs to use.
    * image: A Docker image to be used (optional).
    * setup: A dictionary containing the following keys:
        * script: Location, inside the Docker container, of a bash script to be run.
        * git_repository: A dictionary containing the following keys:
            * url: URL of the GitHub repository to be cloned.
            * reference: The branch or commit to check out.
            * install: The command to run to install this repository.
        * pip_packages: A list of pip packages to be installed.
        * apt_packages: A list of apt packages to be installed.
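
To make the nesting of these keys concrete, here is a sketch of a config that fills in every key listed above, together with a small check for the required sections. The `validate_config` helper is hypothetical, and the values marked as such in the comments (script path, repository URL, package names) are placeholders rather than real resources.

```python
def validate_config(config):
    """Raise ValueError if a required section or key is missing."""
    for section in ('run', 'dask_cluster'):
        if section not in config:
            raise ValueError(f'Missing required section: {section!r}')
    for key in ('function', 'args'):
        if key not in config['run']:
            raise ValueError(f"Missing required key in 'run': {key!r}")


full_config = {
    'run': {
        'function': 'btb_benchmark.main.run_benchmark',
        'args': {'iterations': 1},
    },
    'dask_cluster': {
        'workers': 2,
        'worker_config': {
            'resources': {'memory': '2G', 'cpu': 1},
            'image': 'pythiac/btb_benchmark:latest',
            'setup': {
                'script': '/path/to/setup.sh',  # hypothetical path
                'git_repository': {
                    'url': 'https://github.com/example/project',  # hypothetical URL
                    'reference': 'master',
                    'install': 'pip install .',
                },
                'pip_packages': ['example-package'],  # hypothetical package
                'apt_packages': ['git'],
            },
        },
    },
}

validate_config(full_config)  # passes silently when both sections are present
```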

In [1]:
config = {
    'run': {
        'function': 'btb_benchmark.main.run_benchmark',
        'args': {
            'iterations': 1,
            'sample': 1,
            'tuners': 'BTB.UniformTuner',
            'challenge_types': 'xgboost',
            'detailed_output': True,
        }
    },
    'dask_cluster': {
        'workers': {
            'maximum': 1
        },
        'worker_config': {
            'resources': {
                'memory': '2G',
                'cpu': 1
            },
            'image': 'pythiac/btb_benchmark:latest',
        },
    },
}

Once the config dictionary is created, we can pass it to `run_on_kubernetes`, which runs `run_benchmark` on the cluster with the specified arguments.

In [13]:
from btb_benchmark.kubernetes import run_on_kubernetes

results = run_on_kubernetes(config)

distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at: tcp://192.168.1.132:46093
distributed.scheduler - INFO -   dashboard at:                     :8787
distributed.scheduler - INFO - Receive client connection: Client-a1ec5e28-8bd4-11ea-af24-00d8610cc1df
distributed.core - INFO - Starting established connection


[                                        ] | 0% Completed |  3.5s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.115:42391', name: tcp://10.244.0.115:42391, memory: 0, processing: 8>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.115:42391
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 16.1s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.116:42895', name: tcp://10.244.0.116:42895, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.116:42895
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 17.5s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.117:35259', name: tcp://10.244.0.117:35259, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.117:35259
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 18.8s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.118:40021', name: tcp://10.244.0.118:40021, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.118:40021
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 20.3s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.119:40259', name: tcp://10.244.0.119:40259, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.119:40259
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 21.6s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.120:44031', name: tcp://10.244.0.120:44031, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.120:44031
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 24.1s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.121:37587', name: tcp://10.244.0.121:37587, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.121:37587
distributed.core - INFO - Starting established connection


[##########                              ] | 25% Completed | 25.6s

distributed.scheduler - INFO - Register worker <Worker 'tcp://10.244.0.122:40581', name: tcp://10.244.0.122:40581, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://10.244.0.122:40581
distributed.core - INFO - Starting established connection


[########################################] | 100% Completed |  7min 13.8s

distributed.scheduler - INFO - Remove client Client-a1ec5e28-8bd4-11ea-af24-00d8610cc1df
distributed.scheduler - INFO - Remove client Client-a1ec5e28-8bd4-11ea-af24-00d8610cc1df
distributed.scheduler - INFO - Close client connection: Client-a1ec5e28-8bd4-11ea-af24-00d8610cc1df
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
distributed.scheduler - INFO - Remove worker <Worker 'tcp://10.244.0.115:42391', name: tcp://10.244.0.115:42391, memory: 0, processing: 0>
distributed.core - INFO - Removing comms to tcp://10.244.0.115:42391
