V0.3.9 (#29)
* Masterscript: AWS - store log of last stop and clean
* Masterscript: AWS - log volume key
* Masterscript: Querymanagement
* Masterscript: Read experiments log 
* Masterscript: Test if experiments path exists 
* Masterscript: setCode for continuing experiments 
* Masterscript: K8s more infos about host - monitoring metrics of multiple gpus
* Masterscript: K8s - getTimediff 
* Masterscript: K8s more infos about host - debugging 
* Masterscript: K8s more infos about host - monitoring metrics 
* Masterscript: K8s more infos about host - gpu ids and node 
* Masterscript: K8s catch missing portforwarding in stop 
* Masterscript: K8s escape $ in some kubectl commands 
* Masterscript: K8s getMemory() for python 03. Jul 
* Masterscript: K8s kubectl as list, Popen, Shell 
* Masterscript: AWS - set alias 
* Masterscript: AWS - add / to config path 
* Masterscript: K8s - delay in runExperiment() 
* Docs: Alternative workflows, (un)park, experiments.config 
* Masterscript: K8s - add / to config path 
* Masterscript: Load experiment workflow 
* Masterscript: K8s - remove C: from download 
* Masterscript: AWS - log workflow 
* Masterscript: Load experiment workflow 
* Masterscript: K8s - log workflow 
* Masterscript: K8s - delay in prepare and start, log experimental steps 
* Docs: Monitoring, reporting, experiments.config 
* Masterscript: AWS - monitoring exporters in config 
* Masterscript: K8s - allow different query files for same workload 
* Masterscript: runReporting() for AWS 
* Masterscript: More details for k8s 
* Masterscript: Prevent crash at missing data 
* Masterscript: Init paths for K8s corrected 
* Docs: Minor extensions 
* Masterscript: SSH bug in k8s init 
* Masterscript: Wait 30s after unpark 
* Masterscript: Wait after unpark 
* Masterscript: UnparkExperiment removes old docker 
* Masterscript: ParkExperiment removes old parked 
* Masterscript: Default connectionname for unpark 
* Masterscript: Write connection prettily 
* Masterscript: Sketch park/unpark experiment 
* Hardware: RAM in bytes from /proc/meminfo 
* Prepare new version 
* Hardware: RAM in bytes, disk space in Kb
perdelt committed Dec 31, 2019
1 parent f9d71d3 commit 79521cc
Showing 5 changed files with 441 additions and 101 deletions.
89 changes: 85 additions & 4 deletions README.md
@@ -17,8 +17,11 @@ This document
* [Run Benchmarks](#run-benchmarks)
* [Stop an Experiment](#stop-experiment)
* [Clean an Experiment](#clean-experiment)
* shows [alternative workflows](#alternative-workflows)
* [Parking DBMS at AWS](#parking-dbms-at-aws)
* [Rerun a List of Experiments](#rerun-a-list-of-experiments)

This module has been tested with docker images of Brytlyt, MariaDB, MemSQL, Mariadb, MonetDB, OmniSci and PostgreSQL.
This module has been tested with docker images of Brytlyt, MariaDB, MemSQL, MonetDB, OmniSci and PostgreSQL.

## Concepts

@@ -360,6 +363,11 @@ We additionally need
'monitor': {
'grafanatoken': 'Bearer 46363756756756476754756745', # Grafana: Access Token
'grafanaurl': 'http://127.0.0.1:3000/api/datasources/proxy/1/api/v1/', # Grafana: API URL
'exporter': {
'dcgm': 'docker run --runtime=nvidia --name gpu_monitor_dcgm --rm -d --publish 8000:8000 1234.dkr.ecr.eu-central-1.amazonaws.com/name/dcgm:latest',
'nvlink': 'docker run --runtime=nvidia --name gpu_monitor_nvlink --rm -d --publish 8001:8001 1234.dkr.ecr.eu-central-1.amazonaws.com/name/nvlink:latest',
'node': 'docker run --name cpu_monitor_prom --rm -d --publish 9100:9100 prom/node-exporter:latest'
}
},
'worker': {
'ip': '127.1.2.3', # Elastic IP: IP Address
@@ -423,7 +431,19 @@ This requires
* required ports open
* EIP for attaching to the current experiment host
* EBS volumes containing raw data
* ECR for simple docker registry
* Optionally: ECR for simple docker registry

#### Monitoring

Monitoring requires
* A server having Prometheus installed
  * Prometheus scrapes the fixed EIP for a fixed list of ports
* A server (the same) having Grafana installed
  * Grafana imports the metrics from Prometheus
* `grafanatoken` and `grafanaurl` to access this from DBMSBenchmarker (see the sketch below)
* A dict of exporters given as docker commands
  * These are installed and activated automatically on each instance when `cluster.prepareExperiment()` is invoked.

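For illustration, a minimal sketch (not part of bexhoma) of how `grafanatoken` and `grafanaurl` from the config above could be used to query the monitoring data behind Grafana; the use of `requests` and the metric name `node_memory_MemTotal_bytes` are assumptions:

```
import requests

grafanaurl = 'http://127.0.0.1:3000/api/datasources/proxy/1/api/v1/'   # 'grafanaurl' from the config
headers = {'Authorization': 'Bearer 46363756756756476754756745'}       # 'grafanatoken' from the config
# Ask the Prometheus datasource (via the Grafana proxy) for a node-exporter metric;
# the metric name depends on the node-exporter version and is an assumption here.
response = requests.get(grafanaurl + 'query',
                        params={'query': 'node_memory_MemTotal_bytes'},
                        headers=headers)
print(response.json())
```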

## API Details

@@ -554,6 +574,8 @@ The command `cluster.runBenchmarks()` runs an [external benchmark tool](https://
<img src="https://github.com/Beuth-Erdelt/DBMS-Benchmarker/raw/master/docs/Concept-Benchmarking.png" width="320">
</p>

#### Connectionname and Client Configurations

This tool provides the benchmark tool with information about the installed experiment host.
This information is packed into a so-called connection, which is identified by its name.
The default connection name is given as `cluster.docker+"-"+cluster.script+"-"+cluster.instance+'-'+cluster.name`.
@@ -571,8 +593,29 @@ cluster.connectionmanagement['numProcesses'] = 8
cluster.connectionmanagement['runsPerConnection'] = 5
cluster.connectionmanagement['timeout'] = 1200
```

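As a minimal sketch, the default connection name can also be composed manually and passed to the benchmark run; the attribute values are whatever the current experiment uses, and `runBenchmarks(connection=...)` is the call also used in `bexhoma/experiments.py`:

```
# Minimal sketch: compose the default connection name from the current experiment
# settings (attributes taken from the text above) and pass it explicitly.
connection = cluster.docker + "-" + cluster.script + "-" + cluster.instance + '-' + cluster.name
cluster.runBenchmarks(connection=connection)
```
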
#### Collect Results

For each setup of experiments there is a unique code for identification.
DBMSBenchmarker generates this code when the first experiment is run.
All experiments belonging together will be stored in a folder having this code as its name.
It is also possible to continue an experiment by setting `cluster.code`.

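A minimal sketch of continuing a previous set of experiments, assuming the code can simply be assigned before further benchmarks are run; the folder name `1234567890` is hypothetical:

```
# Hypothetical code (result folder name) of an earlier run
cluster.code = '1234567890'
cluster.runBenchmarks()
```
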
For more information about that, please consult the docs of the benchmark tool: https://github.com/Beuth-Erdelt/DBMS-Benchmarker#connection-file

The result folder also contains
* Copies of the deployment yamls used to prepare the K8s pods
* A list of dicts in a file `experiments.config`, which lists all experiment steps together with
  * Cluster information
  * Host settings: Instances, volumes, init scripts and DBMS docker data
  * Benchmark settings: Connectionmanagement

These make it possible to [rerun the experiments](#rerun-a-list-of-experiments).

**Note that this means the result folder stores confidential information.**

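A hedged sketch of what a fragment of `experiments.config` may look like; the keys correspond to those read by `bexhoma/experiments.py`, but all concrete values below are assumptions:

```
# Illustration only - the actual file is written by the masterscript
[
    {'step': 'prepareExperiment',
     'clustertype': 'K8s',
     'configfolder': 'experiments/example/',            # hypothetical
     'docker': {'PostgreSQL': {}},                      # DBMS docker data (abbreviated)
     'instance': 'p3.2xlarge',                          # hypothetical
     'volume': 'data',                                  # hypothetical
     'initscript': {'SF1': {}},                         # init scripts (abbreviated)
     'delay': 60},
    {'step': 'runBenchmarks',
     'connection': 'PostgreSQL-SF1-p3.2xlarge-test',    # hypothetical
     'connectionmanagement': {'numProcesses': 8, 'runsPerConnection': 5, 'timeout': 1200}},
]
```
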
#### Collect Host Information

Some information is given by configuration (e.g. JDBC data), some is collected from the experiment host:
```
cluster.getMemory()
@@ -598,10 +641,14 @@ Most of these run inside the docker container:
* `cluster.getCUDA()`: Collects `nvidia-smi | grep \'CUDA\'`
* `cluster.getGPUs()`: Collects `nvidia-smi -L` and then aggregates the type using `Counter([x[x.find(":")+2:x.find("(")-1] for x in l if len(x)>0])`
* `cluster.copyInits()`: Copy init scripts to benchmark result folder on host
* `cluster.copyLog()`: Copy dbms logs to benchmark result folder on host
* `cluster.copyLog()`: Copy DBMS logs to benchmark result folder on host
* `cluster.downloadLog()`: Downloads the benchmark result folder from host to local result folder

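A minimal sketch of gathering some of these host properties in one place, e.g. to inspect them before benchmarking; the dict layout is an assumption, the methods are the ones listed above:

```
# Collect a few host properties for later inspection (layout is illustrative)
host_info = {
    'RAM':  cluster.getMemory(),   # total RAM of the host in bytes
    'CUDA': cluster.getCUDA(),     # CUDA version reported by nvidia-smi
    'GPUs': cluster.getGPUs(),     # aggregated GPU types from nvidia-smi -L
}
cluster.copyLog()       # copy the DBMS logs to the result folder on the host
cluster.downloadLog()   # download the result folder to the local result folder
```
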
The external tool also does the reporting, and it uses these informations among others.
#### Reporting

The external tool also does the reporting, and it uses the collected host information among other data.
Reporting can be started by `cluster.runReporting()`.
This generates reports about all experiments that have been stored under the same code.

### Stop Experiment

@@ -686,4 +733,38 @@ cluster.stopInstance()
* `cluster.stopInstance()`: Stops the instance


## Alternative Workflows

### Parking DBMS at AWS

An alternative workflow is to not install and uninstall the DBMS every time it is used, but to park the docker containers instead:

```
cluster.setExperiment()
cluster.prepareExperiment()
cluster.unparkExperiment()
cluster.runBenchmarks()
cluster.parkExperiment()
cluster.cleanExperiment()
```

* `parkExperiment()`: The docker container is stopped and renamed from `benchmark` to `benchmark-connectionname`, where `connectionname` is the name given for benchmarking.
* `unparkExperiment()`: The docker container is renamed from `benchmark-connectionname` back to `benchmark` and restarted.

This allows keeping the prepared docker containers, including the loaded data.
We can retrieve a list of all parked containers using `cluster.listDocker()`.
To remove all parked containers we can invoke `cluster.stopExperiment()`.

This only works for AWS, since in K8s the DBMS is an essential part of the instance (pod).
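
A minimal sketch of inspecting and finally removing the parked containers; the return format of `listDocker()` is an assumption:

```
# Inspect parked containers on the experiment host, then remove all of them
parked = cluster.listDocker()   # e.g. names like 'benchmark-<connectionname>' (assumed format)
print(parked)
cluster.stopExperiment()        # removes all parked containers
```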

### Rerun a List of Experiments

When we run a workflow using `runExperiment()` or its composing methods, all steps are logged and stored as a list of Python dicts in the result folder of DBMSBenchmarker.

We may want to rerun the same experiment with all its steps.
This needs the cluster config file and the name (`code`) of the result folder:
```
workflow = experiments.workflow(clusterconfig='cluster.config', code=code)
workflow.runWorkflow()
workflow.cluster.runReporting()
```
2 changes: 1 addition & 1 deletion bexhoma/__init__.py
@@ -1,4 +1,4 @@
"""
The clustermanager module
"""
__all__ = ["masterAWS", "masterK8s"]
__all__ = ["masterAWS", "masterK8s", "experiments"]
63 changes: 63 additions & 0 deletions bexhoma/experiments.py
@@ -0,0 +1,63 @@
from os import makedirs, path
import ast
from bexhoma import masterAWS, masterK8s

class workflow():
    def __init__(self, clusterconfig='', resultfolder='', code=None):
        with open(clusterconfig) as f:
            configfile=f.read()
        self.config = eval(configfile)
        if len(resultfolder) == 0:
            self.resultfolder = self.config['benchmarker']['resultfolder']
        else:
            self.resultfolder = resultfolder
        self.clusterconfig = clusterconfig
        self.code = code
        filename = self.resultfolder+'/'+str(self.code)+'/experiments.config'
        if path.isfile(filename):
            with open(filename,'r') as inp:
                self.experiments = ast.literal_eval(inp.read())
            if self.experiments[0]['clustertype'] == 'K8s':
                self.cluster = masterK8s.testdesign(
                    clusterconfig=clusterconfig,#experiments[0]['clusterconfig'],
                    configfolder=self.experiments[0]['configfolder'],
                    #configfolder=self.resultfolder+'/'+code+'/',#experiments[0]['configfolder'],
                    yamlfolder=self.resultfolder+'/'+self.code+'/',#experiments[0]['yamlfolder'],
                    #queryfile=experiments[0]['queryfile']
                    )
    def runWorkflow(self):
        for i,e in enumerate(self.experiments):
            print(e['step'])
            step = e['step']
            if step == 'prepareExperiment':
                #print(e['docker'])
                #print(e['instance'])
                #print(e['volume'])
                #print(e['initscript'])
                self.cluster.setExperiment(
                    docker=list(e['docker'].keys())[0],
                    instance=e['instance'],
                    volume=e['volume'],
                    script=list(e['initscript'].keys())[0],
                    )
                self.cluster.prepareExperiment()
                self.cluster.delay(e['delay'])
            if step == 'startExperiment':
                self.cluster.setExperiment(
                    docker=list(e['docker'].keys())[0],
                    instance=e['instance'],
                    volume=e['volume'],
                    script=list(e['initscript'].keys())[0],
                    )
                self.cluster.startExperiment()
                self.cluster.delay(e['delay'])
            if step == 'stopExperiment':
                self.cluster.stopExperiment()
            if step == 'cleanExperiment':
                self.cluster.cleanExperiment()
            if step == 'runBenchmarks':
                self.cluster.connectionmanagement = e['connectionmanagement']
                self.cluster.runBenchmarks(
                    connection=e['connection'],
                    configfolder=self.resultfolder+'/'+str(self.code))
