V0.3.9 (#29)
* Masterscript: AWS - store log of last stop and clean
* Masterscript: AWS - log volume key
* Masterscript: Querymanagement
* Masterscript: Read experiments log 
* Masterscript: Test if experiments path exists 
* Masterscript: setCode for continuing experiments 
* Masterscript: K8s more infos about host - monitoring metrics of multiple gpus
* Masterscript: K8s - getTimediff 
* Masterscript: K8s more infos about host - debugging 
* Masterscript: K8s more infos about host - monitoring metrics 
* Masterscript: K8s more infos about host - gpu ids and node 
* Masterscript: K8s catch missing portforwarding in stop 
* Masterscript: K8s escape $ in some kubectl commands 
* Masterscript: K8s getMemory() for python 03. Jul 
* Masterscript: K8s kubectl as list, Popen, Shell 
* Masterscript: AWS - set alias 
* Masterscript: AWS - add / to config path 
* Masterscript: K8s - delay in runExperiment() 
* Docs: Alternative workflows, (un)park, experiments.config 
* Masterscript: K8s - add / to config path 
* Masterscript: Load experiment workflow 
* Masterscript: K8s - remove C: from download 
* Masterscript: AWS - log workflow 
* Masterscript: Load experiment workflow 
* Masterscript: K8s - log workflow 
* Masterscript: K8s - delay in prepare and start, log experimental steps 
* Docs: Monitoring, reporting, experiments.config 
* Masterscript: AWS - monitoring exporters in config 
* Masterscript: K8s - allow different query files for same workload 
* Masterscript: runReporting() for AWS 
* Masterscript: More details for k8s 
* Masterscript: Prevent crash at missing data 
* Masterscript: Init paths for K8s corrected 
* Docs: Minor extensions 
* Masterscript: SSH bug in k8s init 
* Masterscript: Wait 30s after unpark 
* Masterscript: Wait after unpark 
* Masterscript: UnparkExperiment removes old docker 
* Masterscript: ParkExperiment removes old parked 
* Masterscript: Default connectionname for unpark 
* Masterscript: Write connection prettily 
* Masterscript: Sketch park/unpark experiment 
* Hardware: RAM in bytes from /proc/meminfo 
* Prepare new version 
* Hardware: RAM in bytes, disk space in Kb
perdelt committed Dec 31, 2019
1 parent f9d71d3 commit 79521cc
Showing 5 changed files with 441 additions and 101 deletions.
89 changes: 85 additions & 4 deletions README.md
@@ -17,8 +17,11 @@ This document
* [Run Benchmarks](#run-benchmarks)
* [Stop an Experiment](#stop-experiment)
* [Clean an Experiment](#clean-experiment)
* shows [alternative workflows](#alternative-workflows)
* [Parking DBMS at AWS](#parking-dbms-at-aws)
* [Rerun a List of Experiments](#rerun-a-list-of-experiments)

This module has been tested with docker images of Brytlyt, MariaDB, MemSQL, Mariadb, MonetDB, OmniSci and PostgreSQL.
This module has been tested with docker images of Brytlyt, MariaDB, MemSQL, MonetDB, OmniSci and PostgreSQL.

## Concepts

@@ -360,6 +363,11 @@ We additionally need
'monitor': {
'grafanatoken': 'Bearer 46363756756756476754756745', # Grafana: Access Token
'grafanaurl': 'http://127.0.0.1:3000/api/datasources/proxy/1/api/v1/', # Grafana: API URL
'exporter': {
'dcgm': 'docker run --runtime=nvidia --name gpu_monitor_dcgm --rm -d --publish 8000:8000 1234.dkr.ecr.eu-central-1.amazonaws.com/name/dcgm:latest',
'nvlink': 'docker run --runtime=nvidia --name gpu_monitor_nvlink --rm -d --publish 8001:8001 1234.dkr.ecr.eu-central-1.amazonaws.com/name/nvlink:latest',
'node': 'docker run --name cpu_monitor_prom --rm -d --publish 9100:9100 prom/node-exporter:latest'
}
},
'worker': {
'ip': '127.1.2.3', # Elastic IP: IP Address
@@ -423,7 +431,19 @@ This requires
* required ports open
* EIP for attaching to the current experiment host
* EBS volumes containing raw data
* ECR for simple docker registry
* Optionally: ECR for simple docker registry

#### Monitoring

Monitoring requires
* A server having Prometheus installed
  * Prometheus scrapes the fixed EIP for a fixed list of ports
* A server (the same) having Grafana installed
  * Grafana imports the metrics from Prometheus
* `grafanatoken` and `grafanaurl` to access this from DBMSBenchmarker (see the sketch below)
* A dict of exporters given as docker commands
  * These are installed and activated automatically on each instance when `cluster.prepareExperiment()` is invoked.

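For illustration, a minimal sketch (not part of bexhoma) of how `grafanatoken` and `grafanaurl` from the config above could be used to query the monitoring data behind Grafana; the use of `requests` and the metric name `node_memory_MemTotal_bytes` are assumptions:

```
import requests

grafanaurl = 'http://127.0.0.1:3000/api/datasources/proxy/1/api/v1/'   # 'grafanaurl' from the config
headers = {'Authorization': 'Bearer 46363756756756476754756745'}       # 'grafanatoken' from the config
# Ask the Prometheus datasource (via the Grafana proxy) for a node-exporter metric;
# the metric name depends on the node-exporter version and is an assumption here.
response = requests.get(grafanaurl + 'query',
                        params={'query': 'node_memory_MemTotal_bytes'},
                        headers=headers)
print(response.json())
```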

## API Details

@@ -554,6 +574,8 @@ The command `cluster.runBenchmarks()` runs an [external benchmark tool](https://
<img src="https://github.com/Beuth-Erdelt/DBMS-Benchmarker/raw/master/docs/Concept-Benchmarking.png" width="320">
</p>

#### Connectionname and Client Configurations

This tool provides the benchmark tool with information about the installed experiment host.
This information is packed into a so-called connection, which is identified by its name.
The default connection name is given as `cluster.docker+"-"+cluster.script+"-"+cluster.instance+'-'+cluster.name`.
@@ -571,8 +593,29 @@ cluster.connectionmanagement['numProcesses'] = 8
cluster.connectionmanagement['runsPerConnection'] = 5
cluster.connectionmanagement['timeout'] = 1200
```

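As a minimal sketch, the default connection name can also be composed manually and passed to the benchmark run; the attribute values are whatever the current experiment uses, and `runBenchmarks(connection=...)` is the call also used in `bexhoma/experiments.py`:

```
# Minimal sketch: compose the default connection name from the current experiment
# settings (attributes taken from the text above) and pass it explicitly.
connection = cluster.docker + "-" + cluster.script + "-" + cluster.instance + '-' + cluster.name
cluster.runBenchmarks(connection=connection)
```
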
#### Collect Results

For each setup of experiments there is a unique code for identification.
DBMSBenchmarker generates this code when the first experiment is run.
All experiments belonging together will be stored in a folder having this code as its name.
It is also possible to continue an experiment by setting `cluster.code`.

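A minimal sketch of continuing a previous set of experiments, assuming the code can simply be assigned before further benchmarks are run; the folder name `1234567890` is hypothetical:

```
# Hypothetical code (result folder name) of an earlier run
cluster.code = '1234567890'
cluster.runBenchmarks()
```
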
For more information about that, please consult the docs of the benchmark tool: https://github.com/Beuth-Erdelt/DBMS-Benchmarker#connection-file

The result folder also contains
* Copies of the deployment yamls used to prepare the K8s pods
* A list of dicts in a file `experiments.config`, which lists all experiment steps together with
  * Cluster information
  * Host settings: Instances, volumes, init scripts and DBMS docker data
  * Benchmark settings: Connectionmanagement

These make it possible to [rerun the experiments](#rerun-a-list-of-experiments).

**Note that this means the result folder stores confidential information.**

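A hedged sketch of what a fragment of `experiments.config` may look like; the keys correspond to those read by `bexhoma/experiments.py`, but all concrete values below are assumptions:

```
# Illustration only - the actual file is written by the masterscript
[
    {'step': 'prepareExperiment',
     'clustertype': 'K8s',
     'configfolder': 'experiments/example/',            # hypothetical
     'docker': {'PostgreSQL': {}},                      # DBMS docker data (abbreviated)
     'instance': 'p3.2xlarge',                          # hypothetical
     'volume': 'data',                                  # hypothetical
     'initscript': {'SF1': {}},                         # init scripts (abbreviated)
     'delay': 60},
    {'step': 'runBenchmarks',
     'connection': 'PostgreSQL-SF1-p3.2xlarge-test',    # hypothetical
     'connectionmanagement': {'numProcesses': 8, 'runsPerConnection': 5, 'timeout': 1200}},
]
```
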
#### Collect Host Information

Some information is given by configuration (e.g. JDBC data), some is collected from the experiment host:
```
cluster.getMemory()
@@ -598,10 +641,14 @@ Most of these run inside the docker container:
* `cluster.getCUDA()`: Collects `nvidia-smi | grep \'CUDA\'`
* `cluster.getGPUs()`: Collects `nvidia-smi -L` and then aggregates the type using `Counter([x[x.find(":")+2:x.find("(")-1] for x in l if len(x)>0])`
* `cluster.copyInits()`: Copy init scripts to benchmark result folder on host
* `cluster.copyLog()`: Copy dbms logs to benchmark result folder on host
* `cluster.copyLog()`: Copy DBMS logs to benchmark result folder on host
* `cluster.downloadLog()`: Downloads the benchmark result folder from host to local result folder

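A minimal sketch of gathering some of these host properties in one place, e.g. to inspect them before benchmarking; the dict layout is an assumption, the methods are the ones listed above:

```
# Collect a few host properties for later inspection (layout is illustrative)
host_info = {
    'RAM':  cluster.getMemory(),   # total RAM of the host in bytes
    'CUDA': cluster.getCUDA(),     # CUDA version reported by nvidia-smi
    'GPUs': cluster.getGPUs(),     # aggregated GPU types from nvidia-smi -L
}
cluster.copyLog()       # copy the DBMS logs to the result folder on the host
cluster.downloadLog()   # download the result folder to the local result folder
```
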
The external tool also does the reporting, and it uses these informations among others.
#### Reporting

The external tool also does the reporting, and it uses the collected host information among other data.
Reporting can be started by `cluster.runReporting()`.
This generates reports about all experiments that have been stored under the same code.

### Stop Experiment

@@ -686,4 +733,38 @@ cluster.stopInstance()
* `cluster.stopInstance()`: Stops the instance


## Alternative Workflows

### Parking DBMS at AWS

An alternative workflow is to not install and uninstall the DBMS every time it is used, but to park the docker containers instead:

```
cluster.setExperiment()
cluster.prepareExperiment()
cluster.unparkExperiment()
cluster.runBenchmarks()
cluster.parkExperiment()
cluster.cleanExperiment()
```

* `parkExperiment()`: The docker container is stopped and renamed from `benchmark` to `benchmark-connectionname`, where `connectionname` is the name given for benchmarking.
* `unparkExperiment()`: The docker container is renamed from `benchmark-connectionname` back to `benchmark` and restarted.

This allows keeping the prepared docker containers, including the loaded data.
We can retrieve a list of all parked containers using `cluster.listDocker()`.
To remove all parked containers we can invoke `cluster.stopExperiment()`.

This only works for AWS, since in K8s the DBMS is an essential part of the instance (pod).
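
A minimal sketch of inspecting and finally removing the parked containers; the return format of `listDocker()` is an assumption:

```
# Inspect parked containers on the experiment host, then remove all of them
parked = cluster.listDocker()   # e.g. names like 'benchmark-<connectionname>' (assumed format)
print(parked)
cluster.stopExperiment()        # removes all parked containers
```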

### Rerun a List of Experiments

When we run a workflow using `runExperiment()` or its composing methods, all steps are logged and stored as a list of Python dicts in the result folder of DBMSBenchmarker.

We may want to rerun the same experiment with all its steps.
This needs the cluster config file and the name (`code`) of the result folder:
```
workflow = experiments.workflow(clusterconfig='cluster.config', code=code)
workflow.runWorkflow()
workflow.cluster.runReporting()
```
2 changes: 1 addition & 1 deletion bexhoma/__init__.py
@@ -1,4 +1,4 @@
"""
The clustermanager module
"""
__all__ = ["masterAWS", "masterK8s"]
__all__ = ["masterAWS", "masterK8s", "experiments"]
63 changes: 63 additions & 0 deletions bexhoma/experiments.py
@@ -0,0 +1,63 @@
from os import makedirs, path
import ast
from bexhoma import masterAWS, masterK8s

class workflow():
    def __init__(self, clusterconfig='', resultfolder='', code=None):
        with open(clusterconfig) as f:
            configfile=f.read()
        self.config = eval(configfile)
        if len(resultfolder) == 0:
            self.resultfolder = self.config['benchmarker']['resultfolder']
        else:
            self.resultfolder = resultfolder
        self.clusterconfig = clusterconfig
        self.code = code
        filename = self.resultfolder+'/'+str(self.code)+'/experiments.config'
        if path.isfile(filename):
            with open(filename,'r') as inp:
                self.experiments = ast.literal_eval(inp.read())
            if self.experiments[0]['clustertype'] == 'K8s':
                self.cluster = masterK8s.testdesign(
                    clusterconfig=clusterconfig,#experiments[0]['clusterconfig'],
                    configfolder=self.experiments[0]['configfolder'],
                    #configfolder=self.resultfolder+'/'+code+'/',#experiments[0]['configfolder'],
                    yamlfolder=self.resultfolder+'/'+self.code+'/',#experiments[0]['yamlfolder'],
                    #queryfile=experiments[0]['queryfile']
                    )
    def runWorkflow(self):
        for i,e in enumerate(self.experiments):
            print(e['step'])
            step = e['step']
            if step == 'prepareExperiment':
                #print(e['docker'])
                #print(e['instance'])
                #print(e['volume'])
                #print(e['initscript'])
                self.cluster.setExperiment(
                    docker=list(e['docker'].keys())[0],
                    instance=e['instance'],
                    volume=e['volume'],
                    script=list(e['initscript'].keys())[0],
                    )
                self.cluster.prepareExperiment()
                self.cluster.delay(e['delay'])
            if step == 'startExperiment':
                self.cluster.setExperiment(
                    docker=list(e['docker'].keys())[0],
                    instance=e['instance'],
                    volume=e['volume'],
                    script=list(e['initscript'].keys())[0],
                    )
                self.cluster.startExperiment()
                self.cluster.delay(e['delay'])
            if step == 'stopExperiment':
                self.cluster.stopExperiment()
            if step == 'cleanExperiment':
                self.cluster.cleanExperiment()
            if step == 'runBenchmarks':
                self.cluster.connectionmanagement = e['connectionmanagement']
                self.cluster.runBenchmarks(
                    connection=e['connection'],
                    configfolder=self.resultfolder+'/'+str(self.code))
