V0.6.7 (#201)
* Masterscript: ENVs for statefulsets

* Masterscript: ENVs for statefulsets only adding

* Masterscript: ENVs for sut-job

* Masterscript: VolumeMounts optional in sut deployment

* Masterscript: Volumes optional in sut deployment

* Masterscript: service_name for loading scripts as argument

* Masterscript: volumeClaimTemplates for statefulset reads storage requests

* Masterscript: volumeClaimTemplates as a list

* Masterscript: SUT's services can have component names different from selector

* Masterscript: Catch more exceptions in get_host_diskspace_used_data()

* Masterscript: storage_parameter in connection infos

* Tool: Show worker volumes

* Update README.md

JOSS draft badge

* Masterscript: Also remove worker storage after experiment

* Masterscript: Label DBMS in job

* Masterscript: Label DBMS in monitoring

* Docs: CockroachDB tested successfully

* Masterscript: Do not retry delete_pvc() if pvc cannot be found

* Masterscript: Name of worker storage contains experiment code

* Masterscript: Find worker pvc by labels

* Masterscript: benchmarking_parameters per experiment and configuration

* Masterscript: Wait 5s before (re)checking status of workers

* Masterscript: Remove worker pv if not wanted

* Masterscript: Loading waits 60 secs for all pods

* Masterscript: Benchbase similar to ycsb

* Masterscript: Benchbase convert results to df

* Masterscript: Benchbase collect results into df

* Masterscript: Benchbase collect results into df and set index

* Masterscript: Benchbase uses name_format for results

* Masterscript: Fix container to dashboard when copying to result component

* Masterscript: Accept successful pods as completed

* Masterscript: Benchbase all collect results into single df at end of benchmark

* Masterscript: Benchbase no results for loading phase and benchmarker results per job only

* Masterscript: YCSB all collect results into single df at end of benchmark

* Masterscript: Debug messages about evaluation

* Masterscript: BEXHOMA_CONNECTION set to connection for benchmarker, to configuration otherwise

* Masterscript: HammerDB merge results into df

* Masterscript: HammerDB merge results into dfm, differ between connection and configuration

* Masterscript: BEXHOMA_CONNECTION test for benchbase

* Masterscript: Benchbase merge results into dfm, differ between connection and configuration

* Masterscript: Debug messages about evaluation for YCSB collected dfs

* Masterscript: HammerDB extract pod name

* Masterscript: YCSB extract pod name

* Masterscript: YCSB dump more information

* Masterscript: HammerDB extract pod name

* Masterscript: YCSB extract pod name

* Masterscript: YCSB evaluation improved

* Masterscript: Benchbase evaluation improved

* Masterscript: BEXHOMA meta data in job envs

* Masterscript: HammerDB concat dbms infos

* Masterscript: Fetch metrics for specific connection

* Masterscript: HammerDB also keep config file for single connection

* Masterscript: Benchbase requests schema file

* Masterscript: All experiments keep config file for single connection

* Masterscript: Allow all job ENVs to be overwritten

* Masterscript: BEXHOMA_CLIENT set to number of benchmarker client

* Masterscript: Fetch metrics for specific connection for all benchmarker

* Masterscript: BEXHOMA_CLIENT set to number of benchmarker client, thus is 0 during loading

* Masterscript: Show list of open benchmarks per configuration

* Masterscript: NEVER rerun, only one connection in config for detached - all benchmarker collect all dbms in one connection file

* Masterscript: Copy connection file for connection specified by name

* Masterscript: Copy connection file for connection specified by name

* Masterscript: fetch_metrics_loading() for connection

* Masterscript: fetch_metrics_loading() for connection, run in dashboard pod

* Masterscript: set connection file name

* Masterscript: fetch_metrics_loading() after loading, dump results

* Build script for Docker images

* Python 3.11.5 instead of 3.12 because of bug in setuptools

* Benchmarker: Less output

* DBMS: YugabyteDB dummy deployment

* Docs: YCSB at entry page

* Docs: scaled-out drivers at entry page

* Docs: TPC-C at entry page

* Docs: scaled-out drivers at entry page

* Docs: Example: Run a custom SQL workload

* # Conflicts:
#	README.md
#	bexhoma/configurations.py
#	bexhoma/experiments.py

* requirements: no nbconvert

* requirements: python 3.11.15

* Docs: .readthedocs.yaml

* requirements: no m2r2

* requirements: sphinx

* Docs: Example: Run a custom SQL workload

* Docs: Formatting

* YCSB: scaling-factor-operations

* YugabyteDB dummy less resources

* fix: requirements.txt to reduce vulnerabilities

* fix: requirements.txt to reduce vulnerabilities - Werkzeug>=3.0.1

* Require only Python 3.10.2

* v0.6.7 prerelease
perdelt committed Nov 15, 2023
1 parent bb6c399 commit 23e570b
Showing 6 changed files with 22 additions and 21 deletions.
25 changes: 10 additions & 15 deletions README.md
@@ -23,7 +23,7 @@ The basic workflow is [1,2]: start a containerized version of the DBMS, install
A more advanced workflow is: Plan a sequence of such experiments, run plan as a batch and join results for comparison.

It is also possible to scale-out drivers for generating and loading data and for benchmarking to simulate cloud-native environments as in [4].
See [example](TPCTC23/README.md) results as presented in [A Cloud-Native Adoption of Classical DBMS Performance Benchmarks and Tools](http://dx.doi.org/10.13140/RG.2.2.29866.18880) and how they are generated.
See [example](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/TPCTC23/README.md) results as presented in [A Cloud-Native Adoption of Classical DBMS Performance Benchmarks and Tools](http://dx.doi.org/10.13140/RG.2.2.29866.18880) and how they are generated.

See the [homepage](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager) and the [documentation](https://bexhoma.readthedocs.io/en/latest/).

@@ -33,33 +33,28 @@ If you encounter any issues, please report them to our [Github issue tracker](ht

1. Download the repository: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager
1. Install the package `pip install bexhoma`
1. Make sure you have a working `kubectl` installed
(Also make sure to have access to a running Kubernetes cluster - for example [Minikube](https://minikube.sigs.k8s.io/docs/start/))
(Also make sure, you can create PV via PVC and dynamic provisioning)
1. Make sure you have a working `kubectl` installed.
* (Also make sure to have access to a running Kubernetes cluster - for example [Minikube](https://minikube.sigs.k8s.io/docs/start/))
* (Also make sure you can create PVs via PVCs and dynamic provisioning)
1. Adjust [configuration](https://bexhoma.readthedocs.io/en/latest/Config.html)
1. Rename `k8s-cluster.config` to `cluster.config`
1. Set name of context, namespace and name of cluster in that file
1. Install result folder
Run `kubectl create -f k8s/pvc-bexhoma-results.yml`
1. Install result folder: Run `kubectl create -f k8s/pvc-bexhoma-results.yml`


## Quickstart


1. Run `python ycsb.py -ms 1 -dbms PostgreSQL -workload a run`.
This installs PostgreSQL and runs YCSB workload A with varying target.
The driver is monolithic with 64 threads.
The experiments runs a second time with the driver scaled out to 8 instances each having 8 threads.
1. You can watch status using `bexperiments status` while running.
This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, run `bexperiments dashboard` to connect to a dashboard. You can open dashboard in browser at `http://localhost:8050`.
This is equivalent to `python cluster.py dashboard`
Alternatively you can open a Jupyter notebook at `http://localhost:8888`.
1. Run `python ycsb.py -ms 1 -dbms PostgreSQL -workload a run`. This installs PostgreSQL and runs YCSB workload A with a varying target. The driver is monolithic with 64 threads. The experiment runs a second time with the driver scaled out to 8 instances, each having 8 threads.
1. You can watch status using `bexperiments status` while running. This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, run `bexperiments dashboard` to connect to a dashboard. You can open the dashboard in a browser at `http://localhost:8050`. This is equivalent to `python cluster.py dashboard`. Alternatively you can open a Jupyter notebook at `http://localhost:8888`.
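The "varying target" in step 1 can be sketched as follows. This is an illustrative snippet, not bexhoma code: per the `ycsb.py` options in this commit, `--list-target-factors` is a comma-separated list of factors of a 16384 ops/s base, and each factor yields one benchmark run at that target rate.

```python
# Sketch (not the actual bexhoma code) of how benchmark targets are derived
# from the --list-target-factors option described in ycsb.py.

TARGET_BASE = 16384  # ops/s, per the ycsb.py argument help text

def targets_from_factors(list_target_factors="1,2,3,4,5,6,7,8", base=TARGET_BASE):
    """Turn a comma-separated factor list like '1,2,4' into target rates."""
    factors = [int(f) for f in list_target_factors.split(",")]
    return [f * base for f in factors]

print(targets_from_factors("1,2,4"))  # -> [16384, 32768, 65536]
```

With the default list `"1,2,3,4,5,6,7,8"` this produces eight runs, ramping the target from 16384 up to 131072 ops/s.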


## More Information

For full power, use this tool as an orchestrator as in [2]. It also starts a monitoring container using [Prometheus](https://prometheus.io/) and a metrics collector container using [cAdvisor](https://github.com/google/cadvisor). For analytical use cases, the Python package [dbmsbenchmarker](https://github.com/Beuth-Erdelt/DBMS-Benchmarker), [3], is used as query executor and evaluator as in [1,2].
For transactional use cases, HammerDB's TPC-C, Benchbase's TPC-C and YCSB are used as drivers for generating and loading data and for running the workload as in [4].

See the [images](https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/tree/master/images/) folder for more details.

<p align="center">
2 changes: 1 addition & 1 deletion docs/Example-custom.md
@@ -2,7 +2,7 @@

## Preparation

* clone repository, branch v0.6.5
* clone repository
* pip install requirements
* rename `k8s-cluster.config` to `cluster.config`
* replace inside that file where to store the results locally
4 changes: 2 additions & 2 deletions k8s/deploymenttemplate-YugabyteDB.yml
@@ -63,8 +63,8 @@ spec:
#- {containerPort: 5433}
#- {containerPort: 9042}
resources:
limits: {cpu: 16000m, memory: 128Gi}
requests: {cpu: 16000m, memory: 128Gi}
limits: {cpu: 1000m, memory: 1Gi}
requests: {cpu: 1000m, memory: 1Gi}
#, ephemeral-storage: "1536Gi"}
volumeMounts:
- {mountPath: /data, name: benchmark-data-volume}
2 changes: 2 additions & 0 deletions requirements.txt
@@ -13,3 +13,5 @@ redis
mistune>=2.0.3 # not directly required, pinned by Snyk to avoid a vulnerability
numpy>=1.22.2 # not directly required, pinned by Snyk to avoid a vulnerability
setuptools>=65.5.1 # not directly required, pinned by Snyk to avoid a vulnerability
pillow>=10.0.1 # not directly required, pinned by Snyk to avoid a vulnerability
Werkzeug>=3.0.1
4 changes: 2 additions & 2 deletions setup.py
@@ -8,7 +8,7 @@

setuptools.setup(
name="bexhoma",
version="0.6.6",
version="0.6.7",
author="Patrick Erdelt",
author_email="perdelt@beuth-hochschule.de",
description="This Python tool helps manage DBMS benchmarking experiments in a Kubernetes-based HPC cluster environment. It enables users to configure hardware / software setups for easily repeating tests over varying configurations.",
@@ -22,7 +22,7 @@
"Operating System :: OS Independent",
],
license="GNU Affero General Public License v3",
python_requires='>=3.11.5',
python_requires='>=3.10.2',
include_package_data=True,
install_requires=requirements,
package_dir={'bexhoma': 'bexhoma'},
6 changes: 5 additions & 1 deletion ycsb.py
@@ -44,6 +44,7 @@
parser.add_argument('-nl', '--num-loading', help='number of parallel loaders per configuration', default=1)
parser.add_argument('-nlp', '--num-loading-pods', help='total number of loaders per configuration', default=[1,8])
parser.add_argument('-sf', '--scaling-factor', help='scaling factor (SF) = number of rows in millions', default=1)
parser.add_argument('-sfo', '--scaling-factor-operations', help='scaling factor (SF) = number of operations in millions (=SF if not set)', default=None)
parser.add_argument('-su', '--scaling-users', help='scaling factor = number of total threads', default=64)
parser.add_argument('-sbs', '--scaling-batchsize', help='batch size', default="")
parser.add_argument('-ltf', '--list-target-factors', help='comma separated list of factors of 16384 ops as target - default range(1,9)', default="1,2,3,4,5,6,7,8")
@@ -75,6 +76,9 @@
monitoring_cluster = args.monitoring_cluster
mode = str(args.mode)
SF = str(args.scaling_factor)
SFO = args.scaling_factor_operations # keep None as-is; str(None) would be "None" and the fallback below would never trigger
if SFO is None:
SFO = SF
SU = int(args.scaling_users)
target_base = int(args.target_base)
list_target_factors = args.list_target_factors
@@ -214,7 +218,7 @@
#experiment.name_format = '{dbms}-{threads}-{pods}-{target}'
experiment.set_experiment(script='Schema')
ycsb_rows = int(SF)*1000000 # 1kb each, that is SF is size in GB
ycsb_operations = int(SF)*1000000
ycsb_operations = int(SFO)*1000000
# note more infos about experiment in workload description
experiment.workload['info'] = experiment.workload['info']+" YCSB data is loaded using several processes."
if len(args.dbms):
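The new `-sfo`/`--scaling-factor-operations` logic above can be exercised standalone. This is a sketch mirroring the ycsb.py diff (the function name is mine, not bexhoma's): SF sets the row count in millions (rows are 1 KB each, so SF is roughly the data size in GB), and the operation count falls back to SF when `-sfo` is not given.

```python
# Sketch of the SF / SFO sizing logic from the ycsb.py diff above.
# Rows are 1 KB each, so the scaling factor SF is roughly the data size in GB;
# SFO (operations in millions) defaults to SF when --scaling-factor-operations
# is not set.

def ycsb_sizing(scaling_factor, scaling_factor_operations=None):
    SF = str(scaling_factor)
    SFO = scaling_factor_operations  # keep None distinct; str(None) == "None"
    if SFO is None:
        SFO = SF
    ycsb_rows = int(SF) * 1000000        # 1 KB per row -> SF ~ size in GB
    ycsb_operations = int(SFO) * 1000000
    return ycsb_rows, ycsb_operations

print(ycsb_sizing(1))      # -> (1000000, 1000000), SFO defaults to SF
print(ycsb_sizing(1, 10))  # -> (1000000, 10000000), 1 GB of data, 10M operations
```

Note why the `None` check matters: argparse returns `None` for the unset option, and converting it with `str()` first would produce the string `"None"`, so the fallback to SF would never apply.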
