v0.7.0 - TPC-H and YCSB (#249)
* security context tests - matplotlib cache needs root filesystem

* TPC-H: Typo

* security context tests - could not create lock file "/var/run/postgresql/.s.PGSQL.5432.lock": Read-only file system

* security context tests - matplotlib cache needs root filesystem

* TPC-H: More notes

* TPC-H: Security context test for loaders

* TPC-H: Loader for MonetDB, MySQL

* TPC-H: Loader for MySQL fix-missing

* TPC-H: no security context for jobs (loaders)

* TPC-H: bexhoma_start_epoch also for existing datasets

* TPC-H: no security context, because /tmp will otherwise not be writable

* bexhoma: print more details about times (generator, loader)

* TPC-H: Loader for MySQL fix shell version

* TPC-H: Loader for MySQL --fix-missing --fix-broken

* TPC-H and YCSB: start_messagequeue() automatically

* bexhoma: redis needs readOnlyRootFilesystem=false

* bexhoma: redis fsGroup

* TPC-H: Fast profiling (only keys)

* bexhoma: redis fsGroup removed

* bexhoma: Cleaned MonetDB and MySQL

* Cleaned requirements

* bexhoma: redis only allowPrivilegeEscalation=false

* TPC-H: MySQL loader script more similar to PostgreSQL

* TPC-H: MySQL loader script release debug

* TPC-H: Q15 for MonetDB and timeout=600

* Bexhoma: Improved output

* TPC-H: Queries rewritten

* Bexhoma: Improved output

* TPC-H: Queries rewritten

* Bexhoma: Improved output

* Bexhoma: Ignore job logs when log does not contain container name

* TPC-H: BEXHOMA_SYNCH_GENERATE

* TPC-H: generator waits for other pods before exiting

* Bexhoma: Improved output

* TPC-H: index, constraint and statistics scripts

* Bexhoma: Improved output and docs

* MySQL: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory

* TPC-H: MySQL innodb_redo_log_capacity = 1GB

* Prometheus needs to write to /etc/prometheus/prometheus-bexhoma.yml

* TPC-H: MySQL innodb_redo_log_capacity = 1GB

* TPC-H: MySQL PK after import

* Bexhoma: Improved output and docs

* DBMSBenchmarker: download complete result folder of experiment from pod

* TPC-H: show summary at end of experiment

* Bexhoma: cluster.get_pod_containers()

* YCSB: Write file system

* YCSB: docs

* improved docs

* Avoid GPU nodes

* YCSB: summary

* Bexhoma: Improved output and docs - benchmarker times

* Bexhoma: Log of benchmarker pods also contain container name in filename

* fix: requirements.txt to reduce vulnerabilities

The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-6182918

* YCSB: summary

* Bexhoma: Improved output and docs

* Bexhoma: localdashboard

* Bexhoma: notebooks folder

* Bexhoma: notebooks folder must be in images

* Bexhoma: Only show non-empty columns in status

* Bexhoma: Start local jupyter notebook server from cli

* TPC-H: Evaluation demo in Docker image

* DBMS: Current versions of MySQL, PostgreSQL and MonetDB

* DBMS: Current version of MySQL, try innodb-use-native-aio

* DBMS: Current version of MySQL, tune InnoDB

* DBMS: Current version of MySQL, tune InnoDB and Redo Log

* DBMS: Current version of MySQL, tune start delay

* DBMS: Current version of MySQL, tune performance

* TPC-H: Improved output summary sorted

* MySQL: Test some settings

* MySQL: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8): No such file or directory - fixed

* TPC-H: Improve summary

* MySQL: Test some settings

* Bexhoma: Improved output and docs

* DBMS: Current version of MySQL, try some settings

* Bexhoma: Wait till cluster is ready: evaluation and message queue

* Bexhoma: Improved output and docs

* Bexhoma: Monitoring in summary for TPC-H

* DBMS: Current version of MySQL, try some settings

* Bexhoma: Monitoring in summary for YCSB

* Bexhoma: Monitoring in summary

* MySQL: Test some settings

* MySQL: Try 8.2.0

* MySQL: Switch back to 8.0.36

* MySQL: Test some settings

* TPC-H: Improve summary

* TPC-H: Improve docs

* TPC-H: Improve summary of throughput

* YCSB: Improve summary

* TPC-H: DBMSBENCHMARKER_RECREATE_PARAMETER

* TPC-H: --shuffle-queries

* TPC-H: --shuffle-queries and --shuffle-queries data type in bash

* Bexhoma: Load labels from existing PV

* Bexhoma: Store parameters in connection even if there has been no loading

* Bexhoma: Read times from PV

* TPC-H: improve docs

* TPC-H: DBMSBENCHMARKER_DEV mode from CLI

* Bexhoma: Only fetch loading metrics if there is no PV

* Bexhoma: Monitoring per stream is done outside of DBMSBenchmarker

* Bexhoma: Status of dashboard and message queue

* Bexhoma: Status of data and result directories

* Bexhoma: Monitoring per stream is done inside of DBMSBenchmarker?

* TPC-H: Metrics every 30s

* Bexhoma: Improved output and docs

* Bexhoma: Setup and run test for shared directories data and result

* YCSB: Storage configuration

* Bexhoma: Improved output and docs

* Bexhoma: Setup and run test for shared directories data and result

* YCSB: Allow experiment that compares PostgreSQL and MySQL

* YCSB: Tolerate GPU taint

* Bexhoma: Improved output and docs

* Bexhoma: storageConfiguration consistently

* Bexhoma: storageConfiguration consistently used

* Bexhoma: when loaded from PVC, system may not be ready yet and has to wait

* TPC-H: storageConfiguration

* TPC-H: send SF to benchmarker

* Bexhoma: Improved output and docs

* Bexhoma: Improved output and docs

* YCSB: benchmarking_pods as parameter

* YCSB: YCSB_ prefix for rows and operations

* fix: requirements.txt to reduce vulnerabilities (#244)

The following vulnerabilities are fixed by pinning transitive dependencies:
- https://snyk.io/vuln/SNYK-PYTHON-DASH-6226335
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-6219984
- https://snyk.io/vuln/SNYK-PYTHON-PILLOW-6219986

Co-authored-by: snyk-bot <snyk-bot@snyk.io>

* Bexhoma: Improved output and docs

* YCSB: summary show experiment_run

* Bexhoma: Improved output and docs

* TPC-H: allow fixation of driver pods

* Bexhoma: Bash test script

* Start final tests for next release

* Bexhoma: Improved output and docs

* Final tests for next release

* Bexhoma: Current version of dbmsbenchmarker

* Final tests for next release

* PostgreSQL: Shut down gracefully fast and longer period

* Final tests for next release
perdelt committed Feb 14, 2024
1 parent 68321af commit dec671d
Showing 79 changed files with 118,517 additions and 1,267 deletions.
36 changes: 30 additions & 6 deletions README.md
@@ -15,11 +15,11 @@ It enables users to configure hardware / software setups for easily repeating te

It serves as the **orchestrator** [2] for distributed parallel benchmarking experiments in a Kubernetes Cloud.
This has been tested at Amazon Web Services, Google Cloud, Microsoft Azure, IBM Cloud, Oracle Cloud, and at Minikube installations,
running with Citus Data (Hyperscale), Clickhouse, CockroachDB, Exasol, IBM DB2, MariaDB, MariaDB Columnstore, MemSQL (SingleStore), MonetDB, MySQL, OmniSci (HEAVY.AI), Oracle DB, PostgreSQL, SQL Server, SAP HANA, TimescaleDB, and Vertica.
running with Citus Data (Hyperscale), Clickhouse, CockroachDB, Exasol, IBM DB2, MariaDB, MariaDB Columnstore, MemSQL (SingleStore), MonetDB, MySQL, OmniSci (HEAVY.AI), Oracle DB, PostgreSQL, SQL Server, SAP HANA, TimescaleDB, Vertica and YugabyteDB.

Benchmarks included are YCSB, TPC-H and TPC-C (HammerDB and Benchbase version).

The basic workflow is [1,2]: start a containerized version of the DBMS, install monitoring software, import existing data, run benchmarks and shut down everything with a single command.
The basic workflow is [1,2]: start a containerized version of the DBMS, install monitoring software, import data, run benchmarks and shut down everything with a single command.
A more advanced workflow is: plan a sequence of such experiments, run the plan as a batch and join the results for comparison.

It is also possible to scale out drivers for generating and loading data and for benchmarking to simulate cloud-native environments as in [4].
@@ -37,17 +37,41 @@ If you encounter any issues, please report them to our [Github issue tracker](ht
* (Also make sure to have access to a running Kubernetes cluster - for example [Minikube](https://minikube.sigs.k8s.io/docs/start/))
* (Also make sure you can create PVs via PVCs and dynamic provisioning)
1. Adjust [configuration](https://bexhoma.readthedocs.io/en/latest/Config.html)
1. Rename `k8s-cluster.config` to `cluster.config`
1. Copy `k8s-cluster.config` to `cluster.config`
1. Set name of context, namespace and name of cluster in that file
1. Install result folder: Run `kubectl create -f k8s/pvc-bexhoma-results.yml`
2. Make sure the `resultfolder` is set to a folder that exists on your local filesystem
1. Other components like the shared data and result directories, the message queue and the evaluator are installed automatically when you start an experiment. Before that, you might want to adjust
* Result directory: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s/pvc-bexhoma-results.yml
* `storageClassName`: must be an available storage class of type `ReadWriteMany` in your cluster
* `storage`: size of the directory
* Data directory: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager/blob/master/k8s/pvc-bexhoma-data.yml
* `storageClassName`: must be an available storage class of type `ReadWriteMany` in your cluster
* `storage`: size of the directory


## Quickstart

### YCSB

1. Run `python ycsb.py -ms 1 -dbms PostgreSQL -workload a run`. This installs PostgreSQL and runs YCSB workload A with a varying target. The driver is monolithic with 64 threads. The experiment runs a second time with the driver scaled out to 8 instances, each having 8 threads.
1. Run `python ycsb.py -ms 1 -dbms PostgreSQL -workload a run`.
This installs PostgreSQL and runs YCSB workload A with a varying target. The driver is monolithic with 64 threads. The experiment runs a second time with the driver scaled out to 8 instances, each having 8 threads.
1. You can watch the status using `bexperiments status` while the experiment is running. This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, run `bexperiments dashboard` to connect to a dashboard. You can open the dashboard in a browser at `http://localhost:8050`. This is equivalent to `python cluster.py dashboard`. Alternatively, you can open a Jupyter notebook at `http://localhost:8888`.
1. After benchmarking has finished, you will see a summary.
For further inspection, run `bexperiments dashboard` to connect to a dashboard. You can open the dashboard in a browser at `http://localhost:8050`. This is equivalent to `python cluster.py dashboard`. Alternatively, you can open a Jupyter notebook at `http://localhost:8888`.

See more details at https://bexhoma.readthedocs.io/en/latest/Example-YCSB.html

### TPC-H

1. Run `python tpch.py -ms 1 -dbms PostgreSQL run`.
This installs PostgreSQL and runs TPC-H at scale factor 1. The driver is monolithic.
1. You can watch the status using `bexperiments status` while the experiment is running. This is equivalent to `python cluster.py status`.
1. After benchmarking has finished, you will see a summary.
For further inspection, run `bexperiments dashboard` to connect to a dashboard. This is equivalent to `python cluster.py dashboard`. You can open a Jupyter notebook at `http://localhost:8888`.

See more details at https://bexhoma.readthedocs.io/en/latest/Example-TPC-H.html




## More Information
15 changes: 9 additions & 6 deletions TPCTC23/evaluator.py
@@ -571,7 +571,7 @@ def benchmarking_aggregate_by_parallel_pods(self, df):
:param df: DataFrame of results
:return: DataFrame of results
"""
column = "connection"
column = ["connection","experiment_run"]
df_aggregated = pd.DataFrame()
for key, grp in df.groupby(column):
#print(key, len(grp.index))
@@ -646,9 +646,12 @@ def benchmarking_aggregate_by_parallel_pods(self, df):
}}
#print(grp.agg(aggregate))
dict_grp = dict()
dict_grp['connection'] = key
dict_grp['configuration'] = grp['configuration'][0]
dict_grp['experiment_run'] = grp['experiment_run'][0]
dict_grp['connection'] = key[0]
dict_grp['configuration'] = grp['configuration'].iloc[0]
dict_grp['experiment_run'] = grp['experiment_run'].iloc[0]
#dict_grp['connection'] = key
#dict_grp['configuration'] = grp['configuration'][0]
#dict_grp['experiment_run'] = grp['experiment_run'][0]
#dict_grp['client'] = grp['client'][0]
#dict_grp['pod'] = grp['pod'][0]
dict_grp = {**dict_grp, **grp.agg(aggregate)}
@@ -756,8 +759,8 @@ def loading_aggregate_by_parallel_pods(self, df):
#print(grp.agg(aggregate))
dict_grp = dict()
dict_grp['connection'] = key[0]
dict_grp['configuration'] = grp['configuration'][0]
dict_grp['experiment_run'] = grp['experiment_run'][0]
dict_grp['configuration'] = grp['configuration'].iloc[0]
dict_grp['experiment_run'] = grp['experiment_run'].iloc[0]
#dict_grp['client'] = grp['client'][0]
#dict_grp['pod'] = grp['pod'][0]
#dict_grp['pod_count'] = grp['pod_count'][0]
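
The evaluator change above (grouping by `["connection","experiment_run"]` instead of `"connection"`, and switching from `grp[...][0]` to `grp[...].iloc[0]`) hinges on how pandas hands out groups. A minimal sketch with invented data (only the column names mirror the evaluator's DataFrame; all values are made up) shows why the tuple key and positional `.iloc` access are needed:

```python
import pandas as pd

# Invented sample data; only the column names follow the evaluator's DataFrame.
df = pd.DataFrame({
    "connection": ["pg-1", "pg-1", "pg-2"],
    "experiment_run": [1, 1, 1],
    "configuration": ["pg", "pg", "pg"],
    "throughput": [100.0, 110.0, 95.0],
})

for key, grp in df.groupby(["connection", "experiment_run"]):
    # Grouping by a list of columns yields a tuple key, e.g. ("pg-2", 1),
    # so the connection name is key[0] rather than key itself.
    connection = key[0]
    # Each group keeps its original row labels (the "pg-2" group has index [2]),
    # so label-based grp["configuration"][0] would raise a KeyError here;
    # .iloc[0] selects the first row of the group by position instead.
    configuration = grp["configuration"].iloc[0]
    print(connection, configuration, grp["throughput"].sum())
```

The same reasoning appears to apply to the `loading_aggregate_by_parallel_pods` hunk, where only the `.iloc[0]` accesses changed.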

