V0.3.11 (#34)

* Prepare next release
* K8s: Some demo yml files for deployments and services
* TPC-H: Some demo queries and init scripts
* TPC-H: Some demo configs and scripts
* Docs: TPC-H example
* Prepare next release 0.4.0
Beuth-Erdelt · Aug 29, 2020 · a2ab975
1 parent 12eb26e
Showing 39 changed files with 3,620 additions and 163 deletions.
diff --git a/README.md b/README.md
@@ -7,7 +7,8 @@ This tool supports AWS and kubernetes (k8s) based clusters.
 
 This documentation
 * illustrates the [concepts](docs/Concept.md)
-* provides [basic examples](docs/Examples.md)
+* provides a basic [TPC-H like example](docs/Example-TPC-H.md)
+* provides [more detailed examples](docs/Examples.md)
   * [Example: TPC-H Benchmark for 3 DBMS on 1 Virtual Machine](docs/Examples.md#example-tpc-h-benchmark-for-3-dbms-on-1-virtual-machine)
   * [Example: TPC-H Benchmark for 1 DBMS on 3 Virtual Machines](docs/Examples.md#example-tpc-h-benchmark-for-1-dbms-on-3-virtual-machines)
 * defines [how to configure an experiment setup](docs/Config.md)

diff --git a/demo-tpch-k8s.py b/demo-tpch-k8s.py
@@ -0,0 +1,113 @@
+"""
+    Demo for bexhoma
+    This compares MonetDB and PostgreSQL performing some TPC-H queries.
+    The cluster is managed using Kubernetes.
+    Copyright (C) 2020  Patrick Erdelt
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU Affero General Public License as
+    published by the Free Software Foundation, either version 3 of the
+    License, or (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU Affero General Public License for more details.
+
+    You should have received a copy of the GNU Affero General Public License
+    along with this program.  If not, see <https://www.gnu.org/licenses/>.
+"""
+from bexhoma import *
+import logging
+import urllib3
+import gc
+
+urllib3.disable_warnings()
+logging.basicConfig(level=logging.ERROR)
+
+# continue previous experiment?
+code=None
+# pick query file
+queryfile = 'queries-tpch.config'
+# pick scaling factor
+SF = '1'
+# number of repetitions
+numExperiments = 1
+# pick hardware
+cpu = "4000m"
+memory = '16Gi'
+cpu_type = 'epyc-7542'
+
+# set basic config
+cluster = masterK8s.testdesign(
+	clusterconfig = 'cluster.config',
+	yamlfolder = 'k8s/',
+	configfolder = 'experiments/tpch',
+	queryfile = queryfile)
+
+# remove existing pods
+cluster.cleanExperiment()
+
+# set data volume
+cluster.set_experiment(volume='tpch')
+
+# set DDL scripts
+cluster.set_experiment(script='1s-SF'+SF+'-index')
+
+# continue previous experiment?
+cluster.set_code(code=code)
+
+# set workload parameters - this overwrites information given in the query file
+cluster.set_workload(
+	name = 'TPC-H Queries',
+	info = 'This experiment compares instances of different DBMS on different machines.'
+	)
+
+# set connection parameters - this overwrites information given in the query file
+cluster.set_connectionmanagement(
+	numProcesses = 1,
+	runsPerConnection = 0,
+	timeout = 600,
+	singleConnection = False)
+
+# set query parameters - this overwrites information given in the query file
+cluster.set_querymanagement(numRun = 1)
+
+# set hardware requests and limits
+cluster.set_resources(
+	requests = {
+		'cpu': cpu,
+		'memory': memory
+	},
+	limits = {
+		'cpu': 0,
+		'memory': 0
+	},
+	nodeSelector = {
+		'cpu': cpu_type,
+	})
+
+
+# function to capture recurring parts of the workflow
+def run_experiments(docker, alias):
+	cluster.set_experiment(docker=docker)
+	cluster.set_experiment(instance=cpu+"-"+memory)
+	cluster.prepareExperiment(delay=60)
+	cluster.startExperiment(delay=60)
+	for i in range(1,numExperiments+1):
+		connection = cluster.getConnectionName()
+		cluster.runBenchmarks(connection=connection+"-"+str(i), alias=alias+'-'+str(i))
+	cluster.stopExperiment()
+	cluster.cleanExperiment()
+	del gc.garbage[:]
+
+
+# run experiments
+run_experiments(docker='MonetDB', alias='DBMS-A')
+run_experiments(docker='PostgreSQL', alias='DBMS-B')
+
+# run reporting
+cluster.runReporting()
+
+exit()
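
Note on the naming scheme in `demo-tpch-k8s.py`: the script assembles the DDL script name, the instance label and the per-repetition connection and alias names by plain string concatenation. The following is a minimal sketch of that logic in isolation, so the resulting names can be inspected without a cluster. The helper `experiment_names` and the value `"MonetDB-demo"` are hypothetical; the latter stands in for whatever `cluster.getConnectionName()` returns, which this diff does not show.

```python
# Sketch of the names that demo-tpch-k8s.py builds by string concatenation.
# SF, cpu, memory and numExperiments are copied from the script.
# "MonetDB-demo" below is a hypothetical stand-in for the value returned by
# cluster.getConnectionName(); the real value is not visible in this diff.

SF = '1'
cpu = "4000m"
memory = '16Gi'
numExperiments = 1


def experiment_names(base_connection, alias, repetitions):
    """Return one (connection, alias) pair per repetition, numbered from 1."""
    return [
        (base_connection + "-" + str(i), alias + "-" + str(i))
        for i in range(1, repetitions + 1)
    ]


script = '1s-SF' + SF + '-index'   # value passed to set_experiment(script=...)
instance = cpu + "-" + memory      # value passed to set_experiment(instance=...)

print(script)    # 1s-SF1-index
print(instance)  # 4000m-16Gi
print(experiment_names("MonetDB-demo", "DBMS-A", numExperiments))
# [('MonetDB-demo-1', 'DBMS-A-1')]
```

Raising `numExperiments` only extends the list of numbered pairs; the script and instance names stay the same across repetitions.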
+
diff --git a/docs/Example-TPC-H.md b/docs/Example-TPC-H.md
@@ -0,0 +1,67 @@
+# Example: TPC-H
+
+This example shows how to benchmark the 22 read-only queries Q1-Q22 derived from TPC-H in MonetDB and PostgreSQL.
+
+> The query file is derived from the TPC-H and as such is not comparable to published TPC-H results, as the query file results do not comply with the TPC-H Specification.
+
+Official TPC-H benchmark - http://www.tpc.org/tpch
+
+**Content**:
+* [Prerequisites](#prerequisites)
+* [Perform Benchmark](#perform-benchmark)
+* [Evaluate Results in Dashboard](#evaluate-results-in-dashboard)
+
+## Prerequisites
+
+We need a configuration file containing the following information in a predefined format, cf. the [demo file](../k8s-cluster.config).
+We may adjust the configuration to match the actual environment.
+The demo also includes the necessary settings for some DBMS: MariaDB, MonetDB, MySQL, OmniSci and PostgreSQL.
+
+For basic execution of benchmarking we need
+* a Kubernetes (K8s) cluster
+  * a namespace `mynamespace`
+  * a usable `kubectl`, i.e. an access token stored in a default vault like `~/.kube`
+  * a persistent volume named `vol-benchmarking` containing the raw TPC-H data in `/data/tpch/SF1/`
+* the JDBC drivers `./monetdb-jdbc-2.29.jar` and `./postgresql-42.2.5.jar`
+* a folder `/benchmarks` for the results
+
+
+To also enable monitoring, we need
+* a Prometheus / Grafana monitoring instance that scrapes metrics from `localhost:9300`
+* an access token and URL for asking Grafana for metrics  
+  https://grafana.com/docs/grafana/latest/http_api/auth/#create-api-token
+
+
+## Perform Benchmark
+
+To perform the experiment, we can run the [demo file](../demo-tpch-k8s.py).
+
+The actual benchmarking is done by
+```
+# run experiments
+run_experiments(docker='MonetDB', alias='DBMS-A')
+run_experiments(docker='PostgreSQL', alias='DBMS-B')
+```
+
+### Adjust Parameters
+
+You may want to adjust some of the parameters that are set in the file.
+
+The hardware requirements are set via
+```
+# pick hardware
+cpu = "4000m"
+memory = '16Gi'
+cpu_type = 'epyc-7542'
+```
+
+The number of executions of each query can be adjusted here
+```
+# set query parameters - this overwrites information given in the query file
+cluster.set_querymanagement(numRun = 1)
+```
+
+## Evaluate Results in Dashboard
+
+Evaluation is done using DBMSBenchmarker: https://github.com/Beuth-Erdelt/DBMS-Benchmarker/blob/master/docs/Dashboard.md
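
Note on the hardware settings documented in `docs/Example-TPC-H.md`: the values `cpu = "4000m"` and `memory = '16Gi'` use the Kubernetes quantity notation, where the `m` suffix means thousandths of a core and `Gi` means 2^30 bytes. The sketch below converts such strings to plain numbers, which may help when picking values for other machines. The helpers `parse_cpu` and `parse_memory` are hypothetical, are not part of bexhoma, and cover only the most common suffixes.

```python
from decimal import Decimal

# Common suffixes of the Kubernetes quantity notation (not exhaustive).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,     # decimal
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '4000m' or '4' to a number of cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000  # millicores -> cores
    return Decimal(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '16Gi' to a number of bytes."""
    # Check two-letter suffixes first so they are never shadowed by
    # a one-letter suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * _SUFFIXES[suffix])
    return int(Decimal(quantity))  # no suffix: plain bytes


print(parse_cpu("4000m"))    # cores requested in the demo
print(parse_memory("16Gi"))  # bytes requested in the demo
```

With the demo values, this corresponds to a request of 4 cores and 16 GiB of memory per DBMS pod.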
+
diff --git a/experiment-example-AWS.py b/experiment-example-AWS.py
diff --git a/experiment-example-k8s.py b/experiment-example-k8s.py
diff --git a/experiments/tpch/MariaDB/initconstraints-tpch.sql b/experiments/tpch/MariaDB/initconstraints-tpch.sql
@@ -0,0 +1,36 @@
+-- sccsid:     @(#)dss.ri	2.1.8.1
+-- tpcd benchmark version 8.0
+
+-- for table nation
+alter table tpch.nation
+add foreign key (n_regionkey) references tpch.region(r_regionkey);
+
+-- for table supplier
+alter table tpch.supplier
+add foreign key (s_nationkey) references tpch.nation(n_nationkey);
+
+-- for table customer
+alter table tpch.customer
+add foreign key (c_nationkey) references tpch.nation(n_nationkey);
+
+-- for table partsupp
+alter table tpch.partsupp
+add foreign key (ps_suppkey) references tpch.supplier(s_suppkey);
+
+alter table tpch.partsupp
+add foreign key (ps_partkey) references tpch.part(p_partkey);
+
+-- for table orders
+alter table tpch.orders
+add foreign key (o_custkey) references tpch.customer(c_custkey);
+
+-- for table lineitem
+alter table tpch.lineitem
+add foreign key (l_orderkey)  references tpch.orders(o_orderkey);
+
+alter table tpch.lineitem
+add foreign key (l_partkey,l_suppkey) references 
+        tpch.partsupp(ps_partkey,ps_suppkey);
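
Note on `initconstraints-tpch.sql`: the foreign keys above define a dependency order between the eight TPC-H tables. As a rough sketch, the references can be written down as a graph and sorted topologically to see which tables must exist and be filled before others if the constraints are enforced. The dictionary below is transcribed by hand from the `alter table` statements in this file; the approach itself is an illustration and not something the commit does.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# table -> set of tables it references through a foreign key,
# transcribed from the alter table statements in initconstraints-tpch.sql
references = {
    "nation": {"region"},
    "supplier": {"nation"},
    "customer": {"nation"},
    "partsupp": {"supplier", "part"},
    "orders": {"customer"},
    "lineitem": {"orders", "partsupp"},
}

# static_order() yields every table after all tables it references;
# tables that only appear as targets (region, part) are added automatically.
load_order = list(TopologicalSorter(references).static_order())
print(load_order)
```

The exact order among independent tables may vary, but `region` and `part` always come before the tables that reference them, and `lineitem` always comes last.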
+
+
+