V0.3.11 (#34)
* Prepare next release

* K8s: Some demo yml files for deployments and services

* TPC-H: Some demo queries and init scripts

* TPC-H: Some demo configs and scripts

* Docs: TPC-H example

* Prepare next release 0.4.0
perdelt committed Aug 29, 2020
1 parent 12eb26e commit a2ab975
Showing 39 changed files with 3,620 additions and 163 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -7,7 +7,8 @@ This tool supports AWS and kubernetes (k8s) based clusters.

This documentation
* illustrates the [concepts](docs/Concept.md)
* provides [basic examples](docs/Examples.md)
* provides a basic [TPC-H like example](docs/Example-TPC-H.md)
* provides [more detailed examples](docs/Examples.md)
* [Example: TPC-H Benchmark for 3 DBMS on 1 Virtual Machine](docs/Examples.md#example-tpc-h-benchmark-for-3-dbms-on-1-virtual-machine)
* [Example: TPC-H Benchmark for 1 DBMS on 3 Virtual Machines](docs/Examples.md#example-tpc-h-benchmark-for-1-dbms-on-3-virtual-machines)
* defines [how to configure an experiment setup](docs/Config.md)
113 changes: 113 additions & 0 deletions demo-tpch-k8s.py
@@ -0,0 +1,113 @@
"""
Demo for bexhoma
This compares MonetDB and PostgreSQL performing some TPC-H queries.
The cluster is managed using Kubernetes.
Copyright (C) 2020 Patrick Erdelt
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
from bexhoma import *
import logging
import urllib3
import gc

urllib3.disable_warnings()
logging.basicConfig(level=logging.ERROR)

# continue previous experiment?
code=None
# pick query file
queryfile = 'queries-tpch.config'
# pick scaling factor
SF = '1'
# number of repetitions
numExperiments = 1
# pick hardware
cpu = "4000m"
memory = '16Gi'
cpu_type = 'epyc-7542'

# set basic config
cluster = masterK8s.testdesign(
    clusterconfig = 'cluster.config',
    yamlfolder = 'k8s/',
    configfolder = 'experiments/tpch',
    queryfile = queryfile)

# remove existing pods
cluster.cleanExperiment()

# set data volume
cluster.set_experiment(volume='tpch')

# set DDL scripts
cluster.set_experiment(script='1s-SF'+SF+'-index')

# continue previous experiment?
cluster.set_code(code=code)

# set workload parameters - this overwrites infos given in the query file
cluster.set_workload(
    name = 'TPC-H Queries',
    info = 'This experiment compares instances of different DBMS on different machines.'
    )

# set connection parameters - this overwrites infos given in the query file
cluster.set_connectionmanagement(
    numProcesses = 1,
    runsPerConnection = 0,
    timeout = 600,
    singleConnection = False)

# set query parameters - this overwrites infos given in the query file
cluster.set_querymanagement(numRun = 1)

# set hardware requests and limits
cluster.set_resources(
    requests = {
        'cpu': cpu,
        'memory': memory
    },
    limits = {
        'cpu': 0,
        'memory': 0
    },
    nodeSelector = {
        'cpu': cpu_type,
    })


# function to capture recurring parts of the workflow
def run_experiments(docker, alias):
    cluster.set_experiment(docker=docker)
    cluster.set_experiment(instance=cpu+"-"+memory)
    cluster.prepareExperiment(delay=60)
    cluster.startExperiment(delay=60)
    for i in range(1,numExperiments+1):
        connection = cluster.getConnectionName()
        cluster.runBenchmarks(connection=connection+"-"+str(i), alias=alias+'-'+str(i))
    cluster.stopExperiment()
    cluster.cleanExperiment()
    del gc.garbage[:]


# run experiments
run_experiments(docker='MonetDB', alias='DBMS-A')
run_experiments(docker='PostgreSQL', alias='DBMS-B')

# run reporting
cluster.runReporting()

exit()

67 changes: 67 additions & 0 deletions docs/Example-TPC-H.md
@@ -0,0 +1,67 @@
# Example: TPC-H

This example shows how to benchmark the 22 read-only queries Q1-Q22 derived from TPC-H in MonetDB and PostgreSQL.

> The query file is derived from TPC-H and as such its results are not comparable to published TPC-H results, as they do not comply with the TPC-H Specification.
>
> Official TPC-H benchmark: http://www.tpc.org/tpch

**Content**:
* [Prerequisites](#prerequisites)
* [Perform Benchmark](#perform-benchmark)
* [Evaluate Results in Dashboard](#evaluate-results-in-dashboard)

## Prerequisites

We need a configuration file containing the cluster information in a predefined format, cf. the [demo file](../k8s-cluster.config).
We may adjust the configuration to match the actual environment.
The demo also includes the necessary settings for some DBMS: MariaDB, MonetDB, MySQL, OmniSci and PostgreSQL.

For a basic benchmark run we need:
* a Kubernetes (K8s) cluster
* a namespace `mynamespace`
* `kubectl` usable, i.e. an access token stored in a default location like `~/.kube`
* a persistent volume named `vol-benchmarking` containing the raw TPC-H data in `/data/tpch/SF1/`
* JDBC drivers `./monetdb-jdbc-2.29.jar` and `./postgresql-42.2.5.jar`
* a folder `/benchmarks` for the results
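Before starting the demo it can help to verify the local part of these prerequisites. The following is a minimal sketch and not part of bexhoma: the helper `missing_prerequisites()` is hypothetical, uses only the Python standard library, and checks just the JDBC driver files, the results folder and `~/.kube`. The namespace and the persistent volume live in the cluster and are not checked here.

```python
import os

# Local prerequisites as listed above; relative paths are resolved
# against the current working directory (where the demo is started).
REQUIRED_FILES = ["./monetdb-jdbc-2.29.jar", "./postgresql-42.2.5.jar"]
REQUIRED_DIRS = ["/benchmarks", os.path.expanduser("~/.kube")]


def missing_prerequisites(files=REQUIRED_FILES, dirs=REQUIRED_DIRS):
    """Return the required files and folders that do not exist locally."""
    missing = [f for f in files if not os.path.isfile(f)]
    missing += [d for d in dirs if not os.path.isdir(d)]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:")
        for path in problems:
            print("  -", path)
    else:
        print("All local prerequisites found.")
```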


To also enable monitoring we need:
* a Prometheus / Grafana monitoring instance that scrapes metrics from `localhost:9300`
* an access token and URL for querying Grafana for metrics, see https://grafana.com/docs/grafana/latest/http_api/auth/#create-api-token
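The Grafana token is sent as a bearer token in the `Authorization` header, as described in the linked Grafana documentation. The sketch below only builds such a request with the standard library; `GRAFANA_URL`, `API_TOKEN` and the helper `grafana_request()` are placeholders, and how bexhoma itself passes the token to Grafana is not shown here.

```python
import urllib.request

# Placeholders (assumptions): replace with your Grafana URL and API token.
GRAFANA_URL = "http://grafana.example.com:3000"
API_TOKEN = "<your-api-token>"


def grafana_request(path, url=GRAFANA_URL, token=API_TOKEN):
    """Build (but do not send) a GET request authenticated with a Grafana API token."""
    req = urllib.request.Request(url.rstrip("/") + path)
    # Grafana expects API tokens as a bearer token in the Authorization header.
    req.add_header("Authorization", "Bearer " + token)
    req.add_header("Accept", "application/json")
    return req


if __name__ == "__main__":
    # /api/health is Grafana's health-check endpoint.
    req = grafana_request("/api/health")
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status, resp.read().decode())
```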


## Perform Benchmark

To perform the experiment, we can run the [demo file](../demo-tpch-k8s.py).

The actual benchmarking is done by:
```
# run experiments
run_experiments(docker='MonetDB', alias='DBMS-A')
run_experiments(docker='PostgreSQL', alias='DBMS-B')
```
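Each call of `run_experiments()` starts one DBMS, runs the benchmark `numExperiments` times against it and appends the repetition index to both the connection name and the alias. The hypothetical helper below reproduces only this naming scheme and does not call bexhoma; the connection name passed in is made up, since the actual return value of `cluster.getConnectionName()` is not shown in the demo.

```python
def benchmark_runs(connection, alias, num_experiments):
    """Return (connection, alias) pairs mirroring the naming in run_experiments()."""
    return [
        (connection + "-" + str(i), alias + "-" + str(i))
        for i in range(1, num_experiments + 1)
    ]


if __name__ == "__main__":
    # "MonetDB-4000m-16Gi" is a hypothetical connection name.
    for conn, al in benchmark_runs("MonetDB-4000m-16Gi", "DBMS-A", 2):
        print(conn, al)
```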

### Adjust Parameters

You may want to adjust some of the parameters that are set in the file.

The hardware requirements are set via:
```
# pick hardware
cpu = "4000m"
memory = '16Gi'
cpu_type = 'epyc-7542'
```
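These values use Kubernetes resource quantity notation: `4000m` means 4000 millicores, i.e. 4 CPU cores, and `16Gi` means 16 × 2^30 bytes. The demo passes `cpu_type` as a `nodeSelector` with the label key `cpu`, so the cluster nodes need a matching label. As an illustration, the following sketch converts such quantity strings into plain numbers; `parse_quantity()` is a hypothetical helper that supports only a subset of the suffixes Kubernetes accepts (no exponent notation, no `P`/`E`).

```python
import re

# Subset of Kubernetes quantity suffixes: milli, decimal SI and binary.
_SUFFIXES = {
    "m": 1e-3, "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(q):
    """Convert a Kubernetes quantity string such as '4000m' or '16Gi' to a float."""
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([a-zA-Z]*)", q.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError("unsupported quantity: " + q)
    number, suffix = match.groups()
    return float(number) * _SUFFIXES[suffix]


if __name__ == "__main__":
    print(parse_quantity("4000m"))  # CPU cores
    print(parse_quantity("16Gi"))   # bytes
```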

The number of executions of each query can be adjusted here:
```
# set query parameters - this overwrites infos given in the query file
cluster.set_querymanagement(numRun = 1)
```
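Together with the 22 queries, the two DBMS and `numExperiments`, `numRun` determines how many query executions the demo performs. The helper below is a back-of-the-envelope sketch, assuming every query is executed `numRun` times in each benchmark run and none is skipped; it is not part of bexhoma.

```python
def total_query_executions(num_queries=22, num_run=1, num_experiments=1, num_dbms=2):
    """Estimate the total number of query executions performed by the demo."""
    return num_queries * num_run * num_experiments * num_dbms


if __name__ == "__main__":
    print(total_query_executions())           # demo defaults: 44
    print(total_query_executions(num_run=5))  # five runs per query: 220
```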

## Evaluate Results in Dashboard

Evaluation is done using DBMSBenchmarker: https://github.com/Beuth-Erdelt/DBMS-Benchmarker/blob/master/docs/Dashboard.md

91 changes: 0 additions & 91 deletions experiment-example-AWS.py

This file was deleted.

65 changes: 0 additions & 65 deletions experiment-example-k8s.py

This file was deleted.

36 changes: 36 additions & 0 deletions experiments/tpch/MariaDB/initconstraints-tpch.sql
@@ -0,0 +1,36 @@
-- sccsid: @(#)dss.ri 2.1.8.1
-- tpcd benchmark version 8.0

-- for table nation
alter table tpch.nation
add foreign key (n_regionkey) references tpch.region(r_regionkey);

-- for table supplier
alter table tpch.supplier
add foreign key (s_nationkey) references tpch.nation(n_nationkey);

-- for table customer
alter table tpch.customer
add foreign key (c_nationkey) references tpch.nation(n_nationkey);

-- for table partsupp
alter table tpch.partsupp
add foreign key (ps_suppkey) references tpch.supplier(s_suppkey);

alter table tpch.partsupp
add foreign key (ps_partkey) references tpch.part(p_partkey);

-- for table orders
alter table tpch.orders
add foreign key (o_custkey) references tpch.customer(c_custkey);

-- for table lineitem
alter table tpch.lineitem
add foreign key (l_orderkey) references tpch.orders(o_orderkey);

alter table tpch.lineitem
add foreign key (l_partkey,l_suppkey) references
tpch.partsupp(ps_partkey,ps_suppkey);
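The constraints above form a dependency graph between the TPC-H tables: `part` and `region` reference nothing, while `lineitem` depends on `orders` and, through a composite key, on `partsupp`. Because the script adds the keys with `alter table`, they can be applied after the data is loaded; if they were declared before loading, the tables would have to be filled in an order where every referenced table comes first. The sketch below derives such an order with Python's standard `graphlib` module (Python 3.9+); the dictionary is transcribed by hand from the SQL above and is not read from the file.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Referencing table -> tables it references, transcribed from
# initconstraints-tpch.sql.
FOREIGN_KEYS = {
    "nation": {"region"},
    "supplier": {"nation"},
    "customer": {"nation"},
    "partsupp": {"supplier", "part"},
    "orders": {"customer"},
    "lineitem": {"orders", "partsupp"},
}


def load_order(dependencies=FOREIGN_KEYS):
    """Return one table order in which every referenced table precedes its referrers."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(load_order())
```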


