Background

BSBM is an evolving benchmark. Currently, it has three use cases: an "explore" use case, a BI use case (which uses SPARQL aggregation), and an "update" use case (which uses SPARQL Update).

As of bigdata 1.0.x, we only support the "explore" use case. Bigdata 1.1.x adds support for SPARQL 1.1 aggregation, so you can run that use case as well.

The benchmark is summarized by one metric, Query Mixes per Hour (QMpH). This number tells you how many mixes of the use case queries the database could process in an hour. Higher is better.
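
As a rough cross-check, QMpH can be recomputed from the test driver output as 3600 × (number of query mix runs) / (total actual runtime in seconds). For example, the reduced query mix result reported further down this page (500 query mix runs in 49.168 seconds of actual runtime) works out to 3600 × 500 / 49.168 ≈ 36,609 QMpH, which closely matches the reported 36608.83.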

**Running a benchmark correctly is not simple. If you are not getting the same results, then your local configuration is probably wrong.**

Setup bsbmtools

Checkout and build bsbmtools

 svn co https://bsbmtools.svn.sourceforge.net/svnroot/bsbmtools bsbmtools 
 cd bsbmtools/trunk
 ant

Generate a dataset

This shows how to generate the 100M triple data set. The data will wind up in a subdirectory, which means that you can generate multiple data sets at different scales. The generated file is then compressed to conserve space on the volume.

 mkdir td_100m
 ./generate -fc -pc 284826 -fn td_100m/dataset -dir td_100m/td_data
 gzip td_100m/dataset.nt

This shows how you would generate the 200M triple data set.

 mkdir td_200m
 ./generate -fc -pc 566496 -fn td_200m/dataset -dir td_200m/td_data
 gzip td_200m/dataset.nt

If you are going to be loading the data on a cluster, the generated files should be partitioned. 100,000 statements per file is a reasonable partition size, so for the 200M data set you want 2000 partitions (-nof 2000).

 mkdir td_200m
 ./generate -nof 2000 -fc -pc 566496 -fn td_200m/dataset -dir td_200m/td_data
 gzip td_200m/*.nt

Note: You can use larger input files, but on the cluster each file is parsed into memory before anything is written onto the disk. Larger input files tend to cause swings where the parser is hot and then the indices are hot. They can also challenge machines which do not have a lot of RAM.

Note: Many file systems have difficulties with large numbers of files in a directory. 2000 files in a directory is OK; 20,000 is pushing it for a lot of file systems. So, if you are going to generate an even larger data set, you either need to modify the BSBM generator code to put the files into sub-directories or you need to do that yourself after the data have been generated (see the sketch below). Also, the data generator appears to write to each file partition in parallel, so make sure you have enough file handles available if you are going to generate a large number of partitions.
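
If you do end up spreading a large number of generated partitions across sub-directories yourself, a simple shell loop along the following lines works. This is only a sketch: it assumes the partitions match td_200m/dataset*.nt.gz after the gzip step above, and the batch size of 2000 files per directory is arbitrary.

 #!/bin/bash
 # Sketch: spread generated partition files across sub-directories so
 # that no single directory holds more than $batch files.
 # Assumes the partitions match td_200m/dataset*.nt.gz -- adjust as needed.
 batch=2000
 i=0
 for f in td_200m/dataset*.nt.gz; do
     dir="td_200m/part_$(( i / batch ))"
     mkdir -p "$dir"
     mv "$f" "$dir/"
     i=$(( i + 1 ))
 done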

Setup Blazegraph

Requirements

  1. Apache ant (version 1.8.0+).
  2. A 64-bit OS and a 64-bit server JVM. We have tested most extensively with Oracle JDK 1.6.x.
  3. A decent amount of RAM.
  4. A decent IO system.
  5. Checkout bigdata

If you are going to use the bigdata-perf/bsbm3 ant script, then you need to check out bigdata. See GettingStarted.

Build bigdata

Change into the top-level directory and build bigdata using the following command.

 ant clean bundleJar

Bigdata BSBM3 Setup

 # Switch to the bsbm package.
 cd bigdata-perf/bsbm3

 ##
 # Edit build.properties (see README.txt).
 ##

Load the data (non-clustered)

Load a BSBM data set into bigdata per the build.properties file (you must have already generated the data set per the test procedure).

 ant run-load

BSBM uses a lot of "large literals". Up through bigdata 1.0.x, this resulted in slow data load times for bigdata. This issue is resolved for the 1.1.x release by the introduction of a "blobs" index.

Setup NanoSparqlServer for BSBM

The NanoSparqlServer is used to answer SPARQL queries. It "knows" about bigdata's MVCC (multi-version concurrency control) semantics and issues queries against a read-only connection reading from the last commit time on the database, which may give somewhat better performance or concurrency. You can follow more or less the same instructions if you want to run against a bigdata federation, but the federation must already be up and running and you must use the federation's bulk data loader to get the data into the database.

Start an http sparql endpoint for that bigdata database instance.

 ant start-sparql-server
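
Once the server is up, you can sanity-check the end point with a simple SPARQL protocol request. The sketch below assumes the same end point URL that is used in the qualification step further down this page; adjust it to match your deployment.

 curl --get --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 1' http://localhost:9999/blazegraph/sparql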

The NanoSparqlServer has an internal queue which controls the number of SPARQL queries that can run concurrently. The default is 16. You can configure this from the command line for the benchmark, or in the web.xml file for the WAR. This does NOT directly control the number of threads that bigdata will use when processing those queries: there will be many threads per query due to internal parallelism in the query engine and the lack of async IO (only recently introduced in Java 7).

Qualification

Qualifying trial

Note: Edit queries/ignoreQueries.txt and make sure that all queries are enabled for the qualification run (we normally do not run Q5 since it is not part of the reduced query mix, but Q5 must be run for the qualification run).

First, start the NanoSparqlServer. Once it is running:

 ./testdriver -q -qf run.qual -idir td_100m/td_data/ http://localhost:9999/blazegraph/sparql

Where run.qual is the name of the file to which the qualification results will be written. You can change this name if you want to test multiple conditions.

Where td_100m is the directory with the 100M triples data set.

Where http://localhost:9999/blazegraph/sparql is the URL of the SPARQL end point.

Analyzing Qualification Runs

You need correct.qual to run this step. This does not appear to be bundled in SVN. I obtained a copy of the ground truth answers for BSBM v3 from the benchmark authors. You can download that ground truth data set here. This is a binary file, so you need to right click and choose "Save as..." when you are downloading it.

To analyze the qualification run against ground truth:

 ./qualification correct.qual run.qual

Where correct.qual is the ground truth answers.

Where run.qual is the qualification results you generated in the step above.

This generates qual.log, which contains the results of the qualification trial.
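
To get a quick count of how many query results differed, you can grep the log for the per-query difference reports (examples of these reports are shown in the next section):

 grep -c "differs" qual.log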

Expected differences in the qualification log

Bigdata inlines xsd:dateTime so answers may be reported which differ only in the timezone component. These answers are "effectively" the same as those in the ground truth. You will see something like this in the log report from analyzing the qualification:

Result for Query 8 of run 1 differs:
Wrong results and/or wrong ordering in row 1.
        Correct: reviewDate: 2007-09-18T00:00:00
        Found: reviewDate: 2007-09-18T00:00:00.000Z
...
Result for Query 11 of run 4 differs:
2 results are missing. 2 results are incorrect.

Bigdata inlines xsd:decimal. During inlining, values which are "equals()" are collapsed onto the same point in the value space. Therefore, you will see something like this in the log report:

Result for Query 10 of run 6 differs:
Wrong results and/or wrong ordering in row 1.
        Correct: price: 754.10
        Found: price: 754.1

These differences may be safely ignored.

Test Procedure

Please refer to the Berlin SPARQL Benchmark documentation for the correct test procedure and follow it for the benchmark runs.

Here are the instructions from the benchmark authors on running BSBM (personal email). I've adapted the commands slightly to reflect the SPARQL end point URL, etc.

For the 100M dataset, run the following (the commands are the same for the 200M data set):

1) ./testdriver -seed 1212123 -rampup -idir td_100m http://localhost:80/sparql
2) ./testdriver -o single.xml -seed 9834533 -idir td_100m http://localhost:80/sparql
3) ./testdriver -o mt_"$3".xml -seed $1 -w $2 -mt $3 -idir td_100m http://localhost:80/sparql

where $1 (seed), $2 (warmups), and $3 (clients) for the three multi-threaded runs are:

1. 8188326 4 4
2. 9175932 8 8
3. 4187411 64 64

The warm-up runs make sure that the connection-establishment overhead is not measured.
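
As a convenience, the sequence above can be scripted. The sketch below is only illustrative: it substitutes the SPARQL end point URL used elsewhere on this page for http://localhost:80/sparql and points -idir at the test driver data directory produced by the generator; adjust both to match your setup.

 #!/bin/bash
 # Sketch of the official run sequence for the 100M data set.
 # ENDPOINT and IDIR are assumptions -- adjust to your deployment.
 ENDPOINT=http://localhost:9999/blazegraph/sparql
 IDIR=td_100m/td_data

 # 1) Ramp-up run (not measured).
 ./testdriver -seed 1212123 -rampup -idir "$IDIR" "$ENDPOINT"

 # 2) Single-client run.
 ./testdriver -o single.xml -seed 9834533 -idir "$IDIR" "$ENDPOINT"

 # 3) Multi-threaded runs: "seed warmups clients" per run.
 for run in "8188326 4 4" "9175932 8 8" "4187411 64 64"; do
     set -- $run
     ./testdriver -o "mt_$3.xml" -seed "$1" -w "$2" -mt "$3" -idir "$IDIR" "$ENDPOINT"
 done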

Faster Rampup

The purpose of the rampup procedure is to bring the database to a state where the performance has reached a steady state. In practice, the working set for BSBM generally fits entirely in RAM so the end point of the rampup procedure corresponds to when the IO Wait falls to nearly zero. However, the official rampup procedure can take a LONG time to run. We've found that the following procedure works just as well and gets you to a hot database state that you can recognize much more quickly.

This will show you the progress of the benchmark.

 ./testdriver -o mt_8.xml -seed $RANDOM -w 50 -mt 8 -idir td_100m/td_data http://localhost:80/sparql

The following command is more convenient when you are "ramping up" the database. It only reports the final QMpH score for each trial so you can see how the net performance changes from trial to trial.

 ./testdriver -o mt_8.xml -seed $RANDOM -w 50 -mt 8 -idir td_100m/td_data http://localhost:80/sparql|grep QMpH

Either way, just repeat this until performance levels off. 50+ random trials are typically required. And, yes, this is much faster than the official rampup procedure.
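
A minimal sketch of that loop, using the same end point URL and options as the commands above (adjust the trial count and URL to your setup):

 # Repeat random-seed trials, printing only the QMpH line for each,
 # until the score levels off.
 for i in $(seq 1 50); do
     ./testdriver -o mt_8.xml -seed $RANDOM -w 50 -mt 8 \
         -idir td_100m/td_data http://localhost:80/sparql | grep QMpH
 done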

Test Results

Performance should be extremely good for the reduced query mix, which can be enabled by editing:

 queries/explore/ignoreQueries

For the reduced query mix, "ignoreQueries" should contain "5 6". For the full query mix, it should be an empty file.
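
For example:

 # Reduced query mix: skip Q5 and Q6.
 echo "5 6" > queries/explore/ignoreQueries

 # Full query mix: leave the file empty.
 : > queries/explore/ignoreQueries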

The static query optimizer and vectored pipelined joins do a great job on most of the BSBM queries. However, there are two queries which do not do so well out of the box:

Query 5. We are working on a fix for this query. It winds up producing an intermediate result which is too large when pipelining the statement patterns in the query. See https://sourceforge.net/apps/trac/bigdata/ticket/253 for details.

Query 6. This was part of the original BSBM benchmark, but it has since been dropped by the benchmark authors. Query 6 used a REGEX filter. Bigdata does not have index support for REGEX, so this winds up visiting a lot of data and then filtering using the REGEX. This drags the overall performance down dramatically. Lots of triple stores had this problem, which is why Query 6 was dropped.

BSBM V3 reduced query mix results (36608 QMpH)

This result was obtained for BSBM V3 using the reduced query mix (Q5 was not presented).

Scale factor: 284826

Number of warmup runs: 50
Number of clients: 8
Seed: 9175932
Number of query mix runs (without warmups): 500 times
min/max Querymix runtime: 0.5236s / 1.2590s
Total runtime (sum): 384.261 seconds
Total actual runtime: 49.168 seconds
QMpH: 36608.83 query mixes per hour
CQET: 0.76852 seconds average runtime of query mix
CQET (geom.): 0.75804 seconds geometric mean runtime of query mix

Metrics for Query: 1
Count: 500 times executed in whole run
AQET: 0.023844 seconds (arithmetic mean)
AQET(geom.): 0.021225 seconds (geometric mean)
QPS: 327.77 Queries per second
minQET/maxQET: 0.00430900s / 0.19842800s
Average result count: 7.88
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 2
Count: 3000 times executed in whole run
AQET: 0.027569 seconds (arithmetic mean)
AQET(geom.): 0.024954 seconds (geometric mean)
QPS: 283.48 Queries per second
minQET/maxQET: 0.00580500s / 0.22326000s
Average result count: 19.36
min/max result count: 7 / 37
Number of timeouts: 0

Metrics for Query: 3
Count: 500 times executed in whole run
AQET: 0.109546 seconds (arithmetic mean)
AQET(geom.): 0.085314 seconds (geometric mean)
QPS: 71.34 Queries per second
minQET/maxQET: 0.00909600s / 0.50606200s
Average result count: 5.57
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 4
Count: 500 times executed in whole run
AQET: 0.035182 seconds (arithmetic mean)
AQET(geom.): 0.032202 seconds (geometric mean)
QPS: 222.14 Queries per second
minQET/maxQET: 0.00587100s / 0.22067800s
Average result count: 7.56
min/max result count: 0 / 10
Number of timeouts: 0

Metrics for Query: 5
Count: 0 times executed in whole run
AQET: 0.000000 seconds (arithmetic mean)
AQET(geom.): NaN seconds (geometric mean)
QPS: Infinity Queries per second
minQET/maxQET: 179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00000000s / 0.00000000s
Average result count: 0.00
min/max result count: 2147483647 / -2147483648
Number of timeouts: 0

Metrics for Query: 7
Count: 2000 times executed in whole run
AQET: 0.039152 seconds (arithmetic mean)
AQET(geom.): 0.036275 seconds (geometric mean)
QPS: 199.61 Queries per second
minQET/maxQET: 0.00948900s / 0.22935300s
Average result count: 12.31
min/max result count: 1 / 100
Number of timeouts: 0

Metrics for Query: 8
Count: 1000 times executed in whole run
AQET: 0.028075 seconds (arithmetic mean)
AQET(geom.): 0.025042 seconds (geometric mean)
QPS: 278.37 Queries per second
minQET/maxQET: 0.00416200s / 0.21368300s
Average result count: 4.80
min/max result count: 0 / 19
Number of timeouts: 0

Metrics for Query: 9
Count: 2000 times executed in whole run
AQET: 0.028693 seconds (arithmetic mean)
AQET(geom.): 0.025425 seconds (geometric mean)
QPS: 272.37 Queries per second
minQET/maxQET: 0.00225400s / 0.22867700s
Average result (Bytes): 6762.83
min/max result (Bytes): 1528 / 12831
Number of timeouts: 0

Metrics for Query: 10
Count: 1000 times executed in whole run
AQET: 0.025270 seconds (arithmetic mean)
AQET(geom.): 0.022319 seconds (geometric mean)
QPS: 309.27 Queries per second
minQET/maxQET: 0.00443900s / 0.21323000s
Average result count: 1.87
min/max result count: 0 / 9
Number of timeouts: 0

Metrics for Query: 11
Count: 500 times executed in whole run
AQET: 0.034836 seconds (arithmetic mean)
AQET(geom.): 0.032530 seconds (geometric mean)
QPS: 224.34 Queries per second
minQET/maxQET: 0.01361800s / 0.22590800s
Average result count: 10.00
min/max result count: 10 / 10
Number of timeouts: 0

Metrics for Query: 12
Count: 500 times executed in whole run
AQET: 0.021630 seconds (arithmetic mean)
AQET(geom.): 0.018793 seconds (geometric mean)
QPS: 361.32 Queries per second
minQET/maxQET: 0.00390600s / 0.20190300s
Average result (Bytes): 1470.73
min/max result (Bytes): 1433 / 1507
Number of timeouts: 0

This result is quoted using the same seed as the official benchmark run, 8 clients, 50 warmup trials, and 500 presentations of the query mixes. The database was the bigdata RWStore running on a single machine. These results were obtained against the QUADS_QUERY_BRANCH from SVN r4241. The machine is a quad core AMD Phenom II X4 with 8MB cache @ 3GHz running CentOS with 16G of RAM and a striped RAID array with 6x SAS disks with 15k spindles (Seagate Cheetah with 16MB cache, 3.5″). IO utilization was approximately 50%. CPU utilization was 50% during the run. The JVM was Oracle Java 1.6.0_23 using “-server -Xmx4g -XX:+UseParallelOldGC”. The Java process size was approximately 3.6G during the benchmark run.

The machine used for the official BSBM V3.0 benchmark report has more RAM and slower disks than the machine described above. However, the benchmark protocol has a ramp up procedure and much of the data winds up cached in RAM. As a result, BSBM preferentially favors machines with more RAM over machines with faster disks.

BSBM v3.1 (53712 QMpH 16 clients)

Scale factor:           284826
Number of warmup runs:  50
Number of clients:      16
Seed:                   1075
Number of query mix runs (without warmups): 500 times
min/max Querymix runtime: 0.7289s / 1.6698s
Total runtime (sum):    525.696 seconds
Total actual runtime:   33.512 seconds
QMpH:                   53712.44 query mixes per hour
CQET:                   1.05139 seconds average runtime of query mix
CQET (geom.):           1.04659 seconds geometric mean runtime of query mix

Metrics for Query:      1
Count:                  500 times executed in whole run
AQET:                   0.039063 seconds (arithmetic mean)
AQET(geom.):            0.036439 seconds (geometric mean)
QPS:                    401.58 Queries per second
minQET/maxQET:          0.00889232s / 0.11675030s
Average result count:   7.98
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      2
Count:                  3000 times executed in whole run
AQET:                   0.040905 seconds (arithmetic mean)
AQET(geom.):            0.038344 seconds (geometric mean)
QPS:                    383.49 Queries per second
minQET/maxQET:          0.00988646s / 0.20486457s
Average result count:   19.48
min/max result count:   6 / 36
Number of timeouts:     0

Metrics for Query:      3
Count:                  500 times executed in whole run
AQET:                   0.049103 seconds (arithmetic mean)
AQET(geom.):            0.046191 seconds (geometric mean)
QPS:                    319.47 Queries per second
minQET/maxQET:          0.01107620s / 0.23461456s
Average result count:   5.47
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      4
Count:                  500 times executed in whole run
AQET:                   0.048209 seconds (arithmetic mean)
AQET(geom.):            0.045754 seconds (geometric mean)
QPS:                    325.39 Queries per second
minQET/maxQET:          0.01487138s / 0.12486670s
Average result count:   7.56
min/max result count:   0 / 10
Number of timeouts:     0

Metrics for Query:      5
Count:                  0 times executed in whole run
AQET:                   0.000000 seconds (arithmetic mean)
AQET(geom.):            NaN seconds (geometric mean)
QPS:                    Infinity Queries per second
minQET/maxQET:          179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.00000000s / 0.00000000s
Average result count:   0.00
min/max result count:   2147483647 / -2147483648
Number of timeouts:     0

Metrics for Query:      7
Count:                  2000 times executed in whole run
AQET:                   0.080021 seconds (arithmetic mean)
AQET(geom.):            0.076796 seconds (geometric mean)
QPS:                    196.04 Queries per second
minQET/maxQET:          0.02779225s / 0.33339837s
Average result count:   11.97
min/max result count:   1 / 100
Number of timeouts:     0

Metrics for Query:      8
Count:                  1000 times executed in whole run
AQET:                   0.043752 seconds (arithmetic mean)
AQET(geom.):            0.040962 seconds (geometric mean)
QPS:                    358.54 Queries per second
minQET/maxQET:          0.01055718s / 0.22980238s
Average result count:   4.85
min/max result count:   0 / 19
Number of timeouts:     0

Metrics for Query:      9
Count:                  2000 times executed in whole run
AQET:                   0.030722 seconds (arithmetic mean)
AQET(geom.):            0.028557 seconds (geometric mean)
QPS:                    510.61 Queries per second
minQET/maxQET:          0.00424333s / 0.11850076s
Average result (Bytes): 6861.40
min/max result (Bytes): 1519 / 13057
Number of timeouts:     0

Metrics for Query:      10
Count:                  1000 times executed in whole run
AQET:                   0.038099 seconds (arithmetic mean)
AQET(geom.):            0.035781 seconds (geometric mean)
QPS:                    411.74 Queries per second
minQET/maxQET:          0.00881888s / 0.17458824s
Average result count:   1.78
min/max result count:   0 / 9
Number of timeouts:     0

Metrics for Query:      11
Count:                  500 times executed in whole run
AQET:                   0.030195 seconds (arithmetic mean)
AQET(geom.):            0.027771 seconds (geometric mean)
QPS:                    519.51 Queries per second
minQET/maxQET:          0.00423775s / 0.09756225s
Average result count:   10.00
min/max result count:   10 / 10
Number of timeouts:     0

Metrics for Query:      12
Count:                  500 times executed in whole run
AQET:                   0.032718 seconds (arithmetic mean)
AQET(geom.):            0.030581 seconds (geometric mean)
QPS:                    479.46 Queries per second
minQET/maxQET:          0.00585602s / 0.10520701s
Average result (Bytes): 1476.21
min/max result (Bytes): 1446 / 1509
Number of timeouts:     0

This result is quoted using 16 clients, 50 warmup trials, and 500 presentations of the query mixes. The database was the bigdata RWStore running on a single machine. These results were obtained against branches/BIGDATA_RELEASE_1_1_0 from SVN r6122. The machine is a dual core i7 (four cores total) with 4MB shared cache @ 2.7GHz running Ubuntu 11 (Natty) with 16G of DDR3 1333MHz RAM and a single SATA3 256G SSD drive (a 2011 Apple Mac Mini). IO utilization was approximately 0%. CPU utilization was 65% during the run. The JVM was Oracle Java 1.6.0_27 using “-server -Xmx4g -XX:+UseParallelOldGC”. The Java process size was approximately 4.4G during the benchmark run.
