The following instructions will let you run the LUBM benchmark against an embedded bigdata database.
Get the code
The original LUBM benchmark, along with directions on its use, is available from the LUBM project home page. A modified version of the benchmark that is a bit easier to use with bigdata is available at the download URL given below. The core benchmark is the same. We've added an HTTP SPARQL endpoint, which is used to connect to bigdata, and some new generator options that are useful when generating very large data sets for a cluster. Please contact the project maintainers if you have questions about this modified version of the LUBM benchmark.
The rest of this page assumes that you are working with the modified version of the LUBM test harness.
Obtain and unpack the code.
https://www.blazegraph.com/files/bigdata-lubm.tgz
tar xvfz bigdata-lubm.tgz
cd bigdata-lubm
Edit build.properties, paying attention to at least:
- bigdata.dir - Where to find the bigdata source code distribution.
- lubm.univ - The data set size.
- lubm.maxMem - The JVM heap used by the NanoSparqlServer in the tests.
- lubm.baseDir - Where to put the generated data files, etc.
- lubm.journalFile - The bigdata backing store file.
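Before building, it can help to confirm that build.properties actually sets each of the properties listed above. The following is a minimal sketch and is not part of the bigdata-lubm distribution: the helper name missing_props is hypothetical, and it assumes each property is written in the simple key=value form.

```shell
# Print the name of each required property that is not set in the given file.
# Assumption: properties use "key=value" (whitespace around "=" is tolerated);
# lines beginning with "#" are comments and never match a key.
missing_props() {
  props_file="$1"
  for key in bigdata.dir lubm.univ lubm.maxMem lubm.baseDir lubm.journalFile; do
    # Split each line on "=", trim the left-hand side, and compare it to the key.
    if ! awk -F= -v k="$key" '
          { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name); if (name == k) found = 1 }
          END { exit !found }
        ' "$props_file"; then
      echo "$key"
    fi
  done
}
```

Running missing_props build.properties from the bigdata-lubm directory prints nothing when all five properties are set, and otherwise prints one missing property name per line.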
Note: The bigdata-lubm/lib directory includes a version of the Sesame onejar. You may need to replace this jar with the one that works with the version of bigdata that you are testing. For example, use Sesame 2.3.0 with bigdata 1.0.x.
Download the blazegraph.jar
Note: The openrdf dependencies are required in order to build the bigdata-lubm project. You MUST use the correct version of the openrdf dependency for the version of bigdata that you are testing. If you compile the bigdata-lubm project against the wrong openrdf dependency version, you may get run-time dependency errors when you try to load or query the data.
cd ...
ant
Generate a data set
Generate the LUBM data set per the build.properties file.
Load a data set
Load an LUBM data set into bigdata per the build.properties file.
The NanoSparqlServer is used to answer SPARQL queries. It "knows" about bigdata's MVCC (multi-version concurrency control) semantics: it issues queries against a read-only connection that reads from the last commit time on the database, and it may have somewhat better performance or concurrency as a result.
You can follow more or less the same instructions to run against a bigdata federation, but the federation must already be up and running, and you must use the federation's bulk data loader to get the data into the database.
Start an HTTP SPARQL endpoint for that bigdata database instance.
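Once the endpoint is up, a quick way to confirm that it answers queries before starting the benchmark is to send it a single SPARQL request by hand. The sketch below assumes curl is installed; the helper name sparql_query and the endpoint URL are hypothetical, so substitute the address that your NanoSparqlServer instance actually listens on.

```shell
# Hypothetical endpoint URL: replace with the address of your running server.
ENDPOINT="${ENDPOINT:-http://localhost:8080/sparql}"

# Send one SPARQL query to the endpoint as a form-encoded POST,
# asking for the result in the SPARQL XML results format.
sparql_query() {
  endpoint="$1"
  query="$2"
  curl -s \
    -H 'Accept: application/sparql-results+xml' \
    --data-urlencode "query=${query}" \
    "$endpoint"
}

# Example (not run automatically): ASK is true once at least one triple is loaded.
# sparql_query "$ENDPOINT" 'ASK { ?s ?p ?o }'
```

An ASK query is used here because it is cheap and only checks that the store is reachable and non-empty; it says nothing about whether the full data set was loaded.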
Run the LUBM queries (do this in a different terminal window).
Here are some sample results.
LUBM U50 (WORM)
LUBM U50 using the Journal in the WORM mode. The load time was 122 seconds (56,183 triples per second). Closure time was 44 seconds.
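The load time and load rate quoted above together imply the approximate size of the loaded data set. As a rough worked example (both inputs are rounded figures, so the result is only an estimate, and the helper name approx_triples is hypothetical):

```shell
# Estimate the number of triples loaded from the load time (seconds)
# and the reported load rate (triples per second).
approx_triples() {
  echo $(( $1 * $2 ))
}

approx_triples 122 56183   # prints 6854326
```

That is roughly 6.85 million triples for U50. The 44 seconds of closure time is reported separately and is not part of this estimate.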
[java] query    Time  Result#
[java] query1     40        4
[java] query3      8        6
[java] query4     48       34
[java] query5     59      719
[java] query7     22       61
[java] query8    260     6463
[java] query10    22        0
[java] query11    20        0
[java] query12    27        0
[java] query13    19        0
[java] query14  3068   393730
[java] query6   2800   430114
[java] query9   3590     8627
[java] query2    999      130
[java] Total   10982
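The Total row can be cross-checked by summing the Time column of the per-query rows. This is a small sketch that assumes the harness output has been saved to a file with one row per line in the "[java] queryN time results" layout; the helper name sum_query_times and the file name results.txt are hypothetical.

```shell
# Sum the Time column over the per-query rows read from standard input.
# The header row and the Total row are skipped because their second field
# is not of the form "query<number>".
sum_query_times() {
  awk '$1 == "[java]" && $2 ~ /^query[0-9]+$/ { total += $3 }
       END { print total + 0 }'
}

# Example: sum_query_times < results.txt
```

For the WORM results above, the per-query times sum to 10982, which matches the reported Total.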
LUBM U50 (RWStore)
[java] query    Time  Result#
[java] query1     28        4
[java] query3     17        6
[java] query4     29       34
[java] query5     39      719
[java] query7     16       61
[java] query8    166     6463
[java] query10    29        0
[java] query11    29        0
[java] query12    25        0
[java] query13    27        0
[java] query14  2778   393730
[java] query6   2920   430114
[java] query2    540      130
[java] query9   3356     8627
[java] Total    9999
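To compare two runs query by query, the two result listings can be joined on the query name. The following is a sketch under the same assumption of one "[java] queryN time results" row per line; the helper name compare_runs and the file names worm.txt and rwstore.txt are hypothetical.

```shell
# Print "query first_time second_time difference" for every query that
# appears in both result files. Rows are emitted in the order of the second file.
compare_runs() {
  awk '
    $1 == "[java]" && $2 ~ /^query[0-9]+$/ {
      if (FNR == NR) { first[$2] = $3; next }   # still reading the first file
      if ($2 in first)
        printf "%s %d %d %+d\n", $2, first[$2], $3, $3 - first[$2]
    }
  ' "$1" "$2"
}

# Example: compare_runs worm.txt rwstore.txt
```

With the two U50 listings above, query2 drops from 999 to 540 (a difference of -459), while query6 rises from 2800 to 2920 (+120). Single runs like these are only indicative, since the source does not say how many times each query was repeated.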