LUBM_Cluster

Brad Bebee edited this page Feb 13, 2020 · 1 revision

LUBM Benchmark using Bulk Data Loader

Generate an LUBM data set

This generates the LUBM U100 data set (100 universities). You can generate much larger data sets in exactly the same way. The files are written to the specified directory on ${NAS}. By default, the generator produces gzip-compressed RDF/XML files. The LUBM generator is single-threaded, so generating a large data set can take a long time.


mkdir -p ${NAS}/data/U100

# This assumes that you are running as root. If you are running as a normal user, set that user's group on this directory instead.
chgrp -R wheel ${NAS}/data

cd ${NAS}/data/U100

lubmGen.sh 100
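
Since larger data sets are generated in exactly the same way, the steps above can be wrapped in a small function that takes the university count as its argument. This is only a sketch: `generate_lubm` and the `LUBM_GEN` override variable are hypothetical names, not part of the bigdata distribution, and the `.gz` suffix used for the file count is an assumption based on the generator producing gzip'd output.

```shell
# Hypothetical helper: generate LUBM U<n> into ${NAS}/data/U<n>.
# LUBM_GEN is an assumed override hook; it defaults to lubmGen.sh as used above.
generate_lubm() {
    n="$1"
    out="${NAS:?NAS must be set}/data/U${n}"
    mkdir -p "$out" || return 1
    # Run the generator from inside the target directory, as in the steps above.
    ( cd "$out" && "${LUBM_GEN:-lubmGen.sh}" "$n" ) || return 1
    # Report how many gzip'd files now exist (the .gz suffix is an assumption).
    count=$(find "$out" -type f -name '*.gz' | wc -l | tr -d ' ')
    echo "U${n}: ${count} gzip files in ${out}"
}

# Example: generate_lubm 1000
```

Because the generator is single-threaded, a call for a large data set is best started under nohup as well.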

Bulk load a data set

Edit the main bigdata configuration file and specify the data set to bulk load in the RDFDataLoadMaster configuration section. With the federation running, start the bulk load using RDFDataLoadMaster.sh. The same approach works with any data set. By default, the RDFDataLoadMaster is set up to load files from a shared volume, but the behavior is extensible and can be made to load from URLs, HDFS, etc.

nohup RDFDataLoadMaster.sh &
tail -f nohup.out

nohup is used because loading a large data set can run for hours. If you have set up the ssh tunnel, you can watch the progress using the Excel worksheets.
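
Both the bulk load and the query run append to nohup.out when started from the same directory, so their output can become interleaved. One option, sketched here under the assumption that a separate log per run is wanted, is a small wrapper that gives each job its own log and PID file. `run_logged` is a hypothetical helper, not a script shipped with bigdata.

```shell
# Hypothetical wrapper: run a command under nohup with its own log and PID file.
run_logged() {
    name="$1"; shift
    # Send stdout and stderr to <name>.log instead of the shared nohup.out.
    nohup "$@" > "${name}.log" 2>&1 &
    # Record the background job's PID so it can be checked or killed later.
    echo $! > "${name}.pid"
    echo "started ${name} (pid $!), log: ${name}.log"
}

# Example:
#   run_logged bulkload RDFDataLoadMaster.sh
#   tail -f bulkload.log
```

The same wrapper could be used for lubmQuery.sh, for example `run_logged query-U100 lubmQuery.sh U100`.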

LUBM Query

Run the LUBM queries for the named KB instance.

nohup lubmQuery.sh U100 &
tail -f nohup.out