DARQL - Deep Analysis of SPARQL Queries
A large-scale study has been performed using the code from this repository; more details can be found here.
You can use the software to either explore SPARQL query logs interactively in a browser or run a full analysis on prepared data sets.
To run DARQL, the following requirements must be met:
- Java JDK 8
- Maven build tool
- PostgreSQL (recommended via Docker) for exploration
- GNU make (optional)
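As a quick check of the requirements listed above, the following sketch reports which commands are missing from the PATH. The helper name missing_tools is hypothetical and not part of DARQL, and the command names (java, mvn, docker, make) are assumptions about how the tools are installed on your system.

```shell
# Print every given command that is not found on PATH, one per line.
# missing_tools is a hypothetical helper, not part of DARQL.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed command names: java and mvn for the JDK and Maven,
# docker for running PostgreSQL, make for the optional targets.
missing="$(missing_tools java mvn docker make)"
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All listed tools were found."
fi
```

A missing docker or make is not necessarily a problem, since PostgreSQL can be set up without Docker and make is optional.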
Some additional scripts are written in other languages, such as Bash and Python 3. The software was developed and executed on Ubuntu 16.04 LTS. The easiest way to run it is to stay close to this setup, for example by using a VM or a container (such as Docker) with the same environment. Other environments also work, but they have to be set up first. Here are some instructions to help with this.
To install the requirements on a Debian-based distro such as Ubuntu:
sudo apt-get install docker.io
For a newer version of Docker, follow the instructions on the Docker website.
To install Java and Maven with SDKMAN:
curl -s "https://get.sdkman.io" | bash
sdk install java 8.0.201-zulu
sdk install maven
To easily install the requirements on macOS with Homebrew:
brew install docker
brew install make
To install Java and Maven with SDKMAN:
curl -s "https://get.sdkman.io" | bash
sdk install java 8u161-oracle
sdk install maven
To easily install the requirements on Windows with Chocolatey:
cinst -y docker
cinst -y jdk8
cinst -y maven
cinst -y msys2
msys2
pacman -Sy make
Alternatively, you can use WSL on Windows 10 and follow the instructions for Linux. Because Java can run natively on Windows, you could use the software natively as well, but the included scripts to make running easier are designed for a Unix environment. Another alternative to achieve this is to simply use a VM.
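Whichever platform is used, it can help to confirm that the Java on the PATH is really version 8. The sketch below parses the output of java -version. The helper java_major is hypothetical, and the parsing assumes the usual version line, with the version as the first quoted field, written to stderr.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_201" (Java 8) or "11.0.2" (Java 11).
# java_major is a hypothetical helper, not part of DARQL.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy "1.x" numbering used up to Java 8
  esac
  # Keep everything before the first '.', '_' or '-'.
  echo "${v%%[._-]*}"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the first quoted field
  # of the first line that mentions "version".
  raw="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
  major="$(java_major "$raw")"
  if [ "$major" = "8" ]; then
    echo "Java 8 found ($raw)."
  else
    echo "Found Java $major ($raw), but JDK 8 is listed as the requirement."
  fi
else
  echo "java not found on PATH."
fi
```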
If make is installed, predefined targets can be used:
- Clone the repository and cd into it. Set Makefile=src/main/resources/sample/demo/demo.mk.
- Run make -f $Makefile pg_start to start PostgreSQL via Docker.
- Run make -f $Makefile db to populate the database with examples.
- Run make -f $Makefile web to start the server.
- Open http://localhost:8888 in a browser.
Alternatively, look in the Makefile to run the steps manually or to adjust them.
For example, PostgreSQL can be run without Docker, but then it has to be set up manually.
By default, the target will set up and populate the database with data from Wikidata.
To make changes or supply different data in other formats, adjust the configuration file config.yaml that the Makefile supplies, or create a new one.
When finished, you can stop the server from the command line and then shut down PostgreSQL with make pg_stop.
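The demo steps in this section can be collected into one small script. This is only a sketch: run_step and run_demo are hypothetical helpers, and the script defaults to a dry run that prints the make commands instead of executing them. It assumes it is started from the repository root.

```shell
# Sketch of the demo workflow. With DRY_RUN=1 (the default here) the
# commands are only printed; set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
Makefile="src/main/resources/sample/demo/demo.mk"

# run_step is a hypothetical helper, not part of DARQL.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "make -f $Makefile $1"
  else
    make -f "$Makefile" "$1"
  fi
}

run_demo() {
  # Start PostgreSQL via Docker.
  run_step pg_start || return 1
  # Populate the database with examples.
  run_step db || return 1
  # Start the server, then open http://localhost:8888 in a browser.
  run_step web
}

run_demo
```

Stopping at the first failing step avoids starting the server against a database that was never populated.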
A full analysis on prepared data sets can be performed and aggregated results will be output. This yields fully reproducible results for all analyses.
- Clone the repository and cd into it. Then export PRIME_DIR=. (the value is the current directory).
- For logs from Wikidata, run bash scripts/batch/prepare_wikidata.sh to prepare the data for input.
- By default, the previous step will also run the full analysis after preparations are complete. The script for the full analysis, scripts/batch/batch_wikidata.sh, can also be invoked manually.
- A timestamped spreadsheet with results will be output to out/xlsx by default. Additional output can be found in a timestamped subdirectory for each run in out/batch.
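Because every run writes a new timestamped spreadsheet, a small helper can pick out the most recent one. This is a sketch only: latest_result is a hypothetical helper, and the .xlsx file extension is an assumption based on the name of the output directory.

```shell
# Print the most recently modified spreadsheet in a results directory.
# latest_result is a hypothetical helper; the .xlsx extension is assumed.
latest_result() {
  local dir="${1:-out/xlsx}"
  # ls -t sorts by modification time, newest first.
  ls -t "$dir"/*.xlsx 2>/dev/null | head -n 1
}

newest="$(latest_result out/xlsx)"
if [ -n "$newest" ]; then
  echo "Newest spreadsheet: $newest"
else
  echo "No spreadsheet found in out/xlsx yet."
fi
```

Parsing ls output is acceptable here only because the file names are simple; for names containing newlines, a find-based approach would be safer.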
Keep in mind that the scripts are a best effort to perform everything automatically. The scripts are designed to be flexible for different configurations and manual adjustments may be required. Some important points are:
- The data sets and intermediate output require several hundred gigabytes of storage.
- A large amount of RAM should be used to process data, especially for deduplication. At least 64GB, or better 128GB, should be used. To override the RAM allocated to the JVM, for example, export MAVEN_OPTS="-Xms32g -Xmx32g" to try to use only 32GB of RAM.
- If there are problems, check whether all tools used by the scripts are installed. Keep in mind that different versions of bash and Python (either outdated or much newer) could break the scripts. Check the scripts that are run and modify them as needed.
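The storage and memory points above can be checked before starting a long run. The following sketch builds the JVM heap options and reports free disk space. heap_opts and free_gb are hypothetical helpers, and the 32GB value only mirrors the example in the text.

```shell
# Build JVM heap options for a given size in GB. The result contains a
# space, so it must be quoted when it is exported.
# heap_opts and free_gb are hypothetical helpers, not part of DARQL.
heap_opts() {
  echo "-Xms${1}g -Xmx${1}g"
}

# Free space in whole GB on the file system that holds the given path.
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
  df -Pk "${1:-.}" | awk 'NR==2 {print int($4 / 1048576)}'
}

export MAVEN_OPTS="$(heap_opts 32)"
echo "MAVEN_OPTS=$MAVEN_OPTS"
echo "Free disk space here: $(free_gb .) GB"
```

Without the quotes, the shell would treat -Xmx32g as a second argument to export rather than as part of the value.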