AzulSystems/solr-benchmark


Overview

A simple-to-use benchmark setup that takes care of standing up a Solr cluster on AWS and benchmarking it against different query types.

Benchmark setup

Setup includes:

  • 4 Solr nodes
    • solr-node-1
    • solr-node-2
    • solr-node-3
    • solr-node-4
  • 3 Zookeeper nodes
    • zoo-node-1
    • zoo-node-2
    • zoo-node-3
  • 1 Client node [Load Generator]
    • solrj-client-1

All of the above components, including the client (load generator), run on individual AWS nodes.

The current benchmark setup uses:

  • Solr version: solr-7.7.3
  • Zookeeper version: zookeeper-3.4.13

In its current state, this benchmark setup can be used to benchmark Solr on the following JDKs:

  • Zing: zing21.07.0.0-3-ca-jdk11.0.12
  • Zulu: zulu11.50.19-ca-jdk11.0.12

The Solr version, Zookeeper version, and JDK of choice can easily be changed by modifying the init.sh file.
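
For illustration, the relevant knobs in init.sh might look something like this (a hypothetical excerpt; the actual variable names in the file may differ):

# hypothetical excerpt from init.sh -- check the file for the exact names
SOLR_VERSION="solr-7.7.3"
ZOOKEEPER_VERSION="zookeeper-3.4.13"
JDK="zulu11.50.19-ca-jdk11.0.12"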

What does the benchmark do?

The benchmark setup includes a client/load generator built using the SolrJ library.
It runs on a dedicated AWS node to benchmark the Solr cluster.

It currently supports benchmarking the Solr cluster with 5 types of search/select queries:

  • field/term queries
  • phrase queries
  • proximity queries
  • range queries
  • fuzzy queries
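
For reference, these roughly correspond to the following Lucene query shapes (an illustrative sketch; the collection and field names below are assumptions, not necessarily what the repo's query files contain):

# illustrative query shapes; collection/field names are assumptions
SOLR=http://solr-node-1:8983/solr/wiki/select
curl -G "$SOLR" --data-urlencode 'q=text:linux'                # field/term query
curl -G "$SOLR" --data-urlencode 'q=text:"open source"'        # phrase query
curl -G "$SOLR" --data-urlencode 'q=text:"open source"~10'     # proximity query
curl -G "$SOLR" --data-urlencode 'q=title:[aardvark TO azul]'  # range query
curl -G "$SOLR" --data-urlencode 'q=text:linux~1'              # fuzzy query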

In addition to search requests, a typical Solr application also needs to handle update requests, and the performance of the search requests is affected by how the update requests are handled.

To study this effect, the current benchmark setup can also run update operations (atomic updates) alongside the search/select queries mentioned above.
These update operations run as a background task, and only the performance of the search/select operations is measured.
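
An atomic update sets or modifies individual fields of an existing document. A hedged sketch of what such a request looks like (the document id, field, and collection name here are made up for illustration):

# hedged sketch of a Solr atomic update; id/field/collection are illustrative
curl -X POST "http://solr-node-1:8983/solr/wiki/update" \
     -H 'Content-Type: application/json' \
     -d '[{"id":"42","title":{"set":"Updated title"}}]'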

The search/select queries used are stored in text files.
Depending on the query type chosen for benchmarking, the client reads the relevant query files and continuously submits requests to the Solr cluster.

The benchmark allows sending requests at a fixed target rate.
However, to measure the peak throughput that can be achieved, the target rate (targetRateForSelectOpAtWarmup, targetRateForSelectOp) is deliberately set to a very high value (see the config file), and the actual rate achieved is recorded.

The background update operations are run at a fixed rate of 1000 requests/sec
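
For illustration, the rate settings in the client config might look like this (the two parameter names appear above; the key=value format shown is an assumption):

# hypothetical excerpt from the client config file
targetRateForSelectOpAtWarmup=100000   # deliberately far above what the cluster can sustain
targetRateForSelectOp=100000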

Details of dataset used in benchmarking

A ~50GB wikimedia dump (link) is indexed into the Solr cluster, and the benchmark is run against it.
The dump is a derivative of pages-articles-multistream.xml.bz2 by the Wikimedia Foundation, used under CC BY-SA 3.0.

This data dump is licensed under CC BY-SA 3.0 by Azul Systems, Inc.

How to run the benchmark?

Prepare the benchmarking setup

Only 3 steps are necessary to prepare the setup for benchmarking; they are described in the subsections that follow.

Since the entire cluster (3-node Zookeeper ensemble + 4 Solr nodes) and the client/load generator run on AWS instances, a lightweight instance is sufficient to act as a central coordinator: it runs the 3 steps, starts the benchmark runs on the cluster, collects the results of the runs, and so on.
This lightweight instance can be the user's laptop (running Linux or macOS) or a separate small AWS instance.

Provision the necessary AWS nodes

To provision the necessary AWS instances, follow the instructions here

Configure provisioned nodes

Run the command below to configure the nodes, install the necessary tools, download the necessary artifacts, and so on:

bash scripts/setup.sh all

NOTE: Make sure the JAVA_HOME environment variable (pointing to a JDK 11) is set on the host that runs this script.
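
A minimal example (the JDK path below is just an illustration):

# set JAVA_HOME to a JDK 11 before running the setup script
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
bash scripts/setup.sh all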

The above command takes care of the following:

  • prepares the 3-node Zookeeper ensemble (zoo-node-1, zoo-node-2, zoo-node-3)
  • prepares the 4-node Solr cluster (solr-node-1, solr-node-2, solr-node-3, solr-node-4)
  • prepares the client node (solrj-client-1)
  • indexes the wikimedia dump into the Solr cluster
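
Once the setup completes, one way to sanity-check the cluster is Solr's Collections API (assuming Solr is listening on its default 8983 port):

# list the collections and their state on the freshly prepared cluster
curl "http://solr-node-1:8983/solr/admin/collections?action=CLUSTERSTATUS"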

Starting the benchmark

General command to run the benchmark against a given query type:

QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark

NOTE: To pass additional JVM args to the Solr cluster, SOLR_JAVA_MEM and GC_TUNE env variables can be used:

GC_TUNE='-XX:-UseZST -XX:+PrintGCDetails' SOLR_JAVA_MEM='-Xms55g -Xmx70g' QUERY_TYPE=<QUERY TYPE> JAVA_HOME=<ABS_PATH_TO_JAVA_HOME_ON_AWS> bash scripts/main.sh startBenchmark

Sample commands to launch/start the benchmark:
  • To benchmark Solr with phrase queries, on Zing:

    COMMON_LOG_DIR=phrase-queries-on-zing QUERY_TYPE=phrase JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    
  • To benchmark Solr with term/field queries, on Zulu:

    COMMON_LOG_DIR=field-queries-on-zulu QUERY_TYPE=field JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    

NOTE:
If QUERY_TYPE=<QUERY_TYPE> is omitted, the benchmark runs against a mix of all the query types listed above.
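
For example, a mixed-query run on Zulu (the COMMON_LOG_DIR value here is chosen just for illustration):

COMMON_LOG_DIR=mixed-queries-on-zulu JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark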

Where to find the results of the benchmark run?

The results of the benchmark runs are captured in benchmark.log under COMMON_LOG_DIR.
For the 2 sample launches shown above, the final results can be found under:

  • ${WORKING_DIR}/phrase-queries-on-zing/benchmark.log
  • ${WORKING_DIR}/field-queries-on-zulu/benchmark.log

The result is reported as a single line in the following format:

Requested rate = <requested_rate> req/sec | Actual rate = <actual_rate_achieved> req/sec (<number of requests submitted by the client to the Solr cluster> queries in <duration of benchmark run> sec)

Sample results:

Requested rate = 100000 req/sec | Actual rate = 47821 req/sec (43039651 queries in 900 sec)
Requested rate = 100000 req/sec | Actual rate = 31619 req/sec (28457667 queries in 900 sec)

In addition to benchmark.log, the Solr logs, GC logs, etc. are also collected and stored under COMMON_LOG_DIR after the benchmark run.

A simple script to run all the queries on Zing and Zulu multiple times:
# Run the benchmark for every query type, 3 times each, on both Zing and Zulu
for queryType in "field" "phrase" "proximity" "range" "fuzzy"
do
    for i in 1 2 3
    do
        # Zing run: results land under zing-<queryType>-run<i>/
        HEADER=zing-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zing21.07.0.0-3-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark

        # Zulu run: results land under zulu-<queryType>-run<i>/
        HEADER=zulu-${queryType}-run${i}
        COMMON_LOG_DIR=${HEADER} QUERY_TYPE=${queryType} JAVA_HOME=/home/centos/zulu11.50.19-ca-jdk11.0.12-linux_x64/ bash scripts/main.sh startBenchmark
    done
done
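
After the loop finishes, the per-run results can be pulled together with something like the following (a convenience sketch, run from ${WORKING_DIR}, built on the HEADER naming used above):

# print one result line per run, using the directory names from the loop above
for d in zing-*-run* zulu-*-run*; do
    echo "${d}: $(grep 'Actual rate' ${d}/benchmark.log)"
done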
