Automatic procedure for benchmarking a file index


This is a tool to set up a large database and an Apache Lucene index in order to load test index usage.

In particular, we aim to provide reproducible load tests for:
 * Hibernate Search
 * Infinispan's distributed Lucene Directory implementation

As a source of documents to index we use a specific dump of the Wikipedia database.

1 - Set up a MySQL database and create the users:
	Then create the database:
2 - Modify file so that jdbc.schemaname, jdbc.username and jdbc.password hold the database name, user and password respectively.
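
As a rough illustration of step 2, the sketch below parses the three jdbc.* settings from key=value text and checks that none is missing. The example values, the host and port, and the helper names parse_properties and jdbc_url are assumptions made for illustration; they are not taken from this project.

```python
def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blank lines and '#'/'!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' at all
            props[key.strip()] = value.strip()
    return props


def jdbc_url(props, host="localhost", port=3306):
    # host and port are assumptions; the README only names the three jdbc.* keys
    return "jdbc:mysql://{}:{}/{}".format(host, port, props["jdbc.schemaname"])


# Hypothetical values, replace them with your own database name, user and password.
EXAMPLE = """
jdbc.schemaname=wikipedia
jdbc.username=benchmark
jdbc.password=secret
"""

REQUIRED = ("jdbc.schemaname", "jdbc.username", "jdbc.password")

props = parse_properties(EXAMPLE)
missing = [key for key in REQUIRED if key not in props]
if missing:
    print("missing settings:", ", ".join(missing))
else:
    print("database URL:", jdbc_url(props))
```

Running a check like this before the import can catch a forgotten setting early, instead of part way through a long-running task.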

The build.xml ant file has the following main tasks:

1 - have-empty-schema: it drops all the database tables and recreates them.

2 - download-wikipedia: it downloads the reference Wikipedia dump, which contains only the latest version of each article, in English only:
	[WARNING! 12GB download]

3 - import-wikipedia: it imports the data by downloading and running mwdumper.jar (described on
	able to load large amounts of data efficiently.

4 - run-indexing: it creates an Apache Lucene index from the database content using the hibernate-search library.

5 - create-hibernate-config: it creates the hibernate.cfg.xml and hibernatesearch-infinispan.cfg.xml configuration files using the content of file.

6 - clean: it cleans the environment. In particular, it deletes the reference Wikipedia dump and the mwdumper.jar library,
	and gets and applies the database schema.
	Before doing this, it asks the user for confirmation. If you do not want to be asked for confirmation, add the 'database.autoclenaup=n' property
	to file.
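
The tasks above can be chained from a small script. The sketch below is one possible way to do so, not part of this project: the target names come from the list above, but the order in PIPELINE is an assumption (in particular, running create-hibernate-config before run-indexing), and the helper names are hypothetical. It only relies on the standard "ant -f <buildfile> <target>" command line.

```python
import shutil
import subprocess

# Target names are taken from the task list; the order is an assumption.
PIPELINE = [
    "have-empty-schema",        # drops and recreates all tables
    "download-wikipedia",       # 12GB download
    "import-wikipedia",
    "create-hibernate-config",
    "run-indexing",
]


def ant_commands(targets, buildfile="build.xml"):
    """Build one 'ant -f <buildfile> <target>' command line per target."""
    return [["ant", "-f", buildfile, target] for target in targets]


def run_pipeline(targets=PIPELINE, buildfile="build.xml"):
    """Run the targets in order and stop at the first one that fails."""
    if shutil.which("ant") is None:
        raise RuntimeError("ant was not found on PATH")
    for command in ant_commands(targets, buildfile):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands only; call run_pipeline() to actually execute them.
    for command in ant_commands(PIPELINE):
        print(" ".join(command))
```

Printing the commands first makes it easy to review what would run, which matters here because have-empty-schema drops the existing tables and download-wikipedia fetches a very large file.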
