IndexCollection generates Lucene index files that we can load into Solr. Solr is a search engine built on Lucene that provides useful tools, such as an interface for querying Lucene (Anserini) indices.
To integrate Anserini and Solr, we'll be using Docker, so make sure it is set up on your machine before continuing.
Additionally, ensure that the Docker SDK for Python is installed via:
pip install docker
Loading a Lucene index into Solr is fairly straightforward as Solr is built on top of Lucene. In a nutshell, the following needs to happen:
- Create the Solr core (index) that will hold our data.
- Copy the Lucene index files into the <my_core>/data/index/ directory of the Solr server.
- Update the schema file (<my_core>/conf/managed-schema) to match the fields in our index.
- Reload the core.
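The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code used by the scripts in this repo. It assumes a Solr server at the default `http://localhost:8983/solr` address, a hypothetical core named `mb11`, and the field names `id` and `contents`. The helper names (`copy_index`, `core_admin_url`, `add_field_body`) are made up for this example. Note that the schema step here builds a request for Solr's Schema API rather than editing `managed-schema` by hand, which is a substitute for the manual edit described above.

```python
import json
import shutil
from pathlib import Path
from urllib.parse import urlencode

# Assumption: Solr is reachable on its default port.
SOLR_URL = "http://localhost:8983/solr"


def copy_index(lucene_index: Path, core_dir: Path) -> Path:
    """Copy the files of a Lucene index into <core_dir>/data/index/."""
    target = core_dir / "data" / "index"
    target.mkdir(parents=True, exist_ok=True)
    for entry in lucene_index.iterdir():
        if entry.is_file():
            shutil.copy2(entry, target / entry.name)
    return target


def core_admin_url(action: str, base: str = SOLR_URL, **params: str) -> str:
    """Build a CoreAdmin API URL, e.g. for action=CREATE or action=RELOAD."""
    query = urlencode({"action": action, **params})
    return f"{base}/admin/cores?{query}"


def add_field_body(name: str, field_type: str = "text_general") -> str:
    """Build the JSON body of a Schema API add-field command.

    The body would be POSTed to <SOLR_URL>/<core>/schema.
    The field type is an assumption; pick one defined in your configset.
    """
    command = {"add-field": {"name": name, "type": field_type,
                             "stored": True, "indexed": True}}
    return json.dumps(command)


if __name__ == "__main__":
    # Hypothetical core name and field names, for illustration only.
    print(core_admin_url("CREATE", name="mb11"))
    print(add_field_body("id", field_type="string"))
    print(add_field_body("contents"))
    print(core_admin_url("RELOAD", core="mb11"))
```

The URL and body builders only construct requests; sending them (for example with `urllib.request`) and calling `copy_index` on the real index path are left to the caller, since both depend on where Solr and the index live on your machine.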
These steps have been automated by a set of scripts that load mb11 collection indices into Solr.
- Build Anserini and copy the fatjar artifact (important) into the root directory of the SolrAnserini repo, changing the name to
- Edit the config.json file to point to the index and config locations on the host machine.
- Run the Python script to build the Docker image with index and config volumes mounted.
python run.py (optionally specifying