Skip to content
Anserini integration with Solr
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
configsets
.gitignore
Dockerfile
README.md
config.json
run.py

README.md

Solr Integration

Anserini, using IndexCollection, generates Lucene index files that we can load into Solr. Solr is a search engine build on Lucene that has desirable tools such as an interface to perform queries on Lucene (Anserini) indices.

Docker

In order to integrate Anserini and Solr, we'll be using Docker - make sure this is setup on your machine before continuing.

Additionally, ensure that the Docker SDK for Python is installed via pip install docker

Overview

Loading a Lucene index into Solr is fairly straightforward as Solr is built on top of Lucene. In a nutshell, the following needs to happen:

  1. Create the Solr core (index) that will hold our data.
  2. Copy the Lucene index files into the <my_core>/data/index/ directory of the Solr server.
  3. Update the schema (<my_core>/conf/managed-schema) file to match the fields in our index.
  4. Reload the core.

This has been automated through a number of scripts to automatically load core17 and mb11 collection indices into Solr.

Instructions

Build Anserini and copy the fatjar (important) artifact into the root directory of the SolrAnserini repo, changing the name to anserini.jar.

  1. Edit the config.json file to point to the index and config locations on the host machine.
  2. Run the Python script to build the Docker image with index and config volumes mounted.
    • python run.py (optionally specifying --config <config_location>)
You can’t perform that action at this time.