
Storing data in SolrCloud

Tom Barber edited this page Oct 3, 2020 · 1 revision

Download and unzip Solr (tested on 8.6.2) from https://lucene.apache.org/solr/downloads.html
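If you would rather script the download, here is a minimal sketch. It assumes curl and tar are installed and that 8.x releases stay at the Apache archive path below (an assumption; check the downloads page for other versions). The function names and the SOLR_VERSION variable are hypothetical helpers, not part of Solr or Sparkler.

```shell
# Sketch: download and unpack Solr into the current directory.
# Assumption: 8.x releases live on the Apache archive under lucene/solr/;
# other versions may be published elsewhere.
SOLR_VERSION="${SOLR_VERSION:-8.6.2}"

solr_download_url() {
  # $1: Solr version, e.g. 8.6.2
  echo "https://archive.apache.org/dist/lucene/solr/$1/solr-$1.tgz"
}

fetch_solr() {
  version="${1:-$SOLR_VERSION}"
  # -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name
  curl -fLO "$(solr_download_url "$version")" || return 1
  tar -xzf "solr-$version.tgz" || return 1
  # The later steps refer to this unpacked directory as SOLR_HOME.
  SOLR_HOME="$PWD/solr-$version"
  export SOLR_HOME
}
```

Usage: run fetch_solr (or fetch_solr 8.6.2), then cd "$SOLR_HOME".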

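Then, from the Sparkler directory that contains conf/solr/crawldb, copy the crawldb schema into Solr's configsets directory (${SOLR_HOME} here is the directory Solr was unzipped into):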

cp -rf conf/solr/crawldb ${SOLR_HOME}/server/solr/configsets/
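A slightly safer version of the copy, as a sketch: it checks that both directories exist before copying, which catches the two usual mistakes (running from the wrong directory, or SOLR_HOME not set). The function name and its optional argument are hypothetical.

```shell
# Hypothetical helper: copy Sparkler's crawldb config set into a Solr install.
# Assumptions: run from the Sparkler directory that contains conf/solr/crawldb;
# the Solr directory is passed as $1 or taken from SOLR_HOME.
install_crawldb_configset() {
  solr_dir="${1:-${SOLR_HOME:-}}"
  src="conf/solr/crawldb"
  dest="$solr_dir/server/solr/configsets"
  if [ ! -d "$src" ]; then
    echo "no $src here: run this from the Sparkler directory" >&2
    return 1
  fi
  if [ -z "$solr_dir" ] || [ ! -d "$dest" ]; then
    echo "no configsets directory at '$dest': set SOLR_HOME to the unzipped Solr directory" >&2
    return 1
  fi
  # Same copy as the cp command above: creates $dest/crawldb
  cp -rf "$src" "$dest/"
}
```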

Start SolrCloud with Solr's interactive cloud example, running this from the Solr directory:

./bin/solr -e cloud

Accept the defaults until you see:

Please provide a name for your new collection

At that point, name the collection crawldb for consistency.

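Then accept the defaults until you see:

Please choose a configuration for the crawldb collection, available options are: _default or sample_techproducts_configs [_default]

Enter crawldb here too. It is not among the listed options, but it is the name of the config set copied into configsets earlier, so the collection uses the Sparkler schema.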
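For scripted setups the prompts can be skipped. The sketch below is an untested alternative to the steps above, not what this page was tested with: it starts a single SolrCloud node on port 8983 (with no -z given, Solr 8 starts an embedded ZooKeeper on the Solr port plus 1000, i.e. 9983, the address used by the crawl commands) and creates a single-shard crawldb collection from the crawldb config set. The function name and the SOLR_BIN override are hypothetical.

```shell
# Sketch: non-interactive, single-node alternative to "./bin/solr -e cloud".
# Assumptions: run from the Solr directory; the crawldb config set has already
# been copied into server/solr/configsets; SOLR_BIN may override the script path.
start_crawldb_cloud() {
  solr_bin="${SOLR_BIN:-./bin/solr}"
  # -c: SolrCloud mode; without -z, ZooKeeper is embedded on port 8983 + 1000
  "$solr_bin" start -c -p 8983 || return 1
  # -d names a config set under server/solr/configsets (here, crawldb)
  "$solr_bin" create -c crawldb -d crawldb -shards 1 -replicationFactor 1
}
```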

Once Solr is running, the admin UI should list the Sparkler schema fields at http://localhost:8983/solr/#/crawldb/schema
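The same check can be done from the command line through Solr's Schema API, which returns the collection's field definitions as JSON. A minimal sketch, assuming Solr answers on localhost:8983 (override the hypothetical SOLR_URL variable otherwise); the function name is also hypothetical.

```shell
# Sketch: list a collection's fields via the Schema API (GET <collection>/schema/fields).
# Assumes Solr at http://localhost:8983/solr unless SOLR_URL is set.
list_schema_fields() {
  collection="${1:-crawldb}"
  curl -sf "${SOLR_URL:-http://localhost:8983/solr}/$collection/schema/fields"
}
```

Usage: list_schema_fields crawldb. An empty result with a non-zero exit status usually means Solr is not running or the collection name is wrong.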

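To run a crawl, inject the seed URLs first, then start the crawl with the job id (sjob-<id>) that the inject command prints:

./bin/sparkler.sh inject -su https://news.bbc.co.uk -cdb crawldb::localhost:9983
./bin/sparkler.sh crawl -cdb crawldb::localhost:9983 -id sjob-<id>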
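Since both commands take the same -cdb value, it can help to write it once. This is only a convenience sketch around the two commands on this page; SPARKLER_CDB, SPARKLER_BIN and the function names are hypothetical, and the job id still has to be copied from the inject output.

```shell
# Sketch: wrap the inject and crawl commands so the -cdb value is set in one place.
# SPARKLER_CDB and SPARKLER_BIN are hypothetical variables used only here.
SPARKLER_CDB="${SPARKLER_CDB:-crawldb::localhost:9983}"

sparkler_inject() {
  # $1: seed URL
  "${SPARKLER_BIN:-./bin/sparkler.sh}" inject -su "$1" -cdb "$SPARKLER_CDB"
}

sparkler_crawl() {
  # $1: the job id printed by the inject step (sjob-...)
  "${SPARKLER_BIN:-./bin/sparkler.sh}" crawl -cdb "$SPARKLER_CDB" -id "$1"
}
```

Usage: sparkler_inject https://news.bbc.co.uk, then sparkler_crawl with the job id it reports.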

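In the -cdb value, the part before :: (crawldb) is the collection name and the part after it is the SolrCloud connection address. With the cloud example that is localhost:9983, the ZooKeeper embedded in the first Solr node (Solr port 8983 plus 1000), not Solr's HTTP port.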
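To make the format concrete, this hypothetical helper splits a -cdb value the same way: everything before the first :: is the collection, everything after it is the connection address. It only illustrates the format and is not Sparkler's own parsing code.

```shell
# Hypothetical helper: split "collection::host:port" into its two parts.
parse_cdb() {
  cdb="$1"
  collection="${cdb%%::*}"  # drop the first "::" and everything after it
  address="${cdb#*::}"      # drop everything up to and including the first "::"
  printf '%s %s\n' "$collection" "$address"
}
```

For example, parse_cdb crawldb::localhost:9983 prints "crawldb localhost:9983".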