This repository has been archived by the owner on Dec 18, 2019. It is now read-only.

The map/reduce file ingest tool of the full-scale E-ARK deployment. It unpacks TAR-packaged E-ARK information packages and initiates indexing of the individual files through the Lily API. The Java-based tool runs as a service and consumes RabbitMQ messages that announce new packages available for indexing in HDFS.


eark-project/dm-file-ingest


dm-file-ingest

E-ARK WP6 - index file contents from extracted archives

Text is extracted from PDF, Word, and other documents. Structural information (e.g. headings) is not parsed and cannot be used in search queries.

How to: reset the Lily index and/or add new fields

reset lily index

cd /srv/lily-2.4/bin

list indexes

./lily-list-indexes

set environment

LILY_CONFIG=/srv/dm/dm-file-ingest/src/main/config/lily

only if a new field should be added: edit the following files

$LILY_CONFIG/schema.json
$LILY_CONFIG/indexerconf.xml
/srv/apache-solr-4.0.0/example/solr/eark1/conf/schema.xml

load the schema

./lily-import -s $LILY_CONFIG/schema.json

delete the now outdated index

./lily-update-index -n eark1 --state DELETE_REQUESTED

add index

./lily-add-index -n eark1 -c $LILY_CONFIG/indexerconf.xml -sm classic -s shard1:http://localhost:8983/solr/eark1 -dt eark1

clear solr index

curl "http://localhost:8983/solr/eark1/update/?commit=true" -d "<delete><query>*:*</query></delete>" -H "Content-Type: text/xml"
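
To confirm that the delete went through, the core can be queried for its document count. This is a minimal sketch, assuming the same host, port and core name as above. The `solr_count_url` helper is a hypothetical name introduced here for illustration and is not part of the repository.

```shell
# Hypothetical helper: build the Solr select URL that returns only the hit count.
solr_count_url() {
  base="${1:-http://localhost:8983/solr}"  # Solr base URL (assumed default)
  core="${2:-eark1}"                        # core name used in this README
  # rows=0 suppresses the documents, so only the numFound header is returned
  printf '%s/%s/select?q=*:*&rows=0&wt=json\n' "$base" "$core"
}

# Usage (needs a running Solr); numFound should be 0 after the delete:
# curl -s "$(solr_count_url)"
```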

rebuild the index

./lily-update-index -n eark1 --build-state BUILD_REQUESTED
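
The steps above can be collected into one script. The following is only a sketch: it reuses the paths, index name and Solr URL from this README, while the `DRY_RUN` switch and the function names `run` and `reset_lily_index` are hypothetical additions. It does not wait for the delete request to finish before adding the index again, which may be necessary on a real installation, and it does not cover the manual edits needed when adding a field.

```shell
#!/bin/sh
# Sketch only: values below are taken from the README, the variable and
# function names are assumptions and not part of the repository.
LILY_BIN="${LILY_BIN:-/srv/lily-2.4/bin}"
LILY_CONFIG="${LILY_CONFIG:-/srv/dm/dm-file-ingest/src/main/config/lily}"
INDEX="${INDEX:-eark1}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/$INDEX}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Load schema, drop and re-add the index, clear Solr, request a rebuild.
# The chain stops at the first command that fails.
reset_lily_index() {
  run "$LILY_BIN/lily-import" -s "$LILY_CONFIG/schema.json" &&
  run "$LILY_BIN/lily-update-index" -n "$INDEX" --state DELETE_REQUESTED &&
  run "$LILY_BIN/lily-add-index" -n "$INDEX" -c "$LILY_CONFIG/indexerconf.xml" \
      -sm classic -s "shard1:$SOLR_URL" -dt "$INDEX" &&
  run curl "$SOLR_URL/update/?commit=true" \
      -d "<delete><query>*:*</query></delete>" -H "Content-Type: text/xml" &&
  run "$LILY_BIN/lily-update-index" -n "$INDEX" --build-state BUILD_REQUESTED
}

# Usage: review the printed commands first, then execute them for real.
# reset_lily_index
# DRY_RUN=0 reset_lily_index
```

Defaulting to dry-run mode is a deliberate choice here, since the sequence deletes the existing index.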
