This repository contains the pacific_rim_library Python package. The main executable module is indexer.
At a high level, indexer watches for two classes of events (create/update and delete) on regular files under a user-specified directory. It expects these files to only ever be Dublin Core XML.
On create/update events, it transforms files into Solr documents and adds them to an index, and copies any relevant images (whose HTTP URLs are embedded in the XML) to an S3 bucket.
On delete events, it removes any traces of the record represented by the deleted file from Solr and S3.
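The event handling described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the SketchIndexer class, its method names, and the use of plain dicts in place of Solr and S3 are all hypothetical, and it assumes (for illustration only) that image URLs appear as HTTP(S) values of dc:identifier.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def dublin_core_to_document(xml_text):
    """Flatten Dublin Core elements into a dict of field name -> list of values."""
    root = ET.fromstring(xml_text)
    document = {}
    for element in root.iter():
        if element.tag.startswith(DC_NS) and element.text and element.text.strip():
            field = element.tag[len(DC_NS):]
            document.setdefault(field, []).append(element.text.strip())
    return document


class SketchIndexer:
    """Hypothetical stand-in: dicts replace the real Solr index and S3 bucket."""

    def __init__(self):
        self.index = {}       # record path -> document (stands in for Solr)
        self.thumbnails = {}  # record path -> image URLs (stands in for S3)

    def on_create_or_update(self, path, xml_text):
        document = dublin_core_to_document(xml_text)
        self.index[path] = document
        # Assumption: image URLs are the HTTP(S) values of dc:identifier.
        self.thumbnails[path] = [
            value for value in document.get("identifier", [])
            if value.startswith(("http://", "https://"))
        ]

    def on_delete(self, path):
        # Remove every trace of the record from both stores.
        self.index.pop(path, None)
        self.thumbnails.pop(path, None)
```

Keying both stores by the record's file path is what lets a delete event clean up everything belonging to that record in one step.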
-
Install Docker Compose v1.28.0 or later (for service profile support).
-
Download and extract this repository.
-
Create a .env file at the project root and fill in the blanks:
$ cp .env.example .env
$ vim .env
-
Edit logging.yml to configure logging as desired.
-
Create a local prl-solr Docker image:
  -
  Create a local solr4 Docker image by following the instructions at https://github.com/docker-solr/docker-solr4 (you must be authenticated to hub.docker.com).
  -
  Clone https://github.com/UCLALibrary/prrla-solr-conf and then run something like this (note that the value of the CORE_NAME build arg must match the value of SOLR_CORE_NAME specified in .env):
  $ docker image build --build-arg CORE_NAME=prl . --tag prl-solr:latest
-
Create a local jOAI Docker image per the instructions here.
-
Build and run the containers (NOTE: for local development, pass --profile dev on the command line before up):
$ docker-compose -p prl up --build
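Since a mismatch between the CORE_NAME build arg and SOLR_CORE_NAME in .env is easy to miss, a small check like the one below may help before building. It is only a sketch: the helper names read_env and core_names_match are hypothetical, and it assumes .env uses simple KEY=VALUE lines with optional # comments.

```python
def read_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def core_names_match(env_text, build_arg_core_name):
    """True when SOLR_CORE_NAME in the .env text equals the CORE_NAME build arg."""
    return read_env(env_text).get("SOLR_CORE_NAME") == build_arg_core_name
```

For example, core_names_match(open(".env").read(), "prl") should return True when the image is built with --build-arg CORE_NAME=prl.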
-
Install Python 3.4 or greater and the AWS CLI.
-
Download and extract this repository.
-
Create target directories for harvested files:
mkdir ~/prl-records ~/prl-thumbnails
-
Add an AWS CLI profile for accessing the thumbnails S3 bucket (be sure to specify the bucket region):
aws configure --profile prl-thumbnails
-
(OPTIONAL) If setting up a development environment, install a Python 3 virtual environment manager and create an environment:
# Ubuntu
sudo apt-get install python3-venv
python3 -m venv venv-prl
# OSX
pip3 install virtualenv
python3 -m virtualenv venv-prl
Fire it up:
source venv-prl/bin/activate
-
Install the latest setuptools:
pip3 install --upgrade setuptools
-
Install Python dependencies:
python3 setup.py install
If plyvel fails to install, try installing it with pip3 and then re-do this step.
On OSX: if lxml fails to install, you may need to install it with STATIC_DEPS set to true per https://lxml.de/installation.html.
-
Fill in the blanks in config.toml.
-
Edit logging.yml to configure logging as desired.
-
Install the configuration files:
python3 -m pacific_rim_library.configure
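Before running the indexer, it can be useful to confirm that the prerequisites from the steps above are in place. The following is a hedged sketch rather than part of the package: preflight_problems is a hypothetical helper that checks the Python version, the target directories, and whether an aws executable is on the PATH.

```python
import os
import shutil
import sys


def preflight_problems(records_dir, thumbnails_dir, check_aws=True):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 4):
        problems.append("Python 3.4 or greater is required")
    for directory in (records_dir, thumbnails_dir):
        # expanduser lets callers pass paths such as ~/prl-records.
        if not os.path.isdir(os.path.expanduser(directory)):
            problems.append("missing directory: " + directory)
    if check_aws and shutil.which("aws") is None:
        problems.append("AWS CLI not found on PATH")
    return problems
```

Calling preflight_problems("~/prl-records", "~/prl-thumbnails") and printing the result gives a quick summary of anything still missing.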
The module is meant to be run as a background process. For usage instructions:
python3 -m pacific_rim_library.indexer --help
To run automated tests:
python3 setup.py test