This repository contains the configuration and setup of the Solr instance used by the DINA Collections manager.
TL;DR:
- Clone repository
- Create a new branch and make changes
- Create a pull request for review
- Merge pull request
- Pull the new image from Docker Hub
The main part of the Solr index is its configuration, which consists of two files: one that manages the connection to the data source and one that defines the schema.
The connection to the data source is defined in data-config.xml. It uses a MySQL connector and defines a set of entities.
Each entity contains three different queries:
- query: Query that performs a full import. Used when the Solr instance is restarted and needs to build a full index.
- deltaQuery: Query to fetch the IDs of all entities created after the last import.
- deltaImportQuery: Query to fetch data for all IDs from the deltaQuery.
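How the three queries relate can be illustrated with a small, self-contained sketch. It is not the actual import code: an in-memory SQLite database stands in for the MySQL data source, and the table name, column names, and timestamps are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL data source.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collectingevent "
    "(id INTEGER PRIMARY KEY, remarks TEXT, created TEXT)"
)
conn.executemany(
    "INSERT INTO collectingevent VALUES (?, ?, ?)",
    [
        (1, "old event", "2019-01-01 00:00:00"),
        (2, "new event", "2019-06-01 00:00:00"),
    ],
)

# Stand-ins for the three queries defined per entity.
FULL_QUERY = "SELECT id, remarks FROM collectingevent"
DELTA_QUERY = "SELECT id FROM collectingevent WHERE created > ?"
DELTA_IMPORT_QUERY = "SELECT id, remarks FROM collectingevent WHERE id = ?"


def full_import(conn):
    """query: fetch every row, as done for a full index."""
    return conn.execute(FULL_QUERY).fetchall()


def delta_import(conn, last_import_time):
    """deltaQuery finds the new IDs; deltaImportQuery loads each of them."""
    ids = [row[0] for row in conn.execute(DELTA_QUERY, (last_import_time,))]
    rows = []
    for entity_id in ids:
        rows.extend(conn.execute(DELTA_IMPORT_QUERY, (entity_id,)).fetchall())
    return rows


full_rows = full_import(conn)
delta_rows = delta_import(conn, "2019-03-01 00:00:00")
```

Here the full import returns both rows, while the delta import returns only the row created after the given timestamp.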
The fields of each entity are prefixed so that they can be told apart from the fields of other entities. For example, Collecting event fields are prefixed with ce_{some_attribute}.
All entities should also contain a field for:
- entity_type: The type of entity, used to search for specific types, e.g. collectingevent or locality.
- primary_id: The unmodified primary ID of the entity. Used, for example, to fetch more data from the Collections API.
- id: Unique ID for the Solr index. Usually the primary_id prefixed with the same prefix as the entity's fields, e.g. ce_{primary_id} for Collecting event.
Fields that are common across many entities, such as DisciplineID, are not prefixed.
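The naming rules above can be summarized in a short sketch. The helper to_solr_doc and the remarks attribute are hypothetical and only illustrate how prefixed fields, the required fields, and unprefixed common fields end up in one document.

```python
def to_solr_doc(prefix, entity_type, primary_id, attributes, common=None):
    """Build one index document following the field naming rules."""
    doc = {
        "entity_type": entity_type,      # e.g. collectingevent or locality
        "primary_id": primary_id,        # unmodified primary ID
        "id": f"{prefix}_{primary_id}",  # unique across all entity types
    }
    # Entity-specific attributes get the entity prefix.
    for name, value in attributes.items():
        doc[f"{prefix}_{name}"] = value
    # Shared fields such as DisciplineID are left unprefixed.
    doc.update(common or {})
    return doc


doc = to_solr_doc(
    "ce",
    "collectingevent",
    42,
    {"remarks": "river bank"},  # hypothetical attribute
    {"DisciplineID": 3},
)
```

Prefixing the id keeps a Collecting event and a locality with the same primary ID from colliding in the index.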
The DINA schema configuration is located in dina-schema.xml. The schema declares the same fields as data-config.xml, but with some additional attributes.
All fields should have the stored attribute set to false so that the actual values of the entities are not stored. This way, you can only fetch primary IDs from the index and then get more data by doing a lookup in the Collections API.
The configuration sets the default join operator to AND in the solrQueryParser configuration.
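With AND as the default operator, space-separated clauses in a query must all match. The sketch below builds such a query URL; it assumes the standard /select handler, an http scheme, and field values that need no escaping, and the helper name select_url is hypothetical.

```python
from urllib.parse import urlencode


def select_url(base_url, core, terms):
    """Build a select URL from a dict of field -> value clauses."""
    # Clauses are joined with spaces; the default operator (AND)
    # means a document must match every clause.
    q = " ".join(f"{field}:{value}" for field, value in terms.items())
    return f"{base_url}/{core}/select?" + urlencode({"q": q, "wt": "json"})


url = select_url(
    "http://server-url.com/solr",
    "dina",
    {"entity_type": "collectingevent", "DisciplineID": "3"},
)
```

The resulting query only matches collecting events that also belong to the given discipline, without an explicit AND between the clauses.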
The Solr index is built into a Docker image and pushed to Docker Hub using Travis CI. The configuration for the image is found in the Dockerfile, and the Travis configuration is located in .travis.yml.
Once deployed, you'll need to update the index to reflect the changes to the schema. This can be done manually by navigating to server-url.com/solr, selecting the dina core, and triggering a full import on the dataimport page.
Note: An import needs to be done whenever the container has been destroyed or updated, since the data is stored inside the container.
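The full import could also be triggered over HTTP instead of through the admin page. The following is a minimal sketch, assuming the import handler of the dina core is registered at /dataimport and reachable over http; the helper names are hypothetical and the request is only sent when trigger_full_import is called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dataimport_url(base_url, core, command):
    """Build the URL for a data import command such as full-import."""
    params = urlencode({"command": command, "wt": "json"})
    return f"{base_url}/{core}/dataimport?{params}"


def trigger_full_import(base_url, core="dina"):
    """Send the full-import command and return the raw response body."""
    with urlopen(dataimport_url(base_url, core, "full-import")) as response:
        return response.read().decode("utf-8")


url = dataimport_url("http://server-url.com/solr", "dina", "full-import")
```

Calling trigger_full_import("http://server-url.com/solr") after each deployment would replace the manual step, provided the handler path matches the actual Solr configuration.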