Solr Search for Collections manager

This repository contains the configuration and setup of the Solr instance used by the DINA Collections manager.

Development

TL;DR

Clone repository
Create a new branch and make changes
Create a pull request for review
Merge pull request
Pull the new image from Docker Hub

Configuration files

The core of the Solr index is its configuration, which consists of two files: one that manages the connection to the data source and one that defines the schema.

Datasource config

The connection to the data source is defined in data-config.xml. It uses a MySQL connector and defines a set of entities.

Each entity contains three different queries:

query: Query used for a full import, for example when the Solr instance has been restarted and needs to rebuild the whole index.
deltaQuery: Query that fetches the IDs of all entities created since the last import.
deltaImportQuery: Query that fetches the data for each ID returned by deltaQuery.
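How the three queries cooperate during a delta import can be sketched in Python. This is a minimal simulation, not the repository's code: SQLite stands in for MySQL, the table and column names (collectingevent, locality, timestamp_created) are hypothetical, and the ${dih.last_index_time} and ${dih.delta.id} placeholders are the variables the Data Import Handler substitutes before running the SQL.

```python
import sqlite3

# Hypothetical queries for one entity, written the way they would appear in the
# query, deltaQuery and deltaImportQuery attributes of a DIH <entity>.
# Table and column names are illustrative, not taken from this repository.
QUERY = "SELECT id, locality FROM collectingevent ORDER BY id"
DELTA_QUERY = (
    "SELECT id FROM collectingevent "
    "WHERE timestamp_created > '${dih.last_index_time}' ORDER BY id"
)
DELTA_IMPORT_QUERY = (
    "SELECT id, locality FROM collectingevent WHERE id = ${dih.delta.id}"
)


def substitute(template: str, variables: dict) -> str:
    """Replace ${name} placeholders with values before running the SQL.

    Plain string substitution mirrors the placeholder mechanism; do not use it
    with untrusted input.
    """
    for name, value in variables.items():
        template = template.replace("${" + name + "}", value)
    return template


def full_import(conn: sqlite3.Connection) -> list:
    # query: fetch every row, e.g. after a restart when the index is empty.
    return conn.execute(QUERY).fetchall()


def delta_import(conn: sqlite3.Connection, last_index_time: str) -> list:
    # deltaQuery: find the IDs created since the last import.
    delta_sql = substitute(DELTA_QUERY, {"dih.last_index_time": last_index_time})
    changed_ids = [row[0] for row in conn.execute(delta_sql)]
    # deltaImportQuery: fetch the full row for each of those IDs.
    rows = []
    for changed_id in changed_ids:
        sql = substitute(DELTA_IMPORT_QUERY, {"dih.delta.id": str(changed_id)})
        rows.extend(conn.execute(sql).fetchall())
    return rows


# Small in-memory table standing in for the MySQL data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collectingevent "
    "(id INTEGER PRIMARY KEY, locality TEXT, timestamp_created TEXT)"
)
conn.executemany(
    "INSERT INTO collectingevent VALUES (?, ?, ?)",
    [(1, "Uppsala", "2018-01-01 10:00:00"), (2, "Lund", "2018-03-01 10:00:00")],
)

print(full_import(conn))                          # [(1, 'Uppsala'), (2, 'Lund')]
print(delta_import(conn, "2018-02-01 00:00:00"))  # [(2, 'Lund')]
```

The delta import touches only the row created after the last import time, which is why it is cheaper than rerunning the full query.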

The fields of each entity are prefixed so that fields from different entities can be told apart. For example, Collecting event fields are prefixed with ce_, as in ce_{some_attribute}.

All entities should also contain the following fields:

entity_type: The type of entity, used to search for specific types, e.g. collectingevent or locality.
primary_id: The unmodified primary ID of the entity. Used, for example, to fetch more data from the Collections API.
id: Unique ID within the Solr index. Usually the primary_id prefixed with the same prefix as the entity's fields, e.g. ce_{primary_id} for Collecting event.

Fields that are common across many entities, such as DisciplineID, are not prefixed.
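The naming conventions above can be illustrated by the document shape they produce. In the repository the mapping is expressed through the SQL in data-config.xml; the Python function below is only a hypothetical illustration of the result, and the column names (locality) are assumptions.

```python
def to_solr_document(row: dict, entity_type: str, prefix: str,
                     common_fields: tuple = ("DisciplineID",)) -> dict:
    """Map a database row to a Solr document following the naming rules.

    Hypothetical helper: entity-specific columns get the entity prefix,
    fields shared across entities keep their plain names.
    """
    primary_id = row["id"]
    doc = {
        "entity_type": entity_type,       # used to filter on entity type
        "primary_id": primary_id,         # unmodified database ID
        "id": f"{prefix}_{primary_id}",   # unique across the whole index
    }
    for column, value in row.items():
        if column == "id":
            continue
        if column in common_fields:
            doc[column] = value                # common field: no prefix
        else:
            doc[f"{prefix}_{column}"] = value  # entity field: prefixed
    return doc


row = {"id": 42, "locality": "Uppsala", "DisciplineID": 3}
print(to_solr_document(row, "collectingevent", "ce"))
```

Because the Solr id carries the prefix, a Collecting event and a Locality that share primary ID 42 can still coexist in the same index.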

Schema configuration

The DINA schema configuration is located in dina-schema.xml. It declares the same fields as data-config.xml, together with Solr field attributes such as the field type and whether the value is stored.

All entity fields should have the stored attribute set to false, so that the actual values are not stored in the index. Only the identifiers, such as primary_id, need to be retrievable: a search returns primary IDs, and more data is then fetched by looking them up in the Collections API.
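A search that follows this pattern asks Solr only for primary_id and then hands the IDs to the Collections API. The sketch below builds such a request and parses a sample response; it assumes primary_id is retrievable, uses the standard /select handler on the dina core, takes the host from the placeholder in the Deployment section, and uses a hypothetical field name (ce_locality). It does not contact a server.

```python
import json
from urllib.parse import urlencode

# Assumed select endpoint: placeholder host and core name from this README,
# /select is Solr's standard search handler.
SOLR_SELECT = "http://server-url.com/solr/dina/select"


def build_search_url(query: str, entity_type: str, rows: int = 20) -> str:
    """Build a search that filters on entity_type and returns only primary_id."""
    params = {
        "q": query,
        "fq": f"entity_type:{entity_type}",  # restrict to one entity type
        "fl": "primary_id",                  # the only field we need back
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


def primary_ids(response_body: str) -> list:
    """Extract primary IDs from a Solr JSON response body."""
    docs = json.loads(response_body)["response"]["docs"]
    return [doc["primary_id"] for doc in docs]


url = build_search_url("ce_locality:uppsala", "collectingevent")
sample = ('{"response": {"numFound": 2, "start": 0, '
          '"docs": [{"primary_id": 7}, {"primary_id": 9}]}}')
print(primary_ids(sample))  # [7, 9]
# Each ID would then be looked up in the Collections API (endpoint not shown).
```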

The configuration sets the default query operator to AND in the solrQueryParser element, so a query with several terms only matches documents that contain all of them.
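The effect of the default operator can be shown with a toy model. This is not Solr's query parser: it ignores fields, text analysis and explicit operators, and only shows how whitespace-separated terms are joined under AND versus OR.

```python
def matches(doc_terms: set, query: str, default_operator: str = "AND") -> bool:
    """Toy model of how the default operator joins whitespace-separated terms.

    Ignores fields, text analysis and explicit operators; real Solr parsing
    is far richer.
    """
    terms = query.split()
    if default_operator == "AND":
        return all(term in doc_terms for term in terms)
    return any(term in doc_terms for term in terms)


doc = {"uppsala", "sweden"}
print(matches(doc, "uppsala sweden"))        # True
print(matches(doc, "uppsala norway"))        # False: every term is required
print(matches(doc, "uppsala norway", "OR"))  # True: any term is enough
```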

Building

The Solr index is built into a Docker image and pushed to Docker Hub by Travis CI. The image is defined in the Dockerfile, and the Travis configuration is located in .travis.yml.

Deployment

Once deployed, you need to update the index so that it reflects the schema changes. This can be done manually by navigating to server-url.com/solr, selecting the dina core, and triggering a full import on the dataimport page.

Note: An import is needed whenever the container has been destroyed or updated, since the index data is stored inside the container.
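The import can also be triggered over HTTP instead of through the admin UI. The sketch below assumes the Data Import Handler is registered at /dataimport on the dina core and uses the README's placeholder host with an assumed http:// scheme; only the URL builder runs on its own, and trigger performs a real request when called.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder host from this README; scheme and /dataimport path are assumptions.
SOLR_BASE = "http://server-url.com/solr"
CORE = "dina"


def dataimport_url(command: str, **params: str) -> str:
    """Build a Data Import Handler URL, e.g. for full-import or status."""
    query = urlencode({"command": command, "wt": "json", **params})
    return f"{SOLR_BASE}/{CORE}/dataimport?{query}"


def trigger(command: str = "full-import", **params: str) -> str:
    """Send the command to Solr and return the raw JSON response text."""
    with urlopen(dataimport_url(command, **params)) as response:
        return response.read().decode("utf-8")


print(dataimport_url("full-import", clean="true", commit="true"))
print(dataimport_url("status"))
```

After trigger("full-import") returns, polling with the status command shows when the import has finished.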
