search infrastructure for Minneapolis Institute of Art collection

Elasticsearch for Mia's collection data.

Setup

(Getting this all running requires a local redis instance that replicates our internal museum redis. You can create your own from our open data.)

  1. Install elasticsearch: brew install homebrew/versions/elasticsearch17
  2. Enable groovy scripting for aggregations
  3. Start elasticsearch.
  4. Build the index: make clean createIndex update

Search

The search looks at the following "fields" for each artwork. Boost determines how important that particular field is.

| field | boost | description |
| --- | --- | --- |
| artist.artist | 15 | the artist |
| artist.folded | 15 | artist with special characters (é, ü, …) replaced with 'normal' 'english' letters |
| title | 11 | the title of an artwork |
| description | 3 | the "registrar" description of the artwork - how it was described when accessioned |
| text | 2 | "curatorial" text, the general label written about this work |
| accession_number | | object "accession number" |
| _all | | all the fields in the record combined together, so nothing gets missed |
| artist.ngram | 2 | artist's name, ngrammed |
| title.ngram | | artwork title, ngrammed |
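
As a rough sketch of how these boosts could be passed to Elasticsearch, the snippet below formats them in the `field^boost` syntax that a `query_string` query accepts in its `fields` list. The names `FIELD_BOOSTS`, `toQueryStringFields` and `buildQueryString` are hypothetical and are not taken from search.js; fields with no boost listed above are passed without one.

```javascript
// Boosts copied from the field list above; null means no boost is listed.
const FIELD_BOOSTS = [
  ['artist.artist', 15],
  ['artist.folded', 15],
  ['title', 11],
  ['description', 3],
  ['text', 2],
  ['accession_number', null],
  ['_all', null],
  ['artist.ngram', 2],
  ['title.ngram', null],
];

// Hypothetical helper: format each entry as `field^boost`, or just `field`
// when no boost is given.
function toQueryStringFields(fieldBoosts) {
  return fieldBoosts.map(([field, boost]) =>
    boost == null ? field : `${field}^${boost}`
  );
}

// Hypothetical helper: wrap the user's text in a query_string query that
// searches the boosted fields.
function buildQueryString(text) {
  return {
    query_string: {
      query: text,
      fields: toQueryStringFields(FIELD_BOOSTS),
    },
  };
}
```

Calling `buildQueryString('horses from China')` would produce a query object whose `fields` array starts with `artist.artist^15`.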

Ngrams break search terms down into sub-word fragments. So a search for o'keefe returns results for "Georgia O'Keeffe" even though the name is spelled differently.
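
To illustrate why this works, here is a minimal sketch of character ngramming in plain JavaScript. It is not the analyzer configured in mappings.json; the gram length of 3 and the helper names `ngrams` and `sharedGrams` are assumptions made for the example.

```javascript
// Hypothetical helper: split a term into lowercase character n-grams.
function ngrams(term, n) {
  const normalized = term.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= normalized.length; i++) {
    grams.push(normalized.slice(i, i + n));
  }
  return grams;
}

// Hypothetical helper: list the distinct n-grams that two terms have in common.
function sharedGrams(a, b, n) {
  const inB = new Set(ngrams(b, n));
  return [...new Set(ngrams(a, n))].filter((gram) => inB.has(gram));
}

// The misspelled query and the correct name overlap on most of their trigrams,
// which is what lets the ngram fields match.
const overlap = sharedGrams("o'keefe", "O'Keeffe", 3);
```

Because the two spellings share most of their trigrams, a match on the ngrammed fields can still surface the artist.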

Then there are "ranking functions" applied to the results. A few examples:

{filter: {term: {highlight: 'true'}}, weight: 3},
{filter: {term: {image: 'valid'}}, weight: 2},
{filter: {prefix: {room: 'g'}}, weight: 1.1},

…if it's a highlight, boost it by 3; if it has a valid image, 2; if it's currently on view, 1.1.

This all happens within a function score query.
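
A minimal sketch of how the ranking functions above might be wrapped around a text query in a `function_score` request body. The helper name `withRanking` and the inner `query_string` query are assumptions for illustration; the actual query built in search.js may differ.

```javascript
// The three example ranking functions quoted above.
const RANKING_FUNCTIONS = [
  {filter: {term: {highlight: 'true'}}, weight: 3},
  {filter: {term: {image: 'valid'}}, weight: 2},
  {filter: {prefix: {room: 'g'}}, weight: 1.1},
];

// Hypothetical helper: wrap any inner query in a function_score query so that
// documents matching a filter get that function's weight applied to their score.
function withRanking(innerQuery) {
  return {
    query: {
      function_score: {
        query: innerQuery,
        functions: RANKING_FUNCTIONS,
      },
    },
  };
}

// Example request body for a free-text search (the inner query is an assumption).
const body = withRanking({query_string: {query: 'horses'}});
```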

API

Here are the main endpoints we use. Test them out at search.artsmia.org.

| endpoint | description | example |
| --- | --- | --- |
| /:query | searches for the given text, using ES query string syntax | horses from China |
| /id/:id | JSON for a single object by id | Olive Trees, Vincent Van Gogh |
| /ids/:ids | multiple objects by id | two personal favorites |
| /random/art | return one or more random artworks, matching an optional query | ten random artworks, currently displayed on the Museum's 3rd floor |
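
As a sketch of how a client might build URLs for the first two endpoints, assuming the API is served over https at search.artsmia.org (the scheme is an assumption) and that query text is URL-encoded. The helper names `queryUrl` and `objectUrl` are hypothetical, and the id used in the example is an arbitrary placeholder.

```javascript
// Host comes from the text above; the https scheme is an assumption.
const BASE_URL = 'https://search.artsmia.org';

// Hypothetical helper: URL for the /:query endpoint.
function queryUrl(text) {
  return `${BASE_URL}/${encodeURIComponent(text)}`;
}

// Hypothetical helper: URL for the /id/:id endpoint.
function objectUrl(id) {
  return `${BASE_URL}/id/${encodeURIComponent(id)}`;
}

const searchExample = queryUrl('horses from China');
const objectExample = objectUrl(123); // 123 is a placeholder, not a known object id
```

Either URL could then be requested with any HTTP client. The shape of the JSON response is not described here, so inspect it before relying on specific fields.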

Indexing

We index our objects regularly from our custom-built TMS API. See the Makefile for the confusing, shell-scripted details. Indexing pulls data from a local redis database, which is kept in sync by a system that watches for changes as they happen in TMS. We also index content related to our objects, and add a few other layers of data to elasticsearch to complement and improve the data from our API.
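
For readers who would rather not start from the Makefile, here is a rough JavaScript sketch of one way records could be turned into a request body for Elasticsearch's bulk API. This is not how the Makefile does it. The helper name `toBulkBody`, the index and type names, and the assumption that every record carries an `id` property are all hypothetical.

```javascript
// Hypothetical helper: build a newline-delimited bulk body, with one action
// line followed by one source line per record.
function toBulkBody(records, index, type) {
  const lines = [];
  for (const record of records) {
    // Action line: index this document under the record's id (assumed property).
    lines.push(JSON.stringify({
      index: {_index: index, _type: type, _id: String(record.id)},
    }));
    // Source line: the record itself.
    lines.push(JSON.stringify(record));
  }
  // The bulk API expects the body to end with a newline.
  return lines.length ? lines.join('\n') + '\n' : '';
}

// 'example_index' and 'example_type' are placeholder names, not the real ones.
const bulkBody = toBulkBody(
  [{id: 1, title: 'Example artwork'}],
  'example_index',
  'example_type'
);
```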