Tutorial

adamrmor edited this page Jan 21, 2018 · 3 revisions

Introduction

Have you ever wondered what's behind the closed doors of the museum? What's hidden away in the storerooms? Where all the objects come from and what they're made of? With only 10% of the Museum’s objects on display, how can we answer these broader questions about the collection?

In a landmark Future Museum project we have opened up our catalogue so that you can begin to explore and investigate 163 years of your Museum’s collecting history.

Auckland Museum’s Collections Online celebrated its first anniversary last year. Now, researchers, educators and the public have free and open access to around one million collection records. We have released the collection using the principles of Linked Open Data, which means that, not only is it available on our museum website, it can also be searched via any number of specialist aggregators and applications.

For the past 160 years, our Curators, Collection Managers, and volunteers have been creating descriptions, classifications and taxonomies — meaning we have amassed a huge amount of data about the objects in our care.

One of the aims of Future Museum is to open up our collections and engage with online communities. The public API, which allows detailed open access to collection data, is a major part of our 'open-first' approach that fulfils this aim.

What is an API?

An Application Programming Interface (API) is essentially a set of instructions that tells two pieces of software how to communicate with one another. It allows you, for example, to build an app or integrate one website with another.

If you have used Collections Online on our website, then you've already seen the API in action. When you put in keywords and hit enter, your browser makes a request to the API, which delivers the data as search results. With the API, you can bypass the website and send requests directly to the database that runs it. This is particularly useful if you're a researcher or app developer who needs to work with the raw data.

Getting Started — JSON Viewer

Our API provides responses using the JSON data standard. To view the data that the API sends back, you'll need some way to look at JSON-formatted data. Most modern browsers, including Chrome, Safari, and Firefox, have extensions that can do this for you.


The Empty Search

The most basic form of search API is the empty search, which doesn’t specify any query but simply returns all records: http://api.aucklandmuseum.com/search/_search

We could also specify which index we want to search over:

api.aucklandmuseum.com/search/{index}/_search

Use the collectionsonline index to search over all Collections data:
http://api.aucklandmuseum.com/search/collectionsonline/_search

Use the cenotaph index to search over just Cenotaph data:
http://api.aucklandmuseum.com/search/cenotaph/_search

By default, a search returns only the first 10 results. The Pagination section below explains how to change this.

Search a keyword

Next, let’s try searching all text fields for the word "cat". To do this, we’ll use a lightweight search method that is easy to use. This method is often referred to as a query-string search, since we pass the search as a URL query-string parameter.

We use the same _search endpoint in the path, and we add the query itself in a q= parameter.

api.aucklandmuseum.com/search/_search?q=

http://api.aucklandmuseum.com/search/_search?q=cat

or with a specified index

api.aucklandmuseum.com/search/{index}/_search?q=

http://api.aucklandmuseum.com/search/collectionsonline/_search?q=cat
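
The URL patterns above can also be assembled in code. The following is a minimal Python sketch using only the standard library; the helper name `search_url` is hypothetical and not part of the API, and the base URL is simply copied from the examples above.

```python
from urllib.parse import urlencode

# Base URL taken from the examples in this tutorial.
BASE = "http://api.aucklandmuseum.com/search"


def search_url(query=None, index=None):
    """Build a query-string search URL (hypothetical helper)."""
    # Insert the index segment only when one is requested.
    path = f"{BASE}/{index}/_search" if index else f"{BASE}/_search"
    # With no query this is the empty search, which returns all records.
    if query is None:
        return path
    # urlencode escapes spaces and other reserved characters in the query.
    return f"{path}?{urlencode({'q': query})}"


print(search_url("cat"))
print(search_url("cat", index="collectionsonline"))
```

Passing no arguments gives the empty search URL, so the same helper covers every pattern shown so far.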

#Decoding the results

At the top of your results you can see that the query was successful. The hits section shows the total number of records that matched our search query. Each record is also given a relevance _score, which is a measure of how well the document matches the query. By default, results are returned with the most relevant documents first. The max_score is the highest _score of any document that matches our query.

  },
  "hits": {
    "total": 416,
    "max_score": 4.5924325,
    "hits": [

Below this section are the first 10 hits

        "_index": "collectionsonline-2016-10-18-1",
        "_type": "ecrm:E20_Biological_Object",
        "_id": "http://api.aucklandmuseum.com/id/naturalsciences/object/261552",
        "_score": 1.0,
        "_source": {

The _index specifies which index the result is from, and the _type is the high-level categorisation of the object. There are seven top-level categories:

| _type | Usage |
| --- | --- |
| ecrm:E20_Biological_Object | Objects from the Natural Science Collection |
| ecrm:E22_Man-Made_Object | 3D objects from the Human History Collection |
| ecrm:E84_Information_Carrier | 2D objects from the Documentary Heritage Collection |
| am:MilitaryPerson | Online Cenotaph records (military records of New Zealanders) |
| ecrm:E21_Person | Non-Cenotaph Person records (field collectors, artists, creators etc.) |
| am:Corporation | Non-Cenotaph Corporation records |
| am:vessel | Ships and transport vessels associated with Cenotaph |

The _id is the unique reference for the object. You can follow these links to view the full record page, for example: http://api.aucklandmuseum.com/id/naturalsciences/object/261552
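
To make the response structure concrete, here is a small Python sketch that pulls the total and the per-record fields out of a response. The sample text is stitched together from the two excerpts above and trimmed to a single hit; the function name `summarise` is hypothetical, and the sketch assumes `total` is a plain integer, as in the excerpt.

```python
import json

# Illustrative response built from the excerpts in this tutorial,
# trimmed to one hit with an empty _source.
SAMPLE = """
{
  "hits": {
    "total": 416,
    "max_score": 4.5924325,
    "hits": [
      {
        "_index": "collectionsonline-2016-10-18-1",
        "_type": "ecrm:E20_Biological_Object",
        "_id": "http://api.aucklandmuseum.com/id/naturalsciences/object/261552",
        "_score": 1.0,
        "_source": {}
      }
    ]
  }
}
"""


def summarise(response_text):
    """Return the total hit count and a (type, id, score) tuple per hit."""
    hits = json.loads(response_text)["hits"]
    rows = [(h["_type"], h["_id"], h["_score"]) for h in hits["hits"]]
    return hits["total"], rows


total, rows = summarise(SAMPLE)
print(total)
for row in rows:
    print(row)
```

Note that `total` counts every matching record, while the inner `hits` list holds only the page of results that was returned.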

#Query string syntax

| Syntax | Usage | Example | Link |
| --- | --- | --- | --- |
| - | Must not be present | Search for ice axe and not Hillary | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=ice axe -hillary |
| + | Must be present | Search for Hillary, Nepal must be present | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=hillary +nepal |
| OR | Contains either search term | Search for either Hillary or Tenzing | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=hillary OR tenzing |
| _missing_ | Field has no value | Results missing a Title field | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=_missing_:dc_title |
| _exists_ | Field must have a value | Results must contain the language field | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=_exists_:language |

Due to the number of Person records in the system you may wish to add the following to the end of any general queries:

-type:am_MilitaryPerson -type:ecrm_E21
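
The operators above are plain text, so a query can be put together with simple string handling. This is a rough Python sketch, not an official client; `build_query` and `query_url` are hypothetical helper names. It percent-encodes the query because an unencoded `+` in a URL is commonly decoded as a space, which would silently drop the "must be present" operator.

```python
from urllib.parse import quote

# Search URL taken from the examples in this tutorial.
SEARCH = "http://api.aucklandmuseum.com/search/collectionsonline/_search"


def build_query(terms, must=(), must_not=()):
    """Join plain terms with +required and -excluded terms."""
    parts = list(terms)
    parts += [f"+{t}" for t in must]
    parts += [f"-{t}" for t in must_not]
    return " ".join(parts)


def query_url(query):
    """Percent-encode the query so '+' and spaces stay literal."""
    return f"{SEARCH}?q={quote(query, safe='')}"


q = build_query(["ice", "axe"], must_not=["hillary"])
print(q)
print(query_url(q))
print(query_url(build_query(["hillary"], must=["nepal"])))
```

Browsers usually encode spaces for you when a URL is pasted into the address bar, but scripts need to do it explicitly.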

#Wildcards and Fuzziness

| Syntax | Usage | Example |
| --- | --- | --- |
| ? | Replace a single character | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=auc?land |
| * | Replace zero or more characters | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=auck* |
| ~ | Search for terms that are similar to, but not exactly like, our search terms | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=aptery~1 |

#Simple Range Searches

Inclusive ranges are specified with square brackets, [min TO max], for example dc_date:[2012-01-01 TO 2012-12-31]

http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_date:[1950-01-01 TO 2012-01-01]
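
A range clause can be generated from date values rather than typed by hand. The sketch below is one way to do that in Python; `date_range` is a hypothetical helper, and it assumes the field accepts ISO-style dates as in the examples above.

```python
from datetime import date
from urllib.parse import quote

# Search URL taken from the examples in this tutorial.
SEARCH = "http://api.aucklandmuseum.com/search/collectionsonline/_search"


def date_range(field, start, end):
    """Format an inclusive range clause: field:[start TO end]."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{field}:[{start.isoformat()} TO {end.isoformat()}]"


clause = date_range("dc_date", date(1950, 1, 1), date(2012, 1, 1))
print(clause)
# Brackets, the colon, and spaces are percent-encoded for use in a URL.
print(f"{SEARCH}?q={quote(clause, safe='')}")
```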

#Boosting

We can use the boost operator ^ to make one term more relevant than another. In this example we want all vases, but we are particularly interested in vases with flowers on them:

http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_title:vase +flower^2

The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups

#Fields available for query-string searches

Instead of searching all text fields, we can specify a particular field in a query-string search. To search only the description field, we would use: http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_description:dog

| Field | Usage | Example |
| --- | --- | --- |
| dc_contributor | The person associated with the record. This could be the field collector, author, creator or classifier. | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_contributor:Robin+Morrison |
| dc_date | Searches the text date fields - for complex date searches see below | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_date:1985 |
| dc_description | The free text description of the object | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_description:dog |
| dc_identifier | The museum's identification number for the object, if you know it | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_identifier:2008.30.1 |
| dc_format | Used in Documentary Heritage to show the file format or physical medium of the object | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_format:film |
| dc_place | Concatenation of all places data (place made, place found, place acquired, place published, place associated, place captured) | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_place:auckland |
| dc_title | The name given to the record | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_title:ice axe |
| department | The Collection area responsible for the record | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=department:Botany |
| culturalOrigin | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=culturalOrigin:samoan |
| documentType | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=documentType:serial |
| gender | Used in Person dataset and Natural Science Specimens | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=gender:male |
| geoSubject | The geographic subject heading associated with the object | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=geoSubject:Hawke's Bay |
| kindOfSpecimen | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=kindOfSpecimen:bones |
| localityDescription | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=localityDescription:browns bay |
| notes | Free text additional notes | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=notes:Auckland |
| subjectCategory | Used in the History Collection | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=subjectCategory:war |
| subjectStatus | Library of Congress Subject Headings used in Documentary Heritage Collections | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=subjectStatus:Sheep |
| collection | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=collection:Melanesian+Mission |
| content | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=content: |
| language | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=language:samoan |
| keyword | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=keyword:Auckland |
| responsibility | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=responsibility:edmund+hillary |
| copyright | | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=copyright:CC |

How to find a record if you know the ID.

We can specify the address of the record — the collections area, object or library sub-collection, and the ID of the record. Using those three pieces of information, we can return the original JSON document:

eg. http://api.aucklandmuseum.com/id/library/ephemera/13270

	naturalsciences/object/{ID}  
	library/photography/{ID}  
	library/manuscriptsandarchives/{ID}  
	library/paintinganddrawings/{ID}  
	library/catalogq40/{ID}  
	library/ephemera/{ID} 
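
The same three pieces of information can be joined or split in code, which is handy when moving between search results (where the `_id` is the full URL) and individual record pages. A minimal Python sketch, with hypothetical helper names:

```python
# Base URL taken from the example record links in this tutorial.
ID_BASE = "http://api.aucklandmuseum.com/id"


def record_url(area, sub_collection, record_id):
    """Join collections area, sub-collection and ID into a record URL."""
    return f"{ID_BASE}/{area}/{sub_collection}/{record_id}"


def split_record_url(url):
    """Split a record URL (such as a search hit's _id) into its three parts."""
    prefix = ID_BASE + "/"
    if not url.startswith(prefix):
        raise ValueError(f"not a record URL: {url}")
    # Unpacking fails with ValueError if there are not exactly three segments.
    area, sub_collection, record_id = url[len(prefix):].split("/")
    return area, sub_collection, record_id


print(record_url("library", "ephemera", 13270))
print(split_record_url("http://api.aucklandmuseum.com/id/naturalsciences/object/261552"))
```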

Complex searches

Query-string search is handy for ad hoc searches, but it has limitations. We can use the Elasticsearch query domain-specific language (DSL) to build much more complicated, robust queries. For these searches we will POST data to the API. To do this you will need an API client: software that lets you interact with our API. For Chrome you can use Postman or Advanced REST Client.

We have instructions on how to set these up here

The search URL remains the same: http://api.aucklandmuseum.com/search/collectionsonline/_search

The basic all text set up will look like this:

{
    "query": {
        "query_string": {
            "query": "Cat"
        }
    }
 }

You can use all the standard search operators in the query:

{
    "query": {
        "query_string": {
            "query": "Cat +dog -kitten"
        }
    }
 }

Let's try searching just the free text field "dc_description" for references to the Victoria Cross.

{
    "query" : {
        "match" : {
            "dc_description" : "Victoria Cross"
        }
    }
}
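
If you would rather script these requests than use a graphical client, the same body can be sent from Python's standard library. This is a sketch under assumptions, not a supported client: the function names are hypothetical, and setting a `Content-Type` header of `application/json` is a common convention that this tutorial does not confirm as required.

```python
import json
import urllib.request

# Search URL taken from the examples in this tutorial.
SEARCH = "http://api.aucklandmuseum.com/search/collectionsonline/_search"


def build_request(body):
    """Wrap a query DSL body in a POST request without sending it."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        SEARCH,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed, see above
        method="POST",
    )


def run_search(body, timeout=30):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


body = {"query": {"match": {"dc_description": "Victoria Cross"}}}
request = build_request(body)
print(request.get_method(), request.full_url)
```

Calling `run_search(body)` performs the actual search; it is kept separate so the request can be inspected first.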

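#Boolean Searches

A bool query combines several clauses: must clauses have to match, should clauses are optional alternatives, and filter clauses narrow the results without affecting the score.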

	{  
	    "query" : {  
	        "bool": {  
	            "must": {  
	                "match" : {  
	                    "firstName" : "Edmund"   
	                }  
	            }  
	        }  
	    },  
	    "min_score": 8  
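	}  

When a bool query contains only should clauses, a record matches if it satisfies at least one of them:

{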
  "query": {
    "bool": {
      "should": [
        { "match": { "am_documentNotes":  "Vol1" }},
        { "match": { "am_embarkationBody.rdf_value": "6th*"   }}
      ]
    }
  }
}

#Selecting a list of records

If you have a list of known record IDs, you can retrieve them in a single request with an ids query. POST the body below to:

http://api.aucklandmuseum.com/search/collectionsonline/_search

{
    "query" :{
        "ids": {
            "values": [
                "http://api.aucklandmuseum.com/id/humanhistory/object/65211",
                "http://api.aucklandmuseum.com/id/humanhistory/object/657199"
            ]
        }
    }
}

A minimal bool query with a single must clause:

{
    "query" : {
        "bool": {
            "must": {
                "match" : {
                    "firstName" : "Edmund" 
                }
            }
        }
    }
}

#Source Filtering (select which fields are returned)

{
	"_source": {
        "includes": [ "dc_title", "dc_description", "displayLocation" ]
	}
}
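
Source filtering is normally combined with a query in the same request body. The Python sketch below shows one way to compose the two; `with_source` is a hypothetical helper, and the `_source` / `includes` keys are used exactly as in the example above.

```python
import json


def with_source(query, fields):
    """Return a search body that runs `query` but returns only `fields`."""
    return {"_source": {"includes": list(fields)}, "query": query}


body = with_source(
    {"match": {"dc_description": "Victoria Cross"}},
    ["dc_title", "dc_description", "displayLocation"],
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into an API client as the POST body.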

To find all CC BY records and return only eight selected fields:

{
    "fields" : ["lastModifiedOn", "_id", "copyright", "dc_contributor", "dc_description", "dc_identifier", "primaryRepresentation","appellation.Primary Title"],
    "query": {
        "match": {
            "copyright": "CC BY"
        }
    }
 }

Faceted Search

Elasticsearch has functionality called aggregations, which allows you to generate sophisticated analytics over the data. Aggregations can be run over all the available fields. We set size to zero because we don’t need the actual search results, just the summary. Returning zero hits also speeds up the query. For example, to count records by format:

{
  "size":0, 
  "aggs": {
    "Format": {
      "terms": { "field": "dc_format" }
    }
  }
}

Or the most common surname:

{
   "size":0,  
   "aggs": {
    "FamilyName": {
      "terms": { "field": "familyName" }
    }
  }
}
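
The summary comes back as a list of buckets, one per distinct value. The Python sketch below turns such a response into a simple mapping. The sample uses the bucket layout Elasticsearch returns for terms aggregations, but its keys and counts are invented placeholders rather than real collection figures, and `bucket_counts` is a hypothetical helper.

```python
def bucket_counts(response, agg_name):
    """Map each bucket key to its document count for a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}


# Invented sample; real responses carry additional metadata fields.
sample = {
    "aggregations": {
        "Format": {
            "buckets": [
                {"key": "film", "doc_count": 3},
                {"key": "print", "doc_count": 2},
            ]
        }
    }
}

print(bucket_counts(sample, "Format"))
```

The aggregation name ("Format" or "FamilyName" in the requests above) is whatever label you chose in the request, so it must match when reading the response.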

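#Complex Facets

A date_histogram aggregation groups records into date buckets, here by the day on which each record was last modified: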

{
   "size" : 0,
   "aggs": {
      "LastModified": {
         "date_histogram": {
            "field": "lastModifiedOn",
            "interval": "day", 
            "format": "yyyy-MM-dd" 
         }
      }
   }
}

#Nested Fields

{
  "size" : 0,
  "aggs": {
    "accession date": { 
      "nested": {
        "path": "period"
      },
      "aggs": {
        "by_month": {
          "date_histogram": { 
            "field":    "period.accession.end",
            "interval": "month",
            "format":   "yyyy-MM"
          }
        }
      }
    }
  }
}

This query doesn't include any months with 0 results. If you need those empty months (for creating graphs, for example), additional parameters can be added to the date_histogram.
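
The original page does not list which parameters it had in mind. In Elasticsearch's date_histogram aggregation, `min_doc_count` and `extended_bounds` are the options usually used to return empty buckets, so the Python sketch below assumes those. The bounds "2015-01" and "2015-12" are arbitrary placeholders, and `monthly_histogram` is a hypothetical helper.

```python
import json


def monthly_histogram(field, first_month, last_month):
    """Build a monthly date_histogram that also returns empty months."""
    return {
        "date_histogram": {
            "field": field,
            "interval": "month",
            "format": "yyyy-MM",
            # Keep buckets even when no records fall in them.
            "min_doc_count": 0,
            # Force the histogram to span this whole period (placeholder values).
            "extended_bounds": {"min": first_month, "max": last_month},
        }
    }


body = {
    "size": 0,
    "aggs": {
        "accession date": {
            "nested": {"path": "period"},
            "aggs": {
                "by_month": monthly_histogram(
                    "period.accession.end", "2015-01", "2015-12"
                )
            },
        }
    },
}
print(json.dumps(body, indent=2))
```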

#Geographical Distance Search

The geo_distance filter draws a circle around the specified location and finds all documents that have a geo-point within that circle:

{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "100km", 
          "geopos": { 
            "lat": -36.5,
            "lon": 175
          }
        }
      }
    }
  }
}
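
To illustrate what "within that circle" means, the Python sketch below computes a great-circle distance with the haversine formula and compares it to a radius. This only demonstrates the idea; Elasticsearch may calculate distances differently, and the function names and test points are chosen purely for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(centre, point, distance_km):
    """True if `point` lies inside the circle of `distance_km` around `centre`."""
    return haversine_km(*centre, *point) <= distance_km


# One degree of longitude at the equator is a little over 111 km.
print(round(haversine_km(0, 0, 0, 1), 1))
print(within((0, 0), (0, 0.5), 100))
print(within((0, 0), (0, 1), 100))
```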

#Errors

200 search results found
400 bad request
404 not found

#Pagination

By default, a search returns only the top 10 results. You can use size to change how many results are returned and from to change where the list starts.
Beware of paging too deep or requesting too many results at once. Results are sorted before being returned, and large requests may result in a timeout error.

http://api.aucklandmuseum.com/search/_search?q=dc_description:cat&size=100
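
To walk through a larger result set in manageable pages, `from` can be stepped in increments of `size`. A minimal Python sketch follows; `page_urls` is a hypothetical helper, and the total of 250 is a placeholder for the `total` value reported by your first response.

```python
from urllib.parse import urlencode

# Search URL taken from the example above.
SEARCH = "http://api.aucklandmuseum.com/search/_search"


def page_urls(query, total, size=100):
    """Yield one URL per page, stepping `from` in increments of `size`."""
    for start in range(0, total, size):
        params = {"q": query, "size": size, "from": start}
        yield f"{SEARCH}?{urlencode(params)}"


for url in page_urls("dc_description:cat", total=250, size=100):
    print(url)
```

Keeping `size` modest and stopping once `from` reaches the total avoids the deep-paging timeouts mentioned above.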

#Appendix

List of Departments

{
  "size": 0,
  "aggs": {
    "Department": {
      "terms": { "field": "department", "size": 50 }
    }
  }
}

Name
botany
entomology
photography
marine
publication
ethnology
pacific
history
applied arts
land vertebrates
ephemera
archaeology
birds
geology
manuscripts
world ethnology
amphibians
maori ethnology
paintings