# Get a random work from Trove using queries and facets

Here's another way you can get a random work from Trove's `book`, `article`, `picture`, `map`, `music`, or `collection` zones. This approach is particularly useful if you want to get a random result from a search, or want to apply a variety of facets. It's not as quick as [pinging random work ids at Trove](notebooks/random_work_by_id.ipynb), but it's more flexible.

Basically this method gets all the available facets for a particular search. If the search has more than 100 results, it chooses one of the facets at random and applies it. It keeps doing this until the search returns less that 100 results. Then it chooses a work at random from the results. If you don't supply a query, it uses a random stop word to mix things up a bit.

The problem with this approach is that facets can't always be extracted from records, and there's no way of finding records without a particular facet. For example, you can use the `year` facet to limit results to a particular year, but what about records that don't have a `year` value. Once you start using that facet, they're invisible. I'm worried that this will mean that certain parts of Trove will never be surfaced. It would of course be much better if Trove just supported random sorting so I didn't have to do all these stupid workarounds.

Collection searches (ie using NUC identifiers) are particularly tricky, because items from a single collection can share very similar facet values. To try and limit the results in this sort of situation, I've provided a couple of extra parameters:

* `add_word` – adds a random stopword to the query
* `add_number` – adds a random two digit number to the query (useful if the records use numeric identifiers)

These can help increase the degree of randomness, but again I suspect some parts of collections will never be reached.

In [96]:
import requests
import json
import random
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
from IPython.display import display, HTML

s = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[ 502, 503, 504 ])
s.mount('https://', HTTPAdapter(max_retries=retries))
s.mount('http://', HTTPAdapter(max_retries=retries))

with open('stopwords.json', 'r') as json_file:
    STOPWORDS = json.load(json_file)

In [132]:
API_KEY = 'YOUR API KEY'
API_URL = 'http://api.trove.nla.gov.au/v2/result'

In [127]:
def get_facets(data):
    '''
    Get the names/terms of facets available from a search.
    '''
    facets = []
    for facet in data['response']['zone'][0]['facets']['facet']:
        if facet['name'][:3] != 'adv' and facet['name'] != 'decade':
            terms = get_facet_terms(facet['term'])
            facets.append({'facet': facet['name'], 'terms': terms})
    return facets


def set_query(params, query=None, add_word=False, add_number=False):
    '''
    Add a 'q' value to the parameters, including random words and numbers if required.
    '''
    random_word = random.choice(STOPWORDS)
    random_number = random.randrange(1, 100)
    if query:
        if add_word:
            params['q'] = f'{query} "{random_word}"'
        elif add_number:
            params['q'] = f'{query} "{random_number:02}"'
        else:
            params['q'] = query
    else:
        params['q'] = f'"{random_word}"'
    return params


def get_random_work_from_zone(zone, query, **kwargs):
    total = 0
    applied_facets = []
    params = {
        'zone': zone,
        'encoding': 'json',
        'n': '100',
        'key': API_KEY,
        'facet': 'all',
        'include': 'links'
    }
    params['q'] = query
    for key, value in kwargs.items():
        params[f'l-{key}'] = value
        applied_facets.append(key)
    response = s.get(API_URL, params=params)
    data = response.json()
    total = int(data['response']['zone'][0]['records']['total']) 
    facets = get_facets(data)
    facets[:] = [f for f in facets if f.get('facet') not in applied_facets]
    # Keep going until we either have less than 100 results or we run out of facets
    while total == 0 or total > 100 or len(facets) == 0:
        # Select another facet
        new_facet = random.choice(facets)
        # Add it to the applied list
        applied_facets.append(new_facet)
        # Add the new facet as a parameter
        params[f'l-{new_facet["facet"]}'] = random.choice(new_facet['terms'])
        # Get the new results
        response = s.get(API_URL, params=params)
        data = response.json()
        # Get the facets available from the new search
        facets = get_facets(data)
        # Remove facets from the list that have already been applied
        facets[:] = [f for f in facets if f.get('facet') not in applied_facets]
        total = int(data['response']['zone'][0]['records']['total'])
        # print(total)
        # print(response.url)
    return random.choice(data['response']['zone'][0]['records']['work'])


def get_zones(data):
    zones = []
    for zone in data['response']['zone']:
        if int(zone['records']['total']) > 0:
            zones.append(zone['name'])
    return zones


def get_random_work(zone=None, query=None, add_word=False, add_number=False, **kwargs):
    zones = []
    params = {
        'encoding': 'json',
        'n': '0',
        'key': API_KEY,
    }
    if zone:
        params['zone'] = zone
    else:
        params['zone'] = 'book,article,picture,map,music,collection'
    params = set_query(params, query, add_word)
    for key, value in kwargs.items():
        params[f'l-{key}'] = value
    while len(zones) == 0:
        params = set_query(params, query, add_word, add_number)
        response = s.get(API_URL, params=params)
        #print(response.url)
        data = response.json()
        zones = get_zones(data)
    work = get_random_work_from_zone(zone=random.choice(zones), query=params['q'], **kwargs)
    return work

## Get a work from Chinese-Australian Historical Images in Australia (CHIA)

This is a collection were facets aren't terribly useful in slicing up the results because the range of values is very limited. However, items in this collection do have numeric identifiers, and so including the `add_number` parameter seems to help divide it up into chunks of less than 100.

In [128]:
get_random_work(query='(nuc:"VMUS:CHIA")', add_number=True)

{'id': '197894841',
 'url': '/work/197894841',
 'troveUrl': 'https://trove.nla.gov.au/work/197894841',
 'title': 'Bennett Street showing Wing Cheong Sing',
 'type': ['Photograph'],
 'holdingsCount': 1,
 'versionCount': 1,
 'relevance': {'score': '620.87946', 'value': 'very relevant'},
 'identifier': [{'type': 'url',
   'linktype': 'notonline',
   'value': 'http://www.chia.chinesemuseum.com.au/objects/D003054.htm'}]}

## Get a photo with a thumbnail

Using the new `imageInd` parameter in the query to find records with thumbnails.

In [129]:
get_random_work(zone='picture', q='imageInd:thumbnail', format='Photograph')

{'id': '167583715',
 'url': '/work/167583715',
 'troveUrl': 'https://trove.nla.gov.au/work/167583715',
 'title': '[Young man and woman with musical instruments]',
 'issued': '1880-1900',
 'type': ['Photograph'],
 'holdingsCount': 1,
 'versionCount': 1,
 'relevance': {'score': '45.53745', 'value': 'very relevant'},
 'snippet': ['[Young man and woman <b>with</b> musical instruments] [picture].',
  '[Young man and woman <b>with</b> musical instruments]',
  ', playing an according, wearing frock <b>with</b> ruched skirt gathered <b>with</b> buttons, man standing beside her'],
 'identifier': [{'type': 'url',
   'linktype': 'fulltext',
   'value': 'http://handle.slv.vic.gov.au/10381/46040'},
  {'type': 'url',
   'linktype': 'thumbnail',
   'value': 'http://digital.slv.vic.gov.au/webclient/DeliveryManager?pid=311368'}]}

## Get a work tagged 'Japan'

You can include as many additional facets as you want. Here's an example using `publictag`.

In [123]:
get_random_work(publictag='Japan')

{'id': '6411867',
 'url': '/work/6411867',
 'troveUrl': 'https://trove.nla.gov.au/work/6411867',
 'title': 'The enigma of Japanese power : people and politics in a stateless nation / Karel van Wolferen',
 'contributor': ['Wolferen, Karel Van'],
 'issued': '1989-1993',
 'type': ['Book', 'Audio book', 'Book/Illustrated'],
 'holdingsCount': 52,
 'versionCount': 10,
 'relevance': {'score': '0.0035119425', 'value': 'vaguely relevant'},
 'identifier': [{'type': 'url',
   'linktype': 'fulltext',
   'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00kare',
   'value': 'http://openlibrary.org/books/OL16828742M'},
  {'type': 'url',
   'linktype': 'fulltext',
   'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00wolf_0',
   'value': 'http://openlibrary.org/books/OL2217099M'},
  {'type': 'url',
   'linktype': 'fulltext',
   'linktext': 'Direct link to full text: http://openlibrary.org/details/enigmaofjapanese00wolf',
   'val

## Display a random thumbnail

Just to cheer myself up a bit...

In [124]:
record = get_random_work(zone='picture', q='imageInd:thumbnail', format='Photograph')
for link in record['identifier']:
    if link['linktype'] == 'thumbnail':
        url = link['value']
        break
display(HTML(f'<img src="{url}">'))

## Speed test

In [131]:
%%timeit
get_random_work()

The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached.
3.08 s ± 1.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
