# Exploring facets

<p class="alert alert-info">New to Jupyter notebooks? Try <a href="getting-started/Using_Jupyter_notebooks.ipynb"><b>Using Jupyter notebooks</b></a> for a quick introduction.</p>

Facets group search results by a shared property, such as format, and report the number of results in each group. These counts let us build a picture of the collection as a whole. This notebook shows you how to get facet data from the Trove API.

In [11]:
import os
import warnings
from operator import itemgetter

warnings.simplefilter(action="ignore", category=FutureWarning)

import altair as alt
import pandas as pd
import requests

# Make sure data directory exists
os.makedirs("data", exist_ok=True)

In [12]:
%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv

Insert your API key between the quotes.

In [None]:
# This creates a variable called 'api_key', paste your key between the quotes
api_key = ""

# Use an api key value from environment variables if it is available (useful for testing)
if os.getenv("TROVE_API_KEY"):
    api_key = os.getenv("TROVE_API_KEY")

# This displays a message with your key
print("Your API key is: {}".format(api_key))

In [14]:
api_search_url = "https://api.trove.nla.gov.au/v2/result"

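Set up our query parameters. We want everything, so we set the `q` parameter to a single space. The `facet` and `zone` parameters ask for the `format` facet within the `book` zone. We only need the facet counts, not the records themselves, so `n` is set to 1 to keep the response small.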

In [15]:
params = {
    "q": " ",  # A space to search for everything
    "facet": "format",
    "zone": "book",
    "key": api_key,
    "encoding": "json",
    "n": 1,
}
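
To see how these parameters end up in the request, the sketch below encodes the same dictionary into a query string using only the standard library. The placeholder key `YOUR_API_KEY` and the names `example_params` and `example_url` are illustrative assumptions. `requests` does this encoding for you when you pass `params`, so this is only meant to show roughly what gets sent.

```python
from urllib.parse import urlencode

api_search_url = "https://api.trove.nla.gov.au/v2/result"

# Same parameters as above, with a placeholder instead of a real key (assumption)
example_params = {
    "q": " ",
    "facet": "format",
    "zone": "book",
    "key": "YOUR_API_KEY",
    "encoding": "json",
    "n": 1,
}

# urlencode turns the dictionary into a query string; the single space becomes "+"
example_url = "{}?{}".format(api_search_url, urlencode(example_params))
print(example_url)
```

Changing the value of `facet` in the dictionary is all that is needed to request a different facet.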

In [16]:
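response = requests.get(api_search_url, params=params)
# Stop with a clear error if the request failed (for example, a missing or invalid API key)
response.raise_for_status()
data = response.json()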

In [17]:
def get_facet_totals(data):
    """
    Loop through the facet terms in an API response, saving terms and counts.
    Returns a dataframe with one row per term or sub-term.
    """
    facets = []
    # Sort alphabetically by facet name
    facet_list = sorted(
        data["response"]["zone"][0]["facets"]["facet"]["term"], key=itemgetter("search")
    )
    for term in facet_list:
        term_count = int(term["count"])
        if "term" in term:
            # There be sub-terms!
            for subterm in sorted(term["term"], key=itemgetter("search")):
                facets.append(
                    {"facet": subterm["search"], "total": int(subterm["count"])}
                )
                # Subtract the subterm count from the term count
                term_count = term_count - int(subterm["count"])
                # print('{:<50} {:,}'.format(subterm['search'], int(subterm['count'])))
        # print('{:<50} {:,}'.format(term['search'], term_count))
        facets.append({"facet": term["search"], "total": term_count})
    return pd.DataFrame(facets)


# Use a different name for the result so the function isn't overwritten
# and the cell can be run again
facet_totals = get_facet_totals(data)
facet_totals

| | facet | total |
|---|---|---|
| 0 | Archived website | 24241 |
| 1 | Audio book | 292212 |
| 2 | Book/Braille | 36517 |
| 3 | Book/Illustrated | 7915819 |
| 4 | Book/Large print | 115127 |
| 5 | Book | 9148457 |
| 6 | Conference Proceedings | 517361 |
| 7 | Microform | 891376 |
| 8 | Thesis | 645896 |
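
The subtraction step is easier to follow with a small worked example. The sketch below applies the same logic to a hand-made response fragment. The counts are invented for illustration and are not real Trove data, and `flatten_terms` is a hypothetical helper that returns a plain list rather than a dataframe. It assumes, as the function above does, that a parent term's count includes the counts of its sub-terms.

```python
from operator import itemgetter

# Invented facet terms in the same shape as the API response (not real data)
sample_terms = [
    {
        "search": "Book",
        "count": "100",
        "term": [
            {"search": "Book/Illustrated", "count": "30"},
            {"search": "Book/Braille", "count": "10"},
        ],
    },
    {"search": "Audio book", "count": "5"},
]


def flatten_terms(terms):
    """Return (facet, total) pairs, removing sub-term counts from their parent."""
    rows = []
    for term in sorted(terms, key=itemgetter("search")):
        remaining = int(term["count"])
        # Terms without sub-terms have no "term" key, so fall back to an empty list
        for subterm in sorted(term.get("term", []), key=itemgetter("search")):
            rows.append((subterm["search"], int(subterm["count"])))
            remaining -= int(subterm["count"])
        rows.append((term["search"], remaining))
    return rows


for facet, total in flatten_terms(sample_terms):
    print("{:<20} {:,}".format(facet, total))
```

In this made-up case the 100 "Book" results become 60 once the 40 results in its sub-terms are listed separately, so each result is counted only once.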


Now we can create a bar chart using Altair. The `x` values will be the totals, and the `y` values will be the facet names.

In [18]:
# Optional: uncomment the line below to sort by total (highest to lowest)
# and keep only the top twenty facets, then chart top_facets instead of facet_totals
# top_facets = facet_totals.sort_values(by="total", ascending=False)[:20]
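
As a sketch of what the optional sorting step does, the example below builds a small dataframe from four of the rows in the table above and keeps the three largest. The names `sample_totals` and `top_facets` are illustrative. In the notebook you would sort `facet_totals` itself.

```python
import pandas as pd

# Four rows copied from the output table above
sample_totals = pd.DataFrame(
    [
        {"facet": "Audio book", "total": 292212},
        {"facet": "Book/Illustrated", "total": 7915819},
        {"facet": "Book", "total": 9148457},
        {"facet": "Microform", "total": 891376},
    ]
)

# Sort from highest to lowest total and keep the first three rows
top_facets = sample_totals.sort_values(by="total", ascending=False).head(3)
print(top_facets)
```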

In [21]:
# Create a bar chart
alt.Chart(facet_totals).mark_bar().encode(
    x="total:Q", y="facet:N", tooltip=["facet:N", alt.Tooltip("total:Q", format=",")]
)

In [20]:
facet_totals.to_csv("data/facet-{}.csv".format(params["facet"]), index=False)

Once you've saved this file, you can download it from the workbench [data directory](data).

## Going further

For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see [Visualise Trove newspaper searches over time](https://glam-workbench.github.io/trove-newspapers/#visualise-trove-newspaper-searches-over-time).

----

Created by [Tim Sherratt](https://timsherratt.org) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by [becoming a GitHub sponsor](https://github.com/sponsors/wragge?o=esb).