# Exploring facets

<p class="alert alert-info">New to Jupyter notebooks? Try <a href="getting-started/Using_Jupyter_notebooks.ipynb"><b>Using Jupyter notebooks</b></a> for a quick introduction.</p>

Facets aggregate collection data in interesting and useful ways, allowing us to build pictures of the collection. This notebook shows you how to get facet data from Trove.

In [1]:
import os

import altair as alt
import pandas as pd
import requests

# Make sure data directory exists
os.makedirs("data", exist_ok=True)

In [2]:
%%capture
# Load variables from the .env file if it exists
# Use %%capture to suppress messages
%load_ext dotenv
%dotenv

Insert your API key between the quotes.

In [3]:
# This creates a variable called 'API_KEY', paste your key between the quotes
API_KEY = ""

# Use an api key value from environment variables if it is available (useful for testing)
if os.getenv("TROVE_API_KEY"):
    API_KEY = os.getenv("TROVE_API_KEY")

# This displays a message with your key
print("Your API key is: {}".format(API_KEY))

Your API key is: gq29l1g1h75pimh4


In [4]:
api_search_url = "https://api.trove.nla.gov.au/v3/result"

Set up our query parameters. We want everything, so we set the `q` parameter to be a single space.

In [5]:
params = {
    "q": " ",  # A space to search for everything
    "facet": "format",
    "category": "book",
    "encoding": "json",
    "n": 1,
}

headers = {"X-API-KEY": API_KEY}
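
To see how these parameters end up in the request URL, here is a minimal sketch that uses only the standard library. It builds the URL without contacting the API. `requests` does its own encoding, so treat the printed URL as an approximation of what is sent rather than a guarantee.

```python
from urllib.parse import urlencode

api_search_url = "https://api.trove.nla.gov.au/v3/result"

params = {
    "q": " ",  # A space to search for everything
    "facet": "format",
    "category": "book",
    "encoding": "json",
    "n": 1,
}

# urlencode turns the dictionary into a query string;
# the single space in `q` is encoded as '+'
query_string = urlencode(params)
full_url = f"{api_search_url}?{query_string}"
print(full_url)
```

Note that the API key is not part of the URL. It travels separately in the `X-API-KEY` header.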

In [6]:
# Pass the headers so the API key is sent with the request
response = requests.get(api_search_url, params=params, headers=headers)
data = response.json()

In [7]:
def get_facet_totals(data):
    """
    Loop through facets saving terms and counts.
    Returns a dataframe with one row per term or sub-term.
    """
    facets = []
    try:
        terms = data["category"][0]["facets"]["facet"][0]["term"]
    except (KeyError, IndexError):
        # No facet data in the response, so return an empty dataframe
        pass
    else:
        for term in terms:
            facets.append({"facet": term["search"], "total": int(term["count"])})
            if "term" in term:
                # There be sub-terms!
                for subterm in term["term"]:
                    facets.append(
                        {"facet": subterm["search"], "total": int(subterm["count"])}
                    )
    return pd.DataFrame(facets)


# The function has its own name so the dataframe doesn't overwrite it
facet_totals = get_facet_totals(data)
facet_totals.sort_values("facet")

|    | facet | total |
|----|-------|-------|
| 21 | Archived website | 33660 |
| 4 | Article | 7377170 |
| 5 | Article/Abstract | 99 |
| 6 | Article/Book chapter | 67276 |
| 7 | Article/Conference paper | 112605 |
| 8 | Article/Journal or magazine article | 1971332 |
| 9 | Article/Other article | 4770227 |
| 10 | Article/Report | 466581 |
| 11 | Article/Review | 285937 |
| 12 | Article/Working paper | 73468 |
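
To make the nested structure that the function above walks through easier to see, here is a minimal sketch that runs without an API key or pandas. The `sample` payload is hand-made and its counts are invented. It only mirrors the keys the function reads (`category`, `facets`, `facet`, `term`, `search`, `count`), and `collect_facet_terms` is a hypothetical stand-in that returns a plain list instead of a dataframe.

```python
# Hypothetical payload: the structure follows the keys used in this notebook,
# but the terms and counts are invented for illustration
sample = {
    "category": [
        {
            "facets": {
                "facet": [
                    {
                        "term": [
                            {
                                "search": "Article",
                                "count": "10",
                                "term": [
                                    {"search": "Article/Report", "count": "4"},
                                    {"search": "Article/Review", "count": "6"},
                                ],
                            },
                            {"search": "Map", "count": "3"},
                        ]
                    }
                ]
            }
        }
    ]
}


def collect_facet_terms(data):
    """Flatten terms and sub-terms into a list of dictionaries."""
    rows = []
    try:
        terms = data["category"][0]["facets"]["facet"][0]["term"]
    except (KeyError, IndexError):
        # No facet data found
        return rows
    for term in terms:
        rows.append({"facet": term["search"], "total": int(term["count"])})
        # Sub-terms, if any, sit under a nested "term" key
        for subterm in term.get("term", []):
            rows.append({"facet": subterm["search"], "total": int(subterm["count"])})
    return rows


rows = collect_facet_terms(sample)
for row in rows:
    print(row["facet"], row["total"])
```

Each top-level term is followed directly by its sub-terms, which is why the sub-term values carry the parent name before the slash.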


In [8]:
# Assign a group by splitting
facet_totals["group"] = facet_totals["facet"].apply(lambda x: x.split("/")[0])
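
As a quick illustration of the split, this sketch applies the same expression to a few facet values taken from the table above. Values without a slash are their own group.

```python
facets = ["Archived website", "Article", "Article/Report", "Article/Working paper"]

# Keep everything before the first slash as the group name
groups = [facet.split("/")[0] for facet in facets]

for facet, group in zip(facets, groups):
    print(f"{facet} -> {group}")
```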

Now we can create a bar chart using Altair. The `x` values will be the totals, and the `y` values will be the facet names. Bars are coloured by group.

In [9]:
# Optional: uncomment the line below to sort by total (highest to lowest) and keep the top twenty
# If you do, pass `top_facets` to the chart instead of `facet_totals`
# top_facets = facet_totals.sort_values(by="total", ascending=False)[:20]
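
The same sort-and-slice idea can be sketched in plain Python. The rows below are a handful of values copied from the table above, and `top_n` is an illustrative variable name, set to 3 here rather than 20 to keep the example short.

```python
rows = [
    {"facet": "Archived website", "total": 33660},
    {"facet": "Article", "total": 7377170},
    {"facet": "Article/Abstract", "total": 99},
    {"facet": "Article/Other article", "total": 4770227},
    {"facet": "Article/Report", "total": 466581},
]

top_n = 3

# Sort by total, highest first, then keep the first `top_n` rows
top_rows = sorted(rows, key=lambda row: row["total"], reverse=True)[:top_n]

for row in top_rows:
    print(row["facet"], row["total"])
```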

In [10]:
# Create a bar chart
alt.Chart(facet_totals).mark_bar().encode(
    x="total:Q",
    y="facet:N",
    color="group:N",
    tooltip=["facet:N", alt.Tooltip("total:Q", format=",")],
)

In [11]:
facet_totals.to_csv(f"data/facet-{params['facet']}.csv", index=False)

Once you've saved this file, you can download it from the workbench [data directory](data).

## Going further

For an in-depth exploration of facets in the newspaper zone and how they can help us visualise change over time, see [Visualise Trove newspaper searches over time](https://glam-workbench.github.io/trove-newspapers/#visualise-trove-newspaper-searches-over-time).

----

Created by [Tim Sherratt](https://timsherratt.org) for the [GLAM Workbench](https://glam-workbench.net/). Support this project by [becoming a GitHub sponsor](https://github.com/sponsors/wragge?o=esb).