## Imports

First, we need to import some libraries which we'll use later in our code.

In [None]:
%matplotlib inline

import pandas as pd
import requests
import matplotlib.pyplot as plt
import seaborn as sns

We'll also create some utility functions which we'll use later on.

In [None]:
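import os

def save_file(title):
    """Saves PNG files to a `visualizations` directory."""
    # Create the output directory if needed; plt.savefig does not create it.
    os.makedirs("visualizations", exist_ok=True)
    filename = title.lower().replace(" ", "_").replace("(", "").replace(")", "")
    plt.savefig(f"visualizations/{filename}_line_graph.png")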
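
To make the filename rule concrete, here is a minimal sketch of the same string clean-up applied to a sample title. The helper name `title_to_filename` and the sample title are hypothetical and used only for illustration; nothing is written to disk.

```python
def title_to_filename(title):
    """Hypothetical helper mirroring the string clean-up in `save_file`."""
    # Lowercase, turn spaces into underscores, and drop parentheses.
    slug = title.lower().replace(" ", "_").replace("(", "").replace(")", "")
    return f"visualizations/{slug}_line_graph.png"


# The title below is made up; real titles come from `construct_title`.
example = title_to_filename("exchange 1950-1990 (Sample Collection)")
print(example)
```

Only spaces and parentheses are handled, so other characters in a title (a slash, for example) would pass through to the filename unchanged.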

In [None]:
def construct_title(subject, collection, start_year=None, end_year=None):
    """Creates title for visualizations"""
    collection_title = requests.get(f"https://api.rockarch.org/collections/{collection}").json()["title"]
    if all([start_year, end_year]):
        return f"{subject} {start_year}-{end_year} ({collection_title})"
    return f"{subject} ({collection_title})"
    

## Single Line Graphs

To create a single line graph which plots the occurrence of a given term over time, we'll first need to fetch data from the RAC Collections API.

In [None]:
def get_timeline_data(collection_id, querystrings, start_year, end_year):
    """Fetch timeline data using the minimap endpoint."""
    collection = requests.get(f"https://api.rockarch.org/collections/{collection_id}").json()
    start_year = start_year if start_year else sorted([int(d["begin"]) for d in collection["dates"]])[0]
    end_year = end_year if end_year else sorted([int(d["end"]) for d in collection["dates"]])[-1]
    # Include end_year itself so the plotted range matches the title.
    years = list(range(start_year, end_year + 1))
    dataframe = pd.DataFrame({'year': years})
    for query in querystrings:
        query_counts = []
        for year in years:
            hits = requests.get(f"https://api.rockarch.org/collections/{collection_id}/minimap", params={"query": query, "start_date": year}).json()["hits"]
            query_counts.append(len(hits))
        dataframe[query] = query_counts
    return dataframe
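
When no years are passed, the function falls back to the collection's own dates. The sketch below isolates that step, assuming each entry in `dates` has `begin` and `end` values that can be converted with `int`, as the function above does. The helper name `default_year_range` and the sample data are hypothetical, not taken from the API.

```python
# Made-up stand-in for the JSON returned by the collections endpoint.
sample_collection = {
    "dates": [
        {"begin": "1913", "end": "1950"},
        {"begin": "1905", "end": "1989"},
    ]
}


def default_year_range(collection, start_year=None, end_year=None):
    """Hypothetical helper: fall back to the earliest begin and latest end."""
    begins = sorted(int(d["begin"]) for d in collection["dates"])
    ends = sorted(int(d["end"]) for d in collection["dates"])
    return (
        start_year if start_year else begins[0],
        end_year if end_year else ends[-1],
    )


print(default_year_range(sample_collection))
print(default_year_range(sample_collection, start_year=1950))
```

Because each year is resolved separately, passing only one of them still works: the other is taken from the collection.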

Then we need to wrap the `get_timeline_data` function in another function which plots that data on a single line.

In [None]:
def draw_lineplot(collection, query, start_year=None, end_year=None):
    """Draw single lineplot."""
    single_lineplot_data = get_timeline_data(collection, [query], start_year, end_year)
    title = construct_title(query, collection, start_year, end_year)
    sns.set_theme(palette="bright", font="monospace")
    ax = sns.lineplot(
        x=single_lineplot_data['year'],
        y=single_lineplot_data[query])
    ax.set_title(title)
    save_file(title)

Now we can call that `draw_lineplot` function with two arguments.
- The first is an identifier for the collection we want to search in.
- The second is the term to be queried for within the collection. It's okay for this value to contain spaces.

Note that both arguments are passed as strings (meaning they're wrapped in quotes).

You can update one or both of those values and execute the cell below again. Each time you execute it, a PNG version of the graph displayed will be saved to a `visualizations` directory.

In [None]:
draw_lineplot("2HnhFZfibK6SVVu86skz3k", "exchange")

By default, `draw_lineplot` uses the collection start and end dates as the beginning and end dates of the graph. However, you might want to zoom in on a certain time period. To do this, you can pass `start_year` and/or `end_year` arguments.

In [None]:
draw_lineplot("2HnhFZfibK6SVVu86skz3k", "exchange", start_year=1950, end_year=1990)
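
One detail worth knowing when passing only one of the two years: the title logic in `construct_title` adds a date range only when both are given. The sketch below reproduces that logic without the API call. The name `format_title` is hypothetical, and it takes the collection title as an argument instead of fetching it.

```python
def format_title(subject, collection_title, start_year=None, end_year=None):
    """Hypothetical offline version of the title logic in `construct_title`."""
    # The range is included only when both years are provided.
    if all([start_year, end_year]):
        return f"{subject} {start_year}-{end_year} ({collection_title})"
    return f"{subject} ({collection_title})"


both_years = format_title("exchange", "Sample Collection", 1950, 1990)
start_only = format_title("exchange", "Sample Collection", start_year=1950)
print(both_years)
print(start_only)
```

So a graph drawn with only `start_year` is still restricted to the later years, but its title and filename will not mention the range.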

## Multiple Line Graphs

In order to draw multiple lines on a graph, we need to create a new function called `draw_lineplots`. This reuses the `get_timeline_data` function we wrote earlier.

In [None]:
def draw_lineplots(collection, querystrings, subject, start_year=None, end_year=None):
    """Draw multiple lineplots."""
    data = get_timeline_data(collection, querystrings, start_year, end_year)
    chart_title = construct_title(subject, collection, start_year, end_year)
    sns.set_theme(palette="bright", font="monospace")
    ax = sns.lineplot(x='year', y='value', hue='variable',
                      data=pd.melt(data, ['year']))
    ax.set_title(chart_title)
    save_file(chart_title)
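
The `pd.melt` call is what lets seaborn draw one line per term. It reshapes the table from one column per term into one row per year and term, with the term name in a `variable` column and the count in a `value` column. A minimal sketch with made-up counts:

```python
import pandas as pd

# Made-up counts in the same shape that `get_timeline_data` returns:
# one `year` column plus one column per query term.
wide = pd.DataFrame({
    "year": [1950, 1951],
    "fellowship": [3, 5],
    "training": [1, 0],
})

# Keep `year` as the identifier; every other column is stacked into
# `variable` (the column name) and `value` (the cell contents).
long = pd.melt(wide, ["year"])
print(long)
```

The `hue='variable'` argument then tells seaborn to group rows by term and give each group its own line.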

Now we can draw multiple lines on a single graph by passing three arguments to `draw_lineplots`:
- The first argument is the identifier of the collection we want to search in. This argument must be passed as a string (wrapped in quotation marks).
- The second argument is a list of terms to be queried. It's okay for these terms to contain spaces. This argument must be passed as a list of strings: a series of values, each wrapped in quotes, separated by commas, and contained within square brackets.
- The third argument is the subject of the graph, which is used in its title. This argument must also be passed as a string.

You can update any of those values and execute the cell below again. Each time you execute it, a PNG version of the graph displayed will be saved to a `visualizations` directory.

In [None]:
draw_lineplots(
    "2HnhFZfibK6SVVu86skz3k", 
    ["fellowship", "scholarship", "grants to individuals", "training", "exchange"], 
    "Funding strategies")

By default, `draw_lineplots` uses the collection start and end dates as the beginning and end dates of the graph. However, you might want to zoom in on a certain time period. To do this, you can pass `start_year` and/or `end_year` arguments.

In [None]:
draw_lineplots(
    "2HnhFZfibK6SVVu86skz3k", 
    ["fellowship", "scholarship", "grants to individuals", "training", "exchange"], 
    "Funding strategies",
    start_year=1950,
    end_year=2000)

## Distribution Heatmaps

In order to draw heatmaps which show the distribution of hits over a collection, we need to create a new function which fetches data from the RAC Collections API.

In [None]:
def get_distribution_data(collection, query, axis_size=12):
    """Fetch distribution data from the minimap endpoint."""
    minimap_data = requests.get(
        f"https://api.rockarch.org/collections/{collection}/minimap", params={"query": query}).json()
    squares = [0] * (axis_size ** 2)
    total_per_square = minimap_data["total"] / (axis_size ** 2)

    for hit in minimap_data["hits"]:
        # Scale the hit's position to a square, clamping to the last square.
        square_idx = min(int(hit["index"] / total_per_square), len(squares) - 1)
        squares[square_idx] += 1

    data = [squares[i:i + axis_size] for i in range(0, len(squares), axis_size)]
    return data
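
The final step of the function turns the flat list of per-square counts into rows, which is the nested-list shape that `sns.heatmap` accepts. A small sketch of that slicing, using a 3 x 3 grid and made-up counts instead of the default 12 x 12:

```python
axis_size = 3

# Made-up counts, one per square, in reading order.
squares = [0, 2, 0, 1, 0, 0, 4, 0, 1]

# Take `axis_size` values at a time; each slice becomes one heatmap row.
grid = [squares[i:i + axis_size] for i in range(0, len(squares), axis_size)]
print(grid)
```

Each inner list is drawn as one row of the heatmap, so squares early in the flat list appear at the top and later ones at the bottom.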

Now we can wrap the `get_distribution_data` function in another function which takes that data and draws a heatmap.

In [None]:
def draw_distribution_heatmap(collection, query):
    """Draw distribution heatmap."""
    data = get_distribution_data(collection, query)
    title = construct_title(query, collection)
    sns.set_theme(palette="bright", font="monospace")
    sns.heatmap(
        data, 
        cmap="Oranges", 
        xticklabels=False, 
        yticklabels=False, 
        square=True).set_title(title)
    save_file(title)

Now we can draw a heatmap by passing two arguments to the `draw_distribution_heatmap` function:
- The identifier for a collection we want to search in.
- The term we want to query. It's okay for this term to contain spaces.

Note that both of these arguments must be passed as strings (in other words, wrapped in quotation marks).

In [None]:
draw_distribution_heatmap("WY7fpswEV3oLhyjiArpHES", "yellow fever")