# Using the Python allotaxonometer

Download `example_data` from the repo to run the examples below. You may need to adjust paths based on where you store this data.

Examples include:
- [Optionally convert data](#Optionally-convert-data)
- [Basic usage](#Basic-usage)
- [HTML only](#HTML-only)
- [Get the underlying instrument output](#Get-the-underlying-instrument-output)
- [Many comparisons](#Many-comparisons)
- [Multiprocessing](#Multiprocessing)

In [1]:
from itertools import combinations
import multiprocessing as mp
import os

from py_allotax.generate_svg import generate_svg
from py_allotax.utils import convert_csv_data, get_rtd

## Optionally convert data

The `generate_svg` function takes `.json` files and expects the keys/columns "types", "counts", "totalunique", and "probs". If you have `.csv` data (for instance, because the web-app tool requires `.csv`), you can convert it to the expected format with the `convert_csv_data` function.

In [2]:
convert_csv_data("json", "example_data/boys_1968.csv", "example_data/boys_2018.csv")
# The converted .json files are written to the convert/ directory
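
If your data starts as raw observations rather than a `.csv`, the four keys named above can be derived from simple counts. The following is a minimal sketch, not the package's own converter: `build_records` is a hypothetical helper, and the layout (one record per type), the meaning of `totalunique` (number of distinct types), and `probs` (count divided by total count) are assumptions. Compare the result against a file produced by `convert_csv_data` before relying on it.

```python
import json
from collections import Counter


def build_records(tokens):
    """Return one record per type with the four keys named above (assumed layout)."""
    counts = Counter(tokens)
    total = sum(counts.values())   # total number of observations
    total_unique = len(counts)     # assumed meaning of "totalunique"
    return [
        {
            "types": name,
            "counts": count,
            "totalunique": total_unique,
            "probs": count / total,  # assumed: relative frequency
        }
        # most_common() yields types in descending order of count
        for name, count in counts.most_common()
    ]


records = build_records(["Liam", "Noah", "Liam", "Oliver", "Liam", "Noah"])
print(json.dumps(records[0]))
```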

## Basic usage

This example saves both the HTML and the PDF.

Arguments to provide: system 1, system 2, output file name, alpha, and a title for each system.

In [3]:
file_name = "notebook_test"

generate_svg(
    "example_data/boys_2022.json",
    "example_data/boys_2023.json",
    f"example_charts/{file_name}.pdf",
    "0.17",
    "Boys 2022",
    "Boys 2023"
)

PDF saved to example_charts/notebook_test.pdf
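
The call above repeats the same information in the file paths and the titles. As a rough sketch, a small helper can derive the positional arguments from the input file names. `comparison_args` is a hypothetical name, not part of `py_allotax`, and the output naming scheme (`_vs_`) is an assumption chosen for illustration.

```python
from pathlib import Path


def comparison_args(path1, path2, alpha, out_dir="example_charts", ext="pdf"):
    """Build the positional arguments in the order used in the call above."""
    stem1, stem2 = Path(path1).stem, Path(path2).stem  # e.g. "boys_2022"
    # Turn a file stem into a title: "boys_2022" -> "Boys 2022"
    title1 = stem1.replace("_", " ").capitalize()
    title2 = stem2.replace("_", " ").capitalize()
    out_file = f"{out_dir}/{stem1}_vs_{stem2}.{ext}"  # hypothetical naming scheme
    # alpha is passed as a string, matching the examples in this notebook
    return (str(path1), str(path2), out_file, str(alpha), title1, title2)


args = comparison_args(
    "example_data/boys_2022.json", "example_data/boys_2023.json", 0.17
)
print(args)
# generate_svg(*args)  # uncomment where py_allotax is installed
```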


## HTML only

This example saves only the HTML, which can be opened in a browser. To do so, pass `"html"` as the final argument. You can also feed the HTML to any rendering method you like beyond what this package currently provides.

In [4]:
file_name = "html_only_test"

generate_svg(
    "example_data/boys_2022.json",
    "example_data/boys_2023.json",
    f"example_charts/{file_name}.html",
    "0.17",
    "Boys 2022",
    "Boys 2023",
    "html"
)

HTML saved to example_charts/html_only_test.html
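
To view the saved file without leaving Python, the standard library's `webbrowser` module can open it. This is a small sketch that is independent of `py_allotax`; `html_file_uri` and `open_chart` are hypothetical helper names, and the path is the one written by the cell above.

```python
import webbrowser
from pathlib import Path


def html_file_uri(path):
    """Turn a local file path into a file:// URI that a browser can load."""
    return Path(path).resolve().as_uri()


def open_chart(path):
    """Open the saved HTML chart in the default browser."""
    return webbrowser.open(html_file_uri(path))


uri = html_file_uri("example_charts/html_only_test.html")
print(uri)
# open_chart("example_charts/html_only_test.html")  # run once the file exists
```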


## Get the underlying instrument output

In [6]:
result = get_rtd("example_data/boys_2022.json", "example_data/boys_2023.json", "0.17", top_n=0) # all words
rtd = result['rtd']
words_df = result['words_df']  # columns: type, rank1, rank2, rank_diff, metric
words_df

Processing data...
RTD + barData (1292/1292 words) saved to /var/folders/xb/yr7ybhzx6sg2hb1smfzmt9wm0000gp/T/tmpp6n5rwid.json


| Unnamed: 0 | type | rank1 | rank2 | rank_diff | metric |
|---|---|---|---|---|---|
| 0 | Grover | 413.5 | 20.0 | 393.5 | 0.001209 |
| 1 | Cleveland | 585.0 | 98.0 | 487.0 | 0.000665 |
| 2 | Cleve | 1175.5 | 190.5 | 985.0 | 0.000612 |
| 3 | Dale | 1175.5 | 349.0 | 826.5 | 0.000413 |
| 4 | Garfield | 110.5 | 286.5 | -176.0 | -0.000404 |
| ... | ... | ... | ... | ... | ... |
| 1287 | Guy | 68.0 | 68.0 | 0.0 | 0.000000 |
| 1288 | Homer | 76.0 | 76.0 | 0.0 | 0.000000 |
| 1289 | Virgil | 163.5 | 163.5 | 0.0 | 0.000000 |
| 1290 | Forest | 309.0 | 309.0 | 0.0 | 0.000000 |
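
One way to read this output is to separate types by the sign of `metric`. In the rows shown, a positive `metric` goes with a smaller `rank2` than `rank1`, and a negative one with the reverse; treating that as a general rule is an assumption. The sketch below works on plain dictionaries copied from the first five rows (`words_df.to_dict("records")` would give the same shape) and uses a hypothetical helper, `split_by_sign`.

```python
# Rows copied from the first five lines of the output above
rows = [
    {"type": "Grover", "rank1": 413.5, "rank2": 20.0, "rank_diff": 393.5, "metric": 0.001209},
    {"type": "Cleveland", "rank1": 585.0, "rank2": 98.0, "rank_diff": 487.0, "metric": 0.000665},
    {"type": "Cleve", "rank1": 1175.5, "rank2": 190.5, "rank_diff": 985.0, "metric": 0.000612},
    {"type": "Dale", "rank1": 1175.5, "rank2": 349.0, "rank_diff": 826.5, "metric": 0.000413},
    {"type": "Garfield", "rank1": 110.5, "rank2": 286.5, "rank_diff": -176.0, "metric": -0.000404},
]


def split_by_sign(records, key="metric"):
    """Split records by the sign of `key`, largest magnitude first; zeros are dropped."""
    by_magnitude = sorted(records, key=lambda r: abs(r[key]), reverse=True)
    positive = [r for r in by_magnitude if r[key] > 0]
    negative = [r for r in by_magnitude if r[key] < 0]
    return positive, negative


positive, negative = split_by_sign(rows)
print("positive metric:", [r["type"] for r in positive])
print("negative metric:", [r["type"] for r in negative])
```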


## Many comparisons

This example follows the idea of running many system comparisons to create a 'flipbook'. This is useful for visualizing the differences, for example, over years, over many systems, etc.

You would likely have a folder with many data files and loop through them to create the comparisons. The cells below read their `.json` files from `example_data/`; change the path if yours live elsewhere, such as the `convert/` directory produced above.

In [7]:
years = [1968, 2018, 2022, 2023]

# compare each (we could add some logic to specify a different alpha for each comparison)
yr_combos = list(combinations(years, 2))
print(yr_combos)

[(1968, 2018), (1968, 2022), (1968, 2023), (2018, 2022), (2018, 2023), (2022, 2023)]
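
The comment in the cell above mentions choosing a different alpha per comparison. One simple way to sketch that is a dictionary of overrides with a default fallback. The override shown is purely illustrative, and `DEFAULT_ALPHA`, `alpha_overrides`, and `alpha_for` are hypothetical names.

```python
from itertools import combinations

DEFAULT_ALPHA = "0.17"
# Hypothetical override: use a different alpha for one specific pair
alpha_overrides = {(1968, 2023): "0.6667"}

years = [1968, 2018, 2022, 2023]
# Map every pair of years to its alpha, falling back to the default
alpha_for = {
    pair: alpha_overrides.get(pair, DEFAULT_ALPHA)
    for pair in combinations(years, 2)
}

for pair, alpha in alpha_for.items():
    print(pair, alpha)
```

Inside a comparison loop, `alpha_for[comparison]` would then replace the hard-coded `"0.17"`.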


In [8]:
# Will take ~40 seconds to run

for comparison in yr_combos:
    system1 = comparison[0]
    system2 = comparison[1]
    generate_svg(
        f"example_data/boys_{system1}.json",
        f"example_data/boys_{system2}.json",
        f"example_charts/boys_{system1}_{system2}.pdf",
        "0.17",
        f"Boys {system1}",
        f"Boys {system2}"
    )

PDF saved to example_charts/boys_1968_2018.pdf
PDF saved to example_charts/boys_1968_2022.pdf
PDF saved to example_charts/boys_1968_2023.pdf
PDF saved to example_charts/boys_2018_2022.pdf
PDF saved to example_charts/boys_2018_2023.pdf
PDF saved to example_charts/boys_2022_2023.pdf
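
Instead of typing the years by hand, the list of systems can be discovered from the folder itself. The sketch below assumes file names follow the `boys_<year>.json` pattern used in this notebook; `find_systems` is a hypothetical helper, and the demonstration runs on a temporary folder of empty placeholder files rather than real data.

```python
import re
import tempfile
from itertools import combinations
from pathlib import Path


def find_systems(folder, pattern=r"boys_(\d{4})\.json"):
    """Map each year found in `folder` to its data file, sorted by year."""
    found = {}
    for path in Path(folder).glob("*.json"):
        match = re.fullmatch(pattern, path.name)
        if match:
            found[int(match.group(1))] = path
    return dict(sorted(found.items()))


# Demonstrate on a throwaway folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for year in (2023, 1968, 2018):
        (Path(tmp) / f"boys_{year}.json").touch()
    (Path(tmp) / "notes.json").touch()  # ignored: name does not match the pattern
    systems = find_systems(tmp)
    pairs = list(combinations(systems, 2))  # iterates over the sorted years

print(pairs)
```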


## Multiprocessing

This example demonstrates how to use multiprocessing to speed up the comparison of many systems (assuming your machine has spare computing resources). If you are unfamiliar with it, multiprocessing speeds up tasks that can be divided into independent parts. Here, each system comparison is an independent task that can run in parallel with the others. Note that this uses more memory, so make sure enough is available or close other programs.

We reuse the example above with two different alphas to demonstrate building an arglist that covers every combination we want to test. Think of it as specifying each variable in a simulation.

In [9]:
# Get the number of available CPUs
n_processes = os.cpu_count()
print("Number of available CPUs:", n_processes)

# Create a list of all combinations of parameters (list of tuples, one per function call)
arglist = []
for combo in yr_combos:
    for alpha in ["0.17", "0.6667"]:
        # Append all required args for the method to the arglist
        system1 = f"example_data/boys_{combo[0]}.json"
        system2 = f"example_data/boys_{combo[1]}.json"
        arglist.append(
            (system1,
             system2,
             f"example_charts/boys_{combo[0]}_{combo[1]}_{alpha}.pdf",
             alpha,
             f"Boys {combo[0]}",
             f"Boys {combo[1]}")
        )

# Will take ~14 seconds to run (on 10 CPUs); note the print statements may not be in order
with mp.Pool(processes=n_processes) as pool:
    # Run the method
    pool.starmap(generate_svg, arglist)

Number of available CPUs: 10
PDF saved to example_charts/boys_1968_2023_0.6667.pdf
PDF saved to example_charts/boys_1968_2023_0.17.pdf
PDF saved to example_charts/boys_1968_2022_0.6667.pdf
PDF saved to example_charts/boys_1968_2022_0.17.pdf
PDF saved to example_charts/boys_2022_2023_0.17.pdf
PDF saved to example_charts/boys_2022_2023_0.6667.pdf
PDF saved to example_charts/boys_2018_2023_0.17.pdf
PDF saved to example_charts/boys_2018_2022_0.6667.pdf
PDF saved to example_charts/boys_1968_2018_0.6667.pdf
PDF saved to example_charts/boys_1968_2018_0.17.pdf
PDF saved to example_charts/boys_2018_2023_0.6667.pdf
PDF saved to example_charts/boys_2018_2022_0.17.pdf
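
The nested loops that build the arglist can also be written with `itertools.product`, which scales more cleanly as more parameters are added. This sketch only constructs the argument tuples, in the same order as the loops above, and does not call `generate_svg`.

```python
from itertools import combinations, product

years = [1968, 2018, 2022, 2023]
alphas = ["0.17", "0.6667"]

# One tuple of positional arguments per (year pair, alpha) combination
arglist = [
    (
        f"example_data/boys_{y1}.json",
        f"example_data/boys_{y2}.json",
        f"example_charts/boys_{y1}_{y2}_{alpha}.pdf",
        alpha,
        f"Boys {y1}",
        f"Boys {y2}",
    )
    for (y1, y2), alpha in product(combinations(years, 2), alphas)
]

print(len(arglist))
```

When moving this from a notebook into a standalone script, create the pool under an `if __name__ == "__main__":` guard, as the `multiprocessing` documentation recommends for platforms that start worker processes by spawning.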
