# Testing the Models

This notebook contains the steps for testing how well the Greek-Latin Identification and Author Reconciliation models perform on data from HathiTrust.

## Load the Necessary Modules

This notebook uses the following modules:

- `csv`: for processing the output of the model
- `codecarbon`: to track energy usage
- `pandas`: for opening and working with the CSV files
- `time`: for timing operations
- `torch`: for machine learning operations
- `transformers`: for loading the models and using them
- `utilities`: for some local helper functions. Note: this is a local file, not a library available through repositories like conda-forge or PyPI.

In [22]:
import csv
from codecarbon import EmissionsTracker
import pandas as pd
import time
import torch
import torch.nn.functional as F
from transformers.models.auto.tokenization_auto import AutoTokenizer
from transformers.models.auto.modeling_auto import AutoModelForSequenceClassification
from transformers.models.distilbert import DistilBertForSequenceClassification, DistilBertTokenizerFast
import utilities
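
Later cells print a `huggingface/tokenizers` warning about forked processes, and the warning itself suggests setting `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch of one way to do that, assuming it runs before the tokenizers are first used and that turning parallelism off is acceptable for this workload:

```python
import os

# Assumption: tokenizer parallelism is not needed here, so turn it off explicitly.
# setdefault leaves any value already chosen in the shell untouched.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
print(os.environ["TOKENIZERS_PARALLELISM"])
```

Using `setdefault` rather than plain assignment is a design choice: it respects a value exported before the notebook was launched.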

## Select the Device for Processing

The following block chooses "CUDA" if an NVIDIA GPU is available; "MPS" if an Apple Silicon GPU is available; or "CPU" in the absence of a GPU.

In [23]:
# General device selection for Colab (CUDA), Mac (MPS), or CPU fallback
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using device: CUDA (GPU)")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using device: MPS (Apple Silicon GPU)")
else:
    device = torch.device("cpu")
    print("Using device: CPU")

Using device: MPS (Apple Silicon GPU)


## Load the DLL Catalog Data

The authority and work records from the DLL Catalog are loaded from CSV files and converted into Python dictionaries. These will be used as lookup tables to translate the outputs from the Author Reconciliation model into more comprehensible information for humans (e.g., "Julius Caesar", not "A4644").

In [24]:
# Read in the authors data
authors = pd.read_csv('../data/authors_db.csv',encoding='utf-8',quotechar='"')
# Read in the works data
works = pd.read_csv('../data/works_db.csv',encoding='utf-8',quotechar='"')

# Change the names of the columns to be lower case without spaces or punctuation
authors = authors.rename(columns={'Variant':'variant_name','Authorized Name':'authorized_name','DLL Identifier (Author)':'dll_id_author'})
works = works.rename(columns={'Title':'title','DLL Identifier (Work)':'dll_id_work','DLL Identifier (Author)': 'dll_id_author'})

# Prepare the lookup dictionaries of variant author names and titles
variant_to_authorized, title_to_work = utilities.prepare_dicts(authors,works)

## Load the Models

The following cells offer two options. By default, they load the versions of the models hosted in repositories on Hugging Face. However, if you have cloned this repository and want to run the versions created by the fine-tuning notebooks (`python/fine_tune_distilmbert_author_local.ipynb` and `python/fine_tune_distilmbert_greek_local.ipynb`), comment out the Hugging Face cells and uncomment the local cells.

### Greek-Latin Identification Model (Hugging Face Version)

In [25]:
# Path to the Greek-Latin Identification Model
greek_latin_model_repo = "sjhuskey/distilbert_multilingual_cased_greek_latin_classifier"


# Load the model and tokenizer from Hugging Face Hub
greek_latin_model = DistilBertForSequenceClassification.from_pretrained(greek_latin_model_repo)
# Move the model to the appropriate device
greek_latin_model.to(device)
# Load the tokenizer
greek_latin_tokenizer = DistilBertTokenizerFast.from_pretrained(greek_latin_model_repo)

print("Model and tokenizer loaded successfully!")
# Verify label mappings
label2id = greek_latin_model.config.label2id
id2label = greek_latin_model.config.id2label

print("Label-to-ID Mapping:", label2id)
print("ID-to-Label Mapping:", id2label)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Model and tokenizer loaded successfully!
Label-to-ID Mapping: {'Greek': 0, 'Latin': 1}
ID-to-Label Mapping: {0: 'Greek', 1: 'Latin'}


### Greek-Latin Identification Model (Local)

If you choose this option, build the model first by running the notebook at `python/fine_tune_distilmbert_greek_local.ipynb`, then uncomment the code in the next cell.

In [26]:
# # Path to the Greek-Latin Identification Model
# local_greek_model_path = "../greek"

# # Load the model from the local directory
# greek_latin_model = DistilBertForSequenceClassification.from_pretrained(local_greek_model_path)
# # Move the model to the appropriate device
# greek_latin_model.to(device)
# # Load the tokenizer
# greek_latin_tokenizer = DistilBertTokenizerFast.from_pretrained(local_greek_model_path)

# print("Model and tokenizer loaded successfully!")
# # Verify label mappings
# label2id = greek_latin_model.config.label2id
# id2label = greek_latin_model.config.id2label

# print("Label-to-ID Mapping:", label2id)
# print("ID-to-Label Mapping:", id2label)

### Author Reconciliation Model (Hugging Face Version)

In [27]:
# Path to Author Reconciliation Model
author_matching_repo = 'sjhuskey/distilbert_multilingual_cased_latin_author_identifier'

# Load the model for author reconciliation
author_matching = AutoModelForSequenceClassification.from_pretrained(author_matching_repo)
# Move the model to the appropriate device
author_matching.to(device)
# Load the tokenizer for author reconciliation
author_matching_tokenizer = AutoTokenizer.from_pretrained(author_matching_repo)

print("Author Reconciliation Model loaded successfully!")

# Verify label mappings
label2id = author_matching.config.label2id
id2label = author_matching.config.id2label

print("Label-to-ID Mapping:", label2id)
print("ID-to-Label Mapping:", id2label)

Author Reconciliation Model loaded successfully!
Label-to-ID Mapping: {'A1868': 0, 'A1870': 1, 'A2181': 2, 'A2491': 3, 'A2492': 4, 'A2493': 5, 'A2494': 6, 'A2495': 7, 'A2508': 8, 'A2755': 9, 'A2868': 10, 'A2870': 11, 'A2871': 12, 'A2872': 13, 'A2873': 14, 'A2874': 15, 'A2875': 16, 'A2876': 17, 'A2877': 18, 'A2878': 19, 'A2879': 20, 'A2880': 21, 'A2881': 22, 'A2882': 23, 'A2883': 24, 'A2884': 25, 'A2885': 26, 'A2886': 27, 'A2887': 28, 'A2888': 29, 'A2889': 30, 'A2890': 31, 'A2891': 32, 'A2892': 33, 'A2893': 34, 'A2894': 35, 'A2895': 36, 'A2896': 37, 'A2897': 38, 'A2898': 39, 'A2901': 40, 'A2902': 41, 'A2903': 42, 'A2904': 43, 'A2905': 44, 'A2906': 45, 'A2907': 46, 'A2908': 47, 'A2909': 48, 'A2910': 49, 'A2911': 50, 'A2912': 51, 'A2913': 52, 'A2914': 53, 'A2915': 54, 'A2916': 55, 'A2917': 56, 'A2918': 57, 'A2919': 58, 'A2920': 59, 'A2921': 60, 'A2922': 61, 'A2923': 62, 'A2924': 63, 'A2925': 64, 'A2926': 65, 'A2927': 66, 'A2928': 67, 'A2929': 68, 'A2930': 69, 'A2931': 70, 'A2932': 71, 'A2

### Author Reconciliation Model (Local)

If you choose this option, build the model first by running the notebook at `python/fine_tune_distilmbert_author_local.ipynb`, then uncomment the code in the next cell.

In [28]:
# # Path to Author Reconciliation Model
# local_author_model_path = "../authors"

# # Load the model for author reconciliation
# author_matching = AutoModelForSequenceClassification.from_pretrained(local_author_model_path)
# # Move the model to the appropriate device
# author_matching.to(device)
# # Load the tokenizer for author reconciliation
# author_matching_tokenizer = AutoTokenizer.from_pretrained(local_author_model_path)

# print("Author Reconciliation Model loaded successfully!")

# # Verify label mappings
# label2id = author_matching.config.label2id
# id2label = author_matching.config.id2label

# print("Label-to-ID Mapping:", label2id)
# print("ID-to-Label Mapping:", id2label)

## Metadata Processing Functions

The following functions are necessary for providing comprehensible output from the models' inferences.

### Greek-Latin Identification

- `classify_author_language`: Tokenizes authors' names from the incoming metadata records and passes them to the Greek-Latin Identification model. Returns the predicted label ("Greek" or "Latin") and the confidence score for the prediction.
- `classify_and_split_by_language`: Returns three Pandas dataframes—`classified_df`, with all results from the Greek-Latin Identification model; `greek_df`, with results labeled "Greek"; `latin_df`, with results labeled "Latin". The latter will be the input for the Author Reconciliation model.

Note that these functions process one input at a time. There is potential for speeding up the process by batching the inputs.
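
The confidence score mentioned above is the largest softmax probability over the model's logits. The following is a small worked example in plain Python, not the notebook's code: the logits are made up, and only the `id2label` mapping is taken from the model output shown earlier.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Mapping reported by the Greek-Latin model; the logits below are invented
id2label = {0: "Greek", 1: "Latin"}
logits = [-1.0, 3.0]

probabilities = softmax(logits)
# argmax: index of the largest probability
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
# confidence: the largest probability itself
confidence = probabilities[predicted_class]

print(id2label[predicted_class], round(confidence, 4))
```

With two classes, a logit gap of 4 already yields a confidence above 0.98, which helps explain why so many of the scores reported later round to 1.0000.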

In [29]:
# Language Classification Functions
def classify_author_language(input_author):
    """Classify the author's language as Greek or Latin using the fine-tuned model."""
    if not isinstance(input_author, str):
        return "Unknown", 0.0

    # Tokenize and encode the input author name
    inputs = greek_latin_tokenizer(input_author, return_tensors="pt", truncation=True, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = greek_latin_model(**inputs)
    logits = outputs.logits.detach().cpu()

    # Apply softmax to logits to get probabilities
    probabilities = F.softmax(logits, dim=-1).numpy()
    predicted_class = probabilities.argmax(axis=-1)[0]
    confidence = probabilities.max()  # Highest probability

    # Use the id2label mapping for label names
    predicted_label = greek_latin_model.config.id2label[predicted_class]
    print(f"Language Classification: {input_author} -> {predicted_label} (Confidence: {confidence:.4f})")

    return predicted_label, confidence

def classify_and_split_by_language(processed_df):
    """Classify authors as Greek or Latin, add language info, and split the dataframe."""
    language_results = []

    for _, row in processed_df.iterrows():
        input_author = row["author"]

        # Perform language classification
        author_language, language_confidence = classify_author_language(input_author)

        # Add the results to a new row while preserving existing data
        updated_row = {
            **row,  # Include all original columns
            "language": author_language,
            "language_confidence": language_confidence,
        }
        language_results.append(updated_row)

    # Create a dataframe with the updated rows
    classified_df = pd.DataFrame(language_results)

    # Split the dataframe into Greek and Latin subsets
    greek_df = classified_df[classified_df["language"] == "Greek"].reset_index(drop=True)
    latin_df = classified_df[classified_df["language"] == "Latin"].reset_index(drop=True)

    return classified_df, greek_df, latin_df
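
The note before this cell mentions batching as a possible speed-up. A rough sketch of the idea, not the notebook's implementation: classify each distinct name once, in fixed-size chunks, then map the results back onto the original order. `predict_batch` is a hypothetical callable standing in for a tokenizer-plus-model call on a list of strings (for example, the tokenizer invoked with a list and `padding=True`), and `toy_predict` exists only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Prediction = Tuple[str, float]  # (label, confidence)


def classify_in_batches(
    names: Sequence[str],
    predict_batch: Callable[[List[str]], List[Prediction]],
    batch_size: int = 32,
) -> List[Prediction]:
    """Classify each distinct name once, batch_size names per model call."""
    unique_names = list(dict.fromkeys(names))  # de-duplicate, keep first-seen order
    cache: Dict[str, Prediction] = {}
    for start in range(0, len(unique_names), batch_size):
        chunk = unique_names[start:start + batch_size]
        for name, prediction in zip(chunk, predict_batch(chunk)):
            cache[name] = prediction
    # Expand back to one prediction per input row
    return [cache[name] for name in names]


calls = []  # records the batches the toy predictor receives


def toy_predict(batch: List[str]) -> List[Prediction]:
    """Hypothetical stand-in for the model: labels two hard-coded names Greek."""
    calls.append(list(batch))
    return [("Greek" if n in {"Plato", "Homer"} else "Latin", 1.0) for n in batch]


names = ["Horace", "Plato", "Horace", "Livy", "Homer", "Horace"]
results = classify_in_batches(names, toy_predict, batch_size=2)
print(results[1], len(calls))
```

De-duplicating first matters for this data because the HathiTrust records repeat many names (the output below lists "Horace" and "Unknown" several times each), so repeated names never reach the model twice.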

### Author Reconciliation Functions

- `distilbert_author_match`: Tokenizes authors' names and passes them to the Author Reconciliation model. Returns the predicted label ("DLL ID") and the confidence score for the prediction.
- `process_metadata`: Manages input and output for the Author Reconciliation model. Returns a dataframe with the results of the inference run.

Note that these functions process one input at a time. There is potential for speeding up the process by batching the inputs.

In [30]:
# Author Reconciliation Functions
def distilbert_author_match(input_author):
    """Predict a DLL author ID for a name and return (author_info, confidence)."""
    if not isinstance(input_author, str):
        return None, 0.0

    inputs = author_matching_tokenizer(input_author, return_tensors="pt", truncation=True, padding=True)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = author_matching(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=-1).item()
    confidence = torch.softmax(logits, dim=-1).max().item()

    # Translate the predicted DLL ID into the catalog's author record
    predicted_author_id = author_matching.config.id2label[predicted_class]
    for author_info in variant_to_authorized.values():
        if author_info["author_id"] == predicted_author_id:
            return author_info, confidence
    return None, 0.0

def process_metadata(input_df):
    """Process input dataframe and match metadata."""
    results = []

    for _, row in input_df.iterrows():
        input_author_original = row["author"]
        input_author_normalized = utilities.normalize_author_name(input_author_original)
        print(f'Processing: {input_author_original}')

        # Author Matching
        distilbert_author, confidence = distilbert_author_match(input_author_original)
        if distilbert_author is None:
            print(f"No match found for author: {input_author_original}")
            continue
        # Collect Results
        results.append({
            "author": input_author_original,
            "normalized_author": input_author_normalized,
            "distilbert_author": distilbert_author,
            "confidence": confidence
        })
        print(f"Matched author: {distilbert_author}")

    return pd.DataFrame(results)
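
`distilbert_author_match` scans the entries of `variant_to_authorized` for every prediction. One possible refinement, sketched under the assumption that the dictionary has the shape implied by the code above (variant name mapped to a dict with `author_id` and `authorized_name`), is to build an ID-keyed index once and look predictions up directly. The sample entries are illustrative: the two IDs and authorized names appear in this notebook's output, but the variant spellings used as keys are invented.

```python
# Illustrative stand-in for variant_to_authorized; the key spellings are invented
variant_to_authorized = {
    "caesar, julius": {"authorized_name": "caesar, julius", "author_id": "A4644"},
    "caesar, c. julius": {"authorized_name": "caesar, julius", "author_id": "A4644"},
    "kircher, athanasius": {
        "authorized_name": "kircher, athanasius, 1602-1680",
        "author_id": "A4106",
    },
}

# Build the index once; setdefault keeps the first entry seen for each ID,
# which is the same entry the linear scan would return
id_to_author = {}
for author_info in variant_to_authorized.values():
    id_to_author.setdefault(author_info["author_id"], author_info)


def lookup_author(predicted_author_id):
    """Return the author record for a predicted DLL ID, or None if it is unknown."""
    return id_to_author.get(predicted_author_id)


print(lookup_author("A4644"))
print(lookup_author("A0000"))
```

The lookup then costs a single dictionary access per prediction instead of a pass over every variant name.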

## Greek-Latin Identification Inferencing

The next few cells run the Greek-Latin Identification model on the data downloaded from the HathiTrust Digital Library. The original data file is at `../data/1908698974-1722799169.txt`. The cleaned and deduplicated version is at `../data/hathi2.csv`.

### Load the Data

In [31]:
# Load preprocessed, deduplicated hathi2.csv
input_df = pd.read_csv('../data/hathi2.csv', encoding='utf-8', quotechar='"') 
# Further clean the data by filling missing values with "Unknown"
input_df = utilities.clean_input(input_df)

### Start the Emissions Tracker for the Greek-Latin Identification Model

The `EmissionsTracker` from `codecarbon` will track the energy used by the Greek-Latin Identification model during inferencing.

In [32]:
# Set up codecarbon's EmissionsTracker for the Greek-Latin Identification Model
tracker = EmissionsTracker(
    output_dir="../logs",
    output_file="greek_latin_identification_inference_emissions_log.csv"
)
tracker.start()

[codecarbon INFO @ 14:45:25] [setup] RAM Tracking...
[codecarbon INFO @ 14:45:25] [setup] GPU Tracking...
[codecarbon INFO @ 14:45:25] No GPU found.
[codecarbon INFO @ 14:45:25] [setup] CPU Tracking...
[codecarbon INFO @ 14:45:25] CPU Model on constant consumption mode: Apple M4 Pro
[codecarbon INFO @ 14:45:25] >>> Tracker's metadata:
[codecarbon INFO @ 14:45:25]   Platform system: macOS-15.5-arm64-arm-64bit
[codecarbon INFO @ 14:45:25]   Python version: 3.10.9
[codecarbon INFO @ 14:45:25]   CodeCarbon version: 2.2.2
[codecarbon INFO @ 14:45:25]   Available RAM : 24.000 GB
[codecarbon INFO @ 14:45:25]   CPU count: 12
[codecarbon INFO @ 14:45:25]   CPU model: Apple M4 Pro
[codecarbon INFO @ 14:45:25]   GPU co

### Set up a Timer for the Greek-Latin Identification Inferencing

In [33]:
start_time = time.time()

### Begin Inferencing with the Greek-Latin Identification Model

In [34]:
# Signal the start of the metadata processing
print('Classifying authors as Greek or Latin …')
# Use the classify_and_split_by_language() function to run the Greek-Latin classification model
classified_df, greek_df, latin_df = classify_and_split_by_language(input_df)
# Signal the end of the Greek-Latin classification
print("Done with classification.")

Classifying authors as Greek or Latin …
Language Classification: Du Creux, François, 1596?-1666. -> Latin (Confidence: 1.0000)
Language Classification: Meyer, Ernst H. F. 1791-1858. -> Latin (Confidence: 1.0000)
Language Classification: Laet, Joannes de, 1593-1649. -> Latin (Confidence: 1.0000)
Language Classification: Caesar, Julius -> Latin (Confidence: 0.9999)
Language Classification: Unknown -> Latin (Confidence: 1.0000)
Language Classification: Drexel, Jeremias, 1581-1638, -> Latin (Confidence: 1.0000)
Language Classification: Kircher, Athanasius, 1602-1680 -> Latin (Confidence: 1.0000)
Language Classification: Hincmar, Archbishop of Reims, approximately 806-882 -> Latin (Confidence: 1.0000)
Language Classification: Acosta, José de, 1540-1600, -> Latin (Confidence: 1.0000)
Language Classification: Lessius, Leonardus, 1554-1623 -> Latin (Confidence: 1.0000)
Language Classification: Riccioli, Giovanni Battista, 1598-1671, -> Latin (Confidence: 1.0000)
Language Classification: Guazz

[codecarbon INFO @ 14:45:40] Energy consumed for RAM : 0.000038 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:45:40] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:45:40] 0.000215 kWh of electricity used since the beginning.


Language Classification: Lachmann, Karl, 1793-1851 -> Latin (Confidence: 1.0000)
Language Classification: Horace -> Latin (Confidence: 1.0000)
Language Classification: Horace -> Latin (Confidence: 1.0000)
Language Classification: Bluhme, Friedrich, 1881- -> Latin (Confidence: 1.0000)
Language Classification: Livy -> Latin (Confidence: 1.0000)
Language Classification: Aristides Quintilianus -> Greek (Confidence: 0.9990)
Language Classification: Virgil -> Latin (Confidence: 1.0000)
Language Classification: Sallust, 86 B.C.-34 B.C. -> Latin (Confidence: 1.0000)
Language Classification: Livy. -> Latin (Confidence: 1.0000)
Language Classification: Apuleius. -> Latin (Confidence: 1.0000)
Language Classification: Curtius Rufus, Quintus -> Latin (Confidence: 1.0000)
Language Classification: Martial -> Latin (Confidence: 1.0000)
Language Classification: Horace -> Latin (Confidence: 1.0000)
Language Classification: Clerval, A. 1859-1918 -> Latin (Confidence: 1.0000)
Language Classification: Park

[codecarbon INFO @ 14:45:55] Energy consumed for RAM : 0.000075 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:45:55] Energy consumed for all CPUs : 0.000354 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:45:55] 0.000429 kWh of electricity used since the beginning.


Language Classification: Tacitus, Cornelius -> Latin (Confidence: 1.0000)
Language Classification: Mela, Pomponius. -> Latin (Confidence: 1.0000)
Language Classification: Tacitus, Cornelius -> Latin (Confidence: 1.0000)
Language Classification: Virgil -> Latin (Confidence: 1.0000)
Language Classification: Virgil -> Latin (Confidence: 1.0000)
Language Classification: Lucretius Carus, Titus -> Latin (Confidence: 1.0000)
Language Classification: Tacitus, Cornelius -> Latin (Confidence: 1.0000)
Language Classification: Tacitus, Cornelius -> Latin (Confidence: 1.0000)
Language Classification: Livy -> Latin (Confidence: 1.0000)
Language Classification: Lucretius Carus, Titus -> Latin (Confidence: 1.0000)
Language Classification: Livy -> Latin (Confidence: 1.0000)
Language Classification: Porfyrius, P. Optianus, active 325. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius -> Latin (Confidence: 1.0000)
Language Classification: Lucian, of Samosata -> Greek (Confiden

[codecarbon INFO @ 14:46:10] Energy consumed for RAM : 0.000113 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:46:10] Energy consumed for all CPUs : 0.000531 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:46:10] 0.000644 kWh of electricity used since the beginning.


Language Classification: Aristotle. -> Greek (Confidence: 0.9989)
Language Classification: Aristotle. -> Greek (Confidence: 0.9989)
Language Classification: Torsellino, Orazio, 1545-1599. -> Latin (Confidence: 1.0000)
Language Classification: Ansegisus, Saint, Abbot of Fontenelle, approximately 770-833. -> Latin (Confidence: 1.0000)
Language Classification: Leunclavius, Johannes, 1533?-1593?. -> Latin (Confidence: 1.0000)
Language Classification: Schneidewein, Johannes, 1519-1568. -> Latin (Confidence: 1.0000)
Language Classification: Baglivi, Giorgio, 1668-1707. -> Latin (Confidence: 1.0000)
Language Classification: Baillou, Guillaume de, 1538-1616. -> Latin (Confidence: 1.0000)
Language Classification: Alpini, Prosper, 1553-1617. -> Latin (Confidence: 1.0000)
Language Classification: Munckerus, Philippus. -> Latin (Confidence: 1.0000)
Language Classification: Marsham, John, Sir, 1602-1685. -> Latin (Confidence: 1.0000)
Language Classification: Campanella, Tommaso, 1568-1639. -> Latin

[codecarbon INFO @ 14:46:25] Energy consumed for RAM : 0.000150 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:46:25] Energy consumed for all CPUs : 0.000708 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:46:25] 0.000858 kWh of electricity used since the beginning.


Language Classification: Unknown -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000)
Language Classification: Polignac, Melchior de, 1661-1742? -> Latin (Confidence: 1.0000)
Language Classification: John Chrysostom, Saint, d. 407. -> Greek (Confidence: 0.9986)
Language Classification: Unknown -> Latin (Confidence: 1.0000)
Language Classification: Euler, Leonhard, 1707-1783. -> Latin (Confidence: 1.0000)
Language Classification: Juvenal. -> Latin (Confidence: 1.0000)
Language Classification: Labbe, Philippe, 1607-1667. -> Latin (Confidence: 1.0000)
Language Classification: Ephraem, Syrus, Saint, 303-373. -> Greek (Confidence: 0.9985)
Language Classification: Mumalluhi, Abu al-Harri al- -> Latin (Confidence: 1.0000)
Language Classification: Elias, of Nisibis, 975- -> Greek (Confidence: 0.9985)
Language Classification: Horace. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000

[codecarbon INFO @ 14:46:40] Energy consumed for RAM : 0.000188 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:46:40] Energy consumed for all CPUs : 0.000886 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:46:40] 0.001073 kWh of electricity used since the beginning.


Language Classification: Nicolas de Lyre. -> Latin (Confidence: 1.0000)
Language Classification: Pontedera, Giulio -> Latin (Confidence: 1.0000)
Language Classification: Leeuwenhoek, Antonius van -> Latin (Confidence: 1.0000)
Language Classification: Hoffmann, Friedrich, 1660-1742. -> Latin (Confidence: 1.0000)
Language Classification: Morison, Robert. -> Latin (Confidence: 1.0000)
Language Classification: Parenti, Paolo Andrea -> Latin (Confidence: 1.0000)
Language Classification: Celso, Aulo Cornelio. -> Greek (Confidence: 0.8873)
Language Classification: Niccolò de Tudeschi, Arzobispo 1386-1445. -> Latin (Confidence: 1.0000)
Language Classification: Niccolò de Tudeschi, Arzobispo 1386-1445. -> Latin (Confidence: 1.0000)
Language Classification: Niccolò de Tudeschi, Arzobispo 1386-1445. -> Latin (Confidence: 1.0000)
Language Classification: Niccolò de Tudeschi, Arzobispo 1386-1445 -> Latin (Confidence: 1.0000)
Language Classification: Claro, Giulio 1525-1575. -> Latin (Confidence: 1.

[codecarbon INFO @ 14:46:55] Energy consumed for RAM : 0.000225 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:46:55] Energy consumed for all CPUs : 0.001063 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:46:55] 0.001288 kWh of electricity used since the beginning.


Language Classification: Ovidio Nasón, Publio, 43 a.C.-17 d.C. -> Latin (Confidence: 1.0000)
Language Classification: Sigonio, Carlo -> Latin (Confidence: 1.0000)
Language Classification: Apuleyo, Lucio, 125-180 d. C. -> Latin (Confidence: 1.0000)
Language Classification: Robortello, Francesco, 1516-1567 -> Latin (Confidence: 1.0000)
Language Classification: Sigonio, Carlo -> Latin (Confidence: 1.0000)
Language Classification: Foglietta, Uberto, 1518-1581 -> Latin (Confidence: 1.0000)
Language Classification: Curcio Rufo, Quinto. -> Latin (Confidence: 1.0000)
Language Classification: Manuzio, Paolo, 1512-1574 -> Latin (Confidence: 1.0000)
Language Classification: Flaminio, Marco Antonio, 1498-1550 -> Latin (Confidence: 1.0000)
Language Classification: Fara, Giovanni Francesco, 1543-1591 -> Latin (Confidence: 1.0000)
Language Classification: Schott, Andreas, 1552-1629. -> Latin (Confidence: 1.0000)
Language Classification: Lavezzoli Lebeti, Giacomo, m.1585. -> Latin (Confidence: 1.0000)

[codecarbon INFO @ 14:47:10] Energy consumed for RAM : 0.000263 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:47:10] Energy consumed for all CPUs : 0.001240 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:47:10] 0.001502 kWh of electricity used since the beginning.


Language Classification: Ficino, Marsilio, 1433-1499. -> Latin (Confidence: 1.0000)
Language Classification: Galeno. -> Greek (Confidence: 0.9990)
Language Classification: Graaf, Regnier de, 1641-1673. -> Latin (Confidence: 1.0000)
Language Classification: Hofmann, Kaspar. -> Latin (Confidence: 1.0000)
Language Classification: Hipócrates. -> Greek (Confidence: 0.9992)
Language Classification: Linden, Johan-Antonides vander, 1609-1664. -> Latin (Confidence: 1.0000)
Language Classification: Leoniceno, Niccolò. -> Latin (Confidence: 1.0000)
Language Classification: Galeno. -> Greek (Confidence: 0.9990)
Language Classification: Galeno. -> Greek (Confidence: 0.9990)
Language Classification: Brasavola, Antonio Musa, 1500-1555. -> Latin (Confidence: 1.0000)
Language Classification: Unknown -> Latin (Confidence: 1.0000)
Language Classification: Massa, Niccolo, 1489-1569. -> Latin (Confidence: 1.0000)
Language Classification: Platter, Felix -> Latin (Confidence: 1.0000)
Language Classification:

[codecarbon INFO @ 14:47:25] Energy consumed for RAM : 0.000300 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:47:25] Energy consumed for all CPUs : 0.001417 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:47:25] 0.001717 kWh of electricity used since the beginning.


Language Classification: Virgil. -> Latin (Confidence: 1.0000)
Language Classification: Ovidius Naso, Publius, 43 v. Chr.-ca. 18. -> Latin (Confidence: 1.0000)
Language Classification: Phaedrus. -> Latin (Confidence: 1.0000)
Language Classification: Sallust, 86 B.C.-34 B.C. -> Latin (Confidence: 1.0000)
Language Classification: Seneca, Lucius Annaeus, ca. 4 B.C.-65 A.D. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, M. Tullius, 106-43 v. Chr. -> Latin (Confidence: 1.0000)
Language Classification: Gellius, Aulus. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000)
Language Classification: Tacitus, Cornelius. -> Latin (Confidence: 1.0000)
Language Classification: Cicero, Marcus Tullius. -> Latin (Confidence: 1.0000)
Language Classification: Catullus, Valerius. 

### Stop the Timer and Emissions Tracker for Greek-Latin Identification

In [35]:
# End the timer
end_time = time.time()
print(f"Execution time: {end_time - start_time:.2f} seconds")
# Stop the emissions tracker now that inferencing is complete
emissions = tracker.stop()
print(f"Estimated CO2 emissions for inferencing: {emissions} kg")

[codecarbon INFO @ 14:47:31] Energy consumed for RAM : 0.000315 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:47:31] Energy consumed for all CPUs : 0.001486 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:47:31] 0.001801 kWh of electricity used since the beginning.


Execution time: 125.89 seconds
Estimated CO2 emissions for inferencing: 0.0008525679137120403 kg
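
The figures in this output can be cross-checked by hand. The sketch below copies the reported power draw, run time, energy, and emissions from the log lines above, and assumes the power draw was constant for the whole run (the tracker reports the CPU in "constant consumption mode"). The carbon intensity it prints is derived from those figures, not a value reported by `codecarbon`.

```python
# Figures copied from the log output above
execution_seconds = 125.89
cpu_power_watts = 42.5
ram_power_watts = 9.0
reported_energy_kwh = 0.001801
reported_emissions_kg = 0.0008525679137120403


def watts_to_kwh(power_watts, seconds):
    """Energy in kWh for a constant power draw over a duration."""
    return power_watts * seconds / 3600 / 1000


cpu_kwh = watts_to_kwh(cpu_power_watts, execution_seconds)
ram_kwh = watts_to_kwh(ram_power_watts, execution_seconds)
total_kwh = cpu_kwh + ram_kwh

# Derived value: kg of CO2 per kWh implied by the reported totals
implied_intensity = reported_emissions_kg / reported_energy_kwh

print(f"CPU: {cpu_kwh:.6f} kWh, RAM: {ram_kwh:.6f} kWh, total: {total_kwh:.6f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg CO2/kWh")
```

The timer and the tracker were started in separate cells, so a small mismatch between the computed and reported energy is expected.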


### Results of the Greek-Latin Identification Process

In [36]:
print(f"Number of rows in the Greek dataframe: {len(greek_df)}")
print(f"Number of rows in the Latin dataframe: {len(latin_df)}")
print(f"The classification step removed {len(classified_df)-len(latin_df)} records.")

Number of rows in the Greek dataframe: 1477
Number of rows in the Latin dataframe: 13446
The classification step removed 1477 records.
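
For a sense of scale, the counts above can be combined with the execution time reported in the previous cell. This is simple arithmetic on the printed figures, and it assumes every classified row landed in either the Greek or the Latin dataframe, which the "removed 1477 records" line implies.

```python
# Figures copied from the cell outputs above
greek_rows = 1477
latin_rows = 13446
execution_seconds = 125.89

total_rows = greek_rows + latin_rows
greek_share = greek_rows / total_rows
rows_per_second = total_rows / execution_seconds

print(f"Total rows classified: {total_rows}")
print(f"Share labeled Greek: {greek_share:.1%}")
print(f"Throughput: {rows_per_second:.1f} rows per second")
```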


In [37]:
# Save the results to CSV files
classified_df.to_csv("../output/classified_metadata.csv", index=False, encoding='utf-8', quoting=csv.QUOTE_ALL)
greek_df.to_csv("../output/greek_authors.csv", index=False, encoding='utf-8', quoting=csv.QUOTE_ALL)
latin_df.to_csv("../output/latin_authors.csv", index=False, encoding='utf-8', quoting=csv.QUOTE_ALL)

In [38]:
# Print unique authors in the Greek dataframe for verification
for author in greek_df['author'].unique():
    print(author)

Arrian.
Euclid,
Herodotus.
Celsus, Aulus Cornelius.
Anacreon
John Chrysostom, Saint, -407
Clement, of Alexandria, Saint, approximately 150-approximately 215
Herodotus
Nemesius, Bp. of Emesa
Pindar
Theocritus.
Xenophon
Catherine, of Alexandria, Saint.
Aristophanes.
Index librorum prohibitorum.
Homer
Epiphanius, Saint, Bishop of Constantia in Cyprus, approximately 310-403.
Euripides
Philo, of Alexandria
Plato.
Plato
Plotinus
Demosthenes
Plutarch.
Homer.
Theophrastus.
Thucydides
Epictetus
Origen
Celsus, Aulus Cornelius
Epiphanius, Saint, Bishop of Constantia in Cyprus, approximately 310-403
Aristotle
Archimedes.
Dioscorides Pedanius, of Anazarbos
Nicomachus, of Gerasa.
Proclus, approximately 410-485
Müller, K. W.
Xenophon, of Ephesus
Gregory, of Nyssa, Saint, approximately 335-approximately 394.
Theocritus
Sextus, Empiricus
Philostratus, the Athenian, active 2nd century-3rd century
Longus
Stobaeus
Apollonius, of Athens.
Arrian
Philo, of Byzantium
Justin, Martyr, Saint
Hippocrates
Elias, 

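Those results look largely solid, although a few entries in the list above (for example, "Celsus, Aulus Cornelius", "Index librorum prohibitorum.", and "Müller, K. W.") do not appear to be Greek authors.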

## Author Reconciliation Inferencing

The remaining cells take the `latin_df` output from the Greek-Latin Identification model and submit it to the Author Reconciliation model for inferencing.

### Set Up the Emissions Tracker for the Author Reconciliation Model

This, too, uses the `EmissionsTracker` from `codecarbon`.

In [39]:
# Set up CodeCarbon's EmissionsTracker
tracker = EmissionsTracker(
    output_dir="../logs",
    output_file="author_reconciliation_inference_emissions_log.csv"
)
tracker.start()

[codecarbon INFO @ 14:47:31] [setup] RAM Tracking...
[codecarbon INFO @ 14:47:31] [setup] GPU Tracking...
[codecarbon INFO @ 14:47:31] No GPU found.
[codecarbon INFO @ 14:47:31] [setup] CPU Tracking...
[codecarbon INFO @ 14:47:31] CPU Model on constant consumption mode: Apple M4 Pro
[codecarbon INFO @ 14:47:31] >>> Tracker's metadata:
[codecarbon INFO @ 14:47:31]   Platform system: macOS-15.5-arm64-arm-64bit
[codecarbon INFO @ 14:47:31]   Python version: 3.10.9
[codecarbon INFO @ 14:47:31]   CodeCarbon version: 2.2.2
[codecarbon INFO @ 14:47:31]   Available RAM : 24.000 GB
[codecarbon INFO @ 14:47:31]   CPU count: 12
[codecarbon INFO @ 14:47:31]   CPU model: Apple M4 Pro
[codecarbon INFO @ 14:47:31]   GPU co

### Start a Timer for Author Reconciliation Inferencing

In [40]:
start_time = time.time()

### Begin Inferencing with the Author Reconciliation Model

The following cell uses the `process_metadata()` function defined earlier to send data to the model for inferencing.

In [41]:
output_df = process_metadata(latin_df)
print("Done with processing authors.")

Processing: Du Creux, François, 1596?-1666.
Matched author: {'authorized_name': 'graux, charles henri, 1852-1882', 'author_id': 'A3249'}
Processing: Meyer, Ernst H. F. 1791-1858.
Matched author: {'authorized_name': 'meyer, wilhelm, 1845-1917', 'author_id': 'A6252'}
Processing: Laet, Joannes de, 1593-1649.
Matched author: {'authorized_name': 'lawrence, of novara', 'author_id': 'A5070'}
Processing: Caesar, Julius
Matched author: {'authorized_name': 'caesar, julius', 'author_id': 'A4644'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Drexel, Jeremias, 1581-1638,
Matched author: {'authorized_name': 'dorpius, martinus, 1485-1525', 'author_id': 'A4045'}
Processing: Kircher, Athanasius, 1602-1680
Matched author: {'authorized_name': 'kircher, athanasius, 1602-1680', 'author_id': 'A4106'}
Processing: Hincmar, Archbishop of Reims, approximately 806-882
Matched author: {'authorized_name': 'hincmar, archbishop of reims', 'author_id

[codecarbon INFO @ 14:47:46] Energy consumed for RAM : 0.000038 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:47:46] Energy consumed for all CPUs : 0.000177 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:47:46] 0.000215 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'quintilian', 'author_id': 'A5545'}
Processing: Rutilius Namatianus, Claudius, active 5th century.
Matched author: {'authorized_name': 'namatianus, claudius rutilius, active 5th century', 'author_id': 'A4453'}
Processing: Seneca, Lucius Annaeus, approximately 4 B.C.-65 A.D.
Matched author: {'authorized_name': 'seneca, lucius annaeus, approximately 4 b.c.-65 a.d.', 'author_id': 'A4655'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Wyttenbach, Daniel Albert, 1746-1820
Matched author: {'authorized_name': 'wyttenbach, daniel albert, 1746-1820', 'author_id': 'A3223'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Lucretius Carus, Titus
Matched author: {'authorized_name': 'lucretius carus, titus', 'author_id': 'A5001'}
Processing: Lucretius Carus, Titus
Matched author: {'authorized_name': 'lucretius carus, titus', 'author_id

[codecarbon INFO @ 14:48:01] Energy consumed for RAM : 0.000075 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:48:01] Energy consumed for all CPUs : 0.000354 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:48:01] 0.000429 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'lindinus poeta 5th or 5th/6th century', 'author_id': 'A3055'}
Processing: Fessel, Daniel, 1599-1676.
Matched author: {'authorized_name': 'heinsius, daniel, 1580-1655', 'author_id': 'A3951'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Buxtorf, Johann, 1564-1629.
Matched author: {'authorized_name': 'trotzendorf, valentin, 1490-1556', 'author_id': 'A3700'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Vitringa, Campegius, 1659-1722
Matched author: {'authorized_name': 'victorius, marianus, approximately 1485-1572', 'author_id': 'A3407'}
Processing: Drusius, Joannes, 1550-1616
Matched author: {'authorized_name': 'gaius drusus', 'author_id': 'A3045'}
Processing: Praetorius, Abdias, 1524-1573
Matched author: {'authorized_name': 'prammer, ignaz 18.. ?-19.. ?', 'author_id': 'A6166'}
Processing: Meyer, Johannes, 1651 or 1652-

[codecarbon INFO @ 14:48:16] Energy consumed for RAM : 0.000113 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:48:16] Energy consumed for all CPUs : 0.000531 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:48:16] 0.000644 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'livy', 'author_id': 'A4979'}
Processing: Virgil.
Matched author: {'authorized_name': 'virgil', 'author_id': 'A4830'}
Processing: Nepos, Cornelius.
Matched author: {'authorized_name': 'nepos, cornelius', 'author_id': 'A5005'}
Processing: Pliny, the Elder
Matched author: {'authorized_name': 'pliny, the elder', 'author_id': 'A5537'}
Processing: Phaedrus.
Matched author: {'authorized_name': 'phaedrus', 'author_id': 'A4554'}
Processing: Curtius Rufus, Quintus.
Matched author: {'authorized_name': 'curtius rufus, quintus', 'author_id': 'A4602'}
Processing: Curtius Rufus, Quintus.
Matched author: {'authorized_name': 'curtius rufus, quintus', 'author_id': 'A4602'}
Processing: Froebel, Carl Poppo, 1786-1824.
Matched author: {'authorized_name': 'scheffer, johannes reichard, active approximately 1580', 'author_id': 'A3132'}
Processing: Catullus, Gaius Valerius.
Matched author: {'authorized_name': 'catullus, gaius valerius', 'author_id': 'A5237'}
Processing: Pli

[codecarbon INFO @ 14:48:31] Energy consumed for RAM : 0.000150 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:48:31] Energy consumed for all CPUs : 0.000709 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:48:31] 0.000859 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'conti, antonio, 1677-1749', 'author_id': 'A4027'}
Processing: Bandini, Angelo Maria, 1726-1803.
Matched author: {'authorized_name': 'stefonio, bernardino, 1560-1620', 'author_id': 'A3802'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Processing: Willughby, Francis, 1635-1672.
Matched author: {'authorized_name': 'willes, richard, active 1558-1573', 'author_id': 'A3734'}
Processing: Aldrovandi, Ulisse, 1522-1605?
Matched author: {'authorized_name': 'aldrovandi, ulisse, 1522-1605?', 'author_id': 'A4116'}
Processing: Aldrovandi, Ulisse, 1522-1605?
Matched author: {'authorized_name': 'aldrovandi, ulisse, 1522-1605?', 'author_id': 'A4116'}
Processing: Pliny, the Elder.
Matched author: {'authorized_name': 'pliny, the elder', 'author_id': 'A5537'}
Processing: Pliny, the Elder.
Matched author: {'authorized_name': 'pliny, the elder', 'author_id': 'A5537'}
Processing: Pliny, the Elder.
Matched author: {'a

[codecarbon INFO @ 14:48:46] Energy consumed for RAM : 0.000188 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:48:46] Energy consumed for all CPUs : 0.000886 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:48:46] 0.001073 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'krahner, leopold, 1810-1884', 'author_id': 'A6102'}
Processing: Walahfrid Strabo, 807?-849.
Matched author: {'authorized_name': 'walahfrid strabo, 807?-849', 'author_id': 'A4855'}
Processing: Cicero, Marcus Tullius.
Matched author: {'authorized_name': 'cicero, marcus tullius', 'author_id': 'A5129'}
Processing: Newbery, Francis, 1743-1818.
Matched author: {'authorized_name': 'quarles, francis, 1592-1644', 'author_id': 'A3841'}
Processing: Heylbut, Gustav, b. 1852.
Matched author: {'authorized_name': 'roth, karl ludwig, 1790-1868', 'author_id': 'A3462'}
Processing: Bernoulli, Jakob, 1654-1705
Matched author: {'authorized_name': 'bernoulli, jakob, 1654-1705', 'author_id': 'A4089'}
Processing: Commodianus.
Matched author: {'authorized_name': 'commodianus', 'author_id': 'A5248'}
Processing: Augustine, Saint, Bishop of Hippo.
Matched author: {'authorized_name': 'augustine, of hippo, saint, 354-430', 'author_id': 'A5497'}
Processing: Augustine, Saint, Bish

[codecarbon INFO @ 14:49:01] Energy consumed for RAM : 0.000225 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:49:01] Energy consumed for all CPUs : 0.001063 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:49:01] 0.001288 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'fulgentius, saint, bishop of ruspa', 'author_id': 'A5219'}
Processing: Boerhaave, Hermann, 1668-1738
Matched author: {'authorized_name': 'boer, ae.', 'author_id': 'A3275'}
Processing: Dicastillo, Juan de (S.I.), 1585-1643.
Matched author: {'authorized_name': 'fracastoro, girólamo, 1478-1553', 'author_id': 'A5606'}
Processing: Cangiamila, Francesco.
Matched author: {'authorized_name': 'cancianini, gian domenico 1547-1630', 'author_id': 'A3017'}
Processing: Egidio da Presentaçao (O.S.A.)
Matched author: {'authorized_name': 'forcellini, egidio, 1688-1768', 'author_id': 'A3333'}
Processing: Aranda, Felipe (S.I.), 1642-1695.
Matched author: {'authorized_name': 'aratus, solensis', 'author_id': 'A3315'}
Processing: Garcia de los Rios, Eusebio.
Matched author: {'authorized_name': 'garcia, francisco, 1580-1659', 'author_id': 'A5646'}
Processing: Sgambati, Scipione, 1595-1652.
Matched author: {'authorized_name': 'scaevola, quintus mucius, -82 b.c.', 'author_

[codecarbon INFO @ 14:49:16] Energy consumed for RAM : 0.000263 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:49:16] Energy consumed for all CPUs : 0.001240 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:49:16] 0.001502 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'virgil', 'author_id': 'A4830'}
Processing: Schott, Andreas, 1552-1629.
Matched author: {'authorized_name': 'schottus, andreas, 1552-1629', 'author_id': 'A6012'}
Processing: Suetonio Tranquilo, Cayo.
Matched author: {'authorized_name': 'suetonius, approximately 69-approximately 122', 'author_id': 'A4799'}
Processing: Jiménez de Rada, Rodrigo, ca. 1170-1247.
Matched author: {'authorized_name': 'jiménez de rada, rodrigo, approximately 1170-1247', 'author_id': 'A4450'}
Processing: Gerbel, Nicolaus, ca. 1485-1560.
Matched author: {'authorized_name': 'gerberon, gabriel, 1628-1711', 'author_id': 'A5961'}
Processing: Terencio Africano, Publio, ca. 190-159 a. C.
Matched author: {'authorized_name': 'terence', 'author_id': 'A4793'}
Processing: Plinio Segundo, Cayo, 23-79 d.C.
Matched author: {'authorized_name': 'pliny, the elder', 'author_id': 'A5537'}
Processing: Unknown
Matched author: {'authorized_name': 'alan, of tewkesbury', 'author_id': 'A4551'}
Process

[codecarbon INFO @ 14:49:31] Energy consumed for RAM : 0.000300 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:49:31] Energy consumed for all CPUs : 0.001417 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:49:31] 0.001717 kWh of electricity used since the beginning.


Matched author: {'authorized_name': 'langkavel, bernhard august, 1825-1902', 'author_id': 'A3514'}
Processing: Glandorp, Matthias Ludwig
Matched author: {'authorized_name': 'gemoll, wilhelm, 1850-1934', 'author_id': 'A5660'}
Processing: Livio, Tito, ca. 60-17 a. C.
Matched author: {'authorized_name': 'livy', 'author_id': 'A4979'}
Processing: Livio, Tito, ca. 60-17 a. C.
Matched author: {'authorized_name': 'livy', 'author_id': 'A4979'}
Processing: Boerhaave, Hermann, 1668-1738
Matched author: {'authorized_name': 'boer, ae.', 'author_id': 'A3275'}
Processing: Haen, Anton von, 1704-1776.
Matched author: {'authorized_name': 'haraeus, franciscus, 1555?-1631 or 1632', 'author_id': 'A3345'}
Processing: Maldonado, Juan de (S.I.), 1533-1583.
Matched author: {'authorized_name': 'maldonado, juan de 1534-1583', 'author_id': 'A4153'}
Processing: Thou, Jacques Auguste de, 1553-1617.
Matched author: {'authorized_name': 'thou, jacques-auguste de, 1553-1617', 'author_id': 'A4086'}
Processing: Robinson,

### End the Timer and the EmissionsTracker for Author Reconciliation

In [42]:
# End the timer
end_time = time.time()
print(f"Execution time: {end_time - start_time:.2f} seconds")
# Stop the emissions tracker after inferencing is complete
emissions = tracker.stop()
print(f"Estimated CO2 emissions for author reconciliation: {emissions} kg")

[codecarbon INFO @ 14:49:43] Energy consumed for RAM : 0.000330 kWh. RAM Power : 9.000000000000002 W
[codecarbon INFO @ 14:49:43] Energy consumed for all CPUs : 0.001559 kWh. Total CPU Power : 42.5 W
[codecarbon INFO @ 14:49:43] 0.001890 kWh of electricity used since the beginning.


Execution time: 132.09 seconds
Estimated CO2 emissions for author reconciliation: 0.0008945987056582037 kg
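
The figures in the log can be cross-checked with a little arithmetic. The sketch below copies the power, time, energy, and emissions values from the output above and assumes that energy is simply power multiplied by elapsed time, which seems reasonable because CodeCarbon reports running in "constant consumption mode." The variable names are illustrative, and the result is only approximate because the timer and the tracker were not started at exactly the same moment.

```python
# Figures copied from the CodeCarbon log and the cell output above.
cpu_power_w = 42.5            # "Total CPU Power"
ram_power_w = 9.0             # "RAM Power" (rounded)
elapsed_s = 132.09            # "Execution time"
reported_energy_kwh = 0.001890
reported_emissions_kg = 0.0008945987056582037

JOULES_PER_KWH = 3.6e6        # 1 kWh = 3.6 million joules

# Energy = power x time, converted from joules to kWh.
estimated_energy_kwh = (cpu_power_w + ram_power_w) * elapsed_s / JOULES_PER_KWH

# Carbon intensity implied by the reported emissions and energy.
implied_intensity = reported_emissions_kg / reported_energy_kwh

print(f"Estimated energy: {estimated_energy_kwh:.6f} kWh")
print(f"Implied carbon intensity: {implied_intensity:.3f} kg per kWh")
```

The estimated energy should land very close to the 0.001890 kWh that CodeCarbon reports, and the implied intensity (roughly 0.47 kg per kWh) reflects whatever regional electricity mix CodeCarbon selected for this machine.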


### Display the First Ten Rows of the Output

In [43]:
display(output_df.head(10))

Unnamed: 0,author,normalized_author,distilbert_author,confidence
0,"Du Creux, François, 1596?-1666.",du creux francois 15961666,"{'authorized_name': 'graux, charles henri, 185...",0.177464
1,"Meyer, Ernst H. F. 1791-1858.",meyer ernst h f 17911858,"{'authorized_name': 'meyer, wilhelm, 1845-1917...",0.999956
2,"Laet, Joannes de, 1593-1649.",laet joannes de 15931649,"{'authorized_name': 'lawrence, of novara', 'au...",0.156866
3,"Caesar, Julius",caesar julius,"{'authorized_name': 'caesar, julius', 'author_...",0.999986
4,Unknown,unknown,"{'authorized_name': 'alan, of tewkesbury', 'au...",0.11027
5,"Drexel, Jeremias, 1581-1638,",drexel jeremias 15811638,"{'authorized_name': 'dorpius, martinus, 1485-1...",0.521497
6,"Kircher, Athanasius, 1602-1680",kircher athanasius 16021680,"{'authorized_name': 'kircher, athanasius, 1602...",0.999991
7,"Hincmar, Archbishop of Reims, approximately 80...",hincmar archbishop of reims approximately 806882,"{'authorized_name': 'hincmar, archbishop of re...",0.999999
8,"Acosta, José de, 1540-1600,",acosta jose de 15401600,"{'authorized_name': 'acosta, josé de, 1540-16...",0.999997
9,"Lessius, Leonardus, 1554-1623",lessius leonardus 15541623,"{'authorized_name': 'lessing, gotthold ephraim...",0.985865
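
The `confidence` column makes it possible to separate likely matches from rows that need human review. The following is a minimal sketch, not part of the original workflow: it rebuilds a few rows from the table above and applies a hypothetical cutoff of 0.9 (`CONFIDENCE_THRESHOLD` is an assumption chosen for illustration).

```python
import pandas as pd

# A few rows copied from the displayed output above.
sample = pd.DataFrame(
    {
        "author": [
            "Du Creux, François, 1596?-1666.",
            "Caesar, Julius",
            "Unknown",
            "Drexel, Jeremias, 1581-1638,",
            "Kircher, Athanasius, 1602-1680",
        ],
        "confidence": [0.177464, 0.999986, 0.11027, 0.521497, 0.999991],
    }
)

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff, not taken from the notebook

# Split the rows into accepted matches and rows flagged for review.
accepted = sample[sample["confidence"] >= CONFIDENCE_THRESHOLD]
needs_review = sample[sample["confidence"] < CONFIDENCE_THRESHOLD]

print("Accepted:", accepted["author"].tolist())
print("Needs review:", needs_review["author"].tolist())
```

A threshold alone is unlikely to be sufficient. Row 1 in the table above ("Meyer, Ernst H. F. 1791-1858." matched to "meyer, wilhelm, 1845-1917" at 0.999956) suggests that a high score does not by itself guarantee a correct match, so spot-checking accepted rows is still worthwhile.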


### Write the Results to a CSV File

In [44]:
output_df.to_csv('../output/author_inferences.csv', index=False, encoding='utf-8', quoting=csv.QUOTE_ALL)
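
One thing to keep in mind when reading this file back: if the `distilbert_author` column holds Python dictionaries (as the displayed output suggests), they are written to the CSV as their string representation. The sketch below shows one way to recover them with `ast.literal_eval`. It uses an in-memory stand-in for the real file, and the column layout is an assumption based on the table shown earlier.

```python
import ast
import csv
import io

# In-memory stand-in for ../output/author_inferences.csv (one sample row).
csv_text = '''"author","normalized_author","distilbert_author","confidence"
"Caesar, Julius","caesar julius","{'authorized_name': 'caesar, julius', 'author_id': 'A4644'}","0.999986"
'''

rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    # Convert the stringified dictionary back into a real dict.
    row["distilbert_author"] = ast.literal_eval(row["distilbert_author"])
    # QUOTE_ALL stores every value as quoted text, so restore the float.
    row["confidence"] = float(row["confidence"])

print(rows[0]["distilbert_author"]["author_id"])
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of round trip.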