# Extraction and classification of classical references from the ConDÉ corpus

Script written by Morgane Pica for a submission to the symposium ["Lire les classiques en Normandie"](https://rmblf.be/2022/02/04/appel-a-contribution-lire-les-classiques-en-normandie/) (oct 2022), to be written by herself and Mathieu Goux.

## Imports & declarations

In [62]:
from tqdm.notebook import tqdm #tqdm est bibliothèque qui permet d'avoir une barre de progression
import xml.etree.ElementTree as ET
from datetime import datetime
import csv
import json

ET.register_namespace("", "http://www.tei-c.org/ns/1.0")
ET.register_namespace('xml','http://www.w3.org/XML/1998/namespace')

# Get current time
dt = datetime.now()
tmsp = dt.strftime("%Y%m%d_%H%M")

# Not all witnesses were enriched with reference identification.
# Change paths to fit your own folder organization.
witnesses = ["basnage","berault","merville","pesnelle","terrien"]
binpath = "/home/mpica/Progs/perso/CONDE/editions/base-version/"
einpath = "_base.xml"

# Change output paths here if you like.
listfile = f"output/authors_{tmsp}.csv"
tablefile = f"output/mentions_{tmsp}.csv"
authortable = f"output/authors_{tmsp}.csv"
authorjson = f"output/authors_{tmsp}.json"
checklist = f"output/checklist_{tmsp}.xml"

## FUNCTION: extract text from tei:w element

In [24]:
def get_w_text(word):
    
    """
    Function taking a <tei:w> element and
    returning its compiled textual content.
    
    :param word: ET.Element('{http://www.tei-c.org/ns/1.0}w')
    
    """
    
    # Preparing the return string as an empty string.
    texte = ""
    
    # If there is text directly inside <w> element and
    # before the first child, add it.
    if word.text:
        texte += str(word.text)
                
    # Loop on all current <w> children.
    for item in word:
            
        # If current child is <tei:height> or <tei:supplied>
        if item.tag == '{http://www.tei-c.org/ns/1.0}height' or item.tag == '{http://www.tei-c.org/ns/1.0}supplied':
            # Add text.
            texte += str(item.text)
            # If any, add the text following current child.
            if item.tail:
                texte += str(item.tail)
                
        # If current child is <tei:lb>, add the following text.
        elif item.tag == '{http://www.tei-c.org/ns/1.0}lb':
            if item.tail:
                texte += str(item.tail)
                        
        # If current child is <tei:choice>, add the second child of <choice>
        # (<tei:reg> or <tei:expan>), then add the text following current child if any.
        elif item.tag == '{http://www.tei-c.org/ns/1.0}choice':
            texte += str(item[1].text)
            if item.tail:
                texte += str(item.tail)
        
        # If current child is <tei:c>, add its text, then the following text if any.
        elif item.tag == '{http://www.tei-c.org/ns/1.0}c':
            texte += item.text
            if item.tail:
                texte += str(item.tail)
        
        
        # If current child is <tei:hi>, add its text, then the following text if any.
        elif item.tag == '{http://www.tei-c.org/ns/1.0}hi':
            texte += item.text
            if item.tail:
                texte += item.tail
        
        # If current child is <tei:add>, loop on its children and do the same checks.
        elif item.tag == '{http://www.tei-c.org/ns/1.0}add':
            # On refait tous les tests.
            if item.find('.') == None :
                texte = str(item.text)
                            
            else:
                        
                if item.text:
                    texte += str(item.text)
                        
                for subitem in item:
                    if subitem.tag == '{http://www.tei-c.org/ns/1.0}lb':
                        if subitem.tail:
                            texte += str(subitem.tail)
                    elif subitem.tag == '{http://www.tei-c.org/ns/1.0}choice':
                        texte += str(subitem[1].text)
                        if subitem.tail:
                            texte += str(subitem.tail)
                            
    return texte

## FUNCTION: make title string.

In [25]:
def title_str(div, dtype, dcount):
    
    """
    Function taking a <tei:div> element with lemmatized text
    and returning its title, if any.
    
    :param div: ET.Element('{http://www.tei-c.org/ns/1.0}div')
    :param dtype: div.get('type') as string
    :param dcount: integer
    
    """
    
    # Lists of characters to be treated particularly.
    noLspace = ",.)/]-'"
    noRspace = "(/[]-'"
    # insecable = ";:"
    
    # List of strings to be filled.
    divlist = []
    
    try:
        # If you do find a title as first child of div, make its text.
        if div.find('./*[1]').tag == "{http://www.tei-c.org/ns/1.0}head":
            
            # Loop on each <tei:w> word token.
            for word in div.findall('./{http://www.tei-c.org/ns/1.0}head/{http://www.tei-c.org/ns/1.0}w'):
                
                # Compile the text of current <tei:w> element.
                wtxt = get_w_text(word)
                
                # If the list is empty, add the current word to the list.
                if len(divlist) == 0:
                    divlist.append(wtxt)

                # If the token is a punctuation character which
                # is not separated from the previous word by a space,
                # add it to the last entry in the list.
                elif wtxt in noLspace:
                    divlist[-1] += wtxt
                
                # If the last entry in the list is a character which
                # is not separated from the next word by a space,
                # add the current token to it.
                elif divlist[-1] in noRspace:
                    divlist[-1] += wtxt

                # If the last letter in the last entry in the list is
                # a character which is not separated from the next word
                # by a space, add the current token to it.
                elif divlist[-1][-1] in noRspace:
                    divlist[-1] += wtxt

                #elif wtxt in insecable:
                #    divlist[-1] += "\u00a0"
                #    divlist[-1] += wtxt
                
                # Otherwise, just add the token as a new list entry.
                else:
                    divlist.append(wtxt)
            
            # Once you have treated every token in the title, make the
            # return string by adding a space between each list entry.
            title = " ".join(divlist)
        
        # If there is no title to the div but it has an @subtype,
        # its value makes the return string.
        elif div.get('subtype') != None:
            title = div.get('subtype')
        
        else:
            title = "Aucun titre."
    
    # Just a marker to spot errors within final output.
    except None:
        print("Could not construct string for: "+ ET.tostring(word) + " in " + giv.get('{http://www.w3.org/XML/1998/namespace}id'))
        title = "Pas réussi."
    
    # Show me where you're at as you work.
    print("Processing -> " + dtype + " - " + title)
        
    return title

## FUNCTION: extract authors and store in a dictionary

In [51]:
authors = {}

def extract(witness, path):
    print("Extracting authors on -> "+ witness)
    
    """
    Function taking the name and path to a TEI-XML text file and
    analyzing the references, doing two things:
        - returning an XML element named after the current witness,
            itself containing the copy of each reference declaration,
        - completing the general author dictionnary with new elements,
            whether new authors or authors whose informations were
            incomplete.
            
    :param witness: Name (=id) of the current witness as str (no space)
    :param path: Path to the current witness TEI-XML file.
    
    """
    
    # Create the witness element.
    liste = ET.Element(witness)
    
    # Open and parse TEI-XML file.
    with open(path) as filein:
        tree = ET.parse(filein)
        root = tree.getroot()
        
        # Look for all declared authors.
        for author in tqdm(root.findall('.//{http://www.tei-c.org/ns/1.0}listPerson/{http://www.tei-c.org/ns/1.0}person')):
            
            # Get <birth> element.
            fullbirth = author.find('.//{http://www.tei-c.org/ns/1.0}birth')
            fulldeath = author.find('.//{http://www.tei-c.org/ns/1.0}death')
            try:
                author.get("{http://www.w3.org/XML/1998/namespace}id")
            except:
                print("Weird guy here, not finding their ID.")
                
            try:
                # Get current author identifier.
                ident = author.get("{http://www.w3.org/XML/1998/namespace}id")
                
                if ident not in authors.keys():
                    
                    # Create a dict. entry for the current author.
                    authors[ident] = {}
                                
                    try:
                        # Get current author birth date.
                        
                        if "when" in fullbirth.attrib.keys():
                            authors[ident]["earliest-birth"] = fullbirth.get("when")
                            authors[ident]["latest-birth"] = fullbirth.get("when")
                        else:
                            authors[ident]["earliest-birth"] = fullbirth.get("notBefore")
                            authors[ident]["latest-birth"] = fullbirth.get("notAfter")
                            
                    except:
                        authors[ident]["earliest-birth"] = "none"
                        authors[ident]["latest-birth"] = "none"
                        liste.append(author)
                    
                    try:
                        # Get current author death date.
                        
                        if "when" in fullbirth.attrib.keys():
                            authors[ident]["earliest-death"] = fulldeath.get("when")
                            authors[ident]["latest-death"] = fulldeath.get("when")
                        else:
                            authors[ident]["earliest-death"] = fulldeath.get("notBefore")
                            authors[ident]["latest-death"] = fulldeath.get("notAfter")
                            
                    except:
                        authors[ident]["earliest-death"] = "none"
                        authors[ident]["latest-death"] = "none"
                        liste.append(author)
                        

                    try:
                        # Create a dict. to store all recorded names for current author.
                        lg = {}
                        
                        # Loop on names, store their language.
                        for name in author.findall('.//{http://www.tei-c.org/ns/1.0}persName'):
                            namelang = name.get("{http://www.w3.org/XML/1998/namespace}lang")
                            
                            if name.text:
                                # If name is not split into <forename>/<surname> elements,
                                # there is text directly into <persName> element and we
                                # make this the current language text.
                                lg[namelang] = name.text
                                
                            else:
                                # If name is split, the order is unsure, therefore
                                # we store each kind into its own entry
                                # within names dict. and make a final str out of it.
                                names = {}
                                for nchild in name.findall('*'):
                                    if nchild.tag == "{http://www.tei-c.org/ns/1.0}forename":
                                        names["fn"] = nchild.text
                                    elif nchild.tag == "{http://www.tei-c.org/ns/1.0}surname":
                                        names["sn"] = nchild.text
                                    
                                lg[namelang] = names["fn"] + " " + names["sn"]
                                
                        
                        # Setting an order of preference for final display of name:
                        # preferably French, if not, Latin, and if neither, English.
                        # (These are the only three name languages within the corpus.)
                        if "fr" in lg.keys():
                            authors[ident]["name"] = lg["fr"]
                        elif "la" in lg.keys():
                            authors[ident]["name"] = lg["la"]
                        elif "eng" in lg.keys():
                            authors[ident]["name"] = lg["eng"]
                            
                    except:
                        authors[ident]["name"] = "none"
                        liste.append(author)
                    
                # If the author was recorded in a previous witness but has no name,
                # we try to make a name string again with this witness.
                
                elif authors[ident]["name"] == "none":
                    
                    try:
                        lg = {}
                        for name in author.findall('.//{http://www.tei-c.org/ns/1.0}persName'):
                            namelang = name.get("{http://www.w3.org/XML/1998/namespace}lang")
                            if name.text:
                                lg[namelang] = name.text
                                
                            else:
                                names = {}
                                for nchild in name.findall('*'):
                                    if nchild.tag == "{http://www.tei-c.org/ns/1.0}forename":
                                        names["fn"] = nchild.text
                                    elif nchild.tag == "{http://www.tei-c.org/ns/1.0}surname":
                                        names["sn"] = nchild.text
                                        
                                lg[namelang] = names["sn"] + ", " + names["fn"]
                                                            
                        if "fr" in lg.keys():
                            authors[ident]["name"] = lg["fr"]
                        elif "la" in lg.keys():
                            authors[ident]["name"] = lg["la"]
                        elif "eng" in lg.keys():
                            authors[ident]["name"] = lg["eng"]
                            
                    except:
                        authors[ident]["name"] = "none"
                        liste.append(author)
                
                else:
                    print(f"{ident} is already in the dict, so I'm not making it.")
                
            except:
                print("I'm extracting authors for the first time. There is a problem with one.")
                continue
    
    return liste

## FUNCTION: sort extracted authors: keep those born before year 550

In [57]:
def sort(dico):
    
    """
    Function taking a dictionary of authors shaped like so:
    {'authorID': {'birth':'0000', 'name':'AuthorName'}}
    and returning the same dictionnary where all authors with a
    number greater than 550 as a birthdate were removed.
    
    :param dico: dict
    
    """
    print("Now sorting authors.")
    # Looping on author identifiers (=keys of dict.)
    for author in tqdm(dico.keys()):
        
        birth = authors[author]["latest-birth"]
        
        try:
            # If the author was born before 550, the entry is added
            # to the new dictionary.
            
            if int(birth) <= 560:
                final[author] = authors[author]
        except:
            
            # If there is an error (no birth date), print the author
            # as we want to know if they are interesting now
            # (if so, we can correct the XML itself).
            
            print(f"There is a problem with: {authors[author]['name']}'s ({author}) dates:")
            print(authors[author])
    
    # Show me the final dictionary to assess the data.
    print(f"Here is the final dictionary where the {str(len(authors.keys()))} authors are sorted:")
    print(final)
    
    return final

## FUNCTION: get locations of all references

In [28]:
def get_refs(witness, path, authors):
    print("Getting references of authors in " + witness)
    """
    Function taking the name and path to a witness, as well
    as a list of author identifiers previously extracted, and
    fills the mentions dict. with identification information
    about each mention of each author in the list.
    
    
    :param authors: list of xml identifiers.
    """
    
    partcount = 0
    chptcount = 0
    sctcount = 0
    frontcount = 0
    refcount= 0
    
    # Open and parse TEI-XML file.
    with open(path) as filein:
        tree = ET.parse(filein)
        root = tree.getroot()
        
        # Enter each part, chapter and section, keeping track of
        # their respective numbers. Chapter and section counts do not
        # start anew with each new parent, so as to have a unique number
        # within the document.
        for part in tqdm(root.findall('.//{http://www.tei-c.org/ns/1.0}div[@type="part"]')):
            partcount += 1
            partitle = title_str(part, "part", partcount)

            for chapter in part.findall('.//{http://www.tei-c.org/ns/1.0}div[@type="chapter"]'):
                chptcount += 1
                chaptitle = title_str(chapter, "chapter", chptcount)
                
                for section in chapter.findall('.//{http://www.tei-c.org/ns/1.0}div[@type="section"]'):
                    sctcount += 1
                    sectitle = title_str(section, "section", sctcount)

                    for ref in section.findall('.//{http://www.tei-c.org/ns/1.0}ref'):
                        refcount += 1
                        
                        # Only work if the mention as an @corresp (there are others and
                        # these are of no interest here).
                        
                        if ref.get('corresp'):
                            ident = ref.get('corresp').replace("#","")
                            
                            if ident in authors and ident in mentions:
                                mentions[ident] += [[
                                    witness,
                                    str('{:0>3}'.format(partcount)) + "___'" + partitle + "'",
                                    str('{:0>3}'.format(chptcount)) + "___'" + chaptitle + "'",
                                    str('{:0>3}'.format(sctcount)) + "___'" + sectitle + "'"
                                ]]
                            
                            elif ident in authors:
                                mentions[ident] = [[witness, str(partcount), str(chptcount), str(sctcount)]]
                                
        # Don't forget some authors may be mentionned in front or back matter.
        
        for frontdiv in tqdm(root.findall('.//{http://www.tei-c.org/ns/1.0}front/{http://www.tei-c.org/ns/1.0}div')):
            
            frontcount += 1
            
            frontitle = title_str(frontdiv, "front", frontcount)
    return refcount

## FUNCTION: make a CSV file out of all this

In [29]:
def initial_csv(mentions, info):
    
    """
    Function writing the final CSV compiling author information
    and author mentions.
    
    :param mentions: a dictionary with author identifiers as keys
        (and a list of his mentions within the corpus as value)
        
    :param info: a dictionary with author identifiers as keys
        (and a dictionary of his personal information as value)
    """
    
    # Columns for the new CSV file.
    
    columns = [
        "Author",
        "Birth start",
        "Birth stop",
        "Death start",
        "Death stop",
        "Witness",
        "Part",
        "Chapter",
        "Section"
    ]
    
    
    # Open and prepare CSV file.
    with open(tablefile, 'w') as csvtobe:
        csvwriting = csv.DictWriter(csvtobe, fieldnames = columns)
        csvwriting.writeheader()
    
        # Loop on author keys in information dict.
        # and keep the associated value in "local" var.
        for author in info.keys():
            local = info[author]
            
            # Loop on all mentions of current author,
            # and combine with author information.
            try:
                for mention in mentions[author]:
                
                    csvwriting.writerow({
                        "Author": local["name"],
                        "Birth start": local["earliest-birth"],
                        "Birth stop" : local["latest-birth"],
                        "Death start" : local["earliest-death"],
                        "Death stop" : local["latest-death"],
                        "Witness" : mention[0],
                        "Part" : mention [1],
                        "Chapter" : mention [2],
                        "Section" : mention[3]
                    })
            except:
                print(f"There is a problem with -> {author}\n")
                print("Here is the corresponding dictionary: ", local, "\n")
                continue

## FUNCTION: generating author CSV file.

In [30]:
def author_csv(dico, path):
    
    """
    Function taking a dictionary of author information and an
    export path and writing a CSV file for author timeline dataviz.
    
    :param dico: The dictionary of authors to show.
    :param path: The path to which the CSV file will be written.
    """
    
    # Open and prepare CSV file.
    with open(path, 'w') as csvtobe:
        csvwriting = csv.DictWriter(csvtobe, fieldnames = ["Author", "From", "To", "Param"])
        csvwriting.writeheader()
        
        # Loop on all authors in dictionary and store
        # current author information in "author" var.
        for entity in dico.keys():
            author = final[entity]
            
            # For all authors, make at least one line
            # containing the latest possible birth date
            # and earliest possible death date: these
            # represent the "sure" lifespan if there is one.
            
            csvwriting.writerow({
                "Author": author["name"],
                "From": author["latest-birth"],
                "To" : author["earliest-death"],
                "Param" : "Vie (certaine)",
            })
            
            # Then compare birth date columns and only
            # if dates are different, make another line
            # with this time span.
            
            if author["earliest-birth"] != author["latest-birth"]:
                
                csvwriting.writerow({
                    "Author": author["name"],
                    "From": author["earliest-birth"],
                    "To" : author["latest-birth"],
                    "Param" : "Naissance (imprécis)",
                })
            
            # Then compare death date columns and only
            # if dates are different, make another line
            # with this time span.
            
            if author["earliest-death"] != author["latest-death"]:
                
                csvwriting.writerow({
                    "Author": author["name"],
                    "From": author["earliest-death"],
                    "To" : author["latest-death"],
                    "Param" : "Mort (imprécis)",
                })

## Using the previously declared functions

In [65]:
# Initiate the root element for the XML debugging file.
listroot = ET.Element("people")

# Loop on witnesses: construct the path from initial vars,
# then trigger extract() function on current witness,
# so as to both make the according element for the XML debugging file,
# and fill the general author dictionary.

for witness in witnesses:
    fullpath = binpath + witness + einpath
    
    listroot.append(extract(witness, fullpath))

# Start a new dictionary for the sorted authors.
final = {}

# Fill the dictionary with only desired authors & make a Json file out of it.
final = sort(authors)
with open(authorjson, "w") as jsonf:
    json.dump(final, jsonf)

# Write XML debugging file.
with open(checklist, "w") as failures:
    a_ecrire = ET.tostring(listroot, encoding="unicode", method="xml")
    failures.write(a_ecrire)

Extracting authors on -> basnage


  0%|          | 0/360 [00:00<?, ?it/s]

cirier is already in the dict, so I'm not making it.
alex-4 is already in the dict, so I'm not making it.
aristote is already in the dict, so I'm not making it.
marculphe is already in the dict, so I'm not making it.
bignon is already in the dict, so I'm not making it.
conring is already in the dict, so I'm not making it.
sirmond is already in the dict, so I'm not making it.
vitalis is already in the dict, so I'm not making it.
de-roye is already in the dict, so I'm not making it.
aemyl is already in the dict, so I'm not making it.
loyseau is already in the dict, so I'm not making it.
dudo is already in the dict, so I'm not making it.
pithou is already in the dict, so I'm not making it.
chopin is already in the dict, so I'm not making it.
menage is already in the dict, so I'm not making it.
skenaeus is already in the dict, so I'm not making it.
monte is already in the dict, so I'm not making it.
justinien is already in the dict, so I'm not making it.
philon is already in the dict, so I

  0%|          | 0/145 [00:00<?, ?it/s]

amb-pare is already in the dict, so I'm not making it.
ulpien is already in the dict, so I'm not making it.
cassiod is already in the dict, so I'm not making it.
aulus is already in the dict, so I'm not making it.
appius-claudius is already in the dict, so I'm not making it.
solon is already in the dict, so I'm not making it.
papon is already in the dict, so I'm not making it.
cicero is already in the dict, so I'm not making it.
plutarque is already in the dict, so I'm not making it.
corderius is already in the dict, so I'm not making it.
verro is already in the dict, so I'm not making it.
bartole is already in the dict, so I'm not making it.
tite-live is already in the dict, so I'm not making it.
platon is already in the dict, so I'm not making it.
charondas is already in the dict, so I'm not making it.
bouteiller is already in the dict, so I'm not making it.
guenois is already in the dict, so I'm not making it.
afflito is already in the dict, so I'm not making it.
rebuffe is already 

  0%|          | 0/38 [00:00<?, ?it/s]

le-rouille is already in the dict, so I'm not making it.
davir is already in the dict, so I'm not making it.
terrien is already in the dict, so I'm not making it.
godefroy is already in the dict, so I'm not making it.
berault is already in the dict, so I'm not making it.
littleton is already in the dict, so I'm not making it.
dudo is already in the dict, so I'm not making it.
du-cange is already in the dict, so I'm not making it.
chopin is already in the dict, so I'm not making it.
du-moulin is already in the dict, so I'm not making it.
justinien is already in the dict, so I'm not making it.
cujas is already in the dict, so I'm not making it.
rheginon is already in the dict, so I'm not making it.
aimoin is already in the dict, so I'm not making it.
bardet is already in the dict, so I'm not making it.
coquille is already in the dict, so I'm not making it.
beaumanoir is already in the dict, so I'm not making it.
scaliger is already in the dict, so I'm not making it.
la-roque is already i

  0%|          | 0/207 [00:00<?, ?it/s]

pyrrhus is already in the dict, so I'm not making it.
husson is already in the dict, so I'm not making it.
grimaudet is already in the dict, so I'm not making it.
galland is already in the dict, so I'm not making it.
guyot is already in the dict, so I'm not making it.
pothier is already in the dict, so I'm not making it.
jacquet is already in the dict, so I'm not making it.
boullenois is already in the dict, so I'm not making it.
aguesseau is already in the dict, so I'm not making it.
hoveden is already in the dict, so I'm not making it.
matt-paris is already in the dict, so I'm not making it.
du-chesne is already in the dict, so I'm not making it.
mabillon is already in the dict, so I'm not making it.
skinner is already in the dict, so I'm not making it.
spelman is already in the dict, so I'm not making it.
vossius is already in the dict, so I'm not making it.
menage is already in the dict, so I'm not making it.
ragueau is already in the dict, so I'm not making it.
lauriere is already

  0%|          | 0/92 [00:00<?, ?it/s]

aristote is already in the dict, so I'm not making it.
gratien is already in the dict, so I'm not making it.
s-paul is already in the dict, so I'm not making it.
severe is already in the dict, so I'm not making it.
le-rouille is already in the dict, so I'm not making it.
cicero is already in the dict, so I'm not making it.
augustin is already in the dict, so I'm not making it.
justinien is already in the dict, so I'm not making it.
paulus is already in the dict, so I'm not making it.
ulpien is already in the dict, so I'm not making it.
pline is already in the dict, so I'm not making it.
virgile is already in the dict, so I'm not making it.
plutarque is already in the dict, so I'm not making it.
laurens-valle is already in the dict, so I'm not making it.
quintilien is already in the dict, so I'm not making it.
bald is already in the dict, so I'm not making it.
rebuffe is already in the dict, so I'm not making it.
bud is already in the dict, so I'm not making it.
tiraqueau is already in 

  0%|          | 0/489 [00:00<?, ?it/s]

There is a problem with: Anian's (anian) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death': 'none', 'latest-death': 'none', 'name': 'Anian'}
There is a problem with: Johann Berthold Herold's (herold) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death': 'none', 'latest-death': 'none', 'name': 'Johann Berthold Herold'}
There is a problem with: Arq.'s (arq) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death': 'none', 'latest-death': 'none', 'name': 'Arq.'}
There is a problem with: Goncanus's (goncanus) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death': 'none', 'latest-death': 'none', 'name': 'Goncanus'}
There is a problem with: Febur.'s (febur) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death': 'none', 'latest-death': 'none', 'name': 'Febur.'}
There is a problem with: Michel (saint)'s (s-michel) dates:
{'earliest-birth': 'none', 'latest-birth': 'none', 'earliest-death':

In [59]:
# Start a general dictionary to store any mention information.
mentions = {}

# Initiate the total reference count.
countAllRefs = 0

# Once again loop on witnesses and construct path from initial vars,
# then trigger the get_refs() function to collect mention information.
for witness in witnesses:
    fullpath = binpath + witness + einpath
    countAllRefs += get_refs(witness, fullpath, final.keys())

# Show the obtained dict. for checking.
print("Here is the final dictionary:")
print(mentions)

Getting references of authors in basnage


  0%|          | 0/4 [00:00<?, ?it/s]

Processing -> part - coutume
Processing -> chapter - TITRE DE JURISDICTION.
Processing -> section - introduction
Processing -> section - II.
Processing -> section - III.
Processing -> section - IV.
Processing -> section - V. Jurisdiction du Vicomte.
Processing -> section - VI.
Processing -> section - VII.
Processing -> section - VIII.
Processing -> section - IX.
Processing -> section - X.
Processing -> section - XI.
Processing -> section - XII.
Processing -> section - XIII.
Processing -> section - XIV.
Processing -> section - XV. Hauts-Justiciers tenus demander renvoy.
Processing -> section - XVI. Pleds et Assises des Hauts-Justiciers.
Processing -> section - XVII. Quel est le pouvoir des Sergents Royaux dans les Hautes-Justices.
Processing -> section - Extrait des Registres de la Cour de Parlement.
Processing -> section - XVIII.
Processing -> section - XIX. Comparence des Hauts-Justiciers.
Processing -> section - XX. Jurisdiction de Hauts-Justiciers.
Processing -> section - XXI.
Proce

Processing -> section - DCIX.
Processing -> section - DCX.
Processing -> section - DCXI.
Processing -> section - DCXII.
Processing -> section - DCXIII.
Processing -> section - DCXIV.
Processing -> section - DCXV.
Processing -> section - DCXVI.
Processing -> section - DCXVII.
Processing -> section - DCXVIII.
Processing -> section - DCXXI.
Processing -> section - DCXXII.
Processing -> part - usages-locaux
Processing -> chapter - VSAGES LOCAUX DE LA VICOMTE DE ROÜEN.
Processing -> section - I.
Processing -> section - Il.
Processing -> section - III.
Processing -> chapter - USAGES LOCAUX, de la Vicomté du Pont-de-l ’ Arche.
Processing -> section - I.
Processing -> chapter - USAGES LOCAUX, de la Vicomté de Caudebec.
Processing -> section - I.
Processing -> section - Il.
Processing -> section - III.
Processing -> section - IV.
Processing -> section - V.
Processing -> section - VI.
Processing -> section - VII.
Processing -> chapter - USAGES LOCAUX, de la Vicomté d ’ Arques.
Processing -> sect

  0%|          | 0/5 [00:00<?, ?it/s]

Processing -> front - A MONSEIGNEUR PELLOT CHEVALIER, SEIGNEUR DU PORTDAVID, DE DEFFENS ET DE TREVIERES, CONSEILLER ORDINAIRE DU ROY EN SES CONSEILS, ET PREMIER PRESIDENT AU PARLEMENT DE NORMANDIE.
Processing -> front - TABLE DES TITRES OU CHAPITRES DE LA COUTUME DE NORMANDIE, POUR LE PREMIER TOME.
Processing -> front - EXTRAIT DU PRIVILEGE DU ROY.
Processing -> front - TABLE DES TITRES OU CHAPITRES DE LA COUTUME DE NORMANDIE, POUR LE SECOND TOME.
Processing -> front - EXTRAIT DU PRIVILEGE DU ROY.
Getting references of authors in berault


  0%|          | 0/6 [00:00<?, ?it/s]

Processing -> part - COMMENTAIRES SVR LES COVSTVMES DV PAYS DE NORMANDIE, ANCIENS RESSORS ET ENCLAVES D ICELVY.
Processing -> chapter - introduction
Processing -> chapter - TITRE DE JURISDICTION.
Processing -> section - introduction
Processing -> section - ARTICLE I.
Processing -> section - II.
Processing -> section - III.
Processing -> section - IIII.
Processing -> section - V.
Processing -> section - VI.
Processing -> section - VII.
Processing -> section - VIII.
Processing -> section - IX.
Processing -> section - X.
Processing -> section - XI.
Processing -> section - XII.
Processing -> section - XIII.
Processing -> section - XIIII.
Processing -> section - XV.
Processing -> section - XVI.
Processing -> section - XVII.
Processing -> section - XVIII.
Processing -> section - XIX.
Processing -> section - XX.
Processing -> section - XXI.
Processing -> section - XXII.
Processing -> section - XXIII.
Processing -> section - XXIIII.
Processing -> section - XXV.
Processing -> section - XXVI.
Pr

  0%|          | 0/12 [00:00<?, ?it/s]

Processing -> front - Aucun titre.
Processing -> front - AVANT-PROPOS AU LECTEUR
Processing -> front - L ’ IMPRIMEVR AV LECTEVR.
Processing -> front - IN CONSVETVDINES NORMANICAS a IOSIA BERALTO COMMENtariis illustratas.
Processing -> front - IOS. BERALTO, CONSILIARIO REGIO. DOCTISSIMO. NORMANORVM CONSVE. tudinum interpreti.
Processing -> front - AD EUMDEM.
Processing -> front - ANAGRAMMATISMVS. IOSIAs BERALTVS. ILLIVs SORS BEATA
Processing -> front - ALLVSIO VRBIS AQVILAE ET AVTORIS IVXTA EANDEM NATI AD auem Aquilam.
Processing -> front - STANCES.
Processing -> front - Sur les illustrations de la Coustume de monsieur Bérault. STANCES.
Processing -> front - Sur les Coust. de Normandie commentées par le sieur Berault. SONNET.
Processing -> front - TABLE DES TITRES OV CHAPITRES de la Coustume de Normandie,
Getting references of authors in merville


  0%|          | 0/4 [00:00<?, ?it/s]

Processing -> part - DECISIONS SUR CHAQUE ARTICLE DE LA COUTUME DE NORMANDIE.
Processing -> chapter - introduction
Processing -> chapter - TITRE PREMIER. De la Jurisdiction.
Processing -> section - introduction
Processing -> section - ARTICLE PREMIER.
Processing -> section - ARTICLE Il.
Processing -> section - ARTICLE III.
Processing -> section - ARTICLE IV.
Processing -> section - ARTICLE V.
Processing -> section - ARTICLE VI.
Processing -> section - ARTICLE VII.
Processing -> section - ARTICLE VIII.
Processing -> section - ARTICLE IX.
Processing -> section - ARTICLE X.
Processing -> section - ARTICLE XI.
Processing -> section - ARTICLE XII.
Processing -> section - ARTICLE XIII.
Processing -> section - ARTICLE XIV.
Processing -> section - ARTICLE XV.
Processing -> section - ARTICLE XVI.
Processing -> section - ARTICLE XVII.
Processing -> section - ARTICLE XVIII.
Processing -> section - ARTICLE XIX.
Processing -> section - ARTICLE XX.
Processing -> section - ARTICLE XXI.
Processing -> 

Processing -> section - ARTICLE XIX.
Processing -> section - ARTICLE XX.
Processing -> section - ARTICLE XXI.
Processing -> section - ARTICLE XXII.
Processing -> section - ARTICLE XXIII.
Processing -> section - ARTICLE XXIV.
Processing -> section - ARTICLE XXV.
Processing -> section - ARTICLE XXVI.
Processing -> section - ARTICLE XXVII.
Processing -> section - ARTICLE XXVIII.
Processing -> section - ARTICLE XXIX.
Processing -> section - ARTICLE XXX.
Processing -> section - ARTICLE XXXI.
Processing -> section - ARTICLE XXXII.
Processing -> section - ARTICLE XXXIII.
Processing -> section - ARTICLE XXXIV.
Processing -> section - ARTICLE XXXV.
Processing -> section - ARTICLE XXXVI.
Processing -> section - ARTICLE XXXVII.
Processing -> section - ARTICLE XXXVIII.
Processing -> section - ARTICLE XXXIX.
Processing -> section - ARTICLE XL.
Processing -> section - ARTICLE XLI.
Processing -> section - ARTICLE XLII.
Processing -> section - ARTICLE XLIII.
Processing -> section - ARTICLE XLIV.
Proce

  0%|          | 0/4 [00:00<?, ?it/s]

Processing -> front - PREFACE.
Processing -> front - TABLE DES TITRES
Processing -> front - Aucun titre.
Processing -> front - a MESSIRE GEOFFROY MACE CAMUS DE PONTCARRE. CHEVALIER. CONSEILLER du Roy en tous ses Conseils, Premier President du Parlement de Roüen.
Getting references of authors in pesnelle


  0%|          | 0/3 [00:00<?, ?it/s]

Processing -> part - COUTUME DE NORMANDIE.
Processing -> chapter - introduction
Processing -> chapter - CHAPITRE PREMIER. DE JURISDICTION.
Processing -> section - introduction
Processing -> section - ARTICLE PREMIER.
Processing -> section - II.
Processing -> section - III.
Processing -> section - introduction
Processing -> section - IV.
Processing -> section - V. Jurisdiction du Vicomte.
Processing -> section - VI.
Processing -> section - VII.
Processing -> section - VIII.
Processing -> section - IX.
Processing -> section - X.
Processing -> section - XI.
Processing -> section - XII.
Processing -> section - XIII.
Processing -> section - XIV.
Processing -> section - XV.
Processing -> section - XVI.
Processing -> section - XVII.
Processing -> section - XVIII.
Processing -> section - XIX.
Processing -> section - XX.
Processing -> section - XXI.
Processing -> section - XXII.
Processing -> section - XXIII.
Processing -> section - XXIV.
Processing -> section - XXV.
Processing -> section - XXV

  0%|          | 0/5 [00:00<?, ?it/s]

Processing -> front - a MONSEIGNEUR, MONSEIGNEUR DE LAMOIGNON, CHANCELIER DE FRANCE.
Processing -> front - AVERTISSEMENT.
Processing -> front - AVIS AU RELIEUR.
Processing -> front - TABLE DES CHAPITRES De la Coutume de Normandie,
Processing -> front - Aucun titre.
Getting references of authors in terrien


  0%|          | 0/16 [00:00<?, ?it/s]

Processing -> part - LIVRE PREMIER QVI EST, DE LA IVSTICE ET DV droict des Normans.
Processing -> chapter - De Droict et de Iustice. Chap. I.
Processing -> section - La Coustume au premier chapitre.
Processing -> section - La Coustume au chapitre De Justice.
Processing -> section - De ceste vertu est escrit aux proumes des ordonnances du Roy Charles viij. de l ’ an 1493. et de Loys xij. de l ’ an 1498.
Processing -> section - La Coustume aux chapitres de Justice, et de Iusticement, et de Justicier.
Processing -> chapter - Des parties dont nostre droict est composé. Chap. II.
Processing -> section - synthese
Processing -> chapter - De Coustume, et des loix, usages et style. Chap. III.
Processing -> section - La Coustume au chapitre de Coustume.
Processing -> section - La Coustume au chapitre de choses gayues
Processing -> section - Au style de proceder vers la fin.
Processing -> chapter - De l ’ obseruance des ordonnances. Chap. IIII.
Processing -> section - Loys xij. l ’ an 1499.
Proce

Processing -> chapter - De cession es transport de dettes, droicts, es actions. Chap. VI.
Processing -> section - Charles ix. tenant les Estats à Orléans 1560.
Processing -> chapter - Des droicts que gens mariez acquierent ensemble sur les biens beun de l ’ autre. Chap. VII.
Processing -> section - La Coustume, aux chapitres de monneage, et De bref de mariage encombre, et De teneure par bourgage.
Processing -> section - La Cour de Parlement 1557. le xxiij. de Iuillet.
Processing -> section - La Coustume au chapitre De bref de douaire.
Processing -> section - Au Style de proceder.
Processing -> section - La Coustume au chapitre De vefueté dhomme.
Processing -> section - Au Style de proceder.
Processing -> chapter - De menduë de terres, et mesures d ’ icelles. Chap. VIII.
Processing -> section - synthese
Processing -> chapter - De ferme, oi louage d ’ héritage. Chap. IX.
Processing -> section - synthese
Processing -> chapter - De gage. Chap. X.
Processing -> section - De fief engagé est 

  0%|          | 0/3 [00:00<?, ?it/s]

Processing -> front - 
Processing -> front - 
Processing -> front - 
Here is the final dictionary:
{'marculphe': [['basnage', '1', '1', '1'], ['basnage', "001___'coutume'", "001___'TITRE DE JURISDICTION.'", "001___'introduction'"], ['basnage', "001___'coutume'", "001___'TITRE DE JURISDICTION.'", "013___'XIII.'"], ['basnage', "001___'coutume'", "001___'TITRE DE JURISDICTION.'", "013___'XIII.'"], ['basnage', "001___'coutume'", "001___'TITRE DE JURISDICTION.'", "013___'XIII.'"], ['basnage', "001___'coutume'", "009___'DES FIEFS ET DROITS FEODAUX'", "105___'introduction'"], ['basnage', "001___'coutume'", "009___'DES FIEFS ET DROITS FEODAUX'", "109___'CII. Definition de Franc-Aleu.'"], ['basnage', "001___'coutume'", "009___'DES FIEFS ET DROITS FEODAUX'", "109___'CII. Definition de Franc-Aleu.'"], ['basnage', "001___'coutume'", "009___'DES FIEFS ET DROITS FEODAUX'", "114___'CVII. La forme de l ’ hommage.'"], ['basnage', "001___'coutume'", "009___'DES FIEFS ET DROITS FEODAUX'", "147___'CXL.'"]

In [60]:
print(len(authors.keys()))
print(countAllRefs)

489
10799


In [61]:
# Write compilation CSV & reference Json.
initial_csv(mentions, final)
author_csv(final, authortable)

There is a problem with -> appius-claudius

Here is the corresponding dictionary:  {'earliest-birth': '-0399', 'latest-birth': '-0300', 'earliest-death': '-0299', 'latest-death': '-0200', 'name': 'Appius Claudius Caecus'} 

