# Re-numbering tokens after revisions

* __Note__: This script is similar to [this one](../lemmatize-new-witness/NV_1_numerotation_tokens.ipynb), which is intended for files whose `<w>` elements do not yet have `att.n` attributes.
* __Note__: because GitHub uses the `@` sign to tag users, it is replaced by `att.` in XPath expressions.

This script takes a valid TEI-XML file tokenised with `//tei:w[att.n]` elements. Its only function removes any existing `att.n` attributes and gives each `<w>` element a new unique number, stored in an `att.n` attribute, following the reading order.
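
As a minimal sketch of the intended effect, the snippet below renumbers the tokens of a tiny TEI fragment held in memory. The sample text and the helper name `renumber_tokens` are hypothetical and only serve as an illustration; the actual function of this notebook works on files.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: three tokens with stale, missing and duplicate numbers.
sample = (
    f'<TEI xmlns="{TEI_NS}"><text><body><p>'
    '<w n="7">Le</w> <w>roi</w> <w n="7">dit</w>'
    '</p></body></text></TEI>'
)

def renumber_tokens(xml_string):
    """Return the <w> elements of xml_string, renumbered from 1 in document order."""
    root = ET.fromstring(xml_string)
    words = root.findall(f".//{{{TEI_NS}}}w")
    for number, word in enumerate(words, start=1):
        # set() overwrites an existing @n or creates a new one.
        word.set("n", str(number))
    return words

tokens = renumber_tokens(sample)
print([(w.text, w.get("n")) for w in tokens])
# → [('Le', '1'), ('roi', '2'), ('dit', '3')]
```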

### FUNCTION: give each `<w>` element a new number

In [1]:
def id_tokens_in_tei(chemin_entree, chemin_sortie):
    
    """
    This function takes a valid TEI-XML file as input.
    It targets all <w> elements and gives them a unique
    @n attribute, numbered from 1, removing the previous
    one if needed. The result is a valid TEI-XML file.
    
    :param chemin_entree: The local path to the tokenized
        TEI-XML file whose <w> elements need to be numbered.
    :param chemin_sortie: The local path for the output file.
    
    """

    import xml.etree.ElementTree as ET
    
    # Declare the TEI namespace as the default one (no prefix), since it
    # is the only one; otherwise the output gets "ns0:" prefixes.
    ET.register_namespace('', 'http://www.tei-c.org/ns/1.0')
    
    # Create a counter.
    counter = 1
    
    
    # Import and parse the input XML file.
    tree = ET.parse(chemin_entree)
    root = tree.getroot()

    # Loop on <w> elements in reading order.
    for word in root.findall('.//{http://www.tei-c.org/ns/1.0}w'):
    
        # If the <w> element already has an @n attribute (even an empty
        # one), remove it so we can replace it.
        if 'n' in word.attrib:
            del word.attrib['n']
        # Make an @n attribute with the current state of the counter as value.
        word.set('n', str(counter))
        # Add 1 to the counter for the next <w> element.
        counter += 1

    # Write the output file at the path specified as second argument,
    # explicitly in UTF-8 so the result does not depend on the locale.
    tree.write(chemin_sortie, xml_declaration=True, encoding="utf-8")
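
After renumbering, it can be useful to check that the `<w>` elements really carry consecutive numbers starting from 1. The following is a minimal sketch of such a check; the helper name `has_consecutive_numbers` and the two sample fragments are hypothetical and not part of the original script.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

def has_consecutive_numbers(root):
    """Return True if the <w> elements carry n="1", "2", ... in document order."""
    numbers = [w.get("n") for w in root.iter(f"{{{TEI_NS}}}w")]
    return numbers == [str(i) for i in range(1, len(numbers) + 1)]

# Hypothetical fragments: one correctly numbered, one with a gap.
good = ET.fromstring(f'<p xmlns="{TEI_NS}"><w n="1">a</w><w n="2">b</w></p>')
bad = ET.fromstring(f'<p xmlns="{TEI_NS}"><w n="1">a</w><w n="3">b</w></p>')

print(has_consecutive_numbers(good), has_consecutive_numbers(bad))
# → True False
```

To run the check on an output file, the root can be obtained with `ET.parse(path).getroot()`.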

### Defining input and output files to execute the function

In [4]:
# To execute the function, replace the current paths with your own.

id_tokens_in_tei(
    '/local/path/to/input-file.xml',
    '/local/path/to/output-file.xml'
    )
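
When several files have to be renumbered, the call above can be repeated in a loop instead of being written out once per file. The sketch below assumes that all input files sit in one directory and that the outputs keep the same file names in another directory; the helper `renumber_directory` and its parameters are hypothetical.

```python
from pathlib import Path

def renumber_directory(input_dir, output_dir, renumber, pattern="*.xml"):
    """Apply renumber(input_path, output_path) to each matching file.

    `renumber` is any function taking an input path and an output path,
    such as id_tokens_in_tei. Returns the list of output paths.
    """
    output_dir = Path(output_dir)
    # Create the output directory if it does not exist yet.
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(input_dir).glob(pattern)):
        # Keep the same file name in the output directory.
        target = output_dir / source.name
        renumber(str(source), str(target))
        written.append(target)
    return written
```

With the function defined above, this could be called as `renumber_directory('/local/path/to/input-dir', '/local/path/to/output-dir', id_tokens_in_tei)`.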

"id_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/berault_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/berault_base.xml'\n    )\n\nid_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/merville_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/merville_base.xml'\n    )\n\nid_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/pesnelle_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/pesnelle_base.xml'\n    )\n\nid_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/ruines_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/ruines_base.xml'\n    )\n\nid_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/tac_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/tac_base.xml'\n    )\n\nid_tokens_in_tei(\n    '/home/erminea/Documents/CONDE/editions/base-version/terrien_base.xml',\n    '/home/erminea/Documents/CONDE/nov-21_renum/t