# TF from Sumerian

We convert a Sumerian dataset, indicated by Justin Cale Johnson, to text-fabric format.

## Model

We divide the corpus into sections as follows:

* **compositions** Correspond to the individual files;
* **sections** Correspond to consecutive `<l>` elements with the same `corresp` attribute;
* **lines** Correspond to the individual `<l>` elements.

So much for the sectioning.
The text is divided further as follows:

* **words** Correspond to the individual `<w>` elements;
* **glyphs** Correspond to the `-` separated chunks that constitute words.

All these divisions are exactly the node types of the resulting TF dataset.
The slot type is `glyph`.

### NB 1:
Words may contain substrings of the form `&`*xyz*`;`.
In some cases *xyz* is an HTML entity that stands for a Unicode character.
In those cases we replace the entity by the corresponding Unicode character.

In other cases we consider the *xyz* also to be a glyph, and we translate it into `{`*xyz*`}`.
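
As a rough illustration of these two cases, the sketch below applies the same two steps as the conversion further down in this notebook: `html.unescape` for entities that HTML knows, and a regular expression for whatever is left. The helper name `normalize` and the sample string are made up for the example and are not taken from the corpus.

```python
import re
from html import unescape

# Same pattern as in the conversion: an ampersand, a name, a semicolon.
entitypat = re.compile(r'&([^; \n]+);')

def normalize(text):
    # Step 1: entities known to HTML become the Unicode character they stand for.
    text = unescape(text)
    # Step 2: any entity that is still there is kept as a glyph in braces.
    return entitypat.sub(lambda m: '{' + m.group(1) + '}', text)

sample = 'lugal-&c;e3 &aacute;'  # hypothetical input
print(normalize(sample))
```

Here `&aacute;` is a standard HTML entity and is replaced by its character, while `&c;` is unknown to HTML and ends up as `{c}`.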

### NB 2:
Sometimes lines or sections are empty, i.e. they contain no concrete glyph.
TF does not handle this well, so we add a single, empty glyph to those elements.

## Coverage
The `<w>` elements may contain several types of elements. We have only covered
**corr corrEnd damage damageEnd supplied suppliedEnd**, and we ignore (for the moment)
**gloss note term unclear**.

There are also elements between the `<w>` elements, such as `<distinct>`.
These we have ignored (so far).

Most information in the `<teiHeader>` we ignore,
except the `<title>` in `<fileDesc><titleStmt>`.

### Notes on features

#### text-fabric specific

* **otype** for each node type (such as `composition`, `section`, `word`, etc), lists
  the ranges of nodes that are members of that type
* **oslots** for each node (text-object), lists the glyph positions that are part of it
* **otext** configures the sections (`composition`, `section`, `line`) and defines
  text rendering formats.

#### composition

* **compNum** the hierarchical number of the composition, as found in the file name
* **title** the English title of the composition, as found in the TEI header

#### section

* **secNum** the number of the section, as found in the `corresp` attribute on the `<l>`
  elements. We take the part after the `p`, and omit the rest. This is always a number.
  If the `corresp` attribute is missing, we fill in the value 0.
* **translation** the English translation of this section. See *Note on translations* below.
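
The extraction of **secNum** can be sketched as follows, using the same regular expression as the conversion code below. The `corresp` values are hypothetical. Falling back to `'0'` when the attribute is present but does not match is an assumption of this sketch, since the conversion itself only handles the case where the attribute is missing.

```python
import re

# Everything up to ".p", then the section number at the end of the string.
sectionpat = re.compile(r'^.*\.p([0-9]+)$')

def section_number(corresp):
    # A missing attribute gives section 0, as described above.
    if corresp is None:
        return '0'
    match = sectionpat.findall(corresp)
    # Assumption of this sketch: a value without ".p<number>" also maps to 0.
    return match[0] if match else '0'

print(section_number('t111.p3'))   # hypothetical corresp value
print(section_number(None))
```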
  
#### line

* **lineNum** the number of the line, as found in the `n` attribute on the `<l>` 
  elements. This is not always a number.
  
#### word
All attributes on the `<w>` elements are preserved under the same name:

* **bound det emesal-prefix emesal form-type form label lemma npart pos type**
* **freq_occ** computed feature with the frequency of each word form (using the
  `form` attribute of the `<w>` element)
* **freq_lex** computed feature with the frequency of each word lexeme (using the
  `lemma` attribute of the `<w>` element)
* **rank_occ rank_lex** derived from the corresponding `freq_` features.
  As computed below, the most frequent item gets rank 0, lesser frequencies get higher ranks,
  and items with equal frequency share a rank.
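
A minimal sketch of how frequency and rank relate, mirroring the loop in the statistics cell of this notebook. That loop starts counting from -1, so the most frequent item comes out with rank 0. The function name `freq_and_rank` and the word list are invented for the illustration.

```python
from collections import Counter

def freq_and_rank(values):
    freq = Counter(values)
    rank = {}
    current = -1   # same starting value as the notebook's loop
    tied = 1       # how many items share the current frequency
    prev = None
    # Highest frequency first; ties are ordered alphabetically.
    for value, n in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        if n == prev:
            tied += 1
        else:
            current += tied
            tied = 1
        prev = n
        rank[value] = current
    return freq, rank

freq, rank = freq_and_rank(['e2', 'lugal', 'e2', 'an', 'lugal', 'ki'])
print(dict(freq))
print(rank)
```

Items with the same frequency get the same rank, and the next lower frequency skips as many ranks as there were tied items.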

#### glyph

* **ascii** the textual representation of the glyph, as found in the content of the `<w>`
  elements.
* **trailer** the material to put after each glyph in order to recreate the original text.
  In most cases, this is a `-`. For the last glyph of a word, it is a space.
  If the glyph is of the form `{`*xyz*`}`, it is the empty string, a `-`, or a space, depending on
  where it is encountered.
* **corr** comes from the `<corr>` and `<corrEnd>` elements. All glyphs inside a `<corr>` or
  between a `<corr/>` and `<corrEnd/>` have value 1, the other glyphs have no value.
* **damage** All glyphs between a `<damage/>` and `<damageEnd/>` have value 1, the other
  glyphs have no value.
* **supplied** All glyphs between a `<supplied/>` and `<suppliedEnd/>` have value 1, the other
  glyphs have no value.
* **freq_occ** computed feature with the frequency of each glyph
* **rank_occ** derived from the corresponding `freq_occ` feature.
  As computed below, the most frequent glyph gets rank 0, lesser frequencies get higher ranks,
  and glyphs with equal frequency share a rank.
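
To make the role of **ascii** and **trailer** concrete, here is a small sketch that rebuilds a stretch of text from (ascii, trailer) pairs. This is the same concatenation that the `{ascii}{trailer}` text format expresses. The pairs and the function name `render` are invented for the illustration.

```python
def render(glyphs):
    # Concatenate every glyph with the material that follows it.
    return ''.join(ascii_ + trailer for (ascii_, trailer) in glyphs)

# Hypothetical glyphs: a braced glyph with empty trailer, glyphs inside a word
# with '-' as trailer, and word-final glyphs with a space as trailer.
glyphs = [('{d}', ''), ('en', '-'), ('lil2', ' '), ('lugal', '-'), ('e', ' ')]
print(repr(render(glyphs)))
```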

## Note on translations

There are translations per section. We find them in the file `alldb.sql`. We use the table `simpletranslation`.
The translation records there are linked to composition and section, which we use to make the match.

## Prelude
General imports.

In [1]:
import glob, os, collections, re, sqlite3
import xml.etree.ElementTree as ET
from html import unescape

Import the text-fabric package.

In [2]:
from tf.fabric import Fabric
from tf.timestamp import Timestamp

Initialize TF.

In [3]:
tm = Timestamp()
TF = Fabric(locations='~/Dropbox/text-fabric-data', modules='sumerian/etcsl')

This is Text-Fabric 2.3.9
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
29 features found and 0 ignored


Configure the location of the source materials.

In [4]:
BASEDIR = '~/Dropbox/text-fabric-data/sumerian/'.replace(
    '~', os.path.expanduser('~').replace('\\', '/'),
)
SOURCE_TEI = '{}/etcsl-tei-source'.format(BASEDIR)
SOURCE_SQL = '{}/etcsl-sql-source'.format(BASEDIR)

TRANSLATIONS_FILE = '{}/simpletranslation.sql'.format(SOURCE_SQL)

sectionpat = re.compile(r'^.*\.p([0-9]+)$')

## Grab the translations from the SQL file

We use the table `simpletranslation`, not `translation`.

In [5]:
conn = sqlite3.connect(':memory:')
c = conn.cursor()

In [6]:
with open(TRANSLATIONS_FILE) as sf:
    sql = ''
    for (ln, line) in enumerate(sf):
        line = line.rstrip('\n')
        sql += line
        if line.endswith(';'):
            print('Line {}: {}'.format(ln + 1, sql[0:40]))
            sql = sql.replace("\\'", "''").replace('\\r', '').replace('\\"', '"')
            c.execute(sql)
            sql = ''

Line 10: CREATE TABLE simpletranslation (  lineID
Line 11: INSERT INTO simpletranslation VALUES (1,
Line 12: INSERT INTO simpletranslation VALUES (19
Line 13: INSERT INTO simpletranslation VALUES (44
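
The loading step above can be tried out on a toy dump. The sketch below makes the same assumption as the cell above: a statement is complete as soon as a line ends with `;`. That would go wrong if a text value happened to end a line with a semicolon. The table `demo` and its contents are invented.

```python
import sqlite3

def load_dump(lines):
    conn = sqlite3.connect(':memory:')
    cur = conn.cursor()
    statement = ''
    for line in lines:
        statement += line.rstrip('\n')
        if statement.endswith(';'):
            # SQLite escapes a single quote by doubling it, not with a backslash.
            cur.execute(statement.replace("\\'", "''"))
            statement = ''
    conn.commit()
    return conn

dump = [
    "CREATE TABLE demo (\n",
    "  id INTEGER, txt TEXT\n",
    ");\n",
    "INSERT INTO demo VALUES (1, 'Enlil\\'s house');\n",
]
conn = load_dump(dump)
rows = conn.execute("SELECT * FROM demo").fetchall()
print(rows)
```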


In [7]:
c = conn.cursor()
rows = []
for row in c.execute("SELECT * FROM simpletranslation"):
    rows.append(row)

In [8]:
len(rows)

6508

In [9]:
translations = dict()
unmatched = collections.defaultdict(list)
for (i, row) in enumerate(rows):
    comp = row[4][1:].split('.', 1)[0]
    translation = row[6]
    sectionFull = row[3]
    match = sectionpat.findall(sectionFull)
    if not match:
        unmatched[translation].append(rows[i])
        continue
    section = match[0]
    translations[(comp, section)] = translation

In [10]:
unmatched

defaultdict(list,
            {'(1 line fragmentary) (1 line missing)': [(1386,
               't.2.2.3',
               'The lament for Sumer and Urim',
               'x',
               'x',
               'x',
               '(1 line fragmentary) (1 line missing)')],
             '(This composition is inscribed on a tablet whose colophon specifies it as a &c;ir-nam&c;ub of Utu)': [(3824,
               't.4.32.e',
               'A &c;ir-nam&c;ub to Utu (Utu E)',
               't432e.n1',
               'x',
               'x',
               '(This composition is inscribed on a tablet whose colophon specifies it as a &c;ir-nam&c;ub of Utu)'),
              (3832,
               't.4.32.f',
               'A &c;ir-nam&c;ub to Utu (Utu F)',
               't432f.n1',
               'x',
               'x',
               '(This composition is inscribed on a tablet whose colophon specifies it as a &c;ir-nam&c;ub of Utu)')],
             '(unknown no. of lines missing)': [(1915,
    

## Grab the TEI data and store it in memory
Set up an object in which all converted data is being collected.

In [11]:
class Data:
    def __init__(self):
        self.slotType = 'glyph'
        self.slotNum = 0
        self.nodeNum = 0
        self.maxSlot = 0
        self.maxNode = 0
        self.paths = {}
        self.slotFeatures = collections.defaultdict(dict)
        self.nodeFeatures = collections.defaultdict(dict)
        self.edgeSlotFeatures = collections.defaultdict(lambda: collections.defaultdict(list))
        self.edgeFeatures = collections.defaultdict(lambda: collections.defaultdict(list))

Define functions to read and convert the TEI XML of a single document (composition).

In [12]:
sections = collections.defaultdict(dict)

wordContentElems = set()

spans = collections.defaultdict(list)

def glyphsFromString(glyphString):
    glyphs = []
    glyphsMain = glyphString.split('-')
    lastGlyphMain = len(glyphsMain) - 1
    for (i, gm) in enumerate(glyphsMain):
        glyphsSub = gm.split('}')
        lastGlyphSub = len(glyphsSub) - 1
        for (j, gs) in enumerate(glyphsSub):
            glyphs.append(
                (
                    (gs + '}') if gs.startswith('{') else gs, 
                    ' ' if i == lastGlyphMain and j == lastGlyphSub else \
                    '-' if i != lastGlyphMain and j == lastGlyphSub else\
                    ''
                )
            )
    return glyphs

def doGlyphs(glyphString, compN, givenSecN, givenLineN, givenWordN):
    glyphs = glyphsFromString(glyphString)
    for (glyph, trailer) in glyphs:
        data.slotNum += 1
        glyphN = data.slotNum
        data.slotFeatures['otype'][glyphN] = 'glyph'
        data.slotFeatures['ascii'][glyphN] = glyph
        data.slotFeatures['trailer'][glyphN] = trailer

        data.edgeSlotFeatures['oslots'][compN].append(glyphN)
        data.edgeSlotFeatures['oslots'][givenSecN].append(glyphN)
        data.edgeSlotFeatures['oslots'][givenLineN].append(glyphN)
        if givenWordN is not None:
            data.edgeSlotFeatures['oslots'][givenWordN].append(glyphN)

def walkNode(node, path, compN, givenSecN=None, givenLineN=None, givenWordN=None):
    secN = None
    lineN = None
    wordN = None
    if node.tag == 'title' and path[-1] == 'titleStmt' and path[-2] == 'fileDesc' and path[-3] == 'teiHeader':
        data.nodeFeatures['title'][compN] = ''.join(node.itertext())
    elif node.tag == 'l':
        if 'corresp' in node.attrib:
            match = sectionpat.findall(node.attrib['corresp'])
            secNum = match[0]
        else:
            secNum = '0'
        if secNum not in sections[compN]:
            data.nodeNum += 1
            secN = data.nodeNum
            data.nodeFeatures['otype'][secN] = 'section'
            data.nodeFeatures['secNum'][secN] = secNum
            compNum = data.nodeFeatures['compNum'][compN].replace('.','')
            data.nodeFeatures['translation'][secN] = translations.get((compNum, secNum), 'X')
            sections[compN][secNum] = secN
        else:
            secN = sections[compN][secNum]
        data.nodeNum += 1
        lineN = data.nodeNum
        lineNum = node.attrib['n']
        data.nodeFeatures['otype'][lineN] = 'line'
        data.nodeFeatures['lineNum'][lineN] = lineNum
        if node.find('.//w') is None:
            data.slotNum += 1
            glyphN = data.slotNum
            data.slotFeatures['otype'][glyphN] = 'glyph'
            data.slotFeatures['ascii'][glyphN] = ''
            data.slotFeatures['trailer'][glyphN] = ''
            data.edgeSlotFeatures['oslots'][compN].append(glyphN)
            theSecN = secN if secN is not None else givenSecN
            data.edgeSlotFeatures['oslots'][theSecN].append(glyphN)
            data.edgeSlotFeatures['oslots'][lineN].append(glyphN)
    elif node.tag == 'w':
        data.nodeNum += 1
        wordN = data.nodeNum
        data.nodeFeatures['otype'][wordN] = 'word'
        for (att, val) in node.attrib.items():
            data.nodeFeatures[att][wordN] = val
        if node.text is not None:
            doGlyphs(node.text, compN, givenSecN, givenLineN, wordN)
    elif node.tag in {'corr', 'damage', 'supplied'}:
        spans[node.tag].append([data.slotNum + 1])
        if node.text is not None:
            doGlyphs(node.text, compN, givenSecN, givenLineN, givenWordN)
            spans[node.tag][-1].append(data.slotNum)
    elif node.tag in {'corrEnd', 'damageEnd', 'suppliedEnd'}:
        spans[node.tag.replace('End', '')][-1].append(data.slotNum)
    if givenWordN is not None:
        wordContentElems.add(node.tag)
        if node.text is not None and node.tag not in {'corr', 'damage', 'supplied'}:
            doGlyphs(node.text, compN, givenSecN, givenLineN, givenWordN)
        if node.tail is not None:
            doGlyphs(node.tail, compN, givenSecN, givenLineN, givenWordN)

    theSecN = secN if secN is not None else givenSecN
    theLineN = lineN if lineN is not None else givenLineN
    theWordN = wordN if wordN is not None else givenWordN
    for child in node:
        walkNode(
            child, path + (node.tag,), compN,
            givenSecN=theSecN, givenLineN=theLineN, givenWordN=theWordN,
        )

def getNode(root, compNum):
    data.nodeNum += 1
    compN = data.nodeNum
    data.nodeFeatures['otype'][compN] = 'composition'
    data.nodeFeatures['compNum'][compN] = compNum
    walkNode(root, (), compN)

Define functions to reorganize the data that has been collected, so that it is ready to be transformed to TF.

In [13]:
def doSpans():
    for (tag, stretches) in spans.items():
        for span in stretches:
            if len(span) < 2:
                (start, end) = (span[0], span[0])
            else:
                (start, end) = (span[0], span[-1])
            for glyphN in range(start, end + 1):
                data.slotFeatures[tag][glyphN] = '1'

def reorder():
    slotType = data.slotType
    data.maxSlot = data.slotNum
    data.maxNode = data.nodeNum
    otypeValues = set(data.nodeFeatures['otype'].values())
    newIds = sorted(
        range(1, data.maxNode + 1),
        key=lambda n: (data.nodeFeatures['otype'][n], n),
    )
    mapping = dict(((v, i + 1 + data.maxSlot) for (i, v) in enumerate(newIds)))
    
    orderedFeatures = {}
    for (name, dat) in data.nodeFeatures.items():
        orderedFeatures[name] = dict(((mapping[n], v) for (n, v) in dat.items()))
    for (name, dat) in data.slotFeatures.items():
        if name not in orderedFeatures: orderedFeatures[name] = {}
        orderedFeatures[name].update(dat)
    data.nodeFeatures = orderedFeatures

    orderedFeatures = {}
    for (name, dat) in data.edgeFeatures.items():
        orderedFeatures[name] = dict(((mapping[n], [mapping[m] for m in v]) for (n, v) in dat.items()))
    for (name, dat) in data.edgeSlotFeatures.items():
        if name not in orderedFeatures: orderedFeatures[name] = {}
        orderedFeatures[name].update(dict(((mapping[n], v) for (n, v) in dat.items())))
    data.edgeFeatures = orderedFeatures
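
The renumbering done by `reorder()` can be illustrated on a toy example. The node numbers, types and slot count below are invented. The point is only that after the mapping the non-slot nodes come after the slots, with each type in one contiguous range.

```python
# Hypothetical node numbers as handed out during the walk, in document order.
otype = {1: 'composition', 2: 'section', 3: 'line', 4: 'word', 5: 'line', 6: 'word'}
max_slot = 10  # pretend there are 10 glyphs, numbered 1..10

# Sort by type name first, then by original number, as reorder() does.
new_order = sorted(otype, key=lambda n: (otype[n], n))

# New numbers start right after the last slot.
mapping = {old: i + 1 + max_slot for (i, old) in enumerate(new_order)}

for old in new_order:
    print('{:<12} {} -> {}'.format(otype[old], old, mapping[old]))
```

Because the sort key is the type name, the types end up in alphabetical order (composition, line, section, word). This is the order of the node ranges printed by the check further down.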

Put everything together: 

* read the files
* postprocess the data

This will result in having all data in memory, in data structures that can be readily written to TF.

In [14]:
filenamepat = re.compile(r'^c\.([0-9a-z.]*)$')
entitypat = re.compile('&([^; \n]+);')

def replaceEntity(match): return '{{{}}}'.format(match.group(1))

data = Data()

tm.indent(level=0, reset=True)
tm.info('Scanning TEI sources of all compositions')
tm.indent(level=1, reset=True)
for (i, xmlfile) in enumerate(glob.glob(SOURCE_TEI+'/*.xml')):
    (dirName, baseName) = os.path.split(xmlfile)
    (fileName, extension) = os.path.splitext(baseName)
    match = filenamepat.findall(fileName)
    if len(match) == 0:
        tm.error('unexpected file: "{}"'.format(baseName))
        continue
    compNum = match[0]
    tm.info('composition {:>3}: {}'.format(i, compNum))
    with open(xmlfile) as xf:
        text = unescape(xf.read())
        text = entitypat.sub(replaceEntity, text)
    root = ET.fromstring(text)
    getNode(root, compNum)
tm.indent(level=0)
tm.info('Slots:       {:>7}'.format(data.slotNum))
tm.info('Other nodes: {:>7}'.format(data.nodeNum))
tm.info('Processing data ...')
doSpans()
reorder()
tm.info('Done')

print('Elements found in word content:\n\t{}'.format('\n\t'.join(sorted(wordContentElems))))

  0.00s Scanning TEI sources of all compositions
   |     0.00s composition   0: 0.1.1
   |     0.01s composition   1: 0.1.2
   |     0.02s composition   2: 0.2.01
   |     0.04s composition   3: 0.2.02
   |     0.06s composition   4: 0.2.03
   |     0.07s composition   5: 0.2.04
   |     0.09s composition   6: 0.2.05
   |     0.10s composition   7: 0.2.06
   |     0.12s composition   8: 0.2.07
   |     0.14s composition   9: 0.2.08
   |     0.15s composition  10: 0.2.11
   |     0.16s composition  11: 0.2.12
   |     0.17s composition  12: 0.2.13
   |     0.18s composition  13: 1.1.1
   |     0.23s composition  14: 1.1.2
   |     0.27s composition  15: 1.1.3
   |     0.34s composition  16: 1.1.4
   |     0.37s composition  17: 1.2.1
   |     0.40s composition  18: 1.2.2
   |     0.45s composition  19: 1.3.1
   |     0.53s composition  20: 1.3.2
   |     0.58s composition  21: 1.3.3
   |     0.64s composition  22: 1.3.4
   |     0.66s composition  23: 1.3.5
   |     0.69s composition  

   |     5.27s composition 206: 3.1.20
   |     5.29s composition 207: 3.1.21
   |     5.31s composition 208: 3.2.01
   |     5.31s composition 209: 3.2.02
   |     5.32s composition 210: 3.2.03
   |     5.33s composition 211: 3.2.04
   |     5.34s composition 212: 3.2.05
   |     5.36s composition 213: 3.3.01
   |     5.37s composition 214: 3.3.02
   |     5.39s composition 215: 3.3.03
   |     5.41s composition 216: 3.3.04
   |     5.41s composition 217: 3.3.05
   |     5.43s composition 218: 3.3.06
   |     5.43s composition 219: 3.3.07
   |     5.44s composition 220: 3.3.08
   |     5.45s composition 221: 3.3.09
   |     5.47s composition 222: 3.3.10
   |     5.48s composition 223: 3.3.11
   |     5.49s composition 224: 3.3.12
   |     5.49s composition 225: 3.3.20
   |     5.50s composition 226: 3.3.21
   |     5.51s composition 227: 3.3.22
   |     5.52s composition 228: 3.3.27
   |     5.53s composition 229: 3.3.39
   |     5.54s composition 230: 4.01.1
   |     5.55s compositio

A few checks.
Finding out the contents of the `<w>` elements was a matter of trial and error.
It seems that there are still a few rough edges (very few, indeed).

In [15]:
faulty = collections.defaultdict(list)
for (tag, stretches) in spans.items():
    for span in stretches:
        if len(span) != 2:
            faulty[tag].append(span)
    print('{:<9}: {:>5} spans, {:>3} faulty'.format(tag, len(stretches), len(faulty[tag])))

damage   : 11306 spans,   1 faulty
supplied : 17865 spans,   4 faulty
corr     :  1102 spans,   0 faulty


In [16]:
faulty

defaultdict(list,
            {'corr': [],
             'damage': [[412032, 412032, 412032]],
             'supplied': [[3176, 3180, 3186],
              [157050],
              [282876, 282877, 282885],
              [396033]]})
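
The faulty spans are not dropped. `doSpans()` marks everything from the first to the last recorded position, and a span with a single position marks just that glyph. A small sketch with invented slot numbers shows the effect. The function name `mark_spans` is only for this illustration.

```python
def mark_spans(stretches):
    marked = {}
    for span in stretches:
        # First and last recorded position; for a one-element span both coincide.
        (start, end) = (span[0], span[-1])
        for slot in range(start, end + 1):
            marked[slot] = '1'
    return marked

# One regular span, one with only a start, one with an extra position.
marked = mark_spans([[3, 5], [9], [12, 13, 15]])
print(sorted(marked))
```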

## Extra Features: composition number

For technical reasons we need a feature `compNum@en`, i.e. the composition number in English.
It is identical to `compNum`.

In [17]:
data.nodeFeatures['compNum@en'] = data.nodeFeatures['compNum']

## Add metadata
Before we can export to TF, we have to supply essential metadata about the features.

In [18]:
metaData = {
    '': dict(
        createdBy='Justin Cale Johnson and Dirk Roorda',
    ),
    'otext': {
        'sectionFeatures': 'compNum,secNum,lineNum',
        'sectionTypes': 'composition,section,line',
        'fmt:text-orig-full': '{ascii}{trailer}',
        'fmt:form-orig-full': '{form} ',
        'fmt:lex-orig-full': '{lemma} ',
        'fmt:gloss': '{label} ',
        'fmt:translation': '{translation}\\n',
    },
}

numberFeatures = set('''
    secNum
'''.strip().split())

for nf in data.nodeFeatures:
    metaData.setdefault(nf, {})['valueType'] = 'int' if nf in numberFeatures else 'str'
for ef in data.edgeFeatures:
    metaData.setdefault(ef, {})['valueType'] = 'int' if ef in numberFeatures else 'str'

## Extra features: statistical
We add some statistical features:

* rank and frequency for word occurrences and lexemes
* rank and frequency for glyphs

In [19]:
tm.info('Computing statistics')
gstats = {
    'freq': {
        'occ': collections.Counter(),
    },
    'rank': {
        'occ': {},
    },
}
wstats = {
    'freq': {
        'lex': collections.Counter(),
        'occ': collections.Counter(),
    },
    'rank': {
        'lex': {},
        'occ': {},
    },
}

nodeFeatures = data.nodeFeatures

words = [n[0] for n in nodeFeatures['otype'].items() if n[1] == 'word']
glyphs = [n[0] for n in nodeFeatures['otype'].items() if n[1] == 'glyph']

for g in glyphs:
    occ = nodeFeatures['ascii'][g]
    gstats['freq']['occ'][occ] += 1
tp = 'occ'
rank = -1
prev_n = -1
amount = 1
for (x, n) in sorted(gstats['freq'][tp].items(), key=lambda y: (-y[1], y[0])):
    if n == prev_n:
        amount += 1
    else:
        rank += amount
        amount = 1
    prev_n = n
    gstats['rank'][tp][x] = rank
        
for w in words:
    occ = nodeFeatures['form'][w]
    lex = nodeFeatures['lemma'][w]
    wstats['freq']['lex'][lex] += 1
    wstats['freq']['occ'][occ] += 1
for tp in ['lex', 'occ']:
    rank = -1
    prev_n = -1
    amount = 1
    for (x, n) in sorted(wstats['freq'][tp].items(), key=lambda y: (-y[1], y[0])):
        if n == prev_n:
            amount += 1
        else:
            rank += amount
            amount = 1
        prev_n = n
        wstats['rank'][tp][x] = rank
tm.info('Done')

tm.info('Adding statistics as features')
occFeatures = {}
for tp in ['occ', 'lex']:
    for ft in ('freq_{}'.format(tp), 'rank_{}'.format(tp)):
        occFeatures[ft] = {}
        metaData.setdefault(ft, {})['valueType'] = 'int'

for g in glyphs:
    occ = nodeFeatures['ascii'][g]
    tp = 'occ'
    ref = occ
    for kn in ['freq', 'rank']:
        ft = '{}_{}'.format(kn, tp)
        occFeatures[ft][g] = str(gstats[kn][tp][ref])

for w in words:
    occ = nodeFeatures['form'][w]
    lex = nodeFeatures['lemma'][w]
    for tp in ['occ', 'lex']:
        ref = occ if tp == 'occ' else lex
        for kn in ['freq', 'rank']:
            ft = '{}_{}'.format(kn, tp)
            occFeatures[ft][w] = str(wstats[kn][tp][ref])

nodeFeatures.update(occFeatures)

tm.info('Done')

    12s Computing statistics
    13s Done
    13s Adding statistics as features
    15s Done


Here we check whether some text objects have remained without glyphs
(e.g. caused by a line with only a `<gap>` element and no `<w>` elements).

In [20]:
for otype in ['glyph', 'composition', 'section', 'line', 'word']:
    nodes = [n for n in data.nodeFeatures['otype'] if data.nodeFeatures['otype'][n] == otype]
    print('{}: {}-{}'.format(
        otype,
        min(nodes),
        max(nodes),
    ))
    if otype == 'glyph': continue
    for n in nodes:
        if n not in data.edgeFeatures['oslots']:
            print('missing in {}: {}'.format(otype, n))
            break

glyph: 1-412192
composition: 412193-412586
section: 449098-455609
line: 412587-449097
word: 455610-625393


## Build the TF dataset
We can now produce the text-fabric dataset with one command.

In [21]:
TF.save(nodeFeatures=data.nodeFeatures, edgeFeatures=data.edgeFeatures, metaData=metaData)

  0.00s Exporting 27 node and 1 edge and 1 config features to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl:
   |     0.84s T ascii                to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.01s T bound                to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.01s T compNum              to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.01s T compNum@en           to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.02s T corr                 to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.04s T damage               to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.04s T det                  to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.01s T emesal               to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.02s T emesal-prefix        to /Users/dirk/Dropbox/text-fabric-data/sumerian/etcsl
   |     0.37s T form                 to /Users/