# NENA to TF

This notebook will be used to develop code for converting texts from .nena format to Text-Fabric. The parser has principally been written by Hannes Vlaardingerbroek. Many thanks to him for his hard work on it. Updates and refinements have been added by Cody Kingham.

In [1]:
! echo "last updated"; date

last updated
Mon  2 Mar 2020 14:48:24 GMT


In [2]:
import os
import sys
import collections
import re
import csv
import unicodedata
from pathlib import Path
from tf.convert.walker import CV
from tf.fabric import Fabric

# path to parser
parserpath = '../../nena_corpus/parse_nena/'
sys.path.append(parserpath)
from nena_parser import NenaLexer, NenaParser

# paths
VERSION = '0.01'
OUT_DIR = Path(f'../tf/{VERSION}')
data_dir = Path(f'../../nena_corpus/nena/{VERSION}')
# only directories count as dialects; stray files would otherwise be walked too
dialect_dirs = [p for p in data_dir.glob('*') if p.is_dir()]

# open char tables
# trans_lite_table = Path('../char_tables/trans_lite.tsv')
# with open(trans_lite_table, 'r') as infile:
#     trans_data = list(csv.reader(infile, delimiter='\t'))[1:]
#     trans_lite = {unicodedata.normalize('NFC', td[0]):td[1] for td in trans_data}



In [3]:
TF = Fabric(locations=[str(OUT_DIR)], silent=True)
cv = CV(TF)

## Test NENA Parser

The NENA parser delivers each text as structured morphemes, which can then be processed into a TF graph. We do that below by opening each source text, parsing it, and iterating over the parsed result.
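The director further down consumes the parser output in a specific shape. Judging from how that code unpacks it (an assumption, not documented parser behaviour), `parser.parse(...)` returns a `(header, paragraphs)` pair: each paragraph is a list of `(line_number, elements)` tuples, and each element is either a morpheme object with `value` (its letters) and `trailer` attributes, or a `(kind, data)` tuple such as a footnote. The sketch below walks a toy structure of that shape; the `Morpheme` dataclass is a hypothetical stand-in for the parser's own class.

```python
from dataclasses import dataclass, field

@dataclass
class Morpheme:
    """Hypothetical stand-in for the parser's Morpheme objects."""
    value: list                 # letters making up the morpheme
    trailer: str = ''           # what follows it: space, hyphen, punctuation
    footnotes: dict = field(default_factory=dict)

def iter_morphemes(paragraphs):
    """Yield (paragraph_no, line_no, morpheme_text, trailer) for each morpheme."""
    for para_no, para in enumerate(paragraphs, start=1):
        for line_no, elements in para:
            for elem in elements:
                if isinstance(elem, Morpheme):
                    yield para_no, line_no, ''.join(elem.value), elem.trailer
                # other elements are (kind, data) tuples and are skipped here

# a toy parse result shaped like the one the director consumes
toy_header = {'title': 'Toy Text', 'dialect': 'Barwar'}
toy_paragraphs = [
    [(1, [Morpheme(list('xòš'), '-'),
          Morpheme(list('məndila'), '. '),
          ('comment', 'a toy comment')])],
]

for row in iter_morphemes(toy_paragraphs):
    print(row)
```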

In [4]:
lexer = NenaLexer()
parser = NenaParser()

In [5]:
# dialect2file2parsed = collections.defaultdict(lambda: collections.defaultdict())

# nparsed = 0

# for dialect in sorted(dialect_dirs):    
#     print()
#     print(dialect.name)
#     for file in sorted(dialect.glob('*.nena')):
#         with open(file, 'r') as infile:
#             text = infile.read()
#             print(f'parsing: {file.name}')
#             parse = parser.parse(lexer.tokenize(text))
#             nparsed += 1
#             dialect2file2parsed[dialect.name][file.name] = parse
            

# print('\n', nparsed, 'texts ready for conversion')

In [6]:
# linenum, elements = dialect2file2parsed['Urmi_C']['Village Life.nena'][1][0][
# eg_morph = elements[0]

In [7]:
# dialect2file2parsed['Barwar']['A Hundred Gold Coins.nena'][1][0][0]

## Transcriptions

In [8]:
trans_full = {
    # non-latin vowels
    '\u0131': '1',  # 0x0131 ı dotless i
    '\u0251': '@',  # 0x0251 ɑ alpha
    '\u0259': '3',  # 0x0259 ə schwa
    '\u025B': '$',  # 0x025B ɛ open e
    # vowel accents
    '\u0300': '`',  # 0x0300 à grave
    '\u0301': "'",  # 0x0301 á acute
    '\u0304': '_',  # 0x0304 ā macron
    '\u0306': '%',  # 0x0306 ă breve
    '\u0308': '"',  # 0x0308 ä diaeresis
    '\u0303': '~',  # 0x0303 ã tilde
    # non-latin consonants
    '\u00F0': '6',  # 0x00F0 ð eth
    '\u025F': '&',  # 0x025F ɟ small dotless j with stroke
    '\u0248': '!',  # 0x0248 Ɉ capital J with stroke
    '\u03B8': '8',  # 0x03B8 θ greek theta
    '\u02B8': '7',  # 0x02B8 ʸ small superscript y
    '\u02BE': '}',  # 0x02BE ʾ right half ring (alaph)
    '\u02BF': '{',  # 0x02BF ʿ left half ring (ayin)
    # consonant diacritics
    '\u207A': '+',  # 0x207A ⁺ superscript plus
    '\u030C': '<',  # 0x030C x̌ caron
    '\u0302': '^',  # 0x0302 x̂ circumflex
    '\u0307': ';',  # 0x0307 ẋ dot above
    '\u0323': '.',  # 0x0323 x̣ dot below
    '\u032D': '>',  # 0x032D x̭ circumflex below
    
    # punctuation
    '\u02C8': '|', # 0x2c8 ˈ small vertical line
}
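def trans(s, table, mark_punct=True):
    '''
    Transcribe a string using a character-to-character table.

    The string is first decomposed (NFD) so that accents and other
    diacritics are looked up as separate combining characters. If
    mark_punct is True, the first punctuation mark is prefixed with
    '/' so that literal punctuation can be told apart from
    transcription symbols such as '.' (dot below) or '!' (Ɉ).
    '''
    s = unicodedata.normalize('NFD', s)
    # mark punctuation (count=1: only the first match is prefixed)
    if mark_punct:
        s = re.sub(r'([\n,.!?:;/])', r'/\g<1>', s, count=1)
    return ''.join(table.get(c, c) for c in s)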
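A self-contained copy of `trans`, run on a small excerpt of the `trans_full` table, shows its two steps: decomposition into base letters plus combining marks, then a character-by-character lookup. Note how the literal full stop comes out as `/.`, which keeps it distinct from the `.` that stands for a dot below. The names `demo_table` and `trans_demo` are illustrative only.

```python
import re
import unicodedata

# a small excerpt of trans_full, enough for the example
demo_table = {
    '\u0259': '3',  # ə schwa
    '\u0300': '`',  # combining grave accent
    '\u030C': '<',  # combining caron
    '\u02C8': '|',  # ˈ intonation boundary mark
}

def trans_demo(s, table, mark_punct=True):
    # decompose so that diacritics become separate combining characters
    s = unicodedata.normalize('NFD', s)
    if mark_punct:
        # prefix only the first punctuation mark with '/'
        s = re.sub(r'([\n,.!?:;/])', r'/\g<1>', s, count=1)
    # characters without an entry are passed through unchanged
    return ''.join(table.get(c, c) for c in s)

print(trans_demo('xòš-məndila.', demo_table))  # xo`s<-m3ndila/.
print(trans_demo('a, b.', demo_table))         # a/, b.
```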

In [9]:
class Transcriber:
    """Transcribe a string according to transcription rules.
    
    This transcription class is essentially a filter
    which determines which characters make it into a new,
    transcribed string. The filter is applied on a letter-by-letter 
    basis. A "letter" (token) is defined by the `tokens` argument 
    and can include diacritics/accents. Each token is handled 
    in one of three ways:
        1. replaced as a whole if the unicode composed letter (NFC)
           has an entry in `replace`
        2. otherwise kept unchanged if it is punctuation
        3. otherwise decomposed (NFD); each character is replaced
           if it has an entry, kept if it matches `keep`, 
           and dropped otherwise
    The results are added to a new string which is then returned.
    
    __init__(tokens, replace, punctuation, keep, keep_case):
        tokens: regex for splitting letters (tokens)
            to be used with findall
        replace: dict with find:replace mappings
        punctuation: regex for punctuation tokens, which
            are passed through unchanged
        keep: regex for characters to keep
        keep_case: if False, the string is lowercased
            before transcription

    convert(string):
        string: a string to transcribe
            
    Returns:
        str in transcribed form
    """
    def __init__(self, tokens='', replace={}, 
                 punctuation='', keep='', keep_case=False):    
        self.tokenize = re.compile(f'{tokens}|{punctuation}').findall
        self.punct = re.compile(punctuation)
        self.keep = re.compile(keep)
        self.keep_case = keep_case
        
        # ensure normalized characters for pattern searches
        self.repl = {
            unicodedata.normalize('NFC',f):r 
                for f,r in replace.items()
        }
        
    def convert(self, string):
        """Convert string to transcription.
        
        Returns:
            str in transcribed form
        """
        
        string = unicodedata.normalize('NFC',string)
        if not self.keep_case:
            string = string.lower()
        transcription = ''

        for token in self.tokenize(string):

            # filter a composed string
            if token in self.repl:
                transcription += self.repl[token]

            # keep punctuation
            elif self.punct.match(token):
                transcription += token

            # otherwise filter the decomposed (NFD) token char by char
            else:
                for char in unicodedata.normalize('NFD', token):

                    # attempt second match on char by char basis
                    if char in self.repl:
                        transcription += self.repl[char]

                    # attempt to keep with keep-set
                    elif self.keep.match(char):
                        transcription += char   
                        
        return transcription
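The order of lookups in `Transcriber.convert` matters for letters that carry more than one diacritic. `č̭` has no single precomposed code point, so it can only be matched as a whole token in the composed (NFC) pass; once decomposed, its caron and circumflex below are separate characters. The simplified re-implementation below (hypothetical names, punctuation handling omitted) follows steps 1 and 3 of the class docstring using the notebook's token regex.

```python
import re
import unicodedata

# token regex used by the transcription settings: optional superscript
# plus, one letter, then any combining diacritics (U+0300 to U+036F)
TOKEN = re.compile(r'[\u207A]?[^\W\d_][\u0300-\u036F]*')

# č̭ (c + caron + circumflex below) -> '5', ə (schwa) -> 'i'
demo_repl = {
    unicodedata.normalize('NFC', '\u010d\u032d'): '5',
    '\u0259': 'i',
}

def demo_convert(word):
    """Transcribe word: whole-token lookup first, then per-character fallback."""
    out = []
    for tok in TOKEN.findall(unicodedata.normalize('NFC', word)):
        if tok in demo_repl:
            # 1. the composed letter, diacritics included, has an entry
            out.append(demo_repl[tok])
        else:
            # 3. decompose and replace or keep one character at a time
            for ch in unicodedata.normalize('NFD', tok):
                if ch in demo_repl:
                    out.append(demo_repl[ch])
                elif re.match('[a-z]', ch):
                    out.append(ch)
    return ''.join(out)

print(demo_convert('\u010d\u032d\u0259\u00f2'))  # č̭əò -> 5io
print(demo_convert('\u010d\u0259'))              # čə -> ci
```

Plain `č` has no entry of its own here, so it falls through to the decomposed pass and loses its caron.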

In [10]:
# trans_full = {
#     'tokens': f'[\u207A]?[^\W\d_][\u0300-\u036F]*',
#     'replace': {
#         ''
#     }
#     'punctuation': '[\s-]',
#     'keep': '[A-Za-z]',
# }

trans_lite = {
    'tokens': r'[\u207A]?[^\W\d_][\u0300-\u036F]*',
    'replace': {
        'ʾ': ')',
        'ʿ': '(',
        'č': '5',
        'č̭': '5',
        'č̣': '%',
        'ḍ': 'D',
        'ð': '6',
        'ð̣': '^',
        'ġ': 'G',
        'ḥ': 'H',
        'ɟ': '4',
        'Ɉ': '4',
        'k̭': '&',
        'ḷ': 'L',
        'ṃ': 'M',    
        'p̣': 'P',
        'ṛ': 'R',
        'ṣ': 'S',
        'š': '$',
        'ṱ': '<+>',
        'ṭ': 'T',
        'θ': '8',
        'ž': '7',
        'ẓ': 'Z',
        'ā̀': 'A',
        'ā́': 'A',
        'ă': '@',
        'ắ': '@',
        'ằ': '@',
        'ē': 'E',
        'ɛ': '3',
        'ī': 'I',
        'ĭ': '9',
        'ə': '9',
        'o': 'o',
        'ō': 'O',
        'ū': 'U',
        'ŭ': '2',
        'ı': 'i',
        'ɑ': 'a',
        'ˈ': '|'
    },
    'punctuation': r'[\s.,?!:;–\-\u2014]',
    'keep': '[A-Za-z]',
}

fuzzy_urmi = {
    'tokens': r'[\u207A]?[^\W\d_][\u0300-\u036F]*',
    'replace': {
        'c': 'k',
        'c̭': 'k',
        'č': '5',
        'č̭': '5',
        'č̣': '5',
        'k̭': 'q',
        'ɟ': 'g',
        'Ɉ': 'g',
        'ə': 'i',
    },
    'punctuation': r'[\s.,?!:;–\-\u2014]',
    'keep': '[A-Za-z]',
}

fuzzy_barwar = {
    'tokens': r'[\u207A]?[^\W\d_][\u0300-\u036F]*',
    'replace': {
        'č': '5',
        'č̭': '5',
        'č̣': '5',
        'k̭': 'k',
        'θ': 't',
        'ð': 'd',
        'ɛ': 'e',
        'ə': 'i',
    },
    'punctuation': r'[\s.,?!:;–\-\u2014]',
    'keep': '[A-Za-z]',
}

In [11]:
test = 'xòš-məndila'

In [12]:
trans_test = Transcriber(**fuzzy_barwar)

In [13]:
trans_test.convert(test)

'xos-mindila'

# Metadata

In [14]:
slotType = 'letter'

otext = {
    'sectionTypes': 'dialect,text,line',
    'sectionFeatures': 'dialect,title,number',
    'fmt:text-orig-full': '{text}{end}',
    'fmt:text-trans-full': '{trans_f}{etrans_f}',
    'fmt:text-trans-lite': '{trans_l}{etrans_l}',
    'fmt:text-trans-fuzzy': '{t_fuzzy}{efuzzy}',
}

description = ' '.join("""
The NENA linguistic corpus is derived from decades of 
field work by Prof. Geoffrey Khan and his students.
""".split())

generic = {
    'origin': 'Cambridge University, Faculty of Asian and Middle Eastern Studies',
    'author': 'Geoffrey Khan et al.',
    'editors': 'Cody Kingham, Paul Noorlander, James Strachan, Hannes Vlaardingerbroek',
    'researchers': 'Dorota Molin, Johan Lundberg',
    'source': description,
    'url': 'https://github.com/CambridgeSemiticsLab/nena_tf',
}

intFeatures = {'number'}

d = 'about'

featureMeta = {
    'dialect': {d: 'name of a dialect in Northeastern Neo-Aramaic'},
    'title': {d: 'title of a text (story)'},
    'version': {d: 'version of the story if there are multiple instances of the same story'},
    'number': {d: 'sequential number of a paragraph or line within a text or paragraph, respectively'},
    'text': {d: 'plain text representation of a letter, morpheme, or word'},
    'end': {d: 'space, punctuation, or other stylistic text at the end of a morpheme or word'},
    'trans_f': {d: 'full, one-to-one transcription of a letter, morpheme, or word'},
    'trans_l': {d: 'lite transcription of a letter, morpheme, or word, without vowel accents'},
    'etrans_f': {d: 'full transcription of punctuation or other stylistic text at the end of a morpheme or word; see also trans_f'},
    'etrans_l': {d: 'lite transcription of punctuation or other stylistic text at the end of a morpheme or word, excluding intonation boundary markers; see also trans_l'},
    'speaker': {d: 'name or initials of person speaking a morpheme or word; see also informant'},
    'footnotes': {d: 'explanatory footnote on a morpheme or text'},
    'lang': {d: 'language of a morpheme foreign to a text'},
    'foreign': {d: 'indicates whether a morpheme is foreign to a text; see also lang'},
    'comment': {d: 'explanatory comment inserted in the text, stored on a morpheme'},
    'continued_from': {d: 'text is a follow-up to the named text'},
    'informant': {d: 'name of person who spoke these words'},
    'place': {d: 'place a text was recorded'},
    'source': {d: 'name of the file from which a text was converted'},
    'text_id': {d: 'id of a text within its original publication; can overlap between publications'},
    't_fuzzy': {d: 'fuzzy transcription that leaves out most diacritics and maps certain characters in certain dialects to common characters'},
    'efuzzy': {d: 'fuzzy transcription of punctuation or other stylistic text at the end of a morpheme or word, excluding intonation boundary markers; see also t_fuzzy'},
}
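In Text-Fabric, a text format in `otext` is a template over feature names; by default, the text of a node is produced by filling the template for each of its slots (letters here) and concatenating the results. The sketch below imitates that with `str.format_map` over hypothetical letter features for `xòš-`. It illustrates the idea only and is not Text-Fabric's own implementation of `T.text`.

```python
# hypothetical letter slots for the word "xòš-", with the features
# the director stores on letters (only the last letter carries the end)
letters = [
    {'text': 'x', 'end': '', 'trans_f': 'x', 'etrans_f': ''},
    {'text': 'ò', 'end': '', 'trans_f': 'o`', 'etrans_f': ''},
    {'text': 'š', 'end': '-', 'trans_f': 's<', 'etrans_f': '-'},
]

# two of the formats defined in otext above
demo_otext = {
    'fmt:text-orig-full': '{text}{end}',
    'fmt:text-trans-full': '{trans_f}{etrans_f}',
}

def render(slots, fmt):
    """Fill the format template for each slot and concatenate the results."""
    template = demo_otext[f'fmt:{fmt}']
    return ''.join(template.format_map(slot) for slot in slots)

print(render(letters, 'text-orig-full'))   # xòš-
print(render(letters, 'text-trans-full'))  # xo`s<-
```

Because the end of a morpheme is stored only on its last letter, concatenating the letters never duplicates a trailer.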

# Converter

Build a director function that the TF walker (`cv.walk`) calls to walk over the parsed NENA data and build the text graph.
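The walker's central rule is spelled out in the comments of the director below: every slot created while a node is open is embedded in that node. The toy class here is a hypothetical, heavily simplified imitation of that bookkeeping. It is not Text-Fabric's `CV` class and leaves out features, validation, and output; it only shows how the order of `node`, `slot`, and `terminate` calls decides which slots each node contains.

```python
import itertools

class ToyWalker:
    """Hypothetical, simplified imitation of slot embedding in a TF walker."""

    def __init__(self):
        self._ids = itertools.count(1)  # node ids
        self.n_slots = 0                # slots are numbered separately
        self.active = {}                # open nodes: id -> type
        self.contents = {}              # node id -> slot numbers it contains

    def node(self, ntype):
        nid = next(self._ids)
        self.active[nid] = ntype
        self.contents[nid] = []
        return nid

    def slot(self):
        # a new slot is embedded in every node that is still open
        self.n_slots += 1
        for nid in self.active:
            self.contents[nid].append(self.n_slots)
        return self.n_slots

    def terminate(self, nid):
        # closing a node stops it from collecting further slots
        self.active.pop(nid, None)

walker = ToyWalker()
toy_line = walker.node('line')
word1 = walker.node('word')
walker.slot()
walker.slot()
walker.terminate(word1)
word2 = walker.node('word')
walker.slot()
walker.terminate(word2)
walker.terminate(toy_line)
print(walker.contents)  # {1: [1, 2, 3], 2: [1, 2], 3: [3]}
```

The line collects all three slots, while each word only collects the slots added while it was open.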

In [15]:
def make_footnotes(fn_dict):
    """Format footnote dict into string"""
    if fn_dict:
        return '; '.join(
            f'[^{num}]: {txt}' for num, txt in fn_dict.items()
        )
    else:
        return None

def make_wordfeats(mfeat_list, ignore=()):
    """Convert a list of morpheme feature dicts into one for a word.
    
    Most word features are the union of its morphemes' values.
    Some must be inherited in special ways, however. For example,
    a word's "end" feature should be the end of its last morpheme,
    not all of the ends. Those features are specially processed here.
    """
    
    # gather word features here
    word_fs = collections.defaultdict(set)

    # add features
    for mfeats in mfeat_list:
        for feat, val in mfeats.items():
            if feat not in ignore:
                word_fs[feat].add(val)

    # handle special cases: ends come from the last morpheme
    for end in ('end', 'etrans_f', 'etrans_l', 'efuzzy'):
        word_fs[end] = mfeat_list[-1][end]

    # join text and transcriptions with their ends, leaving off
    # the end of the last morpheme, which is stored separately
    text_parts = [
        ('text', 'end'),
        ('trans_f', 'etrans_f'),
        ('trans_l', 'etrans_l'),
        ('t_fuzzy', 'efuzzy'),
    ]
    for part, end in text_parts:
        word_fs[part] = ''.join(
            mf[part] + (mf[end] if i + 1 < len(mfeat_list) else '')
            for i, mf in enumerate(mfeat_list)
        )
    
    # convert remaining sets to strings (sorted for stable output)
    # and drop empty values
    for feat, val in word_fs.items():
        if isinstance(val, set):
            val = sorted(v for v in val if v)
            word_fs[feat] = ' '.join(val) if val else None
        
    return word_fs
    
t_lite = Transcriber(**trans_lite) # transcription lite feature

    
def director(cv):
    """Walk the source data and produce a TF graph"""
    
    info = TF.tm.info
    
    # transcriptions particular to dialects
    dialect2fuzzy = {
        'Barwar': Transcriber(**fuzzy_barwar),
        'Urmi_C': Transcriber(**fuzzy_urmi),
    }
    
    for dialect_dir in sorted(dialect_dirs):  
        
        # make dialect node
        dialect = cv.node('dialect')
        dia = dialect_dir.name
        cv.feature(dialect, dialect=dia)
        
        # retrieve fuzzy transcription particular to dialect
        t_fuzzy = dialect2fuzzy[dia]
        
        # process file into TF graph
        for file in sorted(dialect_dir.glob('*.nena')):
            
            info(f'processing: [{file}]')
            
            with open(file, 'r', encoding='utf-8') as infile:
                nena_text = infile.read()
            
            # parse the .nena format
            header, paragraphs = parser.parse(lexer.tokenize(nena_text))
            
            # -- begin TF node creation --
            
            # cv.node initializes a node object
            # all slots added in between its creation and 
            # termination will be considered embedded within
            # this node; same is true of following cv.node calls
            text = cv.node('text')
            cv.feature(text, **header) # adds features to supplied node
            title = header['title']
            
            for i, para in enumerate(paragraphs):
                
                # TODO: Process footnotes here
                if len(para[0]) != 2:
                    continue
                
                # make paragraph node
                paragraph = cv.node('paragraph')
                cv.feature(paragraph, number=i+1)
                
                for line_num, line_elements in para:
                    
                    # make line nodes
                    line = cv.node('line')
                    cv.feature(line, number=line_num)
                    
                    # Make linguistic nodes by parsing morphemes.
                    # This must be done iteratively, based on the characters
                    # at the end of each morpheme (its trailer).
                    # Spaces and punctuation end words, while a bare hyphen
                    # or '=' joins morphemes within a word; the ˈ mark ends
                    # an intonation group, and punctuation ends subsentences
                    # and sentences. This is handled in the loop below.
                    word = cv.node('word')
                    inton = cv.node('inton')
                    subsentence = cv.node('subsentence')
                    sentence = cv.node('sentence')
                    word_features = [] # store morphs feats here for processing

                    for i, elem in enumerate(line_elements):
                        
                        is_end = i+1 == len(line_elements)

                        # add morphemes as slots
                        # 'slot' being the most basic element
                        if elem.__class__.__name__ == 'Morpheme':
                            
                            # make morpheme node
                            morph = cv.node('morpheme')
                            
                            # access/prepare morph features
                            fs = elem.__dict__
                            trailer = elem.trailer.replace('/', '\n')
                            
                            # package & edit morph features for cv
                            # NB: None values are ignored by default
                            m_string = ''.join(elem.value)
                            feats = {
                                'text': m_string,
                                'trans_f': trans(m_string, trans_full),
                                'trans_l': t_lite.convert(m_string),
                                't_fuzzy': t_fuzzy.convert(m_string),
                                'end': trailer,
                                'etrans_f': trans(trailer, trans_full),
                                'etrans_l': t_lite.convert(trailer),
                                'efuzzy': t_fuzzy.convert(trailer),
                                'speaker': fs.get('speaker') or header.get('informant'),
                                'footnotes': make_footnotes(fs.get('footnotes', {})),
                                'lang': fs.get('lang'),
                                'foreign': str(fs.get('foreign')) if fs.get('foreign') else None,
                            }
                            
                            # make letter slots
                            # creation of a slot simultaneously 
                            # embeds it within all active nodes
                            for let_i, let in enumerate(elem.value):
                                # letter features
                                letfs = {
                                    'text': let,
                                    'end': '',
                                    'trans_f': trans(let, trans_full),
                                    'etrans_f': '',
                                    'trans_l': t_lite.convert(let),
                                    'etrans_l': '',
                                    't_fuzzy': t_fuzzy.convert(let),
                                    'efuzzy': '',
                                }
                                # only the last letter carries the morpheme's end
                                if let_i+1 == len(elem.value):
                                    letfs['end'] = trailer
                                    letfs['etrans_f'] = feats['etrans_f']
                                    letfs['etrans_l'] = feats['etrans_l']
                                    letfs['efuzzy'] = feats['efuzzy']
                                letter = cv.slot()
                                cv.feature(letter, **letfs)
                                cv.terminate(letter)
                            
                            word_features.append(feats)
                            cv.feature(morph, **feats)
                            cv.terminate(morph)
                                
                            # -- trigger linguistic node endings --
                            
                            # word ending
                            if (not re.match('^$|^[-=]$', trailer)) or is_end:
                                cv.feature(word, **make_wordfeats(word_features))
                                word_features = []
                                cv.terminate(word)
                                if not is_end:
                                    word = cv.node('word')

                            # intonation group ending
                            if re.search('\u02c8', trailer):
                                cv.terminate(inton)
                                if not is_end:
                                    inton = cv.node('inton')
                            
                            # subsentence ending
                            if re.search('[,;:\u2014\u2013]', trailer):
                                cv.terminate(subsentence)
                                if not is_end:
                                    subsentence = cv.node('subsentence')
                            
                            # sentence ending
                            elif re.search('[.!?]', trailer):
                                cv.terminate(subsentence)
                                cv.terminate(sentence)
                                if not is_end:
                                    subsentence = cv.node('subsentence')
                                    sentence = cv.node('sentence')
                                
                        # add other elements
                        else:
                            kind, data = elem
                            if kind == 'footnote':
                                cv.feature(text, footnotes=make_footnotes(data))
                            else:
                                cv.feature(morph, **{kind:str(data)})
                    
                    # sanity check for unclosed words, intonation groups, subsentences,
                    # and sentences, caused either by missing punctuation in the source
                    # text (to be fixed later) or by non-morpheme elements intervening
                    # in the iteration
                    unclosed = {'inton','sentence', 'subsentence', 'word'} & cv.activeTypes()
                    if unclosed:
                        sys.stderr.write(f'force-closing types {unclosed} in {title} ln {line_num}\n')
                        cv.terminate(word)
                        cv.terminate(inton)
                        cv.terminate(subsentence)
                        cv.terminate(sentence)
                        
                    # -- trigger section node endings --
                    cv.terminate(line)
                cv.terminate(paragraph)
            cv.terminate(text)
        cv.terminate(dialect)
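The boundary rules in the director depend only on each morpheme's trailer and on whether the morpheme ends the line. The hypothetical helper below, `boundaries`, restates those rules as a pure function so they can be checked in isolation. It uses the same patterns as the director, including the `elif` that lets a subsentence mark take precedence over a sentence mark in the same trailer.

```python
import re

def boundaries(trailer, is_end=False):
    """Return the set of node types that a morpheme's trailer closes."""
    closed = set()
    # a word ends unless the trailer is empty or a bare '-' or '=',
    # or whenever the morpheme is the last one on the line
    if not re.match(r'^$|^[-=]$', trailer) or is_end:
        closed.add('word')
    # the modifier letter ˈ (U+02C8) ends an intonation group
    if '\u02c8' in trailer:
        closed.add('inton')
    # comma, semicolon, colon, em dash, or en dash end a subsentence;
    # otherwise . ! ? end both the subsentence and the sentence
    if re.search('[,;:\u2014\u2013]', trailer):
        closed.add('subsentence')
    elif re.search('[.!?]', trailer):
        closed.update({'subsentence', 'sentence'})
    return closed

for t in ['', '-', ' ', '\u02c8 ', ', ', '. ']:
    print(repr(t), sorted(boundaries(t)))
```

Keeping the rules in one function like this would also make the force-closing cases in the log below easier to trace back to specific trailers.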

## Run the conversion


In [16]:
good = cv.walk(
    director,
    slotType,
    otext=otext,
    generic=generic,
    intFeatures=intFeatures,
    featureMeta=featureMeta,
    warn=True,
    force=False,
)

  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s No structure nodes will be set up
   |   SECTION   TYPES:    dialect, text, line
   |   SECTION   FEATURES: dialect, title, number
   |   STRUCTURE TYPES:    
   |   STRUCTURE FEATURES: 
   |   TEXT      FEATURES:
   |      |   text-orig-full       end, text
   |      |   text-trans-full      etrans_f, trans_f
   |      |   text-trans-fuzzy     efuzzy, t_fuzzy
   |      |   text-trans-lite      etrans_l, trans_l
   |     0.01s OK
   |     0.00s Following director... 
   |     0.00s processing: [../../nena_corpus/nena/0.01/Barwar/A Hundred Gold Coins.nena]
   |     0.11s processing: [../../nena_corpus/nena/0.01/Barwar/A Man Called Čuxo.nena]
   |     0.32s processing: [../../nena_corpus/nena/0.01/Barwar/A Tale of Two Kings.nena]
   |     0.51s processing: [../../nena_corpus/nena/0.01/Barwar/A Tale of a Prince and a Princess.nena]


force-closing types {'subsentence', 'sentence'} in A Tale of a Prince and a Princess ln 32


   |     1.24s processing: [../../nena_corpus/nena/0.01/Barwar/Baby Leliθa.nena]
   |     1.57s processing: [../../nena_corpus/nena/0.01/Barwar/Dəmdəma.nena]
   |     1.82s processing: [../../nena_corpus/nena/0.01/Barwar/Gozali and Nozali.nena]


force-closing types {'subsentence', 'sentence'} in Gozali and Nozali ln 1


   |     2.97s processing: [../../nena_corpus/nena/0.01/Barwar/I Am Worth the Same as a Blind Wolf.nena]
   |     3.23s processing: [../../nena_corpus/nena/0.01/Barwar/Man Is Treacherous.nena]


force-closing types {'subsentence', 'sentence'} in I Am Worth the Same as a Blind Wolf ln 2


   |     3.30s processing: [../../nena_corpus/nena/0.01/Barwar/Measure for Measure.nena]
   |     3.35s processing: [../../nena_corpus/nena/0.01/Barwar/Nanno and Jəndo.nena]
   |     3.54s processing: [../../nena_corpus/nena/0.01/Barwar/Qaṭina Rescues His Nephew From Leliθa.nena]
   |     3.64s processing: [../../nena_corpus/nena/0.01/Barwar/Sour Grapes.nena]
   |     3.66s processing: [../../nena_corpus/nena/0.01/Barwar/Tales From the 1001 Nights.nena]


force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 1
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 2
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 3
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 4
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 5
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 7
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 8
force-closing types {'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 10
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 11
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Nephew From Leliθa ln 12
force-closing types {'subsentence', 'sentence'} in Qaṭina Rescues His Ne

   |     4.49s processing: [../../nena_corpus/nena/0.01/Barwar/The Battle With Yuwanəs the Armenian.nena]
   |     4.65s processing: [../../nena_corpus/nena/0.01/Barwar/The Bear and the Fox.nena]


force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 4
force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 5
force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 20
force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 21
force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 22
force-closing types {'inton', 'sentence'} in The Battle With Yuwanəs the Armenian ln 25
force-closing types {'subsentence', 'sentence'} in The Battle With Yuwanəs the Armenian ln 26


   |     4.86s processing: [../../nena_corpus/nena/0.01/Barwar/The Brother of Giants.nena]
   |     5.01s processing: [../../nena_corpus/nena/0.01/Barwar/The Cat and the Mice.nena]
   |     5.04s processing: [../../nena_corpus/nena/0.01/Barwar/The Cooking Pot.nena]
   |     5.12s processing: [../../nena_corpus/nena/0.01/Barwar/The Crafty Hireling.nena]
   |     5.48s processing: [../../nena_corpus/nena/0.01/Barwar/The Crow and the Cheese.nena]
   |     5.49s processing: [../../nena_corpus/nena/0.01/Barwar/The Daughter of the King.nena]


force-closing types {'subsentence', 'sentence'} in The Crafty Hireling ln 43
force-closing types {'subsentence', 'sentence'} in The Crafty Hireling ln 54
force-closing types {'subsentence', 'sentence'} in The Crow and the Cheese ln 1
force-closing types {'subsentence', 'sentence'} in The Crow and the Cheese ln 2
force-closing types {'subsentence', 'sentence'} in The Crow and the Cheese ln 3
force-closing types {'subsentence', 'sentence'} in The Crow and the Cheese ln 5
force-closing types {'subsentence', 'sentence'} in The Crow and the Cheese ln 6


   |     5.85s processing: [../../nena_corpus/nena/0.01/Barwar/The Fox and the Lion.nena]
   |     5.88s processing: [../../nena_corpus/nena/0.01/Barwar/The Fox and the Miller.nena]
   |     6.10s processing: [../../nena_corpus/nena/0.01/Barwar/The Fox and the Stork.nena]
   |     6.12s processing: [../../nena_corpus/nena/0.01/Barwar/The Giant’s Cave.nena]
   |     6.18s processing: [../../nena_corpus/nena/0.01/Barwar/The Girl and the Seven Brothers.nena]


force-closing types {'sentence'} in The Girl and the Seven Brothers ln 2
force-closing types {'subsentence', 'sentence'} in The Girl and the Seven Brothers ln 3
force-closing types {'sentence'} in The Girl and the Seven Brothers ln 12


   |     6.55s processing: [../../nena_corpus/nena/0.01/Barwar/The King With Forty Sons.nena]
   |     7.28s processing: [../../nena_corpus/nena/0.01/Barwar/The Leliθa From č̭āl.nena]


force-closing types {'sentence'} in The King With Forty Sons ln 40


   |     7.35s processing: [../../nena_corpus/nena/0.01/Barwar/The Lion King.nena]
   |     7.38s processing: [../../nena_corpus/nena/0.01/Barwar/The Lion With a Swollen Leg.nena]
   |     7.48s processing: [../../nena_corpus/nena/0.01/Barwar/The Man Who Cried Wolf.nena]
   |     7.53s processing: [../../nena_corpus/nena/0.01/Barwar/The Man Who Wanted to Work.nena]
   |     7.81s processing: [../../nena_corpus/nena/0.01/Barwar/The Monk Who Wanted to Know When He Would Die.nena]
   |     7.92s processing: [../../nena_corpus/nena/0.01/Barwar/The Monk and the Angel.nena]
   |     8.28s processing: [../../nena_corpus/nena/0.01/Barwar/The Priest and the Mullah.nena]
   |     8.42s processing: [../../nena_corpus/nena/0.01/Barwar/The Sale of an Ox.nena]
   |     8.77s processing: [../../nena_corpus/nena/0.01/Barwar/The Scorpion and the Snake.nena]
   |     8.83s processing: [../../nena_corpus/nena/0.01/Barwar/The Selfish Neighbour.nena]
   |     8.87s processing: [../../nena_corpus/nena/0.01/

force-closing types {'subsentence', 'sentence'} in The Sale of an Ox ln 41
force-closing types {'subsentence', 'sentence'} in The Sisisambər Plant ln 2
force-closing types {'subsentence', 'sentence'} in The Sisisambər Plant ln 8
force-closing types {'subsentence', 'sentence'} in The Sisisambər Plant ln 9
force-closing types {'subsentence', 'sentence'} in The Sisisambər Plant ln 14
force-closing types {'subsentence', 'sentence'} in The Sisisambər Plant ln 15


   |     8.94s processing: [../../nena_corpus/nena/0.01/Barwar/The Story With No End.nena]
   |     8.98s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Farxo and Səttiya.nena]


force-closing types {'inton'} in The Tale of Farxo and Səttiya ln 29


   |     9.60s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Mămo and Zine.nena]


force-closing types {'subsentence', 'sentence'} in The Tale of Mămo and Zine ln 22


   |       11s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Mərza Pămət.nena]
   |       11s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Nasimo.nena]
   |       11s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Parizada, Warda and Nargis.nena]


force-closing types {'subsentence', 'sentence'} in The Tale of Mərza Pămət ln 32
force-closing types {'sentence'} in The Tale of Nasimo ln 3
force-closing types {'subsentence', 'sentence'} in The Tale of Nasimo ln 4
force-closing types {'subsentence', 'sentence'} in The Tale of Nasimo ln 5
force-closing types {'sentence'} in The Tale of Nasimo ln 6
force-closing types {'subsentence', 'sentence'} in The Tale of Nasimo ln 7
force-closing types {'inton'} in The Tale of Parizada, Warda and Nargis ln 29


   |       12s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Rustam (1).nena]


force-closing types {'subsentence', 'sentence'} in The Tale of Parizada, Warda and Nargis ln 55


   |       12s processing: [../../nena_corpus/nena/0.01/Barwar/The Tale of Rustam (2).nena]
   |       13s processing: [../../nena_corpus/nena/0.01/Barwar/The Wise Daughter of the King.nena]
   |       13s processing: [../../nena_corpus/nena/0.01/Barwar/The Wise Snake.nena]


force-closing types {'subsentence', 'sentence'} in The Tale of Rustam (2) ln 51


   |       13s processing: [../../nena_corpus/nena/0.01/Barwar/The Wise Young Man.nena]


force-closing types {'sentence'} in The Wise Snake ln 1


   |       13s processing: [../../nena_corpus/nena/0.01/Barwar/šošət Xere.nena]
   |       13s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Close Shave.nena]
   |       13s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Cure for a Husband’s Madness.nena]


force-closing types {'subsentence', 'sentence'} in The Wise Young Man ln 25
force-closing types {'subsentence', 'sentence'} in šošət Xere ln 6
force-closing types {'sentence'} in šošət Xere ln 7
force-closing types {'subsentence', 'inton', 'sentence'} in šošət Xere ln 8
force-closing types {'sentence'} in šošət Xere ln 10
force-closing types {'subsentence', 'sentence'} in šošət Xere ln 11
force-closing types {'inton'} in A Cure for a Husband’s Madness ln 1
force-closing types {'inton'} in A Cure for a Husband’s Madness ln 4
force-closing types {'inton'} in A Cure for a Husband’s Madness ln 5
force-closing types {'inton'} in A Cure for a Husband’s Madness ln 6


   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Donkey Knows Best.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Dragon in the Well.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Dutiful Son.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Frog Wants a Husband.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Lost Donkey.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Lost Ring.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Painting of the King of Iran.nena]
   |       15s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Pound of Flesh.nena]
   |       16s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Sweater to Pay Off a Debt.nena]
   |       16s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Thousand Dinars.nena]
   |       16s processing: [../../nena_corpus/nena/0.01/Urmi_C/A Visit From Harun Ar-Rashid.nena]
   |       16s

force-closing types {'inton'} in A Thousand Dinars ln 13


   |       17s processing: [../../nena_corpus/nena/0.01/Urmi_C/Am I Dead?.nena]
   |       17s processing: [../../nena_corpus/nena/0.01/Urmi_C/An Orphan Duckling.nena]
   |       17s processing: [../../nena_corpus/nena/0.01/Urmi_C/Axiqar.nena]


force-closing types {'subsentence', 'word', 'inton', 'sentence'} in Axiqar ln 28


   |       18s processing: [../../nena_corpus/nena/0.01/Urmi_C/Events in 1946 on the Urmi Plain.nena]
   |       18s processing: [../../nena_corpus/nena/0.01/Urmi_C/Games.nena]


force-closing types {'subsentence', 'sentence'} in Axiqar ln 89


   |       18s processing: [../../nena_corpus/nena/0.01/Urmi_C/Hunting.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/I Have Died.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Ice for Dinner.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Is There a Man With No Worries?.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Kindness to a Donkey.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Lost Money.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Mistaken Identity.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Much Ado About Nothing.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Nipuxta.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/No Bread Today.nena]
   |       19s processing: [../../nena_corpus/nena/0.01/Urmi_C/Problems Lighting a Fire.nena]
   |       19s processing: [../../nena_corpus/nena/0.

force-closing types {'inton'} in The Adventures of Ashur ln 27


   |       21s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Adventures of a Princess.nena]
   |       22s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Angel of Death.nena]
   |       22s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Assyrians of Armenia.nena]
   |       22s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Assyrians of Urmi.nena]


force-closing types {'subsentence', 'sentence'} in The Assyrians of Armenia ln 10


   |       22s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Bald Child and the Monsters.nena]
   |       23s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Bald Man and the King.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Bird and the Fox.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Cat’s Dinner.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Cow and the Poor Girl.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Dead Rise and Return.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Fisherman and the Princess.nena]
   |       24s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Giant One-Eyed Demon.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Little Prince and the Snake.nena]


force-closing types {'inton'} in The Fisherman and the Princess ln 2
force-closing types {'inton'} in The Fisherman and the Princess ln 3


   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Loan of a Cooking Pot.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Man Who Wanted to Complain to God.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Old Man and the Fish.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Purchase of a Donkey.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Snake’s Dilemma.nena]
   |       25s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Stupid Carpenter.nena]
   |       26s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Wife Who Learns How to Work (2).nena]


force-closing types {'sentence'} in The Snake’s Dilemma ln 13


   |       26s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Wife Who Learns How to Work.nena]
   |       26s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Wife’s Condition.nena]
   |       26s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Wise Brother.nena]


force-closing types {'inton'} in The Wife Who Learns How to Work ln 1
force-closing types {'inton'} in The Wife Who Learns How to Work ln 7


   |       27s processing: [../../nena_corpus/nena/0.01/Urmi_C/The Wise Young Daughter.nena]
   |       28s processing: [../../nena_corpus/nena/0.01/Urmi_C/Trickster.nena]
   |       28s processing: [../../nena_corpus/nena/0.01/Urmi_C/Two Birds Fall in Love.nena]
   |       28s processing: [../../nena_corpus/nena/0.01/Urmi_C/Two Wicked Daughters-In-Law.nena]
   |       28s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life (2).nena]


force-closing types {'subsentence', 'inton', 'sentence'} in Two Wicked Daughters-In-Law ln 9


   |       28s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life (3).nena]
   |       29s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life (4).nena]
   |       29s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life (5).nena]
   |       29s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life (6).nena]


force-closing types {'inton'} in Village Life (5) ln 1
force-closing types {'inton'} in Village Life (6) ln 34


   |       30s processing: [../../nena_corpus/nena/0.01/Urmi_C/Village Life.nena]
   |       30s processing: [../../nena_corpus/nena/0.01/Urmi_C/Vineyards.nena]


force-closing types {'inton'} in Village Life ln 1
force-closing types {'inton'} in Village Life ln 5
force-closing types {'inton'} in Village Life ln 18
force-closing types {'inton'} in Village Life ln 20


   |       30s processing: [../../nena_corpus/nena/0.01/Urmi_C/Weddings and Festivals.nena]
   |       32s processing: [../../nena_corpus/nena/0.01/Urmi_C/Weddings.nena]
   |       32s processing: [../../nena_corpus/nena/0.01/Urmi_C/When Shall I Die?.nena]
   |       32s processing: [../../nena_corpus/nena/0.01/Urmi_C/Women Are Stronger Than Men.nena]
   |       32s processing: [../../nena_corpus/nena/0.01/Urmi_C/Women Do Things Best.nena]
   |       32s "edge" actions: 0
   |       32s "feature" actions: 756315
   |       32s "node" actions: 294155
   |       32s "resume" actions: 0
   |       32s "slot" actions: 539381
   |       32s "terminate" actions: 833720
   |          2 x "dialect" node 
   |      35985 x "inton" node 
   |     539381 x "letter" node  = slot type
   |       2544 x "line" node 
   |     120148 x "morpheme" node 
   |        351 x "paragraph" node 
   |      16708 x "sentence" node 
   |      24528 x "subsentence" node 
   |        126 x "text" node 
   |      9