<img align="right" src="tf-small.png"/>

![mql](emdros.png)

# TF from MQL

This notebook can read an
[MQL](https://emdros.org/mql.html)
dump of a version of the [BHSA](https://github.com/ETCBC/bhsa) Hebrew Text Database
and transform it into a
[Text-Fabric](https://github.com/ETCBC/text-fabric)
resource.

## Discussion

The principled way of going about such a conversion is to import the MQL source into
an [Emdros](https://emdros.org) database, and use it to retrieve objects and features from there.

Because the syntax of an MQL file leaves some freedom, a text-to-text conversion from
MQL to something else is error-prone.

Yet the error-prone thing is exactly what we do here. This way we avoid installing, configuring and managing Emdros and MySQL/SQLite3.
Apart from the upfront work to get that going, the conversion itself would also be much slower.

So here you are: a smallish script that does an awful lot of work, mostly correctly, if used carefully.

# Caveat

This notebook makes use of a new feature of text-fabric, first present in 2.3.12.
Make sure to upgrade first.

```sh
sudo -H pip3 install --upgrade text-fabric
```

In [1]:
import os,sys,re,collections
from shutil import rmtree
from tf.fabric import Fabric
from tf.helpers import setFromSpec
import utils
from blang import bookLangs, bookNames

# Pipeline
See [operation](https://github.com/ETCBC/pipeline/blob/master/README.md#operation) 
for how to run this script in the pipeline.

In [2]:
if 'SCRIPT' not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = 'bhsa'
    VERSION = 'c'

def stop(good=False):
    if SCRIPT: sys.exit(0 if good else 1)

# Setting up the context: source file and target directories

The conversion is executed in an environment of directories, so that sources, temp files and
results are in convenient places and do not have to be shifted around.

In [5]:
repoBase = os.path.expanduser('~/github/etcbc')
thisRepo = '{}/{}'.format(repoBase, CORE_NAME)

thisSource = '{}/source/{}'.format(thisRepo, VERSION)
mqlzFile = '{}/{}.mql.bz2'.format(thisSource, CORE_NAME)

thisTemp = '{}/_temp/{}'.format(thisRepo, VERSION)
thisTempSource = '{}/source'.format(thisTemp)
mqlFile = '{}/{}.mql'.format(thisTempSource, CORE_NAME)
thisTempTf = '{}/tf'.format(thisTemp)

thisTf = '{}/tf/{}'.format(thisRepo, VERSION)

# Test

Check whether this conversion is needed in the first place.
This check is only performed when the notebook runs as a script.

In [7]:
if SCRIPT:
    testFile = '{}/.tf/otype.tfx'.format(thisTf)
    (good, work) = utils.mustRun(mqlzFile, testFile, force=FORCE)
    if not good: stop(good=False)
    if not work: stop(good=True)
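
The implementation of `utils.mustRun` is not shown in this notebook. As a rough sketch of what such a check might do, assuming it compares modification times of the source and a target file, a hypothetical `needs_rebuild` helper could look like this. The name, the signature and the exact rules are assumptions made for illustration only.

```python
import os


def needs_rebuild(source, target, force=False):
    """Hypothetical stand-in for a freshness check.

    Returns (good, work):
    good is False when the source is missing,
    work is True when the target should be (re)built.
    """
    if not os.path.exists(source):
        return (False, False)
    if force or not os.path.exists(target):
        return (True, True)
    # rebuild only when the source is newer than the target
    return (True, os.path.getmtime(source) > os.path.getmtime(target))
```

With a result like this, a script can stop with an error when `good` is false, and stop successfully when `work` is false.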

# TF Settings

We add some custom information here.

* the MQL object type that corresponds to the TF slot type, typically `word`;
* a piece of metadata that will go into every feature; the time will be added automatically
* suitable text formats for the `otext` feature of TF.

The oText feature is very sensitive to what is available in the source MQL.
It needs to be configured here.
We save the configs we need per source and version.
And we define a stripped down default version to start with.

In [8]:
slotType = 'word'

featureMetadata = dict(
    dataset='BHSA',
    datasetName='Biblia Hebraica Stuttgartensia Amstelodamensis',
    author='Eep Talstra Centre for Bible and Computer',
    encoders='Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)',
    website='https://shebanq.ancient-data.org',
    email='shebanq@ancient-data.org',
)

oText = {
    '': '''
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
@fmt:text-orig-full={g_word_utf8}{g_suffix_utf8}
''',
    '4': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}
@fmt:text-orig-full-ketiv={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word} 
@fmt:text-trans-full-ketiv={g_word} 
@fmt:text-trans-plain={g_cons} 
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
''',
    '4b': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}
@fmt:text-orig-full-ketiv={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word} 
@fmt:text-trans-full-ketiv={g_word} 
@fmt:text-trans-plain={g_cons} 
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
''',
    'c': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word}{trailer}
@fmt:text-trans-plain={g_cons}{trailer}
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
''',
    '2016': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word}{trailer}
@fmt:text-trans-plain={g_cons}{trailer}
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
''',
    '2017': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word}{trailer}
@fmt:text-trans-plain={g_cons}{trailer}
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
''',
}

The next function selects the proper otext material, falling back on a default if nothing 
appropriate has been specified in `oText`.

In [9]:
def getOtext():
    thisOtext = oText.get(VERSION, oText[''])
    otextInfo = dict(line[1:].split('=', 1) for line in thisOtext.strip('\n').split('\n'))

    if thisOtext is oText['']:
        utils.caption(0, 'WARNING: no otext feature info provided, using a meager default value') 
    else:
        utils.caption(0, 'INFO: otext feature information found')
    for x in sorted(otextInfo.items()):
        utils.caption(0, '\t{:<20} = "{}"'.format(*x))
    return otextInfo
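
To make the parsing step in `getOtext` concrete, here is a small stand-alone sketch that applies the same splitting logic to a shortened specification. The function name `parse_otext` and the sample string are only for illustration.

```python
def parse_otext(spec):
    """Turn lines of the form '@key=value' into a dict.

    The leading '@' is dropped and each line is split on the first '=' only,
    so values may themselves contain '=' and keep their trailing spaces.
    """
    lines = spec.strip('\n').split('\n')
    return dict(line[1:].split('=', 1) for line in lines)


sample = '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:text-orig-full={g_word_utf8}{trailer_utf8}
@sectionTypes=book,chapter,verse
'''

info = parse_otext(sample)
for (key, value) in sorted(info.items()):
    print('{:<20} = "{}"'.format(key, value))
```

Note that the trailing space in a lexeme format is part of the value: it is what separates consecutive lexemes when the format is applied.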

# Overview

The program has several stages:
   
1. **prepare** the source (bunzip if needed)
1. **parse MQL** and collect information in data structures
1. **transform to TF**: write the data structures as TF features
1. **differences** (informational)
1. **deliver** the TF data at its destination directory
1. **compile** all TF features to binary format

Stages **parse MQL** and **transform to TF** communicate with the help of several global variables:

* data containers for the MQL kinds of data
  * enumerations
  * object types
  * tables

* data containers for the TF features to be generated,
  * node features
  * edge features.

In [10]:
objectTypes = dict()
tables = dict()

edgeF = dict()
nodeF = dict()
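
To give an impression of what these containers hold once parsing is done, here is a tiny hand-made example of their shapes, derived from how the parser fills them. The identifiers, feature values and the helper `isEdgeFeature` are made up for illustration and are not taken from the real data.

```python
# All concrete values below are illustrative only.

# object type -> feature name -> (TF value type, default value or None)
objectTypes = {
    'word': {'sp': ('str', None)},
    'clause': {'mother': ('int', None)},
}

# object type -> id_d -> object with its feature values and its monads (slots)
tables = {
    'word': {1001: dict(feats={'sp': 'verb'}, monads={1})},
    'clause': {2001: dict(feats={'mother': '2000'}, monads={1, 2, 3})},
}

# object type -> names of its node features, resp. edge features
nodeF = {'word': {'sp'}}
edgeF = {'clause': {'mother'}}


def isEdgeFeature(objectType, feature):
    """Hypothetical helper: is this feature an edge for this object type?"""
    return feature in edgeF.get(objectType, set())


print(isEdgeFeature('clause', 'mother'))
print(isEdgeFeature('word', 'sp'))
```

Edge values are kept as strings at this point; they are only turned into node numbers in the TF generation stage.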

# Stage: Prepare

Check the source, bunzip it if needed, empty the result directory.

In [16]:
def prepare():
    global thisoText

    if not os.path.exists(thisTempSource):
        os.makedirs(thisTempSource)

    utils.caption(0, 'bunzipping {} ...'.format(mqlzFile))
    utils.bunzip(mqlzFile, mqlFile)
    utils.caption(0, 'Done')

    if os.path.exists(thisTempTf): rmtree(thisTempTf)
    os.makedirs(thisTempTf)

    thisoText = getOtext()

A monads specification (a comma-separated sequence of numbers and number ranges)
is converted into a set of integers by `setFromSpec`, imported from `tf.helpers` above.
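
As an illustration of this kind of conversion, here is a minimal stand-alone sketch. It is not the Text-Fabric implementation; the name `monads_to_set` is hypothetical, and the sketch assumes that ranges are written as `start-end` with both ends included.

```python
def monads_to_set(spec):
    """Convert a spec such as '1-3,7' into the set {1, 2, 3, 7}."""
    result = set()
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            (start, end) = part.split('-', 1)
            # both ends of a range are included
            result.update(range(int(start), int(end) + 1))
        else:
            result.add(int(part))
    return result


print(sorted(monads_to_set('1-3,7')))
```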

# Stage: MQL parsing
Plough through the MQL file and grab all relevant information
and put it into the dedicated data structure.

In [12]:
uniscan = re.compile(r'(?:\\x..)+')

def makeuni(match):
    ''' Make proper unicode of a text that contains byte escape codes such as backslash xb6
    '''
    # turn the matched run of \xNN escapes into raw bytes, without resorting to eval()
    byts = bytes.fromhex(match.group(0).replace('\\x', ''))
    return byts.decode('utf-8')

def uni(line): return uniscan.sub(makeuni, line)
    
def parseMql():
    utils.caption(4, 'Parsing mql source ...')
    fh = open(mqlFile)

    curId = None
    curEnum = None
    curObjectType = None
    curTable = None
    curObject = None
    curValue = None
    curFeature = None

    STRING_TYPES = {'ascii', 'string'}

    enums = dict()

    chunkSize = 1000000
    inThisChunk = 0

    good = True

    for (ln, line) in enumerate(fh):
        inThisChunk += 1
        if inThisChunk == chunkSize:
            utils.caption(0, '\tline {:>9}'.format(ln + 1))
            inThisChunk = 0
        if line.startswith('CREATE OBJECTS WITH OBJECT TYPE') or line.startswith('WITH OBJECT TYPE'):
            comps = line.rstrip().rstrip(']').split('[', 1)
            curTable = comps[1]
            utils.caption(0, '\t\tobjects in {}'.format(curTable))
            curObject = None
            if not curTable in tables:
                tables[curTable] = dict()
        elif curEnum != None:
            if line.startswith('}'):
                curEnum = None
                continue
            comps = line.strip().rstrip(',').split('=', 1)
            comp = comps[0].strip()
            words = comp.split()
            if words[0] == 'DEFAULT':
                enums[curEnum]['default'] = uni(words[1])
                value = words[1]
            else:
                value = words[0]
            enums[curEnum]['values'].append(value)
        elif curObjectType != None:
            if line.startswith(']'):
                curObjectType = None
                continue
            if curObjectType == True:
                if line.startswith('['):
                    curObjectType = line.rstrip()[1:]
                    objectTypes[curObjectType] = dict()
                    utils.caption(0, '\t\totype {}'.format(curObjectType))
                    continue
            comps = line.strip().rstrip(';').split(':', 1)
            feature = comps[0].strip()
            fInfo = comps[1].strip()
            fCleanInfo = fInfo.replace('FROM SET', '')
            fInfoComps = fCleanInfo.split(' ', 1)
            fMQLType = fInfoComps[0]
            fDefault = fInfoComps[1].strip().split(' ', 1)[1] if len(fInfoComps) == 2 else None
            if fDefault != None and fMQLType in STRING_TYPES:
                fDefault = uni(fDefault[1:-1])
            default = enums.get(fMQLType, {}).get('default', fDefault)
            ftype = 'str' if fMQLType in enums else\
                    'int' if fMQLType == 'integer' else\
                    'str' if fMQLType in STRING_TYPES else\
                    'int' if fInfo == 'id_d' else\
                    'str'
            isEdge = fMQLType == 'id_d'
            if isEdge:
                edgeF.setdefault(curObjectType, set()).add(feature)
            else:
                nodeF.setdefault(curObjectType, set()).add(feature)

            objectTypes[curObjectType][feature] = (ftype, default)
            utils.caption(0, '\t\t\tfeature {} ({}) =def= {} : {}'.format(feature, ftype, default, 'edge' if isEdge else 'node'))
        elif curTable != None:
            if curObject != None:
                if line.startswith(']'):
                    objectType = objectTypes[curTable]
                    for (feature, (ftype, default)) in objectType.items():
                        if feature not in curObject['feats'] and default != None:
                            curObject['feats'][feature] = default
                    tables[curTable][curId] = curObject
                    curObject = None
                    continue
                elif line.startswith('['):
                    continue
                elif line.startswith('FROM MONADS'):
                    monads = line.split('=', 1)[1].replace('{', '').replace('}', '').replace(' ','').strip()
                    curObject['monads'] = setFromSpec(monads)
                elif line.startswith('WITH ID_D'):
                    comps = line.replace('[', '').rstrip().split('=', 1)
                    curId = int(comps[1])
                elif line.startswith('GO'):
                    continue
                elif line.strip() == '':
                    continue
                else:
                    if curValue != None:
                        toBeContinued = not line.rstrip().endswith('";')
                        if toBeContinued:
                            curValue += line
                        else:
                            curValue += line.rstrip().rstrip(';').rstrip('"')
                            curObject['feats'][curFeature] = uni(curValue)
                            curValue = None
                            curFeature = None
                        continue
                    if ':=' in line:
                        (featurePart, valuePart) = line.split('=', 1)
                        feature = featurePart[0:-1].strip()
                        isText = ':="' in line
                        toBeContinued = isText and not line.rstrip().endswith('";')
                        if toBeContinued:
                            # this happens if a feature value contains a newline
                            # we must continue scanning lines until we meet the end of the value
                            curFeature = feature
                            curValue = valuePart.lstrip('"')
                        else:
                            value = valuePart.rstrip().rstrip(';').strip('"')
                            curObject['feats'][feature] = uni(value) if isText else value
                    else:
                        utils.caption(0, 'ERROR: line {}: unrecognized line -->{}<--'.format(ln, line))
                        good = False
                        break
            else:
                if line.startswith('CREATE OBJECT'):
                    curObject = dict(feats=dict(), monads=None)
                    curId = None
        else:
            if line.startswith('CREATE ENUMERATION'):
                words = line.split()
                curEnum = words[2]
                enums[curEnum] = dict(default=None, values=[])
                utils.caption(0, '\t\tenum {}'.format(curEnum))
            elif line.startswith('CREATE OBJECT TYPE'):
                curObjectType = True
    utils.caption(0, '{} lines parsed'.format(ln + 1))
    fh.close()
    for table in tables:
        utils.caption(0, '{} objects of type {}'.format(len(tables[table]), table))
    if not good:
        stop(good=False)
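
String values in the MQL dump carry non-ASCII characters as runs of byte escapes such as `\xd7\x90`. The following stand-alone sketch shows one way to turn such runs back into proper Unicode, assuming the escaped bytes form valid UTF-8. The names `ESCAPES` and `decode_escapes` are illustrative, and the pattern is stricter than `uniscan` in that it only accepts hexadecimal digits.

```python
import re

# one or more consecutive \xNN escapes, NN being two hex digits
ESCAPES = re.compile(r'(?:\\x[0-9a-fA-F]{2})+')


def decode_escapes(text):
    """Replace each run of \\xNN escapes by the UTF-8 text it encodes."""
    def fix(match):
        raw = bytes.fromhex(match.group(0).replace('\\x', ''))
        return raw.decode('utf-8')
    return ESCAPES.sub(fix, text)


# the two escaped bytes d7 90 are the UTF-8 encoding of HEBREW LETTER ALEF
print(decode_escapes(r'\xd7\x90') == '\u05d0')
```

A whole run must be decoded at once, because a single Hebrew character occupies two bytes in UTF-8.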

# Stage: TF generation
Transform the collected information into feature-like data structures, and write it all
out to `.tf` files.

In [13]:
def tfFromData():
    utils.caption(4, 'Making TF data ...')
    
    NIL = {'nil', 'NIL', 'Nil'}

    tableOrder = [slotType]+[t for t in sorted(tables) if t != slotType]

    nodeFromIdd = dict()
    iddFromNode = dict()

    nodeFeatures = dict()
    edgeFeatures = dict()
    metaData = dict()

    # metadata that ends up in every feature
    metaData[''] = featureMetadata

    # the config feature otext
    metaData['otext'] = thisoText

    # multilingual book names
    for (langCode, (langEnglish, langName)) in bookLangs.items():
        metaData['book@{}'.format(langCode)] = {
            'valueType': 'str',
            'language': langName,
            'languageCode': langCode,
            'languageEnglish': langEnglish,
        }

    utils.caption(0, 'Monad - idd mapping ...')
    otype = dict()
    for idd in tables.get(slotType, {}):
        monad = list(tables[slotType][idd]['monads'])[0]
        nodeFromIdd[idd] = monad
        iddFromNode[monad] = idd
        otype[monad] = slotType

    maxSlot = max(nodeFromIdd.values()) if len(nodeFromIdd) else 0
    utils.caption(0, 'maxSlot={}'.format(maxSlot))

    utils.caption(0, 'Node mapping and otype ...')
    node = maxSlot
    for t in tableOrder[1:]:
        for idd in sorted(tables[t]):
            node += 1
            nodeFromIdd[idd] = node
            iddFromNode[node] = idd
            otype[node] = t

    nodeFeatures['otype'] = otype
    metaData['otype'] = dict(
        valueType='str',
    )

    utils.caption(0, 'oslots ...')
    oslots = dict()
    for t in tableOrder[1:]:
        for idd in tables.get(t, {}):
            node = nodeFromIdd[idd]
            monads = tables[t][idd]['monads']
            oslots[node] = monads
    edgeFeatures['oslots'] = oslots
    metaData['oslots'] = dict(
        valueType='str',
    )

    utils.caption(0, 'metadata ...')
    for t in nodeF:
        for f in nodeF[t]:
            ftype = objectTypes[t][f][0]
            metaData.setdefault(f, {})['valueType'] = ftype
    for t in edgeF:
        for f in edgeF[t]:
            metaData.setdefault(f, {})['valueType'] = 'str'

    utils.caption(4, 'features ...')
    chunkSize = 100000
    for t in tableOrder:
        utils.caption(0, '\tfeatures from {}s'.format(t))
        inThisChunk = 0
        for (i, idd) in enumerate(tables.get(t, {})):
            inThisChunk += 1
            if inThisChunk == chunkSize:
                utils.caption(0, '\t{:>9} {}s'.format(i + 1, t))
                inThisChunk = 0
            node = nodeFromIdd[idd]
            features = tables[t][idd]['feats']
            for (f, v) in features.items():
                isEdge = f in edgeF.get(t, set())
                if isEdge:
                    if v not in NIL:
                        edgeFeatures.setdefault(f, {}).setdefault(node, set()).add(nodeFromIdd[int(v)])
                else:
                    nodeFeatures.setdefault(f, {})[node] = v
        utils.caption(0, '\t{:>9} {}s'.format(i + 1, t))

    utils.caption(0, 'book names ...')
    nodeFeatures['book@la'] = nodeFeatures.get('book', {})
    bookNodes = sorted(nodeFeatures.get('book', {}))
    for (langCode, langBookNames) in bookNames.items():
        nodeFeatures['book@{}'.format(langCode)] = dict(zip(bookNodes, langBookNames))

    utils.caption(4, 'write data set to TF ...')

    TF = Fabric(locations=thisTempTf, silent=True)
    TF.save(nodeFeatures=nodeFeatures, edgeFeatures=edgeFeatures, metaData=metaData)
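
The node numbering in `tfFromData` can be tried out in isolation. The sketch below mirrors that logic on a toy `tables` dictionary: slot objects get their monad as node number, and all other objects are numbered after the last slot, type by type in alphabetical order and by increasing id within each type. The function name `number_nodes` and the sample data are made up for illustration, and `min` is used to pick the single monad of a slot object.

```python
def number_nodes(tables, slot_type):
    """Return (node_from_id, otype) for the objects in tables."""
    node_from_id = {}
    otype = {}

    # slots: the node number is the (single) monad of the object
    for (idd, obj) in tables.get(slot_type, {}).items():
        monad = min(obj['monads'])
        node_from_id[idd] = monad
        otype[monad] = slot_type

    # other objects: continue counting after the last slot
    node = max(node_from_id.values(), default=0)
    for t in sorted(tables):
        if t == slot_type:
            continue
        for idd in sorted(tables[t]):
            node += 1
            node_from_id[idd] = node
            otype[node] = t
    return (node_from_id, otype)


toy = {
    'word': {10: dict(monads={1}), 11: dict(monads={2})},
    'phrase': {5: dict(monads={1, 2})},
    'clause': {7: dict(monads={1, 2})},
}
(node_from_id, otype) = number_nodes(toy, 'word')
print(node_from_id)
print(otype)
```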


# Stage: Diffs

Check differences with previous versions.

The new dataset has been created in a temporary directory,
and has not yet been copied to its destination.

Here is your opportunity to compare the newly created features with the older features.
You expect some differences in some features.

We check the differences between the previous version of the features and what has been generated.
We list features that will be added and deleted and changed.
For each changed feature we show the first line where the new feature differs from the old one.
We ignore changes in the metadata, because the timestamp in the metadata will always change.
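
The comparison itself is done by `utils.checkDiffs`, whose code is not part of this notebook. As a rough sketch of the idea, assuming that the metadata of a `.tf` file consists of the leading lines that start with `@`, a hypothetical `first_difference` function could skip that metadata and report the first data line that differs:

```python
from itertools import zip_longest


def data_part(lines):
    """Drop the leading metadata lines (assumed to start with '@')."""
    i = 0
    while i < len(lines) and lines[i].startswith('@'):
        i += 1
    return lines[i:]


def first_difference(old_lines, new_lines):
    """Return (line number in the data part, old, new) or None if equal.

    A missing line on either side is reported as None.
    """
    pairs = zip_longest(data_part(old_lines), data_part(new_lines))
    for (n, (old, new)) in enumerate(pairs, start=1):
        if old != new:
            return (n, old, new)
    return None


old = ['@valueType=str', '@dateWritten=2017-01-01', '', 'prep', 'subs']
new = ['@valueType=str', '@dateWritten=2018-01-01', '', 'prep', 'verb']
print(first_difference(old, new))
```

In this sketch the changed timestamp in the metadata is ignored, and only the changed data line is reported.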

# Stage: Deliver 

Copy the new TF dataset from the temporary location where it has been created to its final destination.
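
The actual delivery is handled by `utils.deliverDataset`, which is not shown here. A minimal sketch of such a step, assuming that the destination is simply replaced by a fresh copy of the temporary directory, might look as follows. The function name `deliver` is hypothetical.

```python
import os
import shutil


def deliver(src, dst):
    """Replace directory dst by a copy of directory src."""
    if os.path.exists(dst):
        # remove the old version, including files that no longer exist in src
        shutil.rmtree(dst)
    # copytree creates dst and any missing parent directories
    shutil.copytree(src, dst)
```

Removing the old directory first makes sure that deleted features do not linger at the destination.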

# Stage: Compile TF

Just to see whether everything loads and the precomputing of extra information works out.
Moreover, if you want to work with these features, then the precomputing has already been done, and everything is quicker in subsequent runs.

We issue a load statement to trigger the precomputing of extra data.
Note that all features used in the text formats of the `otext` config feature
will be loaded, as well as the features for sections.

At that point we have access to the full list of features.
We grab them and load them all.

In [22]:
def compileTfData():
    utils.caption(4, 'Load and compile standard TF features')
    TF = Fabric(locations=thisTf, modules=[''])
    api = TF.load('')

    utils.caption(4, 'Load and compile all other TF features')
    allFeatures = TF.explore(silent=False, show=True)
    loadableFeatures = allFeatures['nodes'] + allFeatures['edges']
    api = TF.load(loadableFeatures)
    T = api.T
    
    utils.caption(4, 'Basic test')
    utils.caption(4, 'First verse in all formats')
    for fmt in T.formats:
        utils.caption(0, '{}'.format(fmt), continuation=True)
        utils.caption(0, '\t{}'.format(T.text(range(1,12), fmt=fmt)), continuation=True)

# Run it!

In [17]:
prepare()

|      1m 10s bunzipping /Users/dirk/github/etcbc/bhsa/source/c/bhsa.mql.bz2 ...
|      1m 10s 	NOTE: Using existing unzipped file which is newer than bzipped one
|      1m 10s Done
|      1m 10s INFO: otext feature information found
|      1m 10s 	fmt:lex-orig-full    = "{g_lex_utf8} "
|      1m 10s 	fmt:lex-orig-plain   = "{lex_utf8} "
|      1m 10s 	fmt:lex-trans-full   = "{g_lex} "
|      1m 10s 	fmt:lex-trans-plain  = "{lex} "
|      1m 10s 	fmt:text-orig-full   = "{g_word_utf8}{trailer_utf8}"
|      1m 10s 	fmt:text-orig-plain  = "{g_cons_utf8}{trailer_utf8}"
|      1m 10s 	fmt:text-trans-full  = "{g_word}{trailer}"
|      1m 10s 	fmt:text-trans-plain = "{g_cons}{trailer}"
|      1m 10s 	sectionFeatures      = "book,chapter,verse"
|      1m 10s 	sectionTypes         = "book,chapter,verse"


In [18]:
parseMql()

..............................................................................................
.      1m 22s Parsing mql source ...                                                         .
..............................................................................................
|      1m 22s 		enum boolean_t
|      1m 22s 		enum phrase_determination_t
|      1m 22s 		enum language_t
|      1m 22s 		enum book_name_t
|      1m 22s 		enum lexical_set_t
|      1m 22s 		enum verbal_stem_t
|      1m 22s 		enum verbal_tense_t
|      1m 22s 		enum person_t
|      1m 22s 		enum number_t
|      1m 22s 		enum gender_t
|      1m 22s 		enum state_t
|      1m 22s 		enum part_of_speech_t
|      1m 22s 		enum phrase_type_t
|      1m 22s 		enum phrase_atom_relation_t
|      1m 22s 		enum phrase_relation_t
|      1m 22s 		enum phrase_atom_unit_distance_to_mother_t
|      1m 22s 		enum subphrase_relation_t
|      1m 22s 		enum subphrase_mother_object_type_t
|      1m 22s 		enum phrase_function_t
| 

|      3m 47s 		objects in sentence
|      3m 47s 		objects in sentence_atom
|      3m 48s 		objects in sentence_atom
|      3m 49s 	line  24000000
|      3m 49s 		objects in subphrase
|      3m 50s 		objects in subphrase
|      3m 52s 		objects in subphrase
|      3m 52s 		objects in phrase
|      3m 53s 	line  25000000
|      3m 54s 		objects in phrase
|      3m 56s 	line  26000000
|      3m 57s 		objects in phrase
|      3m 59s 		objects in phrase
|      4m 00s 	line  27000000
|      4m 01s 		objects in phrase
|      4m 03s 	line  28000000
|      4m 03s 		objects in phrase
|      4m 04s 		objects in chapter
|      4m 04s 		objects in book
|      4m 04s 		objects in clause
|      4m 07s 		objects in clause
|      4m 07s 	line  29000000
|      4m 09s 		objects in half_verse
|      4m 10s 		objects in verse
|      4m 11s 		objects in phrase_atom
|      4m 11s 	line  30000000
|      4m 13s 		objects in phrase_atom
|      4m 14s 	line  31000000
|      4m 15s 		objects in phrase_atom
|   

In [19]:
tfFromData()

..............................................................................................
.      9m 54s Making TF data ...                                                             .
..............................................................................................
|      9m 54s Monad - idd mapping ...
|      9m 54s maxSlot=426581
|      9m 54s Node mapping and otype ...
|      9m 55s oslots ...
|      9m 55s metadata ...
..............................................................................................
.      9m 55s features ...                                                                   .
..............................................................................................
|      9m 55s 	features from words
|      9m 59s 	   100000 words
|     10m 03s 	   200000 words
|     10m 07s 	   300000 words
|     10m 10s 	   400000 words
|     10m 11s 	   426581 words
|     10m 11s 	features from books
|     10m 11s 	       39 books
|     10m 11s 

   |     0.95s T ps                   to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.81s T qere                 to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.72s T qere_utf8            to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     1.38s T rela                 to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.86s T sp                   to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.77s T st                   to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.16s T tab                  to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.74s T trailer              to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.75s T trailer_utf8         to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.15s T txt                  to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     1.24s T typ                  to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.76s T uvf                  to /Users/dirk/github/etcbc/bhsa/_temp/c/tf
   |     0.75s T

In [20]:
utils.checkDiffs(thisTempTf, thisTf)

..............................................................................................
.     11m 25s Check differences with previous version                                        .
..............................................................................................
|     11m 25s 	2 features to add
|     11m 25s 		g_voc_lex
|     11m 25s 		g_voc_lex_utf8
|     11m 25s 	14 features to delete
|     11m 25s 		freq_lex
|     11m 25s 		freq_occ
|     11m 25s 		gloss
|     11m 25s 		instruction
|     11m 25s 		lex0
|     11m 25s 		nametype
|     11m 25s 		pargr
|     11m 25s 		qere_trailer
|     11m 25s 		qere_trailer_utf8
|     11m 25s 		rank_lex
|     11m 25s 		rank_occ
|     11m 25s 		root
|     11m 25s 		voc_lex
|     11m 25s 		voc_lex_utf8
|     11m 25s 	93 features in common
|     11m 25s book                      ... no changes
|     11m 25s book@am                   ... no changes
|     11m 25s book@ar                   ... no changes
|     11m 25s book@bn          

|     11m 45s rela                      ... no changes
|     11m 45s sp                        ... differences after the metadata
|     11m 46s 	line 426583 OLD -->1436895	prep<--
|     11m 46s 	line 426583 NEW --><empty><--
|     11m 46s 	line 426584 OLD -->subs<--
|     11m 46s 	line 426584 NEW --><empty><--
|     11m 46s 	line 426585 OLD -->verb<--
|     11m 46s 	line 426585 NEW --><empty><--
|     11m 46s 	line 426586 OLD -->subs<--
|     11m 46s 	line 426586 NEW --><empty><--

|     11m 46s st                        ... no changes
|     11m 46s tab                       ... no changes
|     11m 46s trailer                   ... no changes
|     11m 47s trailer_utf8              ... no changes
|     11m 47s txt                       ... no changes
|     11m 47s typ                       ... no changes
|     11m 48s uvf                       ... no changes
|     11m 48s vbe                       ... no changes
|     11m 49s vbs                       ... no changes
|     11m 49s vers

In [21]:
utils.deliverDataset(thisTempTf, thisTf)

..............................................................................................
.     12m 04s Deliver data set to /Users/dirk/github/etcbc/bhsa/tf/c                         .
..............................................................................................


In [23]:
compileTfData()

..............................................................................................
.     13m 20s Load and compile standard TF features                                          .
..............................................................................................
This is Text-Fabric 2.3.15
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data

95 features found and 0 ignored
  0.00s loading features ...
   |     0.79s T otype                from /Users/dirk/github/etcbc/bhsa/tf/c
   |       10s T oslots               from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.08s T book                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.05s T chapter              from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.05s T verse                from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.44s T g_cons        

   |     2.60s T rela                 from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.48s T sp                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.38s T st                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.18s T tab                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.29s T txt                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     2.53s T typ                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.57s T uvf                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.45s T vbe                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.69s T vbs                  from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.53s T vs                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     1.54s T vt                   from /Users/dirk/github/etcbc/bhsa/tf/c
   |     0.00s Feature overview: 90 for nodes; 4 for edges; 1 configs; 7 computed
 1m 18s All features loaded/computed - for details use loadL