<img align="right" src="tf-small.png"/>

# TF from MQL

This notebook can read an
[MQL](https://emdros.org/mql.html)
dump of a version of the [BHSA](https://github.com/ETCBC/bhsa) Hebrew Text Database
and transform it into a
[Text-Fabric](https://github.com/ETCBC/text-fabric)
resource.

## Discussion

The principled way of going about such a conversion is to import the MQL source into
an [Emdros](https://emdros.org) database, and use it to retrieve objects and features from there.

Because the syntax of an MQL file leaves some freedom, it is error-prone to do a text-to-text conversion from
MQL to something else.

Yet the error-prone thing is exactly what we do here. That way we avoid installing, configuring and managing Emdros and MySQL/SQLite3.
Apart from the upfront work to get that going, the conversion itself would also run much slower that way.

So here you are: a smallish script that does an awful lot of work, mostly correctly, if used with care.

# Caveat

This notebook makes use of a new feature of text-fabric, first present in 2.3.12.
Make sure to upgrade first.

```sudo -H pip3 install --upgrade text-fabric```

In [1]:
import os,sys,re,collections
from glob import glob
from shutil import rmtree, copytree
from tf.fabric import Fabric
from utils import bunzip, startNow, tprint
from blang import bookLangs, bookNames

## Parameters

We pass the name of the data source, the version, and the name of a target TF module.

In [2]:
SOURCE_NAME = 'x_etcbc'
VERSION = '4b'
TF_MODULE = 'core'

# Setting up the context: source file and target directories

The conversion is executed in an environment of directories, so that sources, temp files and
results are in convenient places and do not have to be shifted around.

In [3]:
REPO_BASE = os.path.expanduser('~/github/bhsa')

SOURCE_BASE = '{}/source'.format(REPO_BASE)
TEMP_BASE = '{}/_temp'.format(REPO_BASE)
TARGET_BASE = '{}/tf'.format(REPO_BASE)

MQLZ_FILE = '{}/{}{}.mql.bz2'.format(SOURCE_BASE, SOURCE_NAME, VERSION)
MQL_FILE = '{}/{}{}.mql'.format(TEMP_BASE, SOURCE_NAME, VERSION)

TF_SAVE = '{}/{}/{}'.format(TEMP_BASE, VERSION, TF_MODULE)

TF_LOCATION = '{}/{}'.format(TARGET_BASE, VERSION)
TF_DELIVER = '{}/{}'.format(TF_LOCATION, TF_MODULE)

# TF Settings

We add some custom information here.

* the MQL object type that corresponds to the TF slot type, typically `word`;
* a piece of metadata that will go into every feature; the time will be added automatically
* suitable text formats for the `otext` feature of TF.

The OTEXT feature is very sensitive to what is available in the source MQL.
It needs to be configured here.
We save the configs we need per source and version.
And we define a stripped down default version to start with.

In [4]:
SLOT_TYPE = 'word'

FEATURE_METADATA = dict(
    dataset='BHSA',
    datasetName='Biblia Hebraica Stuttgartensia Amstelodamensis',
    author='Eep Talstra Centre for Bible and Computer',
    encoders='Constantijn Sikkel (QDF), Ulrik Petersen (MQL) and Dirk Roorda (TF)',
    website='https://shebanq.ancient-data.org',
    email='shebanq@ancient-data.org',
)

OTEXT = {
    '': {
        '': '''
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
@fmt:text-orig-full={g_word_utf8}{g_suffix_utf8}
        ''',
    },
    'x_etcbc': {
        '4': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}
@fmt:text-orig-full-ketiv={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word} 
@fmt:text-trans-full-ketiv={g_word} 
@fmt:text-trans-plain={g_cons} 
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
        ''',
        '4b': '''
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex} 
@fmt:text-orig-full={g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}
@fmt:text-orig-full-ketiv={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={g_word} 
@fmt:text-trans-full-ketiv={g_word} 
@fmt:text-trans-plain={g_cons} 
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
        ''',
        '4c': '''
@config
@fmt:lex-orig-full={g_lex_utf8} 
@fmt:lex-orig-plain={lex_utf8} 
@fmt:lex-trans-full={g_lex} 
@fmt:lex-trans-plain={lex0} 
@fmt:text-orig-full={qere_utf8/g_word_utf8}{qere_trailer_utf8/trailer_utf8}
@fmt:text-orig-full-ketiv={g_word_utf8}{trailer_utf8}
@fmt:text-orig-plain={g_cons_utf8}{trailer_utf8}
@fmt:text-trans-full={qere/g_word}{qere_trailer/trailer}
@fmt:text-trans-full-ketiv={g_word}{trailer}
@fmt:text-trans-plain={g_cons}{trailer}
@sectionFeatures=book,chapter,verse
@sectionTypes=book,chapter,verse
        ''',
    },
}
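
The templates above name features between braces, and a slash separates alternatives. The sketch below illustrates how such a template could be filled in for one word, under the assumption that the first alternative with a non-empty value wins. It is not Text-Fabric's own code: the function `render` and the sample feature values are hypothetical.

```python
import re

def render(template, features):
    """Fill a template such as '{g_qere_utf8/g_word_utf8}{trailer_utf8}'.

    `features` maps feature names to values for a single word.
    Assumption: within braces, the first alternative that has a
    non-empty value is used; if none has one, the result is empty.
    """
    def fill(match):
        for name in match.group(1).split('/'):
            value = features.get(name)
            if value:
                return value
        return ''
    return re.sub(r'\{([^{}]+)\}', fill, template)

# hypothetical feature values for a word without a qere reading
word = {'g_word_utf8': 'ABC', 'trailer_utf8': ' '}
print(repr(render('{g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}', word)))
```

With a qere value present, the same template would pick that value instead of `g_word_utf8`.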

The next function selects the proper otext material, falling back on a default if nothing 
appropriate has been specified in `OTEXT`.

In [5]:
def getOtext():
    thisOtext = OTEXT.get(SOURCE_NAME, {}).get(VERSION, OTEXT[''][''])
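    # skip marker lines without a value (such as `@config`): they cannot be split into key and value
    otextInfo = dict(
        line[1:].split('=', 1) for line in thisOtext.strip().split('\n') if '=' in line
    )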

    if thisOtext is OTEXT['']['']:
        print('WARNING: no otext feature info provided, using a meager default value') 
    else:
        print('INFO: otext feature information found')
    for x in sorted(otextInfo.items()):
        print('{:<20} = "{}"'.format(*x))
    return otextInfo
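
To see what the splitting of the `@key=value` lines amounts to, here is a minimal stand-alone sketch. The name `parse_otext` and the sample text are made up for illustration; the sketch skips lines that carry no `=`, such as a bare `@config` marker.

```python
def parse_otext(text):
    """Turn '@key=value' lines into a dict, skipping lines without '='."""
    info = {}
    for line in text.strip().split('\n'):
        if '=' not in line:
            continue  # e.g. a bare marker line such as '@config'
        key, value = line[1:].split('=', 1)  # drop the leading '@'
        info[key] = value
    return info

# hypothetical sample in the same shape as the OTEXT entries
sample = '''
@config
@sectionTypes=book,chapter,verse
@fmt:text-orig-full={g_word_utf8}{g_suffix_utf8}
'''
for key, value in sorted(parse_otext(sample).items()):
    print('{:<20} = "{}"'.format(key, value))
```

Only the first `=` on a line separates key from value, so a value may itself contain `=`.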

The program has two stages:

* parse the MQL and collect information in data structures
* transform the data structures and write them as TF features

Both stages communicate with the help of several global variables:

* data containers for the MQL kinds of data
  * enumerations
  * object types
  * tables

* data containers for the TF features to be generated,
  * node features
  * edge features.

In [6]:
objectTypes = dict()
tables = dict()

edgeF = dict()
nodeF = dict()
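
As an impression of what these containers hold once the parser has run, here is a hypothetical miniature with one word and one phrase. The shapes follow the code in this notebook; the identifiers and feature values are invented.

```python
# objectTypes[type][feature] = (TF value type, default value)
objectTypes = {
    'word': {'sp': ('str', None), 'number': ('int', None)},
    'phrase': {'function': ('str', None), 'mother': ('int', None)},
}
# tables[type][id_d] = the features and the monads (slots) of one object
tables = {
    'word': {
        101: {'feats': {'sp': 'verb', 'number': '1'}, 'monads': {1}},
    },
    'phrase': {
        201: {'feats': {'function': 'Pred', 'mother': 'NIL'}, 'monads': {1, 2}},
    },
}
# per object type: which features become node features and which edge features
nodeF = {'word': {'sp', 'number'}, 'phrase': {'function'}}
edgeF = {'phrase': {'mother'}}

for otype, objects in tables.items():
    for idd, obj in objects.items():
        edges = sorted(f for f in obj['feats'] if f in edgeF.get(otype, set()))
        print(otype, idd, sorted(obj['monads']), edges)
```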

Prepare the ground: check the source, bunzip it if needed, empty the result directory.

In [7]:
def prepare():
    global thisOTEXT

    presentMqlZ = os.path.exists(MQLZ_FILE)
    presentMql = os.path.exists(MQL_FILE)
    if not presentMqlZ and not presentMql:
        print('MQL source file does not exist: {} or {}'.format(MQLZ_FILE, MQL_FILE))
        sys.exit()
    if presentMql: print('using existing bunzipped {}'.format(MQL_FILE))
    else:
        startNow()
        tprint('bunzipping {} ...'.format(MQL_FILE))
        bunzip(MQLZ_FILE, MQL_FILE)
        tprint('Done')

    # start with an empty result directory, also when it did not exist before
    if os.path.exists(TF_SAVE):
        rmtree(TF_SAVE)
    os.makedirs(TF_SAVE)

    thisOTEXT = getOtext()

    print('Ready to compile TF dataset\n\t{}\nfrom MQL source\n\t{}'.format(TF_SAVE, MQL_FILE))

Deliver the new TF dataset from the temporary location where it has been created to its final destination.

In [8]:
def deliverDataset():
    # remove a previous delivery if there is one, then copy in all cases
    if os.path.exists(TF_DELIVER):
        rmtree(TF_DELIVER)
    copytree(TF_SAVE, TF_DELIVER)

Convert a monads specification (a comma separated sequence of numbers and number ranges)
into a set of integers.

In [9]:
def setFromSpec(spec):
    covered = set()
    for r_str in spec.split(','):
        bounds = r_str.split('-')
        if len(bounds) == 1:
            covered.add(int(r_str))
        else:
            b = int(bounds[0])
            e = int(bounds[1])
            if (e < b): (b, e) = (e, b)
            for n in range(b, e+1): covered.add(n)
    return covered
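
A quick way to see what such a conversion yields is the compact variant below. It is a sketch of the same idea under the name `monads_from_spec`, which is not part of this notebook; reversed ranges are treated like regular ones.

```python
def monads_from_spec(spec):
    """Expand a spec like '1-3,7' into the set {1, 2, 3, 7}."""
    covered = set()
    for part in spec.split(','):
        bounds = [int(b) for b in part.split('-')]
        lo, hi = min(bounds), max(bounds)  # a single number gives lo == hi
        covered.update(range(lo, hi + 1))
    return covered

print(sorted(monads_from_spec('1-3,7,12-10')))
```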

# Stage 1: MQL parsing
Plough through the MQL file and grab all relevant information
and put it into the dedicated data structure.

In [10]:
def parseMql(fh):

    startNow()
    tprint('Parsing mql source ...')

    curId = None
    curEnum = None
    curObjectType = None
    curTable = None
    curObject = None
    curValue = None
    curFeature = None

    STRING_TYPES = {'ascii', 'string'}

    enums = dict()

    CHUNK_SIZE = 1000000
    inThisChunk = 0

    good = True

    for (ln, line) in enumerate(fh):
        inThisChunk += 1
        if inThisChunk == CHUNK_SIZE:
            tprint('\tline {:>9}'.format(ln + 1))
            inThisChunk = 0
        if line.startswith('CREATE OBJECTS WITH OBJECT TYPE') or line.startswith('WITH OBJECT TYPE'):
            comps = line.rstrip().rstrip(']').split('[', 1)
            curTable = comps[1]
            print('\t\tobjects in {}'.format(curTable))
            curObject = None
            if not curTable in tables:
                tables[curTable] = dict()
        elif curEnum != None:
            if line.startswith('}'):
                curEnum = None
                continue
            comps = line.strip().rstrip(',').split('=', 1)
            comp = comps[0].strip()
            words = comp.split()
            if words[0] == 'DEFAULT':
                enums[curEnum]['default'] = words[1]
                value = words[1]
            else:
                value = words[0]
            enums[curEnum]['values'].append(value)
        elif curObjectType != None:
            if line.startswith(']'):
                curObjectType = None
                continue
            if curObjectType == True:
                if line.startswith('['):
                    curObjectType = line.rstrip()[1:]
                    objectTypes[curObjectType] = dict()
                    print('\t\totype {}'.format(curObjectType))
                    continue
            comps = line.strip().rstrip(';').split(':', 1)
            feature = comps[0].strip()
            fInfo = comps[1].strip()
            fCleanInfo = fInfo.replace('FROM SET', '')
            fInfoComps = fCleanInfo.split(' ', 1)
            fMQLType = fInfoComps[0]
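            # a default is present only if something follows the type, e.g. `DEFAULT 0`
            fRest = fInfoComps[1].strip().split(' ', 1) if len(fInfoComps) == 2 else []
            fDefault = fRest[1] if len(fRest) == 2 else None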
            if fDefault != None and fMQLType in STRING_TYPES:
                fDefault = fDefault[1:-1]
            default = enums.get(fMQLType, {}).get('default', fDefault)
            ftype = 'str' if fMQLType in enums else\
                    'int' if fMQLType == 'integer' else\
                    'str' if fMQLType in STRING_TYPES else\
                    'int' if fInfo == 'id_d' else\
                    'str'
            isEdge = fMQLType == 'id_d'
            if isEdge:
                edgeF.setdefault(curObjectType, set()).add(feature)
            else:
                nodeF.setdefault(curObjectType, set()).add(feature)

            objectTypes[curObjectType][feature] = (ftype, default)
            print('\t\t\tfeature {} ({}) = {} : {}'.format(feature, ftype, default, 'edge' if isEdge else 'node'))
        elif curTable != None:
            if curObject != None:
                if line.startswith(']'):
                    objectType = objectTypes[curTable]
                    for (feature, (ftype, default)) in objectType.items():
                        if feature not in curObject['feats'] and default != None:
                            curObject['feats'][feature] = default
                    tables[curTable][curId] = curObject
                    curObject = None
                    continue
                elif line.startswith('['):
                    continue
                elif line.startswith('FROM MONADS'):
                    monads = line.split('=', 1)[1].replace('{', '').replace('}', '').replace(' ','').strip()
                    curObject['monads'] = setFromSpec(monads)
                elif line.startswith('WITH ID_D'):
                    comps = line.replace('[', '').rstrip().split('=', 1)
                    curId = int(comps[1])
                elif line.startswith('GO'):
                    continue
                elif line.strip() == '':
                    continue
                else:
                    if curValue != None:
                        toBeContinued = not line.rstrip().endswith('";')
                        if toBeContinued:
                            curValue += line
                        else:
                            curValue += line.rstrip().rstrip(';').rstrip('"')
                            curObject['feats'][curFeature] = curValue
                            curValue = None
                            curFeature = None
                        continue
                    if ':=' in line:
                        (featurePart, valuePart) = line.split('=', 1)
                        feature = featurePart[0:-1].strip()
                        isText = ':="' in line
                        toBeContinued = isText and not line.rstrip().endswith('";')
                        if toBeContinued:
                            # this happens if a feature value contains a newline
                            # we must continue scanning lines until we meet the end of the value
                            curFeature = feature
                            curValue = valuePart.lstrip('"')
                        else:
                            value = valuePart.rstrip().rstrip(';').strip('"')
                            curObject['feats'][feature] = value
                    else:
                        tprint('ERROR: line {}: unrecognized line -->{}<--'.format(ln + 1, line))
                        good = False
                        break
            else:
                if line.startswith('CREATE OBJECT'):
                    curObject = dict(feats=dict(), monads=None)
                    curId = None
        else:
            if line.startswith('CREATE ENUMERATION'):
                words = line.split()
                curEnum = words[2]
                enums[curEnum] = dict(default=None, values=[])
                print('\t\tenum {}'.format(curEnum))
            elif line.startswith('CREATE OBJECT TYPE'):
                curObjectType = True
    tprint('{} lines parsed'.format(ln + 1))
    for table in tables:
        print('{} objects of type {}'.format(len(tables[table]), table))
    return good
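
The trickiest bit of the object type section is picking apart a feature declaration. The sketch below isolates that step in a hypothetical helper `parse_feature_decl`. The declaration lines fed to it are assumed examples of what an MQL dump may contain, not lines copied from the BHSA source.

```python
STRING_TYPES = {'ascii', 'string'}

def parse_feature_decl(line):
    """Split a declaration line into (feature, MQL type, default).

    Follows the same splitting steps as the parser: cut at the first
    colon, drop 'FROM SET', then look for something after the type.
    """
    name, info = line.strip().rstrip(';').split(':', 1)
    info = info.strip().replace('FROM SET', '')
    comps = info.split(' ', 1)
    mql_type = comps[0]
    # a default is only present when a keyword and a value follow the type
    rest = comps[1].strip().split(' ', 1) if len(comps) == 2 else []
    default = rest[1] if len(rest) == 2 else None
    if default is not None and mql_type in STRING_TYPES:
        default = default[1:-1]  # drop the surrounding double quotes
    return (name.strip(), mql_type, default)

# assumed example declarations
for decl in (
    '  lex : string FROM SET DEFAULT "";',
    '  number : integer DEFAULT 0;',
    '  mother : id_d;',
):
    print(parse_feature_decl(decl))
```

A feature of type `id_d` points to another object, which is why the parser files it under the edge features.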

# Stage 2: TF generation
Transform the collected information into feature-like data structures, and write it all
out to `.tf` files.

In [11]:
def tfFromData():
    startNow()
    tprint('Making TF data ...')
    
    NIL = {'nil', 'NIL', 'Nil'}

    tableOrder = [SLOT_TYPE]+[t for t in sorted(tables) if t != SLOT_TYPE]

    nodeFromIdd = dict()
    iddFromNode = dict()

    nodeFeatures = dict()
    edgeFeatures = dict()
    metaData = dict()

    # metadata that ends up in every feature
    metaData[''] = FEATURE_METADATA

    # the config feature otext
    metaData['otext'] = thisOTEXT

    # multilingual book names
    for (langCode, (langEnglish, langName)) in bookLangs.items():
        metaData['book@{}'.format(langCode)] = {
            'valueType': 'str',
            'language': langName,
            'languageCode': langCode,
            'languageEnglish': langEnglish,
        }

    tprint('Monad - idd mapping ...')
    otype = dict()
    for idd in tables.get(SLOT_TYPE, {}):
        monad = list(tables[SLOT_TYPE][idd]['monads'])[0]
        nodeFromIdd[idd] = monad
        iddFromNode[monad] = idd
        otype[monad] = SLOT_TYPE

    maxSlot = max(nodeFromIdd.values()) if len(nodeFromIdd) else 0
    tprint('maxSlot={}'.format(maxSlot))

    tprint('Node mapping and otype ...')
    node = maxSlot
    for t in tableOrder[1:]:
        for idd in sorted(tables[t]):
            node += 1
            nodeFromIdd[idd] = node
            iddFromNode[node] = idd
            otype[node] = t

    nodeFeatures['otype'] = otype
    metaData['otype'] = dict(
        valueType='str',
    )

    tprint('oslots ...')
    oslots = dict()
    for t in tableOrder[1:]:
        for idd in tables.get(t, {}):
            node = nodeFromIdd[idd]
            monads = tables[t][idd]['monads']
            oslots[node] = monads
    edgeFeatures['oslots'] = oslots
    metaData['oslots'] = dict(
        valueType='str',
    )

    tprint('metadata ...')
    for t in nodeF:
        for f in nodeF[t]:
            ftype = objectTypes[t][f][0]
            metaData.setdefault(f, {})['valueType'] = ftype
    for t in edgeF:
        for f in edgeF[t]:
            metaData.setdefault(f, {})['valueType'] = 'str'

    tprint('features ...')
    for t in tableOrder:
        tprint('\tfeatures from {}s'.format(t))
        for idd in tables.get(t, {}):
            node = nodeFromIdd[idd]
            features = tables[t][idd]['feats']
            for (f, v) in features.items():
                isEdge = f in edgeF.get(t, set())
                if isEdge:
                    if v not in NIL:
                        edgeFeatures.setdefault(f, {}).setdefault(node, set()).add(nodeFromIdd[int(v)])
                else:
                    nodeFeatures.setdefault(f, {})[node] = v


    tprint('book names ...')
    nodeFeatures['book@la'] = nodeFeatures.get('book', {})
    bookNodes = sorted(nodeFeatures.get('book', {}))
    for (langCode, langBookNames) in bookNames.items():
        nodeFeatures['book@{}'.format(langCode)] = dict(zip(bookNodes, langBookNames))

    tprint('write data set to TF ...')

    TF = Fabric(locations=TF_SAVE)
    TF.save(nodeFeatures=nodeFeatures, edgeFeatures=edgeFeatures, metaData=metaData)
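
The node numbering is the heart of this stage: slots keep their monad number as node number, and all other objects are numbered after the last slot, type by type, in order of their `id_d`. Here is a small sketch of that scheme on invented data; the tables and identifiers are hypothetical and the variable names are chosen for this illustration only.

```python
SLOT_TYPE = 'word'

# hypothetical parsed tables: id_d -> object with its monad set
tables = {
    'word': {501: {'monads': {1}}, 502: {'monads': {2}}, 503: {'monads': {3}}},
    'phrase': {900: {'monads': {1, 2}}, 901: {'monads': {3}}},
    'clause': {700: {'monads': {1, 2, 3}}},
}

node_from_idd = {}
otype = {}

# slots: the node number is the monad itself
for idd, obj in tables[SLOT_TYPE].items():
    monad = next(iter(obj['monads']))
    node_from_idd[idd] = monad
    otype[monad] = SLOT_TYPE

# other objects: numbered after the last slot, per type in alphabetical order
node = max(node_from_idd.values())
for t in sorted(t for t in tables if t != SLOT_TYPE):
    for idd in sorted(tables[t]):
        node += 1
        node_from_idd[idd] = node
        otype[node] = t

# oslots: the slots that each non-slot node is linked to
oslots = {
    node_from_idd[idd]: obj['monads']
    for t in tables if t != SLOT_TYPE
    for (idd, obj) in tables[t].items()
}

print(node_from_idd)
print(oslots)
```

Edge features are translated with the same mapping: an `id_d` value in the source becomes the node number that `node_from_idd` assigns to it.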

# Run it!

In [12]:
prepare()

using existing bunzipped /Users/dirk/github/bhsa/_temp/x_etcbc4b.mql
INFO: otext feature information found
fmt:lex-orig-full    = "{g_lex_utf8} "
fmt:lex-orig-plain   = "{lex_utf8} "
fmt:lex-trans-full   = "{g_lex} "
fmt:lex-trans-plain  = "{lex} "
fmt:text-orig-full   = "{g_qere_utf8/g_word_utf8}{qtrailer_utf8/trailer_utf8}"
fmt:text-orig-full-ketiv = "{g_word_utf8}{trailer_utf8}"
fmt:text-orig-plain  = "{g_cons_utf8}{trailer_utf8}"
fmt:text-trans-full  = "{g_word} "
fmt:text-trans-full-ketiv = "{g_word} "
fmt:text-trans-plain = "{g_cons} "
sectionFeatures      = "book,chapter,verse"
sectionTypes         = "book,chapter,verse"
Ready to compile TF dataset
	/Users/dirk/github/bhsa/_temp/4b/core
from MQL source
	/Users/dirk/github/bhsa/_temp/x_etcbc4b.mql


In [13]:
with open(MQL_FILE) as fh: good = parseMql(fh)

      0.00s Parsing mql source ...
		enum boolean_t
		enum phrase_determination_t
		enum language_t
		enum book_name_t
		enum lexical_set_t
		enum verbal_stem_t
		enum verbal_tense_t
		enum person_t
		enum number_t
		enum gender_t
		enum state_t
		enum part_of_speech_t
		enum phrase_type_t
		enum phrase_atom_relation_t
		enum phrase_relation_t
		enum phrase_atom_unit_distance_to_mother_t
		enum subphrase_relation_t
		enum subphrase_mother_object_type_t
		enum phrase_function_t
		enum clause_atom_type_t
		enum clause_type_t
		enum clause_kind_t
		enum clause_constituent_relation_t
		enum clause_constituent_mother_object_type_t
		enum clause_constituent_unit_distance_to_mother_t
		otype word
			feature trailer_utf8 (str) = None : node
			feature number (int) = None : node
			feature g_vbe (str) = None : node
			feature g_word (str) = None : node
			feature g_word_utf8 (str) = None : node
			feature g_cons_utf8 (str) = None : node
			feature g_cons (str) = None : node
			feature g_pfm (st

In [14]:
if good: tfFromData()

      0.00s Making TF data ...
      0.00s Monad - idd mapping ...
      0.48s maxSlot=426568
      0.48s Node mapping and otype ...
      1.09s oslots ...
      1.46s metadata ...
      1.46s features ...
      1.46s 	features from words
        18s 	features from books
        18s 	features from chapters
        18s 	features from clauses
        19s 	features from clause_atoms
        20s 	features from half_verses
        20s 	features from phrases
        22s 	features from phrase_atoms
        26s 	features from sentences
        26s 	features from sentence_atoms
        26s 	features from subphrases
        27s 	features from verses
        27s book names ...
        27s write data set to TF ...
This is Text-Fabric 2.3.12
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data

  0.00s Grid feature "otype" not found in

  0.00s Grid feature "oslots" not found in



  0.01s Grid feature "otext" not found. Working without Text-API

  0.00s Exporting 91 node and 4 edge and 1 config features to /Users/dirk/github/bhsa/_temp/4b/core:
   |     0.05s T book                 to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@am              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@ar              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@bn              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@da              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@de              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@el              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@en              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@es              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@fa              to /Users/dirk/github/bhsa/_temp/4b/core
   |     0.00s T book@fr              to /U

# Before continuing

The new dataset has been created in a temporary directory, and not yet copied to its destination.
Here is your opportunity to compare the newly created features with the older features.

We check the differences between the previous version of the features and what has been generated.

In [15]:
existingFiles = glob('{}/*.tf'.format(TF_DELIVER))
newFiles = glob('{}/*.tf'.format(TF_SAVE))
existingFeatures = {os.path.basename(os.path.splitext(f)[0]) for f in existingFiles}
newFeatures = {os.path.basename(os.path.splitext(f)[0]) for f in newFiles}

addedOnes = newFeatures - existingFeatures
deletedOnes = existingFeatures - newFeatures
commonOnes = newFeatures & existingFeatures

if addedOnes:
    print('{} features to add:\n\t{}'.format(len(addedOnes), ' '.join(sorted(addedOnes))))
else:
    print('no features to add')
if deletedOnes:
    print('{} features to delete:\n\t{}'.format(len(deletedOnes), ' '.join(sorted(deletedOnes))))
else:
    print('no features to delete')
    
print('{} features in common'.format(len(commonOnes)))

no features to add
no features to delete
96 features in common


Let's check the common ones.

In [16]:
def diffFeature(f):
    sys.stdout.write('{:<25} ... '.format(f))
    existingPath = '{}/{}.tf'.format(TF_DELIVER, f)
    newPath = '{}/{}.tf'.format(TF_SAVE, f)
    with open(existingPath) as h: eLines = (d for d in h.readlines() if not d.startswith('@'))
    with open(newPath) as h: nLines = (d for d in h.readlines() if not d.startswith('@'))
    i = 0
    equal = True
    for (e, n) in zip(eLines, nLines):
        i += 1
        if e != n:
            print('First diff in line {} after the metadata'.format(i))
            equal = False
            break
    print('no changes' if equal else '')
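
One thing to keep in mind: `zip` stops at the shorter of its two inputs, so a file that merely has extra lines at the end is reported as unchanged. The sketch below shows a possible stricter comparison based on `itertools.zip_longest`. The helper `first_difference` is hypothetical and works on lists of lines rather than on files.

```python
from itertools import zip_longest

def first_difference(old_lines, new_lines):
    """Return the 1-based index of the first differing data line, or None.

    Metadata lines (starting with '@') are skipped on both sides.
    A line that is missing on one side counts as a difference.
    """
    old_data = (line for line in old_lines if not line.startswith('@'))
    new_data = (line for line in new_lines if not line.startswith('@'))
    for i, (old, new) in enumerate(zip_longest(old_data, new_data), start=1):
        if old != new:
            return i
    return None

# invented feature file contents: the new version has one extra data line
old = ['@node\n', '@valueType=str\n', '\n', 'a\n', 'b\n']
new = ['@node\n', '@valueType=str\n', '\n', 'a\n', 'b\n', 'c\n']
print(first_difference(old, new))
```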

In [17]:
for f in sorted(commonOnes):
    diffFeature(f)

book                      ... no changes
book@am                   ... no changes
book@ar                   ... no changes
book@bn                   ... no changes
book@da                   ... no changes
book@de                   ... no changes
book@el                   ... no changes
book@en                   ... no changes
book@es                   ... no changes
book@fa                   ... no changes
book@fr                   ... no changes
book@he                   ... no changes
book@hi                   ... no changes
book@id                   ... no changes
book@ja                   ... no changes
book@ko                   ... no changes
book@la                   ... no changes
book@nl                   ... no changes
book@pa                   ... no changes
book@pt                   ... no changes
book@ru                   ... no changes
book@sw                   ... no changes
book@syc                  ... no changes
book@tr                   ... no changes
book@ur         

If all is well, the next cell will deliver the results.

In [18]:
# good = True

In [19]:
if good: deliverDataset()

# Stage 3: Load all new TF features

Just to see whether everything loads and the precomputing of extra information works out.
Moreover, if you want to work with these features, then the precomputing has already been done, and everything is quicker in subsequent runs.

In [20]:
TF = Fabric(locations=TF_LOCATION, modules=TF_MODULE)

This is Text-Fabric 2.3.12
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
96 features found and 0 ignored


Let's load a single feature to trigger the precomputing of extra data.
Note that all features named in the text formats of the `otext` config feature will also be loaded,
as well as the features for sections.

In [21]:
api = TF.load('sp')

  0.00s loading features ...
   |     2.89s T otype                from /Users/dirk/github/bhsa/tf/4b/core
   |       13s T oslots               from /Users/dirk/github/bhsa/tf/4b/core
   |     0.12s T book                 from /Users/dirk/github/bhsa/tf/4b/core
   |     0.06s T chapter              from /Users/dirk/github/bhsa/tf/4b/core
   |     0.06s T verse                from /Users/dirk/github/bhsa/tf/4b/core
   |     1.79s T g_cons               from /Users/dirk/github/bhsa/tf/4b/core
   |     1.68s T g_cons_utf8          from /Users/dirk/github/bhsa/tf/4b/core
   |     1.55s T g_lex                from /Users/dirk/github/bhsa/tf/4b/core
   |     1.67s T g_lex_utf8           from /Users/dirk/github/bhsa/tf/4b/core
   |     0.69s T g_qere_utf8          from /Users/dirk/github/bhsa/tf/4b/core
   |     1.57s T g_word               from /Users/dirk/github/bhsa/tf/4b/core
   |     1.70s T g_word_utf8          from /Users/dirk/github/bhsa/tf/4b/core
   |     1.47s T lex               

At this point we have access to the full list of features.
We grab them and are going to load them all!

The next cell loads the data of some central features and the metadata of all features. 

In [22]:
allFeatures = TF.explore(silent=False, show=True)

   |     0.00s Feature overview: 91 for nodes; 4 for edges; 1 configs; 7 computed


Now we are going to load the remaining features.

In [23]:
loadableFeatures = allFeatures['nodes'] + allFeatures['edges']
print(' '.join(loadableFeatures))

book book@am book@ar book@bn book@da book@de book@el book@en book@es book@fa book@fr book@he book@hi book@id book@ja book@ko book@la book@nl book@pa book@pt book@ru book@sw book@syc book@tr book@ur book@yo book@zh chapter code det dist dist_unit domain function g_cons g_cons_utf8 g_entry g_entry_heb g_lex g_lex_utf8 g_nme g_nme_utf8 g_pfm g_pfm_utf8 g_prs g_prs_utf8 g_qere_utf8 g_uvf g_uvf_utf8 g_vbe g_vbe_utf8 g_vbs g_vbs_utf8 g_word g_word_utf8 gloss gn is_root kind label language lex lex_utf8 ls mother_object_type nametype nme nu number otype pargr pdp pfm phono phono_sep prs ps qtrailer_utf8 rela sp st tab trailer_utf8 txt typ uvf vbe vbs verse vs vt distributional_parent functional_parent mother oslots


In [24]:
api = TF.load(loadableFeatures)

  0.00s loading features ...
   |     0.18s T code                 from /Users/dirk/github/bhsa/tf/4b/core
   |     1.79s T det                  from /Users/dirk/github/bhsa/tf/4b/core
   |     1.36s T dist                 from /Users/dirk/github/bhsa/tf/4b/core
   |     2.09s T dist_unit            from /Users/dirk/github/bhsa/tf/4b/core
   |     4.13s T distributional_parent from /Users/dirk/github/bhsa/tf/4b/core
   |     0.30s T domain               from /Users/dirk/github/bhsa/tf/4b/core
   |     0.92s T function             from /Users/dirk/github/bhsa/tf/4b/core
   |     7.15s T functional_parent    from /Users/dirk/github/bhsa/tf/4b/core
   |     1.58s T g_entry              from /Users/dirk/github/bhsa/tf/4b/core
   |     1.75s T g_entry_heb          from /Users/dirk/github/bhsa/tf/4b/core
   |     0.99s T g_nme                from /Users/dirk/github/bhsa/tf/4b/core
   |     1.07s T g_nme_utf8           from /Users/dirk/github/bhsa/tf/4b/core
   |     0.78s T g_pfm            