<img align="right" src="tf-small.png"/>

# SBLGNT and Text-Fabric

The source of the SBLGNT data in TF is really a treebank, a hierarchical structure.
We converted an XML representation of it into TF.

As an exercise, we convert the TF data back into a hierarchical structure.
This time the target is not XML but [YAML](http://www.yaml.org).

In [17]:
import os
import collections
import yaml
from tf.fabric import Fabric

In [2]:
TF = Fabric(modules='greek/sblgnt')

This is Text-Fabric 2.3.5
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
60 features found and 0 ignored


In [11]:
api = TF.load('''
    Cat Gender Tense
    Unicode UnicodeLemma Mood
    book book@en chapter verse
    otype function psp
    freq_occ freq_lex
    Head End
    nodeId
    child
              ''')

api.makeAvailableIn(globals())

  0.00s loading features ...
   |     2.32s T nodeId               from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     1.66s T child                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.00s Feature overview: 57 for nodes; 2 for edges; 1 configs; 7 computed
  4.80s All features loaded/computed - for details use loadLog()


In [27]:
outputDir = os.path.expanduser('~/Downloads/yamlFromTf')
os.makedirs(outputDir, exist_ok=True)
outFile = 'sblgnt.yaml'
outPath = os.path.join(outputDir, outFile)

In [116]:
def addBooks(dest):
    for b in F.otype.s('book'):
        chapters = []
        dest.append({Fs('book@en').v(b): chapters})
        addChapters(b, chapters)

def addChapters(b, dest):
    for c in L.d(b, 'chapter'):
        sentences = []
        dest.append({F.chapter.v(c): sentences})
        addSentences(c, sentences)

curNodeId = None
def addSentences(c, dest):
    global curNodeId
    for s in L.d(c, 'sentence'):
        children = []
        nodeId = F.nodeId.v(s)
        curNodeId = nodeId
        dest.append({nodeId: children})
        addNodes(s, children, 0)

maxK = 0

def addNodes(n, dest, k):
    global maxK
    if k > maxK:
        maxK = k
        print('{} at {}'.format(k, curNodeId))
    descendants = E.child.f(n)
    if not descendants:
        # leaf node: emit the word itself
        dest.append(F.Unicode.v(n))
    else:
        for d in descendants:
            children = []
            if k > 20:
                # too deep for the YAML serializer: flatten the subtree to its words
                dest.append({F.Cat.v(d): ' '.join(F.Unicode.v(w) for w in L.d(d, 'word'))})
            else:
                dest.append({F.Cat.v(d): children})
                addNodes(d, children, k+1)


In [117]:
nest = []
addBooks(nest)

1 at 400010010010083
2 at 400010010010083
3 at 400010010010083
4 at 400010010010083
5 at 400010010010083
6 at 400010010010083
7 at 400010010010083
8 at 400010010010083
9 at 400010010010083
10 at 400010120020741
11 at 400010120020741
12 at 400010120020741
13 at 400010120020741
14 at 400010120020741
15 at 400010120020741
16 at 400010120020741
17 at 400010200020501
18 at 400010220020361
19 at 400010220020361
20 at 400010220020361
21 at 400010220020361


In [119]:
indent(reset=True)
info('writing yaml')
with open(outPath, 'w', encoding='utf8') as of:
    yaml.dump(nest, of, allow_unicode=True, default_style=None)
info('Done')

  0.00s writing yaml
    54s Done


There is a tree with nesting depth 161 in the corpus, and the YAML serializer complained about it.
I have cut off the nesting at depth 20: deeper subtrees are written as a single string of their words.
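
The depth cut-off can be tried out on a toy tree without loading any TF data. The sketch below is a simplified illustration, not the notebook's own code: the function names `leafWords` and `toNested`, the `maxDepth` parameter, and the tuple-based toy tree are all assumptions made for this example. Inner nodes are `(category, children)` tuples and leaves are plain strings.

```python
def leafWords(tree):
    """Collect the leaf words of a toy tree, in order (hypothetical helper)."""
    if isinstance(tree, str):
        return [tree]
    _cat, children = tree
    result = []
    for child in children:
        result.extend(leafWords(child))
    return result


def toNested(tree, depth=0, maxDepth=20):
    """Turn a toy tree into nested single-key dicts.

    Inner nodes at depth >= maxDepth are flattened to one string of words,
    so the result never nests deeper than maxDepth levels of dicts.
    """
    if isinstance(tree, str):
        return tree
    cat, children = tree
    if depth >= maxDepth:
        return {cat: ' '.join(leafWords(tree))}
    return {cat: [toNested(c, depth + 1, maxDepth) for c in children]}


# A made-up toy tree, only for illustration
toy = ('S', [
    ('NP', ['ὁ', 'λόγος']),
    ('VP', ['ἦν', ('PP', ['πρὸς', ('NP', ['τὸν', 'θεόν'])])]),
])

full = toNested(toy)                # no flattening: the tree is shallow
capped = toNested(toy, maxDepth=2)  # the PP subtree becomes a single string
```

With `maxDepth=2`, the `PP` node sits at depth 2 and is emitted as `{'PP': 'πρὸς τὸν θεόν'}`, while the shallower `NP` keeps its list of words. The resulting structure consists only of dicts, lists and strings, which is the kind of data `yaml.dump` accepts.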