<img align="right" src="tf-small.png"/>

# SBLGNT and Text-Fabric

The source of the SBLGNT data in TF is a treebank, i.e. a hierarchical structure.
We converted an XML representation of it into TF.

As an exercise, we convert the TF data back into a hierarchical structure. Instead of XML, we produce
[YAML](http://www.yaml.org).
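
To make the target shape concrete before touching the corpus, here is a minimal sketch of the kind of nested structure that gets built: every node is a single-key dict that maps a label to a list of children, and leaves are plain strings. The toy data and the helper `leaves` are hypothetical, made up for illustration; they are not part of Text-Fabric or of this notebook.

```python
# Toy example (hypothetical data): a nested structure of single-key dicts,
# the same shape that is later handed to the YAML serializer.
toy = [
    {'Matthew': [
        {1: [
            {'CL': [
                {'np': [{'noun': ['Βίβλος']}]},
                {'np': [{'noun': ['γενέσεως']}]},
            ]},
        ]},
    ]},
]


def leaves(node):
    """Yield the leaf strings of the nested structure in document order."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        # a node: descend into the children stored under its single key
        for children in node.values():
            yield from leaves(children)
    else:
        # a list of child nodes
        for child in node:
            yield from leaves(child)


print(' '.join(leaves(toy)))
```

Walking the leaves in order gives back the running text, which is a cheap sanity check that a conversion of this kind has not lost or reordered words.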

In [1]:
import os, collections, yaml
from tf.fabric import Fabric

In [2]:
TF = Fabric(modules='greek/sblgnt')

This is Text-Fabric 2.3.6
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
60 features found and 0 ignored


In [3]:
api = TF.load('''
    Cat Gender Tense
    Unicode UnicodeLemma Mood
    book book@en chapter verse
    otype function psp
    freq_occ freq_lex
    Head End
    nodeId
    child
              ''')

api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.02s B otype                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.01s B book                 from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.01s B chapter              from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.03s B verse                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.09s B Unicode              from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.08s B UnicodeLemma         from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.13s B Cat                  from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.04s B Gender               from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.04s B Tense                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.02s B Mood                 from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.01s B book@en              from /Users/d

In [4]:
F.Unicode.freqList()

(('καὶ', 8559),
 ('ὁ', 2800),
 ('ἐν', 2682),
 ('δὲ', 2609),
 ('τοῦ', 2500),
 ('εἰς', 1747),
 ('τὸ', 1656),
 ('τὸν', 1562),
 ('τὴν', 1514),
 ('τῆς', 1299),
 ('ὅτι', 1282),
 ('τῷ', 1224),
 ('τῶν', 1208),
 ('οἱ', 1079),
 ('ἡ', 953),
 ('γὰρ', 922),
 ('μὴ', 904),
 ('αὐτοῦ', 883),
 ('τῇ', 861),
 ('τὰ', 819),
 ('οὐκ', 765),
 ('τοὺς', 727),
 ('ἐκ', 669),
 ('πρὸς', 667),
 ('ἵνα', 661),
 ('ἐπὶ', 655),
 ('αὐτὸν', 651),
 ('οὐ', 628),
 ('τοῖς', 614),
 ('αὐτῷ', 512),
 ('διὰ', 482),
 ('ὡς', 480),
 ('οὖν', 470),
 ('ἀπὸ', 467),
 ('θεοῦ', 423),
 ('ἀλλὰ', 415),
 ('ἐστιν', 400),
 ('εἶπεν', 397),
 ('ὑμῶν', 378),
 ('Καὶ', 375),
 ('αὐτῶν', 374),
 ('ὑμῖν', 371),
 ('εἰ', 365),
 ('μου', 351),
 ('τὰς', 337),
 ('ἢ', 333),
 ('Ἰησοῦς', 324),
 ('κατὰ', 315),
 ('περὶ', 312),
 ('ὑμᾶς', 297),
 ('αὐτοῖς', 280),
 ('σου', 279),
 ('ἡμῶν', 275),
 ('ἦν', 275),
 ('λέγει', 269),
 ('μετὰ', 269),
 ('τοῦτο', 259),
 ('ἐὰν', 259),
 ('ἐγὼ', 239),
 ('θεὸς', 236),
 ('Ἰησοῦ', 230),
 ('τί', 228),
 ('ἐξ', 227),
 ('πάντα', 224),
 ('τις', 

In [27]:
outputDir = os.path.expanduser('~/Downloads/yamlFromTf')
os.makedirs(outputDir, exist_ok=True)
outFile = 'sblgnt.yaml'
outPath = os.path.join(outputDir, outFile)

In [116]:
def addBooks(dest):
    for b in F.otype.s('book'):
        chapters = []
        dest.append({Fs('book@en').v(b): chapters})
        addChapters(b, chapters)

def addChapters(b, dest):
    for c in L.d(b, 'chapter'):
        sentences = []
        dest.append({F.chapter.v(c): sentences})
        addSentences(c, sentences)

curNodeId = None
def addSentences(c, dest):
    global curNodeId
    for s in L.d(c, 'sentence'):
        children = []
        nodeId = F.nodeId.v(s)
        curNodeId = nodeId
        dest.append({nodeId: children})
        addNodes(s, children, 0)

maxK = 0

def addNodes(n, dest, k):
    global maxK
    # report each new record nesting depth, with the sentence where it occurs
    if k > maxK:
        maxK = k
        print('{} at {}'.format(k, curNodeId))
    descendants = E.child.f(n)
    if not descendants:
        # leaf node: store the word itself
        dest.append(F.Unicode.v(n))
    else:
        for d in descendants:
            if k > 20:
                # too deep for the serializer: flatten to the text of the words inside d
                dest.append({F.Cat.v(d): ' '.join(F.Unicode.v(w) for w in L.d(d, 'word'))})
            else:
                children = []
                dest.append({F.Cat.v(d): children})
                addNodes(d, children, k+1)


In [117]:
nest = []
addBooks(nest)

1 at 400010010010083
2 at 400010010010083
3 at 400010010010083
4 at 400010010010083
5 at 400010010010083
6 at 400010010010083
7 at 400010010010083
8 at 400010010010083
9 at 400010010010083
10 at 400010120020741
11 at 400010120020741
12 at 400010120020741
13 at 400010120020741
14 at 400010120020741
15 at 400010120020741
16 at 400010120020741
17 at 400010200020501
18 at 400010220020361
19 at 400010220020361
20 at 400010220020361
21 at 400010220020361


In [119]:
indent(reset=True)
info('writing yaml')
with open(outPath, 'w', encoding='utf8') as of:
    yaml.dump(nest, of, allow_unicode=True, default_style=None)
info('Done')

  0.00s writing yaml
    54s Done


```
- Matthew:
  - 1:
    - '400010010010083':
      - CL:
        - P:
          - np:
            - np:
              - noun: [Βίβλος]
            - np:
              - np:
                - noun: [γενέσεως]
              - np:
                - np:
                  - np:
                    - np:
                      - noun: [Ἰησοῦ]
                    - np:
                      - noun: [χριστοῦ]
                  - np:
                    - np:
                      - noun: [υἱοῦ]
                    - np:
                      - noun: [Δαυὶδ]
                - np:
                  - np:
                    - noun: [υἱοῦ]
                  - np:
                    - noun: [Ἀβραάμ.]
```

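The corpus contains a tree with a nesting depth of 161, and the YAML serializer complained about it.
I have therefore cut off the nesting at depth 20: anything deeper is written out as flat text, as the end of the following sentence (Luke 3:23) shows.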

```
    - '420030230011651':
      - CL:
        - conj: [Καὶ]
        - CL:
          - S:
            - np:
              - pron: [αὐτὸς]
              - np:
                - noun: [Ἰησοῦς]
          - V:
            - vp:
              - verb: [ἦν]
              - vp:
                - verb: [ἀρχόμενος]
          - ADV:
            - np:
              - advp:
                - adv: [ὡσεὶ]
              - np:
                - np:
                  - noun: [ἐτῶν]
                - nump:
                  - num: ['τριάκοντα,']
          - ADV:
            - CL:
              - CL:
                - conj: [ὡς]
                - CL:
                  - V:
                    - vp:
                      - verb: ['ἐνομίζετο,']
              - CL:
                - VC:
                  - vp:
                    - verb: [ὢν]
                - P:
                  - np:
                    - np:
                      - noun: ['υἱός,']
                    - np:
                      - np:
                        - noun: [Ἰωσὴφ]
                      - np:
                        - np:
                          - det: [τοῦ]
                        - np:
                          - np:
                            - noun: [Ἠλὶ]
                          - np:
                            - np:
                              - det: [τοῦ]
                            - np:
                              - np:
                                - noun: [Μαθθὰτ]
                              - np:
                                - np:
                                  - det: [τοῦ]
                                - np:
                                  - np:
                                    - noun: [Λευὶ]
                                  - np:
                                    - np:
                                      - det: [τοῦ]
                                    - np:
                                      - np:
                                        - noun: [Μελχὶ]
                                      - np:
                                        - np:
                                          - det: [τοῦ]
                                        - np:
                                          - np:
                                            - noun: [Ἰανναὶ]
                                          - np:
                                            - np:
                                              - det: [τοῦ]
                                            - np:
                                              - np:
                                                - {noun: ''}
                                              - np:
                                                - {np: τοῦ}
                                                - {np: Ματταθίου τοῦ Ἀμὼς τοῦ Ναοὺμ
                                                    τοῦ Ἑσλὶ τοῦ Ναγγαὶ τοῦ Μάαθ τοῦ
                                                    Ματταθίου τοῦ Σεμεῒν τοῦ Ἰωσὴχ
                                                    τοῦ Ἰωδὰ τοῦ Ἰωανὰν τοῦ Ῥησὰ τοῦ
                                                    Ζοροβαβὲλ τοῦ Σαλαθιὴλ τοῦ Νηρὶ
                                                    τοῦ Μελχὶ τοῦ Ἀδδὶ τοῦ Κωσὰμ τοῦ
                                                    Ἐλμαδὰμ τοῦ Ἢρ τοῦ Ἰησοῦ τοῦ Ἐλιέζερ
                                                    τοῦ Ἰωρὶμ τοῦ Μαθθὰτ τοῦ Λευὶ
                                                    τοῦ Συμεὼν τοῦ Ἰούδα τοῦ Ἰωσὴφ
                                                    τοῦ Ἰωνὰμ τοῦ Ἐλιακὶμ τοῦ Μελεὰ
                                                    τοῦ Μεννὰ τοῦ Ματταθὰ τοῦ Ναθὰμ
                                                    τοῦ Δαυὶδ τοῦ Ἰεσσαὶ τοῦ Ἰωβὴλ
                                                    τοῦ Βόος τοῦ Σαλὰ τοῦ Ναασσὼν
                                                    τοῦ Ἀμιναδὰβ τοῦ Ἀδμὶν τοῦ Ἀρνὶ
                                                    τοῦ Ἑσρὼμ τοῦ Φαρὲς τοῦ Ἰούδα
                                                    τοῦ Ἰακὼβ τοῦ Ἰσαὰκ τοῦ Ἀβραὰμ
                                                    τοῦ Θάρα τοῦ Ναχὼρ τοῦ Σεροὺχ
                                                    τοῦ Ῥαγαὺ τοῦ Φάλεκ τοῦ Ἔβερ τοῦ
                                                    Σαλὰ τοῦ Καϊνὰμ τοῦ Ἀρφαξὰδ τοῦ
                                                    Σὴμ τοῦ Νῶε τοῦ Λάμεχ τοῦ Μαθουσαλὰ
                                                    τοῦ Ἑνὼχ τοῦ Ἰάρετ τοῦ Μαλελεὴλ
                                                    τοῦ Καϊνὰμ τοῦ Ἐνὼς τοῦ Σὴθ τοῦ
                                                    Ἀδὰμ τοῦ θεοῦ.}
```
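
The depth cutoff used above can be tried out without Text-Fabric. The sketch below is an illustration under stated assumptions, not the notebook's code: the toy tree format (`(label, children)` for inner nodes, `(label, word)` for leaves) and the functions `chain`, `words`, `convert` and `depth` are all hypothetical. It builds a right-branching tree like the genealogy, converts it to nested single-key dicts, and flattens everything below a chosen depth to plain text.

```python
# Hypothetical toy tree: (label, [children]) for inner nodes, (label, word) for leaves.
def chain(levels):
    """Build a right-branching toy tree with `levels` nested 'np' nodes."""
    tree = ('noun', 'ω')
    for _ in range(levels):
        tree = ('np', [('det', 'τοῦ'), tree])
    return tree


def words(tree):
    """Collect the words under a toy node, in order."""
    label, content = tree
    if isinstance(content, str):
        return [content]
    result = []
    for child in content:
        result.extend(words(child))
    return result


def convert(tree, k=0, max_depth=3):
    """Turn a toy node into a single-key dict, flattening nodes at max_depth."""
    label, content = tree
    if isinstance(content, str):
        return {label: [content]}
    if k >= max_depth:
        # cutoff reached: keep the label, replace the subtree by its text
        return {label: ' '.join(words(tree))}
    return {label: [convert(c, k + 1, max_depth) for c in content]}


def depth(node):
    """Nesting depth of a converted structure, counting dict levels."""
    if isinstance(node, dict):
        return 1 + max(depth(v) for v in node.values())
    if isinstance(node, list):
        return max((depth(c) for c in node), default=0)
    return 0


full = convert(chain(10), max_depth=100)   # no cutoff in effect
capped = convert(chain(10), max_depth=3)   # flatten below level 3
print(depth(full), depth(capped))
```

The capped structure keeps every word but bounds the nesting, which is the trade-off made above: the deepest part of the tree loses its internal structure in exchange for output the serializer can handle.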