<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/logo.png"/>

# Tutorial

This notebook gets you started with using
[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the Tischendorf New Testament.

Familiarity with the underlying
[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)
is recommended.

## Installing Text-Fabric

### Python

You need to have Python on your system. Most systems have it out of the box,
but alas, that is often Python 2, and we need at least Python **3.6**.

Install it from [python.org](https://www.python.org) or from
[Anaconda](https://www.anaconda.com/download).

### TF itself

```
pip3 install text-fabric
```

### Jupyter notebook

You need [Jupyter](http://jupyter.org).

If it is not already installed:

```
pip3 install jupyter
```

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import collections

In [3]:
from tf.app import use

## Tischendorf data

Text-Fabric will fetch a standard set of features for you from the newest GitHub release binaries.

The data will be stored in the `text-fabric-data` directory in your home directory.

# Load Features
The data of the corpus is organized in features.
They are *columns* of data.
Think of the text as a gigantic spreadsheet, where row 1 corresponds to the
first word, row 2 to the second word, and so on, for all 100,000+ words.

The letters of each word make up one column of that spreadsheet.

The corpus contains 17 such columns (see the loading messages below), not only for the words, but also for
textual objects, such as *books*, *chapters*, and *verses*.

Instead of putting that information in one big table, the data is organized in separate columns.
We call those columns **features**.
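The spreadsheet picture can be made concrete in plain Python. The sketch below does not use Text-Fabric at all. It models each feature as a dict from node number to value, with made-up node numbers and values. The helper `row` is hypothetical and is not part of the Text-Fabric API.

```python
# Toy corpus: nodes 1-3 are words (slots), node 4 is a verse spanning them.
# Each feature is one "column": a dict from node number to value.
otype = {1: "word", 2: "word", 3: "word", 4: "verse"}
morph = {1: "N-NSF", 2: "N-GSF", 3: "N-GSM"}  # only words have a morph tag
verse = {4: 1}  # only the verse node has a verse number

features = {"otype": otype, "morph": morph, "verse": verse}


def row(node):
    """Collect one spreadsheet row: all feature values defined for node."""
    return {name: column[node] for (name, column) in features.items() if node in column}


print(row(1))  # {'otype': 'word', 'morph': 'N-NSF'}
print(row(4))  # {'otype': 'verse', 'verse': 1}
```

Most cells in one big table would be empty, because a feature typically has values only for some node types.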

For the very latest version, use `hot`.

For the latest release, use `latest`.

If you have cloned the repos (TF app and data), use `clone`.

If you do not want/need to upgrade, leave out the checkout specifiers.

In [8]:
A = use("annotation/tischendorf_tf:hot", hoist=globals())

rate limit is 5000 requests per hour, with 4821 left for this hour
	connecting to online GitHub repo annotation/tischendorf_tf ... connected
	app/config.yaml...downloaded
	app/static...directory
		app/static/logo.png...downloaded
	OK


The requested data is not available offline
	~/text-fabric-data/annotation/tischendorf_tf/tf not found
rate limit is 5000 requests per hour, with 4812 left for this hour
	connecting to online GitHub repo annotation/tischendorf_tf ... connected
	no releases
	no releases
	tf/2.8/anlex_lem.tf...downloaded
	tf/2.8/book.tf...downloaded
	tf/2.8/book_code.tf...downloaded
	tf/2.8/chapter.tf...downloaded
	tf/2.8/freq_lex.tf...downloaded
	tf/2.8/gloss.tf...downloaded
	tf/2.8/ketiv.tf...downloaded
	tf/2.8/morph.tf...downloaded
	tf/2.8/oslots.tf...downloaded
	tf/2.8/otext.tf...downloaded
	tf/2.8/otype.tf...downloaded
	tf/2.8/para.tf...downloaded
	tf/2.8/qere.tf...downloaded
	tf/2.8/str_lem.tf...downloaded
	tf/2.8/strongs.tf...downloaded
	tf/2.8/verse.tf...downloaded
	tf/2.8/vrsnum.tf...downloaded
	OK


This is Text-Fabric 9.2.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

17 features found and 0 ignored
   |     0.08s T otype                from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |     0.61s T oslots               from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |     0.00s T book                 from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |     0.00s T chapter              from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |     0.04s T verse                from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |     0.80s T qere                 from ~/text-fabric-data/annotation/tischendorf_tf/tf/2.8
   |      |     0.01s C __levels__           from otype, oslots, otext
   |      |     0.99s C __order__            from otype, oslots, __levels__
   |      |     0.05s C __rank__             from otype, __order__
   |      |     3.10s C __levUp__            from otype, oslots, __rank__
   |   

## API

At this point it is helpful to throw a quick glance at the Text-Fabric API documentation
(see the **Api reference** link in the loading output above).

The most essential thing for now is that we can use `F` to access the data in the features
we've loaded.
But there is more, such as `N`, which helps us to walk over the text, as we will see in a minute.

# Counting

In order to get acquainted with the data, we start with the simple task of counting.

## Count all nodes
We use the
[`N.walk()` generator](https://annotation.github.io/text-fabric/tf/core/nodes.html#tf.core.nodes.Nodes.walk)
to walk through the nodes.

We compared the corpus to a gigantic spreadsheet, where the rows correspond to the words.
In Text-Fabric, we call the rows `slots`, because they are the textual positions that can be filled with words.

We also mentioned that there are other textual objects.
They are the books, chapters, paragraphs, verses and lexemes.
They also correspond to rows in the big spreadsheet.

In Text-Fabric we call all these rows *nodes*, and the `N.walk()` generator
carries us through those nodes in the textual order.
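Here is a toy sketch of such a walk in plain Python. It assumes a simplified notion of textual order: nodes that start at an earlier slot come first, and when two nodes start at the same slot, the longer (embedding) one comes first. Text-Fabric's actual canonical ordering has more tie-breaking rules. `walk` and `oslots` here are stand-ins, not the real `N.walk()` or `oslots` feature.

```python
# Toy data: slots 1..5 are words; the other nodes are linked to slots.
MAX_SLOT = 5
oslots = {
    6: (1, 2, 3, 4, 5),  # a chapter
    7: (1, 2, 3),        # its first verse
    8: (4, 5),           # its second verse
}


def slots_of(node):
    """A slot occupies just itself; other nodes look up their slots."""
    return (node,) if node <= MAX_SLOT else oslots[node]


def walk():
    """Yield all nodes: earlier first slot first, on a tie the longer node first."""
    nodes = list(range(1, MAX_SLOT + 1)) + list(oslots)
    yield from sorted(nodes, key=lambda n: (slots_of(n)[0], -slots_of(n)[-1], n))


print(list(walk()))            # [6, 7, 1, 2, 3, 8, 4, 5]
print(sum(1 for _ in walk()))  # 8 nodes, counted as in the cell below
```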

Just one extra thing: the `info` statements generate timed messages.
If you use them instead of `print` you'll get a sense of the amount of time that
the various processing steps typically need.

In [9]:
A.indent(reset=True)
A.info("Counting nodes ...")

i = 0
for n in N.walk():
    i += 1

A.info("{} nodes".format(i))

  0.00s Counting nodes ...
  0.02s 152077 nodes


## What are those nodes?
Every node has a type, such as word, verse or book.
We know that we have approximately 150,000 nodes, most of which are words.
But what exactly are they?

Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.
`otype` tells you for each node its type, and you can ask it for the number of `slot`s in the text.
`oslots` tells you for each non-slot node which slots it is linked to.

Here we go!

In [10]:
F.otype.slotType

'word'

In [11]:
F.otype.maxSlot

137711

In [12]:
F.otype.maxNode

152077

In [13]:
F.otype.all

('book', 'chapter', 'paragraph', 'lex', 'verse', 'word')

In [14]:
C.levels.data

(('book', 5100.407407407408, 137712, 137738),
 ('chapter', 525.6145038167939, 137739, 138000),
 ('paragraph', 188.12978142076503, 138001, 138732),
 ('lex', 25.53040415276233, 146684, 152077),
 ('verse', 17.319959753490128, 138733, 146683),
 ('word', 1, 1, 137711))

This is interesting: above you see all the textual objects, with the average size of their objects,
the node where they start, and the node where they end.
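To see what these numbers mean, here is a toy sketch that computes the same kind of summary from a handful of made-up nodes. For each node type, it reports the average number of slots and the first and last node of that type. The rows are ordered from large to small average size, as in the output above. The function `levels` is hypothetical; Text-Fabric precomputes the real table for you.

```python
from collections import defaultdict

# Toy data: node types and slot sets (slots 1..5 are words).
MAX_SLOT = 5
otype = {1: "word", 2: "word", 3: "word", 4: "word", 5: "word",
         6: "chapter", 7: "verse", 8: "verse"}
oslots = {6: (1, 2, 3, 4, 5), 7: (1, 2, 3), 8: (4, 5)}


def levels():
    """Per node type: (type, average size in slots, first node, last node)."""
    sizes = defaultdict(list)
    for node, tp in otype.items():
        n_slots = 1 if node <= MAX_SLOT else len(oslots[node])
        sizes[tp].append((node, n_slots))
    rows = []
    for tp, items in sizes.items():
        nodes = [node for (node, _) in items]
        avg = sum(size for (_, size) in items) / len(items)
        rows.append((tp, avg, min(nodes), max(nodes)))
    # largest average size first
    return tuple(sorted(rows, key=lambda r: -r[1]))


for row in levels():
    print(row)
# ('chapter', 5.0, 6, 6)
# ('verse', 2.5, 7, 8)
# ('word', 1.0, 1, 5)
```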

## Count individual object types
This is an intuitive way to count the number of nodes in each type.
Note in passing how we use `indent` in conjunction with `info` to produce neat timed
and indented progress messages.

In [15]:
A.indent(reset=True)
A.info("counting objects ...")

for otype in F.otype.all:
    i = 0
    A.indent(level=1, reset=True)

    for n in F.otype.s(otype):
        i += 1

    A.info("{:>7} {}s".format(i, otype))

A.indent(level=0)
A.info("Done")

  0.00s counting objects ...
   |     0.00s      27 books
   |     0.00s     262 chapters
   |     0.00s     732 paragraphs
   |     0.00s    5394 lexs
   |     0.00s    7951 verses
   |     0.01s  137711 words
  0.02s Done


# Viewing textual objects

We use the A API (the extra power) to peek into the corpus.

Let's inspect some words.

In [16]:
wordShow = (1000, 10000, 100000)
for word in wordShow:
    A.pretty(word, withNodes=True)

# Feature statistics

`F`
gives access to all features.
Every feature has a method
`freqList()`
to generate a frequency list of its values, higher frequencies first.
Here are the 20 most frequent morphological tags:

In [17]:
F.morph.freqList()[0:20]

(('CONJ', 16302),
 ('PREP', 10531),
 ('ADV', 3763),
 ('N-NSM', 3471),
 ('N-GSM', 2932),
 ('T-NSM', 2902),
 ('N-ASF', 2876),
 ('PRT-N', 2670),
 ('N-ASM', 2456),
 ('V-PAI-3S', 2271),
 ('N-GSF', 2187),
 ('T-GSM', 1904),
 ('N-NSF', 1601),
 ('T-ASM', 1574),
 ('T-ASF', 1526),
 ('N-DSF', 1460),
 ('P-GSM', 1358),
 ('T-GSF', 1292),
 ('V-2AAI-3S', 1254),
 ('N-DSM', 1235))

Here is the number of verbs, assuming that any word with a morph tag starting with `V-` is a verb.

In [18]:
nVerbs = 0

for (tag, n) in F.morph.freqList():
    if tag.startswith("V-"):
        nVerbs += n

nVerbs

28372
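The frequency list and the verb count above can be sketched in plain Python with `collections.Counter`. This is a toy with made-up tags, not Text-Fabric's implementation. In particular, breaking ties alphabetically is our own choice.

```python
from collections import Counter

# Toy feature: morph tags of six word nodes (made-up values).
morph = {1: "N-NSF", 2: "N-GSF", 3: "N-GSM", 4: "CONJ", 5: "CONJ", 6: "N-GSM"}


def freq_list(feature):
    """(value, count) pairs, most frequent first; ties sorted by value."""
    counts = Counter(feature.values())
    return tuple(sorted(counts.items(), key=lambda vc: (-vc[1], vc[0])))


print(freq_list(morph))
# (('CONJ', 2), ('N-GSM', 2), ('N-GSF', 1), ('N-NSF', 1))

# Summing over a tag prefix, like the verb count above, here for nouns.
n_nouns = sum(n for (tag, n) in freq_list(morph) if tag.startswith("N-"))
print(n_nouns)  # 4
```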

The feature `anlex_lem` contains lexeme information.

In [19]:
F.anlex_lem.freqList()[0:20]

(('ὁ', 19788),
 ('καί', 8978),
 ('αὐτός', 5566),
 ('σύ', 2903),
 ('δέ', 2782),
 ('ἐν', 2744),
 ('ἐγώ', 2583),
 ('εἰμί', 2462),
 ('λέγω', 2259),
 ('εἰς', 1770),
 ('οὐ', 1627),
 ('ὅς', 1405),
 ('οὗτος', 1381),
 ('θεός', 1313),
 ('ὅτι', 1300),
 ('πᾶς', 1227),
 ('γάρ', 1037),
 ('μή', 1034),
 ('ἐκ', 916),
 ('Ἰησοῦς', 909))

# Lexeme matters

## Top 10 frequent verbs

If we count the frequency of words, we usually mean the frequency of their
corresponding lexemes.

There are several methods for working with lexemes.
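A tiny example shows the difference between counting words and counting lexemes. In the toy sketch below, the forms and lemmas are genuine Greek, but the token sequence is made up. Nothing here uses Text-Fabric.

```python
from collections import Counter

# Toy tokens: (word form, lemma) pairs in a made-up order.
tokens = [("λέγει", "λέγω"), ("λέγουσιν", "λέγω"), ("ἔχει", "ἔχω"), ("λέγει", "λέγω")]

form_counts = Counter(form for (form, _) in tokens)     # counts inflected forms
lemma_counts = Counter(lemma for (_, lemma) in tokens)  # counts lexemes

print(form_counts.most_common())   # [('λέγει', 2), ('λέγουσιν', 1), ('ἔχει', 1)]
print(lemma_counts.most_common())  # [('λέγω', 3), ('ἔχω', 1)]
```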

### Method 1: counting words

In [20]:
verbs = collections.Counter()
A.indent(reset=True)
A.info("Collecting data")

verbStart = "V-"

for w in F.otype.s("word"):
    if not F.morph.v(w).startswith(verbStart):
        continue
    verbs[F.anlex_lem.v(w)] += 1

A.info("Done")
print(
    "".join(
        "{}: {}\n".format(verb, cnt)
        for (verb, cnt) in sorted(verbs.items(), key=lambda x: (-x[1], x[0]))[0:10]
    )
)

  0.00s Collecting data
  0.08s Done
εἰμί: 2461
λέγω: 2258
ἔχω: 707
ὁράω: 683
γίνομαι: 667
ἔρχομαι: 627
ποιέω: 569
ἀκούω: 427
δίδωμι: 414
οἶδα: 317



## Lexeme distribution

Let's do a bit more fancy lexeme stuff.

### Hapaxes

A hapax can be found by inspecting lexemes and seeing to how many word nodes they are linked.
If that number is one, we have a hapax.

We print the first 10 hapaxes in alphabetical order.

In [21]:
A.indent(reset=True)

lexIndex = collections.defaultdict(list)

for n in F.otype.s("word"):
    lexIndex[F.anlex_lem.v(n)].append(n)

hapax = dict((lex, occs) for (lex, occs) in lexIndex.items() if len(occs) == 1)

A.info("{} hapaxes found".format(len(hapax)))

for h in sorted(hapax)[0:10]:
    print(f"\t{h}")

  0.07s 1920 hapaxes found
	Αἰνών
	Αὐγοῦστος
	Βάαλ
	Βαλάκ
	Βαράκ
	Βαραχίας
	Βαριησοῦς
	Βαριωνᾶ
	Βαρτιμαῖος
	Βελιάρ


If we want more info on the hapaxes, we get it by means of their *node*.
The `lexIndex` dictionary stores the occurrences of a lexeme as a list of nodes.

Let's get the morphological tag and the gloss of those 10 hapaxes.

In [22]:
for h in sorted(hapax)[0:10]:
    node = hapax[h][0]
    print(f"\t{F.anlex_lem.v(node):<20} {F.morph.v(node):<12} {F.gloss.v(node)}")

	Αἰνών                N-PRI        Aenon
	Αὐγοῦστος            N-GSM        Augustus
	Βάαλ                 N-PRI        Baal
	Βαλάκ                N-PRI        Balak
	Βαράκ                N-PRI        Barak
	Βαραχίας             N-GSM        Berekiah
	Βαριησοῦς            N-GSM        Bar-Jesus
	Βαριωνᾶ              N-PRI        son of Jonah
	Βαρτιμαῖος           N-NSM        Bartimaeus
	Βελιάρ               N-PRI        needle


### Small occurrence base

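The occurrence base of a lexeme is the set of verses, chapters and books in which it occurs.
Let's look for lexemes that occur in a single chapter.

Note that the hapaxes we found above trivially satisfy this condition. The code below does not filter them out, which is why some lexemes in the list occur only once.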
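Leaving the hapaxes out of such a list takes one extra condition on the number of occurrences. Here is a toy sketch with made-up node numbers. `lex_index` and `chapter_of` are hypothetical stand-ins for `lexIndex` and for `L.u(n, otype="chapter")[0]`.

```python
# Toy data (made up): lexeme -> word nodes, and word node -> chapter node.
lex_index = {"Ζάρα": [1], "Σαλμών": [2, 3], "λέγω": [4, 5]}
chapter_of = {1: 10, 2: 10, 3: 10, 4: 10, 5: 11}

single_chapter = {
    lex: occs
    for (lex, occs) in lex_index.items()
    if len(occs) > 1  # skip the hapaxes
    and len({chapter_of[n] for n in occs}) == 1  # all occurrences in one chapter
}
print(single_chapter)  # {'Σαλμών': [2, 3]}
```

In the notebook cell, you would get the same effect by adding `len(occs) > 1 and` in front of the chapter test in the condition of `singleCh`. The counts in the output would then be lower.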

In [23]:
A.indent(reset=True)
A.info("Finding single chapter lexemes")

lexChapterIndex = {}

for (lex, occs) in lexIndex.items():
    lexChapterIndex[lex] = set(L.u(n, otype="chapter")[0] for n in occs)

singleCh = [
    (lex, occs)
    for (lex, occs) in lexIndex.items()
    if len(lexChapterIndex.get(lex, [])) == 1
]

A.info("{} single chapter lexemes found".format(len(singleCh)))

for (lex, occs) in sorted(singleCh[0:10]):
    print(lex, occs)
    print(
        "{:<20} {:<6} ({}x)".format(
            "{} {}:{}".format(*T.sectionFromNode(occs[0])),
            lex,
            len(occs),
        )
    )

  0.00s Finding single chapter lexemes
  0.44s 2124 single chapter lexemes found
Ζάρα [34]
Matthew 1:3          Ζάρα   (1x)
Θαμάρ [37]
Matthew 1:3          Θαμάρ  (1x)
Οὐρίας [99]
Matthew 1:6          Οὐρίας (1x)
Σαλμών [62, 63]
Matthew 1:4          Σαλμών (2x)
Ἀράμ [47, 48]
Matthew 1:3          Ἀράμ   (2x)
Ἀσάφ [114, 115]
Matthew 1:7          Ἀσάφ   (2x)
Ἰωσαφάτ [119, 120]
Matthew 1:8          Ἰωσαφάτ (2x)
Ῥαχάβ [70]
Matthew 1:5          Ῥαχάβ  (1x)
Ῥοβοάμ [104, 105]
Matthew 1:7          Ῥοβοάμ (2x)
Ῥούθ [78]
Matthew 1:5          Ῥούθ   (1x)


### Confined to books

As a final exercise with lexemes, let's make a list of all books, and show their total number of lexemes and
the number of lexemes that occur exclusively in that book.

In [24]:
A.indent(reset=True)
A.info("Making book-lexeme index")

allBook = collections.defaultdict(set)
allLex = set()

for b in F.otype.s("book"):
    for w in L.d(b, "word"):
        ln = F.anlex_lem.v(w)
        allBook[b].add(ln)
        allLex.add(ln)

A.info("Found {} lexemes".format(len(allLex)))

  0.00s Making book-lexeme index
  0.10s Found 5394 lexemes


In [25]:
A.indent(reset=True)
A.info("Finding single book lexemes")

lexBookIndex = {}

for (lex, occs) in lexIndex.items():
    lexBookIndex[lex] = set(L.u(n, otype="book")[0] for n in occs)

singleBookLex = collections.defaultdict(set)
for (lex, books) in lexBookIndex.items():
    if len(books) == 1:
        singleBookLex[list(books)[0]].add(lex)

singleBook = {book: len(lexs) for (book, lexs) in singleBookLex.items()}

A.info("found {} single book lexemes".format(sum(singleBook.values())))

  0.00s Finding single book lexemes
  0.46s found 2404 single book lexemes


In [26]:
print(
    "{:<20}{:>5}{:>5}{:>5}\n{}".format(
        "book",
        "#all",
        "#own",
        "%own",
        "-" * 35,
    )
)
booklist = []

for b in F.otype.s("book"):
    book = T.bookName(b)
    a = len(allBook[b])
    o = singleBook.get(b, 0)
    p = 100 * o / a
    booklist.append((book, a, o, p))

for x in sorted(booklist, key=lambda e: (-e[3], -e[1], e[0])):
    print("{:<20} {:>4} {:>4} {:>4.1f}%".format(*x))

book                 #all #own %own
-----------------------------------
Acts                 2017  555 27.5%
Luke                 2039  339 16.6%
Hebrews              1026  155 15.1%
2_Peter               398   57 14.3%
Revelation            911  126 13.8%
1_Timothy             537   74 13.8%
2_Timothy             453   62 13.7%
Romans               1058  132 12.5%
2_Corinthians         786   94 12.0%
James                 554   63 11.4%
1_Peter               542   60 11.1%
1_Corinthians         954  103 10.8%
Titus                 298   31 10.4%
John                 1018  104 10.2%
Philippians           443   42  9.5%
Matthew              1664  149  9.0%
Colossians            430   38  8.8%
Ephesians             527   41  7.8%
Jude                  226   15  6.6%
Galatians             519   34  6.6%
Mark                 1330   84  6.3%
1_Thessalonians       362   19  5.2%
Philemon              141    7  5.0%
2_Thessalonians       251   10  4.0%
3_John                108    4  3.7%
2_J

# Layer API
We travel upwards and downwards, forwards and backwards through the nodes.
The Layer-API (`L`) provides functions: `u()` for going up, and `d()` for going down,
`n()` for going to next nodes and `p()` for going to previous nodes.

These directions are indirect notions: nodes are just numbers, but by means of the
`oslots` feature they are linked to slots. One node *contains* another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.
And one node is next to or previous to another node if its slots follow or precede the slots of the other one.

`L.u(node)` **Up** is going to nodes that embed `node`.

`L.d(node)` **Down** is the opposite direction, to those that are contained in `node`.

`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.

`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.

All these functions yield nodes of all possible otypes.
By passing an optional parameter, you can restrict the results to nodes of that type.

The results are ordered according to the order of things in the text.

The functions always return a tuple, even if there is just one node in the result.
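All four directions are defined purely in terms of slot sets, so they can be sketched in plain Python. The toy below is not Text-Fabric's implementation. The real one uses precomputed indexes and handles nodes with identical slot sets more carefully. The functions `up`, `down`, `next_` and `prev` are hypothetical stand-ins for `L.u`, `L.d`, `L.n` and `L.p`. The textual order is simplified: earlier first slot first, and on a tie the longer node first.

```python
# Toy data: slots 1..6 are words; the other nodes are linked to slots.
MAX_SLOT = 6
oslots = {7: {1, 2, 3, 4, 5, 6}, 8: {1, 2, 3}, 9: {4, 5, 6}}
node_type = {7: "book", 8: "verse", 9: "verse"}
node_type.update({s: "word" for s in range(1, MAX_SLOT + 1)})


def slots(n):
    return {n} if n <= MAX_SLOT else oslots[n]


def _result(candidates, otype):
    """Filter on node type (if given) and return a tuple in textual order."""
    keep = [m for m in candidates if otype is None or node_type[m] == otype]
    return tuple(sorted(keep, key=lambda m: (min(slots(m)), -max(slots(m)), m)))


def up(n, otype=None):
    """Nodes whose slots include all slots of n."""
    return _result((m for m in node_type if m != n and slots(m) >= slots(n)), otype)


def down(n, otype=None):
    """Nodes whose slots all lie within the slots of n."""
    return _result((m for m in node_type if m != n and slots(m) <= slots(n)), otype)


def next_(n, otype=None):
    """Adjacent nodes whose first slot comes right after the last slot of n."""
    return _result((m for m in node_type if min(slots(m)) == max(slots(n)) + 1), otype)


def prev(n, otype=None):
    """Adjacent nodes whose last slot comes right before the first slot of n."""
    return _result((m for m in node_type if max(slots(m)) == min(slots(n)) - 1), otype)


print(up(1))                     # (7, 8)
print(up(1, otype="verse")[0])   # 8  (note the [0])
print(down(7, otype="verse"))    # (8, 9)
print(next_(8))                  # (9, 4)
print(prev(9))                   # (8, 3)
```

The trailing underscore in `next_` only avoids shadowing Python's built-in `next`.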

## Going up
We go from the first word to the book that contains it.
Note the `[0]` at the end. You expect one book, yet `L` returns a tuple.
To get the only element of that tuple, you need to do that `[0]`.

If you are like me, you keep forgetting it, and that will lead to weird error messages later on.

In [27]:
firstBook = L.u(1, otype="book")[0]
print(firstBook)

137712


And let's see all the containing objects of word 3:

In [28]:
w = 3
for otype in F.otype.all:
    if otype == F.otype.slotType:
        continue
    up = L.u(w, otype=otype)
    upNode = "x" if len(up) == 0 else up[0]
    print("word {} is contained in {} {}".format(w, otype, upNode))

word 3 is contained in book 137712
word 3 is contained in chapter 137739
word 3 is contained in paragraph 138001
word 3 is contained in lex 146686
word 3 is contained in verse 138733


## Going next
Let's go to the next nodes of the first book.

In [29]:
afterFirstBook = L.n(firstBook)
for n in afterFirstBook:
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
secondBook = L.n(firstBook, otype="book")[0]

  18259: word          first slot=18259 , last slot=18259 
 139800: verse         first slot=18259 , last slot=18263 
 138090: paragraph     first slot=18259 , last slot=18439 
 137767: chapter       first slot=18259 , last slot=18954 
 137713: book          first slot=18259 , last slot=29495 


## Going previous

And let's see what is right before the second book.

In [30]:
for n in L.p(secondBook):
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )

 137712: book          first slot=1     , last slot=18258 
 137766: chapter       first slot=17933 , last slot=18258 
 138089: paragraph     first slot=18180 , last slot=18258 
 139799: verse         first slot=18238 , last slot=18258 
  18258: word          first slot=18258 , last slot=18258 


## Going down

We go to the chapters of the second book, and just count them.

In [31]:
chapters = L.d(secondBook, otype="chapter")
print(len(chapters))

16


## The first verse
We pick the first verse and the first word, and explore what is above and below them.

In [32]:
for n in [1, L.u(1, otype="verse")[0]]:
    A.indent(level=0)
    A.info("Node {}".format(n), tm=False)
    A.indent(level=1)
    A.info("UP", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.u(n)]), tm=False)
    A.indent(level=1)
    A.info("DOWN", tm=False)
    A.indent(level=2)
    A.info("\n".join(["{:<15} {}".format(u, F.otype.v(u)) for u in L.d(n)]), tm=False)
A.indent(level=0)
A.info("Done", tm=False)

Node 1
   |   UP
   |      |   146684          lex
   |      |   138733          verse
   |      |   138001          paragraph
   |      |   137739          chapter
   |      |   137712          book
   |   DOWN
   |      |   
Node 138733
   |   UP
   |      |   138001          paragraph
   |      |   137739          chapter
   |      |   137712          book
   |   DOWN
   |      |   1               word
   |      |   2               word
   |      |   3               word
   |      |   4               word
   |      |   5               word
   |      |   6               word
   |      |   7               word
   |      |   8               word
Done


# Text API

So far, we have mainly seen nodes and their numbers, and the names of node types.
You would almost forget that we are dealing with text.
So let's try to see some text.

In the same way as `F` gives access to feature data,
`T` gives access to the text.
That is also feature data, but you can tell Text-Fabric which features are specifically
carrying the text, and in return Text-Fabric offers you
a Text API: `T`.

## Formats
Text can be represented in a number of ways:

* in transliteration, or in the original script,
* showing the actual text or only the lexemes.

This data set, however, offers just one format, as the list below shows.

If you wonder where the information about text formats is stored:
not in the program text-fabric, but in the data set.
It has a feature `otext`, which specifies the formats and which features
must be used to produce them. `otext` is the third special feature in a TF data set,
next to `otype` and `oslots`.
It is an optional feature.
If it is absent, there will be no `T` API.
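The idea of a format as a recipe over features can be sketched in plain Python. The toy below assumes that a format is a template with `{feature}` placeholders, filled in for every slot. The format `lex-orig-full`, the feature names `word`, `after` and `lemma`, and the helper `text` are illustrative assumptions. They do not describe this data set's actual `otext`.

```python
# Hypothetical format specifications: format name -> per-slot template.
formats = {"text-orig-full": "{word}{after}", "lex-orig-full": "{lemma} "}

# Toy word features for slots 1..3.
features = {
    "word": {1: "Βίβλος", 2: "γενέσεως", 3: "Ἰησοῦ"},
    "after": {1: " ", 2: " ", 3: " "},
    "lemma": {1: "βίβλος", 2: "γένεσις", 3: "Ἰησοῦς"},
}


def text(slots, fmt="text-orig-full"):
    """Fill in the template of fmt for each slot and join the results."""
    template = formats[fmt]
    return "".join(
        template.format_map({name: values.get(s, "") for (name, values) in features.items()})
        for s in slots
    )


print(text(range(1, 4)))
print(text(range(1, 4), fmt="lex-orig-full"))
```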

Here is a list of all available formats in this data set.

In [33]:
sorted(T.formats)

['text-orig-full']

## Using the formats

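In principle, `A.pretty()` can display nodes in other formats as well, via the `fmt` parameter.
This data set does not define `text-orig-ketiv`, so Text-Fabric reports it as undefined: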

In [34]:
for word in wordShow:
    A.pretty(word, fmt="text-orig-ketiv")

Undefined format "text-orig-ketiv"


Undefined format "text-orig-ketiv"


Undefined format "text-orig-ketiv"


Now let's use those formats to print out the first 11 words of the New Testament.

In [35]:
for fmt in sorted(T.formats):
    print("{}:\n\t{}".format(fmt, T.text(range(1, 12), fmt=fmt)))

text-orig-full:
	Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυεὶδ υἱοῦ Ἀβραάμ. Ἀβραὰμ ἐγέννησεν τὸν 


If we do not specify a format, the **default** format is used (`text-orig-full`).

In [36]:
print(T.text(range(1, 12)))

Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυεὶδ υἱοῦ Ἀβραάμ. Ἀβραὰμ ἐγέννησεν τὸν 


## Whole text in all formats in less than a second
Part of the pleasure of working with computers is that they can crunch massive amounts of data.

In [37]:
A.indent(reset=True)
A.info("writing plain text of whole New Testament in all formats")

text = collections.defaultdict(list)

for v in F.otype.s("verse"):
    words = L.d(v, "word")
    for fmt in sorted(T.formats):
        text[fmt].append(T.text(words, fmt=fmt))

A.info("done {} formats".format(len(text)))

for fmt in sorted(text):
    print("{}\n{}\n".format(fmt, "\n".join(text[fmt][0:5])))

  0.00s writing plain text of whole New Testament in all formats
  0.27s done 1 formats
text-orig-full
Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυεὶδ υἱοῦ Ἀβραάμ. 
Ἀβραὰμ ἐγέννησεν τὸν Ἰσαάκ, Ἰσαὰκ δὲ ἐγέννησεν τὸν Ἰακώβ, Ἰακὼβ δὲ ἐγέννησεν τὸν Ἰούδαν καὶ τοὺς ἀδελφοὺς αὐτοῦ, 
Ἰούδας δὲ ἐγέννησεν τὸν Φάρες καὶ τὸν Ζάρα ἐκ τῆς Θαμάρ, Φάρες δὲ ἐγέννησεν τὸν Ἑσρώμ, Ἑσρὼμ δὲ ἐγέννησεν τὸν Ἀράμ, 
Ἀρὰμ δὲ ἐγέννησεν τὸν Ἀμιναδάβ, Ἀμιναδὰβ δὲ ἐγέννησεν τὸν Ναασσών, Ναασσὼν δὲ ἐγέννησεν τὸν Σαλμών, 
Σαλμὼν δὲ ἐγέννησεν τὸν Βόες ἐκ τῆς Ῥαχάβ, Βόες δὲ ἐγέννησεν τὸν Ἰωβὴδ ἐκ τῆς Ῥούθ, Ἰωβὴδ δὲ ἐγέννησεν τὸν Ἰεσσαί, 



### The full plain text
We write the text in each available format to a file in your `Downloads` folder.

In [38]:
orig = "text-orig-full"
# only write the formats that exist in this data set (there is no text-trans-full)
for fmt in sorted(text):
    path = os.path.expanduser(f"~/Downloads/Tischendorf-{fmt}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(text[fmt]))

In [39]:
!head -n 20 ~/Downloads/Tischendorf-{orig}.txt

Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυεὶδ υἱοῦ Ἀβραάμ. 
Ἀβραὰμ ἐγέννησεν τὸν Ἰσαάκ, Ἰσαὰκ δὲ ἐγέννησεν τὸν Ἰακώβ, Ἰακὼβ δὲ ἐγέννησεν τὸν Ἰούδαν καὶ τοὺς ἀδελφοὺς αὐτοῦ, 
Ἰούδας δὲ ἐγέννησεν τὸν Φάρες καὶ τὸν Ζάρα ἐκ τῆς Θαμάρ, Φάρες δὲ ἐγέννησεν τὸν Ἑσρώμ, Ἑσρὼμ δὲ ἐγέννησεν τὸν Ἀράμ, 
Ἀρὰμ δὲ ἐγέννησεν τὸν Ἀμιναδάβ, Ἀμιναδὰβ δὲ ἐγέννησεν τὸν Ναασσών, Ναασσὼν δὲ ἐγέννησεν τὸν Σαλμών, 
Σαλμὼν δὲ ἐγέννησεν τὸν Βόες ἐκ τῆς Ῥαχάβ, Βόες δὲ ἐγέννησεν τὸν Ἰωβὴδ ἐκ τῆς Ῥούθ, Ἰωβὴδ δὲ ἐγέννησεν τὸν Ἰεσσαί, 
Ἰεσσαὶ δὲ ἐγέννησεν τὸν Δαυεὶδ τὸν βασιλέα. Δαυεὶδ δὲ ἐγέννησεν τὸν Σολομῶνα ἐκ τῆς τοῦ Οὐρίου, 
Σολομὼν δὲ ἐγέννησεν τὸν Ῥοβοάμ, Ῥοβοὰμ δὲ ἐγέννησεν τὸν Ἀβιά, Ἀβιὰ δὲ ἐγέννησεν τὸν Ἀσάφ, 
Ἀσὰφ δὲ ἐγέννησεν τὸν Ἰωσαφάτ, Ἰωσαφὰτ δὲ ἐγέννησεν τὸν Ἰωράμ, Ἰωρὰμ δὲ ἐγέννησεν τὸν Ὀζείαν, 
Ὀζείας δὲ ἐγέννησεν τὸν Ἰωαθάμ, Ἰωαθὰμ δὲ ἐγέννησεν τὸν Ἀχάζ, Ἀχὰζ δὲ ἐγέννησεν τὸν Ἑζεκίαν, 
Ἑζεκίας δὲ ἐγέννησεν τὸν Μανασσῆ, Μανασσῆς δὲ ἐγέννησεν τὸν Ἀμώς, Ἀμὼς δὲ ἐγέννησεν τὸν Ἰωσείαν, 
Ἰωσείας δὲ ἐγέννησεν

## Book names

For Bible book names, we can use several languages.
Well, in this case we have just English.

### Languages
Here are the languages that we can use for book names.
These languages come from the features `book@ll`, where `ll` is a two-letter
ISO language code. This data set has no such features, so only the default language is available.

In [40]:
T.languages

{'': {'language': 'default', 'languageEnglish': 'default'}}

## Sections

A section is a book, a chapter or a verse.
Knowledge of sections is not baked into Text-Fabric.
The config feature `otext.tf` may specify three section levels, and tell
what the corresponding node types and features are.

From that knowledge it can construct mappings from nodes to sections, e.g. from verse
nodes to tuples of the form:

    (bookName, chapterNumber, verseNumber)

Here are examples of getting the section that corresponds to a node and vice versa.

**NB:** `sectionFromNode` always delivers a verse specification, either from the
first slot belonging to that node, or, if `lastSlot`, from the last slot
belonging to that node.
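The mapping described above can be sketched with a few dicts. This toy uses made-up node numbers and hypothetical helper names. `section_from_node` follows the rule just stated: it reports the verse of the first slot of a node, or of its last slot when `last_slot` is true.

```python
# Toy section data: slots 1..6 are words; nodes 7 and 8 are verses, 9 a chapter.
MAX_SLOT = 6
oslots = {7: (1, 2, 3), 8: (4, 5, 6), 9: (1, 2, 3, 4, 5, 6)}
verse_label = {7: ("Matthew", 1, 1), 8: ("Matthew", 1, 2)}

# slot -> verse node, and (book, chapter, verse) -> verse node
verse_of_slot = {s: v for (v, ss) in oslots.items() if v in verse_label for s in ss}
node_from_section = {label: v for (v, label) in verse_label.items()}


def section_from_node(n, last_slot=False):
    """Verse specification of the first (or, with last_slot, the last) slot of n."""
    ss = (n,) if n <= MAX_SLOT else oslots[n]
    slot = ss[-1] if last_slot else ss[0]
    return verse_label[verse_of_slot[slot]]


print(section_from_node(9))                  # ('Matthew', 1, 1)
print(section_from_node(9, last_slot=True))  # ('Matthew', 1, 2)
print(section_from_node(5))                  # ('Matthew', 1, 2)
print(node_from_section[("Matthew", 1, 2)])  # 8
```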

In [41]:
for x in (
    ("section of first word", T.sectionFromNode(1)),
    ("node of Matthew 1:1", T.nodeFromSection(("Matthew", 1, 1))),
    ("node of book Matthew", T.nodeFromSection(("Matthew",))),
    ("node of chapter Matthew 1", T.nodeFromSection(("Matthew", 1))),
    ("section of word 109641", T.sectionFromNode(109641)),
    ("idem, now last word", T.sectionFromNode(109641, lastSlot=True)),
    ("section of word 109668", T.sectionFromNode(109668)),
    ("idem, now last word", T.sectionFromNode(109668, lastSlot=True)),
):
    print("{:<30} {}".format(*x))

section of first word          ('Matthew', 1, 1)
node of Matthew 1:1            138733
node of book Matthew           137712
node of chapter Matthew 1      137739
section of word 109641         ('1_Thessalonians', 2, 11)
idem, now last word            ('1_Thessalonians', 2, 11)
section of word 109668         ('1_Thessalonians', 2, 13)
idem, now last word            ('1_Thessalonians', 2, 13)

Note that nodes 109641 and 109668 are word nodes in this data set (the book nodes run from 137712 to 137738, see `C.levels.data` above). Their first and last slot are therefore the same, and `lastSlot=True` makes no difference.

# Next steps

By now you have an impression of how to compute around in the text.
While this is still the beginning, I hope you already sense the power of unlimited programmatic access
to all the bits and bytes in the data set.

Here are a few directions for unleashing that power.

## Search
Text-Fabric contains a flexible search engine that works not only for this data,
but also for data that you add to it.
There is a tutorial dedicated to [search](search.ipynb).

## Add your own data
If you study the additional data, you can observe how that data is created and also
how it is turned into a text-fabric data module.
The last step is incredibly easy. You can write out every Python dictionary where the keys are numbers
and the values are strings or numbers as a Text-Fabric feature.
When you are creating data, you have already constructed those dictionaries, so writing
them out is just one method call.

You can then easily share your new features on GitHub, so that your colleagues everywhere
can try them out for themselves.

## Export to Emdros MQL

[EMDROS](http://emdros.org), written by Ulrik Petersen,
is a text database system with the powerful *topographic* query language MQL.
The ideas are based on a model devised by Christ-Jan Doedens in
[Text Databases: One Database Model and Several Retrieval Languages](https://books.google.nl/books?id=9ggOBRz1dO4C).

Text-Fabric's model of slots, nodes and edges is a fairly straightforward translation of the models of Christ-Jan Doedens and Ulrik Petersen.

[SHEBANQ](https://shebanq.ancient-data.org) uses EMDROS to let users execute and save MQL queries against the Hebrew Text Database of the ETCBC.

So it is kind of logical and convenient to be able to work with a Text-Fabric resource through MQL.

If you have obtained an MQL dataset somehow, you can turn it into a text-fabric data set by `importMQL()`,
which we will not show here.

And if you want to export a Text-Fabric data set to MQL, that is also possible.

After loading the data (the `use()` call above also gives you the `TF` object), you can call `TF.exportMQL()` in order to save all features of the
loaded modules into a big MQL dump, which can be imported by an EMDROS database.

In [42]:
TF.exportMQL("mytisch", "~/Downloads")

  0.00s Checking features of dataset mytisch
  0.00s 14 features to export to MQL ...
  0.00s Loading 14 features
  0.01s Writing enumerations
	book           :   27 values, 11 not a name, e.g. «1_Corinthians»
	book_code      :   27 values, 11 not a name, e.g. «1CO»
	vrsnum         :   58 values, 58 not a name, e.g. «1»
  0.09s Mapping 14 features onto 6 object types
  0.22s Writing 14 features as data in 6 object types
   |     0.00s word data ...
   |      |     0.43s batch of size               10.1MB with   50000 of   50000 words
   |      |     0.87s batch of size               10.1MB with   50000 of  100000 words
   |      |     1.19s batch of size                7.7MB with   37711 of  137711 words
   |     1.19s word data: 137711 objects
   |     0.00s verse data ...
   |      |     0.04s batch of size              598.5KB with    7951 of    7951 verses
   |     0.04s verse data: 7951 objects
   |     0.00s lex data ...
   |      |     0.11s batch of size                1.4MB wi

Now you have a file `~/Downloads/mytisch.mql`.
You can import it into an Emdros database by saying:

    cd ~/Downloads
    rm -f mytisch
    mql -b 3 < mytisch.mql

The result is an SQLite3 database `mytisch` in the same directory.
You can run a query against it by creating a text file test.mql with these contents:

    select all objects where
    [verse
        [word FOCUS anlex_lem = 'ποιέω']
    ]

And then say

    mql -b 3 -d mytisch test.mql

You will see raw query results: all word occurrences whose lemma (`anlex_lem`) is ποιέω.

It is not very pretty, and probably you should use a more visual Emdros tool to run those queries.
You see a lot of node numbers, but the good thing is, you can look those node numbers up in Text-Fabric.

---

CC-BY Dirk Roorda