# Examples

This is a collection of examples and illustrations of Text-Fabric in general.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

# Type levels

When Text-Fabric precomputes corpus data, it ranks the node types according to the average size of nodes of that type.

The *size* of a node is the size of the set of slots attached to that node.
Slots are attached to nodes by means of the `oslots` feature, which is a standard part of each TF data set.
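To make the notion of node size and the ranking by average size concrete, here is a minimal pure-Python sketch on a made-up toy corpus. The dictionaries `otype` and `oslots` only imitate the roles of the features with those names, and `node_size` and `rank_types` are hypothetical helpers. This is not Text-Fabric's precomputation code.

```python
from collections import defaultdict

# Hypothetical toy corpus: nodes 1..6 are slots (words); the other nodes
# are mapped to the set of slots attached to them, the role `oslots` plays in TF.
otype = {
    1: "word", 2: "word", 3: "word", 4: "word", 5: "word", 6: "word",
    7: "phrase", 8: "phrase", 9: "phrase",
    10: "clause", 11: "clause",
}
oslots = {
    7: {1, 2}, 8: {3, 4}, 9: {5, 6},
    10: {1, 2, 3, 4}, 11: {5, 6},
}


def node_size(n):
    """Size of a node: the number of its slots (a slot counts as size 1)."""
    return len(oslots.get(n, {n}))


def rank_types(otype):
    """Return (type, average size) pairs, biggest average first."""
    sizes = defaultdict(list)
    for n, t in otype.items():
        sizes[t].append(node_size(n))
    averages = {t: sum(s) / len(s) for t, s in sizes.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


print(rank_types(otype))  # [('clause', 3.0), ('phrase', 2.0), ('word', 1.0)]
```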

From within Text-Fabric, we can inspect this ranking by means of

* `C.levels.data`: the precomputed data
* `F.otype.all`: the node types, in ranked order (highest first)
* `N.otypeRank`: the rank of each node type (0 for the slot type)

We load the BHSA and Uruk
([here](https://annotation.github.io/text-fabric/tf/about/corpora.html) is more info on these corpora)
and have a look at their node type ranking.

In [3]:
As = {}

In [4]:
for corpus in ("bhsa", "uruk"):
    print(f"Loading {corpus} ...")
    As[corpus] = use(f"{corpus}:clone", silent="deep")
    As[corpus].info("done")

Loading bhsa ...
  2.26s done
Loading uruk ...
  0.41s done

We have loaded both datasets.

We want to be able to put either of them in the foreground, i.e. to bind the global variables `A N F E L T S C TF` to that data set.
We write a function for that.

In [5]:
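def foreground(corpus, hoist):
    # bind the app of this corpus to `A` in the given namespace (e.g. globals())
    thisA = As[corpus]
    hoist["A"] = thisA
    # hoist the other API names of this corpus into the same namespace
    thisTf = thisA.TF
    thisTf.makeAvailableIn(hoist)
    # show which corpus is now in the foreground
    thisA.showContext("corpus")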
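What binding the globals amounts to can be illustrated without Text-Fabric. In the sketch below, the class `ToyApi`, its method `hoist_into` and the function `toy_foreground` are all hypothetical stand-ins, not Text-Fabric code; they only show that names are written into a namespace dict (as `globals()` is) and that the corpus foregrounded last wins.

```python
class ToyApi:
    """Hypothetical stand-in for a loaded corpus API (not Text-Fabric's class)."""

    def __init__(self, name):
        # placeholder strings standing in for the real F, N and C objects
        self.F = f"{name}.F"
        self.N = f"{name}.N"
        self.C = f"{name}.C"

    def hoist_into(self, scope):
        # bind each member as a top-level name in the given namespace dict
        for member in ("F", "N", "C"):
            scope[member] = getattr(self, member)


toy_apis = {"bhsa": ToyApi("bhsa"), "uruk": ToyApi("uruk")}


def toy_foreground(corpus, hoist):
    hoist["A"] = toy_apis[corpus]
    toy_apis[corpus].hoist_into(hoist)


namespace = {}  # in the notebook one would pass globals() instead
toy_foreground("uruk", namespace)
toy_foreground("bhsa", namespace)
print(namespace["F"], namespace["C"])  # bhsa.F bhsa.C
```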

In [6]:
foreground("bhsa", globals())

<details open><summary><b>bhsa</b> <i>app context</i></summary>

<details open><summary>15. corpus</summary>

`BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis`

</details>
</details>


In [7]:
foreground("uruk", globals())

<details open><summary><b>uruk</b> <i>app context</i></summary>

<details open><summary>15. corpus</summary>

`Uruk IV/III: Proto-cuneiform tablets `

</details>
</details>


That works.

Back to the BHSA!

In [8]:
foreground("bhsa", globals())

<details open><summary><b>bhsa</b> <i>app context</i></summary>

<details open><summary>15. corpus</summary>

`BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis`

</details>
</details>


In [9]:
C.levels.data

(('book', 10938.051282051281, 426585, 426623),
 ('chapter', 459.18622174381056, 426624, 427552),
 ('lex', 46.2021011588866, 1437567, 1446799),
 ('verse', 18.37694395381898, 1414354, 1437566),
 ('half_verse', 9.441876936697653, 606362, 651541),
 ('sentence', 6.693928789994822, 1172290, 1236016),
 ('sentence_atom', 6.611142967841921, 1236017, 1300541),
 ('clause', 4.8408892318516585, 427553, 515673),
 ('clause_atom', 4.703863796753705, 515674, 606361),
 ('phrase', 1.684724355961723, 651542, 904748),
 ('phrase_atom', 1.5944621572020736, 904749, 1172289),
 ('subphrase', 1.4231715460584122, 1300542, 1414353),
 ('word', 1, 1, 426584))

Each tuple gives a node type, the average size of its nodes in slots, and the first and last node of that type.
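Reading the last two numbers as the first and last node of each type (words, the slots, run from 1 to 426584), a few rows copied from the output above yield the number of nodes per type. Multiplying that by the average size gives the number of slots covered. For the rows below this comes out at 426584 each time, consistent with books, chapters and verses each covering the whole text once. `node_counts` is a hypothetical helper for this illustration.

```python
# Rows copied from the BHSA output of `C.levels.data` above:
# (node type, average size in slots, first node, last node)
levels = (
    ("book", 10938.051282051281, 426585, 426623),
    ("chapter", 459.18622174381056, 426624, 427552),
    ("verse", 18.37694395381898, 1414354, 1437566),
    ("word", 1, 1, 426584),
)


def node_counts(levels):
    """Number of nodes per type, derived from its (first, last) node range."""
    return {otype: last - first + 1 for otype, _, first, last in levels}


counts = node_counts(levels)
for otype, avg, _, _ in levels:
    # average size times number of nodes = total slots covered by this type
    print(f"{otype:<8} {counts[otype]:>7} nodes  {avg * counts[otype]:>10,.0f} slots")
```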

Here you see just the types, in the same order:

In [10]:
F.otype.all

('book',
 'chapter',
 'lex',
 'verse',
 'half_verse',
 'sentence',
 'sentence_atom',
 'clause',
 'clause_atom',
 'phrase',
 'phrase_atom',
 'subphrase',
 'word')

And here are the ranks:

In [11]:
N.otypeRank

{'word': 0,
 'subphrase': 1,
 'phrase_atom': 2,
 'phrase': 3,
 'clause_atom': 4,
 'clause': 5,
 'sentence_atom': 6,
 'sentence': 7,
 'half_verse': 8,
 'verse': 9,
 'lex': 10,
 'chapter': 11,
 'book': 12}
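Comparing the two outputs, the ranks appear to be the positions in the type order of `F.otype.all`, counted from the end. A minimal sketch that reproduces the dictionary above from that tuple alone, assuming this relationship:

```python
# The BHSA type order as printed by `F.otype.all` above, highest rank first.
types = (
    "book", "chapter", "lex", "verse", "half_verse", "sentence",
    "sentence_atom", "clause", "clause_atom", "phrase", "phrase_atom",
    "subphrase", "word",
)

# Rank 0 for the last type (the slot type), increasing towards the first.
otype_rank = {t: i for i, t in enumerate(reversed(types))}
print(otype_rank["word"], otype_rank["clause"], otype_rank["book"])  # 0 5 12
```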

Now the same for Uruk:

In [12]:
foreground("uruk", globals())

<details open><summary><b>uruk</b> <i>app context</i></summary>

<details open><summary>15. corpus</summary>

`Uruk IV/III: Proto-cuneiform tablets `

</details>
</details>


In [13]:
C.levels.data

(('tablet', 22.013513513513512, 143889, 150252),
 ('face', 14.098032994923859, 150253, 159708),
 ('column', 9.338016116380233, 180450, 194472),
 ('line', 3.6104012052898833, 227226, 263067),
 ('case', 3.4648222982074395, 159709, 169359),
 ('cluster', 1.031416969437914, 194473, 227225),
 ('quad', 2.050606220347918, 140095, 143888),
 ('comment', 1.0, 169360, 180449),
 ('sign', 1, 1, 140094))

In [14]:
F.otype.all

('tablet',
 'face',
 'column',
 'line',
 'case',
 'cluster',
 'quad',
 'comment',
 'sign')

And here are the ranks:

In [15]:
N.otypeRank

{'sign': 0,
 'comment': 1,
 'quad': 2,
 'cluster': 3,
 'case': 4,
 'line': 5,
 'column': 6,
 'face': 7,
 'tablet': 8}

Note that `cluster` is ranked higher than `quad`, although clusters are smaller than quads on average (about 1.03 versus 2.05 slots).

This is what we want, and we have achieved it by specifying this order under the `@levels` key in the
[otext](https://github.com/Nino-cunei/uruk/blob/51f495fbaa94e4faa9f7dc06482548dfdf10bd87/tf/uruk/1.0/otext.tf#L10)
feature of this dataset.
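The effect of the `@levels` order can be seen by contrasting it with a ranking by average size alone. The sketch below uses the Uruk rows of `C.levels.data` shown above, whose order already reflects `@levels`; sorting those rows purely by size would put `quad` above `cluster`.

```python
# (type, average size) rows from the Uruk output of `C.levels.data` above
uruk = [
    ("tablet", 22.013513513513512),
    ("face", 14.098032994923859),
    ("column", 9.338016116380233),
    ("line", 3.6104012052898833),
    ("case", 3.4648222982074395),
    ("cluster", 1.031416969437914),
    ("quad", 2.050606220347918),
    ("comment", 1.0),
    ("sign", 1),
]

# Ranking by average size only, biggest first (the sort is stable for ties).
by_size = [t for t, _ in sorted(uruk, key=lambda row: row[1], reverse=True)]

# The order Text-Fabric reports, which follows `@levels` in otext.
by_levels = [t for t, _ in uruk]

print(by_size.index("quad") < by_size.index("cluster"))      # True
print(by_levels.index("cluster") < by_levels.index("quad"))  # True
```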

# Relevance for display

When we display nodes, we want smaller nodes to appear inside bigger ones.

In the BHSA, sentences are bigger than clauses.

But what if a sentence happens to be exactly as big as its only clause?

We want to guarantee that even then the clause is displayed in the sentence, and not the other way around.
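One way a ranking can settle such ties is sketched below: compare slot counts first and fall back on the type rank when they are equal. The helper `outer_inner` and the `(type, slots)` node representation are hypothetical; this is a simplified illustration, not Text-Fabric's display code. The ranks are copied from the BHSA `N.otypeRank` output above.

```python
# Subset of the BHSA ranks from the `N.otypeRank` output above.
otype_rank = {"word": 0, "phrase": 3, "clause": 5, "sentence": 7}


def outer_inner(a, b):
    """Return (outer, inner) for two nodes given as (type, set of slots).

    Hypothetical helper: the node with more slots goes outside; with
    equally many slots, the node of the higher-ranked type goes outside.
    """
    (ta, sa), (tb, sb) = a, b
    key_a = (len(sa), otype_rank[ta])
    key_b = (len(sb), otype_rank[tb])
    return (a, b) if key_a >= key_b else (b, a)


sentence = ("sentence", {10, 11, 12})
clause = ("clause", {10, 11, 12})  # exactly as big as its sentence
outer, inner = outer_inner(clause, sentence)
print(outer[0], ">", inner[0])  # sentence > clause
```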

In [16]:
foreground("bhsa", globals())

<details open><summary><b>bhsa</b> <i>app context</i></summary>

<details open><summary>15. corpus</summary>

`BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis`

</details>
</details>


In [17]:
query = """
sentence
    := clause
    =:
"""
results = A.search(query)

  0.41s 49699 results


In [18]:
s = results[0][0]
F.otype.v(s)

'sentence'

In [19]:
A.pretty(s)