In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
import collections

In [3]:
from tf.app import use

In [5]:
A = use("nena:clone", checkout="clone", version="alpha", hoist=globals())

In [6]:
A.indent(reset=True)
A.info("Counting nodes ...")

i = 0
for n in N.walk():
    i += 1

A.info("{} nodes".format(i))

  0.00s Counting nodes ...
  0.13s 833584 nodes


Here you see it: more than 830,000 nodes.

## What are those nodes?
Every node has a type, like letter, word, or line.
But what exactly are they?

Text-Fabric has two special features, `otype` and `oslots`, that must occur in every Text-Fabric data set.
`otype` tells you for each node its type, and you can ask for the number of `slot`s in the text.

Here we go!

In [7]:
F.otype.slotType

'letter'

In [8]:
F.otype.maxSlot

539378

In [9]:
F.otype.maxNode

833584

In [10]:
F.otype.all

('dialect',
 'text',
 'paragraph',
 'line',
 'sentence',
 'subsentence',
 'inton',
 'stress',
 'word',
 'letter')

In [11]:
C.levels.data

(('dialect', 269689.0, 539379, 539380),
 ('text', 4280.777777777777, 713308, 713433),
 ('paragraph', 1541.08, 578369, 578718),
 ('line', 212.01965408805032, 575825, 578368),
 ('sentence', 33.037976234227614, 578719, 595044),
 ('subsentence', 22.018124668326735, 688811, 713307),
 ('inton', 14.800186587641313, 539381, 575824),
 ('stress', 5.7523835932000935, 595045, 688810),
 ('word', 4.4891677971885375, 713434, 833584),
 ('letter', 1, 1, 539378))

This is interesting: above you see all the node types, with the average size of their nodes in slots,
the node where each type starts, and the node where it ends.
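As a cross-check, the number of nodes per type can be read off the start and end nodes without walking over anything, and each average size multiplied by its count should give back the number of slots. The sketch below hard-codes the `C.levels.data` and `F.otype.maxSlot` outputs shown above; it does not call Text-Fabric, and the variable names are illustrative only.

```python
# Values copied from the C.levels.data output above:
# (node type, average size in slots, first node, last node)
levels = (
    ("dialect", 269689.0, 539379, 539380),
    ("text", 4280.777777777777, 713308, 713433),
    ("paragraph", 1541.08, 578369, 578718),
    ("line", 212.01965408805032, 575825, 578368),
    ("sentence", 33.037976234227614, 578719, 595044),
    ("subsentence", 22.018124668326735, 688811, 713307),
    ("inton", 14.800186587641313, 539381, 575824),
    ("stress", 5.7523835932000935, 595045, 688810),
    ("word", 4.4891677971885375, 713434, 833584),
    ("letter", 1, 1, 539378),
)
max_slot = 539378  # copied from the F.otype.maxSlot output above

counts = {}
for (otype, avg_slots, first, last) in levels:
    # the nodes of one type occupy the range first..last, inclusive
    count = last - first + 1
    counts[otype] = count
    # average size times number of nodes = total slots covered by this type
    covered = round(avg_slots * count)
    print(f"{otype:<12} {count:>7} nodes covering {covered:>7} slots")

print(sum(counts.values()), "nodes in total")
```

In this data set every type turns out to cover all 539378 slots, and the counts add up to the 833584 nodes found by walking.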

## Count individual object types
This is an intuitive way to count the number of nodes in each type.
Note in passing how we use `indent` in conjunction with `info` to produce neat timed
and indented progress messages.

In [12]:
A.indent(reset=True)
A.info("counting objects ...")

for otype in F.otype.all:
    i = 0

    A.indent(level=1, reset=True)

    for n in F.otype.s(otype):
        i += 1

    A.info("{:>7} {}s".format(i, otype))

A.indent(level=0)
A.info("Done")

  0.00s counting objects ...
   |     0.00s       2 dialects
   |     0.00s     126 texts
   |     0.00s     350 paragraphs
   |     0.00s    2544 lines
   |     0.00s   16326 sentences
   |     0.00s   24497 subsentences
   |     0.00s   36444 intons
   |     0.01s   93766 stresss
   |     0.01s  120151 words
   |     0.06s  539378 letters
  0.10s Done


# Viewing textual objects

You can use the A API (the extra power) to display NENA text.

See the [display](display.ipynb) tutorial.

# Feature statistics

`F` gives access to all features.
Every feature has a method `freqList()`
to generate a frequency list of its values, higher frequencies first.
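To make the idea of a frequency list concrete, here is a minimal stand-alone sketch using `collections.Counter` on made-up values. It is not Text-Fabric's implementation: the function name `toy_freq_list` is hypothetical, and breaking ties by value is an assumption of this sketch.

```python
import collections


def toy_freq_list(values):
    """Return (value, frequency) pairs, most frequent first.

    Ties are broken by value (an assumption of this sketch).
    """
    counter = collections.Counter(values)
    return tuple(sorted(counter.items(), key=lambda x: (-x[1], x[0])))


# made-up feature values for a handful of nodes
langs = ["NENA", "K.", "NENA", "A.", "NENA", "K."]
freqs = toy_freq_list(langs)
print(freqs)
```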

In [13]:
F.speaker.freqList()

(('Dawið ʾAdam', 21184),
 ('Yulia Davudi', 18191),
 ('Yuwarəš Xošăba Kena', 10124),
 ('Manya Givoyev', 6746),
 ('Yuwəl Yuḥanna', 5953),
 ('Nanəs Bənyamən', 5424),
 ('Yosəp bet Yosəp', 5256),
 ('Yonan Petrus', 4345),
 ('Natan Khoshaba', 4176),
 ('Arsen Mikhaylov', 3338),
 ('Xošebo ʾOdišo', 3281),
 ('Nancy George', 3131),
 ('Awiko Sulaqa', 3096),
 ('Maryam Gwirgis', 2954),
 ('Alice Bet-Yosəp', 2618),
 ('Bənyamən Bənyamən', 2598),
 ('MB', 2317),
 ('Mišayel Barčəm', 1818),
 ('Nadia Aloverdova', 1754),
 ('Frederic Ayyubkhan', 1615),
 ('Victor Orshan', 1426),
 ('Merab Badalov', 1162),
 ('Sophia Danielova', 1109),
 ('Blandina Barwari', 1030),
 ('YD', 998),
 ('Dawið Gwərgəs', 865),
 ('Gwərgəs Dawið', 658),
 ('AB', 603),
 ('Jacob Petrus', 534),
 ('Dawid Adam', 492),
 ('NK', 326),
 ('YP', 320),
 ('JP', 261),
 ('Kena Kena', 174),
 ('Nawiya ʾOdišo', 102),
 ('GK', 101),
 ('Leya ʾOraha', 71))

In [15]:
F.lang.freqList("word")

(('NENA', 117093),
 ('K.', 1767),
 ('A.', 775),
 ('K./A.', 263),
 ('A.|A.|K.', 65),
 ('A.|K.', 35),
 ('K./T.', 32),
 ('K.|K.', 26),
 ('K.|K.|K.', 18),
 ('A.|A.', 16),
 ('Urm.', 16),
 ('E.', 12),
 ('K./A./E.', 9),
 ('P.', 5),
 ('A./K.', 4),
 ('K./A.|K./A.', 4),
 ('T.', 4),
 ('Ṭiy.', 3),
 ('A./E.', 2),
 ('K./E.', 1),
 ('K./T.|K./T.', 1))

In [16]:
F.lang.freqList("morpheme")

()

In [17]:
for (w, amount) in F.text.freqList("word")[0:20]:
    print(f"{amount:>5} {w}")

 2105 l
 2016 b
 1609 ʾna
 1605 
 1596 ʾu
 1509 x
 1406 xa
 1306 mə́re
 1177 mra
 1102 d
 1101 ʾ
 1100 m
  934 k̭a
  929 dye
  890 mən
  872 ɟu
  861 la
  805 t
  748 gu
  691 rba


In [15]:
for (w, amount) in F.lemma.freqList()[0:20]:
    print(f"{amount:>5} {w}")

 1857 w, ʾu-
 1322 xa, xaʾa
  960 ṱ
  859 b-
  749 gu-
  742 la
  621 t
  590 l-
  529 ʾana
  498 mən, m-
  429 diya
  349 naša
  342 ʾawwa
  339 malka
  338 ʾaw, ʾo
  330 kul, ku, kulla
  326 mo, mu, modi, maw, mawdi
  319 qəm-
  318 ʾaw
  300 ṭla-, ta-


## Word distribution

Let's do a bit more fancy word stuff.

### Hapaxes

A hapax can be found by picking the words with frequency 1.

We print 20 hapaxes.

In [16]:
hapaxes1 = sorted(lx for (lx, amount) in F.lemma.freqList() if amount == 1)
len(hapaxes1)

340

In [17]:
for lx in hapaxes1[0:20]:
    print(lx)

bahər, bɛhər, bɛhɛri
banaya, bannaya
banda
baxəra
baʿd
baṣora
baṭila
baṭman
bdila
boš
briza
bulbul
burra
busama
bustana
buxtən
băṭaniya
bšila
bərra
bəṣliṣa


### Small occurrence base

The occurrence base of a word is the set of texts in which it occurs.

We compute the occurrence base of each word, based on lexemes according to the `lemma` feature.

In [18]:
occurrenceBase1 = collections.defaultdict(set)

A.indent(reset=True)
A.info("compiling occurrence base ...")
for w in F.otype.s("word"):
    text = T.sectionFromNode(w)[1]
    occurrenceBase1[F.lemma.v(w)].add(text)
A.info(f"{len(occurrenceBase1)} entries")

  0.00s compiling occurrence base ...
  2.71s 1174 entries


Wow, that took long!

We looked up the text for each word.

But there is another way:

Start with texts, and iterate through their words.
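The difference between the two strategies can be sketched on a toy corpus without Text-Fabric. Everything below (the lemmas, the text titles, the helper `text_of`) is made up for illustration: the first loop does one upward lookup per word, the second walks down from each text once.

```python
import collections

# toy corpus (made-up): lemma of each word, by word position
lemmas = ["xa", "malka", "xa", "xa", "naša"]
# each text covers a contiguous range of word positions
text_ranges = {"Text A": range(0, 3), "Text B": range(3, 5)}


def text_of(word):
    """Upward lookup: find the text whose range contains this word."""
    for (title, span) in text_ranges.items():
        if word in span:
            return title
    raise ValueError(f"word {word} is not in any text")


# strategy 1: start from every word and look up its text
base1 = collections.defaultdict(set)
for w in range(len(lemmas)):
    base1[lemmas[w]].add(text_of(w))

# strategy 2: start from every text and walk down to its words
base2 = collections.defaultdict(set)
for (title, span) in text_ranges.items():
    for w in span:
        base2[lemmas[w]].add(title)

print(base1 == base2)
```

Both strategies build the same mapping; the second avoids repeating the lookup for every single word.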

In [19]:
occurrenceBase2 = collections.defaultdict(set)

A.indent(reset=True)
A.info("compiling occurrence base ...")
for s in F.otype.s("text"):
    text = F.title.v(s)
    for w in L.d(s, otype="word"):
        occurrenceBase2[F.lemma.v(w)].add(text)
A.info("done")
A.info(f"{len(occurrenceBase2)} entries")

  0.00s compiling occurrence base ...
  0.19s done
  0.19s 1174 entries


Much better. Are the results equal?

In [20]:
occurrenceBase1 == occurrenceBase2

True

Yes.

In [21]:
occurrenceBase = occurrenceBase2

An overview of how many words have how big occurrence bases:

In [22]:
occurrenceSize = collections.Counter()

for (w, texts) in occurrenceBase.items():
    occurrenceSize[len(texts)] += 1

occurrenceSize = sorted(
    occurrenceSize.items(),
    key=lambda x: (-x[1], x[0]),
)

for (size, amount) in occurrenceSize[0:10]:
    print(f"base size {size:>4} : {amount:>5} words")
print("...")
for (size, amount) in occurrenceSize[-10:]:
    print(f"base size {size:>4} : {amount:>5} words")

base size    1 :   531 words
base size    2 :   199 words
base size    3 :    85 words
base size    4 :    65 words
base size    5 :    35 words
base size    6 :    25 words
base size    8 :    19 words
base size   10 :    19 words
base size    7 :    17 words
base size   12 :    15 words
...
base size   38 :     2 words
base size   51 :     2 words
base size   26 :     1 words
base size   31 :     1 words
base size   46 :     1 words
base size   48 :     1 words
base size   49 :     1 words
base size   50 :     1 words
base size   52 :     1 words
base size  126 :     1 words


Let's give the predicate *private* to those words whose occurrence base is a single text.

In [23]:
privates = {w for (w, base) in occurrenceBase.items() if len(base) == 1}
len(privates)

531

### Peculiarity of texts

As a final exercise with texts, let's make a list of all texts, and show their

* total number of distinct words (lemmas)
* number of private words
* the percentage of private words: a measure of the peculiarity of the text
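The three quantities in this list can be worked through on a tiny, made-up occurrence base. The sketch below is independent of Text-Fabric; the lemmas, titles, and variable names are illustrative assumptions.

```python
# toy occurrence base (made-up): lemma -> set of texts in which it occurs
occurrence_base = {
    "xa": {"Text A", "Text B"},
    "malka": {"Text A"},
    "naša": {"Text B"},
    "bustana": {"Text B"},
}

# private lemmas occur in exactly one text
privates = {lemma for (lemma, titles) in occurrence_base.items() if len(titles) == 1}

# invert the mapping: text -> set of distinct lemmas in it
lemmas_per_text = {}
for (lemma, titles) in occurrence_base.items():
    for title in titles:
        lemmas_per_text.setdefault(title, set()).add(lemma)

rows = []
for (title, lemmas) in lemmas_per_text.items():
    n_all = len(lemmas)  # distinct lemmas in this text
    n_own = len(lemmas & privates)  # lemmas found only in this text
    rows.append((title, n_all, n_own, 100 * n_own / n_all))

# most peculiar text first; then larger texts; then by title
rows.sort(key=lambda r: (-r[3], -r[1], r[0]))

for row in rows:
    print("{:<10} {:>4} {:>4} {:>5.1f}%".format(*row))
```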

In [24]:
textList = []

empty = set()
ordinary = set()

for d in F.otype.s("text"):
    text = F.title.v(d)
    words = {F.lemma.v(w) for w in L.d(d, otype="word")}
    a = len(words)
    if not a:
        empty.add(text)
        continue
    o = len({w for w in words if w in privates})
    if not o:
        ordinary.add(text)
        continue
    p = 100 * o / a
    textList.append((text, a, o, p))

textList = sorted(textList, key=lambda e: (-e[3], -e[1], e[0]))

print(f"Found {len(empty):>4} empty texts")
print(f"Found {len(ordinary):>4} ordinary texts (i.e. without private words)")

Found    0 empty texts
Found   74 ordinary texts (i.e. without private words)


In [25]:
print(
    "{:<50}{:>5}{:>5}{:>5}\n{}".format(
        "text",
        "#all",
        "#own",
        "%own",
        "-" * 35,
    )
)

for x in textList[0:20]:
    print("{:<50} {:>4} {:>4} {:>4.1f}%".format(*x))
print("...")
for x in textList[-20:]:
    print("{:<50} {:>4} {:>4} {:>4.1f}%".format(*x))

text                                               #all #own %own
-----------------------------------
Gozali and Nozali                                   331   64 19.3%
The Crow and the Cheese                              29    5 17.2%
The Story With No End                                54    7 13.0%
The Fox and the Miller                              151   18 11.9%
The Tale of Mămo and Zine                           271   30 11.1%
The King With Forty Sons                            253   28 11.1%
Tales From the 1001 Nights                          293   31 10.6%
The Fox and the Stork                                38    4 10.5%
The Tale of Farxo and Səttiya                       270   27 10.0%
The Priest and the Mullah                            82    8  9.8%
šošət Xere                                           98    9  9.2%
Baby Leliθa                                         166   15  9.0%
The Crafty Hireling                                 189   17  9.0%
The Man Who Wanted to Work 

# Locality API
We travel upwards and downwards, forwards and backwards through the nodes.
The Locality-API (`L`) provides functions: `u()` for going up, and `d()` for going down,
`n()` for going to next nodes and `p()` for going to previous nodes.

These directions are indirect notions: nodes are just numbers, but by means of the
`oslots` feature they are linked to slots. One node *contains* another node if the one is linked to a set of slots that contains the set of slots that the other is linked to.
And one node is next or previous to another if its slots follow or precede the slots of the other one.

`L.u(node)` **Up** is going to nodes that embed `node`.

`L.d(node)` **Down** is the opposite direction, to those that are contained in `node`.

`L.n(node)` **Next** are the next *adjacent* nodes, i.e. nodes whose first slot comes immediately after the last slot of `node`.

`L.p(node)` **Previous** are the previous *adjacent* nodes, i.e. nodes whose last slot comes immediately before the first slot of `node`.

All these functions yield nodes of all possible otypes.
By passing an optional parameter, you can restrict the results to nodes of that type.

The results are ordered according to the order of things in the text.

The functions always return a tuple, even if there is just one node in the result.
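The containment and adjacency rules above can be modelled with plain sets of slots. The following is a toy model, not Text-Fabric's implementation: the node numbers and slot sets are made up, results are ordered by node number rather than by the order of the text, and nodes with identical slot sets are deliberately avoided because the sketch does not handle them.

```python
# toy data (made-up): node -> (node type, set of slots it is linked to)
nodes = {
    1: ("letter", {1}),
    2: ("letter", {2}),
    3: ("letter", {3}),
    4: ("letter", {4}),
    5: ("word", {1, 2}),
    6: ("word", {3, 4}),
    7: ("line", {1, 2, 3, 4}),
}


def _select(candidates, otype):
    """Keep only nodes of the given type (all types if otype is None)."""
    return tuple(m for m in candidates if otype is None or nodes[m][0] == otype)


def up(n, otype=None):
    """Nodes whose slots strictly contain the slots of n."""
    slots = nodes[n][1]
    return _select((m for m in sorted(nodes) if nodes[m][1] > slots), otype)


def down(n, otype=None):
    """Nodes whose slots are strictly contained in the slots of n."""
    slots = nodes[n][1]
    return _select((m for m in sorted(nodes) if nodes[m][1] < slots), otype)


def nxt(n, otype=None):
    """Nodes whose first slot comes right after the last slot of n."""
    after = max(nodes[n][1]) + 1
    return _select((m for m in sorted(nodes) if min(nodes[m][1]) == after), otype)


def prev(n, otype=None):
    """Nodes whose last slot comes right before the first slot of n."""
    before = min(nodes[n][1]) - 1
    return _select((m for m in sorted(nodes) if max(nodes[m][1]) == before), otype)


print(up(1))                  # nodes that embed letter 1
print(down(7, otype="word"))  # words inside the line
print(nxt(5))                 # nodes starting right after word 5
print(prev(6))                # nodes ending right before word 6
```

Like the real functions, every helper returns a tuple, which is empty when nothing qualifies.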

## Going up
We go from the first letter (slot 1) to the text that contains it.
Note the `[0]` at the end. You expect one text, yet `L` returns a tuple.
To get the only element of that tuple, you need to do that `[0]`.

If you are like me, you keep forgetting it, and that will lead to weird error messages later on.

In [26]:
firstText = L.u(1, otype="text")[0]
print(firstText)

713259


And let's see all the containing objects of letter 3:

In [27]:
s = 3
for otype in F.otype.all:
    if otype == F.otype.slotType:
        continue
    up = L.u(s, otype=otype)
    upNode = "x" if len(up) == 0 else up[0]
    print("letter {} is contained in {} {}".format(s, otype, upNode))

letter 3 is contained in dialect 539382
letter 3 is contained in text 713259
letter 3 is contained in paragraph 577912
letter 3 is contained in line 575368
letter 3 is contained in sentence 578263
letter 3 is contained in subsentence 688732
letter 3 is contained in inton 539384
letter 3 is contained in stress 594970
letter 3 is contained in word 713386


## Going next
Let's go to the next nodes of the first text.

In [28]:
afterFirstText = L.n(firstText)
for n in afterFirstText:
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )
secondText = L.n(firstText, otype="text")[0]

   2004: letter        first slot=2004  , last slot=2004  
 713868: word          first slot=2004  , last slot=2004  
 595313: stress        first slot=2004  , last slot=2012  
 539513: inton         first slot=2004  , last slot=2029  
 688836: subsentence   first slot=2004  , last slot=2029  
 578331: sentence      first slot=2004  , last slot=2029  
 575380: line          first slot=2004  , last slot=2183  
 577913: paragraph     first slot=2004  , last slot=6497  
 713260: text          first slot=2004  , last slot=6497  


## Going previous

And let's see what is right before the second text.

In [29]:
for n in L.p(secondText):
    print(
        "{:>7}: {:<13} first slot={:<6}, last slot={:<6}".format(
            n,
            F.otype.v(n),
            E.oslots.s(n)[0],
            E.oslots.s(n)[-1],
        )
    )

 713259: text          first slot=1     , last slot=2003  
 577912: paragraph     first slot=1     , last slot=2003  
 575379: line          first slot=1855  , last slot=2003  
 578330: sentence      first slot=1985  , last slot=2003  
 688835: subsentence   first slot=1985  , last slot=2003  
 539512: inton         first slot=1985  , last slot=2003  
 595312: stress        first slot=1992  , last slot=2003  
 713867: word          first slot=1996  , last slot=2003  
   2003: letter        first slot=2003  , last slot=2003  


## Going down

We go to the lines of the first text, and just count them.

In [30]:
lines = L.d(firstText, otype="line")
print(len(lines))

12


# Text API

So far, we have mainly seen nodes and their numbers, and the names of node types.
You would almost forget that we are dealing with text.
So let's try to see some text.

In the same way as `F` gives access to feature data,
`T` gives access to the text.
That is also feature data, but you can tell Text-Fabric which features are specifically
carrying the text, and in return Text-Fabric offers you
a Text API: `T`.

## Formats
NENA text can be represented in a number of ways.
The format names listed below combine three parts:

* `text` or `layout`: plain text, or text with layout
* `orig` or `trans`: the original Unicode text, or a transcription
* `full`, `lite`, or (for `trans` only) `fuzzy`: the flavour of the representation

If you wonder where the information about text formats is stored:
not in the program text-fabric, but in the data set.
It has a feature `otext`, which specifies the formats and which features
must be used to produce them. `otext` is the third special feature in a TF data set,
next to `otype` and `oslots`.
It is an optional feature.
If it is absent, there will be no `T` API.

Here is a list of all available formats in this data set.

In [31]:
T.formats

{'text-orig-full': 'word',
 'text-orig-lite': 'word',
 'text-trans-full': 'word',
 'text-trans-fuzzy': 'word',
 'text-trans-lite': 'word',
 'layout-orig-full': 'word',
 'layout-orig-lite': 'word',
 'layout-trans-full': 'word',
 'layout-trans-fuzzy': 'word',
 'layout-trans-lite': 'word'}

## Using the formats

The `T.text()` function is central to get text representations of nodes. Its most basic usage is

```python
T.text(nodes, fmt=fmt)
```
where `nodes` is a list or iterable of nodes, usually word nodes, and `fmt` is the name of a format.
If you leave out `fmt`, the default `text-orig-full` is chosen.

The result is the text in that format for all nodes specified.

The list of formats above shows, for each format, its intended level of operation: in this data set that is always `word`.

If TF formats a node according to a defined text-format, it will descend to constituent nodes of that level and represent those
constituent nodes.

In this case, all formats specify the `word` level as the descend type.

If we do not specify a format, the **default** format is used (`text-orig-full`).
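One way to picture a text format is as a template that is filled in from the feature values of each node, after which the pieces are concatenated. The toy sketch below illustrates only that idea. The feature names `text` and `end`, the template strings, the format name `toy-words-only`, and the data are all assumptions; none of them is taken from this data set's `otext` configuration.

```python
# toy feature data (made-up): word node -> feature values
features = {
    101: {"text": "xa", "end": "-"},
    102: {"text": "ga", "end": " "},
    103: {"text": "xeta", "end": ", "},
}

# hypothetical formats: format name -> template over feature names
formats = {
    "text-orig-full": "{text}{end}",
    "toy-words-only": "{text} ",
}
DEFAULT_FORMAT = "text-orig-full"


def toy_text(nodes, fmt=DEFAULT_FORMAT):
    """Fill in the template of fmt for each node and join the results."""
    template = formats[fmt]
    return "".join(template.format(**features[n]) for n in nodes)


words = [101, 102, 103]
print(repr(toy_text(words)))
print(repr(toy_text(words, fmt="toy-words-only")))
```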

We examine a portion of text material:

In [32]:
lineNode = T.nodeFromSection(("Barwar", "Gozali and Nozali", 8))
lineNode

575541

In [33]:
letters = L.d(lineNode, otype="letter")
morphemes = L.d(lineNode, otype="morpheme")
words = L.d(lineNode, otype="word")
print(
    f"""
line {T.sectionFromNode(lineNode)} with
  {len(letters):>3} letters
  {len(morphemes):>3} morphemes
  {len(words):>3} words
"""
)


line ('Barwar', 'Gozali and Nozali', 8) with
  166 letters
    0 morphemes
   40 words



In [34]:
T.text(letters[0:50])

''

In [35]:
T.text(morphemes[0:20])

''

In [36]:
T.text(words[0:10])

'hadíya mù wídle?ˈ mə́re ṭlá polìseˈ mə́re só l-bɛ́θət '

## Whole text in all formats in a few seconds
Part of the pleasure of working with computers is that they can crunch massive amounts of data.

We print the text in all formats.

In [37]:
A.indent(reset=True)
A.info("writing plain text of all texts in all text formats")

formats = T.formats

text = collections.defaultdict(list)

for ln in F.otype.s("line"):
    for fmt in formats:
        if fmt.startswith("layout"):
            continue
        text[fmt].append(T.text(ln, fmt=fmt, descend=True))

A.info("done {} formats".format(len(text)))

for fmt in sorted(text):
    print("{}\n{}\n".format(fmt, "\n".join(text[fmt][0:5])))

  0.00s writing plain text of all texts in all text formats
  1.59s done 5 formats
text-orig-full
xá-ga xèta,ˈ mállah naṣràdin,ˈ xázəx mòdi wíða.ˈ gu-bɛ̀θa wéwa,ˈ har-zála-w θàya.ˈ zála-w θàya,ˈ mára ya-ʾàlaha,ˈ yawə̀tliˈ ʾə́mma dàwe.ˈ ʾən-hàwaˈ ʾə́č̣č̣i-u ʾə́č̣č̣a maqəlbə̀nna.ˈ ʾu-ʾən-hàwaˈ ʾə́mma-w-xà-ži,ˈ la-băyə̀nna.ˈ de-šùqla.ˈ ʾə̀mma gắrəg háwa drə́st.ˈ 
b-álaha hóle zála-w θàya,ˈ ʾíθwale xá-šwawa huðàya,ˈ maṣóθe ʾə́lle dìye.ˈ mə́re xázəx ʾáwwa dū̀s-ile.ˈ qɛ́mən mjarbə̀nne.ˈ síqa l-gàre,ˈ də́ryɛle ʾə́č̣č̣i-u ʾə́č̣č̣a dáwe gu-ða-kìsta,ˈ də́rya b-kàwele.ˈ ʾá báxta hàyyo!ˈ hóle ʾaláha qəm-mšadə̀rrən.ˈ 
muθɛ́θɛla màjma.ˈ msúrqəlla píšela mnáyəlla l-xà-xa.ˈ plíṭla ʾə́č̣č̣i-u ʾə̀č̣č̣a.ˈ trè,ˈ trè,ˈ ʾə́č̣č̣i-u ʾə̀č̣č̣a.ˈ ʾə̀ṣra,ˈ ʾə̀ṣra,ˈ hàr-ʾəč̣č̣i-u ʾə́č̣č̣a.ˈ klèla,ˈ ʾámər báxta dū̀s-ile.ˈ ʾaláha là-xaləṭ.ˈ ʾə́č̣č̣i-u ʾə̀č̣č̣a,ˈ ʾáxči ʾána max-xšàwti,ˈ ʾáyya kìstaˈ hóle mxožə́bnəlla max-xà.ˈ ha-šqùl,ˈ máttula tămàha.ˈ 
huðáya l-gàreˈ šwirɛ́le l-pàlga,ˈ yába ʾànən mšúdrəlla!ˈ ʾáy kál

### The full plain text
We write all plain-text formats to file, in your `Downloads` folder.
The `layout` formats were skipped above, so we loop over the formats collected in `text`.

In [38]:
for fmt in text:
    with open(
        os.path.expanduser(f"~/Downloads/{fmt}.txt"),
        "w",
        encoding="utf8",  # the text contains many non-ASCII characters
    ) as f:
        f.write("\n".join(text[fmt]))

## Sections

A section in the NENA corpus is a dialect, a text or a line.
Knowledge of sections is not baked into Text-Fabric.
The config feature `otext.tf` may specify three section levels, and tell
what the corresponding node types and features are.

From that knowledge it can construct mappings from nodes to sections, e.g. from line
nodes to tuples of the form:

    (dialect name, text title, line number)

You can get the section of a node as a tuple of relevant dialect, text, and line nodes.
Or you can get it as a passage label, a string.

You can ask for the passage corresponding to the first slot of a node, or the one corresponding to the last slot.

Here are examples of getting the section that corresponds to a node and vice versa.

**NB:** `sectionFromNode` always delivers a line specification, either from the
first slot belonging to that node, or, if `lastSlot`, from the last slot
belonging to that node.
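The role of the first and last slot can be illustrated with a toy lookup that maps a slot to the line containing it. This is a sketch under assumptions, not how Text-Fabric stores sections: the slot ranges, the titles, and the function names are made up.

```python
import bisect

# toy data (made-up): lines as (first slot, last slot, section tuple), in text order
lines = [
    (1, 10, ("Barwar", "Text A", 1)),
    (11, 25, ("Barwar", "Text A", 2)),
    (26, 40, ("Urmi_C", "Text B", 1)),
]
line_starts = [first for (first, _last, _section) in lines]


def section_of_slot(slot):
    """Return the section tuple of the line that contains slot."""
    i = bisect.bisect_right(line_starts, slot) - 1
    if i < 0 or slot > lines[i][1]:
        raise ValueError(f"slot {slot} is not in any line")
    return lines[i][2]


def toy_section_from_node(slots, last_slot=False):
    """Section of the first slot of a node, or of its last slot if requested."""
    return section_of_slot(max(slots) if last_slot else min(slots))


# a hypothetical sentence that straddles two lines
sentence_slots = range(8, 13)
print(toy_section_from_node(sentence_slots))
print(toy_section_from_node(sentence_slots, last_slot=True))
```

A node that crosses a line boundary gets a different line number depending on which end you ask for.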

In [39]:
someNodes = (
    F.otype.s("letter")[100000],
    F.otype.s("word")[5000],
    F.otype.s("inton")[4000],
    F.otype.s("stress")[3000],
    F.otype.s("line")[1000],
    F.otype.s("text")[100],
    F.otype.s("dialect")[1],
)

In [40]:
for n in someNodes:
    nType = F.otype.v(n)
    d = f"{n:>7} {nType}"
    first = A.sectionStrFromNode(n)
    last = A.sectionStrFromNode(n, lastSlot=True, fillup=True)
    tup = (
        T.sectionTuple(n),
        T.sectionTuple(n, lastSlot=True, fillup=True),
    )
    print(f"{d:<16} - {tup}")
    print(f"    first: {first}")
    print(f"    last:  {last}")

 100001 letter   - ((539382, 713280, 575961), (539382, 713280, 575961))
    first: Barwar, The Daughter of the King, Ln. 14
    last:  Barwar, The Daughter of the King, Ln. 14
 718385 word     - ((539382, 713263, 575489), (539382, 713263, 575489))
    first: Barwar, Baby Leliθa, Ln. 9
    last:  Barwar, Baby Leliθa, Ln. 9
 543384 inton    - ((539382, 713272, 575727), (539382, 713272, 575727))
    first: Barwar, Tales From the 1001 Nights, Ln. 18
    last:  Barwar, Tales From the 1001 Nights, Ln. 18
 597970 stress   - ((539382, 713262, 575466), (539382, 713262, 575466))
    first: Barwar, A Tale of a Prince and a Princess, Ln. 46
    last:  Barwar, A Tale of a Prince and a Princess, Ln. 46
 576368 line     - ((539382, 713300, 576368), (539382, 713300, 576368))
    first: Barwar, The Tale of Farxo and Səttiya, Ln. 28
    last:  Barwar, The Tale of Farxo and Səttiya, Ln. 28
 713359 text     - ((539383, 713359), (539383, 713359, 577524))
    first: Urmi_C, The Loan of a Cooking Pot
    las

# Clean caches

Text-Fabric pre-computes data for you, so that it can be loaded faster.
If the original data is updated, Text-Fabric detects it, and will recompute that data.

But when the algorithms of Text-Fabric have changed without any change in the data, you might
want to clear the cache of precomputed results.

There are two ways to do that:

* Locate the `.tf` directory of your dataset, and remove all `.tfx` files in it.
  This might be a bit awkward to do, because the `.tf` directory is hidden on Unix-like systems.
* Call `TF.clearCache()`, which does exactly the same.
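The manual route can be sketched with `pathlib`. The helper name `remove_tfx_files` is hypothetical, and the demonstration runs on a throw-away directory rather than on a real data set. The sketch assumes the `.tfx` files sit directly inside the directory you pass in; check where they live in your installation before deleting anything.

```python
import tempfile
from pathlib import Path


def remove_tfx_files(cache_dir):
    """Delete every *.tfx file directly inside cache_dir; return the removed names."""
    removed = []
    for path in sorted(Path(cache_dir).glob("*.tfx")):
        path.unlink()
        removed.append(path.name)
    return removed


# demonstration on a temporary directory, not on a real data set
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / ".tf"
    cache_dir.mkdir()
    (cache_dir / "otype.tfx").write_bytes(b"")
    (cache_dir / "notes.txt").write_text("keep me")

    removed = remove_tfx_files(cache_dir)
    remaining = sorted(p.name for p in cache_dir.iterdir())
    print("removed:", removed)
    print("remaining:", remaining)
```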

It is not handy to execute the following cell every time, which is why I have commented it out.
So if you really want to clear the cache, remove the comment sign below.

In [41]:
# TF.clearCache()

# Next steps

By now you have an impression of how to compute your way around the corpus.
While this is still the beginning, I hope you already sense the power of unlimited programmatic access
to all the bits and bytes in the data set.

Here are a few directions for unleashing that power.

* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[similarLines](similarLines.ipynb)** spot the similarities between lines

---

See the [cookbook](cookbook) for recipes for small, concrete tasks.

CC-BY Dirk Roorda