<img src="images/pthu.png"/>
<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/dans-small.png"/>

# Compute

This notebook shows you how to compute with your corpus in Text-Fabric.
See [start](start.ipynb) for preliminaries.

## About

Ernst Boogert, while at the [PThU](https://www.pthu.nl/en/), has mass-converted Greek literature from
high-quality libraries such as Perseus to the Text-Fabric format.

He has published the results in the [greek_literature](https://github.com/pthu/greek_literature) repository on GitHub.

It consists of nearly 1800 works by over 250 authors.

In [1]:
%load_ext autoreload
%autoreload 2

In [6]:
from tf.app import use
from catalog import makeCatalog, ORG, REPO, VERSION

# Selection

We pick the same work as in [load](load.ipynb).

In [7]:
AUTHOR = "Aeschylus"
TITLE = "Eumenides"

works = makeCatalog()
dataSource = works[AUTHOR][TITLE][0]

dataSource

'First1KGreek/tlg0085/tlg007/opp-grc3/1/tf'

# Loading

We load the Eumenides:

In [8]:
A = use(f"{ORG}/{REPO}/{dataSource}:clone", version=VERSION, hoist=globals())

# Exploring

Every TF corpus has an atomic type, the type of the textual `slots`.

We can read it off from the dataset by means of the TF API (see the
[cheatsheet](https://annotation.github.io/text-fabric/tf/cheatsheet.html)).

We can also find out the number of slots.

In [9]:
F.otype.slotType

'word'

In [10]:
F.otype.maxSlot

5629

This is a small text: only 5629 words.

Now let's see what features we have:

In [12]:
Fall()

['_book',
 '_phrase',
 '_sentence',
 'anapests',
 'antistrophe',
 'card',
 'choral',
 'edition-grc',
 'ephymn',
 'episode',
 'foreign',
 'head',
 'hi',
 'hypothesis',
 'l',
 'lyric',
 'main',
 'note',
 'orig',
 'otype',
 'p',
 'para-P',
 'pb',
 'plain',
 'post',
 'pre',
 'sp',
 'speaker',
 'strophe']

I suspect the actual text is in the feature `orig`.

We can get a bit more information by looking at the metadata:

In [13]:
F.orig.meta

{'author': 'Aeschylus',
 'availability': 'Available under a Creative Commons Attribution-ShareAlike 4.0 International License.',
 'convertor_author': 'Ernst Boogert',
 'convertor_date': 'July, 2020',
 'convertor_execution': 'Ernst Boogert',
 'convertor_institution': 'Protestant Theological University (PThU), Amsterdam/Groningen, The Netherlands',
 'convertor_version': '1.0.0',
 'description': 'the original format of the word without interpunction',
 'editor': 'Arthur Sidgwick',
 'filename': 'tlg0085.tlg007.opp-grc3',
 'funder': 'Tufts University',
 'principal': 'Gregory Crane',
 'publicationStmt': 'Trustees of Tufts University, Medford, MA, Perseus Project.',
 'respStmt': 'Prepared under the supervision of, Gregory Crane',
 'sourceDesc': 'Aeschylus, Aeschyli Tragoediae : cum fabularum deperditarum fragmentis, poetae vita et operum catalogo, Arthur Sidgwick, Oxford, Clarendon Press, 1902, HathiTrust.',
 'sponsor': 'Perseus Project, Tufts University',
 'title': 'Eumenides',
 'valueType':

Indeed. The [betacode](https://en.wikipedia.org/wiki/Beta_Code)
representation might be in the feature `plain`.

In [14]:
F.plain.meta["description"]

'plain format in lowercase'

We call up the `orig` and `plain` features of the first word (slot 1).

In [16]:
F.orig.v(1)

'Ὀρέστης '

In [17]:
F.plain.v(1)

'ορεστης'

OK, `plain` is not betacode, but accentless Greek in lowercase.
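
One plausible way to get from a value like `orig` to a value like `plain` is Unicode decomposition followed by dropping the combining marks. The sketch below is an assumption about how such a form could be derived, not the converter's actual procedure; `strip_accents` is a hypothetical helper and not part of Text-Fabric.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return a lowercase, accentless form of a Greek word (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word.strip())
    # Drop the combining marks (accents, breathings, iota subscript).
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.lower()


print(strip_accents("Ὀρέστης "))
```

Applied to the `orig` value of the first word, this should give the same string that `F.plain.v(1)` returned.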

There is additional configuration in the dataset, coming from the feature `otext` (file `otext.tf`), which has only metadata, no data.

That information is about text formats and sections.

In [19]:
T.formats

{'text-orig-full': 'word', 'text-orig-main': 'word', 'text-orig-plain': 'word'}

In [21]:
T.sectionTypes

['card', 'strophe', 'ephymn']

We see that we have three ways to represent text.

And the corpus is divided into *cards*, then into *strophes*, and then into *ephymns*.

Let's pick the first strophe:

In [23]:
s = F.otype.s("strophe")[0]
s

9502

We get a number. All things in the corpus (words, strophes, cards, etc.) are represented by nodes, and nodes are just numbers.
Treat them as bar codes: you have them, but you do not read them, and you do not remember them. Text-Fabric is your bar code reader.
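
In Text-Fabric, slot nodes are numbered from 1 up to `maxSlot`, and all other nodes get higher numbers. The sketch below illustrates that with the numbers seen so far in this notebook; `is_slot` is a hypothetical helper, not a TF function.

```python
MAX_SLOT = 5629  # value of F.otype.maxSlot reported above


def is_slot(node: int, max_slot: int = MAX_SLOT) -> bool:
    """Tell whether a node number denotes a slot (hypothetical helper)."""
    return 1 <= node <= max_slot


# 1 and 5629 are the first and last word; 9502 is the strophe node from above.
for node in (1, 5629, 9502):
    kind = "slot" if is_slot(node) else "non-slot node"
    print(node, kind)
```

In a live session you would simply ask `F.otype.v(node)` for the type of a node; the range check only shows why the strophe node is larger than the number of words.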

Now we display the text of this strophe:

In [24]:
T.text(s)

'Ὀρέστης  ἐν  Δελφοῖς  περιεχόμενος  ὑπὸ  τῶν  Ἐρινύων  βουλῇ  Ἀπόλλωνος  παρεγένετο  εἰς  Ἀθήνας  εἰς  τὸ  ἱερὸν  τῆς  Ἀθηνᾶς: : ἧς  βουλῇ  νικήσας  κατῆλθεν  εἰς  Ἄργος. . τὰς  δὲ  Ἐρινύας  πραΰνας  προσηγόρευσεν  Εὐμενίδας. . παρ᾽ ᾽ οὐδετέρῳ  κεῖται  ῃ  μυθοποιία..Τὰ  τοῦ  δράματος  πρόσωπα::Πυθιὰς  προφῆτιςἈπόλλωνὈρέστηςΚλυταιμήστρας  εἴδωλονχορὸς  ΕὐμενίδωνΑθηνᾶπροπομποίαργυμεντυμ] 3 ] 3   ηερμανν  ʽʽαυξτορε  ηαρποκρατιονε  ς. . ϝοξε  ϝιδ. . αδ  δραματις  περσοναε] ]   ποστεα  ινσερτο  μʼʼμ̓'

And now in all three formats:

In [27]:
for fmt in T.formats:
    print(f"{F.otype.v(s)} {F.strophe.v(s)} in format {fmt}")
    print(T.text(s, fmt=fmt))
    print("")

strophe 0 in format text-orig-full
Ὀρέστης  ἐν  Δελφοῖς  περιεχόμενος  ὑπὸ  τῶν  Ἐρινύων  βουλῇ  Ἀπόλλωνος  παρεγένετο  εἰς  Ἀθήνας  εἰς  τὸ  ἱερὸν  τῆς  Ἀθηνᾶς: : ἧς  βουλῇ  νικήσας  κατῆλθεν  εἰς  Ἄργος. . τὰς  δὲ  Ἐρινύας  πραΰνας  προσηγόρευσεν  Εὐμενίδας. . παρ᾽ ᾽ οὐδετέρῳ  κεῖται  ῃ  μυθοποιία..Τὰ  τοῦ  δράματος  πρόσωπα::Πυθιὰς  προφῆτιςἈπόλλωνὈρέστηςΚλυταιμήστρας  εἴδωλονχορὸς  ΕὐμενίδωνΑθηνᾶπροπομποίαργυμεντυμ] 3 ] 3   ηερμανν  ʽʽαυξτορε  ηαρποκρατιονε  ς. . ϝοξε  ϝιδ. . αδ  δραματις  περσοναε] ]   ποστεα  ινσερτο  μʼʼμ̓

strophe 0 in format text-orig-main
ὀρέστης ἐν δελφοῖς περιεχόμενος ὑπὸ τῶν ἐρινύων βουλῇ ἀπόλλωνος παρεγένετο εἰς ἀθήνας εἰς τὸ ἱερὸν τῆς ἀθηνᾶς ἧς βουλῇ νικήσας κατῆλθε εἰς ἄργος τὰς δὲ ἐρινύας πραΰνας προσηγόρευσεν εὐμενίδας παρὰ οὐδετέρῳ κεῖται ῃ μυθοποιία τὰ τοῦ δράματος πρόσωπα πυθιὰς προφῆτις ἀπόλλων ὀρέστης κλυταιμήστρας εἴδωλον χορὸς εὐμενίδων αθηνᾶ προπομποί αργυμεντυμ ηερμανν αυξτορε ηαρποκρατιονε ς ϝοξε ϝιδ ἅδε δραματις περσοναε ποστεα ινσερτο με,ἐ

We can ask for a frequency distribution of the values of any feature.
Let's ask for the distribution of the `plain` feature. That gives us a frequency list of the
words in this corpus.

We print the top 20:

In [30]:
F.plain.freqList()[0:20]

(('δε', 197),
 ('και', 151),
 ('χορος', 96),
 ('τε', 71),
 ('γαρ', 67),
 ('ου', 61),
 ('εν', 54),
 ('το', 49),
 ('μη', 42),
 ('αθηνα', 36),
 ('μεν', 30),
 ('εκ', 29),
 ('προς', 29),
 ('η', 28),
 ('αλλα', 27),
 ('αν', 27),
 ('απολλων', 27),
 ('ως', 27),
 ('συ', 25),
 ('εγω', 23))
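
The shape of this result can be mimicked in plain Python. The sketch below is not Text-Fabric's implementation: `freq_list` is a hypothetical helper, the word list is toy data, and the alphabetical tie-break is an assumption read off from the output above (for example, the four values with frequency 27).

```python
from collections import Counter


def freq_list(values):
    """Return (value, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(values)
    return tuple(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


# Toy data, not taken from the corpus.
toy_words = ["δε", "και", "δε", "τε", "και", "δε"]
print(freq_list(toy_words))
```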

How many strophes are there?

In [31]:
len(F.otype.s("strophe"))

70

Here is strophe 36:

In [32]:
s = F.otype.s("strophe")[35]
T.text(s)

'Χορόςἔσθ᾽ ᾽ ὅπου  τὸ  δεινὸν  εὖ,,καὶ  φρενῶν  ἐπίσκοπονδεῖ  μένειν  καθήμενον::ξυμφέρεισωφρονεῖν  ὑπὸ  στένει..τίς  δὲ  μηδὲν  ἐν  δέεικαρδίαν  &&lt;;ἂν&&gt; ; ἀνατρέφωνἢ  πόλις  βροτός  θ᾽ ᾽ ὁμοίως  ἔτ᾽ ᾽ ἂν  σέβοι  δίκαν;;'

Here is word 5001:

In [33]:
w = F.otype.s("word")[5000]
T.text(w)

'ταῦτα  '

Where is it?

In [34]:
T.sectionFromNode(w)

(881, 0, 0)

It is in card 881, in a strophe with number 0, in an ephymn with number 0.

We can get the nodes of these sections:

In [35]:
T.sectionTuple(w)

(7213, 9555, 7291)

Let's print the strophe (the middle element):

In [37]:
s = T.sectionTuple(w)[1]
T.text(s)

'Ἀθηνᾶοὔτοι  καμοῦμαί  σοι  λέγουσα  τἀγαθά, ,ὡς  μήποτ᾽ ᾽ εἴπῃς  πρὸς  νεωτέρας  ἐμοῦθεὸς  παλαιὰ  καὶ  πολισσούχων  βροτῶνἄτιμος  ἔρρειν  τοῦδ᾽ ᾽ ἀπόξενος  πέδου..ἀλλ᾽ ᾽ εἰ  μὲν  ἁγνόν  ἐστί  σοι  Πειθοῦς  σέβας,,γλώσσης  ἐμῆς  μείλιγμα  καὶ  θελκτήριον,,σὺ  δ᾽ ᾽ οὖν  μένοις  ἄν: : εἰ  δὲ  μὴ  θέλεις  μένειν,,οὔ  τἂν  δικαίως  τῇδ᾽ ᾽ ἐπιρρέποις  πόλειμῆνίν  τιν᾽ ᾽ ἢ  κότον  τιν᾽ ᾽ ἢ  βλάβην  στρατῷ..ἔξεστι  γάρ  σοι  τῆσδε  γαμόρῳ  χθονὸςεἶναι  δικαίως  ἐς  τὸ  πᾶν  τιμωμένῃ..Χορόςἄνασσ᾽ ᾽ Ἀθάνα, , τίνα  με  φῂς  ἔχειν  ἕδραν;;Ἀθηνᾶπάσης  ἀπήμον᾽ ᾽ οἰζύος: : δέχου  δὲ  σύ..Χορόςκαὶ  δὴ  δέδεγμαι: : τίς  δέ  μοι  τιμὴ  μένει;;Ἀθηνᾶὡς  μή  τιν᾽ ᾽ οἶκον  εὐθενεῖν  ἄνευ  σέθεν..Χορόςσὺ  τοῦτο  πράξεις, , ὥστε  με  σθένειν  τόσον;;Ἀθηνᾶτῷ  γὰρ  σέβοντι  συμφορὰς  ὀρθώσομεν..Χορόςκαί  μοι  πρόπαντος  ἐγγύην  θήσει  χρόνου;;Ἀθηνᾶἔξεστι  γάρ  μοι  μὴ  λέγειν  ἃ  μὴ  τελῶ..Χορόςθέλξειν  μ᾽ ᾽ ἔοικας  καὶ  μεθίσταμαι  κότου..Ἀθηνᾶτοιγὰρ  κατὰ  χθόν᾽ ᾽ οὖσ᾽ ᾽ ἐπικτήσει  φίλους..Χορόςτί  οὖν  μ᾽ 

The first slot and the last slot in this strophe are:

In [38]:
words = L.d(s, otype="word")
print(words[0])
print(words[-1])

4838
5067
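
As a quick sanity check, these two numbers can be combined with the word we started from. The sketch assumes that the strophe covers every slot between its first and last word, which Text-Fabric does not guarantee in general; in a live session `len(words)` gives the exact count.

```python
first_slot, last_slot = 4838, 5067  # the two numbers printed above

# Assumption: the strophe has no gaps, so its slots form one contiguous range.
strophe_slots = range(first_slot, last_slot + 1)

print(len(strophe_slots))     # number of words under that assumption
print(5001 in strophe_slots)  # is word 5001, the one we started from, inside?
```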


We can display this strophe in a prettier way:

In [40]:
A.pretty(s, full=True)

# All steps

* **[start](start.ipynb)** catalogue of Greek Literature
* **[load](load.ipynb)** load a Greek work
* (later) ~[display](display.ipynb)~ create pretty displays of your text structures
* **compute** compute with text and features
* (later) ~[search](search.ipynb)~ turbo charge your hand-coding with search templates
* (later) ~[exportExcel](exportExcel.ipynb)~ make tailor-made spreadsheets out of your results
* (later) ~[share](share.ipynb)~ draw in other people's data and let them use yours

CC-BY Ernst Boogert, Dirk Roorda