<img align="right" src="images/tf.png" width="128"/>
<img align="right" src="images/dans.png"/>
<img align="right" src="images/huygenslogo.png"/>
<img align="right" src="images/logo.png"/>

# Tutorial

This notebook gets you started with using
[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the
[Missieven Corpus](https://github.com/Dans-labs/clariah-gm).

Familiarity with the underlying
[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)
is recommended.

## Installing Text-Fabric

### Python

You need Python on your system. Many systems ship with it,
but that may be python2, and we need at least python **3.6**.

Install it from [python.org](https://www.python.org) or from
[Anaconda](https://www.anaconda.com/download).

### TF itself

```
pip3 install text-fabric
```

### Jupyter notebook

You need [Jupyter](http://jupyter.org).

If it is not already installed:

```
pip3 install jupyter
```

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os, collections

In [3]:
from tf.app import use

## Corpus data

Text-Fabric will fetch the Missieven corpus for you.

It will fetch the newest version by default, but you can get other versions as well.

The data will be stored under `text-fabric-data` in your home directory.

# Incantation

The simplest way to get going is by this *incantation*:

For the very latest version, use `hot`.

For the latest release, use `latest`.

If you have cloned the repos (TF app and data), use `clone`.

If you do not want/need to upgrade, leave out the checkout specifiers.

**After downloading new data it will take 1-2 minutes to optimize the data**.

The optimized data will be stored in your system, and all subsequent use of this
corpus will find that optimized data.

In [4]:
A = use('missieven:latest', checkout="latest", hoist=globals())
# A = use('missieven', hoist=globals())
# A = use('missieven:clone', checkout="clone", hoist=globals())
# A = use('missieven:hot', checkout="hot", hoist=globals())

rate limit is 5000 requests per hour, with 4682 left for this hour
	connecting to online GitHub repo annotation/app-missieven ... connected
	code/__init__.py...downloaded
	code/app.py...downloaded
	code/config.yaml...downloaded
	code/static...directory
		code/static/display.css...downloaded
		code/static/logo.png...downloaded
	OK


rate limit is 5000 requests per hour, with 4667 left for this hour
	connecting to online GitHub repo Dans-labs/clariah-gm ... connected
	downloading https://github.com/Dans-labs/clariah-gm/releases/download/v0.5/tf-0.5.zip ... 
	unzipping ... 
	saving data


   |     1.64s T otype                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |       14s T oslots               from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |       11s T trans                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     5.82s T punco                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     3.94s T transr               from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     7.34s T transo               from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.65s T n                    from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     3.18s T puncr                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.00s T title                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     8.95s T punc                 from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |      |     0.37s C __levels__           from otype, oslots, otext
   |      |       45s C __ord



   |      |     1.28s C __sections__         from otype, oslots, otext, __levUp__, __levels__, n, n, n
   |      |     0.24s C __structure__        from otype, oslots, otext, __rank__, __levUp__, n, title, n
   |     0.00s T author               from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.00s T authorFull           from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.03s T col                  from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.00s T day                  from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.01s T emph                 from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.02s T facs                 from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.08s T fnote                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.01s T folio                from ~/text-fabric-data/Dans-labs/clariah-gm/tf/0.5
   |     0.00s T month                from ~/text-fabric-data/Dans-labs/

The report above contains a lot of information; we'll look at it in a later chapter:
[compute](compute.ipynb)

# Getting around

## Where am I?

All information in a Text-Fabric dataset is tied to nodes and edges.
Nodes are integers, from 1 upwards, and the basic textual objects (*slots*) come first, in the order of the text.
In this corpus, slots are words, and we have more than 5 million of them.
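
To make the node numbering concrete, here is a minimal toy model in plain Python. It does not use Text-Fabric: the dictionaries `otype` and `oslots` are illustrative stand-ins that borrow the names of two features listed in the loading report, and the six-word "corpus" is made up for the occasion. The sketch only shows the principle that slots take the first node numbers, in text order, and that other objects are numbered after them.

```python
# Toy corpus: six words (slots) grouped into two lines.
words = ["Met", "geduld", "bereikt", "men", "in", "Perzië"]
line_spans = [(1, 3), (4, 6)]  # first and last slot of each toy line

# Slots get node numbers 1..len(words), in text order.
otype = {n: "word" for n in range(1, len(words) + 1)}

# Non-slot nodes are numbered after the last slot.
oslots = {}
next_node = len(words) + 1
for first, last in line_spans:
    otype[next_node] = "line"
    oslots[next_node] = tuple(range(first, last + 1))
    next_node += 1

max_slot = len(words)
for node in sorted(otype):
    kind = "slot" if node <= max_slot else "non-slot"
    print(node, otype[node], kind)
```

In the real corpus the same idea holds at a much larger scale: the word nodes come first, and nodes for lines, pages and volumes have higher numbers.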

Here is how you can visualize a slot and see where you are, if you found the millionth word:

In [5]:
n = 1_000_000
A.plain(n)

This word is in volume 4, page 717, line 28.
You can click the passage specifier, and it will take you to the image of this page on the
Missieven site maintained by the Huygens Institute.

![fragment](images/GM4-717.png)

## How to get to ...?

Suppose we want to move to volume 3, page 717.
How do we find the node that corresponds to that page?

In [6]:
p = A.nodeFromSectionStr("3 717")
p

5502010

This looks like a meaningless number, but like a barcode on a product, this is the key to all information
about a thing. What kind of thing?

In [7]:
F.otype.v(p)

'page'

We just asked for the value of the feature `otype` (object type) of node `p`, and it turned out to be a page.
In the same way we can get the page number:

In [8]:
F.n.v(p)

717

<img align="right" src="images/incantation.png" width="500"/>

Which features are defined, and what they mean, depends on the dataset.
The dataset designer has provided metadata and documentation about the features, and these are
accessible wherever you work with Text-Fabric.
Just after the incantation you can expand the list of features and click on any feature to jump to its documentation.

We can also navigate to a specific line:

In [9]:
ln = A.nodeFromSectionStr("3 717:28")
print(f"node {ln} is {F.otype.v(ln)} {F.n.v(ln)}")

node 5147608 is line 28


We can also do this in a more structured way:

In [10]:
p = T.nodeFromSection((3, 717))
p

5502010

In [11]:
ln = T.nodeFromSection((3, 717, 28))
ln

5147608
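
The two addressing styles are closely related: the string `"3 717:28"` carries the same three numbers as the tuple `(3, 717, 28)`. The sketch below shows that correspondence with a hypothetical helper, `parse_section`. It is not part of Text-Fabric, and it only handles the `volume page:line` shape used in this notebook.

```python
def parse_section(section_str):
    """Turn 'volume page[:line]' into a tuple of ints (hypothetical helper)."""
    head, _, line = section_str.strip().partition(":")
    parts = [int(x) for x in head.split()]
    if line:
        parts.append(int(line))
    return tuple(parts)


print(parse_section("3 717"))
print(parse_section("3 717:28"))
```

With the corpus loaded, such a tuple could be handed to `T.nodeFromSection`, as in the cells above.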

At this point, have a look at the 
[cheatsheet](https://annotation.github.io/text-fabric/tf/cheatsheet.html)
and find the documentation of these methods.

## Explore the neighbourhood

We show how to find the nodes of the lines in the page, how to print the text of those lines, and how to find the individual words.

Text-Fabric has an API called `Locality` (or simply `L`) to explore spatially related nodes.

From a node we can go `up`, `down`, `previous` and `next`. Here we go down.

In [12]:
lines = L.d(p, otype="line")
lines

(5147581,
 5147582,
 5147583,
 5147584,
 5147585,
 5147586,
 5147587,
 5147588,
 5147589,
 5147590,
 5147591,
 5147592,
 5147593,
 5147594,
 5147595,
 5147596,
 5147597,
 5147598,
 5147599,
 5147600,
 5147601,
 5147602,
 5147603,
 5147604,
 5147605,
 5147606,
 5147607,
 5147608,
 5147609,
 5147610,
 5147611,
 5147612,
 5147613,
 5147614,
 5147615,
 5147616,
 5147617,
 5147618,
 5147619,
 5147620)
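
The idea behind going down and up can be sketched without the corpus. The toy model below assumes that every non-slot node covers a set of slots, and that "down" means "nodes whose slots lie within mine". The data and the functions `slots_of`, `down` and `up` are illustrative only; the real `L` API is more refined, for instance in the order in which it delivers nodes.

```python
# Toy data: which slots each non-slot node covers (illustrative, not the real corpus).
otype = {1: "word", 2: "word", 3: "word", 4: "word",
         5: "line", 6: "line", 7: "page"}
oslots = {5: {1, 2}, 6: {3, 4}, 7: {1, 2, 3, 4}}


def slots_of(node):
    """A slot covers itself; other nodes cover the slots listed in oslots."""
    return oslots.get(node, {node})


def down(node, kind):
    """Nodes of the given type that lie entirely within `node`, by node number."""
    outer = slots_of(node)
    return tuple(
        n for n in sorted(otype)
        if n != node and otype[n] == kind and slots_of(n) <= outer
    )


def up(node, kind):
    """Nodes of the given type that entirely contain `node`, by node number."""
    inner = slots_of(node)
    return tuple(
        n for n in sorted(otype)
        if n != node and otype[n] == kind and inner <= slots_of(n)
    )


print(down(7, "line"))
print(down(5, "word"))
print(up(2, "page"))
```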

# Display

Text-Fabric has a high-level display API to show textual material in various ways.

Here is a plain view.

In [13]:
for line in lines:
    A.plain(line)

We can show the text in another text format.
Formats are defined by the dataset designer; they are not built into Text-Fabric.
Let's see what the designer has provided in this regard:

In [14]:
T.formats

{'text-orig-full': 'word',
 'text-orig-remark': 'word',
 'text-orig-source': 'word',
 'layout-full': 'word',
 'layout-full-notes': 'word',
 'layout-remark': 'word',
 'layout-remark-notes': 'word',
 'layout-orig': 'word',
 'layout-orig-notes': 'word'}

Some formats show all text, others editorial texts only, and some show original letter content only.

The formats that start with `text-` yield plain Unicode text.

The formats that start with `layout-` deliver formatted HTML.
We have designed the layout in such a way that the text types (editorial, original) are distinguished.

The default format is `text-orig-full`.
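
As a small illustration of the naming convention, the sketch below groups the format names by their prefix. It works on a hand-typed copy of the `T.formats` output shown above, so it runs without the corpus; `by_kind` is just an illustrative variable name.

```python
from collections import defaultdict

# Copied from the T.formats output above.
formats = {
    "text-orig-full": "word",
    "text-orig-remark": "word",
    "text-orig-source": "word",
    "layout-full": "word",
    "layout-full-notes": "word",
    "layout-remark": "word",
    "layout-remark-notes": "word",
    "layout-orig": "word",
    "layout-orig-notes": "word",
}

# Group the names by the part before the first hyphen.
by_kind = defaultdict(list)
for name in formats:
    prefix = name.split("-", 1)[0]  # "text" or "layout"
    by_kind[prefix].append(name)

for prefix, names in by_kind.items():
    print(f"{prefix}: {', '.join(names)}")
```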

Let's switch to `layout-full-notes`, which will also show the footnotes in place.

In [15]:
for line in lines:
    A.plain(line, fmt="layout-full-notes")

If we want to see only the editorial remarks, with their footnotes, we can choose `layout-remark-notes`:

In [16]:
for line in lines:
    A.plain(line, fmt="layout-remark-notes")

Or, without the footnotes:

In [17]:
for line in lines:
    A.plain(line, fmt="layout-remark")

Just the original text:

In [18]:
for line in lines:
    A.plain(line, fmt="layout-orig")

# Drilling down

Let's navigate to individual words. We pick a few lines from the page that we have just seen in various ways.

In [19]:
ln = A.nodeFromSectionStr("3 717:33")
A.plain(ln)

words = L.d(ln, otype="word")
words

(1000027,
 1000028,
 1000029,
 1000030,
 1000031,
 1000032,
 1000033,
 1000034,
 1000035,
 1000036)

Let's make a table of the words of lines 31 - 33 and the values of some features that they carry, namely:
`trans`, `transo`, `transr`, `punc`, `remark`, `fnote`

In [20]:
features = "trans transo transr punc remark fnote".split()

table = []

for lno in range(31, 34):
    ln = T.nodeFromSection((3, 717, lno))
    for w in L.d(ln, otype="word"):
        row = tuple(Fs(feature).v(w) for feature in features)
        table.append(row)

table

[('Met', None, 'Met', ' ', 1, None),
 ('geduld', None, 'geduld', ' ', 1, None),
 ('bereikt', None, 'bereikt', ' ', 1, None),
 ('men', None, 'men', ' ', 1, None),
 ('in', None, 'in', ' ', 1, None),
 ('Perzië', None, 'Perzië', ' ', 1, None),
 ('het', None, 'het', ' ', 1, None),
 ('meest', None, 'meest', '', 1, None),
 ('', None, None, '', None, None),
 ('Seecker', 'Seecker', None, ' ', None, None),
 ('constapelsmaet',
  'constapelsmaet',
  None,
  ', ',
  None,
  '2. Uitgevallen: met.'),
 ('namen', 'namen', None, ' ', None, None),
 ('Jacob', 'Jacob', None, ' ', None, None),
 ('Arriva', 'Arriva', None, ' ', None, None),
 ('van', 'van', None, ' ', None, None),
 ('St', 'St', None, '. ', None, None),
 ('Cruis', 'Cruis', None, ', ', None, None),
 ('voor', 'voor', None, ' ', None, None),
 ('dato', 'dato', None, '', None, None)]

We can show that more prettily in a markdown table, but it is a bit of a hassle to compose
the markdown string.
Once we have that, we can pass it to a method in the Text-Fabric API that displays it as markdown.

In [21]:
NL = "\n"

mdHead = f"""
{" | ".join(features)}
{" | ".join("---" for _ in features)}
"""

mdData = "\n".join(
    f"""{" | ".join(str(c or "").replace(NL, " ") for c in row)}"""
    for row in table
)

A.dm(f"""{mdHead}{mdData}""")


trans | transo | transr | punc | remark | fnote
--- | --- | --- | --- | --- | ---
Met |  | Met |   | 1 | 
geduld |  | geduld |   | 1 | 
bereikt |  | bereikt |   | 1 | 
men |  | men |   | 1 | 
in |  | in |   | 1 | 
Perzië |  | Perzië |   | 1 | 
het |  | het |   | 1 | 
meest |  | meest |  | 1 | 
 |  |  |  |  | 
Seecker | Seecker |  |   |  | 
constapelsmaet | constapelsmaet |  | ,  |  | 2. Uitgevallen: met.
namen | namen |  |   |  | 
Jacob | Jacob |  |   |  | 
Arriva | Arriva |  |   |  | 
van | van |  |   |  | 
St | St |  | .  |  | 
Cruis | Cruis |  | ,  |  | 
voor | voor |  |   |  | 
dato | dato |  |  |  | 
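
The markdown composition can also be wrapped in a small reusable function. The sketch below is one way to do that in plain Python; `markdown_table` is a hypothetical helper, not part of Text-Fabric. Unlike the cell above, it only blanks out `None` (so a value of `0` stays visible), and it escapes `|` characters inside cells.

```python
def markdown_table(headers, rows):
    """Build a markdown table string; None becomes an empty cell."""

    def cell(value):
        text = "" if value is None else str(value)
        # Keep each cell on one line and protect the column separator.
        return text.replace("\n", " ").replace("|", "\\|")

    lines = [
        " | ".join(headers),
        " | ".join("---" for _ in headers),
    ]
    lines.extend(" | ".join(cell(c) for c in row) for row in rows)
    return "\n".join(lines)


features = ["trans", "transo", "transr", "punc", "remark", "fnote"]
rows = [
    ("Met", None, "Met", " ", 1, None),
    ("Seecker", "Seecker", None, " ", None, None),
]
print(markdown_table(features, rows))
```

With the corpus loaded, the resulting string could be passed to `A.dm`, as in the cell above.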

Note that the dataset designer has put the text strings of all words into the feature `trans`;
editorial words also go into `transr`, but not into `transo`;
original words go into `transo`, but not into `transr`.

The existence of these features is mainly to make it possible to define the selective text formats
we have seen above.
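
The division of labour between these three features can be checked on a handful of rows. The sketch below uses values copied by hand from the table above, so it runs without the corpus; the function `classify` and its labels are illustrative, not part of Text-Fabric or of the dataset.

```python
# A few rows copied from the table above: (trans, transo, transr).
rows = [
    ("Met", None, "Met"),
    ("meest", None, "meest"),
    ("", None, None),
    ("Seecker", "Seecker", None),
    ("constapelsmaet", "constapelsmaet", None),
]


def classify(trans, transo, transr):
    """Label a word by which of the two selective features is filled."""
    if transr is not None and transo is None:
        return "editorial"
    if transo is not None and transr is None:
        return "original"
    return "neither"


for trans, transo, transr in rows:
    label = classify(trans, transo, transr)
    # In these rows, whichever selective feature is filled repeats `trans`.
    assert (transo or transr or "") == trans
    print(f"{trans!r}: {label}")
```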

If building such a table by hand is too low-level for your taste,
we can just collect a bunch of nodes and feed them to a higher-level display function of Text-Fabric:

In [22]:
table = []

for lno in range(31, 34):
    ln = T.nodeFromSection((3, 717, lno))
    for w in L.d(ln, otype="word"):
        table.append((w,))

table

[(1000018,),
 (1000019,),
 (1000020,),
 (1000021,),
 (1000022,),
 (1000023,),
 (1000024,),
 (1000025,),
 (1000026,),
 (1000027,),
 (1000028,),
 (1000029,),
 (1000030,),
 (1000031,),
 (1000032,),
 (1000033,),
 (1000034,),
 (1000035,),
 (1000036,)]

Before we ask Text-Fabric to display this, we tell it the features we're interested in.

In [23]:
A.displaySetup(extraFeatures=features)
A.show(table, condensed=True)

Where this machinery really shines is when it comes to displaying the results of queries.
See [search](search.ipynb).

---

# Next steps

By now you have an impression of how to orient yourself in the Missieven dataset.
The next steps will show you how to get powerful: searching and computing.

After that it is time to collect results, use them in new annotations, and share them.

* **start** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours

CC-BY Dirk Roorda