<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/huc.png"/>
<img align="left" src="images/logo.png"/>

# Tutorial

This notebook gets you started with using
[Text-Fabric](https://annotation.github.io/text-fabric/) for coding in the correspondence of Suriano.

Familiarity with the underlying
[data model](https://annotation.github.io/text-fabric/tf/about/datamodel.html)
is recommended.

In [1]:
%load_ext autoreload
%autoreload 2

## Installing Text-Fabric

See the [installation instructions](https://annotation.github.io/text-fabric/tf/about/install.html).

## Tip
Before you start computing with this tutorial, copy its parent directory to a location outside your repository.
That way, if you pull changes from the repository later, your work will not be overwritten.
Where you put your tutorial directory is up to you: it will work from any directory.

## Suriano data

To get the Suriano data, make sure you are on a computer with a command prompt that can do git operations.

Make a directory `github.com/HuygensING/suriano` inside your home directory, then, in the command prompt,
navigate to that directory, and from there, give the command:

```
git clone https://github.com/HuygensING/suriano
```

You are well prepared now.

# Incantation

The simplest way to get going is by this *incantation*:

In [2]:
from tf.app import use

In [4]:
A = use("HuygensING/suriano:clone", checkout="clone", hoist=globals())

**Locating corpus resources ...**

Name | # of nodes | # slots / node | % coverage
:--- | ---: | ---: | ---:
folder | 11 | 158047.18 | 100
file | 725 | 2397.96 | 100
body | 725 | 2206.0 | 92
text | 725 | 2206.0 | 92
div | 4148 | 737.03 | 176
table | 243 | 217.58 | 3
teiHeader | 725 | 191.96 | 8
page | 8764 | 157.79 | 80
correspDesc | 725 | 125.82 | 5
sourceDesc | 725 | 51.06 | 2
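
The columns of this overview are related by simple arithmetic. The sketch below recomputes the coverage percentages from the other two columns, assuming that coverage is the number of nodes times the average number of slots per node, divided by the total number of slots, and that the folders together span the whole corpus. Both are inferences from the figures shown here, not a description of how Text-Fabric computes the column internally.

```python
# Figures copied from the overview: (node type, number of nodes, average slots per node)
rows = [
    ("folder", 11, 158047.18),
    ("file", 725, 2397.96),
    ("body", 725, 2206.0),
    ("div", 4148, 737.03),
    ("page", 8764, 157.79),
]

# Assumption: the folders partition the corpus, so they yield the total slot count
total_slots = 11 * 158047.18


def coverage(n_nodes, slots_per_node, total):
    """Percentage of all slots accounted for by the nodes of one type."""
    return round(100 * n_nodes * slots_per_node / total)


coverages = {name: coverage(n, avg, total_slots) for (name, n, avg) in rows}

for name, pct in coverages.items():
    print(f"{name:8} {pct:4d}")
```

Under these assumptions the computed values agree with the listed ones. A coverage above 100, as for `div`, would then mean that some slots are counted more than once, which is what one would expect if `div` nodes are nested inside each other.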


# Named entities

The corpus is annotated with named entities.

## Entities and their occurrences

Every person referred to by an entity is associated with a node of type `entity`.
Let's see how many we have.

In [5]:
results = A.search("""
entity
""")

  0.00s 786 results


Let's show the first 5.

For each entity, the display shows all of its occurrences. These can be far apart in the text, which is why you see dotted lines between them.

In [6]:
A.table(results, end=5)

n | p | entity
:--- | :--- | :---
1 | 02@001:7 | clarissimo Padavino clarissimo Padavino clarissimo Pada- vino clarissimo Padavinoclarissimo Pada- vino clarissimo Padavino clarissimo Padavino clarissimo signor residente Padavino clarissimo Padavinoclarissimo Padavino
2 | 02@001:7 | Luca Tron clarissimo Tron clarissimo Tron
3 | 02@001:7 | Lamingher
4 | 02@001:7 | serenissimo Leopoldo serenissimo Leopoldo serenissimo Leopoldo serenissimo LeopoldoLeopold Leopold arciduca Leopoldo Leopoldo Leopoldo Leopoldo serenissimo Leopoldo LeopoldoLeopold arci- duca Leopoldo arciduca Leopoldo Leopoldo Leopoldoarciduca LeopoldoLeopoldo arciduca Leopoldo arciduca Leopoldo arciduca Leopoldo arciduca Leopoldo arciduca Leopoldo Leopolto LeopoldoLeopoldoserenissimo LeopoldoLeopoldo LeopoldoLeopoldo LeopoldoLeopoldarciduca Leopoldo Leopoldoarciduca Leopoldo Leopoldo arciduca Leopoldoserenissimo Leopoldo arciduca LeopoldoLeopoldoarciduca LeopoldoLeopoldoarciduca Leopoldo Leopoldo arciduca LeopoldoLeopoldo Leo- poldo Leopoldo Leopoldo arciduca LeopoldoLeopoldoLeopoldo Leopold arciduca LeopoldoLeopoldo Leopoldoarciduca Leopoldoarciduca Leopoldo Leopoldo LeopoldoLeo- poldo
5 | 02@001:7 | signor governator di Milanosignor governator di Milano don Pietro di Toledo don Pietro di Toledo don Pietro di Toledo don Pietro di Toledo Don Pietro di Toledo


For each entity occurrence we have a node of type `ent`. Let's see how many we have:

In [7]:
results = A.search("""
ent
""")

  0.01s 11962 results


Again, let's have a look at the first 5:

In [8]:
A.table(results, end=5)

n | p | ent
:--- | :--- | :---
1 | 02@001:7 | clarissimo Padavino
2 | 02@001:7 | Luca Tron
3 | 02@001:7 | Lamingher
4 | 02@001:7 | clarissimo Tron
5 | 02@001:7 | serenissimo Leopoldo
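
To make the relation between the two node types concrete, here is a small sketch in plain Python that does not need Text-Fabric. It groups the five occurrences listed above under the entity they refer to and computes the average number of occurrences per entity from the two search counts. The entity labels (`Padavino`, `Tron`, and so on) are hypothetical names chosen for illustration; they are not identifiers taken from the corpus.

```python
from collections import defaultdict

# The first five occurrences shown above, each paired with a hypothetical
# entity label (illustrative only, not an identifier from the corpus)
occurrences = [
    ("clarissimo Padavino", "Padavino"),
    ("Luca Tron", "Tron"),
    ("Lamingher", "Lamingher"),
    ("clarissimo Tron", "Tron"),
    ("serenissimo Leopoldo", "Leopoldo"),
]

# One entity can have many occurrences: collect them per entity
by_entity = defaultdict(list)
for text, label in occurrences:
    by_entity[label].append(text)

for label, texts in by_entity.items():
    print(f"{label}: {texts}")

# Counts reported by the two searches above
n_entities = 786
n_occurrences = 11962
average = n_occurrences / n_entities
print(f"{average:.1f} occurrences per entity on average")
```

The average hides a very uneven distribution: a few entities occur hundreds of times, while many occur only once or twice.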


## Most frequent entities

We list the 20 most frequent entities.

In [9]:
import collections
from tf.core.helpers import mdEsc

In [10]:
freqList = collections.Counter()

In [11]:
# For each entity node, count its occurrences by following the `eoccs` edges
for entity in F.otype.s("entity"):
    occs = E.eoccs.f(entity)
    freqList[entity] = len(occs)

In [12]:
md = """
entity | frequency
:--- | ---:
"""
# Sort by descending frequency, breaking ties by ascending node number
for (entity, freq) in sorted(
    freqList.items(),
    key=lambda x: (-x[1], x[0])
)[0:20]:
    md += f"{mdEsc(F.ename.v(entity))} | {freq}\n"

A.dm(md)


entity | frequency
:--- | ---:
Peter Ernst, Count of Mansfeld | 1047
Maurice of Nassau | 459
Ambrogio Spinola | 400
Giovanni Battista Pasini | 355
Johan van Oldenbarnevelt | 354
James I, king of England and Scotland | 336
Dudley Carleton, Viscount of Dorchester | 303
Frederick V of the Palatinate | 234
Christian the Younger Duke of Brunswick-Lüneburg | 213
Philip III | 211
Charles Emmanuel I, Duke of Savoy | 210
Albert of Austria and Isabella Clara Eugenia of Spain | 181
Louis XIII | 178
Bernardino Rota | 163
Johann Tserclaes, Count of Tilly | 154
Johan Seghers van Yeghem, Lord of Wassenhoven | 153
De Bausse (Beausse) | 148
Frederick-Henry of Nassau | 145
Benjamin Aubery du Maurier | 141
Christian IV of Denmark | 123
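
The sort key `(-x[1], x[0])` used for this list orders entities by descending frequency and breaks ties by ascending node number. The sketch below shows the effect on a toy counter with made-up node numbers and counts (not corpus data), and contrasts it with `Counter.most_common()`, which keeps tied items in the order they were first inserted.

```python
import collections

# Hypothetical toy counts keyed by node number (not real corpus data)
freq = collections.Counter()
freq[30] = 5
freq[10] = 5
freq[20] = 9

# Descending frequency, ties broken by ascending node number
by_key = sorted(freq.items(), key=lambda x: (-x[1], x[0]))

# Descending frequency, ties left in insertion order
by_most_common = freq.most_common()

print(by_key)
print(by_most_common)
```

With the explicit key the result does not depend on the order in which the entities were counted, which keeps the table reproducible.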


# Work in progress!

These are the bare beginnings of the tutorial; more is to come.