<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](start.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Sharing data features

This tutorial is a companion to the Text-Fabric
[documentation on data sharing](https://annotation.github.io/text-fabric/tf/about/datasharing.html).

## Explore additional data
The ETCBC has a few other repositories with data that work in conjunction with the BHSA data.
One of them you have already seen:
[phono](https://github.com/ETCBC/phono),
for phonetic transcriptions.
There is also
[parallels](https://github.com/ETCBC/parallels)
for detecting parallel passages,
and
[valence](https://github.com/ETCBC/valence)
for studying patterns around verbs that determine their meanings.

## Make your own data
If you study the additional data, you can observe how that data is created and also
how it is turned into a Text-Fabric data module.
The last step is incredibly easy. You can write out any Python dictionary whose keys are numbers
and whose values are strings or numbers as a Text-Fabric feature.
When you are creating data, you have already constructed those dictionaries, so writing
them out is just one method call.
See for example how the
[flowchart](https://nbviewer.jupyter.org/github/etcbc/valence/blob/master/programs/flowchart.ipynb#Add-sense-feature-to-valence-module)
notebook in valence writes out verb sense data.
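As a rough sketch of that one method call: Text-Fabric's `save()` method takes a dictionary of node features and a dictionary of metadata. The helper names, the feature name `mysense`, the description, and the output location below are all hypothetical placeholders, not the ones used by the valence notebook.

```python
def build_save_args(name, data, description):
    """Prepare keyword arguments for TF.save() from one feature dictionary.

    `data` maps node numbers to values that are all strings or all integers.
    """
    if not data or not all(isinstance(node, int) for node in data):
        raise ValueError("feature data must be a non-empty dict keyed by node numbers")
    values = list(data.values())
    if all(isinstance(v, int) for v in values):
        value_type = "int"
    elif all(isinstance(v, str) for v in values):
        value_type = "str"
    else:
        raise ValueError("values must be all strings or all integers")
    return {
        "nodeFeatures": {name: data},
        "metaData": {name: {"valueType": value_type, "description": description}},
    }


def save_feature(TF, name, data, description, location, module):
    """Write one node feature; TF is a Text-Fabric object such as A.TF."""
    args = build_save_args(name, data, description)
    # location and module are assumed to name the target directory and subdirectory
    TF.save(location=location, module=module, **args)
```

After loading a corpus with `use()`, a call could look like `save_feature(A.TF, "mysense", {1: "d-", 2: "--"}, "toy example", "~/my-data/tf", "c")`, where every argument value is made up for illustration.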

## Share your new data
You can then easily share your new features on GitHub, so that your colleagues everywhere
can try them out for themselves.

Here is how you draw in other data, for example:

* [etcbc/valence/tf](https://github.com/etcbc/valence) :
  the results of the *verbal valence* work of Janet Dyk in the SYNVAR project;
* [etcbc/lingo/heads/tf](https://github.com/etcbc/lingo/tree/master/heads) :
  head words for phrases, work done by Cody Kingham;
* [ch-jensen/participants/actor/tf](https://github.com/ch-jensen/participants) :
  participant analysis in progress by Christian Høygaard-Jensen;
* [cmerwich/bh-reference-system/tf](https://github.com/cmerwich/bh-reference-system):
  participant analysis in progress by Christiaan Erwich;
* or whatever you have in the making!

You can add such data on the fly, by passing a `mod={org}/{repo}/{path}` parameter,
or a bunch of them separated by commas, or packed in a list or tuple.

If the data is there, it will be auto-downloaded and stored on your machine.
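To make the accepted forms of `mod` concrete, here is a small sketch showing that a comma-separated string, a list, and a tuple all denote the same modules. The helper `normalize_mod` is hypothetical and is not part of Text-Fabric; it only illustrates the idea.

```python
def normalize_mod(mod):
    """Turn a mod argument (string, list, tuple, or None) into a list of module specs."""
    if mod is None:
        return []
    # A string may hold several specs separated by commas
    parts = mod.split(",") if isinstance(mod, str) else list(mod)
    # Drop surrounding whitespace and empty entries
    return [part.strip() for part in parts if part.strip()]


as_string = normalize_mod("etcbc/lingo/heads/tf,etcbc/valence/tf")
as_tuple = normalize_mod(("etcbc/lingo/heads/tf", "etcbc/valence/tf"))
print(as_string == as_tuple)
```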

Let's do it.

In [1]:
%load_ext autoreload
%autoreload 2

# Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are
explained in the [start tutorial](start.ipynb).

In [2]:
from tf.app import use

First we are going to include the work of Cody Kingham on
[heads of phrases](https://nbviewer.org/github/ETCBC/lingo/blob/master/heads/Heads2TF.ipynb) and some earlier work
by Janet Dyk and Dirk Roorda on [verbal valence](https://github.com/etcbc/valence).

In [4]:
A = use('ETCBC/bhsa', mod="etcbc/lingo/heads/tf,etcbc/valence/tf", hoist=globals())

You see that the features from the **etcbc/valence/tf** and **etcbc/lingo/heads/tf** modules have been added to the mix.

## ETCBC Valence

Click the triangle before **etcbc/valence/tf** to see what features have been contributed.

Note that edge features are in **_bold italic_**.

Let's find out more about *sense*.

You can start by clicking the triangle after "sense str" above.
It tells you where the feature comes from, and it shows you the context where it has been constructed.
You might go there to see additional documentation.

But we can also dive directly into its data:

In [5]:
F.sense.freqList()

(('--', 17941),
 ('d-', 9975),
 ('-p', 6537),
 ('-i', 3604),
 ('-c', 3231),
 ('dp', 1899),
 ('dc', 1002),
 ('di', 918),
 ('l.', 876),
 ('i.', 630),
 ('n.', 532),
 ('-b', 64),
 ('db', 61),
 ('c.', 57),
 ('k.', 54))
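The output is a tuple of `(value, frequency)` pairs, most frequent first. As an illustration of what such a frequency list amounts to (this is not Text-Fabric's own implementation), the same shape can be computed from a plain dictionary. The data in `toy_sense` is made up.

```python
from collections import Counter


def freq_list(feature_data):
    """Return (value, count) pairs, highest count first, ties ordered by value."""
    counts = Counter(feature_data.values())
    return tuple(sorted(counts.items(), key=lambda item: (-item[1], item[0])))


# Hypothetical node -> value mapping, standing in for a real feature
toy_sense = {1: "--", 2: "d-", 3: "--", 4: "-p", 5: "--", 6: "d-"}
print(freq_list(toy_sense))
```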

Which nodes have a sense feature?

In [6]:
{F.otype.v(n) for n in N.walk() if F.sense.v(n)}

{'word'}

In [7]:
results = A.search(
    """
word sense
"""
)

  0.14s 47381 results


Let's show some of the rarer sense values:

In [8]:
results = A.search(
    """
word sense=k.
"""
)

  0.15s 54 results


In [9]:
A.table(results, end=5)

n,p,word
1,Genesis 4:17,יִּקְרָא֙
2,Genesis 13:16,שַׂמְתִּ֥י
3,Genesis 32:13,שַׂמְתִּ֤י
4,Genesis 34:31,יַעֲשֶׂ֖ה
5,Genesis 48:20,יְשִֽׂמְךָ֣


If we do a pretty display, the `sense` feature shows up.

In [10]:
A.show(results, start=1, end=1, withNodes=True)

## Lingo heads
If you click the triangle before **etcbc/lingo/heads/tf** you see what features it contributes.
Unfortunately, the authors have not provided a description of this feature, but if you click
on the triangle after *heads* none, you see where the feature comes from and who has made it.

Moreover, the fact that *heads* is in italics makes clear that it is an edge feature.

Because `heads` is an edge feature, we cannot directly make it visible in pretty displays, but we can use it in queries.

We also want to make the feature `sense` visible, so we mention the feature in the query, without restricting the results.

In [11]:
results = A.search(
    """
book book=Genesis
  chapter chapter=1
    clause
      phrase
      -heads> word sense*
"""
)

  0.40s 402 results
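Each result is a tuple of nodes, one for each object line in the template: here a book, a chapter, a clause, a phrase, and a word. A minimal sketch of how such tuples could be post-processed to collect the head words for each phrase; the helper `heads_by_phrase` is hypothetical and the node numbers in `toy_results` are made up.

```python
from collections import defaultdict


def heads_by_phrase(results):
    """Group the word node of each result under its phrase node."""
    grouped = defaultdict(list)
    for _book, _chapter, _clause, phrase, word in results:
        grouped[phrase].append(word)
    return dict(grouped)


# Made-up tuples in the shape (book, chapter, clause, phrase, word)
toy_results = [(1, 2, 3, 10, 100), (1, 2, 3, 10, 101), (1, 2, 4, 11, 102)]
print(heads_by_phrase(toy_results))
```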


In [12]:
A.show(results, start=1, end=2)

Note how the words that are **_heads_** of their phrases are highlighted within their phrases.

# Participants

Now we are going to add another promising module, provided by Christian Canu Højgaard, from this repo:
[participants](https://github.com/ch-jensen/participants).

Let's do it in the straightforward way:

In [13]:
A = use(
    'ETCBC/bhsa',
    mod=(
        "ETCBC/lingo/heads/tf",
        "ETCBC/valence/tf",
        "ch-jensen/participants/actor/tf"
    ),
    hoist=globals(),
)

The requested data is not available offline
	~/text-fabric-data/github/ch-jensen/participants/actor/tf/2021 not found
No directory actor/tf/2021 in #9671910a329c069cfd3d366526ea816de57666dcWill try something else
	Failed

No directory actor/tf/2021 in #9671910a329c069cfd3d366526ea816de57666dc	Failed

There was an error loading TF-app etcbc/bhsa from ~/text-fabric-data/github/etcbc/bhsa/app
AttributeError("'TfApp' object has no attribute 'TF'")
Traceback (most recent call last):
  File "/Users/me/github/annotation/text-fabric/tf/advanced/app.py", line 542, in findApp
    app = appClass(
  File "/Users/me/text-fabric-data/github/etcbc/bhsa/app/app.py", line 6, in __init__
    super().__init__(*args, **kwargs)
  File "/Users/me/github/annotation/text-fabric/tf/advanced/app.py", line 178, in __init__
    volumesApi(self)
  File "/Users/me/github/annotation/text-fabric/tf/advanced/volumes.py", line 39, in volumesApi
    TF = app.TF
AttributeError: 'TfApp' object has no attribute 'TF'
Text-Fabric is not loaded


The features are not there!

If we look at this repo on GitHub, we see that
[actor/tf](https://github.com/ch-jensen/participants/tree/master/actor/tf)
contains only the directory `c`. Christian has produced his features against version `c` of the BHSA.
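If you have a local clone of such a repo, or data downloaded earlier, you can make the same check from Python. The sketch below assumes that each data version lives in its own subdirectory of the `tf` directory, as the path in the error message suggests. The helper `available_versions` is hypothetical.

```python
from pathlib import Path


def available_versions(tf_dir):
    """List the version subdirectories of a tf directory, or [] if it does not exist."""
    base = Path(tf_dir).expanduser()
    if not base.is_dir():
        return []
    return sorted(entry.name for entry in base.iterdir() if entry.is_dir())
```

For example, `available_versions("~/github/ch-jensen/participants/actor/tf")` would list the versions in a clone at that (assumed) location.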

Ok, then we go back, and run our command for version `c`.

In [14]:
A = use(
    'ETCBC/bhsa',
    version="c",
    mod=(
        "ETCBC/lingo/heads/tf",
        "ETCBC/valence/tf",
        "ch-jensen/participants/actor/tf"
    ),
    hoist=globals(),
)

While this succeeded, there are scenarios where you run into more trouble.
For example, you decide that you really, really need the BHSA data as in release 1.7.1.

Then you discover that this does not work:

```
A = use(
    'etcbc/bhsa',
    version="c",
    checkout="v1.7.1",
    mod=("etcbc/lingo/heads/tf", "etcbc/valence/tf", "ch-jensen/participants/actor/tf"),
    hoist=globals(),
)
```

This fails because the BHSA invokes two standard modules, `etcbc/phono/tf` and `etcbc/parallels/tf`,
and their GitHub repos do not have a release `v1.7.1`.
You have to walk through their releases and find one with the right data version.
Having found them, you can load everything like this:

```
A = use(
    'etcbc/bhsa',
    version="c",
    checkout="v1.7.1",
    mod=(
        "etcbc/phono/tf:1.2",
        "etcbc/parallels/tf:v1.2",
        "etcbc/lingo/heads/tf",
        "etcbc/valence/tf",
        "ch-jensen/participants/actor/tf",
    ),
    hoist=globals(),
)
```
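Each entry in `mod` has the shape `{org}/{repo}/{path}`, optionally followed by `:` and a checkout specifier such as a release tag. The following sketch takes such a string apart. The helper `parse_module_ref` and the `ModuleRef` type are hypothetical and are not part of Text-Fabric.

```python
from typing import NamedTuple


class ModuleRef(NamedTuple):
    org: str
    repo: str
    path: str
    checkout: str  # empty string when no checkout is given


def parse_module_ref(ref):
    """Split 'org/repo/path:checkout' into its components."""
    location, _, checkout = ref.partition(":")
    parts = location.strip().split("/")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a module reference: {ref!r}")
    org, repo, *rest = parts
    return ModuleRef(org, repo, "/".join(rest), checkout.strip())


print(parse_module_ref("etcbc/parallels/tf:v1.2"))
print(parse_module_ref("etcbc/lingo/heads/tf"))
```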

## Semantic actors

Let's find out about *actor*.

Again, we can click on the triangles and see information about the features.
Christian has provided descriptions in the metadata of the features.

And we can look into the data itself.

In [15]:
fl = F.actor.freqList()
len(fl)

415

In [16]:
fl[0:10]

(('JHWH', 358),
 ('BN JFR>L', 205),
 ('>JC', 101),
 ('2sm"YOUSgmas"', 67),
 ('MCH', 60),
 ('>RY', 58),
 ('>TM', 45),
 ('>X "YOUSgmas"', 36),
 ('JFR>L', 35),
 ('KHN', 33))

Which nodes have an actor feature?

In [17]:
{F.otype.v(n) for n in N.walk() if F.actor.v(n)}

{'phrase_atom', 'subphrase'}

In [18]:
results = A.search(
    """
phrase_atom actor
"""
)

  0.08s 2062 results


Let's show some of the rarer actor values:

In [19]:
results = A.search(
    """
phrase_atom actor=KHN
"""
)

  0.10s 30 results


In [20]:
A.table(results)

n,p,phrase_atom
1,Leviticus 17:5,אֶל־הַכֹּהֵ֑ן
2,Leviticus 17:6,זָרַ֨ק
3,Leviticus 17:6,הַכֹּהֵ֤ן
4,Leviticus 17:6,הִקְטִ֣יר
5,Leviticus 19:22,כִפֶּר֩
6,Leviticus 19:22,הַכֹּהֵ֜ן
7,Leviticus 21:1,אֶל־הַכֹּהֲנִ֖ים
8,Leviticus 21:1,בְּנֵ֣י אַהֲרֹ֑ן
9,Leviticus 21:5,יִקְרְח֤וּ
10,Leviticus 21:5,יְגַלֵּ֑חוּ


In [21]:
A.show(results, start=1, end=1)

We see no highlights!
That is because phrase atoms are hidden by default. So let's unhide:

In [22]:
A.displaySetup(hiddenTypes="subphrase clause_atom sentence_atom half_verse")

The next calls to `show()` will work as if `hiddenTypes="subphrase clause_atom sentence_atom half_verse"` is passed to them. 
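The string passed here is a space-separated list of node types. A small sketch of how it could be derived rather than typed by hand, assuming (this is not confirmed by the notebook output) that the default hidden types are the four above plus `phrase_atom`. The helper `unhide` is hypothetical.

```python
def unhide(hidden_types, *to_show):
    """Remove the given node types from a space-separated list of hidden types."""
    return " ".join(t for t in hidden_types.split() if t not in to_show)


# Assumed default; check the settings of your own corpus
assumed_default = "subphrase phrase_atom clause_atom sentence_atom half_verse"
print(unhide(assumed_default, "phrase_atom"))
```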

In [23]:
A.show(results, start=1, end=1)

We make the feature `sense` from the valence module visible:

In [24]:
A.show(results, start=1, end=3, withNodes=True, extraFeatures="sense")

# All together!

Here is a query that shows results with all features.

In [25]:
results = A.search(
    """
book book=Leviticus
  phrase sense*
    phrase_atom actor=KHN
  -heads> word
"""
)

  0.39s 30 results


In [26]:
A.displaySetup(
    condensed=True,
    condenseType="verse",
    hiddenTypes="subphrase clause_atom sentence_atom half_verse",
)
A.show(results, start=8, end=8)
A.displaySetup()

## Exercise

See whether you can find the quote in the Easter egg that is in
`etcbc/lingo/easter/tf`!

# All steps

* **[start](start.ipynb)** your first step in mastering the Bible computationally
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **share** draw in other people's data and let them use yours
* **[export](export.ipynb)** export your dataset as an Emdros database
* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
* **[map](map.ipynb)** map somebody else's annotations to a new version of the corpus
* **[volumes](volumes.ipynb)** work with selected books only
* **[trees](trees.ipynb)** work with the BHSA data as syntax trees

CC-BY Dirk Roorda