<img align="right" src="images/tf.png" width="200"/>
<img align="right" src="images/huc.png" width="200"/>
<img align="right" src="images/logo.png" width="200"/>

You might want to consider the [start](start.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Volume support

Text-Fabric 9.0.0 introduces volume support.
The [documentation](https://annotation.github.io/text-fabric/tf/about/volumes.html)
explains what volumes are and why you might want them.

In this tutorial we show the practical side:
how to *extract volumes* from works and *collect* several *volumes* into *collections*.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
from tf.app import use
from tf.fabric import Fabric
# from tf.volumes import extract, collect
# from tf.core.helpers import unexpanduser as ux

In [3]:
GH = os.path.expanduser("~/github")
BH = f"{GH}/clariah/wp6-missieven"
VERSION = "1.0"
SOURCE = f"{BH}/tf/{VERSION}"
TARGET = f"{BH}/tf/{VERSION}/_local"

# Work, volumes, and collections

We use the General Missives as our *work*.

The *volumes* of a work are its top-level sections.

We can bundle volumes into *collections*.

What this chunking of a work means in Text-Fabric is explained in the
[docs](https://annotation.github.io/text-fabric/tf/about/volumes.html).
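
The relation between the three notions can be sketched with plain Python data, without Text-Fabric. This is only a toy model under stated assumptions: the words, the section labels, and the helper names `split_into_volumes` and `bundle` are made up for illustration and are not part of the Text-Fabric API.

```python
# Hypothetical toy work: each word is tagged with its top-level section.
work = [("1", "w1"), ("1", "w2"), ("2", "w3"), ("3", "w4"), ("3", "w5")]


def split_into_volumes(work):
    """Group the words of a work by top-level section: one volume per section."""
    volumes = {}
    for (section, word) in work:
        volumes.setdefault(section, []).append(word)
    return volumes


def bundle(volumes, names):
    """Concatenate the chosen volumes, in the given order, into one collection."""
    return [word for name in names for word in volumes[name]]


volumes = split_into_volumes(work)
collection = bundle(volumes, ("1", "3"))
print(volumes)
print(collection)
```

A real volume is a full Text-Fabric dataset with its own nodes and features, not just a list of words; the sketch only shows the partitioning and bundling idea.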

## Load the work

We load the corpus in the usual way:

In [23]:
Aw = use("clariah/wp6-missieven")

Note: we are going to load several volumes and collections too, so instead of storing the
handle to the API in a variable with the name `A`, we choose one with the name `Aw`.
For the same reason, we do not use the `hoist=globals()` argument, so that we do not
pollute our namespace.

## Extract volumes

We can now extract volumes:

In [32]:
Aw.extract(overwrite=True, show=True)

1                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/1
2                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/2
3                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/3
4                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/4
5                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/5
6                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/6
7                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/7
8                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/8
9                    (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/9
10                   (new) @ ~/text-fabric-data/github/clariah/wp6-missieven/tf/1.0/_local/10
11                   (new) @ ~/text-fabric-data/github/clariah/wp6-mi

## Load a single volume

In [33]:
A3 = use("clariah/wp6-missieven", volume=3)

We see it reported that a single volume has been loaded instead of the whole work.

The volume info can be obtained separately by reading the attribute `volumeInfo`:

In [34]:
print(A3.volumeInfo)

3


## Generated features

When volumes are created, some extra features are generated, which have to do with the relation
between the original work and the volume, and what happens at the boundaries of volumes.

In [35]:
for (feat, info) in A3.isLoaded("owork ointerfrom ointerto", pretty=False).items():
    print(f"\t{feat}: {info['meta']['description']}")

	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges


### owork

Note that each volume has an extra feature: `owork`. Its value for each node in a volume dataset
is the corresponding node in the *original work* from which the volume is taken.

If you use the volume to compute annotations,
and you want to publish these annotations against the original work,
the feature `owork` provides the necessary information to do so.

Suppose `annotvx` is a dict that maps some nodes in volume `x` to interesting values.
You can then apply them to the original work as follows:

``` python

{F.owork.v(n): value for (n, value) in annotvx.items()}
```
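
As a minimal, self-contained sketch of this mapping, the snippet below replaces `F.owork.v` with a plain dictionary lookup. The node numbers, the annotation values, and the helper name `to_work` are made up for illustration; in a real session you would pass `F.owork.v` of the loaded volume as the lookup function.

```python
# Hypothetical stand-in for F.owork.v: volume node -> work node.
owork = {1: 1001, 2: 1002, 3: 1003}

# Annotations computed on the volume, keyed by volume node.
annotvx = {1: "person", 3: "place"}


def to_work(annotations, lookup):
    """Re-key annotations from volume nodes to work nodes."""
    return {lookup(n): value for (n, value) in annotations.items()}


annotw = to_work(annotvx, owork.get)
print(annotw)
```

The result is keyed by work nodes, so it can be published against the original work.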

`owork` has another important function: when collecting volumes, we may encounter nodes in different volumes
that come from a single node in the work. We want to *merge* these nodes in the collected work,
and `owork` provides the information needed to do that.
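
The idea of finding merge candidates can be sketched as follows. This illustrates the principle only and is not how Text-Fabric implements collecting; the volume names, node numbers, and the helper `group_by_work_node` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical data: per volume, a mapping volume node -> work node,
# mimicking what the feature owork stores.
owork_by_volume = {
    "1": {10: 500, 11: 501},
    "2": {10: 501, 11: 502},
}


def group_by_work_node(owork_by_volume):
    """Group (volume, node) pairs by the work node they come from."""
    groups = defaultdict(list)
    for (volume, mapping) in owork_by_volume.items():
        for (node, work_node) in mapping.items():
            groups[work_node].append((volume, node))
    return dict(groups)


groups = group_by_work_node(owork_by_volume)

# Work nodes with more than one member are candidates for merging.
to_merge = {w: members for (w, members) in groups.items() if len(members) > 1}
print(to_merge)
```

Here work node 501 occurs in both volumes, so its two volume nodes would end up as one node in the collected work.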

### ointerto, ointerfrom

Each volume also has the features `ointerto` and `ointerfrom`.

They store information that spans different volumes:
edges from nodes in one volume to nodes in another volume.
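
To make the notion of an inter-volume edge concrete, here is a small sketch that separates edges that stay within a volume from edges that cross a volume boundary. It uses made-up toy data and a hypothetical helper `split_edges`; it does not reproduce the value format that Text-Fabric uses in `ointerto` and `ointerfrom`.

```python
# Hypothetical toy data: the volume each work node belongs to,
# and some edges between work nodes.
volume_of = {1: "1", 2: "1", 3: "2", 4: "2"}
edges = [(1, 2), (2, 3), (4, 1)]


def split_edges(edges, volume_of):
    """Separate edges within one volume from edges that cross volumes."""
    intra = []
    inter = []
    for (src, dst) in edges:
        if volume_of[src] == volume_of[dst]:
            intra.append((src, dst))
        else:
            inter.append((src, dst))
    return (intra, inter)


(intra, inter) = split_edges(edges, volume_of)
print(intra)
print(inter)
```

Edges of the second kind cannot be represented inside a single volume, which is why they need separate bookkeeping.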

## Make collections of volumes

We can collect volumes into new works by means of the `collect()` method on `Aw`.

We define three collections out of the volumes of the General Missives:

In [36]:
COLLECTIONS = dict(
    middle=(8,),
    beginning=(1, 2),
    end=(13, 14),
)

In [37]:
for (name, volumes) in COLLECTIONS.items():
    Aw.collect(
        volumes,
        name,
        overwrite=True,
    )
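
Before collecting, it can be useful to check that every volume named in a collection has actually been extracted. The sketch below does that with plain Python. The set `available` is an assumption (volumes 1 to 14), and `missing_volumes` is a hypothetical helper, not a Text-Fabric function.

```python
# The collections defined in this tutorial.
COLLECTIONS = dict(
    middle=(8,),
    beginning=(1, 2),
    end=(13, 14),
)

# Assumption: the volumes that have been extracted are numbered 1 to 14.
available = set(range(1, 15))


def missing_volumes(collections, available):
    """Return, per collection, the requested volumes that are not available."""
    return {
        name: tuple(v for v in volumes if v not in available)
        for (name, volumes) in collections.items()
        if any(v not in available for v in volumes)
    }


problems = missing_volumes(COLLECTIONS, available)
print(problems or "all requested volumes are available")
```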

## Load collection

We can load the collection in the same way as a volume, but now using `collection=`:

In [38]:
Ab = use("clariah/wp6-missieven", collection="beginning")

Which volumes have we got?

In [40]:
for b in Ab.api.F.otype.s("volume"):
    print(Ab.api.T.sectionFromNode(b)[0])

1
2


There are more ways to work with volumes and collections, and there is more complexity
that is dealt with behind the scenes.
To see that at work, consult the
[volume tutorial of the Hebrew Bible](https://nbviewer.org/github/ETCBC/bhsa/blob/master/tutorial/volumes.ipynb).

# All steps

* **start** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[entities](entities.ipynb)** use results of third-party NER (named entity recognition)
* **volumes** work with selected books only

CC-BY Dirk Roorda