<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](search.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb),
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb), or the
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Volume support

Text-Fabric 9.0.0 introduces volume support.
Read
[about volumes](https://annotation.github.io/text-fabric/tf/about/volumes.html)
to learn what they are and why you might want them.

In this tutorial we show the practical side:
how to *extract volumes* from works and *collect* several *volumes* into *collections*.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
from tf.app import use
from tf.fabric import Fabric
from tf.volumes import extract, collect
from tf.core.helpers import unexpanduser as ux

In [3]:
GH = os.path.expanduser("~/github")
BH = f"{GH}/clariah/wp6-missieven"
VERSION = "1.0"
SOURCE = f"{BH}/tf/{VERSION}"
TARGET = f"{BH}/tf/{VERSION}/_local"

# Work and volumes

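We use the corpus in `clariah/wp6-missieven` as *work*.
The *volumes* of a work are lists of its top-level sections.
Volumes may have a name.
In this corpus the top-level sections are themselves called volumes; they are numbered 1 to 14.

We define three named volumes, each a selection of top-level sections of the work: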

In [4]:
VOLUMES = dict(
    middle=(8,),
    beginning=(1, 2),
    end=(13, 14),
)
COLLECTION = "selected"

# Volume support

We can work with volumes through several TF APIs:

* the usual, high-level API, using `A = use(work)`;
* the basic, low-level API, using `TF = Fabric(locations, modules)`;
* the plain functions `extract()` and `collect()`.

In this tutorial we use the high-level API.

## Load the work

We load the work through the high-level (advanced) API:

In [5]:
Aw = use("clariah/wp6-missieven:clone", checkout="clone")

This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

44 features found and 0 ignored


We can now extract volumes by using the `extract()` method on the `app` object
which is held in the variable `Aw`.

Note: we are going to load several volumes and collections too, so instead of storing the
handle to the API in a variable with the name `A`, we choose one with the name `Aw`.

## Extract volumes

In [6]:
volumes = Aw.extract(VOLUMES, overwrite=True)

  0.00s Check volumes ...
   |   Volume middle exists and will be recreated
   |   Volume beginning exists and will be recreated
   |   Volume end exists and will be recreated
   |   Work consists of 14 volumes:
   |   volume 1                   : with   352727 slots
   |   volume 2                   : with   400901 slots
   |   volume 3                   : with   439609 slots
   |   volume 4                   : with   366207 slots
   |   volume 5                   : with   412436 slots
   |   volume 6                   : with   434227 slots
   |   volume 7                   : with   393559 slots
   |   volume 8                   : with   125206 slots
   |   volume 9                   : with   464578 slots
   |   volume 10                  : with   637655 slots
   |   volume 11                  : with   499612 slots
   |   volume 12                  : with   342699 slots
   |   volume 13                  : with   371337 slots
   |   volume 14                  : with   736607 slots
  0.
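
The slot counts in the report above give a quick estimate of how large each extracted volume will be.
The following is a minimal sketch in plain Python, not part of the Text-Fabric API:
the helper `slots_per_volume` is a hypothetical name, the numbers are copied from the report,
and it assumes that a volume contains exactly the slots of its top-level sections.

```python
# Slot counts per top-level section, copied from the extract() report above.
SLOTS = {
    1: 352727, 2: 400901, 3: 439609, 4: 366207, 5: 412436,
    6: 434227, 7: 393559, 8: 125206, 9: 464578, 10: 637655,
    11: 499612, 12: 342699, 13: 371337, 14: 736607,
}

# Same specification as in the VOLUMES cell of this tutorial.
VOLUMES = dict(middle=(8,), beginning=(1, 2), end=(13, 14))


def slots_per_volume(volumes, slots):
    """Hypothetical helper: sum the slot counts of the sections in each volume."""
    return {
        name: sum(slots[section] for section in sections)
        for (name, sections) in volumes.items()
    }


sizes = slots_per_volume(VOLUMES, SLOTS)
for (name, size) in sizes.items():
    print(f"{name:<9}: {size:>8} slots")
```

Under that assumption, `middle` is by far the smallest of the three volumes and `end` the largest.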

## Inspect the volumes

The `extract()` method returns basic information about the volumes:
their location on disk and whether they have been newly created.

In [7]:
if volumes:
    for (name, info) in volumes.items():
        loc = info["location"]
        new = "(new)     " if info["new"] else "(existing)"
        print(f"volume {name:<7}: {new} at {ux(loc)}")
else:
    print(volumes)

volume middle : (new)      at ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
volume beginning: (new)      at ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
volume end    : (new)      at ~/github/clariah/wp6-missieven/tf/1.0/_local/end
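
The reported locations appear to be the `_local` directory of the work plus the name of the volume.
As a rough illustration, the sketch below rebuilds those paths with the standard library only.
Both `volume_location` and `shorten` are hypothetical helpers, the latter a stand-in for `ux`;
the layout is inferred from the output above, not taken from the Text-Fabric documentation.

```python
import os

# Values from the setup cell of this tutorial.
GH = os.path.expanduser("~/github")
BH = f"{GH}/clariah/wp6-missieven"
VERSION = "1.0"
TARGET = f"{BH}/tf/{VERSION}/_local"


def shorten(path):
    """Stdlib stand-in for `ux`: replace the home directory by `~`."""
    home = os.path.expanduser("~")
    return "~" + path[len(home):] if path.startswith(home) else path


def volume_location(name, target=TARGET):
    """Hypothetical helper: the directory where volume `name` is expected."""
    return f"{target}/{name}"


# A field width of 9 fits the longest name, "beginning".
for name in ("middle", "beginning", "end"):
    print(f"volume {name:<9}: {shorten(volume_location(name))}")
```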


## Load single volumes

We load the volumes separately.
For each volume we get a handle, which we store in a dictionary `As`, keyed by its name.

In [8]:
As = {}

for name in volumes:
    As[name] = use("clariah/wp6-missieven:clone", checkout="clone", volume=name)

This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

46 features found and 0 ignored
   |     0.04s T otype                from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.27s T oslots               from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.15s T transo               from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.03s T n                    from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.38s T trans                from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.02s T transn               from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.32s T punc                 from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.01s T puncn                from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0.00s T title                from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle
   |     0

This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

46 features found and 0 ignored
   |     0.18s T otype                from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     1.69s T oslots               from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     1.81s T transo               from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     0.17s T n                    from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     2.37s T trans                from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     0.22s T transn               from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     1.93s T punc                 from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     0.19s T puncn                from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning
   |     0.00s T title                from ~/github/clariah/wp6-missieven/tf/1.0/

This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

46 features found and 0 ignored
   |     0.26s T otype                from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     2.54s T oslots               from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     0.76s T transo               from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     0.27s T n                    from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     3.22s T trans                from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     0.01s T transn               from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     2.80s T punc                 from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     0.01s T puncn                from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     0.00s T title                from ~/github/clariah/wp6-missieven/tf/1.0/_local/end
   |     2.55s T transr              

We see it reported that single volumes have been loaded instead of the whole work.

The volume info can be obtained separately by reading the attribute `volumeInfo`,
either on the `A` or on the `TF` object:

In [9]:
for name in volumes:
    print(As[name].volumeInfo)

middle:8
beginning:1-2
end:13-14


## Generated features

When volumes are created, some extra features are generated, which have to do with the relation
between the original work and the volume, and what happens at the boundaries of volumes.

In [10]:
for name in volumes:
    print(name)
    for (feat, info) in As[name].isLoaded("owork ointerfrom ointerto", pretty=False).items():
        print(f"\t{feat}: {info['meta']['description']}")

middle
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges
beginning
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges
end
	owork: mapping from nodes in the volume to nodes in the work
	ointerfrom: all outgoing inter-volume edges
	ointerto: all incoming inter-volume edges


### owork

Note that each volume has an extra feature: `owork`. Its value for each node in a volume dataset
is the corresponding node in the *original work* from which the volume is taken.

If you use the volume to compute annotations,
and you want to publish these annotations against the original work,
the feature `owork` provides the necessary information to do so.

Suppose `annotvx` is a dict, mapping some nodes in the volume `x` to interesting values,
then you apply them to the original work as follows:

``` python
{F.owork.v(n): value for (n, value) in annotvx.items()}
```
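
To make the mapping concrete without loading a corpus, here is a small sketch in which a plain dict
stands in for the `owork` feature. The node numbers, the annotation values and the helper `to_work`
are made up for illustration; in a real volume you would look up the work node with `F.owork.v(n)`.

```python
# Toy stand-in for the `owork` feature of a volume: volume node -> work node.
owork = {1: 1001, 2: 1002, 3: 1003, 4: 2050}

# Hypothetical annotations computed on the volume, keyed by volume node.
annotvx = {2: "greeting", 4: "signature"}


def to_work(annotations, owork_map):
    """Re-key annotations from volume nodes to the corresponding work nodes."""
    return {owork_map[n]: value for (n, value) in annotations.items()}


annotw = to_work(annotvx, owork)
print(annotw)
```

The resulting dict is keyed by nodes of the original work, so it can be published against the work itself.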

There is another important function of `owork`: when collecting volumes, we may encounter nodes in different volumes
that come from a single node in the work. We want to *merge* these nodes in the collected work.
The feature `owork` tells us which nodes belong together.
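
The idea of merging can be sketched as grouping volume nodes by their work node.
This is only an illustration on toy data, not the way Text-Fabric implements `collect()`;
the node numbers and the helper `merge_candidates` are hypothetical.

```python
from collections import defaultdict

# Toy data: (volume name, volume node) -> work node, as `owork` would give.
# Work node 9000 is imagined to be present in two volumes.
owork = {
    ("beginning", 10): 500,
    ("beginning", 11): 9000,
    ("end", 1): 9000,
    ("end", 2): 700,
}


def merge_candidates(owork_map):
    """Group volume nodes by work node; keep groups with more than one member."""
    groups = defaultdict(list)
    for (vol_node, work_node) in owork_map.items():
        groups[work_node].append(vol_node)
    return {w: nodes for (w, nodes) in groups.items() if len(nodes) > 1}


candidates = merge_candidates(owork)
print(candidates)
```

Every group that remains holds nodes from several volumes that go back to one and the same node in the work.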

### ointerto, ointerfrom

Each volume also has the features `ointerto` and `ointerfrom`.

They are used to store information that spans different volumes:
edges from nodes in one volume to nodes in another volume.
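
A toy example may help to see which edges end up in these features.
The sketch below assumes a made-up assignment of work nodes to volumes and a made-up list of edges;
the helper names are hypothetical and the code does not reflect how Text-Fabric stores the data.

```python
# Toy data: the volume in which each work node ends up (hypothetical nodes).
volume_of = {1: "beginning", 2: "beginning", 3: "middle", 4: "end"}

# Toy edges of some edge feature in the work, as (from node, to node).
edges = [(1, 2), (2, 3), (4, 3)]


def split_edges(edges, volume_of):
    """Separate edges inside one volume from edges that cross volumes."""
    intra, inter = [], []
    for (f, t) in edges:
        (intra if volume_of[f] == volume_of[t] else inter).append((f, t))
    return intra, inter


def outgoing(inter, volume_of, vol):
    """Inter-volume edges that start in `vol` (the role of `ointerfrom`)."""
    return [(f, t) for (f, t) in inter if volume_of[f] == vol]


def incoming(inter, volume_of, vol):
    """Inter-volume edges that arrive in `vol` (the role of `ointerto`)."""
    return [(f, t) for (f, t) in inter if volume_of[t] == vol]


intra, inter = split_edges(edges, volume_of)
print("inside one volume:", intra)
print("between volumes  :", inter)
print("out of beginning :", outgoing(inter, volume_of, "beginning"))
print("into middle      :", incoming(inter, volume_of, "middle"))
```

Edges inside one volume can stay in the ordinary edge features of that volume; only the crossing ones need separate treatment.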

## Make collections of volumes

We can collect volumes into new works by means of the `collect()` method on `Aw`.
Let's collect all volumes just created.

In [11]:
Aw.collect(
    tuple(volumes),
    COLLECTION,
    overwrite=True,
)

Collection selected exists and will be recreated
  0.00s Loading volume middle                                                       from ~/github/clariah/wp6-missieven/tf/1.0/_local/middle ...
This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

46 features found and 0 ignored
  0.09s All features loaded/computed - for details use TF.isLoaded()
  0.09s Feature overview: 43 for nodes; 2 for edges; 1 configs; 9 computed
  0.02s All additional features loaded - for details use TF.isLoaded()
  0.12s Loading volume beginning                                                    from ~/github/clariah/wp6-missieven/tf/1.0/_local/beginning ...
This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

46 features found and 0 ignored
  0.42s All features loaded/computed - for details use TF.isLoaded()
  0.43s Feature overview: 43 for nodes; 2 for edges; 1 configs; 9 computed
  0.13s All additional featur

True

## Load collection

We can load the collection in the same way as a volume, but now using `collection=`:

In [12]:
Ac = use("clariah/wp6-missieven:clone", checkout="clone", collection=COLLECTION)

This is Text-Fabric 9.4.2
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

47 features found and 0 ignored
   |     0.47s T otype                from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     4.65s T oslots               from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     2.73s T transo               from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     0.47s T n                    from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     5.85s T trans                from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     0.24s T transn               from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     5.06s T punc                 from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     0.20s T puncn                from ~/github/clariah/wp6-missieven/tf/1.0/_local/selected
   |     0.00s T title                from ~/github/clariah/wp6-missieven/tf/1.0/_local/s

Which volumes have we got?

In [13]:
Fc = Ac.api.F
Tc = Ac.api.T

for b in Fc.otype.s("volume"):
    print(Tc.sectionFromNode(b)[0])

8
1
2
13
14
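
The sections are listed as 8, 1, 2, 13, 14, which matches the order in which the volumes were passed to `collect()`
(`middle`, `beginning`, `end`) rather than their order in the original work.
The plain-Python sketch below reproduces that order from the volume specification;
`sections_in_order` is a hypothetical helper, and whether Text-Fabric always keeps this order is an assumption based on the output above.

```python
# Same specification as in the VOLUMES cell of this tutorial.
VOLUMES = dict(middle=(8,), beginning=(1, 2), end=(13, 14))


def sections_in_order(volumes, names):
    """List the top-level sections in the order of the given volume names."""
    return [section for name in names for section in volumes[name]]


# tuple(VOLUMES) keeps the insertion order of the dict: middle, beginning, end.
order = sections_in_order(VOLUMES, tuple(VOLUMES))
print(order)  # [8, 1, 2, 13, 14]
```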


There are more ways to work with volumes and collections, and more complexity
is dealt with behind the scenes.
To see that at work, consult the
[volume tutorial of the Hebrew Bible](https://nbviewer.org/github/ETCBC/bhsa/blob/master/tutorial/volumes.ipynb).

# All steps

* **start** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **volumes** work with selected volumes only

CC-BY Dirk Roorda