<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](search.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Volume support

Text-Fabric 9.0.0 introduces volume support.
Read [about volumes](https://annotation.github.io/text-fabric/tf/about/volumes.html)
to learn what they are and why you might want them.

In this tutorial we show the practical side:
how to *extract volumes* from works and *collect* several *volumes* into *collections*.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
from tf.app import use
from tf.fabric import Fabric
from tf.volumes import extract, collect
from tf.core.helpers import unexpanduser as ux

In [3]:
GH = os.path.expanduser("~/github")
BH = f"{GH}/etcbc/bhsa"
VERSION = "2021"
SOURCE = f"{BH}/tf/{VERSION}"
TARGET = f"{BH}/tf/{VERSION}/_local"

# Work and volumes

We use the Hebrew Bible as *work*.
The *volumes* of a work are lists of its top-level sections.
Volumes may have a name.

We define three volumes out of the smallest books of the Bible:

In [4]:
VOLUMES = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Malachi", "Joel"),
    medium=("Ezra",),
)
COLLECTION = "prophets"

# Volume support

We can extract and collect volumes through several TF APIs:

* the usual, high-level API, using `A = use(work)`;
* the basic, low-level API, using `TF = Fabric(locations, modules)`;
* the plain functions `extract()` and `collect()`.

We show all three ways.

# High-level API `A = use()`

If we load the BHSA with the advanced API, like `A = use("bhsa", ...)`, we also get some standard modules,
such as `phono` for phonological transcription and `parallels` for cross-references between similar passages.

We see that when we split the BHSA into volumes we also get these features into the volumes.

## Load the work

We load the BHSA in the advanced way:

In [5]:
Aw = use("bhsa:clone", checkout="clone")

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

121 features found and 0 ignored


We check that the features of interest are loaded:

In [6]:
Aw.isLoaded(features="lex phono crossref")

crossref             edge (int) ~/github/etcbc/parallels/tf/2021
lex                  node (str) ~/github/etcbc/bhsa/tf/2021
phono                node (str) ~/github/etcbc/phono/tf/2021


We can now extract volumes by using the `extract()` method on the `app` object
which is held in the variable `Aw`.

Note: we are going to load several volumes and collections too, so instead of storing the
handle to the API in a variable with the name `A`, we choose one with the name `Aw`.

## Extract volumes

In [7]:
volumes = Aw.extract(VOLUMES, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28764 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20128 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14086 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

## Inspect the volumes

The `extract()` method returns basic information about the volumes:
their location on disk and whether they were newly created.

In [8]:
if volumes:
    for (name, info) in volumes.items():
        loc = info["location"]
        new = "(new)     " if info["new"] else "(existing)"
        print(f"volume {name:<7}: {new} at {ux(loc)}")
else:
    print(volumes)

volume tiny   : (new)      at ~/github/etcbc/bhsa/tf/2021/_local/tiny
volume small  : (new)      at ~/github/etcbc/bhsa/tf/2021/_local/small
volume medium : (new)      at ~/github/etcbc/bhsa/tf/2021/_local/medium


## Load single volumes

We load the volumes separately.
For each volume we get a handle, which we store in a dictionary `As`, keyed by its name.

In [11]:
As = {}

for name in volumes:
    As[name] = use("bhsa:clone", checkout="clone", version="2021", volume=name)

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


We see it reported that single volumes have been loaded instead of the whole work.

The volume info can be obtained separately by reading the attribute `volumeInfo`,
either on the `A` or on the `TF` object:

In [12]:
for name in volumes:
    print(As[name].volumeInfo)

tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
small:Malachi-Joel
medium:Ezra


## Generated features

When volumes are created, some extra features are generated, which have to do with the relation
between the original work and the volume, and what happens at the boundaries of volumes.

In [13]:
for name in volumes:
    print(name)
    for (feat, info) in As[name].isLoaded("owork ointerfrom ointerto", pretty=False).items():
        print(f"\t{feat}: {info}")

tiny
	owork: {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
	ointerfrom: {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
	ointerto: {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
small
	owork: {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
	ointerfrom: {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
	ointerto: {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
medium
	owork: {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/bhsa/tf/2021/_local/medium', 'edgeValues': None}
	ointerfrom: {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/medium', 'edgeValues': None}
	ointerto: {'kind': 'node', 'type': 'str', 'source

### owork

Note that each volume has an extra feature: `owork`. Its value for each node in a volume dataset
is the corresponding node in the *original work* from which the volume is taken.

If you use the volume to compute annotations,
and you want to publish these annotations against the original work,
the feature `owork` provides the necessary information to do so.

Suppose `annotvx` is a dict, mapping some nodes in the volume `x` to interesting values,
then you apply them to the original work as follows:

``` python
{F.owork.v(n): value for (n, value) in annotvx.items()}
```

There is another important function of `owork`: when collecting volumes, we may encounter nodes in different volumes
that come from a single node in the work. We want to *merge* these nodes in the collected work.
The feature `owork` tells us which nodes to merge.
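
Both uses of `owork` can be sketched with plain dicts standing in for the feature.
The names `owork_tiny`, `owork_small`, `annot_tiny` and all node numbers below are invented for illustration;
in a real session you would call `F.owork.v(n)` on the API of the volume instead of indexing a dict.

```python
from collections import defaultdict

# Hypothetical stand-ins for F.owork.v in two volumes:
# volume node -> node in the original work (all numbers invented).
owork_tiny = {1: 501, 2: 502, 3: 900}
owork_small = {1: 640, 2: 900}

# Annotations computed against the nodes of volume "tiny".
annot_tiny = {1: "hapax", 3: "frequent"}

# Use 1: re-express the annotations against the original work.
annot_work = {owork_tiny[n]: value for (n, value) in annot_tiny.items()}
print(annot_work)

# Use 2: group volume nodes by the work node they stem from.
# Groups with more than one member are candidates for merging
# when the volumes are collected.
origin = defaultdict(list)
for (vol, mapping) in (("tiny", owork_tiny), ("small", owork_small)):
    for (n, w) in mapping.items():
        origin[w].append((vol, n))

to_merge = {w: nodes for (w, nodes) in origin.items() if len(nodes) > 1}
print(to_merge)
```

In this toy data, work node 900 shows up in both volumes, much like a lexeme that is used in books of different volumes.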

### ointerto, ointerfrom

Note that we do have features `ointerto` and `ointerfrom`.

We'll come back to them later.

## Make collections of volumes

We can collect volumes into new works by means of the `collect()` method on `Aw`.
Let's collect all volumes just created.

In [14]:
Aw.collect(
    tuple(volumes),
    COLLECTION,
    overwrite=True,
)

Collection prophets exists and will be recreated
  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2021/_local/tiny ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.09s All features loaded/computed - for details use TF.isLoaded()
   |     0.00s Feature overview: 85 for nodes; 3 for edges; 2 configs; 8 computed
  0.00s loading features ...
  0.05s All additional features loaded - for details use TF.isLoaded()
  0.16s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2021/_local/small ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored
  0.00s loading features ...
   |     0

True

## Load collection

We can load the collection in the same way as a volume, but now using `collection=`:

In [17]:
Ac = use("bhsa:clone", checkout="clone", version="2021", collection=COLLECTION)

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

91 features found and 0 ignored


Which books have we got?

In [18]:
Fc = Ac.api.F
Tc = Ac.api.T

for b in Fc.otype.s("book"):
    print(Tc.sectionFromNode(b)[0])

Obadiah
Jonah
Micah
Nahum
Habakkuk
Haggai
Joel
Malachi
Ezra


In [19]:
b = Tc.nodeFromSection(("Obadiah",))
Tc.text(b, fmt="text-trans-plain")

'XZWN <BDJH KH&>MR >DNJ JHWH L>DWM CMW<H CM<NW M>T JHWH WYJR BGWJM CLX QWMW WNQWMH <LJH LMLXMH00 HNH QVN NTTJK BGWJM BZWJ >TH M>D00 ZDWN LBK HCJ>K CKNJ BXGWJ&SL< MRWM CBTW >MR BLBW MJ JWRDNJ >RY00 >M&TGBJH KNCR W>M&BJN KWKBJM FJM QNK MCM >WRJDK N>M&JHWH00 >M&GNBJM B>W&LK >M&CWDDJ LJLH >JK NDMJTH HLW> JGNBW DJM >M&BYRJM B>W LK HLW> JC>JRW <LLWT00 >JK NXPFW <FW NB<W MYPNJW00 <D&HGBWL CLXWK KL >NCJ BRJTK HCJ>WK JKLW LK >NCJ CLMK LXMK JFJMW MZWR TXTJK >JN TBWNH BW00 HLW> BJWM HHW> N>M JHWH WH>BDTJ XKMJM M>DWM WTBWNH MHR <FW00 WXTW GBWRJK TJMN LM<N JKRT&>JC MHR <FW MQVL00 MXMS >XJK J<QB TKSK BWCH WNKRT L<WLM00 BJWM <MDK MNGD BJWM CBWT ZRJM XJLW WNKRJM B>W C<RW W<L&JRWCLM JDW GWRL GM&>TH K>XD MHM00 W>L&TR> BJWM&>XJK BJWM NKRW W>L&TFMX LBNJ&JHWDH BJWM >BDM W>L&TGDL PJK BJWM YRH00 >L&TBW> BC<R&<MJ BJWM >JDM >L&TR> GM&>TH BR<TW BJWM >JDW W>L&TCLXNH BXJLW BJWM >JDW00 W>L&T<MD <L&HPRQ LHKRJT >T&PLJVJW W>L&TSGR FRJDJW BJWM YRH00 KJ&QRWB JWM&JHWH <L&KL&HGWJM K>CR <FJT J<FH LK GMLK JCWB BR>CK00 KJ K

## Check: Lexeme nodes

First we count the lexeme nodes in the original work,
as far as they occur in the books contained in the volumes.

In [20]:
books = set()
for parts in VOLUMES.values():
    books |= set(parts)
books

{'Ezra',
 'Habakkuk',
 'Haggai',
 'Joel',
 'Jonah',
 'Malachi',
 'Micah',
 'Nahum',
 'Obadiah'}

In [21]:
lexNodesWork = set()
Fw = Aw.api.F
Tw = Aw.api.T
Lw = Aw.api.L

for b in Fw.otype.s("book"):
    if Tw.sectionFromNode(b)[0] not in books:
        continue
    for w in Lw.d(b, otype="word"):
        lx = Lw.u(w, otype="lex")[0]
        lexNodesWork.add(lx)

len(lexNodesWork)

2075

Let's count the lexeme nodes in the individual volumes and add up the numbers.
Each volume has its own lexemes, so lexemes that occur in multiple volumes correspond to multiple nodes.
We therefore expect the sum to exceed the number of lexeme nodes counted in the work.

In [22]:
total = 0

for (name, Av) in As.items():
    Fv = Av.api.F

    nLex = len(Fv.otype.s("lex"))
    total += nLex
    print(f"{name:<10} has {nLex:>5} lexeme nodes")
print(f"{'Total':<10}     {total:>5} lexeme nodes")

tiny       has  1173 lexeme nodes
small      has   587 lexeme nodes
medium     has   991 lexeme nodes
Total           2751 lexeme nodes


Now let's count the lexemes in the new collection.

In [23]:
lexNodesCollection = Fc.otype.s("lex")
len(lexNodesCollection)

2075

Exactly the same number as in the original work.
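
The gap between the per-volume sum (2751) and the merged count (2075) is 676 nodes, the surplus caused by lexemes that occur in more than one volume.
A small sketch with invented lexeme sets shows the same effect; the set contents are made up and only serve to illustrate the counting.

```python
# Invented lexeme sets per volume (not real BHSA data).
lexemes = {
    "tiny": {"JHWH", ">MR", "BJT"},
    "small": {"JHWH", ">MR", "JWM"},
    "medium": {"BJT", "MLK"},
}

# Adding up per-volume counts counts shared lexemes once per volume.
per_volume_total = sum(len(s) for s in lexemes.values())

# Merging identifies shared lexemes, as a collection does.
merged = set().union(*lexemes.values())

print(per_volume_total, len(merged))
```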

Let's make absolutely sure that we have the same lexeme set:

In [24]:
# lexNodesCollection is a tuple, so convert it to a set for a fair comparison
lexNodesWork == set(lexNodesCollection)

False

Of course, because the node numbers in the original work are almost guaranteed to be different from the node numbers in the collection.

But the information attached to the nodes in the collection should be identical to the information attached to the
corresponding nodes in the work.

In [25]:
lexemesWork = {Fw.lex.v(n) for n in lexNodesWork}
lexemesCollection = {Fc.lex.v(n) for n in lexNodesCollection}

In [26]:
lexemesWork == lexemesCollection

True

Another way of verifying this is to map the lexeme nodes of the collection back to those of the work
and see whether they are equal sets.

In [27]:
lexNodesCollectionToWork = {Fc.owork.v(n) for n in lexNodesCollection}

In [28]:
lexNodesWork == lexNodesCollectionToWork

True

## Check: crossrefs

The edge feature `crossref` has inter-volume edges.

We explore the situation in the original work, inside the volumes, and in the new collection.

We count the incoming and outgoing edges w.r.t. the nodes in the relevant material.

`crossref` edges are between verses, so we first collect all relevant verses in the original work.

We want the verses in all the books of all the volumes, and we want those verses per volume.

In [29]:
books = dict(all=set())
for (name, parts) in VOLUMES.items():
    partsSet = set(parts)
    books[name] = partsSet
    books["all"] |= partsSet
books

{'all': {'Ezra',
  'Habakkuk',
  'Haggai',
  'Joel',
  'Jonah',
  'Malachi',
  'Micah',
  'Nahum',
  'Obadiah'},
 'tiny': {'Habakkuk', 'Haggai', 'Jonah', 'Micah', 'Nahum', 'Obadiah'},
 'small': {'Joel', 'Malachi'},
 'medium': {'Ezra'}}

In [30]:
verseNodesWork = {}

for (name, heads) in books.items():
    for b in Fw.otype.s("book"):
        if Tw.sectionFromNode(b)[0] not in heads:
            continue
        for vs in Lw.d(b, otype="verse"):
            verseNodesWork.setdefault(name, set()).add(vs)

for (name, verses) in verseNodesWork.items():
    print(f"{name:<10} {len(verses):>3} verses")

all        723 verses
tiny       315 verses
small      128 verses
medium     280 verses


### Compute edges from the work data

Now we determine the number of incoming and outgoing edges w.r.t. these portions,
and we split them into *inter*-portion and *intra*-portion edges.

In [31]:
Ew = Aw.api.E

incomingWorkTotal = {}
incomingWorkIntra = {}
incomingWorkInter = {}
outgoingWorkTotal = {}
outgoingWorkIntra = {}
outgoingWorkInter = {}

for (name, verses) in verseNodesWork.items():
    inct = set()
    inca = set()
    incr = set()
    ougt = set()
    ouga = set()
    ougr = set()

    for vs in verses:
        wvs = Ew.crossref.t(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                inct.add((ws, vs))
                if ws in verses:
                    inca.add((ws, vs))
                else:
                    incr.add((ws, vs))

        wvs = Ew.crossref.f(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                ougt.add((vs, ws))
                if ws in verses:
                    ouga.add((vs, ws))
                else:
                    ougr.add((vs, ws))
    incomingWorkTotal[name] = inct
    incomingWorkIntra[name] = inca
    incomingWorkInter[name] = incr
    outgoingWorkTotal[name] = ougt
    outgoingWorkIntra[name] = ouga
    outgoingWorkInter[name] = ougr

for name in verseNodesWork:
    print(f"{name:<10}: total: incoming: {len(incomingWorkTotal[name]):>3} outgoing: {len(outgoingWorkTotal[name]):>3}")
    print(f"{name:<10}: intra: incoming: {len(incomingWorkIntra[name]):>3} outgoing: {len(outgoingWorkIntra[name]):>3}")
    print(f"{name:<10}: inter: incoming: {len(incomingWorkInter[name]):>3} outgoing: {len(outgoingWorkInter[name]):>3}")

all       : total: incoming: 400 outgoing: 400
all       : intra: incoming:  64 outgoing:  64
all       : inter: incoming: 336 outgoing: 336
tiny      : total: incoming: 245 outgoing: 245
tiny      : intra: incoming:   8 outgoing:   8
tiny      : inter: incoming: 237 outgoing: 237
small     : total: incoming:   3 outgoing:   3
small     : intra: incoming:   0 outgoing:   0
small     : inter: incoming:   3 outgoing:   3
medium    : total: incoming: 152 outgoing: 152
medium    : intra: incoming:  56 outgoing:  56
medium    : inter: incoming:  96 outgoing:  96


Ah, the `crossref` edges are symmetric, so there are as many incoming as outgoing edges.
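
The intra/inter split and the effect of symmetry can be reproduced on a toy edge set.
This is a minimal sketch: the function `classify` and the node numbers are invented, and edges are plain `(from, to)` pairs rather than the result of `E.crossref.f()` or `E.crossref.t()`.

```python
def classify(edges, portion):
    """Split the edges that touch `portion` into intra and inter sets.

    edges: set of (from, to) pairs; portion: set of nodes.
    """
    incoming = {"intra": set(), "inter": set()}
    outgoing = {"intra": set(), "inter": set()}
    for (f, t) in edges:
        if t in portion:
            kind = "intra" if f in portion else "inter"
            incoming[kind].add((f, t))
        if f in portion:
            kind = "intra" if t in portion else "inter"
            outgoing[kind].add((f, t))
    return incoming, outgoing


# Invented symmetric edge set: every (a, b) is paired with (b, a).
pairs = {(1, 2), (2, 7), (3, 9)}
edges = pairs | {(b, a) for (a, b) in pairs}
portion = {1, 2, 3}

incoming, outgoing = classify(edges, portion)
for kind in ("intra", "inter"):
    print(kind, len(incoming[kind]), len(outgoing[kind]))
```

Because each edge has a mirror image, every incoming edge of the portion corresponds to exactly one outgoing edge, so the counts agree per category.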

### Compute edges from the volume data

Inside a volume we only see the intra edges; they should coincide with the `incomingWorkIntra[volume]` edges.

First the number of edges:

In [32]:
incomingVolumeTotal = {}
outgoingVolumeTotal = {}

for name in volumes:
    Av = As[name]
    Fv = Av.api.F
    Ev = Av.api.E

    verses = Fv.otype.s("verse")
    inct = set()
    ougt = set()

    for vs in verses:
        wvs = Ev.crossref.t(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                inct.add((ws, vs))

        wvs = Ev.crossref.f(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                ougt.add((vs, ws))
    incomingVolumeTotal[name] = inct
    outgoingVolumeTotal[name] = ougt

We have gathered the data.

Now we make the comparisons, first comparing number of edges, and then identity of edges, modulo mapping.

In [33]:
for name in volumes:
    Av = As[name]
    Fv = Av.api.F

    inVolTotal = incomingVolumeTotal[name]
    outVolTotal = outgoingVolumeTotal[name]
    inWorkIntra = incomingWorkIntra[name]
    outWorkIntra = outgoingWorkIntra[name]

    print(f"{name:<10}: total: incoming: {len(inVolTotal):>3} outgoing: {len(outVolTotal):>3}")
    eqamountIncoming = len(inWorkIntra) == len(inVolTotal)
    eqamountOutgoing = len(outWorkIntra) == len(outVolTotal)
    print(f"equal amount of incoming intra-edges as in work? {eqamountIncoming}")
    print(f"equal amount of outgoing intra-edges as in work? {eqamountOutgoing}")
    inVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in inVolTotal}
    outVolToWork = {(Fv.owork.v(f), Fv.owork.v(t)) for (f, t) in outVolTotal}
    sameIncoming = inWorkIntra == inVolToWork
    sameOutgoing = outWorkIntra == outVolToWork
    print(f"same incoming intra-edges as in work? {sameIncoming}")
    print(f"same outgoing intra-edges as in work? {sameOutgoing}")

tiny      : total: incoming:   8 outgoing:   8
equal amount of incoming intra-edges as in work? True
equal amount of outgoing intra-edges as in work? True
same incoming intra-edges as in work? True
same outgoing intra-edges as in work? True
small     : total: incoming:   0 outgoing:   0
equal amount of incoming intra-edges as in work? True
equal amount of outgoing intra-edges as in work? True
same incoming intra-edges as in work? True
same outgoing intra-edges as in work? True
medium    : total: incoming:  56 outgoing:  56
equal amount of incoming intra-edges as in work? True
equal amount of outgoing intra-edges as in work? True
same incoming intra-edges as in work? True
same outgoing intra-edges as in work? True


### Compute edges from collection data

The final test is whether the collection has the right edges.
When the collection was created, inter-volume edges have been added on the basis of the `ointerto` and `ointerfrom` features
in the individual volumes.

Now we check whether that went well.
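
The idea of restoring inter-volume edges can be sketched as follows.
This is an illustration under assumptions, not Text-Fabric's implementation: the data, the helper `boundary_edges` and the way edges are remembered are invented, and the real encoding inside `ointerto` and `ointerfrom` is not shown in this tutorial.

```python
# Invented work-level edges and the work nodes that each volume covers.
work_edges = {(10, 11), (10, 20), (21, 30), (20, 99)}
volume_nodes = {"tiny": {10, 11}, "small": {20, 21}, "medium": {30}}


def boundary_edges(nodes):
    """Edges with exactly one endpoint inside `nodes`."""
    return {(f, t) for (f, t) in work_edges if (f in nodes) != (t in nodes)}


# What each volume would have to remember at extraction time,
# expressed in node numbers of the original work.
remembered = {name: boundary_edges(nodes) for (name, nodes) in volume_nodes.items()}

# At collection time: keep the remembered edges whose two endpoints
# both belong to some collected volume.
collected = set().union(*volume_nodes.values())
restored = {
    (f, t)
    for edges in remembered.values()
    for (f, t) in edges
    if f in collected and t in collected
}
print(sorted(restored))
```

The edge `(20, 99)` is dropped because node 99 lies outside every collected volume, just as cross-references to books outside the collection cannot be restored.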

In [34]:
Ec = Ac.api.E

verses = Fc.otype.s("verse")
inct = set()
ougt = set()

for vs in verses:
    wvs = Ec.crossref.t(vs)
    if wvs:
        for wv in wvs:
            ws = wv[0]
            inct.add((ws, vs))

    wvs = Ec.crossref.f(vs)
    if wvs:
        for wv in wvs:
            ws = wv[0]
            ougt.add((vs, ws))

incomingCollectionTotal = inct
outgoingCollectionTotal = ougt

We have gathered the data.

Now we make the comparisons, first comparing number of edges, and then identity of edges, modulo mapping.

In [35]:
inColTotal = incomingCollectionTotal
outColTotal = outgoingCollectionTotal
inWorkIntra = incomingWorkIntra["all"]
outWorkIntra = outgoingWorkIntra["all"]

print(f"collection: total: incoming: {len(inColTotal):>3} outgoing: {len(outColTotal):>3}")
eqamountIncoming = len(inWorkIntra) == len(inColTotal)
eqamountOutgoing = len(outWorkIntra) == len(outColTotal)
print(f"equal amount of incoming intra-edges as in work? {eqamountIncoming}")
print(f"equal amount of outgoing intra-edges as in work? {eqamountOutgoing}")
inColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in inColTotal}
outColToWork = {(Fc.owork.v(f), Fc.owork.v(t)) for (f, t) in outColTotal}
sameIncoming = inWorkIntra == inColToWork
sameOutgoing = outWorkIntra == outColToWork
print(f"same incoming intra-edges as in work? {sameIncoming}")
print(f"same outgoing intra-edges as in work? {sameOutgoing}")

collection: total: incoming:  64 outgoing:  64
equal amount of incoming intra-edges as in work? True
equal amount of outgoing intra-edges as in work? True
same incoming intra-edges as in work? True
same outgoing intra-edges as in work? True


# Success!

We have seen that when we collect volumes,
lexeme nodes that represent the same lexeme in different volumes
are merged correctly.

The restoration of inter-volume edges works as well!

# Low-level API `TF=Fabric()`

We now load the data through `Fabric()`.

You do not have to load the work before extracting volumes, but you may do so.
The advantage of pre-loading is that after the extraction of volumes you still have
a handle to the work.

In [36]:
TFw = Fabric(locations=SOURCE)
apiw = TFw.loadAll()
apiw.makeAvailableIn(globals())

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

115 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  3.51s All features loaded/computed - for details use TF.isLoaded()
   |     0.00s Feature overview: 109 for nodes; 5 for edges; 1 configs; 8 computed
  0.00s loading features ...
  7.90s All additional features loaded - for details use TF.isLoaded()


[('Computed',
  'computed-data',
  ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),
 ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),
 ('Fabric', 'loading', ('TF',)),
 ('Locality', 'locality', ('L Locality',)),
 ('Nodes', 'navigating-nodes', ('N Nodes',)),
 ('Features',
  'node-features',
  ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),
 ('Search', 'search', ('S Search',)),
 ('Text', 'text', ('T Text',))]

## Extract

We use the same specification as before.

In [37]:
volumes = TFw.extract(VOLUMES, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28764 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20128 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14086 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

## Inspect

In [38]:
TFs = {}

for name in volumes:
    TFs[name] = Fabric(locations=SOURCE, volume=name)
    TFs[name].loadAll(silent="deep")

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored


In [39]:
for name in volumes:
    TFv = TFs[name]
    Fsv = TFv.api.Fs
    print(TFv.volumeInfo)
    for (feat, info) in TFv.isLoaded("owork ointerfrom ointerto", pretty=False).items():
        n = 0
        for x in Fsv(feat).items():
            n += 1
        print(f"  {feat:<10}: {n:>7} values\n    {info}")

tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
  owork     :   21779 values
    {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
  ointerfrom:       0 values
    {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
  ointerto  :       0 values
    {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/tiny', 'edgeValues': None}
small:Malachi-Joel
  owork     :    9495 values
    {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
  ointerfrom:       0 values
    {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
  ointerto  :       0 values
    {'kind': 'node', 'type': 'str', 'source': '~/github/etcbc/bhsa/tf/2021/_local/small', 'edgeValues': None}
medium:Ezra
  owork     :   17286 values
    {'kind': 'node', 'type': 'int', 'source': '~/github/etcbc/

### ointerto, ointerfrom

Note that in our volumes the features `ointerfrom`, `ointerto` are empty.

These are features that collect edge data for edges between a node inside the volume and a node outside the volume.

In our work, we do not have such edges, because we did not load the parallels module explicitly,
and the `Fabric(locations, modules)` function only looks in directories specified in its `locations` and `modules` parameters.

## Collect

We use the same collection specification as before.

In [40]:
TFw.collect(
    tuple(volumes),
    COLLECTION,
    overwrite=True,
)

Collection prophets exists and will be recreated
  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2021/_local/tiny ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.09s All features loaded/computed - for details use TF.isLoaded()
   |     0.00s Feature overview: 112 for nodes; 4 for edges; 1 configs; 8 computed
  0.00s loading features ...
  0.10s All additional features loaded - for details use TF.isLoaded()
  0.21s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2021/_local/small ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |   

True

### Load collection

In [41]:
TFc = Fabric(locations=SOURCE, collection=COLLECTION)
TFc.loadAll(silent="deep")

This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

118 features found and 0 ignored


<tf.core.api.Api at 0x7fe4b58020a0>

## Lowest level: plain functions

We can pass the location of a work, or the api to the loaded features of a work.
We do the latter here.

In [42]:
VOLUMES_WRONG = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Obadiah", "Malachi", "Joel"),
    medium=("Ezra",),
)

This will turn out to be wrong because there is a book that occurs in several volumes.

In [43]:
volumes = extract(SOURCE, TARGET, VOLUMES_WRONG, api=apiw, overwrite=True)

  0.00s Check volumes ...


   |       27s Section Obadiah of volume tiny reoccurs in volume small


It is not allowed to extract volumes that have material in common!
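
A volume specification can be checked for such overlaps before extraction.
The helper `find_overlaps` below is a hypothetical sketch in plain Python, not part of `tf.volumes`; it only mirrors the kind of check that the message above reports.

```python
def find_overlaps(volumes):
    """Return (section, first_volume, later_volume) for each section
    that is listed in more than one volume."""
    seen = {}
    overlaps = []
    for (name, sections) in volumes.items():
        for section in sections:
            if section in seen and seen[section] != name:
                overlaps.append((section, seen[section], name))
            else:
                seen.setdefault(section, name)
    return overlaps


VOLUMES = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Malachi", "Joel"),
    medium=("Ezra",),
)
VOLUMES_WRONG = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Obadiah", "Malachi", "Joel"),
    medium=("Ezra",),
)

print(find_overlaps(VOLUMES))
print(find_overlaps(VOLUMES_WRONG))
```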

In [44]:
volumes = extract(SOURCE, TARGET, VOLUMES, api=apiw, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28764 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20128 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14086 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

Now we make the same collection as before, but first we make a few deliberate mistakes.

In [45]:
collect(
    (("tiny", f"{TARGET}/tiny"), ("tiny", f"{TARGET}/small")),
    f"{TARGET}/bible",
    overwrite=True,
)

    22s Volume tiny is already part of the collection


False

In [46]:
collect(
    (("tiny", f"{TARGET}/tiny"), ("small", f"{TARGET}/tiny")),
    f"{TARGET}/bible",
    overwrite=True,
)

    24s Volume tiny at location ~/github/etcbc/bhsa/tf/2021/_local/tiny reoccurs as volume small


False
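
The two mistakes above, a repeated volume name and a repeated location, can be caught with a simple pre-check.
The function `check_collection` is a hypothetical helper written for this illustration; its messages imitate the output above, but the actual checks inside `collect()` may be implemented differently.

```python
def check_collection(spec):
    """spec: sequence of (name, location) pairs.

    Return a list of problems; an empty list means the spec looks fine.
    """
    problems = []
    names = set()
    locations = {}
    for (name, loc) in spec:
        if name in names:
            problems.append(f"Volume {name} is already part of the collection")
            continue
        names.add(name)
        if loc in locations:
            problems.append(
                f"Volume {locations[loc]} at location {loc} reoccurs as volume {name}"
            )
            continue
        locations[loc] = name
    return problems


# Placeholder locations, relative paths chosen for the example only.
same_name = (("tiny", "_local/tiny"), ("tiny", "_local/small"))
same_place = (("tiny", "_local/tiny"), ("small", "_local/tiny"))
fine = (("tiny", "_local/tiny"), ("small", "_local/small"))

for spec in (same_name, same_place, fine):
    print(check_collection(spec))
```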

In [48]:
collect(
    {name: info["location"] for (name, info) in volumes.items()},
    f"{TARGET}/bible",
    overwrite=True,
)

Collection bible exists and will be recreated
  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2021/_local/tiny ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.09s All features loaded/computed - for details use TF.isLoaded()
   |     0.00s Feature overview: 112 for nodes; 4 for edges; 1 configs; 8 computed
  0.00s loading features ...
  0.09s All additional features loaded - for details use TF.isLoaded()
  0.20s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2021/_local/small ...
This is Text-Fabric 9.0.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0

True

# All steps

* **start** start computing with this corpus
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[compute](compute.ipynb)** sink down a level and compute it yourself
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[annotate](annotate.ipynb)** export text, annotate with BRAT, import annotations
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **volumes** work with selected books only

CC-BY Dirk Roorda