<img align="right" src="images/tf-small.png" width="128"/>
<img align="right" src="images/etcbc.png"/>
<img align="right" src="images/dans-small.png"/>

You might want to consider the [start](start.ipynb) of this tutorial.

Short introductions to other TF datasets:

* [Dead Sea Scrolls](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/dss.ipynb)
* [Old Babylonian Letters](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/oldbabylonian.ipynb)
* [Quran](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/lorentz2020/quran.ipynb)


# Volume support

We show how to extract volumes from works and collect several volumes into one.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import os
from tf.app import use
from tf.fabric import Fabric
from tf.volumes import extract, collect

In [3]:
GH = os.path.expanduser("~/github")
BH = f"{GH}/etcbc/bhsa"
VERSION = "2017"
SOURCE = f"{BH}/tf/{VERSION}"
TARGET = f"{BH}/tf/{VERSION}/_local"

# Work and volumes

We use the Hebrew Bible as *work*.
The *volumes* of a work are lists of its top-level sections.
Volumes may have a name.

We define three volumes out of the smallest books of the bible:

In [4]:
VOLUMES_WRONG = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Obadiah", "Malachi", "Joel"),
    medium=("Ezra",),
)

In [5]:
VOLUMES = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Malachi", "Joel"),
    medium=("Ezra",),
)

# Volume support

We can work with datasets through two TF apis: the basic, low-level one using `TF = Fabric(locations, modules)`
and the advanced, high-level one using `A = use(work)`.

Volume support is implemented in both apis (and even outside them).
We show all ways of doing it.

# Low-level API `TF=Fabric()`

We start with loading the data through `Fabric()`.

You do not have to load the work before extracting volumes, but you may do so.
The advantage of pre-loading is that after the extraction of volumes you still have
a handle to the work.

In [6]:
TFw = Fabric(locations=SOURCE)
apiw = TFw.loadAll()
apiw.makeAvailableIn(globals())

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

115 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  3.79s All features loaded/computed - for details use TF.loadLog()
   |     0.00s Feature overview: 109 for nodes; 5 for edges; 1 configs; 8 computed
  0.00s loading features ...
  7.41s All additional features loaded - for details use TF.loadLog()


[('Computed',
  'computed-data',
  ('C Computed', 'Call AllComputeds', 'Cs ComputedString')),
 ('Features', 'edge-features', ('E Edge', 'Eall AllEdges', 'Es EdgeString')),
 ('Fabric', 'loading', ('TF',)),
 ('Locality', 'locality', ('L Locality',)),
 ('Nodes', 'navigating-nodes', ('N Nodes',)),
 ('Features',
  'node-features',
  ('F Feature', 'Fall AllFeatures', 'Fs FeatureString')),
 ('Search', 'search', ('S Search',)),
 ('Text', 'text', ('T Text',))]

## Extract volumes

We can now extract the volumes in two ways:

* by calling the `extract()` function from the outside, and passing it the `api` of the work.

In [7]:
volumes = extract(SOURCE, TARGET, VOLUMES_WRONG, api=apiw, overwrite=True)

  0.00s Check volumes ...


   |    -0.00s Section Obadiah of volume tiny reoccurs in volume small


It is not allowed to extract volumes that have material in common!
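
The rule behind this error can be mimicked in plain Python. The sketch below is not Text-Fabric's own implementation: the helper name `findOverlaps` is hypothetical, and it only illustrates that a top-level section may belong to at most one volume.

```python
def findOverlaps(volumes):
    """Return {section: [volume names]} for sections claimed by more than one volume."""
    seen = {}
    for (name, sections) in volumes.items():
        for section in sections:
            seen.setdefault(section, []).append(name)
    return {section: names for (section, names) in seen.items() if len(names) > 1}


# Same definition as in the cell that defines VOLUMES_WRONG
VOLUMES_WRONG = dict(
    tiny=("Obadiah", "Nahum", "Haggai", "Habakkuk", "Jonah", "Micah"),
    small=("Obadiah", "Malachi", "Joel"),
    medium=("Ezra",),
)

overlaps = findOverlaps(VOLUMES_WRONG)
print(overlaps)
```

An empty result means that the volume specification has no shared sections.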

In [8]:
volumes = extract(SOURCE, TARGET, VOLUMES, api=apiw, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28763 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20127 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14085 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

*   by calling the `extract()` method on the TF object.
    This way, there are fewer parameters to pass.

In [9]:
volumes = TFw.extract(VOLUMES, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28763 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20127 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14085 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

## Inspect the volumes

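The `extract()` function returns basic information about the volumes, as a dict keyed by the short name of each volume
(the name that is also used for its directory on disk). For each volume we get:

* `location`: the location of the volume dataset on the filesystem
* `new`: whether the volume dataset has been newly created or already existed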

In [10]:
if volumes:
    for (name, info) in volumes.items():
        loc = info["location"]
        new = "(new)     " if info["new"] else "(existing)"
        print(f"volume {name:<24}: {new} at {loc}")
else:
    print(volumes)

volume tiny                    : (new)      at /Users/dirk/github/etcbc/bhsa/tf/2017/_local/tiny
volume small                   : (new)      at /Users/dirk/github/etcbc/bhsa/tf/2017/_local/small
volume medium                  : (new)      at /Users/dirk/github/etcbc/bhsa/tf/2017/_local/medium


### owork

Note that each volume has an extra feature: `owork`. Its value for each node in a volume dataset
is the corresponding node in the *original work* from which the volume is taken.

If you use the volume to compute annotations,
and you want to publish these annotations against the original work dataset,
the feature `owork` provides the necessary information to do so.

Suppose `annotvx` is a dict mapping some nodes in the dataset of volume `x` to interesting values,
then you apply them to the original work as follows:

``` python

{F.owork.v(n): value for (n, value) in annotvx.items()}
```
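
As a self-contained illustration, the sketch below replaces `F.owork.v` by a plain dictionary lookup. All node numbers and annotation values are made up. In a real volume you would call `F.owork.v(n)` on the API of that volume.

```python
# Hypothetical stand-in for F.owork.v: volume node -> node in the original work
owork = {1: 1001, 2: 1002, 3: 1003}

# Hypothetical annotations computed on nodes of the volume
annotvx = {1: "greeting", 3: "farewell"}

# The same annotations, now keyed by the nodes of the original work
annotWork = {owork[n]: value for (n, value) in annotvx.items()}
print(annotWork)
```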

There is another important function of `owork`: when collecting volumes, we may encounter nodes in the volumes
that come from a single node in the work. We want to *merge* these nodes in the collected work.
The feature `owork` provides the information needed for that.
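
Conceptually, merging boils down to grouping volume nodes by their `owork` value. The sketch below uses made-up node numbers and is not how `collect()` works internally. It only shows why two nodes from different volumes can end up as a single node in the collection.

```python
from collections import defaultdict

# Hypothetical owork values per volume: volume node -> node in the original work
oworkPerVolume = {
    "tiny": {10: 5001, 11: 5002},
    "small": {7: 5002, 8: 5003},
}

# Group the (volume, node) pairs that stem from the same work node
merged = defaultdict(list)
for (volume, owork) in oworkPerVolume.items():
    for (node, workNode) in owork.items():
        merged[workNode].append((volume, node))

for (workNode, sources) in sorted(merged.items()):
    print(workNode, sources)
```

Here the work node `5002` has a counterpart in both volumes, so a collection would need only one node for it.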

### ointerto, ointerfrom

Note that in our volumes we do not have features `ointerfrom`, `ointerto`.

These features collect edge data for edges between a node inside the volume and a node outside the volume.

In our volumes, we do not have such edges.

Later on we will load the edge feature `crossref`, which does have inter-volume edges.
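
Which edges count as inter-volume can be made concrete with a small sketch. The node numbers and the helper `splitEdges` are hypothetical, and the actual encoding that Text-Fabric uses inside `ointerfrom` and `ointerto` is not shown here.

```python
def splitEdges(edges, volumeNodes):
    """Split edges into those inside the volume and those crossing its boundary."""
    intra = set()
    inter = set()
    for (f, t) in edges:
        if f in volumeNodes and t in volumeNodes:
            intra.add((f, t))
        elif f in volumeNodes or t in volumeNodes:
            inter.add((f, t))
    return (intra, inter)


# Hypothetical volume with three nodes and a handful of edges
volumeNodes = {1, 2, 3}
edges = {(1, 2), (2, 9), (8, 3), (8, 9)}

(intra, inter) = splitEdges(edges, volumeNodes)
print(sorted(intra))
print(sorted(inter))
```

The edge `(8, 9)` lies completely outside the volume, so it ends up in neither set.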

## Load all volumes

We use the result of the `extract()` function to find and load all volumes.

We now get one TF-api handle per volume.

Later we see how we can *collect* these volumes into one new work, with one TF-api handle.

In [11]:
TFs = {}
apis = {}
print("Loading all volumes")
for name in volumes:
    TFs[name] = Fabric(locations=SOURCE, volume=name, silent=True)
    apis[name] = TFs[name].loadAll(silent="deep")
    print(f"info = {TFs[name].volumeInfo}")
print("Done")

Loading all volumes
info = tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
info = small:Malachi-Joel
info = medium:Ezra
Done


## Make collections of volumes

We can collect volumes into new works by means of the `collect()` function.
Let's collect all volumes just created.

Again, we can do it in two ways:

* by calling the `collect()` function from the outside.

First we make a few calls that are deliberately problematic:

In [12]:
collect(
    (("tiny", f"{TARGET}/tiny"), ("tiny", f"{TARGET}/small")),
    f"{TARGET}/bible",
    overwrite=True,
)

Collection bible exists and will be recreated


    37s Volume tiny is already part of the collection


False

In [13]:
collect(
    (("tiny", f"{TARGET}/tiny"), ("small", f"{TARGET}/tiny")),
    f"{TARGET}/bible",
    overwrite=True,
)

    38s Volume tiny at location ~/github/etcbc/bhsa/tf/2017/_local/tiny reoccurs as volume small


False

In [14]:
collect(
    {name: info["location"] for (name, info) in volumes.items()},
    f"{TARGET}/bible",
    overwrite=True,
)

  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2017/_local/tiny ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.08s All features loaded/computed - for details use TF.loadLog()
   |     0.00s Feature overview: 112 for nodes; 4 for edges; 1 configs; 8 computed
  0.00s loading features ...
  0.10s All additional features loaded - for details use TF.loadLog()
  0.20s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2017/_local/small ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext

True

*   by calling the `collect()` method on the TF object.
    The volumes are passed as a tuple of their names.
    The locations of volumes and collections
    are all inside the `_local` directory of the work.

In [15]:
TFw.collect(
    tuple(volumes),
    "bible",
    overwrite=True,
)

Collection bible exists and will be recreated
  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2017/_local/tiny ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.08s All features loaded/computed - for details use TF.loadLog()
   |     0.00s Feature overview: 112 for nodes; 4 for edges; 1 configs; 8 computed
  0.00s loading features ...
  0.09s All additional features loaded - for details use TF.loadLog()
  0.19s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2017/_local/small ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

117 features found and 0 ignored
  0.00s loading features ...
   |     0.0

True

Let's see what we have got:

In [16]:
TFc = Fabric(locations=SOURCE, collection="bible")
apic = TFc.loadAll(silent="deep")
apic.makeAvailableIn(globals())
TFc.collectionInfo

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

118 features found and 0 ignored


'bible:tiny,small,medium'

Which books have we got?

In [17]:
for b in F.otype.s("book"):
    print(T.sectionFromNode(b)[0])

Obadiah
Jonah
Micah
Nahum
Habakkuk
Haggai
Joel
Malachi
Ezra


In [18]:
b = T.nodeFromSection(("Obadiah",))
T.text(b, fmt="text-trans-plain")

'XZWN <BDJH KH&>MR >DNJ JHWH L>DWM CMW<H CM<NW M>T JHWH WYJR BGWJM CLX QWMW WNQWMH <LJH LMLXMH00 HNH QVN NTTJK BGWJM BZWJ >TH M>D00 ZDWN LBK HCJ>K CKNJ BXGWJ&SL< MRWM CBTW >MR BLBW MJ JWRDNJ >RY00 >M&TGBJH KNCR W>M&BJN KWKBJM FJM QNK MCM >WRJDK N>M&JHWH00 >M&GNBJM B>W&LK >M&CWDDJ LJLH >JK NDMJTH HLW> JGNBW DJM >M&BYRJM B>W LK HLW> JC>JRW <LLWT00 >JK NXPFW <FW NB<W MYPNJW00 <D&HGBWL CLXWK KL >NCJ BRJTK HCJ>WK JKLW LK >NCJ CLMK LXMK JFJMW MZWR TXTJK >JN TBWNH BW00 HLW> BJWM HHW> N>M JHWH WH>BDTJ XKMJM M>DWM WTBWNH MHR <FW00 WXTW GBWRJK TJMN LM<N JKRT&>JC MHR <FW MQVL00 MXMS >XJK J<QB TKSK BWCH WNKRT L<WLM00 BJWM <MDK MNGD BJWM CBWT ZRJM XJLW WNKRJM B>W C<RW W<L&JRWCLM JDW GWRL GM&>TH K>XD MHM00 W>L&TR> BJWM&>XJK BJWM NKRW W>L&TFMX LBNJ&JHWDH BJWM >BDM W>L&TGDL PJK BJWM YRH00 >L&TBW> BC<R&<MJ BJWM >JDM >L&TR> GM&>TH BR<TW BJWM >JDW W>L&TCLXNH BXJLW BJWM >JDW00 W>L&T<MD <L&HPRQ LHKRJT >T&PLJVJW W>L&TSGR FRJDJW BJWM YRH00 KJ&QRWB JWM&JHWH <L&KL&HGWJM K>CR <FJT J<FH LK GMLK JCWB BR>CK00 KJ K

## Check: Lexeme nodes

First we count the lexeme nodes in the original work,
as far as they occur in the books contained in the volumes.

In [19]:
books = set()
for parts in VOLUMES.values():
    books |= set(parts)
books

{'Ezra',
 'Habakkuk',
 'Haggai',
 'Joel',
 'Jonah',
 'Malachi',
 'Micah',
 'Nahum',
 'Obadiah'}

In [20]:
lexNodesWork = set()
F = apiw.F
T = apiw.T
L = apiw.L

for b in F.otype.s("book"):
    if T.sectionFromNode(b)[0] not in books:
        continue
    for w in L.d(b, otype="word"):
        lx = L.u(w, otype="lex")[0]
        lexNodesWork.add(lx)

len(lexNodesWork)

2075

Let's count the lexeme nodes in the individual volumes and add them up. Each volume has its own lexemes, so lexemes that occur in multiple volumes correspond to multiple nodes. We expect more lexeme nodes.

In [21]:
total = 0

for (name, TFv) in TFs.items():
    nLex = len(TFv.api.F.otype.s("lex"))
    total += nLex
    print(f"{name:<10} has {nLex:>5} lexeme nodes")
print(f"{'Total':<10}     {total:>5} lexeme nodes")

tiny       has  1173 lexeme nodes
small      has   587 lexeme nodes
medium     has   991 lexeme nodes
Total           2751 lexeme nodes
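
The surplus arises because a lexeme shared by several volumes is counted once per volume. A toy example with made-up lexeme sets (not real BHSA counts) shows the arithmetic: the sum of the per-volume counts exceeds the size of the union.

```python
# Hypothetical lexeme sets per volume
lexemesPerVolume = {
    "tiny": {"JHWH/", ">MR[", "GWJ/"},
    "small": {"JHWH/", "JWM/"},
    "medium": {">MR[", "BJT/"},
}

# Every volume counts its own lexemes, shared ones included
summed = sum(len(lexemes) for lexemes in lexemesPerVolume.values())

# The union counts each lexeme only once
distinct = len(set().union(*lexemesPerVolume.values()))

print(f"sum over volumes: {summed}")
print(f"distinct lexemes: {distinct}")
```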


Now let's count the lexemes in the new collection.

In [22]:
lexNodesCollection = apic.F.otype.s("lex")
len(lexNodesCollection)

2075

Exactly the same number as in the original work.

Let's make absolutely sure that we have the same lexeme set:

In [23]:
lexNodesWork == set(lexNodesCollection)

False

Of course, because the node numbers in the original work are almost guaranteed to be different from the node numbers in the collection.

But the information attached to the nodes in the collection should be identical to the information attached to the
corresponding nodes in the work.

In [24]:
lexemesWork = {apiw.F.lex.v(n) for n in lexNodesWork}
lexemesCollection = {apic.F.lex.v(n) for n in lexNodesCollection}

In [25]:
lexemesWork == lexemesCollection

True

Another way of verifying this is to map the lexeme nodes of the collection back to those of the work
and see whether they are equal sets.

In [26]:
lexNodesCollectionToWork = {apic.F.owork.v(n) for n in lexNodesCollection}

In [27]:
lexNodesWork == lexNodesCollectionToWork

True

So we have seen that when we take a collection of volumes
the identification of lexeme nodes of the same lexeme across volumes
works out perfectly.

# High-level API `A = use()`

So far we have worked with datasets that are essentially one directory with feature files.

But if we load the BHSA with the advanced API, like `A = use("bhsa", ...)`, we also get some standard modules.

Now, if we want to split the BHSA into volumes, we also want to include the features of these modules in the volumes.

That is entirely possible, and can be done in a convenient way.

Let's first point at some interesting features and see whether they are loaded right now.

In [28]:
TFw.isLoaded(features="lex phono crossref")

{'lex': {'kind': 'node', 'type': 'str', 'edgeValues': None},
 'phono': None,
 'crossref': None}

So, `lex` is loaded, `phono` and `crossref` are not.

## Load the work

Now we load the BHSA in the advanced way:

In [29]:
Aw = use("bhsa:clone", checkout="clone", version="2017")

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

121 features found and 0 ignored


We check that the features of interest are loaded:

In [30]:
Aw.isLoaded(features="lex phono crossref")

{'lex': {'kind': 'node', 'type': 'str', 'edgeValues': None},
 'phono': {'kind': 'node', 'type': 'str', 'edgeValues': None},
 'crossref': {'kind': 'edge', 'type': 'int', 'edgeValues': True}}

We can use the advanced API (now in `Aw`) to split the loaded work into the same volumes as before.

## Extract volumes

In [31]:
Aw.extract(VOLUMES, overwrite=True)

  0.00s Check volumes ...
   |   Volume tiny exists and will be recreated
   |   Volume small exists and will be recreated
   |   Volume medium exists and will be recreated
   |   Work consists of 39 books:
   |   book Genesis             : with    28763 slots
   |   book Exodus              : with    23748 slots
   |   book Leviticus           : with    17099 slots
   |   book Numbers             : with    23188 slots
   |   book Deuteronomy         : with    20127 slots
   |   book Joshua              : with    14526 slots
   |   book Judges              : with    14085 slots
   |   book 1_Samuel            : with    18929 slots
   |   book 2_Samuel            : with    15612 slots
   |   book 1_Kings             : with    18685 slots
   |   book 2_Kings             : with    17307 slots
   |   book Isaiah              : with    22931 slots
   |   book Jeremiah            : with    29736 slots
   |   book Ezekiel             : with    26182 slots
   |   book Hosea               : wit

{'tiny': {'location': '/Users/dirk/github/etcbc/bhsa/tf/2017/_local/tiny',
  'new': True},
 'small': {'location': '/Users/dirk/github/etcbc/bhsa/tf/2017/_local/small',
  'new': True},
 'medium': {'location': '/Users/dirk/github/etcbc/bhsa/tf/2017/_local/medium',
  'new': True}}

## Load single volumes

We load the volumes separately.

In [33]:
As = {}

for name in volumes:
    As[name] = use("bhsa:clone", checkout="clone", version="2017", volume=name)

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored


We see it reported that single volumes have been loaded instead of the whole work.

That volume info can be obtained separately by reading the attribute `volumeInfo`,
either on the `A` or on the `TF` object:

In [34]:
for name in volumes:
    print(As[name].volumeInfo)
    print(As[name].TF.volumeInfo)

tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
tiny:Obadiah-Nahum-Haggai-Habakkuk-Jonah-Micah
small:Malachi-Joel
small:Malachi-Joel
medium:Ezra
medium:Ezra


## ointerto, ointerfrom

Note that we do have features `ointerto` and `ointerfrom` now:

In [35]:
for name in volumes:
    print(name)
    for feat in ("ointerfrom", "ointerto"):
        print(f"\t{feat:<10} {As[name].isLoaded(feat)}")

tiny
	ointerfrom {'ointerfrom': {'kind': 'node', 'type': 'str', 'edgeValues': None}}
	ointerto   {'ointerto': {'kind': 'node', 'type': 'str', 'edgeValues': None}}
small
	ointerfrom {'ointerfrom': {'kind': 'node', 'type': 'str', 'edgeValues': None}}
	ointerto   {'ointerto': {'kind': 'node', 'type': 'str', 'edgeValues': None}}
medium
	ointerfrom {'ointerfrom': {'kind': 'node', 'type': 'str', 'edgeValues': None}}
	ointerto   {'ointerto': {'kind': 'node', 'type': 'str', 'edgeValues': None}}


## Make collections of volumes

In [36]:
Aw.collect(
    tuple(volumes),
    "bible",
    overwrite=True,
)

Collection bible exists and will be recreated
  0.00s Loading volume tiny                                                         from ~/github/etcbc/bhsa/tf/2017/_local/tiny ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
  0.09s All features loaded/computed - for details use TF.loadLog()
   |     0.00s Feature overview: 85 for nodes; 3 for edges; 2 configs; 8 computed
  0.00s loading features ...
  0.05s All additional features loaded - for details use TF.loadLog()
  0.16s Loading volume small                                                        from ~/github/etcbc/bhsa/tf/2017/_local/small ...
This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

90 features found and 0 ignored
  0.00s loading features ...
   |     0.00s 

True

## Load collection

We can load the collection analogously to a volume:

In [38]:
Ac = use("bhsa:clone", checkout="clone", version="2017", collection="bible")

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

91 features found and 0 ignored


## Check: crossrefs

The edge feature `crossref` has inter-volume edges.

We explore the situation in the complete work, inside the volumes, and in the new collection.

We count the incoming and outgoing edges w.r.t. the nodes in the relevant material.

`crossref` edges are between verses, so we first collect all relevant verses in the original work.

We want the verses in all the books of all the volumes, and we want those verses per volume.

In [39]:
books = dict(all=set())
for (name, parts) in VOLUMES.items():
    partsSet = set(parts)
    books[name] = partsSet
    books["all"] |= partsSet
books

{'all': {'Ezra',
  'Habakkuk',
  'Haggai',
  'Joel',
  'Jonah',
  'Malachi',
  'Micah',
  'Nahum',
  'Obadiah'},
 'tiny': {'Habakkuk', 'Haggai', 'Jonah', 'Micah', 'Nahum', 'Obadiah'},
 'small': {'Joel', 'Malachi'},
 'medium': {'Ezra'}}

In [40]:
verseNodesWork = {}
F = Aw.api.F
T = Aw.api.T
L = Aw.api.L

for (name, heads) in books.items():
    for b in F.otype.s("book"):
        if T.sectionFromNode(b)[0] not in heads:
            continue
        for vs in L.d(b, otype="verse"):
            verseNodesWork.setdefault(name, set()).add(vs)

for (name, verses) in verseNodesWork.items():
    print(f"{name:<10} {len(verses):>3} verses")

all        723 verses
tiny       315 verses
small      128 verses
medium     280 verses


### Compute edges from the work data

Now we determine the number of incoming and outgoing edges w.r.t. these portions,
and we split them into *inter*-portion and *intra*-portion edges.

In [41]:
E = Aw.api.E

incomingWorkTotal = {}
incomingWorkIntra = {}
incomingWorkInter = {}
outgoingWorkTotal = {}
outgoingWorkIntra = {}
outgoingWorkInter = {}

for (name, verses) in verseNodesWork.items():
    inct = set()
    inca = set()
    incr = set()
    ougt = set()
    ouga = set()
    ougr = set()

    for vs in verses:
        wvs = E.crossref.t(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                inct.add((ws, vs))
                if ws in verses:
                    inca.add((ws, vs))
                else:
                    incr.add((ws, vs))

        wvs = E.crossref.f(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                ougt.add((vs, ws))
                if ws in verses:
                    ouga.add((vs, ws))
                else:
                    ougr.add((vs, ws))
    incomingWorkTotal[name] = inct
    incomingWorkIntra[name] = inca
    incomingWorkInter[name] = incr
    outgoingWorkTotal[name] = ougt
    outgoingWorkIntra[name] = ouga
    outgoingWorkInter[name] = ougr

for name in verseNodesWork:
    print(f"{name:<10}: total: incoming: {len(incomingWorkTotal[name]):>3} outgoing: {len(outgoingWorkTotal[name]):>3}")
    print(f"{name:<10}: intra: incoming: {len(incomingWorkIntra[name]):>3} outgoing: {len(outgoingWorkIntra[name]):>3}")
    print(f"{name:<10}: inter: incoming: {len(incomingWorkInter[name]):>3} outgoing: {len(outgoingWorkInter[name]):>3}")

all       : total: incoming: 400 outgoing: 400
all       : intra: incoming:  64 outgoing:  64
all       : inter: incoming: 336 outgoing: 336
tiny      : total: incoming: 245 outgoing: 245
tiny      : intra: incoming:   8 outgoing:   8
tiny      : inter: incoming: 237 outgoing: 237
small     : total: incoming:   3 outgoing:   3
small     : intra: incoming:   0 outgoing:   0
small     : inter: incoming:   3 outgoing:   3
medium    : total: incoming: 152 outgoing: 152
medium    : intra: incoming:  56 outgoing:  56
medium    : inter: incoming:  96 outgoing:  96


Ah, the `crossref` edges are symmetric, so there are as many incoming as outgoing edges.
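
Symmetry can also be checked directly instead of inferred from equal counts. The sketch below uses made-up edges, and the helper name `mirror` is hypothetical: for a symmetric edge feature, swapping source and target of the outgoing edges should yield exactly the incoming edges.

```python
def mirror(edges):
    """Return the set of edges with source and target swapped."""
    return {(t, f) for (f, t) in edges}


# Hypothetical edges around the verse nodes 1 and 2 (not real crossref data)
outgoing = {(1, 7), (2, 1), (1, 2)}
incoming = {(7, 1), (1, 2), (2, 1)}

isMirrored = mirror(outgoing) == incoming
print(isMirrored)
```

With the sets gathered in the cell above, the analogous check would be `mirror(outgoingWorkTotal[name]) == incomingWorkTotal[name]`, which should hold for every `name` if `crossref` is stored symmetrically.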

### Compute edges from the volume data

Inside a volume we only see the intra-volume edges; they should coincide with the `incomingWorkIntra[volume]` edges.

First the number of edges:

In [42]:
incomingVolumeTotal = {}
outgoingVolumeTotal = {}

for name in volumes:
    A = As[name]
    F = A.api.F
    E = A.api.E

    verses = F.otype.s("verse")
    inct = set()
    ougt = set()

    for vs in verses:
        wvs = E.crossref.t(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                inct.add((ws, vs))

        wvs = E.crossref.f(vs)
        if wvs:
            for wv in wvs:
                ws = wv[0]
                ougt.add((vs, ws))
    incomingVolumeTotal[name] = inct
    outgoingVolumeTotal[name] = ougt

We have gathered the data.

Now we make the comparisons, first comparing number of edges, and then identity of edges, modulo mapping.

In [43]:
for name in volumes:
    A = As[name]
    F = A.api.F

    inVolTotal = incomingVolumeTotal[name]
    outVolTotal = outgoingVolumeTotal[name]
    inWorkIntra = incomingWorkIntra[name]
    outWorkIntra = outgoingWorkIntra[name]

    print(f"{name:<10}: total: incoming: {len(inVolTotal):>3} outgoing: {len(outVolTotal):>3}")
    eqamountIncoming = len(inWorkIntra) == len(inVolTotal)
    eqamountOutgoing = len(outWorkIntra) == len(outVolTotal)
    print(f"equal amount of incoming inter-edges as in work? {eqamountIncoming}")
    print(f"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}")
    inVolToWork = {(F.owork.v(f), F.owork.v(t)) for (f, t) in inVolTotal}
    outVolToWork = {(F.owork.v(f), F.owork.v(t)) for (f, t) in outVolTotal}
    sameIncoming = inWorkIntra == inVolToWork
    sameOutgoing = outWorkIntra == outVolToWork
    print(f"same incoming inter-edges as in work? {sameIncoming}")
    print(f"same outgoing inter-edges as in work? {sameOutgoing}")

tiny      : total: incoming:   8 outgoing:   8
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
small     : total: incoming:   0 outgoing:   0
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
medium    : total: incoming:  56 outgoing:  56
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True
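
Comparing "modulo mapping" means: translate both ends of every volume edge to work nodes before comparing the sets. A minimal sketch with made-up nodes follows. The helper `sameModuloMapping` is hypothetical, and a plain dict stands in for the lookup `F.owork.v` of the volume.

```python
def sameModuloMapping(volumeEdges, workEdges, owork):
    """Compare edge sets after translating volume nodes to work nodes."""
    translated = {(owork[f], owork[t]) for (f, t) in volumeEdges}
    return translated == workEdges


# Hypothetical data: volume node -> work node, and edges on both sides
owork = {1: 101, 2: 102, 3: 103}
volumeEdges = {(1, 2), (2, 1), (3, 1)}
workEdges = {(101, 102), (102, 101), (103, 101)}

result = sameModuloMapping(volumeEdges, workEdges, owork)
print(result)
```

Equal counts alone do not prove that the edges are the same. Only the translated comparison does.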


### Compute edges from collection data

The final test is whether the collection has the right edges.
When the collection was created, inter-volume edges were added on the basis of the `ointerto` and `ointerfrom` features
in the individual volumes.

Now we check whether that went well.

In [44]:
F = Ac.api.F
E = Ac.api.E

verses = F.otype.s("verse")
inct = set()
ougt = set()

for vs in verses:
    wvs = E.crossref.t(vs)
    if wvs:
        for wv in wvs:
            ws = wv[0]
            inct.add((ws, vs))

    wvs = E.crossref.f(vs)
    if wvs:
        for wv in wvs:
            ws = wv[0]
            ougt.add((vs, ws))

incomingCollectionTotal = inct
outgoingCollectionTotal = ougt

We have gathered the data.

Now we make the comparisons, first comparing number of edges, and then identity of edges, modulo mapping.

In [45]:
inColTotal = incomingCollectionTotal
outColTotal = outgoingCollectionTotal
inWorkIntra = incomingWorkIntra["all"]
outWorkIntra = outgoingWorkIntra["all"]

print(f"collection: total: incoming: {len(inColTotal):>3} outgoing: {len(outColTotal):>3}")
eqamountIncoming = len(inWorkIntra) == len(inColTotal)
eqamountOutgoing = len(outWorkIntra) == len(outColTotal)
print(f"equal amount of incoming inter-edges as in work? {eqamountIncoming}")
print(f"equal amount of outgoing inter-edges as in work? {eqamountOutgoing}")
inColToWork = {(F.owork.v(f), F.owork.v(t)) for (f, t) in inColTotal}
outColToWork = {(F.owork.v(f), F.owork.v(t)) for (f, t) in outColTotal}
sameIncoming = inWorkIntra == inColToWork
sameOutgoing = outWorkIntra == outColToWork
print(f"same incoming inter-edges as in work? {sameIncoming}")
print(f"same outgoing inter-edges as in work? {sameOutgoing}")

collection: total: incoming:  64 outgoing:  64
equal amount of incoming inter-edges as in work? True
equal amount of outgoing inter-edges as in work? True
same incoming inter-edges as in work? True
same outgoing inter-edges as in work? True


# Success!

The collection of inter-volume edges works!

# All steps

* **[start](start.ipynb)** your first step in mastering the bible computationally
* **[display](display.ipynb)** become an expert in creating pretty displays of your text structures
* **[search](search.ipynb)** turbo charge your hand-coding with search templates
* **[exportExcel](exportExcel.ipynb)** make tailor-made spreadsheets out of your results
* **[share](share.ipynb)** draw in other people's data and let them use yours
* **[export](export.ipynb)** export your dataset as an Emdros database
* **[annotate](annotate.ipynb)** annotate plain text by means of other tools and import the annotations as TF features
* **volumes** work with selected books only
* **[trees](trees.ipynb)** work with the BHSA data as syntax trees

CC-BY Dirk Roorda