<img align="right" src="images/dans-small.png"/>
<img align="right" src="images/tf-small.png"/>
<img align="right" src="images/etcbc.png"/>


# BHSA version mappings

In this notebook we map the nodes between all the extant versions of the BHSA dataset.

The resulting mappings can be used for writing version independent programs that process
the BHSA data.
Those programs can only be version independent to a certain extent, because
in general, node mappings between versions cannot be made perfect.

If one imagines what may change between versions, it seems intractable to make a device that overcomes
differences in the encoding of the text and its syntax.
However, we are dealing with versions of a very stable text, that is linguistically annotated by means
of a consistent method, so there is reason to be optimistic.
This notebook shows that this optimism is well-founded.

In another notebook,
[versionPhrases](versionPhrases.ipynb)
we show how one can use the mappings to analyze phrase encodings across versions of the data.

# Overview
We create the mappings in two distinct stages, each being based on a particular insight, and dealing with
a set of difficult cases.

* [Slot nodes](#Slot-nodes): we restrict ourselves to the *slot* nodes,
  the nodes that correspond to the individual words;
* [Nodes in general](#Nodes-in-general): we extend the slot mapping in a generic way to
  a mapping between all nodes.
  Those other nodes are the ones that correspond to higher level textual objects, such as phrases, clauses,
  sentences.

This is a big notebook; here are links to some locations in the computation.

* [start of the computation](#Computing)
* [start of making slot mappings](#Making-slot-mappings)
* [start of expanding them to node mappings](#Extending-to-node-mappings)

## Nodes, edges, mappings

In the
[text-fabric data model](https://github.com/Dans-labs/text-fabric/wiki/Data-model),
nodes correspond to the objects in the text and its syntax, and edges correspond to relationships between
those objects.
Normally, these edges are **intra**-dataset: they are between nodes in the same dataset.

Now, each version of the BHSA in text-fabric is its own dataset.
The mappings between nodes of one version and corresponding nodes in another version are
**inter**-dataset edges.

Nodes in text-fabric are abstract, they are just numbers,
starting with 1 for the first slot (word),
increasing by one for each slot up to the last slot,
and then just continuing beyond that for the non-slot nodes.

So an edge is just a mapping between numbers, and it is perfectly possible to have just any mapping
between numbers in a dataset.

We store mappings as ordinary TF edge features, so you can use the mapping in both ways, by

```
nodesInVersion4 = Es('omap@3-4').f(nodeInVersion3)
nodesInVersion3 = Es('omap@3-4').t(nodeInVersion4)
```

respectively.
When one version supersedes another, we store the mapping between the older and newer version
as an edge in the new version, leaving the older version untouched.

We store the node mapping with a bit more information than the mere correspondence between nodes.
We also add an integer to each correspondence which indicates how problematic that correspondence is.

If the correspondence is perfect, we do not add any value.
If it is a simple discrepancy, confined to an equal number of slots in both versions, we add value `0`.
If the discrepancy is more complicated, we add a higher number.
The details of this will follow.
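
As an illustration of how a program might use these values, here is a minimal sketch in plain Python. It assumes that looking up a mapping yields `(node, value)` pairs, with `None` standing for "no value". The helper name `bestCounterpart` and the preference for the smallest absolute value are assumptions made for this example; they are not part of the notebook.

```python
def bestCounterpart(pairs):
    """Pick the least problematic counterpart from (node, value) pairs.

    Hypothetical helper: a missing value (None) counts as a perfect
    correspondence; otherwise a smaller absolute value is preferred.
    """
    if not pairs:
        return None

    def rank(pair):
        (node, value) = pair
        return (0, 0) if value is None else (1, abs(value))

    return min(pairs, key=rank)[0]


print(bestCounterpart([(1005, 2), (1004, 0)]))
print(bestCounterpart([(1005, 0), (1004, None)]))
```

Whether the absolute value is the right ordering depends on how you want to weigh splits against collapses; adapt the ranking to your own needs.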

# Slot nodes

The basic idea in creating a slot mapping is to walk through the slots of both versions in parallel,
and upon encountering a difference, to take one of a few prescribed actions, that may lead to catching up
slots in one of the two versions.

The standard behaviour is to stop at each difference encountered, unless the difference conforms
to a "predefined" case. When there is no match, the user may add a case to the list of cases.
Sometimes a new, systematic kind of case is needed, and adding that requires programming.

This notebook shows the patterns and the very small lists of cases that were needed to do the job for 4
version transitions, each corresponding to 1 year or more of encoding activity.

## Differences

When we compare versions, our aim is not to record all differences in general, but to record
the correspondence between the slots of the versions, and exactly where and how this
correspondence is disturbed.

We use the lexeme concept as an anchor point for the correspondence.
If we compare the two versions, slot by slot, and as long as we encounter the same lexemes,
we have an undisturbed correspondence.
In fact, we relax this a little bit, because the very concept of lexeme might change between versions.
So we reduce the information in lexemes considerably, before we compare them, so that we
do not get disturbed by petty changes.

As long as the correspondence is undisturbed, we just create an edge from the current slot in the one version
to the current slot in the other version,
and we assign no value to such an edge.

But eventually, we encounter real disturbances.
They manifest themselves in just a few situations:

1. ![1](diffs/diffs.001.png)
2. ![2](diffs/diffs.002.png)
3. ![3](diffs/diffs.003.png)

In full generality, we can say:
$i$ slots in the source version $V$ correspond
to $j$ slots in the target version $W$,
where $i$ and $j$ may be 0, but not at the same time:

1. ![4](diffs/diffs.004.png)

If $i$ slots in version $V$, starting at $n$
get replaced by $j$ slots in the version $W$, starting at $m$,
we create edges between all $n, ..., n+i-1$ on the one hand
and all $m, ..., m+j-1$ on the other hand,
and associate them all with the same number $j-i$.
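
The general rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the code that the notebook runs; the function name `slotEdges` and the dictionary representation of edges are assumptions made for the example.

```python
def slotEdges(n, i, m, j):
    """Edges for i slots in V (from n) replaced by j slots in W (from m).

    Returns a dict keyed by (source slot, target slot) with value j - i.
    If i or j is 0, one side has no slots, hence there are no edges.
    """
    value = j - i
    return {
        (s, t): value
        for s in range(n, n + i)
        for t in range(m, m + j)
    }


print(slotEdges(10, 1, 10, 1))  # one slot replaced by one slot
print(slotEdges(10, 1, 10, 3))  # one slot replaced by three slots
print(slotEdges(10, 2, 10, 1))  # two slots replaced by one slot
print(slotEdges(10, 1, 10, 0))  # one slot replaced by nothing
```
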

But so far, it turns out that the only things we have to deal with,
are specific instances of 1, 2, and 3 above.

We have a closer look at those cases.

### Lexeme change
When a lexeme changes at a particular spot $n, m$,
we have $i=j=1$, leading to exactly one edge $(n, m)$ with value $0$.

### Slot splitting
When slot $n\in V$ splits into $m, ..., m+j \in W$, we create edges from $n$ to each of the $m, ..., m+j$,
each carrying the number $j$. The larger $j$ is,
the greater the dissimilarity between node $n\in V$
and each of the $m, ..., m+j \in W$.

### Slot collapse
When slots $n, ..., n+i \in V$ collapse into $m\in W$, we create edges from each of the $n, ..., n+i$ to $m$,
each carrying the number $-i$. The larger $i$ is,
the greater the dissimilarity between the nodes $n, ..., n+i\in V$
and $m \in W$.

### Slot deletion
When slot $n$ is deleted from $V$, we have $i=1, j=0$, leading to zero edges from $n$.
But so far, we have not encountered this case.

### Slot addition
When slot $m$ is added to $W$, we have $i=0, j=1$, again leading to zero edges to $m$.
But so far, we have not encountered this case.

# Nodes in general
The basic idea we use for the general case is that nodes are linked to slots.
In text-fabric, the standard `oslots` edge feature lists for each non-slot node the slots it is linked to.

Combining the just created slot mappings between versions and the `oslots` feature,
we can extend the slot mapping into a general node mapping.

In order to map a node $n$ in version $V$, we look at its slots $s$,
use the already established *slot mapping* to map these to slots $t$ in version $W$,
and collect the nodes $m$ in version $W$ that are linked to those $t$.
They are good candidates for the mapping.

![5](diffs/diffs.005.png)
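
The candidate search can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: `oslotsV` and `oslotsW` stand in for the `oslots` information of both versions (restricted to nodes of one type), `slotMap` stands in for the slot mapping, and the function name `findCandidates` is made up for this example.

```python
import collections


def findCandidates(n, oslotsV, slotMap, oslotsW):
    """Collect nodes in W that share at least one mapped slot with n.

    oslotsV, oslotsW: dict from node to the set of slots it is linked to
    slotMap: dict from a slot in V to the set of corresponding slots in W
    Returns (mapped slots, set of candidate nodes in W).
    """
    # invert oslotsW: for each slot in W, the nodes linked to it
    # (a real program would build this index once, not on every call)
    nodesAtSlot = collections.defaultdict(set)
    for (node, slots) in oslotsW.items():
        for t in slots:
            nodesAtSlot[t].add(node)

    # map the slots of n to slots in W
    mapped = set()
    for s in oslotsV[n]:
        mapped |= set(slotMap.get(s, ()))

    # every node in W that touches a mapped slot is a candidate
    found = set()
    for t in mapped:
        found |= nodesAtSlot.get(t, set())
    return (mapped, found)


# toy data: slot 3 in V has been split into slots 3 and 4 in W
oslotsV = {100: {1, 2, 3}}
slotMap = {1: {1}, 2: {2}, 3: {3, 4}}
oslotsW = {200: {1, 2, 3, 4}, 201: {5, 6}}
print(findCandidates(100, oslotsV, slotMap, oslotsW))
```
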

# Refinements

When we try to match nodes across versions, based on slot containment, we also respect
their `otype`s. So we will not try to match a `clause` to a `phrase`.
We make implicit use of the fact that for most `otype`s, the members contain disjoint slots.

# Multiple candidates
Of course, things are not always as neat as in the diagram. Textual objects may have split, or shifted,
or collapsed.
In general we find 0 or more candidates.
Even if we find exactly one candidate, it does not have to be a perfect match.
A typical situation is this:

![6](diffs/diffs.006.png)

We do not find a node $m\in W$ that occupies the mapped slots exactly.
Instead, we find that the target area is split between two candidates who
also reach outside the target area.

In such cases, we make edges to all such candidates, but we add a dissimilarity measure.
If $m$ is the collection of slots, mapped from $n$, and $m_1$ is a candidate for $n$, meaning $m_1$ has
overlap with $m$, then the *dissimilarity* of $m_1$ is defined as:

$$|m_1\cup m| - |m_1\cap m|$$

In words: the number of slots in the union of $m_1$ and $m$ minus the number of slots in their intersection.

In other words: $m_1$ gets a penalty for

* each slot $s\in m_1$ that is not in the mapped slots $m$;
* each mapped slot $t\in m$ that is not in $m_1$.

If a candidate occupies exactly the mapped slots, the dissimilarity is 0.
If there is only one such candidate of the right type, the case is completely clear, and we
do not add a dissimilarity value to the edge.

If there are more candidates, all of them will get an edge, and those edges will contain the dissimilarity
value, even if that value is $0$.
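
A small sketch of the measure and of the decision whether an edge gets a value. It is an illustration, not the notebook's implementation: the names `slotDissimilarity` and `edgeValues` are made up, and the sketch assumes the reading that only a single candidate with dissimilarity 0 gets an edge without a value.

```python
def slotDissimilarity(candidateSlots, mappedSlots):
    """Number of slots in the union minus the number in the intersection."""
    return len(candidateSlots | mappedSlots) - len(candidateSlots & mappedSlots)


def edgeValues(mappedSlots, candidates):
    """Decide which value, if any, the edge to each candidate gets.

    candidates: dict from candidate node to its set of slots
    Returns a dict from candidate node to a value, where None means
    that the edge carries no value.
    """
    values = {
        node: slotDissimilarity(slots, mappedSlots)
        for (node, slots) in candidates.items()
    }
    if len(values) == 1:
        (node, value) = next(iter(values.items()))
        if value == 0:
            # a single, perfect candidate: the case is completely clear
            return {node: None}
    return values


mapped = {3, 4, 5, 6}
print(edgeValues(mapped, {200: {2, 3, 4}, 201: {5, 6, 7}}))
print(edgeValues(mapped, {300: {3, 4, 5, 6}}))
```

In the first call both candidates cover two of the four mapped slots and reach one slot outside them, so each gets 5 - 2 = 3.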


# Subphrases
The most difficult type to handle in our dataset is the `subphrase`,
because they nest and overlap.
But it turns out that the dissimilarity measure almost always helps out: when looking for candidates
for a mapped subphrase, usually one of them has a dissimilarity of 0.
That's the real counterpart.

# Reporting
We report the success in establishing the match between non-slot nodes.
We do so per node type, and for each node type we list a few statistics,
both in absolute numbers and in percentage of the total amount of nodes of that
type in the source version.

We count the nodes that fall in each of the following cases.
The list of cases is ordered by decreasing success of the mapping.

1. **unique, perfect**: there is only one match for the mapping and it is a perfect one in terms
   of slots linked to it;
2. **multiple, one perfect**: there are multiple matches, but at least one is perfect; this occurs
   typically if nodes of a type are linked to nested and overlapping sequences of slots, such as `subphrase`s;
3. **unique, imperfect**: there is only one match, but it is not perfect; this indicates that some
   boundary reorganization has happened between the two versions, and that some slots of the source node
   have been cut off in the target node; yet the fact that the source node and the
   target node correspond is clear;
4. **multiple, cleanly composed**: in this case the source node corresponds to a bunch of matches, that
   together cleanly cover the mapped slots of the source node; in other words: the original node
   has been split in several parts;
5. **multiple, non-perfect**: all remaining cases where there are matches; these situations can be the
   result of more intrusive changes; if it turns out to be a small set they do require closer inspection;
6. **not mapped**: these are nodes for which no match could be found.
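
The six cases can be sketched as a classification function. This is a hedged illustration: the name `classify` is made up, and the test for "cleanly composed" (candidates are pairwise disjoint and together cover exactly the mapped slots) is an assumption about what that case means.

```python
def classify(mappedSlots, candidates):
    """Name the reporting case for one source node.

    mappedSlots: set of slots in W that the slots of the source node map to
    candidates: dict from candidate node in W to its set of slots
    """
    if not candidates:
        return "not mapped"
    perfect = [c for (c, slots) in candidates.items() if slots == mappedSlots]
    if len(candidates) == 1:
        return "unique, perfect" if perfect else "unique, imperfect"
    if perfect:
        return "multiple, one perfect"
    # assumption: "cleanly composed" means that the candidates are pairwise
    # disjoint and together cover exactly the mapped slots
    covered = set()
    total = 0
    for slots in candidates.values():
        covered |= slots
        total += len(slots)
    if covered == mappedSlots and total == len(covered):
        return "multiple, cleanly composed"
    return "multiple, non-perfect"


print(classify({1, 2, 3}, {10: {1, 2}, 11: {3}}))
```
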

# Computing

In [2]:
import os
import collections
from functools import reduce
from utils import caption
from tf.fabric import Fabric

We specify our versions and the subtle differences between them as far as they are relevant.

In [23]:
REPO = os.path.expanduser("~/github/etcbc/bhsa")
baseDir = "{}/tf".format(REPO)
tempDir = "{}/_temp".format(REPO)

versions = """
    3
    4
    4b
    2016
    2017
    c
""".strip().split()

# work only with selected versions
# remove this if you want to work with all versions
versions = """
    c
    2021
""".strip().split()

versionInfo = {
    "": dict(
        OCC="g_word",
        LEX="lex",
    ),
    "3": dict(
        OCC="text_plain",
        LEX="lexeme",
    ),
}

Load all versions in one go!

In [24]:
TF = {}
api = {}
for v in versions:
    for (param, value) in versionInfo.get(v, versionInfo[""]).items():
        globals()[param] = value
    caption(4, "Version -> {} <- loading ...".format(v))
    TF[v] = Fabric(locations="{}/{}".format(baseDir, v), modules=[""])
    api[v] = TF[v].load("{} {}".format(OCC, LEX))  # noqa F821
caption(4, "All versions loaded")

..............................................................................................
.      3m 01s Version -> c <- loading ...                                                    .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

114 features found and 0 ignored
  0.00s loading features ...
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
    17s All features loaded/computed - for details use TF.isLoaded()
..............................................................................................
.      3m 18s Version -> 2021 <- loading ...                                                 .
..............................................................................................
This is Text-Fabric 9.1.6
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

We want to switch easily between the APIs for the versions.

In [25]:
def activate(v):
    for (param, value) in versionInfo.get(v, versionInfo[""]).items():
        globals()[param] = value
    api[v].makeAvailableIn(globals())
    caption(4, "Active version is now -> {} <-".format(v))

Inspect the number of slots in all versions.

In [26]:
nSlots = {}
for v in versions:
    activate(v)
    nSlots[v] = F.otype.maxSlot
    caption(0, "\t {} slots".format(nSlots[v]))

..............................................................................................
.      3m 43s Active version is now -> c <-                                                  .
..............................................................................................
|      3m 43s 	 426584 slots
..............................................................................................
.      3m 43s Active version is now -> 2021 <-                                               .
..............................................................................................
|      3m 43s 	 426590 slots


# Method

When we compare two versions, we inspect the lexemes found at corresponding positions in the versions.
We start at the beginning, and when the lexemes do not match, we have a closer look.

However, in order not to be disturbed by minor discrepancies in the lexemes, we mask the lexemes: we
apply a few transformations to them, such as removing alephs and wavs, and finally even turning them into
alphabetically sorted sets of letters, thereby losing the original order and multiplicity of the letters.
We also strip the disambiguation marks.
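
To get a feel for what masking does, here is a reduced sketch that applies only three of the transformations. The function name `simpleMask` and the lexeme strings are made up for this example; the full list of transformations used in this notebook is defined in the section on lexeme masking.

```python
def simpleMask(lex):
    """Reduced masking: a subset of the transformations, for illustration only."""
    lex = lex.rstrip("[/=")                      # strip disambiguation marks
    lex = lex.replace(">", "").replace("W", "")  # remove alephs and wavs
    return "".join(sorted(set(lex)))             # ignore order and multiplicity


# two spellings that differ only by a wav end up with the same mask
print(simpleMask("MLKWT/"), simpleMask("MLKT/"))
# a repeated letter is counted only once
print(simpleMask("BBL/"))
```
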

We maintain a current mapping between the slots of the two versions, and we update it if we encounter
disturbances.
Initially, this map is the identity map.

What we encounter as remaining differences boils down to the following:

* a lexeme is split into two lexemes with the same total material, typically involving `H`, `MN`, or `B`
* the lexeme is part of a special case, listed in the `cases` table (which has been found by repeatedly
  chasing the first remaining difference)
* the two lexemes differ, but that's it, no map updates have to be done.

The first two types of cases can be solved by splitting a lexeme into `k` parts or combining `k` lexemes into one.
After that the mapping has to be shifted to the right or to the left from a certain point onwards.
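
The shifting can be sketched as two small helpers that update a slot mapping in place. They are modelled on the update steps in the algorithm further down in this notebook, but the function names `applySplit` and `applyCollapse` and the explicit `maxSlot` argument are assumptions made for this example.

```python
def applySplit(mapping, n, extra, maxSlot):
    """Slot n in version 1 corresponds to 1 + extra slots in version 2.

    All slots after n shift `extra` positions to the right.
    """
    for m in range(n + 1, maxSlot + 1):
        mapping[m] += extra


def applyCollapse(mapping, n, fewer, maxSlot):
    """Slots n .. n + fewer in version 1 correspond to one slot in version 2.

    All slots after the collapsed stretch shift `fewer` positions to the left.
    """
    # walk downwards so that we only read values that are not yet overwritten
    for m in range(maxSlot, n + fewer, -1):
        mapping[m] = mapping[m - fewer]
    # the collapsed slots all point to the same target slot as n
    for m in range(n + 1, n + fewer + 1):
        mapping[m] = mapping[n]


splitMap = {s: s for s in range(1, 7)}
applySplit(splitMap, 2, 1, 6)
print(splitMap)

collapseMap = {s: s for s in range(1, 7)}
applyCollapse(collapseMap, 2, 1, 6)
print(collapseMap)
```
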

The loop then is as follows:

* find the first slot with a lexeme in the first version that is different from the lexeme at the mapped slot
  in the second version
* analyze what is the case:
  * if the disturbance is recognized on the basis of existing patterns and cases, update the map and
    consider this case solved
  * if the disturbance is not recognized, the case is unsolved, and we break out of the loop.
    More analysis is needed, and the outcome of that has to be coded as an extra pattern or case.
* if the status is solved, go back to the first step

We end up with a mapping from the slots of the first version to those of the other version that links
slots with approximately equal lexemes together.
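
A toy version of this loop on two short lists of lexemes may help to see the mechanics. It is a deliberately simplified sketch: it compares unmasked lexemes, it only knows splits listed in a hypothetical `known` dictionary, and the name `toyDiffs` is made up. The real loop in this notebook handles more kinds of cases.

```python
def toyDiffs(lexV, lexW, known=None):
    """Toy version of the loop on two lists of lexemes (slot 1 is index 0).

    known: dict from a slot in V to the number of extra slots in W
    Returns (mapping, solved): mapping sends slots of V to slots of W,
    solved tells whether all disturbances were recognized.
    Assumes the mapped positions stay within lexW.
    """
    known = known or {}
    mapping = {s: s for s in range(1, len(lexV) + 1)}
    start = 1
    while True:
        # first slot from start whose lexeme differs from the mapped lexeme
        n = next(
            (
                s
                for s in range(start, len(lexV) + 1)
                if lexV[s - 1] != lexW[mapping[s] - 1]
            ),
            None,
        )
        if n is None:
            return (mapping, True)
        if n not in known:
            return (mapping, False)  # unrecognized: more analysis is needed
        # recognized: shift everything after n and continue after n
        for m in range(n + 1, len(lexV) + 1):
            mapping[m] += known[n]
        start = n + 1


lexV = ["A", "HB", "C"]
lexW = ["A", "H", "B", "C"]
print(toyDiffs(lexV, lexW, known={2: 1}))
print(toyDiffs(lexV, lexW))
```
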

# Making slot mappings
## Lexeme masking
We start by defining our masking function, and compile lists of all lexemes and masked lexemes for all versions.

In [27]:
masks = [
    (lambda lex: lex.rstrip("[/="), "strip disambiguation"),
    (lambda lex: lex[0:-2] if lex.endswith("JM") else lex, "remove JM"),
    (lambda lex: lex[0:-2] if lex.endswith("WT") else lex, "remove WT"),
    (lambda lex: lex.replace("J", ""), "remove J"),
    (lambda lex: lex.replace(">", ""), "remove Alef"),
    (lambda lex: lex.replace("W", ""), "remove W"),
    (lambda lex: lex.replace("Z", "N"), "identify Z and N"),
    (lambda lex: lex.rstrip("HT"), "strip HT"),
    (
        lambda lex: ("".join(sorted(set(lex)))) + "_" * lex.count("_"),
        "ignore order and multiplicity",
    ),
]


def mask(lex, trans=None):
    """Apply a single masking operation or apply them all.
    
    Parameters
    ----------
    lex: string
        The text of the lexeme
    trans: integer, optional `None`
        If given, it is an index in the `masks` list, and the corresponding
        mask transformation will be applied to `lex`.
        If `None`, all transformations in the `masks` list will be applied in that order.
        
    Returns
    -------
    string
        The result of transforming `lex`
    """
    if trans is not None:
        return masks[trans][0](lex)
    for (fun, desc) in masks:
        lex = fun(lex)
    return lex

Carry out the lexeme masking for all versions.

In [28]:
lexemes = {}

caption(4, "Masking lexemes")
for v in versions:
    activate(v)
    lexemes[v] = collections.OrderedDict()
    for n in F.otype.s("word"):
        lex = Fs(LEX).v(n)  # noqa F821
        lexemes[v][n] = (lex, mask(lex, trans=0), mask(lex))
caption(0, "Done")

..............................................................................................
.      3m 48s Masking lexemes                                                                .
..............................................................................................
..............................................................................................
.      3m 48s Active version is now -> c <-                                                  .
..............................................................................................
..............................................................................................
.      3m 50s Active version is now -> 2021 <-                                               .
..............................................................................................
|      3m 51s Done


Now for each version `v`, `lexemes[v]` is a mapping from word nodes `n` 
to lexeme information of the word at node `n`.
The lexeme information is a tuple with members

*   **fullLex** the full disambiguated lexeme
*   **lex** the lexeme without the disambiguation marks
*   **maskedLex** the fully transformed lexeme

# Cases and mappings
In `cases` we store special cases that we stumbled upon.
Every time we encountered a disturbance which did not follow a recognized pattern,
we turned it into a case.
The number is the slot number in the first version where the case will be applied.
Cases will only be applied at these exact slot numbers and nowhere else.

In `mappings` we build a mapping between corresponding nodes across a pair of versions.
At some of those correspondences there are disturbances; there we add a measure of the
dissimilarity to the mapped pair.

Later, we extend those slot mappings to *node* mappings, which are maps between versions where
*all* nodes get mapped, not just slot nodes.
We deliver those node mappings as formal edges in TF.
Then these edges will be added in the second version, so that each newer version knows
how to link to the previous version.
We build the node maps in `edges`.

We store the dissimilarities in a separate dictionary, `dissimilarity`.

All these dictionaries are keyed by a 2-tuple of versions.

In [29]:
cases = {}
mappings = {}
dissimilarity = {}
edges = {}

# Algorithm

Here is the code that directly implements the method.
Every pair of distinct versions can be mapped.
We store the mappings in a dictionary, keyed by tuples like `(4, 4b)`,
for the mapping from version `4` to `4b`, for instance.

The loop is in `doDiffs` below.

In [30]:
def inspect(v, w, start, end):
    """Helper function for inspecting the situation in a given range of slots.
    
    Parameters
    ----------
    v: string
        First version
    w: string
        Second version
    start: integer
        Slot number (in first version) where we start the inspection.
    end: integer
        Slot number (in first version) where we end the inspection.
        
    Returns
    -------
    None
        The situation will be printed as a table with a row for each slot
        and columns:
        slot number in version 1,
        lexeme of that slot in version 1,
        lexeme of the corresponding slot in version 2
    """
    mapKey = (v, w)
    mapping = mappings[mapKey]
    version1Info = versionInfo.get(v, versionInfo[""])
    version2Info = versionInfo.get(w, versionInfo[""])
    
    for slot in range(start, end):
        print(
            "{:>6}: {:<8} {:<8}".format(
                slot,
                api[v].Fs(version1Info["LEX"]).v(slot),
                api[w].Fs(version2Info["LEX"]).v(mapping[slot]),
            )
        )


def inspect2(v, w, slot, k):
    """Helper function for inspecting the edges in a given range of slots.
    
    Not used, currently.
    
    Parameters
    ----------
    v: string
        First version
    w: string
        Second version
    slot: integer
        Slot number (in first version) in the center of the inspection
    k: integer
        Amount of slots left and right from the center where we inspect.
        
    Returns
    -------
    None
        The situation will be printed as a table with a row for each slot
        and columns:
        slot number in version 1,
        the edge at that slot number, or X if there is no edge
    """
    mapKey = (v, w)
    edge = edges[mapKey]
    for i in range(slot - k, slot + k + 1):
        print(f"EDGE {i} =>", edge.get(i, "X"))


def firstDiff(v, w, start):
    """Find the first discrepancy after a given position.
    
    First we walk quickly through the slots of the first version,
    until we reach the starting position.
    
    Then we continue walking until the current slot is either
    
    *   a special case
    *   a discrepancy
    
    Parameters
    ----------
    v: string
        First version
    w: string
        Second version
    start: integer
        start position
    
    Returns
    -------
    int or None
        If there is no discrepancy, None is returned,
        otherwise the position of the first discrepancy.
    """
    mapKey = (v, w)
    mapping = mappings[mapKey]
    theseCases = cases.get(mapKey, {})

    fDiff = None
    for (slot, (lex1, bareLex1, maskedLex1)) in lexemes[v].items():
        if slot < start:
            continue
        maskedLex2 = lexemes[w][mapping[slot]][2]
        if slot in theseCases or maskedLex1 != maskedLex2:
            fDiff = slot
            break
    return fDiff


def printDiff(v, w, slot, k):
    """Prints the situation around a discrepancy.
    
    We also show phrase atom boundaries.
    We show the bare lexemes in the display, not the masked lexemes.
    
    Parameters
    ----------
    v: string
        First version
    w: string
        Second version
    slot: integer
        position of the discrepancy
    k: integer
        amount of slots around the discrepancy to include in the display
        
    Returns
    -------
    None
        A plain text display of the situation around the discrepancy is printed.
    """
    
    mapKey = (v, w)
    mapping = mappings[mapKey]
    comps = {}
    prevChunkV = None
    prevChunkW = None
    
    # gather the comparison material in comps
    # which has as keys the versions and as value a list of display items
    
    for i in range(slot - k, slot + k + 1):
        # determine if we are at a phrase atom boundary in version 1
        chunkV = None if i not in mapping else api[v].L.u(i, otype="phrase_atom")
        boundaryV = prevChunkV is not None and prevChunkV != chunkV
        prevChunkV = chunkV
        # determine if we are at the actual discrepancy in version 1
        currentV = i == slot

        # determine if we are at a phrase atom boundary in version 2
        j = mapping.get(i, None)
        chunkW = None if j is None else api[w].L.u(j, otype="phrase_atom")
        boundaryW = prevChunkW is not None and prevChunkW != chunkW
        prevChunkW = chunkW
        # determine if we are at the actual discrepancy in version 2
        currentW = j == mapping[slot]

        lvTuple = lexemes[v].get(i, None)
        lwTuple = None if j is None else lexemes[w].get(j, None)
        lv = "□" if lvTuple is None else lvTuple[1] # bare lexeme
        lw = "□" if lwTuple is None else lwTuple[1] # bare lexeme

        comps.setdefault(v, []).append((lv, currentV, boundaryV))
        comps.setdefault(w, []).append((lw, currentW, boundaryW))
        
    # turn the display items into strings and store them in rep
    # which is also keyed by the versions
    
    rep = {}
    for version in comps:
        rep[version] = printVersion(version, comps[version])

    # compose the display out of the strings per version
    # and make a header of sectional information and slot positions
    
    print(
        """{} {}:{} ==> slot {} ==> {}
    {}
    {}
""".format(
            *api[v].T.sectionFromNode(slot),
            slot,
            mapping[slot],
            rep[v],
            rep[w],
        )
    )


def printVersion(v, comps):
    """Generate a string displaying a stretch of lexemes around a position.
    
    Parameters
    ----------
    comps: list of tuple
        For each slot there is a comp tuple consisting of
        
        *   the bare lexeme
        *   whether the slot is in the discrepancy position
        *   whether the slot is at a phrase atom boundary
        
    Returns
    -------
    string
        A sequence of lexemes with boundary characters in between.
    """
    
    rep = ""
    for (lex, isCurrent, boundary) in comps:
        rep += "┫┣" if boundary else "╋"
        rep += f"▶{lex}◀" if isCurrent else lex
    rep += "╋"
    return rep

# doDiffs

This function contains the loop to walk through all differences.

We walk from discrepancy to discrepancy, and stop if there are no more discrepancies or when we
have reached an artificial upper boundary of discrepancies.

We try to solve the discrepancies.
If we hit a discrepancy that we cannot solve, we break out of the loop too.

## MAX_ITER

The artificial limit is `MAX_ITER`.
You determine it experimentally.
Keep it low at first, when you are meeting the initial discrepancies.
When you have dealt with them and discover that you can deal with that number of discrepancies,
increase the limit.

## Cases

We will encounter discrepancies, and we will learn how to solve them.
There are some generic ways of solving them, and these we collect in a dictionary of cases.

The keys of the cases are either slot positions or lexemes.

When the algorithm walks through the corpus, it will consider slots
whose number or whose lexeme is in the cases as solved.

The value of a case is a tuple consisting of

*   the name of an *action*
*   a parameter

Here are the actions

key | action | parameters | description
--- | --- | --- | ---
slot | `ok` | `None` | the discrepancy is ok, nothing to worry about; we set the dissimilarity to 0, which is worse than `None`
slot | `split` | `n` integer | split the lexeme in version 1 into `n` extra slots in version 2; set the dissimilarity to `n`
slot | `collapse` | `n` integer | collapse `n` lexemes in version 1 into one lexeme in version 2; dissimilarity `-n`
lex | `ok` | `alt` string | the discrepancy is ok if version 2 has *alt* instead of *lex*; dissimilarity set to 0
lex | `split` | `n` integer | split *lex* in version 1 into `n` extra slots in version 2; set the dissimilarity to `n`

If a discrepancy falls through all these special cases, we have a few generic rules that will also be applied:

* if a lexeme in version 1 contains `_`, we split on it and treat the parts as separate lexemes.
  In fact, we perform the action `split` with as parameter the number of `_` separators, i.e. the number of extra slots.
* if the lex in version 1 equals the lex in version 2 plus the next lex in version 2, and if the lex in version 2 is `H`,
  we split the lex in version 1 into that `H` and the rest.
* if the set of letters in the masked lexeme in version 1 is the union of the sets of the corresponding masked lexeme
  in version 2 plus that of the next lexeme in version 2, and if the corresponding lexeme in version 2 is either `B` or `MN`,
  we split the lex in version 1 into that `B` or `MN` and the rest.
  
Note that these rules are very corpus dependent, and have been distilled from experience with the BHSA versions involved.
If you are in the process of applying this algorithm to other corpora, you can leave out these rules, and add your
own depending on what you encounter.
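
For illustration, the cases for one pair of versions might be laid out as follows. All entries are hypothetical: the slot numbers and lexemes are made up and do not come from the BHSA, and the name `exampleCases` is chosen so that it does not interfere with the `cases` dictionary of this notebook.

```python
# hypothetical entries, for illustration only: the slot numbers and
# lexemes below are made up and do not come from the BHSA
exampleCases = {
    ("4", "4b"): {
        1234: ("ok", None),        # incidental lexeme variation at this slot
        5678: ("split", 2),        # this slot gets 2 extra slots in version 2
        9012: ("collapse", 1),     # 1 slot fewer in version 2
        "ABC/": ("ok", "ABD/"),    # systematic variation: ABD/ instead of ABC/
        "DEF_GHI/": ("split", 1),  # systematic split into 1 extra slot
    },
}

# list the cases, telling slot keys (integers) from lexeme keys (strings)
for (key, (action, param)) in exampleCases[("4", "4b")].items():
    kind = "slot" if isinstance(key, int) else "lexeme"
    print(f"{kind:<6} {key!r:<12} {action:<8} {param!r}")
```
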

In [31]:
MAX_ITER = 250


def doDiffs(v, w):
    mapKey = (v, w)
    
    thisDissimilarity = {}
    dissimilarity[mapKey] = thisDissimilarity
    
    thisMapping = dict(((n, n) for n in api[v].F.otype.s("word")))
    mappings[mapKey] = thisMapping
    
    theseCases = cases.get(mapKey, {})

    iteration = 0
    start = 1

    solved = True

    while True:
        # try to find the next difference from where you are now
        n = firstDiff(v, w, start)

        if n is None:
            print(f"No more differences.\nFound {iteration} points of disturbance")
            break

        if iteration > MAX_ITER:
            print("There might be more disturbances: increase MAX_ITER")
            break

        iteration += 1
        
        # there is a difference: we have to do work
        # we print it as a kind of logging
        
        printDiff(v, w, n, 5)

        # we try to solve the discrepancy
        # first we gather the information about the lexemes at this position in both versions
    
        (lex1, bareLex1, maskedLex1) = lexemes[v][n]
        (lex2, bareLex2, maskedLex2) = lexemes[w][thisMapping[n]]
        
        # and at the next position
        
        (lex1next, bareLex1next, maskedLex1next) = lexemes[v][n + 1]
        (lex2next, bareLex2next, maskedLex2next) = lexemes[w][thisMapping[n + 1]]

        # the discrepancy is not solved unless we find it in a case or in a rule
        solved = None
        skip = 0
        
        # first check the explicit cases
        
        if n in theseCases:
            (action, param) = theseCases[n]
            if action == "collapse":
                plural = "" if param == 1 else "s"
                solved = f"{action} {param} fewer slot{plural}"
                thisDissimilarity[n] = -param
                skip = param
                for m in range(api[v].F.otype.maxSlot, n + param, -1):
                    thisMapping[m] = thisMapping[m - param]
                for m in range(n + 1, n + param + 1):
                    thisMapping[m] = thisMapping[n]
            elif action == "split":
                plural = "" if param == 1 else "s"
                solved = f"{action} into {param} extra slot{plural}"
                thisDissimilarity[n] = param
                for m in range(n + 1, api[v].F.otype.maxSlot + 1):
                    thisMapping[m] = thisMapping[m] + param
            elif action == "ok":
                solved = "incidental variation in lexeme"
                thisDissimilarity[n] = 0
        elif lex1 in theseCases:
            (action, param) = theseCases[lex1]
            if action == "ok":
                if lex2 == param:
                    solved = "systematic variation in lexeme"
                    thisDissimilarity[n] = 0
            elif action == "split":
                plural = "" if param == 1 else "s"
                solved = f"systematic {action} into {param} extra slot{plural}"
                thisDissimilarity[n] = param
                for m in range(n + 1, api[v].F.otype.maxSlot + 1):
                    thisMapping[m] = thisMapping[m] + param
                    
        # then try some more general rules
        
        elif "_" in lex1:
            action = "split"
            param = lex1.count("_")
            plural = "" if param == 1 else "s"
            solved = f"{action} on _ into {param} extra slot{plural}"
            thisDissimilarity[n] = param
            for m in range(n + 1, api[v].F.otype.maxSlot + 1):
                thisMapping[m] = thisMapping[m] + param
        elif lex1 == lex2 + lex2next:
            if lex2 == "H":
                solved = "split article off"
                thisDissimilarity[n] = 1
                for m in range(n + 1, api[v].F.otype.maxSlot + 1):
                    thisMapping[m] = thisMapping[m] + 1
        elif set(maskedLex1) == set(maskedLex2) | set(maskedLex2next):
            if lex2 == "B" or lex2 == "MN":
                solved = "split preposition off"
                thisDissimilarity[n] = 1
                for m in range(n + 1, api[v].F.otype.maxSlot + 1):
                    thisMapping[m] = thisMapping[m] + 1
        print(f"Action: {solved if solved else 'BLOCKED'}\n")

        # stop the loop if the discrepancy is not solved
        # The discrepancy has already been printed to the output,
        # so you can see immediately what is happening there
        
        if not solved:
            break

        # if the discrepancy was solved, 
        # advance to the first position after the discrepancy
        # and try to find a new discrepancy in the next iteration
        start = n + 1 + skip

    if not solved:
        print(f"Blocking difference in {iteration} iterations")

The mappings themselves are needed elsewhere in Text-Fabric, so we write them to file.
Each mapping goes into the dataset of its target version,
so the map `3-4` ends up in version `4`.

In [32]:
def edgesFromMaps():
    edges.clear()
    for ((v, w), mp) in sorted(mappings.items()):
        caption(4, "Make edge from slot mapping {} => {}".format(v, w))

        edge = {}
        dm = dissimilarity[(v, w)]

        for n in range(1, api[v].F.otype.maxSlot + 1):
            m = mp[n]
            k = dm.get(n, None)
            if k is None:
                if n in edge:
                    if m not in edge[n]:
                        edge[n][m] = None
                else:
                    edge.setdefault(n, {})[m] = None
            else:
                if k > 0:
                    for j in range(m, m + k + 1):
                        edge.setdefault(n, {})[j] = k
                elif k < 0:
                    for i in range(n, n - k + 1):
                        edge.setdefault(i, {})[m] = k
                else:
                    edge.setdefault(n, {})[m] = 0
        edges[(v, w)] = edge
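
The shape of the resulting edge is easiest to see on toy data. The sketch below restates the loop body of `edgesFromMaps` as a stand-alone function. The name `edge_from_map` is hypothetical, and the four-slot mappings are invented for illustration rather than taken from the BHSA. Unaffected slots get the value `None`, a split into `k` extra slots fans out to `k + 1` targets with value `k`, and a collapse makes `|k| + 1` sources point to one target with the negative value `k`.

```python
def edge_from_map(mp, dm, max_slot):
    """Toy restatement of the inner loop of edgesFromMaps.

    mp: slot mapping {source slot: target slot}
    dm: dissimilarities {source slot: k}; k > 0 is a split, k < 0 a collapse
    """
    edge = {}
    for n in range(1, max_slot + 1):
        m = mp[n]
        k = dm.get(n)
        if k is None:
            # undisturbed slot: keep an existing value, otherwise store None
            edge.setdefault(n, {}).setdefault(m, None)
        elif k > 0:
            # split: one source slot maps to k + 1 consecutive target slots
            for j in range(m, m + k + 1):
                edge.setdefault(n, {})[j] = k
        elif k < 0:
            # collapse: |k| + 1 consecutive source slots map to one target slot
            for i in range(n, n - k + 1):
                edge.setdefault(i, {})[m] = k
        else:
            # explicitly accepted variation
            edge.setdefault(n, {})[m] = 0
    return edge


# slot 2 is split into one extra slot
split_edge = edge_from_map({1: 1, 2: 2, 3: 4, 4: 5}, {2: 1}, 4)
# slots 2 and 3 collapse into target slot 2
collapse_edge = edge_from_map({1: 1, 2: 2, 3: 2, 4: 3}, {2: -1}, 4)

print(split_edge)
print(collapse_edge)
```

Note that in the collapse case slot 3 keeps the value `-1` written while processing slot 2: the undisturbed branch does not overwrite an entry that is already present.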

# Running

Here we run the mapping between `3` and `4`.

## 3 => 4

Here are the special cases for this conversion.

In [12]:
cases.update(
    {
        ("3", "4"): {
            "CXH[": ("ok", "XWH["),
            "MQYT/": ("split", 1),
            28730: ("ok", None),
            121812: ("ok", None),
            174515: ("ok", None),
            201089: ("ok", None),
            218383: ("split", 2),
            221436: ("ok", None),
            247730: ("ok", None),
            272883: ("collapse", 1),
            353611: ("ok", None),
        },
    }
)
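
The `split` and `collapse` actions in these cases shift the slot mapping from the disturbed position onwards. Below is a minimal sketch of that bookkeeping on a toy identity mapping of six slots. The function names `split` and `collapse` and the constant `MAX_SLOT` are assumptions made for this illustration; the loops mirror those in `doDiffs`.

```python
def split(mapping, max_slot, n, param):
    """Slot n becomes param extra slots in the target: shift all later slots up."""
    for m in range(n + 1, max_slot + 1):
        mapping[m] += param


def collapse(mapping, max_slot, n, param):
    """Slots n .. n + param become one target slot: shift all later slots down."""
    # walk from the end so that values are not overwritten before they are read
    for m in range(max_slot, n + param, -1):
        mapping[m] = mapping[m - param]
    # the collapsed slots all point to the target of n
    for m in range(n + 1, n + param + 1):
        mapping[m] = mapping[n]


MAX_SLOT = 6
identity = {n: n for n in range(1, MAX_SLOT + 1)}

split_map = dict(identity)
split(split_map, MAX_SLOT, 2, 1)

collapse_map = dict(identity)
collapse(collapse_map, MAX_SLOT, 2, 1)

print(split_map)     # {1: 1, 2: 2, 3: 4, 4: 5, 5: 6, 6: 7}
print(collapse_map)  # {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 5}
```

After a split, every slot behind the disturbance points one position further in the target version. After a collapse, slots 2 and 3 share target slot 2 and the rest points one position earlier.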

In [13]:
doDiffs("3", "4")

Genesis 18:2 ==> slot 7840 ==> 7840
    ╋MN╋PTX╋H╋>HL┫┣W┫┣▶CXH◀┫┣>RY┫┣W┫┣>MR┫┣>DNJ┫┣>M╋
    ╋MN╋PTX╋H╋>HL┫┣W┫┣▶XWH◀┫┣>RY┫┣W┫┣>MR┫┣>DNJ┫┣>M╋

Action: systematic variation in lexeme

Genesis 19:1 ==> slot 8447 ==> 8447
    ╋W┫┣QWM┫┣L╋QR>┫┣W┫┣▶CXH◀┫┣>P┫┣>RY┫┣W┫┣>MR┫┣HNH╋
    ╋W┫┣QWM┫┣L╋QR>┫┣W┫┣▶XWH◀┫┣>P┫┣>RY┫┣W┫┣>MR┫┣HNH╋

Action: systematic variation in lexeme

Genesis 21:14 ==> slot 9856 ==> 9856
    ╋HLK┫┣W┫┣T<H┫┣B╋MDBR╋▶B>R_CB<◀┫┣W┫┣KLH┫┣H╋MJM┫┣MN╋
    ╋HLK┫┣W┫┣T<H┫┣B╋MDBR╋▶B>R◀╋CB<┫┣W┫┣KLH┫┣H╋MJM╋

Action: split on _ into 1 extra slot

Genesis 21:31 ==> slot 10174 ==> 10175
    ╋L╋H╋MQWM╋H╋HW>┫┣▶B>R_CB<◀┫┣KJ┫┣CM┫┣CB<┫┣CNJM┫┣W╋
    ╋L╋H╋MQWM╋H╋HW>┫┣▶B>R◀╋CB<┫┣KJ┫┣CM┫┣CB<┫┣CNJM╋

Action: split on _ into 1 extra slot

Genesis 21:32 ==> slot 10183 ==> 10185
    ╋CNJM┫┣W┫┣KRT┫┣BRJT┫┣B╋▶B>R_CB<◀┫┣W┫┣QWM┫┣>BJMLK╋W╋PJKL╋
    ╋CNJM┫┣W┫┣KRT┫┣BRJT┫┣B╋▶B>R◀╋CB<┫┣W┫┣QWM┫┣>BJMLK╋W╋

Action: split on _ into 1 extra slot

Genesis 21:33 ==> slot 10200 ==> 10203
    ╋PLCTJ┫┣W┫┣NV<┫┣>CL┫┣B╋▶B>R_CB<◀┫┣W

# Running

Here we run the mapping between `4` and `4b`.
The points of disturbance will be written into the output cell.

## 4 => 4b

Here are the special cases for this conversion.

In [14]:
cases.update(
    {
        ("4", "4b"): {
            214730: ("collapse", 3),
            260028: ("split", 1),
            289948: ("ok", None),
            307578: ("split", 1),
            323067: ("ok", None),
            389774: ("ok", None),
            407543: ("split", 1),
            408429: ("split", 1),
        },
    }
)

In [15]:
doDiffs("4", "4b")

Genesis 24:65 ==> slot 12369 ==> 12369
    ╋H╋<BD┫┣MJ┫┣H╋>JC╋▶HLZH◀┫┣H┫┣HLK┫┣B╋H╋FDH╋
    ╋H╋<BD┫┣MJ┫┣H╋>JC╋▶H◀╋LZH┫┣H┫┣HLK┫┣B╋H╋

Action: split article off

Genesis 37:19 ==> slot 20514 ==> 20515
    ╋>X┫┣HNH┫┣B<L╋H╋XLWM╋▶HLZH◀┫┣BW>┫┣W┫┣<TH┫┣HLK┫┣W╋
    ╋>X┫┣HNH┫┣B<L╋H╋XLWM╋▶H◀╋LZH┫┣BW>┫┣W┫┣<TH┫┣HLK╋

Action: split article off

Judges 6:20 ==> slot 130846 ==> 130848
    ╋W┫┣NWX┫┣>L╋H╋SL<╋▶HLZ◀┫┣W┫┣>T╋H╋MRQ┫┣CPK╋
    ╋W┫┣NWX┫┣>L╋H╋SL<╋▶H◀╋LZ┫┣W┫┣>T╋H╋MRQ╋

Action: split article off

1_Samuel 14:1 ==> slot 148319 ==> 148322
    ╋MYB╋PLCTJ┫┣>CR┫┣MN╋<BR╋▶HLZ◀┫┣W┫┣L╋>B┫┣L>┫┣NGD╋
    ╋MYB╋PLCTJ┫┣>CR┫┣MN╋<BR╋▶H◀╋LZ┫┣W┫┣L╋>B┫┣L>╋

Action: split article off

1_Samuel 17:26 ==> slot 151331 ==> 151335
    ╋>CR┫┣NKH┫┣>T╋H╋PLCTJ╋▶HLZ◀┫┣W┫┣SWR┫┣XRPH┫┣MN╋<L╋
    ╋>CR┫┣NKH┫┣>T╋H╋PLCTJ╋▶H◀╋LZ┫┣W┫┣SWR┫┣XRPH┫┣MN╋

Action: split article off

1_Samuel 20:19 ==> slot 153816 ==> 153821
    ╋W┫┣JCB┫┣>YL╋H╋>BN┫┣▶H>ZL◀┫┣W┫┣>NJ┫┣CLC╋H╋XY╋
    ╋W┫┣JCB┫┣>YL╋H╋>BN╋▶H◀╋>ZL┫┣W┫┣>NJ┫┣CLC╋H╋

Action: split article off

## 4b => 2016

We need other cases.
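
Explicit cases are needed where the general rules of `doDiffs` give no answer or the wrong one. The sketch below restates two of those rules (split on `_`, and splitting off the article `H`) as a hypothetical helper `extra_slots`. It works on the bare lexemes as they are printed in the output, which is an assumption: the real `lex` values may carry trailing markers such as `/` or `[`.

```python
def extra_slots(lex1, lex2, lex2next):
    """Return the number of extra slots a general rule proposes, or None.

    lex1: lexeme in the source version
    lex2, lex2next: lexemes at the mapped position and the next one in the target
    """
    if "_" in lex1:
        # composite lexeme: one extra slot per underscore
        return lex1.count("_")
    if lex1 == lex2 + lex2next and lex2 == "H":
        # the article has been split off
        return 1
    return None


print(extra_slots("B>R_CB<", "B>R", "CB<"))  # 1
print(extra_slots("HLZH", "H", "LZH"))       # 1
print(extra_slots("CXH", "XWH", ">RY"))      # None
print(extra_slots("GRN_>VD", "GRN", "H"))    # 1
```

For `GRN_>VD` the underscore rule proposes 1 extra slot, whereas the target version has `GRN╋H╋>VD`, which needs 2 extra slots. Since explicit cases are checked before the general rules, the entry `28423: ("split", 2)` overrides the rule at that position.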

In [16]:
cases.update(
    {
        ("4b", "2016"): {
            28423: ("split", 2),
            28455: ("split", 2),
            91193: ("split", 1),
            91197: ("split", 1),
            122218: ("split", 1),
            122247: ("split", 1),
            123160: ("split", 1),
            184086: ("split", 1),
            394186: ("collapse", 1),
            395150: ("ok", None),
            395190: ("ok", None),
            401036: ("split", 2),
            404503: ("ok", None),
            419138: ("split", 2),
        },
    }
)

In [17]:
doDiffs("4b", "2016")

Genesis 50:10 ==> slot 28423 ==> 28423
    ╋KBD╋M>D┫┣W┫┣BW>┫┣<D╋▶GRN_>VD◀┫┣>CR┫┣B╋<BR╋H╋JRDN╋
    ╋KBD╋M>D┫┣W┫┣BW>┫┣<D╋▶GRN◀╋H╋>VD┫┣>CR┫┣B╋<BR╋

Action: split into 2 extra slots

Genesis 50:11 ==> slot 28455 ==> 28457
    ╋KN<NJ┫┣>T╋H╋>BL┫┣B╋▶GRN_>VD◀┫┣W┫┣>MR┫┣>BL╋KBD┫┣ZH╋
    ╋KN<NJ┫┣>T╋H╋>BL┫┣B╋▶GRN◀╋H╋>VD┫┣W┫┣>MR┫┣>BL╋

Action: split into 2 extra slots

Numbers 33:45 ==> slot 91193 ==> 91197
    ╋MN╋<JJM┫┣W┫┣XNH┫┣B╋▶DJBWN_GD◀┫┣W┫┣NS<┫┣MN╋DJBWN_GD┫┣W╋
    ╋MN╋<JJM┫┣W┫┣XNH┫┣B╋▶DJBN◀╋GD┫┣W┫┣NS<┫┣MN╋DJBN╋

Action: split into 1 extra slot

Numbers 33:46 ==> slot 91197 ==> 91202
    ╋B╋DJBWN_GD┫┣W┫┣NS<┫┣MN╋▶DJBWN_GD◀┫┣W┫┣XNH┫┣B╋<LMWN_DBLTJMH┫┣W╋
    ╋B╋DJBN┫┣W┫┣NS<┫┣MN╋▶DJBN◀╋GD┫┣W┫┣XNH┫┣B╋<LMWN_DBLTJMH╋

Action: split into 1 extra slot

Joshua 16:3 ==> slot 122218 ==> 122224
    ╋GBWL╋H╋JPLVJ┫┣<D╋GBWL╋▶BJT_XRWN_TXTWN◀╋W╋<D╋GZR┫┣W┫┣HJH╋
    ╋GBWL╋H╋JPLVJ┫┣<D╋GBWL╋▶BJT_XWRWN◀╋TXTWN╋W╋<D╋GZR┫┣W╋

Action: split into 1 extra slot

Joshua 16:5 ==> slot 122247 ==> 122254
    ╋GBWL╋NXLH┫┣MZRX┫┣<

## 2016 => 2017

We need other cases.

In [12]:
cases.update(
    {
        ("2016", "2017"): {
            16562: ("split", 1),
            392485: ("split", 2),
        },
    }
)

In [13]:
doDiffs("2016", "2017")

Genesis 31:11 ==> slot 16562 ==> 16562
    ╋>MR┫┣>L┫┣ML>K╋H╋>LHJM┫┣▶B◀╋XLWM┫┣J<QB┫┣W┫┣>MR┫┣HNH╋
    ╋>MR┫┣>L┫┣ML>K╋H╋>LHJM┫┣▶B◀╋H╋XLWM┫┣J<QB┫┣W┫┣>MR╋

Action: split into 1 extra slot

1_Chronicles 2:52 ==> slot 392485 ==> 392486
    ╋L╋CWBL┫┣>B╋QRJT_J<RJM┫┣HR>H┫┣▶XYJ_HMNXWT◀┫┣W┫┣MCPXH╋QRJT_J<RJM┫┣H╋JTRJ╋
    ╋L╋CWBL┫┣>B╋QRJT_J<RJM┫┣HR>H┫┣▶XYJ◀╋H╋MNWXH┫┣W┫┣MCPXH╋QRJT_J<RJM╋

Action: split into 2 extra slots

No more differences.
Found 2 points of disturbance


## 2017 => 2021

No changes expected right now.

In [13]:
cases.update(
    {
        ("2017", "2021"): {},
    }
)

In [14]:
doDiffs("2017", "2021")

Genesis 24:10 ==> slot 11325 ==> 11325
    ╋W┫┣QWM┫┣W┫┣HLK┫┣>L╋▶>RM_NHRJM◀┫┣>L╋<JR╋NXWR┫┣W┫┣BRK╋
    ╋W┫┣QWM┫┣W┫┣HLK┫┣>L╋▶>RM◀╋NHR┫┣>L╋<JR╋NXWR┫┣W╋

Action: split on _ into 1 extra slot

Deuteronomy 23:5 ==> slot 105981 ==> 105982
    ╋BL<M┫┣BN╋B<WR┫┣MN╋PTWR╋▶>RM_NHRJM◀┫┣L╋QLL┫┣W┫┣L>┫┣>BH╋
    ╋BL<M┫┣BN╋B<WR┫┣MN╋PTWR╋▶>RM◀╋NHR┫┣L╋QLL┫┣W┫┣L>╋

Action: split on _ into 1 extra slot

Judges 3:8 ==> slot 128871 ==> 128873
    ╋MKR┫┣B╋JD╋KWCN_RC<TJM┫┣MLK╋▶>RM_NHRJM◀┫┣W┫┣<BD┫┣BN╋JFR>L┫┣>T╋
    ╋MKR┫┣B╋JD╋KWCN_RC<TJM┫┣MLK╋▶>RM◀╋NHR┫┣W┫┣<BD┫┣BN╋JFR>L╋

Action: split on _ into 1 extra slot

Psalms 60:2 ==> slot 320252 ==> 320255
    ╋L╋LMD┫┣B╋NYH┫┣>T╋▶>RM_NHRJM◀╋W╋>T╋>RM╋YWB>┫┣W╋
    ╋L╋LMD┫┣B╋NYH┫┣>T╋▶>RM◀╋NHR╋W╋>T╋>RM╋YWB>╋

Action: split on _ into 1 extra slot

1_Chronicles 19:6 ==> slot 401289 ==> 401293
    ╋KSP┫┣L╋FKR┫┣L┫┣MN╋▶>RM_NHRJM◀╋W╋MN╋>RM_M<KH╋W╋MN╋
    ╋KSP┫┣L╋FKR┫┣L┫┣MN╋▶>RM◀╋NHR╋W╋MN╋>RM╋M<KH╋

Action: split on _ into 1 extra slot

1_Chronicles 19:6 ==> slot 401292 ==> 401297
   

## c => 2021

No changes expected right now.

In [33]:
cases.update(
    {
        ("c", "2021"): {},
    }
)

In [34]:
doDiffs("c", "2021")

Genesis 24:10 ==> slot 11325 ==> 11325
    ╋W┫┣QWM┫┣W┫┣HLK┫┣>L╋▶>RM_NHRJM◀┫┣>L╋<JR╋NXWR┫┣W┫┣BRK╋
    ╋W┫┣QWM┫┣W┫┣HLK┫┣>L╋▶>RM◀╋NHR┫┣>L╋<JR╋NXWR┫┣W╋

Action: split on _ into 1 extra slot

Deuteronomy 23:5 ==> slot 105981 ==> 105982
    ╋BL<M┫┣BN╋B<WR┫┣MN╋PTWR╋▶>RM_NHRJM◀┫┣L╋QLL┫┣W┫┣L>┫┣>BH╋
    ╋BL<M┫┣BN╋B<WR┫┣MN╋PTWR╋▶>RM◀╋NHR┫┣L╋QLL┫┣W┫┣L>╋

Action: split on _ into 1 extra slot

Judges 3:8 ==> slot 128871 ==> 128873
    ╋MKR┫┣B╋JD╋KWCN_RC<TJM┫┣MLK╋▶>RM_NHRJM◀┫┣W┫┣<BD┫┣BN╋JFR>L┫┣>T╋
    ╋MKR┫┣B╋JD╋KWCN_RC<TJM┫┣MLK╋▶>RM◀╋NHR┫┣W┫┣<BD┫┣BN╋JFR>L╋

Action: split on _ into 1 extra slot

Psalms 60:2 ==> slot 320252 ==> 320255
    ╋L╋LMD┫┣B╋NYH┫┣>T╋▶>RM_NHRJM◀╋W╋>T╋>RM╋YWB>┫┣W╋
    ╋L╋LMD┫┣B╋NYH┫┣>T╋▶>RM◀╋NHR╋W╋>T╋>RM╋YWB>╋

Action: split on _ into 1 extra slot

1_Chronicles 19:6 ==> slot 401289 ==> 401293
    ╋KSP┫┣L╋FKR┫┣L┫┣MN╋▶>RM_NHRJM◀╋W╋MN╋>RM_M<KH╋W╋MN╋
    ╋KSP┫┣L╋FKR┫┣L┫┣MN╋▶>RM◀╋NHR╋W╋MN╋>RM╋M<KH╋

Action: split on _ into 1 extra slot

1_Chronicles 19:6 ==> slot 401292 ==> 401297
   

Clearly, the only difference between versions `c` and `2021` is
that some composite words in `c` have been split in version `2021`.

In [35]:
edgesFromMaps()

..............................................................................................
.      4m 10s Make edge from slot mapping c => 2021                                          .
..............................................................................................


# Extending to node mappings

In [36]:
nodeMapping = {}
diagnosis = {}

In [37]:
statLabels = collections.OrderedDict(
    b="unique, perfect",
    d="multiple, one perfect",
    c="unique, imperfect",
    f="multiple, cleanly composed",
    e="multiple, non-perfect",
    a="not mapped",
)

In [38]:
def makeNodeMapping(nodeType, v, w, force=False):
    caption(2, "Mapping {} nodes {} ==> {}".format(nodeType, v, w))
    mapKey = (v, w)
    edge = edges[mapKey]

    if not force and mapKey in nodeMapping and nodeType in nodeMapping[mapKey]:
        mapping = nodeMapping[mapKey][nodeType]
        diag = diagnosis[mapKey][nodeType]

    else:
        mapping = {}
        diag = {}
        caption(
            0, "Extending slot mapping {} ==> {} for {} nodes".format(*mapKey, nodeType)
        )
        for n in api[v].F.otype.s(nodeType):
            slots = api[v].E.oslots.s(n)
            mappedSlotsTuple = reduce(
                lambda x, y: x + y,
                [tuple(edge.get(s, ())) for s in slots],
                (),
            )
            mappedSlots = set(mappedSlotsTuple)
            mappedNodes = reduce(
                set.union,
                [set(api[w].L.u(s, nodeType)) for s in mappedSlots],
                set(),
            )
            result = {}
            nMs = len(mappedNodes)
            if nMs == 0:
                diag[n] = "a"

            elif nMs >= 1:
                theseMSlots = {}
                for m in mappedNodes:
                    mSlots = set(api[w].E.oslots.s(m))
                    dis = len(mappedSlots | mSlots) - len(mappedSlots & mSlots)
                    result[m] = dis
                    theseMSlots[m] = mSlots
                mapping[n] = result

                # we await further case analysis before we put these counterparts of n into the edge

                if nMs == 1:
                    m = list(mappedNodes)[0]
                    dis = result[m]
                    if dis == 0:
                        diag[n] = "b"
                        edge[n] = {
                            m: None
                        }  # this is the most frequent case, hence an optimization: no dis value.
                        # all other cases require the dis value to be passed on, even if 0
                    else:
                        diag[n] = "c"
                        edge[n] = {m: dis}
                else:
                    edge[n] = result
                    dis = min(result.values())
                    if dis == 0:
                        diag[n] = "d"
                    else:
                        allMSlots = reduce(
                            set.union,
                            [set(theseMSlots[m]) for m in mappedNodes],
                            set(),
                        )
                        composed = allMSlots == mappedSlots and sum(
                            result.values()
                        ) == len(mappedSlots) * (len(mappedNodes) - 1)

                        if composed:
                            diag[n] = "f"
                        else:
                            diag[n] = "e"

        diagnosis.setdefault(mapKey, {})[nodeType] = diag
        nodeMapping.setdefault(mapKey, {})[nodeType] = mapping
        caption(0, "\tDone")
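
The dissimilarity used above, `len(A | B) - len(A & B)`, is the size of the symmetric difference between the mapped slot set and the slot set of a candidate node. The sketch below isolates that measure and the resulting diagnosis letters on toy slot sets. `dissimilarity` and `diagnose` are hypothetical helper names, and the candidates are passed directly as a dict from node to slot set instead of being looked up through Text-Fabric.

```python
def dissimilarity(slots_a, slots_b):
    """Number of slots that belong to exactly one of the two sets."""
    return len(slots_a | slots_b) - len(slots_a & slots_b)


def diagnose(mapped_slots, candidates):
    """Classify a node by its candidate counterparts.

    mapped_slots: set of target slots reached from the source node
    candidates: {target node: set of its slots}
    Returns (diagnosis letter, {target node: dissimilarity}).
    """
    if not candidates:
        return "a", {}  # not mapped
    result = {m: dissimilarity(mapped_slots, s) for (m, s) in candidates.items()}
    if len(candidates) == 1:
        (dis,) = result.values()
        return ("b" if dis == 0 else "c"), result  # unique: perfect or imperfect
    if min(result.values()) == 0:
        return "d", result  # multiple, one perfect
    all_slots = set().union(*candidates.values())
    # cleanly composed: the candidates cover exactly the mapped slots
    # and the dissimilarities add up as they do for a partition
    composed = all_slots == mapped_slots and sum(result.values()) == len(
        mapped_slots
    ) * (len(candidates) - 1)
    return ("f" if composed else "e"), result


print(diagnose({1, 2}, {10: {1, 2}}))                      # ('b', {10: 0})
print(diagnose({1, 2}, {10: {1, 2, 3}}))                   # ('c', {10: 1})
print(diagnose({1, 2}, {10: {1, 2}, 11: {2, 3}}))          # ('d', {10: 0, 11: 2})
print(diagnose({1, 2, 3, 4}, {10: {1, 2}, 11: {3, 4}}))    # ('f', {10: 2, 11: 2})
print(diagnose({1, 2, 3}, {10: {1, 2}, 11: {3, 4}}))       # ('e', {10: 1, 11: 3})
```

When the candidates partition the mapped slots into `k` parts, each part of size `s` has dissimilarity `N - s` with `N = len(mapped_slots)`, so the sum is `N * (k - 1)`. That is the equality tested for the label "multiple, cleanly composed".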

In [39]:
def exploreNodeMapping(nodeType, v, w, force=False):
    caption(4, "Statistics for {} ==> {} ({})".format(v, w, nodeType))
    mapKey = (v, w)
    diag = diagnosis[mapKey][nodeType]
    total = len(diag)
    if total == 0:
        return

    reasons = collections.Counter()

    for (n, dia) in diag.items():
        reasons[dia] += 1

    caption(0, "\t{:<30} : {:6.2f}% {:>7}x".format("TOTAL", 100, total))
    for stat in statLabels:
        statLabel = statLabels[stat]
        amount = reasons[stat]
        if amount == 0:
            continue
        perc = 100 * amount / total
        caption(0, "\t{:<30} : {:6.2f}% {:>7}x".format(statLabel, perc, amount))

In [40]:
# ntypes = api["3"].F.otype.all
ntypes = api["2021"].F.otype.all
for (i, v) in enumerate(versions):
    if i == 0:
        continue
    prev = versions[i - 1]
    ntypes = api[v].F.otype.all
    for ntype in ntypes[0:-1]:
        makeNodeMapping(ntype, prev, v, force=False)
        exploreNodeMapping(ntype, prev, v)


**********************************************************************************************
*                                                                                            *
*      4m 14s Mapping book nodes c ==> 2021                                                  *
*                                                                                            *
**********************************************************************************************

|      4m 14s Extending slot mapping c ==> 2021 for book nodes
|      4m 25s 	Done
..............................................................................................
.      4m 25s Statistics for c ==> 2021 (book)                                               .
..............................................................................................
|      4m 25s 	TOTAL                          : 100.00%      39x
|      4m 25s 	unique, perfect                : 100.00%      39x

************************

# Writing mappings as TF edges

In [41]:
def writeMaps():
    for ((v1, v2), edge) in sorted(edges.items()):
        fName = "omap@{}-{}".format(v1, v2)
        caption(4, "Write edge as TF feature {}".format(fName))

        edgeFeatures = {fName: edge}
        metaData = {
            fName: {
                "about": "Mapping from the slots of BHSA version {} to version {}".format(
                    v1, v2
                ),
                "encoder": "Dirk Roorda by a semi-automatic method",
                "see": "https://github.com/ETCBC/bhsa/blob/master/programs/versionMappings.ipynb",
                "valueType": "int",
                "edgeValues": True,
            }
        }
        activate(v2)
        TF.save(
            nodeFeatures={},
            edgeFeatures=edgeFeatures,
            metaData=metaData,
        )

In [42]:
caption(4, "Write mappings as TF edges")
for (v1, v2) in sorted(mappings.keys()):
    caption(0, "\t {:>4} ==> {:<4}".format(v1, v2))

writeMaps()

..............................................................................................
.      5m 14s Write mappings as TF edges                                                     .
..............................................................................................
|      5m 14s 	    c ==> 2021
..............................................................................................
.      5m 14s Write edge as TF feature omap@c-2021                                           .
..............................................................................................
..............................................................................................
.      5m 14s Active version is now -> 2021 <-                                               .
..............................................................................................
  0.00s Exporting 0 node and 1 edge and 0 config features to ~/github/etcbc/bhsa/tf/2021:
   |     3.55s T omap@c-2