# Compare the Afifi and Lakhnawi editions of the Fusus

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from tf.app import use

In [3]:
BASE = "~/github/among/fusus"
VERSION = "0.7"

# Load both editions

Normally, when we load a single data source in a notebook, we store the handle in a variable called
`A`, and we hoist additional variables `F`, `L`, `T`, etc. to the global namespace.

But now we work with two data sources, so we store the handles in a dictionary `A`, with
a key `LK` for the Lakhnawi edition and a key `AF` for the Afifi edition.

We also make dictionaries for `F`, `L`, `T`, etc., keyed with the same keys.

That way we can systematically select the handles for the desired edition.

In [4]:
LK = "LK"
AF = "AF"

EDITIONS = {
    LK: "Lakhnawi",
    AF: "Afifi",
}

A = {}
F = {}
E = {}
L = {}
T = {}
N = {}

In [5]:
for (acro, name) in EDITIONS.items():
    A[acro] = use(f"among/fusus/tf/{name}:clone", writing="ara", version=VERSION)
    F[acro] = A[acro].api.F
    E[acro] = A[acro].api.E
    L[acro] = A[acro].api.L
    T[acro] = A[acro].api.T
    N[acro] = A[acro].api.N

This is Text-Fabric 9.1.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

27 features found and 0 ignored


This is Text-Fabric 9.1.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

17 features found and 0 ignored


Let's find out the max slot of both editions.

In [6]:
maxSlot = {acro: F[acro].otype.maxSlot for acro in EDITIONS}
maxSlot

{'LK': 40379, 'AF': 40271}

We set up our comparison.

We work with the Latin transcriptions, in order to avoid complications with right-to-left writing
when we display the places where discrepancies occur.

By trial and error we build two dictionaries of special cases, one per edition,
where we force a decision.

In [7]:
casesLK = {}
casesAF = {}
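
The shape of these dictionaries only becomes visible further down, so here is a minimal sketch of how an entry might be looked up. It assumes, as the later cells suggest, that keys are either slot numbers or transcribed words and that values are `(action, parameter)` pairs. The helper `lookupCase` and the sample dictionary `toyCases` are hypothetical and not part of the notebook.

```python
def lookupCase(cases, slot, letters):
    """Return the (action, param) pair for a slot, or None if there is no case.

    Hypothetical helper: an entry keyed by slot number takes precedence
    over an entry keyed by the letters of the word in that slot.
    """
    if slot in cases:
        return cases[slot]
    if letters in cases:
        return cases[letters]
    return None


# illustrative entries only, modelled on the cases used later on
toyCases = {
    1: ("skipother", 2),   # incidental case: applies to slot 1 only
    "llh": ("ok", "lh"),   # systematic case: applies wherever the word occurs
}

bySlot = lookupCase(toyCases, 1, "ālḥmd")
byWord = lookupCase(toyCases, 2, "llh")
noCase = lookupCase(toyCases, 3, "mnzl")
```

Keying by slot number handles one-off situations, while keying by word handles variations that recur throughout the text.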

The result of the comparison will be a mapping from Lakhnawi slots to Afifi slots.

In [8]:
mapping = {}

The mapping itself is needed elsewhere in Text-Fabric, so let us write it to file.
We write it as an edge feature into the AF edition.

In [74]:
edge = {}

In [75]:
def edgeFromMap():
    edge.clear()
    print("Make edge from slot mapping")

    for iLK in range(1, maxSlot[LK]+ 1):
        iAF = mapping[iLK]
        k = dissimilarity.get(iLK, None)
        if k is None:
            if iLK in edge:
                if iAF not in edge[iLK]:
                    edge[iLK][iAF] = None
            else:
                edge.setdefault(iLK, {})[iAF] = None
        else:
            if k > 0:
                for j in range(iAF, iAF + k + 1):
                    edge.setdefault(iLK, {})[j] = k
            elif k < 0:
                for i in range(iLK, iLK - k + 1):
                    edge.setdefault(i, {})[iAF] = k
            else:
                edge.setdefault(iLK, {})[iAF] = 0
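
To make the role of the dissimilarity values concrete, here is a small self-contained sketch of the same idea on toy data. It assumes that a positive value `k` means one LK slot corresponds to `k + 1` consecutive AF slots, and a negative value `k` means `-k + 1` consecutive LK slots correspond to one AF slot. The function `edgesFromMapping` and the toy dictionaries are hypothetical names used for illustration only; they do not touch the notebook's own `mapping`, `dissimilarity` or `edge`.

```python
def edgesFromMapping(slotMap, dissim):
    """Build {lkSlot: {afSlot: value}} from a slot mapping and dissimilarities."""
    edges = {}
    for iLK in sorted(slotMap):
        iAF = slotMap[iLK]
        k = dissim.get(iLK)
        if k is None:
            # plain correspondence; do not overwrite an edge made by a collapse
            edges.setdefault(iLK, {}).setdefault(iAF, None)
        elif k > 0:
            # one LK slot spread over k + 1 AF slots
            for j in range(iAF, iAF + k + 1):
                edges.setdefault(iLK, {})[j] = k
        elif k < 0:
            # -k + 1 LK slots folded into one AF slot
            for i in range(iLK, iLK - k + 1):
                edges.setdefault(i, {})[iAF] = k
        else:
            # same slot on both sides, with an accepted variation
            edges.setdefault(iLK, {})[iAF] = 0
    return edges


# toy data: LK slot 2 is split over AF slots 2-3,
# LK slots 4-5 are collapsed into AF slot 5
toyMap = {1: 1, 2: 2, 3: 4, 4: 5, 5: 5, 6: 6}
toyDissim = {2: 1, 4: -1}
toyEdges = edgesFromMapping(toyMap, toyDissim)
```

In this sketch `toyEdges[2]` points to two AF slots, and both `toyEdges[4]` and `toyEdges[5]` point to AF slot 5.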

We define auxiliary functions for finding discrepancies and inspecting them.

In [9]:
def inspect(start, end):
    """Helper function for inspecting the situation in a given range of slots.
    
    Parameters
    ----------
    start: integer
        Slot number where we start the inspection.
    end: integer
        Slot number where we end the inspection.
        
    Returns
    -------
    None
        The situation will be printed as a table with a row for each slot
        and columns:
        slot number in LK,
        letters of that slot in LK
        letters of the corresponding slot in AF
    """
    for iLK in range(start, end):
        iAF = mapping[iLK]
        print(
            "{:>6}: {:<8} {:<8}".format(
                iLK,
                F[LK].lettersn.v(iLK),
                F[AF].lettersn.v(iAF),
            )
        )


def firstDiff(start):
    """Find the first discrepancy after a given position.
    
    First we walk quickly through the slots of LK, until we reach the starting position.
    
    Then we continue walking until the current slot is either
    
    *   a special case
    *   a discrepancy
    
    Parameters
    ----------
    start: integer
        start position
    
    Returns
    -------
    int or None
        If there is no discrepancy, None is returned,
        otherwise the position of the first discrepancy.
    """

    fDiff = None
    for iLK in range(1, maxSlot[LK] + 1):
        if iLK < start:
            continue
        iAF = mapping[iLK]
        lettersLK = F[LK].lettersn.v(iLK)
        lettersAF = F[AF].lettersn.v(iAF)
        if iLK in casesLK or iAF in casesAF or lettersLK != lettersAF:
            fDiff = iLK
            break
    return fDiff


def printDiff(slotLK, k):
    """Prints the situation around a discrepancy.
    
    Parameters
    ----------
    slotLK: integer
        position of the discrepancy
    k: integer
        number of slots on each side of the discrepancy to include in the display
        
    Returns
    -------
    None
        A plain text display of the situation around the discrepancy is printed.
    """
    
    comps = {}
    
    # gather the comparison material in comps
    # which will be a dict, keyed by edition, of lists of display items
    
    slotAF = mapping[slotLK]
    
    for iLK in range(max((1, slotLK - k)), min((maxSlot[LK], slotLK + k)) + 1):
        iAF = mapping.get(iLK, None)
        currentLK = iLK == slotLK
        currentAF = iAF == slotAF

        lettersLK = F[LK].lettersn.v(iLK)
        lettersAF = F[AF].lettersn.v(iAF)

        comps.setdefault(LK, []).append((lettersLK, currentLK))
        comps.setdefault(AF, []).append((lettersAF, currentAF))
        
    # turn the display items into strings and store them in rep
    # which is also keyed by the versions
    
    rep = {}
    for acro in comps:
        rep[acro] = printEdition(acro, comps[acro])

    # compose the display out of the strings per edition
    # and make a header of sectional information and slot positions
    
    print(
        """{} {}:{} ==> slotLK {} ==> {}
    {}
    {}
""".format(
            *T[acro].sectionFromNode(slotLK),
            slotLK,
            slotAF,
            rep[LK],
            rep[AF],
        )
    )


def printEdition(acro, comps):
    """Generate a string displaying a stretch of slots around a position.
    
    Parameters
    ----------
    acro: string
        Acronym of the edition (currently not used in the body).
    comps: list of tuple
        For each slot there is a comp tuple consisting of
        
        *   the letters of the slot
        *   whether the slot is in the discrepancy position
        
    Returns
    -------
    string
        A sequence of words with boundary characters in between.
    """
    
    rep = ""
    for (letters, isCurrent) in comps:
        if letters is None:
            letters = "?"
        rep += f"▶{letters}◀" if isCurrent else f"╋{letters}"
    rep += "╋"
    return rep
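
As a quick illustration of the display format, the following sketch renders a toy stretch of three slots. The function `renderSlots` is a hypothetical stand-alone restatement of the logic of `printEdition`, so that the example runs without the Text-Fabric data loaded.

```python
def renderSlots(comps):
    """Join slot contents with separators and mark the current slot."""
    rep = ""
    for (letters, isCurrent) in comps:
        if letters is None:
            # an unmapped slot has no letters on the other side
            letters = "?"
        rep += f"▶{letters}◀" if isCurrent else f"╋{letters}"
    rep += "╋"
    return rep


toyLine = renderSlots([("a", False), ("b", True), (None, False)])
print(toyLine)  # ╋a▶b◀╋?╋
```

The current slot is enclosed in `▶ ◀`, which replaces the `╋` separators on both of its sides.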

Now the proper algorithm.

We limit the number of discrepancies we process by setting `MAX_ITER`.
When we exceed the limit we stop.

We also stop when we cannot solve a discrepancy.

When solving discrepancies, we adjust the mapping and we record the severity of the
discrepancy in a separate dict `dissimilarity`.

In [73]:
MAX_ITER = 100

dissimilarity = {}

def doDiffs():
    global mapping
    
    mapping = {iLK: iLK for iLK in range(1, maxSlot[LK] + 1)}
    
    dissimilarity.clear()

    iteration = 0
    start = 1

    solved = True
    
    lastApplied = (None, None, None)

    while True:
        # try to find the next difference from where you are now
        iLK = firstDiff(start)

        if iLK is None:
            print(f"No more differences.\nFound {iteration} discrepancies")
            break

        if iteration > MAX_ITER:
            print("There might be more discrepancies: increase MAX_ITER")
            break

        iteration += 1
        
        # there is a discrepancy: we have to do work
        # we print it as a kind of logging
        
        printDiff(iLK, 8)

        # we try to solve the discrepancy
        # first we gather the information about the slots at this position in both versions
    
        iAF = mapping[iLK]
        
        lettersLK = F[LK].lettersn.v(iLK)
        lettersAF = F[AF].lettersn.v(iAF)
        
        # and at the next position
        
        lettersNextLK = F[LK].lettersn.v(iLK + 1)
        lettersNextAF = F[AF].lettersn.v(mapping[iLK + 1])
        
        # the discrepancy is not solved unless we find it in a case or in a rule
        solved = None
        side = None
        action = None  # stays None when a general rule, not an explicit case, applies
        skip = 0
        
        # first check the explicit cases
        
        if iLK in casesLK:
            (action, param) = casesLK[iLK]
            (lastILK, lastAction, lastSide) = lastApplied
            
            # compare with the side of the previously applied action (side itself was just reset to None)
            if action == "skipother" and not (iLK == lastILK and action == lastAction and lastSide == LK):
                plural = "" if param == 1 else "s"
                solved = f"{action} {param} slot{plural}"
                side = LK
                for m in range(iLK, maxSlot[LK] + 1):
                    mapping[m] += param
            elif action == "skipme":
                plural = "" if param == 1 else "s"
                solved = f"{action} {param} slot{plural}"
                side = LK
                skip = param - 1
                for m in range(maxSlot[LK], iLK + param - 1, -1):
                    mapping[m] = mapping[m - param]
                for m in range(iLK, iLK + param):
                    mapping[m] = None
            elif action == "collapse":
                plural = "" if param == 1 else "s"
                solved = f"{action} {param} fewer slot{plural}"
                side = LK
                dissimilarity[iLK] = -param
                skip = param
                for m in range(maxSlot[LK], iLK + param, -1):
                    mapping[m] = mapping[m - param]
                for m in range(iLK + 1, iLK + param + 1):
                    mapping[m] = mapping[iLK]
            elif action == "split":
                plural = "" if param == 1 else "s"
                solved = f"{action} into {param} extra slot{plural}"
                side = LK
                dissimilarity[iLK] = param
                for m in range(iLK + 1, maxSlot[LK] + 1):
                    # shift the later slots by the number of extra slots
                    mapping[m] += param
            elif action == "ok":
                solved = "incidental variation in word"
                side = LK
                dissimilarity[iLK] = 0
        elif lettersLK in casesLK:
            (action, param) = casesLK[lettersLK]
            if action == "ok":
                if lettersAF == param:
                    solved = "systematic variation in lexeme"
                    side = LK
                    dissimilarity[iLK] = 0
            elif action == "split":
                plural = "" if param == 1 else "s"
                solved = f"systematic {action} into {param} extra slot{plural}"
                side = LK
                dissimilarity[iLK] = param
                for m in range(iLK + 1, maxSlot[LK] + 1):
                    # shift the later slots by the number of extra slots
                    mapping[m] += param
        elif lettersAF in casesAF:
            (action, param) = casesAF[lettersAF]
            (lastILK, lastAction, lastSide) = lastApplied
            if action == "skipme" and not (iLK == lastILK and action == lastAction and lastSide == AF):
                plural = "" if param == 1 else "s"
                solved = f"{action} {param} slot{plural}"
                side = AF
                for m in range(iLK, maxSlot[LK] + 1):
                    mapping[m] += param
        elif lettersLK.replace("y", "w") == lettersAF.replace("y", "w"):
            solved = f"y/w equivalent"
            side = None
            dissimilarity[iLK] = 0
        else:
            setLK = set(lettersLK)
            setAF = set(lettersAF)
            if (len(setLK) > 1 or len(setAF) > 1) and len(setLK - setAF) < 2 and len(setAF - setLK) < 2:
                solved = f"single letter variation"
                side = None
                dissimilarity[iLK] = 1
                    
        # remember which action was applied last
        
        if solved:
            lastApplied = (iLK, action, side)
            
        print(f"Action: {solved if solved else 'BLOCKED'}\n")

        # stop the loop if the discrepancy is not solved
        # The discrepancy has already been printed to the output,
        # so you can see immediately what is happening there
        
        if not solved:
            break

        # if the discrepancy was solved, 
        # advance to the first position after the discrepancy
        # and try to find a new discrepancy in the next iteration
        start = iLK + 1 + skip

    if not solved:
        print(f"Blocking difference in {iteration} iterations")
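
The two general rules at the end of the decision chain can be tried out in isolation. The sketch below restates them as stand-alone predicates; the names `ywEquivalent` and `singleLetterVariation` are hypothetical and only serve this illustration.

```python
def ywEquivalent(lettersLK, lettersAF):
    """True if the words are equal once every y is read as w."""
    return lettersLK.replace("y", "w") == lettersAF.replace("y", "w")


def singleLetterVariation(lettersLK, lettersAF):
    """True if each word has at most one letter that the other lacks.

    The test works on sets of letters, so it ignores letter order
    and repeated letters. Two one-letter words never qualify.
    """
    setLK = set(lettersLK)
    setAF = set(lettersAF)
    return (
        (len(setLK) > 1 or len(setAF) > 1)
        and len(setLK - setAF) < 2
        and len(setAF - setLK) < 2
    )


# "ālḥkm" and "ālḥk" differ by the single letter m
closeEnough = singleLetterVariation("ālḥkm", "ālḥk")
# "mn" has two letters that "qlb" lacks
tooDifferent = singleLetterVariation("mn", "qlb")
```

Because the comparison is set-based it is fairly permissive: `singleLetterVariation("ab", "ba")` also holds. That is one reason why the explicit cases are consulted before these rules.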

# Run the comparison

Here we go!

In [71]:
casesLK = {
    1: ("skipother", 2),
    115: ("skipother", 1),
    179: ("skipme", 1),
    "ālmḥrm": ("ok", "mḥrm"),
    "ālḥkm": ("ok", "ālḥk"),
    "llh": ("ok", "lh"),
    "mmd": ("ok", "md"),
}

casesAF = {
    "ā": ("skipme", 1),
}
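
To see what the `skipother` and `skipme` actions do to the slot mapping, here is a toy sketch on six slots. The functions `skipOther` and `skipMe` are hypothetical stand-alone restatements of the corresponding loops in `doDiffs`, and `toyMax` is an assumed maximum slot number.

```python
def skipOther(slotMap, iLK, n, maxLK):
    """Skip n slots in the other edition: shift the targets of iLK and later slots."""
    for m in range(iLK, maxLK + 1):
        slotMap[m] += n


def skipMe(slotMap, iLK, n, maxLK):
    """Skip n slots in this edition: unmap them and move later targets up."""
    # walk backwards so that every value is read before it is overwritten
    for m in range(maxLK, iLK + n - 1, -1):
        slotMap[m] = slotMap[m - n]
    for m in range(iLK, iLK + n):
        slotMap[m] = None


toyMax = 6

# AF has 2 extra slots at the start: every LK slot now maps 2 positions further
shifted = {i: i for i in range(1, toyMax + 1)}
skipOther(shifted, 1, 2, toyMax)

# LK has 1 extra slot at position 3: it maps to nothing, the rest moves up
gapped = {i: i for i in range(1, toyMax + 1)}
skipMe(gapped, 3, 1, toyMax)
```

After `skipOther`, slot 2 maps to slot 4, in line with the `slotLK 2 ==> 4` header in the output of the run. After `skipMe`, slot 3 maps to `None` and slot 4 takes over its former target.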

In [72]:
doDiffs()

47 :1 ==> slotLK 1 ==> 1
    ▶ālḥmd◀╋llh╋mnzl╋ālḥkm╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋
    ▶bnzlylālʿ◀╋ylrʿā╋ālḥmd╋lh╋mnzl╋ālḥk╋ʿlá╋ḳlwb╋ālklm╋

Action: skipother 2 slots

47 :1 ==> slotLK 2 ==> 4
    ╋ālḥmd▶llh◀╋mnzl╋ālḥkm╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋
    ╋ālḥmd▶lh◀╋mnzl╋ālḥk╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋

Action: systematic variation in lexeme

47 :2 ==> slotLK 4 ==> 6
    ╋ālḥmd╋llh╋mnzl▶ālḥkm◀╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋mn╋ālmḳām╋
    ╋ālḥmd╋lh╋mnzl▶ālḥk◀╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋mn╋ālmḳām╋

Action: systematic variation in lexeme

47 :2 ==> slotLK 13 ==> 15
    ╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋mn╋ālmḳām▶ālāḳdm◀╋wān╋āḫtlft╋ālnḥl╋wālmll╋lāḫtlāf╋ālāmm╋wṣlá╋āllh╋
    ╋ʿlá╋ḳlwb╋ālklm╋bāḥdyŧ╋ālṭryḳ╋ālāmm╋mn╋ālmḳām▶ā◀╋ālāḳdm╋wān╋āḫtlft╋ālnḥl╋wālmll╋lāḫtlāf╋ālāmm╋wṣlá╋

Action: skipme 1 slot

47 :3 ==> slotLK 23 ==> 26
    ╋āḫtlft╋ālnḥl╋wālmll╋lāḫtlāf╋ālāmm╋wṣlá╋āllh╋ʿlá▶mmd◀╋ālhmm╋mn╋ḫzāʾn╋ālǧwd╋wālkrm╋bālḳyl╋ālāḳwm╋mḥmd╋
    ╋āḫtlft╋ālnḥl╋wālmll╋lāḫtlāf╋ālāmm