In [1]:
%load_ext autoreload
%autoreload 2

In [17]:
from fusus.align import readEditions, doDiffs, printLines, checkAlignment

# Load both editions

We call `readEditions`, which reads both editions and returns `state`, a handle to all
information that is relevant to the alignment process.

In [18]:
VERSION = "0.7"

In [19]:
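# Special cases, keyed by LK slot: (number of LK words, number of AF words)
# that are aligned as one block; they show up with edit distance 88.
CASES = {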
    3072: (5, 1),
    4597: (1, 1),
    4598: (1, 1),
    4600: (0, 4),
    8273: (4, 0),
    13539: (2, 1),
    14878: (1, 1),
    14879: (1, 0),
    14880: (12, 0),
    16198: (1, 1),
    16199: (1, 1),
    16200: (1, 1),
    16201: (1, 1),
    16212: (1, 1),
    18029: (6, 1),
    22762: (1, 0),
    22763: (1, 1),
    22764: (1, 1),
}

In [24]:
state = readEditions(VERSION, CASES)

This is Text-Fabric 9.1.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

27 features found and 0 ignored


This is Text-Fabric 9.1.3
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

17 features found and 0 ignored


Let's find the maximum slot number of each edition: `LK` is the Lakhnawi edition, `AF` is the Afifi edition.

In [25]:
print(f"{state['maxLK']=} {state['maxAF']=}")

state['maxLK']=40379 state['maxAF']=40271


In [30]:
LLK = state["LLK"]
FLK = state["FLK"]

In [31]:
LLK.u(1, otype="column")

(40380,)

In [32]:
FLK.n.v(40380)

1

# Run the comparison

Here we go!

In [13]:
doDiffs()

Alignment complete, 40983 entries.
 0 taken:     1 x
 1 taken:    18 x
 2 taken: 36340 x
 3 taken:   538 x
 4 taken:   312 x
 6 taken:    32 x
 7 taken:    12 x
 8 taken:     1 x
 9 taken:     9 x
12 taken:     1 x


We show the first few lines of the alignment table.

On the left you see the LK slot numbers and the words in those slots.

On the right you see the AF words and their AF slot numbers.

The middle column gives the edit distance between the words on both sides (`@ed`), followed by a similarity ratio (`~rat`).

`99` means that the word is missing on one of the sides.
`88` means that this is a prescribed special case, for which no edit distance has been measured.

All other values are the number of edits you have to make in order to change one word to the other.
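
To make the edit distance and the ratio concrete, here is a minimal sketch in plain Python. It is an illustration only: the names `edit_distance` and `similarity_ratio` are hypothetical and not part of `fusus`, and the ratio formula, `(len(a) + len(b) - d) / (len(a) + len(b))` with `d` an edit distance in which a substitution costs 2, is an assumption inferred from the values shown in this notebook.

```python
import unicodedata


def _nfc(word):
    # Work on composed code points, so that "ā" counts as one character.
    return unicodedata.normalize("NFC", word)


def edit_distance(a, b, sub_cost=1):
    """Levenshtein distance between two words, computed row by row."""
    a, b = _nfc(a), _nfc(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(
                min(
                    prev[j] + 1,  # delete ca
                    cur[j - 1] + 1,  # insert cb
                    prev[j - 1] + (0 if ca == cb else sub_cost),  # keep or substitute
                )
            )
        prev = cur
    return prev[-1]


def similarity_ratio(a, b):
    """Assumed ratio: 1.0 for identical words, 0.0 for nothing in common."""
    a, b = _nfc(a), _nfc(b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - edit_distance(a, b, sub_cost=2)) / total


print(edit_distance("fy", "y"), round(similarity_ratio("fy", "y"), 1))
print(edit_distance("mn", "ān"), round(similarity_ratio("mn", "ān"), 1))
```

With these definitions, `fy` against `y` gives distance 1 and ratio 0.7, and `mn` against `ān` gives distance 1 and ratio 0.5, the same figures that appear for these word pairs in the test cases further down.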

Sometimes words are combined: a number of words on the left corresponds to a possibly different number of words on the right.
This is indicated by `+n` and `n+`. If the numbers are not equal, empty words are inserted with `^n` or `n^`.

If a single word on the left is aligned with a single word on the right, it is marked with `=` on both sides.
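
How a combination with unequal numbers of words fills the table can be sketched as follows. This is an assumption about the layout, not the `fusus` implementation: the helper `pad_combination` is hypothetical, the words are placeholders, and padding at the end of the shorter side is a guess.

```python
from itertools import zip_longest


def pad_combination(left_words, right_words):
    """Pair the words of an (n, m) combination; the shorter side gets empty words."""
    return list(zip_longest(left_words, right_words, fillvalue=""))


# Placeholder words for a (3, 1) combination: three words left, one word right.
rows = pad_combination(["l1", "l2", "l3"], ["r1"])
for left, right in rows:
    print(f"{left:>4} | {right:<4}")
```

Under this assumption an `(n, m)` combination occupies `max(n, m)` rows of the table.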

In [14]:
print(printLines(start=0, end=20))

pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
      |     |0 |                    |@99~0.0|bnzlylālʿ           |  |    1 047:01
      |     |0 |                    |@99~0.0|ylrʿā               |  |    2 047:01
008:02|    1|  |               ālḥmd|@0 ~1.0|ālḥmd               |  |    3 047:02
008:02|    2|  |                 llh|@0 ~1.0|lh                  |  |    4 047:02
008:02|    3|  |                mnzl|@0 ~1.0|mnzl                |  |    5 047:02
008:02|    4|  |               ālḥkm|@0 ~1.0|ālḥk                |  |    6 047:02
008:02|    5|  |                 ʿlá|@0 ~1.0|ʿlá                 |  |    7 047:02
008:02|    6|  |                ḳlwb|@0 ~1.0|ḳlwb                |  |    8 047:02
008:02|    7|  |               ālklm|@0 ~1.0|ālklm               |  |    9 047:02
008:02|    8|  |              bāḥdyŧ|@0 ~1.0|bāḥdyŧ              |  |   10 047:02
008:02|    9|  |

In [15]:
checkAlignment()


SANITY

All OK

AGREEMENT

Where are the words?

	LK-only:   712 slots
	AF-only:   604 slots
	both:    39667 slots

How well is the agreement?

edit distance   0 : 39507 words
edit distance   1 :   908 words
edit distance   2 :    51 words
edit distance   3 :    11 words
edit distance  88 :    45 words
edit distance  99 :   461 words
NB: 88 are special cases that have been declared explicitly

COMBINATIONS

What combination alignments are there and how many?
	( 0,  4) :    1 x :
EXAMPLE 1:
pag:ln|slot |cc|textLakhnawi        |@ed~rat|textAfifi           |cc| slot|pag:ln
------|-----|--|--------------------|-------|--------------------|--|-----|------
051:01| 4598|1 |               nwḥyŧ|@88~0.0|wḥh                 | 1| 4555 068:03
051:02| 4599|  |                āʿlm|@0 ~1.0|āʿlm                |  | 4556 068:04
      |     |0 |                    |@88~0.0|āydlk               | 4| 4557 068:04
      |     |0 |                    |@88~0.0|āllh                | 4| 4558 068:04
      |     
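
The figures reported above can be cross-checked against each other with a few lines of arithmetic. The sketch below only uses numbers copied from the outputs in this notebook. The variable names are made up for the illustration, and the reading that a special case `(n, m)` contributes `max(n, m)` entries with distance `88` is an assumption that happens to fit the counts.

```python
# Numbers copied from the outputs above.
max_lk, max_af = 40379, 40271
entries = 40983
lk_only, af_only, both = 712, 604, 39667
by_distance = {0: 39507, 1: 908, 2: 51, 3: 11, 88: 45, 99: 461}

# The special cases passed to readEditions: LK slot -> (LK words, AF words).
cases = {
    3072: (5, 1), 4597: (1, 1), 4598: (1, 1), 4600: (0, 4), 8273: (4, 0),
    13539: (2, 1), 14878: (1, 1), 14879: (1, 0), 14880: (12, 0),
    16198: (1, 1), 16199: (1, 1), 16200: (1, 1), 16201: (1, 1), 16212: (1, 1),
    18029: (6, 1), 22762: (1, 0), 22763: (1, 1), 22764: (1, 1),
}

# Every slot of an edition is either unmatched or matched.
assert lk_only + both == max_lk
assert af_only + both == max_af

# Both breakdowns add up to the number of entries in the alignment table.
assert lk_only + af_only + both == entries
assert sum(by_distance.values()) == entries

# Assumption: each special case (n, m) yields max(n, m) entries marked 88.
assert sum(max(n, m) for n, m in cases.values()) == by_distance[88]

exact_share = by_distance[0] / entries
print(f"exact matches: {exact_share:.1%} of all entries")
```

All assertions hold for the numbers in this run, so the slot counts, the edit-distance histogram and the declared special cases are consistent with each other.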

# Test cases

In [193]:
doDiffs(startLK=178, startAF=182, steps=4, show=True, debug=True)

[178~182] single comparison with strictness 0
[180~183] single comparison with strictness 0 failed
[180~183] single comparison with strictness 0 recovered
[179~183] left 1-jump to 180
[182~185] single comparison with strictness 1
[183~186] single comparison with strictness 0 failed
[183~186] single comparison with strictness 0 recovered
  178                    ālḥḳ @0  ālḥḳ                      182
  179                   tʿālá @99                       0      
  180                     lmā @0  mā                        183
  181                     smʿ @0  smʿ                       184
  182                   dʿāʾy @1  dʿāy                      185
  183                      ḳd @0  fd                        186
  184                    āǧāb @0  āǧāb                      187
 2 taken:     2 x
 3 taken:     1 x
 4 taken:     1 x


In [194]:
doDiffs(startLK=13347, startAF=13338, steps=4, show=True, debug=True)

[13347~13338] single comparison with strictness 0
[13348~13339] single comparison with strictness 0
[13349~13340] (2, 2) comparison with strictness 0
[13351~13342] single comparison with strictness 0
13347                  ālālhy @0  ālālhy                  13338
13348                     flā @0  flā                     13339
13349 2                   ḳrb @0  ḳrbā                  2 13340
13350 2                  āḳrb @0  ḳrb                   2 13341
13351                      mn @0  mn                      13342
 2 taken:     4 x


In [195]:
doDiffs(startLK=5635, startAF=5607, steps=4, show=True, debug=True)

[5635~5607] single comparison with strictness 3 failed
[5635~5607] single comparison with strictness 3 recovered
[5637~5618] single comparison with strictness 0
[5637~5609] right 9-jump to 5618
[5638~5619] single comparison with strictness 0
[5639~5620] single comparison with strictness 0
 5635                     ḳyl @3  smwhm                    5607
 5636                     lhm @3  flw                      5608
      0                       @99 smwhm                    5609
      0                       @99 ā                        5610
      0                       @99 lsmwhm                   5611
      0                       @99 ḥǧārŧ                    5612
      0                       @99 wšǧrā                    5613
      0                       @99 wkwkbā                   5614
      0                       @99 wlw                      5615
      0                       @99 ḳyl                      5616
      0                       @99 lhm                      5617
 5637 

In [196]:
doDiffs(startLK=11794, startAF=11799, steps=4, show=True, debug=True)

[11794~11799] single comparison with strictness 0 failed
[11794~11799] single comparison with strictness 0 recovered
[11796~11801] single comparison with strictness 0
[11797~11802] single comparison with strictness 0
[11798~11803] single comparison with strictness 0
11794                   ānḳṣt @0  tḳf                     11799
11795                    ʿlyh @0  ʿlyh                    11800
11796                      ān @0  ān                      11801
11797                     šāʾ @0  šāʾ                     11802
11798                    āllh @0  āllh                    11803
 2 taken:     4 x


In [234]:
doDiffs(startLK=27434, startAF=27337, steps=2, show=True, debug=True)

[27434~27337] single comparison with strictness 1
[27442~27338] single comparison with strictness 1
[27435~27338] left 7-jump to 27442
27434                     kmā @1 ~0.8 kā                      27337
27435                   nsbth @99~0.0                       0      
27436                 ālfwḳyŧ @99~0.0                       0      
27437                    ālyh @99~0.0                       0      
27438                      fy @99~0.0                       0      
27439                    ḳwlh @99~0.0                       0      
27440                  yḫāfwn @99~0.0                       0      
27441                    rbhm @99~0.0                       0      
27442                      mn @1 ~0.5 ān                      27338
 3 taken:     1 x
 9 taken:     1 x


In [277]:
doDiffs(startLK=4595, startAF=4552, steps=10, show=True, debug=True)

[4595~4552] single comparison with distance <= 1 and ratio >= 0.8
[4596~4553] single comparison with distance <= 3 and ratio >= 0.6
[4597~4554] special case (1, 1)
[4598~4555] special case (1, 1)
[4599~4556] single comparison with distance <= 0 and ratio >= 1.0
[4600~4557] special case (0, 4)
[4600~4561] single comparison with distance <= 0 and ratio >= 1.0
[4601~4562] single comparison with distance <= 0 and ratio >= 1.0
[4602~4563] single comparison with distance <= 0 and ratio >= 1.0
[4603~4564] single comparison with distance <= 0 and ratio >= 1.0
 4595                  sbwḥyŧ @1 ~0.8 sbwḥyh                   4552
 4596                      fy @1 ~0.7 y                        4553
 4597 1                  klmŧ @88~0.0 lʿh                   1  4554
 4598 1                 nwḥyŧ @88~0.0 wḥh                   1  4555
 4599                    āʿlm @0 ~1.0 āʿlm                     4556
      0                       @88~0.0 āydlk                 4  4557
      0                       @88~

In [286]:
doDiffs(startLK=16197, startAF=16133, steps=10, show=True, debug=True)

[16197~16133] single comparison with distance <= 0 and ratio >= 1.0
[16198~16134] special case (1, 1)
[16199~16135] special case (1, 1)
[16200~16136] special case (1, 1)
[16201~16137] special case (1, 1)
[16202~16138] single comparison with distance <= 0 and ratio >= 1.0
[16203~16139] single comparison with distance <= 0 and ratio >= 1.0
[16204~16140] single comparison with distance <= 0 and ratio >= 1.0
[16205~16141] single comparison with distance <= 0 and ratio >= 1.0
[16206~16142] single comparison with distance <= 0 and ratio >= 1.0
16197                      fy @0 ~1.0 fy                      16133
16198 1              ālʿārfyn @88~0.0 ālʿārf                1 16134
16199 1                   tḳf @88~0.0 yḳf                   1 16135
16200 1                 ʿndhā @88~0.0 ʿndhā                 1 16136
16201 1                    bl @88~0.0 l                     1 16137
16202                      hw @0 ~1.0 hw                      16138
16203                  ālʿārf @0 ~1.0 ālʿārf    

In [291]:
doDiffs(startLK=16210, startAF=16146, steps=15, show=True, debug=True)

[16210~16146] single comparison with distance <= 0 and ratio >= 1.0
[16211~16147] single comparison with distance <= 0 and ratio >= 1.0
[16212~16148] special case (1, 1)
[16213~16149] single comparison failed with distance <= 0 and ratio >= 1.0
[16213~16149] single comparison recovered with distance <= 0 and ratio >= 1.0
[16215~16151] single comparison with distance <= 0 and ratio >= 1.0
[16216~16152] (2, 3) comparison with distance <= 1 and ratio >= 0.8
[16218~16155] (2, 3) comparison with distance <= 2 and ratio >= 0.7
[16220~16158] single comparison with distance <= 0 and ratio >= 1.0
[16221~16159] single comparison with distance <= 0 and ratio >= 1.0
[16222~16160] (2, 1) comparison with distance <= 0 and ratio >= 1.0
[16224~16161] single comparison with distance <= 0 and ratio >= 1.0
[16225~16162] single comparison with distance <= 0 and ratio >= 1.0
[16226~16163] single comparison with distance <= 0 and ratio >= 1.0
[16227~16164] (2, 1) comparison with distance <= 0 and ratio >= 1