# Wrangling GBI Alignments into Workflow

GBI has graciously provided alignments for the ESV, KJV, and NIV.
Unfortunately, those alignments are closed-source, so this notebook
provides only the code used to interpret the aligned JSON files.

In [1]:
import collections
import re
import pandas as pd
import json
import unicodedata as unicode
from pathlib import Path
from pprint import pprint
import Levenshtein

# import BHSA data with TF to look at alignment possibilities
from tf.app import use

# organize pathways
PROJ_DIR = Path.home().joinpath('github/CambridgeSemiticsLab/translation_traditions_HB')
GBI_DATA_DIR = PROJ_DIR.joinpath('data/_private_/GBI_alignment')

# load BHSA data
bhsa = use('bhsa')
api = bhsa.api
F, E, T, L, Fs = api.F, api.E, api.T, api.L, api.Fs

# Explore GBI Data

In [2]:
file2data = {}
for file in GBI_DATA_DIR.glob('*.json'):
    if 'ot' in file.name:
        file2data[file.stem] = json.loads(file.read_text())
        
print('keys:', file2data.keys())

keys: dict_keys(['niv84.ot.alignment', 'kjv.ot.alignment', 'esv.ot.alignment'])


In [3]:
# let's check out the NIV OT alignment

niv_data = file2data['niv84.ot.alignment']

len(niv_data)

23202

In [4]:
ex_verse = niv_data[1]

# parts of individual entry
ex_verse.keys()

dict_keys(['manuscript', 'translation', 'links'])

In [5]:
#ex_verse['links'] # look at the alignment list

In [6]:
# experiment with multiple word link: "was hovering"

def get_trans_text(manu, trans, verse):
    """Join text/translated words for comparison"""
    # strip the RL mark (\u200e) from every word, not only from the ends of the joined string
    heb_txt = ' '.join(verse['manuscript']['words'][h]['text'].strip('\u200e') for h in manu)
    eng_txt = ' '.join(verse['translation']['words'][e]['text'] for e in sorted(trans))
    return (heb_txt, eng_txt)
    
for manu, trans in ex_verse['links']:
    if len(trans) > 1:
        trans = sorted(trans)
        heb_txt, eng_txt = get_trans_text(manu, trans, ex_verse)
        print(f'{heb_txt} -> {eng_txt}')

פְּנֵ֣י -> surface of
ר֣וּחַ -> Spirit of
מְרַחֶ֖פֶת -> was hovering


In [7]:
verb_dataset = []

# experiment with collecting verbs in HB
for verse in niv_data:
    for manu, trans in verse['links']:
        _mainword_ = manu[0]
        if verse['manuscript']['words'][_mainword_]['pos'] == 'verb':
            heb_txt, eng_txt = get_trans_text(manu, trans, verse)
            verb_dataset.append((heb_txt, eng_txt))
            
verb_dataset[:5]

[('בָּרָ֣א', 'created'),
 ('הָיְתָ֥ה', 'was'),
 ('מְרַחֶ֖פֶת', 'was hovering'),
 ('יֹּ֥אמֶר', 'said'),
 ('יְהִ֣י', 'Let there be')]

In [8]:
verb_dataset[:10]

[('בָּרָ֣א', 'created'),
 ('הָיְתָ֥ה', 'was'),
 ('מְרַחֶ֖פֶת', 'was hovering'),
 ('יֹּ֥אמֶר', 'said'),
 ('יְהִ֣י', 'Let there be'),
 ('יְהִי־', 'there was'),
 ('יַּ֧רְא', 'saw'),
 ('יַּבְדֵּ֣ל', 'separated'),
 ('יִּקְרָ֨א', 'called'),
 ('קָ֣רָא', 'he called')]

In [9]:
#ex_verse['translation']['words'][:2]

In [10]:
#ex_verse['manuscript']['words'][:2]

In [11]:
# ex_verse['manuscript']['words']

# Alignment with BHSA

We will seek to align the Hebrew texts on the basis of the consonantal text. For this, we take
some cues from Dirk Roorda's alignment efforts between BHSA and OSM: 
https://github.com/ETCBC/bridging/blob/master/programs/BHSAbridgeOSM.ipynb

The essential difficulty for alignment is word division, which differs between texts. The GBI
Hebrew data adds a further difficulty: it is based on the Westminster Leningrad Codex (WLC),
which sometimes follows the ketiv and at other times the qere.

## Alignment Strategy

One option for aligning the texts is to iterate word by word while keeping track of the current
position in each text. With BHSA as the reference point, we would iterate over all BHSA word nodes
and try to match each one with the next WLC word. Where a BHSA word is longer than the WLC word, we
would advance the WLC position until the concatenated WLC words compose a match; where it is shorter,
we would do the reverse and advance the BHSA position. Dirk Roorda followed this strategy in the
BHSA // OSM alignment.
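For comparison, here is a simplified sketch of that cursor strategy. It is not Roorda's code; `cursor_align` is a hypothetical helper, and it assumes both word lists are already normalized so that their words concatenate to exactly the same string.

```python
def cursor_align(words_a, words_b):
    """Align two word lists whose concatenations spell the same string.

    Keeps one cursor per list; whenever the current chunks differ,
    the shorter side swallows its next word until both chunks match.
    """
    alignment = []
    i = j = 0
    while i < len(words_a) and j < len(words_b):
        group_a, group_b = [i], [j]
        chunk_a, chunk_b = words_a[i], words_b[j]
        while chunk_a != chunk_b:
            # the shorter chunk must be a prefix of the longer one
            if not (chunk_a.startswith(chunk_b) or chunk_b.startswith(chunk_a)):
                raise ValueError(f"texts diverge: {chunk_a!r} vs {chunk_b!r}")
            if len(chunk_a) < len(chunk_b):
                i += 1
                if i == len(words_a):
                    raise ValueError("text A ran out of words")
                group_a.append(i)
                chunk_a += words_a[i]
            else:
                j += 1
                if j == len(words_b):
                    raise ValueError("text B ran out of words")
                group_b.append(j)
                chunk_b += words_b[j]
        alignment.append((group_a, group_b))
        i += 1
        j += 1
    return alignment
```

With `cursor_align(["Acat", "jumped", "up"], ["A", "cat", "jumped", "up"])` the result is `[([0], [0, 1]), ([1], [2]), ([2], [3])]`. The bookkeeping of two moving cursors is what the string-index approach below avoids.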

Another option, which we shall follow here, is to use verse identity to map positions in 
both BHSA and WLC to a common reference string. For example, consider the following string:

> "A cat jumped up"

Let textA and textB be two texts that segment (and therefore index) this string differently:

```
            0        1        2
textA = ["A cat", "jumped", "up"]

          0     1        2       3
textB = ["A", "cat", "jumped", "up"]
```

The underlying string is the same for both texts, however: with spaces removed, both join into
an identical string with its own indices:

```
          0 1 2 3 4 5 6 7 8 9 10 11
string = "A c a t j u m p e d  u  p"
          
```

We can use this identity property as a common reference point to which the text positions can
be mapped and translated:

```
textA    string                 textB
[0]  ->  [0, 1, 2, 3]       <-  [0, 1]
[1]  ->  [4, 5, 6, 7, 8, 9] <-  [2]
[2]  ->  [10, 11]           <-  [3]
```

Using these mappings, we can produce an alignment:

```
[
    [[0], [0, 1]],
    [[1], [2]],
    [[2], [3]]
]
```
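The same toy example can be run as code. This is a minimal sketch with hypothetical helper names (`char_spans`, `align_by_string`): each text's words are mapped to character positions in the joined string, text B's map is inverted, and each text A word is linked to every text B word that shares one of its positions.

```python
def char_spans(words):
    """Map each word index to the character positions it occupies in ''.join(words)."""
    spans, pos = {}, 0
    for w, word in enumerate(words):
        spans[w] = list(range(pos, pos + len(word)))
        pos += len(word)
    return spans


def align_by_string(text_a, text_b):
    """Align two segmentations of the same string via shared character indices."""
    # drop spaces so "A cat" and "A", "cat" spell the same characters
    words_a = [w.replace(' ', '') for w in text_a]
    words_b = [w.replace(' ', '') for w in text_b]
    if ''.join(words_a) != ''.join(words_b):
        raise ValueError("the two texts do not spell the same string")

    # invert B's spans: character position -> index of the B word covering it
    pos2b = {p: b for b, span in char_spans(words_b).items() for p in span}

    alignment = []
    for a, span in char_spans(words_a).items():
        alignment.append([[a], sorted({pos2b[p] for p in span})])
    return alignment
```

Calling `align_by_string(["A cat", "jumped", "up"], ["A", "cat", "jumped", "up"])` reproduces the alignment above. Note that a word contributing no characters (an empty string after normalization) ends up linked to nothing, so such words need separate handling.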

Since the two sources share the same verse divisions, we need only ensure that the verse strings
are likewise identical wherever possible. We do this by converting the text in both sources to
its consonantal form, stripping characters unique to either source, removing spaces, and
concatenating the words with no separator to create an indexable string.
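A compact sketch of that normalization, assuming the consonants are exactly the Unicode range U+05D0 to U+05EA (alef to tav, final forms included). The helper name `consonantal` is hypothetical, and it uses `str.translate` for final letters as an alternative to a per-character dictionary lookup.

```python
import re
import unicodedata

# Hebrew consonants alef..tav (U+05D0-U+05EA), final forms included
HEB_CONSONANT = re.compile('[\u05D0-\u05EA]')
# final forms -> regular forms, so word-final and medial letters compare equal
UNFINAL = str.maketrans('\u05DA\u05DD\u05DF\u05E3\u05E5',
                        '\u05DB\u05DE\u05E0\u05E4\u05E6')


def consonantal(text):
    """Reduce pointed Hebrew to bare consonants with no spaces or marks."""
    # NFD splits precomposed letters (e.g. shin + shin dot) into base + mark
    decomposed = unicodedata.normalize('NFD', text)
    # keeping only consonants drops vowels, accents, maqaf, spaces, and RL marks
    return ''.join(HEB_CONSONANT.findall(decomposed)).translate(UNFINAL)
```

Two differently segmented and pointed renderings of the same words, such as a prefixed בְּ written separately or attached, both reduce to the same consonant string, which is what makes the character indices comparable.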

## Preprocess verse strings

The first step is to build the verse strings used for indexing. Doing so also tests
whether the strategy works and catches the exceptional cases where it does not.

We need to do a few things in this preprocessing stage:

1. Recognize verse id tags in the GBI dataset and convert them to TF reference tuples.
2. GBI alignments are organized into lists which mostly, but not always, correspond with
   verses. To aid the alignment work, we fix this discrepancy by following the Hebrew
   versification and using word id links rather than list indices.
3. Prepare strings and indices for alignment.
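Step 2 can be sketched as follows, assuming the GBI entry shape seen above (`entry['manuscript']['words'][i]['id']`) and the 12-digit id layout examined in the next subsection (2 digits book, 3 chapter, 3 verse, 3 word, 1 part). `regroup_by_verse` is a hypothetical helper; the notebook's own version groups on `tf_ref` instead.

```python
from collections import defaultdict


def regroup_by_verse(pseudo_verses):
    """Regroup GBI word ids by the verse encoded in each id.

    Returns {(book, chapter, verse): [word ids in original order]},
    regardless of which GBI list each word happened to sit in.
    """
    verses = defaultdict(list)
    for entry in pseudo_verses:
        for word in entry['manuscript']['words']:
            digits = str(word['id']).zfill(12)  # books 1-9 lack a leading 0
            ref = (int(digits[0:2]), int(digits[2:5]), int(digits[5:8]))
            verses[ref].append(word['id'])
    return dict(verses)
```

A word that GBI filed under the "wrong" list still lands in the verse its id names, so list boundaries stop mattering.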
   
    
### Exploratory analysis of word ids

GBI word ids contain versification information. We look at the length of the ids
so that we can write a regex pattern that recognizes each of their parts.

In [12]:
# reference data stored at word level in 'id' key
# I expect 'id' length to vary by 1, thus 11 or 12

id_lengths = collections.Counter()

for verse in niv_data:
    for word in verse['manuscript']['words']:
        id_lengths[len(str(word['id']))] += 1
        
id_lengths.most_common()

[(12, 295168), (11, 179844)]

Ids for books 1-9 have only 11 digits, because the integer drops the leading zero of the
two-digit book number. We correct for this in `id2ref` by left-padding the id with a 0 before regex matching.
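The padding can also be done with `str.zfill`, after which plain slicing recovers the parts. A hypothetical alternative to the regex-based `id2ref`, assuming the 12-digit layout book (2), chapter (3), verse (3), word (3), part (1):

```python
def split_gbi_id(id_int):
    """Split a GBI word id into (book, chapter, verse, word, part) integers."""
    digits = str(id_int).zfill(12)  # restore the leading 0 of books 1-9
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError(f"unexpected GBI id: {id_int!r}")
    return (int(digits[0:2]), int(digits[2:5]), int(digits[5:8]),
            int(digits[8:11]), int(digits[11]))
```

For example, `split_gbi_id(10010010011)` gives `(1, 1, 1, 1, 1)` (Genesis 1:1, word 1, part 1).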

### 1. Function for converting ids to TF reference tuples

GBI uses English book order, so we need to build an int-to-book mapping using that order.

In [13]:
# eng book order
eng_book_list = '''
Genesis
Exodus
Leviticus
Numbers
Deuteronomy
Joshua
Judges
Ruth
1 Samuel
2 Samuel
1 Kings
2 Kings
1 Chronicles
2 Chronicles
Ezra
Nehemiah
Esther
Job
Psalms
Proverbs
Ecclesiastes
Song of songs
Isaiah
Jeremiah
Lamentations
Ezekiel
Daniel
Hosea
Joel
Amos
Obadiah
Jonah
Micah
Nahum
Habakkuk
Zephaniah
Haggai
Zechariah
Malachi
'''.strip().replace(' ', '_').split()

# map to integers
int2book = {
    i+1: book for i, book in enumerate(eng_book_list)
}

# regex pattern for matching word ID info to its parts
# e.g. ('01', '001', '001', '001', '1')
# i.e. (bookN, chapterN, verseN, wordN, partN)
ref_id_re = re.compile('([0-9]{2})([0-9]{3})([0-9]{3})([0-9]{3})([1-9])')

def id2ref(id_int):
    """Convert GBI ID ref tag to TF ref tuple"""
    
    id_str = str(id_int)
    
    # fix ambiguity with lack of book padding in single-digit books
    if len(id_str) == 11:
        id_str = '0' + id_str
        
    # match parts
    bookN, chapterN, verseN, wordN, partN = ref_id_re.match(id_str).groups()
    
    book = int2book.get(int(bookN))
    chapter = int(chapterN)
    verse = int(verseN)
    
    return (book, chapter, verse)

Now let's do a sanity check for book order. I'll do this by looking at the 
verse counts for each book. If we've got the book order wrong, we'll see some
anomalous counts.

[postscript: when I first tried Hebrew book order there was an anomaly, with
Jonah showing a verse count of 2,523 😂; switching to English order fixed it]

In [14]:
verse_data = []

# compile a dataset for easy dataframe statistics
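# NB: GBI lists are only "pseudo-verses" (see step 2); use a separate
# loop name so the unpacked `verse` number doesn't shadow it
for pseudo_verse in niv_data:
    first_word = pseudo_verse['manuscript']['words'][0]
    book, chapter, verse = id2ref(first_word['id'])
    verse_data.append({'book': book, 'chapter': chapter, 'verse': verse})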
    
verse_data_df = pd.DataFrame(verse_data)

In [15]:
verse_data_df.head()

      book  chapter  verse
0  Genesis        1      1
1  Genesis        1      2
2  Genesis        1      3
3  Genesis        1      4
4  Genesis        1      5


In [16]:
# count n verses by book

verse_data_df.book.value_counts()

Psalms           2523
Genesis          1533
Jeremiah         1364
Isaiah           1291
Numbers          1288
Ezekiel          1273
Exodus           1213
Job              1070
Deuteronomy       959
1_Chronicles      942
Proverbs          915
Leviticus         859
2_Chronicles      822
1_Kings           813
1_Samuel          810
2_Kings           719
2_Samuel          695
Joshua            658
Judges            618
Nehemiah          405
Daniel            357
Ezra              280
Ecclesiastes      222
Zechariah         211
Hosea             197
Esther            167
Lamentations      154
Amos              146
Song_of_songs     117
Micah             105
Ruth               85
Joel               73
Habakkuk           56
Malachi            55
Zephaniah          53
Jonah              48
Nahum              47
Haggai             38
Obadiah            21
Name: book, dtype: int64

In [17]:
len(L.d(T.nodeFromSection(('Job',)), 'verse')) # sanity check Job verse length

1070

Everything seems to be in order.
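The Job check can be extended to every book at once. A minimal sketch with a hypothetical helper `count_mismatches`, assuming the expected counts come as a plain `{book: n_verses}` dict:

```python
from collections import Counter


def count_mismatches(refs, expected_counts):
    """Compare verses-per-book in `refs` against expected counts.

    refs: iterable of (book, chapter, verse) tuples, e.g. from id2ref
    expected_counts: {book: n_verses}
    Returns {book: (observed, expected)} for every book that disagrees.
    """
    observed = Counter(book for book, _chapter, _verse in refs)
    books = set(observed) | set(expected_counts)
    return {
        b: (observed.get(b, 0), expected_counts.get(b, 0))
        for b in books
        if observed.get(b, 0) != expected_counts.get(b, 0)
    }
```

Assuming the BHSA English book names match `eng_book_list` (the Job check suggests they do), the expected counts could be built with `{book: len(L.d(T.nodeFromSection((book,)), 'verse')) for book in eng_book_list}` and compared against `verse_data_df[['book', 'chapter', 'verse']].itertuples(index=False)`. Since GBI lists do not always coincide with verses, a few small mismatches would not by themselves indicate a wrong book order.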

### 2. Preprocess GBI data and convert to dict of word ids with links paired to each word

In [18]:
gbi_words = {}

for pseudo_verse in niv_data:
    
    # unpack hebrew, english, and alignment data
    hebrew_words = pseudo_verse['manuscript']['words']
    english_words = pseudo_verse['translation']['words']
    links = pseudo_verse['links']
    
    # enter all hebrew words into gbi words
    # words not included in the translation are not included in the links
    # e.g. את in Gen 1:1 has no entry in links
    # so we must initialize all words with an empty links item
    for hw in hebrew_words:
        hw_id = hw['id']
        hw['tf_ref'] = id2ref(hw_id) # store TF tuple ref
        hw['links'] = [] # compile linked english word data here
        hw['trans_span'] = [] # see below
        gbi_words[hw_id] = hw

    # add alignment to words that have it
    for link in links:
    
        heb_indices, eng_indices = link
            
        # we need to keep track of those cases where
        # multiple Hebrew words are covered by the same
        # english translation string; these are cases where
        # the Hebrew side of the link contains more than 1 element
        # thus we build a list of all ids in the Heb side that will be 
        # added as a key of the word's dictionary, trans_span
        heb_ids = [hebrew_words[i]['id'] for i in heb_indices]
        
        # store links and verse id data under word dict
        # and prepare word dict to be stored in gbi_words
        for hi in heb_indices:
            hword_id = hebrew_words[hi]['id']
            hword = gbi_words[hword_id]
            hword['trans_span'] = sorted(heb_ids) # list of peer Hebrew words in translation
            
            # collect linked english word data
            for ei in eng_indices:
                eword = english_words[ei]
                hword['links'].append(eword)
                
            # keep the linked English words in translation order
            hword['links'] = sorted(hword['links'], key=lambda k: k['id'])

In [19]:
# # print example with multiple links in translation
# for w, wdat in gbi_words.items():
#     if len(wdat['links']) > 2:
#         pprint(wdat)
#         break

In [20]:
# gbi_words[10010010041] # NB empty links

### 3. Build normalized verse strings with maps to ids

We do the same for BHSA, mapping to word node numbers instead of GBI ids.

In [21]:
# define patterns and functions which are used to
# normalize both Hebrew texts to a plain consonantal
# version without punctuation or spacing

# pattern matches only Heb consonants for filtering
heb_cons = re.compile('[\u05D0-\u05EA]')

# to normalize final letters
final_letters = {
    'ך':\
    'כ',
    'ם':\
    'מ',
    'ן':\
    'נ',
    'ף':\
    'פ',
    'ץ':\
    'צ',
}

def unFinal(s): 
    """"Replace final Heb letters with non-final version.
    
    Credit Dirk Roorda
    """
    return ''.join(final_letters.get(c, c) for c in s)

def normalize_string(string):
    """Normalize BHSA/WLC strings to make them comparable."""
    string = unicode.normalize('NFD', string) # decompose precomposed chars
    # keep only consonants; this also strips vowels/points, spaces,
    # and the RL mark (\u200e) found in GBI text
    string = ''.join(heb_cons.findall(string))
    string = unFinal(string) # disambiguate final letters
    return string

For BHSA, we need at least two verse strings: one for the ketiv and one for the qere. We therefore
allow multiple strings, each with its own index mapping, for every verse in both BHSA and GBI.
The algorithm then picks out any matching pair and uses it as the basis for the alignment.
Allowing multiple strings for GBI/WLC also makes it possible to resolve exceptions by adding
strings manually.
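The "pick out any matching pair" step amounts to a search over the Cartesian product of the two variant lists. A minimal sketch (hypothetical helper `first_matching_pair`, using `itertools.product` in place of a nested loop with a flag-controlled double `break`):

```python
from itertools import product


def first_matching_pair(variants_a, variants_b):
    """Return (name_a, name_b) for the first identical pair of variants, else None.

    Each list holds (name, string) pairs, e.g. ('qere', ...) or ('ketiv', ...).
    Pairs are tried in list order, so earlier variants take priority.
    """
    for (name_a, str_a), (name_b, str_b) in product(variants_a, variants_b):
        if str_a == str_b:
            return name_a, name_b
    return None
```

Because the first hit wins, the order of the variant lists doubles as a preference order, which is worth keeping in mind when deciding whether `qere` or `qere+art` should be tried first.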

In [53]:
def build_string_data(word_ids, string_instructs):
    """Build strings from words with index maps.
    
    Uses the data in string_instructs to convert each word
    into a string and add that string to a large string. 
    Maps each word id to a span of character indices for the larger
    string, telling which indices correspond with a given word.
    
    Args:
        word_ids: list of word ids unique to BHSA or WLC
        string_instructs: list of paired string names / string-making functions
            where string-maker takes a word id and converts to string
        
    Returns:
        list of three-tuples of name, string, and index mappings for words.
    """
    string_data = []
    
    # build data for each string type
    for name, stringifier in string_instructs:
        
        string = ''
        index = -1
        mapping = collections.defaultdict(list)
        
        for word in word_ids:
            for c in stringifier(word):
                string += c
                index += 1
                mapping[word].append(index)
                
        string_data.append((name, string, mapping))
    
    return string_data

In [55]:
# -- build strings and indices for both BHSA and WLC -- 
# store them in verse_strings for alignment in next step

verse_strings = collections.defaultdict(dict)
    
# -- 1. BHSA --

# define a bunch of stringifier functions
# the variant strings produced by these
# functions will be compared pairwise with the 
# gbi strings to look for any pairwise match; 
# having numerous variants allows for a more robust
# matching process

def bhsa_qere(word):
    """Generate qere strings from BHSA word"""
    string = F.qere_utf8.v(word) or F.g_word_utf8.v(word)
    string = normalize_string(string)
    return string
    
def bhsa_ketiv(word):
    """Generate ketiv strings from BHSA word"""
    string = F.g_word_utf8.v(word)
    string = normalize_string(string)
    return string

def bhsa_qere_art(word):
    """Generate qere with article"""
    string = bhsa_qere(word)
    if not string and F.lex.v(word) == 'H':
        string = 'ה'
    return string
    
def bhsa_ketiv_art(word):
    """Generate ketiv with article"""
    string = bhsa_ketiv(word)
    if not string and F.lex.v(word) == 'H':
        string = 'ה'
    return string

# iterate through all BHSA verses and build strings/indices
for verse in F.otype.s('verse'):
    ref_tuple = T.sectionFromNode(verse)
    verse_words = L.d(verse, 'word')
    string_instructs = [
        ('qere', bhsa_qere),
        ('ketiv', bhsa_ketiv),
        ('qere+art', bhsa_qere_art),
        ('ketiv+art', bhsa_ketiv_art),
    ]
    string_data = build_string_data(verse_words, string_instructs)
    verse_strings['bhsa'][ref_tuple] = string_data
    
# -- 2. WLC --

def gbi_string(word):
    """Generate string for GBI word"""
    string = gbi_words[word]['text']
    string = normalize_string(string)
    return string

def gbi_string_art(word):
    """Build string that has vocalized article
    
    Vocalization of articles can differ between
    the two sources.
    """
    string = gbi_string(word)
    if not string and gbi_words[word]['lemma'] == 'הַ':
        string = 'ה'
    return string

# cluster GBI/WLC words into verses
gbi_verses = collections.defaultdict(list)
for word, word_data in gbi_words.items():
    gbi_verses[word_data['tf_ref']].append(word_data['id'])

# iterate through verses and build strings/indices
for ref_tuple, words in gbi_verses.items():
    string_instructs = [
        ('string', gbi_string),
        ('string+art', gbi_string_art),
    ]
    string_data = build_string_data(words, string_instructs)
    verse_strings['gbi'][ref_tuple] = string_data

Look at a sample of the mappings / strings:

In [56]:
verse_strings['bhsa'][('Genesis', 1, 1)]

[('qere',
  'בראשיתבראאלהימאתהשמימואתהארצ',
  defaultdict(list,
              {1: [0],
               2: [1, 2, 3, 4, 5],
               3: [6, 7, 8],
               4: [9, 10, 11, 12, 13],
               5: [14, 15],
               6: [16],
               7: [17, 18, 19, 20],
               8: [21],
               9: [22, 23],
               10: [24],
               11: [25, 26, 27]})),
 ('ketiv',
  'בראשיתבראאלהימאתהשמימואתהארצ',
  defaultdict(list,
              {1: [0],
               2: [1, 2, 3, 4, 5],
               3: [6, 7, 8],
               4: [9, 10, 11, 12, 13],
               5: [14, 15],
               6: [16],
               7: [17, 18, 19, 20],
               8: [21],
               9: [22, 23],
               10: [24],
               11: [25, 26, 27]})),
 ('qere+art',
  'בראשיתבראאלהימאתהשמימואתהארצ',
  defaultdict(list,
              {1: [0],
               2: [1, 2, 3, 4, 5],
               3: [6, 7, 8],
               4: [9, 10, 11, 12, 13],
               5: [14, 

In [57]:
verse_strings['gbi'][('Genesis', 1, 1)]

[('string',
  'בראשיתבראאלהימאתהשמימואתהארצ',
  defaultdict(list,
              {10010010011: [0],
               10010010012: [1, 2, 3, 4, 5],
               10010010021: [6, 7, 8],
               10010010031: [9, 10, 11, 12, 13],
               10010010041: [14, 15],
               10010010051: [16],
               10010010052: [17, 18, 19, 20],
               10010010061: [21],
               10010010062: [22, 23],
               10010010071: [24],
               10010010072: [25, 26, 27]})),
 ('string+art',
  'בראשיתבראאלהימאתהשמימואתהארצ',
  defaultdict(list,
              {10010010011: [0],
               10010010012: [1, 2, 3, 4, 5],
               10010010021: [6, 7, 8],
               10010010031: [9, 10, 11, 12, 13],
               10010010041: [14, 15],
               10010010051: [16],
               10010010052: [17, 18, 19, 20],
               10010010061: [21],
               10010010062: [22, 23],
               10010010071: [24],
               10010010072: [25, 26, 27

Demonstration of string match:

In [58]:
verse_strings['bhsa'][('Genesis', 1, 1)][0][1] == verse_strings['gbi'][('Genesis', 1, 1)][0][1]

True

## Alignment Algorithm

Now that the strings are ready, it is only a matter of iterating over the verses,
matching up the strings, and cross-referencing the indices.
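One wrinkle is the reverse of the textA/textB example: several BHSA words may all point at the same WLC word. The loop below handles this by extending the previous group whenever a BHSA word's WLC words are already covered by it. A standalone sketch of that merge rule (`merge_alignment` is a hypothetical name):

```python
def merge_alignment(pairs):
    """Collapse consecutive pairs that point at the same target words.

    pairs: list of (source_word, set_of_target_words) in text order.
    If a source word's targets are a subset of the previous group's
    targets, it joins that group (many-to-one). Note that an empty
    target set is a subset of anything, so it also joins the previous group.
    """
    groups = []
    for src, targets in pairs:
        if groups and set(targets) <= set(groups[-1][1]):
            groups[-1][0].append(src)
        else:
            groups.append([[src], sorted(targets)])
    return groups
```

For instance, `[(1, {'a'}), (2, {'a'}), (3, {'b', 'c'})]` becomes `[[[1, 2], ['a']], [[3], ['b', 'c']]]`.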

In [59]:
bhsa2wlc = []
no_match = []

for ref_tuple, string_data in verse_strings['bhsa'].items():
        
    gbi_strings = verse_strings['gbi'][ref_tuple]
        
    # look for matching strings with pairwise iteration
    # if match, save indices for alignment
    bhsa_indices = None
    gbi_indices = None
    for b_str_name, b_str, b_indices in string_data:
        for g_str_name, g_str, g_indices in gbi_strings:
            if b_str == g_str:
                bhsa_indices = b_indices # set indices
                gbi_indices = g_indices
                break
        if bhsa_indices is not None: # break double loop
            break
    
    # -- no match: record a null match --
    if bhsa_indices is None:
        no_match.append([ref_tuple, string_data, gbi_strings])
        continue
        
    # -- match! continue on to alignment maneuver --     
    
    # remap gbi string indices to be the keys for easy selection
    gbi_str2word = {
        str_index:word_id for word_id, indices in gbi_indices.items()
            for str_index in indices
    }
    
    # finally, make the matches by iterating through 
    # string indices matched to wordnode, add to set 
    # to avoid duplicates
    for wordnode, str_indices in bhsa_indices.items():
        
        aligned_gbi = set()
        
        for si in str_indices:
            gbi_word = gbi_str2word[si]            
            aligned_gbi.add(gbi_word)
                        
        # done! save result
            
        # check to see if WLC word has already been matched
        # with previous BHSA word; if so, expand BHSA side
        # of the alignment instead
        if bhsa2wlc and aligned_gbi.issubset(set(bhsa2wlc[-1][1])):
            bhsa2wlc[-1][0].append(wordnode)        
        else:
            bhsa2wlc.append([[wordnode], sorted(aligned_gbi)])
        
print(len(bhsa2wlc), 'matches made')
print(len(no_match), 'matches missed')

419079 matches made
33 matches missed
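For the verses with no identical pair, the next cell picks the closest pair of variants by edit distance using the python-Levenshtein package. For readers without that dependency, here is a plain dynamic-programming sketch of the same Levenshtein distance; `edit_distance` and `closest_pair` are hypothetical, standard-library-only helpers.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and replacements that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # replace (free if chars match)
            ))
        prev = curr
    return prev[-1]


def closest_pair(variants_a, variants_b):
    """Return (distance, name_a, name_b) for the closest (name, string) variant pair."""
    return min(
        (edit_distance(sa, sb), na, nb)
        for (na, sa), (nb, sb) in product(variants_a, variants_b)
    )
```

This runs in O(len(a) * len(b)) time, which is fine at verse length; the C-backed package is faster but computes the same number.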


In [65]:
# Bridge missing links

def find_word_from_index(index_set, index_dict):
    """Selects a word from string index dict"""
    for word, indices in index_dict.items():
        if index_set.issubset(set(indices)):
            return word
        
def build_edits(stringset1, stringset2, debug=False):
    """Iterate through 2 stringsets and look for the differences
    
    Needed to manually correct unlinked verses
    """
    
    # find closest pairwise match with edit distance
    scores = []
    for namei, stri, indi in stringset1:
        for namej, strj, indj in stringset2:
            scores.append((Levenshtein.distance(stri, strj),  (namei, stri, indi), (namej, strj, indj)))
    closest_set = sorted(scores)[0]
    
    # calculate edits necessary
    source_set, dest_set = closest_set[1:]
    source_str, dest_str = source_set[1], dest_set[1]
    source_inds, dest_inds = source_set[2], dest_set[2]
    edit_ops = Levenshtein.editops(source_str, dest_str)
    
    # use edit instructions to find and fix
    # offending words, note "i" refers to index
    alterations = []
    for op, source_i, dest_i in edit_ops:
        
        offending_word = find_word_from_index({source_i}, source_inds)
        ow_text = normalize_string(gbi_words[offending_word]['text'])
        
        # edit op indices are relative to the whole verse string
        # in order to relate it to a single word, must 
        # first find out which index that word starts with in verse string
        # and adjust with the difference accordingly
        orig_source_i = source_i # keep copy for debug
        first_i = source_inds[offending_word][0]
        source_i = source_i - first_i
        
        # apply corrections using indices 
        if op == 'delete':
            mod_text = ow_text[:source_i] + ow_text[source_i+1:]
        elif op == 'insert':
            ins_char = dest_str[dest_i]
            mod_text = ow_text[:source_i] + ins_char + ow_text[source_i:]
        elif op == 'replace':
            repl_char = dest_str[dest_i]
            mod_text= ow_text[:source_i] + repl_char + ow_text[source_i+1:]
        
        # save corrections
        alterations.append({'id':offending_word, 'original': ow_text, 'mod': mod_text})

        # provide printout of activity
        if debug:
            print(op)
            print(source_set[0], source_str)
            print(dest_set[0], dest_str)
        
    return alterations
   

# store edits here
gbi_word_alts = []
debug = False
    
for nm in no_match:
    verse, bhsa_str, gbi_str = nm
    alterations = build_edits(gbi_str, bhsa_str, debug)
    if debug:
        print(verse)
        print(alterations)
        print()
        print('-'*60)
    else:
        gbi_word_alts.extend(alterations)

if not debug:
    print(len(gbi_word_alts), 'alterations prepared')

35 alterations prepared
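Because `build_edits` recomputes `ow_text` from the unmodified GBI text for every edit operation, a word touched by two operations yields two separate entries, each reflecting only one change. A small sketch for spotting such words before applying the alterations (`find_multi_edit_words` is a hypothetical helper; it relies only on the `'id'` key that `build_edits` writes):

```python
from collections import defaultdict


def find_multi_edit_words(alterations):
    """Return {word_id: [alteration, ...]} for ids altered more than once."""
    by_id = defaultdict(list)
    for alt in alterations:
        by_id[alt['id']].append(alt)
    return {wid: alts for wid, alts in by_id.items() if len(alts) > 1}
```

Running it on `gbi_word_alts` and getting an empty dict would mean every alteration can be applied independently.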


In [66]:
gbi_word_alts

[{'id': 10140020171, 'original': 'צבויימ', 'mod': 'צביימ'},
 {'id': 90250180131, 'original': 'עשוית', 'mod': 'עשויות'},
 {'id': 100180080051, 'original': 'נפצת', 'mod': 'נפצית'},
 {'id': 100180120062, 'original': 'לוא', 'mod': 'לא'},
 {'id': 120160060152, 'original': 'אדומימ', 'mod': 'אדמימ'},
 {'id': 120190230072, 'original': 'רב', 'mod': 'רכב'},
 {'id': 230300050031, 'original': 'הביש', 'mod': 'הבאיש'},
 {'id': 230420240042, 'original': 'משסה', 'mod': 'משוסה'},
 {'id': 240050060221, 'original': 'משובותי', 'mod': 'משבותי'},
 {'id': 240150110061, 'original': 'שריתי', 'mod': 'שרית'},
 {'id': 240180160051, 'original': 'שריקות', 'mod': 'שריקת'},
 {'id': 240420200031, 'original': 'התעיתמ', 'mod': 'התעתמ'},
 {'id': 240480050042, 'original': 'לוחית', 'mod': 'לחית'},
 {'id': 260320320052, 'original': 'י', 'mod': 'ו'},
 {'id': 260400310013, 'original': 'ו', 'mod': 'יו'},
 {'id': 260430110122, 'original': 'ו', 'mod': 'יו'},
 {'id': 260440240091, 'original': 'ישפטו', 'mod': 'ישפט'},
 {'id': 1914