<img align="right" src="images/tf-small.png"/>

# Strong numbers

# Application: Strong numbers
Stephen Ku has prepared a Strong number mapping for version `4`, based on 
[OpenScriptures Bible Lexicon](https://github.com/openscriptures/HebrewLexicon).

This provides us with a nice use case:
can we apply the Strong number mapping for version `4` to versions `3`, `4b` and `2016`
as well?
Below we will get a pretty good view on the differences between the versions.
We use the
[BHSA transcription](https://shebanq.ancient-data.org/shebanq/static/docs/BHSA-transcription.pdf)
to write down the diffs.

In [1]:
import os,collections
from tf.fabric import Fabric

We need a map from a version to its previous version.

In [19]:
versions = ['4', '4b', '4c']
locations = {
    '4': '~/github/text-fabric-data-legacy',
    '4b': '~/github/text-fabric-data-legacy',
    '4c': '~/github/text-fabric-data', 
}

preVersion = dict(((v, versions[i]) for (i,v) in enumerate(versions[1:])))
preVersion

{'4b': '4', '4c': '4b'}

Load all versions in one go!
For each version we load the `omap` feature that maps the slots from the previous version to the slots of this version.

In [20]:
TF = {}
api = {}
for v in versions:
    omap = '' if v == '4' else 'omap@{}-{}'.format(preVersion[v], v)
    TF[v] = Fabric(locations=locations[v], modules='hebrew/etcbc{}'.format(v))
    api[v] = TF[v].load('''
        {} lex
    '''.format(omap))

A4 = api['4']
A4b = api['4b']
A4c = api['4c']

This is Text-Fabric 2.2.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
110 features found and 0 ignored
  0.00s loading features ...
   |     0.16s B lex                  from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4
   |     0.00s Feature overview: 105 nodes; 4 edges; 1 configs; 7 computeds
  5.55s All features loaded/computed - for details use loadLog()
This is Text-Fabric 2.2.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs   

# Strong numbers

Let us apply the maps for the purpose of assigning Strong numbers to the words of the versions 4b and 4c.
We have a mapping for 4, compiled as a csv file by Stephen Ku from the OpenScriptures data.

First we perform a basic check on the Strong numbers as provided for version 4.

In [21]:
STRONG = 'hebrew/strong'
strongDir = '{}/{}'.format(os.path.expanduser(locations['4c']), STRONG)
strongFile = '{}/{}'.format(strongDir, 'MonadStrong.csv')
strongs = {}

In [22]:
strongs['4'] = {}
first = True
with open(strongFile, encoding='utf-16') as fh:
    for line in fh:
        if first:
            first = False
            continue
        (slot, strong) = line.rstrip().split(',', 1)
        strongs['4'][int(slot)] = strong

## Consistency check

Do slots with the same lexemes get identical Strong numbers?

In [23]:
def checkConsistency(v):
    strongFromLex = collections.defaultdict(set)
    lexFromStrong = collections.defaultdict(set)

    for n in api[v].F.otype.s('word'):
        if n in strongs[v]:
            strongFromLex[api[v].F.lex.v(n)].add(strongs[v][n])
            lexFromStrong[strongs[v][n]].add(api[v].F.lex.v(n))


    multipleStrongs = set()
    for (lx, strongset) in strongFromLex.items():
        if len(strongset) > 1:
            multipleStrongs.add(lx)

    multipleLexs = set()
    for (st, lexset) in lexFromStrong.items():
        if len(lexset) > 1:
            multipleLexs.add(lx)

    print('{} lexemes with multiple Strong numbers'.format(len(multipleStrongs)))
    print('{} Strong numbers with multiple lexemes'.format(len(multipleStrongs)))
    for lx in sorted(multipleStrongs)[0:10]:
        print('{}: {}'.format(lx, ', '.join(sorted(strongFromLex[lx]))))

In [24]:
checkConsistency('4')

1226 lexemes with multiple Strong numbers
1226 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


Obviously not. The ETCBC lexemes and the Strong numbers are different classification systems for word occurrences in the Bible!

# Map the Strong numbers

In [25]:
strongs['4b'] = {}
for (n, s) in strongs['4'].items():
    for m in A4b.Es('omap@4-4b').f(n):
        strongs['4b'][m] = s

In [26]:
strongs['4c'] = {}
for (n, s) in strongs['4b'].items():
    for m in A4c.Es('omap@4b-4c').f(n):
        strongs['4c'][m] = s

# Check consistency again

Now in the new versions.

In [27]:
checkConsistency('4b')

1219 lexemes with multiple Strong numbers
1219 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


In [28]:
checkConsistency('4c')

1219 lexemes with multiple Strong numbers
1219 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


That looks good.

# Writing the Strong numbers

In [29]:
nodeFeatures = {}
provenance = dict(
    source='Strong numbers provided by https://github.com/openscriptures/HebrewLexicon',
    author='Compiled for ETCBC by Stephen Ku; transferred across versions by Dirk Roorda',
)

for v in versions:
    metaData = {
        '': provenance,
        'otext@strong': {
            'about': 'Provides Strong numbers to Hebrew Words',
            'see': 'https://github.com/ETCBC/text-fabric/blob/master/Versions/strong.ipynb',
            'fmt:lex-strong-plain': '{strong} ',
        },
        'strong': {
            'valueType': 'str',
        },
    }
    nodeFeatures = dict(strong=strongs[v])
    TF[v].save(
        module='hebrew/strong/{}'.format(v),
        nodeFeatures=nodeFeatures,
        metaData=metaData,
    )

  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4:
   |     0.73s T strong               to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
   |     0.00s M otext@strong         to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
  0.74s Exported 1 node features and 0 edge features and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b:
   |     0.72s T strong               to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
   |     0.00s M otext@strong         to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
  0.72s Exported 1 node features and 0 edge features and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/

# Using Strong numbers

Let us load the new `strong` feature in the newest ETCBC version, `4c`.

In [13]:
TF = Fabric(modules=['hebrew/etcbc4c', 'hebrew/strong/4c'])
api = TF.load('''
        g_word_utf8
        lex strong
''')
api.makeAvailableIn(globals())

This is Text-Fabric 2.2.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
110 features found and 0 ignored
  0.00s loading features ...
   |     0.21s B g_word_utf8          from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.29s B strong               from /Users/dirk/github/text-fabric-data/hebrew/strong/4c
   |     0.13s B lex                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s Feature overview: 103 nodes; 5 edges; 2 configs; 7 computeds
  5.96s All features loaded/computed - for details use loadLog()


We print a few verses of Genesis in lexeme and in strong representation.
The module `strong` defines a new text format!

In [14]:
(book, chapter) = ('Genesis', 1)

for verse in range(1,4):
    vn = T.nodeFromSection((book, chapter, verse))
    words = L.d(vn, otype='word')
    for fmt in ('lex-trans-plain', 'lex-strong-plain'):
        print('{} {}:{} ({})\n\t{}'.format(
            book, chapter, verse, fmt,
            T.text(words, fmt=fmt)
        ))

Genesis 1:1 (lex-trans-plain)
	B R>CJT BR> >LHJM >T H CMJM W >T H >RY 
Genesis 1:1 (lex-strong-plain)
	8675 7225 1254 a 430 853 8676 8064 8678 853 8676 776 
Genesis 1:2 (lex-trans-plain)
	W H >RY HJH THW W BHW W XCK <L PNH THWM W RWX >LHJM RXP <L PNH H MJM 
Genesis 1:2 (lex-strong-plain)
	8678 8676 776 1961 8414 8678 922 8678 2822 5921 a 6440 8415 8678 7307 430 7363 b 5921 a 6440 8676 4325 
Genesis 1:3 (lex-trans-plain)
	W >MR >LHJM HJH >WR W HJH >WR 
Genesis 1:3 (lex-strong-plain)
	8678 559 430 1961 216 8678 1961 216 


# Divergence between lexemes and Strong

As we noted when constructing the Strong features, there is no 1-1 correspondence between ETCBC lexemes and Strong numbers. Let us inspect a few cases where they diverge.

We reimplement something like `consistencyCheck()` above, but now based on the active `strong` feature.
And we collect the slots that exhibit one lexeme with several Strong numbers and vice versa.

So let's just collect all relevant information.

In [15]:
strongLex = collections.defaultdict(lambda: collections.defaultdict(set))
lexStrong = collections.defaultdict(lambda: collections.defaultdict(set))

indent(reset=True)
info('Gathering lexemes and Strongs')
for n in F.otype.s('word'):
    lex = F.lex.v(n)
    sng = F.strong.v(n)
    if sng != None:
        strongLex[sng][lex].add(n)
        lexStrong[lex][sng].add(n)
info('Done: {} lexemes and {} Strongs'.format(len(lexStrong), len(strongLex)))

  0.00s Gathering lexemes and Strongs
  2.38s Done: 8771 lexemes and 9300 Strongs


Now rank the lexemes by the number of Strongs they are associated with, and the Strongs by the number of lexemes they
are associated with.

In [16]:
lexRanked = sorted(lexStrong, key=lambda x: -len(lexStrong[x]))
sngRanked = sorted(strongLex, key=lambda x: -len(strongLex[x]))

Inspect the top 10 of both.

In [21]:
def inspectTop(dataRanked, data, amount):
    for d in dataRanked[0:amount]:
        print(d)
        related = data[d]
        for r in related:
            occs = sortNodes(related[r])
            print('\t{} ({} occs)'.format(r, len(occs)))
            n = occs[0]
            s = L.u(n, otype='sentence')[0]
            ws = L.d(s, otype='word')
            print('\t\te.g. {} {}:{} - {} in {}'.format(
                *T.sectionFromNode(n),
                F.g_word_utf8.v(n),
                T.text(ws, fmt='text-orig-full'),
            ))

In [22]:
inspectTop(lexRanked, lexStrong, 10)

BJT/
	1004 b (2046 occs)
		e.g. Genesis 6:14 - בַּ֥יִת in וְכָֽפַרְתָּ֥ אֹתָ֛הּ מִבַּ֥יִת וּמִח֖וּץ בַּכֹּֽפֶר׃ 
	1030+ (2 occs)
		e.g. 1_Samuel 6:14 - בֵּֽית in וְהָעֲגָלָ֡ה בָּ֠אָה אֶל־שְׂדֵ֨ה יְהֹושֻׁ֤עַ בֵּֽית־הַשִּׁמְשִׁי֙ 
	1022+ (4 occs)
		e.g. 1_Samuel 16:1 - בֵּֽית in אֶֽשְׁלָחֲךָ֙ אֶל־יִשַׁ֣י בֵּֽית־הַלַּחְמִ֔י 
	1023+ (1 occs)
		e.g. 2_Samuel 15:17 - בֵּ֥ית in וַיַּעַמְד֖וּ בֵּ֥ית הַמֶּרְחָֽק׃ 
	1038+ (2 occs)
		e.g. 2_Samuel 20:14 - בֵ֥ית in וַֽיַּעֲבֹ֞ר בְּכָל־שִׁבְטֵ֣י יִשְׂרָאֵ֗ל אָבֵ֛לָה וּבֵ֥ית מַעֲכָ֖ה 
	1017+ (1 occs)
		e.g. 1_Kings 16:34 - בֵּ֥ית in בְּיָמָ֞יו בָּנָ֥ה חִיאֵ֛ל בֵּ֥ית הָאֱלִ֖י אֶת־יְרִיחֹ֑ה 
	1006 (1 occs)
		e.g. Isaiah 15:2 - בַּ֧יִת in עָלָ֨ה הַבַּ֧יִת וְדִיבֹ֛ן הַבָּמֹ֖ות לְבֶ֑כִי 
	1053+ (1 occs)
		e.g. Jeremiah 43:13 - בֵּ֣ית in וְשִׁבַּ֗ר אֶֽת־מַצְּבֹות֙ בֵּ֣ית שֶׁ֔מֶשׁ אֲשֶׁ֖ר בְּאֶ֣רֶץ מִצְרָ֑יִם 
	1004 a (3 occs)
		e.g. Ezekiel 41:9 - בֵּ֥ית in רֹ֣חַב הַקִּ֧יר אֲֽשֶׁר־לַצֵּלָ֛ע אֶל־הַח֖וּץ חָמֵ֣שׁ אַמֹּ֑ות וַאֲשֶׁ֣ר מֻנָּ֔ח בֵּ֥ית צְלָעֹ֖ות א

In [23]:
inspectTop(sngRanked, strongLex, 10)

7227 a
	RB/ (422 occs)
		e.g. Genesis 6:5 - רַבָּ֛ה in וַיַּ֣רְא יְהוָ֔ה כִּ֥י רַבָּ֛ה רָעַ֥ת הָאָדָ֖ם בָּאָ֑רֶץ וְכָל־יֵ֨צֶר֙ מַחְשְׁבֹ֣ת לִבֹּ֔ו רַ֥ק רַ֖ע כָּל־הַיֹּֽום׃ 
	RBB[ (4 occs)
		e.g. Genesis 18:20 - רָ֑בָּה in זַעֲקַ֛ת סְדֹ֥ם וַעֲמֹרָ֖ה כִּי־רָ֑בָּה 
	RJB[ (2 occs)
		e.g. Deuteronomy 33:7 - רָ֣ב in יָדָיו֙ רָ֣ב לֹ֔ו 
	RB==/ (2 occs)
		e.g. Jeremiah 52:19 - רַב in וְאֶת־הַ֠סִּפִּים וְאֶת־הַמַּחְתֹּ֨ות וְאֶת־הַמִּזְרָקֹ֜ות וְאֶת־הַסִּירֹ֣ות וְאֶת־הַמְּנֹרֹ֗ות וְאֶת־הַכַּפֹּות֙ וְאֶת־הַמְּנַקִיֹ֔ות אֲשֶׁ֤ר זָהָב֙ זָהָ֔ב וַאֲשֶׁר־כֶּ֖סֶף כָּ֑סֶף לָקַ֖ח רַב־טַבָּחִֽים׃ 
	RB=/ (4 occs)
		e.g. Job 23:6 - רָב in הַבְּרָב־כֹּ֖חַ יָרִ֣יב עִמָּדִ֑י לֹ֥א 
2342 a
	XJL==[ (5 occs)
		e.g. Genesis 8:10 - יָּ֣חֶל in וַיָּ֣חֶל עֹ֔וד שִׁבְעַ֥ת יָמִ֖ים אֲחֵרִ֑ים 
	XJL[ (40 occs)
		e.g. Deuteronomy 2:25 - חָל֖וּ in הַיֹּ֣ום הַזֶּ֗ה אָחֵל֙ תֵּ֤ת פַּחְדְּךָ֙ וְיִרְאָ֣תְךָ֔ עַל־פְּנֵי֙ הָֽעַמִּ֔ים תַּ֖חַת כָּל־הַשָּׁמָ֑יִם אֲשֶׁ֤ר יִשְׁמְעוּן֙ שִׁמְעֲךָ֔ וְרָגְז֥וּ וְחָל֖וּ מִפָּנֶֽיךָ׃ 
	XWL[ (11