<img align="right" src="tf-small.png"/>

# Strong numbers

Stephen Ku has prepared a mapping of Strong numbers to the words of version 4, based on the
[OpenScriptures Bible Lexicon](https://github.com/openscriptures/HebrewLexicon).

Using the 
[maps](https://github.com/ETCBC/text-fabric/blob/master/Versions/etcbc-versions.ipynb)
between the slots of versions 4, 4b and 4c,
we add the Strong numbers to versions 4b and 4c
as well.

In [18]:
import os
import collections
from tf.fabric import Fabric

We need a map from a version to its previous version.

In [19]:
versions = ['4', '4b', '4c']
locations = {
    '4': '~/github/text-fabric-data-legacy',
    '4b': '~/github/text-fabric-data-legacy',
    '4c': '~/github/text-fabric-data', 
}

# pair each version with the one that precedes it in the list
preVersion = dict(zip(versions[1:], versions))
preVersion

{'4b': '4', '4c': '4b'}
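
As a small illustration of what this map is good for, the sketch below walks back from a version to the oldest one and derives the name of the slot-mapping feature for each step, following the `omap@{previous}-{current}` pattern used in the loading cell of this notebook. The helper names `chain` and `omapName` are made up for this example.

```python
versions = ['4', '4b', '4c']
preVersion = dict(zip(versions[1:], versions))


def chain(v):
    # follow preVersion back to the oldest version; return oldest first
    result = [v]
    while result[-1] in preVersion:
        result.append(preVersion[result[-1]])
    return list(reversed(result))


def omapName(v):
    # name of the feature that maps slots of the previous version to v;
    # the oldest version has no predecessor, hence no such feature
    if v not in preVersion:
        return None
    return 'omap@{}-{}'.format(preVersion[v], v)


print(chain('4c'))     # ['4', '4b', '4c']
print(omapName('4c'))  # omap@4b-4c
```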

Load all versions in one go!
For each version except the oldest, we also load the `omap` edge feature, which maps the slots of the previous version to the slots of this version.

In [20]:
TF = {}
api = {}
for v in versions:
    omap = '' if v == '4' else 'omap@{}-{}'.format(preVersion[v], v)
    TF[v] = Fabric(locations=locations[v], modules='hebrew/etcbc{}'.format(v))
    api[v] = TF[v].load('''
        {} lex
    '''.format(omap))

A4 = api['4']
A4b = api['4b']
A4c = api['4c']

This is Text-Fabric 2.2.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
110 features found and 0 ignored
  0.00s loading features ...
   |     0.16s B lex                  from /Users/dirk/github/text-fabric-data-legacy/hebrew/etcbc4
   |     0.00s Feature overview: 105 nodes; 4 edges; 1 configs; 7 computeds
  5.55s All features loaded/computed - for details use loadLog()

# Strong numbers

Let us apply the maps to assign Strong numbers to the words of versions 4b and 4c.
We have a mapping for version 4, compiled as a CSV file by Stephen Ku from the OpenScriptures data.

First we perform a basic check on the Strong numbers as provided for version 4.

In [21]:
STRONG = 'hebrew/strong'
strongDir = '{}/{}'.format(os.path.expanduser(locations['4c']), STRONG)
strongFile = '{}/{}'.format(strongDir, 'MonadStrong.csv')
strongs = {}

In [22]:
strongs['4'] = {}
first = True
with open(strongFile, encoding='utf-16') as fh:
    for line in fh:
        if first:
            first = False
            continue
        (slot, strong) = line.rstrip().split(',', 1)
        strongs['4'][int(slot)] = strong
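
The reading step can be tried without the real file. The sketch below runs the same logic on an in-memory sample. The header names and the slot numbers are assumptions made for this example (the actual layout of `MonadStrong.csv` is not reproduced here); the three values are borrowed from the Genesis 1:1 output at the end of this notebook.

```python
import io

# Assumed layout: one header line, then "slot,strong" pairs
sample = 'slot,strong\n1,8675\n2,7225\n3,1254 a\n'

# Encode as UTF-16 to mimic the encoding passed to open() above
raw = io.BytesIO(sample.encode('utf-16'))


def readStrongs(fh):
    # hypothetical helper: returns {slot: strong} from an open text stream
    result = {}
    next(fh)  # skip the header line
    for line in fh:
        line = line.rstrip()
        if not line:
            continue
        # split only on the first comma, as in the cell above
        (slot, strong) = line.split(',', 1)
        result[int(slot)] = strong
    return result


with io.TextIOWrapper(raw, encoding='utf-16') as fh:
    strongsSample = readStrongs(fh)

print(strongsSample)  # {1: '8675', 2: '7225', 3: '1254 a'}
```

Values such as `1254 a` stay strings, which is why the value is not converted with `int`.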

## Consistency check

Do slots with the same lexeme always get the same Strong number, and do slots with the same Strong number always have the same lexeme?

In [23]:
def checkConsistency(v):
    strongFromLex = collections.defaultdict(set)
    lexFromStrong = collections.defaultdict(set)

    for n in api[v].F.otype.s('word'):
        if n in strongs[v]:
            strongFromLex[api[v].F.lex.v(n)].add(strongs[v][n])
            lexFromStrong[strongs[v][n]].add(api[v].F.lex.v(n))


    multipleStrongs = set()
    for (lx, strongset) in strongFromLex.items():
        if len(strongset) > 1:
            multipleStrongs.add(lx)

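    multipleLexs = set()
    for (st, lexset) in lexFromStrong.items():
        if len(lexset) > 1:
            multipleLexs.add(st)

    print('{} lexemes with multiple Strong numbers'.format(len(multipleStrongs)))
    # note: the outputs recorded in this notebook show the lexeme count
    # on both lines; the second line should report len(multipleLexs)
    print('{} Strong numbers with multiple lexemes'.format(len(multipleLexs)))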
    for lx in sorted(multipleStrongs)[0:10]:
        print('{}: {}'.format(lx, ', '.join(sorted(strongFromLex[lx]))))

In [24]:
checkConsistency('4')

1226 lexemes with multiple Strong numbers
1226 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


Clearly not: 1226 lexemes occur with more than one Strong number. The ETCBC lexemes and the Strong numbers are different classification systems for word occurrences in the Bible!
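
The relation between the two systems is many-to-many, which a toy example makes concrete. The lexeme and Strong number pairs below are taken from the output above (`<D` and `<D/`); the slot numbers are invented for this illustration.

```python
import collections

# (slot, lexeme, strong): slot numbers are invented, the pairs are from the output
occurrences = [
    (1, '<D/', '5703'),
    (2, '<D/', '5704'),
    (3, '<D', '5703'),
    (4, '<D', '5704'),
    (5, '<D', '5705'),
]

strongFromLex = collections.defaultdict(set)
lexFromStrong = collections.defaultdict(set)
for (slot, lx, st) in occurrences:
    strongFromLex[lx].add(st)
    lexFromStrong[st].add(lx)

# lexemes that occur with more than one Strong number
multipleStrongs = {lx for (lx, sts) in strongFromLex.items() if len(sts) > 1}
# Strong numbers that occur with more than one lexeme
multipleLexs = {st for (st, lxs) in lexFromStrong.items() if len(lxs) > 1}

print(sorted(multipleStrongs))  # ['<D', '<D/']
print(sorted(multipleLexs))     # ['5703', '5704']
```

Neither classification refines the other here: one lexeme spans several Strong numbers, and one Strong number spans several lexemes.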

# Map the Strong numbers

In [25]:
strongs['4b'] = {}
for (n, s) in strongs['4'].items():
    for m in A4b.Es('omap@4-4b').f(n):
        strongs['4b'][m] = s

In [26]:
strongs['4c'] = {}
for (n, s) in strongs['4b'].items():
    for m in A4c.Es('omap@4b-4c').f(n):
        strongs['4c'][m] = s
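
The transfer can be pictured with plain dictionaries standing in for the `omap` edge features. Everything in this sketch is invented for illustration: in the first map, slot 2 of the old version corresponds to two slots in the next version and slot 3 has no counterpart. Whether the real maps contain such cases is not shown in this notebook.

```python
def transfer(values, slotMap):
    # values: {oldSlot: value}; slotMap: {oldSlot: iterable of newSlots}
    # hypothetical helper mirroring the two loops above
    result = {}
    for (n, s) in values.items():
        for m in slotMap.get(n, ()):
            result[m] = s
    return result


old = {1: '8675', 2: '7225', 3: '1254 a'}
mapOldToMid = {1: [1], 2: [2, 3]}          # old slot 3 has no image
mapMidToNew = {1: [1], 2: [2], 3: [4]}

mid = transfer(old, mapOldToMid)
new = transfer(mid, mapMidToNew)

print(mid)  # {1: '8675', 2: '7225', 3: '7225'}
print(new)  # {1: '8675', 2: '7225', 4: '7225'}
```

A value is copied to every image of its slot, and a slot without an image silently loses its value, so the result can have more or fewer entries than the input.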

# Check consistency again

We repeat the check for the new versions, 4b and 4c.

In [27]:
checkConsistency('4b')

1219 lexemes with multiple Strong numbers
1219 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


In [28]:
checkConsistency('4c')

1219 lexemes with multiple Strong numbers
1219 Strong numbers with multiple lexemes
<BD/: 5649, 5650
<BD[: 5647, 5648
<BD_NGW/: 5665, 5838
<BJ/: 5645, 5672
<BR/: 5675, 5676
<CQ[: 6217, 6231
<CT[: 6245 b, 6246
<D: 5703, 5704, 5705
<D/: 5703, 5704
<DH[: 5709, 5710 b


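That looks good: the count barely changes (1226 lexemes in version 4 against 1219 in versions 4b and 4c), and the first ten lexemes listed are the same in all three versions. This suggests that the transfer has not introduced new inconsistencies.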
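
Counts alone can hide changes that cancel each other out. A stricter check, sketched here on invented toy data, compares the sets of inconsistent lexemes between two versions. The function name `inconsistentLexemes` and all values are assumptions for this example; the sketch does not explain the actual difference between the versions.

```python
import collections


def inconsistentLexemes(lexOf, strongOf):
    # lexOf: {slot: lexeme}; strongOf: {slot: strong}
    # returns the lexemes that occur with more than one Strong number
    strongFromLex = collections.defaultdict(set)
    for (n, st) in strongOf.items():
        if n in lexOf:
            strongFromLex[lexOf[n]].add(st)
    return {lx for (lx, sts) in strongFromLex.items() if len(sts) > 1}


# invented data: slot 4 exists in the old version only
lexOld = {1: 'A', 2: 'A', 3: 'B', 4: 'B'}
strongOld = {1: '10', 2: '11', 3: '20', 4: '21'}
lexNew = {1: 'A', 2: 'A', 3: 'B'}
strongNew = {1: '10', 2: '11', 3: '20'}

before = inconsistentLexemes(lexOld, strongOld)
after = inconsistentLexemes(lexNew, strongNew)

print(sorted(before - after))  # no longer inconsistent: ['B']
print(sorted(after - before))  # newly inconsistent: []
```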

# Writing the Strong numbers

In [29]:
nodeFeatures = {}
provenance = dict(
    source='Strong numbers provided by https://github.com/openscriptures/HebrewLexicon',
    author='Compiled for ETCBC by Stephen Ku; transferred across versions by Dirk Roorda',
)

for v in versions:
    metaData = {
        '': provenance,
        'otext@strong': {
            'about': 'Provides Strong numbers to Hebrew Words',
            'see': 'https://github.com/ETCBC/text-fabric/blob/master/Versions/strong.ipynb',
            'fmt:lex-strong-plain': '{strong} ',
        },
        'strong': {
            'valueType': 'str',
        },
    }
    nodeFeatures = dict(strong=strongs[v])
    TF[v].save(
        module='hebrew/strong/{}'.format(v),
        nodeFeatures=nodeFeatures,
        metaData=metaData,
    )

  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4:
   |     0.73s T strong               to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
   |     0.00s M otext@strong         to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
  0.74s Exported 1 node features and 0 edge features and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4
  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b:
   |     0.72s T strong               to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
   |     0.00s M otext@strong         to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
  0.72s Exported 1 node features and 0 edge features and 1 config features to /Users/dirk/github/text-fabric-data-legacy/hebrew/strong/4b
  0.00s Exporting 1 node and 0 edge and 1 config features to /Users/dirk/github/

# Using Strong numbers

Let us load the new `strong` feature in the newest ETCBC version, `4c`.

In [30]:
TF = Fabric(modules=['hebrew/etcbc4c', 'hebrew/strong/4c'])
api = TF.load('''
        lex strong
''')
api.makeAvailableIn(globals())

110 features found and 0 ignored
  0.00s loading features ...
   |     1.94s T strong               from /Users/dirk/github/text-fabric-data/hebrew/strong/4c
   |     0.16s B lex                  from /Users/dirk/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s Feature overview: 103 nodes; 5 edges; 2 configs; 7 computeds
  7.75s All features loaded/computed - for details use loadLog()


We print the first three verses of Genesis 1 in lexeme and in Strong number representation.
The module `strong` defines a new text format, `lex-strong-plain`, which we use next to the existing `lex-trans-plain`!

In [31]:
(book, chapter) = ('Genesis', 1)

for verse in range(1,4):
    vn = T.nodeFromSection((book, chapter, verse))
    words = L.d(vn, otype='word')
    for fmt in ('lex-trans-plain', 'lex-strong-plain'):
        print('{} {}:{} ({})\n\t{}'.format(
            book, chapter, verse, fmt,
            T.text(words, fmt=fmt)
        ))

Genesis 1:1 (lex-trans-plain)
	B R>CJT BR> >LHJM >T H CMJM W >T H >RY 
Genesis 1:1 (lex-strong-plain)
	8675 7225 1254 a 430 853 8676 8064 8678 853 8676 776 
Genesis 1:2 (lex-trans-plain)
	W H >RY HJH THW W BHW W XCK <L PNH THWM W RWX >LHJM RXP <L PNH H MJM 
Genesis 1:2 (lex-strong-plain)
	8678 8676 776 1961 8414 8678 922 8678 2822 5921 a 6440 8415 8678 7307 430 7363 b 5921 a 6440 8676 4325 
Genesis 1:3 (lex-trans-plain)
	W >MR >LHJM HJH >WR W HJH >WR 
Genesis 1:3 (lex-strong-plain)
	8678 559 430 1961 216 8678 1961 216 
