In [2]:
from tf.app import use
A = use('bhsa')

This is Text-Fabric 9.0.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html

120 features found and 0 ignored


## אָתוֹן (female donkey)
Simple search for female donkey

**Note**: "/" here denotes a noun

In [3]:
results = A.search('''
word lex=>TWN/
''')
A.table(results)

  0.35s 44 results


n,p,word
1,Genesis 12:16,אֲתֹנֹ֖ת
2,Genesis 32:16,אֲתֹנֹ֣ת
3,Genesis 45:23,אֲתֹנֹ֡ת
4,Genesis 49:11,אֲתֹנֹ֑ו
5,Numbers 22:21,אֲתֹנֹ֑ו
6,Numbers 22:22,אֲתֹנֹ֔ו
7,Numbers 22:23,אָתֹון֩
8,Numbers 22:23,אָתֹון֙
9,Numbers 22:23,אָתֹ֔ון
10,Numbers 22:25,אָתֹ֜ון


Evidently, searching with Hebrew script is also possible:

In [4]:
results = A.search('''
word lex_utf8=אתון
''')
A.table(results)

  0.35s 44 results


n,p,word
1,Genesis 12:16,אֲתֹנֹ֖ת
2,Genesis 32:16,אֲתֹנֹ֣ת
3,Genesis 45:23,אֲתֹנֹ֡ת
4,Genesis 49:11,אֲתֹנֹ֑ו
5,Numbers 22:21,אֲתֹנֹ֑ו
6,Numbers 22:22,אֲתֹנֹ֔ו
7,Numbers 22:23,אָתֹון֩
8,Numbers 22:23,אָתֹון֙
9,Numbers 22:23,אָתֹ֔ון
10,Numbers 22:25,אָתֹ֜ון
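Both searches return the same 44 results, which is what you'd expect if `lex` is just the ETCBC transliteration of `lex_utf8`. The sketch below shows how the two spellings line up. It is not part of Text-Fabric. The consonant mapping is the ETCBC transliteration table, and the final-form rule and the plain ש used for both shin (`C`) and sin (`F`) are my own assumptions; I haven't checked how `lex_utf8` actually renders shin/sin.

```python
# Hypothetical helper (not a Text-Fabric API): render a transliterated `lex`
# value in Hebrew consonants, roughly the way `lex_utf8` appears to show it.
# Assumptions: ETCBC consonant mapping, final letter forms at the end of the
# word, and plain ש for both shin (C) and sin (F).
ETCBC_TO_HEBREW = {
    ">": "א", "B": "ב", "G": "ג", "D": "ד", "H": "ה", "W": "ו",
    "Z": "ז", "X": "ח", "V": "ט", "J": "י", "K": "כ", "L": "ל",
    "M": "מ", "N": "נ", "S": "ס", "<": "ע", "P": "פ", "Y": "צ",
    "Q": "ק", "R": "ר", "F": "ש", "C": "ש", "T": "ת",
}
FINAL_FORMS = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}


def to_hebrew(lex):
    """Map consonants to Hebrew letters; disambiguation markers are dropped."""
    letters = [ETCBC_TO_HEBREW[c] for c in lex if c in ETCBC_TO_HEBREW]
    if letters and letters[-1] in FINAL_FORMS:
        letters[-1] = FINAL_FORMS[letters[-1]]
    return "".join(letters)


print(to_hebrew(">TWN/"))
print(to_hebrew("MLK/"), to_hebrew("MLK["))
```

Note that the noun `MLK/` and the verb `MLK[` come out identical, so a Hebrew-script lexeme alone can't tell such homonyms apart.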


## חֲמוֹר (donkey)

In [5]:
results = A.search('''
word lex=XMWR/
''')
A.table(results)

  0.34s 96 results


n,p,word
1,Genesis 12:16,חֲמֹרִ֔ים
2,Genesis 22:3,חֲמֹרֹ֔ו
3,Genesis 22:5,חֲמֹ֔ור
4,Genesis 24:35,חֲמֹרִֽים׃
5,Genesis 30:43,חֲמֹרִֽים׃
6,Genesis 32:6,חֲמֹ֔ור
7,Genesis 34:28,חֲמֹרֵיהֶּ֑ם
8,Genesis 36:24,חֲמֹרִ֖ים
9,Genesis 42:26,חֲמֹרֵיהֶ֑ם
10,Genesis 42:27,חֲמֹרֹ֖ו


# Disambiguation
See https://etcbc.github.io/bhsa/features/lex/.

So far I'm only aware of two disambiguation markers on the `lex` feature:
- `/` denotes a noun
- `[` denotes a verb
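So `MLK/` and `MLK[` share the same consonants and differ only in the trailing marker. A minimal sketch of pulling a `lex` value apart, assuming the ETCBC consonant alphabet used later in this notebook (the helper name is hypothetical, not a Text-Fabric function):

```python
# Hypothetical helper: split a BHSA `lex` value into its consonantal
# transcription and any disambiguation marker characters.
ETCBC_LETTERS = set(">BGDHWZXVJKLMNS<PYQRFCT")


def split_lex(lex):
    """Return (consonants, markers) for a transliterated lexeme."""
    consonants = "".join(c for c in lex if c in ETCBC_LETTERS)
    markers = "".join(c for c in lex if c not in ETCBC_LETTERS)
    return consonants, markers


for lex in ["MLK/", "MLK[", ">TWN/", "B"]:
    print(lex, split_lex(lex))
# MLK/ ('MLK', '/')
# MLK[ ('MLK', '[')
# >TWN/ ('>TWN', '/')
# B ('B', '')
```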

## Noun
E.g., search for noun מֶ֫לֶךְ "king" (five results only):

In [6]:
results = A.search('''
word lex=MLK/
''')
A.table(results[0:5])

  0.35s 2703 results


n,p,word
1,Genesis 14:1,מֶֽלֶךְ־
2,Genesis 14:1,מֶ֣לֶךְ
3,Genesis 14:1,מֶ֣לֶךְ
4,Genesis 14:1,מֶ֥לֶךְ
5,Genesis 14:2,מֶ֣לֶךְ


## Verb
E.g., search for verb מלך "reign as king" (five results only):

In [7]:
results = A.search('''
word lex=MLK[
''')
A.table(results[0:5])

  0.35s 347 results


n,p,word
1,Genesis 36:31,מָלְכ֖וּ
2,Genesis 36:31,מְלָךְ־
3,Genesis 36:32,יִּמְלֹ֣ךְ
4,Genesis 36:33,יִּמְלֹ֣ךְ
5,Genesis 36:34,יִּמְלֹ֣ךְ


## Other markers
I haven't found full documentation for the different markers that disambiguate word types. For starters, I'd like to know which ones exist.

Based on https://etcbc.github.io/bhsa/features/sp/, I'd expect there are around 14 different markers.

So I'll try generating a list based on all lexemes in the data set. This will take a few steps.

In [10]:
lexemes = A.search('''
lex
''')
print(len(lexemes))

  0.01s 9233 results
9233


Are lexeme results tuples or integers?

In [None]:
print(type(lexemes[0]))
print(lexemes[0])

<class 'tuple'>
(1437567,)

OK, so they're one-element tuples; the single element is the integer node id.
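Since every result is a one-element tuple, it's convenient to unpack the node ids once. A minimal sketch with illustrative ids (only the first one comes from the output above; the others are made up):

```python
# Search results contain one node per atom in the query; a single-atom
# query like `lex` therefore yields one-element tuples.
# Illustrative ids: only 1437567 appears in the real output above.
results = [(1437567,), (1437568,), (1437569,)]

# Unpacking `(node,)` raises a ValueError if a tuple ever has more than one element.
nodes = [node for (node,) in results]
print(nodes)  # [1437567, 1437568, 1437569]
```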

### Accessing the API
According to the docs, we'll need the `NodeFeature` class to access node features programmatically.

See
- https://annotation.github.io/text-fabric/tf/cheatsheet.html#f-node-features
- https://annotation.github.io/text-fabric/tf/core/nodefeature.html#tf.core.nodefeature.NodeFeature

In [12]:
from tf.core.nodefeature import NodeFeature as F

Unfortunately, the documentation doesn't say how that class can be linked to or used with our data set.

The following can't work because `F` doesn't know about the BHSA data set `A`:

In [13]:
F.lex.v(lexemes[0])

AttributeError: type object 'NodeFeature' has no attribute 'lex'

After some digging, it turns out that the API is attached to the loaded app `A`: the ready-made feature accessors live on `A.TF.api`, not on the class imported above. Here are its public attributes:

In [15]:
for attr in dir(A.TF.api):
    if attr[0:1] != "_":
        print(attr)

AllComputeds
AllEdges
AllFeatures
C
Call
Computed
ComputedString
Cs
E
Eall
Edge
EdgeString
Es
F
Fall
Feature
FeatureString
Fs
L
Locality
N
Nodes
S
Search
T
TF
Text
ensureLoaded
ignored
isLoaded
makeAvailableIn


So forget about the earlier `import` statement. We need this instead:

In [16]:
F = A.TF.api.F

So let's try accessing the `lex` feature of a given lexeme node:

In [23]:
F.lex.v(lexemes[0])

That returns nothing visible (presumably `None`). It turns out the method expects an integer node id, not a tuple; see https://annotation.github.io/text-fabric/tf/core/nodefeature.html#tf.core.nodefeature.NodeFeature.v.

(Frankly, the method should raise an error rather than fail silently when it's passed the wrong data type.)
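Until then, a small wrapper can make the mistake fail loudly. This is a hypothetical helper, not a Text-Fabric API. The dictionary below stands in for `F.lex.v`, on the assumption (suggested by the silent result above) that the real lookup simply returns `None` for keys it doesn't know.

```python
def strict_value(lookup, node):
    """Call `lookup(node)` (e.g. F.lex.v) only if `node` is an int node id."""
    # bool is a subclass of int, so rule it out explicitly.
    if isinstance(node, bool) or not isinstance(node, int):
        raise TypeError(f"expected an int node id, got {type(node).__name__}: {node!r}")
    return lookup(node)


# Stand-in for F.lex.v: unknown keys (such as a tuple) yield None.
fake_lex = {1437567: "B"}.get

print(strict_value(fake_lex, 1437567))  # B
try:
    strict_value(fake_lex, (1437567,))
except TypeError as err:
    print(err)  # expected an int node id, got tuple: (1437567,)
```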

In [24]:
F.lex.v(lexemes[0][0])

'B'

### Creating a list of all markers
Now it should be easy to create a complete list of all lexemes (specifically of values for the feature `lex` on all lexeme nodes):

In [38]:
lexeme_list = [F.lex.v(l) for (l,) in lexemes]
print(len(lexeme_list))

9233


What does the actual data look like? Here are the first 10 lexemes:

In [37]:
lexeme_list[0:10]

['B', 'R>CJT/', 'BR>[', '>LHJM/', '>T', 'H', 'CMJM/', 'W', '>RY/', 'HJH[']

To get just the disambiguation markers, I'll compare the `lex` feature with the `lex0` feature, which strips disambiguation.

In [43]:
lex0_list = [F.lex0.v(l) for (l,) in lexemes]
print(len(lex0_list))

AttributeError: 'NodeFeatures' object has no attribute 'lex0'

So `lex0` isn't available in the data as loaded here, even though it is documented: https://etcbc.github.io/bhsa/features/lex0/

In that case I'll strip the transcribed consonants myself to isolate the disambiguation markers.

In [47]:
letters = set('>BGDHWZXVJKLMNS<PYQRFCT')
unique_markers = set()
for lexeme in lexeme_list:
    marker = "".join([char for char in lexeme if char not in letters])
    unique_markers.add(marker)
print(len(unique_markers))
print(unique_markers)

20


Tada!!
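Because `lexeme_list` is already in memory, plain Python can also show how common each marker is and give a few examples, without one search per marker. A minimal sketch (the helper names are my own); note that it counts lexeme nodes, not word occurrences:

```python
from collections import Counter

ETCBC_LETTERS = set(">BGDHWZXVJKLMNS<PYQRFCT")


def marker_of(lex):
    """All non-consonant characters of a `lex` value, in order."""
    return "".join(c for c in lex if c not in ETCBC_LETTERS)


def summarize_markers(lexemes, examples_per_marker=3):
    """Count lexemes per marker and keep the first few as examples."""
    counts = Counter()
    examples = {}
    for lex in lexemes:
        marker = marker_of(lex)
        counts[marker] += 1
        bucket = examples.setdefault(marker, [])
        if len(bucket) < examples_per_marker:
            bucket.append(lex)
    return counts, examples


# The first ten lexemes shown earlier; on the real data, pass `lexeme_list`.
sample = ['B', 'R>CJT/', 'BR>[', '>LHJM/', '>T', 'H', 'CMJM/', 'W', '>RY/', 'HJH[']
counts, examples = summarize_markers(sample)
for marker, n in counts.most_common():
    print(repr(marker), n, examples[marker])
```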

### What do they mean?
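Now we need some examples to figure out what each marker means. My first idea was a simple loop like this:

for marker in unique_markers:
    regex = '[>BGDHWZXVJKLMNS<PYQRFCT]*' + marker
    query = "word lex~" + regex
    results = A.search(query, limit=10)
    A.table(results)

That can't work as is: a marker such as `[` is a regex metacharacter, so the resulting pattern would be invalid. The revised version wraps each marker character in its own character class and allows letters between them:
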
In [53]:
for marker in unique_markers:
    print("Disambiguation marker: `" + marker + "` ====================")
    regex = '[>BGDHWZXVJKLMNS<PYQRFCT]*'.join('[' + char + ']' for char in marker)
    query = "word lex~" + regex
    results = A.search(query)
    A.table(results[0:10])

  0.71s 426584 results


n,p,word
1,Genesis 1:1,בְּ
2,Genesis 1:1,רֵאשִׁ֖ית
3,Genesis 1:1,בָּרָ֣א
4,Genesis 1:1,אֱלֹהִ֑ים
5,Genesis 1:1,אֵ֥ת
6,Genesis 1:1,הַ
7,Genesis 1:1,שָּׁמַ֖יִם
8,Genesis 1:1,וְ
9,Genesis 1:1,אֵ֥ת
10,Genesis 1:1,הָ




  0.43s 73710 results


n,p,word
1,Genesis 1:1,בָּרָ֣א
2,Genesis 1:2,הָיְתָ֥ה
3,Genesis 1:2,מְרַחֶ֖פֶת
4,Genesis 1:3,יֹּ֥אמֶר
5,Genesis 1:3,יְהִ֣י
6,Genesis 1:3,יְהִי־
7,Genesis 1:4,יַּ֧רְא
8,Genesis 1:4,טֹ֑וב
9,Genesis 1:4,יַּבְדֵּ֣ל
10,Genesis 1:5,יִּקְרָ֨א


  0.37s 46 results


n,p,word
1,Joshua 10:38,דְּבִ֑רָה
2,Joshua 10:39,דְבִ֨רָה֙
3,Joshua 11:21,דְּבִ֣ר
4,Joshua 12:13,דְּבִר֙
5,Joshua 15:7,דְּבִרָה֮
6,Joshua 15:15,דְּבִ֑ר
7,Joshua 15:15,דְּבִ֥ר
8,Joshua 15:26,שְׁמַ֖ע
9,Joshua 15:49,דְבִֽר׃
10,Joshua 19:35,צֵ֔ר


  0.37s 2 results


n,p,word
1,1_Kings 10:15,עֶ֖רֶב
2,Jeremiah 25:24,עֶ֔רֶב


  0.39s 13039 results


n,p,word
1,Genesis 1:5,בֹ֖קֶר
2,Genesis 1:8,בֹ֖קֶר
3,Genesis 1:13,בֹ֖קֶר
4,Genesis 1:19,בֹ֖קֶר
5,Genesis 1:23,בֹ֖קֶר
6,Genesis 1:27,זָכָ֥ר
7,Genesis 1:31,בֹ֖קֶר
8,Genesis 2:8,עֵ֖דֶן
9,Genesis 2:10,עֵ֔דֶן
10,Genesis 2:15,עֵ֔דֶן


  0.38s 682 results


n,p,word
1,Genesis 4:22,תּ֣וּבַל קַ֔יִן
2,Genesis 4:22,תּֽוּבַל־קַ֖יִן
3,Genesis 10:11,רְחֹבֹ֥ת עִ֖יר
4,Genesis 12:8,בֵֽית־אֵ֖ל
5,Genesis 12:8,בֵּֽית־אֵ֤ל
6,Genesis 13:3,בֵּֽית־אֵ֑ל
7,Genesis 13:3,בֵּֽית־אֵ֖ל
8,Genesis 14:5,עַשְׁתְּרֹ֣ת קַרְנַ֔יִם
9,Genesis 14:6,אֵ֣יל פָּארָ֔ן
10,Genesis 14:7,עֵ֤ין מִשְׁפָּט֙


  0.38s 617 results


n,p,word
1,Genesis 4:19,עָדָ֔ה
2,Genesis 4:20,עָדָ֖ה
3,Genesis 4:23,עָדָ֤ה
4,Genesis 4:25,אָדָ֥ם
5,Genesis 5:1,אָדָ֑ם
6,Genesis 5:3,אָדָ֗ם
7,Genesis 5:4,אָדָ֗ם
8,Genesis 5:5,אָדָם֙
9,Genesis 5:32,חָ֥ם
10,Genesis 6:10,חָ֥ם




  0.38s 1061 results


n,p,word
1,Genesis 1:9,יִקָּו֨וּ
2,Genesis 2:21,יִּישָׁ֑ן
3,Genesis 3:13,הִשִּׁיאַ֖נִי
4,Genesis 3:15,תְּשׁוּפֶ֥נּוּ
5,Genesis 4:1,קָנִ֥יתִי
6,Genesis 4:22,חֹרֵ֥שׁ
7,Genesis 8:10,יָּ֣חֶל
8,Genesis 9:27,יַ֤פְתְּ
9,Genesis 11:3,נִלְבְּנָ֣ה
10,Genesis 11:6,יִבָּצֵ֣ר


  0.38s 4 results


n,p,word
1,1_Kings 10:15,עֶ֖רֶב
2,Jeremiah 25:24,עֶ֔רֶב
3,Proverbs 20:17,עָרֵ֣ב
4,Song_of_songs 2:14,עָרֵ֖ב


  0.38s 3226 results


n,p,word
1,Genesis 4:19,עָדָ֔ה
2,Genesis 4:20,עָדָ֖ה
3,Genesis 4:21,יוּבָ֑ל
4,Genesis 4:23,עָדָ֤ה
5,Genesis 4:25,אָדָ֥ם
6,Genesis 4:25,שֵׁ֑ת
7,Genesis 4:26,שֵׁ֤ת
8,Genesis 4:26,אֱנֹ֑ושׁ
9,Genesis 5:1,אָדָ֑ם
10,Genesis 5:3,אָדָ֗ם


  0.51s 164028 results


n,p,word
1,Genesis 1:1,רֵאשִׁ֖ית
2,Genesis 1:1,אֱלֹהִ֑ים
3,Genesis 1:1,שָּׁמַ֖יִם
4,Genesis 1:1,אָֽרֶץ׃
5,Genesis 1:2,אָ֗רֶץ
6,Genesis 1:2,תֹ֨הוּ֙
7,Genesis 1:2,בֹ֔הוּ
8,Genesis 1:2,חֹ֖שֶׁךְ
9,Genesis 1:2,פְּנֵ֣י
10,Genesis 1:2,תְהֹ֑ום


  0.37s 2 results


n,p,word
1,Isaiah 8:1,מַהֵ֥ר שָׁלָ֖ל חָ֥שׁ בַּֽז׃
2,Isaiah 8:3,מַהֵ֥ר שָׁלָ֖ל חָ֥שׁ בַּֽז׃


  0.39s 16660 results


n,p,word
1,Genesis 1:5,בֹ֖קֶר
2,Genesis 1:8,בֹ֖קֶר
3,Genesis 1:9,יִקָּו֨וּ
4,Genesis 1:13,בֹ֖קֶר
5,Genesis 1:19,בֹ֖קֶר
6,Genesis 1:23,בֹ֖קֶר
7,Genesis 1:27,זָכָ֥ר
8,Genesis 1:31,בֹ֖קֶר
9,Genesis 2:8,עֵ֖דֶן
10,Genesis 2:10,עֵ֔דֶן


  0.38s 639 results


n,p,word
1,Genesis 4:19,עָדָ֔ה
2,Genesis 4:20,עָדָ֖ה
3,Genesis 4:23,עָדָ֤ה
4,Genesis 4:25,אָדָ֥ם
5,Genesis 5:1,אָדָ֑ם
6,Genesis 5:3,אָדָ֗ם
7,Genesis 5:4,אָדָ֗ם
8,Genesis 5:5,אָדָם֙
9,Genesis 5:32,חָ֥ם
10,Genesis 6:10,חָ֥ם


  0.37s 15 results


n,p,word
1,Judges 7:25,עֹרֵ֣ב
2,Judges 7:25,עֹורֵ֤ב
3,Judges 7:25,עֹורֵב֙
4,Judges 7:25,עֹרֵ֣ב
5,Judges 8:3,עֹרֵ֣ב
6,1_Kings 10:15,עֶ֖רֶב
7,Isaiah 10:26,עֹורֵ֑ב
8,Jeremiah 25:24,עֶ֔רֶב
9,Psalms 83:12,עֹרֵ֣ב
10,Job 36:16,רַ֭חַב


  0.38s 201 results


n,p,word
1,Genesis 14:6,שֵׂעִ֑יר
2,Genesis 32:4,שֵׂעִ֖יר
3,Genesis 33:14,שֵׂעִֽירָה׃
4,Genesis 33:16,שֵׂעִֽירָה׃
5,Genesis 36:8,שֵׂעִ֔יר
6,Genesis 36:9,שֵׂעִֽיר׃
7,Genesis 36:20,שֵׂעִיר֙
8,Genesis 36:21,שֵׂעִ֖יר
9,Genesis 36:30,שֵׂעִֽיר׃ פ
10,Genesis 46:24,שִׁלֵּֽם׃


  0.38s 11 results


n,p,word
1,Deuteronomy 10:6,בְּאֵרֹ֥ת בְּנֵי־יַעֲקָ֖ן
2,Joshua 13:17,בֵ֖ית בַּ֥עַל מְעֹֽון׃
3,1_Kings 15:20,אָבֵ֣ל בֵּֽית־מַעֲכָ֑ה
4,2_Kings 10:12,בֵּֽית־עֵ֥קֶד הָרֹעִ֖ים
5,2_Kings 15:29,אָבֵ֣ל בֵּֽית־מַעֲכָ֡ה
6,Isaiah 8:1,מַהֵ֥ר שָׁלָ֖ל חָ֥שׁ בַּֽז׃
7,Isaiah 8:3,מַהֵ֥ר שָׁלָ֖ל חָ֥שׁ בַּֽז׃
8,Jeremiah 39:3,נֵרְגַ֣ל שַׂר־֠אֶצֶר
9,Jeremiah 39:3,נֵרְגַ֤ל שַׂר־אֶ֨צֶר֙
10,Jeremiah 39:13,נֵרְגַ֥ל שַׂר־אֶ֖צֶר




  0.38s 95 results


n,p,word
1,Genesis 8:10,יָּ֣חֶל
2,Genesis 24:21,מִשְׁתָּאֵ֖ה
3,Exodus 15:21,תַּ֥עַן
4,Exodus 32:4,יָּ֤צַר
5,Exodus 32:18,עֲנֹ֣ות
6,Exodus 32:18,עֲנֹ֣ות
7,Exodus 32:18,עַנֹּ֔ות
8,Numbers 21:17,עֱנוּ־
9,Numbers 22:3,יָּ֨גָר
10,Deuteronomy 1:17,תָג֨וּרוּ֙




  0.38s 13 results


n,p,word
1,Jeremiah 6:20,עָ֥רְבוּ
2,Jeremiah 31:26,עָ֥רְבָה
3,Ezekiel 16:37,עָרַ֣בְתְּ
4,Hosea 6:3,יֹ֥ורֶה
5,Hosea 9:4,יֶֽעֶרְבוּ־
6,Hosea 10:12,יֹרֶ֥ה
7,Malachi 3:4,עָֽרְבָה֙
8,Psalms 104:34,יֶעֱרַ֣ב
9,Proverbs 3:24,עָרְבָ֥ה
10,Proverbs 13:19,תֶעֱרַ֣ב


  0.39s 4258 results


n,p,word
1,Genesis 4:1,אֶת־
2,Genesis 4:19,עָדָ֔ה
3,Genesis 4:20,עָדָ֖ה
4,Genesis 4:21,יוּבָ֑ל
5,Genesis 4:23,עָדָ֤ה
6,Genesis 4:25,אָדָ֥ם
7,Genesis 4:25,שֵׁ֑ת
8,Genesis 4:26,שֵׁ֤ת
9,Genesis 4:26,אֱנֹ֑ושׁ
10,Genesis 5:1,אָדָ֑ם


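That worked, but it hasn't really clarified what the disambiguation markers mean. Part of the problem is that these searches aren't exact: the patterns aren't anchored, so each one matches every `lex` value that merely contains the marker's characters in that order, possibly with letters in between. The empty marker of plain lexemes like `B`, for instance, yields an empty pattern that matches all 426,584 words, and the result sets of the other markers overlap heavily.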
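An exact version would anchor the pattern and allow consonants only before, between and after the marker characters. A minimal sketch (hypothetical helper name), checked here with Python's `re` on a few sample values. I'm assuming Text-Fabric applies `~` as a regex search on the whole value, which is consistent with the empty pattern matching every word; I haven't verified that its query parser passes backslash escapes such as `\[` through unchanged.

```python
import re

# Any run (possibly empty) of ETCBC consonants.
LETTERS = "[>BGDHWZXVJKLMNS<PYQRFCT]*"


def exact_marker_regex(marker):
    """Regex for `lex` values whose non-letter characters are exactly `marker`, in order."""
    parts = [re.escape(c) for c in marker]
    # Letters may come before, between and after the marker characters;
    # ^ and $ stop the pattern from matching a mere substring.
    return "^" + LETTERS + LETTERS.join(parts) + LETTERS + "$"


sample = ["B", "R>CJT/", "BR>[", "MLK/", "MLK["]
for marker in ["", "/", "["]:
    exact = [lex for lex in sample if re.search(exact_marker_regex(marker), lex)]
    print(repr(marker), exact)

# For comparison: an empty, unanchored pattern matches every value.
print(all(re.search("", lex) for lex in sample))  # True
```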

## So what?
So I'll ignore the disambiguation markers for the time being and use the more transparent `sp` feature to specify part of speech. Because `lex0` isn't available here, you might need to use `lex` with a regular expression if you don't want to include the disambiguation markers.
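Such a regex might look like the sketch below: one consonantal lexeme followed by any (possibly empty) run of non-letter characters. The helper name and the sample values are illustrative. The Python check only shows what the pattern matches; I haven't verified that Text-Fabric's query parser accepts every character in the negated class unchanged, so treat the query string as an assumption.

```python
import re

ETCBC_LETTERS = ">BGDHWZXVJKLMNS<PYQRFCT"


def lex_any_marker(consonants):
    """Regex for `lex` values spelled `consonants` plus any non-letter suffix."""
    return "^" + re.escape(consonants) + "[^" + ETCBC_LETTERS + "]*$"


pattern = lex_any_marker("MLK")
print(pattern)  # ^MLK[^>BGDHWZXVJKLMNS<PYQRFCT]*$

# MLKH/ is a different lexeme: the extra letter H is not allowed after MLK.
matches = [lex for lex in ["MLK/", "MLK[", "MLKH/", "MLK"] if re.search(pattern, lex)]
print(matches)  # ['MLK/', 'MLK[', 'MLK']

# Possible Text-Fabric query (untested assumption): "word lex~" + pattern
```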

Alternatively, just avoid transcription altogether and specify the lexeme in Hebrew script with `lex_utf8`, adding `sp` to pick a part of speech. E.g.:
- `word lex_utf8=מלך`
- `word lex_utf8=מלך sp=verb`
- `word lex_utf8=מלך sp=subs`

See further:
- https://annotation.github.io/text-fabric/tf/about/searchusage.html#value-specifications
- https://etcbc.github.io/bhsa/features/sp/
- https://etcbc.github.io/bhsa/features/lex/
- https://etcbc.github.io/bhsa/features/lex0/
- https://etcbc.github.io/bhsa/features/lex_utf8/