# Named Entities in the BHSA

For preliminaries, such as installing Text-Fabric and using it, consult the
[start tutorial](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/bhsa/start.ipynb).

We show how to fetch the names of persons, places, peoples, and measures from the BHSA data.

In [1]:
import os
from tf.app import use

In [2]:
A = use("bhsa", hoist=globals())

If you expand the triangle in front of BHSA above, you see which features have been loaded.

We need [nametype](https://etcbc.github.io/bhsa/features/nametype/) specifically.
It is a mapping from word numbers to types of proper names.

Here is a frequency distribution of its values:

In [3]:
F.nametype.freqList()

(('pers', 23184),
 ('topo', 7188),
 ('pers,gens,topo', 4498),
 ('ppde', 2512),
 ('pers,gens', 358),
 ('gens', 300),
 ('gens,topo', 110),
 ('mens', 30),
 ('pers,god', 4))
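Several of these values combine more than one type, separated by commas. As a minimal sketch, the following tallies how often each individual type occurs, counting a combined value once for every type it contains. The list is copied from the output above; the helper name `tally_single_types` is made up for this illustration and is not part of Text-Fabric.

```python
from collections import Counter

# Frequency list copied from the output of F.nametype.freqList() above.
FREQ_LIST = (
    ("pers", 23184),
    ("topo", 7188),
    ("pers,gens,topo", 4498),
    ("ppde", 2512),
    ("pers,gens", 358),
    ("gens", 300),
    ("gens,topo", 110),
    ("mens", 30),
    ("pers,god", 4),
)


def tally_single_types(freq_list):
    """Count each single name type, splitting combined values on commas."""
    tally = Counter()
    for value, count in freq_list:
        for name_type in value.split(","):
            tally[name_type] += count
    return tally


per_type = tally_single_types(FREQ_LIST)
for name_type, count in per_type.most_common():
    print(name_type, count)
```

Because combined values are counted under each of their parts, the per-type totals add up to more than the number of nodes that carry a name type.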

We query the measure names (`mens`):

In [4]:
query = """
word nametype=mens
"""

results = A.search(query)

  0.35s 20 results


In [5]:
A.table(results)

n,p,word
1,1_Kings 6:1,זִ֗ו
2,1_Kings 6:37,זִֽו׃
3,1_Kings 6:38,בּ֗וּל
4,Zechariah 1:7,שְׁבָ֔ט
5,Zechariah 7:1,כִסְלֵֽו׃
6,Esther 2:16,טֵבֵ֑ת
7,Esther 3:7,נִיסָ֔ן
8,Esther 3:7,אֲדָֽר׃ ס
9,Esther 3:13,אֲדָ֑ר
10,Esther 8:9,סִיוָ֗ן


The frequency list promised 30 results, but we see only 20. That is because words are not the only nodes that have a name type; lexemes have one too:

In [6]:
queryL = """
lex nametype=mens
"""

resultsL = A.search(queryL)

  0.01s 10 results


In [7]:
A.table(resultsL)

n,p,lex
1,זִו,
2,בּוּל,
3,שְׁבָט,
4,כִּסְלֵו,
5,טֵבֵת,
6,נִיסָן,
7,אֲדָר,
8,סִיוָן,
9,אֲדָר,
10,אֱלוּל,
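The split between words and lexemes can also be checked without a search template. The sketch below assumes the feature API that `use(...)` hoists into the notebook: `F.nametype.s(value)` for the nodes carrying a given value and `F.otype.v(node)` for the type of a node. The function name `count_by_node_type` is invented here.

```python
from collections import Counter


def count_by_node_type(F, value):
    """Count the nodes whose nametype equals `value`, grouped by node type.

    `F` is assumed to be the Text-Fabric feature API hoisted by `use(...)`.
    """
    return Counter(F.otype.v(node) for node in F.nametype.s(value))
```

In the notebook, `count_by_node_type(F, "mens")` should then account for the 20 words and 10 lexemes found by the two queries above.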


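Let's make a data file of all words that have a name type.
We'll produce a tab-separated file with a bit of extra information.
The bare `nametype` in the query selects words with any name type; `gloss*` adds no condition but brings the gloss into the export.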

In [15]:
query = """
word nametype gloss*
"""

results = A.search(query)

  0.66s 35569 results


In [18]:
A.table(results, end=10)

n,p,word
1,Genesis 2:4,יְהוָ֥ה
2,Genesis 2:5,יְהוָ֤ה
3,Genesis 2:7,יְהוָ֨ה
4,Genesis 2:8,יְהוָ֧ה
5,Genesis 2:8,עֵ֖דֶן
6,Genesis 2:9,יְהוָ֤ה
7,Genesis 2:10,עֵ֔דֶן
8,Genesis 2:11,פִּישֹׁ֑ון
9,Genesis 2:11,ה֣וּא
10,Genesis 2:11,חֲוִילָ֔ה


In [19]:
A.show(results, start=10000, end=10003)

In [20]:
A.export(results, toFile="namedEntities.tsv")

In [21]:
!head -n 20 ~/Downloads/namedEntities.tsv

��R 	 S 1 	 S 2 	 S 3 	 N O D E 1 	 T Y P E 1 	 T E X T 1 	 g l o s s 1 	 n a m e t y p e 1 
 1 	 G e n e s i s 	 2 	 4 	 7 4 0 	 w o r d 	 �������  	 Y H W H 	 p e r s 
 2 	 G e n e s i s 	 2 	 5 	 7 6 5 	 w o r d 	 �������  	 Y H W H 	 p e r s 
 3 	 G e n e s i s 	 2 	 7 	 7 9 3 	 w o r d 	 �������  	 Y H W H 	 p e r s 
 4 	 G e n e s i s 	 2 	 8 	 8 1 7 	 w o r d 	 �������  	 Y H W H 	 p e r s 
 5 	 G e n e s i s 	 2 	 8 	 8 2 1 	 w o r d 	 ������  	 E d e n 	 t o p o 
 6 	 G e n e s i s 	 2 	 9 	 8 3 4 	 w o r d 	 �������  	 Y H W H 	 p e r s 
 7 	 G e n e s i s 	 2 	 1 0 	 8 6 7 	 w o r d 	 ������  	 E d e n 	 t o p o 
 8 	 G e n e s i s 	 2 	 1 1 	 8 8 5 	 w o r d 	 ����������  	 P i s h o n 	 t o p o 
 9 	 G e n e s i s 	 2 	 1 1 	 8 8 6 	 w o r d 	 �����  	 h e 	 p p d e 
 1 0 	 G e n e s i s 	 2 	 1 1 	 8 9 3 	 w o r d 	 ���������  	 H a v i l a h 	 t o p o 
 1 1 	 G e n e s i s 	 2 	 1 2 	 9 0 3 	 w o r d 

Note that this file is in UTF-16, with a byte order chosen so that the file opens without issue in Excel. That is why `head` shows the Latin characters spaced out and the Hebrew text as unreadable bytes.
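Command-line tools such as `head` generally expect UTF-8. Here is a minimal sketch for making a UTF-8 copy of such a file. The function name `utf16_to_utf8` and the demo file names are made up, and the demo writes a small stand-in file to a temporary directory instead of touching `~/Downloads`.

```python
import os
import tempfile


def utf16_to_utf8(src_path, dst_path):
    """Copy a UTF-16 text file to a UTF-8 file, line by line."""
    with open(src_path, encoding="utf16", newline="") as src:
        with open(dst_path, "w", encoding="utf8", newline="") as dst:
            for line in src:
                dst.write(line)


# Demo on a small stand-in file (hypothetical names, not the real export).
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample-utf16.tsv")
dst_path = os.path.join(tmp_dir, "sample-utf8.tsv")

sample = "R\tS1\tgloss1\n1\tGenesis\tYHWH\n"
with open(src_path, "w", encoding="utf16", newline="") as fh:
    fh.write(sample)

utf16_to_utf8(src_path, dst_path)

with open(dst_path, encoding="utf8", newline="") as fh:
    converted = fh.read()
```

For the real export, the source path would be `~/Downloads/namedEntities.tsv`, expanded with `os.path.expanduser`.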

To read the file in Python, open it with the `utf16` encoding:

In [23]:
filePath = os.path.expanduser("~/Downloads/namedEntities.tsv")

limit = 20  # number of lines to print

with open(filePath, encoding="utf16") as fh:
    for i, line in enumerate(fh, start=1):
        # each line is one row; cells are separated by tabs
        cells = line.rstrip("\n").split("\t")
        print(i, cells)
        if i >= limit:
            break

1 ['R', 'S1', 'S2', 'S3', 'NODE1', 'TYPE1', 'TEXT1', 'gloss1', 'nametype1']
2 ['1', 'Genesis', '2', '4', '740', 'word', 'יְהוָ֥ה ', 'YHWH', 'pers']
3 ['2', 'Genesis', '2', '5', '765', 'word', 'יְהוָ֤ה ', 'YHWH', 'pers']
4 ['3', 'Genesis', '2', '7', '793', 'word', 'יְהוָ֨ה ', 'YHWH', 'pers']
5 ['4', 'Genesis', '2', '8', '817', 'word', 'יְהוָ֧ה ', 'YHWH', 'pers']
6 ['5', 'Genesis', '2', '8', '821', 'word', 'עֵ֖דֶן ', 'Eden', 'topo']
7 ['6', 'Genesis', '2', '9', '834', 'word', 'יְהוָ֤ה ', 'YHWH', 'pers']
8 ['7', 'Genesis', '2', '10', '867', 'word', 'עֵ֔דֶן ', 'Eden', 'topo']
9 ['8', 'Genesis', '2', '11', '885', 'word', 'פִּישֹׁ֑ון ', 'Pishon', 'topo']
10 ['9', 'Genesis', '2', '11', '886', 'word', 'ה֣וּא ', 'he', 'ppde']
11 ['10', 'Genesis', '2', '11', '893', 'word', 'חֲוִילָ֔ה ', 'Havilah', 'topo']
12 ['11', 'Genesis', '2', '12', '903', 'word', 'הִ֖וא ', 'she', 'ppde']
13 ['12', 'Genesis', '2', '13', '918', 'word', 'גִּיחֹ֑ון ', 'Gihon', 'topo']
14 ['13', 'Genesis', '2', '13', '919', 'wor
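For further processing, the `csv` module from the standard library gives access to cells by column name. Below is a minimal sketch. The helper `read_export` is invented here, and a small sample file (header and a few rows copied from the output above) stands in for the real export. It assumes the export contains no quoted fields, in line with the plain `split("\t")` used above.

```python
import csv
import os
import tempfile
from collections import Counter

# Header and a few rows copied from the output above (stand-in for the real export).
SAMPLE_ROWS = [
    ["R", "S1", "S2", "S3", "NODE1", "TYPE1", "TEXT1", "gloss1", "nametype1"],
    ["1", "Genesis", "2", "4", "740", "word", "יְהוָ֥ה ", "YHWH", "pers"],
    ["5", "Genesis", "2", "8", "821", "word", "עֵ֖דֶן ", "Eden", "topo"],
    ["8", "Genesis", "2", "11", "885", "word", "פִּישֹׁ֑ון ", "Pishon", "topo"],
    ["9", "Genesis", "2", "11", "886", "word", "ה֣וּא ", "he", "ppde"],
]

# Write the sample as tab-separated UTF-16, like the exported file.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv")
with open(sample_path, "w", encoding="utf16", newline="") as fh:
    csv.writer(fh, delimiter="\t", lineterminator="\n").writerows(SAMPLE_ROWS)


def read_export(path):
    """Read a tab-separated UTF-16 export into a list of dicts keyed by column name."""
    with open(path, encoding="utf16", newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))


rows = read_export(sample_path)

# Tally the name types found in the file.
type_counts = Counter(row["nametype1"] for row in rows)
print(rows[0]["gloss1"], dict(type_counts))
```

Pointing `read_export` at the `filePath` defined above would read the full export instead of the sample.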

See also the documentation of the
[export function](https://annotation.github.io/text-fabric/tf/advanced/display.html#tf.advanced.display.export).

CC-BY Dirk Roorda