# The values of the nametype feature

The `nametype` feature has values that are comma-separated strings of keywords.
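To make the value format concrete, here is a minimal sketch of splitting one value into its keywords. The sample value and the helper name `splitKeywords` are assumptions for illustration; they are not read from the corpus or taken from any library.

```python
def splitKeywords(value):
    """Return the list of keywords in a comma-separated value.

    A missing value (None or an empty string) yields an empty list.
    Hypothetical helper, for illustration only.
    """
    if not value:
        return []
    return value.split(",")


# Hypothetical value, shaped like the values of the nametype feature.
sampleValue = "pers,gens"
keywords = splitKeywords(sampleValue)
print(keywords)

# A word without a value contributes no keywords.
print(splitKeywords(None))
```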

We compute how many words have a non-empty value, and how often the keywords occur.

We also investigate which kinds of words have a value for this feature.

In [1]:
%load_ext autoreload
%autoreload 2

In [5]:
import collections

from tf.app import use

In [3]:
A = use("ETCBC/bhsa", hoist=globals())

**Locating corpus resources ...**

| Name | # of nodes | # slots / node | % coverage |
|---|---|---|---|
| book | 39 | 10938.21 | 100 |
| chapter | 929 | 459.19 | 100 |
| lex | 9230 | 46.22 | 100 |
| verse | 23213 | 18.38 | 100 |
| half_verse | 45179 | 9.44 | 100 |
| sentence | 63717 | 6.7 | 100 |
| sentence_atom | 64514 | 6.61 | 100 |
| clause | 88131 | 4.84 | 100 |
| clause_atom | 90704 | 4.7 | 100 |
| phrase | 253203 | 1.68 | 100 |


# Compute

How many words have the feature, and what is the frequency of the atomic values?

In [18]:
nameTypes = collections.Counter()
wordsWithNameType = 0

for w in F.otype.s("word"):
    nameTypeValue = F.nametype.v(w)

    if not nameTypeValue:
        continue

    wordsWithNameType += 1

    for atom in nameTypeValue.split(","):
        nameTypes[atom] += 1

How are the words with a nametype value distributed over the parts of speech?

In [19]:
nameTypeByPos = collections.Counter()
noNameTypeByPos = collections.Counter()

for w in F.otype.s("word"):
    pos = F.sp.v(w)
    
    if F.nametype.v(w):
        nameTypeByPos[pos] += 1
    else:
        noNameTypeByPos[pos] += 1

# Show

In [20]:
print(f"{wordsWithNameType:>6} words have a name type")

 35506 words have a name type


In [21]:
for (atom, n) in sorted(nameTypes.items()):
    print(f"{n:>6} x {atom}")

  5080 x gens
     3 x god
    20 x mens
 26354 x pers
  2505 x ppde
 10916 x topo


In [22]:
for (pos, n) in sorted(nameTypeByPos.items()):
    m = noNameTypeByPos[pos]
    print(f"{pos:<5}: {n:>5}x with and {m:>5}x without name type")

nmpr : 33001x with and     1x without name type
prps :  2505x with and  2506x without name type
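As a consistency check, the per-part-of-speech counts can be compared with the overall count printed earlier. The sketch below uses only numbers copied from the outputs; the dictionary names are introduced here for illustration. It confirms that `nmpr` and `prps` together account for every word with a name type.

```python
# Counts copied from the output above.
withNameType = {"nmpr": 33001, "prps": 2505}
withoutNameType = {"nmpr": 1, "prps": 2506}

# Overall number of words with a name type, from the earlier output.
wordsWithNameType = 35506

# The two parts of speech together cover all words with a name type.
coveredByPos = sum(withNameType.values())
print(coveredByPos == wordsWithNameType)

# Share of words that have a name type, per part of speech.
for pos in sorted(withNameType):
    n = withNameType[pos]
    total = n + withoutNameType[pos]
    print(f"{pos:<5}: {n / total:.4f} of {total} words have a name type")
```

Note also that the number of `prps` words with a name type equals the frequency of the keyword `ppde` (2505). That suggests, but does not prove, that `ppde` is the keyword carried by these words.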


Let's find the one proper name without a name type:

In [26]:
results = A.search("""
word nametype# sp=nmpr gloss*
""")

  0.34s 1 result


In [27]:
A.show(results, condenseType="sentence")

Is it a coincidence that the number of words with `sp=prps` is twice the number of words with `nametype=ppde` plus one?

In [34]:
resultsN = A.search("""
word sp=prps nametype#
""")

  0.23s 2506 results


In [33]:
resultsY = A.search("""
word sp=prps nametype
""")

  0.14s 2505 results


In [35]:
results = A.search("""
word sp=prps nametype*
""")

  0.28s 5011 results
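Judging by the result counts, the three templates split the `prps` words into those without a value (`nametype#`), those with a value (`nametype`), and all of them (`nametype*`). The sketch below checks that arithmetic with the counts copied from the outputs; the variable names are introduced here for illustration.

```python
# Result counts copied from the three searches above.
countWithout = 2506  # word sp=prps nametype#
countWith = 2505     # word sp=prps nametype
countAll = 5011      # word sp=prps nametype*

# The first two result sets together make up the third.
print(countWithout + countWith == countAll)

# The total is twice the number of words with a name type, plus one.
print(countAll == 2 * countWith + 1)
```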


In [36]:
A.show(results, condenseType="sentence", end=10)

There seems to be no systematic pattern in which prps words have a name type and which do not.

Let's list the consecutive stretches of prps words in which either every word has a nametype or no word has one.

In [40]:
stretches = []
prevNameType = None

for w in F.sp.s("prps"):
    # True if the word has a non-empty nametype value
    nameType = bool(F.nametype.v(w))

    # start a new stretch at the first word and whenever the kind changes
    if prevNameType is None or prevNameType != nameType:
        stretches.append([nameType, 1])
    else:
        stretches[-1][-1] += 1

    prevNameType = nameType

len(stretches)

1791
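Computing stretches like this is a run-length encoding. As an alternative sketch, not the method used in this notebook, the same grouping can be done with `itertools.groupby` from the standard library. The list `flags` is a hypothetical stand-in for the sequence of "has a nametype" values of the prps words.

```python
from itertools import groupby

# Hypothetical sequence of flags: True means the word has a nametype value.
flags = [True, True, True, False, True, False, False]

# groupby groups consecutive equal items; count the members of each group.
stretchesDemo = [[kind, sum(1 for _ in group)] for (kind, group) in groupby(flags)]

print(stretchesDemo)
print(len(stretchesDemo))
```

The lengths of the stretches add up to the number of flags, which is a cheap sanity check for either implementation.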

In [44]:
for i in range(0, len(stretches), 2):
    (iKind, iN) = stretches[i]
    (i1Kind, i1N) = stretches[i + 1] if i + 1 < len(stretches) else (None, 0)
    iKindRep = "Y" if iKind else "N"
    i1KindRep = "Y" if i1Kind else "N"
    iNRep = f"{iN:>2}"
    i1NRep = f"{i1N:>2}"
    print(f"{iKindRep}{iNRep} - {i1KindRep}{i1NRep}")

Y11 - N 4
Y 1 - N 1
Y 1 - N 1
Y 1 - N 1
Y 2 - N 3
Y10 - N 5
Y 1 - N 1
Y 1 - N 1
Y 1 - N 4
Y 6 - N 3
Y 4 - N 4
Y10 - N 1
Y 1 - N 2
Y 2 - N 3
Y 1 - N 2
Y 1 - N 5
Y 5 - N 6
Y15 - N 2
Y 1 - N 1
Y 6 - N 4
Y 2 - N 2
Y 4 - N 3
Y 2 - N 1
Y 1 - N 1
Y 1 - N 9
Y 1 - N 2
Y 1 - N 1
Y 4 - N 1
Y 1 - N 2
Y 7 - N 4
Y 1 - N 7
Y 1 - N 2
Y 1 - N 1
Y 1 - N 1
Y 1 - N 4
Y 1 - N 1
Y 1 - N 2
Y 3 - N 2
Y 1 - N 4
Y 1 - N 3
Y 2 - N 4
Y 3 - N 3
Y 1 - N 4
Y 1 - N 2
Y 1 - N 1
Y 8 - N 1
Y 6 - N 2
Y 2 - N 1
Y11 - N 3
Y 2 - N 2
Y 6 - N 1
Y 1 - N 1
Y 2 - N 1
Y 2 - N 1
Y 4 - N 1
Y 1 - N 2
Y 1 - N 1
Y 9 - N 2
Y 3 - N 5
Y 1 - N 7
Y 2 - N 5
Y 2 - N 1
Y 1 - N 3
Y 1 - N 2
Y 4 - N 1
Y 1 - N 1
Y 1 - N 1
Y 1 - N 1
Y 1 - N 7
Y 2 - N 5
Y 4 - N 3
Y 1 - N 1
Y 6 - N 4
Y 3 - N 2
Y 1 - N 3
Y 1 - N 1
Y 6 - N 1
Y 1 - N 1
Y 1 - N 1
Y 1 - N 1
Y 1 - N 9
Y 2 - N 1
Y 2 - N 4
Y 5 - N 9
Y 3 - N 6
Y 1 - N 4
Y 3 - N 7
Y 2 - N 4
Y 1 - N 4
Y 3 - N 2
Y 3 - N 1
Y 2 - N 1
Y 3 - N 1
Y 1 - N 1
Y 2 - N 5
Y 2 - N 5
Y 7 - N 1
Y 1 - N 1
Y 1 - N 7
Y 2 - N 3


There are many 1-1 alternations of prps words, but also long stretches with or without nametypes.
So it really seems coincidental that there is exactly one more prps word without a nametype than with one.
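The observation about 1-1 alternations and long stretches can be quantified. The sketch below does so for the first ten rows of the listing above only, so its numbers say nothing about the whole corpus; the names `firstRows`, `oneToOne`, `longestWith` and `longestWithout` are introduced here for illustration.

```python
# First ten rows of the listing above, as (with nametype, without nametype).
firstRows = [
    (11, 4),
    (1, 1),
    (1, 1),
    (1, 1),
    (2, 3),
    (10, 5),
    (1, 1),
    (1, 1),
    (1, 4),
    (6, 3),
]

# Rows where a single word with a nametype is followed by a single word without.
oneToOne = sum(1 for (y, n) in firstRows if y == 1 and n == 1)

# Longest stretch of each kind within this sample.
longestWith = max(y for (y, _) in firstRows)
longestWithout = max(n for (_, n) in firstRows)

print(f"{oneToOne} of {len(firstRows)} rows are 1-1 alternations")
print(f"longest stretch with a nametype:    {longestWith}")
print(f"longest stretch without a nametype: {longestWithout}")
```

Running the same summary over the full `stretches` list would show how common each pattern is across all prps words.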