# Searching for Variant Part of Speech Uses

Which words are most heavily split between two part of speech uses?

In this notebook, I count the values of each lexeme's "phrase-dependent part of speech" feature (`pdp`) from the ETCBC's [BHSA](https://etcbc.github.io/bhsa/features/0_home/). The feature is phrase-dependent because the part of speech value is adjusted to reflect the word's function in context. The query provides useful examples of contextual shifts in parts of speech for the paper.

In [76]:
from tf.app import use
from IPython.display import display
import pandas as pd
import collections
pd.set_option('display.max_rows', 100)
A = use('bhsa', hoist=globals())

TF app is up-to-date.
Using annotation/app-bhsa commit d3cf8f0c2ab5d690a0fda14ea31c33da5c5c8483 (=latest)
  in /Users/cody/text-fabric-data/__apps__/bhsa.
Using etcbc/bhsa/tf - c rv1.6 in /Users/cody/text-fabric-data
Using etcbc/phono/tf - c r1.2 in /Users/cody/text-fabric-data
Using etcbc/parallels/tf - c r1.2 in /Users/cody/text-fabric-data


## Count Strategy

The strategy is simple: for every lexeme, count how many times each part of speech value is attested in the text. Then divide those counts by the lexeme's total number of occurrences to obtain the proportion of use per part of speech. The data is arranged in a table: columns are parts of speech, rows are lexemes.

The lexemes are written in [ETCBC transcription](https://annotation.github.io/text-fabric/Writing/Hebrew.html). I attach the lexical part of speech value to the lexeme, separated by a dot (e.g. `KL/.subs`).

In [113]:
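# Map each lexeme token to a counter of its phrase-dependent part of speech values.
lex2pdpcount = collections.defaultdict(collections.Counter)
for lex in F.otype.s('lex'):
    # Token = lexeme + lexical part of speech, e.g. 'KL/.subs'
    token = f'{F.lex.v(lex)}.{F.sp.v(lex)}'
    # Named `word` rather than `use`, so the imported tf.app `use` is not shadowed.
    for word in L.d(lex, 'word'):
        lex2pdpcount[token][F.pdp.v(word)] += 1

# Rows are lexemes, columns are part of speech values; unattested values count as zero.
lexpdpcount = pd.DataFrame.from_dict(lex2pdpcount, orient='index').fillna(0)
# Divide each row by its total to get proportions.
lexpdpprop = lexpdpcount.divide(lexpdpcount.sum(1), axis=0)
lexpdpprop['max_prop'] = lexpdpprop.max(1)
# Drop lexemes whose lexical part of speech is verb.
without_verb = lexpdpprop.drop(lexpdpprop[lexpdpprop.index.str.contains('verb')].index)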
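
To make the arithmetic concrete, here is a minimal sketch of the same count-then-normalize steps on a handful of invented observations. The tokens `TOY1/.adjv` and `TOY2/.subs` and their counts are hypothetical and not taken from the BHSA; only the pandas steps mirror the cell above.

```python
import collections
import pandas as pd

# Hypothetical (lexeme token, phrase-dependent part of speech) pairs,
# standing in for the word nodes visited by the Text-Fabric loop.
toy_words = [
    ('TOY1/.adjv', 'adjv'), ('TOY1/.adjv', 'adjv'),
    ('TOY1/.adjv', 'adjv'), ('TOY1/.adjv', 'subs'),
    ('TOY2/.subs', 'subs'), ('TOY2/.subs', 'prep'),
]

# Count part of speech values per lexeme token.
toy_counts = collections.defaultdict(collections.Counter)
for token, pdp in toy_words:
    toy_counts[token][pdp] += 1

# Rows are lexemes, columns are part of speech values.
toy_count_table = pd.DataFrame.from_dict(toy_counts, orient='index').fillna(0)

# Divide each row by its row total to obtain proportions.
toy_prop_table = toy_count_table.divide(toy_count_table.sum(axis=1), axis=0)

# The largest proportion per lexeme; low values mean a more even split.
toy_prop_table['max_prop'] = toy_prop_table.max(axis=1)

print(toy_prop_table.sort_values(by='max_prop'))
```

Sorting by `max_prop` in ascending order puts the most evenly split lexemes first, which is how the full tables further down are ordered.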

### Specific Examples for Paper

NB: `max_prop` is the largest proportion among a lexeme's part of speech values. The decimal numbers are proportions (e.g. 0.8 = 80%). The meaning of the abbreviated part of speech values can be found [here](https://etcbc.github.io/bhsa/features/pdp/).
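
Since most columns in each row are zero, it can help to view only the attested values. The helper below is a hypothetical convenience (`nonzero_profile` is not part of this notebook or of Text-Fabric); the example row reuses the JWTR/ proportions reported in this notebook, with one zero column kept for illustration.

```python
import pandas as pd

def nonzero_profile(row):
    """Return only the attested part of speech proportions, largest first."""
    # Leave out the summary column so that only part of speech values remain.
    pos_only = row.drop(labels='max_prop', errors='ignore')
    return pos_only[pos_only > 0].sort_values(ascending=False)

# Values copied from the JWTR/.subs row; 'verb' stands in for the zero columns.
example_row = pd.Series(
    {'prep': 0.111111, 'subs': 0.555556, 'verb': 0.0,
     'advb': 0.333333, 'max_prop': 0.555556},
    name='JWTR/.subs',
)

profile = nonzero_profile(example_row)
print(profile)
```

In the notebook itself this could be applied as `nonzero_profile(lexpdpprop.loc['JWTR/.subs'])`.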

#### כול

In [118]:
lexpdpprop.loc['KL/.subs'].sort_values(ascending=False)
# but see HALOT for adjectival uses

max_prop    1.0
subs        1.0
inrg        0.0
prin        0.0
prps        0.0
nega        0.0
nmpr        0.0
prde        0.0
intj        0.0
advb        0.0
art         0.0
adjv        0.0
verb        0.0
conj        0.0
prep        0.0
Name: KL/.subs, dtype: float64

#### יותר

In [120]:
lexpdpprop.loc['JWTR/.subs'].sort_values(ascending=False)

max_prop    0.555556
subs        0.555556
advb        0.333333
prep        0.111111
inrg        0.000000
prin        0.000000
prps        0.000000
nega        0.000000
nmpr        0.000000
prde        0.000000
intj        0.000000
art         0.000000
adjv        0.000000
verb        0.000000
conj        0.000000
Name: JWTR/.subs, dtype: float64

#### קדמה

In [121]:
lexpdpprop.loc['QDMH/.subs'].sort_values(ascending=False)

max_prop    0.666667
prep        0.666667
subs        0.333333
inrg        0.000000
prin        0.000000
prps        0.000000
nega        0.000000
nmpr        0.000000
prde        0.000000
intj        0.000000
advb        0.000000
art         0.000000
adjv        0.000000
verb        0.000000
conj        0.000000
Name: QDMH/.subs, dtype: float64

#### טוב

In [122]:
lexpdpprop.loc['VWB/.adjv'].sort_values(ascending=False)

max_prop    0.682303
adjv        0.682303
subs        0.313433
advb        0.004264
inrg        0.000000
prin        0.000000
prps        0.000000
nega        0.000000
nmpr        0.000000
prde        0.000000
intj        0.000000
art         0.000000
verb        0.000000
conj        0.000000
prep        0.000000
Name: VWB/.adjv, dtype: float64

#### קטן

In [123]:
lexpdpprop.loc['QVN/.adjv'].sort_values(ascending=False)

max_prop    0.574074
adjv        0.574074
subs        0.425926
inrg        0.000000
prin        0.000000
prps        0.000000
nega        0.000000
nmpr        0.000000
prde        0.000000
intj        0.000000
advb        0.000000
art         0.000000
verb        0.000000
conj        0.000000
prep        0.000000
Name: QVN/.adjv, dtype: float64

#### קל

In [114]:
lexpdpprop.loc['QL/.adjv'].sort_values(ascending=False)

max_prop    0.461538
adjv        0.461538
subs        0.384615
advb        0.153846
inrg        0.000000
prin        0.000000
prps        0.000000
nega        0.000000
nmpr        0.000000
prde        0.000000
intj        0.000000
art         0.000000
verb        0.000000
conj        0.000000
prep        0.000000
Name: QL/.adjv, dtype: float64

## Full Tables

The tables below were used to find interesting examples.

In [80]:
without_verb.sort_values(by='max_prop').head(100)

lexeme,prep,conj,subs,verb,adjv,art,advb,intj,prde,nmpr,nega,prps,prin,inrg,max_prop
<QB/.subs,0.4,0.2,0.266667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4
KN/.adjv,0.0,0.0,0.307692,0.0,0.423077,0.0,0.269231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.423077
QL/.adjv,0.0,0.0,0.384615,0.0,0.461538,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.461538
YX/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
BRWR/.adjv,0.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
M<DNT/.subs,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
XSN/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
>LM=/.subs,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
NWH==/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
MTQ=/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5


In [82]:
# show nouns

lexpdpprop[lexpdpprop.index.str.contains('subs')].sort_values(by='max_prop').head(100)

lexeme,prep,conj,subs,verb,adjv,art,advb,intj,prde,nmpr,nega,prps,prin,inrg,max_prop
<QB/.subs,0.4,0.2,0.266667,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4
>LM=/.subs,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
M<DNT/.subs,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
BDD/.subs,0.0,0.0,0.454545,0.0,0.0,0.0,0.545455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.545455
JWTR/.subs,0.111111,0.0,0.555556,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.555556
PT</.subs,0.0,0.0,0.571429,0.0,0.0,0.0,0.428571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.571429
NGD/.subs,0.578947,0.0,0.421053,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.578947
TXWT/.subs,0.4,0.0,0.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.6
>PS/.subs,0.0,0.0,0.604651,0.0,0.0,0.0,0.162791,0.0,0.0,0.0,0.232558,0.0,0.0,0.0,0.604651
SBJB/.subs,0.0,0.0,0.380952,0.0,0.0,0.0,0.619048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.619048


In [83]:
# show adjv

lexpdpprop[lexpdpprop.index.str.contains('adjv')].sort_values(by='max_prop').head(100)

lexeme,prep,conj,subs,verb,adjv,art,advb,intj,prde,nmpr,nega,prps,prin,inrg,max_prop
KN/.adjv,0.0,0.0,0.307692,0.0,0.423077,0.0,0.269231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.423077
QL/.adjv,0.0,0.0,0.384615,0.0,0.461538,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.461538
QRJ>/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
>BL===/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
BRWR/.adjv,0.0,0.0,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
<VWP/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
NWH==/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
NKX/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
<RWM/.adjv,0.0,0.0,0.1875,0.0,0.5,0.0,0.3125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
TCJ<J/.adjv,0.0,0.0,0.5,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
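
Many rows in the tables above show an exact 0.5/0.5 split, which may simply reflect lexemes with very few occurrences. One possible refinement, sketched here under that assumption, is to require a minimum number of occurrences before ranking. The function name `filter_by_frequency`, the threshold `min_count`, and the toy counts are all hypothetical.

```python
import pandas as pd

def filter_by_frequency(prop_table, count_table, min_count=10):
    """Keep lexemes attested at least `min_count` times, most evenly split first."""
    totals = count_table.sum(axis=1)
    frequent = totals[totals >= min_count].index
    return prop_table.loc[frequent].sort_values(by='max_prop')

# Toy counts (invented): one rare lexeme and one common lexeme.
toy_counts = pd.DataFrame(
    {'subs': [1, 30], 'adjv': [1, 20]},
    index=['RARE/.adjv', 'COMMON/.adjv'],
)
toy_props = toy_counts.divide(toy_counts.sum(axis=1), axis=0)
toy_props['max_prop'] = toy_props.max(axis=1)

filtered = filter_by_frequency(toy_props, toy_counts, min_count=10)
print(filtered)
```

With the notebook's own tables the call would be `filter_by_frequency(lexpdpprop, lexpdpcount)`; the right threshold depends on how rare a lexeme may be and still count as a useful example.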


## The Curious Cases of בן and פה

These two terms shift their function and meaning in specialized contexts. Those contexts are displayed below.

### בן

Wherever בן occurs in a phrase with שׁנה, it functions idiomatically to express age. A number is always present.

In [125]:
A.show(A.search('''

phrase
    word lex=BN/
    word lex=CNH/

'''), condenseType='phrase')

  0.86s 248 results


### פה 

When this term is used in the construct state, it functions prepositionally.

See especially Ex 28:32, where פה is even used prepositionally in construct with ראשׁ.

In [129]:
A.show(A.search('''

word lex=PH/ st=c
<: word

'''), condenseType='clause')

  0.51s 212 results


## Syntactic Connotations 

### The על Adversative Construction

In [None]:
A.show(A.search('''



'''))