# Logical structures

The purpose of this notebook is to compute the logical structures of Biblical Hebrew verbs according to the definitions of Role and Reference Grammar (RRG).

A basic requirement is to distinguish between stative and active verbs, as well as between causative and non-causative verbs. This issue has been dealt with in another notebook. Another requirement is to distinguish between intransitive and transitive verbs. This is not easy for Biblical Hebrew because the object of a verb is not obligatory: it may be realized as a separate lexical object or as a suffix, but it may also only be inferred from the context.

The strategy applied here is to search the full corpus (Genesis-Kings) for all instances of each verb in the *qal* stem. If a verb never occurs with an object, it is regarded as intransitive; if it occurs with an object at least once, it is regarded as transitive.
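The classification rule can be sketched in plain Python. The sketch assumes the data have already been reduced to (lexeme, frame) pairs, one per clause, using the frame labels introduced in the data extraction section. `classify_transitivity` is a hypothetical helper and not part of Text-Fabric; the sample pairs are toy data, not corpus counts.

```python
from collections import defaultdict

def classify_transitivity(occurrences):
    """Classify each lexeme as 'transitive' or 'intransitive'.

    `occurrences` is an iterable of (lexeme, frame) pairs, where frame is one of
    '(S)' (no object), '(S)-O' (lexical object) or '(S)-O_sfx' (object suffix).
    A lexeme counts as transitive if it occurs at least once with an object.
    """
    has_object = defaultdict(bool)
    for lex, frame in occurrences:
        # Register every lexeme, even one that never takes an object
        has_object[lex] |= frame in ('(S)-O', '(S)-O_sfx')
    return {lex: 'transitive' if obj else 'intransitive'
            for lex, obj in has_object.items()}

# Toy data, not corpus counts
sample = [('HLK[', '(S)'), ('HLK[', '(S)'), ('NTN[', '(S)'), ('NTN[', '(S)-O_sfx')]
print(classify_transitivity(sample))
# {'HLK[': 'intransitive', 'NTN[': 'transitive'}
```

A single object occurrence is enough to flip a lexeme to transitive, which is why misanalysed object phrases matter so much later on.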

In [180]:
import collections
import pandas as pd
import numpy as np
from tf.app import use

In [2]:
A = use('bhsa', hoist=globals())

TF app is up-to-date.
Using annotation/app-bhsa commit 35e4fee27a1cd6f0a5caea9118129349ddb0604e (=latest)
  in C:\Users\Ejer/text-fabric-data/__apps__/bhsa.
Using etcbc/bhsa/tf - c r1.4 in C:\Users\Ejer/text-fabric-data
Using etcbc/phono/tf - c r1.1 in C:\Users\Ejer/text-fabric-data
Using etcbc/parallels/tf - c r1.1 in C:\Users\Ejer/text-fabric-data


## Preparing the corpus

In [3]:
# prepare the corpus

corpus = [book for book in F.otype.s('book') if book < T.nodeFromSection(('Isaiah',))]
print('Corpus:\n')
print('\n'.join(T.sectionFromNode(book)[0] for book in corpus))

Corpus:

Genesis
Exodus
Leviticus
Numbers
Deuteronomy
Joshua
Judges
1_Samuel
2_Samuel
1_Kings
2_Kings


In [4]:
sets = {'corpus': set(corpus)}  # named node set for use in search templates

## Data extraction

The verbs are extracted according to the transitivity frames in which they occur. Only intransitive and transitive frames matter here, so ditransitive frames are not specified separately; they are included among the transitive frames.
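To make the frame definitions concrete, here is a plain-Python sketch of how a clause's phrase functions map onto the three frame labels. `frame_of` is a hypothetical helper. It looks only at phrase functions and ignores the word-level conditions of the templates (qal stem, Hebrew, part of speech, exclusion of HJH). It resolves overlaps the same way the extraction loop does, where later frames overwrite earlier ones for the same clause.

```python
def frame_of(phrase_functions):
    """Assign a transitivity frame label to a clause, given its phrase functions.

    An Objc phrase next to a Pred/PreS phrase gives '(S)-O', a PreO phrase gives
    '(S)-O_sfx', and a Pred/PreS phrase without object gives '(S)'. A ditransitive
    clause still receives a single label, so it is counted as transitive.
    Returns None for clauses without a verbal predicate phrase.
    """
    functions = set(phrase_functions)
    if functions & {'Pred', 'PreS'} and 'Objc' in functions:
        return '(S)-O'
    if 'PreO' in functions:
        return '(S)-O_sfx'
    if functions & {'Pred', 'PreS'}:
        return '(S)'
    return None

print(frame_of(['Conj', 'Pred', 'Subj']))  # (S)
print(frame_of(['Pred', 'Objc', 'Objc']))  # (S)-O
print(frame_of(['PreO', 'Subj']))          # (S)-O_sfx
```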

In [111]:
#Intransitive frame without object. Subject may be implicit or explicit.
intransitive = '''
corpus
 clause
 /without/
    phrase function=Objc
 /-/
   phrase function=Pred|PreS
     word lex#HJH[ sp=verb vs=qal language=Hebrew
'''

#Transitive frame with object suffix.
transitive1 = '''
corpus
 clause
   phrase function=PreO
     word lex#HJH[ sp=verb vs=qal language=Hebrew
'''

#Transitive frame with lexical object
transitive2 = '''
corpus
 clause
   phrase function=Pred|PreS
     word lex#HJH[ sp=verb vs=qal language=Hebrew
   phrase function=Objc
'''

In [112]:
frames = {'(S)': intransitive, '(S)-O_sfx': transitive1, '(S)-O': transitive2}

verb_constructions = {}  # clause node -> [lexeme, frame]

for n, (fr, template) in enumerate(frames.items(), start=1):
    results = A.search(template, sets=sets, silent=True)

    for r in results:
        # r[1] is the clause node, r[3] the verb node
        clause = r[1]
        lex = F.lex.v(r[3])
        # A clause matched by a later frame overwrites the earlier entry
        verb_constructions[clause] = [lex, fr]

    print(f'... frame {n} completed')

... frame 1 completed
... frame 2 completed
... frame 3 completed


In [113]:
df = pd.DataFrame(verb_constructions).T
df.columns = ['lex','frame']

In [332]:
df.head()

         lex  frame
427557  >MR[    (S)
427561  VWB[    (S)
427568  >MR[    (S)
427580  >MR[    (S)
427586  R>H[    (S)

Are all relevant verbs extracted? We can check this by comparing the dataset with a single search that extracts all verbs regardless of frame:

In [115]:
verbs = '''
corpus
 clause
  phrase function=Pred|PreO|PreS
   word lex#HJH[ sp=verb vs=qal language=Hebrew
'''

results_all_verbs = A.search(verbs, sets=sets)

  3.00s 20697 results


In [116]:
print(f'Number of cases in transitivity dataset: {len(df)}')
print(f'Number of all verbs regardless of transitivity frames: {len(results_all_verbs)}')
print(f'\nDifference: {len(results_all_verbs)-len(df)}')

Number of cases in transitivity dataset: 20696
Number of all verbs regardless of transitivity frames: 20697

Difference: 1


Which clauses are different?

In [117]:
mismatches = []
df_clauses = set(df.index)  # set for fast membership tests

for r in results_all_verbs:
    clause = r[1]
    if clause not in df_clauses:
        mismatches.append(clause)

len(mismatches)

0

The difference could arise if a clause occurs twice in the results of the search for all verbs. We can check that:

In [119]:
clauses = []

for r in results_all_verbs:
    clause = r[1]
    clauses.append(clause)
    
collections.Counter(clauses).most_common()[:10]

[(431017, 2),
 (427553, 1),
 (427557, 1),
 (427560, 1),
 (427561, 1),
 (427563, 1),
 (427564, 1),
 (427568, 1),
 (427571, 1),
 (427576, 1)]

Indeed, clause 431017 occurs twice because it has two phrases with an object suffix. In the first data extraction, the same clause occurs only once because the results are stored in a dictionary keyed by clause, so the second instance overwrites the first. Since each clause needs to be represented only once, this inconsistency can be ignored.
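The effect can be reproduced with toy data. Storing results in a dictionary keyed by clause silently keeps only the last match per clause, whereas counting the clause nodes exposes the duplicate. The node numbers below are made up.

```python
from collections import Counter

# Hypothetical search results as (clause, phrase) tuples; clause 1 matches twice
toy_results = [(1, 10), (2, 20), (1, 11), (3, 30)]

# Keyed by clause: the second match for clause 1 overwrites the first
by_clause = {}
for clause, phrase in toy_results:
    by_clause[clause] = phrase

# Counting clause nodes reveals the duplicate instead
duplicates = [c for c, n in Counter(c for c, _ in toy_results).items() if n > 1]

print(len(toy_results), len(by_clause))  # 4 3
print(by_clause[1])                      # 11
print(duplicates)                        # [1]
```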

In [120]:
A.pretty(431017)

## Statistics

To distinguish transitive and intransitive verbs, lexeme and frame are cross-tabulated to get the frequency of each combination. Afterwards, two subsets are created from the dataframe based on the following rules:

1. Intransitive: no instances of a lexical or suffix object
2. Transitive: at least one instance of a lexical or suffix object
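The cross-tabulation and the two rules can be sketched without pandas, using a `Counter` over (lexeme, frame) pairs. This only illustrates the logic; the notebook itself uses `pd.crosstab` and boolean masks. The helper names and the toy pairs are hypothetical.

```python
from collections import Counter

FRAMES = ['(S)', '(S)-O', '(S)-O_sfx']

def crosstab(pairs):
    """Count (lexeme, frame) pairs: {lexeme: [count for each frame in FRAMES]}."""
    counts = Counter(pairs)
    lexemes = sorted({lex for lex, _ in pairs})
    return {lex: [counts[(lex, fr)] for fr in FRAMES] for lex in lexemes}

def split_by_transitivity(table):
    """Rule 1: no lexical or suffix object -> intransitive; rule 2: otherwise transitive."""
    intransitive = {lex for lex, (_, o, o_sfx) in table.items() if o == 0 and o_sfx == 0}
    transitive = set(table) - intransitive  # every lexeme lands in exactly one subset
    return intransitive, transitive

# Toy (lexeme, frame) pairs, one per clause
pairs = [('<BH[', '(S)'), ('<BH[', '(S)'), ('<BV[', '(S)'), ('<BV[', '(S)-O')]
table = crosstab(pairs)
print(table)                         # {'<BH[': [2, 0, 0], '<BV[': [1, 1, 0]}
print(split_by_transitivity(table))  # ({'<BH['}, {'<BV['})
```

Because the transitive subset is defined as the complement of the intransitive one, the two subsets always partition the lexemes.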

In [121]:
lex_frames = pd.crosstab(index=df.lex, columns=df.frame)
lex_frames.head()

| lex | (S) | (S)-O | (S)-O_sfx |
|:--|--:|--:|--:|
| `<BD[` | 27 | 101 | 44 |
| `<BH[` | 2 | 0 | 0 |
| `<BR[` | 162 | 61 | 0 |
| `<BV[` | 1 | 1 | 0 |
| `<CN[` | 2 | 0 | 0 |


In [123]:
intransitive_verbs = lex_frames[(lex_frames['(S)-O'] == 0) & (lex_frames['(S)-O_sfx'] == 0)]

In [356]:
intransitive_verbs.sort_values(by='(S)', ascending=False)[:20]

| lex | (S) | (S)-O | (S)-O_sfx |
|:--|--:|--:|--:|
| `BW>[` | 978 | 0 | 0 |
| `HLK[` | 787 | 0 | 0 |
| `JY>[` | 397 | 0 | 0 |
| `<LH[` | 352 | 0 | 0 |
| `MWT[` | 327 | 0 | 0 |
| `CWB[` | 295 | 0 | 0 |
| `JCB[` | 282 | 0 | 0 |
| `QWM[` | 260 | 0 | 0 |
| `MLK[` | 178 | 0 | 0 |
| `JRD[` | 176 | 0 | 0 |


In [125]:
transitive_verbs = lex_frames[(lex_frames['(S)-O'] > 0) | (lex_frames['(S)-O_sfx'] > 0)]

In [126]:
transitive_verbs.sort_values(by=['(S)', '(S)-O'], ascending=False)[:10]

| lex | (S) | (S)-O | (S)-O_sfx |
|:--|--:|--:|--:|
| `>MR[` | 3168 | 9 | 0 |
| `BW>[` | 978 | 1 | 0 |
| `HLK[` | 787 | 5 | 0 |
| `<FH[` | 629 | 713 | 25 |
| `JY>[` | 397 | 4 | 0 |
| `<LH[` | 352 | 2 | 0 |
| `CM<[` | 318 | 110 | 10 |
| `R>H[` | 254 | 248 | 33 |
| `NTN[` | 219 | 634 | 121 |
| `JD<[` | 216 | 61 | 14 |


We verify that all lexemes are indeed included in one of the two subsets:

In [128]:
len(lex_frames)-(len(intransitive_verbs)+len(transitive_verbs))

0

### Surprising cases

Surprisingly, a number of motion verbs do occur in transitive frames. These include BW> "arrive", HLK "walk", JY> "go out", and <LH "ascend". These verbs deserve closer attention:

In [133]:
query = '''
corpus
 clause
  phrase function=Pred|PreS
   word lex={} sp=verb vs=qal language=Hebrew
  phrase function=Objc
'''

#### 1. BW> "arrive"

In [134]:
BWO = A.search(query.format('BW>['), sets=sets)
A.show(BWO)

  2.70s 1 result


The clause has not been parsed correctly. Most likely, the object phrase is the object of the infinitive "to see" and not of "arrive", so BW> can still be treated as intransitive. Accordingly, the clause is excluded from the analysis:

In [135]:
exclude_clauses = set()

In [138]:
exclude_clauses.update([r[1] for r in BWO])

#### 2. HLK "walk"

In [141]:
HLK = A.search(query.format('HLK['), sets=sets)
A.show(HLK)

  2.72s 6 results


In all these cases, the object phrase does not denote an object affected by the predicate but modifies the verb in another way, denoting time, place, or the nature of the path walked. All the cases are excluded:

In [142]:
exclude_clauses.update([r[1] for r in HLK])

#### 3. JY> "go out"

In [145]:
JYA = A.search(query.format('JY>['), sets=sets)
A.show(JYA)

  2.32s 4 results


In all four cases, the object should probably be interpreted as a source, that is, "from the town", so the accusative is not a real object in these examples. All cases are excluded:

In [146]:
exclude_clauses.update([r[1] for r in JYA])

#### 4. <LH "go up"

In [149]:
ALH = A.search(query.format('<LH['), sets=sets)
A.show(ALH)

  2.37s 2 results


In both cases, the object phrases should rather be understood as locatives. Both cases are excluded:

In [150]:
exclude_clauses.update([r[1] for r in ALH])

The cases examined here show that the object phrase is not clearly defined in the corpus. The accusative particle (את) can mark direct objects but also other modifiers of the verb. Joüon-Muraoka distinguish between the direct accusative on the one hand (i.e. affected/effected objects) and the indirect accusative on the other (i.e. modifying the verb indirectly, sometimes denoting place and time). So far, there is no feature in the database accounting for this difference, so this ambiguity will certainly affect the results of the transitivity frame extraction.

### Updated transitivity analysis

For now, we redo the analysis while excluding the clauses addressed here:
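Excluding clauses can change the classification of a lexeme: if all of a verb's object occurrences are excluded, it falls back into the intransitive group. A toy illustration follows; the clause numbers and the helper `classify_with_exclusions` are made up for the example.

```python
from collections import defaultdict

# Hypothetical extraction result: clause node -> (lexeme, frame)
extracted = {
    101: ('HLK[', '(S)'),
    102: ('HLK[', '(S)-O'),   # object phrase actually denotes a path or time
    103: ('NTN[', '(S)-O'),
}
excluded_toy = {102}

def classify_with_exclusions(constructions, excluded=frozenset()):
    """Classify lexemes as (in)transitive, ignoring clauses listed in `excluded`."""
    has_object = defaultdict(bool)
    for clause, (lex, frame) in constructions.items():
        if clause in excluded:
            continue
        has_object[lex] |= frame != '(S)'
    return {lex: 'transitive' if obj else 'intransitive' for lex, obj in has_object.items()}

print(classify_with_exclusions(extracted))
# {'HLK[': 'transitive', 'NTN[': 'transitive'}
print(classify_with_exclusions(extracted, excluded_toy))
# {'HLK[': 'intransitive', 'NTN[': 'transitive'}
```

A lexeme that occurs only in excluded clauses disappears from the result altogether, just as it would disappear from a cross-tabulation of the filtered dataframe.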

In [155]:
df1 = df[~df.index.isin(exclude_clauses)]

In [187]:
df_transitivity = pd.crosstab(index=df1.lex, columns=df1.frame)
df_transitivity.head()

| lex | (S) | (S)-O | (S)-O_sfx |
|:--|--:|--:|--:|
| `<BD[` | 27 | 101 | 44 |
| `<BH[` | 2 | 0 | 0 |
| `<BR[` | 162 | 61 | 0 |
| `<BV[` | 1 | 1 | 0 |
| `<CN[` | 2 | 0 | 0 |


In [188]:
conditions = [
    (df_transitivity['(S)-O'] == 0) & (df_transitivity['(S)-O_sfx'] == 0)]

choices = ['intransitive']
df_transitivity['transitivity'] = np.select(conditions, choices, default='transitive')

In [195]:
df_transitivity.head()

| lex | (S) | (S)-O | (S)-O_sfx | transitivity |
|:--|--:|--:|--:|:--|
| `<BD[` | 27 | 101 | 44 | transitive |
| `<BH[` | 2 | 0 | 0 | intransitive |
| `<BR[` | 162 | 61 | 0 | transitive |
| `<BV[` | 1 | 1 | 0 | transitive |
| `<CN[` | 2 | 0 | 0 | intransitive |


## Logical structures

The logical structures are created on the basis of Aktionsart and transitivity. This section combines both features to compute simple logical structures according to the definitions provided by RRG.

Fundamentally, there are two types of logical structures, one for stative verbs and one for active verbs. Each of these can be expanded with certain modifiers, but these are not accounted for here.

Types:
* Stative: pred' (x) or (x, y)
* Active: do' (x, [pred' (x) or (x, y)])

If the verb is intransitive, only the x-slot will be filled. For transitive verbs, both the x- and y-slot will be filled.
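The two types and the slot-filling rule can be sketched as one small function that takes the Aktionsart and the argument strings directly, without Text-Fabric or the CSV data. `fill_structure` is a hypothetical helper: passing the placeholders `x` and `y` yields the generic structure, while passing actual phrases yields a specific one. The Aktionsart values and arguments in the usage lines are supplied by hand for illustration.

```python
def fill_structure(pred, aktionsart, x='x', y=None):
    """Return an RRG-style logical structure for predicate `pred`.

    `aktionsart` is 'stative' or 'active'. If `y` is None or empty (intransitive
    use), only the x-slot is filled. Returns None for any other Aktionsart.
    """
    # The state predicate is shared by both types
    state = f"{pred}' ({x}, {y})" if y else f"{pred}' ({x})"
    if aktionsart == 'stative':
        return state
    if aktionsart == 'active':
        return f"do' ({x}, [{state}])"
    return None

print(fill_structure('R>H[', 'stative', y='y'))      # R>H[' (x, y)
print(fill_structure('HLK[', 'active'))              # do' (x, [HLK[' (x)])
print(fill_structure('NTN[', 'active', 'he', 'it'))  # do' (he, [NTN[' (he, it)])
```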

First, we need to import the activity/stativity data:

In [243]:
data = pd.read_csv('active_stative_verbs_pca_evaluated.csv')
data = data[['lex', 'Aktionsart']]

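The following functions generate a general logical structure for each verb without inserting specific arguments. `getAktionsart` looks up the Aktionsart of a lexeme in the imported data, and `genericLogicalStructure` combines it with the transitivity of the lexeme: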

In [309]:
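def getAktionsart(pred):
    """Return the Aktionsart annotated for lexeme `pred`, or None if not found."""
    for row in data.itertuples(index=False):
        lex = str(row.lex)
        # Only the part after the first '_' is compared with `pred`;
        # entries without '_' are skipped
        if '_' in lex and lex.split('_', 1)[1] == pred:
            return row.Aktionsart
    return None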
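`getAktionsart` scans the whole table on every call. A possible alternative is to build a dictionary once and then look lexemes up directly. The sketch below keeps the function's matching rule (compare only the part of the lexeme after the first `_`, first match wins). Plain `(lex, Aktionsart)` tuples stand in for the rows of `data`, and the `qal_` prefixes in the toy rows are hypothetical; the real CSV may use different prefixes.

```python
def build_aktionsart_index(rows):
    """Map the part of each lexeme after its first '_' to its Aktionsart.

    Rows whose lexeme contains no '_' are skipped, and the first row wins for
    duplicate keys, mirroring the row-by-row scan.
    """
    index = {}
    for lex, aktionsart in rows:
        lex = str(lex)
        if '_' in lex:
            index.setdefault(lex.split('_', 1)[1], aktionsart)
    return index

# Hypothetical rows; the real prefixes in the CSV may differ
toy_rows = [('qal_R>H[', 'stative'), ('qal_HLK[', 'active'), ('NTN[', 'active')]
aktionsart_index = build_aktionsart_index(toy_rows)
print(aktionsart_index.get('R>H['))  # stative
print(aktionsart_index.get('NTN['))  # None (no '_' in the row's lexeme)
```

With the notebook's DataFrame, the index could be built with `build_aktionsart_index(zip(data.lex, data.Aktionsart))`.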

In [361]:
def genericLogicalStructure(pred):
    
    Aktionsart = getAktionsart(pred)

    if not Aktionsart:
        return "No Aktionsart annotated"
    
    else:
        #Transitivity
        transitivity = df_transitivity[df_transitivity.index == pred].transitivity.item()
        
        if Aktionsart == 'stative' and transitivity == 'intransitive':
            return f"{pred}' (x)"
        elif Aktionsart == 'stative' and transitivity == 'transitive':
            return f"{pred}' (x, y)"
        elif Aktionsart == 'active' and transitivity == 'intransitive':
            return f"do' (x, [{pred}' (x)])"
        elif Aktionsart == 'active' and transitivity == 'transitive':
            return f"do' (x, [{pred}' (x, y)])"
        else:
            return "No logical structure"

genericLogicalStructure('R>H[')

"R>H[' (x, y)"

The next function parses a specific clause into a specific logical structure, including its arguments:

In [362]:
def logicalStructure(clause):
    
    A.pretty(clause)
    
    # Parse the clause to get predicate, subject and object
    subj = ''
    objc = ''
    pred = ''
    stem = ''
    for ph in L.d(clause, 'phrase'):
        function = F.function.v(ph)
        if function == 'Subj':
            subj += T.text(L.d(ph, 'word')).rstrip()

        if function == 'Objc':
            objc += T.text(L.d(ph, 'word')).rstrip()

        if function in ['Pred', 'PreO', 'PreS', 'PreC']:
            for w in L.d(ph, 'word'):
                if F.sp.v(w) == 'verb':
                    pred = F.lex.v(w)
                    stem = F.vs.v(w)
    if not pred:
        return "No predicate"

    # In the stems treated as passive (nif, hof, pual), the grammatical subject
    # fills the y-slot and the object phrase, if any, fills the x-slot
    if stem not in ['nif', 'hof', 'pual']:
        x = subj
        y = objc
    else:
        x = objc
        y = subj
        
    if not x:
        x = 'nan'
    
    Aktionsart = getAktionsart(pred)

    if not Aktionsart:
        return "No Aktionsart annotated"
        
    else:
        if Aktionsart == 'stative' and not y:
            return f"{pred}' ({x})"
        elif Aktionsart == 'stative' and y:
            return f"{pred}' ({x}, {y})"
        elif Aktionsart == 'active' and not y:
            return f"do' ({x}, [{pred}' ({x})])"
        elif Aktionsart == 'active' and y:
            return f"do' ({x}, [{pred}' ({x}, {y})])"
        else:
            return "No logical structure"
    
logicalStructure(427582)

"R>H[' (nan, הַיַּבָּשָׁ֑ה)"