# Export *Aktionsart*

This notebook combines annotations created in previous notebooks for a final review and export as TF-features. The annotations in question are "stative/active" and "causative/non-causative" which divide the Hebrew verbs into four groups:

1. Stative verbs: *fear, see*
2. Active verbs: *run, eat*
3. Causative stative verbs: *give, put*
4. Causative active verbs: *bring, pour*

## Import

In [1]:
import os, sys

#Data analysis
import collections
import pandas as pd
import numpy as np

#Text-fabric
from tf.app import use

In [2]:
A = use('bhsa', hoist=globals(), locations='Aktionsart/tf')

rate limit is 60 requests per hour, with 45 left for this hour


To increase the rate,see https://annotation.github.io/text-fabric/Api/Repo/


	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in C:\Users\Ejer/text-fabric-data/annotation/app-bhsa/code:
	rv1.3=#f38d56bd757e87fe12d0c125e1ca52ee4376127b (latest release)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in C:\Users\Ejer/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6=#bac4a9f5a2bbdede96ba6caea45e762fe88f88c5 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in C:\Users\Ejer/text-fabric-data/etcbc/phono/tf/c:
	r1.2=#1ac68e976ee4a7f23eb6bb4c6f401a033d0ec169 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in C:\Users\Ejer/text-fabric-data/etcbc/parallels/tf/c:
	r1.2=#395dfe2cb69c261862fab9f0289e594a52121d5c (latest release)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used


Three datasets are imported: the stative/active dataset, the morphological causative dataset and the lexical causative dataset.

In [3]:
#Dataset path
PATH = 'datasets/'

dyna = pd.read_csv(f'{PATH}Lev17-26.Aktionsart_sta_act_final_1.csv')
dyna.columns = ['lex_stem','anno']

morph_caus = pd.read_csv(f'{PATH}Lev17-26_morphological_causitive.csv')
morph_caus.columns = ['lex_stem','anno']

lex_caus = pd.read_csv(f'{PATH}Lev17-26_lexical_causative.csv')
lex_caus.columns = ['lex_stem','anno']

Two columns, containing the lexeme and the stem respectively, are added to each dataset:

In [4]:
def getLexStem(string, trans=True):
    
    if trans:
        lex = string[:string.index('_')]
        stem = string[string.index('_')+1:]
    else:
        lex = string[string.index('_')+1:]
        stem = string[:string.index('_')]
    return lex, stem

for d in [dyna, morph_caus, lex_caus]:
    d['stem'] = [getLexStem(n)[1] for n in list(d.lex_stem)]
    d['lex'] = [getLexStem(n)[0] for n in list(d.lex_stem)]
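`getLexStem` splits a `lex_stem` key at its first underscore, and `trans` decides which half is the lexeme. Below is a minimal sketch of the same split using `str.partition`. It assumes keys shaped like those in the `annotation` dictionary further down (e.g. `'KRT[_nif'`). The name `split_lex_stem` is hypothetical:

```python
def split_lex_stem(key, trans=True):
    """Split a lexeme/stem key at its first underscore (hypothetical helper).

    trans=True  reads the key as lexeme first: 'KRT[_nif' -> ('KRT[', 'nif')
    trans=False reads the key as stem first:   'nif_KRT[' -> ('KRT[', 'nif')
    """
    head, sep, tail = key.partition('_')
    if not sep:
        # str.index in the original raises ValueError here as well
        raise ValueError(f'No underscore in key: {key!r}')
    return (head, tail) if trans else (tail, head)

print(split_lex_stem('KRT[_nif'))
print(split_lex_stem('nif_KRT[', trans=False))
```

Unlike `str.index`, `partition` does not raise when the separator is missing. The explicit check therefore preserves the original's failure behavior. Splitting once per row would also avoid calling `getLexStem` twice for every key.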

## Final review

### Preparing dataset

The annotations are combined for each verb to perform a final review:

In [5]:
anno_dict = collections.defaultdict(lambda: collections.defaultdict(lambda: collections.defaultdict()))

for row in dyna.iterrows():
    lex = row[1].lex
    stem = row[1].stem
    lex_stem = row[1].lex_stem
    
    anno_dict[lex][stem]['dyn'] = row[1].anno
    
    if lex_stem in list(morph_caus.lex_stem):
        anno_dict[lex][stem]['caus'] = morph_caus[morph_caus.lex_stem == lex_stem].anno.item()
        
    elif lex_stem in list(lex_caus.lex_stem):
        anno_dict[lex][stem]['caus'] = lex_caus[lex_caus.lex_stem == lex_stem].anno.item()
        
    else:
        anno_dict[lex][stem]['caus'] = '?'
        
#anno_dict
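The loop gives the morphological causative annotation precedence over the lexical one. It falls back to `'?'` when neither dataset covers a verb+stem combination. Below is a minimal sketch of that precedence using plain dictionaries. The function name `combine` and the toy keys are hypothetical, not taken from the datasets:

```python
def combine(dyn_anno, morph_caus, lex_caus):
    """Merge annotations per key: morphological causation first, then lexical, else '?'."""
    combined = {}
    for key, dyn in dyn_anno.items():
        if key in morph_caus:
            caus = morph_caus[key]
        elif key in lex_caus:
            caus = lex_caus[key]
        else:
            caus = '?'  # causation not yet determined
        combined[key] = {'dyn': dyn, 'caus': caus}
    return combined

# Toy data with placeholder keys (not the real annotations)
dyn_anno = {'A_hif': 'act', 'B_qal': 'sta', 'C_nif': 'act'}
morph = {'A_hif': 'caus'}
lexical = {'A_hif': 'nan', 'B_qal': 'caus'}
print(combine(dyn_anno, morph, lexical))
```

The dictionary lookups assume each key occurs only once per dataset. In the notebook, the `.item()` calls enforce this by raising an error on duplicates.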

In [6]:
anno_df = pd.concat({
        k: pd.DataFrame.from_dict(v, 'index') for k, v in anno_dict.items()
    }, 
    axis=0, sort=True)

anno_df.caus = anno_df.caus.astype('str')
anno_df[:10]

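| lex | stem | caus | dyn |
|---|---|---|---|
| <BD[ | hif | ? | act |
| <BD[ | hof | ? | act |
| <BD[ | nif | ? | act |
| <BD[ | pual | ? | act |
| <BD[ | qal |  | act |
| <BR[ | hif | caus | act |
| <BR[ | piel | ? | act |
| <BR[ | qal |  | act |
| <CQ[ | qal | caus | sta |
| <FH[ | nif |  | act |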


We create a new column by combining the aspects of dynamicity and causativity:

In [7]:
Aktionsart = []

for r in anno_df.iterrows():
    caus = r[1].caus
    dyn = r[1].dyn
    
    if dyn == 'sta' and caus != 'caus':
        Akt = 'sta'
    elif dyn == 'act' and caus != 'caus':
        Akt = 'act'
    elif dyn == 'sta' and caus == 'caus':
        Akt = 'caus sta'
    elif dyn == 'act' and caus == 'caus':
        Akt = 'caus act'
    else:
        #Any other dynamicity value is an annotation error
        raise ValueError(f'Unexpected annotation: dyn={dyn!r}, caus={caus!r}')

    Aktionsart.append(Akt)
    
anno_df['Aktionsart'] = Aktionsart
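The branches above amount to a two-by-two lookup on dynamicity and causation. Any causation value other than `'caus'` (including `'nan'` and `'?'`) counts as non-causative. Below is a table-driven sketch of the same mapping. The names `AKTIONSART` and `aktionsart` are hypothetical. Unexpected dynamicity values raise an error rather than being passed through:

```python
AKTIONSART = {
    ('sta', False): 'sta',
    ('act', False): 'act',
    ('sta', True): 'caus sta',
    ('act', True): 'caus act',
}

def aktionsart(dyn, caus):
    """Combine a dynamicity value and a causation value into one Aktionsart label."""
    key = (dyn, caus == 'caus')
    if key not in AKTIONSART:
        raise ValueError(f'Unexpected annotation: dyn={dyn!r}, caus={caus!r}')
    return AKTIONSART[key]

for dyn, caus in [('sta', 'nan'), ('act', '?'), ('sta', 'caus'), ('act', 'caus')]:
    print(dyn, caus, '->', aktionsart(dyn, caus))
```

With pandas, the function could be applied row by row, e.g. `anno_df.apply(lambda r: aktionsart(r.dyn, r.caus), axis=1)`.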

For review, we filter out cases where the causative aspect has not been determined:

In [8]:
review_df = anno_df[anno_df.caus != '?']

In [9]:
review_df.head()

| lex | stem | caus | dyn | Aktionsart |
|---|---|---|---|---|
| <BD[ | qal |  | act | act |
| <BR[ | hif | caus | act | caus act |
| <BR[ | qal |  | act | act |
| <CQ[ | qal | caus | sta | caus sta |
| <FH[ | nif |  | act | act |


In [13]:
review_df[review_df.index.get_level_values(0) == 'JTR[']

| lex | stem | caus | dyn | Aktionsart |
|---|---|---|---|---|
| JTR[ | hif |  | sta | sta |
| JTR[ | nif |  | sta | sta |


### Review

The first test checks whether all verb+stem combinations in Leviticus 17-26 are accounted for:

In [10]:
query = '''
book book=Leviticus
 chapter chapter=17|18|19|20|21|22|23|24|25|26
  clause
   phrase function=Pred|PreO|PreS|PreC|PtcO
     word pdp=verb
'''
Lev_verbs = A.search(query)

  1.87s 936 results


In [11]:
lexemes = collections.defaultdict(list)

for r in Lev_verbs:
    if (F.lex.v(r[4]), F.vs.v(r[4])) not in review_df.index:
        lexemes[(F.lex.v(r[4]), F.vs.v(r[4]))].append(r[2])

In [12]:
print(f'Number of lexemes to annotate: {len(lexemes)}')

Number of lexemes to annotate: 6
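Stripped of Text-Fabric, the coverage check is a set difference that keeps the clause nodes for later inspection. Below is a sketch with made-up `(clause, lexeme, stem)` tuples standing in for the search results. The helper `missing_pairs` and the data are hypothetical:

```python
from collections import defaultdict

def missing_pairs(occurrences, annotated):
    """Group clause nodes by (lexeme, stem) pairs that have no annotation yet."""
    missing = defaultdict(list)
    for clause, lex, stem in occurrences:
        if (lex, stem) not in annotated:
            missing[(lex, stem)].append(clause)
    return dict(missing)

# Placeholder nodes and pairs, not BHSA data
occurrences = [(1, 'A[', 'qal'), (2, 'A[', 'nif'), (3, 'B[', 'hif'), (4, 'A[', 'nif')]
annotated = {('A[', 'qal'), ('B[', 'hif')}
print(missing_pairs(occurrences, annotated))  # {('A[', 'nif'): [2, 4]}
```

The notebook tests membership directly against the pandas MultiIndex, which works the same way for `(lex, stem)` tuples.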


#### Annotating remaining lexemes

In [13]:
annotation = {}

In [14]:
lex = list(lexemes) #(lexeme, stem) pairs, indexed by the show(lex[i]) calls below

def show(lex, clause_dict=lexemes):
    clauses = clause_dict[lex]
    
    for cl in clauses:
        A.pretty(cl)

##### KRT "cut" (*Niphal*)

In [20]:
#show(lex[0])

In [16]:
annotation['KRT[_nif'] = ('caus','sta')

##### VM> "be unclean" (*Hitpael*)

In [17]:
#show(lex[1])

In [18]:
annotation['VM>[_hit'] = ('caus','sta')

##### QDC "be holy" (*Hitpael*)

In [19]:
#show(lex[2])

In [20]:
annotation['QDC[_hit'] = ('caus','sta')

##### JLD "bear" (*Niphal*)

In [21]:
#show(lex[3])

In [22]:
annotation['JLD[_nif'] = ('nan','sta')

##### HLK "walk" (*Hitpael*)

In [23]:
#show(lex[4])

In [24]:
annotation['HLK[_hit'] = ('nan','act')

##### CLX "send" (*Hiphil*)

In [25]:
#show(lex[5])

In [26]:
annotation['CLX[_hif'] = ('caus','act')

The annotations will be added to the original data later.

#### Review

In [27]:
cases = list(review_df.index)
print(f'Number of cases to review: {len(cases)}')

Number of cases to review: 231


In [26]:
n=0

In [44]:
query = '''
book book=Leviticus
 chapter chapter=17|18|19|20|21|22|23|24|25|26
  clause
    phrase function=Pred|PreO|PreS|PreC|PtcO
     word pdp=verb lex={}
'''

def show(lex_stem, df=review_df):
    lex=lex_stem[0]
    stem=lex_stem[1]
    
    print(df[(df.index.get_level_values(0) == lex) & (df.index.get_level_values(1) == stem)].Aktionsart.item())
    
    results = A.search(query.format(lex))
    clause_verbs = {r[2]:r[4] for r in results if F.vs.v(r[4]) == stem}
    
    for cl in clause_verbs:
        print(clause_verbs[cl])
        A.pretty(cl, highlights={r[4]:'gold' for r in results})
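The search template in `show` is an ordinary Python string. The lexeme is injected with `str.format`, and several lexemes could be given as alternatives separated by `|`, just as the `chapter=17|18|...` line does. Below is a small sketch of that string handling. The names `TEMPLATE` and `lexeme_query` are hypothetical. Note that `format` only substitutes where the template actually contains a `{}` slot:

```python
TEMPLATE = '''
book book=Leviticus
 chapter chapter=17|18|19|20|21|22|23|24|25|26
  clause
   phrase function=Pred|PreO|PreS|PreC|PtcO
    word pdp=verb lex={}
'''

def lexeme_query(lexemes, template=TEMPLATE):
    """Fill the lex={} slot with one or more lexemes joined by '|'."""
    if '{}' not in template:
        raise ValueError('Template has no {} slot; format() would leave it unchanged')
    return template.format('|'.join(lexemes))

print(lexeme_query(['KRT[', 'VM>[']))
print('word pdp=verb'.format('KRT['))  # unchanged: there is no slot to fill
```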

In [28]:
print(n)
show(cases[n])
n+=1

0
act
  1.18s 3 results
67871


67885


67981


#### Changes

The review above shows that some annotations need to be changed because of local factors, chiefly a lexeme being used differently in a particular context. Such cases are handled here by overriding the annotation of individual word nodes.

We create a dictionary to store the changes:

In [28]:
changes = {}

##### <BR "pass"

The verb <BR "pass" is used in a technical sense with the object *shofar*. Here it seems to indicate a simple activity (sounding the horn) rather than a caused activity:

In [29]:
#show(('<BR[','hif'))

In [30]:
changes[67309] = ('nan','act')
changes[67327] = ('nan','act')

##### JSP "add"

JSP "add" sometimes means "add" (causative stative) but sometimes simply "continue", which is more likely an activity:

In [31]:
#show(('JSP[','qal'))

In [32]:
changes[68444] = ('nan','act')

##### M<L "be unfaithful"

In this case, M<L "be unfaithful" seems to refer to a specific act, as suggested by the relative clause, which refers to concrete deeds being performed. It is therefore annotated as active here:

In [33]:
#show(('M<L[','qal'))

In [34]:
changes[68816] = ('nan','act')

##### NF> "lift"

The meaning of "lifting sin" seems to come close to "be responsible", so it is annotated as stative:

In [35]:
#show(('NF>[','hif'))

Another case is "lifting a poor man's face", which seems to be causative stative: causing the poor man's face to be exalted.

In [36]:
#show(('NF>[','qal'))

In [37]:
changes[65642] = ('caus','sta')
changes[63390] = ('nan','sta')
changes[63916] = ('nan','sta')
changes[64031] = ('caus','sta')
changes[64067] = ('nan','sta')
changes[64745] = ('nan','sta')
changes[64786] = ('nan','sta')
changes[64797] = ('nan','sta')
changes[65525] = ('nan','sta')
changes[67035] = ('nan','sta')

##### NKH "strike"

In many cases, NKH "strike" means not merely "blow" or "hit" but "strike to death". It is therefore a causative of becoming not-being. The counterexample is Lev 26:24, where the verb certainly does not imply murder.

In [38]:
#show(('NKH[','hif'))

In [39]:
changes[67062] = ('caus','sta')
changes[67069] = ('caus','sta')
changes[67109] = ('caus','sta')
changes[67113] = ('caus','sta')

##### QDC "be holy"

In one case, the verb QDC in *Piel* does not indicate causing someone to be holy but merely regarding someone as holy, so it is only stative:

In [40]:
#show(('QDC[','piel'))

In [41]:
changes[65085] = ('nan','sta')

##### QWM "arise"

In the two cases of QWM "arise" in the causative (*Hiphil*), the object is in fact not affected but rather effected. The verb should probably be translated "erect"; it is a verb of construction, where the Undergoer is effected. Therefore, it is an activity:

In [42]:
#show(('QWM[','hif'))

In [43]:
changes[68156] = ('nan','act')
changes[68292] = ('nan','act')

In another case, QWM refers to the event of standing up before the elderly (Lev 19:32):

In [44]:
changes[64300] = ('nan','act')

##### RYH= "pay off"

In most cases it seems right to treat RYH= "be acceptable" as a stative. In some cases, however, the verb seems to have the active sense of "paying off", that is, making amends for their sin:

In [45]:
#show(('RYH=[','qal'))

In [46]:
changes[68845] = ('nan','act')
changes[68882] = ('nan','act')

##### XV> "miss"

In these two cases, the verb refers to an activity:

In [47]:
#show(('XV>[','qal'))

In [48]:
changes[64160] = ('nan','act')
changes[64167] = ('nan','act')

##### NQP "go around"

In this case, the verb seems to denote a simple activity:

In [49]:
#show(('NQP[','hif'))

In [50]:
changes[64231] = ('nan','act')

## Export as TF-feature

#### Create dictionary

First, we create a dictionary containing each word node and its corresponding annotation:

In [51]:
review_df

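| lex | stem | caus | dyn | Aktionsart |
|---|---|---|---|---|
| <BD[ | qal |  | act | act |
| <BR[ | hif | caus | act | caus act |
| <BR[ | qal |  | act | act |
| <CQ[ | qal | caus | sta | caus sta |
| <FH[ | nif |  | act | act |
| ... | ... | ... | ... | ... |
| ZNH[ | qal |  | act | act |
| ZR<[ | qal |  | act | act |
| ZRH[ | piel | caus | sta | caus sta |
| ZRQ[ | qal | caus | act | caus act |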


In [52]:
#2_Kings 25:30 ends the Former Prophets; in BHSA book order the corpus continues to 2_Chronicles,
#so this loop covers Genesis to 2 Kings only (F.otype.maxSlot is the last word of the whole corpus)
last_word = L.d(T.nodeFromSection(('2_Kings',25,30)), 'word')[-1]

word_dict = collections.defaultdict()
PRED_FUNCTIONS = {'Pred','PreS','PreO','PreC','PtcO'}

for n in range(1, last_word+1):
    if F.pdp.v(n) == 'verb' and F.function.v(L.u(n, 'phrase')[0]) in PRED_FUNCTIONS:
        lex = F.lex.v(n)
        stem = F.vs.v(n)

        if (lex, stem) in review_df.index:
            #Select the single review_df row for this verb+stem combination
            mask = (review_df.index.get_level_values(0) == lex) & (review_df.index.get_level_values(1) == stem)
            caus_anno = review_df.loc[mask, 'caus'].item()
            dyn_anno = review_df.loc[mask, 'dyn'].item()

            word_dict[n] = (caus_anno, dyn_anno)
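In the loop above, each annotated word requires two boolean masks over `review_df`, which is slow across a corpus of this size. Below is a sketch of the same selection logic in which the annotations are first turned into a plain dictionary keyed by `(lex, stem)`. The word records are hypothetical stand-ins for the values read with `F.pdp`, `F.lex`, `F.vs` and the phrase function. The helper name `build_word_dict` is also hypothetical:

```python
PRED_FUNCTIONS = {'Pred', 'PreS', 'PreO', 'PreC', 'PtcO'}

def build_word_dict(words, lexicon):
    """Map predicate-verb word nodes to their (caus, dyn) annotation.

    words   -- iterable of (node, pdp, lex, stem, phrase_function)
    lexicon -- dict mapping (lex, stem) to (caus, dyn)
    """
    word_dict = {}
    for node, pdp, lex, stem, function in words:
        if pdp == 'verb' and function in PRED_FUNCTIONS and (lex, stem) in lexicon:
            word_dict[node] = lexicon[(lex, stem)]
    return word_dict

# Hypothetical sample records
lexicon = {('A[', 'qal'): ('nan', 'act'), ('B[', 'hif'): ('caus', 'sta')}
words = [
    (10, 'verb', 'A[', 'qal', 'Pred'),
    (11, 'subs', 'C/', 'NA', 'Subj'),
    (12, 'verb', 'B[', 'hif', 'PreO'),
    (13, 'verb', 'B[', 'hif', 'Attr'),  # not in a predicate phrase
]
print(build_word_dict(words, lexicon))  # {10: ('nan', 'act'), 12: ('caus', 'sta')}
```

In this notebook, such a lexicon could be built once as `dict(zip(review_df.index, zip(review_df.caus, review_df.dyn)))`.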

In [53]:
word_dict

defaultdict(None,
            {15: ('nan', 'sta'),
             33: ('nan', 'act'),
             35: ('nan', 'sta'),
             38: ('nan', 'sta'),
             41: ('nan', 'sta'),
             49: ('caus', 'sta'),
             59: ('nan', 'act'),
             69: ('nan', 'act'),
             72: ('nan', 'sta'),
             75: ('nan', 'sta'),
             80: ('nan', 'act'),
             82: ('nan', 'sta'),
             89: ('nan', 'sta'),
             90: ('caus', 'sta'),
             96: ('nan', 'act'),
             102: ('caus', 'sta'),
             123: ('nan', 'sta'),
             126: ('nan', 'act'),
             133: ('nan', 'sta'),
             136: ('nan', 'sta'),
             141: ('nan', 'act'),
             158: ('nan', 'sta'),
             161: ('nan', 'act'),
             172: ('nan', 'act'),
             175: ('nan', 'sta'),
             180: ('nan', 'act'),
             191: ('nan', 'act'),
             202: ('nan', 'sta'),
             205: ('caus', 'act'),
       

#### Update 1

The next step is to add the six remaining verb+stem combinations to the dictionary. We retrieve the word nodes for each combination and add its annotation. Because these verbs have only been examined in the context of Lev 17-26, the annotation is restricted to their occurrences there:

In [54]:
annotation

{'KRT[_nif': ('caus', 'sta'),
 'VM>[_hit': ('caus', 'sta'),
 'QDC[_hit': ('caus', 'sta'),
 'JLD[_nif': ('nan', 'sta'),
 'HLK[_hit': ('nan', 'act'),
 'CLX[_hif': ('caus', 'act')}

In [55]:
query = '''
book book=Leviticus
 chapter chapter=17|18|19|20|21|22|23|24|25|26
  clause
   phrase function=Pred|PreO|PreS|PreC|PtcO
     word pdp=verb
'''
Lev_verbs = A.search(query)

  1.64s 936 results


In [56]:
#Extract lexemes from dictionary
lexemes = [getLexStem(n)[0] for n in annotation]

#Using the previously written query to extract these particular lexemes from Lev 17-26:
results = A.search(query.format('|'.join(lexemes)))

  1.50s 936 results


The search returns all 936 verbs in Lev 17-26, so we check whether each verb matches one of the six annotated lexeme+stem combinations. If it does, the word node and the annotation are added to the dictionary:

In [57]:
print('Updates made:\n\n**************************************\n')

for r in results:
    lex = F.lex.v(r[4])
    stem = F.vs.v(r[4])
    
    lex_stem = f'{lex}_{stem}'
    
    if lex_stem in annotation:
        
        #We add the word node and annotation to the dictionary:
        word_dict[r[4]] = annotation[lex_stem]
        print(f'{r[4]} ({lex} {stem}): {annotation[lex_stem]}')
        
print('\n**************************************')

Updates made:

**************************************

63086 (KRT[ nif): ('caus', 'sta')
63202 (KRT[ nif): ('caus', 'sta')
63350 (KRT[ nif): ('caus', 'sta')
63690 (VM>[ hit): ('caus', 'sta')
63793 (KRT[ nif): ('caus', 'sta')
63818 (VM>[ hit): ('caus', 'sta')
63923 (KRT[ nif): ('caus', 'sta')
64554 (QDC[ hit): ('caus', 'sta')
64736 (KRT[ nif): ('caus', 'sta')
64767 (KRT[ nif): ('caus', 'sta')
64984 (VM>[ hit): ('caus', 'sta')
65022 (VM>[ hit): ('caus', 'sta')
65024 (VM>[ hit): ('caus', 'sta')
65160 (VM>[ hit): ('caus', 'sta')
65411 (KRT[ nif): ('caus', 'sta')
65850 (JLD[ nif): ('nan', 'sta')
66467 (KRT[ nif): ('caus', 'sta')
68317 (HLK[ hit): ('nan', 'act')
68507 (CLX[ hif): ('caus', 'act')

**************************************
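The update keys each verb by the same `LEX_stem` string used in `annotation`. Only occurrences whose lexeme *and* stem both match receive an annotation; other stems of the same lexeme are left untouched. Below is a sketch with hypothetical node numbers. The helper `apply_annotation` is also hypothetical:

```python
def apply_annotation(word_dict, occurrences, annotation):
    """Annotate (node, lex, stem) occurrences whose 'lex_stem' key is in annotation.

    Returns the list of nodes that were updated.
    """
    updated = []
    for node, lex, stem in occurrences:
        key = f'{lex}_{stem}'
        if key in annotation:
            word_dict[node] = annotation[key]
            updated.append(node)
    return updated

annotation = {'KRT[_nif': ('caus', 'sta'), 'HLK[_hit': ('nan', 'act')}
occurrences = [(1, 'KRT[', 'nif'), (2, 'KRT[', 'qal'), (3, 'HLK[', 'hit')]  # hypothetical nodes
word_dict = {}
print(apply_annotation(word_dict, occurrences, annotation))  # [1, 3]
print(word_dict)  # {1: ('caus', 'sta'), 3: ('nan', 'act')}
```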


#### Update 2

Next, we update the dictionary with the node-level changes recorded during the review:

In [58]:
changes

{67309: ('nan', 'act'),
 67327: ('nan', 'act'),
 68444: ('nan', 'act'),
 68816: ('nan', 'act'),
 65642: ('caus', 'sta'),
 63390: ('nan', 'sta'),
 63916: ('nan', 'sta'),
 64031: ('caus', 'sta'),
 64067: ('nan', 'sta'),
 64745: ('nan', 'sta'),
 64786: ('nan', 'sta'),
 64797: ('nan', 'sta'),
 65525: ('nan', 'sta'),
 67035: ('nan', 'sta'),
 67062: ('caus', 'sta'),
 67069: ('caus', 'sta'),
 67109: ('caus', 'sta'),
 67113: ('caus', 'sta'),
 65085: ('nan', 'sta'),
 68156: ('nan', 'act'),
 68292: ('nan', 'act'),
 64300: ('nan', 'act'),
 68845: ('nan', 'act'),
 68882: ('nan', 'act'),
 64160: ('nan', 'act'),
 64167: ('nan', 'act'),
 64231: ('nan', 'act')}

We update the dictionary with the changes:

In [59]:
for w in changes:
    word_dict[w] = changes[w]
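Because the changes are written after the lexeme-level annotations, a node-level change always takes precedence; `word_dict.update(changes)` would have the same effect. Below is a toy sketch that also reports which nodes actually changed value, which can serve as a sanity check. The helper name and data are hypothetical:

```python
def apply_changes(word_dict, changes):
    """Overwrite annotations with review changes; return {node: (old, new)} for real changes."""
    diff = {}
    for node, new in changes.items():
        old = word_dict.get(node)  # None if the node had no annotation yet
        if old != new:
            diff[node] = (old, new)
        word_dict[node] = new
    return diff

word_dict = {1: ('caus', 'act'), 2: ('nan', 'sta')}
changes = {1: ('nan', 'act'), 2: ('nan', 'sta'), 3: ('caus', 'sta')}
print(apply_changes(word_dict, changes))
```

An old value of `None` flags a changed node that had no annotation before, which may deserve a second look.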

Finally, we split the dictionary into two dictionaries, one for each annotation, because each TF feature is exported as its own mapping from node to value. Non-causative verbs (`'nan'`) receive no causation value:

In [60]:
caus = {}
dyna = {}

for w in word_dict:
    if word_dict[w][0] != 'nan':
        caus[w] = word_dict[w][0]
    dyna[w] = word_dict[w][1]
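Below is a sketch of the same split using dictionary comprehensions, with a quick consistency check. The sample pairs are taken from the `word_dict` output above, and the helper name `split_features` is hypothetical. Causation is stored only where it is not `'nan'`, while every annotated verb receives a dynamicity value:

```python
def split_features(word_dict):
    """Split {node: (caus, dyn)} into two node-feature dictionaries."""
    caus = {n: c for n, (c, _) in word_dict.items() if c != 'nan'}
    dyna = {n: d for n, (_, d) in word_dict.items()}
    return caus, dyna

word_dict = {15: ('nan', 'sta'), 49: ('caus', 'sta'), 205: ('caus', 'act')}
caus, dyna = split_features(word_dict)
assert set(caus) <= set(dyna)  # every causative node also has a dynamicity value
print(caus)  # {49: 'caus', 205: 'caus'}
print(dyna)  # {15: 'sta', 49: 'sta', 205: 'act'}
```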

### TF export

We can now export the *Aktionsart* annotations as TF-features. First, we set the TF version names and paths that determine where the features are stored:

In [61]:
if 'SCRIPT' not in locals():
    SCRIPT = False
    FORCE = True
    CORE_NAME = 'bhsa'
    NAME = 'Aktionsart'
    VERSION = 'c'
    CORE_MODULE = 'core'

In [62]:
repoBase = os.path.expanduser('~/text-fabric-data/etcbc')
coreTf = '{}/{}/tf/{}'.format(repoBase, CORE_NAME, VERSION) #Path of the core TF datasets
thisTf = '~Feature_sets/{}/tf/{}'.format(NAME, VERSION) #Path of the Aktionsart datasets (not passed to TF.save below)
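Unlike `repoBase`, `thisTf` is never passed through `os.path.expanduser`, so it remains a literal relative path beginning with `~Feature_sets`. Below is a sketch that builds both paths with `os.path.join`. The helper `tf_paths` is hypothetical, and placing `Feature_sets` in the home folder is an assumption about the intended location:

```python
import os

def tf_paths(core_name='bhsa', name='Aktionsart', version='c',
             repo_base='~/text-fabric-data/etcbc', feature_base='~/Feature_sets'):
    """Return (core path, module path); feature_base is an assumed location."""
    repo_base = os.path.expanduser(repo_base)
    feature_base = os.path.expanduser(feature_base)
    core_tf = os.path.join(repo_base, core_name, 'tf', version)
    this_tf = os.path.join(feature_base, name, 'tf', version)
    return core_tf, this_tf

core_tf, this_tf = tf_paths()
print(core_tf)
print(this_tf)
```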

#### Causation

In [63]:
nodeFeatures = dict(caus=caus)
metaData = dict(
    caus=dict(
        valueType='str',
        description="Causation annotations for verbs",
        coreData='BHSA',
        coreVersion=VERSION
    )
)
TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')

True

#### Dynamicity

In [64]:
nodeFeatures = dict(dyna=dyna)
metaData = dict(
    dyna=dict(
        valueType='str',
        description="Dynamicity annotations for verbs",
        coreData='BHSA',
        coreVersion=VERSION
    )
)
TF.save(nodeFeatures=nodeFeatures, metaData=metaData, module='c')

True
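The two `TF.save` calls differ only in feature name, dictionary and description. Below is a sketch that builds the `nodeFeatures` and `metaData` arguments for both features in one place. The helper `feature_args` is hypothetical and reproduces only the metadata fields used above. Saving is still left to `TF.save`:

```python
def feature_args(features, descriptions, version='c'):
    """Build nodeFeatures and metaData dicts from {name: data} and {name: description}."""
    node_features = dict(features)
    meta_data = {
        name: dict(
            valueType='str',
            description=descriptions[name],
            coreData='BHSA',
            coreVersion=version,
        )
        for name in features
    }
    return node_features, meta_data

node_features, meta_data = feature_args(
    {'caus': {49: 'caus'}, 'dyna': {15: 'sta', 49: 'sta'}},
    {'caus': 'Causation annotations for verbs', 'dyna': 'Dynamicity annotations for verbs'},
)
print(meta_data['dyna'])
```

Because `nodeFeatures` and `metaData` are dictionaries keyed by feature name, it seems reasonable to assume both features could be passed in a single call, `TF.save(nodeFeatures=node_features, metaData=meta_data, module='c')`. This is untested here.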