*best viewed in [nbviewer](https://nbviewer.jupyter.org/github/CambridgeSemiticsLab/BH_time_collocations/blob/master/results/notebooks/prepare_annotations.ipynb)*

# Annotating Adverbials with Semantic Classes
### Cody Kingham
<a href="../../docs/sponsors.md"><img height=200px width=200px align="left" src="../../docs/images/CambridgeU_BW.png"></a>

In [1]:
! echo "last updated:"; date

last updated:
Tue 28 Apr 2020 14:33:01 BST


## Building Manual Annotations

We want to annotate the semantic relationships expressed between time adverbials
and the situations they describe. Following Haspelmath (*From Space to Time*, 1997)
we note that temporal adverbials are essentially metaphorical extensions of locative
phrases. These phrases are most frequently marked with prepositions, the semantics of
which are fairly straightforward. As Van der Merwe et al. state in the case of
Biblical Hebrew (*Biblical Hebrew Reference Grammar*, 2017):

> Prepositions express primarily the spatial relationships between trajectors and landmarks. (328)

Temporal adverbials function in much the same way by metaphorically extending the concept of
space to a one-dimensional timeline. For instance, see the example and timeline below:

```
"The baby was born before her great
grandfather died." (Haspelmath 1997: 28)

         RefT: her great-grandfather died
            |
————————————————————————>
    |
   LSit: the baby was born
```

In this example, the located situation `LSit` is situated before the reference time `RefT`.
The relationship between the `LSit` and `RefT` is supplied by the preposition "before".
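
The timeline can be sketched in code as a comparison of positions on a single axis. This is only an illustration: the `Event` class, the numeric positions, and the `relation` function are hypothetical names that appear neither in Haspelmath nor in this project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A situation or reference time placed at one point on a timeline."""
    label: str
    position: float  # arbitrary units; only the ordering matters


def relation(lsit: Event, reft: Event) -> str:
    """Name the location of LSit relative to RefT."""
    if lsit.position < reft.position:
        return "anterior"      # LSit lies before RefT
    if lsit.position > reft.position:
        return "posterior"     # LSit lies after RefT
    return "simultaneous"      # LSit lies at RefT


# illustrative positions, chosen only so that the birth precedes the death
lsit = Event("the baby was born", 1.0)
reft = Event("her great-grandfather died", 3.0)
print(relation(lsit, reft))  # anterior
```

Here "before" corresponds to the anterior class in the list of semantic classes given later in this notebook.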

Another way to think about the role of a PP is that it expresses a relationship
between a "landmark" and a "target", following Fillmore ("Mini-grammars of some
time-when expressions in English", 2001). The information most often supplied by the
preposition is "direction" (ibid., 38):

```
                      distance
            |—————————————————————————|
 ----------                             ----------                          
|          |                           |          |
|  target  | <———————————————————————— | landmark |
|          |         direction         |          | 
 ----------                             ----------

```

In this conceptualization, the situation is the target and reference time the landmark. 
The preposition indeed provides directionality, but we might more accurately say, with 
Haspelmath, that a preposition provides location. For instance, in Haspelmath's
simultaneous location ("in") the metaphor is based not on movement but on static location.

As Haspelmath surveys world languages, he identifies several common categories.
Modern Hebrew is amongst the languages surveyed. Here are the semantic classes 
for Modern Hebrew with their most common prepositions:

    anterior - לפני
    posterior - אחרי
    simultaneous location - ב
    anterior durative - עד
    posterior durative - מן
    atelic extent - ø + quantified NP
    telic extent -  ב + quantified NP
    distance future - עוד
    distance past - לפני in sense of "ago"
    distance posterior - זה + quantified NP
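
As a rough sketch, the list above can be held in a lookup table. The dictionary copies the Modern Hebrew entries exactly as listed; the names `SEMANTIC_CLASSES` and `classes_for_marker` are hypothetical and are not part of this project's tooling, and nothing here encodes the Biblical Hebrew rules used later.

```python
# Semantic classes and their Modern Hebrew markers, copied from the list above.
SEMANTIC_CLASSES = {
    "anterior": "לפני",
    "posterior": "אחרי",
    "simultaneous location": "ב",
    "anterior durative": "עד",
    "posterior durative": "מן",
    "atelic extent": "ø + quantified NP",
    "telic extent": "ב + quantified NP",
    "distance future": "עוד",
    "distance past": 'לפני in sense of "ago"',
    "distance posterior": "זה + quantified NP",
}


def classes_for_marker(marker):
    """Return every class whose entry begins with the given marker."""
    return [
        cls for cls, entry in SEMANTIC_CLASSES.items()
        if entry.split()[0] == marker
    ]


print(classes_for_marker("ב"))
print(classes_for_marker("לפני"))
```

In this list, two markers each serve two classes, so the marker alone does not settle the class.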
    
In this notebook, we will export a number of automatically annotated adverbials so that 
they can be hand-checked and modified.

In addition to semantic tags for the time adverbials, we will tag two additional pieces
of information:

* the node of the primary modified element: time adverbials can modify at different
levels of scope (Klein, *Time in Language*, 1994), and do not always modify the verb. This
data will be a stretch of slots (words in [BHSA](https://github.com/etcbc/bhsa)) that expresses
the modified situation.
* the lexical aspect of the verb, using modified Vendler categories from Croft, *Verbs*, 2012, 44.
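
One possible shape for a single annotation row, combining the semantic tag with the two extra pieces of information, is sketched below. The class name, field names, and example values are all assumptions made for illustration; the actual export format may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AdverbialAnnotation:
    """One hand-checkable annotation row (hypothetical layout)."""
    node: int                         # node number of the time adverbial
    semantic_class: str               # e.g. "simultaneous location"
    modified_slots: Tuple[int, ...]   # slot numbers of the modified situation
    aspect: Optional[str] = None      # lexical aspect label, empty until annotated

    def slots_are_contiguous(self) -> bool:
        """Check that the modified element is an unbroken stretch of slots."""
        s = self.modified_slots
        return all(b - a == 1 for a, b in zip(s, s[1:]))


# node and slot numbers here are made up for illustration
example = AdverbialAnnotation(
    node=1,
    semantic_class="simultaneous location",
    modified_slots=(3, 4, 5),
)
print(example.slots_are_contiguous())  # True
```

Leaving `aspect` empty by default mirrors the workflow described here, in which automatic tags are produced first and then corrected by hand.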

<hr>

# Python

Now we import the modules and data needed for the analysis.

In [46]:
# standard & data science packages
import collections
import pandas as pd
pd.set_option('display.max_rows', 100) # full option names avoid ambiguous pattern matches
pd.set_option('display.max_colwidth', 100)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams['font.serif'] = ['SBL Biblit']
import seaborn as sns
from bidi.algorithm import get_display # bi-directional text support for plotting
from paths import main_table, figs

# custom packages (see /tools)
from tf_tools.load import load_tf
from stats.significance import contingency_table, apply_fishers

# launch Text-Fabric with custom data
TF, API, A = load_tf(silent='deep')
A.displaySetup(condenseType='phrase')
F, E, T, L = A.api.F, A.api.E, A.api.T, A.api.L # corpus analysis methods

# load and set up project dataset
times_full = pd.read_csv(main_table, sep='\t')
times_full.set_index(['node'], inplace=True)
times = times_full[~times_full.classi.str.contains('component')] # select singles



In [47]:
times.head()

node,ref,book,ph_type,text,token,clause,classi,time,time_etcbc,time_pos,...,qual_str,demonstrative,demon_str,demon_dist,ordinal,ord_str,cl_kind,verb,tense,verb_lex
1446800,Gen 1:1,Genesis,prep_ph,בְּרֵאשִׁ֖ית,ב.ראשׁית,בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃,single.prep.bare.øanchor,ראשׁית,R>CJT/,subs,...,,False,,,False,,VC,True,qtl,ברא
1446801,Gen 2:2,Genesis,prep_ph,בַּיֹּ֣ום הַשְּׁבִיעִ֔י,ב.ה.יום.ה.שׁביעי,וַיְכַ֤ל אֱלֹהִים֙ בַּיֹּ֣ום הַשְּׁבִיעִ֔י מְלַאכְתֹּ֖ו,single.prep.definite.def_apposition.ordinal,יום,JWM/,subs,...,,False,,,True,שׁביעי,VC,True,wyqtl,כלה
1446802,Gen 2:2,Genesis,prep_ph,בַּיֹּ֣ום הַשְּׁבִיעִ֔י,ב.ה.יום.ה.שׁביעי,וַיִּשְׁבֹּת֙ בַּיֹּ֣ום הַשְּׁבִיעִ֔י מִכָּל־מְלַאכְתֹּ֖ו,single.prep.definite.def_apposition.ordinal,יום,JWM/,subs,...,,False,,,True,שׁביעי,VC,True,wyqtl,שׁבת
1446803,Gen 2:5,Genesis,prep,טֶ֚רֶם,טרם,וְכֹ֣ל׀ שִׂ֣יחַ הַשָּׂדֶ֗ה טֶ֚רֶם יִֽהְיֶ֣ה בָאָ֔רֶץ,single.bare.øanchor,טרם,VRM/,subs,...,,False,,,False,,VC,True,yqtl,היה
1446804,Gen 2:5,Genesis,prep,טֶ֣רֶם,טרם,וְכָל־עֵ֥שֶׂב הַשָּׂדֶ֖ה טֶ֣רֶם יִצְמָ֑ח,single.bare.øanchor,טרם,VRM/,subs,...,,False,,,False,,VC,True,yqtl,צמח


# Generic Overview

First, let's get re-acquainted with the general makeup of the dataset.

In [4]:
time_surfaces = pd.DataFrame(times['token'].value_counts())
time_surfaces.head(50)

token,count
עוד,344
עתה,340
ב.ה.יום.ה.הוא,201
ה.יום,191
אז,117
ל.עולם,99
ב.ה.בקר,78
כל.ה.יום,76
אחר,67
עד.ה.יום.ה.זה,65


## Generating Automatic Annotations for Biblical Hebrew

We generate automatic annotations to lessen the workload of annotating and to handle
repetitive cases in a single pass. These annotations are all tentative and subject to human
correction and adjustment.

In order to formulate a standard, I want to practice with a few key cases that we've
already seen in the dataset above. Here's a diverse group of common adverbials selected
from the above counts.

    ב.ה.יום.ה.הוא
    ב.ה.עת.ה.היא
    עד.ה.יום.ה.זה
    שׁבע.יום
    
I will compile a timeline graph for each of these adverbials, following Haspelmath's
layout. I then want to distill each graph into a set of grouped and ordered tags stored as
tuples. These tags will be used for the annotations.
    
In each diagram, `RefT` refers to the time contained in the adverbial, typically its semantic
head. `LSit` represents a situation (typically a verb or a state) that is located
relative to `RefT`. That relationship is expressed most often
by a preposition. A `QSit` represents a quantified situation, i.e. one that extends
over a quantified amount of time, most often with a time unit like a day/month/year.

We append a `_b` for bounded (punctual) or `_d` for durative to both `RefT` and `LSit`,
so that it is possible to have any combination of a durative/punctual `RefT` and 
durative/punctual `LSit`.
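
The four possible suffix combinations can be enumerated as tuples. This is a minimal sketch: the `RefT_b`-style spelling follows the suffix convention just described, while `BOUNDEDNESS` and `tag_pairs` are hypothetical names.

```python
from itertools import product

BOUNDEDNESS = ("b", "d")  # b = bounded (punctual), d = durative


def tag_pairs():
    """List every (RefT, LSit) suffix combination as a tuple of tags."""
    return [
        (f"RefT_{r}", f"LSit_{l}")
        for r, l in product(BOUNDEDNESS, repeat=2)
    ]


for pair in tag_pairs():
    print(pair)
# ('RefT_b', 'LSit_b')
# ('RefT_b', 'LSit_d')
# ('RefT_d', 'LSit_b')
# ('RefT_d', 'LSit_d')
```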