# Tense, Aspect, and Time in Hebrew Psalms
Goal: Explore how tense, aspect, and time markers relate in biblical Hebrew poetry by examining how the time markers in a clause correlate with its verb forms.<br>
<br>
What is needed to **build** the data?<br>
1. **collect and categorise time markers into kinds of time or aspect indicators** - This is the most difficult step, as seen below
2. gather all clauses that contain a time or aspect indicator
3. segment clause sections until:
    * a new time indicator occurs OR
    * the inheritance line is broken by a higher connection
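
Step 3 can be tried out on toy data before touching the corpus. The sketch below assumes a hypothetical `Clause` record with a `time_marker` field and a boolean `breaks_inheritance` flag standing in for "a higher connection"; neither is an ETCBC4c feature, and how to detect a broken inheritance line in the real data is still an open question.

```python
from typing import NamedTuple, Optional

class Clause(NamedTuple):
    """Hypothetical stand-in for a clause node; not an ETCBC4c structure."""
    node: int
    time_marker: Optional[str] = None   # e.g. 'L <WLM', or None
    breaks_inheritance: bool = False    # assumed flag for a higher connection

def segment(clauses):
    """Group clauses into sections that share one time indicator.

    A section opens at a clause with a time marker and runs until the next
    time marker or until a clause breaks the inheritance line.
    Clauses outside any section are skipped.
    """
    sections = []
    current = None
    for clause in clauses:
        if clause.time_marker is not None:
            # a new time indicator closes the open section and opens a new one
            if current:
                sections.append(current)
            current = [clause]
        elif clause.breaks_inheritance:
            # a higher connection ends the section; this clause is left out
            if current:
                sections.append(current)
            current = None
        elif current is not None:
            current.append(clause)
    if current:
        sections.append(current)
    return sections

toy = [
    Clause(1, time_marker='B JWM'),
    Clause(2),
    Clause(3, breaks_inheritance=True),
    Clause(4),
    Clause(5, time_marker='L <WLM'),
    Clause(6),
]
sections = segment(toy)
print([[c.node for c in s] for s in sections])
```

Clauses 3 and 4 fall outside every section here: whether such clauses should inherit nothing, or something from further up, is exactly the kind of decision the real segmentation will have to make.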

## Discussion of the three needs

### Categorising time markers
What kinds of features in the ETCBC4c data are relevant for building the data? What other kinds of parameters can I think of? Time indicators should include not only time-oriented lexemes but also syntactic constructions.<br><br>
Relevant features:
* `function`
    * = `'Time'`
* `pdp`
    * = `'advb'` (adverb)
    * = `'prep'` (preposition), esp. temporal expressions; how to determine these?
    * = `'adjv'` (adjective) ?, i.e. in verbless statements, or as a modifier of nouns? - explore

Other features, to be explored with custom searches, e.g. for patterns such as יום ביום:
* `nametype` ? - explore
    * = `'mens'` "measurement units" ?
* `ls`
    * = `'nmdi'` - distributive noun
    * = `'padv'` - potential adverb
    * = `'ordn'` - ordinal numbers
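
One way to keep these candidate criteria checkable is to collect them in a single predicate. The sketch below works on a plain dict of feature values, a hypothetical stand-in for `F.<feature>.v(node)` lookups, and it assumes that adverbs are stored under the abbreviated `pdp` value `'advb'`. It only flags candidates; prepositions and adjectives are left out because no criterion for them exists yet.

```python
# feature values that mark a node as a *candidate* time/aspect indicator
# (a working list taken from the bullets above, not a settled classification)
CANDIDATE_VALUES = {
    'function': {'Time'},
    'pdp': {'advb'},                  # assumed stored value for adverbs
    'nametype': {'mens'},
    'ls': {'nmdi', 'padv', 'ordn'},
}

def candidate_reasons(features):
    """Return the (feature, value) pairs that make a node a candidate.

    `features` is a plain dict such as {'pdp': 'advb', 'ls': 'padv'};
    in the notebook it would be filled from F.<feature>.v(node).
    """
    return [
        (name, value)
        for name, value in sorted(features.items())
        if value in CANDIDATE_VALUES.get(name, set())
    ]

print(candidate_reasons({'pdp': 'subs', 'ls': 'padv'}))
print(candidate_reasons({'pdp': 'verb', 'ls': 'none'}))
```

Returning the matching pairs rather than a bare boolean makes it possible to see later which criterion pulled a given word or phrase into the inventory.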

Let's perform some cursory searches to inventory what is present in the features already...

In [1]:
from tf.fabric import Fabric

TF = Fabric(modules='Hebrew/etcbc4c')

This is Text-Fabric 2.0.0
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data/features/hebrew/etcbc4c/0_overview.html
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
106 features found and 0 ignored


In [2]:
api = TF.load("""
otype book chapter verse
function pdp nametype ls
vt lex g_cons g_cons_utf8 gloss
""")

  0.00s loading features ...
   |     0.05s B otype                from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.01s B book                 from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.01s B chapter              from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.01s B verse                from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.29s B g_cons               from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.29s B g_cons_utf8          from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.10s B function             from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.20s B pdp                  from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.01s B nametype             from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.27s B ls                   from /Users/Cody/github/text-fabric-data/Hebrew/etcbc4c
   |     0.20s B vt            

In [3]:
api.makeAvailableIn(globals())

In [4]:
# set corpus to Psalms
corpus = 'Psalmi'

In [5]:
import collections as col

# make an inventory of phrases with time function
# the dict's keys are the consonantal, transliterated texts of the phrase
# clauses are stored in set for printing passage text
function_time = col.defaultdict(set)
function_time_occ = 0

# loop through all phrase nodes; keep only phrases in the corpus
for phrase in F.otype.s('phrase'):
    book = L.u(phrase, otype='book')[0]     # look up the embedding book node
    book = F.book.v(book)                   # convert book node number to book string
    if book != corpus:                      # check book string against corpus name
        continue                            # skip to the next node if not in corpus
    if F.function.v(phrase) == 'Time':
        # gather the consonantal text of the words in the phrase
        words = [F.g_cons.v(word) for word in L.d(phrase, otype='word')]
        time_tag = ' '.join(words)              # compile into string to function as dict key in inventory
        clause = L.u(phrase, otype='clause')[0] # look up the embedding clause node
        function_time[time_tag].add(clause)     # store it
        function_time_occ += 1                  # up the occurrence count
         
# Let's look at the results...
# print every distinct phrase with its number of clauses, most frequent first
print('{} function_time phrases in the Psalms'.format(len(function_time)))
print('{} occurrences of function_time phrases in the Psalms\n'.format(function_time_occ))
counts = [(time_tag, len(clauses)) for time_tag, clauses in function_time.items()]
for time_tag, occurrence in sorted(counts, key=lambda k: -k[1]):
    print('{}    {:>15}'.format(occurrence, time_tag))

101 function_time phrases in the Psalms
271 occurrences of function_time phrases in the Psalms

39             L <WLM
25           KL H JWM
11               TMJD
11              L NYX
7              B JWM
6          L DR W DR
6               L <D
6             B  BQR
5               JWMM
5            B  LJLH
5    M <TH W <D <WLM
5             <D >NH
5        L <WLM W <D
5               <WLM
5            B KL <T
4             <DJ <D
4        JWMM W LJLH
3              H JWM
3           B KL JWM
3            <D <WLM
3             L  <RB
3          <WLM W <D
3               LJLH
3                <TH
3             <D MTJ
2          B JWM YRH
2             L  BQR
2          B JWM R<H
2         B JWM YRTJ
2                NYX
2                QDM
2            JWM JWM
2             B <WDJ
2         KL JMJ XJJ
2                BQR
2              L <LM
2        L <D L <WLM
2         L >RK JMJM
2                 >Z
2           B  LJLWT
2                JWM
2              B XJJ
1         B KL JMJ

In [22]:
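# find the Time phrase under node 489272 (hard-coded; presumably a clause)
phrase_node = None

for phrase in L.d(489272, otype='phrase'):
    if F.function.v(phrase) == 'Time':
        phrase_node = phrase    # keeps the last Time phrase if there are several

# list the parts of speech of the words in that phrase
[F.pdp.v(word) for word in L.d(phrase_node, otype='word')]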

['prep', 'subs']

Methodological problem: <br>
By what methodology can tense or aspect be linked to a given lexeme? What is considered a past, present, or future tense lemma? And does it always function as such? If other contextual elements contribute to the sense, what elements?
<br><br>
ל (L) &nbsp;&nbsp;-&nbsp;&nbsp; sometimes seems to convey duration when followed by a time noun

* See for example L <WLM, L NYX, L <D, L  <RB

ב (B) &nbsp;&nbsp;-&nbsp;&nbsp; conveys specific times

* B JWM, B  BQR, B  LJLH

But B can also convey more "durative" (?) stretches of time when paired with a modifier like KL:

* B KL <T, B KL JWM

These provide some basic working theories, but they are still based on intuition. Is there a way to establish the preposition's role in time phrases more objectively?<br>
Ideas:

* examine the prepositions' roles in simpler, narrative texts that might clarify them
* other ways?...
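
A first, purely descriptive check of these working theories is to tally the inventory by its opening pattern. The sketch below copies a handful of counts from the inventory output above and groups them by leading preposition, with and without KL. The pattern labels are ad hoc (the second word is simply assumed to be a time noun), and the grouping says nothing yet about whether the "durative" and "specific time" readings are right.

```python
from collections import Counter

# counts copied from the function_time inventory printed above
inventory = {
    'L <WLM': 39, 'L NYX': 11, 'L <D': 6, 'L  <RB': 3,
    'B JWM': 7, 'B  BQR': 6, 'B  LJLH': 5,
    'B KL <T': 5, 'B KL JWM': 3,
    'TMJD': 11, 'KL H JWM': 25,
}

def opening_pattern(time_tag):
    """Label a transliterated time phrase by its opening words (ad hoc labels)."""
    words = time_tag.split()          # split() also absorbs the double spaces
    if len(words) > 1 and words[0] in ('L', 'B'):
        if words[1] == 'KL':
            return words[0] + ' + KL + noun'
        return words[0] + ' + noun'
    return 'other'

totals = Counter()
for time_tag, count in inventory.items():
    totals[opening_pattern(time_tag)] += count

for pattern, count in totals.most_common():
    print('{:>3}  {}'.format(count, pattern))
```

Run over the full `function_time` dict instead of the copied sample, the same tally would show how much of the inventory the `preposition + time_noun` pattern actually covers.
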
<br><br>
The common pattern, at least in these examples, seems to be:<br> &nbsp;&nbsp;&nbsp;&nbsp;`preposition + time_noun`<br>
The time noun can also take a modifier like KL.<br><br>**To do later: inventory phrase constituents in function_time phrases.**<br>
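
The constituent inventory could look roughly like this. It is a minimal sketch: the function takes the word lookup and the part-of-speech lookup as arguments so that it can be tried without the corpus loaded, and the node numbers and tags in the toy data are hypothetical.

```python
from collections import Counter

def constituent_patterns(phrases, words_of, pdp_of):
    """Count part-of-speech sequences over a collection of phrase nodes.

    words_of(phrase) -> iterable of word nodes
    pdp_of(word)     -> part-of-speech string
    Both are passed in so the function can be tried without the corpus loaded.
    """
    patterns = Counter()
    for phrase in phrases:
        pattern = ' + '.join(pdp_of(word) for word in words_of(phrase))
        patterns[pattern] += 1
    return patterns

# toy stand-ins for the corpus lookups (hypothetical node numbers and tags)
toy_words = {101: (1, 2), 102: (3, 4), 103: (5,)}
toy_pdp = {1: 'prep', 2: 'subs', 3: 'prep', 4: 'subs', 5: 'advb'}

patterns = constituent_patterns(toy_words, toy_words.get, toy_pdp.get)
for pattern, count in patterns.most_common():
    print(count, pattern)
```

In the notebook itself the call would presumably be `constituent_patterns(time_phrases, lambda p: L.d(p, otype='word'), F.pdp.v)`, where `time_phrases` is a hypothetical list of the phrase nodes whose `function` is `'Time'`.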



In [121]:
# side query:
# are there any chapters that occur without function_time phrases in Psalms?

def time_phrase_ch(chapter_node):
    """
    inventory phrase-functions for a given chapter node
    return boolean for presence or absence of function_time phrase
    """
    # collect all the phrase functions in the chapter
    phrase_functions = set(F.function.v(pf) for pf in L.d(chapter_node, otype='phrase'))
    return 'Time' in phrase_functions
    
def percent(occ, total):
    """
    calculate and return rounded percentages for result printing
    """
    raw_percent = (occ/total) * 100
    round_off_2 = round(raw_percent, 2)
    return round_off_2
    
def give_tp_chs(corp):
    """
    display the chapters without time_phrases, plus stats, for a given corpus
    """
    total_none = 0
    total_chs = 0
    print('CHAPTERS WITHOUT TIME_PHRASES in {}:\n'.format(corp))
    for cn in F.otype.s('chapter'):
        if F.book.v(cn) != corp:         # skip chapters outside the corpus
            continue
        total_chs += 1
        if not time_phrase_ch(cn):
            print(F.chapter.v(cn), end=', ')
            total_none += 1
    print('\n\n{}/{} ({}%) chapters of {} contain no time_phrases\n\n'.format(
        total_none, total_chs, percent(total_none, total_chs), corp))
    
give_tp_chs(corpus)

CHAPTERS WITHOUT TIME_PHRASES in Psalmi:

3, 4, 8, 11, 14, 24, 26, 28, 29, 36, 39, 40, 43, 46, 47, 51, 53, 54, 57, 58, 60, 63, 64, 65, 67, 70, 76, 84, 87, 94, 97, 98, 99, 100, 107, 108, 114, 116, 117, 118, 120, 122, 123, 124, 126, 127, 129, 130, 133, 135, 136, 137, 139, 141, 142, 144, 147, 149, 150, 

59/150 (39.33%) chapters of Psalmi contain no time_phrases




This highlights the necessity of identifying other kinds of tense/aspect (TA) indicators. Also, what if a text contains no identifiable external time/aspect indicators? What role does TA play then, if any? Does the verb contribute to TA in this case, or are the verb's functions purely atemporal and discourse-oriented?<br><br>
How does this compare with a narrative text like Genesis?

In [122]:
give_tp_chs('Genesis')

CHAPTERS WITHOUT TIME_PHRASES in Genesis:

36, 

1/50 (2.0%) chapters of Genesis contain no time_phrases
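
Whether the gap between the two books is more than noise can be checked roughly with a two-proportion z-test on the chapter counts printed above (59 of 150 for Psalmi, 1 of 50 for Genesis). This is a swapped-in, back-of-the-envelope technique rather than part of the analysis so far, and it treats chapters as comparable, independent units, which ignores the difference in chapter length between the two books.

```python
import math

def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Return (z, two-sided p) for the difference between two proportions."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    # pooled proportion under the assumption that both books behave alike
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value

# chapters without a time_phrase: Psalmi 59/150, Genesis 1/50
z, p_value = two_proportion_z(59, 150, 1, 50)
print('z = {:.2f}, two-sided p = {:.2g}'.format(z, p_value))
```

A more careful comparison would normalise by the number of clauses or words per chapter instead of counting chapters.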




A striking difference! Far more chapters in the Psalms than in Genesis lack an explicit time_phrase (leaving aside, for the time being, the difference in size between the corpora). What is Gen. 36?

In [136]:
# print Hebrew text for Genesis 36
gen_36 = T.nodeFromSection(('Genesis', 36)) # grab the chapter node
for verse in L.d(gen_36, otype = 'verse'):  # loop through verse nodes with a look down into chapter
    words = [w for w in L.d(verse, 'word')] # gather word nodes
    print(F.verse.v(verse), T.text(words))  # feed to T.text and print with vs. numbers

1 וְאֵ֛לֶּה תֹּלְדֹ֥ות עֵשָׂ֖ו ה֥וּא אֱדֹֽום׃ 
2 עֵשָׂ֛ו לָקַ֥ח אֶת־נָשָׁ֖יו מִבְּנֹ֣ות כְּנָ֑עַן אֶת־עָדָ֗ה בַּת־אֵילֹון֙ הַֽחִתִּ֔י וְאֶת־אָהֳלִֽיבָמָה֙ בַּת־עֲנָ֔ה בַּת־צִבְעֹ֖ון הַֽחִוִּֽי׃ 
3 וְאֶת־בָּשְׂמַ֥ת בַּת־יִשְׁמָעֵ֖אל אֲחֹ֥ות נְבָיֹֽות׃ 
4 וַתֵּ֧לֶד עָדָ֛ה לְעֵשָׂ֖ו אֶת־אֱלִיפָ֑ז וּבָ֣שְׂמַ֔ת יָלְדָ֖ה אֶת־רְעוּאֵֽל׃ 
5 וְאָהֳלִֽיבָמָה֙ יָֽלְדָ֔ה אֶת־יְע֥וּשׁ וְאֶת־יַעְלָ֖ם וְאֶת־קֹ֑רַח אֵ֚לֶּה בְּנֵ֣י עֵשָׂ֔ו אֲשֶׁ֥ר יֻלְּדוּ־לֹ֖ו בְּאֶ֥רֶץ כְּנָֽעַן׃ 
6 וַיִּקַּ֣ח עֵשָׂ֡ו אֶת־נָ֠שָׁיו וְאֶת־בָּנָ֣יו וְאֶת־בְּנֹתָיו֮ וְאֶת־כָּל־נַפְשֹׁ֣ות בֵּיתֹו֒ וְאֶת־מִקְנֵ֣הוּ וְאֶת־כָּל־בְּהֶמְתֹּ֗ו וְאֵת֙ כָּל־קִנְיָנֹ֔ו אֲשֶׁ֥ר רָכַ֖שׁ בְּאֶ֣רֶץ כְּנָ֑עַן וַיֵּ֣לֶךְ אֶל־אֶ֔רֶץ מִפְּנֵ֖י יַעֲקֹ֥ב אָחִֽיו׃ 
7 כִּֽי־הָיָ֧ה רְכוּשָׁ֛ם רָ֖ב מִשֶּׁ֣בֶת יַחְדָּ֑ו וְלֹ֨א יָֽכְלָ֜ה אֶ֤רֶץ מְגֽוּרֵיהֶם֙ לָשֵׂ֣את אֹתָ֔ם מִפְּנֵ֖י מִקְנֵיהֶֽם׃ 
8 וַיֵּ֤שֶׁב עֵשָׂו֙ בְּהַ֣ר שֵׂעִ֔יר עֵשָׂ֖ו ה֥וּא אֱדֹֽום׃ 
9 וְאֵ֛לֶּה תֹּלְדֹ֥ות עֵשָׂ֖ו אֲבִ֣י אֱדֹ֑ום בְּהַ֖ר שֵׂעִֽיר׃ 
10 אֵ֖לֶּה שְׁמֹ֣ות בְּנֵ

So we have a genealogy text without a time_phrase. Makes sense!

In [137]:
# explore the different kinds of adverbs, what parameters can distinguish time?
# explore the different kinds of prep phrases, what parameters can specify

pdp_adverb = set()
prep_temp = set()