# לעולם Time Spans

At the clause level, the time marker לעולם appears alongside the *yiqtol* verb in 60% of all cases (46/76). The next most frequent form is the *qatal* at 12% (9/76), followed by the *weqetal* at 7% (5/76). Infinitive constructs are also well represented at 8%, but these are probably not primary verb forms.
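The shares above follow directly from the raw counts. The sketch below recomputes them to one decimal place, using only the three counts stated in the paragraph; the count behind the infinitive construct figure is not given, so it is left out. The dictionary keys are illustrative labels, not feature values from the dataset.

```python
# Counts as reported in the paragraph above (76 clauses carry the marker).
total = 76
counts = {'yiqtol': 46, 'qatal': 9, 'weqetal': 5}

# Share of each verb form, as a percentage with one decimal place.
percents = {form: round(n / total * 100, 1) for form, n in counts.items()}

for form, pct in percents.items():
    print(f'{form}: {pct}% ({counts[form]}/{total})')
```

At one decimal the *yiqtol* share comes out at 60.5%, so the 60% quoted above is a rounded-down figure.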


## Research Question on לעולם Time Spans

Do the tendencies at the clause level of the time indicator לעולם also appear within daughter clauses of a לעולם-marked clause? What kind of statistics appear within discourse structures modified with this time indicator?

The question can be asked of clauses that pair the time marker with a *yiqtol* (together with their daughter clauses), and also of clauses that fall outside of that tendency. In other words, do clauses with a *qatal* + לעולם reflect their own kind of verbal tendencies, or is the verbal tense shaped more by the tense of the mother clause?
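One way to put the second question into practice is to tally daughter-clause tenses separately for each mother-clause tense. The sketch below shows only the bookkeeping: the `pairs` list is made-up toy data, not corpus results, and the tense labels are assumed to follow the ETCBC `vt` values (`impf` for *yiqtol*, `perf` for *qatal*).

```python
import collections

# Hypothetical (mother tense, daughter tense) pairs; toy data only.
pairs = [
    ('impf', 'impf'),
    ('impf', 'perf'),
    ('perf', 'perf'),
    ('perf', 'impf'),
    ('impf', 'impf'),
]

# One Counter of daughter tenses per mother tense.
by_mother = collections.defaultdict(collections.Counter)
for mother_tense, daughter_tense in pairs:
    by_mother[mother_tense][daughter_tense] += 1

for mother_tense, daughter_counts in by_mother.items():
    print(mother_tense, dict(daughter_counts))
```

If the daughter distributions look alike across mother tenses, the marker is the more likely influence; if they track the mother tense, the mother clause is.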

In [69]:
# load modules
import pickle, collections
import pandas as pd
from pprint import pprint
from IPython.display import display, HTML

# load TF module and data
from tf.fabric import Fabric
TF = Fabric(modules='hebrew/etcbc4c', silent=True)
api = TF.load('''book chapter verse
                 pdp vt lex typ function tab
                 mother
              ''')

api.makeAvailableIn(globals())

  0.00s loading features ...
   |     0.01s B book                 from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s B chapter              from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.01s B verse                from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.13s B pdp                  from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.12s B vt                   from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.18s B lex                  from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.34s B typ                  from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.08s B function             from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.03s B tab                  from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     1.51s B mother               from /Users/Cody/github/text-fabric-data/hebrew/etcbc4c
   |     0.00s Feature overview

In [22]:
# import time markers data
tm_data_file = 'data/time_markers.pickle'

# load data
with open(tm_data_file, 'rb') as infile:
    tm_data = pickle.load(infile)

print('data available: ', ', '.join(tm_data.keys()))

data available:  markers, top_markers, stats_rows, preposition_cl_lists


In [23]:
# assign the data
markers = tm_data['markers']
top_markers = tm_data['top_markers']
stats_rows = tm_data['stats_rows']

In [24]:
print('data per time marker:')
markers['L <WLM'].keys()

data per time marker:


dict_keys(['count', 'clauses', 'tense_cl_lists', 'tense_counts', 'tense_percents', 'example_phrase'])

In [25]:
# import custom function for weqetal detection
from functions.verbs import is_weqt

## Gather Clause Chains

The following functions are gathered and modified from a previous notebook on time spans. 

In [35]:
def getTime(clause_node):
    '''
    Return the phrase nodes of a time marker if there is one.
    
    The ETCBC stores time phrases as a phrase function feature.
    However, other markers of time are not as obvious.
    This function also checks for less obvious markers.
    '''
    
    # ETCBC time-phrase functions
    time_markers = list(phrase for phrase in L.d(clause_node, otype='phrase')
                    if F.function.v(phrase) == 'Time')
    
    # TO DO: add...
    # substantives with a time sense
    # "when" preposition (ב + infinitive)
    
    return time_markers

def climbClauseTree(root_clause, span, coverage):
    '''
    Modify a given span list and coverage set recursively
    by climbing a tree (mother_daughters), beginning with root clause.
    
    With mother/daughter clause relations, we have 
    something like a syntax tree, but on a text level.
    This function takes a clause with an initial time phrase and
    recursively iterates through its descendants to gather them.
    
    We add descendants to the span if they don't have 
    an intervening time indicator. They also go into
    the coverage set, so that we don't double-cover 
    them in a later loop.
    '''    
    for daughter in E.mother.t(root_clause):
        
        # stop if intervening time marker
        if getTime(daughter):
            break

        # add to span and coverage
        span.append(daughter)
        coverage.add(daughter)
        
        # move down the tree with recursive call
        climbClauseTree(daughter, span, coverage)
        
def getSpanData(clause_list):
    '''
    Return a dictionary containing data on a supplied span.
    A span is a list of clause nodes; a time span has
    a time marker in its first clause (the root).
    Timeless spans do not begin with a time marker.
    '''
    
    # count tenses
    tenses = collections.Counter()
    
    for clause in clause_list:
        
        verb = [word for word in L.d(clause, otype='word')
                   if F.pdp.v(word) == 'verb'
               ]
        
        # skip verbless clauses (for now!)
        if not verb:
            continue
            
        else:
            verb = verb[0]
            
        tense = F.vt.v(verb)
        
        tenses[tense] += 1
        tenses['total'] += 1
    
    span_data = {'span_clause_atoms': clause_list,
                 'span_tense_count' : tenses}
    
    return span_data
        
def buildTimeSpans(corpus_clauses):
    '''
    Gather the time spans by calling climbClauseTree while
    looping over all clauses in the corpus.
    
    Return the time spans as a list of dictionaries, in corpus order;
    each dictionary holds the data for one span (see getSpanData).
    
    Skip clauses already in the coverage set.
    '''
    time_spans = list()
    coverage = set()
    
    # add timeless spans here:
    timeless_span = list() 
    
    for clause in corpus_clauses:
        
        # build spans for clauses not yet visited that have time marker
        if clause not in coverage and getTime(clause):
            
            # save/reset timeless_span
            if timeless_span:
                time_spans.append(getSpanData(timeless_span))
                timeless_span = list()
            
            # calculate timespan
            span = []
            span.append(clause) # add root clause
            climbClauseTree(clause, span, coverage) # build the rest
            time_spans.append(getSpanData(span))
            
        elif clause not in coverage and not getTime(clause):
            # add to timeless span
            timeless_span.append(clause)
    
    return time_spans
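
The recursive climb can be tried out without loading Text-Fabric. The sketch below replaces `E.mother.t` and `getTime` with two plain dictionaries; the node numbers, `daughters`, `has_time`, and the function name `climb` are all hypothetical stand-ins, not corpus data or part of the notebook.

```python
# Toy stand-ins for the Text-Fabric lookups (hypothetical nodes).
daughters = {1: [2, 3, 5], 2: [4]}          # mother -> daughter clauses
has_time = {1: True, 3: True}               # clauses carrying a time marker

def climb(root, span, coverage):
    '''Collect descendants of root, stopping at an intervening time marker.'''
    for daughter in daughters.get(root, []):
        if has_time.get(daughter, False):
            break  # as in climbClauseTree: later siblings are skipped too
        span.append(daughter)
        coverage.add(daughter)
        climb(daughter, span, coverage)  # depth-first descent

span, coverage = [1], set()   # the root goes into the span up front
climb(1, span, coverage)
print(span, coverage)
```

Clause 5 is not collected here, because the `break` at the time-marked clause 3 also ends the loop over its remaining siblings. Using `continue` instead would keep such siblings in the span, which is a different design choice rather than a drop-in equivalent.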

In [36]:
# get L<WLM clause_atoms with yiqtol verb
l_olam_clauses = [L.d(clause, otype='clause_atom')[0] 
                      for clause in markers['L <WLM']['tense_cl_lists']['impf']]

In [46]:
# get time spans for L<WLM; take only spans with more than a single clause atom
l_olam_spans = [span for span in buildTimeSpans(l_olam_clauses)
                    if len(span['span_clause_atoms']) > 1]

print(len(l_olam_spans), 'spans for L<WLM found')
print()
print('sample: ')
pprint(l_olam_spans[0])

17 spans for L<WLM found

sample: 
{'span_clause_atoms': [539035, 539036, 539037, 539038],
 'span_tense_count': Counter({'impf': 3, 'total': 3})}


## Gather Verb Statistics within Time Spans

In [49]:
l_olam_span_verb_counts = collections.Counter()

for span in l_olam_spans:
    
    tense_data = span['span_tense_count']
    
    l_olam_span_verb_counts.update(tense_data)

In [55]:
sorted_lolam_counts = sorted([(count, tense) for tense, count in l_olam_span_verb_counts.items()],
                                reverse=True)

for count, tense in sorted_lolam_counts:
    
    percent = round((count/l_olam_span_verb_counts['total'])*100, 1)
    
    print(tense, f'{percent}%', f'({count})')
    print()

total 100.0% (141)

perf 39.7% (56)

impf 35.5% (50)

impv 10.6% (15)

infc 6.4% (9)

ptca 4.3% (6)

ptcp 2.1% (3)

wayq 0.7% (1)

infa 0.7% (1)
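
To set these span-level figures beside the clause-level figures from the introduction, the shares can be recomputed from the counts given in both places. This is a minimal sketch that assumes `impf` corresponds to *yiqtol* and `perf` to *qatal*; the names `clause_level`, `span_level`, and `share` are introduced here for illustration.

```python
# (count, total) pairs taken from the introduction and from the output above.
clause_level = {'impf': (46, 76), 'perf': (9, 76)}
span_level = {'impf': (50, 141), 'perf': (56, 141)}

def share(count, total):
    '''Percentage of total, rounded to one decimal place.'''
    return round(count / total * 100, 1)

comparison = {
    tense: (share(*clause_level[tense]), share(*span_level[tense]))
    for tense in clause_level
}

for tense, (clause_pct, span_pct) in comparison.items():
    print(f'{tense}: clause level {clause_pct}%, span level {span_pct}%')
```

The two rows are not strictly like for like: `getSpanData` counts by the `vt` feature alone, so the span-level `perf` figure presumably also contains *weqetal* forms, which the clause-level figures list separately.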



In [64]:
# get the span with the wayyiqtol and examine it

lolam_span_with_wayyiqtol = [span['span_clause_atoms'] for span in l_olam_spans
                                if 'wayq' in span['span_tense_count']][0]

lolam_span_with_wayyiqtol

[594103, 594104, 594105, 594106, 594107, 594111]

In [104]:
def display_span(clause_atom_list):
    html_span = '<span style="font-size: 14pt; font-family: Times New Roman">{content}</span>'
    
    html_div =\
    '''
    <div style="font-size: 15pt; 
                font-family: Times New Roman; 
                direction: rtl; 
                color:{color};
                width: 80%">

                {content} 
    </div>
    '''

    book, chapter, verse = T.sectionFromNode(clause_atom_list[0])

    # show header
    display(HTML(html_span.format(content=f'{book} {chapter}')))
    print()

    for ca in clause_atom_list:

        book, chapter, verse = T.sectionFromNode(ca)
        
        # look for time markers
        time_markers = [phrase for phrase in L.d(ca, otype='phrase')
                           if F.function.v(phrase) == 'Time'
                       ]

        text = T.text(L.d(ca, otype='word'))
        indent = '...' * F.tab.v(ca)
        typ = F.typ.v(ca)
        text_indented = f'{chapter}:{verse}' +\
                        '&nbsp;&nbsp;&nbsp;&nbsp;' +\
                        str(F.tab.v(ca)) +\
                        '&nbsp;&nbsp;' +\
                        typ +\
                        '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;' +\
                        indent +\
                        text

        # color clause atoms that contain a time marker
        color = 'blue' if time_markers else ''

        display(HTML(html_div.format(color=color, content=text_indented)))
    print()

In [105]:
display_span(lolam_span_with_wayyiqtol)

In [106]:
# display all time spans

for span in l_olam_spans:
    
    span_clauses = span['span_clause_atoms']
    
    display_span(span_clauses)


In [109]:
# export pickled timespans

with open('l_olam_wayyiqtol.pickle', 'wb') as outfile:
    
    pickle.dump(lolam_span_with_wayyiqtol, outfile)