<div style="text-align:center; font-size: 120%">
<h1>Time Spans</h1>

<table>
<tr>
<td>
<img src="images/tf.png" 
style="width:250px; height:150px;"
>
</td>
<td>
<img src="images/vuEtcbc.png"
style="width:315px; height:150px;"
>
</td>
</tr>
</table>
</div>

In this project, we use the [Text-Fabric](https://github.com/ETCBC/text-fabric) Python package combined with the Biblical Hebrew data from the [Eep Talstra Centre for Bible and Computer](http://www.wi.th.vu.nl) to create an advanced visualization of the use of time phrases in Biblical Hebrew.

In [1]:
from IPython.display import display, HTML
import os, glob, collections

## Load Text-Fabric Data

**<span style="color:red">Before moving on to this step,</span>** <br>
please run [Text_Fabric_Tutorial.ipynb](Text_Fabric_Tutorial.ipynb) to set up the package and data on your system.<br>
The API details it covers are also essential for understanding the code in this notebook.

In [2]:
from tf.fabric import Fabric

TF_data_dir = '/Users/Cody/github/text-fabric-data/' # specify your TF data directory

text_fabric = Fabric(locations=TF_data_dir,      # instantiate processor
                     modules='Hebrew/etcbc4c')   # module path in TF data dir

This is Text-Fabric 2.3.0
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
109 features found and 0 ignored


In [3]:
# load the features
tf = text_fabric.load('''
                      book chapter verse 
                      function pdp vt
                      lex lex_utf8 g_word_utf8
                      mother tab
                    ''')

  0.00s loading features ...
   |     0.01s B book                 from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.01s B chapter              from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.01s B verse                from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.21s B g_word_utf8          from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.20s B lex_utf8             from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.14s B function             from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.14s B pdp                  from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.12s B vt                   from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.13s B lex                  from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.20s B mother               from /Users/Cody/github/text-fabric-data//Hebrew/etcbc4c
   |     0.03s B tab 

In [4]:
# globalize the TF objects:
# cf. explanation in Text_Fabric_Tutorial.ipynb

tf.makeAvailableIn(globals()) 

## Gather ETCBC Data

**Short API reference** (see Text_Fabric_Tutorial.ipynb for API details):<br>
`L.d(node, otype=...)` returns a tuple of the objects contained in a node (down).<br>
`L.u(node, otype=...)` returns a tuple of the objects that contain a node (up).<br>
`F.[feature].v(node)` returns the value of the specified feature for a node.

In [5]:
def getCorpusClauses(corpus):
    '''
    Requires a corpus string (a book name, e.g. 'Genesis').
    Returns all the clause atom nodes for that corpus.
    '''
    # find the book node whose name matches the corpus string
    book_node = next(book for book in F.otype.s('book') 
                     if F.book.v(book) == corpus)

    corpus_clauses = L.d(book_node, otype='clause_atom') # all clause atoms in corpus
    
    return corpus_clauses

In [6]:
def mapDaughterToMother(corpus_clauses):
    '''
     In the ETCBC, mother/daughter clauses specify 
     discourse analysis categories between clauses.
     Mother clauses give rise to daughter clauses based on
     calculated similarities, conjunctions, etc.

     In TF, these relations are only mapped from daughter to mother.
     We need to reverse that mapping to build a complete clause tree.

     We iterate through all clause nodes and 
     build the mapping into mother_daughters, a defaultdict.
     Mother clause nodes serve as keys with list values;
     in the list, the daughters are progressively appended.
    '''

    mother_daughters = collections.defaultdict(list)

    for clause in corpus_clauses:
        if F.tab.v(clause) == 0: # skip parallel clauses
            continue

        daughter_node = clause
        for mother_node in E.mother.f(daughter_node): # edge feature; tuple
            mother_daughters[mother_node].append(daughter_node)

    return mother_daughters
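
The reversal described in the docstring can be tried without any Text-Fabric data. The following is a minimal sketch that assumes a hypothetical `daughter_to_mother` dict, with plain integers standing in for clause nodes; it is an illustration of the idea, not the notebook's own data.

```python
import collections

# Hypothetical daughter -> mother edges; integers stand in for clause nodes.
daughter_to_mother = {2: 1, 3: 1, 4: 3, 5: 3}

# Invert the mapping: each mother collects its daughters in encounter order.
mother_to_daughters = collections.defaultdict(list)
for daughter, mother in daughter_to_mother.items():
    mother_to_daughters[mother].append(daughter)

print(dict(mother_to_daughters))  # {1: [2, 3], 3: [4, 5]}
```

Because the result is a `defaultdict`, looking up a clause that has no daughters returns an empty list rather than raising `KeyError`, which is convenient when walking the tree down to its leaves.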

## Build Time Spans

In [7]:
def checkTime(clause_node):
    '''
    Check to see if a given clause has a time indicator.
    Return True or False.
    
    The ETCBC stores time phrases as a phrase function feature.
    However, other markers of time are not as obvious.
    
    Only the phrase function is checked at present; the less
    obvious markers listed below are not yet implemented.
    '''
    # ETCBC time-phrase functions
    phrases = L.d(clause_node, otype='phrase') # phrases in clause
    phrase_functions = set(F.function.v(phrase) for phrase in phrases) # phr functions
    
    # not yet implemented:
    # substantives with a time sense
    # "when" preposition (ב + infinitive)
    
    return 'Time' in phrase_functions

In [21]:
def climbClauseTree(root_clause, mother_daughters, span, coverage):
    '''
    With mother/daughter clause relations, we have 
    something like a syntax tree, but on a text level.
    This function takes a clause with an initial time phrase and
    recursively iterates through its descendants to gather them.
    
    span is a list; coverage is a set.
    
    We add descendants to the span if they don't have 
    an intervening time indicator. They also go into
    the coverage set, so that we don't double-cover 
    them in a later loop.
    '''    
    for daughter in mother_daughters[root_clause]:
        
        # stop at the first daughter with its own time marker;
        # it and its later siblings are left out of this span
        if checkTime(daughter):
            break

        # add to span and coverage
        span.append(daughter)
        coverage.add(daughter)
        
        # move down the tree with recursive call
        climbClauseTree(daughter, mother_daughters, span, coverage)
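
To see how the recursion behaves, here is a minimal sketch on a hand-built tree. The names `toy_tree`, `toy_time_clauses`, and `collect_span` are hypothetical stand-ins introduced for illustration; the time check is a simple set lookup rather than a real `checkTime` call.

```python
# Hypothetical clause tree: mother -> daughters; integers stand in for clause nodes.
toy_tree = {1: [2, 3, 6], 3: [4, 5]}
toy_time_clauses = {1, 5}  # clauses assumed to carry a time marker


def collect_span(root, tree, has_time, span, coverage):
    """Depth-first walk; a sibling list ends at the first time-marked daughter."""
    for daughter in tree.get(root, []):
        if has_time(daughter):
            break  # this daughter and its later siblings are not added
        span.append(daughter)
        coverage.add(daughter)
        collect_span(daughter, tree, has_time, span, coverage)


span, coverage = [1], set()  # the root clause is placed in the span by the caller
collect_span(1, toy_tree, lambda clause: clause in toy_time_clauses, span, coverage)

print(span)      # [1, 2, 3, 4, 6]
print(coverage)  # {2, 3, 4, 6}
```

Clause 5 carries its own time marker, so it ends the walk through the daughters of clause 3, while clause 6 is still collected because it is a daughter of clause 1.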

In [22]:
def buildTimeSpans(corpus_clauses):
    '''
    Gather the time spans by calling climbClauseTree while
    looping over all clauses in the corpus.
    
    Return the time spans as a list of dicts. Each dict has
    a 'time' key (True if the span starts from a clause with
    a time marker, False for a run of clauses without one)
    and a 'clauses' key holding the list of clause nodes.
    
    Skip clauses already in the coverage set.
    '''
    mother_daughters = mapDaughterToMother(corpus_clauses)
    time_spans = list()
    coverage = set()
    
    # add timeless spans here:
    timeless_span = list() 
    
    for clause in corpus_clauses:
        
        # build spans for clauses not yet visited that have time marker
        if clause not in coverage and checkTime(clause):
            
            # save/reset timeless_span
            if timeless_span:
                time_spans.append({'time':False, 'clauses':timeless_span})
                timeless_span = list()
            
            # calculate timespan
            span = []
            span.append(clause) # add root clause
            climbClauseTree(clause, mother_daughters, span, coverage) # build the rest
            time_spans.append({'time':True, 'clauses':span})
            
        elif clause not in coverage: # unvisited clause without a time marker
            # add to timeless span
            timeless_span.append(clause)
            
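    # save any timeless span left over after the last clause
    if timeless_span:
        time_spans.append({'time':False, 'clauses':timeless_span})

    return time_spans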

In [23]:
corpus = 'Genesis'
corpus_clauses = getCorpusClauses(corpus)
time_spans = buildTimeSpans(corpus_clauses)
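
Before visualizing, it can help to check the shape of the result. The sketch below assumes a hypothetical `toy_spans` list in the same format that `buildTimeSpans` returns (a list of dicts with `'time'` and `'clauses'` keys) and a hypothetical helper, `summarize_spans`, that counts spans and clauses of each kind.

```python
# Hypothetical data in the shape buildTimeSpans returns;
# integers stand in for clause nodes.
toy_spans = [
    {'time': False, 'clauses': [10, 11]},
    {'time': True, 'clauses': [12, 13, 14]},
    {'time': False, 'clauses': [15]},
    {'time': True, 'clauses': [16, 17]},
]


def summarize_spans(spans):
    """Count spans and clauses, split by whether the span has a time marker."""
    timed = [len(span['clauses']) for span in spans if span['time']]
    timeless = [len(span['clauses']) for span in spans if not span['time']]
    return {
        'timed_spans': len(timed),
        'timed_clauses': sum(timed),
        'timeless_spans': len(timeless),
        'timeless_clauses': sum(timeless),
        'longest_timed_span': max(timed, default=0),
    }


summary = summarize_spans(toy_spans)
print(summary)
```

The same helper could be called on the real `time_spans` list, since it only relies on the two dict keys.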

## Create HTML Visualization

In [24]:
def getClauseLabel(clause_node):
    '''
    Convert a Text-Fabric node integer into a clean label.
    Return a label: "book.chapter.verse.NthClause"
    '''
    # find book, chapter, verse nodes containing given clause
    book_node = L.u(clause_node, otype='book')[0]   
    chapter_node = L.u(clause_node, otype='chapter')[0] 
    verse_node = L.u(clause_node, otype='verse')[0]  
    
    # convert section nodes to their string representations
    book = F.book.v(book_node)
    chapter = F.chapter.v(chapter_node)
    verse = F.verse.v(verse_node)
    
    # find which clause number this clause node is
    verse_clauses = L.d(verse_node, otype='clause_atom') # all verse's clauses
    clause_num = verse_clauses.index(clause_node) + 1    # N'th clause; +1 for aesthetics
    
    # format into label and return string
    clause_label = '{book}.{chap}.{ver}.{clause}'.format(book=book,
                                                         chap=chapter,
                                                         ver=verse,
                                                         clause=clause_num)
    return clause_label
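
The label logic can be checked in isolation. This is a minimal sketch with hypothetical values: the tuple `verse_clauses` and the node number `502` are made up, standing in for what `L.d` would return for a verse.

```python
# Hypothetical clause atom nodes of one verse, in text order.
verse_clauses = (501, 502, 503)
clause_node = 502

# Position within the verse, counted from 1 rather than 0.
clause_num = verse_clauses.index(clause_node) + 1

clause_label = '{book}.{chap}.{ver}.{clause}'.format(book='Genesis',
                                                     chap=1,
                                                     ver=5,
                                                     clause=clause_num)
print(clause_label)  # Genesis.1.5.2
```

Note that `tuple.index` raises `ValueError` when the value is absent, so the lookup depends on the clause actually appearing among the verse's clause atoms.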

In [27]:
import itertools

def writeHTMLDoc(all_spans, title):
    '''
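    Build the HTML visualization and return it as a string.
    
    all_spans is the list returned by buildTimeSpans;
    title is the title of the HTML document.
    Spans with a time marker are shaded in alternating colors;
    timeless spans are left unshaded.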
    '''
    
    # open html templates: 
    with open('HTMLTemplates/doc.txt') as doc:
        doc_template = doc.read()
    with open('HTMLTemplates/data.txt') as data:
        data_template = data.read().replace('\n','').replace('\t','')

    # html chars/formats 
    indent = '&nbsp;&nbsp;&nbsp;&nbsp;'
    colors = itertools.cycle(('#addfff','#a3e2a1'))

    # compile html code to this str:
    html_body = ''

    for span in all_spans:
        
        color = next(colors) if span['time'] else '' # assign shading
        
        for clause in span['clauses']:
            label = indent + getClauseLabel(clause)
            indentation = F.tab.v(clause) * indent # etcbc clause relation
            
            # assemble plain-text representation:
            words = L.d(clause, otype='word')
            text = T.text(words) + indentation
            
            # fill template; add to html code
            html_body += data_template.format(color=color,
                                                 text=text,
                                                 label=label)
    
    # complete html code         
    html_doc = doc_template.format(data=html_body,
                                   title=title)
    return html_doc
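
The files in `HTMLTemplates/` are not shown in this notebook. As a rough illustration only, the sketch below assumes two hypothetical templates that use the same placeholder names as the function above (`{title}`, `{data}`, `{color}`, `{text}`, `{label}`) and fills them with made-up spans whose clauses are plain strings.

```python
import itertools

# Hypothetical stand-ins for HTMLTemplates/doc.txt and HTMLTemplates/data.txt.
doc_template = (
    '<html><head><meta charset="utf-8"><title>{title}</title></head>'
    '<body><h1>{title}</h1><table>{data}</table></body></html>'
)
data_template = (
    '<tr style="background-color:{color}">'
    '<td>{label}</td><td dir="rtl">{text}</td></tr>'
)

# Made-up spans; strings stand in for clause text.
toy_spans = [
    {'time': True, 'clauses': ['clause A', 'clause B']},
    {'time': False, 'clauses': ['clause C']},
    {'time': True, 'clauses': ['clause D']},
]

colors = itertools.cycle(('#addfff', '#a3e2a1'))
rows = ''
for span in toy_spans:
    # only spans with a time marker advance the color cycle
    color = next(colors) if span['time'] else ''
    for number, text in enumerate(span['clauses'], start=1):
        rows += data_template.format(color=color, label=number, text=text)

html_doc = doc_template.format(title='Toy Time Spans', data=rows)
print(html_doc)
```

Since the templates are filled with `str.format`, any literal braces in them (for example in embedded CSS rules) would need to be doubled as `{{` and `}}`.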

In [28]:
title = 'Time Spans in {corpus}'.format(corpus=corpus)

time_span_visualization = writeHTMLDoc(time_spans, title)

with open('{}_timespans.html'.format(corpus), 'w') as htmlfile:
    htmlfile.write(time_span_visualization)

**[See Time Span Visualization](Genesis_timespans.html)**