<div style="text-align:center; font-size: 120%">
<h1>Time Spans</h1>

<table>
<tr>
<td>
<img src="https://camo.githubusercontent.com/b6d477b661f86325a7701d8102ceb4d7ff51e29a/68747470733a2f2f7261772e6769746875622e636f6d2f45544342432f746578742d6661627269632f6d61737465722f646f63732f74662e706e67" 
style="width:250px; height:150px;"
>
</td>
<td>
<img src="https://camo.githubusercontent.com/f5e0eae3d290577fe2e5722031f7306a9817e8cd/68747470733a2f2f7261772e6769746875622e636f6d2f45544342432f6c61662d6661627269632d6e62732f6d61737465722f696d616765732f56552d45544342432d736d616c6c2e706e67"
style="width:315px; height:150px;"
>
</td>
</tr>
</table>
<p style='clear:both'>In this project, we use the <a href="https://github.com/ETCBC/text-fabric">Text-Fabric</a> Python package combined with the Biblical Hebrew data from the <a href="http://www.wi.th.vu.nl">Eep Talstra Centre for Bible and Computer</a> to create an advanced visualization of the use of time phrases in Biblical Hebrew.
</p>
</div>

In [76]:
import collections as collect
from IPython.display import display, HTML
import os, glob

## Load Text-Fabric Data

**<span style="color:red">Before moving on to this step,</span>** <br>
please run [Text_Fabric_Tutorial.ipynb]() to set up the package and data on your system.<br>
The API information it contains is also crucial for understanding the code in this notebook.

In [116]:
from tf.fabric import Fabric

TF_data_dir = '/Users/Cody/Desktop/text-fabric-data/' # specify your TF data directory

text_fabric = Fabric(locations=TF_data_dir,      # instantiate processor
                     modules='Hebrew/etcbc4c')   # module path in TF data dir

This is Text-Fabric 2.3.0
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
108 features found and 0 ignored


In [117]:
# load the features
tf = text_fabric.load('''
                      book chapter verse 
                      function pdp vt
                      lex lex_utf8 g_word_utf8
                      mother tab
                    ''')

  0.00s loading features ...
   |     0.01s B book                 from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.00s B chapter              from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.01s B verse                from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.19s B g_word_utf8          from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.19s B lex_utf8             from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.08s B function             from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.15s B pdp                  from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.13s B vt                   from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.14s B lex                  from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.39s B mother               from /Users/Cody/Desktop/text-fabric-data//Hebrew/etcbc4c
   |     0.

In [119]:
# globalize the TF objects:
# cf. explanation in Text_Fabric_Tutorial.ipynb

tf.makeAvailableIn(globals()) 

Linguistic data is also available:

In [None]:
def getLabel(clause_node):
    '''
    convert a Text-Fabric node number into a label
    return a label of the format: book.chapter.verse.clause
    '''
    book_node = L.u(clause_node, otype='book')[0]
    chapter_node = L.u(clause_node, otype='chapter')[0]
    verse_node = L.u(clause_node, otype='verse')[0]
    verse_clauses = L.d(verse_node, otype='clause_atom')
    clause_num = verse_clauses.index(clause_node) + 1
    clause_label = '{}.{}.{}.{}'.format(F.book.v(book_node), 
                                        F.chapter.v(chapter_node), 
                                        F.verse.v(verse_node), 
                                        clause_num)
    return clause_label
    

In [17]:
def mapMotherToDaughters(clauseNodes):
    '''
    map each mother clause to its daughter clauses
    in Text-Fabric that information is stored the other way around
    so we build up that data with a defaultdict with a list value
    '''
    motherToDaughters = collect.defaultdict(list)
    for clause in clauseNodes:
        if F.tab.v(clause) == 0: # do not store parallel daughter clauses
            continue
        daughter = getLabel(clause)
        for mother in E.mother.f(clause):
            motherToDaughters[mother].append(daughter)
    return motherToDaughters
    
    
def getEtcbcData(corpus):
    '''
    Gather the needed data from the Text-Fabric module;
    label clauses with a simple reference tag (ex. Psalms.1.1.1)
    return an embedded dictionary keyed by data labels...;
    '''
    # the corpus argument is a string; 
    # find the node number with a feature value that matches the corpus string
    # there will be only 1 result, so we use 'next' to pull the first result from the generator 
    corpus = next(book for book in F.otype.s('book') if F.book.v(book) == corpus)
    
    # now we pull all the clause node numbers from the corpus:
    clauseNodes = L.d(corpus, otype='clause_atom')
    
    # invert the mother edges: each mother node maps to the labels of its daughters
    motherToDaughters = mapMotherToDaughters(clauseNodes)
    
    # we will store all of the clause data in this dict:
    clauses = collect.OrderedDict()
    
    # iterate over the clause node numbers and gather the data to be returned
    # the data is stored in the clauses ordered dictionary
    for clause in clauseNodes:
        clauseLabel = getLabel(clause)
        wordNodes = L.d(clause, otype='word')
        text = T.text(wordNodes)
        indentation = F.tab.v(clause)
        phraseNodes = L.d(clause, otype='phrase')
        timePhrases = tuple(phrase for phrase in phraseNodes if F.function.v(phrase) == 'Time')
        timePhraseText = tuple(T.text(L.d(phrase, otype='word')) for phrase in timePhrases)
        daughters = motherToDaughters.get(clause, [])
        mother = getLabel(E.mother.f(clause)[0]) if E.mother.f(clause) else None
        clauses[clauseLabel] = {
                                'text': text,
                                'timePhraseText': timePhraseText,
                                'daughters': daughters,
                                'mother': mother,
                                'timeCategories':[],
                                'indentation':indentation,
                                'etcbcClauseNode': clause #!! REMOVE LATER
                                }
    return clauses
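
The mother-to-daughter lookup used above is an inversion of child-to-parent edges. The sketch below shows that idea on hard-coded toy data, without Text-Fabric. The labels, the `daughter_to_mother` dict and the `invert_edges` helper are hypothetical names made up for illustration; in the notebook the edges come from `E.mother.f()`, and the keys are node numbers, not labels.

```python
import collections

# Hypothetical toy data: each daughter clause label points to its mother label.
# In the notebook this information comes from E.mother.f(); here it is hard-coded.
daughter_to_mother = {
    'Genesis.1.1.2': 'Genesis.1.1.1',
    'Genesis.1.2.1': 'Genesis.1.1.1',
    'Genesis.1.2.2': 'Genesis.1.2.1',
}

def invert_edges(child_to_parent):
    """Return a dict that maps each parent to the list of its children."""
    parent_to_children = collections.defaultdict(list)
    for child, parent in child_to_parent.items():
        parent_to_children[parent].append(child)
    return parent_to_children

mother_to_daughters = invert_edges(daughter_to_mother)

# .get() with a default avoids creating empty entries for childless clauses
print(mother_to_daughters.get('Genesis.1.1.1', []))
print(mother_to_daughters.get('Genesis.1.2.2', []))
```

Looking up with `.get(clause, [])`, as `getEtcbcData` does, keeps the defaultdict from adding an empty entry for every clause that has no daughters.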

## Process Time Spans

In [5]:
def climbClauseTree(clause, clauseDict, span, coverage):
    '''
    recursively descend the clause tree from clause to its daughters
    and add each daughter without a time marker to the span;
    a daughter with its own time marker is skipped together with its subtree,
    because it begins a new span
    span is a list that accrues as the code descends the tree
    coverage is a set that tracks which clauses are accounted for
    '''    
    for daughter in clauseDict[clause]['daughters']:
        timeMarkers = clauseDict[daughter]['timePhraseText']
        if timeMarkers:
            continue
        else:
            span.append(daughter)
            coverage.add(daughter)
            climbClauseTree(daughter, clauseDict, span, coverage)

def buildTimeSpans(clauseDict):
    '''
    loop through all clauses and call the climbClauseTree function
    each timespan is stored in timeSpans dict, keyed by its first clause
    coverage tracks which clauses are accounted for to avoid overlap
    '''
    timeSpans = collect.OrderedDict()
    coverage = set()
    for clause, cData in clauseDict.items():
        if clause not in coverage and cData['timePhraseText']:
            span = []
            span.append(clause)
            climbClauseTree(clause, clauseDict, span, coverage)
            timeSpans[clause] = span
    return timeSpans
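
To see how the span logic behaves, here is a small sketch that runs it on a toy clause dictionary, without Text-Fabric. The two functions restate the ones above under shorter, hypothetical names (`climb`, `build_spans`). The clause labels `c1` to `c5`, the `make` helper and the English time phrases are invented for illustration only.

```python
import collections

def climb(clause, clause_dict, span, coverage):
    """Add daughters without a time marker to the span, recursively."""
    for daughter in clause_dict[clause]['daughters']:
        if clause_dict[daughter]['timePhraseText']:
            continue  # a daughter with its own time marker starts a new span
        span.append(daughter)
        coverage.add(daughter)
        climb(daughter, clause_dict, span, coverage)

def build_spans(clause_dict):
    """Start a span at every uncovered clause that has a time phrase."""
    spans = collections.OrderedDict()
    coverage = set()
    for clause, data in clause_dict.items():
        if clause not in coverage and data['timePhraseText']:
            span = [clause]
            climb(clause, clause_dict, span, coverage)
            spans[clause] = span
    return spans

def make(time=(), daughters=()):
    """Hypothetical helper: build a minimal clause record for the toy data."""
    return {'timePhraseText': tuple(time), 'daughters': list(daughters)}

# Toy tree: c1 has daughters c2 and c4; c2 has daughter c3; c4 has daughter c5.
toy = collections.OrderedDict([
    ('c1', make(time=['in the morning'], daughters=['c2', 'c4'])),
    ('c2', make(daughters=['c3'])),
    ('c3', make()),
    ('c4', make(time=['at evening'], daughters=['c5'])),
    ('c5', make()),
])

toy_spans = build_spans(toy)
for first_clause, span in toy_spans.items():
    print(first_clause, span)
# prints:
# c1 ['c1', 'c2', 'c3']
# c4 ['c4', 'c5']
```

The span that starts at `c1` stops at `c4` because `c4` carries its own time phrase; `c4` then opens a second span that takes in `c5`.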

## Write HTML Visualization

In [6]:
from HTMLWriter import writeHTML

corpus = 'Genesis'
test = getEtcbcData(corpus)

In [7]:
spans = buildTimeSpans(test)

In [8]:
title = 'Time Spans in {corpus}'.format(corpus=corpus)

timeSpanDoc = writeHTML(test, spans, title)
with open('test.html', 'w', encoding='utf-8') as file:  # the document contains Hebrew text
    file.write(timeSpanDoc)

In [10]:
class test:
    def __init__(self, string):
        self.string = string

In [13]:
frog = test('ribbit')

In [15]:
frog.string

'ribbit'

In [18]:
bok = next(b for b in F.otype.s('book'))

In [20]:
L.u(bok, otype='book')

()