<p style="text-align:center;font-size:30px;font-weight:bold">Clause Syntax in the Song of Songs:<br><br> A Preliminary Study</p>
<br>
<br>
# Song of Songs Translation Processing
<strong>Purpose of this notebook:</strong>
<br>
<br>
Since the English translations need to be manually compared with the Hebrew text, this code creates files that speed up data entry for the user. The code uses NLTK (the Natural Language Toolkit), a Python library, to tokenize and part-of-speech tag the English translations and write a clean, aligned format to a text file. The code takes in the plaintext translation (e.g., NAS > plaintext). An example of the result can be viewed in the translation folder (for the NAS, for example, in files named like NAS_ch1_coded.txt).  
<br>
<br>
The text below provides a brief example of the result from the NAS chapter 1:
<br>
<br>

--------------------------------------------------
2
"may he kiss me with the kisses of his mouth
     for your love is better than wine.

may        kiss       is         
mod        pres       pres       

יִשָּׁקֵ֨נִי֙        

ZYq0       AjCl       


--------------------------------------------------
<br>
<br>
The notebook shows the result for the NAS. To process another translation, modify "filename" in the last code cell (the fifth cell of the notebook, In [8]) to reflect that translation's three-letter code. 
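
The aligned verb rows in the example above can be sketched in isolation. This is a minimal illustration, not the notebook's own cell: `verb_rows` and `sample` are hypothetical names, the tags in `sample` are hand-written stand-ins for what `nltk.pos_tag` might return, and only the label dictionary and the 11-character column width are taken from the processing code.

```python
# Mapping from Penn Treebank verb tags to the labels used in the coded files
# (copied from the POS_index dictionary in the processing cell).
POS_INDEX = {'VB': 'pres', 'VBD': 'past', 'VBG': 'PSTptc', 'VBN': 'PTptc',
             'VBP': 'pres', 'VBZ': 'pres', 'MD': 'mod'}


def verb_rows(tagged, width=11):
    """Return the two aligned rows (verbs, labels) for one verse."""
    # Keep only tokens whose tag is a verb or modal tag.
    verbs = [(word, POS_INDEX[tag]) for word, tag in tagged if tag in POS_INDEX]
    words_row = ''.join('{:{w}}'.format(word, w=width) for word, _ in verbs)
    labels_row = ''.join('{:{w}}'.format(label, w=width) for _, label in verbs)
    return words_row, labels_row


# Hand-tagged stand-in for tagger output (hypothetical, for illustration only).
sample = [('may', 'MD'), ('he', 'PRP'), ('kiss', 'VB'), ('me', 'PRP'),
          ('is', 'VBZ'), ('better', 'JJR')]

words_row, labels_row = verb_rows(sample)
print(words_row)
print(labels_row)
```

Each verb and its label occupy the same fixed-width column, which is what lets the two rows line up when the file is read in a monospaced editor.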

In [3]:
import re
import nltk
import collections
from laf.fabric import LafFabric
from etcbc.preprocess import prepare
from etcbc.lib import Transcription
fabric = LafFabric(verbose='NORMAL')

  0.00s This is LAF-Fabric 4.5.21
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



In [4]:
API = fabric.load('etcbc4b', 'lexicon', 'monad_boundaries_input',
{
    "xmlids" : {"node": False, "edge" : False},
    "features" : ("otype monads g_word_utf8 typ sp label chapter verse book",""),
    "prepare" : prepare
}
           )
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.00s USING main  DATA COMPILED AT: 2015-11-02T15-08-56
  0.00s USING annox DATA COMPILED AT: 2016-01-27T19-01-17
  4.53s LOGFILE=/Users/Cody/laf-fabric-output/etcbc4b/monad_boundaries_input/__log__monad_boundaries_input.txt
  4.53s INFO: LOADING PREPARED data: please wait ... 
  4.53s prep prep: G.node_sort
  4.66s prep prep: G.node_sort_inv
  5.45s prep prep: L.node_up
  9.42s prep prep: L.node_down
    17s prep prep: V.verses
    17s prep prep: V.books_la
    17s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    20s INFO: LOADED PREPARED data
    20s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK monad_boundaries_input AT 2016-05-11T16-37-50


In [5]:
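# Collect every node in the Song of Songs (book name 'Canticum') into `nodes`
# so that later searches do not have to walk the whole corpus again.

nodes = []
cur_book = None  # set when the first book node is reached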
for n in NN():
    if F.otype.v(n) == 'book':
        cur_book = F.book.v(n)
    if cur_book == 'Canticum' and F.otype.v(n) != 'book':
        nodes.append(n)
        
msg('Complete! {} appended'.format(len(nodes)))

    13s Complete! 6019 appended
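
The processing cell that follows reads each plaintext line, takes the leading verse number, and splits the remainder into sentences at `;`, `?`, `!`, or `.` followed by whitespace. A minimal sketch of that step, assuming each input line starts with its verse number; `split_verse` is a hypothetical helper and the sample line is invented for illustration, not quoted from a translation:

```python
import re


def split_verse(line):
    """Split one line such as '3 some text; more text.' into (verse_number, sentences)."""
    line = line.strip().lower()
    match = re.match(r'(\d+)\s*(.*)', line)
    if match is None:
        raise ValueError('line does not start with a verse number: {!r}'.format(line))
    number, text = match.groups()
    # Break after ; ? ! or . only when whitespace follows, so a final
    # period at the end of the line stays attached to the last sentence.
    sentences = re.split(r'[;?!.]\s', text)
    return number, sentences


# Invented sample line (assumption: the verse number comes first, then a space).
number, sentences = split_verse('3 First clause here; second clause here. Third clause.')
print(number)
print(sentences)
```

Taking the verse number with a regular expression avoids having to treat one-digit and two-digit verse numbers separately.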


In [8]:
#The loop moves through all 8 chapters of the Song, opening each chapter file and producing a coded version.
#Some adjustments are needed for the versification differences
#between the Hebrew and English texts: English 6:13 is Hebrew 7:1, so Hebrew chapter 7 runs one verse ahead.
#As a work-around, chapters 6 and 7 are dealt with differently.

ct = 1

while ct <= 8:
    tar_chapter = str(ct)
    filename = 'NAS_ch{}'.format(ct)

    cur_book = None

    eng_ch1 = []

    with open('{}.txt'.format(filename), 'r') as txtfile:
        for line in txtfile:
            eng_ch1.append(line.strip().lower())
        
    new_ch = []
    sen_vs = collections.OrderedDict([])

    for verse in eng_ch1:
        # Raw string avoids invalid-escape warnings; splits after ; ? ! or . followed by whitespace
        new_ch.append(re.split(r'[;?!.]\s', verse))

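    for verse in new_ch:
        # The verse number is the leading run of digits (one or two characters).
        vrs = re.match(r'\d+', verse[0]).group()
        sen_vs[vrs] = []
        for i, sentence in enumerate(verse):
            if i == 0:
                # Drop the verse number and the space after it from the first sentence.
                sen_vs[vrs].append(sentence[len(vrs):].strip())
            else:
                sen_vs[vrs].append(sentence)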
            

    POS_index = {'VB' : 'pres','VBD': 'past','VBG':'PSTptc','VBN': 'PTptc','VBP': 'pres',
                'VBZ': 'pres','MD':'mod'}
    eng_verbs = collections.OrderedDict([])

    for verse in sen_vs:
    
        for sentence in sen_vs[verse]:
            tokens = nltk.word_tokenize(sentence)
            tagged = nltk.pos_tag(tokens)
            for word, tag in tagged:
                # Keep only verb and modal tags (VB, VBD, VBG, VBN, VBP, VBZ, MD).
                # A dictionary lookup replaces the pattern 'VB?|MD', which matched any tag starting with 'V'.
                if tag in POS_index:
                    eng_verbs.setdefault(verse, []).append([word, POS_index[tag]])

        # Verses without a tagged verb still need an (empty) entry.
        if verse not in eng_verbs:
            eng_verbs[verse] = []
        
    cur_chapter = None
    cur_verse = None

    cl_types = collections.OrderedDict([])

    #adjustment for chapter 7 versification difference
    adjustment = 1 if tar_chapter == '7' else 0

    for n in nodes:
        otype = F.otype.v(n)
        if otype == 'chapter':
            cur_chapter = F.chapter.v(n)
        if otype == 'clause' and cur_chapter == tar_chapter:
            # English verse number of the verse that contains this clause
            eng_verse = str(int(F.verse.v(L.u('verse', n))) - adjustment)
            cl_types.setdefault(eng_verse, []).append(F.typ.v(n))

    #to adjust for eng / heb versification difference
    if tar_chapter == '6':
        for n in nodes:
            otype = F.otype.v(n)
        
            if otype == 'chapter':
                cur_chapter = F.chapter.v(n)
            
            if otype == 'clause' and cur_chapter == '7' and F.verse.v(L.u('verse',n)) == '1':
                # Hebrew 7:1 is counted as English 6:13
                cl_types.setdefault('13', []).append(F.typ.v(n))
                
                
    verbs = collections.OrderedDict([])

    for n in nodes:
        otype = F.otype.v(n)
        if otype == 'chapter':
            cur_chapter = F.chapter.v(n)
        if (otype == 'word') and (cur_chapter == tar_chapter) and (F.sp.v(n)=='verb'):
            # English verse number of the verse that contains this verb
            eng_verse = str(int(F.verse.v(L.u('verse', n))) - adjustment)
            verbs.setdefault(eng_verse, []).append(F.g_word_utf8.v(n))
            

    #to adjust for eng / heb versification difference
    if tar_chapter == '6':
        for n in nodes:
            otype = F.otype.v(n)
        
            if otype == 'chapter':
                cur_chapter = F.chapter.v(n)
            
            if otype == 'word' and cur_chapter == '7' and F.sp.v(n) == 'verb' and F.verse.v(L.u('verse',n)) == '1':
                # Hebrew 7:1 is counted as English 6:13
                verbs.setdefault('13', []).append(F.g_word_utf8.v(n))
                
    with open('{}_coded.txt'.format(filename), 'w') as txtfile:
        for verse in sen_vs:
            txtfile.write(verse+'\n')
            indent = 0
            for sentence in sen_vs[verse]:
                txtfile.write((' '*indent)+sentence + '\n')
                indent += 5
            txtfile.write('\n')
            string1 = ''
            string2 = ''
            string3 = ''
            string4 = ''
        
            for sentence in eng_verbs[verse]:
                string1 += '{:11}'.format(sentence[0])
                string3 += '{:11}'.format(sentence[1])
    
            if verse in verbs:
                # Write the Hebrew verbs in reversed order (same behavior as before, without the copy)
                for word in reversed(verbs[verse]):
                    string2 += '{:20}'.format(word)
                
            for typ in cl_types[verse]:
                string4 += '{:11}'.format(typ)
        
            txtfile.write(string1+'\n')
            txtfile.write(string3+'\n\n')
            txtfile.write(string2+'\n\n')
            txtfile.write(string4+'\n\n')
            txtfile.write('\n')
        
        
            txtfile.write('-'*50+'\n\n')
        ct += 1
        msg('{} written'.format(filename))

 5m 56s NAS_ch1 written
 5m 56s NAS_ch2 written
 5m 56s NAS_ch3 written
 5m 57s NAS_ch4 written
 5m 57s NAS_ch5 written
 5m 57s NAS_ch6 written
 5m 57s NAS_ch7 written
 5m 57s NAS_ch8 written
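
The versification work-around in the loop above can also be stated as a single mapping. The following is a minimal sketch, assuming the only difference between the two numberings is the one the code handles (Hebrew 7:1 is English 6:13, and every later verse of Hebrew chapter 7 is one higher than its English counterpart); `hebrew_to_english` is a hypothetical helper, not part of the notebook or of LAF-Fabric.

```python
def hebrew_to_english(chapter, verse):
    """Map a Hebrew (chapter, verse) in the Song of Songs to the English numbering.

    Assumption: the only difference is that Hebrew 7:1 is English 6:13,
    so Hebrew 7:n corresponds to English 7:(n-1) for n > 1.
    """
    if chapter == 7:
        if verse == 1:
            return 6, 13
        return 7, verse - 1
    return chapter, verse


print(hebrew_to_english(7, 1))
print(hebrew_to_english(7, 2))
print(hebrew_to_english(1, 2))
```

Keying the clause types and verbs by the mapped reference would let chapters 6 and 7 go through the same code path as the other chapters, at the cost of carrying the chapter number in each key.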
