<p style="text-align:center;font-size:30px;font-weight:bold">Clause Syntax in the Song of Songs:<br><br> A Preliminary Study</p>
<br>
<br>
# Song of Songs Monad Search
<strong>Purpose of this notebook:</strong>
<br>
This notebook reads the colon boundaries that monad_input.ipynb stored in csv files and compares them with the syntactic boundaries stored in the ETCBC4b database. Every syntactic object whose span coincides with a colon is recorded, and the resulting "."-separated abbreviations (for example "S.Sa.C.Ca") are written to another csv file. If no syntactic object matches, the colon is coded as "N" (none). The "N" cola are examined further in "stichometry_analyses.ipynb" to see what kinds of syntactic structures they contain.
<br>
<br>
The example here uses BHQ chapter 1: it reads one file (BHQchapter_1.csv) and writes the syntax-coded result to another (BHQchapter_1_coded.csv). The example file is kept in this directory for reference; the other chapters are stored in the BHQ/ROB directories. I wrote these searches in the early stages of my work, so I ran them chapter by chapter and copied the results into a full csv file later. If I were to write this again, I would process all 8 chapters at once.
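
As a minimal sketch of the coding scheme described above (not the search itself, which follows in the cells below), the helper here joins the abbreviations of the object types that match a colon and falls back to "N" when there are none. The function name `encode_otypes` is hypothetical; the abbreviation table mirrors the `type_codes` dictionary in the search cell.

```python
# Abbreviations copied from the type_codes dictionary in the search cell.
TYPE_CODES = {
    "word": "W", "phrase_atom": "Pa", "clause_atom": "Ca",
    "sentence_atom": "Sa", "subphrase": "Sp", "phrase": "P",
    "clause": "C", "sentence": "S", "half_verse": "Hv", "verse": "V",
}


def encode_otypes(otypes):
    """Join the abbreviations of the matching object types with '.'.

    Object types without an abbreviation are skipped, and an empty
    result is coded as "N" (none). Hypothetical helper, for illustration.
    """
    codes = [TYPE_CODES[t] for t in otypes if t in TYPE_CODES]
    return ".".join(codes) if codes else "N"


print(encode_otypes(["sentence", "sentence_atom", "clause", "clause_atom"]))
print(encode_otypes([]))
```

Treating a match that has no abbreviated object type as "N" is an assumption of this sketch.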

In [1]:
import sys
import collections 
import laf
from laf.fabric import LafFabric
from etcbc.preprocess import prepare

fabric = LafFabric(verbose='NORMAL')

  0.00s This is LAF-Fabric 4.5.21
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



In [2]:
fabric.load('etcbc4b', '--', 'monad_search',
{
    "xmlids" : {"node": False, "edge" : False},
    "features" : ("oid otype monads chapter verse book g_word_utf8 trailer_utf8",""),
    "prepare" : prepare
}
           )
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.07s USING main  DATA COMPILED AT: 2015-11-02T15-08-56
  6.84s LOGFILE=/Users/Cody/laf-fabric-output/etcbc4b/monad_search/__log__monad_search.txt
  6.84s INFO: LOADING PREPARED data: please wait ... 
  6.84s prep prep: G.node_sort
  7.23s prep prep: G.node_sort_inv
  8.31s prep prep: L.node_up
    13s prep prep: L.node_down
    20s prep prep: V.verses
    20s prep prep: V.books_la
    20s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    24s INFO: LOADED PREPARED data
    24s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK monad_search AT 2016-05-13T14-48-25


In [3]:
#indexes all nodes in the Song of Songs to facilitate faster searches
#SongofSongs nodes are stored in "nodes" list

corpus = ["Canticum"]

cur_book = None
nodes = []

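for n in NN():
    # update the current book first, so that the Canticum book node itself is
    # collected and the book node that follows Canticum is not
    if F.otype.v(n) == 'book':
        cur_book = F.book.v(n)

    if cur_book in corpus:
        nodes.append(n)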
        
msg('{} nodes appended'.format(len(nodes)))

    59s 6020 nodes appended


In [9]:
filename = 'BHQchapter_1'


import csv
sticho_dict = collections.OrderedDict()
type_codes = {"word":"W", "phrase_atom":"Pa", "clause_atom":"Ca", "sentence_atom":"Sa", "subphrase":"Sp", 
              "phrase":"P","clause":"C","sentence":"S","half_verse":"Hv", "verse": "V"}

#each csv row: first monad, last monad, and three further columns that are kept as-is
with open('{}.csv'.format(filename), 'r') as csvfile2:
    readit = csv.reader(csvfile2)
    for line in readit:
        sticho_dict[int(line[0])] = [int(line[1]), line[2], line[3], line[4]]

#(first monad, last monad) for every colon
monad_pairs = [(key, sticho_dict[key][0]) for key in sticho_dict]

sysct = 0
for first, last in monad_pairs:
    #monads string of an object that spans exactly this colon
    lookup = '{}-{}'.format(first, last)

    #collects every Song of Songs node whose monads coincide with the colon
    lookedup = [node for node in nodes if F.monads.v(node) == lookup]

    #abbreviates the matching object types; "N" if nothing matches
    codes = [type_codes[F.otype.v(n)] for n in lookedup if F.otype.v(n) in type_codes]
    otype_code = ".".join(codes) if codes else "N"

    sticho_dict[first].append(otype_code)
    sysct += 1
    msg("( {}/{} )  ({})".format(sysct,len(sticho_dict),otype_code))

 4m 51s ( 1/48 )  (V.Hv.S.Sa)
 4m 51s ( 2/48 )  (S.Sa.C.Ca)
 4m 51s ( 3/48 )  (S.Sa.C.Ca)
 4m 51s ( 4/48 )  (S.Sa.C.Ca)
 4m 51s ( 5/48 )  (S.Sa)
 4m 51s ( 6/48 )  (Hv.S.Sa.C.Ca)
 4m 51s ( 7/48 )  (Hv)
 4m 51s ( 8/48 )  (S.Sa.C.Ca)
 4m 51s ( 9/48 )  (N)
 4m 51s ( 10/48 )  (S.Sa.C.Ca)
 4m 51s ( 11/48 )  (S.Sa.C.Ca)
 4m 51s ( 12/48 )  (N)
 4m 51s ( 13/48 )  (C.Ca.P.Pa)
 4m 51s ( 14/48 )  (Pa)
 4m 51s ( 15/48 )  (Pa)
 4m 51s ( 16/48 )  (N)
 4m 51s ( 17/48 )  (C.Ca)
 4m 51s ( 18/48 )  (S.Sa.C.Ca)
 4m 51s ( 19/48 )  (S.Sa)
 4m 51s ( 20/48 )  (S.Sa)
 4m 51s ( 21/48 )  (N)
 4m 51s ( 22/48 )  (C.Ca)
 4m 51s ( 23/48 )  (C.Ca)
 4m 51s ( 24/48 )  (N)
 4m 51s ( 25/48 )  (P.Pa)
 4m 51s ( 26/48 )  (C.Ca)
 4m 51s ( 27/48 )  (C.Ca)
 4m 51s ( 28/48 )  (S.Sa.C.Ca)
 4m 51s ( 29/48 )  (N)
 4m 51s ( 30/48 )  (P.Pa)
 4m 51s ( 31/48 )  (P)
 4m 51s ( 32/48 )  (N)
 4m 51s ( 33/48 )  (C.Ca)
 4m 51s ( 34/48 )  (C.Ca)
 4m 51s ( 35/48 )  (N)
 4m 51s ( 36/48 )  (P.Pa)
 4m 51s ( 37/48 )  (C.Ca)
 4m 51s ( 38/48 )  (C.
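
The loop above rescans every Song of Songs node for each colon. One alternative, sketched here on made-up stand-in data rather than the ETCBC4b database, is to group the nodes by their monads string once and then look each colon up directly. The names `build_span_index` and `SAMPLE_NODES` and all sample values are hypothetical; in this notebook the pairs would come from `F.otype.v(n)` and `F.monads.v(n)`.

```python
from collections import defaultdict

# Made-up (otype, monads) pairs standing in for database nodes.
SAMPLE_NODES = [
    ("sentence", "100-103"),
    ("clause", "100-103"),
    ("phrase", "100-101"),
    ("word", "100"),
]


def build_span_index(nodes):
    """Map each monads string to the object types that have exactly that span."""
    index = defaultdict(list)
    for otype, monads in nodes:
        index[monads].append(otype)
    return index


span_index = build_span_index(SAMPLE_NODES)
print(span_index.get("100-103", []))  # object types spanning exactly 100-103
print(span_index.get("102-103", []))  # no object has exactly this span
```

Building the index takes one pass over the nodes, after which each colon costs a single dictionary lookup.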

In [10]:
with open('{}_coded.csv'.format(filename), 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for key, row in sticho_dict.items():
        #column order: row[1], first monad, last monad, row[2], row[3], syntax code
        writer.writerow([row[1], key, row[0], row[2], row[3], row[4]])
msg("job finished! file saved.")

 4m 56s job finished! file saved.
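
The read, search, and write steps above handle one chapter at a time. A rough sketch of how they might be looped over several chapters in one run is given here. The function `code_chapters` and its `classify` argument are hypothetical, the file names follow the `BHQchapter_1.csv` pattern, and the demonstration uses a temporary directory with made-up rows and a stand-in classifier instead of the database search.

```python
import csv
import os
import tempfile


def code_chapters(directory, prefix, classify, chapters=range(1, 9)):
    """Read each <prefix>chapter_<n>.csv and write <prefix>chapter_<n>_coded.csv.

    `classify` is a hypothetical callable that takes (first_monad, last_monad)
    and returns the syntax code for that colon.
    """
    written = []
    for n in chapters:
        source = os.path.join(directory, '{}chapter_{}.csv'.format(prefix, n))
        target = os.path.join(directory, '{}chapter_{}_coded.csv'.format(prefix, n))
        with open(source, newline='') as infile, open(target, 'w', newline='') as outfile:
            writer = csv.writer(outfile)
            for row in csv.reader(infile):
                # keep the input columns and append the code as the last column
                writer.writerow(row + [classify(int(row[0]), int(row[1]))])
        written.append(target)
    return written


# Demonstration with two made-up one-row chapters and a dummy classifier.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2):
        with open(os.path.join(tmp, 'BHQchapter_{}.csv'.format(n)), 'w', newline='') as f:
            csv.writer(f).writerow([10 * n, 10 * n + 3, 'ref', 'ab', 'acc'])
    paths = code_chapters(tmp, 'BHQ', lambda first, last: 'N', chapters=(1, 2))
    with open(paths[0], newline='') as f:
        first_rows = list(csv.reader(f))

print(first_rows)
```

Unlike the cell above, this sketch keeps the input column order and only appends the code, so the columns would need reordering to match the coded files produced by this notebook.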


In [12]:
#Some examples of cola that carry no formal syntactic correspondence

total = 0
totalN = 0
types = []
accents = []
for key in sticho_dict:
    total += 1
    if sticho_dict[key][4] == "N":
        types.append(sticho_dict[key][2])
        accents.append(sticho_dict[key][3])
        totalN += 1

print("total stichoi = " + str(total))
print("total none = " + str(totalN))
print(str(round(totalN / total * 100, 2)) + "%")
print(types)

total stichoi = 48
total none = 11
22.92%
['ab', 'ab', 'ab', 'abc', 'ab', 'abc', 'ab', 'ab', 'ab', 'ab', 'ab']
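
To summarize the list above, the colon types of the "N" cola can be tallied with `collections.Counter`. This is a small sketch that copies the values from the output of this cell; the variable names are illustrative only.

```python
from collections import Counter

# Colon types of the "N" cola, copied from the output above.
none_types = ['ab', 'ab', 'ab', 'abc', 'ab', 'abc', 'ab', 'ab', 'ab', 'ab', 'ab']
total_cola = 48  # total stichoi reported above

type_counts = Counter(none_types)
share = round(len(none_types) / total_cola * 100, 2)

print(type_counts.most_common())
print('{}%'.format(share))
```

In this chapter, 9 of the 11 "N" cola are of type 'ab' and 2 are of type 'abc'.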
