<p style="text-align:center;font-size:30px;font-weight:bold">Clause Syntax in the Song of Songs:<br><br> A Preliminary Study</p>
<br>
<br>
# Song of Songs Clause Statistics
<strong>Purpose of this search:</strong>
<br>
<br>
This search produces clause statistics on mainline clauses for the Song of Songs. The clause use statistics are calculated both for the book as a whole and chapter by chapter. Additional statistics are calculated such as standard deviation for different clause types, as well as the coefficient of variation, to measure clause use consistency.
<br>
<br>
For the purpose of this study, "mainline" clauses are those which are not adverbial or adjectival. Clauses such as vocatives, ellipses, casus pendens, and defective are excluded. Also, clauses stored as "AjCl" in ETCBC are considered nominal clauses for the purpose here.
<br>
<br>
In this notebook, statistics are printed to the console. However, simple modifications to the search could write results to a CSV, txt file, or HTML display.

In [1]:
#imports the modules needed for the search

import sys
import collections

#imports the LAF-Fabric modules

from laf.fabric import LafFabric
from etcbc.preprocess import prepare
fabric = LafFabric(verbose = '')

  0.00s This is LAF-Fabric 4.5.21
API reference: http://laf-fabric.readthedocs.org/en/latest/texts/API-reference.html
Feature doc: https://shebanq.ancient-data.org/static/docs/featuredoc/texts/welcome.html



In [2]:
#loads the data into the processor; "features" refers to types of data stored in the ETCBC database. 
#The various features can be accessed at 
# https://shebanq.ancient-data.org/shebanq/static/docs/featuredoc/features/comments/0_overview.html

API = fabric.load('etcbc4b','--','clause_stats',

                  {'primary': False,
                   'xmlids':{'node':False,'edge':False},
                   'features':('book chapter verse otype code typ txt monads',''),
                   'prepare': prepare
        
                    }
)
exec(fabric.localnames.format(var='fabric'))

  0.00s LOADING API: please wait ... 
  0.03s USING main  DATA COMPILED AT: 2015-11-02T15-08-56
  4.00s LOGFILE=/Users/Cody/laf-fabric-output/etcbc4b/clause_stats/__log__clause_stats.txt
  4.00s INFO: LOADING PREPARED data: please wait ... 
  4.00s prep prep: G.node_sort
  4.20s prep prep: G.node_sort_inv
  4.73s prep prep: L.node_up
  8.36s prep prep: L.node_down
    15s prep prep: V.verses
    15s prep prep: V.books_la
    15s ETCBC reference: http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
    17s INFO: LOADED PREPARED data
    17s INFO: DATA LOADED FROM SOURCE etcbc4b AND ANNOX lexicon FOR TASK clause_stats AT 2016-04-13T14-13-41


<strong>Functions:</strong>

In [14]:
#retrieves book, chapter, or verse information for a given node
#uses the "look up" function provided by the ETCBC package of modules
#see http://laf-fabric.readthedocs.org/en/latest/texts/ETCBC-reference.html
def get_ref(ref_type,node):
    if ref_type == 'verse':
        verse = L.u('verse',node)
        return F.verse.v(verse)
    elif ref_type == 'chapter':
        chapter = L.u('chapter',node)
        return F.chapter.v(chapter)
    elif ref_type == 'book':
        book = L.u('book',node)
        return F.book.v(book)
    
#returns a percentage between a total and a frequency
def get_percent(total,freq):
    return str(round((float(freq / total)*100),2))+'%'

#receives a clause_atom node, checks its code and returns only mainline clause types
#for this search, mainline is defined as asyndetic, parallel, coordinate with ו or או and direct speech (999)
def code_filter(node):
    
    code = int(F.code.v(node))
    
    #asyndetic

    if 100 <= code <= 167:
        return True
    
    #parallel
    elif 200 <= code <= 201:
        return True
    
    #asyndetic with conj.
    elif 300 <= code <= 367:
        return True
    
    #syndetic
    elif 400 <= code <= 487:
        return True
    
    #first cl in direct speech
    elif code == 999:
        return True
    
#provides a simplified identifier for clause types
def simple_type(typ):
    
    simple_type = None
    
    simplified = {'way' : ["Way0", "WayX"],
                  'nmcl' : ["NmCl","AjCl"],
                  'ptc' : ["Ptcp"],
                  'qtl' : ["WQt0", "WQtX", "WxQ0", "WXQt", "WxQX", "xQt0", "XQtl", "xQtX", "ZQt0", "ZQtX"],
                  'yqtl' : ["WxY0", "WXYq", "WxYX", "WYq0", "WYqX", "xYq0", "XYqt", "xYqX", "ZYq0", "ZYqX"],
                  'impv' : ["WIm0","WImX", "WxI0", "WXIm", "WxIX", "xIm0", "XImp", "xImX", "ZIm0", "ZImX"],
                 }

    for key in simplified:
        if typ in simplified[key]:
            simple_type = key
            
    return simple_type

<strong>Data retrieval:</strong>

In [15]:
#indexes all nodes in the Song of Songs to facilitate faster searches
#SongofSongs nodes are stored in "nodes" list

corpus = ["Canticum"]

cur_book = None
nodes = []

for n in NN():
    if cur_book in corpus:
        nodes.append(n)
    
    if F.otype.v(n) == 'book':
        cur_book = F.book.v(n)
        
msg('{} nodes appended'.format(len(nodes)))

 5m 16s 6020 nodes appended


In [16]:
#retrieves the last relevant clause atom node in the Song of Songs
#this node will provide a reference point for the chapter by chapter search below 
#when the loop hits this node, the code will assemble the data for the final chapter (since there is no chapter marker
#to trigger the final compilation)

#last_node is reset for each clause atom until it loops through all nodes in Song of Song
#this leaves the last node as the variable last_node
last_node = None

for n in nodes:
    otype = F.otype.v(n)
    if otype == 'clause_atom' and code_filter(n) == True and simple_type(F.typ.v(n)) != None:
        last_node = n

In [17]:
#current chapter
cur_chapter = None

#if chapter_check does not equal cur_chapter, it initiates the compilation of occurrences for the chapter 
#resets to cur_chapter after compilation
chapter_check = None


ch_clauses = []
occurrences = collections.OrderedDict([('total',0)])
results = collections.OrderedDict([])

for n in nodes:
    
    #assigns otype feature to a variable
    otype = F.otype.v(n)
    
    if otype == 'chapter':
        
        #sets current chapter
        cur_chapter = F.chapter.v(n)
        
        #chapter_check is called for each clause atom; sets to current chapter if loop has just started 
        if chapter_check == None:
            chapter_check = F.chapter.v(n)
    
    if otype == 'clause_atom':
        
        #checks to see if clause atom is a mainline clause
        if code_filter(n) == True and simple_type(F.typ.v(n)) != None:
            
            #converts clause code to simplified form (i.e. qtl, yqtl, etc.)
            clause = simple_type(F.typ.v(n))
            
            #has the chapter changed or the book ended? if same chapter, append clause to ch_clauses list
            #ch_clauses acts as a staging area until the chapter changes 
            if get_ref('chapter',n) == chapter_check and n != last_node:
                ch_clauses.append(clause)
            
            #if chapter is different or the node is the last node
            else:
                
                #checks to see if this is the last node; appends clause if true
                if n == last_node:
                    ch_clauses.append(clause)
        
                #assembles frequency data for the chapter
                for clause in ch_clauses:
                    occurrences['total'] += 1
                    if clause in occurrences:
                        occurrences[clause] += 1
                    else:
                        occurrences[clause] = 1
                        
                #orders occurrences from greatest to least
                occurrences = collections.OrderedDict(sorted(occurrences.items(),key=lambda t: t[1],reverse = True))
                
                #adds any missing clause types into occurrences with '0'
                types = ['nmcl','qtl','yqtl','impv','ptc','way']
                for typ in types:
                    if typ not in occurrences:
                        occurrences[typ] = 0
                
                #drops clause occurrences for the chapter into results
                for key in occurrences:
                    if chapter_check in results:
                        results[chapter_check].append([key,occurrences[key]])
                    else:
                        results[chapter_check] = [[key,occurrences[key]]]
                
                #resets variables for next chapter
                chapter_check = cur_chapter
                ch_clauses = []
                occurrences = collections.OrderedDict([('total',0)])
                
                #logs data for current chapter
                ch_clauses.append(clause)
                
msg(str(len(results))+' chapters logged')

 5m 17s 8 chapters logged


<strong>Now the clause data for each chapter is assembled. Time to present the results:</strong>

In [18]:
#assembles occurrences for the whole book
book_occ = collections.OrderedDict([('total',0)])
for chapter in results:
    for typ in results[chapter]:
        if typ[0] == 'total':
            book_occ['total'] += typ[1]
        else:
            if typ[0] in book_occ:
                book_occ[typ[0]] += typ[1]
            else:
                book_occ[typ[0]] = typ[1]
                
#orders occurrences from highest to lowest
book_occ = collections.OrderedDict(sorted(book_occ.items(),key=lambda t: t[1],reverse = True))

#prints results for whole book
print ('Song of Songs')
print ('-'*10)
counter = 0
for typ in book_occ:
    total = book_occ['total']
    if typ != 'total':
        counter += 1
        print ('{} {:>5}   {:>5}   {:>5}'.format(counter,typ, book_occ[typ],get_percent(total,book_occ[typ])))
print ('\n')

#prints results by chapter
for chapter in results:
    counter = -1
    
    ch_stats = collections.OrderedDict()
    print ('chapter ' + chapter)
    print ('-'*10)
    total = 0
    for typ in results[chapter]:
        if typ[0] == 'total':
            total = typ[1]
    for typ in results[chapter]:
        counter += 1
        if typ[0] != 'total':
            print ('{} {:>5}   {:>5}   {:>5}'.format(counter,typ[0], typ[1],get_percent(total,typ[1])))
        
    print ('\n')
    
#of course, results might also be written to a CSV or txt file or displayed with HTML

Song of Songs
----------
1  nmcl     120   36.59%
2   qtl      83   25.3%
3  yqtl      58   17.68%
4  impv      34   10.37%
5   ptc      31   9.45%
6   way       2   0.61%


chapter 1
----------
1  nmcl      14   36.84%
2  yqtl      11   28.95%
3   qtl       8   21.05%
4  impv       4   10.53%
5   ptc       1   2.63%
6   way       0    0.0%


chapter 2
----------
1   qtl      14   29.79%
2  nmcl      12   25.53%
3  impv      11   23.4%
4   ptc       8   17.02%
5  yqtl       2   4.26%
6   way       0    0.0%


chapter 3
----------
1   qtl       9   34.62%
2  nmcl       7   26.92%
3  yqtl       5   19.23%
4  impv       3   11.54%
5   ptc       2   7.69%
6   way       0    0.0%


chapter 4
----------
1  nmcl      25   58.14%
2  yqtl       7   16.28%
3   qtl       5   11.63%
4  impv       4    9.3%
5   ptc       2   4.65%
6   way       0    0.0%


chapter 5
----------
1   qtl      21   35.0%
2  nmcl      21   35.0%
3   ptc      10   16.67%
4  yqtl       4   6.67%
5  impv       4   6.67%
6 

# Standard Deviation

In [19]:
import math

#redefines get_percent so that it does not round, convert to string, and add a % sign
def get_percent2(total1,total2):
    return (float(total2 / total)*100)

In [20]:
#pairs clause types (i.e., qtl, yqtl, etc.) with their chapter by chapter percentages so that the mean 
#can be calculated for the standard deviation

#type_percents contains clause types as keys
#each clause type key has a list as a value with chapter by chapter percentages
#ex: 'nmcl': [36.84,25.53,26.92,58.14,35.00,63.33,29.27,23.26]
#thus each value contains 8 percentages for all 8 chapters in the Song
type_percents = collections.OrderedDict()

#calculates and appends chapter by chapter percentages
for chapter in results:
    total = 0
    
    #retrieves total for chapter so that percent can be calculated
    for typ in results[chapter]:
        if typ[0] == 'total': 
            total = typ[1]
            
    #retrieves total for clause types in the chapter and converts to percentage
    for typ in results[chapter]:
        if typ[0] in type_percents:
            type_percents[typ[0]].append(get_percent2(total,typ[1]))
        elif typ[0] != 'total':
            type_percents[typ[0]] = [get_percent2(total,typ[1])]

In [21]:
#calculates the mean for each clause type in type_percents:

#type_mean pairs clause types with the mean percentage in the Song
type_mean = collections.OrderedDict()

for typ in type_percents:
    total = 0.0
    for percent in type_percents[typ]:
        total += percent
    mean = total / 8.0 #(number of chapters)
    type_mean[typ] = mean

In [22]:
#calculates variance for each clause type

type_variance = collections.OrderedDict()

for typ in type_percents:
    pre_variance = 0.0
    for percent in type_percents[typ]:
        pre_variance += (percent-type_mean[typ])**2
    
    variance = pre_variance/8.0 #(number of chapters)
    
    type_variance[typ] = variance

<strong> Now ready to print standard deviation:</strong>

In [23]:
#calculates and prints standard deviation for each clause type

type_stnd_dev = collections.OrderedDict()

for typ in type_variance:
    type_stnd_dev[typ] = math.sqrt(type_variance[typ])

print('Standard Deviations:')
for typ in type_stnd_dev:
    print ("{0:<5}{1:>5}".format(typ,round(type_stnd_dev[typ],2)))

Standard Deviations:
nmcl 14.25
yqtl 12.24
qtl   7.33
impv  5.54
ptc   5.89
way    2.2


<strong>Standard deviation allows for the coefficient of variation to be printed next:</strong>

In [24]:
print('Coefficients of Variation:')
for typ in type_stnd_dev:
    percent = float(book_occ[typ])/float(book_occ['total'])
    coeff_var = round(type_stnd_dev[typ]/percent,2)
    print ("{0:<5}{1:>5}".format(typ,coeff_var))

Coefficients of Variation:
nmcl 38.96
yqtl 69.21
qtl  28.95
impv 53.47
ptc  62.31
way  361.59


# Clause Type Inventory
<br>
While the previous searches simplified clause type by the main verb, what kind of specific clause types are in the Song of Songs? This search inventories all clause types with their four-letter clause consitutent codes to answer this question.

In [99]:
clause_inventory = collections.OrderedDict()
total = 0

#We include the adjective clause types from ETCBC with the nominal clause
#Every clause type is passed through this filter, returns normal type code if not AjCl
def adj_nmcl(typ):
    if typ == "AjCl":
        return "NmCl"
    else:
        return typ

for n in nodes:
    otype = F.otype.v(n)
    
    if otype == 'clause_atom' and code_filter(n) == True and simple_type(F.typ.v(n)) != None:
        total += 1
        typ = adj_nmcl(F.typ.v(n))
        if typ in clause_inventory:
            clause_inventory[typ] += 1
        else:
            clause_inventory[typ] = 1

clause_inventory = collections.OrderedDict(sorted(clause_inventory.items(),key=lambda t: t[1],reverse = True))            

msg(str(total)+' clauses logged')

for typ in clause_inventory:
    print (typ, clause_inventory[typ],get_percent(total,clause_inventory[typ]))

 4h 00m 12s 328 clauses logged


NmCl 120 36.59%
Ptcp 32 9.76%
ZQt0 27 8.23%
ZIm0 26 7.93%
ZYq0 22 6.71%
xYq0 16 4.88%
XQtl 14 4.27%
ZQtX 12 3.66%
xQt0 10 3.05%
WIm0 7 2.13%
xQtX 7 2.13%
WYq0 5 1.52%
WXQt 5 1.52%
WxQ0 5 1.52%
WxY0 4 1.22%
WXYq 3 0.91%
WQt0 2 0.61%
WQtX 2 0.61%
xYqX 2 0.61%
ZYqX 2 0.61%
Way0 2 0.61%
XYqt 2 0.61%
WYqX 1 0.3%


In [100]:
#calculate the mean of absolute difference

statistics = []
differences = []

count = 0
compare = None

for clause in clause_inventory:
    statistics.append(get_percent2(total,clause_inventory[clause]))

for number in statistics:
    count += 1
    if count % 2 == 0:
        differences.append(compare-number)
    else:
        compare = number

print(sum(differences)/len(differences))

2.8547671840354765


<strong>How do these statistics compare with Esther, a narrative book?</strong>

In [101]:
esther_clause_inventory = collections.OrderedDict()
esther_total = 0

#iterates through all nodes stored in ETCBC to retrieve Esther clause data
for n in NN():
    if get_ref('book',n) == "Esther":
        otype = F.otype.v(n)
        if otype == 'clause_atom' and code_filter(n) == True and simple_type(F.typ.v(n)) != None:
            esther_total += 1
            typ = adj_nmcl(F.typ.v(n))
            if typ in esther_clause_inventory:
                esther_clause_inventory[typ] += 1
            else:
                esther_clause_inventory[typ] = 1

esther_clause_inventory = collections.OrderedDict(sorted(esther_clause_inventory.items(),key=lambda t: t[1],reverse = True)) 

msg(str(esther_total)+" clauses logged.")

for typ in esther_clause_inventory:
    print (typ, esther_clause_inventory[typ],get_percent(esther_total,esther_clause_inventory[typ]))

 4h 00m 19s 387 clauses logged.


WayX 89 23.0%
Way0 66 17.05%
NmCl 56 14.47%
Ptcp 43 11.11%
WXQt 19 4.91%
WxQ0 14 3.62%
WYq0 10 2.58%
WxY0 10 2.58%
WQt0 10 2.58%
XQtl 7 1.81%
xQtX 7 1.81%
ZYq0 7 1.81%
WXYq 6 1.55%
xYq0 6 1.55%
ZIm0 6 1.55%
ZYqX 5 1.29%
xQt0 5 1.29%
WQtX 4 1.03%
WIm0 4 1.03%
WxYX 2 0.52%
WxQX 2 0.52%
xYqX 2 0.52%
XYqt 2 0.52%
ZQt0 2 0.52%
WYqX 1 0.26%
WxI0 1 0.26%
WXIm 1 0.26%


In [102]:
#calculate the mean of absolute difference

est_statistics = []
est_differences = []

est_count = 0
est_compare = None

for clause in esther_clause_inventory:
    est_statistics.append(get_percent2(esther_total,esther_clause_inventory[clause]))

for number in est_statistics:
    est_count += 1
    if est_count % 2 == 0:
        est_differences.append(est_compare-number)
    else:
        est_compare = number

print(sum(est_differences)/len(est_differences))

1.125703564727955
