# Stichometry Versus Syntax in the Song of Songs

This notebook compares the stichometric boundaries (cola) of the Song of Songs, as given in BHQ and in Roberts's 2001 dissertation, "Let Me See Your Form," against the clause and other syntactic boundaries encoded in ETCBC4b.

The stichometry data were gathered for my master's thesis, "Clause Syntax in the Song of Songs: A Preliminary Study" (Southeastern Baptist Theological Seminary, 2016).

### To Do / Ideas
* Export results and data to a spreadsheet so that non-programmers can still validate my findings for themselves.
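For the export item above, a minimal sketch might flatten the match results into one CSV row per slot range. It assumes the shape of `matched_objects` built later in this notebook (source, then object type, then a set of `'first,last'` strings). The helper name `export_matches`, the column headings, and the toy slot numbers are hypothetical.

```python
import csv
import os
import tempfile


def export_matches(matched_objects, path):
    """Write one row per (source, otype, slot range) to a CSV file.

    Hypothetical helper: matched_objects is assumed to map
    source -> otype -> set of 'first,last' slot-range strings.
    """
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['source', 'otype', 'first_slot', 'last_slot'])
        for source, by_otype in sorted(matched_objects.items()):
            for otype, ranges in sorted(by_otype.items()):
                # sort ranges numerically by first slot, then last slot
                pairs = sorted(tuple(map(int, r.split(','))) for r in ranges)
                for first, last in pairs:
                    writer.writerow([source, otype, first, last])


# toy data in the same shape as matched_objects (slot numbers are made up)
example = {'bhq': {'clause': {'10,14', '2,5'}, 'no_match': {'6,9'}}}
path = os.path.join(tempfile.mkdtemp(), 'matches.csv')
export_matches(example, path)
```

A spreadsheet program can open the resulting file directly, which keeps the check open to readers who do not run Python.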

In [131]:
import csv
import collections

### Load Legacy Data

We load the legacy ETCBC4b dataset, since its word slot numbers match the cola slot ranges recorded in my 2016 thesis.

In [20]:
# initialize Text-Fabric and load the ETCBC4b data

from tf.fabric import Fabric

TF = Fabric(locations='/Users/Cody/github/text-fabric-data-legacy',
            modules='hebrew/etcbc4b')

api = TF.load('''
                book
                oslots
              ''')

api.makeAvailableIn(globals())

This is Text-Fabric 2.3.6
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask shebanq@ancient-data.org for an invite to Slack
111 features found and 0 ignored
  0.00s loading features ...
   |     1.35s B oslots               from /Users/Cody/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.02s B book                 from /Users/Cody/github/text-fabric-data-legacy/hebrew/etcbc4b
   |     0.00s Feature overview: 105 for nodes; 5 for edges; 1 configs; 7 computed
    10s All features loaded/computed - for details use loadLog()


### Preliminary Processing


In [94]:
# Get the book node for the Song of Songs ('Canticum' in the ETCBC data)
song_node = next(book for book in F.book.s('Canticum'))

In [103]:
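# Workaround (possibly temporary): derive each object's slot range from
# its first and last word instead of reading oslots directly

def get_oslots(object_type, corpus):
    '''
    Gather the slot ranges of every object of a given type in a corpus.

    Takes an otype string (e.g. 'clause') and a node integer (here, the
    book node). Returns a set of 'first,last' strings, where first and
    last are the object's first and last word slots.
    '''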

    oslots = set()

    objects = L.d(corpus, otype=object_type)
    
    # add slot ranges to set as string
    for obj in objects:
        slots = L.d(obj, otype='word')
        oslots.add('{},{}'.format(slots[0], slots[-1]))
        
    return oslots
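Because `get_oslots` keeps only an object's first and last word slot, any gap inside an object is invisible in its key. The sketch below isolates that reduction with hypothetical slot numbers; if an object is interrupted by other material, its key spans the interruption. The helper name `slot_range_key` is illustrative only.

```python
def slot_range_key(slots):
    """Reduce a sequence of word slots to a 'first,last' key.

    Uses the same string format as get_oslots; only the endpoints
    are kept, so gaps inside the object are ignored.
    """
    ordered = sorted(slots)
    return '{},{}'.format(ordered[0], ordered[-1])


contiguous = [101, 102, 103]
# a hypothetical object interrupted at slots 104-105
interrupted = [101, 102, 103, 106, 107]

print(slot_range_key(contiguous))   # 101,103
print(slot_range_key(interrupted))  # 101,107
```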

### Gather the slot ranges of every linguistic object in the Song of Songs

These ranges will be compared with the cola slot ranges. A colon may coincide with a clause_atom, a sentence, a phrase, or another object type, possibly with more than one at once.
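Since several object types can share the same extent, it can help to invert the mapping and list, for each range, every otype that has it. A minimal sketch with hypothetical ranges: `index_by_range` is an illustrative name, and `toy_oslots` only imitates the shape of the dictionary built below.

```python
from collections import defaultdict


def index_by_range(oslots_by_otype):
    """Map each 'first,last' range to the list of otypes that have it."""
    index = defaultdict(list)
    for otype, ranges in oslots_by_otype.items():
        for slot_range in ranges:
            index[slot_range].append(otype)
    return index


# hypothetical ranges, not taken from the Song of Songs data
toy_oslots = {
    'clause': {'1,4', '5,9'},
    'clause_atom': {'1,4', '5,7'},
    'phrase': {'1,2', '3,4', '5,7'},
}
index = index_by_range(toy_oslots)
print(index['1,4'])  # ['clause', 'clause_atom']
print(index['5,7'])  # ['clause_atom', 'phrase']
```

Looking up a colon's range in such an index would show all of the levels it agrees with in one step.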

In [106]:
# do not gather slot ranges for these objects
omit_objects = {'book', 'chapter', 'word'}

# get all otypes in the database except the omitted ones
all_otypes = tuple(otype for otype in F.otype.all
                      if otype not in omit_objects)


# map otypes to a set of their oslot ranges
oslots_by_otype = collections.OrderedDict()

for otype in all_otypes:
    oslots = get_oslots(otype, song_node)
    
    # add to dict
    oslots_by_otype[otype] = oslots
    
# count the oslot ranges across all otypes
oslot_range_count = sum(len(oslots) for oslots in oslots_by_otype.values())
# show results
print(oslot_range_count, 'oslot ranges loaded')
print('objects loaded: ', ', '.join(oslots_by_otype.keys()))

4329 oslot ranges loaded
objects loaded:  verse, half_verse, sentence, sentence_atom, clause, clause_atom, phrase, phrase_atom, subphrase


### Gather all the oslot ranges for the cola in the BHQ and Roberts sources

In [107]:
# draw in the stichometry data for both Roberts and BHQ

sticho_sources = {'bhq':'BhqStichometry.csv', 
                  'roberts':'RobertsStichometry.csv'}

# sources as keys; slot ranges as values stored as set
sticho_slots = {}

# open each source file and get slot ranges
for source, file in sticho_sources.items():
    with open(file) as infile:
        reader = csv.reader(infile)
        
        # get a set of all slot ranges 
        slots = set('{},{}'.format(line[1], line[2])
                        for line in reader)
        
        # save to dict keyed by source name
        sticho_slots[source] = slots
            
# report the number of distinct slot ranges per source
for source, slots in sticho_slots.items():
    print(len(slots), 'slots loaded for', source)

421 slots loaded for roberts
401 slots loaded for bhq
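The cell above assumes that columns 1 and 2 of each CSV row hold a colon's first and last slot. Because the ranges go into a set, a repeated row is counted once. If a file ever began with a header row, that row would be added as a range string too. A defensive variant, under the same column assumption, with a hypothetical helper name (`read_cola`) and made-up sample rows:

```python
import csv
import io


def read_cola(infile):
    """Return the set of 'first,last' slot ranges in a stichometry CSV.

    Assumes, as the cell above does, that column 1 holds the first slot
    and column 2 the last slot of each colon. Rows whose slot columns
    are not integers (such as a header row) are skipped.
    """
    ranges = set()
    for row in csv.reader(infile):
        try:
            first, last = int(row[1]), int(row[2])
        except (IndexError, ValueError):
            continue  # header or malformed row
        ranges.add('{},{}'.format(first, last))
    return ranges


# hypothetical sample: a header, two cola, and a duplicated row
sample = io.StringIO('ref,first,last\n1:1a,1,4\n1:1b,5,9\n1:1b,5,9\n')
print(sorted(read_cola(sample)))  # ['1,4', '5,9']
```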


## Gather Statistics

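We compare both sources against the boundaries of every object type gathered above, not only clauses. A colon matches an object type when its first and last slots are identical to those of at least one object of that type; cola that match no object type are collected under `no_match`.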

In [126]:
# dict of dicts with sources as keys, match dicts as values
# match dicts contain otype keys with slot sets as values
matched_objects = collections.defaultdict(dict)

# find matches and store to matched_objects
for source, source_slots in sticho_slots.items():
    
    # find matches and save as sets
    for otype, otype_slots in oslots_by_otype.items():
        matches = source_slots & otype_slots
        matched_objects[source][otype] = matches
    
    # identify and store slot ranges with no matches
    all_matches = set(match for otype in matched_objects[source]
                        for match in matched_objects[source][otype])
    matched_objects[source]['no_match'] = source_slots - all_matches
    
    # save all slots for percentage calcs
    matched_objects[source]['all_cola'] = source_slots
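The set logic of the cell above can be checked on a small scale. This sketch repackages it as a hypothetical helper, `match_cola`, and runs it on made-up ranges. One colon can match several object types at once, so the per-otype counts overlap and need not add up to the total.

```python
def match_cola(cola, oslots_by_otype):
    """Match colon ranges against each otype's ranges by exact endpoints."""
    matches = {otype: cola & ranges
               for otype, ranges in oslots_by_otype.items()}
    # cola that matched no otype at all
    matches['no_match'] = cola - set().union(*matches.values())
    matches['all_cola'] = cola
    return matches


# hypothetical cola and object ranges
toy_cola = {'1,4', '5,7', '6,10'}
toy_oslots = {
    'clause': {'1,4', '5,10'},
    'clause_atom': {'1,4', '5,7', '8,10'},
    'phrase': {'1,2', '3,4', '5,7'},
}
result = match_cola(toy_cola, toy_oslots)
print({otype: len(r) for otype, r in result.items()})
# {'clause': 1, 'clause_atom': 2, 'phrase': 1, 'no_match': 1, 'all_cola': 3}
```

Here two cola produce four per-otype matches, which is why the counts in the results below cannot be read as a partition of the cola.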

#### Present basic results

In [130]:
for source, match_data in matched_objects.items():

    # count the matches for each otype
    match_counts = [(otype, len(matches))
                    for otype, matches in match_data.items()]

    # order by count, largest first
    ordered = sorted(match_counts, key=lambda s: s[1], reverse=True)

    # print results
    print('**', source, '**')
    for otype, count in ordered:
        print(otype, ' -- ', count)

    print('\n')

** roberts **
all_cola  --  421
clause_atom  --  250
clause  --  248
sentence  --  139
sentence_atom  --  139
no_match  --  87
phrase_atom  --  56
phrase  --  51
half_verse  --  35
subphrase  --  1
verse  --  1


** bhq **
all_cola  --  401
clause  --  233
clause_atom  --  230
sentence  --  140
sentence_atom  --  140
no_match  --  82
phrase_atom  --  50
half_verse  --  42
phrase  --  42
verse  --  1
subphrase  --  0
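The `all_cola` entry was stored for percentage calculations, which the notebook does not yet perform. A minimal sketch follows, using a subset of the counts printed above; `percentages` is a hypothetical helper. Because one colon can match several object types, these percentages overlap and are not expected to sum to 100.

```python
# counts copied from the output above (subset)
reported = {
    'roberts': {'all_cola': 421, 'clause_atom': 250, 'clause': 248,
                'sentence': 139, 'no_match': 87},
    'bhq': {'all_cola': 401, 'clause': 233, 'clause_atom': 230,
            'sentence': 140, 'no_match': 82},
}


def percentages(counts):
    """Express each otype count as a percentage of all cola."""
    total = counts['all_cola']
    return {otype: round(100 * n / total, 1)
            for otype, n in counts.items() if otype != 'all_cola'}


for source, counts in reported.items():
    print(source, percentages(counts))
```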


