# Classification of EBH and LBH clauses with LSTM

With this script, a model is trained on clauses from EBH (Early Biblical Hebrew) and LBH (Late Biblical Hebrew) books. The trained model classifies a clause as either EBH or LBH, and it is applied to clauses from Jonah, Ruth and the prose tale of Job.

The model uses a so-called Long Short-Term Memory network (LSTM network), a type of recurrent neural network that can learn complex patterns in sequence data.
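To make this concrete, here is a minimal NumPy sketch of one time step of a standard LSTM cell. It illustrates the general technique and is not the Keras implementation. The weight layout (gate blocks ordered input, forget, candidate, output) and the toy sizes are assumptions. The notebook's model passes `activation='relu'` to its LSTM layers. In Keras that activation replaces the tanh used here for the candidate and the output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input vector (d,); h_prev, c_prev: previous hidden and cell state (n,)
    W: (4n, d) input weights, U: (4n, n) recurrent weights, b: (4n,) bias,
    with the gate blocks ordered input, forget, candidate, output (an assumption).
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])            # input gate: how much new content to write
    f = sigmoid(z[n:2 * n])       # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * n:3 * n])   # candidate cell content
    o = sigmoid(z[3 * n:])        # output gate: how much of the cell state to expose
    c = f * c_prev + i * g        # new cell state (long-term memory)
    h = o * np.tanh(c)            # new hidden state, passed to the next step or layer
    return h, c

# Toy run: 5 embedded tokens of size 20 (like the embedding layer) through 8 units
rng = np.random.default_rng(0)
d, n = 20, 8
W = rng.normal(scale=0.1, size=(4 * n, d))
U = rng.normal(scale=0.1, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x in rng.normal(size=(5, d)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` lets the network carry information across a whole clause. At each step, the forget gate decides per unit how much of it survives.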

In this script, the model is trained on all EBH and LBH books except one, the validation book, on which the model is then validated. Every EBH and LBH book serves as the validation book in turn. For each validation book, the model is trained 200 times. This is needed because the results vary with the random sampling of the training clauses and the random initialization of the weights. The final result is the average over these 200 runs.
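The evaluation scheme can be sketched as follows. `train_and_predict` is a hypothetical stand-in for one complete run (sampling, training and predicting). Here it just returns a random fraction, so the sketch shows only the loop structure and the averaging, not the notebook's model.

```python
import random
from statistics import mean

EBH = ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy',
       'Joshua', 'Judges', 'Samuel', 'Kings']
LBH = ['Esther', 'Daniel', 'Ezra', 'Nehemiah', 'Chronicles']

def train_and_predict(train_books, val_book, rng):
    """Hypothetical stand-in for one run: train on train_books and return the
    fraction of val_book clauses classified as LBH (here: a random number)."""
    return rng.random()

def leave_one_book_out(n_runs=200, seed=0):
    rng = random.Random(seed)
    results = {}
    for val_book in EBH + LBH:
        # every book except the validation book is used for training
        train_books = [bo for bo in EBH + LBH if bo != val_book]
        runs = [train_and_predict(train_books, val_book, rng) for _ in range(n_runs)]
        # averaging over runs smooths out sampling and initialization noise
        results[val_book] = mean(runs)
    return results

averages = leave_one_book_out(n_runs=10)
```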

Finally, the clauses from Jonah, the prose tale of Job, and Ruth are classified as EBH or LBH, to find out whether the language of these texts is more similar to EBH or LBH.

It is possible to analyze data on phrase level or word level. In the phrase level analysis, the clause is represented as a sequence of phrase functions. In the word level analysis, the clause is represented as a sequence of parts of speech.

A distinction is made between narrative (N) and quoted speech (Q) clauses. Each run analyzes one of the two.

Choose what you want to analyze in the following cell.

In [1]:
# level should be 'phrase_level' or 'word_level'

level = 'phrase_level'

#txt_type should be 'Q' or 'N'

txt_type = 'N'

Import the relevant libraries.

In [2]:
import sys, os, csv, collections
import numpy as np
import pandas as pd
from pprint import pprint

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

from sklearn.model_selection import train_test_split

import dill as pickle

Start Text-Fabric (TF) and load the BHSA data.

In [3]:
from tf.app import use
A = use('bhsa', hoist=globals())

	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in C:\Users\geitb/text-fabric-data/annotation/app-bhsa/code:
	rv1.2=#5fdf1778d51d938bfe80b37b415e36618e50190c (latest release)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in C:\Users\geitb/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6=#bac4a9f5a2bbdede96ba6caea45e762fe88f88c5 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in C:\Users\geitb/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in C:\Users\geitb/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used


The EBH and LBH subcorpora are defined.

In [4]:
ebh = ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy', 'Joshua', 'Judges', 'Samuel', 'Kings']
lbh = ['Esther', 'Daniel', 'Ezra', 'Nehemiah', 'Chronicles']
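Book names such as `1_Samuel` and `2_Samuel`, as used in the functions below, are reduced to the part after the last underscore. That is why the notebook compares `T.bookName(cl).split('_')[-1]` against these lists, and why both halves of a divided book count as one book. Below is a small sketch of that labeling rule. `corpus_label` is a hypothetical helper and is not part of the notebook.

```python
EBH = ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy',
       'Joshua', 'Judges', 'Samuel', 'Kings']
LBH = ['Esther', 'Daniel', 'Ezra', 'Nehemiah', 'Chronicles']

def corpus_label(book_name):
    """Return 0 for EBH, 1 for LBH and None for any other book.

    '1_Samuel' and '2_Samuel' both reduce to 'Samuel' (likewise Kings and Chronicles)."""
    base = book_name.split('_')[-1]
    if base in EBH:
        return 0
    if base in LBH:
        return 1
    return None
```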

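The phrase functions (phrase level) or parts of speech (word level) of all Hebrew clauses of the selected text type (N or Q) are counted in the whole Hebrew Bible. The counts are stored in the dictionary func_count, which is later used to map these labels to integers.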

In [5]:
def count_elems(level, txt_type):
    
    elem_count = collections.defaultdict(int)
    
    for cl in F.otype.s('clause'):
        if F.txt.v(cl)[-1] != txt_type:
            continue
            
        words = L.d(cl, 'word')
        
        # select Hebrew data
        if F.language.v(words[0]) != 'Hebrew':
            continue
    
        if level == 'phrase_level':            
    
            phrases = L.d(cl, 'phrase')
        
            # a list of phrase functions is collected.
            funcs = [F.function.v(ph) for ph in phrases]

            for fun in funcs:
                elem_count[fun] += 1
                
        elif level == 'word_level':
            
            # a list of parts of speech is collected.
            poss = [F.sp.v(wo) for wo in words]
            
            for pos in poss:
                elem_count[pos] += 1
            
    return elem_count

In [6]:
def make_language_list(level, cl):
    
    if level == 'phrase_level':
        
        phrases = L.d(cl, 'phrase')
        elements = [F.function.v(ph) for ph in phrases] 
        
    
    elif level == 'word_level':
        
        words = L.d(cl, 'word')
        elements = [F.sp.v(wo) for wo in words]
        
    return elements
    
    

The training data are collected from all EBH and LBH books except the validation book. Only Hebrew clauses of the selected text type (N or Q) are used, and poetic passages are excluded.
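The same exclusions of poetic passages appear in both get_data_from_tf and prepare_val_book. One possible refactoring is to keep them in a single table. The sketch below is not used by the notebook. `POETIC` and `is_poetic` are hypothetical names, and the verse ranges are copied from the two functions.

```python
# Poetic passages to exclude, as (book, chapter, verse test) triples.
# Ranges copied from the if-chains in get_data_from_tf and prepare_val_book.
POETIC = [
    ('Genesis', 49, lambda ve: 1 < ve < 28),
    ('Exodus', 15, lambda ve: ve < 19),
    ('Numbers', 23, lambda ve: True),
    ('Numbers', 24, lambda ve: True),
    ('Deuteronomy', 32, lambda ve: True),
    ('Deuteronomy', 33, lambda ve: True),
    ('Judges', 5, lambda ve: True),
    ('1_Samuel', 2, lambda ve: ve < 11),
    ('2_Samuel', 1, lambda ve: ve > 18),
    ('2_Samuel', 22, lambda ve: True),
    ('2_Samuel', 23, lambda ve: ve < 8),
    ('Daniel', 2, lambda ve: 19 < ve < 24),
    ('Daniel', 8, lambda ve: 22 < ve < 27),
    ('Daniel', 12, lambda ve: ve < 4),
    ('Nehemiah', 9, lambda ve: 5 < ve < 38),
    ('1_Chronicles', 16, lambda ve: 7 < ve < 37),
]

def is_poetic(bo, ch, ve):
    """True if the verse (bo, ch, ve) lies in one of the excluded poetic passages."""
    return any(bo == b and ch == c and test(ve) for b, c, test in POETIC)
```

With this table, each if-chain would reduce to `if is_poetic(bo, ch, ve): continue`.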

In [7]:
def get_data_from_tf(val_book, level, txt_type):
    """
    The argument of the function is val_book. This is the book on which no data are trained, 
    but functions as the test set.
    
    The function returns four objects:
    
    cl_list_ebh_q, contains the nodes of clauses in EBH books
    cl_list_lbh_q, contains the nodes of clauses in LBH books
    
    targets_dict, assigns the output value to each clause: 0 for EBH clauses, 1 for LBH clauses.
    phr_funcs_dict, assigns a list of phrase functions to a clause node
    
    """

    cl_list_ebh = []
    cl_list_lbh = []
    
    phr_funcs_dict = {}
    targets_dict = {}

    for cl in F.otype.s('clause'):
        if F.txt.v(cl)[-1] != txt_type:
            continue
            
        words = L.d(cl, 'word')
        
        # only use Hebrew clauses
        if F.language.v(words[0]) != 'Hebrew':
            continue
        
        # do not use val_book
        if T.bookName(cl).split('_')[-1] == val_book:
            continue
        
        # process EBH, first remove poetic parts
        # 1_Samuel and 2_Samuel are treated as one book, which is achieved in the following line 
        # (same for Kings and Chronicles)
        if T.bookName(cl).split('_')[-1] in ebh:
            bo, ch, ve = T.sectionFromNode(cl)
            if bo == 'Genesis' and ch == 49 and 1 < ve < 28:
                continue
            elif bo == 'Exodus' and ch == 15 and ve < 19:
                continue
            elif bo == 'Numbers' and ch in {23,24}:
                continue
            elif bo == 'Deuteronomy' and ch in {32,33}:
                continue
            elif bo == 'Judges' and ch == 5:
                continue
            elif bo == '1_Samuel' and ch == 2 and ve < 11:
                continue
            elif bo == '2_Samuel' and ch == 1 and ve > 18:
                continue
            elif bo == '2_Samuel' and ch == 22:
                continue
            elif bo == '2_Samuel' and ch == 23 and ve < 8:
                continue
            
            # make list of phrase functions
            
            lang_list = make_language_list(level, cl)
            

            phr_funcs_dict[cl] = lang_list
            targets_dict[cl] = 0
  
            cl_list_ebh.append(cl)

        # process LBH the same way as EBH
        elif T.bookName(cl).split('_')[-1] in lbh:    
            bo,ch,ve = T.sectionFromNode(cl)
            if bo == 'Daniel' and ch == 2 and 19 < ve < 24:
                continue
            if bo == 'Daniel' and ch == 8 and 22 < ve < 27:
                continue  
            if bo == 'Daniel' and ch == 12 and ve < 4:
                continue
            if bo == 'Nehemiah' and ch == 9 and 5 < ve < 38:
                continue    
            if bo == '1_Chronicles' and ch == 16 and 7 < ve < 37:
                continue
            
            lang_list = make_language_list(level, cl)

            targets_dict[cl] = 1
            phr_funcs_dict[cl] = lang_list

            cl_list_lbh.append(cl)
                
    return cl_list_ebh, cl_list_lbh, targets_dict, phr_funcs_dict

The data of the test-book are prepared.

In [8]:
def prepare_val_book(val_book, level, txt_type):
    """
    The function returns two objects:
    
    cl_lists, a list containing all nodes of clauses in the test book
    funcs_dicts, a dict in which a list of functions is assigned to a clause node
    """
    
    cl_list_book = []
    book_funcs_dict = {}

    # iterate over all the clauses
    for cl in F.otype.s('clause'):

        if F.txt.v(cl)[-1] != txt_type:
             continue
                
        words = L.d(cl, 'word')
        if F.language.v(words[0]) != 'Hebrew':
            continue

        bo,ch,ve = T.sectionFromNode(cl)
        if bo == 'Genesis' and ch == 49 and 1 < ve < 28:
            continue
        elif bo == 'Exodus' and ch == 15 and ve < 19:
            continue
        elif bo == 'Numbers' and ch in {23,24}:
            continue
        elif bo == 'Deuteronomy' and ch in {32,33}:
            continue
        elif bo == 'Judges' and ch == 5:
            continue
        elif bo == '1_Samuel' and ch == 2 and ve < 11:
            continue
        elif bo == '2_Samuel' and ch == 1 and ve > 18:
            continue
        elif bo == '2_Samuel' and ch == 22:
            continue
        elif bo == '2_Samuel' and ch == 23 and ve < 8:
            continue
        if bo == 'Daniel' and ch == 2 and 19 < ve < 24:
            continue
        if bo == 'Daniel' and ch == 8 and 22 < ve < 27:
            continue  
        if bo == 'Daniel' and ch == 12 and ve < 4:
            continue
        if bo == 'Nehemiah' and ch == 9 and 5 < ve < 38:
            continue    
        if bo == '1_Chronicles' and ch == 16 and 7 < ve < 37:
            continue
            
        # select clauses from the val_book
        if bo.split('_')[-1] == val_book:
            
            lang_list = make_language_list(level, cl)
                        
            cl_list_book.append(cl)
            book_funcs_dict[cl] = lang_list              
        
    cl_lists = {val_book : cl_list_book}
    
    funcs_dicts = {val_book:book_funcs_dict}
    
    return cl_lists, funcs_dicts

Now prepare data of the texts of uncertain date: Jonah, the prose tale of Job and Ruth.
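The selection rules in prepare_jonah_job_ruth can be summarized as one predicate over (book, chapter, verse). This is an illustrative sketch only. `in_uncertain_text` is a hypothetical helper that the notebook does not use.

```python
def in_uncertain_text(bo, ch, ve):
    """Return the text name if the verse (bo, ch, ve) belongs to one of the texts of
    uncertain date as selected in prepare_jonah_job_ruth, else None."""
    if bo == 'Jonah' and ch != 2:                                 # Jonah without the psalm (ch. 2)
        return 'Jonah'
    if bo == 'Ruth':                                              # all of Ruth
        return 'Ruth'
    if bo == 'Job' and (ch in {1, 2} or (ch == 42 and ve > 6)):  # prose tale: 1-2 and 42:7-17
        return 'Job'
    return None
```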

In [9]:
def prepare_jonah_job_ruth(level, txt_type):
    """
    The function returns two objects:
    
    cl_lists, a dict containing a list of clause nodes (of the selected text type) for each of the three texts
    book_funcs_dict, a dict containing three dicts (one for each text) that assign a list of phrase functions (or parts of speech) to each clause node
    """
    
    cl_lists = collections.defaultdict(list)
    book_funcs_dict = collections.defaultdict(dict)

    for cl in F.otype.s('clause'):
        
        bo,ch,ve = T.sectionFromNode(cl)
        
        if F.txt.v(cl)[-1] != txt_type:
             continue
                
        words = L.d(cl, 'word')
        if F.language.v(words[0]) != 'Hebrew':
            continue

        # in the book of Jonah chapter 2 is removed, this is Jonah's Psalm.
        if bo == 'Jonah' and ch != 2:
            lang_list = make_language_list(level, cl)

            cl_lists[bo].append(cl)
            book_funcs_dict[bo][cl] = lang_list
         
        # select Ruth data
        if bo == 'Ruth':
            lang_list = make_language_list(level, cl)

            cl_lists[bo].append(cl)
            book_funcs_dict[bo][cl] = lang_list
   
        # from Job, the prose tale is selected: chapters 1 and 2, and 42:7-17.
        if bo == 'Job' and ch in {1,2}:
        
            lang_list = make_language_list(level, cl)
            
            cl_lists[bo].append(cl)
            book_funcs_dict[bo][cl] = lang_list
            
        if bo == 'Job' and ch == 42 and ve > 6:
            
            lang_list = make_language_list(level, cl)

            cl_lists[bo].append(cl)
            book_funcs_dict[bo][cl] = lang_list
    
    return cl_lists, book_funcs_dict

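In the functions assign_to_ints and make_conv_dict, the phrase functions (or parts of speech) are converted to integers, because the network can only process numeric data. Each label is mapped to its frequency rank, so the most frequent label becomes 1. The value 0 is reserved for padding.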
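Frequency-rank encoding can be sketched on toy data. `rank_encode` is a hypothetical helper. The labels are real BHSA phrase-function values, but the toy clauses are made up. Ties are broken alphabetically so that no two labels share an integer.

```python
from collections import Counter

def rank_encode(counts):
    """Map each label to its frequency rank: most frequent -> 1, next -> 2, ...
    Ties are broken alphabetically, so every label gets a distinct integer.
    0 is left free for padding."""
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: rank for rank, lab in enumerate(ordered, start=1)}

# toy clauses as sequences of phrase functions
clauses = [['Pred', 'Subj', 'Objc'], ['Conj', 'Pred', 'Subj'], ['Conj', 'Pred']]
counts = Counter(lab for cl in clauses for lab in cl)
f2int = rank_encode(counts)        # {'Pred': 1, 'Conj': 2, 'Subj': 3, 'Objc': 4}
encoded = [[f2int[lab] for lab in cl] for cl in clauses]
```

With such an encoding, the embedding layer needs `len(f2int) + 1` input rows: one for each label, plus one for the padding value 0.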

In [10]:
def assign_to_ints(func_count):
    """
    create f2int_dict, which maps each phrase function (or part of speech) to its
    frequency rank: the most frequent element gets 1, the next one 2, and so on.
    Ties are broken alphabetically, so that every element gets its own integer.
    0 is not assigned, because it is used for padding.
    """

    sorted_elems = sorted(func_count, key=lambda elem: (-func_count[elem], str(elem)))
    f2int_dict = {elem: rank for rank, elem in enumerate(sorted_elems, start=1)}

    return f2int_dict

In [11]:
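def make_conv_dict(f2int_dict, list_of_lists):
    """
    create phr_ints, which maps each clause node in list_of_lists to its list of integers.
    The labels of the clauses are taken from the global phr_funcs_dict (see get_data_from_tf).
    """

    phr_ints = {}
    
    for cl_list in list_of_lists:
        for clause in cl_list:
            func_ints = [f2int_dict[fun] for fun in phr_funcs_dict[clause]]
            phr_ints[clause] = func_ints
    
    return phr_ints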

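In select_clauses, clauses are selected randomly for the train set. The EBH corpus contains many more clauses than the LBH corpus. For this reason, a random sample of EBH clauses the same size as the LBH set is drawn, so that the train set contains equal numbers of EBH and LBH clauses.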
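Here is a sketch of the balancing step using NumPy's `Generator` API. The notebook itself uses the global `np.random.choice`. The clause ids are made-up integers standing in for Text-Fabric nodes, and `balanced_sample` is a hypothetical name. Like the notebook, the sketch assumes the EBH list is the larger one.

```python
import numpy as np

def balanced_sample(ebh_ids, lbh_ids, rng):
    """Undersample the EBH ids to the size of the LBH list and return the
    selected ids with their labels (0 = EBH, 1 = LBH)."""
    n = len(lbh_ids)
    ebh_sel = rng.choice(ebh_ids, n, replace=False)   # random subset of EBH
    lbh_sel = rng.choice(lbh_ids, n, replace=False)   # all LBH ids, shuffled
    ids = np.concatenate([ebh_sel, lbh_sel])
    labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return ids, labels

rng = np.random.default_rng(42)
ebh_ids = np.arange(1000, 1100)   # 100 hypothetical EBH clause nodes
lbh_ids = np.arange(2000, 2030)   # 30 hypothetical LBH clause nodes
ids, labels = balanced_sample(ebh_ids, lbh_ids, rng)
```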

In [12]:
def select_clauses(cl_list_ebh, cl_list_lbh):

    idx_ebh = np.random.choice(cl_list_ebh, int(len(cl_list_lbh)), replace = False)
    idx_lbh = np.random.choice(cl_list_lbh, int(len(cl_list_lbh)), replace = False)
    tot_index = np.concatenate((idx_ebh,idx_lbh), axis = 0)
    
    return tot_index

In [13]:
def convert_to_integers(tot_index, phr_ints, targets_dict):

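    # clauses differ in length, so an object array is needed
    # (recent NumPy versions raise an error for ragged input without dtype=object)
    selected_input = [phr_ints[cl] for cl in tot_index]
    selected_input = np.array(selected_input, dtype=object)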

    selected_targets = [targets_dict[cl] for cl in tot_index]
    selected_targets = np.array(selected_targets)
    
    return selected_input, selected_targets

The maximum clause length in the selected data is determined. Shorter clauses are padded with zeros, because all input sequences need to have the same length.
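The padding convention can be made explicit with a small pure-Python function. `pad_pre` is a hypothetical helper written to mirror the defaults of Keras `pad_sequences`: zeros are added at the front, and overlong sequences are cut at the front. It illustrates the behavior and is not the Keras code. Because `max_length` is computed from the selected training clauses, a longer clause in a predicted book loses its first elements, as the third toy sequence shows.

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad each sequence to maxlen; if it is longer, keep only its last maxlen items."""
    out = []
    for s in seqs:
        s = list(s)
        if len(s) > maxlen:
            s = s[len(s) - maxlen:]          # truncate at the front
        out.append([value] * (maxlen - len(s)) + s)
    return out

seqs = [[1, 3, 4], [2, 1], [5, 2, 1, 3, 4, 6]]
max_length = max(len(s) for s in seqs[:2])   # computed on the "training" sequences only
padded = pad_pre(seqs, max_length)           # [[1, 3, 4], [0, 2, 1], [3, 4, 6]]
```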

In [14]:
def calc_max_len(selected_input):
    # length of the longest clause (number of phrases or words) in the selection
    return max(len(clause) for clause in selected_input)

The data are split into a train and a test set, and the sequences are padded.
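When several arrays are passed to one `train_test_split` call, they are all split with the same random permutation, so inputs, labels and clause nodes stay aligned. The sketch below shows the idea in plain NumPy. `aligned_split` is a hypothetical helper, and it will not reproduce scikit-learn's permutation for `random_state=42`. The test size is rounded up, as scikit-learn does for a fractional `test_size`.

```python
import numpy as np

def aligned_split(n, test_size=0.15, seed=42):
    """Return train and test index arrays drawn from one shared random permutation,
    so that every array indexed with them stays aligned."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))
    return perm[n_test:], perm[:n_test]

X = np.arange(20).reshape(10, 2)    # toy padded inputs: row i is [2i, 2i+1]
y = np.array([0, 1] * 5)            # toy labels: label of row i is i % 2
ids = np.arange(100, 110)           # toy clause nodes: node of row i is 100 + i
train_idx, test_idx = aligned_split(len(y))
X_test, y_test, ids_test = X[test_idx], y[test_idx], ids[test_idx]
```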

In [15]:
def test_train_split(selected_input, selected_targets):

    # pad sequences (max_length is the global value computed with calc_max_len)
    padded = sequence.pad_sequences(selected_input, maxlen=max_length)

    # train/test split; the clause nodes (the global tot_index) are split together with
    # the data and labels, so data_test_cl contains the clause nodes of the test set
    data_train, data_test, labels_train, labels_test, _, data_test_cl = train_test_split(
        padded, selected_targets, tot_index, test_size=0.15, random_state=42)

    return data_train, data_test, labels_train, labels_test, data_test_cl

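In train_LSTM_model, the model is trained.

The network consists of the following layers:

- an embedding layer that maps each integer to a 20-dimensional vector;
- two stacked LSTM layers with 300 units each, each followed by dropout (rate 0.5) to prevent overfitting;
- a dense layer with a single sigmoid unit, as usual in binary classification tasks.

The model is trained with the Adam optimizer and binary cross-entropy loss, for 8 epochs (phrase level) or 10 epochs (word level).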
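The size of this network can be checked by counting the trainable parameters by hand. The sketch assumes the standard Keras parameterization of an LSTM layer: four gates, each with input weights, recurrent weights and a bias. `vocab_size` depends on the number of distinct labels, so the value below is hypothetical. Dropout layers have no parameters.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)

def model_params(vocab_size, emb_dim=20, units=300):
    embedding = vocab_size * emb_dim      # one emb_dim vector per integer code
    lstm1 = lstm_params(emb_dim, units)   # first LSTM reads the 20-dim embeddings
    lstm2 = lstm_params(units, units)     # second LSTM reads the 300-dim outputs of the first
    dense = units + 1                     # single sigmoid unit: 300 weights + 1 bias
    return {'embedding': embedding, 'lstm1': lstm1, 'lstm2': lstm2, 'dense': dense}

params = model_params(vocab_size=30)      # hypothetical vocabulary size
# params['lstm1'] == 385200, params['lstm2'] == 721200, params['dense'] == 301
```

Almost all parameters sit in the two LSTM layers, so the embedding size and the number of distinct labels have little effect on model size.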

In [16]:
def train_LSTM_model(data_train, labels_train, data_test, labels_test, max_length, level, vocab_size):

    if level == 'phrase_level':
        epochs = 8
    elif level == 'word_level':
        epochs = 10
    
    #top_words = 100
    embedding_vector_length = 20
    
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_vector_length, input_length=max_length))
    model.add(LSTM(300, activation = 'relu', return_sequences=True))
    model.add(Dropout(0.5))
    model.add(LSTM(300, activation = 'relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    model.fit(data_train, labels_train, validation_data=(data_test, labels_test), epochs=epochs, batch_size=256)
    
    return model

Check accuracy on the test set.
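`evaluate_model` reports accuracy as a percentage (`scores[1]*100`) and turns the sigmoid outputs into class labels. For a single sigmoid output unit, this amounts to thresholding at 0.5. The toy probabilities below are made up for illustration, and the helper names are hypothetical.

```python
import numpy as np

def to_classes(probs, threshold=0.5):
    """Turn sigmoid outputs (probability of LBH) into 0/1 labels."""
    return (np.asarray(probs) > threshold).astype('int32')

def accuracy_percent(y_true, y_pred):
    return 100.0 * np.mean(np.asarray(y_true).ravel() == np.asarray(y_pred).ravel())

probs = np.array([[0.1], [0.7], [0.5], [0.9]])   # toy model.predict output, shape (n, 1)
y_hat = to_classes(probs)                         # [[0], [1], [0], [1]]
acc = accuracy_percent([0, 1, 1, 1], y_hat)       # 75.0
```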

In [17]:
def evaluate_model(data_test, labels_test, model):
    
    scores = model.evaluate(data_test, labels_test, verbose=0)
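    # threshold the sigmoid output at 0.5 to get class labels
    # (model.predict_classes did this, but it is not available in newer TensorFlow versions)
    y_hat = (model.predict(data_test, verbose=0) > 0.5).astype('int32')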
    
    return scores[1]*100, y_hat

Predictions are made on the validation book or on a book of uncertain date.

In [18]:
def predict_book(cl_list, funcs_dict, f2int_dict, model, max_length):
    """
    The function outputs two objects:
    
    np.sum(y_hat), the total number of clauses classified as LBH (of which each has the value 1)
    len(y_hat), total number of predicted clauses in the book.
    """

    phr_ints = []
    for clause in cl_list:

        func_ints = [f2int_dict[fun] for fun in funcs_dict[clause]]

        phr_ints.append(func_ints)

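    # clauses differ in length, so an object array is needed
    selected_input = np.array(phr_ints, dtype=object)

    sel_input = sequence.pad_sequences(selected_input, maxlen=max_length)
    # threshold the sigmoid output at 0.5 (replaces model.predict_classes)
    y_hat = (model.predict(sel_input, verbose=0) > 0.5).astype('int32')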

    return np.sum(y_hat), len(y_hat)

In [19]:
all_accuracy_dict = {}

validation_preds = collections.defaultdict(list)
jo_jo_ru_preds = collections.defaultdict(list)
clause_counts = collections.defaultdict(list)

cl_lists, funcs_dicts = prepare_jonah_job_ruth(level, txt_type)

# iterate over all test books
for validation_book in ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy', 'Joshua', 'Judges', 'Samuel', 'Kings', 'Esther', 'Daniel', 'Ezra', 'Nehemiah', 'Chronicles']:
    print(validation_book, level, txt_type)
    
    func_count = count_elems(level, txt_type)
    cl_list_ebh, cl_list_lbh, targets_dict, phr_funcs_dict = get_data_from_tf(validation_book, level, txt_type)
    cl_dicts_val, funcs_dicts_val = prepare_val_book(validation_book, level, txt_type)

    f2int_dict = assign_to_ints(func_count)
    list_of_lists = [cl_list_ebh, cl_list_lbh]
    phr_ints = make_conv_dict(f2int_dict,list_of_lists)

    verse_dict = collections.defaultdict(list)

    accuracy_list = []

    # Train the model and make predictions, 200 times for each EBH/LBH book
    for i in range(200):
        print(validation_book, level, txt_type, i)
        
        tot_index = select_clauses(cl_list_ebh, cl_list_lbh)
        selected_input, selected_targets = convert_to_integers(tot_index, phr_ints, targets_dict)
        max_length = calc_max_len(selected_input)
        data_train, data_test, labels_train, labels_test, data_test_cl = test_train_split(selected_input, selected_targets)

        # vocab_size = number of distinct elements + 1, because index 0 is used for padding
        model = train_LSTM_model(data_train, labels_train, data_test, labels_test, max_length, level, len(func_count) + 1)
        accuracy, y_hat = evaluate_model(data_test, labels_test, model)

        accuracy_list.append(accuracy)
        
        sum_pred, tot_clauses = predict_book(cl_dicts_val[validation_book], funcs_dicts_val[validation_book], f2int_dict, model, max_length)
        
        #validation 
        validation_preds[validation_book].append(sum_pred)
        clause_counts[validation_book].append(tot_clauses)
        
        for uncertain_book in {"Jonah", "Ruth", "Job"}:
            sum_pred, tot_clauses = predict_book(cl_lists[uncertain_book], funcs_dicts[uncertain_book], f2int_dict, model, max_length)
            jo_jo_ru_preds[uncertain_book].append(sum_pred)
            clause_counts[uncertain_book].append(tot_clauses)

        K.clear_session()

    all_accuracy_dict[validation_book] = accuracy_list


Genesis phrase_level N
Genesis phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Genesis phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Genesis phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Exodus phrase_level N
Exodus phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Exodus phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Exodus phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Leviticus phrase_level N
[...]
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Leviticus phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Leviticus phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Numbers phrase_level N
Numbers phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Numbers phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Numbers phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Deuteronomy phrase_level N
Deuteronomy phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
[...]
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Deuteronomy phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Joshua phrase_level N
Joshua phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Joshua phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Joshua phrase_level N 2
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Judges phrase_level N
Judges phrase_level N 0
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Judges phrase_level N 1
Train on 8712 samples, validate on 1538 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
[...]
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Esther phrase_level N
Esther phrase_level N 0
Train on 7950 samples, validate on 1404 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Esther phrase_level N 1
Train on 7950 samples, validate on 1404 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Esther phrase_level N 2
Train on 7950 samples, validate on 1404 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Daniel phrase_level N
Daniel phrase_level N 0
Train on 8304 samples, validate on 1466 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Daniel phrase_level N 1
Train on 8304 samples, validate on 1466 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Daniel phrase_level N 2
Train on 8304 samples, validate on 1466 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
[...]
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Chronicles phrase_level N 1
Train on 2811 samples, validate on 497 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Chronicles phrase_level N 2
Train on 2811 samples, validate on 497 samples
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8


Save all relevant objects!

In [20]:
jo_jo_ru_df = pd.DataFrame.from_dict(jo_jo_ru_preds)
filename = 'jo_jo_ru_preds_' + level + '_' + txt_type + '.csv'
jo_jo_ru_df.to_csv(filename, index=False)

In [21]:
jo_jo_ru_df.head()

|   | Job | Jonah | Ruth |
|---|-----|-------|------|
| 0 | 36 | 25 | 33 |
| 1 | 48 | 30 | 43 |
| 2 | 45 | 26 | 36 |
| 3 | 49 | 30 | 41 |
| 4 | 38 | 28 | 33 |


In [48]:
ebh_lbh_lengths = {}
jo_jo_ru_length = {}

books = ['Genesis', 'Exodus', 'Leviticus', 'Numbers', 'Deuteronomy', 'Joshua', 'Judges', 'Samuel', 'Kings', 'Esther', 'Daniel', 'Ezra', 'Nehemiah', 'Chronicles']

for bo in books:
    ebh_lbh_lengths[bo] = clause_counts[bo]

for bo in ['Jonah', 'Job', 'Ruth']:
    jo_jo_ru_length[bo] = clause_counts[bo]

jo_jo_ru_len_df = pd.DataFrame.from_dict(jo_jo_ru_length)
ebh_lbh_lengths_df = pd.DataFrame.from_dict(ebh_lbh_lengths)

# the number of clauses in a book is the same in every run, so the first row suffices
all_lengths = pd.concat([jo_jo_ru_len_df.head(1), ebh_lbh_lengths_df.head(1)], axis=1)

filename = 'clause_counts_' + level + '_' + txt_type + '.csv'
all_lengths.to_csv(filename, index=False)

In [50]:
all_lengths

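|   | Jonah | Job | Ruth | Genesis | Exodus | Leviticus | Numbers | Deuteronomy | Joshua | Judges | Samuel | Kings | Esther | Daniel | Ezra | Nehemiah | Chronicles |
|---|-------|-----|------|---------|--------|-----------|---------|-------------|--------|--------|--------|-------|--------|--------|------|----------|------------|
| 0 | 110 | 141 | 201 | 3300 | 1633 | 296 | 1500 | 460 | 1420 | 1756 | 3809 | 4038 | 448 | 240 | 351 | 615 | 3471 |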
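These clause counts can be combined with the prediction counts shown earlier (the output of `jo_jo_ru_df.head()`) to get the proportion of clauses classified as LBH. The sketch below uses only the five runs displayed in that output. The resulting numbers are illustrative and are not the notebook's final averages over all runs.

```python
from statistics import mean

# LBH predictions of the first five runs (jo_jo_ru_df.head()) and clause counts (all_lengths)
lbh_preds = {'Job': [36, 48, 45, 49, 38], 'Jonah': [25, 30, 26, 30, 28], 'Ruth': [33, 43, 36, 41, 33]}
n_clauses = {'Job': 141, 'Jonah': 110, 'Ruth': 201}

# mean proportion of clauses classified as LBH per text
lbh_share = {bo: mean(p / n_clauses[bo] for p in preds) for bo, preds in lbh_preds.items()}
# Job ≈ 0.306, Jonah ≈ 0.253, Ruth ≈ 0.185
```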


The data are saved in csv files, for instance, [this one](validation_preds_phrase_level_N.csv) for N clauses with phrase level analysis.

In [52]:
validation_preds_df = pd.DataFrame.from_dict(validation_preds)
filename = 'validation_preds_' + level + '_' + txt_type + '.csv'
validation_preds_df.to_csv(filename, index=False)

all_accuracy_df = pd.DataFrame.from_dict(all_accuracy_dict)
filename = 'all_accuracy_' + level + '_' + txt_type + '.csv'
all_accuracy_df.to_csv(filename, index=False)
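Each saved file has one column per book and one row per run. Below is a minimal standard-library sketch for summarizing such a file, for example the mean accuracy per validation book. The helper names are hypothetical. With pandas, `pd.read_csv(path).mean()` gives the same column means.

```python
import csv
from statistics import mean

def column_means_from_lines(lines):
    """Mean of every column of CSV text with a header row (one column per book, one row per run)."""
    rows = list(csv.DictReader(lines))
    return {col: mean(float(row[col]) for row in rows) for col in rows[0]}

def column_means(path):
    """Same, for a file written with DataFrame.to_csv(index=False)."""
    with open(path, newline='') as fh:
        return column_means_from_lines(fh)

# usage, assuming the files written above exist:
# column_means('all_accuracy_phrase_level_N.csv')       # mean accuracy (%) per validation book
# column_means('validation_preds_phrase_level_N.csv')   # mean number of clauses classified as LBH
```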

In [53]:
validation_preds_df.head(3)

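|   | Genesis | Exodus | Leviticus | Numbers | Deuteronomy | Joshua | Judges | Samuel | Kings | Esther | Daniel | Ezra | Nehemiah | Chronicles |
|---|---------|--------|-----------|---------|-------------|--------|--------|--------|-------|--------|--------|------|----------|------------|
| 0 | 891 | 665 | 72 | 681 | 225 | 629 | 774 | 1131 | 1631 | 250 | 124 | 236 | 402 | 2023 |
| 1 | 1107 | 659 | 87 | 640 | 247 | 587 | 478 | 1149 | 1478 | 220 | 106 | 235 | 299 | 2299 |
| 2 | 978 | 557 | 79 | 587 | 200 | 676 | 512 | 1184 | 1399 | 195 | 133 | 238 | 305 | 1521 |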


In [54]:
all_accuracy_df.head(3)

|   | Genesis | Exodus | Leviticus | Numbers | Deuteronomy | Joshua | Judges | Samuel | Kings | Esther | Daniel | Ezra | Nehemiah | Chronicles |
|---|---------|--------|-----------|---------|-------------|--------|--------|--------|-------|--------|--------|------|----------|------------|
| 0 | 57.932377 | 60.663199 | 61.378413 | 60.858256 | 59.362811 | 59.752923 | 61.638492 | 60.208064 | 59.947985 | 61.752135 | 59.68622 | 58.967203 | 60.606062 | 61.167002 |
| 1 | 61.378413 | 60.273081 | 59.167749 | 62.288684 | 58.127439 | 59.297788 | 58.192456 | 60.273081 | 60.338104 | 58.760685 | 58.526605 | 59.525472 | 57.797486 | 60.160965 |
| 2 | 59.167749 | 60.728216 | 60.403121 | 61.053318 | 59.037709 | 59.947985 | 58.77763 | 60.338104 | 58.517557 | 61.324787 | 61.937243 | 59.455687 | 61.19734 | 57.14286 |
