# New Text-Fabric module: The Dead Sea Scrolls

By Martijn Naaijer and Jarod Jacobs

Earlier this year, the CACCHT project (Creating Annotated Corpora of Classical Hebrew Texts) was started. CACCHT is a joint project of the ETCBC and the Theological Seminary at Andrews University and the researchers involved include: Jarod Jacobs, Martijn Naaijer, Robert Rezetko, Oliver Glanz and Wido van Peursen. CACCHT focuses on statistically analyzing Ancient Hebrew texts. At the core of our work is the BHSA and the extrabiblical module, but for a comprehensive analysis we intend to broaden our scope by including the Dead Sea Scrolls and Rabbinic texts.

We have completed the first stage of the project, the results of which can be found on the [ETCBC GitHub page](https://github.com/ETCBC/dss): a brand new Text-Fabric module containing the Dead Sea Scrolls with morphological encoding.

The DSS transcriptions and the morphological data connected with them were generously provided by Martin Abegg. The transcriptions come from various sources, but primarily reflect what is found in the Discoveries in the Judaean Desert series. Abegg started morphologically tagging the Qumran texts in the mid-90s with the assistance of several people. Over the following decades, Abegg completed full morphological tagging of nearly every Hebrew and Aramaic scroll found in the Judaean Desert between 1947 and today. In support of open access ideals, Abegg provided CACCHT with his work from the past decades, which has been converted to Text-Fabric by Dirk Roorda.

## Part of Speech tagging of Hebrew texts

With Abegg's data in Text-Fabric, the next step is to convert Abegg's morphological encoding to the encoding system used by the ETCBC. Over the next few months we will work on converting word, phrase and clause features. For part of the data this is fairly straightforward. We assume that features like verb tense, stem formation, gender, number and state are similar to the ETCBC encoding, with only small adaptations needed. Our initial foray into this will be converting the part of speech tagging. Abegg's dataset contains part of speech values, but its conventions deviate from those used in the ETCBC database.

POS tagging of the Dead Sea Scrolls poses various challenges. In the first place, there are many ambiguous cases. For instance, the word אל can be a preposition, but it can also be a noun meaning “god”. Of course, a decision can be made by manually encoding all the DSS in the ETCBC encoding, or we can rely on POS tags and/or other indirect information in Abegg's dataset. The disadvantage of using other information from the dataset is that the conversion would become quite complicated, and in many cases the encoding would remain difficult. For instance, Abegg does not distinguish between part of speech (feature sp) and phrase-dependent part of speech (feature pdp).

In this project we want to tag the DSS with the ETCBC encoding system automatically, without manually encoding the logic behind each tag and decision.

How does this work? First, it is important to state that we do not have clause boundaries in the Abegg dataset. This makes the task of POS tagging more difficult, because the structure of a clause may give an indication of the POS of a word. As an example, if a clause ends with the word אל, it is more likely to be a noun than a preposition, because a preposition is followed within the clause by other words or a pronominal suffix.

Even with that limitation, Abegg's dataset does have information about the structure of words and their environment. Significantly, we know where the word boundaries are: for instance, ויהי has already been split into ו and יהי. Also, POS tagging is helped by word morphology and, most importantly, we know the order of words in a book or text.

Automated systems for the analysis of language can be roughly divided into two kinds: rule-driven and pattern-driven. Rule-driven systems contain a lot of human input, such as "if then" blocks of code. For instance, in the case of a POS tagger, such a block can be: "if a word is 'H', the POS is 'article'", or "if a word is 'MCH' or 'YHWH', the POS is 'proper noun'". In general, this kind of system works well, but there are some problems. One is that there are many ambiguous cases, and a rule-driven system can become very complicated if it has to distinguish all the possible cases. Also, there may be patterns in the dataset that the researcher has missed, in which case the rule-driven system remains incomplete.
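To make the rule-driven idea concrete, here is a minimal sketch in Python. The rule table and the function name `candidate_tags` are hypothetical and exist only for illustration; the entries are taken from the examples in the text, with '>L' (the transcription of אל) added as an ambiguous case.

```python
# Hypothetical rule table: each word maps to the POS values a rule allows.
RULES = {
    "H": ["article"],
    "MCH": ["proper noun"],
    "YHWH": ["proper noun"],
    ">L": ["preposition", "noun"],  # ambiguous: one rule is not enough
}


def candidate_tags(word):
    """Return the POS values allowed by the rules, or an empty list."""
    return RULES.get(word, [])


for word in ["H", "YHWH", ">L", "BR>"]:
    tags = candidate_tags(word)
    if len(tags) == 1:
        status = tags[0]
    elif tags:
        status = "ambiguous: " + " or ".join(tags)
    else:
        status = "no rule applies"
    print(word, "->", status)
```

The last two words show both problems mentioned above: an ambiguous word needs additional context rules, and a word without a rule stays untagged.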

In the CACCHT project we opt for the pattern-driven approach based on machine learning. Instead of relying on a system based on rules, we let an algorithm search for patterns in the data. In recent years, pattern-driven systems have started to outperform rule-driven systems. Modern pattern-driven systems generally rely on machine learning algorithms to identify the structure of the data. The model is fed a large set of examples called the training set. For a POS tagging model, the training set contains words that are all tagged with their part of speech. The model identifies patterns in the training data, from which it builds a structure that can be used to tag new texts that do not have part of speech tags. This approach is called supervised learning.

For the CACCHT project, we train our model on the BHSA, where we know the POS of all the words. The model learns the relationship between words and the corresponding POS values, and then we use this model to predict the POS of the words in the DSS.

We have already seen that there are ambiguous cases, so how do we solve these? If it is possible to use the context of a word, we would be helped enormously, because we expect that the preposition אל has a different environment in a clause than the noun אל.

To solve this problem, we use a so-called sequence-to-sequence model (seq2seq). Instead of modeling the relationship between a word and a POS, we model the relationship between two sequences. One sequence consists of a number of words, the other of the corresponding POS. These sequences need to be kept relatively short, so using one clause per data sample would be a natural choice. However, as we have already mentioned, the Abegg data contain word boundaries but no clause boundaries. Therefore we have chosen to feed the algorithm sequences of eight words. Here is an example of how this works:

In ETCBC transcription, the first sequence in Genesis 1:1 looks as follows:

'B R>CJT BR> >LHJM >T H CMJM W'

This is the first input sequence. The corresponding output is a list and looks as follows.

['\t', 'prep', 'subs', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', '\n']

The signs '\t' and '\n' are start and stop signs, occurring in every output sequence.

For the second and third sample in the train set we move one word forward every time, so the inputs look as follows:

'R>CJT BR> >LHJM >T H CMJM W >T'

'BR> >LHJM >T H CMJM W >T H'

The corresponding outputs are:

['\t', 'subs', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', 'prep', '\n']

['\t', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', 'prep', 'art', '\n']
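The sliding window shown above can be sketched in plain Python without Text-Fabric. The function name `make_windows` is an assumption for illustration; the ten words and their POS values are the ones from the Genesis 1:1 example, and the handling of the elided article that the full script performs is left out here.

```python
def make_windows(words, tags, size=8):
    """Build overlapping input strings and POS output lists of `size` words."""
    inputs, outputs = [], []
    # The last start position is the one that still leaves `size` words.
    for start in range(len(words) - size + 1):
        inputs.append(" ".join(words[start:start + size]))
        # Every output gets a start ('\t') and a stop ('\n') symbol.
        outputs.append(["\t"] + tags[start:start + size] + ["\n"])
    return inputs, outputs


words = "B R>CJT BR> >LHJM >T H CMJM W >T H".split()
tags = ["prep", "subs", "verb", "subs", "prep",
        "art", "subs", "conj", "prep", "art"]

inputs, outputs = make_windows(words, tags)
for text, pos in zip(inputs, outputs):
    print(repr(text), pos)
```

With ten words and a window of eight, this yields exactly the three samples listed above.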

We move forward this way to the end of the book of Genesis, then we process Exodus and move forward until the end of Chronicles is reached. One book is withheld from the model (in our case Nehemiah), to act as the test set. Keeping part of the data separate as a test set is a standard procedure in machine learning practice.

You can see the seq2seq model as a translation model, and this is exactly what it is used for in other applications. In these applications the input can consist of English sentences, which are translated by the model to another language, for instance Dutch.

What kind of algorithms can be used for such a task, in which a sequence of POS is predicted for a sequence of characters? A type of model which is used often for sequence analysis is the so-called Long Short-Term Memory model (or LSTM model), which is a kind of Neural Network. It is used for a variety of Natural Language Processing tasks (such as chatbots, text classification and text summarization), but also for making predictions of numeric sequences, such as forecasting time-series. It is beyond the scope of this blog to go into the details of Neural Networks and the LSTM model, but there are a lot of helpful sources online about them, such as [this blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) or the [Keras documentation of seq2seq LSTM models](https://keras.io/examples/lstm_seq2seq/).

One challenge with LSTM models is that the algorithm only ingests numbers and all sequences have to have the same length. Because of this, some further preprocessing is needed. We check the length of the longest sequence and give all the sequences that length by adding zeros to it (this is called padding). All sequences consist of eight words, so how can they have varying lengths? We have chosen to use a character based model, so the model sees the input sequence as a sequence of characters. It does not know where the word boundaries are, because the model takes the space as a character just like the other characters. We also convert each character to a number so that the model can work with them.
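The following sketch illustrates the padding and the conversion of characters to numbers on two of the example sequences. It is a simplified stand-in for the preprocessing in the full script, not a copy of it; the function name `one_hot_pad` is an assumption, and the character vocabulary is built from these two strings only.

```python
import numpy as np


def one_hot_pad(sequences):
    """One-hot encode strings character by character, zero-padded to equal length."""
    chars = sorted(set("".join(sequences)))  # the space counts as a character
    char2idx = {ch: i for i, ch in enumerate(chars)}
    max_len = max(len(seq) for seq in sequences)

    # Shape: (number of samples, length of longest sequence, vocabulary size).
    encoded = np.zeros((len(sequences), max_len, len(chars)), dtype="float32")
    for i, seq in enumerate(sequences):
        for k, ch in enumerate(seq):
            encoded[i, k, char2idx[ch]] = 1.0
    return encoded, char2idx


seqs = ["B R>CJT BR> >LHJM >T H CMJM W",
        "R>CJT BR> >LHJM >T H CMJM W >T"]
encoded, char2idx = one_hot_pad(seqs)
print(encoded.shape)
print(char2idx)
```

Both strings contain eight words, but the first has 29 characters and the second 30, so the last timestep of the first sample consists of zeros only.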

Then all those input and output sequences are fed to the algorithm, which trains a model that finds the relationship between the input and output sequences (at least, that is what we hope, of course).

With this model and the input sequences of the DSS we can predict their POS, but how do we know how well the model performs? Predictions based on machine learning models rarely predict everything correctly. To find out how good it is, we start with making predictions on the test set: the book of Nehemiah. We make predictions on the words (first converted to sequences similar to the training data), and then compare these predictions with the true values of the POS in the ETCBC database. When that is done we know how often it predicts unseen words correctly.

## Let's do some real work

The following script works through this whole procedure of training the LSTM model and making predictions on the test set. The following steps are taken:

- the relevant libraries are imported.
- in prepare_train_data() the input and output sequences for the train set are created.
- in prepare_test_data() the input and output sequences for the test set are created.
- in create_dicts() and one_hot_encode() the sequences are preprocessed further.
- in define_LSTM_model() the encoder-decoder architecture is created.
- compile_and_train() does the training of the model. Here some important hyperparameters of the model are chosen.
- After that, predictions are made using the model and the test set, which is the book of Nehemiah. After the evaluation it becomes clear how well the model works on unseen data. These predictions demonstrate what we want to use the model for: automatically analyzing Hebrew texts grammatically!


First some libraries are imported. Of course, we use [Text-Fabric with the BHSA data](https://etcbc.github.io/bhsa), which you can access with Python 3, and as the framework for the Neural Network we use [Keras](https://keras.io).


In [1]:
import collections

import pandas as pd
import numpy as np

from sklearn.utils import shuffle
from statistics import mode

from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam

Using TensorFlow backend.


In [2]:
from tf.app import use
A = use('bhsa', hoist=globals())

A.displaySetup(extraFeatures='g_cons')

TF app is up-to-date.
Using annotation/app-bhsa commit 78de65f21cdaae29c46231bcfa0b72a0552fb882 (=latest)
  in C:\Users\geitb/text-fabric-data/__apps__/bhsa.
Using etcbc/bhsa/tf - c r1.5 in C:\Users\geitb/text-fabric-data
Using etcbc/phono/tf - c r1.2 in C:\Users\geitb/text-fabric-data
Using etcbc/parallels/tf - c r1.2 in C:\Users\geitb/text-fabric-data


**Documentation:** <a target="_blank" href="https://etcbc.github.io/bhsa" title="provenance of BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis">BHSA</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Writing/Hebrew" title="('Hebrew characters and transcriptions',)">Character table</a> <a target="_blank" href="https://etcbc.github.io/bhsa/features/0_home" title="BHSA feature documentation">Feature docs</a> <a target="_blank" href="https://github.com/annotation/app-bhsa" title="bhsa API documentation">bhsa API</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Api/Fabric/" title="text-fabric-api">Text-Fabric API 7.4.8</a> <a target="_blank" href="https://annotation.github.io/text-fabric/Use/Search/" title="Search Templates Introduction and Reference">Search Reference</a>

In the function prepare_train_data() the train set is created, and some other useful information is collected. The argument of the function, test_book, is the book which will be excluded from the train set, because it is upon this book that the model will be tested.

In [22]:
def prepare_train_data(test_book):

    input_seqs = []
    output_pos = []
    input_chars = set()
    output_vocab = set()

    # iterate over all the books
    for bo in F.otype.s("book"): 
        
        # exclude the test_book
        if F.book.v(bo) == test_book:
            continue
               
        # all the words from a book are collected
        words = L.d(bo, 'word')
        
        # Now we iterate over all the words, except the last words, because all the sequences have to be 8 words long
        for w in words[0:-7]:
            
            # In the following two lines the train data are prepared
            
            # here the input data are created
            # g_cons_train is a string with the consonantal representation of 8 words, separated by spaces
            # the elided he, which is represented by the empty string, is excluded
            g_cons_train = (" ".join([F.g_cons.v(w) for w in range(w, w+8) if (F.g_cons.v(w) != '')])).strip()
            
            # and here outputs are created
            # it is a list containing parts of speech
            parts_of_speech = [F.sp.v(w) for w in range(w, w+8) if (F.g_cons.v(w) != '')]
            
            # the two preceding lines of code and their counterparts in the function prepare_test_data() are the only places
            # where we extract data from the etcbc database with text-fabric before the model is trained
            
            # to the outputs a start ('\t') and stop ('\n') symbol are added
            parts_of_speech = ['\t'] + parts_of_speech + ['\n']
             
            # the input sequence g_cons_train is added to input_seqs, which is a list containing all the inputs
            input_seqs.append(g_cons_train)
            
            # the list parts_of_speech is added to the list output_pos
            output_pos.append(parts_of_speech)
            
            # for further processing we need the "vocabularies" of the input and output
            # we use a character-based model, so the input vocabulary consists of all the distinct characters in the input strings
            # also included is the space
            for ch in g_cons_train:
                input_chars.add(ch)
            
            # also collected is the output vocabulary, which consists of all the parts of speech in the etcbc database
            for pos in parts_of_speech:
                output_vocab.add(pos)
                
    
    input_chars = sorted(list(input_chars))
    output_vocab = sorted(list(output_vocab))
    
    # in the LSTM network all the sequences have to have the same length. We find out what the length of the longest sequence is,
    # all the other sequences will get that length
    max_len_input = max([len(clause) for clause in input_seqs])
    max_len_output = max([len(poss) for poss in output_pos])
    
    # shuffle the data. The model will get the data in small batches, it is preferable if the batches are more or less homogeneous
    # of course the inputs and outputs have to be shuffled identically
    input_seqs, output_pos = shuffle(input_seqs, output_pos)
    
    return input_seqs, output_pos, input_chars, output_vocab, max_len_input, max_len_output

In the function prepare_test_data() the test data are prepared, consisting of the data of the single book not included in the train data.

In [23]:
def prepare_test_data(test_book):

    input_seqs_test = []
    output_seqs_test = []
    
    for bo in F.otype.s('book'):
        
        # exclude other books than test_book
        if F.book.v(bo) != test_book:
            continue
            
        words = L.d(bo, 'word')

        for w in words[0:-7]:
            
            if F.g_cons.v(w) == '':
                continue
            
            # prepare the test data
            input_seq_test = (" ".join([F.g_cons.v(w) for w in range(w, w+8) if (F.g_cons.v(w) != '')])).strip()
            output_seq_test = [F.sp.v(w) for w in range(w, w+8) if (F.g_cons.v(w) != '')]
            
            input_seqs_test.append(input_seq_test)
            output_seqs_test.append(output_seq_test)
            
    return input_seqs_test, output_seqs_test, [w for w in words if (F.g_cons.v(w) != '')]

The network can only handle numeric data, but after the data have been processed as numbers, they need to be converted back to characters. The function create_dicts() provides dictionaries with mappings between the input and output vocabularies and integers.

In [24]:
def create_dicts(input_voc, output_voc):
    
    # these dicts map the input sequences
    input_idx2char = {}
    input_char2idx = {}

    for k, v in enumerate(input_voc):
        input_idx2char[k] = v
        input_char2idx[v] = k
     
    # and these dicts map the output sequences of parts of speech
    output_idx2char = {}
    output_char2idx = {}
    
    for k, v in enumerate(output_voc):
        output_idx2char[k] = v
        output_char2idx[v] = k
        
    return input_idx2char, input_char2idx, output_idx2char, output_char2idx

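Now the final data preparation function is made. Categorical data are generally fed to the LSTM network in one-hot encoded form. All the inputs are padded to the same length, and so are all the outputs. Also created is an array called target_data, which contains the decoder targets: the output sequences shifted one timestep ahead, without the start symbol.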

In [25]:
def one_hot_encode(nb_samples, max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, input_clauses, output_pos):
    
    # three-dimensional numpy arrays are created 
    tokenized_input = np.zeros(shape = (nb_samples, max_len_input, len(input_chars)), dtype='float32')
    tokenized_output = np.zeros(shape = (nb_samples, max_len_output, len(output_vocab)), dtype='float32')
    target_data = np.zeros((nb_samples, max_len_output, len(output_vocab)), dtype='float32')

    for i in range(nb_samples):
        for k, ch in enumerate(input_clauses[i]):
            tokenized_input[i, k, input_char2idx[ch]] = 1
        
        for k, ch in enumerate(output_pos[i]):
            tokenized_output[i, k, output_char2idx[ch]] = 1

            # target_data will be ahead by one timestep and will not include the start character.
            if k > 0:
                target_data[i, k-1, output_char2idx[ch]] = 1
                
    return tokenized_input, tokenized_output, target_data

In the function define_LSTM_model() the architecture of the model is created. Neural networks are very flexible structures and a variety of architectures have been developed for various tasks. Here we use the encoder-decoder architecture with two LSTM layers in the encoder. In the architecture there is a variety of hyperparameters that you have to choose. Better hyperparameters lead to better predictions, so it is important to spend time on optimizing them. Hyperparameters in this architecture are the number of LSTM layers, the number of cells in each LSTM layer and the activation function.

In [26]:
def define_LSTM_model(input_chars, output_vocab):

    # encoder model
    encoder_input = Input(shape=(None,len(input_chars)))
    encoder_LSTM = LSTM(250,activation='relu',return_state=True, return_sequences=True)(encoder_input)
    encoder_LSTM = LSTM(250,return_state=True)(encoder_LSTM)
    encoder_outputs, encoder_h, encoder_c = encoder_LSTM
    encoder_states = [encoder_h, encoder_c]
    
    # decoder model
    decoder_input = Input(shape=(None,len(output_vocab)))
    decoder_LSTM = LSTM(250, return_sequences=True, return_state = True)
    decoder_out, _ , _ = decoder_LSTM(decoder_input, initial_state=encoder_states)
    decoder_dense = Dense(len(output_vocab), activation='softmax')
    decoder_out = decoder_dense (decoder_out)
    
    model = Model(inputs=[encoder_input, decoder_input],outputs=[decoder_out])

    model.summary()

    return encoder_input, encoder_states, decoder_input, decoder_LSTM, decoder_dense, model

Now the model is compiled and trained using the function compile_and_train(). The data are fed to the model in small batches. The train data are split into a train set and a validation set; the latter consists of 5% of the original train set. The model is trained on the train set and makes predictions on these data. The difference between the predictions and the true values of the output is calculated with categorical crossentropy and is called the loss. During training this loss becomes smaller, which means that the predictions become more accurate. However, we want the model not only to perform well on the train data; it should also be general enough to make accurate predictions on unseen data. Therefore, after every epoch a prediction is made on the small validation set and the validation loss is calculated. Ideally, the validation loss is more or less equal to the train loss. After a number of epochs, you will notice that the train loss keeps decreasing, while the validation loss stays the same or even increases. At this point the model starts to overfit, which means that the algorithm is modeling idiosyncrasies in the train data instead of general patterns. In that case it is time to stop training and make predictions on the test set.

Again, you have to choose a number of hyperparameters. These are the optimizer, the loss function, the batch size, the number of epochs and the learning rate. If you want, you can even tune more hyperparameters.

With EarlyStopping() the training process can be stopped earlier than the given number of epochs. This is useful if the model starts overfitting and the validation loss does not decrease anymore.
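The idea behind stopping early with a patience of 3 epochs can be illustrated in plain Python. This is a simplified sketch of the principle, not the Keras implementation; the function name `stopping_epoch` and the validation losses are made up for illustration.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the (1-based) epoch at which training would stop, or None."""
    best = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # The validation loss improved: remember it and reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch.
losses = [0.9, 0.5, 0.3, 0.25, 0.26, 0.27, 0.26, 0.24]
print(stopping_epoch(losses))
```

In this made-up run the best validation loss is reached in epoch 4, so training stops after epoch 7, before the later improvement in epoch 8 is seen. A larger patience makes the procedure more tolerant of such temporary plateaus, at the cost of a longer training time.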

Note that training an LSTM model is a computationally intensive process. It is recommended to run the script on a GPU.

In [27]:
def compile_and_train(model, one_hot_in, one_hot_out, targets, batch_size, epochs, val_split):

    callback = EarlyStopping(monitor='val_loss', patience=3, verbose=0, mode='auto')
    adam = Adam(lr=0.0008, beta_1=0.99, beta_2=0.999, epsilon=0.00000001)
    model.compile(optimizer=adam, loss='categorical_crossentropy')
    model.fit(x=[one_hot_in,one_hot_out], 
              y=targets,
              batch_size=batch_size,
              epochs=epochs,
              validation_split=val_split,
              callbacks=[callback])
    
    return model

The train data are prepared. The test data consist of sequences of words from the book of Nehemiah, so in the preparation of the train data, Nehemiah is excluded.

In [28]:
test_book = "Nehemia" # the book name is in Latin, because the tf-feature "book" is used in the functions prepare_train_data() and prepare_test_data().

input_clauses, output_pos, input_chars, output_vocab, max_len_input, max_len_output = prepare_train_data(test_book)
input_idx2char, input_char2idx, output_idx2char, output_char2idx = create_dicts(input_chars, output_vocab)

nb_samples = len(input_clauses)
one_hot_input, one_hot_output, target_data = one_hot_encode(nb_samples, max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, input_clauses, output_pos)

The test data are prepared.

In [10]:
test_clauses, output_test, test_word_nodes = prepare_test_data(test_book)
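# only the one-hot encoded inputs are needed for prediction; the two output arrays are discarded,
# so the (train) output_pos passed as the last argument serves only to satisfy the function signature
one_hot_test_data, _, _ = one_hot_encode(len(test_clauses), max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, test_clauses, output_pos)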

Here the functions define_LSTM_model() and compile_and_train() are called. A neural network learns in an iterative process. One pass over the train data is called an epoch. After each epoch the train loss and the validation loss are calculated, as you can see in the output.

The architecture of the model is also printed with the number of parameters. You also see the number of train samples (397552 samples).

In [11]:
encoder_input, encoder_states, decoder_input, decoder_LSTM, decoder_dense, model = define_LSTM_model(input_chars, output_vocab)
model = compile_and_train(model, one_hot_input, one_hot_output, target_data, 1024, 150, 0.05)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None, 25)     0                                            
__________________________________________________________________________________________________
lstm_1 (LSTM)                   [(None, None, 250),  276000      input_1[0][0]                    
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, None, 16)     0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 250), (None, 501000      lstm_1[0][0]                     
                                                                 lstm_1[0][1]                     
          

In [12]:
# Encoder inference model
encoder_model_inf = Model(encoder_input, encoder_states)

# Decoder inference model
decoder_state_input_h = Input(shape=(250,))
decoder_state_input_c = Input(shape=(250,))
decoder_input_states = [decoder_state_input_h, decoder_state_input_c]

decoder_out, decoder_h, decoder_c = decoder_LSTM(decoder_input, 
                                                 initial_state=decoder_input_states)

decoder_states = [decoder_h , decoder_c]

decoder_out = decoder_dense(decoder_out)

decoder_model_inf = Model(inputs=[decoder_input] + decoder_input_states,
                          outputs=[decoder_out] + decoder_states )

In the function decode_seq() the predictions on the test set are made. The input, inp_seq, consists of one one-hot encoded sequence of words in the book of Nehemiah.

In [13]:
def decode_seq(inp_seq):
    
    states_val = encoder_model_inf.predict(inp_seq)
    
    target_seq = np.zeros((1, 1, len(output_vocab)))
    target_seq[0, 0, output_char2idx['\t']] = 1
    
    pred_pos = []
    stop_condition = False
    
    while not stop_condition:
        
        decoder_out, decoder_h, decoder_c = decoder_model_inf.predict(x=[target_seq] + states_val)
        
        max_val_index = np.argmax(decoder_out[0,-1,:])
        sampled_out_char = output_idx2char[max_val_index]
        pred_pos.append(sampled_out_char)
        
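        # stop at the stop symbol, or when the prediction grows longer than any output seen during training
        if sampled_out_char == '\n' or len(pred_pos) >= max_len_output:
            stop_condition = True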
        
        target_seq = np.zeros((1, 1, len(output_vocab)))
        target_seq[0, 0, max_val_index] = 1
        
        states_val = [decoder_h, decoder_c]
        
    return pred_pos

Now the function decode_seq() is called, the predictions are made and the results are collected.

For most words eight predictions are made, because each word (except the words at the beginning and end of a book) occurs in eight sequences. In the dict decision_dict all eight predictions are collected.

In [14]:
decision_dict = collections.defaultdict(list)

for seq_index in range(len(one_hot_test_data)):
    inp_seq = one_hot_test_data[seq_index:seq_index+1]
    
    pred_pos = decode_seq(inp_seq)
    
    if len(pred_pos[:-1]) == len(output_test[seq_index]):
        for pred_ind in range(len(pred_pos[:-1])):
            decision_dict[seq_index + pred_ind].append(pred_pos[:-1][pred_ind])   

We simply use majority voting to decide what the final prediction is. So, if the model predicts 5 times "verb" and 3 times "subs" for a certain word, we decide that the word is a verb. In the case of a tie, e.g. 4 times "verb" and 4 times "subs", the value that occurs first in the list of predictions is chosen, which can be seen as an arbitrary choice among the tied alternatives.
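The voting can be tried out in isolation. The sketch below uses the same `max(..., key=counts.get)` expression as the evaluation cells of the script; the function name `majority_vote` and the example predictions are only there for illustration.

```python
import collections


def majority_vote(predictions):
    """Return the most frequent prediction; ties go to the value seen first."""
    counts = collections.Counter(predictions)
    # max() returns the first element that reaches the highest count.
    return max(predictions, key=counts.get)


print(majority_vote(["verb"] * 5 + ["subs"] * 3))  # clear majority
print(majority_vote(["subs", "verb"] * 4))         # tie of 4 against 4
```

In the second call both values occur four times, and "subs" wins because it is the first value in the list.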

### Misclassifications on the test set

We start the evaluation with the bad news: misclassifications. We want the model to predict the POS correctly as often as possible, but in practice it is difficult to reach 100% accuracy. The following cell outputs the words in the book of Nehemiah that were misclassified by the model.

The output shows:

- the text-fabric node number
- (book, chapter, verse)
- the consonants of a word
- correct POS
- predicted POS

In [15]:
correct_test = 0
wrong_test = 0
# cross_dict[true POS][predicted POS] counts how often each combination occurs
cross_dict = collections.defaultdict(lambda: collections.defaultdict(int))

for key, node in enumerate(test_word_nodes):
    votes = collections.Counter(decision_dict[key])
    true_pos = F.sp.v(node)
    # majority vote over all the predictions made for this word
    predicted_pos = max(decision_dict[key], key=votes.get)
    cross_dict[true_pos][predicted_pos] += 1

    if true_pos == predicted_pos:
        correct_test += 1

    else:
        wrong_test += 1
        print(node, T.sectionFromNode(node), F.g_cons.v(node), true_pos, predicted_pos)


383423 ('Nehemiah', 1, 1) KSLW nmpr subs
383435 ('Nehemiah', 1, 2) XNNJ nmpr verb
383482 ('Nehemiah', 1, 3) MPRYT verb nmpr
383506 ('Nehemiah', 1, 4) YM verb subs
383542 ('Nehemiah', 1, 6) QCBT adjv verb
383580 ('Nehemiah', 1, 7) XBL verb subs
383652 ('Nehemiah', 1, 9) CM advb nmpr
383674 ('Nehemiah', 1, 11) QCBT adjv verb
383758 ('Nehemiah', 2, 2) R< subs adjv
383784 ('Nehemiah', 2, 3) XRBH adjv subs
383832 ('Nehemiah', 2, 5) >BNNH verb subs
383846 ('Nehemiah', 2, 6) MHLKK subs verb
383951 ('Nehemiah', 2, 9) <MJ prep subs
383966 ('Nehemiah', 2, 10) <BD subs adjv
383999 ('Nehemiah', 2, 12) <MJ prep subs
384017 ('Nehemiah', 2, 12) <MJ prep subs
384044 ('Nehemiah', 2, 13) FBR verb subs
384075 ('Nehemiah', 2, 14) <BR verb subs
384079 ('Nehemiah', 2, 15) <LH verb subs
384086 ('Nehemiah', 2, 15) FBR verb subs
384149 ('Nehemiah', 2, 17) XRBH adjv subs
384189 ('Nehemiah', 2, 18) BNJNW verb subs
384204 ('Nehemiah', 2, 19) <BD subs adjv
384208 ('Nehemiah', 2, 19) GCM nmpr subs
384227 ('Nehemiah

### Quantitative evaluation

The following table shows the true values according to the ETCBC database in the rows and the predictions in the columns. On the diagonal you see the numbers of correct predictions.

In [16]:
evaluation = []

all_pos = list(cross_dict.keys())

for key in all_pos:
    eval_pos = [cross_dict[key][key2] if key2 in cross_dict[key] else 0 for key2 in all_pos]
    evaluation.append(eval_pos)
    
# put everything in dataframe
df_eval = pd.DataFrame(evaluation) 
df_eval.columns = all_pos
df_eval.index = all_pos
df_eval

# Below:
# horizontal: predictions
# vertical: true values according to etcbc database

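| true / predicted | subs | nmpr | conj | verb | prep | prps | art | adjv | advb | prde | intj | nega | inrg | prin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| subs | 2244 | 11 | 0 | 40 | 5 | 0 | 0 | 13 | 1 | 1 | 0 | 0 | 0 | 0 |
| nmpr | 93 | 687 | 0 | 35 | 0 | 0 | 0 | 19 | 0 | 0 | 0 | 0 | 0 | 0 |
| conj | 0 | 0 | 1204 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| verb | 42 | 7 | 0 | 923 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 |
| prep | 7 | 0 | 0 | 0 | 1167 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| prps | 0 | 0 | 0 | 0 | 3 | 101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| art | 0 | 0 | 0 | 0 | 0 | 0 | 686 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| adjv | 15 | 1 | 0 | 3 | 0 | 0 | 0 | 187 | 1 | 0 | 0 | 0 | 0 | 0 |
| advb | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 47 | 0 | 0 | 0 | 0 | 0 |
| prde | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 46 | 0 | 0 | 0 | 0 |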


All these results are interesting, but how good is the model? We calculate the accuracy by dividing the number of correct classifications by the total number of predictions:

In [17]:
print("Correct classifications:", correct_test)
print("Misclassifications:", wrong_test)

correct_percent = 100 * correct_test  / (correct_test + wrong_test)
print("Accuracy:", round(correct_percent, 1), "%")

Correct classifications: 7398
Misclassifications: 319
Accuracy: 95.9 %


So the model predicts the POS of biblical words correctly in nearly 96% of the cases, which we think is decent. The most difficult POS to predict is the proper noun (nmpr): in 93 cases in which the true value is a proper noun, a substantive (subs) was predicted. The results may vary slightly between runs of the script.
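The remark about proper nouns can be checked by computing per-class recall, that is, the share of words with a given true POS that received the correct label. The following is a minimal sketch, not part of the original notebook; it copies four rows from the table above by hand, and the names `columns`, `rows` and `recall` are ours.

```python
# Column order of the table above (predicted POS).
columns = ["subs", "nmpr", "conj", "verb", "prep", "prps", "art",
           "adjv", "advb", "prde", "intj", "nega", "inrg", "prin"]

# Four rows copied from the table: true POS -> counts per predicted POS.
rows = {
    "subs": [2244, 11, 0, 40, 5, 0, 0, 13, 1, 1, 0, 0, 0, 0],
    "nmpr": [93, 687, 0, 35, 0, 0, 0, 19, 0, 0, 0, 0, 0, 0],
    "verb": [42, 7, 0, 923, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0],
    "adjv": [15, 1, 0, 3, 0, 0, 0, 187, 1, 0, 0, 0, 0, 0],
}


def recall(pos, counts):
    """Correct predictions for `pos` divided by all words whose true POS is `pos`."""
    return counts[columns.index(pos)] / sum(counts)


recalls = {pos: recall(pos, counts) for pos, counts in rows.items()}

for pos, value in sorted(recalls.items(), key=lambda item: item[1]):
    print(pos, round(100 * value, 1), "%")
```

Of these four classes, nmpr has the lowest recall, at a little over 82%.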

The model can be saved and loaded again to make predictions on, for instance, the Dead Sea Scrolls. The language of the DSS differs somewhat from Biblical Hebrew, which may lower the accuracy slightly. On the other hand, the extra-biblical Text-Fabric module already contains some DSS scrolls, and these can be added to the training set. Adding training data is only one way to improve the model; there may be various others. If you have suggestions, or if you are a student and you would like to do a project on Ancient Hebrew and machine learning, let us know!
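As an illustration of saving and reloading, here is a minimal sketch using Python's `pickle` module. It assumes the trained model is an ordinary picklable Python object; the stand-in `model` below is a hypothetical lookup table, not the model used in this post, and machine-learning frameworks often provide their own save and load functions that should be preferred.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model: a lookup table from
# consonantal word form to POS. The real model in this post is different.
model = {"W": "conj", "B": "prep", "MLK": "subs"}

# Write to a temporary directory so the sketch leaves no files in the project.
model_path = os.path.join(tempfile.mkdtemp(), "pos_model.pkl")

# Save the model to disk.
with open(model_path, "wb") as handle:
    pickle.dump(model, handle)

# Load it again, for example in a later session.
with open(model_path, "rb") as handle:
    loaded_model = pickle.load(handle)

# Use the reloaded model on new words; unseen forms get a placeholder label.
new_words = ["W", "MLK", "XWMH"]
predictions = [loaded_model.get(word, "unknown") for word in new_words]
print(predictions)
```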

It is difficult to say beforehand how the model can be improved and exactly how well the algorithm works on unseen DSS texts, but we will update you soon with our findings in another blog post.

### Correct classifications

Finally, for the record, these are the correct predictions. In the output you see:

- The text-fabric node number
- (book, chapter, verse)
- the consonants of the word
- correct pos
- predicted pos (which is identical to the correct pos, of course)
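The cell below derives the predicted POS of each word by a majority vote over the entries in `decision_dict[key]`. A minimal sketch of that step in isolation, with a made-up list of votes for a single word (the name `votes` is ours):

```python
import collections

# Hypothetical predictions collected for a single word.
votes = ["subs", "verb", "subs", "nmpr", "subs"]

# Count how often each label was predicted.
counts = collections.Counter(votes)

# Pick the label with the highest count; on a tie, max() returns
# the tied label that occurs first in `votes`.
predicted = max(votes, key=counts.get)
print(predicted)
```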

In [18]:
for key in range(len(test_word_nodes)):
    node = test_word_nodes[key]
    true_pos = F.sp.v(node)

    # majority vote over all predictions made for this word
    data = collections.Counter(decision_dict[key])
    predicted = max(decision_dict[key], key=data.get)

    # cross_dict is not updated here: it was already filled before the table above was built
    if true_pos == predicted:
        print(node, T.sectionFromNode(node), F.g_cons.v(node), true_pos, predicted)


383415 ('Nehemiah', 1, 1) DBRJ subs subs
383416 ('Nehemiah', 1, 1) NXMJH nmpr nmpr
383417 ('Nehemiah', 1, 1) BN subs subs
383418 ('Nehemiah', 1, 1) XKLJH nmpr nmpr
383419 ('Nehemiah', 1, 1) W conj conj
383420 ('Nehemiah', 1, 1) JHJ verb verb
383421 ('Nehemiah', 1, 1) B prep prep
383422 ('Nehemiah', 1, 1) XDC subs subs
383424 ('Nehemiah', 1, 1) CNT subs subs
383425 ('Nehemiah', 1, 1) <FRJM subs subs
383426 ('Nehemiah', 1, 1) W conj conj
383427 ('Nehemiah', 1, 1) >NJ prps prps
383428 ('Nehemiah', 1, 1) HJJTJ verb verb
383429 ('Nehemiah', 1, 1) B prep prep
383430 ('Nehemiah', 1, 1) CWCN nmpr nmpr
383431 ('Nehemiah', 1, 1) H art art
383432 ('Nehemiah', 1, 1) BJRH subs subs
383433 ('Nehemiah', 1, 2) W conj conj
383434 ('Nehemiah', 1, 2) JB> verb verb
383436 ('Nehemiah', 1, 2) >XD subs subs
383437 ('Nehemiah', 1, 2) M prep prep
383438 ('Nehemiah', 1, 2) >XJ subs subs
383439 ('Nehemiah', 1, 2) HW> prps prps
383440 ('Nehemiah', 1, 2) W conj conj
383441 ('Nehemiah', 1, 2) >NCJM subs subs
383442

383840 ('Nehemiah', 2, 6) CGL subs subs
383841 ('Nehemiah', 2, 6) JWCBT verb verb
383842 ('Nehemiah', 2, 6) >YLW subs subs
383843 ('Nehemiah', 2, 6) <D prep prep
383844 ('Nehemiah', 2, 6) MTJ inrg inrg
383845 ('Nehemiah', 2, 6) JHJH verb verb
383847 ('Nehemiah', 2, 6) W conj conj
383848 ('Nehemiah', 2, 6) MTJ inrg inrg
383849 ('Nehemiah', 2, 6) TCWB verb verb
383850 ('Nehemiah', 2, 6) W conj conj
383851 ('Nehemiah', 2, 6) JJVB verb verb
383852 ('Nehemiah', 2, 6) L prep prep
383853 ('Nehemiah', 2, 6) PNJ subs subs
383854 ('Nehemiah', 2, 6) H art art
383855 ('Nehemiah', 2, 6) MLK subs subs
383856 ('Nehemiah', 2, 6) W conj conj
383857 ('Nehemiah', 2, 6) JCLXNJ verb verb
383858 ('Nehemiah', 2, 6) W conj conj
383859 ('Nehemiah', 2, 6) >TNH verb verb
383860 ('Nehemiah', 2, 6) LW prep prep
383861 ('Nehemiah', 2, 6) ZMN subs subs
383862 ('Nehemiah', 2, 7) W conj conj
383863 ('Nehemiah', 2, 7) >WMR verb verb
383864 ('Nehemiah', 2, 7) L prep prep
383866 ('Nehemiah', 2, 7) MLK subs subs
383867 ('

384210 ('Nehemiah', 2, 19) <RBJ adjv adjv
384211 ('Nehemiah', 2, 19) W conj conj
384212 ('Nehemiah', 2, 19) JL<GW verb verb
384213 ('Nehemiah', 2, 19) LNW prep prep
384214 ('Nehemiah', 2, 19) W conj conj
384215 ('Nehemiah', 2, 19) JBZW verb verb
384216 ('Nehemiah', 2, 19) <LJNW prep prep
384217 ('Nehemiah', 2, 19) W conj conj
384218 ('Nehemiah', 2, 19) J>MRW verb verb
384219 ('Nehemiah', 2, 19) MH prin prin
384220 ('Nehemiah', 2, 19) H art art
384221 ('Nehemiah', 2, 19) DBR subs subs
384222 ('Nehemiah', 2, 19) H art art
384223 ('Nehemiah', 2, 19) ZH prde prde
384224 ('Nehemiah', 2, 19) >CR conj conj
384225 ('Nehemiah', 2, 19) >TM prps prps
384226 ('Nehemiah', 2, 19) <FJM verb verb
384229 ('Nehemiah', 2, 19) H art art
384230 ('Nehemiah', 2, 19) MLK subs subs
384233 ('Nehemiah', 2, 20) W conj conj
384234 ('Nehemiah', 2, 20) >CJB verb verb
384235 ('Nehemiah', 2, 20) >WTM prep prep
384236 ('Nehemiah', 2, 20) DBR subs subs
384237 ('Nehemiah', 2, 20) W conj conj
384238 ('Nehemiah', 2, 20) >W

384632 ('Nehemiah', 3, 19) <ZR nmpr nmpr
384633 ('Nehemiah', 3, 19) BN subs subs
384634 ('Nehemiah', 3, 19) JCW< nmpr nmpr
384635 ('Nehemiah', 3, 19) FR subs subs
384636 ('Nehemiah', 3, 19) H art art
384637 ('Nehemiah', 3, 19) MYPH nmpr nmpr
384638 ('Nehemiah', 3, 19) MDH subs subs
384639 ('Nehemiah', 3, 19) CNJT adjv adjv
384640 ('Nehemiah', 3, 19) M prep prep
384641 ('Nehemiah', 3, 19) NGD subs subs
384643 ('Nehemiah', 3, 19) H art art
384644 ('Nehemiah', 3, 19) NCQ subs subs
384645 ('Nehemiah', 3, 19) H art art
384647 ('Nehemiah', 3, 20) >XRJW subs subs
384648 ('Nehemiah', 3, 20) HXRH verb verb
384649 ('Nehemiah', 3, 20) HXZJQ verb verb
384651 ('Nehemiah', 3, 20) BN subs subs
384652 ('Nehemiah', 3, 20) ZBJ nmpr nmpr
384653 ('Nehemiah', 3, 20) MDH subs subs
384654 ('Nehemiah', 3, 20) CNJT adjv adjv
384655 ('Nehemiah', 3, 20) MN prep prep
384656 ('Nehemiah', 3, 20) H art art
384657 ('Nehemiah', 3, 20) MQYW< subs subs
384658 ('Nehemiah', 3, 20) <D prep prep
384659 ('Nehemiah', 3, 20) P

385069 ('Nehemiah', 4, 3) JWMM advb advb
385070 ('Nehemiah', 4, 3) W conj conj
385071 ('Nehemiah', 4, 3) LJLH subs subs
385072 ('Nehemiah', 4, 3) M prep prep
385073 ('Nehemiah', 4, 3) PNJHM subs subs
385074 ('Nehemiah', 4, 4) W conj conj
385075 ('Nehemiah', 4, 4) J>MR verb verb
385076 ('Nehemiah', 4, 4) JHWDH nmpr nmpr
385077 ('Nehemiah', 4, 4) KCL verb verb
385078 ('Nehemiah', 4, 4) KX subs subs
385079 ('Nehemiah', 4, 4) H art art
385080 ('Nehemiah', 4, 4) SBL subs subs
385081 ('Nehemiah', 4, 4) W conj conj
385082 ('Nehemiah', 4, 4) H art art
385083 ('Nehemiah', 4, 4) <PR subs subs
385084 ('Nehemiah', 4, 4) HRBH verb verb
385085 ('Nehemiah', 4, 4) W conj conj
385086 ('Nehemiah', 4, 4) >NXNW prps prps
385087 ('Nehemiah', 4, 4) L> nega nega
385088 ('Nehemiah', 4, 4) NWKL verb verb
385089 ('Nehemiah', 4, 4) L prep prep
385090 ('Nehemiah', 4, 4) BNWT verb verb
385091 ('Nehemiah', 4, 4) B prep prep
385093 ('Nehemiah', 4, 4) XWMH subs subs
385094 ('Nehemiah', 4, 5) W conj conj
385095 ('Nehe

385494 ('Nehemiah', 5, 5) BNTJNW subs subs
385495 ('Nehemiah', 5, 5) NKBCWT verb verb
385496 ('Nehemiah', 5, 5) W conj conj
385497 ('Nehemiah', 5, 5) >JN subs subs
385498 ('Nehemiah', 5, 5) L prep prep
385499 ('Nehemiah', 5, 5) >L subs subs
385500 ('Nehemiah', 5, 5) JDNW subs subs
385501 ('Nehemiah', 5, 5) W conj conj
385502 ('Nehemiah', 5, 5) FDTJNW subs subs
385503 ('Nehemiah', 5, 5) W conj conj
385504 ('Nehemiah', 5, 5) KRMJNW subs subs
385505 ('Nehemiah', 5, 5) L prep prep
385506 ('Nehemiah', 5, 5) >XRJM adjv adjv
385507 ('Nehemiah', 5, 6) W conj conj
385508 ('Nehemiah', 5, 6) JXR verb verb
385509 ('Nehemiah', 5, 6) LJ prep prep
385510 ('Nehemiah', 5, 6) M>D subs subs
385511 ('Nehemiah', 5, 6) K prep prep
385512 ('Nehemiah', 5, 6) >CR conj conj
385513 ('Nehemiah', 5, 6) CM<TJ verb verb
385514 ('Nehemiah', 5, 6) >T prep prep
385515 ('Nehemiah', 5, 6) Z<QTM subs subs
385516 ('Nehemiah', 5, 6) W conj conj
385517 ('Nehemiah', 5, 6) >T prep prep
385518 ('Nehemiah', 5, 6) H art art
38551

385914 ('Nehemiah', 6, 1) BNJTJ verb verb
385915 ('Nehemiah', 6, 1) >T prep prep
385916 ('Nehemiah', 6, 1) H art art
385917 ('Nehemiah', 6, 1) XWMH subs subs
385918 ('Nehemiah', 6, 1) W conj conj
385919 ('Nehemiah', 6, 1) L> nega nega
385920 ('Nehemiah', 6, 1) NWTR verb verb
385921 ('Nehemiah', 6, 1) BH prep prep
385922 ('Nehemiah', 6, 1) PRY subs subs
385923 ('Nehemiah', 6, 1) GM advb advb
385924 ('Nehemiah', 6, 1) <D prep prep
385925 ('Nehemiah', 6, 1) H art art
385926 ('Nehemiah', 6, 1) <T subs subs
385927 ('Nehemiah', 6, 1) H art art
385928 ('Nehemiah', 6, 1) HJ> prps prps
385929 ('Nehemiah', 6, 1) DLTWT subs subs
385930 ('Nehemiah', 6, 1) L> nega nega
385931 ('Nehemiah', 6, 1) H<MDTJ verb verb
385932 ('Nehemiah', 6, 1) B prep prep
385934 ('Nehemiah', 6, 1) C<RJM subs subs
385935 ('Nehemiah', 6, 2) W conj conj
385936 ('Nehemiah', 6, 2) JCLX verb verb
385938 ('Nehemiah', 6, 2) W conj conj
385940 ('Nehemiah', 6, 2) >LJ prep prep
385941 ('Nehemiah', 6, 2) L prep prep
385942 ('Nehemiah

386308 ('Nehemiah', 6, 18) CBW<H subs subs
386309 ('Nehemiah', 6, 18) LW prep prep
386310 ('Nehemiah', 6, 18) KJ conj conj
386311 ('Nehemiah', 6, 18) XTN subs subs
386312 ('Nehemiah', 6, 18) HW> prps prps
386313 ('Nehemiah', 6, 18) L prep prep
386315 ('Nehemiah', 6, 18) BN subs subs
386316 ('Nehemiah', 6, 18) >RX nmpr nmpr
386317 ('Nehemiah', 6, 18) W conj conj
386318 ('Nehemiah', 6, 18) JHWXNN nmpr nmpr
386319 ('Nehemiah', 6, 18) BNW subs subs
386320 ('Nehemiah', 6, 18) LQX verb verb
386321 ('Nehemiah', 6, 18) >T prep prep
386322 ('Nehemiah', 6, 18) BT subs subs
386323 ('Nehemiah', 6, 18) MCLM nmpr nmpr
386324 ('Nehemiah', 6, 18) BN subs subs
386325 ('Nehemiah', 6, 18) BRKJH nmpr nmpr
386326 ('Nehemiah', 6, 19) GM advb advb
386327 ('Nehemiah', 6, 19) VWBTJW subs subs
386328 ('Nehemiah', 6, 19) HJW verb verb
386330 ('Nehemiah', 6, 19) L prep prep
386331 ('Nehemiah', 6, 19) PNJ subs subs
386332 ('Nehemiah', 6, 19) W conj conj
386333 ('Nehemiah', 6, 19) DBRJ subs subs
386334 ('Nehemiah',

386656 ('Nehemiah', 7, 27) CMNH subs subs
386657 ('Nehemiah', 7, 28) >NCJ subs subs
386658 ('Nehemiah', 7, 28) BJT_<ZMWT nmpr nmpr
386659 ('Nehemiah', 7, 28) >RB<JM subs subs
386660 ('Nehemiah', 7, 28) W conj conj
386661 ('Nehemiah', 7, 28) CNJM subs subs
386662 ('Nehemiah', 7, 29) >NCJ subs subs
386663 ('Nehemiah', 7, 29) QRJT_J<RJM nmpr nmpr
386664 ('Nehemiah', 7, 29) KPJRH nmpr nmpr
386665 ('Nehemiah', 7, 29) W conj conj
386666 ('Nehemiah', 7, 29) B>RWT nmpr nmpr
386667 ('Nehemiah', 7, 29) CB< subs subs
386668 ('Nehemiah', 7, 29) M>WT subs subs
386669 ('Nehemiah', 7, 29) >RB<JM subs subs
386670 ('Nehemiah', 7, 29) W conj conj
386671 ('Nehemiah', 7, 29) CLCH subs subs
386672 ('Nehemiah', 7, 30) >NCJ subs subs
386673 ('Nehemiah', 7, 30) H art art
386674 ('Nehemiah', 7, 30) RMH nmpr nmpr
386675 ('Nehemiah', 7, 30) W conj conj
386676 ('Nehemiah', 7, 30) GB< nmpr nmpr
386677 ('Nehemiah', 7, 30) CC subs subs
386678 ('Nehemiah', 7, 30) M>WT subs subs
386679 ('Nehemiah', 7, 30) <FRJM subs s

387060 ('Nehemiah', 7, 69) >BWT subs subs
387061 ('Nehemiah', 7, 69) NTNW verb verb
387062 ('Nehemiah', 7, 69) L prep prep
387064 ('Nehemiah', 7, 69) ML>KH subs subs
387065 ('Nehemiah', 7, 69) H art art
387066 ('Nehemiah', 7, 69) TRCT> subs subs
387067 ('Nehemiah', 7, 69) NTN verb verb
387068 ('Nehemiah', 7, 69) L prep prep
387070 ('Nehemiah', 7, 69) >WYR subs subs
387071 ('Nehemiah', 7, 69) ZHB subs subs
387073 ('Nehemiah', 7, 69) >LP subs subs
387074 ('Nehemiah', 7, 69) MZRQWT subs subs
387075 ('Nehemiah', 7, 69) XMCJM subs subs
387076 ('Nehemiah', 7, 69) KTNWT subs subs
387077 ('Nehemiah', 7, 69) KHNJM subs subs
387078 ('Nehemiah', 7, 69) CLCJM subs subs
387079 ('Nehemiah', 7, 69) W conj conj
387080 ('Nehemiah', 7, 69) XMC subs subs
387081 ('Nehemiah', 7, 69) M>WT subs subs
387082 ('Nehemiah', 7, 70) W conj conj
387083 ('Nehemiah', 7, 70) M prep prep
387084 ('Nehemiah', 7, 70) R>CJ subs subs
387085 ('Nehemiah', 7, 70) H art art
387086 ('Nehemiah', 7, 70) >BWT subs subs
387087 ('Nehe

387482 ('Nehemiah', 8, 11) LWJM adjv adjv
387483 ('Nehemiah', 8, 11) MXCJM verb verb
387484 ('Nehemiah', 8, 11) L prep prep
387485 ('Nehemiah', 8, 11) KL subs subs
387486 ('Nehemiah', 8, 11) H art art
387487 ('Nehemiah', 8, 11) <M subs subs
387488 ('Nehemiah', 8, 11) L prep prep
387489 ('Nehemiah', 8, 11) >MR verb verb
387490 ('Nehemiah', 8, 11) HSW verb verb
387491 ('Nehemiah', 8, 11) KJ conj conj
387492 ('Nehemiah', 8, 11) H art art
387493 ('Nehemiah', 8, 11) JWM subs subs
387495 ('Nehemiah', 8, 11) W conj conj
387496 ('Nehemiah', 8, 11) >L nega nega
387497 ('Nehemiah', 8, 11) T<YBW verb verb
387498 ('Nehemiah', 8, 12) W conj conj
387499 ('Nehemiah', 8, 12) JLKW verb verb
387500 ('Nehemiah', 8, 12) KL subs subs
387501 ('Nehemiah', 8, 12) H art art
387502 ('Nehemiah', 8, 12) <M subs subs
387503 ('Nehemiah', 8, 12) L prep prep
387504 ('Nehemiah', 8, 12) >KL verb verb
387505 ('Nehemiah', 8, 12) W conj conj
387506 ('Nehemiah', 8, 12) L prep prep
387507 ('Nehemiah', 8, 12) CTWT verb verb


387908 ('Nehemiah', 9, 8) N>MN verb verb
387909 ('Nehemiah', 9, 8) L prep prep
387910 ('Nehemiah', 9, 8) PNJK subs subs
387911 ('Nehemiah', 9, 8) W conj conj
387912 ('Nehemiah', 9, 8) KRWT verb verb
387913 ('Nehemiah', 9, 8) <MW prep prep
387914 ('Nehemiah', 9, 8) H art art
387915 ('Nehemiah', 9, 8) BRJT subs subs
387916 ('Nehemiah', 9, 8) L prep prep
387917 ('Nehemiah', 9, 8) TT verb verb
387918 ('Nehemiah', 9, 8) >T prep prep
387919 ('Nehemiah', 9, 8) >RY subs subs
387920 ('Nehemiah', 9, 8) H art art
387921 ('Nehemiah', 9, 8) KN<NJ adjv adjv
387922 ('Nehemiah', 9, 8) H art art
387923 ('Nehemiah', 9, 8) XTJ adjv adjv
387924 ('Nehemiah', 9, 8) H art art
387925 ('Nehemiah', 9, 8) >MRJ adjv adjv
387926 ('Nehemiah', 9, 8) W conj conj
387927 ('Nehemiah', 9, 8) H art art
387928 ('Nehemiah', 9, 8) PRZJ adjv adjv
387929 ('Nehemiah', 9, 8) W conj conj
387930 ('Nehemiah', 9, 8) H art art
387931 ('Nehemiah', 9, 8) JBWSJ adjv adjv
387932 ('Nehemiah', 9, 8) W conj conj
387933 ('Nehemiah', 9, 8) H 

388313 ('Nehemiah', 9, 24) >RY subs subs
388314 ('Nehemiah', 9, 24) W conj conj
388315 ('Nehemiah', 9, 24) TKN< verb verb
388316 ('Nehemiah', 9, 24) L prep prep
388317 ('Nehemiah', 9, 24) PNJHM subs subs
388318 ('Nehemiah', 9, 24) >T prep prep
388319 ('Nehemiah', 9, 24) JCBJ verb verb
388320 ('Nehemiah', 9, 24) H art art
388321 ('Nehemiah', 9, 24) >RY subs subs
388322 ('Nehemiah', 9, 24) H art art
388323 ('Nehemiah', 9, 24) KN<NJM adjv adjv
388324 ('Nehemiah', 9, 24) W conj conj
388325 ('Nehemiah', 9, 24) TTNM verb verb
388326 ('Nehemiah', 9, 24) B prep prep
388327 ('Nehemiah', 9, 24) JDM subs subs
388328 ('Nehemiah', 9, 24) W conj conj
388329 ('Nehemiah', 9, 24) >T prep prep
388330 ('Nehemiah', 9, 24) MLKJHM subs subs
388331 ('Nehemiah', 9, 24) W conj conj
388332 ('Nehemiah', 9, 24) >T prep prep
388333 ('Nehemiah', 9, 24) <MMJ subs subs
388334 ('Nehemiah', 9, 24) H art art
388335 ('Nehemiah', 9, 24) >RY subs subs
388336 ('Nehemiah', 9, 24) L prep prep
388337 ('Nehemiah', 9, 24) <FWT v

388723 ('Nehemiah', 10, 1) H art art
388724 ('Nehemiah', 10, 1) XTWM verb verb
388725 ('Nehemiah', 10, 1) FRJNW subs subs
388727 ('Nehemiah', 10, 1) KHNJNW subs subs
388728 ('Nehemiah', 10, 2) W conj conj
388729 ('Nehemiah', 10, 2) <L prep prep
388730 ('Nehemiah', 10, 2) H art art
388733 ('Nehemiah', 10, 2) H art art
388734 ('Nehemiah', 10, 2) TRCT> subs subs
388735 ('Nehemiah', 10, 2) BN subs subs
388736 ('Nehemiah', 10, 2) XKLJH nmpr nmpr
388737 ('Nehemiah', 10, 2) W conj conj
388738 ('Nehemiah', 10, 2) YDQJH nmpr nmpr
388740 ('Nehemiah', 10, 3) <ZRJH nmpr nmpr
388741 ('Nehemiah', 10, 3) JRMJH nmpr nmpr
388742 ('Nehemiah', 10, 4) PCXWR nmpr nmpr
388743 ('Nehemiah', 10, 4) >MRJH nmpr nmpr
388744 ('Nehemiah', 10, 4) MLKJH nmpr nmpr
388745 ('Nehemiah', 10, 5) XVWC nmpr nmpr
388746 ('Nehemiah', 10, 5) CBNJH nmpr nmpr
388747 ('Nehemiah', 10, 5) MLWK nmpr nmpr
388748 ('Nehemiah', 10, 6) XRM nmpr nmpr
388749 ('Nehemiah', 10, 6) MRMWT nmpr nmpr
388751 ('Nehemiah', 10, 7) DNJ>L nmpr nmpr
3887

389173 ('Nehemiah', 10, 39) >L prep prep
389174 ('Nehemiah', 10, 39) H art art
389175 ('Nehemiah', 10, 39) LCKWT subs subs
389176 ('Nehemiah', 10, 39) L prep prep
389177 ('Nehemiah', 10, 39) BJT subs subs
389178 ('Nehemiah', 10, 39) H art art
389179 ('Nehemiah', 10, 39) >WYR subs subs
389180 ('Nehemiah', 10, 40) KJ conj conj
389181 ('Nehemiah', 10, 40) >L prep prep
389182 ('Nehemiah', 10, 40) H art art
389183 ('Nehemiah', 10, 40) LCKWT subs subs
389184 ('Nehemiah', 10, 40) JBJ>W verb verb
389185 ('Nehemiah', 10, 40) BNJ subs subs
389186 ('Nehemiah', 10, 40) JFR>L nmpr nmpr
389187 ('Nehemiah', 10, 40) W conj conj
389188 ('Nehemiah', 10, 40) BNJ subs subs
389189 ('Nehemiah', 10, 40) H art art
389190 ('Nehemiah', 10, 40) LWJ adjv adjv
389191 ('Nehemiah', 10, 40) >T prep prep
389192 ('Nehemiah', 10, 40) TRWMT subs subs
389193 ('Nehemiah', 10, 40) H art art
389194 ('Nehemiah', 10, 40) DGN subs subs
389195 ('Nehemiah', 10, 40) H art art
389196 ('Nehemiah', 10, 40) TJRWC subs subs
389197 ('Ne

389586 ('Nehemiah', 11, 21) H art art
389587 ('Nehemiah', 11, 21) NTJNJM subs subs
389588 ('Nehemiah', 11, 21) JCBJM verb verb
389589 ('Nehemiah', 11, 21) B prep prep
389591 ('Nehemiah', 11, 21) <PL nmpr nmpr
389592 ('Nehemiah', 11, 21) W conj conj
389593 ('Nehemiah', 11, 21) YJX> nmpr nmpr
389594 ('Nehemiah', 11, 21) W conj conj
389596 ('Nehemiah', 11, 21) <L prep prep
389597 ('Nehemiah', 11, 21) H art art
389598 ('Nehemiah', 11, 21) NTJNJM subs subs
389599 ('Nehemiah', 11, 22) W conj conj
389600 ('Nehemiah', 11, 22) PQJD subs subs
389601 ('Nehemiah', 11, 22) H art art
389602 ('Nehemiah', 11, 22) LWJM adjv adjv
389603 ('Nehemiah', 11, 22) B prep prep
389604 ('Nehemiah', 11, 22) JRWCLM nmpr nmpr
389605 ('Nehemiah', 11, 22) <ZJ nmpr nmpr
389606 ('Nehemiah', 11, 22) BN subs subs
389607 ('Nehemiah', 11, 22) BNJ nmpr nmpr
389608 ('Nehemiah', 11, 22) BN subs subs
389609 ('Nehemiah', 11, 22) XCBJH nmpr nmpr
389610 ('Nehemiah', 11, 22) BN subs subs
389611 ('Nehemiah', 11, 22) MTNJH nmpr nmpr


390037 ('Nehemiah', 12, 27) W conj conj
390038 ('Nehemiah', 12, 27) B prep prep
390039 ('Nehemiah', 12, 27) XNKT subs subs
390040 ('Nehemiah', 12, 27) XWMT subs subs
390041 ('Nehemiah', 12, 27) JRWCLM nmpr nmpr
390042 ('Nehemiah', 12, 27) BQCW verb verb
390043 ('Nehemiah', 12, 27) >T prep prep
390044 ('Nehemiah', 12, 27) H art art
390045 ('Nehemiah', 12, 27) LWJM adjv adjv
390046 ('Nehemiah', 12, 27) M prep prep
390047 ('Nehemiah', 12, 27) KL subs subs
390048 ('Nehemiah', 12, 27) MQWMTM subs subs
390049 ('Nehemiah', 12, 27) L prep prep
390050 ('Nehemiah', 12, 27) HBJ>M verb verb
390051 ('Nehemiah', 12, 27) L prep prep
390052 ('Nehemiah', 12, 27) JRWCLM nmpr nmpr
390053 ('Nehemiah', 12, 27) L prep prep
390054 ('Nehemiah', 12, 27) <FT verb verb
390056 ('Nehemiah', 12, 27) W conj conj
390057 ('Nehemiah', 12, 27) FMXH subs subs
390058 ('Nehemiah', 12, 27) W conj conj
390059 ('Nehemiah', 12, 27) B prep prep
390060 ('Nehemiah', 12, 27) TWDWT subs subs
390061 ('Nehemiah', 12, 27) W conj conj


390466 ('Nehemiah', 12, 46) KJ conj conj
390467 ('Nehemiah', 12, 46) B prep prep
390468 ('Nehemiah', 12, 46) JMJ subs subs
390469 ('Nehemiah', 12, 46) DWJD nmpr nmpr
390470 ('Nehemiah', 12, 46) W conj conj
390471 ('Nehemiah', 12, 46) >SP nmpr nmpr
390472 ('Nehemiah', 12, 46) M prep prep
390473 ('Nehemiah', 12, 46) QDM subs subs
390474 ('Nehemiah', 12, 46) R>C subs subs
390475 ('Nehemiah', 12, 46) H art art
390476 ('Nehemiah', 12, 46) MCRRJM verb verb
390477 ('Nehemiah', 12, 46) W conj conj
390478 ('Nehemiah', 12, 46) CJR subs subs
390479 ('Nehemiah', 12, 46) THLH subs subs
390480 ('Nehemiah', 12, 46) W conj conj
390481 ('Nehemiah', 12, 46) HDWT verb verb
390482 ('Nehemiah', 12, 46) L prep prep
390483 ('Nehemiah', 12, 46) >LHJM subs subs
390484 ('Nehemiah', 12, 47) W conj conj
390485 ('Nehemiah', 12, 47) KL subs subs
390486 ('Nehemiah', 12, 47) JFR>L nmpr nmpr
390487 ('Nehemiah', 12, 47) B prep prep
390488 ('Nehemiah', 12, 47) JMJ subs subs
390489 ('Nehemiah', 12, 47) ZRBBL nmpr nmpr
39

390883 ('Nehemiah', 13, 16) JCBW verb verb
390884 ('Nehemiah', 13, 16) BH prep prep
390885 ('Nehemiah', 13, 16) MBJ>JM verb verb
390887 ('Nehemiah', 13, 16) W conj conj
390888 ('Nehemiah', 13, 16) KL subs subs
390890 ('Nehemiah', 13, 16) W conj conj
390891 ('Nehemiah', 13, 16) MKRJM verb verb
390892 ('Nehemiah', 13, 16) B prep prep
390894 ('Nehemiah', 13, 16) CBT subs subs
390895 ('Nehemiah', 13, 16) L prep prep
390896 ('Nehemiah', 13, 16) BNJ subs subs
390897 ('Nehemiah', 13, 16) JHWDH nmpr nmpr
390898 ('Nehemiah', 13, 16) W conj conj
390899 ('Nehemiah', 13, 16) B prep prep
390900 ('Nehemiah', 13, 16) JRWCLM nmpr nmpr
390901 ('Nehemiah', 13, 17) W conj conj
390902 ('Nehemiah', 13, 17) >RJBH verb verb
390903 ('Nehemiah', 13, 17) >T prep prep
390904 ('Nehemiah', 13, 17) XRJ subs subs
390905 ('Nehemiah', 13, 17) JHWDH nmpr nmpr
390906 ('Nehemiah', 13, 17) W conj conj
390907 ('Nehemiah', 13, 17) >MRH verb verb
390908 ('Nehemiah', 13, 17) LHM prep prep
390909 ('Nehemiah', 13, 17) MH prin p