# Find phrase boundaries in BH

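The model in this notebook is designed to find phrase boundaries in chunks of Biblical Hebrew (BH) text, based on the division into phrases in the ETCBC database. Clause boundaries are not predicted in this version, although the output vocabulary already contains an extra symbol 'c'.
The model assumes that the text has already been analyzed at word level, so that the part of speech of every word is known.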

Each input chunk consists of 20 words. Each word is represented by its part of speech. For instance, a typical chunk has the following structure (the example shows the first 20 words of the book of Genesis):

['prep', 'subs', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', 'prep', 'art', 'subs', 'conj', 'art', 'subs', 'verb', 'subs', 'conj', 'subs', 'conj', 'subs']

The corresponding output is:

['\t', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', '\n']

Here every 'x' represents a word in the input sequence, and 'p' marks the end of a phrase. '\t' and '\n' are start and stop symbols. 
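
As a minimal sketch of this encoding, the output sequence of a chunk can be built from the number of words in each of its phrases. The helper `encode_boundaries` and its `close_last` flag are illustrative names and not part of the notebook; the phrase lengths are read off the Genesis example above.

```python
def encode_boundaries(phrase_lengths, close_last=True):
    """Return the output sequence of a chunk, given the number of words in each phrase.

    close_last=False models a chunk whose final phrase continues beyond the
    20-word window, so that no 'p' follows its last word.
    """
    labels = ['\t']                           # start symbol
    last = len(phrase_lengths) - 1
    for i, n_words in enumerate(phrase_lengths):
        labels.extend(['x'] * n_words)        # one 'x' per word
        if i < last or close_last:
            labels.append('p')                # the phrase ends after this word
    labels.append('\n')                       # stop symbol
    return labels


# Phrase lengths of the first 20 words of Genesis, read off the example above
genesis_chunk = encode_boundaries([2, 1, 1, 7, 1, 2, 1, 3, 1, 1])
print(len(genesis_chunk))   # 32 symbols: 20 words, 10 boundaries, start and stop
```

The length of an output sequence therefore depends on the number of phrases in the chunk, which is why the model is set up as a sequence-to-sequence problem and not as a tagger with one label per word.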

In the following input chunk we have moved forward one word:

Input:

['subs', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', 'prep', 'art', 'subs', 'conj', 'art', 'subs', 'verb', 'subs', 'conj', 'subs', 'conj', 'subs', 'prep']

Output:

['\t', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', '\n']
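
The way consecutive chunks overlap can be sketched with a small generator. `sliding_chunks` is a hypothetical helper, not a function of the notebook; the 21 parts of speech are taken from the two examples above.

```python
def sliding_chunks(items, size=20):
    """Yield every window of `size` consecutive items, moving forward one item at a time."""
    for start in range(len(items) - size + 1):
        yield items[start:start + size]


# Parts of speech of the first 21 words of Genesis (from the two examples above)
pos_tags = ['prep', 'subs', 'verb', 'subs', 'prep', 'art', 'subs', 'conj', 'prep', 'art',
            'subs', 'conj', 'art', 'subs', 'verb', 'subs', 'conj', 'subs', 'conj', 'subs',
            'prep']

chunks = list(sliding_chunks(pos_tags))
print(len(chunks))                    # 2
print(chunks[1][0], chunks[1][-1])    # subs prep
```

Because the window moves one word at a time, every word (except those near the edges of a book) occurs in 20 different chunks. Note that the loop in the notebook itself, `range(words_in_book[0], words_in_book[-1] - 20)`, stops slightly earlier than this sketch and leaves out the last two possible windows of each book.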


In [212]:
from keras.models import Model
from keras.layers import Input, LSTM, Dense
import numpy as np

from sklearn.utils import shuffle

In [3]:
from tf.app import use
A = use('bhsa', hoist=globals())
#A.displaySetup(extraFeatures='g_cons')

TF app is up-to-date.
Using annotation/app-bhsa commit 7f353d587f4befb6efe1742831e28f301d2b3cea (=latest)
  in C:\Users\geitb/text-fabric-data/__apps__/bhsa.
Using etcbc/bhsa/tf - c r1.5 in C:\Users\geitb/text-fabric-data
Using etcbc/phono/tf - c r1.2 in C:\Users\geitb/text-fabric-data
Using etcbc/parallels/tf - c r1.2 in C:\Users\geitb/text-fabric-data


**Documentation:** [BHSA](https://etcbc.github.io/bhsa), [Character table](https://annotation.github.io/text-fabric/Writing/Hebrew), [Feature docs](https://etcbc.github.io/bhsa/features/hebrew/c/0_home.html), [bhsa API](https://github.com/annotation/app-bhsa), [Text-Fabric API 7.3.15](https://annotation.github.io/text-fabric/Api/Fabric/), [Search Reference](https://annotation.github.io/text-fabric/Use/Search/)

A training set and a test set are defined. The model is trained on the books Genesis through Reges_II (2 Kings) of the MT and tested on Jonah, a book it has not seen: the model is used to predict the phrase boundaries in this book.

In [44]:
for bo in F.otype.s("book"):
    print(F.book.v(bo))

Genesis
Exodus
Leviticus
Numeri
Deuteronomium
Josua
Judices
Samuel_I
Samuel_II
Reges_I
Reges_II
Jesaia
Jeremia
Ezechiel
Hosea
Joel
Amos
Obadia
Jona
Micha
Nahum
Habakuk
Zephania
Haggai
Sacharia
Maleachi
Psalmi
Iob
Proverbia
Ruth
Canticum
Ecclesiastes
Threni
Esther
Daniel
Esra
Nehemia
Chronica_I
Chronica_II
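
With the book names listed above, the split can be sketched as follows. The filter in `prepare_train_data` further down keeps only Genesis through Reges_II, and `prepare_test_data` keeps only Jona. The names `ALL_BOOKS`, `TRAIN_BOOKS`, `TEST_BOOKS` and `split_books` are illustrative and do not occur in the notebook.

```python
ALL_BOOKS = [
    "Genesis", "Exodus", "Leviticus", "Numeri", "Deuteronomium", "Josua", "Judices",
    "Samuel_I", "Samuel_II", "Reges_I", "Reges_II", "Jesaia", "Jeremia", "Ezechiel",
    "Hosea", "Joel", "Amos", "Obadia", "Jona", "Micha", "Nahum", "Habakuk", "Zephania",
    "Haggai", "Sacharia", "Maleachi", "Psalmi", "Iob", "Proverbia", "Ruth", "Canticum",
    "Ecclesiastes", "Threni", "Esther", "Daniel", "Esra", "Nehemia", "Chronica_I",
    "Chronica_II",
]
TRAIN_BOOKS = {"Genesis", "Exodus", "Leviticus", "Numeri", "Deuteronomium", "Josua",
               "Judices", "Samuel_I", "Samuel_II", "Reges_I", "Reges_II"}
TEST_BOOKS = {"Jona"}


def split_books(books, train=TRAIN_BOOKS, test=TEST_BOOKS):
    """Partition book names into training, test and unused books, keeping their order."""
    train_part = [b for b in books if b in train]
    test_part = [b for b in books if b in test]
    unused = [b for b in books if b not in train and b not in test]
    return train_part, test_part, unused


train_books, test_books, unused_books = split_books(ALL_BOOKS)
print(len(train_books), test_books, len(unused_books))   # 11 ['Jona'] 27
```

Keeping the two sets of names in one place makes it easy to see which books are used for neither training nor testing.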


The following functions prepare the training and test data as overlapping chunks of 20 words.

In [215]:
def prepare_train_data():
    """
    Prepares the training data from the books Genesis through Reges_II.
    The function returns:
    input_clauses is a list of chunks; every chunk is a list with the parts of speech of 20 consecutive words
    output_phrases is a list with the output sequence of every chunk: a start symbol, an 'x' for every word,
    a 'p' after every phrase-final word, and a stop symbol
    the input vocabulary, containing the parts of speech occurring in the chunks
    output_vocab is a sorted list containing the output symbols
    max_len_input is the maximum length of the input chunks in number of words
    max_len_output is the maximum length of the output sequences in number of symbols (including the
    start and stop symbols)
    """

    input_clauses = []
    output_phrases = []
    input_vocab = set()
    output_vocab = set()

    for bo in F.otype.s("book"): 

        if F.book.v(bo) not in {"Genesis", "Exodus", "Leviticus", "Numeri", "Deuteronomium", "Josua", "Judices", "Samuel_I", "Samuel_II", "Reges_I", "Reges_II"}:
            continue
        
        words_in_book = L.d(bo, "word")
        
        for wo in range(words_in_book[0], words_in_book[-1] - 20):
            input_chunk = [F.sp.v(wo2) for wo2 in range(wo, wo+20)]
            
            output_prep = []
            for word in range(wo, wo+20):
                phrase = L.u(word, "phrase")[0]
                words_in_ph = L.d(phrase, "word")
                
                output_prep.append('x')
                input_vocab.add(F.sp.v(word))
                
                if word == words_in_ph[-1]:
                    output_prep.append("p")

            output_chunk = ['\t']
            for elem in output_prep:
                output_chunk.append(elem)
            output_chunk.append('\n')
    
            input_clauses.append(input_chunk)
            output_phrases.append(output_chunk)
    
    input_chars = sorted(list(input_vocab))
    
    output_vocab.add('x')
    output_vocab.add('\t')
    output_vocab.add('\n')
    output_vocab.add('p')
    # 'c' is never produced by the loop above, but it still counts as one of the 5 output symbols
    output_vocab.add('c')
    output_vocab = sorted(list(output_vocab))
    
    max_len_input = max([len(clause) for clause in input_clauses])
    max_len_output = max([len(output_phr) for output_phr in output_phrases])
    
    # shuffle the data
    input_clauses, output_phrases = shuffle(input_clauses, output_phrases)
    
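    # return the sorted list input_chars (not the set input_vocab), so that the index order is reproducible
    return input_clauses, output_phrases, input_chars, output_vocab, max_len_input, max_len_output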
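
The core of this function can be tried out without Text-Fabric on a toy corpus. In the sketch below the lookup of the phrase of a word (`L.u` and `L.d` in the notebook) is replaced by a plain list of phrase identifiers; `make_pairs` and the toy data are made up for illustration.

```python
def make_pairs(pos_tags, phrase_ids, size):
    """Build (input chunk, output sequence) pairs from a toy corpus.

    pos_tags[i] is the part of speech of word i, phrase_ids[i] identifies its phrase.
    """
    # index of the last word of every phrase (later words overwrite earlier ones)
    last_word = {phrase: i for i, phrase in enumerate(phrase_ids)}

    pairs = []
    for start in range(len(pos_tags) - size + 1):
        output = ['\t']
        for i in range(start, start + size):
            output.append('x')
            if last_word[phrase_ids[i]] == i:    # word i closes its phrase
                output.append('p')
        output.append('\n')
        pairs.append((pos_tags[start:start + size], output))
    return pairs


# Toy corpus of 5 words in 3 phrases (made-up data, not taken from the BHSA)
toy_pos = ['prep', 'subs', 'verb', 'art', 'subs']
toy_phrases = [0, 0, 1, 2, 2]

for chunk, output in make_pairs(toy_pos, toy_phrases, size=3):
    print(chunk, output)
```

The second pair ends in 'x' without a following 'p', because the window cuts through the last phrase. The same happens in the real data whenever a chunk ends in the middle of a phrase.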

In [198]:
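def prepare_test_data():
    """
    Prepares the test data from the book of Jonah ("Jona").
    The function returns:
    input_clauses_test, a list of chunks with the parts of speech of 20 consecutive words
    outputs_test, a list with the true output sequence of every chunk ('x' for every word, 'p' after
    every phrase-final word), without start and stop symbols
    """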
    input_clauses_test = []
    outputs_test = []
    for bo in F.otype.s("book"): 

        if F.book.v(bo) != "Jona":
            continue
        
        words_in_book = L.d(bo, "word")
        
        for wo in range(words_in_book[0], words_in_book[-1] - 20):
            input_test = [F.sp.v(wo2) for wo2 in range(wo, wo+20)]    
            input_clauses_test.append(input_test)
        
        
            output_prep = []
            for word in range(wo, wo+20):
                phrase = L.u(word, "phrase")[0]
                words_in_ph = L.d(phrase, "word")
                
                output_prep.append('x')
                
                if word == words_in_ph[-1]:
                    output_prep.append("p")
                    
            outputs_test.append(output_prep)
    
    return input_clauses_test, outputs_test

In [200]:
def create_dicts(input_vocab, output_vocab):
    """
    The network can only handle numeric data. This function provides four dicts. 
    Two of them map between integers and the input symbols, i.e. the parts of speech (one dict for every
    direction), the other two map between integers and the output symbols ('x', 'p', 'c', start and stop).
    """

    
    input_idx2char = {}
    input_char2idx = {}

    # use the function argument instead of the global input_chars; sorting fixes the index order
    for k, v in enumerate(sorted(input_vocab)):
        input_idx2char[k] = v
        input_char2idx[v] = k
        
    output_idx2char = {}
    output_char2idx = {}
    
    for k, v in enumerate(output_vocab):
        output_idx2char[k] = v
        output_char2idx[v] = k
     
    
    return input_idx2char, input_char2idx, output_idx2char, output_char2idx
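
A compact way to build such a pair of dicts, and to check that encoding and decoding are each other's inverse, is sketched here. `build_index` is a hypothetical helper; the vocabulary is the five-symbol output vocabulary of this notebook.

```python
def build_index(vocab):
    """Return (idx2sym, sym2idx) for a vocabulary, with indices assigned in sorted order."""
    symbols = sorted(vocab)
    idx2sym = dict(enumerate(symbols))
    sym2idx = {sym: idx for idx, sym in idx2sym.items()}
    return idx2sym, sym2idx


output_idx2sym, output_sym2idx = build_index({'x', 'p', 'c', '\t', '\n'})

encoded = [output_sym2idx[s] for s in ['\t', 'x', 'p', '\n']]
decoded = [output_idx2sym[i] for i in encoded]
print(encoded)   # [0, 4, 3, 1]
```

Sorting before enumerating matters: iterating over a set directly gives no guaranteed order, so the same symbol could receive a different index in another session.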

In [201]:
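def one_hot_encode(nb_samples, max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, input_clauses, output_pos):
    """
    Categorical data are generally one-hot encoded in neural networks, which is done here.
    Only the first nb_samples chunks are encoded. Three arrays are returned: the encoded input,
    the encoded output (the decoder input), and target_data, which is the encoded output shifted
    one timestep ahead (teacher forcing).
    """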

    tokenized_input_data = np.zeros(shape = (nb_samples,max_len_input,len(input_chars)), dtype='float32')
    tokenized_output = np.zeros(shape = (nb_samples,max_len_output,len(output_vocab)), dtype='float32')
    target_data = np.zeros((nb_samples, max_len_output, len(output_vocab)),dtype='float32')

    for i in range(nb_samples):
        for k, ch in enumerate(input_clauses[i]):
            tokenized_input_data[i, k, input_char2idx[ch]] = 1
        
        for k, ch in enumerate(output_pos[i]):
            tokenized_output[i, k, output_char2idx[ch]] = 1

            # target_data will be ahead by one timestep and will not include the start character.
            if k > 0:
                target_data[i, k-1, output_char2idx[ch]] = 1
                
    return tokenized_input_data, tokenized_output, target_data
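
The relation between the decoder input and the target can be seen on a single short sequence. The sketch below uses plain Python lists instead of NumPy arrays so that it runs without dependencies; `one_hot`, `encode_decoder_pair` and the four-symbol vocabulary are assumptions made for the example.

```python
def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    vector = [0.0] * size
    vector[index] = 1.0
    return vector


def encode_decoder_pair(sequence, sym2idx, max_len):
    """One-hot encode a decoder input and its target; the target runs one step ahead."""
    size = len(sym2idx)
    decoder_input = [[0.0] * size for _ in range(max_len)]
    target = [[0.0] * size for _ in range(max_len)]
    for k, symbol in enumerate(sequence):
        decoder_input[k] = one_hot(sym2idx[symbol], size)
        if k > 0:
            # the target at step k-1 is the symbol the decoder reads at step k
            target[k - 1] = one_hot(sym2idx[symbol], size)
    return decoder_input, target


sym2idx = {'\t': 0, '\n': 1, 'p': 2, 'x': 3}   # hypothetical 4-symbol vocabulary
dec_in, target = encode_decoder_pair(['\t', 'x', 'p', '\n'], sym2idx, max_len=5)

print(dec_in[0])   # [1.0, 0.0, 0.0, 0.0]  the start symbol
print(target[0])   # [0.0, 0.0, 0.0, 1.0]  'x', the symbol to predict first
```

Positions beyond the end of the sequence stay all-zero, which is how shorter sequences are padded to `max_len`.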

In [202]:
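def define_LSTM_model(input_chars, output_vocab):
    """
    Defines an encoder-decoder (sequence-to-sequence) model.
    The encoder consists of two stacked LSTM layers of 512 units. Its final states initialise the
    decoder LSTM, which is followed by a softmax layer over the output symbols.
    The layers that are needed to build the inference models are returned together with the model.
    """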

    # Encoder model

    encoder_input = Input(shape=(None,len(input_chars)))
    encoder_LSTM = LSTM(512,activation = 'relu',return_state = True, return_sequences=True)(encoder_input)
    encoder_LSTM = LSTM(512,return_state = True)(encoder_LSTM)
    encoder_outputs, encoder_h, encoder_c = encoder_LSTM
    encoder_states = [encoder_h, encoder_c]
    
    # Decoder model

    decoder_input = Input(shape=(None,len(output_vocab)))
    decoder_LSTM = LSTM(512, return_sequences=True, return_state = True)
    decoder_out, _ , _ = decoder_LSTM(decoder_input, initial_state=encoder_states)
    decoder_dense = Dense(len(output_vocab), activation='softmax')
    decoder_out = decoder_dense (decoder_out)
    
    model = Model(inputs=[encoder_input, decoder_input],outputs=[decoder_out])

    model.summary()

    return encoder_input, encoder_states, decoder_input, decoder_LSTM, decoder_dense, model

In [203]:
def compile_and_train(model, tokenized_input, tokenized_output, batch_size, epochs, validation_split):

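    # Note: target_data is not an argument of this function; it is read from the notebook's global scope.
    model.compile(optimizer='adam', loss='categorical_crossentropy')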
    model.fit(x=[tokenized_input,tokenized_output], 
              y=target_data,
              batch_size=batch_size,
              epochs=epochs,
              validation_split=validation_split)
    
    return model

In [216]:
nb_samples = 200000

input_clauses, output_pos, input_chars, output_vocab, max_len_input, max_len_output = prepare_train_data()
input_idx2char, input_char2idx, output_idx2char, output_char2idx = create_dicts(input_chars, output_vocab)
tokenized_input, tokenized_output, target_data = one_hot_encode(nb_samples, max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, input_clauses, output_pos)

In [217]:
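test_clauses, output_test = prepare_test_data()
# Only the encoded input is kept; the two output arrays are discarded, so output_pos merely fills the argument slot.
tokenized_test_data, _, _ = one_hot_encode(len(test_clauses), max_len_input, max_len_output, input_chars, output_vocab, input_char2idx, output_char2idx, test_clauses, output_pos)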

In [218]:
len(input_clauses)

211838

In [219]:
encoder_input, encoder_states, decoder_input, decoder_LSTM, decoder_dense, model = define_LSTM_model(input_chars, output_vocab)
model = compile_and_train(model, tokenized_input, tokenized_output, 512, 11, 0.1)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_25 (InputLayer)           (None, None, 14)     0                                            
__________________________________________________________________________________________________
lstm_19 (LSTM)                  [(None, None, 512),  1079296     input_25[0][0]                   
__________________________________________________________________________________________________
input_26 (InputLayer)           (None, None, 5)      0                                            
__________________________________________________________________________________________________
lstm_20 (LSTM)                  [(None, 512), (None, 2099200     lstm_19[0][0]                    
                                                                 lstm_19[0][1]                    
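
The parameter counts in this summary can be checked by hand. The sketch assumes the usual LSTM layout with a bias term, in which each of the four gates has an input kernel, a recurrent kernel and a bias vector; `lstm_param_count` is a helper written for this check, not a Keras function.

```python
def lstm_param_count(input_dim, units):
    """Number of trainable weights of an LSTM layer with bias.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units) and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)


print(lstm_param_count(14, 512))    # first encoder LSTM, 14 input symbols: 1079296
print(lstm_param_count(512, 512))   # second encoder LSTM, 512 inputs: 2099200
```

Both values agree with the `Param #` column for `lstm_19` and `lstm_20`, which confirms that the input vocabulary has 14 parts of speech and that the second encoder layer reads the 512-dimensional output of the first.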
          

In [220]:
# Inference models for testing

# Encoder inference model
encoder_model_inf = Model(encoder_input, encoder_states)

# Decoder inference model
decoder_state_input_h = Input(shape=(512,))
decoder_state_input_c = Input(shape=(512,))
decoder_input_states = [decoder_state_input_h, decoder_state_input_c]

decoder_out, decoder_h, decoder_c = decoder_LSTM(decoder_input, 
                                                 initial_state=decoder_input_states)

decoder_states = [decoder_h , decoder_c]

decoder_out = decoder_dense(decoder_out)

decoder_model_inf = Model(inputs=[decoder_input] + decoder_input_states,
                          outputs=[decoder_out] + decoder_states )

In [221]:
def decode_seq(inp_seq):
    
    # Initial states value is coming from the encoder 
    states_val = encoder_model_inf.predict(inp_seq)
    
    target_seq = np.zeros((1, 1, len(output_vocab)))
    target_seq[0, 0, output_char2idx['\t']] = 1
    
    pred_pos = []
    stop_condition = False
    
    while not stop_condition:
        
        decoder_out, decoder_h, decoder_c = decoder_model_inf.predict(x=[target_seq] + states_val)
        max_val_index = np.argmax(decoder_out[0,-1,:])
        sampled_out_char = output_idx2char[max_val_index]
        pred_pos.append(sampled_out_char)
        
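        # stop at the stop symbol, or when the output becomes longer than any training sequence
        if sampled_out_char == '\n' or len(pred_pos) >= max_len_output:
            stop_condition = True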
        
        target_seq = np.zeros((1, 1, len(output_vocab)))
        target_seq[0, 0, max_val_index] = 1
        
        states_val = [decoder_h, decoder_c]
        
    return pred_pos
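
The decoding loop is a greedy search: at every step the most probable symbol is chosen and fed back into the decoder. The principle can be shown without Keras by replacing the decoder with a scripted stand-in. `greedy_decode` and `toy_step` are hypothetical names, and the probabilities are invented for the example.

```python
def greedy_decode(step, initial_state, start='\t', stop='\n', max_len=50):
    """Greedy decoding: feed the most probable symbol back in until `stop` or `max_len`.

    `step(symbol, state)` must return (probabilities as a dict, new state).
    """
    symbol, state, output = start, initial_state, []
    while len(output) < max_len:
        probs, state = step(symbol, state)
        symbol = max(probs, key=probs.get)    # arg-max over the output symbols
        if symbol == stop:
            break
        output.append(symbol)
    return output


def toy_step(symbol, state):
    """Stand-in for the decoder: the state is simply the number of steps taken so far."""
    script = ['x', 'x', 'p', 'x', 'p', '\n']
    wanted = script[state]
    probs = {s: (0.9 if s == wanted else 0.025) for s in ['x', 'p', 'c', '\t', '\n']}
    return probs, state + 1


print(greedy_decode(toy_step, initial_state=0))   # ['x', 'x', 'p', 'x', 'p']
```

This sketch leaves the stop symbol out of the result and bounds the number of steps with `max_len`, so that the loop also ends if the stop symbol is never predicted.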

In [222]:
for seq_index in range(200):
    inp_seq = tokenized_test_data[seq_index:seq_index+1]
    
    pred_pos = decode_seq(inp_seq)
    print('-')
    print('Input clause:', test_clauses[seq_index])
    print('Predicted phrase boundaries:', pred_pos[:-1])
    print('True phrase boundaries', output_test[seq_index])

-
Input clause: ['conj', 'verb', 'subs', 'nmpr', 'prep', 'nmpr', 'subs', 'nmpr', 'prep', 'verb', 'verb', 'verb', 'prep', 'nmpr', 'art', 'subs', 'art', 'adjv', 'conj', 'verb']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
-
Input clause: ['verb', 'subs', 'nmpr', 'prep', 'nmpr', 'subs', 'nmpr', 'prep', 'verb', 'verb', 'verb', 'prep', 'nmpr', 'art', 'subs', 'art', 'adjv', 'conj', 'verb', 'prep']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
True phrase boundaries ['x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x

-
Input clause: ['subs', 'art', 'adjv', 'conj', 'verb', 'prep', 'conj', 'verb', 'subs', 'prep', 'subs', 'conj', 'verb', 'nmpr', 'prep', 'verb', 'nmpr', 'prep', 'prep', 'subs']
Predicted phrase boundaries: ['x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x']
True phrase boundaries ['x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x']
-
Input clause: ['art', 'adjv', 'conj', 'verb', 'prep', 'conj', 'verb', 'subs', 'prep', 'subs', 'conj', 'verb', 'nmpr', 'prep', 'verb', 'nmpr', 'prep', 'prep', 'subs', 'nmpr']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p']
True phrase boundaries ['x', 'x', 'p', 'x', 'p', 'x', 

-
Input clause: ['nmpr', 'prep', 'prep', 'subs', 'nmpr', 'conj', 'verb', 'nmpr', 'conj', 'verb', 'subs', 'verb', 'nmpr', 'conj', 'verb', 'subs', 'conj', 'verb', 'prep', 'prep']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
True phrase boundaries ['x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
-
Input clause: ['prep', 'prep', 'subs', 'nmpr', 'conj', 'verb', 'nmpr', 'conj', 'verb', 'subs', 'verb', 'nmpr', 'conj', 'verb', 'subs', 'conj', 'verb', 'prep', 'prep', 'verb']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p']
True phrase boundaries ['x', 'x'

-
Input clause: ['subs', 'conj', 'verb', 'prep', 'prep', 'verb', 'prep', 'nmpr', 'prep', 'prep', 'subs', 'nmpr', 'conj', 'nmpr', 'verb', 'subs', 'adjv', 'prep', 'art', 'subs']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'p']
-
Input clause: ['conj', 'verb', 'prep', 'prep', 'verb', 'prep', 'nmpr', 'prep', 'prep', 'subs', 'nmpr', 'conj', 'nmpr', 'verb', 'subs', 'adjv', 'prep', 'art', 'subs', 'conj']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 

-
Input clause: ['art', 'subs', 'conj', 'verb', 'subs', 'adjv', 'prep', 'art', 'subs', 'conj', 'art', 'subs', 'verb', 'prep', 'verb', 'conj', 'verb', 'art', 'subs', 'conj']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p']
-
Input clause: ['subs', 'conj', 'verb', 'subs', 'adjv', 'prep', 'art', 'subs', 'conj', 'art', 'subs', 'verb', 'prep', 'verb', 'conj', 'verb', 'art', 'subs', 'conj', 'verb']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 

-
Input clause: ['conj', 'verb', 'art', 'subs', 'conj', 'verb', 'subs', 'prep', 'subs', 'conj', 'verb', 'prep', 'art', 'subs', 'conj', 'prep', 'art', 'subs', 'prep', 'art']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'x']
-
Input clause: ['verb', 'art', 'subs', 'conj', 'verb', 'subs', 'prep', 'subs', 'conj', 'verb', 'prep', 'art', 'subs', 'conj', 'prep', 'art', 'subs', 'prep', 'art', 'subs']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', '

-
Input clause: ['art', 'subs', 'prep', 'art', 'subs', 'prep', 'verb', 'prep', 'prep', 'conj', 'nmpr', 'verb', 'prep', 'subs', 'art', 'subs', 'conj', 'verb', 'conj', 'verb']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
-
Input clause: ['subs', 'prep', 'art', 'subs', 'prep', 'verb', 'prep', 'prep', 'conj', 'nmpr', 'verb', 'prep', 'subs', 'art', 'subs', 'conj', 'verb', 'conj', 'verb', 'conj']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p'

-
Input clause: ['subs', 'conj', 'verb', 'conj', 'verb', 'conj', 'verb', 'prep', 'subs', 'art', 'subs', 'conj', 'verb', 'prep', 'prin', 'prep', 'verb', 'verb', 'verb', 'prep']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
-
Input clause: ['conj', 'verb', 'conj', 'verb', 'conj', 'verb', 'prep', 'subs', 'art', 'subs', 'conj', 'verb', 'prep', 'prin', 'prep', 'verb', 'verb', 'verb', 'prep', 'subs']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p']
True phrase boundaries ['x', 

-
Input clause: ['prep', 'verb', 'verb', 'verb', 'prep', 'subs', 'advb', 'verb', 'art', 'subs', 'prep', 'conj', 'nega', 'verb', 'conj', 'verb', 'subs', 'prep', 'subs', 'verb']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p']
-
Input clause: ['verb', 'verb', 'verb', 'prep', 'subs', 'advb', 'verb', 'art', 'subs', 'prep', 'conj', 'nega', 'verb', 'conj', 'verb', 'subs', 'prep', 'subs', 'verb', 'conj']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase bo

-
Input clause: ['prep', 'subs', 'verb', 'conj', 'verb', 'subs', 'conj', 'verb', 'prep', 'conj', 'prep', 'prin', 'art', 'subs', 'art', 'prde', 'prep', 'conj', 'verb', 'subs']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
-
Input clause: ['subs', 'verb', 'conj', 'verb', 'subs', 'conj', 'verb', 'prep', 'conj', 'prep', 'prin', 'art', 'subs', 'art', 'prde', 'prep', 'conj', 'verb', 'subs', 'conj']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x

-
Input clause: ['conj', 'verb', 'subs', 'conj', 'verb', 'art', 'subs', 'prep', 'nmpr', 'conj', 'verb', 'prep', 'verb', 'intj', 'prep', 'prep', 'conj', 'prep', 'prin', 'art']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'p', 'x']
-
Input clause: ['verb', 'subs', 'conj', 'verb', 'art', 'subs', 'prep', 'nmpr', 'conj', 'verb', 'prep', 'verb', 'intj', 'prep', 'prep', 'conj', 'prep', 'prin', 'art', 'subs']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'p

-
Input clause: ['prin', 'art', 'subs', 'art', 'prde', 'prep', 'prin', 'subs', 'conj', 'prep', 'inrg', 'verb', 'prin', 'subs', 'conj', 'inrg', 'prep', 'prde', 'subs', 'prps']
Predicted phrase boundaries: ['x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'p', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p']
-
Input clause: ['art', 'subs', 'art', 'prde', 'prep', 'prin', 'subs', 'conj', 'prep', 'inrg', 'verb', 'prin', 'subs', 'conj', 'inrg', 'prep', 'prde', 'subs', 'prps', 'conj']
Predicted phrase boundaries: ['x', 'x', 'x', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p']
True phrase boundaries ['x', 'x', 'x', 'x', 'p', 'x', 'p', 'x

Input clause: ['prep', 'prde', 'subs', 'prps', 'conj', 'verb', 'prep', 'adjv', 'prps', 'conj', 'prep', 'nmpr', 'subs', 'art', 'subs', 'prps', 'verb', 'conj', 'verb', 'prep']
Predicted phrase boundaries: ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
True phrase boundaries ['x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x']
-
Input clause: ['prde', 'subs', 'prps', 'conj', 'verb', 'prep', 'adjv', 'prps', 'conj', 'prep', 'nmpr', 'subs', 'art', 'subs', 'prps', 'verb', 'conj', 'verb', 'prep', 'art']
Predicted phrase boundaries: ['x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x', 'x', 'x', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'p', 'x', 'x']
True phrase boundaries ['x', 'p', 'x', 'p', 'x', 'p', 'x'
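
Comparing the printed sequences by eye is error-prone, because one extra or missing 'p' shifts everything after it. A possible way to score a prediction, sketched here under the assumption that a boundary is identified by the number of the word it follows, is to compare the sets of boundary positions. `boundary_positions` and `precision_recall` are illustrative helpers; the two sequences are the first prediction and its true counterpart printed above, written compactly as strings.

```python
def boundary_positions(labels):
    """Return the set of word positions (1-based) that are followed by a 'p'."""
    positions, n_words = set(), 0
    for label in labels:
        if label == 'x':
            n_words += 1
        elif label == 'p':
            positions.add(n_words)
    return positions


def precision_recall(predicted, true):
    """Precision and recall of the predicted boundary positions."""
    pred_b, true_b = boundary_positions(predicted), boundary_positions(true)
    hits = len(pred_b & true_b)
    precision = hits / len(pred_b) if pred_b else 0.0
    recall = hits / len(true_b) if true_b else 0.0
    return precision, recall


# First chunk of Jonah: predicted and true phrase boundaries from the output above
predicted = list("xpxpxxpxxxxpxxpxpxpxxpxxxxpxpxp")
true = list("xpxpxxpxxxxpxxpxpxpxxxxxxpxpxp")

print(sorted(boundary_positions(predicted) - boundary_positions(true)))   # [14]
precision, recall = precision_recall(predicted, true)   # precision 10/11, recall 1.0
```

In this chunk the model finds all ten true boundaries and adds one spurious boundary after word 14.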