## Improving Lexical Simplification Using State of the Art Lexical Complexity Prediction Models
#### Demo notebook
This notebook demonstrates how to simplify sentences that contain multi-word expressions. It was tested on a system with the following specifications:
<ol>
    <li>OS: Linux x86-64</li>
    <li>CPU: 3.30 GHz x 8</li>
    <li>RAM: 40 GiB</li>
    <li>Hard Drive: 20 GiB</li>
    <li>GPU: NVIDIA Corporation GA104M (CUDA compute capability: 8.6)</li>
 </ol>

#### Run the following two cells to import the packages and define the settings

In [1]:
# imports
import numpy as np
import torch
from transformers import BertTokenizer
from tqdm import tqdm
import re
import codecs

from CWIs.complex_labeller import Complexity_labeller
from plainifier.plainify import *

import warnings
warnings.filterwarnings('ignore')

In [2]:
# settings
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
seed = 1234
np.random.seed(seed)
torch.manual_seed(seed)

two_gram_mwes_list = './CWIs/2_gram_mwe_50.txt'
three_gram_mwes_list = './CWIs/3_gram_mwe_25.txt'
four_gram_mwes_list = './CWIs/4_gram_mwe_8.txt'
pretrained_model_path = './CWIs/cwi_seq.model'
temp_path = './CWIs/temp_file.txt'

path = './plainifier/'
premodel = 'bert-large-uncased-whole-word-masking'
bert_dict = 'tersebert_pytorch_1_0.bin'
embedding = 'crawl-300d-2M-subword.vec'
unigram = 'unigrams-df.tsv'
tokenizer = BertTokenizer.from_pretrained(premodel)
Complexity_labeller_model = Complexity_labeller(pretrained_model_path, temp_path)


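The settings cell points at several resource files, and the load step that follows takes several minutes, so it can help to confirm that the files exist first. The sketch below is a hypothetical helper and not part of the notebook: `missing_files` is a name introduced here, and the sketch assumes (unverified) that `load_all` looks for `bert_dict`, `embedding` and `unigram` inside `path`.

```python
from pathlib import Path


def missing_files(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Assumption: the three plainifier resources live directly under ./plainifier/
resource_dir = Path('./plainifier')
required = [
    './CWIs/2_gram_mwe_50.txt',
    './CWIs/3_gram_mwe_25.txt',
    './CWIs/4_gram_mwe_8.txt',
    './CWIs/cwi_seq.model',
    resource_dir / 'tersebert_pytorch_1_0.bin',
    resource_dir / 'crawl-300d-2M-subword.vec',
    resource_dir / 'unigrams-df.tsv',
]

missing = missing_files(required)
if missing:
    print('Missing resources:', missing)
```

If the list printed is non-empty, the paths in the settings cell probably need adjusting before the models are loaded.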
In [3]:
# Load the BERT model, word embeddings and unigrams. This takes about 7 minutes on the test system.
model, similm, tokenfreq, embeddings, vocabulary2 = load_all(path, premodel, bert_dict, embedding, unigram, tokenizer)

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Loading Embeddings


100%|██████████████████████████████| 2000000/2000000 [01:30<00:00, 22001.64it/s]


Loaded Embeddings
Loading Unigrams


100%|██████████████████████████████| 8394369/8394369 [05:53<00:00, 23747.04it/s]

Loaded Unigrams





#### Run the following cell to construct the sentence class

In [15]:
class ComplexSentence:
    # Sentence class
    def __init__(self, sentence, label_model, tokeniser, verbose=True, beam_width=3):
        self.sentence = sentence
        self.tokenised_sentence = self.generate_tokenised_sentence()
#         self.tokenised_sentence = tokeniser.tokenize(self.sentence)
        self.label_model = label_model
        self.verbose = verbose
        self.beam_width = beam_width

        if self.verbose:
            print(f'Untokenised sentence: {self.sentence}')
            print(f'Tokenised sentence: {self.tokenised_sentence}')

        self.label_complex_words()
    
    def generate_tokenised_sentence(self):
        # Merge WordPiece tokens back into whole words (one entry per word)
        processed = tokeniseUntokenise(self.sentence, tokenizer)
        tokens, word_idx = processed['tokens'], processed['words']
        tokenised_sentence_list = []
        for idx_list in word_idx:
            if len(idx_list) == 1:
                tokenised_sentence_list.append(tokens[idx_list[0]])
            else:
                # strip the '##' continuation prefix and join the pieces
                pieces = (tokens[i].replace('##', '') for i in idx_list)
                tokenised_sentence_list.append(''.join(pieces))
        return tokenised_sentence_list
    
    def known_complexity(self):
        # A word counts as known (True) when it is a single WordPiece token
        # and does not consist only of punctuation or underscores.
        processed = tokeniseUntokenise(self.sentence, tokenizer)
        tokens, word_idx = processed['tokens'], processed['words']
        known_index = []
        for idx_list in word_idx:
            is_known = len(idx_list) == 1 and not re.match(r'^[_\W]+$', tokens[idx_list[0]])
            known_index.append(bool(is_known))
        return known_index
    
    def label_complex_words(self, init=True):
        # applying complexity labeller to the sentence

        Complexity_labeller.convert_format_string(self.label_model, self.sentence)
        if init:
            self.bin_labels = Complexity_labeller.get_bin_labels(self.label_model)[0]

        # override complexity: zero out the labels of words whose complexity is unknown
        known = self.known_complexity()
        self.bin_labels = np.multiply(self.bin_labels, known)

        self.is_complex = bool(np.sum(self.bin_labels) >= 1)
        self.probs = Complexity_labeller.get_prob_labels(self.label_model)

        # override complexity: apply the same mask to the probabilities
        self.probs = np.multiply(self.probs, known)

        self.complexity_ranking = np.argsort(np.array(self.bin_labels) * np.array(self.probs))[::-1]
        self.most_complex_word = self.tokenised_sentence[self.complexity_ranking[0]]

        if self.verbose:
            print(f'Complex probs: {self.probs}')
            print(f'Binary complexity labels: {self.bin_labels}')

            if self.is_complex:
                print(f'\t Most complex word: {self.most_complex_word} \n')

        if not self.is_complex:
            print('\t Simplification complete or no complex expression found.\n')
    
    def find_MWEs_w_most_complex_word(self, n_gram, filepath):
        # Looks for a listed n-gram MWE that contains the most complex word.
        # Sets self.idx_to_plainify to the MWE positions, or to the position
        # of the complex word alone if no listed MWE is found.

        complex_word_pos = self.complexity_ranking[0]
        n_words = len(self.complexity_ranking)

        # first and last start positions of an n-gram window covering the word
        sliding_start = max(complex_word_pos - n_gram + 1, 0)
        sliding_end = min(complex_word_pos, n_words - n_gram)

        with open(filepath, 'r') as f:
            mwes = set(f.read().split('\n')) # make set

        avg_mwe_complexity = 0
        valid_mwes_idx = None
        for pos in range(sliding_start, sliding_end + 1):
            possible_mwe = ' '.join(self.tokenised_sentence[pos: pos + n_gram])

            if possible_mwe in mwes:
                # keep the candidate with the highest mean complexity
                mean_complexity = np.mean(self.probs[pos:pos+n_gram])
                if mean_complexity > avg_mwe_complexity:
                    avg_mwe_complexity = mean_complexity
                    valid_mwes_idx = np.arange(pos, pos+n_gram, 1)

        if valid_mwes_idx is not None:
            self.idx_to_plainify = valid_mwes_idx
        else:
            self.idx_to_plainify = [complex_word_pos]
        
    
    def find_all_ngram_mwes(self):
        # sets self.idx_to_plainify to the indices of the longest MWE found
        
        if not self.is_complex:
            raise ValueError('Sentence is not complex')
        
        # give priority to longer MWEs
        n_gram_files = {2: two_gram_mwes_list, 3: three_gram_mwes_list, 4:four_gram_mwes_list}
        
        for n in reversed(range(2,5)):
            self.find_MWEs_w_most_complex_word(n, n_gram_files[n])
            
            if len(self.idx_to_plainify) == n: # if such mwe is found
                break
    
    def one_step_plainify(self):
        idx_start = self.idx_to_plainify[0]
        idx_end = self.idx_to_plainify[-1]+1
        print(f'Found complex word or expression: ### {" ".join(self.tokenised_sentence[idx_start:idx_end])} ###. Plainifying...')
        processed_sentence = tokeniseUntokenise(self.sentence, tokenizer)
        forward_result = getTokenReplacement(processed_sentence, idx_start, len(self.idx_to_plainify), 
                                  tokenizer, model, similm, tokenfreq, embeddings, vocabulary2,
                                  verbose=False, backwards=False, maxDepth=3, maxBreadth=16, alpha=(1/9,6/9,2/9))
        backward_result = getTokenReplacement(processed_sentence, idx_start, len(self.idx_to_plainify),
                                  tokenizer, model, similm, tokenfreq, embeddings, vocabulary2, 
                                  verbose=False, backwards=True, maxDepth=3, maxBreadth=16, alpha=(1/9,6/9,2/9))
        words, scores = aggregateResults((forward_result, backward_result))
        words = [w.replace('#', '') for w in words]
        print(f'Suggested top 5 substitutions: {words[:5]}')
        return words[0].split(' ')
        
    
    def sub_in_sentence(self, substitution):
        # plugs a substitution in the sentence, then updates complexity scores
        substitution_len = len(substitution)
        
        idx_start = self.idx_to_plainify[0]
        idx_end = self.idx_to_plainify[-1]+1
        
        self.tokenised_sentence = self.tokenised_sentence[:idx_start] + substitution + self.tokenised_sentence[idx_end:]
        self.sentence = ' '.join(self.tokenised_sentence)
        self.bin_labels = list(self.bin_labels[:idx_start]) + [0] * substitution_len + list(self.bin_labels[idx_end:])
        self.label_complex_words(init=False)
        print(f'\n\t Sentence after substitution: {self.sentence}\n')
        
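    def recursive_greedy_plainify(self, max_steps=float('inf')):
        # Greedily replace the most complex word or MWE until none is left.
        # n starts at 1, so at most max_steps - 1 substitutions are made.
        n = 1
        while self.is_complex and n < max_steps:
            self.find_all_ngram_mwes()
            sub = self.one_step_plainify()
            self.sub_in_sentence(sub)
            n += 1
        print('Simplification complete.', end='\r')
        return self.sentence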
    
    def recursive_beam_search_plainfy(self, beam_width):
        pass
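
The selection logic in the class above (mask unknown words, rank the rest by complexity, then prefer the longest listed MWE that covers the top-ranked word) can be tried without loading any model. The following is a minimal sketch under stated assumptions, not the notebook's implementation: `rank_words` and `longest_mwe_window` are hypothetical names, the probabilities and MWE sets are made-up values, and the mask only covers punctuation (the class additionally masks words that are split into several WordPiece tokens).

```python
import re
import numpy as np


def rank_words(tokens, probs, bin_labels):
    """Mask punctuation-only tokens, then rank word indices by complexity."""
    known = np.array([not re.match(r'^[_\W]+$', t) for t in tokens])
    masked_probs = np.asarray(probs) * known
    masked_labels = np.asarray(bin_labels) * known
    ranking = np.argsort(masked_labels * masked_probs)[::-1]
    return ranking, masked_probs


def longest_mwe_window(tokens, probs, target, mwes_by_n):
    """Return indices of the longest listed n-gram containing tokens[target].

    Among windows of the same length, the one with the highest mean
    probability wins. Falls back to [target] if nothing is listed.
    """
    for n in sorted(mwes_by_n, reverse=True):
        start = max(target - n + 1, 0)
        end = min(target, len(tokens) - n)
        best, best_score = None, 0.0
        for pos in range(start, end + 1):
            candidate = ' '.join(tokens[pos:pos + n])
            score = float(np.mean(probs[pos:pos + n]))
            if candidate in mwes_by_n[n] and score > best_score:
                best, best_score = list(range(pos, pos + n)), score
        if best is not None:
            return best
    return [target]


# Illustrative values only; real ones come from the complexity labeller.
tokens = ['i', 'took', 'a', 'sip', 'of', 'coffee', 'and', 'kept', 'working', '.']
probs = [0.01, 0.02, 0.01, 0.9, 0.01, 0.3, 0.01, 0.2, 0.1, 0.5]
bin_labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
mwes_by_n = {2: {'sip of'}, 4: {'a sip of coffee'}}  # hypothetical MWE lists

ranking, masked_probs = rank_words(tokens, probs, bin_labels)
target = int(ranking[0])
span = longest_mwe_window(tokens, masked_probs, target, mwes_by_n)
print(' '.join(tokens[i] for i in span))
```

Trying the longest n-gram first mirrors `find_all_ngram_mwes`, which stops as soon as an MWE of the current length is found.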

#### Example sentences

In [6]:
input_sentence = "Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur, or how likely it is that a proposition is true."
sentence = ComplexSentence(input_sentence, label_model=Complexity_labeller_model, tokeniser=tokenizer, verbose=False)
sentence.recursive_greedy_plainify()



Found complex word or expression: ### descriptions of ###. Plainifying...
Suggested top 5 substitutions: ['measures of', 'estimates of', 'determination of', 'expression of', 'values of']

	 Sentence after substitution: probability is the branch of mathematics concerning numerical measures of how likely an event is to occur , or how likely it is that a proposition is true .

Found complex word or expression: ### probability ###. Plainifying...
Suggested top 5 substitutions: ['probability theory', 'probability', '. probability', 'or probability', 'theory']

	 Sentence after substitution: probability theory is the branch of mathematics concerning numerical measures of how likely an event is to occur , or how likely it is that a proposition is true .

Found complex word or expression: ### a proposition ###. Plainifying...
Suggested top 5 substitutions: ['it', 'a statement', 'something', 'a claim', 'an event']

	 Sentence after substitution: probability theory is the branch of mathematics conc

In [7]:
input_sentence = "I took a sip of coffee and kept working."

sentence = ComplexSentence(input_sentence, label_model=Complexity_labeller_model, tokeniser=tokenizer, verbose=False)
sentence.recursive_greedy_plainify()

Found complex word or expression: ### a sip of coffee ###. Plainifying...
Suggested top 5 substitutions: ['a break', 'it', 'off', 'over', 'a deep breath']
	 Simplification complete or no complex expression found.


	 Sentence after substitution: i took a break and kept working .

Simplification complete.


In [8]:
input_sentence = "We will first introduce several fundamental concepts."

sentence = ComplexSentence(input_sentence, label_model=Complexity_labeller_model, tokeniser=tokenizer, verbose=False)
sentence.recursive_greedy_plainify()

Found complex word or expression: ### fundamental ###. Plainifying...
Suggested top 5 substitutions: ['fundamental', 'new', 'basic', 'important', 'key']

	 Sentence after substitution: we will first introduce several fundamental concepts .

Found complex word or expression: ### introduce ###. Plainifying...
Suggested top 5 substitutions: ['introduce', 'establish', 'define', 'discuss', 'present']

	 Sentence after substitution: we will first introduce several fundamental concepts .

Found complex word or expression: ### concepts ###. Plainifying...
Suggested top 5 substitutions: ['concepts', 'principles', 'ideas', 'elements', 'questions']
	 Simplification complete or no complex expression found.


	 Sentence after substitution: we will first introduce several fundamental concepts .

Simplification complete.


In [9]:
input_sentence = "A neural network is a series of rules that attempt to recognize patterns in a set of data through a process that is the way the human brain operates."

sentence = ComplexSentence(input_sentence, label_model=Complexity_labeller_model, tokeniser=tokenizer, verbose=False)
sentence.recursive_greedy_plainify()

Found complex word or expression: ### patterns in ###. Plainifying...
Suggested top 5 substitutions: ['and process', 'and understand', 'patterns in', 'and interpret', 'or process']

	 Sentence after substitution: a neural network is a series of rules that attempt to recognize and process a set of data through a process that is the way the human brain operates .

Found complex word or expression: ### operates ###. Plainifying...
Suggested top 5 substitutions: ['works', 'operates', 'processes information', 'processes', 'functions']

	 Sentence after substitution: a neural network is a series of rules that attempt to recognize and process a set of data through a process that is the way the human brain works .

Found complex word or expression: ### a process that ###. Plainifying...
Suggested top 5 substitutions: ['this', 'it', 'that', '. this', 'which']

	 Sentence after substitution: a neural network is a series of rules that attempt to recognize and process a set of data through this is th

In [None]:
# Read test data
norm_test_dat = ReadInFile("./evaluation/WikiLargeTurkCorpus/test.8turkers.tok.norm")

simplified_sentences = []
for i in tqdm(range(len(norm_test_dat))):
    input_sentence = norm_test_dat[i]

    s = ComplexSentence(input_sentence, label_model=Complexity_labeller_model, tokeniser=tokenizer, verbose=True)
    # the call updates s.sentence in place; max_steps=2 allows one substitution
    s.recursive_greedy_plainify(max_steps=2)
    simplified_sentences.append(s.sentence)

with codecs.open("test.output", "w", "utf-8-sig") as f:
    for item in simplified_sentences:
        f.write("%s\n" % item)

  0%|                                                   | 0/359 [00:00<?, ?it/s]

Untokenised sentence: one side of the armed conflicts is composed mainly of the sudanese military and the janjaweed , a sudanese militia group recruited mostly from the afro - arab abbala tribes of the northern rizeigat region in sudan .
Tokenised sentence: ['one', 'side', 'of', 'the', 'armed', 'conflicts', 'is', 'composed', 'mainly', 'of', 'the', 'sudanese', 'military', 'and', 'the', 'janjaweed', ',', 'a', 'sudanese', 'militia', 'group', 'recruited', 'mostly', 'from', 'the', 'afro', '-', 'arab', 'abbala', 'tribes', 'of', 'the', 'northern', 'rizeigat', 'region', 'in', 'sudan', '.']
Complex probs: [4.8367318e-04 4.8353723e-03 4.7576093e-05 7.4103511e-05 3.4064423e-02
 8.7893146e-01 5.3721265e-05 8.3323532e-01 2.5257823e-01 4.1494641e-05
 7.6524419e-05 7.9000813e-01 9.7666979e-02 6.3776330e-05 8.4813066e-05
 0.0000000e+00 0.0000000e+00 8.4037543e-05 8.7213838e-01 7.7302271e-01
 1.0367018e-03 9.0394229e-01 1.2732967e-02 5.6814646e-05 8.6479675e-05
 6.3271761e-01 0.0000000e+00 5.3437324e-0

  0%|                                           | 1/359 [00:10<59:54, 10.04s/it]

Suggested top 5 substitutions: ['recruited', 'formed', 'made', 'made up', 'composed']
Complex probs: [4.8367318e-04 4.8353723e-03 4.7576093e-05 7.4103511e-05 3.4064423e-02
 8.7893146e-01 5.3721265e-05 8.3323532e-01 2.5257823e-01 4.1494641e-05
 7.6524419e-05 7.9000813e-01 9.7666979e-02 6.3776330e-05 8.4813066e-05
 0.0000000e+00 0.0000000e+00 8.4037543e-05 8.7213838e-01 7.7302271e-01
 1.0367018e-03 9.0394229e-01 1.2732967e-02 5.6814646e-05 8.6479675e-05
 6.3271761e-01 0.0000000e+00 5.3437324e-03 0.0000000e+00 7.1683222e-01
 4.1801886e-05 8.0338134e-05 8.6018294e-03 0.0000000e+00 1.1892053e-01
 3.8908358e-05 6.1245948e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1
 0]
	 Most complex word: conflicts 


	 Sentence after substitution: one side of the armed conflicts is composed mainly of the sudanese military and the janjaweed , a sudanese militia group recruited mostly from the afro - arab abbala tribes of the northern 

  1%|▏                                        | 2/359 [00:21<1:05:16, 10.97s/it]

Suggested top 5 substitutions: ['lifetime', 'life', 'lives', 'lifetimes', 'entire lifetime']
Complex probs: [0.0000000e+00 5.8313512e-05 1.0722025e-04 8.5629094e-01 7.3251051e-01
 5.9590726e-05 4.5728719e-01 0.0000000e+00 3.8236339e-02 0.0000000e+00
 2.4159947e-04 0.0000000e+00 2.0433872e-03 0.0000000e+00 1.3852671e-04
 2.2718953e-03 0.0000000e+00 7.2764838e-01 4.0925667e-01 5.2970110e-05
 1.9964947e-01 5.9380043e-05 5.8166496e-03 3.2018826e-05 1.7152673e-04
 3.3696718e-03 4.3551619e-05 1.7303621e-04 9.0392327e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: principal 


	 Sentence after substitution: jeddah is the principal gateway to mecca , islam ' s holiest city , which able - bodied muslims are required to visit at least once in their lifetime .

Simplification complete.Untokenised sentence: the great dark spot is thought to represent a hole in the methane cloud deck of neptune .
Tokenised sentence: ['t

  1%|▎                                        | 3/359 [00:42<1:32:11, 15.54s/it]

Suggested top 5 substitutions: ['be a', 'a', 'be', 'be a black', 'be a dark']
Complex probs: [1.7278924e-04 2.0231667e-03 2.3080757e-02 5.0133842e-01 5.4381133e-05
 1.0870355e-02 5.6188772e-05 3.5083503e-04 1.8481592e-04 1.3900436e-01
 4.6779252e-05 8.0633319e-05 7.1507895e-01 6.9303282e-02 9.3189731e-02
 4.9492548e-05 6.9926518e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0]
	 Most complex word: methane 


	 Sentence after substitution: the great dark spot is thought to be a hole in the methane cloud deck of neptune .

Simplification complete.Untokenised sentence: his next work , saturday , follows an especially eventful day in the life of a successful neurosurgeon .
Tokenised sentence: ['his', 'next', 'work', ',', 'saturday', ',', 'follows', 'an', 'especially', 'eventful', 'day', 'in', 'the', 'life', 'of', 'a', 'successful', 'neurosurgeon', '.']
Complex probs: [5.63501555e-04 1.35765166e-03 1.77612342e-03 0.00000000e+00
 4.68642870e-03 0.00000000e+0

  1%|▍                                        | 4/359 [01:01<1:40:33, 17.00s/it]

Suggested top 5 substitutions: ['of a', 'of', 'of the', 'of this', 'a']
Complex probs: [5.6531065e-04 1.3636189e-03 1.7833402e-03 0.0000000e+00 4.7183288e-03
 0.0000000e+00 3.3276474e-01 1.1485055e-04 6.9664073e-01 0.0000000e+00
 5.0527998e-04 3.2677079e-05 8.5244661e-05 2.9047246e-03 4.4476466e-05
 1.1261789e-04 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
	 Most complex word: especially 


	 Sentence after substitution: his next work , saturday , follows an especially eventful day in the life of a neurosurgeon .

Simplification complete.Untokenised sentence: the tarantula , the trickster character , spun a black cord and , attaching it to the ball , crawled away fast to the east , pulling on the cord with all his strength .
Tokenised sentence: ['the', 'tarantula', ',', 'the', 'trickster', 'character', ',', 'spun', 'a', 'black', 'cord', 'and', ',', 'attaching', 'it', 'to', 'the', 'ball', ',', 'crawled', 'away', 'fast', 'to', 'the', 'east

  1%|▌                                        | 5/359 [01:20<1:43:26, 17.53s/it]

Suggested top 5 substitutions: ['as well', 'once more', 'as he went', 'once again', 'more and more']
Complex probs: [1.46517326e-04 0.00000000e+00 0.00000000e+00 7.87552999e-05
 0.00000000e+00 8.12622666e-01 0.00000000e+00 6.30115628e-01
 9.43936466e-05 1.94561330e-03 6.09139204e-01 7.43408964e-05
 0.00000000e+00 0.00000000e+00 9.28645240e-05 5.08628036e-05
 1.14718663e-04 7.38877535e-01 0.00000000e+00 6.74028397e-01
 1.13344053e-03 6.89331114e-01 5.10645514e-05 1.00619356e-04
 5.83222276e-03 0.00000000e+00 5.69001079e-01 4.60263327e-05
 7.26420622e-05 6.40622020e-01 7.57230810e-05 9.52526578e-04
 0.00000000e+00]
Binary complexity labels: [0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0]
	 Most complex word: character 


	 Sentence after substitution: the tarantula , the trickster character , spun a black cord and , attaching it to the ball , crawled away fast to the east , pulling on the cord as well .

Simplification complete.Untokenised sentence: there he died six 

  2%|▊                                        | 7/359 [01:31<1:07:37, 11.53s/it]

Suggested top 5 substitutions: ['culturally', 'most', 'most closely', 'ethnically', 'linguistically']
Complex probs: [2.3255708e-04 6.6406836e-05 9.1595727e-01 7.2231364e-01 5.7739908e-05
 1.0259508e-04 7.9565650e-01 7.1054769e-01 4.5962821e-05 5.7728374e-01
 5.2403763e-04 5.7118720e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 1 0 0 1 1 0 1 0 1 0]
	 Most complex word: coastal 


	 Sentence after substitution: they are culturally akin to the coastal peoples of papua new guinea .

Simplification complete.Untokenised sentence: since 2000 , the recipient of the kate greenaway medal has also been presented with the colin mears award to the value of £ 5000 .
Tokenised sentence: ['since', '2000', ',', 'the', 'recipient', 'of', 'the', 'kate', 'greenaway', 'medal', 'has', 'also', 'been', 'presented', 'with', 'the', 'colin', 'mears', 'award', 'to', 'the', 'value', 'of', '£', '5000', '.']
Complex probs: [2.7441280e-04 1.8918097e-03 0.0000000e+00 1.1253642e-04 9.2846304e-01
 6.1621453e-05 9

  2%|▉                                        | 8/359 [01:47<1:13:52, 12.63s/it]

Suggested top 5 substitutions: ['the', 'and the', 'a', 'when the', 'where the']
Complex probs: [2.8168818e-04 1.8665724e-03 0.0000000e+00 1.0457249e-04 7.3089993e-01
 0.0000000e+00 1.1015900e-01 1.2568579e-04 4.5075174e-04 5.3039577e-04
 4.5946726e-01 5.5302025e-05 1.1327909e-04 1.1469454e-01 0.0000000e+00
 4.0851656e-02 5.1492865e-05 1.0813470e-04 3.0339763e-01 5.8051955e-05
 0.0000000e+00 1.6631703e-03 0.0000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: kate 


	 Sentence after substitution: since 2000 , the kate greenaway medal has also been presented with the colin mears award to the value of £ 5000 .

Simplification complete.Untokenised sentence: following the drummers are dancers , who often play the sogo  (  a tiny drum that makes almost no sound  )  and tend to have more elaborate — even acrobatic — choreography .
Tokenised sentence: ['following', 'the', 'drummers', 'are', 'dancers', ',', 'who', 'often', 'play', 'the', '

  3%|█                                        | 9/359 [01:55<1:07:09, 11.51s/it]

Suggested top 5 substitutions: ['dances', 'dance moves', 'moves', 'dance', 'dance steps']
Complex probs: [3.06933552e-01 1.21495657e-04 0.00000000e+00 8.95787380e-05
 8.16899896e-01 0.00000000e+00 1.42124030e-04 1.41405419e-03
 4.41132532e-03 9.51210313e-05 0.00000000e+00 0.00000000e+00
 1.03380335e-04 7.64848471e-01 6.73987448e-01 7.08666412e-05
 7.83032738e-04 4.96569555e-03 2.45611533e-04 3.78561928e-03
 0.00000000e+00 6.76713171e-05 6.15869761e-01 5.11431572e-05
 1.15711489e-04 1.79121198e-04 8.26959193e-01 0.00000000e+00
 3.37147852e-04 0.00000000e+00 0.00000000e+00 5.36289990e-01
 0.00000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0]
	 Most complex word: elaborate 


	 Sentence after substitution: following the drummers are dancers , who often play the sogo ( a tiny drum that makes almost no sound ) and tend to have more elaborate — even acrobatic — dances .

Simplification complete.Untokenised sentence: the spacecraft consi

  3%|█                                       | 10/359 [02:14<1:18:39, 13.52s/it]

Suggested top 5 substitutions: ['. it', 'the system', 'system', 'and', 'it']
Complex probs: [0.0000000e+00 8.7323948e-05 7.7048135e-01 5.8457379e-05 1.2697546e-04
 3.8988930e-03 8.0908138e-01 0.0000000e+00 9.9384270e-05 7.2749174e-01
 0.0000000e+00 0.0000000e+00 0.0000000e+00 4.4735060e-03 8.7591827e-05
 8.2472616e-05 6.3960617e-03 0.0000000e+00 8.1312092e-04 8.2589555e-01
 5.8277869e-01 7.3177898e-01 0.0000000e+00 0.0000000e+00 5.8305886e-05
 8.8616209e-05 6.8833715e-01 0.0000000e+00 9.3525451e-01 0.0000000e+00
 3.3274526e-03 7.7074190e-05 7.1068185e-05 1.4493273e-01 8.7722474e-01
 0.0000000e+00 9.3542397e-01 6.8227928e-05 8.2235754e-01 0.0000000e+00
 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1
 0 1 0 0 0]
	 Most complex word: mathematician 


	 Sentence after substitution: . it consists of two main elements : the nasa cassini orbiter , named after the italian - french astronomer giovanni domenico cas

  3%|█▏                                      | 11/359 [02:31<1:23:23, 14.38s/it]

Suggested top 5 substitutions: ['antonio', 'o', 'ro', 'alessandro', 'sandro']
Complex probs: [8.4689081e-02 0.0000000e+00 0.0000000e+00 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 9.3270615e-03 6.0778722e-05
 6.4527034e-04 4.0875113e-04 0.0000000e+00 5.3064556e-05 1.8939914e-04
 1.5344983e-02 7.6410331e-02 3.7863191e-02 2.1254105e-02 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Simplification complete or no complex expression found.


	 Sentence after substitution: antonio ( " sandro " ) mazzola ( born 8 november 1942 ) is an italian former football player .

Simplification complete.Untokenised sentence: it was originally thought that the debris thrown up by the collision filled in the smaller craters .
Tokenised sentence: ['it', 'was', 'originally', 'thought', 'that', 'the', 'debris', 'thrown', 'up', 'by', 'the', 'collision', 'filled', 'in', 'the', 'smaller', 'craters', '.']
Complex probs: [1.5150094e-04 1.0398680e-04 

  3%|█▎                                      | 12/359 [02:46<1:24:41, 14.64s/it]

Suggested top 5 substitutions: ['debris', 'material', 'the dust', 'the material', 'the debris']
Complex probs: [1.4715297e-04 1.0283700e-04 9.0757138e-01 7.5734416e-03 9.5803211e-05
 9.5682573e-01 8.0949837e-01 1.5199084e-04 8.0117716e-05 1.1095577e-04
 8.6435163e-01 6.0380244e-01 3.7689533e-05 7.3289360e-05 2.6736889e-02
 9.0187848e-01 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0]
	 Most complex word: originally 


	 Sentence after substitution: it was originally thought that debris thrown up by the collision filled in the smaller craters .

Simplification complete.Untokenised sentence: graham attended wheaton college from 1939 to 1943 , when he graduated with a ba in anthropology .
Tokenised sentence: ['graham', 'attended', 'wheaton', 'college', 'from', '1939', 'to', '1943', ',', 'when', 'he', 'graduated', 'with', 'a', 'ba', 'in', 'anthropology', '.']
Complex probs: [1.5397359e-01 8.5893947e-01 0.0000000e+00 5.0202537e-02 6.2792416e-05
 1.2234540e-03 4.

  4%|█▍                                      | 13/359 [02:58<1:19:23, 13.77s/it]

Suggested top 5 substitutions: ['graduated', 'was graduated', 'also graduated', 'would graduate', 'finished']
Complex probs: [1.5397359e-01 8.5893947e-01 0.0000000e+00 5.0202537e-02 6.2792416e-05
 1.2234540e-03 4.9983933e-05 2.4210715e-03 0.0000000e+00 1.1167534e-04
 1.5439960e-04 9.7276533e-01 5.3597534e-05 1.1995402e-04 6.9074309e-01
 5.2607858e-05 9.5760804e-01 0.0000000e+00]
Binary complexity labels: [0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0]
	 Most complex word: anthropology 


	 Sentence after substitution: graham attended wheaton college from 1939 to 1943 , when he graduated with a ba in anthropology .

Simplification complete.Untokenised sentence: however , the bzö differs a bit in comparison to the freedom party , as is in favor of a referendum about the lisbon treaty but against an eu - withdrawal .
Tokenised sentence: ['however', ',', 'the', 'bzo', 'differs', 'a', 'bit', 'in', 'comparison', 'to', 'the', 'freedom', 'party', ',', 'as', 'is', 'in', 'favor', 'of', 'a', 'referendum', 

  4%|█▌                                      | 14/359 [03:20<1:33:54, 16.33s/it]

Suggested top 5 substitutions: ['led government', 'one', 'vote', 'referendum', 'no']
Complex probs: [1.25767970e-02 0.00000000e+00 7.58391398e-05 0.00000000e+00
 8.01556647e-01 9.71884074e-05 1.71296969e-02 4.46410850e-05
 8.67424309e-01 5.57636558e-05 7.68755708e-05 1.12008646e-01
 9.94618051e-03 0.00000000e+00 6.60319347e-05 6.00915337e-05
 4.76323803e-05 7.64481306e-01 4.59378898e-05 1.00335747e-04
 9.26631331e-01 9.66239750e-05 7.94492662e-05 6.91547453e-01
 8.30517769e-01 1.43086610e-04 2.06144920e-04 1.04936400e-04
 1.79461628e-01 0.00000000e+00 8.23820382e-02 1.72467247e-01
 0.00000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0]
	 Most complex word: referendum 


	 Sentence after substitution: however , the bzo differs a bit in comparison to the freedom party , as is in favor of a referendum about the lisbon treaty but against an eu - led government .

Simplification complete.Untokenised sentence: many species had vanished b

  4%|█▋                                      | 15/359 [03:39<1:38:46, 17.23s/it]

Suggested top 5 substitutions: ['had disappeared', 'had completely disappeared', 'had appeared', 'disappeared', 'appeared']
Complex probs: [6.0569204e-04 9.5503968e-01 4.8365403e-04 9.8228514e-01 1.5404492e-04
 1.4797048e-04 1.2979529e-02 4.9835147e-05 9.7710923e-05 7.9417634e-01
 4.5746562e-01 0.0000000e+00 3.8000704e-05 2.3585599e-02 8.5698122e-01
 0.0000000e+00]
Binary complexity labels: [0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0]
	 Most complex word: species 


	 Sentence after substitution: many species had disappeared by the end of the nineteenth century , with european settlement .

Simplification complete.Untokenised sentence: in 1987 wexler was inducted into the rock and roll hall of fame .
Tokenised sentence: ['in', '1987', 'wexler', 'was', 'inducted', 'into', 'the', 'rock', 'and', 'roll', 'hall', 'of', 'fame', '.']
Complex probs: [5.27353914e-05 7.58231734e-04 0.00000000e+00 1.16496674e-04
 9.36812639e-01 1.70900967e-04 1.14342642e-04 2.68149942e-01
 9.58344681e-05 2.88676888e-01 2.41

  4%|█▊                                      | 16/359 [03:59<1:42:46, 17.98s/it]

Suggested top 5 substitutions: ['entered the', 'joined the', 'entered into the', 'elected to the', 'entered']
Complex probs: [5.1117797e-05 6.2807911e-04 0.0000000e+00 9.2672127e-01 1.2447259e-04
 2.4064952e-01 1.0047802e-04 3.4499979e-01 2.8435692e-01 5.4039578e-05
 8.0575961e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 1 0]
	 Most complex word: fame 


	 Sentence after substitution: in 1987 wexler entered the rock and roll hall of fame .

Simplification complete.Untokenised sentence: in its pure form , dextromethorphan occurs as a white powder .
Tokenised sentence: ['in', 'its', 'pure', 'form', ',', 'dextromethorphan', 'occurs', 'as', 'a', 'white', 'powder', '.']
Complex probs: [5.8771962e-05 1.8440526e-04 7.2020632e-01 6.4851199e-03 0.0000000e+00
 0.0000000e+00 7.8117692e-01 5.4724744e-05 1.1244696e-04 1.0596204e-03
 5.0036186e-01 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 1 0 0 0 1 0]
	 Most complex word: occurs 

Found complex word or expression: ##

  5%|█▉                                      | 17/359 [04:11<1:32:56, 16.31s/it]

Suggested top 5 substitutions: ['occurs', 'exists', 'appears', 'occurs naturally', 'comes']
Complex probs: [5.8771962e-05 1.8440526e-04 7.2020632e-01 6.4851199e-03 0.0000000e+00
 0.0000000e+00 7.8117692e-01 5.4724744e-05 1.1244696e-04 1.0596204e-03
 5.0036186e-01 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 0 0 0 0 1 0]
	 Most complex word: pure 


	 Sentence after substitution: in its pure form , dextromethorphan occurs as a white powder .

Simplification complete.Untokenised sentence: admission to tsinghua is extremely competitive .
Tokenised sentence: ['admission', 'to', 'tsinghua', 'is', 'extremely', 'competitive', '.']
Complex probs: [8.2911277e-01 7.8101955e-05 0.0000000e+00 5.7975110e-05 8.0186462e-01
 8.8886589e-01 0.0000000e+00]
Binary complexity labels: [1 0 0 0 1 1 0]
	 Most complex word: competitive 

Found complex word or expression: ### competitive ###. Plainifying...


  5%|██                                      | 18/359 [04:18<1:15:37, 13.31s/it]

Suggested top 5 substitutions: ['competitive', 'difficult', 'expensive', 'limited', 'high']
Complex probs: [8.2911277e-01 7.8101955e-05 0.0000000e+00 5.7975110e-05 8.0186462e-01
 8.8886589e-01 0.0000000e+00]
Binary complexity labels: [1 0 0 0 1 0 0]
	 Most complex word: admission 


	 Sentence after substitution: admission to tsinghua is extremely competitive .

Simplification complete.Untokenised sentence: today nrc is organised as an independent , private foundation .
Tokenised sentence: ['today', 'nrc', 'is', 'organised', 'as', 'an', 'independent', ',', 'private', 'foundation', '.']
Complex probs: [9.3426008e-04 0.0000000e+00 5.3683067e-05 9.2266703e-01 7.6486984e-05
 1.1256110e-04 9.0050542e-01 0.0000000e+00 4.2135663e-02 7.6762956e-01
 0.0000000e+00]
Binary complexity labels: [0 0 0 1 0 0 1 0 0 1 0]
	 Most complex word: organised 

Found complex word or expression: ### organised ###. Plainifying...


  5%|██▏                                       | 19/359 [04:21<58:36, 10.34s/it]

Suggested top 5 substitutions: ['organized', 'run', 'established', 'registered', 'managed']
Complex probs: [9.2023949e-04 0.0000000e+00 5.2215011e-05 8.9662355e-01 8.0293859e-05
 1.1706125e-04 9.0560997e-01 0.0000000e+00 4.4150714e-02 7.7215999e-01
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 1 0 0 1 0]
	 Most complex word: independent 


	 Sentence after substitution: today nrc is organized as an independent , private foundation .

Simplification complete.Untokenised sentence: it is situated at the coast of the baltic sea , where it encloses the city of stralsund .
Tokenised sentence: ['it', 'is', 'situated', 'at', 'the', 'coast', 'of', 'the', 'baltic', 'sea', ',', 'where', 'it', 'encloses', 'the', 'city', 'of', 'stralsund', '.']
Complex probs: [1.7257451e-04 5.5814518e-05 9.4125313e-01 4.7788442e-05 9.2311209e-05
 1.8614635e-02 4.4441022e-05 8.3417086e-05 6.4331514e-01 1.1903289e-03
 0.0000000e+00 7.9863326e-05 7.6760698e-05 0.0000000e+00 1.2777673e-04
 9.9516204e-03 5.1845

  6%|██▏                                     | 20/359 [04:36<1:06:45, 11.82s/it]

Suggested top 5 substitutions: ['is located', 'is', 'is situated', 'ends', 'lies']
Complex probs: [1.8214261e-04 5.4329295e-05 4.4943768e-01 4.5598063e-05 9.4484232e-05
 1.8543024e-02 4.5484448e-05 8.4525549e-05 6.5711564e-01 1.2500874e-03
 0.0000000e+00 8.1650185e-05 7.8445672e-05 0.0000000e+00 1.2932440e-04
 1.0356987e-02 5.2229152e-05 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: baltic 


	 Sentence after substitution: it is located at the coast of the baltic sea , where it encloses the city of stralsund .

Simplification complete.Untokenised sentence: he was also named 1982 " sportsman of the year " by sports illustrated .
Tokenised sentence: ['he', 'was', 'also', 'named', '1982', '"', 'sportsman', 'of', 'the', 'year', '"', 'by', 'sports', 'illustrated', '.']
Complex probs: [2.5512194e-04 1.2048500e-04 5.1155919e-04 7.3751821e-03 1.5118682e-03
 0.0000000e+00 8.0551344e-01 4.4441953e-05 9.5526346e-05 9.1635040e-0

  6%|██▎                                     | 21/359 [04:48<1:05:52, 11.70s/it]

Suggested top 5 substitutions: ['illustrated', 'illustrated magazine', 'illustrated .', 'illustrated international', 'illustrated sports']
Complex probs: [2.5512194e-04 1.2048500e-04 5.1155919e-04 7.3751821e-03 1.5118682e-03
 0.0000000e+00 8.0551344e-01 4.4441953e-05 9.5526346e-05 9.1635040e-04
 0.0000000e+00 9.3857620e-05 2.3135418e-01 8.7872529e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
	 Most complex word: sportsman 


	 Sentence after substitution: he was also named 1982 " sportsman of the year " by sports illustrated .

Simplification complete.Untokenised sentence: fives is a british sport believed to derive from the same origins as many racquet sports .
Tokenised sentence: ['fives', 'is', 'a', 'british', 'sport', 'believed', 'to', 'derive', 'from', 'the', 'same', 'origins', 'as', 'many', 'racquet', 'sports', '.']
Complex probs: [0.0000000e+00 6.2035346e-05 1.3427409e-04 8.5792542e-03 1.1880777e-01
 3.4349021e-01 6.3942207e-05 8.9160132e-01 7.8604

  6%|██▍                                     | 22/359 [05:03<1:11:29, 12.73s/it]

Suggested top 5 substitutions: ['have', 'be from', 'be of', 'have come from', 'have had']
Complex probs: [0.0000000e+00 6.1004375e-05 1.3334471e-04 9.3034841e-03 1.2969419e-01
 3.6955798e-01 5.3753960e-05 1.8879151e-04 1.4058089e-04 4.3478380e-03
 8.4550130e-01 8.3040497e-05 2.5959351e-04 0.0000000e+00 1.6691709e-01
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
	 Most complex word: origins 


	 Sentence after substitution: fives is a british sport believed to have the same origins as many racquet sports .

Simplification complete.Untokenised sentence: for example , king bhumibol was born on monday , so on his birthday throughout thailand will be decorated with yellow color .
Tokenised sentence: ['for', 'example', ',', 'king', 'bhumibol', 'was', 'born', 'on', 'monday', ',', 'so', 'on', 'his', 'birthday', 'throughout', 'thailand', 'will', 'be', 'decorated', 'with', 'yellow', 'color', '.']
Complex probs: [5.7782865e-05 1.8406691e-02 0.0000000e+00 4.2097275e-0

  6%|██▌                                     | 23/359 [05:20<1:17:39, 13.87s/it]

Suggested top 5 substitutions: ['in', 'the', 'in the', 'covered with', 'a']
Complex probs: [5.7795649e-05 1.8502114e-02 0.0000000e+00 4.2023775e-03 0.0000000e+00
 8.2655286e-05 3.2898858e-02 4.1437379e-05 5.9107607e-03 0.0000000e+00
 1.5933371e-04 5.7709411e-05 3.1266737e-04 8.3918907e-02 8.7276465e-01
 1.2484296e-02 7.5872405e-05 5.5626011e-04 4.7393150e-05 8.4797172e-03
 3.0403318e-02 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
	 Most complex word: throughout 


	 Sentence after substitution: for example , king bhumibol was born on monday , so on his birthday throughout thailand will be in yellow color .

Simplification complete.Untokenised sentence: both names became defunct in 2007 when they were merged into the national museum of scotland .
Tokenised sentence: ['both', 'names', 'became', 'defunct', 'in', '2007', 'when', 'they', 'were', 'merged', 'into', 'the', 'national', 'museum', 'of', 'scotland', '.']
Complex probs: [1.0326944e-03 2.14

  7%|██▊                                       | 24/359 [05:23<59:36, 10.68s/it]

Suggested top 5 substitutions: ['merged', 'incorporated', 'combined', 'absorbed', 'consolidated']
Complex probs: [1.0326944e-03 2.1483062e-02 1.0595002e-03 8.6282599e-01 4.5498418e-05
 1.7902087e-03 1.0799312e-04 2.1793305e-04 1.3451926e-04 9.0409917e-01
 1.7978219e-04 1.1298070e-04 8.9070641e-02 8.4690982e-01 5.1123454e-05
 6.8951344e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0]
	 Most complex word: defunct 


	 Sentence after substitution: both names became defunct in 2007 when they were merged into the national museum of scotland .

Simplification complete.Untokenised sentence: nevertheless , tagore emulated numerous styles , including craftwork from northern new ireland , haida carvings from the west coast of canada  (  british columbia  )  , and woodcuts by max pechstein .
Tokenised sentence: ['nevertheless', ',', 'tagore', 'emulated', 'numerous', 'styles', ',', 'including', 'craftwork', 'from', 'northern', 'new', 'ireland', ',', 'haida', 'carvin

  7%|██▊                                     | 25/359 [05:42<1:14:00, 13.30s/it]

Suggested top 5 substitutions: ['however', '. .', 'while there', 'years', 'time']
Complex probs: [1.32843219e-02 0.00000000e+00 0.00000000e+00 0.00000000e+00
 8.03038061e-01 3.73089075e-01 0.00000000e+00 1.34795129e-01
 0.00000000e+00 6.84087790e-05 9.16538294e-03 4.74235712e-04
 2.09380481e-02 0.00000000e+00 0.00000000e+00 8.46102655e-01
 7.25212158e-05 1.16410054e-04 2.98730936e-03 8.46158992e-03
 4.21662298e-05 1.29133854e-02 0.00000000e+00 1.89924089e-03
 7.29973793e-01 0.00000000e+00 0.00000000e+00 7.16402938e-05
 0.00000000e+00 9.58578603e-05 3.02841216e-02 0.00000000e+00
 0.00000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
	 Most complex word: carvings 


	 Sentence after substitution: however , tagore emulated numerous styles , including craftwork from northern new ireland , haida carvings from the west coast of canada ( british columbia ) , and woodcuts by max pechstein .

Simplification complete.Untokenised sentence: o

  7%|██▉                                     | 26/359 [05:49<1:03:28, 11.44s/it]

Suggested top 5 substitutions: ['introduced', 'proposed', 'announced', 'presented', 'first proposed']
Complex probs: [5.67286188e-05 2.27127015e-03 1.36483039e-04 0.00000000e+00
 8.76812148e-04 0.00000000e+00 5.42548060e-01 6.44393146e-01
 9.10239469e-04 1.03105803e-03 0.00000000e+00 4.73908372e-02
 9.54760432e-01 1.23338337e-04 9.06708896e-01 5.81247368e-05
 1.34302652e-04 1.59113959e-03 1.02340244e-04 9.33097489e-03
 7.37310588e-01 4.98407535e-05 8.78216597e-05 1.18165500e-01
 4.82118558e-05 6.03001833e-01 9.94992480e-02 0.00000000e+00]
Binary complexity labels: [0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0]
	 Most complex word: concept 


	 Sentence after substitution: on october 14 , 1960 , presidential candidate john f . kennedy introduced the concept of what became the peace corps on the steps of michigan union .

Simplification complete.Untokenised sentence: she performed for president reagan in 1988 ' s great performances at the white house series , which aired on th

  8%|███▏                                      | 27/359 [05:55<53:03,  9.59s/it]

Suggested top 5 substitutions: ['performances', 'performance', 'moments', 'night', 'events']
Complex probs: [2.3744050e-04 9.1406476e-01 4.8794089e-05 2.0589681e-02 4.0143383e-01
 3.9034574e-05 7.9928734e-04 0.0000000e+00 4.9669971e-04 1.1840747e-03
 9.6111482e-01 4.0947019e-05 9.1250113e-05 1.1813736e-03 3.0862011e-03
 5.8733761e-01 0.0000000e+00 1.7910992e-04 8.9282066e-01 5.0139373e-05
 8.7991160e-05 6.8650488e-03 7.9942781e-01 4.4796746e-03 0.0000000e+00]
Binary complexity labels: [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0]
	 Most complex word: performed 


	 Sentence after substitution: she performed for president reagan in 1988 ' s great performances at the white house series , which aired on the public broadcasting service .

Simplification complete.Untokenised sentence: perry saturn  (  with terri  )  defeated eddie guerrero  (  with chyna  )  to win the wwf european championship  (  8 : 10  )  saturn pinned guerrero after a diving elbow drop .
Tokenised sentence: ['per

  8%|███                                     | 28/359 [06:11<1:03:40, 11.54s/it]

Suggested top 5 substitutions: ['title', 'championship', 'title .', 'championship .', 'cup']
Complex probs: [4.3030423e-01 2.1482183e-01 0.0000000e+00 6.2036474e-05 7.4653536e-01
 0.0000000e+00 4.6643358e-01 4.3592933e-01 6.7918670e-01 0.0000000e+00
 4.9094670e-05 0.0000000e+00 0.0000000e+00 5.8157784e-05 4.3070228e-03
 1.0360653e-04 8.0252969e-01 1.6618252e-02 1.2046197e-01 0.0000000e+00
 8.1379207e-05 0.0000000e+00 1.1324583e-04 0.0000000e+00 8.3492018e-02
 3.8266286e-01 7.4268180e-01 7.7479352e-05 9.4912823e-05 6.1593318e-01
 5.5739927e-01 1.3290414e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0]
	 Most complex word: wwf 


	 Sentence after substitution: perry saturn ( with terri ) defeated eddie guerrero ( with chyna ) to win the wwf european title ( 8 : 10 ) saturn pinned guerrero after a diving elbow drop .

Simplification complete.Untokenised sentence: she remained in the united states until 1927 when she and her h

  8%|███▏                                    | 29/359 [06:32<1:18:57, 14.36s/it]

Suggested top 5 substitutions: ['in the', 'in', 'the', 'lived in the', 'lived in']
Complex probs: [3.49643349e-04 4.52988352e-05 1.31002438e-04 1.55127635e-02
 9.23803542e-04 3.73514573e-04 2.30485434e-03 1.50505119e-04
 2.45890784e-04 1.33518639e-04 4.76734189e-04 4.63130847e-02
 3.06765527e-01 5.50220175e-05 4.45167115e-03 0.00000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Simplification complete or no complex expression found.


	 Sentence after substitution: she in the united states until 1927 when she and her husband returned to france .

Simplification complete.Untokenised sentence: despina was discovered in late july , 1989 from the images taken by the voyager 2 probe .
Tokenised sentence: ['despina', 'was', 'discovered', 'in', 'late', 'july', ',', '1989', 'from', 'the', 'images', 'taken', 'by', 'the', 'voyager', '2', 'probe', '.']
Complex probs: [0.00000000e+00 1.61125819e-04 8.11022520e-01 4.76549176e-05
 2.37137266e-03 5.42275002e-03 0.00000000e+00 

  8%|███▎                                    | 30/359 [06:43<1:13:07, 13.34s/it]

Suggested top 5 substitutions: ['probe', 'space probe', 'probes', 'mission', 'orbiter']
Complex probs: [0.00000000e+00 1.61125819e-04 8.11022520e-01 4.76549176e-05
 2.37137266e-03 5.42275002e-03 0.00000000e+00 1.51907804e-03
 6.31383300e-05 1.30014465e-04 1.14998534e-01 4.24808497e-03
 9.74950963e-05 1.28211817e-04 8.60066235e-01 1.55200803e-04
 9.34970319e-01 0.00000000e+00]
Binary complexity labels: [0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
	 Most complex word: voyager 


	 Sentence after substitution: despina was discovered in late july , 1989 from the images taken by the voyager 2 probe .

Simplification complete.Untokenised sentence: the first italian grand prix motor racing championship took place on 4 september 1921 at brescia .
Tokenised sentence: ['the', 'first', 'italian', 'grand', 'prix', 'motor', 'racing', 'championship', 'took', 'place', 'on', '4', 'september', '1921', 'at', 'brescia', '.']
Complex probs: [2.3207023e-04 4.5622471e-03 3.2033082e-02 3.6799824e-03 4.2353824e-02
 

  9%|███▍                                    | 31/359 [06:55<1:12:09, 13.20s/it]

Suggested top 5 substitutions: ['event', 'race', 'season', 'event which', 'event . it']
Complex probs: [2.4720861e-04 5.1774234e-03 3.6749493e-02 4.2549241e-03 4.8116285e-02
 8.4992349e-02 2.5698489e-01 3.6688842e-02 5.1437947e-04 1.9290468e-03
 4.5007197e-05 7.9999467e-05 2.2278056e-03 7.4277434e-04 3.6227648e-05
 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Simplification complete or no complex expression found.


	 Sentence after substitution: the first italian grand prix motor racing event took place on 4 september 1921 at brescia .

Simplification complete.Untokenised sentence: he also completed two collections of short stories entitled the ribbajack & other curious yarns and seven strange and ghostly tales .
Tokenised sentence: ['he', 'also', 'completed', 'two', 'collections', 'of', 'short', 'stories', 'entitled', 'the', 'ribbajack', '&', 'other', 'curious', 'yarns', 'and', 'seven', 'strange', 'and', 'ghostly', 'tales', '.']
Complex

  9%|███▌                                    | 32/359 [07:10<1:14:32, 13.68s/it]

Suggested top 5 substitutions: ['collections of', 'books of', 'collection of', 'volumes of', 'series of']
Complex probs: [2.3749868e-04 3.8847770e-04 5.0182468e-01 3.1825702e-04 9.4925433e-01
 5.8998761e-05 5.5751489e-03 2.4322180e-02 8.2629329e-01 9.6378375e-05
 0.0000000e+00 0.0000000e+00 3.6209368e-04 7.1757454e-01 0.0000000e+00
 6.1230268e-05 7.9727871e-04 8.1334305e-01 7.3720235e-05 4.9074191e-01
 6.8266535e-01 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0]
	 Most complex word: entitled 


	 Sentence after substitution: he also completed two collections of short stories entitled the ribbajack & other curious yarns and seven strange and ghostly tales .

Simplification complete.Untokenised sentence: at the voyager 2 images ophelia appears as an elongated object , the major axis pointing towards uranus .
Tokenised sentence: ['at', 'the', 'voyager', '2', 'images', 'ophelia', 'appears', 'as', 'an', 'elongated', 'object', ',', 'the', 'major', 'axi

  9%|███▋                                    | 33/359 [07:29<1:22:58, 15.27s/it]

Suggested top 5 substitutions: ['elongated', 'irregular', 'shaped', 'oblate', 'elliptical']
Complex probs: [4.8879312e-05 8.0735648e-05 7.3236912e-01 1.2710229e-04 6.0595754e-03
 0.0000000e+00 5.4737431e-01 6.6726134e-05 1.4474435e-04 8.9088643e-01
 3.0199325e-01 0.0000000e+00 6.7564659e-05 7.5032912e-02 8.3558780e-01
 3.5648796e-01 2.6796145e-02 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0]
	 Most complex word: axis 


	 Sentence after substitution: at the voyager 2 images ophelia appears as an elongated object , the major axis pointing towards uranus .

Simplification complete.Untokenised sentence: the british decided to eliminate him and take the land by force .
Tokenised sentence: ['the', 'british', 'decided', 'to', 'eliminate', 'him', 'and', 'take', 'the', 'land', 'by', 'force', '.']
Complex probs: [1.4611856e-04 6.3184230e-03 5.3849834e-01 7.3494484e-05 9.1754228e-01
 1.8467694e-03 7.9530990e-05 1.1721948e-03 1.4820205e-04 1.844448

  9%|███▊                                    | 34/359 [07:51<1:33:03, 17.18s/it]

Suggested top 5 substitutions: ['to kill', 'to follow', 'to stop', 'to arrest', 'kill']
Complex probs: [1.4437313e-04 6.7909993e-03 5.1208502e-01 6.5688764e-05 1.9643173e-02
 2.0107355e-03 9.6515134e-05 1.4640439e-03 1.5930529e-04 2.0549046e-02
 1.2777233e-04 7.3025964e-02 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: decided 


	 Sentence after substitution: the british decided to kill him and take the land by force .

Simplification complete.Untokenised sentence: some towns on the eyre highway in the south - east corner of western australia , between the south australian border almost as far as caiguna , do not follow official western australian time .
Tokenised sentence: ['some', 'towns', 'on', 'the', 'eyre', 'highway', 'in', 'the', 'south', '-', 'east', 'corner', 'of', 'western', 'australia', ',', 'between', 'the', 'south', 'australian', 'border', 'almost', 'as', 'far', 'as', 'caiguna', ',', 'do', 'not', 'follow', 'official', 'western', '

 10%|███▉                                    | 35/359 [08:12<1:39:26, 18.42s/it]

Suggested top 5 substitutions: ['eyre', 'stuart', 'great southern', 'kings', 'mitchell']
Complex probs: [4.0065503e-04 7.4439988e-02 5.2915279e-05 9.4402829e-05 7.4599874e-01
 6.6926926e-02 4.4143580e-05 1.0205871e-04 2.6218684e-03 0.0000000e+00
 2.6544353e-03 6.3792878e-01 5.2106923e-05 4.0715276e-03 3.1767131e-03
 0.0000000e+00 8.6493696e-05 9.5533549e-05 2.0852904e-03 4.8514776e-02
 4.5418981e-02 5.8104102e-03 7.3450632e-05 8.3951512e-03 8.5681328e-05
 0.0000000e+00 0.0000000e+00 7.6012897e-05 1.9171168e-04 1.4274794e-01
 3.7837231e-01 2.0484398e-03 2.7220499e-02 4.3556123e-04 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: corner 


	 Sentence after substitution: some towns on the eyre highway in the south - east corner of western australia , between the south australian border almost as far as caiguna , do not follow official western australian time .

Simplification complete.Untokenised sentence

 10%|████                                    | 36/359 [08:31<1:40:28, 18.66s/it]

Suggested top 5 substitutions: ['lay', 'form of', 'geometrical', 'order of', 'architectural']
Complex probs: [5.90444106e-05 6.00342810e-01 9.67290461e-01 2.36640777e-03
 7.70593584e-02 4.94997221e-05 6.55823886e-01 7.11062021e-05
 0.00000000e+00 6.99356496e-01 1.02017155e-04 2.96064391e-04
 1.46094500e-03 5.86603092e-05 4.34238836e-02 0.00000000e+00
 7.31965338e-05 0.00000000e+00 0.00000000e+00 1.43649595e-04
 1.20365228e-04 3.84896499e-04 1.71069650e-03 6.59586076e-05
 8.13060820e-01 1.29295997e-02 0.00000000e+00 1.06983140e-01
 7.75977969e-05 6.01786077e-01 0.00000000e+00]
Binary complexity labels: [0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0]
	 Most complex word: decoration 


	 Sentence after substitution: in lay decoration small pieces of colored and iridescent shell have been used to create mosaics and inlays , which have been used to decorate walls , furniture and boxes .

Simplification complete.Untokenised sentence: the other incorporated cities on the palos

 10%|████                                    | 37/359 [08:37<1:19:18, 14.78s/it]

Suggested top 5 substitutions: ['incorporated', 'major', 'small', 'large', 'developed']
Complex probs: [1.6136737e-04 9.5901423e-04 9.4375175e-01 3.7040953e-02 4.8186346e-05
 8.7364264e-05 0.0000000e+00 0.0000000e+00 8.0559158e-01 4.6685892e-03
 5.3937811e-01 0.0000000e+00 0.0000000e+00 0.0000000e+00 4.9295953e-01
 2.1427929e-02 5.7250899e-01 6.9617308e-05 5.5450243e-01 4.1689727e-02
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0]
	 Most complex word: peninsula 


	 Sentence after substitution: the other incorporated cities on the palos verdes peninsula include rancho palos verdes , rolling hills estates and rolling hills .

Simplification complete.Untokenised sentence: fearing that drek will destroy the galaxy , clank asks ratchet to help him find the famous superhero captain qwark , in an effort to stop drek .
Tokenised sentence: ['fearing', 'that', 'drek', 'will', 'destroy', 'the', 'galaxy', ',', 'clank', 'asks', 'ratchet', 'to', 'help', 'him', 

 11%|████▏                                   | 38/359 [08:50<1:15:53, 14.19s/it]

Suggested top 5 substitutions: ['that', 'because', 'thinking that', 'knowing that', 'as']
Complex probs: [9.70912370e-05 0.00000000e+00 1.14572715e-04 6.89956486e-01
 1.05799576e-04 5.30922651e-01 0.00000000e+00 0.00000000e+00
 1.26091484e-02 0.00000000e+00 5.71370620e-05 1.71206670e-03
 1.10339443e-03 1.60864717e-03 1.35773909e-04 2.35606581e-01
 8.61436486e-01 5.52989431e-02 0.00000000e+00 0.00000000e+00
 3.32149211e-05 1.25927414e-04 6.00142717e-01 6.26046749e-05
 2.32522353e-03 0.00000000e+00 0.00000000e+00]
Binary complexity labels: [0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0]
	 Most complex word: superhero 


	 Sentence after substitution: that drek will destroy the galaxy , clank asks ratchet to help him find the famous superhero captain qwark , in an effort to stop drek .

Simplification complete.Untokenised sentence: it is not actually a true louse .
Tokenised sentence: ['it', 'is', 'not', 'actually', 'a', 'true', 'louse', '.']
Complex probs: [1.5441226e-04 5.007820

 11%|████▋                                     | 40/359 [08:59<51:33,  9.70s/it]

Suggested top 5 substitutions: ['discipline', 'practice', 'field', 'subject', 'activity']
Complex probs: [1.6973607e-04 8.2000649e-01 7.3584133e-01 1.1842537e-04 7.4800871e-02
 0.0000000e+00 6.3723260e-01 3.0796775e-01 5.0850886e-01 3.8090755e-05
 9.2638582e-02 9.1156197e-01 4.8602295e-01 6.0534792e-05 2.2691913e-04
 2.7871947e-03 6.4071231e-02 0.0000000e+00 8.3418858e-01 3.2596472e-01
 6.1073471e-05 9.4911193e-05 9.1710979e-01 9.4116539e-01 0.0000000e+00]
Binary complexity labels: [0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0]
	 Most complex word: mainstream 


	 Sentence after substitution: he advocates applying a user - centered design process in product development cycles and also works towards popularizing interaction design as a mainstream discipline .

Simplification complete.Untokenised sentence: it is theoretically possible that the other editors who may have reported you , and the administrator who blocked you , are part of a conspiracy against someone half a world away 

 11%|████▊                                     | 41/359 [09:02<43:31,  8.21s/it]

Suggested top 5 substitutions: ['quite', 'theoretically', 'equally', 'even', 'only']
Complex probs: [1.60078518e-04 4.88974983e-05 6.11490943e-02 1.93569846e-02
 7.48152597e-05 1.00453406e-04 1.17435958e-03 1.73377171e-01
 1.92903171e-04 1.53168003e-04 1.45841987e-04 2.76852846e-01
 1.00046840e-04 0.00000000e+00 5.81344393e-05 8.53969759e-05
 9.50291693e-01 1.88335020e-04 1.15880333e-01 8.98634753e-05
 0.00000000e+00 5.69638942e-05 6.38724305e-03 4.07166663e-05
 9.88462925e-05 9.59229112e-01 2.68969365e-04 1.70033798e-03
 2.77077965e-03 1.23929087e-04 1.11275096e-03 1.27251493e-03
 1.12760485e-04 0.00000000e+00 6.24185055e-02 1.00628694e-03
 7.76738673e-03 4.10094362e-05 5.37502998e-03 0.00000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
 0 0 0]
	 Most complex word: conspiracy 


	 Sentence after substitution: it is quite possible that the other editors who may have reported you , and the administrator who blocked you , are

 12%|████▉                                     | 42/359 [09:10<42:34,  8.06s/it]

Suggested top 5 substitutions: ['all', 'various', 'the scientific', 'scientific', 'different']
Complex probs: [1.6471697e-03 1.5038935e-03 6.1215207e-05 0.0000000e+00 0.0000000e+00
 1.7454519e-04 5.9569991e-01 4.1049345e-05 6.9141082e-05 7.8740984e-01
 7.9298047e-03 7.0688395e-05 8.7931180e-01 9.3055060e-03 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 1 0 0 1 0 0 1 0 0]
	 Most complex word: climate 


	 Sentence after substitution: working group i : assesses all aspects of the climate system and climate change .

Simplification complete.Untokenised sentence: the island chain forms part of the hebrides , separated from the scottish mainland and from the inner hebrides by the stormy waters of the minch , the little minch and the sea of the hebrides .
Tokenised sentence: ['the', 'island', 'chain', 'forms', 'part', 'of', 'the', 'hebrides', ',', 'separated', 'from', 'the', 'scottish', 'mainland', 'and', 'from', 'the', 'inner', 'hebrides', 'by', 'the', 'stormy', 'waters', 'of', 'the

 12%|████▊                                   | 43/359 [09:32<1:02:46, 11.92s/it]

Suggested top 5 substitutions: ['from the', 'the', 'from', 'off the', 'isolated from the']
Complex probs: [2.0118518e-04 3.4374353e-01 2.7779594e-01 2.4242340e-01 7.8216745e-03
 5.1274179e-05 1.0915307e-04 0.0000000e+00 0.0000000e+00 6.0438993e-05
 1.0848638e-04 4.3222481e-01 8.0128270e-01 9.3808943e-05 7.0537353e-05
 1.2698237e-04 6.8364578e-01 0.0000000e+00 1.2159394e-04 1.3422302e-04
 8.7281495e-01 2.0172332e-01 4.1565374e-05 8.9075831e-05 0.0000000e+00
 0.0000000e+00 7.8626159e-05 1.2048050e-03 0.0000000e+00 9.2744347e-05
 1.2514548e-04 3.5150156e-03 4.7339356e-05 9.4949202e-05 0.0000000e+00
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: stormy 


	 Sentence after substitution: the island chain forms part of the hebrides , from the scottish mainland and from the inner hebrides by the stormy waters of the minch , the little minch and the sea of the hebrides .

Simplification complete.Untokenise

 12%|████▉                                   | 44/359 [09:48<1:07:36, 12.88s/it]

Suggested top 5 substitutions: ['dr .', 'and partner', 'welcomed daughter', 'mrs .', 'and daughter']
Complex probs: [7.7909452e-01 1.4195121e-04 3.4250563e-04 8.9632319e-03 1.1972289e-01
 0.0000000e+00 0.0000000e+00 1.6036542e-02 4.8346677e-01 3.9256614e-05
 1.3418243e-03 1.4828573e-04 0.0000000e+00 1.3271511e-03 0.0000000e+00]
Binary complexity labels: [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: orton 


	 Sentence after substitution: orton and his wife dr . alanna marie orton on july 12 , 2008 .

Simplification complete.Untokenised sentence: formal minor planet designations are number - name combinations overseen by the minor planet center , a branch of the iau .
Tokenised sentence: ['formal', 'minor', 'planet', 'designations', 'are', 'number', '-', 'name', 'combinations', 'overseen', 'by', 'the', 'minor', 'planet', 'center', ',', 'a', 'branch', 'of', 'the', 'iau', '.']
Complex probs: [3.8523355e-01 5.5681479e-01 5.7353872e-01 9.6964383e-01 8.1466249e-05
 9.9673483e-04 0.000

 13%|█████                                   | 45/359 [10:05<1:14:03, 14.15s/it]

Suggested top 5 substitutions: ['names', 'designations', 'naming conventions', 'naming systems', 'names which']
Complex probs: [4.6823800e-01 6.6467488e-01 7.0099479e-01 2.7199691e-02 7.4899355e-05
 1.4647655e-03 0.0000000e+00 1.0395846e-02 9.3407708e-01 8.1271762e-01
 8.4245672e-05 8.8887718e-05 5.9777009e-01 5.2090347e-01 6.3354801e-03
 0.0000000e+00 9.3501731e-05 5.8588421e-01 5.2567437e-05 7.8530698e-05
 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0 0]
	 Most complex word: combinations 


	 Sentence after substitution: formal minor planet names are number - name combinations overseen by the minor planet center , a branch of the iau .

Simplification complete.Untokenised sentence: by early on september 30 , wind shear began to dramatically increase and a weakening trend began .
Tokenised sentence: ['by', 'early', 'on', 'september', '30', ',', 'wind', 'shear', 'began', 'to', 'dramatically', 'increase', 'and', 'a', 'weakening', 'tren

 13%|█████▏                                  | 46/359 [10:12<1:03:15, 12.13s/it]

Suggested top 5 substitutions: ['significantly', 'dramatically', 'steeply', 'rapidly', 'slowly']
Complex probs: [1.4766968e-04 2.8578136e-03 4.8469359e-05 4.1107549e-03 1.3589529e-04
 0.0000000e+00 1.4132556e-02 6.4800942e-01 4.6713703e-04 4.5452973e-05
 9.4713801e-01 2.9591852e-01 6.3630767e-05 1.0553831e-04 9.5904136e-01
 7.3095369e-01 5.6835683e-04 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0]
	 Most complex word: weakening 


	 Sentence after substitution: by early on september 30 , wind shear began to significantly increase and a weakening trend began .

Simplification complete.Untokenised sentence: each entry has a datum  (  a nugget of data  )  which is a copy of the datum in some backing store .
Tokenised sentence: ['each', 'entry', 'has', 'a', 'datum', '(', 'a', 'nugget', 'of', 'data', ')', 'which', 'is', 'a', 'copy', 'of', 'the', 'datum', 'in', 'some', 'backing', 'store', '.']
Complex probs: [3.5730039e-04 7.0273471e-01 1.6467007e-04 1.1372456

 13%|█████▍                                    | 47/359 [10:17<52:32, 10.10s/it]

Suggested top 5 substitutions: ['entry', 'line', 'character', 'item', 'word']
Complex probs: [3.5730039e-04 7.0273471e-01 1.6467007e-04 1.1372456e-04 0.0000000e+00
 0.0000000e+00 1.2495898e-04 0.0000000e+00 4.9416329e-05 4.1912916e-01
 0.0000000e+00 2.2113178e-04 4.3597411e-05 1.4021073e-04 2.5912225e-01
 5.5472792e-05 9.0600181e-05 0.0000000e+00 5.0061877e-05 1.6553923e-04
 5.8215421e-01 2.8943992e-03 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
	 Most complex word: backing 


	 Sentence after substitution: each entry has a datum ( a nugget of data ) which is a copy of the datum in some backing store .

Simplification complete.Untokenised sentence: as a result , although many mosques will not enforce violations , both men and women when attending a mosque must adhere to these guidelines .
Tokenised sentence: ['as', 'a', 'result', ',', 'although', 'many', 'mosques', 'will', 'not', 'enforce', 'violations', ',', 'both', 'men', 'and', 'women', '

 13%|█████▌                                    | 48/359 [10:21<42:49,  8.26s/it]

Suggested top 5 substitutions: ['rules', 'laws', 'guidelines', 'regulations', 'requirements']
Complex probs: [9.4770381e-05 1.2994208e-04 4.1516200e-02 0.0000000e+00 9.7656669e-04
 2.7983135e-04 8.2860416e-01 9.6765616e-05 1.6870229e-04 7.6013154e-01
 9.1032404e-01 0.0000000e+00 2.5199927e-04 3.8389387e-04 5.1713821e-05
 4.0545201e-04 6.1522573e-05 7.0240867e-01 8.5498585e-05 8.3485693e-01
 1.0777921e-03 6.7843533e-01 6.0637583e-05 6.3043984e-04 8.8019170e-02
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0]
	 Most complex word: violations 


	 Sentence after substitution: as a result , although many mosques will not enforce violations , both men and women when attending a mosque must adhere to these rules .

Simplification complete.Untokenised sentence: mariel of redwall is a fantasy novel by brian jacques , published in 1991 .
Tokenised sentence: ['mariel', 'of', 'redwall', 'is', 'a', 'fantasy', 'novel', 'by', 'brian', 'jacques', ',', 'pu

 14%|█████▋                                    | 49/359 [10:39<56:52, 11.01s/it]

Suggested top 5 substitutions: ['published in', 'published', 'written in', 'originally published in', 'in']
Complex probs: [0.0000000e+00 5.8604281e-05 0.0000000e+00 5.4780659e-05 1.3010412e-04
 8.2040513e-01 8.6284816e-01 9.4428491e-05 2.4162700e-02 6.4132255e-01
 0.0000000e+00 8.6759871e-01 3.5937886e-05 6.5801892e-04 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 1 1 0 0 1 0 0 0 0 0]
	 Most complex word: novel 


	 Sentence after substitution: mariel of redwall is a fantasy novel by brian jacques , published in 1991 .

Simplification complete.Untokenised sentence: ryan prosser  (  born 10 july , 1988  )  is a professional rugby union player for bristol rugby in the guinness premiership .
Tokenised sentence: ['ryan', 'prosser', '(', 'born', '10', 'july', ',', '1988', ')', 'is', 'a', 'professional', 'rugby', 'union', 'player', 'for', 'bristol', 'rugby', 'in', 'the', 'guinness', 'premiership', '.']
Complex probs: [4.41764370e-02 0.00000000e+00 0.00000000e+00 1.22975716e-02
 1.0680

 14%|█████▌                                  | 50/359 [10:56<1:05:34, 12.73s/it]

Suggested top 5 substitutions: ['a', 'a professional', 'an american', 'an english', 'an american professional']
Complex probs: [4.6501242e-02 0.0000000e+00 0.0000000e+00 1.2924700e-02 1.0881345e-04
 1.8047412e-03 0.0000000e+00 7.1266876e-04 0.0000000e+00 5.0076251e-05
 1.2441823e-04 7.6476890e-01 1.2095082e-01 6.2349267e-02 4.2400989e-05
 6.8386622e-02 6.6994858e-01 3.9038445e-05 8.2245177e-05 7.6668268e-01
 9.2064905e-01 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0]
	 Most complex word: premiership 


	 Sentence after substitution: ryan prosser ( born 10 july , 1988 ) is a rugby union player for bristol rugby in the guinness premiership .

Simplification complete.Untokenised sentence: like previous assessment reports , it consists of four reports , three of them from its working groups .
Tokenised sentence: ['like', 'previous', 'assessment', 'reports', ',', 'it', 'consists', 'of', 'four', 'reports', ',', 'three', 'of', 'them', 'from', 'its', 'w

 14%|█████▋                                  | 51/359 [11:11<1:08:55, 13.43s/it]

Suggested top 5 substitutions: ['research', 'report', 'management', 'annual', 'study']
Complex probs: [4.4409581e-04 5.8232403e-01 7.5598371e-01 1.1483094e-02 0.0000000e+00
 8.8899826e-05 7.7151120e-01 5.6499925e-05 2.8944135e-04 9.2215780e-03
 0.0000000e+00 2.6254504e-04 5.2846295e-05 6.7390624e-04 6.7735498e-05
 2.9562379e-04 3.1559905e-03 2.8106369e-02 0.0000000e+00]
Binary complexity labels: [0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: consists 


	 Sentence after substitution: like previous research reports , it consists of four reports , three of them from its working groups .

Simplification complete.Untokenised sentence: their granddaughter hélène langevin - joliot is a professor of nuclear physics at the university of paris , and their grandson pierre joliot , who was named after pierre curie , is a noted biochemist .
Tokenised sentence: ['their', 'granddaughter', 'helene', 'langevin', '-', 'joliot', 'is', 'a', 'professor', 'of', 'nuclear', 'physics', 'at', 'th

 14%|█████▊                                  | 52/359 [11:19<1:01:07, 11.95s/it]

Suggested top 5 substitutions: ['daughter', 'granddaughter', 'grand daughter', 'sister', 'youngest daughter']
Complex probs: [2.3854875e-04 3.4394238e-02 4.5742011e-01 0.0000000e+00 0.0000000e+00
 0.0000000e+00 4.5753171e-05 1.3728798e-04 4.6252590e-01 4.7566296e-05
 1.1661996e-01 8.6031127e-01 3.8721286e-05 7.3633244e-05 7.2083160e-02
 4.2059481e-05 4.4983798e-03 0.0000000e+00 6.8924368e-05 1.8021606e-04
 8.5859686e-01 6.2817049e-01 0.0000000e+00 0.0000000e+00 1.4653703e-04
 1.0164330e-04 1.3402116e-02 8.9195171e-05 5.5086523e-01 0.0000000e+00
 0.0000000e+00 3.9704493e-05 1.3117706e-04 2.2080243e-02 0.0000000e+00
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
	 Most complex word: physics 


	 Sentence after substitution: their daughter helene langevin - joliot is a professor of nuclear physics at the university of paris , and their grandson pierre joliot , who was named after pierre curie , is a noted biochemist .

Simplification complete.

 15%|█████▉                                  | 53/359 [11:41<1:15:57, 14.89s/it]

Suggested top 5 substitutions: ['and', 'and large', 'although smaller', 'though smaller', 'but']
Complex probs: [1.36941773e-04 7.83830941e-01 8.36472511e-01 1.14371855e-04
 2.64501646e-02 8.39483365e-02 8.29703093e-01 5.01008035e-05
 1.13391921e-04 8.64798188e-01 5.41726477e-05 2.99847554e-02
 0.00000000e+00 3.14080506e-04 6.12008333e-01 0.00000000e+00
 7.09619926e-05 9.47590053e-01 9.02803004e-05 7.52902627e-01
 0.00000000e+00]
Binary complexity labels: [0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0]
	 Most complex word: quantities 


	 Sentence after substitution: this stamp remained the standard letter stamp for the remainder of victoria ' s reign , and quantities were printed .

Simplification complete.Untokenised sentence: the international fight league was an american mixed martial arts  (  mma  )  promotion billed as the world ' s first mma league .
Tokenised sentence: ['the', 'international', 'fight', 'league', 'was', 'an', 'american', 'mixed', 'martial', 'arts', '(', 'mma', ')', 

 15%|██████                                  | 54/359 [11:57<1:16:57, 15.14s/it]

Suggested top 5 substitutions: ['the world', 'or the', 'the ultimate', 'the', 'the first']
Complex probs: [1.58349678e-04 6.49227470e-04 5.01084514e-03 9.08347845e-01
 1.20083801e-04 1.91124898e-04 1.89628743e-03 8.97819400e-02
 6.72435224e-01 2.48345323e-02 0.00000000e+00 4.99553859e-01
 0.00000000e+00 6.79810643e-01 7.90941596e-01 6.92546455e-05
 1.03891056e-04 8.01336893e-04 0.00000000e+00 3.49233916e-04
 7.26439059e-04 6.76054776e-01 8.68392646e-01 0.00000000e+00]
Binary complexity labels: [0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0]
	 Most complex word: league 


	 Sentence after substitution: the world fight league was an american mixed martial arts ( mma ) promotion billed as the world ' s first mma league .

Simplification complete.Untokenised sentence: giardia lamblia  (  synonymous with lamblia intestinalis and giardia duodenalis  )  is a flagellated protozoan parasite that colonises and reproduces in the small intestine , causing giardiasis .
Tokenised sentence: ['giard

 15%|██████▏                                 | 55/359 [12:06<1:07:49, 13.39s/it]

Suggested top 5 substitutions: ['species', 'cete', 'in nature', 'parasite', 'of origin']
Complex probs: [0.0000000e+00 0.0000000e+00 0.0000000e+00 9.1894424e-01 5.2020980e-05
 0.0000000e+00 0.0000000e+00 6.2501837e-05 0.0000000e+00 0.0000000e+00
 0.0000000e+00 4.1503426e-05 1.0129797e-04 0.0000000e+00 0.0000000e+00
 8.4659827e-01 7.1005590e-05 0.0000000e+00 7.5977252e-05 0.0000000e+00
 3.9558156e-05 7.1067916e-05 1.7438402e-03 0.0000000e+00 0.0000000e+00
 3.8832508e-02 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: synonymous 


	 Sentence after substitution: giardia lamblia ( synonymous with lamblia intestinalis and giardia duodenalis ) is a flagellated protozoan species that colonises and reproduces in the small intestine , causing giardiasis .

Simplification complete.Untokenised sentence: aside from this , cameron has often worked in christian - themed productions , among them the post - rapture 

 16%|██████▌                                   | 56/359 [12:13<58:43, 11.63s/it]

Suggested top 5 substitutions: ['productions', 'projects', 'films', 'works', 'media']
Complex probs: [8.8695943e-01 9.5114039e-05 1.2325370e-04 0.0000000e+00 2.9132754e-02
 1.3205508e-04 1.0451986e-03 1.3319795e-02 4.3598076e-05 1.1008276e-01
 0.0000000e+00 6.4879709e-01 9.0445065e-01 0.0000000e+00 5.5325456e-04
 4.1896876e-04 8.5647502e-05 3.9544538e-02 0.0000000e+00 0.0000000e+00
 5.5734384e-01 2.8689194e-03 9.7385322e-04 0.0000000e+00 1.1318882e-04
 2.0837408e-02 0.0000000e+00 2.1388684e-03 1.0011252e-03 2.4836159e-03
 0.0000000e+00 0.0000000e+00 1.2497788e-02 0.0000000e+00 6.0355305e-05
 3.6066384e-03 9.9374913e-04 0.0000000e+00 6.8908266e-04 2.9771467e-05
 1.0952756e-03 0.0000000e+00 3.4427947e-05 2.0695121e-04 1.5880118e-04
 6.2745926e-03 4.2611413e-02 0.0000000e+00 7.2680339e-02 0.0000000e+00
 1.5937269e-02 0.0000000e+00]
Binary complexity labels: [1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Most complex word: aside 

 16%|██████▎                                 | 57/359 [12:32<1:08:24, 13.59s/it]

Suggested top 5 substitutions: ['poland', 'prussia', 'warsaw', 'east prussia', 'germany']
Complex probs: [1.7340865e-04 1.4404311e-04 1.5948449e-04 1.9604770e-02 6.0319155e-03
 4.7118887e-05 1.0388986e-04 4.4707391e-01 5.5576624e-05 9.5731077e-05
 0.0000000e+00 1.6443131e-02 0.0000000e+00 7.2523206e-04 6.0166712e-03
 1.0435749e-03 0.0000000e+00 9.6106865e-03 2.8010067e-01 0.0000000e+00
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
	 Simplification complete or no complex expression found.


	 Sentence after substitution: this was the area east of the mouth of the vistula river , later sometimes called " poland proper " .

Untokenised sentence: after graduation he returned to yerevan to teach at the local conservatory and later he was appointed artistic director of the armenian philarmonic orchestra .
Tokenised sentence: ['after', 'graduation', 'he', 'returned', 'to', 'yerevan', 'to', 'teach', 'at', 'the', 'local', 'conservatory', 'and', 'later', 'h

 16%|██████▍                                 | 58/359 [12:48<1:11:39, 14.28s/it]

Suggested top 5 substitutions: ['became', 'as', 'became the', 'was', 'as the']
Complex probs: [1.3851443e-04 9.4446450e-01 1.6125679e-04 2.7294219e-01 6.7242254e-05
 8.5305184e-01 7.2938587e-05 8.5203201e-01 4.0572053e-05 7.9767968e-05
 1.8367175e-03 9.4971526e-01 7.5219432e-05 6.6519843e-04 1.0886146e-03
 9.7010851e-01 3.4108096e-01 3.8929065e-05 8.2481980e-05 5.9940028e-01
 0.0000000e+00 7.5396997e-01 0.0000000e+00]
Binary complexity labels: [0 1 0 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0]
	 Most complex word: artistic 


	 Sentence after substitution: after graduation he returned to yerevan to teach at the local conservatory and later became artistic director of the armenian philarmonic orchestra .

Simplification complete.Untokenised sentence: the story of christmas is based on the biblical accounts given in the gospel of matthew , namely  -  and the gospel of luke , specifically  -  .
Tokenised sentence: ['the', 'story', 'of', 'christmas', 'is', 'based', 'on', 'the', 'biblical', 'a

 16%|██████▌                                 | 59/359 [12:58<1:05:44, 13.15s/it]

Suggested top 5 substitutions: ['namely', 'specifically', 'that is', 'also namely', 'such as']
Complex probs: [1.9097590e-04 1.7988889e-02 5.8148911e-05 8.4785990e-02 5.6313114e-05
 7.5026676e-02 5.5776527e-05 8.7720640e-05 8.1210542e-01 5.8720642e-01
 3.5508822e-03 3.8827773e-05 9.6037009e-05 7.5483483e-01 5.4614371e-05
 5.9147131e-01 0.0000000e+00 4.5062780e-01 0.0000000e+00 6.4219639e-05
 8.7249544e-05 7.0596099e-01 5.2907002e-05 5.0286913e-01 0.0000000e+00
 4.4129890e-01 0.0000000e+00 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 0]
	 Most complex word: biblical 


	 Sentence after substitution: the story of christmas is based on the biblical accounts given in the gospel of matthew , namely - and the gospel of luke , namely - .

Simplification complete.Untokenised sentence: weelkes was later to find himself in trouble with the chichester cathedral authorities for his heavy drinking and immoderate behaviour .
Tokenised sentence: ['we

 17%|███████                                   | 60/359 [13:05<56:37, 11.36s/it]

Suggested top 5 substitutions: ['behaviour', 'behavior', 'manner', 'nature', 'habits']
Complex probs: [0.0000000e+00 1.5032584e-04 1.4287654e-03 6.8978814e-05 2.1887189e-03
 1.5689623e-02 4.5297886e-05 5.6031120e-01 4.4490127e-05 8.7040789e-05
 8.2447052e-01 9.1301388e-01 9.0804416e-01 3.9784085e-05 2.2738523e-04
 3.5343304e-02 7.3084217e-01 6.4974054e-05 0.0000000e+00 9.2670697e-01
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0]
	 Most complex word: cathedral 


	 Sentence after substitution: weelkes was later to find himself in trouble with the chichester cathedral authorities for his heavy drinking and immoderate behaviour .

Simplification complete.Untokenised sentence: so far the ' celebrity ' episodes have included vic reeves , nancy sorrell , gaby roslin , scott mills , mark chapman , simon gregson , sue cleaver , carol thatcher , paul o ' grady and lee ryan .
Tokenised sentence: ['so', 'far', 'the', "'", 'celebrity', "'", 'episodes', 'have'

 17%|███████▏                                  | 61/359 [13:13<51:22, 10.34s/it]

Suggested top 5 substitutions: ['guests', 'members', 'episodes', 'series', 'shows']
Complex probs: [2.0951721e-04 2.9507929e-03 1.1322455e-04 0.0000000e+00 7.4123573e-01
 0.0000000e+00 4.3112162e-01 1.3000901e-04 6.5053068e-02 6.4324832e-01
 6.9485134e-01 0.0000000e+00 5.2878821e-01 0.0000000e+00 0.0000000e+00
 0.0000000e+00 0.0000000e+00 0.0000000e+00 2.7368143e-01 3.2111537e-01
 0.0000000e+00 4.1242605e-03 6.1457711e-01 0.0000000e+00 3.8236108e-01
 0.0000000e+00 0.0000000e+00 4.9802673e-01 0.0000000e+00 0.0000000e+00
 4.1223758e-01 5.0265837e-01 0.0000000e+00 8.1284088e-04 1.0894960e-03
 0.0000000e+00 4.4962776e-01 6.9603499e-05 2.2051906e-02 3.9573327e-02
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
 0 0 0 0]
	 Most complex word: celebrity 


	 Sentence after substitution: so far the ' celebrity ' guests have included vic reeves , nancy sorrell , gaby roslin , scott mills , mark chapman , simon gregson , sue clea

 17%|███████▎                                  | 62/359 [13:25<53:04, 10.72s/it]

Suggested top 5 substitutions: ['probe', 'station', 'port', 'drive', 'camera']
Complex probs: [1.8757390e-04 1.4234064e-04 8.4968668e-01 1.2331881e-04 1.1693531e-01
 1.5866283e-03 0.0000000e+00 0.0000000e+00 4.9832965e-05 2.5693072e-02
 6.0612030e-05 9.1966780e-05 7.2250289e-01 1.1301023e-04 1.7083983e-03
 9.4085723e-01 1.7310583e-03 3.8545808e-05 1.3336514e-03 7.2957373e-05
 0.0000000e+00 5.6701328e-04 1.1817225e-04 6.5672386e-01 2.8397457e-04
 8.9599961e-01 0.0000000e+00]
Binary complexity labels: [0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0]
	 Most complex word: jupiter 


	 Sentence after substitution: it was discovered by stephen p . synnott in images from the voyager 1 space probe taken on march 5 , 1979 while orbiting around jupiter .

Simplification complete.Untokenised sentence: gomaespuma was a spanish radio show , hosted by juan luis cano and guillermo fesser .
Tokenised sentence: ['gomaespuma', 'was', 'a', 'spanish', 'radio', 'show', ',', 'hosted', 'by', 'juan', '

 18%|███████▎                                  | 63/359 [13:40<59:12, 12.00s/it]

Suggested top 5 substitutions: ['david', 'daniel', 'martin', 'antonio', 'jose']
Complex probs: [0.0000000e+00 1.3716068e-04 1.5043223e-04 1.9905458e-03 6.5261573e-02
 1.3293126e-03 0.0000000e+00 7.3482317e-01 8.9545007e-05 4.3780208e-01
 7.7768848e-03 0.0000000e+00 6.0020051e-05 1.0415611e-03 0.0000000e+00
 0.0000000e+00]
Binary complexity labels: [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
	 Most complex word: hosted 


	 Sentence after substitution: gomaespuma was a spanish radio show , hosted by juan luis cano and david fesser .

Simplification complete.Untokenised sentence: on 16 june 2009 , the official release date of the resistance was announced on the band ' s website .
Tokenised sentence: ['on', '16', 'june', '2009', ',', 'the', 'official', 'release', 'date', 'of', 'the', 'resistance', 'was', 'announced', 'on', 'the', 'band', "'", 's', 'website', '.']
Complex probs: [5.84098998e-05 1.06421154e-04 9.47877765e-04 1.16671575e-03
 0.00000000e+00 8.75847254e-05 3.68308634e-01 2.38194004e-01
