# Introduction

We tackle the problem of OCR post-processing. OCR maps the image of a document into the text domain. This is done first with a CNN+LSTM+CTC model, in our case based on Tesseract. Since this model only maps image to text, we need an additional stage on top of it to validate and correct the output using language knowledge.

The idea is to build a language model that takes the OCRed text and corrects it based on language knowledge. The language model could be:
- Char level: the aim is to capture word morphology, in which case it acts like a spelling correction system.
- Word level: the aim is to capture sentence semantics, but such systems suffer from the out-of-vocabulary (OOV) problem.
- Fusion: the aim is to capture both semantic and morphological language rules. The output has to be at char level to avoid the OOV problem; the input, however, can be char, word, or both.

The target of the fusion model is to learn:

    p(char | char_context, word_context)
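
One way to read this target, assuming a standard autoregressive decoder: the probability of the corrected character sequence factorizes over output positions, with each character conditioned on the previously emitted characters and on both views of the OCR input. The symbols $y_t$, $x^{\text{char}}$ and $x^{\text{word}}$ are illustrative notation, not names taken from the implementation.

```latex
p(y_1, \dots, y_T \mid x^{\text{char}}, x^{\text{word}})
  = \prod_{t=1}^{T} p\left(y_t \mid y_{<t},\, x^{\text{char}},\, x^{\text{word}}\right)
```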

In this workbook we use a vanilla Keras seq2seq implementation, adapted from the lstm_seq2seq example for the English-French translation task. The adaptation involves:

- Adapting to spelling correction at char level
- Pre-training on noisy medical sentences
- Fine-tuning a residual to correct the mistakes of Tesseract
- Limiting the input and output sequence lengths
- Ensuring teacher forcing in the autoregressive decoder
- Limiting the padding per batch
- Adding a learning rate schedule
- Using a bi-directional LSTM encoder
- Using a bi-directional GRU encoder
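
As a rough illustration of the teacher-forcing item in the list above: during training the decoder is fed the ground-truth sequence shifted by one step, so that at step t it sees the true character t and is asked to predict character t+1. The sketch below assumes the `\t` start and `\n` stop markers used by the data loaders in this workbook. The toy vocabulary, the helper name `make_decoder_pair`, and the choice of index 0 as padding are assumptions made for the example.

```python
import numpy as np

def make_decoder_pair(gt_text, vocab_to_int, max_len):
    """Build (decoder_input, decoder_target) index arrays for teacher forcing.

    Index 0 is assumed to be reserved for padding (an assumption of this sketch).
    """
    target_text = '\t' + gt_text + '\n'  # add start and stop markers
    ids = [vocab_to_int[c] for c in target_text][:max_len + 1]
    decoder_input = np.zeros(max_len, dtype='int64')
    decoder_target = np.zeros(max_len, dtype='int64')
    # The input is the sequence without its last token; the target is the
    # same sequence shifted one step ahead.
    decoder_input[:len(ids) - 1] = ids[:-1]
    decoder_target[:len(ids) - 1] = ids[1:]
    return decoder_input, decoder_target

# Toy vocabulary: 0 is padding, then the two markers and a few letters.
toy_vocab = {c: i + 1 for i, c in enumerate('\t\nabc')}
dec_in, dec_out = make_decoder_pair('abc', toy_vocab, max_len=6)
print(dec_in)
print(dec_out)
```

At inference time the ground truth is not available, so the decoder is instead fed its own previous prediction, which is what `decode_sequence` does further down.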


# Imports

In [1]:
from __future__ import print_function
import tensorflow as tf
import keras.backend as K
from keras.backend.tensorflow_backend import set_session
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate, GRU
from keras import optimizers
from keras.callbacks import ModelCheckpoint, TensorBoard, LearningRateScheduler
from keras.models import load_model
import numpy as np
import os
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from autocorrect import spell
import re
%matplotlib inline

Using TensorFlow backend.


# Utility functions

In [2]:
# Limit gpu allocation. allow_growth, or gpu_fraction
def gpu_alloc():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))

In [3]:
gpu_alloc()

In [4]:
def calculate_WER_sent(gt, pred):
    '''
    Word-level edit distance between a ground-truth and a predicted sentence, e.g.
    calculate_WER_sent('calculating wer between two sentences', 'calculate wer between two sentences')
    '''
    gt_words = gt.lower().split(' ')
    pred_words = pred.lower().split(' ')
    # Use a wide integer type: uint8 would overflow for distances above 255
    d = np.zeros(((len(gt_words) + 1), (len(pred_words) + 1)), dtype=np.int64)

    # Initializing error matrix
    for i in range(len(gt_words) + 1):
        for j in range(len(pred_words) + 1):
            if i == 0:
                d[0][j] = j
            elif j == 0:
                d[i][0] = i

    # computation
    for i in range(1, len(gt_words) + 1):
        for j in range(1, len(pred_words) + 1):
            if gt_words[i - 1] == pred_words[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                substitution = d[i - 1][j - 1] + 1
                insertion = d[i][j - 1] + 1
                deletion = d[i - 1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)
    return d[len(gt_words)][len(pred_words)]

In [5]:
def calculate_WER(gt, pred):
    '''
    :param gt: list of sentences of the ground truth
    :param pred: list of sentences of the predictions
    both lists must have the same length
    :return: accumulated WER (total word errors / total ground-truth words)
    '''
    # assert len(gt) == len(pred)
    WER = 0
    nb_w = 0
    for i in range(len(gt)):
        WER += calculate_WER_sent(gt[i], pred[i])
        # Normalize by the number of words, not the number of characters
        nb_w += len(gt[i].split(' '))

    return WER / nb_w
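
The two WER cells above can be checked on a small case. The sketch below is a compact, self-contained variant rather than the notebook's own functions: it keeps only two rows of the distance table, splits on any whitespace, and assumes the usual WER definition (word errors divided by the number of ground-truth words). The names `word_edit_distance` and `corpus_wer` are hypothetical; the sentences are taken from the sample data printed later in this workbook.

```python
def word_edit_distance(gt, pred):
    """Levenshtein distance between two sentences, counted in words."""
    gt_words = gt.lower().split()
    pred_words = pred.lower().split()
    # prev[j] is the distance between the ground-truth words seen so far
    # (before the current one) and the first j predicted words.
    prev = list(range(len(pred_words) + 1))
    for i, gt_word in enumerate(gt_words, start=1):
        curr = [i]
        for j, pred_word in enumerate(pred_words, start=1):
            cost = 0 if gt_word == pred_word else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            curr[j - 1] + 1,     # insertion
                            prev[j] + 1))        # deletion
        prev = curr
    return prev[-1]

def corpus_wer(gt_sentences, pred_sentences):
    """Total word errors divided by total ground-truth words."""
    errors = sum(word_edit_distance(g, p)
                 for g, p in zip(gt_sentences, pred_sentences))
    n_words = sum(len(g.split()) for g in gt_sentences)
    return errors / n_words

gt = ['Medical Provider Roles: Treating', 'Postal Code: 53188']
ocr = ['Me dieal Provider Roles: Treating', 'Postal Code: 5 31 88']
# 2 + 3 word errors over 4 + 3 ground-truth words
print(corpus_wer(gt, ocr))
```

A digit string that the OCR splits with spaces costs one substitution plus one insertion per extra fragment, which is why spacing errors weigh heavily on WER.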

In [6]:
def load_data_with_gt(file_name, num_samples, max_sent_len, min_sent_len, delimiter='\t', gt_index=1, prediction_index=0):
    '''Load data from a txt file where each line is <TXT><delimiter><GT> (tab by default). The target for the decoder must have \t as the start trigger and \n as the stop trigger.'''
    cnt = 0  
    input_texts = []
    gt_texts = []
    target_texts = []
    for row in open(file_name, encoding='utf8'):
        if cnt < num_samples :
            #print(row)
            sents = row.split(delimiter)
            input_text = sents[prediction_index]
            
            target_text = '\t' + sents[gt_index] + '\n'
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len and len(target_text) > min_sent_len and len(target_text) < max_sent_len:
                cnt += 1
                
                input_texts.append(input_text)
                target_texts.append(target_text)
                gt_texts.append(sents[gt_index])
    return input_texts, target_texts, gt_texts

In [7]:
def load_data(file_name, num_samples, max_sent_len, min_sent_len):
    '''Load data from a txt file with one input sentence per line (no ground truth).'''
    cnt = 0  
    input_texts = []   
    
    #for row in open(file_name, encoding='utf8'):
    for row in open(file_name):
        if cnt < num_samples :            
            input_text = row           
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len:
                cnt += 1                
                input_texts.append(input_text)
    return input_texts

In [8]:
def vectorize_data(input_texts, max_encoder_seq_length, num_encoder_tokens, vocab_to_int):
    
    '''Prepares the input texts into the proper seq2seq numpy array of character indices'''
    # Truncation to max_encoder_seq_length is applied per text in the loop below,
    # so the list of texts itself is left untouched.
    encoder_input_data = np.zeros(
        (len(input_texts), max_encoder_seq_length),
        dtype='float32')
    
    for i, input_text in enumerate(input_texts):
        for t, char in enumerate(input_text[:max_encoder_seq_length]):
            # c0..cn
            encoder_input_data[i, t] = vocab_to_int[char]
                
    return encoder_input_data
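
To make the layout of the encoder input concrete, here is a minimal standalone sketch of the same character-to-index mapping on a toy vocabulary. The vocabulary below is hypothetical, and the example assumes that index 0 is not assigned to any character, so that the zeros left after a short text act as padding; whether the saved vocabulary files follow that convention is not shown in this workbook.

```python
import numpy as np

def vectorize(texts, max_len, vocab_to_int):
    """Map each text to a fixed-length row of character indices."""
    data = np.zeros((len(texts), max_len), dtype='float32')
    for i, text in enumerate(texts):
        for t, char in enumerate(text[:max_len]):  # truncate long inputs
            data[i, t] = vocab_to_int[char]
    return data

# Hypothetical toy vocabulary; index 0 is left free so it can act as padding.
toy_vocab = {'a': 1, 'b': 2, ' ': 3}
batch = vectorize(['ab', 'a b a b'], max_len=5, vocab_to_int=toy_vocab)
print(batch)
```

The first row is padded with zeros after two characters, while the second is cut at five characters.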

In [9]:
def decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, max_decoder_seq_length, vocab_to_int, int_to_vocab):
    
    #print(max_decoder_seq_length)
    # Encode the input as state vectors.
    encoder_outputs, h, c  = encoder_model.predict(input_seq)
    states_value = [h,c]
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = vocab_to_int['\t']

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    #print(input_seq)
    attention_density = []
    i = 0
    special_chars = ['\\', '/', '-', '—' , ':', '[', ']', ',', '.', '"', ';', '%', '~', '(', ')', '{', '}', '$', '#']
    #special_chars = []
    while not stop_condition:
        #print(target_seq)
        output_tokens, attention, h, c  = decoder_model.predict(
            [target_seq, encoder_outputs] + states_value)
        #print(attention.shape)
        attention_density.append(attention[0][0])# attention is max_sent_len x 1 since we have num_time_steps = 1 for the output
        # Sample a token
        #print(output_tokens.shape)
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        
        #print(sampled_token_index)
        sampled_char = int_to_vocab[sampled_token_index]
        
        # Character at position i of the (single) input sequence
        orig_char = int_to_vocab[int(input_seq[0, i])]
        
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
            #print('End', sampled_char, 'Len ', len(decoded_sentence), 'Max len ', max_decoder_seq_length)
            sampled_char = ''
        
        # Copy digits and special characters as is, since the spelling corrector is not good at correcting them
        
        if(orig_char.isdigit() or orig_char in special_chars):
            decoded_sentence += orig_char            
        else:
            if(sampled_char.isdigit() or sampled_char in special_chars):
                decoded_sentence += ''
            else:
                decoded_sentence += sampled_char
        
        #decoded_sentence += sampled_char


        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]
        
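        i += 1
        # Wrap around at the end of the input; use the actual input length
        # instead of a constant that only fits the 50-char model
        if(i >= input_seq.shape[1]):
            i = 0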
    attention_density = np.array(attention_density)
    
    # Word level spell correct
    '''
    corrected_decoded_sentence = ''
    for w in decoded_sentence.split(' '):
        corrected_decoded_sentence += spell(w) + ' '
    decoded_sentence = corrected_decoded_sentence
    '''
    return decoded_sentence, attention_density


In [60]:
def word_spell_correct(decoded_sentence):
    if(decoded_sentence == ''):
        return ''
    corrected_decoded_sentence = ''
    special_chars = ['\\', '/', '-', '—' , ':', '[', ']', ',', '.', '"', ';', '%', '~', '(', ')', '{', '}', '$', '&', '#', '☒', '■', '☐', '□', '☑', '@']
    for w in decoded_sentence.split(' '):
        #print(w)
        if((len(re.findall(r'\d+', w))==0) and not (w in special_chars)):
            corrected_decoded_sentence += spell(w) + ' '
        else:
            corrected_decoded_sentence += w + ' '
    return corrected_decoded_sentence
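
The filtering rule in `word_spell_correct` (leave tokens that contain digits or are pure symbols, correct everything else) can be tried without the `autocorrect` dependency. In the sketch below the spell checker is replaced by a hypothetical lookup table, `toy_spell`, and the symbol list is abbreviated. Unlike the cell above, the result is joined without a trailing space.

```python
import re

SPECIAL_TOKENS = {'-', ':', '[', ']', '/', '&', '#'}  # abbreviated list

def correct_words(sentence, spell_fn):
    """Apply spell_fn to plain words; pass numbers and symbols through."""
    out = []
    for word in sentence.split(' '):
        if re.search(r'\d', word) or word in SPECIAL_TOKENS:
            out.append(word)  # keep dates, codes and phone numbers as is
        else:
            out.append(spell_fn(word))
    return ' '.join(out)

# Hypothetical stand-in for a real spell checker: a fixed lookup table.
TOY_FIXES = {'Benelits': 'Benefits', 'Eagtarn': 'Eastern'}

def toy_spell(word):
    return TOY_FIXES.get(word, word)

print(correct_words('Benelits Canter 1-800-635-5587', toy_spell))
```

Passing the checker in as a function keeps the filtering logic testable on its own.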

In [11]:
def clean_up_sentence(sentence, vocab):
    s = ''
    prev_char = ''
    for c in sentence.strip():
        if c not in vocab or (c == ' ' and prev_char == ' '):
            s += ''
        else:
            s += c
        prev_char = c
            
    return s

# Load data

# Load model params

In [12]:
data_path = '../../dat/'

In [13]:
max_sent_lengths = [50, 100]

In [14]:
vocab_file = {}
model_file = {}
encoder_model_file = {}
decoder_model_file = {}
model = {}
encoder_model = {}
decoder_model = {}
vocab = {}
vocab_to_int = {}
int_to_vocab = {}
max_sent_len = {}
min_sent_len = {}
num_decoder_tokens = {}
num_encoder_tokens = {}
max_encoder_seq_length = {}
max_decoder_seq_length = {}

In [15]:

for i in max_sent_lengths:
    vocab_file[i] = 'vocab-{}.npz'.format(i)
    model_file[i] = 'best_model-{}.hdf5'.format(i)
    encoder_model_file[i] = 'encoder_model-{}.hdf5'.format(i)
    decoder_model_file[i] = 'decoder_model-{}.hdf5'.format(i)
    
    vocab = np.load(file=vocab_file[i])
    vocab_to_int[i] = vocab['vocab_to_int'].item()
    int_to_vocab[i] = vocab['int_to_vocab'].item()
    max_sent_len[i] = vocab['max_sent_len']
    min_sent_len[i] = vocab['min_sent_len']
    input_characters = sorted(list(vocab_to_int[i]))
    num_decoder_tokens[i] = num_encoder_tokens[i] = len(input_characters) #int(encoder_model.layers[0].input.shape[2])
    max_encoder_seq_length[i] = max_decoder_seq_length[i] = max_sent_len[i] - 1#max([len(txt) for txt in input_texts])
    
    model[i] = load_model(model_file[i])
    encoder_model[i] = load_model(encoder_model_file[i])
    decoder_model[i] = load_model(decoder_model_file[i])



In [16]:
num_samples = 1000000
#tess_correction_data = os.path.join(data_path, 'test_data.txt')
#input_texts = load_data(tess_correction_data, num_samples, max_sent_len, min_sent_len)

OCR_data = os.path.join(data_path, 'new_trained_data.txt')
#input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len, min_sent_len, delimiter='|',gt_index=0, prediction_index=1)
input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len=10000, min_sent_len=0)

In [17]:
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])

1951
Me dieal Provider Roles: Treating  
 	Medical Provider Roles: Treating


Provider First Name: Christine  
 	Provider First Name: Christine


Provider Last Name: Nolen, MD  
 	Provider Last Name: Nolen, MD


Address Line 1 : 7 25 American Avenue  
 	Address Line 1 : 725 American Avenue


City. W’aukesha  
 	City: Waukesha


StatefProvinee: ‘WI  
 	State/Province: WI


Postal Code: 5 31 88  
 	Postal Code: 53188


Country". US  
 	Country:  US


Business Telephone: (2 62) 92 8- 1000  
 	Business Telephone: (262) 928- 1000


Date ot‘Pirst Visit: 1 2/01f20 17  
 	Date of First Visit: 12/01/2017




In [18]:
# Spell correct before inference
'''
input_texts_ = []
for sent in input_texts:
    sent_ = ''
    for word in sent.split(' '):
        sent_ += spell(word) + ' '
    input_texts_.append(sent_)
input_texts = input_texts_
input_texts_ = []
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])
'''

In [19]:
decoded_sentences = []
corrected_sentences = []

#for seq_index in range(len(input_texts)):
results = open('RESULTS.md', 'w')
results.write('|OCR sentence|GT sentence|Char decoded sentence|Word decoded sentence|Sentence length (chars)|\n')
results.write('|---------------|-----------|----------------|----------------|----------------|\n')
     

for i, input_text in enumerate(input_texts):
    #print(input_text)
    # Find the input length range to choose the proper model to use
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])
    
    

    target_text = gt_texts[i]
    
    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)
    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    print('GT sentence:', target_text.strip())
    print('Char Decoded sentence:', decoded_sentence)   
    print('Word Decoded sentence:', corrected_sentence) 
    results.write(' | ' + input_text + ' | ' + target_text.strip() + ' | ' + decoded_sentence + ' | ' + corrected_sentence + ' | ' + str(len_range) + ' | \n')
    decoded_sentences.append(decoded_sentence)
    corrected_sentences.append(corrected_sentence)
results.close()    

    

-Lenght =  50
Input sentence: Me dieal Provider Roles: Treating
GT sentence: Medical Provider Roles: Treating
Char Decoded sentence: Medical Provider Roles:Treating
Word Decoded sentence: Me deal Provider Roles Treating 
-Lenght =  50
Input sentence: Provider First Name: Christine
GT sentence: Provider First Name: Christine
Char Decoded sentence: Provider First Name: Christine
Word Decoded sentence: Provider First Name Christine 
-Lenght =  50
Input sentence: Provider Last Name: Nolen, MD
GT sentence: Provider Last Name: Nolen, MD
Char Decoded sentence: Provider Last Name: Norle, MD
Word Decoded sentence: Provider Last Name Dolens MD 
-Lenght =  50
Input sentence: Address Line 1 : 7 25 American Avenue
GT sentence: Address Line 1 : 725 American Avenue
Char Decoded sentence: Address Line 1:7A25ent Admentine Ave
Word Decoded sentence: Address Line 1 : 7 25 American Avenue 
-Lenght =  50
Input sentence: City. W’aukesha
GT sentence: City: Waukesha
Char Decoded sentence: City.States
Word Dec

KeyboardInterrupt: 
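
The model-selection step used in the inference loop (pick the smallest length bucket that fits the input, otherwise fall back to the largest) is repeated in every inference cell of this workbook. As a sketch, it could be factored into a small helper; the name `pick_len_range` is hypothetical.

```python
def pick_len_range(text, max_sent_lengths):
    """Return the smallest length bucket that fits text, else the largest."""
    for length in sorted(max_sent_lengths):
        if len(text) < length:
            return length
    return max(max_sent_lengths)

buckets = [50, 100]
print(pick_len_range('City: Waukesha', buckets))  # short input
print(pick_len_range('x' * 50, buckets))          # exactly 50 chars
print(pick_len_range('x' * 300, buckets))         # longer than every bucket
```

Because the comparison is strict, an input of exactly 50 characters goes to the 100-character model, and inputs longer than every bucket still go to the largest model, where they are truncated during vectorization.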

In [25]:
input_texts = ['SUBJECTIVE: This is a S-year-old +@W his left great toe with the handleh lacration.',
               'Thera was no handlebarthe lacration.',
               'Patiet last tet is needing this for school at this',
               'OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time.',
               'Left great toe the dorsl surface, extending ta th active hemorrhage at this time.',
               'Th anaathetized with a cotton ball sat Left this in place for 20 minutes.',
               'with Betadine again and injected th he tolerated very well.',
               'The wound sutures. Antibiotic eintment and g',
               'Patient tolerated very well. Pat',
               'IMPRESSION: Lacration te left grs',
               'PLAN: Patent is to do dressing ch advised as far as checking tha waurn it with soap and water.',
               'Sutures oy have any problems prior te that tim ona teaspoon three times a day rer Ibuprofan far pain, discomfort. Cg',
               'hite male who accidently dropped a bike onto ar end hitting the left great toe,',
               'causing a guard to the end of the bike, which caused anus shot is more that three years ago and time.',
               'nlse ef 105 and regular, resprations 286,% on room air.',
               'Patient rates hia pain at — there i15 noted a 3-om laceration across a lateral aspect of tha toe.',
               'There is no e toa ir cleansed with Betadine.',
               'It is then urated with 5 cu of 2% Hylocaine plain.',
               'We then cleansed ae toa with 3 cc of 2% Xylacaina plain',
               'which was then clesed with five 5-0 Prolene ressure dressing was then applied to the tos',
               'paient is given DPT 0.5 ee intramucular (IM).at toe.',
               'Kefylex 250 mg per 5 ml, the next seven days.',
               'He may use Tylenol or 11 if any problems.',
               'Unum Life Insurance Company of America 2211',               
               'Congress Street Portland, Maine 04122',
               'APPLICATION FOR GROUP CRITICAL LLNESS INSURANCE',
               'I Evidence of Insurability',
               '',
               'Application Type: @ New Enrollee Change to',
               'Existing Coverage  Reinstatement  Internal',
               'Replacement  Late Applicant  Rehire SECTION 1:',
               'Employee(Applicant) Information  Always',
               'Complete Employee Name(First, Middle, Last)',
               'Social Security Number Nikolas J Jones',
               '123 - 456 - 7890 Home Address(Street/ PO Box)',
               'Gender 1634 Stewert St  F  M City Date of Birth',
               '(mm / dd / yyyy) Seattle 06 / 15 / 1991 State Zip',
               'Code Home Phone # Washington 98101 854-555-1212',
               'Are you Actively at Work? Employee ID / Payroll #',
               ' Yes  No55624 a.Are you a U.S.Citizen or',
               'Canadian Citizen working in the U.S.? b.Are you',
               'legally authorized to work in  Yes  No(If No',
               'reply to part b) the U.S.?  Yes  No Employer',
               'Name Group Number Date of Hire(mm/ dd / yyyy)',
               'Facebook 11 - 555566 11 / 30 / 2016 Occupation',
               'Eligibility Class Software Engineer 7 Scheduled',
               'Number of Work Hours per Week Work Phone # 35',
               '854-555-6622 SECTION 2: Spouse Information ',
               'Complete Only if applying for Spouse coverage Name',
               '(First, Middle, Last) Social Security Number',
               'Gender Date of Birth(mm / dd / yyyy) Does the',
               '1019 - 07 - AZ 1',
              'if claint is for a child, please state your relationship 10 the child',
              'date of accident 3d _ time of accident ram. 0 p.m.',
              'have you slopped working? (of yes [1 no if yes, what was the last day that you worked? (mm/ddryy)_| —3 | —{% cnslamegs bil =']
               
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    pre_corrected_sentence = word_spell_correct(input_text)
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    # No ground truth is available for these hand-typed samples

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    print('Word Decoded sentence:', corrected_sentence) 
    print('\n')



Input sentence: SUBJECTIVE: This is a S-year-old +@W his left great toe with the handleh lacration.
Word Decoded sentence: SUBJECTIVES This is a S-year-old New his left great toe with the handled laceration 


Input sentence: Thera was no handlebarthe lacration.
Word Decoded sentence: Thera was no handlebarthe laceration 


Input sentence: Patiet last tet is needing this for school at this
Word Decoded sentence: Patient last tet is needing this for school at this 


Input sentence: OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time.
Word Decoded sentence: OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time 


Input sentence: Left great toe the dorsl surface, extending ta th active hemorrhage at this time.
Word Decoded sentence: Left great toe the dorsal surface extending ta th active hemorrhage at this time 


Input sentence: Th anaathetized with a cotton ball sat Left this in place for 20 minutes.
Word Decode

In [32]:

input_texts = ['text',
'',
'',
'',
'',
' ',
'',
'',
'',
'Fai',
'10',
'7521509',
'(FISTDEOO)',
'at',
'11/3/2017',
'5:23:19',
'from',
'-9373834004',
'Req',
'IC',
'2017:1030525109:292E.',
'Page',
'4',
'of',
'5',
'(C)',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'11/3/2017',
'FRI',
'8:26',
'FAX',
'2373834004',
'Kjooas00s',
'',
'',
'',
'as3-ursasy3',
'11:30:11',
'11/2/2017',
'vis',
'',
'',
'',
'®',
'®',
'&',
'ACCIDENT',
'CLAIM',
'FORM',
'',
'uu',
'num’',
'Tha',
'Benelits',
'Canter',
'',
'P.O.',
'Bax',
'100158,',
'Calumbin,',
'EC',
'20202-3150',
'',
'Tol-frea:',
'1-800-635-5587',
'Fax:',
'1-800-447-2488',
'',
'Gall',
'toll-free',
'Monday',
'through',
'Friday,',
'8',
'a.m.',
'lo',
'8',
'p.m,',
'Eagtarn',
'Time.',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'[',
'ATTENDING',
'PHYSICIAN',
'STATEMENT',
']',
'',
'',
'IneurexiPolicyt',
'alcar',
'Hama',
'(Lael',
'Name,',
'Flis!',
'Nama,',
'MI,',
'Suffix)',
'Data',
'of',
'Risth',
'{msmidrfyy)',
'-',
'',
'',
'Faupi',
'Nana',
'{Laut',
'Hume,',
'Flial',
'Numa,',
'1',
'Sut)',
'Dats',
'al',
'Bln',
'rAvad)',
'Ul',
'_',
'',
'-[ECIpENT',
'DETAILS',
']',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'a',
'thls',
'Gundilan',
'the',
'result',
'of',
'a',
'acddental',
'inury?',
'ves',
'O',
'No',
'if',
'yas,',
'dale',
'of',
'accident',
'qre/ddlyy)',
'[1',
'0]',
'[z]e',
'[=]',
'',
'',
'',
'Is',
'Mig',
'condition',
'Lhe',
'result',
'of',
'hefer',
'employment',
'£1',
'Yes',
'pNo',
'[1',
'Unknown',
'',
'',
'',
'Plaaze',
'verily',
'treatment',
'for',
'the',
'accident',
'lalad',
'above.',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'Dalaw',
'of',
'Diagnosis',
'Diagncsis',
'Description',
'Prosadure',
'Procedure',
'Dascription',
'',
'Branden',
'(Including',
'|',
'Cudo',
'(GD)',
'ous',
'',
'Confinement)',
'eR',
'ap',
'HAS',
'TTT',
'',
'BEEF',
'eR',
'',
'wiz]',
'.',
'S33,5XxA',
'Hh',
'rioes',
'ey',
'race',
'Word',
'',
'awqd]',
'',
'weak',
'3',
'n',
'[aveny',
'[d',
'',
'wifi',
'Wl',
'',
'oa',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'Has',
'lhe',
'pallet',
'bean',
'trastad',
'for',
'tha',
'same',
'ar',
'&',
'S(tilar',
'candillan',
'by',
'anolher',
'phyalelan',
'In',
'tha',
'past?',
'[1',
'Yen',
'Bho',
'',
'M',
'yor,',
'pioona',
'provid',
'tha',
'fares:',
'',
'',
'',
' ',
'',
'',
'',
'Diageosis:',
'Tramiment',
'Daten:',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'id',
'ya.1',
'#dving',
'Lhe',
'patient',
'to',
'clap',
'working?',
'RECEIVED',
'',
'It',
'yes,',
'B8',
'of',
'what',
'cate?',
'(mmidkyy)',
'',
'',
'',
'[23]',
'[117]',
'',
'',
'',
'[Ih',
'cielih',
'fa',
'rotated',
'to',
'normal',
'prepnency,',
'please',
'grovida',
'tha',
'idliawing:',
'NOV',
'',
'Expecigd',
'Delivery',
'Dale',
'(mimicd/yy)',
'Aclual',
'Delivery',
'Dale',
'{mmiddlyy',
'',
'',
'',
' ',
'',
'',
'',
'Phyeiclan',
'informaiton',
'HUMAN',
'REGOURCITE',
'',
'',
'',
'FRAUD',
'NOTICE:',
'Any',
'person',
'wha',
'knowingly',
'files',
'&',
'statement',
'of',
'clalm',
'containing',
'FALSE',
'or',
'misleading',
'information',
'8',
'',
'subject',
'to',
'criminal',
'and',
'elvil',
'penallies.',
'This',
'includes',
'Attending',
'Physician',
'portions',
'of',
'the',
'claim',
'farm.',
'',
'',
'',
'CS',
'yma',
'SEAS',
'Ta',
'hve',
'glan',
'=',
'',
'The',
'above',
'statements',
'ara',
'trun',
'And',
'rompints',
'to',
'tho',
'bot',
'of',
'my',
'knowledge',
'and',
'bolluf.',
'',
'',
'',
'Physician',
'Name',
'(Lea!',
'Name,',
'Firat',
'Name,',
'MI,',
'Suita)',
'Plases',
'Print',
'Co',
'FHman',
'log',
'Mm',
'',
'/',
'‘',
'',
'',
'',
'Medical',
'Speclaty',
'[Tr',
'eactal-',
']',
'|',
'D',
'of',
'r',
'of',
'Ch',
'2',
'',
'2',
'Le',
'',
'',
'==',
'Zoi!',
'M',
'o',
'“Fanart',
'',
'',
'=',
'Balfrone',
'ie',
'2',
'Sle',
'iu',
'',
'il',
'HY',
'BY',
'1942',
'Fax',
'Number',
'yz—',
'43',
'-8',
'7775',
'Fhyalafans',
'Tax',
'ID',
'Number.',
'',
'',
'',
'Aro',
'you',
'refateq',
'to',
'hiv',
'pollen?',
'0',
'Yoe',
'LlMo',
'|',
'yes,',
'wal',
'iv',
'the',
'relelianshipT',
'',
'',
'',
' ',
'',
'',
'',
' ',
' ',
'',
'',
'',
'Physlclan',
'Slgnature',
'Date',
'',
'CL-1023',
'-2717',
'=',
'',
'',
'',
' ',
'',
'',
'',
'—',]
               
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    pre_corrected_sentence = word_spell_correct(input_text)
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    # No ground truth is available for these hand-typed samples

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    #print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    
    #print('Word Decoded sentence:', corrected_sentence) 
    print(corrected_sentence) 
    #print('\n')



text 








Fai 
10 
7521509 
(FISTDEOO) 
at 
11/3/2017 
5:23:19 
from 
-9373834004 
Req 
IC 
2017:1030525109:292E. 
Page 
4 
of 
5 
Act 











11/3/2017 
FRI 
8:26 
FAX 
2373834004 
Kjooas00s 



as3-ursasy3 
11:30:11 
11/2/2017 
vis 



a 
a 
a 
ACCIDENT 
CLAIM 
FORM 

UU 
numb 
Tha 
Benefits 
Canter 

Poor 
Bax 
100158, 
Calumbin, 
EC 
20202-3150 

Tol-frea: 
1-800-635-5587 
Fax 
1-800-447-2488 

Gall 
toll-free 
Monday 
through 
Friday 
8 
am 
lo 
8 
pm 
Eastern 
Time 







































[ 
ATTENDING 
PHYSICIAN 
STATEMENT 
] 


IneurexiPolicyt 
altar 
Hama 
Lael 
Name 
Flisk 
Naman 
MID 
Suffix 
Data 
of 
Isth 
{msmidrfyy) 
- 


Fault 
Nana 
Claut 
Humet 
Filial 
Numac 
1 
Suth 
Days 
al 
Ban 
raved 
Ul 
a 

-[ECIpENT 
DETAILS 
] 



















a 
this 
Gundilan 
the 
result 
of 
a 
accidental 
inury 
yes 
O 
No 
if 
yas 
dale 
of 
accident 
qre/ddlyy) 
[1 
0] 
[z]e 
[=] 



Is 
Mig 
condition 
Lhe 
result 
of 
refer 
employment 
£1 
Yes 
no 
[1 
Unk

In [51]:
input_texts = ['☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement']
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    print(input_text)
    pre_corrected_sentence = word_spell_correct(input_text)


    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    # No ground truth is available for this hand-typed sample

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    
    print('Word Decoded sentence:', corrected_sentence) 
    #print(corrected_sentence) 
    #print('\n')



☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement
☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement 
☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement 


# Handwriting correction

In [54]:
num_samples = 1000000

OCR_data = os.path.join(data_path, 'handwritten_output.txt')
input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len=10000, min_sent_len=0, delimiter='|', gt_index=0, prediction_index=1)

# Sample data
print(len(input_texts))
for i in range(100):
    print(input_texts[i], '\n', target_texts[i])

3749
 is insisting on a policy of change . 
 
 	is insisting on a policy of change . 

 12/29/17 
 
 	12/29/17 

 SIAL)TH 
 
 	SLP(L) THA 

 Arcadia CA 91007 
 
 	Arcadia CA 91007 

 (012) 667 9375 
 
 	(012) 6674375 

 In this 200-fathom trench the herring do not tood the botton . 
 
 	In this 200-fathom trench the herring do not touch the bottom . 

 43638556X1 
 
 	43638556X1 

 Pretoria 
 
 	Pretoria 

 fiddaling about with bils of cost . 
 
 	fiddling about with bills of cost . 

 ( Fig. 3) . Loop threed tound liite finger t 
 
 	( Fig. 3 ) . Loop thread round little finger , 

 200681383 
 
 	200681383 

 for a working week of 34 to 36 houns . 
 
 	for a working week of 34 to 36 hours . 

 Daugher 
 
 	Daugther 

 Electronically Signed 
 
 	Electronically Signed 

 15122 
 
 	15122 

 50 
 
 	50 

 ShE WAS MOVVD A Picvic TablE TO SWEAP leAveS And DROREd iT oN hER BSC 
 
 	ShE wAs Moving A Picnic TAble To swEEp lEAvEs And DRoPEd iT on hER ToE 

 0724603309 
 
 	0724603509 

 lwas 
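
The pairs listed above differ by only a few characters in most cases. One way to put a number on that gap is a character error rate (CER): the total edit distance between OCR output and ground truth, divided by the total number of ground-truth characters. The helpers below are a minimal sketch and are not part of the original notebook; `edit_distance` and `char_error_rate` are hypothetical names, and stripping surrounding whitespace before comparing is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_error_rate(pairs):
    """CER over (hypothesis, reference) pairs, ignoring surrounding whitespace."""
    total_edits = 0
    total_chars = 0
    for hyp, ref in pairs:
        hyp, ref = hyp.strip(), ref.strip()
        total_edits += edit_distance(hyp, ref)
        total_chars += len(ref)
    # float() keeps true division under Python 2 as well
    return float(total_edits) / total_chars if total_chars else 0.0


# Two pairs copied from the sample listing above
sample_pairs = [
    ('for a working week of 34 to 36 houns .', 'for a working week of 34 to 36 hours .'),
    ('fiddaling about with bils of cost .', 'fiddling about with bills of cost .'),
]
print('Sample CER:', char_error_rate(sample_pairs))
```

Running the same function on `zip(input_texts, target_texts)` and again on the corrected sentences would show whether a correction step actually lowers the error rate.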

In [62]:
# Write the handwriting results as a Markdown table; the context manager closes the file even on error
with open('RESULTS_HW.md', 'w') as results:
    results.write('|HW sentence|Corrected sentence|GT sentence|\n')
    results.write('|---------------|-----------|----------------|\n')

    for input_text, target_text in zip(input_texts, target_texts):
        # Pick the first length bucket that fits the input; fall back to the longest one
        len_range = max_sent_lengths[-1]
        for length in max_sent_lengths:
            if len(input_text) < length:
                len_range = length
                break

        # Word-level spell correction of the raw input (only used by the commented-out print below)
        pre_corrected_sentence = word_spell_correct(input_text)

        # Drop characters outside the bucket's vocabulary, then vectorize for the encoder
        input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
        encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])
        input_seq = encoder_input_data

        # Char-level seq2seq decoding with the models of the selected bucket
        decoded_sentence, _ = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range], max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
        # NOTE: the word-level spell checker runs on the cleaned input;
        # decoded_sentence (the char-level output) is computed but not reported here
        corrected_sentence = word_spell_correct(input_text)

        print('Input sentence:', input_text)
        #print('Spell Decoded sentence:', pre_corrected_sentence)
        #print('Char Decoded sentence:', decoded_sentence)
        print('Word Decoded sentence:', corrected_sentence)
        print('Ground truth sentence:', target_text)
        results.write(' | ' + input_text + ' | ' + corrected_sentence + ' | ' + target_text.strip() + ' | \n')
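
The bucket selection at the top of the loop above can also be written as a small standalone helper, which makes the rule easier to test. This is a sketch, not code from the notebook: `pick_len_range` is a hypothetical name, the bucket values are made up for illustration, and it assumes `max_sent_lengths` is sorted in ascending order (which the "take the longest range" fallback on `max_sent_lengths[-1]` suggests).

```python
from bisect import bisect_right


def pick_len_range(text, max_sent_lengths):
    """Return the smallest bucket strictly longer than the text, else the longest bucket.

    Assumes max_sent_lengths is sorted in ascending order.
    """
    # Index of the first bucket whose length is > len(text)
    idx = bisect_right(max_sent_lengths, len(text))
    if idx == len(max_sent_lengths):
        # No bucket is long enough: fall back to the longest one
        return max_sent_lengths[-1]
    return max_sent_lengths[idx]


# Hypothetical bucket boundaries, for illustration only
buckets = [50, 100, 150]
short_bucket = pick_len_range('Pain in left foat', buckets)
long_bucket = pick_len_range('x' * 500, buckets)
print(short_bucket, long_bucket)
```

Like the loop in the cell, the comparison is strict, so a sentence whose length equals a bucket boundary goes to the next bucket up.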


Input sentence: is insisting on a policy of change .
Word Decoded sentence: is insisting on a policy of change . 
Ground truth sentence: 	is insisting on a policy of change . 

Input sentence: 12/29/17
Word Decoded sentence: 12/29/17 
Ground truth sentence: 	12/29/17 

Input sentence: SIAL)TH
Word Decoded sentence: SIAL)TH 
Ground truth sentence: 	SLP(L) THA 

Input sentence: Arcadia CA 91007
Word Decoded sentence: Arcadia CA 91007 
Ground truth sentence: 	Arcadia CA 91007 

Input sentence: (012) 667 9375
Word Decoded sentence: (012) 667 9375 
Ground truth sentence: 	(012) 6674375 

Input sentence: In this 200-fathom trench the herring do not tood the botton .
Word Decoded sentence: In this 200-fathom trench the herring do not good the cotton . 
Ground truth sentence: 	In this 200-fathom trench the herring do not touch the bottom . 

Input sentence: 43638556X1
Word Decoded sentence: 43638556X1 
Ground truth sentence: 	43638556X1 

Input sentence: Pretoria
Word Decoded sentence: Pretori

Input sentence: Gabarel Adam Viles
Word Decoded sentence: Gabarel Adam Miles 
Ground truth sentence: 	Gabriel Adam VIles 

Input sentence: whiplash- S13.9X85
Word Decoded sentence: whiplash S13.9X85 
Ground truth sentence: 	whiplash - S13.4XXD 

Input sentence: 9:15
Word Decoded sentence: 9:15 
Ground truth sentence: 	9:15 

Input sentence: 3/2/2018
Word Decoded sentence: 3/2/2018 
Ground truth sentence: 	3/2/2018 

Input sentence: 0825648359
Word Decoded sentence: 0825648359 
Ground truth sentence: 	0825648359 

Input sentence: mother
Word Decoded sentence: mother 
Ground truth sentence: 	mother 

Input sentence: Robertson
Word Decoded sentence: Robertson 
Ground truth sentence: 	Robertson 

Input sentence: 045149682X2
Word Decoded sentence: 045149682X2 
Ground truth sentence: 	045149682X2 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: 0834540900
Word Decoded sentence: 0834540900 
Gr

Input sentence: Americans say Giermany is having it too
Word Decoded sentence: Americans say Germany is having it too 
Ground truth sentence: 	Americans say Germany is having it too 

Input sentence: being limited or an adjutment being made
Word Decoded sentence: being limited or an adjustment being made 
Ground truth sentence: 	being limited or an adjustment being made 

Input sentence: 2/25/18
Word Decoded sentence: 2/25/18 
Ground truth sentence: 	2/25/18 

Input sentence: Mis Nelaney of the script , and the great advantages
Word Decoded sentence: Mis Delaney of the script , and the great advantages 
Ground truth sentence: 	Miss Delaney of the script , and the great advantages 

Input sentence: 1906-09-03
Word Decoded sentence: 1906-09-03 
Ground truth sentence: 	1986-09-05 

Input sentence: 30 7 18
Word Decoded sentence: 30 7 18 
Ground truth sentence: 	30 7 18 

Input sentence: a man to mate sinple , stright- fervard thngs , and in
Word Decoded sentence: a man to mate single , str

Input sentence: reported on Mr. Weaver . He beiered
Word Decoded sentence: reported on Mrp Weaver . He briered 
Ground truth sentence: 	reported on Mr. Weaver . He believed 

Input sentence: 3/8/18
Word Decoded sentence: 3/8/18 
Ground truth sentence: 	3/8/18 

Input sentence: Fefasing to sit round the conference table .
Word Decoded sentence: Feasing to sit round the conference table . 
Ground truth sentence: 	refusing to sit round the conference table . 

Input sentence: P.O. Box 8012 Gneenstone 1616
Word Decoded sentence: Poor Box 8012 Greenstone 1616 
Ground truth sentence: 	P.O. Box 8012 Greenstone 1616 

Input sentence: DR. KeiTh Helton
Word Decoded sentence: DRY Keith Heston 
Ground truth sentence: 	Dr. KeiTh Helton 

Input sentence: 02/27/18
Word Decoded sentence: 02/27/18 
Ground truth sentence: 	02/27/18 

Input sentence: NtP
Word Decoded sentence: NTP 
Ground truth sentence: 	NIP 

Input sentence: 4
Word Decoded sentence: 4 
Ground truth sentence: 	4 

Input sentence: cinten

Input sentence: 810 The met, Grand National Boulivard, milnerton Ridg
Word Decoded sentence: 810 The met Grand National Boulivard, Millerton Ring 
Ground truth sentence: 	B16 The Met, Grand National Boulevard, Milnerton Ridge 

Input sentence: WARBNER STR. At SOmRRSKT WRS
Word Decoded sentence: WARNER STRA At SOmRRSKT WAS 
Ground truth sentence: 	WARBLER STR. 21 SOMERSET WES 

Input sentence: 02-26-18
Word Decoded sentence: 02-26-18 
Ground truth sentence: 	02-26-18 

Input sentence: "
Word Decoded sentence: " 
Ground truth sentence: 	" 

Input sentence: 083 685 0052
Word Decoded sentence: 083 685 0052 
Ground truth sentence: 	083 685 0052 

Input sentence: to disaus a commoh course of action . Sir Roy is
Word Decoded sentence: to disas a common course of action . Sir Roy is 
Ground truth sentence: 	to discuss a common course of action . Sir Roy is 

Input sentence: Feb 21/2018
Word Decoded sentence: Feb 21/2018 
Ground truth sentence: 	Feb 2nd 2018 

Input sentence: 04102
Word Decoded

Input sentence: 073 8634804
Word Decoded sentence: 073 8634804 
Ground truth sentence: 	073 863 4804 

Input sentence: blown ap . He has now revealed his full plants
Word Decoded sentence: blown ap . He has now revealed his full plants 
Ground truth sentence: 	blown up . He has now revealed his full plans 

Input sentence: RSA
Word Decoded sentence: RSA 
Ground truth sentence: 	RSA 

Input sentence: 129173288
Word Decoded sentence: 129173288 
Ground truth sentence: 	1295173288 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: Northern Rhodesia is a member of the Federation
Word Decoded sentence: Northern Rhodesia is a member of the Federation 
Ground truth sentence: 	Northern Rhodesia is a member of the Federation . 

Input sentence: baycatting the London talks on the
Word Decoded sentence: baycatting the London talks on the 
Ground truth sentence: 	boycotting the London talks on the 

Input sentence: 50 Halkett Str
Word Decoded sentence: 50

Input sentence: Hf Swegelaor
Word Decoded sentence: Hf Swegelaor 
Ground truth sentence: 	Hf Swiegelaa 

Input sentence: 04-19-18 approx
Word Decoded sentence: 04-19-18 approx 
Ground truth sentence: 	04-19-18 approx 

Input sentence: GAROENE RonO 3R, GATE2 MEPRANO, 1685
Word Decoded sentence: GARONNE rond 3R, GATE2 MEPRANO, 1685 
Ground truth sentence: 	GARDENS ROAD 38 , GATE 2 , MIDRAND , 1685 

Input sentence: would still fovw the abolitiom of the Hause
Word Decoded sentence: would still fow the abolition of the Hause 
Ground truth sentence: 	would still favour the abolition of the House 

Input sentence: 2-7-2700700
Word Decoded sentence: 2-7-2700700 
Ground truth sentence: 	207-230 0700 

Input sentence: Mr. Defenbaker 36 per cent , for Mr.
Word Decoded sentence: Mrp Diefenbaker 36 per cent , for Mrp 
Ground truth sentence: 	Mr. Diefenbaker 36 per cent , for Mr. 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: Nov' 2011
Word Decoded sentenc

Input sentence: the dondon talks on thi Potecenale's
Word Decoded sentence: the donjon talks on THI Potecenale's 
Ground truth sentence: 	the London talks on the Protectorate's 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: 46517
Word Decoded sentence: 46517 
Ground truth sentence: 	46517 

Input sentence: 187.8
Word Decoded sentence: 187.8 
Ground truth sentence: 	187.8 

Input sentence: 2-1/07/2018
Word Decoded sentence: 2-1/07/2018 
Ground truth sentence: 	27/07/2018 

Input sentence: Sir Roy's United Federal Party is
Word Decoded sentence: Sir royal United Federal Party is 
Ground truth sentence: 	Sir Roy's United Federal Party is 

Input sentence: consulted in May 1834 .
Word Decoded sentence: consulted in May 1834 . 
Ground truth sentence: 	consulted in May 1834 . 

Input sentence: 02/03/2018
Word Decoded sentence: 02/03/2018 
Ground truth sentence: 	02/03/2018 

Input sentence: UNEOLNVILLE
Word Decoded sentence: UNEOLNVILLE 
Ground trut

Input sentence: " "
Word Decoded sentence: " " 
Ground truth sentence: 	" " 

Input sentence: from this principle . It is a great pity that the
Word Decoded sentence: from this principle . It is a great pity that the 
Ground truth sentence: 	from this principle . It is a great pity that the 

Input sentence: nou-combatant help was wanted ; but they
Word Decoded sentence: nou-combatant help was wanted ; but they 
Ground truth sentence: 	non-combatant help was wanted ; but they 

Input sentence: Ortho Neb. ER
Word Decoded sentence: Ortho Nebr ER 
Ground truth sentence: 	Ortho Nebe. ER 

Input sentence: 6903225656080
Word Decoded sentence: 6903225656080 
Ground truth sentence: 	6903225656080 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: to America anyway .
Word Decoded sentence: to America anyway . 
Ground truth sentence: 	to America anyway . 

Input sentence: at his Wowhington Press conference admilted
Word Decoded sentence: at his Wowhington P

Input sentence: UA Ankle Euctmre S82.852A
Word Decoded sentence: UA Ankle Euctmre S82.852A 
Ground truth sentence: 	Left Ankle Fracture S82.852A 

Input sentence: 880626 5177087
Word Decoded sentence: 880626 5177087 
Ground truth sentence: 	8806265117087 

Input sentence: Wational independence Party ( 280000 member )
Word Decoded sentence: National independence Party ( 280000 member ) 
Ground truth sentence: 	National Independence Party ( 280,000 members ) 

Input sentence: TK
Word Decoded sentence: TK 
Ground truth sentence: 	TK 

Input sentence: S63.502A
Word Decoded sentence: S63.502A 
Ground truth sentence: 	S63.502A 

Input sentence: When Mr. Brown sut down Labour MPs
Word Decoded sentence: When Mrp Brown but down Labour Mps 
Ground truth sentence: 	When Mr. Brown sat down Labour MPs 

Input sentence: 01-06-2018
Word Decoded sentence: 01-06-2018 
Ground truth sentence: 	01-06-2018 

Input sentence: wiched , old - . " Mr. Brown went on : " We
Word Decoded sentence: wished , old - .

Input sentence: nupping out ports fom hordood . Mat men
Word Decoded sentence: cupping out ports fom hordood . Mat men 
Ground truth sentence: 	ripping out parts from hardwood . Most men 

Input sentence: Ky
Word Decoded sentence: Ky 
Ground truth sentence: 	Ky 

Input sentence: Mothine A Masweneng
Word Decoded sentence: Methine A Masweneng 
Ground truth sentence: 	Mokhine N Masweneng 

Input sentence: Pain in left foat
Word Decoded sentence: Pain in left foot 
Ground truth sentence: 	Pain in left foot 

Input sentence: ubout 150,000,000 has been frioeen
Word Decoded sentence: about 150,000,000 has been frozen 
Ground truth sentence: 	about 150,000,000 has been frozen . 

Input sentence: mother
Word Decoded sentence: mother 
Ground truth sentence: 	mother 

Input sentence: Unknown
Word Decoded sentence: Unknown 
Ground truth sentence: 	Unknown 

Input sentence: 22/20/18
Word Decoded sentence: 22/20/18 
Ground truth sentence: 	2/20/18 

Input sentence: injury per pt
Word Decoded sentenc

Input sentence: Mev Olga Burger
Word Decoded sentence: Mev Olga Burger 
Ground truth sentence: 	Mev Olga Burger 

Input sentence: 045134687X8
Word Decoded sentence: 045134687X8 
Ground truth sentence: 	045134687X8 

Input sentence: 3-5-18
Word Decoded sentence: 3-5-18 
Ground truth sentence: 	3-5-18 

Input sentence: IA
Word Decoded sentence: IA 
Ground truth sentence: 	IA 

Input sentence: 761212 0052084
Word Decoded sentence: 761212 0052084 
Ground truth sentence: 	761212 0052084 

Input sentence: PA
Word Decoded sentence: PA 
Ground truth sentence: 	PA 

Input sentence: Green Bay
Word Decoded sentence: Green Bay 
Ground truth sentence: 	Green Bay 

Input sentence: 10
Word Decoded sentence: 10 
Ground truth sentence: 	10 

Input sentence: receiving regular National Asristance
Word Decoded sentence: receiving regular National Assistance 
Ground truth sentence: 	receiving regular National Assistance 

Input sentence: 2-22-18
Word Decoded sentence: 2-22-18 
Ground truth sentence: 	2-22-

Input sentence: 9563724X5 41350303X8
Word Decoded sentence: 9563724X5 41350303X8 
Ground truth sentence: 	9563724X5 41350303X8 

Input sentence: alleged association with organisations black-
Word Decoded sentence: alleged association with organisations black 
Ground truth sentence: 	alleged association with organisations black- 

Input sentence: 3/2/18
Word Decoded sentence: 3/2/18 
Ground truth sentence: 	3/2/18 

Input sentence: Philadelphia
Word Decoded sentence: Philadelphia 
Ground truth sentence: 	Philadelphia 

Input sentence: 02/21/18
Word Decoded sentence: 02/21/18 
Ground truth sentence: 	02/21/18 

Input sentence: adjournments , until April 7 , finally had to be content to relamn
Word Decoded sentence: adjournments , until April 7 , finally had to be content to relamp 
Ground truth sentence: 	adjournments , until April 7 , finally had to be content to return 

Input sentence: Mr Maeleod went on with the conference
Word Decoded sentence: Mr Macleod went on with the conference

Input sentence: Pllsract 1 37, Rensbury
Word Decoded sentence: Pllsract 1 37, Rensbury 
Ground truth sentence: 	Vlokstreet 37, Rensburg 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: Kanana
Word Decoded sentence: Kanaka 
Ground truth sentence: 	Kanana 

Input sentence: reoume tolay . President Kennedy tolay
Word Decoded sentence: resume today . President Kennedy today 
Ground truth sentence: 	resume today . President Kennedy today 

Input sentence: in the churchyard , " sacied to the memory off "
Word Decoded sentence: in the churchyard , " sacred to the memory off " 
Ground truth sentence: 	in the churchyard , " sacred to the memory of " - 

Input sentence: CC
Word Decoded sentence: CC 
Ground truth sentence: 	CC 

Input sentence: Raethaankes reasonane representanom , hut to
Word Decoded sentence: Raethaankes reasoning representanom , hut to 
Ground truth sentence: 	to Mr Kaunda's re

Input sentence: elmatie-contace obat. Com
Word Decoded sentence: elmatie-contace oath Com 
Ground truth sentence: 	elmarie_conradieebat.com 

Input sentence: 044766389X9
Word Decoded sentence: 044766389X9 
Ground truth sentence: 	044766389X9 

Input sentence: 99024
Word Decoded sentence: 99024 
Ground truth sentence: 	99024 

Input sentence: author with Miss Delaney of the script , and
Word Decoded sentence: author with Miss Delaney of the script , and 
Ground truth sentence: 	author with Miss Delaney of the script , and 

Input sentence: Po Bo S70 Greabrakrive 6525
Word Decoded sentence: Po Bo S70 Greabrakrive 6525 
Ground truth sentence: 	PO Box. 870 Greatbrakriver 6525 

Input sentence: Pebelieved he would perform " outstondy
Word Decoded sentence: Prebelieved he would perform " outstondy 
Ground truth sentence: 	He believed he would perform " outstanding 

Input sentence: 6/27/17
Word Decoded sentence: 6/27/17 
Ground truth sentence: 	6/27/17 

Input sentence: Now we have the strke

Input sentence: ritapot 25(a)gmal.con
Word Decoded sentence: ritapot 25(a)gmal.con 
Ground truth sentence: 	ritapot25egmail.com 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 16.09.1988
Word Decoded sentence: 16.09.1988 
Ground truth sentence: 	16.09.1988 

Input sentence: recorder orug gertly through 8.0, 7.0, 6.0
Word Decoded sentence: recorder drug gently through 8.0, 7.0, 6.0 
Ground truth sentence: 	recorder swung gently through 8.0 , 7.0 , 6.0 

Input sentence: TOOTH BRORE HAD TO OET A ROOT CANAL AND CRONN
Word Decoded sentence: TOOTH BROKE HAD TO OUT A ROOT CANAL AND CROWN 
Ground truth sentence: 	TOOTH BROKE HAD TO GET A ROOT CANAL AND CROWN 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: apartheid is bring applied ever more
Word Decoded sentence: apartheid is bring applied ever more 
Ground truth sentence: 	apartheid is being applied ever more 

Input sentence: 2/22/18
Word Decoded s

Input sentence: 3 . Hold loop in place between thumb
Word Decoded sentence: 3 . Hold loop in place between thumb 
Ground truth sentence: 	3 . Hold loop in place between thumb 

Input sentence: to say thet its 400 troops in the Congo
Word Decoded sentence: to say the its 400 troops in the Congo 
Ground truth sentence: 	to say that its 400 troops in the Congo 

Input sentence: 12/20/17
Word Decoded sentence: 12/20/17 
Ground truth sentence: 	12/20/17 

Input sentence: 99213
Word Decoded sentence: 99213 
Ground truth sentence: 	99213 

Input sentence: in Northor Rhodesia , but the Colonial Secretary ,
Word Decoded sentence: in Norther Rhodesia , but the Colonial Secretary , 
Ground truth sentence: 	in Northern Rhodesia , but the Colonial Secretary , 

Input sentence: 044411006X8
Word Decoded sentence: 044411006X8 
Ground truth sentence: 	044411006X8 

Input sentence: charger . Mr. Powll , white-faced and outwardly
Word Decoded sentence: charger . Mrp Poll , white-faced and outwardly 
Grou

Input sentence: 17642136407467 23945X9443544491X4
Word Decoded sentence: 17642136407467 23945X9443544491X4 
Ground truth sentence: 	17642136X0 + 40725945X9 + 43544491X4 

Input sentence: MD FACS
Word Decoded sentence: MD FACS 
Ground truth sentence: 	MD FACS 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: (6)
Word Decoded sentence: (6) 
Ground truth sentence: 	(6) 

Input sentence: Right Achilles Strain
Word Decoded sentence: Right Achilles Strain 
Ground truth sentence: 	Right Achilles Strain 

Input sentence: Dr Neuschwander
Word Decoded sentence: Dr Neuschwander 
Ground truth sentence: 	Dr. Neuschwander 

Input sentence: 712-239-4300
Word Decoded sentence: 712-239-4300 
Ground truth sentence: 	712-239-4300 

Input sentence: tcomes between 1954 and 1989 was gheat-
Word Decoded sentence: comes between 1954 and 1989 was great 
Ground truth sentence: 	incomes between 1954 and 1959 was great- 

Input sentence: 5
Word Decoded sentence: 5 
Gro

Input sentence: PO. B0X 50617 WRTERFAoNT CT 50617
Word Decoded sentence: POT B0X 50617 WRTERFAoNT CT 50617 
Ground truth sentence: 	P.O. BOX 50617 WATERFRONT CT 50617 

Input sentence: 49201
Word Decoded sentence: 49201 
Ground truth sentence: 	492041 

Input sentence: 3-19-18
Word Decoded sentence: 3-19-18 
Ground truth sentence: 	3-19-18 

Input sentence: A
Word Decoded sentence: A 
Ground truth sentence: 	4 

Input sentence: 6801180089083
Word Decoded sentence: 6801180089083 
Ground truth sentence: 	6801180089083 

Input sentence: 2-14-18
Word Decoded sentence: 2-14-18 
Ground truth sentence: 	2-14-18 

Input sentence: 3/8676
Word Decoded sentence: 3/8676 
Ground truth sentence: 	318676 

Input sentence: Recommended surgery-office visit
Word Decoded sentence: Recommended surgery-office visit 
Ground truth sentence: 	Recommended surgery-office visit 

Input sentence: DR Kent Johnson
Word Decoded sentence: DR Kent Johnson 
Ground truth sentence: 	DR Kent Johnson 

Input sentence: 20-4

Input sentence: Other Family
Word Decoded sentence: Other Family 
Ground truth sentence: 	Other Family 

Input sentence: 02/09/2018
Word Decoded sentence: 02/09/2018 
Ground truth sentence: 	02/09/2018 

Input sentence: WI
Word Decoded sentence: WI 
Ground truth sentence: 	WI 

Input sentence: 3-8-18
Word Decoded sentence: 3-8-18 
Ground truth sentence: 	3-8-18 

Input sentence: A0 turther visits schdlulal
Word Decoded sentence: A0 further visits schdlulal 
Ground truth sentence: 	no further visits scheduled 

Input sentence: 19
Word Decoded sentence: 19 
Ground truth sentence: 	19 

Input sentence: 10 per rent. , and where the handling and
Word Decoded sentence: 10 per rent , and where the handling and 
Ground truth sentence: 	10 per cent. , and where the handling and 

Input sentence: Tebogo 3 Masweneng
Word Decoded sentence: Tebogo 3 Masweneng 
Ground truth sentence: 	Tebogo S Masweneng 

Input sentence: for my country ! I may be mistakeen , 1tho' I cannot but
Word Decoded sentence:

Input sentence: Washington next week . A big slice of
Word Decoded sentence: Washington next week . A big slice of 
Ground truth sentence: 	Washington next week . A big slice of 

Input sentence: 5-6-18
Word Decoded sentence: 5-6-18 
Ground truth sentence: 	5-6-18 

Input sentence: Loganathan, Amritray
Word Decoded sentence: Loganathan, Amritray 
Ground truth sentence: 	Loganathan, Amritray 

Input sentence: TAumA
Word Decoded sentence: Trauma 
Ground truth sentence: 	Trauma 

Input sentence: 4402065026089
Word Decoded sentence: 4402065026089 
Ground truth sentence: 	4402065024089 

Input sentence: 6406085079088
Word Decoded sentence: 6406085079088 
Ground truth sentence: 	6406085079088 

Input sentence: 44200958X5 & 44201636X6
Word Decoded sentence: 44200958X5 & 44201636X6 
Ground truth sentence: 	44200958X5 + 44201636X6 

Input sentence: Sean Swanepoel
Word Decoded sentence: Sean Swanepoel 
Ground truth sentence: 	Sean Swanepoel 

Input sentence: on aetteat ae ofer of Shmnte
Word Dec

Input sentence: 784 712-0655
Word Decoded sentence: 784 712-0655 
Ground truth sentence: 	734 712-0655 

Input sentence: Ae they are mimpressed by the Germans' clain
Word Decoded sentence: Ae they are impressed by the Germans claim 
Ground truth sentence: 	And they are unimpressed by the Germans' claim 

Input sentence: 044827038X9
Word Decoded sentence: 044827038X9 
Ground truth sentence: 	044827038x9 

Input sentence: 01-08-2018
Word Decoded sentence: 01-08-2018 
Ground truth sentence: 	01-08-2018 

Input sentence: (sye-e52 te lowered his eyes and lewved her on the
Word Decoded sentence: (sye-e52 te lowered his eyes and leaved her on the 
Ground truth sentence: 	" 5Ye-es ? " he lowered his eyes and kissed her on the 

Input sentence: fird b more than 15,000,000 & last
Word Decoded sentence: find b more than 15,000,000 & last 
Ground truth sentence: 	time by more than 15,000,000 # last 

Input sentence: 2/02/18, 2/13/18, 2/20/18 3/6/18
Word Decoded sentence: 2/02/18, 2/13/18, 2/20/18 

Input sentence: 1950 , no cure has been fand for the 6lic douloureux . As
Word Decoded sentence: 1950 , no cure has been fand for the 6lic douloureux . As 
Ground truth sentence: 	1960s , no cure has been found for the 6tic douloureux . As 

Input sentence: 4
Word Decoded sentence: 4 
Ground truth sentence: 	4 

Input sentence: UPMC
Word Decoded sentence: USMC 
Ground truth sentence: 	UPMC 

Input sentence: WI
Word Decoded sentence: WI 
Ground truth sentence: 	WI 

Input sentence: 2/13
Word Decoded sentence: 2/13 
Ground truth sentence: 	2/13 

Input sentence: Tshosane K Mosweneng
Word Decoded sentence: Tshosane K Mosweneng 
Ground truth sentence: 	Tshosane K Masweneng 

Input sentence: meniaa lefttene
Word Decoded sentence: menial lefttene 
Ground truth sentence: 	menidus left knee 

Input sentence: Farnily Meokine
Word Decoded sentence: Family Meokine 
Ground truth sentence: 	Family MeDicine 

Input sentence: No
Word Decoded sentence: No 
Ground truth sentence: 	No 

Input sentence: 

Input sentence: egy-case on June 6 . Both the Chinose
Word Decoded sentence: egy-case on June 6 . Both the Chinone 
Ground truth sentence: 	egg-case on June 6 . Both the Chinese 

Input sentence: othir materals ; it is apueant material
Word Decoded sentence: other materials ; it is apueant material 
Ground truth sentence: 	other materials ; it is a pleasant material 

Input sentence: 066 249 0887
Word Decoded sentence: 066 249 0887 
Ground truth sentence: 	066 249 0887 

Input sentence: week last year and 37,000,000 up on
Word Decoded sentence: week last year and 37,000,000 up on 
Ground truth sentence: 	week last year and 37,000,000 up on 

Input sentence: studying them today . The conference will meet
Word Decoded sentence: studying them today . The conference will meet 
Ground truth sentence: 	studying them today . The conference will meet 

Input sentence: 9907130231084
Word Decoded sentence: 9907130231084 
Ground truth sentence: 	9907130231084 

Input sentence: K SEE DOCumentation

Input sentence: in wohil the indee fell by 1.8 to 37.5 per ant of th average for
Word Decoded sentence: in while the indeed fell by 1.8 to 37.5 per ant of th average for 
Ground truth sentence: 	in which the index fell by 1.8 to 97.5 per cent of the average for 

Input sentence: 2.22.18
Word Decoded sentence: 2.22.18 
Ground truth sentence: 	2.22.18 

Input sentence: 650930 5088 084
Word Decoded sentence: 650930 5088 084 
Ground truth sentence: 	650930 5088 084 

Input sentence: 3-2-18
Word Decoded sentence: 3-2-18 
Ground truth sentence: 	3-2-18 

Input sentence: 1/8/18
Word Decoded sentence: 1/8/18 
Ground truth sentence: 	1/8/18 

Input sentence: Daughter
Word Decoded sentence: Daughter 
Ground truth sentence: 	Daughter 

Input sentence: GRADY
Word Decoded sentence: GRADY 
Ground truth sentence: 	GRADY 

Input sentence: MD
Word Decoded sentence: MD 
Ground truth sentence: 	MD 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: Kurt D Rorenkrano


Input sentence: Arthur Bruce Norman
Word Decoded sentence: Arthur Bruce Norman 
Ground truth sentence: 	Arthur Bruce Norman 

Input sentence: 3/14/18
Word Decoded sentence: 3/14/18 
Ground truth sentence: 	3/14/18 

Input sentence: Torn Rotorr Cup
Word Decoded sentence: Torn Rotor Cup 
Ground truth sentence: 	Torn Rotor Cup 

Input sentence: 001608387400743te85X88X7/onmesoXxy
Word Decoded sentence: 001608387400743te85X88X7/onmesoXxy 
Ground truth sentence: 	001606557X7/007694423X0/015632288X7/017428567X6/017428568X4/017428569X2 

Input sentence: 10/12/18
Word Decoded sentence: 10/12/18 
Ground truth sentence: 	10/12/18 

Input sentence: Midlend
Word Decoded sentence: Midland 
Ground truth sentence: 	Midland 

Input sentence: to discuss the fiunction of a propeeed
Word Decoded sentence: to discuss the function of a proposed 
Ground truth sentence: 	to discuss the function of a proposed 

Input sentence: Only info provided attech
Word Decoded sentence: Only info provided attach 
Ground t

Input sentence: 36 PAUUASTRAAT, NHITE EIT SAL0AINA, 7395
Word Decoded sentence: 36 PAUUASTRAAT, WHITE IT SAL0AINA, 7395 
Ground truth sentence: 	36 DAHLIA STRAAT , WHITE CITY, SALDANHA, 7395 

Input sentence: America's dollar reserves . Dr. Adenauer's
Word Decoded sentence: americas dollar reserves . Dry Adenauer's 
Ground truth sentence: 	America's dollar reserves . Dr. Adenauer's 

Input sentence: the increased charges is a wicked ,
Word Decoded sentence: the increased charges is a wicked , 
Ground truth sentence: 	the increased charges is a wicked , 

Input sentence: development being limited or an
Word Decoded sentence: development being limited or an 
Ground truth sentence: 	development being limited or an 

Input sentence: Green Bay
Word Decoded sentence: Green Bay 
Ground truth sentence: 	Green Bay 

Input sentence: ADAMUNURO.CO.zA
Word Decoded sentence: ADAMUNURO.CO.zA 
Ground truth sentence: 	ADAMONURO.CO.ZA 

Input sentence: 3609 N DEXLE DR
Word Decoded sentence: 3609 N DEXIE

Input sentence: He bellived he wuld perfom outsandeng
Word Decoded sentence: He bellied he would perform outstanding 
Ground truth sentence: 	He believed he would perform " outstanding 

Input sentence: (5)
Word Decoded sentence: (5) 
Ground truth sentence: 	(5) 

Input sentence: Deep South . The negro is Mr. Robot i
Word Decoded sentence: Deep South . The negro is Mrp Robot i 
Ground truth sentence: 	Deep South . The negro is Mr. Robert # 

Input sentence: No
Word Decoded sentence: No 
Ground truth sentence: 	No 

Input sentence: Colonial Secretary , Mr. Iain Macleod ,
Word Decoded sentence: Colonial Secretary , Mrp Iain Macleod , 
Ground truth sentence: 	Colonial Secretary , Mr. Iain Macleod , 

Input sentence: index , wich has risen only slouly tince 1956 , was
Word Decoded sentence: index , wich has risen only slowly since 1956 , was 
Ground truth sentence: 	index , which has risen only slowly since 1956 , was 

Input sentence: MD/Orthopedics
Word Decoded sentence: MD/Orthopedics 


Input sentence: negotiatias with Sir Roy's representative ,
Word Decoded sentence: negotiations with Sir royal representative , 
Ground truth sentence: 	negotiations with Sir Roy's representative , 

Input sentence: Commonwealtn enay facility passinleda
Word Decoded sentence: Commonwealth nay facility passinleda 
Ground truth sentence: 	Commonwealth Governments every facility possible to 

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sentence: 	Christiaan Daniel Jacobs 

Input sentence: Foreign Minister , and Mr. Heath . MR. Seliyn Llogd-
Word Decoded sentence: Foreign Minister , and Mrp Heath . MRP Selion Llogd- 
Ground truth sentence: 	Foreign Minister , and Mr. Heath . MR. Selwyn Lloyd - 

Input sentence: the riots in Istanbul , which eenlivened the NATO
Word Decoded sentence: the riots in Istanbul , which enlivened the NATO 
Ground truth sentence: 	the riots in Istanbul , which enlivened the NATO 

Input sentence: here the p

Input sentence: of Investigation had reported on Mr. Weaver .
Word Decoded sentence: of Investigation had reported on Mrp Weaver . 
Ground truth sentence: 	of Investigation had reported on Mr. Weaver . 

Input sentence: smiler coure in farming for 15 yearst
Word Decoded sentence: smiler coure in farming for 15 years 
Ground truth sentence: 	similar course in farming for 15 years ! 

Input sentence: 4-19-18
Word Decoded sentence: 4-19-18 
Ground truth sentence: 	4-19-19 

Input sentence: 3-29-18
Word Decoded sentence: 3-29-18 
Ground truth sentence: 	3-29-18 

Input sentence: Thank Jon
Word Decoded sentence: Thank Jon 
Ground truth sentence: 	Thank You 

Input sentence: in th 1960s , no cure has been fount for the
Word Decoded sentence: in th 1960s , no cure has been fount for the 
Ground truth sentence: 	in the 1960s , no cure has been found for the 

Input sentence: Teller, JAmes, M.D.
Word Decoded sentence: Teller Jamesy Made 
Ground truth sentence: 	Telfer, James, M.D. 

Input sentence: Apfrox, 3:30 pm
Word Decoded sentence: Apfrox, 3:30 pm 
Ground truth sentence: 	Approx, 3:30 pm 

Input sentence: Lifting Hravg-Mercbandise
Word Decoded sentence: Lifting Hravg-Mercbandise 
Ground truth sentence: 	Lifting Heavy-Merchandise 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 99215
Word Decoded sentence: 99215 
Ground truth sentence: 	99215 

Input sentence: but it savs whot is necesrary - that
Word Decoded sentence: but it save whot is necessary - that 
Ground truth sentence: 	but it says what is necessary - that 

Input sentence: nent from the Goserimt were , matifae-
Word Decoded sentence: went from the Goserimt were , matifae- 
Ground truth sentence: 	sent from the Government were " unsatisfac- 

Input sentence: 3-1-2018
Word Decoded sentence: 3-1-2018 
Ground truth sentence: 	3-1-2018 

Input sentence: preeent
Word Decoded sentence: present 
Ground truth sentence: 	present 

Input sentence: 2-23-18
Word Deco

Input sentence: SugamaC. J. Jansen Vom Vunon
Word Decoded sentence: SugamaC. J Jansen Vom Upon 
Ground truth sentence: 	Susama -C.J. Jansen Van Vuuren 

Input sentence: 12/30/17
Word Decoded sentence: 12/30/17 
Ground truth sentence: 	12/30/17 

Input sentence: 12-15-17
Word Decoded sentence: 12-15-17 
Ground truth sentence: 	12-15-17 

Input sentence: DPM
Word Decoded sentence: DPM 
Ground truth sentence: 	DPM 

Input sentence: " The jackals bay when there is nothing belter they can do . "
Word Decoded sentence: " The jackals bay when there is nothing belter they can do . " 
Ground truth sentence: 	" The jackals bay when there is nothing better they can do . " 

Input sentence: Christiaan Daniel Jacoks
Word Decoded sentence: Christiaan Daniel Jacks 
Ground truth sentence: 	Christiaan Daniel Jacobs 

Input sentence: Pol plcasant Vailly 7d
Word Decoded sentence: Pol pleasant Vainly 7d 
Ground truth sentence: 	1301 Pleasant Valley Rd 

Input sentence: 3-15-18
Word Decoded sentence: 3-15-

Input sentence: Swiegelaa
Word Decoded sentence: Swiegelaa 
Ground truth sentence: 	* HJ Swiegelaa 

Input sentence: Limbur dicc herniation
Word Decoded sentence: Limbus disc herniation 
Ground truth sentence: 	Lumbar disc herniation 

Input sentence: 100
Word Decoded sentence: 100 
Ground truth sentence: 	100 

Input sentence: Po Box 2267, Jane turse, 1085
Word Decoded sentence: Po Box 2267, Jane turse 1085 
Ground truth sentence: 	PO Box 2267, Jane Furse, 1085 

Input sentence: 0901105434089
Word Decoded sentence: 0901105434089 
Ground truth sentence: 	0801105434089 

Input sentence: 3200 Northline Ave #200
Word Decoded sentence: 3200 Northline Ave #200 
Ground truth sentence: 	3200 Northline Ave. #200 

Input sentence: becomes en offence punshable with impritomnent
Word Decoded sentence: becomes en offence pushable with imprisonment 
Ground truth sentence: 	becomes an offence punishable with imprisonment 

Input sentence: Jan 4, 2018
Word Decoded sentence: Jan 4, 2018 
Ground truth 

Input sentence: of membes . THE two rival African
Word Decoded sentence: of members . THE two rival African 
Ground truth sentence: 	of members . THE two rival African 

Input sentence: Jorhua Logan , attempted but failed to
Word Decoded sentence: Joshua Logan , attempted but failed to 
Ground truth sentence: 	Joshua Logan , attempted but failed to 

Input sentence: NVI
Word Decoded sentence: VI 
Ground truth sentence: 	NVT 

Input sentence: 7009125065080
Word Decoded sentence: 7009125065080 
Ground truth sentence: 	7009125065080 

Input sentence: In a Taste of Honey Mr. Richardson has taken
Word Decoded sentence: In a Taste of Honey Mrp Richardson has taken 
Ground truth sentence: 	In A Taste of Honey Mr. Richardson has taken 

Input sentence: Multh Service ' and placing heavy hdeas
Word Decoded sentence: Multi Service a and placing heavy ideas 
Ground truth sentence: 	Health Service " and placing heavy burdens 

Input sentence: S06.0X0A
Word Decoded sentence: S06.0X0A 
Ground truth s

Input sentence: WILAM RHODET
Word Decoded sentence: WILMA RHODE 
Ground truth sentence: 	WILLIAM RHUDGS 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: Claire Anne Farrell
Word Decoded sentence: Claire Anne Farrell 
Ground truth sentence: 	Claire Anne Farrell 

Input sentence: 3-5-18
Word Decoded sentence: 3-5-18 
Ground truth sentence: 	3-5-18 

Input sentence: Geder whish Mimmter on October 4 , 1943 , expiessed
Word Decoded sentence: Gedder whish Mimmer on October 4 , 1943 , expressed 
Ground truth sentence: 	Order " which Himmler on October 4 , 1943 , expressed 

Input sentence: and arrived at Hounslow around t P77 , whee
Word Decoded sentence: and arrived at Hounslow around t P77 , whee 
Ground truth sentence: 	and arrived at Hounslow around 5 P.M. , where 

Input sentence: at (B) , Fig 1 . To mak i leg suchas
Word Decoded sentence: at By , Fig 1 . To mak i leg Suchos 
Ground truth sentence: 	at ( B ) , Fig. 1 . To make a leg such as 

Input sentence: herholat(a)yebo.co.za
Word Decoded sentence: herholat(a)yebo.co.za 
Ground truth sentence: 	herholdt(a)yebo.co.za 

Input sentence: (S92.412A)
Word Decoded sentence: (S92.412A) 
Ground truth sentence: 	(S92.412A) 

Input sentence: 20/07/2018
Word Decoded sentence: 20/07/2018 
Ground truth sentence: 	20/07/2018 

Input sentence: Sooste enirsteel.coz9
Word Decoded sentence: Jooste enirsteel.coz9 
Ground truth sentence: 	sjoosteenjrsteel.co.za 

Input sentence: 2/13/18
Word Decoded sentence: 2/13/18 
Ground truth sentence: 	2/13/18 

Input sentence: 8100-00I
Word Decoded sentence: 8100-00I 
Ground truth sentence: 	8100-00+ 

Input sentence: 10/03/1974
Word Decoded sentence: 10/03/1974 
Ground truth sentence: 	10/03/1974 

Input sentence: 913 362 8317
Word Decoded sentence: 913 362 8317 
Ground truth sentence: 	913 362 8317 

Input sentence: Bohlale L Masweneng
Word Decoded sentence: Bohlale L Masweneng 
Ground truth sentence: 	Bohlale L Masweneng 

Input sentence: 87024624

Input sentence: (021 8537369
Word Decoded sentence: (021 8537369 
Ground truth sentence: 	(021) 8537369 

Input sentence: RSA
Word Decoded sentence: RSA 
Ground truth sentence: 	RSA 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 17693698X7
Word Decoded sentence: 17693698X7 
Ground truth sentence: 	17693698X7 

Input sentence: 39-141708
Word Decoded sentence: 39-141708 
Ground truth sentence: 	39-141708 

Input sentence: Patellar. endenitis
Word Decoded sentence: Patellar endenitis 
Ground truth sentence: 	Patellar tendonitis 

Input sentence: 072 118 12831 0846797619
Word Decoded sentence: 072 118 12831 0846797619 
Ground truth sentence: 	072 118 1283 / 0846797619 

Input sentence: S51.851A
Word Decoded sentence: S51.851A 
Ground truth sentence: 	S51.851A 

Input sentence: flatly rejected attempts by the Eisenhower
Word Decoded sentence: flatly rejected attempts by the Eisenhower 
Ground truth sentence: 	flatly rejected attempts by the Ei

Input sentence: frem the real probems to ficldling about
Word Decoded sentence: from the real problems to filling about 
Ground truth sentence: 	from the real problems to fiddling about 

Input sentence: 012-3330523
Word Decoded sentence: 012-3330523 
Ground truth sentence: 	012-3330523 

Input sentence: N/A.
Word Decoded sentence: Near 
Ground truth sentence: 	N/A. 

Input sentence: 00/03/1966
Word Decoded sentence: 00/03/1966 
Ground truth sentence: 	06/02/1944 

Input sentence: of 1913 .
Word Decoded sentence: of 1913 . 
Ground truth sentence: 	of 1913 . 

Input sentence: Regional Medical Clinic
Word Decoded sentence: Regional Medical Clinic 
Ground truth sentence: 	Regional Medical Clinic 

Input sentence: Intramedullary Nailing tibia
Word Decoded sentence: Intramedullary Nailing tibia 
Ground truth sentence: 	Intramedullary Nailing tibia 

Input sentence: 4-19-18 approx
Word Decoded sentence: 4-19-18 approx 
Ground truth sentence: 	4-19-18 approx 

Input sentence: and forefinger (

Input sentence: S39.0120, S06.0X0D, S13.4XXD
Word Decoded sentence: S39.0120, S06.0X0D, S13.4XXD 
Ground truth sentence: 	S39.0120, S06.0X0D, S13.4XXD 

Input sentence: (1) Motzus is deflienl kreod , ( LECIE.H AATe in Hebreul
Word Decoded sentence: (1) Motus is deflienl red , ( LECIE.H Late in Hebreul 
Ground truth sentence: 	( 1 ) Matzos is deficient bread , ( LECHEM ANJO in Hebrew ) , 

Input sentence: are denved it . One ares electricity board in 1958
Word Decoded sentence: are denied it . One ares electricity board in 1958 
Ground truth sentence: 	are denied it . One area electricity board in 1958 

Input sentence: an directed by Mr. Tony Richardson , who is also
Word Decoded sentence: an directed by Mrp Tony Richardson , who is also 
Ground truth sentence: 	an directed by Mr. Tony Richardson , who is also 

Input sentence: Unknown
Word Decoded sentence: Unknown 
Ground truth sentence: 	Unknown 

Input sentence: wretched - but it would not be too comfortable nor too eary to
Word De

Input sentence: closeg frantureof proxmellof teft great toe, nnitiatencounter
Word Decoded sentence: close frantureof proxmellof left great toe nnitiatencounter 
Ground truth sentence: 	Closed fractures of proximal of left great toe, initial encounter 

Input sentence: injection
Word Decoded sentence: injection 
Ground truth sentence: 	injection 

Input sentence: White Oak
Word Decoded sentence: White Oak 
Ground truth sentence: 	White Oak 

Input sentence: 66801180089088
Word Decoded sentence: 66801180089088 
Ground truth sentence: 	6801180089083 

Input sentence: 2-23-18
Word Decoded sentence: 2-23-18 
Ground truth sentence: 	2-23-18 

Input sentence: 2-23-18
Word Decoded sentence: 2-23-18 
Ground truth sentence: 	2-23-18 

Input sentence: Afrikaans
Word Decoded sentence: Afrikaans 
Ground truth sentence: 	Afrikaans 

Input sentence: 3/5/18
Word Decoded sentence: 3/5/18 
Ground truth sentence: 	3/5/18 

Input sentence: NM
Word Decoded sentence: NM 
Ground truth sentence: 	NM 

Input sentence: Richard Dawiels
Word Decoded sentence: Richard Daniels 
Ground truth sentence: 	Richard Daniels 

Input sentence: future . Daid Mr. Nkeumbuto last night i
Word Decoded sentence: future . Said Mrp Nkeumbuto last night i 
Ground truth sentence: 	future . Said Mr. Nkumbula last night : 

Input sentence: woould ry to get Germany to pay more . He
Word Decoded sentence: would ry to get Germany to pay more . He 
Ground truth sentence: 	would try to get Germany to pay more . He 

Input sentence: 2/27/18
Word Decoded sentence: 2/27/18 
Ground truth sentence: 	2/27/18 

Input sentence: Lancanns Houe despite the crisis which hod
Word Decoded sentence: Lancanns House despite the crisis which hod 
Ground truth sentence: 	Lancaster House despite the crisis which had 

Input sentence: 1/4/2018
Word Decoded sentence: 1/4/2018 
Ground truth sentence: 	1/4/2018 

Input sentence: Charity Lawder milk, APRN 2913 cypressrd ste 100
Word Decoded sentence: Charity Lawyer milk APRON 2913 cypress

Input sentence: spot. foyce Egyinton cables : President
Word Decoded sentence: spot force Egyinton cables : President 
Ground truth sentence: 	spot . Joyce Egginton cables : President 

Input sentence: 3/./09/ 2018
Word Decoded sentence: 3/./09/ 2018 
Ground truth sentence: 	31/07/2018 

Input sentence: 8803155085086
Word Decoded sentence: 8803155085086 
Ground truth sentence: 	8803155085086 

Input sentence: Tlam
Word Decoded sentence: Team 
Ground truth sentence: 	11am 

Input sentence: 3/2/2018
Word Decoded sentence: 3/2/2018 
Ground truth sentence: 	3/2/2018 

Input sentence: Dr. Anthony Ewald
Word Decoded sentence: Dry Anthony Ewald 
Ground truth sentence: 	Dr. Anthony Ewald 

Input sentence: Happy Mashadi Jacoline Mgene
Word Decoded sentence: Happy Mashadi Jacobine Mene 
Ground truth sentence: 	Happy Mashadi Jacoline Mfene 

Input sentence: (605) 755-5900
Word Decoded sentence: (605) 755-5900 
Ground truth sentence: 	(605) 755-5700 

Input sentence: 1011 6093018
Word Decoded sent

Input sentence: U5 Pamport ne 53678 4803
Word Decoded sentence: U5 Import ne 53678 4803 
Ground truth sentence: 	US Passport no 536784803 

Input sentence: Cachr T Jot de Blad , hes pleatd hat
Word Decoded sentence: Cachi T Jot de Blad , hes pleats hat 
Ground truth sentence: 	Capetown , Dr. Joost de Blank , has pleaded that 

Input sentence: 724-726-2488
Word Decoded sentence: 724-726-2488 
Ground truth sentence: 	724-776-2488 

Input sentence: MD
Word Decoded sentence: MD 
Ground truth sentence: 	MD 

Input sentence: Mase of Hony opens at the Fenceste Square Dheate
Word Decoded sentence: Mase of Bony opens at the Fenceste Square Death 
Ground truth sentence: 	Taste of Honey opens at the Leicester Square Theatre 

Input sentence: 07/24/17
Word Decoded sentence: 07/24/17 
Ground truth sentence: 	07/24/17 

Input sentence: 250-00
Word Decoded sentence: 250-00 
Ground truth sentence: 	250-00 

Input sentence: and tht Labour should not take any shogs
Word Decoded sentence: and the Labour 

Input sentence: KLEINSEUN
Word Decoded sentence: KLEINSEUN 
Ground truth sentence: 	KLEINSEUN 

Input sentence: 045141073X2
Word Decoded sentence: 045141073X2 
Ground truth sentence: 	045141073X2 

Input sentence: 100
Word Decoded sentence: 100 
Ground truth sentence: 	100 

Input sentence: Lorica MOORE
Word Decoded sentence: Lorica MOORE 
Ground truth sentence: 	LOURICA MOORE 

Input sentence: M25.511
Word Decoded sentence: M25.511 
Ground truth sentence: 	M25.511 

Input sentence: 2/3/18
Word Decoded sentence: 2/3/18 
Ground truth sentence: 	2/3/18 

Input sentence: Srevon D Adoms
Word Decoded sentence: Seven D Adams 
Ground truth sentence: 	Steven D Adams 

Input sentence: 8609075059089
Word Decoded sentence: 8609075059089 
Ground truth sentence: 	8609075059089 

Input sentence: Syncope
Word Decoded sentence: Syncope 
Ground truth sentence: 	Syncope 

Input sentence: /0411/182600
Word Decoded sentence: /0411/182600 
Ground truth sentence: 	(011) 418 2600 

Input sentence: Hospitaliz

Input sentence: a priest . here the priet is 1superceded byy the solouer - i
Word Decoded sentence: a priest . here the print is 1superceded by the colour - i 
Ground truth sentence: 	a priest : here the priest is 1superceded by the soldier - a 

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sentence: 	Christiaan Daniel Jacobs 

Input sentence: 2-12-18
Word Decoded sentence: 2-12-18 
Ground truth sentence: 	2-12-18 

Input sentence: m
Word Decoded sentence: m 
Ground truth sentence: 	m 

Input sentence: 20
Word Decoded sentence: 20 
Ground truth sentence: 	20 

Input sentence: 30303
Word Decoded sentence: 30303 
Ground truth sentence: 	30303 

Input sentence: PGH
Word Decoded sentence: PG 
Ground truth sentence: 	PGH 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: that Labour should not take any stops whic would


Input sentence: Though they may gather some Leftwing support , a
Word Decoded sentence: Though they may gather some Leftwing support , a 
Ground truth sentence: 	Though they may gather some Left-wing support , a 

Input sentence: (015) 575 2366
Word Decoded sentence: (015) 575 2366 
Ground truth sentence: 	(015) 575 2366 

Input sentence: 402-637-0800
Word Decoded sentence: 402-637-0800 
Ground truth sentence: 	402-637-0800 

Input sentence: 28/11/1984
Word Decoded sentence: 28/11/1984 
Ground truth sentence: 	28/11/1984 

Input sentence: OV
Word Decoded sentence: OV 
Ground truth sentence: 	OV 

Input sentence: 018712931X7
Word Decoded sentence: 018712931X7 
Ground truth sentence: 	018712931X7 

Input sentence: 2135
Word Decoded sentence: 2135 
Ground truth sentence: 	2135 

Input sentence: 7861551088 0850
Word Decoded sentence: 7861551088 0850 
Ground truth sentence: 	186155 1088 0850 

Input sentence: throghont in terms of the cinema , and again
Word Decoded sentence: throughout in 

Input sentence: Mr.lain Macleod, is insisting on a policy of change.
Word Decoded sentence: Mr.lain Macleod is insisting on a policy of changes 
Ground truth sentence: 	Mr. Iain Macleod , is insisting on a policy of change . 

Input sentence: NVT
Word Decoded sentence: NOT 
Ground truth sentence: 	NVT 

Input sentence: 0833205302
Word Decoded sentence: 0833205302 
Ground truth sentence: 	0833205302 

Input sentence: Fiat a-81 , and the F104 Hlorfighter,
Word Decoded sentence: Fiat a-81 , and the F104 Hlorfighter, 
Ground truth sentence: 	Fiat G-91 , and the F 104 Starfighter , 

Input sentence: 0781400045
Word Decoded sentence: 0781400045 
Ground truth sentence: 	0781400045 

Input sentence: Faydewee(a)gmai.com
Word Decoded sentence: Faydewee(a)gmai.com 
Ground truth sentence: 	raydeweeegmail.com 

Input sentence: ORThopedic Surgeon
Word Decoded sentence: Orthopedic Surgeon 
Ground truth sentence: 	ORThopedic Surgeon 

Input sentence: 3/1/18
Word Decoded sentence: 3/1/18 
Ground truth 

Input sentence: A goad neightou toh thore Africans whe sill con-
Word Decoded sentence: A goad neighbor Toh thore Africans the sill con 
Ground truth sentence: 	A good neighbour to those Africans who will con- 

Input sentence: 44041121X3 & 40583758X4
Word Decoded sentence: 44041121X3 & 40583758X4 
Ground truth sentence: 	44044121X3 & 40583758X4 

Input sentence: in his post . Senator Robetson's committee
Word Decoded sentence: in his post . Senator Robetson's committee 
Ground truth sentence: 	in his post . Senator Robertson's committee 

Input sentence: 50
Word Decoded sentence: 50 
Ground truth sentence: 	50 

Input sentence: 034 303/146
Word Decoded sentence: 034 303/146 
Ground truth sentence: 	(034) 3931146 

Input sentence: Dr Ponna Thomas
Word Decoded sentence: Dr Donna Thomas 
Ground truth sentence: 	Dr. Donna Thomas 

Input sentence: Fries, Colleen
Word Decoded sentence: Friese Colleen 
Ground truth sentence: 	Fries, Colleen 

Input sentence: is M. Robert Weaver of New yark .

Input sentence: cross pieces and is made of 1/2 in. plywont
Word Decoded sentence: cross pieces and is made of 1/2 in plywont 
Ground truth sentence: 	cross pieces and is made of 1/2 in. plywood 

Input sentence: (Fhand contusion
Word Decoded sentence: Hand contusion 
Ground truth sentence: 	R hand contusion 

Input sentence: 2/28/18
Word Decoded sentence: 2/28/18 
Ground truth sentence: 	2/28/18 

Input sentence: 897.0
Word Decoded sentence: 897.0 
Ground truth sentence: 	897.0 

Input sentence: RSA
Word Decoded sentence: RSA 
Ground truth sentence: 	RSA 

Input sentence: manleenrona)gmail.com
Word Decoded sentence: manleenrona)gmail.com 
Ground truth sentence: 	marleenconegmail.com 

Input sentence: 021 8837369
Word Decoded sentence: 021 8837369 
Ground truth sentence: 	(021) 8537369 

Input sentence: ansieg Cenloncic net
Word Decoded sentence: answer Cenloncic net 
Ground truth sentence: 	ansiegelantic.net 

Input sentence: TS.BKaufman
Word Decoded sentence: TS.BKaufman 
Ground trut

Input sentence: tary at the Ministry of Agriculte .
Word Decoded sentence: vary at the Ministry of Agriculture . 
Ground truth sentence: 	tary at the Ministry of Agriculture , 

Input sentence: 31.7.2018
Word Decoded sentence: 31.7.2018 
Ground truth sentence: 	31.7.2018 

Input sentence: 19/10/2015
Word Decoded sentence: 19/10/2015 
Ground truth sentence: 	19/10/2015 

Input sentence: Harleysville Ins Co.
Word Decoded sentence: Harleysville Ins Co 
Ground truth sentence: 	Harleysville Ins Co. 

Input sentence: Mr. Iir Madleod , the Colonial Secrerany , denied
Word Decoded sentence: Mrp Sir Macleod , the Colonial Secretary , denied 
Ground truth sentence: 	Mr. Iain Macleod , the Colonial Secretary , denied 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 0825648859
Word Decoded sentence: 0825648859 
Ground truth sentence: 	0825648359 

Input sentence: right shoulder pain
Word Decoded sentence: right shoulder pain 
Ground truth sentence: 	ri

Input sentence: Kewedy at his Washingta Pess con-
Word Decoded sentence: Kenedy at his Washing Pess con 
Ground truth sentence: 	Kennedy at his Washington Press con- 

Input sentence: WNer Germon Government . It will now have to pay
Word Decoded sentence: Owner Germon Government . It will now have to pay 
Ground truth sentence: 	West German Government . It will now have to pay 

Input sentence: Gioffre , Ponald A
Word Decoded sentence: Gioffre , Donald A 
Ground truth sentence: 	Gioffre, Bonald A 

Input sentence: 959-2387746
Word Decoded sentence: 959-2387746 
Ground truth sentence: 	859-238-7746 

Input sentence: APRN
Word Decoded sentence: APRON 
Ground truth sentence: 	APRN 

Input sentence: 2-3-2018
Word Decoded sentence: 2-3-2018 
Ground truth sentence: 	2-3-2018 

Input sentence: 05/05/1962
Word Decoded sentence: 05/05/1962 
Ground truth sentence: 	05/05/1966 

Input sentence: 3-13-18
Word Decoded sentence: 3-13-18 
Ground truth sentence: 	3-13-18 

Input sentence: yes
Word Deco

Input sentence: 30/7/2018
Word Decoded sentence: 30/7/2018 
Ground truth sentence: 	30/7/2018 

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sentence: 	Christiaan Daniel Jacobs 

Input sentence: 100
Word Decoded sentence: 100 
Ground truth sentence: 	100 

Input sentence: there a a cmmpated b4 pey cent poig
Word Decoded sentence: there a a compared b4 pay cent pig 
Ground truth sentence: 	There was a computed 8.4 per cent. swing 

Input sentence: The President will probabry inscuss the
Word Decoded sentence: The President will probably incuss the 
Ground truth sentence: 	The President will probably discuss the 

Input sentence: thoy become ( 1) tired or ( 2 ) more used
Word Decoded sentence: they become ( 1) tired or ( 2 ) more used 
Ground truth sentence: 	they become ( 1 ) tired , or ( 2 ) more used 

Input sentence: 3/8/18
Word Decoded sentence: 3/8/18 
Ground truth sentence: 	3/8/18 

Input sentence: 1-18-18
Word Decoded sen

Input sentence: Pomorama
Word Decoded sentence: Pomorama 
Ground truth sentence: 	Panorama 

Input sentence: 2-23-18
Word Decoded sentence: 2-23-18 
Ground truth sentence: 	2-23-18 

Input sentence: 8808070014087
Word Decoded sentence: 8808070014087 
Ground truth sentence: 	880807 0014 087 

Input sentence: 7311D
Word Decoded sentence: 7311D 
Ground truth sentence: 	7311D 

Input sentence: Absionalist Parties of Northern Rhodesia
Word Decoded sentence: Absionalist Parties of Northern Rhodesia 
Ground truth sentence: 	Nationalist Parties of Northern Rhodesia 

Input sentence: Man
Word Decoded sentence: Man 
Ground truth sentence: 	Man 

Input sentence: Open Pifl of right forearm
Word Decoded sentence: Open Pirl of right forearm 
Ground truth sentence: 	Open Dog bite of right forearm 

Input sentence: 1102216212087
Word Decoded sentence: 1102216212087 
Ground truth sentence: 	1102216212087 

Input sentence: now as follows : 1 . Personal adjustment 2 . Health
Word Decoded sentence: now as

Input sentence: Peeting tay to slide over a 30 in. table , but the height
Word Decoded sentence: Meeting tay to slide over a 30 in table , but the height 
Ground truth sentence: 	feeding tray to slide over a 30 in. table , but the height 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: (stt)
Word Decoded sentence: (stt) 
Ground truth sentence: 	(1st) 

Input sentence: Richard Daniels
Word Decoded sentence: Richard Daniels 
Ground truth sentence: 	Richard Daniels 

Input sentence: DANIEL E
Word Decoded sentence: DANIEL E 
Ground truth sentence: 	DANIEL E 

Input sentence: 3/6/18
Word Decoded sentence: 3/6/18 
Ground truth sentence: 	3/6/18 

Input sentence: slopknes in the Esenhower Administation
Word Decoded sentence: slopes in the Eisenhower Administration 
Ground truth sentence: 	slackness in the Eisenhower Administration 

Input sentence: everywhere apparent .
Word Decoded sentence: everywhere apparent . 
Ground truth sentence: 	everywhere ap

Input sentence: OUrpt 2/16/18
Word Decoded sentence: Out 2/16/18 
Ground truth sentence: 	OUTpt 2/16/18 

Input sentence: EOAAaR N8 3 . He rulftled the stirels irritably and
Word Decoded sentence: EOAAaR N8 3 . He rulftled the stipels irritably and 
Ground truth sentence: 	COURSE NO. 3 . He ruffled the sheets irritably and 

Input sentence: F ModE to dtop M. Gashbale frem muating
Word Decoded sentence: F mode to stop My Gashbale from mating 
Ground truth sentence: 	A MOVE to stop Mr. Gaitskell from nominating 

Input sentence: Delegate from Mr. Kenneth Kaunda's
Word Decoded sentence: Delegate from Mrp Kenneth Kaunda's 
Ground truth sentence: 	Delegates from Mr. Kenneth Kaunda's 

Input sentence: Orthprdic Surgery
Word Decoded sentence: Orthprdic Surgery 
Ground truth sentence: 	Orthopedic Surgery 

Input sentence: 02-03-18
Word Decoded sentence: 02-03-18 
Ground truth sentence: 	02-03-18 

Input sentence: 044) 690 3548
Word Decoded sentence: 044) 690 3548 
Ground truth sentence: 	(044)

Input sentence: Maigaietha Sisama Jchaia Potgieter.
Word Decoded sentence: Maigaietha Sisama Achaia Potgieter. 
Ground truth sentence: 	Margaretha Susanna Johanna Potgieter 

Input sentence: Lowdermilk, Charity
Word Decoded sentence: Lowdermilk, Charity 
Ground truth sentence: 	Lowdermilk, Charity 

Input sentence: 01/08/2018
Word Decoded sentence: 01/08/2018 
Ground truth sentence: 	01/08/2018 

Input sentence: 3-14-18
Word Decoded sentence: 3-14-18 
Ground truth sentence: 	3-14-18 

Input sentence: tent of the paper , and in
Word Decoded sentence: tent of the paper , and in 
Ground truth sentence: 	tent of the taper , and in 

Input sentence: script , and the great adventages to be
Word Decoded sentence: script , and the great advantages to be 
Ground truth sentence: 	script , and the great advantages to be 

Input sentence: Lane
Word Decoded sentence: Lane 
Ground truth sentence: 	Lane 

Input sentence: 044827038X9
Word Decoded sentence: 044827038X9 
Ground truth sentence: 	04482703

In [None]:
# WER of the seq2seq spelling-correction output against the ground truth
WER_spell_correction = calculate_WER(gt_texts, decoded_sentences)
print('WER_spell_correction |TEST= ', WER_spell_correction)
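
`calculate_WER` is defined earlier in the notebook and its body is not repeated here. As a reminder of what a word error rate measures, below is a minimal sketch, assuming whitespace tokenization and a corpus-level ratio (total word edits divided by total ground-truth words). The names `word_edit_distance` and `corpus_wer_sketch` are hypothetical and are not the notebook's own implementation, which may differ in tokenization or averaging.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of word tokens."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def corpus_wer_sketch(gt_texts, predicted_texts):
    """Hypothetical corpus-level WER: total edits / total reference words."""
    total_edits = 0
    total_ref_words = 0
    for gt, pred in zip(gt_texts, predicted_texts):
        # split() with no arguments also drops the leading tab and
        # trailing space seen in the ground-truth strings.
        ref_words = gt.split()
        hyp_words = pred.split()
        total_edits += word_edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    # Guard against an empty reference set.
    return total_edits / max(total_ref_words, 1)


# Two pairs taken from the sample output above: one exact match, one miss.
example_gt = ["\tpresent ", "\tN/A "]
example_pred = ["present ", "Na "]
example_wer = corpus_wer_sketch(example_gt, example_pred)
```

With this definition a single wrong token in a one-word field counts as a full error for that field, which is why short entries such as `N/A` weigh heavily on the score.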

In [None]:
# WER after the additional word-level correction step
WER_spell_word_correction = calculate_WER(gt_texts, corrected_sentences)
print('WER_spell_word_correction |TEST= ', WER_spell_word_correction)
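
The sample output above contains cases where the OCR input already matched the ground truth and the correction step changed it (for example `N/A` became `Na`, and `Mr.` became `Mrp`). A single WER number hides these regressions, so it can help to count them separately. The following is a minimal sketch, assuming exact string match after stripping whitespace; `classify_corrections` is a hypothetical helper that is not part of the notebook.

```python
def classify_corrections(input_texts, corrected_texts, gt_texts):
    """Bucket each sample by whether the correction helped or hurt.

    Hypothetical helper: compares strings exactly after stripping the
    surrounding whitespace (the ground truth carries a leading tab).
    """
    counts = {"fixed": 0, "broken": 0, "unchanged_correct": 0, "still_wrong": 0}
    broken_examples = []
    for ocr, corrected, gt in zip(input_texts, corrected_texts, gt_texts):
        ocr, corrected, gt = ocr.strip(), corrected.strip(), gt.strip()
        was_correct = ocr == gt
        is_correct = corrected == gt
        if was_correct and is_correct:
            counts["unchanged_correct"] += 1
        elif was_correct:
            # The OCR text was right and the correction damaged it.
            counts["broken"] += 1
            broken_examples.append((ocr, corrected, gt))
        elif is_correct:
            counts["fixed"] += 1
        else:
            counts["still_wrong"] += 1
    return counts, broken_examples


# Four samples taken from the output above.
sample_inputs = ["N/A", "preeent", "MD", "Thank Jon"]
sample_corrected = ["Na ", "present ", "MD ", "Thank Jon "]
sample_gt = ["\tN/A ", "\tpresent ", "\tMD ", "\tThank You "]
sample_counts, sample_broken = classify_corrections(
    sample_inputs, sample_corrected, sample_gt)
```

Inspecting `sample_broken` shows which kinds of tokens (abbreviations, codes, titles) the word-level step should probably leave untouched.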

In [None]:
# Baseline: WER of the raw OCR text before any correction
WER_OCR = calculate_WER(gt_texts, input_texts)
print('WER_OCR |TEST= ', WER_OCR)
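
The three numbers are easiest to read relative to the OCR baseline. The sketch below is a hedged illustration: `relative_wer_reduction` is a hypothetical helper, and the values passed to it are made-up placeholders, not results from this notebook.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Fraction of the baseline WER removed by the system.

    Positive values mean the correction helped; negative values mean it
    made the text worse than the raw OCR output.
    """
    if baseline_wer == 0:
        # Nothing to improve on a perfect baseline.
        return 0.0
    return (baseline_wer - system_wer) / baseline_wer


# Placeholder values for illustration only (not measured results).
placeholder_baseline = 0.5
placeholder_system = 0.25
placeholder_gain = relative_wer_reduction(placeholder_baseline, placeholder_system)
```

In the notebook this would be called as `relative_wer_reduction(WER_OCR, WER_spell_correction)` and `relative_wer_reduction(WER_OCR, WER_spell_word_correction)` once the cells above have run.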