# Introduction

We tackle the problem of OCR post processing. In OCR, we map the image form of the document into the text domain. This is done first using an CNN+LSTM+CTC model, in our case based on tesseract. Since this output maps only image to text, we need something on top to validate and correct language semantics.

The idea is to build a language model, that takes the OCRed text and corrects it based on language knowledge. The langauge model could be:
- Char level: the aim is to capture the word morphology. In which case it's like a spelling correction system.
- Word level: the aim is to capture the sentence semnatics. But such systems suffer from the OOV problem.
- Fusion: to capture semantics and morphology language rules. The output has to be at char level, to avoid the OOV. However, the input can be char, word or both.

The fusion model target is to learn:

    p(char | char_context, word_context)

In this workbook we use seq2seq vanilla Keras implementation, adapted from the lstm_seq2seq example on Eng-Fra translation task. The adaptation involves:

- Adapt to spelling correction, on char level
- Pre-train on a noisy, medical sentences
- Fine tune a residual, to correct the mistakes of tesseract 
- Limit the input and output sequence lengths
- Enusre teacher forcing auto regressive model in the decoder
- Limit the padding per batch
- Learning rate schedule
- Bi-directional LSTM Encoder
- Bi-directional GRU Encoder


# Imports

In [1]:
from __future__ import print_function
import tensorflow as tf
import keras.backend as K
from keras.backend.tensorflow_backend import set_session
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate, GRU
from keras import optimizers
from keras.callbacks import ModelCheckpoint, TensorBoard, LearningRateScheduler
from keras.models import load_model
import numpy as np
import os
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from autocorrect import spell
import re
%matplotlib inline

Using TensorFlow backend.


# Utility functions

In [2]:
# Limit gpu allocation. allow_growth, or gpu_fraction
def gpu_alloc():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))

In [3]:
gpu_alloc()

In [4]:
def calculate_WER_sent(gt, pred):
    '''
    calculate_WER('calculating wer between two sentences', 'calculate wer between two sentences')
    '''
    gt_words = gt.lower().split(' ')
    pred_words = pred.lower().split(' ')
    d = np.zeros(((len(gt_words) + 1), (len(pred_words) + 1)), dtype=np.uint8)
    # d = d.reshape((len(gt_words)+1, len(pred_words)+1))

    # Initializing error matrix
    for i in range(len(gt_words) + 1):
        for j in range(len(pred_words) + 1):
            if i == 0:
                d[0][j] = j
            elif j == 0:
                d[i][0] = i

    # computation
    for i in range(1, len(gt_words) + 1):
        for j in range(1, len(pred_words) + 1):
            if gt_words[i - 1] == pred_words[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                substitution = d[i - 1][j - 1] + 1
                insertion = d[i][j - 1] + 1
                deletion = d[i - 1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)
    return d[len(gt_words)][len(pred_words)]

In [5]:
def calculate_WER(gt, pred):
    '''

    :param gt: list of sentences of the ground truth
    :param pred: list of sentences of the predictions
    both lists must have the same length
    :return: accumulated WER
    '''
#    assert len(gt) == len(pred)
    WER = 0
    nb_w = 0
    for i in range(len(gt)):
        #print(gt[i])
        #print(pred[i])
        WER += calculate_WER_sent(gt[i], pred[i])
        nb_w += len(gt[i])

    return WER / nb_w

In [6]:
def load_data_with_gt(file_name, num_samples, max_sent_len, min_sent_len, delimiter='\t', gt_index=1, prediction_index=0):
    '''Load data from txt file, with each line has: <TXT><TAB><GT>. The  target to the decoder muxt have \t as the start trigger and \n as the stop trigger.'''
    cnt = 0  
    input_texts = []
    gt_texts = []
    target_texts = []
    for row in open(file_name, encoding='utf8'):
        if cnt < num_samples :
            #print(row)
            sents = row.split(delimiter)
            input_text = sents[prediction_index]
            
            target_text = '\t' + sents[gt_index] + '\n'
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len and len(target_text) > min_sent_len and len(target_text) < max_sent_len:
                cnt += 1
                
                input_texts.append(input_text)
                target_texts.append(target_text)
                gt_texts.append(sents[gt_index])
    return input_texts, target_texts, gt_texts

In [7]:
def load_data(file_name, num_samples, max_sent_len, min_sent_len):
    '''Load data from txt file, with each line has: <TXT><TAB><GT>. The  target to the decoder muxt have \t as the start trigger and \n as the stop trigger.'''
    cnt = 0  
    input_texts = []   
    
    #for row in open(file_name, encoding='utf8'):
    for row in open(file_name):
        if cnt < num_samples :            
            input_text = row           
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len:
                cnt += 1                
                input_texts.append(input_text)
    return input_texts

In [8]:
def vectorize_data(input_texts, max_encoder_seq_length, num_encoder_tokens, vocab_to_int):
    
    if(len(input_texts) > max_encoder_seq_length):
        input_texts = input_texts[:max_encoder_seq_length]
    
    '''Prepares the input text and targets into the proper seq2seq numpy arrays'''
    encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length),
    dtype='float32')
    
    for i, input_text in enumerate(input_texts):
        for t, char in enumerate(input_text[:max_encoder_seq_length]):
            # c0..cn
            encoder_input_data[i, t] = vocab_to_int[char]
                
    return encoder_input_data

In [9]:
def decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, max_decoder_seq_length, vocab_to_int, int_to_vocab):
    
    #print(max_decoder_seq_length)
    # Encode the input as state vectors.
    encoder_outputs, h, c  = encoder_model.predict(input_seq)
    states_value = [h,c]
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = vocab_to_int['\t']

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    #print(input_seq)
    attention_density = []
    i = 0
    special_chars = ['\\', '/', '-', '—' , ':', '[', ']', ',', '.', '"', ';', '%', '~', '(', ')', '{', '}', '$', '#']
    #special_chars = []
    while not stop_condition:
        #print(target_seq)
        output_tokens, attention, h, c  = decoder_model.predict(
            [target_seq, encoder_outputs] + states_value)
        #print(attention.shape)
        attention_density.append(attention[0][0])# attention is max_sent_len x 1 since we have num_time_steps = 1 for the output
        # Sample a token
        #print(output_tokens.shape)
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        
        #print(sampled_token_index)
        sampled_char = int_to_vocab[sampled_token_index]
        
        orig_char = int_to_vocab[int(input_seq[:,i][0])]
        
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
            #print('End', sampled_char, 'Len ', len(decoded_sentence), 'Max len ', max_decoder_seq_length)
            sampled_char = ''
        
        # Copy digits as it, since the spelling corrector is not good at digit corrections
        
        if(orig_char.isdigit() or orig_char in special_chars):
            decoded_sentence += orig_char            
        else:
            if(sampled_char.isdigit() or sampled_char in special_chars):
                decoded_sentence += ''
            else:
                decoded_sentence += sampled_char
        
        #decoded_sentence += sampled_char


        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]
        
        i += 1
        if(i > 48):
            i = 0
    attention_density = np.array(attention_density)
    
    # Word level spell correct
    '''
    corrected_decoded_sentence = ''
    for w in decoded_sentence.split(' '):
        corrected_decoded_sentence += spell(w) + ' '
    decoded_sentence = corrected_decoded_sentence
    '''
    return decoded_sentence, attention_density


In [10]:
def word_spell_correct(decoded_sentence):
    if(decoded_sentence == ''):
        return ''
    corrected_decoded_sentence = ''
    special_chars = ['\\', '/', '-', '—' , ':', '[', ']', ',', '.', '"', ';', '%', '~', '(', ')', '{', '}', '$', '&', '#', '☒', '■', '☐', '□', '☑', '@']
    for w in decoded_sentence.split(' '):
        #print(w)
        if((len(re.findall(r'\d+', w))==0) and not (w in special_chars)):
            corrected_decoded_sentence += spell(w) + ' '
        else:
            corrected_decoded_sentence += w + ' '
    return corrected_decoded_sentence

In [11]:
def clean_up_sentence(sentence, vocab):
    s = ''
    prev_char = ''
    for c in sentence.strip():
        if c not in vocab or (c == ' ' and prev_char == ' '):
            s += ''
        else:
            s += c
        prev_char = c
            
    return s

# Load data

# Load model params

In [12]:
data_path = '../../dat/'

In [13]:
max_sent_lengths = [50, 100]

In [14]:
vocab_file = {}
model_file = {}
encoder_model_file = {}
decoder_model_file = {}
model = {}
encoder_model = {}
decoder_model = {}
vocab = {}
vocab_to_int = {}
int_to_vocab = {}
max_sent_len = {}
min_sent_len = {}
num_decoder_tokens = {}
num_encoder_tokens = {}
max_encoder_seq_length = {}
max_decoder_seq_length = {}

In [15]:

for i in max_sent_lengths:
    vocab_file[i] = 'vocab-{}.npz'.format(i)
    model_file[i] = 'best_model-{}.hdf5'.format(i)
    encoder_model_file[i] = 'encoder_model-{}.hdf5'.format(i)
    decoder_model_file[i] = 'decoder_model-{}.hdf5'.format(i)
    
    vocab = np.load(file=vocab_file[i])
    vocab_to_int[i] = vocab['vocab_to_int'].item()
    int_to_vocab[i] = vocab['int_to_vocab'].item()
    max_sent_len[i] = vocab['max_sent_len']
    min_sent_len[i] = vocab['min_sent_len']
    input_characters = sorted(list(vocab_to_int))
    num_decoder_tokens[i] = num_encoder_tokens[i] = len(input_characters) #int(encoder_model.layers[0].input.shape[2])
    max_encoder_seq_length[i] = max_decoder_seq_length[i] = max_sent_len[i] - 1#max([len(txt) for txt in input_texts])
    
    model[i] = load_model(model_file[i])
    encoder_model[i] = load_model(encoder_model_file[i])
    decoder_model[i] = load_model(decoder_model_file[i])



In [16]:
num_samples = 1000000
#tess_correction_data = os.path.join(data_path, 'test_data.txt')
#input_texts = load_data(tess_correction_data, num_samples, max_sent_len, min_sent_len)

OCR_data = os.path.join(data_path, 'new_trained_data.txt')
#input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len, min_sent_len, delimiter='|',gt_index=0, prediction_index=1)
input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len=10000, min_sent_len=0)

In [17]:
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])

1951
Me dieal Provider Roles: Treating  
 	Medical Provider Roles: Treating


Provider First Name: Christine  
 	Provider First Name: Christine


Provider Last Name: Nolen, MD  
 	Provider Last Name: Nolen, MD


Address Line 1 : 7 25 American Avenue  
 	Address Line 1 : 725 American Avenue


City. W’aukesha  
 	City: Waukesha


StatefProvinee: ‘WI  
 	State/Province: WI


Postal Code: 5 31 88  
 	Postal Code: 53188


Country". US  
 	Country:  US


Business Telephone: (2 62) 92 8- 1000  
 	Business Telephone: (262) 928- 1000


Date ot‘Pirst Visit: 1 2/01f20 17  
 	Date of First Visit: 12/01/2017




In [18]:
# Spell correct before inference
'''
input_texts_ = []
for sent in input_texts:
    sent_ = ''
    for word in sent.split(' '):
        sent_ += spell(word) + ' '
    input_texts_.append(sent_)
input_texts = input_texts_
input_texts_ = []
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])
'''

"\ninput_texts_ = []\nfor sent in input_texts:\n    sent_ = ''\n    for word in sent.split(' '):\n        sent_ += spell(word) + ' '\n    input_texts_.append(sent_)\ninput_texts = input_texts_\ninput_texts_ = []\n# Sample data\nprint(len(input_texts))\nfor i in range(10):\n    print(input_texts[i], '\n', target_texts[i])\n"

In [19]:
decoded_sentences = []
corrected_sentences = []

#for seq_index in range(len(input_texts)):
results = open('RESULTS.md', 'w')
results.write('|OCR sentence|GT sentence|Char decoded sentence|Word decoded sentence|Sentence length (chars)|\n')
results.write('---------------|-----------|----------------|----------------|----------------|\n')
     

for i, input_text in enumerate(input_texts):
    #print(input_text)
    # Find the input length range to choose the proper model to use
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])
    
    

    target_text = gt_texts[i]
    
    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)
    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    print('GT sentence:', target_text.strip())
    print('Char Decoded sentence:', decoded_sentence)   
    print('Word Decoded sentence:', corrected_sentence) 
    results.write(' | ' + input_text + ' | ' + target_text.strip() + ' | ' + decoded_sentence + ' | ' + corrected_sentence + ' | ' + str(len_range) + ' | \n')
    decoded_sentences.append(decoded_sentence)
    corrected_sentences.append(corrected_sentence)
results.close()    

    

-Lenght =  50
Input sentence: Me dieal Provider Roles: Treating
GT sentence: Medical Provider Roles: Treating
Char Decoded sentence: Medical Provider Roles:Treating
Word Decoded sentence: Me dieal Provider Roles Treating 
-Lenght =  50
Input sentence: Provider First Name: Christine
GT sentence: Provider First Name: Christine
Char Decoded sentence: Provider First Name: Chirisin
Word Decoded sentence: Provider First Name Christine 
-Lenght =  50
Input sentence: Provider Last Name: Nolen, MD
GT sentence: Provider Last Name: Nolen, MD
Char Decoded sentence: Provider Last Name: None,Me
Word Decoded sentence: Provider Last Name Dolens MD 
-Lenght =  50
Input sentence: Address Line 1 : 7 25 American Avenue
GT sentence: Address Line 1 : 725 American Avenue
Char Decoded sentence: Address Line 1:725 nemin Avent Avent
Word Decoded sentence: Address Line 1 : 7 25 American Avenue 
-Lenght =  50
Input sentence: City. W’aukesha
GT sentence: City: Waukesha
Char Decoded sentence: City. Worances
Word De

-Lenght =  100
Input sentence: pAvMENTsm'E—I: STATEMENT DATE WILL NOT APPEAR ON THIS STATEMENT
GT sentence: PAYMENTS RECEIVED AFTER STATE DATE WILL NOT APPEAR ON THIS STATEMENT
Char Decoded sentence: ACTIVE LaU—A:SY STTAT LATE TipSU TAYATILANSTEALALDINDGTATE—
Word Decoded sentence: pAvMENTsm'E—I: STATEMENT DATE WILL NOT APPEAR ON THIS STATEMENT 
-Lenght =  50
Input sentence: DATE
GT sentence: DATE
Char Decoded sentence: DATE
Word Decoded sentence: DATE 
-Lenght =  50
Input sentence: DESCRIPTION
GT sentence: DESCRIPTION
Char Decoded sentence: DESCRIPTION
Word Decoded sentence: DESCRIPTION 
-Lenght =  50
Input sentence: _AYMENTS
GT sentence: PAYMENTS
Char Decoded sentence: PAAMENTS
Word Decoded sentence: PAYMENTS 
-Lenght =  50
Input sentence: _DJUSTMENTS
GT sentence: ADJUSTMENTS
Char Decoded sentence: ADJUT DANE
Word Decoded sentence: ADJUSTMENTS 
-Lenght =  50
Input sentence: PAT'ENT BALANCE
GT sentence: PATIENTS BALANCE
Char Decoded sentence: PATTET BAMAE
Word Decoded sentence: PATIEN

-Lenght =  100
Input sentence: To pay your bill on line with a credit card, log on to www.ebixinc.comlpayonline.html.
GT sentence: To pay your bill on line with a credit card, log on to www.ebixinc.com/payonline.html.
Char Decoded sentence: To ppedich ont old polland pollysiot onghes,ctionship pou hypent pon one ponkendent Fing o,lep
Word Decoded sentence: To pay your bill on line with a credit cards log on to www.ebixinc.comlpayonline.html. 
-Lenght =  50
Input sentence: ACCOUNT# EMA297232
GT sentence: ACCOUNT# EMA297232
Char Decoded sentence: ACCOUNT# CAE297232
Word Decoded sentence: ACCOUNT EMA297232 
-Lenght =  100
Input sentence: PLACE OF SERVICE 11 Office 21 Inpatient 22 Outpatient Hospital 23 Emergency Room-Hospital
GT sentence: PLACE OF SERVICE 11 Office 21 Inpatient 22 Outpatient Hospital 23 Emergency Room-Hospital
Char Decoded sentence: PLINent Name For11y E PAT21TANT Ficate22Sprour Ccine Fing Unum 11e MIDe M21r Cound
Word Decoded sentence: PLACE OF SERVICE 11 Office 21 Inpat

-Lenght =  100
Input sentence: To disclose information, whether from before, during or after the date of this authorization, about my health, including HIV, AIDS or other disorders of the Immune system, use of drugs or alcohol, mental or phy5ica| histor , condition, advice or treatment (except this authorization does not authorize release of psychotherapy notesi, prescription drug history, earnings, financial or credit history, professional licenses, employment history, insurance claims and benefits, and all other claims and benefits, including Social Security claims and benefits (“My Information");
GT sentence: To disclose information, whether from before, during or after the date of this authorization, about my health, including HIV, AIDS or other disorders of the immune system, use of drugs or alcohol, mental or physical history, condition, advice or treatment (except this authorization does not authorize release of psychotherapy notes, prescription drug history, earnings, financial

-Lenght =  100
Input sentence: I signed on behalf of the Insured as (Relationship). If Power of Attorney Designee, Guardian, or Conservator, please attach a copy of the document granting authority.
GT sentence: I signed on behalf of the Insured as (Relationship). If Power of Attorney Designee, Guardian, or Conservator, please attach a copy of the document granting authority.
Char Decoded sentence: I signed on behalf of the Insured as (relationshiprohent Insunt If Nomenthe fol orent( the int chenk
Word Decoded sentence: I signed on behalf of the Insured as (Relationship). If Power of Attorney Designee Guardian or Conservatory please attach a copy of the document granting authority 
-Lenght =  100
Input sentence: Unum is a registered trademark and marketing brand of Unum Group and its insuring subsidiaries.
GT sentence: Unum is a registered trademark and marketing brand of Unum Group and its insuring subsidiaries.
Char Decoded sentence: Unum paricaress rof tertstrigheristur chigstromp th

-Lenght =  100
Input sentence: Total Employee Semi-Monthly Payroll Deduction: $20.20
GT sentence: Total Employee Semi-Monthly Payroll Deduction: $20.20
Char Decoded sentence: Total Molly Name-  older Mithilempe :$2er
Word Decoded sentence: Total Employee Semimonthly Payroll Deduction $20.20 
-Lenght =  100
Input sentence: Note: Final cost may vary slightly clue to rounding differences.
GT sentence: Note: Final cost may vary slightly clue to rounding differences.
Char Decoded sentence: Note: Finaliceshocestion mand care providert chind f:lly t ment ofy tiplentorf
Word Decoded sentence: Note Final cost may vary slightly clue to rounding differences 
-Lenght =  100
Input sentence: This information is in abbreviated form only. It is provided to give you a general understanding of your Group Critical Illness coverage.
GT sentence: This information is in abbreviated form only. It is provided to give you a general understanding of your Group Critical Illness coverage.
Char Decoded sentence: T

-Lenght =  100
Input sentence: 2. New tear of anterior cruciate ligament of right knee
GT sentence: 2. New tear of anterior cruciate ligament of right knee
Char Decoded sentence: 2. *rgiesed the the thenefithe for or takerting t2.nearertifite nestoriteretenting bent onint then2.
Word Decoded sentence: 2. New tear of anterior cruciate ligament of right knee 
-Lenght =  50
Input sentence: Social History
GT sentence: Social History
Char Decoded sentence: Social History
Word Decoded sentence: Social History 
-Lenght =  50
Input sentence: 0 Age reporting
GT sentence: • Age reporting
Char Decoded sentence: 0 Agereprepting
Word Decoded sentence: 0 Age reporting 
-Lenght =  50
Input sentence: 0 Consumes alcohol
GT sentence: • Consumes alcohol
Char Decoded sentence: 0 Confuses alb of
Word Decoded sentence: 0 Consumes alcohol 
-Lenght =  50
Input sentence: I Exercises regularly
GT sentence: • Exercises regularly
Char Decoded sentence: Espercissiol Dreformany
Word Decoded sentence: I Exercises re

-Lenght =  100
Input sentence: I Blood Pressure educational material provided to patient; Status:Complete; Done: 24Jan2018
GT sentence: • Blood Pressure educational material provided to patient; Status:Complete; Done: 24Jan2018
Char Decoded sentence: * coleng coderact of the tontouts for tour tour cordist toon one our chincerainustothessentt the tov
Word Decoded sentence: I Blood Pressure educational material provided to patient Status:Complete; Done 24Jan2018 
-Lenght =  100
Input sentence: ast Updated ByzRowan, Ann; 0112412018 9:19:29 AM;Ordered; ForzHealth Maintenance; Ordered By:Ho|m,Jason;
GT sentence: Last Updated By:Rowan, Ann; 01/24/2018 9:19:29 AM;Ordered; For:Health Maintenance; Ordered By:Holm, Jason;
Char Decoded sentence: * SUREDed thedered,;01124
Word Decoded sentence: ast Updated ByzRowan, Anne 0112412018 9:19:29 AM;Ordered; ForzHealth Maintenance Ordered By:Ho|m,Jason; 
-Lenght =  100
Input sentence: New tear of anterior cruciate ligament of right knee
GT sentence: New 

-Lenght =  50
Input sentence: 'OTR‘giilgf’Eiﬁgs 3 of3 3/22/18 3:09:54 PM
GT sentence: TWIN CITIES ORTHOPEDICS 3 of 3 3/22/18 3:09:54 PM
Char Decoded sentence: TWore ifflasing3Pho3i3/22/183:09:54
Word Decoded sentence: 'OTR‘giilgf’Eiﬁgs 3 of3 3/22/18 3:09:54 PM 
-Lenght =  50
Input sentence: TWIN CITIES
GT sentence: TWIN CITIES
Char Decoded sentence: TWIN CITIESS
Word Decoded sentence: TWIN CITIES 
-Lenght =  50
Input sentence: ORTHOPEDICS
GT sentence: ORTHOPEDICS
Char Decoded sentence: ORTHOPEDICS
Word Decoded sentence: ORTHOPEDICS 
-Lenght =  50
Input sentence: Twin Cities Orthopedics-Burnsville
GT sentence: Twin Cities Orthopedics-Burnsville
Char Decoded sentence: Thin Chiess Biness Ores-Birst VIle
Word Decoded sentence: Twin Cities Orthopedics-Burnsville 
-Lenght =  50
Input sentence: MRN: Date of Service: 01/24/2018 9:10AM
GT sentence: MRN: Date of Service: 01/24/2018 9:10AM
Char Decoded sentence: MRN: Dence Dose f:01/24/20189:10AAMIME
Word Decoded sentence: MRNA Date of Service 01

-Lenght =  50
Input sentence: Jillitﬁ'géiés 1 of 3 3/22/18 3:09:54 PM
GT sentence: TWIN CITIES ORTHOPEDICS 1 of 3 3/22/18 3:09:54 PM
Char Decoded sentence: JiPistofic Phy1Cin3 3/22/183:09:54
Word Decoded sentence: Jillitﬁ'géiés 1 of 3 3/22/18 3:09:54 PM 
-Lenght =  50
Input sentence: TWIN CITIES
GT sentence: TWIN CITIES
Char Decoded sentence: TWIN CITIESS
Word Decoded sentence: TWIN CITIES 
-Lenght =  50
Input sentence: ORTHOPEDICS
GT sentence: ORTHOPEDICS
Char Decoded sentence: ORTHOPEDICS
Word Decoded sentence: ORTHOPEDICS 
-Lenght =  50
Input sentence: Twin Cities Orthopedics-Burnsville
GT sentence: Twin Cities Orthopedics-Burnsville
Char Decoded sentence: Thin Chiess Biness Ores-Birst VIle
Word Decoded sentence: Twin Cities Orthopedics-Burnsville 
-Lenght =  50
Input sentence: Date of Service: 01/21/2018 7:30PM
GT sentence: Date of Service: 01/21/2018 7:30PM
Char Decoded sentence: Date of Service: 01/21/20187:30
Word Decoded sentence: Date of Service 01/21/2018 7:30PM 
-Lenght =  5

-Lenght =  50
Input sentence: SURGEON: JASON HOLM, M.D.
GT sentence: SURGEON: JASON HOLM, M.D.
Char Decoded sentence: SURGEON: JAST J LO,M..
Word Decoded sentence: SURGEON JASON HOLMS Made 
-Lenght =  50
Input sentence: DATE: 02/02/2018
GT sentence: DATE: 02/02/2018
Char Decoded sentence: DATE: 02/02/2018
Word Decoded sentence: DATE 02/02/2018 
-Lenght =  50
Input sentence: 05/09/1980
GT sentence: 05/09/1980
Char Decoded sentence: 05/09/1980
Word Decoded sentence: 05/09/1980 
-Lenght =  50
Input sentence: PREOPERATIVE DIAGNOSES:
GT sentence: PREOPERATIVE DIAGNOSES:
Char Decoded sentence: PRERERATIVE IA INSUSS:
Word Decoded sentence: PREOPERATIVE DIAGNOSES 
-Lenght =  50
Input sentence: 1. Right knee anterior cruciate ligament tear.
GT sentence: 1. Right knee anterior cruciate ligament tear.
Char Decoded sentence: 1. Right ancie staccer and inthopent to tear.
Word Decoded sentence: 1. Right knee anterior cruciate ligament tears 
-Lenght =  50
Input sentence: 2. Me dial collateral ligame

-Lenght =  100
Input sentence: The anterior cruciate ligament was secured with SutureTape sutures through each bundle of the ligament extending from - distal to proximal. This was secured in a running locking fashion. The anatomic footprint of the ACL was demarcated along the lateral wall of the notch and a 6-9 guide was introduced here. The FlipCutter was placed and a small amount of socket was drilled at the anatomic footprint. Multiple K-wire puncture holes were placed in the lateral wall of the notch in a microfracture fashion to further augment healing here. The sutures were passed and a TightRope suture preloaded with-
GT sentence: The anterior cruciate ligament was secured with SutureTape sutures through each bundle of the ligament extending from distal to proximal. This was secured in a running locking fashion. The anatomic footprint of the ACL was demarcated along the lateral wall of the notch and a 6-9 guide was introduced here. The FlipCutter was placed and a small amount of

-Lenght =  50
Input sentence: Neurologic -. DTR normal. Sensation intact. -
GT sentence: Neurologic -. DTR normal. Sensation intact.
Char Decoded sentence: Nemureal  -.es Re STrea.ion  STentact In.
Word Decoded sentence: Neurologic of DTR normal Sensation intact - 
-Lenght =  100
Input sentence: Eyes-. Visual acuity is normal to the written word.
GT sentence: Eyes -. Visual acuity is normal to the written word.
Char Decoded sentence: Empl-.ee Firt awe taticabe st Yed to the pontor dity -.
Word Decoded sentence: Eyes Visual acuity is normal to the written words 
-Lenght =  50
Input sentence: ENT-. Hearing Intact to the spoken word.
GT sentence: ENT -. Hearing intact to the spoken word.
Char Decoded sentence: EN -.Hent Instrint Information worked o.
Word Decoded sentence: ENTER Hearing Intact to the spoken words 
-Lenght =  50
Input sentence: Musculoskeletal -
GT sentence: Musculoskeletal -
Char Decoded sentence: Muscols of Rela
Word Decoded sentence: Musculoskeletal - 
-Lenght =  50
Inp

-Lenght =  50
Input sentence: Electronicaliy signed by : Jamie Birkelo, PA;
GT sentence: Electronically signed by : Jamie Birkelo, PA;
Char Decoded sentence: Electronicable sine by  :nstable by Rel,
Word Decoded sentence: Electronicaliy signed by : Jamie Birkelo, PAY 
-Lenght =  50
Input sentence: altiltliigélés
GT sentence: TWIN CITIES ORTHOPEDICS
Char Decoded sentence: PalliallWIOWIERCIOPITECYSICERCHEDICERIORE
Word Decoded sentence: altiltliigélés 
-Lenght =  50
Input sentence: Plan
GT sentence: Plan
Char Decoded sentence: Pay
Word Decoded sentence: Plan 
-Lenght =  50
Input sentence: Knee injury
GT sentence: Knee injury
Char Decoded sentence: Knee injury
Word Decoded sentence: Knee injury 
-Lenght =  100
Input sentence: 0 XRAY KNEES BILAT STANDING AP, LAT RT, SUNRISE BILAT; Status:Complete;
GT sentence: • XRAY KNEES BILAT STANDING AP, LAT RT, SUNRISE BILAT; Status:Complete;
Char Decoded sentence: 0 AThMATer That IDe PAITAT TAt,FiYe PA,UTA TALLIk0ND AbI Napbent Infor MITy A
Word Deco

-Lenght =  100
Input sentence: Postoperatively, the patient will be touchdown weightbearing only for the ﬁrst three weeks to allow some early healing of the MCL'. She was then allowed to progressively weight bear as tolerated with the knee locked in extension. She will then progress back onto the standard ACL rehabilitation protocol.
GT sentence: Postoperatively, the patient will be touchdown weightbearing only for the first three weeks to allow some early healing of the MCL. She was then allowed to progressively weight bear as tolerated with the knee locked in extension. She will then progress back onto the standard ACL rehabilitation protocol.
Char Decoded sentence: Postef Forelent,foe Fing fowed go the the the follent the polle ,omple tonkenticideliged toere preff
Word Decoded sentence: Postoperatively the patient will be touchdown weightbearing only for the rst three weeks to allow some early healing of the MCL'. She was then allowed to progressively weight bear as tolerated with t

-Lenght =  100
Input sentence: Psychiatric -. Alert and oriented x3. Normal mood and affect.
GT sentence: Psychiatric -. Alert and oriented x3. Normal mood and affect.
Char Decoded sentence: Pay operted -.enctident ond porthe3.no stalistu ty theck
Word Decoded sentence: Psychiatric of Alert and oriented x3. Normal mood and affect 
-Lenght =  100
Input sentence: Cardiovascular -. Examination of extremities for edema and/or varicosities is negative. Peripheral pulses intact.
GT sentence: Cardiovascular -. Examination of extremities for edema and/or varicosities is negative. Peripheral pulses intact.
Char Decoded sentence: Catiearr Fingha-.entishas Fartitarger arait and sifergerugharger-.ratian herare trour regtigher arif
Word Decoded sentence: Cardiovascular of Examination of extremities for edema andlor varicosities is negative Peripheral pulses intact 
-Lenght =  50
Input sentence: Neurologic -. Sensation intact.
GT sentence: Neurologic -. Sensation intact.
Char Decoded sentence: Neuro

-Lenght =  50
Input sentence: Date doctor indicated you were unable to work
GT sentence: Date doctor indicated you were unable to work
Char Decoded sentence: Date of inct in about in gooderated to the work
Word Decoded sentence: Date doctor indicated you were unable to work 
-Lenght =  50
Input sentence: When is your next visit?
GT sentence: When is your next visit?
Char Decoded sentence: When is Number to betis ?
Word Decoded sentence: When is your next visit 
-Lenght =  50
Input sentence: Treated in emergency room — no
GT sentence: Treated in emergency room - no
Char Decoded sentence: Treated ince ergent  oo p—o
Word Decoded sentence: Treated in emergency room — no 
-Lenght =  50
Input sentence: Admitted to hospital — no
GT sentence: Admitted to hospital - no
Char Decoded sentence: Admitt to thoo alb p—ling  po 
Word Decoded sentence: Admitted to hospital — no 
-Lenght =  50
Input sentence: Add doctors details — yes
GT sentence: Add doctors details - yes
Char Decoded sentence: Add do

-Lenght =  50
Input sentence: Customer Policy #:
GT sentence: Customer Policy #:
Char Decoded sentence: Customer Policy #:
Word Decoded sentence: Customer Policy of 
-Lenght =  50
Input sentence: EE Name:
GT sentence: EE Name:
Char Decoded sentence: EE Name:
Word Decoded sentence: EE Name 
-Lenght =  100
Input sentence: The information below is provided to give you a general summary of your coverage and premium consistent with the benefits outlined in your certificate.
GT sentence: The information below is provided to give you a general summary of your coverage and premium consistent with the benefits outlined in your certificate.
Char Decoded sentence: Thefient Admeer for mener and ofatimplo her chimyour theficater Fom your ored Toling madifour corge 
Word Decoded sentence: The information below is provided to give you a general summary of your coverage and premium consistent with the benefits outlined in your certificates 
-Lenght =  50
Input sentence: In r v r T We Coverage
GT sente

-Lenght =  50
Input sentence: Leave type — full
GT sentence: Leave type - full
Char Decoded sentence: Leave type — fule
Word Decoded sentence: Leave type — full 
-Lenght =  50
Input sentence: Last day worked
GT sentence: Last day worked
Char Decoded sentence: Last dor worked
Word Decoded sentence: Last day worked 
-Lenght =  50
Input sentence: Resident State on LDW — NC
GT sentence: Resident State on LDW - NC
Char Decoded sentence: Resident to ste Last —NC
Word Decoded sentence: Resident State on LDW — NC 
-Lenght =  50
Input sentence: Full shift last day - no
GT sentence: Full shift last day - no
Char Decoded sentence: Full stait last day -no
Word Decoded sentence: Full shift last day - no 
-Lenght =  50
Input sentence: Absence start time
GT sentence: Absence start time
Char Decoded sentence: Absecee strartime
Word Decoded sentence: Absence start time 
-Lenght =  50
Input sentence: Absence end time
GT sentence: Absence end time
Char Decoded sentence: Abseccee enditime
Word Decoded sen

-Lenght =  50
Input sentence: Country: US
GT sentence: Country: US
Char Decoded sentence: Country: US
Word Decoded sentence: Country US 
-Lenght =  50
Input sentence: ZIP:
GT sentence: ZIP:
Char Decoded sentence: ZAP:AG
Word Decoded sentence: ZIP 
-Lenght =  50
Input sentence: Primary address changed: Yes
GT sentence: Primary address changed: Yes
Char Decoded sentence: Primary add s change Y:s
Word Decoded sentence: Primary address changed Yes 
-Lenght =  50
Input sentence: Physical Address:
GT sentence: Physical Address:
Char Decoded sentence: Physical Address:
Word Decoded sentence: Physical Address 
-Lenght =  50
Input sentence: Address Line 1:
GT sentence: Address Line 1:
Char Decoded sentence: Address Line 1:
Word Decoded sentence: Address Line 1: 
-Lenght =  50
Input sentence: Address Line 2:
GT sentence: Address Line 2:
Char Decoded sentence: Address Line 2:
Word Decoded sentence: Address Line 2: 
-Lenght =  50
Input sentence: City:
GT sentence: City:
Char Decoded sentence: City

-Lenght =  100
Input sentence: C. Information About the Patlont (if different {rem Insuroleollcyholdor) Check one: El Spouse El Domestic Partner El Dependent Child
GT sentence: C. Information About the Patient (If different from Insured/Policyholder) Check one:
Char Decoded sentence: C. ITer Informeducarentlour chinc(raificy thons{ou. Fienthe Fient orentithoustovim(loverencrinco{er.
Word Decoded sentence: C Information About the Patlont if different rem Insuroleollcyholdor) Check one El Spouse El Domestic Partner El Dependent Child 
-Lenght =  50
Input sentence: Last Name Suffix Flrst Name Ml
GT sentence: Last Name Suffix First Name MI
Char Decoded sentence: Last Name Suffix First Name MI
Word Decoded sentence: Last Name Suffix Flrst Name Ml 
-Lenght =  50
Input sentence: Date at Birth tmrnlddr’yy)
GT sentence: Date of Birth (mm/dd/yy)
Char Decoded sentence: Date ath Birth Rirged yo
Word Decoded sentence: Date at Birth tmrnlddr’yy) 
-Lenght =  50
Input sentence: Social Security Member


-Lenght =  50
Input sentence: Ifyee‘ date of accident (mmlddiyy) Ell—I
GT sentence: If yes, date of accident (mm/dd/yy)
Char Decoded sentence: If yes mated mmddy (odeation)Ele
Word Decoded sentence: Lfyee date of accident mmlddiyy Elli 
-Lenght =  100
Input sentence: to this oondiiion the result ofhlsfher employment El Yea Efﬁe El Unknown
GT sentence: Is this condition the result of his/her employment Yes No Unknown
Char Decoded sentence: Sthes Tory free Mon working thene provint thenthe ine thene Incerion be Tyrent ond Yeshent work wise
Word Decoded sentence: to this oondiiion the result ofhlsfher employment El Yea Else El Unknown 
-Lenght =  100
Input sentence: Please verify treatment for the accident listed above
GT sentence: Please verify treatment for the accident listed above
Char Decoded sentence: Please apeche for ore provier cattithed co te ticate
Word Decoded sentence: Please verify treatment for the accident listed above 
-Lenght =  50
Input sentence: Detee orService (includ

-Lenght =  50
Input sentence: Transaction identiﬁer:
GT sentence: Transaction identifier:
Char Decoded sentence: Transaction identifie:
Word Decoded sentence: Transaction identiﬁer: 
-Lenght =  50
Input sentence: Patient identiﬁer:
GT sentence: Patient identifier:
Char Decoded sentence: Patient identifie:
Word Decoded sentence: Patient identiﬁer: 
-Lenght =  50
Input sentence: Subtotal:
GT sentence: Subtotal:
Char Decoded sentence: Subtotal:
Word Decoded sentence: Subtotal 
-Lenght =  50
Input sentence: Sales Tax:
GT sentence: Sales Tax:
Char Decoded sentence: Sales Tax:
Word Decoded sentence: Sales Tax 
-Lenght =  50
Input sentence: Total:
GT sentence: Total:
Char Decoded sentence: Total:
Word Decoded sentence: Total 
-Lenght =  50
Input sentence: [customer copy)
GT sentence: (customer copy)
Char Decoded sentence: [customer copy)
Word Decoded sentence: customer copy 
-Lenght =  50
Input sentence: ORTHOA?LANTA, L.L.C.
GT sentence: ORTHOATLANTA, L.L.C.
Char Decoded sentence: ORTHOALATAL

-Lenght =  50
Input sentence: Dateltime:
GT sentence: Date/time:
Char Decoded sentence: Datetime:
Word Decoded sentence: Dateltime: 
-Lenght =  50
Input sentence: Record number:
GT sentence: Record number:
Char Decoded sentence: Record number:
Word Decoded sentence: Record number 
-Lenght =  50
Input sentence: Type:
GT sentence: Type:
Char Decoded sentence: Type:
Word Decoded sentence: Type 
-Lenght =  50
Input sentence: Trace number:
GT sentence: Trace number:
Char Decoded sentence: Trace number:
Word Decoded sentence: Trace number 
-Lenght =  50
Input sentence: Account number:
GT sentence: Account number:
Char Decoded sentence: Account number:
Word Decoded sentence: Account number 
-Lenght =  50
Input sentence: Transaction reference number:
GT sentence: Transaction reference number:
Char Decoded sentence: Transaction reference number:
Word Decoded sentence: Transaction reference number 
-Lenght =  50
Input sentence: Cardholder name:
GT sentence: Cardholder name:
Char Decoded sentence

-Lenght =  100
Input sentence: To assist in the evaluation or administration of my claim(s), I authorize Unum Group, its subsidiaries and duly authorized representatives (“Unum") to share personal health and ﬁnancial information relating to my ciaim with the family members, friends. andfor other third parties listed below:
GT sentence: To assist in the evaluation or administration of my claim(s), I authorize Unum Group, its subsidiaries and duly authorized representatives ("Unum") to share personal health and financial information relating to my claim with the family members, friends, and/or other third parties listed below:
Char Decoded sentence: Total Monthesurnation wirsthes Mouts for ationshipmy thessment posthe the the tintithessms emanessun
Word Decoded sentence: To assist in the evaluation or administration of my claim(s), I authorize Unum Group its subsidiaries and duly authorized representatives (“Unum") to share personal health and nancial information relating to my ciaim wit

-Lenght =  50
Input sentence: Confirmation of Coverage
GT sentence: Confirmation of Coverage
Char Decoded sentence: Confirmation of Coverage
Word Decoded sentence: Confirmation of Coverage 
-Lenght =  50
Input sentence: Customer #:
GT sentence: Customer #:
Char Decoded sentence: Customer #:
Word Decoded sentence: Customer of 
-Lenght =  50
Input sentence: EE Name:
GT sentence: EE Name:
Char Decoded sentence: EE Name:
Word Decoded sentence: EE Name 
-Lenght =  100
Input sentence: The information below is provided to give you a general summary of your coverage and premium consistent with the benefits outlined in your certificate.
GT sentence: The information below is provided to give you a general summary of your coverage and premium consistent with the benefits outlined in your certificate.
Char Decoded sentence: Thefient Admeer for mener and ofatimplo her chimyour theficater Fom your ored Toling madifour corge 
Word Decoded sentence: The information below is provided to give you a gene

-Lenght =  100
Input sentence: Technique: Multiplanar, multisequence imaging of the left knee was performed without the use of intravenous gadolinium.
GT sentence: Technique: Multiplanar, multisequence imaging of the left knee was performed without the use of intravenous gadolinium.
Char Decoded sentence: Then Pofi:he for Umane,re Fur chithe fon Surgentict Suffip:evintionship, Pher nertent onshnes forken
Word Decoded sentence: Technique Multiplanar, multisequence imaging of the left knee was performed without the use of intravenous gadolinium 
-Lenght =  50
Input sentence: FINDINGS:
GT sentence: FINDINGS:
Char Decoded sentence: FINDINGS:
Word Decoded sentence: FINDINGS 
-Lenght =  100
Input sentence: On coronal sequence, there is a horizontal flap tearthroughout the posterior medial meniscal horn with focal radial tearwithin the posterior horn. Separate complex tear at the junction of the posterior medial meniscai horn and root. The medial meniscai body is extruded by 0.3 am. No displa

-Lenght =  50
Input sentence: p—IEDMONT HEALTHéARé“
GT sentence: PIEDMONT HEALTHCARE
Char Decoded sentence: P—IDENT HEALTHCARER
Word Decoded sentence: piedmont HEALTHéARé“ 
-Lenght =  50
Input sentence: Group No:
GT sentence: Group No:
Char Decoded sentence: Group No:
Word Decoded sentence: Group Not 
-Lenght =  50
Input sentence: Date:
GT sentence: Date:
Char Decoded sentence: Date:
Word Decoded sentence: Date 
-Lenght =  50
Input sentence: Explanation of Benefits
GT sentence: Explanation of Benefits
Char Decoded sentence: Explaning on Benefits
Word Decoded sentence: Explanation of Benefits 
-Lenght =  50
Input sentence: Page I 012 (continued on back)
GT sentence: Page 1 of 2 (continued on back)
Char Decoded sentence: Page In012o(e Contined bnsubs)
Word Decoded sentence: Page I 012 continued on back 
-Lenght =  50
Input sentence: Provld-or: GEORGE STONE M0
GT sentence: Provider: GEORGE STONE MD
Char Decoded sentence: Provid-r:GORTH STONE SED0
Word Decoded sentence: Provld-or: GEORGE S

-Lenght =  100
Input sentence: I authorize the followin persons: health care professionals. hospitals, clinics, laboratories, pharmacies and all other medical or me really related providers, facilities or services, rehabilitation profesSionals, vocational evaluators. health plans, insurance companies, third party administrators, insurance producers, insurance service providers. consumer reporting agencres including credit bureaus, GEINEX Se‘mces, LLC, The Advocator Group and other Social Security advocacy vendors, professional licensing bodies, employers, attorneys. ﬁnancial institutions and/or banks, and governmental entities:
GT sentence: I authorize the following persons: health care professionals, hospitals, clinics, laboratories, pharmacies and all other medical or medically related providers, facilities or services, rehabilitation professionals, vocational evaluators. health plans, insurance companies, third party administrators, insurance producers, insurance service providers. 

-Lenght =  100
Input sentence: If I do not sign this authorization or if l_alter or revoke it, except as speciﬁed above, Unum may not be able to evaluate or administer my claim(s),_ which may lead to my claim(s) being denied. i may revoke this authorization at any time by sending written notice to the address above. I understand that revocation Will not apply to any informattﬁn thﬂf ”hi lm I'ﬂt‘tl IDQ‘I'Q nl" ”inﬁll-inn: I'H'ih?
GT sentence: If I do not sign this authorization or if I alter or revoke it, except as specified above, Unum may not be able to evaluate or administer my claim(s), which may lead to my claim(s) being denied. I may revoke this authorization at any time by sending written notice to the address above. I understand that revocation will not apply to any information that Unum requires or discloses
Char Decoded sentence: If knowrevere Yestr of an and ofyse theness chesed to tharke shink rofitestionshiprofithathe the ine
Word Decoded sentence: If I do not sign this aut

-Lenght =  100
Input sentence: Any person who knowingly and with the intent to injure. defraud or deceive an insurance company presents a false or fraudulent claim for payment of a loss or beneﬁt or knowingly presents false information in an application for insurance is guilty of a crime and may be subject to ﬁnes and confinement in prison.
GT sentence: Any person who knowingly and with the intent to injure, defraud or deceive an insurance company presents a false or fraudulent claim for payment of a loss or benefit or knowingly presents false information in an application for insurance is guilty of a crime and may be subject to fines and confinement in prison.
Char Decoded sentence: Aucial chicuredupite the dick ons forint the ponthe for your chinceraifite the inghiviventuristu ing
Word Decoded sentence: Any person who knowingly and with the intent to injured defraud or deceive an insurance company presents a false or fraudulent claim for payment of a loss or benet or knowingly presen

-Lenght =  50
Input sentence: Medical Pl'oxitler Information — Hospitalization
GT sentence: Medical Provider Information - Hospitalization
Char Decoded sentence: Medical Provider Information  —ospitalization
Word Decoded sentence: Medical Pl'oxitler Information — Hospitalization 
-Lenght =  50
Input sentence: Hospital Name. Minnesota Valley Surgery Center
GT sentence: Hospital Name: Minnesota Valley Surgery Center
Char Decoded sentence: Hospital Name. Vilation  Huspotal Farmenter
Word Decoded sentence: Hospital Name Minnesota Valley Surgery Center 
-Lenght =  50
Input sentence: Address Line 1: 1000 W 140th St #102
GT sentence: Address Line 1: 1000 W 140th St #102
Char Decoded sentence: Address Line 1: 1000 140 Ent#102
Word Decoded sentence: Address Line 1: 1000 W 140th St #102 
-Lenght =  50
Input sentence: City. Bm‘nsville
GT sentence: City: Burnsville
Char Decoded sentence: City. BmodE Vill
Word Decoded sentence: City Bm‘nsville 
-Lenght =  50
Input sentence: Claim Tji'pe: VB Acciden

-Lenght =  50
Input sentence: unum‘t
GT sentence: unum
Char Decoded sentence: unum
Word Decoded sentence: unum 
-Lenght =  50
Input sentence: . O . ACCIDENT CLAIM FORM
GT sentence: ACCIDENT CLAIM FORM
Char Decoded sentence: .CCI.ENT CLAIM FLOM CORM
Word Decoded sentence: . O . ACCIDENT CLAIM FORM 
-Lenght =  50
Input sentence: The Beneﬁts Center
GT sentence: The Benefits Center
Char Decoded sentence: The Benefits Center
Word Decoded sentence: The Benets Center 
-Lenght =  100
Input sentence: Call toll—free Monday through Friday, 8 am. to 8 pm. Eastern Time.
GT sentence: Call toll-free Monday through Friday, 8 a.m. to 8 p.m. Eastern Time.
Char Decoded sentence: Callurge —he Ifumber conin ty thari,i8est.the 8or hole ti—ntientirent Ind for
Word Decoded sentence: Call toilufree Monday through Friday 8 am to 8 pm Eastern Time 
-Lenght =  50
Input sentence: ATTENDING PHYSICIAN STATEMENT (PLEASE PRINT)
GT sentence: ATTENDING PHYSICIAN STATEMENT (PLEASE PRINT)
Char Decoded sentence: ATTENDING 

-Lenght =  100
Input sentence: B. Com late this section for disablllt claims onl .
GT sentence: B. Complete this section for disability claims only.
Char Decoded sentence: B. Tabe Dall date the din al and cationsh pall par.d of Acale aron
Word Decoded sentence: By Com late this section for disablllt claims onl . 
-Lenght =  100
Input sentence: if this claim Is related to normal pregnancy, please provide the following:
GT sentence: If this claim Is related to normal pregnancy, please provide the following:
Char Decoded sentence: If thin ale of tarighted foryhy thale pace i,g the call cationsh pheng ielllicales
Word Decoded sentence: if this claim Is related to normal pregnancy please provide the following 
-Lenght =  50
Input sentence: Expected Delivery Date: (mmlddlyy)
GT sentence: Expected Delivery Date: (mm/dd/yy)
Char Decoded sentence: Expected Delivery Date: (mmddyy)
Word Decoded sentence: Expected Delivery Date mmlddlyy 
-Lenght =  50
Input sentence: Actual Delivery Date: (mmlddly

-Lenght =  100
Input sentence: Last Payment Date: 01 124(2018 Guarantor Payments Made Since Last Statement:
GT sentence: Last Payment Date: 01/24/2018 Guarantor Payments Made Since Last Statement:
Char Decoded sentence: Last Patit In Sat: 01a124(2018tientat Satian Ty ne thatienth thati:n01i124(
Word Decoded sentence: Last Payment Date 01 124(2018 Guarantor Payments Made Since Last Statement 
-Lenght =  100
Input sentence: Current Statement Date: 02/09/2018 Current Gua rantor Balance Due:
GT sentence: Current Statement Date: 02/09/2018 Current Guarantor Balance Due:
Char Decoded sentence: Curedmaneserated Tyr :a02/09/2018Instructionshing Remint Syoused
Word Decoded sentence: Current Statement Date 02/09/2018 Current Gua rantor Balance Due 
-Lenght =  50
Input sentence: Summary of Charges
GT sentence: Summary of Charges
Char Decoded sentence: Summary of Charges
Word Decoded sentence: Summary of Charges 
-Lenght =  50
Input sentence: Amount Insurance Amount Patient
GT sentence: Amount Ins

-Lenght =  100
Input sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies (“ num”);
GT sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company of America, Provident Life and Accident Insurance Company, The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies ("Unum");
Char Decoded sentence: Tounid pathing St behturintur as t,e fonit Sprantin Laning ffind pound h por of Act,ng prationship
Word Decoded sentence: To Unum Group and its subsidiaries Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company and persons who evaluate claims for any of those companies of num”); 
-Lenght =  100
Input sentence: So that Unum may evaluate and administer my claims, including providing assistance with

-Lenght =  50
Input sentence: CL-1116 (11/14)
GT sentence: CL-1116 (11/14)
Char Decoded sentence: CL-1116 (11/14)
Word Decoded sentence: CL-1116 (11/14) 
-Lenght =  50
Input sentence: Spou s 0 Information
GT sentence: Spouse Information
Char Decoded sentence: Spouse 0nformation
Word Decoded sentence: Spou s 0 Information 
-Lenght =  50
Input sentence: First Name:
GT sentence: First Name:
Char Decoded sentence: First Name:
Word Decoded sentence: First Name 
-Lenght =  50
Input sentence: Middle Name/Initial:
GT sentence: Middle Name/Initial:
Char Decoded sentence: Middle Name/Initial:
Word Decoded sentence: Middle Name/Initial: 
-Lenght =  50
Input sentence: Last Name:
GT sentence: Last Name:
Char Decoded sentence: Last Name:
Word Decoded sentence: Last Name 
-Lenght =  50
Input sentence: S ocial Security Number:
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number:
Word Decoded sentence: S ocial Security Number 
-Lenght =  50
Input sentence: Binh Date:
GT s

-Lenght =  100
Input sentence: Date of Surgery (mmiddlyy) . El Inpatient X? Outpatient (choose one)
GT sentence: Date of Surgery (mm/dd/yy) Inpatient Outpatient (choose one)
Char Decoded sentence: Date of Fcrithio(e mated)r.urecoummed y mintion rementithe m(
Word Decoded sentence: Date of Surgery mmiddlyy . El Inpatient X Outpatient choose one 
-Lenght =  50
Input sentence: Surgical rocedure CPT Code:
GT sentence: Surgical Procedure CPT Code:
Char Decoded sentence: Surgictal Cocured Code
Word Decoded sentence: Surgical rocedure CPT Code 
-Lenght =  100
Input sentence: FRAUD N TICE: Any person who knowingly files a statement of claim containing faise or misleading information issub'ect to criminal and civil penalties. This includes Attending Physician portions of the claim form.
GT sentence: FRAUD NOTICE: Any person who knowingly files a statement of claim containing false or misleading information is subject to criminal and civil penalties. This includes Attending Physician portions of

-Lenght =  50
Input sentence: QUEETIONS?? Please call
GT sentence: QUESTIONS?? Please call (952) 232-1110
Char Decoded sentence: QUESTE ON Please cale
Word Decoded sentence: QUEETIONS?? Please call 
-Lenght =  50
Input sentence: LAST ‘PAYMENT DATE'
GT sentence: LAST PAYMENT DATE
Char Decoded sentence: LAST PAY PATIENTCAL
Word Decoded sentence: LAST PAYMENT DATE 
-Lenght =  50
Input sentence: ,Your insurance on file is:
GT sentence: Your insurance on file is:
Char Decoded sentence: ,oune instruce informentio:
Word Decoded sentence: Your insurance on file is 
-Lenght =  50
Input sentence: UNITED HEALTHCARE
GT sentence: UNITED HEALTHCARE
Char Decoded sentence: UNITE HEALTHCARE
Word Decoded sentence: UNITED HEALTHCARE 
-Lenght =  50
Input sentence: THAFK 200
GT sentence: THANK YOU
Char Decoded sentence: THAPHO200
Word Decoded sentence: THAFK 200 
-Lenght =  50
Input sentence: TOTAL DUE
GT sentence: TOTAL DUE
Char Decoded sentence: TOTAL DUE
Word Decoded sentence: TOTAL DUE 
-Lenght =  50
I

-Lenght =  50
Input sentence: Employee ID Type; Employcc ID
GT sentence: Employee ID Type: Employee ID
Char Decoded sentence: Employee ID ED E;ployer ICD
Word Decoded sentence: Employee ID Type Employcc ID 
-Lenght =  50
Input sentence: T‘mfﬂnynn TD:
GT sentence: Employee ID:
Char Decoded sentence: Tement no A:
Word Decoded sentence: T‘mfﬂnynn TD 
-Lenght =  50
Input sentence: Employer Nana;
GT sentence: Employer Name:
Char Decoded sentence: Employer Name;
Word Decoded sentence: Employer Nanas 
-Lenght =  50
Input sentence: Gender! Male
GT sentence: Gender: Male
Char Decoded sentence: Gender Male
Word Decoded sentence: Gender Male 
-Lenght =  50
Input sentence: Marital Status; Single
GT sentence: Marital Status: Single
Char Decoded sentence: Marital Status; Singer
Word Decoded sentence: Marital Status Single 
-Lenght =  50
Input sentence: 0c: Title: Rasanﬂxxe:
GT sentence: Occ Title: ResinMixer
Char Decoded sentence: 0h: Tetic: Rasasme:t
Word Decoded sentence: 0c: Title Rasanﬂxxe: 
-Le

-Lenght =  50
Input sentence: Eff Date:
GT sentence: Eff Date:
Char Decoded sentence: Eff Date:
Word Decoded sentence: Eff Date 
-Lenght =  50
Input sentence: 'T'Prm Darn:
GT sentence: Term Date:
Char Decoded sentence: TrPrA Dar
Word Decoded sentence: 'T'Prm Darns 
-Lenght =  50
Input sentence: Plan Larnlngs:
GT sentence: Plan Earnings:
Char Decoded sentence: Plan Larnen
Word Decoded sentence: Plan Larnlngs: 
-Lenght =  50
Input sentence: r , = _ E-Tl ELIZABETH
GT sentence: ST. ELIZABETH
Char Decoded sentence: Th,S VB -TH ZITE Zi
Word Decoded sentence: r , a a Evil ELIZABETH 
-Lenght =  50
Input sentence: EUGEWOOD
GT sentence: EDGEWOOD
Char Decoded sentence: EDGEWOOD
Word Decoded sentence: EUGEWOOD 
-Lenght =  50
Input sentence: OP Notes
GT sentence: OP Notes
Char Decoded sentence: OP Notes
Word Decoded sentence: OP Notes 
-Lenght =  50
Input sentence: MFIN: DOB:
GT sentence: MRN: DOB:
Char Decoded sentence: FIIN: DOB:
Word Decoded sentence: MIND DOB 
-Lenght =  50
Input sentence: Acct

-Lenght =  50
Input sentence: Operative 8‘ Procedure Notes (continued)
GT sentence: Operative & Procedure Notes (continued)
Char Decoded sentence: Ourth Note8 Provider Noure co(ntined u)bdutied
Word Decoded sentence: Operative 8‘ Procedure Notes continued 
-Lenght =  100
Input sentence: E} Note si need in Larkin. John J, MD at 32’2032813 8:36 AM continued
GT sentence: Op Note by Larkin, John J, MD at 3/20/18 8:36 AM (continued)
Char Decoded sentence: E}Mine F Tee Dated Fired .idnkent,Dase D
Word Decoded sentence: E Note si need in Larkins John J MD at 32’2032813 8:36 AM continued 
-Lenght =  50
Input sentence: butt)? Larkin,John J, MD
GT sentence: Author: Larkin, John J MD
Char Decoded sentence: Luth)hon Lar,ing J,MDO
Word Decoded sentence: button Larkin,John J MD 
-Lenght =  50
Input sentence: Igor- Orthopedio
GT sentence: Service: Orthopedic
Char Decoded sentence: Inor-Or propitied
Word Decoded sentence: Igor Orthopedio 
-Lenght =  50
Input sentence: tr Ego: Physioan
GT sentence: Aut

-Lenght =  50
Input sentence: SEE". E LIZAE E'E'H
GT sentence: ST. ELIZABETH
Char Decoded sentence: SEm".EZTIVE EMS
Word Decoded sentence: SEE E LIZAE E'E'H 
-Lenght =  50
Input sentence: Edgewood
GT sentence: EDGEWOOD
Char Decoded sentence: EDGredod
Word Decoded sentence: Edgewood 
-Lenght =  50
Input sentence: FACESHEE’T
GT sentence: FACESHEET
Char Decoded sentence: FAACSEETENTEENTE
Word Decoded sentence: FACESHEET 
-Lenght =  50
Input sentence: MRN: Doe Sex:
GT sentence: MRN: DOB Sex:
Char Decoded sentence: MRN: Dt Dise:
Word Decoded sentence: MRNA Doe Sex 
-Lenght =  50
Input sentence: Patient Demographics .
GT sentence: Patient Demographics
Char Decoded sentence: Patient Deraphearges.
Word Decoded sentence: Patient Demographics . 
-Lenght =  50
Input sentence: Name
GT sentence: Name
Char Decoded sentence: Name
Word Decoded sentence: Name 
-Lenght =  50
Input sentence: Patient ID
GT sentence: Patient ID
Char Decoded sentence: Patient ID
Word Decoded sentence: Patient ID 
-Lenght = 

-Lenght =  50
Input sentence: Operative 8: Procedure Notee
GT sentence: Operative & Procedure Notes
Char Decoded sentence: Operative8:roces Nore Note
Word Decoded sentence: Operative 8: Procedure Notee 
-Lenght =  100
Input sentence: Brief 0p Note by Larkin. John J. MD at 3i‘161i2018 5:82 PM
GT sentence: Brief Op Note by Larkin, John J. MD at 3/16/2018 5:02 PM
Char Decoded sentence: Bite F0ng M  Manes Mie. No WVI. AD  3161e201PMD Nu0ber Mo Nuthe M. Las
Word Decoded sentence: Brief 0p Note by Larkins John J MD at 3i‘161i2018 5:82 PM 
-Lenght =  50
Input sentence: Printed by123?9 at$ﬁ11i18 1:1? PM " Page1
GT sentence: Printed by 12379 at 4/11/18 1:17 PM Page 1
Char Decoded sentence: Printed Ph1239Phy$11181:1 PH"Pagne1
Word Decoded sentence: Printed by123?9 at$ﬁ11i18 1:1? PM " Page1 
-Lenght =  50
Input sentence: unum
GT sentence: unum
Char Decoded sentence: unum
Word Decoded sentence: unum 
-Lenght =  50
Input sentence: SHORT TERM DISABILITY CLAIM FORM
GT sentence: SHORT TERM DISABILITY 

-Lenght =  100
Input sentence: If yes. please pmvida iraﬁlmanl dates (mnﬁddfyy): From Through
GT sentence: If yes, please provide treatment dates (mm/dd/yy): From Through
Char Decoded sentence: If yes. hains fameneriay mey hemaymhe(pay hess):reand .hing ahishilyis
Word Decoded sentence: If yes please pmvida iraﬁlmanl dates (mnﬁddfyy): From Through 
-Lenght =  100
Input sentence: Is the pelican-Li's condition work reiaiad‘? El Yes No C] Unknown
GT sentence: Is the patient’s condition work related?  Yes No Unknown
Char Decoded sentence: Is St Nament o-s Cost Insicatid ays forthe paring thons inding -odicall wip
Word Decoded sentence: Is the pelican-Li's condition work reiaiad El Yes No C Unknown 
-Lenght =  50
Input sentence: Patient’s Height:
GT sentence: Patient’s Height:
Char Decoded sentence: Patient’s Hegati:n
Word Decoded sentence: patients Height 
-Lenght =  50
Input sentence: Paliant’s Weighi
GT sentence: Patient's Weight:
Char Decoded sentence: Palican’s Weilich
Word Decoded sen

-Lenght =  100
Input sentence: Certiﬁcation of Health Care Provider for Employee's Sorlous Health Condition
GT sentence: Certification of Health Care Provider for Employee's Serious Health Condition
Char Decoded sentence: Ceporos rounte or Ffolicshonshipropperict onship If Pow Mines rovint the por
Word Decoded sentence: Certification of Health Care Provider for employees Sorlous Health Condition 
-Lenght =  50
Input sentence: 
GT sentence: Note: If the certification is not completed in English, the employee may be asked to furnish a translation.
Char Decoded sentence: 
Word Decoded sentence: 
-Lenght =  100
Input sentence: duration of: common. malt-nont. otn. Your answer should be your boat nullmnta based upon your mettle-I knowledge, amputation. and mmlnnttnn of tho-patient. Be an memo ”you om; team: sum as "Hm-two. " "unknown," or "ttrdlmmlmta" may not be sum to dehumi- FMM com-ago urnlt your moor-loo: to tho oonolnnn for which your pattern In Booking lam. EM
GT sentence: INSTRUCTION

-Lenght =  100
Input sentence: a. Dmia}ofpasth'ealmam(sl: cHr-IJ-IB‘ arl‘lvﬁf sa paw-v9 wade-r?
GT sentence: a. Date(s) of past treatment(s):
Char Decoded sentence: I.sigy }ess Ine fare(as: wir- f-r youred ryp whe.wanth}d br mier cate(
Word Decoded sentence: a Dmia}ofpasth'ealmam(sl: cHr-IJ-IB‘ arl‘lvﬁf sa paw-v9 wade-r? 
-Lenght =  100
Input sentence: b. Dotooi) of ontiolpatad mam-tantra): (-1"! w"? '1 F
GT sentence: b. Date(s) of anticipated treatment(s):
Char Decoded sentence: b.opport )iont Nome Fin Fon -irst N):h(-1" "
Word Decoded sentence: by Dotooi) of ontiolpatad mam-tantra): (-1"! was '1 F 
-Lenght =  100
Input sentence: 5. 9. Indicate the eatlrnatoo number of monontlallvtalulu). ondtnr estimated duration at madlcal treahnenUvialt:
GT sentence: 8. a. Indicate the estimated number of treatment(s)/visit(s), and/or estimated duration of medical treatment/visit:
Char Decoded sentence: 5.T9.at al t al mpleat and mbe bllt nout mant the5.o9.nthe folmer allt the thent thon Tolly then

-Lenght =  100
Input sentence: 13. Answer the tottowlng questions for en lurerml'ﬂanl leave or at reduced work schedufe.
GT sentence: 13. Answer the following questions for an intermittent leave or a reduced work schedule.
Char Decoded sentence: 13.tion cone fitemented to bessedered co te of th13.s Tururged tonkerged spou the forere and ow l13.
Word Decoded sentence: 13. Answer the tottowlng questions for en lurerml'ﬂanl leave or at reduced work schedufe 
-Lenght =  100
Input sentence: a, It! It madloelly nawnary for the patient to he oFl' work due to tnoclc ﬂora-ups on an Interrntttent boots or to work teal than the petlent‘u normal work echeduia? You a
GT sentence: a. Is it medically necessary for the patient to be off work due to episodic flare-ups on an intermittent basis or to work less than the patient's normal work schedule? Yes No
Char Decoded sentence: F, Tyee Phon of Une Date tient pry Date thent reme,t went ress  Infor the pontertiffit poneanes the,
Word Decoded sentence: a 

-Lenght =  50
Input sentence: If yes. as of what date? lmrnl-ddl'yy)
GT sentence: If yes as of what date? (mm/dd/yy)
Char Decoded sentence: If yes. as of what date w Redi-al you)
Word Decoded sentence: If yes as of what date lmrnl-ddl'yy) 
-Lenght =  50
Input sentence: Physician Information
GT sentence: Physician Information
Char Decoded sentence: Physician Information
Word Decoded sentence: Physician Information 
-Lenght =  100
Input sentence: FRAUD NOTICE: Any person who knowin ly files a statement of claim containing false or misleading information is subioot to criminal and civil Renalties. This ncludeo Attending F’hﬁlcian Bunions or the claim form
GT sentence: FRAUD NOTICE: Any person who knowingly files a statement of claim containing false or misleading information is subject to criminal and civil penalties. This includes Attending Physician portions of the claim form.
Char Decoded sentence: FR Iofing ar:Se ffing Fing forenting and ofyst al and cane F:s ser nasician y are Dating

-Lenght =  100
Input sentence: If your patient has CURRENT RESTRICTIONS (activities patient should not do) andlor LIMITATIONS (activities patient cannot do) list below. Please be speciﬁc and understand that a reply of l“no Work” or “totally disabled" will not enable us to awaluate your patient'e claim for benefits and mayraSultin us having to contact ypufor clariﬁcation.
GT sentence: If your patient has CURRENT RESTRICTIONS (activities patient should not do) and/or LIMITATIONS (activities patient cannot do) list below. Please be specific and understand that a reply of “no work” or “totally disabled” will not enable us to evaluate your patient’s claim for benefits and may result in us having to contact you for clarification.
Char Decoded sentence: If Yos ANTurment In Stit INT Lapt Nation( Rentionshint Insons ind whent Nation Forint Ma(othe Pratid
Word Decoded sentence: If your patient has CURRENT RESTRICTIONS activities patient should not do andlor LIMITATIONS activities patient cannot 

-Lenght =  100
Input sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies (“ num");
GT sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company of America, Provident Life and Accident Insurance Company, The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies ("Unum");
Char Decoded sentence: Tounid pathing St behturintur as t,e fonit Sprantin Laning ffind pound h por of Act,ng prationship
Word Decoded sentence: To Unum Group and its subsidiaries Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company and persons who evaluate claims for any of those companies of num"); 
-Lenght =  100
Input sentence: So that Unum may evaluate and administer my claims, including providing assistance with

-Lenght =  50
Input sentence: First Name:
GT sentence: First Name:
Char Decoded sentence: First Name:
Word Decoded sentence: First Name 
-Lenght =  50
Input sentence: Middle Name/initial:
GT sentence: Middle Name/Initial:
Char Decoded sentence: Middle Name/inilali:
Word Decoded sentence: Middle Name/initial: 
-Lenght =  50
Input sentence: Last Name:
GT sentence: Last Name:
Char Decoded sentence: Last Name:
Word Decoded sentence: Last Name 
-Lenght =  50
Input sentence: Social Security Number:
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number:
Word Decoded sentence: Social Security Number 
-Lenght =  50
Input sentence: Bil‘lh Date:
GT sentence: Birth Date:
Char Decoded sentence: Birlh Date:
Word Decoded sentence: bill Date 
-Lenght =  50
Input sentence: Gender:
GT sentence: Gender:
Char Decoded sentence: Gender:
Word Decoded sentence: Gender 
-Lenght =  50
Input sentence: Claim Event Information
GT sentence: Claim Event Information
Char Decoded sentence:

-Lenght =  100
Input sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies (“ num”);
GT sentence: To Unum Group and its subsidiaries, Unum Life Insurance Company of America, Provident Life and Accident Insurance Company, The Paul Revere Life Insurance Company, and persons who evaluate claims for any of those companies (“Unum”);
Char Decoded sentence: Tounid pathing St behturintur as t,e fonit Sprantin Laning ffind pound h por of Act,ng prationship
Word Decoded sentence: To Unum Group and its subsidiaries Unum Life Insurance Company ofAmerica, Provident Life and Accident Insurance Compan , The Paul Revere Life Insurance Company and persons who evaluate claims for any of those companies of num”); 
-Lenght =  100
Input sentence: So that Unum may evaluate and administer my claims, including providing assistance with

-Lenght =  50
Input sentence: Encounter Date: 02! 1 2/20 1:
GT sentence: Encounter Date: 02/12/2018
Char Decoded sentence: Encountabil D:t02 1 2/201:
Word Decoded sentence: Encounter Date 02! 1 2/20 1: 
-Lenght =  100
Input sentence: Progress Notes by Souha Hakim, MD at 02/12/18 1445
GT sentence: Progress Notes by Souha Hakim, MD at 02/12/18 1445
Char Decoded sentence: Provider First Name My   , 02/12/18h144Saict the mm
Word Decoded sentence: Progress Notes by Souha Hakims MD at 02/12/18 1445 
-Lenght =  50
Input sentence: Author: Souha Hakim, MD
GT sentence: Author: Souha Hakim, MD
Char Decoded sentence: Author: Suraima Hak,s MD
Word Decoded sentence: Author Souha Hakims MD 
-Lenght =  50
Input sentence: Service; (none)
GT sentence: Service: (none)
Char Decoded sentence: Serche;(ofie)
Word Decoded sentence: Service none 
-Lenght =  50
Input sentence: Author Type: Physician
GT sentence: Author Type: Physician
Char Decoded sentence: Author Type: Physician
Word Decoded sentence: Author T

-Lenght =  100
Input sentence: Constitutional: He appears well-developed. He is active.
GT sentence: Constitutional: He appears well-developed. He is active.
Char Decoded sentence: Conth plons pl:is at of Ureacio- harklica.ed ation  arainaconc:d
Word Decoded sentence: Constitutional He appears well-developed. He is active 
-Lenght =  50
Input sentence: HENT:
GT sentence: HENT:
Char Decoded sentence: HEEN:
Word Decoded sentence: HENT 
-Lenght =  50
Input sentence: Right Ear: Tympanic membrane normal.
GT sentence: Right Ear: Tympanic membrane normal.
Char Decoded sentence: Right Ext:e Emploment Type no malm.tal
Word Decoded sentence: Right Ears Tympanic membrane normal 
-Lenght =  50
Input sentence: Nose: Nase normal.
GT sentence: Nose: Nose normal.
Char Decoded sentence: Nos:Name narmal.
Word Decoded sentence: Nose Nase normal 
-Lenght =  50
Input sentence: Mouth/Throat: Oropharynx is clear.
GT sentence: Mouth/Throat: Oropharynx is clear.
Char Decoded sentence: Mourt/ary O:thorations li

-Lenght =  100
Input sentence: If your blood pressure on today‘s visit was greater than 120/80 you may be at risk of developing pro-hypertension or hypertension. We recommend that you follow-up with your Primary Care Provider for further evaluation.
GT sentence: If your blood pressure on today's visit was greater than 120/80 you may be at risk of developing pre-hypertension or hypertension. We recommend that you follow-up with your Primary Care Provider for further evaluation.
Char Decoded sentence: If in  ous old path ofher arat The pay of thes optifiot ofyst the phy dur chily thof you the phe por
Word Decoded sentence: If your blood pressure on todays visit was greater than 120/80 you may be at risk of developing pro-hypertension or hypertension We recommend that you followup with your Primary Care Provider for further evaluation 
-Lenght =  100
Input sentence: If your BMI on today's visit was less than 18.5 or greater than or equal to 25, we recommend that you follow-up with your Pr

-Lenght =  50
Input sentence: Social Secun'ty Number:
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number:
Word Decoded sentence: Social security Number 
-Lenght =  50
Input sentence: Birth Date:
GT sentence: Birth Date:
Char Decoded sentence: Birth Date:
Word Decoded sentence: Birth Date 
-Lenght =  50
Input sentence: Gender:
GT sentence: Gender:
Char Decoded sentence: Gender:
Word Decoded sentence: Gender 
-Lenght =  50
Input sentence: Language Preference:
GT sentence: Language Preference:
Char Decoded sentence: Language Preference:
Word Decoded sentence: Language Preference 
-Lenght =  50
Input sentence: Address Line 1:
GT sentence: Address Line 1:
Char Decoded sentence: Address Line 1:
Word Decoded sentence: Address Line 1: 
-Lenght =  50
Input sentence: City:
GT sentence: City:
Char Decoded sentence: City:
Word Decoded sentence: City 
-Lenght =  50
Input sentence: Stater‘Pronnce:
GT sentence: State/Province:
Char Decoded sentence: StateProvince:
Word

-Lenght =  50
Input sentence: Patient Name:
GT sentence: Patient Name:
Char Decoded sentence: Patient Name:
Word Decoded sentence: Patient Name 
-Lenght =  50
Input sentence: Patient DOB:
GT sentence: Patient DOB:
Char Decoded sentence: Patient DOB:
Word Decoded sentence: Patient DOB 
-Lenght =  50
Input sentence: Date ofVisit: February II 2018
GT sentence: Date of Visit: February 11 2018
Char Decoded sentence: Date of Firs: Visity Il In2018
Word Decoded sentence: Date opvisit February II 2018 
-Lenght =  50
Input sentence: Seen By: Vijay Patel, MD
GT sentence: Seen By: Vijay Patel, MD
Char Decoded sentence: Seen Ba: Vital Pate, MD
Word Decoded sentence: Seen By Vijay Patel MD 
-Lenght =  50
Input sentence: Location: MedExprcss Jackson, N West Ave
GT sentence: Location: MedExpress Jackson, N West Ave
Char Decoded sentence: Location: Mess access or cac,es   Wese
Word Decoded sentence: Location medexpress Jackson N West Ave 
-Lenght =  50
Input sentence: 1325 North West Avenue
GT sentenc

-Lenght =  50
Input sentence: Liotmtry". US
GT sentence: Country: US
Char Decoded sentence: Litmer U".URE
Word Decoded sentence: Liotmtry". US 
-Lenght =  50
Input sentence: Business Telephone: (517) 783-1779
GT sentence: Business Telephone: (517) 783-1779
Char Decoded sentence: Business Telephone: (517) 783-1779
Word Decoded sentence: Business Telephone (517) 783-1779 
-Lenght =  50
Input sentence: Date ofFirst Visit 02/12/2018
GT sentence: Date of First Visit: 02/12/2018
Char Decoded sentence: Date of First Visit02/12/2018
Word Decoded sentence: Date ofFirst Visit 02/12/2018 
-Lenght =  100
Input sentence: VIPdira] Pl'mirlel' Information — Hospitalirm'inn
GT sentence: Medical Provider Information - Hospitalization
Char Decoded sentence: IIdicad]Fory NamePrisiterincurn—ng tonmand
Word Decoded sentence: VIPdira] Pl'mirlel' Information — Hospitalirm'inn 
-Lenght =  50
Input sentence: Hospital N anle: Medlixpress
GT sentence: Hospital Name: MedExpress
Char Decoded sentence: Hospital Name

-Lenght =  50
Input sentence: 31199018
GT sentence: 3/19/2018
Char Decoded sentence: 31199018
Word Decoded sentence: 31199018 
-Lenght =  50
Input sentence: Claim Details
GT sentence: Claim Details
Char Decoded sentence: Claim Details
Word Decoded sentence: Claim Details 
-Lenght =  50
Input sentence: Member ND:
GT sentence: Member No:
Char Decoded sentence: Member ND:
Word Decoded sentence: Member ND 
-Lenght =  50
Input sentence: Amount Billed:
GT sentence: Amount Billed: $ 264.00
Char Decoded sentence: Amount Billed:
Word Decoded sentence: Amount Billed 
-Lenght =  50
Input sentence: Allowed Amnum:
GT sentence: Allowed Amount: $ 136.22
Char Decoded sentence: Allowed Amoun:
Word Decoded sentence: Allowed Amnum: 
-Lenght =  50
Input sentence: Service Date: 03:0912013
GT sentence: Service Date: 03/09/2018
Char Decoded sentence: Service Dat:03:0912013
Word Decoded sentence: Service Date 03:0912013 
-Lenght =  50
Input sentence: Paramount Paid: $ 101.22
GT sentence: Paramount Paid: $ 101

-Lenght =  50
Input sentence: Spou s 0 Information
GT sentence: Spouse Information
Char Decoded sentence: Spouse 0nformation
Word Decoded sentence: Spou s 0 Information 
-Lenght =  50
Input sentence: First Name:
GT sentence: First Name:
Char Decoded sentence: First Name:
Word Decoded sentence: First Name 
-Lenght =  50
Input sentence: Middle Name/initial:
GT sentence: Middle Name/Initial:
Char Decoded sentence: Middle Name/inilali:
Word Decoded sentence: Middle Name/initial: 
-Lenght =  50
Input sentence: Last Name:
GT sentence: Last Name:
Char Decoded sentence: Last Name:
Word Decoded sentence: Last Name 
-Lenght =  50
Input sentence: Social Security Number:
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number:
Word Decoded sentence: Social Security Number 
-Lenght =  50
Input sentence: Birth Date:
GT sentence: Birth Date:
Char Decoded sentence: Birth Date:
Word Decoded sentence: Birth Date 
-Lenght =  50
Input sentence: Gender:
GT sentence: Gender:
Char 

-Lenght =  50
Input sentence: Postal Code: 43623
GT sentence: Postal Code: 43623
Char Decoded sentence: Postal Code: 43623
Word Decoded sentence: Postal Code 43623 
-Lenght =  50
Input sentence: Country". US
GT sentence: Country: US
Char Decoded sentence: Country".US
Word Decoded sentence: Country US 
-Lenght =  50
Input sentence: Business Telephone: (419) 474- 1210
GT sentence: Business Telephone: (419) 474-1210
Char Decoded sentence: Business Telephone: (419) 474-1210
Word Decoded sentence: Business Telephone (419) 474- 1210 
-Lenght =  50
Input sentence: Business Fax (419) 474-3076
GT sentence: Business Fax: (419) 474-3076
Char Decoded sentence: Business Fax (419)474-3076
Word Decoded sentence: Business Fax (419) 474-3076 
-Lenght =  50
Input sentence: Date ofFirst Visit: 03/09/2018
GT sentence: Date of First Visit: 03/09/2018
Char Decoded sentence: Date of First Visi:03/09/2018
Word Decoded sentence: Date ofFirst Visit 03/09/2018 
-Lenght =  50
Input sentence: Date ofNen Visit: 03/

-Lenght =  50
Input sentence: State/PrOVirice:
GT sentence: State/Province:
Char Decoded sentence: State/Province:
Word Decoded sentence: State/PrOVirice: 
-Lenght =  50
Input sentence: Postal Code
GT sentence: Postal Code:
Char Decoded sentence: Postal Code
Word Decoded sentence: Postal Code 
-Lenght =  50
Input sentence: Country.
GT sentence: Country:
Char Decoded sentence: Country.
Word Decoded sentence: Country 
-Lenght =  50
Input sentence: Best Phone Number to be Reached During the Day:
GT sentence: Best Phone Number to be Reached During the Day:
Char Decoded sentence: Best Phone Number to be Reached During the Day:
Word Decoded sentence: Best Phone Number to be Reached During the Day 
-Lenght =  50
Input sentence: Email Address:
GT sentence: Email Address:
Char Decoded sentence: Email Address:
Word Decoded sentence: Email Address 
-Lenght =  50
Input sentence: unumQ
GT sentence: unum
Char Decoded sentence: unum
Word Decoded sentence: unumQ 
-Lenght =  50
Input sentence: The Bene

-Lenght =  100
Input sentence: I also authorize Unum to disclose My Information to the following persons (for the purpose of reporting claim status or experience, or so that the moment may carry out health care operations, claims payment, administrative or audit functions related to any benefit, plan or claim): any employee benefit plan sponsored b my employer; any person providing services or insurance benefits to (or on behalf of) my employer, any suc plan or claim, or any benefit offered by Unum; or, the Social Securit Administration. Unum will not condition the payment of insurance benefits on whether I authorize the disc osures described in this paragraph. For the purposes of these disclosures by Unum, this authorization is valid for one year or for the length of time otherwise permitted by law.
GT sentence: I also authorize Unum to disclose My Information to the following persons (for the purpose of reporting claim status or experience, or so that the recipient may carry out heal

-Lenght =  50
Input sentence: Time ofAccident: 14:45
GT sentence: Time of Accident: 14:45
Char Decoded sentence: Time of Actien:14:45
Word Decoded sentence: Time ofaccidenl 14:45 
-Lenght =  50
Input sentence: Accident Date: 02/170018
GT sentence: Accident Date: 02/17/2018
Char Decoded sentence: Accident Date: 02/17001
Word Decoded sentence: Accident Date 02/170018 
-Lenght =  50
Input sentence: Diagnosis Code: knee injury
GT sentence: Diagnosis Code: knee injury
Char Decoded sentence: Diagnosis Code: Cene injury
Word Decoded sentence: Diagnosis Code knee injury 
-Lenght =  50
Input sentence: Sn rg er)’ Information
GT sentence: Surgery Information
Char Decoded sentence: Sngrgery)Information
Word Decoded sentence: Sn rg era Information 
-Lenght =  50
Input sentence: 15 Surgery Required: No
GT sentence: Is Surgery Required: No
Char Decoded sentence: 15 Surgery Required: No
Word Decoded sentence: 15 Surgery Required No 
-Lenght =  50
Input sentence: Medical Proxitler Information , Physici

In [20]:
input_texts = ['SUBJECTIVE: This is a S-year-old +@W his left great toe with the handleh lacration.',
               'Thera was no handlebarthe lacration.',
               'Patiet last tet is needing this for school at this',
               'OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time.',
               'Left great toe the dorsl surface, extending ta th active hemorrhage at this time.',
               'Th anaathetized with a cotton ball sat Left this in place for 20 minutes.',
               'with Betadine again and injected th he tolerated very well.',
               'The wound sutures. Antibiotic eintment and g',
               'Patient tolerated very well. Pat',
               'IMPRESSION: Lacration te left grs',
               'PLAN: Patent is to do dressing ch advised as far as checking tha waurn it with soap and water.',
               'Sutures oy have any problems prior te that tim ona teaspoon three times a day rer Ibuprofan far pain, discomfort. Cg',
               'hite male who accidently dropped a bike onto ar end hitting the left great toe,'
               'causing a guard to the end of the bike, which caused anus shot is more that three years ago and time.',
               'nlse ef 105 and regular, resprations 286,% on room air.',
               'Patient rates hia pain at — there i15 noted a 3-om laceration across a lateral aspect of tha toe.',
               'There is no e toa ir cleansed with Betadine.',
               'It is then urated with 5 cu of 2% Hylocaine plain.',
               'We then cleansed ae toa with 3 cc of 2% Xylacaina plain',
               'which was then clesed with five 5-0 Prolene ressure dressing was then applied to the tos',
               'paient is given DPT 0.5 ee intramucular (IM).at toe.',
               'Kefylex 250 mg per 5 ml, the next seven days.',
               'He may use Tylenol or 11 if any problems.',
               'Unum Life Insurance Company of America 2211',               
               'Congress Street Portland, Maine 04122',
               'APPLICATION FOR GROUP CRITICAL LLNESS INSURANCE',
               'I Evidence of Insurability',
               '',
               'Application Type: @ New Enrollee Change to',
               'Existing Coverage  Reinstatement  Internal',
               'Replacement  Late Applicant  Rehire SECTION 1:',
               'Employee(Applicant) Information  Always',
               'Complete Employee Name(First, Middle, Last)',
               'Social Security Number Nikolas J Jones',
               '123 - 456 - 7890 Home Address(Street/ PO Box)',
               'Gender 1634 Stewert St  F  M City Date of Birth',
               '(mm / dd / yyyy) Seattle 06 / 15 / 1991 State Zip',
               'Code Home Phone # Washington 98101 854-555-1212',
               'Are you Actively at Work? Employee ID / Payroll #',
               ' Yes  No55624 a.Are you a U.S.Citizen or',
               'Canadian Citizen working in the U.S.? b.Are you',
               'legally authorized to work in  Yes  No(If No',
               'reply to part b) the U.S.?  Yes  No Employer',
               'Name Group Number Date of Hire(mm/ dd / yyyy)',
               'Facebook 11 - 555566 11 / 30 / 2016 Occupation',
               'Eligibility Class Software Engineer 7 Scheduled',
               'Number of Work Hours per Week Work Phone # 35',
               '854-555-6622 SECTION 2: Spouse Information ',
               'Complete Only if applying for Spouse coverage Name',
               '(First, Middle, Last) Social Security Number',
               'Gender Date of Birth(mm / dd / yyyy) Does the',
               '1019 - 07 - AZ 1',
              'if claint is for a child, please state your relationship 10 the child',
              'date of accident 3d _ time of accident ram. 0 p.m.',
              'have you slopped working? (of yes [1 no if yes, what was the last day that you worked? (mm/ddryy)_| —3 | —{% cnslamegs bil =']
               
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    pre_corrected_sentence = word_spell_correct(input_text)
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    target_text = gt_texts[i]

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    print('Word Decoded sentence:', corrected_sentence) 
    print('\n')



Input sentence: SUBJECTIVE: This is a S-year-old +@W his left great toe with the handleh lacration.
Word Decoded sentence: SUBJECTIVE This is a S-year-old Now his left great toe with the handle laceration 


Input sentence: Thera was no handlebarthe lacration.
Word Decoded sentence: Thera was no handlebarthe laceration 


Input sentence: Patiet last tet is needing this for school at this
Word Decoded sentence: Patient last tet is needing this for school at this 


Input sentence: OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time.
Word Decoded sentence: OBJECTIVE : The temp is 99.8, the f tha blood pressure 99/64, O2 sat 94 8/10 at this time 


Input sentence: Left great toe the dorsl surface, extending ta th active hemorrhage at this time.
Word Decoded sentence: Left great toe the dorsal surface extending ta th active hemorrhage at this time 


Input sentence: Th anaathetized with a cotton ball sat Left this in place for 20 minutes.
Word Decoded 

In [21]:

input_texts = ['text',
'',
'',
'',
'',
' ',
'',
'',
'',
'Fai',
'10',
'7521509',
'(FISTDEOO)',
'at',
'11/3/2017',
'5:23:19',
'from',
'-9373834004',
'Req',
'IC',
'2017:1030525109:292E.',
'Page',
'4',
'of',
'5',
'(C)',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'11/3/2017',
'FRI',
'8:26',
'FAX',
'2373834004',
'Kjooas00s',
'',
'',
'',
'as3-ursasy3',
'11:30:11',
'11/2/2017',
'vis',
'',
'',
'',
'®',
'®',
'&',
'ACCIDENT',
'CLAIM',
'FORM',
'',
'uu',
'num’',
'Tha',
'Benelits',
'Canter',
'',
'P.O.',
'Bax',
'100158,',
'Calumbin,',
'EC',
'20202-3150',
'',
'Tol-frea:',
'1-800-635-5587',
'Fax:',
'1-800-447-2488',
'',
'Gall',
'toll-free',
'Monday',
'through',
'Friday,',
'8',
'a.m.',
'lo',
'8',
'p.m,',
'Eagtarn',
'Time.',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'[',
'ATTENDING',
'PHYSICIAN',
'STATEMENT',
']',
'',
'',
'IneurexiPolicyt',
'alcar',
'Hama',
'(Lael',
'Name,',
'Flis!',
'Nama,',
'MI,',
'Suffix)',
'Data',
'of',
'Risth',
'{msmidrfyy)',
'-',
'',
'',
'Faupi',
'Nana',
'{Laut',
'Hume,',
'Flial',
'Numa,',
'1',
'Sut)',
'Dats',
'al',
'Bln',
'rAvad)',
'Ul',
'_',
'',
'-[ECIpENT',
'DETAILS',
']',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'a',
'thls',
'Gundilan',
'the',
'result',
'of',
'a',
'acddental',
'inury?',
'ves',
'O',
'No',
'if',
'yas,',
'dale',
'of',
'accident',
'qre/ddlyy)',
'[1',
'0]',
'[z]e',
'[=]',
'',
'',
'',
'Is',
'Mig',
'condition',
'Lhe',
'result',
'of',
'hefer',
'employment',
'£1',
'Yes',
'pNo',
'[1',
'Unknown',
'',
'',
'',
'Plaaze',
'verily',
'treatment',
'for',
'the',
'accident',
'lalad',
'above.',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'Dalaw',
'of',
'Diagnosis',
'Diagncsis',
'Description',
'Prosadure',
'Procedure',
'Dascription',
'',
'Branden',
'(Including',
'|',
'Cudo',
'(GD)',
'ous',
'',
'Confinement)',
'eR',
'ap',
'HAS',
'TTT',
'',
'BEEF',
'eR',
'',
'wiz]',
'.',
'S33,5XxA',
'Hh',
'rioes',
'ey',
'race',
'Word',
'',
'awqd]',
'',
'weak',
'3',
'n',
'[aveny',
'[d',
'',
'wifi',
'Wl',
'',
'oa',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'Has',
'lhe',
'pallet',
'bean',
'trastad',
'for',
'tha',
'same',
'ar',
'&',
'S(tilar',
'candillan',
'by',
'anolher',
'phyalelan',
'In',
'tha',
'past?',
'[1',
'Yen',
'Bho',
'',
'M',
'yor,',
'pioona',
'provid',
'tha',
'fares:',
'',
'',
'',
' ',
'',
'',
'',
'Diageosis:',
'Tramiment',
'Daten:',
'',
'',
'',
' ',
'',
'',
'',
' ',
'',
'',
'',
'id',
'ya.1',
'#dving',
'Lhe',
'patient',
'to',
'clap',
'working?',
'RECEIVED',
'',
'It',
'yes,',
'B8',
'of',
'what',
'cate?',
'(mmidkyy)',
'',
'',
'',
'[23]',
'[117]',
'',
'',
'',
'[Ih',
'cielih',
'fa',
'rotated',
'to',
'normal',
'prepnency,',
'please',
'grovida',
'tha',
'idliawing:',
'NOV',
'',
'Expecigd',
'Delivery',
'Dale',
'(mimicd/yy)',
'Aclual',
'Delivery',
'Dale',
'{mmiddlyy',
'',
'',
'',
' ',
'',
'',
'',
'Phyeiclan',
'informaiton',
'HUMAN',
'REGOURCITE',
'',
'',
'',
'FRAUD',
'NOTICE:',
'Any',
'person',
'wha',
'knowingly',
'files',
'&',
'statement',
'of',
'clalm',
'containing',
'FALSE',
'or',
'misleading',
'information',
'8',
'',
'subject',
'to',
'criminal',
'and',
'elvil',
'penallies.',
'This',
'includes',
'Attending',
'Physician',
'portions',
'of',
'the',
'claim',
'farm.',
'',
'',
'',
'CS',
'yma',
'SEAS',
'Ta',
'hve',
'glan',
'=',
'',
'The',
'above',
'statements',
'ara',
'trun',
'And',
'rompints',
'to',
'tho',
'bot',
'of',
'my',
'knowledge',
'and',
'bolluf.',
'',
'',
'',
'Physician',
'Name',
'(Lea!',
'Name,',
'Firat',
'Name,',
'MI,',
'Suita)',
'Plases',
'Print',
'Co',
'FHman',
'log',
'Mm',
'',
'/',
'‘',
'',
'',
'',
'Medical',
'Speclaty',
'[Tr',
'eactal-',
']',
'|',
'D',
'of',
'r',
'of',
'Ch',
'2',
'',
'2',
'Le',
'',
'',
'==',
'Zoi!',
'M',
'o',
'“Fanart',
'',
'',
'=',
'Balfrone',
'ie',
'2',
'Sle',
'iu',
'',
'il',
'HY',
'BY',
'1942',
'Fax',
'Number',
'yz—',
'43',
'-8',
'7775',
'Fhyalafans',
'Tax',
'ID',
'Number.',
'',
'',
'',
'Aro',
'you',
'refateq',
'to',
'hiv',
'pollen?',
'0',
'Yoe',
'LlMo',
'|',
'yes,',
'wal',
'iv',
'the',
'relelianshipT',
'',
'',
'',
' ',
'',
'',
'',
' ',
' ',
'',
'',
'',
'Physlclan',
'Slgnature',
'Date',
'',
'CL-1023',
'-2717',
'=',
'',
'',
'',
' ',
'',
'',
'',
'—',]
               
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    pre_corrected_sentence = word_spell_correct(input_text)
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    target_text = gt_texts[i]

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    #print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    
    #print('Word Decoded sentence:', corrected_sentence) 
    print(corrected_sentence) 
    #print('\n')



text 








Fai 
10 
7521509 
(FISTDEOO) 
at 
11/3/2017 
5:23:19 
from 
-9373834004 
Req 
IC 
2017:1030525109:292E. 
Page 
4 
of 
5 
Act 











11/3/2017 
FRI 
8:26 
FAX 
2373834004 
Kjooas00s 



as3-ursasy3 
11:30:11 
11/2/2017 
vis 



a 
a 
& 
ACCIDENT 
CLAIM 
FORM 

UU 
numb 
Tha 
Benelits 
Canter 

Poor 
Bax 
100158, 
Calumbin, 
EC 
20202-3150 

Tol-frea: 
1-800-635-5587 
Fax 
1-800-447-2488 

Gall 
toilufree 
Monday 
through 
Friday 
8 
army 
lo 
8 
pump 
Eastern 
Time 







































[ 
ATTENDING 
PHYSICIAN 
STATEMENT 
] 


IneurexiPolicyt 
altar 
Hama 
Lael 
Name 
Flisk 
Nama 
MID 
Suffix 
Data 
of 
Rieth 
{msmidrfyy) 
- 


Fault 
Nana 
Claut 
Hume 
Flail 
Numac 
1 
Suto 
Dats 
al 
Ban 
bravado 
Ul 
a 

-[ECIpENT 
DETAILS 
] 



















a 
thls 
Gundilan 
the 
result 
of 
a 
accidental 
inury 
ves 
O 
No 
if 
yas 
dale 
of 
accident 
qre/ddlyy) 
[1 
0] 
ze 
[=] 



Is 
Mig 
condition 
Lhe 
result 
of 
refer 
employment 
£1 
Yes 
pno 
[1 


In [22]:
input_texts = ['☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement']
for input_text in input_texts:
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    #print(input_text)
    pre_corrected_sentence = word_spell_correct(input_text)
    #print(pre_corrected_sentence)

    input_text_ = input_text
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    target_text = gt_texts[i]

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text_)
    #print('-Lenght = ', len_range)
    print('Input sentence:', input_text_)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    print('Char Decoded sentence:', decoded_sentence)   
    
    print('Word Decoded sentence:', corrected_sentence) 
    #print(corrected_sentence) 
    #print('\n')



Input sentence: ☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement
Char Decoded sentence: * Flesign th pate Une or ton EMDatientt the ton Unes Frenef
Word Decoded sentence: ☑ @ New Enrollee ☐ Change to Existing Coverage ☐ Reinstatement 


# Handwriting correction

In [30]:
num_samples = 1000000

OCR_data = os.path.join(data_path, 'handwritten_output.txt')
input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len=10000, min_sent_len=0, delimiter='|', gt_index=0, prediction_index=1)

# Sample data
print(len(input_texts))
for i in range(100):
    print(input_texts[i], '\n', gt_texts[i])

3749
 is insisting on a policy of change . 
 
 is insisting on a policy of change . 
 12/29/17 
 
 12/29/17 
 SIAL)TH 
 
 SLP(L) THA 
 Arcadia CA 91007 
 
 Arcadia CA 91007 
 (012) 667 9375 
 
 (012) 6674375 
 In this 200-fathom trench the herring do not tood the botton . 
 
 In this 200-fathom trench the herring do not touch the bottom . 
 43638556X1 
 
 43638556X1 
 Pretoria 
 
 Pretoria 
 fiddaling about with bils of cost . 
 
 fiddling about with bills of cost . 
 ( Fig. 3) . Loop threed tound liite finger t 
 
 ( Fig. 3 ) . Loop thread round little finger , 
 200681383 
 
 200681383 
 for a working week of 34 to 36 houns . 
 
 for a working week of 34 to 36 hours . 
 Daugher 
 
 Daugther 
 Electronically Signed 
 
 Electronically Signed 
 15122 
 
 15122 
 50 
 
 50 
 ShE WAS MOVVD A Picvic TablE TO SWEAP leAveS And DROREd iT oN hER BSC 
 
 ShE wAs Moving A Picnic TAble To swEEp lEAvEs And DRoPEd iT on hER ToE 
 0724603309 
 
 0724603509 
 lwas car ping a box into the basement Nhe

In [24]:
#for seq_index in range(len(input_texts)):
results = open('RESULTS_HW.md', 'w')
results.write('|HW sentence|Corrected sentence|GT sentence|\n')
results.write('|---------------|-----------|----------------|\n')
decoded_sentences = []
corrected_sentences = []
for input_text, target_text in zip(input_texts, target_texts):
    len_range = max_sent_lengths[-1] # Take the longest range
    for length in max_sent_lengths:
        if(len(input_text) < length):
            len_range = length
            break
    #print(len_range)
    #print(input_text)
    pre_corrected_sentence = word_spell_correct(input_text)
    #print(pre_corrected_sentence)
    
    input_text = clean_up_sentence(input_text, vocab_to_int[len_range])
    encoder_input_data = vectorize_data(input_texts=[input_text], max_encoder_seq_length=max_encoder_seq_length[len_range], num_encoder_tokens=num_encoder_tokens[len_range], vocab_to_int=vocab_to_int[len_range])



    #target_text = target_texts[i]

    input_seq = encoder_input_data
    #print(input_seq.shape)
    #print(max_decoder_seq_length[len_range])
    #print(max_decoder_seq_length)

    decoded_sentence,_  = decode_sequence(input_seq, encoder_model[len_range], decoder_model[len_range], num_decoder_tokens[len_range],  max_decoder_seq_length[len_range], vocab_to_int[len_range], int_to_vocab[len_range])
    corrected_sentence = word_spell_correct(input_text)
    #print('-Lenght = ', len_range)
    print('Input sentence:', input_text)
    #print('Spell Decoded sentence:', pre_corrected_sentence) 
    #print('Char Decoded sentence:', decoded_sentence)   
    
    print('Word Decoded sentence:', corrected_sentence)
    print('Ground truth sentence:', target_text)
    results.write(' | ' + input_text + ' | ' + corrected_sentence + ' | '+ target_text.strip() + ' | \n')
    decoded_sentences.append(decoded_sentence)
    corrected_sentences.append(corrected_sentence)
    #print(corrected_sentence) 
    #print('\n')
    
results.close()


Input sentence: is insisting on a policy of change .
Word Decoded sentence: is insisting on a policy of change . 
Ground truth sentence: 	is insisting on a policy of change . 

Input sentence: 12/29/17
Word Decoded sentence: 12/29/17 
Ground truth sentence: 	12/29/17 

Input sentence: SIAL)TH
Word Decoded sentence: SIAL)TH 
Ground truth sentence: 	SLP(L) THA 

Input sentence: Arcadia CA 91007
Word Decoded sentence: Arcadia CA 91007 
Ground truth sentence: 	Arcadia CA 91007 

Input sentence: (012) 667 9375
Word Decoded sentence: (012) 667 9375 
Ground truth sentence: 	(012) 6674375 

Input sentence: In this 200-fathom trench the herring do not tood the botton .
Word Decoded sentence: In this 200-fathom trench the herring do not good the cotton . 
Ground truth sentence: 	In this 200-fathom trench the herring do not touch the bottom . 

Input sentence: 43638556X1
Word Decoded sentence: 43638556X1 
Ground truth sentence: 	43638556X1 

Input sentence: Pretoria
Word Decoded sentence: Pretori

Input sentence: Gabarel Adam Viles
Word Decoded sentence: Gabriel Adam Viles 
Ground truth sentence: 	Gabriel Adam VIles 

Input sentence: whiplash- S13.9X85
Word Decoded sentence: whiplash S13.9X85 
Ground truth sentence: 	whiplash - S13.4XXD 

Input sentence: 9:15
Word Decoded sentence: 9:15 
Ground truth sentence: 	9:15 

Input sentence: 3/2/2018
Word Decoded sentence: 3/2/2018 
Ground truth sentence: 	3/2/2018 

Input sentence: 0825648359
Word Decoded sentence: 0825648359 
Ground truth sentence: 	0825648359 

Input sentence: mother
Word Decoded sentence: mother 
Ground truth sentence: 	mother 

Input sentence: Robertson
Word Decoded sentence: Robertson 
Ground truth sentence: 	Robertson 

Input sentence: 045149682X2
Word Decoded sentence: 045149682X2 
Ground truth sentence: 	045149682X2 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: 0834540900
Word Decoded sentence: 0834540900 
Gr

Input sentence: being limited or an adjutment being made
Word Decoded sentence: being limited or an adjustment being made 
Ground truth sentence: 	being limited or an adjustment being made 

Input sentence: 2/25/18
Word Decoded sentence: 2/25/18 
Ground truth sentence: 	2/25/18 

Input sentence: Mis Nelaney of the script , and the great advantages
Word Decoded sentence: Mis Delaney of the script , and the great advantages 
Ground truth sentence: 	Miss Delaney of the script , and the great advantages 

Input sentence: 1906-09-03
Word Decoded sentence: 1906-09-03 
Ground truth sentence: 	1986-09-05 

Input sentence: 30 7 18
Word Decoded sentence: 30 7 18 
Ground truth sentence: 	30 7 18 

Input sentence: a man to mate sinple , stright- fervard thngs , and in
Word Decoded sentence: a man to mate single , stright forward things , and in 
Ground truth sentence: 	a man to make simple , straight-forward things , and in 

Input sentence: Wili be 11 German divisions in Nate
Word Decoded sentenc

Input sentence: P.O. Box 8012 Gneenstone 1616
Word Decoded sentence: Poor Box 8012 Greenstone 1616 
Ground truth sentence: 	P.O. Box 8012 Greenstone 1616 

Input sentence: DR. KeiTh Helton
Word Decoded sentence: DRY KeiTh Helton 
Ground truth sentence: 	Dr. KeiTh Helton 

Input sentence: 02/27/18
Word Decoded sentence: 02/27/18 
Ground truth sentence: 	02/27/18 

Input sentence: NtP
Word Decoded sentence: NTP 
Ground truth sentence: 	NIP 

Input sentence: 4
Word Decoded sentence: 4 
Ground truth sentence: 	4 

Input sentence: cintence Datahase 03-189
Word Decoded sentence: intense Database 03-189 
Ground truth sentence: 	Sentence Database P03-189 

Input sentence: In the Pollowing year , 1380 , the last and
Word Decoded sentence: In the Following year , 1380 , the last and 
Ground truth sentence: 	In the following year , 1380 , the last and 

Input sentence: 402-609-1500
Word Decoded sentence: 402-609-1500 
Ground truth sentence: 	402-609-1500 

Input sentence: Karnes, Jonathon L.
Word

Input sentence: WARBNER STR. At SOmRRSKT WRS
Word Decoded sentence: WARBIER STR At Somerset WAS 
Ground truth sentence: 	WARBLER STR. 21 SOMERSET WES 

Input sentence: 02-26-18
Word Decoded sentence: 02-26-18 
Ground truth sentence: 	02-26-18 

Input sentence: "
Word Decoded sentence: " 
Ground truth sentence: 	" 

Input sentence: 083 685 0052
Word Decoded sentence: 083 685 0052 
Ground truth sentence: 	083 685 0052 

Input sentence: to disaus a commoh course of action . Sir Roy is
Word Decoded sentence: to disas a common course of action . Sir Roy is 
Ground truth sentence: 	to discuss a common course of action . Sir Roy is 

Input sentence: Feb 21/2018
Word Decoded sentence: Feb 21/2018 
Ground truth sentence: 	Feb 2nd 2018 

Input sentence: 04102
Word Decoded sentence: 04102 
Ground truth sentence: 	04102 

Input sentence: M84.3A5A
Word Decoded sentence: M84.3A5A 
Ground truth sentence: 	M84.375A 

Input sentence: 0r261538
Word Decoded sentence: 0r261538 
Ground truth sentence: 	DN 

Input sentence: baycatting the London talks on the
Word Decoded sentence: boycotting the London talks on the 
Ground truth sentence: 	boycotting the London talks on the 

Input sentence: 50 Halkett Str
Word Decoded sentence: 50 Halkett Str 
Ground truth sentence: 	50 Halkett str 

Input sentence: 004-41495-760170
Word Decoded sentence: 004-41495-760170 
Ground truth sentence: 	004-41495-760170 

Input sentence: a foun in the industrial North of England
Word Decoded sentence: a foun in the industrial North of England 
Ground truth sentence: 	a town in the industrial North of England 

Input sentence: gelaotkly ; it is less expensive than most
Word Decoded sentence: gelaotkly ; it is less expensive than most 
Ground truth sentence: 	quickly ; it is less expensive than most 

Input sentence: issue behind the Health Service ; the other
Word Decoded sentence: issue behind the Health Service ; the other 
Ground truth sentence: 	issue behind the Health Service ; the other 

Input sentence: 29

Input sentence: GAROENE RonO 3R, GATE2 MEPRANO, 1685
Word Decoded sentence: GARONNE ron 3R, GATE2 MEPRANO, 1685 
Ground truth sentence: 	GARDENS ROAD 38 , GATE 2 , MIDRAND , 1685 

Input sentence: would still fovw the abolitiom of the Hause
Word Decoded sentence: would still fow the abolition of the Hause 
Ground truth sentence: 	would still favour the abolition of the House 

Input sentence: 2-7-2700700
Word Decoded sentence: 2-7-2700700 
Ground truth sentence: 	207-230 0700 

Input sentence: Mr. Defenbaker 36 per cent , for Mr.
Word Decoded sentence: Mr Diefenbaker 36 per cent , for Mr 
Ground truth sentence: 	Mr. Diefenbaker 36 per cent , for Mr. 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: Nov' 2011
Word Decoded sentence: Novo 2011 
Ground truth sentence: 	Nov' 2011 

Input sentence: Please sce the following.
Word Decoded sentence: Please SCE the following 
Ground truth sentence: 	Please see the following. 

Input sentence: March 28, 201

Input sentence: the dondon talks on thi Potecenale's
Word Decoded sentence: the london talks on THI Potecenale's 
Ground truth sentence: 	the London talks on the Protectorate's 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: 46517
Word Decoded sentence: 46517 
Ground truth sentence: 	46517 

Input sentence: 187.8
Word Decoded sentence: 187.8 
Ground truth sentence: 	187.8 

Input sentence: 2-1/07/2018
Word Decoded sentence: 2-1/07/2018 
Ground truth sentence: 	27/07/2018 

Input sentence: Sir Roy's United Federal Party is
Word Decoded sentence: Sir royal United Federal Party is 
Ground truth sentence: 	Sir Roy's United Federal Party is 

Input sentence: consulted in May 1834 .
Word Decoded sentence: consulted in May 1834 . 
Ground truth sentence: 	consulted in May 1834 . 

Input sentence: 02/03/2018
Word Decoded sentence: 02/03/2018 
Ground truth sentence: 	02/03/2018 

Input sentence: UNEOLNVILLE
Word Decoded sentence: UNEOLNVILLE 
Ground trut

Input sentence: from this principle . It is a great pity that the
Word Decoded sentence: from this principle . It is a great pity that the 
Ground truth sentence: 	from this principle . It is a great pity that the 

Input sentence: nou-combatant help was wanted ; but they
Word Decoded sentence: nou-combatant help was wanted ; but they 
Ground truth sentence: 	non-combatant help was wanted ; but they 

Input sentence: Ortho Neb. ER
Word Decoded sentence: Ortho Nebe ER 
Ground truth sentence: 	Ortho Nebe. ER 

Input sentence: 6903225656080
Word Decoded sentence: 6903225656080 
Ground truth sentence: 	6903225656080 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: to America anyway .
Word Decoded sentence: to America anyway . 
Ground truth sentence: 	to America anyway . 

Input sentence: at his Wowhington Press conference admilted
Word Decoded sentence: at his Washington Press conference admitted 
Ground truth sentence: 	at his Washington Press conf

Input sentence: UA Ankle Euctmre S82.852A
Word Decoded sentence: UA Ankle Euctmre S82.852A 
Ground truth sentence: 	Left Ankle Fracture S82.852A 

Input sentence: 880626 5177087
Word Decoded sentence: 880626 5177087 
Ground truth sentence: 	8806265117087 

Input sentence: Wational independence Party ( 280000 member )
Word Decoded sentence: National independence Party ( 280000 member ) 
Ground truth sentence: 	National Independence Party ( 280,000 members ) 

Input sentence: TK
Word Decoded sentence: TK 
Ground truth sentence: 	TK 

Input sentence: S63.502A
Word Decoded sentence: S63.502A 
Ground truth sentence: 	S63.502A 

Input sentence: When Mr. Brown sut down Labour MPs
Word Decoded sentence: When Mr Brown but down Labour MPs 
Ground truth sentence: 	When Mr. Brown sat down Labour MPs 

Input sentence: 01-06-2018
Word Decoded sentence: 01-06-2018 
Ground truth sentence: 	01-06-2018 

Input sentence: wiched , old - . " Mr. Brown went on : " We
Word Decoded sentence: wished , old - . 

Input sentence: nupping out ports fom hordood . Mat men
Word Decoded sentence: cupping out ports fom hardwood . Mat men 
Ground truth sentence: 	ripping out parts from hardwood . Most men 

Input sentence: Ky
Word Decoded sentence: Ky 
Ground truth sentence: 	Ky 

Input sentence: Mothine A Masweneng
Word Decoded sentence: Methine A Masweneng 
Ground truth sentence: 	Mokhine N Masweneng 

Input sentence: Pain in left foat
Word Decoded sentence: Pain in left foot 
Ground truth sentence: 	Pain in left foot 

Input sentence: ubout 150,000,000 has been frioeen
Word Decoded sentence: about 150,000,000 has been frozen 
Ground truth sentence: 	about 150,000,000 has been frozen . 

Input sentence: mother
Word Decoded sentence: mother 
Ground truth sentence: 	mother 

Input sentence: Unknown
Word Decoded sentence: Unknown 
Ground truth sentence: 	Unknown 

Input sentence: 22/20/18
Word Decoded sentence: 22/20/18 
Ground truth sentence: 	2/20/18 

Input sentence: injury per pt
Word Decoded senten

Input sentence: 3-5-18
Word Decoded sentence: 3-5-18 
Ground truth sentence: 	3-5-18 

Input sentence: IA
Word Decoded sentence: IA 
Ground truth sentence: 	IA 

Input sentence: 761212 0052084
Word Decoded sentence: 761212 0052084 
Ground truth sentence: 	761212 0052084 

Input sentence: PA
Word Decoded sentence: PA 
Ground truth sentence: 	PA 

Input sentence: Green Bay
Word Decoded sentence: Green Bay 
Ground truth sentence: 	Green Bay 

Input sentence: 10
Word Decoded sentence: 10 
Ground truth sentence: 	10 

Input sentence: receiving regular National Asristance
Word Decoded sentence: receiving regular National Assistance 
Ground truth sentence: 	receiving regular National Assistance 

Input sentence: 2-22-18
Word Decoded sentence: 2-22-18 
Ground truth sentence: 	2-22-18 

Input sentence: 3-13-18
Word Decoded sentence: 3-13-18 
Ground truth sentence: 	3-13-18 

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sentence: 	Christi

Input sentence: alleged association with organisations black-
Word Decoded sentence: alleged association with organisations black 
Ground truth sentence: 	alleged association with organisations black- 

Input sentence: 3/2/18
Word Decoded sentence: 3/2/18 
Ground truth sentence: 	3/2/18 

Input sentence: Philadelphia
Word Decoded sentence: Philadelphia 
Ground truth sentence: 	Philadelphia 

Input sentence: 02/21/18
Word Decoded sentence: 02/21/18 
Ground truth sentence: 	02/21/18 

Input sentence: adjournments , until April 7 , finally had to be content to relamn
Word Decoded sentence: adjournments , until April 7 , finally had to be content to relamp 
Ground truth sentence: 	adjournments , until April 7 , finally had to be content to return 

Input sentence: Mr Maeleod went on with the conference
Word Decoded sentence: Mr Macleod went on with the conference 
Ground truth sentence: 	Mr. Macleod went on with the conference 

Input sentence: 08-06-1964
Word Decoded sentence: 08-06-1964 

Input sentence: Pllsract 1 37, Rensbury
Word Decoded sentence: Pllsract 1 37, Rensbury 
Ground truth sentence: 	Vlokstreet 37, Rensburg 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: Kanana
Word Decoded sentence: Kanana 
Ground truth sentence: 	Kanana 

Input sentence: reoume tolay . President Kennedy tolay
Word Decoded sentence: resume today . President Kennedy today 
Ground truth sentence: 	resume today . President Kennedy today 

Input sentence: in the churchyard , " sacied to the memory off "
Word Decoded sentence: in the churchyard , " sacred to the memory off " 
Ground truth sentence: 	in the churchyard , " sacred to the memory of " - 

Input sentence: CC
Word Decoded sentence: CC 
Ground truth sentence: 	CC 

Input sentence: Raethaankes reasonane representanom , hut to
Word Decoded sentence: Raethaankes reasoning representanom , hut to 
Ground truth sentence: 	to Mr Kaunda's re

Input sentence: elmatie-contace obat. Com
Word Decoded sentence: elmatie-contace oath Com 
Ground truth sentence: 	elmarie_conradieebat.com 

Input sentence: 044766389X9
Word Decoded sentence: 044766389X9 
Ground truth sentence: 	044766389X9 

Input sentence: 99024
Word Decoded sentence: 99024 
Ground truth sentence: 	99024 

Input sentence: author with Miss Delaney of the script , and
Word Decoded sentence: author with Miss Delaney of the script , and 
Ground truth sentence: 	author with Miss Delaney of the script , and 

Input sentence: Po Bo S70 Greabrakrive 6525
Word Decoded sentence: Po Bo S70 Greabrakrive 6525 
Ground truth sentence: 	PO Box. 870 Greatbrakriver 6525 

Input sentence: Pebelieved he would perform " outstondy
Word Decoded sentence: Prebelieved he would perform " outstondy 
Ground truth sentence: 	He believed he would perform " outstanding 

Input sentence: 6/27/17
Word Decoded sentence: 6/27/17 
Ground truth sentence: 	6/27/17 

Input sentence: Now we have the strke

Input sentence: ritapot 25(a)gmal.con
Word Decoded sentence: ritapot 25(a)gmal.con 
Ground truth sentence: 	ritapot25egmail.com 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 16.09.1988
Word Decoded sentence: 16.09.1988 
Ground truth sentence: 	16.09.1988 

Input sentence: recorder orug gertly through 8.0, 7.0, 6.0
Word Decoded sentence: recorder org gently through 8.0, 7.0, 6.0 
Ground truth sentence: 	recorder swung gently through 8.0 , 7.0 , 6.0 

Input sentence: TOOTH BRORE HAD TO OET A ROOT CANAL AND CRONN
Word Decoded sentence: TOOTH BROKE HAD TO OUT A ROOT CANAL AND CROWN 
Ground truth sentence: 	TOOTH BROKE HAD TO GET A ROOT CANAL AND CROWN 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: apartheid is bring applied ever more
Word Decoded sentence: apartheid is bring applied ever more 
Ground truth sentence: 	apartheid is being applied ever more 

Input sentence: 2/22/18
Word Decoded se

Input sentence: 2-7-12
Word Decoded sentence: 2-7-12 
Ground truth sentence: 	2-7-12 

Input sentence: 33,3
Word Decoded sentence: 33,3 
Ground truth sentence: 	33,3 

Input sentence: 3 . Hold loop in place between thumb
Word Decoded sentence: 3 . Hold loop in place between thumb 
Ground truth sentence: 	3 . Hold loop in place between thumb 

Input sentence: to say thet its 400 troops in the Congo
Word Decoded sentence: to say the its 400 troops in the Congo 
Ground truth sentence: 	to say that its 400 troops in the Congo 

Input sentence: 12/20/17
Word Decoded sentence: 12/20/17 
Ground truth sentence: 	12/20/17 

Input sentence: 99213
Word Decoded sentence: 99213 
Ground truth sentence: 	99213 

Input sentence: in Northor Rhodesia , but the Colonial Secretary ,
Word Decoded sentence: in Norther Rhodesia , but the Colonial Secretary , 
Ground truth sentence: 	in Northern Rhodesia , but the Colonial Secretary , 

Input sentence: 044411006X8
Word Decoded sentence: 044411006X8 
Ground tr

Input sentence: timbee as shown in tig. 1 . Although the timber will have aleeady
Word Decoded sentence: timber as shown in Tiga 1 . Although the timber will have already 
Ground truth sentence: 	timber as shown in Fig. 1 . Although the timber will have already 

Input sentence: 80 Sese Hill JR pr No
Word Decoded sentence: 80 Sese Hill JR pr No 
Ground truth sentence: 	80 Jess Hill Jr Dr NG 

Input sentence: M47.812
Word Decoded sentence: M47.812 
Ground truth sentence: 	M47.812 

Input sentence: of 700 , told Kennedy that he should
Word Decoded sentence: of 700 , told Kennedy that he should 
Ground truth sentence: 	of 100 , told Kennedy that he should 

Input sentence: 045141073X2
Word Decoded sentence: 045141073X2 
Ground truth sentence: 	045141073X2 

Input sentence: 1 . Grasp thread near end between thumb
Word Decoded sentence: 1 . Grasp thread near end between thumb 
Ground truth sentence: 	1 . Grasp thread near end between thumb 

Input sentence: 17642136407467 23945X9443544491X4

Input sentence: (706) 353 1630
Word Decoded sentence: (706) 353 1630 
Ground truth sentence: 	(706) 353 1630 

Input sentence: PA
Word Decoded sentence: PA 
Ground truth sentence: 	PA 

Input sentence: PN295605
Word Decoded sentence: PN295605 
Ground truth sentence: 	DN295605 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: MO
Word Decoded sentence: MO 
Ground truth sentence: 	MO 

Input sentence: 9925
Word Decoded sentence: 9925 
Ground truth sentence: 	99215 

Input sentence: PO. B0X 50617 WRTERFAoNT CT 50617
Word Decoded sentence: POT B0X 50617 WRTERFAoNT CT 50617 
Ground truth sentence: 	P.O. BOX 50617 WATERFRONT CT 50617 

Input sentence: 49201
Word Decoded sentence: 49201 
Ground truth sentence: 	492041 

Input sentence: 3-19-18
Word Decoded sentence: 3-19-18 
Ground truth sentence: 	3-19-18 

Input sentence: A
Word Decoded sentence: A 
Ground truth sentence: 	4 

Input sentence: 

Input sentence: 23 BRUNES COURT,17 PALn AVENUE, KEMPTON PARK 1619
Word Decoded sentence: 23 BRUTES COURT,17 Paln AVENUE KEMPTON PARK 1619 
Ground truth sentence: 	23 BRUWES COURT , 17 PALM AVENUE, KEMPTON PARK 1619 

Input sentence: 0818462310
Word Decoded sentence: 0818462310 
Ground truth sentence: 	0818462310 

Input sentence: 44041121X3 4 40583758X4
Word Decoded sentence: 44041121X3 4 40583758X4 
Ground truth sentence: 	44044121X3 & 40583758X4 

Input sentence: Ar Roys Federal Government in the
Word Decoded sentence: Ar Rays Federal Government in the 
Ground truth sentence: 	Sir Roy's Federal Government in the 

Input sentence: 3-10-2018
Word Decoded sentence: 3-10-2018 
Ground truth sentence: 	3-10-2018 

Input sentence: Other Family
Word Decoded sentence: Other Family 
Ground truth sentence: 	Other Family 

Input sentence: 02/09/2018
Word Decoded sentence: 02/09/2018 
Ground truth sentence: 	02/09/2018 

Input sentence: WI
Word Decoded sentence: WI 
Ground truth sentence: 	WI 

I

Input sentence: PA
Word Decoded sentence: PA 
Ground truth sentence: 	PA 

Input sentence: has to puss Mr. Weaver's nomination blgire it
Word Decoded sentence: has to puss Mr weavers nomination Blaire it 
Ground truth sentence: 	has to pass Mr. Weaver's nomination before it 

Input sentence: 8 Rilemoods, Church Road, Walmer, Pott Elizabch , 6070
Word Decoded sentence: 8 Rilemoods, Church Road Warmer Pott Elizabeth , 6070 
Ground truth sentence: 	8 Riverwoods, Church Road, Walmer, Port Elizabeth , 6070 

Input sentence: Post Operative visit
Word Decoded sentence: Post Operative visit 
Ground truth sentence: 	Post Operative visit 

Input sentence: 02/09/1958
Word Decoded sentence: 02/09/1958 
Ground truth sentence: 	02/09/1958 

Input sentence: 12/29/17
Word Decoded sentence: 12/29/17 
Ground truth sentence: 	12/29/17 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: Son
Word Decoded sentence: Son 
Ground truth sentence: 	Seun 

Input sentence: 314

Input sentence: At 9.40 Mr. Edusei , Hinister of Frenpert and pie-
Word Decoded sentence: At 9.40 Mr Edusei , Minister of Frenpert and pie 
Ground truth sentence: 	At 9.40 Mr. Edusei , Minister of Transport and pro- 

Input sentence: a direct anwer . George Brown
Word Decoded sentence: a direct answer . George Brown 
Ground truth sentence: 	a direct answer . George Brown 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: Feits Lne nif be tt as Labour M8
Word Decoded sentence: Fits Lne if be tt as Labour M8 
Ground truth sentence: 	Foot's line will be that as Labour MPs 

Input sentence: resume today . PRESIDENI KENNEDY today
Word Decoded sentence: resume today . PRESIDENT KENNEDY today 
Ground truth sentence: 	resume today . PRESIDENT KENNEDY today 

Input sentence: 402-354-0707
Word Decoded sentence: 402-354-0707 
Ground truth sentence: 	402-354-0707 

Input sentence: EAT TN CALDREN'S HSPITAL
Word Decoded sentence: EAT TN CALDRONS HOSPITAL 
Groun

Input sentence: 25-1356849
Word Decoded sentence: 25-1356849 
Ground truth sentence: 	25-1356849 

Input sentence: with bills of cost .
Word Decoded sentence: with bills of cost . 
Ground truth sentence: 	with bills of cost . 

Input sentence: Joplin
Word Decoded sentence: Joplin 
Ground truth sentence: 	Joplin 

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: 15002
Word Decoded sentence: 15002 
Ground truth sentence: 	15002 

Input sentence: story , and the marriage of the central charae
Word Decoded sentence: story , and the marriage of the central charge 
Ground truth sentence: 	story , and the marriage of the central charac- 

Input sentence: Dr. Anthony MeBrdde
Word Decoded sentence: Dry Anthony MeBrdde 
Ground truth sentence: 	Dr. Anthony McBride 

Input sentence: polyarthralgias
Word Decoded sentence: polyarthralgias 
Ground truth sentence: 	polyarthralgias 

Input sentence: weed 

Input sentence: munority 2 THERE is no evidence that the
Word Decoded sentence: minority 2 THERE is no evidence that the 
Ground truth sentence: 	minority ? THERE is no evidence that the 

Input sentence: 2-12-18
Word Decoded sentence: 2-12-18 
Ground truth sentence: 	2-12-18 

Input sentence: Pretoria
Word Decoded sentence: Pretoria 
Ground truth sentence: 	Pretoria 

Input sentence: 1:05 pm
Word Decoded sentence: 1:05 pm 
Ground truth sentence: 	1:05 pm 

Input sentence: the first of these is due in 1963 Meanwhile , each
Word Decoded sentence: the first of these is due in 1963 Meanwhile , each 
Ground truth sentence: 	the first of these is due in 1963 . Meanwhile , each 

Input sentence: egy-case on June 6 . Both the Chinose
Word Decoded sentence: egy-case on June 6 . Both the Chinese 
Ground truth sentence: 	egg-case on June 6 . Both the Chinese 

Input sentence: othir materals ; it is apueant material
Word Decoded sentence: other materials ; it is apueant material 
Ground truth sen

Input sentence: and wet parements , the school play-groundts ,
Word Decoded sentence: and wet pavements , the school play-groundts , 
Ground truth sentence: 	and wet pavements , the school play-grounds , 

Input sentence: t be demonthianons " Jesterday tr Roys
Word Decoded sentence: t be demonthianons " Yesterday tr Rays 
Ground truth sentence: 	to be demonstrations . " Yesterday Sir Roy's 

Input sentence: 317485173
Word Decoded sentence: 317485173 
Ground truth sentence: 	31-1483173 

Input sentence: 02-03-18
Word Decoded sentence: 02-03-18 
Ground truth sentence: 	02-03-18 

Input sentence: 57701
Word Decoded sentence: 57701 
Ground truth sentence: 	57701 

Input sentence: 2-9-2018
Word Decoded sentence: 2-9-2018 
Ground truth sentence: 	2-9-2018 

Input sentence: in wohil the indee fell by 1.8 to 37.5 per ant of th average for
Word Decoded sentence: in while the indeed fell by 1.8 to 37.5 per ant of th average for 
Ground truth sentence: 	in which the index fell by 1.8 to 97.5 per 

Input sentence: his days as Brtain's chief UN dele-
Word Decoded sentence: his days as brains chief UN dele 
Ground truth sentence: 	his days as Britain's chief UN dele- 

Input sentence: ANITA JOHANNA HARRIS
Word Decoded sentence: ANITA JOHANNA HARRIS 
Ground truth sentence: 	ANITA JOHANNA HARRIS 

Input sentence: A0L5
Word Decoded sentence: A0L5 
Ground truth sentence: 	ADLS 

Input sentence: 6thiparse which always resulted , and in the
Word Decoded sentence: 6thiparse which always resulted , and in the 
Ground truth sentence: 	6impasse which always resulted , and in the 

Input sentence: 9563724X5 41350303X8
Word Decoded sentence: 9563724X5 41350303X8 
Ground truth sentence: 	9563724X5 41350303X8 

Input sentence: Arthur Bruce Norman
Word Decoded sentence: Arthur Bruce Norman 
Ground truth sentence: 	Arthur Bruce Norman 

Input sentence: 3/14/18
Word Decoded sentence: 3/14/18 
Ground truth sentence: 	3/14/18 

Input sentence: Torn Rotorr Cup
Word Decoded sentence: Torn Rotors Cup 
G

Input sentence: M.D.
Word Decoded sentence: Made 
Ground truth sentence: 	M.D. 

Input sentence: 9811235790087
Word Decoded sentence: 9811235790087 
Ground truth sentence: 	9811235790087 

Input sentence: mammoth hitchens of the 18th century-
Word Decoded sentence: mammoth kitchens of the 18th century 
Ground truth sentence: 	mammoth kitchens of the 18th century - 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: 26/07/2018
Word Decoded sentence: 26/07/2018 
Ground truth sentence: 	26/07/2018 

Input sentence: 36 PAUUASTRAAT, NHITE EIT SAL0AINA, 7395
Word Decoded sentence: 36 PAUUASTRAAT, WHITE IT SAL0AINA, 7395 
Ground truth sentence: 	36 DAHLIA STRAAT , WHITE CITY, SALDANHA, 7395 

Input sentence: America's dollar reserves . Dr. Adenauer's
Word Decoded sentence: americans dollar reserves . Dry Adenauer's 
Ground truth sentence: 	America's dollar reserves . Dr. Adenauer's 

Input sentence: the increased charges is a wicked ,
Word Decoded sentenc

Input sentence: Paobable Rotston Cuff trprE
Word Decoded sentence: Paobable Rosston Cuff there 
Ground truth sentence: 	Paobable Rotator Cuff teAR 

Input sentence: Capacity 77
Word Decoded sentence: Capacity 77 
Ground truth sentence: 	Capacity 77 

Input sentence: RSA
Word Decoded sentence: RSA 
Ground truth sentence: 	RSA 

Input sentence: Posinet Sife 38 Elspark 1462
Word Decoded sentence: Posnet Sife 38 Elspark 1462 
Ground truth sentence: 	Postnet Suite 38 Elspark 1462 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: He bellived he wuld perfom outsandeng
Word Decoded sentence: He bellied he would perform outstanding 
Ground truth sentence: 	He believed he would perform " outstanding 

Input sentence: (5)
Word Decoded sentence: (5) 
Ground truth sentence: 	(5) 

Input sentence: Deep South . The negro is Mr. Robot i
Word Decoded sentence: Deep South . The negro is Mr Robot i 
Ground truth sentence: 	Deep South . The negro is Mr. Robert 

Input sentence: lumbar sprain/strain
Word Decoded sentence: lumbar sprain/strain 
Ground truth sentence: 	lumbar sprain/strain 

Input sentence: in the Hy hoch , only 30 miles from Glengow , a
Word Decoded sentence: in the Hy hoch , only 30 miles from Glasgow , a 
Ground truth sentence: 	in the Holy Loch , only 30 miles from Glasgow , a 

Input sentence: MO
Word Decoded sentence: MO 
Ground truth sentence: 	MO 

Input sentence: at te is to be backed by Mr. Will
Word Decoded sentence: at te is to be backed by Mr Will 
Ground truth sentence: 	and he is to be backed by Mr. Will 

Input sentence: 2.22.18
Word Decoded sentence: 2.22.18 
Ground truth sentence: 	2.22.18 

Input sentence: negotiatias with Sir Roy's representative ,
Word Decoded sentence: negotiations with Sir royal representative , 
Ground truth sentence: 	negotiations with Sir Roy's representative , 

Input sentence: Commonwealtn enay facility passinleda
Word Decoded sentence: Commonwealth envy facility passinleda 
Ground tru

Input sentence: GA
Word Decoded sentence: GA 
Ground truth sentence: 	GA 

Input sentence: Christopher Radkowski, MD.
Word Decoded sentence: Christopher Radkowski, MD 
Ground truth sentence: 	Christopher Radkowski, MD. 

Input sentence: thet he has abandoned pens to visit Desient de
Word Decoded sentence: the he has abandoned pens to visit Descent de 
Ground truth sentence: 	And he has abandoned plans to visit President de 

Input sentence: ruthlessly .
Word Decoded sentence: ruthlessly . 
Ground truth sentence: 	ruthlessly . 

Input sentence: Orthopedic
Word Decoded sentence: Orthopedic 
Ground truth sentence: 	Orthopedic 

Input sentence: NA
Word Decoded sentence: NA 
Ground truth sentence: 	NA 

Input sentence: 544-524-8000
Word Decoded sentence: 544-524-8000 
Ground truth sentence: 	541-524-8000 

Input sentence: P.080X 15203, Beacon Bey, S205
Word Decoded sentence: P.080X 15203, Beacon Bey S205 
Ground truth sentence: 	P.O. Box 15203, Beacon Bay , 5205 

Input sentence: of Investi

Input sentence: 4
Word Decoded sentence: 4 
Ground truth sentence: 	4 

Input sentence: 2/16/18
Word Decoded sentence: 2/16/18 
Ground truth sentence: 	2/16/18 

Input sentence: 01-08-2018
Word Decoded sentence: 01-08-2018 
Ground truth sentence: 	01-08-2018 

Input sentence: Nkrumeh's Convention Porty " ofter powerpul eddreses
Word Decoded sentence: Nkrumeh's Convention Porty " ofter powerful address 
Ground truth sentence: 	Nkrumah's Convention Party " after powerful addresses 

Input sentence: 02/23/18
Word Decoded sentence: 02/23/18 
Ground truth sentence: 	02/23/18 

Input sentence: in the 1960s , no cure has been found for
Word Decoded sentence: in the 1960s , no cure has been found for 
Ground truth sentence: 	in the 1960s , no cure has been found for 

Input sentence: NC
Word Decoded sentence: NC 
Ground truth sentence: 	NC 

Input sentence: Apfrox, 3:30 pm
Word Decoded sentence: Approx 3:30 pm 
Ground truth sentence: 	Approx, 3:30 pm 

Input sentence: Lifting Hravg-Mercbandise

Input sentence: " Untoimliche Geskuchten " ( 1800 ) , five ghost staries
Word Decoded sentence: " Untoimliche Geskuchten " ( 1800 ) , five ghost stories 
Ground truth sentence: 	" Unheimliche Geschichten " ( 1920 ) , five ghost stories 

Input sentence: And lrey are (1 ) Malze ; ( 8 be- Herbs ) ( 3 ) Salt-waler and
Word Decoded sentence: And grey are (1 ) Male ; ( 8 be Herbs ) ( 3 ) Salt-waler and 
Ground truth sentence: 	And they are ( 1 ) Matzo ; ( 2 ) Bitter Herbs ; ( 3 ) Salt-water and 

Input sentence: 220 May12 West
Word Decoded sentence: 220 May12 West 
Ground truth sentence: 	220 May 12 West 

Input sentence: Chiropractic
Word Decoded sentence: Chiropractic 
Ground truth sentence: 	Chiropractic 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: N.A.
Word Decoded sentence: Near 
Ground truth sentence: 	N.A. 

Input sentence: SugamaC. J. Jansen Vom Vunon
Word Decoded sentence: SugamaC. J Jansen Vom Upon 
Ground truth sentence: 	Susama -

Input sentence: FAuny MED CN3
Word Decoded sentence: Fanny MED CN3 
Ground truth sentence: 	FAMILY MEDICINE 

Input sentence: Dogter
Word Decoded sentence: Dogter 
Ground truth sentence: 	Dogter 

Input sentence: TerGbetnment decided to adjust
Word Decoded sentence: TerGbetnment decided to adjust 
Ground truth sentence: 	The Government decided to adjust 

Input sentence: that the Government should ever h hin led away
Word Decoded sentence: that the Government should ever h hin led away 
Ground truth sentence: 	that the Government should ever have been led away 

Input sentence: 165 lbs
Word Decoded sentence: 165 lbs 
Ground truth sentence: 	165 lbs 

Input sentence: aiffoset from that of Labr . They would
Word Decoded sentence: aiffoset from that of Labor . They would 
Ground truth sentence: 	different from that of Labour . They would 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input s

Input sentence: comnunists' former base 60 miler
Word Decoded sentence: communists former base 60 miler 
Ground truth sentence: 	communists' former base 60 miles 

Input sentence: ATL
Word Decoded sentence: ATL 
Ground truth sentence: 	ATL 

Input sentence: M.D.
Word Decoded sentence: Made 
Ground truth sentence: 	M.D. 

Input sentence: CC
Word Decoded sentence: CC 
Ground truth sentence: 	CC 

Input sentence: action ageinst the dan maling industry , 3e
Word Decoded sentence: action against the dan making industry , 3e 
Ground truth sentence: 	action against the drug making industry , # 

Input sentence: DELmarie Var Nisherl
Word Decoded sentence: DELmarie Var Fishery 
Ground truth sentence: 	Delmarie Van Niekerk 

Input sentence: 083 324 3959
Word Decoded sentence: 083 324 3959 
Ground truth sentence: 	083 324 3959 

Input sentence: 7611295003088
Word Decoded sentence: 7611295003088 
Ground truth sentence: 	7611295003088 

Input sentence: British Government gives in to Sir Roy and
Wor

Input sentence: 259/6 Heath Road, Small Farms, 1984
Word Decoded sentence: 259/6 Heath Road Small Farms 1984 
Ground truth sentence: 	259/4 Heath Road, Small Farms, 1984 

Input sentence: 26-12-1974
Word Decoded sentence: 26-12-1974 
Ground truth sentence: 	26-12-1974 

Input sentence: Capitec Bank
Word Decoded sentence: Capitec Bank 
Ground truth sentence: 	Capitec Bank 

Input sentence: The stans had beem but byy the rem strie wele
Word Decoded sentence: The Stans had been but by the rem strike were 
Ground truth sentence: 	The stores had been hit by the same strike wave 

Input sentence: (2d)
Word Decoded sentence: (2d) 
Ground truth sentence: 	(2nd) 

Input sentence: 10/1/18
Word Decoded sentence: 10/1/18 
Ground truth sentence: 	10/1/18 

Input sentence: WILAM RHODET
Word Decoded sentence: ILAM RHODE 
Ground truth sentence: 	WILLIAM RHUDGS 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: Claire Anne Farrell
Word Decoded sentence: Claire Ann

Input sentence: SYNCOPE
Word Decoded sentence: SYNCOPE 
Ground truth sentence: 	SYNCOPE 

Input sentence: Orthovedics
Word Decoded sentence: Orthopedics 
Ground truth sentence: 	OrthoDedics 

Input sentence: Fiance
Word Decoded sentence: Fiance 
Ground truth sentence: 	Fiance 

Input sentence: MD
Word Decoded sentence: MD 
Ground truth sentence: 	MD 

Input sentence: herholat(a)yebo.co.za
Word Decoded sentence: herholat(a)yebo.co.za 
Ground truth sentence: 	herholdt(a)yebo.co.za 

Input sentence: (S92.412A)
Word Decoded sentence: (S92.412A) 
Ground truth sentence: 	(S92.412A) 

Input sentence: 20/07/2018
Word Decoded sentence: 20/07/2018 
Ground truth sentence: 	20/07/2018 

Input sentence: Sooste enirsteel.coz9
Word Decoded sentence: Jooste enirsteel.coz9 
Ground truth sentence: 	sjoosteenjrsteel.co.za 

Input sentence: 2/13/18
Word Decoded sentence: 2/13/18 
Ground truth sentence: 	2/13/18 

Input sentence: 8100-00I
Word Decoded sentence: 8100-00I 
Ground truth sentence: 	8100-00+ 



Input sentence: gake . Second in command is Mr.
Word Decoded sentence: take . Second in command is Mr 
Ground truth sentence: 	gate . Second in command is Mr. 

Input sentence: (021 8537369
Word Decoded sentence: (021 8537369 
Ground truth sentence: 	(021) 8537369 

Input sentence: RSA
Word Decoded sentence: RSA 
Ground truth sentence: 	RSA 

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: 17693698X7
Word Decoded sentence: 17693698X7 
Ground truth sentence: 	17693698X7 

Input sentence: 39-141708
Word Decoded sentence: 39-141708 
Ground truth sentence: 	39-141708 

Input sentence: Patellar. endenitis
Word Decoded sentence: Patellar adenitis 
Ground truth sentence: 	Patellar tendonitis 

Input sentence: 072 118 12831 0846797619
Word Decoded sentence: 072 118 12831 0846797619 
Ground truth sentence: 	072 118 1283 / 0846797619 

Input sentence: S51.851A
Word Decoded sentence: S51.851A 
Ground truth sentence: 	S51.851A 

Input sentence: flatly r

Input sentence: crochet Rooks range in size from numker 3/0 ,
Word Decoded sentence: crochet Rooks range in size from number 3/0 , 
Ground truth sentence: 	crochet hooks range in size from number 3/0 , 

Input sentence: (806) 793-0043
Word Decoded sentence: (806) 793-0043 
Ground truth sentence: 	(806) 793-0043 

Input sentence: 01/02/18
Word Decoded sentence: 01/02/18 
Ground truth sentence: 	01/02/18 

Input sentence: frem the real probems to ficldling about
Word Decoded sentence: from the real problems to filling about 
Ground truth sentence: 	from the real problems to fiddling about 

Input sentence: 012-3330523
Word Decoded sentence: 012-3330523 
Ground truth sentence: 	012-3330523 

Input sentence: N/A.
Word Decoded sentence: Near 
Ground truth sentence: 	N/A. 

Input sentence: 00/03/1966
Word Decoded sentence: 00/03/1966 
Ground truth sentence: 	06/02/1944 

Input sentence: of 1913 .
Word Decoded sentence: of 1913 . 
Ground truth sentence: 	of 1913 . 

Input sentence: Regional M

Input sentence: Cecilia J. Owens (nee Thiersen)
Word Decoded sentence: Cecilia J Owens knee Thiersen) 
Ground truth sentence: 	Cecilia J. Owens (nee Thiersen) 

Input sentence: 04-13-18
Word Decoded sentence: 04-13-18 
Ground truth sentence: 	04-13-18 

Input sentence: 2-21-18
Word Decoded sentence: 2-21-18 
Ground truth sentence: 	2-21-18 

Input sentence: 3083845201
Word Decoded sentence: 3083845201 
Ground truth sentence: 	3083845201 

Input sentence: 3/5/18
Word Decoded sentence: 3/5/18 
Ground truth sentence: 	3/5/18 

Input sentence: LEFT WRIST SPRAIN
Word Decoded sentence: LEFT WRIST SPRAIN 
Ground truth sentence: 	LEFT WRIST SPRAIN 

Input sentence: 5
Word Decoded sentence: 5 
Ground truth sentence: 	5 

Input sentence: S39.0120, S06.0X0D, S13.4XXD
Word Decoded sentence: S39.0120, S06.0X0D, S13.4XXD 
Ground truth sentence: 	S39.0120, S06.0X0D, S13.4XXD 

Input sentence: (1) Motzus is deflienl kreod , ( LECIE.H AATe in Hebreul
Word Decoded sentence: (1) Motzas is deflienl red , 

Input sentence: cramped and primitie evisterce .
Word Decoded sentence: cramped and primitive existence . 
Ground truth sentence: 	cramped and primitive existence . 

Input sentence: Jacto building this year is likely to be 40 per vent
Word Decoded sentence: Facto building this year is likely to be 40 per vent 
Ground truth sentence: 	factory building this year is likely to be 40 per cent 

Input sentence: 614 533 3289
Word Decoded sentence: 614 533 3289 
Ground truth sentence: 	614 533 3289 

Input sentence: 2-17-18
Word Decoded sentence: 2-17-18 
Ground truth sentence: 	2-17-18 

Input sentence: House of Lords should be abolished and
Word Decoded sentence: House of Lords should be abolished and 
Ground truth sentence: 	House of Lords should be abolished and 

Input sentence: 0833207869
Word Decoded sentence: 0833207869 
Ground truth sentence: 	0838209869 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: closeg frantureof proxmellof teft great t

Input sentence: plans do not give the Ofricons the overall
Word Decoded sentence: plans do not give the Ofricons the overall 
Ground truth sentence: 	plans do not give the Africans the overall 

Input sentence: 3 Naw , dispuded Wille , 3" Jurt
Word Decoded sentence: 3 Naw , disputed Will , 3" Just 
Ground truth sentence: 	" 3Naw , " disputed Willie . 3" Just 

Input sentence: 402 354-1975
Word Decoded sentence: 402 354-1975 
Ground truth sentence: 	402 354-1975 

Input sentence: Orthopeadic Surgeon
Word Decoded sentence: Orthopeadic Surgeon 
Ground truth sentence: 	Orthopeadic Surgeon 

Input sentence: (6)
Word Decoded sentence: (6) 
Ground truth sentence: 	(6) 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: 6801180089083
Word Decoded sentence: 6801180089083 
Ground truth sentence: 	6801180089083 

Input sentence: 816-468-5437
Word Decoded sentence: 816-468-5437 
Ground truth sentence: 	816-468-5437 

Input sentence: 101023 516
Word Decoded sen

Input sentence: status post lumbary lumbar saceal Fusian
Word Decoded sentence: status post lumbar lumbar sacral Fusion 
Ground truth sentence: 	status post lumbar + lumbar sacral fusion 

Input sentence: 8
Word Decoded sentence: 8 
Ground truth sentence: 	8 

Input sentence: 100 ench Mayer otr, Preboric Noord
Word Decoded sentence: 100 each Mayer otra Preboric Noord 
Ground truth sentence: 	160 Erich Mayer str , Pretoria Noord 

Input sentence: 6801180089083
Word Decoded sentence: 6801180089083 
Ground truth sentence: 	6801180089083 

Input sentence: 60631
Word Decoded sentence: 60631 
Ground truth sentence: 	60631 

Input sentence: mislaken , 1tho I cannct but feor that the
Word Decoded sentence: mistaken , 1tho I cannot but for that the 
Ground truth sentence: 	mistaken , 1tho' I cannot but fear that the 

Input sentence: 100
Word Decoded sentence: 100 
Ground truth sentence: 	100 

Input sentence: 10/01/18
Word Decoded sentence: 10/01/18 
Ground truth sentence: 	10/01/18 

Input se

Input sentence: Electronically Signed
Word Decoded sentence: Electronically Signed 
Ground truth sentence: 	Electronically Signed 

Input sentence: 1924 I was in a different pari in Surrey .
Word Decoded sentence: 1924 I was in a different pari in Surrey . 
Ground truth sentence: 	1924 I was in a different parish in Surrey . 

Input sentence: In yaungen two-adalt houscholds to 4.8 por cent below in families with
Word Decoded sentence: In younger two-adalt households to 4.8 por cent below in families with 
Ground truth sentence: 	in younger two-adult households to 4.8 per cent below in families with 

Input sentence: 07937 99687
Word Decoded sentence: 07937 99687 
Ground truth sentence: 	079 37 99687 

Input sentence: the blly adult , 2 7/2-in.-long insedt ,
Word Decoded sentence: the belly adult , 2 7/2-in.-long insert , 
Ground truth sentence: 	the fully adult , 2 1/2-in.-long insect , 

Input sentence: U5 Pamport ne 53678 4803
Word Decoded sentence: U5 Import ne 53678 4803 
Ground tr

Input sentence: warrenaa)sgmasyptame.co.ze
Word Decoded sentence: warrenaa)sgmasyptame.co.ze 
Ground truth sentence: 	warren(a)sigmasystems.co.za 

Input sentence: exfremely easy to work , the four sides leing
Word Decoded sentence: extremely easy to work , the four sides being 
Ground truth sentence: 	extremely easy to work , the four sides being 

Input sentence: Felief when this strife-torn land pefs
Word Decoded sentence: Relief when this strife-torn land pens 
Ground truth sentence: 	relief when this strife-torn land gets 

Input sentence: 03/05/18
Word Decoded sentence: 03/05/18 
Ground truth sentence: 	04/05/18 

Input sentence: ap om 100 . Britain's businers men are right to back
Word Decoded sentence: ap om 100 . britain business men are right to back 
Ground truth sentence: 	up on 1960 . Britain's business men are right to back 

Input sentence: KLEINSEUN
Word Decoded sentence: KLEINSEUN 
Ground truth sentence: 	KLEINSEUN 

Input sentence: 045141073X2
Word Decoded sentence: 0

Input sentence: Surgery puin meds. Physicur therapy
Word Decoded sentence: Surgery pain meds Physical therapy 
Ground truth sentence: 	Surgery, pain meds. Physician Therapy 

Input sentence: 044827038X9
Word Decoded sentence: 044827038X9 
Ground truth sentence: 	044827038X9 

Input sentence: Inguinal Hernia Repair
Word Decoded sentence: Inguinal Hernia Repair 
Ground truth sentence: 	Inguinal Hernia Repair 

Input sentence: always come first .
Word Decoded sentence: always come first . 
Ground truth sentence: 	always come first . 

Input sentence: David Markham
Word Decoded sentence: David Markham 
Ground truth sentence: 	David Markham 

Input sentence: a priest . here the priet is 1superceded byy the solouer - i
Word Decoded sentence: a priest . here the print is 1superceded by the soldier - i 
Ground truth sentence: 	a priest : here the priest is 1superceded by the soldier - a 

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sen

Input sentence: thon aown the Joot-crrifaths resoition . M
Word Decoded sentence: thon down the Joot-crrifaths resolution . M 
Ground truth sentence: 	turn down the Foot-Griffiths resolution . Mr. 

Input sentence: 68137
Word Decoded sentence: 68137 
Ground truth sentence: 	68137 

Input sentence: egeat ( B) , Fig. 3 . A saw cnt is
Word Decoded sentence: egest ( By , Fig 3 . A saw cut is 
Ground truth sentence: 	as at ( B ) , Fig. 3 . A saw cut is 

Input sentence: S82.62xA
Word Decoded sentence: S82.62xA 
Ground truth sentence: 	S82.62xA 

Input sentence: White Oak
Word Decoded sentence: White Oak 
Ground truth sentence: 	White Oak 

Input sentence: 12/28/2017
Word Decoded sentence: 12/28/2017 
Ground truth sentence: 	12/28/2017 

Input sentence: RC170193
Word Decoded sentence: RC170193 
Ground truth sentence: 	RC 170193 

Input sentence: 07/09/1986
Word Decoded sentence: 07/09/1986 
Ground truth sentence: 	07/09/1986 

Input sentence: 18901
Word Decoded sentence: 18901 
Ground truth 

Input sentence: della.fourie(a)gmail.com
Word Decoded sentence: della.fourie(a)gmail.com 
Ground truth sentence: 	della.fourie(a)gmail.com 

Input sentence: eouves in th rarly 19th centiry whh
Word Decoded sentence: houses in th early 19th century who 
Ground truth sentence: 	classes in the early 19th century which 

Input sentence: 01/18/2018
Word Decoded sentence: 01/18/2018 
Ground truth sentence: 	01/18/2018 

Input sentence: 11/19/17
Word Decoded sentence: 11/19/17 
Ground truth sentence: 	11/19/17 

Input sentence: 3
Word Decoded sentence: 3 
Ground truth sentence: 	3 

Input sentence: 423) 622-6249
Word Decoded sentence: 423) 622-6249 
Ground truth sentence: 	423) 622-6249 

Input sentence: 3-2-18
Word Decoded sentence: 3-2-18 
Ground truth sentence: 	3-2-18 

Input sentence: 043260039X3
Word Decoded sentence: 043260039X3 
Ground truth sentence: 	043260039X3 

Input sentence: industry , whose profits had risen by up to 400
Word Decoded sentence: industry , whose profits had rise

Input sentence: Labour has to have an adequate nmmber of
Word Decoded sentence: Labour has to have an adequate number of 
Ground truth sentence: 	Labour has to have an adequate number of 

Input sentence: 32027689
Word Decoded sentence: 32027689 
Ground truth sentence: 	382027689 

Input sentence: No Meight pearing- Bt is LpN tws on feot
Word Decoded sentence: No Might hearing Bt is lpn TWS on felt 
Ground truth sentence: 	No Weight bearing - Pt is LPN + is on feet 

Input sentence: 1109225097086
Word Decoded sentence: 1109225097086 
Ground truth sentence: 	1109225097086 

Input sentence: CCl White oak
Word Decoded sentence: CCl White oak 
Ground truth sentence: 	CCl White Oak 

Input sentence: Tofee Leludta or homes Donna
Word Decoded sentence: Tozee Leludta or homes Donna 
Ground truth sentence: 	Lopez, Claudia or Thomas Donna 

Input sentence: 2-14-2018
Word Decoded sentence: 2-14-2018 
Ground truth sentence: 	3-14-2018 

Input sentence: 2/5/18
Word Decoded sentence: 2/5/18 
Ground 

Input sentence: rever were . " Intercusted by engry Tories,
Word Decoded sentence: rever were . " Interrupted by angry Toriest 
Ground truth sentence: 	never were . " Interrupted by angry Tories , 

Input sentence: SD
Word Decoded sentence: SD 
Ground truth sentence: 	SD 

Input sentence: All aillfs relations were 4makrodeb now , but tom
Word Decoded sentence: All ailles relations were 4makrodeb now , but tom 
Ground truth sentence: 	All Sally's relations were 4makrodeb now , but Tom 

Input sentence: 3-10-18
Word Decoded sentence: 3-10-18 
Ground truth sentence: 	3-10-18 

Input sentence: suture removal
Word Decoded sentence: suture removal 
Ground truth sentence: 	suture removal 

Input sentence: 247.89
Word Decoded sentence: 247.89 
Ground truth sentence: 	247.89 

Input sentence: 1/5/2018
Word Decoded sentence: 1/5/2018 
Ground truth sentence: 	1/5/2018 

Input sentence: cross pieces and is made of 1/2 in. plywont
Word Decoded sentence: cross pieces and is made of 1/2 in plywood 
G

Input sentence: 23 BRUNES COURT , 17 PALMAVENUE, KEMPRN PARK 16r9
Word Decoded sentence: 23 BRUTES COURT , 17 PALMAVENUE, HEMPEN PARK 16r9 
Ground truth sentence: 	23 BRUWES COURT , 17 PALM AVENUE, KEMPTON PARK 1619 

Input sentence: 3-12018
Word Decoded sentence: 3-12018 
Ground truth sentence: 	3-1 2018 

Input sentence: 3:00
Word Decoded sentence: 3:00 
Ground truth sentence: 	3:00 

Input sentence: Podiatry
Word Decoded sentence: Podiatry 
Ground truth sentence: 	Podiatry 

Input sentence: PO Box 870, Grootbrakrivie 6525
Word Decoded sentence: PO Box 870, Grootbrakrivie 6525 
Ground truth sentence: 	PO Box 870, Grootbrakrivie 6525 

Input sentence: 29-09-2013
Word Decoded sentence: 29-09-2013 
Ground truth sentence: 	29-09-2013 

Input sentence: 2-24-18
Word Decoded sentence: 2-24-18 
Ground truth sentence: 	2-24-18 

Input sentence: needy than Labour had to make it in 1950 . And
Word Decoded sentence: needy than Labour had to make it in 1950 . And 
Ground truth sentence: 	needy th

Input sentence: Christiaan Daniel Jacobs
Word Decoded sentence: Christiaan Daniel Jacobs 
Ground truth sentence: 	Cristiaan Daniel Jacobs 

Input sentence: 1066249 0887
Word Decoded sentence: 1066249 0887 
Ground truth sentence: 	066 249 0887 

Input sentence: 1972-07-19
Word Decoded sentence: 1972-07-19 
Ground truth sentence: 	1972-07-19 

Input sentence: 7/10/17
Word Decoded sentence: 7/10/17 
Ground truth sentence: 	7/10/17 

Input sentence: WNILLIAM RHODBS
Word Decoded sentence: WILLIAM RHODUS 
Ground truth sentence: 	WILLIAM RHUDGS 

Input sentence: diferent - fundamentally different from that of Labor . They
Word Decoded sentence: different - fundamentally different from that of Labor . They 
Ground truth sentence: 	different - fundamentally different from that of Labour . They 

Input sentence: 8O Bouquet street, Rosettenville 2190
Word Decoded sentence: 8O Bouquet street Rosettenville 2190 
Ground truth sentence: 	84 Bouquet Street, Rosettenville 2190 

Input sentence: 2135
Wo

Input sentence: N/A
Word Decoded sentence: Na 
Ground truth sentence: 	N/A 

Input sentence: inelading a compulsory savings sheme which the
Word Decoded sentence: including a compulsory savings shame which the 
Ground truth sentence: 	including a compulsory savings scheme which the 

Input sentence: on Fetruary 20.
Word Decoded sentence: on February 20. 
Ground truth sentence: 	on February 20 . 

Input sentence: 2
Word Decoded sentence: 2 
Ground truth sentence: 	2 

Input sentence: are dealing with a noble edifice which
Word Decoded sentence: are dealing with a noble edifice which 
Ground truth sentence: 	are dealing with a noble edifice which 

Input sentence: 40122
Word Decoded sentence: 40122 
Ground truth sentence: 	40422 

Input sentence: Service to the Jxchequer will have increased over
Word Decoded sentence: Service to the Exchequer will have increased over 
Ground truth sentence: 	service to the Exchequer will have increased over 

Input sentence: 1/31/18
Word Decoded sentence

Input sentence: Cons ytoielge
Word Decoded sentence: Cons ytoielge 
Ground truth sentence: 	Chris Ethridqe 

Input sentence: 192lb.
Word Decoded sentence: 192lb. 
Ground truth sentence: 	192lb. 

Input sentence: Labour's Colonidl spokesman , maid Sir Roy had
Word Decoded sentence: labours Colonial spokesman , maid Sir Roy had 
Ground truth sentence: 	Labour's Colonial spokesman , said Sir Roy had 

Input sentence: 011023-5164
Word Decoded sentence: 011023-5164 
Ground truth sentence: 	011 023-5164 

Input sentence: 924.3
Word Decoded sentence: 924.3 
Ground truth sentence: 	924.3 

Input sentence: I
Word Decoded sentence: I 
Ground truth sentence: 	1 

Input sentence: maltind an unflincting witmess to what
Word Decoded sentence: maltine an unflinching witness to what 
Ground truth sentence: 	maintained an unflinching witness to what 

Input sentence: 513-528-1809
Word Decoded sentence: 513-528-1809 
Ground truth sentence: 	513-528-1209 

Input sentence: Pomorama
Word Decoded sentence: 

Input sentence: to Gsub-lil on the tower .
Word Decoded sentence: to Gsub-lil on the tower . 
Ground truth sentence: 	to 4sub-lil on the lower . 

Input sentence: MSonde, Anthony
Word Decoded sentence: MSonde, Anthony 
Ground truth sentence: 	McBride,Anthony 

Input sentence: 614 533 3780
Word Decoded sentence: 614 533 3780 
Ground truth sentence: 	614 533 3780 

Input sentence: Marianne, M Fischer
Word Decoded sentence: Marianne M Fischer 
Ground truth sentence: 	Marianne, M Fischer 

Input sentence: Plastic Surgery
Word Decoded sentence: Plastic Surgery 
Ground truth sentence: 	Plastic Surgery 

Input sentence: Ke
Word Decoded sentence: Ke 
Ground truth sentence: 	* 

Input sentence: 100
Word Decoded sentence: 100 
Ground truth sentence: 	100 

Input sentence: Peeting tay to slide over a 30 in. table , but the height
Word Decoded sentence: Meeting tay to slide over a 30 in table , but the height 
Ground truth sentence: 	feeding tray to slide over a 30 in. table , but the height 

Inp

Input sentence: Bl6 The met, Grand Natienal Baule vard , Miterton Ridge
Word Decoded sentence: Bl6 The met Grand National Batule hard , Miterton Ridge 
Ground truth sentence: 	B16 The Met, Grand National Boulevard , Milnerton Ridge 

Input sentence: The gauge can nou be ised to nit in the tapers
Word Decoded sentence: The gauge can not be ised to nit in the tapers 
Ground truth sentence: 	The gauge can now be used to nick in the tapers 

Input sentence: 27-1217
Word Decoded sentence: 27-1217 
Ground truth sentence: 	27.12.47 

Input sentence: as above
Word Decoded sentence: as above 
Ground truth sentence: 	as above 

Input sentence: laceration repain
Word Decoded sentence: laceration remain 
Ground truth sentence: 	laceration repair 

Input sentence: OUrpt 2/16/18
Word Decoded sentence: Out 2/16/18 
Ground truth sentence: 	OUTpt 2/16/18 

Input sentence: EOAAaR N8 3 . He rulftled the stirels irritably and
Word Decoded sentence: EOAAaR N8 3 . He ruffled the stires irritably and 
Ground

Input sentence: Maigaietha Sisama Jchaia Potgieter.
Word Decoded sentence: Maigaietha Sama Achaia Potgieter. 
Ground truth sentence: 	Margaretha Susanna Johanna Potgieter 

Input sentence: Lowdermilk, Charity
Word Decoded sentence: Lowdermilk Charity 
Ground truth sentence: 	Lowdermilk, Charity 

Input sentence: 01/08/2018
Word Decoded sentence: 01/08/2018 
Ground truth sentence: 	01/08/2018 

Input sentence: 3-14-18
Word Decoded sentence: 3-14-18 
Ground truth sentence: 	3-14-18 

Input sentence: tent of the paper , and in
Word Decoded sentence: tent of the paper , and in 
Ground truth sentence: 	tent of the taper , and in 

Input sentence: script , and the great adventages to be
Word Decoded sentence: script , and the great advantages to be 
Ground truth sentence: 	script , and the great advantages to be 

Input sentence: Lane
Word Decoded sentence: Lane 
Ground truth sentence: 	Lane 

Input sentence: 044827038X9
Word Decoded sentence: 044827038X9 
Ground truth sentence: 	044827038X9

In [31]:
WER_spell_correction = calculate_WER(gt_texts, decoded_sentences)
print('WER_spell_correction |TEST= ', WER_spell_correction)

WER_spell_correction |TEST=  0.200670591446


In [32]:
WER_spell_word_correction = calculate_WER(gt_texts, corrected_sentences)
print('WER_spell_word_correction |TEST= ', WER_spell_word_correction)

WER_spell_word_correction |TEST=  0.0458757326471


In [33]:
WER_OCR = calculate_WER(gt_texts, input_texts)
print('WER_OCR |TEST= ', WER_OCR)

WER_OCR |TEST=  0.145528740562
