# Introduction

We tackle the problem of OCR post processing. In OCR, we map the image form of the document into the text domain. This is done first using an CNN+LSTM+CTC model, in our case based on tesseract. Since this output maps only image to text, we need something on top to validate and correct language semantics.

The idea is to build a language model, that takes the OCRed text and corrects it based on language knowledge. The langauge model could be:
- Char level: the aim is to capture the word morphology. In which case it's like a spelling correction system.
- Word level: the aim is to capture the sentence semnatics. But such systems suffer from the OOV problem.
- Fusion: to capture semantics and morphology language rules. The output has to be at char level, to avoid the OOV. However, the input can be char, word or both.

The fusion model target is to learn:

    p(char | char_context, word_context)

In this workbook we use seq2seq vanilla Keras implementation, adapted from the lstm_seq2seq example on Eng-Fra translation task. The adaptation involves:

- Adapt to spelling correction, on char level
- Pre-train on a noisy, medical sentences
- Fine tune a residual, to correct the mistakes of tesseract 
- Limit the input and output sequence lengths
- Enusre teacher forcing auto regressive model in the decoder
- Limit the padding per batch
- Learning rate schedule
- Bi-directional LSTM Encoder
- Bi-directional GRU Encoder


# Imports

In [1]:
from __future__ import print_function
import tensorflow as tf
import keras.backend as K
from keras.backend.tensorflow_backend import set_session
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Bidirectional, Concatenate, GRU
from keras import optimizers
from keras.callbacks import ModelCheckpoint, TensorBoard, LearningRateScheduler
from keras.models import load_model
import numpy as np
import os
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from autocorrect import spell
%matplotlib inline

Using TensorFlow backend.


# Utility functions

In [2]:
# Limit gpu allocation. allow_growth, or gpu_fraction
def gpu_alloc():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))

In [3]:
gpu_alloc()

In [4]:
def calculate_WER_sent(gt, pred):
    '''
    calculate_WER('calculating wer between two sentences', 'calculate wer between two sentences')
    '''
    gt_words = gt.lower().split(' ')
    pred_words = pred.lower().split(' ')
    d = np.zeros(((len(gt_words) + 1), (len(pred_words) + 1)), dtype=np.uint8)
    # d = d.reshape((len(gt_words)+1, len(pred_words)+1))

    # Initializing error matrix
    for i in range(len(gt_words) + 1):
        for j in range(len(pred_words) + 1):
            if i == 0:
                d[0][j] = j
            elif j == 0:
                d[i][0] = i

    # computation
    for i in range(1, len(gt_words) + 1):
        for j in range(1, len(pred_words) + 1):
            if gt_words[i - 1] == pred_words[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                substitution = d[i - 1][j - 1] + 1
                insertion = d[i][j - 1] + 1
                deletion = d[i - 1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)
    return d[len(gt_words)][len(pred_words)]

In [5]:
def calculate_WER(gt, pred):
    '''

    :param gt: list of sentences of the ground truth
    :param pred: list of sentences of the predictions
    both lists must have the same length
    :return: accumulated WER
    '''
#    assert len(gt) == len(pred)
    WER = 0
    nb_w = 0
    for i in range(len(gt)):
        #print(gt[i])
        #print(pred[i])
        WER += calculate_WER_sent(gt[i], pred[i])
        nb_w += len(gt[i])

    return WER / nb_w

In [6]:
def load_data_with_gt(file_name, num_samples, max_sent_len, min_sent_len, delimiter='\t', gt_index=1, prediction_index=0):
    '''Load data from txt file, with each line has: <TXT><TAB><GT>. The  target to the decoder muxt have \t as the start trigger and \n as the stop trigger.'''
    cnt = 0  
    input_texts = []
    gt_texts = []
    target_texts = []
    for row in open(file_name, encoding='utf8'):
        if cnt < num_samples :
            #print(row)
            sents = row.split(delimiter)
            input_text = sents[prediction_index]
            
            target_text = '\t' + sents[gt_index] + '\n'
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len and len(target_text) > min_sent_len and len(target_text) < max_sent_len:
                cnt += 1
                
                input_texts.append(input_text)
                target_texts.append(target_text)
                gt_texts.append(sents[gt_index])
    return input_texts, target_texts, gt_texts

In [7]:
def load_data(file_name, num_samples, max_sent_len, min_sent_len):
    '''Load data from txt file, with each line has: <TXT><TAB><GT>. The  target to the decoder muxt have \t as the start trigger and \n as the stop trigger.'''
    cnt = 0  
    input_texts = []   
    
    #for row in open(file_name, encoding='utf8'):
    for row in open(file_name):
        if cnt < num_samples :            
            input_text = row           
            if len(input_text) > min_sent_len and len(input_text) < max_sent_len:
                cnt += 1                
                input_texts.append(input_text)
    return input_texts

In [8]:
def vectorize_data(input_texts, max_encoder_seq_length, num_encoder_tokens, vocab_to_int):
    '''Prepares the input text and targets into the proper seq2seq numpy arrays'''
    encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length),
    dtype='float32')
    
    for i, input_text in enumerate(input_texts):
        for t, char in enumerate(input_text[:max_encoder_seq_length]):
            # c0..cn
            encoder_input_data[i, t] = vocab_to_int[char]
                
    return encoder_input_data

In [9]:
def decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, int_to_vocab):
    # Encode the input as state vectors.
    encoder_outputs, h, c  = encoder_model.predict(input_seq)
    states_value = [h,c]
    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0] = vocab_to_int['\t']

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    #print(input_seq)
    attention_density = []
    i = 0
    special_chars = ['\\', '/', '-', '—' , ':', '[', ']', ',', '.', '"', ';', '%', '~', '(', ')', '{', '}', '$']
    #special_chars = []
    while not stop_condition:
        #print(target_seq)
        output_tokens, attention, h, c  = decoder_model.predict(
            [target_seq, encoder_outputs] + states_value)
        #print(attention.shape)
        attention_density.append(attention[0][0])# attention is max_sent_len x 1 since we have num_time_steps = 1 for the output
        # Sample a token
        #print(output_tokens.shape)
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        
        #print(sampled_token_index)
        sampled_char = int_to_vocab[sampled_token_index]
        
        orig_char = int_to_vocab[int(input_seq[:,i][0])]
        
        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True
            sampled_char = ''
        
        # Copy digits as it, since the spelling corrector is not good at digit corrections
        if(orig_char.isdigit() or orig_char in special_chars):
            decoded_sentence += orig_char            
        else:
            if(sampled_char.isdigit() or sampled_char in special_chars):
                decoded_sentence += ''
            else:
                decoded_sentence += sampled_char
        
        #decoded_sentence += sampled_char


        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index

        # Update states
        states_value = [h, c]
        
        i += 1
        if(i > 48):
            i = 0
    attention_density = np.array(attention_density)
    
    # Word level spell correct
    '''
    corrected_decoded_sentence = ''
    for w in decoded_sentence.split(' '):
        corrected_decoded_sentence += spell(w) + ' '
    decoded_sentence = corrected_decoded_sentence
    '''
    return decoded_sentence, attention_density


In [10]:
def word_spell_correct(decoded_sentence):
    corrected_decoded_sentence = ''
    for w in decoded_sentence.split(' '):
        corrected_decoded_sentence += spell(w) + ' '
    return corrected_decoded_sentence

# Load data

# Load model params

In [11]:
data_path = '../../dat/'

In [12]:
max_sent_len = 50

In [13]:
vocab_file = 'vocab-{}.npz'.format(max_sent_len)
model_file = 'best_model-{}.hdf5'.format(max_sent_len)
encoder_model_file = 'encoder_model-{}.hdf5'.format(max_sent_len)
decoder_model_file = 'decoder_model-{}.hdf5'.format(max_sent_len)

In [14]:
vocab = np.load(file=vocab_file)
vocab_to_int = vocab['vocab_to_int'].item()
int_to_vocab = vocab['int_to_vocab'].item()
max_sent_len = vocab['max_sent_len']
min_sent_len = vocab['min_sent_len']



In [15]:
input_characters = sorted(list(vocab_to_int))
num_decoder_tokens = num_encoder_tokens = len(input_characters) #int(encoder_model.layers[0].input.shape[2])
max_encoder_seq_length = max_decoder_seq_length = max_sent_len - 1#max([len(txt) for txt in input_texts])


In [16]:
num_samples = 1000000
#tess_correction_data = os.path.join(data_path, 'test_data.txt')
#input_texts = load_data(tess_correction_data, num_samples, max_sent_len, min_sent_len)

OCR_data = os.path.join(data_path, 'new_trained_data.txt')
#input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len, min_sent_len, delimiter='|',gt_index=0, prediction_index=1)
input_texts, target_texts, gt_texts = load_data_with_gt(OCR_data, num_samples, max_sent_len, min_sent_len)

In [17]:
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])

1451
Me dieal Provider Roles: Treating  
 	Medical Provider Roles: Treating


Provider First Name: Christine  
 	Provider First Name: Christine


Provider Last Name: Nolen, MD  
 	Provider Last Name: Nolen, MD


Address Line 1 : 7 25 American Avenue  
 	Address Line 1 : 725 American Avenue


City. W’aukesha  
 	City: Waukesha


StatefProvinee: ‘WI  
 	State/Province: WI


Postal Code: 5 31 88  
 	Postal Code: 53188


Country". US  
 	Country:  US


Business Telephone: (2 62) 92 8- 1000  
 	Business Telephone: (262) 928- 1000


Date ot‘Pirst Visit: 1 2/01f20 17  
 	Date of First Visit: 12/01/2017




In [18]:
# Spell correct before inference
'''
input_texts_ = []
for sent in input_texts:
    sent_ = ''
    for word in sent.split(' '):
        sent_ += spell(word) + ' '
    input_texts_.append(sent_)
input_texts = input_texts_
input_texts_ = []
# Sample data
print(len(input_texts))
for i in range(10):
    print(input_texts[i], '\n', target_texts[i])
'''

"\ninput_texts_ = []\nfor sent in input_texts:\n    sent_ = ''\n    for word in sent.split(' '):\n        sent_ += spell(word) + ' '\n    input_texts_.append(sent_)\ninput_texts = input_texts_\ninput_texts_ = []\n# Sample data\nprint(len(input_texts))\nfor i in range(10):\n    print(input_texts[i], '\n', target_texts[i])\n"

In [19]:
#model.load_weights(model_file)

model = load_model(model_file)

In [20]:
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 115)    13225       input_1[0][0]                    
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, None, 512),  761856      embedding_1[0][0]                
__________________________________________________________________________________________________
embedding_

In [21]:
encoder_model = load_model(encoder_model_file)
decoder_model = load_model(decoder_model_file)



In [22]:
encoder_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 115)    13225       input_1[0][0]                    
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) [(None, None, 512),  761856      embedding_1[0][0]                
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           bidirectional_1[0][1]            
                                                                 bidirectional_1[0][3]            
__________

In [23]:
decoder_model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, None)         0                                            
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, None, 115)    13225       input_2[0][0]                    
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, 512)          0                                            
__________________________________________________________________________________________________
input_5 (InputLayer)            (None, 512)          0                                            
__________________________________________________________________________________________________
lstm_2 (LS

In [24]:

encoder_input_data = vectorize_data(input_texts=input_texts, max_encoder_seq_length=max_encoder_seq_length, num_encoder_tokens=num_encoder_tokens, vocab_to_int=vocab_to_int)

# Sample output from train data
decoded_sentences = []

for seq_index in range(len(input_texts)):
    # Take one sequence (part of the training set)
    # for trying out decoding.

    input_seq = encoder_input_data[seq_index: seq_index + 1]
    target_text = gt_texts[seq_index]
    decoded_sentence,_  = decode_sequence(input_seq, encoder_model, decoder_model, num_decoder_tokens, int_to_vocab)
    corrected_sentence = word_spell_correct(decoded_sentence)
    print('-Lenght = ', len(input_seq))
    print('Input sentence:', input_texts[seq_index])
    print('GT sentence:', target_text.strip())
    print('Char Decoded sentence:', decoded_sentence)   
    print('Char Decoded sentence:', corrected_sentence)   
    decoded_sentences.append(decoded_sentence)
    
results.close()

-Lenght =  1
Input sentence: Me dieal Provider Roles: Treating 
GT sentence: Medical Provider Roles: Treating
Char Decoded sentence: Medical Provider Roles:Treating 
Char Decoded sentence: Medical Provider Roles:Treating a 
-Lenght =  1
Input sentence: Provider First Name: Christine 
GT sentence: Provider First Name: Christine
Char Decoded sentence: Provider First Name: Christine 
Char Decoded sentence: Provider First Name Christine a 
-Lenght =  1
Input sentence: Provider Last Name: Nolen, MD 
GT sentence: Provider Last Name: Nolen, MD
Char Decoded sentence: Provider Last Name: Nolen, MD 
Char Decoded sentence: Provider Last Name Dolens MD a 
-Lenght =  1
Input sentence: Address Line 1 : 7 25 American Avenue 
GT sentence: Address Line 1 : 725 American Avenue
Char Decoded sentence: Address Line 1 : 725 Americal Avenuer
Char Decoded sentence: Address Line a a 725 America Avenue 
-Lenght =  1
Input sentence: City. W’aukesha 
GT sentence: City: Waukesha
Char Decoded sentence: City.Stakes 

-Lenght =  1
Input sentence: RETURN THIS PORTION WITH YOURIPAYMENIT 
GT sentence: RETURN THIS PORTION WITH YOUR PAYMENT
Char Decoded sentence: RETURN THIS PORTION WITH YOUR PAYMENT 
Char Decoded sentence: RETURN THIS PORTION WITH YOUR PAYMENT a 
-Lenght =  1
Input sentence: - VIASTERCARD
GT sentence: MASTERCARD
Char Decoded sentence: -ASTERCARD
Char Decoded sentence: MASTERCARD 
-Lenght =  1
Input sentence: DISCOVER
GT sentence: DISCOVER
Char Decoded sentence: DISCOVER
Char Decoded sentence: DISCOVER 
-Lenght =  1
Input sentence: RATIENTIIAME
GT sentence: PATIENT NAME
Char Decoded sentence: PATIENT'S IAME
Char Decoded sentence: PATIENTS SAME 
-Lenght =  1
Input sentence: DUEDATE 
GT sentence: DUE DATE
Char Decoded sentence: DUE DATE
Char Decoded sentence: DUE DATE 
-Lenght =  1
Input sentence: GUARANTORID
GT sentence: GUARANTOR ID
Char Decoded sentence: GUARANTOR ID
Char Decoded sentence: GUARANTOR ID 
-Lenght =  1
Input sentence: BALANCE DUE
GT sentence: BALANCE DUE
Char Decoded sente

-Lenght =  1
Input sentence: Dependent Information 
GT sentence: Dependent Information
Char Decoded sentence: Dependent Information
Char Decoded sentence: Dependent Information 
-Lenght =  1
Input sentence: First Name: 
GT sentence: First Name:
Char Decoded sentence: First Name: 
Char Decoded sentence: First Name a 
-Lenght =  1
Input sentence: Middle Nameﬂnitial: 
GT sentence: Middle Name/Initial:
Char Decoded sentence: Middle NameInitia: 
Char Decoded sentence: Middle NameInitia: a 
-Lenght =  1
Input sentence: Last Name: 
GT sentence: Last Name:
Char Decoded sentence: Last Name: 
Char Decoded sentence: Last Name a 
-Lenght =  1
Input sentence: Social Security Number: 
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number: 
Char Decoded sentence: Social Security Number a 
-Lenght =  1
Input sentence: Birth Date: 
GT sentence: Birth Date:
Char Decoded sentence: Birth Date: 
Char Decoded sentence: Birth Date a 
-Lenght =  1
Input sentence: Gender: 
GT sente

-Lenght =  1
Input sentence: 0 Lives with family 
GT sentence: • Lives with family
Char Decoded sentence: 0 Lives with family
Char Decoded sentence: a Lives with family 
-Lenght =  1
Input sentence: 0 Married 
GT sentence: • Married
Char Decoded sentence: 0 Married 
Char Decoded sentence: a Married a 
-Lenght =  1
Input sentence: 0 Tobacco quit date established (287.891) 
GT sentence: • Tobacco quit date established (287.891)
Char Decoded sentence: 0 Tobacco quit date established (287.891)
Char Decoded sentence: a Tobacco quit date established (287.891) 
-Lenght =  1
Input sentence: 0 : 10 years 
GT sentence: • : 10 years
Char Decoded sentence: 0 : 10 years
Char Decoded sentence: a a of years 
-Lenght =  1
Input sentence: Current Made 
GT sentence: Current Made
Char Decoded sentence: Current Meds
Char Decoded sentence: Current Meds 
-Lenght =  1
Input sentence: 3. Multi-Vitamin TABS; 
GT sentence: 3. Multi-Vitamin TABS;
Char Decoded sentence: 3. Multi-Vitamin TABS;
Char Decoded sentenc

-Lenght =  1
Input sentence: 1. Knee injury (889.90XA) 
GT sentence: 1. Knee injury (S89.90XA)
Char Decoded sentence: 1. Knee injury (889.90XA)
Char Decoded sentence: of Knee injury (889.90XA) 
-Lenght =  1
Input sentence: Past Medical History 
GT sentence: Past Medical History
Char Decoded sentence: Past Medical History
Char Decoded sentence: Past Medical History 
-Lenght =  1
Input sentence: 0 No signiﬁcant past medical history 
GT sentence: • No significant past medical history
Char Decoded sentence: 0 No significant past medical history
Char Decoded sentence: a No significant past medical history 
-Lenght =  1
Input sentence: Surgical History 
GT sentence: Surgical History
Char Decoded sentence: Surgical History
Char Decoded sentence: Surgical History 
-Lenght =  1
Input sentence: 0 History of Ankle Surgery 
GT sentence: • History of Ankle Surgery
Char Decoded sentence: 0 History of Ankle Surgery 
Char Decoded sentence: a History of Ankle Surgery a 
-Lenght =  1
Input sentence: Fam

-Lenght =  1
Input sentence: POSTOPERATIVE DIAGNOSES: 
GT sentence: POSTOPERATIVE DIAGNOSES:
Char Decoded sentence: POTOTERATIVE DIAGNOSES:
Char Decoded sentence: POTOTERATIVE DIAGNOSES 
-Lenght =  1
Input sentence: Right knee anterior cruciate ligament tear. 
GT sentence: 1. Right knee anterior cruciate ligament tear.
Char Decoded sentence: i ght knee anterior cruciate ligament tea.
Char Decoded sentence: i get knee anterior cruciate ligament tea 
-Lenght =  1
Input sentence: 2. Medial collateral ligament tear. 
GT sentence: 2. Medial collateral ligament tear.
Char Decoded sentence: 2. Medial collateral ligament tear.
Char Decoded sentence: of Medial collateral ligament tears 
-Lenght =  1
Input sentence: PROCEDURES PERFORMED: 
GT sentence: PROCEDURES PERFORMED:
Char Decoded sentence: PROCEDURES PERFORMED: 
Char Decoded sentence: PROCEDURES PERFORMED a 
-Lenght =  1
Input sentence: SURGEON: Jason Helm, MD. 
GT sentence: SURGEON: Jason Holm, M.D.
Char Decoded sentence: SURGEON: Jason H

-Lenght =  1
Input sentence: 1._No Known Aliergies 
GT sentence: 1. No Known Allergies
Char Decoded sentence: 1. No Known Alliteries 
Char Decoded sentence: of No Known Alliteries a 
-Lenght =  1
Input sentence: Physical Exam 
GT sentence: Physical Exam
Char Decoded sentence: Physical Exam
Char Decoded sentence: Physical Exam 
-Lenght =  1
Input sentence: Diagnosis 
GT sentence: Diagnosis
Char Decoded sentence: Diagnosis
Char Decoded sentence: Diagnosis 
-Lenght =  1
Input sentence: Plan 
GT sentence: Plan
Char Decoded sentence: Plan Cand
Char Decoded sentence: Plan Cand 
-Lenght =  1
Input sentence: DiscussionlSummary 
GT sentence: Discussion/Summary
Char Decoded sentence: DiscussionSummary 
Char Decoded sentence: DiscussionSummary a 
-Lenght =  1
Input sentence: Signatures 
GT sentence: Signatures
Char Decoded sentence: Signatures
Char Decoded sentence: Signatures 
-Lenght =  1
Input sentence: Electronicaliy signed by : Jamie Birkelo, PA; 
GT sentence: Electronically signed by : Jami

-Lenght =  1
Input sentence: Neurologic -. Sensation intact. 
GT sentence: Neurologic -. Sensation intact.
Char Decoded sentence: Neurologic -. Sensation intact.
Char Decoded sentence: Neurologic of Sensation intact 
-Lenght =  1
Input sentence: Right knee: 
GT sentence: Right knee:
Char Decoded sentence: Right knee: 
Char Decoded sentence: Right knee a 
-Lenght =  1
Input sentence: 2+ effusion 
GT sentence: 2+ effusion
Char Decoded sentence: 2+ effusion
Char Decoded sentence: of effusion 
-Lenght =  1
Input sentence: nonTTP along the medial joint line. 
GT sentence: nonTTP along the medial joint line.
Char Decoded sentence: nonTTP along the medial joint line.
Char Decoded sentence: nonTTP along the medial joint line 
-Lenght =  1
Input sentence: nonTTP along the lateral joint line 
GT sentence: nonTTP along the lateral joint line
Char Decoded sentence: nonTTP along the lateral joint line 
Char Decoded sentence: nonTTP along the lateral joint line a 
-Lenght =  1
Input sentence: no par

-Lenght =  1
Input sentence: Health insurance provider — bcbs 
GT sentence: Health insurance provider - bcbs
Char Decoded sentence: Health insurance provider — bcbs
Char Decoded sentence: Health insurance provider a BCBS 
-Lenght =  1
Input sentence: Fax paperwork - yes 
GT sentence: Fax paperwork - yes
Char Decoded sentence: Fax paperwork - yes No
Char Decoded sentence: Fax paperwork a yes No 
-Lenght =  1
Input sentence: Attention of - Tellie 
GT sentence: Attention of - Tellie
Char Decoded sentence: Attention of - Tellie 
Char Decoded sentence: Attention of a Tellee a 
-Lenght =  1
Input sentence: Fax number 
GT sentence: Fax number
Char Decoded sentence: Fax number 
Char Decoded sentence: Fax number a 
-Lenght =  1
Input sentence: Refax paperwork — yes 
GT sentence: Refax paperwork - yes
Char Decoded sentence: Refax paperwork — yes
Char Decoded sentence: Relax paperwork a yes 
-Lenght =  1
Input sentence: Notes # 
GT sentence: Notes #
Char Decoded sentence: Notes #
Char Decoded sen

-Lenght =  1
Input sentence: Absence end time  
GT sentence: Absence end time
Char Decoded sentence: Absence end time 
Char Decoded sentence: Absence end time a 
-Lenght =  1
Input sentence: Any overtime? - no 
GT sentence: Any overtime? - no
Char Decoded sentence: Any Coverame? - no
Char Decoded sentence: Any Coverage a no 
-Lenght =  1
Input sentence: Lunch break — 
GT sentence: Lunch break -
Char Decoded sentence: Lunch break —
Char Decoded sentence: Lunch break a 
-Lenght =  1
Input sentence: Total time absent — 
GT sentence: Total time absent -
Char Decoded sentence: Total time absent —
Char Decoded sentence: Total time absent a 
-Lenght =  1
Input sentence: Absence reason — episode 
GT sentence: Absence reason - episode
Char Decoded sentence: Absence reason — episode
Char Decoded sentence: Absence reason a episode 
-Lenght =  1
Input sentence: Leave start date 
GT sentence: Leave start date
Char Decoded sentence: Leave start date Zip
Char Decoded sentence: Leave start date Zip 
-

-Lenght =  1
Input sentence: Address Line 2: 
GT sentence: Address Line 2:
Char Decoded sentence: Address Line 2: 
Char Decoded sentence: Address Line of a 
-Lenght =  1
Input sentence: City: 
GT sentence: City:
Char Decoded sentence: City: 
Char Decoded sentence: City a 
-Lenght =  1
Input sentence: State: 
GT sentence: State:
Char Decoded sentence: State: 
Char Decoded sentence: State a 
-Lenght =  1
Input sentence: Country: 
GT sentence: Country:
Char Decoded sentence: Country: 
Char Decoded sentence: Country a 
-Lenght =  1
Input sentence: ZIP: 
GT sentence: ZIP:
Char Decoded sentence: ZIP: 
Char Decoded sentence: ZIP a 
-Lenght =  1
Input sentence: Effective From Date: 
GT sentence: Effective From Date:
Char Decoded sentence: Effective From Date: 
Char Decoded sentence: Effective From Date a 
-Lenght =  1
Input sentence: Effective To Date:  
GT sentence: Effective To Date:
Char Decoded sentence: Effective To Date: 
Char Decoded sentence: Effective To Date a 
-Lenght =  1
Input sen

-Lenght =  1
Input sentence: lfyee. as of what date”? (mrna'ddiyy)
GT sentence: If yes, as of what date”? (mm/dd/yy)
Char Decoded sentence: fffye. as of what date”? (mmddyy
Char Decoded sentence: fffye. as of what date (mmddyy 
-Lenght =  1
Input sentence: ExPeoled Dei‘ ery _ate (mmlddiyy)
GT sentence: Expected Delivery Date (mm/dd/yy)
Char Decoded sentence: ExPloled Dedite fry Cat(lly
Char Decoded sentence: explored Debite fry fatally 
-Lenght =  1
Input sentence: Actual Delivgry Date (
GT sentence: Actual Delivery Date (mm/dd/yy)
Char Decoded sentence: Actual Delivery Date (mmddyy
Char Decoded sentence: Actual Delivery Date (mmddyy 
-Lenght =  1
Input sentence: Physician In rmatia
GT sentence: Physician Information
Char Decoded sentence: Physician Information
Char Decoded sentence: Physician Information 
-Lenght =  1
Input sentence: C. Signatura oft-mending Physician
GT sentence: C. Signature of Attending Physician
Char Decoded sentence: C. Signature oft-rnding Physician
Char Decoded

-Lenght =  1
Input sentence: TIER 1 Family MOOF’ 
GT sentence: TIER 1 Family MOOP Max
Char Decoded sentence: TIER 1 Family MOOP LE
Char Decoded sentence: TIER a Family MOOP LE 
-Lenght =  1
Input sentence: TIER 1 Individual Deductible 
GT sentence: TIER 1 Individual Deductible
Char Decoded sentence: TIER 1 Individual Deductible
Char Decoded sentence: TIER a Individual Deductible 
-Lenght =  1
Input sentence: TIER 1 10011110001 MDOP Max 
GT sentence: TIER 1 Individual MOOP Max
Char Decoded sentence: TIER 1 10011110001 Max Max 
Char Decoded sentence: TIER a 10011110001 Max Max a 
-Lenght =  1
Input sentence: TIER 2 Family  
GT sentence: TIER 2 Family Deductible
Char Decoded sentence: TIER 2 Family @
Char Decoded sentence: TIER a Family a 
-Lenght =  1
Input sentence: TIER 2 Family MOO? Max 
GT sentence: TIER 2 Family MOOP Max
Char Decoded sentence: TIER 2 Family MOOP Max
Char Decoded sentence: TIER a Family MOOP Max 
-Lenght =  1
Input sentence: TIER 2 100111.109I  
GT sentence: TIER 2 I

-Lenght =  1
Input sentence: ongi'h'al Insurance P'ian' ‘SELF PAY“ [o] 
GT sentence: Original Insurance Plan *SELF PAY* [0]
Char Decoded sentence: Surgional Insurance Plance PAY*  PAY*[]
Char Decoded sentence: Surgical Insurance Place PAY a PAY*[] 
-Lenght =  1
Input sentence: Superwslng :Eroyider 
GT sentence: Supervising Provider
Char Decoded sentence: Superving P:ovider 
Char Decoded sentence: Supering Provider a 
-Lenght =  1
Input sentence: Ease? For  Copay
GT sentence: Reason For Payment Copay
Char Decoded sentence: Raso For Foopay
Char Decoded sentence: Rash For Floppy 
-Lenght =  1
Input sentence: Method o'f Payment
GT sentence: Method of Payment
Char Decoded sentence: Method of Payment
Char Decoded sentence: Method of Payment 
-Lenght =  1
Input sentence: Ar'nount
GT sentence: Amount
Char Decoded sentence: Arount
Char Decoded sentence: Around 
-Lenght =  1
Input sentence: Total Payment Amount 
GT sentence: Total Payment Amount
Char Decoded sentence: Total Payment Amount 
Char 

-Lenght =  1
Input sentence: Result Status: 
GT sentence: Result Status:
Char Decoded sentence: Result Status: 
Char Decoded sentence: Result Status a 
-Lenght =  1
Input sentence: Final result 
GT sentence: Final result
Char Decoded sentence: Final result
Char Decoded sentence: Final result 
-Lenght =  1
Input sentence: Piedmont Healllleare 
GT sentence: Piedmont Healthcare
Char Decoded sentence: Piedmont Health are
Char Decoded sentence: Piedmont Health are 
-Lenght =  1
Input sentence: PO Box I000 
GT sentence: PO Box 1000
Char Decoded sentence: OP Box 000
Char Decoded sentence: OP Box 000 
-Lenght =  1
Input sentence: Piscataway, NJ 03855-1000 
GT sentence: Piscataway, NJ 08855-1000
Char Decoded sentence: Piscataway, NJ 03855-100
Char Decoded sentence: Piscataway NJ 03855-100 
-Lenght =  1
Input sentence: Electronic Service Requested 1—377-601-3835 
GT sentence: Electronic Service Requested
Char Decoded sentence: Electronic Service Requested 1—377-601-383
Char Decoded sentence: Ele

-Lenght =  1
Input sentence: C Dually. US 
GT sentence: Country: US
Char Decoded sentence: Country.US 
Char Decoded sentence: Country.US a 
-Lenght =  1
Input sentence: Business Telephone: (952) 512- 5625 
GT sentence: Business Telephone: (952) 512- 5625
Char Decoded sentence: Business Telephone: (952) 512- 5625
Char Decoded sentence: Business Telephone (952) 512- 5625 
-Lenght =  1
Input sentence: Date ofl-‘irst Visit: 01/212018 
GT sentence: Date of First Visit: 01/21/2018
Char Decoded sentence: Date of -irth Visit:01/212018
Char Decoded sentence: Date of birth Visit:01/212018 
-Lenght =  1
Input sentence: Date ofNeXt Visit: 03/132018 
GT sentence: Date of Next Visit: 03/13/2018
Char Decoded sentence: Date of Next Visi:03/132018
Char Decoded sentence: Date of Next Visi:03/132018 
-Lenght =  1
Input sentence: Medical Pl'oxitler Information — Hospitalization 
GT sentence: Medical Provider Information - Hospitalization
Char Decoded sentence: Medical Provider Information  —ospitalization

-Lenght =  1
Input sentence: Date ﬁrst unable to work (mmlddlyy)
GT sentence: Date first unable to work (mm/dd/yy)
Char Decoded sentence: Date first unable to work(mmddy)
Char Decoded sentence: Date first unable to work(mmddy) 
-Lenght =  1
Input sentence: MRI Yes El No Date: (mmlddlyy) I 11?, 
GT sentence: MRI Yes No Date: (mm/dd/yy)
Char Decoded sentence: MRI Yes No Date D:t( mmdd)yy11
Char Decoded sentence: MRI Yes No Date Duty mmdd)yy11 
-Lenght =  1
Input sentence: Expected Delivery Date: (mmlddlyy) 
GT sentence: Expected Delivery Date: (mm/dd/yy)
Char Decoded sentence: Expected Delivery Date: (mmddyy) 
Char Decoded sentence: Expected Delivery Date (mmddyy) a 
-Lenght =  1
Input sentence: Actual Delivery Date: (mmlddlyy)
GT sentence: Actual Delivery Date: (mm/dd/yy)
Char Decoded sentence: Actual Delivery Date: (mmddyy)
Char Decoded sentence: Actual Delivery Date (mmddyy) 
-Lenght =  1
Input sentence: Date First Unable to Work (mmlddlyy) 
GT sentence: Date First Unable to Work (mm/

-Lenght =  1
Input sentence: ATTENDING PHYSICIAN STATEMENT (Contlnued) 
GT sentence: ATTENDING PHYSICIAN STATEMENT (Continued)
Char Decoded sentence: ATTENDING PHYSICIAN STATEMENT (Continued)
Char Decoded sentence: ATTENDING PHYSICIAN STATEMENT Continued 
-Lenght =  1
Input sentence: Facility Name 
GT sentence: Facility Name
Char Decoded sentence: Facility Name 
Char Decoded sentence: Facility Name a 
-Lenght =  1
Input sentence: Address 
GT sentence: Address
Char Decoded sentence: Address
Char Decoded sentence: Address 
-Lenght =  1
Input sentence: City State Zip 
GT sentence: City State Zip
Char Decoded sentence: City State Zip
Char Decoded sentence: City State Zip 
-Lenght =  1
Input sentence: Date Surge Performed (mmlddlyy): 
GT sentence: Date Surgery Performed (mm/dd/yy):
Char Decoded sentence: Date Surger Performed(mmddy): 
Char Decoded sentence: Date Surger Performed(mmddy): a 
-Lenght =  1
Input sentence: Diagnosis: .. lCD Code: 
GT sentence: Diagnosis: ICD Code:
Char Decoded s

-Lenght =  1
Input sentence: (1 ° ‘ 
GT sentence: unum
Char Decoded sentence: (1um
Char Decoded sentence: sum 
-Lenght =  1
Input sentence: November 30, 2016 
GT sentence: November 30, 2016
Char Decoded sentence: November 30,2016
Char Decoded sentence: November 30,2016 
-Lenght =  1
Input sentence: Confirmation of Coverage 
GT sentence: Confirmation of Coverage
Char Decoded sentence: Confirmation of Coverage
Char Decoded sentence: Confirmation of Coverage 
-Lenght =  1
Input sentence: Employer: 
GT sentence: Employer:
Char Decoded sentence: Employer:
Char Decoded sentence: Employers 
-Lenght =  1
Input sentence: Group Policy #: 
GT sentence: Group Policy #:
Char Decoded sentence: Group Policy #:
Char Decoded sentence: Group Policy of 
-Lenght =  1
Input sentence: Customer Policy #: 
GT sentence: Customer Policy #:
Char Decoded sentence: Customer Policy #:
Char Decoded sentence: Customer Policy of 
-Lenght =  1
Input sentence: EE Name: 
GT sentence: EE Name:
Char Decoded sentence: EE Na

-Lenght =  1
Input sentence: EALHJ' 11.93 Type , [‘UlAI Ry
GT sentence: Earnings Type: Hourly
Char Decoded sentence: Earning11.93e Med,c[ PAIN R
Char Decoded sentence: Earning11.93e medico PAIN R 
-Lenght =  1
Input sentence: Earn1ng€ Mnda: Monthly
GT sentence: Earnings Mode: Monthly
Char Decoded sentence: Earn1ng Midan:ationshly
Char Decoded sentence: earning Midan:ationshly 
-Lenght =  1
Input sentence: ﬁftsr Tax: 0.069
GT sentence: After Tax: 0.000
Char Decoded sentence: After Tax: 0.069
Char Decoded sentence: After Tax 0.069 
-Lenght =  1
Input sentence: Repart Group' ?6
GT sentence: Report Group: 26
Char Decoded sentence: Repart Group 6
Char Decoded sentence: Repart Group a 
-Lenght =  1
Input sentence: Product: Short Term Blsab111ty
GT sentence: Product: Short Term Disability
Char Decoded sentence: Product: Short Term Bisab111y
Char Decoded sentence: Products Short Term Bisab111y 
-Lenght =  1
Input sentence: Product Type: £53
GT sentence: Product Type: ASO
Char Decoded sentence:

-Lenght =  1
Input sentence: Acct #: 
GT sentence: Acct #:
Char Decoded sentence: Acct# :
Char Decoded sentence: Accts a 
-Lenght =  1
Input sentence: Adm: 3115:2013, arc: 3.1 some 
GT sentence: Adm: 3/16/18, D/C: 3/16/18
Char Decoded sentence: Adm: 3115:2013,c F:r3.1 some 
Char Decoded sentence: Admd 3115:2013,c F:r3.1 some a 
-Lenght =  1
Input sentence: Operative 8‘ Procedure Notes (continued) 
GT sentence: Operative & Procedure Notes (continued)
Char Decoded sentence: Operative 8 Procedure Notes (ontinued)
Char Decoded sentence: Operative a Procedure Notes continued 
-Lenght =  1
Input sentence: butt)? Larkin,John J, MD 
GT sentence: Author: Larkin, John J MD
Char Decoded sentence: Auth)r Larki, John,J MD 
Char Decoded sentence: author Larkin John MD a 
-Lenght =  1
Input sentence: Igor- Orthopedio 
GT sentence: Service: Orthopedic
Char Decoded sentence: Inso-  Orthopedic
Char Decoded sentence: Insol a Orthopedic 
-Lenght =  1
Input sentence: tr Ego: Physioan 
GT sentence: Author T

-Lenght =  1
Input sentence: Sei‘f SEH Yes Personali'i-‘amrly 
GT sentence: Acct Type Personal/Family
Char Decoded sentence: STail Sels Personaldicla -es
Char Decoded sentence: Stail Sels Personaldicla yes 
-Lenght =  1
Input sentence: Address 
GT sentence: Address
Char Decoded sentence: Address
Char Decoded sentence: Address 
-Lenght =  1
Input sentence: Phone 
GT sentence: Phone
Char Decoded sentence: Phone 
Char Decoded sentence: Phone a 
-Lenght =  1
Input sentence: Coverage Information (for Hospital Account: t 
GT sentence: Coverage Information (for Hospital Account )
Char Decoded sentence: Inspital Information (for Hospital Account:
Char Decoded sentence: Hospital Information for Hospital Account 
-Lenght =  1
Input sentence: PIC) Payori'ﬁlan 
GT sentence: F/O Payor/Plan
Char Decoded sentence: Pay)Paynorilann
Char Decoded sentence: Pay)Paynorilann 
-Lenght =  1
Input sentence: Precert # 
GT sentence: Precert #
Char Decoded sentence: Precert? #
Char Decoded sentence: Precepts a 
-

-Lenght =  1
Input sentence: I’xod‘ac‘t Typo: Leave Hgmt Svnz
GT sentence: Product Type: Leave Mgmt Svc
Char Decoded sentence: Indicat Type L:ave Tym Sivn
Char Decoded sentence: Indicate Type Leave Tm Sign 
-Lenght =  1
Input sentence: Funding: Not Applicable 
GT sentence: Funding: Not Applicable
Char Decoded sentence: Funding: Not Applicable 
Char Decoded sentence: Funding Not Applicable a 
-Lenght =  1
Input sentence: State Plan; No 
GT sentence: State Plan: No
Char Decoded sentence: State Plan; No 
Char Decoded sentence: State Plan No a 
-Lenght =  1
Input sentence: Employee Coverage; Yes 
GT sentence: Employee Coverage: Yes
Char Decoded sentence: Employee Coverage; Yes 
Char Decoded sentence: Employee Coverage Yes a 
-Lenght =  1
Input sentence: Employﬁr Coverage: YER 
GT sentence: Employer Coverage: Yes
Char Decoded sentence: Employer Coverage: Yes
Char Decoded sentence: Employer Coverage Yes 
-Lenght =  1
Input sentence: Policy No.: 
GT sentence: Policy No.:
Char Decoded sentence

-Lenght =  1
Input sentence: _ 3—0/1») I. [Lg/ZKJAJ 
GT sentence: Medical Specialty
Char Decoded sentence: St3—0/1I) &.y[Zi/ht 
Char Decoded sentence: St3—0/1I) &.y[Zi/ht a 
-Lenght =  1
Input sentence: Address 
GT sentence: Address
Char Decoded sentence: Address
Char Decoded sentence: Address 
-Lenght =  1
Input sentence: Gib] EUGEW’ﬂd’ﬂ K State Zip H/ﬂ/O 
GT sentence: City State Zip
Char Decoded sentence: Cit] State Zip
Char Decoded sentence: City State Zip 
-Lenght =  1
Input sentence: Telephone Number 
GT sentence: Telephone Number
Char Decoded sentence: Telephone Number 
Char Decoded sentence: Telephone Number a 
-Lenght =  1
Input sentence: Fax Number 
GT sentence: Fax Number
Char Decoded sentence: Fax Number 
Char Decoded sentence: Fax Number a 
-Lenght =  1
Input sentence: g g . SHORT TERM DISABILITY CLAIM FORM 
GT sentence: SHORT TERM DISABILITY CLAIM FORM
Char Decoded sentence: ART .HORE TER DISARILIZY CATIE FORM 
Char Decoded sentence: ART SHORE TER DISABILITY CATIE FORM a 


-Lenght =  1
Input sentence: CL-1116 (11114) 
GT sentence: CL-1116 (11/14)
Char Decoded sentence: CL-1116 (11114)
Char Decoded sentence: CL-1116 (11114) 
-Lenght =  1
Input sentence: Encounter Date: 02! 1 2/20 1: 
GT sentence: Encounter Date: 02/12/2018
Char Decoded sentence: Encounter Date: 0212/20
Char Decoded sentence: Encounter Date 0212/20 
-Lenght =  1
Input sentence: Author: Souha Hakim, MD 
GT sentence: Author: Souha Hakim, MD
Char Decoded sentence: Author: Souha Hakim, MD 
Char Decoded sentence: Author Socha Hakims MD a 
-Lenght =  1
Input sentence: Service; (none) 
GT sentence: Service: (none)
Char Decoded sentence: Service; (none) 
Char Decoded sentence: Service none a 
-Lenght =  1
Input sentence: Author Type: Physician 
GT sentence: Author Type: Physician
Char Decoded sentence: Author Type: Physician
Char Decoded sentence: Author Type Physician 
-Lenght =  1
Input sentence: Fiied: 03l12i18 0017 
GT sentence: Filed: 03/12/18 0017
Char Decoded sentence: Filed: 0312i180017 
C

-Lenght =  1
Input sentence: C lairn Event Identiﬁer: 26774 53 
GT sentence: Clairn Event Identifier: 2677453
Char Decoded sentence: Claimar Event Identifie:2677453
Char Decoded sentence: Claimer Event Identifie:2677453 
-Lenght =  1
Input sentence: Electronically Signed Indicator: Yes 
GT sentence: Electronically Signed Indicator: Yes
Char Decoded sentence: Electronically Signed Indicator: Yes
Char Decoded sentence: Electronically Signed Indicator Yes 
-Lenght =  1
Input sentence: Electronically Signed Date: Thursday 
GT sentence: Electronically Signed Date: Thursday
Char Decoded sentence: Electronically Signed Date: Thursday
Char Decoded sentence: Electronically Signed Date Thursday 
-Lenght =  1
Input sentence: Claim Tji'pe: V'B Accident - Accidental Injury 
GT sentence: Claim Type: VB Accident - Accidental Injury
Char Decoded sentence: Claim Type :B Accident  A-cidental Injury
Char Decoded sentence: Claim Type B Accident a Accidental Injury 
-Lenght =  1
Input sentence: Policg'h ol

-Lenght =  1
Input sentence: Taken on 0231112018 at 3:47 PM: 
GT sentence: Taken on 02/11/2018 at 3:47 PM:
Char Decoded sentence: Take on 0231112018 t a3:47
Char Decoded sentence: Take on 0231112018 t a3:47 
-Lenght =  1
Input sentence: BP: 112158 mmHg 
GT sentence: BP: 112/58 mmHg
Char Decoded sentence: BP: 112158 mmHg 
Char Decoded sentence: BPM 112158 mmHg a 
-Lenght =  1
Input sentence: PULSE: 75 bpm 
GT sentence: PULSE: 75 bpm
Char Decoded sentence: PULSE: 75 bpm
Char Decoded sentence: PULSE of bpm 
-Lenght =  1
Input sentence: RESP: 18 breathslmin 
GT sentence: RESP: 18 breaths/min
Char Decoded sentence: RESP: 18 breathsmin
Char Decoded sentence: RESP of breathsmin 
-Lenght =  1
Input sentence: TEMP: 98.5 
GT sentence: TEMP: 98.5
Char Decoded sentence: TEMP: 98.5 
Char Decoded sentence: TEMPS 98.5 a 
-Lenght =  1
Input sentence: WEIGHT: 114 |b(51.71 kg) 
GT sentence: WEIGHT: 114 lb(51.71 kg)
Char Decoded sentence: WEIGHT: 114 lb(51.71 kg) 
Char Decoded sentence: WEIGHT 114 lb(51.

-Lenght =  1
Input sentence: Policg'h old 91': Owner Information 
GT sentence: Policyholder: Owner Information
Char Decoded sentence: Policyholder91n:r Information 
Char Decoded sentence: Policyholder91n:r Information a 
-Lenght =  1
Input sentence: First Name: 
GT sentence: First Name:
Char Decoded sentence: First Name: 
Char Decoded sentence: First Name a 
-Lenght =  1
Input sentence: Last Name: 
GT sentence: Last Name:
Char Decoded sentence: Last Name: 
Char Decoded sentence: Last Name a 
-Lenght =  1
Input sentence: Social Secun'ty Number: 
GT sentence: Social Security Number:
Char Decoded sentence: Social Security Number: 
Char Decoded sentence: Social Security Number a 
-Lenght =  1
Input sentence: Birth Date: 
GT sentence: Birth Date:
Char Decoded sentence: Birth Date: 
Char Decoded sentence: Birth Date a 
-Lenght =  1
Input sentence: Gender: 
GT sentence: Gender:
Char Decoded sentence: Gender: 
Char Decoded sentence: Gender a 
-Lenght =  1
Input sentence: Language Preference: 


-Lenght =  1
Input sentence: AMOUNT BILL ED 
GT sentence: AMOUNT BILLED
Char Decoded sentence: AMOUNT BILLED ON
Char Decoded sentence: AMOUNT BILLED ON 
-Lenght =  1
Input sentence: ALLOWED AMOUNT 
GT sentence: ALLOWED AMOUNT
Char Decoded sentence: ALLOWED AMOUNT 
Char Decoded sentence: ALLOWED AMOUNT a 
-Lenght =  1
Input sentence: PARAMOU NT PAID 
GT sentence: PARAMOUNT PAID
Char Decoded sentence: PARAMOUNT PAID DID 
Char Decoded sentence: PARAMOUNT PAID DID a 
-Lenght =  1
Input sentence: 9 Indicates additional information i- available. 
GT sentence: Indicates additional information is available.
Char Decoded sentence: 9ndicates additional information is -vailable
Char Decoded sentence: indicates additional information is available 
-Lenght =  1
Input sentence: Employer Nana: 
GT sentence: Employer Name:
Char Decoded sentence: Employer Name: 
Char Decoded sentence: Employer Name a 
-Lenght =  1
Input sentence: Electron [1: S u Innis sion 
GT sentence: Electronic: Submission
Char Dec

-Lenght =  1
Input sentence: Date ofVisit/A d‘miss ion 03/09/2018 
GT sentence: Date of Visit/Admission: 03/09/2018
Char Decoded sentence: Date of Visi/Admiss on 03/09/201
Char Decoded sentence: Date of Visi/Admiss on 03/09/201 
-Lenght =  1
Input sentence: Date ofDiseharge: 03/09/2018 
GT sentence: Date of Discharge: 03/09/2018
Char Decoded sentence: Date of Discharg:03/09/2018
Char Decoded sentence: Date of Discharg:03/09/2018 
-Lenght =  1
Input sentence: Procedure: ER visit, me 
GT sentence: Procedure: ER visit, Xray
Char Decoded sentence: Procedure: ER visit,Number 
Char Decoded sentence: Procedure ER visit,Number a 
-Lenght =  1
Input sentence: Emplomi ent In formation 
GT sentence: Employment Information
Char Decoded sentence: Employment Information
Char Decoded sentence: Employment Information 
-Lenght =  1
Input sentence: Medical Provider Roles: 
GT sentence: Medical Provider Roles: Treating
Char Decoded sentence: Medical Provider Roles: 
Char Decoded sentence: Medical Provide

-Lenght =  1
Input sentence: Accident Work Related: No
GT sentence: Accident Work Related: No
Char Decoded sentence: Accident Work Related: No
Char Decoded sentence: Accident Work Related No 
-Lenght =  1
Input sentence: Time ofAccident: 14:45 
GT sentence: Time of Accident: 14:45
Char Decoded sentence: Time of Acciden:14:45 
Char Decoded sentence: Time of Acciden:14:45 a 
-Lenght =  1
Input sentence: Accident Date: 02/170018 
GT sentence: Accident Date: 02/17/2018
Char Decoded sentence: Accident Date: 02/170018
Char Decoded sentence: Accident Date 02/170018 
-Lenght =  1
Input sentence: Diagnosis Code: knee injury
GT sentence: Diagnosis Code: knee injury
Char Decoded sentence: Diagnosis Code: knee injury
Char Decoded sentence: Diagnosis Code knee injury 
-Lenght =  1
Input sentence: Sn rg er)’ Information 
GT sentence: Surgery Information
Char Decoded sentence: Surgery )nformation 
Char Decoded sentence: Surgery information a 
-Lenght =  1
Input sentence: 15 Surgery Required: No 
GT s

NameError: name 'results' is not defined

In [None]:
WER_spell_correction = calculate_WER(gt_texts, decoded_sentences)
print('WER_spell_correction |TEST= ', WER_spell_correction)

In [None]:
WER_OCR = calculate_WER(gt_texts, input_texts)
print('WER_OCR |TEST= ', WER_OCR)