# Project: Second-Order HMM for typo correction



The goal is to design a model that corrects typos in texts without using a dictionary.

In this problem, a state refers to the correct letter that should have been typed, and an observation refers to the actual letter that is typed. Given a sequence of outputs/observations (i.e., actually typed letters), the problem is to reconstruct the hidden state sequence (i.e., the intended sequence of letters). Thus, data for this problem looks like:

* [('t', 't'), ('h', 'h'), ('w', 'e'), ('k', 'm')]
* [('f', 'f'), ('o', 'o'), ('r', 'r'), ('m', 'm')] 

The first example is misspelled: the observation is thwk while the correct word is them. The second example is correctly typed.

The data for this problem was generated as follows. The starting point is a text document, in this case the Unabomber's Manifesto, chosen not for political reasons but because it is conveniently available online and of about the right length. All numbers and punctuation were converted to white space and all letters to lower case. The remaining text is a sequence over the lower-case letters and the space character only, the latter represented in the data files by an underscore. Typos were then added artificially: with 90% probability the correct letter is transcribed, and with 10% probability a randomly chosen neighbor of that letter (on an ordinary physical keyboard) is transcribed instead. Space characters are always transcribed correctly. In a harder variant of the problem, the error rate is increased to 20%.
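
The corruption step described above can be sketched as follows. The neighbor map is a small hypothetical excerpt of a QWERTY layout and `add_typos` is an illustrative name; the generator actually used to build the dataset is not part of this notebook.

```python
import random

# Hypothetical, partial QWERTY neighbor map (assumption: the real one covers all 26 letters).
NEIGHBORS = {
    't': 'rfgy',
    'h': 'gyujnb',
    'e': 'wsdr',
    'm': 'njk',
}

def add_typos(text, error_rate=0.1, rng=random):
    """Return (typed, intended) pairs for each character of `text`."""
    pairs = []
    for ch in text:
        if ch in NEIGHBORS and rng.random() < error_rate:
            # Typo: a randomly chosen keyboard neighbor is typed instead.
            pairs.append((rng.choice(NEIGHBORS[ch]), ch))
        else:
            # Correct transcription; also used for '_' and letters missing from the partial map.
            pairs.append((ch, ch))
    return pairs

print(add_typos("them", error_rate=0.1, rng=random.Random(0)))
```

Each pair follows the same (observation, state) order as the samples shown earlier, so the output can be fed to the same counting code.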

The dataset is provided as an archive; see the shared drive to download it. The archive contains 4 pickles: train10 and test10 form the dataset with 10% of spelling errors, while train20 and test20 form the one with 20% of errors.


## Part 1: First-Order HMM

In [1]:
import numpy as np
from numpy import ones, zeros

class HMM:
        def __init__(self, state_list, observation_list,
                 transition_proba = None,
                 observation_proba = None,
                 initial_state_proba = None):
            """Builds a new Hidden Markov Model
            state_list is the list of state symbols [q_0...q_(N-1)]
            observation_list is the list of observation symbols [v_0...v_(M-1)]
            transition_proba is the transition probability matrix
                [a_ij] a_ij = Pr(Y_(t+1)=q_j|Y_t=q_i)
            observation_proba is the observation probability matrix
                [b_ki] b_ki = Pr(X_t=v_k|Y_t=q_i)
            initial_state_proba is the initial state distribution
                [pi_i] pi_i = Pr(Y_0=q_i)"""
            print("HMM created with: ")
            self.N = len(state_list) # The number of states
            self.M = len(observation_list) # The number of observation symbols
            print(str(self.N)+" states")
            print(str(self.M)+" observations")
            self.omega_Y = state_list # Keep the vocabulary of states
            self.omega_X = observation_list # Keep the vocabulary of observations
            # Init. of the 3 distributions : observation, transition and initial states
            if transition_proba is None:
                self.transition_proba = zeros( (self.N, self.N), float) 
            else:
                self.transition_proba=transition_proba
            if observation_proba is None:
                self.observation_proba = zeros( (self.M, self.N), float) 
            else:
                self.observation_proba=observation_proba
            if initial_state_proba is None:
                self.initial_state_proba = zeros( (self.N,), float ) 
            else:
                self.initial_state_proba=initial_state_proba
            # Since everything will be stored in numpy arrays, it is more convenient and compact to
            # handle observations and states as integer indices for direct access. However, we also
            # need to keep the mapping between symbols (observation or state) and indices.
            self.make_indexes()

        def make_indexes(self):
            """Creates the reverse table that maps states/observations names
            to their index in the probabilities arrays"""
            self.Y_index = {}
            omega_Y_keys = [key for key in self.omega_Y.keys()]
            omega_X_keys = [key for key in self.omega_X.keys()]
            for i in range(self.N):
                self.Y_index[omega_Y_keys[i]] = i
            self.X_index = {}
            for i in range(self.M):
                self.X_index[omega_X_keys[i]] = i
        
        def compute_init_state_proba(self, data):
            for sent in data:
                self.initial_state_proba[self.Y_index[sent[0][1]]]+=1
            self.initial_state_proba/=len(data)
            
        def compute_observation_probas(self, data):            
            for phr in data:
                for word in phr:
                    x = self.X_index[word[0]]
                    y = self.Y_index[word[1]]
                    self.observation_proba[x][y] += 1
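            # NOTE: normalizing over axis=1 makes each row sum to 1 over the states, i.e.
            # Pr(Y_t=q_i|X_t=v_k); the b_ki of the docstring would be obtained with axis=0.
            self.observation_proba /= np.sum(self.observation_proba, axis=1)[:, np.newaxis]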
             
        def compute_transition_probas(self, data):            
            for phr in data:
                for i in range(len(phr) - 1):
                    yplus1 = self.Y_index[phr[i + 1][1]]
                    y = self.Y_index[phr[i][1]]
                    self.transition_proba[y][yplus1] += 1
            self.transition_proba /= np.sum(self.transition_proba, axis=1)[:, np.newaxis]
            
        def init_parameters(self, train_set):
            self.compute_init_state_proba(train_set)
            self.compute_observation_probas(train_set)
            self.compute_transition_probas(train_set)
            
        def forward(self, obs):
            alpha = np.zeros((len(obs), len(self.Y_index)))
            alpha[0] = self.initial_state_proba\
                        * self.observation_proba[self.X_index[obs[0][0]]]
            for i in range(1, len(alpha)):
                alpha[i] = self.observation_proba[self.X_index[obs[i][0]]] *\
                np.sum(self.transition_proba.T * alpha[i - 1], axis=1)
            return alpha
        
        def backward(self, obs):
            beta = np.zeros((len(obs), len(self.Y_index)))
            beta[-1] = ones(len(self.Y_index))
            for i in range(len(obs) - 2, -1, -1):
                beta[i] = np.sum(beta[i + 1]\
                                 * self.observation_proba[self.X_index[obs[i + 1][0]]]\
                                 * self.transition_proba, axis=1)
            return beta
        
        def decode(self, alpha, beta):
            prob = alpha * beta
            preds = prob.argmax(axis=1)
            keys = [key for key in self.omega_Y.keys()]
            return [keys[pred_ind] for pred_ind in preds]
        
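        def viterbi(self, obs):
            # NOTE: this is a greedy, step-by-step decoder rather than a full Viterbi search:
            # there is no backtracking, and step i conditions on obs[i - 1][1], i.e. the
            # reference tag of the previous letter.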
            mu_max = np.zeros(len(obs))
            tmp = self.initial_state_proba * self.observation_proba[self.X_index[obs[0][0]]]
            index = [np.argmax(tmp)]
            mu_max[0] = max(tmp)
            for i in range(1, len(obs)):
                tmp = self.observation_proba[self.X_index[obs[i][0]]]\
                        * self.transition_proba[self.Y_index[obs[i - 1][1]]]\
                        * mu_max[i - 1]
                index.append(np.argmax(tmp))
                mu_max[i] = max(tmp)
            keys = [key for key in self.omega_Y.keys()]
            return [keys[ind] for ind in index]
            
        def score_eval(self, test):
            error = 0
            elements = 0
            errors_corrected = 0
            errors_added = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                alpha = self.forward(word)
                beta = self.backward(word)
                preds = self.decode(alpha, beta)
                elements += len(preds)
                for x, y, pred in zip(base, truth, preds):
                    if pred != x and pred == y:
                        errors_corrected += 1
                    if pred != y:
                        error += 1
                        if x == y:
                            errors_added += 1
            return error / elements, errors_corrected, errors_added
        
        def score_viterbi(self, test):
            error = 0
            elements = 0
            errors_corrected = 0
            errors_added = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                preds = self.viterbi(word)
                elements += len(preds)
                for x, y, pred in zip(base, truth, preds):
                    if pred != x and pred == y:
                        errors_corrected += 1
                    if pred != y:
                        error += 1
                        if x == y:
                            errors_added += 1
            return error / elements, errors_corrected, errors_added

        def score_dummy(self, test):
            error = 0
            elements = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                elements += len(truth)
                for x, y in zip(base, truth):
                    if x != y:
                        error += 1
            return error / elements
        
        def results_hmm(self, test):
            error_test, fb_corrected, fb_added = self.score_eval(test)
            viterbi_error_test, vit_corrected, vit_added = self.score_viterbi(test)
            error_dummy = self.score_dummy(test)
            print("Error forward-backward = {:.2%}, {} errors corrected, {} errors added"
                  .format(error_test, fb_corrected, fb_added))
            print("Error viterbi = {:.2%}, {} errors corrected, {} errors added"
                  .format(viterbi_error_test, vit_corrected, vit_added))
            print("Error with nothing changed = {:.2%}".format(error_dummy))
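
For comparison with the `viterbi` method above, which conditions each step on `obs[i - 1][1]` (the reference tag of the previous letter) and does not backtrack, here is a minimal sketch of the textbook Viterbi recursion with back-pointers. It is a standalone function with hypothetical argument names (`pi`, `A`, `B`), assuming `A[i][j] = Pr(Y_(t+1)=q_j | Y_t=q_i)` and `B[k][i] = Pr(X_t=v_k | Y_t=q_i)`. It was not used to produce the results reported in this notebook.

```python
import math

def viterbi_decode(x_idx, pi, A, B):
    """Most likely state index sequence for the observation indices x_idx."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n_states = len(pi)
    # delta[i]: best log-score of any path ending in state i at the current step.
    delta = [log(pi[i]) + log(B[x_idx[0]][i]) for i in range(n_states)]
    back = []  # back[t - 1][j]: best predecessor of state j at step t
    for x in x_idx[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log(A[i][j]))
            pointers.append(best_i)
            new_delta.append(delta[best_i] + log(A[best_i][j]) + log(B[x][j]))
        delta = new_delta
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda i: delta[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy model with 2 states and 2 observation symbols (made-up numbers).
pi = [0.6, 0.4]
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.2],   # Pr(x=0 | y=0), Pr(x=0 | y=1)
     [0.1, 0.8]]   # Pr(x=1 | y=0), Pr(x=1 | y=1)
print(viterbi_decode([0, 0, 1], pi, A, B))
```

Working in log space avoids numerical underflow on long sequences; for the short words of this corpus, plain products would probably work as well.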

## Reading and splitting the data

In [2]:
import pickle
import numpy as np

In [3]:
def load_pickle(path):
    with open(path, "rb") as f:  # closes the file handle once loaded
        return pickle.load(f)

train10 = load_pickle("./typos-data/train10.pkl")
train20 = load_pickle("./typos-data/train20.pkl")
test10 = load_pickle("./typos-data/test10.pkl")
test20 = load_pickle("./typos-data/test20.pkl")

## Building the vocabulary and the HMM

In [4]:
def distrib_x_y_data(data):
    """Count symbols in `data`; returns (observation counts, state counts) as two dicts."""
    set_tag = []
    set_mot = []
    dict_tag = dict()
    dict_mot = dict()
    for phrase in data:
        for mot in phrase:
            if not(mot[1] in set_tag):
                set_tag.append(mot[1])
                dict_tag[mot[1]]=0
            dict_tag[mot[1]]+=1
            if not(mot[0] in set_mot):
                set_mot.append(mot[0])
                dict_mot[mot[0]]=0
            dict_mot[mot[0]]+=1
    return dict_mot, dict_tag

### Corpus with 10% errors

In [5]:
obs_list, state_list = distrib_x_y_data(train10)

hmm = HMM(state_list, obs_list)

HMM created with: 
26 states
26 observations


In [6]:
hmm.init_parameters(train10)

hmm.results_hmm(test10)

Error forward-backward = 8.29%, 332 errors corrected, 194 errors added
Error viterbi = 8.92%, 296 errors corrected, 204 errors added
Error with nothing changed = 10.18%


### Corpus with 20% errors

In [7]:
obs_list, state_list = distrib_x_y_data(train20)

hmm_20 = HMM(state_list, obs_list)

HMM created with: 
26 states
26 observations


In [8]:
hmm_20.init_parameters(train20)

hmm_20.results_hmm(test20)

Error forward-backward = 15.04%, 1453 errors corrected, 724 errors added
Error viterbi = 15.85%, 1401 errors corrected, 808 errors added
Error with nothing changed = 19.41%


## Part 2: Second-Order HMM

In [9]:
import numpy as np
from numpy import zeros

class HMM2(HMM):
        def __init__(self, state_list, observation_list,
                 transition_proba = None,
                 observation_proba = None,
                 initial_state_proba = None):
            super().__init__(state_list, observation_list,
                             transition_proba, observation_proba, initial_state_proba)
            self.observation_proba_order2 = zeros((self.M, self.N, self.N), float) 
            self.transition_proba_order2 = zeros((self.N, self.N, self.N), float) 
            
        def compute_observation_probas_order2(self, data):            
            for phr in data:
                for i in range(1, len(phr)):
                    x = self.X_index[phr[i][0]]
                    y = self.Y_index[phr[i][1]]
                    yminus1 = self.Y_index[phr[i - 1][1]]
                    self.observation_proba_order2[x][yminus1][y] += 1
            sumPlus1 = np.sum(self.observation_proba_order2, axis=2)[:, :, np.newaxis]
            self.observation_proba_order2 /= np.where(sumPlus1 == 0, 1, sumPlus1)
        
        def compute_transition_probas_order2(self, data):  
            nb_trigrams = 0.0
            nb_bigrams = 0.0
            unigram = zeros(self.N)
            bigrams = zeros((self.N, self.N))
            trigrams = zeros((self.N, self.N, self.N))
            for phr in data:
                for i in range(2, len(phr)):
                    y = self.Y_index[phr[i][1]]
                    unigram[y] += 1
                    yminus1 = self.Y_index[phr[i - 1][1]]
                    bigrams[yminus1][y] += 1
                    nb_bigrams += 1.0
                    yminus2 = self.Y_index[phr[i - 2][1]]
                    trigrams[yminus2][yminus1][y] += 1
                    nb_trigrams += 1.0
                    
            k3 = (np.log(trigrams + 1) + 999) / (np.log(trigrams + 1) + 1000)
            k2 = (np.log(bigrams + 1) + 1) / (np.log(bigrams + 1) + 1000)
            lambda1 = k3
            lambda2 = (1 - k3) * k2
            lambda3 = (1 - k3) * (1 - k2)
            for phr in data:
                for i in range(2, len(phr)):
                    y = self.Y_index[phr[i][1]]
                    yminus1 = self.Y_index[phr[i - 1][1]]
                    yminus2 = self.Y_index[phr[i - 2][1]]
                    self.transition_proba_order2[yminus2][yminus1][y] = lambda1[yminus2][yminus1][y] \
                                                * trigrams[yminus2][yminus1][y]\
                                                + lambda2[yminus2][yminus1][y] * bigrams[yminus1][y]\
                                                + lambda3[yminus2][yminus1][y] * unigram[y]
                    
            sumPlus1 = np.sum(self.transition_proba_order2, axis=1)[:, np.newaxis, :]
            self.transition_proba_order2 /= np.where(sumPlus1 == 0, 1, sumPlus1)
            
        def init_parameters(self, train_set):
            super().init_parameters(train_set)
            self.compute_observation_probas_order2(train_set)
            self.compute_transition_probas_order2(train_set)
        
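        def viterbi(self, obs):
            # NOTE: as in HMM.viterbi, decoding is greedy and conditions on the reference
            # tags obs[i - 1][1] and obs[i - 2][1] of the previous letters.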
            keys = [key for key in self.omega_Y.keys()]
            
            delta = np.zeros(len(obs))
            tmp = self.initial_state_proba * self.observation_proba[self.X_index[obs[0][0]]]
            phi = [np.argmax(tmp)]
            delta[0] = max(tmp)
            
            if len(obs) < 2:
                return [keys[ind] for ind in phi]
            else:
# For second order observation probabilities                
#                 tmp2 = self.observation_proba_order2[self.X_index[obs[1][0]]][self.Y_index[obs[0][1]]]\
#                             * self.transition_proba_order1[self.Y_index[obs[0][1]]]\
#                             * delta[0]
                tmp2 = self.observation_proba[self.X_index[obs[1][0]]]\
                            * self.transition_proba[self.Y_index[obs[0][1]]]\
                            * delta[0]
                phi.append(np.argmax(tmp2))
                delta[1] = max(tmp2)

                for i in range(2, len(obs)):
# For second order observation probabilities                
#                     tmp = self.observation_proba_order2[self.X_index[obs[i][0]]][self.Y_index[obs[i - 1][1]]]\
#                             * self.transition_proba[self.Y_index[obs[i - 2][1]]][self.Y_index[obs[i - 1][1]]]\
#                             * delta[i - 1]
                    tmp = self.observation_proba[self.X_index[obs[i][0]]]\
                        * self.transition_proba_order2[self.Y_index[obs[i - 2][1]]][self.Y_index[obs[i - 1][1]]]\
                        * delta[i - 1]
                    phi.append(np.argmax(tmp))
                    delta[i] = max(tmp)
                return [keys[ind] for ind in phi]
   
        def results_hmm(self, test):
            viterbi_error_test, vit_corrected, vit_added = self.score_viterbi(test)
            error_dummy = self.score_dummy(test)
            print("Error viterbi = {:.2%}, {} errors corrected, {} errors added"
                  .format(viterbi_error_test, vit_corrected, vit_added))
            print("Error with nothing changed = {:.2%}".format(error_dummy))

### Corpus with 10% errors

In [10]:
obs_list, state_list = distrib_x_y_data(train10)

hmm2 = HMM2(state_list, obs_list)

HMM created with: 
26 states
26 observations


In [11]:
hmm2.init_parameters(train10)

In [12]:
hmm2.results_hmm(test10)

Error viterbi = 6.82%, 355 errors corrected, 109 errors added
Error with nothing changed = 10.18%


### Corpus with 20% errors

In [13]:
obs_list, state_list = distrib_x_y_data(train20)

hmm2_20 = HMM2(state_list, obs_list)

HMM created with: 
26 states
26 observations


In [14]:
hmm2_20.init_parameters(train20)

hmm2_20.results_hmm(test20)

Error viterbi = 12.35%, 1625 errors corrected, 447 errors added
Error with nothing changed = 19.41%


### Remarks

- Using first-order observation probabilities (conditioned only on the current state) gives better results than observation probabilities conditioned on both the current and the previous state.
- Surprisingly, normalizing the transition counts over axis 1, so that the probabilities sum to 1 over the state at time t - 1 rather than over the state at time t (axis 2), also gives better results.
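
To make the second remark concrete, the sketch below normalizes a toy array of trigram counts along each of the two axes. The array `counts` is made up for illustration and follows the indexing of `transition_proba_order2`, i.e. `counts[yminus2][yminus1][y]`.

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.integers(1, 10, size=(3, 3, 3)).astype(float)  # toy counts, 3 states

# Axis 2: for each context (y_{t-2}, y_{t-1}) the values sum to 1 over y_t,
# which is the usual definition of Pr(y_t | y_{t-2}, y_{t-1}).
over_current = counts / counts.sum(axis=2, keepdims=True)

# Axis 1: for each pair (y_{t-2}, y_t) the values sum to 1 over y_{t-1},
# which is the normalization applied in compute_transition_probas_order2.
over_previous = counts / counts.sum(axis=1, keepdims=True)

print(over_current.sum(axis=2))   # every entry is 1, up to rounding
print(over_previous.sum(axis=1))  # every entry is 1, up to rounding
```

Only the axis-2 version is a conditional distribution over the next state; the axis-1 version is better read as a rescaled score, which may be part of why it behaves differently in the decoder.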

## Handling deletions and insertions

First, to handle deletions and/or insertions, we need a training corpus that contains them.
Such a corpus can easily be created by randomly removing or adding letters in our current corpus. We just need to specify how these edits are represented.

One solution is to represent an insertion as a pair whose observation is a letter and whose tag is the empty string, and a deletion as the opposite: an empty observation whose tag is the deleted letter.

For example:
- The word cat with a b inserted in second position: [('c', 'c'), ('b', ''), ('a', 'a'), ('t', 't')]
- The word cat with the t deleted: [('c', 'c'), ('a', 'a'), ('', 't')]
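
With this representation, the typed word and the intended word can both be read back from a sample, since the empty strings simply vanish when the letters are joined. A minimal sketch (the helper name `render` is ours):

```python
def render(sample):
    """Return (typed, intended) strings; '' marks an inserted or a deleted letter."""
    typed = "".join(obs for obs, _ in sample)
    intended = "".join(tag for _, tag in sample)
    return typed, intended

inserted = [('c', 'c'), ('b', ''), ('a', 'a'), ('t', 't')]
deleted = [('c', 'c'), ('a', 'a'), ('', 't')]
print(render(inserted))  # ('cbat', 'cat')
print(render(deleted))   # ('ca', 'cat')
```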

#### How to handle this in the HMM?

The simplest solution is probably to treat insertions and deletions as tags. Their counts are then computed at the same time as everything else.

In that case, whether a letter should be added or removed is decided in the same way as any other correction, i.e. when the corresponding tag has the highest probability.

For an insertion this is enough: the letter is simply deleted. For a deletion, however, we also need to know which letter to put back where we decided one was missing.

One solution would be to pick the most probable letter given the last state, or the last several states, but this is unlikely to give good results. A probably better one would be to wait until the end of the word and decide using all the observed states. Of course, this solution would also be harder to implement.

The best solution would of course be to have a full dictionary of possible words and match the observed words against it to find the missing letter. Obviously, this only works if the number of possible words is finite.
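
As a rough sketch of the dictionary idea, the standard library's `difflib.get_close_matches` can rank the entries of a word list by similarity to the observed word. The word list `LEXICON` below is a made-up placeholder, and this is only one possible matching strategy, not something evaluated in this project.

```python
import difflib

LEXICON = ["cat", "cart", "coat", "them", "form"]  # hypothetical dictionary

def closest_word(observed, lexicon=LEXICON):
    """Return the dictionary entry most similar to `observed`, or None if nothing is close."""
    matches = difflib.get_close_matches(observed, lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(closest_word("ca"))   # cat
print(closest_word("fom"))  # form
```

The `cutoff` value is the library default and would need tuning: a heavily corrupted word may fall below it and return `None`.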

## An attempt to handle noisy insertions

### Creation of train and test sets

In [15]:
def noisy_insertions(dataset, insertion_probability):
    set_tags = []
    for word in dataset:
        for letter in word:
            if letter[1] not in set_tags:
                set_tags.append(letter[1])
    new_data = []
    for word in dataset:
        new_word = []
        rand_insert = [np.random.random() < insertion_probability for _ in word]
        for i, b in enumerate(rand_insert):
            if b:
                rand_letter = (set_tags[np.random.randint(len(set_tags))], '')
                new_word.append(rand_letter)
            new_word.append(word[i])
        new_data.append(new_word)
    return new_data

In [16]:

class HMM3(HMM2):

        
        def score_viterbi(self, test):
            error = 0
            elements = 0
            errors_corrected = 0
            errors_added = 0
            correctly_deleted = 0
            uncorrectly_deleted = 0
            noisy_notcorrected = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                preds = self.viterbi(word)
                elements += len(preds)
                for x, y, pred in zip(base, truth, preds):
                    if pred != x and pred == y:
                        if y == '':
                            correctly_deleted += 1
                        errors_corrected += 1
                    if pred != y:
                        error += 1
                        if x == y:
                            errors_added += 1
                        if pred == '':
                            uncorrectly_deleted += 1
                        if y == '':
                            noisy_notcorrected += 1
            return error / elements, (uncorrectly_deleted + noisy_notcorrected) / elements,\
                    errors_corrected, errors_added, correctly_deleted, uncorrectly_deleted

        def score_dummy(self, test):
            error = 0
            insertion_errors = 0
            elements = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                elements += len(truth)
                for x, y in zip(base, truth):
                    if x != y:
                        error += 1
                        if y == '':
                            insertion_errors += 1
            return error / elements, insertion_errors / elements
        
        def results_hmm(self, test):
            viterbi_error_test, vit_noise_errors, vit_corrected, vit_added, vit_noise, \
            vit_unc_noise = self.score_viterbi(test)
            error_dummy, ins_errors_dummy = self.score_dummy(test)
            print("Error viterbi = {:.2%}, including {:.2%} errors for correcting noisy insertions.\
            \n{} errors corrected, {} errors added, {} noisy insertions removed, {} letters unjustly removed"
                  .format(viterbi_error_test, vit_noise_errors, vit_corrected, \
                          vit_added, vit_noise, vit_unc_noise))
            print("Error with nothing changed = {:.2%}, noisy insertion errors = {:.2%}"
                  .format(error_dummy, ins_errors_dummy))

In [17]:
new_train10 = noisy_insertions(train10, 0.05)
new_test10 = noisy_insertions(test10, 0.05)

In [18]:
obs_list, state_list = distrib_x_y_data(new_train10)

hmm3 = HMM3(state_list, obs_list)

HMM created with: 
27 states
26 observations


In [19]:
hmm3.init_parameters(new_train10)

In [20]:
hmm3.results_hmm(new_test10)

Error viterbi = 10.62%, including 4.26% errors for correcting noisy insertions.            
411 errors corrected, 118 errors added, 93 noisy insertions removed, 56 letters unjustly removed
Error with nothing changed = 14.43%, noisy insertion errors = 4.74%


#### Remarks

We expect the performance on removing noisy insertions to improve with the size of the training set: the more examples of transitions involving a noisy insertion there are, the higher the chance of removing those insertions correctly.
As of now, it works, but not very well.

## An attempt to handle character deletions

### Creation of train and test sets

In [21]:
def noisy_deletions(dataset, deletion_probability):
    """Replace each observed letter by '' with the given probability, keeping its tag."""
    new_data = []
    for word in dataset:
        new_word = []
        for obs, tag in word:
            if np.random.random() < deletion_probability:
                new_word.append(('', tag))
            else:
                new_word.append((obs, tag))
        new_data.append(new_word)
    return new_data

In [22]:

class HMM4(HMM2):
    
        def score_viterbi(self, test):
            error = 0
            elements = 0
            errors_corrected = 0
            errors_added = 0
            deletion_detected = 0
            deletion_added = 0
            deletion_undetected = 0
            correctly_undeleted = 0
            uncorrectly_undeleted = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                preds = self.viterbi(word)
                elements += len(preds)
                for x, y, pred in zip(base, truth, preds):
                    if pred != x and pred == y:
                        if x == '':
                            correctly_undeleted += 1
                            deletion_detected += 1
                        errors_corrected += 1
                    if pred != y:
                        error += 1
                        if x == y:
                            errors_added += 1
                        if x == '':
                            if pred != '':
                                deletion_detected += 1
                            else:
                                deletion_undetected += 1
                            uncorrectly_undeleted += 1
                        if pred == '' and x != '':
                            deletion_added += 1
            return error / elements, (deletion_undetected + deletion_added + uncorrectly_undeleted) / elements,\
                    errors_corrected, errors_added, correctly_undeleted, deletion_undetected,\
                        deletion_added, uncorrectly_undeleted

        def score_dummy(self, test):
            error = 0
            deletion_errors = 0
            elements = 0
            for word in test:
                base = [letter for (letter, _) in word]
                truth = [tag for (_, tag) in word]
                elements += len(truth)
                for x, y in zip(base, truth):
                    if x != y:
                        error += 1
                        if x == '':
                            deletion_errors += 1
            return error / elements, deletion_errors / elements
        
        def results_hmm(self, test):
            viterbi_error_test, vit_noise_errors, vit_corrected, vit_added, vit_noise,\
            vit_undet_noise, vit_add_noise, vit_unc_undel = self.score_viterbi(test)
            error_dummy, del_errors_dummy = self.score_dummy(test)
            print("Error viterbi = {:.2%}, {:.2%} errors for correcting noisy deletions.\
            \n{} errors corrected, {} errors added, {} deletion correctly corrected, {} deletions not detected,\
            \n{} deletions added, {} deletions detected but uncorrectly corrected."
                  .format(viterbi_error_test, vit_noise_errors, vit_corrected, vit_added, vit_noise,\
                          vit_undet_noise, vit_add_noise, vit_unc_undel))
            print("Error with nothing changed = {:.2%}, noisy deletion errors = {:.2%}"
                  .format(error_dummy, del_errors_dummy))

In [23]:
new2_train10 = noisy_deletions(train10, 0.05)
new2_test10 = noisy_deletions(test10, 0.05)

In [24]:
obs_list, state_list = distrib_x_y_data(new2_train10)

hmm4 = HMM4(state_list, obs_list)

HMM created with: 
26 states
27 observations


In [25]:
hmm4.init_parameters(new2_train10)

In [26]:
hmm4.results_hmm(new2_test10)

Error viterbi = 9.78%, 3.33% errors for correcting noisy deletions.            
460 errors corrected, 104 errors added, 117 deletion correctly corrected, 0 deletions not detected,            
0 deletions added, 244 deletions detected but uncorrectly corrected.
Error with nothing changed = 14.64%, noisy deletion errors = 4.93%


#### Remarks

It works surprisingly well, but this is probably due to a problem in the way deleted characters are represented. The representation turns the deletion problem into a substitution problem, with '' being a special character that only appears in observations.

Since we want to keep the tag of the deleted character, detecting deletions becomes trivial for the HMM: the observation is simply ''.

Instead, we should find a way to keep the tag while also shortening the observed word, so that the model cannot tell in advance whether a letter is missing.

We thought about it but did not find a good way to overcome this limitation.
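
One way to see the mismatch is to build the input the model would receive in a realistic setting, where deleted letters leave no trace. The sketch below (with a hypothetical helper name `drop_deleted`) removes the '' placeholders from the observations while keeping the full tag sequence as the target. The two sequences then have different lengths, which the position-by-position decoding used above cannot handle.

```python
def drop_deleted(sample):
    """Split a sample into the visible observations and the full target tags."""
    observed = [obs for obs, _ in sample if obs != '']
    target = [tag for _, tag in sample]
    return observed, target

sample = [('c', 'c'), ('a', 'a'), ('', 't')]
observed, target = drop_deleted(sample)
print(observed, target)            # ['c', 'a'] ['c', 'a', 't']
print(len(observed), len(target))  # 2 3
```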