### Spelling Recommender

I will create two different spelling recommenders, each of which takes a list of misspelled words and recommends a correctly spelled word for every word in the list.

For every misspelled word, the recommender finds the word in correct_spellings that starts with the same letter as the misspelled word and has the shortest distance to it, and returns that word as the recommendation. The two recommenders differ only in the distance measure they use.
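
The selection rule described above can be sketched independently of any particular distance measure. This is a minimal illustration, not the notebook's implementation: `recommend`, `mismatch_distance`, and `toy_dictionary` are hypothetical names, and `mismatch_distance` is only a placeholder for a real measure.

```python
def recommend(misspelled, dictionary, distance):
    """Return the dictionary word with the same first letter that is closest to `misspelled`."""
    # Keep only candidates that share the first letter (hypothetical helper, not from the notebook).
    candidates = [w for w in dictionary if w and w[0] == misspelled[0]]
    if not candidates:
        return None
    # min() returns the first candidate with the smallest distance.
    return min(candidates, key=lambda w: distance(misspelled, w))


def mismatch_distance(a, b):
    # Placeholder measure (an assumption): differing positions plus the length difference.
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


toy_dictionary = ['opulent', 'corpulent', 'corporal', 'validate']
print(recommend('cormulent', toy_dictionary, mismatch_distance))
```

Passing the distance as an argument keeps the first-letter filter and the minimum search in one place, so swapping the measure does not require a second copy of the loop.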

In [2]:
import nltk
from nltk.corpus import words
from nltk.util import ngrams

# Requires the 'words' corpus; run nltk.download('words') once if it is missing.
correct_spellings = words.words()

The first recommender uses the Jaccard distance between the character n-grams of the two words (trigrams by default).
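
To make the measure concrete, here is a small worked example in plain Python, assuming the usual definition of Jaccard distance as one minus the ratio of shared items to all items. The helper `char_ngrams` is a hypothetical stand-in for `nltk.util.ngrams` that produces substrings instead of tuples.

```python
def char_ngrams(word, n=3):
    # Set of all contiguous substrings of length n (hypothetical helper).
    return {word[i:i + n] for i in range(len(word) - n + 1)}


a = char_ngrams('cormulent')
b = char_ngrams('corpulent')

shared = a & b                          # trigrams present in both words
distance = 1 - len(shared) / len(a | b)

print(sorted(shared))       # ['cor', 'ent', 'len', 'ule']
print(round(distance, 2))   # 0.6
```

Each word has 7 trigrams, 4 of which are shared, so the union holds 10 and the distance is 6/10.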

In [15]:
def jaccard_on_ngrams(misspelled_words, ngrams_deg=3):
    recommendation_list = []
    for entry in misspelled_words:
        # Build the n-gram set of the misspelled word once, not once per candidate.
        entry_ngrams = set(ngrams(entry, ngrams_deg))
        # Start at infinity so the baseline does not depend on an arbitrary dictionary word.
        distance = float('inf')
        recommendation = ''
        for word in correct_spellings:
            # Only words that share the first letter are candidates.
            if word[0] != entry[0]:
                continue
            candidate_distance = nltk.jaccard_distance(entry_ngrams, set(ngrams(word, ngrams_deg)))
            if candidate_distance < distance:
                distance = candidate_distance
                recommendation = word
        recommendation_list.append(recommendation)
    return recommendation_list

misspelled_words = ['cormulent', 'incendenece', 'validrate']
print('Spelling recommendations for words:\n',
      misspelled_words, '\n',
      'using Jaccard distance on the trigrams of the two words:\n',
      jaccard_on_ngrams(misspelled_words), '\n',
      'using Jaccard distance on the 4-grams of the two words:\n',
      jaccard_on_ngrams(misspelled_words, 4))

Spelling recommendations for words:
 ['cormulent', 'incendenece', 'validrate'] 
 using Jaccard distance on the trigrams of the two words:
 ['corpulent', 'indecence', 'validate'] 
 using Jaccard distance on the 4-grams of the two words:
 ['cormus', 'incendiary', 'valid']
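
The trigram and 4-gram results differ because longer n-grams leave fewer shared pieces around the typo. The sketch below reproduces that effect for 'validrate' in plain Python; `char_ngrams` and `jaccard_distance` are hypothetical helpers written for this illustration, not the NLTK functions.

```python
def char_ngrams(word, n):
    # Set of all contiguous substrings of length n (hypothetical helper).
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_distance(a, b):
    # Share of items in the union that are not in the intersection.
    union = a | b
    return (len(union) - len(a & b)) / len(union)


results = {}
for n in (3, 4):
    for candidate in ('validate', 'valid'):
        d = jaccard_distance(char_ngrams('validrate', n), char_ngrams(candidate, n))
        results[(n, candidate)] = round(d, 3)
        print(n, candidate, results[(n, candidate)])
```

With trigrams, 'validate' (0.556) narrowly beats 'valid' (0.571). With 4-grams, the extra 'r' breaks more of the n-grams shared with 'validate' (0.778), so the shorter 'valid' (0.667) wins.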


The second recommender uses the edit distance between the two words, with a transposition of adjacent characters counted as a single edit.
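
For intuition about what this measure counts, here is a sketch of the optimal string alignment variant of edit distance with transpositions. It is an illustration under that assumption: `osa_distance` is a hypothetical name, and the NLTK implementation may differ in detail.

```python
def osa_distance(a, b):
    """Edit distance with insertions, deletions, substitutions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            # Swapping two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance('cormulent', 'corpulent'))     # 1
print(osa_distance('incendenece', 'intendence'))  # 2
print(osa_distance('abcd', 'acbd'))               # 1
```

Without the transposition step, 'abcd' and 'acbd' would be two substitutions apart instead of one swap.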

In [17]:
def edit_with_transpositions(misspelled_words):
    recommendation_list = []
    for entry in misspelled_words:
        # Start at infinity so the baseline does not depend on an arbitrary dictionary word.
        distance = float('inf')
        recommendation = ''
        for word in correct_spellings:
            # Only words that share the first letter are candidates.
            if word[0] != entry[0]:
                continue
            # Compute the distance once per candidate instead of twice.
            candidate_distance = nltk.edit_distance(entry, word, transpositions=True)
            if candidate_distance < distance:
                distance = candidate_distance
                recommendation = word
        recommendation_list.append(recommendation)
    return recommendation_list

misspelled_words = ['cormulent', 'incendenece', 'validrate']
print('Spelling recommendations for words:\n',
      misspelled_words, '\n',
      'using edit distance on the two words with transpositions:\n',
      edit_with_transpositions(misspelled_words))

Spelling recommendations for words:
 ['cormulent', 'incendenece', 'validrate'] 
 using edit distance on the two words with transpositions:
 ['corpulent', 'intendence', 'validate']
