# Spelling Recommender

Create three different spelling recommenders, each of which takes a list of misspelled words and recommends a correctly spelled word for every word in the list.

For every misspelled word, the recommender should search the nltk `words` corpus (loaded below as `correct_spellings`) for the word that starts with the same letter as the misspelled word and has the shortest distance* to it, and return that word as the recommendation.

*Each of the three different recommenders will use a different distance measure (outlined below).
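The shared structure of the three recommenders can be sketched independently of any particular metric. The block below is an illustration only: `recommend`, `length_gap`, and `toy_vocabulary` are hypothetical names that do not appear in the notebook, the toy distance is deliberately crude, and falling back to the entry itself when no candidate matches is an assumption of this sketch.

```python
def recommend(entries, vocabulary, distance):
    """For each entry, pick the vocabulary word that shares its first
    letter and has the smallest distance to it."""
    recommendations = []
    for entry in entries:
        # Only words starting with the same letter are candidates.
        candidates = [w for w in vocabulary if w.startswith(entry[0])]
        # min() keeps the first candidate on ties; with no candidates at
        # all, fall back to the entry itself (an assumption of this sketch).
        best = min(candidates, key=lambda w: distance(entry, w), default=entry)
        recommendations.append(best)
    return recommendations


def length_gap(a, b):
    """Toy distance (hypothetical): absolute difference in length."""
    return abs(len(a) - len(b))


# Hypothetical stand-in for the nltk word list.
toy_vocabulary = ["corpulent", "cormus", "validate", "valid", "indecence"]

print(recommend(["cormulent", "validrate"], toy_vocabulary, length_gap))
# ['corpulent', 'validate']
```

Swapping `length_gap` for a real metric changes the ranking without touching the search loop, which is the only part the three recommenders below vary.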

In [3]:
import nltk
from nltk.corpus import words
nltk.download('words')

correct_spellings = words.words()

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\csaip\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


The first recommender ranks candidate words using the following distance metric:

**[Jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index) on the trigrams of the two words.**

In [4]:
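def jacard_trigram(entries=('cormulent', 'incendenece', 'validrate')):
    # Best distance found so far for each entry; 1 is the largest possible Jaccard distance.
    d = [1] * len(entries)
    # Fall back to the entry's first letter if no candidate shares a trigram with it.
    recom = [w[0] for w in entries]
    # Build each entry's trigram set once instead of once per corpus word.
    entry_grams = [set(nltk.ngrams(w, n=3)) for w in entries]
    for i in correct_spellings:
        for j in range(len(entries)):
            if i.startswith(entries[j][0]):
                b = set(nltk.ngrams(i, n=3))
                # Compute the distance once and reuse it.
                dist = nltk.jaccard_distance(entry_grams[j], b)
                if dist < d[j]:
                    d[j] = dist
                    recom[j] = i
    return recom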
    
jacard_trigram()

['corpulent', 'indecence', 'validate']
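To make the first recommendation above concrete, the Jaccard distance between the trigram sets of 'cormulent' and 'corpulent' can be worked out by hand. The sketch below uses plain Python instead of nltk; `char_ngrams` and `jaccard_distance` are hypothetical helpers written for this illustration, and returning 0.0 for two empty sets is an assumption of the sketch.

```python
def char_ngrams(word, n):
    """Return the set of contiguous character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard_distance(a, b):
    """(|A ∪ B| - |A ∩ B|) / |A ∪ B|, i.e. one minus the Jaccard index.
    Defined as 0.0 here when both sets are empty (an assumption)."""
    union = a | b
    if not union:
        return 0.0
    return (len(union) - len(a & b)) / len(union)


misspelled = char_ngrams("cormulent", 3)
candidate = char_ngrams("corpulent", 3)

print(sorted(misspelled & candidate))           # ['cor', 'ent', 'len', 'ule']
print(len(misspelled | candidate))              # 10
print(jaccard_distance(misspelled, candidate))  # 0.6
```

Four of the ten distinct trigrams are shared, so the distance is 6/10.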

The second recommender ranks candidate words using the following distance metric:

**[Jaccard distance](https://en.wikipedia.org/wiki/Jaccard_index) on the 4-grams of the two words.**

In [7]:
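def jacard_four_gram(entries=('cormulent', 'incendenece', 'validrate')):
    # Best distance found so far for each entry; 1 is the largest possible Jaccard distance.
    d = [1] * len(entries)
    # Fall back to the entry's first letter if no candidate shares a 4-gram with it.
    recom = [w[0] for w in entries]
    # Build each entry's 4-gram set once instead of once per corpus word.
    entry_grams = [set(nltk.ngrams(w, n=4)) for w in entries]
    for i in correct_spellings:
        for j in range(len(entries)):
            if i.startswith(entries[j][0]):
                b = set(nltk.ngrams(i, n=4))
                # Compute the distance once and reuse it.
                dist = nltk.jaccard_distance(entry_grams[j], b)
                if dist < d[j]:
                    d[j] = dist
                    recom[j] = i
    return recom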
    
jacard_four_gram()

['cormus', 'incendiary', 'valid']
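The 4-gram output above favours shorter words such as 'cormus'. One way to see why is to compare the two candidates for 'cormulent' under both n-gram sizes. The sketch below is plain Python with a hypothetical helper `ngram_jaccard`; it illustrates the arithmetic and is not the notebook's implementation.

```python
def ngram_jaccard(a, b, n):
    """Jaccard distance between the character n-gram sets of two words.
    Returns 0.0 if neither word has any n-gram (an assumption)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    if not union:
        return 0.0
    return (len(union) - len(grams_a & grams_b)) / len(union)


for candidate in ("corpulent", "cormus"):
    tri = ngram_jaccard("cormulent", candidate, 3)
    four = ngram_jaccard("cormulent", candidate, 4)
    print(f"{candidate}: trigram {tri:.3f}, 4-gram {four:.3f}")
# corpulent: trigram 0.600, 4-gram 0.800
# cormus: trigram 0.625, 4-gram 0.714
```

With 4-grams, the single wrong letter in 'cormulent' breaks four of the six 4-grams it would share with 'corpulent', while the short word 'cormus' contributes only one unmatched 4-gram to the union. The ranking therefore flips relative to the trigram case.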

The third recommender ranks candidate words using the following distance metric:

**[Damerau–Levenshtein distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance)**

In [9]:
def levenshtein_distance(entries=('cormulent', 'incendenece', 'validrate')):
    l = len(entries)
    # Start from each entry's first letter as the fallback recommendation.
    recom = [w[0] for w in entries]
    d = [nltk.edit_distance(recom[i], entries[i], transpositions=True) for i in range(l)]
    for i in correct_spellings:
        w = i.lower()  # the corpus also contains capitalised words
        for j in range(l):
            if w.startswith(entries[j][0]):
                # transpositions=True counts an adjacent swap as a single edit.
                # Compute the (expensive) distance once and reuse it.
                dist = nltk.edit_distance(w, entries[j], transpositions=True)
                if dist < d[j]:
                    d[j] = dist
                    recom[j] = w
    return recom
    
levenshtein_distance()

['corpulent', 'intendence', 'validate']
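For readers unfamiliar with the metric, here is a minimal pure-Python sketch of an edit distance that allows adjacent transpositions. It implements the restricted variant (often called optimal string alignment), so it may differ from `nltk.edit_distance(..., transpositions=True)` on some inputs; `osa_distance` is a hypothetical name used only in this illustration.

```python
def osa_distance(a, b):
    """Edit distance with insertions, deletions, substitutions and
    adjacent transpositions, each costing 1 (restricted variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: "ab" <-> "ba".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("validrate", "validate"))   # 1 (delete 'r')
print(osa_distance("cormulent", "corpulent"))  # 1 (substitute 'm' -> 'p')
print(osa_distance("ab", "ba"))                # 1 (one transposition)
```

Without the transposition step, swapping two adjacent letters would cost two edits instead of one, which matters for common typing errors.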