# Spell Checking and Auto Correction in Python
#### "Pure Python Spell Checking based on Peter Norvig's blog post on setting up a simple spell checking algorithm."

- The library uses a Levenshtein distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Words that appear more often in the frequency list are more likely to be the correct result.


- `pyspellchecker` supports multiple languages including English, Spanish, German, French, and Portuguese. For information on how the dictionaries were created and how they can be updated and improved, see the Dictionary Creation and Updating section of the readme.


- `pyspellchecker` supports Python 3 and Python 2.7, but Python 3 is the preferred version.


- `pyspellchecker` lets you set the Levenshtein distance to check (1 or 2). For longer words, a distance of 1 is highly recommended instead of the default of 2. See the quickstart for how to change the distance parameter.
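
The approach described in the bullets above can be sketched in a few lines of plain Python. This is a minimal illustration in the spirit of Norvig's post, not the library's actual implementation: the names `CORPUS`, `words`, `edits1`, `candidates`, and `correction`, as well as the tiny corpus, are assumptions made for this example only.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical toy corpus; a real frequency list is built from far more text.
CORPUS = "stock stock stock stocks stocky london london katie happy here"


def words(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def candidates(word, freq, distance=2):
    """Known words at the smallest edit distance (up to `distance`) from `word`."""
    if word in freq:
        return {word}
    frontier = {word}
    for _ in range(distance):
        # Expand one more edit step and keep only the known words.
        frontier = {edit for w in frontier for edit in edits1(w)}
        found = {w for w in frontier if w in freq}
        if found:
            return found
    return {word}  # nothing known within range: fall back to the input


def correction(word, freq, distance=2):
    """Pick the candidate that occurs most often in the frequency list."""
    return max(candidates(word, freq, distance), key=lambda w: freq.get(w, 0))


freq = Counter(words(CORPUS))

print(correction("stockk", freq))                 # stock
print(sorted(candidates("stockk", freq)))         # ['stock', 'stocks', 'stocky']
print(correction("loondoon", freq))               # london
print(candidates("loondoon", freq, distance=1))   # {'loondoon'}
```

The last call shows the trade-off behind the distance setting: the number of generated strings grows quickly with each extra edit step, so a distance of 1 is much cheaper for long words but misses corrections that need two edits.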


Reference: https://pypi.org/project/pyspellchecker/


In [2]:
pip install pyspellchecker

Note: you may need to restart the kernel to use updated packages.


In [5]:
# Load the package
from spellchecker import SpellChecker

In [6]:
# load default word frequency list
spell = SpellChecker()

In [7]:
dir(SpellChecker)

['_SpellChecker__edit_distance_alt',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '_case_sensitive',
 '_check_if_should_check',
 '_distance',
 '_tokenizer',
 '_word_frequency',
 'candidates',
 'correction',
 'distance',
 'edit_distance_1',
 'edit_distance_2',
 'export',
 'known',
 'split_words',
 'unknown',
 'word_frequency',
 'word_probability']

In [8]:
# unknown() returns the set of words that are not in the word frequency list
misspelled = spell.unknown(['Kaatie', 'Loondoon', 'stockk', 'happy', 'hardwor', 'here'])

In [9]:
for word in misspelled:
    # Find the one `most likely` answer
    print(spell.correction(word))
    # Find a list of `likely` options
    print(spell.candidates(word))

hardwork
{'hardfor', 'hardwork'}
london
{'london'}
katie
{'katie'}
stock
{'stock', 'stocky', 'stocks'}


## If the Word Frequency list is not to your liking, you can add additional text to generate a more appropriate list for your use case.