aghriss/SymSpell

SymSpell

Implementation in Python3

The idea is to correct words using Levenshtein distance, but to generate candidates with the delete operation only (no insertion, substitution, or transposition). Since generating deletes is much cheaper than generating every possible edit, the larger the dictionary, the more computation is spared.

The algorithm is as follows:

  1. Parameters: max_distance, words_list
  2. Initialize an empty dictionary
  3. for word in words_list:
  • if word not in dictionary:
    • dictionary[word] = (empty_word_list, 1) (first occurrence)
  • else:
    • dictionary[word][1] += 1 (add one more occurrence)
  • deletes = generate_deletes(word, max_distance)
  • for delete in deletes:
    • if delete in dictionary:
      • dictionary[delete][0].add(word)
    • else:
      • dictionary[delete] = ([word], 0)
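
The build step above can be sketched in Python as follows. This is a minimal illustration and not necessarily the repository's code: the function names `generate_deletes` and `build_dictionary` are taken from or modeled on the pseudocode, and the details are assumptions (each entry is a `(source_words, count)` tuple, the first occurrence of a word counts as 1, the empty string is never stored, and a source word is recorded only once per delete).

```python
from itertools import combinations


def generate_deletes(word, max_distance):
    """Return every string obtained by deleting 1..max_distance characters.

    Assumption: the empty string is never produced, so at most
    len(word) - 1 characters are removed.
    """
    deletes = set()
    max_removed = min(max_distance, len(word) - 1)
    for removed in range(1, max_removed + 1):
        # Choose which character positions to keep, preserving their order.
        for kept in combinations(range(len(word)), len(word) - removed):
            deletes.add("".join(word[i] for i in kept))
    return deletes


def build_dictionary(words_list, max_distance):
    """Map each word and each of its deletes to (source_words, count)."""
    dictionary = {}
    for word in words_list:
        if word not in dictionary:
            dictionary[word] = ([], 1)  # first occurrence of a real word
        else:
            sources, count = dictionary[word]
            dictionary[word] = (sources, count + 1)  # one more occurrence
        for delete in generate_deletes(word, max_distance):
            if delete in dictionary:
                sources = dictionary[delete][0]
                if word not in sources:  # avoid duplicates on repeated words
                    sources.append(word)
            else:
                dictionary[delete] = ([word], 0)  # delete-only entry
    return dictionary


if __name__ == "__main__":
    d = build_dictionary(["hello", "help", "hello"], max_distance=1)
    print(d["hello"])  # ([], 2)
    print(d["helo"])   # (['hello'], 0)
```

An entry with count 0 exists only as a delete of some other word, while a count above 0 marks a word that actually appeared in `words_list`.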

Once the dictionary has been built, to correct a word:

  • generate its deletes,
  • look up the deletes in the dictionary,
  • retrieve the words that generated them and compute the distance from each of these words to the input,
  • sort the candidates in ascending order by (distance, -occurrences).
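
The lookup steps can be sketched in the same way. This is a hedged illustration under stated assumptions, not the repository's implementation: `correct` and `levenshtein` are hypothetical names, the input word itself is looked up alongside its deletes (so that an input with a missing letter can match a stored delete), and candidates farther than `max_distance` are dropped. The hand-written `sample` dictionary assumes the `(source_words, count)` layout described above, with `max_distance = 1`.

```python
from itertools import combinations


def generate_deletes(word, max_distance):
    """Strings obtained by deleting 1..max_distance characters (never empty)."""
    deletes = set()
    for removed in range(1, min(max_distance, len(word) - 1) + 1):
        for kept in combinations(range(len(word)), len(word) - removed):
            deletes.add("".join(word[i] for i in kept))
    return deletes


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # delete from a
                               current[j - 1] + 1,             # insert into a
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def correct(word, dictionary, max_distance):
    """Return (candidate, distance, occurrences) sorted by (distance, -occurrences)."""
    candidates = set()
    # Assumption: the input itself is used as a key in addition to its deletes.
    for key in {word} | generate_deletes(word, max_distance):
        if key in dictionary:
            sources, count = dictionary[key]
            if count > 0:            # the key is itself a real word
                candidates.add(key)
            candidates.update(sources)
    scored = []
    for candidate in candidates:
        distance = levenshtein(word, candidate)
        if distance <= max_distance:  # assumption: discard distant candidates
            scored.append((distance, -dictionary[candidate][1], candidate))
    scored.sort()
    return [(cand, dist, -neg_count) for dist, neg_count, cand in scored]


# Hand-written dictionary for "hello" (seen 3 times) and "help" (seen once).
sample = {
    "hello": ([], 3), "help": ([], 1),
    "ello": (["hello"], 0), "hllo": (["hello"], 0),
    "helo": (["hello"], 0), "hell": (["hello"], 0),
    "elp": (["help"], 0), "hlp": (["help"], 0),
    "hep": (["help"], 0), "hel": (["help"], 0),
}

if __name__ == "__main__":
    print(correct("helo", sample, 1))  # [('hello', 1, 3), ('help', 1, 1)]
```

Both candidates are at distance 1 from "helo", so the tie is broken by the occurrence count, which puts "hello" first.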
