Find the closest string in a lexicon using shared n-grams as a fast proxy for edit distance.

cgyulay/approx-string-search

approx-string-search

This class implements fast approximate string search in a large lexicon by building an n-gram lookup table. Shared n-grams between a query word and lexicon words can be used as a proxy for edit distance. For example, this might be used by a spell checker to find close matches in a dictionary.
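To make the idea concrete, here is a minimal sketch of an n-gram lookup table. It is not the repository's implementation: the function names `ngrams`, `build_table`, and `closest` are hypothetical, and the choice of bigrams, lowercasing, and `^`/`$` boundary padding are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngrams(word, n=2):
    """Return the set of character n-grams of a lowercased, padded word."""
    # Boundary markers are an assumption; they let prefixes and suffixes count.
    padded = "^" + word.lower() + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_table(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    table = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            table[gram].add(word)
    return table


def closest(table, query, k=1, n=2):
    """Return the k lexicon words sharing the most n-grams with the query."""
    shared = Counter()
    for gram in ngrams(query, n):
        for word in table.get(gram, ()):
            shared[word] += 1
    # Rank by shared n-gram count (descending); break ties alphabetically.
    ranked = sorted(shared, key=lambda w: (-shared[w], w))
    return ranked[:k]


lexicon = ["Elizabeth", "Eleanor", "Eliana", "Elane"]
table = build_table(lexicon)
print(closest(table, "Ellzabeth"))  # ['Elizabeth']
```

Only words that share at least one n-gram with the query are ever scored, which is why this kind of lookup can stay fast on a large lexicon.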

This system is based on the paper "Finding Approximate Matches in Large Lexicons" (Zobel & Dart, 1995).

Instructions

Usage

from approx_lookup_table import ApproxLookupTable

# Build the n-gram lookup table from a lexicon
lexicon = ["Elizabeth", "Eleanor", "Eliana", "Elane", "Beth"]
lt = ApproxLookupTable(lexicon)

# Query for the single closest match
print(lt.query("Ellzabeth"))
# >> "Elizabeth"

# Query for the k closest matches
print(lt.query_k("Ellzabeth", 2))
# >> ["Elizabeth", "Beth"]
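For reference, the quantity that shared n-grams approximate is the edit (Levenshtein) distance. The sketch below is a standard two-row dynamic program; the helper name `edit_distance` is hypothetical and is not part of `ApproxLookupTable`.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (two-row dynamic program)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("Ellzabeth", "Elizabeth"))  # 1
```

Computing this against every word in a large lexicon is expensive, which is the motivation for using n-gram overlap as a cheaper stand-in.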
