GitHub - mahmoudimus/python-ngrams: N-grams approximate string matching implementation in pure Python

This is a pure Python library that lets you compare texts or strings using an n-gram model and cosine similarity. N-grams are tuples of n consecutive tokens taken from a text. For example, if we treat words as tokens, the first few trigrams (3-grams) of the license text are:

'this work ‘as-is’',
'work ‘as-is’ we',
'‘as-is’ we provide',
'we provide no',
'provide no warranty',
...
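The trigrams above can be reproduced with a few lines of Python. This is a minimal sketch, not this library's API: the function name `ngrams` is hypothetical, and it assumes simple whitespace tokenization of the license's opening words.

```python
def ngrams(tokens, n):
    """Return the n-grams (tuples of n consecutive tokens) of a sequence.

    Hypothetical helper for illustration; works on a list of words or on a
    string (where each character is a token).
    """
    # zip() over n staggered slices of the sequence; it stops at the shortest
    # slice, so only complete n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


# Assumption: words are obtained by splitting on whitespace.
words = "this work ‘as-is’ we provide no warranty".split()
for gram in ngrams(words, 3):
    print(" ".join(gram))
```

Seven words yield five trigrams, one starting at each of the first five words.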

Depending on what you choose as the basic token (words or characters), you can use this library for approximate string matching (finding misspellings, etc.) or as a "good enough" method of checking whether two texts are similar.
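To make the idea concrete, here is a sketch of comparing two inputs by the cosine similarity of their n-gram count vectors. It is an illustration under stated assumptions, not the library's implementation: `ngram_profile` and `cosine_similarity` are hypothetical names, and lowercasing plus space-padding for character n-grams is an assumed preprocessing choice.

```python
import math
from collections import Counter


def ngram_profile(tokens, n=3):
    """Count the n-grams of a token sequence (hypothetical helper).

    `tokens` may be a string (character n-grams) or a list of words
    (word n-grams); each n-gram is stored as a tuple.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing keys, so absent n-grams add nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def char_profile(word, n=3):
    # Assumption: lowercase and pad with spaces so word boundaries count.
    return ngram_profile(f" {word.lower()} ", n)


# Character tokens: spotting a likely misspelling.
typo = char_profile("recieve")
print(cosine_similarity(typo, char_profile("receive")))
print(cosine_similarity(typo, char_profile("banana")))

# Word tokens: a rough text-level comparison.
t1 = "this work is provided as-is with no warranty".split()
t2 = "this software is provided as-is without any warranty".split()
print(cosine_similarity(ngram_profile(t1, 2), ngram_profile(t2, 2)))
```

The score ranges from 0 (no shared n-grams) to 1 (identical n-gram distributions). Switching the token type only changes how the input is split; the similarity measure stays the same.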
