Salauyou/Fonetic

Fonetic

Text search utils, fuzzy search algorithms and collections to make their use easier.

Main packages and classes:

#### text

  • Word: a representation of a piece of text as a string value that internally holds a mapping to the source it was originally extracted from. Across transformations, this mapping stays unchanged, or is adjusted accordingly if the word length changes, so after all operations it is simple to align the resulting word with its original source. Word implements CharSequence over its value, making it easy to use in search algorithms, utility methods, etc.;

  • Words: a utility class to produce Words and manipulate them (extract from a String, join, split, etc.).

This may be useful when you work with documents containing markup tags and other special entities: you first extract the text from the document as a collection of Words, then process or modify them, and finally apply the modifications to the source, leaving the markup untouched. (For example, you may need to find and highlight dictionary entries in an HTML document.)
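The extract, process, apply workflow described above can be sketched as follows. This is only an illustration of the idea and does not use the library's classes: `MappedWord`, `extract` and `highlight` are hypothetical names invented for this example, and the tag handling is deliberately naive (anything between `<` and `>` is treated as markup).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SourceMappedWords {

    /** Hypothetical stand-in for a word that remembers where it came from. */
    static final class MappedWord implements CharSequence {
        final String value;
        final int start; // inclusive offset in the source
        final int end;   // exclusive offset in the source

        MappedWord(String value, int start, int end) {
            this.value = value;
            this.start = start;
            this.end = end;
        }

        /** A transformation that changes the value but keeps the source mapping. */
        MappedWord toLowerCase() {
            return new MappedWord(value.toLowerCase(Locale.ROOT), start, end);
        }

        @Override public int length() { return value.length(); }
        @Override public char charAt(int i) { return value.charAt(i); }
        @Override public CharSequence subSequence(int from, int to) {
            return value.subSequence(from, to);
        }
        @Override public String toString() { return value; }
    }

    /** Extracts runs of letters that lie outside <...> tags, with their offsets. */
    static List<MappedWord> extract(String html) {
        List<MappedWord> words = new ArrayList<>();
        boolean inTag = false;
        int wordStart = -1;
        // Iterate one position past the end so a trailing word is closed too.
        for (int i = 0; i <= html.length(); i++) {
            char c = i < html.length() ? html.charAt(i) : ' ';
            boolean letter = !inTag && Character.isLetter(c);
            if (letter && wordStart < 0) {
                wordStart = i;
            }
            if (!letter && wordStart >= 0) {
                words.add(new MappedWord(html.substring(wordStart, i), wordStart, i));
                wordStart = -1;
            }
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            }
        }
        return words;
    }

    /** Wraps every word equal to target (ignoring case) in <mark> tags. */
    static String highlight(String html, List<MappedWord> words, String target) {
        StringBuilder sb = new StringBuilder(html);
        // Walk backwards so earlier offsets stay valid after each insertion.
        for (int i = words.size() - 1; i >= 0; i--) {
            MappedWord w = words.get(i).toLowerCase();
            if (w.toString().equals(target)) {
                sb.insert(w.end, "</mark>");
                sb.insert(w.start, "<mark>");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p>The <b>Cat</b> sat.</p>";
        // prints <p>The <b><mark>Cat</mark></b> sat.</p>
        System.out.println(highlight(html, extract(html), "cat"));
    }
}
```

The lowercased word still points at the original `Cat` in the source, so the match is found on the processed value while the modification is applied to the untouched markup.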

#### search

  • FoneticSearch: an original algorithm to search for phonetically similar occurrences of a pattern in text. The main goal is to allow not only phonetic variations but also non-phonetic misspellings, which the commonly used Metaphone and Soundex don't handle: with those, a misspelled word is very likely to be encoded differently from the original;

  • LcsSearch: search for matches using a gapped longest common subsequence.
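For readers unfamiliar with the longest common subsequence, here is a minimal sketch of the classic dynamic-programming version. It is not the LcsSearch implementation: the "gapped" variant presumably constrains the gaps between matched characters, but those details are not described here, so they are not modeled. The names `LcsSketch`, `lcsLength` and `similarity` are made up for this example.

```java
public class LcsSketch {

    /** Length of the longest common subsequence of a and b (classic DP). */
    static int lcsLength(CharSequence a, CharSequence b) {
        // dp[i][j] = LCS length of the first i chars of a and the first j chars of b
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    /** Similarity in [0, 1]: LCS length relative to the longer input. */
    static double similarity(CharSequence a, CharSequence b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : (double) lcsLength(a, b) / longer;
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("phonetic", "fonetic"));  // prints 6 ("onetic")
        System.out.println(similarity("phonetic", "fonetic")); // prints 0.75
    }
}
```

Because the methods accept `CharSequence`, anything that implements it (as Word is said to do) could be passed in directly.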

#### collect [in progress, subject to change]: collections to help in search algorithms, including:

A few small demos to see how it works: src/main/java/demos
