-
Notifications
You must be signed in to change notification settings - Fork 0
RaVbaker/search_in_memory_php
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
SearchInMemory as a reseach about full text/fault-tolerant searches SearchInMemory is my first impression about how works and how should work full text search engines or fault-tolerant searches (FTS). It begins as a research about how stuff works. It was also a test for: is it hard to write own full text search engine? You may consider it as a experiment to uncover how full text search engines works. I have also tried a BinaryTree implementation of index, but it wasn't so good for me as HashIndex. But even when I finished a really huge part of code there is still place for improvements(you can use it as a roadmap for your own FTS), like: - n-gram indexing and searching with levenstein sorting - wildcard searching - excluding some phrases from results - caching generated indexed on some memory based structure on disk - steeming words to others - improved and more complex way to do faceting - a socket connector for searcher from a unix level - index updating and deleting particular records - whole phrases in " " signs, like: "billy bob" to match exactly this phrase (need to improve inverted indexes in HashIndex) - possibilities of import/export data from indexes using formats: json/xml/csv - tweaking results based on special criteria or queries Cheers, Rafal "RaVbaker" Piekarski Contact: web: http://about.me/ravbaker twitter: ravbaker github: https://github.com/RaVbaker Great start for your own research: - http://en.wikipedia.org/wiki/Levenshtein_distance - a minimal knowlegde about comparing similar words - http://en.wikipedia.org/wiki/Inverted_index - goot start for building indexes - specially full inverted indexes - http://en.wikipedia.org/wiki/N-gram - N-grams, what it is and why? - http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html - quite old Google Research department post about n-grams in practise - with a large available dataset. - http://ngrams.googlelabs.com/ - a practise usage of ngrams with Books Ngram Viewer from Google. - http://today.java.net/pub/a/today/2005/08/09/didyoumean.html - great article about how Did you mean works with Lucene. Very inspiring post - but mainly about Java. - http://framework.zend.com/code/filedetails.php?repname=Zend+Framework&path=%2Ftrunk%2Flibrary%2FZend%2FSearch%2FLucene.php - sourcecode from Zend Framework with their PHP implementation of Lucene. Nice source of thougts. - http://www.ir.uwaterloo.ca/book/ - A book when you think BIG. It's about building your own service for full text search engine scalable almost like Google/Bing. Lots of theory and C/C++ code and algorithms. For very begining I suggest reading an excerpt from chapter 4 - Static inverted indicies - http://www.ir.uwaterloo.ca/book/04-static-inverted-indices.pdf - Helpful php functions: http://www.php.net/manual/en/function.levenshtein.php, http://www.php.net/manual/en/function.metaphone.php, http://php.net/manual/en/function.soundex.php, http://docs.php.net/manual/en/language.types.array.php :)
About
SearchInMemory is a reseach about full text (fault-tolerant) searches
Topics
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published