Skip to content

A nascent tutorial/intro to search engine ideas in Nim

Notifications You must be signed in to change notification settings

c-blake/nimsearch

Repository files navigation

The basic idea is covered in ixAllInOneEasy1.nim. With that understood, one can add concept-level features like word stemming as well as systems-level optimizations. (To use stemming, snowball-stemmer must be installed.)

As many such features are added, index creation & query code grows dependencies on same-repo modules such as xml.nim which may be of broader interest to people using the stdlib XML parser and pack.nim which can also serve as a simple, self-contained key-value store. (If retaining "updatability" of the index were more important than space efficiency/memory density then instead of pack.nim, the techniques of https://github.com/c-blake/suggest would be appropriate which keeps a persistent external hash table to lists of wildly varying length in Nim MemFiles.)

There is a script called diffs.sh that shows a set of interesting changes and some miscellaneous results, build/bench scripts/patches. I had originally planned to write some good exposition of all of these edits, but oh well. Many are small. Just run ./diffs.sh | your-diff-viewer or diff=viewer ./diffs.sh. (Of course, I recommend https://github.com/c-blake/hldiff piped to less, but there are many.)

Some of these scripts may assume that you have either copies or symbolic or hard links to saved/pre-downloaded data files such as enwiki-*. These files are too big to realistically include in the repository, but ambitious readers should have little trouble getting them using data.sh (and re-compressing with parallel decompressing zstd to get .zs files if they do not want to wait forever and a day for decompression). data.sh itself also uses catz from the https://github.com/c-blake/nio package.

That's about it. This is not the more fully explained tutorial/article work I had originally set out to do. Related ideas recently arose in the Forum. So, it seemed worth putting out there. If you have a specific question, raise an issue.

About

A nascent tutorial/intro to search engine ideas in Nim

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages