Keyvi, short for "Key value index", is a key-value store (KVS) optimized for size and lookup speed. Its use of shared memory makes it scalable and robust. The biggest difference from other stores is the underlying data structure, which is based on finite state machines. Storage is very space-efficient and lookups are fast, and by design it makes various kinds of approximate matching, be it fuzzy string matching or geo matching, highly efficient. The immutable FST data structure can be used stand-alone if online writes are not required.
This is the continuation of cliqz-oss/keyvi. Keyvi was initially developed at Cliqz by Hendrik Muhs and others. For more information, please refer to https://github.com/cliqz-oss/keyvi
- BBuzz2016 talk
- Announcement blog post
- Search Meetup Munich Jan 2016
- Progscon 2017 talk
- Search Meetup Munich Apr 2018
Precompiled binary wheels are available for OS X and Linux on PyPI. To install, use:
pip install keyvi
The core part is a C++ header-only library, but the 3rd-party TPIE library needs to be compiled once. The command-line tools are also part of the C++ code. For instructions, check the Readme file.
For the Python extension of keyvi, check the Readme file in the python subfolder.
- Using python keyvi with EMR (mrjob or pyspark)
If you would like to dig into the fundamentals, keyvi is inspired by the following two papers:
- Sparse Array (See Storing a Sparse Table, Robert E. Tarjan et al. http://infolab.stanford.edu/pub/cstr/reports/cs/tr/78/683/CS-TR-78-683.pdf)
- Incremental, which means minimization is done on the fly (See Incremental Construction of Minimal Acyclic Finite-State Automata, J. Daciuk et al.: http://www.mitpressjournals.org/doi/pdf/10.1162/089120100561601)
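To make the idea of on-the-fly minimization more concrete, here is a minimal Python sketch of the sorted-input variant described in the second paper. It is an illustration only, not keyvi's implementation: keyvi is written in C++, stores values as well as keys, and packs states into a sparse array. The names `State`, `MinimalAutomaton`, `add`, `finish`, `contains` and `state_count` are hypothetical and chosen for this example, and the sketch assumes keys are added in strictly increasing order.

```python
class State:
    """One automaton state: outgoing edges plus a 'final' flag."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}      # character -> State
        self.final = False

    def signature(self):
        # Two states are equivalent if they agree on finality and on every
        # outgoing edge (label and target). Targets are already canonical
        # when this is called, so comparing object identity is sufficient.
        return (self.final,
                tuple(sorted((c, id(s)) for c, s in self.edges.items())))


class MinimalAutomaton:
    """Hypothetical helper: builds a minimal acyclic automaton from sorted keys."""

    def __init__(self):
        self.root = State()
        self._register = {}    # signature -> canonical state
        self._unchecked = []   # (parent, char, child) along the previous key
        self._previous = None
        self._finished = False

    def add(self, word):
        if self._finished:
            raise RuntimeError("automaton is already finished")
        if self._previous is not None and word <= self._previous:
            raise ValueError("keys must be added in strictly increasing order")
        prev = self._previous or ""
        # Length of the prefix shared with the previously added key.
        common = 0
        for a, b in zip(word, prev):
            if a != b:
                break
            common += 1
        # Everything below the shared prefix can no longer change: minimize it.
        self._minimize(common)
        node = self._unchecked[-1][2] if self._unchecked else self.root
        for ch in word[common:]:
            child = State()
            node.edges[ch] = child
            self._unchecked.append((node, ch, child))
            node = child
        node.final = True
        self._previous = word

    def finish(self):
        self._minimize(0)
        self._finished = True

    def _minimize(self, down_to):
        # Walk the previous key's path bottom-up, replacing each state by an
        # equivalent registered state if one exists, else registering it.
        while len(self._unchecked) > down_to:
            parent, ch, child = self._unchecked.pop()
            canonical = self._register.setdefault(child.signature(), child)
            parent.edges[ch] = canonical

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.edges.get(ch)
            if node is None:
                return False
        return node.final

    def state_count(self):
        seen, stack = set(), [self.root]
        while stack:
            s = stack.pop()
            if id(s) not in seen:
                seen.add(id(s))
                stack.extend(s.edges.values())
        return len(seen)


automaton = MinimalAutomaton()
for key in ["tap", "taps", "top", "tops"]:
    automaton.add(key)
automaton.finish()
print(automaton.state_count())
```

For these four keys the sketch ends up with 5 states, whereas a plain trie would need 8, because the `ap(s)` and `op(s)` suffixes share the same states. This suffix sharing is what keeps the structure small.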
Licence and 3rdparty dependencies
keyvi is licensed under the Apache License 2.0; see licence for details.
In addition, keyvi uses 3rd-party libraries that define their own licences. Please check their respective licences. The 3rd-party libraries can be found at keyvi/3rdparty.