TopK: an almost deterministic top k elements counter Redis module
The basic algorithm (1st one): https://www.cs.berkeley.edu/~satishr/cs270/sp11/rough-notes/Streaming-two.pdf
When TopK reaches cardinality of
k and a new element needs to be added, the following eviction policy it practiced:
- The frequencies of all existing observations are decreased by 1.
- If there is an observation that can be evicted, it is replaced by the element.
Quick start guide
- Build a Redis server with support for modules.
- Build the TopK module:
- To load a module, Start Redis with the
--loadmodule /path/to/module.sooption, add it as a directive to the configuration file or send a
TOPK.ADD key k elem [elem ...]
Complexity: O(log(N)), where N is the size of the zset.
Adds elements to a topk zset.
k is the topk zset's maximal size. If
k is zero, the topk zset has no size limit (sort of).
- elem can't begin with the prefix 'TOPK'.
- Using different a
kfrom call to call is perfectly possible, and could be even interesting.
- Optimization in the variadic form: first incr existing elements, then do one sweep of decr, then remove < 1 as needed. This may screw the probabilities (someone can probably dis/prove), so just use non variadic calls to play safe if you want.
An error is returned when:
- The zset isn't a topk
- The operation requires running
Reply: Integer. If positive, it is the number of new elements added, if negative it is the number of elements removed due to k overflow, if 0 then only the offset was updated.
TOPK.PRANK key ele [ele ...]
Returns the percentile rank for the element.
Reply: Array of Integers, nil if key or element not found
TOPK.PRANGE key from to [DESC|ASC]
Complexity: nasty too
Returns the elements in the percentile range.
to must be between 0 and 100, and
from must be less than or equal to
to. The optional switch determines the reply's sort order, where
DESC (the default) means ordering from highest to lowest frequency.
Reply: Array of strings.
TOPK.SHRINK key [k]
Complexity: O(N) where N is the number of members in the zset.
Resets the global offset after applying it to all members, trims size to
k if specified and not zero.
Reply: String, "OK"
TOPK.DEBUG command key [arg]
command should be:
MAKE- requires an integer
arg, fills topk
argobservations with respective frequencies.
SHOW- displays useful (?) information about the topk zset.
Issue reports, pull and feature requests are welcome.
AGPLv3 - see LICENSE