Sketching algorithms implemented in Chapel
Sketching is a relatively recent development in the theoretical field of Stochastic Streaming Algorithms, which deals with algorithms that can extract information from a stream of data in a single pass (sometimes called “one-touch” processing) using various randomization techniques.
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset ... The HyperLogLog algorithm can estimate cardinalities well beyond 10^9 with a relative accuracy (standard error) of 2% while only using 1.5kb of memory.
count-distinct problem (also known in applied mathematics as the cardinality estimation problem) is the problem of finding the number of distinct elements in a data stream with repeated elements.
HyperLogLog takes advantage of the randomized distribution of bits from hashing functions in order to estimate how many things you would’ve needed to see in order to experience a specific phenomenon.