Skip to content
/ gohll Public

A simple implementation of HyperLogLog in plain go for 32-bit hash functions.

License

Notifications You must be signed in to change notification settings

avisagie/gohll

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

gohll

A simple implementation of HyperLogLog in plain Go for 32-bit hash functions.

HyperLogLog lets you work out an approximate count of the number of unique items in a large set of items, using a very small amount memory. "Small" means not keeping a physical set of all the items you've seen stream past. A 32-bit hash function on the items you count and about 5000 bits of memory (8192 bits for this Go implementation) should suffice to count up to 1e7 unique items to within about 2%.

E.g. you have a log of IP addresses visiting your extremely busy website... You might want to know how many unique IPs your visitors hail from, even though (obviously) everyone visits your site ten times a day. After that you want to know how many uniques there were everyday, and you want to the freedom to be able to aggregate the unique IPs for the week, or month, or last 3 days. HyperLogLog counters let you take the union of the 7 sets for the week, and still get an approximate count out at the end that collapses the duplicates, all for those 8000 bits per count.

For inspiration read this and this and the original paper.

Installation

go get github.com/avisagie/gohll

About

A simple implementation of HyperLogLog in plain go for 32-bit hash functions.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages