Skip to content

A Misra-Gries approach to the heavy hitters problem

License

Notifications You must be signed in to change notification settings

adamdrake/hitters

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Overview

This is a library to address the top-k items problem. It uses the Misra-Gries algorithm to provide the k items in a stream or data set of length m which appear more frequently than m/k, using only k-1 counters in memory.

Note that while this is an effective algorithm for large data sets, and does not contain false negatives, it can contain arbitrarily-false positives.

Example

package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"

	"github.com/adamdrake/hitters"
)

func main() {
	hitters, err := hitters.New(10) // This will keep track of items which appear with frequency datasetSize/50
	if err != nil {
		panic("cannot init hitters")
	}
	resp, err := http.Get("http://www.gutenberg.org/cache/epub/2680/pg2680.txt")
	source, err := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		panic("cannot get text")
	}
	cleaned := strings.Replace(string(source), "\n", " ", -1)
	tokens := strings.Split(cleaned, " ")
	hitters.Add(tokens...) // This adds one or more items to the Hitters
	fmt.Println(hitters.Items())
}

This will return a map with item 5 having a count of 4 and item b having a count of 1.

About

A Misra-Gries approach to the heavy hitters problem

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages