Skip to content

Added count sketch#230

Merged
c-dickens merged 2 commits intoapache:count-sketch-pythonfrom
c-dickens:add-count-sketch
Aug 12, 2021
Merged

Added count sketch#230
c-dickens merged 2 commits intoapache:count-sketch-pythonfrom
c-dickens:add-count-sketch

Conversation

@c-dickens
Copy link
Contributor

Added a python implementation of count sketch for frequency estimation and heavy hitter detection.

@c-dickens c-dickens requested a review from jmalkin August 9, 2021 16:37

def is_empty(self):
"""
Returns True if the sketch self.table is empty and is False otherwise.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a valid philosophical debate over whether adding and then subtracting an item means a sketch is truly empty. It might seem trivial with a single item, but imagine merging 2 sketches that were created with the same items but opposite weights -- if we treat it as empty, we lose the information that it has ingested data.

Certainly open to the idea that we don't care about a distinction, but it's worth thinking about what we want to capture by "empty"

Copy link
Contributor

@jmalkin jmalkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable start, but I think there's still a fair amount to do before it's production ready.

  • there's no serialization, so it's not yet suitable for distributed use
  • merge doesn't check for sketch compatibility, including matching table sizes. in order to be meaningful, I think we'd also want the hash functions to match?
  • we really need to figure out a better hash function, as that seems important for ingesting arbitrary(-ish, in python) input types.

But I think this is sufficient to pull into a branch so we can start collaborative improvements

@c-dickens c-dickens merged commit 719f1e1 into apache:count-sketch-python Aug 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants