Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tunable HyperMinHash impl #2

Merged
merged 13 commits into from Jan 11, 2019
Merged

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Jan 11, 2019

This PR cleans up a lot of the repo and adds HyperMinHash, an implementation that is based on HyperLogLog with the addition of bias correction from HyperLogLog++.

ChristianHansen and others added 4 commits January 4, 2019 16:03
* interface-based groundwork

* clean up doc

* Use interfaces

* cleanup

* update doc

* doc strings, appease checkstyle
* wip

* but out shared logic WIP

* WIP tunable impl

* add comment explaining LSB-aligned packing

* long packer

* change ifaces and add HMH Combiner

* I need to learn to read

* HMH Combiner code complete

* how many hours does it take to write one class

* add bias, remove unused param

* add HyperMinHash serde, combiner, and implement IntersectionSketch for HMH

* cardinality estimation

* don't take the derivative

* a bunch of stuff

* add linear counting support

* push serde tests and random test runner

* wip tests

* tests build, but do not pass

* use collections

* Fix tests

🕺🏼  🔥  (⌐■_■)

* Test tunable impl (#4)

* change README

* praise baby jesus

* remove comment

* goddamn goddamn

* handle out of order points

* add notice

*  p >= 4
}

/**
* @param _128BitHash
* Create a BetaMinHash from the serialized representation returned by {@link #getBytes()}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated to reference the SerDe class

}

public long cardinality() {
return BetaMinHashCardinalityGetter.cardinality(this);
// @Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to remove this. We shouldn't have commented-out code checked in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to implement a SerDe class instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with BetaMinHashSerde


@Override
public boolean offer(byte[] bytes) {
// MetroHash128 hash = MetroHash.hash128(HASH_SEED, bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this

@sherifnada sherifnada changed the title Add tunable HMH impl Add tunable HyperMinHash impl Jan 11, 2019
@sherifnada sherifnada merged commit d3e26e8 into LiveRamp:master Jan 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants