Add tunable HyperMinHash impl #2
* interface-based groundwork * clean up doc * Use interfaces * cleanup * update doc * doc strings, appease checkstyle
* wip * but out shared logic WIP * WIP tunable impl * add comment explaining LSB-aligned packing * long packer * change ifaces and add HMH Combiner * I need to learn to read * HMH Combiner code complete * how many hours does it take to write one class * add bias, remove unused param * add HyperMinHash serde, combiner, and implement IntersectionSketch for HMH * cardinality estimation * don't take the derivative * a bunch of stuff * add linear counting support * push serde tests and random test runner * wip tests * tests build, but do not pass * use collections * Fix tests 🕺🏼 🔥 (⌐■_■) * Test tunable impl (#4) * change README * praise baby jesus * remove comment * goddamn goddamn * handle out of order points * add notice * p >= 4
}

/**
 * @param _128BitHash
 * Create a BetaMinHash from the serialized representation returned by {@link #getBytes()}.
This comment should be updated to reference the SerDe class.
}

public long cardinality() {
    return BetaMinHashCardinalityGetter.cardinality(this);
// @Override
Need to remove this. We shouldn't have commented-out code checked in
Need to implement a SerDe class instead.
replaced with BetaMinHashSerde
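For readers following this thread, here is a rough sketch of the separation being asked for: the sketch object holds only register state, and a dedicated serde class owns the byte layout. `ToySketch`, `ToySketchSerde`, and the layout (a 4-byte register count followed by 2 bytes per register) are hypothetical names and assumptions made for illustration; they are not the `BetaMinHashSerde` added in this PR.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a sketch: register state only, no byte-level logic.
final class ToySketch {
    final short[] registers;

    ToySketch(short[] registers) {
        this.registers = registers.clone();
    }
}

// Hypothetical serde: the only class that knows the serialized layout.
final class ToySketchSerde {
    // Layout assumed for this example: 4-byte register count, then 2 bytes per register.
    static byte[] toBytes(ToySketch sketch) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Short.BYTES * sketch.registers.length);
        buf.putInt(sketch.registers.length);
        for (short r : sketch.registers) {
            buf.putShort(r);
        }
        return buf.array();
    }

    static ToySketch fromBytes(byte[] bytes) {
        if (bytes.length < Integer.BYTES) {
            throw new IllegalArgumentException("missing register count");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int n = buf.getInt();
        // Reject inputs whose length does not match the declared register count.
        if (n < 0 || buf.remaining() != (long) n * Short.BYTES) {
            throw new IllegalArgumentException("unexpected serialized length");
        }
        short[] registers = new short[n];
        for (int i = 0; i < n; i++) {
            registers[i] = buf.getShort();
        }
        return new ToySketch(registers);
    }
}
```

With this split, the Javadoc on the sketch class can point at the serde instead of a `getBytes()` method, and a round trip through `toBytes` and `fromBytes` is easy to test in isolation.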

@Override
public boolean offer(byte[] bytes) {
    // MetroHash128 hash = MetroHash.hash128(HASH_SEED, bytes);
We should remove this
add BetaMinHash Serde and tests
This PR cleans up a lot of the repo and adds HyperMinHash, an implementation that is based on HyperLogLog with the addition of bias correction from HyperLogLog++.
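As background for reviewers, here is a minimal sketch of the HyperLogLog-style estimate that the description refers to, with a linear-counting fallback for small cardinalities. This is not the code in this PR: `ToyHyperLogLog` and `mix64` are hypothetical names, the estimator is the classic HyperLogLog one, the HyperLogLog++ empirical bias tables are omitted, and the `p >= 4` lower bound simply mirrors the constraint named in the commit log.

```java
// Hypothetical, simplified HyperLogLog; not the HyperMinHash class from this PR.
final class ToyHyperLogLog {
    private final int p;            // precision: number of bits used as the register index
    private final byte[] registers; // m = 2^p registers

    ToyHyperLogLog(int p) {
        if (p < 4 || p > 16) {
            throw new IllegalArgumentException("p must be in [4, 16]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    // SplitMix64 finalizer, used here only as a stand-in 64-bit mixer.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Offer a value that has already been hashed to 64 bits.
    void offerHash(long hash) {
        int index = (int) (hash >>> (64 - p)); // top p bits pick the register
        long rest = hash << p;                 // remaining bits, left-aligned
        // Rank = position of the first 1 bit in the remaining 64 - p bits (1-based).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r); // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = m == 16 ? 0.673
                : m == 32 ? 0.697
                : m == 64 ? 0.709
                : 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting while registers are still empty.
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }
}
```

At `p = 12` the expected relative standard error of the raw estimate is about 1.04 / sqrt(4096), or roughly 1.6%. HyperLogLog++ additionally subtracts an empirically measured bias in the range just above the linear-counting threshold, which this sketch leaves out.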