Closed
README.md: 14 changes (8 additions, 6 deletions)
@@ -116,12 +116,14 @@ the union of counters is lossless in the sense that you end up with the same cou
 you would have arrived at had you observed the union of all of the individual events.

 * For an intersection of counters, there's no good theoretical bound on the relative
-error. In practice, and especially for intersections involving a small number of sets,
-the relative error you obtain tends to be in relation to the size of the union of the
-sets involved. For example, if you have two sets, each of cardinality 5000 and observe
-both sets through HyperLogLog counters with parameter b=10 (3% relative error), you can
-expect the intersection estimate to be within 10000 * 0.03 = 300 of the actual intersection
-size.
+error. In practice, the relative error is largely a function of the relative size of
+the sets, the amount they overlap, and the number of sets being intersected. If the
+error of any term in the inclusion-exclusion formula is as large as the intersection
+cardinality, then the estimate will be useless. For the best results, intersect only
+two or three sets of roughly the same size. For instance, given two sets whose
+cardinalities are within one order of magnitude and whose intersection is roughly 10%
+of the smaller set, the error (relative to the true intersection cardinality) would be
+about 10-30%.

 * For time queries, the relative error applies to the size of the set within the time
 range you've queried. For example, given a set of cardinality 1,000,000 that has had
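The union and intersection behaviour discussed in this hunk can be illustrated with a minimal, from-scratch HyperLogLog. This is a hypothetical sketch, not the API of the library this README documents: the names `HyperLogLog`, `union`, `estimate`, and `intersection_estimate` are assumptions, hashing uses the first 8 bytes of SHA-1 purely so results are deterministic, and only the original small-range (linear counting) correction is applied. Union takes the register-wise maximum, which is why a merged counter is identical to one built from the combined stream. Intersection uses inclusion-exclusion, |A ∩ B| = |A| + |B| - |A ∪ B|, so the absolute errors of three large estimates land on a result that may be much smaller than any of them. With b=10 (1024 registers) the standard error is 1.04/sqrt(1024) = 3.25%, matching the "3% relative error" figure.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with 2**b registers (hypothetical sketch, not a library API)."""

    def __init__(self, b: int = 10) -> None:
        # The alpha constant below is only valid for m >= 128, i.e. b >= 7.
        if not 7 <= b <= 16:
            raise ValueError("this sketch supports 7 <= b <= 16")
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m
        # Bias-correction constant from the original HyperLogLog paper (m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Deterministic 64-bit hash: first 8 bytes of SHA-1 (an assumption; any good hash works).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.b
        idx = h >> width                       # top b bits select the register
        rest = h & ((1 << width) - 1)          # remaining bits determine the rank
        rank = width - rest.bit_length() + 1   # 1-based position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        # Register-wise max equals the sketch of the combined stream, hence lossless.
        if other.b != self.b:
            raise ValueError("can only merge sketches with the same b")
        merged = HyperLogLog(self.b)
        merged.registers = [max(x, y) for x, y in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction (linear counting) from the original paper.
            return self.m * math.log(self.m / zeros)
        return raw


def intersection_estimate(a: HyperLogLog, b: HyperLogLog) -> float:
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.
    # The result carries the absolute error of all three terms and can even be negative.
    return a.estimate() + b.estimate() - a.union(b).estimate()


if __name__ == "__main__":
    a, b = HyperLogLog(10), HyperLogLog(10)
    for i in range(5000):
        a.add(f"item-{i}")
    for i in range(4500, 9500):  # overlaps a in 500 items (10% of each set)
        b.add(f"item-{i}")
    print("standard error:", 1.04 / math.sqrt(a.m))
    print("|A| ~", round(a.estimate()), " |A u B| ~", round(a.union(b).estimate()))
    print("|A n B| ~", round(intersection_estimate(a, b)), "(true: 500)")
```

Comparing the printed intersection against the true value of 500 shows how the error of the roughly 9,500-item union term dominates a much smaller result, which is the reason for intersecting only a few sets of similar size.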