Update website.
leerho committed May 2, 2022
1 parent 82b2fdb commit 39f1903dc906855d572747400089956378ddbc79
Showing 2 changed files with 13 additions and 13 deletions.
@@ -69,9 +69,9 @@ There are many situations where is valuable to understand the distribution of va

There are two different families of quantiles sketches, the original [quantiles/DoublesSketch](https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/quantiles/DoublesSketch.java), which can be operated either on-heap or off-heap, and is also available in a Java Generic form for arbitrary comparable objects.

Later we developed the [kll/KllFloatsSketch](https://github.com/apache/datasketches-java/blob/master/src/main/java/org/apache/datasketches/kll/KllFloatsSketch.java) (Named after its authors), which is also a quantiles sketch, that achieves near optimal small size for a given accuracy.
Later we developed the [kll/KllSketch](https://datasketches.apache.org/docs/KLL/KLLSketch.html) (Named after its authors), which is also a quantiles sketch, that achieves near optimal small size for a given accuracy.

The most recent sketch in this group is called the Relative Error Quantiles sketch, which is a cousin of the KLL sketch except that it provides very high accuracy at one of the ends of the rank domain. If your application requires high accuracy primarily for the very high ranks, e.g., the 99.999%ile, or the very low ranks, e.g. the .00001%ile, and you can give up some accuracy at the other end of the rank scale, this sketch is designed for you.
The most recent sketch in this group is called the Relative Error Quantiles [ReqSketch](https://datasketches.apache.org/docs/REQ/ReqSketch.html), which is a cousin of the KLL sketch except that it provides very high accuracy at one of the ends of the rank domain. If your application requires high accuracy primarily for the very high ranks, e.g., the 99.999%ile, or the very low ranks, e.g. the .00001%ile, and you can give up some accuracy at the other end of the rank scale, this sketch is designed for you.

## Frequent Items / Heavy Hitters Sketches

@@ -19,22 +19,22 @@ layout: doc_page
specific language governing permissions and limitations
under the License.
-->
## KLL sketch
## KLL Sketch

Implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per bit.
See <a href="https://arxiv.org/abs/1603.05346v2">this paper</a>.
See <a href="https://arxiv.org/abs/1603.05346v2">Optimal Quantile Approximation in Streams, by Zohar Karnin, Kevin Lang, Edo Liberty</a>.
The name KLL is composed of the initial letters of the last names of the authors.

The usage of KllFloatsSketch is very similar to DoublesSketch. Because the key feature of this sketch is compactness, it was implemented with float values instead of double values.
The usage of KllSketch is very similar to DoublesSketch. The key feature of this sketch is its compactness for a given accuracy. It is implemented with both float and double values and can be configured for use on-heap or off-heap (Direct mode).
The parameter K that affects the accuracy and the size of the sketch is not restricted to powers of 2.
The default of 200 was chosen to yield approximately the same normalized rank error (1.65%) as the default DoublesSketch (K=128, error 1.73%).
The default of 200 was chosen to yield approximately the same normalized rank error (1.65%) as the original DoublesSketch (K=128, error 1.73%).

### Java example

```
import org.apache.datasketches.kll.KllFloatsSketch;
KllFloatsSketch sketch = new KllFloatsSketch();
KllFloatsSketch sketch = KllFloatsSketch.newHeapInstance();
int n = 1000000;
for (int i = 0; i < n; i++) {
sketch.update(i);
@@ -43,25 +43,25 @@ float median = sketch.getQuantile(0.5);
double rankOf1000 = sketch.getRank(1000);
```
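
To sanity-check the two estimates in the example above, it can help to compute the exact answers for the same stream. The following is a minimal sketch that does not use the DataSketches library at all: the class `ExactRankBaseline` and its methods are hypothetical helpers written for illustration. It assumes the exclusive rank convention (fraction of values strictly less than the query) and treats the 1.65% normalized rank error quoted above as a simple tolerance, although the library's error bound is probabilistic.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataSketches: exact rank and quantile
// computed over a fully sorted array, for comparison with sketch estimates.
final class ExactRankBaseline {

  // Exact normalized rank: fraction of values strictly less than 'value'
  // (exclusive convention, assumed here). Uses binary search on sorted input.
  static double exactRank(float[] sorted, float value) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < value) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return (double) lo / sorted.length;
  }

  // Exact quantile: the value at index floor(rank * n), clamped to the last index.
  static float exactQuantile(float[] sorted, double rank) {
    int idx = (int) Math.min(sorted.length - 1, Math.floor(rank * sorted.length));
    return sorted[idx];
  }

  // True if an estimated normalized rank is within 'eps' of the true rank.
  static boolean withinRankError(double estimatedRank, double trueRank, double eps) {
    return Math.abs(estimatedRank - trueRank) <= eps;
  }

  public static void main(String[] args) {
    int n = 1000000;
    float[] data = new float[n];
    for (int i = 0; i < n; i++) {
      data[i] = i; // integers below 2^24 are exactly representable as float
    }
    Arrays.sort(data); // already sorted here; kept for arbitrary input order

    double trueRankOf1000 = exactRank(data, 1000f);
    float trueMedian = exactQuantile(data, 0.5);
    System.out.println("exact rank of 1000 = " + trueRankOf1000);
    System.out.println("exact median       = " + trueMedian);

    // A sketch estimate of the rank of 1000 would be compared like this,
    // with 0.0165 standing in for the default K=200 rank error (assumption).
    double hypotheticalEstimate = 0.004;
    System.out.println("within tolerance   = "
        + withinRankError(hypotheticalEstimate, trueRankOf1000, 0.0165));
  }
}
```

Because the error is expressed in rank rather than in value, a sketch's median estimate for this stream is checked by computing the exact rank of the returned value and comparing it with 0.5, not by comparing the value with 500000 directly.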

### Differences of KllFloatsSketch from DoublesSketch
### Differences of KllSketch from the original quantiles DoublesSketch

* KLL has a smaller size for the same accuracy
* KLL is slightly faster to update
* The KLL parameter K doesn't have to be a power of 2
* KLL operates with float values instead of double values
* KLL operates with either float values or double values
* KLL uses a merge method rather than a union object
* KLL does not offer direct, off-heap implementation
* KLL does not have separate updatable and compact forms

The starting point for the comparison is setting K in such a way that rank error would be approximately the same. As pointed out above, the default K for both sketches should achieve this. Here is the comparison of the single-sided normalized rank error (getRank() method) for the default K:

<img class="doc-img-full" src="{{site.docs_img_dir}}/kll/kll200-vs-ds128-rank-error.png" alt="RankError" />

DoublesSketch has two forms with different serialized sizes: UpdateDoublesSketch and CompactDoublesSketch. KllFloatsSketch has no such distinction. It is always serialized in a compact form, and it is not much bigger than that in memory. Here is the comparison of serialized sizes:
DoublesSketch has two forms with different serialized sizes: UpdateDoublesSketch and CompactDoublesSketch. The KLL sketches make this distinction differently. When a KllSketch is serialized using *toByteArray()*, the result is always in a compact, immutable form. When a KllSketch is on-heap, it is always updatable. It can also be created off-heap using the static factory method *newDirectInstance(...)*, in which case it is also updatable. It is possible to move from off-heap (Direct) to on-heap using the *heapify(Memory)* method. The *merge(...)* method works with off-heap sketches, on-heap sketches, and Memory-wrapped compact byte arrays.

Here is the comparison of serialized sizes:

<img class="doc-img-full" src="{{site.docs_img_dir}}/kll/kll200-vs-ds128-size.png" alt="SerializedSize" />

Some part of the size difference above is due to using items of float type as opposed to double type. Here is the comparison of the number of retained items to see the difference with no influence of the size of the item type:
Here is the comparison of the number of retained items to see the difference with no influence of the size of the item type:

<img class="doc-img-full" src="{{site.docs_img_dir}}/kll/kll200-vs-ds128-items.png" alt="NumberOfRetainedItems" />
