Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fromExpHistogram / toExpHistogram functions, which will encode/decode quantileDD to/from OpenTelemetry Exponential Histogram format #63529

Open
UnamedRus opened this issue May 8, 2024 · 1 comment
Labels

Comments

@UnamedRus
Copy link
Contributor

UnamedRus commented May 8, 2024

Use case

Implementation: https://github.com/newrelic-experimental/newrelic-sketch-java

It's also match implementation used by opentelemetry specification for exponential-histograms https://opentelemetry.io/blog/2022/exponential-histograms/

Which can increase adoption of ClickHouse for destination of opentelemetry data.

Specification
https://github.com/open-telemetry/oteps/blob/main/text/0149-exponential-histogram.md

Actually DDSketch is already exponential histogram, so need to wrap my head about how it convert values
https://github.com/DataDog/sketches-java/blob/264143c1451684b0f6985fd223e81eff40547278/src/protobuf/proto/DDSketch.proto#L12

@UnamedRus UnamedRus changed the title Implement support for NrSketch quantile. Add fromExpHistogram / toExpHistogram functions, which will encode/decode quantileDD to/from OpenTelemetry Exponential Histogram format May 9, 2024
@UnamedRus
Copy link
Contributor Author

Proto files for DDSketch and ExponentialHistogram

DDSketch:

  // The mapping between positive values and the bin indexes they belong to.
  IndexMapping mapping = 1;

  // The store for keeping track of positive values.
  Store positiveValues = 2;

  // The store for keeping track of negative values. A negative value v is mapped using its positive opposite -v.
  Store negativeValues = 3;

  // The count for the value zero and its close neighborhood (whose width depends on the mapping).
  double zeroCount = 4;

  // The gamma parameter of the mapping, such that bin index that a value v belongs to is roughly equal to
  // log(v)/log(gamma).
  double gamma = 1;

  // An offset that can be used to shift all bin indexes.
  double indexOffset = 2;

  // To speed up the computation of the index a value belongs to, the computation of the log may be approximated using
  // the fact that the log to the base 2 of powers of 2 can be computed at a low cost from the binary representation of
  // the input value. Other values can be approximated by interpolating between successive powers of 2 (linearly,
  // quadratically or cubically).
  // NONE means that the log is to be computed exactly (no interpolation).
  Interpolation interpolation = 3;
  enum Interpolation {
    NONE = 0;
    LINEAR = 1;
    QUADRATIC = 2;
    CUBIC = 3;
    QUARTIC = 4;
  }
}

message Store {
  // The bin counts, encoded sparsely.
  map<sint32, double> binCounts = 1;

  // The bin counts, encoded contiguously. The values of contiguousBinCounts are the counts for the bins of indexes
  // o, o+1, o+2, etc., where o is contiguousBinIndexOffset.
  repeated double contiguousBinCounts = 2 [packed = true];
  sint32 contiguousBinIndexOffset = 3;
}

ExponentialHistogram:

  // count is the number of values in the population. Must be
  // non-negative. This value must be equal to the sum of the "bucket_counts"
  // values in the positive and negative Buckets plus the "zero_count" field.
  fixed64 count = 4;

  // sum of the values in the population. If count is zero then this field
  // must be zero.
  //
  // Note: Sum should only be filled out when measuring non-negative discrete
  // events, and is assumed to be monotonic over the values of these events.
  // Negative events *can* be recorded, but sum should not be filled out when
  // doing so.  This is specifically to enforce compatibility w/ OpenMetrics,
  // see: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#histogram
  optional double sum = 5;
  
  // scale describes the resolution of the histogram.  Boundaries are
  // located at powers of the base, where:
  //
  //   base = (2^(2^-scale))
  //
  // The histogram bucket identified by `index`, a signed integer,
  // contains values that are greater than (base^index) and
  // less than or equal to (base^(index+1)).
  //
  // The positive and negative ranges of the histogram are expressed
  // separately.  Negative values are mapped by their absolute value
  // into the negative range using the same scale as the positive range.
  //
  // scale is not restricted by the protocol, as the permissible
  // values depend on the range of the data.
  sint32 scale = 6;

  // zero_count is the count of values that are either exactly zero or
  // within the region considered zero by the instrumentation at the
  // tolerated degree of precision.  This bucket stores values that
  // cannot be expressed using the standard exponential formula as
  // well as values that have been rounded to zero.
  //
  // Implementations MAY consider the zero bucket to have probability
  // mass equal to (zero_count / count).
  fixed64 zero_count = 7;

  // positive carries the positive range of exponential bucket counts.
  Buckets positive = 8;

  // negative carries the negative range of exponential bucket counts.
  Buckets negative = 9;

message Buckets {
    // Offset is the bucket index of the first entry in the bucket_counts array.
    // 
    // Note: This uses a varint encoding as a simple form of compression.
    sint32 offset = 1;

    // Count is an array of counts, where count[i] carries the count
    // of the bucket at index (offset+i).  count[i] is the count of
    // values greater than base^(offset+i) and less or equal to than
    // base^(offset+i+1).
    //
    // Note: By contrast, the explicit HistogramDataPoint uses
    // fixed64.  This field is expected to have many buckets,
    // especially zeros, so uint64 has been selected to ensure
    // varint encoding.
    repeated uint64 bucket_counts = 2;
  } 

Mapping:

Exp    = DDSketch

count = zeroCount + sum(positive) + sum(negative)
sum   = zeroCount + sum(positive)

zero_count = zeroCount 

index = log(v) / log(gamma) = ln(gamma)(value)
gamma^index = value

[(base^index), (base^index+1)]  = [(gamma^index), (gamma^index+1)]

base = gamma

base = (2^(2^-scale))
gamma = 2^(2^-scale))
scale = - log2(log2(gamma))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant