Sketchlib CountMin Support #215

Merged: milindsrivastava1997 merged 10 commits into main from sketchlib-count-min on Mar 28, 2026

@GnaneshGnani
Contributor

Summary

Integrates sketchlib-rust into the Count-Min Sketch implementation by introducing an enum-based backend that allows runtime switching between the legacy and sketchlib implementations. This PR adds the sketchlib-rust dependency and refactors Count-Min Sketch to support both backends.

Changes

New Files

asap-common/sketch-core/src/count_min_sketchlib.rs

  • Sketchlib-rust integration layer for Count-Min Sketch
  • Type alias: SketchlibCms = CountMin<Vector2D<f64>, RegularPath>
  • Helper functions:
    • new_sketchlib_cms(): Create fresh sketch
    • sketchlib_cms_from_matrix(): Build from existing matrix
    • matrix_from_sketchlib_cms(): Convert to legacy format
    • sketchlib_cms_update(): Update sketch with weighted key
    • sketchlib_cms_query(): Query frequency estimate

Modified Files

asap-common/sketch-core/Cargo.toml

  • Added sketchlib-rust dependency from GitHub

asap-common/sketch-core/src/lib.rs

  • Added pub mod count_min_sketchlib;
  • Added test constructor to force legacy mode during tests

asap-common/sketch-core/src/count_min.rs

  • Introduced CountMinBackend enum (Legacy/Sketchlib)
  • Introduced WireFormat struct for serialization
  • Refactored CountMinSketch:
    • Changed from direct sketch: Vec<Vec<f64>> field to backend: CountMinBackend
    • Added sketch() method to get matrix view
    • Added sketch_mut() method (returns Some only for Legacy)
    • Added from_legacy_matrix() constructor for deserialization
    • Backend selection based on use_sketchlib_for_count_min() at construction time
  • Updated all operations to dispatch through backend:
    • new(), update(), query_key(), merge()
  • Updated serialization/deserialization to use wire format struct

asap-query-engine/src/main.rs

  • Added use sketch_core::config::{self, ImplMode};
  • Added CLI arguments: sketch_cms_impl, sketch_kll_impl, sketch_cmwh_impl
  • Calls config::configure() at startup before any sketch operations

asap-query-engine/src/lib.rs

  • Added test constructor that configures backends based on sketchlib-tests feature

asap-query-engine/src/precompute_operators/count_min_sketch_accumulator.rs

  • Replaced direct struct construction with CountMinSketch::from_legacy_matrix()
  • Updated field access to use sketch() method instead of direct field
  • Updated all tests to use accessor methods instead of direct field mutation

asap-summary-ingest/templates/udfs/countminsketch_count.rs.j2

  • Added sketchlib-rust dependency
  • Added ImplMode enum and IMPL_MODE constant (set to Sketchlib)
  • Implemented dual-path UDF logic:
    • Legacy path: Uses twox-hash and manual matrix updates
    • Sketchlib path: Uses SketchlibCms with integer counters

asap-summary-ingest/templates/udfs/countminsketch_sum.rs.j2

  • Similar dual-path implementation for sum aggregation
  • Added sketchlib-rust integration

asap-query-engine/Cargo.toml

  • Added sketchlib-tests feature flag
  • Added ctor dev dependency

asap-common/sketch-core/src/config.rs

  • Configuration infrastructure for backend selection
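
A minimal sketch of how such a process-wide backend switch could look, using only the standard library. The names ImplMode, configure, and use_sketchlib_for_count_min come from this PR; their bodies and signatures here are assumptions, as is the fallback to Legacy when configure was never called. The real configure presumably also covers the KLL and CMWH settings.

```rust
use std::sync::OnceLock;

/// Which Count-Min implementation to use (variant names taken from the PR).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ImplMode {
    Legacy,
    Sketchlib,
}

// Process-wide setting, written at most once at startup.
static CMS_IMPL: OnceLock<ImplMode> = OnceLock::new();

/// Record the chosen backend. Returns Err with the rejected mode if a
/// backend was already configured (assumed behavior, not the PR's).
pub fn configure(cms: ImplMode) -> Result<(), ImplMode> {
    CMS_IMPL.set(cms)
}

/// True when newly constructed sketches should use the sketchlib backend.
/// Falls back to Legacy when `configure` was never called (an assumption).
pub fn use_sketchlib_for_count_min() -> bool {
    matches!(
        CMS_IMPL.get().copied().unwrap_or(ImplMode::Legacy),
        ImplMode::Sketchlib
    )
}

fn main() {
    // Configure once, before any sketch is constructed.
    let _ = configure(ImplMode::Sketchlib);
    println!("sketchlib backend: {}", use_sketchlib_for_count_min());
}
```

Writing the setting once and only reading it afterwards is what makes "configure at startup before any sketch operations" safe: every sketch constructed later sees the same choice.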

asap-query-engine/tests/test_both_backends.rs

  • Ensures both backends are tested

asap-common/sketch-core/src/bin/sketchlib_fidelity.rs

  • Added CountMinSketch benchmarking functions
  • Compares legacy vs sketchlib implementations for accuracy

asap-common/sketch-core/report.md

  • Added CountMinSketch fidelity results showing near-identical accuracy

Cargo.lock

  • Updated with sketchlib-rust dependency

Technical Approach

Backend Abstraction Pattern

pub enum CountMinBackend {
    Legacy(Vec<Vec<f64>>),
    Sketchlib(SketchlibCms),
}

pub struct CountMinSketch {
    pub row_num: usize,
    pub col_num: usize,
    pub backend: CountMinBackend,
}

All operations dispatch through the backend enum with a match, so no trait objects or dynamic dispatch are involved; the per-call cost is a branch on the enum variant.
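
The dispatch pattern can be illustrated with a self-contained sketch. StandInCms is a hypothetical stand-in for SketchlibCms (the real sketchlib-rust API may differ), the std DefaultHasher is a placeholder for the actual hash functions, and passing the backend choice to new() as a bool is a simplification of the PR's use_sketchlib_for_count_min() lookup.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the sketchlib-backed sketch: one flat,
/// row-major counter buffer. Not the real sketchlib-rust type.
pub struct StandInCms {
    cells: Vec<f64>,
    cols: usize,
}

pub enum CountMinBackend {
    Legacy(Vec<Vec<f64>>),
    Sketchlib(StandInCms),
}

pub struct CountMinSketch {
    pub row_num: usize,
    pub col_num: usize,
    pub backend: CountMinBackend,
}

/// Column of `key` in `row`. DefaultHasher is only a placeholder hash.
fn column(key: &str, row: usize, cols: usize) -> usize {
    let mut h = DefaultHasher::new();
    row.hash(&mut h);
    key.hash(&mut h);
    (h.finish() % cols as u64) as usize
}

impl CountMinSketch {
    pub fn new(row_num: usize, col_num: usize, use_sketchlib: bool) -> Self {
        let backend = if use_sketchlib {
            CountMinBackend::Sketchlib(StandInCms {
                cells: vec![0.0; row_num * col_num],
                cols: col_num,
            })
        } else {
            CountMinBackend::Legacy(vec![vec![0.0; col_num]; row_num])
        };
        Self { row_num, col_num, backend }
    }

    /// Add `weight` to one counter per row; the match picks the storage.
    pub fn update(&mut self, key: &str, weight: f64) {
        for row in 0..self.row_num {
            let col = column(key, row, self.col_num);
            match &mut self.backend {
                CountMinBackend::Legacy(m) => m[row][col] += weight,
                CountMinBackend::Sketchlib(s) => s.cells[row * s.cols + col] += weight,
            }
        }
    }

    /// Count-Min estimate: the minimum counter for `key` across all rows.
    pub fn query_key(&self, key: &str) -> f64 {
        (0..self.row_num)
            .map(|row| {
                let col = column(key, row, self.col_num);
                match &self.backend {
                    CountMinBackend::Legacy(m) => m[row][col],
                    CountMinBackend::Sketchlib(s) => s.cells[row * s.cols + col],
                }
            })
            .fold(f64::INFINITY, f64::min)
    }
}

fn main() {
    for use_sketchlib in [false, true] {
        let mut cms = CountMinSketch::new(3, 1024, use_sketchlib);
        cms.update("pattern=step", 2.0);
        cms.update("pattern=step", 1.0);
        println!("estimate = {}", cms.query_key("pattern=step"));
    }
}
```

Callers only see update() and query_key(); which variant is stored is decided once in new().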

Wire Format Compatibility

The msgpack serialization format remains unchanged:

struct WireFormat {
    sketch: Vec<Vec<f64>>,
    row_num: usize,
    col_num: usize,
}

Both backends serialize to and deserialize from this common format, ensuring UDF-to-QueryEngine compatibility.
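
As a rough illustration of the shared format, both variants can be flattened to the same matrix and rebuilt from it. This sketch leaves out the msgpack step entirely; FlatCms, to_wire, and from_wire are hypothetical names, and the flat row-major layout of the sketchlib side is an assumption.

```rust
/// Hypothetical stand-in for the sketchlib-backed sketch (flat, row-major).
#[derive(Debug, Clone, PartialEq)]
pub struct FlatCms {
    cells: Vec<f64>,
    cols: usize,
}

#[derive(Debug, Clone, PartialEq)]
pub enum CountMinBackend {
    Legacy(Vec<Vec<f64>>),
    Sketchlib(FlatCms),
}

/// Same fields as the PR's WireFormat; the serde derives used for
/// msgpack are omitted so this compiles with the standard library only.
#[derive(Debug, Clone, PartialEq)]
pub struct WireFormat {
    pub sketch: Vec<Vec<f64>>,
    pub row_num: usize,
    pub col_num: usize,
}

/// Convert either backend into the common matrix layout.
pub fn to_wire(backend: &CountMinBackend, row_num: usize, col_num: usize) -> WireFormat {
    let sketch = match backend {
        CountMinBackend::Legacy(m) => m.clone(),
        CountMinBackend::Sketchlib(s) => s
            .cells
            .chunks(s.cols.max(1)) // max(1) avoids a panic on zero-width sketches
            .map(|row| row.to_vec())
            .collect(),
    };
    WireFormat { sketch, row_num, col_num }
}

/// Rebuild a backend of the requested kind from the common layout.
pub fn from_wire(wire: &WireFormat, use_sketchlib: bool) -> CountMinBackend {
    if use_sketchlib {
        let cells = wire.sketch.iter().flatten().copied().collect();
        CountMinBackend::Sketchlib(FlatCms { cells, cols: wire.col_num })
    } else {
        CountMinBackend::Legacy(wire.sketch.clone())
    }
}

fn main() {
    let legacy = CountMinBackend::Legacy(vec![vec![1.0, 0.0, 2.0], vec![0.0, 3.0, 0.0]]);
    let wire = to_wire(&legacy, 2, 3);
    // Round trip through the other backend and back to the wire layout.
    let again = to_wire(&from_wire(&wire, true), 2, 3);
    println!("round trip preserved matrix: {}", wire == again);
}
```

Because the wire struct carries only the matrix and its dimensions, the reader can pick either backend regardless of which one produced the bytes.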

Hash Function Difference

Note: QueryEngineRust uses xxhash-rust::xxh32 while Arroyo UDF templates historically used twox-hash::XxHash32. The UDF templates now use sketchlib-rust's internal hashing (MurmurHash3), which matches neither. This is acceptable as the sketches are probabilistic and all hash functions provide good distribution.

Testing

# Unit tests (legacy backend)
cargo test -p sketch-core
cargo test -p query_engine_rust

# Unit tests with sketchlib backend
cargo test -p query_engine_rust --features sketchlib-tests

# Fidelity comparison
cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl sketchlib
cargo run -p sketch-core --bin sketchlib_fidelity -- --cms-impl legacy

Fidelity Results

CountMinSketch achieves near-identical accuracy between Legacy and sketchlib-rust:

  • Pearson correlation: >0.999 (effectively perfect)
  • MAPE: <25% for high-collision scenarios, 0% for low-collision
  • RMSE: <55% for high-collision scenarios, 0% for low-collision

See asap-common/sketch-core/report.md for detailed results.
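
For reference, the three metrics can be computed as follows. These are the textbook definitions; the fidelity binary may normalize differently (for example, the RMSE above is reported as a percentage, while this sketch returns it in the units of the data). The function names and sample values are illustrative only, not results from report.md.

```rust
/// Pearson correlation. None if lengths differ, the input is empty,
/// or either series has zero variance.
pub fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.is_empty() {
        return None;
    }
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    if sxx == 0.0 || syy == 0.0 {
        None
    } else {
        Some(sxy / (sxx * syy).sqrt())
    }
}

/// Mean absolute percentage error, skipping entries whose true value is 0.
pub fn mape_percent(truth: &[f64], est: &[f64]) -> Option<f64> {
    if truth.len() != est.len() {
        return None;
    }
    let errs: Vec<f64> = truth
        .iter()
        .zip(est)
        .filter(|(t, _)| **t != 0.0)
        .map(|(t, e)| ((e - t) / t).abs())
        .collect();
    if errs.is_empty() {
        None
    } else {
        Some(100.0 * errs.iter().sum::<f64>() / errs.len() as f64)
    }
}

/// Root mean squared error, in the same units as the inputs.
pub fn rmse(truth: &[f64], est: &[f64]) -> Option<f64> {
    if truth.len() != est.len() || truth.is_empty() {
        return None;
    }
    let mse = truth
        .iter()
        .zip(est)
        .map(|(t, e)| (e - t).powi(2))
        .sum::<f64>()
        / truth.len() as f64;
    Some(mse.sqrt())
}

fn main() {
    // Made-up numbers for illustration only.
    let truth = [10.0, 20.0, 30.0, 40.0];
    let estimate = [10.0, 22.0, 30.0, 44.0];
    println!("pearson = {:?}", pearson(&truth, &estimate));
    println!("mape %  = {:?}", mape_percent(&truth, &estimate));
    println!("rmse    = {:?}", rmse(&truth, &estimate));
}
```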

Contributor

@milindsrivastava1997 left a comment

Can you verify that the new asap-summary-ingest UDFs can be compiled by arroyo? Use the validate_udfs.py script

@GnaneshGnani
Contributor Author

@milindsrivastava1997, the validation script does not yet correctly handle the new implementation-mode template parameter, which was introduced to select between Sketchlib and the existing implementation. For now, I set the default to Sketchlib so that validation can run without changing the validator script. With this change, validation passed in Sketchlib mode for the CMS UDFs with no errors.

I still observed a separate failure in HydraKLL, which is unrelated to the changes in this PR. I will address that in the KLL PR, or open a dedicated follow-up PR if needed.

@GnaneshGnani
Contributor Author

@milindsrivastava1997, please let me know if you need any other changes for this PR

@milindsrivastava1997
Contributor

@GnaneshGnani will check tomorrow and let you know. Thanks.

@milindsrivastava1997
Contributor

@GnaneshGnani Can you please run the quick start in your branch and check that it works? Change the quantile by .... to count by ..... to exercise the CountMin path.

@zzylol pls help if needed.

@GnaneshGnani
Contributor Author

GnaneshGnani commented Mar 24, 2026

@milindsrivastava1997, I ran the quickstart on my branch after changing the quickstart queries in controller-config.yaml from quantile to count:

count by (pattern) (sensor_reading)
count by (pattern, region) (sensor_reading)
count by (pattern, service) (sensor_reading)
count by (pattern, job) (sensor_reading)

Regenerated dashboards and launched quickstart cleanly.

I verified:

  • QueryEngine inference config showed count-only queries loaded.
asap-queryengine  | 2026-03-24T17:03:41.744125Z  INFO query_engine_rust: asap-query-engine/src/main.rs:145: Inference config: InferenceConfig { schema: PromQL(PromQLSchema { config: {"sensor_reading": KeyByLabelNames { labels: ["host", "instance", "job", "pattern", "region", "service"] }} }), 
query_configs: [QueryConfig { query: "count by (pattern) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 1, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 2, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
QueryConfig { query: "count by (pattern, region) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 3, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 4, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
QueryConfig { query: "count by (pattern, service) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 5, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 6, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
QueryConfig { query: "count by (pattern, job) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 7, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 8, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }], cleanup_policy: ReadBased }
  • QueryEngine streaming config showed CountMinSketch with aggregation_sub_type = count.
asap-queryengine  | 2026-03-24T17:03:41.744444Z  INFO query_engine_rust: asap-query-engine/src/main.rs:155: Streaming config: StreamingConfig { aggregation_configs: {5: AggregationConfig { aggregation_id: 5, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "service"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 7: AggregationConfig { aggregation_id: 7, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["job", "pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
6: AggregationConfig { aggregation_id: 6, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"width": Number(1024), "depth": Number(3)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "service"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
2: AggregationConfig { aggregation_id: 2, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"width": Number(1024), "depth": Number(3)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
1: AggregationConfig { aggregation_id: 1, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
8: AggregationConfig { aggregation_id: 8, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"depth": Number(3), "width": Number(1024)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["job", "pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
3: AggregationConfig { aggregation_id: 3, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "region"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
4: AggregationConfig { aggregation_id: 4, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"depth": Number(3), "width": Number(1024)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "region"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }} }
  • API query results were returned successfully for count queries.
curl -sG "http://localhost:8088/api/v1/query" --data-urlencode "query=count by (pattern) (sensor_reading)" | jq .
{
  "data": {
    "result": [
      {
        "metric": {
          "pattern": "exp_up"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "constant"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_up"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "sine_noise"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "step"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_down"
        },
        "value": [
          1774372849.846,
          "27000"
        ]
      }
    ],
    "resultType": "vector"
  },
  "status": "success"
}
curl -sG "http://localhost:9090/api/v1/query" --data-urlencode "query=count by (pattern) (sensor_reading)" | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "pattern": "exp_up"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "constant"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_up"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "sine_noise"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "step"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_down"
        },
        "value": [
          1774372864.115,
          "27000"
        ]
      }
    ]
  }
}

@milindsrivastava1997
Contributor

@GnaneshGnani Thank you for this. @zzylol pointed out that the quickstart is using pre-built images so actually it will not be using the code in your branch. Sorry, I didn't realize this.

For now, can you change the quickstart docker-compose.yml to not use the pre-built asap-* images and instead have something like this?

queryengine:
    # image: ghcr.io/projectasap/asap-query-engine:v0.2.0
    build:
      context: ..
      dockerfile: asap-query-engine/Dockerfile

Do this for each asap image except asap-arroyo. Then re-run the count experiment as above. Then, you can discard your changes to the quickstart docker-compose.yml

@GnaneshGnani
Contributor Author

@milindsrivastava1997 , tested with the local images.

docker compose ps
NAME                             IMAGE                                            COMMAND                  SERVICE                     CREATED              STATUS                        PORTS
asap-arroyo                      ghcr.io/projectasap/asap-arroyo:v0.1.0           "/app/arroyo --confi…"   arroyo                      About a minute ago   Up About a minute (healthy)   0.0.0.0:5115->5115/tcp, [::]:5115->5115/tcp
asap-fake-exporter-constant      asapquery-quickstart-fake-exporter-constant      "target/release/fake…"   fake-exporter-constant      About a minute ago   Up About a minute             50000/tcp
asap-fake-exporter-exp-up        asapquery-quickstart-fake-exporter-exp-up        "target/release/fake…"   fake-exporter-exp-up        About a minute ago   Up About a minute             50007/tcp
asap-fake-exporter-linear-down   asapquery-quickstart-fake-exporter-linear-down   "target/release/fake…"   fake-exporter-linear-down   About a minute ago   Up About a minute             50002/tcp
asap-fake-exporter-linear-up     asapquery-quickstart-fake-exporter-linear-up     "target/release/fake…"   fake-exporter-linear-up     About a minute ago   Up About a minute             50001/tcp
asap-fake-exporter-sine-noise    asapquery-quickstart-fake-exporter-sine-noise    "target/release/fake…"   fake-exporter-sine-noise    About a minute ago   Up About a minute             50004/tcp
asap-fake-exporter-step          asapquery-quickstart-fake-exporter-step          "target/release/fake…"   fake-exporter-step          About a minute ago   Up About a minute             50005/tcp
asap-grafana                     grafana/grafana-enterprise:12.3.3                "/run.sh"                grafana                     About a minute ago   Up About a minute (healthy)   0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp
asap-kafka                       apache/kafka:3.7.0                               "/bin/bash -c 'chown…"   kafka                       About a minute ago   Up About a minute (healthy)   9092/tcp
asap-prometheus                  prom/prometheus:v3.9.1                           "/bin/prometheus --c…"   prometheus                  About a minute ago   Up 38 seconds (healthy)       0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp
asap-queryengine                 asapquery-quickstart-queryengine                 "query_engine_rust -…"   queryengine                 About a minute ago   Up 27 seconds                 0.0.0.0:8088->8088/tcp, [::]:8088->8088/tcp

docker compose logs --tail=260 queryengine | rg -n "Inference config|count by|quantile by|CountMinSketch|DatasketchesKLL|ERROR|panic"

7:asap-queryengine  | 2026-03-25T15:47:04.994244Z  INFO query_engine_rust: asap-query-engine/src/main.rs:167: Inference config: InferenceConfig { schema: PromQL(PromQLSchema { config: {"sensor_reading": KeyByLabelNames { labels: ["host", "instance", "job", "pattern", "region", "service"] }} }), 
    query_configs: [QueryConfig { query: "count by (pattern) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 1, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 2, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
    QueryConfig { query: "count by (pattern, region) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 3, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 4, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
    QueryConfig { query: "count by (pattern, service) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 5, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 6, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }, 
    QueryConfig { query: "count by (pattern, job) (sensor_reading)", aggregations: [AggregationReference { aggregation_id: 7, num_aggregates_to_retain: None, read_count_threshold: Some(31) }, AggregationReference { aggregation_id: 8, num_aggregates_to_retain: None, read_count_threshold: Some(31) }] }], cleanup_policy: ReadBased }

9:asap-queryengine  | 2026-03-25T15:47:04.994748Z  INFO query_engine_rust: asap-query-engine/src/main.rs:177: Streaming config: StreamingConfig { aggregation_configs: {
    6: AggregationConfig { aggregation_id: 6, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"depth": Number(3), "width": Number(1024)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "service"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    2: AggregationConfig { aggregation_id: 2, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"depth": Number(3), "width": Number(1024)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    3: AggregationConfig { aggregation_id: 3, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "region"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    5: AggregationConfig { aggregation_id: 5, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "service"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    7: AggregationConfig { aggregation_id: 7, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["job", "pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    1: AggregationConfig { aggregation_id: 1, aggregation_type: "DeltaSetAggregator", aggregation_sub_type: "", parameters: {}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    4: AggregationConfig { aggregation_id: 4, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"width": Number(1024), "depth": Number(3)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["pattern", "region"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "job", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }, 
    8: AggregationConfig { aggregation_id: 8, aggregation_type: "CountMinSketch", aggregation_sub_type: "count", parameters: {"width": Number(1024), "depth": Number(3)}, grouping_labels: KeyByLabelNames { labels: [] }, aggregated_labels: KeyByLabelNames { labels: ["job", "pattern"] }, rollup_labels: KeyByLabelNames { labels: ["host", "instance", "region", "service"] }, original_yaml: "", window_size: 1, slide_interval: 1, window_type: "tumbling", tumbling_window_size: 1, spatial_filter: "", spatial_filter_normalized: "", metric: "sensor_reading", num_aggregates_to_retain: None, read_count_threshold: None, table_name: None, value_column: None }} }
curl -sG "http://localhost:8088/api/v1/query" --data-urlencode "query=count by (pattern) (sensor_reading)" | jq .
{
  "data": {
    "result": [
      {
        "metric": {
          "pattern": "sine_noise"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "step"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_down"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "exp_up"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "constant"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_up"
        },
        "value": [
          1774453736.095,
          "27000"
        ]
      }
    ],
    "resultType": "vector"
  },
  "status": "success"
}
curl -sG "http://localhost:9090/api/v1/query" --data-urlencode "query=count by (pattern) (sensor_reading)" | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "pattern": "sine_noise"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "step"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_down"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "exp_up"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "constant"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      },
      {
        "metric": {
          "pattern": "linear_up"
        },
        "value": [
          1774453744.243,
          "27000"
        ]
      }
    ]
  }
}

@milindsrivastava1997
Contributor

@GnaneshGnani LGTM. Pls fix conflicts and then we can merge. Thanks.

@milindsrivastava1997 merged commit 0426b66 into main on Mar 28, 2026
16 checks passed
@milindsrivastava1997 deleted the sketchlib-count-min branch on March 28, 2026 16:17