Support for PromQL #57545

Open
nikitamikhaylov opened this issue Dec 5, 2023 · 14 comments

@nikitamikhaylov
Member

Prometheus Query Language (PromQL) is a powerful tool that lets the user select and aggregate time series data efficiently: https://prometheus.io/docs/prometheus/latest/querying/basics/

Use case

  • Prometheus is a de-facto standard for metrics.
  • The primary use case is to query data stored in ClickHouse using PromQL.
  • Optimized columnar storage.
  • Optimized table structure for the Prometheus migration use case (drop-in replacement).

Describe the solution you'd like

The task is divided into several sub-tasks:

PromQL interpreter

The biggest challenge is a handful of functions like rate and increase - functions which take a possibly huge range vector and depend on the order of elements in that vector.
The PromQL parser and the other functions (simple, aggregate, and window ones) are doable.

We could even support different "dialects" of this language, e.g. MetricsQL or Grafana's LogQL.
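For illustration, here is a rough sketch of how something like rate(metric[5m]) could be approximated at a single evaluation point against the (timestamp, value, fingerprint) table proposed below. Table and column names are assumptions, and the sketch ignores counter resets and Prometheus' extrapolation logic, which is exactly the order-dependent part that makes these functions hard:

-- Hypothetical approximation of rate(metric[5m]) at evaluation time {t},
-- ignoring counter resets and extrapolation.
SELECT
    fingerprint,
    (argMax(value, timestamp) - argMin(value, timestamp))
        / (max(timestamp) - min(timestamp)) AS rate_5m
FROM prometheus_samples          -- hypothetical (timestamp, value, fingerprint) table
WHERE timestamp BETWEEN {t:DateTime} - INTERVAL 5 MINUTE AND {t:DateTime}
GROUP BY fingerprint
HAVING count() > 1;              -- at least two samples are needed

Handling counter resets properly would require processing the samples in timestamp order (e.g. via groupArray over a sorted subquery), which is where a dedicated interpreter helps.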

Table structure

Similar to what we already have for Kusto or PRQL, we can put it under the setting sql_dialect. When sql_dialect is set to promql, ClickHouse will read only from tables with the specified structure. Specifically:

  • The main table with a schema like (timestamp, value, fingerprint). The fingerprint could be just a hash of all the tags associated with the metric.
  • An index table which will be used for filtering the series by tags. We can pick one of the proposed schemas:
    • (tag_key, tag_value, fingerprint)
    • (date, fingerprint, tags_array)

How the names for these tables are defined is an open question: it could be another user-level setting or part of the configuration file. A rough DDL sketch of the proposed tables follows.
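As a starting point, a minimal DDL sketch of the two proposed tables (names, types, codecs, and ordering keys here are assumptions, not a final design):

-- Hypothetical main table: one row per sample.
CREATE TABLE prometheus_samples
(
    fingerprint UInt64,                       -- hash of the metric's tag set
    timestamp   DateTime64(3) CODEC(Delta, ZSTD),
    value       Float64 CODEC(Gorilla, ZSTD)
)
ENGINE = MergeTree
ORDER BY (fingerprint, timestamp);

-- Hypothetical index table, first proposed variant: (tag_key, tag_value, fingerprint).
CREATE TABLE prometheus_series_index
(
    tag_key     LowCardinality(String),
    tag_value   String,
    fingerprint UInt64
)
ENGINE = ReplacingMergeTree
ORDER BY (tag_key, tag_value, fingerprint);

A PromQL selector would then resolve fingerprints from the index table and fetch the matching samples from the main table.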

Native integration with Prometheus API:

What needs to be done for the native integration with Prometheus and drop-in replacement?

  • Support for Prometheus HTTP API
  • Support for Prometheus Remote Read API (Write?)
  • InfluxDB Line Protocol ?
@qoega
Member

qoega commented Dec 5, 2023

(timestamp, value, fingerprint)

This schema looks strange:
(date, fingerprint, hour, timestamp, value) could make more sense.

But as I wrote somewhere in related threads, I believe we need to store the vector for some larger granularity (e.g. an hour) as a single value in a column.
It makes even more sense if we want a fixed grid: a 1-second grid is natural, but for a 15-second grid it is nice to have a precomputed column with an array of last values per 15-second interval. Those can even be used automatically if we add MATERIALIZED=>ASSUME (or just projections) to substitute query parts with the precomputed data.
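Purely as an illustration of the fixed-grid idea, an aggregate projection on the hypothetical samples table above could precompute the last value per fingerprint per 15-second bucket (projection and table names are made up):

-- Hypothetical: pre-aggregate the last value per fingerprint per 15-second bucket.
ALTER TABLE prometheus_samples
    ADD PROJECTION last_per_15s
    (
        SELECT
            fingerprint,
            toStartOfInterval(timestamp, INTERVAL 15 SECOND) AS bucket,
            argMax(value, timestamp) AS last_value
        GROUP BY fingerprint, bucket
    );

-- Build the projection for existing parts.
ALTER TABLE prometheus_samples MATERIALIZE PROJECTION last_per_15s;

Storing an explicit array of per-interval values in a separate column, as suggested above, would be an alternative to projections.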

@alont

alont commented Dec 5, 2023

It would be beneficial to include a tenant column in these tables -- we have seen scenarios where there were collisions between time series associated with different tenants.

@UnamedRus
Contributor

https://github.com/metrico/qryn does something similar, but outside of ClickHouse.

Quite a few issues arise when you start to use ClickHouse as a backend for Prometheus-like data:

  1. Non-optimal reads when you try to get the last value by fingerprint (see the query sketch below).
  2. PromQL allows manipulations on time series that make it much harder to deal with the index table in an efficient way.
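To make point 1 concrete, the usual workaround on a plain (timestamp, value, fingerprint) layout (hypothetical table names taken from the proposal above) is an argMax aggregation, which still reads every matching sample instead of just the newest one:

-- Hypothetical last-value lookup: scans all samples of the selected series.
SELECT
    fingerprint,
    argMax(value, timestamp) AS last_value,
    max(timestamp)           AS last_timestamp
FROM prometheus_samples
WHERE fingerprint IN
(
    SELECT fingerprint
    FROM prometheus_series_index
    WHERE tag_key = 'job' AND tag_value = 'node'   -- hypothetical tag filter
)
GROUP BY fingerprint;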

@MacNale

MacNale commented Dec 7, 2023

Interesting feature. This helps remove one more moving part from the observability backend stack by storing metrics alongside traces and logs in ClickHouse.

  • Agree with @UnamedRus.
  • Handling cardinality is hard. There is no efficient solution right now. Storing the resource attributes in a Map brings flexibility but degrades lookup performance.
  • Schema(-less) design is hard for storing metrics.

This is the schema from the OpenTelemetry exporter:

(
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ResourceSchemaUrl` String CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` String CODEC(ZSTD(1)),
    `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeDroppedAttrCount` UInt32 CODEC(ZSTD(1)),
    `ScopeSchemaUrl` String CODEC(ZSTD(1)),
    `MetricName` String CODEC(ZSTD(1)),
    `MetricDescription` String CODEC(ZSTD(1)),
    `MetricUnit` String CODEC(ZSTD(1)),
    `Attributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `StartTimeUnix` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TimeUnix` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `Value` Float64 CODEC(ZSTD(1)),
    `Flags` UInt32 CODEC(ZSTD(1)),
    `Exemplars.FilteredAttributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Exemplars.TimeUnix` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Exemplars.Value` Array(Float64) CODEC(ZSTD(1)),
    `Exemplars.SpanId` Array(String) CODEC(ZSTD(1)),
    `Exemplars.TraceId` Array(String) CODEC(ZSTD(1)),
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_key mapKeys(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_value mapValues(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(TimeUnix)
ORDER BY (MetricName, Attributes, toUnixTimestamp64Nano(TimeUnix))
TTL toDateTime(TimeUnix) + toIntervalDay(3)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

@oplehto
Copy link

oplehto commented Dec 7, 2023

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure:

  • Attributes in their own columns that make up the ordering key
  • Related metrics (that share a common set of attribute keys) in a single table (for example CPU system, user, iowait)
  • Metric values could be either in a map or in actual columns, depending on whether they are sparse or dense (as determined by the user)

How to map the Prometheus metrics to the table structures? This could be achieved by decorating the table with hints for the write driver (for example, with special keywords in the comment field); a hypothetical example is sketched below.
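For example, a hypothetical decorated table could look like the following; the comment keyword and its interpretation are entirely up to the write driver, nothing here is an existing convention:

-- Hypothetical "conventional" table for a set of related CPU metrics.
-- The COMMENT carries a hint that an external write driver could use
-- to route matching Prometheus series into this table.
CREATE TABLE cpu_metrics
(
    timestamp DateTime CODEC(Delta, ZSTD),
    host      LowCardinality(String),
    cpu       UInt16,
    system    Float64,
    user      Float64,
    iowait    Float64
)
ENGINE = MergeTree
ORDER BY (host, cpu, timestamp)
COMMENT 'promql-map: node_cpu_seconds_total{mode=system|user|iowait} keyed by (instance, cpu)';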

This approach would of course not be the "full" Prometheus experience, because you would need to predefine the schemas, but from a performance standpoint it would be superior. As an added bonus, the query ergonomics would also be better for non-PromQL uses.

With some tooling the schema management could also be somewhat painless.

This approach would of course not work for all cases, but where the Prometheus data is structured and well known, it would be sufficient and would leverage the advantages of ClickHouse.

@qoega
Member

qoega commented Dec 8, 2023

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure

For this you do not need PromQL support at all. If you know the most common attributes, you can create materialized columns out of them. There is a feature, ASSUME CONSTRAINT, that allows using a precomputed column instead of the expression you used in MATERIALIZED, i.e. sensors['my_key'] -> the my_key column. As you know your columns in advance, you can add the assume constraint as well. To make it automatic, there is the feature request #56588.
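A rough sketch of that combination, assuming a hypothetical table with a Map column called sensors (the table name and the hot key are illustrative):

-- Hypothetical: expose a hot map key as a materialized column and let the
-- optimizer substitute it via an ASSUME constraint.
CREATE TABLE metrics_raw
(
    timestamp DateTime,
    sensors   Map(LowCardinality(String), String),
    value     Float64,
    my_key    String MATERIALIZED sensors['my_key'],
    CONSTRAINT c_my_key ASSUME sensors['my_key'] = my_key
)
ENGINE = MergeTree
ORDER BY timestamp;

-- With constraint-based optimizations enabled, a filter on sensors['my_key']
-- can be rewritten to use the cheaper my_key column.
SET optimize_using_constraints = 1, optimize_substitute_columns = 1;
SELECT count() FROM metrics_raw WHERE sensors['my_key'] = 'node-1';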

Parsing the Prometheus data format is just:

SELECT
    name,
    extractKeyValuePairs(trim(BOTH '{}' FROM labels), '=') AS label_map,
    value,
    fromUnixTimestamp64Milli(toUInt64OrNull(ts)) AS t
FROM file('input.txt', 'Regexp', 'name String, labels String, value Float64, ts String')
SETTINGS format_regexp='^(?<metric>[^\s{]+)(?<labels>{[^}]+})? +(?<value>[-+]?[0-9]*\.?[0-9]+[eE]?[-+]?[0-9]*) *(?<ts>[0-9]+)?$', format_regexp_escaping_rule='Raw', format_regexp_skip_unmatched=0;

@fzyzcjy

fzyzcjy commented Apr 2, 2024

Hi, are there any updates? Thanks!

@nikitamikhaylov
Member Author

@fzyzcjy Stay tuned 😌

@fzyzcjy

fzyzcjy commented Apr 4, 2024

@nikitamikhaylov Wow, it will be implemented? Looking forward to it! (I am personally interested in using ClickHouse as an alternative to Prometheus for handling system metrics.)

@MacNale

MacNale commented Apr 10, 2024

@oplehto

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure:

  • Attributes in their own columns that make up the ordering key
  • Related metrics (that share a common set of attribute keys) in a single table (for example CPU system, user, iowait)
  • Metric values could be either in a map or in actual columns, depending on whether they are sparse or dense (as determined by the user)

This approach would of course not be the "full" Prometheus experience, because you would need to predefine the schemas, but from a performance standpoint it would be superior. As an added bonus, the query ergonomics would also be better for non-PromQL uses.

A table per resource will indeed perform better. But as you noted, schema creation, insertion, and query routing to a bunch of tables will become a nightmare over time. In addition to this, managing the cardinality is also a challenge.

@oplehto

oplehto commented Apr 10, 2024

A table per resource will indeed perform better. But as you noted, schema creation, insertion, and query routing to a bunch of tables will become a nightmare over time. In addition to this, managing the cardinality is also a challenge.

This definitely depends on the use case. In my situation a steady state has been reached and additional metrics are not that common. In this case optimizing for performance has higher value than providing flexibility: schema creation is a one-off cost and a pretty standard template can be used. The insertion/query routing could be done either with materialized views and async inserts or with a fairly simple external driver.

I'm not really sure how cardinality would be a challenge. I have ClickHouse tables where the cardinality is extremely high without significant performance impact. I would argue that with a schema, cardinality is easier to deal with than in a table where the data ordering is decoupled from the actual hierarchy (e.g. metadata and data in separate tables).

@MacNale

MacNale commented Apr 16, 2024

@oplehto It makes sense in a steady-state situation like yours.

In my case, both the metrics and labels are open-ended. My comment about cardinality was related to the schema I posted earlier.

A single table with Attributes (aka labels) as a Map type does not work well for lookups.

@syepes

syepes commented Apr 20, 2024

+1

@zlseu-edu

+1
