Support for PromQL #57545

Open
nikitamikhaylov opened this issue Dec 5, 2023 · 14 comments

@nikitamikhaylov
Member

Prometheus Query Language (PromQL) is a powerful tool that lets the user select and aggregate time series data efficiently: https://prometheus.io/docs/prometheus/latest/querying/basics/

Use case

  • Prometheus is a de-facto standard for metrics.
  • The primary use case is to query data stored in ClickHouse using PromQL.
  • Optimized columnar storage.
  • Optimized table structure for the Prometheus migration use case (drop-in replacement).

Describe the solution you'd like

The task is divided into several sub-tasks:

PromQL interpreter

The biggest challenge is a handful of functions like rate and increase - functions which take a possibly huge range vector and depend on the order of elements in that vector.
The PromQL parser and the other functions (simple, aggregate, and window ones) are doable.

We could even support different "dialects" of this language, e.g. MetricsQL or Grafana's LogQL.
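For illustration, here is a rough sketch of how something like rate(metric[5m]) could be approximated at a single evaluation point against the (timestamp, value, fingerprint) table proposed below. Table and column names are assumptions, and the sketch ignores counter resets and Prometheus' extrapolation logic, which is exactly the order-dependent part that makes these functions hard:

-- Hypothetical approximation of rate(metric[5m]) at evaluation time {t},
-- ignoring counter resets and extrapolation.
SELECT
    fingerprint,
    (argMax(value, timestamp) - argMin(value, timestamp))
        / (max(timestamp) - min(timestamp)) AS rate_5m
FROM prometheus_samples          -- hypothetical (timestamp, value, fingerprint) table
WHERE timestamp BETWEEN {t:DateTime} - INTERVAL 5 MINUTE AND {t:DateTime}
GROUP BY fingerprint
HAVING count() > 1;              -- at least two samples are needed

Handling counter resets properly would require processing the samples in timestamp order (e.g. via groupArray over a sorted subquery), which is where a dedicated interpreter helps.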

Table structure

Similar to what we already have for Kusto or PRQL, we can put it under the setting sql_dialect. When sql_dialect is set to promql, ClickHouse will read only from tables with the specified structure. Specifically:

  • The main table with a schema like (timestamp, value, fingerprint). The fingerprint could be just a hash of all the tags associated with the metric.
  • An index table which will be used for filtering the series by tags. We can pick one of the proposed schemas:
    • (tag_key, tag_value, fingerprint)
    • (date, fingerprint, tags_array)

How the names for these tables are defined is an open question: it could be another user-level setting or part of the configuration file. A rough DDL sketch of the proposed tables follows.
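As a starting point, a minimal DDL sketch of the two proposed tables (names, types, codecs, and ordering keys here are assumptions, not a final design):

-- Hypothetical main table: one row per sample.
CREATE TABLE prometheus_samples
(
    fingerprint UInt64,                       -- hash of the metric's tag set
    timestamp   DateTime64(3) CODEC(Delta, ZSTD),
    value       Float64 CODEC(Gorilla, ZSTD)
)
ENGINE = MergeTree
ORDER BY (fingerprint, timestamp);

-- Hypothetical index table, first proposed variant: (tag_key, tag_value, fingerprint).
CREATE TABLE prometheus_series_index
(
    tag_key     LowCardinality(String),
    tag_value   String,
    fingerprint UInt64
)
ENGINE = ReplacingMergeTree
ORDER BY (tag_key, tag_value, fingerprint);

A PromQL selector would then resolve fingerprints from the index table and fetch the matching samples from the main table.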

Native integration with Prometheus API:

What needs to be done for the native integration with Prometheus and drop-in replacement?

  • Support for Prometheus HTTP API
  • Support for Prometheus Remote Read API (Write?)
  • InfluxDB Line Protocol ?
@qoega
Member

qoega commented Dec 5, 2023

(timestamp, value, fingerprint)

This schema looks strange:
(date, fingerprint, hour, timestamp, value) could make more sense.

But as I wrote somewhere in related threads, I believe we need to store the vector for some larger granularity (e.g. an hour) as a single value in a column.
It makes even more sense if we want a fixed grid: a 1-second grid is natural, but for a 15-second grid it is nice to have a precomputed column with an array of last values per 15-second interval. Those can even be used automatically if we add MATERIALIZED=>ASSUME (or just projections) to substitute query parts with the precomputed data.
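Purely as an illustration of the fixed-grid idea, an aggregate projection on the hypothetical samples table above could precompute the last value per fingerprint per 15-second bucket (projection and table names are made up):

-- Hypothetical: pre-aggregate the last value per fingerprint per 15-second bucket.
ALTER TABLE prometheus_samples
    ADD PROJECTION last_per_15s
    (
        SELECT
            fingerprint,
            toStartOfInterval(timestamp, INTERVAL 15 SECOND) AS bucket,
            argMax(value, timestamp) AS last_value
        GROUP BY fingerprint, bucket
    );

-- Build the projection for existing parts.
ALTER TABLE prometheus_samples MATERIALIZE PROJECTION last_per_15s;

Storing an explicit array of per-interval values in a separate column, as suggested above, would be an alternative to projections.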

@alont

alont commented Dec 5, 2023

It would be beneficial to include a tenant column in these tables -- we have seen scenarios where there were collisions between time series associated with different tenants.

@UnamedRus
Contributor

https://github.com/metrico/qryn does something similar, but outside of ClickHouse.

Quite a few issues arise when you start to use ClickHouse as a backend for Prometheus-like data:

  1. Non-optimal reads when you try to get the last value by fingerprint (see the query sketch below).
  2. PromQL allows manipulations on time series that make it much harder to deal with the index table in an efficient way.
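To make point 1 concrete, the usual workaround on a plain (timestamp, value, fingerprint) layout (hypothetical table names taken from the proposal above) is an argMax aggregation, which still reads every matching sample instead of just the newest one:

-- Hypothetical last-value lookup: scans all samples of the selected series.
SELECT
    fingerprint,
    argMax(value, timestamp) AS last_value,
    max(timestamp)           AS last_timestamp
FROM prometheus_samples
WHERE fingerprint IN
(
    SELECT fingerprint
    FROM prometheus_series_index
    WHERE tag_key = 'job' AND tag_value = 'node'   -- hypothetical tag filter
)
GROUP BY fingerprint;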

@MacNale

MacNale commented Dec 7, 2023

Interesting feature. This helps remove one more moving part from the observability backend stack by storing metrics alongside traces and logs in ClickHouse.

  • Agree with @UnamedRus.
  • Handling cardinality is hard. There is no efficient solution right now. Storing the resource attributes in a Map brings flexibility but degrades lookup performance.
  • Schema(-less) design is hard for storing metrics.

This is the schema from the OpenTelemetry exporter:

(
    `ResourceAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ResourceSchemaUrl` String CODEC(ZSTD(1)),
    `ScopeName` String CODEC(ZSTD(1)),
    `ScopeVersion` String CODEC(ZSTD(1)),
    `ScopeAttributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `ScopeDroppedAttrCount` UInt32 CODEC(ZSTD(1)),
    `ScopeSchemaUrl` String CODEC(ZSTD(1)),
    `MetricName` String CODEC(ZSTD(1)),
    `MetricDescription` String CODEC(ZSTD(1)),
    `MetricUnit` String CODEC(ZSTD(1)),
    `Attributes` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `StartTimeUnix` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `TimeUnix` DateTime64(9) CODEC(Delta(8), ZSTD(1)),
    `Value` Float64 CODEC(ZSTD(1)),
    `Flags` UInt32 CODEC(ZSTD(1)),
    `Exemplars.FilteredAttributes` Array(Map(LowCardinality(String), String)) CODEC(ZSTD(1)),
    `Exemplars.TimeUnix` Array(DateTime64(9)) CODEC(ZSTD(1)),
    `Exemplars.Value` Array(Float64) CODEC(ZSTD(1)),
    `Exemplars.SpanId` Array(String) CODEC(ZSTD(1)),
    `Exemplars.TraceId` Array(String) CODEC(ZSTD(1)),
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_key mapKeys(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_value mapValues(Attributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
PARTITION BY toDate(TimeUnix)
ORDER BY (MetricName, Attributes, toUnixTimestamp64Nano(TimeUnix))
TTL toDateTime(TimeUnix) + toIntervalDay(3)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1

@oplehto
Copy link

oplehto commented Dec 7, 2023

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure:

  • Attributes in their own columns that make up the ordering key
  • Related metrics (that share a common set of attribute keys) in a single table (for example CPU system, user, iowait)
  • Metric values could be either in a map or in actual columns, depending on whether they are sparse or dense (as determined by the user)

How to map the Prometheus metrics to the table structures? This could be achieved by decorating the table with hints for the write driver (for example, with special keywords in the comment field); a hypothetical example is sketched below.
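For example, a hypothetical decorated table could look like the following; the comment keyword and its interpretation are entirely up to the write driver, nothing here is an existing convention:

-- Hypothetical "conventional" table for a set of related CPU metrics.
-- The COMMENT carries a hint that an external write driver could use
-- to route matching Prometheus series into this table.
CREATE TABLE cpu_metrics
(
    timestamp DateTime CODEC(Delta, ZSTD),
    host      LowCardinality(String),
    cpu       UInt16,
    system    Float64,
    user      Float64,
    iowait    Float64
)
ENGINE = MergeTree
ORDER BY (host, cpu, timestamp)
COMMENT 'promql-map: node_cpu_seconds_total{mode=system|user|iowait} keyed by (instance, cpu)';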

This approach would of course not be the "full" Prometheus experience, because you would need to predefine the schemas, but from a performance standpoint it would be superior. As an added bonus, the query ergonomics would also be better for non-PromQL uses.

With some tooling the schema management could also be somewhat painless.

This approach would of course not work for all cases, but where the Prometheus data is structured and well known, it would be sufficient and would leverage the advantages of ClickHouse.

@qoega
Member

qoega commented Dec 8, 2023

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure

For this you do not need PromQL support at all. If you know the most common attributes, you can create materialized columns out of them. There is a feature, ASSUME CONSTRAINT, that allows using a precomputed column instead of the expression you used in MATERIALIZED, i.e. sensors['my_key'] -> the my_key column. As you know your columns in advance, you can add the assume constraint as well. To make it automatic, there is the feature request #56588.
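A rough sketch of that combination, assuming a hypothetical table with a Map column called sensors (the table name and the hot key are illustrative):

-- Hypothetical: expose a hot map key as a materialized column and let the
-- optimizer substitute it via an ASSUME constraint.
CREATE TABLE metrics_raw
(
    timestamp DateTime,
    sensors   Map(LowCardinality(String), String),
    value     Float64,
    my_key    String MATERIALIZED sensors['my_key'],
    CONSTRAINT c_my_key ASSUME sensors['my_key'] = my_key
)
ENGINE = MergeTree
ORDER BY timestamp;

-- With constraint-based optimizations enabled, a filter on sensors['my_key']
-- can be rewritten to use the cheaper my_key column.
SET optimize_using_constraints = 1, optimize_substitute_columns = 1;
SELECT count() FROM metrics_raw WHERE sensors['my_key'] = 'node-1';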

Parsing the Prometheus data format is just:

SELECT
    name,
    extractKeyValuePairs(trim(BOTH '{}' FROM labels), '=') AS label_map,
    value,
    fromUnixTimestamp64Milli(toUInt64OrNull(ts)) AS t
FROM file('input.txt', 'Regexp', 'name String, labels String, value Float64, ts String')
SETTINGS format_regexp='^(?<metric>[^\s{]+)(?<labels>{[^}]+})? +(?<value>[-+]?[0-9]*\.?[0-9]+[eE]?[-+]?[0-9]*) *(?<ts>[0-9]+)?$', format_regexp_escaping_rule='Raw', format_regexp_skip_unmatched=0;

@fzyzcjy

fzyzcjy commented Apr 2, 2024

Hi, are there any updates? Thanks!

@nikitamikhaylov
Member Author

@fzyzcjy Stay tuned 😌

@fzyzcjy

fzyzcjy commented Apr 4, 2024

@nikitamikhaylov Wow, it will be implemented? Looking forward to it! (I am personally interested in using ClickHouse as an alternative to Prometheus for handling system metrics.)

@MacNale

MacNale commented Apr 10, 2024

@oplehto

I'd be happy with a limited solution that maps the Prometheus data into a more conventional table structure:

  • Attributes in their own columns that make up the ordering key
  • Related metrics (that share a common set of attribute keys) in a single table (for example CPU system, user, iowait)
  • Metric values could be either in a map or in actual columns, depending on whether they are sparse or dense (as determined by the user)

This approach would of course not be the "full" Prometheus experience, because you would need to predefine the schemas, but from a performance standpoint it would be superior. As an added bonus, the query ergonomics would also be better for non-PromQL uses.

A table per resource will indeed perform better. But as you noted, schema creation, insertion, and query routing to a bunch of tables will become a nightmare over time. In addition to this, managing the cardinality is also a challenge.

@oplehto

oplehto commented Apr 10, 2024

A table per resource will indeed perform better. But as you noted, schema creation, insertion, and query routing to a bunch of tables will become a nightmare over time. In addition to this, managing the cardinality is also a challenge.

This definitely depends on the use case. In my situation a steady state has been reached and additional metrics are not that common. In this case optimizing for performance has higher value than providing flexibility: schema creation is a one-off cost and a pretty standard template can be used. The insertion/query routing could be done either with materialized views and async inserts or with a fairly simple external driver.

I'm not really sure how cardinality would be a challenge. I have ClickHouse tables where the cardinality is extremely high without significant performance impact. I would argue that with a schema, cardinality is easier to deal with than in a table where the data ordering is decoupled from the actual hierarchy (e.g. metadata and data in separate tables).

@MacNale

MacNale commented Apr 16, 2024

@oplehto It makes sense in a steady-state situation like yours.

In my case, both the metrics and labels are open-ended. My comment about cardinality was related to the schema I posted earlier.

A single table with Attributes (aka labels) as a Map type does not work well for lookups.

@syepes

syepes commented Apr 20, 2024

+1

@zlseu-edu

+1
