Prometheus endpoint #7900

vdimir · 2019-11-24T17:08:51Z

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

New Feature

Changelog entry:

Add Prometheus support.

Detailed description (optional):

ClickHouse serves metrics in Prometheus format on specified endpoint.
Close #7369

…rFactory

vdimir · 2019-11-24T17:12:01Z

Initial implementation very basic, and I want to discuss some question.

Prometheus support Histogram and Summary metric types.
Should we use one of them (e.g. for CurrentMetrics)?
It may require collecting and analyzing metrics in background (now it scrapped on demand when Prometheus client request for it) for calculating buckets or quantiles.

I make changes in HttpHandlerFactory to support different handles (Prometheus one of them).
But implementation may seems over-complicated, we may want to make some changes in it, or revert some changes and use different approach.

Should AsyncMetrics be added to Prometheus ad in Graphite exporter?
Or some data from system.parts, e.g. disk usage for each table or number of rows ?

New PrometheusMetricsWriter class contains some common parts with MetricsTransiter.
Do we need to merge entities that aggregate metrics into one or several with common parts?

Does test required for new functionality? I think that implementing integration test possible, but we need to decide what exactly we want to check.

vdimir · 2019-11-24T21:04:40Z

Also I add metric names in snake case following prometheus guidelines. I wrote names directly in source code, because some metrics should contain units suffix (or _total) and some names needs to be changed to plural form.

zhang2014 · 2019-11-25T02:05:41Z

Can it be implemented through a configuration of this #7572 ?

abyss7 · 2019-11-25T18:08:28Z

Should AsyncMetrics be added to Prometheus ad in Graphite exporter?
Or some data from system.parts, e.g. disk usage for each table or number of rows ?

For now we suggest that only metrics and events contain all that we need.

alexey-milovidov · 2019-11-26T07:34:14Z

Initial implementation very basic, and I want to discuss some question.

The implementation looks very sound.

Prometheus support Histogram and Summary metric types.
Should we use one of them (e.g. for CurrentMetrics)?
It may require collecting and analyzing metrics in background (now it scrapped on demand when Prometheus client request for it) for calculating buckets or quantiles.

Better to finish this PR and postpone implementation of derived types of metrics.

Should AsyncMetrics be added to Prometheus ad in Graphite exporter?

AsyncMetrics should be added exactly in the same way as for Graphite - for consistency.

Does test required for new functionality? I think that implementing integration test possible, but we need to decide what exactly we want to check.

Yes, we need a test. A test will check that the server responds for prometheus handle and the response has some expected value of some metric (not necessarily to check for exact value, just check the presence of some metric and that value is greater than something).

alexey-milovidov · 2019-11-26T07:37:09Z

@zhang2014

Can it be implemented through a configuration of this #7572 ?

Yes, this feature along with Template format looks powerful enough to implement something like Prometheus handler without any specialized code. But I'm not sure. Some custom logic like converting names to snake case or calculating deltas from previous values still require additional logic. Let's try to do it as an example. PS. I think, that custom code is Ok

alexey-milovidov · 2019-11-26T07:39:09Z

@abyss7

For now we suggest that only metrics and events contain all that we need.

AsynchronousMetrics contain valuable information for monitoring. If we don't add them, then Prometheous support will be inferior than Graphite support.

alex-zaitsev · 2019-11-26T12:40:24Z

@vdimir , your implementation hardcodes metric names and description. It would require to remember adding new metrics in monitoring when something is added in ClickHouse. Outside of ClickHouse implementations (https://github.com/f1yegor/clickhouse_exporter/ or inside https://github.com/Altinity/clickhouse-operator) try to create metrics on the fly from system tables. Could you elaborate why you decided to hardcode, please?

filimonov · 2019-11-26T15:18:50Z

Outside of ClickHouse implementations (https://github.com/f1yegor/clickhouse_exporter/ or inside https://github.com/Altinity/clickhouse-operator) try to create metrics on the fly from system tables.

Yes. One line instead introducing extra param seems much simpler:

Also, both exporters provide results of some extra queries as metrics (so the people migrating from those exporters will have less metrics). Maybe we can consider adding them (or similar) too:

alexey-milovidov · 2019-11-26T15:42:33Z

@filimonov

Also, both exporters provide results of some extra queries as metrics (so the people migrating from those exporters will have less metrics). Maybe we can consider adding them (or similar)

This should be implemented in a separate PR. The goal of this PR is to add Prometheus handler to provide the same set of metrics that we already support for Graphite.

vdimir · 2019-11-26T16:32:17Z

@vdimir , your implementation hardcodes metric names and description. It would require to remember adding new metrics in monitoring when something is added in ClickHouse. Outside of ClickHouse implementations (https://github.com/f1yegor/clickhouse_exporter/ or inside https://github.com/Altinity/clickhouse-operator) try to create metrics on the fly from system tables. Could you elaborate why you decided to hardcode, please?

I add snake case names because some of them needs to be changed in plural form, or some suffix should be added. (my comment above).

I considered automatic conversion to snake case too, but I haven't decide what is better.
Ok, I'll change it.

zhang2014 · 2019-11-27T02:55:50Z

Yes, this feature along with Template format looks powerful enough to implement something like Prometheus handler without any specialized code. But I'm not sure. Some custom logic like converting names to snake case or calculating deltas from previous values still require additional logic. Let's try to do it as an example. PS. I think, that custom code is Ok

Custom code is fine. I'm just offering another way:

names to snake case: It maybe a function?
calculating deltas from previous values still require additional logic: Maybe metrics, events and AsynchronousMetrics can add some columns to provide the last two measurements? like uptime, this is useful for some other monitoring systems.

It looks like a good contribution anyway.

alexey-milovidov · 2019-11-27T07:01:45Z

calculating deltas from previous values still require additional logic: Maybe metrics, events and AsynchronousMetrics can add some columns to provide the last two measurements? like uptime, this is useful for some other monitoring systems.

Three points:

We already have system.metric_log table.
The user decide for which period to accumulate the values (one second, minute, etc) - probably we should not hardcode a few such periods in tables.
There are many derived metrics possible. The most popular are: exponential smoothed averages; quantiles of a set of measurements...

zhang2014 · 2019-11-27T08:58:34Z

The following configuration currently work in #7572

config.xml:

<predefine_query_handler>
    <url>/metrics</url>
    <method>GET</method>
    <queries>
        <query>SELECT * FROM system.metrics LIMIT 5 FORMAT Template SETTINGS format_template_resultset = 'prometheus_template_output_format_resultset', format_template_row = 'prometheus_template_output_format_row', format_template_rows_between_delimiter = '\n'</query>
    </queries>
</predefine_query_handler>

prometheus_template_output_format_resultset:

${data:None}

prometheus_template_output_format_row:

# HELP ${metric:CSV} ${description:CSV}
# TYPE ${metric:CSV} counter
${metric:CSV} ${value:CSV}

Result:

curl -vvv 'http://localhost:8123/metrics'
*   Trying ::1...
* Connected to localhost (::1) port 8123 (#0)
> GET /metrics HTTP/1.1
> Host: localhost:8123
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Wed, 27 Nov 2019 08:54:25 GMT
< Connection: Keep-Alive
< Content-Type: text/plain; charset=UTF-8
< X-ClickHouse-Server-Display-Name: i-tl62qd0o
< Transfer-Encoding: chunked
< X-ClickHouse-Query-Id: f39235f6-6ed7-488c-ae07-c7ceafb960f6
< Keep-Alive: timeout=3
< X-ClickHouse-Summary: {"read_rows":"0","read_bytes":"0","written_rows":"0","written_bytes":"0","total_rows_to_read":"0"}
< 
# HELP "Query" "Number of executing queries"
# TYPE "Query" counter
"Query" 1

# HELP "Merge" "Number of executing background merges"
# TYPE "Merge" counter
"Merge" 0

# HELP "PartMutation" "Number of mutations (ALTER DELETE/UPDATE)"
# TYPE "PartMutation" counter
"PartMutation" 0

# HELP "ReplicatedFetch" "Number of data parts being fetched from replica"
# TYPE "ReplicatedFetch" counter
"ReplicatedFetch" 0

# HELP "ReplicatedSend" "Number of data parts being sent to replicas"
# TYPE "ReplicatedSend" counter
"ReplicatedSend" 0

* Connection #0 to host localhost left intact

zhang2014 · 2019-11-27T09:02:06Z

We already have system.metric_log table.
The user decide for which period to accumulate the values (one second, minute, etc) - probably we should not hardcode a few such periods in tables.
There are many derived metrics possible. The most popular are: exponential smoothed averages; quantiles of a set of measurements...

Can we add AsynchronousMetrics to the system.metric_log ?

alexey-milovidov · 2019-12-03T01:10:44Z

Can we add AsynchronousMetrics to the system.metric_log ?

We have not added it, because they are updated in longer intervals (60 seconds by default) in contrast to other values, that are always up to date and can be written as frequent as needed.

alexey-milovidov · 2019-12-03T01:21:09Z

Can we avoid duplicating metric names in snake_case?
snake_case is just a guideline from Prometheus, not a strict requirement.

This reverts commit 2ddb801.

alexey-milovidov · 2019-12-09T23:25:13Z

Is it acceptable to keep metric names in CamelCase without any changes? To keep the code more maintenable.

vdimir · 2019-12-10T08:46:22Z

Is it acceptable to keep metric names in CamelCase without any changes? To keep the code more maintenable.

I have not enough experience with Prometheus to decide which solution more acceptable. But now I reverted names to CamelCase and ready to implement other approach for naming if we'll find it.

Other uncompleted part of this task is adding integration test. I have some issue with running it on MacOS, because I couldn't build ClickHouse natively and use docker with Ubuntu. But integration tests require docker itself which is not available because I already in container. I am interested to figure out machinery with integration tests and finish this task, so I hope that I find some solution for it.

azat · 2019-12-15T20:28:45Z

Is it acceptable to keep metric names in CamelCase without any changes?

Another reason for keeping the metric names - it will allow use various dashboards without any changes

alexey-milovidov · 2019-12-15T22:45:11Z

About the choice of default port in configuration example, 8001.
Is it reasonable?

We should default to some most widely used port for that type of handles or simply reuse http_port.

alexey-milovidov · 2019-12-15T22:48:14Z

2019.12.16 01:47:49.789705 [ 1 ] {} <Information> Application: Listening http://[::1]:8001
not informative enough.

alexey-milovidov · 2019-12-15T22:49:26Z

But works perfectly.

vdimir · 2019-12-16T07:53:24Z

About the choice of default port in configuration example, 8001.
Is it reasonable?

We should default to some most widely used port for that type of handles or simply reuse http_port.

Common Prometheus port is 9090 (link), I had to use it in prometheus.port.
If prometheus section in config doesn't contain port then http_port would be used.

2019.12.16 01:47:49.789705 [ 1 ] {} <Information> Application: Listening http://[::1]:8001
not informative enough.

Ok, I'll make response more verbose and send other PR.
Also I'll change prometheus default settings port to 9090.

UPD: I didn't notice that you already change response text.

azat · 2019-12-16T11:19:03Z

Common Prometheus port is 9090

Hm, but this is prometheus server, not the exporter (while this is an exporter)

P.S. this is just a note, no objections against 9090 port

vdimir · 2019-12-16T12:07:24Z

Hm, but this is prometheus server, not the exporter (while this is an exporter)

Thanks for note, I was wrong. Exporters uses ports starting at 9100.
https://prometheus.io/docs/instrumenting/writing_exporters/#port-numbers

Feel free to grab the next free port number when developing your exporter, preferably before publicly announcing it. If you’re not ready to release yet, putting your username and WIP is fine.

alexey-milovidov · 2019-12-16T12:11:36Z

Ok. Waiting for a PR!

response more verbose

I already made it.

azat · 2019-12-16T12:33:13Z

Also there are problems with jemalloc statistics, there is . in it, i.e.:

ClickHouseAsyncMetricsjemalloc.background_thread.run_interval 0
ClickHouseAsyncMetricsjemalloc.background_thread.num_runs 0
ClickHouseAsyncMetricsjemalloc.background_thread.num_threads 0
ClickHouseAsyncMetricsjemalloc.metadata 66363096
ClickHouseAsyncMetricsjemalloc.active 3122552832
ClickHouseAsyncMetricsjemalloc.retained 2311983104
ClickHouseAsyncMetricsjemalloc.metadata_thp 0
ClickHouseAsyncMetricsjemalloc.resident 7604621312
ClickHouseAsyncMetricsjemalloc.mapped 8007053312
ClickHouseAsyncMetricsjemalloc.allocated 3066734960

alexey-milovidov · 2019-12-16T19:21:20Z

Two suggestions:

Let's add underscores after metric group: ClickHouseAsyncMetrics_
Let's replace dots with underscores: jemalloc_background_thread_run_interval

ut0mt8 · 2020-05-15T10:26:57Z

Hello, this looks cool. Just few comments/questions : so we have a basic /metrics endpoints which answer so hardcoded queries and we can add some predefined_queries in http_handler conf.
Great but it would be better if we can add predefined_queries at the end of the same path /metrics endpoints (unless I miss something).
Also it seems it doesn't support prometheus labels , which is a must in the prom world.
Good work anyway.

vdimir · 2020-05-15T20:18:31Z

Hello, this looks cool. Just few comments/questions : so we have a basic /metrics endpoints which answer so hardcoded queries and we can add some predefined_queries in http_handler conf.

Great but it would be better if we can add predefined_queries at the end of the same path /metrics endpoints (unless I miss something).

Do you mean same as #10043 ?

Also it seems it doesn't support prometheus labels , which is a must in the prom world.

Could you explain, what types of labels should be added and how it should be set?

ut0mt8 · 2020-05-15T21:28:02Z

In prometheus label can be anything.
In our queries we wants labels for fields in the group by clause.

For example :

select sum(somefield) as super_stat , pid from stats_table group by pid;

would result in :

super_stat{pid="somepid"} XX

So yep add more custom queries possible for the same htttp route / path (/metrics) and add the possibilities to have more custom output.
No idea of the grammar so far.

zhang2014 · 2020-05-19T02:02:56Z

So yep add more custom queries possible for the same htttp route / path (/metrics) and add the possibilities to have more custom output.

@ut0mt8 Maybe HTTP handler config + template format will help you ?

reference:
#7900 (comment)

https://clickhouse.tech/docs/en/interfaces/http/#predefined_query_handler

https://clickhouse.tech/docs/en/interfaces/formats/#format-template

vdimir added 3 commits November 24, 2019 15:06

Create http handler responding metrics in prometheus format

947fc8d

Make HTTPRequestHandlerFactory composable, add PrometeusRequestHandle…

b19e33e

…rFactory

Prometheus run on custom port

ffcd902

vdimir added 2 commits November 24, 2019 23:03

Style fixes

0e65716

Metric names in prometheus-friendly snake_case format

2ddb801

abyss7 added the pr-feature Pull request with new product feature label Nov 25, 2019

abyss7 self-assigned this Nov 25, 2019

alexey-milovidov added the can be tested label Nov 26, 2019

vdimir added 2 commits November 26, 2019 22:03

Style fixes in HTTPHandlerFactory

eadc58b

Add async metrics to Prometheus output

3c42200

alexey-milovidov added 4 commits December 3, 2019 04:14

Update Server.cpp

937386a

Update CurrentMetrics.cpp

4183266

Update ProfileEvents.cpp

30029e5

Update ProfileEvents.cpp

839cfd1

alexey-milovidov added the st-need-info We need extra data to continue (waiting for response) label Dec 3, 2019

Revert "Metric names in prometheus-friendly snake_case format"

29312f5

This reverts commit 2ddb801.

Prometheus metric prefixes in CamelCase

b6aadf4

Add test_prometheus_endpoint

d413c4a

alexey-milovidov approved these changes Dec 15, 2019

View reviewed changes

alexey-milovidov merged commit d498e14 into ClickHouse:master Dec 15, 2019

alexey-milovidov added a commit that referenced this pull request Dec 15, 2019

Minor changes after merge #7900

d5d2959

vdimir mentioned this pull request Dec 21, 2019

Replace ClickHouse Intergration link to native exporter prometheus/docs#1513

Closed

Slach mentioned this pull request Apr 2, 2020

add <prometheus> in server settings and monitoring section #10015

Merged

Tri0L mentioned this pull request Apr 3, 2020

Export sql query results to prometheus #10043

Closed

vdimir deleted the prometheus-metrics-7369 branch May 16, 2020 07:20

jameshartig mentioned this pull request Jul 21, 2020

Additional metrics added to Prometheus endpoint #12647

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prometheus endpoint #7900

Prometheus endpoint #7900

vdimir commented Nov 24, 2019

vdimir commented Nov 24, 2019 •

edited

vdimir commented Nov 24, 2019

zhang2014 commented Nov 25, 2019

abyss7 commented Nov 25, 2019

alexey-milovidov commented Nov 26, 2019

alexey-milovidov commented Nov 26, 2019 •

edited

alexey-milovidov commented Nov 26, 2019 •

edited

alex-zaitsev commented Nov 26, 2019

filimonov commented Nov 26, 2019

alexey-milovidov commented Nov 26, 2019 •

edited

vdimir commented Nov 26, 2019

zhang2014 commented Nov 27, 2019 •

edited

alexey-milovidov commented Nov 27, 2019

zhang2014 commented Nov 27, 2019 •

edited

zhang2014 commented Nov 27, 2019

alexey-milovidov commented Dec 3, 2019

alexey-milovidov commented Dec 3, 2019

alexey-milovidov commented Dec 9, 2019

vdimir commented Dec 10, 2019

azat commented Dec 15, 2019

alexey-milovidov commented Dec 15, 2019 •

edited

alexey-milovidov commented Dec 15, 2019

alexey-milovidov commented Dec 15, 2019

vdimir commented Dec 16, 2019 •

edited

azat commented Dec 16, 2019

vdimir commented Dec 16, 2019 •

edited

alexey-milovidov commented Dec 16, 2019

azat commented Dec 16, 2019 •

edited

alexey-milovidov commented Dec 16, 2019

ut0mt8 commented May 15, 2020

vdimir commented May 15, 2020

ut0mt8 commented May 15, 2020

zhang2014 commented May 19, 2020

Prometheus endpoint #7900

Prometheus endpoint #7900

Conversation

vdimir commented Nov 24, 2019

vdimir commented Nov 24, 2019 • edited

vdimir commented Nov 24, 2019

zhang2014 commented Nov 25, 2019

abyss7 commented Nov 25, 2019

alexey-milovidov commented Nov 26, 2019

alexey-milovidov commented Nov 26, 2019 • edited

alexey-milovidov commented Nov 26, 2019 • edited

alex-zaitsev commented Nov 26, 2019

filimonov commented Nov 26, 2019

alexey-milovidov commented Nov 26, 2019 • edited

vdimir commented Nov 26, 2019

zhang2014 commented Nov 27, 2019 • edited

alexey-milovidov commented Nov 27, 2019

zhang2014 commented Nov 27, 2019 • edited

zhang2014 commented Nov 27, 2019

alexey-milovidov commented Dec 3, 2019

alexey-milovidov commented Dec 3, 2019

alexey-milovidov commented Dec 9, 2019

vdimir commented Dec 10, 2019

azat commented Dec 15, 2019

alexey-milovidov commented Dec 15, 2019 • edited

alexey-milovidov commented Dec 15, 2019

alexey-milovidov commented Dec 15, 2019

vdimir commented Dec 16, 2019 • edited

azat commented Dec 16, 2019

vdimir commented Dec 16, 2019 • edited

alexey-milovidov commented Dec 16, 2019

azat commented Dec 16, 2019 • edited

alexey-milovidov commented Dec 16, 2019

ut0mt8 commented May 15, 2020

vdimir commented May 15, 2020

ut0mt8 commented May 15, 2020

zhang2014 commented May 19, 2020

vdimir commented Nov 24, 2019 •

edited

alexey-milovidov commented Nov 26, 2019 •

edited

alexey-milovidov commented Nov 26, 2019 •

edited

alexey-milovidov commented Nov 26, 2019 •

edited

zhang2014 commented Nov 27, 2019 •

edited

zhang2014 commented Nov 27, 2019 •

edited

alexey-milovidov commented Dec 15, 2019 •

edited

vdimir commented Dec 16, 2019 •

edited

vdimir commented Dec 16, 2019 •

edited

azat commented Dec 16, 2019 •

edited