feat(promql): Allow Prometheus handlers to target TimeSeries tables via HTTP headers.#104902
feat(promql): Allow Prometheus handlers to target TimeSeries tables via HTTP headers.#104902JTCunning wants to merge 9 commits into
Conversation
When the handler XML omits a table, remote_write, remote_read, and query_api resolve the target from X-ClickHouse-Database and X-ClickHouse-Table with fallback to default.prometheus. Route prometheus http_handlers rules by URL. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Workflow [PR], commit [6d4c8eb] Summary: ❌
AI ReviewSummaryThis PR adds header-based TimeSeries target selection ( Findings❌ Blockers
Tests
Final VerdictStatus:
|
Exercise deferred table resolution via X-ClickHouse-Database and X-ClickHouse-Table, XML database defaults, pinned XML table precedence, and default.prometheus fallback on the main HTTP port. Co-authored-by: Cursor <cursoragent@cursor.com>
Gate per-request X-ClickHouse-Database and X-ClickHouse-Table overrides behind allow_table_target_in_headers, keep legacy default.prometheus pinning when the table element is omitted, and infer protocol handlers only from canonical Prometheus API paths. Rename the integration test directory and expand coverage for pinned versus header-driven targets. Co-authored-by: Cursor <cursoragent@cursor.com>
Send snappy-compressed protobuf requests with the expected protocol headers and verify table selection across default and otherdb targets instead of empty POST bodies that can pass without exercising resolveTimeSeriesStorageID. Co-authored-by: Cursor <cursoragent@cursor.com>
Use the dedicated Prometheus listener on port 9093 with explicit remote_write, remote_read, and query_api handlers, matching test_prometheus_protocols paths and cluster wiring instead of main HTTP port http_handlers rules. Co-authored-by: Cursor <cursoragent@cursor.com>
Parse db.table in X-ClickHouse-Table when the database header is omitted, and document that legacy pinning uses configured database.prometheus. Co-authored-by: Cursor <cursoragent@cursor.com>
Exercise remote write, remote read, and query API on the main HTTP port through http_handlers rules with type prometheus, and keep metrics exposition on a non-canonical path. Co-authored-by: Cursor <cursoragent@cursor.com>
Use a non-write URL for the expose_metrics http_handlers rule so it is not mistaken for a remote_write endpoint. Co-authored-by: Cursor <cursoragent@cursor.com>
| { | ||
| if (handler_type == "expose_metrics" || handler_type == "prometheus") | ||
| { | ||
| const String url = config.getString(config_prefix + ".url", ""); |
There was a problem hiding this comment.
http_handlers url supports regex: expressions, but this dispatch compares the raw config string to literal canonical paths. For example, with <url>regex:/api/v1/write</url> (or a regex that covers query endpoints), urlFilter still matches requests, but this branch falls through to parseExposeMetricsConfig, so protocol requests are handled as expose_metrics instead of remote_write/remote_read/query_api.
Could we normalize url before dispatch (at least strip optional regex: and handle exact canonical regexes), or explicitly reject regex URLs for handler.type=prometheus so this does not silently misroute traffic?
There was a problem hiding this comment.
No, that is too much scope creep.
There was a problem hiding this comment.
This is still a real misrouting bug in the current code, not just feature scope.
addFiltersFromConfig applies urlFilter (which supports regex:), so a rule like <url>regex:/api/v1/write</url> will match incoming /api/v1/write requests. But protocol selection in createPrometheusHandlerFactoryForHTTPRule compares the raw config string literally (url == "/api/v1/write"), so the same rule is parsed as expose_metrics and does not execute the remote_write path.
So the same config both matches protocol traffic and routes it to the wrong handler. That is silent behavioral divergence.
If broad regex support is out of scope, please at least fail closed here: when handler.type=prometheus and url starts with regex:, reject the rule with a config exception instead of silently falling back to expose_metrics.
When handler XML omits a table, keep the legacy default target for headerless requests and honor X-ClickHouse-Database and X-ClickHouse-Table when either is sent. Drop allow_table_target_in_headers from configuration, docs, and integration tests. Co-authored-by: Cursor <cursoragent@cursor.com>
LLVM Coverage Report
Changed lines: 94.19% (81/86) · Uncovered code |
…formed_json
Both tests in test_storage_kafka/test_batch_fast.py wait for the
Kafka 'Committed offset 128' log line and then assert the destination
row counts immediately:
instance.wait_for_log_line(f'{kafka_table}.*Committed offset 128')
assert TSV(instance.query(f'SELECT count() FROM test.{kafka_table}_data'))
== TSV('...')
assert TSV(instance.query(f'SELECT count() FROM test.{kafka_table}_errors'))
== TSV('64')
The offset commit and the materialized-view INSERT into the destination
MergeTree are not strictly ordered: the consumer side can log
'Committed offset 128' before the last MV block has finished flushing,
so the immediate SELECT undercounts (typically by exactly the last
block, i.e. assert 54 == 64).
Across both parametrized variants of each test the family has hit 23+
failures across 5 distinct unrelated PRs in the last 30 days (e.g.
ClickHouse#96886, ClickHouse#104902, ClickHouse#104905), with 0 master failures. Surfaced again by
@alexey-milovidov on
ClickHouse#104905 (comment).
The helper file in the same suite already acknowledges this anti-pattern
(tests/integration/helpers/kafka/common.py:463):
"Since everything is async and shaky when receiving messages from
Kafka, we may want to try and check results multiple times in a loop."
Most other tests in test_batch_fast.py already use query_with_retry
with a check_callback for exactly this reason (see e.g. line 1053,
1516). These two tests were the outliers.
Replace the bare assert with query_with_retry + check_callback. The
asserted values are unchanged, so no test intent is weakened: if the
count never reaches the expected value, query_with_retry returns the
last observed result and the surrounding assert ... == EXPECTED fails
loudly.
When the handler XML omits a table, remote_write, remote_read, and query_api resolve the target from X-ClickHouse-Database and X-ClickHouse-Table with fallback to default.prometheus. Route prometheus http_handlers rules by URL.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Adds the ability to use
X-ClickHouse-Database/Tableto target tables in the Prometheus server.