Skip to content

feat(promql): Allow Prometheus handlers to target TimeSeries tables via HTTP headers.#104902

Draft
JTCunning wants to merge 9 commits into
ClickHouse:masterfrom
JTCunning:feature/prometheus-timeseries-http-headers
Draft

feat(promql): Allow Prometheus handlers to target TimeSeries tables via HTTP headers.#104902
JTCunning wants to merge 9 commits into
ClickHouse:masterfrom
JTCunning:feature/prometheus-timeseries-http-headers

Conversation

@JTCunning
Copy link
Copy Markdown
Contributor

When the handler XML omits a table, remote_write, remote_read, and query_api resolve the target from X-ClickHouse-Database and X-ClickHouse-Table with fallback to default.prometheus. Route prometheus http_handlers rules by URL.

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Adds the ability to use X-ClickHouse-Database/Table to target tables in the Prometheus server.

When the handler XML omits a table, remote_write, remote_read, and query_api
resolve the target from X-ClickHouse-Database and X-ClickHouse-Table with
fallback to default.prometheus. Route prometheus http_handlers rules by URL.

Co-authored-by: Cursor <cursoragent@cursor.com>
@JTCunning JTCunning changed the title Allow Prometheus handlers to target TimeSeries tables via HTTP headers. feat(promql): Allow Prometheus handlers to target TimeSeries tables via HTTP headers. May 14, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 14, 2026

Workflow [PR], commit [6d4c8eb]

Summary:

job_name test_name status info comment
Stateless tests (amd_msan, WasmEdge, parallel, 1/2) FAIL
02346_text_index_queries FAIL cidb, issue
00945_bloom_filter_index FAIL cidb, issue
Integration tests (amd_asan_ubsan, db disk, old analyzer, 2/6) FAIL
test_storage_kafka/test_batch_fast.py::test_kafka_engine_put_errors_to_stream[generate_old_create_table_query] FAIL cidb
test_storage_kafka/test_batch_fast.py::test_kafka_engine_put_errors_to_stream_with_random_malformed_json[generate_old_create_table_query] FAIL cidb
Stress test (amd_asan_ubsan) FAIL
Hung check failed, possible deadlock found FAIL cidb
Stress test (arm_debug) FAIL
Hung check failed, possible deadlock found FAIL cidb

AI Review

Summary

This PR adds header-based TimeSeries target selection (X-ClickHouse-Database / X-ClickHouse-Table) for remote_write, remote_read, and query_api when <table> is omitted, and adds URL-based protocol dispatch for http_handlers prometheus rules. The main implementation is solid, but one routing invariant is still broken for supported http_handlers URL expressions.

Findings

❌ Blockers

  • [src/Server/PrometheusRequestHandlerFactory.cpp:275] http_handlers URL matching supports regex: expressions, but protocol dispatch here checks only literal canonical strings ("/api/v1/write", "/api/v1/read", etc.). A rule like <url>regex:/api/v1/write</url> can match requests via urlFilter, yet this code falls back to parseExposeMetricsConfig, silently routing protocol traffic to the wrong handler.
    Suggested fix: either normalize/interpret regex: forms for canonical endpoints, or fail closed by rejecting handler.type=prometheus rules with url starting with regex: instead of silently using expose_metrics.
Tests

⚠️ Add an integration test for http_handlers + handler.type=prometheus + url=regex:/api/v1/write (and optionally read/query variants) that proves config is either routed correctly or rejected with a clear config exception.

Final Verdict

Status: ⚠️ Request changes
Minimum required actions:

  1. Fix prometheus dispatch for regex: url expressions (or explicitly reject them with a config error).
  2. Add focused coverage for that behavior.

@clickhouse-gh clickhouse-gh Bot added the pr-experimental Experimental Feature label May 14, 2026
Comment thread src/Server/PrometheusRequestHandlerFactory.cpp
Comment thread src/Server/PrometheusRequestHandlerFactory.cpp Outdated
Comment thread src/Server/PrometheusRequestHandler.cpp
Exercise deferred table resolution via X-ClickHouse-Database and
X-ClickHouse-Table, XML database defaults, pinned XML table precedence,
and default.prometheus fallback on the main HTTP port.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread tests/integration/test_prometheus_http_handlers_time_series_targeting/test.py Outdated
JTCunning and others added 2 commits May 14, 2026 05:17
Gate per-request X-ClickHouse-Database and X-ClickHouse-Table overrides behind
allow_table_target_in_headers, keep legacy default.prometheus pinning when the
table element is omitted, and infer protocol handlers only from canonical
Prometheus API paths. Rename the integration test directory and expand
coverage for pinned versus header-driven targets.

Co-authored-by: Cursor <cursoragent@cursor.com>
Send snappy-compressed protobuf requests with the expected protocol headers
and verify table selection across default and otherdb targets instead of
empty POST bodies that can pass without exercising resolveTimeSeriesStorageID.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use the dedicated Prometheus listener on port 9093 with explicit remote_write,
remote_read, and query_api handlers, matching test_prometheus_protocols paths
and cluster wiring instead of main HTTP port http_handlers rules.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/Server/PrometheusRequestHandler.cpp
Parse db.table in X-ClickHouse-Table when the database header is omitted,
and document that legacy pinning uses configured database.prometheus.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/Server/PrometheusRequestHandlerFactory.cpp
JTCunning and others added 2 commits May 14, 2026 06:52
Exercise remote write, remote read, and query API on the main HTTP port
through http_handlers rules with type prometheus, and keep metrics
exposition on a non-canonical path.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use a non-write URL for the expose_metrics http_handlers rule so it is
not mistaken for a remote_write endpoint.

Co-authored-by: Cursor <cursoragent@cursor.com>
{
if (handler_type == "expose_metrics" || handler_type == "prometheus")
{
const String url = config.getString(config_prefix + ".url", "");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http_handlers url supports regex: expressions, but this dispatch compares the raw config string to literal canonical paths. For example, with <url>regex:/api/v1/write</url> (or a regex that covers query endpoints), urlFilter still matches requests, but this branch falls through to parseExposeMetricsConfig, so protocol requests are handled as expose_metrics instead of remote_write/remote_read/query_api.

Could we normalize url before dispatch (at least strip optional regex: and handle exact canonical regexes), or explicitly reject regex URLs for handler.type=prometheus so this does not silently misroute traffic?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that is too much scope creep.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a real misrouting bug in the current code, not just feature scope.

addFiltersFromConfig applies urlFilter (which supports regex:), so a rule like <url>regex:/api/v1/write</url> will match incoming /api/v1/write requests. But protocol selection in createPrometheusHandlerFactoryForHTTPRule compares the raw config string literally (url == "/api/v1/write"), so the same rule is parsed as expose_metrics and does not execute the remote_write path.

So the same config both matches protocol traffic and routes it to the wrong handler. That is silent behavioral divergence.

If broad regex support is out of scope, please at least fail closed here: when handler.type=prometheus and url starts with regex:, reject the rule with a config exception instead of silently falling back to expose_metrics.

When handler XML omits a table, keep the legacy default target for
headerless requests and honor X-ClickHouse-Database and
X-ClickHouse-Table when either is sent. Drop allow_table_target_in_headers
from configuration, docs, and integration tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented May 14, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 92.00% 92.00% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 94.19% (81/86) · Uncovered code

Full report · Diff report

pull Bot pushed a commit to infdahai/ClickHouse that referenced this pull request May 17, 2026
…formed_json

Both tests in test_storage_kafka/test_batch_fast.py wait for the
Kafka 'Committed offset 128' log line and then assert the destination
row counts immediately:

    instance.wait_for_log_line(f'{kafka_table}.*Committed offset 128')
    assert TSV(instance.query(f'SELECT count() FROM test.{kafka_table}_data'))
        == TSV('...')
    assert TSV(instance.query(f'SELECT count() FROM test.{kafka_table}_errors'))
        == TSV('64')

The offset commit and the materialized-view INSERT into the destination
MergeTree are not strictly ordered: the consumer side can log
'Committed offset 128' before the last MV block has finished flushing,
so the immediate SELECT undercounts (typically by exactly the last
block, i.e. assert 54 == 64).

Across both parametrized variants of each test the family has hit 23+
failures across 5 distinct unrelated PRs in the last 30 days (e.g.
ClickHouse#96886, ClickHouse#104902, ClickHouse#104905), with 0 master failures. Surfaced again by
@alexey-milovidov on
ClickHouse#104905 (comment).

The helper file in the same suite already acknowledges this anti-pattern
(tests/integration/helpers/kafka/common.py:463):
"Since everything is async and shaky when receiving messages from
Kafka, we may want to try and check results multiple times in a loop."

Most other tests in test_batch_fast.py already use query_with_retry
with a check_callback for exactly this reason (see e.g. line 1053,
1516). These two tests were the outliers.

Replace the bare assert with query_with_retry + check_callback. The
asserted values are unchanged, so no test intent is weakened: if the
count never reaches the expected value, query_with_retry returns the
last observed result and the surrounding assert ... == EXPECTED fails
loudly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-experimental Experimental Feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant