
[DBMON-6141] Adding support for single connection to self-hosted cluster#22970

Open
sangeetashivaji wants to merge 9 commits into master from sangeeta.shivajirao/self-hosted-single-node-connection


sangeetashivaji commented Mar 18, 2026

What does this PR do?

Adds support for monitoring self-hosted multi-node ClickHouse clusters through a single agent connection. Introduces a new cluster_name config option that, when set, causes the agent to connect to one node but collect metrics and samples from all nodes in the cluster via clusterAllReplicas("cluster", "table").

Also adds per-node hostname attribution to query metrics, activity samples, and query completions - so each query is tagged with the specific ClickHouse node it ran on (server_node/hostname fields).
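A minimal sketch of the query pattern described above, not the PR's actual code. It assumes the check wraps the system table in clusterAllReplicas only when a cluster name is configured and uses ClickHouse's hostname() function for per-node attribution. The helper names (table_ref, build_activity_query) and the selected columns are hypothetical.

```python
import re
from typing import Optional

# Conservative identifier check so the cluster name can be inlined safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def table_ref(table: str, cluster_name: Optional[str] = None) -> str:
    """Return the FROM target: the plain table, or a cluster-wide wrapper."""
    if not cluster_name:
        return table
    if not _IDENT.match(cluster_name):
        raise ValueError(f"unexpected cluster name: {cluster_name!r}")
    return f"clusterAllReplicas('{cluster_name}', {table})"


def build_activity_query(cluster_name: Optional[str] = None) -> str:
    """Hypothetical activity query; hostname() tags each row with its node."""
    return (
        "SELECT hostname() AS server_node, query_id, query "
        f"FROM {table_ref('system.processes', cluster_name)}"
    )


print(build_activity_query("dbm_cluster"))
```

Without a cluster name the same query targets only the connected node, so the single-node behavior is unchanged in this sketch.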

Motivation

Previously, clusterAllReplicas was only supported via single_endpoint_mode, which was designed for ClickHouse Cloud and hardcoded the cluster name to 'default'. Self-hosted clusters have named clusters (e.g. dbm_cluster) and had no way to use the single-connection monitoring pattern; they had to configure a separate agent instance per node.

This PR decouples the two use cases:

  • cluster_name - self-hosted clusters (connects to one node, queries all via named cluster)
  • single_endpoint_mode - ClickHouse Cloud (same behavior as before, cluster name 'default')
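The split between the two options could be resolved roughly as follows. This is a sketch under assumptions: ClusterConfig is a hypothetical stand-in for the integration's config object, and giving cluster_name precedence when both options are set is an assumption, since the description does not say how that case is handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterConfig:
    """Hypothetical stand-in for the two options named in the description."""
    cluster_name: Optional[str] = None
    single_endpoint_mode: bool = False


def effective_cluster(cfg: ClusterConfig) -> Optional[str]:
    """Return the cluster to pass to clusterAllReplicas, or None for one node."""
    if cfg.cluster_name:
        # Self-hosted: use the named cluster (assumed to take precedence).
        return cfg.cluster_name
    if cfg.single_endpoint_mode:
        # ClickHouse Cloud: same behavior as before, cluster name 'default'.
        return "default"
    return None


print(effective_cluster(ClusterConfig(cluster_name="dbm_cluster")))
print(effective_cluster(ClusterConfig(single_endpoint_mode=True)))
```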

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged


chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b15b1c36d



result_row = {
    'normalized_query_hash': str(normalized_query_hash),
    'server_node': str(server_node) if server_node else '',


P2: Remove stale server_node from merged statement metrics

This row now carries server_node, but ClickhouseStatementMetrics._merge_rows_across_nodes still collapses multiple node rows into one row per normalized_query_hash by summing metrics across all nodes. The merged row therefore keeps a single node label (from the max-count row) while count, total_time, read/write bytes, etc. represent cluster-wide totals, which misattributes data whenever the same query runs on more than one node.
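One way to avoid the misattribution is to merge per (normalized_query_hash, server_node) instead of per hash, so each node keeps its own totals. The sketch below illustrates that idea only; it is not the body of ClickhouseStatementMetrics._merge_rows_across_nodes, and SUMMED_FIELDS is a hypothetical subset of the real metric columns. The alternative is to keep the cluster-wide merge and drop server_node from the merged row.

```python
# Hypothetical subset of the metric columns that get summed.
SUMMED_FIELDS = ("count", "total_time")


def merge_rows_per_node(rows):
    """Sum metrics for rows sharing both the query hash and the node."""
    merged = {}
    for row in rows:
        key = (row["normalized_query_hash"], row.get("server_node", ""))
        if key not in merged:
            # First row for this (hash, node) pair: copy it as the base.
            merged[key] = dict(row)
            continue
        for field in SUMMED_FIELDS:
            merged[key][field] = merged[key].get(field, 0) + row.get(field, 0)
    return list(merged.values())


rows = [
    {"normalized_query_hash": "abc", "server_node": "node-a", "count": 2, "total_time": 10},
    {"normalized_query_hash": "abc", "server_node": "node-b", "count": 5, "total_time": 40},
    {"normalized_query_hash": "abc", "server_node": "node-a", "count": 1, "total_time": 5},
]
for merged_row in merge_rows_per_node(rows):
    print(merged_row)
```

With this keying, the node label on each output row always matches the node whose metrics were summed.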


Comment on lines +138 to +139
if self._config.cluster_name:
    self.tag_manager.set_tag("clustername", self._config.cluster_name, replace=True)


P2: Populate clustername before computing database_instance

database_instance is set before the new clustername tag is added, and database_identifier is computed/cached from the tags available at that moment. With cluster_name configured, templates that include $clustername will not resolve correctly, which prevents users from distinguishing multiple cluster configurations on the same endpoint and can collapse identities unexpectedly.
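The ordering problem can be illustrated with a small stand-in. The check's real template resolution is not shown in this PR, so the sketch uses Python's string.Template instead; resolve_identifier, the template string, and the resolved_hostname tag are hypothetical.

```python
from string import Template


def resolve_identifier(template: str, tags: dict) -> str:
    """Substitute known tags; unknown placeholders are left untouched."""
    return Template(template).safe_substitute(tags)


template = "$resolved_hostname:$clustername"
tags = {"resolved_hostname": "ch-node-1"}

# Identifier computed (and cached) before the clustername tag exists.
early = resolve_identifier(template, tags)

# Tag added afterwards, as in the ordering the review comment describes.
tags["clustername"] = "dbm_cluster"
late = resolve_identifier(template, tags)

print(early)
print(late)
```

In this stand-in, only the value computed after the tag is set contains the cluster name, which is why the comment asks for the tag to be set first.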



datadog-prod-us1-4 bot commented Mar 18, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 972e974


codecov bot commented Mar 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.03%. Comparing base (9ec7b03) to head (972e974).
⚠️ Report is 25 commits behind head on master.


@sangeetashivaji sangeetashivaji changed the title [DRAFT] Adding support for single connection to self-hosted cluster [DBMON-6141] Adding support for single connection to self-hosted cluster Mar 19, 2026