Skip to content

Add ZooKeeper connection log#79494

Merged
antaljanosbenjamin merged 20 commits intomasterfrom
zookeeper-connection-log
Aug 12, 2025
Merged

Add ZooKeeper connection log#79494
antaljanosbenjamin merged 20 commits intomasterfrom
zookeeper-connection-log

Conversation

@antaljanosbenjamin
Copy link
Copy Markdown
Member

@antaljanosbenjamin antaljanosbenjamin commented Apr 23, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add zookeeper_connection_log system table to store historical information about ZooKeeper connections.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Apr 23, 2025

Workflow [PR], commit [34f7deb]

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Apr 23, 2025
@nikitamikhaylov
Copy link
Copy Markdown
Member

Why not to add a new type to existing system.zookeeper_log ?

@Algunenano
Copy link
Copy Markdown
Member

Why not to add a new type to existing system.zookeeper_log ?

Zookeeper log is unusable in the long term due to the high volume of messages it stores.

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

Yes, exactly what @Algunenano said. This should be a much more lightweight table that we might even enable in our cloud. Recently both @vdimir run into issues where this would have been a valuable information.

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

Github suggested me to ask Copilot for a review. I cannot stand not to do so. Let's see how good is that :D

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new system table for logging ZooKeeper connection information and integrates it into both the Context and SystemLog machinery while also replacing legacy print statements in integration helpers with proper logging.

  • Replaces print() calls with logging.debug in helper modules
  • Introduces ZooKeeperConnectionLog (header and implementation) and integrates connection/disconnection event logging in Context and ZooKeeper-related modules
  • Updates related components (SystemLog, ZooKeeper classes, etc.) to support the new logging mechanism

Reviewed Changes

Copilot reviewed 18 out of 23 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/integration/helpers/cluster.py Replaces print statements with logging.debug calls for better logging consistency
src/Storages/System/StorageSystemZooKeeperConnection.cpp Updates retrieval of enabled feature flags and uses the new default ZooKeeper name from ZooKeeperConnectionLog
src/Interpreters/ZooKeeperConnectionLog.{h,cpp} Introduces new system log element and logging API for ZooKeeper connection events
src/Interpreters/SystemLog.{h,cpp} Integrates the new ZooKeeperConnectionLog into the system logs
src/Interpreters/Context.{h,cpp} Adds and updates functions to log ZooKeeper connection events and to use the default ZooKeeper name consistently
src/Common/ZooKeeper/{ZooKeeperImpl.h, ZooKeeper.h, ZooKeeper.cpp} Improves const correctness and minor naming clarifications
src/Common/SystemLogBase.{h,cpp} Maps the new ZooKeeperConnectionLogElement in the system log base
Files not reviewed (5)
  • programs/server/config.d/zookeeper_connection_log.xml: Language not supported
  • tests/integration/log.txt: Language not supported
  • tests/integration/test_reload_auxiliary_zookeepers/configs/config.xml: Language not supported
  • tests/integration/test_reload_auxiliary_zookeepers/configs/users.xml: Language not supported
  • tests/integration/test_zookeeper_connection_log/configs/auxiliary_zookeepers.xml: Language not supported
Comments suppressed due to low confidence (1)

src/Interpreters/Context.cpp:4365

  • [nitpick] The parameter name 'zk_conection_log' seems to have a typo; consider renaming it to 'zk_connection_log' for clarity and consistency.
std::shared_ptr<ZooKeeperConnectionLog> zk_conection_log,

@antaljanosbenjamin antaljanosbenjamin force-pushed the zookeeper-connection-log branch from 6d7123d to fe01c37 Compare May 9, 2025 12:42
@antaljanosbenjamin antaljanosbenjamin marked this pull request as ready for review May 9, 2025 12:46
element.port = static_cast<UInt16>(Poco::NumberParser::parseUnsigned(host_port.substr(offset + 1)));
}
element.index = zookeeper.getConnectedHostIdx();
element.keeper_api_version = 0;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is done in system.zookeeper_connection. I tried to expose the API version, but for some reason it sometimes returned non-sense values, therefore I decided to do the same as in zookeeper_connection as it is not a very important information (enabled feature flags contain the necessary information).

…order

By changing the normal zookeeper config first we can ensure that the order of events in `zookeeper_connection_log` will match the expected one even in cases when the config reload is triggered automatically while only one of the files are modified.
Copy link
Copy Markdown
Member Author

@antaljanosbenjamin antaljanosbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing I just realized: we don't log disconnected events in case of server shutdown/restart. However because of the initialization events it should be obvious. That can be added later.

@Avogar Avogar self-assigned this May 14, 2025
Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

I have to fix the test, it failed in private for some reason.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Jun 24, 2025

Dear @Avogar, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh bot commented Aug 11, 2025

Workflow [PR], commit [f35e504]

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

@Avogar I made the test pass also in private with f35e504, so I think the PR should be ready to be merged. Could you please take another look?

@antaljanosbenjamin
Copy link
Copy Markdown
Member Author

  • 02443_detach_attach_partition is flaky
  • test_async_connect_to_multiple_ips/test.py::test is also flaky
  • Fuzzer issue

Making this green manually.

@antaljanosbenjamin antaljanosbenjamin added this pull request to the merge queue Aug 12, 2025
Merged via the queue into master with commit f08c6da Aug 12, 2025
120 of 123 checks passed
@antaljanosbenjamin antaljanosbenjamin deleted the zookeeper-connection-log branch August 12, 2025 12:22
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Aug 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants