
Incompatibility with Zookeeper 3.9 #53749

Closed
Algunenano opened this issue Aug 23, 2023 · 13 comments · Fixed by #57479

@Algunenano
Member

It seems ZK 3.9 has changed something in its protocol and ClickHouse can't connect to it.

The error seems to be related to the handshake:

2023.08.23 13:11:59.885984 [ 422494 ] {} <Error> virtual bool DB::DDLWorker::initializeMainThread(): Code: 999. Coordination::Exception: Connection loss, path: All connection tries failed while connecting to ZooKeeper. nodes: 127.0.0.1:12183, 127.0.0.1:12181, 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12183
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12181
Code: 999. Coordination::Exception: Unexpected handshake length received: 37 (Marshalling error): while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 23.6.1.1524 (official build)), 127.0.0.1:12182
. (KEEPER_EXCEPTION), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000e1fc3f5 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
1. Coordination::Exception::Exception(String const&, Coordination::Error, int) @ 0x0000000015220571 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
2. Coordination::Exception::Exception(Coordination::Error, String const&) @ 0x0000000015220c6d in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
3. Coordination::ZooKeeper::connect(std::vector<Coordination::ZooKeeper::Node, std::allocator<Coordination::ZooKeeper::Node>> const&, Poco::Timespan) @ 0x000000001527030e in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
4. Coordination::ZooKeeper::ZooKeeper(std::vector<Coordination::ZooKeeper::Node, std::allocator<Coordination::ZooKeeper::Node>> const&, zkutil::ZooKeeperArgs const&, std::shared_ptr<DB::ZooKeeperLog>) @ 0x000000001526dccd in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
5. zkutil::ZooKeeper::init(zkutil::ZooKeeperArgs) @ 0x0000000015223553 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
6. zkutil::ZooKeeper::ZooKeeper(Poco::Util::AbstractConfiguration const&, String const&, std::shared_ptr<DB::ZooKeeperLog>) @ 0x00000000152270c3 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
7. DB::Context::getZooKeeper() const @ 0x0000000012f73dcc in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
8. DB::DDLWorker::getAndSetZooKeeper() @ 0x0000000012fdfa8d in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
9. DB::DDLWorker::initializeMainThread() @ 0x0000000012ff2c6c in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
10. DB::DDLWorker::runMainThread() @ 0x0000000012fdd771 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
11. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<void (DB::DDLWorker::*)(), DB::DDLWorker*>(void (DB::DDLWorker::*&&)(), DB::DDLWorker*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000012ff3dc9 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
12. ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x000000000e2d1a74 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
13. ? @ 0x000000000e2d7281 in /mnt/ch/official_binaries/clickhouse-common-static-23.6.1.1524/usr/bin/clickhouse
14. ? @ 0x00007f4708c8c9eb in ?
15. ? @ 0x00007f4708d10dfc in ?
 (version 23.6.1.1524 (official build))

ZK 3.8.2 is fine.
Keeper is fine too.

@chhetripradeep
Contributor

I think it is related to this change in ZK v3.9: https://issues.apache.org/jira/browse/ZOOKEEPER-4492

More details in the PR: apache/zookeeper#1837 (comment)
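
In short, ZooKeeper 3.9 merged the readOnly flag into the ConnectRequest/ConnectResponse records, so the server's handshake reply now carries one extra trailing byte (37 bytes instead of 36). A rough sketch of the kind of client-side length check involved (illustrative constants and names, not the actual ClickHouse source):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Serialized ConnectResponse sizes: 36 bytes without the readOnly flag, 37 with it.
constexpr int32_t HANDSHAKE_LENGTH = 36;
constexpr int32_t HANDSHAKE_LENGTH_WITH_READONLY = 37;

void checkHandshakeLength(int32_t received_length)
{
    // A strict "== 36" check is what produces "Unexpected handshake length received: 37".
    // Accepting both lengths (and consuming the extra byte when present) keeps the client
    // compatible with both 3.8 and 3.9 servers.
    if (received_length != HANDSHAKE_LENGTH && received_length != HANDSHAKE_LENGTH_WITH_READONLY)
        throw std::runtime_error("Unexpected handshake length received: " + std::to_string(received_length));
}
```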

Slach added a commit to Altinity/clickhouse-backup that referenced this issue Aug 28, 2023
pin zookeeper to 3.8.2 version to resolve the incompatibility between clickhouse and zookeeper 3.9.0, see details in apache/zookeeper#1837 (comment); return the `:latest` default value after ClickHouse/ClickHouse#53749 is resolved
@alexey-milovidov
Member

Cool kids use ClickHouse Keeper.

@bputt-e

bputt-e commented Sep 13, 2023

Cool kids use ClickHouse Keeper.

We would, but we had stability issues, and we believe they came from the --force-recovery logic: https://github.com/Altinity/clickhouse-operator/blob/master/deploy/clickhouse-keeper/clickhouse-keeper-3-nodes.yaml#L175

@Slach
Contributor

Slach commented Sep 13, 2023

@bputt-e Sorry, but the clickhouse-keeper manifests are not related to clickhouse-keeper itself.
We are just trying to create a manifest; it is not complete yet.
Let's continue the discussion in Altinity/clickhouse-operator#1234

@alexey-milovidov
there are many other things that currently prevent using clickhouse-keeper in Kubernetes for scale-up / scale-down scenarios:
#53481
#54129

The root cause is how eBay/NuRaft stores quorum peers and how it updates the quorum state.

@alexey-milovidov
Member

@bputt-e, you are pointing to a third-party ClickHouse operator from Altinity, which is unrelated to ClickHouse; it can contain mistakes. And having --force-recovery in the operator is 100% a mistake. Do not use this operator with Keeper.

@alexey-milovidov
Member

@Slach

there are many other things that currently prevent using clickhouse-keeper in Kubernetes for scale-up / scale-down scenarios:
#53481
#54129

You don't need to do any fancy stuff with Keeper. It is very simple software. If you want to scale up, scale the server/pod up and that's it.

@alexey-milovidov added the st-wontfix (Known issue, no plans to fix it currently) label Sep 13, 2023
@alexey-milovidov
Member

@Slach you are pointing to a new reconfig command, implemented by a third-party contributor, but incompletely:
#49450

I have no idea why someone needs this command. ClickHouse Keeper works perfectly without a reconfig request. We don't need it. If the existence of this incomplete implementation bothers you, I can remove it.

@Slach
Contributor

Slach commented Sep 13, 2023

@alexey-milovidov
Sorry, could you explain how the incompatibility with ZooKeeper 3.9 relates to reconfig?
Could you please reopen this issue?

Could you explain how to create a cluster with 3 clickhouse-keeper nodes and then scale it down to just 1 clickhouse-keeper node?
Without reconfig and without --force-recovery?

Because even when you change the XML config in <raft_configuration>, it has no effect in clickhouse-keeper, and no keeper node will start without reaching a quorum of 3 nodes.
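
For reference, a minimal sketch of the section in question (hostnames and ports are placeholders, not taken from this thread); editing the set of <server> entries is exactly the change that does not take effect while the old quorum is unreachable:

```xml
<keeper_server>
    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>keeper-1</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>2</id>
            <hostname>keeper-2</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>3</id>
            <hostname>keeper-3</hostname>
            <port>9234</port>
        </server>
    </raft_configuration>
</keeper_server>
```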

Please don't remove reconfig, it is useful functionality; just complete it so that raft_configuration changes are applied synchronously during the reconfig command (#53481).

@UnamedRus
Contributor

UnamedRus commented Sep 13, 2023

I would probably ask about another completely reasonable scenario:

2 DCs: A and B

3 keeper nodes in DC A (they participate in the quorum)
3 keeper nodes in DC B (they only listen to changes, an analogue of an observer in ZK)

And what if DC A is completely down and we need to switch to DC B? Then we would have to reconfigure the keeper nodes in DC B without the quorum being up.

It's quite a common approach for companies that value their data and their ability to survive any cataclysm.
Maybe they are not cool kids, but at least they do care about their data.

We don't need it.

Until the first disaster?

@alexey-milovidov
Member

Switching leaders without a quorum can lead to data loss (of the data that was present in the unavailable datacenter).

A bulletproof approach is to have three Keeper nodes in three different data centers, but not too far from each other (say, less than 30 ms RTT).

An approach where you switch the leader manually makes sense, but only when you can accept data loss; it is similar to, say, changing the master in MySQL replication (a source of many horror stories, especially if done with some automation).

@UnamedRus
Contributor

(of the data that was present in the unavailable datacenter).

The datacenter is already gone, so, at least temporarily, this DC-specific data is already lost from the user's perspective. Plus, learners should be pretty up to date with the latest changes in Keeper, much more so than with ClickHouse replication (just because of the data size):
https://github.com/eBay/NuRaft/blob/99eeef34a2620686e0dd40ad7fbd5cab561140fc/docs/readonly_member.md?plain=1

but not too far from each other (say, less than 30 ms RTT).

30 ms RTT is too much for a quorum, for my taste.

but only when you can accept data loss

Normal replication in ClickHouse is also for people who can accept data loss (no quorum during writes / async replication).

@tavplubix
Member

Could you explain how to create a cluster with 3 clickhouse-keeper nodes and then scale it down to just 1 clickhouse-keeper node?

Don't do this, it's an antipattern. Single-node [Zoo]Keeper clusters are fine for dev/staging environments, but I would not recommend them for production.

reconfigure keeper nodes in DC B, without quorum being up
they do care about their data

They do not care about their data if they reconfigure a coordination service forcefully without a quorum.

@tavplubix
Member

tavplubix commented Sep 14, 2023

But I agree that reconfig and --force-recovery are not related to this issue. If you have something to say regarding ClickHouse Keeper usability, then please create another issue, and let's continue the discussion there. Off-topic comments may be removed.

As for the incompatibility with ZooKeeper 3.9, it's a minor issue because:

  • we have ClickHouse Keeper
  • you can just postpone upgrading your ZooKeeper clusters for a while, there's no way it's urgent

So we can reopen this issue and hope that some good person from the community will send us a PR.

@tavplubix reopened this Sep 14, 2023
@tavplubix added the help wanted and minor (Priority: minor) labels and removed the st-wontfix (Known issue, no plans to fix it currently) label Sep 14, 2023
minguyen9988 added a commit to minguyen9988/clickhouse-backup that referenced this issue Sep 28, 2023
* add connection to gcs and use different context for upload incase it got cancel by another thread

* save

* keep ctx

* keep ctx

* use v2

* change to GCS_CLIENT_POOL_SIZE

* pin zookeeper to 3.8.2 version to resolve the incompatibility between clickhouse and zookeeper 3.9.0, see details in apache/zookeeper#1837 (comment); return the `:latest` default value after ClickHouse/ClickHouse#53749 is resolved

* Revert "add more precise disk re-balancing for not exists disks, during download, partial fix Altinity#561"

This reverts commit 20e250c.

* fix S3 head object Server Side Encryption parameters, fix Altinity#709

* change timeout to 60m, TODO make tests Parallel

---------

Co-authored-by: Slach <bloodjazman@gmail.com>
mkmkme added a commit to mkmkme/ClickHouse that referenced this issue Dec 4, 2023
This commit enables the read-only flag when connecting to the ZooKeeper server.

This flag is enabled by sending one extra byte when connecting,
and then receiving one extra byte during the first response.

In addition to that, we modify createIfNotExists to not complain
about attempting to alter a read-only ZooKeeper cluster if the node
already exists.

This makes ClickHouse more useful in the event of a loss of quorum:
user credentials are still accessible, which makes it possible to
connect to the cluster and run read queries.

Any DDL or DML query on a Distributed database or ReplicatedMergeTree
table will correctly fail, since it needs to write to ZooKeeper to
execute the query.

Any non-distributed query will be possible, which is OK since the
query was never replicated in the first place, so there is no loss of
consistency.

Fixes ClickHouse#53749, as this seems to be the only thing 3.9 enforces.
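
To illustrate what "one extra byte" means on the wire, a rough sketch of the two handshake records (field layout follows the ZooKeeper jute definitions; struct and field names are illustrative, not the actual ClickHouse types):

```cpp
#include <array>
#include <cstdint>

// Serialized ConnectRequest: 44 bytes without the flag, 45 with it
// (4 + 8 + 4 + 8 + (4 + 16) for the password buffer + 1).
struct ConnectRequest
{
    int32_t protocol_version = 0;
    int64_t last_zxid_seen = 0;
    int32_t session_timeout_ms = 0;
    int64_t session_id = 0;
    std::array<char, 16> password{};
    bool read_only = false;   // the extra byte sent when connecting
};

// Serialized ConnectResponse: 36 bytes without the flag, 37 with it,
// which is exactly the "Unexpected handshake length received: 37" from this issue.
struct ConnectResponse
{
    int32_t protocol_version = 0;
    int32_t session_timeout_ms = 0;
    int64_t session_id = 0;
    std::array<char, 16> password{};
    bool read_only = false;   // the extra byte received in the first response
};
```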
vincentbernat added a commit to akvorado/akvorado that referenced this issue Apr 4, 2024
There is an incompatibility of ClickHouse with Zookeeper 3.9. See:

- apache/zookeeper#2146
- apache/zookeeper#1837
- ClickHouse/ClickHouse#53749