New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce fallback ZooKeeper sessions #50424
Introduce fallback ZooKeeper sessions #50424
Conversation
args.session_lifetime->max_sec, | ||
}; | ||
UInt64 client_session_duration_sec = distribution(thread_local_rng); | ||
client_session_deadline = clock::now() + std::chrono::seconds(client_session_duration_sec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work with system clock changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a look!
I'm not 100% sure. I'm following an example in src/Dictionaries/SSDCacheDictionaryStorage.h:1366 , does that account for system clock changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using clock = std::chrono::steady_clock; |
Class std::chrono::steady_clock represents a monotonic clock.
So it's okay.
5e1b0d8
to
2d89eef
Compare
This is an automated comment for commit 5dfc305 with description of existing statuses. It's updated for the latest CI running
|
# wait for zk sessions to reach deadline | ||
time.sleep(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unreliable, the test is going to be flaky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any suggestion on how to improve? The test needs to wait for ZK sessions to expire and re-connect. They are set to expire between 2 and 4 seconds, hence 5. I understand sleeping for a magic number of seconds is not ideal.
Is there a way to query clickhouse itself for which ZK replica it's connected to? Maybe we could use that in query_with_retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe system.zookeeper_connection
, query_with_retry
and zookeeper_load_balancing
will help
I'll need to re-work this PR. I see the exception leaking through to a client that sends queries and that shouldn't happen. |
712f391
to
dfc3dea
Compare
Do you mean that a "session expired" exception is thrown on each attempt to make a zk request when the session is closed due to lifetime timeout? That's unavoidable. We need this exception. |
I suggested in #51524 a similar idea; but in this implementation we do a session refresh even when there is no failover. Does it make sense in the code to have this only kick in after a fail-over event to prevent constant needless refreshs, or is it preferred for simplicity to just have a maximum session duration? |
As a user, right now ZK session refreshes are a PITA. At least now inserts are able to survive it by retrying, but other operations (ALTER, CREATE, etc) won't. It might help in specific cases, but I don't think having ZK refreshes happening all the time by default is a good default (but that's why it should be behind a setting). |
@devoxel I quite like your idea actually. So only the "failover" session would have a deadline, not regular ones. Do you have any implementation in mind, or an integration test maybe? If not, I think I'm going to change my PR to do what you suggest. |
edb7267
to
da93c5c
Compare
{ | ||
if (args.hosts[i] == address.toString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This did not work for cases where ZooKeeper host is a DNS name (including integration tests)
@devoxel I've updated my change, does it fit your use-case? The ZooKeeper session now only has a limited timeline if a fallback node is used. @tavplubix could I get another review please? Thanks. |
src/Common/ZooKeeper/ZooKeeper.cpp
Outdated
bool fallback_session_expired = ((session_lifetime_seconds > 0) && (getSessionUptime() >= session_lifetime_seconds)); | ||
return fallback_session_expired || impl->isExpired(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only correct way to expire the current session and start a new one is to call finalize
. Otherwise, some objects will start using a new session while others still use the old one. It will cause consistency issues and will trigger obscure bugs. See #50424 (comment)
@tavplubix fair enough. Let's discuss follow-up work that needs to be done before I implement anything further. Here's my understanding of how things are meant to work but please correct me if I'm wrong:
So what my implementation essentially does is it schedules the restarting thread close to "artificial" ZooKeeper session expiry to get a new session on time. Now, my understanding is that you suggest that when the session is "expired" because of the time limit, all its methods should throw ZSESSIONEXPIRED exception, so that clients have a chance to refresh the session? |
There's also a question I hope you could answer separately. So there's meant to be one ZooKeeper session active at a time, stored on Context. This is where it's updated: ClickHouse/src/Interpreters/Context.cpp Line 2698 in 4d03c23
And returned to the caller: ClickHouse/src/Interpreters/Context.cpp Line 2702 in 4d03c23
The startNewSession method creates a new ZK session and returns a shared_ptr to is: ClickHouse/src/Common/ZooKeeper/ZooKeeper.cpp Line 879 in 4d03c23
Here's the caveat: some callers of getZooKeeper() may store the shared_ptr internally, e.g. ClickHouse/src/Interpreters/DDLWorker.h Line 132 in 4d03c23
The implication is that the old expired session will stay alive until all callers update their references by calling getZooKeeper() on context to receive the new, non-expired session. Only then the old session will get "finalized" because the destructor will be called. So there may be a situation where there are multiple active ZooKeeper sessions present, at least conceptually.
Does this look like a bug to you or do I misunderstand something? Would it be better to destroy the old session right there in the same call the creates a new one (Context::getZooKeeper)? |
No,
That's true.
No, it's being set by anyone who had called ClickHouse/src/Interpreters/Context.cpp Lines 2687 to 2702 in 4d03c23
It can be
(just an observation) Hmm, looks like we don't really need to run the restarting thread every minute anymore (previously we didn't have that notification mechanism)
Yes, exactly
I'm not sure about
Yes, but it's okay (see below)
A session is finalized in the dtor only when clickhouse-server is shutting down. If an error happens in ZooKeeper client while clickhouse-server is working, then ClickHouse/src/Common/ZooKeeper/ZooKeeperImpl.cpp Lines 650 to 654 in 8b63760
ClickHouse/src/Common/ZooKeeper/ZooKeeper.cpp Lines 234 to 238 in 6d4143d
And if no errors happen, then the session is still active and Also,
Yes, some objects may still use shared_ptr to the old ZooKeeper object, but since that object was finalized it will throw "session expired" exceptions on any attempts to make a request. Having multiple "finalized" ZooKeeper clients is harmless.
No, it's possible to have multiple ZooKeeper objects, but only one of them can be active. And others are finalized and basically unusable |
c51e916
to
8f6594f
Compare
8f6594f
to
3f3f4dd
Compare
bea6aa1
to
0d8ba7e
Compare
|
||
def test_fallback_session(started_cluster: ClickHouseCluster): | ||
# only leave connecting to zoo3 possible | ||
with PartitionManager() as pm: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this test to parallel_skip.json
, otherwise PartitionManager
may work incorrectly
7a82996
to
5dfc305
Compare
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Introduce fallback ZooKeeper sessions which are time-bound. Fixed
index
column in system.zookeeper_connection for DNS addresses.Documentation entry for user-facing changes
This change allows users to limit lifetime of ZooKeeper sessions. It's useful to balance load between ZooKeeper replicas over long periods of time. A hypothetical situation where this may be useful (please see the integration test in the PR):
I believe this will help the scenario described in #29617 (comment) as well as #51524