Prepare the introduction of more keeper faults #56917

Merged
merged 29 commits into from
Dec 18, 2023
Changes from 28 commits
Commits
19931fe
Prepare the introduction of more keeper faults
Algunenano Nov 16, 2023
cb9c973
WIP: Move implementation
Algunenano Nov 16, 2023
3633e77
Refactor ZooKeeperWithFaultInjection
Algunenano Nov 17, 2023
9154e2f
Style
Algunenano Nov 17, 2023
40175f2
Style and fix log level
Algunenano Nov 17, 2023
210a0ee
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Nov 17, 2023
aadb786
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Nov 20, 2023
20eb5d3
CI: Increase keeper retries and decrease retry backoff to a minimum
Algunenano Nov 20, 2023
8217915
Replace part_committed_locally_but_zookeeper with retries
Algunenano Nov 20, 2023
04f966c
Recover special handling of ephemeral nodes in ZooKeeperWithFaultInje…
Algunenano Nov 20, 2023
820c1a5
Tidy
Algunenano Nov 21, 2023
8972cde
Fix WithRetries callback
Algunenano Nov 21, 2023
16ad3ef
Fix bug in exists()
Algunenano Nov 22, 2023
2810603
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Nov 22, 2023
2539100
Review improvements
Algunenano Nov 24, 2023
63fe821
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Nov 24, 2023
a55a0c0
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Dec 11, 2023
e6be38b
Adapt from HEAD
Algunenano Dec 11, 2023
462cd0e
Fix CLICKHOUSE_KEEPER_CLIENT with CLICKHOUSE_BINARY
Algunenano Dec 11, 2023
e1965bb
WIP: Remove UNCERTAIN_COMMIT in INSERT
Algunenano Dec 11, 2023
9d8d5df
Partially revert "make stages commit"
Algunenano Dec 12, 2023
923c3b7
Implement retries when ZK connection fails without committing the tra…
Algunenano Dec 12, 2023
c77f30d
Change check order in replication.lib
Algunenano Dec 12, 2023
049fb60
Fix error on retries due to TABLE_IS_READ_ONLY
Algunenano Dec 12, 2023
dd405a6
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Dec 13, 2023
efcacd3
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Dec 14, 2023
3200933
Leave backups as HEAD
Algunenano Dec 14, 2023
546484d
Merge remote-tracking branch 'blessed/master' into backup_1
Algunenano Dec 14, 2023
6cf8c9b
Review improvements
Algunenano Dec 15, 2023
7 changes: 4 additions & 3 deletions src/Common/FailPoint.cpp
@@ -28,13 +28,14 @@ static struct InitFiu

/// We should define different types of failpoints here. There are four types of them:
/// - ONCE: the failpoint will only be triggered once.
-/// - REGULAR: the failpoint will always be triggered util disableFailPoint is called.
-/// - PAUSAEBLE_ONCE: the failpoint will be blocked one time when pauseFailPoint is called, util disableFailPoint is called.
-/// - PAUSAEBLE: the failpoint will be blocked every time when pauseFailPoint is called, util disableFailPoint is called.
+/// - REGULAR: the failpoint will always be triggered until disableFailPoint is called.
+/// - PAUSEABLE_ONCE: the failpoint will be blocked one time when pauseFailPoint is called, util disableFailPoint is called.
+/// - PAUSEABLE: the failpoint will be blocked every time when pauseFailPoint is called, util disableFailPoint is called.

#define APPLY_FOR_FAILPOINTS(ONCE, REGULAR, PAUSEABLE_ONCE, PAUSEABLE) \
ONCE(replicated_merge_tree_commit_zk_fail_after_op) \
ONCE(replicated_merge_tree_insert_quorum_fail_0) \
+REGULAR(replicated_merge_tree_commit_zk_fail_when_recovering_from_hw_fault) \
REGULAR(use_delayed_remote_source) \
REGULAR(cluster_discovery_faults) \
REGULAR(check_table_query_delay_for_part) \
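
The `APPLY_FOR_FAILPOINTS` list above is an X-macro: each entry is wrapped in the macro that names its kind, and callers decide what each wrapper expands to. A minimal sketch of that pattern follows, assuming a cut-down list with only two kinds. `APPLY_FOR_DEMO_FAILPOINTS`, `DemoFailPoint`, `DemoFailPointKind` and `listDemoFailPoints` are hypothetical names for illustration and are not part of FailPoint.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, cut-down list: two of the four kinds, three of the real names.
#define APPLY_FOR_DEMO_FAILPOINTS(ONCE, REGULAR) \
    ONCE(replicated_merge_tree_commit_zk_fail_after_op) \
    REGULAR(use_delayed_remote_source) \
    REGULAR(cluster_discovery_faults)

enum class DemoFailPointKind { Once, Regular };

struct DemoFailPoint
{
    std::string name;
    DemoFailPointKind kind;
};

// Expand the list once, turning every entry into a (name, kind) record.
inline std::vector<DemoFailPoint> listDemoFailPoints()
{
#define M_ONCE(NAME) {#NAME, DemoFailPointKind::Once},
#define M_REGULAR(NAME) {#NAME, DemoFailPointKind::Regular},
    return {APPLY_FOR_DEMO_FAILPOINTS(M_ONCE, M_REGULAR)};
#undef M_ONCE
#undef M_REGULAR
}
```

Adding a failpoint then means adding one line to the list; every expansion site picks it up without further edits.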
5 changes: 3 additions & 2 deletions src/Common/ZooKeeper/ZooKeeper.cpp
@@ -869,7 +869,7 @@ bool ZooKeeper::waitForDisappear(const std::string & path, const WaitCondition &
/// method is called.
do
{
-/// Use getData insteand of exists to avoid watch leak.
+/// Use getData instead of exists to avoid watch leak.
impl->get(path, callback, std::make_shared<Coordination::WatchCallback>(watch));

if (!state->event.tryWait(1000))
@@ -888,7 +888,7 @@ bool ZooKeeper::waitForDisappear(const std::string & path, const WaitCondition &
return false;
}

-void ZooKeeper::handleEphemeralNodeExistence(const std::string & path, const std::string & fast_delete_if_equal_value)
+void ZooKeeper::deleteEphemeralNodeIfContentMatches(const std::string & path, const std::string & fast_delete_if_equal_value)
{
zkutil::EventPtr eph_node_disappeared = std::make_shared<Poco::Event>();
String content;
@@ -1175,6 +1175,7 @@ std::future<Coordination::RemoveResponse> ZooKeeper::asyncRemove(const std::stri
return future;
}

+/// Needs to match ZooKeeperWithInjection::asyncTryRemove implementation
std::future<Coordination::RemoveResponse> ZooKeeper::asyncTryRemove(const std::string & path, int32_t version)
{
auto promise = std::make_shared<std::promise<Coordination::RemoveResponse>>();
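
The `asyncTryRemove` hunk above bridges a callback-based call to a `std::future` through a shared `std::promise`, and the new comment asks the fault-injecting wrapper to keep the same shape. A rough sketch of both sides follows, under stated assumptions: `DemoImpl`, `DemoRemoveResponse`, `DEMO_CONNECTION_LOSS` and both functions are hypothetical stand-ins, the callback runs synchronously, and reporting the injected fault as an error code in the response (rather than throwing) is an assumption, not something this diff confirms.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <string>

// Hypothetical stand-ins for Coordination::RemoveResponse and the low-level client.
struct DemoRemoveResponse { int error = 0; };
using DemoRemoveCallback = std::function<void(const DemoRemoveResponse &)>;

constexpr int DEMO_CONNECTION_LOSS = -4; // hypothetical error code for this sketch

struct DemoImpl
{
    // Invokes the callback immediately; a real client would do so from another thread.
    void remove(const std::string & path, DemoRemoveCallback callback)
    {
        DemoRemoveResponse response;
        response.error = path.empty() ? -1 : 0;
        callback(response);
    }
};

// Plain version: the callback fulfils a shared promise, the caller gets the future.
inline std::future<DemoRemoveResponse> demoAsyncTryRemove(DemoImpl & impl, const std::string & path)
{
    auto promise = std::make_shared<std::promise<DemoRemoveResponse>>();
    auto future = promise->get_future();
    impl.remove(path, [promise](const DemoRemoveResponse & response) { promise->set_value(response); });
    return future;
}

// Fault-injecting version: same return type, but may answer without touching impl.
inline std::future<DemoRemoveResponse> demoAsyncTryRemoveWithFault(DemoImpl & impl, const std::string & path, bool inject_fault)
{
    if (!inject_fault)
        return demoAsyncTryRemove(impl, path);

    std::promise<DemoRemoveResponse> promise;
    DemoRemoveResponse response;
    response.error = DEMO_CONNECTION_LOSS;
    promise.set_value(response);
    return promise.get_future();
}
```

Keeping both functions on the same return type and error convention is what lets callers swap one for the other, which is presumably why the comment insists the two implementations match.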
10 changes: 6 additions & 4 deletions src/Common/ZooKeeper/ZooKeeper.h
@@ -33,7 +33,8 @@ namespace CurrentMetrics

namespace DB
{
-class ZooKeeperLog;
+class ZooKeeperLog;
+class ZooKeeperWithFaultInjection;

namespace ErrorCodes
{
@@ -194,6 +195,9 @@ struct MultiReadResponses
/// Methods with names not starting at try- raise KeeperException on any error.
class ZooKeeper
{
+/// ZooKeeperWithFaultInjection wants access to `impl` pointer to reimplement some async functions with faults
+friend class DB::ZooKeeperWithFaultInjection;

public:

using Ptr = std::shared_ptr<ZooKeeper>;
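
The two hunks above work together: the class is forward-declared in namespace `DB`, then named as a friend inside `ZooKeeper` so it can reach the private `impl` pointer. A minimal sketch of that arrangement follows. The namespaces `demo_db` and `demo_zk` and the classes `DemoClient` and `DemoFaultyClient` are hypothetical, and `impl` is reduced to a `std::unique_ptr<int>`.

```cpp
#include <cassert>
#include <memory>

// Forward declaration in the other namespace, so the friend declaration can name it.
namespace demo_db { class DemoFaultyClient; }

namespace demo_zk
{
class DemoClient
{
    // Grants the wrapper access to the private `impl` member below.
    friend class demo_db::DemoFaultyClient;

public:
    DemoClient() : impl(std::make_unique<int>(42)) {}

private:
    std::unique_ptr<int> impl; // stands in for the real implementation pointer
};
}

namespace demo_db
{
class DemoFaultyClient
{
public:
    explicit DemoFaultyClient(demo_zk::DemoClient & client_) : client(client_) {}

    // Reads the private member directly, which only compiles because of the friend declaration.
    int peekImpl() const { return *client.impl; }

private:
    demo_zk::DemoClient & client;
};
}
```

Without the forward declaration, the qualified name in the friend declaration would not resolve, since a qualified friend must refer to a class that is already declared.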
@@ -470,7 +474,7 @@ class ZooKeeper
/// If the node exists and its value is equal to fast_delete_if_equal_value it will remove it
/// If the node exists and its value is different, it will wait for it to disappear. It will throw a LOGICAL_ERROR if the node doesn't
/// disappear automatically after 3x session_timeout.
-void handleEphemeralNodeExistence(const std::string & path, const std::string & fast_delete_if_equal_value);
+void deleteEphemeralNodeIfContentMatches(const std::string & path, const std::string & fast_delete_if_equal_value);

Coordination::ReconfigResponse reconfig(
const std::string & joining,
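
The comment on `deleteEphemeralNodeIfContentMatches` describes a three-way decision, sketched below against an in-memory map instead of a Keeper session. Everything here is hypothetical: `DemoStore`, `DemoOutcome`, the `wait_for_disappear` hook that replaces the watch and the 3x session_timeout wait, and `std::logic_error` in place of LOGICAL_ERROR. Returning quietly when the node is missing is an assumption, since the visible part of the comment does not cover that case.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using DemoStore = std::map<std::string, std::string>; // path -> node content

enum class DemoOutcome { NotFound, Removed, WaitedForDisappear };

// wait_for_disappear(path) is a hypothetical hook: it returns true if the node
// went away on its own within the allowed time, false otherwise.
template <typename WaitFn>
DemoOutcome demoDeleteEphemeralNodeIfContentMatches(
    DemoStore & store, const std::string & path, const std::string & fast_delete_if_equal_value, WaitFn wait_for_disappear)
{
    auto it = store.find(path);
    if (it == store.end())
        return DemoOutcome::NotFound; // assumption: nothing to do

    // Same content: the node is ours (e.g. from a previous session), remove it now.
    if (it->second == fast_delete_if_equal_value)
    {
        store.erase(it);
        return DemoOutcome::Removed;
    }

    // Different content: someone else owns it, so wait for it to expire.
    if (!wait_for_disappear(path))
        throw std::logic_error("ephemeral node " + path + " still exists");
    return DemoOutcome::WaitedForDisappear;
}
```

The new name reflects the first branch more directly than `handleEphemeralNodeExistence` did: the fast path is a delete conditional on the content matching.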
@@ -646,8 +650,6 @@ class ZooKeeper

ZooKeeperArgs args;

-std::mutex mutex;

Poco::Logger * log = nullptr;
std::shared_ptr<DB::ZooKeeperLog> zk_log;
