
replica stuck in ReadOnly State #58126

Open
karolcienkosz opened this issue Dec 21, 2023 · 5 comments

Comments

@karolcienkosz

Hello,
I have a ClickHouse cluster (version 23.8.7) with multiple nodes. There is a table with 12 shards, and each shard has 3 replicas.

CREATE TABLE stats_s1.invoicing
(
    `received_at_hour` DateTime('Etc/UTC'),
    `endpoint_id` UInt32,
    `deal_line_item_id` UInt32,
    `format` LowCardinality(String),
    `costs` Int64
)
ENGINE = ReplicatedSummingMergeTree('/clickhouse/stats/tables/stats_s3/invoicing', 'clickhouse-3.clickhouse.clickhouse.svc.cluster.local')
PARTITION BY toYYYYMM(received_at_hour)
PRIMARY KEY (received_at_hour, deal_line_item_id, endpoint_id)
ORDER BY (received_at_hour, deal_line_item_id, endpoint_id, format)
SETTINGS index_granularity = 8192

The table is fed with data through the following flow:

app -> events_entry(TABLE, ENGINE: Null) --> mv_exchange_stats_entry(MATERIALIZED VIEW, ENGINE = Distributed) --> invoicing(TABLE, ENGINE=ReplicatedSummingMergeTree)
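
For reference, the pipeline roughly corresponds to DDL like the sketch below; only the invoicing definition above is real, and the cluster name and sharding key here are illustrative assumptions:

clickhouse-client -n <<'SQL'
-- entry point: the Null engine stores nothing, it only feeds materialized views
CREATE TABLE stats_s1.events_entry
(
    `received_at_hour` DateTime('Etc/UTC'),
    `endpoint_id` UInt32,
    `deal_line_item_id` UInt32,
    `format` LowCardinality(String),
    `costs` Int64
)
ENGINE = Null;

-- MV with a Distributed inner table: every insert into events_entry is routed
-- to the invoicing shards ('stats' cluster and rand() sharding key are hypothetical)
CREATE MATERIALIZED VIEW stats_s1.mv_exchange_stats_entry
ENGINE = Distributed('stats', 'stats_s1', 'invoicing', rand())
AS SELECT received_at_hour, endpoint_id, deal_line_item_id, format, costs
FROM stats_s1.events_entry;
SQL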

From time to time, when the node that stores a replica is restarted, the replica gets stuck in ReadOnly state with the following error (from the system.replicas table):
Code: 84. DB::Exception: Directory /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached/covered-by-broken_202311_820132_935806_43436_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS)

The ZooKeeper nodes are reachable the whole time, and the other 2 replicas of this shard do not report any issues.
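
For reference, this is how the readonly state and the stored exception can be pulled from system.replicas (is_readonly, zookeeper_exception and last_queue_update_exception are the relevant columns):

clickhouse-client -q "
    SELECT database, table, is_readonly, zookeeper_exception, last_queue_update_exception
    FROM system.replicas
    WHERE is_readonly"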

In the logs I can see:

<Warning> stats_s1.invoicing (ReplicatedMergeTreeRestartingThread): Table was in readonly mode. Will try to activate it.
...
Directory covered-by-broken_202311_879557_936229_18409_try1 (to detach to) already exists. Will detach to directory with '_tryN' suffix
...
Directory covered-by-broken_202311_879557_936229_18409_try9 (to detach to) already exists. Will detach to directory with '_tryN' suffix

But there is no log message (or I cannot find one) which explains why the replica cannot be activated.

To fix this replica, I need to detach it, drop it, and then attach, restore, and sync the replica.
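
Concretely, that recovery is something along these lines (replica name and ZooKeeper path taken from the DDL above; the ZKPATH form of DROP REPLICA is used because a replica cannot drop itself while the table is attached):

clickhouse-client -n <<'SQL'
DETACH TABLE stats_s1.invoicing;
-- remove this replica's metadata from ZooKeeper
SYSTEM DROP REPLICA 'clickhouse-3.clickhouse.clickhouse.svc.cluster.local'
    FROM ZKPATH '/clickhouse/stats/tables/stats_s3/invoicing';
ATTACH TABLE stats_s1.invoicing;           -- attaches in readonly mode, ZK metadata is gone
SYSTEM RESTORE REPLICA stats_s1.invoicing; -- recreates ZK metadata from local parts
SYSTEM SYNC REPLICA stats_s1.invoicing;    -- fetch whatever is missing from the other replicas
SQL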

Since 2 of 3 replicas are available, after a restart the node should be able to restore its replica from the others.

Any suggestions on what else I can check to troubleshoot this issue, or is there a chance that this is a bug?

Thank you in advance!

@den-crane
Contributor

detach table  stats_s1.invoicing;

rm -rf /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436

attach table  stats_s1.invoicing;

@karolcienkosz
Author

detach table  stats_s1.invoicing;

rm -rf /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436

attach table  stats_s1.invoicing;

I would like to understand what the problem is. As I mentioned above, I'm able to fix the readonly replica, but after some time, when the node is restarted, the issue occurs again.

Or is there any significant difference between "detach, drop, then attach, restore and sync replica" and your suggestion? @den-crane

@tavplubix
Member

But there is no log message (or I cannot find one) which explains why the replica cannot be activated.

You have found it: the reason is that the replica cannot detach the broken part and start up replication.

It would be great to find the first log messages about that part; those logs would explain which part was broken and why (202311_879557_936229_18409 is covered by a broken part; the actual broken part has another name).

Also, 202311_879557_936229_18409 should not have been detached many times because of this:

if (broken
    && isFullPartStorage(*this)
    && looksLikeBrokenDetachedPartHasTheSameContent(res, original_checksums_content, original_files_list))
{
    LOG_WARNING(log, "Directory {} (to detach to) already exists, "
                "but its content looks similar to content of the broken part which we are going to detach. "
                "Assuming it was already cloned to detached, will not do it again to avoid redundant copies of broken part.", res);
    return {};
}

LOG_WARNING(log, "Directory {} (to detach to) already exists. Will detach to directory with '_tryN' suffix.", res);

So it's worth checking whether /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436/ and /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached/covered-by-broken_202311_820132_935806_43436_try*/ have the same list of files and the same content of checksums.txt and columns.txt.
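
A quick way to run that check from a shell on the affected node (paths taken from the error above):

PART=/var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436
DETACHED=/var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached

# same list of files in the part and in one of the detached copies?
diff <(ls "$PART" | sort) <(ls "$DETACHED/covered-by-broken_202311_820132_935806_43436_try1" | sort)

# same checksums.txt and columns.txt across the part and all detached copies?
md5sum "$PART"/{checksums.txt,columns.txt} \
       "$DETACHED"/covered-by-broken_202311_820132_935806_43436*/{checksums.txt,columns.txt}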

@SaltTan
Contributor

SaltTan commented Apr 3, 2024

I had this happen to me with 23.12.4.15 after an unexpected restart.
Unfortunately I deleted the parts, so I cannot check what was wrong with them.
The only information I have is that the size of parts in system.parts (both compressed and uncompressed) was the same as in system.part_log.
The system.detached_parts.bytes_on_disk number was a bit higher than system.parts.bytes_on_disk, though.
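
For anyone comparing the same numbers, queries along these lines should do it (the table name is a placeholder; bytes_on_disk in system.detached_parts is available in recent releases):

clickhouse-client -q "
    SELECT name, bytes_on_disk, data_compressed_bytes, data_uncompressed_bytes
    FROM system.parts
    WHERE table = 'invoicing' AND active"

clickhouse-client -q "
    SELECT part_name, size_in_bytes
    FROM system.part_log
    WHERE table = 'invoicing' AND event_type = 'NewPart'"

clickhouse-client -q "
    SELECT name, reason, bytes_on_disk
    FROM system.detached_parts
    WHERE table = 'invoicing'"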

@viacheslav-mik

viacheslav-mik commented Jun 5, 2024

Same issue after a hard reset on 23.8.12.13 with a ReplicatedMergeTree engine.
I thought it was fixed in #41981.
There are a lot of broken parts and at least 5 broken tables; let's review one of the parts.
Note: the hard shutdown happened at 11:10 and the server started at 11:40.

Here is the first mention of the example part in the error log:

2024.06.05 11:40:17.051139 [ 4380 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Source replica does not have part 20242002_938800_940504_564. Removing it from ZooKeeper.

Then the following error is logged a bit later (a bunch of other broken-part log lines are skipped here):

2024.06.05 11:40:31.902858 [ 4393 ] {} <Warning> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Table was in readonly mode. Will try to activate it.
2024.06.05 11:40:31.932487 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242202_618911_618911_0: 
2024.06.05 11:40:31.938434 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242102_764786_764786_0: 
2024.06.05 11:40:31.941086 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
....
2024.06.05 11:40:31.961942 [ 4393 ] {} <Error> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Failed to restart the table. Will try again: Code: 84. DB::Exception: Directory /mnt/HC_Volume_13358330/clickhouse/store/24e/24e58837-1475-49f4-a4e5-88371475e9f4/detached/covered-by-broken_20242002_938800_940633_689_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c623797 in /usr/bin/clickhouse
1. DB::Exception::Exception<String>(int, FormatStringHelperImpl<std::type_identity<String>::type>, String&&) @ 0x000000000713252d in /usr/bin/clickhouse
2. DB::localBackup(std::shared_ptr<DB::IDisk> const&, String const&, String const&, bool, std::optional<unsigned long>, bool, std::unordered_set<String, std::hash<String>, std::equal_to<String>, std::allocator<String>> const&, std::shared_ptr<DB::IDiskTransaction>) @ 0x0000000012f344b0 in /usr/bin/clickhouse
3. DB::DataPartStorageOnDiskBase::freeze(String const&, String const&, std::function<void (std::shared_ptr<DB::IDisk> const&)>, DB::IDataPartStorage::ClonePartParams const&) const @ 0x0000000012b36c4d in /usr/bin/clickhouse
4. DB::IMergeTreeDataPart::makeCloneInDetached(String const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::IDiskTransaction> const&) const @ 0x0000000012b666a7 in /usr/bin/clickhouse
5. DB::StorageReplicatedMergeTree::removePartAndEnqueueFetch(String const&, bool) @ 0x00000000127cd187 in /usr/bin/clickhouse
6. DB::ReplicatedMergeTreeQueue::createLogEntriesToFetchBrokenParts() @ 0x0000000012ebc5b0 in /usr/bin/clickhouse
7. DB::ReplicatedMergeTreeRestartingThread::tryStartup() @ 0x0000000012ef6cc4 in /usr/bin/clickhouse
8. DB::ReplicatedMergeTreeRestartingThread::run() @ 0x0000000012ef4bf0 in /usr/bin/clickhouse
9. DB::BackgroundSchedulePool::threadFunction() @ 0x00000000110a3ddf in /usr/bin/clickhouse
10. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000110a4e11 in /usr/bin/clickhouse
11. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000c704604 in /usr/bin/clickhouse
12. ? @ 0x00007fdb235fa609 in ?
13. ? @ 0x00007fdb2351f353 in ?
 (version 23.8.12.13 (official build))

A bit later, the exception is thrown for the example part itself:

2024.06.05 11:41:46.345691 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242102_764786_764786_0: 
2024.06.05 11:41:46.351466 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351529 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try1 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351577 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try2 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351624 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try3 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351671 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try4 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351719 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try5 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351764 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try6 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351858 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try7 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351910 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try8 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351962 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try9 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.352943 [ 4361 ] {} <Error> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Failed to restart the table. Will try again: Code: 84. DB::Exception: Directory /mnt/HC_Volume_13358330/clickhouse/store/24e/24e58837-1475-49f4-a4e5-88371475e9f4/detached/covered-by-broken_20242002_938800_940504_564_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS), Stack trace (when copying this message, always include the lines below):

[stack trace identical to the one above] (version 23.8.12.13 (official build))

The checksums and columns of covered-by-broken_20242002_938800_940504_564_try9 and covered-by-broken_20242002_938800_940504_564 are the same.
Please let me know what data you need to resolve this; it's the 4th time we have fixed it manually by deleting parts in our production environment.

UPD: It looks like the issue resolved itself after a DB restart.
