
replica stuck in ReadOnly State #58126

Open
karolcienkosz opened this issue Dec 21, 2023 · 5 comments

Comments

@karolcienkosz

Hello,
I have a ClickHouse cluster (version 23.8.7) with multiple nodes. There is a table with 12 shards, and each shard has 3 replicas.

CREATE TABLE stats_s1.invoicing
(
    `received_at_hour` DateTime('Etc/UTC'),
    `endpoint_id` UInt32,
    `deal_line_item_id` UInt32,
    `format` LowCardinality(String),
    `costs` Int64
)
ENGINE = ReplicatedSummingMergeTree('/clickhouse/stats/tables/stats_s3/invoicing', 'clickhouse-3.clickhouse.clickhouse.svc.cluster.local')
PARTITION BY toYYYYMM(received_at_hour)
PRIMARY KEY (received_at_hour, deal_line_item_id, endpoint_id)
ORDER BY (received_at_hour, deal_line_item_id, endpoint_id, format)
SETTINGS index_granularity = 8192

The table is fed with data through the following flow:

app -> events_entry(TABLE, ENGINE: Null) --> mv_exchange_stats_entry(MATERIALIZED VIEW, ENGINE = Distributed) --> invoicing(TABLE, ENGINE=ReplicatedSummingMergeTree)
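
For reference, the pipeline roughly corresponds to DDL like the sketch below; only the invoicing definition above is real, and the cluster name and sharding key here are illustrative assumptions:

clickhouse-client -n <<'SQL'
-- entry point: the Null engine stores nothing, it only feeds materialized views
CREATE TABLE stats_s1.events_entry
(
    `received_at_hour` DateTime('Etc/UTC'),
    `endpoint_id` UInt32,
    `deal_line_item_id` UInt32,
    `format` LowCardinality(String),
    `costs` Int64
)
ENGINE = Null;

-- MV with a Distributed inner table: every insert into events_entry is routed
-- to the invoicing shards ('stats' cluster and rand() sharding key are hypothetical)
CREATE MATERIALIZED VIEW stats_s1.mv_exchange_stats_entry
ENGINE = Distributed('stats', 'stats_s1', 'invoicing', rand())
AS SELECT received_at_hour, endpoint_id, deal_line_item_id, format, costs
FROM stats_s1.events_entry;
SQL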

From time to time, when the node that stores a replica is restarted, the replica gets stuck in ReadOnly state with the following error (from the system.replicas table):
Code: 84. DB::Exception: Directory /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached/covered-by-broken_202311_820132_935806_43436_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS)

The ZooKeeper nodes are reachable the whole time, and the other 2 replicas of this shard do not report any issues.
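
For reference, this is how the readonly state and the stored exception can be pulled from system.replicas (is_readonly, zookeeper_exception and last_queue_update_exception are the relevant columns):

clickhouse-client -q "
    SELECT database, table, is_readonly, zookeeper_exception, last_queue_update_exception
    FROM system.replicas
    WHERE is_readonly"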

In the logs I can see:

<Warning> stats_s1.invoicing (ReplicatedMergeTreeRestartingThread): Table was in readonly mode. Will try to activate it.
...
Directory covered-by-broken_202311_879557_936229_18409_try1 (to detach to) already exists. Will detach to directory with '_tryN' suffix
...
Directory covered-by-broken_202311_879557_936229_18409_try9 (to detach to) already exists. Will detach to directory with '_tryN' suffix

But there is no log message (or I cannot find one) which explains why the replica cannot be activated.

To fix this replica, I need to detach it, drop it, and then attach, restore, and sync the replica.
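
Concretely, that recovery is something along these lines (replica name and ZooKeeper path taken from the DDL above; the ZKPATH form of DROP REPLICA is used because a replica cannot drop itself while the table is attached):

clickhouse-client -n <<'SQL'
DETACH TABLE stats_s1.invoicing;
-- remove this replica's metadata from ZooKeeper
SYSTEM DROP REPLICA 'clickhouse-3.clickhouse.clickhouse.svc.cluster.local'
    FROM ZKPATH '/clickhouse/stats/tables/stats_s3/invoicing';
ATTACH TABLE stats_s1.invoicing;           -- attaches in readonly mode, ZK metadata is gone
SYSTEM RESTORE REPLICA stats_s1.invoicing; -- recreates ZK metadata from local parts
SYSTEM SYNC REPLICA stats_s1.invoicing;    -- fetch whatever is missing from the other replicas
SQL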

Since 2 of 3 replicas are available, after a restart the node should be able to restore its replica from the others.

Any suggestions on what else I can check to troubleshoot this issue, or is there a chance that this is a bug?

Thank you in advance!

@den-crane
Contributor

detach table  stats_s1.invoicing;

rm -rf /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436

attach table  stats_s1.invoicing;

@karolcienkosz
Author

detach table  stats_s1.invoicing;

rm -rf /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436

attach table  stats_s1.invoicing;

I would like to understand what the problem is. As I mentioned above, I'm able to fix the readonly replica, but after some time, when the node is restarted, the issue occurs again.

Or is there any significant difference between "detach, drop, then attach, restore and sync replica" and your suggestion? @den-crane

@tavplubix
Member

But there is no log message (or I cannot find one) which explains why the replica cannot be activated.

You have found it: the reason is that the replica cannot detach the broken part and start up replication.

It would be great to find the first log messages about that part; those logs would explain which part was broken and why (202311_879557_936229_18409 is covered by a broken part; the actual broken part has another name).

Also, 202311_879557_936229_18409 should not have been detached many times because of this:

if (broken
    && isFullPartStorage(*this)
    && looksLikeBrokenDetachedPartHasTheSameContent(res, original_checksums_content, original_files_list))
{
    LOG_WARNING(log, "Directory {} (to detach to) already exists, "
                "but its content looks similar to content of the broken part which we are going to detach. "
                "Assuming it was already cloned to detached, will not do it again to avoid redundant copies of broken part.", res);
    return {};
}

LOG_WARNING(log, "Directory {} (to detach to) already exists. Will detach to directory with '_tryN' suffix.", res);

So it's worth checking whether /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436/ and /var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached/covered-by-broken_202311_820132_935806_43436_try*/ have the same list of files and the same content of checksums.txt and columns.txt.
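
A quick way to run that check from a shell on the affected node (paths taken from the error above):

PART=/var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/202311_820132_935806_43436
DETACHED=/var/lib/clickhouse/store/1e1/1e16af1b-dcd8-4d1a-9a86-a21f62290d62/detached

# same list of files in the part and in one of the detached copies?
diff <(ls "$PART" | sort) <(ls "$DETACHED/covered-by-broken_202311_820132_935806_43436_try1" | sort)

# same checksums.txt and columns.txt across the part and all detached copies?
md5sum "$PART"/{checksums.txt,columns.txt} \
       "$DETACHED"/covered-by-broken_202311_820132_935806_43436*/{checksums.txt,columns.txt}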

@SaltTan
Contributor

SaltTan commented Apr 3, 2024

I had this happen to me with 23.12.4.15 after an unexpected restart.
Unfortunately I deleted the parts, so I cannot check what was wrong with them.
The only information I have is that the size of parts in system.parts (both compressed and uncompressed) was the same as in system.part_log.
The system.detached_parts.bytes_on_disk number was a bit higher than system.parts.bytes_on_disk, though.
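
For anyone comparing the same numbers, queries along these lines should do it (the table name is a placeholder; bytes_on_disk in system.detached_parts is available in recent releases):

clickhouse-client -q "
    SELECT name, bytes_on_disk, data_compressed_bytes, data_uncompressed_bytes
    FROM system.parts
    WHERE table = 'invoicing' AND active"

clickhouse-client -q "
    SELECT part_name, size_in_bytes
    FROM system.part_log
    WHERE table = 'invoicing' AND event_type = 'NewPart'"

clickhouse-client -q "
    SELECT name, reason, bytes_on_disk
    FROM system.detached_parts
    WHERE table = 'invoicing'"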

@viacheslav-mik

viacheslav-mik commented Jun 5, 2024

Same issue after a hard reset on 23.8.12.13 with a ReplicatedMergeTree engine.
I thought it was fixed in #41981.
There are a lot of broken parts and at least 5 broken tables; let's review one of the parts.
Note: the hard shutdown happened at 11:10 and the server started at 11:40.

Here is the first mention of the example part in the error log:

2024.06.05 11:40:17.051139 [ 4380 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Source replica does not have part 20242002_938800_940504_564. Removing it from ZooKeeper.

Then the following error is logged a bit later (a bunch of other broken-part log lines are skipped here):

2024.06.05 11:40:31.902858 [ 4393 ] {} <Warning> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Table was in readonly mode. Will try to activate it.
2024.06.05 11:40:31.932487 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242202_618911_618911_0: 
2024.06.05 11:40:31.938434 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242102_764786_764786_0: 
2024.06.05 11:40:31.941086 [ 4393 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
....
2024.06.05 11:40:31.961942 [ 4393 ] {} <Error> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Failed to restart the table. Will try again: Code: 84. DB::Exception: Directory /mnt/HC_Volume_13358330/clickhouse/store/24e/24e58837-1475-49f4-a4e5-88371475e9f4/detached/covered-by-broken_20242002_938800_940633_689_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c623797 in /usr/bin/clickhouse
1. DB::Exception::Exception<String>(int, FormatStringHelperImpl<std::type_identity<String>::type>, String&&) @ 0x000000000713252d in /usr/bin/clickhouse
2. DB::localBackup(std::shared_ptr<DB::IDisk> const&, String const&, String const&, bool, std::optional<unsigned long>, bool, std::unordered_set<String, std::hash<String>, std::equal_to<String>, std::allocator<String>> const&, std::shared_ptr<DB::IDiskTransaction>) @ 0x0000000012f344b0 in /usr/bin/clickhouse
3. DB::DataPartStorageOnDiskBase::freeze(String const&, String const&, std::function<void (std::shared_ptr<DB::IDisk> const&)>, DB::IDataPartStorage::ClonePartParams const&) const @ 0x0000000012b36c4d in /usr/bin/clickhouse
4. DB::IMergeTreeDataPart::makeCloneInDetached(String const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::IDiskTransaction> const&) const @ 0x0000000012b666a7 in /usr/bin/clickhouse
5. DB::StorageReplicatedMergeTree::removePartAndEnqueueFetch(String const&, bool) @ 0x00000000127cd187 in /usr/bin/clickhouse
6. DB::ReplicatedMergeTreeQueue::createLogEntriesToFetchBrokenParts() @ 0x0000000012ebc5b0 in /usr/bin/clickhouse
7. DB::ReplicatedMergeTreeRestartingThread::tryStartup() @ 0x0000000012ef6cc4 in /usr/bin/clickhouse
8. DB::ReplicatedMergeTreeRestartingThread::run() @ 0x0000000012ef4bf0 in /usr/bin/clickhouse
9. DB::BackgroundSchedulePool::threadFunction() @ 0x00000000110a3ddf in /usr/bin/clickhouse
10. void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000110a4e11 in /usr/bin/clickhouse
11. void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000c704604 in /usr/bin/clickhouse
12. ? @ 0x00007fdb235fa609 in ?
13. ? @ 0x00007fdb2351f353 in ?
 (version 23.8.12.13 (official build))

A bit later, the exception is thrown for the example part itself:

2024.06.05 11:41:46.345691 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Detached 0 parts covered by broken part 20242102_764786_764786_0: 
2024.06.05 11:41:46.351466 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351529 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try1 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351577 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try2 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351624 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try3 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351671 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try4 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351719 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try5 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351764 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try6 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351858 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try7 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351910 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try8 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.351962 [ 4361 ] {} <Warning> cloud.edr_process_started (24e58837-1475-49f4-a4e5-88371475e9f4): Directory covered-by-broken_20242002_938800_940504_564_try9 (to detach to) already exists. Will detach to directory with '_tryN' suffix.
2024.06.05 11:41:46.352943 [ 4361 ] {} <Error> cloud.edr_process_started (ReplicatedMergeTreeRestartingThread): Failed to restart the table. Will try again: Code: 84. DB::Exception: Directory /mnt/HC_Volume_13358330/clickhouse/store/24e/24e58837-1475-49f4-a4e5-88371475e9f4/detached/covered-by-broken_20242002_938800_940504_564_try9 already exists and is not empty. (DIRECTORY_ALREADY_EXISTS), Stack trace (when copying this message, always include the lines below):

[stack trace identical to the one above] (version 23.8.12.13 (official build))

The checksums and columns of covered-by-broken_20242002_938800_940504_564_try9 and covered-by-broken_20242002_938800_940504_564 are the same.
Please let me know what data you need to resolve this; it's the 4th time we have fixed it manually by deleting parts in our production environment.

UPD: It looks like the issue resolved itself after a DB restart.
