-
Hi! May I know the detailed exception message you get when the disk failure happens?
-
The scenario is as follows:
Our table A uses the ReplicatedMergeTree engine and is exposed through a distributed table spanning 5 shards with 2 replicas each. In the cluster's shard configuration, internal_replication is set to true, so data replication between replicas is handled by the ReplicatedMergeTree engine itself.
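For reference, one shard of the configuration described might look like the following fragment of a ClickHouse remote_servers definition; the cluster name, hostnames, and port are assumptions, not taken from our actual setup:

```xml
<remote_servers>
    <my_cluster> <!-- hypothetical cluster name -->
        <shard>
            <!-- replication is done by ReplicatedMergeTree, not by the
                 Distributed engine, so writes go to one replica only -->
            <internal_replication>true</internal_replication>
            <replica><host>host1</host><port>9000</port></replica>
            <replica><host>host2</host><port>9000</port></replica>
        </shard>
        <!-- ... 4 more shards with the same layout ... -->
    </my_cluster>
</remote_servers>
```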
Our workflow is that Spark writes through nginx to the local table A on a randomly chosen ClickHouse node, rather than writing into the distributed table directly.
We have observed the following when a disk failure happens on ClickHouse. Suppose host2 is the replica of host1 for a shard of table A. If a write to host1 fails because of a disk failure on host1, an exception is returned to the Spark client, which triggers a task retry that sends another write request through nginx.
However, even though the write to host1 reported failure, ReplicatedMergeTree still replicates that first insert to host2, so the first request ends up being written successfully. As a result, the same batch of data is written twice.
How can we avoid such a situation?
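One relevant mechanism here is ReplicatedMergeTree's built-in insert deduplication (controlled by the insert_deduplicate setting and the replicated_deduplication_window merge-tree setting, which defaults to remembering the last 100 blocks): an insert is deduplicated only if a retry resends a block with an identical checksum to the same shard. The toy model below is our own illustration, not ClickHouse code; the DedupWindow class is hypothetical. It shows why a retry that reorders or re-partitions the batch, or lands on a different shard, is not caught:

```python
from collections import deque
import hashlib

class DedupWindow:
    """Toy model (hypothetical) of ReplicatedMergeTree insert deduplication:
    checksums of the last N inserted blocks are remembered, and a retried
    block with an identical checksum is silently skipped."""

    def __init__(self, window=100):
        self.seen = deque(maxlen=window)  # rolling window of block checksums
        self.rows = []                    # the "table" contents

    @staticmethod
    def checksum(block):
        # The checksum depends on the exact content AND the row order.
        return hashlib.sha256("\n".join(block).encode()).hexdigest()

    def insert(self, block):
        h = self.checksum(block)
        if h in self.seen:
            return False                  # identical retry: deduplicated
        self.seen.append(h)
        self.rows.extend(block)
        return True

table = DedupWindow()
batch = ["row1", "row2", "row3"]

assert table.insert(batch) is True    # first attempt is stored
assert table.insert(batch) is False   # byte-identical retry is deduplicated

# A retry that reorders (or re-partitions) the rows has a different
# checksum, so deduplication does not catch it -> the batch lands twice.
assert table.insert(["row2", "row1", "row3"]) is True
print(len(table.rows))  # 6: three original rows plus the reordered retry
```

This suggests that for deduplication to protect against retries, the retried insert must be byte-identical to the original and routed to the same shard, which random routing through nginx does not guarantee.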