**docs/deployment-guides/replication-sharding-examples/02_2_shards_1_replica.md**

@@ -557,11 +557,7 @@

## Create a table on the cluster {#creating-a-table}

Now that the database has been created, you will create a distributed table.
Distributed tables are tables which have access to shards located on different
hosts and are defined using the `Distributed` table engine. The distributed table
acts as the interface across all the shards in the cluster.

Now that the database has been created, you can create a table.

Run the following query from any of the host clients:

```sql
@@ -608,8 +604,6 @@
└─────────────────────┘
```
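
The `CREATE TABLE` statement for this step is hidden in the collapsed region above. As a rough sketch only, the table and cluster names below are taken from the distributed table definition further down the page, while the column list, sorting key, and use of a plain `MergeTree` engine are assumptions for illustration:

```sql
-- Sketch, not the guide's actual statement: columns and sorting key are placeholders.
-- A plain (non-replicated) MergeTree table is created on every host via ON CLUSTER.
CREATE TABLE uk.uk_price_paid_local ON CLUSTER cluster_2S_1R
(
    `price`     UInt32,
    `date`      Date,
    `postcode1` LowCardinality(String),
    `town`      LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, date);
```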

## Insert data into a distributed table {#inserting-data}

Before we insert the UK price paid data, let's perform a quick experiment to see
what happens when we insert data into an ordinary table from either host.

@@ -622,7 +616,7 @@

```sql
`id` UInt64,
`name` String
)
ENGINE = ReplicatedMergeTree
ENGINE = MergeTree()
ORDER BY id;
```
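
The opening line of the statement above is hidden by the collapsed region. Written out in full it most likely follows the pattern below; the table name `test.test_table` and the `ON CLUSTER` clause are inferred from the rest of the page rather than copied from the guide:

```sql
-- Sketch: the table name and ON CLUSTER clause are inferred; the rest matches the
-- visible fragment above. An ordinary (non-replicated) table exists on each host.
CREATE TABLE test.test_table ON CLUSTER cluster_2S_1R
(
    `id` UInt64,
    `name` String
)
ENGINE = MergeTree()
ORDER BY id;
```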

@@ -654,16 +648,18 @@

```sql
-- └────┴────────────────────┘
```

You will notice that only the row that was inserted into the table on that
Notice that, unlike with a `ReplicatedMergeTree` table, only the row that was inserted into the table on that
particular host is returned and not both rows.

To read the data from the two shards we need an interface which can handle queries
To read the data across the two shards, we need an interface which can handle queries
across all the shards, combining the data from both shards when we run select queries
on it, and handling the insertion of data to the separate shards when we run insert queries.
on it or inserting data to both shards when we run insert queries.

In ClickHouse this interface is called a distributed table, which we create using
In ClickHouse this interface is called a **distributed table**, which we create using
the [`Distributed`](/engines/table-engines/special/distributed) table engine. Let's take a look at how it works.

## Create a distributed table {#create-distributed-table}

Create a distributed table with the following query:

@@ -674,8 +670,12 @@
In this example, the `rand()` function is chosen as the sharding key so that
inserts are randomly distributed across the shards.
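
The statement that creates this distributed table is collapsed in the diff. Based on the table name queried below and the `Distributed` engine arguments shown later on this page, it likely looks roughly like the following; the `ON CLUSTER` and `AS` clauses are assumptions:

```sql
-- Sketch: Distributed(cluster, database, table, sharding_key) routes queries to the
-- local test.test_table on every shard; rand() spreads inserts randomly across shards.
CREATE TABLE test.test_table_dist ON CLUSTER cluster_2S_1R
AS test.test_table
ENGINE = Distributed('cluster_2S_1R', 'test', 'test_table', rand());
```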

Now query the distributed table from either host and you will get back
both of the rows which were inserted on the two hosts:
Now query the distributed table from either host, and you get back
both of the rows which were inserted on the two hosts, unlike in our previous example:

```sql
SELECT * FROM test.test_table_dist;
```

```sql
┌─id─┬─name───────────────┐
@@ -694,6 +694,8 @@
ENGINE = Distributed('cluster_2S_1R', 'uk', 'uk_price_paid_local', rand());
```

## Insert data into a distributed table {#inserting-data-into-distributed-table}

Now connect to either of the hosts and insert the data:
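
The actual `INSERT` statement is collapsed in this diff. As a heavily simplified sketch, loading data through the distributed table follows the pattern below; the source URL, format, and even the distributed table's name here are placeholders rather than the guide's real query:

```sql
-- Placeholder sketch: the real guide loads the UK price paid dataset from its published
-- source. Inserting through the distributed table fans rows out across both shards
-- according to the sharding key.
INSERT INTO uk.uk_price_paid_distributed
SELECT *
FROM url('https://example.com/uk-price-paid.csv', 'CSVWithNames');
```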

**docs/deployment-guides/replication-sharding-examples/03_2_shards_2_replicas.md**

@@ -586,12 +586,9 @@

```sql
└────────────────────┘
```

## Create a distributed table on the cluster {#creating-a-table}
## Create a table on the cluster {#creating-a-table}

Now that the database has been created, next you will create a distributed table.
Distributed tables are tables which have access to shards located on different
hosts and are defined using the `Distributed` table engine. The distributed table
acts as the interface across all the shards in the cluster.
Now that the database has been created, the next step is to create a table with replication.

Run the following query from any of the host clients:
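
The query itself sits in the collapsed part of this diff. A minimal sketch of the pattern it follows is shown below; the cluster name, table name, columns, and sorting key are illustrative assumptions, not the guide's actual statement:

```sql
-- Sketch only: names and columns are placeholders. ReplicatedMergeTree keeps each
-- shard's data in sync across its replicas, using the replication settings and
-- macros configured for the cluster.
CREATE TABLE uk.uk_price_paid_local ON CLUSTER cluster_2S_2R
(
    `price`     UInt32,
    `date`      Date,
    `postcode1` LowCardinality(String),
    `town`      LowCardinality(String)
)
ENGINE = ReplicatedMergeTree
ORDER BY (postcode1, date);
```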

@@ -663,14 +660,16 @@

## Insert data into a distributed table {#inserting-data-using-distributed}

To insert data into the distributed table, `ON CLUSTER` cannot be used as it does
To insert data into the table, `ON CLUSTER` cannot be used as it does
not apply to DML (Data Manipulation Language) queries such as `INSERT`, `UPDATE`,
and `DELETE`. To insert data, it is necessary to make use of the
[`Distributed`](/engines/table-engines/special/distributed) table engine.
As you learned in the [guide](/architecture/horizontal-scaling) for setting up a cluster with 2 shards and 1 replica, distributed tables are tables which have access to shards located on different
hosts and are defined using the `Distributed` table engine.
The distributed table acts as the interface across all the shards in the cluster.

From any of the host clients, run the following query to create a distributed table
using the existing table we created previously with `ON CLUSTER` and use of the
`ReplicatedMergeTree`:
using the existing replicated table we created in the previous step:

```sql
CREATE TABLE IF NOT EXISTS uk.uk_price_paid_distributed
@@ -749,4 +748,16 @@
└──────────┘
```
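
Most of this statement, along with the insert and verification steps that follow it, is collapsed in the diff. The definition most likely mirrors the distributed table from the 2-shard/1-replica guide; the cluster name, local table name, and sharding key below are assumptions:

```sql
-- Sketch: the distributed table reuses the structure of the local replicated table
-- and routes selects and inserts across both shards of the cluster.
CREATE TABLE IF NOT EXISTS uk.uk_price_paid_distributed ON CLUSTER cluster_2S_2R
AS uk.uk_price_paid_local
ENGINE = Distributed('cluster_2S_2R', 'uk', 'uk_price_paid_local', rand());
```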

</VerticalStepper>

## Conclusion {#conclusion}

The advantage of this cluster topology with 2 shards and 2 replicas is that it provides both scalability and fault tolerance.
Data is distributed across separate hosts, reducing storage and I/O requirements per node, while queries are processed in parallel across both shards for improved performance and memory efficiency.
Critically, the cluster can tolerate the loss of one node and continue serving queries without interruption, as each shard has a backup replica available on another node.

The main disadvantage of this cluster topology is the increased storage overhead—it requires twice the storage capacity compared to a setup without replicas, as each shard is duplicated.
Additionally, while the cluster can survive a single node failure, losing two nodes simultaneously may render the cluster inoperable, depending on which nodes fail and how shards are distributed.
This topology strikes a balance between availability and cost, making it suitable for production environments where some level of fault tolerance is required without the expense of higher replication factors.

To learn how ClickHouse Cloud processes queries, offering both scalability and fault-tolerance, see the section ["Parallel Replicas"](/deployment-guides/parallel-replicas).
@@ -2,5 +2,5 @@
The following steps will walk you through setting up the cluster from
scratch. If you prefer to skip these steps and jump straight to running the
cluster, you can obtain the example
files from the [examples repository](https://github.com/ClickHouse/examples/tree/main/docker-compose-recipes)
files from the examples repository ['docker-compose-recipes' directory](https://github.com/ClickHouse/examples/tree/main/docker-compose-recipes/recipes).
:::