Improve CHECK TABLE system query #52745

Merged (7 commits) on Aug 10, 2023

docs/en/sql-reference/statements/check-table.md: 99 changes (76 additions, 23 deletions)
@@ -5,19 +5,38 @@ sidebar_label: CHECK TABLE
 title: "CHECK TABLE Statement"
 ---
 
-Checks if the data in the table is corrupted.
+The `CHECK TABLE` query in ClickHouse performs a validation check on a specific table or its partitions. It ensures data integrity by verifying checksums and other internal data structures.
 
-``` sql
-CHECK TABLE [db.]name [PARTITION partition_expr]
+In particular, it compares actual file sizes with the expected values stored on the server. If the file sizes do not match the stored values, the data is corrupted. This can happen, for example, after a system crash during query execution.
+
+:::note
+The `CHECK TABLE` query may read all the data in the table and hold resources while it runs, which makes it resource-intensive.
+Consider the potential impact on performance and resource utilization before executing this query.
+:::
+
+## Syntax
+
+The basic syntax of the query is as follows:
+
+```sql
+CHECK TABLE table_name [PARTITION partition_expression] [FORMAT format] [SETTINGS check_query_single_value_result = (0|1) [, other_settings] ]
 ```
 
-The `CHECK TABLE` query compares actual file sizes with the expected values which are stored on the server. If the file sizes do not match the stored values, it means the data is corrupted. This can be caused, for example, by a system crash during query execution.
+- `table_name`: Specifies the name of the table that you want to check.
+- `partition_expression`: (Optional) If you want to check a specific partition of the table, you can use this expression to specify the partition.
+- `FORMAT format`: (Optional) Allows you to specify the output format of the result.
+- `SETTINGS`: (Optional) Allows additional settings.
+  - **`check_query_single_value_result`**: (Optional) Toggles between a detailed result (`0`) and a summarized result (`1`).
+  - Other settings (e.g. `max_threads`) can be applied as well.
+
 
-The query response contains the `result` column with a single row. The row has a value of
-[Boolean](../../sql-reference/data-types/boolean.md) type:
+The query response depends on the value of the `check_query_single_value_result` setting.
+With `check_query_single_value_result = 1`, only the `result` column with a single row is returned. The value in this row is `1` if the integrity check passes and `0` if the data is corrupted.
 
-- 0 - The data in the table is corrupted.
-- 1 - The data maintains integrity.
+With `check_query_single_value_result = 0`, the query returns the following columns:
+- `part_path`: Indicates the path to the data part or file name.
+- `is_passed`: Returns 1 if the check for this part is successful, 0 otherwise.
+- `message`: Any additional messages related to the check, such as errors or success messages.
 
 The `CHECK TABLE` query supports the following table engines:
 
@@ -26,44 +45,78 @@ The `CHECK TABLE` query supports the following table engines:
 - [StripeLog](../../engines/table-engines/log-family/stripelog.md)
 - [MergeTree family](../../engines/table-engines/mergetree-family/mergetree.md)
 
-Performed over the tables with another table engines causes an exception.
+Running the query on tables with other table engines throws a `NOT_IMPLEMENTED` exception.
 
 Engines from the `*Log` family do not provide automatic data recovery on failure. Use the `CHECK TABLE` query to track data loss in a timely manner.
 
-## Checking the MergeTree Family Tables
+## Examples
 
-For `MergeTree` family engines, if [check_query_single_value_result](../../operations/settings/settings.md#check_query_single_value_result) = 0, the `CHECK TABLE` query shows a check status for every individual data part of a table on the local server.
+By default, the `CHECK TABLE` query shows the general table check status:
 
 ```sql
-SET check_query_single_value_result = 0;
 CHECK TABLE test_table;
 ```
 
 ```text
-┌─part_path─┬─is_passed─┬─message─┐
-│ all_1_4_1 │         1 │         │
-│ all_1_4_2 │         1 │         │
-└───────────┴───────────┴─────────┘
+┌─result─┐
+│      1 │
+└────────┘
 ```
 
-If `check_query_single_value_result` = 1, the `CHECK TABLE` query shows the general table check status.
+To see the check status for every individual data part, set `check_query_single_value_result` to `0`.
 
+Also, to check a specific partition of the table, you can use the `PARTITION` keyword.
+
 ```sql
-SET check_query_single_value_result = 1;
-CHECK TABLE test_table;
+CHECK TABLE t0 PARTITION ID '201003'
+FORMAT PrettyCompactMonoBlock
+SETTINGS check_query_single_value_result = 0
 ```
 
+Output:
+
 ```text
-┌─result─┐
-│      1 │
-└────────┘
+┌─part_path────┬─is_passed─┬─message─┐
+│ 201003_7_7_0 │         1 │         │
+│ 201003_3_3_0 │         1 │         │
+└──────────────┴───────────┴─────────┘
 ```
 
+### Receiving a 'Corrupted' Result
+
+:::warning
+Disclaimer: The procedure described here, including manually manipulating or removing files directly from the data directory, is for experimental or development environments only. Do **not** attempt this on a production server, as it may lead to data loss or other unintended consequences.
+:::
+
+Remove the existing checksum file:
+
+```bash
+rm /var/lib/clickhouse-server/data/default/t0/201003_3_3_0/checksums.txt
+```
+
+```sql
+CHECK TABLE t0 PARTITION ID '201003'
+FORMAT PrettyCompactMonoBlock
+SETTINGS check_query_single_value_result = 0
+```
+
+Output:
+
+```text
+┌─part_path────┬─is_passed─┬─message──────────────────────────────────┐
+│ 201003_7_7_0 │         1 │                                          │
+│ 201003_3_3_0 │         1 │ Checksums recounted and written to disk. │
+└──────────────┴───────────┴──────────────────────────────────────────┘
+```
+
+If the `checksums.txt` file is missing, it can be restored: it is recalculated and rewritten when `CHECK TABLE` runs on that partition, and the check is still reported as passed.
+
+
 ## If the Data Is Corrupted
 
 If the table is corrupted, you can copy the non-corrupted data to another table. To do this:
 
 1. Create a new table with the same structure as the damaged table. To do this, execute the query `CREATE TABLE <new_table_name> AS <damaged_table_name>`.
-2. Set the [max_threads](../../operations/settings/settings.md#settings-max_threads) value to 1 to process the next query in a single thread. To do this run the query `SET max_threads = 1`.
+2. Set the `max_threads` value to 1 to process the next query in a single thread. To do this, run the query `SET max_threads = 1`.
 3. Execute the query `INSERT INTO <new_table_name> SELECT * FROM <damaged_table_name>`. This request copies the non-corrupted data from the damaged table to another table. Only the data before the corrupted part will be copied.
 4. Restart the `clickhouse-client` to reset the `max_threads` value.
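The two result modes documented above are related: the single `result` row summarizes the per-part rows. The C++ sketch below shows one way to derive that summary, assuming it is `1` only when every individual check passes. `PartCheckRow` and `summarizeCheck` are hypothetical names chosen for illustration; they are not ClickHouse internals.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of one row returned with check_query_single_value_result = 0
// (columns part_path, is_passed, message). Illustration only, not a ClickHouse type.
struct PartCheckRow
{
    std::string part_path;
    bool is_passed = false;
    std::string message;
};

// Assumption: the single `result` value returned with
// check_query_single_value_result = 1 is 1 only when every individual check passed.
int summarizeCheck(const std::vector<PartCheckRow> & rows)
{
    for (const auto & row : rows)
    {
        if (!row.is_passed)
            return 0; // one failed part or file is enough to report corruption
    }
    return 1;
}
```

For the partition example above, both rows have `is_passed = 1`, so this sketch would summarize them as `1`.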
src/Common/FileChecker.cpp: 42 changes (22 additions, 20 deletions)
@@ -82,33 +82,35 @@ size_t FileChecker::getTotalSize() const
 }
 
 
-CheckResults FileChecker::check() const
+FileChecker::DataValidationTasksPtr FileChecker::getDataValidationTasks()
 {
-    if (map.empty())
-        return {};
-
-    CheckResults results;
+    return std::make_unique<DataValidationTasks>(map);
+}
 
-    for (const auto & name_size : map)
+CheckResult FileChecker::checkNextEntry(DataValidationTasksPtr & check_data_tasks, bool & has_nothing_to_do) const
+{
+    String name;
+    size_t expected_size;
+    bool is_finished = check_data_tasks->next(name, expected_size);
+    if (is_finished)
     {
-        const String & name = name_size.first;
-        String path = parentPath(files_info_path) + name;
-        bool exists = fileReallyExists(path);
-        auto real_size = exists ? getRealFileSize(path) : 0; /// No race condition assuming no one else is working with these files.
+        has_nothing_to_do = true;
+        return {};
+    }
 
-        if (real_size != name_size.second)
-        {
-            String failure_message = exists
-                ? ("Size of " + path + " is wrong. Size is " + toString(real_size) + " but should be " + toString(name_size.second))
-                : ("File " + path + " doesn't exist");
-            results.emplace_back(name, false, failure_message);
-            break;
-        }
+    String path = parentPath(files_info_path) + name;
+    bool exists = fileReallyExists(path);
+    auto real_size = exists ? getRealFileSize(path) : 0; /// No race condition assuming no one else is working with these files.
 
-        results.emplace_back(name, true, "");
+    if (real_size != expected_size)
+    {
+        String failure_message = exists
+            ? ("Size of " + path + " is wrong. Size is " + toString(real_size) + " but should be " + toString(expected_size))
+            : ("File " + path + " doesn't exist");
+        return CheckResult(name, false, failure_message);
     }
 
-    return results;
+    return CheckResult(name, true, "");
 }
 
 void FileChecker::repair()
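The change above replaces the all-at-once `check()` with `checkNextEntry()`, which validates one file per call. The sketch below restates that pattern in self-contained C++ under stated assumptions: `FileCheckResult`, `SizeCheckCursor`, and `checkAll` are hypothetical names, an in-memory map stands in for the file system, messages use the bare file name rather than the full path, and `std::optional` replaces the `has_nothing_to_do` out-parameter.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for ClickHouse's CheckResult: file name, pass/fail, message.
struct FileCheckResult
{
    std::string name;
    bool success = false;
    std::string message;
};

// Single-threaded cursor over the expected sizes, modeled on
// FileChecker::DataValidationTasks::next() (the real struct also locks a mutex).
class SizeCheckCursor
{
public:
    explicit SizeCheckCursor(const std::map<std::string, std::uintmax_t> & expected_)
        : expected(expected_), it(expected.begin())
    {
    }

    // Returns true when no entries are left; otherwise yields the next name and size.
    bool next(std::string & out_name, std::uintmax_t & out_size)
    {
        if (it == expected.end())
            return true;
        out_name = it->first;
        out_size = it->second;
        ++it;
        return false;
    }

private:
    const std::map<std::string, std::uintmax_t> & expected;
    std::map<std::string, std::uintmax_t>::const_iterator it;
};

// Checks exactly one file per call. `actual_sizes` simulates the file system:
// a missing key means the file does not exist. std::nullopt means nothing is left.
std::optional<FileCheckResult> checkNextEntry(
    SizeCheckCursor & cursor, const std::map<std::string, std::uintmax_t> & actual_sizes)
{
    std::string name;
    std::uintmax_t expected_size = 0;
    if (cursor.next(name, expected_size))
        return std::nullopt;

    auto found = actual_sizes.find(name);
    if (found == actual_sizes.end())
        return FileCheckResult{name, false, "File " + name + " doesn't exist"};
    if (found->second != expected_size)
        return FileCheckResult{name, false,
            "Size of " + name + " is wrong. Size is " + std::to_string(found->second)
                + " but should be " + std::to_string(expected_size)};
    return FileCheckResult{name, true, ""};
}

// Drives the cursor until it is exhausted, collecting one result per file.
std::vector<FileCheckResult> checkAll(
    const std::map<std::string, std::uintmax_t> & expected,
    const std::map<std::string, std::uintmax_t> & actual_sizes)
{
    SizeCheckCursor cursor(expected);
    std::vector<FileCheckResult> results;
    while (auto result = checkNextEntry(cursor, actual_sizes))
        results.push_back(*result);
    return results;
}
```

The removed `check()` stopped at the first mismatched file (`break`); in this sketch the driver keeps going and records a result for every file, leaving any early exit to the caller.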
src/Common/FileChecker.h: 37 changes (36 additions, 1 deletion)
@@ -3,6 +3,7 @@
 #include <Storages/CheckResults.h>
 #include <map>
 #include <base/types.h>
+#include <mutex>
 
 namespace Poco { class Logger; }
 
@@ -28,7 +29,11 @@ class FileChecker
     bool empty() const { return map.empty(); }
 
     /// Check the files whose parameters are specified in sizes.json
-    CheckResults check() const;
+    /// See comment in IStorage::checkDataNext
+    struct DataValidationTasks;
+    using DataValidationTasksPtr = std::unique_ptr<DataValidationTasks>;
+    DataValidationTasksPtr getDataValidationTasks();
+    CheckResult checkNextEntry(DataValidationTasksPtr & check_data_tasks, bool & has_nothing_to_do) const;
 
     /// Truncate files that have excessive size to the expected size.
     /// Throw exception if the file size is less than expected.
@@ -41,6 +46,36 @@ class FileChecker
     /// Returns total size of all files.
     size_t getTotalSize() const;
 
+    struct DataValidationTasks
+    {
+        DataValidationTasks(const std::map<String, size_t> & map_)
+            : map(map_), it(map.begin())
+        {}
+
+        bool next(String & out_name, size_t & out_size)
+        {
+            std::lock_guard lock(mutex);
+            if (it == map.end())
+                return true;
+            out_name = it->first;
+            out_size = it->second;
+            ++it;
+            return false;
+        }
+
+        size_t size() const
+        {
+            std::lock_guard lock(mutex);
+            return std::distance(it, map.end());
+        }
+
+        const std::map<String, size_t> & map;
+
+        mutable std::mutex mutex;
+        using Iterator = std::map<String, size_t>::const_iterator;
+        Iterator it;
+    };
+
 private:
     void load();
 