
Issue 43666: Add skip_unavailable_shards as a setting for Distributed table. #57218

Merged

merged 5 commits into ClickHouse:master on Dec 18, 2023

Conversation

@tntnatbry (Contributor) commented Nov 25, 2023

This setting, when enabled (it is disabled by default), allows ClickHouse to silently skip unavailable shards of a Distributed table during query execution, instead of throwing an exception to the client.

Changelog category:

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add skip_unavailable_shards as a setting for Distributed tables that is similar to the corresponding query-level setting. Closes #43666.
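For reference, a minimal sketch of how the two levels interact, using the same statements as the verification steps below:

CREATE TABLE t1_distributed (ID UInt32, Name String)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
SETTINGS skip_unavailable_shards = 1;  -- table-level: applies to every query on this table

SELECT * FROM t1_distributed SETTINGS skip_unavailable_shards = 0;  -- query-level: overrides the table-level value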

@CLAassistant commented Nov 25, 2023

CLA assistant check
All committers have signed the CLA.

@tntnatbry (Contributor, Author):

Setup: a 3-shard ClickHouse cluster (1 replica per shard) running in Docker.

On Shard 1:

d1da863ce311 :) select * from system.clusters;

SELECT *
FROM system.clusters

Query id: a9449f11-21ea-40d2-be64-0fcb062b8f04

┌─cluster────────────┬─shard_num─┬─shard_weight─┬─internal_replication─┬─replica_num─┬─host_name───┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┬─database_shard_name─┬─database_replica_name─┬─is_active─┐
│ replicated_cluster │         1 │            1 │                    0 │           1 │ clickhouse1 │ 192.168.112.4 │ 9000 │        1 │ default │                  │            0 │               0 │                       0 │                     │                       │      ᴺᵁᴸᴸ │
│ replicated_cluster │         2 │            1 │                    0 │           1 │ clickhouse2 │ 192.168.112.3 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │                     │                       │      ᴺᵁᴸᴸ │
│ replicated_cluster │         3 │            1 │                    0 │           1 │ clickhouse3 │ 192.168.112.5 │ 9000 │        0 │ default │                  │            0 │               0 │                       0 │                     │                       │      ᴺᵁᴸᴸ │
└────────────────────┴───────────┴──────────────┴──────────────────────┴─────────────┴─────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────┘

3 rows in set. Elapsed: 0.035 sec.

d1da863ce311 :) show create table t1;

SHOW CREATE TABLE t1

Query id: ce5bdb07-c2a3-4bc1-968a-8d4ef46f0396

┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.t1
(
    `ID` UInt32,
    `Name` String
)
ENGINE = MergeTree
ORDER BY ID
SETTINGS index_granularity = 8192 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.012 sec.

Add one row to t1 individually on each shard:
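The exact INSERT statements were not captured in this transcript; they would have been along these lines, each run against the local t1:

INSERT INTO t1 VALUES (1, 'a');  -- on Shard 1
INSERT INTO t1 VALUES (2, 'b');  -- on Shard 2
INSERT INTO t1 VALUES (3, 'c');  -- on Shard 3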
On Shard 1:

d1da863ce311 :) select * from t1;

SELECT *
FROM t1

Query id: 249ca169-1c6b-45e1-ace4-4cc33d5d90d3

┌─ID─┬─Name─┐
│  1 │ a    │
└────┴──────┘

1 row in set. Elapsed: 0.012 sec.

On Shard 2:

6377136da30b :) select * from t1;

SELECT *
FROM t1

Query id: 54deffa0-f52a-4ae5-9a54-03377775830f

┌─ID─┬─Name─┐
│  2 │ b    │
└────┴──────┘

1 row in set. Elapsed: 0.015 sec.

On Shard 3:

91dc18ab6188 :) select * from t1;

SELECT *
FROM t1

Query id: e0c54baa-9cc1-4e93-b591-c1d331233bbe

┌─ID─┬─Name─┐
│  3 │ c    │
└────┴──────┘

1 row in set. Elapsed: 0.011 sec.

Now, create the Distributed version of t1 on Shard 1:

d1da863ce311 :) CREATE TABLE t1_distributed
(
    ID UInt32,
    Name String
) 
ENGINE = Distributed('replicated_cluster', 'default', t1, rand());

CREATE TABLE t1_distributed
(
    `ID` UInt32,
    `Name` String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())

Query id: 8f1b8e11-7d63-4718-8b08-ea0e5af983b4

Ok.

0 rows in set. Elapsed: 0.009 sec.


d1da863ce311 :) select * from t1_distributed;

SELECT *
FROM t1_distributed

Query id: 96e99458-e47a-4065-931f-70a15df84540

┌─ID─┬─Name─┐
│  1 │ a    │
└────┴──────┘
┌─ID─┬─Name─┐
│  2 │ b    │
└────┴──────┘
┌─ID─┬─Name─┐
│  3 │ c    │
└────┴──────┘

3 rows in set. Elapsed: 0.052 sec.

As expected, all 3 rows are returned, since all 3 shards are online.

Now, stop Shard 2. On Shard 1:

d1da863ce311 :) select * from t1_distributed;

SELECT *
FROM t1_distributed

Query id: 76f34040-4577-409c-8b2a-b9bbe7c413a0

┌─ID─┬─Name─┐
│  1 │ a    │
└────┴──────┘
┌─ID─┬─Name─┐
│  3 │ c    │
└────┴──────┘
↗ Progress: 2.00 rows, 28.00 B (3.67 rows/s., 51.36 B/s.)  99%
2 rows in set. Elapsed: 0.545 sec. 

Received exception from server (version 23.11.1):
Code: 279. DB::Exception: Received from localhost:9000. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 32. DB::Exception: Attempt to read after eof. (ATTEMPT_TO_READ_AFTER_EOF) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)

: While executing Remote. (ALL_CONNECTION_TRIES_FAILED)

As expected, an exception is thrown to the client, since Shard 2 is unreachable.

Now, re-create t1_distributed on Shard 1 with the new setting:

d1da863ce311 :) drop table t1_distributed;

DROP TABLE t1_distributed

Query id: f242a3cf-ab62-4eef-a2c5-07a7591178c6

Ok.

0 rows in set. Elapsed: 0.004 sec. 

d1da863ce311 :) CREATE TABLE t1_distributed
(
    ID UInt32,
    Name String
) 
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
SETTINGS skip_unavailable_shards=true;

CREATE TABLE t1_distributed
(
    `ID` UInt32,
    `Name` String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
SETTINGS skip_unavailable_shards = 1

Query id: 14aebde8-f4bf-4631-b29e-2ae97da5ef36

Ok.

0 rows in set. Elapsed: 0.009 sec. 

d1da863ce311 :) select * from t1_distributed;

SELECT *
FROM t1_distributed

Query id: 217b2347-9924-4d1e-92fa-72150da484b1

┌─ID─┬─Name─┐
│  1 │ a    │
└────┴──────┘
┌─ID─┬─Name─┐
│  3 │ c    │
└────┴──────┘

2 rows in set. Elapsed: 0.476 sec.

As expected, ClickHouse now silently skips the unavailable Shard 2.

Confirm that the query-level setting overrides the table-level setting:

d1da863ce311 :) select * from t1_distributed SETTINGS skip_unavailable_shards=false;

SELECT *
FROM t1_distributed
SETTINGS skip_unavailable_shards = 0

Query id: 6b09dc96-9a4b-4670-b41f-1ead48998d96

┌─ID─┬─Name─┐
│  1 │ a    │
└────┴──────┘
┌─ID─┬─Name─┐
│  3 │ c    │
└────┴──────┘
← Progress: 2.00 rows, 28.00 B (2.31 rows/s., 32.39 B/s.)  99%
2 rows in set. Elapsed: 0.864 sec. 

Received exception from server (version 23.11.1):
Code: 279. DB::Exception: Received from localhost:9000. DB::NetException. DB::NetException: All connection tries failed. Log: 

Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)

: While executing Remote. (ALL_CONNECTION_TRIES_FAILED)

I have also confirmed that the user-level setting overrides the table-level setting as expected.
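For completeness, such a user-level override can be expressed, for example, as a setting attached to a user; a minimal sketch (the user name here is hypothetical):

CREATE USER strict_user SETTINGS skip_unavailable_shards = 0;
-- hypothetical user for illustration; queries run as strict_user against t1_distributed
-- would again fail with ALL_CONNECTION_TRIES_FAILED while Shard 2 is down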

@xiedeyantu (Contributor):

Why do we need this feature? It can lead to incorrect data.

@alexey-milovidov added the "can be tested" label (Allows running workflows for external contributors) on Nov 26, 2023
@@ -17,6 +17,8 @@ class ASTStorage;
#define LIST_OF_DISTRIBUTED_SETTINGS(M, ALIAS) \
M(Bool, fsync_after_insert, false, "Do fsync for every inserted. Will decreases performance of inserts (only for background INSERT, i.e. distributed_foreground_insert=false)", 0) \
M(Bool, fsync_directories, false, "Do fsync for temporary directory (that is used for background INSERT only) after all part operations (writes, renames, etc.).", 0) \
/** This is the distributed version of the skip_unavailable_shards setting available in src/Core/Settings.h */ \
M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
Member:

The comment is ambiguous: it could be read as if only DNS failures are skipped, but connection failures, and even shards with non-existent tables, are skipped as well.

Member:

If the original description has the same problem, it should be fixed as well.

Contributor (Author):

I have updated the comment. Please take a look and let me know if the new description is OK, or if it needs further updates. I have also updated the comment in the original description in src/Core/Settings.h.

@alexey-milovidov (Member):

A test needs to be added. It can be a functional test using the test clusters. You can take an existing test for skip_unavailable_shards as an example.
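(Such a functional test is a plain .sql script run against the built-in test clusters. A minimal sketch, assuming the names that appear later in this thread; the actual test file may differ:)

SET send_logs_level = 'fatal';
CREATE TABLE table_02916 (n UInt64) ENGINE = MergeTree ORDER BY n;  -- local table name is hypothetical
CREATE TABLE table_02916_distributed (n UInt64)
ENGINE = Distributed('test_unavailable_shard', currentDatabase(), table_02916, n)
SETTINGS skip_unavailable_shards = 1;
INSERT INTO table_02916 VALUES (1);
SELECT *, _shard_num FROM table_02916_distributed;  -- only the reachable shard responds
SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }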

@alexey-milovidov removed the "can be tested" label (Allows running workflows for external contributors) on Nov 26, 2023
@alexey-milovidov alexey-milovidov self-assigned this Nov 26, 2023
@alexey-milovidov (Member):

Missing changelog entry.

@robot-ch-test-poll1 added the "pr-feature" label (Pull request with new product feature) on Nov 26, 2023
@robot-ch-test-poll1 (Contributor) commented Nov 26, 2023

This is an automated comment for commit bae58fe with a description of existing statuses. It is updated for the latest CI run.

❌ Overall status: failure

Successful checks:

  ✅ AST fuzzer: Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help.
  ✅ CI running: A meta-check that indicates the running CI. Normally it is in the success or pending state. A failed status indicates some problems with the PR.
  ✅ ClickHouse build check: Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log by grepping for cmake. Use these options and follow the general build process.
  ✅ Compatibility check: Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help.
  ✅ Docker image for servers: The check to build and optionally push the mentioned image to Docker Hub.
  ✅ Fast test: Normally this is the first check run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here.
  ✅ Flaky tests: Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once, or runs too long, this check will be red. We don't allow flaky tests; read the doc.
  ✅ Install packages: Checks that the built packages are installable in a clean environment.
  ✅ Mergeable Check: Checks that all other necessary checks are successful.
  ✅ Performance Comparison: Measures changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests.
  ✅ Push to Dockerhub: The check for building and pushing the CI-related docker images to Docker Hub.
  ✅ SQLTest: There's no description for the check yet; please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS.
  ✅ SQLancer: Fuzzing tests that detect logical bugs with the SQLancer tool.
  ✅ Sqllogic: Runs clickhouse on the sqllogic test set against sqlite and checks that all statements pass.
  ✅ Stateful tests: Runs stateful functional tests for ClickHouse binaries built in various configurations: release, debug, with sanitizers, etc.
  ✅ Stateless tests: Runs stateless functional tests for ClickHouse binaries built in various configurations: release, debug, with sanitizers, etc.
  ✅ Stress test: Runs stateless functional tests concurrently from several clients to detect concurrency-related errors.
  ✅ Style Check: Runs a set of checks to keep the code style clean. If some of the checks fail, see the related log from the report.
  ✅ Unit tests: Runs the unit tests for different release types.
  ✅ Upgrade check: Runs stress tests on the server version from the last release and then tries to upgrade it to the version from the PR. It checks whether the new server can start up successfully without errors, crashes, or sanitizer asserts.

Failing checks:

  ❌ Integration tests: The integration tests report. In parentheses the package type is given, and in square brackets are the optional part/total tests.

@tntnatbry (Contributor, Author):

@alexey-milovidov I have added a functional test. Also added the changelog entry. Please review.

@alexey-milovidov added the "can be tested" label (Allows running workflows for external contributors) on Dec 4, 2023
@tntnatbry (Contributor, Author):

It looks like the test I added has turned flaky: https://s3.amazonaws.com/clickhouse-test-reports/57218/946845a6425ea01514ca456af96447967029433d/stateless_tests_flaky_check__asan_.html

This is due to the following line in the test case, which tests the query-level override of the setting:

SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

I don't see any other tests in the test suite that use ALL_CONNECTION_TRIES_FAILED as the expected -- { serverError }.

We have two options:

  1. Add the long tag to the test (see the sketch after this list) and see if that helps.
  2. Remove this line altogether from the test and skip testing the query-level override.

What do you think?
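(For reference, the long tag in option 1 is a header comment at the top of the .sql test file, along these lines:)

-- Tags: long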

@tntnatbry (Contributor, Author):

I added the long tag to the test. Let's see how this looks now. Can you please re-trigger the CI checks?

Commit: Issue 43666: Add skip_unavailable_shards as a setting for Distributed table.

This setting, when enabled (disabled by default), allows ClickHouse to silently skip unavailable shards of a Distributed table during query execution, instead of throwing an exception to the client.
@tntnatbry (Contributor, Author):

@alexey-milovidov The CI is now green, except for the Docs Check.

I had to comment out the query-level override test statement in the test file:

--SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

For some reason, if I remove this line instead of commenting it out, the test turns flaky. It doesn't look like an issue with the code itself; it's probably a bug in the testing framework.

Please review.

@alexey-milovidov (Member):

OK. The docs check asks for documentation to be added. I will change the category to "Improvement" so that this requirement does not apply.

@alexey-milovidov (Member):

The flaky check also introduces random sleeps and slows the test down many times, and then checks that it still fits in 60 seconds. This could be the reason for what you see. Usually, we merge if the slowdown is still reasonable.

@alexey-milovidov (Member) left a review comment:

Everything is perfect, let's check the CI run, and we can merge...

@robot-ch-test-poll added the "pr-improvement" label (Pull request with some product improvements) and removed the "pr-feature" label (Pull request with new product feature) on Dec 10, 2023
@tntnatbry (Contributor, Author):

@alexey-milovidov It looks like you updated the test case and uncommented this line:

--SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

The test has turned flaky again due to this change. As I mentioned in my previous comment, I had to comment out this line in the test to make it pass CI. This is most likely an issue in the testing framework that needs a separate investigation/bug ticket.

You will have to revert your commit.

@alexey-milovidov (Member):

"Flaky check" runs the test in parallel with itself; there might be a problem with that. I will check.

@alexey-milovidov (Member):

@tntnatbry We have settings randomization in tests to cover more edge cases in a randomized fashion.
The test is failing when prefer_localhost_replica is randomized to 1.

The reason: when it is set and the address is localhost, the query goes to localhost without network communication, regardless of the address's port.

You can change the cluster definition for test_unavailable_shard to use something other than localhost, or modify your test and SET prefer_localhost_replica = 0 to avoid the randomization.
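(The second remedy is a one-line addition at the top of the test:)

SET prefer_localhost_replica = 0;  -- force a real network connection attempt, so the unavailable shard is actually detected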

@alexey-milovidov (Member):

Still, there is something.

@tntnatbry (Contributor, Author):

The reference file also needed to be updated. I just pushed another commit. Let's see how this looks now.

Thanks for sharing the article, almost that time of the year again!

@tntnatbry (Contributor, Author):

@alexey-milovidov The test is no longer flaky. I needed to SET send_logs_level='fatal'; in the test case (similar to the original 00183_skip_unavailable_shards test), so we are good here. Please take a look.

I am now seeing an integration test failing: https://s3.amazonaws.com/clickhouse-test-reports/57218/bae58febf3b2573a745243ae5ed016aae13b3d49/integration_tests__release__[4_4].html

Of these two, only test_delayed_replica_failover is even remotely related to this patch. If you look at the logs, the test fails on the query below (which does not use the skip_unavailable_shards query-level override):

        # If we forbid stale replicas, the query must fail. But sometimes we must have bigger timeouts.
        for _ in range(20):
            try:
                instance_with_dist_table.query(
                    """
SELECT count() FROM distributed SETTINGS
    load_balancing='in_order',
    max_replica_delay_for_distributed_queries=1,
    fallback_to_stale_replicas_for_distributed_queries=0
"""
                )
                time.sleep(0.5)
            except:
                # Expected outcome: the query raised, meaning stale replicas were rejected.
                break
        else:
            # The for/else branch runs only if the loop never broke, i.e. no attempt raised.
            raise Exception("Didn't raise when stale replicas are not allowed")

When I run this integration test locally on my machine, it passes:

root@tntnatbry:/home/tntnatbry/git-projects/ClickHouse/tests/integration# ./runner 'test_delayed_replica_failover'
2023-12-13 14:56:19,783 [ 445878 ] INFO : ClickHouse root is not set. Will use /home/tntnatbry/git-projects/ClickHouse (runner:42, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : Cases dir is not set. Will use /home/tntnatbry/git-projects/ClickHouse/tests/integration (runner:90, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : utils dir is not set. Will use /home/tntnatbry/git-projects/ClickHouse/utils (runner:101, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : base_configs_dir: /home/tntnatbry/git-projects/ClickHouse/programs/server, binary: /home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-server, cases_dir: /home/tntnatbry/git-projects/ClickHouse/tests/integration  (runner:103, check_args_and_update_paths)
clickhouse_integration_tests_volume
Running pytest container as: 'docker run --net=host -it --rm --name clickhouse_integration_tests_1pn11o --privileged --dns-search='.' --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-odbc-bridge:/clickhouse-odbc-bridge --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-server:/clickhouse --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-library-bridge:/clickhouse-library-bridge --volume=/home/tntnatbry/git-projects/ClickHouse/programs/server:/clickhouse-config --volume=/home/tntnatbry/git-projects/ClickHouse/tests/integration:/ClickHouse/tests/integration --volume=/home/tntnatbry/git-projects/ClickHouse/utils/backupview:/ClickHouse/utils/backupview --volume=/home/tntnatbry/git-projects/ClickHouse/utils/grpc-client/pb2:/ClickHouse/utils/grpc-client/pb2 --volume=/run:/run/host:ro --volume=clickhouse_integration_tests_volume:/var/lib/docker   -e DOCKER_CLIENT_TIMEOUT=300 -e COMPOSE_HTTP_TIMEOUT=600  -e PYTHONUNBUFFERED=1 -e PYTEST_ADDOPTS=" test_delayed_replica_failover   -vvv" clickhouse/integration-tests-runner:latest '.
Start tests
========================================================================================================================================================================================================================================== test session starts ==========================================================================================================================================================================================================================================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /ClickHouse/tests/integration
configfile: pytest.ini
plugins: xdist-3.5.0, repeat-0.9.3, order-1.0.0, random-0.2, timeout-2.2.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 1 item                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

test_delayed_replica_failover/test.py::test PASSED                                                                                                                                                                                                                                                                                                                                                                                                                                                [100%]

===================================================================================================================================================================================================================================== 1 passed in 63.37s (0:01:03) ======================================================================================================================================================================================================================================

Do you know what the issue here is?

@alexey-milovidov alexey-milovidov merged commit 58396c5 into ClickHouse:master Dec 18, 2023
335 of 337 checks passed
Labels: can be tested (Allows running workflows for external contributors), pr-improvement (Pull request with some product improvements)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add skip_unavailable_shards as a setting for Distributed table.
6 participants