Issue 43666: Add skip_unavailable_shards as a setting for Distributed table. #57218
Conversation
Setup: 3-shard (1 replica each) ClickHouse cluster in Docker. On Shard 1:

d1da863ce311 :) select * from system.clusters;
SELECT *
FROM system.clusters
Query id: a9449f11-21ea-40d2-be64-0fcb062b8f04
┌─cluster────────────┬─shard_num─┬─shard_weight─┬─internal_replication─┬─replica_num─┬─host_name───┬─host_address──┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─slowdowns_count─┬─estimated_recovery_time─┬─database_shard_name─┬─database_replica_name─┬─is_active─┐
│ replicated_cluster │ 1 │ 1 │ 0 │ 1 │ clickhouse1 │ 192.168.112.4 │ 9000 │ 1 │ default │ │ 0 │ 0 │ 0 │ │ │ ᴺᵁᴸᴸ │
│ replicated_cluster │ 2 │ 1 │ 0 │ 1 │ clickhouse2 │ 192.168.112.3 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │ │ │ ᴺᵁᴸᴸ │
│ replicated_cluster │ 3 │ 1 │ 0 │ 1 │ clickhouse3 │ 192.168.112.5 │ 9000 │ 0 │ default │ │ 0 │ 0 │ 0 │ │ │ ᴺᵁᴸᴸ │
└────────────────────┴───────────┴──────────────┴──────────────────────┴─────────────┴─────────────┴───────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────┘
3 rows in set. Elapsed: 0.035 sec.
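The thread doesn't show how the local table t1 was created on each shard; a minimal sketch, assuming distributed DDL (ClickHouse Keeper/ZooKeeper) is configured for the cluster, would use ON CLUSTER so the same local table lands on all three shards in one statement:

-- hypothetical: create the identical local table on every shard of the cluster
CREATE TABLE default.t1 ON CLUSTER replicated_cluster
(
    `ID` UInt32,
    `Name` String
)
ENGINE = MergeTree
ORDER BY ID;

The resulting table matches the SHOW CREATE output below.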
d1da863ce311 :) show create table t1;
SHOW CREATE TABLE t1
Query id: ce5bdb07-c2a3-4bc1-968a-8d4ef46f0396
┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE default.t1
(
`ID` UInt32,
`Name` String
)
ENGINE = MergeTree
ORDER BY ID
SETTINGS index_granularity = 8192 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.012 sec.

Add one row to t1 individually on each shard.
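The exact INSERT statements aren't shown in the thread; a plausible sketch, assuming each row is inserted directly into the local table on its shard:

INSERT INTO t1 VALUES (1, 'a'); -- on Shard 1
INSERT INTO t1 VALUES (2, 'b'); -- on Shard 2
INSERT INTO t1 VALUES (3, 'c'); -- on Shard 3

Then verify. On Shard 1:

d1da863ce311 :) select * from t1;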
SELECT *
FROM t1
Query id: 249ca169-1c6b-45e1-ace4-4cc33d5d90d3
┌─ID─┬─Name─┐
│ 1 │ a │
└────┴──────┘
1 row in set. Elapsed: 0.012 sec.

On Shard 2:

6377136da30b :) select * from t1;
SELECT *
FROM t1
Query id: 54deffa0-f52a-4ae5-9a54-03377775830f
┌─ID─┬─Name─┐
│ 2 │ b │
└────┴──────┘
1 row in set. Elapsed: 0.015 sec.

On Shard 3:

91dc18ab6188 :) select * from t1;
SELECT *
FROM t1
Query id: e0c54baa-9cc1-4e93-b591-c1d331233bbe
┌─ID─┬─Name─┐
│ 3 │ c │
└────┴──────┘
1 row in set. Elapsed: 0.011 sec.

Now, create the Distributed version of t1 on Shard 1:

d1da863ce311 :) CREATE TABLE t1_distributed
(
ID UInt32,
Name String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand());
CREATE TABLE t1_distributed
(
`ID` UInt32,
`Name` String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
Query id: 8f1b8e11-7d63-4718-8b08-ea0e5af983b4
Ok.
0 rows in set. Elapsed: 0.009 sec.
d1da863ce311 :) select * from t1_distributed;
SELECT *
FROM t1_distributed
Query id: 96e99458-e47a-4065-931f-70a15df84540
┌─ID─┬─Name─┐
│ 1 │ a │
└────┴──────┘
┌─ID─┬─Name─┐
│ 2 │ b │
└────┴──────┘
┌─ID─┬─Name─┐
│ 3 │ c │
└────┴──────┘
3 rows in set. Elapsed: 0.052 sec.

As expected, all 3 rows are returned since all 3 shards are online. Now, stop Shard 2 (e.g. by stopping its container). On Shard 1:

d1da863ce311 :) select * from t1_distributed;
SELECT *
FROM t1_distributed
Query id: 76f34040-4577-409c-8b2a-b9bbe7c413a0
┌─ID─┬─Name─┐
│ 1 │ a │
└────┴──────┘
┌─ID─┬─Name─┐
│ 3 │ c │
└────┴──────┘
↗ Progress: 2.00 rows, 28.00 B (3.67 rows/s., 51.36 B/s.) 99%
2 rows in set. Elapsed: 0.545 sec.
Received exception from server (version 23.11.1):
Code: 279. DB::Exception: Received from localhost:9000. DB::NetException. DB::NetException: All connection tries failed. Log:
Code: 32. DB::Exception: Attempt to read after eof. (ATTEMPT_TO_READ_AFTER_EOF) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
: While executing Remote. (ALL_CONNECTION_TRIES_FAILED)

As expected, an exception is thrown to the client since Shard 2 is unreachable.
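While Shard 2 is down, the failure is also visible in system.clusters; a quick check, using the errors_count column from the earlier listing:

SELECT host_name, errors_count
FROM system.clusters
WHERE cluster = 'replicated_cluster';
-- errors_count for clickhouse2 should grow after the failed connection attempts

Now, re-create t1_distributed, this time with the table-level setting:

d1da863ce311 :) drop table t1_distributed;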
DROP TABLE t1_distributed
Query id: f242a3cf-ab62-4eef-a2c5-07a7591178c6
Ok.
0 rows in set. Elapsed: 0.004 sec.
d1da863ce311 :) CREATE TABLE t1_distributed
(
ID UInt32,
Name String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
SETTINGS skip_unavailable_shards=true;
CREATE TABLE t1_distributed
(
`ID` UInt32,
`Name` String
)
ENGINE = Distributed('replicated_cluster', 'default', t1, rand())
SETTINGS skip_unavailable_shards = 1
Query id: 14aebde8-f4bf-4631-b29e-2ae97da5ef36
Ok.
0 rows in set. Elapsed: 0.009 sec.
d1da863ce311 :) select * from t1_distributed;
SELECT *
FROM t1_distributed
Query id: 217b2347-9924-4d1e-92fa-72150da484b1
┌─ID─┬─Name─┐
│ 1 │ a │
└────┴──────┘
┌─ID─┬─Name─┐
│ 3 │ c │
└────┴──────┘
2 rows in set. Elapsed: 0.476 sec.

As expected, ClickHouse now silently skips the unavailable Shard 2. Confirm that the query-level setting overrides the table-level setting:

d1da863ce311 :) select * from t1_distributed SETTINGS skip_unavailable_shards=false;
SELECT *
FROM t1_distributed
SETTINGS skip_unavailable_shards = 0
Query id: 6b09dc96-9a4b-4670-b41f-1ead48998d96
┌─ID─┬─Name─┐
│ 1 │ a │
└────┴──────┘
┌─ID─┬─Name─┐
│ 3 │ c │
└────┴──────┘
← Progress: 2.00 rows, 28.00 B (2.31 rows/s., 32.39 B/s.) 99%
2 rows in set. Elapsed: 0.864 sec.
Received exception from server (version 23.11.1):
Code: 279. DB::Exception: Received from localhost:9000. DB::NetException. DB::NetException: All connection tries failed. Log:
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
Code: 210. DB::NetException: Connection refused (clickhouse2:9000). (NETWORK_ERROR) (version 23.11.1.1)
: While executing Remote. (ALL_CONNECTION_TRIES_FAILED)

I have also confirmed that the user-level setting overrides the table-level setting as expected.
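That user-level check isn't shown in the transcript; a minimal sketch, assuming the default user may be altered (ALTER USER ... SETTINGS is one way to pin a user-level value):

-- hypothetical: force the failure back on at the user level
ALTER USER default SETTINGS skip_unavailable_shards = 0;
-- takes effect for new sessions of that user; with Shard 2 still down:
SELECT * FROM t1_distributed; -- fails again with ALL_CONNECTION_TRIES_FAILED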
Why do we need this feature? It can lead to incorrect data.
@@ -17,6 +17,8 @@ class ASTStorage;
 #define LIST_OF_DISTRIBUTED_SETTINGS(M, ALIAS) \
     M(Bool, fsync_after_insert, false, "Do fsync for every inserted. Will decreases performance of inserts (only for background INSERT, i.e. distributed_foreground_insert=false)", 0) \
     M(Bool, fsync_directories, false, "Do fsync for temporary directory (that is used for background INSERT only) after all part operations (writes, renames, etc.).", 0) \
+    /** This is the distributed version of the skip_unavailable_shards setting available in src/Core/Settings.h */ \
+    M(Bool, skip_unavailable_shards, false, "If true, ClickHouse silently skips unavailable shards and nodes unresolvable through DNS. Shard is marked as unavailable when none of the replicas can be reached.", 0) \
The comment is ambiguous. It could be read as only DNS failures are skipped. But connection failures and even shards with non-existent tables are skipped as well.
If the original description has the same problem, it should be fixed as well.
I have updated the comment. Please take a look and let me know if the new description is OK, or if you need this further updated. I have also updated the comment in the original description in src/Core/Settings.h.
Need to add a test. It can be a functional test using the test clusters. You can take a similar test for skip_unavailable_shards as an example.
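A stateless test along those lines might look like the sketch below. The table name matches the one that appears later in this thread; the test_unavailable_shard cluster (one reachable shard plus one that never answers) is an assumption based on the clusters shipped with the test configs:

-- sketch of a functional test (cluster name assumed)
DROP TABLE IF EXISTS table_02916;
DROP TABLE IF EXISTS table_02916_distributed;

CREATE TABLE table_02916 (n UInt64) ENGINE = MergeTree ORDER BY n;

CREATE TABLE table_02916_distributed (n UInt64)
ENGINE = Distributed('test_unavailable_shard', currentDatabase(), table_02916, n)
SETTINGS skip_unavailable_shards = 1;

INSERT INTO table_02916 SELECT number FROM numbers(10);

-- succeeds despite the unreachable shard, thanks to the table-level setting
SELECT * FROM table_02916_distributed ORDER BY n;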
Missing changelog entry.
Force-pushed from 58a8700 to 946845a.
@alexey-milovidov I have added a functional test. Also added the changelog entry. Please review.
It looks like the test I added has turned flaky: https://s3.amazonaws.com/clickhouse-test-reports/57218/946845a6425ea01514ca456af96447967029433d/stateless_tests_flaky_check__asan_.html

This is due to this line in the test case, which tests the query-level override of the setting:

SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

I don't see any other tests in the testsuite which have this pattern. We have two options:
What do you think?
Force-pushed from 946845a to 932327a.
I added the
Force-pushed from 932327a to 3ba3c93.
Commit message: "… table. This setting, when enabled (disabled by default), allows ClickHouse to silently skip unavailable shards of a Distributed table during query execution, instead of throwing an exception to the client."
Force-pushed from 3ba3c93 to e547db0.
@alexey-milovidov The CI is now green, except for the Docs Check. I had to comment out the query-level override test statement in the test file:

--SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

For some reason, if I remove this line instead of commenting it out, the test turns flaky. It doesn't look like an issue with the code itself; it's probably a bug in the testing framework. Please review.
Ok. The docs check asks for adding documentation. I will change the category to "Improvement" to not have this requirement.
The flaky check also introduces random sleeps, slowing the test down many times, and then checks that it still fits in 60 seconds. This could be the reason for what you see. Usually, we merge if the slowdown is still reasonable.
Everything is perfect, let's check the CI run, and we can merge...
@alexey-milovidov It looks like you updated the test case and uncommented this line:

--SELECT *, _shard_num FROM table_02916_distributed SETTINGS skip_unavailable_shards=0; -- { serverError ALL_CONNECTION_TRIES_FAILED }

The test has again turned flaky due to this change. As I mentioned in my previous comment, I had to comment out this line in the test to make it pass the CI. This is most likely an issue in the testing framework that needs a separate investigation/bug ticket. You will have to revert your commit.
"Flaky check" runs the test in parallel with itself; there might be a problem with that. I will check. |
But the test is, in fact, flaky: https://s3.amazonaws.com/clickhouse-test-reports/57218/d4e8c0558302b2a006cac17976558e2617ddd982/stateless_tests__asan__[1_4].html
@tntnatbry We have settings randomization in tests to cover more edge cases in a randomized fashion. The reason: when it is set and the address is localhost, the query goes to localhost without network communication, regardless of the address's port. You can change the cluster definition for
Still, there is something.
Reference file also needed to be updated. Just pushed another commit. Let's see how this looks now. Thanks for sharing the article, almost that time of the year again!
@alexey-milovidov Test is no longer flaky now. I needed to

I am now seeing an integration test failing: https://s3.amazonaws.com/clickhouse-test-reports/57218/bae58febf3b2573a745243ae5ed016aae13b3d49/integration_tests__release__[4_4].html

Out of these 2, only test_delayed_replica_failover looks related. The failing part of the test is:

# If we forbid stale replicas, the query must fail. But sometimes we must have bigger timeouts.
# Keep querying until the query raises; if it never raises, the test fails.
for _ in range(20):
    try:
        instance_with_dist_table.query(
            """
            SELECT count() FROM distributed SETTINGS
                load_balancing='in_order',
                max_replica_delay_for_distributed_queries=1,
                fallback_to_stale_replicas_for_distributed_queries=0
            """
        )
        time.sleep(0.5)
    except:
        break
else:
    raise Exception("Didn't raise when stale replicas are not allowed")
raise Exception("Didn't raise when stale replicas are not allowed") When I run this integration test locally on my machine, it is passing: root@tntnatbry:/home/tntnatbry/git-projects/ClickHouse/tests/integration# ./runner 'test_delayed_replica_failover'
2023-12-13 14:56:19,783 [ 445878 ] INFO : ClickHouse root is not set. Will use /home/tntnatbry/git-projects/ClickHouse (runner:42, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : Cases dir is not set. Will use /home/tntnatbry/git-projects/ClickHouse/tests/integration (runner:90, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : utils dir is not set. Will use /home/tntnatbry/git-projects/ClickHouse/utils (runner:101, check_args_and_update_paths)
2023-12-13 14:56:19,784 [ 445878 ] INFO : base_configs_dir: /home/tntnatbry/git-projects/ClickHouse/programs/server, binary: /home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-server, cases_dir: /home/tntnatbry/git-projects/ClickHouse/tests/integration (runner:103, check_args_and_update_paths)
clickhouse_integration_tests_volume
Running pytest container as: 'docker run --net=host -it --rm --name clickhouse_integration_tests_1pn11o --privileged --dns-search='.' --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-odbc-bridge:/clickhouse-odbc-bridge --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-server:/clickhouse --volume=/home/tntnatbry/git-projects/ClickHouse/build/programs/clickhouse-library-bridge:/clickhouse-library-bridge --volume=/home/tntnatbry/git-projects/ClickHouse/programs/server:/clickhouse-config --volume=/home/tntnatbry/git-projects/ClickHouse/tests/integration:/ClickHouse/tests/integration --volume=/home/tntnatbry/git-projects/ClickHouse/utils/backupview:/ClickHouse/utils/backupview --volume=/home/tntnatbry/git-projects/ClickHouse/utils/grpc-client/pb2:/ClickHouse/utils/grpc-client/pb2 --volume=/run:/run/host:ro --volume=clickhouse_integration_tests_volume:/var/lib/docker -e DOCKER_CLIENT_TIMEOUT=300 -e COMPOSE_HTTP_TIMEOUT=600 -e PYTHONUNBUFFERED=1 -e PYTEST_ADDOPTS=" test_delayed_replica_failover -vvv" clickhouse/integration-tests-runner:latest '.
Start tests
========================================================================================================================================================================================================================================== test session starts ==========================================================================================================================================================================================================================================
platform linux -- Python 3.10.12, pytest-7.4.3, pluggy-1.3.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /ClickHouse/tests/integration
configfile: pytest.ini
plugins: xdist-3.5.0, repeat-0.9.3, order-1.0.0, random-0.2, timeout-2.2.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 1 item
test_delayed_replica_failover/test.py::test PASSED [100%]
===================================================================================================================================================================================================================================== 1 passed in 63.37s (0:01:03) ======================================================================================================================================================================================================================================

Do you know what the issue here is?
This setting, when enabled (it is disabled by default), allows ClickHouse to silently skip unavailable shards of a Distributed table during query execution, instead of throwing an exception to the client.
Changelog category: Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add skip_unavailable_shards as a setting for Distributed tables that is similar to the corresponding query-level setting. Closes #43666.