Skip to content

Commit

Permalink
tests: more robust 02903_rmt_retriable_merge_exception
Browse files Browse the repository at this point in the history
- separate uuid extraction
- add preliminary exit
- disable for ordinary database
- less number of attempts
- add optimize_throw_if_noop and missing sync replica, to fix:

  2023.10.24 15:18:35.925533 [ 640 ] {da7418c6-3d51-45bc-a0d0-4970bb0cdd51} <Debug> test_3kgjgry1.rmt1 (d18afb81-3a4b-4c02-b281-5575dce2f440): Cannot select parts for optimization: Entry for part all_1_1_0 hasn't been read from the replication log yet (in partition all)

- fix in case of ZooKeeper retries

  2023.10.24 11:50:24.792511 [ 1437 ] {c39fd15b-e2e6-4291-9912-39fda75ebcd5} <Trace> test_qxkzmigq.rmt1 (1c086c74-9ebe-495c-bbd2-87ab2d8ec43d): Renaming temporary part tmp_insert_all_1_1_0 to all_1_1_0 with tid (1, 1, 00000000-0000-0000-0000-000000000000).
  2023.10.24 11:50:24.797320 [ 1437 ] {c39fd15b-e2e6-4291-9912-39fda75ebcd5} <Trace> test_qxkzmigq.rmt1 (1c086c74-9ebe-495c-bbd2-87ab2d8ec43d) (Replicated OutputStream): ZooKeeperWithFaultInjection call FAILED: seed=17644626169032325693 func=tryMulti path=/clickhouse/zero_copy code=Session expired message=Fault injection before operation
  2023.10.24 11:50:24.797536 [ 1437 ] {c39fd15b-e2e6-4291-9912-39fda75ebcd5} <Debug> test_qxkzmigq.rmt1 (1c086c74-9ebe-495c-bbd2-87ab2d8ec43d): Undoing transaction. Rollbacking parts state to temporary and removing from working set: all_1_1_0.
  ...
  2023.10.24 11:50:25.000349 [ 1437 ] {c39fd15b-e2e6-4291-9912-39fda75ebcd5} <Trace> test_qxkzmigq.rmt1 (1c086c74-9ebe-495c-bbd2-87ab2d8ec43d): Renaming temporary part tmp_insert_all_1_1_0 to all_2_2_0 with tid (1, 1, 00000000-0000-0000-0000-000000000000).
  2023.10.24 11:50:25.007477 [ 760 ] {} <Trace> test_qxkzmigq.rmt1 (ReplicatedMergeTreeQueue): Insert entry queue-0000000000 to queue with type GET_PART with virtual parts [all_2_2_0]

CI:
- https://s3.amazonaws.com/clickhouse-test-reports/55944/da408df4a7296835897d7cef80d63f252df79b75/stateless_tests__tsan__s3_storage__[2_5].html
- https://s3.amazonaws.com/clickhouse-test-reports/55944/da408df4a7296835897d7cef80d63f252df79b75/stateless_tests_flaky_check__asan_.html
- https://s3.amazonaws.com/clickhouse-test-reports/55944/02fdd0513f7d413ce4ac39a70566855327ebfade/stateless_tests__aarch64_.html
Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
  • Loading branch information
azat committed Oct 24, 2023
1 parent da408df commit 66c4a3b
Showing 1 changed file with 21 additions and 11 deletions.
32 changes: 21 additions & 11 deletions tests/queries/0_stateless/02903_rmt_retriable_merge_exception.sh
@@ -1,4 +1,6 @@
#!/usr/bin/env bash
# Tags: no-ordinary-database
# Tag no-ordinary-database: requires UUID

CUR_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
# shellcheck source=../shell_config.sh
Expand All @@ -18,23 +20,32 @@ $CLICKHOUSE_CLIENT -nm -q "
insert into rmt1 values (1);
insert into rmt1 values (2);
system sync replica rmt1;
system stop pulling replication log rmt2;
optimize table rmt1 final settings alter_sync=0;
"
optimize table rmt1 final settings alter_sync=0, optimize_throw_if_noop=1;
" || exit 1

table_uuid=$($CLICKHOUSE_CLIENT -q "select uuid from system.tables where database = currentDatabase() and table = 'rmt1'")
if [[ -z $table_uuid ]]; then
echo "Table does not have UUID" >&2
exit 1
fi

# NOTE: that part name can be different from all_0_1_1, in case of ZooKeeper retries
part_name='%'

# wait while there be at least one 'No active replica has part all_0_1_1 or covering part' in logs
for _ in {0..1000}; do
for _ in {0..50}; do
no_active_repilica_messages=$($CLICKHOUSE_CLIENT -nm -q "
system flush logs;
with
(select uuid from system.tables where database = currentDatabase() and table = 'rmt1') as uuid_
select count()
from system.text_log
where
event_date >= yesterday() and event_time >= now() - 600 and
(
(logger_name = 'MergeTreeBackgroundExecutor' and message like '%{' || uuid_::String || '::all_0_1_1}%No active replica has part all_0_1_1 or covering part%') or
(logger_name = uuid_::String || '::all_0_1_1 (MergeFromLogEntryTask)' and message like '%No active replica has part all_0_1_1 or covering part%')
(logger_name = 'MergeTreeBackgroundExecutor' and message like '%{$table_uuid::$part_name}%No active replica has part $part_name or covering part%') or
(logger_name like '$table_uuid::$part_name (MergeFromLogEntryTask)' and message like '%No active replica has part $part_name or covering part%')
);
")
if [[ $no_active_repilica_messages -gt 0 ]]; then
Expand All @@ -47,16 +58,15 @@ done
$CLICKHOUSE_CLIENT -nm -q "
system start pulling replication log rmt2;
system flush logs;
with
(select uuid from system.tables where database = currentDatabase() and table = 'rmt1') as uuid_
select
level, count() > 0
from system.text_log
where
event_date >= yesterday() and event_time >= now() - 600 and
(
(logger_name = 'MergeTreeBackgroundExecutor' and message like '%{' || uuid_::String || '::all_0_1_1}%No active replica has part all_0_1_1 or covering part%') or
(logger_name = uuid_::String || '::all_0_1_1 (MergeFromLogEntryTask)' and message like '%No active replica has part all_0_1_1 or covering part%')
(logger_name = 'MergeTreeBackgroundExecutor' and message like '%{$table_uuid::$part_name}%No active replica has part $part_name or covering part%') or
(logger_name like '$table_uuid::$part_name (MergeFromLogEntryTask)' and message like '%No active replica has part $part_name or covering part%')
)
group by level;
"

0 comments on commit 66c4a3b

Please sign in to comment.