-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Fix possible parts duplicates after ALTER TABLE MOVE PART TO SHARD #50777
Closed
Commits on Jun 9, 2023
-
Configuration menu - View commit details
-
Copy full SHA for 48e18d6 - Browse repository at this point
Copy the full SHA 48e18d6View commit details -
Do not try to parse non existing replication queue entries
This is possible for parts moves between shards: 2023.06.08 12:22:47.764173 [ 104 ] {} <Error> default.test_move (PartMovesBetweenShardsOrchestrator): bool DB::PartMovesBetweenShardsOrchestrator::step(): Code: 27. DB::ParsingException: Cannot parse input: expected 'format version: ' at end of stream.: while reading entry: : while reading entry: . (CANNOT_PARSE_INPUT_ASSERTION_FAILED), Stack trace (when copying this message, always include the lines below): 0. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/exception:134: Poco::Exception::Exception(String const&, int) @ 0x000000001ae703f2 in /usr/bin/clickhouse 1. ./.cmake-llvm16/./src/Common/Exception.cpp:92: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000011ec4355 in /usr/bin/clickhouse 2. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/string:1499: DB::ParsingException::ParsingException<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000011f213a3 in /usr/bin/clickhouse 3. ./.cmake-llvm16/./src/IO/ReadHelpers.cpp:101: DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x0000000011f12d12 in /usr/bin/clickhouse 4. ./.cmake-llvm16/./src/IO/ReadHelpers.cpp:137: ? @ 0x0000000011f12faf in /usr/bin/clickhouse 5. ./.cmake-llvm16/./src/IO/Operators.h:0: DB::ReplicatedMergeTreeLogEntryData::readText(DB::ReadBuffer&, StrongTypedef<unsigned int, DB::MergeTreeDataFormatVersionTag>) @ 0x0000000018391af9 in /usr/bin/clickhouse 6. ./.cmake-llvm16/./src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.cpp:0: DB::ReplicatedMergeTreeLogEntry::parse(String const&, Coordination::Stat const&, StrongTypedef<unsigned int, DB::MergeTreeDataFormatVersionTag>) @ 0x0000000018392fe8 in /usr/bin/clickhouse 7. ./.cmake-llvm16/./src/Storages/StorageReplicatedMergeTree.cpp:6070: DB::StorageReplicatedMergeTree::tryWaitForReplicaToProcessLogEntry(String const&, String const&, DB::ReplicatedMergeTreeLogEntryData const&, long) @ 0x0000000017de2627 in /usr/bin/clickhouse 8. ./.cmake-llvm16/./src/Storages/StorageReplicatedMergeTree.cpp:5864: DB::StorageReplicatedMergeTree::tryWaitForAllReplicasToProcessLogEntry(String const&, DB::ReplicatedMergeTreeLogEntryData const&, long) @ 0x0000000017dddc0c in /usr/bin/clickhouse 9. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/vector:543: DB::PartMovesBetweenShardsOrchestrator::stepEntry(DB::PartMovesBetweenShardsOrchestrator::Entry, std::shared_ptr<zkutil::ZooKeeper>) @ 0x000000001836eb2b in /usr/bin/clickhouse 10. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/string:1499: DB::PartMovesBetweenShardsOrchestrator::step() @ 0x0000000018365f6e in /usr/bin/clickhouse 11. ./.cmake-llvm16/./src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp:47: DB::PartMovesBetweenShardsOrchestrator::run() @ 0x0000000018365094 in /usr/bin/clickhouse 12. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__functional/function.h:0: DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x00000000166d3394 in /usr/bin/clickhouse 13. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::BackgroundSchedulePool::threadFunction() @ 0x00000000166d5a96 in /usr/bin/clickhouse 14. ./.cmake-llvm16/./src/Core/BackgroundSchedulePool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000166d60ae in /usr/bin/clickhouse 15. ./.cmake-llvm16/./base/base/../base/wide_integer_impl.h:796: ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x0000000011f7f45e in /usr/bin/clickhouse 16. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x0000000011f82eae in /usr/bin/clickhouse 17. ? @ 0x00007ffff7f95609 in ? 18. __clone @ 0x00007ffff7eba133 in ? (version 23.5.1.1) Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Configuration menu - View commit details
-
Copy full SHA for d7894e4 - Browse repository at this point
Copy the full SHA d7894e4View commit details -
Fix possible parts duplicates after ALTER TABLE MOVE PART TO SHARD
Before this patch ALTER TABLE MOVE PART TO SHARD has a race, and that was the reason for flaky test. Parts moves between shards works as follow (some steps are omitted, only relevant is here): 1) I: writes parts for migration written to zookeeper (/pinned_parts_uuids) 2) I: queue request to read this parts from zookeeper (SYNC_SOURCE) 3) R1/R2: process SYNC_SOURCE 4) R1/R2: process SYNC_PINNED_PARTS_UUIDS (read parts from /pinned_parts_uuids) 5) R1/R2: triggers DESTINATION_FETCH (after this destionation shard could clone the part) Where: - I - initiator - R1/R2 - replica 1 and 2 Now imagine the following, SELECT query started before (4) finished, and now (4) finished, and so now remote nodes could already fetch data from the replicas, but those SELECT query may know nothing about pinned_parts_uuids and in this case, the query will not filter out duplicates (since it is done only if there are pinned_parts_uuids - to avoid overhead for tables without active parts movements), and this will lead to duplicates in the query. So to fix this I see only one (icky) option - do not allow any queries in parallel with updating pinned_parts_uuids. Note, that this should be enough because SYNC_PINNED_PARTS_UUIDS processed synchronously on all replicas, so it is not possible that parts will appears on the destination shards before it is finished. And AFAICS dead lock should not be possible, since acquiring locks in reverse order (exclusive_lock, pinned_part_uuids_mutex) does not happens. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Configuration menu - View commit details
-
Copy full SHA for 14ecff6 - Browse repository at this point
Copy the full SHA 14ecff6View commit details
Commits on Jun 12, 2023
-
Avoid deadlock for syncing pinned part UUIDs on startup
CI: https://s3.amazonaws.com/clickhouse-test-reports/50777/14ecff6db4b603bba1529f9c2790488216e1d5e1/integration_tests__asan__[6_6].html Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Configuration menu - View commit details
-
Copy full SHA for d89561c - Browse repository at this point
Copy the full SHA d89561cView commit details
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.