
[RFC] Fix possible parts duplicates after ALTER TABLE MOVE PART TO SHARD #50777

Closed
wants to merge 4 commits

Commits on Jun 9, 2023

  1. Revert "Merge pull request ClickHouse#49579 from ClickHouse/remove-obsolete-test"
    
    This reverts commit fe00e66, reversing
    changes made to 6b6504e.
    azat committed Jun 9, 2023
    Commit: 48e18d6
  2. Do not try to parse non-existent replication queue entries

    This can happen during part moves between shards:
    
        2023.06.08 12:22:47.764173 [ 104 ] {} <Error> default.test_move (PartMovesBetweenShardsOrchestrator): bool DB::PartMovesBetweenShardsOrchestrator::step(): Code: 27. DB::ParsingException: Cannot parse input: expected 'format version: ' at end of stream.: while reading entry: : while reading entry: . (CANNOT_PARSE_INPUT_ASSERTION_FAILED), Stack trace (when copying this message, always include the lines below):
    
        0. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/exception:134: Poco::Exception::Exception(String const&, int) @ 0x000000001ae703f2 in /usr/bin/clickhouse
        1. ./.cmake-llvm16/./src/Common/Exception.cpp:92: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x0000000011ec4355 in /usr/bin/clickhouse
        2. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/string:1499: DB::ParsingException::ParsingException<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000011f213a3 in /usr/bin/clickhouse
        3. ./.cmake-llvm16/./src/IO/ReadHelpers.cpp:101: DB::throwAtAssertionFailed(char const*, DB::ReadBuffer&) @ 0x0000000011f12d12 in /usr/bin/clickhouse
        4. ./.cmake-llvm16/./src/IO/ReadHelpers.cpp:137: ? @ 0x0000000011f12faf in /usr/bin/clickhouse
        5. ./.cmake-llvm16/./src/IO/Operators.h:0: DB::ReplicatedMergeTreeLogEntryData::readText(DB::ReadBuffer&, StrongTypedef<unsigned int, DB::MergeTreeDataFormatVersionTag>) @ 0x0000000018391af9 in /usr/bin/clickhouse
        6. ./.cmake-llvm16/./src/Storages/MergeTree/ReplicatedMergeTreeLogEntry.cpp:0: DB::ReplicatedMergeTreeLogEntry::parse(String const&, Coordination::Stat const&, StrongTypedef<unsigned int, DB::MergeTreeDataFormatVersionTag>) @ 0x0000000018392fe8 in /usr/bin/clickhouse
        7. ./.cmake-llvm16/./src/Storages/StorageReplicatedMergeTree.cpp:6070: DB::StorageReplicatedMergeTree::tryWaitForReplicaToProcessLogEntry(String const&, String const&, DB::ReplicatedMergeTreeLogEntryData const&, long) @ 0x0000000017de2627 in /usr/bin/clickhouse
        8. ./.cmake-llvm16/./src/Storages/StorageReplicatedMergeTree.cpp:5864: DB::StorageReplicatedMergeTree::tryWaitForAllReplicasToProcessLogEntry(String const&, DB::ReplicatedMergeTreeLogEntryData const&, long) @ 0x0000000017dddc0c in /usr/bin/clickhouse
        9. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/vector:543: DB::PartMovesBetweenShardsOrchestrator::stepEntry(DB::PartMovesBetweenShardsOrchestrator::Entry, std::shared_ptr<zkutil::ZooKeeper>) @ 0x000000001836eb2b in /usr/bin/clickhouse
        10. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/string:1499: DB::PartMovesBetweenShardsOrchestrator::step() @ 0x0000000018365f6e in /usr/bin/clickhouse
        11. ./.cmake-llvm16/./src/Storages/MergeTree/PartMovesBetweenShardsOrchestrator.cpp:47: DB::PartMovesBetweenShardsOrchestrator::run() @ 0x0000000018365094 in /usr/bin/clickhouse
        12. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__functional/function.h:0: DB::BackgroundSchedulePoolTaskInfo::execute() @ 0x00000000166d3394 in /usr/bin/clickhouse
        13. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:701: DB::BackgroundSchedulePool::threadFunction() @ 0x00000000166d5a96 in /usr/bin/clickhouse
        14. ./.cmake-llvm16/./src/Core/BackgroundSchedulePool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x00000000166d60ae in /usr/bin/clickhouse
        15. ./.cmake-llvm16/./base/base/../base/wide_integer_impl.h:796: ThreadPoolImpl<std::thread>::worker(std::__list_iterator<std::thread, void*>) @ 0x0000000011f7f45e in /usr/bin/clickhouse
        16. ./.cmake-llvm16/./contrib/llvm-project/libcxx/include/__memory/unique_ptr.h:302: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x0000000011f82eae in /usr/bin/clickhouse
        17. ? @ 0x00007ffff7f95609 in ?
        18. __clone @ 0x00007ffff7eba133 in ?
         (version 23.5.1.1)
    
    Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
    azat committed Jun 9, 2023
    Commit: d7894e4
  3. Fix possible parts duplicates after ALTER TABLE MOVE PART TO SHARD

    Before this patch, ALTER TABLE MOVE PART TO SHARD had a race, which
    was the reason for the flaky test.
    
    Part moves between shards work as follows (some steps are omitted;
    only the relevant ones are shown):
    
    1) I: writes the parts selected for migration to ZooKeeper (/pinned_parts_uuids)
    2) I: queues a request for replicas to read these parts from ZooKeeper (SYNC_SOURCE)
    3) R1/R2: process SYNC_SOURCE
    4) R1/R2: process SYNC_PINNED_PARTS_UUIDS (read the parts from /pinned_parts_uuids)
    5) R1/R2: trigger DESTINATION_FETCH (after this the destination shard can clone the part)
    
    Where:
    - I: the initiator
    - R1/R2: replicas 1 and 2
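The ordering of the steps above can be sketched as a tiny state machine. This is illustrative only: the real transitions live in PartMovesBetweenShardsOrchestrator, and only the entry names are taken from the steps; everything else here is hypothetical.

```cpp
// Sketch of the per-move state ordering; names follow the log entries
// from the steps above, the rest is hypothetical.
enum class MoveState
{
    SyncSource,           // (2)-(3): replicas process SYNC_SOURCE
    SyncPinnedPartsUuids, // (4): replicas read /pinned_parts_uuids
    DestinationFetch,     // (5): destination shard may clone the part
    Done
};

// DESTINATION_FETCH is reachable only after SYNC_PINNED_PARTS_UUIDS,
// which is what the race discussion below relies on.
inline MoveState nextState(MoveState s)
{
    switch (s)
    {
        case MoveState::SyncSource:           return MoveState::SyncPinnedPartsUuids;
        case MoveState::SyncPinnedPartsUuids: return MoveState::DestinationFetch;
        case MoveState::DestinationFetch:     return MoveState::Done;
        case MoveState::Done:                 return MoveState::Done;
    }
    return MoveState::Done;
}
```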
    
    Now imagine the following: a SELECT query starts before (4) has
    finished, and then (4) finishes, so remote nodes can already fetch
    data from the replicas. But that SELECT query may know nothing about
    pinned_parts_uuids, in which case it will not filter out duplicates
    (the filtering is done only when pinned_parts_uuids is non-empty, to
    avoid overhead for tables without active part movements), and this
    leads to duplicates in the query result.
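The check-then-act shape of this race can be sketched in isolation. All names below (ShardSketch, pin, queryFiltersDuplicates) are hypothetical and do not match the real ClickHouse code; the point is only that a query which samples the pinned set before step (4) completes will never apply the duplicate filter.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical model of the race described above.
struct ShardSketch
{
    std::mutex pinned_mutex;
    std::set<std::string> pinned_parts_uuids; // published at step (4)

    // A query filters duplicates only if it observes a non-empty pinned set.
    bool queryFiltersDuplicates()
    {
        std::lock_guard<std::mutex> lock(pinned_mutex);
        return !pinned_parts_uuids.empty();
    }

    void pin(const std::string & uuid)
    {
        std::lock_guard<std::mutex> lock(pinned_mutex);
        pinned_parts_uuids.insert(uuid);
    }
};

// Interleaving from the description: the SELECT decides before (4)
// finishes, then (4) completes and (5) lets the destination fetch.
// Returns true when the stale decision skipped the duplicate filter.
inline bool staleSnapshotSkipsFilter()
{
    ShardSketch shard;
    bool filters = shard.queryFiltersDuplicates(); // query starts early: empty set
    shard.pin("part-uuid-1");                      // step (4) completes afterwards
    return !filters;                               // the filter was skipped
}
```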
    
    So to fix this I see only one (icky) option: do not allow any queries
    to run in parallel with the update of pinned_parts_uuids. Note that
    this should be enough, because SYNC_PINNED_PARTS_UUIDS is processed
    synchronously on all replicas, so parts cannot appear on the
    destination shard before it has finished. And AFAICS a deadlock should
    not be possible, since the locks are never acquired in the reverse
    order (exclusive_lock, pinned_part_uuids_mutex).
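A minimal sketch of that locking scheme, under stated assumptions (TableSketch and syncPinnedPartUuids are illustrative names, not the real API): step (4) runs under the table's exclusive lock, so no query can run concurrently with the update, and both paths take the locks in the same order (exclusive_lock, then pinned_part_uuids_mutex), so the reverse order never occurs and no deadlock cycle can form.

```cpp
#include <mutex>
#include <set>
#include <shared_mutex>
#include <string>
#include <utility>

struct TableSketch
{
    std::shared_mutex exclusive_lock;   // queries hold it shared
    std::mutex pinned_part_uuids_mutex;
    std::set<std::string> pinned_part_uuids;

    // Orchestrator, step (4): exclusive table lock first, then the mutex,
    // so no query overlaps the pinned-set update.
    void syncPinnedPartUuids(std::set<std::string> fresh)
    {
        std::unique_lock<std::shared_mutex> exclusive(exclusive_lock);
        std::lock_guard<std::mutex> pinned(pinned_part_uuids_mutex);
        pinned_part_uuids = std::move(fresh);
    }

    // Query path: shared table lock first, then the same mutex --
    // the same lock order as the writer, never the reverse.
    bool queryFiltersDuplicates()
    {
        std::shared_lock<std::shared_mutex> shared(exclusive_lock);
        std::lock_guard<std::mutex> pinned(pinned_part_uuids_mutex);
        return !pinned_part_uuids.empty();
    }
};
```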
    
    Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
    azat committed Jun 9, 2023
    Commit: 14ecff6

Commits on Jun 12, 2023

  1. Commit: d89561c