feat: distributed execution of replace into statement #12119

Merged
BohuTANG merged 78 commits into datafuselabs:main on Aug 10, 2023

@SkyFan2002 SkyFan2002 commented Jul 17, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Summary about this PR



# Conflicts:
#	src/query/service/src/interpreters/interpreter_copy.rs
#	src/query/service/src/interpreters/interpreter_replace.rs
#	src/query/service/src/pipelines/pipeline_builder.rs
#	src/query/sql/src/executor/format.rs
#	src/query/sql/src/executor/physical_plan.rs
#	src/query/storages/fuse/src/operations/common/mutation_accumulator.rs
#	src/query/storages/fuse/src/operations/common/processors/transform_mutation_aggregator.rs
#	src/query/storages/fuse/src/operations/replace.rs
#	src/query/storages/fuse/src/operations/replace_into/mutator/merge_into_mutator.rs
#	src/query/storages/fuse/src/operations/replace_into/mutator/mutator_replace_into.rs
#	src/query/storages/fuse/src/operations/replace_into/processors/processor_replace_into.rs
# Conflicts:
#	src/query/service/src/pipelines/pipeline_builder.rs
#	src/query/sql/src/executor/format.rs
#	src/query/sql/src/executor/physical_plan.rs
#	src/query/sql/src/executor/physical_plan_display.rs
#	src/query/sql/src/executor/physical_plan_visitor.rs
#	src/query/sql/src/executor/profile.rs
@BohuTANG BohuTANG removed their request for review August 8, 2023 06:30
SkyFan2002 and others added 9 commits August 8, 2023 21:50
Co-authored-by: dantengsky <dantengsky@gmail.com>
Co-authored-by: dantengsky <dantengsky@gmail.com>
…/processor_unbranched_replace_into.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>
Co-authored-by: dantengsky <dantengsky@gmail.com>
Co-authored-by: dantengsky <dantengsky@gmail.com>

dantengsky commented Aug 9, 2023

@SkyFan2002 LGTM, and thanks. However, I found a highly suspicious bug in replace-into under standalone mode. Let's postpone merging this PR until that bug is fixed or confirmed to be a non-bug.

# Conflicts:
#	src/query/service/src/pipelines/pipeline_builder.rs
#	src/query/sql/src/executor/physical_plan.rs
#	src/query/sql/src/executor/physical_plan_visitor.rs
#	src/query/sql/src/executor/profile.rs
@SkyFan2002 SkyFan2002 added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Aug 10, 2023
@github-actions

Docker Image for PR

  • tag: pr-12119-e37c90a

note: this image tag is only available for internal use,
please check the internal doc for more details.


dantengsky commented Aug 10, 2023

rr long-run, 1000 iterations, local fs + standalone mode passed

@dantengsky

3 nodes, distributed deployment, local fs, 10000 iterations

PASSED

MySQL [(none)]> select * from system.clusters;
+------------------------+-----------+-------+--------------------------------------------------------------------------------------------------------------+
| name                   | host      | port  | version                                                                                                      |
+------------------------+-----------+-------+--------------------------------------------------------------------------------------------------------------+
| JQQmH7yG3CjyKSqZY2LyD4 | 127.0.0.1 | 19093 | v1.2.55-nightly-f545f7c2cb08567013c327acfc5271ac3531e274(rust-1.72.0-nightly-2023-08-10T09:27:14.168048062Z) |
| KyVJW8dsE1wYbkiepZrPA4 | 127.0.0.1 | 19091 | v1.2.55-nightly-f545f7c2cb08567013c327acfc5271ac3531e274(rust-1.72.0-nightly-2023-08-10T09:27:14.168048062Z) |
| fYzUEb3BGIZfAcmlvT1FA1 | 127.0.0.1 | 19092 | v1.2.55-nightly-f545f7c2cb08567013c327acfc5271ac3531e274(rust-1.72.0-nightly-2023-08-10T09:27:14.168048062Z) |
+------------------------+-----------+-------+--------------------------------------------------------------------------------------------------------------+
3 rows in set (0.002 sec)

MySQL [(none)]> set global enable_distributed_replace_into = 1;

[2023-08-10T14:29:59Z INFO  test_replace_recluster] ==========================
[2023-08-10T14:29:59Z INFO  test_replace_recluster] ====verify table state====
[2023-08-10T14:29:59Z INFO  test_replace_recluster] ==========================
[2023-08-10T14:29:59Z INFO  test_replace_recluster]
[2023-08-10T14:29:59Z INFO  test_replace_recluster]
[2023-08-10T14:29:59Z INFO  test_replace_recluster] number of successfully executed replace-into statements : 8326
[2023-08-10T14:29:59Z INFO  test_replace_recluster]
[2023-08-10T14:29:59Z INFO  test_replace_recluster]
[2023-08-10T14:29:59Z INFO  test_replace_recluster] CHECK: value of successfully executed replace into statements
[2023-08-10T14:29:59Z INFO  test_replace_recluster] CHECK: value of correlated column
[2023-08-10T14:29:59Z INFO  test_replace_recluster] CHECK: full table scanning
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ===========================
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ======     PASSED      ====
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ===========================
[2023-08-10T14:30:03Z INFO  test_replace_recluster]
[2023-08-10T14:30:03Z INFO  test_replace_recluster]
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ========METRICS============
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_commit_mutation_unresolvable_conflict : 8896.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_after_pruning : 74331.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_block_number_after_pruning : 37121.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_after_pruning : 199039.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_bloom_pruned : 70404.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_block_number_bloom_pruned : 34972.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_bloom_pruned : 180515.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_source : 32221.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_totally_loaded : 3541.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_block_number_totally_loaded : 2020.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_totally_loaded : 17260.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_whole_block_deletion : 53.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_whole_block_deletion : 217.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_write : 3541.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_block_number_write : 2020.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_write : 17260.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_block_number_zero_row_deleted : 333.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_block_number_zero_row_deleted : 129.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_block_number_zero_row_deleted : 1047.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_number_accumulate_merge_action : 9917.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_number_accumulate_merge_action : 3489.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_number_accumulate_merge_action : 35903.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_number_apply_deletion : 4328.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_number_apply_deletion : 1470.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_number_apply_deletion : 14231.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_partition_number : 13482774.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_partition_number : 16660260.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_partition_number : 16660260.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_row_number_after_pruning : 11719470790.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_row_number_after_pruning : 6094381660.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_row_number_after_pruning : 29244447168.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_row_number_after_table_level_pruning : 13484951.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_row_number_after_table_level_pruning : 16662834.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_row_number_after_table_level_pruning : 16662834.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_row_number_source_block : 13485000.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_row_number_source_block : 16663000.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_row_number_source_block : 16663000.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_row_number_totally_loaded : 607752660.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_row_number_totally_loaded : 353914055.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_row_number_totally_loaded : 2811621848.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_row_number_write : 607291502.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_row_number_write : 353705055.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_row_number_write : 2808425860.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_segment_number_after_pruning : 4327.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_segment_number_after_pruning : 1470.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_segment_number_after_pruning : 14219.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_time_accumulated_merge_action_ms : 14503.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_time_accumulated_merge_action_ms : 8123.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_time_accumulated_merge_action_ms : 54011.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_time_apply_deletion_ms : 646281.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_time_apply_deletion_ms : 378893.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_time_apply_deletion_ms : 2998191.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] JQQmH7yG3CjyKSqZY2LyD4  fuse_replace_into_time_process_input_block_ms : 15305.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  fuse_replace_into_time_process_input_block_ms : 14567.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] fYzUEb3BGIZfAcmlvT1FA1  fuse_replace_into_time_process_input_block_ms : 14484.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  replace_into_time_execution_ms : 4834740.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] KyVJW8dsE1wYbkiepZrPA4  replace_into_time_mutation_ms : 3670031.0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ===========================
[2023-08-10T14:30:03Z INFO  test_replace_recluster]
[2023-08-10T14:30:03Z INFO  test_replace_recluster]
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ======CLUSTERING INFO======
[2023-08-10T14:30:03Z INFO  test_replace_recluster] cluster_key : (to_yyyymmdd(insert_time), id)
[2023-08-10T14:30:03Z INFO  test_replace_recluster] block_count: 56
[2023-08-10T14:30:03Z INFO  test_replace_recluster] constant_block_count: 2
[2023-08-10T14:30:03Z INFO  test_replace_recluster] unclustered_block_count: 0
[2023-08-10T14:30:03Z INFO  test_replace_recluster] average_overlaps: 15.6429
[2023-08-10T14:30:03Z INFO  test_replace_recluster] average_depth: 10.2679
[2023-08-10T14:30:03Z INFO  test_replace_recluster] block_depth_histogram: {"00002":1,"00008":4,"00009":10,"00010":17,"00011":7,"00012":17}
[2023-08-10T14:30:03Z INFO  test_replace_recluster] ===========================
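
The numbers in the log above can be cross-checked with a little arithmetic. The sketch below is only an illustration and is not part of the PR. It assumes, from the metric names alone, that `block_number_after_pruning` minus `block_number_bloom_pruned` gives the blocks that survive bloom pruning, and that each survivor is counted as `totally_loaded`, `whole_block_deletion`, or `zero_row_deleted`. It also assumes a node with no `whole_block_deletion` line reported zero. The struct and function names (`NodeBlocks`, `logged_nodes`, `logged_histogram`, `average_depth`) are hypothetical. The second check recomputes `block_count` and `average_depth` from `block_depth_histogram`.

```rust
use std::collections::BTreeMap;

/// Block counters for one node, copied from the METRICS section of the log.
/// The meaning of each field is inferred from the metric name (an assumption).
struct NodeBlocks {
    node: &'static str,
    after_pruning: u64,
    bloom_pruned: u64,
    totally_loaded: u64,
    whole_block_deletion: u64, // assumed 0 when the log has no line for the node
    zero_row_deleted: u64,
}

impl NodeBlocks {
    /// Blocks left once the bloom-pruned ones are subtracted.
    fn survivors(&self) -> u64 {
        self.after_pruning.saturating_sub(self.bloom_pruned)
    }

    /// Blocks that the three outcome counters account for.
    fn accounted(&self) -> u64 {
        self.totally_loaded + self.whole_block_deletion + self.zero_row_deleted
    }
}

/// Values for the three nodes, as printed in the log.
fn logged_nodes() -> Vec<NodeBlocks> {
    vec![
        NodeBlocks {
            node: "KyVJW8dsE1wYbkiepZrPA4",
            after_pruning: 74_331,
            bloom_pruned: 70_404,
            totally_loaded: 3_541,
            whole_block_deletion: 53,
            zero_row_deleted: 333,
        },
        NodeBlocks {
            node: "fYzUEb3BGIZfAcmlvT1FA1",
            after_pruning: 37_121,
            bloom_pruned: 34_972,
            totally_loaded: 2_020,
            whole_block_deletion: 0,
            zero_row_deleted: 129,
        },
        NodeBlocks {
            node: "JQQmH7yG3CjyKSqZY2LyD4",
            after_pruning: 199_039,
            bloom_pruned: 180_515,
            totally_loaded: 17_260,
            whole_block_deletion: 217,
            zero_row_deleted: 1_047,
        },
    ]
}

/// block_depth_histogram from the CLUSTERING INFO section: depth -> block count.
fn logged_histogram() -> BTreeMap<u32, u64> {
    BTreeMap::from([(2, 1), (8, 4), (9, 10), (10, 17), (11, 7), (12, 17)])
}

/// Returns (total blocks, mean depth), or None for an empty histogram.
fn average_depth(histogram: &BTreeMap<u32, u64>) -> Option<(u64, f64)> {
    let blocks: u64 = histogram.values().sum();
    if blocks == 0 {
        return None;
    }
    let weighted: u64 = histogram
        .iter()
        .map(|(depth, count)| u64::from(*depth) * *count)
        .sum();
    Some((blocks, weighted as f64 / blocks as f64))
}

fn main() {
    for n in logged_nodes() {
        println!(
            "{}: survivors = {}, accounted = {}",
            n.node,
            n.survivors(),
            n.accounted()
        );
    }
    if let Some((blocks, avg)) = average_depth(&logged_histogram()) {
        println!("blocks = {blocks}, average_depth = {avg:.4}");
    }
}
```

With the logged values, the two block counts agree on every node, and the histogram yields 56 blocks with a mean depth that rounds to 10.2679, matching the reported `block_count` and `average_depth`.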


@dantengsky dantengsky left a comment

let's merge

@BohuTANG BohuTANG merged commit 93ad5f7 into datafuselabs:main Aug 10, 2023
55 checks passed
andylokandy pushed a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
…2119)

* refactor copy into

* fix panic

* fix

* fix

* fix

* make lint

* fix logic error

* replace into values

* fix

* fix

* fix render result

* fix schema cast

* temp

* respect datafuselabs#12147

* respect datafuselabs#12100

* make lint

* respect datafuselabs#12130

* fix merge

* add exchange

* fix conflict

* fix schema cast

* fix conflict

* fix

* fix copy plan

* clear log

* fix copy

* fix copy

* run ci

* fix purge

* make lint

* add exchange

* disable dist for value source

* adjust exchange

* remove top exchange

* adjust replace into

* reshuffle

* fix

* fix reshuffle

* move segment_partition_num

* resolve conflicts

* add need insert flag

* unbranched_replace_into_processor

* merge only pipeline

* fix segment index

* fix conflict

* remove log

* fix empty table

* fix stateful test

* fix stateful test

* modify test

* fix typo

* fix random source

* add setting

* remove empty file

* remove dead code

* add default setting

* Update src/query/service/src/interpreters/interpreter_replace.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>

* Update src/query/sql/src/executor/physical_plan_display.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>

* Update src/query/storages/fuse/src/operations/replace_into/processors/processor_unbranched_replace_into.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>

* Update src/query/sql/src/executor/physical_plan.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>

* Update src/query/sql/src/executor/physical_plan_display.rs

Co-authored-by: dantengsky <dantengsky@gmail.com>

* rename struct

* default 0

* regen golden file

* set enable_distributed_replace_into = 1 in slt

* make lint

---------

Co-authored-by: dantengsky <dantengsky@gmail.com>
Co-authored-by: JackTan25 <60096118+JackTan25@users.noreply.github.com>
Labels
ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase
Development

Successfully merging this pull request may close these issues.

Feature: distributed execution of replace into statement
5 participants