
feat: support Merge-Into V1 #12350

Merged: 124 commits, merged Aug 29, 2023


JackTan25 (Collaborator) commented Aug 3, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

References: Snowflake merge-into, Postgres merge-into

Summary of this PR

MERGE INTO semantics:

  1. If more than one source row matches the same target row, raise an error (Delta Lake semantics).
  2. If the MERGE statement has no WHEN MATCHED clauses, or it has WHEN MATCHED clauses but no rows actually match, use merge_insert_only (as the Delta Lake implementation does).
  3. Delta Lake supports a table as the source, but we also need to handle other sources such as streaming, streaming_v2, and VALUES, which makes the plan harder to build. In Delta Lake, the source can be anything that can be turned into a DataFrame (when using the Scala or Python API) or into a SQL query (when using the SQL API); it could even be the result of joining 100 tables.
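A minimal Rust sketch of rules 1 and 2, under stated assumptions: target rows are identified by a hypothetical `RowId` (`u64`), and the join has already produced the target row id of every (source, target) match. `check_duplicate_matches` and `use_insert_only` are illustrative names, not functions from the Databend codebase.

```rust
use std::collections::HashSet;

/// Hypothetical identifier of a target-table row (an assumption for this sketch;
/// the PR text does not say how row ids are represented).
type RowId = u64;

/// Rule 1: fail if any target row is matched by more than one source row.
/// `matched_target_ids` holds the target row id of every (source, target) match.
fn check_duplicate_matches(matched_target_ids: &[RowId]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for &id in matched_target_ids {
        // `HashSet::insert` returns false when the id was already present,
        // i.e. a second source row matched the same target row.
        if !seen.insert(id) {
            return Err(format!("target row {id} is matched by more than one source row"));
        }
    }
    Ok(())
}

/// Rule 2: take the insert-only path when there are no WHEN MATCHED clauses,
/// or when there are such clauses but no row actually matched.
fn use_insert_only(num_matched_clauses: usize, num_matched_rows: usize) -> bool {
    num_matched_clauses == 0 || num_matched_rows == 0
}
```

For example, matches on target rows `[1, 2, 1]` are rejected, while a statement with two WHEN MATCHED clauses and zero matching rows still qualifies for the insert-only path.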

delta-lake implementation survey



[image: logical implementation of V1]

V1 is not ready for users; it is only a testing version, and we will build further optimizations on top of it. Future optimizations:

tracking

vercel bot commented Aug 3, 2023: 1 ignored deployment (databend), last updated Aug 29, 2023 11:35am UTC.

JackTan25 changed the title from "try to add merge grammer" to "feat: support Merge" on Aug 3, 2023
dantengsky changed the title from "feat: support Merge" to "feat: support Merge-Into" on Aug 3, 2023
JackTan25 (Collaborator, Author) commented Aug 10, 2023

I need to transform the MERGE INTO statement into a SELECT, building the insert_source as a TableReference first. One shortcoming is that we can't run the optimizer on the query_source; let's resolve that in V2.
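One way to read the "MERGE INTO as a SELECT" idea is as a join that keeps every source row, followed by a split on whether a target row was found. The sketch below assumes a single equality key (like `t1.a = t2.a`), in-memory rows, and a hypothetical `RowId`; `join_and_split` and `JoinSplit` are illustrative names, not Databend APIs.

```rust
use std::collections::HashMap;

/// Hypothetical identifier of a target-table row (assumption for this sketch).
type RowId = u64;

/// Result of joining the source to the target on one equality key.
#[derive(Debug, Default, PartialEq)]
struct JoinSplit {
    /// (target row id, source row index) pairs: candidates for WHEN MATCHED clauses.
    matched: Vec<(RowId, usize)>,
    /// Source row indexes with no matching target row: candidates for WHEN NOT MATCHED.
    not_matched: Vec<usize>,
}

/// Hash join that keeps every source row (outer on the source side), then
/// splits the output by whether a target row was found.
/// `target` is a list of (row id, key); `source_keys[i]` is the key of source row i.
fn join_and_split(target: &[(RowId, i64)], source_keys: &[i64]) -> JoinSplit {
    // Build side: index target row ids by join key (keys may repeat).
    let mut index: HashMap<i64, Vec<RowId>> = HashMap::new();
    for &(row_id, key) in target {
        index.entry(key).or_default().push(row_id);
    }

    let mut out = JoinSplit::default();
    // Probe side: each source row lands in `matched` or `not_matched`.
    for (i, key) in source_keys.iter().enumerate() {
        match index.get(key) {
            Some(ids) => {
                for &id in ids {
                    out.matched.push((id, i));
                }
            }
            None => out.not_matched.push(i),
        }
    }
    out
}
```

The target row ids in `matched` are exactly what the "more than one source row per target row" check needs to inspect, while `not_matched` feeds the INSERT branch.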

JackTan25 changed the title from "feat: support Merge-Into" to "feat: support Merge-Into V1" on Aug 10, 2023
Resolved (outdated) review threads:
src/common/exception/src/exception_code.rs
src/query/expression/src/kernels/filter.rs
src/query/ast/src/parser/statement.rs
dantengsky (Member) commented:

CI job linux / sqllogic_standalone_base_parquet failed:

0: statement failed: mysql client error: Server error: `ERROR HY000 (1105): InvalidRowIdIndex. Code: 1503, Text = row id column should be a column, but it's a scalar.'
[SQL] merge into t1 using (select * from t2 as t2) on t1.a = t2.a  when matched then update set t1.c = t2.c  when not matched then insert (a,b,c) values(t2.a,t2.b,t2.c);
at tests/sqllogictests/suites/base/09_fuse_engine/09_0026_merge_into:73

https://github.com/datafuselabs/databend/actions/runs/6000760718/job/16292350787?pr=12350#step:4:838

JackTan25 (Collaborator, Author) commented:

@b41sh suggested some optimizations, which I will do first. Currently the matched and not-matched paths use concat, which causes many memory copies; I will switch to an update expression that works on a single block, so there is no need to split it. Thanks.
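The difference between split-and-concat and a single update expression can be sketched on a toy column. This assumes a plain `i64` column slice and a `bool` match bitmap rather than Databend's actual block types, and `apply_update_with_bitmap` is a hypothetical helper: matched rows take the new value in place, so no sub-blocks are built and no concat copy is needed.

```rust
/// Applies `SET c = new_c` only to rows whose bitmap bit is set, in place.
/// The block is never split into matched / not-matched parts, so nothing
/// has to be concatenated back afterwards.
fn apply_update_with_bitmap(column: &mut [i64], new_values: &[i64], matched: &[bool]) {
    assert_eq!(column.len(), new_values.len(), "column/new_values length mismatch");
    assert_eq!(column.len(), matched.len(), "column/bitmap length mismatch");
    for ((old, &new), &is_match) in column.iter_mut().zip(new_values).zip(matched) {
        // Row-wise equivalent of evaluating `if(matched, new_c, old_c)`.
        if is_match {
            *old = new;
        }
    }
}
```

For example, a column `[1, 2, 3]` with new values `[10, 20, 30]` and bitmap `[true, false, true]` becomes `[10, 2, 30]`.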

dantengsky (Member) reviewed:

👍 LGTM

JackTan25 merged commit cad282e into datafuselabs:main on Aug 29, 2023 (54 of 55 checks passed)
BohuTANG mentioned this pull request on Sep 27, 2023
andylokandy added a commit to andylokandy/databend that referenced this pull request Nov 27, 2023
* try to add merge grammar

* finish parser stage

* finish match_clause and unmatch_clause

* finish display for merge_into

* finish grammar parser, start bind stage

* remove useless codes

* add MergeIntoPlan

* fix distributed http error

* support insert values for match

* remove useless codes

* revert match_insert

* refactor merge_into_stmt, build table_reference

* cover match pattern

* cover match pattern

* stash

* try to add merge_into_source_scan

* add merge_source_scan

* bind join

* stash

* refactor merge_source

* bind clauses

* add new plan node

* add interpreter and refactor bind

* try to add processor

* add physical plan

* fix columns_set

* add update/insert expression

* remove unused codes and start to build processor and pipeline

* add split operator

* build source pipeline

* finish pipeline build, continue to work on not-matched and matched processor

* forbid different schemas for now

* refactor expr and finish event schedule

* add util split_by_expr

* finish not match insert

* fix

* refactor merge into pipeline

* add mutation logentries

* add matched mutation

* add setting

* set not support computed expr

* fix bug

* fix col_index bug

* fix pipeline bug and add basic tests

* fix test

* fix typos

* fix typos

* fix clippy

* add more tests

* add tests

* fix bugs

* use enable_experimental_merge_into instead, as advised by BohuTang

* add info

* fix typo

* fix ut

* fix native failure

* remove streamingV2Source, need to support streaming in next pr

* rename vars as advised by b41sh

* Update src/common/exception/src/exception_code.rs

Co-authored-by: Andy Lok <andylokandy@hotmail.com>

* remove useless comments

* fix

* unify codes, use bitmap to filter

* fix check

* unify codes

* fix

* check duplicate

* check duplicate

---------

Co-authored-by: Andy Lok <andylokandy@hotmail.com>
Co-authored-by: dantengsky <dantengsky@gmail.com>
Labels: ci-cloud (Build docker image for cloud test), pr-feature (this PR introduces a new feature to the codebase)

5 participants