[SUPPORT] COW non-partitioned table: upsert takes effect when the preCombineField value changes to a smaller one (SparkSQL) #6285

Closed
fujianhua168 opened this issue Aug 3, 2022 · 2 comments


@fujianhua168

Describe the problem you faced

Table description: a COW non-partitioned table (tblproperties with primaryKey and preCombineField).
SQL environment: Spark SQL
Description:
1. If I run an insert or update with the same primary key and a smaller preCombineField value, the write unexpectedly takes effect (note: a smaller preCombineField value should not overwrite the existing row, but it actually does).
2. If I run a merge into with the same primary key and a smaller preCombineField value, the write does not take effect, as expected (note: I think this is the normal behavior).
Strangely, the upsert actions in 1 and 2 above seem to conflict with each other.

To Reproduce

Steps to reproduce the behavior with SPARKSQL:

--step 1: create table
drop table if exists hudi_cow_pk_cbfield_tbl;
create table hudi_cow_pk_cbfield_tbl (
id bigint,
name string,
ts bigint
) using hudi
tblproperties (
type = 'cow',
primaryKey = 'id',
preCombineField = 'ts',
hoodie.datasource.write.hive_style_partitioning = false
)
;

--step 2: insert a record with primaryKey=1, preCombineField=1000
insert into hudi_cow_pk_cbfield_tbl select 1, 'a0', 1000;
--step 3: 'insert' with the same primaryKey but a smaller preCombineField value (100); unexpectedly, the write takes effect (note: this is not normal)
insert into hudi_cow_pk_cbfield_tbl select 1, 'a0_new', 100;
select * from hudi_cow_pk_cbfield_tbl;

--step 4: 'update' with the same primaryKey but a smaller preCombineField value (20); unexpectedly, the write takes effect again (note: this is not normal)
update hudi_cow_pk_cbfield_tbl set name='a1_new',ts=20 where id= 1;
select * from hudi_cow_pk_cbfield_tbl;

--step 5: 'merge into' with the same primaryKey but a smaller preCombineField value (10); the write does not take effect (note: I think this is normal)
merge into hudi_cow_pk_cbfield_tbl as target
using (select 1 as id,'a1_merge' as name,10 as ts) as source
on target.id = source.id
when matched then update set *
when not matched then insert *
;
select * from hudi_cow_pk_cbfield_tbl;

Expected behavior
If the preCombineField value changes to a smaller one, the upsert should not take effect. (I ran the same experiment on a COW partitioned table and on a MOR table, and both behave normally.)

Environment Description

  • Hudi version :0.11.1

  • Spark version :3.2.1

  • Hive version :3.1.0

  • Hadoop version :3.1.1

  • Storage (HDFS/S3/GCS..) :HDFS

  • Running on Docker? (yes/no) :no

@KnightChess
Contributor

KnightChess commented Aug 3, 2022

COW uses OverwriteWithLatestAvroPayload by default, but MOR uses DefaultHoodieRecordPayload.

Setting hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload' on the COW table can achieve the effect you want.

For MOR, the corresponding setting is hoodie.compaction.payload.class.
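For readers who want to try the suggestion, here is a minimal sketch of how it might be applied to the table from the reproduction. It assumes the payload class can be passed through tblproperties in the same way as the other hoodie.* option in the original DDL; that placement is not confirmed in this thread, and the table name hudi_cow_pk_cbfield_tbl_ordered is hypothetical.

```sql
-- Sketch only: same schema as the reproduction, with the payload class added.
-- Assumption: the payload class is honored when set in tblproperties.
-- The table name below is hypothetical, chosen to avoid clashing with the original.
drop table if exists hudi_cow_pk_cbfield_tbl_ordered;
create table hudi_cow_pk_cbfield_tbl_ordered (
  id bigint,
  name string,
  ts bigint
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'ts',
  'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload'
);

-- Repeat steps 2 and 3 of the reproduction against the new table.
insert into hudi_cow_pk_cbfield_tbl_ordered select 1, 'a0', 1000;
insert into hudi_cow_pk_cbfield_tbl_ordered select 1, 'a0_new', 100;

-- Inspect which version of the row survived.
select id, name, ts from hudi_cow_pk_cbfield_tbl_ordered;
```

If the setting behaves as described in the comment above, the second insert (ts = 100) should no longer replace the row written with ts = 1000. Verify this on your own Hudi and Spark versions before relying on it.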

@fujianhua168
Author

@KnightChess, I tried it and it works. Thank you!
