
[SUPPORT] spark sql cow hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload' not work #9342

Closed
lucienoz opened this issue Aug 2, 2023 · 3 comments
Labels: on-call-triaged, priority:minor, spark-sql


@lucienoz

lucienoz commented Aug 2, 2023

Environment Description

  • Hudi version: 0.12.3

  • Spark version: 3.3.1

  • Hive version: 3.1.2

  • Hadoop version: 3.3.5

  • Storage (HDFS/S3/GCS...): HDFS

  • Running on Docker? (yes/no): no

@lucienoz
Author

lucienoz commented Aug 2, 2023

create table hudi.payload_demo_cow (
  id bigint,
  value string,
  ts long
) using hudi
tblproperties (
  type = 'cow',
  primaryKey = 'id',
  hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload',
  preCombineField = 'ts'
);

-- first insert
insert into hudi.payload_demo_cow (id,value,ts) values (1,'2',3);

select * from hudi.payload_demo_cow;
+--+-----+--+
|id|value|ts|
+--+-----+--+
|1 |2    |3 |
+--+-----+--+

-- second insert
insert into hudi.payload_demo_cow (id,value,ts) values (1,'1',2);

select * from hudi.payload_demo_cow;
+--+-----+--+
|id|value|ts|
+--+-----+--+
|1 |1    |2 |
+--+-----+--+
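With DefaultHoodieRecordPayload and preCombineField = 'ts', the second insert carries an older ts (2 < 3), so the stored row (1, '2', 3) would be expected to survive; the observed result instead matches an "incoming always wins" merge. The sketch below is a simplified in-memory Python model of those two rules, not Hudi's API: Row, upsert, replay, default_payload and overwrite_with_latest are hypothetical names, and the tie-break (equal ts keeps the incoming row) reflects my reading of DefaultHoodieRecordPayload, so check it against the Hudi source for your version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Row:
    id: int      # primaryKey
    value: str
    ts: int      # preCombineField, used as the ordering field


Merge = Callable[[Row, Row], Row]


def overwrite_with_latest(persisted: Row, incoming: Row) -> Row:
    # Hypothetical model of OverwriteWithLatestAvroPayload: incoming always replaces stored.
    return incoming


def default_payload(persisted: Row, incoming: Row) -> Row:
    # Hypothetical model of DefaultHoodieRecordPayload: keep the stored row only when
    # its ordering value is strictly greater than the incoming one (ties go to incoming).
    return persisted if persisted.ts > incoming.ts else incoming


def upsert(table: Dict[int, Row], rows: Iterable[Row], merge: Merge) -> None:
    # Hypothetical key-based upsert into an in-memory "table" keyed by id.
    for row in rows:
        stored = table.get(row.id)
        table[row.id] = row if stored is None else merge(stored, row)


def replay(merge: Merge) -> Row:
    # Replays the two INSERT statements from the reproduction above.
    table: Dict[int, Row] = {}
    upsert(table, [Row(1, "2", 3)], merge)  # first insert
    upsert(table, [Row(1, "1", 2)], merge)  # second insert, older ts
    return table[1]


print(replay(default_payload))        # Row(id=1, value='2', ts=3): what the reporter expected
print(replay(overwrite_with_latest))  # Row(id=1, value='1', ts=2): matches the 0.12.3 result
```

The model only covers merging against an already stored row; it ignores precombining of duplicates inside one batch and delete markers.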

@codope added the priority:minor and spark-sql labels on Aug 2, 2023
lucienoz changed the title from "[SUPPORT] spark sql cow how to set payload class?" to "[SUPPORT] spark sql cow hoodie.datasource.write.payload.class = 'org.apache.hudi.common.model.DefaultHoodieRecordPayload' not work" on Aug 3, 2023
@ad1happy2go
Collaborator

@lucienoz Sorry for the delay here. I was able to reproduce this issue with Hudi 0.12.3, but it was fixed later in version 0.13.1. Can you please upgrade Hudi to 0.13.1?

Also, the new release (0.14.0), which will be out soon, brings quite a few improvements along the same lines. By default, INSERT INTO will behave as operation type insert and allow all duplicates to flow in. When INSERT INTO is configured to behave as insert, it behaves correctly according to the payload.
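To illustrate the difference described above between an INSERT INTO that runs as insert (duplicates flow in) and one that merges by key, here is a hypothetical in-memory Python model; it is not Hudi's implementation, and insert_into, Row and default_payload are illustrative names. As far as I know, 0.14.0 selects the operation through hoodie.spark.sql.insert.into.operation, but verify that key and its values against the 0.14.0 release notes before relying on it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Row:
    id: int
    value: str
    ts: int


def default_payload(persisted: Row, incoming: Row) -> Row:
    # Simplified ordering rule: larger ts wins, ties go to the incoming row (assumption).
    return persisted if persisted.ts > incoming.ts else incoming


def insert_into(table: List[Row], rows: Iterable[Row], operation: str,
                merge: Callable[[Row, Row], Row] = default_payload) -> None:
    """Hypothetical model of INSERT INTO on a primary-keyed table."""
    if operation == "insert":
        # No key lookup: every incoming row is appended, so duplicates flow in.
        table.extend(rows)
    elif operation == "upsert":
        # Key lookup: an existing row with the same id is merged via the payload rule.
        for row in rows:
            for i, stored in enumerate(table):
                if stored.id == row.id:
                    table[i] = merge(stored, row)
                    break
            else:
                table.append(row)
    else:
        raise ValueError(f"unsupported operation: {operation}")


for op in ("insert", "upsert"):
    t: List[Row] = []
    insert_into(t, [Row(1, "2", 3)], op)  # first insert
    insert_into(t, [Row(1, "1", 2)], op)  # second insert, older ts
    print(op, t)
```

Under this model, "insert" leaves both rows for id 1 in the table, while "upsert" keeps only Row(id=1, value='2', ts=3).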

@lucienoz
Author

Got it, thank you @ad1happy2go. I'll upgrade and give it a try later.
