[SUPPORT] Primary key check in deltastreamer #6487

@WangCHX

Description

Describe the problem you faced

We accidentally configured the wrong primary key (record key) in the Spark write config, which caused duplicate data. Is there a way to prevent this?

To Reproduce
Change the primary key in the write config and run the Spark job.

Expected behavior
The Spark job should probably be blocked from writing if the configured primary key differs from the primary key of the existing table.
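A minimal sketch of such a guard, assuming the existing table's record key is persisted in `hoodie.properties` under `hoodie.table.recordkey.fields` and the job sets it through `hoodie.datasource.write.recordkey.field`. The helpers `parse_properties`, `check_record_key`, and `RecordKeyMismatchError` are hypothetical, not Hudi APIs; the idea is just to compare the two comma-separated field lists and fail before anything is written.

```python
# Hypothetical pre-write guard. Property names are assumptions based on Hudi's
# table config (hoodie.properties) and datasource write options.
TABLE_RECORD_KEY_PROP = "hoodie.table.recordkey.fields"
WRITE_RECORD_KEY_OPT = "hoodie.datasource.write.recordkey.field"


class RecordKeyMismatchError(ValueError):
    """Raised when the configured record key differs from the table's."""


def parse_properties(text: str) -> dict:
    """Minimal parser for key=value lines; skips blanks and '#'/'!' comments.

    Does not handle Java properties escapes or line continuations.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def _fields(value: str) -> list:
    # Split a comma-separated field list; order is kept because it affects composite keys.
    return [f.strip() for f in value.split(",") if f.strip()]


def check_record_key(table_props_text: str, write_options: dict) -> None:
    """Fail fast if the write config's record key differs from the table's."""
    stored = parse_properties(table_props_text).get(TABLE_RECORD_KEY_PROP)
    configured = write_options.get(WRITE_RECORD_KEY_OPT)
    if stored is None or configured is None:
        # Nothing to compare (e.g. new table or option unset): let the write proceed.
        return
    if _fields(stored) != _fields(configured):
        raise RecordKeyMismatchError(
            f"record key mismatch: table has {_fields(stored)}, "
            f"write config has {_fields(configured)}"
        )


if __name__ == "__main__":
    props = "# hypothetical hoodie.properties excerpt\nhoodie.table.recordkey.fields=order_id\n"
    check_record_key(props, {WRITE_RECORD_KEY_OPT: "order_id"})  # same key: passes
    try:
        check_record_key(props, {WRITE_RECORD_KEY_OPT: "customer_id"})
    except RecordKeyMismatchError as err:
        print(err)  # record key mismatch: table has ['order_id'], write config has ['customer_id']
```

Running this check before `df.write` (or before starting the DeltaStreamer job) turns a silent duplicate-data problem into an immediate failure.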

Environment Description

  • Hudi version : 0.11.0

  • Spark version : 3.2.1

  • Storage (HDFS/S3/GCS..) : GCS

  • Running on Docker? (yes/no) : yes, on k8s.

Metadata

Labels

  • area:ingest: Ingestion into Hudi

  • area:writer: Write client and core write operations

  • priority:medium: Moderate impact; usability gaps

Projects

Status

✅ Done

Milestone

No milestone

Development

No branches or pull requests