Skip to content

[HUDI-6692] Don't default to bulk insert on nonpkless table if recordkey is omitted#9444

Merged
nsivabalan merged 6 commits intoapache:masterfrom
jonvex:spark_sql_write_dont_get_tableconfig_recordkey
Aug 23, 2023
Merged

[HUDI-6692] Don't default to bulk insert on nonpkless table if recordkey is omitted#9444
nsivabalan merged 6 commits intoapache:masterfrom
jonvex:spark_sql_write_dont_get_tableconfig_recordkey

Conversation

@jonvex
Copy link
Contributor

@jonvex jonvex commented Aug 14, 2023

Change Logs

If a write to a table with a pk was missing the recordkey field in options it could default to bulk insert because it was using the pre-merging properties. Now it uses the post merging properties for the recordkey field.

Impact

prevent unexpected behavior

Risk level (write none, low medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan nsivabalan changed the title [HUDI-6692] If pk table has no recordkey in write, it should fail [HUDI-6692] Do not allow switching from Primary keyed table to primary key less table Aug 14, 2023
@nsivabalan nsivabalan added release-0.14.0 priority:blocker Production down; release blocker labels Aug 14, 2023
@danny0405
Copy link
Contributor

If a write to a table with a pk was missing the recordkey field in options it would think it was a pkless write. now it fails

I'm confused, if we already know it is a table with pk, can we just use the field from table config as the record key by default. And we should not think it as a pkless table.

@jonvex
Copy link
Contributor Author

jonvex commented Aug 15, 2023

@danny0405 we default to bulk insert for pkless so what was happening is if the user forgets the recordkey field for a write it will do a bulk insert.

@danny0405
Copy link
Contributor

a write it will do a bulk insert.

Do we generate the _hoodie_record_key correctly in such case?

@jonvex jonvex changed the title [HUDI-6692] Do not allow switching from Primary keyed table to primary key less table [HUDI-6692] Don't default to bulk insert on nonpkless table if recordkey is omitted Aug 21, 2023
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan
Copy link
Contributor

@rahil-c : do you happened to know why TestFSUtilsWithRetryWrapperEnable is failing w/ java 17 specifically? if you check github actions, java 17 module is failing and when looked at the logs, I see TestFSUtilsWithRetryWrapperEnable is failing and is repeated for 204 times.

@nsivabalan nsivabalan merged commit cbd6f91 into apache:master Aug 23, 2023
@CTTY
Copy link
Contributor

CTTY commented Aug 23, 2023

Hi @nsivabalan, I checked the code and it looks like an expected behavior. For non-java17 test it also fails and repeats for 204 times. If we want it to repeat less then maybe we can look into reducing the retryLoop number at this line

prashantwason pushed a commit that referenced this pull request Sep 1, 2023
…key is omitted (#9444)

- If a write to a table with a pk was missing the recordkey field in options it could default to bulk insert because it was using the pre-merging properties. Now it uses the post merging properties for the recordkey field.

---------

Co-authored-by: Jonathan Vexler <=>
leosanqing pushed a commit to leosanqing/hudi that referenced this pull request Sep 13, 2023
…key is omitted (apache#9444)

- If a write to a table with a pk was missing the recordkey field in options it could default to bulk insert because it was using the pre-merging properties. Now it uses the post merging properties for the recordkey field.

---------

Co-authored-by: Jonathan Vexler <=>
TheR1sing3un pushed a commit to TheR1sing3un/hudi that referenced this pull request Feb 12, 2025
…key is omitted (apache#9444)

- If a write to a table with a pk was missing the recordkey field in options it could default to bulk insert because it was using the pre-merging properties. Now it uses the post merging properties for the recordkey field.

---------

Co-authored-by: Jonathan Vexler <=>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker release-0.14.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants