
[HUDI-5329] Spark reads table error when Flink creates table without record key and primary key #8933

Closed
SteNicholas wants to merge 2 commits into apache:master from SteNicholas:HUDI-5329

Conversation

@SteNicholas (Member)

Change Logs

Spark fails to read a table when Flink creates the table without precombine.field. Spark should read the table successfully when Flink creates a table without precombine.field.

Impact

Flink HoodieCatalog and HoodieHiveCatalog check precombine.field in createTable.

Risk level (none, low, medium, or high)

none.

Documentation Update

none.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

public static void checkPreCombineField(Configuration conf, List<String> columnNames) {
  String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
  if (!columnNames.contains(preCombineField)) {
danny0405 (Contributor):
Does it still throw after spark pkless support: #8107

SteNicholas (Member, Author), Jun 12, 2023:

@danny0405, it should still throw after the Spark pkless support in #8107, because #8107 covers auto-generation of record keys, not the precombine field.

danny0405 (Contributor):

After #8107, Spark will not throw an exception if there is no primary key; maybe we should just fix the primary key setup.

codope (Member):

@SteNicholas This comment does not look resolved. Can you please revisit?

@SteNicholas SteNicholas requested a review from danny0405 June 12, 2023 05:08
@yihua yihua added engine:spark Spark integration writer-core engine:flink Flink integration labels Jun 15, 2023
@SteNicholas SteNicholas changed the title [HUDI-5329] Spark reads table error when Flink creates table without precombine.field [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key Jun 16, 2023
@SteNicholas
Copy link
Member Author

@danny0405, I have addressed above comments. PTAL.

if (!flinkConf.contains(FlinkOptions.RECORD_KEY_FIELD)) {
  if (catalogTable.getUnresolvedSchema().getPrimaryKey().isPresent()) {
    final String pkColumns = String.join(",", catalogTable.getUnresolvedSchema().getPrimaryKey().get().getColumnNames());
    flinkConf.setString(FlinkOptions.RECORD_KEY_FIELD, pkColumns);
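The fallback in the excerpt above can be sketched in plain Java. This is a simplified illustration, not Hudi's actual code: a `Map` stands in for Flink's `Configuration`, and the option key string and helper name are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified sketch of the primary-key fallback: when no record key option is
// set explicitly, derive it from the table's primary key columns. The Map-based
// config and the key string below stand in for Flink's Configuration and
// FlinkOptions.RECORD_KEY_FIELD.
public class RecordKeyFallback {
  static final String RECORD_KEY_FIELD = "hoodie.datasource.write.recordkey.field";

  static void applyPrimaryKeyAsRecordKey(Map<String, String> conf,
                                         Optional<List<String>> primaryKeyColumns) {
    if (!conf.containsKey(RECORD_KEY_FIELD) && primaryKeyColumns.isPresent()) {
      // Join the primary key columns into a comma-separated record key list.
      conf.put(RECORD_KEY_FIELD, String.join(",", primaryKeyColumns.get()));
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    applyPrimaryKeyAsRecordKey(conf, Optional.of(List.of("uuid", "ts")));
    System.out.println(conf.get(RECORD_KEY_FIELD)); // prints "uuid,ts"
  }
}
```

An explicitly configured record key is left untouched; the primary key is only consulted when the option is absent, which mirrors the `!flinkConf.contains(...)` guard in the excerpt.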
danny0405 (Contributor):
Why must we have a primary key definition then?

if (!resolvedSchema.getColumnNames().contains(preCombineField)) {
  if (OptionsResolver.isDefaultHoodieRecordPayloadClazz(conf)) {
    throw new HoodieValidationException("Option '" + FlinkOptions.PRECOMBINE_FIELD.key()
        + "' is required for payload class: " + DefaultHoodieRecordPayload.class.getName());
danny0405 (Contributor):

The preCombine field can be omitted only in append mode.
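That rule can be sketched as a minimal check in plain Java. This is an illustration of the rule under discussion, not Hudi's actual API: `isAppendMode`, the method signature, and the exception type are stand-ins.

```java
import java.util.List;

// Sketch of the validation rule: in append mode records are never merged, so
// no precombine field is needed; otherwise (e.g. when a payload class like
// DefaultHoodieRecordPayload would merge records) the configured field must
// name an existing column.
public class PreCombineCheck {
  static void checkPreCombineField(String preCombineField,
                                   List<String> columnNames,
                                   boolean isAppendMode) {
    if (isAppendMode) {
      return; // append-only writes skip the check entirely
    }
    if (preCombineField == null || !columnNames.contains(preCombineField)) {
      throw new IllegalArgumentException(
          "Option 'precombine.field' must reference an existing column, got: " + preCombineField);
    }
  }

  public static void main(String[] args) {
    // Omitting the field is fine in append mode...
    checkPreCombineField(null, List.of("id", "ts"), true);
    // ...and it must name a real column when records can be merged.
    checkPreCombineField("ts", List.of("id", "ts"), false);
  }
}
```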

@codope codope added priority:high Significant impact; potential bugs release-0.14.0 labels Jun 28, 2023
@hudi-bot (Collaborator)

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@SteNicholas SteNicholas requested a review from danny0405 July 4, 2023 05:23
@SteNicholas (Member, Author)

@danny0405, could you please take a look?

@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:high Significant impact; potential bugs labels Jul 5, 2023
codope (Member) left a comment:

@SteNicholas Some comments do not look resolved. Can you please respond to Danny's comments? Also, please rebase after addressing them; several test fixes went in over the last month.

public static void checkPreCombineField(Configuration conf, List<String> columnNames) {
  String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
  if (!columnNames.contains(preCombineField)) {
codope (Member):

@SteNicholas This comment does not look resolved. Can you please revisit?

@codope (Member)

codope commented Aug 5, 2023

Closing this in favor of #9370. That PR handles relaxing precombine, and #8107 already handles the primary key.

@codope codope closed this Aug 5, 2023

Labels

engine:flink Flink integration engine:spark Spark integration priority:blocker Production down; release blocker release-0.14.0

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

6 participants