
[HUDI-5329] Spark reads table error when Flink creates table without record key and primary key #8933

Closed
SteNicholas wants to merge 2 commits into apache:master from SteNicholas:HUDI-5329

Conversation

@SteNicholas (Member)

Change Logs

Spark fails to read a table when Flink creates the table without precombine.field. Spark should read the table successfully when Flink creates a table without precombine.field.

Impact

Flink HoodieCatalog and HoodieHiveCatalog check precombine.field in createTable.

Risk level (none, low, medium, or high)

none.

Documentation Update

none.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

public static void checkPreCombineField(Configuration conf, List<String> columnNames) {
  String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
  if (!columnNames.contains(preCombineField)) {
danny0405 (Contributor):
Does it still throw after spark pkless support: #8107

SteNicholas (Member, Author), Jun 12, 2023:

@danny0405, it should still throw after the Spark pkless support in #8107, because #8107 covers auto-generation of record keys, not the precombine field.

danny0405 (Contributor):

After #8107, Spark will not throw an exception if there is no primary key; maybe we should just fix the primary key setup.

codope (Member):

@SteNicholas This comment does not look resolved. Can you please revisit?

@SteNicholas SteNicholas requested a review from danny0405 June 12, 2023 05:08
@yihua yihua added engine:spark Spark integration writer-core engine:flink Flink integration labels Jun 15, 2023
@SteNicholas SteNicholas changed the title [HUDI-5329] Spark reads table error when Flink creates table without precombine.field [HUDI-5329] Spark reads table error when Flink creates table without record key and primary key Jun 16, 2023
@SteNicholas
Copy link
Member Author

@danny0405, I have addressed above comments. PTAL.

if (!flinkConf.contains(FlinkOptions.RECORD_KEY_FIELD)) {
  if (catalogTable.getUnresolvedSchema().getPrimaryKey().isPresent()) {
    final String pkColumns = String.join(",", catalogTable.getUnresolvedSchema().getPrimaryKey().get().getColumnNames());
    flinkConf.setString(FlinkOptions.RECORD_KEY_FIELD, pkColumns);
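The fallback in the excerpt above can be sketched in plain Java. This is a simplified illustration, not Hudi's actual code: a `Map` stands in for Flink's `Configuration`, and the option key string and helper name are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified sketch of the primary-key fallback: when no record key option is
// set explicitly, derive it from the table's primary key columns. The Map-based
// config and the key string below stand in for Flink's Configuration and
// FlinkOptions.RECORD_KEY_FIELD.
public class RecordKeyFallback {
  static final String RECORD_KEY_FIELD = "hoodie.datasource.write.recordkey.field";

  static void applyPrimaryKeyAsRecordKey(Map<String, String> conf,
                                         Optional<List<String>> primaryKeyColumns) {
    if (!conf.containsKey(RECORD_KEY_FIELD) && primaryKeyColumns.isPresent()) {
      // Join the primary key columns into a comma-separated record key list.
      conf.put(RECORD_KEY_FIELD, String.join(",", primaryKeyColumns.get()));
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    applyPrimaryKeyAsRecordKey(conf, Optional.of(List.of("uuid", "ts")));
    System.out.println(conf.get(RECORD_KEY_FIELD)); // prints "uuid,ts"
  }
}
```

An explicitly configured record key is left untouched; the primary key is only consulted when the option is absent, which mirrors the `!flinkConf.contains(...)` guard in the excerpt.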
danny0405 (Contributor):
Why must we have a primary key definition then?

if (!resolvedSchema.getColumnNames().contains(preCombineField)) {
  if (OptionsResolver.isDefaultHoodieRecordPayloadClazz(conf)) {
    throw new HoodieValidationException("Option '" + FlinkOptions.PRECOMBINE_FIELD.key()
        + "' is required for payload class: " + DefaultHoodieRecordPayload.class.getName());
danny0405 (Contributor):

The preCombine field can be omitted only in append mode.
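That rule can be sketched as a minimal check in plain Java. This is an illustration of the rule under discussion, not Hudi's actual API: `isAppendMode`, the method signature, and the exception type are stand-ins.

```java
import java.util.List;

// Sketch of the validation rule: in append mode records are never merged, so
// no precombine field is needed; otherwise (e.g. when a payload class like
// DefaultHoodieRecordPayload would merge records) the configured field must
// name an existing column.
public class PreCombineCheck {
  static void checkPreCombineField(String preCombineField,
                                   List<String> columnNames,
                                   boolean isAppendMode) {
    if (isAppendMode) {
      return; // append-only writes skip the check entirely
    }
    if (preCombineField == null || !columnNames.contains(preCombineField)) {
      throw new IllegalArgumentException(
          "Option 'precombine.field' must reference an existing column, got: " + preCombineField);
    }
  }

  public static void main(String[] args) {
    // Omitting the field is fine in append mode...
    checkPreCombineField(null, List.of("id", "ts"), true);
    // ...and it must name a real column when records can be merged.
    checkPreCombineField("ts", List.of("id", "ts"), false);
  }
}
```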

@codope codope added priority:high Significant impact; potential bugs release-0.14.0 labels Jun 28, 2023
@hudi-bot (Collaborator)

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@SteNicholas SteNicholas requested a review from danny0405 July 4, 2023 05:23
@SteNicholas (Member, Author)

@danny0405, could you please take a look?

@nsivabalan nsivabalan added priority:blocker Production down; release blocker and removed priority:high Significant impact; potential bugs labels Jul 5, 2023
codope (Member) left a comment:

@SteNicholas Some comments do not look resolved. Can you please respond to Danny's comments? Also, please rebase after addressing them; several test fixes went in over the last month.

public static void checkPreCombineField(Configuration conf, List<String> columnNames) {
  String preCombineField = conf.get(FlinkOptions.PRECOMBINE_FIELD);
  if (!columnNames.contains(preCombineField)) {
codope (Member):

@SteNicholas This comment does not look resolved. Can you please revisit?

@codope (Member)

codope commented Aug 5, 2023

Closing this in favor of #9370. That PR handles relaxing precombine, and #8107 already handles the primary key.

@codope codope closed this Aug 5, 2023

Labels

engine:flink Flink integration engine:spark Spark integration priority:blocker Production down; release blocker release-0.14.0

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

6 participants