
[HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements#12716

Closed
lokeshj1703 wants to merge 5 commits into apache:master from lokeshj1703:HUDI-8553-sql-update-delete-positions

Conversation

@lokeshj1703
Collaborator

Change Logs

Spark SQL UPDATE and DELETE statements currently do not write record positions to the log files. This PR adds that metadata to the log blocks so it can be used for faster merge operations.

Impact

With this change, Spark SQL UPDATE and DELETE write record positions to the log files, which speeds up merging records.
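To illustrate why record positions help merging (a hypothetical, simplified sketch, not Hudi's actual merge code): if a log block tags each update and delete with the record's ordinal position in the base file, the reader can patch rows by index instead of joining on record keys.

```scala
// Hypothetical sketch: position-based merge of a base file with a log block.
// All names here are illustrative, not Hudi APIs.
object PositionBasedMergeSketch {
  // A base-file row, ordered by its position within the file.
  case class Row(key: String, value: Int)

  def merge(base: IndexedSeq[Row],
            updates: Map[Long, Int], // record position -> new value
            deletes: Set[Long]       // record positions to delete
           ): IndexedSeq[Row] = {
    base.zipWithIndex.collect {
      // Keep rows whose position is not deleted, applying any update by index.
      case (row, pos) if !deletes.contains(pos.toLong) =>
        updates.get(pos.toLong).fold(row)(v => row.copy(value = v))
    }
  }

  def main(args: Array[String]): Unit = {
    val base = Vector(Row("a", 1), Row("b", 2), Row("c", 3))
    val merged = merge(base, updates = Map(0L -> 10), deletes = Set(1L))
    println(merged) // Vector(Row(a,10), Row(c,3))
  }
}
```

Without positions, each logged update or delete would need a key lookup against the base file; with positions, the merge is a single ordered pass.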

Risk level (write none, low, medium, or high below)

low

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 27, 2025
Contributor

@nsivabalan nsivabalan left a comment


I don't see any changes in the MOR snapshot read relation. I was expecting some change there to return the right value for the row position.

Is the patch not yet fully ready? I'm not asking about tests, etc., just from a source-code standpoint.

val fileId = FSUtils.getFileId(fileName.get)
Some(new HoodieRecordLocation(instantTime.get, fileId))
val recordPosition: Option[Long] = if (fetchRecordLocationFromMetaFields) {
// TODO(yihua): fix
Contributor


to be fixed

attributeRefs = attributeRefs :+ AttributeReference(SparkAdapterSupport.sparkAdapter.getTemporaryRowIndexColumnName(), LongType, nullable = true)()

val schema = AvroSchemaUtils.projectSchema(
convertToAvroSchema(catalogTable.tableSchema, catalogTable.tableName),
Contributor


Let's see whether we need the full table schema or not, i.e., whether getting targetAttributes or the non-updated columns is enough.

@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Jan 28, 2025
@hudi-bot
Collaborator

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build


Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants