[HUDI-6686] - Handling empty commits after s3 applyFilter api by lokesh-lingarajan-0310 · Pull Request #9433 · apache/hudi

lokesh-lingarajan-0310 · 2023-08-11T20:13:29Z

Change Logs

Handle empty commits by returning the current batch's endpoint as the checkpoint. This covers the scenario where a customer configures filters for specific objects in S3 among other objects, so that a batch can be empty after the filter is applied.
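
As a rough illustration of the behavior described above, the following is a minimal sketch and not the code in this PR. The class, the `ObjectEvent` and `BatchResult` types, the `nextBatch` method, the key-prefix filter, and the `commit#key` checkpoint string are all hypothetical stand-ins; `java.util.Optional` is used in place of Hudi's `Option` so the example stays self-contained. It shows the idea that the checkpoint is taken from the end of the batch that was read, even when the filter leaves no rows to ingest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class EmptyBatchCheckpointSketch {

    /** Hypothetical stand-in for one S3 event row: source commit time and object key. */
    static final class ObjectEvent {
        final String commit;
        final String key;

        ObjectEvent(String commit, String key) {
            this.commit = commit;
            this.key = key;
        }
    }

    /** Hypothetical result: the checkpoint to persist and the rows to ingest, if any. */
    static final class BatchResult {
        final String checkpoint;
        final Optional<List<ObjectEvent>> rows;

        BatchResult(String checkpoint, Optional<List<ObjectEvent>> rows) {
            this.checkpoint = checkpoint;
            this.rows = rows;
        }
    }

    static BatchResult nextBatch(List<ObjectEvent> batch, String keyPrefix, String previousCheckpoint) {
        if (batch.isEmpty()) {
            // Nothing was read at all: keep the previous checkpoint.
            return new BatchResult(previousCheckpoint, Optional.empty());
        }
        // The endpoint comes from the last row read, before any filtering.
        ObjectEvent last = batch.get(batch.size() - 1);
        String endpoint = last.commit + "#" + last.key; // assumed checkpoint format

        // Keep only the objects the (assumed) prefix filter selects.
        List<ObjectEvent> matching = new ArrayList<>();
        for (ObjectEvent event : batch) {
            if (event.key.startsWith(keyPrefix)) {
                matching.add(event);
            }
        }
        // An empty Optional signals "no rows to ingest"; the checkpoint still moves forward.
        Optional<List<ObjectEvent>> rows =
            matching.isEmpty() ? Optional.empty() : Optional.of(matching);
        return new BatchResult(endpoint, rows);
    }

    public static void main(String[] args) {
        List<ObjectEvent> batch = Arrays.asList(
            new ObjectEvent("001", "logs/a.txt"),
            new ObjectEvent("002", "logs/b.txt"));
        BatchResult result = nextBatch(batch, "data/", "000#start");
        System.out.println("checkpoint=" + result.checkpoint + ", hasRows=" + result.rows.isPresent());
    }
}
```

In this sketch, a batch whose objects are all filtered out still advances the checkpoint, so the next run does not re-read the same events.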

Impact

Medium

Risk level (write none, low, medium or high below)

Medium

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

amrishlal · 2023-08-11T22:23:35Z

hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java

    LOG.info("Processed batch size: " + row.get(row.fieldIndex(CUMULATIVE_COLUMN_NAME)) + " bytes");
    sourceData.unpersist();
-    return Pair.of(new CloudObjectIncrCheckpoint(row.getString(0), row.getString(1)), collectedRows);
+    return Pair.of(new CloudObjectIncrCheckpoint(row.getString(0), row.getString(1)), Option.of(collectedRows));


Minor: I would suggest using row.fieldIndex here, just as you already do for CUMULATIVE_COLUMN_NAME.

Also, please open a ticket for the optimizations we were discussing offline (use of applyOrdering in line 182 and the temporary commit_key column).

https://issues.apache.org/jira/browse/HUDI-6687

hudi-bot · 2023-08-12T00:03:09Z

CI report:

32538d1 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

codope · 2023-08-15T06:51:54Z

hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java

    // Create S3 paths
    SerializableConfiguration serializableHadoopConf = new SerializableConfiguration(sparkContext.hadoopConfiguration());
-    List<CloudObjectMetadata> cloudObjectMetadata = checkPointAndDataset.getRight()
+    List<CloudObjectMetadata> cloudObjectMetadata = checkPointAndDataset.getRight().get()


Can the Option be empty or nullable? Should we check before calling get() on Option?

We are doing that in line 166.
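
For readers following the exchange above, here is a minimal sketch of the kind of guard being discussed. It is not the code at line 166. It uses `java.util.Optional` as a stand-in for Hudi's `Option`, and the class name, `describeBatch` method, and message strings are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class OptionGuardSketch {

    /**
     * Hypothetical caller: checks that the optional batch is present and non-empty
     * before calling get(), and otherwise reports only the checkpoint.
     */
    static String describeBatch(String checkpoint, Optional<List<String>> objectKeys) {
        if (!objectKeys.isPresent() || objectKeys.get().isEmpty()) {
            // Empty batch: nothing to read, but the checkpoint is still carried forward.
            return "no data, checkpoint=" + checkpoint;
        }
        // Safe to call get() here because presence was checked above.
        List<String> keys = objectKeys.get();
        return keys.size() + " object(s), checkpoint=" + checkpoint;
    }

    public static void main(String[] args) {
        System.out.println(describeBatch("002#logs/b.txt", Optional.empty()));
        System.out.println(describeBatch("002#data/b.txt", Optional.of(Arrays.asList("data/a.txt", "data/b.txt"))));
    }
}
```

Placing the presence check ahead of any `get()` call keeps the empty-batch path explicit and avoids an exception when the filter removes every object.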

Handling empty commit and returning current batch's endpoint to handle scenarios of customer configuring filters for specific objects in s3 among other objects. Co-authored-by: Lokesh Lingarajan <lokeshlingarajan@Lokeshs-MacBook-Pro.local>

Handling empty commits after s3 applyFilter api

32538d1

lokesh-lingarajan-0310 force-pushed the emptycommit branch from cf25b9b to 32538d1 Compare August 11, 2023 20:15

amrishlal approved these changes Aug 11, 2023

lokesh-lingarajan-0310 requested a review from amrishlal August 12, 2023 00:08

codope approved these changes Aug 15, 2023

nsivabalan added release-0.14.0 priority:blocker Production down; release blocker labels Aug 15, 2023

nsivabalan merged commit 7eef7d9 into apache:master Aug 15, 2023

hudi-bot mentioned this pull request Nov 30, 2025

S3/GCS incr job improvements #16174

Open

hudi-bot mentioned this pull request Dec 9, 2025

Handling empty commit for s3 Incr job #16173

Open
