[HUDI-6686] - Handling empty commits after s3 applyFilter api #9433
Merged
nsivabalan merged 1 commit into apache:master on Aug 15, 2023
cf25b9b to 32538d1
amrishlal approved these changes Aug 11, 2023
  LOG.info("Processed batch size: " + row.get(row.fieldIndex(CUMULATIVE_COLUMN_NAME)) + " bytes");
  sourceData.unpersist();
- return Pair.of(new CloudObjectIncrCheckpoint(row.getString(0), row.getString(1)), collectedRows);
+ return Pair.of(new CloudObjectIncrCheckpoint(row.getString(0), row.getString(1)), Option.of(collectedRows));
Contributor
Minor: I would suggest using row.fieldIndex here, just like what you have for CUMULATIVE_COLUMN_NAME.
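To illustrate the suggestion above, here is a minimal sketch of name-based column lookup instead of positional `getString(0)` / `getString(1)`. Spark's `Row` does expose `fieldIndex(name)` and `getString(i)`, but `RowSketch` below is a hypothetical stand-in so the example runs without Spark, and the column names `commit_time` and `key` are assumptions, since the thread does not show the real ones.

```java
import java.util.List;

// Hypothetical stand-in for a Spark Row: just enough to show lookup by name.
final class RowSketch {
    private final List<String> fieldNames;
    private final List<Object> values;

    RowSketch(List<String> fieldNames, List<Object> values) {
        this.fieldNames = fieldNames;
        this.values = values;
    }

    // Same idea as Row.fieldIndex(name): resolve a column position from its name.
    int fieldIndex(String name) {
        int idx = fieldNames.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return idx;
    }

    String getString(int i) {
        return (String) values.get(i);
    }
}

final class FieldIndexSketch {
    // Assumed column names, for illustration only.
    static final String COMMIT_TIME_COLUMN = "commit_time";
    static final String KEY_COLUMN = "key";

    // Reads the two checkpoint components by name, so a change in column order
    // does not silently swap them.
    static String[] checkpointParts(RowSketch row) {
        return new String[] {
            row.getString(row.fieldIndex(COMMIT_TIME_COLUMN)),
            row.getString(row.fieldIndex(KEY_COLUMN))
        };
    }
}
```

With this shape, reordering the selected columns leaves `checkpointParts` unchanged, whereas positional access would need to be updated by hand.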
Contributor
Also, please open a ticket for the optimizations that we were discussing offline (use of applyOrdering in line 182 and the commit_key temporary column).
Contributor
Author
Collaborator
codope approved these changes Aug 15, 2023
  // Create S3 paths
  SerializableConfiguration serializableHadoopConf = new SerializableConfiguration(sparkContext.hadoopConfiguration());
- List<CloudObjectMetadata> cloudObjectMetadata = checkPointAndDataset.getRight()
+ List<CloudObjectMetadata> cloudObjectMetadata = checkPointAndDataset.getRight().get()
Member
Can the Option be empty or nullable? Should we check before calling get() on Option?
Contributor
Author
We are doing that in line 166.
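The exchange above concerns guarding an optional value before calling `get()`. A minimal sketch of that guard follows, using `java.util.Optional` as a stand-in for Hudi's `Option`. `BatchResult` and `pathsToRead` are hypothetical names introduced for illustration, and this is not the check at line 166 of the PR, which is not shown in the thread.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class OptionGuardSketch {
    // Hypothetical result type: a checkpoint plus an optional batch of object paths.
    static final class BatchResult {
        final String checkpoint;
        final Optional<List<String>> objects;

        BatchResult(String checkpoint, Optional<List<String>> objects) {
            this.checkpoint = checkpoint;
            this.objects = objects;
        }
    }

    // Returns the paths to read, or an empty list when the batch carried nothing.
    // The presence check runs before get(), so get() is never called on an empty Optional.
    static List<String> pathsToRead(BatchResult result) {
        if (!result.objects.isPresent() || result.objects.get().isEmpty()) {
            return Collections.emptyList();
        }
        return result.objects.get();
    }
}
```

Doing the presence check once, early, lets later code call `get()` safely, which appears to be the pattern the author is referring to.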
prashantwason pushed a commit that referenced this pull request Aug 18, 2023
Handling empty commit and returning current batch's endpoint to handle scenarios of customer configuring filters for specific objects in s3 among other objects. Co-authored-by: Lokesh Lingarajan <lokeshlingarajan@Lokeshs-MacBook-Pro.local>
leosanqing pushed a commit to leosanqing/hudi that referenced this pull request Sep 13, 2023
Handling empty commit and returning current batch's endpoint to handle scenarios of customer configuring filters for specific objects in s3 among other objects. Co-authored-by: Lokesh Lingarajan <lokeshlingarajan@Lokeshs-MacBook-Pro.local>
Change Logs
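Handle empty commits by returning the current batch's endpoint as the checkpoint. This covers scenarios where a customer configures filters that select only specific objects in S3 among other objects, so a batch can contain no matching objects after the filter is applied.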
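As a rough sketch of the behavior described above, under the assumption that a batch is ordered by commit time and then key: the filter may drop every object in the batch, but the checkpoint still advances to the last object of the unfiltered batch so the same range is not re-read. All names here (`CloudObject`, `FetchResult`, `nextBatch`) are hypothetical and are not the classes touched by this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class EmptyBatchCheckpointSketch {
    // Hypothetical record of one cloud object event.
    static final class CloudObject {
        final String commitTime;
        final String key;

        CloudObject(String commitTime, String key) {
            this.commitTime = commitTime;
            this.key = key;
        }
    }

    // Hypothetical result: the batch endpoint plus the matching objects, if any.
    static final class FetchResult {
        final String endCommit;
        final String endKey;
        final Optional<List<CloudObject>> matched;

        FetchResult(String endCommit, String endKey, Optional<List<CloudObject>> matched) {
            this.endCommit = endCommit;
            this.endKey = endKey;
            this.matched = matched;
        }
    }

    // Applies the filter, but always reports the endpoint of the unfiltered batch,
    // so an all-filtered-out batch still moves the checkpoint forward.
    static FetchResult nextBatch(List<CloudObject> batch, Predicate<CloudObject> filter) {
        if (batch.isEmpty()) {
            throw new IllegalArgumentException("batch must not be empty");
        }
        List<CloudObject> matched = new ArrayList<>();
        for (CloudObject o : batch) {
            if (filter.test(o)) {
                matched.add(o);
            }
        }
        CloudObject last = batch.get(batch.size() - 1);
        Optional<List<CloudObject>> result;
        if (matched.isEmpty()) {
            result = Optional.empty();
        } else {
            result = Optional.of(matched);
        }
        return new FetchResult(last.commitTime, last.key, result);
    }
}
```

A caller would persist `endCommit` and `endKey` as the new checkpoint in both cases and only read data when `matched` is present.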
Impact
Medium
Risk level (write none, low medium or high below)
Medium
Contributor's checklist