[HUDI-5336] Fixing log file pattern match to ignore extraneous files #7612
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
@@ -134,7 +134,7 @@ public void testFailedToGetAppendStreamFromHDFSNameNode()
     // Opening a new Writer right now will throw IOException. The code should handle this, rollover the logfile and
     // return a new writer with a bumped up logVersion
     writer = HoodieLogFormat.newWriterBuilder().onParentPath(testPath)
-        .withFileExtension(HoodieArchivedLogFile.ARCHIVE_EXTENSION).withFileId("commits.archive")
+        .withFileExtension(HoodieArchivedLogFile.ARCHIVE_EXTENSION).withFileId("commits")
Does this mean the original tests did not construct the archived log file name properly?
yes.
hudi-common/src/test/java/org/apache/hudi/common/table/view/TestHoodieTableFileSystemView.java
LGTM
…pache#7612) Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Change Logs
GCS may create marker files for log files written by Hudi. When constructing file groups from log files, we should ignore such files. These marker files are named "GCS_SYNCABLE_TEMPFILE" + [LOG_FILE_NAME], e.g. "GCS_SYNCABLE_TEMPFILE.files-0000_20230104082331173001.log.10_0-52-553.1.8170c3dc-f1f0-474f-aabf-b53a474aa18d".
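To make the naming concrete, the example marker name can be split into the "GCS_SYNCABLE_TEMPFILE" prefix and a remainder that still looks like a Hudi log file name (leading `.`, file ID, commit time, `.log.`, version, write token), followed by an extra suffix. This is a minimal sketch; the class `GcsMarkerName` and its helpers are hypothetical illustrations, not Hudi APIs.

```java
import java.util.Optional;

// Hypothetical helper (not part of Hudi): recognizes a GCS marker file name of the form
// "GCS_SYNCABLE_TEMPFILE" + <Hudi log file name> (+ any extra suffix).
public class GcsMarkerName {

  // Prefix taken from the example marker name in the change log.
  static final String GCS_TEMPFILE_PREFIX = "GCS_SYNCABLE_TEMPFILE";

  // True if the file name starts with the GCS temp-file prefix.
  static boolean isGcsMarker(String fileName) {
    return fileName.startsWith(GCS_TEMPFILE_PREFIX);
  }

  // Returns the part of the name after the prefix, or empty if the prefix is absent.
  static Optional<String> stripPrefix(String fileName) {
    return isGcsMarker(fileName)
        ? Optional.of(fileName.substring(GCS_TEMPFILE_PREFIX.length()))
        : Optional.empty();
  }

  public static void main(String[] args) {
    String marker = "GCS_SYNCABLE_TEMPFILE.files-0000_20230104082331173001.log.10_0-52-553"
        + ".1.8170c3dc-f1f0-474f-aabf-b53a474aa18d";
    System.out.println(isGcsMarker(marker));
    // The remainder begins with ".files-0000_...log.10_0-52-553", i.e. an embedded log file name.
    System.out.println(stripPrefix(marker).get());
  }
}
```

Because the log file name is embedded verbatim inside the marker name, any matcher that only looks for a log-file-shaped substring will also accept the marker.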
We attempted to fix this before, but the marker file names turned out to follow a different pattern than that fix expected. This patch therefore makes the log file pattern match stricter, to ensure such extraneous files are ignored.
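The difference between a loose and a strict log file pattern can be sketched as below. Both regular expressions are assumptions chosen for illustration and are not claimed to be the exact patterns in `FSUtils`: the loose one is unanchored and used with `Matcher.find()`, so it accepts any name that merely contains a log-file-shaped substring; the strict one is anchored at both ends (`^`, `$`), requires the leading `.`, restricts the extension to `log` or `archive`, and is used with `Matcher.matches()`.

```java
import java.util.regex.Pattern;

public class LogFilePatternDemo {

  // Hypothetical loose pattern: unanchored, so find() succeeds if the name merely
  // contains something shaped like ".<fileId>_<commit>.<ext>.<version>[_<token>]".
  static final Pattern LOOSE =
      Pattern.compile("\\.(.*)_(.*)\\.(.*)\\.([0-9]*)(_(([0-9]*)-([0-9]*)-([0-9]*)))?");

  // Hypothetical strict pattern: the name must start with '.', use a "log" or "archive"
  // extension, and end right after the optional write token.
  static final Pattern STRICT =
      Pattern.compile("^\\.(.+)_(.*)\\.(log|archive)\\.(\\d+)(_(\\d+)-(\\d+)-(\\d+))?$");

  static boolean looseAccepts(String name) {
    return LOOSE.matcher(name).find();
  }

  static boolean strictAccepts(String name) {
    return STRICT.matcher(name).matches();
  }

  public static void main(String[] args) {
    String logFile = ".files-0000_20230104082331173001.log.10_0-52-553";
    String marker = "GCS_SYNCABLE_TEMPFILE" + logFile + ".1.8170c3dc-f1f0-474f-aabf-b53a474aa18d";
    System.out.println(looseAccepts(logFile) + " " + strictAccepts(logFile)); // true true
    System.out.println(looseAccepts(marker) + " " + strictAccepts(marker));   // true false
  }
}
```

The anchors do the real work here: the marker fails the strict pattern both because it does not start with `.` and because of the trailing suffix. This sketch does not try to cover every Hudi file naming variant (for example, suffixed CDC log files), so a real pattern would need to account for those.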
Impact
Queries and compaction should no longer fail occasionally because of these marker files.
Risk level (write none, low, medium, or high below)
low.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change
N/A
Contributor's checklist