Reduce the disk usage for segment conversion task #7193
Codecov Report
@@ Coverage Diff @@
## master #7193 +/- ##
============================================
- Coverage 73.49% 73.48% -0.02%
Complexity 92 92
============================================
Files 1506 1506
Lines 73808 73795 -13
Branches 10650 10655 +5
============================================
- Hits 54243 54225 -18
- Misses 16026 16032 +6
+ Partials 3539 3538 -1
if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
  segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
} else {
  throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());
}
Not relevant to this PR.
I feel we may want a util function that checks whether a file is in tar.gz format, rather than relying on the file name.
E.g. the controller directory stores segment tar.gz files without an extension.
Good point. One workaround would be to blindly untar the file, assuming it is a tar.gz file.
Ideally we should add an extra config to indicate the file type, so that this command can process any data files, not only Pinot segments.
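For reference, a minimal sketch of the kind of util discussed above, assuming it is enough to sniff the two-byte gzip magic number (0x1f 0x8b) instead of trusting the extension. The class and method names (`GzipSniffer`, `looksLikeGzip`) are hypothetical and not part of Pinot, and the check only tells us the file is gzip-compressed, not that it actually contains a tar archive.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not an existing Pinot class.
public final class GzipSniffer {
  private GzipSniffer() {
  }

  /**
   * Returns true if the file starts with the gzip magic bytes 0x1f 0x8b.
   * This does not verify that the compressed payload is a tar archive.
   */
  public static boolean looksLikeGzip(File file) throws IOException {
    if (!file.isFile() || file.length() < 2) {
      return false;
    }
    try (InputStream in = new FileInputStream(file)) {
      int first = in.read();
      int second = in.read();
      return first == 0x1f && second == 0x8b;
    }
  }
}
```

With something like this, the `endsWith(".tar.gz")` check could fall back to `GzipSniffer.looksLikeGzip(segmentDir)` for files stored without an extension, while still throwing for anything that is clearly not gzip.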
Description
Reduce the disk usage for segment conversion task by: