Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the disk usage for segment conversion task #7193

Merged
merged 1 commit into from Jul 24, 2021

Conversation

Jackie-Jiang
Copy link
Contributor

Description

Reduce the disk usage for segment conversion task by:

  • Eliminate the file copy
  • Delete the file once it's no longer needed

@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2021

Codecov Report

Merging #7193 (4e6476c) into master (a0ad49d) will decrease coverage by 0.01%.
The diff coverage is 75.67%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7193      +/-   ##
============================================
- Coverage     73.49%   73.48%   -0.02%     
  Complexity       92       92              
============================================
  Files          1506     1506              
  Lines         73808    73795      -13     
  Branches      10650    10655       +5     
============================================
- Hits          54243    54225      -18     
- Misses        16026    16032       +6     
+ Partials       3539     3538       -1     
Flag Coverage Δ
integration 41.85% <58.10%> (+0.05%) ⬆️
unittests 65.21% <62.16%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ion/tasks/BaseSingleSegmentConversionExecutor.java 73.77% <27.27%> (-10.85%) ⬇️
.../tasks/BaseMultipleSegmentsConversionExecutor.java 75.86% <43.75%> (-6.19%) ⬇️
...rocessing/framework/SegmentProcessorFramework.java 98.36% <95.65%> (+2.80%) ⬆️
...on/tasks/merge_rollup/MergeRollupTaskExecutor.java 88.37% <100.00%> (-0.27%) ⬇️
...egments/RealtimeToOfflineSegmentsTaskExecutor.java 94.11% <100.00%> (-0.09%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 47.42% <0.00%> (-9.80%) ⬇️
...ntroller/helix/core/minion/TaskMetricsEmitter.java 77.27% <0.00%> (-9.10%) ⬇️
.../impl/dictionary/FloatOnHeapMutableDictionary.java 69.87% <0.00%> (-6.03%) ⬇️
...lix/core/minion/PinotHelixTaskResourceManager.java 72.00% <0.00%> (-4.67%) ⬇️
...t/processing/framework/SegmentProcessorConfig.java 95.65% <0.00%> (-4.35%) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a0ad49d...4e6476c. Read the comment docs.

@xiangfu0 xiangfu0 self-requested a review July 23, 2021 21:20
if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
} else {
throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to this PR.
I somehow feel we may want to have a util function to check if a file is in tar gz format.
E.g. controller directory stores segment tar gz files without extension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. One workaround would be blindly untar the file assuming it is a tar.gz file.
Ideally we should put some extra config to indicate the file type, and we can use this command to process any data files, not limited to pinot segments.

@Jackie-Jiang Jackie-Jiang merged commit 2f3774b into apache:master Jul 24, 2021
@Jackie-Jiang Jackie-Jiang deleted the reduce_file_copies branch July 24, 2021 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants