Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow uploading a segment with a different name from HTTP header #7306

Closed
wants to merge 1 commit into from

Conversation

xiangfu0
Copy link
Contributor

Description

Allow uploading a segment with a different name from HTTP header

Sample request for batch quickstarts:

curl -X POST -H 'Pinot-Table-Name:baseballStats'  -H 'DOWNLOAD_URI:http://192.168.1.110:9000/segments/baseballStats/baseballStats_OFFLINE_0' -H 'UPLOAD_TYPE:METADATA'  -H "Pinot-Segment-Name:baseballStats_OFFLINE_1" -F baseballStats_OFFLINE_1=@examples/batch/baseballStats/segments/baseballStats_OFFLINE_0.tar.gz localhost:9000/segments

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

@codecov-commenter
Copy link

codecov-commenter commented Aug 15, 2021

Codecov Report

Merging #7306 (6970d07) into master (50884b7) will increase coverage by 0.06%.
The diff coverage is 42.85%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #7306      +/-   ##
============================================
+ Coverage     71.25%   71.31%   +0.06%     
+ Complexity     3361     3277      -84     
============================================
  Files          1503     1503              
  Lines         74007    74012       +5     
  Branches      10712    10713       +1     
============================================
+ Hits          52730    52783      +53     
+ Misses        17667    17621      -46     
+ Partials       3610     3608       -2     
Flag Coverage Δ
integration1 29.98% <42.85%> (+0.15%) ⬆️
integration2 29.33% <42.85%> (+0.11%) ⬆️
unittests1 69.01% <0.00%> (+0.01%) ⬆️
unittests2 14.65% <0.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../org/apache/pinot/segment/spi/SegmentMetadata.java 100.00% <ø> (ø)
...egment/spi/index/metadata/SegmentMetadataImpl.java 73.25% <0.00%> (-0.87%) ⬇️
...ces/PinotSegmentUploadDownloadRestletResource.java 55.71% <60.00%> (-0.33%) ⬇️
.../org/apache/pinot/core/startree/StarTreeUtils.java 71.42% <0.00%> (-2.20%) ⬇️
...ache/pinot/common/metadata/ZKMetadataProvider.java 80.45% <0.00%> (-0.76%) ⬇️
...ion/groupby/AggregationGroupByTrimmingService.java 93.47% <0.00%> (-0.73%) ⬇️
...nction/DistinctCountBitmapAggregationFunction.java 55.15% <0.00%> (-0.52%) ⬇️
...manager/realtime/LLRealtimeSegmentDataManager.java 72.37% <0.00%> (-0.15%) ⬇️
...time/impl/dictionary/OffHeapMutableBytesStore.java 91.11% <0.00%> (ø)
...roker/requesthandler/BaseBrokerRequestHandler.java 71.50% <0.00%> (+0.21%) ⬆️
... and 13 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50884b7...6970d07. Read the comment docs.

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a check to make sure that realtime and offline segment names correspond to the format that is required by these tables? Otherwise things can go wrong

@@ -235,7 +237,10 @@ private SuccessResponse uploadSegment(@Nullable String tableName, TableType tabl
SegmentMetadata segmentMetadata = getSegmentMetadata(tempDecryptedFile, tempSegmentDir, metadataProviderClass);

// Fetch segment name
String segmentName = segmentMetadata.getName();
if (!StringUtils.isEmpty(segmentName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to replace the segment name in the metadata file and rewrite the tar file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to override the segment name when registering the segment in a table. So no need to rewrite the metadata file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does segment load work fine? Used to be that the name in the segment metadata is the source of truth for segment name. Has that changed?

@kishoreg
Copy link
Member

why do we need this feature?

@xiangfu0
Copy link
Contributor Author

why do we need this feature?

This is mostly for testing/benchmarking purposes. Right now segment name is hardcoded inside segment metadata, so if I want to upload 10k segments, then I need to generate 10k data files to generate 10k segments then upload.

@xiangfu0
Copy link
Contributor Author

Can you add a check to make sure that realtime and offline segment names correspond to the format that is required by these tables? Otherwise things can go wrong

will do

@mcvsubbu
Copy link
Contributor

why do we need this feature?

This is mostly for testing/benchmarking purposes. Right now segment name is hardcoded inside segment metadata, so if I want to upload 10k segments, then I need to generate 10k data files to generate 10k segments then upload.

Another useful thing we can do with this feature:
#6302 (comment)

@mcvsubbu
Copy link
Contributor

why do we need this feature?

I think it is a useful feature to have, esp for one-time operation mentioned in
#6302 (comment)

@Jackie-Jiang
Copy link
Contributor

Allowing custom segment name from HTTP header requires one of the following 2 conditions:

  1. Update the physical stored metadata file (for deep store URL push, need to push back the segment)
  2. Deprecate SegmentMetadata.getName() and only use it when segment name is not available from the HTTP header (similar to SegmentMetadata.getTableName(). On the server side, the segment name should never be read from the metadata, but from the ZK path

@xiangfu0 xiangfu0 marked this pull request as draft August 16, 2021 21:21
@xiangfu0 xiangfu0 closed this Apr 6, 2022
@xiangfu0 xiangfu0 deleted the allow_setting_segment_name branch April 6, 2022 02:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants