
Tiered storage #5793

Merged: 6 commits merged into apache:master on Aug 10, 2020


@npawar (Contributor) commented Aug 3, 2020

Description

Issue: #5553
Tiered storage support in Pinot: phase 1.
This phase supports only default tag-based instance assignment for tiers, and these instance assignments are not persisted in ZooKeeper.

Phase 2 will introduce instanceAssignmentConfig for tiers, which will let us support replica groups for tiers and persist the InstancePartitions.
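A minimal sketch of what default tag-based instance assignment for a tier could look like. It assumes each server exposes a set of tags and that a segment goes to the `replication` least-loaded servers carrying the tier's serverTag. The Server record, the assignSegment method and the least-loaded tie-breaking rule are hypothetical and are not Pinot's actual segment assignment code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of tag-based instance assignment for one tier;
// not Pinot's actual OfflineSegmentAssignment/RealtimeSegmentAssignment logic.
public class TagBasedTierAssignment {

  // Hypothetical server model: an instance id plus the tags it carries.
  record Server(String id, Set<String> tags) {}

  /**
   * Assigns one segment to `replication` servers that carry `serverTag`,
   * preferring servers with the fewest segments (ties broken by instance id),
   * and records the new load in `segmentCounts`.
   */
  static List<String> assignSegment(List<Server> servers, String serverTag, int replication,
      Map<String, Integer> segmentCounts) {
    List<Server> candidates = new ArrayList<>();
    for (Server server : servers) {
      if (server.tags().contains(serverTag)) {
        candidates.add(server);
      }
    }
    if (candidates.size() < replication) {
      throw new IllegalStateException(
          "Only " + candidates.size() + " servers tagged " + serverTag + ", need " + replication);
    }
    candidates.sort(Comparator
        .comparingInt((Server s) -> segmentCounts.getOrDefault(s.id(), 0))
        .thenComparing(Server::id));
    List<String> assigned = new ArrayList<>();
    for (int i = 0; i < replication; i++) {
      String id = candidates.get(i).id();
      assigned.add(id);
      segmentCounts.merge(id, 1, Integer::sum);
    }
    return assigned;
  }

  public static void main(String[] args) {
    List<Server> servers = List.of(
        new Server("Server_1", Set.of("DefaultTenant_OFFLINE")),
        new Server("Server_2", Set.of("tier_a_OFFLINE")),
        new Server("Server_3", Set.of("tier_a_OFFLINE")),
        new Server("Server_4", Set.of("tier_a_OFFLINE")));
    Map<String, Integer> segmentCounts = new HashMap<>(Map.of("Server_2", 3));
    // Server_2 already hosts 3 segments, so the two idle tier_a servers are chosen.
    System.out.println(assignSegment(servers, "tier_a_OFFLINE", 2, segmentCounts));
  }
}
```

Only servers tagged tier_a_OFFLINE are considered, so Server_1 is never picked for this tier regardless of its load.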

Example:
This example shows how to configure a table so that segments older than 7 days move to servers tagged tier_a_OFFLINE and segments older than 15 days move to servers tagged tier_b_OFFLINE.

```json
{
  "tableName": "myTable",
  "tableType": ...,
  ...
  "tierConfigs": [{
    "name": "tierA",
    "segmentSelectorType": "time",
    "segmentAge": "7d",
    "storageType": "pinot_server",
    "serverTag": "tier_a_OFFLINE"
  }, {
    "name": "tierB",
    "segmentSelectorType": "time",
    "segmentAge": "15d",
    "storageType": "pinot_server",
    "serverTag": "tier_b_OFFLINE"
  }]
}
```
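To make the selection rule concrete, here is a minimal sketch under a few assumptions: segmentAge is a number followed by d, h, m or s; a segment's age is measured from its end time; and when a segment is older than several tiers' ages, the tier with the largest age wins (so a 20-day-old segment lands on tier_b_OFFLINE rather than tier_a_OFFLINE). The TierConfig record and the parseAge/selectTier helpers are hypothetical, not Pinot's actual selector implementation.

```java
import java.time.Duration;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of time-based tier selection; not Pinot's actual selector code.
public class TimeBasedTierSelection {

  // Mirrors the name, segmentAge and serverTag fields of one tierConfigs entry.
  record TierConfig(String name, String segmentAge, String serverTag) {}

  /** Parses ages such as "7d", "12h", "30m" or "45s" (assumed format). */
  static Duration parseAge(String age) {
    long amount = Long.parseLong(age.substring(0, age.length() - 1));
    switch (Character.toLowerCase(age.charAt(age.length() - 1))) {
      case 'd': return Duration.ofDays(amount);
      case 'h': return Duration.ofHours(amount);
      case 'm': return Duration.ofMinutes(amount);
      case 's': return Duration.ofSeconds(amount);
      default: throw new IllegalArgumentException("Unsupported segmentAge: " + age);
    }
  }

  /**
   * Returns the tier with the largest segmentAge that the segment is older than,
   * or empty if the segment is younger than every tier (it is not moved to any tier).
   */
  static Optional<TierConfig> selectTier(List<TierConfig> tiers, long segmentEndTimeMs, long nowMs) {
    Duration segmentAge = Duration.ofMillis(nowMs - segmentEndTimeMs);
    return tiers.stream()
        .filter(tier -> segmentAge.compareTo(parseAge(tier.segmentAge())) > 0)
        .max(Comparator.comparing((TierConfig tier) -> parseAge(tier.segmentAge())));
  }

  public static void main(String[] args) {
    List<TierConfig> tiers = List.of(
        new TierConfig("tierA", "7d", "tier_a_OFFLINE"),
        new TierConfig("tierB", "15d", "tier_b_OFFLINE"));
    long now = System.currentTimeMillis();
    long day = Duration.ofDays(1).toMillis();
    for (int ageDays : new int[] {3, 10, 20}) {
      String target = selectTier(tiers, now - ageDays * day, now)
          .map(TierConfig::serverTag)
          .orElse("(no tier)");
      System.out.println(ageDays + " days old -> " + target);
    }
  }
}
```

With the two tiers from the example, a 3-day-old segment matches no tier, a 10-day-old segment maps to tier_a_OFFLINE, and a 20-day-old segment maps to tier_b_OFFLINE.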

Release Notes

Tiered storage, phase 1: default tag-based instance assignment for tiers

@npawar added the release-notes and feature labels Aug 3, 2020
@Jackie-Jiang (Contributor) left a comment

Well done.
Most comments are minor (comments in OfflineSegmentAssignment also apply to RealtimeSegmentAssignment).
Please merge the 2 relocators into one.

@npawar (Contributor, Author) commented Aug 6, 2020


Thanks for the review. I've merged the 2 relocators into SegmentRelocator. I introduced controller configs for controller.segment.relocator.frequency and deprecated the configs for controller.realtime.segment.relocator.frequency. We'll carry both for a few releases.
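The deprecation fallback described above can be sketched as follows, assuming the configuration is available as a plain key/value map and that the new key takes precedence when both are set. The class, the helper name and the default value are hypothetical; Pinot's ControllerConf may resolve these keys differently.

```java
import java.util.Map;

// Hypothetical helper; not Pinot's ControllerConf implementation.
public class RelocatorFrequencyConfig {
  static final String NEW_KEY = "controller.segment.relocator.frequency";
  // Deprecated, but still honored for a few releases.
  static final String DEPRECATED_KEY = "controller.realtime.segment.relocator.frequency";

  /** Prefers the new key, falls back to the deprecated key, then to the default. */
  static String getRelocatorFrequency(Map<String, String> conf, String defaultValue) {
    String value = conf.get(NEW_KEY);
    if (value == null) {
      value = conf.get(DEPRECATED_KEY);
    }
    return value != null ? value : defaultValue;
  }

  public static void main(String[] args) {
    String defaultFrequency = "1h"; // hypothetical default
    System.out.println(getRelocatorFrequency(Map.of(DEPRECATED_KEY, "30m"), defaultFrequency));
    System.out.println(getRelocatorFrequency(
        Map.of(NEW_KEY, "2h", DEPRECATED_KEY, "30m"), defaultFrequency));
    System.out.println(getRelocatorFrequency(Map.of(), defaultFrequency));
  }
}
```

Checking the new key first means an operator who has already migrated is never overridden by a stale deprecated entry, while older deployments keep working unchanged.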

@Jackie-Jiang (Contributor) left a comment

LGTM otherwise. One critical comment in SegmentRelocator.

@npawar (Contributor, Author) commented Aug 8, 2020


I made one small change that wasn't part of the review: I converted "segmentSelectorType" and "storageType" to enums, in case you want to take another look.

@Jackie-Jiang (Contributor) commented


Since we introduced the enums, let's use the enum types rather than strings for these 2 fields.

```java
public class PinotServerTierStorage implements TierStorage {
  private final String _type = TierStorageType.PINOT_SERVER.toString();
  private final String _tag;
  // ... (rest of the class not shown in this review excerpt)
}
```
Contributor

Shall we support multiple tags for a tier?

@npawar (Contributor, Author)

Yes, that's a good idea. I can add that in phase 2, where I'll be handling advanced instance assignment for tiers.

```java
/**
 * Interface for the segment selection strategy of a tier
 */
public interface TierSegmentSelector {
  // ... (methods not shown in this review excerpt)
}
```
Contributor

Are you thinking of adding a method like int getPriority()?
For time-based tiers, we could use the internal age for comparison if the priority is the same.
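The reviewer's suggestion could look roughly like the comparator below. getPriority() was only proposed in this thread and was not added in this PR, so the Tier record, the rule that a larger number means higher priority, and the ageDays field are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed getPriority() ordering; not part of this PR.
public class TierPriorityOrdering {

  // Hypothetical tier view: an explicit priority plus the configured age (in days) of a time-based tier.
  record Tier(String name, int priority, long ageDays) {}

  // Higher priority first; for equal priority, the tier with the larger age (older data) first.
  static final Comparator<Tier> TIER_ORDER =
      Comparator.comparingInt(Tier::priority).reversed()
          .thenComparing(Comparator.comparingLong(Tier::ageDays).reversed());

  /** Returns tier names in the order a selector would evaluate them. */
  static List<String> evaluationOrder(List<Tier> tiers) {
    List<Tier> sorted = new ArrayList<>(tiers);
    sorted.sort(TIER_ORDER);
    List<String> names = new ArrayList<>();
    for (Tier tier : sorted) {
      names.add(tier.name());
    }
    return names;
  }

  public static void main(String[] args) {
    List<Tier> tiers = List.of(
        new Tier("tierA", 0, 7),
        new Tier("tierB", 0, 15),
        new Tier("pinned", 1, 0));
    System.out.println(evaluationOrder(tiers)); // [pinned, tierB, tierA]
  }
}
```

Falling back to age only on ties keeps the common case (purely time-based tiers, all at the same priority) ordered oldest-first without any extra configuration.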

@npawar (Contributor, Author)

I thought I'd add that when the requirements demand it. As of now, I don't see it being required.

@npawar (Contributor, Author) commented Aug 9, 2020


The reason I kept them as strings is so that people who want to plug in their own strategies can do so without having to add them to the enum.
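The trade-off in this exchange can be illustrated with a string-keyed registry: a new selector type can be registered at runtime without editing any enum, at the cost of losing compile-time checking of type names. This is a minimal sketch under assumptions; the registry class, its SegmentSelector interface, the case-insensitive lookup and the "prefix" selector are hypothetical and are not Pinot's TierSegmentSelector or factory API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical plugin registry keyed by the segmentSelectorType string.
public class TierSelectorRegistry {

  /** Hypothetical, minimal selector contract: decides whether a segment belongs to a tier. */
  @FunctionalInterface
  interface SegmentSelector {
    boolean selectSegment(String segmentName);
  }

  // Maps a lower-cased type name to a factory that builds a selector from a config value.
  private static final Map<String, Function<String, SegmentSelector>> FACTORIES =
      new ConcurrentHashMap<>();

  static void register(String type, Function<String, SegmentSelector> factory) {
    FACTORIES.put(type.toLowerCase(Locale.ROOT), factory);
  }

  static SegmentSelector create(String type, String configValue) {
    Function<String, SegmentSelector> factory = FACTORIES.get(type.toLowerCase(Locale.ROOT));
    if (factory == null) {
      throw new IllegalArgumentException("Unknown segmentSelectorType: " + type);
    }
    return factory.apply(configValue);
  }

  public static void main(String[] args) {
    // A user-supplied strategy, plugged in without touching any enum.
    register("prefix", prefix -> segmentName -> segmentName.startsWith(prefix));
    SegmentSelector selector = create("PREFIX", "myTable_2020");
    System.out.println(selector.selectSegment("myTable_2020_01_01")); // true
    System.out.println(selector.selectSegment("myTable_2019_12_31")); // false
  }
}
```

With an enum, the same "prefix" strategy would require adding a constant to the enum and rebuilding; with string keys, an unknown type only surfaces at runtime, which is why the lookup fails loudly.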

@Jackie-Jiang (Contributor) commented


In that case, IMO we should not introduce the enums.

@npawar (Contributor, Author) commented Aug 10, 2020


Hmm, agreed. I've reverted the enum change.

@npawar merged commit 9f23e18 into apache:master on Aug 10, 2020
@npawar deleted the tiered_storage branch on August 10, 2020 at 22:06
Labels: feature, release-notes

3 participants