Skip to content

Comments

Enable Pinot realtime table ingestion without a deep store. Segment tar/zip and upload will thus be skipped.#10400

Draft
chenboat wants to merge 1 commit intoapache:masterfrom
chenboat:no_deep_store
Draft

Enable Pinot realtime table ingestion without a deep store. Segment tar/zip and upload will thus be skipped.#10400
chenboat wants to merge 1 commit intoapache:masterfrom
chenboat:no_deep_store

Conversation

@chenboat
Copy link
Contributor

@chenboat chenboat commented Mar 9, 2023

This PR enables a realtime table ingestion using Low Level Consumer (LLC) to run without deep store configured. The steps to tar/zip a segment and then upload will be skipped. The segment download will then be done through peer servers. A new table config "segmentUploadToDeepStoreDisabled" will be added to support this ingestion mode.

Instructions:

  1. The PR has to be tagged with at least one of the following labels (*):

    1. feature
  2. Remove these instructions before publishing the PR.

(**) Use release-notes label for scenarios like:

  • New configuration options


// True if and only if the segment upload to deep store is disabled. This is usefully for Pinot cluster which does not
// have any deep store.
private boolean _segmentUploadToDeepStoreDisabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shall also disallow rebalance with downtime when this is disabled, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If there is no deep store copy, the rebalance with downtime will result in segment data loss. I will add a note about this mode of ingestion with its limitation. We can also file a PR to detect the missing segments during rebalance.

String.format("Caught exception while taring index directory from: %s to: %s", indexDir, segmentTarFile);
_segmentLogger.error(errorMessage, e);
_realtimeTableDataManager.addSegmentError(_segmentNameStr, new SegmentErrorInfo(now(), errorMessage, e));
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall this return or continue (as if deep store was disabled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if deepstore is disabled, the protocol will continue because it still needs to send metadata file info about the segment to controller for segment completion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then it shall not return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this behavior is the same as before and not changed (note the if condition is !deepStoreDisabled()). The idea then was then if segment tar/zip fails, the segment completion protocol just failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Sounds good.

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Mar 13, 2023
@Jackie-Jiang
Copy link
Contributor

Currently we have multiple modes for LLC segment commit, shall we re-design the configs so that it is easier for users to understand?

  1. Push segment to controller (must succeed)
  2. Push segment to deep store (must succeed), then push metadata with download url to controller
  3. Push segment to deep store
    • If succeeds, push metadata with download url to controller
    • If fails, push metadata without download url to controller (peer download mode)
  4. Directly push metadata without download url to controller (peer download mode)

@ankitsultana
Copy link
Contributor

To add to @Jackie-Jiang 's comment, right now for split-commit we have a "commitSegmentFile" step where we do a rename and some other PinotFS operations. Can we also discuss how this patch fits in with that?

@navina
Copy link
Contributor

navina commented Mar 15, 2023

+1 to Jackie's comment above.
We just recently managed to decouple the config for push to deepstore from the peer segment download (#10136 ). It was pretty confusing to get that in. Let's try to configs around the use-cases that Jackie presented above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants