Explicitly fail or alert users when batch generating segments with non-compliant data (unsorted with sortkey, partitioning)

While using Spark segment generation, we had a table configuration that specified a `sortedColumn`. We initially assumed that the segment generation job would sort the data, and we missed that the data needs to be sorted upstream.

This is documented in the sorted index section (https://docs.pinot.apache.org/basics/indexing/forward-index):
> For offline push, input data needs to be sorted before running Pinot segment conversion and push job.

It may be helpful to add this information to batch ingestion pages, such as https://docs.pinot.apache.org/basics/data-import/batch-ingestion.

Additionally, it's unclear whether the same problem occurs when users specify a partition scheme on a table but do not correctly partition the input data. Searching the docs for "partition" yielded no mention of this.
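To make the partitioning concern concrete, here is a minimal sketch in plain Python of the kind of check I have in mind. It is not Pinot code: `check_single_partition`, `PartitionViolation`, the modulo partition function, and the dict-per-row record layout are all hypothetical names and assumptions made for illustration. It also assumes that the goal is for every row in one input file to map to the same partition.

```python
from typing import Any, Callable, Iterable, Mapping, Optional


class PartitionViolation(ValueError):
    """Raised when one input file contains rows from more than one partition."""


def modulo_partition(value: int, num_partitions: int) -> int:
    # Hypothetical stand-in for whatever partition function the table config names.
    return value % num_partitions


def check_single_partition(
    records: Iterable[Mapping[str, Any]],
    column: str,
    num_partitions: int,
    partition_fn: Callable[[Any, int], int] = modulo_partition,
) -> Optional[int]:
    """Return the partition id shared by all rows, or None for empty input.

    Raises PartitionViolation at the first row that maps to a different
    partition than the rows before it.
    """
    seen: Optional[int] = None
    for index, record in enumerate(records):
        partition_id = partition_fn(record[column], num_partitions)
        if seen is None:
            seen = partition_id
        elif partition_id != seen:
            raise PartitionViolation(
                f"row {index}: {column!r} value {record[column]!r} maps to "
                f"partition {partition_id}, but earlier rows map to partition "
                f"{seen}; partition the input on {column!r} before generating segments"
            )
    return seen
```

The error message names the offending row, column, and both partition ids, which is the "actionable" part: the user can see what to fix upstream without reading the job's source.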

This is an easy thing to miss. Pre-processing jobs (https://github.com/apache/incubator-pinot/issues/4353) help, but it would be better to prevent the mistake in the first place with invariants and actionable errors.
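As a rough illustration of such an invariant for the sorted column, the sketch below scans the input once and fails on the first out-of-order row. This is plain Python rather than anything in the Pinot codebase; `check_sorted`, `UnsortedInputError`, and the dict-per-row record layout are hypothetical names chosen for the example.

```python
from typing import Any, Iterable, Mapping


class UnsortedInputError(ValueError):
    """Raised when input rows are not in ascending order of the sorted column."""


def check_sorted(records: Iterable[Mapping[str, Any]], sorted_column: str) -> int:
    """Verify rows are in non-decreasing order of `sorted_column`.

    Returns the number of rows checked. Raises UnsortedInputError at the
    first row whose value is smaller than the previous row's value.
    """
    previous = None
    count = 0
    for index, record in enumerate(records):
        value = record[sorted_column]
        # Equal neighbours are fine; only a strict decrease breaks the order.
        if index > 0 and value < previous:
            raise UnsortedInputError(
                f"row {index}: {sorted_column!r} value {value!r} is smaller than "
                f"the previous value {previous!r}; sort the input by "
                f"{sorted_column!r} before running segment generation"
            )
        previous = value
        count += 1
    return count
```

A job could run a check like this per input file before building the segment and either abort or log a warning, depending on how strict the behavior should be.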


Explicitly fail or alert users when batch generating segments with non-compliant data (unsorted with sortkey, partitioning) #6126
