better support for multiple parallelism in SegmentWriter and SegmentUploader default impl  #7433

@walterddr

In many use cases, such as Spark or Flink ingestion, multiple worker instances are created to upload segments in parallel.

Currently the default SegmentWriter and SegmentUploader cannot support this easily: one needs to generate a slightly different TableConfig and BatchIngestionConfig for each worker in order to change the tmp directory name, the segment name, and other settings. Otherwise, spawning multiple instances on the same host causes file write conflicts.

Proposal: provide an easy way to set a parallelism index on SegmentWriter and SegmentUploader, for example by adding APIs that set the parallelism index directly instead of going through configuration changes.
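
To make the idea concrete, here is a rough sketch of how a parallelism index could be turned into per-worker names. It is only an illustration: `ParallelismScopedNames`, `stagingDir`, and `segmentName` are hypothetical names, not existing Pinot APIs, and the naming pattern is an assumption rather than what the default implementations do today.

```java
import java.nio.file.Path;

/**
 * Hypothetical helper (not part of Pinot): derives a staging directory and
 * segment names that are unique per worker, based on a parallelism index.
 */
final class ParallelismScopedNames {
  private final String tableName;
  private final int parallelismIndex;

  ParallelismScopedNames(String tableName, int parallelismIndex) {
    if (tableName == null || tableName.isEmpty()) {
      throw new IllegalArgumentException("tableName must be non-empty");
    }
    if (parallelismIndex < 0) {
      throw new IllegalArgumentException("parallelismIndex must be >= 0");
    }
    this.tableName = tableName;
    this.parallelismIndex = parallelismIndex;
  }

  /** Each worker gets its own subdirectory, so workers on one host never share files. */
  Path stagingDir(Path baseTmpDir) {
    return baseTmpDir.resolve(tableName + "_worker_" + parallelismIndex);
  }

  /** The index is embedded in the segment name, so names from different workers cannot collide. */
  String segmentName(int sequenceId) {
    return tableName + "_" + parallelismIndex + "_" + sequenceId;
  }
}
```

With something like this, a Spark or Flink task would pass its own subtask index once (for example through a setter on the writer) and could reuse the same TableConfig and BatchIngestionConfig as every other task.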

CC @npawar, @xiangfu0
