
Push parallelism isn't implemented in Hadoop batch ingestion #6505

@kkrugler

Description


In HadoopSegmentTarPushJobRunner.run(), this is the code at the very end of the method:

    int pushParallelism = _spec.getPushJobSpec().getPushParallelism();
    if (pushParallelism < 1) {
      pushParallelism = segmentsToPush.size();
    }
    // Push from driver
    try {
      SegmentPushUtils.pushSegments(_spec, outputDirFS, segmentsToPush);
    } catch (RetriableOperationException | AttemptsExceededException e) {
      throw new RuntimeException(e);
    }

So the computed pushParallelism value is never used, and SegmentPushUtils.pushSegments() pushes the segments one at a time on a single thread.
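For illustration, here is a minimal sketch of what honoring pushParallelism could look like, using a fixed-size java.util.concurrent thread pool. SegmentPusher, pushInParallel and ParallelPushSketch are hypothetical names, not Pinot APIs. The real per-segment upload logic inside SegmentPushUtils is not shown and would have to be plugged in as the pusher. The fallback for values below 1 mirrors the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPushSketch {

  /** Hypothetical per-segment push action; stands in for whatever uploads one segment. */
  @FunctionalInterface
  public interface SegmentPusher {
    void push(String segmentUri) throws Exception;
  }

  /**
   * Pushes every segment using at most pushParallelism threads.
   * A value below 1 means one thread per segment, as in the issue's snippet.
   */
  public static void pushInParallel(List<String> segmentsToPush, int pushParallelism,
      SegmentPusher pusher) throws InterruptedException, ExecutionException {
    // Guard: Executors.newFixedThreadPool(0) would throw IllegalArgumentException.
    if (segmentsToPush.isEmpty()) {
      return;
    }
    if (pushParallelism < 1) {
      pushParallelism = segmentsToPush.size();
    }
    ExecutorService executor = Executors.newFixedThreadPool(pushParallelism);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String segmentUri : segmentsToPush) {
        // Callable (not Runnable) so checked exceptions from the pusher propagate.
        futures.add(executor.submit(() -> {
          pusher.push(segmentUri);
          return null;
        }));
      }
      // Wait for every push; the first failure surfaces as an ExecutionException.
      for (Future<?> future : futures) {
        future.get();
      }
    } finally {
      // On failure this interrupts pushes still in flight; on success all are done.
      executor.shutdownNow();
    }
  }
}
```

Note that the original fallback (pushParallelism = segmentsToPush.size()) would pass 0 to Executors.newFixedThreadPool() for an empty segment list, which throws IllegalArgumentException; the sketch returns early in that case.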
