[Feature Request]: Differentiate between batchSize and minBatchSize in JDBCIO Write #29955

@scott-strong

What would you like to happen?

Prior to release 2.37.0, setting batchSize meant that a batch would consist of at least that many records when it was executed before the window duration elapsed. Starting with 2.37.0, batchSize instead became the minimum batch size: if the elements in a given bundle do not exceed the configured batchSize, they are executed immediately rather than being held until batchSize is reached.

This has caused issues in pipelines, where the more frequent writes lead to more deadlocks and lock wait timeouts. The problem is then exacerbated as the backlog grows and workers are added.

I'm proposing that we have a way to at least differentiate between these two batching strategies to allow for batches to build up rather than execute with a very small number of elements.
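
To make the difference concrete, here is a minimal plain-Java sketch of the two strategies as described above. It is not Beam or JdbcIO code: the class and method names are hypothetical, bundles are modeled only by their element counts, and the final flush in the accumulating variant is an assumption standing in for the window firing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Beam.
class BatchingSketch {

  // Strategy described for 2.37.0 and later: flush when batchSize is
  // reached, and also flush whatever is buffered when a bundle ends.
  static List<Integer> flushPerBundle(List<Integer> bundleSizes, int batchSize) {
    List<Integer> executedBatchSizes = new ArrayList<>();
    for (int bundleSize : bundleSizes) {
      int buffered = 0;
      for (int i = 0; i < bundleSize; i++) {
        buffered++;
        if (buffered == batchSize) {
          executedBatchSizes.add(buffered);
          buffered = 0;
        }
      }
      if (buffered > 0) {
        executedBatchSizes.add(buffered); // end of bundle forces a small write
      }
    }
    return executedBatchSizes;
  }

  // Strategy described for releases before 2.37.0: keep buffering across
  // bundles and flush only once batchSize is reached. The trailing flush
  // is an assumed stand-in for the window duration elapsing.
  static List<Integer> accumulateAcrossBundles(List<Integer> bundleSizes, int batchSize) {
    List<Integer> executedBatchSizes = new ArrayList<>();
    int buffered = 0;
    for (int bundleSize : bundleSizes) {
      for (int i = 0; i < bundleSize; i++) {
        buffered++;
        if (buffered == batchSize) {
          executedBatchSizes.add(buffered);
          buffered = 0;
        }
      }
    }
    if (buffered > 0) {
      executedBatchSizes.add(buffered);
    }
    return executedBatchSizes;
  }

  public static void main(String[] args) {
    List<Integer> bundles = List.of(3, 2, 4, 1); // four small bundles, 10 elements
    System.out.println(flushPerBundle(bundles, 5));          // prints [3, 2, 4, 1]
    System.out.println(accumulateAcrossBundles(bundles, 5)); // prints [5, 5]
  }
}
```

With the same 10 elements and a batchSize of 5, the per-bundle strategy issues four small writes while the accumulating strategy issues two full ones, which is the increase in write frequency I'm seeing.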

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
