
BigQueryIO multi-partitioned write doesn't work for streaming writes #18992


Description

@kennknowles

BigQueryIO performs a multi-partitioned write (the MultiPartitionsWriteTables step) when there is more data than BigQuery's per-load-job quota allows (10k files or 11 TB of data) to be written to a single BQ table.


When writing using load jobs in streaming mode (with a triggering frequency), we hit the following location, where CREATE_DISPOSITION is set to CREATE_NEVER for all panes other than the first one. This is fine when writing a single partition (all panes of a window should write to the same table), but it is incorrect when there are multiple partitions, since temp tables need to be created for every pane.

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L165
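To make the behaviour concrete, here is a minimal sketch of the disposition choice described above and the direction a fix could take. This is not the actual WriteTables code; the class, method names, and the `writingTempTables` flag are illustrative assumptions.

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.transforms.windowing.PaneInfo;

/** Illustrative sketch only; the real logic lives in WriteTables.java. */
class DispositionSketch {

  // Current behaviour (simplified): every pane after the first is forced to
  // CREATE_NEVER, regardless of whether this write targets the final
  // destination table or a per-pane temp table.
  static CreateDisposition current(PaneInfo pane, CreateDisposition requested) {
    return pane.isFirst() ? requested : CreateDisposition.CREATE_NEVER;
  }

  // Possible direction for a fix (hypothetical, not the committed change):
  // only skip creation when writing directly to the final destination table.
  // Temp tables for a multi-partition write must be created for every pane.
  static CreateDisposition proposed(
      PaneInfo pane, CreateDisposition requested, boolean writingTempTables) {
    if (writingTempTables) {
      return CreateDisposition.CREATE_IF_NEEDED;
    }
    return pane.isFirst() ? requested : CreateDisposition.CREATE_NEVER;
  }
}
```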


Imported from Jira BEAM-5216. Original Jira may contain additional context.
Reported by: chamikara.
