BigQueryIO Writer is slow / fails if the input source is bounded.
EDIT: Input BQ: 294 GB, 741,896,827 events
If the input source is bounded (GCS / BQ select / ...), BigQueryIO Writer uses `Method.FILE_LOADS` instead of streaming inserts.
Large amounts of input data result in a `java.lang.OutOfMemoryError: Java heap space` (500 million rows).
(Attached screenshot: PrepareWrite.BatchLoads.png)
We cannot use `Method.STREAMING_INSERTS` or control the batch sizes, since
`withMaxFilesPerBundle` is private :(
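For context, in later Beam SDK releases `BigQueryIO.Write.withMethod(...)` is public, so the write method can be forced rather than left to the bounded/unbounded default. A minimal sketch (the project, dataset, and table names are hypothetical placeholders):

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class ForceStreamingInserts {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Trivial one-field schema, just to make the sketch self-contained.
    TableSchema schema = new TableSchema().setFields(Collections.singletonList(
        new TableFieldSchema().setName("event").setType("STRING")));

    p.apply(Create.of(new TableRow().set("event", "example"))
            .withCoder(TableRowJsonCoder.of()))
        .apply(BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.my_table") // hypothetical destination
            .withSchema(schema)
            // Override the default method selection: even on a bounded
            // source, request streaming inserts instead of FILE_LOADS.
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}
```

This does not address the version the issue was filed against, where the method could not be overridden; it only shows the knob that later became available.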
Someone reported a similar problem with GCS -> BQ on Stack Overflow:
"Why is writing to BigQuery from a Dataflow/Beam pipeline slow?"
Imported from Jira BEAM-2840. Original Jira may contain additional context.
Reported by: vspiewak.