
[Bug]: KafkaIO could fail with BigQueryIO.Write.withAutoSharding() #22951

Closed
nbali opened this issue Aug 30, 2022 · 2 comments · Fixed by #24463

Comments


nbali commented Aug 30, 2022

What happened?

BigQueryIO.Write.FILE_LOADS.withAutoSharding() uses GroupIntoBatches.withShardedKey(), which uses 'workerUuid' and 'threadId' as the sharding key. My understanding is that the problem is this: the Kafka consumer read in KafkaIO.Read for a single partition most likely happens without any parallelism, on the same worker and the same thread, because records are read in FIFO order by offset. This essentially means that .withShardedKey() has no effect whatsoever.
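To illustrate why single-threaded partition reads defeat this kind of sharding, here is a minimal standalone sketch, not Beam's actual implementation: it builds a shard key from a per-worker UUID and the current thread id (the two components named above), and shows that a single-threaded reader produces exactly one distinct key.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch (ShardKeyDemo is not a Beam class): a shard key
// composed of a per-worker UUID and the current thread id, as described
// for GroupIntoBatches.withShardedKey().
public class ShardKeyDemo {
    private static final String WORKER_UUID = UUID.randomUUID().toString();

    static String shardKey() {
        return WORKER_UUID + "-" + Thread.currentThread().getId();
    }

    public static void main(String[] args) {
        // A single-threaded, in-order Kafka partition read produces the same
        // shard key for every element, so all elements land in one shard.
        Set<String> keys = new HashSet<>();
        for (int i = 0; i < 500_000; i++) {
            keys.add(shardKey());
        }
        System.out.println("distinct shard keys: " + keys.size()); // prints 1
    }
}
```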

Although there is a 'FILE_TRIGGERING_BATCHING_DURATION' of 1s and a 'FILE_TRIGGERING_RECORD_COUNT' of 500,000, and both trigger grouping, this still means that if we stay under 500k elements and under 1s, the transform will try to fire them all at once. It is entirely possible, with sufficiently high throughput or with 'outputWithTimestamp', to have 500k elements in a single second. This could result in an OutOfMemoryError (OOME).

We should also have a size limit, not only time and count.

Duration maxBufferingDuration =
    options.getMaxBufferingDurationMilliSec() > 0
        ? Duration.millis(options.getMaxBufferingDurationMilliSec())
        : FILE_TRIGGERING_BATCHING_DURATION;
// In contrast to fixed sharding with user trigger, here we use a global window with default
// trigger and rely on GroupIntoBatches transform to group, batch and at the same time
// parallelize properly. We also ensure that the files are written if a threshold number of
// records are ready. Dynamic sharding is achieved via the withShardedKey() option provided by
// GroupIntoBatches.
return input
    .apply(
        GroupIntoBatches.<DestinationT, ElementT>ofSize(FILE_TRIGGERING_RECORD_COUNT)
            .withMaxBufferingDuration(maxBufferingDuration)
            .withShardedKey())
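The proposed byte-size limit can be sketched as a standalone batcher that flushes on element count or on accumulated bytes, whichever threshold is hit first. This is a hypothetical illustration, not the actual change that landed in GroupIntoBatches via #24463; the class name and thresholds are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal: batch firing is bounded by element
// count OR accumulated byte size, not count (and time) alone.
public class SizeAwareBatcher {
    private final long maxCount;
    private final long maxBytes;
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;

    public SizeAwareBatcher(long maxCount, long maxBytes) {
        this.maxCount = maxCount;
        this.maxBytes = maxBytes;
    }

    /** Adds an element; returns the flushed batch if a threshold was hit, otherwise null. */
    public List<byte[]> add(byte[] element) {
        buffer.add(element);
        bufferedBytes += element.length;
        if (buffer.size() >= maxCount || bufferedBytes >= maxBytes) {
            List<byte[]> batch = new ArrayList<>(buffer);
            buffer.clear();
            bufferedBytes = 0;
            return batch;
        }
        return null;
    }
}
```

With a byte bound in place, a burst of large elements flushes early instead of accumulating until the count or duration trigger fires, which is what makes the OOME scenario above avoidable.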

Issue Priority

Priority: 2

Issue Component

Component: io-java-gcp


nbali commented Aug 30, 2022

.take-issue

nbali added a commit to nbali/beam that referenced this issue Aug 30, 2022
@nbali nbali mentioned this issue Aug 30, 2022
nbali added a commit to nbali/beam that referenced this issue Sep 6, 2022
nbali added a commit to nbali/beam that referenced this issue Sep 6, 2022
nbali added a commit to nbali/beam that referenced this issue Sep 7, 2022
nbali added a commit to nbali/beam that referenced this issue Sep 7, 2022
… introduced test in GroupIntoBatchesTest

nbali commented Sep 12, 2022

nbali added a commit to nbali/beam that referenced this issue Nov 16, 2022
nbali added a commit to nbali/beam that referenced this issue Nov 16, 2022
lukecwik pushed a commit to lukecwik/incubator-beam that referenced this issue Dec 1, 2022
lukecwik added a commit that referenced this issue Dec 2, 2022
* Fix for #22951

* Fix Dataflow GroupIntoBatchesOverride to match updated GroupIntoBatches size limit implementation.

Co-authored-by: Balázs Németh <nbali@users.noreply.github.com>
@github-actions github-actions bot added this to the 2.45.0 Release milestone Dec 2, 2022
prodriguezdefino pushed a commit to prodriguezdefino/beam-pabs that referenced this issue Dec 6, 2022
lostluck pushed a commit to lostluck/beam that referenced this issue Dec 22, 2022