[BEAM-10597] Propagate BigQuery streaming insert throttled time to Dataflow worker #12403

Merged
1 commit merged on Aug 21, 2020

Conversation

robinyqiu
Contributor

@robinyqiu robinyqiu commented Jul 29, 2020

r: @chamikaramj @ihji

FYI: this change is very similar to #8973

Post-Commit Tests Status (on master branch): [Jenkins build status badges for the Go, Java, Python, and XLang SDKs across the Dataflow, Flink, Samza, Spark, and Twister2 runners]

Pre-Commit Tests Status (on master branch): [Jenkins build status badges for Java, Python, Go, and Website, portable and non-portable]
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@robinyqiu robinyqiu changed the title [BEAM-10597] Propagate BigQuery streaming insert throttled time to Dataflow worker in Java SDK [BEAM-10597] Propagate BigQuery streaming insert throttled time to Dataflow worker Jul 29, 2020
@aaltay aaltay requested a review from chamikaramj July 29, 2020 20:49
@robinyqiu
Contributor Author

Run Python PreCommit

Contributor

@chamikaramj chamikaramj left a comment


Thanks!

container.tryGetCounter(
    MetricName.named(
        BIGQUERY_STREAMING_INSERT_THROTTLE_TIME_NAMESPACE,
        BIGQUERY_STREAMING_INSERT_THROTTLE_TIME_NAME));
Contributor


Can we use the same name as above ("cumulativeThrottlingSeconds") and move it to a constant (and also do the ms to sec conversion when setting)?

Contributor


+1
It would be great if we could use the same constant for all three use cases.

Contributor Author


Here we can use seconds, but on the streaming side msec is needed. That's why I kept msec.

For consistency, we could change all counters to use msec at the source and do the msec-to-sec conversion here. WDYT?

Contributor Author

@robinyqiu robinyqiu Aug 17, 2020


Hi @chamikaramj @ihji! I have made the change so that BQ, GCS, and Datastore all report throttled time in milliseconds at the source, and they now share a common counter name for consistency. The millisecond-to-second conversion is done only where the worker-side code expects throttled time in seconds. PTAL
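For reference, a minimal sketch of the pattern described above, using Beam's Metrics API; the namespace, class, and method names here are hypothetical and not necessarily the constants used in the PR:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;

public class ThrottledTimeReportingSketch {
  // Hypothetical namespace/name; the PR's actual constants may differ.
  public static final String THROTTLE_TIME_NAMESPACE = "bigquery-io-example";
  public static final String THROTTLE_TIME_COUNTER_NAME = "throttling-msecs";

  private static final Counter throttlingMsecs =
      Metrics.counter(THROTTLE_TIME_NAMESPACE, THROTTLE_TIME_COUNTER_NAME);

  // At the source (BQ/GCS/Datastore IO): report throttled time in milliseconds.
  public static void recordBackoff(long backoffMillis) {
    throttlingMsecs.inc(backoffMillis);
  }

  // Only where worker-side code expects seconds: convert at read time.
  public static long millisToSeconds(long throttledMillis) {
    return throttledMillis / 1000;
  }
}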

&& THROTTLING_MSECS_METRIC_NAME.getName().equals(structuredName.getName())) {
if ((THROTTLING_MSECS_METRIC_NAME.getNamespace().equals(structuredName.getOriginNamespace())
        && THROTTLING_MSECS_METRIC_NAME.getName().equals(structuredName.getName()))
    || (BIGQUERY_STREAMING_INSERT_THROTTLE_TIME_NAMESPACE.equals(
Contributor


Is there a reason why we needed to use a unique name for BQ but not for GCS or Datastore?

Contributor Author

@robinyqiu robinyqiu Aug 13, 2020


Yes. The GCS and Datastore counters are only consumed by the batch worker (the THROTTLING_MSECS_METRIC_NAME counter here is a separate counter; I am not sure what it is for. Maybe all throttling metrics should go to this counter? @ihji I saw you have a JIRA about it; not sure if that is what you want to do).

Here in the streaming case, precision is in milliseconds (whereas GCS and Datastore only store seconds).

@@ -867,6 +872,7 @@ public void deleteDataset(String projectId, String datasetId)
}
try {
  sleeper.sleep(nextBackoffMillis);
  throttlingMilliSeconds.inc(nextBackoffMillis);
Contributor


This is for failures. Probably you need to increment the counter for backoff1 for rate limit errors above.

cc: @ihji

Contributor Author


The retried failures here are transient failures, which I believe include throttling. I thought about incrementing backoff1, but that is executed in a future (a parallel thread). If we accumulate counters over all threads, then I think we will over-calculate the number. So I added the counter here in the main thread.
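For illustration, a minimal sketch of counting the backoff sleep in the main-thread retry loop, assuming Beam's BackOff/Sleeper utilities; the class, namespace, and method names are hypothetical, not the PR's actual code:

import java.io.IOException;
import java.util.concurrent.Callable;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.util.BackOff;
import org.apache.beam.sdk.util.Sleeper;

class MainThreadThrottlingSketch {
  // Hypothetical counter; mirrors the throttlingMilliSeconds counter in the diff above.
  private final Counter throttlingMilliSeconds =
      Metrics.counter("example-namespace", "throttling-msecs");

  <T> T runWithRetries(Callable<T> attempt, Sleeper sleeper, BackOff backoff) throws Exception {
    while (true) {
      try {
        return attempt.call();
      } catch (IOException transientError) {
        long nextBackoffMillis = backoff.nextBackOffMillis();
        if (nextBackoffMillis == BackOff.STOP) {
          throw transientError;
        }
        // Sleep for the backoff interval on the main thread and count it
        // as throttled time, in milliseconds.
        sleeper.sleep(nextBackoffMillis);
        throttlingMilliSeconds.inc(nextBackoffMillis);
      }
    }
  }
}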

Contributor


Throttling results in rate limit errors, right? If so, that would be captured by backoff1, I think. Probably Heejong can confirm.

Contributor Author


My major concern about accumulating backoff1 is that we may over-calculate, because we would be adding up the time being throttled on all threads.

Contributor


Yes, we should use backoff1. Rate limit errors only reach this point after 2 minutes of backoffs by backoff1, silently inside the future. Why do you think it's over-calculated? Each thread is doing its own insert job, and it doesn't look strange to me to calculate the total throttling time by adding up all backoff times from parallel threads.

Contributor Author


Yes, we should use backoff1. Rate limit errors only reach this point after 2 minutes of backoffs by backoff1, silently inside the future.

I see. That makes sense to me.

Why do you think it's over-calculated?

Because I am not sure how this metric is used downstream. I vaguely remember that Dataflow autoscaling divides this number by the total time spent on the work item to yield a fraction that signals throttling. If the total time does not accumulate the time spent on all threads, then we may over-calculate.
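For illustration only (the exact autoscaling formula is an assumption here, not something confirmed in this thread): if the fraction is roughly throttled time divided by wall-clock work-item time, summing per-thread backoff can push it past 1.

throttled_fraction ≈ reported_throttled_time / total_work_item_time

Example: 4 parallel insert threads, each backing off for 30 s during a 60 s work item
  summing over threads: (4 * 30 s) / 60 s = 2.0   (over-calculated)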

Contributor


Yeah, I think that's a valid concern. We probably need to figure out the time requests are throttled without including backoff due to other errors. Is there a way to get throttled time from all parallel threads and just use the maximum?

Contributor Author


Is there a way to get throttled time from all parallel threads and just use the maximum?

Yes, I think this is the right thing to do. Made the change already. PTAL. WDYT, @ihji?
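A rough sketch of the "use the maximum" idea, not the actual PR implementation (class, method, and counter names are hypothetical): each parallel insert task returns the backoff time it observed, and only the largest value is reported to the throttling counter.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;

class MaxThrottlingSketch {
  // Hypothetical counter name; the PR uses its own constants.
  private final Counter throttlingMsecs =
      Metrics.counter("example-namespace", "throttling-msecs");

  void insertAllInParallel(List<Callable<Long>> tasksReturningBackoffMillis) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (Callable<Long> task : tasksReturningBackoffMillis) {
        futures.add(executor.submit(task));
      }
      long maxBackoffMillis = 0L;
      for (Future<Long> future : futures) {
        // Each task returns the total backoff time (in ms) observed on its own thread.
        maxBackoffMillis = Math.max(maxBackoffMillis, future.get());
      }
      // Report only the maximum so the counter cannot exceed wall-clock time.
      throttlingMsecs.inc(maxBackoffMillis);
    } finally {
      executor.shutdown();
    }
  }
}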

@aaltay
Member

aaltay commented Aug 13, 2020

@robinyqiu - What is the next step on this PR?

@robinyqiu
Contributor Author

@robinyqiu - What is the next step on this PR?

I thought I replied to the comments but I didn't actually send it out. Thanks for the ping.

@@ -424,6 +427,9 @@ public Job getJob(JobReference jobRef, Sleeper sleeper, BackOff backoff)
private final PipelineOptions options;
private final long maxRowsPerBatch;
private final long maxRowBatchSize;
// aggregate the total time spent in exponential backoff
Contributor


Please consider adding this to Python as well (in a separate PR).

cc: @pabloem

Contributor Author


Will do in a new PR.

Contributor

@chamikaramj chamikaramj left a comment


LGTM. Thanks.

@chamikaramj
Contributor

Retest this please

@robinyqiu
Contributor Author

Run Java PreCommit

@robinyqiu
Contributor Author

Thank you for the review! I factored out the Python-side change and will move it to a follow-up PR, for easier testing and importing.

@robinyqiu
Contributor Author

I will merge after the tests are green. Thank you both!
