
Bigtable: Retry GC policy operations with a longer poll interval #6627

Merged (2 commits, Oct 4, 2022)

Conversation

@kevinsi4508 (Contributor) commented Sep 29, 2022:

A follow-up to PR #6581: use the utility function to retry GC policy operations with a longer poll interval. With this, we hope our users will be less likely to run out of API quota.

See the internal bug for details: b/247584824.
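For illustration, here is a minimal Go sketch of the pattern, assuming a generic retry helper. The name `retryWithPollInterval`, its signature, and the 20-minute timeout are hypothetical stand-ins, not the actual utility from #6581:

```go
package main

import (
	"fmt"
	"time"
)

// retryWithPollInterval is a hypothetical stand-in for the retry utility:
// it re-runs op until it succeeds or the timeout elapses, sleeping
// pollInterval between attempts instead of a short default backoff.
func retryWithPollInterval(timeout, pollInterval time.Duration, op func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := op()
		if err == nil {
			return nil
		}
		// Stop if the next poll would land past the deadline.
		if time.Now().Add(pollInterval).After(deadline) {
			return fmt.Errorf("retries exhausted: %w", err)
		}
		time.Sleep(pollInterval)
	}
}

func main() {
	// GC policy updates are slow, so poll every 30s instead of hammering the API.
	err := retryWithPollInterval(20*time.Minute, 30*time.Second, func() error {
		// A real caller would issue the Bigtable admin request here.
		return nil
	})
	fmt.Println("result:", err)
}
```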

If this PR is for Terraform, I acknowledge that I have:

  • Searched through the issue tracker for an open issue that this either resolves or contributes to, commented on it to claim it, and written "fixes {url}" or "part of {url}" in this PR description. If there were no relevant open issues, I opened one and commented that I would like to work on it (not necessary for very small changes).
  • Generated Terraform, and ran make test and make lint to ensure it passes unit and linter tests.
  • Ensured that all new fields I added that can be set by a user appear in at least one example (for generated resources) or third_party test (for handwritten resources or update tests).
  • Ran relevant acceptance tests (If the acceptance tests do not yet pass or you are unable to run them, please let your reviewer know).
  • Read the Release Notes Guide before writing my release note below.

Release Note Template for Downstream PRs (will be copied)

bigtable: retry GC policy operations with a longer poll interval

@modular-magician (Collaborator) commented:

Hello! I am a robot who works on Magic Modules PRs.

I've detected that you're a community contributor. @rileykarson, a repository maintainer, has been assigned to assist you and help review your changes.

❓ First time contributing? More details below:

Your assigned reviewer will help review your code by:

  • Ensuring it's backwards compatible, covers common error cases, etc.
  • Summarizing the change into a user-facing changelog note.
  • Ensuring it passes tests, whether via our "VCR" suite, a set of presubmit tests, or manual test runs.

You can help make sure that review is quick by running local tests and ensuring they're passing in between each push you make to your PR's branch. Also, try to leave a comment with each push you make, as pushes generally don't generate emails.

If your reviewer doesn't get back to you within a week after your most recent change, please feel free to leave a comment on the issue asking them to take a look! In the absence of a dedicated review dashboard most maintainers manage their pending reviews through email, and those will sometimes get lost in their inbox.


@kevinsi4508 (Contributor, Author) commented:

@hoangpham95 Please also take a look.

@modular-magician (Collaborator) commented:

Hi! I'm the modular magician. Your PR generated some diffs in downstreams - here they are.

Diff report:

Terraform GA: Diff ( 1 file changed, 14 insertions(+), 6 deletions(-))
Terraform Beta: Diff ( 1 file changed, 14 insertions(+), 6 deletions(-))
TF Validator: Diff ( 2 files changed, 3 insertions(+), 3 deletions(-))

@modular-magician (Collaborator) commented:

Tests analytics

Total tests: 2193
Passed tests: 1940
Skipped tests: 240
Failed tests: 13

Action taken

Triggering VCR tests in RECORDING mode for the tests that failed during VCR. The failed tests:
TestAccComputeVpnTunnel_vpnTunnelBetaExample|TestAccComputeRouterInterface_basic|TestAccComputeInstance_soleTenantNodeAffinities|TestAccComputeForwardingRule_forwardingRuleExternallbExample|TestAccComputeForwardingRule_internalTcpUdpLbWithMigBackendExample|TestAccComputeGlobalForwardingRule_externalTcpProxyLbMigBackendExample|TestAccComputeForwardingRule_networkTier|TestAccComputeForwardingRule_forwardingRuleRegionalHttpXlbExample|TestAccClouddeployDeliveryPipeline_DeliveryPipeline|TestAccCGCSnippet_eventarcWorkflowsExample|TestAccSqlDatabaseInstance_mysqlMajorVersionUpgrade|TestAccComputeFirewallPolicyRule_update|TestAccComputeFirewallPolicy_update

@modular-magician (Collaborator) commented:

Tests passed during RECORDING mode:
TestAccComputeVpnTunnel_vpnTunnelBetaExample[Debug log]
TestAccComputeRouterInterface_basic[Debug log]
TestAccComputeForwardingRule_forwardingRuleExternallbExample[Debug log]
TestAccComputeForwardingRule_internalTcpUdpLbWithMigBackendExample[Debug log]
TestAccComputeGlobalForwardingRule_externalTcpProxyLbMigBackendExample[Debug log]
TestAccComputeForwardingRule_networkTier[Debug log]
TestAccComputeForwardingRule_forwardingRuleRegionalHttpXlbExample[Debug log]
TestAccClouddeployDeliveryPipeline_DeliveryPipeline[Debug log]
TestAccSqlDatabaseInstance_mysqlMajorVersionUpgrade[Debug log]
TestAccComputeFirewallPolicyRule_update[Debug log]
TestAccComputeFirewallPolicy_update[Debug log]

Tests failed during RECORDING mode:
TestAccComputeInstance_soleTenantNodeAffinities[Error message] [Debug log]
TestAccCGCSnippet_eventarcWorkflowsExample[Error message] [Debug log]

Please fix these to complete your PR
View the build log or the debug log for each test

@kevinsi4508 (Contributor, Author) commented:

Please take a look when you get a chance. Thanks!

@rileykarson (Member) commented:

This is a pretty substantial change to the request behaviour in the resource, and one that will make Terraform feel a lot more sluggish, so I want to make sure I understand the motivation behind the change.

  • Are there reports from users indicating they've run out of quota that we're addressing here, or is it preemptive?
  • Has a similar precedent of fixed 30s retries been set in another client like gcloud, or the Console?
  • Add retry util func with polling interval #6581 (comment) mentions a 30s recommendation from the server. Can you share that guidance?

@kevinsi4508 (Contributor, Author) commented Oct 3, 2022:

This change only impacts concurrent GC writes. Creating multiple GC policies for a table at the same time can race with each other, so we need to retry to get them all created; see b/235959128 for details. We added a retry delay to the response that instructs our clients to wait at least 30s, because updating a GC policy (modifying the table) can be pretty slow: cl/472756815.

> This is a pretty substantial change to the request behaviour in the resource, and one that will make Terraform feel a lot more sluggish, so I want to make sure I understand the motivation behind the change.

>   • Are there reports from users indicating they've run out of quota that we're addressing here, or is it preemptive?

Yes, in b/203208187, b/243519363. KCC has some tickets as well, see b/211021264.

>   • Has a similar precedent of fixed 30s retries been set in another client like gcloud, or the Console?

I don't know. What I do know is that this operation is slow, and 30s is what's recommended from the server side.

The guidance is from cl/472756815: the operation is slow, and we ask our clients to wait at least 30s. If there is specific guidance you're looking for, please let me know. For example, do you want me to provide details on why this operation is slow in Bigtable? Maybe I can dig something up. Thanks!

I understand that we want the resource writes to be fast, but we have to be aware that some Bigtable admin operations are slow, and we can't do much about that.
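For reference on where a "30s recommendation from the server" can come from: gRPC errors may carry a RetryInfo detail with a recommended delay. The sketch below shows the generic mechanism for reading it; whether the Bigtable admin API surfaces its delay exactly this way is an assumption on my part:

```go
package main

import (
	"fmt"
	"time"

	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/status"
)

// serverRetryDelay looks for a gRPC RetryInfo detail on an error and returns
// the server-recommended wait, if one was attached. Sketch only: it assumes
// the server attaches RetryInfo; the Bigtable admin API may differ.
func serverRetryDelay(err error) (time.Duration, bool) {
	st, ok := status.FromError(err)
	if !ok {
		return 0, false // not a gRPC status error
	}
	for _, d := range st.Details() {
		if ri, ok := d.(*errdetails.RetryInfo); ok {
			return ri.GetRetryDelay().AsDuration(), true
		}
	}
	return 0, false
}

func main() {
	// With a real admin-API error, a caller could do:
	//   if delay, ok := serverRetryDelay(err); ok { time.Sleep(delay) }
	fmt.Println(serverRetryDelay(nil))
}
```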

@rileykarson (Member) left a review comment:

One small suggestion on my end, otherwise LGTM

Commit: …cy.go
Co-authored-by: Riley Karson <rileykarson@google.com>
@kevinsi4508 (Contributor, Author) commented:


> One small suggestion on my end, otherwise LGTM

Done. Thanks Riley!

@modular-magician (Collaborator) commented:

Hi there, I'm the Modular magician. I've sorted out the following information about your changes; here it is!

Diff report

Terraform GA: Diff ( 1 file changed, 17 insertions(+), 6 deletions(-))
Terraform Beta: Diff ( 1 file changed, 17 insertions(+), 6 deletions(-))
TF Validator: Diff ( 2 files changed, 3 insertions(+), 3 deletions(-))

@modular-magician (Collaborator) commented:

Tests analytics

Total tests: 2198
Passed tests: 1956
Skipped tests: 240
Failed tests: 2

Action taken

Triggering VCR tests in RECORDING mode for the tests that failed during VCR. The failed tests:
TestAccComputeInstance_soleTenantNodeAffinities|TestAccSqlDatabaseInstance_mysqlMajorVersionUpgrade

@modular-magician (Collaborator) commented:

Tests passed during RECORDING mode:
TestAccSqlDatabaseInstance_mysqlMajorVersionUpgrade[Debug log]

Tests failed during RECORDING mode:
TestAccComputeInstance_soleTenantNodeAffinities[Error message] [Debug log]

Please fix these to complete your PR
View the build log or the debug log for each test
