Config Connector HTTP Error 413 (Request Entity Too Large) #288

Closed
rnaveiras opened this issue Oct 9, 2020 · 16 comments
Labels
bug Something isn't working

Comments

@rnaveiras

rnaveiras commented Oct 9, 2020

Describe the bug
After upgrading Config Connector from 1.20.1 to 1.24.0, it is no longer able to create BigQueryDataset resources.

ConfigConnector Version
Version 1.24.0

To Reproduce
Create a new BigQueryDataset with the YAML below. Config Connector fails the reconciliation loop with varying errors, and the BigQueryDataset is never created.

---
apiVersion: bigquery.cnrm.cloud.google.com/v1beta1
kind: BigQueryDataset
metadata:
  annotations:
    cnrm.cloud.google.com/delete-contents-on-destroy: "false"
  name: foo
spec:
  access:
  - role: OWNER
    specialGroup: projectOwners
  - groupByEmail: team@example.com
    role: READER
  - role: WRITER
    userByEmail: service-account-email@project-id.iam.gserviceaccount.com
  friendlyName: foo
  location: EU

When you describe the Kubernetes resource, you get this message as part of status.conditions:

"message": "Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing BigQueryDataset \"projects/project-id/datasets/foo\": Get \"https://bigquery.googleapis.com/bigquery/v2/projects/project-iddatasets/foo?alt=json\": net/http: invalid header field value \"Terraform/ (+https://www.terraform.io) Terraform-Plugin-SDK/2.0.3 terraform-provider-google-beta/dev \\x00      \\x00....

Or the following message in other occurrences:

Events:
  Type     Reason        Age                   From                        Message
  ----     ------        ----                  ----                        -------
  Warning  UpdateFailed  6m28s (x65 over 17h)  bigquerydataset-controller  Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing BigQueryDataset "projects/project-id/datasets/foo": googleapi: got HTTP response code 413 with body: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 413 (Request Entity Too Large)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7%!a(MISSING)uto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100%!p(MISSING)x no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0%/100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>413.</b> <ins>That’s an error.</ins>
  <p>Your client issued a request that was too large.
 <script>
  (function() { /*

 Copyright The Closure Library Authors.
 SPDX-License-Identifier: Apache-2.0
*/
var c=function(a,d,b){a=a+"=deleted; path="+d;null!=b&&(a+="; domain="+b);document.cookie=a+"; expires=Thu, 01 Jan 1970 00:00:00 GMT"};var g=function(a){var d=e,b=location.hostname;c(d,a,null);c(d,a,b);for(var f=0;;){f=b.indexOf(".",f+1);if(0>f)break;c(d,a,b.substring(f+1))}};var h;if(4E3<unescape(encodeURI(document.cookie)).length){for(var k=document.cookie.split(";"),l=[],m=0;m<k.length;m++){var n=k[m].match(/^\s*([^=]+)/);n&&l.push(n[1])}for(var p=0;p<l.length;p++){var e=l[p];g("/");for(var q=location.pathname,r=0;;){r=q.indexOf("/",r+1);if(0>r)break;var t=q.substring(0,r);g(t);g(t+"/")}"/"!=q.charAt(q.length-1)&&(g(q),g(q+"/"))}h=!0}else h=!1;
h&&setTimeout(function(){if(history.replaceState){var a=location.href;history.replaceState(null,"","/");location.replace(a)}},1E3); })();

</script>
 <ins>That’s all we know.</ins>
, detail:
@rnaveiras rnaveiras added the bug Something isn't working label Oct 9, 2020
@jcanseco
Member

jcanseco commented Oct 9, 2020

Hi @rnaveiras. I was unable to reproduce this. Are you seeing this on other resources too, or just for BigQueryDataset? Also, are you able to create any BigQueryDataset resources at all? (e.g. are you able to create our basic sample or does this one also fail with similar errors?)

We'll continue investigating the problem in the meantime and let you know if we find anything.

@lawrencejones

Hey @jcanseco, I work on the same team as Raúl.

Something just caught my eye: in a Terraform run on an unrelated Google project, using pure Terraform (no Config Connector here), I just got a similar error:

Error: Error when reading or editing Resource “project \“gc-lab\“” with IAM Member: Role “roles/container.clusterViewer” Member “group:x@gocardless.com”: Error retrieving IAM policy for project “gc-lab-1eb1": Post “https://cloudresourcemanager.googleapis.com/v1/projects/gc-lab:getIamPolicy?alt=json&prettyPrint=false“: net/http: invalid header field value “google-api-go-client/0.5 Terraform/0.13.2 (+https://www.terraform.io) Terraform-Plugin-SDK/2.0.3 terraform-provider-google/3.42.0    \x00    ” for key User-Agent

That resolved itself on retry.

This is starting to look like a transient error on the GCP Resource Manager API's end, rather than something to do with Config Connector. I assume you guys are using the terraform provider code to implement your reconcile loop anyways? So our terraform and Config Connector codepaths are probably triggering the same issue.
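
The `\x00` bytes inside the User-Agent value are what trigger the `net/http: invalid header field value` message: Go's HTTP client refuses to send a header value that contains control characters. Below is a rough Python approximation of that kind of check, not Go's actual code. It assumes the rule is "no control characters other than horizontal tab", and the function name `has_forbidden_header_bytes` is hypothetical.

```python
def has_forbidden_header_bytes(value: str) -> bool:
    """Return True if the header value contains a control character.

    Assumption: control characters are code points below 0x20 (except
    horizontal tab) plus DEL (0x7F). This approximates, but is not, the
    validation that Go's net/http performs.
    """
    for ch in value:
        code = ord(ch)
        if (code < 0x20 and ch != "\t") or code == 0x7F:
            return True
    return False


if __name__ == "__main__":
    # Shortened version of the User-Agent from the error above, NUL included.
    user_agent = "Terraform-Plugin-SDK/2.0.3 terraform-provider-google/3.42.0    \x00    "
    print(has_forbidden_header_bytes(user_agent))
```

A check like this only detects the corrupted value; it says nothing about where the NUL bytes came from.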

@jcanseco
Member

Thanks @lawrencejones, that is helpful information.

I assume you guys are using the terraform provider code to implement your reconcile loop anyways? So our terraform and Config Connector codepaths are probably triggering the same issue.

Yes we are, and yes, that sounds about right. If this error is due to a transient issue on the GCP API side (which it seems like it might be), we would want to investigate how to mitigate it, especially if it is causing issues on the KCC side.

@rnaveiras How often does this error occur for you, and does it disappear eventually after a few reconciliations?

@rnaveiras
Author

rnaveiras commented Oct 14, 2020

It happened on every reconcile loop for at least 10 hours, until I reverted the upgrade. After reverting to the previous version, everything started working again and we were able to create the BigQuery dataset.

@jcanseco
Member

Thanks @rnaveiras. So to confirm, you weren't able to successfully create any BigQueryDatasets? Even those with different configurations?

Also, were you able to create any other kind of resource at all (e.g. PubSubTopic), or was it just BigQueryDataset that was encountering issues?

As of now, we are still unable to reproduce the issue on our side. We will keep investigating.

@rnaveiras
Author

Hi @jcanseco, I did a quick test and it seems this is working with this version, but I found another problem with BigQueryDataset.spec.access. I will open a new issue for that and close this one.

I think that @lawrencejones was on point and it might be a transient error on the GCP Resource Manager API, but normally we never see them go on for so long.

@jcanseco
Member

Gotcha, thanks @rnaveiras. I agree that this is quite strange behavior. I'm glad that you're no longer facing the issue anyhow. Please let us know if you face the issue again.

@jcanseco
Member

Hi @rnaveiras, we've identified an issue on KCC 1.24.0 that is leading to the HTTP Error 413s. This is not a transient GCP API issue.

We're working on a fix and we'll let you know when that's out.

@jcanseco jcanseco reopened this Oct 16, 2020
@jcanseco
Member

If downgrading is an option, our recommendation is to downgrade to KCC 1.23.0. However, note that we don't support in-place downgrades; you'd have to uninstall KCC and then install an older KCC to do a downgrade, which may not be an option for you since this would mean abandoning all your existing resources.

If downgrading is not an option, a workaround for when the issue starts occurring is to restart the cnrm-controller-manager pod. You can do this via:

kubectl delete pod -n cnrm-system cnrm-controller-manager-0
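
For anyone who wants to script this workaround, here is a minimal sketch that runs the same command only when a status message shows the 413 symptom. It assumes the message text has already been fetched (for example from the resource's status.conditions) and that `kubectl` is on the PATH. The function name `restart_if_413` and the injectable `run` parameter are hypothetical conveniences, not part of Config Connector.

```python
import subprocess
from typing import Callable, List, Optional

# The restart command quoted in the comment above.
RESTART_CMD = ["kubectl", "delete", "pod", "-n", "cnrm-system", "cnrm-controller-manager-0"]


def restart_if_413(message: str, run: Optional[Callable[[List[str]], object]] = None) -> bool:
    """Run the restart command if the message shows a 413; return True if it ran.

    `run` receives the command as a list of arguments. It defaults to
    subprocess.run with check=True and can be replaced for dry runs.
    """
    if "HTTP response code 413" not in message:
        return False
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)
    run(RESTART_CMD)
    return True
```

Passing `print` as `run` gives a dry run that shows the command without executing it.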

@jcanseco jcanseco changed the title Config connector is not creating bigquery dataset after upgrade Config Connector HTTP Error 413 Oct 16, 2020
@jcanseco jcanseco changed the title Config Connector HTTP Error 413 Config Connector HTTP Error 413 (Request Entity Too Large) Oct 16, 2020
@tonybenchsci

Thanks @jcanseco . We're also seeing this issue today.

@jcanseco
Member

@tonybenchsci thanks, we're treating this as a high priority issue so we're aiming to get a fix out sooner rather than later. Were you able to confirm if the workaround worked for you?

@kibbles-n-bytes
Contributor

We have a fix out for this in the just-released 1.26.0 version. Please upgrade and let us know if you continue to have any issues with 413 error requests.
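
The version information in this thread can be summarised as a small check. This is a sketch under assumptions, not an official compatibility matrix: the thread only confirms that 1.24.0 is affected and that 1.26.0 contains the fix, so treating every version from 1.24.0 up to (but excluding) 1.26.0 as possibly affected is an assumption. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.24.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def kcc_413_status(version: str) -> str:
    """Classify a Config Connector version relative to the 413 issue.

    Assumption: versions between 1.24.0 (reported affected) and 1.26.0
    (reported fixed) are treated as possibly affected.
    """
    parsed = parse_version(version)
    if parsed >= (1, 26, 0):
        return "fixed"
    if parsed >= (1, 24, 0):
        return "possibly affected"
    return "not reported as affected"
```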

@ryanbenchsci

@jcanseco The workaround did work for us in the interim. We upgraded to 1.26.0 a few days ago and the issue appears to be fixed. Thank you!

@errordeveloper

Last time I bumped into this issue, I wasn't sure whether it only applied to BigQueryDataset. We are seeing this with 1.24.0, and with ComputeNetwork objects in particular. I will upgrade ASAP; glad to see there was a fix.

@errordeveloper

@jcanseco should this issue be closed now?

@jcanseco
Member

jcanseco commented Nov 4, 2020

Yes thank you @errordeveloper, marking it closed now.

@jcanseco jcanseco closed this as completed Nov 4, 2020