Cloud Run service fails to apply update #832

Closed
3 tasks done
knowhoper opened this issue Jun 23, 2023 · 9 comments
Labels
bug Something isn't working

Comments

@knowhoper

Checklist

Bug Description

I am seeing the following error when updating the Cloud Run service:

{"name":"xxx-api","namespace":"xxx-api"},"namespace":"xxx-api","name":"xxx-api","reconcileID":"feb918c5-8c38-4e40-b0c6-e4f080b2660b","error":"Update call failed: error generating the diffs from desired state: \"Location\" must be set"}

Below is the resource YAML. Note that deleting and re-creating the resource works.

apiVersion: run.cnrm.cloud.google.com/v1beta1
kind: RunService
metadata:
  name: xxx-api
  namespace: xxx-api
  annotations:
    argocd.argoproj.io/sync-wave: "20"
    cnrm.cloud.google.com/project-id: acme-uat
spec:
  ingress: "INGRESS_TRAFFIC_ALL"
  launchStage: "GA"
  location: australia-southeast1
  projectRef:
    external: projects/acme-uat
  template:
    containerConcurrency:  80
    scaling:
      minInstanceCount: 1
      maxInstanceCount: 2
    revision: xxx-api-v1-4-44-uatj
    serviceAccountRef:
      external: "serviceAccount:svc-xxx-api@acme-uat.iam.gserviceaccount.com"
    containers:
      - env:
          - name: default_badge_limit
            value: "6"
          - name: bucket_id
            value: "acme-public-images-uat"
        image: "australia-southeast1-docker.pkg.dev/acme-dev-tooling/acme-docker/xxx-api:v1.4.44-uat"
        ports:
          - name: http1
            containerPort: 5000
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
    serviceAccountRef:
      external: svc-xxx-api@acme-uat.iam.gserviceaccount.com
    vpcAccess:
      connectorRef:
        external: projects/acme-uat/locations/australia-southeast1/connectors/acme-svpc
      egress: PRIVATE_RANGES_ONLY
  traffic:
    - percent: 100
      type: "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
      revision: xxx-api-v1-4-44-uatj


Additional Diagnostic Information

None

Kubernetes Cluster Version

1.25.8-gke.1000

Config Connector Version

1.105.0

Config Connector Mode

cluster mode

Log Output

{"severity":"info","timestamp":"2023-06-23T08:59:13.837Z","logger":"runservice-controller","msg":"starting reconcile","resource":{"namespace":"xxx-api","name":"xxx-api"}}
{"severity":"error","timestamp":"2023-06-23T08:59:13.917Z","msg":"Reconciler error","controller":"runservice-controller","controllerGroup":"run.cnrm.cloud.google.com","controllerKind":"RunService","RunService":{"name":"xxx-api","namespace":"xxx-api"},"namespace":"xxx-api","name":"xxx-api","reconcileID":"f405ed57-31a1-4aab-a1f0-f1e4977dec34","error":"Update call failed: error generating the diffs from desired state: \"Location\" must be set"}

Steps to reproduce the issue

Create a Cloud Run service, then add a revision.

YAML snippets

apiVersion: run.cnrm.cloud.google.com/v1beta1
kind: RunService
metadata:
  name: xxx-api
  namespace: xxx-api
  annotations:
    argocd.argoproj.io/sync-wave: "20"
    cnrm.cloud.google.com/project-id: acme-uat
spec:
  ingress: "INGRESS_TRAFFIC_ALL"
  launchStage: "GA"
  location: australia-southeast1
  projectRef:
    external: projects/acme-uat
  template:
    containerConcurrency:  80
    scaling:
      minInstanceCount: 1
      maxInstanceCount: 2
    revision: xxx-api-v1-4-44-uatj
    serviceAccountRef:
      external: "serviceAccount:svc-xxx-api@acme-uat.iam.gserviceaccount.com"
    containers:
      - env:
          - name: default_badge_limit
            value: "6"
          - name: bucket_id
            value: "acme-public-images-uat"
        image: "australia-southeast1-docker.pkg.dev/acme-dev-tooling/acme-docker/xxx-api:v1.4.44-uat"
        ports:
          - name: http1
            containerPort: 5000
        resources:
          limits:
            cpu: 1000m
            memory: 1Gi
    serviceAccountRef:
      external: svc-xxx-api@acme-uat.iam.gserviceaccount.com
  traffic:
    - percent: 100
      type: "TRAFFIC_TARGET_ALLOCATION_TYPE_REVISION"
      revision: xxx-api-v1-4-44-uatj

@knowhoper added the bug (Something isn't working) label on Jun 23, 2023

@diviner524
Collaborator

@knowhoper thanks for reporting this issue! We saw other users report similar errors as well and we are looking into it.

@knowhoper
Author

knowhoper commented Jul 3, 2023

Hi @diviner524, thanks for the response. Any ETA on a fix? This is preventing us from deploying some workloads.

We have found a workaround: add the cnrm.cloud.google.com/deletion-policy: abandon annotation to the Cloud Run service, then force a recreate of the service in GKE. This appears to clear the issue, but it's less than ideal.
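
For anyone following along, here is a minimal sketch of where that annotation would sit on the RunService from the report. It assumes the annotation is simply added next to the existing ones and that the spec is left untouched; only the deletion-policy line is new.

```yaml
apiVersion: run.cnrm.cloud.google.com/v1beta1
kind: RunService
metadata:
  name: xxx-api
  namespace: xxx-api
  annotations:
    argocd.argoproj.io/sync-wave: "20"
    cnrm.cloud.google.com/project-id: acme-uat
    # Workaround annotation: with "abandon", deleting this Kubernetes object
    # leaves the underlying Cloud Run service in place in GCP.
    cnrm.cloud.google.com/deletion-policy: abandon
# spec: unchanged from the resource YAML in the report
```

With the annotation in place, deleting the Kubernetes object and re-applying the manifest should let Config Connector pick up the existing service again rather than destroying it first.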

@tsallou

tsallou commented Jul 6, 2023

I have the same issue using the CloudSchedulerJob resource: Update call failed: error generating the diffs from desired state: "Location" must be set

@knowhoper
Author

@tsallou we use ArgoCD to manage GKE resources, by the way. Our workaround was to set the annotation cnrm.cloud.google.com/deletion-policy: abandon on our Cloud Run services so they are never removed from GCP, then delete the Cloud Run custom resource from the relevant namespace in GKE, removing finalisers if necessary.

@jpeterson-bestbuy

I have the same issue using the CloudSchedulerJob resource: Update call failed: error generating the diffs from desired state: "Location" must be set

We also observed this issue with CloudSchedulerJob. Abandon/recreate worked to get the resource back into a healthy state. @diviner524 -- any idea on when we can expect a resolution?
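
A rough sketch of the same abandon annotation applied to a CloudSchedulerJob, in case it helps others hitting this. The name and namespace below are hypothetical placeholders, and the spec is deliberately omitted; keep whatever your existing resource already declares.

```yaml
apiVersion: cloudscheduler.cnrm.cloud.google.com/v1beta1
kind: CloudSchedulerJob
metadata:
  name: example-scheduler-job   # hypothetical name, not from this thread
  namespace: example-namespace  # hypothetical namespace
  annotations:
    # With "abandon", deleting this Kubernetes object leaves the
    # Cloud Scheduler job itself in place in GCP.
    cnrm.cloud.google.com/deletion-policy: abandon
# spec: keep your existing spec (including location) unchanged
```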

@davireis

I am also observing this with a skaffold + config-connector setup. The deletion-policy workaround works as long as I delete the CR by hand before running skaffold. If I run skaffold without the manual delete, the Location problem shows up and never goes away. Then I am left with another workaround: creating new clusters. But when I try to delete the old clusters, I get

ERROR: (gcloud.anthos.config.controller.delete) Operation https://krmapihosting.googleapis.com/v1/projects/trash-362115/locations/us-central1/operations/operation-1696069559461-60690f79b2821-90f34dd1-a106ef60 has not finished in 1800 seconds. The operations may still be underway remotely and may still succeed; use gcloud list and describe commands or https://console.developers.google.com/ to check resource state.

And now my bill is going through the roof. Would love to see things more robust here as skaffold + config connector is a great experience when it works. Let me know if I can help somehow.

@diviner524
Collaborator

We just released CloudRun as a stable CRD in the latest 1.110.0 release, with a few more bug fixes on this resource. Please give it a try and see if this location issue persists.

@jpeterson-bestbuy

Thank you @diviner524 -- I can confirm that we're no longer affected by this issue.

@diviner524
Collaborator

@jpeterson-bestbuy Thank you for letting us know!

5 participants