Rack update stuck #422

nickfishman · 2022-02-23T05:02:10Z

(Apologies in advance if this isn't the right place to report this. Please let me know if there's a better place!)

This evening I tried to update a v3 rack running a fairly old version (3.0.38). I first ran convox rack update 3.0.54 to bring it to the latest version running k8s 1.17. This ran quickly and succeeded without issues.

I then ran convox rack update 3.2.5 (the last version before k8s 1.19). Unfortunately, this update has been stuck for several hours with no new log output. Here are the last lines from the terraform run as shown in the logs at https://console.convox.com:

module.system.module.rack.module.api.data.aws_iam_policy_document.assume_api: Refreshing state...
module.system.module.rack.module.router.module.nginx.kubernetes_config_map.nginx-configuration: Refreshing state... [id=convoxprod-system/nginx-configuration]
module.system.module.rack.module.router.module.nginx.kubernetes_config_map.tcp-services: Refreshing state... [id=convoxprod-system/tcp-services]
module.system.module.rack.module.router.module.nginx.kubernetes_config_map.udp-services: Refreshing state... [id=convoxprod-system/udp-services]
module.system.module.rack.module.router.module.nginx.kubernetes_horizontal_pod_autoscaler.router: Refreshing state... [id=convoxprod-system/nginx]
module.system.module.rack.module.router.module.nginx.kubernetes_cluster_role_binding.ingress-nginx: Refreshing state... [id=ingress-nginx]
module.system.module.rack.module.router.module.nginx.kubernetes_deployment.ingress-nginx: Refreshing state... [id=convoxprod-system/ingress-nginx]

The output has not changed since. The rack itself also appears to be locked:

$ cx rack params
ERROR: state is locked for rack: <rackname>

After several hours, I tried a variety of approaches to unstick the update (including killing the underlying EC2 instances), but none has been successful.

How can I cancel and retry this update? Is there something I can do to ensure the update succeeds next time?

For reference, I am able to run kubectl and various k8s commands by following the docs here: https://docs.convox.com/management/direct-k8s-access/. I also have access to the EKS cluster info through the AWS web console (I followed the instructions at https://community.convox.com/t/resolved-how-can-i-get-permission-to-access-the-eks-cluster-from-the-aws-console/828/2).


nickfishman · 2022-02-23T05:29:09Z

If it helps, the full URL to the rack is https://console.convox.com/organizations/583f0d00-02a5-41e4-badf-1815b7623eda/racks#cea49617-08ca-4afe-a6b1-96635fbfaca7

heronrs · 2022-02-23T14:53:25Z

Hello Nick, can you confirm in the EKS UI whether all node groups and the cluster itself have status Active, and which k8s version it currently displays?

Next time feel free to use our forum https://community.convox.com/

nickfishman · 2022-02-23T17:38:02Z

@heronrs Thanks for the quick reply.

The EKS dashboard shows all node groups as Ready, all workloads with green status, and the k8s version as 1.17.

I checked the update log in the Convox Console and it's still showing the same state, stuck on that last line:

module.system.module.rack.module.router.module.nginx.kubernetes_deployment.ingress-nginx: Refreshing state... [id=convoxprod-system/ingress-nginx]

heronrs · 2022-02-24T13:29:00Z

@nickfishman thanks for all the information. The lock error you see is at the console level, so I removed the lock and you should be able to try the update again.
Meanwhile, we'll investigate what might have happened, although unfortunately I can't give you an ETA.
I'm closing this for now but if you still experience problems, feel free to create a thread in our forum https://community.convox.com/

heronrs closed this as completed Feb 24, 2022

heronrs self-assigned this Feb 24, 2022
