C9DiskResize keeps failing #77

aws-ps-hobson · 2021-02-08T23:30:08Z

Failed to create resource. An error occurred (Unavailable) when calling the ModifyVolume operation (reached max retries: 4): The service is unavailable. Please try again shortly.

awsimaya · 2021-04-06T01:03:13Z

Did this error get resolved at all?

edwio · 2021-04-27T05:18:25Z

Still a recurring problem. The Lambda times out after reaching the maximum retries, as this will be different for each account.

rafaelpereyra · 2021-04-27T12:25:13Z

In which region are you deploying the Cloud9 stack? Can you check the CloudWatch logs for that Lambda function and post them?

edwio · 2021-05-14T09:48:12Z

eu-west-1. Here is the error from the log of the C9DiskResizeLambda Lambda function:

{
    "timestamp": "2021-04-26 19:54:45,192",
    "level": "DEBUG",
    "location": "crhelper.utils._send_response:19",
    "RequestType": "Create",
    "StackId": "arn:aws:cloudformation:eu-west-1:443682937418:stack/C9-Observability-Workshop/deae7fa0-a6c8-11eb-a1d8-0ad741ae72c5",
    "RequestId": "c6b84ba4-5c78-45f5-9b42-92c27540ab77",
    "LogicalResourceId": "C9DiskResize",
    "aws_request_id": "d373d7c2-aa10-4434-8e8a-d2b49d0e741e",
    "message": {
        "Status": "FAILED",
        "PhysicalResourceId": "C9-Observability-Workshop_C9DiskResize_NPBB7K5V",
        "StackId": "arn:aws:cloudformation:eu-west-1:443682937418:stack/C9-Observability-Workshop/deae7fa0-a6c8-11eb-a1d8-0ad741ae72c5",
        "RequestId": "c6b84ba4-5c78-45f5-9b42-92c27540ab77",
        "LogicalResourceId": "C9DiskResize",
        "Reason": "An error occurred (Unavailable) when calling the ModifyVolume operation (reached max retries: 4): The service is unavailable. Please try again shortly.",
        "Data": {}
    }
}

Also, the manual option for deploying the lab without Cloud9 isn't working correctly. There seem to be some prerequisites, such as adding permissions to S3, and the envsetup.sh script fails because the following commands are not installed:

pip
npm
git

rafaelpereyra · 2021-05-14T13:12:19Z

Hello, looks like the C9 instance is not ready for the automation to execute.

Can you manually create a Cloud9 Instance and attach the Instance role to it via the AWS Console?

Regarding your second comment, the script is designed to run in Cloud9, where all those applications (pip, npm, git) are already installed. Are you running the script from your local machine?

edwio · 2021-05-14T14:47:31Z

I tried your suggestion and manually created the Cloud9 instance. Everything seemed to work until I ran the last command in the "Deploy the stack" section, 'cdk deploy Applications --require-approval never', which fails with this error:

Received response status [FAILED] from custom resource. Message returned: Error: b'serviceaccount/petsite-sa created\nservice/service-petsite created\ndeployment.apps/petsite-deployment created\nError from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"\n' Logs: /aws/lambda/Applications-ApplicationsMyCluster-Handler886CB40B-P7J44HT23PXJ at invokeUserFunction (/var/task/framework.js:95:19) at process._tickCallback (internal/process/next_tick.js:68:7) (RequestId: 65c72463-1979-428a-aa9f-4dd4c328fb5e)

I have added the logs of /aws/lambda/Applications-ApplicationsMyCluster-Handler886CB40B-YWFJYW1ENTN3:

[ERROR] Exception: b'Error from server (AlreadyExists): error when creating "/tmp/manifest.yaml": serviceaccounts "petsite-sa" already exists\nError from server (Invalid): error when creating "/tmp/manifest.yaml": Service "service-petsite" is invalid: spec.ports[0].nodePort: Invalid value: 30300: provided port is already allocated\nError from server (AlreadyExists): error when creating "/tmp/manifest.yaml": deployments.apps "petsite-deployment" already exists\nError from server (InternalError): error when creating "/tmp/manifest.yaml": Internal error occurred: failed calling webhook "mtargetgroupbinding.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/mutate-elbv2-k8s-aws-v1beta1-targetgroupbinding?timeout=10s": no endpoints available for service "aws-load-balancer-webhook-service"\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 14, in handler
    return apply_handler(event, context)
  File "/var/task/apply/__init__.py", line 60, in apply_handler
    kubectl('create', manifest_file, *kubectl_opts)
  File "/var/task/apply/__init__.py", line 87, in kubectl
    raise Exception(output)

rafaelpereyra · 2021-05-14T16:01:05Z

Can you check the pods running in your cluster? Looks like the Helm chart for the AWS Load Balancer controller was not deployed properly (webhook is not available).

edwio · 2021-05-15T18:41:27Z

Seems to be running fine on the ECS side:

Clusters:

Task Definitions:

But I don't see any pods running in EKS:

Am I missing something?

rafaelpereyra · 2021-05-17T13:18:49Z

By default, even if the role you're using is Admin of the account, you won't have enough permissions in the Kubernetes RBAC to see that dashboard (hence the message).

We added some instructions for adding your role to the RBAC in order to get access to the EKS Console here.

You should, however, be able to list the pods using kubectl from the Cloud9 environment. Can you do that, please, and check whether the AWS Load Balancer is running?

edwio · 2021-05-18T06:04:28Z

How do I find my value for CONSOLE_ROLE_ARN=<Enter your Role ARN>?

Regarding the EC2 load balancers, they are in the active state:

rafaelpereyra · 2021-05-18T13:40:08Z

That is the ARN of the role you use to connect to the AWS Console.
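
If you are unsure what that ARN is, one way to look it up is sketched below. It assumes your shell (for example CloudShell) runs with the same credentials as your console session. `role_arn_from_sts` is a hypothetical helper name, not part of the workshop scripts. Note that an assumed-role ARN does not carry the role's path, so roles created with a path need to be looked up in IAM instead.

```shell
# Convert an STS assumed-role ARN into the matching IAM role ARN.
# ARNs that are not assumed-role ARNs are printed unchanged.
# Assumption: the role has no IAM path (the path is not part of the STS ARN).
role_arn_from_sts() {
  echo "$1" | sed -E 's#^arn:(aws[a-z-]*):sts::([0-9]+):assumed-role/([^/]+)/.*$#arn:\1:iam::\2:role/\3#'
}

# Usage (needs AWS credentials, so it is left commented out here):
#   CONSOLE_ROLE_ARN=$(role_arn_from_sts "$(aws sts get-caller-identity --query Arn --output text)")
#   echo "$CONSOLE_ROLE_ARN"
```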

Load balancers are created by the CDK Services stack, but inside your EKS cluster there is a component deployed (AWS Load Balancer) that is failing according to the log message you sent.

edwio · 2021-05-18T18:55:27Z

How can I fix that (AWS Load Balancer)?

Furthermore, what is the difference between envsetup.sh and envsetup_ee.sh, and which one do I need to run when using Cloud9 manually?

rafaelpereyra · 2021-05-18T19:00:06Z

We need to see the reason why it is failing to help you with that.

For the bash script, just follow the instructions here:

https://observability.workshop.aws/en/installation/not_using_ee/_deploy_app.html#install-tools-and-clone-the-repository

The second script (_ee) is used for Event Engine.

edwio · 2021-05-19T07:02:52Z

Where can I see why it is failing?

rafaelpereyra · 2021-05-19T10:44:37Z

You'll see the logs in the EKS cluster using the kubectl tool.
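
As a rough sketch of how to narrow that down from Cloud9: the filter below lists pods that are not in a healthy state, and the commented commands show how to dig into one of them. `not_ready_pods` is a hypothetical helper name, and the pod name in the usage comments is a placeholder you would replace with the one from your cluster.

```shell
# Print "namespace/pod status" for every pod whose STATUS column is
# neither Running nor Completed. Expects the output of
# `kubectl get pods -A --no-headers` on stdin, where the columns are
# NAMESPACE NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage (needs access to the cluster, so it is left commented out here):
#   kubectl get pods -A --no-headers | not_ready_pods
#   kubectl describe pod <pod-name> -n kube-system   # the Events section shows pull/scheduling errors
#   kubectl logs <pod-name> -n kube-system           # container logs, if the container started
```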

Let's try a different approach. Can you tear down that environment completely and start from scratch? Please create just the C9 environment first from CloudShell, as explained here. Please copy and paste the whole list of commands:

curl -O https://raw.githubusercontent.com/aws-samples/one-observability-demo/main/cloud9-cfn.yaml

aws cloudformation create-stack --stack-name C9-Observability-Workshop --template-body file://cloud9-cfn.yaml --capabilities CAPABILITY_NAMED_IAM

aws cloudformation wait stack-create-complete --stack-name C9-Observability-Workshop

echo -e "Cloud9 Instance is Ready!!\n\n"

Are there any SCPs applied to your account, or is this a personal account? Does your role have Admin access to the environment? (I'm looking for reasons why the C9 launch or resize would have failed in the first place.)

edwio · 2021-05-19T17:39:12Z

Found the problem: the pod running the AWS Load Balancer is pulling its image from an ECR registry in the us-west-2 region, which is not accessible to us due to our organization policy (eu- only).

Running kubectl describe pods against the other pods, I can see that all of their images are pulled from ECR in the eu- region.

How come the AWS Load Balancer image is pulled from a different region?

Also, is it possible to edit the YAML file to specify an ECR registry in the eu- region for the AWS Load Balancer?

rafaelpereyra · 2021-05-19T18:00:18Z

We're installing AWS Load Balancer in CDK using the project Helm chart here.

The default image is configured here.

The current policy in your organization prevents cross-region pulls, so I suggest changing the Helm chart default value in your local CDK file (see the link in the first paragraph), using the value image.repository, to an ECR image path that is allowed inside your organization.

The image is not available in eu-west-1, so you'll probably need to pull it from us-west-2 and push it into your own ECR.
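
A minimal sketch of that mirroring step, assuming Docker and the AWS CLI are available somewhere that is allowed to reach us-west-2. `mirror_uri` is a hypothetical helper and 111122223333 is a placeholder for your own account ID. The helper only computes the target image path; the pull and push commands are shown as comments because they need credentials.

```shell
# Build the image URI for a private ECR copy of a source image.
# Arguments: SOURCE_URI TARGET_ACCOUNT_ID TARGET_REGION
mirror_uri() {
  src_uri=$1
  target_account=$2
  target_region=$3
  image_path=${src_uri#*/}   # drop the registry host, keep "repo/name:tag"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$target_account" "$target_region" "$image_path"
}

# Usage (commented out; requires Docker and AWS credentials):
#   SRC=602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.1.3
#   DST=$(mirror_uri "$SRC" 111122223333 eu-west-1)
#   aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin "${SRC%%/*}"
#   docker pull "$SRC"
#   aws ecr create-repository --repository-name amazon/aws-load-balancer-controller --region eu-west-1
#   aws ecr get-login-password --region eu-west-1 | docker login --username AWS --password-stdin "${DST%%/*}"
#   docker tag "$SRC" "$DST" && docker push "$DST"
```

The repository part of the resulting URI (everything before the tag) is what would go into image.repository.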

edwio · 2021-06-28T10:30:22Z

@rafaelpereyra I tried to edit the image key specified in the aws-load-balancer deployment,

From:

602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.1.3

To:

602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.1.3

By using the following command :

kubectl edit deployment servicesawsloadbalancercontroller2049c530-aws-load-balancer-con -n kube-system

I'm getting the following error: kubectl Edit cancelled, no changes made

rafaelpereyra · 2021-07-01T14:45:36Z

Hello,

Looks like an issue with the editor you're using; it may not be saving the changes. Please use this instead:

kubectl set image deployment/servicesawsloadbalancercontrollerXXXXXX-aws-load-balancer-con -n kube-system aws-load-balancer-controller=602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.2.1

engrun · 2021-10-18T10:33:40Z

Related to the original issue with C9DiskResize:

I stumbled upon this today. The resolution, in my case, was to temporarily set default-ebs-encryption to false in the EC2 console.

(This was set to true on my account.)
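
For anyone who prefers the CLI over the console, the same toggle might look like the sketch below. `run` and `ebs_default_encryption` are hypothetical helper names. The setting is per region, and the helper only prints the commands unless DRY_RUN=0 is set, since changing account-level encryption defaults deserves a deliberate step.

```shell
# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Check or toggle EBS encryption by default for one region.
# Arguments: status|off|on REGION
ebs_default_encryption() {
  case "$1" in
    status) run aws ec2 get-ebs-encryption-by-default --region "$2" ;;
    off)    run aws ec2 disable-ebs-encryption-by-default --region "$2" ;;
    on)     run aws ec2 enable-ebs-encryption-by-default --region "$2" ;;
    *)      echo "usage: ebs_default_encryption status|off|on REGION" >&2; return 2 ;;
  esac
}

# Usage: prints the command; prefix with DRY_RUN=0 to actually run it.
#   ebs_default_encryption off eu-west-1
#   ... create the C9-Observability-Workshop stack ...
#   ebs_default_encryption on eu-west-1
```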

awsimaya closed this as completed Oct 24, 2022
